System Design
While rare, system design problems are quite challenging to tackle if it's the first time you are dealing with them. The best practice is building real-world projects
There is no better substitute to getting better at design systems than building projects that use these techniques, however, if you are short on time, I have tried compiling the key ideas of system design. The structure and some of the components have been inspired by this Medium article about the glossary of system design basics and I have updated it to suit my needs when revising. Some of the diagrams have been used from that article as well
Characteristics of Distributed Systems
Scalability
Ability of a system, process, or network to grow and cope with increasing demand
Horizontal scaling: scale by adding more servers (Cassandra/MongoDB)
Vertical scaling: scale by adding more power to existing server (MySQL)
Involves downtime of the current server
Reliability
Probability a system will fail in a given period
Focus on redundancy of both software and data components
Availability over time
Reliability and performance are exclusive
High reliability but poor performance and high performance but poor reliability are both possibilities
Availability
Time a system remains operational to perform its required function
Includes maintainability, repair time, spares availability, etc.
Reliable → Available but converse is not true
Remaining accessible even if one or more nodes is down
Efficiency
Measured using latency and throughput
Latency: delay to obtain the first item (time spent waiting to get first item)
Throughput: number of items delivered in a given time unit (how many items can be sent in a time period)
Serviceability/Manageability
Simplicity/speed with which a system can be repaired or maintained
Ease of diagnosing and understanding problems, ease of making updates/modifications, how simple the system is to operate
Load Balancing
Spreads traffic across a cluster of servers to improve responsiveness and availability of applications. Keeps track of status of all resources while distributing requests
Reduces individual server load and avoids single point of failure
Added:
Between user and web server
Between web server and internal platform layer like application/cache servers
Between internal platform layer and database
Health check: periodically attempt to connect to backend server, if failed, removes server from pool
Benefits
Faster, uninterrupted service
Service providers experience less downtime and higher throughput
Easier for system administrators to handle incoming requests while decreasing wait time for others
Smart load balancers can provide predictive analytics, detecting traffic bottlenecks before they happen
Fewer failed or stressed components
Algorithms
Least connection method: server with fewest active connections; useful for large number of persistent client connections
Least response time method: server with fewest active connections and lowest average response time
Least bandwidth method: server that is currently serving the least amount of traffic measured in Mbps
Round robin method: cycles through list of servers and sends each new request to the next server; useful when servers are of equal specification and not many persistent connections
Weighted round robin method: servers with different processing capabilities; each server has a weight and higher weighted servers receive new connections first
IP hash: hash of IP address of client is calculated to redirect request to server
Redundancy: second load balancer can connect to primary to form cluster; each perform health checks on each other, if main is down, secondary takes over
Caching
Make better use of existing resources and achieving previously unattainable requirements
Take advantage of locality of reference principle: recently requested data is likely to be requested again
Application server cache: caching database resources or user requested data; can be distributed to global/distributed cache
Content Delivery Network (CDN): caching for blobs/static media
Push CDN: changes from server uploaded directly to CDN; good for low traffic systems
Pull CDN: new content grabbed from the server when the first user requests the content from the site; slower requests due to cache miss; good for high traffic website
Cache invalidation
Write-through: write to both database and cache; high reliability, high latency for writes
Achieves data consistency
Introduce delay when writing
Write-around: data written directly to database; less write operations to cache but cache miss needed to cache record
Write-back: write to cache first and periodically write to database; low latency and high throughput for write intensive, risky for data loss; async write-through
Better performance
Cache eviction policies
FIFO: first block accessed first
LIFO: last block accessed
LRU: least recently used
MRU: most recently used
LFU: items with least use discarded first
Random Replacement: randomly selected
Data Partitioning
Breaking up a big database into many small parts
Improves manageability, performance, availability, and load balancing
Grow horizontally instead of vertically
Partitioning methods
Horizontal: range-based partitioning (store rows with key in range to different tables); key must be balanced; Sharding
Vertical: divide data into separate servers; further growth can force each component to partition more finely; Partitioning
Dictionary-based: lookup service that maps each key to the database server with specific partitioning algorithm; easy to add new servers/change partitioning scheme
Partitioning criteria
Key/Hash-based partitioning: hash a key attribute to get partition number; fixes total number of servers since more servers requires hash function to change, involving redistribution of data and downtime; resolve using consistent hashing
List partitioning: each partition assigned list of values and assign records to partition containing key
Round-robin partitioning: given
n
partitions,i
row assigned to partitioni % n
Composite partitioning: combining the above partitioning; consistent hashing combines hash and list partitioning to reduce key-space to size that can be listed
Problems with data partitioning
Joins and denormalization: not feasible to perform joins that span database partitions and not efficient; workaround to denormalize database so queries can be performed from single table; introduces risk of data inconsistency
Referential integrity: enforcing integrity constraints like foreign key is difficult, most RDBMS does not support cross server foreign key; requires enforcement on application layer & periodic jobs to clean dangling references
Rebalancing: non-uniform data distribution and high load on partition requires more database partitions or rebalancing of existing partitions, requiring movement of data to new locations; use dictionary-based partitioning to make rebalancing easier but system more complex and single point of failure (lookup table)
Indexing
Makes searching through a table faster by using one or more columns of the database table, providing the basis for both rapid random lookups and efficient access of ordered records
Speed up data retrieval by introducing more keys but can reduce data insertion/update/delete time
Not worth using if write-heavy
Proxies
Intermediate piece of software/hardware that sits between client and server, facilitating requests for resources from other servers on behalf of the client, anonymizing the client (forward proxy) or anonymizing the server (reverse proxy)
Forward proxy
Cache data, filter requests, log requests, transform requests (add/remove headers, encrypt/decrypt, compressing resource)
Optimize request traffic by combining same data access requests into one request and return the result to the user (collapsed forwarding)
Reverse proxy
Retrieve resources from one or more servers on behalf of a client, anonymizing the server
Similar role to forward proxy
Redundancy & Replication
Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance
Removes single points of failure and provides backup in case of crisis
Focus on components/servers
Replication is the sharing of information to ensure consistency between redundant resources (software/hardware) to improve reliability, fault-tolerance, or accessibility
Focus on the data
Primary-replica relationship between original and copies
Primary gets all updates, ripple through to replica servers
Replicas outputs a message stating it has received the update and allow subsequent updates
SQL vs NoSQL
Relational database: structured with predefined schemas like phone books that store phone numbers and addresses
Non-relational database: unstructured, distributed, and have dynamic schema
SQL
Relational database
Stored in rows and columns
Row contains all information about entity
Column contains separate data points
NoSQL
Key-value stores: data stored as key-value pairs; e.g. Redis, Voldemort, Dynamo
Document databases: data stored as documents and grouped together in collections with each document having completely different structure; e.g. CouchDB and MongoDB
Wide-Column database: column families act as containers for rows, each row does not have the same number of columns; columns not known up front; useful for analyzing large datasets; e.g. Cassandra and HBase
Graph database: store data whose relations are best represented as a graph; data saved in graph structures with nodes as entities containing properties and lines as edges between entities; e.g. Neo4j and InfiniteGraph
Key differences
Storage
Data in tables
Different data storage models
Schema
Fixed schema, any alterations alter all rows
Dynamic schemas without uniformity needed
Querying
Uses SQL to define and manipulate
Uses UnQL (Unstructured Query Language) and query focused on a collection of documents
Scalability
Traditionally built to scale vertically
Horizontally scalable with ease to distribute data across servers automatically
Reliability (ACID)
ACID compliant, reliable and safe guarantee for performing transactions
Sacrifice ACID for performance and scalability
ACID
Atomicity: all transactions must succeed or fail completely and cannot be left partially complete even in system failure
Consistency: database follows rules that validate and prevent corruption at every step
Isolation: concurrent transactions cannot affect each other
Durability: transactions are final and system failures cannot rollback a complete transaction
BASE
Basically Available: ensures availability by spreading and replicating data across the nodes of the database cluster
Soft-state: delegates responsibility of synchronizing changes to developers
Eventually consistent: data reads are possible even until data is eventually consistent
Why SQL?
Ensuring ACID compliance: reducing anomalies and protects integrity of database by prescribing exactly how transactions interact with the database
Structured and unchanging data: data is consistent so there’s no need to use a system to support variety of data types and high traffic volume
Update once: normalizing data allows single change to affect all related data
Write-heavy flows can be optimized but must be careful about indexing
Why NoSQL?
Storing large volumes of data with little to no structure
Make the most of cloud computing and storage: cloud computing emphasizes horizontally scaling and NoSQL database support this well
Rapid development: useful for rapidly changing data structure and iterating without much downtime between versions
High performance and availability
Optimized for high-throughput read and write operations
CAP Theorem
CAP Theorem states that it is impossible for a distributed system to simultaneously provide the following properties: consistency, availability, and partition tolerance
Any distributed system must pick 2 out of 3 properties
Options are CA, CP, and AP
CA doesn’t make sense as a non-partition tolerant system is forced to give up either consistency or availability in the event of a network partition
Consistency
All nodes see the same data at the same time
Users can read or write from/to any node in the system and will receive the same data
Same as having single up-to-date copy of the data
Availability
Every request received by a non-failing node in the system must result in a response
Every request must terminate even during severe network failure
Partition tolerance
Partition is a communication break/network failure between any 2 nodes in the system
System must continue to operate even if there are partitions in the system
PACELC Theorem
In the event of no partition (E), then trade off between latency and consistency
Examples
DynamoDB and Cassandra: PA/EL
BigTable and HBase: PC/EC
MongoDB: PA/EC
Long-Polling vs WebSockets vs Server-Sent Events
Communication protocols between client and web server
AJAX Polling
Client repeatedly polls/requests a server for data
Regular intervals of 0.5 seconds
No data available → empty response
If data is sparse, lots of HTTP overhead
HTTP Long-Polling
Server pushes information to client whenever data is available
Client requests for data but does not expect an immediate response (hanging GET)
Server holds onto request if no data available and sends full response once available or until timeout expires
WebSockets
Full-duplex (bidirectional) communication channels over a single TCP connection
Persistent connection between client and server
Client establish web socket with WebSocket handshake
Server-Sent Events
Unidirectional client <- server connection
Client establishes connection
Server sends data to client in chunks when available (used for ChatGPT API with streaming API)
Best when needing real-time traffic from server to client or server generates data in a loop
Sharding vs Partitioning
Sharding
Divide single dataset among many databases to be distributed
Improves response time with less rows per database
Avoids total service outage since not every server will go down
Scales efficiently so more servers can be added
Methods
Range-based sharding: shard key assigned to a range; can cause unbalanced data but easy to implement
Hashed sharding: shard key assigned to each row using has function; results in even distribution of data but new hash function can cause problems (more servers)
Directory sharding: lookup table to match database information with shard key; flexible but can fail if lookup table has wrong information
Geo sharding: split and store database information according to geographical location; fast information retrieval due to shorter distance between shard and customer; cause uneven data distribution
Partitioning
Distributing database objects across different servers; some tables belong to certain servers
Usually referring to vertical partitioning
Improves reliability, availability, and performance
Consistency Patterns
Weak consistency: after data write, read request may or may not be able to get new data
Eventual consistency: post data write, reads will eventually see the latest data within milliseconds; data replicated asynchronously
Strong consistency: after data write, subsequent reads immediately see latest data; done synchronously
Common Questions
Why use a UUID and not an auto-incrementing ID as primary key?
UUID
Globally unique across tables/distributed system
Stateless
Secure as people cannot guess the ID
Can store timestamps
Not readable
Not naturally sortable by creation time
Potentially bad for insertion time (clustered primary key); requires reordering before insertion
Uses 16 bytes to store
ID
Readable
Less space needed (4/8 bytes at most)
Cannot be used in distributed systems due to duplicates
Cannot be generated in isolation (must read the previous ID)
Prone to guess attacks
What happens when you enter a URL into the browser?
Request is sent to the local DNS where the appropriate IP address is found
Client tries to establish TCP connection with server residing at IP address
Once connection established, client makes the HTTP request
Server processes request and sends response
[Optional] Any additional objects are requested separately or through a persistent TCP connection
How are web pages loaded?
Build the HTML DOM tree
Build the CSSOM tree, handle JS parsing, accessibility tree
Combine the DOM and CSSOM trees into render tree
Layout the geometry of the elements into boxes
Paint the boxes (first meaningful paint)
What’s the difference between hot and cold storage?
Hot storage refers to data that is accessed frequently or requires fast access while cold storage is used for data rarely needed.
How do you deal with high frequency user input?
Apply techniques like debouncing. The user input function can be triggered after a period of no-action to avoid repeatedly spamming the backend with requests. Alternatively, the user input function can trigger once and wait for a cooldown period (no-action).
How does debouncing differ from throttling?
Throttling involves running function at regular intervals while debouncing focuses on creating a cooldown period to avoid creating too many requests.
Last updated