System Design Concepts You Should Know

In this section we will cover a few concepts that are widely applied when designing large-scale systems.

Vertical vs. Horizontal Scaling

Scaling an application comes down to two strategies: vertical scaling and horizontal scaling.

Vertical scaling is where you upgrade the server(s) you’re using to a more powerful machine. You increase the RAM, beef up the CPU, etc.

Horizontal scaling is where you add more machines to your server pool so the incoming work is distributed amongst more computers.

Vertical scaling is generally much easier than scaling horizontally, since you just need to swap out the machine. You don’t have to rewrite your application, deal with data consistency issues, worry about load balancing, etc. You just have to go to your AWS Console and pay Andy Jassy some more money to fix your scaling issues.

However, vertical scaling has its limitations. You’ll eventually reach a point where you’ve maxed out the RAM/CPU of the machine and you can’t upgrade it further. At that point, you’ll need to consider splitting the workload up across multiple machines.

The benefit of horizontal scaling is that it’s virtually limitless. Companies like Meta, Google, and Amazon have used horizontal scaling to serve billions of users.

The downside of horizontal scaling is that it’s extremely complicated. You’ll have to deal with single points of failure, data consistency issues, network partitions, and much more.

Load Balancing

When you’re scaling horizontally, you have multiple instances of an API or service. Traffic should be distributed evenly across these instances, which is the job of another server called a load balancer. It sits between the clients and the application servers.

There are many different algorithms and strategies for balancing traffic across these server instances. Some common ones include:

  • Round Robin - Assign requests to each server, one-by-one. This strategy is simple, but it does not consider the current load or capacity of your servers, so it can keep sending requests to a server that is already overloaded.

  • Least Connections - Track how many active connections (or how much load) each server in the cluster currently has, and send the request to the server with the fewest. This strategy distributes load more evenly since it accounts for current usage, but it requires extra work to keep that usage information up to date for every machine.

  • Power of 2 Random Choices - Randomly select two servers from the cluster, check the load (CPU/RAM usage) on both, and send the request to the less loaded of the two. This avoids checking the usage of every machine while still steering requests toward servers with spare capacity (see the sketch after this list).
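
As a rough illustration, here is a minimal Python sketch of round robin and power-of-two-choices selection. The Server class and its active_connections counter are hypothetical stand-ins for whatever load metrics a real load balancer would track (CPU, RAM, in-flight requests, etc.).

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0  # stand-in for a real load metric (CPU, RAM, etc.)


servers = [Server("app-1"), Server("app-2"), Server("app-3")]

# Round Robin: hand out servers one-by-one, ignoring how busy they are.
_rotation = itertools.cycle(servers)

def pick_round_robin() -> Server:
    return next(_rotation)

# Power of 2 Random Choices: sample two servers and pick the less loaded one.
def pick_power_of_two(pool: list[Server]) -> Server:
    a, b = random.sample(pool, 2)
    return a if a.active_connections <= b.active_connections else b

# Route a request and record the extra load on the chosen server.
chosen = pick_power_of_two(servers)
chosen.active_connections += 1
print(f"routing request to {chosen.name}")
```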

CAP Theorem

One of the hardest bottlenecks to address when scaling an application is the database. Your first course of action should be to scale your database vertically and to scale reads with read-only replicas.
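
As a toy illustration of the read-replica idea, the sketch below routes writes to a primary and spreads reads across replicas. The connection strings and the pick_connection helper are hypothetical; real applications usually get this behavior from their database driver, ORM, or a proxy.

```python
import random

PRIMARY = "postgres://primary.db.internal:5432/app"   # handles all writes
REPLICAS = [
    "postgres://replica-1.db.internal:5432/app",       # read-only copies of the primary
    "postgres://replica-2.db.internal:5432/app",
]

def pick_connection(is_write: bool) -> str:
    """Send writes to the primary and spread reads across the replicas."""
    if is_write:
        return PRIMARY
    return random.choice(REPLICAS)

print(pick_connection(is_write=True))   # always the primary
print(pick_connection(is_write=False))  # one of the replicas
```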

However, you’ll eventually reach a point where you need to use a distributed database in order to scale, such as Amazon’s DynamoDB or Apache Cassandra.

Using a distributed database introduces additional complexity. For example, we need a strategy for dealing with network issues where our database nodes may not be able to communicate with each other. This is called a network partition.

The tradeoffs between these strategies can be understood with the CAP Theorem. It describes the tradeoff your distributed database makes between Consistency and Availability in the face of a (network) Partition.

When a partition occurs, a database can either be CP (favoring consistency) or AP (favoring availability).
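
In practice, many distributed databases let you tune where you sit on this spectrum. As an example, Apache Cassandra lets you choose a consistency level per query. The sketch below (which assumes a reachable Cassandra cluster, a keyspace named shop with an orders table, and the DataStax cassandra-driver package) contrasts a QUORUM read, which leans toward consistency, with a ONE read, which leans toward availability.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])    # assumes a locally reachable Cassandra node
session = cluster.connect("shop")   # hypothetical keyspace

# Lean toward consistency: a majority of replicas must answer the read.
strong_read = SimpleStatement(
    "SELECT * FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)

# Lean toward availability: any single replica may answer, even during a partition.
fast_read = SimpleStatement(
    "SELECT * FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE,
)

row = session.execute(strong_read, ["order-123"]).one()
```

Requiring QUORUM gives stronger consistency but fails when a majority of replicas is unreachable, while ONE keeps serving requests during a partition at the risk of returning stale data.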

For more system design basics, check out this 10-minute video. It covers 20 system design concepts like HTTP, DNS, sharding, and more.