Introduction to Horizontal and Vertical Scaling

Imagine you run a website that solves mathematical equations. You notice that it slows down, or even stops responding, when it receives a large number of requests. This happens because your server, which runs on a machine with 1 GB of RAM and a 2-core CPU, can only handle a limited number of requests before it becomes overwhelmed.
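To make "a limited number of requests" concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every request is purely CPU-bound and that each core works on one request at a time. The 50 ms of CPU time per request is a hypothetical figure, not a measurement.

```python
def max_requests_per_second(cpu_cores: int, cpu_seconds_per_request: float) -> float:
    """Rough upper bound on throughput for a CPU-bound server.

    Assumes each core processes one request at a time and ignores memory,
    I/O waits, and network overhead.
    """
    return cpu_cores / cpu_seconds_per_request


# Hypothetical numbers: the 2-core server from the example, where solving
# one equation takes 50 ms of CPU time.
capacity = max_requests_per_second(cpu_cores=2, cpu_seconds_per_request=0.05)
print(f"about {capacity:.0f} requests/second before requests start to queue")
```

Real servers also run into memory, I/O, and network limits, so treat a figure like this as an optimistic ceiling rather than a guarantee.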

To fix this, you can scale your infrastructure. There are two approaches to scaling: vertical and horizontal.

Vertical scaling, also known as scaling up, means moving to a more powerful machine or upgrading your current one by adding resources such as RAM, CPU cores, storage, and networking hardware. However, vertical scaling has its limits. There is a physical ceiling on how large a single machine can be, and upgrading a machine often requires taking it offline. More importantly, that one machine is a single point of failure: if it goes down, your website is unavailable until the machine and server are back up and running.
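The physical ceiling on vertical scaling can be illustrated with a short Python sketch. Assuming CPU-bound requests, it estimates how many cores a target load needs and checks whether that fits on a single machine. The per-request cost and the 64-core maximum are hypothetical values chosen for illustration; real limits depend on the workload and the hardware or cloud provider.

```python
import math

# Hypothetical values for illustration only.
CPU_SECONDS_PER_REQUEST = 0.05  # assumed CPU time to solve one equation
LARGEST_MACHINE_CORES = 64      # assumed size of the biggest machine available


def cores_needed(target_rps: float, cpu_seconds_per_request: float) -> int:
    """Cores required to sustain target_rps if every request is CPU-bound."""
    return math.ceil(target_rps * cpu_seconds_per_request)


def fits_on_one_machine(target_rps: float) -> bool:
    """True if the largest assumed machine has enough cores for target_rps."""
    return cores_needed(target_rps, CPU_SECONDS_PER_REQUEST) <= LARGEST_MACHINE_CORES


for rps in (40, 1000, 2000):
    print(rps, "req/s ->", cores_needed(rps, CPU_SECONDS_PER_REQUEST), "cores,",
          "fits" if fits_on_one_machine(rps) else "does not fit")
```

With these numbers, 1,000 requests/second needs 50 cores and fits on one machine, while 2,000 requests/second needs 100 cores and does not. Beyond that point, scaling up alone cannot keep pace with demand.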

To overcome these limitations, you can use horizontal scaling, also known as scaling out. This involves creating a pool of machines with the same configuration and distributing incoming requests among them, typically through a load balancer. If one of the machines goes down, its share of the load is redistributed to the remaining machines, so no individual server is a single point of failure.
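The request-distribution idea can be sketched as a minimal round-robin load balancer in Python that skips servers marked as down. Round robin is only one common strategy (least-connections is another), and the server names and the mark_down/mark_up methods are hypothetical. Real load balancers usually detect failures with periodic health checks rather than manual marking.

```python
class RoundRobinBalancer:
    """Minimal round-robin load balancer that skips unhealthy servers.

    A sketch for illustration; server names and the health-marking
    methods are hypothetical.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # all servers start out healthy
        self._next = 0                    # index of the next server to try

    def mark_down(self, server):
        """Stop routing requests to a failed server."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Resume routing requests to a recovered server."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["server-a", "server-b", "server-c"])
print([lb.pick() for _ in range(3)])  # ['server-a', 'server-b', 'server-c']
lb.mark_down("server-b")
print([lb.pick() for _ in range(4)])  # ['server-a', 'server-c', 'server-a', 'server-c']
```

When server-b fails, its requests simply flow to server-a and server-c, which is the failover behavior described above. The balancer itself now becomes a component that needs redundancy.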

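In practice, it is common to combine horizontal and vertical scaling. Tools such as Kubernetes, AWS Auto Scaling groups (ASG), and AWS Elastic Load Balancing help automate this work: Kubernetes offers both Horizontal and Vertical Pod Autoscalers, an Auto Scaling group adds or removes instances as demand changes, and a load balancer spreads incoming traffic across those instances.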
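As an example of what these tools automate, the Kubernetes Horizontal Pod Autoscaler documents a ratio rule for choosing a replica count: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below applies that rule to CPU utilization and clamps the result between a minimum and maximum. It is a simplification that omits the real autoscaler's tolerance band and stabilization behavior, and the function and parameter names are illustrative.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_utilization: float,
                     target_cpu_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count for a simple utilization-based horizontal autoscaler.

    desired = ceil(current * current_metric / target_metric), clamped to
    [min_replicas, max_replicas]. Tolerance and stabilization are omitted.
    """
    raw = math.ceil(current_replicas * current_cpu_utilization / target_cpu_utilization)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 90, 60))   # 3 * 90 / 60 = 4.5 -> 5 (scale out)
print(desired_replicas(4, 20, 60))   # 4 * 20 / 60 = 1.33 -> 2 (scale in)
print(desired_replicas(8, 100, 50))  # 8 * 100 / 50 = 16 -> capped at 10
```

The cap matters: once the maximum is reached, adding capacity again means raising the limit or giving each replica more resources, which is where horizontal and vertical scaling meet.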