systems · March 21, 2026 · 3 min read

Load Balancers: Distributing Traffic Without Bottlenecks

What a load balancer actually does, when your system needs one, and the mistake engineers make when they assume adding one solves availability.


What Is a Load Balancer?

A load balancer sits in front of your servers and distributes incoming requests across them. That's it. One IP address, many servers behind it.

Without one, every request hits the same machine. That machine becomes your ceiling — in capacity and in availability. When it goes down, your service goes down.


The Problem It Solves

Single servers have hard limits. You can only add so much CPU and RAM before vertical scaling becomes impractical or cost-prohibitive. At some point, you need to add more servers — and a load balancer is what makes them look like one.

It solves two things:

  • Capacity — spread load across many servers so no one machine is saturated
  • Availability — if one server dies, the load balancer routes around it

Note

A load balancer doesn't eliminate failure. It routes around it. Your servers still need to be stateless and interchangeable for this to work.

How It Works

The load balancer receives every request and picks a server to forward it to. Common algorithms:

  • Round-robin — request 1 goes to server A, request 2 to B, request 3 to C, back to A. Simple and even.
  • Least connections — route to whichever server is handling the fewest active requests. Better for long-lived connections.
  • IP hash — same client always hits the same server. Used when you can't avoid sticky sessions.
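The three algorithms above boil down to simple selection functions. A minimal sketch (server names and connection counts are made up for illustration, not any particular balancer's implementation):

```python
import hashlib
from itertools import cycle

servers = ["app-a", "app-b", "app-c"]  # hypothetical backend pool

# Round-robin: cycle through the pool in order.
rr = cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active requests.
active = {"app-a": 12, "app-b": 3, "app-c": 7}  # example counters
def least_connections():
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip):
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

Note why IP hash is the sticky-session escape hatch: because the mapping is a pure function of the client address, no shared state is needed to keep a client pinned to one server.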

It also runs health checks: every few seconds it probes each server, and any server that stops responding is automatically removed from the pool.
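A health check can be as simple as a periodic HTTP probe. A sketch, assuming a conventional `/healthz` endpoint on each server (the path and timing are assumptions, not a standard):

```python
import urllib.request

def is_healthy(server, timeout=2.0):
    """Return True if the server answers its health endpoint in time."""
    try:
        with urllib.request.urlopen(f"http://{server}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # refused, timed out, unreachable: treat as down

def refresh_pool(all_servers, probe=is_healthy):
    """Keep only servers that pass the probe; run this every few seconds."""
    return [s for s in all_servers if probe(s)]
```

Real balancers add hysteresis (require N consecutive failures before eviction, and N consecutive successes before re-admission) so one slow response doesn't flap a server out of the pool.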


When to Add It

Add a load balancer when:

  • Your single server is consistently hitting CPU or memory limits
  • You want zero-downtime deploys (deploy to half the pool, shift traffic, deploy the other half)
  • You need more than one server for redundancy
  • Traffic is above ~2K sustained RPS on a single box
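The zero-downtime deploy from the list above is just pool manipulation: drain half, deploy, re-admit, repeat. A rough sketch (`deploy_to` is a placeholder for whatever actually ships the new version):

```python
def rolling_deploy(pool, deploy_to):
    """Deploy to half the pool at a time so live traffic always has servers.

    `pool` is the mutable list the balancer routes to; `deploy_to` is a
    placeholder for the real deploy step (not a real API).
    """
    half = len(pool) // 2
    first, second = pool[:half], pool[half:]

    for batch in (first, second):
        for server in batch:
            pool.remove(server)   # drain: balancer stops sending traffic
        for server in batch:
            deploy_to(server)     # ship the new version
        for server in batch:
            pool.append(server)   # re-admit to the pool
    return pool
```

The invariant to notice: at every step, at least half the pool is serving traffic, which is why deploys stop being an outage once a balancer is in front.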

Rule of thumb

If you have more than one app server, you need a load balancer. DNS round-robin is the only real alternative, and it can't do health checks or fast failover.

When NOT to Add It

  • Early stage, single server, <1K RPS — it's unnecessary complexity
  • When your bottleneck is the database, not the app server — adding more app servers behind a load balancer won't help if they all hammer the same DB

Common mistake

Adding a load balancer thinking it solves availability when the real single point of failure is the database. A load balancer routes around failed app servers. It does nothing for a failed database.

Real World

AWS ALB, Google Cloud Load Balancing, Nginx, and HAProxy are the common choices. Nginx is free and handles hundreds of thousands of requests per second on modest hardware. AWS ALB handles SSL termination, health checks, and path-based routing out of the box.
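For a concrete feel, a minimal Nginx setup for the behavior described in this article might look like this (upstream name and addresses are illustrative; open-source Nginx does passive health checks via `max_fails`/`fail_timeout` rather than active probes):

```nginx
upstream app_pool {
    least_conn;                                       # least-connections algorithm
    server 10.0.0.11:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=10s;
    server 10.0.0.13:8080 backup;                     # only used if the others are down
}

server {
    listen 80;
    location / {
        proxy_pass http://app_pool;
    }
}
```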

Netflix runs thousands of load balancers across regions. They treat them as commodity infrastructure — replaceable, stateless, instrumented.


Takeaways

  • A load balancer distributes traffic and routes around failed servers
  • It requires your app servers to be stateless — if they store local session state, round-robin breaks everything
  • Add one when you have >1 server or >2K RPS
  • It solves the app tier bottleneck — not the database bottleneck
  • Don't add one prematurely; it adds latency and a new component to operate