systems | March 21, 2026 | 4 min read

Database Read Replicas: Scaling Reads Without Sharding

How read replicas work, when they solve your database bottleneck, and the replication lag problem that bites teams who add them without thinking about consistency.


What Are Read Replicas?

A read replica is a copy of your primary database that stays synchronized with it and serves only read queries. Writes still go to the primary. Reads can go to either.

It's horizontal scaling for your database read tier — without the complexity of sharding.


The Problem It Solves

Most applications are read-heavy. A social feed, a product catalog, a dashboard — they might have 10x or 100x more reads than writes. When your primary database is handling all of them, it becomes the bottleneck. CPU spikes, query latency climbs, writes slow down because reads are fighting them for resources.

Adding more app servers behind a load balancer doesn't help if they all hit the same database.

Read replicas split the load: writes go to the primary, reads go to replicas. The primary handles less traffic, query latency drops, and you can scale the read tier horizontally by adding more replicas.

Rule of thumb

When your database's CPU is consistently above 70% and your read/write ratio is heavily read-skewed, read replicas are the right first move — before you consider sharding.

How It Works

The primary database records every change in a replication log: the write-ahead log (WAL) in PostgreSQL, the binary log (binlog) in MySQL. Replicas continuously stream this log and apply the changes to their own copy of the data.

This process is asynchronous by default. The primary doesn't wait for replicas to confirm before acknowledging a write to the application.

Write → Primary DB → WAL/binlog
                         ├→ Replica 1 (async)
                         └→ Replica 2 (async)
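The log-shipping flow described above can be sketched as a toy in-memory model. This is only an illustration, not how PostgreSQL or MySQL implement replication: the `Primary` and `Replica` classes, the list standing in for the WAL/binlog, and the `catch_up` method are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary: logs each write, then applies it locally."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stands in for the WAL/binlog

    def write(self, key, value):
        self.log.append((key, value))  # record the change in the log first
        self.data[key] = value
        # Return without waiting for any replica: asynchronous replication.
        return len(self.log)  # log position of this write


@dataclass
class Replica:
    """Toy replica: replays the primary's log from its last applied position."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def catch_up(self, primary, max_entries=None):
        end = len(primary.log)
        if max_entries is not None:
            end = min(end, self.applied + max_entries)
        for key, value in primary.log[self.applied:end]:
            self.data[key] = value
        self.applied = end

    def lag(self, primary):
        """How many log entries this replica is behind the primary."""
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()

primary.write("user:1:name", "Ada")
primary.write("user:1:name", "Ada L.")

print(replica.lag(primary))             # 2
print(replica.data.get("user:1:name"))  # None: nothing applied yet

replica.catch_up(primary)
print(replica.lag(primary))             # 0
print(replica.data.get("user:1:name"))  # Ada L.
```

The gap between the primary's log length and the replica's applied position is the toy equivalent of replication lag: the write has been acknowledged, but the replica has not replayed it yet.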

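Your application routes read queries to replicas and write queries to the primary. This routing can be handled by your ORM, a proxy that supports read/write splitting (such as Pgpool-II for PostgreSQL or ProxySQL for MySQL), or explicitly in application code. A plain connection pooler like PgBouncer does not split reads from writes on its own.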
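If you route explicitly in application code, the core decision can be as small as the sketch below. It is a minimal illustration under some assumptions: `ReadWriteRouter`, `target_for`, and the node names are hypothetical, and classifying statements by prefix is deliberately naive (it would misroute things like a `SELECT` that calls a function with side effects).

```python
import itertools


class ReadWriteRouter:
    """Sends reads round-robin across replicas and everything else to the primary."""

    READ_PREFIXES = ("SELECT", "SHOW")

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def target_for(self, sql, in_transaction=False):
        statement = sql.lstrip().upper()
        is_read = statement.startswith(self.READ_PREFIXES)
        # SELECT ... FOR UPDATE takes row locks, so it has to run on the primary.
        if is_read and "FOR UPDATE" in statement:
            is_read = False
        # Keep every statement of an open transaction on the primary.
        if in_transaction or not is_read or self._cycle is None:
            return self.primary
        return next(self._cycle)


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM posts"))                       # replica-1
print(router.target_for("SELECT * FROM posts"))                       # replica-2
print(router.target_for("UPDATE posts SET title = 'x' WHERE id = 1"))  # primary-db
```

In a real application the returned value would be a connection or pool rather than a name, but the routing decision itself stays the same.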


When to Add Them

Add read replicas when:

  • Your system is read-heavy, with reads making up more than 70% of queries
  • Your primary database CPU is a sustained bottleneck
  • You need a hot standby for failover — a replica can be promoted to primary if the primary fails
  • You're running analytics queries that would otherwise lock or slow down the primary

When to use

High-traffic feeds, product listing pages, dashboards, reporting queries — anything where many users read data that changes infrequently.

The Replication Lag Problem

Here's the catch: because replication is asynchronous, a replica can be behind the primary at any given moment. This delay is called replication lag.

In practice it's usually milliseconds. Under heavy write load it can be seconds or more.
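Because lag varies, one option is to make routing lag-aware: skip any replica whose measured lag exceeds a budget, and fall back to the primary if none qualifies. The sketch below assumes the lag numbers are collected elsewhere (on PostgreSQL, for example, by comparing `now()` with `pg_last_xact_replay_timestamp()` on the replica; on MySQL, from `SHOW REPLICA STATUS`). The function name, the one-second default, and the replica names are illustrative assumptions.

```python
import random


def pick_read_target(replica_lag_seconds, max_lag_seconds=1.0, primary="primary"):
    """Pick a replica whose measured lag is within the budget, else the primary.

    replica_lag_seconds maps replica name -> last measured lag in seconds,
    or None when the measurement failed.
    """
    fresh = [
        name
        for name, lag in replica_lag_seconds.items()
        if lag is not None and lag <= max_lag_seconds
    ]
    if not fresh:
        return primary  # every replica is too stale or unreachable
    return random.choice(fresh)


lags = {"replica-1": 0.02, "replica-2": 7.5, "replica-3": None}
print(pick_read_target(lags))                        # replica-1
print(pick_read_target(lags, max_lag_seconds=0.01))  # primary
```

Falling back to the primary keeps reads correct when replicas fall behind, at the cost of pushing load back onto the primary exactly when the system is already under write pressure, so the threshold deserves some thought.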

This means reads from replicas might return stale data. A user updates their profile, then immediately reads it back — they might get the old value if the read hit a replica that hasn't caught up yet.

Warning

Don't route reads that need to see the latest write to a replica. The pattern is: write to the primary, then serve that same user's follow-up reads from the primary. Background reads, feed queries, and other reads that can tolerate slightly stale data can go to replicas.

The guarantee this pattern preserves is called read-your-writes consistency: after a client writes, its own later reads always see that write. Asynchronous replicas break this by default.
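One common way to approximate read-your-writes is to pin a session to the primary for a short window after it writes. The sketch below assumes a hypothetical `SessionPinningRouter` with a two-second window. The window is a guess at an upper bound on lag, not a strict guarantee, and a production version would also need to expire old entries.

```python
import time


class SessionPinningRouter:
    """After a session writes, send its reads to the primary for a short window."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._last_write = {}  # session_id -> time of most recent write

    def record_write(self, session_id):
        self._last_write[session_id] = self.clock()

    def target_for_read(self, session_id):
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


now = [100.0]  # fake clock for the demonstration
router = SessionPinningRouter(pin_seconds=2.0, clock=lambda: now[0])

print(router.target_for_read("alice"))  # replica
router.record_write("alice")
print(router.target_for_read("alice"))  # primary
print(router.target_for_read("bob"))    # replica: bob did not write
now[0] += 5.0
print(router.target_for_read("alice"))  # replica: the window has passed
```

Only the session that wrote pays the cost of reading from the primary, so the rest of the read traffic still lands on replicas.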


When NOT to Add Them

  • Write-heavy systems: replicas barely help if 80% of your traffic is writes, since each replica must also apply every write
  • When strong read consistency is non-negotiable — financial transactions, inventory counts, anything where stale reads cause correctness issues
  • When the bottleneck is write throughput — replicas don't help with write scaling; that requires sharding or a different DB model
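A back-of-the-envelope calculation shows why write-heavy workloads gain so little. The model below is a deliberate simplification with made-up traffic numbers: it assumes a read and a write cost the same, that all reads move to replicas, and that every replica has to apply every write.

```python
def per_node_load(reads_per_sec, writes_per_sec, replicas):
    """Rough per-node load in operations/sec under a simple cost model.

    Assumes a read and a write cost the same, all reads move to replicas,
    and every replica must apply every write.
    """
    if replicas == 0:
        return {"primary": float(reads_per_sec + writes_per_sec), "each_replica": 0.0}
    return {
        "primary": float(writes_per_sec),
        "each_replica": reads_per_sec / replicas + writes_per_sec,
    }


# Read-heavy: 9,000 reads/s and 1,000 writes/s, two replicas.
print(per_node_load(9000, 1000, replicas=2))
# {'primary': 1000.0, 'each_replica': 5500.0}

# Write-heavy: 2,000 reads/s and 8,000 writes/s, two replicas.
print(per_node_load(2000, 8000, replicas=2))
# {'primary': 8000.0, 'each_replica': 9000.0}
```

With no replicas, a single node handles 10,000 operations per second in both cases. In the read-heavy case the busiest node drops to 5,500. In the write-heavy case it only drops to 9,000, because the writes are replayed everywhere.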

Common mistake

Routing all reads to replicas and then getting bug reports that users see stale data after updates. Always route post-write reads to the primary for the same session.

Real World

GitHub uses read replicas heavily — most of what you browse (repositories, issues, pull requests) is read from replicas. They route writes to primary and reads to their replica pool, with explicit primary reads for operations that need fresh data.

Instagram at scale used a combination of read replicas and sharding — replicas handled the read fan-out, sharding handled write throughput as they grew to billions of accounts.


Takeaways

  • Read replicas copy your primary DB and serve read queries — writes still go to primary
  • Use them for read-heavy systems where the DB is the bottleneck
  • They introduce replication lag — replicas are slightly behind primary
  • Route post-write reads back to the primary to avoid stale reads
  • Don't use them for write-heavy systems or when strong read consistency is required
  • Adding 1–2 replicas often buys significant headroom before you need sharding