Database Read Replicas: Scaling Reads Without Sharding
What Are Read Replicas?
A read replica is a copy of your primary database that stays synchronized with it and serves only read queries. Writes still go to the primary. Reads can go to either.
It's horizontal scaling for your database read tier — without the complexity of sharding.
The Problem It Solves
Most applications are read-heavy. A social feed, a product catalog, or a dashboard might serve 10x or 100x more reads than writes. When the primary database handles all of that traffic, it becomes the bottleneck: CPU spikes, query latency climbs, and writes slow down because reads compete with them for resources.
Adding more app servers behind a load balancer doesn't help if they all hit the same database.
Read replicas split the load: writes go to the primary, reads go to replicas. The primary handles less traffic, query latency drops, and you can scale the read tier horizontally by adding more replicas.
How It Works
The primary database writes changes to a replication log (called WAL in PostgreSQL, binlog in MySQL). Replicas continuously stream this log and apply the changes to their own copy of the data.
This process is asynchronous by default. The primary doesn't wait for replicas to confirm before acknowledging a write to the application.
    Write → Primary DB → WAL/binlog
                             ↓
                      Replica 1 (async)
                      Replica 2 (async)
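The flow above can be sketched as a toy in-memory model. This is only an illustration of asynchronous log shipping, not how PostgreSQL or MySQL implement it; the `Primary` and `Replica` classes and their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Holds the authoritative data plus an append-only change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # stands in for the WAL/binlog

    def write(self, key, value):
        # Apply locally, append to the log, and acknowledge immediately.
        # Nothing here waits for a replica: that is the asynchronous part.
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # log position of this write


@dataclass
class Replica:
    """Replays the primary's log at its own pace."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already replayed

    def catch_up(self, primary, max_entries=None):
        # Stream the entries not applied yet, in log order.
        pending = primary.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary):
        # Lag measured in log entries not yet applied.
        return len(primary.log) - self.applied


primary = Primary()
replica = Replica()

primary.write("user:1:name", "Ada")
replica.catch_up(primary)
primary.write("user:1:name", "Ada L.")  # acknowledged before the replica sees it

stale_value = replica.data["user:1:name"]  # replica still holds the old value
lag_before = replica.lag(primary)
replica.catch_up(primary)
fresh_value = replica.data["user:1:name"]
```

Between the second write and the second `catch_up` call, the replica serves the old value. That window is the replication lag.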
Your application routes read queries to replicas and write queries to the primary. This routing can be handled by your ORM, by a proxy that supports read/write splitting (such as Pgpool-II for PostgreSQL or ProxySQL for MySQL), or explicitly in application code.
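For the "explicitly in application code" option, a minimal sketch might look like the following. The `ReplicaRouter` class is hypothetical, the node names are placeholder strings rather than real connections, and the prefix-based read detection is a deliberate simplification. Anything it does not recognise as a read (including `WITH` queries) goes to the primary, which is the safe default.

```python
import itertools


class ReplicaRouter:
    """Routes statements: writes to the primary, reads round-robin to replicas."""

    # Statement prefixes treated as reads. Real SQL has more edge cases
    # (for example functions with side effects), so treat this as a sketch.
    READ_PREFIXES = ("select", "show")

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def is_read(self, sql):
        statement = sql.lstrip().lower()
        if "for update" in statement:
            return False  # takes row locks, so it must run on the primary
        return statement.startswith(self.READ_PREFIXES)

    def target_for(self, sql, in_transaction=False):
        # Keep everything inside a transaction on the primary so the
        # transaction can see its own writes.
        if in_transaction or not self.is_read(sql):
            return self.primary
        return next(self._replicas)


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

In a real application the strings would be connection pools, and the ORM or driver would call something like `target_for` before executing each statement.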
When to Add Them
Add read replicas when:
- Your system is read-heavy: more than about 70% of queries are reads
- Your primary database CPU is a sustained bottleneck
- You need a hot standby for failover — a replica can be promoted to primary if the primary fails
- You're running analytics queries that would otherwise lock or slow down the primary
The Replication Lag Problem
Here's the catch: because replication is asynchronous, a replica can be slightly behind the primary at any given moment. This gap is called replication lag.
In practice it's usually milliseconds. Under heavy write load it can be seconds or more.
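One way to put a number on lag in PostgreSQL is to compare log positions (LSNs), which are printed as two hexadecimal numbers separated by a slash. The sketch below assumes you have already fetched the primary's current position and the replica's replay position (for example via `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()`). The helper names and the threshold are made up for illustration, and MySQL exposes lag differently.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_replay_lsn: str) -> int:
    """Bytes of WAL the replica has not replayed yet (never negative)."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replica_replay_lsn))


def should_serve_reads(primary_lsn, replica_replay_lsn, max_lag_bytes=1_000_000):
    # max_lag_bytes is an arbitrary illustrative threshold, not a recommendation.
    return lag_bytes(primary_lsn, replica_replay_lsn) <= max_lag_bytes
```

A health check built on something like `should_serve_reads` could take a lagging replica out of the read pool until it catches up.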
This means reads from replicas might return stale data. A user updates their profile, then immediately reads it back — they might get the old value if the read hit a replica that hasn't caught up yet.
Warning: the guarantee that a read issued after a write sees that write is called read-your-writes consistency. Asynchronous replicas break this guarantee by default.
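A common mitigation is to send a user's reads to the primary for a short window after that user writes. The sketch below is one hedged way to do it: `StickyReadRouter`, the 5-second window, and the per-user dictionary are all assumptions made for illustration. The window only helps if it is longer than the actual replication lag.

```python
import time


class StickyReadRouter:
    """After a user writes, send that user's reads to the primary for a while."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed to exceed typical lag
        self._clock = clock             # injectable so it can be tested
        self._last_write = {}           # user_id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return "primary"  # a replica may not have this user's write yet
        return "replica"
```

Other users keep reading from replicas, so only the user who just wrote pays the cost of hitting the primary.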
When NOT to Add Them
- Write-heavy systems — replicas barely help if 80% of your traffic is writes
- When strong read consistency is non-negotiable — financial transactions, inventory counts, anything where stale reads cause correctness issues
- When the bottleneck is write throughput — replicas don't help with write scaling; that requires sharding or a different DB model
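A rough back-of-the-envelope model shows why replicas do little for write-heavy systems: every replica has to replay every write, so only the read share gets divided. The function below is a hypothetical sketch that assumes reads spread evenly across replicas and that replaying a write costs about as much as executing it, which is a simplification.

```python
def per_replica_load(total_qps, write_fraction, replicas):
    """Rough queries-per-second each replica absorbs.

    Assumes reads are spread evenly over replicas and that replaying a
    write costs about as much as executing it (a simplification).
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    writes = total_qps * write_fraction
    reads = total_qps - writes
    # Every replica applies all writes; only reads are divided.
    return writes + reads / replicas


read_heavy = per_replica_load(10_000, write_fraction=0.1, replicas=3)
write_heavy = per_replica_load(10_000, write_fraction=0.8, replicas=3)
```

Under these assumptions, three replicas cut per-node load to roughly 4,000 qps in the read-heavy case but leave it near 8,700 qps in the write-heavy one.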
Real World
GitHub uses read replicas heavily — most of what you browse (repositories, issues, pull requests) is read from replicas. They route writes to primary and reads to their replica pool, with explicit primary reads for operations that need fresh data.
Instagram at scale used a combination of read replicas and sharding — replicas handled the read fan-out, sharding handled write throughput as they grew to billions of accounts.
Takeaways
- Read replicas copy your primary DB and serve read queries — writes still go to primary
- Use them for read-heavy systems where the DB is the bottleneck
- They introduce replication lag: replicas can be slightly behind the primary
- Route post-write reads back to the primary to avoid stale reads
- Don't use them for write-heavy systems or when strong read consistency is required
- Adding 1–2 replicas often buys significant headroom before you need sharding