Message Queues: Async Without the Chaos
What Is a Message Queue?
A message queue is a persistent buffer between two services. One service (the producer) puts a message into the queue. Another service (the consumer) picks it up and processes it — on its own schedule, at its own pace.
The producer doesn't wait. It fires the message and moves on.
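That fire-and-forget shape can be sketched in-process with Python's standard `queue` module. This is a toy stand-in, not a real broker: in production the queue lives in a separate system (RabbitMQ, SQS, Kafka), and the message names here are illustrative.

```python
import queue
import threading

q = queue.Queue()
processed = []

def consumer():
    while True:
        msg = q.get()           # blocks until a message is available
        if msg is None:         # sentinel: shut the worker down
            q.task_done()
            break
        processed.append(msg)   # stand-in for slow processing
        q.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# The producer doesn't wait: put() returns as soon as the message is enqueued.
q.put({"type": "welcome_email", "to": "user@example.com"})
q.put(None)                     # signal shutdown
worker.join()
print(processed)                # [{'type': 'welcome_email', 'to': 'user@example.com'}]
```

The producer's `put()` returns immediately; all the slow work happens on the consumer's thread, on its own schedule.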
The Problem It Solves
Without a queue, services are tightly coupled. Service A calls Service B directly. If B is slow, A waits. If B is down, A fails. If B gets a burst of traffic it can't handle, requests back up and everything crashes.
A queue breaks that dependency:
- Decoupling — A doesn't know or care if B is fast, slow, or temporarily down
- Buffering — traffic spikes don't overwhelm the consumer; the queue absorbs them
- Retry logic — if processing fails, the message stays in the queue and gets retried
- Backpressure — consumers process at their natural pace without being overwhelmed
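The buffering point is easy to see in a toy sketch: a burst of writes lands in the queue instantly, and the consumer drains it at whatever pace it can sustain. The burst size here is arbitrary.

```python
import queue

q = queue.Queue()

# Producer: a burst of 1,000 writes lands instantly -- nothing downstream is hit yet.
for i in range(1_000):
    q.put({"job_id": i})

print(q.qsize())   # 1000 -- the spike is sitting safely in the buffer

# Consumer: drains at its own pace, one message at a time.
drained = 0
while not q.empty():
    q.get()
    drained += 1

print(drained)     # 1000 -- nothing lost, nothing overwhelmed
```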
How It Works
```
Producer → [Queue] → Consumer
    ↑                    ↓
fast path         slow processing
returns 200        happens async
```
The producer writes a message (a JSON payload, typically) to the queue. The queue persists it to disk — it won't lose the message if the consumer crashes. The consumer polls the queue, picks up the message, processes it, and acknowledges it. On acknowledgment, the queue deletes the message. If the consumer crashes before acknowledging, the message goes back to the queue and gets redelivered.
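The acknowledge-or-redeliver cycle above can be simulated in a few lines. Here a hypothetical `process()` fails on first delivery; because the message was never acknowledged, it goes back on the queue and the second delivery succeeds, giving at-least-once behavior.

```python
import queue

q = queue.Queue()
q.put({"id": 1, "body": "resize image"})

attempts = 0

def process(msg):
    global attempts
    attempts += 1
    if attempts == 1:
        raise RuntimeError("consumer crashed mid-processing")

while not q.empty():
    msg = q.get()
    try:
        process(msg)
        # Success: this is where the consumer would ack, and the
        # broker would delete the message for good.
    except RuntimeError:
        q.put(msg)   # no ack -> the message is redelivered

print(attempts)      # 2 -- first delivery failed, redelivery succeeded
```

Note the flip side: at-least-once delivery means the consumer can see the same message twice, so processing should be idempotent.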
Dead-letter queues (DLQ): After N failed retries, the message moves to a DLQ — a separate queue for messages that couldn't be processed. This lets you investigate failures without losing data.
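A DLQ can be sketched by tracking a retry count per message and parking the message once the retry budget is spent. The `MAX_RETRIES` value and the always-failing `process()` are illustrative.

```python
import queue

MAX_RETRIES = 3  # illustrative retry budget

main_q = queue.Queue()
dlq = queue.Queue()

main_q.put({"body": "corrupt payload", "retries": 0})

def process(msg):
    raise ValueError("can't parse payload")   # this message always fails

while not main_q.empty():
    msg = main_q.get()
    try:
        process(msg)
    except ValueError:
        msg["retries"] += 1
        if msg["retries"] >= MAX_RETRIES:
            dlq.put(msg)          # park it for investigation
        else:
            main_q.put(msg)       # put it back for another attempt

print(dlq.qsize())                # 1 -- the bad message is preserved, not lost
```

Without the DLQ branch, this loop would retry the poison message forever; with it, the message is set aside and the data survives for debugging.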
When to Add It
Add a message queue when:
- Processing is async by nature — emails, notifications, image processing, report generation
- You have write-heavy bursts — a queue absorbs the spike so downstream services don't get crushed
- Two services need to communicate but shouldn't be directly dependent on each other
- You need at-least-once delivery with retry guarantees
When NOT to Add It
- The user needs the result immediately — you can't put a login check in a queue
- Simple CRUD operations with no downstream processing
- Read-heavy systems — queues are for write/processing flows
- When you're adding it for "future scale" with no current need — this is premature complexity
Choosing a Queue
| System | Best for |
|---|---|
| Kafka | High-throughput event streaming, audit logs, billions of messages/day |
| RabbitMQ | Task queues, complex routing, smaller scale |
| AWS SQS | Managed, simple, great if you're already on AWS |
| Redis Streams | Lightweight, if Redis is already in your stack |
Kafka is often overkill. Unless you're doing event sourcing or need message replay across multiple consumers at massive scale, SQS or RabbitMQ is simpler to run.
Real World
Uber uses Kafka for everything that happens after you book a ride — notifications, driver dispatch events, analytics. The booking request itself is synchronous (you need to know it worked), but everything downstream is a message.
WhatsApp uses queues heavily for message delivery. Your message goes into a queue for the recipient's device. If they're offline, it stays there until they reconnect.
Takeaways
- A message queue decouples producers and consumers — the producer fires and forgets
- Use it for async processing: emails, notifications, image resizing, background jobs
- Queues absorb traffic spikes and provide retry guarantees
- Don't use them when the user needs an immediate response
- Kafka for high-throughput streaming; SQS/RabbitMQ for task queues
- Always implement dead-letter queues for failed message handling