Circuit Breakers: Stopping Cascading Failures Before They Spread
What Is a Circuit Breaker?
A circuit breaker is a pattern that stops calls to a failing service before they pile up and take your whole system down.
The pattern is named after the electrical component. When too much current flows through an electrical circuit, the breaker trips and cuts the connection, protecting everything downstream. The same idea applies in software: when a dependency is failing, stop calling it. Fail fast instead of waiting.
The Problem It Solves
Imagine Service A calls Service B. Service B starts responding slowly: 30 seconds instead of 200ms. Service A's threads are now stuck waiting. New requests come in and tie up more threads waiting on B. The thread pool is exhausted. Service A stops responding to its own callers, and their requests pile up too. Now the entire system is down because one dependency got slow.
This is a cascading failure. One slow service takes down everything upstream.
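To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. It assumes each in-flight request holds one thread for its full duration. The pool size of 200 is a made-up figure for illustration, not something from a real system.

```python
def max_throughput(pool_size: int, latency_seconds: float) -> float:
    """Upper bound on requests/second when every request occupies
    one thread until the downstream call returns."""
    return pool_size / latency_seconds


POOL_SIZE = 200  # assumed thread pool size (hypothetical)

healthy = max_throughput(POOL_SIZE, 0.2)    # Service B answers in 200 ms
degraded = max_throughput(POOL_SIZE, 30.0)  # Service B answers in 30 s

print(f"healthy:  {healthy:.0f} req/s")
print(f"degraded: {degraded:.1f} req/s")
```

Under these assumptions, Service A drops from roughly a thousand requests per second to fewer than seven. Any traffic above that has nowhere to go except a queue.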
A circuit breaker short-circuits this pattern. Instead of waiting 30 seconds for B to respond, Service A detects the failure pattern and immediately returns an error (or a fallback) — in milliseconds. No waiting. No thread exhaustion. No cascade.
How It Works
The circuit breaker has three states:
Closed (normal operation): requests pass through to the downstream service. The breaker monitors success/failure rates.
Open (failing): too many failures occurred, so the breaker trips open. All requests immediately return an error without calling the downstream service.
Half-open (recovery check): once the open timeout expires, the breaker allows one test request through (some implementations allow a small, configurable number). If it succeeds, the breaker closes again. If it fails, the breaker opens again and the timeout restarts.
State transitions:
- Closed → (failures exceed threshold) → Open
- Open → (timeout expires) → Half-open
- Half-open → (test succeeds) → Closed
- Half-open → (test fails) → Open
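The three states and four transitions can be sketched as a small Python class. This is a minimal, single-threaded illustration, not a production implementation. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `recovery_timeout` are hypothetical, the default values are illustrative, and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to stay open
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                # Fail fast: the downstream service is never contacted.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout expired: let this one call through as the test request.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success resets the count; a half-open success closes the breaker.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed test request reopens immediately; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each outbound call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function makes the network request. A real implementation would also need locking so that concurrent callers cannot all slip through as "the" test request while the breaker is half-open.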
Common thresholds: trip open after 5 consecutive failures, or when the failure rate reaches 50% within a 10-second window. Typical wait before moving to half-open: 30–60 seconds.
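The failure-rate variant needs a sliding window of recent outcomes rather than a simple counter. The sketch below shows one way to track it. `FailureRateWindow` is a hypothetical name, and `min_calls` is an added assumption (not from the thresholds above) that stops a single failed call out of one from counting as a 100% failure rate.

```python
import time
from collections import deque


class FailureRateWindow:
    def __init__(self, window_seconds=10.0, rate_threshold=0.5,
                 min_calls=10, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.rate_threshold = rate_threshold
        self.min_calls = min_calls  # assumed guard against tiny samples
        self.clock = clock
        self.events = deque()       # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded: bool) -> None:
        self.events.append((self.clock(), succeeded))
        self._evict()

    def _evict(self) -> None:
        # Drop outcomes that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def should_trip(self) -> bool:
        self._evict()
        total = len(self.events)
        if total < self.min_calls:
            return False
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / total >= self.rate_threshold
```

A breaker would call `record(...)` after every request while closed and trip open as soon as `should_trip()` returns true.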
When to Add It
Add a circuit breaker when:
- Your service has external dependencies — third-party APIs, other internal services, databases
- High availability is required (99.9%+ uptime matters)
- You're in a microservices architecture — each service call is a potential failure point
- A slow dependency would otherwise exhaust your thread pool or connection pool
When NOT to Add It
- Simple monolithic applications — fewer service boundaries, less need
- In-process function calls — circuit breakers are for network calls
- When you're at an early stage and the operational overhead isn't justified yet
Fallback Behavior — the Part That Matters
A circuit breaker without a fallback strategy just converts a slow failure into a fast failure. That's better, but incomplete.
Good fallback patterns:
- Cached response — return the last known good response
- Degraded response — return an empty result or default value
- Queueing — buffer the request and retry when the circuit closes
- User messaging — show the user a "this feature is temporarily unavailable" message
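The first two patterns in this list (cached response, then degraded response) combine naturally. Here is a minimal sketch, assuming the live call raises an exception when it fails or when the breaker rejects it. `call_with_fallback` and its parameters are hypothetical names, and the plain dict stands in for whatever cache the service really uses.

```python
from typing import Any, Callable, Dict, Hashable


def call_with_fallback(fetch: Callable[[Hashable], Any],
                       cache: Dict[Hashable, Any],
                       key: Hashable,
                       default: Any) -> Any:
    """Try the live call; on failure return the last known good
    response, or a default value if nothing was ever cached."""
    try:
        value = fetch(key)
    except Exception:
        # A breaker rejecting the call while open ends up here too,
        # alongside genuine downstream errors.
        return cache.get(key, default)
    cache[key] = value  # remember the last known good response
    return value
```

For a recommendations feature, `default` might be an empty list, so the page renders without that section instead of failing outright. Cached data can be stale, so this only suits responses where slightly old data is acceptable.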
Real World
Netflix built Hystrix specifically for this problem — they had hundreds of microservices and a single slow dependency was taking down unrelated features. Hystrix let them isolate failures: if the recommendations service was slow, it didn't affect video playback.
Netflix's rule: every outbound call in their stack runs through a circuit breaker with a defined fallback.
Takeaways
- Circuit breakers detect failing dependencies and short-circuit calls to them — fast failure instead of slow failure
- Three states: closed (normal) → open (failing) → half-open (recovery check)
- They prevent cascading failures in distributed systems
- They require a defined fallback: decide in advance what happens when the circuit is open
- Essential in microservices; overkill in simple monoliths
- Resilience4j (Java) and Polly (.NET) are common implementations; Netflix's Hystrix popularized the pattern but has been in maintenance mode since 2018