![Consistency vs Availability cover image](https://bugfree-s3.s3.amazonaws.com/mermaid_diagrams/image_1766427354319.png "Consistency vs Availability"){:width="700px"}

Consistency vs Availability — Choose Like an Engineer

In distributed systems interviews you'll often be asked to pick between Consistency and Availability under the constraints of the CAP theorem. Here's a practical, interview-ready approach to make that decision and explain it clearly.

Quick definitions

Consistency (C): Every read returns the most recent write. Strong consistency implies linearizability.
Availability (A): Every request receives a response (non-error) — even if the returned data is stale.
Partition Tolerance (P): The system continues to operate despite network partitions. Since partitions happen in real systems, you effectively must trade off C vs A when partitions occur.

How to decide (step-by-step)

Clarify product requirements
- Ask what must be correct on every request and what can tolerate staleness.
- Questions to ask: Is money involved? Can users see temporarily inconsistent views? What are SLAs (latency, uptime)?
Map core use cases to correctness needs
- Strong consistency required: banking transfers, inventory reservation at checkout, billing, authentication state.
- High availability preferred: social feeds, analytics dashboards, recommendation systems, news feeds.
State the cost of choosing each side
- Choosing Consistency (CP): During partitions you may reject or timeout some requests (reduced availability). Users may see errors or delays, but data stays correct.
- Choosing Availability (AP): During partitions you continue to serve requests (high availability) but replicas can diverge and some reads may be stale.
Propose mechanisms and concrete implementation options
- For consistency (CP):
  - Use consensus algorithms (Raft, Paxos) or primary/leader sync replication.
  - Synchronous replication or quorum reads/writes to ensure latest data.
  - Strong guarantees like linearizability where needed.
- For availability (AP):
  - Use asynchronous replication, eventual consistency, caching layers.
  - Conflict resolution strategies (last-write-wins, application-level merges, CRDTs).
  - Read-repair, background reconciliation, and TTL-driven consistency.
- Middle-ground / flexible approaches:
  - Tunable consistency (e.g., Cassandra's read/write consistency levels) so you can pick per-operation guarantees.
  - Hybrid designs: strong consistency for critical operations, eventual consistency for non-critical reads.
Be ready to iterate
- Propose monitoring, metrics (latency, error rates, staleness), and rollback/mitigation strategies.
- Discuss how you'd adjust consistency levels or fallback behaviors when you observe real-world failures.

Interview checklist (short script)

Clarify requirements: "Which operations must be correct immediately? Which can be eventual?"
Pick default: justify CP or AP with 1–2 crisp examples.
Explain costs: what fails under partition and how the system responds.
Give concrete tech choices: Raft/Paxos or async replication + CRDTs/caching.
Offer hybrid or tunable solutions and how you'd monitor them.

Concrete examples

Banking transfer: CP. Use leader-based consensus, synchronous commit, and strict ordering to avoid double-spends. Expect some availability loss during partitions.
Product inventory for checkout: CP for the final reservation step; AP for browsing and catalog reads.
Social feed: AP. Serve rapidly from caches or replicas; use eventual reconciliation and versioning.

Additional considerations

Client-side strategies: idempotent operations, retries with backoff, and read-your-writes guarantees where necessary.
Observability: track staleness windows, divergence rates, and error vs success ratios.
Business impact: measure user tolerance for stale data vs downtime — use that to justify design choices.

TL;DR (what to say in interviews)

"Because partitions happen, we must pick C or A for the affected operations. Let's clarify which operations require immediate correctness. If it's money or inventory, favor consistency (CP) and accept reduced availability during partitions. If it's user-facing feeds or analytics, favor availability (AP) and reconcile divergence later. I'll propose concrete mechanisms (Raft/Paxos or eventual consistency + CRDTs), explain failure modes, and offer hybrid/tunable options and monitoring to iterate after launch."

#SystemDesign #CAPTheorem #DistributedSystems #InterviewPrep

Consistency vs Availability in System Design Interviews: Choose Like an Engineer

Consistency vs Availability — Choose Like an Engineer

Quick definitions

How to decide (step-by-step)

Interview checklist (short script)

Concrete examples

Additional considerations

TL;DR (what to say in interviews)

Comments

More from this blog

High-Score Amazon Data Scientist Interview Experience (Bugfree Users): What to Expect & How to Prepare

High-Score Amazon Data Scientist Interview Experience (Bugfree Users): What to Expect & How to Prepare

Stop Guessing in System Design Interviews: Use These 8 Resources

Stop Guessing in System Design Interviews: 8 Essential Resources

Hospital System OOD: Stop Modeling IDs—Model Relationships

Command Palette

Consistency vs Availability — Choose Like an Engineer

Quick definitions

How to decide (step-by-step)

Interview checklist (short script)

Concrete examples

Additional considerations

TL;DR (what to say in interviews)

Comments

More from this blog