Database Partitioning

Database Partitioning in System Design: Horizontal vs Vertical

Partitioning splits a large database into smaller, more manageable pieces to improve performance, scalability, and maintainability. In interviews you’ll often be asked when to partition and which strategy to choose. The short, memorable rule: pick based on access patterns and growth.

Below is a concise guide to the two primary partitioning strategies, their trade-offs, common patterns, pitfalls, and what to say in an interview.

What is Horizontal Partitioning (Sharding)?

Horizontal partitioning, or sharding, splits a table’s rows across multiple database instances. Each shard holds a subset of rows determined by a shard key (e.g., user_id, account_id, or a time range).

Pros:

Queries touch smaller datasets → lower latency for shard-local queries
Better write/read throughput because load is distributed
Fault isolation: one shard can fail without taking down all data
Easier to scale out by adding shards

Cons:

Cross-shard joins and transactions are harder or more expensive
Hot shards (uneven key distribution) lead to imbalanced load
Re-sharding (reshuffling data) can be operationally complex
Increased infrastructure and operational complexity (routing, metadata)
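
The cross-shard cost can be made concrete with a toy model. The sketch below is illustrative only: each dict stands in for one database instance, and the names (shards, shard_for, users_in_country) are hypothetical. A lookup by shard key touches one shard, while a query on any other column has to visit every shard and merge the results.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count for illustration
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one database


def shard_for(user_id: int) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put_user(user_id: int, row: dict) -> None:
    shards[shard_for(user_id)][user_id] = row


def get_user(user_id: int):
    """Shard-local read: exactly one shard is consulted."""
    return shards[shard_for(user_id)].get(user_id)


def users_in_country(country: str) -> list:
    """Not keyed on user_id, so every shard is queried and results are merged."""
    matches = []
    for shard in shards:
        matches.extend(uid for uid, row in shard.items() if row["country"] == country)
    return sorted(matches)


for uid, country in [(1, "DE"), (2, "US"), (3, "DE"), (4, "JP")]:
    put_user(uid, {"country": country})
```

In a real deployment the loop in users_in_country becomes a fan-out of network calls, which is why such queries are usually served from a denormalized copy or a secondary index instead.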

Common shard strategies:

Hash-based: evenly distributes keys (good default)
Range-based: good for time-series and range queries
Directory-based: an explicit lookup table maps keys to shards, for distributions that hash or range rules cannot express

Example: Shard users by user_id hash, or shard time-series by month/year ranges.
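
The three strategies can be sketched as small routing functions. This is a minimal illustration rather than a production router: the shard count, the date boundaries, and the tenant directory are all assumed values chosen for the example.

```python
import bisect
import hashlib
from datetime import date

NUM_SHARDS = 4  # assumed shard count for illustration


def hash_shard(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: a stable hash of the key, modulo the shard count."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: shard i holds rows dated before RANGE_UPPER_BOUNDS[i] (exclusive);
# the final shard is open-ended and takes everything after the last bound.
RANGE_UPPER_BOUNDS = [date(2024, 1, 1), date(2024, 7, 1), date(2025, 1, 1)]


def range_shard(day: date) -> int:
    """Range-based: binary-search the sorted boundaries."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, day)


# Directory-based: an explicit, hypothetical tenant-to-shard mapping.
DIRECTORY = {"tenant_a": 0, "tenant_b": 2}


def directory_shard(tenant_id: str) -> int:
    """Directory-based: look the key up; unknown tenants raise KeyError."""
    return DIRECTORY[tenant_id]
```

A stable hash (hashlib here) is used instead of Python's built-in hash(), whose value for strings changes between processes and would route the same key to different shards after a restart.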

What is Vertical Partitioning?

Vertical partitioning splits a table by columns. You separate frequently accessed (“hot”) columns from rarely used (“cold”) ones, or isolate sensitive fields into their own table.

Pros:

Fewer bytes read per query → improved I/O and cache efficiency
Better performance for narrow, common queries on wide tables
Easier to apply different storage/backup policies (hot vs cold)
Isolation of sensitive data (e.g., move PII to a hardened table)

Cons:

Requires joins across vertical partitions for full-row reads
More complex schema with additional foreign keys
May need additional care for transactional consistency across partitions

Common vertical patterns:

Hot/cold split: keep frequently read columns in the main table; move rarely read columns to a side table
Functional split: profile data vs authentication data vs analytics columns
Large binary data (images, blobs) moved to object storage or separate tables

Example: Move profile_photos and large JSON blobs to a separate table or blob store; keep core user fields (id, name, email) in the main table.
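
A minimal sketch of that split, using SQLite from the Python standard library so it runs anywhere. The table and column names (users_core, users_profile, profile_json, photo) are assumptions made for the example, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_core (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
);
-- Wide, rarely read columns live in a side table keyed by the same id.
CREATE TABLE users_profile (
    user_id      INTEGER PRIMARY KEY REFERENCES users_core(id),
    profile_json TEXT,
    photo        BLOB
);
""")

# Write both halves in one transaction so a user never exists half-created.
with conn:
    conn.execute("INSERT INTO users_core VALUES (?, ?, ?)",
                 (1, "Ada", "ada@example.com"))
    conn.execute("INSERT INTO users_profile VALUES (?, ?, ?)",
                 (1, '{"bio": "hello"}', b"\x00\xff"))


def get_core(user_id: int):
    """Common narrow query: reads only the small table."""
    return conn.execute(
        "SELECT id, name, email FROM users_core WHERE id = ?", (user_id,)
    ).fetchone()


def get_full(user_id: int):
    """Full-row read: pays for a join across the two partitions."""
    return conn.execute(
        "SELECT c.id, c.name, c.email, p.profile_json, p.photo "
        "FROM users_core c LEFT JOIN users_profile p ON p.user_id = c.id "
        "WHERE c.id = ?", (user_id,)
    ).fetchone()
```

The LEFT JOIN keeps get_full working for users that have no profile row yet, which is a common state right after a split is introduced.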

Decision Checklist (what to say in interviews)

Identify access patterns: Are reads/writes focused on subsets of rows (good for sharding) or on a few columns of many rows (good for vertical)?
Estimate growth: Will data volume and QPS grow mainly through more rows (more users, more events) or through wider rows (more columns per record)?
Consider transaction and join requirements: Do you need strong cross-row transactions or complex joins? If yes, sharding can complicate things.
Look for hot keys: If a handful of keys will attract a disproportionate share of traffic, rethink the shard key or absorb the load with caching or load leveling.
Plan for operational complexity: re-sharding, backups, replication, and monitoring cost more with sharding.
Security and compliance: vertical partitioning can simplify isolating and auditing sensitive fields.

When answering: state the access pattern, justify the chosen partitioning, acknowledge trade-offs, and mention mitigation strategies (caching, consistent hashing, read replicas, alternatives to two-phase commit, async rebalancing).
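
The hot-key check in the list above can be tried on a sample of traffic before committing to a shard key. The sketch below is one way to do it, under stated assumptions: access_log is a hypothetical list of keys taken from request logs, and skew_ratio is an illustrative metric (the busiest shard's load divided by the mean), not a standard one.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards: int) -> int:
    """Stable hash of the candidate shard key, modulo the shard count."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_by_shard(access_log, num_shards: int) -> list:
    """Count how many sampled requests would land on each shard."""
    counts = Counter(shard_for(key, num_shards) for key in access_log)
    return [counts.get(s, 0) for s in range(num_shards)]


def skew_ratio(loads) -> float:
    """Busiest shard's load divided by the mean; 1.0 means perfectly even."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical sample: one very popular key plus a long tail of ordinary users.
access_log = ["celebrity"] * 900 + [f"user{i}" for i in range(100)]
loads = load_by_shard(access_log, num_shards=4)
```

A ratio far above 1.0 on real samples suggests that hashing alone will not spread the load, because every request for the hot key still lands on a single shard. That is the case where caching or splitting the hot key is worth raising.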

Common Pitfalls and Mitigations

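Imbalanced shards: Choose a higher-cardinality, evenly accessed shard key; use consistent hashing (with virtual nodes) to even out placement and limit data movement when shards are added; implement rebalancing tools.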
Cross-shard joins: Denormalize or maintain secondary indexes to avoid expensive joins.
Operational overhead: Automate monitoring, routing, and resharding; use managed sharding if available.
Data locality: Design shard keys to keep related data together when possible (e.g., by account or region).
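
Consistent hashing, mentioned above as a mitigation, can be sketched in a few lines. This is a simplified ring with virtual nodes, written for illustration: the class name HashRing, the shard names, and the vnodes count are assumptions, and a production ring would also handle shard removal and replication.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        """Place `vnodes` virtual nodes for the shard around the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{shard}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = shard

    def shard_for(self, key) -> str:
        """Walk clockwise to the first virtual node after the key's position."""
        idx = bisect.bisect_right(self._points, _hash(str(key))) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["s0", "s1", "s2"])
before = {k: ring.shard_for(k) for k in range(1000)}
ring.add_shard("s3")
after = {k: ring.shard_for(k) for k in range(1000)}
moved = [k for k in before if before[k] != after[k]]
```

Every key in moved ends up on the new shard and nothing is shuffled between the existing ones, so only a fraction of the data has to be copied. With plain hash-modulo-N routing, changing N remaps most keys.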

Quick Interview Examples to Mention

Time-series logs: shard by date ranges (range-based sharding), and roll up or archive older ranges to cheaper cold storage.
Multi-tenant app: shard by tenant_id to isolate tenant load and enable per-tenant scaling.
Wide user table: vertical-split large profile blob and photos into separate tables or object storage to reduce I/O on common queries.

Final Rule of Thumb

Choose partitioning based on access patterns and growth: shard (horizontal) when rows and traffic scale out; split columns (vertical) when tables are wide and you need to read fewer bytes or isolate sensitive/hot data.

Good interview answer: explain the dominant access pattern, pick a partitioning strategy, list trade-offs, and propose concrete mitigations for common issues.

#SystemDesign #Database #SoftwareEngineering
