System Design Interview: How to Build Low‑Latency Live Streaming (What Actually Matters)

[Diagram: low‑latency live streaming pipeline]

Low latency in live streaming means minimizing the delay from camera capture to viewer playback — often called glass‑to‑glass latency. In interviews you should (1) explain where latency is introduced, (2) show the levers you can pull to reduce it, and (3) describe measurable tradeoffs.

Below is a concise, interview‑friendly approach that highlights the things that actually matter.

Where latency is introduced

Typical glass‑to‑glass pipeline and common sources of delay:

  • Capture & encode: camera capture, frame batching, and encoder latency (lookahead, B‑frames, per‑frame encode time on CPU/GPU).
  • Transport from source to origin: network RTT, packet loss, retransmissions.
  • Ingest & server processing: transmuxing, packaging, origin processing (e.g., segmenting for HLS/DASH).
  • CDN / edge propagation: cache warm‑up, round trips to the edge.
  • Player/network buffers & decoding: jitter buffer, rebuffering, decoder latency, client buffering strategy.

If asked, sketch this pipeline quickly and attach approximate relative delays. Then say: “We reduce latency by targeting the biggest contributors first.”
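
To make this concrete, a rough latency budget works well on a whiteboard. The stage numbers in this Python sketch are illustrative assumptions for a segment‑based pipeline, not measurements:

```python
# Rough glass-to-glass latency budget for a segment-based pipeline.
# All numbers are illustrative assumptions, not measurements.
budget_ms = {
    "capture_and_encode": 150,  # frame batching + encoder lookahead
    "contribution_link": 80,    # source -> origin RTT and retransmits
    "ingest_processing": 200,   # transmux + packaging at the origin
    "cdn_propagation": 100,     # edge cache fill, extra round trips
    "player_buffering": 2000,   # jitter buffer + startup buffer
}

total = sum(budget_ms.values())
for stage, ms in sorted(budget_ms.items(), key=lambda kv: -kv[1]):
    print(f"{stage:20s} {ms:5d} ms  ({ms / total:.0%})")
print(f"{'total':20s} {total:5d} ms")
```

In a budget like this the player buffer dominates, which is why protocol and segment choices (levers 2 and 4 below) often buy more than encoder tuning alone.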

The 4 levers (what to focus on)

1) Encode fast

  • Choose efficient codecs and low‑latency presets: H.264/H.265 are common; AV1 may be useful but has higher encode latency today.
  • Use low‑latency encoder settings: disable heavy lookahead, use faster presets, tune GOP (smaller GOP = fewer frames between keyframes → faster recovery and lower segment durations).
  • Reduce frame batching and tune encoder bitrate buffers (CBR vs VBR tradeoffs).
  • Consider hardware encoders (NVENC, QuickSync) for lower encode latency at scale.

Practical tips: set keyframe interval to match your segmenting strategy (e.g., 1s keyframes for sub‑second segments), and pick encoder presets that favour speed over compression when latency is critical.
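
As one concrete illustration (not the only valid setup), here is how those settings map to a libx264 encode launched from Python via ffmpeg. The input, bitrate, and ingest URL are hypothetical; the flags are standard ffmpeg/x264 options:

```python
import subprocess

# Low-latency x264 encode sketch: "zerolatency" disables lookahead and
# B-frame buffering; a fixed 30-frame GOP gives 1 s keyframes at 30 fps;
# sc_threshold=0 stops scene-cut keyframes from breaking segment alignment.
cmd = [
    "ffmpeg",
    "-i", "input.sdp",                # hypothetical live source
    "-c:v", "libx264",
    "-preset", "veryfast",            # favour speed over compression
    "-tune", "zerolatency",
    "-g", "30", "-keyint_min", "30",  # keyframe interval matches 1 s segments
    "-sc_threshold", "0",
    "-b:v", "3000k",                  # CBR-ish rate control with a small
    "-maxrate", "3000k",              # VBV buffer to cap encoder-side delay
    "-bufsize", "1500k",
    "-f", "flv", "rtmp://ingest.example.com/live/stream",  # hypothetical ingest
]
subprocess.run(cmd, check=True)
```

Note the VBV math: bufsize/maxrate = 1500/3000 ≈ 0.5 s of encoder‑side buffering; shrink bufsize further if the latency budget demands it, at some cost in quality stability.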

2) Pick the right protocol

  • RTMP: the long‑standing ingest protocol; low overhead, but TCP‑based and not natively playable in modern browsers.
  • WebRTC: built for real‑time, sub‑second latency, adaptive, and uses RTP/UDP with congestion control. Best for ultra‑low latency and interactive use cases.
  • SRT / RIST: strong for contribution links (reliability over lossy networks) with a configurable, bounded latency window; UDP‑based with ARQ/FEC.
  • Low‑latency HLS (LL‑HLS) / chunked CMAF / DASH: shrink segment durations and use chunked transfer to lower latency on existing CDN stacks, but typically land in the one‑to‑a‑few‑seconds range rather than sub‑second.

Tradeoffs: WebRTC gives the lowest latency but is more complex to scale through traditional CDNs; chunked‑CMAF or LL‑HLS trades slightly higher latency for easier CDN compatibility and wider device support.
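
One compact way to voice that tradeoff is as a decision function keyed on the latency target. The thresholds below are rough rules of thumb, not hard specs:

```python
def pick_delivery_protocol(target_latency_s: float, interactive: bool) -> str:
    """Rule-of-thumb protocol choice by glass-to-glass latency target."""
    if interactive or target_latency_s < 1.0:
        # Sub-second or interactive: WebRTC, accepting harder CDN scaling.
        return "WebRTC"
    if target_latency_s < 5.0:
        # A few seconds, with standard CDN stacks and broad device support.
        return "LL-HLS / chunked CMAF"
    return "HLS/DASH"  # standard segmented delivery

# Contribution (camera -> origin) is a separate choice: SRT or RIST on lossy links.
print(pick_delivery_protocol(0.5, interactive=True))   # WebRTC
print(pick_delivery_protocol(3.0, interactive=False))  # LL-HLS / chunked CMAF
```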

3) Use a low‑latency CDN and edge packaging

  • Push processing to the edge: do packaging/transmuxing close to viewers to avoid extra round trips to origin.
  • Use CDNs that support WebRTC or low‑latency HLS/CMAF (some CDNs provide edge WebRTC or chunked delivery).
  • Avoid origin round trips per segment—use edge caching and prefetching to reduce RTT.
  • Consider push vs pull strategies: pushing streams to many POPs reduces first‑viewer startup latency.
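
To verify the edge is actually saving round trips, time the same segment fetch against the edge and the origin. A minimal sketch with hypothetical URLs; a real measurement would sample many requests and compare percentiles:

```python
import time
import urllib.request

def fetch_ms(url: str) -> float:
    """Time one GET end to end; crude, but enough for an edge/origin comparison."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return (time.monotonic() - start) * 1000.0

# Hypothetical URLs pointing at the same media segment.
edge_ms = fetch_ms("https://edge.example-cdn.com/live/seg_102.m4s")
origin_ms = fetch_ms("https://origin.example.com/live/seg_102.m4s")
print(f"edge {edge_ms:.0f} ms vs origin {origin_ms:.0f} ms")
```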

4) Adaptive bitrate (ABR)

  • ABR keeps playback smooth as network conditions change. Use multiple encoded renditions (ladder) and a good player ABR algorithm.
  • Ladder design matters: too many renditions increase ingest/packaging work; too few can cause stalls.
  • For low latency, coordinate ABR switching with short segments or chunked segments so switches don't introduce long stalls or require long keyframe waits.

Practical ladder tip: align keyframe boundaries across renditions so the player can switch without waiting for the next keyframe.
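
A throughput‑based rendition picker makes the ladder idea concrete. The ladder rungs and the 0.8 safety factor are illustrative assumptions; production players also weigh buffer level and switch cost:

```python
# Illustrative bitrate ladder: (rendition name, video bitrate in kbps).
LADDER = [("240p", 400), ("480p", 1200), ("720p", 2500), ("1080p", 5000)]

def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits under a safety margin of throughput.

    For low latency, the switch itself should land on an aligned keyframe
    boundary (per the ladder tip above) so it takes effect immediately.
    """
    budget = throughput_kbps * safety
    best = LADDER[0][0]  # never drop below the lowest rung
    for name, kbps in LADDER:
        if kbps <= budget:
            best = name
    return best

print(pick_rendition(3500))  # -> "720p" (2500 kbps fits under the 2800 kbps budget)
print(pick_rendition(900))   # -> "240p"
```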

Tuning buffers, monitoring, and iteration

  • Buffers to tune:
    • Encoder buffer and keyframe interval
    • Transport jitter buffer (at ingest and player)
    • Player startup and rebuffer thresholds
  • Monitoring & observability: track end‑to‑end latency, segment arrival times, encode time, packet loss, retransmits, rebuffer events, and viewer QoE metrics.
  • Simulate network conditions (packet loss, bandwidth throttling, RTT) and run A/B experiments to measure changes.
  • Instrument timestamps (clock sync) to measure true glass‑to‑glass latency and identify hotspots.
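
For the timestamp instrumentation point, one common approach is to stamp each frame with its capture time and diff at render. A sketch, assuming sender and player clocks are NTP‑synced and the stamp rides in stream metadata (e.g., an H.264 SEI message or a timed ID3 tag):

```python
import time

def stamp_capture_ms() -> int:
    """Sender side: wall-clock capture time in ms, embedded per frame."""
    return int(time.time() * 1000)

def glass_to_glass_ms(capture_ts_ms: int) -> int:
    """Player side: render time minus capture time. Only meaningful if both
    clocks are synced (NTP/PTP); otherwise estimate the skew and subtract it."""
    return int(time.time() * 1000) - capture_ts_ms

# Example: a frame stamped 1800 ms ago reads as ~1.8 s glass-to-glass.
ts = stamp_capture_ms() - 1800
print(f"glass-to-glass: {glass_to_glass_ms(ts)} ms")
```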

Interview checklist (what to say)

  • Explain the pipeline and where latency comes from.
  • Present the four levers (encode, protocol, CDN/edge, ABR) and why each matters.
  • Discuss tradeoffs: latency vs scalability, complexity, and cost.
  • Mention monitoring, buffers, and iterative testing.

Quick example summary you can say in an interview

"Latency comes from capture, encode, network, origin/CDN, and player buffering. To reduce it prioritize: 1) faster encoding (small GOP, low‑latency presets), 2) real‑time protocols like WebRTC or reliable UDP (SRT) depending on browser/device needs, 3) low‑latency CDN/edge packaging to avoid origin RTTs, and 4) ABR tuned for short segments. Then tune buffers, instrument end‑to‑end latency, and iterate under real network conditions."

That structure shows you understand both the components and the practical tradeoffs of building low‑latency live streaming.

Useful metrics to bring up

  • Glass‑to‑glass latency (primary)
  • Rebuffer rate and startup time
  • Packet loss and FEC/retransmit rates
  • Viewer bitrate distribution (ABR effectiveness)
  • CDN origin vs edge latency
