System Design: Building a Scalable URL Shortener

A URL shortener seems trivial until it needs to serve tens of thousands of redirects per second with 99.99% uptime. Under that load, every architectural decision, from ID generation to cache eviction, becomes load-bearing. This article walks through the design from a blank whiteboard to a production-grade system.

1. Requirements and Scale Estimation

Before touching any architecture, nail the numbers. Assumptions drive every decision that follows.

Scale Targets

Writes: 100 million new URLs/day → ~1,160 writes/sec
Reads: 10 billion redirects/month → ~3,858 reads/sec (peak ~50,000/sec)
Read/Write ratio: ~100:1 as a design assumption (the averages above work out to only ~3:1, rising to ~43:1 at peak)
URL retention: 5 years
Storage estimate: 100M URLs/day × 365 days × 5 years × ~500 bytes ≈ 91 TB

The ratio tells us immediately: this is a read-heavy system. Caching is not optional; it is the primary scaling lever. Writes can tolerate some latency; reads must be fast.
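The scale targets can be re-derived with a few lines of arithmetic. This is only a back-of-envelope sketch: the variable names are made up for illustration, and a 30-day month and 365-day year are assumed.

```python
SECONDS_PER_DAY = 86_400

writes_per_day = 100_000_000        # 100 million new URLs/day
reads_per_month = 10_000_000_000    # 10 billion redirects/month
retention_years = 5
bytes_per_row = 500                 # rough per-URL footprint

writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_month / (30 * SECONDS_PER_DAY)  # assumes a 30-day month

# Every day's writes are retained for the full retention window.
total_rows = writes_per_day * 365 * retention_years
storage_tb = total_rows * bytes_per_row / 1e12

print(f"writes/sec ~ {writes_per_sec:,.0f}")
print(f"reads/sec  ~ {reads_per_sec:,.0f}")
print(f"rows after {retention_years} years ~ {total_rows / 1e9:,.1f} billion")
print(f"storage ~ {storage_tb:,.1f} TB")
```

Keeping the estimate in code makes it easy to change one assumption (say, retention) and see how the totals move.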

2. High-Level Architecture

The system has two distinct flows: a write path (create short URL) and a read path (redirect). They have different latency requirements, different bottlenecks, and should be optimised independently.

3. The Core Problem: Generating Short IDs

The short ID is the heart of the system. It must be unique across all servers, compact enough to fit in 6–8 characters, and fast to generate at 1,000+ writes/sec without coordination bottlenecks.

Option A: MD5/SHA hash truncation

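Hash the long URL and take the first 7 characters. Simple but broken: collision probability grows quickly (birthday problem), and you still need a collision-resolution strategy. Seven hex characters give only 16^7 ≈ 268 million distinct values, so 1 billion URLs are guaranteed to collide. Even if the digest is re-encoded as 7 Base62 characters (~3.5 trillion values), the birthday bound puts the 50% collision point at only about 2 million URLs.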
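The birthday-problem claim can be checked numerically. The sketch below uses the standard approximation P ≈ 1 − exp(−n(n−1)/2N); the function names are illustrative, not part of any library.

```python
import math

def collision_probability(num_items: int, space_size: int) -> float:
    """Approximate P(at least one collision) for num_items random values in space_size slots."""
    if num_items > space_size:
        return 1.0  # pigeonhole: more items than slots guarantees a collision
    return 1.0 - math.exp(-num_items * (num_items - 1) / (2 * space_size))

def items_for_half_collision(space_size: int) -> float:
    """Approximate number of items at which the collision probability reaches 50%."""
    return math.sqrt(2 * space_size * math.log(2))

HEX_7 = 16 ** 7      # 7 hex characters taken from an MD5/SHA digest
BASE62_7 = 62 ** 7   # 7 Base62 characters

print(f"7 hex chars: {HEX_7:,} values; 7 Base62 chars: {BASE62_7:,} values")
print(f"50% collision point (Base62): ~{items_for_half_collision(BASE62_7):,.0f} URLs")
print(f"P(collision) at 1 billion URLs (Base62): {collision_probability(10**9, BASE62_7):.4f}")
```

Either way, truncated hashes need an explicit collision check on insert well before the system reaches its target volume.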

Option B: Auto-increment + Base62 encoding

Use a global auto-increment counter (stored in a dedicated ID Generator service). Convert the integer to Base62 [a-zA-Z0-9]. With the alphabet ordered a–z, A–Z, 0–9, integer 125000 → "GGi" (32·62² + 32·62 + 8). This is clean, collision-free, and produces short URLs: 7 characters cover 62^7 ≈ 3.5 trillion IDs. The weakness is the single counter being a bottleneck and SPOF.
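A minimal Base62 encoder and decoder might look like the following. The alphabet order (a-z, then A-Z, then 0-9) is an assumption taken from the order of the character class; any fixed order works as long as encoder and decoder agree.

```python
import string

# Assumed alphabet order: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62

def encode_base62(n: int) -> str:
    """Convert a non-negative integer to its Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first

def decode_base62(s: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(encode_base62(125000))   # GGi
print(decode_base62("GGi"))    # 125000
```

Because the mapping is a bijection on integers, uniqueness of the counter value carries over directly to uniqueness of the short ID.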

Range-based ID allocation

Solve the counter bottleneck by pre-allocating ranges to each application server. The ID service hands out blocks of 10,000 IDs at a time. App Server 1 gets IDs 1–10,000; App Server 2 gets 10,001–20,000. Each server exhausts its range locally before requesting another. The ID service handles ~1 RPC every few seconds instead of 1,000/sec.
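The block hand-out can be sketched as follows. `RangeAllocator` is a hypothetical in-memory stand-in for the ID service (a real one would persist its counter durably before handing out a block), and `LocalIdGenerator` is the piece that would run inside each app server.

```python
import threading

class RangeAllocator:
    """In-memory stand-in for the central ID service (hypothetical)."""

    def __init__(self, block_size: int = 10_000, start: int = 1):
        self._next_start = start
        self._block_size = block_size
        self._lock = threading.Lock()
        self.rpc_count = 0  # how many times a block was requested

    def allocate_block(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
            self.rpc_count += 1
            return range(start, start + self._block_size)

class LocalIdGenerator:
    """Runs on each app server; contacts the allocator only when its block is exhausted."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._block = iter(())  # empty until the first request
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            try:
                return next(self._block)
            except StopIteration:
                self._block = iter(self._allocator.allocate_block())
                return next(self._block)

allocator = RangeAllocator()
server_1 = LocalIdGenerator(allocator)
server_2 = LocalIdGenerator(allocator)
print(server_1.next_id(), server_2.next_id(), server_1.next_id())  # 1 10001 2
```

One consequence of this scheme is that a server which restarts mid-block simply abandons its unused IDs, leaving gaps in the sequence. That is usually an acceptable price for removing the per-write round trip.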

Option C: Snowflake-style distributed IDs

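Twitter's Snowflake approach generates 64-bit IDs using: 1 unused sign bit + 41 bits timestamp (milliseconds since a custom epoch) + 10 bits machine ID + 12 bits sequence. No per-ID coordination is required: each server generates IDs independently once it has been assigned a unique machine ID. The trade-off is a 10–11-character Base62 representation, noticeably longer than the range-allocation approach, which stays at 7 characters for the volumes estimated here.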
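The bit layout can be illustrated with a small generator. This is a sketch, not Twitter's implementation: the class name is invented, the default epoch (2020-01-01 UTC) is an arbitrary choice, and a backwards clock jump is handled by simply raising.

```python
import threading
import time

class SnowflakeGenerator:
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095 IDs per millisecond per machine

    def __init__(self, machine_id: int, epoch_ms: int = 1_577_836_800_000):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._epoch_ms = epoch_ms  # arbitrary custom epoch (assumption)
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self._epoch_ms
            return (
                (timestamp << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self._machine_id << self.SEQUENCE_BITS)
                | self._sequence
            )
```

The only shared state is the machine ID assignment, which happens once at startup rather than once per write.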

4. The Redirect Path (Read)

When a user visits https://sh.rt/abc123, three things must happen in under 50ms: look up the short ID, find the original URL, and issue a redirect.

301 vs 302 — a critical decision

A 301 Permanent redirect is cacheable by default, and browsers may keep it indefinitely. Future visits skip your servers entirely. This is ideal for latency and server load, but you lose all click analytics after the first visit from each browser, and you cannot reliably update or delete URLs that browsers have already cached.

A 302 Temporary redirect is not cached unless the response explicitly allows it. Every visit hits your servers. This gives you accurate analytics, hot-swappable destinations, and the ability to expire URLs, but at higher server cost.

Most real systems use 302 with a CDN layer: the CDN caches the 302 response with a short TTL of 60–300 seconds, giving near-CDN performance on popular links while retaining the ability to update destinations. Clicks served from the edge never reach your servers, so analytics for those hits have to come from CDN logs.
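One way to express "cacheable at the CDN but not in the browser" is through the Cache-Control header. The sketch below builds the status and headers only; the function name is hypothetical, and using `s-maxage` for the shared-cache TTL is one possible choice rather than something the design above prescribes.

```python
from typing import Dict, Tuple

def build_redirect(long_url: str, cdn_ttl_seconds: int = 60) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a 302 that shared caches may keep briefly."""
    if cdn_ttl_seconds < 0:
        raise ValueError("cdn_ttl_seconds must be non-negative")
    headers = {
        "Location": long_url,
        # max-age=0 makes the response immediately stale for browsers;
        # s-maxage overrides it for shared caches such as a CDN.
        "Cache-Control": f"public, max-age=0, s-maxage={cdn_ttl_seconds}",
    }
    return 302, headers

status, headers = build_redirect("https://example.com/landing", cdn_ttl_seconds=120)
print(status, headers["Cache-Control"])
```

A shorter TTL means destination changes propagate faster, at the cost of more origin traffic for popular links.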

Cache layer: Redis

The in-memory cache sits between the app servers and the database. With a hot URL set of ~20 million entries, Redis holds the entire working set in RAM. Cache hit rate of 99%+ is achievable because URL access follows a power law — a small fraction of URLs receive the vast majority of traffic.

Cache eviction strategy matters

Use LRU (Least Recently Used) eviction as the primary policy rather than relying on TTL-based expiry alone. Most short URLs are created for specific campaigns and have a natural access curve: high traffic at launch, then silence. LRU naturally keeps active URLs warm and evicts stale ones. At a conservative 2 KB per entry, a 20 GB Redis node holds ~10 million URL mappings, so the ~20 million-entry hot set needs roughly 40 GB, which a small cluster covers easily.
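The read path is a classic cache-aside lookup in front of an LRU-evicting store. The sketch below models the behaviour in-process with an `OrderedDict`; it is a toy stand-in for Redis, and `load_from_db` is a hypothetical callback representing the shard query.

```python
from collections import OrderedDict
from typing import Callable, Optional

class LRUCache:
    """Tiny in-process model of an LRU-evicting cache."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

def resolve(short_id: str, cache: LRUCache,
            load_from_db: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside lookup: try the cache, fall back to the DB, then populate the cache."""
    url = cache.get(short_id)
    if url is None:
        url = load_from_db(short_id)
        if url is not None:
            cache.put(short_id, url)
    return url
```

With a power-law access pattern, the entries that keep getting touched stay near the "recent" end, while campaign links that have gone quiet drift to the front and are evicted first.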

5. Database Design and Sharding

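With 100 million URLs/day over 5 years, you will accumulate ~180 billion rows, or roughly 90 TB at ~500 bytes per row. That is far more than a single PostgreSQL instance can serve comfortably: indexes stop fitting in memory, and backups, vacuuming, and failover all slow down. You need horizontal sharding.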

Schema (per shard)

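CREATE TABLE urls (
    short_id    VARCHAR(8)   PRIMARY KEY,      -- generated Base62 ID or custom slug
    long_url    TEXT         NOT NULL,         -- redirect target
    user_id     BIGINT,                        -- nullable owner reference
    created_at  TIMESTAMPTZ  DEFAULT NOW(),
    expires_at  TIMESTAMPTZ,                   -- NULL when no expiry is set
    click_count BIGINT       DEFAULT 0
);

-- Look up all URLs created by a given user.
CREATE INDEX idx_user_id ON urls(user_id);
-- Partial index: only rows that actually have an expiry are indexed.
CREATE INDEX idx_expires ON urls(expires_at) WHERE expires_at IS NOT NULL;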

Sharding strategy

Shard by hash(short_id) mod N. Since reads always come in with the short ID, you can deterministically route to the correct shard without a routing table lookup. The application layer contains the sharding logic: shard = crc32(short_id) % num_shards. Start with 12 shards to allow headroom before re-sharding.
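The routing rule fits in a few lines. This sketch uses `zlib.crc32` from the standard library; the function name and the `NUM_SHARDS` constant are illustrative.

```python
import zlib

NUM_SHARDS = 12

def shard_for(short_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short ID to a shard index in [0, num_shards)."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return zlib.crc32(short_id.encode("utf-8")) % num_shards

for sid in ("abc123", "GGi", "my-brand"):
    print(sid, "-> shard", shard_for(sid))
```

Every app server computes the same answer for the same ID, so no routing table or lookup service is needed on the read path.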

Why not consistent hashing?

Consistent hashing is ideal when you need to add/remove nodes without full data reshuffling. For a URL shortener where shards rarely change and short IDs are immutable, deterministic modulo sharding is simpler and equally effective. Consistent hashing shines in caches and peer-to-peer systems where membership is fluid.
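The cost being traded away is easy to measure: with modulo sharding, changing the shard count remaps most keys. The sketch below counts how many of a synthetic key set (made-up IDs) change shard when N changes.

```python
import zlib

def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard index changes when the shard count changes."""
    moved = 0
    for key in keys:
        h = zlib.crc32(key.encode("utf-8"))
        if h % old_shards != h % new_shards:
            moved += 1
    return moved / len(keys)

keys = [f"id{i}" for i in range(100_000)]  # synthetic keys for illustration
print(f"12 -> 13 shards: {moved_fraction(keys, 12, 13):.1%} of keys move")
print(f"12 -> 24 shards: {moved_fraction(keys, 12, 24):.1%} of keys move")
```

For a uniform hash, adding a single shard moves about 12/13 of the keys, while doubling the shard count moves about half. That is why modulo schemes are usually resized by doubling, and why the simplicity only pays off when resizing is rare.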

6. Analytics: Decoupled from the Critical Path

Never block a redirect to record a click. The analytics write is non-critical — a slight delay or loss of a few events is acceptable. The redirect latency is not.

The redirect service publishes a click_event to Kafka (fire-and-forget, without waiting for an acknowledgement). A separate analytics consumer reads from Kafka and writes to a time-series store (ClickHouse, Druid, or even PostgreSQL with a time-partitioned table). As long as the publish call never blocks the request, a Kafka producer failure cannot affect the redirect SLA.
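The key property is that publishing can fail or drop events without ever delaying the redirect. The sketch below models that with a bounded in-process queue standing in for the Kafka producer buffer; `ClickPublisher` and `handle_redirect` are hypothetical names, and no real Kafka client API is shown.

```python
import queue
from typing import List, Tuple

class ClickPublisher:
    """Non-blocking publisher; a bounded queue stands in for the producer's send buffer."""

    def __init__(self, max_buffered: int = 10_000):
        self._buffer = queue.Queue(maxsize=max_buffered)
        self.dropped = 0

    def publish(self, short_id: str, timestamp_ms: int) -> bool:
        try:
            self._buffer.put_nowait({"short_id": short_id, "ts": timestamp_ms})
            return True
        except queue.Full:
            self.dropped += 1  # losing an event is acceptable; blocking is not
            return False

    def drain(self, max_events: int = 1000) -> List[dict]:
        """What a background sender or consumer would call to pull a batch."""
        events = []
        while len(events) < max_events:
            try:
                events.append(self._buffer.get_nowait())
            except queue.Empty:
                break
        return events

def handle_redirect(short_id: str, long_url: str,
                    publisher: ClickPublisher, now_ms: int) -> Tuple[int, str]:
    publisher.publish(short_id, now_ms)  # returns immediately, never raises on a full buffer
    return 302, long_url
```

Tracking the `dropped` counter gives a way to alert on analytics loss without coupling it to redirect latency.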

7. Handling Edge Cases

URL expiration

Store expires_at in the DB row and in Redis as a TTL. In Redis: SET abc123 https://... EX 86400 (expires in 24 hours). The CDN cache TTL must be shorter than the URL's expiry to avoid serving stale redirects from edge nodes.
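Both TTLs can be derived from the same `expires_at` value so that neither cache outlives the URL. A minimal sketch, assuming timestamps in Unix seconds and a hypothetical 300-second cap on the CDN TTL:

```python
from typing import Tuple

def cache_ttls(expires_at: float, now: float, cdn_max_ttl: int = 300) -> Tuple[int, int]:
    """Return (redis_ttl, cdn_ttl) in seconds; (0, 0) means the URL has already expired."""
    remaining = int(expires_at - now)
    if remaining <= 0:
        return 0, 0
    # The CDN TTL is capped so an edge node never serves the redirect past expiry.
    return remaining, min(cdn_max_ttl, remaining)

now = 1_700_000_000
print(cache_ttls(now + 86_400, now))  # (86400, 300)
print(cache_ttls(now + 100, now))     # (100, 100)
print(cache_ttls(now - 5, now))       # (0, 0)
```

The first value would be passed as the `EX` argument when writing to Redis; the second would feed the CDN-facing cache header.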

Custom short codes

Users want sh.rt/my-brand. Add a custom flag column; short_id is already the primary key, so uniqueness is enforced without an extra constraint. For custom codes, skip the ID generator and write directly with the user-supplied slug, rejecting the request if the insert hits a duplicate key. Validate against a blocklist of reserved words (api, admin, static, etc.).
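Slug validation might be sketched as below. The allowed character set, the 3-character minimum, and the case-insensitive blocklist match are assumptions; the 8-character cap mirrors the VARCHAR(8) column in the schema.

```python
import re
from typing import Tuple

RESERVED = {"api", "admin", "static"}  # examples from the text; a real list would be longer
SLUG_PATTERN = re.compile(r"[A-Za-z0-9-]{3,8}")  # assumed rules: letters, digits, hyphen

def validate_custom_code(slug: str) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a user-supplied short code."""
    if not SLUG_PATTERN.fullmatch(slug):
        return False, "must be 3-8 characters: letters, digits, or hyphen"
    if slug.lower() in RESERVED:
        return False, "reserved word"
    return True, "ok"

print(validate_custom_code("my-brand"))  # (True, 'ok')
print(validate_custom_code("admin"))     # (False, 'reserved word')
```

Validation only filters obviously bad input; the duplicate check still has to happen at insert time, where the database can enforce it atomically.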

Hot key thundering herd

A viral tweet can drive 50,000 requests/sec to a single short URL. If the Redis entry expires at that exact moment, all 50,000 requests simultaneously cache-miss and hit the DB. Solution: probabilistic early expiration — recompute the cache value slightly before it expires using a small random chance, so refreshes are staggered. Alternatively, use cache locking: the first cache-miss acquires a lock, fetches from DB, writes to cache; all others wait for the lock then hit cache.
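Probabilistic early expiration can be sketched with a single decision function. It follows the commonly described rule of refreshing when `now - delta * beta * ln(u) >= expiry`, where `u` is uniform in (0, 1] and `delta` is how long a recompute takes. The function name, parameters, and defaults here are illustrative.

```python
import math
import random
from typing import Callable

def should_refresh_early(now: float, expiry: float, recompute_seconds: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Return True if this request should refresh the cache entry before it expires.

    The chance of returning True rises as `now` approaches `expiry`, so concurrent
    requests rarely all decide to refresh at the same instant.
    """
    u = 1.0 - rng()  # random.random() is in [0, 1), so u is in (0, 1] and log(u) is defined
    return now - recompute_seconds * beta * math.log(u) >= expiry

# Estimate how often a request 0.1 s before expiry would trigger a refresh,
# assuming a DB fetch takes about 50 ms.
trials = 10_000
hits = sum(should_refresh_early(now=99.9, expiry=100.0, recompute_seconds=0.05)
           for _ in range(trials))
print(f"{hits / trials:.1%} of requests would refresh early")
```

A larger `beta` makes refreshes start earlier; once `now` reaches `expiry` the function always returns True, so it degrades to ordinary expiry in the worst case.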

8. Capacity and Cost Summary

Back-of-envelope infrastructure

CDN nodes: 3 PoPs (US, EU, Asia) with ~5 TB cache each — absorbs 95% of read traffic
Load balancers: 2 (active-active) for app layer
App servers: 6–10 nodes (redirect + write services)
Redis cluster: 3 nodes, 64 GB RAM each — covers full hot working set
DB shards: 12 PostgreSQL shards, each a primary plus a replica (24 nodes)
ID generator: 2 nodes (1 standby), ~50 MB RAM
Kafka: 3 broker cluster for analytics events
Monthly cloud cost estimate: $8,000–$18,000 at this scale

9. What Interviewers Actually Want to Hear

In a system design interview, the diagram matters less than the reasoning. Here is what separates a strong answer:

State your assumptions explicitly before you design anything. If you assume 10B redirects/month, say it out loud.
Identify the read/write ratio and explain how it shapes the architecture. A 100:1 read-heavy system and a write-heavy system look completely different.
Discuss trade-offs: 301 vs 302, Base62 vs Snowflake, LRU vs TTL, modulo sharding vs consistent hashing. There is no single right answer — the interviewers want to see that you know the trade-off exists.
Know your bottlenecks: the DB is usually the bottleneck, which is why Redis exists. The ID generator is a potential SPOF, which is why range allocation or Snowflake exists.
Bring up failure modes: what happens if Redis goes down? If a shard goes down? If the ID generator goes down? A senior engineer thinks about the failure modes even when not asked.
