High Bandwidth Memory
How stacked DRAM enables modern AI accelerators
High Bandwidth Memory represents a fundamental shift in how we connect processors to memory. Rather than spreading memory chips across a circuit board and connecting them with traces, HBM stacks DRAM dies vertically and connects them with thousands of through-silicon vias.
The Bandwidth Problem
Traditional DDR memory connects to processors through a relatively narrow bus: each channel carries 64 data bits. DDR5, the current standard, provides around 50 GB/s per channel at DDR5-6400 speeds, so a high-end system with 8 channels tops out at roughly 400 GB/s.
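As a sanity check, that figure falls straight out of transfer rate times bus width. A minimal sketch, assuming DDR5-6400 parts and the standard 64-bit channel:

```python
# Back-of-the-envelope DDR5 bandwidth (assumed: DDR5-6400, 64-bit channels).
transfers_per_sec = 6400e6          # 6400 MT/s
bytes_per_transfer = 64 / 8         # 64-bit data bus -> 8 bytes per transfer
per_channel = transfers_per_sec * bytes_per_transfer   # ~51.2 GB/s

channels = 8
print(f"per channel: {per_channel / 1e9:.1f} GB/s")                      # ~51.2 GB/s
print(f"{channels} channels: {channels * per_channel / 1e9:.1f} GB/s")   # ~409.6 GB/s
```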
Modern AI accelerators need more. Training large language models requires moving billions of parameters between memory and compute units on every forward and backward pass. The memory wall, the widening gap between compute throughput and memory bandwidth, becomes the bottleneck.
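To make the wall concrete, here is a rough sketch of how long it takes just to read a model's weights from memory once. The model size and data type are assumed for illustration; the bandwidth figures are the ones used elsewhere in this piece:

```python
# Illustrative memory-wall arithmetic: time to stream a model's weights once.
# Assumed: a 70B-parameter model stored in FP16 (2 bytes per parameter).
params = 70e9
weight_bytes = params * 2            # ~140 GB of weights

ddr_bw = 400e9                       # ~400 GB/s, 8-channel DDR5
hbm_bw = 3.35e12                     # ~3.35 TB/s, HBM on an H100-class GPU

print(f"DDR5: {weight_bytes / ddr_bw * 1e3:.0f} ms per full weight read")   # ~350 ms
print(f"HBM:  {weight_bytes / hbm_bw * 1e3:.0f} ms per full weight read")   # ~42 ms
```

Even before any computation happens, the slower memory system spends most of a second per pass just shuttling weights.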
Vertical Integration
HBM solves this by going vertical. A single HBM stack contains 4-12 DRAM dies, each connected to a base logic die through TSVs. These vias are essentially vertical wires etched through the silicon and filled with metal, enabling thousands of parallel connections in the footprint of a single chip.
The result: an HBM3 stack delivers on the order of 800 GB/s, and HBM3E pushes a single stack past 1 TB/s. An NVIDIA H100 GPU, with six HBM3 stacks on the package (five of them active), reaches roughly 3.35 TB/s of aggregate bandwidth, nearly an order of magnitude more than an 8-channel DDR5 system.
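The per-stack number is the same width-times-rate arithmetic as DDR, just with a vastly wider interface. A sketch with assumed, approximate per-pin rates rather than datasheet values:

```python
# HBM bandwidth: interface width times per-pin data rate.
# Assumed: a 1024-bit interface per stack; pin rates are rough
# generation-level figures, not tied to a specific part.
INTERFACE_BITS = 1024

def stack_bandwidth(pin_rate_gbps: float) -> float:
    """Bytes per second for one stack at the given per-pin data rate."""
    return INTERFACE_BITS * pin_rate_gbps * 1e9 / 8

print(f"HBM3  (~6.4 Gb/s/pin): {stack_bandwidth(6.4) / 1e12:.2f} TB/s per stack")   # ~0.82
print(f"HBM3E (~9.2 Gb/s/pin): {stack_bandwidth(9.2) / 1e12:.2f} TB/s per stack")   # ~1.18
```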
The Tradeoffs
Nothing is free. HBM costs substantially more per gigabyte than DDR. The stacking and packaging process, with its TSVs, microbumps, and silicon interposer, is complex and yield-sensitive. Heat dissipation gets harder when heat-generating dies sit on top of one another. Capacity per stack is limited compared to spreading chips across a board.
For AI training, the bandwidth gains outweigh these costs. For other workloads, the calculus differs.
What Comes Next
HBM4 is coming. Expect a wider interface (2048 bits per stack, double today's 1024), more stacks per package, and tighter integration with compute dies. The memory wall isn't going away, but we keep finding ways to climb it.
See also: Through-Silicon Vias for how the vertical connections work.