Kseniya Parkhamchuk

k.parkhamchuk@gmail.com

Model Memory Usage: Impact of KV Cache

Sequence Length: 4,096 tokens

Batch Size: 32

Model	Parameters	Model Weights (GB)	KV Cache (GB)	Total (GB)	Cache Overhead
7B	7B	13.0	64.0	77.0	+491%
13B	13B	24.2	100.0	124.2	+413%
30B	30B	55.9	195.0	250.9	+349%
65B	65B	121.1	320.0	441.1	+264%
175B	175B	326.0	576.0	902.0	+177%
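The table's figures can be reproduced with a short calculation. The sketch below is a minimal illustration, not the page's actual implementation. It assumes 16-bit (2-byte) weights and cache entries, standard multi-head attention with one K and one V vector of width `hidden_size` per token per layer, and sizes reported in binary gigabytes (2^30 bytes). The layer counts and hidden sizes in `MODELS` are also assumptions: they follow the commonly published LLaMA-style (7B to 65B) and GPT-3-style (175B) configurations and are not stated in the source.

```python
GIB = 1024 ** 3  # assumption: the table's "GB" are binary gigabytes

# (name, parameter count, layers, hidden size)
# Layer counts and hidden sizes are assumed, not given in the source.
MODELS = [
    ("7B", 7e9, 32, 4096),
    ("13B", 13e9, 40, 5120),
    ("30B", 30e9, 60, 6656),
    ("65B", 65e9, 80, 8192),
    ("175B", 175e9, 96, 12288),
]


def weights_gib(n_params, bytes_per_value=2):
    """Memory for the model weights alone (assumes 16-bit values)."""
    return n_params * bytes_per_value / GIB


def kv_cache_gib(n_layers, hidden_size, seq_len, batch_size, bytes_per_value=2):
    """Memory for the KV cache.

    Each layer stores two tensors (K and V), each of shape
    [batch_size, seq_len, hidden_size].
    """
    values = 2 * n_layers * hidden_size * seq_len * batch_size
    return values * bytes_per_value / GIB


def memory_table(seq_len=4096, batch_size=32):
    """Return rows of (model, weights, KV cache, total, overhead %)."""
    rows = []
    for name, n_params, n_layers, hidden_size in MODELS:
        w = weights_gib(n_params)
        kv = kv_cache_gib(n_layers, hidden_size, seq_len, batch_size)
        rows.append((name, round(w, 1), round(kv, 1), round(w + kv, 1), round(100 * kv / w)))
    return rows


if __name__ == "__main__":
    for row in memory_table():
        print(row)
```

Under these assumptions the computed rows match the table above at a sequence length of 4,096 tokens and a batch size of 32. Models that use grouped-query or multi-query attention store fewer K and V values per token, so their cache would be smaller than this sketch predicts.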

Key Insights:

KV cache grows linearly with sequence length and batch size
For long sequences and large batches, KV cache can exceed model weight memory; at 4,096 tokens and batch size 32 it does so for every model in the table
Larger models have proportionally smaller KV cache overhead: it falls from +491% for 7B to +177% for 175B
KV cache memory scales linearly with the number of layers and with the number of attention heads (at a fixed head dimension)
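These scaling relationships can be read off a rough formula. The block below is a sketch under stated assumptions, not a derivation from the source: the symbols are introduced here for illustration ($n_{\text{layers}}$ layers, hidden size $d$, sequence length $s$, batch size $b$, $p$ bytes per value), attention is assumed to be standard multi-head attention, and the weight estimate $12\, n_{\text{layers}}\, d^2$ is the usual approximation for a transformer's parameter count that ignores embeddings.

```latex
\begin{align*}
M_{\text{KV}} &= 2 \cdot n_{\text{layers}} \cdot d \cdot s \cdot b \cdot p
  && \text{(K and V, per layer, per token)} \\
M_{\text{weights}} &\approx 12 \cdot n_{\text{layers}} \cdot d^{2} \cdot p \\
\frac{M_{\text{KV}}}{M_{\text{weights}}} &\approx \frac{s \cdot b}{6\, d}
\end{align*}
```

The cache is linear in $s$ and $b$, while the ratio to weight memory shrinks as $d$ grows, which is why the overhead column falls for larger models. For example, with $s = 4096$, $b = 32$, and an assumed $d = 12288$ for the 175B model, the ratio is about $1.78$, close to the table's +177%.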