Kseniya Parkhamchuk

Model Memory Usage: Impact of KV Cache

[Figure: Memory (GB) by model size (7B, 13B, 30B, 65B, 175B); series: Model Weights Only, Model + KV Cache]
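Model  Parameters  Model Weights (GB)  KV Cache (GB)  Total (GB)  Cache Overhead
7B     7B          13.0                64.0           77.0        +491%
13B    13B         24.2                100.0          124.2       +413%
30B    30B         55.9                195.0          250.9       +349%
65B    65B         121.1               320.0          441.1       +264%
175B   175B        326.0               576.0          902.0       +177%

Cache Overhead is the KV cache size as a percentage of the model weight size, i.e. how much adding the KV cache increases memory over the weights alone.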
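A minimal Python sketch of how figures like these can be computed. The page does not state its assumptions, so everything below is assumed rather than taken from the source: FP16 storage (2 bytes per value), standard multi-head attention, LLaMA-style layer counts and hidden sizes for 7B to 65B and GPT-3's for 175B, and 131,072 cached tokens in total (for example batch size 32 with sequence length 4,096, or 16 with 8,192). Under these assumptions, and reading "GB" as 2^30 bytes, the sketch reproduces every row of the table. The function names are illustrative only.

```python
# Sketch: reproduce the table's memory figures under ASSUMED settings.
# Assumptions (not stated on the page): FP16 weights and KV cache,
# standard multi-head attention, hypothetical model shapes below,
# and seq_len * batch_size = 131,072 cached tokens.

BYTES_PER_VALUE = 2  # FP16
GIB = 2**30  # the table's "GB" values match binary gigabytes

# name: (parameter count, number of layers, hidden size)
# Assumed LLaMA-style shapes for 7B-65B and GPT-3 shape for 175B.
MODELS = {
    "7B": (7_000_000_000, 32, 4096),
    "13B": (13_000_000_000, 40, 5120),
    "30B": (30_000_000_000, 60, 6656),
    "65B": (65_000_000_000, 80, 8192),
    "175B": (175_000_000_000, 96, 12288),
}


def weight_gib(n_params: int) -> float:
    # Every parameter is stored once at BYTES_PER_VALUE bytes.
    return n_params * BYTES_PER_VALUE / GIB


def kv_cache_gib(n_layers: int, hidden_size: int, seq_len: int, batch_size: int) -> float:
    # Factor 2: each layer caches one key and one value vector per token.
    n_bytes = 2 * n_layers * hidden_size * seq_len * batch_size * BYTES_PER_VALUE
    return n_bytes / GIB


def memory_table(seq_len: int = 4096, batch_size: int = 32) -> dict:
    # Returns name -> (weights GB, KV cache GB, total GB, overhead %).
    rows = {}
    for name, (n_params, n_layers, hidden) in MODELS.items():
        weights = weight_gib(n_params)
        kv = kv_cache_gib(n_layers, hidden, seq_len, batch_size)
        rows[name] = (
            round(weights, 1),
            round(kv, 1),
            round(weights + kv, 1),
            round(100 * kv / weights),
        )
    return rows


if __name__ == "__main__":
    for name, (w, kv, total, overhead) in memory_table().items():
        print(f"{name:>5}  weights {w:6.1f}  kv {kv:6.1f}  total {total:6.1f}  +{overhead}%")
```

Because only the product of batch size and sequence length enters the KV cache formula, `memory_table(seq_len=8192, batch_size=16)` returns the same rows as the default call.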

Key Insights:

  • KV cache grows linearly with sequence length and batch size
  • For long sequences, KV cache can exceed model weight memory
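  • Larger models have a smaller KV cache overhead relative to their weights: +177% for 175B vs. +491% for 7B in the table above
  • KV cache memory scales linearly, not quadratically, with the number of layers and with the number of key/value heads times the head dimension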
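A second minimal sketch makes the scaling behind these points explicit. It assumes FP16 storage, standard multi-head attention, and hypothetical model shapes (LLaMA-style for 7B to 65B, GPT-3-style for 175B, all with a head dimension of 128); none of these are stated on the page. The per-token cache size is 2 × layers × KV heads × head dimension × bytes per value, so total KV memory is linear in sequence length, batch size, layer count, and KV head count. The sketch also computes the number of cached tokens at which the KV cache equals the weight memory: under these assumptions that is about 26.7K tokens for 7B and about 74.2K for 175B (a sequence length when the batch size is 1). The crossover grows with model size, which is why larger models show a smaller relative overhead at a fixed token count.

```python
# Sketch: KV cache scaling and the "KV cache > weights" crossover point.
# Assumptions (not from the page): FP16 weights and cache, standard
# multi-head attention, and the hypothetical model shapes below.

BYTES_PER_VALUE = 2  # FP16

# name: (parameter count, layers, key/value heads, head dimension)
MODELS = {
    "7B": (7_000_000_000, 32, 32, 128),
    "13B": (13_000_000_000, 40, 40, 128),
    "30B": (30_000_000_000, 60, 52, 128),
    "65B": (65_000_000_000, 80, 64, 128),
    "175B": (175_000_000_000, 96, 96, 128),
}


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int) -> int:
    # One key and one value vector per KV head per layer: linear in each factor.
    return 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_VALUE


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int) -> int:
    # Each cached token of each sequence in the batch stores the same amount.
    return kv_bytes_per_token(n_layers, n_kv_heads, head_dim) * seq_len * batch_size


def crossover_tokens(n_params: int, n_layers: int, n_kv_heads: int, head_dim: int) -> float:
    # Cached tokens (seq_len * batch_size) at which KV cache size equals weight size.
    weight_bytes = n_params * BYTES_PER_VALUE
    return weight_bytes / kv_bytes_per_token(n_layers, n_kv_heads, head_dim)


if __name__ == "__main__":
    for name, (n_params, n_layers, n_heads, head_dim) in MODELS.items():
        tokens = crossover_tokens(n_params, n_layers, n_heads, head_dim)
        print(f"{name:>5}: KV cache exceeds weights beyond ~{tokens:,.0f} cached tokens")
```

Doubling the sequence length, the batch size, or the number of KV heads doubles the cache; nothing in the formula grows quadratically.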