Kseniya Parkhamchuk
k.parkhamchuk@gmail.com
Model Memory Usage: Impact of KV Cache
Settings: sequence length 4,096 tokens (slider marks: 512, 4K, 8K); batch size 32 (slider marks: 1, 16, 32).
[Figure: Memory (GB) for 7B, 13B, 30B, 65B, and 175B models, comparing model weights only with model weights plus KV cache.]
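The KV cache sizes charted here can be estimated with a standard formula. This is a sketch under assumptions the page does not state: every layer caches one key and one value vector of width d_model per token, values are stored in 16 bits (b = 2 bytes), L is the sequence length and B the batch size. The worked line uses an assumed 7B configuration of 32 layers with d_model = 4096 (as in LLaMA-7B).

```latex
M_{\mathrm{KV}} = 2 \cdot n_{\mathrm{layers}} \cdot d_{\mathrm{model}} \cdot L \cdot B \cdot b
\qquad
M_{\mathrm{KV}}^{\,7\mathrm{B}} = 2 \cdot 32 \cdot 4096 \cdot 4096 \cdot 32 \cdot 2\ \mathrm{bytes} = 2^{36}\ \mathrm{bytes} = 64 \cdot 2^{30}\ \mathrm{bytes}
```

Read in units of 2^30 bytes, this matches the 64.0 GB KV cache listed for the 7B model.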
Memory breakdown at sequence length 4,096 tokens and batch size 32:

Model Size   Model Weights (GB)   KV Cache (GB)   Total (GB)   KV Cache Overhead (vs. Weights)
7B           13.0                 64.0            77.0         +491%
13B          24.2                 100.0           124.2        +413%
30B          55.9                 195.0           250.9        +349%
65B          121.1                320.0           441.1        +264%
175B         326.0                576.0           902.0        +177%
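The table can be reproduced with a short calculation. The sketch below is not the page's own code; it assumes 16-bit storage (2 bytes per value), memory reported in units of 2^30 bytes, and hypothetical per-size configurations taken from LLaMA (7B, 13B, 30B/33B, 65B) and GPT-3 (175B). Under those assumptions it yields the weights, KV cache, total, and overhead values listed above.

```python
# Sketch: reproduce the memory table under stated assumptions.
# ASSUMPTIONS (not stated in the source): 16-bit storage (2 bytes per value),
# memory in units of 2**30 bytes, and LLaMA / GPT-3 layer counts and hidden
# sizes for each model size. Function names here are illustrative.

GIB = 2**30
BYTES_PER_VALUE = 2  # assumed FP16/BF16 weights and KV cache

# (parameter count, number of layers, hidden size d_model): assumed configs
MODELS = {
    "7B": (7e9, 32, 4096),
    "13B": (13e9, 40, 5120),
    "30B": (30e9, 60, 6656),
    "65B": (65e9, 80, 8192),
    "175B": (175e9, 96, 12288),
}


def weights_gib(n_params: float) -> float:
    """Memory taken by the model weights alone."""
    return n_params * BYTES_PER_VALUE / GIB


def kv_cache_gib(n_layers: int, d_model: int, seq_len: int, batch: int) -> float:
    """Keys and values (factor 2) for every layer, token, and sequence in the batch."""
    return 2 * n_layers * d_model * seq_len * batch * BYTES_PER_VALUE / GIB


def memory_table(seq_len: int = 4096, batch: int = 32):
    """Return (name, weights, kv_cache, total, overhead_percent) per model."""
    rows = []
    for name, (n_params, n_layers, d_model) in MODELS.items():
        w = weights_gib(n_params)
        kv = kv_cache_gib(n_layers, d_model, seq_len, batch)
        overhead = 100 * kv / w  # KV cache as a percentage of weight memory
        rows.append((name, w, kv, w + kv, overhead))
    return rows


if __name__ == "__main__":
    print(f"{'Model':<6}{'Weights':>9}{'KV':>9}{'Total':>9}{'Overhead':>10}")
    for name, w, kv, total, pct in memory_table():
        print(f"{name:<6}{w:>9.1f}{kv:>9.1f}{total:>9.1f}{pct:>+9.0f}%")
```

Calling memory_table with other seq_len or batch values gives the figures for the other slider settings, provided the same assumptions hold.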
Key Insights:
KV cache grows linearly with sequence length and batch size
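For long sequences or large batches, the KV cache can exceed the memory taken by the model weights; at 4,096 tokens and batch size 32 it does so for every model in the table
Larger models have a proportionally smaller KV cache overhead (+491% at 7B versus +177% at 175B), because parameter count grows with layers × hidden size², while the per-token KV cache grows only with layers × hidden size
KV cache memory scales linearly, not quadratically, with the number of attention heads: at a fixed head dimension, each head stores one key and one value vector per token in every layer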
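The scaling claims above can be checked numerically. This is a minimal sketch, assuming 16-bit values and a hypothetical 7B-style configuration (32 layers, 32 heads of dimension 128, so d_model = 4096); it also assumes the slider marks 4K and 8K mean 4,096 and 8,192 tokens. Doubling any single factor doubles the cache, and the sweep shows which slider combinations push the KV cache past the weight memory.

```python
# Sketch: how KV cache memory responds to each factor in
#   bytes = 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per_value
# ASSUMPTIONS: 16-bit values; 7B-style config (32 layers, 32 heads, head_dim 128);
# slider marks 4K/8K read as 4096/8192 tokens. Names are illustrative.

GIB = 2**30


def kv_cache_gib(n_layers, n_heads, head_dim, seq_len, batch, bytes_per_value=2):
    """KV cache size in units of 2**30 bytes; the factor 2 covers keys and values."""
    return 2 * n_layers * n_heads * head_dim * seq_len * batch * bytes_per_value / GIB


base = dict(n_layers=32, n_heads=32, head_dim=128, seq_len=4096, batch=32)

# Doubling one factor at a time doubles the cache: growth is linear in each.
for knob in ("seq_len", "batch", "n_heads"):
    doubled = {**base, knob: base[knob] * 2}
    ratio = kv_cache_gib(**doubled) / kv_cache_gib(**base)
    print(f"2x {knob}: cache grows {ratio:.1f}x")

# Sweep the chart's slider marks for the assumed 7B configuration.
weights_gib = 7e9 * 2 / GIB  # assumed 16-bit weights
for seq_len in (512, 4096, 8192):
    for batch in (1, 16, 32):
        kv = kv_cache_gib(32, 32, 128, seq_len, batch)
        note = "exceeds weights" if kv > weights_gib else ""
        print(f"seq={seq_len:>5} batch={batch:>2}: KV cache {kv:6.2f} GB {note}")
```

Each doubling prints a 2.0x ratio, including the one for n_heads, which is why the head count enters the memory estimate linearly.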