Kseniya Parkhamchuk

KV Cache Explained

Visualizing how transformers avoid recomputing past key/value pairs.


1. Input Tokens

The input sequence is split into tokens (here, words separated by spaces), and the model processes them one at a time.

2. K/V Vector Generation (Current Token)

Only the current token's key and value vectors are computed at this step; every earlier token's pair is already sitting in the cache.
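A minimal numpy sketch of this step. The projection matrices W_k and W_v, the dimensions, and the cache layout are illustrative assumptions, not taken from the demo:

import numpy as np

d_model, d_head = 8, 8  # illustrative sizes, not from the demo

rng = np.random.default_rng(0)
W_k = rng.normal(size=(d_model, d_head))  # key projection (hypothetical weights)
W_v = rng.normal(size=(d_model, d_head))  # value projection (hypothetical weights)

kv_cache = {"K": [], "V": []}  # grows by one entry per processed token

def generate_kv(x_t):
    """Project the current token's hidden state to a key/value pair
    and append it to the cache; past tokens are never reprojected."""
    k_t = x_t @ W_k
    v_t = x_t @ W_v
    kv_cache["K"].append(k_t)
    kv_cache["V"].append(v_t)
    return k_t, v_t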

3. Attention Calculation (Using Cache)

The current query attends to its own K/V pairs and all pairs from the cache.
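Continuing the sketch above, a hedged illustration of this step using standard scaled dot-product attention; q_t is assumed to be the current token's query vector, and kv_cache is the structure from the previous sketch:

def attend(q_t):
    """Scaled dot-product attention for the current query against
    every cached key/value pair (including the current token's own)."""
    K = np.stack(kv_cache["K"])               # (t, d_head)
    V = np.stack(kv_cache["V"])               # (t, d_head)
    scores = K @ q_t / np.sqrt(K.shape[-1])   # one score per cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over all positions so far
    return weights @ V                        # attention output, shape (d_head,)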

4. KV Cache (Stored Past K/V Pairs)

Stores K/V pairs from past tokens to be reused.

It starts empty and grows by one K/V pair per processed token.
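To get a feel for what storing these pairs costs, here is a back-of-the-envelope footprint calculation; all model dimensions below are illustrative assumptions, not taken from any particular model:

# Rough KV-cache size: 2 (K and V) * layers * heads * head_dim
# * sequence_length * bytes_per_element.
layers, heads, head_dim = 32, 32, 128   # assumed configuration
seq_len, bytes_fp16 = 4096, 2           # assumed context length, fp16 storage

cache_bytes = 2 * layers * heads * head_dim * seq_len * bytes_fp16
print(f"{cache_bytes / 2**30:.1f} GiB")  # 2.0 GiB for this configuration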

Explanation

Without the cache, every decoding step would recompute key and value vectors for the whole prefix, so the cost of generating each new token would grow with the sequence length. Storing each token's K/V pair once and reusing it means every step only computes the projections for the newest token, trading memory for compute.