Visualizing how transformers avoid recomputing past key/value pairs.
Enter a short sequence (words separated by spaces).
The current query attends to its own K/V pairs and all pairs from the cache.
Stores K/V pairs from past tokens to be reused.