Visualizing how a Transformer decoder generates text one token at a time.
This shows which tokens the current token (the Query) can "look at" when predicting the next token: only itself and the tokens before it, never later ones.
Indigo Box with Arrow: Current 'Query' token.
Green Boxes: Tokens the 'Query' can attend to.
This is the full "causal mask". Rows are queries, columns are keys, and a 1 marks a key that the query in that row can attend to. The highlighted row matches the query above.
Q \ K | <BOS> |
---|---|
<BOS> | 1 |
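
The mask described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the visualization: the helper names `causal_mask` and `visible_tokens` and the three-token example sequence are assumptions made up for this sketch. It builds the lower-triangular matrix (rows are queries, columns are keys) and reads off which tokens a given query position can attend to.

```python
def causal_mask(tokens):
    """Build a 0/1 causal mask: mask[q][k] is 1 when key k is visible to query q.

    A key is visible when it sits at the same position as the query or earlier,
    which gives a lower-triangular matrix.
    """
    n = len(tokens)
    return [[1 if k <= q else 0 for k in range(n)] for q in range(n)]


def visible_tokens(tokens, q):
    """Return the tokens the query at position q can attend to (hypothetical helper)."""
    row = causal_mask(tokens)[q]
    return [tok for tok, allowed in zip(tokens, row) if allowed]


# Hypothetical example sequence; only <BOS> appears in the table above.
tokens = ["<BOS>", "The", "cat"]
mask = causal_mask(tokens)

# Print the mask in the same "rows are queries, columns are keys" layout.
for tok, row in zip(tokens, mask):
    print(f"{tok:>5} | " + " ".join(str(v) for v in row))
```

With a single `<BOS>` token the mask is just `[[1]]`, matching the table above. Each newly generated token adds one row and one column, and the new row is all ones because the newest query can see every token generated so far. In a real attention layer the masked positions are typically set to negative infinity before the softmax, so they receive zero attention weight.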