Kseniya Parkhamchuk

Decoder Masked Self-Attention

Visualizing how a Transformer decoder generates text one token at a time.

At step 0 the decoder input sequence is a single token, <BOS> (input length 1), and the model is generating the token for position 1.

Decoder Input Sequence: <BOS>
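The demo steps through autoregressive decoding: starting from <BOS>, the model predicts one token, appends it to the input, and repeats. The loop below is a minimal, self-contained sketch of that process; `next_token_id` is a placeholder for a real decoder forward pass and simply returns a fixed toy continuation so the example runs on its own.

```python
# Minimal sketch of the decoding loop the demo steps through.
# `next_token_id` stands in for a real decoder forward pass; here it just
# returns a fixed toy continuation so the example is runnable on its own.

BOS, EOS = 0, 3
TOY_CONTINUATION = [1, 2, 3]  # pretend model output ending in <EOS> (id 3)

def next_token_id(input_ids):
    """Placeholder for the decoder: predict the token for the next position."""
    return TOY_CONTINUATION[len(input_ids) - 1]

input_ids = [BOS]   # step 0: the decoder input is just <BOS>
generated = []

while True:
    token = next_token_id(input_ids)  # token for the next position
    input_ids.append(token)           # the input grows by one token per step
    generated.append(token)
    if token == EOS:
        break

print(input_ids)  # [0, 1, 2, 3]
print(generated)  # [1, 2, 3]
```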

Token Attention Focus

This shows what the current token (the query) can "look at" when predicting the next token: it can attend only to itself and to previous tokens. A code sketch of this constraint follows the legend below.

<BOS>

Indigo box with arrow: the current query token.

Green boxes: tokens the query can attend to.
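As a rough illustration of this constraint, the NumPy sketch below applies a lower-triangular (causal) mask inside scaled dot-product attention, so each query position gets zero weight on every later position. The function name and the random 4-token input are invented for the example; a real decoder layer would also include projection matrices, multiple heads, and batching.

```python
import numpy as np

def masked_self_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v have shape (seq_len, d). The output at position i is a weighted
    sum of the values at positions 0..i only.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    allowed = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(allowed, scores, -np.inf)        # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # toy embeddings for a 4-token sequence
out, attn = masked_self_attention(x, x, x)
print(np.round(attn, 2))                # the strict upper triangle is 0
```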

Attention Mask Matrix

This is the full "causal mask": rows are queries, columns are keys, and the highlighted row corresponds to the query shown above. A sketch of how this matrix is built follows the table.

Q \ K    <BOS>
<BOS>    1
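One way to build this matrix is as a lower-triangular array of ones, as in the NumPy sketch below. The token list is an invented example; at step 0 the matrix reduces to the single <BOS> row and column shown in the table above.

```python
import numpy as np

tokens = ["<BOS>", "the", "cat", "sat"]   # invented example sequence

# Row i (query) may attend to columns 0..i (keys): a lower-triangular matrix.
mask = np.tril(np.ones((len(tokens), len(tokens)), dtype=int))

# Print in the same "Q \ K" layout as the table above.
print("Q \\ K".ljust(8) + "".join(t.rjust(8) for t in tokens))
for tok, row in zip(tokens, mask):
    print(tok.ljust(8) + "".join(str(v).rjust(8) for v in row))
```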
Generated Sequence: (empty at step 0, since no tokens have been generated yet)