Kseniya Parkhamchuk

Decoder Masked Self-Attention

Visualizing how a Transformer decoder generates text one token at a time.

At step 0 the decoder input sequence is a single token, <BOS> (input length 1), and the model is generating the token for position 1.

Decoder Input Sequence: <BOS>
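The demo steps through autoregressive decoding: starting from <BOS>, the model predicts one token, appends it to the input, and repeats. The loop below is a minimal, self-contained sketch of that process; `next_token_id` is a placeholder for a real decoder forward pass and simply returns a fixed toy continuation so the example runs on its own.

```python
# Minimal sketch of the decoding loop the demo steps through.
# `next_token_id` stands in for a real decoder forward pass; here it just
# returns a fixed toy continuation so the example is runnable on its own.

BOS, EOS = 0, 3
TOY_CONTINUATION = [1, 2, 3]  # pretend model output ending in <EOS> (id 3)

def next_token_id(input_ids):
    """Placeholder for the decoder: predict the token for the next position."""
    return TOY_CONTINUATION[len(input_ids) - 1]

input_ids = [BOS]   # step 0: the decoder input is just <BOS>
generated = []

while True:
    token = next_token_id(input_ids)  # token for the next position
    input_ids.append(token)           # the input grows by one token per step
    generated.append(token)
    if token == EOS:
        break

print(input_ids)  # [0, 1, 2, 3]
print(generated)  # [1, 2, 3]
```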

Token Attention Focus

This shows what the current token (the query) can "look at" when predicting the next token: it can attend only to itself and to previous tokens. A code sketch of this constraint follows the legend below.

<BOS>

Indigo box with arrow: the current query token.

Green boxes: tokens the query can attend to.
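As a rough illustration of this constraint, the NumPy sketch below applies a lower-triangular (causal) mask inside scaled dot-product attention, so each query position gets zero weight on every later position. The function name and the random 4-token input are invented for the example; a real decoder layer would also include projection matrices, multiple heads, and batching.

```python
import numpy as np

def masked_self_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v have shape (seq_len, d). The output at position i is a weighted
    sum of the values at positions 0..i only.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    allowed = np.tril(np.ones(scores.shape, dtype=bool))
    scores = np.where(allowed, scores, -np.inf)        # block future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # toy embeddings for a 4-token sequence
out, attn = masked_self_attention(x, x, x)
print(np.round(attn, 2))                # the strict upper triangle is 0
```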

Attention Mask Matrix

This is the full "causal mask": rows are queries, columns are keys, and the highlighted row corresponds to the query shown above. A sketch of how this matrix is built follows the table.

Q \ K    <BOS>
<BOS>    1
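One way to build this matrix is as a lower-triangular array of ones, as in the NumPy sketch below. The token list is an invented example; at step 0 the matrix reduces to the single <BOS> row and column shown in the table above.

```python
import numpy as np

tokens = ["<BOS>", "the", "cat", "sat"]   # invented example sequence

# Row i (query) may attend to columns 0..i (keys): a lower-triangular matrix.
mask = np.tril(np.ones((len(tokens), len(tokens)), dtype=int))

# Print in the same "Q \ K" layout as the table above.
print("Q \\ K".ljust(8) + "".join(t.rjust(8) for t in tokens))
for tok, row in zip(tokens, mask):
    print(tok.ljust(8) + "".join(str(v).rjust(8) for v in row))
```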
Generated Sequence: (empty at step 0, since no tokens have been generated yet)