Phase 07

Transformers Deep Dive

Sixteen hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and a certificate come with lifetime access.

Why Transformers — The Problems with RNNs
Self-Attention from Scratch (graded)
Multi-Head Attention (graded)
Positional Encoding — Sinusoidal, RoPE, ALiBi (graded)
The Full Transformer — Encoder + Decoder (graded)
BERT — Masked Language Modeling (graded)
GPT — Causal Language Modeling (graded)
T5, BART — Encoder-Decoder Models (graded)
Vision Transformers (ViT) (graded)
Audio Transformers — Whisper Architecture (graded)
Mixture of Experts (MoE) (graded)
KV Cache, Flash Attention & Inference Optimization (graded)
Scaling Laws (graded)
Build a Transformer from Scratch — The Capstone
Attention Variants — Sliding Window, Sparse, Differential (graded)
Speculative Decoding — Draft, Verify, Repeat (graded)