Phase 07
Transformers Deep Dive
16 hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and the certificate are included with lifetime access.
- Why Transformers — The Problems with RNNs
- Self-Attention from Scratch (graded)
- Multi-Head Attention (graded)
- Positional Encoding — Sinusoidal, RoPE, ALiBi (graded)
- The Full Transformer — Encoder + Decoder (graded)
- BERT — Masked Language Modeling (graded)
- GPT — Causal Language Modeling (graded)
- T5, BART — Encoder-Decoder Models (graded)
- Vision Transformers (ViT) (graded)
- Audio Transformers — Whisper Architecture (graded)
- Mixture of Experts (MoE) (graded)
- KV Cache, Flash Attention & Inference Optimization (graded)
- Scaling Laws (graded)
- Build a Transformer from Scratch — The Capstone
- Attention Variants — Sliding Window, Sparse, Differential (graded)
- Speculative Decoding — Draft, Verify, Repeat (graded)