Phase 09

Reinforcement Learning

Twelve hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and a certificate come with lifetime access.

MDPs, States, Actions & Rewards (graded)
Dynamic Programming — Policy Iteration & Value Iteration (graded)
Monte Carlo Methods — Learning from Complete Episodes (graded)
Temporal Difference — Q-Learning & SARSA (graded)
Deep Q-Networks (DQN) (graded)
Policy Gradient — REINFORCE from Scratch (graded)
Actor-Critic — A2C and A3C (graded)
Proximal Policy Optimization (PPO) (graded)
Reward Modeling & RLHF (graded)
Multi-Agent RL (graded)
Sim-to-Real Transfer (graded)
RL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era (graded)