Phase 09
Reinforcement Learning
12 hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and a certificate come with lifetime access.
- MDPs, States, Actions & Rewards (graded)
- Dynamic Programming — Policy Iteration & Value Iteration (graded)
- Monte Carlo Methods — Learning from Complete Episodes (graded)
- Temporal Difference — Q-Learning & SARSA (graded)
- Deep Q-Networks (DQN) (graded)
- Policy Gradient — REINFORCE from Scratch (graded)
- Actor-Critic — A2C and A3C (graded)
- Proximal Policy Optimization (PPO) (graded)
- Reward Modeling & RLHF (graded)
- Multi-Agent RL (graded)
- Sim-to-Real Transfer (graded)
- RL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era (graded)