Phase 09

Reinforcement Learning

12 hands-on lessons that build AI from first principles in the browser. The lessons are free to read; graded exercises and a certificate come with lifetime access.

  1. MDPs, States, Actions & Rewards (graded)
  2. Dynamic Programming — Policy Iteration & Value Iteration (graded)
  3. Monte Carlo Methods — Learning from Complete Episodes (graded)
  4. Temporal Difference — Q-Learning & SARSA (graded)
  5. Deep Q-Networks (DQN) (graded)
  6. Policy Gradient — REINFORCE from Scratch (graded)
  7. Actor-Critic — A2C and A3C (graded)
  8. Proximal Policy Optimization (PPO) (graded)
  9. Reward Modeling & RLHF (graded)
  10. Multi-Agent RL (graded)
  11. Sim-to-Real Transfer (graded)
  12. RL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era (graded)
Curriculum based on AI Engineering from Scratch by Rohit Ghumare (MIT, used under attribution).