Phase 04

Computer Vision

28 hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and the certificate come with lifetime access.

  1. Image Fundamentals — Pixels, Channels, Color Spaces (graded)
  2. Convolutions from Scratch (graded)
  3. CNNs — LeNet to ResNet (graded)
  4. Image Classification (graded)
  5. Transfer Learning & Fine-Tuning
  6. Object Detection — YOLO from Scratch (graded)
  7. Semantic Segmentation — U-Net
  8. Instance Segmentation — Mask R-CNN
  9. Image Generation — GANs
  10. Image Generation — Diffusion Models
  11. Stable Diffusion — Architecture & Fine-Tuning
  12. Video Understanding — Temporal Modeling
  13. 3D Vision — Point Clouds & NeRFs
  14. Vision Transformers (ViT) (graded)
  15. Real-Time Vision — Edge Deployment
  16. Build a Complete Vision Pipeline — Capstone
  17. Self-Supervised Vision — SimCLR, DINO, MAE (graded)
  18. Open-Vocabulary Vision — CLIP (graded)
  19. OCR & Document Understanding
  20. Image Retrieval & Metric Learning (graded)
  21. Keypoint Detection & Pose Estimation (graded)
  22. 3D Gaussian Splatting from Scratch
  23. Diffusion Transformers & Rectified Flow
  24. SAM 3 & Open-Vocabulary Segmentation (graded)
  25. Vision-Language Models — The ViT-MLP-LLM Pattern
  26. Monocular Depth & Geometry Estimation
  27. Multi-Object Tracking & Video Memory (graded)
  28. World Models & Video Diffusion
Curriculum based on AI Engineering from Scratch by Rohit Ghumare (MIT, used under attribution).