Phase 04
Computer Vision
28 hands-on lessons building AI from first principles in the browser. Reading is free; graded exercises and a certificate are included with lifetime access.
- Image Fundamentals — Pixels, Channels, Color Spaces (graded)
- Convolutions from Scratch (graded)
- CNNs — LeNet to ResNet (graded)
- Image Classification (graded)
- Transfer Learning & Fine-Tuning
- Object Detection — YOLO from Scratch (graded)
- Semantic Segmentation — U-Net
- Instance Segmentation — Mask R-CNN
- Image Generation — GANs
- Image Generation — Diffusion Models
- Stable Diffusion — Architecture & Fine-Tuning
- Video Understanding — Temporal Modeling
- 3D Vision — Point Clouds & NeRFs
- Vision Transformers (ViT) (graded)
- Real-Time Vision — Edge Deployment
- Build a Complete Vision Pipeline — Capstone
- Self-Supervised Vision — SimCLR, DINO, MAE (graded)
- Open-Vocabulary Vision — CLIP (graded)
- OCR & Document Understanding
- Image Retrieval & Metric Learning (graded)
- Keypoint Detection & Pose Estimation (graded)
- 3D Gaussian Splatting from Scratch
- Diffusion Transformers & Rectified Flow
- SAM 3 & Open-Vocabulary Segmentation (graded)
- Vision-Language Models — The ViT-MLP-LLM Pattern
- Monocular Depth & Geometry Estimation
- Multi-Object Tracking & Video Memory (graded)
- World Models & Video Diffusion