Phase 04

Computer Vision

28 hands-on lessons that build AI from first principles in the browser. Reading is free; graded exercises and a certificate come with lifetime access.

Image Fundamentals — Pixels, Channels, Color Spaces (graded)
Convolutions from Scratch (graded)
CNNs — LeNet to ResNet (graded)
Image Classification (graded)
Transfer Learning & Fine-Tuning
Object Detection — YOLO from Scratch (graded)
Semantic Segmentation — U-Net
Instance Segmentation — Mask R-CNN
Image Generation — GANs
Image Generation — Diffusion Models
Stable Diffusion — Architecture & Fine-Tuning
Video Understanding — Temporal Modeling
3D Vision — Point Clouds & NeRFs
Vision Transformers (ViT) (graded)
Real-Time Vision — Edge Deployment
Build a Complete Vision Pipeline — Capstone
Self-Supervised Vision — SimCLR, DINO, MAE (graded)
Open-Vocabulary Vision — CLIP (graded)
OCR & Document Understanding
Image Retrieval & Metric Learning (graded)
Keypoint Detection & Pose Estimation (graded)
3D Gaussian Splatting from Scratch
Diffusion Transformers & Rectified Flow
SAM 3 & Open-Vocabulary Segmentation (graded)
Vision-Language Models — The ViT-MLP-LLM Pattern
Monocular Depth & Geometry Estimation
Multi-Object Tracking & Video Memory (graded)
World Models & Video Diffusion