Phase 17

Infrastructure and Production

28 hands-on lessons building AI from first principles in the browser. Reading is free; graded exercises and a certificate come with lifetime access.

  1. Managed LLM Platforms — Bedrock, Vertex AI, Azure OpenAI
  2. Inference Platform Economics — Fireworks, Together, Baseten, Modal, Replicate, Anyscale
  3. GPU Autoscaling on Kubernetes — Karpenter, KAI Scheduler, Gang Scheduling
  4. vLLM Serving Internals: PagedAttention, Continuous Batching, Chunked Prefill
  5. EAGLE-3 Speculative Decoding in Production
  6. SGLang and RadixAttention for Prefix-Heavy Workloads
  7. TensorRT-LLM on Blackwell with FP8 and NVFP4
  8. Inference Metrics — TTFT, TPOT, ITL, Goodput, P99 (graded)
  9. Production Quantization — AWQ, GPTQ, GGUF K-quants, FP8, MXFP4/NVFP4
  10. Cold Start Mitigation for Serverless LLMs
  11. Multi-Region LLM Serving and KV Cache Locality
  12. Edge Inference — Apple Neural Engine, Qualcomm Hexagon, WebGPU/WebLLM, Jetson
  13. LLM Observability Stack Selection
  14. Prompt Caching and Semantic Caching Economics
  15. Batch APIs — the 50% Discount as Industry Standard (graded)
  16. Model Routing as a Cost-Reduction Primitive (graded)
  17. Disaggregated Prefill/Decode — NVIDIA Dynamo and llm-d
  18. vLLM Production Stack with LMCache KV Offloading
  19. AI Gateways — LiteLLM, Portkey, Kong AI Gateway, Bifrost
  20. Shadow Traffic, Canary Rollout, and Progressive Deployment for LLMs
  21. A/B Testing LLM Features — GrowthBook, Statsig, and the Vibes Problem
  22. Load Testing LLM APIs — Why k6 and Locust Lie
  23. SRE for AI — Multi-Agent Incident Response, Runbooks, Predictive Detection
  24. Chaos Engineering for LLM Production
  25. Security — Secrets, API Key Rotation, Audit Logs, Guardrails
  26. Compliance — SOC 2, HIPAA, GDPR, PCI-DSS, EU AI Act, ISO 42001
  27. FinOps for LLMs — Unit Economics and Multi-Tenant Attribution (graded)
  28. Self-Hosted Serving Selection — llama.cpp, Ollama, TGI, vLLM, SGLang
Curriculum based on AI Engineering from Scratch by Rohit Ghumare (MIT, used under attribution).