SITP

Anatomy of an Autograd

cuDNN!

Level 1: DNN

Resources

  • GPU MODE Lecture 6: Optimizing Optimizers
  • GPU MODE Lecture 7: Advanced Quantization
  • GPU MODE Lecture 11: Sparsity
  • GPU MODE Lecture 12: Flash Attention
  • GPU MODE Lecture 13: Ring Attention
  • GPU MODE Lecture 30: Quantized Training
  • GPU MODE Lecture 60: Optimizing Linear Attention
  • GPU MODE Lecture 65: Neighborhood Attention
  • GPU MODE Lecture 73: Quantization in Large Models
  • GPU MODE Lecture 23: Tensor Cores
  • GPU MODE Lecture 15: CUTLASS
  • GPU MODE Lecture 36: CUTLASS and Flash Attention 3
  • GPU MODE Lecture 57: CuTe

Differentiable Compilation

Resources

  • GPU MODE Lecture 18: Fusing Kernels
  • GPU MODE Lecture 53: torch.compile Q&A