Afterword

To continue deepening your knowledge, the books and lectures recommended below are a good next step. You may find this book a useful companion to them: it weaves their two streams, tensor programming and tensor interpretation and compilation, into a single narrative. Once you feel comfortable, graduate to contributing to larger machine learning systems.

Good luck on your journey. I'll see you at work.

Teenygrad/Tinygrad Abstraction Correspondence

Teenygrad  | Tinygrad                          | Notes
-----------+-----------------------------------+------------------------------
OpNode     | UOp                               | Expression graph vertices
OpCode     | Ops (enum)                        | Operation types
Buffer     | Buffer                            | Device memory handles
Runtime    | Compiled (Device class)           | Memory + compute management
Allocator  | Allocator                         | Buffer allocation/free
Compiler   | Compiler                          | Source → binary compilation
Generator  | Renderer                          | IR → source code generation
Kernel     | Program (CPUProgram, CUDAProgram) | Executable kernel wrapper
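
As a rough illustration of how the abstractions in the table fit together, here is a toy sketch in Python. It borrows the names from the Teenygrad column, but every class body, method name, and signature is an assumption made for illustration; it is not the actual Teenygrad or Tinygrad API. The "binary" is simply a Python code object and the "device memory" is a Python list.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OpCode(Enum):
    """Operation types (hypothetical subset)."""
    CONST = auto()
    ADD = auto()
    MUL = auto()


@dataclass(frozen=True)
class OpNode:
    """Expression graph vertex: an operation plus its source vertices."""
    op: OpCode
    srcs: tuple = ()
    arg: float = 0.0


@dataclass
class Buffer:
    """Stand-in for a device memory handle; here it wraps a Python list."""
    data: list


class Allocator:
    """Buffer allocation (freeing is left to the garbage collector here)."""
    def alloc(self, size: int) -> Buffer:
        return Buffer([0.0] * size)


class Generator:
    """IR -> source code generation; the target 'language' is a Python expression."""
    def render(self, node: OpNode) -> str:
        if node.op is OpCode.CONST:
            return repr(node.arg)
        lhs, rhs = (self.render(s) for s in node.srcs)
        symbol = {OpCode.ADD: "+", OpCode.MUL: "*"}[node.op]
        return f"({lhs} {symbol} {rhs})"


class Compiler:
    """Source -> 'binary' compilation; the binary is a Python code object."""
    def compile(self, src: str):
        return compile(src, "<kernel>", "eval")


class Kernel:
    """Executable kernel wrapper around the compiled code object."""
    def __init__(self, binary):
        self.binary = binary

    def __call__(self) -> float:
        return eval(self.binary)


class Runtime:
    """Memory + compute management: owns an allocator and a compiler."""
    def __init__(self):
        self.allocator = Allocator()
        self.compiler = Compiler()

    def run(self, graph: OpNode) -> Buffer:
        src = Generator().render(graph)
        kernel = Kernel(self.compiler.compile(src))
        out = self.allocator.alloc(1)
        out.data[0] = kernel()
        return out


# Build the graph for 2.0 + (3.0 * 4.0), then render, compile, and run it.
two, three, four = (OpNode(OpCode.CONST, arg=v) for v in (2.0, 3.0, 4.0))
graph = OpNode(OpCode.ADD, (two, OpNode(OpCode.MUL, (three, four))))
src = Generator().render(graph)
out = Runtime().run(graph)
```

The flow mirrors the rows of the table: an OpNode graph is rendered to source by the Generator, turned into something executable by the Compiler, wrapped in a Kernel, and run by a Runtime that writes the result into a Buffer obtained from the Allocator.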

Recommended Resources

Tensor Programming

Recommended Books

  • Speech and Language Processing by Jurafsky and Martin
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
  • Deep Learning by Goodfellow, Bengio and Courville
  • Reinforcement Learning by Sutton and Barto
  • Probabilistic Machine Learning by Kevin Murphy

Recommended Lectures

  • UPenn STAT 4830: Numerical Optimization for Machine Learning by Damek Davis
  • MIT 18.S096: Matrix Calculus by Alan Edelman and Steven Johnson
  • Stanford CS124: From Languages to Information by Dan Jurafsky
  • Stanford CS229: Machine Learning by Andrew Ng
  • Stanford CS230: Deep Learning by Andrew Ng
  • Stanford CS224N: NLP with Deep Learning by Christopher Manning
  • Eureka LLM101N: Neural Networks Zero to Hero by Andrej Karpathy
  • Stanford CS336: Language Modeling from Scratch by Percy Liang
  • HuggingFace: Ultra-Scale Playbook: Training LLMs on GPU Clusters

Tensor Interpretation and Compilation

Recommended Books

  • Structure and Interpretation of Tensor Programs by j4orz
  • Programming Massively Parallel Processors by Hwu, Kirk, and El Hajj
  • Computer Architecture: A Quantitative Approach by Hennessy and Patterson

Recommended Lectures

  • CMU 10-414/714: Deep Learning Systems by Tianqi Chen and Zico Kolter
  • MLC: Machine Learning Compilers by Tianqi Chen
  • MIT 6.172: Performance Engineering by Saman Amarasinghe, Charles Leiserson and Julian Shun
  • MIT 6.S894: Accelerated Computing by Jonathan Ragan-Kelley
  • Berkeley CS267: Applications of Parallel Computers by Kathy Yelick
  • UIUC ECE408: Programming Massively Parallel Processors by Wen-mei Hwu
  • Stanford CS149: Parallel Computing by Kayvon Fatahalian
  • Stanford CS217: Hardware Accelerators for Machine Learning by Ardavan Pedram and Kunle Olukotun
  • Carnegie Mellon 18-447: Computer Architecture by Onur Mutlu
  • Carnegie Mellon 18-742: Parallel Computer Architecture by Onur Mutlu
  • ETH 227: Programming Heterogeneous Computing Systems with GPUs by Onur Mutlu