Afterword
To continue deepening your knowledge, the books and courses listed below are a good next step. You may find this book a useful companion to them, since it weaves their two streams, tensor programming and tensor interpretation and compilation, into a single narrative. Once you feel comfortable, graduate to contributing to larger machine learning systems.
Good luck on your journey. I'll see you at work.
Teenygrad/Tinygrad Abstraction Correspondence
| Teenygrad | Tinygrad | Notes |
|---|---|---|
| OpNode | UOp | Expression graph vertices |
| OpCode | Ops (enum) | Operation types |
| Buffer | Buffer | Device memory handles |
| Runtime | Compiled (Device class) | Memory + compute management |
| Allocator | Allocator | Buffer allocation/free |
| Compiler | Compiler | Source → binary compilation |
| Generator | Renderer | IR → source code generation |
| Kernel | Program (CPUProgram, CUDAProgram) | Executable kernel wrapper |
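To make the correspondence concrete, here is a minimal, hypothetical Python sketch of how the roles in the table might fit together for a single elementwise kernel. The class names follow the teenygrad column, but every field and method (`render`, `compile`, `alloc`, `realize`) is an illustrative assumption, not the actual teenygrad or tinygrad API. To stay runnable without a C toolchain, the Generator renders Python source and the Compiler uses Python's built-in `compile()` as a stand-in for a real source-to-binary step.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Hypothetical sketch: names mirror the teenygrad column of the table,
# but all fields and methods are illustrative assumptions.

class OpCode(Enum):
    LOAD = auto()  # read element i of an input buffer
    ADD = auto()
    MUL = auto()

@dataclass(frozen=True)
class OpNode:
    """A vertex in the expression graph."""
    op: OpCode
    src: tuple = ()
    arg: object = None  # for LOAD: index of the input buffer

class Buffer:
    """Device memory handle; a Python list stands in for device memory."""
    def __init__(self, data):
        self.data = data

class Allocator:
    def alloc(self, size):
        return Buffer([0.0] * size)

    def free(self, buf):
        buf.data = None

class Generator:
    """Renders an OpNode graph (the IR) into Python source for an elementwise loop."""
    def render(self, root, name="kernel"):
        def expr(n):
            if n.op is OpCode.LOAD:
                return f"ins[{n.arg}][i]"
            sym = {OpCode.ADD: "+", OpCode.MUL: "*"}[n.op]
            return f"({expr(n.src[0])} {sym} {expr(n.src[1])})"
        return (f"def {name}(out, ins, n):\n"
                f"    for i in range(n):\n"
                f"        out[i] = {expr(root)}\n")

class Compiler:
    """Turns source into a code object; a real backend would emit a binary."""
    def compile(self, src):
        return compile(src, "<kernel>", "exec")

class Kernel:
    """Wraps the compiled artifact as a callable program."""
    def __init__(self, name, lib):
        ns = {}
        exec(lib, ns)  # defines the kernel function inside ns
        self.fn = ns[name]

    def __call__(self, out, *ins):
        self.fn(out.data, [b.data for b in ins], len(out.data))

class Runtime:
    """Owns memory management (Allocator) and compute (Generator + Compiler)."""
    def __init__(self):
        self.allocator = Allocator()
        self.generator = Generator()
        self.compiler = Compiler()

    def realize(self, root, *inputs):
        out = self.allocator.alloc(len(inputs[0].data))
        src = self.generator.render(root)
        kernel = Kernel("kernel", self.compiler.compile(src))
        kernel(out, *inputs)
        return out

# Usage: evaluate (a + b) * a elementwise.
rt = Runtime()
a, b = Buffer([1.0, 2.0, 3.0]), Buffer([10.0, 20.0, 30.0])
x, y = OpNode(OpCode.LOAD, arg=0), OpNode(OpCode.LOAD, arg=1)
graph = OpNode(OpCode.MUL, (OpNode(OpCode.ADD, (x, y)), x))
print(rt.realize(graph, a, b).data)  # [11.0, 44.0, 99.0]
```

The design point worth noticing is that the Runtime holds both an Allocator and the Generator/Compiler pair, which is the "memory + compute management" role the table assigns to tinygrad's `Compiled` device class.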
Recommended Resources
Tensor Programming
Recommended Books
- Speech and Language Processing by Jurafsky and Martin
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Deep Learning by Goodfellow, Bengio, and Courville
- Reinforcement Learning by Sutton and Barto
- Probabilistic Machine Learning by Kevin Murphy
Recommended Lectures
- UPenn STAT 4830: Numerical Optimization for Machine Learning by Damek Davis
- MIT 18.S096: Matrix Calculus by Alan Edelman and Steven Johnson
- Stanford CS124: From Languages to Information by Dan Jurafsky
- Stanford CS229: Machine Learning by Andrew Ng
- Stanford CS230: Deep Learning by Andrew Ng
- Stanford CS224N: NLP with Deep Learning by Christopher Manning
- Eureka Labs LLM101n and Neural Networks: Zero to Hero by Andrej Karpathy
- Stanford CS336: Language Modeling from Scratch by Percy Liang
- Hugging Face: The Ultra-Scale Playbook: Training LLMs on GPU Clusters
Tensor Interpretation and Compilation
Recommended Books
- Structure and Interpretation of Tensor Programs by j4orz
- Programming Massively Parallel Processors by Kirk, Hwu, and El Hajj
- Computer Architecture: A Quantitative Approach by Hennessy and Patterson
Recommended Lectures
- CMU 10-414/714: Deep Learning Systems by Tianqi Chen and Zico Kolter
- MLC: Machine Learning Compilers by Tianqi Chen
- MIT 6.172: Performance Engineering of Software Systems by Saman Amarasinghe, Charles Leiserson, and Julian Shun
- MIT 6.S894: Accelerated Computing by Jonathan Ragan-Kelley
- Berkeley CS267: Applications of Parallel Computers by Kathy Yelick
- UIUC ECE408: Programming Massively Parallel Processors by Wen-mei Hwu
- Stanford CS149: Parallel Computing by Kayvon Fatahalian
- Stanford CS217: Hardware Accelerators for Machine Learning by Ardavan Pedram and Kunle Olukotun
- Carnegie Mellon 18-447: Computer Architecture by Onur Mutlu
- Carnegie Mellon 18-742: Parallel Computer Architecture by Onur Mutlu
- ETH 227: Programming Heterogeneous Computing Systems with GPUs by Onur Mutlu