Afterword

To continue deepening your knowledge, the following books and courses are good next steps. You may find this book a useful complement to them, since it weaves the two streams, tensor programming and tensor interpretation and compilation, into a single narrative. Once you feel comfortable, you should graduate to contributing to larger machine learning systems.

Good luck on your journey. I'll see you at work.

Teenygrad/Tinygrad Abstraction Correspondence

Teenygrad	Tinygrad	Notes
`OpNode`	`UOp`	Expression graph vertices
`OpCode`	`Ops` (enum)	Operation types
`Buffer`	`Buffer`	Device memory handles
`Runtime`	`Compiled` (Device class)	Memory + compute management
`Allocator`	`Allocator`	Buffer allocation/free
`Compiler`	`Compiler`	Source → binary compilation
`Generator`	`Renderer`	IR → source code generation
`Kernel`	`Program` (CPUProgram, CUDAProgram)	Executable kernel wrapper
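To make the correspondence more concrete, here is a minimal Python sketch of how these roles might fit together for one elementwise computation. It is hypothetical: the class names follow the Teenygrad column, but every method and signature, and the use of Python's built-in `compile`/`exec` as a stand-in "compiler", are illustrative assumptions. This is not the API of either library.

```python
# Hypothetical sketch: class names follow the Teenygrad column of the table,
# but the methods and signatures are illustrative, not either library's API.
from __future__ import annotations
import array
from dataclasses import dataclass
from enum import Enum, auto

class OpCode(Enum):
    LOAD = auto()  # read element i of an input buffer
    ADD = auto()
    MUL = auto()

@dataclass(frozen=True)
class OpNode:
    """One vertex of the expression graph."""
    op: OpCode
    srcs: tuple[OpNode, ...] = ()
    arg: int | None = None  # for LOAD: index of the input buffer

class Buffer:
    """Stand-in for a device memory handle; here it is plain host memory."""
    def __init__(self, n: int):
        self.data = array.array("f", [0.0] * n)

class Allocator:
    def alloc(self, n: int) -> Buffer:
        return Buffer(n)  # a real allocator would also free device memory

class Generator:
    """Renders an OpNode graph (the IR) into Python source for an elementwise loop."""
    SYMBOLS = {OpCode.ADD: "+", OpCode.MUL: "*"}

    def render(self, root: OpNode) -> str:
        def expr(node: OpNode) -> str:
            if node.op is OpCode.LOAD:
                return f"ins[{node.arg}][i]"
            lhs, rhs = (expr(s) for s in node.srcs)
            return f"({lhs} {self.SYMBOLS[node.op]} {rhs})"
        return ("def kernel(out, ins, n):\n"
                "    for i in range(n):\n"
                f"        out[i] = {expr(root)}\n")

class Compiler:
    """Turns source into something executable (Python bytecode in this sketch)."""
    def compile(self, src: str):
        namespace: dict = {}
        exec(compile(src, "<kernel>", "exec"), namespace)  # builtin compile
        return namespace["kernel"]

class Kernel:
    """Executable wrapper around a compiled function."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, out: Buffer, ins: list[Buffer], n: int) -> None:
        self.fn(out.data, [b.data for b in ins], n)

class Runtime:
    """Ties memory management (Allocator) to compute (Generator, Compiler, Kernel)."""
    def __init__(self):
        self.allocator = Allocator()
        self.generator = Generator()
        self.compiler = Compiler()

    def realize(self, root: OpNode, ins: list[Buffer], n: int) -> Buffer:
        out = self.allocator.alloc(n)
        kernel = Kernel(self.compiler.compile(self.generator.render(root)))
        kernel(out, ins, n)
        return out

# Usage: compute a * b + a elementwise.
rt = Runtime()
a, b = rt.allocator.alloc(3), rt.allocator.alloc(3)
a.data[:] = array.array("f", [1.0, 2.0, 3.0])
b.data[:] = array.array("f", [4.0, 5.0, 6.0])
A, B = OpNode(OpCode.LOAD, arg=0), OpNode(OpCode.LOAD, arg=1)
graph = OpNode(OpCode.ADD, (OpNode(OpCode.MUL, (A, B)), A))
out = rt.realize(graph, [a, b], 3)
print(list(out.data))  # [5.0, 12.0, 21.0]
```

In this sketch, keeping rendering (Generator) separate from compilation (Compiler) means a different backend could emit another source language without changing the graph or the runtime loop.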

Tensor Programming

Recommended Books

Speech and Language Processing by Jurafsky and Martin
The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
Deep Learning by Goodfellow, Bengio and Courville
Reinforcement Learning by Sutton and Barto
Probabilistic Machine Learning by Kevin Murphy

Recommended Lectures

UPenn STAT 4830: Numerical Optimization for Machine Learning by Damek Davis
MIT 18.S096: Matrix Calculus by Alan Edelman and Steven Johnson
Stanford CS124: From Languages to Information by Dan Jurafsky
Stanford CS229: Machine Learning by Andrew Ng
Stanford CS230: Deep Learning by Andrew Ng
Stanford CS224N: NLP with Deep Learning by Christopher Manning
Eureka LLM101N: Neural Networks Zero to Hero by Andrej Karpathy
Stanford CS336: Language Modeling from Scratch by Percy Liang
Hugging Face: The Ultra-Scale Playbook: Training LLMs on GPU Clusters

Tensor Interpretation and Compilation

Recommended Books

Structure and Interpretation of Tensor Programs by j4orz
Programming Massively Parallel Processors by Hwu, Kirk, and El Hajj
Computer Architecture: A Quantitative Approach by Hennessy and Patterson

Recommended Lectures

CMU 10-414/714: Deep Learning Systems by Tianqi Chen and Zico Kolter
MLC: Machine Learning Compilation by Tianqi Chen
MIT 6.172: Performance Engineering by Saman Amarasinghe, Charles Leiserson and Julian Shun
MIT 6.S894: Accelerated Computing by Jonathan Ragan-Kelley
Berkeley CS267: Applications of Parallel Computers by Katherine Yelick
UIUC ECE408: Programming Massively Parallel Processors by Wen-mei Hwu
Stanford CS149: Parallel Computing by Kayvon Fatahalian
Stanford CS217: Hardware Accelerators for Machine Learning by Ardavan Pedram and Kunle Olukotun
Carnegie Mellon 18-447: Computer Architecture by Onur Mutlu
Carnegie Mellon 18-742: Parallel Computer Architecture by Onur Mutlu
ETH 227: Programming Heterogeneous Computing Systems with GPUs by Onur Mutlu
