Preface
As a compiler writer for domain-specific cloud languages, I became frustrated with how non-constructive and disjointed my learning experience was in the discipline of machine learning systems, particularly with domain-specific tensor languages for training deep neural networks. The course notes you are currently reading are my personal answer to those frustrations. They are:
- inspired by the interpreter writers (Schemers) and compiler writers (MLers) who teach introductory programming with the canon of SICP/HtDP/PAPL/DCIC/OCEB. SITP carries that same whirlwind-tour form but applies it to the substance of deep neural networks: where the fundamental data structure of SICP is the number, of HtDP the set, and of DCIC the table, SITP's is the ndarray. I've tried my best to write SITP so that however you feel about those books, you'll feel the same way about this one.
- built on open-source code and explainers, which draw on the strongest talent in the world. The deep learning framework `teenygrad` implemented throughout the book shares 90% of its abstractions with `tinygrad`, and is developed to the point where it can run the training and inference of nanogpt. This places the difficulty of the book and its contents nicely between the "toy" level, which is too trivial, and the "production" level, which is too complex.
Because the book concerns itself with the low-level programming of deep learning systems, we will be programming against language, platform, and architecture specifics. The `teenygrad` deep learning framework you develop throughout the book has a Python core for productivity, with CPU- and GPU-accelerated kernels implemented in Rust and CUDA Rust for their amenability to native acceleration. You are more than welcome to follow along with your own choice of host language and implementation. For instance, feel free to swap out Python for JavaScript¹, Rust for C++, etc.²
With that said, the book has three parts:
- in part 1 you implement a multidimensional `Tensor` and accelerated `BLAS` kernels
- in part 2 you implement `.backward()` and accelerated `cuBLAS` kernels for the age of research
- in part 3 you implement a fusion compiler with an `OpNode` graph IR for the age of scaling
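To make the arc of those three parts concrete, here is a minimal sketch of the kind of user-facing API the book builds toward: a `Tensor` that records the graph of operations applied to it and replays them in reverse when you call `.backward()`. The names `Tensor` and `.backward()` follow the book's conventions above, but everything else (the constructor, the numpy backing, the specific ops) is an illustrative assumption, not `teenygrad`'s actual implementation.

```python
# A minimal reverse-mode autodiff sketch, NOT teenygrad's real code:
# only the names Tensor and backward() come from the book's roadmap.
import numpy as np

class Tensor:
    def __init__(self, data, _parents=(), _backward=lambda: None):
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = np.zeros_like(self.data)
        self._parents = _parents      # tensors this one was computed from
        self._backward = _backward    # propagates out.grad to the parents

    def __mul__(self, other):
        out = Tensor(self.data * other.data, _parents=(self, other))
        def _backward():
            # product rule: d(out)/d(self) = other, d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def sum(self):
        out = Tensor(self.data.sum(), _parents=(self,))
        def _backward():
            # broadcast the scalar gradient back over every input element
            self.grad += np.ones_like(self.data) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topologically order the graph, then apply the chain rule in reverse
        topo, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                topo.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(topo):
            t._backward()

x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])
loss = (x * y).sum()
loss.backward()
print(loss.data)  # 32.0
print(x.grad)     # [4. 5. 6.]  (= y.data, by the product rule)
```

The point of the sketch is proportion: the autograd scaffolding stays this small, while the heavy lifting in the book happens beneath operations like `*` and `sum`, in the accelerated kernels of parts 1 and 2 and the fusion compiler of part 3.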
If you empathize with some of my frustrations, you may benefit from the course too³. Good luck on your journey. Are you ready to begin?