Abelson, Harold, and Gerald Jay Sussman. 1996. Structure and Interpretation of Computer Programs, Second Edition. MIT Press.
Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2015. Compilers: Principles, Techniques, & Tools. Pearson.
Ansel, Jason, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, et al. 2024. “PyTorch 2: Faster Machine Learning through Dynamic Python Bytecode Transformation and Graph Compilation.” In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’24), Volume 2. ACM, April.
Austin, Jacob, et al. 2025. “How to Scale Your Model.” Google DeepMind.
Baydin, Atilim Gunes, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2015. “Automatic Differentiation in Machine Learning: A Survey.” arXiv.
Blondel, Mathieu, and Vincent Roulet. 2024. “The Elements of Differentiable Programming.” arXiv.
Boehm, Simon. 2022. “How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: A Worklog.” siboehm.com.
Bright, Paige, Alan Edelman, and Steven G. Johnson. 2025. “Matrix Calculus (for Machine Learning and Beyond).” arXiv.
Bryant, Randal E., and David R. O’Hallaron. 2016. Computer Systems: A Programmer’s Perspective. Boston: Pearson Education.
Chen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, et al. 2018. “TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.” arXiv, February.
Cho, Kyunghyun. 2025. “Machine Learning: A Lecture Note.” arXiv.
Cooper, Keith D, and Linda Torczon. 2022. Engineering a Compiler. Morgan Kaufmann.
Cormen, Thomas H, Charles Eric Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge, Massachusetts: The MIT Press.
Griewank, Andreas, and Andrea Walther. 2009. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia: Society for Industrial and Applied Mathematics.
Hack, Sebastian. 2007. Register Allocation for Programs in SSA Form. PhD diss., Universität Karlsruhe.
Harris, Charles R., K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. “Array Programming with NumPy.” Nature 585 (7825): 357–62.
Harris, Sarah, and David Harris. 2021. Digital Design and Computer Architecture: RISC-V Edition. Morgan Kaufmann.
Hennessy, John L., and David A. Patterson. 2025. Computer Architecture: A Quantitative Approach. Cambridge, MA: Morgan Kaufmann.
Hwu, Wen-Mei W., David B. Kirk, and Izzat El Hajj. 2022. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann.
Jurafsky, Dan, and James H. Martin. 2025. “Speech and Language Processing.” stanford.edu.
Kang, Wanmo, and Kyunghyun Cho. 2025. “Linear Algebra for Data Science.”
Klein, Philip N. 2013. Coding the Matrix: Linear Algebra through Applications to Computer Science. Newton, MA: Newtonian Press.
Krishnamurthi, Shriram. 2025. “Programming Languages: Application and Interpretation.”
MacKay, David J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.
Møller, Anders, and Michael I. Schwartzbach. 2024. “Static Program Analysis.” cs.au.dk.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. Cambridge: MIT Press.
Murphy, Kevin P. 2023. Probabilistic Machine Learning: Advanced Concepts. MIT Press.
Naumann, Uwe. 2012. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. Philadelphia: Society for Industrial and Applied Mathematics.
Ng, Andrew, and Tengyu Ma. 2023. CS229 Lecture Notes. Stanford University.
Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” arXiv.
Ragan-Kelley, Jonathan, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. “Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines.” In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13). ACM.
Shankhdhar, Pranjal. 2025. “Outperforming cuBLAS on H100: A Worklog.”
Spector, Benjamin F., Simran Arora, Aaryan Singhal, Daniel Y. Fu, and Christopher Ré. 2024. “ThunderKittens: Simple, Fast, and Adorable AI Kernels.” arXiv.
Stepanov, Alexander A., and Daniel E. Rose. 2015. From Mathematics to Generic Programming. Upper Saddle River, NJ: Addison-Wesley.
Suhan, Alex, Davide Libenzi, Ailing Zhang, Parker Schuh, Brennan Saeta, Jie Young Sohn, and Denys Shabalin. 2021. “LazyTensor: Combining Eager Execution with Domain-Specific Compilers.” arXiv.
Tarjan, Robert E. 1988. Data Structures and Network Algorithms. Philadelphia: Society for Industrial and Applied Mathematics.
Tazi, Nouamane, et al. 2025. “The Ultra-Scale Playbook: Training LLMs on GPU Clusters.” Hugging Face.
Trefethen, Lloyd N., and David Bau. 1997. Numerical Linear Algebra. SIAM.