Bibliography

  • Abelson, Harold, and Gerald Jay Sussman. 1996. Structure and Interpretation of Computer Programs, Second Edition. MIT Press.
  • Aho, Alfred V., Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2015. Compilers: Principles, Techniques, & Tools. Pearson.
  • Ansel, Jason, Edward Yang, Horace He, Natalia Gimelshein, Animesh Jain, Michael Voznesensky, Bin Bao, et al. 2024. “PyTorch 2: Faster Machine Learning through Dynamic Python Bytecode Transformation and Graph Compilation.” In Proceedings of ASPLOS ’24. ACM.
  • Anthropic. n.d. “Transformer Circuits Thread.” Transformer-circuits.pub.
  • Austin, Jacob, et al. 2025. “How to Scale Your Model.” Google DeepMind.
  • Baydin, Atilim Gunes, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2015. “Automatic Differentiation in Machine Learning: A Survey.” ArXiv.org.
  • Blondel, Mathieu, and Vincent Roulet. 2024. “The Elements of Differentiable Programming.” ArXiv.org.
  • Boehm, Simon. 2022. “How to Optimize a CUDA Matmul Kernel for CuBLAS-like Performance: A Worklog.” Siboehm.com.
  • Bright, Paige, Alan Edelman, and Steven G. Johnson. 2025. “Matrix Calculus (for Machine Learning and Beyond).” ArXiv.org.
  • Bryant, Randal E., and David R. O’Hallaron. 2016. Computer Systems: A Programmer’s Perspective. Boston: Pearson Education.
  • Chen, Tianqi, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, et al. 2018. “TVM: An Automated End-To-End Optimizing Compiler for Deep Learning.” ArXiv.org.
  • Cho, Kyunghyun. 2025. “Machine Learning: A Lecture Note.” ArXiv.org.
  • Cooper, Keith D., and Linda Torczon. 2022. Engineering a Compiler. Morgan Kaufmann.
  • Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press.
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge, Massachusetts: The MIT Press.
  • Griewank, Andreas, and Andrea Walther. 2009. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia: Society for Industrial and Applied Mathematics.
  • Hack, Sebastian. 2007. Register Allocation for Programs in SSA Form. PhD diss., Universität Karlsruhe.
  • Harris, Charles R., K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. “Array Programming with NumPy.” Nature 585 (7825): 357–62.
  • Harris, Sarah L., and David Harris. 2021. Digital Design and Computer Architecture: RISC-V Edition. Morgan Kaufmann.
  • Hennessy, John L., and David A. Patterson. 2025. Computer Architecture: A Quantitative Approach. Cambridge, MA: Morgan Kaufmann.
  • Hwu, Wen-Mei W., David B. Kirk, and Izzat El Hajj. 2022. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann.
  • Jurafsky, Dan, and James H. Martin. 2025. “Speech and Language Processing.” Stanford.edu.
  • Kang, Wanmo, and Kyunghyun Cho. 2025. “Linear Algebra for Data Science.”
  • Klein, Philip N. 2013. Coding the Matrix: Linear Algebra through Applications to Computer Science. Newton, MA: Newtonian Press.
  • Krishnamurthi, Shriram. 2025. “Programming Languages: Application and Interpretation.”
  • MacKay, David J. C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.
  • Møller, Anders, and Michael I. Schwartzbach. 2024. “Static Program Analysis.” Cs.au.dk.
  • Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. MIT Press.
  • Murphy, Kevin P. 2023. Probabilistic Machine Learning: Advanced Topics. MIT Press.
  • Naumann, Uwe. 2012. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. Philadelphia: Society for Industrial and Applied Mathematics.
  • Ng, Andrew, and Tengyu Ma. 2023. “CS229 Lecture Notes.” Stanford University.
  • Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, et al. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” ArXiv.org.
  • Ragan-Kelley, Jonathan, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. “Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines.” In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13).
  • Rastello, Fabrice, and Florent Bouchez Tichadou. 2022. SSA-Based Compiler Design. Springer Nature.
  • Shankhdhar, Pranjal. 2025. “Outperforming CuBLAS on H100: A Worklog.”
  • Spector, Benjamin F., Simran Arora, Aaryan Singhal, Daniel Y. Fu, and Christopher Ré. 2024. “ThunderKittens: Simple, Fast, and Adorable AI Kernels.” ArXiv.org.
  • Stepanov, Alexander A., and Daniel E. Rose. 2015. From Mathematics to Generic Programming. Upper Saddle River, NJ: Addison-Wesley.
  • Suhan, Alex, Davide Libenzi, Ailing Zhang, Parker Schuh, Brennan Saeta, Jie Young Sohn, and Denys Shabalin. 2021. “LazyTensor: Combining Eager Execution with Domain-Specific Compilers.” ArXiv.org.
  • Tarjan, Robert E. 1988. Data Structures and Network Algorithms. Philadelphia: Society for Industrial and Applied Mathematics.
  • Tazi, Nouamane, et al. 2025. “The Ultra-Scale Playbook: Training LLMs on GPU Clusters.” Hugging Face.
  • Trefethen, Lloyd N., and David Bau. 1997. Numerical Linear Algebra. Philadelphia: SIAM.