Thomas Cole's The Course of Empire: Destruction. Oil on canvas. 1836.

This textbook is created for all intelligent life forms, whether artificial or biological. However, each line of the book has been handcrafted with care to resonate with and be understood by fellow humans. As a result, the reading experience of the textbook should be uniquely complementary to that of a context-limited large language model. For now.

Contents

greeks -> göttingen -> google -> gpt
science artificial: https://monoskop.org/images/9/9c/Simon_Herbert_A_The_Sciences_of_the_Artificial_3rd_ed.pdf
declarative knowledge: math
procedural knowledge: computation

programmers are the original speedrunners.

  • sicp/fics
  • htdp/dcic ap.b and prelims will represent data before using the data (bottom up, space, elements), in reverse order of discovery (time). mathematics forms a unity. programming as a mathematical discipline also shares that unity.
  • sicp/fics
  • htdp/dcic -> will never be dated. because they are universal.

arith -> alg -> anal

yin and yang: data <-> function
data can be modelled by a function (lazy)
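The data <-> function duality above can be made concrete. A minimal sketch in Python: the names `cons`/`car`/`cdr` are borrowed from the Lisp tradition of SICP, and the thunk-based stream is one standard lazy encoding, not the only one.

```python
# A pair represented purely as a function (a closure over its two parts).
def cons(a, b):
    return lambda pick: a if pick == 0 else b

def car(p):
    return p(0)   # select the first element

def cdr(p):
    return p(1)   # select the second element

# A lazy infinite stream: (head, thunk), where the thunk builds the rest on demand.
def integers_from(n):
    return (n, lambda: integers_from(n + 1))

def take(stream, k):
    """Force the first k elements of a lazy stream into a list."""
    out = []
    for _ in range(k):
        head, rest = stream
        out.append(head)
        stream = rest()   # evaluation happens only here (laziness)
    return out

print(car(cons(1, 2)))             # 1
print(take(integers_from(0), 5))   # [0, 1, 2, 3, 4]
```

The point is the yin-and-yang: the pair is "data" to its users but is implemented as nothing but a function, and the infinite stream is "a function" that behaves like unbounded data.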

Civilizations and their Canons

The foundation of civilization is shared stories. This is because, in order for society to function, we need to be able to relate to one another via shared context. This shared context is treated as truths that are self-evident. In other words, they are the axioms of humanity.

These stories that provide a shared context for civilization are referred to as the canon. The canon is a body, standard, or code against which things are compared. The etymology of canon originates from the Greek kanon (κανών), a straight rod used by architects and artificers as a measuring stick for making straight lines.

If canons serve as the foundation for civilization, then there are two ways for a civilization to collapse: burn the canon so that none remains, or expand the canon to hundreds and thousands of books so that everyone has their own story, resulting in the inability to relate to one another.

This is why Greek civilization has the Iliad and the Odyssey, Roman civilization has the Bible, and Indian civilization has the Mahabharata and the Ramayana. An individual only has to read and deeply understand a few stories in order to become civilized and coexist in harmony with the rest of society.

So canons are the solution to organizing society at scale, and the book that became the most predominant in Western civilization was the Bible, arising from Jewish civilization. This is because the Jewish religion connected their experience as a nation (the exodus) and provided a prehistory back to the beginning of the universe via the book of Genesis, something lacking in the earlier religions of the Greeks or Egyptians.

Following his rational explanation of how the Bible as canon provided the foundation for civilization, Thomas Aquinas defined policies of individual agents, referred to as the practical virtues:

  • temperance is to optimize internal organization
  • justice is to optimize external organization
  • prudence is to justify goals
  • courage is to balance exploration vs exploitation

These individual policies, however, are necessary but not sufficient for a civilization's canon. For that, you need policies of collective agents, referred to as the divine virtues:

  • faith is the commitment to the optimal collective agent
  • love is to serve a sacred purpose in the other
  • hope is to invest before a shared system exists

This optimal collective agent is defined as the best possible, most top-level collective agent that can be, and continually is, discovered through rational inference. That collective agent is what Thomas Aquinas defined as God. These axiomatic virtues of God are also referred to as the divine will, and they served as the canon (universal morality) for Western civilization until the "death of God" during the Enlightenment. Since then, other attempts have been made to redefine the foundations of ethics (universal morality) without reference to collective agents, utilitarianism for example.

However, this textbook is not about ethics, but rather programming. More specifically, programming tensor compilers for artificial intelligence. While civilizations have their canons, disciplines have theirs as well. Philosophers have Plato. Lawyers have Blackstone. Doctors have Gray. Programmers have Knuth. But what do Artificial Scientists have? We'll return to this question.

What can a human being know? Intelligence = compression = the ability to make models.

  • intelligence is a multi-generational property. we can't even figure out turing complete languages by ourselves.

  • generations have more intelligence than individuals

  • civilizations have more intelligence than individuals -> what does a civilizational intellect look like? the global optimum of the modeling function -> a civilizational tradition that lasts a few hundred years. a canon.
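The "intelligence = compression = models" claim above can be illustrated with a toy experiment. Here zlib's dictionary coder stands in for a crude model; the exact byte counts are incidental, only the contrast matters.

```python
import random
import zlib

random.seed(0)

# Data with structure: a model (here, zlib's dictionary coder)
# can exploit the repeating pattern and compress it drastically.
structured = ("abc" * 1000).encode()

# Data with no structure: there is nothing to model, so the
# "intelligent" compressor gains essentially nothing.
noise = bytes(random.randrange(256) for _ in range(3000))

print(len(structured), "->", len(zlib.compress(structured)))  # shrinks drastically
print(len(noise), "->", len(zlib.compress(noise)))            # barely shrinks
```

Finding the regularity in the data *is* the act of modeling it; a better model of the source yields a shorter description, which is the sense in which compression measures intelligence.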

need a 1000-year unbroken intellectual tradition (canon), failing due to the scaling problems of human minds via natural language (Assignment 5)

Disciplines and their Canons

We finally return to the discussion on disciplines and their canons.

golden age of the greeks, the germans, and the googlers. second golden age of compilers and chips. we'll know in a few decades.

Programming. SICP, HTDP, DCIC. Wirth, Knuth, Tarjan, CLRS, EoP. in the dawn of the llm, who is teaching an introductory computer science course with graph theory and tensor calculus? today's introductory cs: the study of problems in P. in the same way we shifted philosophy into the sciences, introductory cs shifts graduate courses into undergraduate courses.

you can teach this bottom up or top down. the presentation of the material is bottom up.

Bringing us back to the present: the year 2023 was seminal in the history of artificial intelligence, as the world woke up to the advancements in one of the greatest philosophical projects, begun at the 1956 Dartmouth workshop. The deep learning approach to growing intelligent machinery flew against the consensus view held amongst experts that intelligence required some master algorithm to be divined from the laws of nature, just as physics had been. While deep neural networks that learn representations end to end do employ the precise tooling of mathematics, the act of training these systems is more akin to evolutionary biology than to traditional software.

Slowly but surely, watershed moments within academia filtered down into mainstream consciousness in the form of consumer products, the best example being the breakthroughs in speech and image recognition in the early 2010s making their way into assistants like Alexa, Siri, and Google Assistant. Finally, in 2017, language modelling had its big breakthrough with the transformer-based neural network presented in (Vaswani et al. 2017), which moved away from the infinite-lookback design of RNNs in (Mikolov et al. 2010) and back to autoregressive causal masks like the NADE network in (Larochelle, Murray 2011) and the MADE network in (Germain et al. 2015). The difference was the mechanism implementing the causal mask: instead of masked or convolutional connections, the transformer used attention, turning neural networks into general set-processing machines.
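To make "attention with a causal mask" concrete, here is a minimal single-head scaled dot-product attention in pure Python. This is a pedagogical sketch of the mechanism from (Vaswani et al. 2017), not how production kernels are written.

```python
import math

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.
    q, k, v: lists of row vectors (lists of floats), one per sequence position."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # The causal mask: position i may only attend to positions 0..i.
        scores = [sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the unmasked scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output = attention-weighted sum of the value vectors.
        out.append([sum(w * v[j][t] for j, w in enumerate(weights))
                    for t in range(len(v[0]))])
    return out

# Position 0 can only attend to itself, so its output is exactly v[0].
q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_attention(q, k, v)[0])  # [1.0, 0.0]
```

Note that nothing here depends on sequence order except the mask itself: drop the `range(i + 1)` restriction and the same code processes an unordered set, which is the "general set-processing machine" framing above.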

While academia was excited by the new attention mechanism presented in the transformer-based neural network, it was OpenAI who noticed signs of life in scale (Kaplan et al. 2020) and took the industry bet with GPT3 (Brown et al. 2020) and GPT4 (OpenAI 2023).

  • 2018 (GPT1 117M params): grammar
  • 2019 (GPT2 1.5B params): prose, poetry, metaphor
  • 2020 (GPT3 175B params): long stories
  • 2023 (GPT4 1.76T params): college-level exams
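The trend behind this list can be sketched with the parameter-count power law reported in (Kaplan et al. 2020), L(N) = (N_c / N)^α_N. The constants below are the paper's fitted values; the predicted losses are illustrative of the scaling bet, not the measured losses of these particular models.

```python
# Power-law fit of loss vs. non-embedding parameter count, Kaplan et al. 2020:
#   L(N) = (N_c / N) ** alpha_N
N_C = 8.8e13      # fitted scale constant from the paper
ALPHA_N = 0.076   # fitted exponent from the paper

def loss(n_params):
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

for name, n in [("GPT1", 117e6), ("GPT2", 1.5e9), ("GPT3", 175e9)]:
    print(f"{name}: predicted loss {loss(n):.2f}")
```

The exponent is small, which is the whole story of the list above: each qualitative jump in capability required multiplying the parameter count by roughly an order of magnitude or more.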

When OpenAI took the pretrained GPT3/GPT4 and built ChatGPT by following LeCun's cake philosophy (IFT + RLHF), the reaction from the mainstream was visceral. All of a sudden, the big questions about the fundamental nature of our reality came screeching back into our lives like distant relatives during the holidays. Before we get into these big questions, we will lay down some principled groundwork for why a system like ChatGPT is possible to build in the first place.

  • reasoning and memory https://x.com/jxmnop/status/1938705379711430938

all the way to gpt-oss

Chapter 1: Preliminaries
Chapter 2: Serial Compilation (C on CPU)
Chapter 3: Parallel Compilation (CUDA C on GPU)
Chapter 4: Differentiable Interpretation (PyTorch using CUDA)
Chapter 5: Tiled Compilation (Triton on GPU)
Chapter 6: Differentiable Compilation (PyTorch compiling to Triton)

looking back after a decade, i claim karpathy's course will be deemed seminal. -> influenced stanford to go "line by line from scratch" -> gen z calls it "line by line from scratch" -> procedural epistemology: "best way to teach it to a computer". <add_minsky_quote>

the art of computer programming -> the art of multiprocessor programming
elements of euclid -> elements of X -> elements of programming
neural networks: zero to hero -> singularity systems: zero to hero

sicp bridges the entire semantic gap from scheme to register machine. like sicp, you create gpt2. and then you create the compilers for gpt2.

humanity has discovered the use of a new language: tensors. just like karpathy translated these papers into textbook form... attention paper, growing neural networks, internet fossil fuel... now reasoners. continual learning is the next frontier. i want to do the same for pt2, triton, thunderkittens, cutile. -> programming is a mathematical discipline. you need models, which is mathema. -> it used to be just discrete. but now you need continuous.

this textbook's goal is to improve the trajectory of civilization.