Briefing: The Bitter Lesson Was Right. Then the Sun Got Involved.
Published: May 2, 2026 | Source: ejsays.com | Author: E. J. | Original article: https://posts.ejsays.com/the-bitter-lesson-was-right-then-the-sun-got-involved/
Core claim: Richard Sutton's Bitter Lesson (2019) was most true at the moment of publication. It is slightly less true today. It will be less true still in the future. Not because scaling stopped working — but because the assumptions underneath it are quietly falling apart.
The Bitter Lesson (2019): Sutton's ~1,125-word essay argued that seventy years of AI history show one repeating pattern: brute force always wins. Methods that leverage computation outperform methods that leverage human insight. The lesson is bitter because it implies that the clever thing researchers spent their careers building will be outrun by someone with a bigger cluster.
The silent engine nobody mentioned: The Bitter Lesson only works if compute keeps getting cheaper. Moore's Law is the unstated foundation of the entire argument. Without reliably falling cost-per-computation, "just scale" is not a viable strategy. At 2nm nodes and below, quantum tunneling and heat dissipation become unsolvable engineering problems. The cost per transistor, which fell for decades, will likely rise. The new generation is no longer guaranteed to be cheaper than the last.
The sun argument: A photon born in the solar core spends thousands of years bouncing through dense plasma before its eight-minute journey to Earth. Computing every photon path would require more energy than the photon itself carries. Some processes are their own fastest simulation; the universe has no shortcut for itself. Scaling is at most a local phenomenon: it cannot reach processes where the computation costs more than the thing being computed.
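The "thousands of years" figure follows from a standard random-walk estimate, sketched below. The solar radius and speed of light are well-known values; the ~1 mm mean free path is an assumed round number (it varies enormously with depth), so the result is an order-of-magnitude illustration, not a precise figure.

```python
# Back-of-envelope random-walk estimate of a photon's escape time from the
# solar core. The mean free path l is an assumed illustrative value.
R = 7.0e8   # solar radius, m
l = 1e-3    # assumed photon mean free path in solar plasma, m
c = 3.0e8   # speed of light, m/s

# A random walk needs ~N^2 steps to cover a straight-line distance of N steps.
steps = (R / l) ** 2
escape_time_s = steps * l / c   # total zigzag path length divided by c
escape_time_years = escape_time_s / (365.25 * 24 * 3600)
print(f"roughly {escape_time_years:,.0f} years to escape the core")
```

With these inputs the walk takes on the order of tens of thousands of years, consistent with commonly cited estimates.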
The monkey theorem reframe: The infinite monkey theorem requires astronomically more monkeys than there are atoms in the observable universe to produce Hamlet. But even if a monkey types Hamlet, someone must find it: a second army of readers checking output, page by page. William Shakespeare produced Hamlet on roughly 20 watts of compute, the power budget of a human brain. Human intelligence shows up either at the generation stage or the evaluation stage. Brute force was never the powerful choice. It was always the expensive one dressed as pragmatism.
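The scale of that claim can be checked with a few lines of arithmetic. The character count and alphabet size below are illustrative assumptions (Hamlet is commonly estimated at well over 100,000 characters), but the conclusion is insensitive to them:

```python
import math

# Illustrative assumptions: Hamlet ~ 130,000 characters drawn uniformly
# from a 27-symbol alphabet (26 letters plus space).
chars = 130_000
alphabet = 27

# Expected random attempts ~ alphabet^chars; work in log10 to avoid overflow.
log10_attempts = chars * math.log10(alphabet)
print(f"expected attempts ~ 10^{log10_attempts:,.0f}")

# For comparison: atoms in the observable universe ~ 10^80.
```

The exponent comes out in the hundreds of thousands, against roughly 10^80 atoms in the observable universe, a gap that no amount of scaling closes.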
The Vincent argument: Billions of images generated at scale do not accumulate into one Starry Night. What makes Van Gogh's work matter is inseparable from a specific person in a specific state of mind through a specific history of suffering and seeing. Scaling cannot introduce new frequencies — it can only better approximate frequencies already present in the training data. A Fourier transform with more samples gives a better approximation of the original signal, but not frequencies that were never in the signal to begin with.
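The Fourier analogy above can be demonstrated directly: sampling a signal more densely sharpens the estimate of frequencies already present but never produces a frequency that was not in the signal. The function below is an illustrative sketch, not from the article; the two test frequencies and the peak threshold are arbitrary choices.

```python
import numpy as np

def spectrum_peaks(n_samples, freqs=(3.0, 7.0), duration=1.0):
    """Return the dominant frequency bins of a two-tone test signal."""
    t = np.linspace(0, duration, n_samples, endpoint=False)
    x = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    mag = np.abs(np.fft.rfft(x))
    bins = np.fft.rfftfreq(n_samples, d=duration / n_samples)
    # A pure tone on an exact bin has magnitude n/2; threshold well below that.
    return set(bins[mag > n_samples / 4])

# More samples, same peaks: denser sampling adds no new frequencies.
print(spectrum_peaks(64))    # {3.0, 7.0}
print(spectrum_peaks(4096))  # {3.0, 7.0}
```

Whether we take 64 samples or 4096, the spectrum contains exactly the two frequencies the signal started with, which is the point of the analogy: scale refines, it does not originate.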
The Leonardo distinction: The fantasy of general intelligence, one ring to rule them all, is a procurement error. Leonardo never finished the Mona Lisa not for lack of talent but because acquiring new knowledge made completion harder. Leonardo was also expensive: he required patrons, castles, kings. Vincent was cheaper and more focused. If the task requires millions of paintings, hire Vincents. Specialization beats generality when efficiency matters.
Language as lossy compression: LLMs were chosen because language is the simplest possible interface — how humans already think, store knowledge, and transmit culture. But language is already an approximation of reality. Training on every word ever written still means training on humanity's approximation of reality. Everything never written is permanently outside the training distribution. Scaling does not introduce new information.
The democratic ceiling: Data centers now compete with small towns for electricity and water. Communities are voting them down. The democratic ceiling may arrive before the mathematical, physical, or thermodynamic ceilings. Compute abundance is not guaranteed.
Author's conclusion: The Bitter Lesson was true during the free scaling era — when efficiency did not matter because growth masked everything. Efficiency matters now. When efficiency matters, the calculus shifts: specialization becomes valuable, human insight becomes worth investing in, finding the right person and giving them a typewriter looks like the rational strategy, not the sentimental one.
Assumptions Underlying the Bitter Lesson — Status in 2026
| Assumption | Status |
|---|---|
| Compute cost falls reliably (Moore's Law) | Failing at 2nm — quantum tunneling, heat dissipation unsolved |
| Scaling introduces new capability | Cannot introduce frequencies absent from training data |
| Brute force is cheaper than expertise | Failing — energy, water, democratic resistance |
| General intelligence is the goal | Disputed — specialization may dominate |
| Data centers face no resource constraints | Failing — communities voting against them |
Bitter Lesson Timeline
| Year | Event |
|---|---|
| 2019 | Sutton publishes The Bitter Lesson — scaling hypothesis at peak credibility |
| 2020–2023 | GPT-3, GPT-4, Claude, Gemini released; scaling hypothesis confirmed |
| 2023–2025 | 2nm node challenges emerge; energy/water constraints visible |
| 2026 | Author argues the lesson was most true at publication; assumptions quietly failing |