Neural Scaling Laws and Allometry: coincidence or more?
In my pursuit of 1) understanding and 2) pushing forward human and machine intelligence, I came to make the following connection between the biological and the (current) digital realm.
At the current state of affairs, this is more of an observation post than an explanation -- assuming there is any correlation at all. Additionally, I will raise a number of questions that could be interesting to explore.
Left plot source: Training Compute-Optimal Large Language Models -- plotted as log(FLOPs) vs. log(model size)
Right plot source: The Evolution of Large Brain Size in Mammals: The 'Over-700-Gram Club Quartet' -- plotted as log(body mass) vs. log(brain mass)
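Both panels are drawn on log-log axes, which is precisely where a power law shows up as a straight line. As a minimal sketch (the constants a, b, k and alpha below are placeholders to be fitted, not values taken from either paper):

$$ M_{\text{brain}} \approx a \, M_{\text{body}}^{\,b} \;\Longrightarrow\; \log M_{\text{brain}} \approx \log a + b \log M_{\text{body}} $$

$$ N_{\text{opt}} \approx k \, C^{\,\alpha} \;\Longrightarrow\; \log N_{\text{opt}} \approx \log k + \alpha \log C $$

The visual similarity between the two panels is then a statement about straight lines in two different log-log spaces; whether that similarity runs deeper is the open question here.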
-
Given the state of affairs, I would like to draw attention to the two outlier clusters formed by 1) the models GPT-3, Gopher, and Megatron-Turing NLG on the left side, and 2) the mammalian brains of hominids on the right side.
-
The case of Chinchilla (70B parameters, 1.4T training tokens) is interesting because, contrary to the other models, it is under-parameterized and over-trained relative to them: a comparable compute budget spent on far fewer parameters and far more data.
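To make that parameter/data trade-off concrete, here is a minimal sketch in Python. It assumes the commonly used approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens) and a rule of thumb of roughly 20 tokens per parameter as a stand-in for the compute-optimal allocation; both are approximations associated with the Chinchilla analysis rather than exact values from the paper, and the function names are mine.

```python
# Rough, illustrative comparison of parameter/token trade-offs at a fixed
# compute budget, using the approximation C ~= 6 * N * D (FLOPs).
# The "20 tokens per parameter" ratio is an assumption (a commonly quoted
# rule of thumb), not an exact figure from the Chinchilla paper.

def flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * n_params * n_tokens

def compute_optimal_split(compute_budget: float, tokens_per_param: float = 20.0):
    """Split a compute budget into (params, tokens) assuming D = r * N."""
    n_params = (compute_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Gopher-like run: 280B params, ~300B tokens.
    c_gopher = flops(280e9, 300e9)
    # Chinchilla: 70B params, 1.4T tokens -- a similar budget, spent differently.
    c_chinchilla = flops(70e9, 1.4e12)
    print(f"Gopher-like budget:    {c_gopher:.2e} FLOPs")
    print(f"Chinchilla budget:     {c_chinchilla:.2e} FLOPs")
    n, d = compute_optimal_split(c_gopher)
    print(f"Compute-optimal split: {n / 1e9:.0f}B params, {d / 1e12:.2f}T tokens")
```

Under these assumptions, the two budgets come out within roughly 15% of each other, and the "optimal" split of the Gopher-like budget lands close to Chinchilla's actual 70B / 1.4T configuration.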
-
One can also wonder about the necessity of embodiment. Once again, human intelligence does not equate to artificial intelligence, i.e., roughly speaking, so-called human intelligence in silico. The term machine intelligence may be less error-prone for some; consider looking into: Universal Intelligence: A Definition of Machine Intelligence.
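For reference, that paper (Legg and Hutter) defines the universal intelligence of an agent pi, roughly, as its expected performance summed over all computable environments mu, each weighted by simplicity via its Kolmogorov complexity K(mu):

$$ \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi} $$

where E is the space of environments and V_mu^pi is the expected total reward the agent achieves in environment mu. Notably, nothing in this formal definition requires a body.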
-
As for a possible explanation of this observation, different lenses can be used -- again, assuming there is more than an apparent connection.