Neural Scaling Laws and Allometry: coincidence or more?

In my pursuit of 1) understanding and 2) pushing forward human and machine intelligence, I came to notice the following connection between the biological and the (current) digital realm.

At the current state of affairs, this is more of an observation post than an explanation -- assuming there is any correlation at all. Additionally, I will raise a number of questions that could be interesting to explore.


Left plot source: Training Compute-Optimal Large Language Models -- plotted as log(FLOPs) vs. log(model size).

Right plot source: The Evolutions of Large Brain Size in Mammals: The 'Over-700-Gram Club Quartet' -- plotted as log(body mass) vs. log(brain mass).
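
Both plots are drawn on log-log axes because the underlying relationships are, at least approximately, power laws of the form y = a * x^b, which appear as straight lines in log-log space with slope b. The sketch below illustrates this on purely synthetic data (none of the values come from either paper); fitting a line to the logarithms recovers the exponent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic power-law data y = a * x**b with multiplicative noise
# (illustrative only; not data from either of the papers above).
a, b = 0.1, 0.75
x = np.logspace(0, 6, 50)                               # stand-in for body mass / FLOPs
noise = rng.lognormal(mean=0.0, sigma=0.1, size=x.size)
y = a * x**b * noise                                    # stand-in for brain mass / model size

# In log space the model becomes log(y) = log(a) + b * log(x),
# so an ordinary least-squares line fit recovers the exponent b as the slope.
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
print(f"fitted exponent ~ {slope:.3f}  (true value: {b})")
print(f"fitted prefactor ~ {np.exp(intercept):.3f}  (true value: {a})")
```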


  • Given the state of affairs, I would like to draw attention to the two outlier clusters formed by 1) the models GPT-3, Gopher, and Megatron-Turing NLG on the left and 2) the mammalian brains of hominids on the right.

  • The case of Chinchilla (70B parameters, 1.4T training tokens) is interesting because, contrary to the other models, it is under-parameterized and over-trained -- i.e. it trades parameters for training tokens (see the compute-optimal sketch after this list).

  • One can also wonder about the necessity of embodiment. Once again, human intelligence does NOT equate to artificial intelligence, i.e. roughly speaking, so-called human intelligence in silico! The term machine intelligence may be less error-prone for some; consider looking into: Universal Intelligence: A Definition of Machine Intelligence.

  • As for a possible explanation of this observation, different lenses can be used -- again, if there is more than an apparent connection.
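
For concreteness on the Chinchilla point above, here is a rough back-of-the-envelope sketch of the compute-optimal rule of thumb from the Chinchilla paper: training compute C is approximately 6 * N * D for N parameters and D tokens, with the optimal N and D each growing roughly as the square root of C. The constants below are calibrated on Chinchilla itself (70B parameters, 1.4T tokens) and the square-root exponents are an approximation, so treat the outputs as orders of magnitude only, not values taken from the paper's tables.

```python
def compute_optimal(flops: float) -> tuple[float, float]:
    """Roughly balance a FLOP budget between parameters and tokens, a la Chinchilla."""
    n_ref, d_ref = 70e9, 1.4e12      # Chinchilla: 70B parameters, 1.4T tokens
    c_ref = 6 * n_ref * d_ref        # its approximate training compute (C ~ 6 * N * D)
    scale = (flops / c_ref) ** 0.5   # both N and D grow roughly as sqrt(compute)
    return n_ref * scale, d_ref * scale


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal(budget)
        print(f"C = {budget:.0e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.2f}T tokens")
```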