A priori it would seem a generally valid thesis that for every process there is a corresponding system. — Louis Hjelmslev, Prolegomena to a Theory of Language (1943/1961, p. 9)

In Hjelmslev’s well-known but little-read Prolegomena, he argued for the advancement of linguistics as an independent, formal science distinct from the older tradition of philology. The goal was a theory of language: one that described not languages but the systems underlying them.

In a way this echoes Chomsky’s classic position that a theory of language ought to address the “internal system that determines an infinite range of structured expressions along with their semantic and phonetic interpretations”. For Chomsky, deciding whether a body of knowledge qualifies as a theory of language is simple: if it does not describe such an internal generative system, it is not a theory of language; if it does, it is.

Now the world has been rocked by the introduction of GPT-3, a large language model (LLM) that seems to have finally cracked the language code. And rightfully so. Anyone who uses ChatGPT, the most popular application built on the GPT-3 family of models, is bound to be impressed by the model’s uncannily human-like powers of comprehension and response. Unsurprisingly, the remarkable performance of GPT-3 and other LLMs has caused quite a stir in the academic linguistics community.

After all, when Chomsky himself was conducting his early research in Generative Grammar (GG) back in the 1950s and 60s, much of his funding came from U.S. military projects hoping to develop a form of natural language understanding (NLU) for their “command and control systems”. Yet even decades later, GG never delivered on NLU, and research continued regardless.

Of course, NLU has never been (nor, by definition, can it ever be) a core aim of linguistics, which is a scientific field, not an engineering one. But it is easy to see why linguistics is rattled now that LLMs are here (and here to stay, no less).

The fact remains that LLMs are no less computationally or formally explicit than the formal grammars developed by human linguists. Their human-like performance, which has taken the world by storm, thus confronts formal linguistics with a serious scientific competitor. Linguists are asking: if we take LLMs seriously as formal models of language, what does that mean for our own theories?

I will not paint caricatures of linguists’ reactions following this little bit of introspective questioning. I doubt there are any linguists who would seriously entertain the idea that future linguistics will merely consist of studying how LLMs work, effectively substituting real human subjects with LLMs.

Yet there is an argument to be made that language generated by an LLM is still language, and thus that whether the language data comes from an LLM or a human is beside the point. On this view, a theory of language should account for the structure of language regardless of whether it is of human or machine origin.

There are already studies analysing the generative capacities of LLMs, especially GPT-3. This morning, for example, I came across a preprint by Mahowald et al. that draws on GPT-3 performance data to propose that LLMs may possess human-like “formal competence” in language but not “functional competence”.

I have nothing but great admiration and excitement for this line of work, for which I think “machine cognitive science” may be an appropriate label. But I think it is important that, whether we are dealing with machine or human cognition, we do not lose sight of the scientific goal Hjelmslev had specified nearly 80 years ago: the description of the underlying system.

At present, the rise of LLMs seems to be distracting linguists from this goal. Perhaps this is fueled by the media and the computer science/machine learning community, which tend to evaluate the success of a system by its performance against codified benchmarks and metrics.

This is admirable, and it ensures that new technological developments provide enough added value to combat the evils of economic stagnation. But being well-suited to economic progress does not necessarily mean being well-suited to scientific progress (at least in the human cognition domain).

Deep learning networks themselves are a case in point. Despite buzzwords like “neural computation” and “brain-inspired”, the fact remains that such architectures are a far cry from resembling any actual brain structures.

Neurons in the brain come in many different kinds – pyramidal, Purkinje, basket, and chandelier cells, to name a few – and all of that complexity is simply wastebasketed in favour of oversimplified artificial “neurons”. Pyramidal neurons, for example, receive thousands of synapses across dendrites differentiated into three functional zones: proximal dendrites (which define the neuron’s main receptive field), distal basal dendrites (which detect contextual input), and distal apical dendrites (which detect feedback). Artificial “neurons” don’t even model dendrites!
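To see how drastic the simplification is, here is a minimal sketch of the generic artificial “neuron” used in deep learning (a textbook weighted-sum unit, not any particular library’s implementation): the entire cell is reduced to a dot product and a squashing function.

```python
import math

def artificial_neuron(inputs, weights, bias):
    """A standard artificial 'neuron': a weighted sum of inputs
    passed through a nonlinearity. No dendritic zones, no cell
    types, no spike timing -- just arithmetic."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))  # logistic squashing

# Where a pyramidal cell integrates thousands of synapses across
# distinct dendritic zones, this entire model is one dot product.
print(artificial_neuron([0.5, -0.2], [0.8, 1.1], 0.1))
```

Everything a real neuron does with its dendritic morphology is collapsed here into a flat list of weights.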

This structural disparity between artificial neural networks and neurobiological networks matters. It means that results from machine cognitive science – understood as the study of the underlying systems of machine cognition – cannot easily generalise to human cognitive science, because we already know a priori that the two underlying systems are qualitatively different.

We must not forget this: comparable performance on the surface does not imply comparable competence systems. It is a logical fallacy to assume that because ChatGPT can churn out human-like responses, it must process language in a human-like way.

So, if we frame natural text generation as a computational problem, it is not surprising that there may be more than one solution. Human cognition represents a solution shaped by millions of years of natural evolution; modern machine cognition systems represent a novel class of engineered solutions. The two solution classes do not necessarily approximate one another.
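A toy illustration of this point (the functions and framing are mine, purely for analogy): two routines with identical input–output behaviour whose internal “competence systems” share nothing.

```python
def sum_iterative(n):
    """An incremental, step-by-step process: accumulate the sum
    1 + 2 + ... + n one term at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):
    """A radically different solution to the same problem:
    Gauss's closed-form formula, one arithmetic step."""
    return n * (n + 1) // 2

# Identical surface behaviour for every input...
assert all(sum_iterative(n) == sum_closed_form(n) for n in range(100))
# ...yet no experiment on outputs alone can tell the two apart.
```

Judged purely by benchmark performance, the two are indistinguishable; only by describing the underlying system do we learn how each one actually works.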

Let us return to Hjelmslev and Chomsky (and Halliday and Lamb too), and set for ourselves the goal of linguistics as the description of the underlying system behind language use. With the advent of LLMs built on high-performing but biologically implausible computational architectures, I think we unfortunately have to (re-)introduce yet another dichotomy to cognitive science: the division between machine and human cognition. This is a natural consequence of the structural gap between the two types of competence systems. (It is not the first time that human and machine cognition have been conflated, however.)

The question that linguists and other cognitive scientists need to ask, then, is how we can bridge this gap. The obvious answer, I think, is the development of more biologically faithful architectures (Numenta’s Hierarchical Temporal Memory framework being just one example), but it seems we’ll have to wait until everyone gets over the ChatGPT hype before we see any productive work in this direction.
