Can Large Language Models Reason and Plan?
Subbarao Kambhampati

Large Language Models (LLMs), essentially n-gram models on steroids that have
been trained on web-scale language corpora (or, effectively, our civilizational
knowledge), have caught our collective imagination with linguistic behaviors
that no one expected text completion systems to possess. By
training and operation, LLMs are perhaps
best seen as giant non-veridical memories akin to an external System 1 for us 
all (see Figure 1). Their seeming versatility has, however, led many researchers
to wonder whether they can also do well on planning and reasoning tasks
typically associated with System 2 competency.
Nothing in the training and use of LLMs would seem to suggest, even remotely,
that they can do any type of principled reasoning (which, as we know, often
involves computationally hard inference/search). What LLMs are good
at is a form of universal approximate retrieval. Unlike databases that index
and retrieve data exactly, LLMs, as n-gram models, probabilistically reconstruct
completions for the prompt word by word, a process we shall refer to as
approximate retrieval. This means that LLMs can’t even guarantee memorizing
complete answers, something that is the flip side of their appeal in
constructing “novel” prompt completions on the fly. The boon (“creativity”) and
bane (“hallucination”) of LLMs is that n-gram models will naturally mix and
match, and have almost as much trouble strictly memorizing as we do. It is
indeed the very basis of their appeal.
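
The contrast between exact and approximate retrieval can be illustrated with a toy sketch. The corpus, the function names, and the choice of a bigram (2-gram) model below are all assumptions made for illustration; a bigram sampler is vastly simpler than an LLM and is meant only to show how word-by-word reconstruction can emit a sentence that was never stored.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus, invented purely for illustration.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

def exact_lookup(store, key):
    """Database-style retrieval: return the stored record, or None."""
    return store.get(key)

def train_bigrams(sentences):
    """Map each word to the list of words observed immediately after it."""
    successors = defaultdict(list)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            successors[prev].append(nxt)
    return successors

def complete(successors, prompt, rng, max_words=12):
    """Extend the prompt word by word by sampling an observed successor."""
    words = prompt.split()
    prev = words[-1] if words else "<s>"
    for _ in range(max_words):
        options = successors.get(prev)
        if not options:          # unseen word: nothing to sample from
            break
        nxt = rng.choice(options)
        if nxt == "</s>":        # sampled end-of-sentence marker
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)

def is_reachable(successors, sentence):
    """True if every adjacent word pair in the sentence was seen in training."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return all(nxt in successors.get(prev, []) for prev, nxt in zip(words, words[1:]))

store = {s: s for s in CORPUS}
model = train_bigrams(CORPUS)
sample = complete(model, "the", random.Random(0))
```

In this sketch, `exact_lookup(store, "the cat sat on the rug")` returns `None` because that sentence was never stored, while `is_reachable(model, "the cat sat on the rug")` is true: the sampler can produce it by stitching together pairs seen in different training sentences. The same mixing that yields this "novel" sentence also means the sampler cannot be relied on to reproduce any one training sentence verbatim.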
