I find engaging with an LLM to be somewhere between talking with a
highly plausible gossip and reading a well-researched survey paper on a
subject I am interested in?
Where a given conversation lands in this interval almost exclusively
seems to rely on my care in crafting my prompts.
I don't expect 'truth' out of either gossip or a survey paper... just
'perspective'?
On 9/11/25 10:55 am, glen wrote:
OK. You're right in principle. But we might want to think of this in
the context of all algorithms. For example, let's say you run an FFT on
a signal and it outputs some frequencies. Does the signal *actually*
contain or express those frequencies? Or is it just an inference that
we find reliable?
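
As a rough illustration of the FFT point, here is a minimal sketch in
plain Python. It uses a naive DFT (the transform an FFT computes
quickly) rather than a library FFT. The 64-sample window, the 5%
threshold, and the helper names are all assumptions made for the
example. A sine that completes exactly 5 cycles in the window lands in
a single bin, while one at 5.5 cycles is smeared across several bins,
so the reported frequencies depend on the analysis window as well as on
the signal.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the first n/2 DFT bins (naive O(n^2) transform)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def sine(cycles, n=64):
    """A sine wave completing `cycles` periods over n samples."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


def significant_bins(mags, fraction=0.05):
    """Bins whose magnitude exceeds `fraction` of the peak magnitude."""
    peak = max(mags)
    return [k for k, m in enumerate(mags) if m > fraction * peak]


on_bin = significant_bins(dft_magnitudes(sine(5.0)))
off_bin = significant_bins(dft_magnitudes(sine(5.5)))

print(on_bin)   # [5]: a whole number of cycles fits the window
print(off_bin)  # several bins: the same kind of signal, read less cleanly
```

Both inputs are single sines. Whether the second one "contains" the
extra frequencies is the question above; what the output shows is only
how far the inference can be relied on for a given task.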
The same is true of the LLM inferences. Whether one ascribes truth or
falsity to those inferences is only relevant to metaphysicians and
philosophers. What matters is how reliable the inferences are when we
do some task. Yelling at the kids on your lawn doesn't achieve
anything. It's better to go out there and talk to them. 8^D
On 9/10/25 8:38 PM, Russ Abbott wrote:
Glen, I wish people would stop talking about whether LLM-generated
sentences are true or false. The mechanisms LLMs employ to generate a
sentence have nothing to do with whether the sentence turns out to be
true or false. A sentence may have a higher probability of being true
if the training data consisted entirely of true sentences. (Even
that's not guaranteed; similar true sentences might have their
components interchanged when used during generation.) But the point
is: the transformer process has no connection to the validity of its
output. If an LLM reliably generates true sentences, no credit is due
to the transformer. If the training data consists entirely of
true/false sentences, the generated output is more likely to be
true/false. Output validity plays no role in how an LLM generates its
output.
Marcus, if an LLM is trained entirely on false statements, its
"confidence" in its output will presumably be the same as it would be
if it were trained entirely on true statements. Truthfulness is not a
consideration in the generation process. Speaking of a need to reduce
ambiguity suggests that the LLM understands the input and realizes it
might have multiple meanings. But of course, LLMs don't understand
anything, they don't realize anything, and they can't take meaning
into consideration when generating output.
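
A toy sketch of the "confidence" point, under heavy assumptions: the
"model" here is just a table of next-word counts keyed by the full
prefix, not a transformer, and the two four-word corpora and all
function names are made up for the illustration. Trained on true
sentences or on their false counterparts, it reports the same
confidence, because nothing in the counting refers to the world.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count which word follows each sentence prefix."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            counts[tuple(tokens[:i])][tokens[i]] += 1
    return counts


def predict(counts, prefix):
    """Most likely next word and its relative frequency ("confidence")."""
    options = counts[tuple(prefix.split())]
    token, n = options.most_common(1)[0]
    return token, n / sum(options.values())


# Hypothetical corpora: same words, same structure, opposite truth values.
TRUE_CORPUS = ["paris is in france", "rome is in italy"]
FALSE_CORPUS = ["paris is in italy", "rome is in france"]

true_model = train(TRUE_CORPUS)
false_model = train(FALSE_CORPUS)

print(predict(true_model, "paris is in"))   # ('france', 1.0)
print(predict(false_model, "paris is in"))  # ('italy', 1.0)
```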
On Tue, Sep 9, 2025 at 5:20 PM glen <[email protected]> wrote:
It's unfortunate jargon [⛧]. So it's nothing like whether an LLM
is red (unless you adopt a jargonal definition of "red"). And your
example is a great one for understanding how language fluency *is* at
least somewhat correlated with fidelity. The statistical probability
of the phrase "LLMs hallucinate" is >> 0, whereas the prob for the
phrase "LLMs are red" is vanishingly small. It would be the same for
black swans and Lewis Carroll writings *if* they weren't canonical
teaching devices. It can't be that sophisticated if children think
it's funny.
But imagine all the woo out there where words like "entropy" or
"entanglement" are used falsely. IDK for sure, but my guess is the
false sentences outnumber the true ones by a lot. So the LLM has a
high probability of forming false sentences.
Of course, in that sense, if a physicist finds themselves talking
to an expert in the "Law of Attraction" (e.g. the movie "The Secret")
and makes scientifically true statements about entanglement, the guru
may well judge them as false. So there's "true in context" (validity)
and "ontologically true" (soundness). A sentence can be true in
context but false in the world and vice versa, depending on who's in
control of the reinforcement.
[⛧] We could discuss the strength of the analogy between human
hallucination and LLM "hallucination", especially in the context of
prediction coding. But we don't need to. Just consider it jargon and
move on.
On 9/9/25 4:37 PM, Russ Abbott wrote:
> Marcus, Glen,
>
> Your responses are much too sophisticated for me. Now that I'm
retired (and, in truth, probably before as well), I tend to think in
much simpler terms.
>
> My basic point was to express my surprise at realizing that it
makes as much sense to ask whether an LLM hallucinates as it does to
ask whether an LLM is red. It's a category mismatch--at least I now
think so.
> -- Russ <https://russabbott.substack.com/>
>
> On Tue, Sep 9, 2025 at 3:45 PM glen <[email protected]> wrote:
>
> The question of whether fluency is (well) correlated to
accuracy seems to assume something like mentalizing, the idea that
there's a correspondence between minds mediated by a correspondence
between the structure of the world and the structure of our
minds/language. We've talked about the "interface theory of
perception", where Hoffman (I think?) argues we're more likely to
learn *false* things than we are true things. And we've argued about
realism, pragmatism, prediction coding, and everything else under the
sun on this list.
>
> So it doesn't surprise me if most people assume there will
be more true statements in the corpus than false statements, at least
in domains where there exists a common sense, where the laity *can*
perceive the truth. In things like quantum mechanics or whatever,
then all bets are off because there are probably more false sentences
than true ones.
>
> If there are more true than false sentences in the corpus,
then reinforcement methods like Marcus' only bear a small burden (in
lay domains). The implicit fidelity does the lion's share. But in
those domains where counter-intuitive facts dominate, the
reinforcement does the most work.
>
>
> On 9/9/25 3:12 PM, Marcus Daniels wrote:
>      > Three ways come to mind.. I would guess that OpenAI,
Google, Anthropic, and xAI are far more sophisticated..
> >
> > 1. Add a softmax penalty to the loss that tracks
non-factual statements or grammatical constraints. Cross entropy may
not understand that some parts of content are more important than
others.
> > 2. Change how the beam search works during inference
to skip sequences that fail certain predicates – like a lookahead
that says “Oh, I can’t say that..”
> > 3. Grade the output, either using human or non-LLM
supervision, and re-train.
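
The second item in the list above can be sketched roughly as follows.
This is a toy under stated assumptions, not how any vendor's decoder
works: TOY_MODEL is a hypothetical lookup table standing in for a
language model, and the predicate no_white_swans is a made-up
constraint. The beam search simply drops any candidate sequence that
fails the predicate before ranking the rest.

```python
import math

# Hypothetical next-token model: last token -> {next token: probability}.
TOY_MODEL = {
    "<s>": {"swans": 1.0},
    "swans": {"are": 1.0},
    "are": {"white": 0.6, "black": 0.3, "red": 0.1},
    "white": {"</s>": 1.0},
    "black": {"</s>": 1.0},
    "red": {"</s>": 1.0},
}


def beam_search(model, predicate, beam_width=2, max_len=6):
    """Beam search that skips any candidate failing `predicate`."""
    beams = [(0.0, ["<s>"])]  # (log-probability, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, p in model.get(seq[-1], {}).items():
                new_seq = seq + [token]
                if not predicate(new_seq):
                    continue  # the lookahead: "Oh, I can't say that.."
                candidates.append((score + math.log(p), new_seq))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            if cand[1][-1] == "</s>":
                finished.append(cand)
            else:
                beams.append(cand)
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return [" ".join(seq[1:-1]) for _, seq in finished]


def allow_all(seq):
    return True


def no_white_swans(seq):
    # Made-up constraint: reject any sequence ending in "are white".
    return seq[-2:] != ["are", "white"]


print(beam_search(TOY_MODEL, allow_all))
# ['swans are white', 'swans are black']
print(beam_search(TOY_MODEL, no_white_swans))
# ['swans are black', 'swans are red']
```

Note that the filter only removes what the predicate names. With "are
white" blocked, the equally fluent "swans are red" moves into the beam,
so the quality of the output rests entirely on the quality of the
predicate.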
> >
>      > *From:* Friam <[email protected]> *On Behalf Of* Russ Abbott
>      > *Sent:* Tuesday, September 9, 2025 3:03 PM
>      > *To:* The Friday Morning Applied Complexity Coffee
Group <[email protected]>
> > *Subject:* [FRIAM] Hallucinations
> >
>      > OpenAI just published a paper on hallucinations
<https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf>
as well as a post summarizing the paper
<https://openai.com/index/why-language-models-hallucinate/>. The
two of them seem wrong-headed in such a simple and obvious way that
I'm surprised the issue they discuss is still alive.
> >
> > The paper and post point out that LLMs are trained to
generate fluent language--which they do extraordinarily well. The
paper and post also point out that LLMs are not trained to
distinguish valid from invalid statements. Given those facts about
LLMs, it's not clear why one should expect LLMs to be able to
distinguish true statements from false statements--and hence why one
should expect to be able to prevent LLMs from hallucinating.
> >
> > In other words, LLMs are built to generate text; they
are not built to understand the texts they generate and certainly not
to be able to determine whether the texts they generate make
factually correct or incorrect statements.
> >
>      > Please see my post
<https://russabbott.substack.com/p/why-language-models-hallucinate-according>
elaborating on this.
> >
> > Why is this not obvious, and why is OpenAI still
talking about it?
> >
--
.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ...
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
1/2003 thru 6/2021 http://friam.383.s1.nabble.com/