Re: [FRIAM] Hallucinations

Steve Smith Sat, 13 Sep 2025 01:29:39 -0700

I find LLM engagement to be somewhere between that with a highly plausible gossip and a well-researched survey paper in a subject I am interested in?

Where a given conversation lands in this interval almost exclusively seems to rely on my care in crafting my prompts.

I don't expect 'truth' out of either gossip or a survey paper... just 'perspective'?


On 9/11/25 10:55 am, glen wrote:

OK. You're right in principle. But we might want to think of this in the context of all algorithms. For example, let's say you run an FFT on a signal and it outputs some frequencies. Does the signal *actually* contain or express those frequencies? Or is it just an inference that we find reliable?
The same is true of the LLM inferences. Whether one ascribes truth or falsity to those inferences is only relevant to metaphysicians and philosophers. What matters is how reliable the inferences are when we do some task. Yelling at the kids on your lawn doesn't achieve anything. It's better to go out there and talk to them. 8^D
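
[Editorial illustration of the FFT point above; it is not part of glen's message. It is a minimal sketch, assuming a naive discrete Fourier transform in place of a real FFT library and two made-up test signals. The helper names (dft_magnitudes, sine, significant_bins) and the 10% threshold are hypothetical choices made only for this example. A sine that completes a whole number of cycles in the window is reported as one frequency; a sine at 4.5 cycles is reported as a spread of frequencies, none of which is "the" frequency in the signal.]

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def sine(cycles, n=64):
    """A pure sine completing `cycles` periods over an n-sample window."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


def significant_bins(mags, frac=0.1):
    """Frequency bins whose magnitude exceeds `frac` of the largest one."""
    peak = max(mags)
    return [k for k, m in enumerate(mags) if m > frac * peak]


on_bin = dft_magnitudes(sine(4.0))   # exactly 4 cycles in the window
off_bin = dft_magnitudes(sine(4.5))  # 4.5 cycles: falls between two bins

print("4.0 cycles ->", significant_bins(on_bin))
print("4.5 cycles ->", significant_bins(off_bin))
```

[Both inputs are single pure tones, yet the second report lists several bins. The transform answers the question "which window-periodic sinusoids reconstruct these samples?", which is a reliable inference for many tasks without being a claim about what the signal "really" contains.]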
On 9/10/25 8:38 PM, Russ Abbott wrote:
Glen, I wish people would stop talking about whether LLM-generated sentences are true or false. The mechanisms LLMs employ to generate a sentence have nothing to do with whether the sentence turns out to be true or false. A sentence may have a higher probability of being true if the training data consisted entirely of true sentences. (Even that's not guaranteed; similar true sentences might have their components interchanged when used during generation.) But the point is: the transformer process has no connection to the validity of its output. If an LLM reliably generates true sentences, no credit is due to the transformer. If the training data consists entirely of true/false sentences, the generated output is more likely to be true/false. Output validity plays no role in how an LLM generates its output.
Marcus, if an LLM is trained entirely on false statements, its "confidence" in its output will presumably be the same as it would be if it were trained entirely on true statements. Truthfulness is not a consideration in the generation process. Speaking of a need to reduce ambiguity suggests that the LLM understands the input and realizes it might have multiple meanings. But of course, LLMs don't understand anything, they don't realize anything, and they can't take meaning into consideration when generating output.
On Tue, Sep 9, 2025 at 5:20 PM glen <[email protected]> wrote:
It's unfortunate jargon [⛧]. So it's nothing like whether an LLM is red (unless you adopt a jargonal definition of "red"). And your example is a great one for understanding how language fluency *is* at least somewhat correlated with fidelity. The statistical probability of the phrase "LLMs hallucinate" is >> 0, whereas the prob for the phrase "LLMs are red" is vanishingly small. It would be the same for black swans and Lewis Carroll writings *if* they weren't canonical teaching devices. It can't be that sophisticated if children think it's funny.
But imagine all the woo out there where words like "entropy" or "entanglement" are used falsely. IDK for sure, but my guess is the false sentences outnumber the true ones by a lot. So the LLM has a high probability of forming false sentences.
Of course, in that sense, if a physicist finds themselves talking to an expert in the "Law of Attraction" (e.g. the movie "The Secret") and makes scientifically true statements about entanglement, the guru may well judge them as false. So there's "true in context" (validity) and "ontologically true" (soundness). A sentence can be true in context but false in the world and vice versa, depending on who's in control of the reinforcement.
[⛧] We could discuss the strength of the analogy between human hallucination and LLM "hallucination", especially in the context of prediction coding. But we don't need to. Just consider it jargon and move on.
    On 9/9/25 4:37 PM, Russ Abbott wrote:
     > Marcus, Glen,
     >
     > Your responses are much too sophisticated for me. Now that I'm retired (and, in truth, probably before as well), I tend to think in much simpler terms.
     >
     > My basic point was to express my surprise at realizing that it makes as much sense to ask whether an LLM hallucinates as it does to ask whether an LLM is red. It's a category mismatch--at least I now think so.
     >
     > -- Russ <https://russabbott.substack.com/>
     >
     >
     >
     >
     > On Tue, Sep 9, 2025 at 3:45 PM glen <[email protected]> wrote:
     >
     > The question of whether fluency is (well) correlated to accuracy seems to assume something like mentalizing, the idea that there's a correspondence between minds mediated by a correspondence between the structure of the world and the structure of our minds/language. We've talked about the "interface theory of perception", where Hoffman (I think?) argues we're more likely to learn *false* things than we are true things. And we've argued about realism, pragmatism, prediction coding, and everything else under the sun on this list.
     >
     > So it doesn't surprise me if most people assume there will be more true statements in the corpus than false statements, at least in domains where there exists a common sense, where the laity *can* perceive the truth. In things like quantum mechanics or whatever, then all bets are off because there are probably more false sentences than true ones.
     >
     > If there are more true than false sentences in the corpus, then reinforcement methods like Marcus' only bear a small burden (in lay domains). The implicit fidelity does the lion's share. But in those domains where counter-intuitive facts dominate, the reinforcement does the most work.
     >
     >
     >     On 9/9/25 3:12 PM, Marcus Daniels wrote:
     >      > Three ways come to mind.. I would guess that OpenAI, Google, Anthropic, and xAI are far more sophisticated..
     >      >
     >      > 1. Add a softmax penalty to the loss that tracks non-factual statements or grammatical constraints. Cross entropy may not understand that some parts of content are more important than others.
     >      > 2. Change how the beam search works during inference to skip sequences that fail certain predicates – like a lookahead that says “Oh, I can’t say that..”
     >      > 3. Grade the output, either using human or non-LLM supervision, and re-train.
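
[Editorial illustration of option 2 in the list above; it is not part of Marcus's message and is not how any vendor's system works. It is a minimal sketch, assuming a hypothetical hand-written next-token table (TOY_MODEL) in place of a real language model and a hypothetical predicate (no_red_swans) standing in for whatever check a real system might apply. The beam search drops any partial sequence that the predicate rejects.]

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"swans": 1.0},
    "swans": {"are": 1.0},
    "are": {"red": 0.6, "white": 0.4},
    "red": {"</s>": 1.0},
    "white": {"</s>": 1.0},
}


def beam_search(model, allowed, beam_width=2, max_len=6):
    """Beam search that skips any extension rejected by `allowed(sequence)`."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for tok, p in model[seq[-1]].items():
                new_seq = seq + [tok]
                if not allowed(new_seq):
                    continue  # the "Oh, I can't say that" lookahead
                candidates.append((logp + math.log(p), new_seq))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            # Completed sequences leave the beam; the rest keep growing.
            (finished if cand[1][-1] == "</s>" else beams).append(cand)
        if not beams:
            break
    return max(finished, key=lambda c: c[0])[1] if finished else None


def no_red_swans(seq):
    """Hypothetical predicate: reject sequences that call swans red."""
    return not ("swans" in seq and "red" in seq)


unconstrained = beam_search(TOY_MODEL, lambda seq: True)
constrained = beam_search(TOY_MODEL, no_red_swans)

print(" ".join(unconstrained))
print(" ".join(constrained))
```

[In this toy, the unconstrained search follows the more probable continuation and the constrained search is steered to the alternative. The predicate is supplied from outside the model, so the sketch says nothing about where a trustworthy predicate would come from.]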
     >      >
     >      > *From:* Friam <[email protected]> *On Behalf Of* Russ Abbott
     >      > *Sent:* Tuesday, September 9, 2025 3:03 PM
     >      > *To:* The Friday Morning Applied Complexity Coffee Group <[email protected]>
     >      > *Subject:* [FRIAM] Hallucinations
     >      >
     >      > OpenAI just published a paper on hallucinations <https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf> as well as a post summarizing the paper <https://openai.com/index/why-language-models-hallucinate/>. The two of them seem wrong-headed in such a simple and obvious way that I'm surprised the issue they discuss is still alive.
     >      >
     >      > The paper and post point out that LLMs are trained to generate fluent language--which they do extraordinarily well. The paper and post also point out that LLMs are not trained to distinguish valid from invalid statements. Given those facts about LLMs, it's not clear why one should expect LLMs to be able to distinguish true statements from false statements--and hence why one should expect to be able to prevent LLMs from hallucinating.
     >      >
     >      > In other words, LLMs are built to generate text; they are not built to understand the texts they generate and certainly not to be able to determine whether the texts they generate make factually correct or incorrect statements.
     >      >
     >      > Please see my post <https://russabbott.substack.com/p/why-language-models-hallucinate-according> elaborating on this.
     >      >
     >      > Why is this not obvious, and why is OpenAI still talking about it?
     >      >
--


.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
 1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
