It may have to do with some people believing that their consciousness amounts 
to something bigger than their stack of skills and their memory.

 

From: Friam <[email protected]> On Behalf Of Gillian Densmore
Sent: Thursday, September 11, 2025 12:47 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: Re: [FRIAM] Hallucinations

 

I have a question: why is it that when a chatbot makes stuff up, it's 
called hallucinating? Whereas if I said there's lots of people out 
there, I'm either Douglas Adams kinds of wrong, or (purely to avoid 
rules lawyering) failing to give context, or I'm just making things up 
(the fine and noble art of the white lie).

OK, so why is it that chatbots get called hallucinating, but a person is 
either making things up or has just confused someone? Why do we say a 
chatbot hallucinated, rather than that it started pulling ********** out 
of its digital and highly trained electronic backside (aka BSing and 
telling fat lies)?

For instance, many years ago ChatGPT said 2+2 is 4 (internet trolls 
went: oh really?), and soon it was 2+2=5 according to ChatGPT (i.e., 
making stuff up!). Why is that called hallucinating rather than BSing 
and making sh**** up?

 

On Thu, Sep 11, 2025 at 11:30 AM Marcus Daniels <[email protected]> wrote:

It seems to me that we are full circle back to a Turing Test. If the LLM 
encodes and demonstrates skill (they certainly do), and these skills can 
progress a solution of some real-world problem, then it is just empty 
chauvinism to say they don't understand a topic.

 

From: Friam <[email protected]> on behalf of Steve Smith <[email protected]>
Date: Thursday, September 11, 2025 at 10:12 AM
To: [email protected] <mailto:[email protected]>  <[email protected] 
<mailto:[email protected]> >
Subject: Re: [FRIAM] Hallucinations

I find LLM engagement to be somewhere between that with a highly 
plausible gossip and a well-researched survey paper on a subject I am 
interested in?

Where a given conversation lands in this interval seems to depend 
almost exclusively on my care in crafting my prompts.

I don't expect 'truth' out of either gossip or a survey paper... just 
'perspective'?

On 9/11/25 10:55 am, glen wrote:
> OK. You're right in principle. But we might want to think of this in 
> the context of all algorithms. For example, let's say you run a FFT on 
> a signal and it outputs some frequencies. Does the signal *actually* 
> contain or express those frequencies? Or is it just an inference that 
> we find reliable?
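glen's FFT question can be made concrete with a minimal sketch (the sample rate and signal below are invented for illustration). The FFT reports energy at 1 Hz, but whether the signal "actually contains" that frequency or we just find the inference reliable is exactly the distinction he raises:

```python
import numpy as np

# A 1 Hz sine sampled at 8 Hz for one second (invented example).
fs = 8
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1 * t)

# The FFT "outputs some frequencies" -- here, a peak at 1 Hz.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(fs, d=1 / fs)
peak = freqs[np.argmax(spectrum)]
print(peak)  # 1.0
```

The algorithm never asserts a truth about the signal; it performs a fixed transform whose outputs we have learned to trust for certain tasks -- the same stance glen suggests taking toward LLM inferences.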
>
> The same is true of the LLM inferences. Whether one ascribes truth or 
> falsity to those inferences is only relevant to metaphysicians and 
> philosophers. What matters is how reliable the inferences are when we 
> do some task. Yelling at the kids on your lawn doesn't achieve 
> anything. It's better to go out there and talk to them. 8^D
>
>
> On 9/10/25 8:38 PM, Russ Abbott wrote:
>> Glen, I wish people would stop talking about whether LLM-generated 
>> sentences are true or false. The mechanisms LLMs employ to generate a 
>> sentence have nothing to do with whether the sentence turns out to be 
>> true or false. A sentence may have a higher probability of being true 
>> if the training data consisted entirely of true sentences. (Even 
>> that's not guaranteed; similar true sentences might have their 
>> components interchanged when used during generation.) But the point 
>> is: the transformer process has no connection to the validity of its 
>> output. If an LLM reliably generates true sentences, no credit is due 
>> to the transformer. If the training data consists entirely of 
>> true/false sentences, the generated output is more likely to be 
>> true/false. Output validity plays no role in how an LLM generates its 
>> output.
>>
>> Marcus, if an LLM is trained entirely on false statements, its 
>> "confidence" in its output will presumably be the same as it would be 
>> if it were trained entirely on true statements. Truthfulness is not a 
>> consideration in the generation process. Speaking of a need to reduce 
>> ambiguity suggests that the LLM understands the input and realizes it 
>> might have multiple meanings. But of course, LLMs don't understand 
>> anything, they don't realize anything, and they can't take meaning 
>> into consideration when generating output.
>>
>>
>>
>>
>>
>> On Tue, Sep 9, 2025 at 5:20 PM glen <[email protected]> wrote:
>>
>>     It's unfortunate jargon [⛧]. So it's nothing like whether an LLM 
>> is red (unless you adopt a jargonal definition of "red"). And your 
>> example is a great one for understanding how language fluency *is* at 
>> least somewhat correlated with fidelity. The statistical probability 
>> of the phrase "LLMs hallucinate" is >> 0, whereas the prob for the 
>> phrase "LLMs are red" is vanishingly small. It would be the same for 
>> black swans and Lewis Carroll writings *if* they weren't canonical 
>> teaching devices. It can't be that sophisticated if children think 
>> it's funny.
>>
>>     But imagine all the woo out there where words like "entropy" or 
>> "entanglement" are used falsely. IDK for sure, but my guess is the 
>> false sentences outnumber the true ones by a lot. So the LLM has a 
>> high probability of forming false sentences.
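glen's point about corpus statistics can be illustrated with a toy model (the corpus and counts are invented): a model that simply fits sentence frequencies will rank the more common usage higher, regardless of which one is true.

```python
from collections import Counter

# Invented corpus: "woo" misuses of "entropy" outnumber correct uses 7:3.
corpus = (
    ["entropy means the universe punishes negativity"] * 7 +  # woo, false
    ["entropy is the log of the number of microstates"] * 3   # physics, true
)

# A frequency model assigns probability proportional to count,
# with no access to truth at all.
counts = Counter(corpus)
total = sum(counts.values())
probs = {s: n / total for s, n in counts.items()}

most_likely = max(probs, key=probs.get)
print(most_likely)  # the woo sentence, with probability 0.7
```

Real LLMs are vastly more sophisticated than a sentence-frequency table, but the asymmetry survives: where false phrasings dominate the training text, the model's high-probability continuations skew false.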
>>
>>     Of course, in that sense, if a physicist finds themselves talking 
>> to an expert in the "Law of Attraction" (e.g. the movie "The Secret") 
>> and makes scientifically true statements about entanglement, the guru 
>> may well judge them as false. So there's "true in context" (validity) 
>> and "ontologically true" (soundness). A sentence can be true in 
>> context but false in the world and vice versa, depending on who's in 
>> control of the reinforcement.
>>
>>
>>     [⛧] We could discuss the strength of the analogy between human 
>> hallucination and LLM "hallucination", especially in the context of 
>> predictive coding. But we don't need to. Just consider it jargon and 
>> move on.
>>
>>     On 9/9/25 4:37 PM, Russ Abbott wrote:
>>      > Marcus, Glen,
>>      >
>>      > Your responses are much too sophisticated for me. Now that I'm 
>> retired (and, in truth, probably before as well), I tend to think in 
>> much simpler terms.
>>      >
>>      > My basic point was to express my surprise at realizing that it 
>> makes as much sense to ask whether an LLM hallucinates as it does to 
>> ask whether an LLM is red. It's a category mismatch--at least I now 
>> think so.
>>      > -- Russ <https://russabbott.substack.com/>
>>      >
>>      >
>>      >
>>      >
>>      > On Tue, Sep 9, 2025 at 3:45 PM glen <[email protected]> wrote:
>>      >
>>      >     The question of whether fluency is (well) correlated to 
>> accuracy seems to assume something like mentalizing, the idea that 
>> there's a correspondence between minds mediated by a correspondence 
>> between the structure of the world and the structure of our 
>> minds/language. We've talked about the "interface theory of 
>> perception", where Hoffman (I think?) argues we're more likely to 
>> learn *false* things than we are true things. And we've argued about 
>> realism, pragmatism, predictive coding, and everything else under the 
>> sun on this list.
>>      >
>>      >     So it doesn't surprise me if most people assume there will 
>> be more true statements in the corpus than false statements, at least 
>> in domains where there exists a common sense, where the laity *can* 
>> perceive the truth. In things like quantum mechanics or whatever, all 
>> bets are off because there are probably more false sentences than 
>> true ones.
>>      >
>>      >     If there are more true than false sentences in the corpus, 
>> then reinforcement methods like Marcus' only bear a small burden (in 
>> lay domains). The implicit fidelity does the lion's share. But in 
>> those domains where counter-intuitive facts dominate, the 
>> reinforcement does the most work.
>>      >
>>      >
>>      >     On 9/9/25 3:12 PM, Marcus Daniels wrote:
>>      >      > Three ways come to mind. I would guess that OpenAI, 
>> Google, Anthropic, and xAI are far more sophisticated.
>>      >      >
>>      >      >  1. Add a softmax penalty to the loss that tracks 
>> non-factual statements or grammatical constraints.  Cross entropy may 
>> not understand that some parts of content are more important than 
>> others.
>>      >      >  2. Change how the beam search works during inference 
>> to skip sequences that fail certain predicates – like a lookahead 
>> that says “Oh, I can’t say that..”
>>      >      >  3. Grade the output, either using human or non-LLM 
>> supervision, and re-train.
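Marcus's idea (2) can be sketched as a toy constrained beam search. Everything below is invented for illustration -- the per-step token scores, the two-token vocabulary, and the banned-bigram predicate standing in for a real fact-checking lookahead:

```python
import math

def constrained_beam_search(step_scores, beam_width, allowed):
    """step_scores: list of {token: log_prob} dicts, one per step.
    allowed: predicate over a token sequence; candidates that fail it
    are pruned -- the lookahead that says "Oh, I can't say that."."""
    beams = [((), 0.0)]  # (sequence, cumulative log-prob)
    for scores in step_scores:
        candidates = []
        for seq, logp in beams:
            for tok, lp in scores.items():
                new_seq = seq + (tok,)
                if allowed(new_seq):  # skip forbidden continuations
                    candidates.append((new_seq, logp + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical example: the raw model prefers "5" after "2+2=",
# but the predicate bans that bigram, so decoding falls back to "4".
scores = [{"2+2=": math.log(0.9), "hello": math.log(0.1)},
          {"5": math.log(0.6), "4": math.log(0.4)}]
ok = lambda seq: ("2+2=", "5") not in zip(seq, seq[1:])
best, _ = constrained_beam_search(scores, beam_width=2, allowed=ok)[0]
print(best)  # ('2+2=', '4')
```

A production decoder would apply the predicate to partial hypotheses using a grammar or retrieval-backed checker rather than a banned-bigram list, but the pruning logic is the same.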
>>      >      >
>>      >      > *From:* Friam <[email protected]> *On Behalf Of* Russ Abbott
>>      >      > *Sent:* Tuesday, September 9, 2025 3:03 PM
>>      >      > *To:* The Friday Morning Applied Complexity Coffee 
>> Group <[email protected]>
>>      >      > *Subject:* [FRIAM] Hallucinations
>>      >      >
>>      >      > OpenAI just published a paper on hallucinations 
>> <https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf> 
>> as well as a post summarizing the paper 
>> <https://openai.com/index/why-language-models-hallucinate/>. The two 
>> of them seem wrong-headed in such a simple and obvious way that I'm 
>> surprised the issue they discuss is still alive.
>>      >      >
>>      >      > The paper and post point out that LLMs are trained to 
>> generate fluent language--which they do extraordinarily well. The 
>> paper and post also point out that LLMs are not trained to 
>> distinguish valid from invalid statements. Given those facts about 
>> LLMs, it's not clear why one should expect LLMs to be able to 
>> distinguish true statements from false statements--and hence why one 
>> should expect to be able to prevent LLMs from hallucinating.
>>      >      >
>>      >      > In other words, LLMs are built to generate text; they 
>> are not built to understand the texts they generate and certainly not 
>> to be able to determine whether the texts they generate make 
>> factually correct or incorrect statements.
>>      >      >
>>      >      > Please see my post 
>> <https://russabbott.substack.com/p/why-language-models-hallucinate-according> 
>> elaborating on this.
>>      >      >
>>      >      > Why is this not obvious, and why is OpenAI still 
>> talking about it?
>>      >      >
>>     -- 
>
>

.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
