One of the principles of FAIR+ is to track the provenance of data carefully. Taken seriously, this 
implies we should keep the "raw" data [⛧], each transform, and each checkpoint of the 
derived data, throughout any given workflow. What LLMs do is start a workflow from a checkpoint (or 
a set of checkpoints) of the data, largely ignoring that data's provenance. Biological organisms are 
a bit different in their ability to return to the "raw" data and reproduce (not merely repeat) 
checkpointed data through similar workflows. E.g. I can not only read about brewing (starting in 
the middle of someone else's workflow), I can also brew (reproduce their workflow myself). LLMs 
can't do that. But an LLM embodied in metal or flesh *could* do that.

[⛧] There is no such thing as raw data. There is no such thing as absolute 
grounding. It's an infinite regress.
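
To make the bookkeeping concrete, here's a toy sketch of what keeping the "raw" data, each 
transform, and each checkpoint of the derived data might look like. Every name in it is invented 
for illustration; it's not any real FAIR+ tooling, just the shape of the idea:

import hashlib
import json

def fingerprint(obj):
    """Content hash of a JSON-serializable dataset, used as a provenance ID."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]

class Workflow:
    """Keep the 'raw' input, every transform applied, and every checkpoint of derived data."""

    def __init__(self, raw):
        self.raw = raw                                        # never discarded
        self.transforms = []                                  # the functions applied, in order
        self.checkpoints = [("raw", fingerprint(raw), raw)]   # (step name, hash, data)

    def apply(self, transform):
        """Run a transform on the latest checkpoint and record the result as a new checkpoint."""
        derived = transform(self.checkpoints[-1][2])
        self.transforms.append(transform)
        self.checkpoints.append((transform.__name__, fingerprint(derived), derived))
        return derived

    def reproduce(self):
        """Re-run the whole workflow from the raw data, rather than replaying stored checkpoints."""
        fresh = Workflow(self.raw)
        for t in self.transforms:
            fresh.apply(t)
        return fresh

# Toy usage: two transforms over a tiny dataset.
def double(xs):   return [2 * x for x in xs]
def drop_odd(xs): return [x for x in xs if x % 2 == 0]

wf = Workflow([1, 2, 3])
wf.apply(double)
wf.apply(drop_odd)
print([(name, h) for name, h, _ in wf.checkpoints])   # raw -> double -> drop_odd, each fingerprinted

The point of reproduce() is the brewing analogy: go back to the raw data and re-run the workflow, 
instead of starting from someone else's checkpoint.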

On 4/7/23 12:21, Steve Smith wrote:

Or ... or ... they counter the conventional wisdom that *humans* generalize 
their learning or reasoning beyond text. We are the OG bots.

I am fascinated by the "bootstrapping" that semantic/syntactic recursion seems 
to imply. I'm looking for examples in these LLMs where this is exposed. It seems that, 
at the level of sophistication at which GPT-4 is responding, this must be common?

It feels as if there might be a surprisingly robust "house of cards" built 
entirely of lexical units, stacked very cleverly together (in an echo of the 
patterns of artificial-neural-net connections formed while training an LLM?), and you 
are suggesting that this is roughly what human learning/understanding looks like as 
well...

Lakoff and Núñez's "embodiment" arguments seem to suggest that even if there 
may be some elaborate card-stacking going on, there is a foundation (in human 
understanding) of sensorial/sentient/qualia experiences that it all grounds out in?

If we tentatively accept this account, then I wonder what LLMs are grounded in? Perhaps 
"hallucinations" of sensorial experiences?

So: from the horse's mouth:

    what are LLMs' structures grounded out in?

    Language models such as LLMs (large language models) are grounded in vast 
amounts of textual data that they are trained on. The structures of LLMs are 
derived from the statistical patterns and regularities that are present in the 
data. These patterns are then encoded as learned parameters in the neural 
network architecture of the language model.

    In other words, the structures of LLMs are not explicitly programmed by 
humans, but are instead learned from large datasets of natural language text. 
This allows LLMs to capture the complex and subtle patterns of natural 
language, including its syntax, semantics, and pragmatics.

    While LLMs are not directly grounded in embodied experiences, as in the 
theory of embodied mind, they do reflect the linguistic and cultural context in 
which the data they are trained on was produced.
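
For what it's worth, the "statistical patterns encoded as learned parameters" bit can be 
caricatured in a dozen lines: a toy bigram next-word predictor, nothing remotely like GPT-4's 
billions of weights, with the corpus made up for illustration.

from collections import Counter, defaultdict

# A tiny corpus standing in for "vast amounts of textual data".
corpus = "the grass suffers when the elephants fight and the grass grows back".split()

# "Training": count which word follows which. The bigram counts are the learned parameters.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word seen during training."""
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))   # -> 'grass' ("the" is followed by "grass" 2 of 3 times in the corpus)

The "parameters" here are just bigram counts; an LLM's weights are shaped by the same kind of 
textual regularities, only vastly more of them and far less legible.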



I do really appreciate this duality/tension: I think you were the first to alert me to 
this a few thousand messages back (before the LLM/GPT talk, etc., erupted here), though I 
vaguely remember Marcus making a (qualitatively) similar statement as well. I think his 
comment was about whether human learning (early childhood in particular) was anything different 
from "emulation".


On 4/7/23 09:15, Steve Smith wrote:
    These findings counter the conventional wisdom that LLMs are merely 
statistical next-word predictors and can’t generalize their learning or 
reasoning beyond text.



--
ꙮ Mɥǝu ǝlǝdɥɐuʇs ɟᴉƃɥʇ' ʇɥǝ ƃɹɐss snɟɟǝɹs˙ ꙮ

