Matt,

Interesting that the lexical clustering techniques also work for whale
song. An old curiosity of mine; I've always wanted to extend language
methods to this. What technique did you use: entropy maximization,
repeated sequences?

There's an old paper from the '90s on finding word boundaries in English
by entropy maximization that I'd like to track down. Maybe you have a
list of the standard papers on it from that time.
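
(For concreteness, here is the flavor of technique I mean, as a toy
Python sketch: place boundaries at local peaks of branching entropy over
character n-grams. The order-3 context and the toy corpus are my own
placeholders, not a reconstruction of your actual method.)

import math
from collections import Counter, defaultdict

def branching_entropy(text, order=3):
    """Entropy of the next-character distribution after each n-gram."""
    follow = defaultdict(Counter)
    for i in range(len(text) - order):
        follow[text[i:i + order]][text[i + order]] += 1
    entropy = {}
    for ctx, counts in follow.items():
        total = sum(counts.values())
        entropy[ctx] = -sum((c / total) * math.log2(c / total)
                            for c in counts.values())
    return entropy

def segment(text, order=3):
    """Cut wherever branching entropy peaks locally: high uncertainty
    about the next symbol suggests a word edge."""
    h = branching_entropy(text, order)
    scores = [h.get(text[i - order:i], 0.0) for i in range(order, len(text))]
    words, start = [], 0
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            words.append(text[start:i + order])
            start = i + order
    words.append(text[start:])
    return words

print(segment("thecatsatonthemat" * 50))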

But re. your claim that the reason NNs have been so successful at
developing language models is that "Language evolved to be learnable by
neural networks one layer at a time, segmentation first, then vocabulary
... then semantics, then grammar": I dispute that. You're giving people
a bum steer if they read that and imagine it is "settled science". Not
least the hierarchical placement of grammar above semantics (possibly
influenced by cognitivist/functionalist linguistic dogma? Or is that
still a Hutter Prize idea, that semantics will provide a minimal
representation for grammar?)

LLMs are deep, but I see no neat separation of layers into semantics and
grammar. Quite the opposite. I argue the success of LLMs is exactly that
language forced researchers to abandon any attempt to abstract structure
above the lexical level, and that this is an unappreciated cause of the
significance of language models in AI today: the abandonment of
structure has proven relevant for broader cognition. So I see language
as having forced AI to abandon stable structure. No nice continuation
into layers at all; instead, a blow-out into eternally expanding
menageries of structure, constrained only by the fact that they predict
well in sequence. Paradoxically, the fact that prediction in sequence
becomes the only parameter, forced on us by language, is what liberates
AI models from assumptions of structure.
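
(Operationally, that "only parameter" is nothing more than the sketch
below: the entire training signal of a language model is cross-entropy
on the next token, with no structural supervision of any kind. A minimal
numpy sketch; the shapes and names are mine, not any particular
implementation.)

import numpy as np

def next_token_loss(logits, targets):
    """Cross-entropy of predicting token t+1 from the prefix 0..t.
    logits: (seq_len, vocab) scores from any sequence model;
    targets: (seq_len,) the tokens that actually came next."""
    z = logits - logits.max(axis=-1, keepdims=True)   # stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

# toy usage: 5 positions, a vocabulary of 10 "words"
rng = np.random.default_rng(0)
print(next_token_loss(rng.normal(size=(5, 10)),
                      rng.integers(0, 10, size=5)))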

The revolution of "attention" was just that it enabled even more
idiosyncratic context to distinguish this eternally expanding menagerie
of structure: idiosyncratic context unlearnable by RNNs, and only poorly
approximated by LSTMs before. No stable layers.
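
(For anyone who wants the mechanism behind the slogan, a minimal
single-head sketch of scaled dot-product attention: each position
computes its own weighting over the entire context, where an RNN must
squeeze all history through one fixed-size state. This is the textbook
form, not any particular production implementation.)

import numpy as np

def attention(Q, K, V):
    """Each row of Q scores every row of K, then mixes rows of V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # (seq, seq) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per position
    return weights @ V

# toy usage: 6 positions, 8-dimensional representations
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 8))
print(attention(x, x, x).shape)  # (6, 8)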

So I'm not surprised no one has been able to find structure in whale
song above the lexical level, or that we can't extract meaning from the
songs. Because we can't extract meaning from LLMs, either! We can
identify no unifying structure in them. They're all still just
increasingly flexible (and unreliable) lookup mechanisms.

And of course, you also point out that "The study of 8 years of whale
song recordings did not analyze semantics or grammar". That is
consistent with my assertion that there is no stable structure above
the lexicon, but inconsistent with your belief that there is a neat
continuation of structural layers into semantics and grammar. And you
can only explain that away by imagining the problem is "size", as
usual: "I imagine the problem is the data set size. It is easy to train
a lexical model like mine on 30 KB of text. You need a lot more data to
train the higher layers in a language model."

Ha. "Size" is the new black. Ever more data, ever larger server farms.

I'm surprised you didn't argue, with Yann LeCun, that whale-song
research needs to stop looking at whale song and use different data: to
build a "world model" which, unlike language, will have the stable
structure everyone insists on continuing to expect, despite language,
via LLMs, constantly rubbing our noses in the utility of not expecting
comprehensible stable structure!

-Rob

On Sat, Feb 8, 2025 at 1:59 AM Matt Mahoney <mattmahone...@gmail.com> wrote:

> Whale songs have a lexical structure like human speech. In 2000 I
> experimented with finding word boundaries in text without spaces. Infants
> 7-10 months old learn to segment continuous speech before learning words by
> finding boundaries with low mutual information across them.
>
> https://mattmahoney.net/dc/lex1.html
>
> Whale songs can also be partitioned using the same technique. And
> furthermore, the words have a Zipf distribution like all human languages,
> where the n'th most frequent word has a frequency proportional to 1/n.
>
>
> https://theconversation.com/whalesong-patterns-follow-a-universal-law-of-human-language-new-research-finds-249271
>
> The study of 8 years of whale song recordings did not analyze semantics or
> grammar. So we still don't know what the whales are saying. I imagine the
> problem is the data set size. It is easy to train a lexical model like mine
> on 30 KB of text. You need a lot more data to train the higher layers in a
> language model.
>
> Language evolved to be learnable by neural networks one layer at a time,
> segmentation first, then vocabulary at a constant rate (about 15 words per
> day, after 20 lifetime exposures per word), then semantics, then grammar.
> That's why neural networks have been so successful in developing LLMs.
