Matt,

Interesting that lexical clustering techniques also work for whale song. That's an old curiosity of mine: I've always wanted to extend language methods to it. What technique did you use? Entropy maximization? Repeated sequences?
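(By entropy maximization I mean the family of techniques in the minimal sketch below. It's my guess at the approach, not your code, and the threshold and toy corpus are invented: boundaries go where the pointwise mutual information between adjacent characters is low, i.e. where one character tells you little about the next.

    # Sketch only: MI-based segmentation with an invented threshold and corpus.
    import math
    from collections import Counter

    def segment(text, threshold=0.0):
        """Split text where adjacent characters are nearly independent."""
        n = len(text)
        uni = Counter(text)                                # character counts
        bi = Counter(text[i:i + 2] for i in range(n - 1))  # bigram counts
        words, start = [], 0
        for i in range(n - 1):
            a, b = text[i], text[i + 1]
            # Pointwise mutual information: log p(ab) / (p(a) p(b))
            pmi = math.log((bi[a + b] / (n - 1)) / ((uni[a] / n) * (uni[b] / n)))
            if pmi < threshold:  # low PMI suggests a word boundary
                words.append(text[start:i + 1])
                start = i + 1
        words.append(text[start:])
        return words

    # Toy demo: a tiny "corpus" with the spaces stripped out.
    print(segment("thedogsawthecatthecatsawthedog" * 20)[:8])

On real text you would condition on longer n-grams, but the principle is the same.)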
There's a paper from the '90s on finding word boundaries in English by entropy maximization that I'd like to track down. Maybe you have a list of the standard papers on it from that time.

But regarding your explanation of why NNs have been so successful at developing language models, that "Language evolved to be learnable by neural networks one layer at a time, segmentation first, then vocabulary ... then semantics, then grammar": I dispute that. You're giving people a bum steer if they read it and imagine it is settled science. Not least the hierarchical placement of grammar above semantics (possibly influenced by cognitivist/functionalist linguistic dogma? Or is that still a Hutter Prize idea, that semantics will provide a minimal representation for grammar?).

LLMs are deep, but I see no neat separation of layers into semantics and grammar. Quite the opposite. I argue the success of LLMs is exactly that language forced researchers to abandon any attempt to abstract structure above the lexical level, and that this is an unappreciated cause of the significance of language models in AI today. The abandonment of structure has proven relevant for broader cognition.

So I see language as having forced AI to abandon stable structure. No nice continuation into layers at all. Instead, language blows out into eternally expanding menageries of structure, constrained only by the fact that they predict well in sequence. Paradoxically, sequential prediction becoming the only parameter, forced on us by language, is what liberated AI models from assumptions of structure. The revolution of "attention" was just that it enabled even more idiosyncratic context to distinguish this eternally expanding menagerie of structure: idiosyncratic context unlearnable by RNNs, and only poorly approximated by LSTMs before it. No stable layers.
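(To make "idiosyncratic context" concrete: in scaled dot-product attention every position weights every other position directly, rather than squeezing history through a single recurrent state. A toy sketch of the bare mechanism, not of any particular model:

    # Sketch only: the bare attention mechanism on random toy embeddings.
    import numpy as np

    def attention(Q, K, V):
        """Q, K, V: (seq_len, d) arrays; returns (seq_len, d)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)  # every position scores every position
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)  # softmax over positions
        return w @ V                        # a context-dependent mixture

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))        # 5 tokens, 8-dim embeddings
    print(attention(X, X, X).shape)    # (5, 8)

An RNN must compress everything before position i into one state vector; attention keeps every position addressable.)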
So I'm not surprised that no one has been able to structure whale song above the lexical level, and that we can't extract meaning from it. We can't extract meaning from LLMs either! We can identify no unifying structure in them. They're all still just increasingly flexible (and unreliable) lookup mechanisms.

And of course, you point out yourself that "The study of 8 years of whale song recordings did not analyze semantics or grammar". That is consistent with my assertion that there is no stable structure above the lexicon, but inconsistent with your belief in a neat continuation of structural layers into semantics and grammar. And you can only explain it by imagining, as usual, that the problem is "size": "I imagine the problem is the data set size. It is easy to train a lexical model like mine on 30 KB of text. You need a lot more data to train the higher layers in a language model."

Ha. "Size" is the new black. Ever more data, ever larger server farms. I'm surprised you didn't argue, with Yann LeCun, that whale song research needs to stop looking at whale song and use different data, to build a "world model" which, unlike language, will have the stable structure everyone insists on continuing to expect, even as language, through LLMs, keeps rubbing our noses in the utility of not expecting comprehensible stable structure!

-Rob

On Sat, Feb 8, 2025 at 1:59 AM Matt Mahoney <mattmahone...@gmail.com> wrote:

> Whale songs have a lexical structure like human speech. In 2000 I
> experimented with finding word boundaries in text without spaces. Infants
> 7-10 months old learn to segment continuous speech before learning words by
> finding boundaries with low mutual information across them.
>
> https://mattmahoney.net/dc/lex1.html
>
> Whale songs can also be partitioned using the same technique. And
> furthermore, the words have a Zipf distribution like all human languages,
> where the n'th most frequent word has a frequency proportional to 1/n.
>
> https://theconversation.com/whalesong-patterns-follow-a-universal-law-of-human-language-new-research-finds-249271
>
> The study of 8 years of whale song recordings did not analyze semantics or
> grammar. So we still don't know what the whales are saying. I imagine the
> problem is the data set size. It is easy to train a lexical model like mine
> on 30 KB of text. You need a lot more data to train the higher layers in a
> language model.
>
> Language evolved to be learnable by neural networks one layer at a time,
> segmentation first, then vocabulary at a constant rate (about 15 words per
> day, after 20 lifetime exposures per word), then semantics, then grammar.
> That's why neural networks have been so successful in developing LLMs.
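PS: For concreteness, the Zipf claim in the quoted study amounts to a check like the sketch below (my illustration, not the study's analysis): rank the segmented words by frequency, then fit log frequency against log rank; a slope near -1 is the Zipf signature.

    # Sketch only: Zipf check on an invented toy word list.
    import math
    from collections import Counter

    def zipf_slope(words):
        """Least-squares slope of log(frequency) vs log(rank)."""
        counts = sorted(Counter(words).values(), reverse=True)
        xs = [math.log(r) for r in range(1, len(counts) + 1)]
        ys = [math.log(c) for c in counts]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den

    words = "the dog saw the cat and the cat saw the dog by the tree".split()
    print(zipf_slope(words))  # drifts toward -1 as the sample grows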