Whale songs have a lexical structure like human speech. In 2000 I
experimented with finding word boundaries in text from which the spaces had
been removed. Infants 7-10 months old learn to segment continuous speech
before they learn words, by finding the boundaries where mutual information
across them is low.

https://mattmahoney.net/dc/lex1.html
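
To make the technique concrete, here is a minimal sketch in Python. It is
not the lex1 program itself; the toy vocabulary, the random word-salad
corpus, and the PMI threshold are assumptions for illustration. It places a
word boundary wherever the pointwise mutual information (PMI) between
adjacent characters falls below a threshold:

    import math
    import random
    from collections import Counter

    # Toy corpus: a random word salad with the spaces removed.
    random.seed(0)
    VOCAB = ["the", "dog", "saw", "ran", "to", "big", "red"]
    text = "".join(random.choice(VOCAB) for _ in range(2000))

    def segment(text, threshold=1.2):
        # Character and adjacent-character-pair counts.
        unigrams = Counter(text)
        bigrams = Counter(zip(text, text[1:]))
        n = len(text)
        words, start = [], 0
        for i in range(1, n):
            a, b = text[i - 1], text[i]
            # PMI = log P(a,b) / (P(a) P(b)). A low value means the pair
            # occurs about as often as chance, suggesting a word boundary.
            pmi = math.log((bigrams[a, b] / (n - 1)) /
                           ((unigrams[a] / n) * (unigrams[b] / n)))
            if pmi < threshold:
                words.append(text[start:i])
                start = i
        words.append(text[start:])
        return words

    print(segment(text)[:12])

On this toy corpus most cuts land on true word boundaries, though some
adjacent words merge (for example "the" followed by "dog", because the pair
"ed" is also common inside "red"). On real text the threshold has to be
tuned.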

Whale songs can be partitioned using the same technique. Furthermore, the
resulting words have a Zipf distribution, as in all human languages: the
n'th most frequent word has a frequency proportional to 1/n.

https://theconversation.com/whalesong-patterns-follow-a-universal-law-of-human-language-new-research-finds-249271
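
A quick way to test the Zipf claim on any segmented token stream is to
check that the log-log rank-frequency slope is near -1 (frequency
proportional to 1/rank). Here is a minimal sketch; the least-squares fit
and the toy input are my choices, not the paper's method:

    import math
    from collections import Counter

    def zipf_slope(tokens):
        # Frequencies in descending order; rank 1 is the most frequent word.
        freqs = sorted(Counter(tokens).values(), reverse=True)
        xs = [math.log(r) for r in range(1, len(freqs) + 1)]
        ys = [math.log(f) for f in freqs]
        # Least-squares slope of log(frequency) against log(rank).
        # A slope near -1 indicates a Zipf distribution.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        return (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
                sum((x - mx) ** 2 for x in xs))

    tokens = "the dog saw the cat and the dog ran to the cat".split()
    print(zipf_slope(tokens))  # roughly -0.8 on this tiny sample

A serious test needs far more text than this, since the tail of the
rank-frequency curve only fills in with a large vocabulary.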

The study, based on 8 years of whale song recordings, did not analyze
semantics or grammar, so we still don't know what the whales are saying. I
suspect the problem is data set size. It is easy to train a lexical model
like mine on 30 KB of text; you need far more data to train the higher
layers of a language model.

Language evolved to be learnable by neural networks one layer at a time:
segmentation first, then vocabulary at a roughly constant rate (about 15
words per day, after about 20 lifetime exposures per word), then semantics,
then grammar. That's why neural networks have been so successful as the
basis of LLMs.
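
Those two numbers imply a daily budget of word exposures and a plausible
total vocabulary, which trivial arithmetic can check (the 10-year learning
window is my assumption, not a claim from the post):

    # Back-of-the-envelope arithmetic for the rates above.
    WORDS_PER_DAY = 15       # new words learned per day
    EXPOSURES_PER_WORD = 20  # lifetime exposures needed to learn a word
    YEARS = 10               # assumed learning window

    exposures_per_day = WORDS_PER_DAY * EXPOSURES_PER_WORD  # amortized: 300
    vocabulary = WORDS_PER_DAY * 365 * YEARS                # 54,750 words

    print(f"~{exposures_per_day} word exposures/day, "
          f"~{vocabulary:,} words after {YEARS} years")

A vocabulary of several tens of thousands of words after a decade or two is
consistent with common estimates of adult vocabulary size.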
