Whale songs have a lexical structure like human speech. In 2000 I experimented with finding word boundaries in text with the spaces removed: https://mattmahoney.net/dc/lex1.html  Infants 7-10 months old learn to segment continuous speech, before learning any words, by finding the points where the mutual information between adjacent sounds is low.
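As a rough illustration (a minimal sketch, not the lex1.html algorithm itself), the following Python places a boundary wherever the pointwise mutual information between adjacent characters falls below a threshold. The character-bigram model and the zero threshold are simplifying assumptions.

import math
from collections import Counter

def segment(text, threshold=0.0):
    # Character unigram and adjacent-pair counts over the whole stream.
    uni = Counter(text)
    bi = Counter(zip(text, text[1:]))
    n = len(text)
    words, start = [], 0
    for i in range(1, n):
        a, b = text[i - 1], text[i]
        # Pointwise mutual information between the characters on either side
        # of position i.
        pmi = math.log((bi[(a, b)] / (n - 1)) / ((uni[a] / n) * (uni[b] / n)))
        if pmi < threshold:   # low mutual information across the gap -> likely boundary
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

print(segment("thedogsawthecatthecatsawthedog"))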
Whale songs can be partitioned using the same technique. Furthermore, the resulting words have a Zipf distribution like all human languages, where the n'th most frequent word has a frequency proportional to 1/n: https://theconversation.com/whalesong-patterns-follow-a-universal-law-of-human-language-new-research-finds-249271

The study of 8 years of whale song recordings did not analyze semantics or grammar, so we still don't know what the whales are saying. I suspect the problem is data set size. It is easy to train a lexical model like mine on 30 KB of text, but you need a lot more data to train the higher layers of a language model. Language evolved to be learnable by neural networks one layer at a time: segmentation first, then vocabulary at a constant rate (about 15 words per day, after 20 lifetime exposures per word), then semantics, then grammar. That's why neural networks have been so successful in developing LLMs.
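For what it's worth, checking the 1/n claim on a word list is simple. The sketch below (corpus.txt is a hypothetical file name) fits the slope of log frequency against log rank by least squares, which should come out near -1 for a Zipf distribution. This is only an illustrative check, not the method used in the whale song study.

import math
from collections import Counter

def zipf_slope(words):
    # Word frequencies in rank order, most frequent first.
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Least-squares slope of log(frequency) vs. log(rank);
    # a value near -1 means frequency is roughly proportional to 1/rank.
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

words = open("corpus.txt").read().split()   # hypothetical corpus file
print(zipf_slope(words))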