On Tue, Jul 23, 2024 at 7:07 PM James Bowery <jabow...@gmail.com> wrote:
>
> That sounds like you're saying benchmarks for language modeling algorithms
> aka training algorithms are uninteresting because we've learned all we need
> to learn about them. Surely you don't mean to say that!

I mean to say that testing algorithms and testing language models are different things. Language models have to be tested the way they will be used: on terabytes of up-to-date training data, with lots of users. It is an expensive, manual process of curating the training data, looking at the responses, and providing feedback. The correct output is no longer the most likely prediction, for example when the LLM is going to be used in a customer service position. Testing on a standard compression benchmark like the Hutter Prize is the easy part.

--
-- Matt Mahoney, mattmahone...@gmail.com

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T6510028eea311a76-Ma7f4afd32f70b9a207fdb388