On Tue, Jul 23, 2024 at 7:07 PM James Bowery <jabow...@gmail.com> wrote:
>
> That sounds like you're saying benchmarks for language modeling algorithms 
> aka training algorithms are uninteresting because we've learned all we need 
> to learn about them.  Surely you don't mean to say that!

I mean to say that testing algorithms and testing language models are
different things. Language models have to be tested in the way they
are to be used, on terabytes of up-to-date training data with lots of
users. It is an expensive, manual process of curating the training
data, looking at the responses, and providing feedback. The correct
output is no longer the most likely prediction, for example if the
LLM is going to be used in a customer service role. Testing on a
standard compression benchmark like the Hutter Prize is the easy part.
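For contrast, here is a rough sketch of what a compression benchmark actually scores. The compressed size is essentially the sum of -log2 p(next symbol | history) over the test text, so a predictor is judged only on how much probability it assigns to what really comes next. The model below is a toy adaptive order-0 byte model with add-one smoothing, standing in for an LLM. It is an assumption for illustration, not how any Hutter Prize entry works. The prize also counts the size of the decompressor, which this sketch ignores.

```python
import math
from collections import Counter


def ideal_code_length_bits(text: str) -> float:
    """Total ideal code length of `text` in bits under a toy predictor.

    Hypothetical stand-in model: adaptive order-0 over bytes. Each byte is
    predicted from counts of the bytes seen so far, with add-one (Laplace)
    smoothing over all 256 possible byte values.
    """
    data = text.encode("utf-8")
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for b in data:
        # Probability the model assigned to the byte that actually occurred.
        p = (counts[b] + 1) / (seen + 256)
        # An arithmetic coder would spend about -log2(p) bits on it.
        total_bits += -math.log2(p)
        counts[b] += 1
        seen += 1
    return total_bits


def bits_per_byte(text: str) -> float:
    """Average code length per input byte (8.0 means no compression)."""
    n = len(text.encode("utf-8"))
    return ideal_code_length_bits(text) / n if n else 0.0


if __name__ == "__main__":
    sample = "the cat sat on the mat. " * 40
    print(f"{bits_per_byte(sample):.3f} bits/byte")
```

This score rewards well-calibrated probabilities on held-out text. It says nothing about whether a response is what users want, and that is the expensive, manual part of testing a deployed model.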

-- 
-- Matt Mahoney, mattmahone...@gmail.com

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T6510028eea311a76-Ma7f4afd32f70b9a207fdb388