On Tue, Jul 23, 2024 at 7:15 PM Matt Mahoney <mattmahone...@gmail.com>
wrote:

> On Tue, Jul 23, 2024 at 7:07 PM James Bowery <jabow...@gmail.com> wrote:
> >
> > That sounds like you're saying benchmarks for language modeling
> algorithms aka training algorithms are uninteresting because we've learned
> all we need to learn about them.  Surely you don't mean to say that!
>
> I mean to say that testing algorithms and testing language models are
> different things.


That was my point.

On Tue, Jul 23, 2024 at 2:08 PM James Bowery <jabow...@gmail.com> wrote:

> I directed the question at you because you are likely to understand how
> different training and inference are ...
>



> Language models have to be tested in the way they
> are to be used, on terabytes of up to date training data with lots of
> users.


Obviously, except in the case where we are interested in benchmarking the
modeling algorithms, aka the training algorithms, against scaling laws,
which bear both on the performance of the training process and on the
performance of the resulting model.
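To make that concrete, here is a minimal sketch (hypothetical numbers and
names throughout) of how one might compare the data efficiency of two
training algorithms by fitting a power-law scaling curve to validation
losses measured at a few affordable dataset sizes:

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(D, a, b, c):
    # Typical scaling-law form: loss falls off as a power law in data size.
    return a * D ** (-b) + c

D = np.array([1.0, 10.0, 100.0, 1000.0])     # millions of training tokens
loss_alg1 = np.array([4.1, 3.4, 2.9, 2.6])   # fabricated measurements
loss_alg2 = np.array([3.8, 3.0, 2.5, 2.2])   # fabricated measurements

for name, loss in [("algorithm 1", loss_alg1), ("algorithm 2", loss_alg2)]:
    (a, b, c), _ = curve_fit(scaling_law, D, loss,
                             p0=[4.0, 0.1, 0.0], maxfev=10000)
    print(f"{name}: loss ~= {a:.2f} * D^(-{b:.3f}) + {c:.2f}")

Comparing the fitted curves, rather than a single trained model, is what
lets a cheap experiment say something about scales the researchers cannot
afford, which is the point of benchmarking the training algorithm itself.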

The issue of "data efficiency", for one example, is far from settled,
despite the motivated reasoning of those with access to enormous
resources. See, e.g.,

https://arxiv.org/pdf/2201.02177

> Abstract: In this paper we propose to study generalization of neural
> networks on small algorithmically generated datasets. In this setting,
> questions about data efficiency, memorization, generalization, and speed of
> learning can be studied in great detail. In some situations we show that
> neural networks learn through a process of “grokking” a pattern in the
> data, improving generalization performance from random chance level to
> perfect generalization, and that this improvement in generalization can
> happen well past the point of overfitting. We also study generalization as
> a function of dataset size and find that smaller datasets require
> increasing amounts of optimization for generalization. We argue that these
> datasets provide a fertile ground for studying a poorly understood aspect
> of deep learning: generalization of overparametrized neural networks beyond
> memorization of the finite training dataset.
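For concreteness, here is a minimal sketch (my own, not code from the
paper) of the kind of small algorithmically generated dataset the abstract
describes: the full table of a binary operation such as addition mod a
small prime, with a fraction of the table held out:

import random

p = 97  # small prime; the full operation table has p*p examples
pairs = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

random.seed(0)
random.shuffle(pairs)
split = int(0.5 * len(pairs))  # training fraction is the data-efficiency knob
train, val = pairs[:split], pairs[split:]

print(len(train), "training examples,", len(val), "held out")

A small network trained to predict the third element from the first two
typically memorizes the training split quickly; grokking is the much later
jump of held-out accuracy from chance to near 100%.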


and the derivative work
https://github.com/ironjr/grokfast

> Abstract: One puzzling artifact in machine learning dubbed grokking is
> where delayed generalization is achieved tenfolds of iterations after near
> perfect overfitting to the training data. Focusing on the long delay itself
> on behalf of machine learning practitioners, our goal is to accelerate
> generalization of a model under grokking phenomenon. By regarding a series
> of gradients of a parameter over training iterations as a random signal
> over time, we can spectrally decompose the parameter trajectories under
> gradient descent into two components: the fast-varying,
> overfitting-yielding component and the slow-varying,
> generalization-inducing component. This analysis allows us to accelerate
> the grokking phenomenon more than ×50 with only a few lines of code that
> amplifies the slow-varying components of gradients. The experiments show
> that our algorithm applies to diverse tasks involving images, languages,
> and graphs, enabling practical availability of this peculiar artifact of
> sudden generalization.
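As I read that abstract (and the repo's README), the few lines of code
amount to low-pass filtering each parameter's gradient history and adding
an amplified copy of the slow component back before the optimizer step.
A hedged sketch of that idea, not a verbatim copy of the grokfast code,
with hyperparameter names (alpha, lamb) of my own choosing:

import torch

def amplify_slow_gradients(model, ema_state, alpha=0.98, lamb=2.0):
    """Add an amplified EMA (slow component) of each gradient in place."""
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        g = param.grad
        ema = ema_state.get(name)
        # Exponential moving average acts as the low-pass filter.
        ema = g.detach().clone() if ema is None else alpha * ema + (1 - alpha) * g.detach()
        ema_state[name] = ema
        # Boost the slow-varying, generalization-inducing component.
        param.grad = g + lamb * ema

# Usage inside a standard training loop (sketch):
#   loss.backward()
#   amplify_slow_gradients(model, ema_state)
#   optimizer.step(); optimizer.zero_grad()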


One of the earliest state space model breakthroughs demonstrated a 10x
improvement in data efficiency or computational efficiency over
transformers within the range of scales the researchers could afford, but
it was ignored, and they couldn't get funding to extend the scaling-law
study to larger scales. Nowadays, of course, everyone is all over state
space models because of their modeling efficiency.




> It is an expensive, manual process of curating the training
> data, looking at the responses, and providing feedback. The correct
> output is no longer the most likely prediction, like if the LLM is
> going to be used in a customer service position or something. Testing
> on a standard compression benchmark like the Hutter prize is the easy
> part.
> 
> --
> -- Matt Mahoney, mattmahone...@gmail.com
