On Thu, May 15, 2025 at 12:39 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> > In a text compressor, the model is updated after each prediction.

Single-pass text compressors do that. Existing "large models" are multi-pass.

As long as the existing scaling laws of machine learning remain undisturbed by fundamental advances (as might be discovered, at a relatively low risk-to-payoff ratio, by, for example, a $1e9 Hutter Prize purse), there is good reason to believe that all downstream stages built on the "foundation model" aka "pre-training model" aka (?) not only benefit from a better approximation of the algorithmic information of the corpus, but that it may be futile to patch their inadequacies by any amount of downstream modification.

For example, all attempts at suppressing "toxicity", such as "cleaning" the data and then using RL, turn out to have been, in hindsight, obviously wrong. All you need to do is let the multi-pass compression of the unedited corpus during "pre-training" of the "foundation model" do the forensic epistemology. This doesn't tell you what is "toxic" vs what is not "toxic", but it _does_ enable the RL to figure out what you mean by "toxic" more efficiently when you whack it upside the head for being "toxic". This is because forensic epistemology -- i.e., ruthless truth discovery -- provides a predictive ontology.

You don't need ongoing updates from "news" to get such essential modeling work done, and there is reason to believe it may be wasted effort to add new data to the mix unless you are willing to unlearn a great deal that existing scaling laws will force you to relearn at exponential cost. You will want to do that only when there is a sufficient backlog of "news" to justify the cost.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tdc5c19d0f38aacd6-M21cd4aa52a24c88e49cfb192
Delivery options: https://agi.topicbox.com/groups/agi/subscription
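For concreteness, the "model updated after each prediction" point at the top of the thread can be sketched in a few lines. This is a minimal illustration I've added (not any particular compressor's code): an order-0 adaptive bit predictor whose counts are updated only after each prediction is issued, which is what keeps a single-pass compressor and its decompressor in sync without a second pass over the data.

```python
import math

def single_pass_code_length(bits):
    """Ideal code length, in bits, of a bit stream under a single-pass
    order-0 adaptive model with Laplace-smoothed counts:
        p(next bit = 1) = (ones + 1) / (total + 2)
    The model predicts first, then updates -- the decoder can replay the
    same sequence of predictions, so no second pass is needed."""
    ones, total = 0, 0
    code_len = 0.0
    for b in bits:
        p1 = (ones + 1) / (total + 2)   # predict before seeing the bit
        p = p1 if b == 1 else 1.0 - p1
        code_len += -math.log2(p)       # ideal arithmetic-coding cost
        ones += b                        # update AFTER the prediction
        total += 1
    return code_len
```

A heavily biased stream costs far fewer bits than a balanced one under this model, even though the model started with no statistics -- that is the adaptivity a single pass buys you.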