On Sat, Jan 11, 2025 at 4:53 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> On Fri, Jan 10, 2025, 5:33 PM James Bowery <jabow...@gmail.com> wrote:
>
>> Stop palavering and start doing algorithmic information criterion
>> macrosocial model selection.
>
> That's what I'm doing when I project past trends into the future. You
> compress by fitting a line to the data points and encoding the differences.
>
> But that only finds correlations, not causations. If we think that A
> causes B, then we can test P(B|A) > P(B). But you could rewrite that as:
>
> P(A,B)/P(A) > P(B)
> P(A,B) > P(A)P(B)
> P(A,B)/P(B) > P(A)
> P(A|B) > P(A)
> B causes A.

Nope. That's statistics, not dynamics. You need state-space, i.e. dynamical, models to predict things in time, but you also need a dynamical, not statistical, information criterion for selecting among those models.

Popper and Kuhn both successfully attacked the foundation of data-driven scientific discovery of causality with their psychologically and rhetorically intense popularizations of “the philosophy of science” at the precise moment in history when it became practical to rigorously discriminate mere correlation from causation, even without controlled experimentation, by looking at the data.

I only became aware of this after attempting a first-order epidemiology of the rise of autism that had severely impacted colleagues of mine in Silicon Valley, an investigation I undertook because it was apparent that no one more qualified was bothering to do so. I did, however, expect to find a correlation with non-Western immigrants from India (gut bacteria), and found one. Now, having said that, I’m not here to make the case for that particular causal hypothesis; there are others I could set forth that I also expected and did find evidence for. What I’m here to point out is that my attempts to bring these hypotheses up were greeted with the usual “social science” rhetoric one expects: “Correlation doesn’t imply causation,” “Ecological correlations are invalid due to the ecological fallacy,” and so forth.

This got me interested in precisely how it is that “social science” purports to infer causation from data, experimental controls being the one widely accepted means of determining causality in the philosophy of science. That interest was amplified when, on something of a lark, I decided to take the data I had gathered to investigate the ecology of autism and see which of the ecological variables was the most powerful predictor of the other variables I had chosen. One variable in particular that I had been interested in, not for autism causation but for social causality in general, was the ratio of Jews to Whites in a human ecology at the State level in the US. Well, out of hundreds of variables, guess which one came out on top? Imagine the kind of rhetorical attacks on this “lark” of mine: same old, same old…

So my investigation of causal inference intensified. Eventually, circa 2004-2005, I intuited that data compression had the answer and suggested something I called “The C-Prize”: a prize that would pay out for incremental improvements in compression of a wide-ranging corpus of data, resulting in computational models of complex *dynamical systems*, covering everything from physics to macrosocial models. That’s when you, Matt Mahoney, of all people, alerted me to the information-theoretic work that distinguished between Shannon information and what is now called “Algorithmic Information”.
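To make the criterion concrete, here is a toy sketch of the idea (my own illustration, not the actual C-Prize machinery; zlib merely stands in for the uncomputable shortest program, and every name and number in it is made up for the example). Two candidate models of the same series are scored by a crude two-part description length, bits to state the model plus bits to encode its residuals, and the dynamical model wins because it leaves far less to encode.

    # Toy two-part description-length comparison (uses numpy and zlib).
    # zlib is only a stand-in for the (uncomputable) shortest program.
    import zlib
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "dynamical" series: a damped second-order recurrence plus noise.
    x = np.zeros(500)
    x[0], x[1] = 1.0, 0.9
    for t in range(2, len(x)):
        x[t] = 1.5 * x[t - 1] - 0.6 * x[t - 2] + 0.05 * rng.standard_normal()

    def description_length_bits(residuals, n_params):
        # Crude two-part code: 32 bits per parameter plus compressed residuals.
        quantized = np.round(residuals * 1000).astype(np.int32).tobytes()
        return 32 * n_params + 8 * len(zlib.compress(quantized))

    # Candidate 1, purely statistical: a constant mean.
    resid_static = x - x.mean()

    # Candidate 2, dynamical: second-order autoregression fit by least squares.
    lagged = np.column_stack([x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(lagged, x[2:], rcond=None)
    resid_dyn = x[2:] - lagged @ coef

    print("static model, bits:   ", description_length_bits(resid_static, 1))
    print("dynamical model, bits:", description_length_bits(resid_dyn, 2))

The point of the toy is only that the selection criterion is the total size of an executable description of the data, not a goodness-of-fit statistic.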
The seminal work in Algorithmic Information occurred in the late 1950s and early 1960s, precisely when Moore’s Law was taking off in its relentlessly exponentiating power. The Algorithmic Information content of a data set is the number of bits in its smallest executable archive, that is, the smallest program that outputs that data. Shannon information is basically just statistical. Think of the digits of pi: Shannon treats the information content as the string of digits itself, while Algorithmic Information treats it as the size of the program that outputs the digits of pi.

That discovery threatened to bring the social sciences to heel with a rigorous and principled information criterion for model selection, provably superior to all of the other model selection criteria used by the social sciences. Moreover, the models so selected would necessarily be causal in nature and would be amenable to using the power of silicon to make predictions without any kind of ideological bias. This, I strongly believe, was the precise reason Popper and Kuhn committed their acts of violence against science at the precise moment in history they did.

I can’t tell you how depressing it is that I can’t get this across to people even after 18 years of Marcus Hutter sponsoring the C-Prize out of his own pocket.

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/Tff34429f975bba30-M87c6ca7e09fb9ebd999412de
Delivery options: https://agi.topicbox.com/groups/agi/subscription