I did something along these lines with the LaboratoryOfTheStates: take all
pairs of columns, apply every combination of (identity, sqrt, log)
transforms to the two columns, drop the highest and lowest datapoints
(standard practice to inhibit outlier effects), and keep the transform
combination that yields the highest r^2 (coefficient of determination) for
that pair. Averaging each column's best-pair r^2 across all other
variables then rank-orders the variables (columns) by how much they tell
you about the rest of the dataset (a rough sketch is below). That's when I
got myself into some *serious* trouble (Crimestop-wise), which got me
thinking about something more principled that could get me *out* of
trouble by eliminating any suspicion that I was engaged in motivated
reasoning: hence the C-Prize idea.
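
For concreteness, here's a rough sketch of that procedure. It is a minimal
version under stated assumptions: the table lives in a pandas DataFrame of
numeric columns, one point is trimmed at each extreme, and Pearson r^2 on
the transformed pair stands in for the coefficient of determination.

import numpy as np
import pandas as pd
from itertools import product

TRANSFORMS = (
    lambda v: v,   # identity
    np.sqrt,       # only valid for non-negative values; invalid results masked below
    np.log,        # only valid for positive values; invalid results masked below
)

def best_r2(x, y, trim=1):
    # Best r^2 over all (transform_x, transform_y) combinations, after
    # dropping the 'trim' highest and lowest datapoints (by x).
    order = np.argsort(x)
    keep = order[trim:len(order) - trim]
    x, y = x[keep], y[keep]
    best = 0.0
    for fx, fy in product(TRANSFORMS, repeat=2):
        with np.errstate(all="ignore"):
            tx, ty = fx(x), fy(y)
        ok = np.isfinite(tx) & np.isfinite(ty)
        if ok.sum() < 3:
            continue
        r = np.corrcoef(tx[ok], ty[ok])[0, 1]
        if np.isfinite(r):
            best = max(best, r * r)
    return best

def rank_columns(df: pd.DataFrame):
    # Rank columns by their mean best-pair r^2 against every other column.
    scores = {}
    for c in df.columns:
        others = [best_r2(df[c].to_numpy(float), df[o].to_numpy(float))
                  for o in df.columns if o != c]
        scores[c] = float(np.mean(others))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)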

So, no, it doesn't "make sense" in the sense I mean when I talk about
seeking an algorithmic information approximation as the information
criterion for model selection.

However, I think the algorithmic Markov paper is saying something different:

Any Markov process has a notion of "state", which in turn implies an
ordering of *rows* rather than columns.



On Thu, May 1, 2025 at 6:21 PM Matt Mahoney <mattmahone...@gmail.com> wrote:

> On Thu, May 1, 2025 at 2:50 PM Matt Mahoney <mattmahone...@gmail.com>
> wrote:
> >
> > I think I understand. We say that X causes Y if we can describe Y as a
> > function of X. If the simplest description of X and Y has the form Y =
> > f(X), then we are using algorithmic information to find causality. For
> > example,
> >
> > X Y
> > - -
> > 1 1
> > 2 2
> > 3 2
> >
> > then I can write Y as a function of X, but not X as a function of Y.
> > Thus, the DAG X -> Y is more plausible than Y -> X.
> >
> > To make this practical, the paper postulates a noise signal, as Y =
> > f(X, N), where N can be 0 in the first case but not in the second.
> > Thus, less algorithmic information is needed to encode the first case.
> 
> But is this a reasonable definition of causality? To test if Y is a
> simple function of X, we would use compression to approximate K(Y|X),
> and say that X causes Y if this value is smaller than K(X|Y). To
> measure K(Y|X) you would compress X (a column of numbers in a table),
> and subtract that from the compressed size of X concatenated with Y.
> To test the causal relationships between n variables, you need to
> compress n^2 pairs of columns. But observe that
> 
> K(Y|X) + K(X) = K(X, Y) = K(X|Y) + K(Y)
> 
> So you really only need to compare K(X) and K(Y). Whichever is larger
> causes the other. You compress n columns and sort them by size from
> largest to smallest. That is your DAG.
> 
> This would be easy to do with thousands of rows and columns, for
> example, LaboratoryOfTheCounties, where the rows are counties and the
> columns are things like the percent of population under age 5 or the
> number of farms between 50 and 100 acres, to see which causes the
> other.
> 
> But does that make sense?
> 
> --
> -- Matt Mahoney, mattmahone...@gmail.com
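
For anyone who wants to try the compression test described above, here is
a minimal sketch. zlib is a crude stand-in for K, and the newline-separated
serialization of each column is an arbitrary choice; on short columns the
compressor's overhead swamps the signal, so long columns and a stronger
compressor are needed before the comparison means much.

import zlib

def C(b: bytes) -> int:
    # Compressed size in bytes: a crude stand-in for Kolmogorov complexity K.
    return len(zlib.compress(b, 9))

def serialize(col) -> bytes:
    # One arbitrary serialization: newline-separated decimal strings.
    return "\n".join(str(v) for v in col).encode()

def K_cond(y_col, x_col) -> int:
    # K(Y|X) approximated as C(X concatenated with Y) - C(X).
    x = serialize(x_col)
    return C(x + b"\n" + serialize(y_col)) - C(x)

def plausible_direction(x_col, y_col) -> str:
    # X -> Y is favored when Y is cheaper to describe given X than vice versa.
    return "X -> Y" if K_cond(y_col, x_col) < K_cond(x_col, y_col) else "Y -> X"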
