I did something along these lines with the LaboratoryOfTheStates: take every pair of columns, apply every combination of the transforms (identity, sqrt, log) to the two variables, drop the highest and lowest datapoints (a standard practice to damp outlier effects), and pick the transform combination that yields the highest r^2 (coefficient of determination) for that pair. Averaging each column's best r^2 across all the other columns then rank-orders the variables (columns) by how much each one tells you about the rest of the dataset.

That's when I got myself into some *serious* trouble (Crimestop-wise), which got me thinking about something more principled that could get me *out* of trouble by eliminating any suspicion that I was engaged in motivated reasoning: hence the C-Prize idea.
So no, it doesn't "make sense" in the sense I mean by seeking algorithmic information approximation as the information criterion for model selection. However, I think the algorithmic Markov paper is saying something different: any Markov process has a notion of "state" which, in turn, implies an ordering of *rows* rather than columns.

On Thu, May 1, 2025 at 6:21 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> On Thu, May 1, 2025 at 2:50 PM Matt Mahoney <mattmahone...@gmail.com> wrote:
> >
> > I think I understand. We say that X causes Y if we can describe Y as a
> > function of X. If the simplest description of X and Y has the form
> > Y = f(X), then we are using algorithmic information to find causality.
> > For example, given
> >
> > X Y
> > - -
> > 1 1
> > 2 2
> > 3 2
> >
> > I can write Y as a function of X, but not X as a function of Y. Thus,
> > the DAG X -> Y is more plausible than Y -> X.
> >
> > To make this practical, the paper postulates a noise signal, as
> > Y = f(X, N), where N can be 0 in the first case but not in the second.
> > Thus, less algorithmic information is needed to encode the first case.
>
> But is this a reasonable definition of causality? To test if Y is a
> simple function of X, we would use compression to approximate K(Y|X),
> and say that X causes Y if this value is smaller than K(X|Y). To
> measure K(Y|X) you would compress X (a column of numbers in a table),
> and subtract that from the compressed size of X concatenated with Y.
> To test the causal relationships between n variables, you need to
> compress n^2 pairs of columns. But observe that
>
> K(Y|X) + K(X) = K(X, Y) = K(X|Y) + K(Y)
>
> So you really only need to compare K(X) and K(Y). Whichever is larger
> causes the other. You compress n columns and sort them by size from
> largest to smallest. That is your DAG.
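For concreteness, the compression test described there can be sketched like this. This is a minimal illustration, with zlib standing in as the compressor and the byte serialization of a column left entirely open; neither choice comes from the paper:

```python
import zlib

def approx_k(data: bytes) -> int:
    """Approximate Kolmogorov complexity K(data) by compressed size."""
    return len(zlib.compress(data, 9))

def approx_k_cond(y: bytes, x: bytes) -> int:
    """Approximate K(Y|X) as K(X concatenated with Y) - K(X)."""
    return approx_k(x + y) - approx_k(x)

def infer_direction(x: bytes, y: bytes) -> str:
    """Propose X -> Y if K(Y|X) < K(X|Y), else Y -> X.

    By the chain rule K(X,Y) ~= K(X) + K(Y|X) ~= K(Y) + K(X|Y),
    comparing the conditionals is (up to compressor noise) the same
    as asking whether K(X) > K(Y): the larger side is the cause.
    """
    if approx_k_cond(y, x) < approx_k_cond(x, y):
        return "X -> Y"
    return "Y -> X"
```

The shortcut in the quoted message follows: rather than compressing n^2 column pairs, compress each of the n columns once and sort by compressed size, largest (most complex, hence proposed cause) first.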
> This would be easy to do with thousands of rows and columns, for
> example, LaboratoryOfTheCounties, where the rows are counties and the
> columns are things like the percent of population under age 5 or the
> number of farms between 50 and 100 acres, to see which causes the
> other.
>
> But does that make sense?
>
> --
> -- Matt Mahoney, mattmahone...@gmail.com

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: https://agi.topicbox.com/groups/agi/T0f47884dae19d52d-M8c387ff42db47a221cb4ee05
Delivery options: https://agi.topicbox.com/groups/agi/subscription