Re: [agi] "Causal inference using the algorithmic Markov condition" by Janzing and Schölkopf

James Bowery Fri, 02 May 2025 12:08:25 -0700

Try these prompts on your favorite language model:

When investigating hypotheses regarding causality, if one has many
variables being measured (columns) compared to the number of cases measured
(rows), what are the guidelines about the ratio of columns to rows?


In situations where the data being modeled are many terabytes of text drawn
from the entire body of human knowledge, what explains the utility of the
resulting world models given that one may view next token prediction as a
single time series variable with literally trillions of cases?


On Fri, May 2, 2025 at 1:31 PM James Bowery <jabow...@gmail.com> wrote:

> Let me elaborate on what I mean by "ordering of *rows* rather than
> columns" in the present context:
>
> In my exploratory data analysis for autism using the Laboratory of the
> States data, there were 51 "rows":  The 50 States plus Washington, D.C.
> There is no "time" apparent in these rows unless one posits Markov
> processes *latent in the data*.  By that I mean (and this is something I
> discussed with you some months ago in this very forum) some "States" may be
> in a different "state" of time evolution of the underlying process: they
> may be "behind the times" or "ahead of the times" so to speak.  This spread
> of evolutionary states across States results in a scatter between two
> variables.  One may *model* this process by some sort of regression fit
> between the States.  Let's say, for instance, there is a time progression
> of urbanization of the population.  One might posit all manner of causal
> laws involving an individual experiencing high population density
> environments vs those experiencing low population density environments.
> Indeed, Steve Sailer
> <https://www.amazon.com/Noticing-Essential-1973-2023-Steve-Sailer/dp/1959403028>
> (who used my old, now-defunct laboratoryopfthestates.org website quite a
> bit) got himself in some serious Crimestop trouble by "noticing"
> <https://web.archive.org/web/20050829182413/http://www.vdare.com/sailer/041212_secret.htm>:
>
>> Overall, Bush carried the top 25 states ranked on years married for white
>> women. The correlation coefficient with Bush's share of the vote is 0.91,
>> or 83 percent of the variation "explained." That's extremely high. Years
>> married also correlates with the 2000 election results at the 0.89 level
>> (80 percent). So it's no fluke.
>
>
>> The r-squared when years married and fertility are combined in a multiple
>> regression model improves to 88 percent. (Small-sounding change, perhaps,
>> but actually an important (30%) reduction in the unaccounted variation -
>> from 17 percent to 12 percent.)
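
The arithmetic in the two quoted passages can be checked in a few lines. This is only a sketch: the function names are illustrative, and the inputs are the rounded figures quoted above. Squaring the rounded 0.89 gives about 79 percent rather than the quoted 80; that gap presumably comes from rounding of r in the source, which is an assumption here.

```python
def explained(r: float) -> float:
    """Share of variance 'explained' by a correlation r (that is, r squared)."""
    return r * r

def relative_reduction(before: float, after: float) -> float:
    """Fractional drop in unexplained variation going from `before` to `after`."""
    return (before - after) / before

# Figures as quoted above (already rounded in the source).
print(f"2004: r = 0.91 -> r^2 = {explained(0.91):.3f}")
print(f"2000: r = 0.89 -> r^2 = {explained(0.89):.3f}")
print(f"unexplained 17% -> 12%: reduction = {relative_reduction(17, 12):.1%}")
```
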
>
>
> That r^2 is among the highest if not *the* highest ever found in the
> social sciences between two variables that are *both* salient *and*
> non-trivially related. You will also notice that website no longer exists
> due to lawfare against it by the same person that prosecuted Trump.  Sailer
> started down the path toward that discovery by first looking at what he
> called "affordable family formation" and then noticing the white TFR was a
> strong predictor of electoral outcome.
>
> The accusation that Sailer is a "white supremacist" followed on such
> Crimethink about TFR -- which boils down to an indictment of his noticing
> such a huge r^2 as resulting merely from "reasoning motivated by a desire
> to kill all the nonwhites" or some such transparent bullshit -- bullshit
> that is nevertheless highly politically motivating to places like Harvard's
> sociology department.
>
> On Fri, May 2, 2025 at 12:20 PM James Bowery <jabow...@gmail.com> wrote:
>
>> I did something along these lines with the LaboratoryOfTheStates by
>> taking all pairs of columns, applying all combinations of (identity, sqrt,
>> log) between them, dropping the high and low datapoints (standard practice
>> to inhibit outlier effects), picking the combination that results in the
>> highest r^2 (coefficient of determination) for that pair (under that
>> combination), and then averaging across all other variables to rank order
>> variables (columns) according to how much they tell you about the rest of
>> the dataset. That's when I got myself into some *serious* trouble
>> (Crimestop-wise) which resulted in me thinking about something more
>> principled that could get me *out* of trouble by eliminating any
>> suspicion that I was engaged in motivated reasoning:  Hence the C-Prize
>> idea.
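
The procedure described in the paragraph above can be sketched roughly as follows. This is not the original LaboratoryOfTheStates code. Several details are assumptions: the trimming rule (drop the rows holding the minimum or maximum of either column in the pair), strictly positive data so that sqrt and log are defined, and all function names.

```python
import math
from itertools import combinations, product

# The three transforms named in the message.
TRANSFORMS = {"identity": lambda v: v, "sqrt": math.sqrt, "log": math.log}

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def trim_extremes(xs, ys):
    """Assumed trimming rule: drop rows holding the min or max of either column."""
    drop = {xs.index(min(xs)), xs.index(max(xs)),
            ys.index(min(ys)), ys.index(max(ys))}
    keep = [i for i in range(len(xs)) if i not in drop]
    return [xs[i] for i in keep], [ys[i] for i in keep]

def best_r_squared(xs, ys):
    """Highest r^2 over all nine (transform of x, transform of y) combinations."""
    xs, ys = trim_extremes(xs, ys)
    return max(
        r_squared([f(x) for x in xs], [g(y) for y in ys])
        for f, g in product(TRANSFORMS.values(), repeat=2)
    )

def rank_columns(table):
    """table: dict of column name -> list of strictly positive numbers (2+ columns).

    Returns (name, mean best r^2 against every other column), highest first.
    """
    scores = {name: [] for name in table}
    for a, b in combinations(table, 2):
        s = best_r_squared(table[a], table[b])
        scores[a].append(s)
        scores[b].append(s)
    means = {name: sum(v) / len(v) for name, v in scores.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

Because the transform pair is chosen to maximize r^2, the resulting scores are optimistically biased, which is part of why a more principled criterion is attractive.
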
>>
>> So, no, it doesn't "make sense" in the sense I mean by seeking algorithmic
>> information approximation as the information criterion for model
>> selection.
>>
>> However, I think the algorithmic Markov paper is saying something
>> different:
>>
>> Any Markov process has a notion of "state" which, in turn, has a notion
>> of ordering of *rows* rather than columns.
>>
>>
>>
>> On Thu, May 1, 2025 at 6:21 PM Matt Mahoney <mattmahone...@gmail.com>
>> wrote:
>>
>>> On Thu, May 1, 2025 at 2:50 PM Matt Mahoney <mattmahone...@gmail.com>
>>> wrote:
>>> >
>>> > I think I understand. We say that X causes Y if we can describe Y as a
>>> > function of X. If the simplest description of X and Y has the form Y =
>>> > f(X), then we are using algorithmic information to find causality. For
>>> > example,
>>> >
>>> > X Y
>>> > - -
>>> > 1 1
>>> > 2 2
>>> > 3 2
>>> >
>>> > then I can write Y as a function of X, but not X as a function of Y.
>>> > Thus, the DAG X -> Y is more plausible than Y -> X.
>>> >
>>> > To make this practical, the paper postulates a noise signal, as Y =
>>> > f(X, N), where N can be 0 in the first case but not in the second.
>>> > Thus, less algorithmic information is needed to encode the first case.
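
The asymmetry in the three-row example can be checked mechanically. The helper below is a hypothetical illustration, not something from the paper: it only tests whether one column is a deterministic function of the other, which corresponds to the noise term N being 0.

```python
def is_function_of(xs, ys):
    """Return True if every distinct value in xs maps to exactly one value in ys."""
    mapping = {}
    for x, y in zip(xs, ys):
        # setdefault stores y the first time x is seen, then returns the stored value.
        if mapping.setdefault(x, y) != y:
            return False
    return True

X = [1, 2, 3]
Y = [1, 2, 2]

print(is_function_of(X, Y))  # True: Y can be written as f(X)
print(is_function_of(Y, X))  # False: Y = 2 would have to map to both 2 and 3
```
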
>>> 
>>> But is this a reasonable definition of causality? To test if Y is a
>>> simple function of X, we would use compression to approximate K(Y|X),
>>> and say that X causes Y if this value is smaller than K(X|Y). To
>>> measure K(Y|X) you would compress X (a column of numbers in a table),
>>> and subtract that from the compressed size of X concatenated with Y.
>>> To test the causal relationships between n variables, you need to
>>> compress n^2 pairs of columns. But observe that
>>> 
>>> K(Y|X) + K(X) = K(X, Y) = K(X|Y) + K(Y)   (up to logarithmic terms)
>>> 
>>> So you really only need to compare K(X) and K(Y). Whichever is larger
>>> causes the other. You compress n columns and sort them by size from
>>> largest to smallest. That is your DAG.
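
A minimal sketch of the compression recipe just described, using zlib as a stand-in for K. The choice of zlib, the newline-separated text encoding of a column, the synthetic data, and all function names are assumptions; the message does not name a compressor.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data, a crude upper bound on K."""
    return len(zlib.compress(data, 9))

def encode(column) -> bytes:
    """Serialize a column as newline-separated decimal text."""
    return "\n".join(str(v) for v in column).encode("utf-8")

def conditional_size(y, x) -> int:
    """Approximate K(Y|X) as C(X followed by Y) - C(X)."""
    joint = compressed_size(encode(x) + b"\n" + encode(y))
    return joint - compressed_size(encode(x))

def rank_by_size(columns: dict) -> list:
    """Column names sorted by compressed size, largest first (the proposed ordering)."""
    return sorted(columns,
                  key=lambda name: compressed_size(encode(columns[name])),
                  reverse=True)

# Hypothetical data: y is a deterministic function of x.
rng = random.Random(0)
x = [rng.randrange(1000) for _ in range(2000)]
y = [v % 2 for v in x]

print("K(Y|X) estimate:", conditional_size(y, x))
print("K(X|Y) estimate:", conditional_size(x, y))
print("proposed ordering:", rank_by_size({"x": x, "y": y}))
```

zlib only sees a 32 KB window and is sensitive to how the column is serialized, so these sizes are rough. The sketch shows the bookkeeping, not a reliable causal test.
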
>>> 
>>> This would be easy to do with thousands of rows and columns, for
>>> example, LaboratoryOfTheCounties, where the rows are counties and the
>>> columns are things like the percent of population under age 5 or the
>>> number of farms between 50 and 100 acres, to see which causes the
>>> other.
>>> 
>>> But does that make sense?
>>> 
>>> --
>>> -- Matt Mahoney, mattmahone...@gmail.com

------------------------------------------
Artificial General Intelligence List: AGI
Permalink: 
https://agi.topicbox.com/groups/agi/T0f47884dae19d52d-Mdfeeea7b7fa7801358ee201f
Delivery options: https://agi.topicbox.com/groups/agi/subscription
