Laurent Jégou wrote:

> Hello list members, I'd like to calculate Moran's I on a large dataset,
> an 8640x3432 grid of values.
> 
> When I try to create the neighbours list with the cell2nb function, on
> such a scale, R works for several hours and seems to crash. I'm using the latest
> version (2.13), 64-bit, on a Mac Pro with 4 GB of memory.
> 
> I think that I'm doing it wrong :-)
> 
> I'd like to use a "moving window" weight matrix instead of a full-scale
> one, but I was not lucky enough to find a solution in the docs.

  I confronted this problem recently and decided to roll my own.  For a large 
dataset, an exhaustive computation of Moran's I is very time-consuming, as you 
say; but an estimate of it, based on a randomly chosen subset of pairs, seemed 
(for my data, at least) to converge quite nicely.  Here's my code.  It assumes 
a square grid, so you'll need to adapt it for your non-square grid, but that 
should be trivial.  To determine the right number of pairs for a good estimate 
in my application, I called this function with various numbers of pairs, 
plotted the estimate produced as a function of the number of pairs used, and 
observed good convergence by 200,000 pairs.  You will probably want to do the 
same for your dataset (there's a small sketch of this after the function below).  
200,000 pairs took just a second or two to run on my machine, so this is much, 
much faster than an exhaustive calculation.  As a bonus, this code also gives 
you Geary's C.  Oh, and the code below assumes a wraparound lattice (i.e. a 
torus, i.e. periodic boundaries), which may not be what you want; just get rid 
of the d_wraparound stuff and the following pmax() call if you want 
non-wraparound boundaries, and that should work fine, I think.  Not sure what 
you mean by the moving window thing, so I leave that as an exercise for the 
reader.  :->
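
  (In case it helps, I believe the non-wraparound version just amounts to 
replacing the d_wraparound / pmax() section inside the function with the 
following; untested, but it should be all that's needed:)

        d_direct <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p1y - p2y) ^ 2))
        d_direct[is.infinite(d_direct)] <- 0
        d <- d_direct        # no wraparound: just use inverse direct distances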

  I've never made an R package, and I'm not sure there's much demand for this 
code (is there?), so I'm presently unlikely to package it up.  However, if 
someone who owns a package related to this problem would like to adopt this 
code, generalize it to non-square lattices, add a flag to choose periodic or 
non-periodic boundaries, etc., feel free to do so; I hereby place it in the 
public domain.  I'd just like to hear about it if someone does this, and 
receive credit somewhere, that's all.  I'm not super-good in R, either, so if 
there's a better way to do some things, like the somewhat hacked random index 
generation (trying to avoid floating point issues when generating a random 
integer), please let me know.  I'm always interested in learning how to do 
things better.
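
  (One thing I suspect could be cleaner: the index generation could probably 
use sample.int() instead of my runif()/floor() hack.  Untested, but something 
like this should sidestep the floating-point worry by drawing 1-based integer 
indices directly:)

        p1ix <- sample.int(p_length, n_pairs, replace=TRUE)   # 1-based vector indices
        p1x <- ((p1ix - 1) %/% p_size) + 1
        p1y <- ((p1ix - 1) %% p_size) + 1
        p2ix <- sample.int(p_length, n_pairs, replace=TRUE)
        p2x <- ((p2ix - 1) %/% p_size) + 1
        p2y <- ((p2ix - 1) %% p_size) + 1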

  And of course this code is provided without warranty, may have bugs, etc.

  Enjoy!

Ben Haller
McGill University


correlation_stats <- function(p, n_pairs=200000)
{
        # Compute Moran's I and Geary's C for the landscape p.  This is tricky
        # because the exact calculation would be far too time-consuming, as it
        # involves comparison of every point to every other point.  So I am
        # trying to estimate it here with a small subset of all possible point
        # comparisons.
        p_size <- NROW(p)       # 512, for my 512x512 landscape
        p_length <- length(p)   # 262144
        mean_p <- mean(p)
        pv <- as.vector(p)
        
        # select points and look up their values; for speed, points are
        # selected by their (0-based) vector index
        p1ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
        p1x <- (p1ix %/% p_size) + 1
        p1y <- (p1ix %% p_size) + 1
        p1ix <- p1ix + 1
        
        p2ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
        p2x <- (p2ix %/% p_size) + 1
        p2y <- (p2ix %% p_size) + 1
        p2ix <- p2ix + 1
        
        v1 <- pv[p1ix] - mean_p
        v2 <- pv[p2ix] - mean_p
        
        # Calculate distances between point pairs, both directly and wrapping
        # around.  The average direct distance is much smaller than the average
        # wraparound distance, because points near the center vertically have a
        # maximum direct distance near 256, but a minimum wraparound distance
        # near 256.  The Moran's I estimate from wraparound distances is
        # different, as a result.  Rupert recommends taking the shorter of the
        # two distances, whichever it is, because that keeps us within the
        # meaningful scale of our space, before it just starts repeating due to
        # periodicity.
        d_direct <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p1y - p2y) ^ 2))
        d_direct[is.infinite(d_direct)] <- 0    # coincident points get zero weight
        
        d_wraparound <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p_size - abs(p1y - p2y)) ^ 2))
        d_wraparound[is.infinite(d_wraparound)] <- 0
        
        # pmax() because we want the minimum distance, and these are 1/distance
        d <- pmax(d_direct, d_wraparound)
        
        # precalculate some shared terms
        sum_d <- sum(d)
        sum_v1_sq <- sum(v1 ^ 2)
        
        # Moran's I: -1 is perfect dispersion, 1 is perfect correlation,
        # 0 is a random spatial pattern
        M_I <- (n_pairs / sum_d) * (sum(d * v1 * v2) / ((sum_v1_sq + sum(v2 ^ 2)) / 2))
        # direct-distance-only version:
        # (n_pairs / sum(d_direct)) * (sum(d_direct * v1 * v2) / ((sum(v1 ^ 2) + sum(v2 ^ 2)) / 2))
        # wraparound-distance-only version:
        # (n_pairs / sum(d_wraparound)) * (sum(d_wraparound * v1 * v2) / ((sum(v1 ^ 2) + sum(v2 ^ 2)) / 2))
        
        # Geary's C: 0 is positive spatial autocorrelation, 2 is negative
        # spatial autocorrelation, 1 is a random spatial pattern
        G_C <- ((n_pairs - 1) * sum(d * ((v1 - v2) ^ 2))) / (2 * sum_d * sum_v1_sq)
        
        return(list(Moran_I=M_I, Geary_C=G_C))
}
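
  For example, the convergence check I mentioned above is just something along 
these lines (here p is your matrix of values; adjust the pair counts to taste):

pair_counts <- c(1000, 5000, 10000, 50000, 100000, 200000, 500000)
estimates <- sapply(pair_counts, function(np) correlation_stats(p, n_pairs=np)$Moran_I)
plot(pair_counts, estimates, type="b", log="x",
     xlab="number of pairs sampled", ylab="estimated Moran's I")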
