Dear Dominic,

Thanks a lot for the quick reply. Just a few questions to make sure I got it all right (I now understand that transport, and spatstat in particular, can do much more than I need right now). Essentially I am after the Wasserstein distance between univariate distributions (and it would be great if I could extend it to the case of two distributions with different bin structures).
1) Two distributions with the same bins (I identify each bin by the central point in the bin).

n_bin <- 11 # number of bins
bin_structure <- seq(10, by=1, len=n_bin)
set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)
x <- pp(as.matrix(cbind(bin_structure, x_counts)))
y <- pp(as.matrix(cbind(bin_structure, y_counts)))
match <- transport(x,y,p=1)
plot(x,y,match)
wasserstein_dist <- wasserstein(x,y,p=1,match)

2) Now I do not have the same bin structure.

y2 <- pp(as.matrix(cbind(bin_structure+2, y_counts)))
match <- transport(x,y2,p=1)
plot(x,y2,match)
wasserstein_dist2 <- wasserstein(x,y2,p=1,match)

Do 1) and 2) make sense?
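For comparison with the binned setup above, here is a minimal base-R sketch that treats the counts as weights on the bin centres, which is what I actually have in mind. As far as I understand, pp() takes a matrix of coordinates with one row per point, so cbind(bin_structure, x_counts) may be read as 11 points in the plane rather than as 11 weighted bins. The helper name w1_binned_sketch is hypothetical (it is not part of transport or spatstat), and it relies on the one-dimensional identity that the Wasserstein distance with p=1 equals the integral of the absolute difference of the two CDFs.

```r
# Hypothetical helper, not part of transport or spatstat.
# W1 between two discrete distributions on the real line,
# given support points (bin centres) and non-negative masses (counts).
w1_binned_sketch <- function(x_pos, x_mass, y_pos, y_mass) {
  # Normalise counts so that both sides carry total mass 1
  x_mass <- x_mass / sum(x_mass)
  y_mass <- y_mass / sum(y_mass)
  # Merged, sorted support of both distributions
  grid <- sort(unique(c(x_pos, y_pos)))
  # Step CDFs evaluated at every point of the merged support
  Fx <- vapply(grid, function(t) sum(x_mass[x_pos <= t]), numeric(1))
  Fy <- vapply(grid, function(t) sum(y_mass[y_pos <= t]), numeric(1))
  # Both CDFs are constant between consecutive grid points, so the
  # integral of |Fx - Fy| is a finite sum over those intervals
  sum(abs(Fx - Fy)[-length(grid)] * diff(grid))
}

n_bin <- 11
bin_structure <- seq(10, by = 1, length.out = n_bin)
set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)

# Case 1: same bins; case 2: second histogram shifted by 2
w1_same    <- w1_binned_sketch(bin_structure, x_counts, bin_structure, y_counts)
w1_shifted <- w1_binned_sketch(bin_structure, x_counts, bin_structure + 2, y_counts)
```

Because the two supports are merged before the CDFs are compared, the bins do not need to coincide. A quick sanity check is that shifting a histogram by 2 against itself should give a distance of 2.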
If you have no particular need for binning, check out the function pppdist in the R-package spatstat, which offers a more flexible way to deal with point patterns of different size.
Well, this is not clear, but possibly very important for me. My raw data consist of two univariate samples of unequal length. Suppose that x <- rnorm(100) and y <- rnorm(90). Is there a way to define the Wasserstein distance between them without going through the binning procedure?

Many thanks!

Lorenzo
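To make the unbinned question concrete, below is a minimal base-R sketch of what I would expect such a distance to be. It assumes each sample is treated as an empirical distribution with equal mass on every observation, and it uses the one-dimensional identity that the p-th power of the Wasserstein distance equals the integral over (0, 1) of the absolute difference of the two quantile functions raised to the power p. The function name wasserstein_samples_sketch is hypothetical. Newer versions of transport appear to provide wasserstein1d() for this purpose, but I have not relied on it here.

```r
# Hypothetical helper, not part of transport or spatstat.
# Wasserstein distance of order p between the empirical distributions
# of two univariate samples, which may differ in length.
wasserstein_samples_sketch <- function(x, y, p = 1) {
  x <- sort(x)
  y <- sort(y)
  n <- length(x)
  m <- length(y)
  # Breakpoints in (0, 1] where either empirical quantile function jumps
  u <- sort(unique(c(0, seq_len(n) / n, seq_len(m) / m)))
  # Each quantile function is constant on (u[i], u[i + 1]), so it is
  # enough to evaluate it at the midpoint of every interval
  mid <- (u[-1] + u[-length(u)]) / 2
  qx <- x[ceiling(mid * n)]
  qy <- y[ceiling(mid * m)]
  # Integrate |qx - qy|^p over (0, 1) and take the p-th root
  sum(diff(u) * abs(qx - qy)^p)^(1 / p)
}

set.seed(1234)
x <- rnorm(100)
y <- rnorm(90)
w1_xy <- wasserstein_samples_sketch(x, y, p = 1)
w2_xy <- wasserstein_samples_sketch(x, y, p = 2)
```

No binning is involved: the unequal sample sizes are handled by splitting (0, 1) at every multiple of 1/100 and 1/90, which amounts to giving each observation mass 1/100 or 1/90 respectively.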