Dear Dominic,
Thanks a lot for the quick reply.
I have a few questions to make sure I have understood everything correctly
(I now realise that transport, and spatstat in particular, can do much more
than I need right now).
Essentially, I am after the Wasserstein distance between univariate
distributions (and it would be great if I could extend it to the
case of two distributions with different bin structures).

1) Two distributions with the same bins (I identify each bin by its
central point).

library(transport) # provides pp(), transport() and wasserstein()

n_bin <- 11 # number of bins

bin_structure <- seq(10, by = 1, length.out = n_bin) # bin centres

set.seed(1234)

x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)

x <- pp(cbind(bin_structure, x_counts)) # cbind() already returns a matrix
y <- pp(cbind(bin_structure, y_counts))


match <- transport(x,y,p=1)
plot(x,y,match)
wasserstein_dist <- wasserstein(x,y,p=1,match)
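
As a cross-check for case 1), here is a minimal base-R sketch of what I believe the one-dimensional computation reduces to. It assumes the counts are meant as masses sitting at the bin centres and that each histogram is normalised to sum to 1; that may not be how the pp() construction above interprets the two columns. The names bin_centres, px, py and w1_same_bins are introduced here for illustration only.

```r
# Assumption: counts are masses at the bin centres, normalised to sum to 1.
n_bin <- 11
bin_centres <- seq(10, by = 1, length.out = n_bin)

set.seed(1234)
x_counts <- rpois(n_bin, 10)
y_counts <- rpois(n_bin, 10)

px <- x_counts / sum(x_counts)
py <- y_counts / sum(y_counts)

# On a common, equally spaced grid the 1-Wasserstein distance is the
# bin width times the sum of absolute differences of the cumulative sums.
bin_width <- 1
w1_same_bins <- bin_width * sum(abs(cumsum(px) - cumsum(py)))
w1_same_bins
```

The value is expressed in the units of the bin positions, so it cannot exceed the distance between the first and the last bin centre.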


2) Now the two distributions do not share the same bin structure.


y2 <- pp(cbind(bin_structure + 2, y_counts)) # bins shifted by 2


match <- transport(x,y2,p=1)
plot(x,y2,match)
wasserstein_dist2 <- wasserstein(x,y2,p=1,match)
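
For case 2), the same idea can be sketched for two histograms whose bin centres differ: merge the two sets of positions and integrate the absolute difference of the two cumulative distribution functions. This is only a sketch under the assumption that counts are masses at the bin centres; w1_discrete is a hypothetical helper written for this message, not a function of the transport package.

```r
# Hypothetical helper: 1-Wasserstein distance between two discrete
# distributions on the real line, given positions and (unnormalised) weights.
w1_discrete <- function(pos_a, w_a, pos_b, w_b) {
  w_a <- w_a / sum(w_a) # normalise counts to probabilities
  w_b <- w_b / sum(w_b)
  grid <- sort(unique(c(pos_a, pos_b)))
  # Cumulative distribution functions evaluated on the merged grid
  cdf_a <- vapply(grid, function(g) sum(w_a[pos_a <= g]), numeric(1))
  cdf_b <- vapply(grid, function(g) sum(w_b[pos_b <= g]), numeric(1))
  # Integrate |F_a - F_b| over the gaps between consecutive grid points
  sum(abs(cdf_a - cdf_b)[-length(grid)] * diff(grid))
}

bin_centres <- seq(10, by = 1, length.out = 11)
set.seed(1234)
counts <- rpois(11, 10)

# The same histogram shifted by 2 should be at distance 2
# (up to floating-point error).
w1_discrete(bin_centres, counts, bin_centres + 2, counts)
```

A pure shift is a convenient sanity check, because the distance between a distribution and a copy of itself shifted by a constant is the absolute value of that constant.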


Do 1) and 2) make sense?


> If you have no particular need for binning, check out the function
> pppdist in the R package spatstat, which offers a more flexible way
> to deal with point patterns of different size.


This point is not clear to me, but it may be very important.
My raw data consist of two univariate samples of unequal length.

Suppose that

x <- rnorm(100)

and

y <- rnorm(90)

Is there a way to define the Wasserstein distance between them without
going through the binning procedure?
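
To make the question concrete: as far as I understand, in one dimension the 1-Wasserstein distance equals the area between the two empirical distribution functions, which would need neither binning nor equal sample sizes. A minimal base-R sketch of that idea follows; w1_samples is a hypothetical helper name, and I would be glad to hear whether this matches what the packages compute.

```r
# Hypothetical helper: 1-Wasserstein distance between two univariate
# samples, computed as the area between their empirical CDFs.
w1_samples <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  Fx <- ecdf(x)(grid) # empirical CDF of x on the pooled points
  Fy <- ecdf(y)(grid) # empirical CDF of y on the pooled points
  # Both CDFs are step functions that are constant between pooled
  # points, so the integral is a finite sum over the gaps.
  sum(abs(Fx - Fy)[-length(grid)] * diff(grid))
}

set.seed(1234)
x <- rnorm(100)
y <- rnorm(90)
w1_samples(x, y)
```

The samples may have different lengths because each empirical CDF gives weight 1/n to its own observations.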



Many thanks!

Lorenzo

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
