> I am wondering if there is a function which will do a join between 2 > data.frames by minimum distance, as it is done in ArcGIS for example. For > people who are not familiar with ArcGIS here it is an explanation: > > Suppose you have a data.frame with x, y, coordinates called track, and a > second data frame with different x, y coordinates and some other attributes > called classif. The track data.frame has a different number of rows than > classif. I want to join the rows from classif to track in such a way that for > each row in track I add only the row from classif that has coordinates > closest to the coordinates in the track row (and hence minimum distance in > between the 2 rows), and also add a new column which will record this minimum > distance. Even if the coordinates in the 2 data.frames have same name, the > values are not identical between the data.frames, so a merge by column is not > what I am after.
#----------------------------------------------------------------------- # get the distance between two points on the globe. # # args: # lat1 - latitude of first point. # long1 - longitude of first point. # lat2 - latitude of first point. # long2 - longitude of first point. # radius - average radius of the earth in km # # see: http://en.wikipedia.org/wiki/Great_circle_distance #----------------------------------------------------------------------- greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){ sf <- pi/180 lat1 <- lat1*sf lat2 <- lat2*sf long1 <- long1*sf long2 <- long2*sf lod <- abs(long1-long2) radius * atan2( sqrt((cos(lat1)*sin(lod))**2 + (cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2), sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod) ) } #----------------------------------------------------------------------- # Calculate the nearest point using latitude and longitude. # and attach the other args and nearest distance from the # other data.frame. # # args: # x as you describe 'track' # y as you describe 'classif' # xlongnme name of longitude variable in x # xlatnme name of latitude location variable in x # ylongnme name of longitude location variable on y # ylatnme name of latitude location variable on y #----------------------------------------------------------------------- dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){ tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){ dists <- apply(y, 1, function(x, y) greatCircleDistance(x[2], x[1], y[2], y[1]), x) cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,] } , y[,c(ylongnme, ylatnme)])) tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme, ylatnme), names(y))]) row.names(tmp) <- NULL tmp } # demo track <- data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90)) classif <- data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90), v1=letters[1:20], v2=1:20) dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc') ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.