The following, f2(A,B), should do well with lots of rows in A and B as long as the number of types is not huge.
f2 <- function(A, B) { types <- as.character(unique(A$TYPE)) result <- numeric(nrow(A)) Bs <- split(B$Special_Date, B$TYPE) for(type in types) { w <- A$TYPE == type # can omit the sort() below if you know that B$Special_Date is sorted. result[w] <- closestValue(A$DATE[w], sort(Bs[[type]])) } A$Difference <- A$DATE - result A } closestValue <- function (x, vec) { # for each value in x, find closest value in vec. # Break ties by using highest. # Assume vec is sorted. intervalNo <- findInterval(x, vec) lowerValue <- vec[pmax(1, intervalNo)] upperValue <- vec[pmin(length(vec), intervalNo+1)] ifelse(x - lowerValue < upperValue - x, lowerValue, upperValue) } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf > Of William Dunlap > Sent: Sunday, August 19, 2012 9:43 AM > To: Francesco; r-help@r-project.org > Subject: Re: [R] merging and obtaining the nearest value > > How many different types are there? Just a handful or many thousands? > For this sort of problem it is often handy to write a function which generates > datasets of the sort you are thinking of but parameterized by the > number of rows, levels, etc., so you can see how the execution time > varies with these things. > > If there are just a few types, try looping over types and using findInterval > to see where A$Date fits into the sequence of B$Special_Date. > > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > > > -----Original Message----- > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > > Behalf > > Of Francesco > > Sent: Sunday, August 19, 2012 4:01 AM > > To: r-help@r-project.org > > Subject: Re: [R] merging and obtaining the nearest value > > > > Dear Riu, Many thanks for your suggestion > > > > However these are just simplified examples... in reality the dataset A > > contains millions of observations and B several thousands of rows... > > Could I still use a modified form of your suggestion? > > > > Thanks > > > > On 19 August 2012 12:51, Rui Barradas <ruipbarra...@sapo.pt> wrote: > > > Hello, > > > > > > Try the following. > > > > > > > > > A <- read.table(text=" > > > > > > TYPE DATE > > > A 2 > > > A 5 > > > A 20 > > > B 10 > > > B 2 > > > ", header = TRUE) > > > > > > > > > B <- read.table(text=" > > > > > > TYPE Special_Date > > > A 2 > > > A 6 > > > A 20 > > > A 22 > > > B 5 > > > B 6 > > > ", header = TRUE) > > > > > > result <- do.call( rbind, lapply(split(merge(A, B), list(m$DATE, m$TYPE)), > > > function(x){ > > > a <- abs(x$DATE - x$Special_Date) > > > if(nrow(x)) x[which(min(a) == a), ] }) ) > > > result$Difference <- result$DATE - result$Special_Date > > > result$Special_Date <- NULL > > > rownames(result) <- seq_len(nrow(result)) > > > result > > > > > > > > > Also, it's a good practice to post data examples using dput(). For > > > instance, > > > > > > dput(A) > > > structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A", > > > "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names = > > > c("TYPE", > > > "DATE"), class = "data.frame", row.names = c(NA, -5L)) > > > > > > Now all we have to do is run the statement A <- structure(... etc...) to > > > have an exact copy of the data example. > > > Anyway, your example with input and the wanted result was very welcome. > > > > > > Hope this helps, > > > > > > Rui Barradas > > > > > > Em 19-08-2012 11:10, Francesco escreveu: > > >> > > >> Dear R-help > > >> > > >> Î would like to know if there is a short solution in R for this > > >> merging problem... > > >> > > >> Let say I have a dataset A as: > > >> > > >> TYPE DATE > > >> A 2 > > >> A 5 > > >> A 20 > > >> B 10 > > >> B 2 > > >> > > >> (there can be duplicates for the same type and date) > > >> > > >> and I have another dataset B as : > > >> > > >> TYPE Special_Date > > >> A 2 > > >> A 6 > > >> A 20 > > >> A 22 > > >> B 5 > > >> B 6 > > >> > > >> The question is : I would like to obtain the difference between the > > >> date of each observation in A and the closest special date in B with > > >> the same type. In case of ties I would take the latest date of the > > >> two. > > >> > > >> For example I would obtain here > > >> > > >> TYPE DATE Difference > > >> A 2 0=2-2 > > >> A 5 -1=5-6 > > >> A 20 0=20-20 > > >> B 10 +4=10-6 > > >> B 2 -3=2-5 > > >> > > >> Do you know how to (simply?) obtain this in R? > > >> > > >> Many thanks! > > >> Best Regards > > >> > > >> ______________________________________________ > > >> R-help@r-project.org mailing list > > >> https://stat.ethz.ch/mailman/listinfo/r-help > > >> PLEASE do read the posting guide > > >> http://www.R-project.org/posting-guide.html > > >> and provide commented, minimal, self-contained, reproducible code. > > > > > > > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.