How many different types are there? Just a handful or many thousands? For this sort of problem it is often handy to write a function which generates datasets of the sort you are thinking of but parameterized by the number of rows, levels, etc., so you can see how the execution time varies with these things.
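A minimal sketch of such a generator, in case it is useful. The names makeData and timeSolution, the "T1", "T2", ... type labels and the maxDate argument are all made up for illustration, and merge() is only a stand-in for whatever candidate solution you actually want to time.

```r
# Hypothetical helper: simulate datasets A and B of a chosen size.
# nA, nB  = number of rows in A and B; nTypes = number of distinct TYPE levels.
makeData <- function(nA, nB, nTypes, maxDate = 10000L) {
  types <- paste0("T", seq_len(nTypes))
  A <- data.frame(TYPE = sample(types, nA, replace = TRUE),
                  DATE = sample.int(maxDate, nA, replace = TRUE),
                  stringsAsFactors = FALSE)
  B <- data.frame(TYPE = sample(types, nB, replace = TRUE),
                  Special_Date = sample.int(maxDate, nB, replace = TRUE),
                  stringsAsFactors = FALSE)
  list(A = A, B = B)
}

# Hypothetical helper: elapsed seconds of f(A, B) for several sizes of A.
timeSolution <- function(f, sizes, nB, nTypes) {
  sapply(sizes, function(n) {
    d <- makeData(n, nB, nTypes)
    system.time(f(d$A, d$B))[["elapsed"]]
  })
}

set.seed(1)
# merge() is only a placeholder for the solution being timed
timings <- timeSolution(function(A, B) merge(A, B),
                        sizes = c(500, 1000, 2000), nB = 200, nTypes = 5)
print(timings)
```

Varying nTypes and nB in the same way shows which of the three sizes the running time is most sensitive to.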
If there are just a few types, try looping over types and using findInterval to see where A$DATE fits into the sequence of B$Special_Date.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Francesco
> Sent: Sunday, August 19, 2012 4:01 AM
> To: r-help@r-project.org
> Subject: Re: [R] merging and obtaining the nearest value
>
> Dear Rui, Many thanks for your suggestion
>
> However these are just simplified examples... in reality the dataset A
> contains millions of observations and B several thousands of rows...
> Could I still use a modified form of your suggestion?
>
> Thanks
>
> On 19 August 2012 12:51, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> > Hello,
> >
> > Try the following.
> >
> > A <- read.table(text="
> > TYPE DATE
> > A 2
> > A 5
> > A 20
> > B 10
> > B 2
> > ", header = TRUE)
> >
> > B <- read.table(text="
> > TYPE Special_Date
> > A 2
> > A 6
> > A 20
> > A 22
> > B 5
> > B 6
> > ", header = TRUE)
> >
> > # pair every row of A with every row of B that has the same TYPE
> > m <- merge(A, B)
> > result <- do.call(rbind, lapply(split(m, list(m$DATE, m$TYPE)),
> >     function(x) {
> >         a <- abs(x$DATE - x$Special_Date)
> >         # keep the row(s) with the smallest distance; skip empty groups
> >         if (nrow(x)) x[which(min(a) == a), ] }))
> > result$Difference <- result$DATE - result$Special_Date
> > result$Special_Date <- NULL
> > rownames(result) <- seq_len(nrow(result))
> > result
> >
> > Also, it's a good practice to post data examples using dput(). For instance,
> >
> > dput(A)
> > structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
> > "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names = c("TYPE",
> > "DATE"), class = "data.frame", row.names = c(NA, -5L))
> >
> > Now all we have to do is run the statement A <- structure(... etc...) to
> > have an exact copy of the data example.
> > Anyway, your example with input and the wanted result was very welcome.
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Em 19-08-2012 11:10, Francesco escreveu:
> >>
> >> Dear R-help
> >>
> >> I would like to know if there is a short solution in R for this
> >> merging problem...
> >>
> >> Let's say I have a dataset A as:
> >>
> >> TYPE DATE
> >> A    2
> >> A    5
> >> A    20
> >> B    10
> >> B    2
> >>
> >> (there can be duplicates for the same type and date)
> >>
> >> and I have another dataset B as:
> >>
> >> TYPE Special_Date
> >> A    2
> >> A    6
> >> A    20
> >> A    22
> >> B    5
> >> B    6
> >>
> >> The question is: I would like to obtain the difference between the
> >> date of each observation in A and the closest special date in B with
> >> the same type. In case of ties I would take the latest date of the
> >> two.
> >>
> >> For example I would obtain here
> >>
> >> TYPE DATE Difference
> >> A    2    0=2-2
> >> A    5    -1=5-6
> >> A    20   0=20-20
> >> B    10   +4=10-6
> >> B    2    -3=2-5
> >>
> >> Do you know how to (simply?) obtain this in R?
> >>
> >> Many thanks!
> >> Best Regards
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
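
The findInterval idea from the top of this reply could look roughly like the sketch below. The function name nearestDiff is made up for illustration, and the code assumes the column names from the example (TYPE, DATE, Special_Date) and numeric dates. Ties are resolved toward the later special date, as the original question asks. It has only been checked against the small example, not against data with millions of rows.

```r
# Hypothetical helper: for each row of A, DATE minus the closest
# Special_Date in B of the same TYPE (the later date wins on ties).
nearestDiff <- function(A, B) {
  diff <- rep(NA_real_, nrow(A))   # stays NA for types that are absent from B
  for (type in unique(as.character(B$TYPE))) {
    iA <- which(A$TYPE == type)
    if (length(iA) == 0L) next
    s <- sort(unique(B$Special_Date[B$TYPE == type]))
    d <- A$DATE[iA]
    k <- findInterval(d, s)            # number of special dates <= d (0 if none)
    lo <- s[pmax(k, 1L)]               # closest special date at or below d, else the first
    hi <- s[pmin(k + 1L, length(s))]   # closest special date above d, else the last
    nearest <- ifelse(abs(hi - d) <= abs(d - lo), hi, lo)
    diff[iA] <- d - nearest
  }
  cbind(A, Difference = diff)
}

A <- data.frame(TYPE = c("A", "A", "A", "B", "B"),
                DATE = c(2, 5, 20, 10, 2))
B <- data.frame(TYPE = c("A", "A", "A", "A", "B", "B"),
                Special_Date = c(2, 6, 20, 22, 5, 6))
res <- nearestDiff(A, B)
print(res)   # Difference column: 0, -1, 0, 4, -3
```

The loop runs once per type and each pass is vectorized over the rows of A, so this should be reasonable when there are only a handful of types; with many thousands of types the per-type overhead may start to dominate.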