-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of dms
Sent: Wednesday, March 02, 2011 3:16 PM
To: r-help@r-project.org
Subject: [R] merge( , by='row.names') slowness

I noticed that joining two data.frames with the "merge" function is substantially slower with by='row.names' than when joining on a common index column. With data frames of ~10,000 rows, the by='row.names' case can take as long as 10 minutes, versus merely 1 second using an index column. Beyond the 10^6 range, it's unusably slow.

    n <- 5
    a <- data.frame(id=as.character(1:10^n), x=rnorm(10^n)); rownames(a) <- a$id
    b <- data.frame(id=as.character(1:10^n + 10^(n-1)), y=rnorm(10^n)); rownames(b) <- b$id
    date()
    fast <- merge(a, b, all=T)
    date()
    slow <- merge(a, b, all=T, by='row.names')
    date()

Has anybody else noticed this?

_________________________________________________

Hi DMS,

Well, first off, the two calls don't give the same answer; in fact, the results don't even have the same dimensions. Even so, from looking at merge.data.frame, it's not immediately obvious what would make a difference of this magnitude. The answer might be buried in the internal merge. Here are the timings for n=3:

    > system.time(print(dim(merge(a,b,all=T))))
    [1] 1100    3
       user  system elapsed
       0.01    0.00    0.01
    > system.time(print(dim(merge(a,b,all=T,by=1))))
    [1] 1100    3
       user  system elapsed
       0.01    0.00    0.02
    > system.time(print(dim(merge(a,b,all=T,by=0))))
    [1] 1100    5
       user  system elapsed
       3.26    0.00    3.17
    > system.time(print(dim(merge(a,b,all=T,by="row.names"))))
    [1] 1100    5
       user  system elapsed
       3.17    0.00    3.17

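Since merging on an ordinary column is fast in the timings above while by="row.names" (or by=0) is slow, one possible workaround is to copy the row names into a regular key column and merge on that instead. The following is only a sketch under stated assumptions: the names N, a2, b2, via_column and via_rownames are made up for illustration, the data is a smaller integer-keyed stand-in for the example above, and any timings will depend on the R version and the machine.

```r
# Illustrative stand-in data: two frames whose row names overlap only partly.
N <- 1000L
set.seed(1)
a <- data.frame(id = as.character(seq_len(N)), x = rnorm(N),
                stringsAsFactors = FALSE)
rownames(a) <- a$id
b <- data.frame(id = as.character(seq_len(N) + 100L), y = rnorm(N),
                stringsAsFactors = FALSE)
rownames(b) <- b$id

# Workaround sketch: expose the row names as an ordinary column. The column
# is called Row.names only to mirror the key column that
# merge(..., by = "row.names") puts in its result.
a2 <- data.frame(Row.names = rownames(a), a, stringsAsFactors = FALSE)
b2 <- data.frame(Row.names = rownames(b), b, stringsAsFactors = FALSE)

# Time both routes; no particular timings are claimed here.
t_column   <- system.time(
  via_column <- merge(a2, b2, by = "Row.names", all = TRUE)
)
t_rownames <- system.time(
  via_rownames <- merge(a, b, by = "row.names", all = TRUE)
)

print(dim(via_column))
print(dim(via_rownames))
print(rbind(column = t_column, row.names = t_rownames)[, "elapsed"])
```

Both calls perform the same outer join on the row names, so the two results should have matching dimensions and keys; if the slowdown reproduces on your setup, the explicit key column may be the cheaper route.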
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.