I noticed that when joining two data.frames in R with the "merge" function, using by='row.names' is substantially slower than joining on a common index column.
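One way to avoid the row-names path is to copy the row names into an ordinary column and merge on that column, since that is the case reported here as fast. This is a sketch under that assumption, not a measured fix. The helper `merge_on_rownames()` and the key column name `.rn` are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of base R): merge two data.frames on their
# row names by copying the row names into an ordinary key column, merging on
# that column, and then restoring the key as the row names of the result.
merge_on_rownames <- function(x, y, key = ".rn", ...) {
  x[[key]] <- rownames(x)           # row names as a plain character column
  y[[key]] <- rownames(y)
  out <- merge(x, y, by = key, ...) # ordinary column join, e.g. all = TRUE
  rownames(out) <- out[[key]]       # keys are unique here, so valid row names
  out[[key]] <- NULL                # drop the temporary key column
  out
}

# Small example shaped like the benchmark (10^3 rows, overlapping ids)
set.seed(1)
n <- 3
a <- data.frame(x = rnorm(10^n))
rownames(a) <- as.character(1:10^n)
b <- data.frame(y = rnorm(10^n))
rownames(b) <- as.character(1:10^n + 10^(n - 1))
m <- merge_on_rownames(a, b, all = TRUE)
```

The result has the same set of keys as `merge(a, b, by = 'row.names', all = TRUE)`. Its rows are sorted by key as character strings, and the keys are stored as row names instead of a `Row.names` column.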
With data.frames of ~10,000 rows, the by='row.names' merge takes as long as 10 minutes, versus about 1 second when merging on an index column. Beyond roughly 10^6 rows it becomes unusably slow.

    n <- 5   # each data.frame has 10^n rows
    a <- data.frame(id = as.character(1:10^n), x = rnorm(10^n))
    rownames(a) <- a$id
    b <- data.frame(id = as.character(1:10^n + 10^(n-1)), y = rnorm(10^n))
    rownames(b) <- b$id

    date()
    # Fast: merge on the common column "id"
    fast <- merge(a, b, all = TRUE)
    date()
    # Slow: merge on the row names
    slow <- merge(a, b, all = TRUE, by = 'row.names')
    date()

Has anybody else noticed this?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.