On Tue, Oct 12, 2010 at 2:39 AM, Chris Howden <ch...@trickysolutions.com.au> wrote:
> I'm working with some very big datasets (each dataset has 11 million rows
> and 2 columns). My first step is to merge all my individual data sets
> together (I have about 20).
>
> I'm using the following command from sqldf:
>
> data1 <- sqldf("select A.*, B.* from A inner join B
>                 using(ID)")
>
> But it's taking A VERY VERY LONG TIME to merge just 2 of the datasets (well
> over 2 hours, possibly longer since it's still going).
You need to add indexes to your tables. See example 4i on the sqldf home
page, http://sqldf.googlecode.com . This can result in huge speedups for
large tables.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
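As a rough sketch of the indexed-join idea, using tiny stand-in data frames named A and B (the names, sizes, and values here are placeholders, not the poster's data): sqldf() accepts a vector of SQL statements, so the index can be built in the same call that performs the join.

```r
library(sqldf)

# Tiny stand-ins for the poster's 11-million-row, 2-column tables
A <- data.frame(ID = 1:5, x = letters[1:5])
B <- data.frame(ID = 3:7, y = LETTERS[3:7])

# Passing a character vector makes sqldf run the statements in
# sequence against the same temporary SQLite database and return
# the result of the last one.  Indexing the join column before
# joining is what yields the speedup on large tables.
data1 <- sqldf(c(
  "create index ai on A(ID)",
  "select A.*, B.y from A inner join B using(ID)"
))
data1
```

Example 4i on the sqldf home page goes further, loading the files into a persistent SQLite database with read.csv.sql before indexing, but the principle is the same: index the join key, then join.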