Thanks for the suggestion and code Gabor, I've tried creating 2 indices:
1) just for the variable I intend to merge on 2) on the entire data set I am merging (which I think is the one I should be using??) However neither seemed to work. The first was still going after 2 hours, and the second after 12 hours, so I stopped the join. If it's not too much bother I was wondering if U could let me know which index I should be using? Or in other words since I plan to merge using POINTID do I create an index on system.time(sqldf("create index ai1 on A(POINTID)")) system.time(sqldf("create index ai2 on B(POINTID)")) or system.time(sqldf("create index ai1 on A(POINTID,alistair)")) system.time(sqldf("create index ai2 on B(POINTID, alistair_range)") I'm now using the following join statement system.time(data2 <- sqldf("select * from A natural join B")) thanks Chris Howden Founding Partner Tricky Solutions Tricky Solutions 4 Tricky Problems Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training (mobile) 0410 689 945 (fax / office) (+618) 8952 7878 ch...@trickysolutions.com.au -----Original Message----- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, 14 October 2010 9:02 AM To: Chris Howden Cc: r-help@r-project.org Subject: Re: [R] merging and working with BIG data sets. Is sqldf the best way?? On Tue, Oct 12, 2010 at 2:39 AM, Chris Howden <ch...@trickysolutions.com.au> wrote: > I’m working with some very big datasets (each dataset has 11 million rows > and 2 columns). My first step is to merge all my individual data sets > together (I have about 20) > > I’m using the following command from sqldf > > data1 <- sqldf("select A.*, B.* from A inner join B > using(ID)") > > But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets (well > over 2 hours, possibly longer since it’s still going). You need to add indexes to your tables. See example 4i on the sqldf home page http://sqldf.googlecode.com This can result in huge speedups for large tables. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.