On Tue, Oct 12, 2010 at 2:39 AM, Chris Howden
<ch...@trickysolutions.com.au> wrote:
> I’m working with some very big datasets (each dataset has 11 million rows
> and 2 columns). My first step is to merge all my individual data sets
> together (I have about 20)
>
> I’m using the following command from sqldf
>
>               data1 <- sqldf("select A.*, B.* from A inner join B using(ID)")
>
> But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets (well
> over 2 hours, possibly longer since it’s still going).

You need to add indexes to your tables. See Example 4i on the sqldf home page,
http://sqldf.googlecode.com
Adding indexes can result in huge speedups on large tables, since without them
SQLite has to scan the full 11 million rows of one table for every lookup in
the join.
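Roughly, the pattern from Example 4i looks like the sketch below (toy data
frames A and B with an ID column stand in for your real tables; the bare
sqldf() calls open and close a persistent connection, which is what keeps the
indexes alive between statements):

```r
library(sqldf)  # assumes sqldf and its SQLite backend are installed

## Toy stand-ins for the real 11-million-row tables
A <- data.frame(ID = 1:3, x = c("a", "b", "c"))
B <- data.frame(ID = 2:4, y = c(10, 20, 30))

sqldf()                            # open a persistent connection
sqldf("create index ai on A(ID)")  # loads A into SQLite and indexes it
sqldf("create index bi on B(ID)")  # likewise for B

## refer to main.A / main.B so the already-loaded, indexed copies are
## reused instead of being uploaded again
res <- sqldf("select A.*, B.y
              from main.A A inner join main.B B using(ID)")
sqldf()                            # close the connection
res
```

With the indexes in place, SQLite can satisfy each join lookup with an index
probe rather than a full table scan, which is where the speedup comes from.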

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
