Hi everyone,


I’m working with some very big datasets (each has 11 million rows and 2 columns). My first step is to merge all of my individual data sets together (I have about 20 of them).

I’m using the following command from the sqldf package:

    data1 <- sqldf("select A.*, B.* from A inner join B using(ID)")

But it’s taking a very, very long time to merge just two of the datasets (well over 2 hours so far, and it’s still running).
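
The only variation I’ve found so far is the pattern described in the sqldf documentation, if I’ve read it correctly: run the join in a temporary on-disk database (via the dbname argument) and index the join column first. Something like the sketch below (I haven’t timed it on the full 11-million-row files, and the index names are just placeholders):

    library(sqldf)

    # create indexes on the uploaded copies of A and B, then run the join
    # against those indexed copies (main.A / main.B) in a temporary
    # on-disk SQLite database instead of the default in-memory one
    data1 <- sqldf(c(
        "create index idxA on A(ID)",
        "create index idxB on B(ID)",
        "select * from main.A inner join main.B using(ID)"
    ), dbname = tempfile())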

I was wondering if anyone could suggest a better way of doing this, or perhaps how I could tweak my computer setup to speed it up?

I’ve also looked at the following packages, but sqldf is still the only way I’ve found to actually merge large data sets in R. These packages seem great for accessing large data sets without holding them in RAM, but I can’t see how they can be used to merge data sets together (a short sketch of the kind of access I mean follows the list):

- ff
- filehash
- bigmemory
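
For example, this is the kind of out-of-RAM access I mean (the file name is just a placeholder; the call is from the ff documentation as far as I can tell):

    library(ff)

    # reads the csv into an ffdf object whose columns are kept in files on
    # disk, so the 11 million rows never have to sit in RAM all at once
    A <- read.csv.ffdf(file = "datasetA.csv")

That works nicely for reading and subsetting, but I can’t see a merge()/join equivalent for ffdf objects.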

Does anyone have any ideas?

At the moment my best idea is to hand it over to someone with a dedicated database server and get them to do the merges, and then hope that the biglm package can handle the modelling (a sketch of what I have in mind for that step is below).
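
The idea is to fit on the first chunk of the merged data and then fold the remaining chunks in with update(), so the full merged table never has to be in memory at once. The formula and the chunk objects here are placeholders for whatever the merged data ends up looking like:

    library(biglm)

    # chunk1, chunk2, ... stand in for however the merged data comes back
    # from the database server, one manageable piece at a time
    fit <- biglm(y ~ x1 + x2, data = chunk1)
    fit <- update(fit, chunk2)
    fit <- update(fit, chunk3)
    summary(fit)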

Thanks for any ideas at all!!

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
ch...@trickysolutions.com.au
