Hi everyone,
I'm working with some very big datasets (each dataset has 11 million rows and 2 columns). My first step is to merge all my individual datasets together (I have about 20).

I'm using the following command from sqldf:

    data1 <- sqldf("select A.*, B.* from A inner join B using(ID)")

But it's taking A VERY VERY LONG TIME to merge just 2 of the datasets (well over 2 hours, possibly longer, since it's still going).

I was wondering if anyone could suggest a better way, or maybe some suggestions on how I could tweak my computer set-up to speed it up?

I've looked at the following packages, and the sqldf approach above is the only way I've found to actually merge large datasets in R. These packages seem great for accessing large datasets by avoiding storing them in RAM ... but I can't see how they can be used to merge datasets together:

· ff
· filehash
· bigmemory

Does anyone have any ideas? At the moment my best idea is to hand it over to someone with a dedicated database server and get them to do the merges (and then hope package biglm can do the modelling).

Thanks for any ideas at all!!

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis, Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
ch...@trickysolutions.com.au
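P.S. One thing I'm considering trying is getting sqldf to build indexes on ID before the join, and to keep the working database on disk rather than in memory. A rough, untested sketch of what I have in mind is below. The index names, the main. prefixes, and dbname = tempfile() are just my reading of the sqldf examples, so I may well have the details wrong:

    library(sqldf)

    # Use an on-disk SQLite database (tempfile) instead of :memory:,
    # since two 11-million-row tables plus indexes may not fit in RAM.
    # Build an index on ID in each table, then do the join against the
    # already-loaded copies (main.A, main.B) so the indexes get used.
    data1 <- sqldf(c(
        "create index A_ID on A(ID)",
        "create index B_ID on B(ID)",
        "select A.*, B.* from main.A A inner join main.B B using(ID)"
    ), dbname = tempfile())

If that isn't the right way to use indexes with sqldf, I'd appreciate a pointer on that too.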