Hi,

Jorge Iván Vélez wrote:
> Dear R-list,
>
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After reading it, I need to
> delete some specific rows and columns. I'm running R 2.6.2 patched on
> XP SP2.
In such a case, I would recommend not starting with R at all. Try awk [1] to cut out the rows and columns you need before the data ever reach R. If the resulting file is still very large, I would suggest reading it into a database system. My experience here is limited: I have only used SQLite, but in conjunction with the RSQLite package it handled all my "big data" problems.

Check http://www.ibm.com/developerworks/library/l-awk1.html to get smoothly started with awk.

I hope this helps,
Roland

[1] I think the gawk implementation offers the most options (e.g. for timing), but I recently used mawk on Windows XP and it was considerably faster (or was it nawk?). If you don't already have experience in a language such as Perl, I'd say awk is much easier to learn than Perl.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
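A minimal sketch of the awk filtering idea suggested above. The file name, the column layout, and the filter condition are all invented for illustration, since the original post does not describe the actual HapMap columns:

```shell
# Build a tiny tab-separated stand-in for the 4GB HapMap file
# (hypothetical layout: id, chromosome, position, two sample columns).
printf 'id\tchr\tpos\tsample1\tsample2\n' >  hapmap_sample.txt
printf 'rs1\t1\t100\tAA\tAG\n'            >> hapmap_sample.txt
printf 'rs2\t1\t200\tGG\tGG\n'            >> hapmap_sample.txt
printf 'rs3\t2\t300\tCC\tCT\n'            >> hapmap_sample.txt

# Keep the header plus only the rows where column 2 (chromosome) is "1",
# and print only columns 1, 2 and 4 -- i.e. drop rows and columns in one
# pass, without ever loading the whole file into memory.
awk -F'\t' 'NR == 1 || $2 == "1" { print $1 "\t" $2 "\t" $4 }' \
    hapmap_sample.txt > subset.txt

cat subset.txt
```

Because awk streams the file line by line, memory use stays constant no matter how large the input is; the slimmed-down `subset.txt` can then be read into R (or imported into SQLite) comfortably.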