Hi,

Jorge Iván Vélez wrote:
> Dear R-list,
>
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4GB. After reading it, I need to
> delete some specific rows and columns. I'm running R 2.6.2 patched on
> XP SP2.
In such a case, I would recommend not starting with R at all. Try awk [1] to cut out the rows and columns you need before the data ever reach R. If the resulting file is still very large, I would suggest reading it into a database system. My experience here is limited: I have only used SQLite, but in conjunction with the RSQLite package it handled all my "big data" problems.

Check http://www.ibm.com/developerworks/library/l-awk1.html to get smoothly started with awk.

I hope this helps,
Roland

[1] I think the gawk implementation offers the most options (e.g. for timing), but I recently used mawk on Windows XP and it was considerably faster (or was it nawk?). If you don't already have experience in a language such as Perl, I'd say awk is much easier to learn than Perl.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
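A minimal sketch of the awk filtering idea suggested above. The file name, the column layout, and the filter condition are all invented for illustration, since the original post does not describe the actual HapMap columns:

```shell
# Build a tiny tab-separated stand-in for the 4GB HapMap file
# (hypothetical layout: id, chromosome, position, two sample columns).
printf 'id\tchr\tpos\tsample1\tsample2\n' >  hapmap_sample.txt
printf 'rs1\t1\t100\tAA\tAG\n'            >> hapmap_sample.txt
printf 'rs2\t1\t200\tGG\tGG\n'            >> hapmap_sample.txt
printf 'rs3\t2\t300\tCC\tCT\n'            >> hapmap_sample.txt

# Keep the header plus only the rows where column 2 (chromosome) is "1",
# and print only columns 1, 2 and 4 -- i.e. drop rows and columns in one
# pass, without ever loading the whole file into memory.
awk -F'\t' 'NR == 1 || $2 == "1" { print $1 "\t" $2 "\t" $4 }' \
    hapmap_sample.txt > subset.txt

cat subset.txt
```

Because awk streams the file line by line, memory use stays constant no matter how large the input is; the slimmed-down `subset.txt` can then be read into R (or imported into SQLite) comfortably.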