Jorge Iván Vélez wrote:
> Dear R-list,
>
> Does somebody know how I can read a HUGE data set using R? It is a hapmap
> data set (txt format) which is around 4 GB. After reading it, I need to
> delete some specific rows and columns. I'm running R 2.6.2 patched over
> XP SP2 using a 2.4 GHz Core 2 Duo processor and 4 GB RAM. Any suggestion
> would be appreciated.
Hmmm... Unless you're running a 64-bit version of XP, you might be out of luck (notwithstanding the astounding feats of the R Core Team, who managed to make about 3.5 GB of memory usable under 32-bit Windows): your *raw* data will eat more than the available memory. You might be lucky if some of it can be abstracted (e.g. long character strings that can be reduced to factors), or unlucky (large R storage overhead for non-reducible data).

You might consider changing machines: get a 64-bit machine with gobs of memory and cross your fingers. Note that, since R pointers would then be 64 bits wide instead of 32, storage needs will inflate further...

Depending on the real meaning of your data and the processing they need, you might also consider storing your raw data in a SQL DBMS, reducing them in SQL, and reading into R only the relevant part(s).

There are also some contributed packages that might help in special situations: biglm, birch.

HTH,

Emmanuel Charpentier
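P.S. For concreteness, a minimal sketch of the DBMS route, assuming the file is tab-delimited with a header row and that the DBI and RSQLite packages are installed; the file name, chunk size, table name and the column/value names in the final query are all made up and would need to be adapted to your data:

library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "hapmap.db")

infile <- file("hapmap.txt", open = "r")
header <- strsplit(readLines(infile, n = 1), "\t")[[1]]

chunk.size <- 10000          # rows per chunk; tune to your memory
repeat {
  chunk <- tryCatch(
    read.table(infile, sep = "\t", nrows = chunk.size,
               header = FALSE, col.names = header,
               colClasses = "character",
               comment.char = "", quote = ""),
    error = function(e) NULL)   # read.table errors out at end of file
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(con, "hapmap", chunk, append = TRUE)
}
close(infile)

## now pull back only the rows and columns you actually need,
## e.g. (column names and the filter value are hypothetical):
wanted <- dbGetQuery(con,
  "SELECT rsid, chrom, pos, NA19099 FROM hapmap WHERE chrom = 'chr7'")

dbDisconnect(con)

The point of the chunked loop is that no more than chunk.size rows of the raw file are ever held in R at once; the full 4 GB only ever lives on disk, in the SQLite file, and the final dbGetQuery() brings back just the subset you asked for.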