On Mon, 23 Mar 2009, David Reiss wrote:

I have a very large tab-delimited file, too big to store in memory via
readLines() or read.delim(). Turns out I only need a few hundred of those
lines to be read in. If it were not so large, I could read the entire file
in and "grep" the lines I need. For such a large file, many calls to
read.delim() with incrementing "skip" and "nrows" parameters, followed by
grep() calls, are very slow.

You certainly don't want to use repeated reads from the start of the file with 
skip=, but if you set up a file connection
   fileconnection <- file("my.tsv", open="r")
you can read from it incrementally with readLines() or read.delim() without 
going back to the start each time.
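
A minimal sketch of that approach: open a connection once, read the file in 
chunks with readLines(), and keep only the matching lines. The file contents, 
chunk size, and pattern here are illustrative placeholders (a tiny temporary 
file stands in for the large one).

```r
## Create a tiny tab-delimited example file (stand-in for the large file).
tmp <- tempfile(fileext = ".tsv")
writeLines(c("a\t1", "keepme\t2", "b\t3", "keepme\t4"), tmp)

con <- file(tmp, open = "r")     # open the connection once
wanted <- character(0)
repeat {
  ## Each call continues from where the previous one stopped;
  ## use a much larger n (e.g. 10000) for a genuinely large file.
  chunk <- readLines(con, n = 2)
  if (length(chunk) == 0) break  # end of file reached
  wanted <- c(wanted, grep("^keepme", chunk, value = TRUE))
}
close(con)

wanted  # only the matching lines are kept in memory
```

The selected lines can then be parsed with, e.g., 
read.delim(textConnection(wanted), header=FALSE).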

The speed of this approach should be within a reasonable constant factor of 
anything else, since reading the file once is unavoidable and should be the 
bottleneck.

      -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.