On Mon, 23 Mar 2009, David Reiss wrote:
I have a very large tab-delimited file, too big to store in memory via readLines() or read.delim(). It turns out I only need a few hundred of those lines. If the file were not so large, I could read it all in and "grep" the lines I need. For a file this large, many calls to read.delim() with incrementing "skip" and "nrows" parameters, each followed by a grep() call, are very slow.
You certainly don't want to use repeated reads from the start of the file with skip=, but if you set up a file connection

    fileconnection <- file("my.tsv", open = "r")

you can read from it incrementally with readLines() or read.delim() without going back to the start each time. The speed of this approach should be within a reasonable constant factor of anything else, since reading the file once is unavoidable and should be the bottleneck.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlum...@u.washington.edu
University of Washington, Seattle
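A minimal sketch of the connection-based approach, assuming the wanted lines can be identified by a pattern (the file name "my.tsv", the chunk size, and the pattern "needle" are placeholders, not from the thread):

    con <- file("my.tsv", open = "r")
    keep <- character(0)
    repeat {
      # read the next block of lines; the connection keeps its position,
      # so we never re-read from the start of the file
      chunk <- readLines(con, n = 10000)
      if (length(chunk) == 0) break      # stop at end of file
      keep <- c(keep, grep("needle", chunk, value = TRUE))
    }
    close(con)
    # parse only the retained lines as tab-delimited fields
    result <- read.delim(textConnection(keep), header = FALSE)

If the file has a header row, read it separately with readLines(con, n = 1) before the loop so it isn't subjected to the pattern match.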