I have a very large tab-delimited file, too big to read into memory via
readLines() or read.delim(). It turns out I only need a few hundred of
its lines. If the file were smaller, I could read it in whole and "grep"
the lines I need. For a file this large, making many calls to
read.delim() with incrementing "skip" and "nrows" arguments, followed by
grep() calls, is very slow. I am aware of the possibilities via SQLite,
but I would prefer not to use it in this case.
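
The best workaround I have found so far is to read the file through a
single open connection, filtering each chunk as it arrives. A minimal
sketch, where "big.txt" and the pattern "target" are stand-ins for my
actual file and filter:

    con <- file("big.txt", open = "r")
    keep <- character(0)
    repeat {
        chunk <- readLines(con, n = 10000)   # 10,000 lines per pass
        if (length(chunk) == 0) break        # end of file reached
        keep <- c(keep, grep("target", chunk, value = TRUE))
    }
    close(con)
    ## parse only the matching lines as tab-delimited fields
    dat <- read.delim(textConnection(keep), header = FALSE)

Because the connection stays open, each readLines() call continues where
the previous one stopped, rather than re-scanning the file from the top
the way an incrementing "skip" forces read.delim() to do.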

My question is: is there a function for efficiently reading in a file,
along the lines of read.delim(), that lets me specify a filter (via grep
or something else) so that only the lines that match are read in?
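
For completeness: I know a similar effect can be had by shelling out to
a command-line grep, assuming one is available (again, "big.txt" and
"target" are placeholders):

    ## relies on an external grep, so not portable everywhere
    dat <- read.delim(pipe("grep 'target' big.txt"), header = FALSE)

but I am hoping for something that works within R itself.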

If not, I would *love* to see a "filter" parameter added as an option to
read.delim() and/or readLines().
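
Something in this spirit, where the "filter" argument is purely
imaginary:

    ## wished-for interface -- a "filter" argument does not currently exist
    dat <- read.delim("big.txt", header = FALSE,
                      filter = function(line) grepl("target", line))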

Thanks for any pointers.

--David
