On Monday 23 March 2009, David Reiss wrote:
> I have a very large tab-delimited file, too big to store in memory via
> readLines() or read.delim(). It turns out I only need a few hundred of those
> lines. If the file were not so large, I could read it all in and "grep" the
> lines I need. For such a large file, many calls to read.delim() with
> incrementing "skip" and "nrows" parameters, followed by grep() calls, are
> very slow. I am aware of possibilities via SQLite; I would prefer not to use
> that in this case.
>
> My question is: is there a function for efficiently reading in a file,
> along the lines of read.delim(), which allows me to specify a filter (via
> grep or something else) that tells the function to read in only the lines
> that match?
>
> If not, I would *love* to see a "filter" parameter added as an option to
> read.delim() and/or readLines().
>
> thanks for any pointers.
>
> --David
How about pre-filtering before loading the data into R:

  grep -E 'your pattern here' your_file_here > your_filtered_file

Alternatively, if you need to match on specific fields, see 'awk' and 'cut'; if you need to delete characters, see 'tr'. These tools come with any unix-like OS, and you can probably get them on Windows without much effort.

Cheers,
Dylan

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341
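If you would rather not manage an intermediate file, here is a minimal sketch of two ways to do the same filtering from within R. The pattern and the file name ('your pattern here', 'your_file_here') are placeholders, and header = FALSE is assumed because grep would discard any header line:

  ## 1) Run grep through a pipe and hand the connection to read.delim(),
  ##    so only matching lines ever reach R.
  con <- pipe("grep -E 'your pattern here' your_file_here")
  x <- read.delim(con, header = FALSE, sep = "\t")

  ## 2) Pure-R alternative: read the file in chunks with readLines() and
  ##    keep only matching lines, so the whole file never sits in memory.
  keep <- character(0)
  con2 <- file("your_file_here", open = "r")
  repeat {
    chunk <- readLines(con2, n = 10000)
    if (length(chunk) == 0) break
    keep <- c(keep, grep("your pattern here", chunk, value = TRUE))
  }
  close(con2)
  x2 <- read.delim(textConnection(keep), header = FALSE, sep = "\t")

The pipe() version is usually faster since grep does the filtering outside R; the chunked readLines() version works even where the external grep is not available.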