On Mon, Mar 23, 2009 at 5:53 PM, David Reiss <dre...@systemsbiology.org> wrote:
> I have a very large tab-delimited file, too big to store in memory via
> readLines() or read.delim(). It turns out I only need a few hundred of
> those lines. If the file were not so large, I could read it all in and
> "grep" the lines I need. For such a large file, many calls to
> read.delim() with incrementing "skip" and "nrows" parameters, followed
> by grep() calls, is very slow. I am aware of possibilities via SQLite;
> I would prefer not to use that in this case.
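For concreteness, the skip/nrows approach described above might look like the sketch below (the file name, pattern, column, and chunk size are all illustrative); each read.delim() call rescans the file from the top, which is what makes it so slow:

# read the file in chunks, re-opening it on every call
chunk <- 10000
skip <- 0
hits <- list()
repeat {
  block <- tryCatch(
    read.delim("myfile.dat", header = FALSE, skip = skip, nrows = chunk,
               stringsAsFactors = FALSE),
    error = function(e) NULL)   # read.delim errors once skip passes EOF
  if (is.null(block)) break
  keep <- grep("some pattern", block[[1]])  # hypothetical pattern/column
  if (length(keep)) hits[[length(hits) + 1]] <- block[keep, ]
  skip <- skip + chunk
}
result <- do.call(rbind, hits)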
Why the restriction? Using sqldf with SQLite, it's only two R statements. The first statement defines the name of the file, and the second defines what you want to extract from it and its format.

First create a sample file using the built-in data frame BOD, specifying tab delimiters. A header, i.e. col.names = TRUE, is the default for write.table:

# create tab delimited test file with headers
write.table(BOD, file = "myfile.dat", sep = "\t")

# now read in only those records satisfying a condition
library(sqldf)

# here are the two statements
myfile <- file("myfile.dat")
DF <- sqldf("select * from myfile where demand < 10 or demand > 15",
            file.format = list(sep = "\t"))

See the examples on the home page http://sqldf.googlecode.com and see ?sqldf for info on the file.format argument in case your format differs.
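As a quick sanity check on a toy example like this, the same selection can be done in memory on the original BOD data frame and compared with DF:

# equivalent in-memory selection, for comparison with the sqldf result
subset(BOD, demand < 10 | demand > 15)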