I agree on the database solution.
Databases are the right tool to solve this kind of problem.
The only thing to consider is the start-up cost of setting up the database, which can be very time-consuming for someone who is not familiar with database technology.
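To make the database suggestion concrete, here is a minimal sketch, assuming the DBI and RSQLite packages are installed and that each line holds a key and a numeric value separated by a single space (the format the quoted loop expects). The helper names load_records() and query_records() and the table name "records" are hypothetical. The file is streamed into SQLite in chunks, so it never has to fit in memory; loading is the one-time start-up cost, and the indexed query is what a repetitive job would pay each time.

```r
library(DBI)

# One-time cost: stream the text file into an SQLite table, chunk by chunk.
# Assumes "key value" lines separated by a single space.
load_records <- function(db, path, chunk = 10000L) {
  input <- file(path, "r")
  on.exit(close(input))
  first <- TRUE
  repeat {
    lines <- readLines(input, n = chunk)
    if (length(lines) == 0L) break  # end of file reached
    fields <- strsplit(lines, split = " ", fixed = TRUE)
    records <- data.frame(
      key = vapply(fields, `[`, character(1), 1L),
      value = as.numeric(vapply(fields, `[`, character(1), 2L)),
      stringsAsFactors = FALSE
    )
    # Replace any old table on the first chunk, append afterwards.
    dbWriteTable(db, "records", records, overwrite = first, append = !first)
    first <- FALSE
  }
  # An index on the key column makes repeated lookups cheap.
  dbExecute(db, "CREATE INDEX IF NOT EXISTS idx_records_key ON records(key)")
  invisible(db)
}

# Repeated cost: a parameterised query with one "?" placeholder per key.
query_records <- function(db, keys) {
  sql <- sprintf("SELECT value FROM records WHERE key IN (%s)",
                 paste(rep("?", length(keys)), collapse = ", "))
  dbGetQuery(db, sql, params = as.list(keys))$value
}
```

Usage: open a persistent database with db <- dbConnect(RSQLite::SQLite(), "records.sqlite"), call load_records(db, "test.txt") once, then call query_records(db, c("AAC", "ATT")) as often as needed. Row order is not guaranteed without an ORDER BY clause.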

Calling file() does not actually read the whole file: it simply opens a connection to the file without reading anything.
countLines() (from R.utils) should work much like "wc -l" in a bash shell.
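The distinction between opening a connection and reading from it can be seen in a small streaming line counter. This is a sketch in base R only; count_lines_streaming() is a hypothetical helper, not the R.utils implementation. file() just opens the connection, and data is read only when readLines() is called, holding at most `chunk` lines in memory at a time. One difference from "wc -l": an unterminated final line is counted here, whereas "wc -l" counts newline characters.

```r
# Count lines in one streaming pass, roughly like `wc -l`.
# Hypothetical helper for illustration; not part of R.utils.
count_lines_streaming <- function(path, chunk = 10000L) {
  con <- file(path, "r")   # opens the connection; nothing is read yet
  on.exit(close(con))
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk)  # reading happens here
    if (length(lines) == 0L) break      # end of file reached
    total <- total + length(lines)
  }
  total
}
```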

I would say that if this is a one-time job, this solution should work, even though it is not the fastest. If the job is a repetitive one, then a database solution is surely better.
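For a one-time job where the file comfortably fits in memory, a different technique than the line-by-line loop is usually much faster: read the whole file at once and subset with a vectorised %in%. This is a sketch under the assumption of exactly two whitespace-separated columns (a key and a numeric value); lookup_values_in_memory() is a hypothetical name.

```r
# Vectorised alternative for files that fit in memory.
# Assumes two whitespace-separated columns: key, numeric value.
lookup_values_in_memory <- function(path, keys) {
  d <- read.table(path, header = FALSE,
                  col.names = c("key", "value"),
                  colClasses = c("character", "numeric"))
  d$value[d$key %in% keys]  # keep values whose key is in `keys`
}
```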

A.


Wacek Kusnierczyk wrote:
if the file is really large, reading it twice may add a considerable penalty:

r...@quantide.com wrote:
Something like this should work

library(R.utils)  # provides countLines()
out = numeric()
qr = c("AAC", "ATT")  # keys to match in the first column
n = countLines("test.txt")

# 1st pass

con = file("test.txt", "r")
for (i in seq_len(n)) {

# 2nd pass

  line = readLines(con, n = 1)
  fields = strsplit(line, split = " ")[[1]]  # split once, reuse both fields
  A = fields[1]
  if (is.element(A, qr)) {
    value = as.numeric(fields[2])
    out = c(out, value)
  }
}
close(con)  # release the connection
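Along the lines of the objection that reading the file twice costs time, the loop can be rewritten to read the file only once: skip countLines() and keep reading chunks until readLines() returns nothing. This is a sketch, not the original poster's code; extract_values() and the chunk size are illustrative assumptions, and the input is assumed to be "key value" lines separated by a single space. Matches are collected in a list and combined once at the end, which avoids the repeated copying caused by growing `out` with c().

```r
# Single-pass version: no countLines(), so the file is read only once.
extract_values <- function(path, keys, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break  # end of file reached
    fields <- strsplit(lines, split = " ", fixed = TRUE)
    key <- vapply(fields, `[`, character(1), 1L)
    value <- vapply(fields, `[`, character(1), 2L)
    keep <- key %in% keys
    pieces[[length(pieces) + 1L]] <- as.numeric(value[keep])
  }
  as.numeric(unlist(pieces))  # numeric(0) when nothing matched
}
```

Usage: out <- extract_values("test.txt", c("AAC", "ATT")).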

if this is a one-go task, counting the lines does not pay, and why
bother.  if this is a repetitive task, a database-based solution will
probably be a better idea.

vQ



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
