I agree on the database solution.
Databases are the right tool for this kind of problem.
Just keep in mind the start-up cost of setting up the database. This can
be a very time-consuming task if someone is not familiar with database
technology.
Calling file() does not actually read the whole file. The function
simply opens a connection to the file without reading it.
countLines() should do something like "wc -l" from a bash shell.
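To illustrate the two points above, here is a minimal sketch in base R. The sample file is made up for the example, and count_lines_chunked is a hypothetical helper that only approximates what "wc -l" reports; it is not how R.utils::countLines is implemented.

```r
# Hypothetical sample file, created only for this illustration
path <- tempfile(fileext = ".txt")
writeLines(c("AAC 1.5", "GGT 2.0", "ATT 3.25", "CCA 4.0"), path)

# file() only opens a connection; lines are read on request
con <- file(path, "r")
first <- readLines(con, n = 1)   # reads just the first line
close(con)

# Hypothetical helper: count lines chunk by chunk,
# so the whole file is never held in memory at once
count_lines_chunked <- function(path, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    got <- length(readLines(con, n = chunk))
    if (got == 0L) break         # nothing left to read
    total <- total + got
  }
  total
}

n_lines <- count_lines_chunked(path)
```

Counting this way still costs one full pass over the file, which is the overhead discussed in the reply below.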
I would say that if this is a one-time job, this solution should work
even though it is not the fastest. If the job is a repetitive one,
then a database solution is surely better.
A.
Wacek Kusnierczyk wrote:
if the file is really large, reading it twice may add considerable penalty:
r...@quantide.com wrote:
Something like this should work
library(R.utils)
out = numeric()
qr = c("AAC", "ATT")
n = countLines("test.txt")        # 1st pass
con = file("test.txt", "r")
for (i in seq_len(n)) {
  line = readLines(con, n = 1)    # 2nd pass
  # split once and reuse both fields
  fields = strsplit(line, split = " ")[[1]]
  if (is.element(fields[1], qr)) {
    out = c(out, as.numeric(fields[2]))
  }
}
close(con)
if this is a one-go task, counting the lines does not pay, and why
bother. if this is a repetitive task, a database-based solution will
probably be a better idea.
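
a single-pass variant might look like the sketch below: it keeps reading until readLines() returns nothing, so no line count is needed up front. the sample file and the block size of 1000 lines are assumptions made only for illustration.

```r
# hypothetical sample file standing in for "test.txt"
path <- tempfile(fileext = ".txt")
writeLines(c("AAC 1.5", "GGT 2.0", "ATT 3.25", "CCA 4.0", "AAC 5.0"), path)

qr <- c("AAC", "ATT")
out <- numeric()

con <- file(path, "r")
repeat {
  # read a block of lines; character(0) signals end of file
  lines <- readLines(con, n = 1000L)
  if (length(lines) == 0L) break
  fields <- strsplit(lines, split = " ", fixed = TRUE)
  key <- vapply(fields, `[`, character(1), 1)     # first field of each line
  value <- vapply(fields, `[`, character(1), 2)   # second field of each line
  out <- c(out, as.numeric(value[key %in% qr]))
}
close(con)
```

reading in blocks also avoids one readLines() call per line, which tends to be slow in R.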
vQ
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.