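Something like this should work:
library(R.utils)                  # provides countLines()
qr  <- c("AAC", "ATT")            # tags to look up
out <- numeric()                  # values found so far
n   <- countLines("test.txt")     # number of lines in the file
con <- file("test.txt", "r")
for (i in seq_len(n)) {           # seq_len() is safe even if the file is empty
  line   <- readLines(con, n = 1)              # read one line at a time
  fields <- strsplit(line, split = " ")[[1]]   # first field: tag, second: value
  if (fields[1] %in% qr) {
    out <- c(out, as.numeric(fields[2]))
  }
}
close(con)                        # release the file connection
out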
You may want to improve execution speed by reading the data in chunks
instead of line by line; the code needs only a little modification.
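A minimal sketch of the chunked variant, assuming the file is space-separated with the tag in the first column and the value in the second, and that lines starting with "#" are headers. The function name lookup_chunked and the default chunk_size are hypothetical choices, not part of the original reply. Only one chunk is held in memory at a time, and each chunk is matched against the queries with a single vectorised %in%.

```r
# Sketch: read the file chunk_size lines at a time and keep only the
# lines whose first (space-separated) field is one of the query tags.
# Assumes lines starting with "#" are headers/comments.
lookup_chunked <- function(path, queries, chunk_size = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))                  # close the connection even on error
  found <- numeric()                   # named vector: tag -> value
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break      # end of file reached
    lines <- lines[!startsWith(lines, "#")]
    if (length(lines) == 0) next
    fields <- strsplit(lines, split = " ", fixed = TRUE)
    tags <- vapply(fields, `[`, character(1), 1)
    vals <- vapply(fields, `[`, character(1), 2)
    hit  <- tags %in% queries          # vectorised match for the whole chunk
    found <- c(found, setNames(as.numeric(vals[hit]), tags[hit]))
  }
  # Return values in the order of `queries`; NA for tags that were not found
  unname(found[queries])
}
```

For example, lookup_chunked("repo.txt", qr) should return the values in the same order as qr. Larger chunks mean fewer readLines() calls but more memory per iteration; the right chunk_size depends on the machine.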
Carlos J. Gil Bellosta wrote:
On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote:
Dear all,
I have a repository file (let's call it repo.txt)
that contains two columns like this:
# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7
Given another query vector
qr <- c("AAC", "ATT")
I would like to find the corresponding value for each query above,
yielding:
0.02
0.7
However, I want to avoid slurping the whole repo.txt into an object (e.g. a hash).
Is there any way to do that?
The reason is that repo.txt is very, very large (millions of lines,
with tag lengths > 30 bp), and my PC's memory is too small to hold it.
- Gundala Viswanath
Jakarta - Indonesia
Hello,
You can always store your repo.txt in a database, say, SQLite, and
select only the values you want via an SQL query.
That way, you avoid loading the full file into memory.
Best regards,
Carlos J. Gil Bellosta
http://www.datanalytics.com
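The SQLite suggestion can be sketched in R as follows, assuming the DBI and RSQLite packages are installed (neither is named in the thread). The function names build_repo_db and lookup_repo_db, the table name repo, the index name and the chunk size are all hypothetical. The file is loaded chunk by chunk so it never sits in memory as a whole, and an index on the tag column lets SQLite find the requested rows without scanning the entire table.

```r
# Sketch: load repo.txt into an SQLite table chunk by chunk, index the tag
# column, then fetch only the requested rows. Calls are namespace-qualified
# (DBI::, RSQLite::) so the packages are needed only when the functions run.
build_repo_db <- function(txt_path, db_path, chunk_size = 100000L) {
  db <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(db))
  DBI::dbExecute(db, "CREATE TABLE IF NOT EXISTS repo (tag TEXT, value REAL)")
  con <- file(txt_path, "r")
  on.exit(close(con), add = TRUE)
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break                     # end of file reached
    lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # drop header/blank lines
    if (length(lines) == 0) next
    fields <- strsplit(lines, split = " ", fixed = TRUE)
    chunk <- data.frame(
      tag   = vapply(fields, `[`, character(1), 1),
      value = as.numeric(vapply(fields, `[`, character(1), 2)),
      stringsAsFactors = FALSE
    )
    DBI::dbWriteTable(db, "repo", chunk, append = TRUE)
  }
  # An index on tag makes each lookup a B-tree search instead of a full scan
  DBI::dbExecute(db, "CREATE INDEX IF NOT EXISTS idx_repo_tag ON repo(tag)")
  invisible(db_path)
}

lookup_repo_db <- function(db_path, queries) {
  db <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(db))
  # One positional placeholder per query tag: "?, ?, ..."
  placeholders <- paste(rep("?", length(queries)), collapse = ", ")
  sql <- paste0("SELECT tag, value FROM repo WHERE tag IN (", placeholders, ")")
  res <- DBI::dbGetQuery(db, sql, params = as.list(queries))
  # Return values in the order of `queries` (NA for tags not found)
  res$value[match(queries, res$tag)]
}
```

The database only has to be built once; after that, each call such as lookup_repo_db("repo.sqlite", qr) reads just the matching rows.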
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.