useRs- I'm attempting to scan a text file of more than 1 GB and to read and store the values that follow a specific key phrase that is repeated multiple times throughout the file. A snippet of the text file I'm trying to read is attached. The text file is a dumping ground for various aspects of the performance of the model that generates it. Thus, the information I want to extract is not in a fixed position (i.e. it does not always appear in a predictable location, like line 1000, or 2000, etc.). Rather, the desired values always follow a specific phrase: " PERCENT DISCREPANCY ="
One approach I took was the following:

    library(R.utils)

    txt_con <- file(description = "D:/MCR_BeoPEST - Copy/MCR.out", open = "r")
    # The path above will need to be altered if one desires to test the code
    # on the attached txt file, which will run much quicker

    system.time(num_lines <- countLines("D:/MCR_BeoPEST - Copy/MCR.out"))
    # elapsed time on the full 1 GB file was about 55 seconds on a 3.6 GHz Xeon
    num_lines
    # 14405247

    pd <- numeric(0)  # result vector; must exist before it is appended to
    system.time(
      for (i in 1:num_lines) {
        txt_line <- readLines(txt_con, n = 1)
        if (length(grep(" PERCENT DISCREPANCY =", txt_line))) {
          pd <- c(pd, as.numeric(substr(txt_line, 70, 78)))
        }
      }
    )
    # This took about 5 minutes
    close(txt_con)

The inefficiency in this approach arises from reading the file twice (first to get num_lines, then to step through each line looking for the desired text). Is there a way to speed this process up through the use of ?scan? I wasn't able to get anything working, but what I had in mind was to scan through the file and, when the key phrase (e.g. " PERCENT DISCREPANCY = ") is encountered, read and store the next 13 characters (which will include some white space) as a numeric value, then resume the scan until the key phrase is encountered again, repeating until the end-of-file marker is reached. Is such an approach even possible, or is line-by-line the best bet?

http://r.789695.n4.nabble.com/file/n4632558/MCR.out MCR.out
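
For what it's worth, the single-pass idea described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the full file: the helper name extract_after_key, its arguments, and the block size of 100000 lines are all made up for the example, and it uses readLines() in blocks rather than scan(). It takes the 13 characters after the first occurrence of the key phrase on each matching line, which may not be the same value as the fixed columns 70 to 78 used in the loop above.

```r
# Hypothetical helper (not part of the original post): read the file once,
# in blocks of lines, and keep the number that follows the key phrase.
extract_after_key <- function(path, key = " PERCENT DISCREPANCY =",
                              width = 13L, chunk_lines = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0L) break             # end of file reached
    pos <- regexpr(key, lines, fixed = TRUE)   # -1 where the phrase is absent
    hit <- pos > 0L
    if (any(hit)) {
      start <- pos[hit] + nchar(key)           # first character after the phrase
      txt <- substr(lines[hit], start, start + width - 1L)
      pieces[[length(pieces) + 1L]] <- as.numeric(txt)  # surrounding blanks are ignored
    }
  }
  if (length(pieces)) unlist(pieces) else numeric(0)
}

# Small made-up test file (assumed layout, not the real MCR.out)
tmp <- tempfile(fileext = ".out")
writeLines(c("some other model output",
             " PERCENT DISCREPANCY =         0.25",
             "more output",
             " PERCENT DISCREPANCY =        -1.50"), tmp)
pd <- extract_after_key(tmp)
```

Because the number of lines is never needed, the countLines() pass is skipped, and collecting one vector per block avoids growing pd one element at a time. If the 13 characters ever contain something other than a number, as.numeric() will return NA with a warning, so the width may need adjusting to the real file layout.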