On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote:
I am using readLines to read a fairly large ASCII file.  readLines reads
a fixed number of lines, then other R code processes the data, then
readLines reads the same number of lines again, then other R code
processes the data, then ....



Sort of like:



conn <- file('filename', 'r')
for (chunk in 1:100000) {
    Lines <- readLines(conn, n = 25)
    if (length(Lines) == 0) break   # end of file reached
    # process "Lines"
}
close(conn)



The code is working, but I notice that it slows down greatly as time
progresses.  It took 2 seconds to read my first chunk of data, 4 seconds
to read the next chunk, 10 after that.  The quasi-exponential trend has
slowed, thank goodness, but after about a hundred reads, the read time
for the next chunk is over a minute.  Let me stress that the number of
lines read in each chunk of data is absolutely fixed.



The only processing I am doing at this point is to parse the new data
and rbind the results to an existing data frame.

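And that may be the interesting point.
Have you tried allocating the whole data.frame up front and assigning into it later? It is probably not readLines() that is slowing you down: each rbind() copies the entire data frame accumulated so far, so the cost per chunk grows as the data frame grows. A minute seems like quite a lot for reasonably sized data. How many columns are we talking about?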
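
A minimal sketch of the preallocation idea, under stated assumptions: the real parsing code is not shown in this thread, so parse_chunk() below is a hypothetical stand-in that reduces each 25-line chunk to one row, and the column names n_lines and n_chars are invented for illustration. The sketch writes its own temporary file so that it runs as is.

```r
# Hypothetical parser: reduces one chunk of lines to a single row of values.
# The real parsing logic is not shown in the thread.
parse_chunk <- function(lines) {
  c(n_lines = length(lines), n_chars = sum(nchar(lines)))
}

# Write a small test file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
n_chunks <- 40L
chunk_size <- 25L
writeLines(sprintf("line %d", seq_len(n_chunks * chunk_size)), path)

# Allocate the full result once, then assign into it by row index,
# instead of growing it with rbind() on every iteration.
result <- data.frame(n_lines = integer(n_chunks), n_chars = integer(n_chunks))

conn <- file(path, "r")
for (i in seq_len(n_chunks)) {
  lines <- readLines(conn, n = chunk_size)
  if (length(lines) == 0L) break  # stop early at end of file
  parsed <- parse_chunk(lines)
  result$n_lines[i] <- parsed[["n_lines"]]
  result$n_chars[i] <- parsed[["n_chars"]]
}
close(conn)
unlink(path)
```

If the number of chunks is not known in advance, one alternative is to collect the parsed chunks in a list and combine them with a single do.call(rbind, ...) after the loop, so the copying happens once rather than on every iteration.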

Uwe Ligges




Processing of new data in no way depends on earlier data.



So, my question is why is the reading taking longer as time goes on?  Is
there a way to fix this?  Is there a better method than readLines?



Thanks.



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
