At the moment I'm just reading the large file to see how fast it goes. Eventually, if I can get the read time down, I'll write out a processed version. Thanks for suggesting scan(); I'll try it.
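For reference, here is a minimal sketch of what a read-only timing test with scan() on an open connection might look like. It is an assumption-laden illustration, not the actual job: the temporary sample file, the three whitespace-delimited columns (id, x, label), and the chunk size are all hypothetical, and the real file is fixed-width, so it would need a different column specification.

```r
# Hypothetical stand-in for the real file: a small whitespace-delimited sample.
path <- tempfile(fileext = ".txt")
n_rows <- 10000L
write.table(
  data.frame(id = seq_len(n_rows), x = runif(n_rows), label = "abc"),
  file = path, row.names = FALSE, col.names = FALSE, quote = FALSE
)

# Column types for scan(); names and types are assumptions for this sample.
col_spec <- list(id = integer(0), x = numeric(0), label = character(0))

chunk_lines <- 2500L  # lines per chunk; would need tuning for a real file
total_rows <- 0L

# Keeping the connection open lets each scan() call resume where the
# previous one stopped, so the file is read once, chunk by chunk.
con <- file(path, open = "r")
elapsed <- system.time({
  repeat {
    chunk <- scan(con, what = col_spec, nlines = chunk_lines, quiet = TRUE)
    n <- length(chunk$id)
    if (n == 0L) break          # nothing left to read
    total_rows <- total_rows + n
    # Any per-chunk processing would go here.
  }
})
close(con)
unlink(path)

cat("rows read:", total_rows, "\n")
print(elapsed)
```

Telling scan() the column types up front through `what` spares R from guessing them, which is the point of the suggestion; whether that closes the gap with wc on a 3.5 GB file is something only a test on the real data can show.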
Rob

jim holtman wrote:
> Since you are reading it in chunks, I assume that you are writing out each
> segment as you read it in. How are you writing it out to save it? Is the
> time you are quoting both the reading and the writing? If so, can you break
> down the differences in what these operations are taking?
>
> How do you plan to use the data? Is it all numeric? Are you keeping it in
> a dataframe? Have you considered using 'scan' to read in the data and to
> specify what the columns are? If you would like some more help, the answer
> to these questions will help.
>
> On Sat, May 9, 2009 at 10:09 PM, Rob Steele
> <freenx.10.robste...@xoxy.net> wrote:
>
>> Thanks guys, good suggestions. To clarify, I'm running on a fast
>> multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
>> Paging shouldn't be an issue since I'm reading in chunks and not trying
>> to store the whole file in memory at once. Thanks again.
>>
>> Rob Steele wrote:
>>> I'm finding that readLines() and read.fwf() take nearly two hours to
>>> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>>> The unix command wc by contrast processes the same file in three
>>> minutes. Is there a faster way to read files in R?
>>>
>>> Thanks!
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.