** Disclaimer: I'm looking for general suggestions **
I'm sorry, but I can't send out the file I'm using, so there is no
reproducible example.

I'm using read.table and it's taking over 30 seconds to read a tiny file.
The strange thing is that it takes roughly the same amount of time if the
file is 100 times larger.

After re-reviewing the R Data Import/Export manual, I think the best
approach would be to use Python, or perhaps the readLines function, but I
was hoping to understand why the simple read.table approach isn't working
as expected.

Some relevant facts:

   1. There are about 3700 columns. Maybe this is the problem? Still, the
      file size is not very large.
   2. The file encoding is ANSI, but I'm not specifying that in the call.
      Setting fileEncoding="ANSI" produces an "unsupported conversion"
      error.
   3. readLines imports the lines quickly.
   4. scan imports the file quickly also.

Obviously, scan and readLines would require more coding to identify
columns, convert types, etc.; a rough sketch of what I have in mind is below.
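
For example, this is roughly the scan-based workaround I was considering
(untested on the real file; it assumes a tab-delimited file with a single
header row, and the columns would still need converting from character
afterwards):

# Read the header line and the remaining values separately as character,
# then reshape into a data frame. Assumes tab-delimited, one header row.
hdr  <- scan('C:/test.txt', what=character(), sep='\t', nlines=1, quiet=TRUE)
vals <- scan('C:/test.txt', what=character(), sep='\t', skip=1, quiet=TRUE)
dat  <- as.data.frame(matrix(vals, ncol=length(hdr), byrow=TRUE),
                      stringsAsFactors=FALSE)
names(dat) <- hdr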

My code:

system.time(dat <- read.table('C:/test.txt', nrows=-1, sep='\t', header=TRUE))

It's taking 33.4 seconds, and the file is only 315 KB!
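
For comparison, this is the variant I was planning to try next, following
the manual's suggestion to pre-specify colClasses so the column types don't
have to be guessed across all ~3700 columns (I don't know yet whether it
helps in this case):

# Guess the column classes from a few rows, then pass them explicitly so
# read.table doesn't have to infer the type of every column itself.
first   <- read.table('C:/test.txt', sep='\t', header=TRUE, nrows=5)
classes <- sapply(first, class)
system.time(dat <- read.table('C:/test.txt', sep='\t', header=TRUE,
                              colClasses=classes, comment.char=""))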

Thanks

Gene
