[R] reading large csv data sets efficiently

ivo welch Wed, 22 May 2013 09:33:57 -0700

I have a couple of large data sets, on the order of 4GB. They come in .csv
files, with about 50 columns and lots of rows. A couple have weird NA
values, such as "C" and "B", in numeric columns.

I am wondering how well read.csv() deals with this on the first
pass.

d <- read.csv("t.csv",
              colClasses = c(NA, NA, "NULL", "NULL",  # NA: guess type; "NULL": skip
                             "numeric", "numeric", "numeric", "numeric"),
              na.strings = c("C", "B"))  # treat "C" and "B" as NA

Does R first read the entire file and then worry about colClasses and
na.strings, or does it handle this line by line as it goes?

(If it does the former, I can write a Perl pre-filter.)
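A minimal sketch, assuming the file has a header row and no quoted fields that contain embedded newlines: `read_csv_chunked` is a hypothetical helper, not a base R function. It sidesteps the question of how read.csv() buffers its input by handing it a bounded number of lines at a time, read with readLines() from an open connection, while applying the same colClasses and na.strings as the call above to every chunk.

```r
# Hypothetical helper (not part of base R): read a large CSV in chunks of
# `chunk_rows` lines, applying the same colClasses and na.strings to each
# chunk, so only one chunk of raw text is held in memory at a time.
# Assumes a header line and no quoted fields with embedded newlines.
read_csv_chunked <- function(path, chunk_rows = 100000L,
                             colClasses = NA, na.strings = "NA") {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Parse the header once; scan() honours quotes around column names.
  header_line <- readLines(con, n = 1L)
  if (length(header_line) == 0L) stop("empty file: ", path)
  col_names <- scan(text = header_line, what = "", sep = ",", quiet = TRUE)

  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    # "NULL" entries in colClasses drop those columns while parsing,
    # and na.strings turns codes such as "C" and "B" into NA.
    chunks[[length(chunks) + 1L]] <-
      read.csv(text = lines, header = FALSE, col.names = col_names,
               colClasses = colClasses, na.strings = na.strings)
  }
  # Returns NULL if the file holds only a header line.
  do.call(rbind, chunks)
}
```

Fixing colClasses up front keeps the column types identical across chunks, so rbind() does not have to reconcile them. Note that this only bounds the raw text held at once; the combined data frame still has to fit in memory.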

/iaw

----
Ivo Welch (ivo.we...@gmail.com)


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.