I have a couple of large data sets, on the order of 4GB. They come in .csv files, with about 50 columns and a very large number of rows. A couple of them use odd codes for missing values, such as "C" and "B", in otherwise numeric columns.
I am wondering how well read.csv() deals with this on the first pass:

d <- read.csv("t.csv", colClasses=c(NA, NA, "NULL", "NULL", "numeric", "numeric", "numeric", "numeric"), na.strings=c("C", "B"))

Does R first read the entire file and only then worry about colClasses and na.strings, or does it handle them line by line as it goes? (If it does the former, I can write a Perl pre-filter.)

/iaw
----
Ivo Welch (ivo.we...@gmail.com)
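
P.S. For anyone who wants to try the call, here is a small self-contained version. The file contents and column names are made up for illustration (a hypothetical 8-column file, not my real data). It only checks that the "C" and "B" codes come back as NA in the numeric columns and that the "NULL" columns are dropped. It says nothing about when the conversion happens or how much memory a 4GB file would need.

```r
# Toy stand-in for the real file: 8 columns, invented names and values.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,skip1,skip2,x1,x2,x3,x4",
  "1,a,foo,bar,1.5,C,3,4",
  "2,b,foo,bar,B,2.5,6,7"
), tmp)

# Same arguments as in the question: let R guess the first two columns,
# skip the next two, force the last four to numeric, and treat "C" and "B"
# as missing.
d <- read.csv(tmp,
              colClasses = c(NA, NA, "NULL", "NULL",
                             "numeric", "numeric", "numeric", "numeric"),
              na.strings = c("C", "B"))

str(d)       # inspect the resulting column types
unlink(tmp)  # remove the temporary file
```

Note that na.strings applies to every column, so a genuine "C" or "B" in a character column would also become NA.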