The following seems to work:
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",",colClasses = c("integer","factor","logical"))
'character' doesn't work because ff does not support character
vectors. Character vectors need to be stored as factors. The
disadvantage is that the factor levels are held in memory, so if
the number of levels is very large (e.g. when nearly every string is
unique) you might still run into memory problems.
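To get a rough idea of how many levels a text column would need before
loading it as a factor, something like the following base R sketch may
help. It builds a small stand-in file; the names tmp, demo, txt and
n_levels are made up for illustration, and on a really large file you
would probably want to add nrows= to look at a sample only.

```r
# Small stand-in for the real file (hypothetical demo data)
tmp <- tempfile(fileext = ".csv")
set.seed(1)
demo <- data.frame(Integer = 1:100,
                   Character = sample(LETTERS, 100, replace = TRUE),
                   Logical = sample(c(TRUE, FALSE), 100, replace = TRUE))
write.csv(demo, tmp, row.names = FALSE)

# Read only the text column; a colClasses entry of "NULL" skips that column
txt <- read.csv(tmp, colClasses = c("NULL", "character", "NULL"))

# Number of distinct strings = number of levels the factor would hold in memory
n_levels <- length(unique(txt$Character))
n_levels
```

If n_levels comes out close to the number of rows, the factor route is
unlikely to save much memory.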
'integer' doesn't work for the second column because read.csv.ffdf
passes colClasses on to read.table, which then tries to convert that
column to integer and cannot, since it contains letters.
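The same behaviour can be reproduced without ff, using plain read.csv.
This is only a sketch on a three-row stand-in file (tmp, res and ok are
illustrative names): forcing the text column to "integer" fails inside
scan(), while "factor" reads fine.

```r
# Three-row stand-in file with the same column layout as data.csv
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(Integer = 1:3,
                     Character = c("J", "K", "L"),
                     Logical = c(TRUE, FALSE, TRUE)),
          tmp, row.names = FALSE)

# Asking for integer in the text column: capture the error message instead of stopping
res <- tryCatch(
  read.csv(tmp, colClasses = c("integer", "integer", "logical")),
  error = function(e) conditionMessage(e)
)
res

# Declaring the text column as a factor works
ok <- read.csv(tmp, colClasses = c("integer", "factor", "logical"))
sapply(ok, class)
```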
Jan
Nick McClure <nfmccl...@gmail.com> wrote:
I've spent some time trying to wrap my head around reading in large csv
files with the ff-package. I think I know how to do it, but am bumping
into some problems. I've tried to recreate the issues as best as I can
with a smaller example and maybe someone can help explain the problems.
The following code just creates a csv file with an integer column,
character column and logical column.
-------------------------------------------------
library(ff)
#Create data
size = 2000
fake.data = data.frame(
  "Integer"   = round(100000*runif(size)),
  "Character" = sample(LETTERS, size, replace=TRUE),
  "Logical"   = sample(c(TRUE,FALSE), size, replace=TRUE))
#Write to csv
write.csv(fake.data, "data.csv", row.names=FALSE)
-------------------------------------------------
Now to read it in as a 'ffdf' class, I can do the following:
-------------------------------------------------
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",")
-------------------------------------------------
That works. But with my current large data set, read.csv.ffdf is debating
with me about the classes it's importing. I was also messing around with
the first.rows/next.rows, but that's a question for another time. So I'll
try to load the data in, specifying the column types (same exact command,
except with specifying colClasses):
-------------------------------------------------
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer', got '"J"'
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
  vmode 'character' not implemented
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = rep("character",3))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
  vmode 'character' not implemented
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = rep("raw",3))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a raw', got '8601'
-------------------------------------------------
I just can't find a combination of classes that will get this file read
in. I really don't understand why the class 'character' won't work for
all of the columns. Any thoughts as to why? I appreciate the help and time.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.