Having had a quick look at the source code for read.table.ffdf, I suspect that using 'NULL' in the colClasses argument is not supported. Could you try read.table.ffdf with colClasses specified for all columns (thereby reading in every column in the file)? If that works, you can be fairly sure that the number of columns is indeed constant in the file (sometimes a stray ' or an unquoted , can mess things up).
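One way to check the column count independently of ff is count.fields() from the utils package. The sketch below runs on a small made-up file, not on the real data: the file contents, the names demo, n.fields, bad.lines and all.class, and the assumption that all 7 columns are numeric are illustrative only.

```r
# Hypothetical demo file standing in for the real csvfile: 7 columns,
# with one deliberately malformed line (an extra, unquoted comma).
demo <- tempfile(fileext = ".csv")
writeLines(c(
  "1,-0.5412,3,4,5,6,7",
  "1,-0.5842,3,4,5,6,7",
  "1,-0.5920,3,4,5,6,7,8",   # malformed: 8 fields instead of 7
  "1,-0.5451,3,4,5,6,7"
), demo)

# Count the fields on every line, using the separator and quote
# character that read.csv uses by default.
n.fields <- count.fields(demo, sep = ",", quote = "\"")

# Tabulate the counts: a file with a constant number of columns
# gives a single entry here.
print(table(n.fields))

# Positions (relative to the first line counted) that deviate from 7.
bad.lines <- which(n.fields != 7)
print(bad.lines)

# A colClasses vector naming a class for every column, i.e. without
# any 'NULL' entries. "numeric" for all 7 columns is an assumption.
all.class <- rep("numeric", 7)
```

On the real file, the same check would be count.fields(csvfile, sep=",", quote="\"", skip=100), and all.class would replace x.class in the read.csv.ffdf call. Note that count.fields returns one integer per line, so for 6e+7 rows the result alone takes a few hundred MB.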

Jan




threshold <r.kozar...@gmail.com> wrote:

*Dear R users, I've just started using the ff package.

There is a csv file (~4 GB) with 7 columns and 6e+7 rows. I want to read only
one column from the file, skipping the first 100 rows.
Below I've provided different outcomes, which will clarify my problem.
*
sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
...

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ff_2.2-7  bit_1.1-8

##---------------------------------------------------------------------------------------
## *I want to read the second column only:*
x.class <- c('NULL', 'numeric','NULL','NULL','NULL', 'NULL', 'NULL')

## *The following command works fine:*

    read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
                  colClasses=x.class, nrows=1e3)
ffdf (all open) dim=c(1000,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
993  -0.4865
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240

*When I then increase nrows by 1, I get a warning about the number of columns:*

    read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
                  colClasses=x.class, nrows=1001)
ffdf (all open) dim=c(1001,1), dimorder=c(1,2) row.names=NULL
ffdf virtual mapping
   PhysicalName VirtualVmode PhysicalVmode  AsIs VirtualIsMatrix
V2           V2       double        double FALSE           FALSE
   PhysicalIsMatrix PhysicalElementNo PhysicalFirstCol PhysicalLastCol
V2            FALSE                 1                1               1
   PhysicalIsOpen
V2           TRUE
ffdf data
          V2
1    -0.5412
2    -0.5842
3    -0.5920
4    -0.5451
5    -0.5099
6    -0.5021
7    -0.4943
8    -0.5490
:          :
994  -0.6584
995  -0.7482
996  -0.8732
997  -0.8303
998  -0.7248
999  -0.5490
1000 -0.4240
1001 -0.3849
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  cols = 1 != length(data) = 7


*Going well beyond 1000 rows then produces an error:*

    read.csv.ffdf(file=csvfile, header=FALSE, skip=100,
                  colClasses=x.class, nrows=1e4)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  more columns than column names

*The question is: why? The number of columns does not change in the file...

I would appreciate any help.


Best, Robert

*




--
View this message in context: http://r.789695.n4.nabble.com/ff-package-reading-selected-columns-from-csv-tp4637794.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
