I am using the bigmemory package to handle large datasets (around 1 GB) and am facing the following problems. Any hints would be helpful.

_Problem-1:_
I am using the "read.big.matrix" function to create a filebacked big matrix from my data, and I get the following warning:

> x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile = "backup", backingpath = "/home/utkarsh.s")

Warning message:
In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type,  :
A descriptor file has not been specified. A descriptor named backup.desc will be created.

However, "read.big.matrix" has no argument for specifying a descriptor file. The function "as.big.matrix" does have a "descriptorfile" argument, but if I pass it to "read.big.matrix", I get an "unused argument" error (as expected).


_Problem-2:_

I want to get a filebacked *sub*matrix of "x" containing only selected columns, say x[, 1:100]. Is there any way of doing that without actually loading the data into R memory?

_Problem-3:_

There are functions available such as summary, colmean, colsd, ... for standard summary statistics. But is there any way to calculate other summaries, say the number of missing values or the skewness of each variable, without loading the whole data into R memory?


Regards
Utkarsh

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
