Thanks for trying this out.

Problem 1. We'll check this. An option to name the descriptor file should certainly be available. Thanks!
Problem 2. Fascinating. Just yesterday we implemented a sub.big.matrix() function that does exactly this: it creates something that is a big.matrix but that simply references a contiguous subset of the original matrix. This will be available in an upcoming version (hopefully within the next week). A more specialized function would create an entirely new big.matrix from a subset of an existing big.matrix, making an actual copy, but that is something else altogether. By the way, you could do this entirely within R without much work, with only 2x memory overhead.

Problem 3. You can count missing values using mwhich(). For other exploration (e.g. skewness), for the moment you should extract a single column (variable) at a time into R, study it, then get the next column, and so on. We will not be implementing all of R's functions directly for big.matrix objects. We will be creating a new package, "bigmemoryAnalytics", and would welcome contributions to it.

Feel free to email us directly with bugs, questions, etc.

Cheers,
Jay

----------------------------------------------------------
From: utkarshsinghal <utkarsh.sing...@global-analytics.com>
Date: Tue, Jun 2, 2009 at 8:25 AM
Subject: [R] bigmemory - extracting submatrix from big.matrix object
To: r help <r-help@r-project.org>

I am using library(bigmemory) to handle large datasets, say 1 GB, and am facing the following problems. Any hints from anybody would be helpful.

Problem-1:
I am using the "read.big.matrix" function to create a filebacked big matrix of my data and get the following warning:

> x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile = "backup", backingpath = "/home/utkarsh.s")
Warning message:
In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type, :
  A descriptor file has not been specified. A descriptor named backup.desc will be created.

However, there is no such argument in "read.big.matrix".
Although there is an argument "descriptorfile" in the function "as.big.matrix", if I try to use it in "read.big.matrix" I get an error reporting it as an unused argument (as expected).

Problem-2:
I want to get a filebacked *sub*matrix of "x", say only selected columns: x[, 1:100]. Is there any way of doing that without actually loading the data into R memory?

Problem-3:
There are functions available like summary, colmean, colsd, ... for standard summary statistics. But is there any way to calculate other summaries, say the number of missing values or the skewness of each variable, without loading the whole data into R memory?

Regards
Utkarsh

--
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
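The column-at-a-time approach suggested in the reply for Problem 3 can be sketched in plain R. This is a minimal sketch, not part of bigmemory: the helper name col_summaries() is hypothetical, and the skewness shown is the simple moment estimator (third central moment over the biased variance raised to 1.5), chosen here as an assumption. It relies only on ncol(x) and on x[, j] returning an ordinary R vector. That holds for a regular matrix and is assumed, not verified here, to hold for a big.matrix too, so only one column sits in R memory at a time.

```r
# Hypothetical helper (not part of bigmemory): computes per-column summaries
# by pulling one column at a time into ordinary R memory.
col_summaries <- function(x) {
  p <- ncol(x)
  n_missing <- integer(p)
  skewness <- numeric(p)
  for (j in seq_len(p)) {
    v <- x[, j]                    # only this column is copied into R
    n_missing[j] <- sum(is.na(v))  # count of missing values
    v <- v[!is.na(v)]              # drop NAs before computing moments
    dev <- v - mean(v)
    # Moment estimator m3 / m2^1.5 (NaN for empty or constant columns)
    skewness[j] <- mean(dev^3) / mean(dev^2)^1.5
  }
  data.frame(column = seq_len(p), n_missing = n_missing, skewness = skewness)
}

# Usage with an ordinary matrix standing in for a big.matrix
m <- cbind(c(1, 2, 3, NA), c(1, 1, 1, 10))
col_summaries(m)
```

For this small matrix the first column has one missing value and zero skewness, while the second column (three 1s and one 10) is right-skewed. For the missing-value count alone, mwhich() as suggested in the reply is the alternative.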