Interesting, thanks for the pointer. In light of the existing (and future) work on this, may I suggest an eSet-like class, but built using the technologies in SummarizedExperiment, i.e. a SummarizedExperiment without the rowRanges? I would very much like this for modern work using eSet-like containers. Not everything has ranges.
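To make the suggestion concrete, here is a minimal sketch of what such a range-free container could look like. It assumes a SummarizedExperiment release whose constructor accepts assays without rowRanges; the toy matrix and the names beta, type and group are made up purely for illustration.

```r
# Sketch only: assumes the Bioconductor SummarizedExperiment package is installed
# and that its constructor can be called without rowRanges.
library(SummarizedExperiment)

# Toy matrix: 5 features x 3 samples (values are random, not real data).
beta <- matrix(runif(15), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("sample", 1:3)))

# Feature and sample annotation live in plain DataFrames; no genomic ranges.
se <- SummarizedExperiment(
    assays  = list(beta = beta),
    rowData = DataFrame(type = rep(c("I", "II"), length.out = 5),
                        row.names = rownames(beta)),
    colData = DataFrame(group = c("a", "b", "a"),
                        row.names = colnames(beta))
)

class(se)                             # container class, no ranges involved
is(se, "RangedSummarizedExperiment")  # is this the ranged subclass?
dim(se)                               # features x samples
```

The point of the sketch is that featureData-style annotation goes into rowData and pData-style annotation into colData, which is most of what an eSet offers.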
Vince: I am not claiming that it is easy to work with; we have pains as well. But am I missing something or is the assay matrix only 2.3Gb?

Best,
Kasper

On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.pe...@gene.com> wrote:

> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good tricks
> for reducing the size of your eSets and SummarizedExperiments. Both object
> types can go into assayData or assays. In fact, that's what they were
> designed for.
>
> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
> Illumina SNP arrays. We typically have ~6 such rectangular objects in one
> eSet. With a mix of BigMatrix object for point estimates and RleDataFrames
> for segmented data, readRDS times are quite reasonable.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
>
> > bigmemoryExtras (Peter Haverty's extensions to bigMemory/bigMatrix) can be
> > handy for this, as it works well as a backend, especially if you go about
> > splitting by chromosome as for CNV segmentation, DMR finding, etc. It's
> > not as seamless as one might like, but it's the closest thing I've found.
> >
> > SciDb tries to implement a similar API, but for a distributed version of
> > this where the data itself is in a columnar database and served on demand.
> > I tried getting that up and running as a SummarizedExperiment backend, but
> > did not succeed. I have previously shoveled all of the TCGA 450k data into
> > one 7,000+ column bigMatrix which serializes to about 14GB on disk.
> >
> > If you have any replicates in your 700+ samples, it's a good idea to keep
> > their SNP calls in metadata(yourSE), although if you change names it needs
> > to propagate into the dependent metadata. This is why I started monkeying
> > around with linkedExperiments where those mappings are enforced; it's
> > becoming more of an issue with the TARGET pediatric AML study, where there
> > are numerous diagnosis-remission-relapse trios whose identity I wish to
> > verify periodically. The SNPs on the 450k array are great for this
> > purpose, but minfi doesn't really have a slot for them per se, so live in
> > metadata().
> >
> > --t
> >
> > On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey <st...@channing.harvard.edu> wrote:
> >
> > > i am dealing with ~700 450k arrays
> > >
> > > they are derived from one study, so it makes sense to think of
> > > them holistically.
> > >
> > > both the load time and the memory consumption are not satisfactory.
> > >
> > > has anyone worked on an object type that implements the rangedSE API but
> > > has the assay data out of memory?
> > >
> > > > unix.time(load("wbmse.rda"))
> > >    user  system elapsed
> > >  30.131   2.396  61.036
> > >
> > > > object.size(wbmse)
> > > 124031032 bytes
> > >
> > > > dim(wbmse)
> > > [1] 485577    690
> > >
> > > > object.size(assays(wbmse))
> > > 2680430992 bytes
> > >
> > > _______________________________________________
> > > Bioc-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel