thanks to all, lots of potential here.

On Fri, Sep 18, 2015 at 3:28 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good
> tricks for reducing the size of your eSets and SummarizedExperiments.
> Both object types can go into assayData or assays. In fact, that's what
> they were designed for.
>
> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
> Illumina SNP arrays. We typically have ~6 such rectangular objects in
> one eSet. With a mix of BigMatrix objects for point estimates and
> RleDataFrames for segmented data, readRDS times are quite reasonable.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
> wrote:
>
>> bigmemoryExtras (Peter Haverty's extensions to bigmemory/big.matrix)
>> can be handy for this, as it works well as a backend, especially if you
>> go about splitting by chromosome as for CNV segmentation, DMR finding,
>> etc. It's not as seamless as one might like, but it's the closest thing
>> I've found.
>>
>> SciDB tries to implement a similar API, but for a distributed version
>> of this where the data itself is in a columnar database and served on
>> demand. I tried getting that up and running as a SummarizedExperiment
>> backend, but did not succeed. I have previously shoveled all of the
>> TCGA 450k data into one 7,000+ column bigMatrix which serializes to
>> about 14GB on disk.
>>
>> If you have any replicates in your 700+ samples, it's a good idea to
>> keep their SNP calls in metadata(yourSE), although if you change names
>> it needs to propagate into the dependent metadata. This is why I
>> started monkeying around with linkedExperiments where those mappings
>> are enforced; it's becoming more of an issue with the TARGET pediatric
>> AML study, where there are numerous diagnosis-remission-relapse trios
>> whose identity I wish to verify periodically. The SNPs on the 450k
>> array are great for this purpose, but minfi doesn't really have a slot
>> for them per se, so they live in metadata().
>>
>> --t
>>
>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey
>> <st...@channing.harvard.edu> wrote:
>>
>> > i am dealing with ~700 450k arrays
>> >
>> > they are derived from one study, so it makes sense to think of
>> > them holistically.
>> >
>> > both the load time and the memory consumption are not satisfactory.
>> >
>> > has anyone worked on an object type that implements the rangedSE API
>> > but has the assay data out of memory?
>> >
>> > > unix.time(load("wbmse.rda"))
>> >    user  system elapsed
>> >  30.131   2.396  61.036
>> >
>> > > object.size(wbmse)
>> > 124031032 bytes
>> >
>> > > dim(wbmse)
>> > [1] 485577    690
>> >
>> > > object.size(assays(wbmse))
>> > 2680430992 bytes
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioc-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
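
A rough way to see why the two tricks discussed in the thread help, written as a dependency-free Python sketch rather than with the Bioconductor classes themselves. The first function reproduces the arithmetic behind the reported assay size: 485577 x 690 values at 8 bytes each is 2,680,385,040 bytes, within about 46 kB of the 2680430992 bytes that object.size(assays(wbmse)) reports, which is consistent with a single dense double-precision matrix held in memory. The second pair of functions illustrates run-length encoding, the general idea behind storing segmented data compactly. This is not genoset's RleDataFrame implementation; the function names and the example vector are hypothetical and chosen only for illustration.

```python
from itertools import groupby

# Dimensions reported by dim(wbmse) in the thread.
N_PROBES = 485_577
N_SAMPLES = 690
BYTES_PER_DOUBLE = 8  # R numeric values are 8-byte doubles


def dense_bytes(n_rows, n_cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Payload size of one dense numeric matrix, ignoring headers and dimnames."""
    return n_rows * n_cols * bytes_per_value


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full vector."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


if __name__ == "__main__":
    # Dense footprint of one probes-by-samples matrix of doubles.
    print(dense_bytes(N_PROBES, N_SAMPLES), "bytes")

    # Hypothetical segmented column: long runs of identical values,
    # as produced by a segmentation step (made-up numbers).
    segmented = [0.0] * 5000 + [0.58] * 1200 + [0.0] * 3800
    runs = rle_encode(segmented)
    print(len(segmented), "values stored as", len(runs), "runs")

    # The encoding is lossless.
    assert rle_decode(runs) == segmented
```

The compression only pays off when values repeat in long consecutive runs, which is why the thread pairs RleDataFrames with segmented data and uses BigMatrix for point estimates, where adjacent values rarely repeat.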