For what it's worth, I've written a class, which I have creatively named SubsettableListOfArrays, that takes the "subset everything together" aspect of eSet and SummarizedExperiment and makes it as generic as possible. It's basically like the (non-ranged) SummarizedExperiment, except that, like the assays slot, every component can hold multiple elements, and you can also have 1-dimensional vectors associated with rows and columns. The implementation is the most straightforward you can imagine and is not at all optimized, but it works. The only contract required to store things in it is that they be subsettable in the appropriate way.

As an example use case, you might use it to store a SummarizedExperiment, then the DGEList that you create from it, then the fit object from glmFit as row data, then the result table as another row data object, and so on, so that an entire edgeR analysis lives in one object, and maybe DESeq2 and limma-voom analyses of the same data as well. I haven't actually felt the need to do that yet, so at the moment it's mostly a proof of concept. I'm not actually using it for anything.
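To make the "subset everything together" contract concrete, here is a minimal base-R sketch. It is not the code from SubsettableListOfArrays.R, which may be organized quite differently; the names make_container, subset_first_dim and subset_together, and the plain-list layout, are hypothetical and exist only for this illustration. It also assumes every stored object is either a plain vector or 2-dimensional.

```r
# Hypothetical container (illustration only): a plain list holding
#   arrays:  2-D objects that share the same rows and columns
#   rowdata: vectors or 2-D objects with one entry/row per row
#   coldata: vectors or 2-D objects with one entry/row per column
make_container <- function(arrays = list(), rowdata = list(), coldata = list()) {
  list(arrays = arrays, rowdata = rowdata, coldata = coldata)
}

# Subset along the first dimension. Plain vectors use x[i]; objects with
# dims (matrix, data.frame) use x[i, , drop = FALSE] so they keep their shape.
subset_first_dim <- function(x, i) {
  if (is.null(dim(x))) x[i] else x[i, , drop = FALSE]
}

# Apply one row index i and one column index j to every component at once.
subset_together <- function(obj, i, j) {
  list(
    arrays  = lapply(obj$arrays,  function(a) a[i, j, drop = FALSE]),
    rowdata = lapply(obj$rowdata, subset_first_dim, i = i),
    coldata = lapply(obj$coldata, subset_first_dim, i = j)
  )
}

# Toy data: 4 features x 3 samples.
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

obj <- make_container(
  arrays  = list(counts = counts, logcounts = log2(counts + 1)),
  rowdata = list(
    gene_length = c(g1 = 100, g2 = 200, g3 = 300, g4 = 400),
    results     = data.frame(logFC = c(0.5, -1, 2, 0),
                             row.names = rownames(counts))
  ),
  coldata = list(group = c(s1 = "a", s2 = "b", s3 = "a"))
)

# Keep features g1 and g3, samples s2 and s3, across every component.
sub <- subset_together(obj, i = c(1, 3), j = 2:3)
```

The only thing the sketch asks of a stored object is that `[` works on it with the right number of indices, which is the same kind of contract described above; anything stricter (validity checks, dimnames matching) is left out on purpose.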
If anyone's interested, you can get it here:

http://mneme.homenet.org/~ryan/SubsettableListOfArrays.R

-Ryan

On 9/18/15 7:41 PM, Michael Lawrence wrote:
> While it's useful (and often necessary) to store the big matrices out
> of core, it would be convenient to store the metadata (the other
> components of the object) along with the matrices. Something along the
> lines of HDF5, but we would want to keep things abstract. Other
> options include GDS (for genotypes), and of course most any database.
>
> On Fri, Sep 18, 2015 at 6:18 PM, Peter Haverty
> <haverty.pe...@gene.com> wrote:
>
>> While we are on the topic, my GenoSet class will become a subclass of
>> RangedSummarizedExperiment, rather than eSet, after this upcoming
>> release. For this release both APIs work (colnames and sampleNames,
>> etc.)
>>
>> I think the range-free SummarizedExperiment will be great. I've seen
>> a lot of ExpressionSets with random, non-exprs stuff in the exprs
>> slot for lack of something more appropriate.
>>
>> Pete
>>
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phave...@gene.com
>>
>> On Fri, Sep 18, 2015 at 6:09 PM, Ryan <r...@thompsonclan.org> wrote:
>>
>>> In the dev version, SummarizedExperiment has been split into
>>> RangedSummarizedExperiment (equivalent to the current
>>> SummarizedExperiment, with rowRanges) and SummarizedExperiment
>>> (kind of like eSet, no rowRanges). Given that eSet objects also
>>> support multiple assayData elements, I believe the new
>>> SummarizedExperiment is pretty close to being eSet with different
>>> method names. In fact, I wonder if eSet could/should be
>>> reimplemented as a subclass of the new SummarizedExperiment class.
>>>
>>> On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:
>>>
>>>> Interesting, thanks for the pointer.
>>>>
>>>> In light of the existing (and future) work on this, may I suggest
>>>> an eSet-like class, but built using the technologies in
>>>> SummarizedExperiment, i.e. a SummarizedExperiment without the
>>>> rowRanges. I would very much like this for modern work using
>>>> eSet-like containers. Not everything has ranges.
>>>>
>>>> Vince: I am not claiming that it is easy to work with; we have
>>>> pains as well. But am I missing something or is the assay matrix
>>>> only 2.3Gb?
>>>>
>>>> Best,
>>>> Kasper
>>>>
>>>> On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty
>>>> <haverty.pe...@gene.com> wrote:
>>>>
>>>>> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are
>>>>> good tricks for reducing the size of your eSets and
>>>>> SummarizedExperiments. Both object types can go into assayData
>>>>> or assays. In fact, that's what they were designed for.
>>>>>
>>>>> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data
>>>>> from Illumina SNP arrays. We typically have ~6 such rectangular
>>>>> objects in one eSet. With a mix of BigMatrix objects for point
>>>>> estimates and RleDataFrames for segmented data, readRDS times
>>>>> are quite reasonable.
>>>>>
>>>>> Pete
>>>>>
>>>>> ____________________
>>>>> Peter M. Haverty, Ph.D.
>>>>> Genentech, Inc.
>>>>> phave...@gene.com
>>>>>
>>>>> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr.
>>>>> <tim.tri...@gmail.com> wrote:
>>>>>
>>>>>> bigmemoryExtras (Peter Haverty's extensions to
>>>>>> bigMemory/bigMatrix) can be handy for this, as it works well as
>>>>>> a backend, especially if you go about splitting by chromosome
>>>>>> as for CNV segmentation, DMR finding, etc. It's not as seamless
>>>>>> as one might like, but it's the closest thing I've found.
>>>>>>
>>>>>> SciDb tries to implement a similar API, but for a distributed
>>>>>> version of this where the data itself is in a columnar database
>>>>>> and served on demand. I tried getting that up and running as a
>>>>>> SummarizedExperiment backend, but did not succeed. I have
>>>>>> previously shoveled all of the TCGA 450k data into one 7,000+
>>>>>> column bigMatrix which serializes to about 14GB on disk.
>>>>>>
>>>>>> If you have any replicates in your 700+ samples, it's a good
>>>>>> idea to keep their SNP calls in metadata(yourSE), although if
>>>>>> you change names it needs to propagate into the dependent
>>>>>> metadata. This is why I started monkeying around with
>>>>>> linkedExperiments where those mappings are enforced; it's
>>>>>> becoming more of an issue with the TARGET pediatric AML study,
>>>>>> where there are numerous diagnosis-remission-relapse trios
>>>>>> whose identity I wish to verify periodically. The SNPs on the
>>>>>> 450k array are great for this purpose, but minfi doesn't really
>>>>>> have a slot for them per se, so they live in metadata().
>>>>>>
>>>>>> --t
>>>>>>
>>>>>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey
>>>>>> <st...@channing.harvard.edu> wrote:
>>>>>>
>>>>>>> i am dealing with ~700 450k arrays
>>>>>>>
>>>>>>> they are derived from one study, so it makes sense to think
>>>>>>> of them holistically.
>>>>>>>
>>>>>>> both the load time and the memory consumption are not
>>>>>>> satisfactory.
>>>>>>>
>>>>>>> has anyone worked on an object type that implements the
>>>>>>> rangedSE API but has the assay data out of memory?
>>>>>>>
>>>>>>> > unix.time(load("wbmse.rda"))
>>>>>>>    user  system elapsed
>>>>>>>  30.131   2.396  61.036
>>>>>>> > object.size(wbmse)
>>>>>>> 124031032 bytes
>>>>>>> > dim(wbmse)
>>>>>>> [1] 485577    690
>>>>>>> > object.size(assays(wbmse))
>>>>>>> 2680430992 bytes
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel