While we are on the topic, my GenoSet class will become a subclass of RangedSummarizedExperiment, rather than eSet, after this upcoming release. For this release, both APIs work (colnames and sampleNames, etc.).
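A minimal sketch of what that dual API looks like during the transition, assuming an existing GenoSet object gs (the object name is illustrative):

library(Biobase)   # for the sampleNames() generic
library(genoset)

## For this release, the eSet-style and the SummarizedExperiment-style
## accessors both work and should agree on the sample identifiers.
identical(sampleNames(gs), colnames(gs))   # expected TRUE for this release
head(colnames(gs))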
I think the range-free SummarizedExperiment will be great. I've seen a lot
of ExpressionSets with random, non-exprs stuff in the exprs slot for lack
of something more appropriate.

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Sep 18, 2015 at 6:09 PM, Ryan <r...@thompsonclan.org> wrote:

> In the dev version, SummarizedExperiment has been split into
> RangedSummarizedExperiment (equivalent to the current
> SummarizedExperiment, with rowRanges) and SummarizedExperiment (kind of
> like eSet, no rowRanges). Given that eSet objects also support multiple
> assayData elements, I believe the new SummarizedExperiment is pretty
> close to being eSet with different method names. In fact, I wonder if
> eSet could/should be reimplemented as a subclass of the new
> SummarizedExperiment class.
>
> On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:
>
>> Interesting, thanks for the pointer.
>>
>> In light of the existing (and future) work on this, may I suggest an
>> eSet-like class, but built using the technologies in
>> SummarizedExperiment, i.e. a SummarizedExperiment without the rowRanges.
>> I would very much like this for modern work using eSet-like containers.
>> Not everything has ranges.
>>
>> Vince: I am not claiming that it is easy to work with; we have pains as
>> well. But am I missing something, or is the assay matrix only 2.3 Gb?
>>
>> Best,
>> Kasper
>>
>> On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.pe...@gene.com>
>> wrote:
>>
>>> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good
>>> tricks for reducing the size of your eSets and SummarizedExperiments.
>>> Both object types can go into assayData or assays. In fact, that's
>>> what they were designed for.
>>>
>>> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
>>> Illumina SNP arrays. We typically have ~6 such rectangular objects in
>>> one eSet. With a mix of BigMatrix objects for point estimates and
>>> RleDataFrames for segmented data, readRDS times are quite reasonable.
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phave...@gene.com
>>>
>>> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
>>> wrote:
>>>
>>>> bigmemoryExtras (Peter Haverty's extensions to bigmemory/big.matrix)
>>>> can be handy for this, as it works well as a backend, especially if
>>>> you go about splitting by chromosome as for CNV segmentation, DMR
>>>> finding, etc. It's not as seamless as one might like, but it's the
>>>> closest thing I've found.
>>>>
>>>> SciDB tries to implement a similar API, but for a distributed version
>>>> of this where the data itself is in a columnar database and served on
>>>> demand. I tried getting that up and running as a SummarizedExperiment
>>>> backend, but did not succeed. I have previously shoveled all of the
>>>> TCGA 450k data into one 7,000+ column bigMatrix which serializes to
>>>> about 14 GB on disk.
>>>>
>>>> If you have any replicates in your 700+ samples, it's a good idea to
>>>> keep their SNP calls in metadata(yourSE), although if you change
>>>> names it needs to propagate into the dependent metadata.
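A minimal sketch of that metadata() stash, assuming a SummarizedExperiment yourSE and a genotype-call matrix snp_calls (both names are illustrative; nothing about this slot is enforced by the class):

library(SummarizedExperiment)

## snp_calls: genotype calls for the rs# SNP probes on the 450k array,
## one column per sample, assumed to be in the same order as colnames(yourSE).
metadata(yourSE)$snp_calls <- snp_calls

## metadata() is a free-form list, so renaming or reordering the columns of
## yourSE does not touch this matrix; the mapping has to be kept in sync
## by hand.
stopifnot(identical(colnames(metadata(yourSE)$snp_calls), colnames(yourSE)))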
>>>> This is why I started monkeying around with linkedExperiments, where
>>>> those mappings are enforced; it's becoming more of an issue with the
>>>> TARGET pediatric AML study, where there are numerous
>>>> diagnosis-remission-relapse trios whose identity I wish to verify
>>>> periodically. The SNPs on the 450k array are great for this purpose,
>>>> but minfi doesn't really have a slot for them per se, so they live in
>>>> metadata().
>>>>
>>>> --t
>>>>
>>>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey
>>>> <st...@channing.harvard.edu> wrote:
>>>>
>>>>> I am dealing with ~700 450k arrays.
>>>>>
>>>>> They are derived from one study, so it makes sense to think of them
>>>>> holistically.
>>>>>
>>>>> Both the load time and the memory consumption are not satisfactory.
>>>>>
>>>>> Has anyone worked on an object type that implements the rangedSE API
>>>>> but has the assay data out of memory?
>>>>>
>>>>> > unix.time(load("wbmse.rda"))
>>>>>    user  system elapsed
>>>>>  30.131   2.396  61.036
>>>>> > object.size(wbmse)
>>>>> 124031032 bytes
>>>>> > dim(wbmse)
>>>>> [1] 485577    690
>>>>> > object.size(assays(wbmse))
>>>>> 2680430992 bytes
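A rough sketch of the file-backed route suggested above, applied to Vince's situation. wbmse stands for the RangedSummarizedExperiment of 450k values, and the BigMatrix() arguments are an assumption about the bigmemoryExtras API rather than a verified call:

library(SummarizedExperiment)
library(bigmemoryExtras)

## Write the in-memory assay (485577 x 690 doubles) once into a
## file-backed BigMatrix; afterwards only the slices that are actually
## indexed get paged into RAM.
beta <- assays(wbmse)[[1]]
bm <- BigMatrix(x = beta, backingfile = "wbmse_beta",
                nrow = nrow(beta), ncol = ncol(beta),
                dimnames = dimnames(beta))

## Swap the file-backed matrix in as the assay. The rangedSE API
## (assays(), rowRanges(), subsetting) is unchanged, but the object itself
## and its saveRDS()/readRDS() round trip become small.
assays(wbmse)[[1]] <- bm
saveRDS(wbmse, "wbmse_bigmatrix.rds")

Whether element-wise access through the swapped-in assay is fast enough for whole-genome work would need to be checked against the current bigmemoryExtras implementation.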