Sounds good. One note: if range information becomes optional, it would be nice to mark its availability in the class hierarchy. Otherwise, it is not easy to enforce through dispatch the contract that range-based methods can be called on a SummarizedExperiment. An alternative would be to drop the direct range-based accessors and operations from SummarizedExperiment, although that potentially puts more burden on the user.
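To make the dispatch point concrete, here is a minimal sketch of what I have in mind. It is only an illustration under assumptions: the class names RectExperiment and RangedExperiment and the rowWidths() generic are made up, a plain data.frame of start/end stands in for a GRanges so that it runs with nothing but the methods package, and the rowData()/rowRanges() generics are defined locally rather than taken from any existing package.

```r
library(methods)

# Hypothetical names throughout; this is not the SummarizedExperiment API.
# Base class: assays plus row annotation, with no range contract.
setClass("RectExperiment",
         slots = c(assays = "list", rowData = "data.frame"))

# Subclass: adds range information. A data.frame with start/end columns
# stands in for a GRanges so the sketch runs without Bioconductor.
setClass("RangedExperiment",
         contains = "RectExperiment",
         slots = c(rowRanges = "data.frame"))

# The annotation accessor is available on every object in the hierarchy.
setGeneric("rowData", function(x) standardGeneric("rowData"))
setMethod("rowData", "RectExperiment", function(x) x@rowData)

# Range-based accessors and operations exist only for the ranged subclass,
# so dispatch itself enforces the contract.
setGeneric("rowRanges", function(x) standardGeneric("rowRanges"))
setMethod("rowRanges", "RangedExperiment", function(x) x@rowRanges)

setGeneric("rowWidths", function(x) standardGeneric("rowWidths"))
setMethod("rowWidths", "RangedExperiment", function(x) {
  r <- rowRanges(x)
  r$end - r$start + 1L
})

ann <- data.frame(id = c("a", "b"))
plain <- new("RectExperiment", assays = list(), rowData = ann)
ranged <- new("RangedExperiment", assays = list(), rowData = ann,
              rowRanges = data.frame(start = c(1L, 10L), end = c(5L, 12L)))

# Works: the ranged object satisfies the range contract.
widths <- rowWidths(ranged)

# Fails at dispatch: no rowWidths() method exists for the plain class.
plain_result <- tryCatch(rowWidths(plain),
                         error = function(e) "no range contract")
```

With this layout, a function that needs ranges can declare RangedExperiment in its signature, while code that only touches rowData() accepts either class. The NULL-returning rowRanges() alternative would instead need a runtime check in every range-based method.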

On Mon, Dec 1, 2014 at 10:30 AM, Martin Morgan <mtmor...@fredhutch.org> wrote:

> On 11/26/2014 12:11 PM, Hervé Pagès wrote:
>
>> Hi guys,
>>
>> I like the idea of separating the row data from the row ranges.
>> This could be formalized with 2 distinct accessors: rowData() and
>> rowRanges(). The former would return a DataFrame, and the latter
>> NULL or a range-based object (GRanges or GRangesList).
>> I don't think there is a need for an emptyRanges class.
>
> For the original question, I think the ability to store genomic
> coordinates as well as other 'S4Vector' classes is very helpful for
> advanced users, even if a little intimidating for novice users.
>
> Also, it's clear that SummarizedExperiment in its current form doesn't
> satisfy the common use case of identifiers without range information.
>
> I think it makes sense to enable something like what Hervé outlines above,
> where the rowData() are separated into range information and annotation
> information, and I'll move forward with that implementation over the next
> week or so.
>
> Martin
>
>> H.
>>
>> On 11/26/2014 11:40 AM, Hector Corrada Bravo wrote:
>>
>>> One thing that’s become apparent working on epivizr is that it may be
>>> useful to think about ‘rowData’ in a SummarizedExperiment as having two
>>> distinct components: row coordinates and row metadata. In the current
>>> class, rowData is a ‘GenomicRanges’ which contains both coordinates (the
>>> ranges) and metadata (mcols(rowData)). In metagenomics (the other
>>> application my group works a lot with), we think of the taxonomy as
>>> providing coordinates. The distinction is worth thinking about, since
>>> there are certain operations we do on coordinates that we don’t do with
>>> metadata (and conversely).
>>>
>>> Thinking about it this way, the ‘ExpressionSet’ object would be data
>>> without coordinates.
>>> So, I would avoid making ‘GenomicRanges’ behave like ‘DataFrame’,
>>> since the distinction between coordinates and metadata would then be
>>> lost. The ‘emptyRanges’ proposal gets closer to this, since it
>>> corresponds to ‘no coordinates’, but it may be worth thinking in the long
>>> term about making the coordinate/metadata distinction more general.
>>>
>>> Hector
>>>
>>> On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
>>> wrote:
>>>
>>>> So as a simple experiment, I did the following:
>>>>
>>>> library(GenomicRanges)
>>>> bar <- matrix(rnorm(100), ncol=10)
>>>> colnames(bar) <- as.character(1:10)
>>>> rownames(bar) <- letters[1:10]
>>>> foo <- SummarizedExperiment(assays=list(bar=bar))
>>>> rowData(foo)
>>>> ## GRangesList object of length 10:
>>>> ## $a
>>>> ## GRanges object with 0 ranges and 0 metadata columns:
>>>> ## seqnames ranges strand
>>>> ## <Rle> <IRanges> <Rle>
>>>> ##
>>>> ## $b
>>>> ## GRanges object with 0 ranges and 0 metadata columns:
>>>> ## seqnames ranges strand
>>>> ##
>>>> ## $c
>>>> ## GRanges object with 0 ranges and 0 metadata columns:
>>>> ## seqnames ranges strand
>>>> ##
>>>> ## ...
>>>> ## <7 more elements>
>>>> colData(foo)
>>>> ## DataFrame with 10 rows and 0 columns
>>>>
>>>> This got me thinking: why not have an emptyRanges class, or else the
>>>> ability to index a bunch of NULL ranges without a lot of hoohah? The
>>>> defaults mostly do what they're supposed to; why not have a compact
>>>> representation of empty rowData as for empty colData (i.e., a DataFrame
>>>> with 0 columns)? Or is a GRangesList of empty GRanges as compact as it
>>>> is practicable to get for this purpose?
>>>>
>>>> Just pondering what the lowest-impact solution to the problem at hand
>>>> might be.
>>>>
>>>> Statistics is the grammar of science.
>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>
>>>> On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty <haverty.pe...@gene.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I believe there is a strong need for an object that organizes a
>>>>> collection of rectangular data (matrices, etc.) with metadata on the
>>>>> rows and columns. Can SummarizedExperiment inherit from something
>>>>> simpler that has a DataFrame as rowData? (I believe GenomicRanges
>>>>> should inherit from DataTable, rather than Vector, and subset as
>>>>> x[i,j], but maybe that's getting a bit off topic.) I often see people
>>>>> stuffing arbitrary data into an ExpressionSet and calling one of the
>>>>> assays "exprs" as a work-around.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Pete
>>>>>
>>>>> ____________________
>>>>> Peter M. Haverty, Ph.D.
>>>>> Genentech, Inc.
>>>>> phave...@gene.com
>>>>>
>>>>> On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto <lg...@cam.ac.uk>
>>>>> wrote:
>>>>>
>>>>>> On 26 November 2014 14:59, Wolfgang Huber wrote:
>>>>>>
>>>>>>> A colleague and I are designing a package for quantitative proteomics
>>>>>>> data, and we are debating whether to base it on the
>>>>>>> SummarizedExperiment or the ExpressionSet class.
>>>>>>>
>>>>>>> There is no immediate use for the ranges aspect of
>>>>>>> SummarizedExperiment, so that would have to be carried around with
>>>>>>> NAs, and this is a parsimony argument for using ExpressionSet
>>>>>>> instead. OTOH, the interface of SummarizedExperiment is cleaner, its
>>>>>>> code more modern and more likely to be updated, and users of the
>>>>>>> Bioconductor project are likely to benefit from having to deal with a
>>>>>>> single interface that works the same or similarly across packages,
>>>>>>> rather than a variety of formats; which argues that new packages
>>>>>>> should converge towards SummarizedExperiment('s interface).
>>>>>>>
>>>>>>> Are there any pertinent insights from this group?
>>>>>>
>>>>>> Instead of ExpressionSet, you could use MSnbase::MSnSet, which is
>>>>>> essentially an ExpressionSet for quantitative proteomics (i.e., it has
>>>>>> a MIAPE slot instead of MIAME, for example).
>>>>>>
>>>>>> Ideally, a SummarizedExperiment for proteomics would use
>>>>>> peptide/protein ranges, which is in the pipeline, as far as I am
>>>>>> concerned. When that becomes available, there should be infrastructure
>>>>>> to coerce an MSnSet (and/or other relevant data) into a
>>>>>> SummarizedExperiment.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Laurent
>>>>>>
>>>>>>> Thanks and best wishes
>>>>>>> Wolfgang
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>
>>>>>> --
>>>>>> Laurent Gatto
>>>>>> http://cpu.sysbiol.cam.ac.uk/
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793