Michael has a good point. The complexity of the BioC universe of classes hurts our ability to attract new users. More classes would be a minus there ... but a small set of common, explicit APIs would simplify things. Rectangular things implement the matrix Interface. :-) Deprecating old stuff, like eSet, might help more than it hurts, on the simplicity front.
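A rough sketch of what "rectangular things implement the matrix Interface" could mean in practice, assuming the contract is nothing more than a two-element dim() and a working 2D "[". The helper name is_rectangular is hypothetical and not part of any Bioconductor package; it only uses base R.

```r
# Hypothetical helper: duck-type check for a minimal "rectangular" contract.
# Assumed contract: dim() has length 2 and 2D "[" returns a 2D object.
is_rectangular <- function(x) {
  d <- dim(x)
  if (is.null(d) || length(d) != 2L) return(FALSE)
  # Try a 2D subset of at most one row and confirm the result is still 2D.
  sub <- tryCatch(x[seq_len(min(1L, d[1L])), , drop = FALSE],
                  error = function(e) NULL)
  !is.null(sub) && length(dim(sub)) == 2L
}

is_rectangular(matrix(1:6, nrow = 2))                  # TRUE
is_rectangular(data.frame(a = 1:3, b = letters[1:3]))  # TRUE
is_rectangular(1:10)                                   # FALSE
```

A check like this only tests behaviour at run time; it does not give the formal, declared Interface that Java-style contracts would.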
P.S. apropos of understanding this universe of classes, I *love* the
methods(class=x) thing Vincent mentioned.

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Wed, Mar 4, 2015 at 9:38 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:

> I think we need to make sure that there are enough benefits of something
> like GRangesFrame before we introduce yet another complicated and
> overlapping data structure into the framework. Prior to summarization, the
> ranges seem primary, after summarization, it may often make sense for them
> to be secondary. But I'm just not sure what we gain from a new data
> structure.
>
> On Wed, Mar 4, 2015 at 12:28 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
>
>> GRangesFrame is an interesting idea and I gave it some thoughts.
>>
>> There is this nice symmetry between GRanges and GRangesFrame:
>>
>>   - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>>
>>   - GRangesFrame = a DataFrame + a naked GRanges accessible via
>>     some accessor (e.g. rowRanges())
>>
>> So GRanges and GRangesFrame are equivalent in terms of what they
>> can hold, but different in terms of API: the former has the ranges
>> API as primary API and the DataFrame API on its mcols() component,
>> and the latter has the DataFrame API as primary API and the ranges
>> API on its rowRanges() component. Nice switch!
>>
>> What does this API switch bring us? A GRangesFrame object is now
>> an object that fully behaves like a DataFrame and people can also
>> perform range-based operations on its rowRanges() component.
>> Here is what I'm afraid is going to happen: people will also want
>> to be able to perform range-based operations *directly* on
>> these objects, i.e. without having to call rowRanges() first.
>> So for example when they do subsetByOverlaps(), subsetting
>> happens vertically. Also the Hits object returned by findOverlaps()
>> would contain row indices. Problem with this is that these objects
>> now start to suffer from the "dual personality syndrome". For
>> example, it's not clear anymore what their length should be.
>> Strictly speaking it should be their number of columns (that's
>> what the length of a DataFrame is), but the ranges API that
>> we're trying to put on them also makes them feel like vectors
>> along the vertical dimension so it also feels that their length
>> should be their number of rows. Same thing with 1D subsetting.
>> Why does it subset the columns and not the rows? Most people
>> are now confused.
>>
>> It's interesting to note that the same thing happens with GRanges
>> objects, but in the opposite direction: people wish they could
>> do DataFrame operations directly on them without calling mcols()
>> first. But in order to preserve the good health of GRanges objects,
>> we've not done that (except for $, a shortcut for mcols(x)$,
>> the pressure was just too strong).
>>
>> H.
>>
>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>>
>>> Should be possible for the annotations to be of any type, as long as they
>>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
>>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
>>> special class for the container with range information. The contract for
>>> the range annotation would be to have a granges() method.
>>>
>>> I agree it would be nice if there was a way with the methods package to
>>> easily assert such contracts. For example, one could define an interface
>>> with a set of generics (and optionally the relevant position in the
>>> generic signature). Then, once all of the methods have been assigned for a
>>> particular class, it is made to inherit from that contract class. There
>>> are lots of gotchas though. Not sure how useful it would be in practice.
>>>
>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
>>>
>>>> There are some nice similarities in these new imaginary types. A
>>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
>>>> some row meta-data (the GRanges). The SE-like object is similarly a list
>>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
>>>> HDF5-backed things) with some row meta-data (a DataFrame or GRangesFrame).
>>>> Elegant? Maybe they would actually be relatives in the class tree.
>>>>
>>>> I wonder if this kind of thing would be easier if we had Java-style
>>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
>>>> implements this set of methods ...
>>>>
>>>> Oh, and kinda apropos, the genoset class will probably go away or become
>>>> an extension to this new SE-like thing. The extra stuff that comes along
>>>> with genoset will still be available.
>>>>
>>>> Pete
>>>>
>>>> ____________________
>>>> Peter M. Haverty, Ph.D.
>>>> Genentech, Inc.
>>>> phave...@gene.com
>>>>
>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
>>>>
>>>>> This.
>>>>>
>>>>> It would be damned near perfect as a return value for assays coming
>>>>> out of an object that held several such assays at several time points
>>>>> in a population, where there are both assay-wise and covariate-wise
>>>>> "holes" that could nonetheless be usefully imputed across assays.
>>>>>
>>>>> Statistics is the grammar of science.
>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>>
>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
>>>>>
>>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>>> argument.
>>>>>>>
>>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>>> conflicting APIs.
>>>>>>
>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>>> (without mcols) as an index?
>>>>>
>>>>
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpa...@fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>

        [[alternative HTML version deleted]]

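To make the "dual personality" concern in the quoted message concrete, here is a small illustration using a base R data.frame as a stand-in. That is an assumption for the sake of a self-contained example: the thread is about DataFrame, whose length the quoted message also describes as its number of columns. The column names are made up.

```r
# Base R data.frame used as a stand-in for a DataFrame-like object.
df <- data.frame(score = c(0.1, 0.5, 0.9), strand = c("+", "-", "+"))

length(df)     # 2: the length is the number of columns
nrow(df)       # 3: the "vertical" dimension a ranges API would care about
names(df[1])   # "score": 1D subsetting selects a column, not a row
```

An object that also answered range-based calls along its rows would have to pick one of these two meanings of length and of 1D "[", which is the ambiguity described above.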
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel