Clarification: the complexity of the full BioC class universe, not the SE/eSet part. GenomicRanges, GRanges, GRangesList, RangesView, RangesViewsList, ... I think all of that intimidates new people. Maybe that's not generally the case. Sorry, I've taken this thread way off topic. I'll stop now.
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Wed, Mar 4, 2015 at 10:08 AM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:

> What complexity? The Nature Methods paper laid it out: for most people,
> most of the time, use an SE.
>
> That way, the organization of metadata and covariates is enforced for you,
> like an ExpressionSet (another winning data structure) but without its
> baggage.
>
> Maybe the "Summarized" in the name isn't such a bad idea after all.
> "AfterTheDataMungingIsDone" doesn't have the same ring to it.
>
> What would be equally awesome IMHO is to have a similarly unifying
> structure for integrative work.
>
> But that's just, like, my opinion. I've taken a whack at it when I knew
> even less than I do now, and it's hard. However, data management for
> expression arrays was hard, too. If I'm not mistaken, there were benefits
> to solving that data management problem, too. Some sort of a software
> project. I think it was called "MADMAN". I'll have to go look. ;-)
>
> Statistics is the grammar of science.
> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>
> On Wed, Mar 4, 2015 at 10:03 AM, Peter Haverty <haverty.pe...@gene.com>
> wrote:
>
>> Michael has a good point. The complexity of the BioC universe of
>> classes hurts our ability to attract new users. More classes would be a
>> minus there ... but a small set of common, explicit APIs would simplify
>> things. Rectangular things implement the matrix Interface. :-)
>> Deprecating old stuff, like eSet, might help more than it hurts, on the
>> simplicity front.
>>
>> P.S. apropos of understanding this universe of classes, I *love* the
>> methods(class=x) thing Vincent mentioned.
>>
>> Pete
>>
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phave...@gene.com
>>
>> On Wed, Mar 4, 2015 at 9:38 AM, Michael Lawrence
>> <lawrence.mich...@gene.com> wrote:
>>
>>> I think we need to make sure that there are enough benefits of something
>>> like GRangesFrame before we introduce yet another complicated and
>>> overlapping data structure into the framework. Prior to summarization, the
>>> ranges seem primary, after summarization, it may often make sense for them
>>> to be secondary. But I'm just not sure what we gain from a new data
>>> structure.
>>>
>>> On Wed, Mar 4, 2015 at 12:28 AM, Hervé Pagès <hpa...@fredhutch.org>
>>> wrote:
>>>
>>>> GRangesFrame is an interesting idea and I gave it some thoughts.
>>>>
>>>> There is this nice symmetry between GRanges and GRangesFrame:
>>>>
>>>> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>>>>
>>>> - GRangesFrame = a DataFrame + a naked GRanges accessible via
>>>>   some accessor (e.g. rowRanges())
>>>>
>>>> So GRanges and GRangesFrame are equivalent in terms of what they
>>>> can hold, but different in terms of API: the former has the ranges
>>>> API as primary API and the DataFrame API on its mcols() component,
>>>> and the latter has the DataFrame API as primary API and the ranges
>>>> API on its rowRanges() component. Nice switch!
>>>>
>>>> What does this API switch bring us? A GRangesFrame object is now
>>>> an object that fully behaves like a DataFrame and people can also
>>>> perform range-based operations on its rowRanges() component.
>>>> Here is what I'm afraid is going to happen: people will also want
>>>> to be able to perform range-based operations *directly* on
>>>> these objects, i.e. without having to call rowRanges() first.
>>>> So for example when they do subsetByOverlaps(), subsetting
>>>> happens vertically. Also the Hits object returned by findOverlaps()
>>>> would contain row indices. Problem with this is that these objects
>>>> now start to suffer from the "dual personality syndrome".
>>>> For example, it's not clear anymore what their length should be.
>>>> Strictly speaking it should be their number of columns (that's
>>>> what the length of a DataFrame is), but the ranges API that
>>>> we're trying to put on them also makes them feel like vectors
>>>> along the vertical dimension so it also feels that their length
>>>> should be their number of rows. Same thing with 1D subsetting.
>>>> Why does it subset the columns and not the rows? Most people
>>>> are now confused.
>>>>
>>>> It's interesting to note that the same thing happens with GRanges
>>>> objects, but in the opposite direction: people wish they could
>>>> do DataFrame operations directly on them without calling mcols()
>>>> first. But in order to preserve the good health of GRanges objects,
>>>> we've not done that (except for $, a shortcut for mcols(x)$,
>>>> the pressure was just too strong).
>>>>
>>>> H.
>>>>
>>>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>>>>
>>>>> Should be possible for the annotations to be of any type, as long as
>>>>> they satisfy a simple contract of NROW() and 2D "[". Then, you could
>>>>> have a DataFrame, GRanges, or whatever in there. But it would be nice
>>>>> to have a special class for the container with range information. The
>>>>> contract for the range annotation would be to have a granges() method.
>>>>>
>>>>> I agree it would be nice if there was a way with the methods package to
>>>>> easily assert such contracts. For example, one could define an
>>>>> interface with a set of generics (and optionally the relevant position
>>>>> in the generic signature). Then, once all of the methods have been
>>>>> assigned for a particular class, it is made to inherit from that
>>>>> contract class. There are lots of gotchas though. Not sure how useful
>>>>> it would be in practice.
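
For what it's worth, the NROW() plus 2D "[" part of the contract Michael describes can be probed with plain base R. This is only a minimal sketch under my own assumptions: the helper name satisfies_row_contract is hypothetical (it is not part of the methods package or any Bioconductor package), it checks behaviour at run time rather than asserting anything at class-definition time, and it leaves out the granges() half of the contract because that needs GenomicRanges.

```r
# Hypothetical helper (not from any package): checks the informal contract
# that an annotation object supports NROW() and 2D "[" row subsetting.
satisfies_row_contract <- function(x) {
  n <- NROW(x)
  if (n == 0L) {
    return(TRUE)  # nothing to subset; treat as trivially conforming
  }
  # Try 2D row subsetting; objects without two dimensions raise an error,
  # which is converted to NULL here.
  first_row <- tryCatch(x[1L, , drop = FALSE], error = function(e) NULL)
  !is.null(first_row) && NROW(first_row) == 1L
}

df  <- data.frame(score = c(0.1, 0.5, 0.9), name = c("a", "b", "c"))
mat <- matrix(1:6, nrow = 3)
vec <- 1:3

satisfies_row_contract(df)   # a data.frame supports NROW() and 2D "["
satisfies_row_contract(mat)  # so does a matrix
satisfies_row_contract(vec)  # a plain vector has no 2D "["
```

A check like this is duck typing after the fact; it says nothing about the "inherit from a contract class once all methods exist" idea, which would have to live in the class system itself.
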
>>>>>
>>>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.pe...@gene.com>
>>>>> wrote:
>>>>>
>>>>>> There are some nice similarities in these new imaginary types. A
>>>>>> "GRangesFrame" is a list of dimensionally identical things (columns)
>>>>>> and some row meta-data (the GRanges). The SE-like object is similarly
>>>>>> a list of dimensionally like things (matrices, RleDataFrames,
>>>>>> BigMatrix objects, HDF5-backed things) with some row meta-data (a
>>>>>> DataFrame or GRangesFrame). Elegant? Maybe they would actually be
>>>>>> relatives in the class tree.
>>>>>>
>>>>>> I wonder if this kind of thing would be easier if we had Java-style
>>>>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
>>>>>> implements this set of methods ...
>>>>>>
>>>>>> Oh, and kinda apropos, the genoset class will probably go away or
>>>>>> become an extension to this new SE-like thing. The extra stuff that
>>>>>> comes along with genoset will still be available.
>>>>>>
>>>>>> Pete
>>>>>>
>>>>>> ____________________
>>>>>> Peter M. Haverty, Ph.D.
>>>>>> Genentech, Inc.
>>>>>> phave...@gene.com
>>>>>>
>>>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This.
>>>>>>>
>>>>>>> It would be damned near perfect as a return value for assays coming
>>>>>>> out of an object that held several such assays at several time points
>>>>>>> in a population, where there are both assay-wise and covariate-wise
>>>>>>> "holes" that could nonetheless be usefully imputed across assays.
>>>>>>>
>>>>>>> Statistics is the grammar of science.
>>>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>>>>
>>>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.pe...@gene.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>>>>> argument.
>>>>>>>>>
>>>>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>>>>> conflicting APIs.
>>>>>>>>
>>>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>>>>> (without mcols) as an index?
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel@r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpa...@fredhutch.org
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel