I am glad you are keeping this discussion alive, Kasper.

On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:
> It sounds like the proposed changes are already made. However (like
> others) I am still a bit mystified why this was necessary. The old version
> did allow for a GRanges inside the DataFrame of the rowData, as far as I
> recall. So I assume this is for efficiency. But why? What kind of
> data/use cases is this for?
>
> I am happy to hear that SummarizedExperiment is going to be spun out into
> its own package. When that happens, I have some comments, which I'll
> include here in anticipation:
> 1) I now very strongly believe it was a design mistake to not have
> colnames on the assays. The advantage of this choice is that sampleNames
> are only stored in one place. The extreme disadvantage is the high
> inefficiency when you want colnames on an extracted assay.
>

After example(SummarizedExperiment):

  > colnames(assays(se1)[[1]])
  [1] "A" "B" "C" "D" "E" "F"

So this seems to be optional. But attempts to set rownames will fail silently:

  > rownames(assays(se1)[[1]]) = as.character(1:200)
  > rownames(assays(se1)[[1]])
  NULL

It seems we could issue a warning there.

> 2) I still strongly believe we should support pData, sampleNames etc etc
> on SummarizedExperiments.
>

Worthy of discussion.

> 3) Having developed a package (minfi) where eSets co-exist with
> SummarizedExperiment, I have to mention that for the developer there are a
> number of places where the different internals of these two classes make
> life irritating. For this reason I would support a "modern" implementation
> of eSet, in parallel with SummarizedExperiment.
>

Also worthy of further discussion, IMHO.

> Best,
> Kasper
>
> On Fri, Mar 6, 2015 at 10:59 AM, Valerie Obenchain <voben...@fredhutch.org> wrote:
>
> > Hi Mike,
> >
> > Our error - we didn't bump GenomicRanges when rowRanges was added.
> > Hopefully 1.19.43 will propagate today and things will be sorted out.
> >
> > Val
> >
> >
> > On 03/06/2015 07:40 AM, Michael Love wrote:
> >
> >> hi all,
> >>
> >> just a practical issue: I have GenomicRanges version 1.19.42 on my
> >> computer which does not have rowRanges defined, although the 1.19.42
> >> version on the Bioc website does have rowRanges in the man page:
> >>
> >> http://master.bioconductor.org/packages/3.1/bioc/html/GenomicRanges.html
> >>
> >> So I pass check locally but not in the devel branch on Bioc servers.
> >>
> >> > library(GenomicRanges)
> >> > rowRanges
> >> Error: object 'rowRanges' not found
> >>
> >> > sessionInfo()
> >> R Under development (unstable) (2014-12-08 r67137)
> >> Platform: x86_64-apple-darwin12.5.0 (64-bit)
> >>
> >> locale:
> >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>
> >> attached base packages:
> >> [1] stats4 parallel stats graphics grDevices datasets utils methods base
> >>
> >> other attached packages:
> >> [1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13 IRanges_2.1.41 S4Vectors_0.5.21
> >> [5] BiocGenerics_0.13.6 RUnit_0.4.28 devtools_1.7.0 knitr_1.9
> >> [9] BiocInstaller_1.17.5
> >>
> >>
> >> On Wed, Mar 4, 2015 at 3:03 PM, Martin Morgan <mtmor...@fredhutch.org> wrote:
> >>
> >>> On 03/04/2015 10:03 AM, Peter Haverty wrote:
> >>>
> >>>> Michael has a good point. The complexity of the BioC universe of classes
> >>>> hurts our ability to attract new users. More classes would be a minus
> >>>> there ... but a small set of common, explicit APIs would simplify things.
> >>>> Rectangular things implement the matrix Interface. :-) Deprecating old
> >>>> stuff, like eSet, might help more than it hurts, on the simplicity
> >>>> front.
> >>>>
> >>>> P.S. apropos of understanding this universe of classes, I *love* the
> >>>> methods(class=x) thing Vincent mentioned.
> >>>>
> >>>
> >>> The current version, under R-devel, is at
> >>>
> >>> devtools::source_gist("https://gist.github.com/mtmorgan/9f98871adb9f0c1891a4")
> >>>
> >>> > methods(class="SummarizedExperiment")
> >>>  [1] [                 [[                [[<-              [<-
> >>>  [5] $                 $<-               assay             assay<-
> >>>  [9] assayNames        assayNames<-      assays            assays<-
> >>> [13] cbind             coerce            colData           colData<-
> >>> [17] compare           Compare           countOverlaps     coverage
> >>> [21] dim               dimnames          dimnames<-        disjointBins
> >>> [25] distance          distanceToNearest duplicated        elementMetadata
> >>> [29] elementMetadata<- end               end<-             exptData
> >>> [33] exptData<-        extractROWS       findOverlaps      flank
> >>> [37] follow            granges           isDisjoint        mcols
> >>> [41] mcols<-           narrow            nearest           order
> >>> [45] overlapsAny       precede           ranges            ranges<-
> >>> [49] rank              rbind             replaceROWS       resize
> >>> [53] restrict          rowData           rowData<-         seqinfo
> >>> [57] seqinfo<-         seqnames          shift             show
> >>> [61] sort              split             start             start<-
> >>> [65] strand            strand<-          subset            subsetByOverlaps
> >>> [69] updateObject      values            values<-          width
> >>> [73] width<-
> >>>
> >>> see ?"methods" for accessing help and source code
> >>>
> >>> and
> >>>
> >>> head(attr(methods(class="SummarizedExperiment"), "info"))
> >>>                                                              generic visible
> >>> [,SummarizedExperiment,ANY-method                                  [    TRUE
> >>> [[,SummarizedExperiment,ANY,missing-method                        [[    TRUE
> >>> [[<-,SummarizedExperiment,ANY,missing-method                    [[<-    TRUE
> >>> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method     [<-    TRUE
> >>> $,SummarizedExperiment-method                                      $    TRUE
> >>> $<-,SummarizedExperiment-method                                  $<-    TRUE
> >>>                                                              isS4          from
> >>> [,SummarizedExperiment,ANY-method                            TRUE GenomicRanges
> >>> [[,SummarizedExperiment,ANY,missing-method                   TRUE GenomicRanges
> >>> [[<-,SummarizedExperiment,ANY,missing-method                 TRUE GenomicRanges
> >>> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method TRUE GenomicRanges
> >>> $,SummarizedExperiment-method                                TRUE GenomicRanges
> >>> $<-,SummarizedExperiment-method                              TRUE GenomicRanges
> >>>
> >>> Martin
> >>>
> >>>> Pete
> >>>>
> >>>> ____________________
> >>>> Peter M. Haverty, Ph.D.
> >>>> Genentech, Inc.
> >>>> phave...@gene.com
> >>>>
> >>>> On Wed, Mar 4, 2015 at 9:38 AM, Michael Lawrence <lawrence.mich...@gene.com> wrote:
> >>>>
> >>>>> I think we need to make sure that there are enough benefits of something
> >>>>> like GRangesFrame before we introduce yet another complicated and
> >>>>> overlapping data structure into the framework. Prior to summarization, the
> >>>>> ranges seem primary; after summarization, it may often make sense for them
> >>>>> to be secondary. But I'm just not sure what we gain from a new data
> >>>>> structure.
> >>>>>
> >>>>> On Wed, Mar 4, 2015 at 12:28 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> >>>>>
> >>>>>> GRangesFrame is an interesting idea and I gave it some thought.
> >>>>>>
> >>>>>> There is this nice symmetry between GRanges and GRangesFrame:
> >>>>>>
> >>>>>> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
> >>>>>>
> >>>>>> - GRangesFrame = a DataFrame + a naked GRanges accessible via
> >>>>>>   some accessor (e.g. rowRanges())
> >>>>>>
> >>>>>> So GRanges and GRangesFrame are equivalent in terms of what they
> >>>>>> can hold, but different in terms of API: the former has the ranges
> >>>>>> API as primary API and the DataFrame API on its mcols() component,
> >>>>>> and the latter has the DataFrame API as primary API and the ranges
> >>>>>> API on its rowRanges() component. Nice switch!
> >>>>>>
> >>>>>> What does this API switch bring us? A GRangesFrame object is now
> >>>>>> an object that fully behaves like a DataFrame and people can also
> >>>>>> perform range-based operations on its rowRanges() component.
> >>>>>>
> >>>>>> Here is what I'm afraid is going to happen: people will also want
> >>>>>> to be able to perform range-based operations *directly* on
> >>>>>> these objects, i.e. without having to call rowRanges() first.
> >>>>>> So for example when they do subsetByOverlaps(), subsetting
> >>>>>> happens vertically. Also the Hits object returned by findOverlaps()
> >>>>>> would contain row indices. Problem with this is that these objects
> >>>>>> now start to suffer from the "dual personality syndrome". For
> >>>>>> example, it's not clear anymore what their length should be.
> >>>>>> Strictly speaking it should be their number of columns (that's
> >>>>>> what the length of a DataFrame is), but the ranges API that
> >>>>>> we're trying to put on them also makes them feel like vectors
> >>>>>> along the vertical dimension so it also feels that their length
> >>>>>> should be their number of rows. Same thing with 1D subsetting.
> >>>>>> Why does it subset the columns and not the rows? Most people
> >>>>>> are now confused.
> >>>>>>
> >>>>>> It's interesting to note that the same thing happens with GRanges
> >>>>>> objects, but in the opposite direction: people wish they could
> >>>>>> do DataFrame operations directly on them without calling mcols()
> >>>>>> first. But in order to preserve the good health of GRanges objects,
> >>>>>> we've not done that (except for $, a shortcut for mcols(x)$,
> >>>>>> the pressure was just too strong).
> >>>>>>
> >>>>>> H.
> >>>>>>
> >>>>>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
> >>>>>>
> >>>>>>> Should be possible for the annotations to be of any type, as long as
> >>>>>>> they satisfy a simple contract of NROW() and 2D "[". Then, you could
> >>>>>>> have a DataFrame, GRanges, or whatever in there. But it would be nice
> >>>>>>> to have a special class for the container with range information.
> >>>>>>> The contract for the range annotation would be to have a granges()
> >>>>>>> method.
> >>>>>>>
> >>>>>>> I agree it would be nice if there was a way with the methods package to
> >>>>>>> easily assert such contracts. For example, one could define an interface
> >>>>>>> with a set of generics (and optionally the relevant position in the
> >>>>>>> generic signature). Then, once all of the methods have been assigned for
> >>>>>>> a particular class, it is made to inherit from that contract class.
> >>>>>>> There are lots of gotchas though. Not sure how useful it would be in
> >>>>>>> practice.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
> >>>>>>>
> >>>>>>>> There are some nice similarities in these new imaginary types. A
> >>>>>>>> "GRangesFrame" is a list of dimensionally identical things (columns)
> >>>>>>>> and some row meta-data (the GRanges). The SE-like object is similarly
> >>>>>>>> a list of dimensionally like things (matrices, RleDataFrames,
> >>>>>>>> BigMatrix objects, HDF5-backed things) with some row meta-data (a
> >>>>>>>> DataFrame or GRangesFrame). Elegant? Maybe they would actually be
> >>>>>>>> relatives in the class tree.
> >>>>>>>>
> >>>>>>>> I wonder if this kind of thing would be easier if we had Java-style
> >>>>>>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
> >>>>>>>> implements this set of methods ...
> >>>>>>>>
> >>>>>>>> Oh, and kinda apropos, the genoset class will probably go away or
> >>>>>>>> become an extension to this new SE-like thing. The extra stuff that
> >>>>>>>> comes along with genoset will still be available.
> >>>>>>>>
> >>>>>>>> Pete
> >>>>>>>>
> >>>>>>>> ____________________
> >>>>>>>> Peter M. Haverty, Ph.D.
> >>>>>>>> Genentech, Inc.
> >>>>>>>> phave...@gene.com
> >>>>>>>>
> >>>>>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.tri...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> This.
> >>>>>>>>>
> >>>>>>>>> It would be damned near perfect as a return value for assays coming
> >>>>>>>>> out of an object that held several such assays at several time
> >>>>>>>>> points in a population, where there are both assay-wise and
> >>>>>>>>> covariate-wise "holes" that could nonetheless be usefully imputed
> >>>>>>>>> across assays.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Statistics is the grammar of science.
> >>>>>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
> >>>>>>>>>
> >>>>>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.pe...@gene.com> wrote:
> >>>>>>>>>
> >>>>>>>>>>>> I still think GRanges should be a subclass of DataFrame,
> >>>>>>>>>>>> which would make this easy, but I don't seem to be winning that
> >>>>>>>>>>>> argument.
> >>>>>>>>>>>
> >>>>>>>>>>> Just impossible. As Michael mentioned back in November, they have
> >>>>>>>>>>> conflicting APIs.
> >>>>>>>>>>
> >>>>>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
> >>>>>>>>>> (without mcols) as an index?
> >>>>>>>>>>
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Bioc-devel@r-project.org mailing list
> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>>
> >>>>>> --
> >>>>>> Hervé Pagès
> >>>>>>
> >>>>>> Program in Computational Biology
> >>>>>> Division of Public Health Sciences
> >>>>>> Fred Hutchinson Cancer Research Center
> >>>>>> 1100 Fairview Ave. N, M1-B514
> >>>>>> P.O. Box 19024
> >>>>>> Seattle, WA 98109-1024
> >>>>>>
> >>>>>> E-mail: hpa...@fredhutch.org
> >>>>>> Phone: (206) 667-5791
> >>>>>> Fax: (206) 667-1319
> >>>
> >>> --
> >>> Computational Biology / Fred Hutchinson Cancer Research Center
> >>> 1100 Fairview Ave. N.
> >>> PO Box 19024 Seattle, WA 98109
> >>>
> >>> Location: Arnold Building M1 B861
> >>> Phone: (206) 667-2793
> >
> > --
> > Computational Biology / Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, Seattle, WA 98109
> >
> > Email: voben...@fredhutch.org
> > Phone: (206) 667-3158
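Michael Lawrence's suggestion that row annotations need only satisfy "a simple contract of NROW() and 2D "["" can be made concrete with a small duck-typed check. This is an illustrative sketch under stated assumptions. It uses base R only. `satisfiesRowContract()` is a hypothetical helper, not part of GenomicRanges, SummarizedExperiment or S4Vectors. A base data.frame and a matrix stand in for the DataFrame/GRanges annotations discussed in the thread.

```r
## Hypothetical helper, a minimal sketch only: not an API of GenomicRanges,
## SummarizedExperiment or S4Vectors. It probes at run time whether `x`
## honours the duck-typed row-annotation contract discussed in the thread:
## NROW() reports `n` rows, and 2D "[" can subset those rows.
satisfiesRowContract <- function(x, n) {
  # NROW() uses dim(x)[1] when x has dimensions, else falls back to length(x)
  if (NROW(x) != n) return(FALSE)
  # 2D "[" must work and must return exactly the two requested rows;
  # objects without a 2D "[" (e.g. plain vectors) raise an error here
  sub <- tryCatch(x[c(1L, n), , drop = FALSE], error = function(e) NULL)
  !is.null(sub) && NROW(sub) == 2L
}

# base R stand-ins (an assumption) for the annotation types in the thread
rowAnnot <- data.frame(gene = c("a", "b", "c"), score = c(1.5, 2, 0.3))
assayMat <- matrix(1:6, nrow = 3)

satisfiesRowContract(rowAnnot, 3)  # TRUE: data.frame supports NROW() and 2D "["
satisfiesRowContract(assayMat, 3)  # TRUE: so does a matrix
satisfiesRowContract(1:3, 3)       # FALSE: a plain vector has no 2D "["
```

A formal version of the methods-package "interface" idea from the thread would instead check for registered methods on a set of generics, for example with methods::hasMethod(). This sketch only tests run-time behaviour.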