On Tue, Mar 25, 2014 at 9:31 AM, Peter Haverty <haverty.pe...@gene.com> wrote:

> One benefit of having dimnames on assays would be that one could use
> DataFrames as assays, like in eSet. My genoset class is becoming more and
> more like SummarizedExperiment. The dimname issues prevent me from
> switching entirely from eSet to SummarizedExperiment.
>
> I think that keeping only one copy of dimnames is a great feature, if a bit
> dangerous. My typical object has ~6 BigMatrix and/or DataFrame of Rle
> objects as assays, so the rownames actually make up a considerable portion
> of the object size. (My typical dataset is 2.5M rows by 1k samples). I've
> been moving towards keeping a single dimnames copy just to improve RData
> load times.
>
> I think that assays should be required to have dimnames when they are added
> to a SummarizedExperiment. These dimnames should be checked for equality
> with the dimnames of the SE in the setter function.
>
> Perhaps with the recent (R 3.1) improvements in shallow/lazy copying and
> reference counting, adding dimnames to outgoing assays will be less of a
> performance hit.
>

Yes, with the 3.1 improvements, there should in theory be only one copy of
the rownames, i.e., rownames will not be duplicated when assigned to an
object. Due to the way R lays out its objects, any object will be shallow
copied when an attribute is assigned. Thus, assigning dimnames to a matrix
will duplicate the matrix data, which is obviously costly. But, assigning
dimnames to a data.frame, DataFrame or just a dummy S4 object that wraps a
matrix, will be a cheap copy, i.e., there would only be a copy of a short
list. It might be worth the time to implement a SimpleMatrix class that just
delegates all of the matrix operations to a "matrix" slot (also would need
slots for the dims and dimnames). The reasoning is that a matrix is
conceptually more appropriate than a data frame for storing assay data.

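To make the SimpleMatrix idea a bit more concrete, here is a rough sketch of
what such a wrapper might look like. Nothing here is an existing API: the
class is hypothetical, and the slot names (`data`, `dn`) and the small set of
methods are assumptions chosen for illustration. The dimnames slot is
deliberately not called "dimnames", since S4 slots are stored as attributes
and that name is special for non-array objects. A real implementation would
also need to delegate subsetting, arithmetic, and so on.

```r
library(methods)

# Hypothetical sketch only: slot names and methods are assumptions.
setClass("SimpleMatrix",
         slots = c(data = "matrix",  # the wrapped assay matrix, kept free of dimnames
                   dn   = "list"))   # dimnames stored on the wrapper instead

# dims are delegated to the wrapped matrix
setMethod("dim", "SimpleMatrix", function(x) dim(x@data))

# dimnames are read from the wrapper, not from the matrix
setMethod("dimnames", "SimpleMatrix", function(x) x@dn)

# assigning dimnames replaces a short list on the wrapper;
# the matrix in the 'data' slot is never modified
setReplaceMethod("dimnames", "SimpleMatrix", function(x, value) {
    x@dn <- value
    x
})

m  <- matrix(rnorm(6), nrow = 2)
sm <- new("SimpleMatrix", data = m)
dimnames(sm) <- list(c("r1", "r2"), c("s1", "s2", "s3"))
```

After the assignment, `sm@data` still carries no dimnames of its own, so the
names exist only once, on the wrapper.
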
> I also like the compromise I have seen elsewhere, where the colnames are
> always retained on assays, but only one rownames copy is kept. Colnames
> are typically small and getting them wrong often makes for silent, but
> catastrophic errors.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel