On Tue, Mar 25, 2014 at 9:31 AM, Peter Haverty <haverty.pe...@gene.com> wrote:

> One benefit of having dimnames on assays would be that one could use
> DataFrames as assays, like in eSet.  My genoset class is becoming more and
> more like SummarizedExperiment. The dimname issues prevent me from
> switching entirely from eSet to SummarizedExperiment.
>
> I think that keeping only one copy of dimnames is a great feature, if a bit
> dangerous.  My typical object has ~6 BigMatrix and/or DataFrame of Rle
> objects as assays, so the rownames actually make up a considerable portion
> of the object size.  (My typical dataset is 2.5M rows by 1k samples). I've
> been moving towards keeping a single dimnames copy just to improve RData
> load times.
>
> I think that assays should be required to have dimnames when they are added
> to a SummarizedExperiment. These dimnames should be checked for equality
> with the dimnames of the SE in the setter function.
>
> Perhaps with the recent (R 3.1) improvements in shallow/lazy copying and
> reference counting, adding dimnames to outgoing assays will be less of a
> performance hit.
>
>
Yes, with the R 3.1 improvements there should in theory be only one copy
of the rownames, i.e., the rownames will not be duplicated when they are
assigned to an object.

Due to the way R lays out its objects, a shared object (for example, an
assay that is still referenced by the SummarizedExperiment) is shallow
copied when an attribute is assigned. For an atomic vector such as a
matrix, the values are stored in the object itself, so even a shallow copy
duplicates all of the matrix data, which is obviously costly. But assigning
dimnames to a data.frame, a DataFrame, or just a dummy S4 object that wraps
a matrix is a cheap copy, i.e., only a short list (the columns or slots,
held by reference) is copied.
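One way to see this directly is tracemem(). A minimal sketch, assuming
R >= 3.1 built with memory profiling (capabilities("profmem") is TRUE;
otherwise tracemem() errors, so those calls are guarded). The object sizes
and the made-up "s1"/"r1" names are for illustration only, and the exact
tracemem messages vary across R versions; what matters is whether a copy
is reported at all.

```r
## Sketch: which attribute assignments duplicate the numeric data?
## Requires an R build with memory profiling; tracemem() errors otherwise.
m  <- matrix(runif(1e6), ncol = 10)   # 1e5 x 10 doubles, about 8 MB
df <- as.data.frame(m)

if (capabilities("profmem")) {
  tracemem(m)         # report any duplication of the matrix data
  tracemem(df[[1]])   # report any duplication of the first column vector
}

## Shared matrix: dimnames<- must copy, and the copy includes all 1e6 doubles.
m2 <- m
msg_m <- capture.output(dimnames(m2) <- list(NULL, paste0("s", 1:10)))

## Shared data.frame: only the list of column pointers is copied.
df2 <- df
msg_df <- capture.output(rownames(df2) <- paste0("r", seq_len(nrow(df2))))

length(msg_m)   # at least one tracemem line when profiling is available
length(msg_df)  # no tracemem lines: the column vectors were not copied
```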

It might be worth the time to implement a SimpleMatrix class that just
delegates all of the matrix operations to a "matrix" slot (it would also
need slots for the dims and dimnames). The reasoning is that a matrix is
conceptually more appropriate than a data frame for storing assay data.
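A rough sketch of what such a class could look like, using only the
methods package. Everything here is hypothetical: the slot names "matrix"
and "Dimnames", the constructor and the helper .idx() are illustrative, the
dims are read from the stored matrix instead of a separate slot, and only
dim, dimnames, dimnames<-, [ and as.matrix are delegated, not the full set
of matrix operations.

```r
library(methods)

## Hypothetical SimpleMatrix: the data sits in a plain "matrix" slot with no
## dimnames attached; the names live once, in a separate Dimnames slot.
setClass("SimpleMatrix",
         slots = c(matrix = "matrix", Dimnames = "list"),
         prototype = prototype(Dimnames = list(NULL, NULL)))

## Each non-NULL dimnames component must match the matrix extent.
setValidity("SimpleMatrix", function(object) {
  dn <- object@Dimnames
  d <- dim(object@matrix)
  if (length(dn) != 2L) return("Dimnames must be a list of length 2")
  for (k in 1:2) {
    if (!is.null(dn[[k]]) && length(dn[[k]]) != d[k])
      return(sprintf("Dimnames[[%d]] has length %d, expected %d",
                     k, length(dn[[k]]), d[k]))
  }
  TRUE
})

## Constructor: moves any dimnames off the incoming matrix into the slot.
SimpleMatrix <- function(m, dn = dimnames(m)) {
  if (is.null(dn)) dn <- list(NULL, NULL)   # forces dn before m is changed
  if (!is.null(dimnames(m))) dimnames(m) <- NULL
  new("SimpleMatrix", matrix = m, Dimnames = dn)
}

setMethod("dim", "SimpleMatrix", function(x) dim(x@matrix))
setMethod("dimnames", "SimpleMatrix", function(x) x@Dimnames)

## Renaming replaces only the Dimnames slot; the matrix data is not copied.
setReplaceMethod("dimnames", "SimpleMatrix", function(x, value) {
  x@Dimnames <- if (is.null(value)) list(NULL, NULL) else value
  validObject(x)
  x
})

## Turn a character, logical or numeric subscript into integer positions.
.idx <- function(i, n, nms) {
  pos <- if (is.character(i)) match(i, nms) else seq_len(n)[i]
  if (anyNA(pos)) stop("subscript out of bounds")
  pos
}

## drop is ignored: the result is always a SimpleMatrix.
setMethod("[", "SimpleMatrix", function(x, i, j, ..., drop = TRUE) {
  d <- dim(x@matrix)
  dn <- x@Dimnames
  ii <- if (missing(i)) seq_len(d[1]) else .idx(i, d[1], dn[[1]])
  jj <- if (missing(j)) seq_len(d[2]) else .idx(j, d[2], dn[[2]])
  new("SimpleMatrix", matrix = x@matrix[ii, jj, drop = FALSE],
      Dimnames = list(dn[[1]][ii], dn[[2]][jj]))
})

## Attaching the names to a plain matrix (and copying it) happens only here.
setMethod("as.matrix", "SimpleMatrix", function(x, ...) {
  m <- x@matrix
  dimnames(m) <- x@Dimnames
  m
})

sm <- SimpleMatrix(matrix(1:6, nrow = 3,
                          dimnames = list(c("a", "b", "c"), c("s1", "s2"))))
dim(sm)                                   # 3 2
as.matrix(sm[c("c", "a"), "s2"])
dimnames(sm) <- list(NULL, c("x", "y"))   # copies only the small wrapper
```

A real implementation would also need arithmetic, cbind/rbind, a show
method and so on; the point of the sketch is only that renaming touches the
Dimnames slot and never the matrix data.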


> I also like the compromise I have seen elsewhere, where the colnames are
> always retained on assays, but only one rownames copy is kept.  Colnames
> are typically small and getting them wrong often makes for silent, but
> catastrophic errors.
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phave...@gene.com
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
