Re: [Bioc-devel] SummarizedExperiment

Kasper Daniel Hansen Mon, 24 Mar 2014 12:19:28 -0700

There could be repercussions, depending on the long-term strategy.

Pre-change behaviour:
  All assay matrices were stored without dimnames.
  When they were accessed, dimnames were copied in from the colData slot
  (this could optionally be disabled).
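Here is a minimal base-R sketch of that pre-change design, for illustration only. store_assay() and get_assay() are hypothetical helpers, not the GenomicRanges API. Names are stripped once at construction and re-attached on each access with withDimnames = TRUE, and that re-attachment is where the copy comes from.

```r
# Hypothetical sketch of the pre-change design (not the GenomicRanges code):
# assays are stored without dimnames, and the accessor re-attaches them.
store_assay <- function(m) {
  dimnames(m) <- NULL          # strip names once, at construction time
  m
}

get_assay <- function(raw, row_names, col_names, withDimnames = TRUE) {
  if (withDimnames) {
    # The caller still references 'raw', so R's copy-on-modify semantics
    # duplicate the whole matrix here just to attach the names.
    dimnames(raw) <- list(row_names, col_names)
  }
  raw                          # withDimnames = FALSE: returned unmodified
}

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("a", "b", "c"), c("s1", "s2")))
raw <- store_assay(m)
named <- get_assay(raw, c("a", "b", "c"), c("s1", "s2"))
unnamed <- get_assay(raw, c("a", "b", "c"), c("s1", "s2"),
                     withDimnames = FALSE)
```

For a 28M-row assay, the withDimnames = TRUE path pays for a full copy on every call. The FALSE path avoids the copy but gives up the names.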


Current behaviour (which I have not fully examined):
  It is valid for an assay matrix either to have dimnames or not.
  It is unclear (to me right now) whether there is any enforcement that, if
  an assay has dimnames, they match across the different assays and colData.

Possible behaviour (like Biobase::eSet):
  Assays need to have dimnames, and they need to match across all assays
  and colData.
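The eSet-like option amounts to a validity rule. Below is a hypothetical base-R helper that sketches it. It is not the GenomicRanges or Biobase implementation, and the function name is illustrative. It returns TRUE or a character vector describing the problems, following the usual convention for S4 validity functions.

```r
# Hypothetical validity helper for the eSet-like rule: every assay must carry
# dimnames equal to the object's row names and the rownames of colData.
# Returns TRUE, or a character vector of problems (S4 validity convention).
valid_assay_dimnames <- function(assays, row_names, col_names) {
  problems <- character()
  for (i in seq_along(assays)) {
    dn <- dimnames(assays[[i]])
    ok <- !is.null(dn) &&
      identical(dn[[1]], row_names) &&
      identical(dn[[2]], col_names)
    if (!ok) {
      problems <- c(problems,
                    sprintf("assay %d: dimnames missing or not matching", i))
    }
  }
  if (length(problems) == 0) TRUE else problems
}

rn <- c("r1", "r2")
cn <- c("s1", "s2", "s3")
good <- matrix(0, 2, 3, dimnames = list(rn, cn))
bare <- matrix(0, 2, 3)
valid_assay_dimnames(list(good), rn, cn)
valid_assay_dimnames(list(good, bare), rn, cn)
```

Under this rule, the second call flags assay 2. The check itself is cheap because it only compares names. The cost is that every assay must carry its own copy of the dimnames.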



We could consider keeping the current behaviour, but let assay() check
whether the dimnames of the "raw" assay object correspond to colData.  If
they do not, do the copying.  If this is chosen, I think we should also have
a function like "optimizeObject" which would harmonize the various dimnames,
at the cost of (potentially) copying the entire object.  This would be
backwards compatible for most users, and it would potentially be a nice
feature for package developers because (1) they could have intermediate
objects without harmonized dimnames and (2) construction could be faster
when the developer _knows_ the dimnames match (which I often do).  The
downside is more checking in assay() and assays().
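The proposal can be sketched in base R. This is a hypothetical illustration only. get_assay_checked() and harmonize_assays() are made-up names, with harmonize_assays() standing in for the suggested "optimizeObject". The accessor returns the raw matrix untouched when its dimnames already match, and it copies only on a mismatch.

```r
# Hypothetical sketch of the "check, then copy only if needed" accessor.
get_assay_checked <- function(raw, row_names, col_names) {
  target <- list(row_names, col_names)
  if (identical(dimnames(raw), target)) {
    return(raw)                # names already agree: no modification, no copy
  }
  dimnames(raw) <- target      # missing or mismatched names: pay for a copy
  raw
}

# Stand-in for "optimizeObject": a one-off pass that rewrites every assay so
# later accesses take the fast path above.
harmonize_assays <- function(assays, row_names, col_names) {
  lapply(assays, function(a) {
    dimnames(a) <- list(row_names, col_names)
    a
  })
}

rn <- c("r1", "r2")
cn <- c("s1", "s2", "s3")
harmonized <- matrix(1:6, 2, 3, dimnames = list(rn, cn))
unnamed <- matrix(1:6, 2, 3)
fast <- get_assay_checked(harmonized, rn, cn)   # returned as is
slow <- get_assay_checked(unnamed, rn, cn)      # names attached via a copy
fixed <- harmonize_assays(list(counts = unnamed), rn, cn)
```

The extra cost per call is one identical() on the names, which is small next to copying a 28M-row matrix.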

I think this is important to get right, and I am not 100% sure that the
suggestion above is the best.

Kasper


On Mon, Mar 24, 2014 at 11:42 AM, Vincent Carey
<st...@channing.harvard.edu> wrote:

>
>
>
> On Mon, Mar 24, 2014 at 11:07 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
>
>> On 03/20/2014 06:29 PM, Kasper Daniel Hansen wrote:
>>
>>> It used to be the case that when a SummarizedExperiment was constructed,
>>> dimnames were removed from the assay matrices.  One could then use either
>>> (1) assay(, withDimnames = TRUE)
>>> which ensured dimnames in the return value, but implied copying of the
>>> returned object because the dimnames had to be added, or
>>> (2) assay(, withDimnames = FALSE)
>>> which ensured that the returned object had no dimnames (because they had
>>> been stripped).
>>>
>>> It seems that in a recent commit (based on the log messages, I am
>>> guessing it is one of the two copied at the bottom of this email),
>>> dimnames are no longer stripped at construction.  This implies that
>>>    assay(, withDimnames = FALSE)
>>> returns an object with dimnames, because they are already present in the
>>> raw object.
>>>
>>> Now, my questions are
>>> (1) can I depend on this behavior?
>>>
>>
>> yes.
>>
>>
>>> (2) Is there any check that the dimnames which may be present in the
>>> 'raw' assay object are in line with what I get from
>>> assay(withDimnames = TRUE), or could I imagine getting different
>>> dimnames (and not just no dimnames vs. dimnames) depending on
>>> withDimnames?
>>>
>>>
>> more structure will be imposed; the dimnames of the overall object will
>> agree with the dimnames of the assays.
>>
>>
>>> To get some context, in bsseq I always use withDimnames = FALSE because
>>> the assay matrices are big (28M rows), so I want to avoid copying.  But
>>> now I get a failed test, since I construct an object with colnames in the
>>> assay.  This seems to be an esoteric point, but it has performance
>>> implications in my usage.  I don't know what the right design is; I like
>>> that renaming things is quick, because it only happens in the colData
>>> slot.
>>>
>>
>> I think stripping the dimnames from assays was a mistake -- it saves
>> space (but not much compared to the assay data) but causes a performance
>> bottleneck in normal use (when the dimnames are copied to the assay data)
>> so I think it makes sense to just duplicate / check dimnames. This is the
>> direction I'll go in, unless there are other opinions.
>>
>>
>>
> Will there be repercussions for serialized SummarizedExperiment instances?
> Will updateObject be necessary?  Should we be doing (or are we doing) class
> versioning?
>
>
>
>>
>>> Finally, it seems that the NEWS file in GenomicRanges is no longer
>>> maintained.  Is this intentional? :(
>>>
>>
>> the *Ranges tradition seems to be to update the NEWS files prior to
>> release, rather than during development. So, for instance:
>>
>> ------------------------------------------------------------------------
>> r87773 | hpa...@fhcrc.org | 2014-03-24 00:49:50 -0700 (Mon, 24 Mar 2014) | 1 line
>>
>> start to update NEWS file with changes in the upcoming 1.16.0 version
>>
>>
>>
>>> Best,
>>> Kasper
>>>
>>>
>>> r77404 | mtmor...@fhcrc.org | 2013-06-11 15:52:25 -0400 (Tue, 11 Jun 2013) | 5 lines
>>>
>>> relax SummarizedExperiment assays class validity
>>>
>>> - dim() of length >= 2
>>> - does not guarantee functionality; may be altered in the future
>>>
>>> ------------------------------------------------------------------------
>>> r76679 | mtmor...@fhcrc.org | 2013-05-16 17:12:43 -0400 (Thu, 16 May 2013) | 4 lines
>>>
>>> more efficient ref class constructor
>>>
>>> - new empty instance, then update slots
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>>
>>
>
>
