Re: [Bioc-devel] Changes to the SummarizedExperiment Class

Michael Love Tue, 31 Mar 2015 13:41:36 -0700

Would this code, inspired by the release version of GenomicRanges, work?
For example, if I want to add a DataFrame with 10 rows:


names <- letters[1:10]
x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
mcols(x) <- DataFrame(foo=1:10)

Then give x to the rowRanges argument of SummarizedExperiment?
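
A minimal sketch of that last step, under stated assumptions: it uses the standalone SummarizedExperiment package (a later home for the class than the GenomicRanges devel version discussed in this thread), and the count matrix and the sample names "s1" and "s2" are made up for illustration. The variable is called nms here only to avoid shadowing base::names.

```r
# Sketch only: assumes the standalone SummarizedExperiment package is installed.
library(GenomicRanges)
library(SummarizedExperiment)

nms <- letters[1:10]

# Empty GRangesList with one (empty) element per row, as in the code above.
x <- relist(GRanges(), PartitioningByEnd(integer(10), names = nms))
mcols(x) <- DataFrame(foo = 1:10)

# Hypothetical 10 x 2 count matrix; the sample names are invented.
m <- matrix(1:20, nrow = 10, dimnames = list(nms, c("s1", "s2")))

# Pass the annotated, range-less GRangesList as rowRanges.
se <- SummarizedExperiment(assays = list(counts = m), rowRanges = x)

# The DataFrame column should be reachable through the row ranges.
mcols(rowRanges(se))$foo
```

If this behaves as hoped, the metadata columns travel with the object even though every range is empty, which is the old rowData-as-DataFrame behavior asked about below.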

On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
<michaelisaiahl...@gmail.com> wrote:
> I forgot to ask my other question. I had gone in early March and fixed
> my code to eliminate rowData<-, but the argument to SummarizedExperiment
> was still called rowData, and a DataFrame could be provided. Then I
> didn't check for a few weeks, but the argument for the rowData slot is
> now called rowRanges. What's the trick to putting a DataFrame on an
> empty GRanges, so I can get the old behavior but now using the rowRanges
> argument?
>
> On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
> <michaelisaiahl...@gmail.com> wrote:
>> With GenomicRanges 1.19.48, I'm still seeing the issue from my March 9
>> email: renaming the first assay duplicates the assay data in memory. I
>> tried assayNames<- as well. My use case is being given a
>> SummarizedExperiment whose first assay is not named "counts"
>> (although the SE most likely comes from summarizeOverlaps() and is
>> already named "counts"...).
>>
>> > sessionInfo()
>> R Under development (unstable) (2015-03-31 r68129)
>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>> Running under: OS X 10.8.5 (Mountain Lion)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats4    parallel  stats     graphics  grDevices datasets  utils
>> [8] methods   base
>>
>> other attached packages:
>> [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16   IRanges_2.1.43        S4Vectors_0.5.22
>> [5] BiocGenerics_0.13.10  testthat_0.9.1        devtools_1.7.0        knitr_1.9
>> [9] BiocInstaller_1.17.6
>>
>> loaded via a namespace (and not attached):
>> [1] formatR_1.1    XVector_0.7.4  tools_3.3.0    stringr_0.6.2  evaluate_0.5.5
>>
>> On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
>> <michaelisaiahl...@gmail.com> wrote:
>>>
>>>
>>> On Mar 9, 2015 12:36 PM, "Martin Morgan" <mtmor...@fredhutch.org> wrote:
>>> >
>>> > On 03/09/2015 08:07 AM, Michael Love wrote:
>>> >>
>>> >> Some guidance on how to avoid duplication of the matrix for developers
>>> >> would be greatly appreciated.
>>> >
>>> >
>>> > It's unsatisfactory, but using withDimnames=FALSE avoids duplication on 
>>> > extraction of assays (but obviously you don't have dimnames on the 
>>> > matrix). Row or column subsetting necessarily causes the subsetted assay 
>>> > data to be duplicated. There should not be any duplication when 
>>> > rowRanges() or colData() are changed without changing their dimension / 
>>> > ordering.
>>> >
>>>
>>> Thanks Martin for checking into the regression.
>>>
>>> Sorry, I should have been more specific earlier: I meant more
>>> guidance/documentation in the man page for SE. I scanned the 'Extension'
>>> section but didn't find a note on withDimnames for extracting the matrix, or
>>> this example of renaming the assays (which could easily be relevant for
>>> other package authors).
>>>
>>> A prominent note there might help devs write more memory-efficient packages.
>>>
>>> The argument section mentions speed, but I'd explicitly mention memory, given
>>> that we're often storing big matrices:
>>>
>>> "Setting withDimnames=FALSE increases the speed with which assays are
>>> extracted."
>>>
>>> (It's entirely possible the info is there but I missed it.)
>>>
>>> Best,
>>>
>>> Mike
>>>
>>> >
>>> >> Another trouble point is that if I am given an SE with an unnamed
>>> >> assay and I need to give the assay a name, this can also expand the
>>> >> memory used. I had found a solution (which works with
>>> >> GenomicRanges 1.18 / current release) with:
>>> >>
>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>> >>
>>> >> But now I'm looking in devel and this appears to no longer work. The
>>> >> memory used expands, equivalent to:
>>> >>
>>> >> names(assays(se))[1] <- "foo"
>>> >>
>>> >> Here's some code to try this:
>>> >>
>>> >> m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
>>> >> se <- SummarizedExperiment(m)
>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>> >> names(assays(se))[1] <- "foo"
>>> >>
>>> >> while running gc() in between steps.
>>> >
>>> >
>>> > I think this is a regression of some sort, and I'll look into it. Thanks 
>>> > for the heads-up.
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >>
>>> >> On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
>>> >> <kasperdanielhan...@gmail.com> wrote:
>>> >>>
>>> >>> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey
>>> >>> <st...@channing.harvard.edu> wrote:
>>> >>>
>>> >>>> I am glad you are keeping this discussion alive Kasper.
>>> >>>>
>>> >>>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen <
>>> >>>> kasperdanielhan...@gmail.com> wrote:
>>> >>>>
>>> >>>>> It sounds like the proposed changes are already made.  However (like
>>> >>>>> others) I am still a bit mystified why this was necessary.  The old
>>> >>>>> version did allow for a GRanges inside the DataFrame of the rowData,
>>> >>>>> as far as I recall.  So I assume this is for efficiency.  But why?
>>> >>>>> What kind of data/use cases is this for?
>>> >>>>>
>>> >>>>> I am happy to hear that SummarizedExperiment is going to be spun out
>>> >>>>> into its own package.  When that happens, I have some comments, which
>>> >>>>> I'll include here in anticipation:
>>> >>>>>    1) I now very strongly believe it was a design mistake to not have
>>> >>>>> colnames on the assays.  The advantage of this choice is that
>>> >>>>> sampleNames are only stored in one place.  The extreme disadvantage
>>> >>>>> is the high inefficiency when you want colnames on an extracted assay.
>>> >>>>>
>>> >>>>
>>> >>>> after example(SummarizedExperiment)
>>> >>>>
>>> >>>> > colnames(assays(se1)[[1]])
>>> >>>> [1] "A" "B" "C" "D" "E" "F"
>>> >>>>
>>> >>>> so this seems to be optional.  But attempts to set rownames will fail
>>> >>>> silently
>>> >>>>
>>> >>>> > rownames(assays(se1)[[1]]) = as.character(1:200)
>>> >>>> > rownames(assays(se1)[[1]])
>>> >>>> NULL
>>> >>>>
>>> >>>> It seems we could issue a warning there.
>>> >>>>
>>> >>>
>>> >>>
>>> >>> Vince, you need to be careful here.
>>> >>>
>>> >>> The assays are stored without colnames (unless something has recently
>>> >>> changed).  The default is to - upon extraction - set the colnames of the
>>> >>> matrix.  This however requires a copy of the entire matrix.  So
>>> >>> essentially, upon extraction, each assay is needlessly duplicated to add
>>> >>> the colnames.  This is what I mean by inefficient. I would prefer to
>>> >>> store the assays with colnames.  This means that changing sampleNames of
>>> >>> the object will be inefficient (as it is for eSets) since it would
>>> >>> require a complete copy of everything.  But I would rather - much
>>> >>> rather - copy when setting sampleNames than copy when extracting an
>>> >>> assay.
>>> >>>
>>> >>> Best,
>>> >>> Kasper
>>> >>>
>>> >>> _______________________________________________
>>> >>> Bioc-devel@r-project.org mailing list
>>> >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>> >>
>>> >>
>>> >
>>> >
>>> > --
>>> > Computational Biology / Fred Hutchinson Cancer Research Center
>>> > 1100 Fairview Ave. N.
>>> > PO Box 19024 Seattle, WA 98109
>>> >
>>> > Location: Arnold Building M1 B861
>>> > Phone: (206) 667-2793

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
