I'll retract those last two emails about empty GRanges. That's simply:

se <- SummarizedExperiment(assays, colData=colData)
mcols(se) <- myDataFrame

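For anyone following along, a minimal self-contained sketch of that approach might look like the following. It assumes the SummarizedExperiment package is installed (at the time of this thread the class was defined in GenomicRanges), and the toy inputs counts, colData and myDataFrame are made up purely for illustration.

```r
# Assumption: the SummarizedExperiment package is available; in March 2015
# the class was still part of GenomicRanges.
library(SummarizedExperiment)

# Hypothetical toy inputs: 10 features x 4 samples.
counts <- matrix(1:40, nrow = 10,
                 dimnames = list(letters[1:10], paste0("s", 1:4)))
colData <- DataFrame(condition = rep(c("ctl", "trt"), each = 2),
                     row.names = colnames(counts))
myDataFrame <- DataFrame(foo = 1:10)

# Build the object without supplying any row ranges, then attach the
# per-row DataFrame through the mcols() setter.
se <- SummarizedExperiment(assays = SimpleList(counts = counts),
                           colData = colData)
mcols(se) <- myDataFrame

# The per-row column is now accessible from the object.
mcols(se)$foo
```

No empty GRanges has to be constructed by hand here; the DataFrame only needs one row per row of the assay.
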
On Tue, Mar 31, 2015 at 4:40 PM, Michael Love <michaelisaiahl...@gmail.com> wrote:
> Would this code inspired by the release version of GenomicRanges work?
> e.g. if I want to add a DataFrame with 10 rows:
>
> names <- letters[1:10]
> x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
> mcols(x) <- DataFrame(foo=1:10)
>
> Then give x to the rowRanges argument of SummarizedExperiment?
>
> On Tue, Mar 31, 2015 at 3:49 PM, Michael Love
> <michaelisaiahl...@gmail.com> wrote:
>> I forgot to ask my other question. I had gone in early March and fixed
>> my code to eliminate rowData<-, but the argument to SummarizedExperiment
>> was still called rowData, and a DataFrame could be provided. Then I
>> didn't check for a few weeks, but the argument for the rowData slot is
>> now called rowRanges. What's the trick to putting a DataFrame on an
>> empty GRanges, so I can get the old behavior but now using the rowRanges
>> argument?
>>
>> On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
>> <michaelisaiahl...@gmail.com> wrote:
>>> With GenomicRanges 1.19.48, I'm still having issues with re-naming the
>>> first assay and duplication of memory from my March 9 email. I tried
>>> assayNames<- as well. My use case is if I am given a
>>> SummarizedExperiment where the first element is not named "counts"
>>> (albeit the SE is most likely coming from summarizeOverlaps() and
>>> already named "counts"...).
>>>
>>>> sessionInfo()
>>> R Under development (unstable) (2015-03-31 r68129)
>>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>>> Running under: OS X 10.8.5 (Mountain Lion)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats4 parallel stats graphics grDevices datasets utils
>>> methods base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16 IRanges_2.1.43
>>> S4Vectors_0.5.22
>>> [5] BiocGenerics_0.13.10 testthat_0.9.1 devtools_1.7.0
>>> knitr_1.9
>>> [9] BiocInstaller_1.17.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] formatR_1.1 XVector_0.7.4 tools_3.3.0 stringr_0.6.2
>>> evaluate_0.5.5
>>>
>>> On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
>>> <michaelisaiahl...@gmail.com> wrote:
>>>>
>>>>
>>>> On Mar 9, 2015 12:36 PM, "Martin Morgan" <mtmor...@fredhutch.org> wrote:
>>>> >
>>>> > On 03/09/2015 08:07 AM, Michael Love wrote:
>>>> >>
>>>> >> Some guidance on how to avoid duplication of the matrix for developers
>>>> >> would be greatly appreciated.
>>>> >
>>>> >
>>>> > It's unsatisfactory, but using withDimnames=FALSE avoids duplication on
>>>> > extraction of assays (but obviously you don't have dimnames on the
>>>> > matrix). Row or column subsetting necessarily causes the subsetted assay
>>>> > data to be duplicated. There should not be any duplication when
>>>> > rowRanges() or colData() are changed without changing their dimension /
>>>> > ordering.
>>>> >
>>>>
>>>> Thanks Martin for checking into the regression.
>>>>
>>>> Sorry, I should have been more specific earlier, I meant more
>>>> guidance/documentation in the man page for SE. I scanned the 'Extension'
>>>> section but didn't find a note on withDimnames for extracting the matrix
>>>> or this example of renaming the assays (it seems like this could easily be
>>>> relevant for other package authors).
>>>>
>>>> A prominent note there might help devs write more memory efficient
>>>> packages.
>>>>
>>>> The argument section mentions speed but I'd explicitly mention memory
>>>> given that we're often storing big matrices:
>>>>
>>>> "Setting withDimnames=FALSE increases the speed with which assays are
>>>> extracted."
>>>>
>>>> (it's entirely possible the info is there but I missed it)
>>>>
>>>> Best,
>>>>
>>>> Mike
>>>>
>>>> >
>>>> >> Another example of a trouble point, is that if I am given an SE with
>>>> >> an unnamed assay and I need to give the assay a name, this also can
>>>> >> expand the memory used. I had found a solution (which works with
>>>> >> GenomicRanges 1.18 / current release) with:
>>>> >>
>>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>>> >>
>>>> >> But now I'm looking in devel and this appears to no longer work. The
>>>> >> memory used expands, equivalent to:
>>>> >>
>>>> >> names(assays(se))[1] <- "foo"
>>>> >>
>>>> >> Here's some code to try this:
>>>> >>
>>>> >> m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
>>>> >> se <- SummarizedExperiment(m)
>>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>>> >> names(assays(se))[1] <- "foo"
>>>> >>
>>>> >> while running gc() in between steps.
>>>> >
>>>> >
>>>> > I think this is a regression of some sort, and I'll look into it. Thanks
>>>> > for the heads-up.
>>>> >
>>>> > Martin
>>>> >
>>>> >
>>>> >>
>>>> >>
>>>> >> On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
>>>> >> <kasperdanielhan...@gmail.com> wrote:
>>>> >>>
>>>> >>> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey
>>>> >>> <st...@channing.harvard.edu> wrote:
>>>> >>>
>>>> >>>> I am glad you are keeping this discussion alive Kasper.
>>>> >>>>
>>>> >>>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen
>>>> >>>> <kasperdanielhan...@gmail.com> wrote:
>>>> >>>>
>>>> >>>>> It sounds like the proposed changes are already made.
>>>> >>>>> However (like others) I am still a bit mystified why this was
>>>> >>>>> necessary. The old version did allow for a GRanges inside the
>>>> >>>>> DataFrame of the rowData, as far as I recall. So I assume this
>>>> >>>>> is for efficiency. But why? What kind of data/use cases is this
>>>> >>>>> for?
>>>> >>>>>
>>>> >>>>> I am happy to hear that SummarizedExperiment is going to be spun
>>>> >>>>> out into its own package. When that happens, I have some
>>>> >>>>> comments, which I'll include here in anticipation
>>>> >>>>> 1) I now very strongly believe it was a design mistake to not
>>>> >>>>> have colnames on the assays. The advantage of this choice is
>>>> >>>>> that sampleNames are only stored one place. The extreme
>>>> >>>>> disadvantage is the high inefficiency when you want colnames on
>>>> >>>>> an extracted assay.
>>>> >>>>>
>>>> >>>>
>>>> >>>> after example(SummarizedExperiment)
>>>> >>>>
>>>> >>>>> colnames(assays(se1)[[1]])
>>>> >>>>
>>>> >>>> [1] "A" "B" "C" "D" "E" "F"
>>>> >>>>
>>>> >>>> so this seems to be optional. But attempts to set rownames will fail
>>>> >>>> silently
>>>> >>>>
>>>> >>>>> rownames(assays(se1)[[1]]) = as.character(1:200)
>>>> >>>>
>>>> >>>>
>>>> >>>>> rownames(assays(se1)[[1]])
>>>> >>>>
>>>> >>>>
>>>> >>>> NULL
>>>> >>>> seems we could issue a warning there
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >>> Vince, you need to be careful here.
>>>> >>>
>>>> >>> The assays are stored without colnames (unless something has recently
>>>> >>> changed). The default is to - upon extraction - set the colnames of
>>>> >>> the matrix. This however requires a copy of the entire matrix. So
>>>> >>> essentially, upon extraction, each assay is needlessly duplicated to
>>>> >>> add the colnames. This is what I mean by inefficient. I would prefer
>>>> >>> to store the assays with colnames.
>>>> >>> This means that changing sampleNames of the object will be
>>>> >>> inefficient (as it is for eSets) since it would require a complete
>>>> >>> copy of everything. But I would rather - much rather - copy when
>>>> >>> setting sampleNames than copy when extracting an assay.
>>>> >>>
>>>> >>> Best,
>>>> >>> Kasper
>>>> >>>
>>>> >>> [[alternative HTML version deleted]]
>>>> >>>
>>>> >>> _______________________________________________
>>>> >>> Bioc-devel@r-project.org mailing list
>>>> >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>> > --
>>>> > Computational Biology / Fred Hutchinson Cancer Research Center
>>>> > 1100 Fairview Ave. N.
>>>> > PO Box 19024 Seattle, WA 98109
>>>> >
>>>> > Location: Arnold Building M1 B861
>>>> > Phone: (206) 667-2793