I forgot to ask my other question. In early March I fixed my code to eliminate rowData<-; at that point the argument to SummarizedExperiment() was still called rowData, and a DataFrame could be provided. I didn't check again for a few weeks, and the argument for the rowData slot is now called rowRanges. What's the trick to putting a DataFrame on an empty GRanges, so I can get the old behavior but now using the rowRanges argument?

On Tue, Mar 31, 2015 at 3:40 PM, Michael Love <michaelisaiahl...@gmail.com> wrote:
> With GenomicRanges 1.19.48, I'm still having issues with re-naming the
> first assay and duplication of memory from my March 9 email. I tried
> assayNames<- as well. My use case is if I am given a
> SummarizedExperiment where the first element is not named "counts"
> (albeit the SE is most likely coming from summarizeOverlaps() and
> already named "counts"...).
>
> > sessionInfo()
> R Under development (unstable) (2015-03-31 r68129)
> Platform: x86_64-apple-darwin12.5.0 (64-bit)
> Running under: OS X 10.8.5 (Mountain Lion)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16 IRanges_2.1.43 S4Vectors_0.5.22
> [5] BiocGenerics_0.13.10 testthat_0.9.1 devtools_1.7.0 knitr_1.9
> [9] BiocInstaller_1.17.6
>
> loaded via a namespace (and not attached):
> [1] formatR_1.1 XVector_0.7.4 tools_3.3.0 stringr_0.6.2 evaluate_0.5.5
>
> On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
> <michaelisaiahl...@gmail.com> wrote:
>>
>> On Mar 9, 2015 12:36 PM, "Martin Morgan" <mtmor...@fredhutch.org> wrote:
>> >
>> > On 03/09/2015 08:07 AM, Michael Love wrote:
>> >>
>> >> Some guidance on how to avoid duplication of the matrix for developers
>> >> would be greatly appreciated.
>> >
>> > It's unsatisfactory, but using withDimnames=FALSE avoids duplication on
>> > extraction of assays (but obviously you don't have dimnames on the
>> > matrix). Row or column subsetting necessarily causes the subsetted assay
>> > data to be duplicated. There should not be any duplication when
>> > rowRanges() or colData() are changed without changing their dimension /
>> > ordering.
>> >
>>
>> Thanks Martin for checking into the regression.
>>
>> Sorry, I should have been more specific earlier, I meant more
>> guidance/documentation in the man page for SE. I scanned the 'Extension'
>> section but didn't find a note on withDimnames for extracting the matrix or
>> this example of renaming the assays (it seems like this could easily be
>> relevant for other package authors).
>>
>> A prominent note there might help devs write more memory-efficient packages.
>>
>> The argument section mentions speed but I'd explicitly mention memory given
>> that we're often storing big matrices:
>>
>> "Setting withDimnames=FALSE increases the speed with which assays are
>> extracted."
>>
>> (it's entirely possible the info is there but I missed it)
>>
>> Best,
>>
>> Mike
>>
>> >
>> >> Another example of a trouble point is that if I am given an SE with
>> >> an unnamed assay and I need to give the assay a name, this also can
>> >> expand the memory used. I had found a solution (which works with
>> >> GenomicRanges 1.18 / current release) with:
>> >>
>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>> >>
>> >> But now I'm looking in devel and this appears to no longer work. The
>> >> memory used expands, equivalent to:
>> >>
>> >> names(assays(se))[1] <- "foo"
>> >>
>> >> Here's some code to try this:
>> >>
>> >> m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
>> >> se <- SummarizedExperiment(m)
>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>> >> names(assays(se))[1] <- "foo"
>> >>
>> >> while running gc() in between steps.
>> >
>> > I think this is a regression of some sort, and I'll look into it. Thanks
>> > for the heads-up.
>> >
>> > Martin
>> >
>> >>
>> >> On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
>> >> <kasperdanielhan...@gmail.com> wrote:
>> >>>
>> >>> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey
>> >>> <st...@channing.harvard.edu> wrote:
>> >>>
>> >>>> I am glad you are keeping this discussion alive Kasper.
>> >>>>
>> >>>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen
>> >>>> <kasperdanielhan...@gmail.com> wrote:
>> >>>>
>> >>>>> It sounds like the proposed changes are already made. However (like
>> >>>>> others) I am still a bit mystified why this was necessary. The old
>> >>>>> version did allow for a GRanges inside the DataFrame of the rowData,
>> >>>>> as far as I recall. So I assume this is for efficiency. But why?
>> >>>>> What kind of data/use cases is this for?
>> >>>>>
>> >>>>> I am happy to hear that SummarizedExperiment is going to be spun out
>> >>>>> into its own package. When that happens, I have some comments, which
>> >>>>> I'll include here in anticipation:
>> >>>>> 1) I now very strongly believe it was a design mistake to not have
>> >>>>> colnames on the assays. The advantage of this choice is that
>> >>>>> sampleNames are only stored in one place. The extreme disadvantage
>> >>>>> is the high inefficiency when you want colnames on an extracted
>> >>>>> assay.
>> >>>>>
>> >>>>
>> >>>> after example(SummarizedExperiment)
>> >>>>
>> >>>> > colnames(assays(se1)[[1]])
>> >>>>
>> >>>> [1] "A" "B" "C" "D" "E" "F"
>> >>>>
>> >>>> so this seems to be optional. But attempts to set rownames will fail
>> >>>> silently
>> >>>>
>> >>>> > rownames(assays(se1)[[1]]) = as.character(1:200)
>> >>>>
>> >>>> > rownames(assays(se1)[[1]])
>> >>>>
>> >>>> NULL
>> >>>> seems we could issue a warning there
>> >>>>
>> >>>
>> >>> Vince, you need to be careful here.
>> >>>
>> >>> The assays are stored without colnames (unless something has recently
>> >>> changed). The default is to - upon extraction - set the colnames of the
>> >>> matrix. This however requires a copy of the entire matrix. So
>> >>> essentially, upon extraction, each assay is needlessly duplicated to add
>> >>> the colnames. This is what I mean by inefficient. I would prefer to
>> >>> store the assays with colnames.
>> >>> This means that changing sampleNames of the object will be
>> >>> inefficient (as it is for eSets) since it would require a complete
>> >>> copy of everything. But I would rather - much rather - copy when
>> >>> setting sampleNames than copy when extracting an assay.
>> >>>
>> >>> Best,
>> >>> Kasper
>> >>>
>> >>> _______________________________________________
>> >>> Bioc-devel@r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >>
>> >
>> > --
>> > Computational Biology / Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N.
>> > PO Box 19024 Seattle, WA 98109
>> >
>> > Location: Arnold Building M1 B861
>> > Phone: (206) 667-2793