Would this code, inspired by the release version of GenomicRanges, work? For example, if I want to add a DataFrame with 10 rows:

names <- letters[1:10]
x <- relist(GRanges(), PartitioningByEnd(integer(10), names=names))
mcols(x) <- DataFrame(foo=1:10)

Then give x to the rowRanges argument of SummarizedExperiment?

On Tue, Mar 31, 2015 at 3:49 PM, Michael Love <michaelisaiahl...@gmail.com> wrote:
> I forgot to ask my other question. I had gone in early March and fixed
> my code to eliminate rowData<-, but the argument to SummarizedExperiment
> was still called rowData, and a DataFrame could be provided. Then I
> didn't check for a few weeks, but the argument for the rowData slot is
> now called rowRanges. What's the trick to putting a DataFrame on an
> empty GRanges, so I can get the old behavior but now using the rowRanges
> argument?
>
> On Tue, Mar 31, 2015 at 3:40 PM, Michael Love
> <michaelisaiahl...@gmail.com> wrote:
>> With GenomicRanges 1.19.48, I'm still having issues with re-naming the
>> first assay and duplication of memory from my March 9 email. I tried
>> assayNames<- as well. My use case is if I am given a
>> SummarizedExperiment where the first element is not named "counts"
>> (albeit the SE is most likely coming from summarizeOverlaps() and
>> already named "counts"...).
>>
>> > sessionInfo()
>> R Under development (unstable) (2015-03-31 r68129)
>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>> Running under: OS X 10.8.5 (Mountain Lion)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats4 parallel stats graphics grDevices datasets utils methods base
>>
>> other attached packages:
>> [1] GenomicRanges_1.19.48 GenomeInfoDb_1.3.16 IRanges_2.1.43 S4Vectors_0.5.22
>> [5] BiocGenerics_0.13.10 testthat_0.9.1 devtools_1.7.0 knitr_1.9
>> [9] BiocInstaller_1.17.6
>>
>> loaded via a namespace (and not attached):
>> [1] formatR_1.1 XVector_0.7.4 tools_3.3.0 stringr_0.6.2 evaluate_0.5.5
>>
>> On Mon, Mar 9, 2015 at 1:21 PM, Michael Love
>> <michaelisaiahl...@gmail.com> wrote:
>>>
>>>
>>> On Mar 9, 2015 12:36 PM, "Martin Morgan" <mtmor...@fredhutch.org> wrote:
>>> >
>>> > On 03/09/2015 08:07 AM, Michael Love wrote:
>>> >>
>>> >> Some guidance on how to avoid duplication of the matrix for developers
>>> >> would be greatly appreciated.
>>> >
>>> >
>>> > It's unsatisfactory, but using withDimnames=FALSE avoids duplication on
>>> > extraction of assays (but obviously you don't have dimnames on the
>>> > matrix). Row or column subsetting necessarily causes the subsetted assay
>>> > data to be duplicated. There should not be any duplication when
>>> > rowRanges() or colData() are changed without changing their dimension /
>>> > ordering.
>>> >
>>>
>>> Thanks Martin for checking into the regression.
>>>
>>> Sorry, I should have been more specific earlier, I meant more
>>> guidance/documentation in the man page for SE. I scanned the 'Extension'
>>> section but didn't find a note on withDimnames for extracting the matrix or
>>> this example of renaming the assays (it seems like this could easily be
>>> relevant for other package authors).
>>>
>>> A prominent note there might help devs write more memory efficient packages.
>>>
>>> The argument section mentions speed but I'd explicitly mention memory given
>>> that we're often storing big matrices:
>>>
>>> "Setting withDimnames=FALSE increases the speed with which assays are
>>> extracted."
>>>
>>> (it's entirely possible the info is there but I missed it)
>>>
>>> Best,
>>>
>>> Mike
>>>
>>> >
>>> >> Another example of a trouble point is that if I am given an SE with
>>> >> an unnamed assay and I need to give the assay a name, this also can
>>> >> expand the memory used. I had found a solution (which works with
>>> >> GenomicRanges 1.18 / current release) with:
>>> >>
>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>> >>
>>> >> But now I'm looking in devel and this appears to no longer work. The
>>> >> memory used expands, equivalent to:
>>> >>
>>> >> names(assays(se))[1] <- "foo"
>>> >>
>>> >> Here's some code to try this:
>>> >>
>>> >> m <- matrix(1:1e7,ncol=10,dimnames=list(1:1e6,1:10))
>>> >> se <- SummarizedExperiment(m)
>>> >> names(assays(se, withDimnames=FALSE))[1] <- "foo"
>>> >> names(assays(se))[1] <- "foo"
>>> >>
>>> >> while running gc() in between steps.
>>> >
>>> >
>>> > I think this is a regression of some sort, and I'll look into it. Thanks
>>> > for the heads-up.
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >>
>>> >> On Mon, Mar 9, 2015 at 10:36 AM, Kasper Daniel Hansen
>>> >> <kasperdanielhan...@gmail.com> wrote:
>>> >>>
>>> >>> On Mon, Mar 9, 2015 at 10:30 AM, Vincent Carey
>>> >>> <st...@channing.harvard.edu> wrote:
>>> >>>
>>> >>>> I am glad you are keeping this discussion alive Kasper.
>>> >>>>
>>> >>>> On Mon, Mar 9, 2015 at 10:06 AM, Kasper Daniel Hansen
>>> >>>> <kasperdanielhan...@gmail.com> wrote:
>>> >>>>
>>> >>>>> It sounds like the proposed changes are already made. However (like
>>> >>>>> others) I am still a bit mystified why this was necessary. The old
>>> >>>>> version did allow for a GRanges inside the DataFrame of the rowData,
>>> >>>>> as far as I recall.
>>> >>>>> So I assume this is for efficiency. But why? What kind of
>>> >>>>> data/use cases is this for?
>>> >>>>>
>>> >>>>> I am happy to hear that SummarizedExperiment is going to be spun out
>>> >>>>> into its own package. When that happens, I have some comments, which
>>> >>>>> I'll include here in anticipation:
>>> >>>>> 1) I now very strongly believe it was a design mistake to not have
>>> >>>>> colnames on the assays. The advantage of this choice is that
>>> >>>>> sampleNames are only stored in one place. The extreme disadvantage is
>>> >>>>> the high inefficiency when you want colnames on an extracted assay.
>>> >>>>>
>>> >>>>
>>> >>>> after example(SummarizedExperiment)
>>> >>>>
>>> >>>> > colnames(assays(se1)[[1]])
>>> >>>>
>>> >>>> [1] "A" "B" "C" "D" "E" "F"
>>> >>>>
>>> >>>> so this seems to be optional. But attempts to set rownames will fail
>>> >>>> silently
>>> >>>>
>>> >>>> > rownames(assays(se1)[[1]]) = as.character(1:200)
>>> >>>>
>>> >>>> > rownames(assays(se1)[[1]])
>>> >>>>
>>> >>>> NULL
>>> >>>>
>>> >>>> seems we could issue a warning there
>>> >>>>
>>> >>>
>>> >>>
>>> >>> Vince, you need to be careful here.
>>> >>>
>>> >>> The assays are stored without colnames (unless something has recently
>>> >>> changed). The default is to - upon extraction - set the colnames of the
>>> >>> matrix. This however requires a copy of the entire matrix. So
>>> >>> essentially, upon extraction, each assay is needlessly duplicated to add
>>> >>> the colnames. This is what I mean by inefficient. I would prefer to
>>> >>> store the assays with colnames. This means that changing sampleNames of
>>> >>> the object will be inefficient (as it is for eSets) since it would
>>> >>> require a complete copy of everything. But I would rather - much
>>> >>> rather - copy when setting sampleNames than copy when extracting an
>>> >>> assay.
>>> >>>
>>> >>> Best,
>>> >>> Kasper
>>> >>>
>>> >>> _______________________________________________
>>> >>> Bioc-devel@r-project.org mailing list
>>> >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>> >
>>> >
>>> > --
>>> > Computational Biology / Fred Hutchinson Cancer Research Center
>>> > 1100 Fairview Ave. N.
>>> > PO Box 19024 Seattle, WA 98109
>>> >
>>> > Location: Arnold Building M1 B861
>>> > Phone: (206) 667-2793