Thanks. While I was beating up a bit on Biobase, I understand if we don't want to revisit the design now. However, we might want to do so for the multi-assay stuff. I have some additional thoughts on that.
Kasper

On Mon, Feb 10, 2014 at 2:17 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

> On 02/09/2014 02:38 PM, Kasper Daniel Hansen wrote:
>
>> Memory usage is a common bottleneck.
>>
>> For people interested in profiling their memory usage I want to recommend
>> the lineprof package by Hadley Wickham which I have had great success with
>> so far. There are some details in his 'Advanced R programming' at
>> http://adv-r.had.co.nz/memory.html
>> I see this package as a real game changer.
>>
>> I have written an example debugging session on a real use case
>> (minfi::preprocessRaw) at
>> http://www.hansenlab.org/rstats/2014/01/30/lineprof/
>> where I end up having to work around using new() for Biobase classes (an
>> eSet derived class in minfi)
>>
>
> Thanks Kasper for the pointer. This is a bit brutal
>
> > m <- matrix(0, 0, 0)
> > tracemem(m)
> [1] "<0xe048d80>"
> > ExpressionSet(m)
> tracemem[0xe048d80 -> 0xeb10530]: eapply sampleNames<- sampleNames<-
> .local .nextMethod eval eval callNextMethod .local initialize initialize
> new .ExpressionSet ExpressionSet ExpressionSet
>
> ... 15 copies later...
>
> tracemem[0xf93b2b8 -> 0xf93bc00]: colnames<- sampleNames<- sampleNames<-
> .harmonizeDimnames .local initialize initialize new .ExpressionSet
> ExpressionSet ExpressionSet
>
> Much of this is avoidable... copyEnv(), eapply(), and rownames<-, used
> when making row and column names of the assayData consistent with feature
> and sample names, all seem to unnecessarily duplicate elements
>
> e <- new.env(); m <- matrix(1); tracemem(m)
> ## [1] "<0x1810d650>"
> e[["m"]] <- m
> x <- copyEnv(e)
> ## tracemem[0x1810d650 -> 0x1810e0d8]: .Call copyEnv
> x <- eapply(e, dim)
> ## tracemem[0x1810d650 -> 0x1810e9f8]: eapply
> dimnames(e[["m"]]) <- list("a", "A")
> ## tracemem[0x1810d650 -> 0x1810fab0]:
> rownames(e[["m"]]) <- "a"
> ## tracemem[0x1810fab0 -> 0x18110de8]:
> ## tracemem[0x18110de8 -> 0x18111730]: rownames<-
>
> I've updated the C code for copyEnv in Biobase, and avoided eapply and
> row/colnames, so that there are usually only one or two copies for the
> simplest constructor. I'll look out for bugs in downstream packages, and
> would be happy to hear of other easily reproducible examples of apparently
> unnecessary duplication.
>
> Martin
>
>> Best,
>> Kasper
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793