On 02/09/2014 02:38 PM, Kasper Daniel Hansen wrote:
Memory usage is a common bottleneck.

For people interested in profiling their memory usage I want to recommend
the lineprof package by Hadley Wickham which I have had great success with
so far.  There are some details in his 'Advanced R programming' at
   http://adv-r.had.co.nz/memory.html
I see this package as a real game changer.

I have written an example debugging session on a real use case
(minfi::preprocessRaw) at
   http://www.hansenlab.org/rstats/2014/01/30/lineprof/
where I end up having to work around using new() for Biobase classes (an
eSet-derived class in minfi).

Thanks, Kasper, for the pointer.  This is a bit brutal:

> m <- matrix(0, 0, 0)
> tracemem(m)
[1] "<0xe048d80>"
> ExpressionSet(m)

tracemem[0xe048d80 -> 0xeb10530]: eapply sampleNames<- sampleNames<- .local .nextMethod eval eval callNextMethod .local initialize initialize new .ExpressionSet ExpressionSet ExpressionSet

... 15 copies later...

tracemem[0xf93b2b8 -> 0xf93bc00]: colnames<- sampleNames<- sampleNames<- .harmonizeDimnames .local initialize initialize new .ExpressionSet ExpressionSet ExpressionSet

Much of this is avoidable... copyEnv(), eapply(), and rownames<-, used when making row and column names of the assayData consistent with feature and sample names, all seem to duplicate elements unnecessarily:

    e <- new.env(); m <- matrix(1); tracemem(m)
    ## [1] "<0x1810d650>"
    e[["m"]] <- m
    x <- copyEnv(e)
    ## tracemem[0x1810d650 -> 0x1810e0d8]: .Call copyEnv
    x <- eapply(e, dim)
    ## tracemem[0x1810d650 -> 0x1810e9f8]: eapply
    dimnames(e[["m"]]) <- list("a", "A")
    ## tracemem[0x1810d650 -> 0x1810fab0]:
    rownames(e[["m"]]) <- "a"
    ## tracemem[0x1810fab0 -> 0x18110de8]:
    ## tracemem[0x18110de8 -> 0x18111730]: rownames<-

I've updated the C code for copyEnv in Biobase, and avoided eapply and row/colnames, so that there are usually only one or two copies for the simplest constructor. I'll look out for bugs in downstream packages, and would be happy to hear of other easily reproducible examples of apparently unnecessary duplication.
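
As a rough illustration of the row/colnames part (this is only a sketch, not the actual change made in Biobase; harmonize_dimnames is a made-up helper name), one way to avoid separate rownames<- and colnames<- calls and eapply() is to loop over the names in the environment and set both dimnames with a single dimnames<- assignment per element:

```r
## Hypothetical helper (not part of Biobase): give every matrix in an
## environment the same row and column names, using one `dimnames<-`
## call per element instead of separate `rownames<-` / `colnames<-`.
harmonize_dimnames <- function(env, rn, cn) {
    for (nm in ls(env, all.names = TRUE)) {
        x <- env[[nm]]               # fetch the element by name
        dimnames(x) <- list(rn, cn)  # set both sets of names at once
        env[[nm]] <- x               # store the result back
    }
    invisible(env)
}

e <- new.env()
e[["m"]] <- matrix(0, 2, 3)
harmonize_dimnames(e, c("a", "b"), c("A", "B", "C"))
dimnames(e[["m"]])
```

Wrapping a call like this in tracemem() on the original matrix is a reasonable way to check how many duplications remain on a given R version.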

Martin


Best,
Kasper


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
