On 02/09/2014 02:38 PM, Kasper Daniel Hansen wrote:
Memory usage is a common bottleneck.
For people interested in profiling their memory usage I want to recommend
the lineprof package by Hadley Wickham which I have had great success with
so far. There are some details in his 'Advanced R programming' at
http://adv-r.had.co.nz/memory.html
I see this package as a real game changer.
I have written an example debugging session on a real use case
(minfi::preprocessRaw) at
http://www.hansenlab.org/rstats/2014/01/30/lineprof/
where I end up having to work around using new() for Biobase classes (an
eSet-derived class in minfi).
Thanks Kasper for the pointer. This is a bit brutal:
> m <- matrix(0, 0, 0)
> tracemem(m)
[1] "<0xe048d80>"
> ExpressionSet(m)
tracemem[0xe048d80 -> 0xeb10530]: eapply sampleNames<- sampleNames<- .local
.nextMethod eval eval callNextMethod .local initialize initialize new
.ExpressionSet ExpressionSet ExpressionSet
... 15 copies later...
tracemem[0xf93b2b8 -> 0xf93bc00]: colnames<- sampleNames<- sampleNames<-
.harmonizeDimnames .local initialize initialize new .ExpressionSet ExpressionSet
ExpressionSet
Much of this is avoidable. copyEnv(), eapply(), and rownames<-, used when
making row and column names of the assayData consistent with feature and sample
names, all seem to duplicate elements unnecessarily:
e <- new.env(); m <- matrix(1); tracemem(m)
## [1] "<0x1810d650>"
e[["m"]] <- m
x <- copyEnv(e)
## tracemem[0x1810d650 -> 0x1810e0d8]: .Call copyEnv
x <- eapply(e, dim)
## tracemem[0x1810d650 -> 0x1810e9f8]: eapply
dimnames(e[["m"]]) <- list("a", "A")
## tracemem[0x1810d650 -> 0x1810fab0]:
rownames(e[["m"]]) <- "a"
## tracemem[0x1810fab0 -> 0x18110de8]:
## tracemem[0x18110de8 -> 0x18111730]: rownames<-
I've updated the C code for copyEnv in Biobase, and avoided eapply and
row/colnames, so that there are usually only one or two copies for the simplest
constructor. I'll look out for bugs in downstream packages, and would be happy
to hear of other easily reproducible examples of apparently unnecessary duplication.
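For anyone putting together such a report, here is a minimal sketch of a wrapper around tracemem() that traces a single object while a function is applied to it. The helper name trace_copies() is hypothetical (it is not part of base R or Biobase), and the number of copies reported will depend on the R version in use.

```r
# Hypothetical helper, not part of base R or Biobase: trace one object
# while a function is applied to it, then stop tracing the original.
trace_copies <- function(x, fun) {
  stopifnot(is.function(fun))
  if (capabilities("profmem")) {
    tracemem(x)                          # a tracemem[...] line is printed per copy
    on.exit(untracemem(x), add = TRUE)
  } else {
    warning("R was built without memory profiling; nothing will be traced")
  }
  invisible(fun(x))
}

m <- matrix(0, 2, 2)
res <- trace_copies(m, function(mat) {
  # assigning dimnames to an object that is still referenced elsewhere
  dimnames(mat) <- list(c("a", "b"), c("A", "B"))
  mat
})
```

Any tracemem[...] lines printed while the call runs show where the duplication happened; m itself is left unchanged.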
Martin
Best,
Kasper
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793