On Jul 19, 2011, at 4:18 PM, Peter Lomas wrote:
Hi Richard,
As others have said, try to use the "apply" functions rather than loops.
There is also an apply function for lists, see ?lapply. This is much more
efficient.
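As a small illustration of ?lapply (a sketch only; the list `samples` and its contents are made up here, not taken from the thread), the same function can be applied to every element of a list without writing a loop:

```r
# A made-up list of three samples of different sizes.
samples <- list(a = rexp(10), b = rexp(20), c = rexp(30))

means_list <- lapply(samples, mean)  # returns a list, one element per input
means_vec  <- sapply(samples, mean)  # same computation, simplified to a named vector

names(means_vec)                     # the element names carry over: "a" "b" "c"
```

Whether this runs faster than an explicit loop is a separate question from whether it is shorter to write.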
Actually the "apply" functions are not "more efficient" in the usual
meaning of time of execution, and sometimes they are rather
inefficient. Prior discussions of this topic in the archives should be
easy to find. The economy is in expression, and the advantage is in
code creation and maintenance.
Doubters of this proposition should consider these results:
library(rbenchmark) # help page has a more compact version of these tests
means.rep = function(n, m) {
  res1 <- vector(length=100, mode="numeric")
  res1 <- replicate(n, mean(rexp(m)))
}
means.colMn = function(n, m) {
  res2 <- vector(length=100, mode="numeric")
  res2 <- colMeans(matrix(rexp(n*m), c(m, n)))
}
means.tapply = function(n, m) {
  res3 <- vector(length=100, mode="numeric")
  res3 <- tapply(rexp(n*m), rep(1:n, each = m), mean)
}
means.apply = function(n, m) {
  res4 <- vector(length=100, mode="numeric")
  res4 <- apply(matrix(rexp(m*n), n, m), 1, mean)
}
means.forloop = function(n, m) {
  res5 <- vector(length=100, mode="numeric")
  for (i in n) {res5[i] <- mean(rexp(m))}
}
benchmark(
repl = means.rep(100, 100),
tappl = means.tapply(100, 100),
appl = means.apply(100, 100),
pat = means.pat(100, 100),
forloop = means.forloop(100,100),
replications=100, columns=c("test","replications","elapsed"),
order='elapsed' )
###
Results:
     test replications relative elapsed
5 forloop          100     1.00   0.004
4     pat          100    20.25   0.081
1    repl          100    77.00   0.308
3    appl          100    89.75   0.359
2   tappl          100   264.50   1.058
I admit that I was rather surprised to see the for-loop beating
colMeans by such a wide margin, and this is making me wonder if I
reversed some index or coded the for-loop test wrong. So I would
appreciate some auditing and improvement of this test. (But I don't
see how I could have reversed the order, since n and m are both
100. I also tried adding assignments to see whether only promises were
being made with no calculations; the relative efficiencies stay the
same.)
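One candidate explanation, offered as a sketch rather than a verified diagnosis: in means.forloop, `for (i in n)` iterates over the single value n, so the body runs once instead of n times and only one mean is computed. The versions below assume the intent was n means of m exponential draws each. The names means.forloop2 and means.colMn2 are made up here for illustration, and no new timings are claimed.

```r
# Hypothetical corrected versions; the names ending in "2" are not from the thread.
means.forloop2 <- function(n, m) {
  res <- numeric(n)                 # pre-allocate n slots rather than a fixed 100
  for (i in seq_len(n)) {           # iterate over 1..n; `for (i in n)` runs only once
    res[i] <- mean(rexp(m))
  }
  res                               # return the vector explicitly
}

means.colMn2 <- function(n, m) {
  # m rows by n columns, so each column holds one sample of size m
  colMeans(matrix(rexp(n * m), nrow = m, ncol = n))
}

# Demonstration of the original loop header: the body executes a single time.
count <- 0
for (i in 100) count <- count + 1
count                               # 1

# Both corrected versions return n values.
length(means.forloop2(100, 100))    # 100
length(means.colMn2(100, 100))      # 100
```

Substituting means.forloop2 for means.forloop in the benchmark() call would give a like-for-like comparison, since every test would then compute 100 means.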
--
David.
I also like writing my own functions. For example:
f <- function(x) {
  x^2
}
Which can then be used by:
f(2)
[1] 4
This is very useful if you're getting into maximum likelihood programming,
or want to use the "optim" function (for multivariate functions) or
"optimize" (for univariate functions).
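As a hedged illustration of that workflow (the simulated data, the rate of 2, the search interval, and the name negloglik are all assumptions made for this sketch), a user-written negative log-likelihood can be handed to optimize():

```r
# Simulated data; the true rate of 2 is an arbitrary choice for illustration.
set.seed(1)
x <- rexp(1000, rate = 2)

# Negative log-likelihood of an exponential sample as a function of the rate.
negloglik <- function(rate, data) {
  -sum(dexp(data, rate = rate, log = TRUE))
}

# optimize() minimises a function of one variable over an interval;
# extra arguments (here `data`) are passed through to the function.
fit <- optimize(negloglik, interval = c(0.01, 10), data = x)

fit$minimum      # numerical estimate of the rate
1 / mean(x)      # closed-form maximum likelihood estimate, for comparison
```

The two printed values should agree closely, which is a quick sanity check that the function was written correctly.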
Lastly, check out the R reference card.
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
Regards,
Peter
On Tue, Jul 19, 2011 at 12:43, RichardLang <l...@zedat.fu-berlin.de> wrote:
Hi everyone!
I'm trying to teach myself R in order to do some data analysis. I'm a
mathematics student and (only) familiar with MATLAB and LaTeX. I'm working
through the "official" introduction to R at the moment, while simultaneously
solving some exercises I found on the web. Before I post my (probably
stupid) question, I'd like to ask you for some general advice. How do you
work with R? Is it like in MATLAB, where you write your functions with a lot
of loops etc. in a text file and then run it? Or do you just prepare your
data and then use the functions provided by R (plot, mean etc.) to get some
analysis? I'd be very thankful for some of your thoughts about
"approaches".
Now the question: I'm trying to build a vector with n entries, each
consisting of the mean of m random numbers (exponentially distributed, for
example). My approach was to construct an n x m random matrix and then to
somehow take the mean of each row. But the mean function has no
parameter to do this, so the intended approach in R is probably different...
any ideas? =)
Richard
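The approach described in the question can be sketched as follows. This is a minimal illustration, not code from the thread: the sizes n = 5 and m = 1000 are arbitrary, and rowMeans() is the base R function for row-wise means, with apply(x, 1, mean) as the more general form.

```r
n <- 5     # number of means wanted (arbitrary for illustration)
m <- 1000  # draws per mean (arbitrary for illustration)

# n x m matrix of exponential draws; each row is one sample of size m.
x <- matrix(rexp(n * m), nrow = n, ncol = m)

row_means  <- rowMeans(x)          # dedicated function for row means
row_means2 <- apply(x, 1, mean)    # general form: apply mean over margin 1 (rows)

length(row_means)                  # one mean per row, so n values
all.equal(row_means, row_means2)   # the two routes agree
```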
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.