pnmath currently uses up to 8 threads (i.e. 1, 2, 4, or 8).
getNumPnmathThreads() should tell you the maximum number used on your
system, which should be 8 if the number of processors is being
identified correctly.  With a matrix the size of m this calculation
should be using 8 threads, but the exp calculation is fairly fast, so
the threading overhead is noticeable. On a Linux box with 4 dual-core
AMD processors
I get

> m <- matrix(0, 10000, 1000)
> mean(replicate(10, system.time(exp(m), gcFirst=TRUE))["elapsed",])
[1] 0.3859
> library(pnmath)
> mean(replicate(10, system.time(exp(m), gcFirst=TRUE))["elapsed",])
[1] 0.0775

A similar example using qbeta, a slower function, gives

> p <- matrix(0.5, 1000, 1000)
> setNumPnmathThreads(1)
[1] 1
> mean(replicate(10, system.time(qbeta(p,2,3), gcFirst=TRUE))["elapsed",])
[1] 7.334
> setNumPnmathThreads(8)
[1] 8
> mean(replicate(10, system.time(qbeta(p,2,3), gcFirst=TRUE))["elapsed",])
[1] 0.9576


On an 8-core Intel/OS X box the improvement for exp is much less, but
is similar for qbeta.
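As an aside, the repeated-timing idiom used above can be packaged into
a small helper.  This is just a sketch in base R; timeExpr is an
illustrative name, not part of pnmath.  The expression is passed as a
function so that each replicate actually re-evaluates it (a plain
argument would be forced only once because of lazy evaluation):

```r
## Average elapsed time of an expression over several runs, forcing a
## garbage collection first so GC pauses do not skew the results.
## Pass the work as a function so replicate() re-runs it each time.
timeExpr <- function(f, reps = 10) {
    mean(replicate(reps, system.time(f(), gcFirst = TRUE)["elapsed"]))
}

m <- matrix(0, 10000, 1000)
timeExpr(function() exp(m))
```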

luke


On Thu, 10 Jul 2008, Martin Morgan wrote:

"Juan Pablo Romero Méndez" <[EMAIL PROTECTED]> writes:

Just out of curiosity, what system do you have?

These are the results on my machine:

> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
   0.52    0.04    0.56
> library(pnmath)
> system.time(exp(m), gcFirst=TRUE)
   user  system elapsed
  0.660   0.016   0.175


From cat /proc/cpuinfo, the original results were from a 32-bit
dual-core system

model name   : Intel(R) Core(TM)2 CPU         T7600  @ 2.33GHz

Here's a second set of results on a 64-bit system with 16 cores (4
cores on each of 4 physical processors, I think)

> mean(replicate(10, system.time(exp(m), gcFirst=TRUE))["elapsed",])
[1] 0.165
> mean(replicate(10, system.time(exp(m), gcFirst=TRUE))["elapsed",])
[1] 0.0397

model name   : Intel(R) Xeon(R) CPU           X7350  @ 2.93GHz

One thing is that for me in single-thread mode the faster processor
actually evaluates more slowly. This could be because of 64-bit
issues, other hardware design aspects, the way I've compiled R on the
two platforms, or other system activity on the larger machine.

A second thing is that the larger machine only accelerates about
4-fold, rather than a naive 16-fold; I think this comes from decisions
in the pnmath code about how many processors to use, although I'm not
sure.

A final thing is that running intensive tests on my laptop generates
enough extra heat to increase the fan speed and laptop temperature. I
sort of wonder whether consumer laptops / desktops are engineered for
sustained use of their multiple cores (although I guess the gaming
community makes heavy use of multiple cores).

Martin



  Juan Pablo



> system.time(exp(m), gcFirst=TRUE)
  user  system elapsed
 0.108   0.000   0.106
> library(pnmath)
> system.time(exp(m), gcFirst=TRUE)
  user  system elapsed
 0.096   0.004   0.052

(elapsed time about 2x faster). Both BLAS and pnmath make much better
use of resources, since they do not require multiple R instances.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
