Hi Markus,
I'm happy to participate in this, as I think I said previously.
I won't have time to look carefully at the draft until sometime next
week, but I remain puzzled about the high time listed for case 3 with
snow/Rmpi. It would be good to understand what is going on there --
the discrepancy between snow/Rmpi and the other snow variants seems
odd.
I'm not sure how meaningful the timing comparisons are overall. The
differences are mainly overhead due to additional features and
differences in communication. The feature-related overhead is not
likely to be important in any real examples. In my experience, if
communication is an issue in a substantial (i.e. realistic)
computation, then a more sophisticated approach than simple
scatter-compute-gather is needed, and then the ability to express such
an approach becomes more important than the performance per se.
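(For illustration only -- a minimal sketch, not taken from the draft,
of what going beyond plain scatter-compute-gather can look like. It
assumes the snow package with a local socket cluster; the task sizes
are made up.)

    library(snow)
    cl <- makeCluster(2, type = "SOCK")

    ## unevenly sized tasks: invert matrices of different sizes
    sizes <- c(100, 100, 100, 800)
    tasks <- lapply(sizes, function(n) matrix(rnorm(n * n), n, n))

    ## plain scatter-compute-gather: tasks go to the workers in a fixed
    ## round-robin order, so one worker can end up with all the big ones
    res.scatter <- clusterApply(cl, tasks, solve)

    ## load-balanced alternative: a worker is handed a new task as soon
    ## as it finishes its previous one
    res.lb <- clusterApplyLB(cl, tasks, solve)

    stopCluster(cl)

A one-line change switches between the two, and that sort of
flexibility matters more than small differences in raw speed.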
Best,
luke
On Mon, 24 Nov 2008, Markus Schmidberger wrote:
Hi,
there is a new mailing list for R and HPC: [EMAIL PROTECTED]
This is probably a better list for this question. Do not forget that
you have to register first: https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
In this case the communication overhead is the problem. The data /
matrix is too big!
Have a look at the function snow.time to visualize your communication
and calculation time. It is a new function in snow_0.3-4.
( http://www.cs.uiowa.edu/~luke/R/cluster/ )
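For example (a minimal sketch, assuming a local socket cluster with two
workers; the matrix here is just a stand-in for your own data):

    library(snow)
    cl <- makeCluster(2, type = "SOCK")
    x <- matrix(rnorm(1000 * 1000), 1000, 1000)  # placeholder for your big matrix

    ## wrap the parallel call in snow.time() to record how long each
    ## worker spends receiving data, computing, and sending results back
    tm <- snow.time(clusterApply(cl, 1:2, function(i, m) solve(m), x))
    print(tm)   # summary of elapsed, send/receive and compute times
    plot(tm)    # timeline plot of communication vs. computation per node

    stopCluster(cl)

If most of the elapsed time shows up as communication, the matrix is
simply too big to ship around for this amount of computation.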
Best
Markus
Stefan Evert wrote:
I'm sorry but I don't quite understand what "not running solve() in
this process" means. I updated the code and it does show that the
results from clusterApply() are identical to the results from lapply().
Could you please explain more about this?
The point is that a parallel processing framework such as snow or PVM
does not execute the operation in your (interactive) R session, but rather
starts separate computing processes that carry out the actual
calculation (while your R session is just waiting for the results to
become available). These separate processes can either run on different
computers in a network, or on your local machine (in order to make use
of multiple CPU cores).
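To make this concrete, here is a minimal sketch (assuming the snow
package with a local socket cluster; the matrix sizes are invented) of
such a setup:

    library(snow)
    cl <- makeCluster(2, type = "SOCK")   # starts two worker R processes
    mats <- lapply(1:2, function(i) matrix(rnorm(1000 * 1000), 1000, 1000))

    ## the master session only ships the matrices and waits, while the
    ## two worker processes run solve() and send their results back
    system.time(res.par <- clusterApply(cl, mats, solve))

    ## for comparison: everything runs inside this one R session
    system.time(res.ser <- lapply(mats, solve))

    stopCluster(cl)

Compare this with the numbers you posted: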
parallel (clusterApply):
   user  system elapsed
  0.584   0.144   4.355

serial (lapply):
   user  system elapsed
  4.777   0.100   4.901
If you take a close look at your timing results, you can see that the
total processing time ("elapsed") is only slightly shorter with
parallelisation (4.35 s) than without (4.9 s). You've probably been
looking at "user" time, i.e. the amount of CPU time your interactive R
session consumed. Since with parallel processing, the R session itself
doesn't perform the actual calculation (as explained above), it is
mostly waiting for results to become available and "user" time is
therefore reduced drastically. In short, when measuring performance
improvements from parallelisation, always look at the total "elapsed" time.
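For instance (reusing the cluster cl and the list mats from the sketch
above), you can pull out just the elapsed times for a fair comparison:

    t.par <- system.time(clusterApply(cl, mats, solve))["elapsed"]
    t.ser <- system.time(lapply(mats, solve))["elapsed"]
    t.ser / t.par   # speed-up factor; values near 1 mean little real gain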
So why isn't parallel processing twice as fast as performing the
calculation in a single thread? Perhaps the advantage of using both CPU
cores was eaten up by the communication overhead. You should also take
into account that a lot of other processes (terminals, GUI, daemons,
etc.) are running on your computer at the same time, so even with
parallel processing you will not have both cores fully available to R.
In my experience, there is little benefit in parallelisation as long as
you just have two CPU cores on your computer (rather than, say, 8 cores).
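One rough way to check how much of the elapsed time is pure
communication (again assuming the socket cluster cl from the sketch
above) is to time only the shipping of the data, with no computation:

    big <- matrix(rnorm(2000 * 2000), 2000, 2000)
    system.time(clusterExport(cl, "big"))  # time spent just sending the data
    ## if this alone is a sizeable fraction of the parallel elapsed time,
    ## two cores cannot buy you much of a speed-up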
Hope this clarifies things a bit (and is reasonably accurate, since I
don't have much experience with parallelisation),
Stefan
[ [EMAIL PROTECTED] | http://purl.org/stefan.evert ]
--
Dipl.-Tech. Math. Markus Schmidberger
Ludwig-Maximilians-Universität München
IBE - Institut für medizinische Informationsverarbeitung,
Biometrie und Epidemiologie
Marchioninistr. 15, D-81377 Muenchen
URL: http://www.ibe.med.uni-muenchen.de
Mail: Markus.Schmidberger [at] ibe.med.uni-muenchen.de
Tel: +49 (089) 7095 - 4599
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa, Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386   Fax: 319-335-3017
Email: [EMAIL PROTECTED]   WWW: http://www.stat.uiowa.edu