As with Milan's answer: perfect explanation and hugely appreciated.  A few 
follow-up questions/comments below.

----- Original Message -----
> From: "Jeff Newmiller" <jdnew...@dcn.davis.ca.us>
> To: "Chris Evans" <chrish...@psyctc.org>
> Cc: r-help@r-project.org
> Sent: Saturday, 17 October, 2015 18:28:12
> Subject: Re: [R] No speed up using the parallel package and ncpus > 1 with 
> boot() on linux machines

> None of this is surprising. If the calculations you divide your work up
> into are small, then the overhead of communicating between parallel
> processes will be a relatively large penalty to pay.  You have to break
> your problem up into larger chunks and depend on vector processing within
> processes to keep the cpu busy doing useful work.

Aha.  Got it!
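
For the archive, here is a minimal sketch of what "larger chunks" might look like with the parallel package on Linux. The data, n_reps, n_cores and both helper functions are made up for illustration, and this is not what boot() does internally. The fine-grained version hands each worker one tiny replicate per task; the chunked version gives each worker one big task and vectorises inside it.

```r
library(parallel)

set.seed(1)
x       <- rnorm(200)   # toy data (assumption)
n_reps  <- 2000         # total bootstrap replicates (assumption)
n_cores <- 2            # assumed worker count; mclapply forks, so Linux/macOS only

# Fine-grained: one task per replicate, so scheduling overhead dominates.
one_rep <- function(i) mean(sample(x, replace = TRUE))

# Coarse: each task draws n resamples at once and uses vectorised rowMeans().
chunk <- function(n) {
  idx <- matrix(sample.int(length(x), n * length(x), replace = TRUE), nrow = n)
  rowMeans(matrix(x[idx], nrow = n))
}

t_fine <- system.time(
  fine <- unlist(mclapply(seq_len(n_reps), one_rep, mc.cores = n_cores))
)

sizes <- rep(n_reps %/% n_cores, n_cores)   # one chunk per worker
t_coarse <- system.time(
  res <- unlist(mclapply(sizes, chunk, mc.cores = n_cores))
)

print(rbind(fine = t_fine, coarse = t_coarse)[, "elapsed"])
```

Timings will vary by machine, so it is worth running this a few times before reading anything into the numbers.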
 
> Also, I am not aware of any model of Mac Mini that has 8 physical cores...
> 4 is the max. Virtual cores gain a logical simplification of
> multiprocessing but do not offer actual improved performance because
> there are only as many physical data paths and registers as there are
> cores.

Ah.  Hadn't thought of that.  It's a machine I rent and I thought it was a Mac 
mini.  detectCores() reports 8 but perhaps they are virtual cores. 
/proc/cpuinfo says the processor is an Intel(R) Core(TM) i7-3615QM CPU @ 
2.30GHz and shows 8 processors but again ... perhaps they are virtual.  What's 
the best way to get a true core count?
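
One possible way to check from within R, offered as a sketch rather than a definitive answer: on Linux, hyperthreads that share a core report the same "physical id" and "core id" in /proc/cpuinfo, so counting distinct pairs should give the physical core count. count_physical_cores() below is a hypothetical helper, not part of any package. detectCores(logical = FALSE) is also worth trying, but the documentation says it is only honoured on some platforms, so it may simply return the logical count. As far as I know the i7-3615QM is a 4-core, 8-thread part, which would fit Jeff's point.

```r
# Hypothetical helper: count physical cores from the lines of /proc/cpuinfo.
count_physical_cores <- function(lines) {
  # Extract the value after the colon for every line starting with `name`.
  field <- function(name) {
    hits <- grep(paste0("^", name, "[[:space:]]*:"), lines, value = TRUE)
    trimws(sub("^[^:]*:", "", hits))
  }
  socket <- field("physical id")
  core   <- field("core id")
  # Some systems (VMs, non-x86) omit these fields; give up rather than guess.
  if (length(socket) == 0 || length(socket) != length(core)) return(NA_integer_)
  # Hyperthreads on the same core share a (socket, core) pair.
  length(unique(paste(socket, core)))
}

if (file.exists("/proc/cpuinfo")) {
  print(count_physical_cores(readLines("/proc/cpuinfo")))
}
print(parallel::detectCores(logical = TRUE))   # logical CPUs
print(parallel::detectCores(logical = FALSE))  # physical, where supported
```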
 
> Note that your problems are with long-running simulations... your examples
> are too small to demonstrate the actual balance of processing vs.
> communication overhead. Before you draw conclusions, try upping bootReps
> by a few orders of magnitude, and run your test code a couple
> of times to stabilize the memory conditions and obtain some consistency
> in timings.

OK.  Good advice again, but what you are saying, and the findings I had there, 
are pretty consistent with what I was seeing with long-running things with 
bootReps up at 10k, and I think you've told me what I really want to know.  I 
think the simplest way to parallelise may actually be fine for me: I'll run 
four (or maybe eight) separate R jobs (having a look at swapping to make sure 
I'm not pushing beyond physical RAM; I don't think these simulations will).

> I have never used the parallel option in the boot package before... I have
> always rolled my own to allow me to decide how much work to do within the
> worker processes before returning from them. (This is particularly severe
> when using snow, but not necessarily something you can neglect with
> multicore.)

That sounds like an impressive and obviously pertinent approach.  I think, as I 
say, I may be able to get away with a very simple approach that runs parallel 
simulations and then aggregates the data from each and analyses that.
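
A rough sketch of that simple approach, under stated assumptions: each job writes its results to its own .rds file and a separate step binds them together. run_job(), collect_jobs(), the file naming and the toy statistic are all hypothetical, and in real use each run_job() call would sit in its own R process (for example started with Rscript and a job number) rather than in a loop.

```r
# Hypothetical per-job function: simulate and save results to out_dir.
run_job <- function(job_id, n_reps, out_dir) {
  # Distinct seed per job: simple, but does not guarantee independent streams.
  set.seed(1000 + job_id)
  res <- data.frame(job  = job_id,
                    rep  = seq_len(n_reps),
                    stat = replicate(n_reps, mean(rnorm(50))))  # toy statistic
  saveRDS(res, file.path(out_dir, sprintf("sim_%02d.rds", job_id)))
  invisible(res)
}

# Hypothetical aggregation step: read every job's file and stack the rows.
collect_jobs <- function(out_dir) {
  files <- list.files(out_dir, pattern = "^sim_[0-9]+\\.rds$", full.names = TRUE)
  do.call(rbind, lapply(files, readRDS))
}

out_dir <- tempfile("sims")
dir.create(out_dir)
for (j in 1:4) run_job(j, n_reps = 100, out_dir = out_dir)  # stand-in for 4 separate jobs
all_res <- collect_jobs(out_dir)
print(aggregate(stat ~ job, data = all_res, FUN = mean))
```

If the streams need to be provably independent, the L'Ecuyer-CMRG generator with parallel::nextRNGStream() is probably a better choice than ad hoc seeds.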

Many thanks Jeff.  Brilliant help.

Chris

 
> On Sat, 17 Oct 2015, Chris Evans wrote:
> 
>> I think I am failing to understand how boot() uses the parallel package on 
>> linux

... rest of my original post deleted to save space ...

 
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
