As with Milan's answer: perfect explanation and hugely appreciated. A few follow-up questions/comments below.
----- Original Message -----
> From: "Jeff Newmiller" <jdnew...@dcn.davis.ca.us>
> To: "Chris Evans" <chrish...@psyctc.org>
> Cc: r-help@r-project.org
> Sent: Saturday, 17 October, 2015 18:28:12
> Subject: Re: [R] No speed up using the parallel package and ncpus > 1 with
> boot() on linux machines

> None of this is surprising. If the calculations you divide your work up
> into are small, then the overhead of communicating between parallel
> processes will be a relatively large penalty to pay. You have to break
> your problem up into larger chunks and depend on vector processing within
> processes to keep the cpu busy doing useful work.

Aha. Got it!

> Also, I am not aware of any model of Mac Mini that has 8 physical cores...
> 4 is the max. Virtual cores gain a logical simplification of
> multiprocessing but do not offer actual improved performance because
> there are only as many physical data paths and registers as there are
> cores.

Ah. Hadn't thought of that. It's a machine I rent; I thought it was a Mac Mini. detectCores() reports 8, but perhaps those are virtual cores. /proc/cpuinfo says the processor is an Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz and also shows 8 cores, but again... perhaps they are virtual. What's the best way to get a true core count?

> Note that your problems are with long-running simulations... your examples
> are too small to demonstrate the actual balance of processing vs.
> communication overhead. Before you draw conclusions, try upping bootReps
> by a few orders of magnitude, and run your test code a couple
> of times to stabilize the memory conditions and obtain some consistency
> in timings.

OK. Good advice again, but what you are saying, and the findings I had there, are pretty consistent with what I was seeing in long-running jobs with bootReps up at 10k, and I think you've told me what I really want to know.
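On the question of a true core count: on x86 Linux, each logical CPU listed in /proc/cpuinfo carries a `physical id` (socket) and a `core id`, and hyper-threaded siblings share the same pair, so counting distinct pairs gives the number of physical cores (`lscpu` also reports "Thread(s) per core" directly). The thread is about R, but this counting logic is language-neutral, so here is a minimal Python sketch. It assumes those two fields are present, which is not guaranteed on every architecture or kernel, and the function names are hypothetical.

```python
import os
from typing import Optional


def count_physical_cores(cpuinfo_text: str) -> Optional[int]:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text.

    Hyper-threaded siblings report the same pair, so each physical core is
    counted once. Returns None if no 'core id' fields are found.
    """
    cores = set()
    physical_id = core_id = None
    # Each logical CPU is a block of "key : value" lines separated by a blank
    # line; the extra "" at the end flushes the final block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores) or None


def read_physical_cores(path: str = "/proc/cpuinfo") -> Optional[int]:
    """Linux-only helper: read the file and count physical cores."""
    try:
        with open(path) as f:
            return count_physical_cores(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    # os.cpu_count() counts logical CPUs, like detectCores()'s default.
    print("logical CPUs:", os.cpu_count())
    print("physical cores:", read_physical_cores())
```

On a hyper-threaded CPU the logical count should come out at twice the physical count; if the two numbers match, hyper-threading is either absent or disabled.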
I think the simplest way to parallelise may actually be fine for me: I'll run four (or maybe eight) separate R jobs, keeping an eye on swapping to make sure I'm not pushing beyond physical RAM (I don't think these simulations will).

> I have never used the parallel option in the boot package before... I have
> always rolled my own to allow me to decide how much work to do within the
> worker processes before returning from them. (This is particularly severe
> when using snow, but not necessarily something you can neglect with
> multicore.)

That sounds like an impressive and obviously pertinent approach. As I say, though, I think I may be able to get away with a very simple approach: run the simulations in parallel, then aggregate the data from each and analyse the combined set.

Many thanks, Jeff. Brilliant help.

Chris

> On Sat, 17 Oct 2015, Chris Evans wrote:
>
>> I think I am failing to understand how boot() uses the parallel package on
>> linux ... rest of my original post deleted to save space ...
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
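Jeff's roll-your-own approach and the split-and-aggregate plan in this thread share one idea: give each worker a large chunk of replicates, so communication happens once per chunk rather than once per replicate, then combine the chunks at the end. Below is a minimal Python sketch of that idea, offered as a language-neutral illustration rather than R code or how boot() works internally. It bootstraps the mean using only the standard library; the function names, seeds and data are hypothetical.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def boot_chunk(data: Sequence[float], n_reps: int, seed: int) -> List[float]:
    """Compute n_reps bootstrap replicates of the mean inside ONE task.

    The worker receives `data` once and returns all of its replicates
    together, so inter-process communication happens once per chunk
    rather than once per replicate.
    """
    rng = random.Random(seed)  # separate, reproducible stream per chunk
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def split_reps(total: int, n_chunks: int) -> List[int]:
    """Split `total` replicates into `n_chunks` near-equal chunk sizes."""
    base, extra = divmod(total, n_chunks)
    return [base + (1 if i < extra else 0) for i in range(n_chunks)]


def parallel_boot(data: Sequence[float], total_reps: int, n_workers: int,
                  base_seed: int = 2015) -> List[float]:
    """Run one large chunk per worker, then aggregate all replicates."""
    sizes = split_reps(total_reps, n_workers)
    seeds = [base_seed + i for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        chunks = list(pool.map(boot_chunk, [data] * n_workers, sizes, seeds))
    # Aggregation step: concatenate the chunks into one set of replicates.
    return [rep for chunk in chunks for rep in chunk]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    reps = parallel_boot(sample, total_reps=10_000, n_workers=4)
    # The SD of the bootstrap replicates estimates the SE of the mean.
    print(len(reps), round(statistics.stdev(reps), 4))
```

Giving each chunk its own seed keeps runs reproducible while ensuring the workers draw different resamples; the same applies to separate R jobs: if each one sets a seed, each needs a different one.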