Hi Doug
Thank you for your input.
I fully agree with you.
I do not expect to get much from hyperthreading in terms of performance.
However, at this point I am just interested in having Open MPI working
right with *both* HT on and HT off.
Anyway, back to your comment about the usefulness of HT.
This is all hearsay: the hand-waving argument I heard about Intel
hyperthreading (HT), its IBM cousin "simultaneous multi-threading"
(SMT), and most likely some other equivalents out there.
You kind of suggested some of these points in your message.
In any case, please don't quote me on this,
although just posting it on the list already puts me on the spot.
Experts, please jump in and correct me.
1) HT/SMT works well for codes that have many
branches/decisions (like if/else): the next instructions to
be fetched/executed are not predictable, so having two threads
active on a single core can harness the idle core/CPU cycles
that occur while the "other" thread is fetching a new, unpredictable
instruction after each of the frequent branches/decisions.
2) Predictable instructions, on the other hand, can be pipelined
for execution and do not leave many idle CPU cycles.
Most of our scientific codes (finite element, finite differences,
finite volume, spectral, linear algebra solvers) are NOT characterized
by branches, but by big repetitive inner loops that do not leave
many idle CPU cycles.
(Well, at least when they are thoughtfully written.)
I.e., they are mostly made of predictable instructions that fit
nicely in the CPU pipeline.
Hence, the active thread becomes greedy
and doesn't give the "other" thread much of a chance
to get hold of the CPU.
So hyperthreading is not helpful for these codes.
That is the (common) wisdom about HT/SMT I was told.
3) However, I saw one person reporting modest speedup gains
(10-20%) when running an ocean model (finite differences, domain
decomposition, Open MPI, actually a Mac OS X cluster).
It may have been here on this list, IIRC.
I myself saw speedups in this range, maybe up to 30%,
not on Linux, but on a big IBM Power6 machine
(32 CPUs/node, which look like 64 CPUs/node with SMT turned on).
On these IBM machines SMT is turned on/off by the user,
via environment variables, which is very convenient.
This was when I ran a coupled
climate model (5 executables in MPMD mode using the IBM MPI).
4) I am not so surprised by the numbers you reported.
Based on the common wisdom above, the more optimized
the loops in your code are, the less useful HT becomes.
You might need to screw up the code a bit, say, by inserting
branches/decisions in your inner loops, for HT to be of any help.
However, I would guess the net result of doing that would actually
be a loss w.r.t. just running the optimized code without HT.
There is nothing like a clean and clever algorithm.
Cheers,
Gus Correa
(still struggling to get Open MPI to get along with HT,
but now self-promoted to parallel programming theoretician :) )
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Doug Reeder wrote:
Hello,
I have a Mac with two quad-core Nehalem chips (8 cores). The sysctl
command shows 16 CPUs (apparently w/ hyperthreading). I have a finite
element code that runs in parallel using Open MPI. Running on the single
machine with -np 8 takes about 2/3 of the time that running with -np
16 does. The program is very well optimized for parallel processing, so I
strongly suspect that hyperthreading is not helping. The program fairly
aggressively uses 100% of each CPU it is on, so I don't think
hyperthreading gets much of a chance to split the CPU activity. I would
certainly welcome input/insight from an Intel hardware engineer. I make
sure that I don't ask for more processors than there are physical cores,
and that seems to work.
Doug Reeder
On May 4, 2010, at 7:06 PM, Gus Correa wrote:
Hi Ralph
Thank you so much for your help.
You are right, paffinity is turned off (default):
**************
/opt/sw/openmpi/1.4.2/gnu-4.4.3-4/bin/ompi_info --param opal all |
grep paffinity
MCA opal: parameter "opal_paffinity_alone" (current
value: "0", data source: default value, synonyms: mpi_paffinity_alone,
mpi_paffinity_alone)
**************
I will try your suggestion to turn off HT tomorrow,
and report back here.
Douglas Guptill kindly sent a recipe to turn HT off via BIOS settings.
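For what it is worth, there also seems to be a way on Linux to take the
extra hardware threads offline at runtime through sysfs, without a BIOS
reboot. A rough sketch, as root (the CPU number is just an example):

  # show which logical CPUs are hyperthread siblings of the same core
  cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
  # take one sibling of each pair offline, e.g. logical CPU 8
  echo 0 > /sys/devices/system/cpu/cpu8/online

I haven't tried this myself, so the BIOS recipe may be the safer route.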
Cheers,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Ralph Castain wrote:
On May 4, 2010, at 4:51 PM, Gus Correa wrote:
Hi Ralph
Ralph Castain wrote:
One possibility is that the sm btl might not like that you have
hyperthreading enabled.
I remember that hyperthreading was discussed months ago,
in the previous incarnation of this problem/thread/discussion on
"Nehalem vs. Open MPI".
(It sounds like one of those supreme court cases ... )
I don't really administer that machine,
or any machine with hyperthreading,
so I am not very familiar with the HT nitty-gritty.
How do I turn off hyperthreading?
Is it a BIOS or a Linux thing?
I may try that.
I believe it can be turned off via an admin-level cmd, but I'm not
certain about it
Another thing to check: do you have any paffinity settings turned on
(e.g., mpi_paffinity_alone)?
I didn't turn on or off any paffinity setting explicitly,
either in the command line or in the mca config file.
All that I did on the tests was to turn off "sm",
or just use the default settings.
I wonder if paffinity is on by default, is it?
Should I turn it off?
It is off by default - I mention it because sometimes people have it
set in the default MCA param file and don't realize it is on. Sounds
okay here, though.
Our paffinity system doesn't handle hyperthreading at this time.
OK, so *if* paffinity is on by default (Is it?),
and hyperthreading is also on, as it is now,
I must turn off one of them, maybe both, right?
I may go combinatorial about this tomorrow.
Can't do it today.
Darn locked office door!
I would say don't worry about the paffinity right now - sounds like
it is off. You can always check, though, by running "ompi_info
--param opal all" and checking for the setting of the
opal_paffinity_alone variable
I'm just suspicious of the HT since you have a quad-core machine,
and the limit where things work seems to be 4...
It may be.
If you tell me how to turn off HT (I'll google around for it
meanwhile),
I will do it tomorrow, if I get a chance to
hard reboot that pesky machine now locked behind a door.
Yeah, I'm beginning to believe it is the HT that is causing the
problem...
Thanks again for your help.
Gus
On May 4, 2010, at 3:44 PM, Gus Correa wrote:
Hi Jeff
Sure, I will certainly try v1.4.2.
I am downloading it right now.
As of this morning, when I first downloaded,
the web site still had 1.4.1.
Maybe I should have refreshed the web page on my browser.
I will tell you how it goes.
Gus
Jeff Squyres wrote:
Gus -- Can you try v1.4.2 which was just released today?
On May 4, 2010, at 4:18 PM, Gus Correa wrote:
Hi Ralph
Thank you very much.
The "-mca btl ^sm" workaround seems to have solved the problem,
at least for the little hello_c.c test.
I just ran it fine up to 128 processes.
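(For the archives, the command line was something along these lines:

  mpirun -mca btl ^sm -np 128 ./a.out
)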
I confess I am puzzled by this workaround.
* Why should we turn off "sm" on a standalone machine,
where everything is supposed to operate via shared memory?
* Do I incur a performance penalty by not using "sm"?
* What other mechanism does Open MPI actually use for process
communication in this case?
It seems to be using tcp, because when I try -np 256 I get this
error:
[spinoza:02715] [[11518,0],0] ORTE_ERROR_LOG: The system limit
on number
of network connections a process can open was reached in file
../../../../../orte/mca/oob/tcp/oob_tcp.c at line 447
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections
that can
be open
This can be resolved by setting the mca parameter
opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit
commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------
Anyway, no big deal, because we don't intend to oversubscribe the
processors on real jobs anyway (and the very error message suggests
a workaround to increase np, if needed).
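In case anybody needs it, I suppose that workaround would look
something like the following; untested here, and the descriptor
limit number is just an example:

  ulimit -n 65536
  mpirun -mca opal_set_max_sys_limits 1 -mca btl ^sm -np 256 ./a.out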
Many thanks,
Gus Correa
Ralph Castain wrote:
I would certainly try it -mca btl ^sm and see if that solves
the problem.
On May 4, 2010, at 2:38 PM, Eugene Loh wrote:
Gus Correa wrote:
Dear Open MPI experts
I need your help to get Open MPI right on a standalone
machine with Nehalem processors.
How to tweak the mca parameters to avoid problems
with Nehalem (and perhaps AMD processors also),
where MPI programs hang, was discussed here before.
However, I lost track of the details of how to work around the
problem, and of whether it has perhaps been fully fixed already.
Yes, perhaps the problem you're seeing is not what you
remember being discussed.
Perhaps you're thinking of
https://svn.open-mpi.org/trac/ompi/ticket/2043 . It's
presumably fixed.
I am now facing the problem directly on a single Nehalem box.
I installed OpenMPI 1.4.1 from source,
and compiled the test hello_c.c with mpicc.
Then I tried to run it with:
1) mpirun -np 4 a.out
It ran OK (but seemed to be slow).
2) mpirun -np 16 a.out
It hung, and brought the machine to a halt.
Any words of wisdom are appreciated.
More info:
* OpenMPI 1.4.1 installed from source (tarball from your site).
* Compilers are gcc/g++/gfortran 4.4.3-4.
* OS is Fedora Core 12.
* The machine is a Dell box with Intel Xeon 5540 (quad core)
processors on a two-way motherboard and 48GB of RAM.
* /proc/cpuinfo indicates that hyperthreading is turned on.
(I can see 16 "processors".)
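(A quick sanity check, in case it helps: comparing the "siblings" and
"cpu cores" fields in /proc/cpuinfo shows whether HT is on; with HT,
"siblings" is twice "cpu cores":

  grep -E 'siblings|cpu cores' /proc/cpuinfo | sort -u
)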
**
What should I do?
Use -mca btl ^sm ?
Use -mca btl_sm_num_fifos=some_number ? (Which number?)
Use both?
Do something else?
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users