Gus Correa wrote:
Dear Open MPI experts,
I need your help to get Open MPI working correctly on a standalone
machine with Nehalem processors.
How to tweak the MCA parameters to avoid problems on Nehalem
(and perhaps also AMD) processors, where MPI programs hang,
was discussed here before.
However, I have lost track of the details: how to work around the problem,
and whether it has perhaps already been fixed.
Perhaps the problem you're seeing is not the one you remember being
discussed.
Perhaps you're thinking of
https://svn.open-mpi.org/trac/ompi/ticket/2043 . It's presumably fixed.
I am now facing the problem directly on a single Nehalem box.
I installed Open MPI 1.4.1 from source
and compiled the hello_c.c test program with mpicc.
Then I tried to run it with:
1) mpirun -np 4 a.out
It ran OK, but seemed slow.
2) mpirun -np 16 a.out
It hung and brought the machine to a halt.
Any words of wisdom are appreciated.
More info:
* Open MPI 1.4.1 installed from source (tarball from your site).
* Compilers are gcc/g++/gfortran 4.4.3-4.
* OS is Fedora Core 12.
* The machine is a Dell box with two quad-core Intel Xeon 5540
processors on a dual-socket motherboard and 48 GB of RAM.
* /proc/cpuinfo indicates that hyperthreading is turned on.
(I can see 16 "processors".)
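As a cross-check on the 16 "processors": two sockets with four cores each and two hardware threads per core give 2 × 4 × 2 = 16 logical CPUs, twice the 8 physical cores. The sketch below is a hypothetical helper, cpu_topology (not part of Open MPI), that reads those figures from /proc/cpuinfo. It assumes the x86 Linux field names "processor", "physical id", "siblings" and "cpu cores", and treats hyperthreading as on when siblings (logical CPUs per socket) exceeds cpu cores (physical cores per socket).

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): summarise CPU topology from a
# Linux /proc/cpuinfo-style file. Assumes the x86 field names "processor",
# "physical id", "siblings" and "cpu cores"; other platforms may lack them.
cpu_topology() {
    f=${1:-/proc/cpuinfo}
    # One "processor" stanza per logical CPU (hardware thread).
    logical=$(grep -c '^processor' "$f")
    # Number of distinct "physical id" values = populated sockets.
    sockets=$(awk -F: '/^physical id/ {print $2}' "$f" | sort -u | wc -l)
    # "siblings" = logical CPUs per socket; "cpu cores" = physical cores per socket.
    siblings=$(awk -F: '/^siblings/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$f")
    cores=$(awk -F: '/^cpu cores/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$f")
    # More logical CPUs than cores per socket means hyperthreading is enabled.
    ht=off
    if [ -n "$siblings" ] && [ -n "$cores" ] && [ "$siblings" -gt "$cores" ]; then
        ht=on
    fi
    echo "logical=$logical sockets=$((sockets)) cores_per_socket=$cores siblings_per_socket=$siblings hyperthreading=$ht"
}

# Run against the local machine, if /proc/cpuinfo is available.
if [ -r /proc/cpuinfo ]; then
    cpu_topology /proc/cpuinfo
fi
```

If the topology is as described above, the line printed on this box should show 2 sockets, 4 cores per socket and 8 siblings per socket, in which case -np 16 places two MPI processes on every physical core.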
What should I do?
Use -mca btl ^sm ?
Use -mca btl_sm_num_fifos some_number ? (Which number?)
Use both?
Do something else?
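For reference, here is a sketch of how the candidate options are spelled on the mpirun command line in Open MPI 1.4: an MCA parameter is passed as "--mca <name> <value>" (or "-mca"), with no "=" sign, or set through an OMPI_MCA_<name> environment variable. The FIFO count 4 is an arbitrary placeholder, not a recommended value; ompi_info lists the sm BTL parameters and their defaults for the installed build. The lines only illustrate syntax and do not settle which option avoids the hang.

```shell
#!/bin/sh
# Syntax sketch only: assumes Open MPI 1.4's mpirun and ompi_info are on PATH.
# The value 4 below is an arbitrary placeholder, not a recommendation.

# List the shared-memory (sm) BTL parameters and their current defaults.
ompi_info --param btl sm

# Option 1: exclude the sm BTL ("^" negates the component list; quoted
# defensively so no shell tries to interpret the caret).
mpirun --mca btl '^sm' -np 16 ./a.out

# Option 2: keep the sm BTL but change how many FIFOs it uses.
mpirun --mca btl_sm_num_fifos 4 -np 16 ./a.out

# The same MCA parameter set through the environment instead.
OMPI_MCA_btl_sm_num_fifos=4 mpirun -np 16 ./a.out
```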