Dear Open MPI experts

I need your help to get Open MPI working correctly on a standalone
machine with Nehalem processors.

This list has discussed before how to tweak the MCA parameters
to avoid hangs of MPI programs on Nehalem
(and perhaps also AMD) processors.

However, I have lost track of the details: how to work around
the problem, and whether it has perhaps already been fully fixed.

I am now facing the problem directly on a single Nehalem box.

I installed Open MPI 1.4.1 from source
and compiled the hello_c.c test program with mpicc.
Then I tried to run it with:

1) mpirun -np 4 a.out
It ran OK (but seemed to be slow).

2) mpirun -np 16 a.out
It hung and brought the machine to a halt.

Any words of wisdom are appreciated.

More info:

* Open MPI 1.4.1 installed from source (tarball from your site).
* Compilers are gcc/g++/gfortran 4.4.3-4.
* OS is Fedora 12.
* The machine is a Dell box with two Intel Xeon 5540 (quad-core)
processors on a two-way motherboard, and 48 GB of RAM.
* /proc/cpuinfo indicates that hyperthreading is turned on.
(I can see 16 "processors".)

**

What should I do?

Use -mca btl ^sm ?
Use -mca btl_sm_num_fifos some_number ? (Which number?)
Use both?
Do something else?

Many thanks,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
