Dear Open MPI experts,

I need your help to get Open MPI working correctly on a standalone machine with Nehalem processors.

How to tweak the MCA parameters to avoid MPI programs hanging on Nehalem (and perhaps also AMD) processors was discussed here before. However, I lost track of the details: how to work around the problem, and whether it has already been fully fixed.

I am now facing the problem directly on a single Nehalem box. I installed OpenMPI 1.4.1 from source and compiled the test hello_c.c with mpicc. Then I tried to run it with:

1) mpirun -np 4 a.out
   It ran OK (but seemed to be slow).

2) mpirun -np 16 a.out
   It hung, and brought the machine to a halt.

Any words of wisdom are appreciated.

More info:

* OpenMPI 1.4.1 installed from source (tarball from your site).
* Compilers are gcc/g++/gfortran 4.4.3-4.
* OS is Fedora Core 12.
* The machine is a Dell box with Intel Xeon 5540 (quad core) processors on a two-way motherboard and 48GB of RAM.
* /proc/cpuinfo indicates that hyperthreading is turned on. (I can see 16 "processors".)

** What should I do?

Use -mca btl ^sm ?
Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)
Use both?
Do something else?

Many thanks,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------