I would certainly try -mca btl ^sm and see if that solves the problem.

On May 4, 2010, at 2:38 PM, Eugene Loh wrote:

> Gus Correa wrote:
>
>> Dear Open MPI experts
>>
>> I need your help to get Open MPI right on a standalone
>> machine with Nehalem processors.
>>
>> How to tweak the mca parameters to avoid problems
>> with Nehalem (and perhaps AMD processors also),
>> where MPI programs hang, was discussed here before.
>>
>> However, I lost track of the details, how to work around the problem,
>> and if it was fully fixed already perhaps.
>
> Yes, perhaps the problem you're seeing is not what you remember being
> discussed.
>
> Perhaps you're thinking of https://svn.open-mpi.org/trac/ompi/ticket/2043 .
> It's presumably fixed.
>
>> I am now facing the problem directly on a single Nehalem box.
>>
>> I installed OpenMPI 1.4.1 from source,
>> and compiled the test hello_c.c with mpicc.
>> Then I tried to run it with:
>>
>> 1) mpirun -np 4 a.out
>> It ran OK (but seemed to be slow).
>>
>> 2) mpirun -np 16 a.out
>> It hung, and brought the machine to a halt.
>>
>> Any words of wisdom are appreciated.
>>
>> More info:
>>
>> * OpenMPI 1.4.1 installed from source (tarball from your site).
>> * Compilers are gcc/g++/gfortran 4.4.3-4.
>> * OS is Fedora Core 12.
>> * The machine is a Dell box with Intel Xeon 5540 (quad core)
>> processors on a two-way motherboard and 48GB of RAM.
>> * /proc/cpuinfo indicates that hyperthreading is turned on.
>> (I can see 16 "processors".)
>>
>> **
>>
>> What should I do?
>>
>> Use -mca btl ^sm ?
>> Use -mca btl -mca btl_sm_num_fifos=some_number ? (Which number?)
>> Use Both?
>> Do something else?
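
For reference, the two options asked about in this thread could be spelled out as full command lines roughly as follows. This is only a sketch: the script itself, the variable names, and the FIFO count of NP-1 are illustrative assumptions, not tested recommendations. Note that mpirun takes an MCA parameter as a name followed by a value (`-mca <name> <value>`), not as `name=value`. The script only prints the commands; it does not run mpirun.

```shell
#!/bin/sh
# Hypothetical helper: builds and prints the mpirun variants discussed above.
NP=16           # the rank count that hung in the report
PROG=./a.out    # the compiled hello_c test program

# Variant 1: exclude the shared-memory BTL ("^" means "all components except").
# Quote ^sm on the command line if your shell treats "^" specially (e.g. csh).
CMD_NO_SM="mpirun -np $NP -mca btl ^sm $PROG"

# Variant 2: keep sm but set the number of FIFOs explicitly.
# NP-1 is an assumed value chosen for illustration only.
CMD_FIFOS="mpirun -np $NP -mca btl_sm_num_fifos $((NP - 1)) $PROG"

echo "$CMD_NO_SM"
echo "$CMD_FIFOS"
```

With sm excluded on a single machine, on-node traffic would presumably fall back to another transport such as TCP loopback, so variant 1 is better suited to diagnosing the hang than to production runs.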