I would certainly try -mca btl ^sm and see if that solves the problem.
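
For concreteness, here is a rough sketch of the two command lines under discussion, assuming the test binary is ./a.out and 16 processes as in Gus's message. NUM_FIFOS is a placeholder value picked purely for illustration, not a recommendation. The script only prints the commands so they can be inspected first; it does not launch anything.

```shell
#!/bin/sh
# Sketch only: build and print two candidate mpirun command lines.
# NP and APP mirror the original report; NUM_FIFOS is a hypothetical value.
NP=16
APP=./a.out
NUM_FIFOS=8

# Option 1: exclude the shared-memory (sm) BTL altogether.
# '^sm' is quoted so that no shell treats the caret specially.
CMD_NO_SM="mpirun -np $NP -mca btl '^sm' $APP"

# Option 2: keep the sm BTL but set its number of FIFOs.
# An MCA parameter is passed as "-mca <name> <value>", with no '=' sign.
CMD_FIFOS="mpirun -np $NP -mca btl_sm_num_fifos $NUM_FIFOS $APP"

echo "$CMD_NO_SM"
echo "$CMD_FIFOS"
```

Try the first line on its own. If the hang goes away with sm excluded, that points at the shared-memory BTL; the second line is only worth trying after that.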

On May 4, 2010, at 2:38 PM, Eugene Loh wrote:

> Gus Correa wrote:
> 
>> Dear Open MPI experts
>> 
>> I need your help to get Open MPI right on a standalone
>> machine with Nehalem processors.
>> 
>> How to tweak the mca parameters to avoid problems
>> with Nehalem (and perhaps AMD processors also),
>> where MPI programs hang, was discussed here before.
>> 
>> However, I lost track of the details: how to work around the problem,
>> and whether it has already been fully fixed.
> 
> Yes, perhaps the problem you're seeing is not what you remember being 
> discussed.
> 
> Perhaps you're thinking of https://svn.open-mpi.org/trac/ompi/ticket/2043 .  
> It's presumably fixed.
> 
>> I am now facing the problem directly on a single Nehalem box.
>> 
>> I installed OpenMPI 1.4.1 from source,
>> and compiled the test hello_c.c with mpicc.
>> Then I tried to run it with:
>> 
>> 1) mpirun -np 4 a.out
>> It ran OK (but seemed to be slow).
>> 
>> 2) mpirun -np 16 a.out
>> It hung, and brought the machine to a halt.
>> 
>> Any words of wisdom are appreciated.
>> 
>> More info:
>> 
>> * OpenMPI 1.4.1 installed from source (tarball from your site).
>> * Compilers are gcc/g++/gfortran 4.4.3-4.
>> * OS is Fedora Core 12.
>> * The machine is a Dell box with Intel Xeon 5540 (quad core)
>> processors on a two-way motherboard and 48GB of RAM.
>> * /proc/cpuinfo indicates that hyperthreading is turned on.
>> (I can see 16 "processors".)
>> 
>> **
>> 
>> What should I do?
>> 
>> Use -mca btl ^sm  ?
>> Use -mca btl_sm_num_fifos some_number ? (Which number?)
>> Use Both?
>> Do something else? 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
