I have a user whose code runs fine over Ethernet. When run over verbs-based
InfiniBand, the same code deadlocks in an MPI_Allreduce() call.

We are using openmpi/1.4.3 with the Intel compilers.

I poked at the running job with padb and got the following:

0....5....1....5....2....5....3....5....4....5....
,,---,-,-,----,--,--,,-,RRRRRRRR,---,----,,--,-,-,
,,-,-,,,-,,--,-,,-,-,-,-RRRRRRRR-,-,---,,,--,,---,
,,---,-,,,,-,-,,-,-,----RRRRRRRR,----,-,--,,-----,
--,,-,-,,,,-,,------,,--RRRRRRRR,,----,,--,------,


Across multiple runs, the set of ranks stuck in MPI_Allreduce() changes.
Are there any open bugs for this? I found one, but it only affected shared
memory, and from what I could tell our version should be new enough to avoid it.
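To check whether the hung ranks really differ from run to run, the padb samples can be boiled down to plain rank numbers and compared. The Python below is a minimal sketch and not part of padb. It assumes that character i of each sample line describes MPI rank i, as the 0....5....1.... ruler suggests, and that `R` marks a rank sitting in the reduction. The function names `stuck_ranks` and `persistently_stuck` are hypothetical.

```python
"""Summarize padb sample lines by rank (hypothetical helper, not part of padb).

Assumption: character i of each sample line describes MPI rank i, matching
the 0....5....1.... ruler, and 'R' marks a rank inside the reduction.
"""


def stuck_ranks(sample: str, state: str = "R") -> list[int]:
    """Return the ranks whose column in one sample line shows `state`."""
    # rstrip() only drops trailing newline/whitespace, so every column keeps
    # its position; a leading strip() could shift the rank numbers.
    return [rank for rank, symbol in enumerate(sample.rstrip()) if symbol == state]


def persistently_stuck(samples: list[str], state: str = "R") -> list[int]:
    """Return the ranks that show `state` in every sample line."""
    if not samples:
        return []
    common = set(stuck_ranks(samples[0], state))
    for sample in samples[1:]:
        common &= set(stuck_ranks(sample, state))
    return sorted(common)


if __name__ == "__main__":
    # The four padb samples from the run shown above.
    samples = [
        ",,---,-,-,----,--,--,,-,RRRRRRRR,---,----,,--,-,-,",
        ",,-,-,,,-,,--,-,,-,-,-,-RRRRRRRR-,-,---,,,--,,---,",
        ",,---,-,,,,-,-,,-,-,----RRRRRRRR,----,-,--,,-----,",
        "--,,-,-,,,,-,,------,,--RRRRRRRR,,----,,--,------,",
    ]
    print(persistently_stuck(samples))
```

For the four samples above this prints [24, 25, 26, 27, 28, 29, 30, 31], meaning ranks 24-31 stay in `R` for the whole sampling window. Saving that list for each run makes it easy to see whether the stuck set moves between runs.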

What should I look for to diagnose the issue?

Thanks,

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



