Re: [OMPI users] MPI_AllReduce() deadlock on IB

2011-03-25 Thread Brock Palen
Running with rdmacm, the problem does seem to resolve itself. The code is large and complicated, but the problem does appear to arise regularly when it is run. Just FYI; can I collect extra information to help find a fix? Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.ed

Re: [OMPI users] MPI_AllReduce() deadlock on IB

2011-03-16 Thread Jeff Squyres
This could be related to https://svn.open-mpi.org/trac/ompi/ticket/2714 and/or https://svn.open-mpi.org/trac/ompi/ticket/2722. There isn't much info in the ticket, but we've been talking about it a bunch offline. IBM and Mellanox have had reports of the error, but haven't been able to reproduc

[OMPI users] MPI_AllReduce() deadlock on IB

2011-03-16 Thread Brock Palen
I have a user whose code performs fine when run on Ethernet. When run on verbs-based IB, the code deadlocks in an MPI_AllReduce() call. We are using openmpi/1.4.3 with the Intel compilers. I poked at the running code with padb and I get the following: 051525354...