Running with rdmacm, the problem does seem to resolve itself.
The code is large and complicated, but the problem does appear to arise
regularly when run.
Just FYI. Is there any extra information I can collect to help find a fix?
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.ed
This could be related to https://svn.open-mpi.org/trac/ompi/ticket/2714 and/or
https://svn.open-mpi.org/trac/ompi/ticket/2722.
There isn't much info in the ticket, but we've been talking about it a bunch
offline. IBM and Mellanox have had reports of the error, but haven't been able
to reproduce