We have a code (arts) that locks up only when running over IB. It
works fine over tcp and sm.
When we ran it in a debugger, it locked up in an MPI_Comm_split()
call that, as far as I could tell, was valid.
Because the split was a hack they added so that MPI_File_open() could
be called on a single CPU, we reworked the code to remove the split.
The code then locked up again.
This time it locked up in an MPI_Allreduce(), which was really
strange: when running on 8 CPUs, only rank 4 would get stuck. The
rest of the ranks were fine and got the right value. (We are using
DDT as our debugger.)
It's very strange. Do you have any idea what could cause this to
happen? We are using openmpi-1.2.3/1.2.6 with the PGI compilers.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985