On 10/20/2010 8:30 PM, Scott Atchley wrote:
On Oct 20, 2010, at 9:22 PM, Raymond Muno wrote:
On 10/20/2010 7:59 PM, Ralph Castain wrote:
The error message seems to imply that mpirun itself didn't segfault, but that
something else did. Is that segfault pid from mpirun?
This kind of problem is usually caused by mismatched builds, i.e., you compile
against your new build, but you pick up the Myrinet build when you try to run
because of PATH and LD_LIBRARY_PATH issues. You might check to ensure you are
running against what you built with.
The PATH and LD_LIBRARY_PATH are set explicitly (through modules) on the
frontend and on each node. The PGI compiler and the OpenMPI build I am trying
to run are set for each.
<snip>
Are you building OMPI with support for both MX and IB? If not, and you only want
MX support, try configuring OMPI using --disable-memory-manager (check
configure for the exact option).
We have fixed this bug in the most recent 1.4.x and 1.5.x releases.
Scott
I just downloaded 1.4.3 and compiled it with PGI 10.4. I get the same
result.
I did confirm that the process ID shown is that of mpirun.
This cluster only has Myrinet. The install is a fresh build, separate from the
one on the IB cluster. I will try the configure option.
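Ralph's suggestion to make sure you are running against the build you compiled with can be checked roughly by comparing where `mpirun` resolves on PATH with where the MPI library resolves on LD_LIBRARY_PATH. The following is a minimal sketch, not an Open MPI tool. It assumes a conventional `<prefix>/bin` and `<prefix>/lib` (or `lib64`) layout and a shared library named `libmpi.so`. The function names are hypothetical. It also only scans LD_LIBRARY_PATH. The real dynamic loader also consults rpath and the system cache, so running `ldd` on the actual binary remains the authoritative check.

```python
import os
from pathlib import Path
from typing import NamedTuple, Optional


class InstallCheck(NamedTuple):
    bin_dir: Optional[Path]   # directory where mpirun was found on PATH
    lib_dir: Optional[Path]   # directory where the MPI library was found
    consistent: bool          # True if both share the same install prefix


def resolve_first(name: str, search_path: str) -> Optional[Path]:
    """Return the first directory in a search path (os.pathsep-separated) containing `name`."""
    for d in search_path.split(os.pathsep):
        if d and (Path(d) / name).exists():
            return Path(d)
    return None


def check_mpi_install(path: Optional[str] = None,
                      ld_library_path: Optional[str] = None,
                      lib_name: str = "libmpi.so") -> InstallCheck:
    """Hypothetical helper: compare the prefixes of the first mpirun and first libmpi found.

    Assumes the <prefix>/bin and <prefix>/lib[64] layout; defaults to the current environment.
    """
    path = os.environ.get("PATH", "") if path is None else path
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    bin_dir = resolve_first("mpirun", path)
    lib_dir = resolve_first(lib_name, ld_library_path)
    if bin_dir is None or lib_dir is None:
        return InstallCheck(bin_dir, lib_dir, False)
    # Resolve symlinks so that two routes to the same install compare equal.
    same_prefix = bin_dir.resolve().parent == lib_dir.resolve().parent
    return InstallCheck(bin_dir, lib_dir, same_prefix)


if __name__ == "__main__":
    result = check_mpi_install()
    print(f"mpirun found in: {result.bin_dir}")
    print(f"libmpi found in: {result.lib_dir}")
    print("same install prefix" if result.consistent
          else "WARNING: possible mismatched MPI builds")
```

Running it on the frontend and inside a job on a compute node (where the modules may load differently) would show whether both environments point at the same fresh 1.4.3 install.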