On 10/20/2010 7:59 PM, Ralph Castain wrote:
> The error message seems to imply that mpirun itself didn't segfault, but that
> something else did. Is that segfault pid from mpirun?
> This kind of problem usually is caused by mismatched builds - i.e., you compile
> against your new build, but you pick up the Myrinet build when you try to run
> because of path and ld_library_path issues. You might check to ensure you are
> running against what you built with.
PATH and LD_LIBRARY_PATH are set explicitly (through modules) on the
frontend and on each node. The PGI compiler and the OpenMPI build I am
trying to run are set for each.
ldd /share/apps/opt/OpenMPI/1.4.2/PGI/10.4/bin/mpirun
    libopen-rte.so.0 => /share/apps/opt/OpenMPI/1.4.2/PGI/10.4/lib/libopen-rte.so.0 (0x00002b6a16552000)
    libopen-pal.so.0 => /share/apps/opt/OpenMPI/1.4.2/PGI/10.4/lib/libopen-pal.so.0 (0x00002b6a167aa000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003a7dc00000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003a80400000)
    libutil.so.1 => /lib64/libutil.so.1 (0x0000003a88a00000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a7e000000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003a7d800000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003a7d400000)
    libpgc.so => /share/apps/opt/PGI/10.4/linux86-64/10.4/libso/libpgc.so (0x00002b6a16a28000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003a7d000000)
The mpirun that works, from the other tree:
ldd /opt/openmpi-myrinet_mx/bin/mpirun
    libopen-rte.so.0 => /opt/openmpi-myrinet_mx/lib/libopen-rte.so.0 (0x00002b51c71b0000)
    libopen-pal.so.0 => /opt/openmpi-myrinet_mx/lib/libopen-pal.so.0 (0x00002b51c7430000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003a7dc00000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003a80400000)
    libutil.so.1 => /lib64/libutil.so.1 (0x0000003a88a00000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003a7d800000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a7e000000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003a7d400000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003a7d000000)
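
The "running against what you built with" check can also be scripted instead of read by eye. The following is only a rough sketch: the function name libs_outside_prefix is made up for illustration, and it assumes ldd prints one unwrapped "name => path (address)" entry per line. It reads ldd output on stdin and prints any Open MPI runtime library (libopen-rte, libopen-pal) that resolves outside the install prefix given as its argument, so empty output means nothing was picked up from another tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): report Open MPI runtime
# libraries that the dynamic linker resolves outside an expected prefix.
# Input:  `ldd` output on stdin, one "name => path (addr)" entry per line.
# Arg 1:  the install prefix the libraries are expected to live under.
libs_outside_prefix() {
    awk -v prefix="$1" '
        # Only look at the Open MPI runtime libraries.
        $1 ~ /^libopen-(rte|pal)\.so/ && $2 == "=>" {
            # ldd prints "not found" when a library cannot be resolved.
            path = ($3 == "not") ? "not found" : $3
            # Report the entry unless the path starts with "<prefix>/".
            if (index(path, prefix "/") != 1)
                print $1, "->", path
        }'
}

# Example use, with the prefix taken from the listing above:
#   ldd "$(which mpirun)" | libs_outside_prefix /share/apps/opt/OpenMPI/1.4.2/PGI/10.4
```

Running the same pipeline in the environment of each compute node, not only on the frontend, would show whether a node picks up libraries from a different tree.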