Hi,

We are seeing the following issue with Iprobe on our clusters running Open MPI 1.2.2. Here is the code and related information:
======= Modules currently loaded:

(sn31)/projects>module list
Currently Loaded Modulefiles:
  1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1
  2) compilers/intel-9.1-f040-c045
  3) misc/env-openmpi-1.2
  4) mpi/openmpi-1.2.2_mx_intel-9.1-f040-c045
  5) libraries/intel-mkl

======= Source code:

(sn31)/projects/>more probeTest.cc

#include <mpi.h>
#include <cassert>

int main(int argc, char* argv[])
{
  MPI::Init(argc, argv);

  const int rank = MPI::COMM_WORLD.Get_rank();
  const int size = MPI::COMM_WORLD.Get_size();
  const int sendProc = (rank + size - 1) % size;
  const int recvProc = (rank + 1) % size;
  const int tag = 1;

  // send an asynchronous message
  const int sendVal = 1;
  MPI::Request sendRequest =
    MPI::COMM_WORLD.Isend(&sendVal, 1, MPI_INT, recvProc, tag);

  // wait for message to arrive
  while (!MPI::COMM_WORLD.Iprobe(sendProc, tag)) {}  // This line causes problems

  // Receive asynchronous message
  int recvVal;
  MPI::Request recvRequest =
    MPI::COMM_WORLD.Irecv(&recvVal, 1, MPI_INT, sendProc, tag);
  recvRequest.Wait();

  MPI::Finalize();
}

======= Compiled with:

(sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/bin/mpicxx -I/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/include -g -c -o probeTest.o probeTest.cc

(sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/bin/mpicxx -g -o probeTest -L/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib probeTest.o -lmpi
/projects/global/x86_64/compilers/intel/intel-9.1-cce-045/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail

======= Error at runtime:

(sn31)/projects>mpiexec -n 1 ./probeTest
[sn31:17616] *** Process received signal ***
[sn31:17616] Signal: Segmentation fault (11)
[sn31:17616] Signal code: Address not mapped (1)
[sn31:17616] Failing at address: 0x8
[sn31:17616] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
[sn31:17616] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
[sn31:17616] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
[sn31:17616] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
[sn31:17616] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
[sn31:17616] [ 5] ./probeTest(main+0x147) [0x40480b]
[sn31:17616] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
[sn31:17616] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
[sn31:17616] *** End of error message ***
mpiexec noticed that job rank 0 with PID 17616 on node sn31 exited on signal 11 (Segmentation fault).
(sn31)/projects/ceptre/sdpautz/NWCC/temp>mpiexec -n 2 ./probeTest
[sn31:17621] *** Process received signal ***
[sn31:17620] *** Process received signal ***
[sn31:17620] Signal: Segmentation fault (11)
[sn31:17620] Signal code: Address not mapped (1)
[sn31:17620] Failing at address: 0x8
[sn31:17620] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
[sn31:17620] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
[sn31:17620] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
[sn31:17620] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
[sn31:17620] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
[sn31:17620] [ 5] ./probeTest(main+0x147) [0x40480b]
[sn31:17620] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
[sn31:17620] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
[sn31:17620] *** End of error message ***
[sn31:17621] Signal: Segmentation fault (11)
[sn31:17621] Signal code: Address not mapped (1)
[sn31:17621] Failing at address: 0x8
[sn31:17621] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
[sn31:17621] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
[sn31:17621] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
[sn31:17621] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
[sn31:17621] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
[sn31:17621] [ 5] ./probeTest(main+0x1ad) [0x404871]
[sn31:17621] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
[sn31:17621] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
[sn31:17621] *** End of error message ***
mpiexec noticed that job rank 0 with PID 17620 on node sn31 exited on signal 11 (Segmentation fault).
1 additional process aborted (not shown)

======= Additional Information:

The call to Iprobe appears to cause the problem: if that line is removed, the code completes normally. The failure also occurs with the GCC compilers. MPICH appears to work, at least with the Intel compiler.
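For reference, this is the variant that completes normally for us; it is the same test with the Iprobe busy-wait loop removed and nothing else changed:

#include <mpi.h>

int main(int argc, char* argv[])
{
  MPI::Init(argc, argv);

  const int rank = MPI::COMM_WORLD.Get_rank();
  const int size = MPI::COMM_WORLD.Get_size();
  const int sendProc = (rank + size - 1) % size;
  const int recvProc = (rank + 1) % size;
  const int tag = 1;

  // send an asynchronous message
  const int sendVal = 1;
  MPI::Request sendRequest =
    MPI::COMM_WORLD.Isend(&sendVal, 1, MPI_INT, recvProc, tag);

  // receive the message directly; the Wait blocks until it arrives,
  // so no Iprobe loop is used here
  int recvVal;
  MPI::Request recvRequest =
    MPI::COMM_WORLD.Irecv(&recvVal, 1, MPI_INT, sendProc, tag);
  recvRequest.Wait();

  MPI::Finalize();
}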
======= Hardware information:

[root@spirit1 ~]# mx_info -q
MX Version: 1.2.1-rc20
MX Build: r...@tocc1.sandia.gov:/projects/global/src/myricom/mx-1.2.1-rc20 Thu Jun 7 17:08:02 MDT 2007
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
  8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0: 333.2 MHz LANai, 133.3 MHz PCI bus, 4 MB SRAM
  Status: Running, P0: Link up, P1: Link up
  Network: Myrinet 2000
  MAC Address: 00:60:dd:48:ba:ae
  Product code: M3F2-PCIXE-4
  Part number: 09-02878
  Serial number: 219851
  Mapper (P0): 00:60:dd:48:c0:08, version = 0x01920f75, configured
  Mapped hosts: 506
  Mapper (P1): 00:60:dd:48:c0:08, version = 0x01920f75, configured
  Mapped hosts: 506

======= Build environment:

cat /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/BUILD_ENV
# Build Environment:
USE="doc icc modules mx torque"
COMPILER="intel-9.1-f040-c045"
CC="icc"
CXX="icpc"
CLINKER="icc"
FC="ifort"
F77="ifort"
CFLAGS=" -O3 -pipe"
CXXFLAGS=" -O3 -pipe"
FFLAGS=" -O3"
MODULE_DEST="/apps/modules/modulefiles/mpi"
MODULE_FILE="openmpi-1.2.2_mx_intel-9.1-f040-c045"
INSTALL_DEST="/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx"
CONF_FLAGS=" --with-mx=/opt/mx --with-tm=/apps/torque"

=======

Thanks in advance for any help/advice you can provide.

-Sophia