I'm seeing a crash in the openib btl on ompi-trunk when running any test, whether my own programs or generic ones. For example, running IMB pingpong gives the following:
$ mpirun --n 2 --host vic12,vic20 -mca btl openib,self # /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 pingpong
--------------------------------------------------------------------------
WARNING: No HCA parameters were found for the HCA that Open MPI detected:

    Hostname:           vic20
    HCA vendor ID:      0x1425
    HCA vendor part ID: 48

Default HCA parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_hca_param_files MCA parameter to set values for your HCA.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_hca_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: No HCA parameters were found for the HCA that Open MPI detected:

    Hostname:           vic12
    HCA vendor ID:      0x1425
    HCA vendor part ID: 48

Default HCA parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_hca_param_files MCA parameter to set values for your HCA.

NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_hca_params_found to 0.
--------------------------------------------------------------------------
[vic20:04339] *** Process received signal ***
[vic12:04539] *** Process received signal ***
[vic12:04539] Signal: Segmentation fault (11)
[vic12:04539] Signal code: Address not mapped (1)
[vic12:04539] Failing at address: 0xffffffffffffffea
[vic20:04339] Signal: Segmentation fault (11)
[vic20:04339] Signal code: Address not mapped (1)
[vic20:04339] Failing at address: 0xffffffffffffffea
[vic20:04339] [ 0] /lib64/libpthread.so.0 [0x35db80dd40]
[vic20:04339] [ 1] /usr/lib64/libibverbs.so.1(ibv_create_srq+0x3e) [0x32b7e083be]
[vic20:04339] [ 2] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so [0x2aaaaf0bdc27]
[vic20:04339] [ 3] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so [0x2aaaaf0be07e]
[vic20:04339] [ 4] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0x857) [0x2aaaaf0bd97c]
[vic20:04339] [ 5] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x37d) [0x2aaaaeeb399e]
[vic20:04339] [ 6] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x15c) [0x2aaaaec9036b]
[vic20:04339] [ 7] /usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(ompi_mpi_init+0xb2b) [0x2aaaaab03817]
[vic20:04339] [ 8] /usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(MPI_Init+0x15d) [0x2aaaaab44dc9]
[vic20:04339] [ 9] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1(main+0x29) [0x402df9]
[vic20:04339] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x35dac1d8a4]
[vic20:04339] [11] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 [0x402d39]
[vic20:04339] *** End of error message ***
[vic12:04539] [ 0] /lib64/libpthread.so.0 [0x3a7dc0dd40]
[vic12:04539] [ 1] /usr/lib64/libibverbs.so.1(ibv_create_srq+0x3e) [0x3e82e083be]
[vic12:04539] [ 2] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so [0x2aaaaf0bdc27]
[vic12:04539] [ 3] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so [0x2aaaaf0be07e]
[vic12:04539] [ 4] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0x857) [0x2aaaaf0bd97c]
[vic12:04539] [ 5] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x37d) [0x2aaaaeeb399e]
[vic12:04539] [ 6] /usr/mpi/gcc/openmpi-trunk/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x15c) [0x2aaaaec9036b]
[vic12:04539] [ 7] /usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(ompi_mpi_init+0xb2b) [0x2aaaaab03817]
[vic12:04539] [ 8] /usr/mpi/gcc/openmpi-trunk/lib64/libmpi.so.0(MPI_Init+0x15d) [0x2aaaaab44dc9]
[vic12:04539] [ 9] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1(main+0x29) [0x402df9]
[vic12:04539] [10] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a7d01d8a4]
[vic12:04539] [11] /usr/mpi/gcc/openmpi-trunk/tests/IMB-2.3/IMB-MPI1 [0x402d39]
[vic12:04539] *** End of error message ***
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 4339 on node vic20
calling "abort". This will have caused other processes in the
application to be terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

I am not having any problems running this test with the openib btl on the ompi-1.2 branch, and I can run this test successfully with the udapl and tcp btls on ompi-trunk. Is anyone else seeing this problem?

Thanks,
Jon