Hi there,

I am trying to use Open MPI in a Docker container. Both the host and the container run CentOS 7 (7.2.1511, to be exact). When I run a simple MPI hello world application, it core dumps every time with a bus error. The Open MPI version is 2.0.2, and I compiled it inside the container. When I copy the same installation from the container to the host, it runs without any problem.
Have you ever tried to run Open MPI in a container and run into a problem like this one? If so, what could be wrong? How should I go about finding the root cause and solving the problem? The very same application runs with Intel MPI in the container without any problem. Below are my mpirun command and its output:

[root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile mpd.hosts ./mpi_hello.x
[cn15:25287] *** Process received signal ***
[cn15:25287] Signal: Bus error (7)
[cn15:25287] Signal code: Non-existant physical address (2)
[cn15:25287] Failing at address: 0x7fe2d0fbf000
[cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
[cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
[cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
[cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
[cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
[cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
[cn15:25287] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
[cn15:25287] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
[cn15:25287] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
[cn15:25287] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
[cn15:25287] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
[cn15:25287] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
[cn15:25287] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
[cn15:25287] [13] ./mpi_hello.x[0x400927]
[cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
[cn15:25287] [15] ./mpi_hello.x[0x400839]
[cn15:25287] *** End of error message ***
[cn15:25286] *** Process received signal ***
[cn15:25286] Signal: Bus error (7)
[cn15:25286] Signal code: Non-existant physical address (2)
[cn15:25286] Failing at address: 0x7fd4abb18000
[cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
[cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
[cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
[cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
[cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
[cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
[cn15:25286] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
[cn15:25286] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
[cn15:25286] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
[cn15:25286] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fd4b5587c9f]
[cn15:25286] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fd4b5564566]
[cn15:25286] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fd4b53eb0f4]
[cn15:25286] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fd4b541ecb4]
[cn15:25286] [13] ./mpi_hello.x[0x400927]
[cn15:25286] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd4b3ba6b15]
[cn15:25286] [15] ./mpi_hello.x[0x400839]
[cn15:25286] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node cn15 exited on signal 7 (Bus error).
--------------------------------------------------------------------------

Thanks in advance,
Ender
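A possible first check, offered as an assumption rather than a confirmed diagnosis: the backtrace shows both ranks faulting inside libpsm2 (psm2_ep_open, called from ompi_mtl_psm2_module_init). That is the PSM2 MTL, which Open MPI can select through the cm PML regardless of the `-mca btl sm` setting. The assumption being tested is that PSM2 keeps its intra-node shared-memory segments under /dev/shm, where Docker mounts a 64 MB tmpfs by default. If a process touches a page of a mapped tmpfs file that the filesystem cannot back, Linux delivers SIGBUS with code 2 (BUS_ADRERR, "non-existent physical address"), which matches the log. The short script below is a sketch that reports the size and free space of /dev/shm so the container and the host can be compared:

```python
import os


def shm_usage(path="/dev/shm"):
    """Return (total_bytes, free_bytes) for the filesystem mounted at `path`."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize  # overall size of the mount
    free = st.f_bavail * st.f_frsize   # space available to unprivileged processes
    return total, free


if __name__ == "__main__":
    # Assumption: PSM2 shared-memory segments live in /dev/shm.
    path = "/dev/shm"
    if os.path.isdir(path):
        total, free = shm_usage(path)
        print("%s: total %.1f MiB, free %.1f MiB"
              % (path, total / 2.0 ** 20, free / 2.0 ** 20))
    else:
        print("%s does not exist in this environment" % path)
```

If the container reports a much smaller /dev/shm than the host, starting the container with `docker run --shm-size=<size>` (or `--ipc=host`) would be a reasonable next experiment.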
_______________________________________________ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users