From the stack trace it looks like it's failing in the PSM2 MTL, which you shouldn't need (or want?) in this scenario.
Try adding this additional MCA parameter to your command line:

  -mca pml ob1

That will force Open MPI's selection such that it avoids that component. That might get you further along.

On Sat, Mar 11, 2017 at 7:49 AM, Ender GÜLER <glorifin...@gmail.com> wrote:

> Hi there,
>
> I try to use openmpi in a docker container. My host and container OS is
> CentOS 7 (7.2.1511 to be exact). When I try to run a simple MPI hello world
> application, the app core dumps every time with BUS ERROR. The OpenMPI
> version is 2.0.2 and I compiled in the container. When I copied the
> installation from container to host, it runs without any problem.
>
> Have you ever tried to run OpenMPI and encountered a problem like this
> one. If so what can be wrong? What should I do to find the root cause and
> solve the problem? The very same application can be run with IntelMPI in
> the container without any problem.
>
> I pasted the output of my mpirun command and its output below.
>
> [root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile mpd.hosts ./mpi_hello.x
> [cn15:25287] *** Process received signal ***
> [cn15:25287] Signal: Bus error (7)
> [cn15:25287] Signal code: Non-existant physical address (2)
> [cn15:25287] Failing at address: 0x7fe2d0fbf000
> [cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
> [cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
> [cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
> [cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
> [cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
> [cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
> [cn15:25287] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
> [cn15:25287] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
> [cn15:25287] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
> [cn15:25287] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
> [cn15:25287] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
> [cn15:25287] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
> [cn15:25287] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
> [cn15:25287] [13] ./mpi_hello.x[0x400927]
> [cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
> [cn15:25287] [15] ./mpi_hello.x[0x400839]
> [cn15:25287] *** End of error message ***
> [cn15:25286] *** Process received signal ***
> [cn15:25286] Signal: Bus error (7)
> [cn15:25286] Signal code: Non-existant physical address (2)
> [cn15:25286] Failing at address: 0x7fd4abb18000
> [cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
> [cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
> [cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
> [cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
> [cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
> [cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
> [cn15:25286] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
> [cn15:25286] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
> [cn15:25286] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
> [cn15:25286] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fd4b5587c9f]
> [cn15:25286] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fd4b5564566]
> [cn15:25286] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fd4b53eb0f4]
> [cn15:25286] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fd4b541ecb4]
> [cn15:25286] [13] ./mpi_hello.x[0x400927]
> [cn15:25286] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd4b3ba6b15]
> [cn15:25286] [15] ./mpi_hello.x[0x400839]
> [cn15:25286] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node cn15 exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
> Thanks in advance,
>
> Ender
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

--
Josh Hursey
IBM Spectrum MPI Developer