Hi Josh,

Thanks for the suggestion. When I added "-mca pml ob1", it worked. I do
actually need PSM2 support (just not in this scenario). Here's the story:

I compiled the Open MPI source with PSM2 support because the host has an
OmniPath device. My first step was to test whether I could use the hardware
at all, and I ended up trying the compiled Open MPI against the different
transport modes without success.

PSM2 support works when running directly on the physical host, so I suspect
the Docker layer has something to do with this error, but I cannot figure
out what causes it.
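
For what it's worth, the two container-level things I plan to rule out first
are the size of /dev/shm (Docker's default is only 64 MB, which can trigger a
SIGBUS when a transport mmaps a larger shared-memory region) and whether the
OmniPath device node is mapped into the container at all. Something along
these lines, where the device path and shm size are my guesses for this
setup, not verified:

  docker run --shm-size=2g --device=/dev/hfi1_0 ...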

Do you have any idea what to look at next? I'll ask for opinions on the
Docker Forums, but before that I want to gather more information, and I
wondered whether anyone else has run into this kind of problem before.

Regards,

Ender

On Sat, Mar 11, 2017 at 6:19 PM Josh Hursey <jjhur...@open-mpi.org> wrote:

> From the stack trace it looks like it's failing in the PSM2 MTL, which you
> shouldn't need (or want?) in this scenario.
>
> Try adding this additional MCA parameter to your command line:
>  -mca pml ob1
>
> That will force Open MPI's selection such that it avoids that component.
> That might get you further along.
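>
> With your original invocation otherwise unchanged, the full command line
> would look something like this:
>
>   mpirun --allow-run-as-root -mca pml ob1 -mca btl sm -np 2 -machinefile mpd.hosts ./mpi_hello.x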
>
>
> On Sat, Mar 11, 2017 at 7:49 AM, Ender GÜLER <glorifin...@gmail.com>
> wrote:
>
> Hi there,
>
> I am trying to use Open MPI in a Docker container. My host and container OS
> are both CentOS 7 (7.2.1511 to be exact). When I try to run a simple MPI
> hello-world application, the app core dumps every time with a bus error. The
> Open MPI version is 2.0.2, and I compiled it in the container. When I copied
> the installation from the container to the host, it ran without any problem.
>
> Have you ever run Open MPI in a container and encountered a problem like
> this one? If so, what could be wrong? What should I do to find the root
> cause and solve the problem? The very same application runs with Intel MPI
> in the container without any problem.
>
> I pasted my mpirun command and its output below.
>
> [root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile
> mpd.hosts ./mpi_hello.x
> [cn15:25287] *** Process received signal ***
> [cn15:25287] Signal: Bus error (7)
> [cn15:25287] Signal code: Non-existant physical address (2)
> [cn15:25287] Failing at address: 0x7fe2d0fbf000
> [cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
> [cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
> [cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
> [cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
> [cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
> [cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
> [cn15:25287] [ 6]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
> [cn15:25287] [ 7]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
> [cn15:25287] [ 8]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
> [cn15:25287] [ 9]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
> [cn15:25287] [10]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
> [cn15:25287] [11]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
> [cn15:25287] [12]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
> [cn15:25287] [13] ./mpi_hello.x[0x400927]
> [cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
> [cn15:25287] [15] ./mpi_hello.x[0x400839]
> [cn15:25287] *** End of error message ***
> [cn15:25286] *** Process received signal ***
> [cn15:25286] Signal: Bus error (7)
> [cn15:25286] Signal code: Non-existant physical address (2)
> [cn15:25286] Failing at address: 0x7fd4abb18000
> [cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
> [cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
> [cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
> [cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
> [cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
> [cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
> [cn15:25286] [ 6]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
> [cn15:25286] [ 7]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
> [cn15:25286] [ 8]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
> [cn15:25286] [ 9]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fd4b5587c9f]
> [cn15:25286] [10]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fd4b5564566]
> [cn15:25286] [11]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fd4b53eb0f4]
> [cn15:25286] [12]
> /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fd4b541ecb4]
> [cn15:25286] [13] ./mpi_hello.x[0x400927]
> [cn15:25286] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd4b3ba6b15]
> [cn15:25286] [15] ./mpi_hello.x[0x400839]
> [cn15:25286] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node cn15 exited on
> signal 7 (Bus error).
> --------------------------------------------------------------------------
>
> Thanks in advance,
>
> Ender
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
>
>
> --
> Josh Hursey
> IBM Spectrum MPI Developer