Past attempts have indicated that only TCP works well with Docker. If you want 
to use OPA, you’re probably better off using Singularity as your container 
runtime.

http://singularity.lbl.gov/

The OMPI master has some optimized integration for Singularity, but 2.0.2 will 
work with it just fine as well.
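For reference, here is a sketch of what a TCP-only run under Docker might look like. The hostfile name and binary are placeholders borrowed from the command later in this thread; adjust them to your environment:

```shell
# Sketch: force the ob1 PML so the PSM2 MTL is never initialized, and
# restrict the BTLs to TCP, shared memory, and self (loopback).
# "mpd.hosts" and "./mpi_hello.x" are placeholder names.
mpirun --mca pml ob1 --mca btl tcp,sm,self \
       -np 2 -machinefile mpd.hosts ./mpi_hello.x
```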


> On Mar 11, 2017, at 11:09 AM, Ender GÜLER <glorifin...@gmail.com> wrote:
> 
> Hi Josh,
> 
> Thanks for your suggestion. When I added "-mca pml ob1", it worked. I do need 
> PSM support (just not in this scenario). Here's the story: 
> 
> I compiled the Open MPI source with PSM2 support because the host has an 
> Omni-Path device, and my first step was to test whether I could use the 
> hardware. I ended up testing the compiled Open MPI against the different 
> transport modes without success. 
> 
> PSM2 support works when running directly on the physical host, so I suspect 
> the Docker layer has something to do with this error, but I cannot figure out 
> what causes it.
> 
> Do you guys have any idea what to look at next? I'll ask for opinions on the 
> Docker forums, but first I'd like to gather more information, so I wondered 
> whether anyone else has run into this kind of problem before.
> 
> Regards,
> 
> Ender
> 
> On Sat, Mar 11, 2017 at 6:19 PM Josh Hursey <jjhur...@open-mpi.org> wrote:
> From the stack trace, it looks like it's failing in the PSM2 MTL, which you 
> shouldn't need (or want?) in this scenario.
> 
> Try adding this additional MCA parameter to your command line:
>  -mca pml ob1
> 
> That will force Open MPI's selection such that it avoids that component. That 
> might get you further along.
> 
> 
> On Sat, Mar 11, 2017 at 7:49 AM, Ender GÜLER <glorifin...@gmail.com> wrote:
> Hi there,
> 
> I am trying to use Open MPI in a Docker container. My host and container OS 
> is CentOS 7 (7.2.1511, to be exact). When I try to run a simple MPI hello 
> world application, the app core dumps every time with a bus error. The 
> Open MPI version is 2.0.2, and I compiled it in the container. When I copied 
> the installation from the container to the host, it ran without any problem.
> 
> Has anyone tried to run Open MPI this way and encountered a problem like this 
> one? If so, what could be wrong? What should I do to find the root cause and 
> solve the problem? The very same application runs with Intel MPI in the 
> container without any problem.
> 
> I pasted my mpirun command and its output below.
> 
> [root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile 
> mpd.hosts ./mpi_hello.x
> [cn15:25287] *** Process received signal ***
> [cn15:25287] Signal: Bus error (7)
> [cn15:25287] Signal code: Non-existant physical address (2)
> [cn15:25287] Failing at address: 0x7fe2d0fbf000
> [cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
> [cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
> [cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
> [cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
> [cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
> [cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
> [cn15:25287] [ 6] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
> [cn15:25287] [ 7] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
> [cn15:25287] [ 8] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
> [cn15:25287] [ 9] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
> [cn15:25287] [10] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
> [cn15:25287] [11] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
> [cn15:25287] [12] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
> [cn15:25287] [13] ./mpi_hello.x[0x400927]
> [cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
> [cn15:25287] [15] ./mpi_hello.x[0x400839]
> [cn15:25287] *** End of error message ***
> [cn15:25286] *** Process received signal ***
> [cn15:25286] Signal: Bus error (7)
> [cn15:25286] Signal code: Non-existant physical address (2)
> [cn15:25286] Failing at address: 0x7fd4abb18000
> [cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
> [cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
> [cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
> [cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
> [cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
> [cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
> [cn15:25286] [ 6] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
> [cn15:25286] [ 7] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
> [cn15:25286] [ 8] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
> [cn15:25286] [ 9] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fd4b5587c9f]
> [cn15:25286] [10] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fd4b5564566]
> [cn15:25286] [11] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fd4b53eb0f4]
> [cn15:25286] [12] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fd4b541ecb4]
> [cn15:25286] [13] ./mpi_hello.x[0x400927]
> [cn15:25286] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd4b3ba6b15]
> [cn15:25286] [15] ./mpi_hello.x[0x400839]
> [cn15:25286] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node cn15 exited on signal 7 
> (Bus error).
> --------------------------------------------------------------------------
> 
> Thanks in advance,
> 
> Ender
> 
> 
> 
> 
> -- 
> Josh Hursey
> IBM Spectrum MPI Developer

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
