Past attempts have indicated that only TCP works well with Docker - if you want to use OPA, you're probably better off using Singularity as your container: http://singularity.lbl.gov/
The OMPI master has some optimized integration for Singularity, but 2.0.2 will work with it just fine as well.
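For example, a common pattern keeps mpirun on the host (where it can see the OPA hardware) and runs each rank inside the image. A minimal sketch, assuming a hypothetical image name centos7-ompi.img built from your existing container (exact build/run flags vary by Singularity version):

    # mpirun runs on the host; each rank executes inside the Singularity image
    mpirun -np 2 -machinefile mpd.hosts \
        singularity exec centos7-ompi.img ./mpi_hello.x

Note that with this model the Open MPI build inside the image needs to be compatible with the host-side mpirun.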
> On Mar 11, 2017, at 11:09 AM, Ender GÜLER <glorifin...@gmail.com> wrote:
>
> Hi Josh,
>
> Thanks for your suggestion. When I added "-mca pml ob1" it worked. Actually I need the psm support (but not in this scenario). Here's the story:
>
> I compiled the OpenMPI source with psm2 support because the host has an OmniPath device, and my first step was to test whether I could use the hardware. I ended up testing the compiled OpenMPI against the different transport modes without success.
>
> The psm2 support works when running directly on the physical host, so I suppose the Docker layer has something to do with this error. But I cannot figure out what causes this situation.
>
> Do you guys have any idea what to look at next? I'll ask for opinions on the Docker Forums, but before that I want to gather more information, and I wondered whether anyone else has had this kind of problem before.
>
> Regards,
>
> Ender
>
> On Sat, Mar 11, 2017 at 6:19 PM Josh Hursey <jjhur...@open-mpi.org> wrote:
>
> From the stack trace it looks like it's failing in the PSM2 MTL, which you shouldn't need (or want?) in this scenario.
>
> Try adding this additional MCA parameter to your command line:
>
>     -mca pml ob1
>
> That will force Open MPI's selection such that it avoids that component. That might get you further along.
>
> On Sat, Mar 11, 2017 at 7:49 AM, Ender GÜLER <glorifin...@gmail.com> wrote:
>
> Hi there,
>
> I am trying to use OpenMPI in a Docker container. My host and container OS is CentOS 7 (7.2.1511 to be exact). When I try to run a simple MPI hello world application, the app core dumps every time with a BUS ERROR. The OpenMPI version is 2.0.2, and I compiled it in the container. When I copy the installation from the container to the host, it runs without any problem.
>
> Has anyone tried to run OpenMPI this way and encountered a problem like this one? If so, what could be wrong? What should I do to find the root cause and solve the problem? The very same application runs with IntelMPI in the container without any problem.
>
> I pasted my mpirun command and its output below.
>
> [root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile mpd.hosts ./mpi_hello.x
> [cn15:25287] *** Process received signal ***
> [cn15:25287] Signal: Bus error (7)
> [cn15:25287] Signal code: Non-existant physical address (2)
> [cn15:25287] Failing at address: 0x7fe2d0fbf000
> [cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
> [cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
> [cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
> [cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
> [cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
> [cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
> [cn15:25287] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
> [cn15:25287] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
> [cn15:25287] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
> [cn15:25287] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
> [cn15:25287] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
> [cn15:25287] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
> [cn15:25287] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
> [cn15:25287] [13] ./mpi_hello.x[0x400927]
> [cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
> [cn15:25287] [15] ./mpi_hello.x[0x400839]
> [cn15:25287] *** End of error message ***
> [cn15:25286] *** Process received signal ***
> [cn15:25286] Signal: Bus error (7)
> [cn15:25286] Signal code: Non-existant physical address (2)
> [cn15:25286] Failing at address: 0x7fd4abb18000
> [cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
> [cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
> [cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
> [cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
> [cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
> [cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
> [cn15:25286] [ 6] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
> [cn15:25286] [ 7] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
> [cn15:25286] [ 8] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
> [cn15:25286] [ 9] /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fd4b5587c9f]
> [cn15:25286] [10] /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fd4b5564566]
> [cn15:25286] [11] /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fd4b53eb0f4]
> [cn15:25286] [12] /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fd4b541ecb4]
> [cn15:25286] [13] ./mpi_hello.x[0x400927]
> [cn15:25286] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd4b3ba6b15]
> [cn15:25286] [15] ./mpi_hello.x[0x400839]
> [cn15:25286] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 0 on node cn15 exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
> Thanks in advance,
>
> Ender
>
> --
> Josh Hursey
> IBM Spectrum MPI Developer
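One more note for anyone hitting the same bus error: "-mca btl sm" alone does not avoid PSM2, because the BTL list only constrains the ob1 PML. During startup Open MPI prefers the cm PML whenever a matching MTL (psm2 here) initializes successfully, which is exactly where the stack trace shows the crash (mca_pml_base_select -> ompi_mtl_base_select -> ompi_mtl_psm2_module_init). Forcing the PML skips the MTL path entirely. A minimal sketch of the working command; adding the self BTL is an extra assumption on my part, since it is generally needed for loopback sends once you restrict the BTL list:

    # force the ob1 PML so the psm2 MTL is never opened
    mpirun --allow-run-as-root -np 2 -machinefile mpd.hosts \
        -mca pml ob1 -mca btl sm,self ./mpi_hello.x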