On Jul 12, 2019, at 18:33, Adrian Reber via users <users@lists.open-mpi.org> wrote:
So upstream Podman was really fast and merged a PR which makes my
wrapper unnecessary:
Add support for --env-host: https://github.com/containers/libpod/pull/3557
As commented in the PR, I can now start mpirun with Podman without a
wrapper:
$ mpirun --hostfile ~/hosts --mca orte_tmpdir_base /tmp/podman-mpirun \
    podman run --env-host --security-opt label=disable \
    -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id --net=host \
    mpi-test /home/mpi/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 0 has completed ring
Rank 0 has completed MPI_Barrier
Rank 1 has completed ring
Rank 1 has completed MPI_Barrier
This example was using TCP; on an InfiniBand-based system I have to
map the InfiniBand devices into the container:
$ mpirun --mca btl ^openib --hostfile ~/hosts \
    --mca orte_tmpdir_base /tmp/podman-mpirun \
    podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun \
    --security-opt label=disable --userns=keep-id \
    --device /dev/infiniband/uverbs0 --device /dev/infiniband/umad0 \
    --device /dev/infiniband/rdma_cm --net=host \
    mpi-test /home/mpi/ring
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 0 has completed ring
Rank 0 has completed MPI_Barrier
Rank 1 has completed ring
Rank 1 has completed MPI_Barrier
This is all running without root and only using Podman's rootless
support.
Running multiple processes on one system, however, still gives me an
error. If I disable vader, I guess Open MPI falls back to TCP for
localhost communication, and that works; with vader it fails.
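Excluding vader looks roughly like this, using the same ^ exclusion
syntax as for openib above; apart from the extra --mca btl option the
command is just the TCP example from before, so treat it as a sketch:
$ mpirun --mca btl ^vader --hostfile ~/hosts \
    --mca orte_tmpdir_base /tmp/podman-mpirun \
    podman run --env-host --security-opt label=disable \
    -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id --net=host \
    mpi-test /home/mpi/ring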
With vader enabled, the first error message I get is a segfault:
[test1:00001] *** Process received signal ***
[test1:00001] Signal: Segmentation fault (11)
[test1:00001] Signal code: Address not mapped (1)
[test1:00001] Failing at address: 0x7fb7b1552010
[test1:00001] [ 0] /lib64/libpthread.so.0(+0x12d80)[0x7f6299456d80]
[test1:00001] [ 1]
/usr/lib64/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_send+0x3db)[0x7f628b33ab0b]
[test1:00001] [ 2]
/usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_rdma+0x1fb)[0x7f62901d24bb]
[test1:00001] [ 3]
/usr/lib64/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0xfd6)[0x7f62901be086]
[test1:00001] [ 4]
/usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Send+0x1bd)[0x7f62996f862d]
[test1:00001] [ 5] /home/mpi/ring[0x400b76]
[test1:00001] [ 6] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f62990a3813]
[test1:00001] [ 7] /home/mpi/ring[0x4008be]
[test1:00001] *** End of error message ***
Guessing that vader uses shared memory, this is expected to fail with
all the namespace isolation in place (maybe not with a segfault, but
each container has its own shared memory). So the next step was to use
the host's IPC and PID namespaces and to mount /dev/shm:
'-v /dev/shm:/dev/shm --ipc=host --pid=host'
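Combined with the TCP command from above, the full invocation looks
roughly like this (a sketch; how the ranks are distributed onto one
node via the hostfile is not shown here):
$ mpirun --hostfile ~/hosts --mca orte_tmpdir_base /tmp/podman-mpirun \
    podman run --env-host --security-opt label=disable \
    -v /tmp/podman-mpirun:/tmp/podman-mpirun -v /dev/shm:/dev/shm \
    --ipc=host --pid=host --userns=keep-id --net=host \
    mpi-test /home/mpi/ring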
This does not segfault, but the output still does not look correct:
Rank 0 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 2 has cleared MPI_Init
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
[test1:17722] Read -1, expected 80000, errno = 1
Rank 0 has completed ring
Rank 2 has completed ring
Rank 0 has completed MPI_Barrier
Rank 1 has completed ring
Rank 2 has completed MPI_Barrier
Rank 1 has completed MPI_Barrier
This is using the Open MPI ring.c example with SIZE increased from 20 to 20000.
Any recommendations on what vader needs in order to communicate correctly?
Adrian
On Thu, Jul 11, 2019 at 12:07:35PM +0200, Adrian Reber via users wrote:
Gilles,
thanks for pointing out the environment variables. I quickly created a
wrapper which tells Podman to re-export all OMPI_ and PMIX_ variables
(grep "\(PMIX\|OMPI\)"); a rough sketch of it is included further below.
Now it works:
$ mpirun --hostfile ~/hosts ./wrapper -v /tmp:/tmp --userns=keep-id --net=host \
    mpi-test /home/mpi/hello
Hello, world (2 procs total)
--> Process # 0 of 2 is alive. ->test1
--> Process # 1 of 2 is alive. ->test2
I need to tell Podman to mount /tmp from the host into the container.
As I am running rootless, I also need to tell Podman to use the same
user ID inside the container as outside (so that the Open MPI files in
/tmp can be shared), and I am also running without a network namespace.
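A rough sketch of such a wrapper (the details of my actual script may
differ):
#!/bin/sh
# Sketch: re-export every OMPI_ and PMIX_ variable that mpirun/orted set
# up into the container, then pass all remaining arguments on to
# "podman run". Giving only the variable name to -e makes Podman copy
# the value from the host environment.
env_args=""
for var in $(env | grep "\(PMIX\|OMPI\)" | cut -d= -f1); do
    env_args="$env_args -e $var"
done
exec podman run $env_args "$@"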
So this is now running with the full isolation Podman provides, except
for the network namespace. Thanks for your help!
Adrian
On Thu, Jul 11, 2019 at 04:47:21PM +0900, Gilles Gouaillardet via users wrote:
Adrian,
the MPI application relies on some environment variables (they typically
start with OMPI_ and PMIX_).
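For example, launching something trivial such as env should show them
(the exact set differs between Open MPI versions):
$ mpirun -np 1 env | grep "\(PMIX\|OMPI\)"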
The MPI application internally uses a PMIx client that must be able to
contact a PMIx server (that is included in mpirun and the orted
daemon(s) spawned on the remote hosts) located on the same host.
If podman provides some isolation between the app inside the container (e.g.
/home/mpi/hello)
and the outside world (e.g. mpirun/orted), that won't be an easy ride.
Cheers,
Gilles
On 7/11/2019 4:35 PM, Adrian Reber via users wrote:
I did a quick test to see if I can use Podman in combination with Open
MPI:
[test@test1 ~]$ mpirun --hostfile ~/hosts podman run \
    quay.io/adrianreber/mpi-test /home/mpi/hello
Hello, world (1 procs total)
--> Process # 0 of 1 is alive. ->789b8fb622ef
Hello, world (1 procs total)
--> Process # 0 of 1 is alive. ->749eb4e1c01a
The test program (hello) is taken from
https://raw.githubusercontent.com/openhpc/ohpc/obs/OpenHPC_1.3.8_Factory/tests/mpi/hello.c
The problem with this is that each process thinks it is process 0 of 1,
instead of the expected:
Hello, world (2 procs total)
--> Process # 1 of 2 is alive. ->test1
--> Process # 0 of 2 is alive. ->test2
My question is: how is the rank determined? What resources do I need to
have in my container to correctly determine the rank?
This is Podman 1.4.2 and Open MPI 4.0.1.
Adrian