Re: [OMPI users] How is the rank determined (Open MPI and Podman)
I had a look at it and not sure if it really makes sense.

In btl_vader_{put,get}.c it would be easy to check the user namespace ID of
the other process, but the function would then just return OPAL_ERROR a bit
earlier instead of as a result of process_vm_{read,write}v(). Nothing would
really change.

A better place for the check would be mca_btl_vader_check_single_copy(), but
I do not know whether the PID of the other processes is already known at that
point, and I am not sure how to check the user namespace ID of the other
processes.

Any recommendations how to do this?

		Adrian

On Sun, Jul 21, 2019 at 03:08:01PM -0400, Nathan Hjelm wrote:
> Patches are always welcome. What would be great is a nice big warning that
> CMA support is disabled because the processes are in different namespaces.
> Ideally all MPI processes should be in the same namespace to ensure the
> best performance.
>
> -Nathan
>
>> On Jul 21, 2019, at 2:53 PM, Adrian Reber via users wrote:
>>
>> For completeness I am mentioning my results also here.
>>
>> Mounting file systems in the container can only work if user namespaces
>> are used. And even if the user IDs are all the same (in each container
>> and on the host), the kernel checks, before allowing ptrace, that the
>> processes are in the same user namespace (in addition to being owned by
>> the same user). This check - same user namespace - fails, and so
>> process_vm_readv() and process_vm_writev() also fail.
>>
>> So Open MPI's checks are currently not enough to detect whether 'cma'
>> can be used. Checking for the same user namespace would also be
>> necessary.
>>
>> Is this a use case important enough to accept a patch for it?
>>
>>		Adrian
>>
>>> On Fri, Jul 12, 2019 at 03:42:15PM +0200, Adrian Reber via users wrote:
>>> Gilles,
>>>
>>> thanks again. Adding '--mca btl_vader_single_copy_mechanism none' helps
>>> indeed.
>>>
>>> The default seems to be 'cma', which uses process_vm_readv() and
>>> process_vm_writev(). That seems to require CAP_SYS_PTRACE, but telling
>>> Podman to give the process CAP_SYS_PTRACE with '--cap-add=SYS_PTRACE'
>>> does not seem to be enough. Not sure yet whether this is related to the
>>> fact that Podman is running rootless. I will continue to investigate,
>>> but now I know where to look. Thanks!
>>>
>>>		Adrian
>>>
>>>> On Fri, Jul 12, 2019 at 06:48:59PM +0900, Gilles Gouaillardet via users wrote:
>>>> Adrian,
>>>>
>>>> Can you try
>>>> mpirun --mca btl_vader_copy_mechanism none ...
>>>>
>>>> Please double check the MCA parameter name, I am AFK.
>>>>
>>>> IIRC, the default copy mechanism used by vader directly accesses the
>>>> remote process address space, and this requires some permission
>>>> (ptrace?) that might be dropped by podman.
>>>>
>>>> Note Open MPI might not detect that both MPI tasks run on the same
>>>> node because of podman. If you use UCX, then btl/vader is not used at
>>>> all (pml/ucx is used instead).
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> Sent from my iPod
>>>>
>>>>> On Jul 12, 2019, at 18:33, Adrian Reber via users wrote:
>>>>>
>>>>> So upstream Podman was really fast and merged a PR which makes my
>>>>> wrapper unnecessary:
>>>>>
>>>>> Add support for --env-host: https://github.com/containers/libpod/pull/3557
>>>>>
>>>>> As commented in the PR I can now start mpirun with Podman without a
>>>>> wrapper:
>>>>>
>>>>> $ mpirun --hostfile ~/hosts --mca orte_tmpdir_base /tmp/podman-mpirun \
>>>>>     podman run --env-host --security-opt label=disable \
>>>>>     -v /tmp/podman-mpirun:/tmp/podman-mpirun --userns=keep-id \
>>>>>     --net=host mpi-test /home/mpi/ring
>>>>> Rank 0 has cleared MPI_Init
>>>>> Rank 1 has cleared MPI_Init
>>>>> Rank 0 has completed ring
>>>>> Rank 0 has completed MPI_Barrier
>>>>> Rank 1 has completed ring
>>>>> Rank 1 has completed MPI_Barrier
>>>>>
>>>>> This example was using TCP; on an InfiniBand-based system I have to
>>>>> map the InfiniBand devices into the container:
>>>>>
>>>>> $ mpirun --mca btl ^openib --hostfile ~/hosts \
>>>>>     --mca orte_tmpdir_base /tmp/podman-mpirun \
>>>>>     podman run --env-host -v /tmp/podman-mpirun:/tmp/podman-mpirun \
>>>>>     --security-opt label=disable --userns=keep-id \
>>>>>     --device /dev/infiniband/uverbs0 --device /dev/infiniband/umad0 \
>>>>>     --device /dev/infiniband/rdma_cm --net=host \
>>>>>     mpi-test /home/mpi/ring
>>>>> Rank 0 has cleared MPI_Init
>>>>> Rank 1 has cleared MPI_Init
>>>>> Rank 0 has completed ring
>>>>> Rank 0 has completed MPI_Barrier
>>>>> Rank 1 has completed ring
>>>>> Rank 1 has completed MPI_Barrier
>>>>>
>>>>> This is all running without root and only using Podman's rootless
>>>>> support.
>>>>>
>>>>> Running multiple processes on one system, however, still gives me an
>>>>> error. If I disable vader, I guess Open MPI uses TCP for localhost
>>>>> communication, and that works. But with vader it fails. The first
>>>>> error message I get is a segfault:
>>>>>
>>>>> [test1:1] *** Process received signal ***
>>>>> [test1:1] Signal: Segmentation fault (11)
>>>>> [test1:1] Signal code: Address not mapped (1)
>>>>> [test1:1] Failing at address: 0x7fb7b1552010
>>>>> [test1:1] [ 0] /lib64/libpthread.so.0(+0x12d80)[0x7f6
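On Linux, a process's user namespace is identified by the inode behind
/proc/<pid>/ns/user: per namespaces(7), two processes share a user namespace
exactly when the st_ino (and st_dev) of those files match. A minimal,
standalone sketch of such a check (an illustration, not Open MPI code; note
the caveat that stat'ing another process's /proc entry can itself fail across
container boundaries, which is one argument for exchanging the ID through the
modex instead, as suggested in the next message):

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Return 1 if the two PIDs share a user namespace, 0 if not,
     * -1 on error (e.g. no permission to inspect the peer's /proc). */
    int same_user_ns(pid_t a, pid_t b)
    {
        char path_a[64], path_b[64];
        struct stat st_a, st_b;

        snprintf(path_a, sizeof(path_a), "/proc/%d/ns/user", (int) a);
        snprintf(path_b, sizeof(path_b), "/proc/%d/ns/user", (int) b);

        if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0) {
            return -1;
        }
        return (st_a.st_ino == st_b.st_ino && st_a.st_dev == st_b.st_dev);
    }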
Re: [OMPI users] How is the rank determined (Open MPI and Podman)
Adrian,

An option is to involve the modex: each task would OPAL_MODEX_SEND() its own
namespace ID, and then OPAL_MODEX_RECV() the one from its peers and decide
whether CMA support can be enabled.

Cheers,

Gilles

On 7/22/2019 4:53 PM, Adrian Reber via users wrote:
> I had a look at it and not sure if it really makes sense.
>
> In btl_vader_{put,get}.c it would be easy to check the user namespace ID
> of the other process, but the function would then just return OPAL_ERROR
> a bit earlier instead of as a result of process_vm_{read,write}v().
> Nothing would really change.
>
> A better place for the check would be mca_btl_vader_check_single_copy(),
> but I do not know if at this point the PID of the other processes is
> already known.
>
> Any recommendations how to do this?
>
> [...]
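For context, vader's 'cma' single-copy mechanism boils down to calls like the
one sketched below; when the two processes sit in different user namespaces,
the kernel's ptrace-style access check rejects the call even though the UIDs
match. A standalone illustration under that assumption, not the actual btl
code:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Sketch: read len bytes from remote_addr in process pid into buf -
     * the primitive behind btl/vader's "cma" single-copy mechanism. */
    ssize_t cma_read(pid_t pid, void *remote_addr, void *buf, size_t len)
    {
        struct iovec local  = { .iov_base = buf,         .iov_len = len };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n < 0) {
            /* Same UID is not enough: the caller must also share the
             * target's user namespace (or hold ptrace privilege over it). */
            fprintf(stderr, "CMA read failed: %s\n", strerror(errno));
        }
        return n;
    }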
Re: [OMPI users] When is it safe to free the buffer after MPI_Isend?
> On Jul 21, 2019, at 11:31 AM, carlos aguni via users wrote:
>
> MPI_Isend()
> ... some stuff..
> flag = 0;
> MPI_Test(req, &flag, &status);
> if (flag) {
>     free(buffer);
> }
>
> After the free() I'm getting errors like:
>
> [[58327,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send]
>     mca_btl_tcp_frag_send: writev error (0x2b9daf474000, 12800)
>     Bad address(1)
> [[58327,1],0][btl_tcp_frag.c:130:mca_btl_tcp_frag_send]
>     mca_btl_tcp_frag_send: writev error (0x2b9daf473ee8, 19608)
>     Bad address(1)
> pml_ob1_sendreq.c:308 FATAL

Do you get the same error if you don't free()?

--
Jeff Squyres
jsquy...@cisco.com
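For reference, the general rule is that a buffer handed to MPI_Isend() may
only be freed once the request has completed, i.e. after MPI_Test() has set
flag to a non-zero value or after MPI_Wait() has returned; a single
MPI_Test() is not guaranteed to observe completion, so if flag is still 0 the
buffer must not be freed until a later MPI_Test()/MPI_Wait() succeeds. A
minimal sketch of the safe pattern (hypothetical buffer and destination, not
the poster's actual code):

    #include <mpi.h>
    #include <stdlib.h>

    /* Free an MPI_Isend() buffer only after the request completes. */
    void send_and_free(void *buffer, int count, int dest, MPI_Comm comm)
    {
        MPI_Request req;
        int flag = 0;

        MPI_Isend(buffer, count, MPI_BYTE, dest, 0 /* tag */, comm, &req);

        /* ... overlap other work here ... */

        MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        if (!flag) {
            /* Not complete yet: wait instead of freeing early. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        free(buffer); /* safe: the request has completed */
    }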
Re: [OMPI users] How is the rank determined (Open MPI and Podman)
If that works, then it might be possible to include the namespace ID in the
job-info provided by PMIx at startup - would have to investigate, so please
confirm that the modex option works first.

> On Jul 22, 2019, at 1:22 AM, Gilles Gouaillardet via users wrote:
>
> Adrian,
>
> An option is to involve the modex: each task would OPAL_MODEX_SEND() its
> own namespace ID, and then OPAL_MODEX_RECV() the one from its peers and
> decide whether CMA support can be enabled.
>
> Cheers,
>
> Gilles
>
> [...]
Re: [OMPI users] How is the rank determined (Open MPI and Podman)
I have most of the code ready, but I still have trouble doing
OPAL_MODEX_RECV. I am using the following lines, based on the code from
orte/test/mpi/pmix.c:

    OPAL_MODEX_SEND_VALUE(rc, OPAL_PMIX_LOCAL, "user_ns_id", &value, OPAL_INT);

This sets rc to 0. For receiving:

    OPAL_MODEX_RECV_VALUE(rc, "user_ns_id", &wildcard_rank, &ptr, OPAL_INT);

and rc is always set to -13. Is this how it is supposed to work, or do I
have to do it differently?

		Adrian

On Mon, Jul 22, 2019 at 02:03:20PM, Ralph Castain via users wrote:
> If that works, then it might be possible to include the namespace ID in
> the job-info provided by PMIx at startup - would have to investigate, so
> please confirm that the modex option works first.
>
> [...]
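As a data point, -13 appears to correspond to OPAL_ERR_NOT_FOUND in
opal/include/opal/constants.h (worth verifying against the tree in use),
which would mean no value was found for the process name passed in. One thing
worth checking is whether the receive should be keyed by the publishing
peer's process name rather than the wildcard rank, since a value a task sends
is stored under that task's own name. A sketch of what the receive side might
look like under that assumption (peer_proc is a placeholder for the peer's
opal_process_name_t; this is not code from the actual patch):

    /* Hedged sketch, inside btl/vader source where the OPAL_MODEX_*
     * macros are available.  Fetch the "user_ns_id" value that one
     * specific peer published with OPAL_MODEX_SEND_VALUE(). */
    int rc;
    int *ns_id_ptr = NULL;

    OPAL_MODEX_RECV_VALUE(rc, "user_ns_id", &peer_proc, &ns_id_ptr, OPAL_INT);
    if (OPAL_SUCCESS == rc && NULL != ns_id_ptr) {
        /* compare *ns_id_ptr with this process's own namespace ID */
    }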
Re: [OMPI users] How is the rank determined (Open MPI and Podman)
Just add it to the existing modex.

-Nathan

> On Jul 22, 2019, at 12:20 PM, Adrian Reber via users wrote:
>
> I have most of the code ready, but I still have trouble doing
> OPAL_MODEX_RECV. I am using the following lines, based on the code from
> orte/test/mpi/pmix.c:
>
>     OPAL_MODEX_SEND_VALUE(rc, OPAL_PMIX_LOCAL, "user_ns_id", &value, OPAL_INT);
>
> This sets rc to 0. For receiving:
>
>     OPAL_MODEX_RECV_VALUE(rc, "user_ns_id", &wildcard_rank, &ptr, OPAL_INT);
>
> and rc is always set to -13. Is this how it is supposed to work, or do I
> have to do it differently?
>
> [...]
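As a hypothetical illustration of that suggestion - the struct name and
fields below are invented for the sketch and do not reflect vader's real
modex layout - the idea would be to piggyback the user-namespace ID (the
inode of /proc/self/ns/user) on the data each rank already publishes, so the
receiver can compare it with its own before enabling CMA:

    #include <sys/stat.h>
    #include <sys/types.h>

    struct example_modex_t {
        /* ... fields the component already exchanges ... */
        ino_t user_ns_inode; /* new field: sender's user-namespace ID */
    };

    static int fill_user_ns_inode(struct example_modex_t *modex)
    {
        struct stat st;

        if (0 != stat("/proc/self/ns/user", &st)) {
            return -1; /* e.g. /proc not mounted */
        }
        modex->user_ns_inode = st.st_ino;
        return 0;
    }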