I’m not entirely sure I understand what you are trying to do. The 
PMIX_SERVER_URI2 envar tells local clients how to connect to their local PMIx 
server (i.e., the OMPI daemon on that node). This is always done over the 
loopback device since it is a purely local connection that is never used for 
MPI messages.

I’m sure that the TCP BTL is using the subnet you indicated, as that is what 
would be used for internode messages.
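
To make the distinction concrete, here is a minimal sketch that parses one of the values printed in the quoted message and checks it against the requested subnet. It assumes the value has the form `<id>;tcp4://<ip>:<port>`, which is inferred only from the output shown below and is not a documented format; `parse_server_uri` is a hypothetical helper, not part of Open MPI or PMIx.

```python
import ipaddress

def parse_server_uri(uri):
    """Split '<id>;tcp4://<ip>:<port>' into (id, ip, port).

    Hypothetical helper: the layout is assumed from the sample output,
    and only the IPv4 (tcp4) form is handled.
    """
    ident, _, endpoint = uri.partition(";")
    _scheme, _, hostport = endpoint.partition("://")
    host, _, port = hostport.rpartition(":")
    return ident, ipaddress.ip_address(host), int(port)

# Subnet passed to btl_tcp_if_include in the original command.
subnet = ipaddress.ip_network("10.233.0.0/19")

ident, addr, port = parse_server_uri("2659516416.2;tcp4://127.0.0.1:46777")

print(addr.is_loopback)  # the local PMIx server endpoint is on loopback
print(addr in subnet)    # so it is not expected to fall inside the BTL subnet
print(ipaddress.ip_address("10.233.0.82") in subnet)  # the br0 address is
```

The point of the sketch is only that the address in PMIX_SERVER_URI2 and the address range given to btl_tcp_if_include answer two different questions, so one being on loopback says nothing about which interface the TCP BTL picked.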


> On Jun 18, 2018, at 3:52 PM, Maksym Planeta <mplan...@os.inf.tu-dresden.de> 
> wrote:
> 
> Hello,
> 
> I want to force Open MPI to use TCP and, specifically, a particular subnet. 
> Unfortunately, I can't manage to do that.
> 
> Here is what I tried:
> 
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self \
>     --mca ptl_tcp_remote_connections 1 \
>     --mca btl_tcp_if_include '10.233.0.0/19' \
>     -np 4 --oversubscribe -H ib1n,ib2n \
>     bash -c 'echo $PMIX_SERVER_URI2'
> 
> The expected result would be a list of IP addresses in 10.233.0.0 subnet, but 
> instead I get this:
> 
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.1;tcp4://127.0.0.1:45055
> 2659516416.1;tcp4://127.0.0.1:45055
> 
> Could you help me to debug this problem somehow?
> 
> The IP addresses in the desired subnet are available. Running:
> 
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self \
>     --mca ptl_tcp_remote_connections 1 \
>     --mca btl_tcp_if_include '10.233.0.0/19' \
>     -np 4 --oversubscribe -H ib1n,ib2n \
>     ip addr show dev br0
> 
> Returns a set of bridges looking like:
> 
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>    link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
>    inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
>       valid_lft forever preferred_lft forever
>    inet 10.233.0.82/19 scope global br0
>       valid_lft forever preferred_lft forever
>    inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global deprecated mngtmpaddr dynamic
>       valid_lft 59528sec preferred_lft 0sec
>    inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed 
>       valid_lft forever preferred_lft forever
> <three others are similar>
> 
> What is more puzzling is that if I attach a debugger at 
> opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around line 
> 500, I see that mca_ptl_tcp_component.remote_connections is false. This means 
> that the way I set the component parameters is being ignored.
> 
> -- 
> Regards,
> Maksym Planeta
> 
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
