I’m not entirely sure I understand what you are trying to do. The PMIX_SERVER_URI2 environment variable tells local clients how to connect to their local PMIx server (i.e., the OMPI daemon on that node). This connection is always made over the loopback device, since it is purely local and is never used for MPI messages.
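To make the point above concrete, here is a minimal Python sketch that takes the values from the output quoted below and checks where each address lives. It is not part of Open MPI or PMIx. The helper name `parse_server_uri` is hypothetical, and the layout `<id>;<scheme>://<host>:<port>` is an assumption inferred only from the four sample lines in this thread.

```python
import ipaddress


def parse_server_uri(value):
    """Split '<id>;<scheme>://<host>:<port>' into its parts.

    The layout is assumed from the sample output in this thread;
    it is not taken from the PMIx documentation.
    """
    server_id, _, endpoint = value.partition(";")
    scheme, _, hostport = endpoint.partition("://")
    host, _, port = hostport.rpartition(":")
    return server_id, scheme, ipaddress.ip_address(host), int(port)


# Values copied from the mpirun output quoted in this thread.
samples = [
    "2659516416.2;tcp4://127.0.0.1:46777",
    "2659516416.1;tcp4://127.0.0.1:45055",
]

# The subnet passed to btl_tcp_if_include in the original command.
subnet = ipaddress.ip_network("10.233.0.0/19")

for value in samples:
    server_id, scheme, addr, port = parse_server_uri(value)
    # Report whether the address is loopback and whether it is in the subnet.
    print(server_id, addr, port,
          "loopback" if addr.is_loopback else "not loopback",
          "in subnet" if addr in subnet else "outside subnet")
```

Both sample addresses are loopback addresses outside 10.233.0.0/19, which is what you would expect for a node-local client-to-server connection. It says nothing about which interface the TCP BTL picks.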
I’m sure that the TCP BTL is using the subnet you indicated, as that is what would be used for internode messages.

> On Jun 18, 2018, at 3:52 PM, Maksym Planeta <mplan...@os.inf.tu-dresden.de> wrote:
>
> Hello,
>
> I want to force OpenMPI to use TCP and, in particular, to use a particular subnet. Unfortunately, I can't manage to do that.
>
> Here is what I try:
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'
>
> The expected result would be a list of IP addresses in the 10.233.0.0 subnet, but instead I get this:
>
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.2;tcp4://127.0.0.1:46777
> 2659516416.1;tcp4://127.0.0.1:45055
> 2659516416.1;tcp4://127.0.0.1:45055
>
> Could you help me to debug this problem somehow?
>
> The IP addresses are completely available in the desired subnet:
>
> $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H ib1n,ib2n ip addr show dev br0
>
> Returns a set of bridges looking like:
>
> 9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
>     link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
>     inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
>        valid_lft forever preferred_lft forever
>     inet 10.233.0.82/19 scope global br0
>        valid_lft forever preferred_lft forever
>     inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global deprecated mngtmpaddr dynamic
>        valid_lft 59528sec preferred_lft 0sec
>     inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed
>        valid_lft forever preferred_lft forever
> <three others are similar>
>
> What is more boggling is that if I attach with a debugger at opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around line 500, I see that mca_ptl_tcp_component.remote_connections is false. This means that the way I set up component parameters is ignored.
>
> --
> Regards,
> Maksym Planeta
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users