Hello, I want to force OpenMPI to use TCP and in particular use a particular subnet. Unfortunately, I can't manage to do that.

Here is what I tried:

    $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H ib1n,ib2n bash -c 'echo $PMIX_SERVER_URI2'

The expected result would be a list of IP addresses in the 10.233.0.0 subnet, but instead I get this:

    2659516416.2;tcp4://127.0.0.1:46777
    2659516416.2;tcp4://127.0.0.1:46777
    2659516416.1;tcp4://127.0.0.1:45055
    2659516416.1;tcp4://127.0.0.1:45055

Could you help me debug this problem?

IP addresses in the desired subnet are available on the nodes. Running

    $BIN/mpirun --mca pml ob1 --mca btl tcp,self --mca ptl_tcp_remote_connections 1 --mca btl_tcp_if_include '10.233.0.0/19' -np 4 --oversubscribe -H ib1n,ib2n ip addr show dev br0

returns a set of bridges looking like:

    9: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 94:de:80:ba:37:e4 brd ff:ff:ff:ff:ff:ff
        inet 141.76.49.17/26 brd 141.76.49.63 scope global br0
           valid_lft forever preferred_lft forever
        inet 10.233.0.82/19 scope global br0
           valid_lft forever preferred_lft forever
        inet6 2002:8d4c:3001:48:40de:80ff:feba:37e4/64 scope global deprecated mngtmpaddr dynamic
           valid_lft 59528sec preferred_lft 0sec
        inet6 fe80::96de:80ff:feba:37e4/64 scope link tentative dadfailed
           valid_lft forever preferred_lft forever

    <three others are similar>

What is even more puzzling is that if I attach a debugger at opal/mca/pmix/pmix3x/pmix/src/mca/ptl/tcp/ptl_tcp_components.c around line 500, I see that mca_ptl_tcp_component.remote_connections is false. This means that the component parameters I set are being ignored.

--
Regards,
Maksym Planeta
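
For reference, below is a minimal sketch of how I check whether an address from the output above falls into the 10.233.0.0/19 subnet. It is plain POSIX shell arithmetic and does not involve OpenMPI at all; the helper names ip_to_int and in_subnet are my own, made up for this illustration, and it assumes well-formed dotted-quad IPv4 addresses without leading zeros.

```shell
# Hypothetical helpers for illustration only; not part of OpenMPI or PMIx.

# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on "." into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDR NET/BITS: succeeds if ADDR lies inside NET/BITS.
# Note: uses global variables, since "local" is not POSIX.
in_subnet() {
    addr=$1
    net=${2%/*}        # part before the slash, e.g. 10.233.0.0
    bits=${2#*/}       # part after the slash, e.g. 19
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$addr") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Addresses taken from the output quoted above.
for a in 10.233.0.82 141.76.49.17 127.0.0.1; do
    if in_subnet "$a" 10.233.0.0/19; then
        echo "$a is inside 10.233.0.0/19"
    else
        echo "$a is outside 10.233.0.0/19"
    fi
done
```

With these inputs only 10.233.0.82 is reported as inside the subnet; 127.0.0.1, which is what PMIX_SERVER_URI2 shows, is outside it.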