Hi, I have read conflicting statements about OMPI support for virtual interfaces.
The Open MPI FAQ says that virtual IP interfaces are not supported and that this cannot be worked around with either btl_tcp_if_include or btl_tcp_if_exclude:
https://www.open-mpi.org/faq/?category=tcp#ip-virtual-ip-interfaces

However, elsewhere I read that you can exclude the virtual interfaces by specifying -mca btl_tcp_if_exclude virbr0,lo:
https://github.com/open-mpi/ompi/issues/6377

I am trying this out on different machines and find that specifying btl_tcp_if_exclude virbr0,lo works on one pair of machines but not on another pair. I am hoping to get an explanation of why one works and the other does not.

On the pair of machines where it does not work, I tried to generate some verbose output by specifying -mca btl_base_verbose 30, but the job just hangs and does not print any messages:

$ mpirun -np 4 --mca btl_base_verbose 30 --mca btl_tcp_if_exclude virbr0,virbr1,virbr2,virbr3,lo --hostfile host.txt /home/vipulk/mpitest2 100
.....
....
<no output and remains stuck forever>

The ifconfig output for the 2 machines in the host list is given below.

Thanks,
Vipul

Host1:

eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 175.148.218.46 netmask 255.255.255.0 broadcast 175.148.218.255
        inet6 fe80::9af2:b3ff:fe2a:3e84 prefixlen 64 scopeid 0x20<link>
        ether 98:f2:b3:2a:3e:84 txqueuelen 1000 (Ethernet)
        RX packets 5938671220 bytes 6033195902625 (5.4 TiB)
        RX errors 0 dropped 534674 overruns 0 frame 0
        TX packets 3933921252 bytes 3077919856788 (2.7 TiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::be68:2aa2:8b42:d6d prefixlen 64 scopeid 0x20<link>
        ether 98:f2:b3:2a:3e:85 txqueuelen 1000 (Ethernet)
        RX packets 2355308 bytes 279699254 (266.7 MiB)
        RX errors 0 dropped 350 overruns 0 frame 0
        TX packets 60 bytes 8732 (8.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

eno3: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 98:f2:b3:2a:3e:86 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno4: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 98:f2:b3:2a:3e:87 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 3161146200 bytes 225991248912 (210.4 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 3161146200 bytes 225991248912 (210.4 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:0a:cd:21 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr3: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.123.1 netmask 255.255.255.0 broadcast 192.168.123.255
        ether 52:54:00:0a:cd:22 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Host2:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 175.148.218.210 netmask 255.255.255.0 broadcast 175.148.218.255
        inet6 fe80::9af2:b3ff:fe2a:3e78 prefixlen 64 scopeid 0x20<link>
        ether 98:f2:b3:2a:3e:78 txqueuelen 1000 (Ethernet)
        RX packets 8632800 bytes 3938419917 (3.6 GiB)
        RX errors 0 dropped 350 overruns 0 frame 0
        TX packets 5504444 bytes 1791707074 (1.6 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255
        inet6 fe80::9af2:b3ff:fe2a:3e79 prefixlen 64 scopeid 0x20<link>
        ether 98:f2:b3:2a:3e:79 txqueuelen 1000 (Ethernet)
        RX packets 2317163 bytes 275220791 (262.4 MiB)
        RX errors 0 dropped 350 overruns 0 frame 0
        TX packets 336 bytes 26726 (26.0 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 32539 bytes 2540603 (2.4 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 32539 bytes 2540603 (2.4 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.123.1 netmask 255.255.255.0 broadcast 192.168.123.255
        ether 52:54:00:0a:cd:22 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:0a:cd:21 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

--mca btl_tcp_if_exclude virbr0,lo works on machines with the configuration below:

Host3:

eno1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:c8:40 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:c8:41 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

eno3: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:c8:42 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno4: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:c8:43 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

eno5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 65.10.19.30 netmask 255.255.255.192 broadcast 65.10.19.63
        inet6 fe80::8230:e0ff:fe20:96a8 prefixlen 64 scopeid 0x20<link>
        ether 80:30:e0:20:96:a8 txqueuelen 1000 (Ethernet)
        RX packets 1618138239 bytes 1552281705604 (1.4 TiB)
        RX errors 184 dropped 0 overruns 184 frame 0
        TX packets 1500861577 bytes 1593767198059 (1.4 TiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 34 memory 0xe8000000-e87fffff

eno6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 80:30:e0:20:96:ac txqueuelen 1000 (Ethernet)
        RX packets 1299786 bytes 150289059 (143.3 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 77 memory 0xe7000000-e77fffff

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 20936389 bytes 2632538104 (2.4 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 20936389 bytes 2632538104 (2.4 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:05:7c:dd txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Host4:

eno1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:b8:5c txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:b8:5d txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

eno3: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:b8:5e txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 16

eno4: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 80:30:e0:3b:b8:5f txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 17

eno5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 65.10.19.29 netmask 255.255.255.192 broadcast 65.10.19.63
        inet6 fe80::8230:e0ff:fe20:96c0 prefixlen 64 scopeid 0x20<link>
        ether 80:30:e0:20:96:c0 txqueuelen 1000 (Ethernet)
        RX packets 2904054722 bytes 2656941056010 (2.4 TiB)
        RX errors 11 dropped 0 overruns 11 frame 0
        TX packets 5801141892 bytes 7474409123677 (6.7 TiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 34 memory 0xe8000000-e87fffff

eno6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether 80:30:e0:20:96:c4 txqueuelen 1000 (Ethernet)
        RX packets 1299694 bytes 150265217 (143.3 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
        device interrupt 77 memory 0xe7000000-e77fffff

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 19850956 bytes 5578561316 (5.1 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 19850956 bytes 5578561316 (5.1 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
        ether 52:54:00:79:33:89 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
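
In case it helps when comparing the two pairs, here is a small Python sketch that tabulates which IPv4 addresses from the listings above are configured on more than one host of a pair once the excluded interfaces are dropped. The dictionary is hand-copied from the ifconfig output (lo and interfaces without an IPv4 address are left out), and the names PAIRS, EXCLUDED and shared_addresses are made up for this illustration. It only summarizes the listings; it does not by itself show what the TCP BTL does with these addresses.

```python
import ipaddress
from collections import defaultdict

# IPv4 addresses hand-copied from the ifconfig listings in this message.
# Netmask 255.255.255.0 is written as /24 and 255.255.255.192 as /26.
PAIRS = {
    "hangs": {
        "Host1": {"eno1": "175.148.218.46/24", "eno2": "192.168.1.2/24",
                  "virbr2": "192.168.122.1/24", "virbr3": "192.168.123.1/24"},
        "Host2": {"eth0": "175.148.218.210/24", "eth1": "192.168.1.2/24",
                  "virbr0": "192.168.123.1/24", "virbr1": "192.168.122.1/24"},
    },
    "works": {
        "Host3": {"eno5": "65.10.19.30/26", "virbr0": "192.168.122.1/24"},
        "Host4": {"eno5": "65.10.19.29/26", "virbr0": "192.168.122.1/24"},
    },
}

# Interface names passed to btl_tcp_if_exclude in the hanging run.
EXCLUDED = {"virbr0", "virbr1", "virbr2", "virbr3", "lo"}


def shared_addresses(hosts, excluded):
    """Return {ip: [(host, ifname), ...]} for IPs that appear on more than one host."""
    seen = defaultdict(list)
    for host, ifaces in hosts.items():
        for name, cidr in ifaces.items():
            if name in excluded:
                continue  # mimic the exclude list: skip these interfaces
            ip = ipaddress.ip_interface(cidr).ip
            seen[str(ip)].append((host, name))
    return {ip: where for ip, where in seen.items() if len(where) > 1}


if __name__ == "__main__":
    for label, hosts in PAIRS.items():
        print(label, shared_addresses(hosts, EXCLUDED))
```

With the data above, the pair that hangs still has 192.168.1.2 on both Host1 (eno2) and Host2 (eth1) after the virbr* interfaces and lo are excluded, while the pair that works has no address in common. The interface names also differ between Host1 (eno*) and Host2 (eth*), whereas Host3 and Host4 use the same names.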