You might want to make sure that the runtime is working properly before going 
too much further.  E.g., try mpirun'ing hostname (i.e., the Linux command -- a 
non-MPI program) and make sure that that works.

If that works, then try mpirun'ing the "hello world" example program that comes 
in the examples/ directory in the Open MPI tarball.  That program just 
initializes and finalizes MPI; it does no actual MPI communication.

If that works, then try mpirun'ing the "ring" example program in the same 
examples/ directory.  That does very simple MPI communication.
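
The three checks above can be chained in a small script so that each stage runs only if the previous one passed, and a hang is reported instead of sitting there forever. This is a minimal sketch, not something shipped with Open MPI. It assumes a hostfile named host.txt, that hello_c and ring_c were built from the examples/ directory and exist at the same path on every host, and an arbitrary 60-second limit. HOSTFILE, NP, and LIMIT are hypothetical knobs, and it relies on coreutils timeout.

```shell
#!/usr/bin/env bash
# Staged Open MPI smoke test: each stage runs only if the previous one passed.
# Assumptions (adjust for your site): HOSTFILE names your hostfile, and
# hello_c / ring_c were built from the Open MPI examples/ directory and exist
# at the same path on every host.
HOSTFILE=${HOSTFILE:-host.txt}
NP=${NP:-2}
LIMIT=${LIMIT:-60}   # seconds before a stage is treated as hung

# Run one stage under coreutils timeout; exit status 124 means it was killed.
run_stage() {
    local name=$1
    shift
    echo "== $name: $*"
    timeout "$LIMIT" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "== $name: HUNG (killed after ${LIMIT}s)"
    elif [ "$rc" -ne 0 ]; then
        echo "== $name: FAILED (exit $rc)"
    else
        echo "== $name: OK"
    fi
    return "$rc"
}

smoke_test() {
    run_stage "runtime only (hostname)" \
        mpirun -np "$NP" --hostfile "$HOSTFILE" hostname &&
    run_stage "MPI init/finalize (hello_c)" \
        mpirun -np "$NP" --hostfile "$HOSTFILE" ./hello_c &&
    run_stage "MPI communication (ring_c)" \
        mpirun -np "$NP" --hostfile "$HOSTFILE" ./ring_c
}

smoke_test
```

The first stage that fails or hangs points at the layer to investigate: the runtime itself, MPI initialization, or MPI communication.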



> On Jun 23, 2020, at 2:39 PM, Kulshrestha, Vipul 
> <vipul_kulshres...@mentor.com> wrote:
> 
> Thanks for the clarification, Jeff.
>  
> I am using Open MPI 4.0.1
>  
> Once fully set up, I intend to run my application in conjunction with the 
> grid, so the resources will be allocated by the grid. This makes it very 
> difficult to specify an IP address for btl_tcp_if_include.
>  
> With the interfaces excluded by name, it still hung (with no output) when I 
> specified btl_base_verbose 100.
>  
> As an experiment, I will try using CIDR notation for the hosts below.
>  
> Regards,
> Vipul
>  
>  
>  
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com] 
> Sent: Tuesday, June 23, 2020 1:36 PM
> To: Open MPI User's List <users@lists.open-mpi.org>
> Cc: Kulshrestha, Vipul <vipul_kulshres...@mentor.com>
> Subject: Re: [OMPI users] Question about virtual interface
>  
> https://www.open-mpi.org/faq/?category=tcp#ip-virtual-ip-interfaces is 
> referring to interfaces like "eth0:0", where the Linux kernel will have the 
> same index for both "eth0" and "eth0:0".  This will cause Open MPI to get 
> confused (because it identifies Ethernet interfaces by their kernel indexes).
>  
> If you have non-physical Ethernet interfaces (like virbr0, etc.), those should 
> work just fine with btl_tcp_if_include|exclude.
>  
> What version of Open MPI are you using?
>  
> You might want to use "--mca btl_tcp_if_include CIDR", where CIDR is the 
> CIDR notation for the subnet you want to use.  This will allow your app to 
> work even if that network is on different Ethernet interfaces on different 
> hosts.  For example:
>  
>     mpirun --mca btl_tcp_if_include 192.168.10.0/24 ...
>  
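
To turn the inet address and netmask that ifconfig prints into the CIDR string for btl_tcp_if_include, a small POSIX shell helper is enough. This is a sketch that assumes a contiguous IPv4 netmask; to_cidr, ip_to_int, and int_to_ip are hypothetical names, not part of Open MPI.

```shell
#!/usr/bin/env bash
# Convert an IPv4 address plus dotted netmask (as shown on ifconfig's "inet"
# line) into NETWORK/PREFIX CIDR notation, e.g. for btl_tcp_if_include.

# Dotted quad -> 32-bit integer (subshell body keeps the IFS change local).
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# 32-bit integer -> dotted quad.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# to_cidr ADDRESS NETMASK -> NETWORK/PREFIX (assumes a contiguous netmask).
to_cidr() (
    addr=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    # The prefix length is the number of 1 bits in the mask.
    bits=$mask
    prefix=0
    while [ "$bits" -gt 0 ]; do
        prefix=$(( prefix + (bits & 1) ))
        bits=$(( bits >> 1 ))
    done
    echo "$(int_to_ip $(( addr & mask )))/$prefix"
)
```

For example, Host1's eno1 (175.148.218.46, netmask 255.255.255.0) and Host2's eth0 (175.148.218.210, same netmask) both give 175.148.218.0/24, so "--mca btl_tcp_if_include 175.148.218.0/24" selects that network on both hosts even though the interface is named eno1 on one and eth0 on the other.
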
> If you're still getting a hang, try with a btl_base_verbose value of 100.
>  
>  
> 
> 
> On Jun 18, 2020, at 7:39 PM, Kulshrestha, Vipul via users 
> <users@lists.open-mpi.org> wrote:
>  
> Hi,
>  
> I have read conflicting statements about OMPI support for virtual interfaces.
>  
> The Open MPI FAQ mentions that virtual IP interfaces are not supported and 
> this will not be solved by using either btl_tcp_if_include or 
> btl_tcp_if_exclude.  
> (https://www.open-mpi.org/faq/?category=tcp#ip-virtual-ip-interfaces)
>  
> However, elsewhere I read that you can exclude the virtual interfaces by 
> specifying --mca btl_tcp_if_exclude virbr0,lo 
> (https://github.com/open-mpi/ompi/issues/6377)
>  
> I am trying this out on different machines and find that it (specifying 
> btl_tcp_if_exclude virbr0,lo) works on one pair of machines but not on 
> another pair. I am hoping to get an explanation of why one works and the 
> other does not.
>  
> I tried to generate some verbose output (on the pair of machines where it 
> does not work) by specifying --mca btl_base_verbose 30, but it just hangs and 
> does not generate any messages.
>  
> $ mpirun -np 4 --mca btl_base_verbose 30 \
>     --mca btl_tcp_if_exclude virbr0,virbr1,virbr2,virbr3,lo \
>     --hostfile host.txt /home/vipulk/mpitest2 100
> …..
> ….
> <no output and remains stuck forever>
>  
> The ifconfig output for the 2 machines in the host list is below.
>  
> Thanks,
> Vipul
>  
>  
> Host1:
>  
> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 175.148.218.46  netmask 255.255.255.0  broadcast 175.148.218.255
>         inet6 fe80::9af2:b3ff:fe2a:3e84  prefixlen 64  scopeid 0x20<link>
>         ether 98:f2:b3:2a:3e:84  txqueuelen 1000  (Ethernet)
>         RX packets 5938671220  bytes 6033195902625 (5.4 TiB)
>         RX errors 0  dropped 534674  overruns 0  frame 0
>         TX packets 3933921252  bytes 3077919856788 (2.7 TiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16
>  
> eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>         inet6 fe80::be68:2aa2:8b42:d6d  prefixlen 64  scopeid 0x20<link>
>         ether 98:f2:b3:2a:3e:85  txqueuelen 1000  (Ethernet)           
>         RX packets 2355308  bytes 279699254 (266.7 MiB)                
>         RX errors 0  dropped 350  overruns 0  frame 0                  
>         TX packets 60  bytes 8732 (8.5 KiB)                            
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0     
>         device interrupt 17                                            
>  
> eno3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 98:f2:b3:2a:3e:86  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)                       
>         RX errors 0  dropped 0  overruns 0  frame 0         
>         TX packets 0  bytes 0 (0.0 B)                       
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16                                      
>  
> eno4: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 98:f2:b3:2a:3e:87  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)                       
>         RX errors 0  dropped 0  overruns 0  frame 0         
>         TX packets 0  bytes 0 (0.0 B)                       
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 17                                      
>  
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0   
>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>         loop  txqueuelen 1000  (Local Loopback)
>         RX packets 3161146200  bytes 225991248912 (210.4 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 3161146200  bytes 225991248912 (210.4 GiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> virbr2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
>         ether 52:54:00:0a:cd:21  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> virbr3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.123.1  netmask 255.255.255.0  broadcast 192.168.123.255
>         ether 52:54:00:0a:cd:22  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> Host2:
> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 175.148.218.210  netmask 255.255.255.0  broadcast 175.148.218.255
>         inet6 fe80::9af2:b3ff:fe2a:3e78  prefixlen 64  scopeid 0x20<link>
>         ether 98:f2:b3:2a:3e:78  txqueuelen 1000  (Ethernet)
>         RX packets 8632800  bytes 3938419917 (3.6 GiB)
>         RX errors 0  dropped 350  overruns 0  frame 0
>         TX packets 5504444  bytes 1791707074 (1.6 GiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16
>  
> eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>         inet6 fe80::9af2:b3ff:fe2a:3e79  prefixlen 64  scopeid 0x20<link>
>         ether 98:f2:b3:2a:3e:79  txqueuelen 1000  (Ethernet)            
>         RX packets 2317163  bytes 275220791 (262.4 MiB)                 
>         RX errors 0  dropped 350  overruns 0  frame 0                   
>         TX packets 336  bytes 26726 (26.0 KiB)                           
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0      
>         device interrupt 17                                             
>  
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0   
>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>         loop  txqueuelen 1000  (Local Loopback)     
>         RX packets 32539  bytes 2540603 (2.4 MiB)   
>         RX errors 0  dropped 0  overruns 0  frame 0 
>         TX packets 32539  bytes 2540603 (2.4 MiB)   
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.123.1  netmask 255.255.255.0  broadcast 192.168.123.255
>         ether 52:54:00:0a:cd:22  txqueuelen 1000  (Ethernet)               
>         RX packets 0  bytes 0 (0.0 B)                                      
>         RX errors 0  dropped 0  overruns 0  frame 0                        
>         TX packets 0  bytes 0 (0.0 B)                                      
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0         
>  
> virbr1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
>         ether 52:54:00:0a:cd:21  txqueuelen 1000  (Ethernet)               
>         RX packets 0  bytes 0 (0.0 B)                                      
>         RX errors 0  dropped 0  overruns 0  frame 0                        
>         TX packets 0  bytes 0 (0.0 B)                                      
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0         
>  
>  
> --mca btl_tcp_if_exclude virbr0,lo works on machines with the configuration below:
>  
> Host 3:
> eno1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:c8:40  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16
>  
> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:c8:41  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)                       
>         RX errors 0  dropped 0  overruns 0  frame 0         
>         TX packets 0  bytes 0 (0.0 B)                       
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 17                                      
>  
> eno3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:c8:42  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)                       
>         RX errors 0  dropped 0  overruns 0  frame 0         
>         TX packets 0  bytes 0 (0.0 B)                       
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16                                      
>  
> eno4: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:c8:43  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)                       
>         RX errors 0  dropped 0  overruns 0  frame 0         
>         TX packets 0  bytes 0 (0.0 B)                       
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 17                                      
>  
> eno5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 65.10.19.30  netmask 255.255.255.192  broadcast 65.10.19.63
>         inet6 fe80::8230:e0ff:fe20:96a8  prefixlen 64  scopeid 0x20<link>
>         ether 80:30:e0:20:96:a8  txqueuelen 1000  (Ethernet)             
>         RX packets 1618138239  bytes 1552281705604 (1.4 TiB)
>         RX errors 184  dropped 0  overruns 184  frame 0
>         TX packets 1500861577  bytes 1593767198059 (1.4 TiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 34  memory 0xe8000000-e87fffff
>  
> eno6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         ether 80:30:e0:20:96:ac  txqueuelen 1000  (Ethernet)
>         RX packets 1299786  bytes 150289059 (143.3 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 77  memory 0xe7000000-e77fffff
>  
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0
>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>         loop  txqueuelen 1000  (Local Loopback)
>         RX packets 20936389  bytes 2632538104 (2.4 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 20936389  bytes 2632538104 (2.4 GiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
>         ether 52:54:00:05:7c:dd  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
>  
> Host 4:
>  
> eno1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:b8:5c  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16
>  
> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:b8:5d  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 17
>  
> eno3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:b8:5e  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 16
>  
> eno4: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 80:30:e0:3b:b8:5f  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 17
>  
> eno5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 65.10.19.29  netmask 255.255.255.192  broadcast 65.10.19.63
>         inet6 fe80::8230:e0ff:fe20:96c0  prefixlen 64  scopeid 0x20<link>
>         ether 80:30:e0:20:96:c0  txqueuelen 1000  (Ethernet)
>         RX packets 2904054722  bytes 2656941056010 (2.4 TiB)
>         RX errors 11  dropped 0  overruns 11  frame 0
>         TX packets 5801141892  bytes 7474409123677 (6.7 TiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 34  memory 0xe8000000-e87fffff
>  
> eno6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         ether 80:30:e0:20:96:c4  txqueuelen 1000  (Ethernet)
>         RX packets 1299694  bytes 150265217 (143.3 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>         device interrupt 77  memory 0xe7000000-e77fffff
>  
>  
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0
>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>         loop  txqueuelen 1000  (Local Loopback)
>         RX packets 19850956  bytes 5578561316 (5.1 GiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 19850956  bytes 5578561316 (5.1 GiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
>         ether 52:54:00:79:33:89  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>  
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com


-- 
Jeff Squyres
jsquy...@cisco.com
