Public bug reported:

Certain flows are missing in a distributed OpenStack setup after a restart of openvswitch. I have tested this on OpenStack Ussuri deployed with kolla-ansible on Ubuntu Bionic, so there is a chance that this has either already been fixed or is caused by specifics of the deployment.
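For reference, the exact Open vSwitch and neutron agent versions can be collected from the kolla containers. This is only a minimal sketch; the container names (openvswitch_vswitchd, neutron_openvswitch_agent) are the kolla-ansible defaults on this deployment and may differ elsewhere, and the --version flag on the agent binary is assumed to behave like other oslo-based services:

```
# docker exec openvswitch_vswitchd ovs-vswitchd --version
# docker exec neutron_openvswitch_agent neutron-openvswitch-agent --version
```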
## Steps to reproduce

There might be a simpler reproducer, but this is what I did:

* Set up a distributed OpenStack with at least one control node and two compute nodes
* Configure neutron with OVS and DVR
* Configure octavia with the amphora driver
* Set up an external network as a floating IP pool
* Create an instance with an HTTP server
* Create a loadbalancer with an HTTP listener/pool
* Add the instance as a pool member to the loadbalancer
* Attach a floating IP to the loadbalancer's virtual IP
* Make sure that the loadbalancer amphora and the instance are on different compute nodes
* Ensure that you can make an HTTP request, e.g.:

```
# curl -I http://${FLOATING_IP}
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Fri, 27 Jan 2023 15:00:00 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Fri, 27 Jan 2023 13:45:11 GMT
ETag: "63d3d567-264"
Accept-Ranges: bytes

  0   612    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
```

* Restart openvswitch:

```
# docker restart openvswitch_vswitchd
openvswitch_vswitchd
```

* Observe that the connection fails, e.g.:

```
# curl -I http://${FLOATING_IP}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
curl: (7) Failed to connect to ${FLOATING_IP} port 80: No route to host
```

* Connections will only re-establish after restarting neutron-openvswitch-agent (see the workaround sketch after the flow comparison below)

## Flows before and after restart of openvswitch

Looking at the flows on the tunnel bridge of the controller node, one can see that flows are missing after restarting openvswitch:

```
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > before_ovs_restart.log
# docker restart openvswitch_vswitchd
openvswitch_vswitchd
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun > after_ovs_restart.log
# awk '{print $3" "$(NF)}' < before_ovs_restart.log > before_ovs_restart_cleaned.log
# awk '{print $3" "$(NF)}' < after_ovs_restart.log > after_ovs_restart_cleaned.log
# diff before_ovs_restart_cleaned.log after_ovs_restart_cleaned.log
3,4d2
< table=0, actions=resubmit(,4)
< table=0, actions=resubmit(,4)
6,7d3
< table=1, actions=drop
< table=1, actions=mod_dl_src:fa:16:3f:56:bb:5a,resubmit(,2)
13d8
< table=4, actions=mod_vlan_vid:53,resubmit(,9)
20,22d14
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22
< table=20, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:23
< table=20, actions=load:0->NXM_OF_VLAN_TCI[],load:0x2ed->NXM_NX_TUN_ID[],output:22
24,25d15
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163eb4cf96->NXM_NX_ARP_SHA[],load:0xa000165->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:b4:cf:96,IN_PORT
< table=21, actions=load:0x2->NXM_OF_ARP_OP[],move:NXM_NX_ARP_SHA[]->NXM_NX_ARP_THA[],move:NXM_OF_ARP_SPA[]->NXM_OF_ARP_TPA[],load:0xfa163e77e67e->NXM_NX_ARP_SHA[],load:0xa0000a3->NXM_OF_ARP_SPA[],move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],mod_dl_src:fa:16:3e:77:e6:7e,IN_PORT
27,28d16
< table=22, actions=drop
< table=22, actions=strip_vlan,load:0x2ed->NXM_NX_TUN_ID[],output:22,output:23
```
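As noted in the reproduction steps, the only workaround I found is restarting the neutron OVS agent, after which the missing flows are reprogrammed. A minimal sketch of that check, assuming the kolla-ansible default container name neutron_openvswitch_agent:

```
# docker restart neutron_openvswitch_agent
# docker exec openvswitch_vswitchd ovs-ofctl dump-flows br-tun | wc -l
# curl -I http://${FLOATING_IP}
```

After the agent restart the flow count on br-tun returns to roughly the pre-restart value and the curl request succeeds again.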
Please let me know if you need more information. I also have a heat stack which automates the openstack resource part of the reproducer, in case this makes things easier.

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2004041

Title:
  Missing flows with ovs dvr after openvswitch restart

Status in neutron:
  New