Public bug reported:

Our CI experienced a network loop due to
https://review.opendev.org/#/c/733568/ . DVR is enabled, there is more
than one physical bridge mapping, and the neutron server was not
available when the ovs agents were started.
Steps
=====

# add more physical bridges
ovs-vsctl add-br br-physnet1
ip link set dev br-physnet1 up
ovs-vsctl add-br br-physnet2
ip link set dev br-physnet2 up

# set a broadcast going from one bridge
ip address add 1.1.1.1/31 dev br-physnet1
arping -b -I br-physnet1 1.1.1.1

# listen on the other
tcpdump -eni br-physnet2

# Update /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2_type_vlan]
network_vlan_ranges = public,physnet1,physnet2

[ovs]
datapath_type = system
bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
tunnel_bridge = br-tun
local_ip = 127.0.0.1

[agent]
tunnel_types = vxlan
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
enable_distributed_routing = True
l2_population = True

# stop server and agent
systemctl stop devstack@q-svc
systemctl stop devstack@q-agt

# clear all flows
for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows $BR; done

# start agent
systemctl start devstack@q-agt

$ sudo tcpdump -eni br-physnet2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 bytes
09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, length 28
...

If there is more than one node running the ovs agent in this state,
then there is a network loop and packets can multiply quickly and
overwhelm the network. We saw ~1 million packets/sec.

I think that because the neutron server is not available, the
get_dvr_mac_address rpc is blocked and the required drop flows are not
installed:
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234

** Affects: neutron
     Importance: Undecided
         Status: New
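For what it's worth, a quick way to check the analysis above on an
affected node is to dump the flow tables of the physical bridges after
starting the agent with the server down. This is only a sketch (bridge
names are taken from the steps above; the exact flows installed depend
on the neutron version), but the DVR-related drop flows should be
absent in the broken state:

# dump what the agent actually programmed on each physical bridge;
# per the analysis above, the DVR drop flows should be missing while
# the server is unreachable
for BR in br-physnet1 br-physnet2; do
    echo "=== $BR ==="
    sudo ovs-ofctl dump-flows $BR
done

# a quick way to see whether any drop rules are present at all
sudo ovs-ofctl dump-flows br-physnet1 | grep -c drop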
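To put a number on the storm when more than one node is in this state,
sampling the kernel interface counters on one of the looped bridges
gives a rough packets/sec estimate (a hedged sketch, not part of the
original reproduction; it assumes the bridge device name matches the
steps above):

# estimate the broadcast storm rate on br-physnet2 by sampling the
# rx packet counter twice, one second apart
RX1=$(cat /sys/class/net/br-physnet2/statistics/rx_packets)
sleep 1
RX2=$(cat /sys/class/net/br-physnet2/statistics/rx_packets)
echo "$((RX2 - RX1)) packets/sec on br-physnet2"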
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887148

Title:
  Network loop between physical networks with DVR

Status in neutron:
  New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1887148/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp