Reviewed:  https://review.opendev.org/c/openstack/neutron/+/810592
Committed: https://opendev.org/openstack/neutron/commit/57629dc05122b01a9ba76606b8d75cce9da40776
Submitter: "Zuul (22348)"
Branch:    master

commit 57629dc05122b01a9ba76606b8d75cce9da40776
Author: Rodolfo Alonso Hernandez <ralon...@redhat.com>
Date:   Thu Sep 23 09:15:09 2021 +0000

    Add retry when executing OF commands if "InvalidDatapath"

    When using the OF API (currently Neutron only uses the native
    implementation via the "os-ken" library), retry the command in case of
    an "InvalidDatapath" exception.

    As commented in the related bug, some operations could restart the OF
    controller (set the OF protocols, set the bridge fail mode). During the
    controller restart, a command can return an "InvalidDatapath" exception.

    Closes-Bug: #1944201
    Change-Id: Ia8d202f8a38362272e9519c1cbd9d6ba9359e0a1

** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944201

Title:
  neutron-openvswitch-agent crashes on start with firewall config of br-int

Status in neutron:
  Fix Released

Bug description:
  In upstream CI, Ironic jobs have been encountering failures where the
  networking is never stood up by neutron. Investigation led us to find
  the neutron-openvswitch-agent in a failed state: it had exited with a
  RuntimeError just a few seconds after the service was started.

  neutron-openvswitch-agent[78787]: DEBUG neutron.agent.securitygroups_rpc [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] Init firewall settings (driver=openvswitch) {{(pid=78787) init_firewall /opt/stack/neutron/neutron/agent/securitygroups_rpc.py:118}}
  neutron-openvswitch-agent[78787]: DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbAddCommand(table=Bridge, record=br-int, column=protocols, values=('OpenFlow10', 'OpenFlow11', 'OpenFlow12', 'OpenFlow13', 'OpenFlow14')) {{(pid=78787) do_commit /usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py:90}}
  neutron-openvswitch-agent[78787]: ERROR OfctlService [-] unknown dpid 90695823979334
  neutron-openvswitch-agent[78787]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334: os_ken.app.ofctl.exception.InvalidDatapath: Datapath Invalid 90695823979334
  neutron-openvswitch-agent[78787]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334 agent terminated!: RuntimeError: ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334
  systemd[1]: devstack@q-agt.service: Main process exited, code=exited, status=1/FAILURE
  systemd[1]: devstack@q-agt.service: Failed with result 'exit-code'.

  Originally, this was thought to be related to
  https://bugs.launchpad.net/neutron/+bug/1817022; however, this happens
  upon service startup on a relatively lightly loaded machine, where the
  only activity at that time is neutron starting. Also, because the
  service is just starting, the connections have not existed long enough
  for inactivity (idle) triggers to fire.

  Investigation allowed us to identify the general path of what is
  occurring, but we do not understand why it happens, at least in the
  Ironic community.

  init_firewall() invocation:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L70

  Firewall class launch:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L121

  The default firewall driver then sends us into openvswitch's firewall
  code:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L548
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L628

  This eventually ends up in
  https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py#L91
  where a RuntimeError is raised and the service exits.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1944201/+subscriptions
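
For readers who want a concrete picture of the retry described in the commit message, here is a minimal sketch in plain Python. It is not the code that was merged in the review above. The `InvalidDatapath` class is a local stand-in for `os_ken.app.ofctl.exception.InvalidDatapath`, and `retry_on_invalid_datapath`, `FlakySwitch`, `dump_flows`, the attempt count and the delay are all hypothetical names and values chosen for illustration.

```python
import functools
import time


class InvalidDatapath(Exception):
    """Local stand-in for os_ken.app.ofctl.exception.InvalidDatapath."""


def retry_on_invalid_datapath(attempts=5, delay=0.01):
    """Retry the wrapped call while it raises InvalidDatapath.

    'attempts' and 'delay' are illustrative values, not Neutron's settings.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except InvalidDatapath:
                    if attempt == attempts:
                        # Out of attempts: let the caller see the error.
                        raise
                    # Give the OF controller time to finish restarting.
                    time.sleep(delay)
        return wrapper
    return decorator


class FlakySwitch:
    """Toy object that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    @retry_on_invalid_datapath(attempts=5, delay=0.01)
    def dump_flows(self, table_id):
        self.calls += 1
        if self.calls <= self.failures:
            # Simulates a request sent while the controller is restarting.
            raise InvalidDatapath("Datapath Invalid")
        return "flows for table %d" % table_id
```

With `FlakySwitch(failures=2)`, the third call to `dump_flows(71)` succeeds and the caller never sees the two transient errors. If the failures outlast the configured attempts, the last `InvalidDatapath` is re-raised so that a persistent problem is still reported rather than hidden.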