Reviewed:  https://review.openstack.org/408281
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1a2a71baf3904209679fc5448814a0e7940fe44d
Submitter: Jenkins
Branch:    master

commit 1a2a71baf3904209679fc5448814a0e7940fe44d
Author: Kevin Benton <[email protected]>
Date:   Wed Dec 7 11:33:46 2016 -0800

    SRIOV: don't block report_state with device count

    The device count process can be quite slow on a system with
    lots of interfaces. Doing this during report_state can block
    it long enough that the agent will be reported as dead and
    bindings will fail.

    This adjusts the logic to only update the configuration during
    the normal device retrieval for the scan loop. This will leave
    the report_state loop unblocked by the operation so the agent
    doesn't get reported as dead (which blocks port binding).

    Closes-Bug: #1648206
    Change-Id: Iff45fb6617974b1eceeed238a8d9e958f874f12b

** Changed in: neutron
       Status: In Progress => Fix Released

https://bugs.launchpad.net/bugs/1648206

Title:
  sriov agent report_state is slow

Status in neutron:
  Fix Released

Bug description:
  On a system with lots of VFs and PFs we get these logs:

  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 29.67 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 45.43 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 47.64 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 23.89 sec
  WARNING oslo.service.loopingcall [-] Function 'neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent.SriovNicSwitchAgent._report_state' run outlasted interval by 30.20 sec

  Depending on the agent_down_time configuration, this can cause the
  Neutron server to think the agent has died.

  This appears to be caused by blocking on the eswitch manager every
  time to get a device count to include in the state report.
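
For readers who want to see the shape of the fix described in the commit
message, here is a minimal sketch of the idea: the slow device enumeration
happens only in the scan loop, which caches the count, and the state report
reads the cached value. This is not the Neutron code. The class names
SlowEswitchManager and AgentSketch, the method enumerate_devices, and the
layout of the state dictionary are all hypothetical and chosen only for
illustration.

```python
import time


class SlowEswitchManager:
    """Hypothetical stand-in for the eswitch manager (not the Neutron class)."""

    def __init__(self, devices, delay=0.05):
        self._devices = list(devices)
        self._delay = delay
        self.calls = 0  # how many times the slow enumeration ran

    def enumerate_devices(self):
        # Simulate the cost of walking many PFs and VFs.
        self.calls += 1
        time.sleep(self._delay)
        return set(self._devices)


class AgentSketch:
    """Hypothetical agent showing the cached-count pattern."""

    def __init__(self, eswitch_mgr):
        self.eswitch_mgr = eswitch_mgr
        # Device count starts at 0 until the first scan completes.
        self.agent_state = {"configurations": {"devices": 0}}

    def scan_devices(self):
        # Scan loop: the only place that pays the enumeration cost.
        # It refreshes the cached count as a side effect.
        devices = self.eswitch_mgr.enumerate_devices()
        self.agent_state["configurations"]["devices"] = len(devices)
        return devices

    def report_state(self):
        # Heartbeat: returns a copy of the cached configuration and
        # never calls the eswitch manager, so it cannot be blocked by it.
        return dict(self.agent_state["configurations"])


if __name__ == "__main__":
    mgr = SlowEswitchManager(["vf0", "vf1", "vf2"])
    agent = AgentSketch(mgr)
    agent.scan_devices()
    print(agent.report_state(), "enumerations:", mgr.calls)
```

The trade-off in this sketch is that the reported device count can lag by up
to one scan interval, in exchange for a heartbeat whose duration no longer
depends on the number of interfaces.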

