Public bug reported:
In case when neutron-ovs-agent is handling many ports, like e.g. 1000 and
configures all of them in one rpc loop iteration, it will send
update_devices_status rpc message to the neutron-server and it may timeout as
neutron server will process ports one by one.
Such timeout will force to do full sync on agent's side so the problem will
repeat.
Error on agent side is like:
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Error while processing VIF
ports: MessagingTimeout: Timed out waiting for a reply to message ID
894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most
recent call last):
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 2339, in rpc_loop
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_info,
provisioning_needed)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 1851, in process_network_ports
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.treat_devices_skipped(skipped_devices)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 1783, in treat_devices_skipped
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.conf.host)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 171, in
update_device_list
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
agent_restarted=agent_restarted)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
time.sleep(wait)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.force_reraise()
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in
force_reraise
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
six.reraise(self.type_, self.value, self.tb)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return
self._original_context.call(ctxt, method, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in
call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
retry=self.retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in
_send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
timeout=timeout, retry=retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
625, in send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
614, in _send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
self._waiter.wait(msg_id, timeout)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
506, in wait
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message =
self.waiters.get(msg_id, timeout=timeout)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
394, in get
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message
ID %s' % msg_id)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
MessagingTimeout: Timed out waiting for a reply to message ID
894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2020-06-30 16:13:49.176 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:0 completed. Processed ports statistics: {'regular': {'updated': 0,
'added': 1250, 'removed': 0}}. Elapsed:2169.184
2020-06-30 16:13:49.177 148737 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Loop iteration exceeded
interval (2 vs. 2169.18406701)! loop_count_and_wait
/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutr
on_agent.py:2063
2020-06-30 16:13:49.178 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:1 started
2020-06-30 16:13:49.182 148737 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] ofctl request
version=0x4,msg_type=0x12,msg_len=0x38,xid=0x27b6f57f,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_grou
p=4294967295,out_port=4294967295,table_id=23,type=1) result
[OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=12410424124845347866L,duration_nsec=444000000,duration_sec=2186,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],fl
ags=0,type=1)] _send_msg
/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:98
2020-06-30 16:13:49.191 148737 DEBUG neutron.agent.linux.utils
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Found cmdline
['ovsdb-client', 'monitor', 'tcp:127.0.0.1:6640', 'Interface',
'name,ofport,external_ids', '--format=json'] for rocess with PID 149081.
get_cmdline_from_pid /usr/lib/python2.7/site-package
s/neutron/agent/linux/utils.py:329
2020-06-30 16:13:49.224 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:1 - starting polling. Elapsed:0.046
2020-06-30 16:13:49.226 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent out of sync with
plugin!
We should probably do here something similar what is done in e.g. DHCP agent
and we should report statuses about ports in batches.
** Affects: neutron
Importance: Medium
Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
Status: Confirmed
** Tags: ovs
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1885758
Title:
RPCMessage timeouts when ovs agent is reporting status about many
ports
Status in neutron:
Confirmed
Bug description:
In case when neutron-ovs-agent is handling many ports, like e.g. 1000 and
configures all of them in one rpc loop iteration, it will send
update_devices_status rpc message to the neutron-server and it may timeout as
neutron server will process ports one by one.
Such timeout will force to do full sync on agent's side so the problem will
repeat.
Error on agent side is like:
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Error while processing VIF
ports: MessagingTimeout: Timed out waiting for a reply to message ID
894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most
recent call last):
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 2339, in rpc_loop
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_info,
provisioning_needed)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 1851, in process_network_ports
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.treat_devices_skipped(skipped_devices)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
line 1783, in treat_devices_skipped
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.conf.host)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 171, in
update_device_list
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
agent_restarted=agent_restarted)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
time.sleep(wait)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
self.force_reraise()
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in
force_reraise
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
six.reraise(self.type_, self.value, self.tb)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return
self._original_context.call(ctxt, method, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in
call
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
retry=self.retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in
_send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
timeout=timeout, retry=retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
625, in send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
614, in _send
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result =
self._waiter.wait(msg_id, timeout)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
506, in wait
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message =
self.waiters.get(msg_id, timeout=timeout)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File
"/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line
394, in get
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message
ID %s' % msg_id)
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
MessagingTimeout: Timed out waiting for a reply to message ID
894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2020-06-30 16:13:49.176 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:0 completed. Processed ports statistics: {'regular': {'updated': 0,
'added': 1250, 'removed': 0}}. Elapsed:2169.184
2020-06-30 16:13:49.177 148737 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Loop iteration exceeded
interval (2 vs. 2169.18406701)! loop_count_and_wait
/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutr
on_agent.py:2063
2020-06-30 16:13:49.178 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:1 started
2020-06-30 16:13:49.182 148737 DEBUG
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] ofctl request
version=0x4,msg_type=0x12,msg_len=0x38,xid=0x27b6f57f,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_grou
p=4294967295,out_port=4294967295,table_id=23,type=1) result
[OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=12410424124845347866L,duration_nsec=444000000,duration_sec=2186,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],fl
ags=0,type=1)] _send_msg
/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:98
2020-06-30 16:13:49.191 148737 DEBUG neutron.agent.linux.utils
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Found cmdline
['ovsdb-client', 'monitor', 'tcp:127.0.0.1:6640', 'Interface',
'name,ofport,external_ids', '--format=json'] for rocess with PID 149081.
get_cmdline_from_pid /usr/lib/python2.7/site-package
s/neutron/agent/linux/utils.py:329
2020-06-30 16:13:49.224 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop -
iteration:1 - starting polling. Elapsed:0.046
2020-06-30 16:13:49.226 148737 INFO
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
[req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent out of sync with
plugin!
We should probably do here something similar what is done in e.g. DHCP agent
and we should report statuses about ports in batches.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1885758/+subscriptions
--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp