[bringing a side email conversation onto the main mailing list] I have been looking into the issue with neutron_fwaas having an error when running under the neutron-l3-agent on Debian when using wsgi. Here's what I have tracked it down to at this point. I am going to lay it all out there, including points that you already know, because I am going to bring in another party or two at this point.
To make sure we are on solid ground, let me restate what are the parameters of the problem: 1. The error does not occur when neutron_fwaas is disabled. 2. The error does not occur if wsgi is in use. If standard eventlet is used, the error is not observed. 3. The error only occurs on debian; centos and ubuntu do not manifest the problem. As the neutron-l3-agent loads, it is trying to initialize the fwaas_v2 driver. The driver initializes without incident, and then proceeds to attempt to fetch firewall groups. Note that you do not need to exercise tempest to see this behavior; it is visible in the logs without anything else going on. Running pdb, I was able to trace the attempt to send the message deep into the RPC transmission code; I saw very little there to be suspicious of. 2018-06-20 21:06:34.761 915007 DEBUG neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] Fetch firewall groups from plugin get_firewall_groups_for_project /usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py:44 ... 2018-06-20 21:07:05.551 915007 ERROR neutron.common.rpc [-] Timeout in RPC method get_firewall_groups_for_project. Waiting for 10 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 8616e98dd8d943eea1dcf99c04bd2be6 You can see, the RPC message goes into the ether and does not return. This results in the stacktraces in neutron-l3-agent.log. This example is from a later transaction. 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] FWaaS router add RPC info call failed for 8c13b5d7-7b93-4b91-ae4c-c387abe96734: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 44851518f5ee4d40a2cdbcabc27e3c92 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 Traceback (most recent call last): 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 324, in get 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 return self._queues[msg_id].get(block=True, timeout=timeout) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 313, in get 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 return waiter.wait() 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 return get_hub().switch() 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 294, in switch 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 return self.greenlet.switch() 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 queue.Empty 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 During handling of the above exception, another exception occurred: 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 Traceback (most recent call last): 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 292, in add_router 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 self._process_router_update(context, new_router) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 256, in _process_router_update 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 fwg_list = self.fwplugin_rpc.get_firewall_groups_for_project(ctx) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 45, in get_firewall_groups_for_project 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 host=self.host) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 185, in call 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 time.sleep(wait) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 self.force_reraise() 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 six.reraise(self.type_, self.value, self.tb) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 raise value 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 162, in call 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 return self._original_context.call(ctxt, method, **kwargs) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 174, in call 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 retry=self.retry) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 131, in _send 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 timeout=timeout, retry=retry) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 retry=retry) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 result = self._waiter.wait(msg_id, timeout) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 message = self.waiters.get(msg_id, timeout=timeout) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 'to message ID %s' % msg_id) 2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 44851518f5ee4d40a2cdbcabc27e3c92 So it seems the RPC message for get_firewall_groups_for_project gets lost. I can tell that there is no shortage of AMQP messaging occurring between the neutron-l3-agent and RabbitMQ. Looking at the RabbitMQ error log, I am concerned that the message is not even really being transmitted, because RabbitMQ is registering premature disconnections from only the neutron_l3_agent at times that correspond to this testing. =WARNING REPORT==== 20-Jun-2018::19:42:41 === closing AMQP connection <0.16171.21> (127.0.0.1:47942 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ecc4ca7d-361f-49f7-94f3-9b4a07d102fc): client unexpectedly closed TCP connection =WARNING REPORT==== 20-Jun-2018::19:42:41 === closing AMQP connection <0.16110.21> (127.0.0.1:47934 -> 127.0.0.1:5671 - neutron-l3-agent:804510:8974dfd0-cb22-4925-bf95-ed5dcd905e41): client unexpectedly closed TCP connection =WARNING REPORT==== 20-Jun-2018::19:42:41 === closing AMQP connection <0.15410.21> (127.0.0.1:47814 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ce915f0c-e46d-442f-aba2-5c204589fb0f): client unexpectedly closed TCP connection =WARNING REPORT==== 20-Jun-2018::19:42:41 === closing AMQP connection <0.15388.21> (127.0.0.1:47810 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ab9b2193-199b-4312-9eca-2fbfc6cf27cd): client unexpectedly closed TCP connection I will continue to debug the issue tomorrow. I see no lonkage at this point with any of the previously listed constraints on this scenario. So I am going to copy Brian Haley for his L3 expertise, as well as the 3 FWAaaS cores to see if this directs their thoughts in any particular direction. I hope to continue the investigation tomorrow. Thanks, Nate Johnston njohnston __________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev