[bringing a side email conversation onto the main mailing list]

I have been looking into the issue with neutron_fwaas having an error
when running under the neutron-l3-agent on Debian when using wsgi.
Here's what I have tracked it down to at this point.  I am going to lay
it all out there, including points that you already know, because I am
going to bring in another party or two at this point.

To make sure we are on solid ground, let me restate what are the
parameters of the problem:

1. The error does not occur when neutron_fwaas is disabled.
2. The error does not occur if wsgi is in use.  If standard eventlet is
used, the error is not observed.
3. The error only occurs on debian; centos and ubuntu do not manifest
the problem.

As the neutron-l3-agent loads, it is trying to initialize the fwaas_v2
driver.  The driver initializes without incident, and then proceeds to
attempt to fetch firewall groups.  Note that you do not need to exercise
tempest to see this behavior; it is visible in the logs without anything
else going on.  Running pdb, I was able to trace the attempt to send the
message deep into the RPC transmission code; I saw very little there to
be suspicious of.

    2018-06-20 21:06:34.761 915007 DEBUG 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] 
Fetch firewall groups from plugin get_firewall_groups_for_project 
/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py:44
    ...
    2018-06-20 21:07:05.551 915007 ERROR neutron.common.rpc [-] Timeout in RPC 
method get_firewall_groups_for_project. Waiting for 10 seconds before next 
attempt. If the server is not down, consider increasing the 
rpc_response_timeout option as Neutron server(s) may be overloaded and unable 
to respond quickly enough.:
    oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply 
to message ID 8616e98dd8d943eea1dcf99c04bd2be6

You can see, the RPC message goes into the ether and does not return.  This
results in the stacktraces in neutron-l3-agent.log.  This example is from a
later transaction.


        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] 
FWaaS router add RPC info call failed for 8c13b5d7-7b93-4b91-ae4c-c387abe96734: 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID 44851518f5ee4d40a2cdbcabc27e3c92
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 
Traceback (most recent call last):
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
324, in get
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
return self._queues[msg_id].get(block=True, timeout=timeout)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/eventlet/queue.py", line 313, in get
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
return waiter.wait()
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
return get_hub().switch()
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 294, in switch
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
return self.greenlet.switch()
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 
queue.Empty
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 During 
handling of the above exception, another exception occurred:
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 
Traceback (most recent call last):
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
 line 292, in add_router
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
self._process_router_update(context, new_router)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
 line 256, in _process_router_update
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
fwg_list = self.fwplugin_rpc.get_firewall_groups_for_project(ctx)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py",
 line 45, in get_firewall_groups_for_project
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
host=self.host)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 185, in call
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
time.sleep(wait)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
self.force_reraise()
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
six.reraise(self.type_, self.value, self.tb)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/six.py", line 686, in reraise
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
raise value
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 162, in call
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
return self._original_context.call(ctxt, method, **kwargs)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
retry=self.retry)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 131, in _send
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
timeout=timeout, retry=retry)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
559, in send
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
retry=retry)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
548, in _send
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
result = self._waiter.wait(msg_id, timeout)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
440, in wait
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     
message = self.waiters.get(msg_id, timeout=timeout)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File 
"/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 
328, in get
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     'to 
message ID %s' % msg_id)
        2018-06-20 21:23:24.772 919205 ERROR 
neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 
oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to 
message ID 44851518f5ee4d40a2cdbcabc27e3c92

So it seems the RPC message for get_firewall_groups_for_project gets
lost. I can tell that there is no shortage of AMQP messaging occurring
between the neutron-l3-agent and RabbitMQ.  Looking at the RabbitMQ
error log, I am concerned that the message is not even really being
transmitted, because RabbitMQ is registering premature disconnections
from only the neutron_l3_agent at times that correspond to this testing.

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.16171.21> (127.0.0.1:47942 -> 127.0.0.1:5671 
- neutron-l3-agent:804510:ecc4ca7d-361f-49f7-94f3-9b4a07d102fc):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.16110.21> (127.0.0.1:47934 -> 127.0.0.1:5671 
- neutron-l3-agent:804510:8974dfd0-cb22-4925-bf95-ed5dcd905e41):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.15410.21> (127.0.0.1:47814 -> 127.0.0.1:5671 
- neutron-l3-agent:804510:ce915f0c-e46d-442f-aba2-5c204589fb0f):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.15388.21> (127.0.0.1:47810 -> 127.0.0.1:5671 
- neutron-l3-agent:804510:ab9b2193-199b-4312-9eca-2fbfc6cf27cd):
        client unexpectedly closed TCP connection

I will continue to debug the issue tomorrow.  I see no lonkage at this
point with any of the previously listed constraints on this scenario.
So I am going to copy Brian Haley for his L3 expertise, as well as the 3
FWAaaS cores to see if this directs their thoughts in any particular
direction.  I hope to continue the investigation tomorrow.

Thanks,

Nate Johnston
njohnston

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to