AFAIK this should be fixed, looking at
https://review.openstack.org/#/c/542858/. Closing on the TripleO side;
please shout if I'm wrong :).

** Changed in: tripleo
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1738768

Title:
  Dataplane downtime when containers are stopped/restarted

Status in neutron:
  Invalid
Status in tripleo:
  Fix Released

Bug description:
  I have deployed a 3 controllers - 3 computes HA environment with
  ML2/OVS and observed dataplane downtime when restarting/stopping
  neutron-l3 container on controllers. This is what I did:

  1. Created a network, subnet, router, a VM and attached a FIP to the VM
  2. Left a ping running on the undercloud to the FIP
  3. Stopped l3 container in controller-0.
     Result: Observed some packet loss while the router was failed over
     to controller-1
  4. Stopped l3 container in controller-1
     Result: Observed some packet loss while the router was failed over
     to controller-2
  5. Stopped l3 container in controller-2
     Result: No traffic to/from the FIP at all.

  (overcloud) [stack@undercloud ~]$ ping 10.0.0.131
  PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
  64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
  64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms

  <---- Last l3 container was stopped here (step 5 above)---->

  From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
  From 10.0.0.1 icmp_seq=11 Destination Host Unreachable
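
  For anyone repeating this test, the point where traffic stops can be
  read out of the ping output mechanically. The following is only a rough
  sketch: the function name summarize_ping is made up for illustration,
  and it assumes the Linux iputils output format shown above.

```python
import re

# Matches both reply lines ("64 bytes from ...: icmp_seq=1 ...") and
# error lines ("From 10.0.0.1 icmp_seq=10 Destination Host Unreachable").
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def summarize_ping(output):
    """Split the icmp_seq numbers in ping output into replied/unreachable.

    Hypothetical helper, not part of any OpenStack tooling.
    """
    replied, unreachable = [], []
    for line in output.splitlines():
        match = SEQ_RE.search(line)
        if not match:
            continue  # banner lines, blank lines, etc.
        seq = int(match.group(1))
        if "bytes from" in line:
            replied.append(seq)
        elif "Unreachable" in line:
            unreachable.append(seq)
    return replied, unreachable


# The lines below are copied from the ping session in this report.
sample = """\
64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
From 10.0.0.1 icmp_seq=11 Destination Host Unreachable
"""
replied, unreachable = summarize_ping(sample)
print("last reply:", max(replied), "first unreachable:", min(unreachable))
```

  Sequence numbers that appear in neither list got no response at all,
  which is also packet loss.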

  When containers are stopped, I guess that the qrouter namespace is not
  accessible by the kernel:

  [heat-admin@overcloud-controller-2 ~]$ sudo ip netns e qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
  RTNETLINK answers: Invalid argument
  RTNETLINK answers: Invalid argument
  setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument
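
  The same check can be scripted across controllers. This is a minimal
  sketch, not an official tool: namespace_usable is a hypothetical helper,
  it assumes it runs as root (the command above used sudo), and it assumes
  that "ip netns exec" exits non-zero in the failure case shown above.

```python
import subprocess


def namespace_usable(name, run=subprocess.run):
    """Return True if a command can be executed inside namespace `name`.

    Hypothetical helper. `run` is injectable so the logic can be
    exercised without root privileges or the `ip` binary.
    """
    result = run(
        ["ip", "netns", "exec", name, "ip", "a"],
        capture_output=True,
        text=True,
    )
    # Assumption: a failure to enter the namespace yields a non-zero exit.
    return result.returncode == 0
```

  On a controller this could be called as
  namespace_usable("qrouter-5244e91c-f533-4128-9289-f37c9656792c").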

  This means that we're getting not only controlplane downtime but also
  dataplane downtime, which could be seen as a regression compared to
  non-containerized environments.
  The same would happen with DHCP, and I expect instances would be unable
  to fetch IP addresses from dnsmasq when the dhcp containers are stopped.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1738768/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
