Public bug reported:

After a successful upgrade of the control plane from Train to Ussuri on
Ubuntu Bionic, we upgraded a first compute/network node and
immediately ran into issues with Neutron:

We noticed that Neutron was extremely slow at setting up and wiring the
network ports. It was so slow that it never finished, and it threw all sorts of
errors (RabbitMQ connection timeouts, full sync required, ...).

We have since been able to reproduce the issue on our Ussuri DEV cloud as well:

1) First, we attached strace -ffff -p $PID_OF_NEUTRON_LINUXBRIDGE_AGENT and
noticed that the data exchange over the unix socket between the rootwrap-daemon
and the main process was extremely slow.
One could literally follow the read calls on the socket's fd line by line.

2) We then (after adding lots of log lines and other intensive manual
debugging) ran py-spy (https://github.com/benfred/py-spy) as "py-spy
top --pid $PID" against the running neutron-linuxbridge-agent process and
noticed that all the CPU time (the process sat at 100% most of the time) was
spent in msgpack/fallback.py, msgpack's pure-Python implementation.

3) Since the issue had not been observed on Train, we compared the msgpack
versions in use and noticed that Train used version 0.5.6, while Ussuri
upgraded this dependency to 0.6.2.

4) We then downgraded msgpack to version 0.5.6 (ignoring the actual
dependencies):

--- cut ---
apt policy python3-msgpack
python3-msgpack:
  Installed: 0.6.2-1~cloud0
  Candidate: 0.6.2-1~cloud0
  Version table:
 *** 0.6.2-1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/ussuri/main amd64 Packages
     0.5.6-1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
--- cut ---


vs.

--- cut ---
apt policy python3-msgpack
python3-msgpack:
  Installed: 0.5.6-1
  Candidate: 0.6.2-1~cloud0
  Version table:
     0.6.2-1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/ussuri/main amd64 Packages
 *** 0.5.6-1 500
        500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
--- cut ---


Et voilà: the neutron-linuxbridge-agent worked just like before (building
one port every few seconds) and all network ports eventually converged to
ACTIVE.
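Because py-spy placed the time in msgpack/fallback.py, it may be worth checking on an affected node whether msgpack was loaded with its compiled extension or with the pure-Python fallback. The following is a minimal sketch, assuming (as in the 0.5.x/0.6.x releases I am aware of) that msgpack/__init__.py imports Unpacker either from a compiled module or from msgpack.fallback, and that the package exposes a `version` tuple. The helper name `msgpack_backend` is hypothetical. It should be run with the same python3 interpreter the agent uses.

```python
def msgpack_backend():
    """Report which msgpack implementation this interpreter would load.

    Returns a (version, backend) tuple where backend is "fallback" when the
    pure-Python msgpack/fallback.py provides Unpacker, "c-extension" when a
    compiled module provides it, and "not installed" if msgpack is missing.
    """
    try:
        import msgpack
    except ImportError:
        return None, "not installed"
    # Assumption: msgpack/__init__.py takes Unpacker from its compiled module
    # when that import succeeds, and from msgpack.fallback otherwise (or when
    # explicitly forced to), so the class's defining module reveals the choice.
    module = getattr(msgpack.Unpacker, "__module__", "")
    backend = "fallback" if module == "msgpack.fallback" else "c-extension"
    return getattr(msgpack, "version", None), backend


if __name__ == "__main__":
    version, backend = msgpack_backend()
    print("msgpack version: %s, backend: %s" % (version, backend))
```

Comparing this output on a Train node and an Ussuri node would show whether the two installations differ only in version or also in which implementation is active.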

I have not yet been able to pinpoint which of the msgpack changes
(https://github.com/msgpack/msgpack-python/compare/0.5.6...v0.6.2) might
have caused this issue, but I am quite certain that this is a major
issue for Ussuri on Ubuntu Bionic.
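To narrow this down between 0.5.6 and 0.6.2, a rough micro-benchmark could be run once per msgpack version (for example in two separate virtualenvs). This is a sketch only. The helper `time_chunked_unpack`, the payload shape and the one-byte chunk size are hypothetical choices meant to resemble a stream read from a socket in small pieces. They are not taken from neutron, oslo.privsep or oslo.rootwrap. The sketch uses only `msgpack.packb(..., use_bin_type=True)`, `msgpack.Unpacker(raw=False)`, `Unpacker.feed()` and iteration over the unpacker.

```python
import time

try:
    import msgpack
except ImportError:  # lets the sketch load on machines without msgpack
    msgpack = None


def time_chunked_unpack(obj, chunk_size=1, repeat=3):
    """Pack obj once, then time feeding it to msgpack.Unpacker in chunks.

    Returns the best wall-clock time in seconds over `repeat` runs, or None
    if msgpack is not installed. chunk_size and repeat are illustrative
    (hypothetical) knobs, not values used by any OpenStack component.
    """
    if msgpack is None:
        return None
    payload = msgpack.packb(obj, use_bin_type=True)
    best = None
    for _ in range(repeat):
        unpacker = msgpack.Unpacker(raw=False)
        decoded = []
        start = time.perf_counter()
        for i in range(0, len(payload), chunk_size):
            # Feed a small slice, then drain any objects that are now complete.
            unpacker.feed(payload[i:i + chunk_size])
            decoded.extend(unpacker)
        elapsed = time.perf_counter() - start
        if decoded != [obj]:
            raise ValueError("round trip through msgpack changed the data")
        best = elapsed if best is None else min(best, elapsed)
    return best


if __name__ == "__main__":
    # Hypothetical payload: a dict holding a list of port-like names.
    sample = {"ports": ["tap%04d" % i for i in range(500)]}
    print("msgpack", getattr(msgpack, "version", None),
          "best time (s):", time_chunked_unpack(sample))
```

If one installation runs the pure-Python msgpack/fallback.py and the other does not, this comparison mainly measures compiled versus pure-Python code rather than any single commit. The two versions should be benchmarked with the same implementation before bisecting.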

There are related issues reported against oslo.privsep:
 * https://bugs.launchpad.net/oslo.privsep/+bug/1844822
 * https://bugs.launchpad.net/oslo.privsep/+bug/1896734

Both are related to msgpack or to the size of the messages exchanged.

** Affects: cloud-archive
     Importance: Undecided
         Status: New

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: oslo.privsep
     Importance: Undecided
         Status: New

** Affects: python-oslo.privsep (Ubuntu)
     Importance: Undecided
         Status: New

** Also affects: ubuntu
   Importance: Undecided
       Status: New

** Package changed: ubuntu => neutron

** Also affects: python-oslo.privsep (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- linuxbridge agent broken due to msgpack upgrade 0.6.2 for Ussuri on Bionic
+ linuxbridge agent broken due to msgpack upgrade to 0.6.2 for Ussuri on Bionic

** Summary changed:

- linuxbridge agent broken due to msgpack upgrade to 0.6.2 for Ussuri on Bionic
+ msgpack upgrade to 0.6.2 for Ussuri on Bionic breaks linuxbridge agent

** Summary changed:

- msgpack upgrade to 0.6.2 for Ussuri on Bionic breaks linuxbridge agent
+ msgpack upgrade to 0.6.2 breaks linuxbridge agent

** Also affects: neutron (Ubuntu)
   Importance: Undecided
       Status: New

** Description changed:

** Package changed: neutron (Ubuntu) => cloud-archive

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1937261

Title:
  msgpack upgrade to 0.6.2 breaks linuxbridge agent

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1937261/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
