Uploaded to Bionic-unapproved

** Description changed:

  [Impact]
  
-  * The master process will exit with the status of the last worker. 
-    When the worker is killed with SIGTERM, it is expected to get 143 as an
-    exit status. Therefore, we consider this exit status as normal from a
-    systemd point of view. If it happens when not stopping, the systemd
-    unit is configured to always restart, so it has no adverse effect.
+  * The master process will exit with the status of the last worker.
+    When the worker is killed with SIGTERM, it is expected to get 143 as an
+    exit status. Therefore, we consider this exit status as normal from a
+    systemd point of view. If it happens when not stopping, the systemd
+    unit is configured to always restart, so it has no adverse effect.
  
-  * Backport upstream fix - adding another accepted RC to the systemd 
-    service
+  * Backport upstream fix - adding another accepted RC to the systemd
+    service
  
  [Test Case]
  
-  * You want to install haproxy and have it running. Then sigterm it a lot.
-    With the fix it would restart the service all the time, well except 
-    restart limit. But in the bad case it will just stay down and didn't 
-    even try to restart it.
+  * Install haproxy and have it running, then SIGTERM it repeatedly.
+    With the fix the service is restarted every time (up to systemd's
+    restart limit). Without the fix it just stays down and no restart
+    is even attempted.
  
-    $ apt install haproxy
-    $ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
-    $ systemctl status haproxy
+    $ apt install haproxy
+    $ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
+    $ systemctl status haproxy
+ 
+    The above is a hacky way to trigger some A/B behavior on the fix.
+    It isn't perfect, as systemd's restart counters will kick in and
+    you are essentially checking a secondary symptom.
+    I'd recommend additionally running the following:
+ 
+    $ apt install haproxy
+    $ for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001 ;
+      systemctl reset-failed haproxy.service ; done
+    $ systemctl status haproxy
+ 
+    You can do so with even smaller sleeps; the service should stay up
+    and running (this behavior isn't changed by the fix, but it should
+    keep working with the new code).
  
  [Regression Potential]
  
-  * This eventually is a conffile modification, so if there are other 
-    modifications done by the user they will get a prompt. But that isn't a 
-    regression. I checked the code and I can't think of another RC=143 that 
-    would due to that "no more" detected as error. I really think other 
-    than the update itself triggering a restart (as usual for services) 
-    there is no further regression potential to this.
+  * This is effectively a conffile modification, so if the user has made
+    other modifications they will get a prompt. But that isn't a
+    regression. I checked the code and I can't think of another source of
+    RC=143 that would, due to this change, no longer be detected as an
+    error. Other than the update itself triggering a restart (as usual
+    for services) I see no further regression potential.
  
  [Other Info]
-  
-  * Fix already active in IS hosted cloud without issues since a while
-  * Also reports (comment #5) show that others use this in production as 
-    well
+ 
+  * The fix has already been active in the IS hosted cloud for a while,
+    without issues
+  * Reports (comment #5) show that others use this in production as well
  
  ---
  
  On a Bionic/Stein cloud, after a network partition, we saw several units
  (glance, swift-proxy and cinder) fail to start haproxy, like so:
  
  root@juju-df624b-6-lxd-4:~# systemctl status haproxy.service
  ● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor 
preset: enabled)
     Active: failed (Result: exit-code) since Sun 2019-10-20 00:23:18 UTC; 1h 
35min ago
       Docs: man:haproxy(1)
             file:/usr/share/doc/haproxy/configuration.txt.gz
    Process: 2002655 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE 
$EXTRAOPTS (code=exited, status=143)
    Process: 2002649 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS 
(code=exited, status=0/SUCCESS)
   Main PID: 2002655 (code=exited, status=143)
  
  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Starting HAProxy Load 
Balancer...
  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Started HAProxy Load Balancer.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopping HAProxy Load 
Balancer...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 
(2002655) : Exiting Master process...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [ALERT] 292/001652 
(2002655) : Current worker 2002661 exited with code 143
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 
(2002655) : All workers exited. Exiting... (143)
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Main process 
exited, code=exited, status=143/n/a
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Failed with 
result 'exit-code'.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopped HAProxy Load Balancer.
  root@juju-df624b-6-lxd-4:~#
  
  The Debian maintainer came up with the following patch for this:
  
    https://www.mail-archive.com/haproxy@formilux.org/msg30477.html
  
  Which was added to the 1.8.10-1 Debian upload and merged into upstream 1.8.13.
  Unfortunately Bionic is on 1.8.8-1ubuntu0.4 and doesn't have this patch.
  
  Please consider pulling this patch into an SRU for Bionic.

-- 
You received this bug notification because you are a member of Ubuntu
High Availability Team, which is subscribed to haproxy in Ubuntu.
https://bugs.launchpad.net/bugs/1848902

Title:
  haproxy in bionic can get stuck

Status in haproxy package in Ubuntu:
  Fix Released
Status in haproxy source package in Bionic:
  Triaged

Bug description:
  [Impact]

   * The master process will exit with the status of the last worker.
     When the worker is killed with SIGTERM, it is expected to get 143 as an
     exit status. Therefore, we consider this exit status as normal from a
     systemd point of view. If it happens when not stopping, the systemd
     unit is configured to always restart, so it has no adverse effect.
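
   The 143 here is not haproxy-specific: a process terminated by SIGTERM
   conventionally exits with 128 + the signal number, and SIGTERM is 15,
   so 128 + 15 = 143. A quick sketch to see this for yourself (any
   long-running command works; `sleep` is just an arbitrary choice):

```shell
# Start a background process, kill it with SIGTERM, and read its
# exit status: 128 + signal number (SIGTERM = 15) = 143.
sleep 30 &
pid=$!
kill -TERM "$pid"
wait "$pid"
echo "exit status: $?"   # prints "exit status: 143"
```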

   * Backport upstream fix - adding another accepted RC to the systemd
     service
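
   For reference, assuming the backport matches what landed upstream in
   1.8.13 / the 1.8.10-1 Debian upload, the change amounts to a one-line
   addition in the [Service] section of haproxy.service (a sketch, not a
   verbatim quote of the patch):

```ini
# /lib/systemd/system/haproxy.service (excerpt)
[Service]
# Treat 143 (128 + SIGTERM) as a clean exit, so the unit is not
# marked failed when the master exits with a SIGTERM'd worker's status.
SuccessExitStatus=143
```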

  [Test Case]

   * Install haproxy and have it running, then SIGTERM it repeatedly.
     With the fix the service is restarted every time (up to systemd's
     restart limit). Without the fix it just stays down and no restart
     is even attempted.

     $ apt install haproxy
     $ for x in {1..100}; do pkill -TERM -x haproxy ; sleep 0.1 ; done
     $ systemctl status haproxy

     The above is a hacky way to trigger some A/B behavior on the fix.
     It isn't perfect, as systemd's restart counters will kick in and
     you are essentially checking a secondary symptom.
     I'd recommend additionally running the following:

     $ apt install haproxy
     $ for x in {1..1000}; do pkill -TERM -x haproxy ; sleep 0.001 ;
       systemctl reset-failed haproxy.service ; done
     $ systemctl status haproxy

     You can do so with even smaller sleeps; the service should stay up
     and running (this behavior isn't changed by the fix, but it should
     keep working with the new code).

  [Regression Potential]

   * This is effectively a conffile modification, so if the user has made
     other modifications they will get a prompt. But that isn't a
     regression. I checked the code and I can't think of another source of
     RC=143 that would, due to this change, no longer be detected as an
     error. Other than the update itself triggering a restart (as usual
     for services) I see no further regression potential.

  [Other Info]

   * The fix has already been active in the IS hosted cloud for a while,
     without issues
   * Reports (comment #5) show that others use this in production as well

  ---

  On a Bionic/Stein cloud, after a network partition, we saw several
  units (glance, swift-proxy and cinder) fail to start haproxy, like so:

  root@juju-df624b-6-lxd-4:~# systemctl status haproxy.service
  ● haproxy.service - HAProxy Load Balancer
     Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor 
preset: enabled)
     Active: failed (Result: exit-code) since Sun 2019-10-20 00:23:18 UTC; 1h 
35min ago
       Docs: man:haproxy(1)
             file:/usr/share/doc/haproxy/configuration.txt.gz
    Process: 2002655 ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE 
$EXTRAOPTS (code=exited, status=143)
    Process: 2002649 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS 
(code=exited, status=0/SUCCESS)
   Main PID: 2002655 (code=exited, status=143)

  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Starting HAProxy Load 
Balancer...
  Oct 20 00:16:52 juju-df624b-6-lxd-4 systemd[1]: Started HAProxy Load Balancer.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopping HAProxy Load 
Balancer...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 
(2002655) : Exiting Master process...
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [ALERT] 292/001652 
(2002655) : Current worker 2002661 exited with code 143
  Oct 20 00:23:18 juju-df624b-6-lxd-4 haproxy[2002655]: [WARNING] 292/001652 
(2002655) : All workers exited. Exiting... (143)
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Main process 
exited, code=exited, status=143/n/a
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: haproxy.service: Failed with 
result 'exit-code'.
  Oct 20 00:23:18 juju-df624b-6-lxd-4 systemd[1]: Stopped HAProxy Load Balancer.
  root@juju-df624b-6-lxd-4:~#

  The Debian maintainer came up with the following patch for this:

    https://www.mail-archive.com/haproxy@formilux.org/msg30477.html

  Which was added to the 1.8.10-1 Debian upload and merged into upstream 1.8.13.
  Unfortunately Bionic is on 1.8.8-1ubuntu0.4 and doesn't have this patch.

  Please consider pulling this patch into an SRU for Bionic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/haproxy/+bug/1848902/+subscriptions

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : ubuntu-ha@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
