** Description changed:

+ [Impact]
+
+  * Cluster resource timeouts are not working as they should.
+ Timeouts are important so that actions performed on a resource do not
+ time out earlier than expected (sometimes starting a resource takes
+ longer than the default timeout because configuration files or caches
+ have to be loaded, etc.).
+
+ [Test Case]
+
+  * Create a pacemaker cluster with Ubuntu focal and configure a
+ primitive with:
+
+ primitive haproxy systemd:haproxy \
+     op monitor interval=2s \
+     op start interval=0s timeout=500s \
+     op stop interval=0s timeout=500s \
+     meta migration-threshold=2
+
+ or even
+
+ primitive haproxy systemd:haproxy \
+     op monitor interval=2s \
+     op start interval=0s timeout=500 \
+     op stop interval=0s timeout=500 \
+     meta migration-threshold=2
+
+ and observe that the timeouts are not respected.
+
+ [Regression Potential]
+
+  * The number of patches is not small, but they are ALL related to the
+ same thing: fixing the timeouts and reorganizing the timing of
+ resources.
+
+  * TBD (more info to come)
+
+ [Other Info]
+
+  * Original Description (from the reporter):

  While working on pacemaker, I discovered an issue with timeouts:

  haproxy_stop_0 on primary 'OCF_TIMEOUT' (198): call=583, status='Timed Out', exitreason='', last-rc-change='1970-01-04 17:21:18 -05:00', queued=44ms, exec=176272ms

  This led me down the path of finding that setting a timeout unit value
  was not doing anything:

  primitive haproxy systemd:haproxy \
- op monitor interval=2s \
- op start interval=0s timeout=500s \
- op stop interval=0s timeout=500s \
- meta migration-threshold=2
+     op monitor interval=2s \
+     op start interval=0s timeout=500s \
+     op stop interval=0s timeout=500s \
+     meta migration-threshold=2

  primitive haproxy systemd:haproxy \
- op monitor interval=2s \
- op start interval=0s timeout=500 \
- op stop interval=0s timeout=500 \
- meta migration-threshold=2
+     op monitor interval=2s \
+     op start interval=0s timeout=500 \
+     op stop interval=0s timeout=500 \
+     meta migration-threshold=2

- the two above configs result in the same behaviour, pacemaker/crm seems to be ignoring the "s"
+ The two configs above result in the same behavior; pacemaker/crm seems
+ to be ignoring the "s".
+
  I filed a bug with pacemaker itself
  (https://bugs.clusterlabs.org/show_bug.cgi?id=5429), but this led to the
  following response, copied from the ticket:

  <<Looking back on your irc chat, I see you have a version of Pacemaker with a known bug:
  <<haproxy_stop_0 on primary 'OCF_TIMEOUT' (198): call=583, status='Timed Out', exitreason='', last-rc-change='1970-01-04 17:21:18 -05:00', queued=44ms, exec=176272ms
  <<The incorrect date is a result of bugs that occur in systemd resources when Pacemaker 2.0.3 is built
  <<with the -UPCMK_TIME_EMERGENCY_CGT C flag (which is not the default). I was only aware of that being the
  <<case in one Fedora release. If those are stock Ubuntu packages, please file an Ubuntu bug to make sure
  <<they are aware of it.
  <<The underlying bugs are fixed as of the Pacemaker 2.0.4 release. If anyone wants to backport specific
  <<commits instead, the github pull requests #1992 and #1997 should take care of it.

  It appears that the root cause of my issue with setting timeout values
  with units ("600s") is a bug in the build process of Ubuntu's pacemaker
  package.

  1) lsb_release -d
     Description: Ubuntu 20.04 LTS
  2) ii pacemaker 2.0.3-3ubuntu3 amd64 cluster resource manager
  3) Setting "100s" as the timeout of a resource should result in a
     100-second timeout, not a 100-millisecond timeout.
  4) The unit value "s" is being ignored, forcing me to set the timeout
     to 10000 to get a 10-second timeout.
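The unit handling expected in items 3 and 4 can be sketched in Python as a conversion from a timeout string to milliseconds. This is a minimal illustration, not Pacemaker's parser (which is C code): `timeout_to_ms` and its unit table are hypothetical, and the sketch assumes the convention Pacemaker documents for time values, namely that a number without a unit means seconds.

```python
import re

# Hypothetical unit table (multipliers to milliseconds). The suffixes and the
# "no unit means seconds" default are assumptions modeled on Pacemaker's
# documented time-value convention; this is not Pacemaker's own code.
_UNIT_TO_MS = {
    "": 1000,                      # bare number: assumed to be seconds
    "ms": 1, "msec": 1,
    "s": 1000, "sec": 1000,
    "m": 60_000, "min": 60_000,
    "h": 3_600_000, "hr": 3_600_000,
}

# An integer, optional whitespace, then an optional alphabetic unit suffix.
_SPEC = re.compile(r"\s*(\d+)\s*([a-z]*)\s*")


def timeout_to_ms(spec: str) -> int:
    """Convert a timeout string such as '500s' or '500' to milliseconds."""
    match = _SPEC.fullmatch(spec.lower())
    if match is None:
        raise ValueError(f"unparseable timeout: {spec!r}")
    value, unit = match.groups()
    if unit not in _UNIT_TO_MS:
        raise ValueError(f"unknown unit {unit!r} in {spec!r}")
    return int(value) * _UNIT_TO_MS[unit]


if __name__ == "__main__":
    # The two test-case spellings, the reporter's "100s", and two other units.
    for spec in ("500s", "500", "100s", "250ms", "2min"):
        print(f"{spec:>6} -> {timeout_to_ms(spec)} ms")
```

Under that assumption `timeout=500s` and `timeout=500` should both mean 500 seconds, so the two test-case configurations are expected to behave identically; the problem reported here is that the effective timeout did not match that value.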
https://bugs.launchpad.net/bugs/1881762

Title:
  resource timeout not respecting units