Public bug reported:

We were debugging an unexpected failover of a PostgreSQL 9.3 Pacemaker cluster running on 14.04 LTS at a client. As a timeline, the client put the second node (h1db2) back into the cluster at around 9:10 AM, and the unexpected failover occurred at 10:32 AM.
What exactly led to the failover could not be determined, but two problems were apparent from the logs:

1. The standby monitor action thought there was no master running:

May 4 09:10:59 h1db2 pgsql(pgsql)[2460]: INFO: Master does not exist.
[Message repeats 175 times]
May 4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist.
[...]
May 4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: I have a master right.

At this point, Pacemaker decided to promote h1db2.

2. Between 9:10 and 10:32, the score of the standby was -INFINITY; at 10:32 it was then set to the same score as the master (1000), while it should be 100 for standbys.

Both problems were debugged and traced back to bugs in the pgsql resource agent version in trusty. They are due to output changes in newer pacemaker versions (including the one in trusty) and have since been fixed.

The following git commits from https://github.com/ClusterLabs/resource-agents are relevant:

https://github.com/ClusterLabs/resource-agents/commit/78ddf466e413d0c1f18f7610cfbd63968b012ce0 fixes the first issue.

https://github.com/ClusterLabs/resource-agents/commit/956244dd05f69bdad979b252a3e359855b88e6bd fixes the second issue.

However, several other intermediate commits are required on top of the version in trusty, so the full list we are using is:

956244dd05f69bdad979b252a3e359855b88e6bd
b7911abce27889becc8a4637e003bfcf5ef1b15e (adjusted)
ffc9c6444996144076ef2b4bc79a38569e05250a
404d205636ad02e09ddffdb9710dd660b8171c6b
ff9f0ed32e64f9be9e57dc712ec241231b04d917
78ddf466e413d0c1f18f7610cfbd63968b012ce0

b7911abce27 needs to be adjusted as it uses a function (exec_with_retry) which is not available yet, but it (and its first argument) can be safely removed. Commits 3-5 just keep changing the same line (as does the last), so the final patch isn't getting any bigger.

The attached patch makes the pgsql resource agent work much better for us. Would it be possible to apply it to the resource-agents package in trusty?
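For anyone wanting to check whether their own logs show the pattern from problem 1, the following is a rough sketch only, not part of the attached patch. It assumes syslog lines formatted like the excerpts quoted in this report; the function name `find_suspicious_promotions` and the regular expression are made up for illustration. Syslog "[Message repeats N times]" lines are skipped, so the reported count is only a lower bound.

```python
import re

# Assumed line format, taken from the excerpts in this report, e.g.
# "May 4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"pgsql\([^)]+\)\[\d+\]:\s+INFO:\s+(?P<msg>.*)$"
)

NO_MASTER = "Master does not exist."
MASTER_RIGHT = "I have a master right."


def find_suspicious_promotions(lines):
    """Return (host, first_no_master_ts, master_right_ts, count) tuples.

    A tuple is reported when a host logs "I have a master right." after
    one or more "Master does not exist." messages. This helper is
    hypothetical and only illustrates the log pattern described above.
    """
    pending = {}  # host -> (timestamp of first "no master" line, count)
    findings = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # unrelated lines and "[Message repeats ...]" markers
        host, ts, msg = m["host"], m["ts"], m["msg"].strip()
        if msg == NO_MASTER:
            first_ts, count = pending.get(host, (ts, 0))
            pending[host] = (first_ts, count + 1)
        elif msg == MASTER_RIGHT and host in pending:
            first_ts, count = pending.pop(host)
            findings.append((host, first_ts, ts, count))
    return findings


# The three log lines quoted in this report.
sample = [
    "May 4 09:10:59 h1db2 pgsql(pgsql)[2460]: INFO: Master does not exist.",
    "May 4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: Master does not exist.",
    "May 4 10:32:04 h1db2 pgsql(pgsql)[2611]: INFO: I have a master right.",
]
for finding in find_suspicious_promotions(sample):
    print(finding)
```

A hit on a standby node only shows that the monitor action claimed master rights after reporting no master; it does not by itself prove that the resource agent bug was the cause.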
** Affects: resource-agents (Ubuntu)
     Importance: Undecided
         Status: New

** Patch added: "Proposed patch"
   https://bugs.launchpad.net/bugs/1688613/+attachment/4872336/+files/pgsql.diff

https://bugs.launchpad.net/bugs/1688613

Title:
  pgsql RA has problems with pacemaker version