With graceful shutdowns I can get into a state where the last node to die has "safe_to_bootstrap:1" in its grastate.dat file. But I couldn't get that node running again, which was odd, as it should be the *only* one that can be started. I had to use one of the other initscript targets, restart-bootstrap, instead of just restart, or else it would time out trying to reach "juju_cluster":
2018-11-09 18:54:58 14147 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1478: Failed to open channel 'juju_cluster' at 'gcomm://10.0.100.131,10.0.100.191': -110 (Connection timed out)

I see at least two options here:

a) We backport just what was called the workaround bit, since you say this is what you have been using for a long time now. That is the bit that handles the case where all nodes crashed, and thus "safe_to_bootstrap" is set to zero on all of them. Without the fix, no node will be able to start up in that case. The fix uses the same logic that has always been used to determine the right node to start, from before "safe_to_bootstrap" existed, and once it finds that node, it just flips that flag to 1 to allow the service to be started.

b) We backport the full patch, which consists of part (a) above, plus skipping the logic to find the right node to start if it finds "safe_to_bootstrap" set to 1. This one will need more testing.

-- 
https://bugs.launchpad.net/bugs/1789527

Title:
  Galera agent doesn't work when grastate.dat contains safe_to_bootstrap
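
For reference, here is a rough sketch of what option (a) amounts to. It is not the resource agent's actual code. It assumes the "right node" is the one with the highest last-committed sequence number, and that those numbers have already been recovered on each node by some other means. The function names, node names, and seqno values are all made up for illustration.

```python
import re


def read_safe_to_bootstrap(grastate_text):
    """Return the safe_to_bootstrap value as an int, or None if the field is absent."""
    match = re.search(r"^safe_to_bootstrap:\s*(\d+)\s*$", grastate_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def set_safe_to_bootstrap(grastate_text, value):
    """Return a copy of the grastate.dat text with safe_to_bootstrap set to value."""
    return re.sub(
        r"^(safe_to_bootstrap:\s*)\d+",
        lambda m: m.group(1) + str(value),
        grastate_text,
        flags=re.MULTILINE,
    )


def pick_bootstrap_node(last_committed):
    """Pick the node with the highest last-committed seqno.

    last_committed maps a (hypothetical) node name to its recovered seqno.
    Assumption: highest seqno wins; ties go to the first node seen.
    """
    return max(last_committed, key=last_committed.get)


# Hypothetical example: every node crashed, so every grastate.dat has the flag at 0.
grastate = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    abc\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)
seqnos = {"node-a": 10, "node-b": 42, "node-c": 7}  # made-up values

winner = pick_bootstrap_node(seqnos)
patched = set_safe_to_bootstrap(grastate, 1)  # would be written on the winner only
```

Option (b) would add an early exit on top of this: if read_safe_to_bootstrap() already returns 1 on a node, skip the seqno comparison and start that node directly.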