Additional information from the charm:

Without cluster_count set to NUM_UNITS, a race occurs: the relation to
the last hacluster node is not yet set when the charm attempts to start
corosync and pacemaker, so they come up with only n-1 of n nodes.
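
To illustrate the check that cluster_count makes possible, here is a
minimal sketch in Python. It is not the charm's actual code: the function
name and its arguments are hypothetical, and it assumes the peer list
excludes the local unit.

```python
def ready_to_configure(peer_units, cluster_count):
    """Return True once this unit can see every other cluster member.

    peer_units: unit names visible on the peer relation (hypothetical
        input; assumed to exclude the local unit).
    cluster_count: expected total number of units (NUM_UNITS).
    """
    # The local unit plus the distinct peers it can see must reach the
    # expected cluster size before corosync and pacemaker are started.
    return len(set(peer_units)) + 1 >= cluster_count
```

With three units expected, a unit that sees only hacluster/0 would hold
off, and would proceed once a second peer appears on the relation.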

The last node is aware of only one relation so far, when there should be
2 relations:
relation-list -r hanode:0
hacluster/0

corosync.conf looks like the following when there should be 3 nodes:

nodelist {

        node {
                ring0_addr: 10.5.35.235
                nodeid: 1000
        }

        node {
                ring0_addr: 10.5.35.237
                nodeid: 1001
        }

}
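
A quick way to confirm how many nodes ended up in a nodelist like the one
above is to count its ring0_addr entries. The following is a rough sketch
only: the function name is hypothetical, and it assumes each node block
contains exactly one ring0_addr line.

```python
import re


def count_corosync_nodes(conf_text):
    """Count nodelist members by counting ring0_addr lines.

    Assumes one ring0_addr per node block, as in the excerpt above.
    """
    pattern = r"^\s*ring0_addr:\s*\S+"
    return len(re.findall(pattern, conf_text, flags=re.MULTILINE))


# The nodelist from this report, reproduced as a string.
SAMPLE_CONF = """
nodelist {
        node {
                ring0_addr: 10.5.35.235
                nodeid: 1000
        }
        node {
                ring0_addr: 10.5.35.237
                nodeid: 1001
        }
}
"""

EXPECTED_NODES = 3  # the cluster size the deployment should have reached
found = count_corosync_nodes(SAMPLE_CONF)
if found < EXPECTED_NODES:
    print("nodelist has %d of %d expected nodes" % (found, EXPECTED_NODES))
```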

The services themselves (not the charm) fail:
corosync logs thousands of RETRANSMIT errors.
pacemaker eventually times out after waiting on corosync.

I am adding more documentation to encourage setting cluster_count, and
updating the amulet tests to include it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1654403

Title:
  Race condition in hacluster charm that leaves pacemaker down
