Hi all,

I have a little problem here and would like to get some help:

I have (had?) a working three-node cluster of two active nodes (nebel1 and nebel2) and one standby node (nebel3), running Debian squeeze + backports. That is pacemaker 1.1.7-1~bpo60+1 and corosync 1.4.2-1~bpo60+1.

Now I upgraded the standby node nebel3 to Debian wheezy, which itself went without problems. Since the upstream versions of pacemaker and corosync did not change, I expected the upgraded nebel3 to rejoin the original cluster. Little did I know... While nebel3 now runs pacemaker 1.1.7-1 and corosync 1.4.2-3, something in the upgrade seems to have broken cluster membership. /etc/corosync/corosync.conf is still identical on all nodes.
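In case it helps, the totem section of that corosync.conf follows the usual two-ring multicast layout, roughly like this (the multicast addresses and ports below are placeholders, not necessarily my exact values):

totem {
        version: 2
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.110.1.0
                mcastaddr: 226.94.1.1    # placeholder multicast address
                mcastport: 5405          # placeholder port
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.112.0.0
                mcastaddr: 226.94.1.2    # placeholder multicast address
                mcastport: 5405          # placeholder port
        }
}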

I suspect the problem is somewhere in corosync as nebel1 and nebel2 only see each other:

$ ssh root@nebel2 --  corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.33648138.ip=r(0) ip(10.110.1.2) r(1) ip(10.112.0.2)
runtime.totem.pg.mrp.srp.members.33648138.join_count=1
runtime.totem.pg.mrp.srp.members.33648138.status=joined
runtime.totem.pg.mrp.srp.members.16870922.ip=r(0) ip(10.110.1.1) r(1) ip(10.112.0.1)
runtime.totem.pg.mrp.srp.members.16870922.join_count=1
runtime.totem.pg.mrp.srp.members.16870922.status=joined
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=39
runtime.totem.pg.mrp.srp.members.50425354.status=left

nebel3, on the other hand, only sees itself:

$ ssh root@nebel3 --  corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=1
runtime.totem.pg.mrp.srp.members.50425354.status=joined

Both nebel2 and nebel3 think the communication rings are free of faults:

$ ssh root@nebel2 --  corosync-cfgtool -s
Printing ring status.
Local node ID 33648138
RING ID 0
        id      = 10.110.1.2
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.112.0.2
        status  = ring 1 active with no faults
$ ssh root@nebel3 --  corosync-cfgtool -s
Printing ring status.
Local node ID 50425354
RING ID 0
        id      = 10.110.1.3
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.112.0.3
        status  = ring 1 active with no faults

I can ping every participating node from every other node, via all of its connections and IPs.

The corosync.log on nebel2 doesn't mention nebel3 after it left the cluster for the post-upgrade reboot. Likewise, the corosync.log on nebel3 no longer mentions nebel1 and nebel2.

So, what did I miss during the upgrade? How can I get nebel3 to rejoin the original cluster instead of forming its own 1-out-of-3 cluster (with the same resources defined)?

Any help is highly appreciated!

- Arnold

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org