Hi all,

I have a little problem here and would like to get some help:

I have (had?) a working three-node cluster of two active nodes (nebel1 and nebel2) and one standby node (nebel3), running Debian squeeze + backports. That is pacemaker 1.1.7-1~bpo60+1 and corosync 1.4.2-1~bpo60+1.

Now I upgraded the standby node nebel3 to Debian wheezy, which itself went without problems. Since the upstream versions of pacemaker and corosync did not change, I expected the upgraded nebel3 to rejoin the original cluster. Little did I know... While nebel3 now runs pacemaker 1.1.7-1 and corosync 1.4.2-3, something in the upgrade seems to have broken cluster membership. /etc/corosync/corosync.conf is still identical on all nodes.
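In case it helps, the totem section of that corosync.conf follows the usual two-ring multicast layout, roughly like this (the multicast addresses and ports below are placeholders, not necessarily my exact values):

totem {
        version: 2
        rrp_mode: passive
        interface {
                ringnumber: 0
                bindnetaddr: 10.110.1.0
                mcastaddr: 226.94.1.1    # placeholder multicast address
                mcastport: 5405          # placeholder port
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.112.0.0
                mcastaddr: 226.94.1.2    # placeholder multicast address
                mcastport: 5405          # placeholder port
        }
}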

I suspect the problem is somewhere in corosync as nebel1 and nebel2 only see each other:

$ ssh root@nebel2 --  corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.33648138.ip=r(0) ip(10.110.1.2) r(1) ip(10.112.0.2)
runtime.totem.pg.mrp.srp.members.33648138.join_count=1
runtime.totem.pg.mrp.srp.members.33648138.status=joined
runtime.totem.pg.mrp.srp.members.16870922.ip=r(0) ip(10.110.1.1) r(1) ip(10.112.0.1)
runtime.totem.pg.mrp.srp.members.16870922.join_count=1
runtime.totem.pg.mrp.srp.members.16870922.status=joined
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=39
runtime.totem.pg.mrp.srp.members.50425354.status=left

nebel3, on the other hand, only sees itself:

$ ssh root@nebel3 --  corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.50425354.ip=r(0) ip(10.110.1.3) r(1) ip(10.112.0.3)
runtime.totem.pg.mrp.srp.members.50425354.join_count=1
runtime.totem.pg.mrp.srp.members.50425354.status=joined

Both nebel2 and nebel3 think the communication rings are free of faults:

$ ssh root@nebel2 --  corosync-cfgtool -s
Printing ring status.
Local node ID 33648138
RING ID 0
        id      = 10.110.1.2
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.112.0.2
        status  = ring 1 active with no faults
$ ssh root@nebel3 --  corosync-cfgtool -s
Printing ring status.
Local node ID 50425354
RING ID 0
        id      = 10.110.1.3
        status  = ring 0 active with no faults
RING ID 1
        id      = 10.112.0.3
        status  = ring 1 active with no faults

I can ping every participating node from every other node, via all of its connections and IPs.

The corosync.log on nebel2 doesn't mention nebel3 after it left the cluster for the post-upgrade reboot. Likewise, the corosync.log on nebel3 no longer mentions nebel1 and nebel2.

So, what did I miss during the upgrade? How can I get nebel3 to rejoin the original cluster instead of forming its own 1-out-of-3 cluster (with the same resources defined)?

Any help is highly appreciated!

- Arnold

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org