[Pacemaker] Backup ring is marked faulty

Sebastian Kaps Tue, 02 Aug 2011 08:03:54 -0700

Hi,

we're running a two-node cluster with redundant rings.

Ring 0 is a 10 Gb direct connection; ring 1 consists of two 1 Gb interfaces that are bonded in active-backup mode and routed through two independent switches for each node. The ring 1 network is our "normal" 1G LAN and should only be used if the direct 10G connection fails.

I often (about once a day on average, I'd guess) see that ring 1 (and only that one) is marked as FAULTY without any obvious reason.

Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Retransmit List: c76 c7a c7c c7e c80 c82 c84
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Retransmit List: c82
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Marking seqid 568416 ringid 1 interface x.y.z.1 FAULTY - administrative intervention required.
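To watch how often this happens, the relevant syslog lines can be pulled apart with a small script. The following is only a sketch: the names `summarize`, `RETRANSMIT` and `FAULTY` are made up for illustration, and it assumes that the entries in a "Retransmit List" are hexadecimal message sequence numbers, which the log format suggests but which I have not confirmed.

```python
import re

# Two of the log lines quoted above, with spacing normalised.
LOG = """\
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Retransmit List: c82
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Marking seqid 568416 ringid 1 interface x.y.z.1 FAULTY - administrative intervention required.
"""

RETRANSMIT = re.compile(r"Retransmit List:\s*(.*)$")
FAULTY = re.compile(r"Marking seqid (\d+) ringid (\d+) interface (\S+) FAULTY")


def summarize(log_text):
    """Return (retransmit_seqs, faulty_events) found in log_text."""
    retransmits, faulty = [], []
    for line in log_text.splitlines():
        match = RETRANSMIT.search(line)
        if match:
            # Assumption: each entry is a hexadecimal sequence number.
            retransmits.extend(int(tok, 16) for tok in match.group(1).split())
            continue
        match = FAULTY.search(line)
        if match:
            faulty.append({
                "seqid": int(match.group(1)),
                "ring": int(match.group(2)),
                "interface": match.group(3),
            })
    return retransmits, faulty


retransmits, faulty = summarize(LOG)
print(retransmits, faulty)
```

Feeding a whole day of /var/log/messages through `summarize` would show whether the FAULTY markings cluster around bursts of retransmits.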

Whenever I see this, I check whether the other node's address can be pinged (I have never seen any connectivity problems there), then re-enable the ring with "corosync-cfgtool -r", and everything looks OK for a while (i.e. hours or days).
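The manual procedure just described could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the `ping` options are those of Linux iputils, and it assumes that the output of "corosync-cfgtool -s" contains the word FAULTY for a failed ring, as the log message above does.

```python
import subprocess


def ring_is_faulty(status_text):
    """True if any line of the ring status output mentions FAULTY."""
    return any("FAULTY" in line for line in status_text.splitlines())


def peer_reachable(address, run=subprocess.run):
    """True if ping gets replies from address.

    Linux iputils ping: -c is the packet count, -W the reply timeout in seconds.
    Note that this only tests ICMP, not the multicast traffic corosync uses.
    """
    result = run(["ping", "-c", "3", "-W", "2", address],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def reenable_ring(peer_address, run=subprocess.run):
    """Reset the redundant ring state only if a fault is shown and the peer answers."""
    status = run(["corosync-cfgtool", "-s"],
                 capture_output=True, text=True).stdout
    if not ring_is_faulty(status):
        return "no fault reported"
    if not peer_reachable(peer_address, run=run):
        return "peer unreachable, ring left as is"
    run(["corosync-cfgtool", "-r"], check=True)
    return "ring state reset"
```

It would be called on one node as `reenable_ring("x.y.z.2")`, where x.y.z.2 stands in for the partner node's ring 1 address.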

How could I find out why this happens?

What do these "Retransmit List" and seqid (sequence ID, I assume?) values tell me?

Is it safe to re-enable the second ring when the partner node can be pinged successfully?


The totem section of our config looks like this:

totem {
       rrp_mode:       passive
       join:   60
       max_messages:   20
       vsftype:        none
       consensus:      10000
       secauth:        on
       token_retransmits_before_loss_const:    10
       threads:        16
       token:  10000
       version:        2
       interface {
               bindnetaddr:    192.168.1.0
               mcastaddr:      239.250.1.1
               mcastport:      5405
               ringnumber:     0
       }
       interface {
               bindnetaddr:    x.y.z.0
               mcastaddr:      239.250.1.2
               mcastport:      5415
               ringnumber:     1
       }
       clear_node_high_bit:    yes
}
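For comparing the two ring definitions side by side, the block above can be read into a dictionary. This is a minimal sketch, not corosync's own parser: `parse_block` is a hypothetical helper that only understands "key: value" lines and nested "name { ... }" blocks, and the embedded text is a shortened copy of the config above.

```python
CONFIG = """\
totem {
       rrp_mode:       passive
       token:  10000
       interface {
               bindnetaddr:    192.168.1.0
               mcastaddr:      239.250.1.1
               mcastport:      5405
               ringnumber:     0
       }
       interface {
               bindnetaddr:    x.y.z.0
               mcastaddr:      239.250.1.2
               mcastport:      5415
               ringnumber:     1
       }
       clear_node_high_bit:    yes
}
"""


def parse_block(lines):
    """Parse 'key: value' lines and nested 'name {' ... '}' blocks.

    `lines` must be an iterator so that nested calls continue where
    the caller stopped.
    """
    block = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "}":
            return block
        if line.endswith("{"):
            name = line[:-1].strip()
            # A block name may repeat (two interface sections), so keep a list.
            block.setdefault(name, []).append(parse_block(lines))
        else:
            key, _, value = line.partition(":")
            block[key.strip()] = value.strip()
    return block


totem = parse_block(iter(CONFIG.splitlines()))["totem"][0]
rings = {
    int(iface["ringnumber"]): (iface["bindnetaddr"], iface["mcastaddr"],
                               int(iface["mcastport"]))
    for iface in totem["interface"]
}
print(totem["rrp_mode"], rings)
```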

--
Sebastian

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
