Hi!

I think I understand corosync/pacemaker a bit, but now and then it still puzzles me.
Today some node rebooted (still investigating why), and I examined the syslog.

Here's an interesting example:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] New Configuration:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.31) r(1) 
ip(192.168.0.61)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Left:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Joined:
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 2496: memb=2, new=0, lost=2
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: memb: o1 
520295596
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: memb: o5 
587404460
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: lost: o3 
553850028
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: lost: o4 
570627244
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] CLM CONFIGURATION CHANGE
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] New Configuration:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.31) r(1) 
ip(192.168.0.61)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Left:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Joined:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 2496: memb=4, new=2, lost=0

Within one second two nodes left the cluster/ring and then rejoined the 
cluster/ring. Shouldn't the ring number increase on every change?
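
To compare ring numbers across events without reading every CLM line, the
pcmk_peer_update summary lines can be pulled out of the syslog. The following is
only a minimal sketch: the function name membership_events and the regular
expression are made up for illustration, and it assumes the log text looks like
the excerpt above (including the line wrapping added by the mail client).

```python
import re

# Pattern for the pcmk_peer_update summary lines as they appear in the excerpt.
# It is an assumption derived from this log, not from any corosync documentation.
EVENT_RE = re.compile(
    r"pcmk_peer_update:\s+(Transitional|Stable)\s+membership event on ring "
    r"(\d+): memb=(\d+), new=(\d+), lost=(\d+)"
)


def membership_events(log_text):
    """Return (kind, ring, memb, new, lost) tuples in the order they were logged."""
    # Undo the mail wrapping by collapsing all whitespace runs to single spaces.
    flat = " ".join(log_text.split())
    return [
        (m.group(1),) + tuple(int(m.group(i)) for i in range(2, 6))
        for m in EVENT_RE.finditer(flat)
    ]


sample = """\
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update:
Transitional membership event on ring 2496: memb=2, new=0, lost=2
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update: Stable
membership event on ring 2496: memb=4, new=2, lost=0
"""

events = membership_events(sample)
for kind, ring, memb, new, lost in events:
    print(f"ring {ring}: {kind:12s} memb={memb} new={new} lost={lost}")
```

Run against the excerpt above, this lists the transitional and the stable event
with the same ring number, which is exactly the pairing the question is about.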

In the very same second, three nodes left the cluster and joined again:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] CLM CONFIGURATION CHANGE
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] New Configuration:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.31) r(1) 
ip(192.168.0.61)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Left:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Joined:
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 2504: memb=1, new=0, lost=3
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: memb: o1 
520295596
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: lost: o3 
553850028
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: lost: o4 
570627244
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] info: pcmk_peer_update: lost: o5 
587404460
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] CLM CONFIGURATION CHANGE
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] New Configuration:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.31) r(1) 
ip(192.168.0.61)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Left:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ] Members Joined:
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.33) r(1) 
ip(192.168.0.63)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:34:57 o1 corosync[12690]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:34:57 o1 corosync[12690]:  [pcmk  ] notice: pcmk_peer_update: Stable 
membership event on ring 2504: memb=4, new=3, lost=0
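
A side observation on the numbers printed next to the node names: in this log
they match the ring 0 addresses from the CLM lines when the 32-bit value is read
in little-endian byte order. The sketch below only demonstrates that arithmetic
for the values seen here; the helper name nodeid_to_ip is hypothetical, and
nothing is claimed about how corosync assigns node IDs in general.

```python
import ipaddress
import struct


def nodeid_to_ip(nodeid):
    """Read a 32-bit node ID as an IPv4 address stored in little-endian byte order."""
    return str(ipaddress.IPv4Address(struct.pack("<I", nodeid)))


# Node IDs copied from the pcmk_peer_update lines above.
node_ids = {"o1": 520295596, "o3": 553850028, "o4": 570627244, "o5": 587404460}

decoded = {name: nodeid_to_ip(nodeid) for name, nodeid in node_ids.items()}
for name, ip in decoded.items():
    print(name, node_ids[name], hex(node_ids[name]), ip)
```

For o1 this gives 172.20.3.31, and the hexadecimal form of its ID is 0x1f0314ac.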

A moment later I saw this:
Dec 20 10:34:57 rkdvmso1 kernel: [185601.044523] kernel BUG at 
/usr/src/packages/BUILD/ocfs2-1.6/xen/ocfs2/heartbeat.c:67!
[...]
Dec 20 10:34:58 o1 kernel: [185601.044674] Supported: Yes
Dec 20 10:34:58 o1 kernel: [185601.044678]
Dec 20 10:34:59 o1 kernel: [185601.044682] Pid: 14239, comm: ocfs2_controld. 
Not tainted 3.0.42-0.7-xen #1 Sun Microsystems Sun Fire X4100 Server/Sun Fire 
X4100 Server
Dec 20 10:34:59 o1 kernel: [185601.044692] RIP: e030:[<ffffffffa06818f5>]  
[<ffffffffa06818f5>] ocfs2_do_node_down+0x65/0x70 [ocfs2]
Dec 20 10:35:00 o1 kernel: [185601.044745] RSP: e02b:ffff880032331e18  EFLAGS: 
00010246
Dec 20 10:35:00 o1 kernel: [185601.044749] RAX: 0000000000000000 RBX: 
ffff880032960da0 RCX: 000000000000001f
Dec 20 10:35:00 o1 kernel: [185601.044753] RDX: 0000000000000000 RSI: 
ffff8800314b5000 RDI: 000000001f0314ac
[???]

(The BUG messages were interleaved with cluster messages; cLVM and OCFS2 are 
quite chatty. Before the trace completed, SBD kicked in:)

Dec 20 10:34:59 o1 sbd: [12635]: info: Received command off from o3 on disk 
/dev/disk/by-id/dm-name-Shared-E1_part1
Dec 20 10:34:59 o1 sbd: [12636]: info: Received command off from o3 on disk 
/dev/disk/by-id/dm-name-Shared-E2_part1
Dec 20 10:34:59 o1 cluster-dlm: check_fencing_done: 
0192F256F87A4E5CA69BCF2BDF7659FA check_fencing 520295596 wait add 1355810586 
fail 1355996098 last 0
Dec 20 10:34:59 o1 sbd: [12635]: info: sysrq-trigger: o
Dec 20 10:34:59 o1 sbd: [12636]: info: sysrq-trigger: o
Dec 20 10:34:59 o1 sbd: [12635]: EMERG: Rebooting system.  Reason: sbd is 
self-fencing (power-off)
Dec 20 10:34:59 o1 sbd: [12636]: EMERG: Rebooting system.  Reason: sbd is 
self-fencing (power-off)

The following reboot also replaced the kernel 3.0.42-0.7-xen with 
3.0.51-0.7.9-xen (a reboot was intended anyway, but manually ;-)

(The reboot also fenced the DC, and another DC was elected.)

After a short while I saw messages like these:
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #2 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #3 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #4 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #5 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #6 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #7 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #8 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #9 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #10 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #20 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #30 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #40 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #50 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #60 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #70 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #80 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #90 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #100 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #200 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #300 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #400 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #500 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #600 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:39 o1 cmirrord[16392]: [35cRf7c2]  Retry #700 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2]  Retry #800 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2]  Retry #900 of cpg_mcast_joined: 
SA_AIS_ERR_TRY_AGAIN
Dec 20 10:42:40 o1 cmirrord[16392]: [35cRf7c2]  Retry #1000 of 
cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load?
Dec 20 10:42:41 o1 cmirrord[16392]: [35cRf7c2]  Retry #2000 of 
cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load?
Dec 20 10:42:42 o1 cmirrord[16392]: [35cRf7c2]  Retry #3000 of 
cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load?
Dec 20 10:42:43 o1 cmirrord[16392]: [35cRf7c2]  Retry #4000 of 
cpg_mcast_joined: SA_AIS_ERR_TRY_AGAIN - OpenAIS not handling the load?
Dec 20 10:42:44 o1 cluster-dlm: update_cluster: Processing membership 2536
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ] CLM CONFIGURATION CHANGE
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ] New Configuration:
Dec 20 10:42:44 o1 cluster-dlm: dlm_process_node: Skipped active node 
520295596: born-on=2520, last-seen=2536, this-event=2536, last-event=2524
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ]     r(0) ip(172.20.3.31) r(1) 
ip(192.168.0.61)
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ]     r(0) ip(172.20.3.32) r(1) 
ip(192.168.0.62)
Dec 20 10:42:44 o1 cluster-dlm: dlm_process_node: Skipped active node 
537072812: born-on=2512, last-seen=2536, this-event=2536, last-event=2524
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ]     r(0) ip(172.20.3.34) r(1) 
ip(192.168.0.64)
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ]     r(0) ip(172.20.3.35) r(1) 
ip(192.168.0.65)
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ] Members Left:
Dec 20 10:42:44 o1 corosync[12829]:  [CLM   ] Members Joined:
Dec 20 10:42:44 o1 corosync[12829]:  [pcmk  ] notice: pcmk_peer_update: 
Transitional membership event on ring 2536: memb=4, new=0, lost=0

At some later time I saw the start of what I call a "retransmit pyramid":
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d2d
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d2f
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d31
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d33
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d35
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d37
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d39
[...]
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d5f
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d60
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d61
Dec 20 11:07:38 o1 corosync[12829]:  [TOTEM ] Retransmit List: d62
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d68
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d69
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d6a
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d6a
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d6c
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Marking ringid 0 interface 
172.20.3.31 FAULTY
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
[...]
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 0
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 0
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 0
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:42 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:43 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:43 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:43 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:43 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:43 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Marking ringid 0 interface 
172.20.3.31 FAULTY
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 0
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:44 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:45 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:07:45 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Retransmit List: d78 d79 d7a d7b 
d7c d7d d7e d7f d80 d81
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Retransmit List: d78 d79 d7a d7b 
d7c d7d d7e d7f d80 d81
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Retransmit List: d78 d79 d7a d7b 
d7c d7d d7e d7f d80 d81
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 1
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 1
Dec 20 11:08:12 o1 corosync[12829]:  [TOTEM ] Retransmit List: d78 d79 d7a d7b 
d7c d7d d7e d7f d80 d81
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d96
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Marking ringid 0 interface 
172.20.3.31 FAULTY
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9a
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9a
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9a
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9a
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da8 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da9
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da9 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da8 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da9
[...]
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da8 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da9
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da9 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da8 d79 d7a d7b 
d7c d7d d7e d7f d80 d81 d98 d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da9
Dec 20 11:08:13 o1 corosync[12829]:  [TOTEM ] Retransmit List: da9 d79 d82 d98 
d9c d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d98 d9c d9d d9e 
da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d98 d9c d9d d9e 
da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d98 d9c d9d d9e 
da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 0
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d9b d9f d98 d9c 
d9d d9e da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d9c d9e da1 da3 
da5 da7 da9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: d9e da3 da7
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: da3
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: da3 dcc dcd
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dcc
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dcf
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dcf
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dcf dd0
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dd0 dd1
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dd0
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dd4
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dd7
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dd9
Dec 20 11:08:14 o1 corosync[12829]:  [TOTEM ] Retransmit List: dda
[...]
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e74
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e76
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e76 e77
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e76 e78
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e76
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Retransmit List: e76
Dec 20 11:08:15 o1 corosync[12829]:  [TOTEM ] Marking ringid 1 interface 
192.168.0.61 FAULTY
Dec 20 11:08:16 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 1
Dec 20 11:08:16 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 1
Dec 20 11:08:16 o1 corosync[12829]:  [TOTEM ] Automatically recovered ring 1
Dec 20 11:08:19 o1 corosync[12829]:  [TOTEM ] Retransmit List: e82
Dec 20 11:08:19 o1 corosync[12829]:  [TOTEM ] Retransmit List: e85
Dec 20 11:08:24 o1 cib: [12867]: info: cib_stats: Processed 61 operations 
(0.00us average, 0% utilization) in the last 10min

So there was a significant "blackout" of communications. I have always wondered 
whether this is purely a software problem. At the same time I had an even longer 
retransmit list on another node, while some nodes showed no problem at all:

Dec 20 11:08:11 o5 corosync[12677]:  [TOTEM ] Retransmit List: d65 d66 d67 d68 
d69 d6a d6b d6c d6d d6e d6f d70 d71 d72 d73 d74 d75 d76 d77 d78 d79 d7a d7b d7c 
d7d d7e d7f d80 d81 d82
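
To compare nodes, the "Retransmit List" lines can be condensed into a count and
a maximum list length per second and host. This is a rough sketch under the
assumption that the syslog looks like the excerpts above; the names unwrap and
retransmit_stats are invented for this example, and continuation lines are
rejoined because the mail client wrapped them.

```python
import re

# A syslog record starts with a timestamp such as "Dec 20 11:07:41 ".
LINE_START = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
RETRANSMIT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) corosync\[\d+\]:\s+"
    r"\[TOTEM \] Retransmit List: (.*)$"
)
HEX_TOKEN = re.compile(r"[0-9a-f]+")


def unwrap(log_text):
    """Rejoin wrapped lines: anything not starting with a timestamp continues the previous record."""
    records = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if LINE_START.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] += " " + line.strip()
    return records


def retransmit_stats(log_text):
    """Map (timestamp, host) to (number of Retransmit List lines, longest list)."""
    stats = {}
    for record in unwrap(log_text):
        m = RETRANSMIT.match(record)
        if not m:
            continue
        key = (m.group(1), m.group(2))
        # Count only hex sequence numbers so markers like "[...]" are ignored.
        length = sum(1 for tok in m.group(3).split() if HEX_TOKEN.fullmatch(tok))
        count, longest = stats.get(key, (0, 0))
        stats[key] = (count + 1, max(longest, length))
    return stats


sample = """\
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c
d7d
Dec 20 11:07:41 o1 corosync[12829]:  [TOTEM ] Retransmit List: d79 d7a d7b d7c
d7d d7e
Dec 20 11:08:11 o5 corosync[12677]:  [TOTEM ] Retransmit List: d65 d66 d67 d68
d69 d6a d6b d6c d6d d6e d6f d70 d71 d72 d73 d74 d75 d76 d77 d78 d79 d7a d7b d7c
d7d d7e d7f d80 d81 d82
"""

stats = retransmit_stats(sample)
for (stamp, host), (count, longest) in stats.items():
    print(f"{stamp} {host}: {count} lines, longest list {longest}")
```

Feeding the syslogs of all nodes through the same function would show which
hosts report retransmits in a given second and which do not.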

Does anybody know what causes these messages?
Regards,
Ulrich

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
