Hey Team,

I'm seeing strange intermittent failovers on a two-node cluster (roughly once every week or two). When it happens, both nodes become unavailable: one node is marked offline and the other is shown as unclean. Any help with this would be massively appreciated.

Thanks.
Running:
Ubuntu 12.04 (64-bit)
Pacemaker 1.1.6-2ubuntu3.3
Corosync 1.4.2-2ubuntu0.2

Here are the logs:

Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left
Nov 08 14:26:26 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:27 corosync [pcmk ] info: pcmk_ipc_exit: Client attrd (conn=0x12d0230, async-conn=0x12d0230) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client cib (conn=0x12c7d80, async-conn=0x12c7d80) left
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0
Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c2f40 for attrd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12c72a0 for cib/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to cib
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0
Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Sending membership update 12 to crmd

Output of crm configure show:

node p-sbc3 \
    attributes standby="off"
node p-sbc4 \
    attributes standby="off"
primitive fs lsb:FSSofia \
    op monitor interval="2s" enabled="true" timeout="10s" on-fail="standby" \
    meta target-role="Started"
primitive fs-ip ocf:heartbeat:IPaddr2 \
    params ip="10.100.0.90" nic="eth0:0" cidr_netmask="24" \
    op monitor interval="10s"
primitive fs-ip2 ocf:heartbeat:IPaddr2 \
    params ip="10.100.0.99" nic="eth0:1" cidr_netmask="24" \
    op monitor interval="10s"
group cluster_services fs-ip fs-ip2 fs \
    meta target-role="Started"
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    last-lrm-refresh="1348755080" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
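In case it helps with reading the log above, here is a rough Python sketch that pulls out when each Pacemaker daemon dropped its IPC connection to corosync and when it reconnected. It only assumes the two message shapes visible in the excerpt (pcmk_ipc_exit "Client ... left" and pcmk_ipc "Recorded connection ... for ..."). The names EXIT_RE, JOIN_RE and summarize are my own and are not part of Pacemaker or Corosync.

```python
import re

TS = r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
# e.g. "... info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, ...) left"
EXIT_RE = re.compile(TS + r".*pcmk_ipc_exit: Client (\S+) \(conn=")
# e.g. "... info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0"
JOIN_RE = re.compile(TS + r".*pcmk_ipc: Recorded connection \S+ for ([\w-]+)/\d+")


def summarize(lines):
    """Map each client name to the first time it left and first time it reconnected."""
    events = {}
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            events.setdefault(m.group(2), {}).setdefault("left", m.group(1))
            continue
        m = JOIN_RE.match(line)
        if m:
            events.setdefault(m.group(2), {}).setdefault("rejoined", m.group(1))
    return events


# A few lines copied from the log excerpt, used only as sample input.
SAMPLE = [
    "Nov 08 14:26:26 corosync [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x12bebe0, async-conn=0x12bebe0) left",
    "Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc_exit: Client stonith-ng (conn=0x12c3a20, async-conn=0x12c3a20) left",
    "Nov 08 14:26:32 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12bebe0 for stonith-ng/0",
    "Nov 08 14:26:33 corosync [pcmk ] info: pcmk_ipc: Recorded connection 0x12cb600 for crmd/0",
]

if __name__ == "__main__":
    for client, times in summarize(SAMPLE).items():
        print(client, times.get("left"), times.get("rejoined"))
```

Running it against the full corosync log on both nodes should make it easier to see whether all the daemons restart together each time.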
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org