Hi! I have a 2-node cluster with shared storage and SBD fencing. One node was down for maintenance. Due to external reasons, the second node was rebooted. After the reboot, the cluster services never came up:
Oct 29 13:04:21 wcs2 pengine[2362]: warning: stage6: Scheduling Node wcs1 for STONITH
Oct 29 13:04:21 wcs2 crmd[2363]: notice: te_fence_node: Executing reboot fencing operation (53) on wcs1 (timeout=60000)
Oct 29 13:05:33 wcs2 stonith-ng[2359]: error: remote_op_done: Operation reboot of wcs1 by wcs2 for crmd.2363@wcs2.4a3b045d: Timer expired
Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_callback: Stonith operation 2/53:0:0:f56c4538-1ad8-4871-825e-167eb9304677: Timer expired (-62)
Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_callback: Stonith operation 2 for wcs1 failed (Timer expired): aborting transition.
Oct 29 13:05:33 wcs2 crmd[2363]: notice: tengine_stonith_notify: Peer wcs1 was not terminated (st_notify_fence) by wcs2 for wcs2: Timer expired (ref=4a3b045d-cc08-4e2f-8279-a85d113781b2) by client crmd.2363
Oct 29 13:05:33 wcs2 crmd[2363]: notice: run_graph: Transition 0 (Complete=20, Pending=0, Fired=0, Skipped=29, Incomplete=0, Source=/usr/var/lib/pacemaker/pengine/pe-warn-54.bz2): Stopped
Oct 29 13:05:33 wcs2 pengine[2362]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 29 13:05:33 wcs2 pengine[2362]: warning: stage6: Scheduling Node wcs1 for STONITH

And this runs forever in a cycle... Node wcs1 is powered off; shouldn't SBD detect that, and shouldn't the cluster then start the resources?

Best regards,
Alexandr A. Alexandrov

--
Best regards, AAA.
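P.S. In case it helps: I assume the SBD header timeouts can be compared against stonith-timeout roughly like this (the device path is just a placeholder for our shared disk, so this is only a sketch):

    # dump the SBD header to see the configured watchdog and msgwait timeouts
    sbd -d /dev/<shared-disk> dump

    # list the node slots and any pending fence messages on the device
    sbd -d /dev/<shared-disk> list

    # query the cluster-wide fencing timeout; it should exceed msgwait
    crm_attribute --type crm_config --name stonith-timeout --query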