Re: [Pacemaker] resources do not start on survived node after reboot

Andrew Beekhof Wed, 30 Oct 2013 20:27:21 -0700

On 30 Oct 2013, at 1:12 am, Саша Александров <[email protected]> wrote:


> Hi!
> 
> I have a 2-node cluster with shared storage and SBD-fencing.
> One node was down for maintenance.
> Due to external reasons, the second node was rebooted. After the reboot the 
> service never came up:
> 
> Oct 29 13:04:21 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1 
> for STONITH
> Oct 29 13:04:21 wcs2 crmd[2363]:   notice: te_fence_node: Executing reboot 
> fencing operation (53) on wcs1 (timeout=60000)
> Oct 29 13:05:33 wcs2 stonith-ng[2359]:    error: remote_op_done: Operation 
> reboot of wcs1 by wcs2 for [email protected]: Timer expired
> Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_callback: Stonith 
> operation 2/53:0:0:f56c4538-1ad8-4871-825e-167eb9304677: Timer expired (-62)
> Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_callback: Stonith 
> operation 2 for wcs1 failed (Timer expired): aborting transition.
> Oct 29 13:05:33 wcs2 crmd[2363]:   notice: tengine_stonith_notify: Peer wcs1 
> was not terminated (st_notify_fence) by wcs2 for wcs2: Timer expired 
> (ref=4a3b045d-cc08-4e2f-8279-a85d113781b2) by client crmd.2363
> Oct 29 13:05:33 wcs2 crmd[2363]:   notice: run_graph: Transition 0 
> (Complete=20, Pending=0, Fired=0, Skipped=29, Incomplete=0, 
> Source=/usr/var/lib/pacemaker/pengine/pe-warn-54.bz2): Stopped
> Oct 29 13:05:33 wcs2 pengine[2362]:   notice: unpack_config: On loss of CCM 
> Quorum: Ignore
> Oct 29 13:05:33 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1 
> for STONITH
> 
> And this repeats forever in a cycle...
> 
> The node wcs1 is off. Shouldn't SBD detect that, and shouldn't the cluster 
> then start the resources?

The cluster can't start resources until fencing completes.
For some reason SBD is reporting that it is unable to fence wcs1 and so the 
cluster cannot continue.
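
One way to see the loop in the logs is to count how often fencing was scheduled and what result each attempt reported. The script below is only a rough sketch: the function name summarize_fencing is made up for illustration, the two patterns are copied from the log excerpt quoted above, and it assumes the wrapped log lines have been joined back into single lines (the requester ID is shown exactly as the archive displays it).

```python
import re
from collections import Counter

# Patterns taken from the quoted log excerpt; other Pacemaker versions
# may word these messages differently (assumption).
SCHEDULED = re.compile(r"Scheduling Node (\S+) for STONITH")
RESULT = re.compile(r"remote_op_done: Operation (\S+) of (\S+) by (\S+) for .+?: (.+)$")


def summarize_fencing(lines):
    """Count scheduled fencing attempts per node and reported results per (node, result)."""
    scheduled = Counter()
    results = Counter()
    for line in lines:
        m = SCHEDULED.search(line)
        if m:
            scheduled[m.group(1)] += 1
            continue
        m = RESULT.search(line)
        if m:
            # group(2) is the fencing target, group(4) the reported result text
            results[(m.group(2), m.group(4).strip())] += 1
    return scheduled, results


sample = [
    "Oct 29 13:04:21 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1 for STONITH",
    "Oct 29 13:05:33 wcs2 stonith-ng[2359]:    error: remote_op_done: Operation reboot of wcs1 by wcs2 for [email protected]: Timer expired",
    "Oct 29 13:05:33 wcs2 pengine[2362]:  warning: stage6: Scheduling Node wcs1 for STONITH",
]

scheduled, results = summarize_fencing(sample)
print(scheduled["wcs1"], results[("wcs1", "Timer expired")])
```

If every scheduled attempt for wcs1 is followed by a "Timer expired" result and none reports anything else, the cluster is stuck retrying the same fencing operation, which matches the behaviour described above.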
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
