2010/11/8 <renayama19661...@ybb.ne.jp>:
> Hi,
>
> In a simple two-node configuration, a resource hit a monitor error.
>
> ============
> Last updated: Mon Nov 8 10:16:50 2010
> Stack: Heartbeat
> Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: grpDummy
>      prmDummy1-1   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-2   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-3   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-4   (ocf::heartbeat:Dummy): Started srv02
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>    prmDummy1-1: migration-threshold=1 fail-count=1
>
> Failed actions:
>     prmDummy1-1_monitor_30000 (node=srv01, call=7, rc=7, status=complete): not running
>
>
> After the resource failed over, I ran the following commands back to back:
>
> [r...@srv01 ~]# crm_resource -C -r prmDummy1-1 -N srv01; crm_resource -M -r grpDummy -N srv01 -f -Q
>
> ============
> Last updated: Mon Nov 8 10:17:33 2010
> Stack: Heartbeat
> Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: grpDummy
>      prmDummy1-1   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-2   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-3   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-4   (ocf::heartbeat:Dummy): Started srv02
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>
> However, the resource does not move to node srv01.
>
> Does "crm_resource -M" have to be run only after waiting for the S_IDLE state?
>
> Or is this phenomenon a bug?
>
> * I have attached an hb_report archive.
So the problem here is that -f not only enables extra logic in move_resource(), it also sets:

   cib_options |= cib_scope_local|cib_quorum_override;

Combined with the fact that crm_resource -C is not synchronous in 1.0, if you run -M on a non-DC node the updates hit the local CIB while the cluster is still re-probing the resource(s). This results in the two CIBs getting out of sync:

Nov 8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed
Nov 8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed

and the process of re-syncing them results in the behavior you saw.

I think the smartest thing to do here is to drop the cib_scope_local flag from -f.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
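For anyone following along, here is a minimal, self-contained sketch of what "drop cib_scope_local from -f" amounts to. It deliberately does not use the real Pacemaker headers: the enum values mirror the cib_call_options flags from Pacemaker's cib.h only in spirit, and the helper names (parse_force_old, parse_force_new) are stand-ins, not the actual crm_resource.c code.

/* Sketch of the proposed change: "crm_resource -M -f" keeps the quorum
 * override but no longer forces its constraint update into the local CIB.
 * Flag values and helper names are illustrative, not Pacemaker's real code. */

#include <stdio.h>

enum cib_call_flags {
    cib_none            = 0,
    cib_quorum_override = (1 << 0),  /* apply the update even without quorum */
    cib_scope_local     = (1 << 1),  /* apply to the local CIB copy only     */
};

static int cib_options = cib_none;

/* Before the fix: -f made updates local-only *and* quorum-overriding, so on
 * a non-DC node they could race the re-probe started by crm_resource -C.   */
static void parse_force_old(void) {
    cib_options |= cib_scope_local | cib_quorum_override;
}

/* After the fix: keep the quorum override, but let the update travel through
 * the DC like any other change, so the two CIB copies cannot diverge.       */
static void parse_force_new(void) {
    cib_options |= cib_quorum_override;
}

int main(void) {
    parse_force_old();
    printf("old -f: scope_local=%d quorum_override=%d\n",
           !!(cib_options & cib_scope_local),
           !!(cib_options & cib_quorum_override));

    cib_options = cib_none;
    parse_force_new();
    printf("new -f: scope_local=%d quorum_override=%d\n",
           !!(cib_options & cib_scope_local),
           !!(cib_options & cib_quorum_override));
    return 0;
}

The point of the change is that the move constraint still bypasses the quorum check, but it is now routed through the DC rather than written to the local copy, so it cannot be applied while the DC is mid-way through the re-probe triggered by crm_resource -C.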