2010/11/8 <renayama19661...@ybb.ne.jp>:
> Hi,
>
> In a simple two-node configuration, a resource hit a monitor error.
>
> ============
> Last updated: Mon Nov 8 10:16:50 2010
> Stack: Heartbeat
> Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: grpDummy
>      prmDummy1-1   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-2   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-3   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-4   (ocf::heartbeat:Dummy): Started srv02
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>    prmDummy1-1: migration-threshold=1 fail-count=1
>
> Failed actions:
>     prmDummy1-1_monitor_30000 (node=srv01, call=7, rc=7, status=complete): not running
>
>
> After the resource failed over, I ran the following commands back to back:
>
> [r...@srv01 ~]# crm_resource -C -r prmDummy1-1 -N srv01; crm_resource -M -r grpDummy -N srv01 -f -Q
>
> ============
> Last updated: Mon Nov 8 10:17:33 2010
> Stack: Heartbeat
> Current DC: srv02 (f80f87fd-cc09-43c7-80bc-8d9e96de376b) - partition WITHOUT quorum
> Version: 1.0.9-0a40fd0cb9f2fcedef9d1967115c912314c57438
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
>
> Online: [ srv01 srv02 ]
>
>  Resource Group: grpDummy
>      prmDummy1-1   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-2   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-3   (ocf::heartbeat:Dummy): Started srv02
>      prmDummy1-4   (ocf::heartbeat:Dummy): Started srv02
>
> Migration summary:
> * Node srv02:
> * Node srv01:
>
> However, the resource does not move to node srv01.
>
> Does "crm_resource -M" have to be run only after waiting for the S_IDLE state?
>
> Or is this phenomenon a bug?
>
> * I have attached an hb_report archive.
So the problem here is that -f not only enables extra logic in move_resource(), it also sets:

   cib_options |= cib_scope_local|cib_quorum_override;

Combined with the fact that crm_resource -C is not synchronous in 1.0, if you run -M on a non-DC node the updates hit the local CIB while the cluster is still re-probing the resource(s). This results in the two CIBs getting out of sync:

Nov 8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed
Nov 8 10:17:15 srv01 crmd: [5367]: WARN: cib_native_callback: CIB command failed: Application of an update diff failed

and the process of re-syncing them results in the behavior you saw.

I think the smartest thing to do here is to drop the cib_scope_local flag from -f.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
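For anyone following along, here is a minimal, self-contained sketch of what "drop cib_scope_local from -f" amounts to. It deliberately does not use the real Pacemaker headers: the enum values mirror the cib_call_options flags from Pacemaker's cib.h only in spirit, and the helper names (parse_force_old, parse_force_new) are stand-ins, not the actual crm_resource.c code.

/* Sketch of the proposed change: "crm_resource -M -f" keeps the quorum
 * override but no longer forces its constraint update into the local CIB.
 * Flag values and helper names are illustrative, not Pacemaker's real code. */

#include <stdio.h>

enum cib_call_flags {
    cib_none            = 0,
    cib_quorum_override = (1 << 0),  /* apply the update even without quorum */
    cib_scope_local     = (1 << 1),  /* apply to the local CIB copy only     */
};

static int cib_options = cib_none;

/* Before the fix: -f made updates local-only *and* quorum-overriding, so on
 * a non-DC node they could race the re-probe started by crm_resource -C.   */
static void parse_force_old(void) {
    cib_options |= cib_scope_local | cib_quorum_override;
}

/* After the fix: keep the quorum override, but let the update travel through
 * the DC like any other change, so the two CIB copies cannot diverge.       */
static void parse_force_new(void) {
    cib_options |= cib_quorum_override;
}

int main(void) {
    parse_force_old();
    printf("old -f: scope_local=%d quorum_override=%d\n",
           !!(cib_options & cib_scope_local),
           !!(cib_options & cib_quorum_override));

    cib_options = cib_none;
    parse_force_new();
    printf("new -f: scope_local=%d quorum_override=%d\n",
           !!(cib_options & cib_scope_local),
           !!(cib_options & cib_quorum_override));
    return 0;
}

The point of the change is that the move constraint still bypasses the quorum check, but it is now routed through the DC rather than written to the local copy, so it cannot be applied while the DC is mid-way through the re-probe triggered by crm_resource -C.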