2010/9/14 <renayama19661...@ybb.ne.jp>:
> Hi Andrew,
>
> Thank you for your comment.
>
> To summarize the behaviour with the freeze setting:
>
> * At the moment the cluster is divided, the resources stay where they are.
> * When a node is shut down within the divided cluster, its resources do migrate.
> -> Resources are maintained within the divided cluster.
>
> Is my understanding right?
I'd probably summarize it as: "resources are frozen to their current
_partition_". They can only move around within their partition.

So if the partition does not have quorum and
* a node shuts down, the partition can reallocate any services on that
  node, but
* a node disappears, the partition can NOT reallocate any services on
  that node (because it's no longer in our partition).
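(For reference, a minimal sketch of how the property discussed above could
be set with the crm shell; the original test configuration is not shown in
this thread, so treat this as an assumption rather than the poster's setup.)

  # set the cluster-wide policy that freezes resources to their partition
  crm configure property no-quorum-policy=freeze

  # confirm the current value
  crm configure show | grep no-quorum-policy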
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On Fri, Sep 10, 2010 at 7:22 AM, <renayama19661...@ybb.ne.jp> wrote:
>> > Hi,
>> >
>> > We confirmed the behaviour of no-quorum-policy=freeze in a four-node configuration.
>> >
>> > Of course, we understand that quorum handling does not work well on Heartbeat.
>> >
>> > We confirmed how the services behave while the four nodes are stopped, using the following procedure.
>> >
>> > Step1) We start four nodes. (3ACT:1STB)
>> >
>> > Step2) We send cib.xml.
>> >
>> > ============
>> > Last updated: Fri Sep 10 14:16:30 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv01 srv02 srv03 srv04 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv01
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv01
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv01
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv02
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step3) After the cluster is stable, we stop the first node.
>> >
>> > [r...@srv02 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:17:07 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv02 srv03 srv04 ]
>> > OFFLINE: [ srv01 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv02
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step4) After the cluster is stable again, we stop the next node.
>> >  * Because the notification from CCM that quorum has been lost arrives late,
>> >    the two remaining nodes move the resources.
>>
>> That's not strictly true.
>> The movement is initiated before the second node shuts down, so it is
>> considered safe because we still had quorum at the point the decision
>> was made.
>>
>> >
>> > [r...@srv03 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:17:59 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv03 srv04 ]
>> > OFFLINE: [ srv01 srv02 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step5) After the cluster is stable again, we stop one more node.
>> >  * We stopped it after have-quorum in the CIB had become 0.
>> >
>> > [r...@srv03 ~]# cibadmin -Q | more
>> > <cib epoch="102" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
>> >  have-quorum="0" dc-uuid="96faf899-13a6-4550-9d3b-b784f7241d06">
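(Editorial aside: a quick way to watch the partition's quorum state while
running a test like the one above; both commands already appear in this
thread, only the grep filters are added here.)

  # prints the CIB header line containing have-quorum="0" or have-quorum="1"
  cibadmin -Q | grep have-quorum

  # shows "partition with quorum" / "partition WITHOUT quorum"
  crm_mon -1 | grep -i quorum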
>> >
>> > Step6) Some resources moved to the last node.
>> >
>> > [r...@srv04 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:19:43 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition WITHOUT quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv04 ]
>> > OFFLINE: [ srv01 srv02 srv03 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv04  ---->Why FO?
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv04  ---->Why FO?
>>
>> In this case, it is because a member of our partition owned the
>> resource at the time we initiated the move.
>>
>> Unfortunately the scenario here isn't quite testing what you had in mind.
>> You only achieve the expected behavior if you remove the second and
>> third machines from the cluster _ungracefully_.
>> I.e. by fencing them or unplugging them.
>>
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> >
>> > We expected that the resources on the node we stopped in Step5 would not
>> > be moved to the last node, because no-quorum-policy=freeze is set.
>>
>> Freeze still allows recovery within a partition.
>> Recovery can also occur for graceful shutdowns because the partition
>> owned the resource beforehand.
>>
>> > However, looking at the source code, it seems that a resource which is
>> > already started can still be moved even when no-quorum-policy=freeze is in effect.
>> >
>> > (snip)
>> > action_t *
>> > custom_action(resource_t *rsc, char *key, const char *task,
>> >               node_t *on_node, gboolean optional, gboolean save_action,
>> >               pe_working_set_t *data_set)
>> > {
>> >     action_t *action = NULL;
>> >     GListPtr possible_matches = NULL;
>> >     CRM_CHECK(key != NULL, return NULL);
>> >     CRM_CHECK(task != NULL, return NULL);
>> > (snip)
>> >         } else if(is_set(data_set->flags, pe_flag_have_quorum) == FALSE
>> >                   && data_set->no_quorum_policy == no_quorum_freeze) {
>> >             crm_debug_3("Check resource is already active");
>> >             if(rsc->fns->active(rsc, TRUE) == FALSE) {
>> >                 action->runnable = FALSE;
>> >                 crm_debug("%s\t%s (cancelled : quorum freeze)",
>> >                           action->node->details->uname,
>> >                           action->uuid);
>> >             }
>> >
>> >         } else {
>
> === The rest of this message has been omitted ===
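(Editorial aside on the suggestion above to remove nodes "ungracefully":
the commands below are only an illustration of how such a failure might be
simulated on the Heartbeat stack, not part of the original test. The
process name and the udpport value 694 are assumptions based on Heartbeat's
defaults; fencing the node via its configured STONITH device is the other
option Andrew mentions.)

  # on the node that should "disappear": kill Heartbeat outright instead
  # of shutting it down cleanly, so the peers see the member vanish
  killall -9 heartbeat

  # or cut the cluster communication path (assuming the default udpport 694),
  # which looks the same from the surviving partition's point of view
  iptables -A INPUT  -p udp --dport 694 -j DROP
  iptables -A OUTPUT -p udp --dport 694 -j DROP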
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker