2010/9/14 <renayama19661...@ybb.ne.jp>:
> Hi Andrew,
>
> Thank you for your comment.
>
> To summarize the behaviour with the freeze setting:
>
> * At the moment the cluster is divided, the resources stay where they are.
> * When a node is shut down within the divided cluster, its resources do migrate.
> -> Resources are maintained within the divided cluster.
>
> Is my understanding right?
I'd probably summarize it as: "resources are frozen to their current
_partition_". They can only move around within their partition.

So if the partition does not have quorum and
* a node shuts down, the partition can reallocate any services on that
  node, but
* a node disappears, the partition can NOT reallocate any services on
  that node (because it's no longer in our partition).
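(For reference, a minimal sketch of how the property discussed above could
be set with the crm shell; the original test configuration is not shown in
this thread, so treat this as an assumption rather than the poster's setup.)

  # set the cluster-wide policy that freezes resources to their partition
  crm configure property no-quorum-policy=freeze

  # confirm the current value
  crm configure show | grep no-quorum-policy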
>
> Best Regards,
> Hideo Yamauchi.
>
> --- Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On Fri, Sep 10, 2010 at 7:22 AM, <renayama19661...@ybb.ne.jp> wrote:
>> > Hi,
>> >
>> > We confirmed the behaviour of no-quorum-policy=freeze in a four-node configuration.
>> >
>> > Of course, we understand that quorum handling does not work well on Heartbeat.
>> >
>> > We confirmed how the services behave while the four nodes are stopped, using the following procedure.
>> >
>> > Step1) We start four nodes. (3ACT:1STB)
>> >
>> > Step2) We send cib.xml.
>> >
>> > ============
>> > Last updated: Fri Sep 10 14:16:30 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv01 srv02 srv03 srv04 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv01
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv01
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv01
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv02
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step3) After the cluster is stable, we stop the first node.
>> >
>> > [r...@srv02 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:17:07 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv02 srv03 srv04 ]
>> > OFFLINE: [ srv01 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv02
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv02
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step4) After the cluster is stable again, we stop the next node.
>> >  * Because the notification from CCM that quorum has been lost arrives late,
>> >    the two remaining nodes move the resources.
>>
>> That's not strictly true.
>> The movement is initiated before the second node shuts down, so it is
>> considered safe because we still had quorum at the point the decision
>> was made.
>>
>> >
>> > [r...@srv03 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:17:59 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv03 srv04 ]
>> > OFFLINE: [ srv01 srv02 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04  ---->FO
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv03
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith3
>> >      prmStonith3-3      (stonith:external/ssh): Started srv03
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> > Step5) After the cluster is stable again, we stop one more node.
>> >  * We stopped it after have-quorum in the CIB had become 0.
>> >
>> > [r...@srv03 ~]# cibadmin -Q | more
>> > <cib epoch="102" num_updates="3" admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
>> >  have-quorum="0" dc-uuid="96faf899-13a6-4550-9d3b-b784f7241d06">
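(Editorial aside: a quick way to watch the partition's quorum state while
running a test like the one above; both commands already appear in this
thread, only the grep filters are added here.)

  # prints the CIB header line containing have-quorum="0" or have-quorum="1"
  cibadmin -Q | grep have-quorum

  # shows "partition with quorum" / "partition WITHOUT quorum"
  crm_mon -1 | grep -i quorum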
>> >
>> > Step6) Some resources moved to the last node.
>> >
>> > [r...@srv04 ~]# crm_mon -1
>> > ============
>> > Last updated: Fri Sep 10 14:19:43 2010
>> > Stack: Heartbeat
>> > Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition WITHOUT quorum
>> > Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
>> > 4 Nodes configured, unknown expected votes
>> > 7 Resources configured.
>> > ============
>> >
>> > Online: [ srv04 ]
>> > OFFLINE: [ srv01 srv02 srv03 ]
>> >
>> >  Resource Group: Group01
>> >      Dummy01    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group02
>> >      Dummy02    (ocf::heartbeat:Dummy): Started srv04
>> >      Dummy02-2  (ocf::heartbeat:Dummy): Started srv04
>> >  Resource Group: Group03
>> >      Dummy03    (ocf::heartbeat:Dummy): Started srv04  ---->Why FO?
>> >      Dummy03-2  (ocf::heartbeat:Dummy): Started srv04  ---->Why FO?
>>
>> In this case, it is because a member of our partition owned the
>> resource at the time we initiated the move.
>>
>> Unfortunately the scenario here isn't quite testing what you had in mind.
>> You only achieve the expected behavior if you remove the second and
>> third machines from the cluster _ungracefully_.
>> I.e. by fencing them or unplugging them.
>>
>> >  Resource Group: grpStonith1
>> >      prmStonith1-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith2
>> >      prmStonith2-3      (stonith:external/ssh): Started srv04
>> >  Resource Group: grpStonith4
>> >      prmStonith4-3      (stonith:external/ssh): Started srv04
>> >
>> >
>> > We expected that the resources on the node we stopped in Step5 would not
>> > be moved to the last node, because no-quorum-policy=freeze is set.
>>
>> Freeze still allows recovery within a partition.
>> Recovery can also occur for graceful shutdowns because the partition
>> owned the resource beforehand.
>>
>> > However, looking at the source code, it seems that a resource which is
>> > already started can still be moved even when no-quorum-policy=freeze is in effect.
>> >
>> > (snip)
>> > action_t *
>> > custom_action(resource_t *rsc, char *key, const char *task,
>> >               node_t *on_node, gboolean optional, gboolean save_action,
>> >               pe_working_set_t *data_set)
>> > {
>> >     action_t *action = NULL;
>> >     GListPtr possible_matches = NULL;
>> >     CRM_CHECK(key != NULL, return NULL);
>> >     CRM_CHECK(task != NULL, return NULL);
>> > (snip)
>> >         } else if(is_set(data_set->flags, pe_flag_have_quorum) == FALSE
>> >                   && data_set->no_quorum_policy == no_quorum_freeze) {
>> >             crm_debug_3("Check resource is already active");
>> >             if(rsc->fns->active(rsc, TRUE) == FALSE) {
>> >                 action->runnable = FALSE;
>> >                 crm_debug("%s\t%s (cancelled : quorum freeze)",
>> >                           action->node->details->uname,
>> >                           action->uuid);
>> >             }
>> >
>> >         } else {
>
> === The rest of this message has been omitted ===
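(Editorial aside on the suggestion above to remove nodes "ungracefully":
the commands below are only an illustration of how such a failure might be
simulated on the Heartbeat stack, not part of the original test. The
process name and the udpport value 694 are assumptions based on Heartbeat's
defaults; fencing the node via its configured STONITH device is the other
option Andrew mentions.)

  # on the node that should "disappear": kill Heartbeat outright instead
  # of shutting it down cleanly, so the peers see the member vanish
  killall -9 heartbeat

  # or cut the cluster communication path (assuming the default udpport 694),
  # which looks the same from the surviving partition's point of view
  iptables -A INPUT  -p udp --dport 694 -j DROP
  iptables -A OUTPUT -p udp --dport 694 -j DROP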
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker