On Fri, Sep 10, 2010 at 7:22 AM,  <renayama19661...@ybb.ne.jp> wrote:
> Hi,
>
> We verified the behaviour of no-quorum-policy=freeze in a four-node cluster.
>
> Of course, we understand that quorum handling does not work well on the Heartbeat stack.
>
> We observed the following behaviour while stopping the nodes one at a time, using the procedure below.
>
> Step1) We start all four nodes (3 active : 1 standby).
>
> Step2) We load cib.xml.
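
(The cib.xml itself is not attached, so I am assuming the properties under test are
equivalent to something like the following crm shell commands; the stonith-enabled
line is my guess, based on the external/ssh STONITH resources in the output below:)

  # assumed cluster properties under test (not taken from the actual cib.xml)
  crm configure property no-quorum-policy=freeze
  crm configure property stonith-enabled=true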
>
> ============
> Last updated: Fri Sep 10 14:16:30 2010
> Stack: Heartbeat
> Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 4 Nodes configured, unknown expected votes
> 7 Resources configured.
> ============
>
> Online: [ srv01 srv02 srv03 srv04 ]
>
>  Resource Group: Group01
>     Dummy01    (ocf::heartbeat:Dummy): Started srv01
>     Dummy01-2  (ocf::heartbeat:Dummy): Started srv01
>  Resource Group: Group02
>     Dummy02    (ocf::heartbeat:Dummy): Started srv02
>     Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>  Resource Group: Group03
>     Dummy03    (ocf::heartbeat:Dummy): Started srv03
>     Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>  Resource Group: grpStonith1
>     prmStonith1-3      (stonith:external/ssh): Started srv01
>  Resource Group: grpStonith2
>     prmStonith2-3      (stonith:external/ssh): Started srv02
>  Resource Group: grpStonith3
>     prmStonith3-3      (stonith:external/ssh): Started srv03
>  Resource Group: grpStonith4
>     prmStonith4-3      (stonith:external/ssh): Started srv04
>
> Step3) Once the cluster is stable, we stop the first node (srv01).
>
> [r...@srv02 ~]# crm_mon -1
> ============
> Last updated: Fri Sep 10 14:17:07 2010
> Stack: Heartbeat
> Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 4 Nodes configured, unknown expected votes
> 7 Resources configured.
> ============
>
> Online: [ srv02 srv03 srv04 ]
> OFFLINE: [ srv01 ]
>
>  Resource Group: Group01
>     Dummy01    (ocf::heartbeat:Dummy): Started srv04 ---->FO
>     Dummy01-2  (ocf::heartbeat:Dummy): Started srv04 ---->FO
>  Resource Group: Group02
>     Dummy02    (ocf::heartbeat:Dummy): Started srv02
>     Dummy02-2  (ocf::heartbeat:Dummy): Started srv02
>  Resource Group: Group03
>     Dummy03    (ocf::heartbeat:Dummy): Started srv03
>     Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>  Resource Group: grpStonith1
>     prmStonith1-3      (stonith:external/ssh): Started srv03
>  Resource Group: grpStonith2
>     prmStonith2-3      (stonith:external/ssh): Started srv02
>  Resource Group: grpStonith3
>     prmStonith3-3      (stonith:external/ssh): Started srv03
>  Resource Group: grpStonith4
>     prmStonith4-3      (stonith:external/ssh): Started srv04
>
>
> Step4) Once the cluster is stable again, we stop the next node (srv02).
>  * Our reading: because the CCM notification that quorum has been lost arrives late,
> the two remaining nodes still move the resources.

That's not strictly true.
The move is initiated before the second node finishes shutting down, so it is
considered safe: we still had quorum at the point the decision was made.
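
A quick way to check which view the DC had when it computed that transition is to
look at the quorum flag at that moment, with the same tools you used below
(a rough sketch):

  # on the DC, around the time the recovery transition is calculated
  crm_mon -1 | grep -i quorum       # "partition with quorum" vs "partition WITHOUT quorum"
  cibadmin -Q | grep have-quorum    # have-quorum="1" vs have-quorum="0"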

>
> [r...@srv03 ~]# crm_mon -1
> ============
> Last updated: Fri Sep 10 14:17:59 2010
> Stack: Heartbeat
> Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 4 Nodes configured, unknown expected votes
> 7 Resources configured.
> ============
>
> Online: [ srv03 srv04 ]
> OFFLINE: [ srv01 srv02 ]
>
>  Resource Group: Group01
>     Dummy01    (ocf::heartbeat:Dummy): Started srv04
>     Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>  Resource Group: Group02
>     Dummy02    (ocf::heartbeat:Dummy): Started srv04 ---->FO
>     Dummy02-2  (ocf::heartbeat:Dummy): Started srv04 ---->FO
>  Resource Group: Group03
>     Dummy03    (ocf::heartbeat:Dummy): Started srv03
>     Dummy03-2  (ocf::heartbeat:Dummy): Started srv03
>  Resource Group: grpStonith1
>     prmStonith1-3      (stonith:external/ssh): Started srv03
>  Resource Group: grpStonith2
>     prmStonith2-3      (stonith:external/ssh): Started srv04
>  Resource Group: grpStonith3
>     prmStonith3-3      (stonith:external/ssh): Started srv03
>  Resource Group: grpStonith4
>     prmStonith4-3      (stonith:external/ssh): Started srv04
>
> Step5) Once the cluster is stable again, we stop one more node (srv03).
>  * We stopped it only after confirming that have-quorum had become 0 in the CIB.
>
> [r...@srv03 ~]# cibadmin -Q | more
> <cib epoch="102" num_updates="3" admin_epoch="0" 
> validate-with="pacemaker-1.0" crm_feature_set="3.0.1"
> have-quorum="0" dc-uuid="96faf899-13a6-4550-9d3b-b784f
> 7241d06">
>
> Step6) Some resources moved to the last remaining node (srv04).
>
> [r...@srv04 ~]# crm_mon -1
> ============
> Last updated: Fri Sep 10 14:19:43 2010
> Stack: Heartbeat
> Current DC: srv04 (96faf899-13a6-4550-9d3b-b784f7241d06) - partition WITHOUT quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 4 Nodes configured, unknown expected votes
> 7 Resources configured.
> ============
>
> Online: [ srv04 ]
> OFFLINE: [ srv01 srv02 srv03 ]
>
>  Resource Group: Group01
>     Dummy01    (ocf::heartbeat:Dummy): Started srv04
>     Dummy01-2  (ocf::heartbeat:Dummy): Started srv04
>  Resource Group: Group02
>     Dummy02    (ocf::heartbeat:Dummy): Started srv04
>     Dummy02-2  (ocf::heartbeat:Dummy): Started srv04
>  Resource Group: Group03
>     Dummy03    (ocf::heartbeat:Dummy): Started srv04 ---->Why FO?
>     Dummy03-2  (ocf::heartbeat:Dummy): Started srv04 ---->Why FO?

In this case, it is because a member of our partition owned the
resource at the time we initiated the move.

Unfortunately, the scenario here isn't quite testing what you had in mind.
You only get the behaviour you expected if you remove the second and
third machines from the cluster _ungracefully_,
i.e. by fencing them or unplugging them.
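
For example, instead of a clean "/etc/init.d/heartbeat stop", something along these
lines on srv02 and srv03 (a rough sketch; either command simulates a hard failure):

  # simulate an ungraceful node failure (run on the node to be "killed")
  killall -9 heartbeat           # kill Heartbeat without letting it sign off
  # ...or hard-reset the machine on the spot (needs sysrq enabled)
  echo b > /proc/sysrq-trigger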

>  Resource Group: grpStonith1
>     prmStonith1-3      (stonith:external/ssh): Started srv04
>  Resource Group: grpStonith2
>     prmStonith2-3      (stonith:external/ssh): Started srv04
>  Resource Group: grpStonith4
>     prmStonith4-3      (stonith:external/ssh): Started srv04
>
>
> We expected that the resources running on the node that left in Step5 would not
> be moved afterwards,
> because no-quorum-policy=freeze is set.

Freeze still allows recovery within a partition.
Recovery can also occur for graceful shutdowns because the partition
owned the resource beforehand.
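
For reference, the no-quorum-policy options are described in "Pacemaker Explained";
roughly:

  # no-quorum-policy values (cluster property):
  #   ignore  - continue all resource management as if quorum were still held
  #   freeze  - keep managing resources the partition already owns, but do not
  #             recover resources from nodes outside the partition
  #   stop    - stop all resources in the affected partition
  #   suicide - fence all nodes in the affected partition
  #
  # the current setting can be checked with, e.g., the crm shell:
  crm configure show | grep no-quorum-policy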

> However, looking at the source code, it seems that a resource which is already
> started can still be moved under no-quorum-policy=freeze.
>
> (snip)
> action_t *
> custom_action(resource_t *rsc, char *key, const char *task,
>              node_t *on_node, gboolean optional, gboolean save_action,
>              pe_working_set_t *data_set)
> {
>        action_t *action = NULL;
>        GListPtr possible_matches = NULL;
>        CRM_CHECK(key != NULL, return NULL);
>        CRM_CHECK(task != NULL, return NULL);
> (snip)
>                } else if(is_set(data_set->flags, pe_flag_have_quorum) == FALSE
>                        && data_set->no_quorum_policy == no_quorum_freeze) {
>                        crm_debug_3("Check resource is already active");
>                        if(rsc->fns->active(rsc, TRUE) == FALSE) {
>                                action->runnable = FALSE;
>                                crm_debug("%s\t%s (cancelled : quorum freeze)",
>                                          action->node->details->uname,
>                                          action->uuid);
>                        }
>
>                } else {
>
> (snip)
>
> Is this the intended behaviour of no-quorum-policy=freeze?
> Is there a detailed explanation of the no-quorum-policy=freeze behaviour
> documented somewhere?
>
> Best Regards,
> Hideo Yamauchi.
>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
