Re: [Pacemaker] pacemaker shutdown waits for a failover

Andrew Beekhof Mon, 28 Jul 2014 16:17:34 -0700

On 28 Jul 2014, at 5:07 pm, Liron Amitzi <lir...@imperva.com> wrote:


> When I run "service pacemaker stop" it takes a long time. I see that it stops 
> all the resources, then starts them on the other node, and only then does the 
> "stop" command complete.

Ahhh! ha1 was the DC (Designated Controller), which coordinates the whole 
transition, including the starts on ha2.

It appears to be deliberate; I found this commit from 2008 where the behaviour 
was introduced:
   https://github.com/beekhof/pacemaker/commit/7bf55f0

I could change it, but I'm no longer sure that would be a good idea, since it 
would increase service downtime: electing and bootstrapping a new DC introduces 
additional delays before the cluster can bring up any resources.

I assume there is a particular resource that takes a long time to start?
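One way to check that from the log excerpt quoted below is to measure the gaps 
between its timestamps. The following is a minimal Python sketch, not a 
Pacemaker tool; the event labels are paraphrases of the quoted log lines, and 
because the excerpt elides lines with "...", each gap is only an upper bound on 
how long the operation that opened it took.

```python
from datetime import datetime

# Timestamps transcribed from the quoted corosync.log excerpt (all on the
# same day). Labels are paraphrases; lines elided with "..." are not included,
# so each interval is an upper bound on the operation that begins it.
EVENTS = [
    ("15:14:15", "ha1: stop JavaSrv initiated"),
    ("15:14:41", "ha1: JavaSrv stop ok, stop OracleDB initiated"),
    ("15:15:08", "ha1: OracleDB and ip_resource stopped, start ip_resource on ha2"),
    ("15:15:09", "ha2: start OracleDB initiated"),
    ("15:15:51", "ha2: OracleDB monitor scheduled, start JavaSrv initiated"),
    ("15:27:09", "ha1: pacemakerd shutdown complete"),
]


def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two same-day HH:MM:SS stamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def timeline(events):
    """Return (label, seconds until the next logged event) pairs."""
    return [
        (label, seconds_between(stamp, next_stamp))
        for (stamp, label), (next_stamp, _) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, secs in timeline(EVENTS):
        print(f"{secs:4d}s  {label}")
    total = seconds_between(EVENTS[0][0], EVENTS[-1][0])
    print(f"total: {total // 60}m{total % 60:02d}s")
```

By this arithmetic the JavaSrv and OracleDB stops on ha1 account for about 53 
seconds, while roughly 11 minutes (678 s) pass between "start JavaSrv_start_0 
on ha2" being initiated and ha1 logging "Shutdown complete". That would point at 
JavaSrv as the slow start, assuming nothing else happened in the elided lines.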


> I have 3 resources: IP, OracleDB and JavaSrv.
> 
> This is the output on the screen:
> [root@ha1 ~]# service pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
> Waiting for cluster services to 
> unload:....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>                                               [  OK  ]
> [root@ha1 ~]#
> 
> And these are parts of the log (/var/log/cluster/corosync.log):
> Jun 29 15:14:15 [28031] ha1    pengine:   notice: stage6:  Scheduling Node ha1 for shutdown
> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    ip_resource     (Started ha1 -> ha2)
> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    OracleDB        (Started ha1 -> ha2)
> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    JavaSrv    (Started ha1 -> ha2)
> Jun 29 15:14:15 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
> Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
> ...
> Jun 29 15:14:41 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
> Jun 29 15:14:41 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
> Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000]  cancelled
> Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
> ...
> Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
> ...
> Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 8: start ip_resource_start_0 on ha2
> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  Executing crm-event (21): do_shutdown on ha1
> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  crm-event (21) is a local shutdown
> Jun 29 15:15:09 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 10: start OracleDB_start_0 on ha2
> Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 11: monitor OracleDB_monitor_60000 on ha2
> Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 13: start JavaSrv_start_0 on ha2
> ...
> Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: pcmk_child_exit:         Child process cib exited (pid=28027, rc=0)
> Jun 29 15:27:09 [28023] ha1 pacemakerd:   notice: pcmk_shutdown_worker:    Shutdown complete
> Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: main:    Exiting pacemakerd
> 
> 
> 
> ________________________________________
> From: Andrew Beekhof <and...@beekhof.net>
> Sent: Monday, July 28, 2014 2:08
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] pacemaker shutdown waits for a failover
> 
> On 28 Jul 2014, at 12:40 am, Liron Amitzi <lir...@imperva.com> wrote:
> 
>> Hi guys,
>> I'm working with pacemaker 1.1.7-6 with corosync 1.4.1-15 (2 nodes) and am 
>> facing strange behavior.
>> I have several resources, including an Oracle database, and when I try to stop 
>> Pacemaker or reboot the active node it takes a very long time. I checked, and 
>> it seems that Pacemaker waits until the failover is complete before stopping. 
>> I expect it to stop the resources, initiate the failover and exit, not wait 
>> until everything is up on the other node.
> 
> That's what I would expect too.
> Can you show us something that would suggest this isn't happening?
> 
>> Am I missing something? Is this expected?
>> Thanks,
>> Liron
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
