Re: [Pacemaker] pacemaker shutdown waits for a failover

Liron Amitzi Mon, 28 Jul 2014 00:13:31 -0700

When I run "service pacemaker stop" it takes a long time. I can see that it 
stops all the resources, then starts them on the other node, and only then 
does the "stop" command complete.
I have 3 resources: IP, OracleDB and JavaSrv.

This is the output on the screen:
[root@ha1 ~]# service pacemaker stop
Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
Waiting for cluster services to 
unload:....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
                                              [  OK  ]
[root@ha1 ~]#

And these are parts of the log (/var/log/cluster/corosync.log):
Jun 29 15:14:15 [28031] ha1    pengine:   notice: stage6:  Scheduling Node ha1 for shutdown
Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    ip_resource     (Started ha1 -> ha2)
Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    OracleDB        (Started ha1 -> ha2)
Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    JavaSrv    (Started ha1 -> ha2)
Jun 29 15:14:15 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
...
Jun 29 15:14:41 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
Jun 29 15:14:41 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000]  cancelled
Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
...
Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
...
Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 8: start ip_resource_start_0 on ha2
Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  Executing crm-event (21): do_shutdown on ha1
Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  crm-event (21) is a local shutdown
Jun 29 15:15:09 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 10: start OracleDB_start_0 on ha2
Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 11: monitor OracleDB_monitor_60000 on ha2
Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  Initiating action 13: start JavaSrv_start_0 on ha2
...
Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: pcmk_child_exit:         Child process cib exited (pid=28027, rc=0)
Jun 29 15:27:09 [28023] ha1 pacemakerd:   notice: pcmk_shutdown_worker:    Shutdown complete
Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: main:    Exiting pacemakerd
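
To make the timing easier to see, here is a rough sketch (not part of Pacemaker) that takes the timestamps from the log excerpt above and computes how long each phase took. The phase labels, the EVENTS list and the elapsed() helper are my own naming, and the script assumes all entries fall on the same day, which holds for this excerpt.

```python
from datetime import datetime

# (label, timestamp) pairs copied from the corosync.log excerpt above.
# The labels are informal descriptions, not Pacemaker terminology.
EVENTS = [
    ("node scheduled for shutdown, JavaSrv stop initiated", "Jun 29 15:14:15"),
    ("JavaSrv stop confirmed, OracleDB stop initiated", "Jun 29 15:14:41"),
    ("OracleDB and ip_resource stops confirmed", "Jun 29 15:15:08"),
    ("OracleDB monitor and JavaSrv start initiated on ha2", "Jun 29 15:15:51"),
    ("pacemakerd logs 'Shutdown complete'", "Jun 29 15:27:09"),
]


def parse_time(stamp):
    """Parse only the HH:MM:SS field; assumes every entry is on the same day."""
    return datetime.strptime(stamp.split()[-1], "%H:%M:%S")


def elapsed(events):
    """Return (label, seconds since the previous event) for each event after the first."""
    times = [(label, parse_time(stamp)) for label, stamp in events]
    result = []
    for (_, previous), (label, current) in zip(times, times[1:]):
        result.append((label, int((current - previous).total_seconds())))
    return result


for label, seconds in elapsed(EVENTS):
    print(f"{seconds:4d}s  until: {label}")
print(f"{sum(s for _, s in elapsed(EVENTS)):4d}s  total")
```

On these timestamps the stops and the OracleDB start account for about a minute and a half. The remaining gap, roughly eleven minutes, is between the JavaSrv start being initiated on ha2 and pacemakerd reporting "Shutdown complete"; the log lines I cut out would be needed to say what happened in between.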

________________________________________
From: Andrew Beekhof <and...@beekhof.net>
Sent: Monday, July 28, 2014 2:08
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker shutdown waits for a failover

On 28 Jul 2014, at 12:40 am, Liron Amitzi <lir...@imperva.com> wrote:

> Hi guys,
> I'm working with pacemaker 1.1.7-6 with corosync 1.4.1-15 (2 nodes) and 
> facing a strange behavior.
> I have several resources including Oracle database, and when I try to stop 
> the pacemaker or reboot the active node it takes a very long time. I checked 
> it and it seems that pacemaker waits until the failover is complete before 
> stopping. I expect it to stop the resources, initiate the failover and stop, 
> not wait until everything is up on the other node.

That's what I would expect too.
Can you show us something that would suggest this isn't happening?

> Am I missing something? Is this expected?
> Thanks,
> Liron
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

