When I run "service pacemaker stop" it takes a long time. I can see that it stops all the resources, then starts them on the other node, and only then does the "stop" command complete. I have 3 resources: IP, OracleDB and JavaSrv.
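To put numbers on that sequence, the timestamps in /var/log/cluster/corosync.log can be compared. The following is only a rough sketch: it assumes the "Mon DD HH:MM:SS" line prefix and the message substrings seen in this thread's log excerpt. The phase labels, the function names and the placeholder year are my own and are not part of Pacemaker.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log excerpt, e.g. "Jun 29 15:14:15".
TS = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")

# (label, substring) pairs. The substrings come from the log excerpt in this
# thread; the labels are hypothetical names chosen for this sketch.
MARKERS = [
    ("shutdown scheduled", "Scheduling Node"),
    ("do_shutdown issued", "do_shutdown on"),
    ("pacemakerd shutdown complete", "Shutdown complete"),
]


def phase_times(lines):
    """Return [(label, datetime)] for the first line matching each marker."""
    found = {}
    for line in lines:
        m = TS.match(line)
        if not m:
            continue
        for label, needle in MARKERS:
            if label not in found and needle in line:
                # The log carries no year; 2000 is an arbitrary placeholder,
                # so only differences within one log run are meaningful.
                found[label] = datetime.strptime(
                    "2000 " + m.group(1), "%Y %b %d %H:%M:%S"
                )
    return [(label, found[label]) for label, _ in MARKERS if label in found]


def phase_gaps(lines):
    """Return [(from_label, to_label, seconds)] between consecutive markers."""
    events = phase_times(lines)
    return [
        (a, b, (tb - ta).total_seconds())
        for (a, ta), (b, tb) in zip(events, events[1:])
    ]


def print_gaps(path="/var/log/cluster/corosync.log"):
    """Print the gaps for a log file; the default path is the one in this post."""
    with open(path) as fh:
        for a, b, secs in phase_gaps(fh):
            print(f"{a} -> {b}: {secs:.0f}s")
```

Run against the three matching lines from the excerpt in this message, it reports 53 seconds between "Scheduling Node" and "do_shutdown", and 721 seconds between "do_shutdown" and "Shutdown complete".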
This is the output on the screen: [root@ha1 ~]# service pacemaker stop Signaling Pacemaker Cluster Manager to terminate: [ OK ] Waiting for cluster services to unload:.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... [ OK ] [root@ha1 ~]# And these are parts of the log (/var/log/cluster/corosync.log): Jun 29 15:14:15 [28031] ha1 pengine: notice: stage6: Scheduling Node ha1 for shutdown Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move ip_resource (Started ha1 -> ha2) Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move OracleDB (Started ha1 -> ha2) Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move JavaSrv (Started ha1 -> ha2) Jun 29 15:14:15 [28032] ha1 crmd: info: te_rsc_command: Initiating action 12: stop JavaSrv_stop_0 on ha1 (local) Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop ... 
Jun 29 15:14:41 [28032] ha1 crmd: info: process_lrm_event: LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
Jun 29 15:14:41 [28032] ha1 crmd: info: te_rsc_command: Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000] cancelled
Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
...
Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
...
Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 8: start ip_resource_start_0 on ha2
Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: Executing crm-event (21): do_shutdown on ha1
Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: crm-event (21) is a local shutdown
Jun 29 15:15:09 [28032] ha1 crmd: info: te_rsc_command: Initiating action 10: start OracleDB_start_0 on ha2
Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 11: monitor OracleDB_monitor_60000 on ha2
Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 13: start JavaSrv_start_0 on ha2
...
Jun 29 15:27:09 [28023] ha1 pacemakerd: info: pcmk_child_exit: Child process cib exited (pid=28027, rc=0)
Jun 29 15:27:09 [28023] ha1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
Jun 29 15:27:09 [28023] ha1 pacemakerd: info: main: Exiting pacemakerd

________________________________________
From: Andrew Beekhof <and...@beekhof.net>
Sent: Monday, July 28, 2014 2:08
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] pacemaker shutdown waits for a failover

On 28 Jul 2014, at 12:40 am, Liron Amitzi <lir...@imperva.com> wrote:

> Hi guys,
> I'm working with pacemaker 1.1.7-6 with corosync 1.4.1-15 (2 nodes) and
> facing a strange behavior.
> I have several resources including Oracle database, and when I try to stop
> the pacemaker or reboot the active node it takes a very long time. I checked
> it and it seems that pacemaker waits until the failover is complete before
> stopping. I expect it to stop the resources, initiate the failover and stop,
> not wait until everything is up on the other node.

That's what I would expect too.
Can you show us something that would suggest this isn't happening?

> Am I missing something? Is this expected?
> Thanks,
> Liron
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org