On 28 Jul 2014, at 5:07 pm, Liron Amitzi <lir...@imperva.com> wrote:
> When I run "service pacemaker stop" it takes a long time, I see that it stops > all the resources, then starts them on the other node, and only then the > "stop" command is completed. Ahhh! It was the DC. It appears to be deliberate, I found this commit from 2008 where the behaviour was introduced: https://github.com/beekhof/pacemaker/commit/7bf55f0 I could change it, but I'm no longer sure this would be a good idea as it would increase service downtime. (Electing and bootstrapping a new DC introduces additional delays before the cluster can bring up any resources). I assume there is a particular resource that takes a long time to start? > I have 3 resources, IP, OracleDB and JavaSrv > > This is the output on the screen: > [root@ha1 ~]# service pacemaker stop > Signaling Pacemaker Cluster Manager to terminate: [ OK ] > Waiting for cluster services to > unload:.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
> [ OK ]
> [root@ha1 ~]#
>
> And these are parts of the log (/var/log/cluster/corosync.log):
> Jun 29 15:14:15 [28031] ha1 pengine: notice: stage6: Scheduling Node ha1 for shutdown
> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move ip_resource (Started ha1 -> ha2)
> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move OracleDB (Started ha1 -> ha2)
> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move JavaSrv (Started ha1 -> ha2)
> Jun 29 15:14:15 [28032] ha1 crmd: info: te_rsc_command: Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
> Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
> ...
> Jun 29 15:14:41 [28032] ha1 crmd: info: process_lrm_event: LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
> Jun 29 15:14:41 [28032] ha1 crmd: info: te_rsc_command: Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
> Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000] cancelled
> Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
> ...
> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
> ...
> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 8: start ip_resource_start_0 on ha2
> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: Executing crm-event (21): do_shutdown on ha1
> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: crm-event (21) is a local shutdown
> Jun 29 15:15:09 [28032] ha1 crmd: info: te_rsc_command: Initiating action 10: start OracleDB_start_0 on ha2
> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 11: monitor OracleDB_monitor_60000 on ha2
> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 13: start JavaSrv_start_0 on ha2
> ...
> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: pcmk_child_exit: Child process cib exited (pid=28027, rc=0)
> Jun 29 15:27:09 [28023] ha1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: main: Exiting pacemakerd
>
> ________________________________________
> From: Andrew Beekhof <and...@beekhof.net>
> Sent: Monday, July 28, 2014 2:08
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] pacemaker shutdown waits for a failover
>
> On 28 Jul 2014, at 12:40 am, Liron Amitzi <lir...@imperva.com> wrote:
>
>> Hi guys,
>> I'm working with pacemaker 1.1.7-6 with corosync 1.4.1-15 (2 nodes) and
>> facing a strange behavior.
>> I have several resources including Oracle database, and when I try to stop
>> the pacemaker or reboot the active node it takes a very long time. I checked
>> it and it seems that pacemaker waits until the failover is complete before
>> stopping. I expect it to stop the resources, initiate the failover and stop,
>> not wait until everything is up on the other node.
>
> Thats what I would expect too.
> Can you show us something that would suggest this isn't happening?
>
>> Am i missing something? Is this expected?
>> Thanks,
>> Liron
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
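
On the question of which resource is slow to start: one rough way to check is to measure the gaps between the crmd "Initiating action" lines in the quoted log. The sketch below is only an illustration. The function names, the regular expression and the assumed year 2014 (the log timestamps carry no year) are assumptions made here, not anything Pacemaker provides. The excerpt also elides lines with "...", so each gap is an upper bound on how long an action took, not a measurement.

```python
import re
from datetime import datetime

# Lines copied (unwrapped) from the corosync.log excerpt quoted in this thread.
LOG_EXCERPT = """\
Jun 29 15:14:15 [28031] ha1 pengine: notice: stage6: Scheduling Node ha1 for shutdown
Jun 29 15:14:15 [28032] ha1 crmd: info: te_rsc_command: Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
Jun 29 15:14:41 [28032] ha1 crmd: info: te_rsc_command: Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 8: start ip_resource_start_0 on ha2
Jun 29 15:15:09 [28032] ha1 crmd: info: te_rsc_command: Initiating action 10: start OracleDB_start_0 on ha2
Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 11: monitor OracleDB_monitor_60000 on ha2
Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 13: start JavaSrv_start_0 on ha2
Jun 29 15:27:09 [28023] ha1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
"""

# Assumption: the log has no year, so one is supplied to get full datetimes.
ASSUMED_YEAR = 2014

# Matches either an "Initiating action" line or the final "Shutdown complete".
EVENT_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) .*?"
    r"(?:Initiating action \d+: (?P<action>\w+ \S+ on \S+)"
    r"|(?P<done>Shutdown complete))"
)


def parse_events(text):
    """Return (timestamp, label) pairs for the lines EVENT_RE recognises."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m is None:
            continue  # not an action or shutdown line
        ts = datetime.strptime(
            f"{ASSUMED_YEAR} {m.group('ts')}", "%Y %b %d %H:%M:%S"
        )
        label = m.group("action") or "shutdown complete"
        events.append((ts, label))
    return events


def gaps(events):
    """Seconds between each event and the next one, labelled by the earlier event."""
    return [
        (label, int((next_ts - ts).total_seconds()))
        for (ts, label), (next_ts, _) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, seconds in gaps(parse_events(LOG_EXCERPT)):
        print(f"{seconds:5d}s after: {label}")
```

On the quoted excerpt the largest gap, 678 seconds, lies between initiating the JavaSrv start on ha2 and "Shutdown complete". That hints that JavaSrv may be the slow resource, though the elided log lines would be needed to confirm it.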