>> When I run "service pacemaker stop" it takes a long time, I see that it
>> stops all the resources, then starts them on the other node, and only then
>> the "stop" command is completed.
>
>Ahhh! It was the DC.
>
>It appears to be deliberate, I found this commit from 2008 where the behaviour
>was introduced:
> https://github.com/beekhof/pacemaker/commit/7bf55f0
>
>I could change it, but I'm no longer sure this would be a good idea as it
>would increase service downtime.
>(Electing and bootstrapping a new DC introduces additional delays before the
>cluster can bring up any resources).
>
>I assume there is a particular resource that takes a long time to start?
>

Yes, mainly JavaSrv, which takes quite a lot of time to start...
So you're saying this is by design: because the server I'm rebooting is the DC,
the "stop" command waits until the resources are running on the other node, and
I suffer because my resources take a long time to start?
Got it, thanks a lot for your response.
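To see where the time goes, the timestamps on the "Initiating action" lines in
the log quoted below can be diffed. What follows is only a rough sketch, not
anything shipped with Pacemaker: parse_events and gaps are hypothetical helper
names, the year is a placeholder because the syslog-style timestamps carry
none, and it assumes one log entry per line as in the raw corosync.log (not
the mail-wrapped copy) and an English locale for the month names.

```python
import re
from datetime import datetime

# Matches a syslog-style timestamp followed by either a transition action
# being initiated by crmd or pacemakerd's final "Shutdown complete" message.
EVENT_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*?"
    r"(?:Initiating action \d+: (?P<op>\w+) (?P<rsc>\S+) on (?P<node>\S+)"
    r"|(?P<done>Shutdown complete))"
)


def parse_events(lines, year=2000):
    """Return (timestamp, label) pairs for the lines that match EVENT_RE.

    `year` is a placeholder (assumption): the log lines do not record one,
    and only the differences between timestamps are used.
    """
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["done"]:
            label = "shutdown complete"
        else:
            label = f"{m['op']} {m['rsc']} on {m['node']}"
        events.append((ts, label))
    return events


def gaps(events):
    """Seconds elapsed between each event and the next matched event."""
    return [
        (label, int((next_ts - ts).total_seconds()))
        for (ts, label), (next_ts, _) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    import sys

    # Usage: python shutdown_gaps.py /var/log/cluster/corosync.log
    with open(sys.argv[1]) as log:
        for label, seconds in gaps(parse_events(log)):
            print(f"{seconds:6d}s  after  {label}")
```

Applied to just the lines quoted below, the last gap (from initiating
start JavaSrv_start_0 on ha2 at 15:15:51 to "Shutdown complete" at 15:27:09)
comes out at 678 seconds. Lines are elided in that stretch, so not all of it is
necessarily JavaSrv's start, but it fits the picture that the stop command on
the DC returns only after the slowest resource is up on ha2.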
>
>> I have 3 resources, IP, OracleDB and JavaSrv
>>
>> This is the output on the screen:
>> [root@ha1 ~]# service pacemaker stop
>> Signaling Pacemaker Cluster Manager to terminate: [ OK ]
>> Waiting for cluster services to unload:..............................[...]..............................
>> [ OK ]
>> [root@ha1 ~]#
>>
>> And these are parts of the log (/var/log/cluster/corosync.log):
>> Jun 29 15:14:15 [28031] ha1 pengine: notice: stage6: Scheduling Node ha1 for shutdown
>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move ip_resource (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move OracleDB (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move JavaSrv (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28032] ha1 crmd: info: te_rsc_command: Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
>> Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
>> ...
>> Jun 29 15:14:41 [28032] ha1 crmd: info: process_lrm_event: LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
>> Jun 29 15:14:41 [28032] ha1 crmd: info: te_rsc_command: Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000] cancelled
>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
>> ...
>> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
>> ...
>> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 8: start ip_resource_start_0 on ha2
>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: Executing crm-event (21): do_shutdown on ha1
>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: crm-event (21) is a local shutdown
>> Jun 29 15:15:09 [28032] ha1 crmd: info: te_rsc_command: Initiating action 10: start OracleDB_start_0 on ha2
>> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 11: monitor OracleDB_monitor_60000 on ha2
>> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 13: start JavaSrv_start_0 on ha2
>> ...
>> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: pcmk_child_exit: Child process cib exited (pid=28027, rc=0)
>> Jun 29 15:27:09 [28023] ha1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
>> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: main: Exiting pacemakerd
>>

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org