On 3 Aug 2014, at 4:07 pm, Liron Amitzi <lir...@imperva.com> wrote:

>>>>> When I run "service pacemaker stop" it takes a long time, I see that it
>>>>> stops all the resources, then starts them on the other node, and only
>>>>> then the "stop" command is completed.
>>>>
>>>> Ahhh! It was the DC.
>>>>
>>>> It appears to be deliberate, I found this commit from 2008 where the
>>>> behaviour was introduced:
>>>> https://github.com/beekhof/pacemaker/commit/7bf55f0
>>>>
>>>> I could change it, but I'm no longer sure this would be a good idea as it
>>>> would increase service downtime.
>>>> (Electing and bootstrapping a new DC introduces additional delays before
>>>> the cluster can bring up any resources).
>>>>
>>>> I assume there is a particular resource that takes a long time to start?
>>>>
>>> Yes, mainly the JavaSrv takes quite a lot of time...
>>
>> Do you have any resources that need to start after JavaSrv?
>> If not there might be some magic you can use...
>
> No I don't, the Java is the last one. If I manage to do a "magic" it will
> help me a lot...
1. You _may_ be able to set op_no_wait as a meta-attribute for your java resource. 2. You could change the agent's start action to return early and set a large start-delay for the recurring monitor operation (we usually recommend the exact opposite) 3. You could set start-delay > START_DELAY_THRESHOLD (aka. 5 * 60 * 1000) #2 might be the least worst > >>> So you say this is by design since the server I'm rebooting is the DC, and >>> I suffer because my resources take long time to start? >> >> Essentially, yes. >> >>> Got it, thanks a lot for your response. >>> >>>> >>>>> I have 3 resources, IP, OracleDB and JavaSrv >>>>> >>>>> This is the output on the screen: >>>>> [root@ha1 ~]# service pacemaker stop >>>>> Signaling Pacemaker Cluster Manager to terminate: [ OK ] >>>>> Waiting for cluster services to >>>>> >unload:.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 
>>>>> [ OK ]
>>>>> [root@ha1 ~]#
>>>>>
>>>>> And these are parts of the log (/var/log/cluster/corosync.log):
>>>>> Jun 29 15:14:15 [28031] ha1 pengine: notice: stage6: Scheduling Node ha1 for shutdown
>>>>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move ip_resource (Started ha1 -> ha2)
>>>>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move OracleDB (Started ha1 -> ha2)
>>>>> Jun 29 15:14:15 [28031] ha1 pengine: notice: LogActions: Move JavaSrv (Started ha1 -> ha2)
>>>>> Jun 29 15:14:15 [28032] ha1 crmd: info: te_rsc_command: Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
>>>>> Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
>>>>> ...
>>>>> Jun 29 15:14:41 [28032] ha1 crmd: info: process_lrm_event: LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) ok
>>>>> Jun 29 15:14:41 [28032] ha1 crmd: info: te_rsc_command: Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
>>>>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on lsb::ha-dbora::OracleDB for client 28032, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] CRM_meta_interval=[60000] cancelled
>>>>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
>>>>> ...
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, confirmed=true) ok
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
>>>>> ...
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: process_lrm_event: LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, confirmed=true) ok
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_rsc_command: Initiating action 8: start ip_resource_start_0 on ha2
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: Executing crm-event (21): do_shutdown on ha1
>>>>> Jun 29 15:15:08 [28032] ha1 crmd: info: te_crm_command: crm-event (21) is a local shutdown
>>>>> Jun 29 15:15:09 [28032] ha1 crmd: info: te_rsc_command: Initiating action 10: start OracleDB_start_0 on ha2
>>>>> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 11: monitor OracleDB_monitor_60000 on ha2
>>>>> Jun 29 15:15:51 [28032] ha1 crmd: info: te_rsc_command: Initiating action 13: start JavaSrv_start_0 on ha2
>>>>> ...
>>>>> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: pcmk_child_exit: Child process cib exited (pid=28027, rc=0)
>>>>> Jun 29 15:27:09 [28023] ha1 pacemakerd: notice: pcmk_shutdown_worker: Shutdown complete
>>>>> Jun 29 15:27:09 [28023] ha1 pacemakerd: info: main: Exiting pacemakerd
>>>>>
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
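To see where the roughly 13 minutes of "service pacemaker stop" went, the timestamps in the quoted corosync.log excerpt can simply be diffed. The sketch below is a minimal illustration in Python: the event labels are paraphrases of the quoted log lines, and treating START_DELAY_THRESHOLD (5 * 60 * 1000) as milliseconds is an assumption. The "JavaSrv start window" runs from the JavaSrv start being initiated on ha2 until ha1 logs "Shutdown complete", so it is only an upper bound on how long JavaSrv itself took to start.

```python
from datetime import datetime

# (label, HH:MM:SS) pairs paraphrased from the quoted corosync.log excerpt.
EVENTS = [
    ("JavaSrv stop initiated on ha1", "15:14:15"),
    ("JavaSrv stopped; OracleDB stop initiated on ha1", "15:14:41"),
    ("OracleDB and ip_resource stopped on ha1; ip_resource start sent to ha2", "15:15:08"),
    ("OracleDB start initiated on ha2", "15:15:09"),
    ("OracleDB started; JavaSrv start initiated on ha2", "15:15:51"),
    ("pacemakerd on ha1 logs 'Shutdown complete'", "15:27:09"),
]

# Value quoted in the reply; the millisecond unit is an assumption.
START_DELAY_THRESHOLD_MS = 5 * 60 * 1000


def seconds_between(start: str, end: str) -> int:
    """Whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def step_durations(events):
    """Seconds from each event to the next one, labelled by the earlier event."""
    return [
        (label, seconds_between(t0, t1))
        for (label, t0), (_, t1) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for label, secs in step_durations(EVENTS):
        print(f"{secs:4d} s from: {label}")
    total = seconds_between(EVENTS[0][1], EVENTS[-1][1])
    java = seconds_between(EVENTS[4][1], EVENTS[-1][1])
    print(f"total: {total} s; JavaSrv start window: {java} s ({java / total:.1%})")
    print(f"START_DELAY_THRESHOLD: {START_DELAY_THRESHOLD_MS} ms")
```

On these timestamps, stopping JavaSrv and OracleDB on ha1 takes under a minute, while the window after JavaSrv's start is initiated on ha2 accounts for 678 of the 774 seconds (about 87.6%). That is consistent with the explanation in the thread: because ha1 is the DC, its shutdown waits for JavaSrv to come up on ha2.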