>> When I run "service pacemaker stop", it takes a long time: I see that it 
>> stops all the resources, starts them on the other node, and only then 
>> does the "stop" command complete.
>
>Ahhh! It was the DC.
>
>It appears to be deliberate, I found this commit from 2008 where the behaviour 
>was introduced:
>   https://github.com/beekhof/pacemaker/commit/7bf55f0
>
>I could change it, but I'm no longer sure this would be a good idea as it 
>would increase service downtime.
>(Electing and bootstrapping a new DC introduces additional delays before the 
>cluster can bring up any resources).
>
>I assume there is a particular resource that takes a long time to start?
>
Yes, mainly JavaSrv, which takes quite a lot of time...
So you're saying this is by design: since the server I'm rebooting is the DC, 
I suffer because my resources take a long time to start?
Got it, thanks a lot for your response.
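
Since the slow shutdown only bites when the node being stopped is the DC, it 
can help to check which node currently holds the DC role before rebooting. A 
quick one-liner (the node names are taken from the logs below; crm_mon output 
formatting may vary slightly between Pacemaker versions):

```shell
# One-shot cluster status; the header includes a line such as
# "Current DC: ha1 ..." naming the node that holds the DC role.
crm_mon -1 | grep "Current DC"
```

If the node about to be rebooted is the DC, expect the full 
stop-on-ha1 / start-on-ha2 sequence shown in the logs to run before 
"service pacemaker stop" returns.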

>
>> I have 3 resources, IP, OracleDB and JavaSrv
>>
>> This is the output on the screen:
>> [root@ha1 ~]# service pacemaker stop
>> Signaling Pacemaker Cluster Manager to terminate:          [  OK  ]
>> Waiting for cluster services to 
>> unload:.................................................. [dots continue 
>> for the duration of the resource migration, roughly 13 minutes]
>>                                               [  OK  ]
>> [root@ha1 ~]#
>>
>> And these are parts of the log (/var/log/cluster/corosync.log):
>> Jun 29 15:14:15 [28031] ha1    pengine:   notice: stage6:  Scheduling Node 
>> ha1 for shutdown
>> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    
>> ip_resource     (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    
>> OracleDB        (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28031] ha1    pengine:   notice: LogActions:      Move    
>> JavaSrv    (Started ha1 -> ha2)
>> Jun 29 15:14:15 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 12: stop JavaSrv_stop_0 on ha1 (local)
>> Jun 29 15:14:15 ha1 lrmd: [28029]: info: rsc:JavaSrv:16: stop
>> ...
>> Jun 29 15:14:41 [28032] ha1       crmd:     info: process_lrm_event:       
>> LRM operation JavaSrv_stop_0 (call=16, rc=0, cib-update=447, confirmed=true) 
>> ok
>> Jun 29 15:14:41 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 9: stop OracleDB_stop_0 on ha1 (local)
>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: cancel_op: operation monitor[13] on 
>> lsb::ha-dbora::OracleDB for client 28032, its parameters: 
>> CRM_meta_name=[monitor] crm_feature_set=[3.0.6] CRM_meta_timeout=[600000] 
>> CRM_meta_interval=[60000]  cancelled
>> Jun 29 15:14:41 ha1 lrmd: [28029]: info: rsc:OracleDB:17: stop
>> ...
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       
>> LRM operation OracleDB_stop_0 (call=17, rc=0, cib-update=448, 
>> confirmed=true) ok
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 7: stop ip_resource_stop_0 on ha1 (local)
>> ...
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: process_lrm_event:       
>> LRM operation ip_resource_stop_0 (call=18, rc=0, cib-update=449, 
>> confirmed=true) ok
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 8: start ip_resource_start_0 on ha2
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  Executing 
>> crm-event (21): do_shutdown on ha1
>> Jun 29 15:15:08 [28032] ha1       crmd:     info: te_crm_command:  crm-event 
>> (21) is a local shutdown
>> Jun 29 15:15:09 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 10: start OracleDB_start_0 on ha2
>> Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 11: monitor OracleDB_monitor_60000 on ha2
>> Jun 29 15:15:51 [28032] ha1       crmd:     info: te_rsc_command:  
>> Initiating action 13: start JavaSrv_start_0 on ha2
>> ...
>> Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: pcmk_child_exit:         
>> Child process cib exited (pid=28027, rc=0)
>> Jun 29 15:27:09 [28023] ha1 pacemakerd:   notice: pcmk_shutdown_worker:    
>> Shutdown complete
>> Jun 29 15:27:09 [28023] ha1 pacemakerd:     info: main:    Exiting pacemakerd
>>
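
One way to keep "service pacemaker stop" itself from blocking for the whole 
migration is to put the node in standby first, so the resources move to the 
other node while the cluster (and the DC) is still fully up, and only then 
stop Pacemaker. This is an untested sketch using crm_attribute; the node name 
ha1 comes from the thread:

```shell
# Put ha1 in standby: Pacemaker migrates its resources to the peer
# node while ha1 is still an active cluster member.
crm_attribute --type nodes --node ha1 --name standby --update on

# Wait until crm_mon -1 shows the resources running on ha2, then stop
# Pacemaker on ha1 -- there is nothing left for the shutdown to migrate.
service pacemaker stop

# After the reboot, clear the standby attribute so ha1 can host
# resources again.
crm_attribute --type nodes --node ha1 --name standby --delete
```

The surviving node still has to elect a new DC either way; this only changes 
when the resource moves happen relative to the "stop" command, not the total 
resource downtime.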

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
