[Pacemaker] Time to a service stop is very long.

renayama19661014 Thu, 21 Oct 2010 01:34:19 -0700

Hi,

We tested the cluster behavior with no-quorum-policy set to freeze.
In a cluster where the freeze setting had taken effect, we stopped the service.


However, stopping the service took a very long time.

We set "shutdown-escalation" to five minutes to shorten the test time.
However, stopping the service on one node still takes more than five minutes.
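
For reference, the two cluster properties mentioned above can also be set from the crm shell rather than through cib.xml. This is only a minimal sketch, assuming the crm shell is installed on the node; the test described in this mail loaded the settings from cib.xml.

```shell
# Assumption: the crm shell is available on a running cluster node.
# Freeze resource management in a partition that has lost quorum.
crm configure property no-quorum-policy=freeze
# Let crmd escalate a pending shutdown after five minutes.
crm configure property shutdown-escalation=5min
```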

I confirmed this with the following procedure.

Step1) Start four nodes and load cib.xml.
Step2) Block Heartbeat communication so that the cluster splits into two groups of two nodes.
Step3) The nodes freeze.
Step4) On the two nodes of one group, we stop Heartbeat at the same time.

[r...@srv03 ~]# service heartbeat stop
Stopping High-Availability services:                       
[r...@srv04 ~]# service heartbeat stop
Stopping High-Availability services:                       

Step5) Heartbeat on one of the two nodes stops within a few minutes.
[r...@srv04 ~]# service heartbeat stop
Stopping High-Availability services:                       [  OK  ]

Step6) However, Heartbeat on the other node does not stop until much more time has passed.
 * The shutdown-escalation timer starts, but the value we set (5min) does not seem to take effect.

[r...@srv03 ~]# service heartbeat stop
Stopping High-Availability services:                       [  OK  ] 

Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
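
As a rough check of the timing, the gap between the shutdown request (do_shutdown_req) and the escalation timer firing (crm_timer_popped) in the srv03 log above can be computed with a small shell sketch. The helper name to_seconds is only illustrative, and the 300-second value assumes that shutdown-escalation was applied as five minutes.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    # Split on ":" and strip one leading zero so "08" is not read as octal.
    IFS=: read -r h m s <<EOF
$1
EOF
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Timestamps copied from the srv03 log above.
requested=$(to_seconds "16:46:57")   # do_shutdown_req
escalated=$(to_seconds "16:57:20")   # crm_timer_popped

elapsed=$(( escalated - requested ))
configured=300                       # assumed: shutdown-escalation of five minutes

echo "elapsed: ${elapsed}s, configured: ${configured}s, excess: $(( elapsed - configured ))s"
```

With these timestamps the timer fired 623 seconds after the request, roughly twice the configured five minutes.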


Is it correct behavior for the service stop to take this long?

 * Because the log is very large, I did not attach it.
 * If the log is needed, I will submit it through Bugzilla.

Best Regards,
Hideo Yamauchi.


