Hi,

Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Dec 17, 2009 at 09:18:20AM +0100, Andrew Beekhof wrote:
>> On Wed, Dec 16, 2009 at 5:55 PM, Oscar Remírez de Ganuza Satrústegui
>> <oscar...@unav.es> wrote:
>>
>>> [snip]
>>> What is happening here? As it appears in the log, the timeout is supposed to
>>> be 20s (20000ms), and the service just took 3s to shut down.
>>> Is it a problem with lrmd?
>> Looks like it.

> Don't think so. Here's the logs again:
>
> Dec 15 20:12:55 herculespre lrmd: [8559]: info: rsc:mysql-horde-service:38: stop
>
> lrmd invokes the RA to stop mysql. Whatever happened between this
> time and the following.
>
> 20:13:14 [Note] /usr/local/etc2/mysql-horde/libexec/mysqld: Normal shutdown
> 20:13:17 [Note] /usr/local/etc2/mysql-horde/libexec/mysqld: Shutdown
> Dec 15 20:13:17 herculespre lrmd: [8559]: WARN: mysql-horde-service:stop
> process (PID 12270) timed out (try 1). Killing with signal SIGTERM (15).
>
> It could be that you were unlucky here and that the database
> really took around 20 seconds to shutdown. If it is so, then
Oh, thanks! You are right!
The command to stop the mysql resource was sent at 20:12:55, but the mysql service did not start shutting down until 20:13:14 and finished at 20:13:17 (22 seconds, which is more than the 20 s timeout).

How can I change the timeout for the start or stop operations?

> please increase your timeouts. You also mentioned somewhere that
> 5s is set for a monitor timeout, that's way too low for any kind
> of resource. There's a chapter on applications in HA environments
> in a paper I recently presented (http://tinyurl.com/yg7u4bd).
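
In case it helps anyone reading the archives: if I understand the crm shell correctly, the per-operation timeouts live in the resource definition itself, so raising them should look something like the sketch below (the op values are only my own illustration, not a recommendation):

    # sketch only: the resource name is from this thread, the values
    # are assumptions -- adjust to the real configuration
    crm configure edit mysql-horde-service
    # ...then raise the timeouts on the op lines, for example:
    #   op start timeout=120s
    #   op stop timeout=120s
    #   op monitor interval=30s timeout=30s
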
We had configured very low timeouts for the monitors too. When I tried today to change them, even the crm shell warned me and suggested higher values:

crm(live)# configure edit
WARNING: mysql-horde-nfs: timeout 10s for monitor_0 is smaller than the advised 40
WARNING: mysql-horde-service: timeout 10s for monitor_0 is smaller than the advised 15
WARNING: pingd: timeout 10s for monitor_0 is smaller than the advised 20

I have read your paper and understand the importance of tuning the timeout values correctly, so as not to cause false positives and unnecessary unavailability.
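
As far as I can tell, the "advised" numbers in those warnings come from each resource agent's own meta-data, so they can be checked up front with something like the following (assuming the ocf:heartbeat:mysql agent here; ours may differ):

    # prints the agent's description, parameters and its advisory
    # operation timeouts -- the numbers the warnings compare against
    crm ra info ocf:heartbeat:mysql
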

Just two last questions:
Is it 'normal' for a resource to be set to "unmanaged" just because the stop operation timed out once? And is it possible to configure the cluster to try to stop a resource more than once, as can be done for the start operation with the cluster property start-failure-is-fatal="false"?
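
From what I have been reading in the documentation, the reaction to a failed stop seems to be controlled by the stop operation's on-fail attribute (fence when STONITH is configured, block otherwise, and a blocked resource is what shows up as unmanaged). Purely as a sketch of what I might try, with the RA name and timeout being my own guesses:

    # hedged sketch: make the reaction to a failed stop explicit;
    # on-fail=fence needs working STONITH, while on-fail=block leaves
    # the resource unmanaged (which looks like what happened to us)
    primitive mysql-horde-service ocf:heartbeat:mysql \
        op stop timeout=120s on-fail=fence
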

Thank you very much for your help!
I really appreciate it!

Regards,

---
Oscar Remírez de Ganuza
Servicios Informáticos
Universidad de Navarra
Ed. de Derecho, Campus Universitario
31080 Pamplona (Navarra), Spain
phone: +34 948 425600 Ext. 3130
http://www.unav.es/SI

