Hi again,

I have been digging through the documentation and thought I should answer my own questions, just to share them with the list; maybe someone else will find them interesting too.
Oscar Remírez de Ganuza Satrústegui wrote:
>> What is happening here? As it appears in the log, the timeout is supposed to be 20s (20000ms), and the service just took 3s to shut down. Is it a problem with lrmd?
>
> Looks like it. It could be that you were unlucky here and that the database really took around 20 seconds to shut down. If it is so, then...

Oh, thanks! You are right! The command to shut down the mysql resource was sent at 20:12:55, but the mysql service did not start shutting down until 20:13:14, finishing at 20:13:17 (22 seconds > timeout of 20s).

How is it possible to change the timeout for start or stop operations?
Have a look here: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-operation-defaults.html#id525162
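For the archives, that page translated into crm shell syntax looks roughly like this (the agent class lsb:mysql and all the values below are only an illustration, not necessarily our real configuration):

    # Cluster-wide default timeout for all operations:
    crm configure op_defaults timeout="60s"

    # Or per resource and per operation (lsb:mysql is just an example agent):
    crm configure primitive mysql-horde-service lsb:mysql \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="30s" timeout="40s"

The per-operation values take precedence over op_defaults, so the defaults only act as a fallback for operations that do not define their own timeout.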
As found here (http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html):

"Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, then the cluster will fence the node in order to be able to start the resource elsewhere. If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout."

> Please increase your timeouts. You also mentioned somewhere that 5s is set for a monitor timeout; that's way too low for any kind of resource. There's a chapter on applications in HA environments in a paper I recently presented (http://tinyurl.com/yg7u4bd).

We had configured very low timeouts for the monitors too. When I tried to change them today, even the crm shell alerted and advised me:

crm(live)# configure edit
WARNING: mysql-horde-nfs: timeout 10s for monitor_0 is smaller than the advised 40
WARNING: mysql-horde-service: timeout 10s for monitor_0 is smaller than the advised 15
WARNING: pingd: timeout 10s for monitor_0 is smaller than the advised 20

I have read your paper and understand the importance of tuning the timeout values correctly, so as not to cause false positives and unavailability.

Just two last questions:

Is it 'normal' for a resource to be set "unmanaged" just because the stop operation timed out once?
Is it possible to configure the cluster to try more than once to stop a resource? (as is possible for the start operation with the cluster property start-failure-is-fatal="false")
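If I have understood the documentation correctly, something along these lines should cover both attributes (the values are only my first guess, to be adjusted after testing):

    # Do not give up permanently after a single failed start:
    crm configure property start-failure-is-fatal="false"

    # Expire recorded failures after two minutes, so the cluster
    # can retry (e.g. the stop) once the failure timeout passes:
    crm configure rsc_defaults failure-timeout="120s"

failure-timeout can also be set as a meta attribute on an individual resource instead of in rsc_defaults, if only one resource needs it.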
I will configure the failure-timeout attribute and run some tests.

Thank you very much for your time building this software and helping us to use it!
Regards,

---
Oscar Remírez de Ganuza
Servicios Informáticos
Universidad de Navarra
Ed. de Derecho, Campus Universitario
31080 Pamplona (Navarra), Spain
tfno: +34 948 425600 Ext. 3130
http://www.unav.es/SI
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker