Hi, thanks, I have configured these values now. I hope we won't face this problem again. Otherwise, as I said, I have turned on the debug mode of the ping RA, and at the next maintenance window I'll turn on cluster debug mode as well, so we would have more log information to find the cause of this problem.
Thanks again.

Kind regards,
Patrik

Best Regards
Patrik Rapposch, BSc
System Administration
KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria
Phone: +43 3842 805-915
Fax: +43 3842 82930-500
patrik.rappo...@knapp.com
www.KNAPP.com
Commercial register number: FN 138870x
Commercial register court: Leoben

Lars Ellenberg <lars.ellenb...@linbit.com>
11.01.2011 14:47
Reply to: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
To: pacemaker@oss.clusterlabs.org
Cc:
Subject: Re: [Pacemaker] pingd process dies for no reason

On Tue, Jan 11, 2011 at 11:24:35AM +0100, patrik.rappo...@knapp.com wrote:
> we already made changes to the interval and timeout (<op
> id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>).
>
> how big should dampen be set?
>
> please correct me if i am wrong, as i calculate it as follows:
> assuming the last check was ok and the failure takes place in the next
> second, there would be 29s until the next check starts, then another 10
> seconds of timeout, plus 5 seconds of dampen. this would be 44 seconds,
> isn't that enough?

I think "dampen" needs to be larger than the monitoring interval.
And the timeout on the operation should be large enough that ping, even if the remote is unreachable for the first time, will time out by itself (and not be killed prematurely by lrmd because the operation timeout elapsed).

Try with:
  interval 15s, dampen 20
  instance parameter timeout: something explicit, if you want to.
  instance parameter attempts: something explicit, if you want to.
  monitor operation timeout=60s

BTW, someone should really implement the fping based ping RA ...
Or did I miss it?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
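For anyone following the arithmetic in this thread, here is a small Python sketch of Patrik's estimate (29s + 10s + 5s = 44s) and of Lars's rule of thumb that dampen should be larger than the monitor interval. The names `PingSettings`, `worst_case_delay` and `dampen_exceeds_interval` are made up for illustration and are not part of Pacemaker or the ping RA. The delay formula just mirrors Patrik's back-of-the-envelope model, so treat the result as a rough upper bound, not as a statement of how the cluster actually behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingSettings:
    """Hypothetical container for the three values discussed in the thread."""
    interval_s: int    # monitor operation interval, in seconds
    op_timeout_s: int  # monitor operation timeout, in seconds
    dampen_s: int      # "dampen" instance parameter, in seconds


def worst_case_delay(s: PingSettings) -> int:
    """Patrik's estimate: the failure happens one second after a good check,
    so we wait out the rest of the interval, then the full operation timeout,
    then the dampen delay."""
    return (s.interval_s - 1) + s.op_timeout_s + s.dampen_s


def dampen_exceeds_interval(s: PingSettings) -> bool:
    """Lars's rule of thumb: dampen should be larger than the monitor interval."""
    return s.dampen_s > s.interval_s


# Values quoted in the thread.
original = PingSettings(interval_s=30, op_timeout_s=10, dampen_s=5)
suggested = PingSettings(interval_s=15, op_timeout_s=60, dampen_s=20)

for name, settings in (("original", original), ("suggested", suggested)):
    print(name, worst_case_delay(settings), dampen_exceeds_interval(settings))
```

With the original values the rule of thumb is not met (dampen 5s against a 30s interval), while the suggested values meet it (dampen 20s against a 15s interval).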