Hi,

I've installed Heartbeat 3.0.3 and Pacemaker 1.0.9. I have a resource which executes a script to check the service:

primitive kolab_imapd ocf:heartbeat:kolab-service \
        params service="all" monitor_script="/usr/local/bin/check-imap.py" \
        meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
        operations $id="operations-imap" \
        op monitor interval="20s" timeout="30s" on-fail="restart" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"

I ran an I/O stress test using bonnie++ and started to see this message:

Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the operation operation monitor[21] on ocf::kolab-service::kolab_imapd for client 4722, its parameters: CRM_meta_interval=[20000] monitor_script=[/usr/local/bin/check-imap.py] CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all] stayed in operation list for 32740 ms (longer than 10000 ms)

The problem is that I also get these messages under high I/O without the stress test, for example while running backups.

If I understand the message correctly, the monitor operation had not started yet; it was waiting in some work queue to be started.

If I execute a command while the stress test is running, it is slow (approx. 3 seconds) but it works. For example, I can run "crm configure show" and the output appears in 3 or 4 seconds.

The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.

Regards,
Diego

--
Diego Woitasen

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
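
To quantify how long operations sit in the queue during backups, a small script along these lines could pull the delay out of each lrmd warning. This is only a sketch: the regular expression is derived from the single log line quoted above, it assumes each log entry is on one line, and the names parse_lrmd_delay and worst_delay_ms are made up for illustration.

```python
import re

# Pattern built from the one lrmd warning quoted in this post; other lrmd
# versions may word the message differently (assumption).
LRMD_DELAY = re.compile(
    r"operation (?P<op>\w+)\[(?P<op_id>\d+)\] on (?P<resource>\S+) .*"
    r"stayed in operation list for (?P<delay>\d+) ms "
    r"\(longer than (?P<limit>\d+) ms\)"
)


def parse_lrmd_delay(line):
    """Return details of one 'stayed in operation list' warning, or None."""
    match = LRMD_DELAY.search(line)
    if match is None:
        return None
    return {
        "op": match.group("op"),
        "op_id": int(match.group("op_id")),
        "resource": match.group("resource"),
        "delay_ms": int(match.group("delay")),
        "limit_ms": int(match.group("limit")),
    }


def worst_delay_ms(lines):
    """Return the largest queue delay found in the given log lines, or None."""
    delays = []
    for line in lines:
        parsed = parse_lrmd_delay(line)
        if parsed is not None:
            delays.append(parsed["delay_ms"])
    return max(delays) if delays else None
```

Passing the lines of the syslog file to worst_delay_ms() would show the longest wait, which can then be compared with the 30s monitor timeout configured above.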