Hi,

On Mon, Jul 19, 2010 at 07:09:11PM -0300, Diego Woitasen wrote:
> 2010/7/16 Diego Woitasen <di...@woitasen.com.ar>:
> > Hi,
> >  I've installed Heartbeat+Pacemaker (3.0.3 and 1.0.9). I have a
> > resource which executes a script to check the service:
> >
> > primitive kolab_imapd ocf:heartbeat:kolab-service \
> >     params service="all" monitor_script="/usr/local/bin/check-imap.py" \
> >     meta migration-threshold="3" failure-timeout="300s" is-managed="true" \
> >     operations $id="operations-imap" \
> >     op monitor interval="20s" timeout="30s" on-fail="restart" \
> >     op start interval="0" timeout="120" \
> >     op stop interval="0" timeout="120"
> >
> > I did I/O stress testing using bonnie++ and I started to see this message:
> >
> > Jul 16 18:24:38 imapserver lrmd: [4719]: WARN: perform_ra_op: the
> > operation operation monitor[21] on ocf::kolab-service::kolab_imapd for
> > client 4722, its parameters: CRM_meta_interval=[20000]
> > monitor_script=[/usr/local/bin/check-imap.py]
> > CRM_meta_on_fail=[restart] CRM_meta_timeout=[30000]
> > crm_feature_set=[3.0.1] CRM_meta_name=[monitor] service=[all] stayed
> > in operation list for 32740 ms (longer than 10000 ms)
> >
> > The problem is that I also get these messages under high I/O without
> > the stress testing, for example when running backups. If I understand
> > that message correctly, the monitor operation didn't start; it was
> > waiting on some workqueue to start.
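
As an aside, the queueing delay reported in that lrmd warning can be
extracted from the logs to see how it lines up with backups or other
I/O load. A minimal sketch in Python, assuming the log lines have the
same wording as the message quoted above; the function name
parse_lrmd_delay and the regular expression are illustrative and not
part of lrmd or Pacemaker:

```python
import re

# Matches the lrmd "stayed in operation list" warning.
# Assumption: the wording is the same as in the log excerpt quoted above.
PATTERN = re.compile(
    r"operation (?P<op>\w+)\[(?P<id>\d+)\] on (?P<rsc>\S+) for client \d+"
    r".*?stayed\s+in\s+operation\s+list\s+for\s+(?P<waited>\d+) ms\s+"
    r"\(longer than (?P<limit>\d+) ms\)",
    re.DOTALL,
)


def parse_lrmd_delay(message):
    """Return (operation, resource, waited_ms, limit_ms), or None if no match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    return (
        m.group("op"),
        m.group("rsc"),
        int(m.group("waited")),
        int(m.group("limit")),
    )
```

Feeding each lrmd line from syslog through parse_lrmd_delay and
comparing waited_ms with the monitor timeout (30000 ms in the
configuration above) gives a rough idea of how much headroom the
timeouts need.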

It was most probably waiting for the previous monitor operation
to finish, though that one should have timed out according to
your configuration. Or at least 4 operations on different
resources were already running on the node.

If you expect high load on the server, you should tune the
timeouts accordingly.

Thanks,

Dejan

> > If I try to execute a command while I'm running the stress test it's
> > slow (approx. 3 seconds) but it works. For example, I can run "crm
> > configure show" and the output appears in 3 or 4 seconds.
> >
> > The server has 2 quad-core processors and 6 GB of RAM, and runs RHEL 5.
> >
> > Regards,
> >  Diego
> >
> > --
> > Diego Woitasen
> >
>
> I've raised the priority of the process to 10 and it works now.
>
> The documentation says that the default rtprio is 5. That's wrong, it's 1.
> At least in my pkgs...
>
> Regards,
>  Diego
>
> --
> Diego Woitasen
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker