I posted earlier asking for help because I had a primitive whose monitor operation was not being canceled when a manual relocation was performed. As suggested, I updated Pacemaker to pacemaker-1.1.2-0.6.1, which is the latest version I could find for the IA64 platform without building from source. If anyone knows of a later IA64 binary version, I would appreciate that information.

The monitor problem persisted after the upgrade, although the error messages I had been seeing earlier were gone; they were apparently unrelated. Painful trial and error led me to conclude that the cause was the primitive's start-op timeout and monitor-op start-delay values. With both values set to 480s, the monitor op was not canceled during a manual relocation. It was then rescheduled after the relocation, found the resource not running (because it had been relocated), and set the fail-count to a non-zero value, which led to fencing. With both values set to 240s, everything went smoothly and the monitor op was canceled.

As a test, I set a different primitive's values to 480s, and that primitive then showed the same failing behavior.

If anyone knows why this might happen (perhaps there are rules I am unaware of that prohibit larger values), I would appreciate the information. If not, I guess I should file a bug.

Thanks in advance for any help.

Phil

--
        Phil Armstrong       p...@sgi.com
        Phone: 651-683-5561  VNET 233-5561


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker