I posted earlier asking for help because I had a primitive whose monitor
operation was not getting canceled at the time that a manual relocation
was performed. I updated pacemaker (as was suggested) to
pacemaker-1.1.2-0.6.1, which is the latest I could find for an IA64
platform without having to build from source. If anyone knows of a later
IA64 binary version, I would appreciate that information.
The monitor problem persisted after the upgrade, though the error
messages I was seeing earlier were no longer present. They were
apparently unrelated. Painful trial and error led me to the conclusion
that the cause was the primitive's start-op timeout and monitor-op
start-delay values. When I had these values set to 480s, the monitor-op
did not get canceled for a manual relocation, so it was rescheduled
after the relocation only to find the resource not operational (it had
been relocated). That set the fail-count to non-zero, fencing the
resource. If I set the values to 240s, everything went smoothly and the
monitor-op was canceled.
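
To make the two settings concrete, here is a minimal sketch of such a
primitive in crm shell syntax. The resource name, the agent, and the
monitor interval and timeout are placeholders chosen for illustration,
not the actual configuration; only the 480s values come from the
description above.

```
# Hypothetical resource name and agent. Only the two 480s values are
# taken from the report; interval and monitor timeout are placeholders.
primitive test-rsc ocf:heartbeat:Dummy \
    op start interval="0" timeout="480s" \
    op monitor interval="30s" timeout="60s" start-delay="480s"
```

Changing both 480s values to 240s gives the variant for which the
monitor-op was canceled as expected.
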
As a test, I changed a different primitive's values to 480s and that
primitive then displayed the failing behavior.
If anyone knows why this might be the case (perhaps there are rules I am
unaware of that prohibit larger values) I would appreciate the
information. If not, I guess I should file a bug.
Thanks for any help in advance.
Phil
--
Phil Armstrong p...@sgi.com
Phone: 651-683-5561 VNET 233-5561