[Pacemaker] Resource monitoring actions when a resource dies uncleanly

Andrew Lacey Thu, 06 Jan 2011 09:44:41 -0800

Hi- 

First off, I'm new to Pacemaker and there's a tremendous amount of information 
to sift through, so my apologies if this has been answered already.


I'm trying to set up a simple 2-node active/passive cluster that runs squid 
(reverse proxy for web services) on a service IP address. I'm not using STONITH 
because there's no shared data, so nothing horrible would happen if squid 
somehow ends up running on both boxes. So, there are just two resources, squid 
itself and the IP address, configured as a resource group because they must be 
on the same machine. 

I've looked into setting up resource monitoring for squid. 
Ideally, if squid dies for any reason on the currently active node, I would 
like both resources (squid and the IP) to fail over to the other node. For resource 
monitoring, there is an on-fail action called "standby", which is described as: 
"Move all resources away from the node on which the resource failed." That 
sounded like what I want, so I tested it. Unfortunately, I found that if 
squid dies uncleanly (simulated by issuing a kill -9 to its process), Pacemaker 
gets stuck in an endless loop, repeatedly calling the init script to 
"stop" squid. The init script returns an error code because, in its 
words, "squid is dead but pid file exists". squid is never started on the other 
node because Pacemaker is never satisfied that it has truly stopped on the 
original node. 

Since a typical unexpected software failure would be an unclean one (a segfault 
or similar), this monitoring doesn't seem very useful if it always gets 
stuck trying to "stop" the crashed service before taking any further action. Is 
there a generally accepted way around this? Should the (LSB) init script be 
rewritten to respond differently to this situation, or is there some way to get 
Pacemaker to respond differently? 
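For reference, here is a minimal sketch of the LSB init-script conventions that bear on this, written in Python rather than the shell a real init script would use. Per the LSB Core spec, "status" exits 1 when the program is dead but its pid file still exists, and "stop" on a service that is already stopped or not running counts as success (exit 0). The function names, pid-file handling and 10-second timeout are hypothetical; this is not squid's actual init script.

```python
import os
import signal
import time

# Hypothetical sketch, not squid's real init script.
# LSB "status" exit codes (LSB Core init-script conventions).
STATUS_RUNNING = 0
STATUS_DEAD_PIDFILE_EXISTS = 1
STATUS_NOT_RUNNING = 3


def read_pid(pid_file):
    """Return the PID recorded in pid_file, or None if missing or invalid."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_alive(pid):
    """Probe with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def lsb_status(pid_file):
    pid = read_pid(pid_file)
    if pid is None:
        return STATUS_NOT_RUNNING
    if pid_alive(pid):
        return STATUS_RUNNING
    return STATUS_DEAD_PIDFILE_EXISTS  # the "dead but pid file exists" case


def lsb_stop(pid_file, timeout=10.0):
    """Idempotent stop: an already-dead or never-started service is success."""
    pid = read_pid(pid_file)
    if pid is not None and pid_alive(pid):
        os.kill(pid, signal.SIGTERM)
        deadline = time.monotonic() + timeout
        while pid_alive(pid):
            if time.monotonic() >= deadline:
                return 1  # still running: report a genuine failure
            time.sleep(0.1)
    # Process is gone (possibly killed with -9): remove any stale pid file.
    try:
        os.remove(pid_file)
    except FileNotFoundError:
        pass
    return 0
```

With squid killed by kill -9, this lsb_status() would report 1, while lsb_stop() would clean up the stale pid file and return 0 instead of an error, so a caller retrying "stop" would see success rather than loop.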

Thanks, 

-Andrew L

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
