On 10/05/2011 04:19 AM, Andrew Beekhof wrote:
On Mon, Oct 3, 2011 at 5:50 PM, Proskurin Kirill
<k.prosku...@corp.mail.ru>  wrote:
On 10/03/2011 05:32 AM, Andrew Beekhof wrote:

corosync-1.4.1
pacemaker-1.1.5
pacemaker runs with "ver: 1"

2)
This one is scary.
I have twice run into a situation where pacemaker thinks that a resource is
started, but it is not.

RA is misbehaving.  Pacemaker will only consider a resource running if
the RA tells us it is (running or in a failed state).
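
To illustrate the return-code contract described above: a monitor action
reports OCF_SUCCESS (0) when the resource is running, OCF_NOT_RUNNING (7)
when it is cleanly stopped, and another code such as OCF_ERR_GENERIC (1) when
it has failed. The following is only a minimal sketch, assuming a
pidfile-based agent. The function name pidfile_monitor and its pidfile
argument are hypothetical and are not taken from the ocf:mail.ru:generic
agent discussed in this thread.

```shell
#!/bin/sh
# Standard OCF exit codes relevant to a monitor action.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical monitor helper: $1 is the path to a pidfile.
pidfile_monitor() {
    pidfile="$1"
    # No pidfile: treat the resource as cleanly stopped.
    [ -f "$pidfile" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$pidfile" 2>/dev/null)
    # Empty or non-numeric pidfile: the state is unknown, report a failure.
    case "$pid" in
        ''|*[!0-9]*) return $OCF_ERR_GENERIC ;;
    esac
    # kill -0 delivers no signal; it only checks that the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    # Stale pidfile: the process is gone.
    return $OCF_NOT_RUNNING
}
```

Under this contract the cluster sees "stopped" only when the agent itself
returns 7, so a monitor that reports 0 for a dead process would keep the
resource marked as started.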

But as you can see below, the agent returns "7".

It's still broken. Not one stop action succeeds.

Sep 30 13:58:41 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 4082) timed out (try 1).  Killing with
signal SIGTERM (15).
Sep 30 14:09:34 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 21859) timed out (try 1).  Killing
with signal SIGTERM (15).
Sep 30 20:04:17 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 24576) timed out (try 1).  Killing
with signal SIGTERM (15).

/That/ is why pacemaker thinks it's still running.

I ran an experiment.

I created a script that does not die on SIGTERM:

#!/usr/bin/perl
# Ignore SIGTERM so that only SIGKILL can end this process.
$SIG{TERM} = "IGNORE";
sleep 1 while 1;

Then I ran it under pacemaker.
I ran 3 tests:
1) primitive test-kill-15.pl ocf:mail.ru:generic \
        op monitor interval="20" timeout="5" on-fail="restart" \
        params binfile="/tmp/test-kill-15.pl" external_pidfile="1"

2) Same, but with on-fail="block"

3) Same, but with meatware stonith.

Each time I ran:
crm resource stop test-kill-15.pl

In cases 1 and 2 the resource ends up "unmanaged".
In case 3 the node gets fenced (stonith).
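
For comparison, a stop action that escalates from SIGTERM to SIGKILL would be
able to end the test script above, because SIGKILL cannot be ignored. This is
only a sketch under stated assumptions: the helper names wait_for_exit and
escalating_stop and the grace-period argument are hypothetical, and this is
not how the ocf:mail.ru:generic agent is known to work.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper: wait up to $2 seconds for PID $1 to exit.
wait_for_exit() {
    pid="$1"; tries="$2"
    while [ "$tries" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        tries=$((tries - 1))
    done
    # Final check after the last sleep: succeed only if the PID is gone.
    ! kill -0 "$pid" 2>/dev/null
}

# Hypothetical stop helper: $1 is the PID, $2 the grace period in seconds.
escalating_stop() {
    pid="$1"; grace="${2:-5}"
    # Already gone: stop must be idempotent and report success.
    kill -0 "$pid" 2>/dev/null || return $OCF_SUCCESS
    kill -TERM "$pid" 2>/dev/null
    wait_for_exit "$pid" "$grace" && return $OCF_SUCCESS
    # SIGTERM was ignored or too slow; SIGKILL cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null
    wait_for_exit "$pid" "$grace" && return $OCF_SUCCESS
    # Still present (for example stuck in uninterruptible I/O): report failure.
    return $OCF_ERR_GENERIC
}
```

The total of both grace periods has to stay below the stop operation timeout
configured in the cluster, otherwise lrmd will time the stop out before the
agent gets to return.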

From IRC:
(12:20:44 PM) beekhof: Oloremo: what the hell is the cluster supposed to do if stop fails and you dont want fencing? it cant start it anywhere because its still active in the original location
(12:30:09 PM) Oloremo: I get the point, really. But may be it should make it unmanaged?

And it does.

So can I assume that my problem with monitoring is still not explained? I don't get "unmanaged"; pacemaker just thinks that the resource is started, but it is not.


--
Best regards,
Proskurin Kirill

