Andrew Beekhof wrote:

On Feb 12, 2008, at 4:59 PM, Zoltan Boszormenyi wrote:

Hi,

Serge Dubrouski wrote:
pgsql OCF RA doesn't support multistate configuration so I don't think
that creating a clone would be a good idea.


Thanks for the information.

Some other questions.

According to http://linux-ha.org/v2/faq/resource_too_active
the monitor action should return 0 for running, 7 ($OCF_NOT_RUNNING)
for downed resources and anything else for failed ones.
Either this documentation is buggy,

no

or heartbeat doesn't conform to its own docs.

also no


Here's the scenario: londiste creates a pidfile and deletes it when it exits cleanly. However, if I kill it manually, the pidfile stays behind. What should my script return when it detects that the process with the indicated PID is no longer there? It's not a "downed" resource, it's a failed one, so I returned $OCF_ERR_GENERIC.
But after some time heartbeat reports that my resource has become "unmanaged".

i'm guessing (because you've not included anything on which to comment properly) that the stop action failed

It shouldn't have failed; the stop action always returns $OCF_SUCCESS.

In contrast to this, the pgsql OCF RA does it differently. It always returns 7 when it finds that there's no postmaster process. Which is the right behaviour?

it depends what you want to happen.
if you want a stop to be sent, use OCF_ERR_GENERIC.
if the resource is stateless and doesn't need any cleaning up, use OCF_NOT_RUNNING

It's quite an important detail. Shouldn't this be documented at
http://linux-ha.org/OCFResourceAgent ?
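
The return-code choice discussed above can be sketched as follows. This is only an illustration, assuming a pidfile-based daemon; the function names and the stale_pidfile_is_failure switch are hypothetical, and a real OCF RA is normally a shell script rather than Python. Only the numeric exit codes (0, 1 and 7) are the standard OCF values.

```python
import os

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def monitor(pidfile: str, stale_pidfile_is_failure: bool = True) -> int:
    """Hypothetical monitor action for a daemon that writes a pidfile."""
    if not os.path.exists(pidfile):
        # No pidfile: the daemon exited cleanly or was never started.
        return OCF_NOT_RUNNING
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return OCF_ERR_GENERIC  # unreadable or garbled pidfile
    if pid <= 0:
        return OCF_ERR_GENERIC  # not a usable PID
    if pid_is_alive(pid):
        return OCF_SUCCESS
    # Stale pidfile: the process died without cleaning up.
    # OCF_ERR_GENERIC asks the cluster to run stop (cleanup) first;
    # OCF_NOT_RUNNING treats the resource as simply down.
    return OCF_ERR_GENERIC if stale_pidfile_is_failure else OCF_NOT_RUNNING
```

With stale_pidfile_is_failure=False the sketch behaves like the pgsql RA as described above, returning 7 whenever the process is gone.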

We use heartbeat 2.0.8; I didn't mention that in my first mail.

which was arguably the worst release that ever went out - please get something newer

That's quite interesting information. :-|
Unfortunately we're kind of stuck with it...

On Feb 8, 2008 2:43 PM, Zoltan Boszormenyi <[EMAIL PROTECTED]> wrote:

...
But I noticed that somehow IP takeover doesn't take place
if I pull the plug on the virtual ethernet card(s).
I tried it with two Fedora 6 systems inside VMWare.
I have set up pingd and my host machine as the ping node
and services stop on the machine separated from the network,
but the virtual IP isn't started on the still connected machine.


The above isn't true; I just didn't wait long enough. However, there is a problem. I have set up pingd according to the docs, so the ping attribute is worth 100, and the resource is stopped on a node that doesn't have the ping attribute.
I have set a static preference score of 20 points for starting the virtual IP on
the master node, and a resource_stickiness of 40 for the virtual IP. So:

1. both nodes are up, preferred score makes the virtual IP start on the master.
   (master: 120 points, slave: 100 points)
2. both nodes are up, virtual IP is already running on master
   (master: 160 points, slave: 100 points)
3. I pull the ethernet out of master, after some time master notices
  that the ping host is gone, and the slave notices the master is gone.
   (master: 60 points, slave: 100 points)
4. At this point, slave starts virtual IP, the master stops its own.
   (master: 20 points, slave: 140 points)
5. I put the plug back into the master; soon it notices that the
  ping node is back.
  The slave also notices that the master is back.
   (master: 120 points, slave: 140 points)

Despite the 140 points on the slave for having the pingd score (100)
and already running the virtual IP (resource stickiness), the master
takes back the virtual IP. This happens regardless of whether
auto_failback is on or off. Why?

I kind of solved it by using resource_stickiness = 200 for the virtual IP.
But my "why" question still stands.
Is the pingd score counted for every medium used by heartbeat?
E.g. for a redundant connection, I have set up eth0 and eth1 for
both nodes inside vmware, both interfaces are on the same subnet.
But in this case, both nodes get 200 as the pingd score, and the difference
at step 5 above would still prevent IP failback (220 vs 240).
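
The point totals in the five steps above can be reproduced with a small sketch. It assumes the purely additive model used in this mail (pingd score plus static preference plus stickiness); it is not a description of how heartbeat's CRM actually computes placement, and the function and parameter names are made up for illustration.

```python
def scores(master_connected, ip_on, pingd=100, preference=20, stickiness=40):
    """Return (master, slave) totals under a simple additive model.

    master_connected: whether the master can reach the ping node.
    ip_on: "master", "slave" or None (virtual IP not running yet).
    The slave is assumed to reach the ping node at all times.
    """
    master = preference  # static preference applies to the master only
    slave = pingd
    if master_connected:
        master += pingd
    if ip_on == "master":
        master += stickiness
    elif ip_on == "slave":
        slave += stickiness
    return master, slave


print(scores(True, None))       # step 1 -> (120, 100)
print(scores(True, "master"))   # step 2 -> (160, 100)
print(scores(False, "master"))  # step 3 -> (60, 100)
print(scores(False, "slave"))   # step 4 -> (20, 140)
print(scores(True, "slave"))    # step 5 -> (120, 140)

# Variations mentioned above, both evaluated at step 5:
print(scores(True, "slave", pingd=200))       # two interfaces -> (220, 240)
print(scores(True, "slave", stickiness=200))  # workaround -> (120, 300)
```

Under this model the slave stays ahead at step 5 in every variant, which is why the observed failback is surprising.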

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
