Re: [Pacemaker] resource moving unnecessarily due to ping race condition

Brad Johnson Tue, 27 Sep 2011 05:46:52 -0700

The patch alone does not give an advantage to the active node. But remember I said we are using an fping resource agent we wrote that varies the dampening based on which node it is running on and whether the score is rising or falling. But the dampening it sets was being overridden by the attrd flush message, since it stops the timer and sends the score immediately. That RA, along with the patch, solves our problem. I can now reboot ping nodes all day long without our resource failing back and forth, while still allowing legitimate fail-overs when a node truly has better network connectivity than the other.
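
To illustrate the timing effect described above, here is a minimal sketch of when a dampened score would reach the CIB, with and without honoring a flush message from the peer. It is only a model of the behavior as described in this message, not attrd's actual code; the function name, parameters, and time values are all assumptions made for illustration.

```python
def cib_write_time(detect_time, dampen, flush_time=None, honor_flush=True):
    """Return the time (seconds) at which the new score is written.

    detect_time: when this node noticed the connectivity change
    dampen:      delay the resource agent asked for (hypothetical value)
    flush_time:  when a flush message from the other node arrives, if any
    honor_flush: True models the unpatched behavior (the flush stops the
                 timer and sends the score immediately), False models
                 ignoring the peer's flush
    """
    due = detect_time + dampen
    flush_lands_while_waiting = (
        flush_time is not None and detect_time <= flush_time < due
    )
    if honor_flush and flush_lands_while_waiting:
        return flush_time  # the dampening timer is cut short
    return due             # the dampening timer runs to completion


# Hypothetical numbers: a 30 second dampen, peer flush arrives after 2 seconds.
print(cib_write_time(0, 30, flush_time=2, honor_flush=True))
print(cib_write_time(0, 30, flush_time=2, honor_flush=False))
```

In this model, any per-node difference in dampening disappears as soon as the peer's flush is honored, which is the effect the message above attributes to the flush.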


On 09/26/2011 09:57 PM, Andrew Beekhof wrote:

On Mon, Sep 26, 2011 at 10:57 PM, Brad Johnson<bjohn...@ecessa.com>  wrote:

I agree that the patch assumes the use of "pingd" for the attribute name,
and there may be a better way of coding that. However, I don't see how
setting dampen=0 fixes our problem.  The problem occurs when a ping node
becomes inaccessible to all nodes in the cluster (it is rebooted for
example). Without giving any timing advantage to the currently active node,

The patch doesn't do this though.

it is essentially just a race between the nodes to see who notices the
outage first and can update the attribute fastest.

Now I'm confused.

You said "we do not want the other node to be able to challenge us to
an immediate score comparison".
dampen=0 does the same thing as the patch... it tells attrd to update
the CIB immediately, without waiting to give everyone a chance to
notice the change in connectivity too.

The result is we see
fail-over when the ping node goes down, and fail-back when it comes back up.
The fact is that dampening alone does not solve this. Which is why we use a
resource agent that uses selective dampening based on where the resource is
running.

On 09/25/2011 08:58 PM, Andrew Beekhof wrote:

On Fri, Sep 23, 2011 at 9:53 PM, Brad Johnson<bjohn...@ecessa.com>    wrote:

Yes, but the patch only affects the pingd attribute.

Use of the name 'pingd' isn't mandatory though.

And we do not want the other node to be able to challenge us to an
immediate score comparison. That is the whole idea behind the fping OCF
resource agent we are using, to give the timing advantage to the node
currently running the resource by delaying rising scores on the idle, and
falling scores on the active node.
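
The selective dampening described here could be sketched roughly as follows. This is not the fping resource agent itself (it is not shown in the thread); the function name and the two delay values are hypothetical, and only the rule "delay rising scores on the idle node and falling scores on the active node" comes from the message.

```python
def choose_dampen(is_active, old_score, new_score, long_delay=30, short_delay=5):
    """Return how many seconds to wait before publishing new_score.

    long_delay and short_delay are made-up example values.
    """
    rising = new_score > old_score
    falling = new_score < old_score
    if is_active and falling:
        # Active node: hold back bad news so the idle node reports first.
        return long_delay
    if not is_active and rising:
        # Idle node: hold back good news so the active node reports first.
        return long_delay
    return short_delay


# A shared ping node goes down: both scores fall, the active node waits longer.
print(choose_dampen(is_active=True, old_score=2, new_score=1))
print(choose_dampen(is_active=False, old_score=2, new_score=1))
```

Under these assumptions, during an outage that hits both nodes the active node's published score is never lower than the idle node's, so the tie is resolved in favor of staying put.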

Why not just set dampen=0?

On 09/22/2011 09:10 PM, Andrew Beekhof wrote:

On Tue, Sep 20, 2011 at 10:34 PM, Brad Johnson<bjohn...@ecessa.com> wrote:

It is not necessarily the case that the outside world can't reach the
cluster. Ours is a multi-homed device connecting to multiple WANs and LANs.
We want the device with the best connectivity to be the active device. To
get around the problem of failovers occurring when a ping node reboots for
example, I have written an fping OCF RA that uses different dampening
delays based on if it is running on the active or idle device. I have also
patched pacemaker attrd.c to fix it so it doesn't send an immediate update
when it receives a flush message from the other node. This was causing it
to ignore any running delay timer.

That's the point of the flush message though.  So that all nodes write
their current value at the same time.

Here is that patch:

--- tools/attrd.orig.c    2011-09-13 08:29:46.946820348 -0500
+++ tools/attrd.c    2011-09-14 13:33:59.606894754 -0500
@@ -348,10 +348,14 @@
         attrd_local_callback(xml);

     } else if(ignore == NULL || safe_str_neq(from, attrd_uname)) {
+        const char *attr  = crm_element_value(xml, F_ATTRD_ATTRIBUTE);
+        /* Don't send update for score if msg is from other node */
+        if(safe_str_eq(from, attrd_uname) || safe_str_neq(attr, "pingd")) {
         crm_info("%s message from %s", op, from);
         hash_entry = find_hash_entry(xml);
         stop_attrd_timer(hash_entry);
         attrd_perform_update(hash_entry);
+        }
     }
     free_xml(xml);
  }
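
The condition added by the patch can be restated as a small truth function. The sketch below mirrors only the two conditions visible in the hunk (the enclosing "else if" and the new inner "if"); everything else in attrd.c is out of scope, plain string comparison stands in for safe_str_eq/safe_str_neq, and the function name and node names are assumptions for illustration.

```python
def performs_immediate_update(sender, local_node, attr_name, ignore=None):
    """Return True if the patched branch would stop the timer and update.

    Models only the conditions shown in the hunk, using ordinary string
    comparison in place of safe_str_eq / safe_str_neq.
    """
    # Enclosing condition: ignore == NULL || from != attrd_uname
    if not (ignore is None or sender != local_node):
        return False
    # Condition added by the patch: from == attrd_uname || attr != "pingd"
    return sender == local_node or attr_name != "pingd"


# A "pingd" message from the peer no longer forces an immediate update,
# while other attributes and locally originated messages still do.
print(performs_immediate_update("node2", "node1", "pingd"))
print(performs_immediate_update("node2", "node1", "some-other-attr"))
print(performs_immediate_update("node1", "node1", "pingd"))
```

Note that the attribute name is compared against the literal "pingd", so an attribute published under any other name would keep the original flush behavior.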


On 09/19/2011 10:51 PM, Andrew Beekhof wrote:

On Sun, Sep 11, 2011 at 2:30 AM, Vadym Chepkov<vchep...@gmail.com> wrote:

On Sep 8, 2011, at 3:40 PM, Florian Haas wrote:

On 09/08/11 20:59, Brad Johnson wrote:

We have a 2 node cluster with a single resource. The resource must run
on only a single node at one time. Using the ocf:pacemaker:ping RA we
are pinging a WAN gateway and a LAN host on each node so the resource
runs on the node with the greatest connectivity. The problem is when a
ping host goes down (so both nodes lose connectivity to it), the
resource moves to the other node due to timing differences in how fast
they update the score attribute. The dampening value has no effect,
since it delays both nodes by the same amount. These unnecessary
fail-overs aren't acceptable since they are disruptive to the network
for no reason.
Is there a way to dampen the ping update by different amounts on the
active and passive nodes? Or some other way to configure the cluster to
try to keep the resource where it is during these tie score scenarios?

location pingd-constraint group_1 \
  rule $id="pingd-constraint-rule" pingd: defined pingd
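
The point that a uniform dampening value cannot break the race can be shown with a small model. This is a simplified sketch, not Pacemaker's scheduling logic: the node names and detection times are invented, and the only assumption carried over from the message is that both nodes apply the same delay.

```python
def first_to_write(detect_times, dampen):
    """Return the node whose lowered score reaches the CIB first.

    detect_times: {node_name: seconds at which it noticed the outage}
    dampen:       the same delay applied on every node
    """
    write_times = {node: t + dampen for node, t in detect_times.items()}
    return min(write_times, key=write_times.get)


# Hypothetical timings: the active node happens to notice the outage first.
detect = {"active": 0.2, "idle": 1.1}
for dampen in (0, 5, 30):
    print(dampen, first_to_write(detect, dampen))
```

Because the same constant is added on both sides, whichever node notices first also writes first, whatever the dampen value. If that node is the active one, its score is briefly lower than the idle node's, which is the window in which the resource moves.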

May I suggest that you simply change this constraint to

location pingd-constraint group_1 \
  rule $id="pingd-constraint-rule" \
    -inf: not_defined pingd or pingd lte 0

That way, only a host that definitely has _no_ connectivity carries a
-INF score for that resource group. And I believe that is what you
really want, rather than take the actual ping score as a placement
weight (your "best connectivity" approach).

Just my 2 cents, though.

Even though this approach was recommended many times, there is a problem
with it.
What if all nodes for some reason are not able to ping?
This rule would cause a resource to be brought down completely, whereas
if you use "best connectivity" approach it will stay up where it was
before network failed.
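
The difference between the two constraints discussed here can be sketched with a toy scoring model. This is a deliberate simplification and not how Pacemaker's policy engine computes placement; the function names, node names, and pingd values are assumptions, and None stands for "pingd not defined".

```python
NEG_INF = float("-inf")

def weighted_rule(pingd):
    """Model of 'pingd: defined pingd': the ping score is the weight."""
    return {n: (v if v is not None else 0) for n, v in pingd.items()}

def ban_rule(pingd):
    """Model of '-inf: not_defined pingd or pingd lte 0'."""
    return {n: (NEG_INF if v is None or v <= 0 else 0) for n, v in pingd.items()}

def runnable_nodes(scores):
    """Nodes that are not banned outright."""
    return [n for n, s in scores.items() if s != NEG_INF]


# Hypothetical case: every node has lost all of its ping targets.
all_down = {"node1": 0, "node2": 0}
print(runnable_nodes(weighted_rule(all_down)))  # nothing is banned
print(runnable_nodes(ban_rule(all_down)))       # nowhere left to run
```

In this model the -inf rule leaves no runnable node once every pingd value is 0, which is the objection raised above, while the weighted rule leaves both nodes tied so the resource has no reason to stop.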

If the outside[1] world can't reach the cluster, is there much benefit
in having it running?

[1] Substitute "outside" for wherever your users are, hopefully you
picked a ping node from the same area.

Vadym




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
