Re: [Pacemaker] New patch for System Health feature

Lars Marowsky-Bree Mon, 18 May 2009 06:29:28 -0700

On 2009-05-15T19:29:30, Mark Hamzy <ha...@us.ibm.com> wrote:

> 
> Here is attempt #3:
> 
> (See attached file: pacemaker.mark.patch)


Hi Mark,

thanks for your contribution!

Can you provide a description and example of how this final version
would be used? Maybe a wiki page or something?

> Questions/comments?

From an initial reading, I think what you're effectively doing is
modifying the "base score" of a node, similar to the symmetric-cluster
yes/no setting.

You're also assuming, if I'm not misreading this, that the scale of
every "health" value is identical (as there is just one set of score
mappings).

As it stands, I _think_ I don't much like the approach, though I
understand I'm chiming in a bit late :-(

Can't we instead define a policy to dynamically calculate a node
attribute?

<constraints>
<dyn_node_attr node="node1" attribute="health">
 <rule id="health-1" score="-INFINITY">
  <expression id="exp-1" attribute="health-X" operation="lt" value="0" />
 </rule>
</dyn_node_attr>
<rsc_location rsc="my-filesystem" score_from_node_attr="health" />
...

If you think having one such constraint for each node or resource is
cumbersome, I'd agree, but that could be handled using wildcards (or
macros in the crm shell).

Or the attribute could be "#base-score", which would magically make that
rule modify the node's base score.
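
To make the intended semantics a bit more concrete, here is a rough
Python sketch of how such a rule might be evaluated for one node. This
is only a model of the proposal above, not Pacemaker code: the function
name, the tuple layout and the "sum the scores of matching rules"
behaviour are all assumptions made for illustration.

```python
# Sketch only: models the *proposed* dyn_node_attr semantics, not Pacemaker code.
import operator

NEG_INFINITY = float("-inf")  # stand-in for the CIB score "-INFINITY"

# Comparison operators a rule expression might use (hypothetical subset).
OPERATIONS = {"lt": operator.lt, "gt": operator.gt, "eq": operator.eq}


def eval_dyn_node_attr(node_attrs, rules, default=0):
    """Return the derived attribute value for one node.

    node_attrs: dict of raw node attributes, e.g. {"health-X": -1}
    rules: list of (attribute, operation, value, score) tuples
    Assumption: scores of all matching rules are summed, so a single
    -INFINITY rule dominates the result.
    """
    total = default
    for attribute, operation, value, score in rules:
        if attribute not in node_attrs:
            continue  # undefined attribute: treat the rule as not matching
        if OPERATIONS[operation](node_attrs[attribute], value):
            total += score
    return total


# Mirrors rule "health-1" above: health-X < 0  =>  score -INFINITY
health_rules = [("health-X", "lt", 0, NEG_INFINITY)]

healthy = eval_dyn_node_attr({"health-X": 5}, health_rules)
unhealthy = eval_dyn_node_attr({"health-X": -1}, health_rules)
print(healthy, unhealthy)  # 0 -inf
```

A location constraint reading the derived "health" attribute would then
see 0 on a healthy node and -INFINITY on an unhealthy one.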

The rsc_location constraint could be modified to apply to resources
which have (or do not have) a specific attribute set, so one could
invent an "exclude-from-health-check" attribute for the resources which
should _not_ be disabled, for example like the health monitoring / pingd
agents themselves ...

Surprisingly, one would then find that the "standby" node attribute
could be internally mapped to a dyn_node_attr rule too, and this would
offer interesting combinations with attrd.

It would allow health scores (and others) to be aggregated into several
variables, so that perhaps some affect all resources while others just
affect a subset.

This would also allow one to specify a rule that inhibited just
promotion to "master" state, while keeping the replica active for
longer.
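
The aggregation idea (several derived variables, some affecting all
resources and some only a subset) could be sketched roughly as follows.
Everything here is hypothetical: the attribute names "health-disk",
"health-temp", "health-all" and "health-storage", the resource "my-ip",
and the "any red input yields -INFINITY" policy are assumptions picked
for the example.

```python
# Sketch only: hypothetical attribute and resource names throughout.
NEG_INFINITY = float("-inf")  # stand-in for the CIB score "-INFINITY"

# Raw red/green flags as individual health checkers might publish them.
node_flags = {"health-disk": "red", "health-temp": "green"}

# Each derived variable aggregates a chosen set of raw flags.
AGGREGATES = {
    "health-all": ["health-temp"],      # meant to affect every resource
    "health-storage": ["health-disk"],  # meant to affect storage users only
}

# Which derived variables each resource's location constraints read.
RESOURCE_USES = {
    "my-filesystem": ["health-all", "health-storage"],
    "my-ip": ["health-all"],
}


def aggregate(flags, inputs):
    """A derived variable is -INFINITY if any of its inputs is red, else 0."""
    return NEG_INFINITY if any(flags.get(i) == "red" for i in inputs) else 0


def location_score(resource, flags):
    """Sum the derived variables a resource subscribes to."""
    return sum(aggregate(flags, AGGREGATES[name])
               for name in RESOURCE_USES[resource])


scores = {rsc: location_score(rsc, node_flags) for rsc in RESOURCE_USES}
```

With the flags above, only "my-filesystem" would be pushed off the node,
while "my-ip" keeps a score of 0.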

Whether a particular health checker's result suggests that resources
should be moved away now should be decided inside the health checker,
before it sets the yes/no flag in the CIB; i.e., a particular health
checker is either red or green. I don't much like "yellow" - we've gone
through the same with resource agent exit codes, and ended up not having
any use for it (except for pretty pictures in the GUI ;-)
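
As a minimal sketch of what "handled internally to the health checker"
could mean: the checker applies its own threshold and only ever reports
red or green. The disk-usage metric, the 90% limit and the function name
are assumptions for illustration; publishing the flag to the CIB is left
out.

```python
# Sketch only: the metric, threshold and names are illustrative assumptions.
def disk_health_flag(used_percent, limit=90.0):
    """Decide inside the checker whether resources should move away.

    Returns "red" (move resources away) or "green" (node is fine).
    There is deliberately no intermediate "yellow" state.
    """
    return "red" if used_percent >= limit else "green"


flags = [disk_health_flag(p) for p in (42.0, 89.9, 90.0, 97.5)]
print(flags)  # ['green', 'green', 'red', 'red']
```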


Before this can be merged into the stable branch, we need:

- consensus on the design,
- documentation,
- support from the crm shell,
- and some field experience at least from test clusters.

So for now I think this should go into the development branch.


Regards,
    Lars

-- 
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
