On Thu, Apr 23, 2009 at 17:49, Mark Hamzy <ha...@us.ibm.com> wrote:
> Hello,
>
> I am working on a feature to add system health metrics to HA. With this
> information, HA could failover nodes away from hardware that might have
> problems. The initial proposal briefly started on the linux-HA mailing list,
> but it has been moved to the pacemaker mailing list.
>
> The following is a short description of what we want this new feature to do.
>
> Feature Name: Health monitoring support
> Purpose: Allow pacemaker to schedule resources in a way that's sensitive to
> a variety of server-related health metrics
>
> Description:
> Add support in pacemaker for a class of attributes which would be specially
> treated. Under this proposal, all attributes defined for a node whose name
> matches the regular expression /^#health-.*$/ would be automatically added
> into the score for each resource being considered for scheduling on that
> node.
>
> The purpose of this is to allow multiple independent health monitors to each
> set their own health status and have that taken into account when scheduling
> resources. For example, IBM might define one called #health-ibmserver.
> Someone using smarttools (disk health monitors) might define one called
> #health-smarttools. Someone else using IPMI might define one called
> #health-ipmi. This means that this feature is not specific to any vendor,
> and various health monitor providers can develop health metrics for their
> hardware and not have to coordinate with each other in their development
> process.
>
> Typical usage of these variables is expected to be something like this:
>
> Health  Attribute-value  Meaning
> green   1000             server is happy, capable of running any resource
> yellow  0                server is marginal - it is desirable to schedule
>                          resources somewhere else if you can
> red     -INFINITY        server is unreliable (but still up) and should
>                          not be used
>
> Note that all of the values given would be configuration-specific. These
> attributes would be set via attrd_updater.
Agreed. What I'm not yet clear on, though, is why you can't just use these
attributes with the existing rsc_location constraints. (And even if there is
a need to expose it differently to users, it should definitely be using the
rsc_location logic internally.)

> Should the translation of health scores (colors) into specific values be
> done outside the core system?

I think some PE options would be a good idea:
health-score-red=..., health-score-yellow=..., ...

> There should be an API for health monitoring agents.

More information?

> This would be similar to cluster-wide default set by symmetric-cluster true
> (0) or false (-INFINITY).

You lost me here.

> Special Note:
> IBM is already in the process of developing such a health monitoring tool
> for IBM X (intel-class) servers.
>
> So, what do you all think of this proposed functionality? Does it sound
> reasonable? Comments are appreciated.
>
> Mark
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
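A rough Python sketch of how the PE options suggested in this thread might translate colors into scores. It assumes health agents publish a color name as the attribute value; health-score-red and health-score-yellow are the names floated in the reply, health-score-green is assumed, and all three are treated as hypothetical here. The defaults come from the table in the proposal, and color_to_score and to_score are illustrative names only.

```python
from typing import Dict, Optional

INFINITY = 1_000_000  # Pacemaker's internal value for INFINITY

# Hypothetical PE options; defaults taken from the table in the proposal.
DEFAULT_OPTIONS = {
    "health-score-green": "1000",
    "health-score-yellow": "0",
    "health-score-red": "-INFINITY",
}


def to_score(text: str) -> int:
    """Parse '1000', 'INFINITY', '+INFINITY' or '-INFINITY' into an int."""
    text = text.strip().upper()
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return max(-INFINITY, min(INFINITY, int(text)))


def color_to_score(color: str, options: Optional[Dict[str, str]] = None) -> int:
    """Translate a color published by a health agent into a score.

    Anything that is not a configured color is parsed as a plain score, so
    agents that already publish numbers keep working. Unknown non-numeric
    values raise ValueError.
    """
    merged = {**DEFAULT_OPTIONS, **(options or {})}
    key = "health-score-" + color.strip().lower()
    if key in merged:
        return to_score(merged[key])
    return to_score(color)


if __name__ == "__main__":
    print(color_to_score("yellow"))  # 0
    print(color_to_score("red"))  # -1000000
    print(color_to_score("yellow", {"health-score-yellow": "-100"}))  # -100
    print(color_to_score("250"))  # 250
```

Keeping the color-to-number mapping in cluster-wide options fits the note that the values are configuration-specific: an administrator could, for example, make yellow mildly negative instead of neutral without changing any agent.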