[ https://issues.apache.org/jira/browse/SOLR-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319504#comment-17319504 ]
Andrzej Bialecki commented on SOLR-15300:
-----------------------------------------

Based on the Slack discussions, I propose to add the following information to the output of CLUSTERSTATUS command:
* add a calculated (not stored in DocCollection) "health" property at the level of each shard and each collection.
* use the following symbolic names for the health state:
** GREEN: all replicas up, leader exists,
** YELLOW: some replicas down, leader exists,
** ORANGE: many replicas down, leader exists,
** RED: most replicas down, or no leader.
* use 66% and 33% of active replicas as the thresholds between yellow/orange/red.
* the collection-level health status will be reported as the worst status of any shard.

The notion of having a flag for a "read only" collection (when there's no leader or only PULL replicas) needs further thought, because there's already a "readOnly" flag that users can explicitly set using MODIFYCOLLECTION (this flag is also used in REINDEXCOLLECTION).

> Shard "state" flag is confusing and of limited value to outside consumers
> -------------------------------------------------------------------------
>
>                 Key: SOLR-15300
>                 URL: https://issues.apache.org/jira/browse/SOLR-15300
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> Solr API (and consequently the metric reporters, which are often used for
> Solr monitoring) report the shard as being in ACTIVE state even when in
> reality its functionality is severely compromised (eg. no replicas, all
> replicas down, or no leader).
> This reported state is technically correct because it is used only for
> tracking of the SPLITSHARD operations, as defined in {{Slice.State}}.
> However, this may be misleading and more often unhelpful than not - for
> constant monitoring a flag that actually reports impaired functionality of a
> shard would be more useful than a flag that reports a relatively uncommon
> SPLITSHARD operation.
> We could either redefine the meaning of the existing flag (and change its
> state according to some of the criteria I listed above), or add another flag
> to represent the "health" status of a shard. The value of this flag would
> then provide an easy way to monitor and to alert external systems of
> dangerous function impairment, without monitoring the state of all replicas
> of a collection.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
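
For illustration only, the health classification proposed in the comment above could be sketched roughly as follows. This is not Solr code: the names `Health`, `shard_health` and `collection_health` are hypothetical, and several details are assumptions the comment does not settle, namely that a shard above 66% active replicas is YELLOW, a shard above 33% is ORANGE, anything at or below 33% is RED, and that a shard with no replicas or a collection with no shards is reported as RED.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical health states, ordered from best (0) to worst (3)."""
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3


def shard_health(active: int, total: int, has_leader: bool) -> Health:
    """Classify one shard from its replica counts and leader presence.

    The exact boundary handling (strictly greater than 66% / 33%) is an
    assumption; the proposal only names the two thresholds.
    """
    # No leader, or no replicas at all (assumption), is the worst state.
    if total <= 0 or not has_leader:
        return Health.RED
    if active >= total:
        return Health.GREEN
    fraction = active / total
    if fraction > 0.66:
        return Health.YELLOW
    if fraction > 0.33:
        return Health.ORANGE
    return Health.RED


def collection_health(shards) -> Health:
    """Collection health is the worst health of any of its shards.

    `shards` is an iterable of (active, total, has_leader) tuples.
    An empty collection is reported as RED (assumption).
    """
    return max((shard_health(a, t, l) for a, t, l in shards),
               default=Health.RED)
```

Because the states are ordered integers, "worst status of any shard" reduces to a plain `max`, for example `collection_health([(3, 3, True), (1, 3, True)])` for a collection with one fully healthy shard and one shard with a single active replica out of three.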