Hi, 

Have a look at https://issues.apache.org/jira/browse/SOLR-15767 and the associated 
PR, which proposes some common alerting rules.
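
Separately from the exporter metrics, here is a minimal sketch (not taken from 
the Jira issue or its PR) of checking the "health" fields in a CLUSTERSTATUS 
response like the one you posted. The function name unhealthy_shards and the 
sample data are hypothetical, made up for illustration. It assumes the JSON 
layout shown in your output.

```python
import json

def unhealthy_shards(cluster_status):
    """Return (collection, shard, health) for each shard whose health is not GREEN."""
    problems = []
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            # Treat a missing "health" field as a problem worth reporting.
            health = shard.get("health", "UNKNOWN")
            if health != "GREEN":
                problems.append((coll_name, shard_name, health))
    return problems

# Hypothetical, trimmed-down sample in the same shape as a CLUSTERSTATUS response.
sample = json.loads("""
{
  "cluster": {
    "collections": {
      "test_coll1": {"shards": {"shard1": {"state": "active", "health": "GREEN"}}},
      "test_coll2": {"shards": {"shard1": {"state": "active", "health": "YELLOW"}}}
    }
  }
}
""")

print(unhealthy_shards(sample))  # [('test_coll2', 'shard1', 'YELLOW')]
```

In practice the input would be the parsed body of the same CLUSTERSTATUS URL 
you call with curl. How the result is turned into an email is left open here.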

Jan

> On 4 Feb 2025, at 17:08, Afroz, Neshat (NIH/NLM/NCBI) [C] 
> <neshat.af...@nih.gov.INVALID> wrote:
> 
> Hello
> 
> I am setting up Solr monitoring using Prometheus, solr-exporter and Grafana. 
> I have also installed Alertmanager. The CLUSTERSTATUS output for my 
> collections looks as follows:
> 
> 
> curl -u "solradmin:xxxxxxxxxxxxx" 
> 'http://host01:8940/solr/admin/collections?action=CLUSTERSTATUS&wt=json&indent=true'
> {
>  "responseHeader":{
>    "status":0,
>    "QTime":1
>  },
>  "cluster":{
>    "collections":{
>      "test_coll1":{
>        "pullReplicas":0,
>        "configName":"test_coll1",
>        "replicationFactor":1,
>        "router":{
>          "name":"compositeId"
>        },
>        "nrtReplicas":1,
>        "tlogReplicas":0,
>        "shards":{
>          "shard1":{
>            "range":"80000000-7fffffff",
>            "state":"active",
>            "stateTimestamp":"1733764681612235673",
>            "replicas":{
>              "core_node2":{
>                "core":"test_coll1_shard1_replica_n1",
>                "node_name":"host01:8940_solr",
>                "type":"NRT",
>                "state":"active",
>                "leader":"true",
>                "force_set_state":"false",
>                "base_url":"http://host01:8940/solr"
>              }
>            },
>            "health":"GREEN"
>          }
>        },
>        "health":"GREEN",
>        "znodeVersion":39,
>        "creationTimeMillis":1733764591523
>      },
>      "test_coll2":{
>        "pullReplicas":0,
>        "configName":"test_coll2",
>        "replicationFactor":1,
>        "router":{
>          "name":"compositeId"
>        },
>        "nrtReplicas":1,
>        "tlogReplicas":0,
>        "shards":{
>          "shard1":{
>            "range":"80000000-7fffffff",
>            "state":"active",
>            "stateTimestamp":"1733765402371289174",
>            "replicas":{
>              "core_node2":{
>                "core":"test_coll2_shard1_replica_n1",
>                "node_name":"host01:8940_solr",
>                "type":"NRT",
>                "state":"active",
>                "leader":"true",
>                "force_set_state":"false",
>                "base_url":"http://host01:8940/solr"
>              }
>            },
>            "health":"GREEN"
>          }
>        },
>        "health":"GREEN",
>        "znodeVersion":39,
>        "creationTimeMillis":1733765317706
>      }
>    },
>    "live_nodes":["host01:8940_solr"]
>  }
> }
> 
> 
> I would like to be alerted when the state of a core changes. As per 
> https://solr.apache.org/guide/solr/latest/deployment-guide/cluster-node-management.html#clusterstatus
> there are 4 possible states, i.e. red, orange, yellow and green. I would like 
> to set up an alert for this, either through Grafana or Alertmanager, so that 
> I am emailed when any of these states occurs.
> 
> I have the below entry in rules.yml in Prometheus:
>  # Alert for collection core status solr_collections_shard_state
>  - alert: SolrCoreDown
>    # Condition for alerting
>    expr: solr_collections_shard_state < 1.00
>    for: 1m
>    # Annotation - additional informational labels to store more information
>    annotations:
>      summary: "Solr shard {{ $labels.collection }}-{{ $labels.shard }} is down"
>      description: "The Solr shard {{ $labels.collection }}-{{ $labels.shard }} has been in a non-active state for 1 minute."
> 
> I don't get alerted with the above rule. I assume that it would be the same 
> expression or query that can be used in Grafana as well. I would appreciate 
> guidance with writing the rule/query above either in alertmanager or Grafana 
> to get the desired alerts.
> 
> Thanks
