Hi,

While investigating further, I noticed the following metric being exposed:

"SEARCHER.searcher.indexVersion":1000687854,
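For reference, a minimal Python sketch of pulling this value out of an already fetched and parsed metrics response, one entry per core. It is only an illustration: it assumes the JSON has a top-level "metrics" object keyed by registry name ("solr.core.…"), with gauges as flat values as in the line above. The helper name index_versions and the sample registry name are hypothetical.

```python
from typing import Any, Dict

METRIC_KEY = "SEARCHER.searcher.indexVersion"


def index_versions(metrics_response: Dict[str, Any]) -> Dict[str, int]:
    """Map each core registry name to the index version it reports.

    Assumption: the parsed response looks like
    {"metrics": {"solr.core.<...>": {"SEARCHER.searcher.indexVersion": N}}}.
    Registries that are not core registries, or lack the gauge, are skipped.
    """
    versions: Dict[str, int] = {}
    for registry, values in metrics_response.get("metrics", {}).items():
        if registry.startswith("solr.core.") and METRIC_KEY in values:
            versions[registry] = int(values[METRIC_KEY])
    return versions


# Hypothetical sample shaped like the output of solr/admin/metrics?group=core
sample = {
    "metrics": {
        "solr.core.mycollection.shard1.replica_p1": {
            "SEARCHER.searcher.indexVersion": 1000687854,
        }
    }
}

print(index_versions(sample))
```
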
However, on its own this gives no view of the replication lag; I would need to compare it with the value reported by the master's core (and/or by other pull replicas in the cluster).

Also, on the core overview page, a similar metric is exposed as "version", which is in line with what the metrics endpoint provides. However, the page also lists version and gen under the Replication section.

[image: Screenshot 2023-08-28 at 12.18.29 PM.png]

I would like to understand the following in order to build a proper alerting strategy:
1. What is the difference between Statistics.version and Replication.version?
2. Which of the following is the best indicator of replication lag: Statistics.version, Replication.version, or Replication.Gen?
3. Not all nodes report values under Replication>Master(Replicable)

On Mon, Aug 28, 2023 at 11:30 AM 6harat <bharat.gulati.ce...@gmail.com> wrote:

> More details about the setup:
> Solr version: 8.2.0
>
> The documentation here:
> https://solr.apache.org/guide/8_2/metrics-reporting.html#core-solrcore-registry
> talks about
>
>> The Core (SolrCore) Registry
>> <https://solr.apache.org/guide/8_2/metrics-reporting.html#core-level-metrics>
>> includes solr.core.<collection>, one for each core. When making
>> requests with the Metrics API
>> <https://solr.apache.org/guide/8_2/metrics-reporting.html#metrics-api>,
>> you can specify &group=core to limit to only these metrics.
>>
>> - all common RequestHandlers report: request timers / counters,
>> timeouts, errors. Handlers that support process distributed shard requests
>> also report shardRequests sub-counters for each type of distributed
>> request.
>> - index-level events
>>
>> <https://solr.apache.org/guide/8_2/metrics-reporting.html#index-merge-metrics>:
>> meters for minor / major merges, number of merged docs, number of deleted
>> docs, gauges for currently running merges and their size.
>> - *shard replication and transaction log replay on replicas,*
>>
>> but I am unable to find the relevant metric name which corresponds to
> shard replication and transaction log replay. I have also attached the
> output of "solr/admin/metrics?group=core" from one of our pull replica nodes
>
> Regards
> 6harat
>
> On Mon, Aug 28, 2023 at 11:20 AM 6harat <bharat.gulati.ce...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are running a Solr Cloud setup in production with the following setup:
>> 1. TLOG nodes: 3
>> 2. Pull nodes: M (depending upon the read scalability that is needed)
>>
>> Last week we encountered an issue where one of the pull replica wasn't
>> able to fetch the index from the leader. While we are still in the RCA
>> process, we wanted to find if a metric already exists under
>> "/solr/admin/metrics" which can be used as a way to identify when the given
>> core last synced up with the leader. This will massively help in improving
>> our alerting setup and figure out stale nodes quickly.
>>
>> Apologies if such a question is already answered or the details already
>> exist in the reference manual. If that is indeed the case, please drop the
>> relevant link below.
>>
>> Regards
>> 6harat
>>
>
>
> --
> Regards
> 6harat
>

--
Regards
6harat