Hi,

While investigating further, I noticed the following metric being exposed:
"SEARCHER.searcher.indexVersion":1000687854,

However, on its own this gives no view of the replication lag; I would
need to compare it with the value reported by the master's core (and/or
the other pull replicas in the cluster).
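
To make the comparison concrete, here is a rough sketch of what I have in
mind. It is only a sketch under assumptions: the metrics response is assumed
to contain a top-level "metrics" map keyed by registry name of the form
solr.core.<collection>.<shard>.<replica>, the helper names (metrics_url,
index_versions, mismatched, fetch) are made up for illustration, and it
assumes the indexVersion values are directly comparable between the leader
and a replica, which is part of what I am trying to confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

METRIC = "SEARCHER.searcher.indexVersion"


def metrics_url(base_url):
    """Build a Metrics API URL limited to group=core and the one metric."""
    query = urlencode({"group": "core", "prefix": METRIC, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/admin/metrics?{query}"


def index_versions(payload):
    """Map registry name -> indexVersion from a parsed metrics response.

    Assumption: payload["metrics"] is a dict keyed by registry name.
    """
    versions = {}
    for registry, values in payload.get("metrics", {}).items():
        if METRIC in values:
            versions[registry] = values[METRIC]
    return versions


def shard_key(registry):
    # Assumption: the last dotted component is the replica name, so
    # everything before it identifies the collection and shard.
    return registry.rsplit(".", 1)[0]


def mismatched(leader_payload, replica_payload):
    """Return replica registries whose indexVersion differs from the leader's.

    The result maps registry name -> (replica_version, leader_version).
    """
    leader = {shard_key(r): v for r, v in index_versions(leader_payload).items()}
    result = {}
    for registry, version in index_versions(replica_payload).items():
        expected = leader.get(shard_key(registry))
        if expected is not None and version != expected:
            result[registry] = (version, expected)
    return result


def fetch(base_url, timeout=10):
    """Fetch and parse the metrics response from one node (hypothetical host)."""
    with urlopen(metrics_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

An alert could then fire when mismatched(fetch(leader), fetch(replica)) stays
non-empty for longer than the expected commit/poll interval.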

Also, on the core overview page, a similar metric is exposed as "version"
under Statistics, which is in line with what the metrics endpoint provides.
However, the Replication section also lists its own version and gen.
[image: Screenshot 2023-08-28 at 12.18.29 PM.png]


I would like to understand the following in order to build a proper
alerting strategy:
1. What is the difference between Statistics.version and Replication.version?
2. Which of the following is the best indicator of replication lag:
Statistics.version, Replication.version, or Replication.Gen?
3. Not all nodes report values under Replication > Master (Replicable).
Is that expected?

On Mon, Aug 28, 2023 at 11:30 AM 6harat <bharat.gulati.ce...@gmail.com>
wrote:

> More details about the setup:
> Solr version: 8.2.0
>
> The documentation here:
> https://solr.apache.org/guide/8_2/metrics-reporting.html#core-solrcore-registry
> talks about
>
>> The Core (SolrCore) Registry
>> <https://solr.apache.org/guide/8_2/metrics-reporting.html#core-level-metrics>
>>  includes solr.core.<collection>, one for each core. When making
>> requests with the Metrics API
>> <https://solr.apache.org/guide/8_2/metrics-reporting.html#metrics-api>,
>> you can specify &group=core to limit to only these metrics.
>>
>>    - all common RequestHandlers report: request timers / counters,
>>    timeouts, errors. Handlers that process distributed shard requests
>>    also report shardRequests sub-counters for each type of distributed
>>    request.
>>    - index-level events
>>    <https://solr.apache.org/guide/8_2/metrics-reporting.html#index-merge-metrics>:
>>    meters for minor / major merges, number of merged docs, number of deleted
>>    docs, gauges for currently running merges and their size.
>>    - *shard replication and transaction log replay on replicas,*
>>
> but I am unable to find the relevant metric name which corresponds to
> shard replication and transaction log replay. I have also attached the
> output of "solr/admin/metrics?group=core" from one of our pull replica nodes.
>
> Regards
> 6harat
>
> On Mon, Aug 28, 2023 at 11:20 AM 6harat <bharat.gulati.ce...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are running a SolrCloud setup in production with the following topology:
>> 1. TLOG nodes: 3
>> 2. Pull nodes: M (depending upon the read scalability that is needed)
>>
>> Last week we encountered an issue where one of the pull replicas wasn't
>> able to fetch the index from the leader. While we are still in the RCA
>> process, we wanted to find out whether a metric already exists under
>> "/solr/admin/metrics" which can be used to identify when a given
>> core last synced up with the leader. This would help massively in improving
>> our alerting setup and in identifying stale nodes quickly.
>>
>> Apologies if such a question is already answered or the details already
>> exist in the reference manual. If that is indeed the case, please drop the
>> relevant link below.
>>
>> Regards
>> 6harat
>>
>
>
> --
> Regards
> 6harat
>


-- 
Regards
6harat
