Hello.

We have a Solr implementation in which an index is replicated from a primary 
instance to 4 secondary instances, all on separate physical servers (5 Ubuntu 
22.04.3 servers in total).  We use the data import handler to import changes 
every 10 minutes.  We are running Solr version 8.11.2.  The current setup has 
been in place since January 2023 without any significant changes (other than 
OS updates).

The issue we are experiencing has only occurred a handful of times.  It first 
occurred on February 8, 2024 and has occurred about once a month since then 
(March 14, April 10, June 3).

Sequence of events with basic pattern:

  *   Change import command is issued
  *   Replication
     *   We optimize index to a maximum of 4 segments
     *   Index replicated to secondary instances on optimize
  *   New searcher becomes available
  *   Issue immediately presents on all 4 secondary instances, which exhibit 
the same behavior
     *   Increased filter cache lookups (from ~100K per minute to upwards of 
2.9M per minute)
     *   Increased load
     *   Increased query times
     *   Other caches (query result cache, etc.) remain normal
     *   Request rates don't increase
  *   10 minutes later a new import is issued and things return to normal
  *   Issue shows up again about 1 hour later, but a bit less impactful on 
load, query time and filter cache lookups.
     *   In most instances it resolves itself 10 minutes later when another new 
searcher becomes available, and it has then remained stable until the next 
"monthly" occurrence.
     *   On June 3, the issue that appeared an hour later did not resolve 
itself in a timely manner.
        *   Performing a full import of a fully optimized index resolved the 
issue for us.  We have remained stable since.
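
For anyone who wants to watch the same signal, here is a minimal sketch of how 
the filter cache lookup rate could be sampled through the Solr Metrics API.  It 
is not our actual monitoring code.  The host, the core name "mycore" and the 
60-second interval are placeholders, and it assumes a standalone (non-cloud) 
core, where the metrics registry is named "solr.core.<corename>".

```python
import json
import time
import urllib.request

# Placeholder values (assumptions): point these at a real secondary instance.
SOLR_URL = "http://localhost:8983/solr"
CORE = "mycore"


def extract_cumulative_lookups(payload, core):
    """Pull the cumulative filterCache lookup count out of a metrics response."""
    registry = payload["metrics"][f"solr.core.{core}"]
    return registry["CACHE.searcher.filterCache"]["cumulative_lookups"]


def fetch_cumulative_lookups(solr_url=SOLR_URL, core=CORE):
    """Query the Metrics API and return the cumulative filterCache lookups."""
    url = (f"{solr_url}/admin/metrics?group=core"
           "&prefix=CACHE.searcher.filterCache&wt=json")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_cumulative_lookups(json.load(resp), core)


def lookups_per_minute(prev_count, curr_count, elapsed_seconds):
    """Convert two cumulative samples into a per-minute rate.

    Returns None when the counter went backwards (for example after a core
    reload) or when no time has elapsed.
    """
    if elapsed_seconds <= 0 or curr_count < prev_count:
        return None
    return (curr_count - prev_count) * 60 / elapsed_seconds


def poll(interval_seconds=60):
    """Print the filterCache lookup rate once per interval until interrupted."""
    prev, prev_time = fetch_cumulative_lookups(), time.monotonic()
    while True:
        time.sleep(interval_seconds)
        curr, now = fetch_cumulative_lookups(), time.monotonic()
        print("filterCache lookups/min:", lookups_per_minute(prev, curr, now - prev_time))
        prev, prev_time = curr, now
```

Calling poll() on each secondary would print one rate per minute.  The sketch 
reads cumulative_lookups rather than lookups because the per-searcher counter 
starts over whenever a new searcher is opened, which here happens with every 
replication.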

We have scoured logs and performed extensive analysis on our end each time this 
has occurred, and so far we have come up empty-handed.

Thank you
