Hello. We have a Solr implementation in which an index is replicated from a primary instance to 4 secondary instances, all on separate physical servers (5 Ubuntu 22.04.3 servers in total). We use the data import handler to import changes every 10 minutes. We are running Solr version 8.11.2. The current setup has been in place since January 2023 without any significant changes (other than OS updates).
The issue we are experiencing has only occurred a handful of times. It first occurred on February 8, 2024 and has occurred about once a month since then (March 14, April 10, June 3).

Sequence of events with basic pattern:

* Change import command is issued
* Replication
* We optimize index to a maximum of 4 segments
* Index replicated to secondary instances on optimize
* New searcher becomes available
* Issue immediately presents on all 4 secondary instances, which exhibit the same behavior
  * Increased filter cache lookups (from ~100K per minute to upwards of 2.9M per minute)
  * Increased load
  * Increased query times
  * Other caches (query result cache, etc.) remain normal
  * Request rates don't increase
* 10 minutes later a new import is issued and things return to normal
* Issue shows up again about 1 hour later, but with less impact on load, query time and filter cache lookups
  * In most instances it resolves itself 10 minutes later when another new searcher becomes available, and then remains stable until the next "monthly" occurrence
  * On June 3, the issue that appeared an hour later did not resolve itself in a timely manner
  * Performing a full import of a fully optimized index resolved the issue for us. We have remained stable since.

We have scoured logs and performed extensive analysis on our end each time this has occurred, and so far we have come up empty-handed.

Thank you
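For anyone comparing numbers, the per-minute filter cache lookup figures quoted above can be derived from two samples of a cumulative lookup counter. The following is only a minimal sketch of that arithmetic. It assumes the counter is read from somewhere like the `cumulative_lookups` value of the filter cache in the Solr metrics output; the function name and the sample values are hypothetical and are not taken from our monitoring setup.

```python
from typing import Optional


def lookups_per_minute(prev_count: int, prev_time_s: float,
                       curr_count: int, curr_time_s: float) -> Optional[float]:
    """Convert two samples of a cumulative lookup counter into a per-minute rate.

    Returns None when the samples cannot be compared: non-increasing
    timestamps, or a counter that went backwards (for example after a restart).
    """
    elapsed_s = curr_time_s - prev_time_s
    delta = curr_count - prev_count
    if elapsed_s <= 0 or delta < 0:
        return None
    # Scale the observed delta to a 60-second window.
    return delta * 60.0 / elapsed_s


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    rate = lookups_per_minute(1_000_000, 0.0, 3_900_000, 60.0)
    print(f"filter cache lookups per minute: {rate}")
```

Sampling the cumulative counter rather than the per-searcher one avoids a false drop every time a new searcher is opened, which matters here because a new searcher appears every 10 minutes.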