Hi Team,

We have a SolrCloud cluster with the following initial setup:

Collection: Single collection with all document types (legal, report,
thesis).
Shard: 1 shard with 3 replicas.
Collection Size: ~30GB with 1 million documents.
Document Distribution: Thesis documents constitute 24GB out of 30GB and are
generally large.
Resource Allocation:
RAM Allocated: 32GB (from a total of 128GB).
Processors: 16.

Performance Metrics (Before Splitting Collections)

P75 Response Time: 650ms for type:thesis filter.
P90 Response Time: 900ms for type:thesis filter.

Change Made

To improve performance, we decided to split the collection by moving thesis
documents into a separate collection. The new setup is:

Collection 1: Legal + Report documents (~6GB size, 0.75M documents).
Collection 2: Thesis documents (~24GB size, 0.25M documents).
Both collections:

Are in the same cluster.
Have the same schema, SolrConfig, cache settings, etc.
Have the same number of shards and replicas.
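
For context, here is a rough sketch of the average index size per document implied by the numbers above. It assumes 1GB = 10^9 bytes and uses the approximate sizes and counts quoted in this mail; the function name is only illustrative.

```python
# Approximate figures from the setup above.
# Assumption: 1 GB is taken as 10**9 bytes.
GB = 10**9


def avg_doc_bytes(index_size_gb: float, doc_count: int) -> float:
    """Return the mean index size per document, in bytes."""
    return index_size_gb * GB / doc_count


# Collection 1: Legal + Report (~6GB, 0.75M documents)
legal_report_avg = avg_doc_bytes(6, 750_000)
# Collection 2: Thesis (~24GB, 0.25M documents)
thesis_avg = avg_doc_bytes(24, 250_000)

print(f"legal+report: ~{legal_report_avg / 1000:.0f} KB per document")
print(f"thesis:       ~{thesis_avg / 1000:.0f} KB per document")
print(f"ratio:        ~{thesis_avg / legal_report_avg:.0f}x")
```

So a thesis document is, on average, roughly an order of magnitude larger in the index than a legal or report document.
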
Observation (After Splitting)
Unexpectedly, the performance for Collection 2 (Thesis) degraded
significantly:

P75 Response Time: Increased from 650ms to 740ms.
P90 Response Time: Increased from 900ms to 1.3 seconds.

This was surprising, since the thesis documents now have a dedicated
collection with fewer documents to search, yet performance has worsened
rather than improved.
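
To put the regression in relative terms, here is a small sketch of the arithmetic. It compares the type:thesis filter timings on the original collection with the timings on Collection 2; the helper name is only illustrative.

```python
def pct_change(before_ms: float, after_ms: float) -> float:
    """Return the relative change from before_ms to after_ms, in percent."""
    return (after_ms - before_ms) / before_ms * 100.0


# Timings quoted above, in milliseconds.
p75_change = pct_change(650, 740)
p90_change = pct_change(900, 1300)

print(f"P75 change: {p75_change:+.1f}%")
print(f"P90 change: {p90_change:+.1f}%")
```

The tail (P90) regressed noticeably more than P75, so the slowest queries are the ones most affected.
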

Questions
What could be the potential root cause of this performance degradation?
Given that the configurations, schema, and hardware resources are
identical, why would a dedicated collection for larger documents perform
worse?
Are there any Solr-specific optimizations or cluster configurations that we
may have missed when splitting the collections?
Any insights or suggestions to improve the response time would be greatly
appreciated.

Stack Overflow link:
https://stackoverflow.com/questions/79500749/performance-degradation-after-splitting-collection-in-solrcloud


Thank you!
Sagar Gole
