Hi Team,

We have a SolrCloud cluster with the following initial setup:
Collection: Single collection with all document types (legal, report, thesis).
Shard: 1 shard with 3 replicas.
Collection Size: ~30GB with 1 million documents.
Document Distribution: Thesis documents constitute 24GB out of 30GB and are generally large.

Resource Allocation
RAM Allocated: 32GB (from a total of 128GB).
Processors: 16.

Performance Metrics (Before Splitting Collections)
P75 Response Time: 650ms for type:thesis filter.
P90 Response Time: 900ms for type:thesis filter.

Change Made
To improve performance, we decided to split the collection by moving thesis documents into a separate collection. The new setup is:
Collection 1: Legal + Report documents (~6GB size, 0.75M documents).
Collection 2: Thesis documents (~24GB size, 0.25M documents).

Both collections are in the same cluster, have the same schema, SolrConfig, cache settings, etc., and have the same number of replicas and shards.

Observation (After Splitting)
Unexpectedly, the performance for Collection 2 (Thesis) degraded significantly:
P75 Response Time: Increased from 650ms to 740ms.
P90 Response Time: Increased from 900ms to 1.3 seconds.

This was surprising, since the thesis documents now have a dedicated collection and each query has fewer documents to process.

Question
What could be the potential root cause of this performance degradation? Given that the configurations, schema, and hardware resources are identical, why would a dedicated collection for larger documents perform worse? Are there any Solr-specific optimizations or cluster configurations that we may have missed when splitting the collections?

Any insights or suggestions to improve the response time would be greatly appreciated.

Stack Overflow link: https://stackoverflow.com/questions/79500749/performance-degradation-after-splitting-collection-in-solrcloud

Thank you!
Sagar Gole
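
P.S. To put the figures above in perspective, here is a rough sketch that derives the average index size per document and the relative latency change from the numbers quoted in this post. It is only back-of-the-envelope arithmetic: the function names and labels are hypothetical, 1 GB is assumed to be 10^9 bytes, and the sizes are collection (index) sizes rather than raw document sizes.

```python
# Back-of-the-envelope figures from the numbers quoted in this post.
# Assumption: 1 GB = 10**9 bytes, and sizes are on-disk index sizes, so the
# results are average index bytes per document, not raw document sizes.

GB = 10**9


def avg_kb_per_doc(size_gb: float, num_docs: int) -> float:
    """Average index size per document, in kilobytes (1 KB = 1000 bytes)."""
    return size_gb * GB / num_docs / 1000


def pct_change(before_ms: float, after_ms: float) -> float:
    """Relative change from before to after, in percent."""
    return (after_ms - before_ms) / before_ms * 100


# Hypothetical labels; the sizes and counts are the approximate ones above.
collections = {
    "combined (before split)": (30, 1_000_000),
    "legal + report": (6, 750_000),
    "thesis": (24, 250_000),
}

for name, (size_gb, docs) in collections.items():
    print(f"{name}: ~{avg_kb_per_doc(size_gb, docs):.0f} KB per document")

print(f"P75 change: {pct_change(650, 740):+.1f}%")
print(f"P90 change: {pct_change(900, 1300):+.1f}%")
```

With the numbers above, this works out to roughly 96 KB of index per thesis document versus about 8 KB per legal or report document, and to P75 and P90 increases of roughly 14% and 44% respectively.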