Hi Saksham,

When you say one replica is larger than the others, do you mean that the shard itself also holds 10 GB of data? If not, please check whether an old recovery issue has left stale logs or index files behind.
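A rough sketch of how that check could be scripted, in case it helps. It compares the index size each core reports through the CoreAdmin STATUS call and sums the subdirectories of one core's data directory, so leftover index or transaction-log folders stand out. The node URL, the 1.5x threshold and the data directory path are placeholders, not values from your cluster.

```python
import json
import os
import statistics
from urllib.request import urlopen


def fetch_core_status(node_url):
    """Call CoreAdmin STATUS on one node.

    node_url is a placeholder such as 'http://host:8983/solr'.
    """
    url = f"{node_url}/admin/cores?action=STATUS&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def core_sizes(status_response):
    """Map core name -> reported index size in bytes."""
    sizes = {}
    for name, core in status_response.get("status", {}).items():
        sizes[name] = core.get("index", {}).get("sizeInBytes", 0)
    return sizes


def oversized(sizes, factor=1.5):
    """Return cores whose index exceeds `factor` times the median size.

    The factor is an arbitrary assumption; tune it to your data.
    """
    if not sizes:
        return {}
    median = statistics.median(sizes.values())
    return {n: s for n, s in sizes.items() if s > factor * median}


def subdir_sizes(data_dir):
    """Total bytes under each immediate subdirectory of a core's data dir.

    More than one index-like folder, or a very large tlog folder, would be
    worth a closer look.
    """
    totals = {}
    for entry in os.scandir(data_dir):
        if entry.is_dir():
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for fname in files:
                    total += os.path.getsize(os.path.join(root, fname))
            totals[entry.name] = total
    return totals
```

Running `oversized(core_sizes(fetch_core_status(node_url)))` against each of the 8 nodes should show whether the 10 GB is live index data or leftover files on disk.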
If the shard itself really holds 10 GB, please try dividing its data. I would suggest trying this in a development environment before applying it to production clusters.

Regards,
Aman

On Wed, 17 Jul 2024, 18:12 Saksham Gupta, <saksham.gu...@indiamart.com.invalid> wrote:

> Hi All,
> Pinging again for assistance. This is a very unusual case, which is ruining
> user experience for a particular type of search [searches mapped in the
> replica facing timeouts] as these requests are taking more than 3 seconds.
>
> On Wed, Jul 17, 2024 at 11:37 AM Saksham Gupta <saksham.gu...@indiamart.com>
> wrote:
>
> > Hi All,
> >
> > We are using a solr cloud cluster of 59 shards [1 replica for each shard]
> > spread across 8 nodes. We have used implicit routing for indexing and
> > searching data across these shards.
> >
> > Upon analyzing the timeouts on solr, we have found that more than 85%
> > [3097/3693 timeouts on 9th July] of the solr timeouts were happening due to
> > just 1 replica where the size of the replica is more compared to other
> > replica [other replica contain < 5gb of data, whereas this replica contains
> > 10 gb].
> >
> > 1. Anyone who faced a similar issue, how to mitigate this? Is there a way
> > to increase timeout for a particular replica/ node?
> >
> > 2. Also, has someone tried to further divide a shard's data into multiple
> > shards? How can we plan this, as there is already a logical separation
> > [implicit routing] b/w the 59 shards, and we will be adding another logic
> > to subdivide data for 1 of the shards.
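
On the question of subdividing one implicitly routed shard, here is a minimal sketch of one possible approach, not a tested migration plan. It assumes extra shards are added with the Collections API CREATESHARD action and that the indexing client picks a sub-shard from a hash of the document id. The collection name, shard names and the CRC32 rule are all hypothetical.

```python
import zlib
from urllib.parse import urlencode


def create_shard_url(solr_url, collection, shard):
    """Build a Collections API CREATESHARD request URL.

    solr_url, collection and shard are placeholders for illustration.
    """
    params = {"action": "CREATESHARD", "collection": collection, "shard": shard}
    return f"{solr_url}/admin/collections?{urlencode(params)}"


def route(doc_id, logical_shard, split_map):
    """Pick the target shard for a document.

    split_map maps an oversized logical shard to its sub-shards, e.g.
    {"shard17": ["shard17a", "shard17b"]} (hypothetical names). Shards
    that are not in the map keep their existing routing.
    """
    sub_shards = split_map.get(logical_shard)
    if not sub_shards:
        return logical_shard
    # CRC32 is stable across runs, unlike Python's built-in hash() on str.
    idx = zlib.crc32(doc_id.encode("utf-8")) % len(sub_shards)
    return sub_shards[idx]
```

The value returned by `route` would be sent as the `_route_` parameter when indexing. Searches for that logical group would then need to target every sub-shard, for example through the `shards` parameter, and the existing 10 GB of data would have to be reindexed into the new shards.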