Hi Pierre,

We are running Solr 8.10 and take backups per core/replica rather than per node. Even so, IO wait used to rise significantly during backups, because each core was ~30 GB. The problem has reduced significantly (it is not completely resolved, and load still increases slightly) since we changed our Solr architecture: we changed the sharding strategy and increased the number of shards, which reduced the size of each replica.
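For anyone reading this thread later, a per-core backup run of the kind described above could be sketched roughly as follows. This is only an illustration, not our actual script: it assumes the backups are triggered through each core's replication handler (`/replication?command=backup`), and the helper names, the `-daily` snapshot suffix, and the pause length are all made up for the example.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def backup_url(base_url, core, location, name):
    """Build the replication-handler backup URL for one core."""
    query = urlencode(
        {"command": "backup", "location": location, "name": name, "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


def backup_cores_sequentially(base_url, cores, location, pause_s=60,
                              fetch=None, sleep=time.sleep):
    """Trigger one backup per core, pausing between cores to spread the I/O.

    Assumption: the backup command returns before the snapshot has finished,
    so a real script would poll the handler's status (command=details) rather
    than rely on a fixed pause. That polling is omitted here.
    """
    if fetch is None:
        # Default: plain HTTP GET against the Solr node.
        fetch = lambda url: urlopen(url, timeout=30).read()
    urls = []
    for i, core in enumerate(cores):
        url = backup_url(base_url, core, location, name=f"{core}-daily")
        fetch(url)
        urls.append(url)
        if i < len(cores) - 1:
            sleep(pause_s)  # hypothetical breathing room between cores
    return urls
```

Smaller cores shorten each burst of reads, which matches what we saw after resharding; the pause only spaces the bursts out and does not throttle the IO of an individual snapshot.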
We will be migrating to Solr 9.6 soon, so we will explore node-wise backup as well.

Thanks

On Mon, Jul 22, 2024 at 7:44 PM Pierre Salagnac <pierre.salag...@gmail.com> wrote:

> Hi Saksham,
> What Solr version do you run?
>
> With SOLR-16879 in Solr 9.4, a new throttling was added to limit the number
> of concurrent backups per node. If I recall well, the default is 5 per
> node. Before this fix, all the replica snapshots were started concurrently.
>
> As far as I know, there is no mechanism to specifically limit IOs, but I
> achieved the same by limiting the number of snapshots concurrently done.
>
>
> On Wed, Jul 10, 2024 at 08:27, Saksham Gupta
> <saksham.gu...@indiamart.com.invalid> wrote:
>
> > Hi All,
> > Pinging again for some assistance!
> >
> > On Tue, Jul 9, 2024 at 4:02 PM Saksham Gupta <saksham.gu...@indiamart.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > As an effort to enhance disaster recovery for solr, we have started a
> > > solr backup process on a daily basis. The backup runs for each replica
> > > one after the other, after which an integrity check is executed to
> > > check if the index is having no faults.
> > >
> > > Although, throughout the backup, we experience high io wait on
> > > production servers as complete data of 25 gb is being read [size of
> > > each shard is ~25 gb]. The backup executes daily at night 3 AM [backup
> > > for each replica runs sequentially] and write is done on a separate
> > > disc, still response time takes a significant hit, thereby increasing
> > > the number of timeouts and 5xx.
> > >
> > > Is there a way to limit the io so that backup is done at a slower pace
> > > keeping the response time and other metrics intact?