Here's a POC: https://github.com/apache/solr/pull/1729
~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 26, 2023 at 1:53 PM Jason Gerlowski <gerlowsk...@gmail.com> wrote:

> Sounds like something that would be very useful for folks.
>
> I'm sure it'd be very dependent on your data and the type of backup,
> but I'm curious, Pierre, if you can share: is there a number of
> cores per node being backed up at which you start to see problems?
>
> Jason
>
> On Wed, Jun 21, 2023 at 8:34 AM Pierre Salagnac
> <pierre.salag...@gmail.com> wrote:
> >
> > Thanks for starting this thread, David.
> >
> > I've been working on this internally, since we have issues (query
> > failures) during backups of big collections because of IO saturation.
> >
> > I see two different approaches to solve this:
> > 1. Throttle at the IO level, like David mentioned.
> > 2. Limit the number of cores we back up concurrently.
> > (These two options are *not* mutually exclusive.)
> >
> > I've been focused on the second option, limiting the number of concurrent
> > backups per node. Currently, the overseer sends shard requests to all
> > shards in a simple 'for' loop. If the collection has one thousand shards,
> > we'll start one thousand concurrent backups. The idea is to send
> > shard-level requests only up to a certain limit per node, and then, each
> > time a shard completes, send the next one for that node.
> > If you're interested, I integrated my experiment (for non-incremental
> > backups) here:
> > https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
> >
> > I don't think backup is the only operation that should be considered.
> > Restore should be too, at least; I'm not sure whether we have other
> > IO-intensive operations at the collection level. Ideally, we should have
> > something generic rather than handling each type of operation
> > individually.
> >
> > Thanks
> >
> >
> > On Tue, Jun 20, 2023 at 09:58, Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com> wrote:
> >
> > > Might be a good question for the users@ list, I guess. I'm sure other
> > > users must've thought about this.
> > > Cross-posting there, as I'm curious myself too.
> > >
> > > On Tue, 20 Jun 2023 at 01:07, David Smiley <dsmi...@apache.org> wrote:
> > >
> > > > Has anyone mitigated the potentially large IO impact of doing a backup
> > > > of a large collection, or just in general? If the collection is large
> > > > enough, there could very well be many shards on one host, and they
> > > > could saturate the IO. I wonder if there should be a rate-limit
> > > > mechanism or some other mechanism.
> > > >
> > > > Not the same, but I know that at the segment level, merges are rate
> > > > limited: ConcurrentMergeScheduler doesn't quite let you set the rate,
> > > > but it adjusts itself automatically (the "ioThrottle" boolean).
> > > >
> > > > ~ David Smiley
> > > > Apache Lucene/Solr Search Developer
> > > > http://www.linkedin.com/in/davidwsmiley
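To make option 2 from Pierre's message concrete (cap the shard-level requests in flight per node, and release the next queued request for a node each time one of its shards finishes), here is a minimal Java sketch. It is not Solr's overseer code and not the code in Pierre's commit or the POC PR. `PerNodeThrottledDispatcher`, `ShardTask`, the `sender` function and `runDemo` are hypothetical names, and the "backups" are simulated with short pauses on a thread pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Function;

// Hypothetical sketch (not Solr code): at most maxPerNode shard requests are in
// flight per node; each completion releases the next queued request for that node.
public class PerNodeThrottledDispatcher {
    // Hypothetical stand-in for a shard-level backup request and its hosting node.
    public record ShardTask(String node, String shard) {}
    public record DemoResult(int completed, Map<String, Integer> peakPerNode) {}

    private final int maxPerNode;
    private final Function<ShardTask, CompletableFuture<Void>> sender;
    private final Map<String, Queue<ShardTask>> pending = new HashMap<>();
    private final Map<String, Integer> inFlight = new HashMap<>();
    private final CompletableFuture<Void> allDone = new CompletableFuture<>();
    private int remaining;

    public PerNodeThrottledDispatcher(int maxPerNode,
                                      Function<ShardTask, CompletableFuture<Void>> sender) {
        this.maxPerNode = maxPerNode;
        this.sender = sender;
    }

    // Queues every task under its node, then starts up to maxPerNode per node.
    public synchronized CompletableFuture<Void> dispatch(List<ShardTask> tasks) {
        remaining = tasks.size();
        if (remaining == 0) {
            allDone.complete(null); // nothing to send
        }
        for (ShardTask t : tasks) {
            pending.computeIfAbsent(t.node(), n -> new ArrayDeque<>()).add(t);
        }
        for (String node : new ArrayList<>(pending.keySet())) {
            fill(node);
        }
        return allDone;
    }

    // Starts queued tasks for one node until its in-flight limit is reached.
    private synchronized void fill(String node) {
        Queue<ShardTask> queue = pending.get(node);
        while (!queue.isEmpty() && inFlight.getOrDefault(node, 0) < maxPerNode) {
            ShardTask task = queue.poll();
            inFlight.merge(node, 1, Integer::sum);
            // Failures count as completions here; a real version would record them.
            sender.apply(task).whenComplete((ok, err) -> onComplete(task));
        }
    }

    // Frees one slot on the task's node and refills it, or finishes the whole batch.
    private synchronized void onComplete(ShardTask task) {
        inFlight.merge(task.node(), -1, Integer::sum);
        if (--remaining == 0) {
            allDone.complete(null);
        } else {
            fill(task.node());
        }
    }

    // Simulated backups: each task pauses about 5 ms and records per-node concurrency.
    public static DemoResult runDemo(int nodes, int shardsPerNode, int maxPerNode) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        Map<String, AtomicInteger> running = new ConcurrentHashMap<>();
        Map<String, AtomicInteger> peak = new ConcurrentHashMap<>();
        AtomicInteger completed = new AtomicInteger();
        Function<ShardTask, CompletableFuture<Void>> sender = t -> CompletableFuture.runAsync(() -> {
            int now = running.computeIfAbsent(t.node(), n -> new AtomicInteger()).incrementAndGet();
            peak.computeIfAbsent(t.node(), n -> new AtomicInteger()).accumulateAndGet(now, Math::max);
            LockSupport.parkNanos(5_000_000L); // stand-in for the actual backup IO
            running.get(t.node()).decrementAndGet();
            completed.incrementAndGet();
        }, pool);
        List<ShardTask> tasks = new ArrayList<>();
        for (int i = 0; i < nodes * shardsPerNode; i++) {
            tasks.add(new ShardTask("node" + (i % nodes), "shard" + i));
        }
        new PerNodeThrottledDispatcher(maxPerNode, sender).dispatch(tasks).join();
        pool.shutdown();
        Map<String, Integer> peaks = new HashMap<>();
        peak.forEach((node, p) -> peaks.put(node, p.get()));
        return new DemoResult(completed.get(), peaks);
    }

    public static void main(String[] args) {
        DemoResult r = runDemo(3, 20, 2);
        System.out.println("completed=" + r.completed() + ", peakPerNode=" + r.peakPerNode());
    }
}
```

With `runDemo(3, 20, 2)`, all 60 simulated backups finish, and no node ever has more than two running at once. Keeping a queue per node rather than a single global cap targets the problem described in the thread, since saturation happens on each host's IO. A thousand-shard collection spread over many nodes can still make progress on every node in parallel. Because the request is passed in as a function, the same limiter could in principle wrap restore requests as well, which is closer to the generic mechanism Pierre suggests. That is an assumption of this sketch, not something the thread settles.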