Sounds like something that would be very useful for folks.

I'm sure it'd be very dependent on your data and the type of backup,
but I'm curious - if you can share Pierre - is there a number of
cores-per-node being backed up where you start to see problems?

Jason

On Wed, Jun 21, 2023 at 8:34 AM Pierre Salagnac
<pierre.salag...@gmail.com> wrote:
>
> Thanks for starting this thread David.
>
> I've been internally working on this, since we have issues (query failures)
> during backups of big collections because of IO saturation.
>
> I see two different approaches to solve this:
> 1. Throttle at the IO level, like David mentioned.
> 2. Limit the number of cores we back up concurrently.
> (These two options are *not* mutually exclusive.)
>
> I've been focused on the second option, to limit the number of concurrent
> backups per node. Currently, the overseer sends shard requests to all
> shards in a simple 'for' loop. If the collection has one thousand shards,
> we'll start one thousand concurrent backups. The idea is to only send shard
> level requests up to a certain limit per node, and then each time a shard
> is complete, we send the next one for this node.
> If you're interested, I integrated my experiment (for non-incremental
> backups) here:
> https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
>
> I don't think backup is the only operation that should be considered.
> Restore is at least one more, and I'm not sure whether we have other
> IO-intensive operations at the collection level. Ideally, we should have
> something generic rather than handling each type of operation individually.
>
> Thanks
>
>
> On Tue, Jun 20, 2023 at 09:58, Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
> > Might be a good question for users@ list, I guess. I'm sure other users
> > must've thought about this.
> > Cross posting there, as I'm curious myself too.
> >
> > On Tue, 20 Jun 2023 at 01:07, David Smiley <dsmi...@apache.org> wrote:
> >
> > > Has anyone mitigated the potentially large IO impact of doing a backup of a
> > > large collection or just in general?  If the collection is large enough,
> > > there very well could be many shards on one host and it could saturate the
> > > IO.  I wonder if there should be a rate limit mechanism or some other
> > > mechanism.
> > >
> > > Not the same but I know that at a segment level, the merges are rate
> > > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > > adjusts itself automatically ("ioThrottle" boolean).
> > >
> > > ~ David Smiley
> > > Apache Lucene/Solr Search Developer
> > > http://www.linkedin.com/in/davidwsmiley
> > >
> >
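
For illustration only: the per-node limit Pierre describes above (send shard
requests up to a limit per node, then send the next one each time a shard
completes) could be sketched roughly as follows. This is not taken from the
linked commit or from Solr's code. The class name PerNodeDispatcher, the
maxPerNode parameter, and the use of a Supplier returning a CompletableFuture
to stand in for "send one shard-level request" are all assumptions made for
the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper: caps the number of in-flight shard requests per node.
class PerNodeDispatcher {
    private final int maxPerNode;
    // Requests waiting to be sent, keyed by node name.
    private final Map<String, Queue<Supplier<CompletableFuture<Void>>>> pending = new HashMap<>();
    // Number of requests currently running on each node.
    private final Map<String, Integer> inFlight = new HashMap<>();

    PerNodeDispatcher(int maxPerNode) {
        this.maxPerNode = maxPerNode;
    }

    // Queue a shard request for a node; it is sent at once if the node has capacity.
    synchronized void submit(String node, Supplier<CompletableFuture<Void>> request) {
        pending.computeIfAbsent(node, n -> new ArrayDeque<>()).add(request);
        dispatch(node);
    }

    // Send queued requests for this node until its limit is reached.
    private synchronized void dispatch(String node) {
        Queue<Supplier<CompletableFuture<Void>>> queue = pending.get(node);
        while (queue != null && !queue.isEmpty()
                && inFlight.getOrDefault(node, 0) < maxPerNode) {
            inFlight.merge(node, 1, Integer::sum);
            // When the shard finishes (successfully or not), free the slot.
            queue.poll().get().whenComplete((result, error) -> onComplete(node));
        }
    }

    // A shard finished: release its slot and send the next request for the node.
    private synchronized void onComplete(String node) {
        inFlight.merge(node, -1, Integer::sum);
        dispatch(node);
    }

    public static void main(String[] args) {
        PerNodeDispatcher dispatcher = new PerNodeDispatcher(2);
        List<CompletableFuture<Void>> started = new ArrayList<>();
        for (int shard = 0; shard < 5; shard++) {
            int id = shard;
            dispatcher.submit("node1", () -> {
                System.out.println("sending backup request for shard " + id);
                CompletableFuture<Void> done = new CompletableFuture<>();
                started.add(done);
                return done;
            });
        }
        // Only two requests are in flight here; completing one releases the next.
        started.get(0).complete(null);
    }
}
```

The limit is tracked per node rather than globally, so a slow node does not
hold back shard requests destined for other nodes. The same queueing could in
principle sit in front of restore or other collection-level operations.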
