Here's a POC: https://github.com/apache/solr/pull/1729

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jun 26, 2023 at 1:53 PM Jason Gerlowski <gerlowsk...@gmail.com>
wrote:

> Sounds like something that would be very useful for folks.
>
> I'm sure it'd be very dependent on your data and the type of backup,
> but I'm curious - if you can share Pierre - is there a number of
> cores-per-node being backed up where you start to see problems?
>
> Jason
>
> On Wed, Jun 21, 2023 at 8:34 AM Pierre Salagnac
> <pierre.salag...@gmail.com> wrote:
> >
> > Thanks for starting this thread David.
> >
> > I've been working on this internally, since we have issues (query failures)
> > during backups of big collections because of IO saturation.
> >
> > I see two different approaches to solve this:
> > 1. Throttle at the IO level, like David mentioned.
> > 2. Limit the number of cores we back up concurrently.
> > (These two options are *not* mutually exclusive.)
> >
> > I've been focused on the second option, to limit the number of concurrent
> > backups per node. Currently, the overseer sends shard requests to all
> > shards in a simple 'for' loop. If the collection has one thousand shards,
> > we'll start one thousand concurrent backups. The idea is to only send
> > shard-level requests up to a certain limit per node, and then each time a
> > shard completes, we send the next one for that node.
> > If you're interested, I integrated my experiment (for non-incremental
> > backups) here:
> >
> > https://github.com/psalagnac/solr/commit/c77c94e9a3c20aee3e45ec1198f00ab9cf0f76c5
> >
> > I don't think backup is the only operation that should be considered.
> > Restore should be too, at least; I'm not sure whether we have other
> > IO-intensive operations at the collection level. Ideally, we should have
> > something generic rather than handling each type of operation individually.
> >
> > Thanks
> >
> >
> > On Tue, Jun 20, 2023 at 09:58, Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com> wrote:
> >
> > > Might be a good question for the users@ list, I guess. I'm sure other
> > > users must've thought about this.
> > > Cross-posting there, as I'm curious myself too.
> > >
> > > On Tue, 20 Jun 2023 at 01:07, David Smiley <dsmi...@apache.org> wrote:
> > >
> > > > Has anyone mitigated the potentially large IO impact of doing a backup
> > > > of a large collection, or just in general?  If the collection is large
> > > > enough, there very well could be many shards on one host and it could
> > > > saturate the IO.  I wonder if there should be a rate limit mechanism or
> > > > some other mechanism.
> > > >
> > > > Not the same but I know that at a segment level, the merges are rate
> > > > limited -- ConcurrentMergeScheduler doesn't quite let you set it but
> > > > adjusts itself automatically ("ioThrottle" boolean).
> > > >
> > > > ~ David Smiley
> > > > Apache Lucene/Solr Search Developer
> > > > http://www.linkedin.com/in/davidwsmiley
> > > >
> > >
>
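
For readers skimming the thread, the per-node limit Pierre describes above (send shard-level requests up to a cap per node, then send the next one each time a shard completes) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in either linked change: PerNodeBackupLimiter, backupAll, shardsByNode, maxPerNode and sendShardRequest are hypothetical names, and the sendShardRequest function stands in for whatever asynchronous call actually issues a shard backup.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper (not a Solr class): caps in-flight shard requests per node. */
public class PerNodeBackupLimiter {

  /**
   * @param shardsByNode assumed input: node name to the shards hosted on that node
   * @param maxPerNode maximum number of shard requests in flight on any one node
   * @param sendShardRequest stand-in for the asynchronous per-shard backup call
   * @return a future that completes once every shard request has finished
   */
  public static CompletableFuture<Void> backupAll(
      Map<String, List<String>> shardsByNode,
      int maxPerNode,
      Function<String, CompletableFuture<Void>> sendShardRequest) {
    List<CompletableFuture<Void>> nodes = new ArrayList<>();
    for (List<String> shards : shardsByNode.values()) {
      NodeState state = new NodeState(shards, sendShardRequest);
      // Start at most maxPerNode requests now; completions pull the rest.
      for (int i = 0; i < maxPerNode; i++) {
        state.sendNext();
      }
      nodes.add(state.done);
    }
    return CompletableFuture.allOf(nodes.toArray(new CompletableFuture<?>[0]));
  }

  /** Pending shards and completion tracking for a single node. */
  private static final class NodeState {
    private final Queue<String> pending;
    private final Function<String, CompletableFuture<Void>> send;
    private final CompletableFuture<Void> done = new CompletableFuture<>();
    private int remaining;

    NodeState(List<String> shards, Function<String, CompletableFuture<Void>> send) {
      this.pending = new ArrayDeque<>(shards);
      this.send = send;
      this.remaining = shards.size();
      if (remaining == 0) {
        done.complete(null);
      }
    }

    void sendNext() {
      String shard;
      synchronized (this) {
        shard = pending.poll();
      }
      if (shard == null) {
        return; // nothing left to dispatch for this node
      }
      // A failed shard still frees its slot; error reporting is left out of this sketch.
      send.apply(shard).whenComplete((result, error) -> {
        boolean last;
        synchronized (this) {
          last = --remaining == 0;
        }
        if (last) {
          done.complete(null);
        } else {
          sendNext();
        }
      });
    }
  }
}
```

The sketch only bounds how many shard requests are in flight per node; it does not throttle IO within a single backup, so it could be combined with the IO-level option rather than replace it.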
