Re: Significant Backup/Restore Performance Degradation for Large Collections

Hakan Özler Tue, 06 Aug 2024 07:48:22 -0700

Hey Pierre,

> See EnvUtils class, which is now the recommended way to get system
> properties or environment variables in Solr code.


That's very useful!

> For some reason in your case, it seems the "cached" thread pool does not
> grow as expected. If you have 5 replicas or more queued to be backed up on
> a node, this node *should* have 5 threads running on this.


Exactly! I had the same expectation for this behavior.
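
For reference, one general java.util.concurrent behavior that can make a
pool stay smaller than its maximum: a ThreadPoolExecutor only adds threads
beyond its core size when the work queue refuses a task. The sketch below
shows that with plain JDK classes. It is not Solr's executor code and not a
confirmed diagnosis of this issue; the class name, method name, and pool
sizes are made up for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Solr.
public class PoolGrowthDemo {

  // Submits `tasks` blocking tasks and reports how many threads the pool created.
  static int poolSizeAfterSubmitting(
      BlockingQueue<Runnable> queue, int core, int max, int tasks) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, queue);
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try {
            release.await(); // keep the worker busy, like a long-running backup
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      return pool.getPoolSize();
    } finally {
      release.countDown(); // let all tasks finish
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Unbounded queue: tasks are queued, the pool stays at its core size.
    System.out.println("unbounded queue: "
        + poolSizeAfterSubmitting(new LinkedBlockingQueue<>(), 1, 5, 5));
    // Direct handoff: every task needs a thread, the pool grows to max.
    System.out.println("direct handoff:  "
        + poolSizeAfterSubmitting(new SynchronousQueue<>(), 0, 5, 5));
  }
}
```

With the unbounded queue the pool stays at 1 thread for 5 queued tasks; with
the direct handoff it grows to 5. Whether something similar applies to the
9.4+ backup executor would need to be checked against the actual Solr code.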

> Just to be sure, you trigger an *async* backup, correct?


That's right, tasks are created with an async id.


> The behavior before 9.4 was to have up to 50 threads per node doing
> backup/restore. We figured out this was saturating disks, and the overall
> time taken to do a snapshot for a big collection was larger than with some
> sort of throttling.


I see where the problem originates. If we had multiple collections with
more than 5 shards per node, we would have encountered the same problem.
Thank you for the clarification!

> Could you give some numbers on your cluster?
> When you submit the backup command, how many collections/shards/replicas do
> you have? And how many per node?


We have a cluster with 8 Solr nodes. We're doing backup operations on a
collection with 40 shards distributed equally across the nodes with rf=2.
This cluster is dedicated to this particular collection.
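
To put those numbers in per-node terms, here is a rough back-of-the-envelope
sketch. It assumes cores are spread perfectly evenly across nodes and that a
node works through its queue in "waves" of at most N concurrent backups. Both
are simplifications, and the class and method names are made up. It shows two
scenarios because the per-node load depends on whether every replica or only
one replica per shard is backed up, which is treated as an open parameter here.

```java
// Hypothetical helper, not part of Solr.
public class BackupLoadEstimate {

  // Cores to back up on each node, assuming an even spread (rounded up).
  static int coresPerNode(int shards, int copiesPerShard, int nodes) {
    int total = shards * copiesPerShard;
    return (total + nodes - 1) / nodes;
  }

  // Number of sequential waves needed when at most `threads` cores run at once.
  static int waves(int coresOnNode, int threads) {
    return (coresOnNode + threads - 1) / threads;
  }

  public static void main(String[] args) {
    int shards = 40, nodes = 8;

    int allReplicas = coresPerNode(shards, 2, nodes); // rf=2, every replica counted
    int onePerShard = coresPerNode(shards, 1, nodes); // one replica per shard

    System.out.println("all replicas:  " + allReplicas + " cores/node, "
        + waves(allReplicas, 5) + " wave(s) with 5 threads");
    System.out.println("one per shard: " + onePerShard + " cores/node, "
        + waves(onePerShard, 5) + " wave(s) with 5 threads");
  }
}
```

Under these assumptions, each node holds 10 cores (2 waves with 5 threads) if
every replica counts, or 5 cores (1 wave) if only one replica per shard is
backed up.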



On Tue, 6 Aug 2024 at 13:22, Pierre Salagnac <pierre.salag...@gmail.com>
wrote:

> Hi Hakan,
>
> > I was also wondering if the max thread size, which is currently 5 in
> > 9.4+, could be configurable
>
> Yes, it makes sense to make this configurable. See EnvUtils class, which
> is now the recommended way to get system properties or environment
> variables in Solr code.
>
> The behavior before 9.4 was to have up to 50 threads per node doing
> backup/restore. We figured out this was saturating disks, and the overall
> time taken to do a snapshot for a big collection was larger than with some
> sort of throttling.
> With 9.4+, each node is supposed to have 5 threads doing
> backups/restores/splits (the core admin operations that are IO intensive).
>
> For some reason in your case, it seems the "cached" thread pool does not
> grow as expected. If you have 5 replicas or more queued to be backed up on
> a node, this node *should* have 5 threads running on this.
> Just to be sure, you trigger an *async* backup, correct?
>
> Could you give some numbers on your cluster?
> When you submit the backup command, how many collections/shards/replicas do
> you have? And how many per node?
>
