Hey Pierre,

> See EnvUtils class, which is now the recommended way to get system
> properties or environment variables in Solr code.

That's very useful!

> For some reason in your case, it seems the "cached" thread pool does not
> grow as expected. If you have 5 replicas or more queued to be backed up on
> a node, this node *should* have 5 threads running on this.

Exactly! I had the same expectation for this behavior.

> Just to be sure, you trigger an *async* backup, correct?

That's right, tasks are created with an async id.

> The behavior before 9.4 was to have up to 50 threads per node doing
> backup/restore. We figured out this was saturating disks, and the overall
> time taken to do a snapshot for a big collection was larger than with some
> sort of throttling.

I see where the problem originates. If we had multiple collections with more than 5 shards per node, we would have encountered the same problem. Thank you for the clarification!

> Could you give some numbers on your cluster?
> When you submit the backup command, how many collections/shards/replicas do
> you have? And how many per node?

We have a cluster with 8 Solr nodes. We're doing backup operations on a collection with 40 shards distributed equally across the nodes with rf=2. The cluster is dedicated to this particular collection.

On Tue, 6 Aug 2024 at 13:22, Pierre Salagnac <pierre.salag...@gmail.com> wrote:

> Hi Hakan,
>
> > I was also wondering if the max thread size, which is currently 5 in
> > 9.4+, could be configurable
>
> Yes, this makes sense to have this configuration. See EnvUtils class, which
> is now the recommended way to get system properties or environment
> variables in Solr code.
>
> The behavior before 9.4 was to have up to 50 threads per node doing
> backup/restore. We figured out this was saturating disks, and the overall
> time taken to do a snapshot for a big collection was larger than with some
> sort of throttling.
> With 9.4+, it is supposed to have 5 threads per node doing
> backups/restores/splits (the core admin operations that are IO intensive).
>
> For some reason in your case, it seems the "cached" thread pool does not
> grow as expected. If you have 5 replicas or more queued to be backed up on
> a node, this node *should* have 5 threads running on this.
> Just to be sure, you trigger an *async* backup, correct?
>
> Could you give some numbers on your cluster?
> When you submit the backup command, how many collections/shards/replicas do
> you have? And how many per node?
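
For reference, here is one generic way a bounded java.util.concurrent pool can fail to grow to its maximum. This is only a sketch of standard ThreadPoolExecutor behavior, not Solr's actual code and not a confirmed diagnosis of the issue discussed above. The class name PoolGrowthDemo, its helper methods, and the property name demo.backup.maxThreads are all hypothetical; in Solr code the value would presumably be read through EnvUtils rather than Integer.getInteger.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Solr.
class PoolGrowthDemo {

    // core = 0, max = N, unbounded queue: once a thread exists, new tasks are
    // queued instead of starting more threads, so the pool stays at 1 thread.
    static ThreadPoolExecutor queueFirstPool(int maxThreads) {
        return new ThreadPoolExecutor(
            0, maxThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // core = max = N: the first N tasks each start a thread, later tasks queue.
    // Idle threads are still allowed to time out, like in a cached pool.
    static ThreadPoolExecutor coreSizedPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    // Submits `tasks` blocking tasks, records how many threads the pool
    // started, then releases the tasks and shuts the pool down.
    static int threadsStarted(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a long-running backup task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int started = pool.getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return started;
    }

    public static void main(String[] args) {
        // Hypothetical property name; defaults to 5 when unset.
        int max = Integer.getInteger("demo.backup.maxThreads", 5);
        System.out.println("queue-first pool: " + threadsStarted(queueFirstPool(max), 10) + " thread(s)");
        System.out.println("core-sized pool:  " + threadsStarted(coreSizedPool(max), 10) + " thread(s)");
    }
}
```

With the default of 5 and 10 queued tasks, the first pool reports 1 thread and the second reports 5. Running with -Ddemo.backup.maxThreads=8 changes the limit for both without a code change.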