Just a heads up: with the patch mentioned above, we managed to back up 3TB
of data in 50 minutes with `solr.maxExpensiveTaskThreads=5` [1].

I would like to contribute to Solr; however, I'm unsure of the steps I
should take if no one is available to take on this patch.

1. https://imgur.com/a/AAd0czU
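
For readers who have not opened the patch, here is a minimal sketch of how a
property such as `solr.maxExpensiveTaskThreads` could be read and turned into
a thread pool. It is not the patch itself: the class name, the constants, and
the choice of equal core and max sizes are assumptions made for illustration,
and only plain `java.util.concurrent` classes are used.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Solr: illustrates a configurable pool size.
class ExpensiveTaskPoolSketch {
  // Property name taken from the message; the default of 5 matches the value cited for 9.4+.
  static final String PROP = "solr.maxExpensiveTaskThreads";
  static final int DEFAULT_THREADS = 5;

  static ThreadPoolExecutor newExpensiveTaskPool() {
    // Integer.getInteger reads a JVM system property and falls back to the
    // default when the property is absent or not a valid integer.
    int threads = Integer.getInteger(PROP, DEFAULT_THREADS);

    // The constructor throws IllegalArgumentException when the maximum pool
    // size is <= 0, so a value such as 0 or -1 is rejected here.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(
            threads, threads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    // Assumption: let idle threads expire so the pool shrinks when no
    // expensive tasks are running.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = newExpensiveTaskPool();
    System.out.println("max threads: " + pool.getMaximumPoolSize());
    pool.shutdown();
  }
}
```

With this sketch the value would be passed as a JVM system property, for
example `-Dsolr.maxExpensiveTaskThreads=8`. Core and max are set equal so
that, with an unbounded queue, the pool can actually grow to the configured
size.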

On Tue, 30 Jul 2024 at 16:53, Hakan Özler <ozler.ha...@gmail.com> wrote:

> Hi,
>
> We're experiencing performance issues with backup and restore in recent
> Solr versions (9.5.0 and 9.6.1). In 9.2.1, we could take a backup of 10TB
> of data in just an hour and a half. As of 9.5.0, taking a backup of the
> same collection takes 7 hours! We're unable to make use of disaster
> recovery effectively and reliably in Solr. Therefore, Solr 9.2.1 still
> remains the most effective choice among the 9.x versions for our use.
>
> It seems that this is the ticket causing this issue:
> 1. https://issues.apache.org/jira/browse/SOLR-16879
>
> Interestingly, we never encountered the throttling problem on 9.2.1 that
> this change was introduced to solve. From a devops perspective, we have
> some details and metrics on these tasks that show the difference between
> the two versions. The overall IOPS was 150MB on 9.6.1, while it was 500MB
> on 9.2.1 during the same backup and restore tasks. In the first image [1],
> the peak on the left represents a backup; in contrast, in the second
> image [2], the same backup operation in 9.5.0 uses fewer resources. As you
> may spot, 9.5.0 seems to be using a fifth of the resources of 9.2.1.
>
> Apart from that, while monitoring the operations, I had some difficulty
> interpreting the following metrics:
>
> "ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.core": 0,
> "ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.max": 5,
> "ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.size": 1,
> "ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.running": 1,
>
> The pool size was 1 although the max pool size is 5. Shouldn't the pool
> size be 5 instead? If I'm not mistaken, there is always only one task
> running on a single node, not 5 concurrently.
>
> I was also wondering if the maximum thread count, which is currently 5 in
> 9.4+, could be made configurable with either an environment variable or a
> Java parameter? The part that needs to be changed seems to be in
> CoreAdminHandler.java on line 446 [3]. I've made a small adjustment to add
> a Solr parameter called `solr.maxExpensiveTaskThreads` for those who want
> to set a different thread count for expensive tasks. The number given in
> this parameter must meet the criteria of ThreadPoolExecutor; otherwise an
> IllegalArgumentException will occur. I've generated a patch [4] and I
> would love to see if one of the Solr committers would take this on and
> apply it for the upcoming release. Do you think our observation is
> accurate, and would this patch be feasible to implement?
>
> Thanks!
> Hakan
>
> 1. https://i.imgur.com/aSrs8OM.png
> 2. https://i.imgur.com/Yr6hBM8.png
> 3. https://github.com/apache/solr/commit/82a847f0f9af18d6eceee18743d636db7a879f3e#diff-5bc3d44ca8b189f44fe9e6f75af8a5510463bdba79ff72a7d0ed190973a32533L446
> 4. https://gist.github.com/ozlerhakan/e4d11bddae6a2f89d2c212c220f4c965
>
>
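
On the pool-size question in the quoted message (pool.core 0, pool.max 5,
pool.size 1): one possible explanation, assuming the executor is a standard
`java.util.concurrent.ThreadPoolExecutor` with an unbounded queue (an
assumption suggested by the metrics, not verified against the Solr source),
is that such a pool prefers queuing over creating threads once the core size
is reached. With a core size of 0 it therefore never grows beyond one worker.
The sketch below reproduces that behaviour with plain JDK classes; the class
and method names are made up for the demo.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not Solr code.
class PoolSizeDemo {

  // Submits `tasks` blocking tasks and returns the largest number of worker
  // threads the pool ever created for them.
  static int largestPoolSize(int core, int max, int tasks) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < tasks; i++) {
      pool.execute(
          () -> {
            try {
              // Hold the worker so later tasks must either queue or get a new thread.
              release.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            }
          });
    }
    int largest = pool.getLargestPoolSize();
    release.countDown(); // let all tasks finish
    pool.shutdown();
    return largest;
  }

  public static void main(String[] args) {
    // Unbounded queue + core 0: extra tasks are queued, only one worker exists.
    System.out.println("core=0, max=5 -> " + largestPoolSize(0, 5, 10)); // prints 1
    // Core equal to max: the first five tasks each start their own worker.
    System.out.println("core=5, max=5 -> " + largestPoolSize(5, 5, 10)); // prints 5
  }
}
```

If the Solr executor is configured this way, the `max` of 5 would never be
reached and expensive tasks would effectively run one at a time per node,
which would be consistent with the `pool.size` and `running` values of 1
reported above.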
