fabriziofortino opened a new pull request, #2925:
URL: https://github.com/apache/jackrabbit-oak/pull/2925
### Problem
Queries against Elastic could hang and time out after `queryTimeoutMs`
(default 60s), with the consumer parked in
`ElasticResultRowAsyncIterator.hasNext()` on `queue.poll()`.
OAK-12174 (#2828) moved async response handling off the ES client I/O
dispatcher thread by switching `whenComplete` → `whenCompleteAsync`, which
dispatches onto `ForkJoinPool.commonPool()`. When the caller also drives query
iteration from the common pool, all common-pool workers block in `queue.poll()`
while the producer callback (`handleResponse` → `queue.offer`) is queued behind
them on the same pool. The callback cannot run until a worker frees up, so the
queue stays empty and the consumer times out. The failure is load-correlated
and unrelated to actual ES latency.
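
The starvation can be reproduced in isolation. The sketch below is not Oak code: `StarvationSketch` and `query` are hypothetical names, and a single-thread pool stands in for a saturated `ForkJoinPool.commonPool()` so the effect is deterministic. The consumer blocks in `poll()` on the only worker while the `whenCompleteAsync` callback waits behind it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StarvationSketch {

    /** Returns the row the consumer received, or null if poll() timed out. */
    static String query(ExecutorService callerPool, Executor responsePool, long timeoutMs)
            throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        CompletableFuture<String> response = new CompletableFuture<>();
        // Producer: the callback that enqueues the result runs on responsePool.
        response.whenCompleteAsync((row, error) -> queue.offer(row), responsePool);

        CountDownLatch consumerStarted = new CountDownLatch(1);
        // Consumer: occupies a caller-pool worker while blocked in poll().
        Future<String> consumer = callerPool.submit(() -> {
            consumerStarted.countDown();
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        });

        consumerStarted.await();
        response.complete("row-1"); // this thread plays the I/O dispatcher
        return consumer.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newFixedThreadPool(1);
        ExecutorService dedicated = Executors.newFixedThreadPool(1);
        try {
            // Same pool for consumer and callback: the callback is stuck in the pool's queue.
            System.out.println("shared pool:    " + query(shared, shared, 500));
            // Disjoint pools: the callback runs immediately and the consumer gets its row.
            System.out.println("dedicated pool: " + query(shared, dedicated, 500));
        } finally {
            shared.shutdown();
            dedicated.shutdown();
        }
    }
}
```

With the shared pool the consumer times out and `query` returns `null`; with a disjoint response pool it returns `row-1`.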
### Fix
Route async response processing onto a dedicated, ElasticConnection-owned
executor that is disjoint from any caller pool. This preserves OAK-12174's goal
(work stays off the I/O dispatcher thread) while removing the common-pool
contention.
- `ElasticConnection` owns a lazily created, bounded thread pool (daemon
threads), exposed via `getResponseExecutor()` and shut down on `close()`.
- Applied to all common-pool-bound async sites: both `whenCompleteAsync`
calls in `ElasticResultRowAsyncIterator`, and the `thenApplyAsync` / recursive
`thenComposeAsync` calls in the secure and statistical facet providers.
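
As a rough illustration of the ownership model described above, here is a minimal sketch. It assumes a fixed-size pool and a hypothetical `ConnectionSketch` class; the real `ElasticConnection` may differ in naming, sizing, and shutdown details. The `main` method shows the call-site change: passing the executor explicitly to a `*Async` method instead of relying on the common-pool default.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a connection that owns its response executor. */
public class ConnectionSketch implements AutoCloseable {
    private final int poolSize;
    private ExecutorService responseExecutor; // created on first use

    public ConnectionSketch(int poolSize) {
        this.poolSize = poolSize;
    }

    /** Lazily creates the bounded pool; later calls return the same instance. */
    public synchronized Executor getResponseExecutor() {
        if (responseExecutor == null) {
            AtomicInteger counter = new AtomicInteger();
            ThreadFactory daemonFactory = r -> {
                Thread t = new Thread(r, "response-" + counter.incrementAndGet());
                t.setDaemon(true); // daemon threads never keep the JVM alive
                return t;
            };
            responseExecutor = Executors.newFixedThreadPool(poolSize, daemonFactory);
        }
        return responseExecutor;
    }

    /** Stops accepting new tasks; already submitted callbacks still run. */
    @Override
    public synchronized void close() {
        if (responseExecutor != null) {
            responseExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ConnectionSketch connection = new ConnectionSketch(4)) {
            CompletableFuture<String> response = CompletableFuture.completedFuture("42 hits");
            // Explicit executor: the callback runs on the connection's pool, not the common pool.
            String threadName = response
                    .thenApplyAsync(r -> Thread.currentThread().getName(),
                            connection.getResponseExecutor())
                    .get();
            System.out.println("callback ran on: " + threadName);
        }
    }
}
```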
### Feature toggle
`FT_OAK-12234`, enabled by default. When disabled, `getResponseExecutor()`
returns `ForkJoinPool.commonPool()`, exactly restoring pre-fix behavior.
### Configuration
The pool size is set via the system property
`oak.elastic.searchResponseThreadPoolSize` (default `max(4, cores * 2)`; these
threads are I/O/enqueue-bound, not CPU-bound).
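
The toggle and sizing rules could be expressed roughly as follows. This is a sketch under assumptions: `ResponseExecutorConfigSketch` and its methods are hypothetical, the toggle is modelled as a plain boolean rather than Oak's feature-toggle mechanism, and how the PR treats invalid or non-positive property values is not stated here, so no validation is shown.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

public class ResponseExecutorConfigSketch {
    static final String POOL_SIZE_PROP = "oak.elastic.searchResponseThreadPoolSize";

    /** Default from the description: max(4, cores * 2). */
    static int defaultPoolSize(int cores) {
        return Math.max(4, cores * 2);
    }

    /** Reads the system property; falls back to the default if unset or not an integer. */
    static int resolvePoolSize() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Integer.getInteger(POOL_SIZE_PROP, defaultPoolSize(cores));
    }

    /** Toggle off: fall back to the common pool, i.e. the pre-fix behavior. */
    static Executor select(boolean toggleEnabled, Executor dedicated) {
        return toggleEnabled ? dedicated : ForkJoinPool.commonPool();
    }

    public static void main(String[] args) {
        System.out.println("resolved pool size: " + resolvePoolSize());
        Executor inline = Runnable::run; // placeholder for the dedicated pool
        System.out.println("toggle off uses common pool: "
                + (select(false, inline) == ForkJoinPool.commonPool()));
    }
}
```

The size can then be overridden at JVM startup, for example with `-Doak.elastic.searchResponseThreadPoolSize=16`.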