Andrea Cosentino created CAMEL-23120:
----------------------------------------
Summary: camel-docling - Implement batchSize sub-batch
partitioning in batch processing
Key: CAMEL-23120
URL: https://issues.apache.org/jira/browse/CAMEL-23120
Project: Camel
Issue Type: Bug
Reporter: Andrea Cosentino
Assignee: Andrea Cosentino
Fix For: 4.18.1, 4.19.0
The batchSize configuration parameter (default 10, defined in
DoclingConfiguration) is declared with @UriParam and read from exchange headers
in both processBatchConversion() and processBatchStructuredData(), but the
value is never actually applied in the batch processing logic.
Both convertDocumentsBatch() and convertStructuredDataBatch() accept batchSize
as a method parameter but submit all documents to the thread pool executor at
once, regardless of the configured value. This makes batchSize a no-op
parameter, which is misleading for users who configure it expecting it to
control processing granularity.
When processing large document sets (e.g., thousands of files), all documents
are submitted as CompletableFuture instances simultaneously, which can cause
excessive memory consumption and provides no back-pressure.
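
As a rough illustration of what sub-batch partitioning could look like, here is
a minimal, self-contained sketch. It is not the actual camel-docling code or the
committed fix: the class SubBatchSketch, the helpers partition() and
processInSubBatches(), and the converter function are hypothetical names
introduced for this example only. The sketch assumes that waiting for each
sub-batch to finish before submitting the next one is an acceptable way to
bound the number of outstanding CompletableFuture instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch, not the camel-docling implementation.
public class SubBatchSketch {

    // Split the input into consecutive sub-lists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    // Submit one sub-batch at a time and wait for it to complete before
    // submitting the next, so at most batchSize futures are outstanding.
    static <T, R> List<R> processInSubBatches(List<T> items, int batchSize,
                                              ExecutorService executor,
                                              Function<T, R> converter) {
        List<R> results = new ArrayList<>(items.size());
        for (List<T> batch : partition(items, batchSize)) {
            List<CompletableFuture<R>> futures = new ArrayList<>(batch.size());
            for (T item : batch) {
                futures.add(CompletableFuture.supplyAsync(() -> converter.apply(item), executor));
            }
            // Block until the whole sub-batch is done; a failed conversion
            // surfaces here as a CompletionException.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            for (CompletableFuture<R> future : futures) {
                results.add(future.join());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<String> docs = List.of("a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf");
            // The lambda stands in for a real document conversion call.
            List<String> converted = processInSubBatches(docs, 2, executor, d -> d + ".md");
            System.out.println(converted);
        } finally {
            executor.shutdown();
        }
    }
}
```

With batchSize = 2 and five documents, the sketch runs three sub-batches of
sizes 2, 2 and 1, and results keep the input order. Blocking per sub-batch is
the simplest form of back-pressure; a semaphore or a bounded queue would be an
alternative if idle threads between sub-batches are a concern.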