Andrea Cosentino created CAMEL-23120:
----------------------------------------

             Summary: camel-docling - Implement batchSize sub-batch partitioning in batch processing
                 Key: CAMEL-23120
                 URL: https://issues.apache.org/jira/browse/CAMEL-23120
             Project: Camel
          Issue Type: Bug
            Reporter: Andrea Cosentino
            Assignee: Andrea Cosentino
             Fix For: 4.18.1, 4.19.0


The batchSize configuration parameter (default 10, defined in 
DoclingConfiguration) is declared with @UriParam and read from exchange headers 
in both processBatchConversion() and processBatchStructuredData(), but the 
value is never actually applied in the batch processing logic.

Both convertDocumentsBatch() and convertStructuredDataBatch() accept batchSize 
as a method parameter but submit all documents to the thread pool executor at 
once, regardless of the configured value. This makes batchSize a no-op 
parameter, which is misleading for users who configure it expecting it to 
control processing granularity.
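A minimal sketch of the requested sub-batch partitioning, in Java. It assumes documents are plain strings and conversion is a Function<String, String>. The class SubBatchSketch and the methods partition() and convertInSubBatches() are hypothetical stand-ins, not the actual camel-docling code, and the real convertDocumentsBatch() signature may differ. The sketch submits each sub-batch of at most batchSize documents to the executor and waits for it to finish before submitting the next one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class SubBatchSketch {

    // Split the input into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Hypothetical stand-in for convertDocumentsBatch(): submits one sub-batch at a time
    // and joins it before submitting the next, so at most batchSize futures are
    // outstanding at any moment. Results are returned in input order.
    static List<String> convertInSubBatches(List<String> documents, int batchSize,
                                            ExecutorService executor,
                                            Function<String, String> converter) {
        List<String> results = new ArrayList<>(documents.size());
        for (List<String> batch : partition(documents, batchSize)) {
            List<CompletableFuture<String>> futures = new ArrayList<>(batch.size());
            for (String doc : batch) {
                futures.add(CompletableFuture.supplyAsync(() -> converter.apply(doc), executor));
            }
            // Wait for the whole sub-batch; this is where back-pressure is applied.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            for (CompletableFuture<String> f : futures) {
                results.add(f.join());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<String> docs = new ArrayList<>();
            for (int i = 0; i < 25; i++) {
                docs.add("doc-" + i + ".pdf");
            }
            List<String> out = convertInSubBatches(docs, 10, executor, d -> "converted:" + d);
            System.out.println(partition(docs, 10).size() + " sub-batches, " + out.size() + " results");
        } finally {
            executor.shutdown();
        }
    }
}
```

With 25 documents and batchSize 10, main() prints "3 sub-batches, 25 results". Joining per sub-batch gives a hard bound on outstanding futures. The cost is that the slowest document in a sub-batch delays the start of the next one.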

When processing large document sets (e.g., thousands of files), all documents 
are submitted as CompletableFuture instances simultaneously, which can cause 
excessive memory consumption and provides no back-pressure.
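To make the back-pressure point concrete, the hypothetical harness below counts how many futures are outstanding at once. PendingFuturesDemo and peakOutstanding() are illustrative names, not Camel code. Each task waits on a CountDownLatch until its whole group has been submitted, which makes the measured peak deterministic. A group size equal to the document count models the current submit-everything behaviour, and a group size of 10 models the default batchSize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PendingFuturesDemo {

    // Submits documentCount dummy tasks in groups of groupSize and returns the highest
    // number of futures that were outstanding (submitted but not finished) at one time.
    static int peakOutstanding(int documentCount, int groupSize, ExecutorService executor) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive: " + groupSize);
        }
        AtomicInteger outstanding = new AtomicInteger();
        int peak = 0;
        for (int start = 0; start < documentCount; start += groupSize) {
            int end = Math.min(start + groupSize, documentCount);
            // Hold every task in this group until the whole group has been submitted.
            CountDownLatch release = new CountDownLatch(1);
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = start; i < end; i++) {
                peak = Math.max(peak, outstanding.incrementAndGet());
                futures.add(CompletableFuture.runAsync(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    outstanding.decrementAndGet();
                }, executor));
            }
            release.countDown();
            // Join before the next group, as the sub-batched version would.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        }
        return peak;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println("all at once: " + peakOutstanding(1000, 1000, executor));
            System.out.println("batchSize=10: " + peakOutstanding(1000, 10, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

It prints "all at once: 1000" followed by "batchSize=10: 10". Without partitioning, every document's future, and whatever it captures, is live at the same time. Sub-batching caps that number at batchSize.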



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
