This is an automated email from the ASF dual-hosted git repository.

acosentino pushed a commit to branch docling-experiment
in repository https://gitbox.apache.org/repos/asf/camel.git

commit 79dcb5e7399b960090f3b86f9126cc1a08379d27
Author: Andrea Cosentino <[email protected]>
AuthorDate: Fri Sep 19 11:55:15 2025 +0200

    CAMEL-22433 - Create a Camel Docling component

    Signed-off-by: Andrea Cosentino <[email protected]>
---
 README.md                                          | 109 ------
 bom/camel-bom/pom.xml                              |   5 +
 catalog/camel-allcomponents/pom.xml                |   5 +
 components/camel-ai/{ => camel-docling}/pom.xml    |  64 +--
 .../docling/DoclingComponentConfigurer.java        | 130 +++++++
 .../docling/DoclingConfigurationConfigurer.java    | 102 +++++
 .../docling/DoclingEndpointConfigurer.java         | 126 ++++++
 .../docling/DoclingEndpointUriFactory.java         |  84 ++++
 .../apache/camel/component/docling/docling.json    |  69 ++++
 .../services/org/apache/camel/component.properties |   7 +
 .../services/org/apache/camel/component/docling    |   2 +
 .../org/apache/camel/configurer/docling-component  |   2 +
 .../org/apache/camel/configurer/docling-endpoint   |   2 +
 ...he.camel.component.docling.DoclingConfiguration |   2 +
 .../org/apache/camel/urifactory/docling-endpoint   |   2 +
 .../src/main/docs/docling-component.adoc           | 216 ++++++++++
 .../camel/component/docling/DoclingComponent.java  |  64 +++
 .../component/docling/DoclingConfiguration.java    | 161 ++++++++
 .../camel/component/docling/DoclingEndpoint.java   |  73 ++++
 .../camel/component/docling/DoclingHeaders.java    |  50 +++
 .../camel/component/docling/DoclingOperations.java |  46 +++
 .../camel/component/docling/DoclingProducer.java   | 433 +++++++++++++++++++++
 .../component/docling/DoclingComponentTest.java    |  58 +++
 .../component/docling/DoclingProducerTest.java     | 117 ++++++
 .../src/test/resources/log4j2.properties           |  43 ++
 components/camel-ai/pom.xml                        |   1 +
 .../modules/ROOT/examples/json/docling.json        |   1 +
 docs/components/modules/ROOT/nav.adoc              |   1 +
 .../modules/ROOT/pages/docling-component.adoc      |   1 +
 parent/pom.xml                                     |   5 +
 30 files changed, 1843 insertions(+), 138 deletions(-)

diff --git a/README.md b/README.md
deleted file mode 100644
index 4acfaf0d43a..00000000000
--- a/README.md
+++ /dev/null
@@ -1,109 +0,0 @@
-# Apache Camel
-
-[](https://maven-badges.herokuapp.com/maven-central/org.apache.camel/apache-camel)
-[](https://www.javadoc.io/doc/org.apache.camel/camel-api)
-[](http://stackoverflow.com/questions/tagged/apache-camel)
-[](https://camel.zulipchat.com/)
-[](https://twitter.com/ApacheCamel)
-
-
-[Apache Camel](https://camel.apache.org/) is an Open Source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.
-
-### Introduction
-
-Camel empowers you to define routing and mediation rules in a variety of domain-specific languages (DSL, such as Java, XML, Groovy and YAML). This means you get smart completion of routing rules in your IDE, whether in a Java or XML editor.
-
-Apache Camel uses URIs to enable easier integration with all kinds of
-transport or messaging model including HTTP, ActiveMQ, JMS, JBI, SCA, MINA
-or CXF together with working with pluggable Data Format options.
-Apache Camel is a small library that has minimal dependencies for easy embedding
-in any Java application. Apache Camel lets you work with the same API regardless of the
-transport type, making it possible to interact with all the components provided out-of-the-box,
-with a good understanding of the API.
-
-Apache Camel has powerful Bean Binding and integrated seamlessly with
-popular frameworks such as Spring, Quarkus, and CDI.
-
-Apache Camel has extensive testing support allowing you to easily
-unit test your routes.
-
-## Components
-
-Apache Camel comes alongside several artifacts with components, data formats, languages, and kinds.
-The up-to-date list is available online at the Camel website:
-
-* Components: <https://camel.apache.org/components/latest/>
-* Data Formats: <https://camel.apache.org/components/latest/dataformats/>
-* Languages: <https://camel.apache.org/components/latest/languages/>
-* Miscellaneous: <https://camel.apache.org/components/latest/#_miscellaneous_components>
-
-## Examples
-
-Apache Camel comes with many examples.
-The up to date list is available online at GitHub:
-
-* Examples: <https://github.com/apache/camel-examples/tree/main#welcome-to-the-apache-camel-examples>
-
-## Getting Started
-
-To help you get started, try the following links:
-
-**Getting Started**
-
-<https://camel.apache.org/getting-started.html>
-
-The beginner examples are another powerful alternative pathway for getting started with Apache Camel.
-
-* Examples: <https://github.com/apache/camel-examples/tree/main#welcome-to-the-apache-camel-examples>
-
-**Building**
-
-<https://camel.apache.org/camel-core/contributing/>
-
-**Contributions**
-
-We welcome all kinds of contributions, the details of which are specified here:
-
-<https://github.com/apache/camel/blob/main/CONTRIBUTING.md>
-
-
-Please refer to the website for details of finding the issue tracker,
-email lists, GitHub, chat
-
-Website: <https://camel.apache.org/>
-
-GitHub (source): <https://github.com/apache/camel>
-
-Issue tracker: <https://issues.apache.org/jira/projects/CAMEL>
-
-Mailing-list: <https://camel.apache.org/community/mailing-list/>
-
-Chat: <https://camel.zulipchat.com/>
-
-StackOverflow: <https://stackoverflow.com/questions/tagged/apache-camel>
-
-Twitter: <https://twitter.com/ApacheCamel>
-
-
-**Support**
-
-For additional help, support, we recommend referencing this page first:
-
-<https://camel.apache.org/community/support/>
-
-**Getting Help**
-
-If you get stuck somewhere, please feel free to reach out to us on either StackOverflow, Chat, or the email mailing list.
-
-Please help us make Apache Camel better — we appreciate any feedback you may have.
-
-Enjoy!
-
-------------------
-The Camel riders!
-
-# Licensing
-
-The terms for software licensing are detailed in the `LICENSE.txt` file,
-located in the working directory.
-
diff --git a/bom/camel-bom/pom.xml b/bom/camel-bom/pom.xml
index abd5bd20822..b3075d91365 100644
--- a/bom/camel-bom/pom.xml
+++ b/bom/camel-bom/pom.xml
@@ -722,6 +722,11 @@
         <artifactId>camel-docker</artifactId>
         <version>4.15.0-SNAPSHOT</version>
       </dependency>
+      <dependency>
+        <groupId>org.apache.camel</groupId>
+        <artifactId>camel-docling</artifactId>
+        <version>4.15.0-SNAPSHOT</version>
+      </dependency>
       <dependency>
         <groupId>org.apache.camel</groupId>
         <artifactId>camel-drill</artifactId>
diff --git a/catalog/camel-allcomponents/pom.xml b/catalog/camel-allcomponents/pom.xml
index c73a17c6a70..c8ecbb89650 100644
--- a/catalog/camel-allcomponents/pom.xml
+++ b/catalog/camel-allcomponents/pom.xml
@@ -611,6 +611,11 @@
             <artifactId>camel-docker</artifactId>
             <version>${project.version}</version>
         </dependency>
+        <dependency>
+            <groupId>org.apache.camel</groupId>
+            <artifactId>camel-docling</artifactId>
+            <version>${project.version}</version>
+        </dependency>
         <dependency>
             <groupId>org.apache.camel</groupId>
             <artifactId>camel-drill</artifactId>
diff --git a/components/camel-ai/pom.xml b/components/camel-ai/camel-docling/pom.xml
similarity index 52%
copy from components/camel-ai/pom.xml
copy to components/camel-ai/camel-docling/pom.xml
index d2268d1283d..90207a0192f 100644
--- a/components/camel-ai/pom.xml
+++ b/components/camel-ai/camel-docling/pom.xml
@@ -23,36 +23,42 @@
     <modelVersion>4.0.0</modelVersion>
 
     <parent>
+        <artifactId>camel-ai-parent</artifactId>
         <groupId>org.apache.camel</groupId>
-        <artifactId>components</artifactId>
         <version>4.15.0-SNAPSHOT</version>
     </parent>
 
-    <artifactId>camel-ai-parent</artifactId>
-    <packaging>pom</packaging>
-    <name>Camel :: AI :: Parent</name>
-    <description>Camel AI parent</description>
-
-    <modules>
-        <module>camel-chatscript</module>
-        <module>camel-djl</module>
-        <module>camel-kserve</module>
-        <module>camel-langchain4j-agent-api</module>
-        <module>camel-langchain4j-agent</module>
-        <module>camel-langchain4j-chat</module>
-        <module>camel-langchain4j-core</module>
-        <module>camel-langchain4j-embeddings</module>
-        <module>camel-langchain4j-embeddingstore</module>
-        <module>camel-langchain4j-tokenizer</module>
-        <module>camel-langchain4j-tools</module>
-        <module>camel-langchain4j-web-search</module>
-        <module>camel-milvus</module>
-        <module>camel-neo4j</module>
-        <module>camel-pinecone</module>
-        <module>camel-qdrant</module>
-        <module>camel-tensorflow-serving</module>
-        <module>camel-torchserve</module>
-        <module>camel-weaviate</module>
-    </modules>
-
-</project>
+    <artifactId>camel-docling</artifactId>
+    <packaging>jar</packaging>
+    <name>Camel :: AI :: Docling</name>
+    <description>Docling document processing component</description>
+
+    <dependencies>
+
+        <dependency>
+            <groupId>org.apache.camel</groupId>
+            <artifactId>camel-support</artifactId>
+        </dependency>
+
+        <!-- Docling Python integration - will use process execution -->
+        <dependency>
+            <groupId>com.fasterxml.jackson.core</groupId>
+            <artifactId>jackson-databind</artifactId>
+            <version>${jackson2-version}</version>
+        </dependency>
+
+        <!-- for testing -->
+        <dependency>
+            <groupId>org.apache.camel</groupId>
+            <artifactId>camel-test-junit5</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.assertj</groupId>
+            <artifactId>assertj-core</artifactId>
+            <scope>test</scope>
+        </dependency>
+
+    </dependencies>
+
+</project>
\ No newline at end of file
diff --git a/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingComponentConfigurer.java b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingComponentConfigurer.java
new file mode 100644
index 00000000000..20196108d6e
--- /dev/null
+++ b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingComponentConfigurer.java
@@ -0,0 +1,130 @@
+/* Generated by camel build tools - do NOT edit this file! */
+package org.apache.camel.component.docling;
+
+import javax.annotation.processing.Generated;
+import java.util.Map;
+
+import org.apache.camel.CamelContext;
+import org.apache.camel.spi.ExtendedPropertyConfigurerGetter;
+import org.apache.camel.spi.PropertyConfigurerGetter;
+import org.apache.camel.spi.ConfigurerStrategy;
+import org.apache.camel.spi.GeneratedPropertyConfigurer;
+import org.apache.camel.util.CaseInsensitiveMap;
+import org.apache.camel.support.component.PropertyConfigurerSupport;
+
+/**
+ * Generated by camel build tools - do NOT edit this file!
+ */
+@Generated("org.apache.camel.maven.packaging.EndpointSchemaGeneratorMojo")
+@SuppressWarnings("unchecked")
+public class DoclingComponentConfigurer extends PropertyConfigurerSupport implements GeneratedPropertyConfigurer, PropertyConfigurerGetter {
+
+    private org.apache.camel.component.docling.DoclingConfiguration getOrCreateConfiguration(DoclingComponent target) {
+        if (target.getConfiguration() == null) {
+            target.setConfiguration(new org.apache.camel.component.docling.DoclingConfiguration());
+        }
+        return target.getConfiguration();
+    }
+
+    @Override
+    public boolean configure(CamelContext camelContext, Object obj, String name, Object value, boolean ignoreCase) {
+        DoclingComponent target = (DoclingComponent) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "autowiredenabled":
+        case "autowiredEnabled": target.setAutowiredEnabled(property(camelContext, boolean.class, value)); return true;
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": target.setBridgeErrorHandler(property(camelContext, boolean.class, value)); return true;
+        case "configuration": target.setConfiguration(property(camelContext, org.apache.camel.component.docling.DoclingConfiguration.class, value)); return true;
+        case "contentinbody":
+        case "contentInBody": getOrCreateConfiguration(target).setContentInBody(property(camelContext, boolean.class, value)); return true;
+        case "doclingcommand":
+        case "doclingCommand": getOrCreateConfiguration(target).setDoclingCommand(property(camelContext, java.lang.String.class, value)); return true;
+        case "enableocr":
+        case "enableOCR": getOrCreateConfiguration(target).setEnableOCR(property(camelContext, boolean.class, value)); return true;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": getOrCreateConfiguration(target).setIncludeLayoutInfo(property(camelContext, boolean.class, value)); return true;
+        case "lazystartproducer":
+        case "lazyStartProducer": target.setLazyStartProducer(property(camelContext, boolean.class, value)); return true;
+        case "maxfilesize":
+        case "maxFileSize": getOrCreateConfiguration(target).setMaxFileSize(property(camelContext, long.class, value)); return true;
+        case "ocrlanguage":
+        case "ocrLanguage": getOrCreateConfiguration(target).setOcrLanguage(property(camelContext, java.lang.String.class, value)); return true;
+        case "operation": getOrCreateConfiguration(target).setOperation(property(camelContext, org.apache.camel.component.docling.DoclingOperations.class, value)); return true;
+        case "outputformat":
+        case "outputFormat": getOrCreateConfiguration(target).setOutputFormat(property(camelContext, java.lang.String.class, value)); return true;
+        case "processtimeout":
+        case "processTimeout": getOrCreateConfiguration(target).setProcessTimeout(property(camelContext, long.class, value)); return true;
+        case "workingdirectory":
+        case "workingDirectory": getOrCreateConfiguration(target).setWorkingDirectory(property(camelContext, java.lang.String.class, value)); return true;
+        default: return false;
+        }
+    }
+
+    @Override
+    public Class<?> getOptionType(String name, boolean ignoreCase) {
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "autowiredenabled":
+        case "autowiredEnabled": return boolean.class;
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": return boolean.class;
+        case "configuration": return org.apache.camel.component.docling.DoclingConfiguration.class;
+        case "contentinbody":
+        case "contentInBody": return boolean.class;
+        case "doclingcommand":
+        case "doclingCommand": return java.lang.String.class;
+        case "enableocr":
+        case "enableOCR": return boolean.class;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return boolean.class;
+        case "lazystartproducer":
+        case "lazyStartProducer": return boolean.class;
+        case "maxfilesize":
+        case "maxFileSize": return long.class;
+        case "ocrlanguage":
+        case "ocrLanguage": return java.lang.String.class;
+        case "operation": return org.apache.camel.component.docling.DoclingOperations.class;
+        case "outputformat":
+        case "outputFormat": return java.lang.String.class;
+        case "processtimeout":
+        case "processTimeout": return long.class;
+        case "workingdirectory":
+        case "workingDirectory": return java.lang.String.class;
+        default: return null;
+        }
+    }
+
+    @Override
+    public Object getOptionValue(Object obj, String name, boolean ignoreCase) {
+        DoclingComponent target = (DoclingComponent) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "autowiredenabled":
+        case "autowiredEnabled": return target.isAutowiredEnabled();
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": return target.isBridgeErrorHandler();
+        case "configuration": return target.getConfiguration();
+        case "contentinbody":
+        case "contentInBody": return getOrCreateConfiguration(target).isContentInBody();
+        case "doclingcommand":
+        case "doclingCommand": return getOrCreateConfiguration(target).getDoclingCommand();
+        case "enableocr":
+        case "enableOCR": return getOrCreateConfiguration(target).isEnableOCR();
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return getOrCreateConfiguration(target).isIncludeLayoutInfo();
+        case "lazystartproducer":
+        case "lazyStartProducer": return target.isLazyStartProducer();
+        case "maxfilesize":
+        case "maxFileSize": return getOrCreateConfiguration(target).getMaxFileSize();
+        case "ocrlanguage":
+        case "ocrLanguage": return getOrCreateConfiguration(target).getOcrLanguage();
+        case "operation": return getOrCreateConfiguration(target).getOperation();
+        case "outputformat":
+        case "outputFormat": return getOrCreateConfiguration(target).getOutputFormat();
+        case "processtimeout":
+        case "processTimeout": return getOrCreateConfiguration(target).getProcessTimeout();
+        case "workingdirectory":
+        case "workingDirectory": return getOrCreateConfiguration(target).getWorkingDirectory();
+        default: return null;
+        }
+    }
+}
+
diff --git a/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingConfigurationConfigurer.java b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingConfigurationConfigurer.java
new file mode 100644
index 00000000000..a5d88d14ed4
--- /dev/null
+++ b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingConfigurationConfigurer.java
@@ -0,0 +1,102 @@
+/* Generated by camel build tools - do NOT edit this file! */
+package org.apache.camel.component.docling;
+
+import javax.annotation.processing.Generated;
+import java.util.Map;
+
+import org.apache.camel.CamelContext;
+import org.apache.camel.spi.ExtendedPropertyConfigurerGetter;
+import org.apache.camel.spi.PropertyConfigurerGetter;
+import org.apache.camel.spi.ConfigurerStrategy;
+import org.apache.camel.spi.GeneratedPropertyConfigurer;
+import org.apache.camel.util.CaseInsensitiveMap;
+import org.apache.camel.component.docling.DoclingConfiguration;
+
+/**
+ * Generated by camel build tools - do NOT edit this file!
+ */
+@Generated("org.apache.camel.maven.packaging.GenerateConfigurerMojo")
+@SuppressWarnings("unchecked")
+public class DoclingConfigurationConfigurer extends org.apache.camel.support.component.PropertyConfigurerSupport implements GeneratedPropertyConfigurer, PropertyConfigurerGetter {
+
+    @Override
+    public boolean configure(CamelContext camelContext, Object obj, String name, Object value, boolean ignoreCase) {
+        org.apache.camel.component.docling.DoclingConfiguration target = (org.apache.camel.component.docling.DoclingConfiguration) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "contentinbody":
+        case "contentInBody": target.setContentInBody(property(camelContext, boolean.class, value)); return true;
+        case "doclingcommand":
+        case "doclingCommand": target.setDoclingCommand(property(camelContext, java.lang.String.class, value)); return true;
+        case "enableocr":
+        case "enableOCR": target.setEnableOCR(property(camelContext, boolean.class, value)); return true;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": target.setIncludeLayoutInfo(property(camelContext, boolean.class, value)); return true;
+        case "maxfilesize":
+        case "maxFileSize": target.setMaxFileSize(property(camelContext, long.class, value)); return true;
+        case "ocrlanguage":
+        case "ocrLanguage": target.setOcrLanguage(property(camelContext, java.lang.String.class, value)); return true;
+        case "operation": target.setOperation(property(camelContext, org.apache.camel.component.docling.DoclingOperations.class, value)); return true;
+        case "outputformat":
+        case "outputFormat": target.setOutputFormat(property(camelContext, java.lang.String.class, value)); return true;
+        case "processtimeout":
+        case "processTimeout": target.setProcessTimeout(property(camelContext, long.class, value)); return true;
+        case "workingdirectory":
+        case "workingDirectory": target.setWorkingDirectory(property(camelContext, java.lang.String.class, value)); return true;
+        default: return false;
+        }
+    }
+
+    @Override
+    public Class<?> getOptionType(String name, boolean ignoreCase) {
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "contentinbody":
+        case "contentInBody": return boolean.class;
+        case "doclingcommand":
+        case "doclingCommand": return java.lang.String.class;
+        case "enableocr":
+        case "enableOCR": return boolean.class;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return boolean.class;
+        case "maxfilesize":
+        case "maxFileSize": return long.class;
+        case "ocrlanguage":
+        case "ocrLanguage": return java.lang.String.class;
+        case "operation": return org.apache.camel.component.docling.DoclingOperations.class;
+        case "outputformat":
+        case "outputFormat": return java.lang.String.class;
+        case "processtimeout":
+        case "processTimeout": return long.class;
+        case "workingdirectory":
+        case "workingDirectory": return java.lang.String.class;
+        default: return null;
+        }
+    }
+
+    @Override
+    public Object getOptionValue(Object obj, String name, boolean ignoreCase) {
+        org.apache.camel.component.docling.DoclingConfiguration target = (org.apache.camel.component.docling.DoclingConfiguration) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "contentinbody":
+        case "contentInBody": return target.isContentInBody();
+        case "doclingcommand":
+        case "doclingCommand": return target.getDoclingCommand();
+        case "enableocr":
+        case "enableOCR": return target.isEnableOCR();
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return target.isIncludeLayoutInfo();
+        case "maxfilesize":
+        case "maxFileSize": return target.getMaxFileSize();
+        case "ocrlanguage":
+        case "ocrLanguage": return target.getOcrLanguage();
+        case "operation": return target.getOperation();
+        case "outputformat":
+        case "outputFormat": return target.getOutputFormat();
+        case "processtimeout":
+        case "processTimeout": return target.getProcessTimeout();
+        case "workingdirectory":
+        case "workingDirectory": return target.getWorkingDirectory();
+        default: return null;
+        }
+    }
+}
+
diff --git a/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointConfigurer.java b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointConfigurer.java
new file mode 100644
index 00000000000..97af5f68b97
--- /dev/null
+++ b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointConfigurer.java
@@ -0,0 +1,126 @@
+/* Generated by camel build tools - do NOT edit this file! */
+package org.apache.camel.component.docling;
+
+import javax.annotation.processing.Generated;
+import java.util.Map;
+
+import org.apache.camel.CamelContext;
+import org.apache.camel.spi.ExtendedPropertyConfigurerGetter;
+import org.apache.camel.spi.PropertyConfigurerGetter;
+import org.apache.camel.spi.ConfigurerStrategy;
+import org.apache.camel.spi.GeneratedPropertyConfigurer;
+import org.apache.camel.util.CaseInsensitiveMap;
+import org.apache.camel.support.component.PropertyConfigurerSupport;
+
+/**
+ * Generated by camel build tools - do NOT edit this file!
+ */
+@Generated("org.apache.camel.maven.packaging.EndpointSchemaGeneratorMojo")
+@SuppressWarnings("unchecked")
+public class DoclingEndpointConfigurer extends PropertyConfigurerSupport implements GeneratedPropertyConfigurer, PropertyConfigurerGetter {
+
+    @Override
+    public boolean configure(CamelContext camelContext, Object obj, String name, Object value, boolean ignoreCase) {
+        DoclingEndpoint target = (DoclingEndpoint) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": target.setBridgeErrorHandler(property(camelContext, boolean.class, value)); return true;
+        case "contentinbody":
+        case "contentInBody": target.getConfiguration().setContentInBody(property(camelContext, boolean.class, value)); return true;
+        case "doclingcommand":
+        case "doclingCommand": target.getConfiguration().setDoclingCommand(property(camelContext, java.lang.String.class, value)); return true;
+        case "enableocr":
+        case "enableOCR": target.getConfiguration().setEnableOCR(property(camelContext, boolean.class, value)); return true;
+        case "exceptionhandler":
+        case "exceptionHandler": target.setExceptionHandler(property(camelContext, org.apache.camel.spi.ExceptionHandler.class, value)); return true;
+        case "exchangepattern":
+        case "exchangePattern": target.setExchangePattern(property(camelContext, org.apache.camel.ExchangePattern.class, value)); return true;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": target.getConfiguration().setIncludeLayoutInfo(property(camelContext, boolean.class, value)); return true;
+        case "lazystartproducer":
+        case "lazyStartProducer": target.setLazyStartProducer(property(camelContext, boolean.class, value)); return true;
+        case "maxfilesize":
+        case "maxFileSize": target.getConfiguration().setMaxFileSize(property(camelContext, long.class, value)); return true;
+        case "ocrlanguage":
+        case "ocrLanguage": target.getConfiguration().setOcrLanguage(property(camelContext, java.lang.String.class, value)); return true;
+        case "operation": target.getConfiguration().setOperation(property(camelContext, org.apache.camel.component.docling.DoclingOperations.class, value)); return true;
+        case "outputformat":
+        case "outputFormat": target.getConfiguration().setOutputFormat(property(camelContext, java.lang.String.class, value)); return true;
+        case "processtimeout":
+        case "processTimeout": target.getConfiguration().setProcessTimeout(property(camelContext, long.class, value)); return true;
+        case "workingdirectory":
+        case "workingDirectory": target.getConfiguration().setWorkingDirectory(property(camelContext, java.lang.String.class, value)); return true;
+        default: return false;
+        }
+    }
+
+    @Override
+    public Class<?> getOptionType(String name, boolean ignoreCase) {
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": return boolean.class;
+        case "contentinbody":
+        case "contentInBody": return boolean.class;
+        case "doclingcommand":
+        case "doclingCommand": return java.lang.String.class;
+        case "enableocr":
+        case "enableOCR": return boolean.class;
+        case "exceptionhandler":
+        case "exceptionHandler": return org.apache.camel.spi.ExceptionHandler.class;
+        case "exchangepattern":
+        case "exchangePattern": return org.apache.camel.ExchangePattern.class;
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return boolean.class;
+        case "lazystartproducer":
+        case "lazyStartProducer": return boolean.class;
+        case "maxfilesize":
+        case "maxFileSize": return long.class;
+        case "ocrlanguage":
+        case "ocrLanguage": return java.lang.String.class;
+        case "operation": return org.apache.camel.component.docling.DoclingOperations.class;
+        case "outputformat":
+        case "outputFormat": return java.lang.String.class;
+        case "processtimeout":
+        case "processTimeout": return long.class;
+        case "workingdirectory":
+        case "workingDirectory": return java.lang.String.class;
+        default: return null;
+        }
+    }
+
+    @Override
+    public Object getOptionValue(Object obj, String name, boolean ignoreCase) {
+        DoclingEndpoint target = (DoclingEndpoint) obj;
+        switch (ignoreCase ? name.toLowerCase() : name) {
+        case "bridgeerrorhandler":
+        case "bridgeErrorHandler": return target.isBridgeErrorHandler();
+        case "contentinbody":
+        case "contentInBody": return target.getConfiguration().isContentInBody();
+        case "doclingcommand":
+        case "doclingCommand": return target.getConfiguration().getDoclingCommand();
+        case "enableocr":
+        case "enableOCR": return target.getConfiguration().isEnableOCR();
+        case "exceptionhandler":
+        case "exceptionHandler": return target.getExceptionHandler();
+        case "exchangepattern":
+        case "exchangePattern": return target.getExchangePattern();
+        case "includelayoutinfo":
+        case "includeLayoutInfo": return target.getConfiguration().isIncludeLayoutInfo();
+        case "lazystartproducer":
+        case "lazyStartProducer": return target.isLazyStartProducer();
+        case "maxfilesize":
+        case "maxFileSize": return target.getConfiguration().getMaxFileSize();
+        case "ocrlanguage":
+        case "ocrLanguage": return target.getConfiguration().getOcrLanguage();
+        case "operation": return target.getConfiguration().getOperation();
+        case "outputformat":
+        case "outputFormat": return target.getConfiguration().getOutputFormat();
+        case "processtimeout":
+        case "processTimeout": return target.getConfiguration().getProcessTimeout();
+        case "workingdirectory":
+        case "workingDirectory": return target.getConfiguration().getWorkingDirectory();
+        default: return null;
+        }
+    }
+}
+
diff --git a/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointUriFactory.java b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointUriFactory.java
new file mode 100644
index 00000000000..9ead057d65b
--- /dev/null
+++ b/components/camel-ai/camel-docling/src/generated/java/org/apache/camel/component/docling/DoclingEndpointUriFactory.java
@@ -0,0 +1,84 @@
+/* Generated by camel build tools - do NOT edit this file! */
+package org.apache.camel.component.docling;
+
+import javax.annotation.processing.Generated;
+import java.net.URISyntaxException;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.camel.spi.EndpointUriFactory;
+
+/**
+ * Generated by camel build tools - do NOT edit this file!
+ */
+@Generated("org.apache.camel.maven.packaging.GenerateEndpointUriFactoryMojo")
+public class DoclingEndpointUriFactory extends org.apache.camel.support.component.EndpointUriFactorySupport implements EndpointUriFactory {
+
+    private static final String BASE = ":operationId";
+
+    private static final Set<String> PROPERTY_NAMES;
+    private static final Set<String> SECRET_PROPERTY_NAMES;
+    private static final Set<String> MULTI_VALUE_PREFIXES;
+    static {
+        Set<String> props = new HashSet<>(15);
+        props.add("bridgeErrorHandler");
+        props.add("contentInBody");
+        props.add("doclingCommand");
+        props.add("enableOCR");
+        props.add("exceptionHandler");
+        props.add("exchangePattern");
+        props.add("includeLayoutInfo");
+        props.add("lazyStartProducer");
+        props.add("maxFileSize");
+        props.add("ocrLanguage");
+        props.add("operation");
+        props.add("operationId");
+        props.add("outputFormat");
+        props.add("processTimeout");
+        props.add("workingDirectory");
+        PROPERTY_NAMES = Collections.unmodifiableSet(props);
+        SECRET_PROPERTY_NAMES = Collections.emptySet();
+        MULTI_VALUE_PREFIXES = Collections.emptySet();
+    }
+
+    @Override
+    public boolean isEnabled(String scheme) {
+        return "docling".equals(scheme);
+    }
+
+    @Override
+    public String buildUri(String scheme, Map<String, Object> properties, boolean encode) throws URISyntaxException {
+        String syntax = scheme + BASE;
+        String uri = syntax;
+
+        Map<String, Object> copy = new HashMap<>(properties);
+
+        uri = buildPathParameter(syntax, uri, "operationId", null, true, copy);
+        uri = buildQueryParameters(uri, copy, encode);
+        return uri;
+    }
+
+    @Override
+    public Set<String> propertyNames() {
+        return PROPERTY_NAMES;
+    }
+
+    @Override
+    public Set<String> secretPropertyNames() {
+        return SECRET_PROPERTY_NAMES;
+    }
+
+    @Override
+    public Set<String> multiValuePrefixes() {
+        return MULTI_VALUE_PREFIXES;
+    }
+
+    @Override
+    public boolean isLenientProperties() {
+        return false;
+    }
+}
+
diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/org/apache/camel/component/docling/docling.json b/components/camel-ai/camel-docling/src/generated/resources/META-INF/org/apache/camel/component/docling/docling.json
new file mode 100644
index 00000000000..a6258b47a24
--- /dev/null
+++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/org/apache/camel/component/docling/docling.json
@@ -0,0 +1,69 @@
+{
+  "component": {
+    "kind": "component",
+    "name": "docling",
+    "title": "Docling",
+    "description": "Process documents using Docling library for parsing and conversion.",
+    "deprecated": false,
+    "firstVersion": "4.15.0",
+    "label": "ai",
+    "javaType": "org.apache.camel.component.docling.DoclingComponent",
+    "supportLevel": "Preview",
+    "groupId": "org.apache.camel",
+    "artifactId": "camel-docling",
+    "version": "4.15.0-SNAPSHOT",
+    "scheme": "docling",
+    "extendsScheme": "",
+    "syntax": "docling:operationId",
+    "async": false,
+    "api": false,
+    "consumerOnly": false,
+    "producerOnly": false,
+    "lenientProperties": false,
+    "browsable": false,
+    "remote": true
+  },
+  "componentProperties": {
+    "configuration": { "index": 0, "kind": "property", "displayName": "Configuration", "group": "common", "label": "", "required": false, "type": "object", "javaType": "org.apache.camel.component.docling.DoclingConfiguration", "deprecated": false, "autowired": false, "secret": false, "description": "The configuration;" },
+    "contentInBody": { "index": 1, "kind": "property", "displayName": "Content In Body", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Include the content of the output file in the exchange body and delete [...]
+    "enableOCR": { "index": 2, "kind": "property", "displayName": "Enable OCR", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": true, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Enable OCR processing for scanned documents" },
+    "includeLayoutInfo": { "index": 3, "kind": "property", "displayName": "Include Layout Info", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Show layout information with bounding boxes" },
+    "ocrLanguage": { "index": 4, "kind": "property", "displayName": "Ocr Language", "group": "common", "label": "", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "en", "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Language code for OCR processing" },
+    "operation": { "index": 5, "kind": "property", "displayName": "Operation", "group": "common", "label": "", "required": true, "type": "object", "javaType": "org.apache.camel.component.docling.DoclingOperations", "enum": [ "CONVERT_TO_MARKDOWN", "CONVERT_TO_HTML", "CONVERT_TO_JSON", "EXTRACT_TEXT", "EXTRACT_STRUCTURED_DATA" ], "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "CONVERT_TO_MARKDOWN", "configurationClass": "org.apache.camel.c [...]
+    "outputFormat": { "index": 6, "kind": "property", "displayName": "Output Format", "group": "common", "label": "", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "markdown", "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Output format for document conversion" },
+    "bridgeErrorHandler": { "index": 7, "kind": "property", "displayName": "Bridge Error Handler", "group": "consumer", "label": "consumer", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "autowired": false, "secret": false, "defaultValue": false, "description": "Allows for bridging the consumer to the Camel routing Error Handler, which mean any exceptions (if possible) occurred while the Camel consumer is trying to pickup incoming messages, or the like [...]
+    "lazyStartProducer": { "index": 8, "kind": "property", "displayName": "Lazy Start Producer", "group": "producer", "label": "producer", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "autowired": false, "secret": false, "defaultValue": false, "description": "Whether the producer should be started lazy (on the first message). By starting lazy you can use this to allow CamelContext and routes to startup in situations where a producer may otherwise fail [...]
+ "autowiredEnabled": { "index": 9, "kind": "property", "displayName": "Autowired Enabled", "group": "advanced", "label": "advanced", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "autowired": false, "secret": false, "defaultValue": true, "description": "Whether autowiring is enabled. This is used for automatic autowiring options (the option must be marked as autowired) by looking up in the registry to find if there is a single instance of matching t [...] + "doclingCommand": { "index": 10, "kind": "property", "displayName": "Docling Command", "group": "advanced", "label": "advanced", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Path to Docling Python executable or command" }, + "processTimeout": { "index": 11, "kind": "property", "displayName": "Process Timeout", "group": "advanced", "label": "advanced", "required": false, "type": "integer", "javaType": "long", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": 30000, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Timeout for Docling process execution in milliseconds" }, + "workingDirectory": { "index": 12, "kind": "property", "displayName": "Working Directory", "group": "advanced", "label": "advanced", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Working directory for Docling execution" }, + "maxFileSize": { "index": 13, "kind": "property", "displayName": "Max File Size", 
"group": "security", "label": "security", "required": false, "type": "integer", "javaType": "long", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": 52428800, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Maximum file size in bytes for processing" } + }, + "headers": { + "CamelDoclingOperation": { "index": 0, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "DoclingOperations", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "The operation to perform", "constantName": "org.apache.camel.component.docling.DoclingHeaders#OPERATION" }, + "CamelDoclingOutputFormat": { "index": 1, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "The output format for conversion", "constantName": "org.apache.camel.component.docling.DoclingHeaders#OUTPUT_FORMAT" }, + "CamelDoclingInputFilePath": { "index": 2, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "The input file path or content", "constantName": "org.apache.camel.component.docling.DoclingHeaders#INPUT_FILE_PATH" }, + "CamelDoclingOutputFilePath": { "index": 3, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "The output file path for saving result", "constantName": "org.apache.camel.component.docling.DoclingHeaders#OUTPUT_FILE_PATH" }, + "CamelDoclingProcessingOptions": { "index": 4, "kind": "header", "displayName": "", "group": 
"common", "label": "", "required": false, "javaType": "Map<String, Object>", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "Additional processing options", "constantName": "org.apache.camel.component.docling.DoclingHeaders#PROCESSING_OPTIONS" }, + "CamelDoclingEnableOCR": { "index": 5, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "Boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "Whether to include OCR processing", "constantName": "org.apache.camel.component.docling.DoclingHeaders#ENABLE_OCR" }, + "CamelDoclingOCRLanguage": { "index": 6, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "Language for OCR processing", "constantName": "org.apache.camel.component.docling.DoclingHeaders#OCR_LANGUAGE" }, + "CamelDoclingCustomArguments": { "index": 7, "kind": "header", "displayName": "", "group": "common", "label": "", "required": false, "javaType": "List<String>", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "Custom command line arguments to pass to Docling", "constantName": "org.apache.camel.component.docling.DoclingHeaders#CUSTOM_ARGUMENTS" } + }, + "properties": { + "operationId": { "index": 0, "kind": "path", "displayName": "Operation Id", "group": "common", "label": "", "required": true, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "description": "The operation identifier" }, + "contentInBody": { "index": 1, "kind": "parameter", "displayName": "Content In Body", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, 
"secret": false, "defaultValue": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Include the content of the output file in the exchange body and delet [...] + "enableOCR": { "index": 2, "kind": "parameter", "displayName": "Enable OCR", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": true, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Enable OCR processing for scanned documents" }, + "includeLayoutInfo": { "index": 3, "kind": "parameter", "displayName": "Include Layout Info", "group": "common", "label": "", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Show layout information with bounding boxes" }, + "ocrLanguage": { "index": 4, "kind": "parameter", "displayName": "Ocr Language", "group": "common", "label": "", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "en", "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Language code for OCR processing" }, + "operation": { "index": 5, "kind": "parameter", "displayName": "Operation", "group": "common", "label": "", "required": true, "type": "object", "javaType": "org.apache.camel.component.docling.DoclingOperations", "enum": [ "CONVERT_TO_MARKDOWN", "CONVERT_TO_HTML", "CONVERT_TO_JSON", "EXTRACT_TEXT", "EXTRACT_STRUCTURED_DATA" ], "deprecated": false, 
"deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "CONVERT_TO_MARKDOWN", "configurationClass": "org.apache.camel. [...] + "outputFormat": { "index": 6, "kind": "parameter", "displayName": "Output Format", "group": "common", "label": "", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": "markdown", "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Output format for document conversion" }, + "bridgeErrorHandler": { "index": 7, "kind": "parameter", "displayName": "Bridge Error Handler", "group": "consumer (advanced)", "label": "consumer,advanced", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "autowired": false, "secret": false, "defaultValue": false, "description": "Allows for bridging the consumer to the Camel routing Error Handler, which mean any exceptions (if possible) occurred while the Camel consumer is trying to pickup incoming [...] + "exceptionHandler": { "index": 8, "kind": "parameter", "displayName": "Exception Handler", "group": "consumer (advanced)", "label": "consumer,advanced", "required": false, "type": "object", "javaType": "org.apache.camel.spi.ExceptionHandler", "optionalPrefix": "consumer.", "deprecated": false, "autowired": false, "secret": false, "description": "To let the consumer use a custom ExceptionHandler. Notice if the option bridgeErrorHandler is enabled then this option is not in use. By def [...] 
+ "exchangePattern": { "index": 9, "kind": "parameter", "displayName": "Exchange Pattern", "group": "consumer (advanced)", "label": "consumer,advanced", "required": false, "type": "object", "javaType": "org.apache.camel.ExchangePattern", "enum": [ "InOnly", "InOut" ], "deprecated": false, "autowired": false, "secret": false, "description": "Sets the exchange pattern when the consumer creates an exchange." }, + "lazyStartProducer": { "index": 10, "kind": "parameter", "displayName": "Lazy Start Producer", "group": "producer (advanced)", "label": "producer,advanced", "required": false, "type": "boolean", "javaType": "boolean", "deprecated": false, "autowired": false, "secret": false, "defaultValue": false, "description": "Whether the producer should be started lazy (on the first message). By starting lazy you can use this to allow CamelContext and routes to startup in situations where a produ [...] + "doclingCommand": { "index": 11, "kind": "parameter", "displayName": "Docling Command", "group": "advanced", "label": "advanced", "required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Path to Docling Python executable or command" }, + "processTimeout": { "index": 12, "kind": "parameter", "displayName": "Process Timeout", "group": "advanced", "label": "advanced", "required": false, "type": "integer", "javaType": "long", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": 30000, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Timeout for Docling process execution in milliseconds" }, + "workingDirectory": { "index": 13, "kind": "parameter", "displayName": "Working Directory", "group": "advanced", "label": "advanced", 
"required": false, "type": "string", "javaType": "java.lang.String", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Working directory for Docling execution" }, + "maxFileSize": { "index": 14, "kind": "parameter", "displayName": "Max File Size", "group": "security", "label": "security", "required": false, "type": "integer", "javaType": "long", "deprecated": false, "deprecationNote": "", "autowired": false, "secret": false, "defaultValue": 52428800, "configurationClass": "org.apache.camel.component.docling.DoclingConfiguration", "configurationField": "configuration", "description": "Maximum file size in bytes for processing" } + } +} diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component.properties b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component.properties new file mode 100644 index 00000000000..80f8676d29d --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component.properties @@ -0,0 +1,7 @@ +# Generated by camel build tools - do NOT edit this file! +components=docling +groupId=org.apache.camel +artifactId=camel-docling +version=4.15.0-SNAPSHOT +projectName=Camel :: AI :: Docling +projectDescription=Docling document processing component diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component/docling b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component/docling new file mode 100644 index 00000000000..fe9f3e8f194 --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/component/docling @@ -0,0 +1,2 @@ +# Generated by camel build tools - do NOT edit this file! 
+class=org.apache.camel.component.docling.DoclingComponent diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-component b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-component new file mode 100644 index 00000000000..3108259f26a --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-component @@ -0,0 +1,2 @@ +# Generated by camel build tools - do NOT edit this file! +class=org.apache.camel.component.docling.DoclingComponentConfigurer diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-endpoint b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-endpoint new file mode 100644 index 00000000000..c0da5aaafb9 --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/docling-endpoint @@ -0,0 +1,2 @@ +# Generated by camel build tools - do NOT edit this file! +class=org.apache.camel.component.docling.DoclingEndpointConfigurer diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/org.apache.camel.component.docling.DoclingConfiguration b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/org.apache.camel.component.docling.DoclingConfiguration new file mode 100644 index 00000000000..5c215dc54b2 --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/configurer/org.apache.camel.component.docling.DoclingConfiguration @@ -0,0 +1,2 @@ +# Generated by camel build tools - do NOT edit this file! 
+class=org.apache.camel.component.docling.DoclingConfigurationConfigurer diff --git a/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/urifactory/docling-endpoint b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/urifactory/docling-endpoint new file mode 100644 index 00000000000..8d7493fd071 --- /dev/null +++ b/components/camel-ai/camel-docling/src/generated/resources/META-INF/services/org/apache/camel/urifactory/docling-endpoint @@ -0,0 +1,2 @@ +# Generated by camel build tools - do NOT edit this file! +class=org.apache.camel.component.docling.DoclingEndpointUriFactory diff --git a/components/camel-ai/camel-docling/src/main/docs/docling-component.adoc b/components/camel-ai/camel-docling/src/main/docs/docling-component.adoc new file mode 100644 index 00000000000..45f8f4f604f --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/docs/docling-component.adoc @@ -0,0 +1,216 @@ += Docling Component +:doctitle: Docling +:shortname: docling +:artifactid: camel-docling +:description: Process documents using Docling library for parsing and conversion. +:since: 4.15 +:supportlevel: Preview +:tabs-sync-option: +:component-header: Only producer is supported +//Manually maintained attributes +:group: AI +:camel-spring-boot-name: docling + +*Since Camel {since}* + +*{component-header}* + +The Docling component allows you to convert and process documents using https://github.com/DS4SD/docling[IBM's Docling AI document parser]. +Docling is a powerful Python library that can parse and convert various document formats including PDF, Word documents, PowerPoint presentations, and more into structured formats like Markdown, HTML, JSON, or plain text.
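As a rough illustration of how the endpoint options declared in the component metadata (`contentInBody`, `enableOCR`, `processTimeout`) end up on a `docling:` endpoint URI, here is a minimal, self-contained sketch. The class `DoclingUriSketch` and its `buildUri` helper are hypothetical and not part of the component. The sketch only concatenates strings and does no URL encoding. In a real route Camel parses the URI itself, so this is just a way to see the shape of the resulting string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of camel-docling: assembles an endpoint URI
// of the form docling:<path>?key=value&key=value from an ordered option map.
class DoclingUriSketch {

    public static String buildUri(String operationId, Map<String, Object> options) {
        String base = "docling:" + operationId;
        if (options.isEmpty()) {
            return base;
        }
        // Join the options in insertion order; values are NOT URL-encoded here.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, Object> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("contentInBody", true);   // option name taken from the metadata
        options.put("enableOCR", false);      // option name taken from the metadata
        options.put("processTimeout", 60000L); // milliseconds, per the option description
        System.out.println(buildUri("CONVERT_TO_MARKDOWN", options));
    }
}
```

The resulting string can be passed to `.to(...)` in a route in the same way as the literal URIs used elsewhere in this document.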
+ +Maven users will need to add the following dependency to their `pom.xml` for this component: + +[source,xml] +---- +<dependency> + <groupId>org.apache.camel</groupId> + <artifactId>camel-docling</artifactId> + <version>x.x.x</version> + <!-- use the same version as your Camel core version --> +</dependency> +---- + +== Prerequisites + +Before using this component, you need to have Docling installed on your system. You can install it using pip: + +[source,bash] +---- +pip install docling +---- + +== URI format + +---- +docling:operation[?options] +---- + +Where `operation` represents the document processing operation to perform. + +=== Supported Operations + +The component supports the following operations: + +[width="100%",cols="2,4",options="header"] +|=== +| Operation | Description + +| `CONVERT_TO_MARKDOWN` +| Convert document to Markdown format (default) + +| `CONVERT_TO_HTML` +| Convert document to HTML format + +| `CONVERT_TO_JSON` +| Convert document to JSON format with structure information + +| `EXTRACT_TEXT` +| Extract plain text content from document + +| `EXTRACT_STRUCTURED_DATA` +| Extract structured data including tables and layout information + +|=== + +// component-configure options: START + +// component-configure options: END + +// component options: START +include::partial$component-configure-options.adoc[] +include::partial$component-endpoint-options.adoc[] +// component options: END + +// endpoint options: START + +// endpoint options: END + +// component headers: START +include::partial$component-endpoint-headers.adoc[] +// component headers: END + +== Usage + +=== Input Types + +The component accepts the following input types in the message body: + +- `String` - File path or document content +- `byte[]` - Binary document content +- `File` - File object +- `InputStream` - Input stream containing document data + +=== Output Behavior + +The component behavior depends on the `contentInBody` configuration option: + +- When `contentInBody=true` 
(default: false): The converted content is placed in the exchange body and the output file is automatically deleted +- When `contentInBody=false`: The file path to the generated output file is returned in the exchange body + +== Examples + +=== Basic document conversion to Markdown + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .to("docling:CONVERT_TO_MARKDOWN") + .to("file:///data/output"); +---- + +=== Convert to HTML with content in body + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .to("docling:CONVERT_TO_HTML?contentInBody=true") + .process(exchange -> { + String htmlContent = exchange.getIn().getBody(String.class); + // Process the HTML content + }); +---- + +=== Extract structured data from documents + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .to("docling:EXTRACT_STRUCTURED_DATA?outputFormat=json&contentInBody=true") + .process(exchange -> { + String jsonData = exchange.getIn().getBody(String.class); + // Process the structured JSON data + }); +---- + +=== Convert with OCR disabled + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .to("docling:CONVERT_TO_MARKDOWN?enableOCR=false") + .to("file:///data/output"); +---- + +=== Using headers to control processing + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .setHeader("CamelDoclingOperation", constant(DoclingOperations.CONVERT_TO_HTML)) + .setHeader("CamelDoclingEnableOCR", constant(true)) + .setHeader("CamelDoclingOCRLanguage", constant("es")) + .to("docling:CONVERT_TO_MARKDOWN") // Operation will be overridden by header + .to("file:///data/output"); +---- + +=== Processing with custom arguments + +[source,java] +---- +from("file:///data/documents?include=.*\\.pdf") + .process(exchange -> { + List<String> customArgs = Arrays.asList("--verbose", "--preserve-tables"); + exchange.getIn().setHeader("CamelDoclingCustomArguments", customArgs); + }) + 
.to("docling:CONVERT_TO_MARKDOWN") + .to("file:///data/output"); +---- + +=== Content in body vs file path output + +[source,java] +---- +// Get content directly in body (file is automatically deleted) +from("file:///data/documents?include=.*\\.pdf") + .to("docling:CONVERT_TO_MARKDOWN?contentInBody=true") + .process(exchange -> { + String markdownContent = exchange.getIn().getBody(String.class); + log.info("Converted content: {}", markdownContent); + }); + +// Get file path (file is preserved) +from("file:///data/documents?include=.*\\.pdf") + .to("docling:CONVERT_TO_MARKDOWN?contentInBody=false") + .process(exchange -> { + String outputFilePath = exchange.getIn().getBody(String.class); + log.info("Output file saved at: {}", outputFilePath); + }); +---- + +== Error Handling + +The component handles various error scenarios: + +- **File size limit exceeded**: Files larger than `maxFileSize` are rejected +- **Process timeout**: Long-running conversions are terminated after `processTimeout` milliseconds +- **Invalid file formats**: Unsupported file formats result in processing errors +- **Docling not found**: Missing Docling installation causes startup failures + +== Performance Considerations + +- Large documents may require increased `processTimeout` values +- OCR processing significantly increases processing time for scanned documents +- Consider using `contentInBody=false` for large outputs to avoid memory issues +- The `maxFileSize` setting helps prevent resource exhaustion + +include::spring-boot:partial$starter.adoc[] diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingComponent.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingComponent.java new file mode 100644 index 00000000000..51a6daafc99 --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingComponent.java @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache 
Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.camel.component.docling; + +import java.util.Map; + +import org.apache.camel.CamelContext; +import org.apache.camel.Endpoint; +import org.apache.camel.spi.Metadata; +import org.apache.camel.spi.annotations.Component; +import org.apache.camel.support.DefaultComponent; + +/** + * Component for integrating with Docling document processing library. 
+ */ +@Component("docling") +public class DoclingComponent extends DefaultComponent { + + @Metadata + DoclingConfiguration configuration; + + public DoclingComponent() { + this(null); + } + + public DoclingComponent(CamelContext context) { + super(context); + this.configuration = new DoclingConfiguration(); + } + + @Override + protected Endpoint createEndpoint(String uri, String remaining, Map<String, Object> parameters) throws Exception { + DoclingConfiguration config = this.configuration.copy(); + DoclingEndpoint endpoint = new DoclingEndpoint(uri, this, remaining, config); + setProperties(endpoint, parameters); + return endpoint; + } + + public DoclingConfiguration getConfiguration() { + return configuration; + } + + /** + * The configuration; + */ + public void setConfiguration(DoclingConfiguration configuration) { + this.configuration = configuration; + } + +} diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingConfiguration.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingConfiguration.java new file mode 100644 index 00000000000..41b1258c833 --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingConfiguration.java @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.camel.component.docling; + +import org.apache.camel.RuntimeCamelException; +import org.apache.camel.spi.Configurer; +import org.apache.camel.spi.Metadata; +import org.apache.camel.spi.UriParam; +import org.apache.camel.spi.UriParams; + +/** + * Configuration for the Docling component. + */ +@Configurer +@UriParams +public class DoclingConfiguration implements Cloneable { + + @UriParam + @Metadata(required = true, defaultValue = "CONVERT_TO_MARKDOWN", description = "The operation to perform") + private DoclingOperations operation = DoclingOperations.CONVERT_TO_MARKDOWN; + + @UriParam(label = "advanced") + @Metadata(description = "Path to Docling Python executable or command") + private String doclingCommand = "docling"; + + @UriParam(label = "advanced") + @Metadata(description = "Working directory for Docling execution") + private String workingDirectory; + + @UriParam + @Metadata(description = "Enable OCR processing for scanned documents", defaultValue = "true") + private boolean enableOCR = true; + + @UriParam + @Metadata(description = "Language code for OCR processing", defaultValue = "en") + private String ocrLanguage = "en"; + + @UriParam + @Metadata(description = "Output format for document conversion", defaultValue = "markdown") + private String outputFormat = "markdown"; + + @UriParam(label = "advanced") + @Metadata(description = "Timeout for Docling process execution in milliseconds", defaultValue = "30000") + private long processTimeout = 30000; + + @UriParam + @Metadata(description = "Show layout information with 
bounding boxes", defaultValue = "false") + private boolean includeLayoutInfo = false; + + @UriParam(label = "security") + @Metadata(description = "Maximum file size in bytes for processing", defaultValue = "52428800") + private long maxFileSize = 50 * 1024 * 1024; // 50MB + + @UriParam + @Metadata(description = "Include the content of the output file in the exchange body and delete the output file", + defaultValue = "false") + private boolean contentInBody = false; + + public DoclingOperations getOperation() { + return operation; + } + + public void setOperation(DoclingOperations operation) { + this.operation = operation; + } + + public String getDoclingCommand() { + return doclingCommand; + } + + public void setDoclingCommand(String doclingCommand) { + this.doclingCommand = doclingCommand; + } + + public String getWorkingDirectory() { + return workingDirectory; + } + + public void setWorkingDirectory(String workingDirectory) { + this.workingDirectory = workingDirectory; + } + + public boolean isEnableOCR() { + return enableOCR; + } + + public void setEnableOCR(boolean enableOCR) { + this.enableOCR = enableOCR; + } + + public String getOcrLanguage() { + return ocrLanguage; + } + + public void setOcrLanguage(String ocrLanguage) { + this.ocrLanguage = ocrLanguage; + } + + public String getOutputFormat() { + return outputFormat; + } + + public void setOutputFormat(String outputFormat) { + this.outputFormat = outputFormat; + } + + public long getProcessTimeout() { + return processTimeout; + } + + public void setProcessTimeout(long processTimeout) { + this.processTimeout = processTimeout; + } + + public boolean isIncludeLayoutInfo() { + return includeLayoutInfo; + } + + public void setIncludeLayoutInfo(boolean includeLayoutInfo) { + this.includeLayoutInfo = includeLayoutInfo; + } + + public long getMaxFileSize() { + return maxFileSize; + } + + public void setMaxFileSize(long maxFileSize) { + this.maxFileSize = maxFileSize; + } + + public boolean isContentInBody() { + 
return contentInBody; + } + + public void setContentInBody(boolean contentInBody) { + this.contentInBody = contentInBody; + } + + public DoclingConfiguration copy() { + try { + return (DoclingConfiguration) super.clone(); + } catch (CloneNotSupportedException e) { + throw new RuntimeCamelException(e); + } + } + +} diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingEndpoint.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingEndpoint.java new file mode 100644 index 00000000000..f08983d12a8 --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingEndpoint.java @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.camel.component.docling; + +import org.apache.camel.Category; +import org.apache.camel.Consumer; +import org.apache.camel.Processor; +import org.apache.camel.Producer; +import org.apache.camel.spi.Metadata; +import org.apache.camel.spi.UriEndpoint; +import org.apache.camel.spi.UriParam; +import org.apache.camel.spi.UriPath; +import org.apache.camel.support.DefaultEndpoint; + +/** + * Process documents using Docling library for parsing and conversion. + */ +@UriEndpoint(firstVersion = "4.15.0", scheme = "docling", + title = "Docling", + syntax = "docling:operationId", + category = { Category.AI }, headersClass = DoclingHeaders.class) +public class DoclingEndpoint extends DefaultEndpoint { + + @Metadata(required = true) + @UriPath(description = "The operation identifier") + private final String operationId; + + @UriParam + private DoclingConfiguration configuration; + + public DoclingEndpoint(String uri, DoclingComponent component, String operationId, DoclingConfiguration configuration) { + super(uri, component); + this.operationId = operationId; + this.configuration = configuration; + } + + @Override + public Producer createProducer() throws Exception { + return new DoclingProducer(this); + } + + @Override + public Consumer createConsumer(Processor processor) throws Exception { + throw new UnsupportedOperationException("Consumer not supported for Docling component"); + } + + public String getOperationId() { + return operationId; + } + + public DoclingConfiguration getConfiguration() { + return configuration; + } + + public void setConfiguration(DoclingConfiguration configuration) { + this.configuration = configuration; + } + +} diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingHeaders.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingHeaders.java new file mode 100644 index 00000000000..f8f96c9b8c2 --- /dev/null +++ 
b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingHeaders.java @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.camel.component.docling; + +import org.apache.camel.spi.Metadata; + +public final class DoclingHeaders { + + @Metadata(description = "The operation to perform", javaType = "DoclingOperations") + public static final String OPERATION = "CamelDoclingOperation"; + + @Metadata(description = "The output format for conversion", javaType = "String") + public static final String OUTPUT_FORMAT = "CamelDoclingOutputFormat"; + + @Metadata(description = "The input file path or content", javaType = "String") + public static final String INPUT_FILE_PATH = "CamelDoclingInputFilePath"; + + @Metadata(description = "The output file path for saving result", javaType = "String") + public static final String OUTPUT_FILE_PATH = "CamelDoclingOutputFilePath"; + + @Metadata(description = "Additional processing options", javaType = "Map<String, Object>") + public static final String PROCESSING_OPTIONS = "CamelDoclingProcessingOptions"; + + @Metadata(description = "Whether to include OCR processing", javaType = "Boolean") + public static final String 
ENABLE_OCR = "CamelDoclingEnableOCR"; + + @Metadata(description = "Language for OCR processing", javaType = "String") + public static final String OCR_LANGUAGE = "CamelDoclingOCRLanguage"; + + @Metadata(description = "Custom command line arguments to pass to Docling", javaType = "List<String>") + public static final String CUSTOM_ARGUMENTS = "CamelDoclingCustomArguments"; + + private DoclingHeaders() { + } + +} diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingOperations.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingOperations.java new file mode 100644 index 00000000000..9d48bdb3daa --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingOperations.java @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.camel.component.docling; + +public enum DoclingOperations { + + /** + * Convert document to markdown format + */ + CONVERT_TO_MARKDOWN, + + /** + * Convert document to HTML format + */ + CONVERT_TO_HTML, + + /** + * Convert document to JSON format with structure + */ + CONVERT_TO_JSON, + + /** + * Extract text content from document + */ + EXTRACT_TEXT, + + /** + * Extract structured data including tables and layout + */ + EXTRACT_STRUCTURED_DATA + +} diff --git a/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingProducer.java b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingProducer.java new file mode 100644 index 00000000000..85c8f2050da --- /dev/null +++ b/components/camel-ai/camel-docling/src/main/java/org/apache/camel/component/docling/DoclingProducer.java @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.camel.component.docling; + +import java.io.BufferedReader; +import java.io.File; +import java.io.IOException; +import java.io.InputStreamReader; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.TimeUnit; + +import org.apache.camel.Exchange; +import org.apache.camel.InvalidPayloadException; +import org.apache.camel.support.DefaultProducer; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Producer for Docling document processing operations. + */ +public class DoclingProducer extends DefaultProducer { + + private static final Logger LOG = LoggerFactory.getLogger(DoclingProducer.class); + + private DoclingEndpoint endpoint; + private DoclingConfiguration configuration; + + public DoclingProducer(DoclingEndpoint endpoint) { + super(endpoint); + this.endpoint = endpoint; + this.configuration = endpoint.getConfiguration(); + } + + @Override + public void process(Exchange exchange) throws Exception { + LOG.debug("DoclingProducer processing exchange with message ID: {}", exchange.getExchangeId()); + DoclingOperations operation = getOperation(exchange); + LOG.debug("DoclingProducer performing operation: {}", operation); + + switch (operation) { + case CONVERT_TO_MARKDOWN: + processConvertToMarkdown(exchange); + break; + case CONVERT_TO_HTML: + processConvertToHTML(exchange); + break; + case CONVERT_TO_JSON: + processConvertToJSON(exchange); + break; + case EXTRACT_TEXT: + processExtractText(exchange); + break; + case EXTRACT_STRUCTURED_DATA: + processExtractStructuredData(exchange); + break; + default: + throw new IllegalArgumentException("Unsupported operation: " + operation); + } + } + + private DoclingOperations getOperation(Exchange exchange) { + DoclingOperations operation = exchange.getIn().getHeader(DoclingHeaders.OPERATION, DoclingOperations.class); + if (operation == null) { + operation = 
configuration.getOperation(); + } + return operation; + } + + private void processConvertToMarkdown(Exchange exchange) throws Exception { + LOG.debug("DoclingProducer converting to markdown"); + String inputPath = getInputPath(exchange); + exchange.getIn().setBody(executeDoclingCommand(inputPath, "markdown", exchange)); + } + + private void processConvertToHTML(Exchange exchange) throws Exception { + LOG.debug("DoclingProducer converting to HTML"); + String inputPath = getInputPath(exchange); + exchange.getIn().setBody(executeDoclingCommand(inputPath, "html", exchange)); + } + + private void processConvertToJSON(Exchange exchange) throws Exception { + String inputPath = getInputPath(exchange); + exchange.getIn().setBody(executeDoclingCommand(inputPath, "json", exchange)); + } + + private void processExtractText(Exchange exchange) throws Exception { + String inputPath = getInputPath(exchange); + exchange.getIn().setBody(executeDoclingCommand(inputPath, "text", exchange)); + } + + private void processExtractStructuredData(Exchange exchange) throws Exception { + String inputPath = getInputPath(exchange); + exchange.getIn().setBody(executeDoclingCommand(inputPath, "json", exchange)); + } + + private String getInputPath(Exchange exchange) throws InvalidPayloadException, IOException { + String inputPath = exchange.getIn().getHeader(DoclingHeaders.INPUT_FILE_PATH, String.class); + + if (inputPath != null) { + validateFileSize(inputPath); + return inputPath; + } + + Object body = exchange.getIn().getBody(); + if (body instanceof String) { + String content = (String) body; + if (content.startsWith("/") || content.contains("\\")) { + validateFileSize(content); + return content; + } else { + Path tempFile = Files.createTempFile("docling-", ".tmp"); + Files.write(tempFile, content.getBytes()); + validateFileSize(tempFile.toString()); + return tempFile.toString(); + } + } else if (body instanceof byte[]) { + byte[] content = (byte[]) body; + if (content.length > 
configuration.getMaxFileSize()) { + throw new IllegalArgumentException("File size exceeds maximum allowed size: " + configuration.getMaxFileSize()); + } + Path tempFile = Files.createTempFile("docling-", ".tmp"); + Files.write(tempFile, content); + return tempFile.toString(); + } else if (body instanceof File) { + File file = (File) body; + validateFileSize(file.getAbsolutePath()); + return file.getAbsolutePath(); + } + + throw new InvalidPayloadException(exchange, String.class); + } + + private void validateFileSize(String filePath) throws IOException { + Path path = Paths.get(filePath); + if (Files.exists(path)) { + long fileSize = Files.size(path); + if (fileSize > configuration.getMaxFileSize()) { + throw new IllegalArgumentException( + "File size (" + fileSize + " bytes) exceeds maximum allowed size: " + configuration.getMaxFileSize()); + } + } + } + + private String executeDoclingCommand(String inputPath, String outputFormat, Exchange exchange) throws Exception { + LOG.debug("DoclingProducer executing Docling command for input: {} with format: {}", inputPath, outputFormat); + // Create temporary output directory + Path tempOutputDir = Files.createTempDirectory("docling-output"); + + try { + List<String> command = buildDoclingCommand(inputPath, outputFormat, exchange, tempOutputDir.toString()); + + LOG.debug("Executing Docling command: {}", command); + + ProcessBuilder processBuilder = new ProcessBuilder(command); + + if (configuration.getWorkingDirectory() != null) { + processBuilder.directory(new File(configuration.getWorkingDirectory())); + } + + Process process = processBuilder.start(); + + StringBuilder output = new StringBuilder(); + StringBuilder error = new StringBuilder(); + + try (BufferedReader outputReader = new BufferedReader(new InputStreamReader(process.getInputStream())); + BufferedReader errorReader = new BufferedReader(new InputStreamReader(process.getErrorStream()))) { + + String line; + while ((line = outputReader.readLine()) != null) { + 
LOG.debug("Docling output: {}", line); + output.append(line).append("\n"); + } + + while ((line = errorReader.readLine()) != null) { + error.append(line).append("\n"); + } + } + + boolean finished = process.waitFor(configuration.getProcessTimeout(), TimeUnit.MILLISECONDS); + + if (!finished) { + process.destroyForcibly(); + throw new RuntimeException( + "Docling process timed out after " + configuration.getProcessTimeout() + " milliseconds"); + } + + int exitCode = process.exitValue(); + if (exitCode != 0) { + throw new RuntimeException( + "Docling process failed with exit code " + exitCode + ". Error: " + error.toString()); + } + + // Read the generated output file or return file path based on configuration + String result = readGeneratedOutputFile(tempOutputDir, inputPath, outputFormat); + + // If contentInBody is false, we need to move the file to a permanent location + if (!configuration.isContentInBody()) { + result = moveOutputFileToFinalLocation(tempOutputDir, inputPath, outputFormat); + } + + return result; + + } finally { + // Clean up temporary directory only if contentInBody is true + // (the file has already been read and deleted) + if (configuration.isContentInBody()) { + deleteDirectory(tempOutputDir); + } + } + } + + private String readGeneratedOutputFile(Path outputDir, String inputPath, String outputFormat) throws IOException { + // Docling generates files with the same base name as input but different extension + Path inputFilePath = Paths.get(inputPath); + String baseName = inputFilePath.getFileName().toString(); + int lastDot = baseName.lastIndexOf('.'); + if (lastDot > 0) { + baseName = baseName.substring(0, lastDot); + } + + // Determine the expected output file extension + String extension = getOutputFileExtension(outputFormat); + String expectedFileName = baseName + "." 
+ extension; + + Path outputFile = outputDir.resolve(expectedFileName); + Path actualOutputFile = null; + + if (Files.exists(outputFile)) { + actualOutputFile = outputFile; + } else { + // Fallback: look for any file in the output directory + try (var stream = Files.list(outputDir)) { + actualOutputFile = stream.findFirst().orElse(null); + if (actualOutputFile == null || !Files.isRegularFile(actualOutputFile)) { + throw new RuntimeException("No output file generated in: " + outputDir); + } + } + } + + if (configuration.isContentInBody()) { + // Read content into body and delete the file + String content = Files.readString(actualOutputFile); + try { + Files.delete(actualOutputFile); + LOG.debug("Deleted output file: {}", actualOutputFile); + } catch (IOException e) { + LOG.warn("Failed to delete output file: {}", actualOutputFile, e); + } + return content; + } else { + // Return the file path and let the user manage the file + return actualOutputFile.toString(); + } + } + + private String moveOutputFileToFinalLocation(Path tempOutputDir, String inputPath, String outputFormat) throws IOException { + // Find the generated output file + Path inputFilePath = Paths.get(inputPath); + String baseName = inputFilePath.getFileName().toString(); + int lastDot = baseName.lastIndexOf('.'); + if (lastDot > 0) { + baseName = baseName.substring(0, lastDot); + } + + String extension = getOutputFileExtension(outputFormat); + String expectedFileName = baseName + "." 
+ extension; + Path tempOutputFile = tempOutputDir.resolve(expectedFileName); + + if (!Files.exists(tempOutputFile)) { + // Fallback: look for any file in the output directory + try (var stream = Files.list(tempOutputDir)) { + tempOutputFile = stream.findFirst().orElse(null); + if (tempOutputFile == null || !Files.isRegularFile(tempOutputFile)) { + throw new RuntimeException("No output file generated in: " + tempOutputDir); + } + } + } + + // Create final output file in the same directory as input + Path finalOutputFile = inputFilePath.getParent().resolve(tempOutputFile.getFileName()); + + // Ensure we don't overwrite an existing file + int counter = 1; + while (Files.exists(finalOutputFile)) { + String nameWithoutExt = baseName; + String ext = extension; + finalOutputFile = inputFilePath.getParent().resolve(nameWithoutExt + "_" + counter + "." + ext); + counter++; + } + + // Move the file from temp location to final location + Files.move(tempOutputFile, finalOutputFile); + LOG.debug("Moved output file from {} to {}", tempOutputFile, finalOutputFile); + + return finalOutputFile.toString(); + } + + private String getOutputFileExtension(String outputFormat) { + switch (outputFormat.toLowerCase()) { + case "markdown": + case "md": + return "md"; + case "html": + return "html"; + case "json": + return "json"; + case "text": + return "txt"; + default: + return "md"; + } + } + + private void deleteDirectory(Path directory) { + try { + if (Files.exists(directory)) { + Files.walk(directory) + .sorted((a, b) -> b.compareTo(a)) // Delete files before directories + .forEach(path -> { + try { + Files.delete(path); + } catch (IOException e) { + LOG.warn("Failed to delete temporary file: {}", path, e); + } + }); + } + } catch (IOException e) { + LOG.warn("Failed to clean up temporary directory: {}", directory, e); + } + } + + private List<String> buildDoclingCommand(String inputPath, String outputFormat, Exchange exchange, String outputDirectory) { + List<String> command = new 
ArrayList<>(); + command.add(configuration.getDoclingCommand()); + + // Add custom arguments from headers if provided + addCustomArguments(command, exchange); + + // Output format + addOutputFormatArguments(command, outputFormat); + + // OCR configuration + addOcrArguments(command); + + // Layout information + addLayoutArguments(command); + + // Output directory + addOutputDirectoryArguments(command, exchange, outputDirectory); + + // Input source (positional argument - must be last) + command.add(inputPath); + + return command; + } + + private void addCustomArguments(List<String> command, Exchange exchange) { + // Allow custom arguments to be passed via headers + @SuppressWarnings("unchecked") + List<String> customArgs = exchange.getIn().getHeader(DoclingHeaders.CUSTOM_ARGUMENTS, List.class); + if (customArgs != null && !customArgs.isEmpty()) { + LOG.debug("Adding custom Docling arguments: {}", customArgs); + command.addAll(customArgs); + } + } + + private void addOutputFormatArguments(List<String> command, String outputFormat) { + if (outputFormat != null && !outputFormat.isEmpty()) { + command.add("--to"); + command.add(mapToDoclingFormat(outputFormat)); + } + } + + private void addOcrArguments(List<String> command) { + if (!configuration.isEnableOCR()) { + command.add("--no-ocr"); + } else if (configuration.getOcrLanguage() != null) { + command.add("--ocr-lang"); + command.add(configuration.getOcrLanguage()); + } + } + + private void addLayoutArguments(List<String> command) { + if (configuration.isIncludeLayoutInfo()) { + command.add("--show-layout"); + } + } + + private void addOutputDirectoryArguments(List<String> command, Exchange exchange, String outputDirectory) { + String outputPath = exchange.getIn().getHeader(DoclingHeaders.OUTPUT_FILE_PATH, String.class); + if (outputPath != null) { + command.add("--output"); + command.add(outputPath); + } else { + command.add("--output"); + command.add(outputDirectory); + } + } + + private String 
mapToDoclingFormat(String outputFormat) { + switch (outputFormat.toLowerCase()) { + case "markdown": + return "md"; + case "html": + return "html"; + case "json": + return "json"; + case "text": + return "text"; + default: + return "md"; // Default to markdown + } + } + +} diff --git a/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingComponentTest.java b/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingComponentTest.java new file mode 100644 index 00000000000..d852bb04577 --- /dev/null +++ b/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingComponentTest.java @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.camel.component.docling; + +import org.apache.camel.Endpoint; +import org.apache.camel.test.junit5.CamelTestSupport; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + +public class DoclingComponentTest extends CamelTestSupport { + + @Test + public void testCreateEndpoint() throws Exception { + Endpoint endpoint = context.getEndpoint("docling:convert"); + assertNotNull(endpoint); + assertTrue(endpoint instanceof DoclingEndpoint); + + DoclingEndpoint doclingEndpoint = (DoclingEndpoint) endpoint; + assertEquals("convert", doclingEndpoint.getOperationId()); + assertEquals(DoclingOperations.CONVERT_TO_MARKDOWN, doclingEndpoint.getConfiguration().getOperation()); + assertTrue(doclingEndpoint.getConfiguration().isEnableOCR()); // OCR is enabled by default + } + + @Test + public void testCreateEndpointWithParameters() throws Exception { + Endpoint endpoint = context.getEndpoint("docling:process?operation=CONVERT_TO_HTML&enableOCR=false&ocrLanguage=es"); + assertNotNull(endpoint); + assertTrue(endpoint instanceof DoclingEndpoint); + + DoclingEndpoint doclingEndpoint = (DoclingEndpoint) endpoint; + assertEquals("process", doclingEndpoint.getOperationId()); + assertEquals(DoclingOperations.CONVERT_TO_HTML, doclingEndpoint.getConfiguration().getOperation()); + assertFalse(doclingEndpoint.getConfiguration().isEnableOCR()); + assertEquals("es", doclingEndpoint.getConfiguration().getOcrLanguage()); + } + + @Test + public void testProducerCreation() throws Exception { + DoclingEndpoint endpoint = (DoclingEndpoint) context.getEndpoint("docling:convert"); + assertNotNull(endpoint.createProducer()); + } + +} diff --git a/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingProducerTest.java b/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingProducerTest.java new file mode 100644 index 00000000000..610bc49e971 --- /dev/null +++ 
b/components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/DoclingProducerTest.java @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.camel.component.docling; + +import java.nio.file.Files; +import java.nio.file.Path; + +import org.apache.camel.builder.RouteBuilder; +import org.apache.camel.test.junit5.CamelTestSupport; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.condition.EnabledIfSystemProperty; + +import static org.junit.jupiter.api.Assertions.assertNotNull; +import static org.junit.jupiter.api.Assertions.assertTrue; + +public class DoclingProducerTest extends CamelTestSupport { + + @Test + @EnabledIfSystemProperty(named = "docling.test.enabled", matches = "true") + public void testMarkdownConversion() throws Exception { + Path testFile = createTestFile(); + + String result = template.requestBodyAndHeader("direct:convert-markdown", + testFile.toString(), + DoclingHeaders.INPUT_FILE_PATH, testFile.toString(), String.class); + + assertNotNull(result); + assertTrue(result.length() > 0); + } + + @Test + @EnabledIfSystemProperty(named = "docling.test.enabled", matches = "true") + public void testHtmlConversion() throws Exception { + 
Path testFile = createTestFile(); + + String result = template.requestBodyAndHeader("direct:convert-html", + testFile.toString(), + DoclingHeaders.OPERATION, DoclingOperations.CONVERT_TO_HTML, String.class); + + assertNotNull(result); + assertTrue(result.length() > 0); + } + + @Test + @EnabledIfSystemProperty(named = "docling.test.enabled", matches = "true") + public void testContentInBodyEnabled() throws Exception { + Path testFile = createTestFile(); + + String result = template.requestBodyAndHeader("direct:convert-content-in-body", + testFile.toString(), + DoclingHeaders.INPUT_FILE_PATH, testFile.toString(), String.class); + + assertNotNull(result); + assertTrue(result.length() > 0); + // When contentInBody is true, result should contain the actual content, not a file path + assertTrue(result.contains("Test Document") || result.contains("test document")); + } + + @Test + @EnabledIfSystemProperty(named = "docling.test.enabled", matches = "true") + public void testContentInBodyDisabled() throws Exception { + Path testFile = createTestFile(); + + String result = template.requestBodyAndHeader("direct:convert-file-path", + testFile.toString(), + DoclingHeaders.INPUT_FILE_PATH, testFile.toString(), String.class); + + assertNotNull(result); + assertTrue(result.length() > 0); + // When contentInBody is false, result should be a file path + assertTrue(result.endsWith(".md") || result.endsWith(".html")); + assertTrue(Files.exists(Path.of(result))); + } + + private Path createTestFile() throws Exception { + Path tempFile = Files.createTempFile("docling-test", ".md"); + Files.write(tempFile, + "# Test Document\n\nThis is a test document for Docling processing.\n\n## Section 1\n\nSome content here.\n\n- List item 1\n- List item 2\n" + .getBytes()); + return tempFile; + } + + @Override + protected RouteBuilder createRouteBuilder() throws Exception { + return new RouteBuilder() { + @Override + public void configure() throws Exception { + from("direct:convert-markdown") + 
.to("docling:convert?operation=CONVERT_TO_MARKDOWN"); + + from("direct:convert-html") + .to("docling:convert?operation=CONVERT_TO_HTML"); + + from("direct:convert-content-in-body") + .to("docling:convert?operation=CONVERT_TO_MARKDOWN&contentInBody=true"); + + from("direct:convert-file-path") + .to("docling:convert?operation=CONVERT_TO_MARKDOWN&contentInBody=false"); + } + }; + } + +} diff --git a/components/camel-ai/camel-docling/src/test/resources/log4j2.properties b/components/camel-ai/camel-docling/src/test/resources/log4j2.properties new file mode 100644 index 00000000000..3ad5d3a43c0 --- /dev/null +++ b/components/camel-ai/camel-docling/src/test/resources/log4j2.properties @@ -0,0 +1,43 @@ +## --------------------------------------------------------------------------- +## Licensed to the Apache Software Foundation (ASF) under one or more +## contributor license agreements. See the NOTICE file distributed with +## this work for additional information regarding copyright ownership. +## The ASF licenses this file to You under the Apache License, Version 2.0 +## (the "License"); you may not use this file except in compliance with +## the License. You may obtain a copy of the License at +## +## http://www.apache.org/licenses/LICENSE-2.0 +## +## Unless required by applicable law or agreed to in writing, software +## distributed under the License is distributed on an "AS IS" BASIS, +## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +## See the License for the specific language governing permissions and +## limitations under the License. 
+## --------------------------------------------------------------------------- + +appender.file.type = File +appender.file.name = file +appender.file.fileName = target/camel-docling-tests.log +appender.file.layout.type = PatternLayout +appender.file.layout.pattern = %d [%-15.15t] %-5p %-30.30c{1} - %m%n + +appender.out.type = Console +appender.out.name = out +appender.out.layout.type = PatternLayout +appender.out.layout.pattern = [%30.30t] %-30.30c{1} %-5p %m%n + +rootLogger.level = DEBUG +rootLogger.appenderRef.file.ref = file +# To log to stdout +#rootLogger.appenderRef.out.ref = out + +logger.camel-docling.name = org.apache.camel.component.docling +logger.camel-docling.level = DEBUG +logger.camel-docling.additivity = false +logger.camel-docling.appenderRef.file.ref = file + +# Specific logger for DoclingProducer to ensure it logs properly +logger.docling-producer.name = org.apache.camel.component.docling.DoclingProducer +logger.docling-producer.level = DEBUG +logger.docling-producer.additivity = false +logger.docling-producer.appenderRef.file.ref = file diff --git a/components/camel-ai/pom.xml b/components/camel-ai/pom.xml index d2268d1283d..76718f92647 100644 --- a/components/camel-ai/pom.xml +++ b/components/camel-ai/pom.xml @@ -36,6 +36,7 @@ <modules> <module>camel-chatscript</module> <module>camel-djl</module> + <module>camel-docling</module> <module>camel-kserve</module> <module>camel-langchain4j-agent-api</module> <module>camel-langchain4j-agent</module> diff --git a/docs/components/modules/ROOT/examples/json/docling.json b/docs/components/modules/ROOT/examples/json/docling.json new file mode 120000 index 00000000000..b33675358ac --- /dev/null +++ b/docs/components/modules/ROOT/examples/json/docling.json @@ -0,0 +1 @@ +../../../../../../components/camel-ai/camel-docling/src/generated/resources/META-INF/org/apache/camel/component/docling/docling.json \ No newline at end of file diff --git a/docs/components/modules/ROOT/nav.adoc 
b/docs/components/modules/ROOT/nav.adoc index 9686d50091a..52b088b28c7 100644 --- a/docs/components/modules/ROOT/nav.adoc +++ b/docs/components/modules/ROOT/nav.adoc @@ -7,6 +7,7 @@ ** xref:ai-summary.adoc[AI] *** xref:chatscript-component.adoc[ChatScript] *** xref:djl-component.adoc[Deep Java Library] +*** xref:docling-component.adoc[Docling] *** xref:kserve-component.adoc[KServe] *** xref:langchain4j-agent-component.adoc[LangChain4j Agent] *** xref:langchain4j-chat-component.adoc[LangChain4j Chat] diff --git a/docs/components/modules/ROOT/pages/docling-component.adoc b/docs/components/modules/ROOT/pages/docling-component.adoc new file mode 120000 index 00000000000..2abd3e69fe7 --- /dev/null +++ b/docs/components/modules/ROOT/pages/docling-component.adoc @@ -0,0 +1 @@ +../../../../../components/camel-ai/camel-docling/src/main/docs/docling-component.adoc \ No newline at end of file diff --git a/parent/pom.xml b/parent/pom.xml index aae57df53d9..16a9b6a14d5 100644 --- a/parent/pom.xml +++ b/parent/pom.xml @@ -1262,6 +1262,11 @@ <artifactId>camel-docker</artifactId> <version>${project.version}</version> </dependency> + <dependency> + <groupId>org.apache.camel</groupId> + <artifactId>camel-docling</artifactId> + <version>${project.version}</version> + </dependency> <dependency> <groupId>org.apache.camel</groupId> <artifactId>camel-drill</artifactId>
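For readers tracing how DoclingProducer assembles its command line, the argument order in buildDoclingCommand() and the output file name that readGeneratedOutputFile() looks for can be sketched as a standalone program. This is a minimal sketch, not code from the commit: the class name DoclingCommandSketch and its method names are hypothetical, the flags (--to, --ocr-lang, --no-ocr, --show-layout, --output) are copied from the diff above rather than checked against the Docling CLI, and the CamelDoclingCustomArguments and CamelDoclingOutputFilePath headers are left out. It also lower-cases with Locale.ROOT, where the commit uses the default locale.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical standalone sketch; it mirrors logic shown in the diff but is not part of the commit.
class DoclingCommandSketch {

    // Mirrors mapToDoclingFormat(): the value passed after "--to".
    static String toDoclingFormat(String outputFormat) {
        switch (outputFormat.toLowerCase(Locale.ROOT)) {
            case "html":
                return "html";
            case "json":
                return "json";
            case "text":
                return "text";
            default:
                return "md"; // "markdown" and any unknown value fall back to markdown
        }
    }

    // Mirrors getOutputFileExtension(): note that "text" maps to the "txt" extension.
    static String toFileExtension(String outputFormat) {
        switch (outputFormat.toLowerCase(Locale.ROOT)) {
            case "html":
                return "html";
            case "json":
                return "json";
            case "text":
                return "txt";
            default:
                return "md";
        }
    }

    // Same ordering as buildDoclingCommand(): format, OCR, layout, output directory, input last.
    static List<String> buildCommand(String executable, String inputPath, String outputFormat,
                                     boolean enableOcr, String ocrLanguage,
                                     boolean includeLayoutInfo, String outputDir) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.add("--to");
        command.add(toDoclingFormat(outputFormat));
        if (!enableOcr) {
            command.add("--no-ocr");
        } else if (ocrLanguage != null) {
            command.add("--ocr-lang");
            command.add(ocrLanguage);
        }
        if (includeLayoutInfo) {
            command.add("--show-layout");
        }
        command.add("--output");
        command.add(outputDir);
        command.add(inputPath); // positional argument, kept last as in the commit
        return command;
    }

    // Mirrors the base-name logic: strip the last extension, append the mapped one.
    static String expectedOutputFileName(String inputPath, String outputFormat) {
        String baseName = Paths.get(inputPath).getFileName().toString();
        int lastDot = baseName.lastIndexOf('.');
        if (lastDot > 0) {
            baseName = baseName.substring(0, lastDot);
        }
        return baseName + "." + toFileExtension(outputFormat);
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("docling", "/tmp/report.pdf", "markdown",
                true, "en", false, "/tmp/out"));
        System.out.println(expectedOutputFileName("/tmp/report.pdf", "text"));
    }
}
```

With the defaults from DoclingConfiguration (OCR enabled, language "en", no layout info), the sketch builds docling --to md --ocr-lang en --output /tmp/out /tmp/report.pdf. When OCR is disabled, --no-ocr replaces the --ocr-lang pair, so the configured language is ignored in that case.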
