Re: [PR] Add an example of data extraction with Quarkus LangChain4j [camel-quarkus-examples]

via GitHub Tue, 03 Sep 2024 09:31:49 -0700


jamesnetherton commented on code in PR #261:
URL: https://github.com/apache/camel-quarkus-examples/pull/261#discussion_r1742346077



##########
data-extract-langchain4j/README.adoc:
##########
@@ -0,0 +1,126 @@
+= Unstructured Data Extraction with LangChain4j: A Camel Quarkus example
+:cq-example-description: An example that shows how to convert unstructured 
text data to structured Java objects helped with a Large Language Model and 
LangChain4j
+
+{cq-description}
+
+TIP: Check the 
https://camel.apache.org/camel-quarkus/latest/first-steps.html[Camel Quarkus 
User guide] for prerequisites
+and other general information.
+
+Suppose the volume of 
https://en.wikipedia.org/wiki/Unstructured_data[unstructured data] grows at a 
high pace in a given organization.
+How could one transform those disseminated gold particles into a conform 
bullion that could be used in banks.
+For instance, let's imagine an insurance company that would record the 
transcripts of the conversation when customers are discussing with the hotline.
+There is probably a lot of valuable information that could be extracted from 
those conversation transcripts.
+In this example, we'll convert those text conversations into Java Objects that 
could then be used in the rest of the Camel route.
+
+In order to achieve this extraction, we'll need a 
https://en.wikipedia.org/wiki/Large_language_model[Large Language Model (LLM)] 
that natively support JSON output.
+Here, we arbitrarily choose https://ollama.com/library/codellama[codellama] 
served through https://ollama.com/[ollama].
+In order to invoke the served model, we'll use the high-level LangChain4j APIs 
like https://docs.langchain4j.dev/tutorials/ai-services[AiServices].
+As we are using the Quarkus runtime, we can leverage all the advantages of the 
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus 
LangChain4j extension].
+
+=== Start the Large Language Model
+
+Let's start a container to serve the LLM with Ollama:
+
+[source,shell]
+----
+docker run -p11434:11434 langchain4j/ollama-codellama:latest
+----
+
+After a moment, a log like below should be output:
+
+[source,shell]
+----
+time=2024-09-03T08:03:15.532Z level=INFO source=types.go:98 msg="inference 
compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.5 GiB" 
available="54.4 GiB"
+----
+
+That's it, the LLM is now ready to serve our data extraction requests.
+
+=== Package and run the application
+
+You are now ready to package and run the application.
+
+TIP: Find more details about the JVM mode and Native mode in the Package and 
run section of
+https://camel.apache.org/camel-quarkus/latest/first-steps.html#_package_and_run_the_application[Camel
 Quarkus User guide]
+
+==== JVM mode
+
+[source,shell]
+----
+mvn clean package -DskipTests
+java -jar target/quarkus-app/quarkus-run.jar
+----
+
+==== Extracting data from unstructured conversation
+
+Let's atomically copy/move the transcript files to the input folder named 
`target/transcripts/`, for instance like below:
+
+[source,shell]
+----
+cp -rf src/test/resources/transcripts/ target/transcripts-tmp
+mv target/transcripts-tmp/*.json target/transcripts/
+----
+
+The Camel route should output a log as below:
+
+[source,shell]
+----
+024-09-03 10:14:34,757 INFO  [route1] (Camel (camel-1) thread #1 - 
file://target/transcripts) A document has been received by the 
camel-quarkus-file extension: {
+  "id": 1,
+  "content": "Operator: Hello, how may I help you ?\nCustomer: Hello, I'm 
calling because I need to declare an accident on my main vehicle.\nOperator: 
Ok, can you please give me your name ?\nCustomer: My name is Sarah 
London.\nOperator: Could you please give me your birth date ?\nCustomer: 1986, 
July the 10th.\nOperator: Ok, I've got your contract and I'm happy to share 
with you that we'll be able to reimburse all expenses linked to this 
accident.\nCustomer: Oh great, many thanks."
+}
+----
+
+In the first log above, we can see that a JSON file handling transcript 
related information has been consumed.
+The conversation is present in the JSON field named `content`.
+This content will be injected into the LLM prompt.
+
+After a few seconds or minutes depending on your hardware setup, the LLM 
provides an answer strictly conforming to the expected JSON schema.
+It's now easy for LangChain4j to convert the returned JSON into a Java Object.
+At the end of the day, we are provided with a Plain Old Java Object (POJO) 
handling the extracted data like below.

Review Comment:
   Nitpick - I try to avoid clichés. They can be confusing for folks who are 
not so familiar with English.
   
   ```suggestion
   At the end, we are provided with a Plain Old Java Object (POJO) handling the extracted data like below.
   ```



##########
data-extract-langchain4j/README.adoc:
##########
@@ -0,0 +1,126 @@
+= Unstructured Data Extraction with LangChain4j: A Camel Quarkus example
+:cq-example-description: An example that shows how to convert unstructured 
text data to structured Java objects helped with a Large Language Model and 
LangChain4j
+
+{cq-description}
+
+TIP: Check the 
https://camel.apache.org/camel-quarkus/latest/first-steps.html[Camel Quarkus 
User guide] for prerequisites
+and other general information.
+
+Suppose the volume of 
https://en.wikipedia.org/wiki/Unstructured_data[unstructured data] grows at a 
high pace in a given organization.
+How could one transform those disseminated gold particles into a conform 
bullion that could be used in banks.
+For instance, let's imagine an insurance company that would record the 
transcripts of the conversation when customers are discussing with the hotline.
+There is probably a lot of valuable information that could be extracted from 
those conversation transcripts.
+In this example, we'll convert those text conversations into Java Objects that 
could then be used in the rest of the Camel route.
+
+In order to achieve this extraction, we'll need a 
https://en.wikipedia.org/wiki/Large_language_model[Large Language Model (LLM)] 
that natively support JSON output.

Review Comment:
   ```suggestion
   In order to achieve this extraction, we'll need a https://en.wikipedia.org/wiki/Large_language_model[Large Language Model (LLM)] that natively supports JSON output.
   ```



##########
data-extract-langchain4j/README.adoc:
##########
@@ -0,0 +1,126 @@
+= Unstructured Data Extraction with LangChain4j: A Camel Quarkus example
+:cq-example-description: An example that shows how to convert unstructured 
text data to structured Java objects helped with a Large Language Model and 
LangChain4j
+
+{cq-description}
+
+TIP: Check the 
https://camel.apache.org/camel-quarkus/latest/first-steps.html[Camel Quarkus 
User guide] for prerequisites
+and other general information.
+
+Suppose the volume of 
https://en.wikipedia.org/wiki/Unstructured_data[unstructured data] grows at a 
high pace in a given organization.
+How could one transform those disseminated gold particles into a conform 
bullion that could be used in banks.
+For instance, let's imagine an insurance company that would record the 
transcripts of the conversation when customers are discussing with the hotline.
+There is probably a lot of valuable information that could be extracted from 
those conversation transcripts.
+In this example, we'll convert those text conversations into Java Objects that 
could then be used in the rest of the Camel route.
+
+In order to achieve this extraction, we'll need a 
https://en.wikipedia.org/wiki/Large_language_model[Large Language Model (LLM)] 
that natively support JSON output.
+Here, we arbitrarily choose https://ollama.com/library/codellama[codellama] 
served through https://ollama.com/[ollama].
+In order to invoke the served model, we'll use the high-level LangChain4j APIs 
like https://docs.langchain4j.dev/tutorials/ai-services[AiServices].
+As we are using the Quarkus runtime, we can leverage all the advantages of the 
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus 
LangChain4j extension].
+
+=== Start the Large Language Model
+
+Let's start a container to serve the LLM with Ollama:
+
+[source,shell]
+----
+docker run -p11434:11434 langchain4j/ollama-codellama:latest
+----
+
+After a moment, a log like below should be output:
+
+[source,shell]
+----
+time=2024-09-03T08:03:15.532Z level=INFO source=types.go:98 msg="inference 
compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.5 GiB" 
available="54.4 GiB"
+----
+
+That's it, the LLM is now ready to serve our data extraction requests.
+
+=== Package and run the application
+
+You are now ready to package and run the application.
+
+TIP: Find more details about the JVM mode and Native mode in the Package and 
run section of
+https://camel.apache.org/camel-quarkus/latest/first-steps.html#_package_and_run_the_application[Camel
 Quarkus User guide]
+
+==== JVM mode
+
+[source,shell]
+----
+mvn clean package -DskipTests
+java -jar target/quarkus-app/quarkus-run.jar
+----
+
+==== Extracting data from unstructured conversation
+
+Let's atomically copy/move the transcript files to the input folder named 
`target/transcripts/`, for instance like below:
+
+[source,shell]
+----
+cp -rf src/test/resources/transcripts/ target/transcripts-tmp
+mv target/transcripts-tmp/*.json target/transcripts/
+----
+
+The Camel route should output a log as below:
+
+[source,shell]
+----
+024-09-03 10:14:34,757 INFO  [route1] (Camel (camel-1) thread #1 - 
file://target/transcripts) A document has been received by the 
camel-quarkus-file extension: {
+  "id": 1,
+  "content": "Operator: Hello, how may I help you ?\nCustomer: Hello, I'm 
calling because I need to declare an accident on my main vehicle.\nOperator: 
Ok, can you please give me your name ?\nCustomer: My name is Sarah 
London.\nOperator: Could you please give me your birth date ?\nCustomer: 1986, 
July the 10th.\nOperator: Ok, I've got your contract and I'm happy to share 
with you that we'll be able to reimburse all expenses linked to this 
accident.\nCustomer: Oh great, many thanks."
+}
+----
+
+In the first log above, we can see that a JSON file handling transcript 
related information has been consumed.
+The conversation is present in the JSON field named `content`.
+This content will be injected into the LLM prompt.
+
+After a few seconds or minutes depending on your hardware setup, the LLM 
provides an answer strictly conforming to the expected JSON schema.
+It's now easy for LangChain4j to convert the returned JSON into a Java Object.
+At the end of the day, we are provided with a Plain Old Java Object (POJO) 
handling the extracted data like below.
+
+[source,shell]
+----
+2024-09-03 10:14:51,284 INFO  [org.acm.ext.CustomPojoStore] (Camel (camel-1) 
thread #1 - file://target/transcripts) An extracted POJO has been added to the 
store: 
+{
+    "customerSatisfied": "true",
+    "customerName": "Sarah London",
+    "customerBirthday": "10 July 1986",
+    "summary": "Declare an accident on main vehicle and receive reimbursement 
for expenses."
+}
+----
+
+See how the LLM shows its capacity to:
+ * Extract a human friendly sentiment like `customerSatisfied`
+ * Exhibits 
https://nlp.stanford.edu/projects/coref.shtml#:~:text=Overview,question%20answering%2C%20and%20information%20extraction.[coreference
 resolution], like `customerName` that is deduced from information spread in 
the whole conversation
+ * Manage issues related to date format, like the field `customerBirthday`
+ * Mixed structured and unstructured data (semi-structured data) with the 
field `summary`.
+
+Cherry on the cake, all those informations are computed simultaneously during 
a single LLM inference.
+
+At the end of the day, the application should have extracted 3 POJOs.

Review Comment:
   ```suggestion
   At the end, the application should have extracted 3 POJOs.
   ```



##########
data-extract-langchain4j/pom.xml:
##########
@@ -0,0 +1,329 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+         http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>camel-quarkus-examples-data-extract-langchain4j</artifactId>
+    <groupId>org.apache.camel.quarkus.examples</groupId>
+    <version>3.15.0-SNAPSHOT</version>
+
+    <name>Camel Quarkus :: Examples :: Data Extract LangChain4j 
Repository</name>
+    <description>Camel Quarkus Example :: Data Extract LangChain4j 
Repository</description>
+
+    <properties>
+
+        <!-- Cannot upgrade to Quarkus Platform 3.14.1 because of 
https://github.com/apache/camel-quarkus/issues/6341 -->    
+        <quarkus.platform.version>3.14.0</quarkus.platform.version>
+        
<camel-quarkus.platform.version>3.15.0-SNAPSHOT</camel-quarkus.platform.version>
+
+        <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
+        
<quarkus.platform.artifact-id>quarkus-bom</quarkus.platform.artifact-id>
+        
<camel-quarkus.platform.group-id>org.apache.camel.quarkus</camel-quarkus.platform.group-id>
+        
<camel-quarkus.platform.artifact-id>camel-quarkus-bom</camel-quarkus.platform.artifact-id>
+
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
+        <maven.compiler.target>17</maven.compiler.target>
+        <maven.compiler.source>17</maven.compiler.source>
+        
<maven.compiler.testTarget>${maven.compiler.target}</maven.compiler.testTarget>
+        
<maven.compiler.testSource>${maven.compiler.source}</maven.compiler.testSource>
+
+        <formatter-maven-plugin.version>2.24.1</formatter-maven-plugin.version>
+        <groovy-maven-plugin.version>2.1.1</groovy-maven-plugin.version>
+        <impsort-maven-plugin.version>1.11.0</impsort-maven-plugin.version>
+        <license-maven-plugin.version>4.5</license-maven-plugin.version>
+        <maven-compiler-plugin.version>3.13.0</maven-compiler-plugin.version>
+        <maven-jar-plugin.version>3.4.2</maven-jar-plugin.version>
+        <maven-resources-plugin.version>3.3.1</maven-resources-plugin.version>
+        <maven-surefire-plugin.version>3.4.0</maven-surefire-plugin.version>
+    </properties>
+
+    <dependencyManagement>
+        <dependencies>
+            <!-- Import BOM -->
+            <dependency>
+                <groupId>io.quarkiverse.langchain4j</groupId>
+                <artifactId>quarkus-langchain4j-bom</artifactId>
+                <version>0.17.2</version>

Review Comment:
   Please add a version property for this.
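   
   A minimal sketch of what this could look like. The property name `quarkus-langchain4j.version` is only an assumption for illustration; any name consistent with the other `*.version` properties in this POM would do:

   ```xml
   <!-- In <properties>: hypothetical property name, value copied from the diff above -->
   <quarkus-langchain4j.version>0.17.2</quarkus-langchain4j.version>

   <!-- In <dependencyManagement>: the BOM import then references the property -->
   <dependency>
       <groupId>io.quarkiverse.langchain4j</groupId>
       <artifactId>quarkus-langchain4j-bom</artifactId>
       <version>${quarkus-langchain4j.version}</version>
       <type>pom</type>
       <scope>import</scope>
   </dependency>
   ```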



##########
data-extract-langchain4j/pom.xml:
##########
@@ -0,0 +1,329 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+         http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>camel-quarkus-examples-data-extract-langchain4j</artifactId>
+    <groupId>org.apache.camel.quarkus.examples</groupId>
+    <version>3.15.0-SNAPSHOT</version>
+
+    <name>Camel Quarkus :: Examples :: Data Extract LangChain4j 
Repository</name>
+    <description>Camel Quarkus Example :: Data Extract LangChain4j 
Repository</description>
+
+    <properties>
+
+        <!-- Cannot upgrade to Quarkus Platform 3.14.1 because of 
https://github.com/apache/camel-quarkus/issues/6341 -->    
+        <quarkus.platform.version>3.14.0</quarkus.platform.version>
+        
<camel-quarkus.platform.version>3.15.0-SNAPSHOT</camel-quarkus.platform.version>
+
+        <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
+        
<quarkus.platform.artifact-id>quarkus-bom</quarkus.platform.artifact-id>
+        
<camel-quarkus.platform.group-id>org.apache.camel.quarkus</camel-quarkus.platform.group-id>
+        
<camel-quarkus.platform.artifact-id>camel-quarkus-bom</camel-quarkus.platform.artifact-id>
+
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
+        <maven.compiler.target>17</maven.compiler.target>
+        <maven.compiler.source>17</maven.compiler.source>
+        
<maven.compiler.testTarget>${maven.compiler.target}</maven.compiler.testTarget>
+        
<maven.compiler.testSource>${maven.compiler.source}</maven.compiler.testSource>
+
+        <formatter-maven-plugin.version>2.24.1</formatter-maven-plugin.version>
+        <groovy-maven-plugin.version>2.1.1</groovy-maven-plugin.version>
+        <impsort-maven-plugin.version>1.11.0</impsort-maven-plugin.version>
+        <license-maven-plugin.version>4.5</license-maven-plugin.version>
+        <maven-compiler-plugin.version>3.13.0</maven-compiler-plugin.version>
+        <maven-jar-plugin.version>3.4.2</maven-jar-plugin.version>
+        <maven-resources-plugin.version>3.3.1</maven-resources-plugin.version>
+        <maven-surefire-plugin.version>3.4.0</maven-surefire-plugin.version>
+    </properties>
+
+    <dependencyManagement>
+        <dependencies>
+            <!-- Import BOM -->
+            <dependency>
+                <groupId>io.quarkiverse.langchain4j</groupId>
+                <artifactId>quarkus-langchain4j-bom</artifactId>
+                <version>0.17.2</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+            <dependency>
+                <groupId>${quarkus.platform.group-id}</groupId>
+                <artifactId>${quarkus.platform.artifact-id}</artifactId>
+                <version>${quarkus.platform.version}</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+            <dependency>
+                <groupId>${camel-quarkus.platform.group-id}</groupId>
+                <artifactId>${camel-quarkus.platform.artifact-id}</artifactId>
+                <version>${camel-quarkus.platform.version}</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+        </dependencies>
+    </dependencyManagement>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-bean</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-file</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-jsonpath</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-platform-http</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>io.quarkiverse.langchain4j</groupId>
+            <artifactId>quarkus-langchain4j-ollama</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>commons-io</groupId>
+            <artifactId>commons-io</artifactId>

Review Comment:
   Should this be test scoped?
   
   If so, IIRC it is already a transitive dependency of the Quarkus test framework, so it could probably be removed entirely.
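   
   If the dependency turns out to be needed by the tests only, the test-scoped variant could be sketched as below. This assumes the version stays managed by one of the imported BOMs, since no `<version>` appears in the diff above:

   ```xml
   <dependency>
       <groupId>commons-io</groupId>
       <artifactId>commons-io</artifactId>
       <!-- restricts the dependency to the test classpath -->
       <scope>test</scope>
   </dependency>
   ```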



##########
data-extract-langchain4j/README.adoc:
##########
@@ -0,0 +1,126 @@
+= Unstructured Data Extraction with LangChain4j: A Camel Quarkus example
+:cq-example-description: An example that shows how to convert unstructured 
text data to structured Java objects helped with a Large Language Model and 
LangChain4j
+
+{cq-description}
+
+TIP: Check the 
https://camel.apache.org/camel-quarkus/latest/first-steps.html[Camel Quarkus 
User guide] for prerequisites
+and other general information.
+
+Suppose the volume of 
https://en.wikipedia.org/wiki/Unstructured_data[unstructured data] grows at a 
high pace in a given organization.
+How could one transform those disseminated gold particles into a conform 
bullion that could be used in banks.
+For instance, let's imagine an insurance company that would record the 
transcripts of the conversation when customers are discussing with the hotline.
+There is probably a lot of valuable information that could be extracted from 
those conversation transcripts.
+In this example, we'll convert those text conversations into Java Objects that 
could then be used in the rest of the Camel route.
+
+In order to achieve this extraction, we'll need a 
https://en.wikipedia.org/wiki/Large_language_model[Large Language Model (LLM)] 
that natively support JSON output.
+Here, we arbitrarily choose https://ollama.com/library/codellama[codellama] 
served through https://ollama.com/[ollama].
+In order to invoke the served model, we'll use the high-level LangChain4j APIs 
like https://docs.langchain4j.dev/tutorials/ai-services[AiServices].
+As we are using the Quarkus runtime, we can leverage all the advantages of the 
https://docs.quarkiverse.io/quarkus-langchain4j/dev/index.html[Quarkus 
LangChain4j extension].
+
+=== Start the Large Language Model
+
+Let's start a container to serve the LLM with Ollama:
+
+[source,shell]
+----
+docker run -p11434:11434 langchain4j/ollama-codellama:latest
+----
+
+After a moment, a log like below should be output:
+
+[source,shell]
+----
+time=2024-09-03T08:03:15.532Z level=INFO source=types.go:98 msg="inference 
compute" id=0 library=cpu compute="" driver=0.0 name="" total="62.5 GiB" 
available="54.4 GiB"
+----
+
+That's it, the LLM is now ready to serve our data extraction requests.
+
+=== Package and run the application
+
+You are now ready to package and run the application.
+
+TIP: Find more details about the JVM mode and Native mode in the Package and 
run section of
+https://camel.apache.org/camel-quarkus/latest/first-steps.html#_package_and_run_the_application[Camel
 Quarkus User guide]
+
+==== JVM mode
+
+[source,shell]
+----
+mvn clean package -DskipTests
+java -jar target/quarkus-app/quarkus-run.jar
+----
+
+==== Extracting data from unstructured conversation
+
+Let's atomically copy/move the transcript files to the input folder named 
`target/transcripts/`, for instance like below:
+
+[source,shell]
+----
+cp -rf src/test/resources/transcripts/ target/transcripts-tmp
+mv target/transcripts-tmp/*.json target/transcripts/
+----
+
+The Camel route should output a log as below:
+
+[source,shell]
+----
+024-09-03 10:14:34,757 INFO  [route1] (Camel (camel-1) thread #1 - 
file://target/transcripts) A document has been received by the 
camel-quarkus-file extension: {
+  "id": 1,
+  "content": "Operator: Hello, how may I help you ?\nCustomer: Hello, I'm 
calling because I need to declare an accident on my main vehicle.\nOperator: 
Ok, can you please give me your name ?\nCustomer: My name is Sarah 
London.\nOperator: Could you please give me your birth date ?\nCustomer: 1986, 
July the 10th.\nOperator: Ok, I've got your contract and I'm happy to share 
with you that we'll be able to reimburse all expenses linked to this 
accident.\nCustomer: Oh great, many thanks."
+}
+----
+
+In the first log above, we can see that a JSON file handling transcript 
related information has been consumed.
+The conversation is present in the JSON field named `content`.
+This content will be injected into the LLM prompt.
+
+After a few seconds or minutes depending on your hardware setup, the LLM 
provides an answer strictly conforming to the expected JSON schema.
+It's now easy for LangChain4j to convert the returned JSON into a Java Object.
+At the end of the day, we are provided with a Plain Old Java Object (POJO) 
handling the extracted data like below.
+
+[source,shell]
+----
+2024-09-03 10:14:51,284 INFO  [org.acm.ext.CustomPojoStore] (Camel (camel-1) 
thread #1 - file://target/transcripts) An extracted POJO has been added to the 
store: 
+{
+    "customerSatisfied": "true",
+    "customerName": "Sarah London",
+    "customerBirthday": "10 July 1986",
+    "summary": "Declare an accident on main vehicle and receive reimbursement 
for expenses."
+}
+----
+
+See how the LLM shows its capacity to:
+ * Extract a human friendly sentiment like `customerSatisfied`
+ * Exhibits 
https://nlp.stanford.edu/projects/coref.shtml#:~:text=Overview,question%20answering%2C%20and%20information%20extraction.[coreference
 resolution], like `customerName` that is deduced from information spread in 
the whole conversation
+ * Manage issues related to date format, like the field `customerBirthday`
+ * Mixed structured and unstructured data (semi-structured data) with the 
field `summary`.
+
+Cherry on the cake, all those informations are computed simultaneously during 
a single LLM inference.
+
+At the end of the day, the application should have extracted 3 POJOs.
+For each of them, it could be interesting to compare the unstructured input 
text and the corresponding structured POJO.
+
+More details could be found in the 
`src/main/java/org/acme/extraction/CustomPojoExtractionService.java` class.

Review Comment:
   ```suggestion
   More details can be found in the `src/main/java/org/acme/extraction/CustomPojoExtractionService.java` class.
   ```



##########
data-extract-langchain4j/pom.xml:
##########
@@ -0,0 +1,329 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+         http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <modelVersion>4.0.0</modelVersion>
+
+    <artifactId>camel-quarkus-examples-data-extract-langchain4j</artifactId>
+    <groupId>org.apache.camel.quarkus.examples</groupId>
+    <version>3.15.0-SNAPSHOT</version>
+
+    <name>Camel Quarkus :: Examples :: Data Extract LangChain4j 
Repository</name>
+    <description>Camel Quarkus Example :: Data Extract LangChain4j 
Repository</description>
+
+    <properties>
+
+        <!-- Cannot upgrade to Quarkus Platform 3.14.1 because of 
https://github.com/apache/camel-quarkus/issues/6341 -->    
+        <quarkus.platform.version>3.14.0</quarkus.platform.version>
+        
<camel-quarkus.platform.version>3.15.0-SNAPSHOT</camel-quarkus.platform.version>
+
+        <quarkus.platform.group-id>io.quarkus</quarkus.platform.group-id>
+        
<quarkus.platform.artifact-id>quarkus-bom</quarkus.platform.artifact-id>
+        
<camel-quarkus.platform.group-id>org.apache.camel.quarkus</camel-quarkus.platform.group-id>
+        
<camel-quarkus.platform.artifact-id>camel-quarkus-bom</camel-quarkus.platform.artifact-id>
+
+        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
+        
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
+        <maven.compiler.target>17</maven.compiler.target>
+        <maven.compiler.source>17</maven.compiler.source>
+        
<maven.compiler.testTarget>${maven.compiler.target}</maven.compiler.testTarget>
+        
<maven.compiler.testSource>${maven.compiler.source}</maven.compiler.testSource>
+
+        <formatter-maven-plugin.version>2.24.1</formatter-maven-plugin.version>
+        <groovy-maven-plugin.version>2.1.1</groovy-maven-plugin.version>
+        <impsort-maven-plugin.version>1.11.0</impsort-maven-plugin.version>
+        <license-maven-plugin.version>4.5</license-maven-plugin.version>
+        <maven-compiler-plugin.version>3.13.0</maven-compiler-plugin.version>
+        <maven-jar-plugin.version>3.4.2</maven-jar-plugin.version>
+        <maven-resources-plugin.version>3.3.1</maven-resources-plugin.version>
+        <maven-surefire-plugin.version>3.4.0</maven-surefire-plugin.version>
+    </properties>
+
+    <dependencyManagement>
+        <dependencies>
+            <!-- Import BOM -->
+            <dependency>
+                <groupId>io.quarkiverse.langchain4j</groupId>
+                <artifactId>quarkus-langchain4j-bom</artifactId>
+                <version>0.17.2</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+            <dependency>
+                <groupId>${quarkus.platform.group-id}</groupId>
+                <artifactId>${quarkus.platform.artifact-id}</artifactId>
+                <version>${quarkus.platform.version}</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+            <dependency>
+                <groupId>${camel-quarkus.platform.group-id}</groupId>
+                <artifactId>${camel-quarkus.platform.artifact-id}</artifactId>
+                <version>${camel-quarkus.platform.version}</version>
+                <type>pom</type>
+                <scope>import</scope>
+            </dependency>
+        </dependencies>
+    </dependencyManagement>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-bean</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-file</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-jsonpath</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.camel.quarkus</groupId>
+            <artifactId>camel-quarkus-platform-http</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>io.quarkiverse.langchain4j</groupId>
+            <artifactId>quarkus-langchain4j-ollama</artifactId>
+        </dependency>
+        <dependency>
+            <groupId>commons-io</groupId>
+            <artifactId>commons-io</artifactId>
+        </dependency>
+
+        <!-- Test -->
+        <dependency>
+            <groupId>io.quarkus</groupId>
+            <artifactId>quarkus-junit5</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.awaitility</groupId>
+            <artifactId>awaitility</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>io.rest-assured</groupId>
+            <artifactId>rest-assured</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.testcontainers</groupId>
+            <artifactId>testcontainers</artifactId>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.wiremock</groupId>
+            <artifactId>wiremock-standalone</artifactId>
+            <version>3.9.1</version>

Review Comment:
   Needs a version property. Also, please add WireMock to the Dependabot config.
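   
   For the version property, a rough sketch could be as follows. The property name `wiremock.version` is an assumption for illustration, and the `test` scope is assumed from the surrounding `<!-- Test -->` block since the quoted diff stops at the version line:

   ```xml
   <!-- In <properties>: hypothetical property name, value copied from the diff above -->
   <wiremock.version>3.9.1</wiremock.version>

   <!-- In <dependencies>: the dependency then references the property -->
   <dependency>
       <groupId>org.wiremock</groupId>
       <artifactId>wiremock-standalone</artifactId>
       <version>${wiremock.version}</version>
       <!-- assumed scope, not visible in the quoted diff -->
       <scope>test</scope>
   </dependency>
   ```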



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

