Re: [PR] Improve Docling metadata retrieval: pageNumber and format (mimetype) [camel]

via GitHub Thu, 22 Jan 2026 04:44:06 -0800


apupier commented on code in PR #20979:
URL: https://github.com/apache/camel/pull/20979#discussion_r2716751965



##########
components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/integration/MetadataExtractionIT.java:
##########
@@ -79,8 +83,37 @@ public void testBasicMetadataExtraction() throws Exception {
         LOG.info("File name: {}", metadata.getFileName());
         LOG.info("File size: {} bytes", metadata.getFileSizeBytes());
     }
-
+    
     @Test
+    void testMetadataExtractionFromPdf() throws Exception {
+        Path testFile = createTestPdfFile();
+
+        DocumentMetadata metadata = template.requestBody("direct:extract-metadata",
+                testFile.toString(), DocumentMetadata.class);
+
+        assertNotNull(metadata, "Metadata should not be null");
+        assertNotNull(metadata.getFileName(), "File name should be extracted");
+        assertTrue(metadata.getFileSizeBytes() > 0, "File size should be greater than 0");
+        assertNotNull(metadata.getFilePath(), "File path should be set");
+        assertThat(metadata.getPageCount()).isEqualTo(5);
+        assertThat(metadata.getFormat()).isEqualTo("application/pdf");
+        // TODO: assertThat(metadata.getTitle()).isEqualTo("The Evolution of the Word Processor");
+        // TODO: assertThat(metadata.getDocumentType()).isEqualTo("PDF");

Review Comment:
   I'm not sure I understand the value of DocumentType, given that there is 
already a MIME type.



