apupier commented on code in PR #20979:
URL: https://github.com/apache/camel/pull/20979#discussion_r2716751965
##########
components/camel-ai/camel-docling/src/test/java/org/apache/camel/component/docling/integration/MetadataExtractionIT.java:
##########
@@ -79,8 +83,37 @@ public void testBasicMetadataExtraction() throws Exception {
LOG.info("File name: {}", metadata.getFileName());
LOG.info("File size: {} bytes", metadata.getFileSizeBytes());
}
-
+
@Test
+ void testMetadataExtractionFromPdf() throws Exception {
+ Path testFile = createTestPdfFile();
+
+ DocumentMetadata metadata = template.requestBody("direct:extract-metadata",
+ testFile.toString(), DocumentMetadata.class);
+
+ assertNotNull(metadata, "Metadata should not be null");
+ assertNotNull(metadata.getFileName(), "File name should be extracted");
+ assertTrue(metadata.getFileSizeBytes() > 0, "File size should be greater than 0");
+ assertNotNull(metadata.getFilePath(), "File path should be set");
+ assertThat(metadata.getPageCount()).isEqualTo(5);
+ assertThat(metadata.getFormat()).isEqualTo("application/pdf");
+ // TODO: assertThat(metadata.getTitle()).isEqualTo("The Evolution of the Word Processor");
+ // TODO: assertThat(metadata.getDocumentType()).isEqualTo("PDF");
Review Comment:
I'm not sure I understand the purpose of DocumentType, given that the
metadata already carries a MIME type (the format asserted above as
"application/pdf").