Peter Hoogendijk created TIKA-4458:
--------------------------------------

             Summary: PDFParser with Tesseract: Improve documentation about embedded JP2 and JB2 files
                 Key: TIKA-4458
                 URL: https://issues.apache.org/jira/browse/TIKA-4458
             Project: Tika
          Issue Type: Wish
          Components: parser
    Affects Versions: 3.2.1
            Reporter: Peter Hoogendijk

When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with embedded JP2 and JB2 data, the following errors are reported:

{code:java}
ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
{code}

Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does not change the error messages.

Please provide instructions (or a link to existing instructions) on how to configure Apache Tika to resolve this error. After a lot of searching, I found only instructions for configuring PDFBox (in pom.xml), but these do not solve the issue for Apache Tika. How do I translate the required PDFBox configuration sections into the Apache Tika configuration file?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)