[ https://issues.apache.org/jira/browse/TIKA-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009904#comment-18009904 ]
Peter Hoogendijk commented on TIKA-4458:
----------------------------------------

Thanks for the pointers. I'll test on Monday when I'm back at work (today is already the weekend for me). I'm testing with tika-app, but I expect it will work the same for tika-server. I'll let you know the result.

And thanks again for the support on my previous issues with the metadata: everything I needed is now available :).

> PDFParser with Tesseract: Improve documentation about embedded JP2 and JB2 files
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-4458
>                 URL: https://issues.apache.org/jira/browse/TIKA-4458
>             Project: Tika
>          Issue Type: Wish
>          Components: documentation, ocr
>    Affects Versions: 3.2.1
>            Reporter: Peter Hoogendijk
>            Priority: Minor
>
> When using Tika-app 3.2.1 with Tesseract 5.3.0 to parse PDF files with
> embedded JP2 and JB2 data, the following errors are reported:
> {code:java}
> ERROR [main] 20:26:27,356 org.apache.pdfbox.contentstream.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
> {code}
> Installing jai 1.1.3 and jai-imageio 1.1 in the OpenJDK 17 lib directory does
> not change the error messages. Is it enough to put the *.jar and *.so files
> in that directory, or is more required?
> Please provide instructions (or a link to existing instructions) on how to
> configure Apache Tika to solve this error. After a lot of searching I only
> found instructions on how to configure PDFBox (in pom.xml), but this does not
> solve the issue for Apache Tika. How do I translate the required PDFBox
> configuration sections to the Apache Tika configuration file?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)