[ https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864420#comment-17864420 ]
ASF GitHub Bot commented on TIKA-3347:
--------------------------------------

kbachuHighSpot commented on PR #1473:
URL: https://github.com/apache/tika/pull/1473#issuecomment-2218995629

   Thank you. That worked, but I have now run into a new issue after working through a few other hiccups. I am trying to parse a ppt file.

   ```
   import java.io.InputStream;

   import org.apache.tika.io.TikaInputStream;
   import org.apache.tika.metadata.Metadata;
   import org.apache.tika.parser.AutoDetectParser;
   import org.apache.tika.parser.ParseContext;
   import org.apache.tika.parser.Parser;
   import org.apache.tika.parser.ocr.TesseractOCRConfig;
   import org.apache.tika.sax.BodyContentHandler;
   import org.apache.tika.sax.OfflineContentHandler;

   // `writer` and `input` come from the surrounding code, which is not shown here.

   // Disable OCR for this parse.
   TesseractOCRConfig config = new TesseractOCRConfig();
   config.setSkipOcr(true);
   ParseContext context = new ParseContext();
   context.set(TesseractOCRConfig.class, config);

   Parser parser = new AutoDetectParser();
   Metadata metadata = new Metadata();
   OfflineContentHandler handler = new OfflineContentHandler(new BodyContentHandler(writer));

   // Note: here we have to use TikaInputStream.get, otherwise certain content types
   // (e.g. 2007 pptx) might not be correctly detected by the parser
   try (InputStream original = TikaInputStream.get(input, metadata)) {
       parser.parse(original, handler, metadata, context); // <== this call crashes
   }
   ```

   The `parser.parse(...)` call above crashes with:

   Execution error (NoSuchMethodError) at org.apache.poi.util.IOUtils/toByteArray (IOUtils.java:241).
   'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.builder()'


> Upgrade to PDFBox 3.x when available
> ------------------------------------
>
>                 Key: TIKA-3347
>                 URL: https://issues.apache.org/jira/browse/TIKA-3347
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> 3.0.0-RC1 was recently released. We should integrate it on a dev branch asap
> so that we can help with regression testing...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
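
A `NoSuchMethodError` like the one reported in the comment above typically means a class was compiled against a newer version of a library than the one found at runtime. Here the missing method is `UnsynchronizedByteArrayOutputStream.builder()` from commons-io, which to my knowledge first appeared in Commons IO 2.12.0, so an older commons-io jar on the classpath is a plausible cause (an assumption; nothing in this thread confirms it). The following is a minimal diagnostic sketch using only JDK reflection. The class `ClasspathCheck` and its helper method names are hypothetical and are not part of Tika, POI, or commons-io.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

public class ClasspathCheck {

    /** Returns where a class was loaded from (usually a jar URL), or a placeholder if unknown. */
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** True if the class can be loaded and exposes a public static no-arg method with this name. */
    public static boolean hasStaticNoArgMethod(String className, String methodName) {
        try {
            // Load without initializing, so the check itself has no side effects.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            Method method = cls.getMethod(methodName);
            return Modifier.isStatic(method.getModifiers());
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the error message in the report.
        String name = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream";
        try {
            Class<?> cls = Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            System.out.println(name + " loaded from: " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
        System.out.println("builder() available: " + hasStaticNoArgMethod(name, "builder"));
    }
}
```

Run this with the same classpath as the failing application. If the printed location points at an older commons-io jar and `builder()` is reported as unavailable, aligning the commons-io version with the one the POI and Tika artifacts expect would be the next thing to try.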