kbachuHighSpot commented on PR #1473: URL: https://github.com/apache/tika/pull/1473#issuecomment-2218995629
Thank you. That worked, but I have now run into a new issue after working through a few other hiccups. I am trying to parse a ppt file.

```
import java.io.InputStream;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.OfflineContentHandler;

// `input` and `writer` are defined elsewhere in the calling code.
TesseractOCRConfig config = new TesseractOCRConfig();
config.setSkipOcr(true);
ParseContext context = new ParseContext();
context.set(TesseractOCRConfig.class, config);

Parser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
OfflineContentHandler handler = new OfflineContentHandler(new BodyContentHandler(writer));

// Note: here we have to use TikaInputStream.get, otherwise certain content types
// (e.g. 2007 pptx) might not be correctly detected by the parser
try (InputStream original = TikaInputStream.get(input, metadata)) {
    parser.parse(original, handler, metadata, context);
    // ==> The call above crashes with:
    // Execution error (NoSuchMethodError) at org.apache.poi.util.IOUtils/toByteArray (IOUtils.java:241).
    // 'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.builder()'
}
```
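
A `NoSuchMethodError` like the one above generally means that the class loaded at run time is a different (usually older) version than the one the caller was compiled against. Here that would point at the commons-io jar on the classpath lacking `UnsynchronizedByteArrayOutputStream.builder()`, which POI's `IOUtils` calls; as far as I know that method only appeared in Commons IO 2.12.0, but treat the version as an assumption. The following is a minimal diagnostic sketch, not anything Tika or POI provides: `ClasspathCheck`, `locationOf` and `hasNoArgMethod` are hypothetical names introduced for illustration.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper (not part of Tika, POI, or Commons IO) for checking
// which jar a class comes from and whether it has a given no-arg method.
final class ClasspathCheck {

    // Returns the jar or directory the class was loaded from, or a
    // placeholder when the class is missing or has no code source
    // (as is typical for JDK classes).
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "(bootstrap/JDK or unknown)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    // True if the class can be loaded and exposes a public method with
    // this name and no parameters (static or instance).
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names taken from the error message above.
        String cls = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream";
        System.out.println("loaded from:   " + locationOf(cls));
        System.out.println("has builder(): " + hasNoArgMethod(cls, "builder"));
    }
}
```

Run with the same classpath as the failing application. If `has builder()` prints `false`, the reported location shows which commons-io jar is winning, and the build tool's dependency tree should show what pulls it in.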