[ 
https://issues.apache.org/jira/browse/TIKA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864420#comment-17864420
 ] 

ASF GitHub Bot commented on TIKA-3347:
--------------------------------------

kbachuHighSpot commented on PR #1473:
URL: https://github.com/apache/tika/pull/1473#issuecomment-2218995629

   Thank you. That worked, but after working through a few other hiccups I 
have now run into a new issue. 
   I am trying to parse a ppt file.
   
   ```
   import java.io.InputStream;
   
   import org.apache.tika.io.TikaInputStream;
   import org.apache.tika.metadata.Metadata;
   import org.apache.tika.parser.AutoDetectParser;
   import org.apache.tika.parser.ParseContext;
   import org.apache.tika.parser.Parser;
   import org.apache.tika.parser.ocr.TesseractOCRConfig;
   import org.apache.tika.sax.BodyContentHandler;
   import org.apache.tika.sax.OfflineContentHandler;
   
       // Disable OCR for this parse.
       TesseractOCRConfig config = new TesseractOCRConfig();
       config.setSkipOcr(true);
       ParseContext context = new ParseContext();
       context.set(TesseractOCRConfig.class, config);
   
       // `input` and `writer` are defined elsewhere (not shown in this snippet).
       Parser parser = new AutoDetectParser();
       Metadata metadata = new Metadata();
       OfflineContentHandler handler = new OfflineContentHandler(new BodyContentHandler(writer));
   
       // Note: here we have to use TikaInputStream.get, otherwise certain
       // content types (e.g. 2007 pptx) might not be correctly detected by the parser
       try (InputStream original = TikaInputStream.get(input, metadata)) {
         parser.parse(original, handler, metadata, context);
         // ==> The call above crashes with:
         //   Execution error (NoSuchMethodError) at org.apache.poi.util.IOUtils/toByteArray (IOUtils.java:241).
         //   'org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream$Builder org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.builder()'
       }
   ```
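
   A `NoSuchMethodError` like this usually means the class was compiled against 
a newer library than the one found at runtime. As far as I can tell, 
`UnsynchronizedByteArrayOutputStream.builder()` was only added in Commons IO 
2.12.0, so an older commons-io jar on the classpath is the likely cause, though 
that is an assumption I have not confirmed. A minimal diagnostic sketch follows; 
the class name `ClasspathCheck` and its helper methods are made up for 
illustration and are not part of Tika, POI, or Commons IO.

   ```java
   import java.security.CodeSource;

   public class ClasspathCheck {

       /** True if the class can be loaded and exposes a public no-arg method with this name. */
       static boolean hasNoArgMethod(String className, String methodName) {
           try {
               Class.forName(className).getMethod(methodName);
               return true;
           } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
               return false;
           }
       }

       /** Location (typically a jar URL) the class was loaded from, or a placeholder. */
       static String locationOf(String className) {
           try {
               CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
               return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
           } catch (ClassNotFoundException | LinkageError e) {
               return "<class not found>";
           }
       }

       public static void main(String[] args) {
           // Class and method named in the NoSuchMethodError above.
           String cls = "org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream";
           System.out.println("loaded from:   " + locationOf(cls));
           System.out.println("has builder(): " + hasNoArgMethod(cls, "builder"));
       }
   }
   ```

   Run with the same classpath as the failing application, this should print 
which jar supplies the class and whether that jar has `builder()`. If it does 
not, the build is probably resolving an older commons-io than POI expects.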




> Upgrade to PDFBox 3.x when available
> ------------------------------------
>
>                 Key: TIKA-3347
>                 URL: https://issues.apache.org/jira/browse/TIKA-3347
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> 3.0.0-RC1 was recently released.  We should integrate it on a dev branch asap 
> so that we can help with regression testing...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
