[ https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943199#comment-17943199 ]

Tim Allison commented on TIKA-4395:
-----------------------------------

Yes, you're right. I was focusing on users who bypass the AutoDetectParser. But
there was still more to do, even for those using a TikaInputStream, which is the
default with AutoDetectParser and similar entry points.

I've pushed a fix to main and am working on cherry-picking it back to 3.x. I
don't think I'll have the time or patience to cherry-pick it back to 2.x.

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Assignee: Tim Allison
>            Priority: Major
>
> I have a reasonably large pptx file from which I don't get any slide content.
> I get the slide notes and some OCR output from embedded images, but not the
> slide content itself. Unfortunately, I cannot share the file, but I can answer
> questions about it if necessary (and can probably share some of the internal
> structure-related files).
>
> Using POI 5.4.0, not in streaming mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
