[ https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17943199#comment-17943199 ]

Tim Allison commented on TIKA-4395:
-----------------------------------

Y, you're right. I was focusing on users who bypass that AutoDetectParser. But there was still more to do even for those using the TikaInputStream as is the default with AutoDetectParser etc.

I've pushed a fix to main. I'm working on cherry-picking that back to 3.x. I don't think I'll have time or patience to cherrypick that back to 2.x.

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>   Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Assignee: Tim Allison
>            Priority: Major
>
> i have a reasonably large pptx file from which i don't get any slide content.
> i get slide notes, and some ocr from embedded images, but not the slide
> content itself. unfortunately, i cannot share the file, but i can answer
> questions about it if necessary (and can probably share some of the internal
> structure related files).
>
> using poi 5.4.0, not in streaming mode.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)