[ 
https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938216#comment-17938216
 ] 

Tim Allison commented on TIKA-4395:
-----------------------------------

This is really hard to diagnose without the file, though I understand the limitations on sharing.

a) Are you getting any exceptions?
b) Do you get content if you turn on streaming? Streaming should be more robust 
to differences in structure than the default DOM-based parser.
c) Does it look like you're getting something from all the slides? Or does it 
look like the iteration through the slides is the problem?
d) How specifically are you calling Tika?
e) Have you tried with the 3.x branch? I don't think there's anything 
significantly different for pptx, but newer should be better.
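
For (c), one way to check whether the slide parts contain any text at all, 
independent of Tika, is to open the pptx as a zip and count the text runs in 
each slide part. What follows is only a minimal sketch using the JDK alone 
(Java 11 or later). The class name PptxSlideTextCheck and the method 
countTextRuns are made up for illustration, and the sketch assumes the slide 
XML uses the conventional "a:" prefix for DrawingML text runs (<a:t>), which 
a given file is not required to do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper; not part of Tika or POI.
public class PptxSlideTextCheck {

    // Matches text runs such as <a:t>Hello</a:t>.
    // Assumption: the file uses the usual "a" prefix for the DrawingML namespace.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<a:t(?:\\s[^>]*)?>([^<]*)</a:t>");

    /** Returns a map of slide part name to the number of non-blank text runs in it. */
    public static Map<String, Integer> countTextRuns(InputStream pptx) throws IOException {
        // TreeMap orders part names as strings, so slide10.xml sorts before slide2.xml.
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Only look at slide parts; notes, layouts and masters are skipped.
                if (!name.matches("ppt/slides/slide\\d+\\.xml")) {
                    continue;
                }
                // readAllBytes() stops at the end of the current zip entry.
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                Matcher m = TEXT_RUN.matcher(xml);
                int runs = 0;
                while (m.find()) {
                    if (!m.group(1).isBlank()) {
                        runs++;
                    }
                }
                counts.put(name, runs);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PptxSlideTextCheck <file.pptx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            countTextRuns(in).forEach((slide, n) -> System.out.println(slide + "\t" + n));
        }
    }
}
```

If every slide reports 0 while the deck visibly has text, the text probably 
lives somewhere other than plain <a:t> runs in the slide parts (for example 
in images or embedded objects), which would help narrow down where the 
extraction is going wrong.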

If you're able to share the file with me personally, I can send you my email 
address, but I understand if this is not possible.

I'll look at the pptx code and see what we're doing.

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3
>            Reporter: james
>            Priority: Major
>
> i have a reasonably large pptx file from which i don't get any slide content. 
>  i get slide notes, and some ocr from embedded images, but not the slide 
> content itself.  unfortunately, i cannot share the file, but i can answer 
> questions about it if necessary (and can probably share some of the internal 
> structure related files). 
>  
> using poi 5.4.0, not in streaming mode.



