[ 
https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison reopened TIKA-4395:
-------------------------------
      Assignee: Tim Allison

This should not be the default behavior. We should find a way to cache large 
files to disk even during detection.

Users who want "high performance" in-memory detection, and who accept the 
risk of a bad detection (pkg instead of ooxml) in exchange for not caching to 
disk, should have to configure that explicitly.
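
To illustrate why caching to disk can matter for detection, here is a minimal 
sketch in plain Java. It is not Tika's implementation: the class 
SpoolingDetector, its methods, and the labels it returns are hypothetical 
names made up for this example. It spools the whole stream to a temporary 
file and then checks for the [Content_Types].xml entry that OOXML packages 
carry. A zip's central directory sits at the end of the archive, so a 
detector that only sees a bounded in-memory prefix of a large file may not be 
able to list the entries at all.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Hypothetical illustration only; none of these names come from Tika.
final class SpoolingDetector {

    // Copy the entire stream to a temp file so that detection is not
    // limited to whatever prefix would fit in an in-memory buffer.
    static Path spoolToDisk(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("detect-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Returns "ooxml-like" if the zip contains [Content_Types].xml,
    // "zip" for any other readable zip, and "unknown" otherwise.
    static String detect(InputStream in) throws IOException {
        Path tmp = spoolToDisk(in);
        try (ZipFile zip = new ZipFile(tmp.toFile())) {
            // ZipFile reads the central directory from the end of the file,
            // which is only possible because the full file is on disk.
            if (zip.getEntry("[Content_Types].xml") != null) {
                return "ooxml-like";
            }
            return "zip";
        } catch (ZipException e) {
            // Not a readable zip archive.
            return "unknown";
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SpoolingDetector.java <path-to-file>
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(detect(in));
        }
    }
}
```

The sketch always spools, for simplicity. A size threshold that keeps small 
inputs in memory, and an opt-in setting that skips the disk cache entirely, 
would be closer to what is proposed above.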

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Assignee: Tim Allison
>            Priority: Major
>
> I have a reasonably large pptx file from which I don't get any slide content.
> I get slide notes, and some OCR from embedded images, but not the slide
> content itself. Unfortunately, I cannot share the file, but I can answer
> questions about it if necessary (and can probably share some of the internal
> structure-related files).
>
> Using POI 5.4.0, not in streaming mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
