[ https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison reopened TIKA-4395:
-------------------------------

    Assignee: Tim Allison

This should not be the default behavior. We should find a way to cache to disk for large files even during detection. If users want "high performance" in memory and don't want to cache to disk at the risk of getting a bad detection (pkg instead of ooxml), they should configure that.

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Assignee: Tim Allison
>            Priority: Major
>
> i have a reasonably large pptx file from which i don't get any slide content.
> i get slide notes, and some ocr from embedded images, but not the slide
> content itself. unfortunately, i cannot share the file, but i can answer
> questions about it if necessary (and can probably share some of the internal
> structure related files).
>
> using poi 5.4.0, not in streaming mode.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)