[ https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
james closed TIKA-4395.
-----------------------
    Resolution: Not A Bug

It turns out that when parsing from an InputStream, Tika may not recognize some OOXML files and will instead parse them as generic ZIP files, resulting in useless content. Writing the file to a temp file before parsing causes it to be correctly parsed as an OOXML file.

> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Priority: Major
>
> I have a reasonably large pptx file from which I don't get any slide content.
> I get slide notes, and some OCR text from embedded images, but not the slide
> content itself. Unfortunately, I cannot share the file, but I can answer
> questions about it if necessary (and can probably share some of the files
> related to its internal structure).
>
> Using POI 5.4.0, not in streaming mode.
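
The workaround from the resolution comment (write the stream to a temp file, then parse the file) might look roughly like the sketch below. It covers only the spooling step and uses nothing but the JDK; the class name `SpoolToTempFile`, the helper `spoolToTempFile`, and the `tika-spool-` prefix are hypothetical names chosen for illustration, not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToTempFile {

    // Hypothetical helper: copies the whole stream into a new temp file
    // and returns its path. The caller is responsible for deleting the file.
    public static Path spoolToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("tika-spool-", suffix);
        try {
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real upload/stream; only the ZIP magic bytes here.
        byte[] data = {'P', 'K', 3, 4};
        Path tmp = spoolToTempFile(new ByteArrayInputStream(data), ".pptx");
        try {
            System.out.println("spooled " + Files.size(tmp) + " bytes to " + tmp);
            // The path (not the original stream) would be handed to the parser here.
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With Tika on the classpath, the returned path could then be opened with `TikaInputStream.get(Path)` and passed to the parser in place of the original stream. This is an assumption about how the workaround was applied; the comment above does not show the reporter's actual code.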