I've added SAX parsers for pptx and docx over on Apache Tika. These
rely on POI for OPCPackage, a bunch of other classes and overall
design.
I've thought about moving that code into POI, but I haven't found the
time or need, and the code is my typical kludgy mess... and I don't
want to pollute POI.
No one that I know of is actively working on a streaming API for docx or pptx.
Contributions to POI in these areas would be welcome.
One low-level approach is to read docx/pptx files as zip files. If they are
password protected, you can use POI to first create a copy of the file with
the password p
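
For an unprotected file, the zip-based approach could be sketched roughly as
below, using only the JDK (no POI classes). This is a minimal illustration,
not how Tika or POI do it. It assumes the main document part is stored at
word/document.xml (the conventional location; strictly it should be resolved
through the package relationships) and that the file uses the transitional
WordprocessingML namespace. The class and method names (DocxTextStreamer,
streamParagraphs) are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: streams paragraph text out of a .docx via SAX. */
public class DocxTextStreamer {

    // Transitional WordprocessingML namespace (strict-mode files use another URI).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Buffers the text of one paragraph at a time and hands it to the sink. */
    static class ParagraphHandler extends DefaultHandler {
        private final Consumer<String> sink;
        private final StringBuilder paragraph = new StringBuilder();
        private boolean inText = false;

        ParagraphHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if (W_NS.equals(uri) && "t".equals(localName)) {
                inText = true; // entering a <w:t> text run
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!W_NS.equals(uri)) {
                return;
            }
            if ("t".equals(localName)) {
                inText = false;
            } else if ("p".equals(localName)) {
                // End of <w:p>: emit the paragraph and reset the buffer.
                sink.accept(paragraph.toString());
                paragraph.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                paragraph.append(ch, start, length);
            }
        }
    }

    /** Opens the docx as a plain zip and SAX-parses the assumed main part. */
    public static void streamParagraphs(String docxPath, Consumer<String> sink)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml"); // assumed location
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            try (InputStream in = zip.getInputStream(entry)) {
                factory.newSAXParser().parse(in, new ParagraphHandler(sink));
            }
        }
    }
}
```

Something like DocxTextStreamer.streamParagraphs("big.docx", System.out::println)
would then print one paragraph per line while holding only the current
paragraph in memory. Headers, footers, footnotes and so on live in other zip
entries and are ignored by this sketch.
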
Hi team,
We are trying to read data from Office documents such as xlsx, xls, docx,
etc., but we are running into memory issues when reading large OOXML
files (around 100 MB) using the POI APIs. For xls/xlsx formats there
are event-based APIs which solve the memory issue (XSSF/HSSF