No one that I know of is actively working on a streaming API for docx or pptx.
Contributions to POI in these areas would be welcome.

One low-level approach is to read docx/pptx files as zip files. A
password-protected file is not a plain zip, so in that case you can use POI
to first create a copy of the file with the password protection removed
(this supports streaming).

The zip files contain XML files that hold the content and the metadata (e.g.
style data). The XML can be parsed with SAX or StAX parsers. The XML formats
are described at https://en.wikipedia.org/wiki/Office_Open_XML
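
As a rough illustration of the zip + StAX idea, here is a minimal sketch
that uses only JDK classes, not POI. It assumes the main document part is
stored at word/document.xml (the usual location, though the package
relationships are what formally define it) and that the file uses the
transitional WordprocessingML namespace. The class and method names are
made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams the text of a docx without building a DOM.
class DocxTextSketch {
    // Assumption: transitional namespace; strict OOXML files use a different one.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Walks the zip entries in order and parses only the main document part.
    static String extractText(InputStream docx) throws IOException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: conventional part name for the document body.
                if ("word/document.xml".equals(entry.getName())) {
                    readBody(zip, out);
                    break;
                }
            }
        }
        return out.toString();
    }

    // Pulls StAX events, appending w:t text and a newline after each w:p.
    private static void readBody(InputStream xml, StringBuilder out) throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities since the input may be untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            boolean inText = false;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (isWordElement(reader, "t")) {
                        inText = true;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    if (inText) {
                        out.append(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (isWordElement(reader, "t")) {
                        inText = false;
                    } else if (isWordElement(reader, "p")) {
                        out.append('\n');
                    }
                }
            }
        } catch (XMLStreamException e) {
            throw new IOException("Could not parse document XML", e);
        }
    }

    private static boolean isWordElement(XMLStreamReader reader, String localName) {
        return localName.equals(reader.getLocalName())
                && W_NS.equals(reader.getNamespaceURI());
    }
}
```

You would call it with something like
DocxTextSketch.extractText(Files.newInputStream(path)). It ignores tables,
headers, footnotes and styles; those live in other elements and other zip
entries, so a real reader would need to handle more parts than this.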

docx4j may be an option. As far as I know, it does not support streaming, but
it's possible that it uses less memory when reading docx or pptx files.

For the older ppt and doc formats, I believe that the data formats don't
lend themselves to streaming.



