No one that I know of is actively working on a streaming API for docx or pptx. Contributions to POI in these areas would be welcome.

One low-level approach is to read docx/pptx files as zip files. If they are password protected, you can first use POI to create a copy of the file with the password protection removed (this step supports streaming). The zip files contain XML files that hold the content and the metadata (e.g. style data). The XML can be parsed with SAX or StAX parsers. The XML formats are described at https://en.wikipedia.org/wiki/Office_Open_XML.

docx4j may be an option. As far as I know it does not support streaming, but it's possible that it uses less memory than POI when reading docx or pptx files.

For the ppt and doc formats, I believe that the data formats don't lend themselves to streaming the data.
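
To make the zip-plus-StAX idea concrete, here is a rough sketch that uses only the JDK (no POI classes). The class name DocxTextStreamer and the method extractText are made up for this example. It assumes the file is not encrypted, that the main document part is stored at the conventional path word/document.xml, and that the file uses the transitional WordprocessingML namespace. Treat it as an illustration of the approach rather than a complete text extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI: streams the text out of a .docx
// without building a document model in memory.
public class DocxTextStreamer {

    // Namespace used by transitional OOXML word processing documents.
    // Assumption: strict-conformance files use a different namespace and
    // would not match here.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Office files can come from untrusted sources, so disable DTDs
        // and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the main document part is at this path.
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // The reader pulls XML events straight from the current
                // zip entry; nothing is unpacked to disk.
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                boolean inText = false;
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // <w:t> elements carry the visible text of a run.
                        if (W_NS.equals(xml.getNamespaceURI())
                                && "t".equals(xml.getLocalName())) {
                            inText = true;
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS) {
                        if (inText) {
                            out.append(xml.getText());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (W_NS.equals(xml.getNamespaceURI())) {
                            if ("t".equals(xml.getLocalName())) {
                                inText = false;
                            } else if ("p".equals(xml.getLocalName())) {
                                // End of a paragraph: emit a line break.
                                out.append('\n');
                            }
                        }
                    }
                }
                xml.close();
                break;
            }
        }
        return out.toString();
    }
}
```

You would call DocxTextStreamer.extractText with, for example, a FileInputStream opened on the docx file. The sketch collects the text in a StringBuilder to stay short; for very large documents you would write each piece to a Writer or a callback instead, so that memory use does not grow with the size of the document. Headers, footers, footnotes and comments live in other parts of the zip and are not read here.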