Re: Event Based APIs for parsing docx,doc,pptx,ppt files

2019-02-15 Thread Tim Allison
I've added SAX parsers for pptx and docx over on Apache Tika. These rely on POI for OPCPackage, a bunch of other classes, and the overall design. I've thought about moving that code into POI, but I haven't found the time or need, and the code is my typical kludgy mess... and I don't want to pollute POI

Re: Event Based APIs for parsing docx,doc,pptx,ppt files

2019-02-14 Thread pj.fanning
No one that I know of is actively working on a streaming API for docx or pptx. Contributions to POI in these areas would be welcome. One low-level approach is to read docx/pptx files as zip files. If they are password protected, you can use POI to first create a copy of the file with the password p
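
A minimal sketch of the zip-based approach mentioned in this message, using only the JDK (java.util.zip plus StAX) rather than any POI API. It assumes an unencrypted docx whose main part is stored at word/document.xml and uses the transitional WordprocessingML namespace; the class name DocxTextStreamer and the method extractText are hypothetical names chosen for illustration. It pulls text out of w:t elements one event at a time, so the document is never loaded into memory as a whole tree.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI or Tika.
final class DocxTextStreamer {

    // Assumption: transitional OOXML namespace; strict-conformance files use a different URI.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Streams the docx as a zip and collects the text runs of the main document part.
    static String extractText(InputStream docx) throws IOException, XMLStreamException {
        StringBuilder out = new StringBuilder();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the main part lives at the conventional path.
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                XMLInputFactory factory = XMLInputFactory.newInstance();
                // Disable DTDs and external entities when parsing untrusted files.
                factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
                factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                boolean inText = false;
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        // w:t holds the literal text of a run.
                        inText = W_NS.equals(xml.getNamespaceURI())
                            && "t".equals(xml.getLocalName());
                    } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                        out.append(xml.getText());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        inText = false;
                        // Emit a line break at the end of each paragraph (w:p).
                        if (W_NS.equals(xml.getNamespaceURI())
                                && "p".equals(xml.getLocalName())) {
                            out.append('\n');
                        }
                    }
                }
                xml.close();
                break;
            }
        }
        return out.toString();
    }
}
```

For a very large file the StringBuilder would itself become the bottleneck; in that case the same loop could hand each text run to a callback instead of accumulating it.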

Event Based APIs for parsing docx,doc,pptx,ppt files

2019-02-14 Thread Kalam, Venkata Krishna Chaitanya
Hi team, we are trying to read data from Office documents such as xlsx, xls, docx, etc. But we are facing memory issues while reading large OOXML files (around 100 MB) using the POI APIs. For the xls/xlsx formats there are event-based APIs which solve the memory issue (XSSF/HSSF