Hi team,

We are trying to read data from Office documents such as xlsx, xls, and docx, but we run into memory issues when reading large OOXML files (around 100 MB) with the POI APIs. For xls/xlsx there are event-based APIs (the HSSF and XSSF event APIs) that solve the memory problem. For Word and PowerPoint files, however, there are no event-based APIs: we have to create an XWPF/HWPF document object, which consumes a lot of memory. For example, building an XWPFDocument from a 45 MB DOCX file takes 12 GB of heap.
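To make clear the kind of streaming read we are after, here is a minimal sketch. It is not a POI API: it uses only JDK classes (java.util.zip and StAX) to pull paragraph text out of word/document.xml without building a document model. The names DocxTextStreamer, streamParagraphs and sampleDocx are hypothetical, made up for this illustration. It assumes the transitional WordprocessingML namespace and only reads the main body part, so headers, footers, footnotes and comments are skipped, and nested paragraphs (for example in text boxes) are not handled cleanly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI: streams paragraph text from a DOCX.
public class DocxTextStreamer {
    // Assumption: transitional OOXML namespace (strict documents use another one).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Calls sink once per w:p element, holding only one paragraph in memory.
    public static void streamParagraphs(InputStream docx, Consumer<String> sink)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Disable DTDs and external entities, since the input may be untrusted.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // skip styles, media, headers, and so on
                }
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                StringBuilder paragraph = new StringBuilder();
                boolean inText = false;
                while (xml.hasNext()) {
                    int event = xml.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && W_NS.equals(xml.getNamespaceURI())) {
                        String name = xml.getLocalName();
                        if ("t".equals(name)) inText = true;
                        else if ("tab".equals(name)) paragraph.append('\t');
                        else if ("br".equals(name)) paragraph.append('\n');
                    } else if ((event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) && inText) {
                        paragraph.append(xml.getText());
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && W_NS.equals(xml.getNamespaceURI())) {
                        String name = xml.getLocalName();
                        if ("t".equals(name)) {
                            inText = false;
                        } else if ("p".equals(name)) {
                            sink.accept(paragraph.toString());
                            paragraph.setLength(0); // reuse buffer for next paragraph
                        }
                    }
                }
                xml.close(); // does not close the underlying zip stream
                return;
            }
        }
    }

    // Builds a tiny in-memory stand-in for a DOCX, for demonstration only.
    static byte[] sampleDocx() throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Second</w:t><w:tab/><w:t>line</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        streamParagraphs(new ByteArrayInputStream(sampleDocx()), System.out::println);
    }
}
```

For a real file, a FileInputStream would be passed instead of the in-memory sample. Memory use should then be bounded by the largest single paragraph rather than by the whole document, but we have not measured this on our 45 MB file.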
So, similar to xlsx files, is there any plan to provide event-based APIs for the rest of the Office document formats? And is there any workaround to read the data with lower memory consumption? Please let us know. Our use case is only to read the data.

Thanks,
Chaitanya