On Sun, 14 Dec 2014, Sumedh wrote:
We are using POI SXSSF for writing Excel files. We faced heap memory usage
issues with big Excel files containing a lot of text data.

Currently, SXSSF stores the shared strings table completely in memory, and
if the shared strings table is not used, the .xlsx file is not compatible with
some clients, such as the iPad (see
https://issues.apache.org/bugzilla/show_bug.cgi?id=53130).

I keep hoping that this issue will trip up someone with a hefty support contract with Apple, who'll be able to get the Cupertino lads and lasses to go and properly read the OOXML spec, but sadly that hasn't happened yet...

We tried using MapDB to store the shared strings table, which flushes entries
to disk when there are a large number of them. It's working quite well,
and successfully writes large files where POI currently throws an OutOfMemoryError.
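To make the idea of an off-heap shared strings table concrete, here is a minimal, hypothetical sketch using only the JDK. It is not MapDB and not POI's SharedStringsTable API; the class and method names (`DiskBackedStringTable`, `addString`, `forEachInOrder`) are invented for illustration. As in an .xlsx shared strings table, each unique string gets a sequential index. Only a fixed-size SHA-256 digest per unique string stays on the heap, while the text itself is appended to a temp file and streamed back in index order when the table is written out. It assumes that two different strings never produce the same SHA-256 digest.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, not POI's or MapDB's API: a shared strings table that
// keeps only a SHA-256 digest per unique string on the heap and appends the
// string text itself to a temporary file.
final class DiskBackedStringTable implements Closeable {
    private final Map<ByteBuffer, Integer> indexByDigest = new HashMap<>();
    private final MessageDigest sha256;
    private final Path spillFile;
    private final DataOutputStream out;
    private int count = 0;

    DiskBackedStringTable() throws IOException {
        try {
            sha256 = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by every JRE", e);
        }
        spillFile = Files.createTempFile("sst-", ".bin");
        out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(spillFile)));
    }

    /** Returns the index of s, appending it to the spill file the first time it is seen. */
    public int addString(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Equal strings give equal digests; a collision between different
        // strings is assumed never to happen in practice.
        ByteBuffer key = ByteBuffer.wrap(sha256.digest(bytes));
        Integer existing = indexByDigest.get(key);
        if (existing != null) {
            return existing;
        }
        out.writeInt(bytes.length); // length-prefixed UTF-8 record
        out.write(bytes);
        indexByDigest.put(key, count);
        return count++;
    }

    public int uniqueCount() {
        return count;
    }

    /** Streams the unique strings back in index order without loading them all at once. */
    public void forEachInOrder(Consumer<String> action) throws IOException {
        out.flush(); // make buffered records visible to the reader
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(spillFile)))) {
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                action.accept(new String(bytes, StandardCharsets.UTF_8));
            }
        }
    }

    @Override
    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(spillFile);
    }
}
```

With this design, heap use still grows with the number of unique strings, but no longer with their total length. A real implementation would also need to handle rich-text entries and write the records out as sharedStrings.xml.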

If we were to add a dependency on an on-disk DB to help with this, it'd need to be:
 * Optional - people with mid-sized files could continue as now
 * Small - adding on-disk support for SXSSF shouldn't dramatically
   increase the size of the POI library
 * Suitably licensed - see
   http://www.apache.org/legal/resolved.html#category-a for what licenses
   we can accept dependencies under
 * Well unit tested
 * Done by someone else ;-)

Currently, everything I do for $DAYJOB can be done with XSSF+SXSSF for writing, and XSSF+SAX stuff for reading, so I have no real work need for SXSSF to be better memory-wise. I therefore can't spend work time on it, but I'm happy to spend some personal time looking at patches if other community members want to put the work in. I suspect many other POI committers are the same: the current code is good enough that we can't convince our bosses to let us work on it during the day, but we see enough value for everyone else that we'll give up an evening or two to review patches that make it better for others.

So, if an on-disk backed SXSSF shared strings table is of interest, and you can find a suitable smallish + tested + licensed library, go ahead and start on adding it as an option! http://poi.apache.org/guidelines.html should cover the rest of what we need. Ask on the dev list for advice if you have a few ways to go, and need suggestions / advice / etc!

Nick
