Thanks Nick.

1. For a quick win, is it possible to provide a hook so that we can plug in
an overridden implementation of the SharedStringsTable class? As far as I can
see, there is currently no clean way to plug one in (though I have very little
understanding of the POI codebase).

2. If that works well, we can explore shipping MapDB as one of the built-in
options, after weighing all the other factors (like licensing and size)... or
maybe some other, smaller library focused only on this problem, or Alex's
homegrown code. :)

BTW, MapDB is free as in speech and free as in beer under the Apache License 2.0
<https://github.com/jankotek/MapDB/blob/master/doc/license.txt>. :)
- https://github.com/jankotek/MapDB/blob/master/license.txt
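
On the "homegrown" option, here is a rough, plain-JDK sketch of a store that
flushes entries to disk once there are many of them. It is not MapDB and not
how MapDB works internally. The class name SpillingStringStore and the
threshold parameter are hypothetical. The first `threshold` distinct strings
stay on the heap. After that, each new string is appended to a temp file and
only its file offset and index are kept in memory, grouped by hashCode so
that repeated strings still get the same index.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** HYPOTHETICAL sketch of a shared strings store that spills to disk. */
final class SpillingStringStore implements Closeable {
    private final int threshold;                              // max distinct strings kept on heap
    private final Map<String, Integer> hot = new HashMap<>();
    // hashCode -> list of {fileOffset, index} for strings stored on disk
    private final Map<Integer, List<long[]>> coldByHash = new HashMap<>();
    private final File path;
    private final RandomAccessFile file;
    private int uniqueCount = 0;
    private int spilled = 0;

    SpillingStringStore(int threshold) throws IOException {
        this.threshold = threshold;
        this.path = File.createTempFile("shared-strings", ".bin");
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Returns the index of s, storing it first if it is new. */
    int add(String s) throws IOException {
        Integer hotIndex = hot.get(s);
        if (hotIndex != null) {
            return hotIndex;
        }
        List<long[]> bucket = coldByHash.get(s.hashCode());
        if (bucket != null) {
            for (long[] entry : bucket) {
                // same hash is not enough; compare the actual string read from disk
                if (readAt(entry[0]).equals(s)) {
                    return (int) entry[1];
                }
            }
        }
        int index = uniqueCount++;
        if (hot.size() < threshold) {
            hot.put(s, index);
        } else {
            long offset = file.length();
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            file.seek(offset);
            file.writeInt(bytes.length); // length prefix, then the UTF-8 bytes
            file.write(bytes);
            coldByHash.computeIfAbsent(s.hashCode(), h -> new ArrayList<>())
                      .add(new long[] {offset, index});
            spilled++;
        }
        return index;
    }

    int uniqueCount() { return uniqueCount; }

    int spilledCount() { return spilled; }

    private String readAt(long offset) throws IOException {
        file.seek(offset);
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
        path.delete();
    }
}
```

The trade-off is that looking up a string that lives on disk costs a file read
per entry in its hash bucket. The sketch also only handles deduplication:
writing out sharedStrings.xml would additionally need the strings back in
index order.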

On Mon, Dec 15, 2014 at 7:22 PM, Nick Burch wrote:
>
> On Sun, 14 Dec 2014, Sumedh wrote:
>
>> We are using POI SXSSF for writing Excel files. We faced heap memory usage
>> issues with large Excel files containing large amounts of text data.
>>
>> Currently, SXSSF stores the shared strings table completely in memory, and
>> if the shared strings table is not used, the xlsx file is not compatible
>> with some clients such as the iPad (see
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=53130).
>>
>
> I keep hoping that this issue will trip up someone with a hefty support
> contract with Apple, who'll be able to get the Cupertino lads and lasses to
> go and properly read the OOXML spec, but sadly that hasn't happened yet...
>
>> We tried using MapDB to store the shared strings table, which flushes
>> entries to disk when there are a large number of them. It's working quite
>> well, and successfully writes large files where POI currently throws an
>> OutOfMemoryError.
>>
>
> If we were to add a dependency on an on-disk DB to help with this, it'd
> need to be:
>  * Optional - people with mid-sized files could continue as now
>  * Small - adding on-disk support for SXSSF shouldn't dramatically
>    increase the size of the POI library
>  * Suitably licensed - see
>    http://www.apache.org/legal/resolved.html#category-a for what licenses
>    we can accept dependencies under
>  * Well unit tested
>  * Done by someone else ;-)
>
> Currently, everything I do for $DAYJOB can be done with XSSF+SXSSF for
> writing, and XSSF+SAX stuff for reading, so I have no real work need for
> SXSSF to be better memory-wise. I therefore can't spend work time on it,
> but I'm happy to spend some personal time looking at patches if other
> community members want to put the work in. I suspect many other POI
> committers are the same - the current code is good enough that we can't
> convince our bosses to let us work on it during the day, but we see the
> value for everyone else to give up an evening or two to review patches to
> make it better for others.
>
> So, if an on-disk backed SXSSF shared strings table is of interest, and
> you can find a suitable smallish + tested + licensed library, go ahead and
> start on adding it as an option! http://poi.apache.org/guidelines.html
> should cover the rest of what we need. Ask on the dev list for advice if
> you have a few ways to go, and need suggestions / advice / etc!
>
> Nick
>

-- 
Cheers,
Sumedh
http://www.linkedin.com/in/sumedhinamdar
Ph: +91 - 95610 99125
