https://bz.apache.org/bugzilla/show_bug.cgi?id=69628

            Bug ID: 69628
           Summary: unable to reduce memory usage when parsing docx file
           Product: POI
           Version: 5.4.0-FINAL
          Hardware: PC
                OS: Mac OS X 10.1
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: dev@poi.apache.org
          Reporter: jtahlb...@yahoo.com
  Target Milestone: ---

I attempted to reduce the amount of memory used when parsing POI documents by
calling IOUtils.setByteArrayMaxOverride() with 50MB. However, I am then unable
to parse a simple docx file. It seems that, for whatever reason, one of the zip
entries in the file does not have a content length specified (i.e.
entrySize=-1). Therefore, when ZipArchiveFakeEntry attempts to open the entry
(line ~82), it uses ZipArchiveFakeEntry.MAX_ENTRY_SIZE (100MB) as the size of
the entry, which results in an exception being thrown (because 100MB > 50MB).
This class is package-private, so I am unable to call
ZipArchiveFakeEntry.setMaxEntrySize() to make it use 50MB instead (without
resorting to reflection).
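
To make the failure mode concrete, here is a small self-contained Java model of
the behavior described above. It is not POI's source: the class
CurrentEntrySizing and its methods are hypothetical, the 100MB and 50MB figures
are the ones quoted in this report, and IllegalStateException stands in for
whatever exception POI actually throws.

```java
// Hypothetical model of the behavior described in this report; NOT POI's actual code.
class CurrentEntrySizing {
    // Default per-entry cap quoted in the report (100MB).
    static final int MAX_ENTRY_SIZE = 100 * 1024 * 1024;

    /** Size used to allocate the entry buffer; -1 means the zip entry has no known length. */
    static int allocationSize(long entrySize) {
        // Unknown length falls back to the full cap, regardless of the real entry size.
        return entrySize < 0 ? MAX_ENTRY_SIZE : (int) entrySize;
    }

    /** Stand-in for the byte-array limit check enforced via the IOUtils override. */
    static void checkLength(int length, int byteArrayMaxOverride) {
        if (byteArrayMaxOverride > 0 && length > byteArrayMaxOverride) {
            // Assumed exception type; POI's real exception class may differ.
            throw new IllegalStateException(
                "Tried to allocate " + length + " bytes, limit is " + byteArrayMaxOverride);
        }
    }
}
```

With a 50MB override, an entry of unknown length is sized at 100MB and the
check fails, even if the decompressed entry would only be a few kilobytes.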

Ideally, ZipArchiveFakeEntry should cap the max entry size at the minimum of
ZipArchiveFakeEntry.MAX_ENTRY_SIZE and IOUtils.BYTE_ARRAY_MAX_OVERRIDE, so that
setting the value in IOUtils works without requiring any additional
configuration changes.
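
A minimal sketch of the proposed clamping, assuming for illustration that a
non-positive override means "no override set". ProposedEntrySizing and
effectiveMaxEntrySize are hypothetical names, not part of POI's API.

```java
// Hypothetical sketch of the proposed fix; names are illustrative, not POI's API.
class ProposedEntrySizing {
    // Default per-entry cap quoted in the report (100MB).
    static final int MAX_ENTRY_SIZE = 100 * 1024 * 1024;

    /**
     * Effective cap for a zip entry: the smaller of MAX_ENTRY_SIZE and the
     * IOUtils-style byte-array override. Assumption: override <= 0 means "unset".
     */
    static int effectiveMaxEntrySize(int byteArrayMaxOverride) {
        return byteArrayMaxOverride > 0
            ? Math.min(MAX_ENTRY_SIZE, byteArrayMaxOverride)
            : MAX_ENTRY_SIZE;
    }
}
```

With this rule, a single call to IOUtils.setByteArrayMaxOverride() would be
enough, since the entry cap would follow the override automatically.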

As a secondary note, using MAX_ENTRY_SIZE in ZipArchiveFakeEntry for entries
with an unknown content length results in excessive memory usage when those
entries turn out to be much smaller. Ideally, entries with unknown length
should be decompressed into a smaller initial array that expands up to the max
as necessary.
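
The growable-buffer approach could look roughly like the following sketch.
BoundedRead.readUpTo is a hypothetical helper, not a POI method; it relies on
java.io.ByteArrayOutputStream, which enlarges its internal array on demand, so
memory tracks the actual entry size rather than the cap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: read an entry of unknown length without pre-allocating the cap.
class BoundedRead {
    /**
     * Reads the whole stream into a buffer that starts at initialSize bytes and
     * grows as needed, failing as soon as more than maxSize bytes have been read.
     */
    static byte[] readUpTo(InputStream in, int maxSize, int initialSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(Math.min(initialSize, maxSize));
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
            // Enforce the cap on bytes actually read, not on a guessed size.
            if (total > maxSize) {
                throw new IOException("Entry exceeds " + maxSize + " bytes");
            }
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

One trade-off: toByteArray() returns a copy, so peak memory briefly reaches
about twice the entry size; that is still far below pre-allocating 100MB for a
small entry.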
