https://bz.apache.org/bugzilla/show_bug.cgi?id=61371
Javen O'Neal <one...@apache.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |NEEDINFO --- Comment #10 from Javen O'Neal <one...@apache.org> --- attachment 35199 references a different file that was uploaded to Google Sheets which is 2.75 MB instead of 8 KB in size. Unzipping the 2.75 MB file expands to 27.9 MB (90% compression ratio is typical for workbooks that do not contain embedded objects). xl/worksheets/sheet1.xml is 27.9 MB (leaving a few KB for the rest of the extracted files). Inspecting the XML, I see that the workbook defines 1 cell at A1 and 1 million rows: <sheetFormatPr defaultColWidth="0" defaultRowHeight="15" zeroHeight="1"/> <cols> <col min="1" max="1" width="9.140625" customWidth="1"/> <col min="2" max="16384" width="9.140625" hidden="1"/> </cols> <sheetData> <row r="1" spans="1:1"> <c r="A1" t="s"><v>0</v></c> </row> <row r="2" spans="1:1" hidden="1"/> <row r="3" spans="1:1" hidden="1"/> ... <row r="16" spans="1:1" hidden="1"/> <row r="17" hidden="1"/> ... <row r="1048556" hidden="1"/> <row r="1048557" hidden="1"/> </sheetData> POI uses a TreeMap<int rowNumber, XSSFRow row> to store these rows for fast random and sequential access, at the cost of some memory. Oracle Java 8 docs state that it implements TreeMap using a self-balancing pointer-based red-black tree. This should be fine. To test that your JVM can handle 1 million items in a TreeMap, do something like the following: private static final Random rand = new Random(); private Object createFakeRow(int nbytes) { byte[] row = new byte[nbytes]; rand.nextBytes(row); return row; } SortedMap<Integer, Object> rows = new TreeMap<>(); for (int r=1; r<=1048557; r++) { // create an object that consumes 10 KB of RAM in place of a real XSSFRow rows.put(i, createFakeRow(10*1024)); } If that works without issue, then the next focus is on what XMLBeans is doing as it's reading Sheet1.xml. You could look at how POI unzips the file into memory and reads each XML file into an XML DOM using XMLBeans and how much extra memory is consumed by the CT classes. Before we blame XMLBeans, we'd have to fairly compare it with JAXB. Here's a Google Drive link to this file since the file exceeds the file size limit of bugzilla: https://drive.google.com/file/d/0B2v9cndcBwIWeURpNXVrSjJYbDg/view Based on your reported 9 GB of RAM, the amortized size of each XSSFRow would be roughly 10 KB. To make sure this really isn't a bug in POI's handling of hidden rows or columns, we'd need to test for OOM on a workbook with the same number of rows defined but all of them visible. I think you have enough ideas here to try to figure out where the problem is, which is needed before a potential fix can be written. -- You are receiving this mail because: You are the assignee for the bug. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org For additional commands, e-mail: dev-h...@poi.apache.org