Mahmood,

that was discussed many times already: XSSFWorkbook is a memory hog as it holds 
the whole information in a DOM tree with
a lot of strings. (Does Java 8 String Deduplication help on that, when you have 
enough cores?)
For cases like yours reading from a SXSSFWorkbook/stream will be more suitable. 
You also might have a look at: https://g
ithub.com/monitorjbl/excel-streaming-reader eventually.

Best regards
Andreas

On Wed, 2017-05-03 at 15:40 +0000, Mahmood Naderan wrote:
> Hi
> 
> I used the Apache POI to read an excel file. The file has 16000 rows and 50 
> columns. 
> 
> 
> Each cell is read as a string and they are put into a 2D array 
> 
> 
> (e.g. a row is a string array with size 50 and 16000 rows are put in another 
> array). 
> 
> 
> the code looks like
> 
> 
> 
> try (FileInputStream fIP = new FileInputStream(selectedFile)) {
>   XSSFWorkbook wb = new XSSFWorkbook(fIP);
>   XSSFSheet ws = wb.getSheetAt(0);
>   maxRows = ws.getLastRowNum() + 1;
>   maxColumns = ws.getRow(0).getLastCellNum();
>   theRows = new OneRow[maxRows]; 
>   Iterator< Row > rowIt = ws.iterator();
>   XSSFRow row;
>   int i = 0;
>   long chs = 0;
>   while ( rowIt.hasNext() ) {
>     row = (XSSFRow) rowIt.next();
>     Iterator< Cell > cellIt = row.cellIterator();
>     String [] str = new String[maxColumns];
>     int j = 0;
>     long ch = 0;
>     while ( cellIt.hasNext() ) {
>       Cell cell = cellIt.next();
>       str[ j ] = cell.getStringCellValue();
>              ch += str[ j ].length(); // sum one row's length
>       System.out.print( str[ j ].length() + " " );
>       ++j;
>     }
>     System.out.print( "\n  ->" + ch + "\n" );
>     chs += ch;   // sum total characters length
>     theRows[ i ] = new OneRow( maxColumns );
>     theRows[ i ].add( str );
>     ++i;
>   }
>   System.out.print( "\n  ==>" + chs + "\n" );
> }
> 
> 
> 
> the program also counts the length of each cell, ch and the size of all 
> 
> 
> rows are updated with chs += ch. Then the total number of characters are chs. 
> 
> 
> The value is 3,230,798 (3M chars) but the used heap is about 1GB. That means 
> a 
> 
> 
> character is consuming 330 bytes!! (more or less).
> 
> 
> 
> Using the profilers, I see that needs a lot of heap and I have to increase 
> 
> 
> the heap size to 2048. Also the cpu time is devoted to GC which is 
> undesirable.
> 
> 
> 
> I have to say that XSSFSheet ws = wb.getSheetAt(0); is the most time and 
> 
> 
> memory consuming part. In order to investigate that, I manually count the 
> data 
> 
> 
> structure sizes with ch and chs.
> 
> 
> 
> 
> Please see the profiler at  https://i.stack.imgur.com/ZfLHj.jpg
> 
> In the memory chart and from left, the first peak is where it finishes 
> 
> 
> loading the excel file.
> 
> 
> 
>  Regards,
> Mahmood
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
> For additional commands, e-mail: user-h...@poi.apache.org
> 

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to