Mahmood, that was discussed many times already: XSSFWorkbook is a memory hog as it holds the whole information in a DOM tree with a lot of strings. (Does Java 8 String Deduplication help on that, when you have enough cores?) For cases like yours reading from a SXSSFWorkbook/stream will be more suitable. You also might have a look at: https://g ithub.com/monitorjbl/excel-streaming-reader eventually.
Best regards Andreas On Wed, 2017-05-03 at 15:40 +0000, Mahmood Naderan wrote: > Hi > > I used the Apache POI to read an excel file. The file has 16000 rows and 50 > columns. > > > Each cell is read as a string and they are put into a 2D array > > > (e.g. a row is a string array with size 50 and 16000 rows are put in another > array). > > > the code looks like > > > > try (FileInputStream fIP = new FileInputStream(selectedFile)) { > XSSFWorkbook wb = new XSSFWorkbook(fIP); > XSSFSheet ws = wb.getSheetAt(0); > maxRows = ws.getLastRowNum() + 1; > maxColumns = ws.getRow(0).getLastCellNum(); > theRows = new OneRow[maxRows]; > Iterator< Row > rowIt = ws.iterator(); > XSSFRow row; > int i = 0; > long chs = 0; > while ( rowIt.hasNext() ) { > row = (XSSFRow) rowIt.next(); > Iterator< Cell > cellIt = row.cellIterator(); > String [] str = new String[maxColumns]; > int j = 0; > long ch = 0; > while ( cellIt.hasNext() ) { > Cell cell = cellIt.next(); > str[ j ] = cell.getStringCellValue(); > ch += str[ j ].length(); // sum one row's length > System.out.print( str[ j ].length() + " " ); > ++j; > } > System.out.print( "\n ->" + ch + "\n" ); > chs += ch; // sum total characters length > theRows[ i ] = new OneRow( maxColumns ); > theRows[ i ].add( str ); > ++i; > } > System.out.print( "\n ==>" + chs + "\n" ); > } > > > > the program also counts the length of each cell, ch and the size of all > > > rows are updated with chs += ch. Then the total number of characters are chs. > > > The value is 3,230,798 (3M chars) but the used heap is about 1GB. That means > a > > > character is consuming 330 bytes!! (more or less). > > > > Using the profilers, I see that needs a lot of heap and I have to increase > > > the heap size to 2048. Also the cpu time is devoted to GC which is > undesirable. > > > > I have to say that XSSFSheet ws = wb.getSheetAt(0); is the most time and > > > memory consuming part. In order to investigate that, I manually count the > data > > > structure sizes with ch and chs. > > > > > Please see the profiler at https://i.stack.imgur.com/ZfLHj.jpg > > In the memory chart and from left, the first peak is where it finishes > > > loading the excel file. > > > > Regards, > Mahmood > > --------------------------------------------------------------------- > To unsubscribe, e-mail: user-unsubscr...@poi.apache.org > For additional commands, e-mail: user-h...@poi.apache.org >
signature.asc
Description: This is a digitally signed message part