Oscar, unfortunately I am just an accountant and so my answer might not be technically accurate. In my understanding, you will read row by row and write the cells into a BufferedOutPutStream which is feeding into a FileOutPutStream.
You could even put a ZIP/BZIP2 stream in between to compress the output on the fly. This won't take much memory (especially not in comparison to POI holding the whole XSSF Objects in Memory). 60k lines are nothing. I wold expects that -Xmx250MB would serve already, probably less. Cheers Andreas On Mon, 2021-05-03 at 06:19 -0500, Oscar Bastidas wrote: > Great. Thanks Andreas, I'll check this one out too. > > I just started taking a quick look at the first Github you sent me > and I > didn't see anything explicit about writing the streamed data from the > Excel > workbook to a csv file. Pardon my elementary question, but is it > possible > to write a big and single csv file using these streaming methods > where only > a handful of rows are kept in memory at a time? Or would I need to > write a > bunch of mini csv files and then concatenate the file contents later > after > the Github code is finished? I ask this because my only experience > with my > past work is where a file is read and held in memory at once and then > written to the file vs. the streaming technique. Thanks again. > > Oscar > > Oscar Bastidas > Research Associate > University of Minnesota > > On Mon, May 3, 2021, 6:12 AM Andreas Reichel > <[email protected]> > wrote: > > > Oscar, > > > > there is also a fork of Excel Streamer which is worth to > > mention: https://github.com/pjfanning/excel-streaming-reader > > Personally I prefer that fork, but I have no good reason or > > argument. > > In the past this fork picked up version changes of POI faster. > > > > Cheers > > Andreas > > > > > > > > On Mon, 2021-05-03 at 06:05 -0500, Oscar Bastidas wrote: > > > Awesome. Thanks, I'll give this a try. > > > > > > Oscar > > > > > > Oscar Bastidas > > > Research Associate > > > University of Minnesota > > > > > > On Mon, May 3, 2021, 6:04 AM Andreas Reichel > > > <[email protected]> > > > wrote: > > > > > > > Greetings. > > > > > > > > Please use the Excel Streaming Reader when reading large > > > > files: https://github.com/monitorjbl/excel-streaming-reader > > > > > > > > import com.monitorjbl.xlsx.StreamingReader; > > > > > > > > InputStream is = new FileInputStream(new > > > > File("/path/to/workbook.xlsx")); > > > > Workbook workbook = StreamingReader.builder() > > > > .rowCacheSize(100) // number of rows to keep in > > > > memory > > > > (defaults to 10) > > > > .bufferSize(4096) // buffer size to use when > > > > reading > > > > InputStream to file (defaults to 1024) > > > > .open(is); // InputStream or File for XLSX > > > > file > > > > (required) > > > > > > > > > > > > > > > > With the code above you can loop through your rows and write it > > > > to > > > > CSV. > > > > Best regards > > > > Andreas > > > > > > > > > > > > On Mon, 2021-05-03 at 05:31 -0500, Oscar Bastidas wrote: > > > > > Hello, > > > > > > > > > > I am trying to read a large Excel spreadsheet (60,000 rows) > > > > > but I > > > > > get > > > > > what > > > > > appears to be a memory leak error from the JVM when I use the > > > > > *XSSFWorkbook > > > > > *API. I learned recently that there are size limitations on > > > > > Excel > > > > > files > > > > > being read in this way and the Apache POI website > > > > > specifically > > > > > recommends > > > > > reading the file in a streaming fashion instead of taking the > > > > > whole > > > > > file in > > > > > memory. To do this, POI recommends using something called > > > > > *XLSX2CSV* > > > > > but > > > > > the provided link to teach how to use this returns a "page > > > > > not > > > > > found > > > > > error." > > > > > > > > > > Would someone please point me in the direction of how to > > > > > handle > > > > > reading my > > > > > big Excel file? > > > > > > > > > > The Apache POI URL that contains the link to *XLSX2CSV* is: > > > > > > > > > > http://poi.apache.org/components/spreadsheet/limitations.html > > > > > > > > > > Thanks for any help anyone can provide. > > > > > > > > > > Oscar > > > > > > > > > > Oscar Bastidas > > > > > Research Associate > > > > > University of Minnesota > > > > > > > > > > > >
