Great.  Thanks Andreas, I'll check this one out too.

I just started taking a quick look at the first Github you sent me and I
didn't see anything explicit about writing the streamed data from the Excel
workbook to a csv file.  Pardon my elementary question, but is it possible
to write a big and single csv file using these streaming methods where only
a handful of rows are kept in memory at a time?  Or would I need to write a
bunch of mini csv files and then concatenate the file contents later after
the Github code is finished?  I ask this because my only experience with my
past work is where a file is read and held in memory at once and then
written to the file vs. the streaming technique.  Thanks again.

Oscar

Oscar Bastidas
Research Associate
University of Minnesota

On Mon, May 3, 2021, 6:12 AM Andreas Reichel <[email protected]>
wrote:

> Oscar,
>
> there is also a fork of Excel Streamer which is worth to
> mention: https://github.com/pjfanning/excel-streaming-reader
> Personally I prefer that fork, but I have no good reason or argument.
> In the past this fork picked up version changes of POI faster.
>
> Cheers
> Andreas
>
>
>
> On Mon, 2021-05-03 at 06:05 -0500, Oscar Bastidas wrote:
> > Awesome.  Thanks, I'll give this a try.
> >
> > Oscar
> >
> > Oscar Bastidas
> > Research Associate
> > University of Minnesota
> >
> > On Mon, May 3, 2021, 6:04 AM Andreas Reichel
> > <[email protected]>
> > wrote:
> >
> > > Greetings.
> > >
> > > Please use the Excel Streaming Reader when reading large
> > > files: https://github.com/monitorjbl/excel-streaming-reader
> > >
> > > import com.monitorjbl.xlsx.StreamingReader;
> > >
> > > InputStream is = new FileInputStream(new
> > > File("/path/to/workbook.xlsx"));
> > > Workbook workbook = StreamingReader.builder()
> > >         .rowCacheSize(100)    // number of rows to keep in memory
> > > (defaults to 10)
> > >         .bufferSize(4096)     // buffer size to use when reading
> > > InputStream to file (defaults to 1024)
> > >         .open(is);            // InputStream or File for XLSX file
> > > (required)
> > >
> > >
> > >
> > > With the code above you can loop through your rows and write it to
> > > CSV.
> > > Best regards
> > > Andreas
> > >
> > >
> > > On Mon, 2021-05-03 at 05:31 -0500, Oscar Bastidas wrote:
> > > > Hello,
> > > >
> > > > I am trying to read a large Excel spreadsheet (60,000 rows) but I
> > > > get
> > > > what
> > > > appears to be a memory leak error from the JVM when I use the
> > > > *XSSFWorkbook
> > > > *API.  I learned recently that there are size limitations on
> > > > Excel
> > > > files
> > > > being read in this way and the Apache POI website specifically
> > > > recommends
> > > > reading the file in a streaming fashion instead of taking the
> > > > whole
> > > > file in
> > > > memory.  To do this, POI recommends using something called
> > > > *XLSX2CSV*
> > > > but
> > > > the provided link to teach how to use this returns a "page not
> > > > found
> > > > error."
> > > >
> > > > Would someone please point me in the direction of how to handle
> > > > reading my
> > > > big Excel file?
> > > >
> > > > The Apache POI URL that contains the link to *XLSX2CSV* is:
> > > >
> > > > http://poi.apache.org/components/spreadsheet/limitations.html
> > > >
> > > > Thanks for any help anyone can provide.
> > > >
> > > > Oscar
> > > >
> > > > Oscar Bastidas
> > > > Research Associate
> > > > University of Minnesota
> > >
> > >
>
>

Reply via email to