On Wed, 4 May 2011, Alex Geller wrote:
The good news is that it is fast enough, uses little memory and handles
all the decorational items we were worrying about (CellStyles,
MergedRegions and Images), just fine. A .xlsx file with 300,000 rows and
33 columns is generated in 50 seconds using the default heap space of 64
MB.
Great news
The BigGridDemo should be improved to replace the "sheetData" element
with the real data rather then than replacing the complete file because
MergedRegions, Image references and other information such as margins
are otherwise lost on overwriting the file.
I think (as you hint later) that the demo is probably approaching the
point where it can be re-done as a streaming api similar to the read one,
rather than an example you write your code into.
Since we need to continue supporting HSSF as well as XSSF we implemented
an API compatible streaming version of XSSF called SXSSF ("S" for
streaming) so that we wouldn't have to change the code and could have
identical code for both file formats (Apart from the "Workbook"
construction of course (change from "new XSSFWorkbook" to "new
SXSSFWorkbook")).
Interesting idea. I'd probably say that an event model writing code would
be the first thing to nail down, then possibly an event backed workbook
implementation would be next after that. If you've done much of that then
it could help!
If you're happy, what I'd suggest is you post your code to a new bug in
bugzilla, then we all work together to get the code into svn
We have some open points which don't bother us for the moment because we
are using only string, numeric and date cell values. We are not using
formulas, rich text, hyperlinks and comments but we may in the future:
What cell types and values have to be written into the "c" elements of
the "sheetData" structure for the cell types BLANK, FORMULA and ERROR?
How is rich text written?
The quick way to check is just to look at some example files. Full details
are in the microsoft specifications, which should be linked from the POI
website.
How are comments and hyperlinks written into the file?
With a fair bit of xml around them... Take a look at some example files to
see. It's quite fiddly, the xssf usermodel code should show you the
details too
What are the exact semantics of Sheet.shiftRows()? Why is the code in
HSSFSheet complicated? Is it perhaps because of formulas that are cell
relative that need to be recomputed in the new location?
HSSF is often more complicated than XSSF, as more of the work needs to be
done and more things need to be kept in sync. Formulas are just one bit
What should the return value for Row.getZeroHeight() be if
Row.setZeroHeight() has not been called previously?
I'd suggest you crib off what xssf does?
How are cell formulas parsed? How is the type determined? What is the
lifecycle of formulas? What is the precomputed value and when is is
precomputed? What is the difference between "precomputed" and "cached"
and when are values cached?
XSSF is much simpler than HSSF for this. Formulas are stored as text, and
the cached/precomputed (same thing) value goes inline with them. You
normally set those values at the end of writing the cells, using the
formula evaluator. However, as you don't have all the cells in memory at
once, any references will be very tricky. The best bet might be to set the
force calculation flag, skip writing the cached values, and let excel do
them on next load
Nick
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]