Correct me if I'm wrong: all three said implementations should behave
identically in the common field (say, SXSSF doen't support array functions for
understandable reasons) as long as SpreadsheetVersion constraints aren't
violated.
However, I was trying to improved test coverage for *Cell class and found out
that at times the implementations behave very differently. Understandably, this
comes from three reasons:
a) written by different people at different times
b) lack of specifications on the interface
c) bloated interfaces.
So I could take the job of unifying the behavior. I forsee the objections like
"we shouldn't break what users are accustomed to". Fair enough, but mainly for
the "user-observable" part. I.e. values are set/got ok, formulae are evaluated.
But then there are those differences in the methods users (I guess) don't
actually use.
Say, we have a formula with a string cached value and a formula:
Cell cell = ...;
cell.setCellFormula("\"foo\");
cell.setCellValue(1);
ok, now
getCellType() == CellType.FORMULA,
getCachedFormulaResultType() == CellType.NUMERIC, and
getNumericCellValue() == 1.0.
So far so good. But if we say
cell.setCellType(CellType.NUMERIC), what will happen? Spoiler: anything can
happen.
For HSSF and SXSSF it will be FORMULA (why??)
For XSSF it will be BLANK (why??)
Such discrepancies may happen whenever implicit value/type casting is
performed, i.e. for setCellValue, get*CellValue, setCellType and setCellFormula.
So,
1. I can prepare a set of contracts for such functions and present them for
discussion here. The passed through ones will be clearly stated in the Cell
interface, tests updated (1 clause in the javadoc = 1 test), implementations
fixed.
2. A group of stronger and more invasive specific proposals:
a) mark setCellType @Deprecated. It's not something you can do in Excel, nor
the implicit value conversion logic is reliable. setCellType(FORMULA) is
meaningless. On the contrary, setBlank() should be a part of the public
interface because it has a clear meaning: delete both value and formula, and
preserve the cell with its style.
b) make get*Value return the value only if the value type strictly
corresponds to the method name. The evaluator performs conversions on its own,
and I very much doubt that users rely on a conversion like "get the numeric
value of the error code".
c) at last, remove FORMULA from CellType and deprecate
getCachedFormulaResultType. Why should anyone be concerned with the formula
while reading a value? get*Value work identically, be formula set or not. I
suspect there may be a reason for current state, though, because HSSF and XSSF
implementations are strongly tied to the underlying xml beans. But there's
still a good chance this removal is possible.
d) probably, CellType._NONE can also be removed
e) it also may be a good idea to remove BLANK as well (simultaneously adding
isBlank() method), because it's "not a value".
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]