Hi Pedro,

Pedro Giffuni wrote:
Hi guys;

FWIW, while I was playing with the new random number generator I went
around looking for some references and I found this paper from the Journal
of Statistical Software (2010) titled "On the Numerical Accuracy of
Spreadsheets":

http://www.jstatsoft.org/v34/i04/paper


It basically shows that Calc, among other Spreadsheet programs, is not
really well suited for statistical analysis.

They used an old version of Calc. In the meantime Calc has received a lot of accuracy improvements, and the new implementations in Excel 2010 are far more accurate than the old ones. The specific results of the paper are outdated. Of course, the general problem of using spreadsheets for data exploration remains.


Something rather amazing is that the major statistics suites have been moving
towards a more "spreadsheet-like" environment. I am personally a fan of
Minitab as it brings many functions that I needed for Quality control in a
previous job. The price of the software package sky-rocketed in a few years
though :(.

I'm not familiar with specialized statistical software. One problem with Calc is that users do not know how to use its functions for their purpose, for example to perform an ANOVA. So providing wizards would be helpful.


One approach could be improving our local functions to match more
demanding specifications: some of that will necessarily have to be done.
Another approach could be facilitating interactions with software like R,

https://issues.apache.org/ooo/show_bug.cgi?id=66589


and I am aware that approach has many followers. A third approach, which
I would like to suggest as a future project, would be developing a scaddin
focused on statistics and making full use of the functions from boost that
we already have available as a module but we are not using to their full
extent.

I know that Calc is really inaccurate in some corner cases, and a comparison with the solutions from boost would be good. One problem is that Calc is limited to double precision because of the MSVC compiler. As far as I know, boost uses its own types to get better precision.
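
To illustrate what such a double-precision corner case can look like, here is a minimal sketch in Python. It is not Calc's or boost's code; the function names and the data are made up for the example. It compares the one-pass "textbook" variance formula with a two-pass formula on values that have a large common offset, where the textbook formula loses all significant digits to cancellation.

```python
def variance_textbook(data):
    """One-pass formula (sum(x^2) - sum(x)^2 / n) / (n - 1).

    Illustrative only: subtracting two huge, nearly equal sums
    cancels the significant digits in double precision.
    """
    n = len(data)
    sum_x = 0.0
    sum_sq = 0.0
    for x in data:  # plain accumulation, no compensated summation
        sum_x += x
        sum_sq += x * x
    return (sum_sq - sum_x * sum_x / n) / (n - 1)


def variance_two_pass(data):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(data)
    mean = 0.0
    for x in data:
        mean += x
    mean /= n
    acc = 0.0
    for x in data:
        acc += (x - mean) ** 2
    return acc / (n - 1)


# Hypothetical data: small spread on top of a large offset.
# The exact sample variance is 30.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print("textbook:", variance_textbook(data))
print("two-pass:", variance_two_pass(data))
```

With these values the two-pass result is exactly 30, while the textbook formula is far off. A comparison against boost could be set up the same way, with known reference values for each function.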


I know we are all busy with other stuff to improve for 4.0 Release, just
thought I'd leave the idea for the future.

I did a lot of work on statistical functions under the mentorship of Eike in the past, but now I'm more interested in Draw.

Some problems that need to be solved are:
- Adapt FDIST, FINV, and TDIST to ODF
- New algorithm needed in ScInterpreter::GetBetaDist, see "FIXME" there
- Better detection of singular matrices
- Change the LINEST function to check for collinearity (Excel compatibility)
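
For the last two points, a rough sketch of the idea in Python. This is an assumption about how such a check could look, not the way Calc or Excel does it; the names `is_numerically_singular`, `has_collinear_columns` and the tolerance `rel_tol` are hypothetical. It runs Gaussian elimination with partial pivoting and reports the matrix as singular when a pivot is tiny relative to the largest entry. Collinear regressors are then detected through the normal-equations matrix of the design matrix.

```python
def is_numerically_singular(matrix, rel_tol=1e-12):
    """Return True if elimination with partial pivoting meets a pivot
    that is negligible compared with the largest matrix entry."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    scale = max(abs(v) for row in a for v in row)
    if scale == 0.0:
        return True
    for col in range(n):
        # Choose the row with the largest entry in this column as pivot.
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot_row][col]) <= rel_tol * scale:
            return True
        a[col], a[pivot_row] = a[pivot_row], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return False


def has_collinear_columns(x_columns, rel_tol=1e-12):
    """Check regressors (plus a constant column) for collinearity by
    testing the matrix X^T X of the design matrix X for singularity."""
    n_obs = len(x_columns[0])
    cols = [[1.0] * n_obs] + [list(col) for col in x_columns]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    return is_numerically_singular(gram, rel_tol)


# Second regressor is exactly twice the first one: collinear.
print(has_collinear_columns([[1, 2, 3, 4], [2, 4, 6, 8]]))
# x and x^2 are not collinear.
print(has_collinear_columns([[1, 2, 3, 4], [1, 4, 9, 16]]))
```

Forming X^T X squares the condition number, so a real implementation would probably prefer a QR or singular value decomposition of X itself; the sketch only shows the kind of test that is meant.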

Kind regards
Regina




