Re: Project idea: Calc for Statistics

Regina Henschel Thu, 06 Dec 2012 11:10:19 -0800

Hi Pedro,

Pedro Giffuni wrote:

Hi guys;


FWIW, while I was playing with the new random number generator I went
around looking for some references and I found this paper from the Journal
of Statistical Software (2010) titled "On the Numerical Accuracy of
Spreadsheets":

http://www.jstatsoft.org/v34/i04/paper


It basically shows that Calc, among other Spreadsheet programs, is not
really well suited for statistical analysis.

They use an old version of Calc. In the meantime Calc has received a lot of accuracy improvements, and the new implementations in Excel 2010 are far more accurate than the old ones. The specific results of the paper are outdated. Of course, the general problem of using spreadsheets for data exploration remains.


Something rather amazing is that the major statistics suites have been moving
towards a more "spreadsheet-like" environment. I am personally a fan of
Minitab as it brings many functions that I needed for quality control in a
previous job. The price of the software package sky-rocketed in a few years
though :(.

I'm not familiar with specialized statistical software. One problem with Calc is that users do not know how to use the functions in Calc for their purpose, for example to perform an ANOVA. So providing wizards would be helpful.


One approach could be improving our local functions to match more
demanding specifications: some of that will necessarily have to be done.
Another approach could be facilitating interactions with software like R,


https://issues.apache.org/ooo/show_bug.cgi?id=66589


and I am aware that approach has many followers. A third approach, which
I would like to suggest as a future project, would be developing a scaddin
focused on statistics and making full use of the functions from boost that
we already have available as a module but we are not using to their full
extent.

I know that Calc is really inaccurate in some corner cases, and a comparison with the solutions from boost would be good. One problem is that Calc is limited to double precision because of the MSVC compiler. As far as I know, boost uses its own types to get better precision.
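
To illustrate the kind of comparison meant here, the following is a minimal sketch in Python. It is neither Calc nor boost code: the standard `decimal` module at 50 digits merely stands in for a higher-precision reference, and the function names and the test expression exp(x) - 1 are assumptions chosen for illustration. It shows how a double-precision corner case can be measured against such a reference.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits for the reference value


def reference_expm1(x: float) -> Decimal:
    """exp(x) - 1 evaluated with 50-digit decimal arithmetic."""
    # Decimal(x) converts the double exactly, so both sides see the same input.
    return Decimal(x).exp() - 1


def relative_error(value: float, reference: Decimal) -> float:
    """Relative error of a double-precision result against the reference."""
    return float(abs(Decimal(value) - reference) / abs(reference))


x = 1e-10
ref = reference_expm1(x)

# Naive formula: subtracting 1 cancels most of the significant digits.
naive_err = relative_error(math.exp(x) - 1.0, ref)
# Dedicated library routine that avoids the cancellation.
stable_err = relative_error(math.expm1(x), ref)

print(f"naive:  {naive_err:.3e}")
print(f"stable: {stable_err:.3e}")
```

The same pattern (double-precision result versus a trusted higher-precision value) could be applied to any spreadsheet function whose corner cases are in doubt.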


I know we are all busy with other stuff to improve for 4.0 Release, just
thought I'd leave the idea for the future.

I did a lot of work on statistical functions under the mentorship of Eike in the past, but now I'm more interested in Draw.


Some problems that need to be solved are:
- Adapt FDIST, FINV, and TDIST to ODF
- New algorithm needed in ScInterpreter::GetBetaDist, see "FIXME" there
- Better detection of singular matrices
- Change the LINEST function to check for collinearity (Excel compatibility)
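
For the last two items, a rough sketch of what a collinearity check could look like is given below, in Python. This is not how Calc or Excel implement it; `numerical_rank`, `has_collinear_columns`, and the tolerance `rel_tol` are hypothetical names and values chosen for illustration. The idea is to estimate the rank of the design matrix by Gaussian elimination with full pivoting and to treat pivots below a relative tolerance as zero.

```python
def numerical_rank(matrix, rel_tol=1e-10):
    """Estimate the rank by Gaussian elimination with full pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    scale = max(abs(v) for row in a for v in row)
    if scale == 0.0:
        return 0  # all-zero matrix
    rank = 0
    for k in range(min(rows, cols)):
        # Pick the largest remaining entry as the pivot.
        pivot, pr, pc = max(
            (abs(a[i][j]), i, j)
            for i in range(k, rows)
            for j in range(k, cols)
        )
        if pivot <= rel_tol * scale:
            break  # remaining block is numerically zero
        a[k], a[pr] = a[pr], a[k]            # row swap
        for row in a:                        # column swap
            row[k], row[pc] = row[pc], row[k]
        for i in range(k + 1, rows):         # eliminate below the pivot
            factor = a[i][k] / a[k][k]
            for j in range(k, cols):
                a[i][j] -= factor * a[k][j]
        rank += 1
    return rank


def has_collinear_columns(design_matrix, rel_tol=1e-10):
    """True if the columns are (numerically) linearly dependent."""
    return numerical_rank(design_matrix, rel_tol) < len(design_matrix[0])


# Third column is the sum of the first two: collinear.
dependent = [[1, 2, 3], [2, 1, 3], [3, 5, 8], [4, 4, 8]]
# No column is a combination of the others.
independent = [[1, 2, 0], [2, 1, 1], [3, 5, 0], [4, 4, 2]]

print(has_collinear_columns(dependent))
print(has_collinear_columns(independent))
```

A square matrix could be flagged as singular in the same way, by comparing its estimated rank with its size. The choice of tolerance is the delicate part and would need tuning against real cases.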

Kind regards
Regina
