On Thu, Dec 6, 2012 at 10:57 AM, Pedro Giffuni <p...@apache.org> wrote:
> Hi guys;
>
> FWIW, while I was playing with the new random number generator I went
> around looking for some references and I found this paper from the Journal
> of Statistical Software (2010) titled "On the Numerical Accuracy of
> Spreadsheets":
>
> http://www.jstatsoft.org/v34/i04/paper
>

Two other relevant papers:

http://arc.nucapt.northwestern.edu/~karnesky/sdarticle.pdf
http://www.csdassn.org/software_reports/gnumeric.pdf

> It basically shows that Calc, among other Spreadsheet programs, is not
> really well suited for statistical analysis.
>
> Something rather amazing is that the major statistic suites have been moving
> towards a more "spreadsheet-like" environment. I am personally a fan of
> Minitab as it brings many functions that I needed for Quality control in a
> previous job. The price of the software package sky-rocketed in few years
> though :(.
>
> One approach could be improving our local functions to match more
> demanding specifications: some of that will necessarily have to be done.
> Another approach could be facilitating interactions with software like R,
> and I am aware that approach has many followers. A third approach, which
> I would like to suggest as a future project, would be developing a scaddin
> focused on statistics and making full use of the functions from boost that
> we already have available as a module but we are not using to their full
> extent.
>

So two entirely different questions:

1) Improving the accuracy of the statistical (and other numerical)
methods we already have.

2) Extending the range of numerical methods we provide out-of-the-box.

I think #1 is a no-brainer, but it does require some expertise. The
hard part is determining whether we have improved. For most problems
we probably already get the same results as SPSS, R or other standard
statistical packages. To really make an improvement we need to test
the edge cases, the "poorly conditioned" and more complex cases.

For #2, it probably makes sense to define a bridge to R. R is now
the standard and there are hundreds of libraries that extend the
environment. You can call R routines from SAS or SPSS. I just got
the new Mathematica 9 upgrade, and guess what? They've now added the
ability to call R.
So some seamless way of calling R routines and embedding R plots in
Calc would be great.

-Rob

> I know we are all busy with other stuff to improve for 4.0 Release, just
> thought I'd leave the idea for the future.
>
> cheers,
>
> Pedro.
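
As a rough illustration of the kind of "poorly conditioned" test case
meant under #1, here is a minimal Python sketch. It compares a one-pass
"textbook" variance formula with a two-pass one on data whose mean is
huge relative to its spread. The data set and the function names are
made up for this example; nothing here is taken from the Calc sources
or from the papers linked above.

```python
import statistics


def naive_variance(xs):
    """One-pass textbook formula: (sum(x^2) - n*mean^2) / (n - 1).

    Subtracts two nearly equal, very large numbers, so it loses
    most of its significant digits on data like DATA below.
    """
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean * mean) / (n - 1)


def two_pass_variance(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical test data: large offset, small spread.
# The exact sample variance is 30.
DATA = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

if __name__ == "__main__":
    print("naive     :", naive_variance(DATA))
    print("two-pass  :", two_pass_variance(DATA))
    print("reference :", statistics.variance(DATA))
```

On well-behaved data both formulas agree to many digits, which is why
a comparison on "typical" inputs tells us little. A regression suite
for #1 would need cases like this one, checked against a trusted
reference.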