Thanks for the response, Phil. This question has been on my mind for a month
or so. I was working on some Student t approximators (hence my check-in of
the NIST data for Student t), when I noticed that the commons Student t was
based on the continued fraction version of the incomplete beta formulation.
Then I updated my local working copy and noticed you had made some changes
to the Normal distribution. I looked at that implementation; it is the one
based on the error function. There is nothing wrong with the choices that
were made, and I am sure your error function is awesome. However, my
experience has been that, especially in the tails, you want other
approximators (typically some variation on a power series).
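
To make the tail point concrete, here is the shape of the thing I have in
mind, written against the standard normal so it stays self-contained. This
is only a sketch (the class name, the cutoff, and the term count are mine,
not anything in [math]); the central region uses the Abramowitz & Stegun
7.1.26 rational approximation to erf, and the far tail switches to the
usual Mills-ratio asymptotic series:

/**
 * Sketch only -- not the [math] implementation. A standard normal
 * upper tail Q(x) = 1 - Phi(x) that switches approximators by region.
 * The cutoff and the number of series terms are for illustration.
 */
public final class NormalTailSketch {

    /** Upper-tail probability for the standard normal, for x >= 0. */
    public static double upperTail(double x) {
        if (x < 8.0) {
            // Central region: an erf-based evaluation is fine here.
            // Abramowitz & Stegun 7.1.26 rational approximation to
            // erf(z) with z = x / sqrt(2), good to about 1.5e-7.
            double z = x / Math.sqrt(2.0);
            double t = 1.0 / (1.0 + 0.3275911 * z);
            double poly = t * (0.254829592 + t * (-0.284496736
                    + t * (1.421413741 + t * (-1.453152027
                    + t * 1.061405429))));
            double erf = 1.0 - poly * Math.exp(-z * z);
            return 0.5 * (1.0 - erf);
        }
        // Far tail: 1 - Phi(x) computed directly cancels catastrophically
        // in double precision, so switch to the asymptotic (Mills ratio)
        // series Q(x) ~ phi(x)/x * (1 - 1/x^2 + 3/x^4 - 15/x^6 + ...).
        double phi = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
        double u = 1.0 / (x * x);
        double series = 1.0 - u * (1.0 - 3.0 * u * (1.0 - 5.0 * u));
        return phi / x * series;
    }
}

The same shape would apply to the Student t: one formulation in the middle,
a different expansion out in the tails.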
As for parallels, I think you are correct: the best one is the commons
implementation scheme for random number generators. As I was looking back,
it struck me as odd that the old SVD was ditched. Again, I am not saying it
did not deserve it; I just wanted to make sure I understood the philosophy
behind the decision making.

Thank you,

-Greg

On Thu, Sep 1, 2011 at 1:11 AM, Phil Steitz <phil.ste...@gmail.com> wrote:
> On 8/31/11 10:05 PM, Greg Sterijevski wrote:
> > Hello All,
> >
> > This question popped into my head this evening: what is the right way
> > to handle multiple algorithms which purport to calculate the same
> > thing? There are, for example, a couple of ways to calculate the
> > Student t cdf. What is the commons' philosophy on deciding:
> >
> > 1. Whether to allow multiple algorithms.
> > 2. How an algorithm is included.
> >    a.) Does a 'bug' or shortcoming need to be shown in the current
> >        implementation?
> >    b.) Say that algorithm a works for one numerical range and b works
> >        best on another. Are both included? Is a new 'meta' algorithm
> >        written which mixes both a and b?
> > 3. Does simplicity count?
> > 4. Does speed matter?
> >
> > A while back, Chris Nix reimplemented the SVD routine. I am not sure I
> > remember the old routine, so I cannot say there was anything worth
> > keeping there. However, was there a conscious decision to scrap it?
> > Why not have it live side by side with the new one? (Again, I am not
> > saying the old algorithm was better; Chris' contribution definitely
> > was valuable.) I think we will run into these issues often.
> >
> > Thoughts? If this has been discussed already, my apologies.
>
> Well, at a high level, we tried to settle this at the very beginning.
> Have a look at items 3 and 4 in the "guiding principles" on the [math]
> main page :) That stuff comes from the original proposal for [math] and
> we have tried to stay more or less faithful to the principles laid out
> there.
>
> The integration, ode and solvers packages all try to do exactly what
> you are suggesting - when multiple algorithms, or even variations on an
> algorithm, exist and no single one can really do the job for all
> practical use cases, we welcome and incorporate implementations of
> multiple different ones. How we do that varies by package, and this is
> one place where our "interface pain" has led to some trauma. Initially,
> we pretty much uniformly separated interfaces from implementation
> precisely for this reason. You can still see that in the distributions
> package, and while it is a little ironic, since we have been talking
> about collapsing the interfaces into the impls there, the setup now
> supports multiple different implementations of any of the
> distributions.
>
> I don't want to turn this into an abstract discussion of how best to
> support the strategy pattern in [math]. I think the "how to do it" part
> depends on the problem / package. What we have tried to do so far, and
> what I think we should keep trying to do, is:
>
> 0) If there is a single best algorithm that really does work well in
> almost all cases, make that "the obvious thing" and try to make our
> implementation of it as robust as possible. If we provide only one
> implementation, try to make it the best one. I would put
> SimpleRegression or Variance, for example, in this category.
>
> 1) When multiple different standard algorithms exist, make sure our API
> supports adding alternatives - including user-supplied alternatives -
> in the least confusing way we can think of.
> Welcome contributions of the alternatives and try to document as best
> we can how and when to use the different implementations. Try to stick
> to standard algorithms, with good external references, so we don't have
> to turn our javadoc and/or user guide into a numerical analysis
> textbook.
>
> 2) Whenever possible, try to encapsulate the part that varies for
> different implementations, so the API remains simple and the variable
> part can be "plugged in." Good examples of this are the
> RandomGenerators or the Solvers used by different classes.
>
> The general question is a good one, but the specific answer depends on
> the algorithms and classes involved. In the two cases that you mention
> (t distribution cdf and SVD), we have to balance API complexity and
> more code to support against practical value. SVD has been a big
> challenge for us, so I would say we should start at step 0) on that
> one; but I could easily be talked into supporting multiple impls given
> strong numerical arguments and multiple good, robust impls. For the t
> distribution, I am not so sure. A separate thread for that is probably
> best. My intuition there is that, like the other distributions, we and
> our users are best off with a single impl that does whatever it needs
> to do in different ranges to provide robust estimates, possibly
> allowing configuration settings to trade performance for accuracy.
>
> Phil
>
> > -Greg
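
To make my 2.b concrete, here is the sort of thing I was picturing, in the
plug-in style Phil describes under 1) and 2). Everything below is a
hypothetical sketch, not the actual [math] API, and the two wired-in
approximations are crude placeholders (the logistic curve for the center,
the NormalTailSketch class from earlier in this mail for the extremes):

/** Hypothetical interface: the part that varies, pulled out per 2). */
interface CdfAlgorithm {
    /** Returns P(X <= x). */
    double cumulativeProbability(double x);
}

/** The 'meta' algorithm of my 2.b: mixes algorithms a and b by range. */
final class RangeSwitchingCdf implements CdfAlgorithm {
    private final CdfAlgorithm central; // algorithm a: best near the mean
    private final CdfAlgorithm tail;    // algorithm b: best in the tails
    private final double cutoff;        // |x| at which to switch over

    RangeSwitchingCdf(CdfAlgorithm central, CdfAlgorithm tail, double cutoff) {
        this.central = central;
        this.tail = tail;
        this.cutoff = cutoff;
    }

    public double cumulativeProbability(double x) {
        CdfAlgorithm chosen = Math.abs(x) < cutoff ? central : tail;
        return chosen.cumulativeProbability(x);
    }

    public static void main(String[] args) {
        // Wiring with placeholder implementations. The logistic curve
        // 1/(1 + exp(-1.702 x)) is a well-known crude approximation to
        // the standard normal cdf; the tail path delegates to the
        // NormalTailSketch class defined above.
        CdfAlgorithm a = new CdfAlgorithm() {
            public double cumulativeProbability(double x) {
                return 1.0 / (1.0 + Math.exp(-1.702 * x));
            }
        };
        CdfAlgorithm b = new CdfAlgorithm() {
            public double cumulativeProbability(double x) {
                double q = NormalTailSketch.upperTail(Math.abs(x));
                return x >= 0.0 ? 1.0 - q : q;
            }
        };
        CdfAlgorithm cdf = new RangeSwitchingCdf(a, b, 4.0);
        System.out.println(cdf.cumulativeProbability(1.0)); // central path
        System.out.println(cdf.cumulativeProbability(6.5)); // tail path
    }
}

The user-supplied case in Phil's 1) would then just mean accepting a
CdfAlgorithm in a constructor somewhere.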