On 7/12/11 12:12 PM, Greg Sterijevski wrote:
> All,
>
> So I included the Wampler data in the test suite. The interesting thing is
> that to get clean runs I need wider tolerances with OLSMultipleRegression
> than with the version of the Miller algorithm I am coding up.

This is good for your Miller impl, not so good for OLSMultipleRegression.

> Perhaps we should come to a consensus on what good enough is? How close do
> we want to be? Should we require passing on all of NIST's 'hard' problems?
> (for all regression techniques that get cooked up)

The goal should be to match all of the displayed digits in the reference
data. When we can't do that, we should try to understand why and aim, if
possible, to improve the impls. As we improve the code, the tolerances in
the tests can be tightened. Characterization of the types of models where
the different implementations do well or poorly is another thing we should
aim for (and include in the javadoc). As with all reference validation
tests, we need to keep in mind that a) the "hard" examples are designed to
be numerically unstable and b) conversely, a handful of examples does not
really demonstrate correctness.
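One way to make "match all of the displayed digits" concrete in a test is to measure digits of agreement between a computed coefficient and the certified value. A minimal sketch below — the class name, helper method, and the sample numbers are illustrative, not from Commons Math or the actual Wampler certified values:

```java
// Hedged sketch: estimate how many significant decimal digits a computed
// value shares with a certified reference value. The names and numbers
// here are hypothetical, for illustration only.
public class DigitsOfAgreement {

    /** Approximate count of significant decimal digits on which the
     *  estimate agrees with the certified value (via relative error). */
    static double agreedDigits(double estimate, double certified) {
        if (estimate == certified) {
            return 15.0; // effectively full double precision
        }
        double relErr = Math.abs(estimate - certified) / Math.abs(certified);
        return -Math.log10(relErr);
    }

    public static void main(String[] args) {
        double certified = 1.0;        // hypothetical certified coefficient
        double estimate = 1.0 + 1e-8;  // hypothetical computed coefficient
        System.out.println("digits of agreement ~ "
                + agreedDigits(estimate, certified));
    }
}
```

A test could then require, say, at least as many digits of agreement as NIST displays for that dataset, and that threshold is the tolerance we would tighten as the implementations improve.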
Phil

> -Greg