On 8/12/11 7:16 PM, Greg Sterijevski wrote:
> Hello All,
>
> Before I chum the water with more JIRA tickets, I thought I would see
> whether people thought this was important.
>
> In the SimpleRegression you have two methods:
>
> public void addData(double x, double y) {
>   ...some code that is not germane to discussion......
>
>         if (n > 2) {
>             distribution = new TDistributionImpl(n - 2);
>         }
>     }
>
>     public void removeData(double x, double y) {
>   ...some code that is not germane......
>
>             if (n > 2) {
>                 distribution = new TDistributionImpl(n - 2);
>             }
>         }
>     }
>
> >From the perspective of a user, you are likely to call add/remove repeatedly
> before you ever need to check for statistical significance. Wouldn't it be
> better to instantiate the TDistribution when it is needed?
>
> So you would have to make the following two methods a bit more complicated:
>
>  public double getSlopeConfidenceInterval(double alpha)
>         throws MathException {
>         if (alpha >= 1 || alpha <= 0) {
>             throw new
> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
>                                           alpha, 0, 1);
>         }
>         if( distribution == null || distribution.getDegreesOfFreedom() !=
> n-2){
>           distribution = new TDistributionImpl(n - 2);
> }
>         return getSlopeStdErr() *
>             distribution.inverseCumulativeProbability(1d - alpha / 2d);
>     }
>
> Similarly getSignificance() would have to be modified with the check of the
> degrees of freedom of the distribution.
>
> There is nothing wrong with the current code, but making the change means
> faster updates.

Slightly, yes.  There is not much code at all in the distribution
constructor, but you are correct.  Moreover, I can see now that the
"immutability-everywhere" changes in trunk have made the code in the
class a little funny.  In versions <=2.2, the TDistribution used by
the class was pluggable - i.e., there was a constructor that took at
TDistribution as an argument, so if for some reason you wanted to
use an impl different from TDistributionImpl, you could do that. 
There was also a setter for the distribution. In addition,
TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom
method.  So the distribution member was set at construction time and
the data update methods called the setter for DF on the
distribution.  We decided to make the distributions immutable in 3.0
(search the archives for discussion), so the current mods were done
to basically accomplish that.  But the code should be cleaned up. 
The constructor taking an int is meaningless and should be
deprecated or removed (unfortunately, we added that in 2.2 and
advertised it as a deprecation replacement for the version that took
a distribution as parameter.  We should have realized then that it
was meaningless.  My bad for missing that. I would favor dropping it
in 3.0, since even if anyone is using it, it isn't doing anything
meaningful for them.)  Given that constructing a TDistributionImpl
is trivial, we might as well eliminate the member field altogether
and just create one when needed.  If you agree and no one else
objects, I will make these changes.  Thanks for reviewing the code.

Phil
>
> Thoughts?
>
> -Greg
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to