Re: [math] SimpleRegression

Greg Sterijevski Fri, 12 Aug 2011 22:13:41 -0700

One more thing... (ala Detective Colombo).

In add and remove observation there is a snippet which looks like:


            sumXX += dx * dx * (double) n / (n + 1d);
            sumYY += dy * dy * (double) n / (n + 1d);
            sumXY += dx * dy * (double) n / (n + 1d);
            xbar += dx / (n + 1.0);
            ybar += dy / (n + 1.0);

Would you be against pulling out the divisions by ( n+1.0)?  Maybe something
like:
           final double fact1 = 1.0/((double) n + 1.0);
           final double fact2 = ((double) n) * fact1;
            sumXX += dx * dx * fact2;
            sumYY += dy * dy * fact2;
            sumXY += dx * dy * fact2;
            xbar += dx * fact1;
            ybar += dy * fact1;


I realize that most likely the compiler or runtime does this optimization,
but just in case...


-Greg

On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski
<gsterijev...@gmail.com>wrote:

> +1 from me, on all counts! -Greg
>
>
> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.ste...@gmail.com>wrote:
>
>> On 8/12/11 7:16 PM, Greg Sterijevski wrote:
>> > Hello All,
>> >
>> > Before I chum the water with more JIRA tickets, I thought I would see
>> > whether people thought this was important.
>> >
>> > In the SimpleRegression you have two methods:
>> >
>> > public void addData(double x, double y) {
>> >   ...some code that is not germane to discussion......
>> >
>> >         if (n > 2) {
>> >             distribution = new TDistributionImpl(n - 2);
>> >         }
>> >     }
>> >
>> >     public void removeData(double x, double y) {
>> >   ...some code that is not germane......
>> >
>> >             if (n > 2) {
>> >                 distribution = new TDistributionImpl(n - 2);
>> >             }
>> >         }
>> >     }
>> >
>> > >From the perspective of a user, you are likely to call add/remove
>> repeatedly
>> > before you ever need to check for statistical significance. Wouldn't it
>> be
>> > better to instantiate the TDistribution when it is needed?
>> >
>> > So you would have to make the following two methods a bit more
>> complicated:
>> >
>> >  public double getSlopeConfidenceInterval(double alpha)
>> >         throws MathException {
>> >         if (alpha >= 1 || alpha <= 0) {
>> >             throw new
>> > OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL,
>> >                                           alpha, 0, 1);
>> >         }
>> >         if( distribution == null || distribution.getDegreesOfFreedom()
>> !=
>> > n-2){
>> >           distribution = new TDistributionImpl(n - 2);
>> > }
>> >         return getSlopeStdErr() *
>> >             distribution.inverseCumulativeProbability(1d - alpha / 2d);
>> >     }
>> >
>> > Similarly getSignificance() would have to be modified with the check of
>> the
>> > degrees of freedom of the distribution.
>> >
>> > There is nothing wrong with the current code, but making the change
>> means
>> > faster updates.
>>
>> Slightly, yes.  There is not much code at all in the distribution
>> constructor, but you are correct.  Moreover, I can see now that the
>> "immutability-everywhere" changes in trunk have made the code in the
>> class a little funny.  In versions <=2.2, the TDistribution used by
>> the class was pluggable - i.e., there was a constructor that took at
>> TDistribution as an argument, so if for some reason you wanted to
>> use an impl different from TDistributionImpl, you could do that.
>> There was also a setter for the distribution. In addition,
>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom
>> method.  So the distribution member was set at construction time and
>> the data update methods called the setter for DF on the
>> distribution.  We decided to make the distributions immutable in 3.0
>> (search the archives for discussion), so the current mods were done
>> to basically accomplish that.  But the code should be cleaned up.
>> The constructor taking an int is meaningless and should be
>> deprecated or removed (unfortunately, we added that in 2.2 and
>> advertised it as a deprecation replacement for the version that took
>> a distribution as parameter.  We should have realized then that it
>> was meaningless.  My bad for missing that. I would favor dropping it
>> in 3.0, since even if anyone is using it, it isn't doing anything
>> meaningful for them.)  Given that constructing a TDistributionImpl
>> is trivial, we might as well eliminate the member field altogether
>> and just create one when needed.  If you agree and no one else
>> objects, I will make these changes.  Thanks for reviewing the code.
>>
>> Phil
>> >
>> > Thoughts?
>> >
>> > -Greg
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>
>

Re: [math] SimpleRegression

Reply via email to