On 8/12/11 10:59 PM, Greg Sterijevski wrote: > Also, I was thinking that maybe it might be useful to bring SimpleRegression > into line with the other regression techniques and give the user the ability > to constrain the constant to zero?
+1 - just make sure to include clear javadoc and tests. Phil > > -Greg > > On Sat, Aug 13, 2011 at 12:13 AM, Greg Sterijevski > <gsterijev...@gmail.com>wrote: > >> One more thing... (ala Detective Colombo). >> >> In add and remove observation there is a snippet which looks like: >> >> sumXX += dx * dx * (double) n / (n + 1d); >> sumYY += dy * dy * (double) n / (n + 1d); >> sumXY += dx * dy * (double) n / (n + 1d); >> xbar += dx / (n + 1.0); >> ybar += dy / (n + 1.0); >> >> Would you be against pulling out the divisions by ( n+1.0)? Maybe >> something like: >> final double fact1 = 1.0/((double) n + 1.0); >> final double fact2 = ((double) n) * fact1; >> sumXX += dx * dx * fact2; >> sumYY += dy * dy * fact2; >> sumXY += dx * dy * fact2; >> xbar += dx * fact1; >> ybar += dy * fact1; >> >> >> I realize that most likely the compiler or runtime does this optimization, >> but just in case... >> >> >> -Greg >> >> >> On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski <gsterijev...@gmail.com >>> wrote: >>> +1 from me, on all counts! -Greg >>> >>> >>> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.ste...@gmail.com>wrote: >>> >>>> On 8/12/11 7:16 PM, Greg Sterijevski wrote: >>>>> Hello All, >>>>> >>>>> Before I chum the water with more JIRA tickets, I thought I would see >>>>> whether people thought this was important. >>>>> >>>>> In the SimpleRegression you have two methods: >>>>> >>>>> public void addData(double x, double y) { >>>>> ...some code that is not germane to discussion...... >>>>> >>>>> if (n > 2) { >>>>> distribution = new TDistributionImpl(n - 2); >>>>> } >>>>> } >>>>> >>>>> public void removeData(double x, double y) { >>>>> ...some code that is not germane...... >>>>> >>>>> if (n > 2) { >>>>> distribution = new TDistributionImpl(n - 2); >>>>> } >>>>> } >>>>> } >>>>> >>>>> >From the perspective of a user, you are likely to call add/remove >>>> repeatedly >>>>> before you ever need to check for statistical significance. Wouldn't it >>>> be >>>>> better to instantiate the TDistribution when it is needed? >>>>> >>>>> So you would have to make the following two methods a bit more >>>> complicated: >>>>> public double getSlopeConfidenceInterval(double alpha) >>>>> throws MathException { >>>>> if (alpha >= 1 || alpha <= 0) { >>>>> throw new >>>>> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL, >>>>> alpha, 0, 1); >>>>> } >>>>> if( distribution == null || distribution.getDegreesOfFreedom() >>>> != >>>>> n-2){ >>>>> distribution = new TDistributionImpl(n - 2); >>>>> } >>>>> return getSlopeStdErr() * >>>>> distribution.inverseCumulativeProbability(1d - alpha / 2d); >>>>> } >>>>> >>>>> Similarly getSignificance() would have to be modified with the check of >>>> the >>>>> degrees of freedom of the distribution. >>>>> >>>>> There is nothing wrong with the current code, but making the change >>>> means >>>>> faster updates. >>>> Slightly, yes. There is not much code at all in the distribution >>>> constructor, but you are correct. Moreover, I can see now that the >>>> "immutability-everywhere" changes in trunk have made the code in the >>>> class a little funny. In versions <=2.2, the TDistribution used by >>>> the class was pluggable - i.e., there was a constructor that took at >>>> TDistribution as an argument, so if for some reason you wanted to >>>> use an impl different from TDistributionImpl, you could do that. >>>> There was also a setter for the distribution. In addition, >>>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom >>>> method. So the distribution member was set at construction time and >>>> the data update methods called the setter for DF on the >>>> distribution. We decided to make the distributions immutable in 3.0 >>>> (search the archives for discussion), so the current mods were done >>>> to basically accomplish that. But the code should be cleaned up. >>>> The constructor taking an int is meaningless and should be >>>> deprecated or removed (unfortunately, we added that in 2.2 and >>>> advertised it as a deprecation replacement for the version that took >>>> a distribution as parameter. We should have realized then that it >>>> was meaningless. My bad for missing that. I would favor dropping it >>>> in 3.0, since even if anyone is using it, it isn't doing anything >>>> meaningful for them.) Given that constructing a TDistributionImpl >>>> is trivial, we might as well eliminate the member field altogether >>>> and just create one when needed. If you agree and no one else >>>> objects, I will make these changes. Thanks for reviewing the code. >>>> >>>> Phil >>>>> Thoughts? >>>>> >>>>> -Greg >>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org