On 8/12/11 10:13 PM, Greg Sterijevski wrote: > One more thing... (ala Detective Colombo). > > In add and remove observation there is a snippet which looks like: > > sumXX += dx * dx * (double) n / (n + 1d); > sumYY += dy * dy * (double) n / (n + 1d); > sumXY += dx * dy * (double) n / (n + 1d); > xbar += dx / (n + 1.0); > ybar += dy / (n + 1.0); > > Would you be against pulling out the divisions by ( n+1.0)? Maybe something > like: > final double fact1 = 1.0/((double) n + 1.0); > final double fact2 = ((double) n) * fact1; > sumXX += dx * dx * fact2; > sumYY += dy * dy * fact2; > sumXY += dx * dy * fact2; > xbar += dx * fact1; > ybar += dy * fact1; > > > I realize that most likely the compiler or runtime does this optimization, > but just in case...
Probably optimized away, but I am OK with this. Phil > > > -Greg > > On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski > <gsterijev...@gmail.com>wrote: > >> +1 from me, on all counts! -Greg >> >> >> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.ste...@gmail.com>wrote: >> >>> On 8/12/11 7:16 PM, Greg Sterijevski wrote: >>>> Hello All, >>>> >>>> Before I chum the water with more JIRA tickets, I thought I would see >>>> whether people thought this was important. >>>> >>>> In the SimpleRegression you have two methods: >>>> >>>> public void addData(double x, double y) { >>>> ...some code that is not germane to discussion...... >>>> >>>> if (n > 2) { >>>> distribution = new TDistributionImpl(n - 2); >>>> } >>>> } >>>> >>>> public void removeData(double x, double y) { >>>> ...some code that is not germane...... >>>> >>>> if (n > 2) { >>>> distribution = new TDistributionImpl(n - 2); >>>> } >>>> } >>>> } >>>> >>>> >From the perspective of a user, you are likely to call add/remove >>> repeatedly >>>> before you ever need to check for statistical significance. Wouldn't it >>> be >>>> better to instantiate the TDistribution when it is needed? >>>> >>>> So you would have to make the following two methods a bit more >>> complicated: >>>> public double getSlopeConfidenceInterval(double alpha) >>>> throws MathException { >>>> if (alpha >= 1 || alpha <= 0) { >>>> throw new >>>> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL, >>>> alpha, 0, 1); >>>> } >>>> if( distribution == null || distribution.getDegreesOfFreedom() >>> != >>>> n-2){ >>>> distribution = new TDistributionImpl(n - 2); >>>> } >>>> return getSlopeStdErr() * >>>> distribution.inverseCumulativeProbability(1d - alpha / 2d); >>>> } >>>> >>>> Similarly getSignificance() would have to be modified with the check of >>> the >>>> degrees of freedom of the distribution. >>>> >>>> There is nothing wrong with the current code, but making the change >>> means >>>> faster updates. >>> Slightly, yes. There is not much code at all in the distribution >>> constructor, but you are correct. Moreover, I can see now that the >>> "immutability-everywhere" changes in trunk have made the code in the >>> class a little funny. In versions <=2.2, the TDistribution used by >>> the class was pluggable - i.e., there was a constructor that took at >>> TDistribution as an argument, so if for some reason you wanted to >>> use an impl different from TDistributionImpl, you could do that. >>> There was also a setter for the distribution. In addition, >>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom >>> method. So the distribution member was set at construction time and >>> the data update methods called the setter for DF on the >>> distribution. We decided to make the distributions immutable in 3.0 >>> (search the archives for discussion), so the current mods were done >>> to basically accomplish that. But the code should be cleaned up. >>> The constructor taking an int is meaningless and should be >>> deprecated or removed (unfortunately, we added that in 2.2 and >>> advertised it as a deprecation replacement for the version that took >>> a distribution as parameter. We should have realized then that it >>> was meaningless. My bad for missing that. I would favor dropping it >>> in 3.0, since even if anyone is using it, it isn't doing anything >>> meaningful for them.) Given that constructing a TDistributionImpl >>> is trivial, we might as well eliminate the member field altogether >>> and just create one when needed. If you agree and no one else >>> objects, I will make these changes. Thanks for reviewing the code. >>> >>> Phil >>>> Thoughts? >>>> >>>> -Greg >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org