Also, I was thinking that maybe it might be useful to bring SimpleRegression into line with the other regression techniques and give the user the ability to constrain the constant to zero?
-Greg On Sat, Aug 13, 2011 at 12:13 AM, Greg Sterijevski <gsterijev...@gmail.com>wrote: > One more thing... (ala Detective Colombo). > > In add and remove observation there is a snippet which looks like: > > sumXX += dx * dx * (double) n / (n + 1d); > sumYY += dy * dy * (double) n / (n + 1d); > sumXY += dx * dy * (double) n / (n + 1d); > xbar += dx / (n + 1.0); > ybar += dy / (n + 1.0); > > Would you be against pulling out the divisions by ( n+1.0)? Maybe > something like: > final double fact1 = 1.0/((double) n + 1.0); > final double fact2 = ((double) n) * fact1; > sumXX += dx * dx * fact2; > sumYY += dy * dy * fact2; > sumXY += dx * dy * fact2; > xbar += dx * fact1; > ybar += dy * fact1; > > > I realize that most likely the compiler or runtime does this optimization, > but just in case... > > > -Greg > > > On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski <gsterijev...@gmail.com > > wrote: > >> +1 from me, on all counts! -Greg >> >> >> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.ste...@gmail.com>wrote: >> >>> On 8/12/11 7:16 PM, Greg Sterijevski wrote: >>> > Hello All, >>> > >>> > Before I chum the water with more JIRA tickets, I thought I would see >>> > whether people thought this was important. >>> > >>> > In the SimpleRegression you have two methods: >>> > >>> > public void addData(double x, double y) { >>> > ...some code that is not germane to discussion...... >>> > >>> > if (n > 2) { >>> > distribution = new TDistributionImpl(n - 2); >>> > } >>> > } >>> > >>> > public void removeData(double x, double y) { >>> > ...some code that is not germane...... >>> > >>> > if (n > 2) { >>> > distribution = new TDistributionImpl(n - 2); >>> > } >>> > } >>> > } >>> > >>> > >From the perspective of a user, you are likely to call add/remove >>> repeatedly >>> > before you ever need to check for statistical significance. Wouldn't it >>> be >>> > better to instantiate the TDistribution when it is needed? >>> > >>> > So you would have to make the following two methods a bit more >>> complicated: >>> > >>> > public double getSlopeConfidenceInterval(double alpha) >>> > throws MathException { >>> > if (alpha >= 1 || alpha <= 0) { >>> > throw new >>> > OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL, >>> > alpha, 0, 1); >>> > } >>> > if( distribution == null || distribution.getDegreesOfFreedom() >>> != >>> > n-2){ >>> > distribution = new TDistributionImpl(n - 2); >>> > } >>> > return getSlopeStdErr() * >>> > distribution.inverseCumulativeProbability(1d - alpha / 2d); >>> > } >>> > >>> > Similarly getSignificance() would have to be modified with the check of >>> the >>> > degrees of freedom of the distribution. >>> > >>> > There is nothing wrong with the current code, but making the change >>> means >>> > faster updates. >>> >>> Slightly, yes. There is not much code at all in the distribution >>> constructor, but you are correct. Moreover, I can see now that the >>> "immutability-everywhere" changes in trunk have made the code in the >>> class a little funny. In versions <=2.2, the TDistribution used by >>> the class was pluggable - i.e., there was a constructor that took at >>> TDistribution as an argument, so if for some reason you wanted to >>> use an impl different from TDistributionImpl, you could do that. >>> There was also a setter for the distribution. In addition, >>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom >>> method. So the distribution member was set at construction time and >>> the data update methods called the setter for DF on the >>> distribution. We decided to make the distributions immutable in 3.0 >>> (search the archives for discussion), so the current mods were done >>> to basically accomplish that. But the code should be cleaned up. >>> The constructor taking an int is meaningless and should be >>> deprecated or removed (unfortunately, we added that in 2.2 and >>> advertised it as a deprecation replacement for the version that took >>> a distribution as parameter. We should have realized then that it >>> was meaningless. My bad for missing that. I would favor dropping it >>> in 3.0, since even if anyone is using it, it isn't doing anything >>> meaningful for them.) Given that constructing a TDistributionImpl >>> is trivial, we might as well eliminate the member field altogether >>> and just create one when needed. If you agree and no one else >>> objects, I will make these changes. Thanks for reviewing the code. >>> >>> Phil >>> > >>> > Thoughts? >>> > >>> > -Greg >>> > >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >