As soon as you push the other changes, I will update my source and get to work on this... I thought SimpleRegression was a toy example, but after thinking about it, there is much to like about the class.
-Greg On Sat, Aug 13, 2011 at 1:16 AM, Phil Steitz <phil.ste...@gmail.com> wrote: > On 8/12/11 10:59 PM, Greg Sterijevski wrote: > > Also, I was thinking that maybe it might be useful to bring > SimpleRegression > > into line with the other regression techniques and give the user the > ability > > to constrain the constant to zero? > > +1 - just make sure to include clear javadoc and tests. > > Phil > > > > -Greg > > > > On Sat, Aug 13, 2011 at 12:13 AM, Greg Sterijevski > > <gsterijev...@gmail.com>wrote: > > > >> One more thing... (ala Detective Colombo). > >> > >> In add and remove observation there is a snippet which looks like: > >> > >> sumXX += dx * dx * (double) n / (n + 1d); > >> sumYY += dy * dy * (double) n / (n + 1d); > >> sumXY += dx * dy * (double) n / (n + 1d); > >> xbar += dx / (n + 1.0); > >> ybar += dy / (n + 1.0); > >> > >> Would you be against pulling out the divisions by ( n+1.0)? Maybe > >> something like: > >> final double fact1 = 1.0/((double) n + 1.0); > >> final double fact2 = ((double) n) * fact1; > >> sumXX += dx * dx * fact2; > >> sumYY += dy * dy * fact2; > >> sumXY += dx * dy * fact2; > >> xbar += dx * fact1; > >> ybar += dy * fact1; > >> > >> > >> I realize that most likely the compiler or runtime does this > optimization, > >> but just in case... > >> > >> > >> -Greg > >> > >> > >> On Fri, Aug 12, 2011 at 11:40 PM, Greg Sterijevski < > gsterijev...@gmail.com > >>> wrote: > >>> +1 from me, on all counts! -Greg > >>> > >>> > >>> On Fri, Aug 12, 2011 at 11:30 PM, Phil Steitz <phil.ste...@gmail.com > >wrote: > >>> > >>>> On 8/12/11 7:16 PM, Greg Sterijevski wrote: > >>>>> Hello All, > >>>>> > >>>>> Before I chum the water with more JIRA tickets, I thought I would see > >>>>> whether people thought this was important. > >>>>> > >>>>> In the SimpleRegression you have two methods: > >>>>> > >>>>> public void addData(double x, double y) { > >>>>> ...some code that is not germane to discussion...... > >>>>> > >>>>> if (n > 2) { > >>>>> distribution = new TDistributionImpl(n - 2); > >>>>> } > >>>>> } > >>>>> > >>>>> public void removeData(double x, double y) { > >>>>> ...some code that is not germane...... > >>>>> > >>>>> if (n > 2) { > >>>>> distribution = new TDistributionImpl(n - 2); > >>>>> } > >>>>> } > >>>>> } > >>>>> > >>>>> >From the perspective of a user, you are likely to call add/remove > >>>> repeatedly > >>>>> before you ever need to check for statistical significance. Wouldn't > it > >>>> be > >>>>> better to instantiate the TDistribution when it is needed? > >>>>> > >>>>> So you would have to make the following two methods a bit more > >>>> complicated: > >>>>> public double getSlopeConfidenceInterval(double alpha) > >>>>> throws MathException { > >>>>> if (alpha >= 1 || alpha <= 0) { > >>>>> throw new > >>>>> OutOfRangeException(LocalizedFormats.SIGNIFICANCE_LEVEL, > >>>>> alpha, 0, 1); > >>>>> } > >>>>> if( distribution == null || > distribution.getDegreesOfFreedom() > >>>> != > >>>>> n-2){ > >>>>> distribution = new TDistributionImpl(n - 2); > >>>>> } > >>>>> return getSlopeStdErr() * > >>>>> distribution.inverseCumulativeProbability(1d - alpha / > 2d); > >>>>> } > >>>>> > >>>>> Similarly getSignificance() would have to be modified with the check > of > >>>> the > >>>>> degrees of freedom of the distribution. > >>>>> > >>>>> There is nothing wrong with the current code, but making the change > >>>> means > >>>>> faster updates. > >>>> Slightly, yes. There is not much code at all in the distribution > >>>> constructor, but you are correct. Moreover, I can see now that the > >>>> "immutability-everywhere" changes in trunk have made the code in the > >>>> class a little funny. In versions <=2.2, the TDistribution used by > >>>> the class was pluggable - i.e., there was a constructor that took at > >>>> TDistribution as an argument, so if for some reason you wanted to > >>>> use an impl different from TDistributionImpl, you could do that. > >>>> There was also a setter for the distribution. In addition, > >>>> TDistributionImpl itself was mutable, exposing a setDegreesOfFreedom > >>>> method. So the distribution member was set at construction time and > >>>> the data update methods called the setter for DF on the > >>>> distribution. We decided to make the distributions immutable in 3.0 > >>>> (search the archives for discussion), so the current mods were done > >>>> to basically accomplish that. But the code should be cleaned up. > >>>> The constructor taking an int is meaningless and should be > >>>> deprecated or removed (unfortunately, we added that in 2.2 and > >>>> advertised it as a deprecation replacement for the version that took > >>>> a distribution as parameter. We should have realized then that it > >>>> was meaningless. My bad for missing that. I would favor dropping it > >>>> in 3.0, since even if anyone is using it, it isn't doing anything > >>>> meaningful for them.) Given that constructing a TDistributionImpl > >>>> is trivial, we might as well eliminate the member field altogether > >>>> and just create one when needed. If you agree and no one else > >>>> objects, I will make these changes. Thanks for reviewing the code. > >>>> > >>>> Phil > >>>>> Thoughts? > >>>>> > >>>>> -Greg > >>>>> > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>> > >>>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >