OK, I have created a patch. I tried to follow the instructions for filing a bug in Bugzilla, but I can't seem to find the right place to file a new bug against either Commons or Commons Math.
I wonder if someone could help me out.

/b

On 9/29/07, Phil Steitz <[EMAIL PROTECTED]> wrote:
>
> On 9/22/07, Bradford Cross <[EMAIL PROTECTED]> wrote:
> > Greetings!
> >
> > Recently I stumbled into the Commons math project; nice design, good
> > abstractions, "smart updates" and even unit tests! :-)
>
> Thanks!
>
> > The smart updates are a key feature for event stream processing / time
> > series simulation. The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
>
> That functionality actually already exists in the
> DescriptiveStatistics class. You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl. See
> http://commons.apache.org/math/userguide/stat.html
>
> > I was very happy to see this as an item on the wish list.
>
> The wishlist item is not as clear as it could be. Sorry about that.
> In addition to the computations in DescriptiveStatistics that require
> that you maintain all of the values in the current window in memory,
> we also support "storeless" computation of statistics that can be
> computed in one pass through the data. This allows very large data
> streams to be handled with fixed storage overhead. I think that what
> the wishlist item refers to is something in between - ways to support
> the window concept without storing all of the data. Strictly
> speaking, this is impossible, but doing things like sampling from the
> streams, periodically resetting or maintaining arrays of storeless
> stats with different offsets would in theory be possible.
>
> > A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a Java time
> > series simulation engine and we are considering the commons math as the
> > base of our numerical libraries. In order to do this we need to complete
> > the rolling calculations, so here is our first spike (spike means
> > prototype that can be thrown away / not a real patch.) We thought we
> > would start with an easy case; mean, which uses sum.
> >
> > We have already combined the rolling calculations with the smart update
> > algorithms before in the numerical libraries for our previous time
> > series simulation engine. As you have mentioned in the wish list notes,
> > our past experience is that some of the algorithms cannot avoid using
> > queues in the rolling-update case. Obviously it is something pretty
> > fundamental to the design and requires a bit of work across a lot of
> > places to do this for all the statistics (at least starting with
> > summary statistics.)
> >
> > Please give feedback on the design, any issues with performance (better
> > data structure than the queue we used), etc!
> >
> > If the community is OK with this initial spike, then we can start
> > submitting patches. :-)
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though. First, it uses generics and the
> concurrent package, which requires JDK 1.5, and our current minimum JDK
> level is 1.3. That could probably be eliminated fairly easily,
> though. The second is really whether or not the queue implementation
> is going to improve performance over the ResizableDoubleArray store
> that DescriptiveStatisticsImpl uses now. If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation. Otherwise, it's probably better to use
> ResizableDoubleArray.
>
> I am +1 on adding a RollingStatistic abstract base class (would prefer
> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics. This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container. Currently this is the only way to do it. So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply - the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats. If I understand
> correctly the idea of what you mean by Sum and Mean (using
> constructor arguments to determine whether or not the statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.
>
> One more thing. It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement. Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms. Then you can start
> submitting patches with attachments to Jira tickets.
>
> Thanks!
>
> Phil
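
For reference, here is a minimal sketch of the rolling-mean idea discussed above. It is not the spike itself and not commons-math code: the class name RollingMean and its methods are hypothetical, chosen only for illustration. It assumes a plain double[] used as a circular buffer in place of a queue, so it needs neither generics nor the concurrent package.

```java
/**
 * Hypothetical sketch of a rolling mean over a fixed-size window.
 * Not part of commons-math; all names here are illustrative only.
 */
final class RollingMean {
    private final double[] window; // circular buffer holding the last N values
    private int next = 0;          // slot the next value will be written to
    private int count = 0;         // values currently stored (<= window.length)
    private double sum = 0.0;      // running sum of the stored values

    RollingMean(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        window = new double[windowSize];
    }

    /** Adds a value, evicting the oldest one once the window is full. */
    void add(double value) {
        if (count == window.length) {
            sum -= window[next]; // when full, 'next' points at the oldest value
        } else {
            count++;
        }
        window[next] = value;
        sum += value;
        next = (next + 1) % window.length;
    }

    /** Returns the mean of the values in the window, or NaN if it is empty. */
    double getMean() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

Each update is O(1), but the whole window is still held in memory, which matches the point above that a window cannot be supported without storing its values. One caveat of this sketch: the running sum can accumulate rounding error over a long stream, so recomputing it from the buffer now and then may be worthwhile.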