A couple of general goals behind the aggregate package:

1. If you are an application developer using the aggregate package, you only need to develop your own (user-defined) value aggregator descriptor classes, which are typically subclasses of ValueAggregatorDescriptor. You can use the existing aggregator types (such as LongValueSum, ValueHistogram, etc.).
2. If you want to contribute new types of aggregator (for example, a ValueAverage class that keeps track of the average of values would be a much-needed one), then you need to write a class that implements ValueAggregator, and update the generateValueAggregator method of ValueAggregatorBaseDescriptor to handle your new aggregator.

3. If you want to contribute to the aggregate framework itself, you may need to touch every bit of the code in the package.

Runping

On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein <[email protected]> wrote:

> Hello all,
>
> I've been using streaming + the aggregate package (available via -reducer
> aggregate), and have been very happy with what it gives me.
>
> I'm interested in writing my own new aggregate functions (in Java) which I
> could then access from my streaming code.
>
> Can anyone give me pointers towards how to make that happen? I've read
> through the aggregate package source, but I'm not seeing how to define my
> own, and get access to it from streaming.
>
> To be specific, here's the sort of thing I'd like to be able to do:
>
> - In Java, define a SampleValues aggregator, which chooses a sample of the
> input given to it
>
> - From my streaming program, in say python, output:
>
> SampleValues:some_key \t some_value
>
> - Have the aggregate framework somehow call my new aggregator for the
> combiner and reducer steps
>
> Thanks,
> -Dan Milstein
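
To make point 2 of the reply a little more concrete, here is a rough sketch of what a ValueAverage aggregator might look like. It is not code from the package: AggregatorSketch is a hypothetical stand-in that only mirrors the general shape of ValueAggregator (add a value, reset, report a result, emit combiner output), so the example compiles without Hadoop on the classpath. The "sum:count" encoding of partial results is also an assumption made for illustration; check the real interface in your Hadoop version before adapting it.

```java
import java.util.ArrayList;

// Hypothetical stand-in for the ValueAggregator contract; NOT the Hadoop type.
interface AggregatorSketch {
    void addNextValue(Object val);          // feed one value (or one partial result)
    void reset();                           // clear state before the next key
    String getReport();                     // final value emitted by the reducer
    ArrayList<String> getCombinerOutput();  // partial state emitted by the combiner
}

// Sketch of an averaging aggregator. An average cannot be combined directly,
// so the combiner emits "sum:count" and the reducer merges those partials.
class ValueAverage implements AggregatorSketch {
    private double sum = 0.0;
    private long count = 0;

    @Override
    public void addNextValue(Object val) {
        String s = val.toString();
        int sep = s.indexOf(':');
        if (sep >= 0) {
            // Partial result from a combiner, in the assumed "sum:count" form.
            sum += Double.parseDouble(s.substring(0, sep));
            count += Long.parseLong(s.substring(sep + 1));
        } else {
            // Raw value straight from the mapper output.
            sum += Double.parseDouble(s);
            count += 1;
        }
    }

    @Override
    public void reset() {
        sum = 0.0;
        count = 0;
    }

    @Override
    public String getReport() {
        // Report 0.0 for an empty group rather than dividing by zero.
        return count == 0 ? "0.0" : String.valueOf(sum / count);
    }

    @Override
    public ArrayList<String> getCombinerOutput() {
        ArrayList<String> out = new ArrayList<>();
        out.add(sum + ":" + count);
        return out;
    }
}
```

In the real package, the remaining step described in point 2 would be to teach generateValueAggregator in ValueAggregatorBaseDescriptor to return the new class for a type name such as "ValueAverage", so that a streaming program could presumably emit lines of the form ValueAverage:some_key \t some_value.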
