Runping,
Thanks for the response. A question about case (2) below (which is,
in fact, what I want to do):
- Is there any way to do this without patching the code within the
aggregate package?
It sure doesn't look like it, but just to make sure.
Thanks again,
-Dan M
On Apr 24, 2009, at 12:56 PM, Runping Qi wrote:
A couple of general goals behind the aggregate package:
1. If you are an application developer using the aggregate package,
you only need to develop your own (user-defined) value aggregator
descriptor classes, which are typically subclasses of
ValueAggregatorDescriptor. You can use the existing aggregator types
(such as LongValueSum, ValueHistogram, etc.).
2. If you want to contribute a new type of aggregator (for example, a
ValueAverage class that keeps track of the average of values would be
a much-needed one), then you need to write a class that implements
the ValueAggregator interface, and update the generateValueAggregator
method of ValueAggregatorBaseDescriptor to handle your new aggregator.
3. If you want to contribute to the aggregate framework itself, you
may need to touch every bit of the code in the package.
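
As a rough illustration of case 2, here is a minimal sketch of what a
ValueAverage-style aggregator might look like. The AggregatorSketch
interface is a hypothetical local stand-in, modelled loosely on the
method names of ValueAggregator, so that the example compiles without
Hadoop on the classpath; check the real interface in your Hadoop
version before adapting it. The "sum,count" partial-result format is
also an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the aggregate package's ValueAggregator;
// declared locally so this sketch is self-contained.
interface AggregatorSketch {
    void addNextValue(Object val);
    void reset();
    String getReport();
    List<String> getCombinerOutput();
}

// Tracks a running sum and count so that partial results from a
// combiner can be merged correctly in the reducer.
class ValueAverageSketch implements AggregatorSketch {
    private long sum = 0;
    private long count = 0;

    // Accepts either a raw value ("7") or a partial result in the
    // assumed "sum,count" format produced by getCombinerOutput().
    @Override
    public void addNextValue(Object val) {
        String s = val.toString().trim();
        int comma = s.indexOf(',');
        if (comma >= 0) {
            sum += Long.parseLong(s.substring(0, comma).trim());
            count += Long.parseLong(s.substring(comma + 1).trim());
        } else {
            sum += Long.parseLong(s);
            count += 1;
        }
    }

    @Override
    public void reset() {
        sum = 0;
        count = 0;
    }

    // Final value reported by the reducer.
    @Override
    public String getReport() {
        return count == 0 ? "0.0" : String.valueOf((double) sum / count);
    }

    // Partial state emitted by the combiner; an average alone would
    // not be mergeable, so the sum and count travel together.
    @Override
    public List<String> getCombinerOutput() {
        List<String> out = new ArrayList<>();
        out.add(sum + "," + count);
        return out;
    }
}
```

Even with a class like this in place, generateValueAggregator in
ValueAggregatorBaseDescriptor would still need to be updated to
recognise the new type, as described in point 2.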
Runping
On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein
<[email protected]> wrote:
Hello all,
I've been using streaming + the aggregate package (available via
-reducer aggregate), and have been very happy with what it gives me.
I'm interested in writing my own new aggregate functions (in Java)
which I could then access from my streaming code.
Can anyone give me pointers towards how to make that happen? I've
read through the aggregate package source, but I'm not seeing how to
define my own, and get access to it from streaming.
To be specific, here's the sort of thing I'd like to be able to do:
- In Java, define a SampleValues aggregator, which chooses a sample
of the input given to it
- From my streaming program, in say Python, output:
SampleValues:some_key \t some_value
- Have the aggregate framework somehow call my new aggregator for the
combiner and reducer steps
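
To make the SampleValues idea concrete, here is a minimal sketch of
the sampling logic only, using reservoir sampling to keep a fixed-size
uniform sample of the values seen. The class name, constructor
parameters, and method names are hypothetical; this is not wired into
the aggregate framework, and it does not implement the real
ValueAggregator interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sampling aggregator: keeps at most `capacity` values,
// each input having an equal chance of ending up in the sample.
class SampleValuesSketch {
    private final int capacity;
    private final Random random;
    private final List<String> reservoir = new ArrayList<>();
    private long seen = 0;

    SampleValuesSketch(int capacity, long seed) {
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    // Reservoir sampling: fill the reservoir first, then replace a
    // random slot with probability capacity / seen.
    void addNextValue(Object val) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(val.toString());
        } else {
            long j = (long) (random.nextDouble() * seen); // in [0, seen)
            if (j < capacity) {
                reservoir.set((int) j, val.toString());
            }
        }
    }

    void reset() {
        reservoir.clear();
        seen = 0;
    }

    // Returns a copy of the current sample.
    List<String> getSample() {
        return new ArrayList<>(reservoir);
    }
}
```

One caveat with this sketch: if a combiner emits only its sample, the
reducer cannot tell how many inputs each partial sample represents, so
a correct merge would also need to carry the count of values seen.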
Thanks,
-Dan Milstein