I'd like to start a discussion on FLIP-491: BundledAggregateFunction for
batched aggregation [1].

This FLIP proposes adding a new interface, BundledAggregateFunction, that
can be implemented by AggregateFunction UDFs.  It adds a batched method
call so that an implementation can handle many rows at a time, across
multiple keys, instead of receiving per-row calls such as accumulate and
retract.

The purpose is to achieve high throughput while still allowing calls to
external systems or other blocking operations.  Making such calls from the
conventional per-row AggregateFunction methods would be prohibitively slow,
but given a batch of inputs and the accumulator for each key, the
implementer can parallelize or internally batch lookups to improve
performance.
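
To make the idea concrete, here is a minimal, self-contained Java sketch of
what batching across keys could look like.  It is not the interface proposed
in the FLIP: BundledSumSketch, KeyBundle, BatchLookup and bundledAccumulate
are hypothetical names made up for this illustration, and the external system
is modeled as a plain interface that can answer many lookups in one call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names come from FLIP-491 or Flink.
final class BundledSumSketch {

    /** One key's slice of a bundle: its current accumulator and buffered rows. */
    static final class KeyBundle {
        final String key;
        final long accumulator;
        final List<Long> rows;

        KeyBundle(String key, long accumulator, List<Long> rows) {
            this.key = key;
            this.accumulator = accumulator;
            this.rows = rows;
        }
    }

    /** Stand-in for an external system that resolves many inputs per round trip. */
    interface BatchLookup {
        // Assumed to return a value for every input it is given.
        Map<Long, Long> lookupAll(List<Long> inputs);
    }

    /** Updates the accumulators of all keys in the bundle using one external call. */
    static Map<String, Long> bundledAccumulate(List<KeyBundle> bundle, BatchLookup lookup) {
        // Gather the rows of every key so the blocking lookup is issued once,
        // rather than once per row as in a per-row accumulate call.
        List<Long> allRows = new ArrayList<>();
        for (KeyBundle kb : bundle) {
            allRows.addAll(kb.rows);
        }
        Map<Long, Long> resolved = lookup.lookupAll(allRows);

        // Fold the resolved values into each key's accumulator.
        Map<String, Long> updated = new LinkedHashMap<>();
        for (KeyBundle kb : bundle) {
            long acc = kb.accumulator;
            for (Long row : kb.rows) {
                acc += resolved.get(row);
            }
            updated.put(kb.key, acc);
        }
        return updated;
    }
}
```

With a per-row method the lookup above would run once per input row; in
this sketch it runs once per bundle, regardless of how many keys or rows the
bundle contains.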

Looking forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation


Thanks,
Alan
