Hi Alan,

thanks for sharing this FLIP with us. Sorry for the late reply. I think the FLIP is already in a very good shape.

I like the approach that the BundledAggregateFunction is rather a separate interface that can be implemented by advanced users. This matches with the existing SpecializedFunction interface and the upcoming ChangelogFunction of FLIP-440.

However, I have some additional feedback:

Correct me if I'm wrong, but for AggregateFunctions implementing a retract() and merge() is optional. How can a BundledAggregateFunction communicate whether or not this is supported to the planner? Enforcing the retract() feature in the interface specification could be an option, but esp for window aggregations there might not be a retract required.

Also how do you plan to support merge() in this design? I couldn't find any mentioning in the FLIP.

Regards,
Timo



On 12.12.24 02:57, Alan Sheinberg wrote:
I'd like to start a discussion of FLIP-491: BundledAggregateFunction for
batched aggregation [1]

This feature proposes adding a new interface BundledAggregateFunction that
can be implemented by AggregateFunction UDFs.  This allows the use of a
batched method call so that users can handle many rows at a time for
multiple keys rather than the per-row calls such as accumulate and retract.

The purpose is to achieve high throughput while still allowing for calls to
external systems or other blocking operations.  Similar calls through the
conventional AggregateFunction methods would be prohibitively slow, but if
given a batch of inputs and accumulators for each key, the implementer has
the power to parallelize or internally batch lookups to improve performance.

Looking forward to your feedback and suggestions.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation


Thanks,
Alan


Reply via email to