Hi Alan,
thanks for sharing this FLIP with us. Sorry for the late reply. I think
the FLIP is already in a very good shape.
I like the approach that the BundledAggregateFunction is rather a
separate interface that can be implemented by advanced users. This
matches with the existing SpecializedFunction interface and the upcoming
ChangelogFunction of FLIP-440.
However, I have some additional feedback:
Correct me if I'm wrong, but for AggregateFunctions implementing a
retract() and merge() is optional. How can a BundledAggregateFunction
communicate whether or not this is supported to the planner? Enforcing
the retract() feature in the interface specification could be an option,
but esp for window aggregations there might not be a retract required.
Also how do you plan to support merge() in this design? I couldn't find
any mentioning in the FLIP.
Regards,
Timo
On 12.12.24 02:57, Alan Sheinberg wrote:
I'd like to start a discussion of FLIP-491: BundledAggregateFunction for
batched aggregation [1]
This feature proposes adding a new interface BundledAggregateFunction that
can be implemented by AggregateFunction UDFs. This allows the use of a
batched method call so that users can handle many rows at a time for
multiple keys rather than the per-row calls such as accumulate and retract.
The purpose is to achieve high throughput while still allowing for calls to
external systems or other blocking operations. Similar calls through the
conventional AggregateFunction methods would be prohibitively slow, but if
given a batch of inputs and accumulators for each key, the implementer has
the power to parallelize or internally batch lookups to improve performance.
Looking forward to your feedback and suggestions.
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation
Thanks,
Alan