Hi Alan, Thanks for working on this FLIP! Overall the document looks very good, but I have a few question / comments:
* Why do we need the `canBundle()` function? We can use the interface itself as a marker. A function that can't bundle, shouldn't implement the interface. * the interface could just contain the `bundledAccumulateRetract(List<BKS> bundle)` method? * why does a function call need to handle multiple keys? * It feels that the key-loop is something that 90% of the functions would implement the same. * Couldn't we have a default implementation for the key loop and another method that just handles the bundle of a single key? * does the list of updatedValuesAfterEachRow also include the last row? * it would be good to clarify that in the API documentation * Could you clarify what the table.exec.agg-bundled.size parameter refers to? * Number of keys? Total number of rows? Number of rows per key? * It would be nice to add the keys in the example * How do the bundled agg functions interact with checkpointing? * Are bundles sent out when a checkpoint barrier is received? Thank you, Fabian On Mon, Jan 13, 2025 at 3:42 PM Timo Walther <twal...@apache.org> wrote: > Hi Alan, > > thanks for sharing this FLIP with us. Sorry for the late reply. I think > the FLIP is already in a very good shape. > > I like the approach that the BundledAggregateFunction is rather a > separate interface that can be implemented by advanced users. This > matches with the existing SpecializedFunction interface and the upcoming > ChangelogFunction of FLIP-440. > > However, I have some additional feedback: > > Correct me if I'm wrong, but for AggregateFunctions implementing a > retract() and merge() is optional. How can a BundledAggregateFunction > communicate whether or not this is supported to the planner? Enforcing > the retract() feature in the interface specification could be an option, > but esp for window aggregations there might not be a retract required. > > Also how do you plan to support merge() in this design? I couldn't find > any mentioning in the FLIP. > > Regards, > Timo > > > > On 12.12.24 02:57, Alan Sheinberg wrote: > > I'd like to start a discussion of FLIP-491: BundledAggregateFunction for > > batched aggregation [1] > > > > This feature proposes adding a new interface BundledAggregateFunction > that > > can be implemented by AggregateFunction UDFs. This allows the use of a > > batched method call so that users can handle many rows at a time for > > multiple keys rather than the per-row calls such as accumulate and > retract. > > > > The purpose is to achieve high throughput while still allowing for calls > to > > external systems or other blocking operations. Similar calls through the > > conventional AggregateFunction methods would be prohibitively slow, but > if > > given a batch of inputs and accumulators for each key, the implementer > has > > the power to parallelize or internally batch lookups to improve > performance. > > > > Looking forward to your feedback and suggestions. > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation > > > > > > Thanks, > > Alan > > > >