Hi Alan,

Thanks for working on this FLIP!
Overall the document looks very good, but I have a few question / comments:

* Why do we need the `canBundle()` function? We can use the interface
itself as a marker. A function that can't bundle, shouldn't implement the
interface.
  * the interface could just contain the
`bundledAccumulateRetract(List<BKS> bundle)` method?

* why does a function call need to handle multiple keys?
  * It feels that the key-loop is something that 90% of the functions would
implement the same.
  * Couldn't we have a default implementation for the key loop and another
method that just handles the bundle of a single key?

* does the list of updatedValuesAfterEachRow also include the last row?
  * it would be good to clarify that in the API documentation

* Could you clarify what the table.exec.agg-bundled.size parameter refers
to?
  * Number of keys? Total number of rows? Number of rows per key?

* It would be nice to add the keys in the example

* How do the bundled agg functions interact with checkpointing?
  * Are bundles sent out when a checkpoint barrier is received?

Thank you, Fabian

On Mon, Jan 13, 2025 at 3:42 PM Timo Walther <twal...@apache.org> wrote:

> Hi Alan,
>
> thanks for sharing this FLIP with us. Sorry for the late reply. I think
> the FLIP is already in a very good shape.
>
> I like the approach that the BundledAggregateFunction is rather a
> separate interface that can be implemented by advanced users. This
> matches with the existing SpecializedFunction interface and the upcoming
> ChangelogFunction of FLIP-440.
>
> However, I have some additional feedback:
>
> Correct me if I'm wrong, but for AggregateFunctions implementing a
> retract() and merge() is optional. How can a BundledAggregateFunction
> communicate whether or not this is supported to the planner? Enforcing
> the retract() feature in the interface specification could be an option,
> but esp for window aggregations there might not be a retract required.
>
> Also how do you plan to support merge() in this design? I couldn't find
> any mentioning in the FLIP.
>
> Regards,
> Timo
>
>
>
> On 12.12.24 02:57, Alan Sheinberg wrote:
> > I'd like to start a discussion of FLIP-491: BundledAggregateFunction for
> > batched aggregation [1]
> >
> > This feature proposes adding a new interface BundledAggregateFunction
> that
> > can be implemented by AggregateFunction UDFs.  This allows the use of a
> > batched method call so that users can handle many rows at a time for
> > multiple keys rather than the per-row calls such as accumulate and
> retract.
> >
> > The purpose is to achieve high throughput while still allowing for calls
> to
> > external systems or other blocking operations.  Similar calls through the
> > conventional AggregateFunction methods would be prohibitively slow, but
> if
> > given a batch of inputs and accumulators for each key, the implementer
> has
> > the power to parallelize or internally batch lookups to improve
> performance.
> >
> > Looking forward to your feedback and suggestions.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-491%3A+BundledAggregateFunction+for+batched+aggregation
> >
> >
> > Thanks,
> > Alan
> >
>
>

Reply via email to