Hello everyone,

I am writing this email to propose a new User Defined Aggregate (UDAGG)
interface. We tried to leverage the existing Aggregate interface, but
found that it does not meet all of our needs.
Here are the obstacles we have observed:
1) The current aggregate interface is not concise for users. One needs
to know the design details of the intermediate Row buffer before
implementing an Aggregate. Seven functions are needed even for a simple
Count aggregate. We should make the UDAGG interface much more concise.
2) The current aggregate function can only be applied to a single column.
Many scenarios require an aggregate function that takes multiple columns
as inputs.
3) “Retraction” is not supported by the current Aggregate interface.
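
To make obstacle 2 concrete, here is a minimal sketch of an aggregate that
needs two input columns per row (a value and its weight). The class and
method names are hypothetical and chosen only for illustration; they are
not part of the existing Aggregate interface or of any Flink API.

```java
// Hypothetical sketch: a weighted average needs two input columns per row.
// None of these names exist in Flink; they only illustrate obstacle 2.
final class WeightedAvgSketch {
    private double weightedSum = 0.0;
    private double weightTotal = 0.0;

    // Consumes two columns of the same row: the value and its weight.
    void accumulate(double value, double weight) {
        weightedSum += value * weight;
        weightTotal += weight;
    }

    // Returns 0.0 when nothing has been accumulated (an arbitrary choice
    // made for this sketch).
    double getValue() {
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }
}
```

A single-column interface cannot express this without first packing both
columns into one composite value.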

For #1, I am thinking that instead of letting users manipulate the
intermediate buffer, we could put the entire Aggregate instance, or an
instance of a subclass of Aggregate, into the Row buffer, so that the user
does not need to know how the Aggregate state is maintained by the
framework.
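
As a rough illustration of this idea, a Count aggregate could keep its
state in its own fields, and the framework would store the whole instance.
The interface below is a hypothetical sketch with assumed names
(SimpleAggregate, accumulate, getValue); it is not the existing Aggregate
interface and not the final proposal.

```java
// Hypothetical interface, assumed for illustration only.
interface SimpleAggregate<IN, OUT> {
    // Called once per input value.
    void accumulate(IN input);

    // Returns the current aggregate result.
    OUT getValue();
}

// The state lives in the instance itself, so the user never touches the
// layout of the intermediate Row buffer.
final class CountSketch implements SimpleAggregate<Object, Long> {
    private long count = 0L;

    @Override
    public void accumulate(Object input) {
        // Skip nulls, mirroring SQL COUNT(column) semantics.
        if (input != null) {
            count++;
        }
    }

    @Override
    public Long getValue() {
        return count;
    }
}
```

With this shape, a simple Count needs two user-written methods instead of
seven.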
To achieve this, however, we probably need a new DataStream API. The
existing reduce API does not work with two different input types (in
this proposal, the inputs are the upstream values and the instance of the
currently accumulated Aggregate), while the fold API cannot merge two
Aggregate results (which is usually needed when merging two session
windows).
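
The difference between the three shapes can be sketched in plain Java,
without any Flink dependency. The class AvgShapesSketch and its methods
are hypothetical names used only to show the function signatures involved;
they are not DataStream APIs.

```java
// Hypothetical sketch of the function shapes discussed above.
final class AvgShapesSketch {
    // Accumulator type, deliberately different from the input type (long).
    static final class AvgAcc {
        final long sum;
        final long count;

        AvgAcc(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // reduce-style shape: (T, T) -> T. Input and result share one type,
    // so a (sum, count) accumulator cannot be combined with a raw input.
    static long reduce(long a, long b) {
        return a + b;
    }

    // fold-style shape: (ACC, IN) -> ACC. Handles two types, but offers
    // no way to combine two accumulators.
    static AvgAcc accumulate(AvgAcc acc, long value) {
        return new AvgAcc(acc.sum + value, acc.count + 1);
    }

    // The missing piece: (ACC, ACC) -> ACC, needed when two session
    // windows are merged into one.
    static AvgAcc merge(AvgAcc left, AvgAcc right) {
        return new AvgAcc(left.sum + right.sum, left.count + right.count);
    }
}
```

For example, if one session window has accumulated the values 2 and 4 and
another has accumulated 9, merging them should yield a sum of 15 over 3
values, which requires both accumulate and merge.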

For #3, besides the aggregate itself, a few other things need to be
taken care of to fully support retractions. I will share a separate,
concrete proposal on how to generate and process retractions, and how
this works together with the newly proposed UDAGG.
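
On the aggregate side only, a retraction could look like the inverse of
accumulate. The sketch below assumes a hypothetical retract method on a
simple sum; how retractions are generated and propagated is out of scope
here and is left to the separate proposal.

```java
// Hypothetical sketch: a sum aggregate that can undo an earlier input.
// The method name "retract" is an assumption made for illustration.
final class RetractableSumSketch {
    private long sum = 0L;

    // Adds a newly arrived value to the running sum.
    void accumulate(long value) {
        sum += value;
    }

    // Removes a value that was accumulated earlier and is now withdrawn.
    void retract(long value) {
        sum -= value;
    }

    long getValue() {
        return sum;
    }
}
```

A sum is easy to retract; aggregates such as max or min would need to keep
more state, which is one reason retraction deserves its own discussion.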

I would really appreciate it if you could share your opinions on this
proposal, especially on the DataStream API needed for #1. Also, if there
is anything else you think should be added to UDAGG, please feel free to
share it with us. I will draft my proposal in a Google doc and share it
with the Flink dev group very soon.

Thanks,
Shaoxuan
