Hello everyone, I am writing this email to propose a new User Defined Aggregate interface. We were trying to leverage the existing Aggregate interface, but unfortunately we realized that it is not sufficient to meet all our needs. Here are the obstacles we have observed: 1) The current aggregate interface is not very concise to users. One needs to know the design details of the intermediate Row buffer before implements an Aggregate. Seven functions are needed even for a simple Count aggregate. We'd better to make the UDAGG interface much more concisely. 2) the current aggregate function can be only applied on one single column. There are many scenarios which require the aggregate function taking multiple columns as the inputs. 3) “Retraction” is not covered in the current Aggregate.
For #1, I am thinking instead of letting users to manipulate the intermediate buffer, we could potentially put the entire Aggregate instance or a subclass instance of Aggregate to the Row buffer, such that the user does not need to know how the Aggregate state is maintained by the framework. But to achieve this goal, we probably need a new dataStream API. The existing reduce API does not work with two different types of inputs (in this proposal, the inputs will be upstream values, and the instance of the current accumulated Aggregate), while the fold API is not able to merge the two Aggregate results (which is usually needed for merging two session windows). For #3, besides the aggregate itself, there are a few other things need to be taken care of to fully support the retractions. I will share a separate concrete proposal about how to generate and process retractions, and how it works along with this new proposed UDAGG. I would like really appreciate if you can share your opinions on this proposal, especially for the needed dataStream API for #1. Also, if there is any other good things you think to be better added for UDAGG, please feel free to share with us. I will draft my proposal in a google doc and share to the flink DEV group very soon. Thanks, Shaoxuan