Re: Support local aggregate push down for Blink batch planner

Jark Wu Tue, 05 Jan 2021 19:03:11 -0800

I think this may be over designed. We should have confidence in the
interface we design, the interface should be stable.
Wrapping things in a big context has a cost of losing user convenience.
Foremost, we don't see any parameters to add in the future. Do you know any
potential parameters?


Best,
Jark

On Wed, 6 Jan 2021 at 10:28, Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi Sebastian,
>
> Well, I mean:
>
> `boolean applyAggregates(int[] groupingFields, List<CallExpression>
> aggregateExpressions, DataType producedDataType);`
> VS
> ```
> boolean applyAggregates(Aggregation agg);
>
> interface Aggregation {
>   int[] groupingFields();
>   List<CallExpression> aggregateExpressions();
>   DataType producedDataType();
> }
> ```
>
> Maybe I've over considered it, but I think Aggregation is a complicated
> thing. Maybe we need to extend its parameters in the future, so make the
> parameters interface, which is conducive to the future expansion without
> destroying the compatibility of user implementation. If it is the way
> before, users need to modify the code.
>
> Best,
> Jingsong
>
> On Wed, Jan 6, 2021 at 12:52 AM Sebastian Liu <liuyang0...@gmail.com>
> wrote:
>
>> Hi Jinsong,
>>
>> Thx a lot for your suggestion. These points really need to be clear in
>> the proposal.
>>
>> For the semantic problem, I think the main point is the different
>> returned data types
>> for the target aggregate function and the row format returned by the
>> underlying storage.
>> That's why we provide the producedDataType in the
>> SupportsAggregatePushDown interface.
>> Need to let developers know that we need to handle the semantic
>> differences between
>> the underlying storage system and Flink in related connectors.
>> [Supplemented in proposal]
>>
>> For the phase of the new PushLocalAggIntoTableSourceScanRule rule, it's also
>> a key point.
>> As you suggested, we should put it into the PHYSICAL_REWRITE rule set,
>> and better to put it
>> behind the EnforceLocalXXAggRule. [Supplemented in proposal]
>>
>> For the scalability of the interface, actually I don't exactly understand
>> your suggestion. Is it to add
>> an abstract class, to implement the SupportsAggregatePushDown interface,
>> and holds the
>> `List < CallExpression > aggregateExpressions, int[] GroupingFields,
>> DataType producedDataType`
>> fields?
>>
>> Looking forward to your further feedback or guidance.
>>
>> Jingsong Li <jingsongl...@gmail.com> 于2021年1月5日周二 下午2:44写道：
>>
>>> Thanks for your proposal! Sebastian.
>>>
>>> +1 for SupportsAggregatePushDown. The above wonderful discussion has
>>> solved
>>> many of my concerns.
>>>
>>> ## Semantic problems
>>>
>>> We may need to add some mechanisms or comments, because as far as I know,
>>> the semantics of each database is actually different, which may need to
>>> be
>>> reflected in your specific implementation.
>>>
>>> For example, the AVG output types of various databases may be different.
>>> For example, MySQL outputs double, this is different from Flink. What
>>> should we do? (Lucky, avg will be splitted into sum and count, But we
>>> also
>>> need care about decimal and others)
>>>
>>> ## The phase of push-down rule
>>>
>>> I strongly recommend that you do not put it in the Volcano phase, which
>>> may
>>> make the cost calculation very troublesome.
>>> So in PHYSICAL_REWRITE?
>>>
>>> ## About interface
>>>
>>> For scalability, I slightly recommend that we introduce an `Aggregate`
>>> interface, it contains `List<CallExpression> aggregateExpressions, int[]
>>> groupingFields, DataType producedDataType` fields. In this way, we can
>>> add
>>> fields easily without breaking compatibility.
>>>
>>> I think the current design is very good, just put forward some ideas.
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Tue, Jan 5, 2021 at 1:55 PM Sebastian Liu <liuyang0...@gmail.com>
>>> wrote:
>>>
>>> > Hi Jark,
>>> >
>>> > Thx for your further feedback and help. The interface of
>>> > SupportsAggregatePushDown may indeed need some adjustments.
>>> >
>>> > For (1) Agree: Yeah, the upstream only need to know if the TableSource
>>> can
>>> > handle all of the aggregates.
>>> > It's better to just return a boolean type to indicate whether all of
>>> > aggregates push down was successful or not. [Resolved in proposal]
>>> >
>>> > For (2) Agree: The aggOutputDataType represent the produced data type
>>> of
>>> > the new table source to make sure that the new table source can
>>> > connect with the related exchange node. The format of this
>>> > aggOutputDataType is groupedFields's type + agg function's return type.
>>> > The reason for adding this parameter in this function is also to
>>> facilitate
>>> > the user to build the final output type. I have changed this parameter
>>> > to be producedDataType. [Resolved in proposal]
>>> >
>>> > For (3) Agree: Indeed, groupSet may mislead users, I have changed to
>>> use
>>> > groupingFields. [Resolved in proposal]
>>> >
>>> > Thx again for the suggestion, looking for the further discussion.
>>> >
>>> > Jark Wu <imj...@gmail.com> 于2021年1月5日周二 下午12:05写道：
>>> >
>>> > > I'm also +1 for idea#2.
>>> > >
>>> > > Regarding to the updated interface,
>>> > >
>>> > > Result applyAggregates(List<CallExpression> aggregateExpressions,
>>> > >      int[] groupSet, DataType aggOutputDataType);
>>> > >
>>> > > final class Result {
>>> > >        private final List<CallExpression> acceptedAggregates;
>>> > >        private final List<CallExpression> remainingAggregates;
>>> > > }
>>> > >
>>> > > I have following comments:
>>> > >
>>> > > 1) Do we need the composite Result return type? Is a boolean return
>>> type
>>> > > enough?
>>> > >     From my understanding, all of the aggregates should be accepted,
>>> > > otherwise the pushdown should fail.
>>> > >     Therefore, users don't need to distinguish which aggregates are
>>> > > "accepted".
>>> > >
>>> > > 2) Does the `aggOutputDataType` represent the produced data type of
>>> the
>>> > > new source, or just the return type of all the agg functions?
>>> > >     I would prefer to `producedDataType` just like
>>> > > `SupportsReadingMetadata` to reduce the effort for users to concat a
>>> > final
>>> > > output type.
>>> > >     The return type of each agg function can be obtained from the
>>> > > `CallExpression`.
>>> > >
>>> > > 3) What do you think about renaming `groupSet` to `grouping` or
>>> > > `groupedFields` ?
>>> > >     The `groupSet` may confuse users that it relates to "grouping
>>> sets".
>>> > >
>>> > >
>>> > > What do you think?
>>> > >
>>> > > Best,
>>> > > Jark
>>> > >
>>> > >
>>> > >
>>> > > On Tue, 5 Jan 2021 at 11:04, Kurt Young <ykt...@gmail.com> wrote:
>>> > >
>>> > >> Sorry for the typo -_-!
>>> > >> I meant idea #2.
>>> > >>
>>> > >> Best,
>>> > >> Kurt
>>> > >>
>>> > >>
>>> > >> On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu <
>>> liuyang0...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >>> Hi Kurt,
>>> > >>>
>>> > >>> Thx a lot for your feedback. If local aggregation is more like a
>>> > >>> physical operator rather than logical
>>> > >>> operator, I think your suggestion should be idea #2 which handle
>>> all in
>>> > >>> the physical optimization phase?
>>> > >>>
>>> > >>> Looking forward for the further discussion.
>>> > >>>
>>> > >>>
>>> > >>> Kurt Young <ykt...@gmail.com> 于2021年1月5日周二 上午9:52写道：
>>> > >>>
>>> > >>>> Local aggregation is more like a physical operator rather than
>>> logical
>>> > >>>> operator. I would suggest going with idea #1.
>>> > >>>>
>>> > >>>> Best,
>>> > >>>> Kurt
>>> > >>>>
>>> > >>>>
>>> > >>>> On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu <
>>> liuyang0...@gmail.com>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>> > Hi Jark, Thx a lot for your quick reply and valuable
>>> suggestions.
>>> > >>>> > For (1): Agree: Since we are in the period of upgrading the new
>>> > table
>>> > >>>> > source api,
>>> > >>>> > we really should consider the new interface for the new optimize
>>> > >>>> rule. If
>>> > >>>> > the new rule
>>> > >>>> > doesn't use the new api, we'll have to upgrade it sooner or
>>> later. I
>>> > >>>> have
>>> > >>>> > change to use
>>> > >>>> > the ability interface for the SupportsAggregatePushDown
>>> definition
>>> > in
>>> > >>>> above
>>> > >>>> > proposal.
>>> > >>>> >
>>> > >>>> > For (2): Agree: Change to use CallExpression is a better
>>> choice, and
>>> > >>>> have
>>> > >>>> > resolved this
>>> > >>>> > comment in the proposal.
>>> > >>>> >
>>> > >>>> > For (3): I suggest we first support the JDBC connector, as we
>>> don't
>>> > >>>> have
>>> > >>>> > Druid connector
>>> > >>>> > and ES connector just has sink api at present.
>>> > >>>> >
>>> > >>>> > But perhaps the biggest question may be whether we should use
>>> idea 1
>>> > >>>> or
>>> > >>>> > idea 2 in proposal.
>>> > >>>> >
>>> > >>>> > What do you think?  After we reach the agreement on the
>>> proposal,
>>> > our
>>> > >>>> team
>>> > >>>> > can drive to
>>> > >>>> > complete this feature.
>>> > >>>> >
>>> > >>>> > Jark Wu <imj...@gmail.com> 于2020年12月29日周二 下午2:58写道：
>>> > >>>> >
>>> > >>>> > > Hi Sebastian,
>>> > >>>> > >
>>> > >>>> > > Thanks for the proposal. I think this is a great improvement
>>> for
>>> > >>>> Flink
>>> > >>>> > SQL.
>>> > >>>> > > I went through the design doc and have following thoughts:
>>> > >>>> > >
>>> > >>>> > > 1) Flink has deprecated the legacy TableSource in 1.11 and
>>> > proposed
>>> > >>>> a new
>>> > >>>> > >  set of DynamicTableSource interfaces. Could you update your
>>> > >>>> proposal to
>>> > >>>> > > use the new interfaces?
>>> > >>>> > >  Follow the existing ability interfaces, e.g.
>>> > >>>> > > SupportsFilterPushDown, SupportsProjectionPushDown.
>>> > >>>> > >
>>> > >>>> > > 2) Personally, I think CallExpression would be a better
>>> > >>>> representation
>>> > >>>> > than
>>> > >>>> > > separate `FunctionDefinition` and args. Because, it would be
>>> > easier
>>> > >>>> to
>>> > >>>> > know
>>> > >>>> > > what's the index and type of the arguments.
>>> > >>>> > >
>>> > >>>> > > 3) It would be better to list which connectors will be
>>> supported
>>> > in
>>> > >>>> the
>>> > >>>> > > plan?
>>> > >>>> > >
>>> > >>>> > > Best,
>>> > >>>> > > Jark
>>> > >>>> > >
>>> > >>>> > >
>>> > >>>> > > On Tue, 29 Dec 2020 at 00:49, Sebastian Liu <
>>> > liuyang0...@gmail.com>
>>> > >>>> > wrote:
>>> > >>>> > >
>>> > >>>> > > > Hi all,
>>> > >>>> > > >
>>> > >>>> > > > I'd like to discuss a new feature for the Blink Planner.
>>> > >>>> > > > Aggregate operator of Flink SQL is currently fully done at
>>> Flink
>>> > >>>> layer.
>>> > >>>> > > > With the developing of storage, many downstream storage of
>>> Flink
>>> > >>>> SQL
>>> > >>>> > has
>>> > >>>> > > > the ability to deal with Aggregation operator.
>>> > >>>> > > > Pushing down Aggregate to data source layer will improve
>>> > >>>> performance
>>> > >>>> > from
>>> > >>>> > > > the perspective of the network IO and computation overhead.
>>> > >>>> > > >
>>> > >>>> > > > I have drafted a design doc for this new feature.
>>> > >>>> > > >
>>> > >>>> > > >
>>> > >>>> > >
>>> > >>>> >
>>> > >>>>
>>> >
>>> https://docs.google.com/document/d/1kGwC_h4qBNxF2eMEz6T6arByOB8yilrPLqDN0QBQXW4/edit?usp=sharing
>>> > >>>> > > >
>>> > >>>> > > > Any comment or discussion is welcome.
>>> > >>>> > > >
>>> > >>>> > > > --
>>> > >>>> > > >
>>> > >>>> > > > *With kind regards
>>> > >>>> > > > ------------------------------------------------------------
>>> > >>>> > > > Sebastian Liu 刘洋
>>> > >>>> > > > Institute of Computing Technology, Chinese Academy of
>>> Science
>>> > >>>> > > > Mobile\WeChat: +86—15201613655
>>> > >>>> > > > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > >>>> > > > QQ: 3239559*
>>> > >>>> > > >
>>> > >>>> > >
>>> > >>>> >
>>> > >>>> >
>>> > >>>> > --
>>> > >>>> >
>>> > >>>> > *With kind regards
>>> > >>>> > ------------------------------------------------------------
>>> > >>>> > Sebastian Liu 刘洋
>>> > >>>> > Institute of Computing Technology, Chinese Academy of Science
>>> > >>>> > Mobile\WeChat: +86—15201613655
>>> > >>>> > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > >>>> > QQ: 3239559*
>>> > >>>> >
>>> > >>>>
>>> > >>>
>>> > >>>
>>> > >>> --
>>> > >>>
>>> > >>> *With kind regards
>>> > >>> ------------------------------------------------------------
>>> > >>> Sebastian Liu 刘洋
>>> > >>> Institute of Computing Technology, Chinese Academy of Science
>>> > >>> Mobile\WeChat: +86—15201613655
>>> > >>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > >>> QQ: 3239559*
>>> > >>>
>>> > >>>
>>> >
>>> > --
>>> >
>>> > *With kind regards
>>> > ------------------------------------------------------------
>>> > Sebastian Liu 刘洋
>>> > Institute of Computing Technology, Chinese Academy of Science
>>> > Mobile\WeChat: +86—15201613655
>>> > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>>> > QQ: 3239559*
>>> >
>>>
>>>
>>> --
>>> Best, Jingsong Lee
>>>
>>
>>
>> --
>>
>> *With kind regards
>> ------------------------------------------------------------
>> Sebastian Liu 刘洋
>> Institute of Computing Technology, Chinese Academy of Science
>> Mobile\WeChat: +86—15201613655
>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
>> QQ: 3239559*
>>
>>
>
> --
> Best, Jingsong Lee
>

Re: Support local aggregate push down for Blink batch planner

Reply via email to