Re: Support local aggregate push down for Blink batch planner

Jark Wu Tue, 05 Jan 2021 00:05:48 -0800

Thanks for the update.

The proposal looks good to me now.


Best,
Jark

On Tue, 5 Jan 2021 at 14:44, Jingsong Li <[email protected]> wrote:

> Thanks for your proposal! Sebastian.
>
> +1 for SupportsAggregatePushDown. The above wonderful discussion has
> solved many of my concerns.
>
> ## Semantic problems
>
> We may need to add some mechanisms or comments, because as far as I know,
> the semantics of each database is actually different, which may need to be
> reflected in your specific implementation.
>
> For example, the AVG output types of various databases may be different.
> For example, MySQL outputs double, this is different from Flink. What
> should we do? (Lucky, avg will be splitted into sum and count, But we also
> need care about decimal and others)
>
> ## The phase of push-down rule
>
> I strongly recommend that you do not put it in the Volcano phase, which
> may make the cost calculation very troublesome.
> So in PHYSICAL_REWRITE?
>
> ## About interface
>
> For scalability, I slightly recommend that we introduce an `Aggregate`
> interface, it contains `List<CallExpression> aggregateExpressions, int[]
> groupingFields, DataType producedDataType` fields. In this way, we can add
> fields easily without breaking compatibility.
>
> I think the current design is very good, just put forward some ideas.
>
> Best,
> Jingsong
>
> On Tue, Jan 5, 2021 at 1:55 PM Sebastian Liu <[email protected]>
> wrote:
>
>> Hi Jark,
>>
>> Thx for your further feedback and help. The interface of
>> SupportsAggregatePushDown may indeed need some adjustments.
>>
>> For (1) Agree: Yeah, the upstream only need to know if the TableSource can
>> handle all of the aggregates.
>> It's better to just return a boolean type to indicate whether all of
>> aggregates push down was successful or not. [Resolved in proposal]
>>
>> For (2) Agree: The aggOutputDataType represent the produced data type of
>> the new table source to make sure that the new table source can
>> connect with the related exchange node. The format of this
>> aggOutputDataType is groupedFields's type + agg function's return type.
>> The reason for adding this parameter in this function is also to
>> facilitate
>> the user to build the final output type. I have changed this parameter
>> to be producedDataType. [Resolved in proposal]
>>
>> For (3) Agree: Indeed, groupSet may mislead users, I have changed to use
>> groupingFields. [Resolved in proposal]
>>
>> Thx again for the suggestion, looking for the further discussion.
>>
>> Jark Wu <[email protected]> 于2021年1月5日周二 下午12:05写道：
>>
>> > I'm also +1 for idea#2.
>> >
>> > Regarding to the updated interface,
>> >
>> > Result applyAggregates(List<CallExpression> aggregateExpressions,
>> >      int[] groupSet, DataType aggOutputDataType);
>> >
>> > final class Result {
>> >        private final List<CallExpression> acceptedAggregates;
>> >        private final List<CallExpression> remainingAggregates;
>> > }
>> >
>> > I have following comments:
>> >
>> > 1) Do we need the composite Result return type? Is a boolean return type
>> > enough?
>> >     From my understanding, all of the aggregates should be accepted,
>> > otherwise the pushdown should fail.
>> >     Therefore, users don't need to distinguish which aggregates are
>> > "accepted".
>> >
>> > 2) Does the `aggOutputDataType` represent the produced data type of the
>> > new source, or just the return type of all the agg functions?
>> >     I would prefer to `producedDataType` just like
>> > `SupportsReadingMetadata` to reduce the effort for users to concat a
>> final
>> > output type.
>> >     The return type of each agg function can be obtained from the
>> > `CallExpression`.
>> >
>> > 3) What do you think about renaming `groupSet` to `grouping` or
>> > `groupedFields` ?
>> >     The `groupSet` may confuse users that it relates to "grouping sets".
>> >
>> >
>> > What do you think?
>> >
>> > Best,
>> > Jark
>> >
>> >
>> >
>> > On Tue, 5 Jan 2021 at 11:04, Kurt Young <[email protected]> wrote:
>> >
>> >> Sorry for the typo -_-!
>> >> I meant idea #2.
>> >>
>> >> Best,
>> >> Kurt
>> >>
>> >>
>> >> On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu <[email protected]>
>> >> wrote:
>> >>
>> >>> Hi Kurt,
>> >>>
>> >>> Thx a lot for your feedback. If local aggregation is more like a
>> >>> physical operator rather than logical
>> >>> operator, I think your suggestion should be idea #2 which handle all
>> in
>> >>> the physical optimization phase?
>> >>>
>> >>> Looking forward for the further discussion.
>> >>>
>> >>>
>> >>> Kurt Young <[email protected]> 于2021年1月5日周二 上午9:52写道：
>> >>>
>> >>>> Local aggregation is more like a physical operator rather than
>> logical
>> >>>> operator. I would suggest going with idea #1.
>> >>>>
>> >>>> Best,
>> >>>> Kurt
>> >>>>
>> >>>>
>> >>>> On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu <[email protected]
>> >
>> >>>> wrote:
>> >>>>
>> >>>> > Hi Jark, Thx a lot for your quick reply and valuable suggestions.
>> >>>> > For (1): Agree: Since we are in the period of upgrading the new
>> table
>> >>>> > source api,
>> >>>> > we really should consider the new interface for the new optimize
>> >>>> rule. If
>> >>>> > the new rule
>> >>>> > doesn't use the new api, we'll have to upgrade it sooner or later.
>> I
>> >>>> have
>> >>>> > change to use
>> >>>> > the ability interface for the SupportsAggregatePushDown definition
>> in
>> >>>> above
>> >>>> > proposal.
>> >>>> >
>> >>>> > For (2): Agree: Change to use CallExpression is a better choice,
>> and
>> >>>> have
>> >>>> > resolved this
>> >>>> > comment in the proposal.
>> >>>> >
>> >>>> > For (3): I suggest we first support the JDBC connector, as we don't
>> >>>> have
>> >>>> > Druid connector
>> >>>> > and ES connector just has sink api at present.
>> >>>> >
>> >>>> > But perhaps the biggest question may be whether we should use idea
>> 1
>> >>>> or
>> >>>> > idea 2 in proposal.
>> >>>> >
>> >>>> > What do you think?  After we reach the agreement on the proposal,
>> our
>> >>>> team
>> >>>> > can drive to
>> >>>> > complete this feature.
>> >>>> >
>> >>>> > Jark Wu <[email protected]> 于2020年12月29日周二 下午2:58写道：
>> >>>> >
>> >>>> > > Hi Sebastian,
>> >>>> > >
>> >>>> > > Thanks for the proposal. I think this is a great improvement for
>> >>>> Flink
>> >>>> > SQL.
>> >>>> > > I went through the design doc and have following thoughts:
>> >>>> > >
>> >>>> > > 1) Flink has deprecated the legacy TableSource in 1.11 and
>> proposed
>> >>>> a new
>> >>>> > >  set of DynamicTableSource interfaces. Could you update your
>> >>>> proposal to
>> >>>> > > use the new interfaces?
>> >>>> > >  Follow the existing ability interfaces, e.g.
>> >>>> > > SupportsFilterPushDown, SupportsProjectionPushDown.
>> >>>> > >
>> >>>> > > 2) Personally, I think CallExpression would be a better
>> >>>> representation
>> >>>> > than
>> >>>> > > separate `FunctionDefinition` and args. Because, it would be
>> easier
>> >>>> to
>> >>>> > know
>> >>>> > > what's the index and type of the arguments.
>> >>>> > >
>> >>>> > > 3) It would be better to list which connectors will be supported
>> in
>> >>>> the
>> >>>> > > plan?
>> >>>> > >
>> >>>> > > Best,
>> >>>> > > Jark
>> >>>> > >
>> >>>> > >
>> >>>> > > On Tue, 29 Dec 2020 at 00:49, Sebastian Liu <
>> [email protected]>
>> >>>> > wrote:
>> >>>> > >
>> >>>> > > > Hi all,
>> >>>> > > >
>> >>>> > > > I'd like to discuss a new feature for the Blink Planner.
>> >>>> > > > Aggregate operator of Flink SQL is currently fully done at
>> Flink
>> >>>> layer.
>> >>>> > > > With the developing of storage, many downstream storage of
>> Flink
>> >>>> SQL
>> >>>> > has
>> >>>> > > > the ability to deal with Aggregation operator.
>> >>>> > > > Pushing down Aggregate to data source layer will improve
>> >>>> performance
>> >>>> > from
>> >>>> > > > the perspective of the network IO and computation overhead.
>> >>>> > > >
>> >>>> > > > I have drafted a design doc for this new feature.
>> >>>> > > >
>> >>>> > > >
>> >>>> > >
>> >>>> >
>> >>>>
>> https://docs.google.com/document/d/1kGwC_h4qBNxF2eMEz6T6arByOB8yilrPLqDN0QBQXW4/edit?usp=sharing
>> >>>> > > >
>> >>>> > > > Any comment or discussion is welcome.
>> >>>> > > >
>> >>>> > > > --
>> >>>> > > >
>> >>>> > > > *With kind regards
>> >>>> > > > ------------------------------------------------------------
>> >>>> > > > Sebastian Liu 刘洋
>> >>>> > > > Institute of Computing Technology, Chinese Academy of Science
>> >>>> > > > Mobile\WeChat: +86—15201613655
>> >>>> > > > E-mail: [email protected] <[email protected]>
>> >>>> > > > QQ: 3239559*
>> >>>> > > >
>> >>>> > >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> >
>> >>>> > *With kind regards
>> >>>> > ------------------------------------------------------------
>> >>>> > Sebastian Liu 刘洋
>> >>>> > Institute of Computing Technology, Chinese Academy of Science
>> >>>> > Mobile\WeChat: +86—15201613655
>> >>>> > E-mail: [email protected] <[email protected]>
>> >>>> > QQ: 3239559*
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>> *With kind regards
>> >>> ------------------------------------------------------------
>> >>> Sebastian Liu 刘洋
>> >>> Institute of Computing Technology, Chinese Academy of Science
>> >>> Mobile\WeChat: +86—15201613655
>> >>> E-mail: [email protected] <[email protected]>
>> >>> QQ: 3239559*
>> >>>
>> >>>
>>
>> --
>>
>> *With kind regards
>> ------------------------------------------------------------
>> Sebastian Liu 刘洋
>> Institute of Computing Technology, Chinese Academy of Science
>> Mobile\WeChat: +86—15201613655
>> E-mail: [email protected] <[email protected]>
>> QQ: 3239559*
>>
>
>
> --
> Best, Jingsong Lee
>

Re: Support local aggregate push down for Blink batch planner

Reply via email to