Re: Support local aggregate push down for Blink batch planner

Jark Wu Wed, 06 Jan 2021 18:25:28 -0800

Thanks for updating the design doc.
It looks good to me.

Best,
Jark


On Thu, 7 Jan 2021 at 10:16, Jingsong Li <[email protected]> wrote:

> Sounds good to me.
>
> We don't have to worry about future changes, because it has covered all
> the capabilities of calcite aggregation.
>
> Best,
> Jingsong
>
> On Thu, Jan 7, 2021 at 12:14 AM Sebastian Liu <[email protected]>
> wrote:
>
>> Hi Jark,
>>
>> Sounds good to me. For better scalability in the future, we could add the
>> AggregateExpression.
>> ```
>>
>> public class AggregateExpression implements ResolvedExpression {
>>
>>    private final FunctionDefinition functionDefinition;
>>
>>    private final List<FieldReferenceExpression> args;
>>
>>    private final @Nullable CallExpression filterExpression;
>>
>>    private final DataType resultType;
>>
>>    private final boolean distinct;
>>
>>    private final boolean approximate;
>>
>>
>>
>>    private final boolean ignoreNulls;
>>
>> }
>> ```
>>
>> And we really only need one GroupingSets parameter for grouping. I have
>> updated the related interface in the proposal.
>> Appreciate the continued feedback and help.
>>
>> Jark Wu <[email protected]> 于2021年1月6日周三 下午9:34写道：
>>
>>> Hi Liu, Jingsong,
>>>
>>> Regarding the agg with filter, I think in theory we can support pushing
>>> such a pattern into source.
>>> We don't need to support it in the first version, but in the long term,
>>> we can support it.
>>> The designed interface should be future proof.
>>>
>>> Considering filter arg and distinct flag should be part of the aggregate
>>> expression.
>>> I'm wondering if CallExpression is a good representation for it.
>>> What do you think about proposing the following `AggregateExpression` to
>>> replace the `CallExpression`?
>>>
>>> class AggregateExpression implements ResolvedExpression {
>>>     private final FunctionDefinition functionDefinition;
>>>     private final List<FieldReferenceExpression> args;
>>>     private final @Nullable CallExpression filterExpr;
>>>     private final boolean distinct;
>>> }
>>>
>>> Besides, we don't need both groupingFields and groupingSets.
>>> `groupingSets` should be a superset of groupingFields.
>>> Then the interface of SupportsAggregatePushDown can be:
>>>
>>> interface SupportsAggregatePushDown {
>>>
>>>   boolean applyAggregates(
>>>     List<int[]> groupingSets,
>>>     List<AggregateExpression> aggregates,
>>>     DataType producedDataType);
>>> }
>>>
>>> What do you think?
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 6 Jan 2021 at 19:56, Sebastian Liu <[email protected]>
>>> wrote:
>>>
>>>> Hi Jingsong, Jark,
>>>>
>>>> Thx so much for our discussion, and the cases mentioned above are
>>>> really worthy for further discussion.
>>>>
>>>> 1. For aggregate with filter expressions: eg: select COUNT(1)
>>>> FILTER(WHERE cc_call_center_sk > 3) from call_center;
>>>> For the current Blink Planner, the optimized plan will be:
>>>> TableSourceScan -> Calc(IS TRUE(>(cc_call_center_sk, 3))) -> LocalAgg
>>>> -> Exchange -> FinalAgg.
>>>> As there is a Calc above the TableSource, this pattern can't match the
>>>> LocalAggPushDownRule in the current design.
>>>>
>>>> 2. For the grouping set or rollup use case: eg: select COUNT(1) from
>>>> call_center group by rollup(cc_class, cc_employees);
>>>> For the current Blink Planner, the optimized plan will be:
>>>> TableSourceScan -> Expand -> LocalAgg -> Exchange -> FinalAgg -> Calc.
>>>> It's also not covered by the current LocalAggPushDownRule design.
>>>>
>>>> 3. I want to add a case which we haven't discussed yet. Aggregate with
>>>> Having clause.
>>>> eg: select COUNT(1) from call_center group by cc_class having
>>>> max(cc_tax_percentage) > 0.2;
>>>> For the current Blink Planner, the optimized plan will be:
>>>> TableSourceScan -> LocalAgg -> Exchange -> FinalAgg ->
>>>> Calc(where=[>($f2, 0.2:DECIMAL(2, 1))]).
>>>>
>>>> The core discussion points are summarized as follows:
>>>> a) Aggregate is a more complex scenario than predicates or limits, and
>>>> also depends on the different underlying storages.
>>>> b) One rule seems can't completely cover all aggregate scenario, but
>>>> whether SupportSAggregatePushDown interface can be a bit more general?
>>>> c) Could the CallExpression express the semantics of CalCite
>>>> AggregateCall?
>>>>
>>>> IMO: Completely push down aggregate is generally hard for distributed
>>>> systems. Usually we need a GROUP BY and exactly
>>>> matches the partition mode in downstream storage. At the same time, the
>>>> benefit of remove the final aggregate is actually limited.
>>>> The LocalAggPushDown generally yields more than 80% of the CPU and IO
>>>> benefits. But I also agree that
>>>> the SupportsAggregatePushDown interface should be as generic as
>>>> possible for future extensions, and meanwhile keep confidence
>>>> in the interface we design.
>>>>
>>>> For core points (a): As the complexity of aggregate, one
>>>> LogicalAggregate node may extend to "Expand / Calc / LocalXXAgg / Exchange
>>>> / FinalXXAgg"
>>>> in physical phase. Seems that we can't solve all cases with only one
>>>> rule. So I suggest PushLocalAggIntoTableSourceScanRule focus only
>>>> on the pattern of TableSourceScan + LocalXXAggregate at present.
>>>>
>>>> For core points (b & c): I think we can change the interface to be:
>>>> ```
>>>>
>>>> boolean applyAggregates(int[] groupingFields, List<CallExpression>
>>>> aggregateExpressions, DataType producedDataType, List<int[]>
>>>> groupingSets);
>>>>
>>>> ```
>>>>
>>>>
>>>> Simple Group: groupingSets.size() == 1 &&
>>>> groupingSets.get(0).equals(groupingFields)
>>>>
>>>> Cube Group: groupingSets.size() == IntMath.pow(2,
>>>> groupingFields.cardinality())
>>>> Rollup: Refernece org.apache.calcite.rel.core.Aggregate.Group#isRollup
>>>>
>>>> Then we can handle the complex grouping case. The Connector developer
>>>> of the downstream storage should determine
>>>> whether it supports the associated grouping type. For the filter and
>>>> having clause, they will convert to be related Calc RelNode,
>>>> and no longer in the LocalAggregate node, the CallExpression may be
>>>> sufficient to express the semantics of AggregateCall.
>>>>
>>>> What do you think? Looking forward to our further discussion.
>>>>
>>>>
>>>> Jingsong Li <[email protected]> 于2021年1月6日周三 下午2:24写道：
>>>>
>>>>> > I think filter expressions and grouping sets are semantic arguments
>>>>> instead of utilities. If we want to push them into sources, the connector
>>>>> developers should be aware of them.Wrapping them in a context implicitly 
>>>>> is
>>>>> error-prone that the existing connector will produce wrong results when
>>>>> upgrading to new Flink versions.
>>>>>
>>>>> We can have some mechanism to check the upgrading.
>>>>>
>>>>> > I think for these cases, providing a new default method to override
>>>>> might be a better choice.
>>>>>
>>>>> Then we will have three or more methods. For the API level, I really
>>>>> don't like it...
>>>>>
>>>>> Best,
>>>>> Jingsong
>>>>>
>>>>> On Wed, Jan 6, 2021 at 2:10 PM Jark Wu <[email protected]> wrote:
>>>>>
>>>>>> I think filter expressions and grouping sets are semantic arguments
>>>>>> instead of utilities.
>>>>>> If we want to push them into sources, the connector developers should
>>>>>> be aware of them.
>>>>>> Wrapping them in a context implicitly is error-prone that the
>>>>>> existing connector will produce wrong results
>>>>>>  when upgrading to new Flink versions (as we are pushing
>>>>>> grouping_sets/filter_args, but connector ignores it).
>>>>>> I think for these cases, providing a new default method to override
>>>>>> might be a better choice.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Wed, 6 Jan 2021 at 13:56, Jingsong Li <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm also curious about aggregate with filter (COUNT(1) FILTER(WHERE
>>>>>>> d > 1)). Can we push it down? I'm not sure that a single call expression
>>>>>>> can express it, and how we should embody it and convey it to users.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong
>>>>>>>
>>>>>>> On Wed, Jan 6, 2021 at 1:36 PM Jingsong Li <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Jark,
>>>>>>>>
>>>>>>>> I don't want to limit this interface to LocalAgg Push down.
>>>>>>>> Actually, sometimes, we can push whole aggregation to source too.
>>>>>>>>
>>>>>>>> So, this rule can do something more advanced. For example, we can
>>>>>>>> push down group sets to source too, for the SQL: "GROUP BY GROUPING 
>>>>>>>> SETS
>>>>>>>> (f1, f2)". Then, we need to add more information to push down.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong
>>>>>>>>
>>>>>>>> On Wed, Jan 6, 2021 at 11:02 AM Jark Wu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think this may be over designed. We should have confidence in
>>>>>>>>> the interface we design, the interface should be stable.
>>>>>>>>> Wrapping things in a big context has a cost of losing user
>>>>>>>>> convenience.
>>>>>>>>> Foremost, we don't see any parameters to add in the future. Do you
>>>>>>>>> know any potential parameters?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jark
>>>>>>>>>
>>>>>>>>> On Wed, 6 Jan 2021 at 10:28, Jingsong Li <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>
>>>>>>>>>> Well, I mean:
>>>>>>>>>>
>>>>>>>>>> `boolean applyAggregates(int[] groupingFields,
>>>>>>>>>> List<CallExpression> aggregateExpressions, DataType 
>>>>>>>>>> producedDataType);`
>>>>>>>>>> VS
>>>>>>>>>> ```
>>>>>>>>>> boolean applyAggregates(Aggregation agg);
>>>>>>>>>>
>>>>>>>>>> interface Aggregation {
>>>>>>>>>>   int[] groupingFields();
>>>>>>>>>>   List<CallExpression> aggregateExpressions();
>>>>>>>>>>   DataType producedDataType();
>>>>>>>>>> }
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> Maybe I've over considered it, but I think Aggregation is a
>>>>>>>>>> complicated thing. Maybe we need to extend its parameters in the 
>>>>>>>>>> future, so
>>>>>>>>>> make the parameters interface, which is conducive to the future 
>>>>>>>>>> expansion
>>>>>>>>>> without destroying the compatibility of user implementation. If it 
>>>>>>>>>> is the
>>>>>>>>>> way before, users need to modify the code.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jingsong
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 6, 2021 at 12:52 AM Sebastian Liu <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Jinsong,
>>>>>>>>>>>
>>>>>>>>>>> Thx a lot for your suggestion. These points really need to be
>>>>>>>>>>> clear in the proposal.
>>>>>>>>>>>
>>>>>>>>>>> For the semantic problem, I think the main point is the
>>>>>>>>>>> different returned data types
>>>>>>>>>>> for the target aggregate function and the row format returned by
>>>>>>>>>>> the underlying storage.
>>>>>>>>>>> That's why we provide the producedDataType in the
>>>>>>>>>>> SupportsAggregatePushDown interface.
>>>>>>>>>>> Need to let developers know that we need to handle the semantic
>>>>>>>>>>> differences between
>>>>>>>>>>> the underlying storage system and Flink in related connectors.
>>>>>>>>>>> [Supplemented in proposal]
>>>>>>>>>>>
>>>>>>>>>>> For the phase of the new PushLocalAggIntoTableSourceScanRule
>>>>>>>>>>> rule, it's also a key point.
>>>>>>>>>>> As you suggested, we should put it into the PHYSICAL_REWRITE
>>>>>>>>>>> rule set, and better to put it
>>>>>>>>>>> behind the EnforceLocalXXAggRule. [Supplemented in proposal]
>>>>>>>>>>>
>>>>>>>>>>> For the scalability of the interface, actually I don't exactly
>>>>>>>>>>> understand your suggestion. Is it to add
>>>>>>>>>>> an abstract class, to implement the SupportsAggregatePushDown
>>>>>>>>>>> interface, and holds the
>>>>>>>>>>> `List < CallExpression > aggregateExpressions, int[]
>>>>>>>>>>> GroupingFields, DataType producedDataType`
>>>>>>>>>>> fields?
>>>>>>>>>>>
>>>>>>>>>>> Looking forward to your further feedback or guidance.
>>>>>>>>>>>
>>>>>>>>>>> Jingsong Li <[email protected]> 于2021年1月5日周二 下午2:44写道：
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your proposal! Sebastian.
>>>>>>>>>>>>
>>>>>>>>>>>> +1 for SupportsAggregatePushDown. The above wonderful
>>>>>>>>>>>> discussion has solved
>>>>>>>>>>>> many of my concerns.
>>>>>>>>>>>>
>>>>>>>>>>>> ## Semantic problems
>>>>>>>>>>>>
>>>>>>>>>>>> We may need to add some mechanisms or comments, because as far
>>>>>>>>>>>> as I know,
>>>>>>>>>>>> the semantics of each database is actually different, which may
>>>>>>>>>>>> need to be
>>>>>>>>>>>> reflected in your specific implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, the AVG output types of various databases may be
>>>>>>>>>>>> different.
>>>>>>>>>>>> For example, MySQL outputs double, this is different from
>>>>>>>>>>>> Flink. What
>>>>>>>>>>>> should we do? (Lucky, avg will be splitted into sum and count,
>>>>>>>>>>>> But we also
>>>>>>>>>>>> need care about decimal and others)
>>>>>>>>>>>>
>>>>>>>>>>>> ## The phase of push-down rule
>>>>>>>>>>>>
>>>>>>>>>>>> I strongly recommend that you do not put it in the Volcano
>>>>>>>>>>>> phase, which may
>>>>>>>>>>>> make the cost calculation very troublesome.
>>>>>>>>>>>> So in PHYSICAL_REWRITE?
>>>>>>>>>>>>
>>>>>>>>>>>> ## About interface
>>>>>>>>>>>>
>>>>>>>>>>>> For scalability, I slightly recommend that we introduce an
>>>>>>>>>>>> `Aggregate`
>>>>>>>>>>>> interface, it contains `List<CallExpression>
>>>>>>>>>>>> aggregateExpressions, int[]
>>>>>>>>>>>> groupingFields, DataType producedDataType` fields. In this way,
>>>>>>>>>>>> we can add
>>>>>>>>>>>> fields easily without breaking compatibility.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the current design is very good, just put forward some
>>>>>>>>>>>> ideas.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2021 at 1:55 PM Sebastian Liu <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> > Hi Jark,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thx for your further feedback and help. The interface of
>>>>>>>>>>>> > SupportsAggregatePushDown may indeed need some adjustments.
>>>>>>>>>>>> >
>>>>>>>>>>>> > For (1) Agree: Yeah, the upstream only need to know if the
>>>>>>>>>>>> TableSource can
>>>>>>>>>>>> > handle all of the aggregates.
>>>>>>>>>>>> > It's better to just return a boolean type to indicate whether
>>>>>>>>>>>> all of
>>>>>>>>>>>> > aggregates push down was successful or not. [Resolved in
>>>>>>>>>>>> proposal]
>>>>>>>>>>>> >
>>>>>>>>>>>> > For (2) Agree: The aggOutputDataType represent the produced
>>>>>>>>>>>> data type of
>>>>>>>>>>>> > the new table source to make sure that the new table source
>>>>>>>>>>>> can
>>>>>>>>>>>> > connect with the related exchange node. The format of this
>>>>>>>>>>>> > aggOutputDataType is groupedFields's type + agg function's
>>>>>>>>>>>> return type.
>>>>>>>>>>>> > The reason for adding this parameter in this function is also
>>>>>>>>>>>> to facilitate
>>>>>>>>>>>> > the user to build the final output type. I have changed this
>>>>>>>>>>>> parameter
>>>>>>>>>>>> > to be producedDataType. [Resolved in proposal]
>>>>>>>>>>>> >
>>>>>>>>>>>> > For (3) Agree: Indeed, groupSet may mislead users, I have
>>>>>>>>>>>> changed to use
>>>>>>>>>>>> > groupingFields. [Resolved in proposal]
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thx again for the suggestion, looking for the further
>>>>>>>>>>>> discussion.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Jark Wu <[email protected]> 于2021年1月5日周二 下午12:05写道：
>>>>>>>>>>>> >
>>>>>>>>>>>> > > I'm also +1 for idea#2.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Regarding to the updated interface,
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Result applyAggregates(List<CallExpression>
>>>>>>>>>>>> aggregateExpressions,
>>>>>>>>>>>> > >      int[] groupSet, DataType aggOutputDataType);
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > final class Result {
>>>>>>>>>>>> > >        private final List<CallExpression>
>>>>>>>>>>>> acceptedAggregates;
>>>>>>>>>>>> > >        private final List<CallExpression>
>>>>>>>>>>>> remainingAggregates;
>>>>>>>>>>>> > > }
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I have following comments:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > 1) Do we need the composite Result return type? Is a
>>>>>>>>>>>> boolean return type
>>>>>>>>>>>> > > enough?
>>>>>>>>>>>> > >     From my understanding, all of the aggregates should be
>>>>>>>>>>>> accepted,
>>>>>>>>>>>> > > otherwise the pushdown should fail.
>>>>>>>>>>>> > >     Therefore, users don't need to distinguish which
>>>>>>>>>>>> aggregates are
>>>>>>>>>>>> > > "accepted".
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > 2) Does the `aggOutputDataType` represent the produced data
>>>>>>>>>>>> type of the
>>>>>>>>>>>> > > new source, or just the return type of all the agg
>>>>>>>>>>>> functions?
>>>>>>>>>>>> > >     I would prefer to `producedDataType` just like
>>>>>>>>>>>> > > `SupportsReadingMetadata` to reduce the effort for users to
>>>>>>>>>>>> concat a
>>>>>>>>>>>> > final
>>>>>>>>>>>> > > output type.
>>>>>>>>>>>> > >     The return type of each agg function can be obtained
>>>>>>>>>>>> from the
>>>>>>>>>>>> > > `CallExpression`.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > 3) What do you think about renaming `groupSet` to
>>>>>>>>>>>> `grouping` or
>>>>>>>>>>>> > > `groupedFields` ?
>>>>>>>>>>>> > >     The `groupSet` may confuse users that it relates to
>>>>>>>>>>>> "grouping sets".
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > What do you think?
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Best,
>>>>>>>>>>>> > > Jark
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > On Tue, 5 Jan 2021 at 11:04, Kurt Young <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > >> Sorry for the typo -_-!
>>>>>>>>>>>> > >> I meant idea #2.
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >> Best,
>>>>>>>>>>>> > >> Kurt
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >> On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> > >> wrote:
>>>>>>>>>>>> > >>
>>>>>>>>>>>> > >>> Hi Kurt,
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Thx a lot for your feedback. If local aggregation is more
>>>>>>>>>>>> like a
>>>>>>>>>>>> > >>> physical operator rather than logical
>>>>>>>>>>>> > >>> operator, I think your suggestion should be idea #2 which
>>>>>>>>>>>> handle all in
>>>>>>>>>>>> > >>> the physical optimization phase?
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Looking forward for the further discussion.
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> Kurt Young <[email protected]> 于2021年1月5日周二 上午9:52写道：
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>>> Local aggregation is more like a physical operator
>>>>>>>>>>>> rather than logical
>>>>>>>>>>>> > >>>> operator. I would suggest going with idea #1.
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> > >>>> Best,
>>>>>>>>>>>> > >>>> Kurt
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> > >>>> On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> > >>>> wrote:
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> > >>>> > Hi Jark, Thx a lot for your quick reply and valuable
>>>>>>>>>>>> suggestions.
>>>>>>>>>>>> > >>>> > For (1): Agree: Since we are in the period of
>>>>>>>>>>>> upgrading the new
>>>>>>>>>>>> > table
>>>>>>>>>>>> > >>>> > source api,
>>>>>>>>>>>> > >>>> > we really should consider the new interface for the
>>>>>>>>>>>> new optimize
>>>>>>>>>>>> > >>>> rule. If
>>>>>>>>>>>> > >>>> > the new rule
>>>>>>>>>>>> > >>>> > doesn't use the new api, we'll have to upgrade it
>>>>>>>>>>>> sooner or later. I
>>>>>>>>>>>> > >>>> have
>>>>>>>>>>>> > >>>> > change to use
>>>>>>>>>>>> > >>>> > the ability interface for the
>>>>>>>>>>>> SupportsAggregatePushDown definition
>>>>>>>>>>>> > in
>>>>>>>>>>>> > >>>> above
>>>>>>>>>>>> > >>>> > proposal.
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > For (2): Agree: Change to use CallExpression is a
>>>>>>>>>>>> better choice, and
>>>>>>>>>>>> > >>>> have
>>>>>>>>>>>> > >>>> > resolved this
>>>>>>>>>>>> > >>>> > comment in the proposal.
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > For (3): I suggest we first support the JDBC
>>>>>>>>>>>> connector, as we don't
>>>>>>>>>>>> > >>>> have
>>>>>>>>>>>> > >>>> > Druid connector
>>>>>>>>>>>> > >>>> > and ES connector just has sink api at present.
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > But perhaps the biggest question may be whether we
>>>>>>>>>>>> should use idea 1
>>>>>>>>>>>> > >>>> or
>>>>>>>>>>>> > >>>> > idea 2 in proposal.
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > What do you think?  After we reach the agreement on
>>>>>>>>>>>> the proposal,
>>>>>>>>>>>> > our
>>>>>>>>>>>> > >>>> team
>>>>>>>>>>>> > >>>> > can drive to
>>>>>>>>>>>> > >>>> > complete this feature.
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > Jark Wu <[email protected]> 于2020年12月29日周二 下午2:58写道：
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > > Hi Sebastian,
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > Thanks for the proposal. I think this is a great
>>>>>>>>>>>> improvement for
>>>>>>>>>>>> > >>>> Flink
>>>>>>>>>>>> > >>>> > SQL.
>>>>>>>>>>>> > >>>> > > I went through the design doc and have following
>>>>>>>>>>>> thoughts:
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > 1) Flink has deprecated the legacy TableSource in
>>>>>>>>>>>> 1.11 and
>>>>>>>>>>>> > proposed
>>>>>>>>>>>> > >>>> a new
>>>>>>>>>>>> > >>>> > >  set of DynamicTableSource interfaces. Could you
>>>>>>>>>>>> update your
>>>>>>>>>>>> > >>>> proposal to
>>>>>>>>>>>> > >>>> > > use the new interfaces?
>>>>>>>>>>>> > >>>> > >  Follow the existing ability interfaces, e.g.
>>>>>>>>>>>> > >>>> > > SupportsFilterPushDown, SupportsProjectionPushDown.
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > 2) Personally, I think CallExpression would be a
>>>>>>>>>>>> better
>>>>>>>>>>>> > >>>> representation
>>>>>>>>>>>> > >>>> > than
>>>>>>>>>>>> > >>>> > > separate `FunctionDefinition` and args. Because, it
>>>>>>>>>>>> would be
>>>>>>>>>>>> > easier
>>>>>>>>>>>> > >>>> to
>>>>>>>>>>>> > >>>> > know
>>>>>>>>>>>> > >>>> > > what's the index and type of the arguments.
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > 3) It would be better to list which connectors will
>>>>>>>>>>>> be supported
>>>>>>>>>>>> > in
>>>>>>>>>>>> > >>>> the
>>>>>>>>>>>> > >>>> > > plan?
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > Best,
>>>>>>>>>>>> > >>>> > > Jark
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > On Tue, 29 Dec 2020 at 00:49, Sebastian Liu <
>>>>>>>>>>>> > [email protected]>
>>>>>>>>>>>> > >>>> > wrote:
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> > > > Hi all,
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > > I'd like to discuss a new feature for the Blink
>>>>>>>>>>>> Planner.
>>>>>>>>>>>> > >>>> > > > Aggregate operator of Flink SQL is currently fully
>>>>>>>>>>>> done at Flink
>>>>>>>>>>>> > >>>> layer.
>>>>>>>>>>>> > >>>> > > > With the developing of storage, many downstream
>>>>>>>>>>>> storage of Flink
>>>>>>>>>>>> > >>>> SQL
>>>>>>>>>>>> > >>>> > has
>>>>>>>>>>>> > >>>> > > > the ability to deal with Aggregation operator.
>>>>>>>>>>>> > >>>> > > > Pushing down Aggregate to data source layer will
>>>>>>>>>>>> improve
>>>>>>>>>>>> > >>>> performance
>>>>>>>>>>>> > >>>> > from
>>>>>>>>>>>> > >>>> > > > the perspective of the network IO and computation
>>>>>>>>>>>> overhead.
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > > I have drafted a design doc for this new feature.
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> >
>>>>>>>>>>>> https://docs.google.com/document/d/1kGwC_h4qBNxF2eMEz6T6arByOB8yilrPLqDN0QBQXW4/edit?usp=sharing
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > > Any comment or discussion is welcome.
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > > --
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > > > *With kind regards
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> > >>>> > > > Sebastian Liu 刘洋
>>>>>>>>>>>> > >>>> > > > Institute of Computing Technology, Chinese Academy
>>>>>>>>>>>> of Science
>>>>>>>>>>>> > >>>> > > > Mobile\WeChat: +86—15201613655
>>>>>>>>>>>> > >>>> > > > E-mail: [email protected] <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> > >>>> > > > QQ: 3239559*
>>>>>>>>>>>> > >>>> > > >
>>>>>>>>>>>> > >>>> > >
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > --
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>> > *With kind regards
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> > >>>> > Sebastian Liu 刘洋
>>>>>>>>>>>> > >>>> > Institute of Computing Technology, Chinese Academy of
>>>>>>>>>>>> Science
>>>>>>>>>>>> > >>>> > Mobile\WeChat: +86—15201613655
>>>>>>>>>>>> > >>>> > E-mail: [email protected] <[email protected]>
>>>>>>>>>>>> > >>>> > QQ: 3239559*
>>>>>>>>>>>> > >>>> >
>>>>>>>>>>>> > >>>>
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> --
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>> *With kind regards
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> > >>> Sebastian Liu 刘洋
>>>>>>>>>>>> > >>> Institute of Computing Technology, Chinese Academy of
>>>>>>>>>>>> Science
>>>>>>>>>>>> > >>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>>> > >>> E-mail: [email protected] <[email protected]>
>>>>>>>>>>>> > >>> QQ: 3239559*
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> > >>>
>>>>>>>>>>>> >
>>>>>>>>>>>> > --
>>>>>>>>>>>> >
>>>>>>>>>>>> > *With kind regards
>>>>>>>>>>>> > ------------------------------------------------------------
>>>>>>>>>>>> > Sebastian Liu 刘洋
>>>>>>>>>>>> > Institute of Computing Technology, Chinese Academy of Science
>>>>>>>>>>>> > Mobile\WeChat: +86—15201613655
>>>>>>>>>>>> > E-mail: [email protected] <[email protected]>
>>>>>>>>>>>> > QQ: 3239559*
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> *With kind regards
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> Sebastian Liu 刘洋
>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
>>>>>>>>>>> Mobile\WeChat: +86—15201613655
>>>>>>>>>>> E-mail: [email protected] <[email protected]>
>>>>>>>>>>> QQ: 3239559*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best, Jingsong Lee
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best, Jingsong Lee
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best, Jingsong Lee
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Best, Jingsong Lee
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *With kind regards
>>>> ------------------------------------------------------------
>>>> Sebastian Liu 刘洋
>>>> Institute of Computing Technology, Chinese Academy of Science
>>>> Mobile\WeChat: +86—15201613655
>>>> E-mail: [email protected] <[email protected]>
>>>> QQ: 3239559*
>>>>
>>>>
>>
>> --
>>
>> *With kind regards
>> ------------------------------------------------------------
>> Sebastian Liu 刘洋
>> Institute of Computing Technology, Chinese Academy of Science
>> Mobile\WeChat: +86—15201613655
>> E-mail: [email protected] <[email protected]>
>> QQ: 3239559*
>>
>>
>
> --
> Best, Jingsong Lee
>

Re: Support local aggregate push down for Blink batch planner

Reply via email to