Hi Jark, Sounds good to me. For better scalability in the future, we could add the AggregateExpression. ```
public class AggregateExpression implements ResolvedExpression { private final FunctionDefinition functionDefinition; private final List<FieldReferenceExpression> args; private final @Nullable CallExpression filterExpression; private final DataType resultType; private final boolean distinct; private final boolean approximate; private final boolean ignoreNulls; } ``` And we really only need one GroupingSets parameter for grouping. I have updated the related interface in the proposal. Appreciate the continued feedback and help. Jark Wu <imj...@gmail.com> 于2021年1月6日周三 下午9:34写道: > Hi Liu, Jingsong, > > Regarding the agg with filter, I think in theory we can support pushing > such a pattern into source. > We don't need to support it in the first version, but in the long term, we > can support it. > The designed interface should be future proof. > > Considering filter arg and distinct flag should be part of the aggregate > expression. > I'm wondering if CallExpression is a good representation for it. > What do you think about proposing the following `AggregateExpression` to > replace the `CallExpression`? > > class AggregateExpression implements ResolvedExpression { > private final FunctionDefinition functionDefinition; > private final List<FieldReferenceExpression> args; > private final @Nullable CallExpression filterExpr; > private final boolean distinct; > } > > Besides, we don't need both groupingFields and groupingSets. > `groupingSets` should be a superset of groupingFields. > Then the interface of SupportsAggregatePushDown can be: > > interface SupportsAggregatePushDown { > > boolean applyAggregates( > List<int[]> groupingSets, > List<AggregateExpression> aggregates, > DataType producedDataType); > } > > What do you think? > > Best, > Jark > > On Wed, 6 Jan 2021 at 19:56, Sebastian Liu <liuyang0...@gmail.com> wrote: > >> Hi Jingsong, Jark, >> >> Thx so much for our discussion, and the cases mentioned above are really >> worthy for further discussion. >> >> 1. For aggregate with filter expressions: eg: select COUNT(1) >> FILTER(WHERE cc_call_center_sk > 3) from call_center; >> For the current Blink Planner, the optimized plan will be: >> TableSourceScan -> Calc(IS TRUE(>(cc_call_center_sk, 3))) -> LocalAgg -> >> Exchange -> FinalAgg. >> As there is a Calc above the TableSource, this pattern can't match the >> LocalAggPushDownRule in the current design. >> >> 2. For the grouping set or rollup use case: eg: select COUNT(1) from >> call_center group by rollup(cc_class, cc_employees); >> For the current Blink Planner, the optimized plan will be: >> TableSourceScan -> Expand -> LocalAgg -> Exchange -> FinalAgg -> Calc. >> It's also not covered by the current LocalAggPushDownRule design. >> >> 3. I want to add a case which we haven't discussed yet. Aggregate with >> Having clause. >> eg: select COUNT(1) from call_center group by cc_class having >> max(cc_tax_percentage) > 0.2; >> For the current Blink Planner, the optimized plan will be: >> TableSourceScan -> LocalAgg -> Exchange -> FinalAgg -> Calc(where=[>($f2, >> 0.2:DECIMAL(2, 1))]). >> >> The core discussion points are summarized as follows: >> a) Aggregate is a more complex scenario than predicates or limits, and >> also depends on the different underlying storages. >> b) One rule seems can't completely cover all aggregate scenario, but >> whether SupportSAggregatePushDown interface can be a bit more general? >> c) Could the CallExpression express the semantics of CalCite >> AggregateCall? >> >> IMO: Completely push down aggregate is generally hard for distributed >> systems. Usually we need a GROUP BY and exactly >> matches the partition mode in downstream storage. At the same time, the >> benefit of remove the final aggregate is actually limited. >> The LocalAggPushDown generally yields more than 80% of the CPU and IO >> benefits. But I also agree that >> the SupportsAggregatePushDown interface should be as generic as possible >> for future extensions, and meanwhile keep confidence >> in the interface we design. >> >> For core points (a): As the complexity of aggregate, one LogicalAggregate >> node may extend to "Expand / Calc / LocalXXAgg / Exchange / FinalXXAgg" >> in physical phase. Seems that we can't solve all cases with only one >> rule. So I suggest PushLocalAggIntoTableSourceScanRule focus only >> on the pattern of TableSourceScan + LocalXXAggregate at present. >> >> For core points (b & c): I think we can change the interface to be: >> ``` >> >> boolean applyAggregates(int[] groupingFields, List<CallExpression> >> aggregateExpressions, DataType producedDataType, List<int[]> groupingSets >> ); >> >> ``` >> >> >> Simple Group: groupingSets.size() == 1 && >> groupingSets.get(0).equals(groupingFields) >> >> Cube Group: groupingSets.size() == IntMath.pow(2, >> groupingFields.cardinality()) >> Rollup: Refernece org.apache.calcite.rel.core.Aggregate.Group#isRollup >> >> Then we can handle the complex grouping case. The Connector developer of >> the downstream storage should determine >> whether it supports the associated grouping type. For the filter and >> having clause, they will convert to be related Calc RelNode, >> and no longer in the LocalAggregate node, the CallExpression may be >> sufficient to express the semantics of AggregateCall. >> >> What do you think? Looking forward to our further discussion. >> >> >> Jingsong Li <jingsongl...@gmail.com> 于2021年1月6日周三 下午2:24写道: >> >>> > I think filter expressions and grouping sets are semantic arguments >>> instead of utilities. If we want to push them into sources, the connector >>> developers should be aware of them.Wrapping them in a context implicitly is >>> error-prone that the existing connector will produce wrong results when >>> upgrading to new Flink versions. >>> >>> We can have some mechanism to check the upgrading. >>> >>> > I think for these cases, providing a new default method to override >>> might be a better choice. >>> >>> Then we will have three or more methods. For the API level, I really >>> don't like it... >>> >>> Best, >>> Jingsong >>> >>> On Wed, Jan 6, 2021 at 2:10 PM Jark Wu <imj...@gmail.com> wrote: >>> >>>> I think filter expressions and grouping sets are semantic arguments >>>> instead of utilities. >>>> If we want to push them into sources, the connector developers should >>>> be aware of them. >>>> Wrapping them in a context implicitly is error-prone that the existing >>>> connector will produce wrong results >>>> when upgrading to new Flink versions (as we are pushing >>>> grouping_sets/filter_args, but connector ignores it). >>>> I think for these cases, providing a new default method to override >>>> might be a better choice. >>>> >>>> Best, >>>> Jark >>>> >>>> On Wed, 6 Jan 2021 at 13:56, Jingsong Li <jingsongl...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm also curious about aggregate with filter (COUNT(1) FILTER(WHERE d >>>>> > 1)). Can we push it down? I'm not sure that a single call expression can >>>>> express it, and how we should embody it and convey it to users. >>>>> >>>>> Best, >>>>> Jingsong >>>>> >>>>> On Wed, Jan 6, 2021 at 1:36 PM Jingsong Li <jingsongl...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi Jark, >>>>>> >>>>>> I don't want to limit this interface to LocalAgg Push down. Actually, >>>>>> sometimes, we can push whole aggregation to source too. >>>>>> >>>>>> So, this rule can do something more advanced. For example, we can >>>>>> push down group sets to source too, for the SQL: "GROUP BY GROUPING SETS >>>>>> (f1, f2)". Then, we need to add more information to push down. >>>>>> >>>>>> Best, >>>>>> Jingsong >>>>>> >>>>>> On Wed, Jan 6, 2021 at 11:02 AM Jark Wu <imj...@gmail.com> wrote: >>>>>> >>>>>>> I think this may be over designed. We should have confidence in the >>>>>>> interface we design, the interface should be stable. >>>>>>> Wrapping things in a big context has a cost of losing user >>>>>>> convenience. >>>>>>> Foremost, we don't see any parameters to add in the future. Do you >>>>>>> know any potential parameters? >>>>>>> >>>>>>> Best, >>>>>>> Jark >>>>>>> >>>>>>> On Wed, 6 Jan 2021 at 10:28, Jingsong Li <jingsongl...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Sebastian, >>>>>>>> >>>>>>>> Well, I mean: >>>>>>>> >>>>>>>> `boolean applyAggregates(int[] groupingFields, List<CallExpression> >>>>>>>> aggregateExpressions, DataType producedDataType);` >>>>>>>> VS >>>>>>>> ``` >>>>>>>> boolean applyAggregates(Aggregation agg); >>>>>>>> >>>>>>>> interface Aggregation { >>>>>>>> int[] groupingFields(); >>>>>>>> List<CallExpression> aggregateExpressions(); >>>>>>>> DataType producedDataType(); >>>>>>>> } >>>>>>>> ``` >>>>>>>> >>>>>>>> Maybe I've over considered it, but I think Aggregation is a >>>>>>>> complicated thing. Maybe we need to extend its parameters in the >>>>>>>> future, so >>>>>>>> make the parameters interface, which is conducive to the future >>>>>>>> expansion >>>>>>>> without destroying the compatibility of user implementation. If it is >>>>>>>> the >>>>>>>> way before, users need to modify the code. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jingsong >>>>>>>> >>>>>>>> On Wed, Jan 6, 2021 at 12:52 AM Sebastian Liu < >>>>>>>> liuyang0...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi Jinsong, >>>>>>>>> >>>>>>>>> Thx a lot for your suggestion. These points really need to be >>>>>>>>> clear in the proposal. >>>>>>>>> >>>>>>>>> For the semantic problem, I think the main point is the different >>>>>>>>> returned data types >>>>>>>>> for the target aggregate function and the row format returned by >>>>>>>>> the underlying storage. >>>>>>>>> That's why we provide the producedDataType in the >>>>>>>>> SupportsAggregatePushDown interface. >>>>>>>>> Need to let developers know that we need to handle the semantic >>>>>>>>> differences between >>>>>>>>> the underlying storage system and Flink in related connectors. >>>>>>>>> [Supplemented in proposal] >>>>>>>>> >>>>>>>>> For the phase of the new PushLocalAggIntoTableSourceScanRule >>>>>>>>> rule, it's also a key point. >>>>>>>>> As you suggested, we should put it into the PHYSICAL_REWRITE rule >>>>>>>>> set, and better to put it >>>>>>>>> behind the EnforceLocalXXAggRule. [Supplemented in proposal] >>>>>>>>> >>>>>>>>> For the scalability of the interface, actually I don't exactly >>>>>>>>> understand your suggestion. Is it to add >>>>>>>>> an abstract class, to implement the SupportsAggregatePushDown >>>>>>>>> interface, and holds the >>>>>>>>> `List < CallExpression > aggregateExpressions, int[] >>>>>>>>> GroupingFields, DataType producedDataType` >>>>>>>>> fields? >>>>>>>>> >>>>>>>>> Looking forward to your further feedback or guidance. >>>>>>>>> >>>>>>>>> Jingsong Li <jingsongl...@gmail.com> 于2021年1月5日周二 下午2:44写道: >>>>>>>>> >>>>>>>>>> Thanks for your proposal! Sebastian. >>>>>>>>>> >>>>>>>>>> +1 for SupportsAggregatePushDown. The above wonderful discussion >>>>>>>>>> has solved >>>>>>>>>> many of my concerns. >>>>>>>>>> >>>>>>>>>> ## Semantic problems >>>>>>>>>> >>>>>>>>>> We may need to add some mechanisms or comments, because as far as >>>>>>>>>> I know, >>>>>>>>>> the semantics of each database is actually different, which may >>>>>>>>>> need to be >>>>>>>>>> reflected in your specific implementation. >>>>>>>>>> >>>>>>>>>> For example, the AVG output types of various databases may be >>>>>>>>>> different. >>>>>>>>>> For example, MySQL outputs double, this is different from Flink. >>>>>>>>>> What >>>>>>>>>> should we do? (Lucky, avg will be splitted into sum and count, >>>>>>>>>> But we also >>>>>>>>>> need care about decimal and others) >>>>>>>>>> >>>>>>>>>> ## The phase of push-down rule >>>>>>>>>> >>>>>>>>>> I strongly recommend that you do not put it in the Volcano phase, >>>>>>>>>> which may >>>>>>>>>> make the cost calculation very troublesome. >>>>>>>>>> So in PHYSICAL_REWRITE? >>>>>>>>>> >>>>>>>>>> ## About interface >>>>>>>>>> >>>>>>>>>> For scalability, I slightly recommend that we introduce an >>>>>>>>>> `Aggregate` >>>>>>>>>> interface, it contains `List<CallExpression> >>>>>>>>>> aggregateExpressions, int[] >>>>>>>>>> groupingFields, DataType producedDataType` fields. In this way, >>>>>>>>>> we can add >>>>>>>>>> fields easily without breaking compatibility. >>>>>>>>>> >>>>>>>>>> I think the current design is very good, just put forward some >>>>>>>>>> ideas. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jingsong >>>>>>>>>> >>>>>>>>>> On Tue, Jan 5, 2021 at 1:55 PM Sebastian Liu < >>>>>>>>>> liuyang0...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> > Hi Jark, >>>>>>>>>> > >>>>>>>>>> > Thx for your further feedback and help. The interface of >>>>>>>>>> > SupportsAggregatePushDown may indeed need some adjustments. >>>>>>>>>> > >>>>>>>>>> > For (1) Agree: Yeah, the upstream only need to know if the >>>>>>>>>> TableSource can >>>>>>>>>> > handle all of the aggregates. >>>>>>>>>> > It's better to just return a boolean type to indicate whether >>>>>>>>>> all of >>>>>>>>>> > aggregates push down was successful or not. [Resolved in >>>>>>>>>> proposal] >>>>>>>>>> > >>>>>>>>>> > For (2) Agree: The aggOutputDataType represent the produced >>>>>>>>>> data type of >>>>>>>>>> > the new table source to make sure that the new table source can >>>>>>>>>> > connect with the related exchange node. The format of this >>>>>>>>>> > aggOutputDataType is groupedFields's type + agg function's >>>>>>>>>> return type. >>>>>>>>>> > The reason for adding this parameter in this function is also >>>>>>>>>> to facilitate >>>>>>>>>> > the user to build the final output type. I have changed this >>>>>>>>>> parameter >>>>>>>>>> > to be producedDataType. [Resolved in proposal] >>>>>>>>>> > >>>>>>>>>> > For (3) Agree: Indeed, groupSet may mislead users, I have >>>>>>>>>> changed to use >>>>>>>>>> > groupingFields. [Resolved in proposal] >>>>>>>>>> > >>>>>>>>>> > Thx again for the suggestion, looking for the further >>>>>>>>>> discussion. >>>>>>>>>> > >>>>>>>>>> > Jark Wu <imj...@gmail.com> 于2021年1月5日周二 下午12:05写道: >>>>>>>>>> > >>>>>>>>>> > > I'm also +1 for idea#2. >>>>>>>>>> > > >>>>>>>>>> > > Regarding to the updated interface, >>>>>>>>>> > > >>>>>>>>>> > > Result applyAggregates(List<CallExpression> >>>>>>>>>> aggregateExpressions, >>>>>>>>>> > > int[] groupSet, DataType aggOutputDataType); >>>>>>>>>> > > >>>>>>>>>> > > final class Result { >>>>>>>>>> > > private final List<CallExpression> acceptedAggregates; >>>>>>>>>> > > private final List<CallExpression> remainingAggregates; >>>>>>>>>> > > } >>>>>>>>>> > > >>>>>>>>>> > > I have following comments: >>>>>>>>>> > > >>>>>>>>>> > > 1) Do we need the composite Result return type? Is a boolean >>>>>>>>>> return type >>>>>>>>>> > > enough? >>>>>>>>>> > > From my understanding, all of the aggregates should be >>>>>>>>>> accepted, >>>>>>>>>> > > otherwise the pushdown should fail. >>>>>>>>>> > > Therefore, users don't need to distinguish which >>>>>>>>>> aggregates are >>>>>>>>>> > > "accepted". >>>>>>>>>> > > >>>>>>>>>> > > 2) Does the `aggOutputDataType` represent the produced data >>>>>>>>>> type of the >>>>>>>>>> > > new source, or just the return type of all the agg functions? >>>>>>>>>> > > I would prefer to `producedDataType` just like >>>>>>>>>> > > `SupportsReadingMetadata` to reduce the effort for users to >>>>>>>>>> concat a >>>>>>>>>> > final >>>>>>>>>> > > output type. >>>>>>>>>> > > The return type of each agg function can be obtained from >>>>>>>>>> the >>>>>>>>>> > > `CallExpression`. >>>>>>>>>> > > >>>>>>>>>> > > 3) What do you think about renaming `groupSet` to `grouping` >>>>>>>>>> or >>>>>>>>>> > > `groupedFields` ? >>>>>>>>>> > > The `groupSet` may confuse users that it relates to >>>>>>>>>> "grouping sets". >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > What do you think? >>>>>>>>>> > > >>>>>>>>>> > > Best, >>>>>>>>>> > > Jark >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > On Tue, 5 Jan 2021 at 11:04, Kurt Young <ykt...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> > > >>>>>>>>>> > >> Sorry for the typo -_-! >>>>>>>>>> > >> I meant idea #2. >>>>>>>>>> > >> >>>>>>>>>> > >> Best, >>>>>>>>>> > >> Kurt >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu < >>>>>>>>>> liuyang0...@gmail.com> >>>>>>>>>> > >> wrote: >>>>>>>>>> > >> >>>>>>>>>> > >>> Hi Kurt, >>>>>>>>>> > >>> >>>>>>>>>> > >>> Thx a lot for your feedback. If local aggregation is more >>>>>>>>>> like a >>>>>>>>>> > >>> physical operator rather than logical >>>>>>>>>> > >>> operator, I think your suggestion should be idea #2 which >>>>>>>>>> handle all in >>>>>>>>>> > >>> the physical optimization phase? >>>>>>>>>> > >>> >>>>>>>>>> > >>> Looking forward for the further discussion. >>>>>>>>>> > >>> >>>>>>>>>> > >>> >>>>>>>>>> > >>> Kurt Young <ykt...@gmail.com> 于2021年1月5日周二 上午9:52写道: >>>>>>>>>> > >>> >>>>>>>>>> > >>>> Local aggregation is more like a physical operator rather >>>>>>>>>> than logical >>>>>>>>>> > >>>> operator. I would suggest going with idea #1. >>>>>>>>>> > >>>> >>>>>>>>>> > >>>> Best, >>>>>>>>>> > >>>> Kurt >>>>>>>>>> > >>>> >>>>>>>>>> > >>>> >>>>>>>>>> > >>>> On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu < >>>>>>>>>> liuyang0...@gmail.com> >>>>>>>>>> > >>>> wrote: >>>>>>>>>> > >>>> >>>>>>>>>> > >>>> > Hi Jark, Thx a lot for your quick reply and valuable >>>>>>>>>> suggestions. >>>>>>>>>> > >>>> > For (1): Agree: Since we are in the period of upgrading >>>>>>>>>> the new >>>>>>>>>> > table >>>>>>>>>> > >>>> > source api, >>>>>>>>>> > >>>> > we really should consider the new interface for the new >>>>>>>>>> optimize >>>>>>>>>> > >>>> rule. If >>>>>>>>>> > >>>> > the new rule >>>>>>>>>> > >>>> > doesn't use the new api, we'll have to upgrade it sooner >>>>>>>>>> or later. I >>>>>>>>>> > >>>> have >>>>>>>>>> > >>>> > change to use >>>>>>>>>> > >>>> > the ability interface for the SupportsAggregatePushDown >>>>>>>>>> definition >>>>>>>>>> > in >>>>>>>>>> > >>>> above >>>>>>>>>> > >>>> > proposal. >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > For (2): Agree: Change to use CallExpression is a better >>>>>>>>>> choice, and >>>>>>>>>> > >>>> have >>>>>>>>>> > >>>> > resolved this >>>>>>>>>> > >>>> > comment in the proposal. >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > For (3): I suggest we first support the JDBC connector, >>>>>>>>>> as we don't >>>>>>>>>> > >>>> have >>>>>>>>>> > >>>> > Druid connector >>>>>>>>>> > >>>> > and ES connector just has sink api at present. >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > But perhaps the biggest question may be whether we >>>>>>>>>> should use idea 1 >>>>>>>>>> > >>>> or >>>>>>>>>> > >>>> > idea 2 in proposal. >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > What do you think? After we reach the agreement on the >>>>>>>>>> proposal, >>>>>>>>>> > our >>>>>>>>>> > >>>> team >>>>>>>>>> > >>>> > can drive to >>>>>>>>>> > >>>> > complete this feature. >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > Jark Wu <imj...@gmail.com> 于2020年12月29日周二 下午2:58写道: >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > > Hi Sebastian, >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > Thanks for the proposal. I think this is a great >>>>>>>>>> improvement for >>>>>>>>>> > >>>> Flink >>>>>>>>>> > >>>> > SQL. >>>>>>>>>> > >>>> > > I went through the design doc and have following >>>>>>>>>> thoughts: >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > 1) Flink has deprecated the legacy TableSource in 1.11 >>>>>>>>>> and >>>>>>>>>> > proposed >>>>>>>>>> > >>>> a new >>>>>>>>>> > >>>> > > set of DynamicTableSource interfaces. Could you >>>>>>>>>> update your >>>>>>>>>> > >>>> proposal to >>>>>>>>>> > >>>> > > use the new interfaces? >>>>>>>>>> > >>>> > > Follow the existing ability interfaces, e.g. >>>>>>>>>> > >>>> > > SupportsFilterPushDown, SupportsProjectionPushDown. >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > 2) Personally, I think CallExpression would be a better >>>>>>>>>> > >>>> representation >>>>>>>>>> > >>>> > than >>>>>>>>>> > >>>> > > separate `FunctionDefinition` and args. Because, it >>>>>>>>>> would be >>>>>>>>>> > easier >>>>>>>>>> > >>>> to >>>>>>>>>> > >>>> > know >>>>>>>>>> > >>>> > > what's the index and type of the arguments. >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > 3) It would be better to list which connectors will be >>>>>>>>>> supported >>>>>>>>>> > in >>>>>>>>>> > >>>> the >>>>>>>>>> > >>>> > > plan? >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > Best, >>>>>>>>>> > >>>> > > Jark >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > On Tue, 29 Dec 2020 at 00:49, Sebastian Liu < >>>>>>>>>> > liuyang0...@gmail.com> >>>>>>>>>> > >>>> > wrote: >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > > > Hi all, >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > I'd like to discuss a new feature for the Blink >>>>>>>>>> Planner. >>>>>>>>>> > >>>> > > > Aggregate operator of Flink SQL is currently fully >>>>>>>>>> done at Flink >>>>>>>>>> > >>>> layer. >>>>>>>>>> > >>>> > > > With the developing of storage, many downstream >>>>>>>>>> storage of Flink >>>>>>>>>> > >>>> SQL >>>>>>>>>> > >>>> > has >>>>>>>>>> > >>>> > > > the ability to deal with Aggregation operator. >>>>>>>>>> > >>>> > > > Pushing down Aggregate to data source layer will >>>>>>>>>> improve >>>>>>>>>> > >>>> performance >>>>>>>>>> > >>>> > from >>>>>>>>>> > >>>> > > > the perspective of the network IO and computation >>>>>>>>>> overhead. >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > I have drafted a design doc for this new feature. >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> >>>>>>>>>> > >>>>>>>>>> https://docs.google.com/document/d/1kGwC_h4qBNxF2eMEz6T6arByOB8yilrPLqDN0QBQXW4/edit?usp=sharing >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > Any comment or discussion is welcome. >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > -- >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > *With kind regards >>>>>>>>>> > >>>> > > > >>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>> > >>>> > > > Sebastian Liu 刘洋 >>>>>>>>>> > >>>> > > > Institute of Computing Technology, Chinese Academy >>>>>>>>>> of Science >>>>>>>>>> > >>>> > > > Mobile\WeChat: +86—15201613655 >>>>>>>>>> > >>>> > > > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com >>>>>>>>>> > >>>>>>>>>> > >>>> > > > QQ: 3239559* >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > -- >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> > *With kind regards >>>>>>>>>> > >>>> > >>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>> > >>>> > Sebastian Liu 刘洋 >>>>>>>>>> > >>>> > Institute of Computing Technology, Chinese Academy of >>>>>>>>>> Science >>>>>>>>>> > >>>> > Mobile\WeChat: +86—15201613655 >>>>>>>>>> > >>>> > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> >>>>>>>>>> > >>>> > QQ: 3239559* >>>>>>>>>> > >>>> > >>>>>>>>>> > >>>> >>>>>>>>>> > >>> >>>>>>>>>> > >>> >>>>>>>>>> > >>> -- >>>>>>>>>> > >>> >>>>>>>>>> > >>> *With kind regards >>>>>>>>>> > >>> ------------------------------------------------------------ >>>>>>>>>> > >>> Sebastian Liu 刘洋 >>>>>>>>>> > >>> Institute of Computing Technology, Chinese Academy of >>>>>>>>>> Science >>>>>>>>>> > >>> Mobile\WeChat: +86—15201613655 >>>>>>>>>> > >>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> >>>>>>>>>> > >>> QQ: 3239559* >>>>>>>>>> > >>> >>>>>>>>>> > >>> >>>>>>>>>> > >>>>>>>>>> > -- >>>>>>>>>> > >>>>>>>>>> > *With kind regards >>>>>>>>>> > ------------------------------------------------------------ >>>>>>>>>> > Sebastian Liu 刘洋 >>>>>>>>>> > Institute of Computing Technology, Chinese Academy of Science >>>>>>>>>> > Mobile\WeChat: +86—15201613655 >>>>>>>>>> > E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> >>>>>>>>>> > QQ: 3239559* >>>>>>>>>> > >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best, Jingsong Lee >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> *With kind regards >>>>>>>>> ------------------------------------------------------------ >>>>>>>>> Sebastian Liu 刘洋 >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science >>>>>>>>> Mobile\WeChat: +86—15201613655 >>>>>>>>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> >>>>>>>>> QQ: 3239559* >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best, Jingsong Lee >>>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Best, Jingsong Lee >>>>>> >>>>> >>>>> >>>>> -- >>>>> Best, Jingsong Lee >>>>> >>>> >>> >>> -- >>> Best, Jingsong Lee >>> >> >> >> -- >> >> *With kind regards >> ------------------------------------------------------------ >> Sebastian Liu 刘洋 >> Institute of Computing Technology, Chinese Academy of Science >> Mobile\WeChat: +86—15201613655 >> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> >> QQ: 3239559* >> >> -- *With kind regards ------------------------------------------------------------ Sebastian Liu 刘洋 Institute of Computing Technology, Chinese Academy of Science Mobile\WeChat: +86—15201613655 E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com> QQ: 3239559*