Re: Support local aggregate push down for Blink batch planner

2021-01-08 Thread Jark Wu
Great! Thanks for pushing this work. Looking forward to the pull requests. Best, Jark On Fri, 8 Jan 2021 at 17:57, Sebastian Liu wrote: > Hi Jark, > > Cool, following your suggestions I have created three related subtasks > under Flink-20791. > Hope to assign these subtasks to me too, when you

Re: Support local aggregate push down for Blink batch planner

2021-01-08 Thread Sebastian Liu
Hi Jark, Cool, following your suggestions I have created three related subtasks under Flink-20791. Hope to assign these subtasks to me too, when you have time. And I will push forward the relevant implementation. Jark Wu 于2021年1月8日周五 下午12:30写道: > Hi Sebastian, > > I assigned the issue to you. B

Re: Support local aggregate push down for Blink batch planner

2021-01-07 Thread Jark Wu
Hi Sebastian, I assigned the issue to you. But I suggest creating sub-tasks under this issue. Because I think this would be a big contribution. For example, you can split it into: 1. Introduce SupportsAggregatePushDown interface 2. Support SupportsAggregatePushDown in planner 3. Support SupportsAg

Re: Support local aggregate push down for Blink batch planner

2021-01-07 Thread Sebastian Liu
Hi Jark, Seems that we have reached the agreement on the proposal. Could you please help to assign the below jira ticket to me? https://issues.apache.org/jira/browse/FLINK-20791 Jark Wu 于2021年1月7日周四 上午10:25写道: > Thanks for updating the design doc. > It looks good to me. > > Best, > Jark > > On

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jark Wu
Thanks for updating the design doc. It looks good to me. Best, Jark On Thu, 7 Jan 2021 at 10:16, Jingsong Li wrote: > Sounds good to me. > > We don't have to worry about future changes, because it has covered all > the capabilities of calcite aggregation. > > Best, > Jingsong > > On Thu, Jan 7,

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jingsong Li
Sounds good to me. We don't have to worry about future changes, because it has covered all the capabilities of calcite aggregation. Best, Jingsong On Thu, Jan 7, 2021 at 12:14 AM Sebastian Liu wrote: > Hi Jark, > > Sounds good to me. For better scalability in the future, we could add the > Agg

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Sebastian Liu
Hi Jark, Sounds good to me. For better scalability in the future, we could add the AggregateExpression. ``` public class AggregateExpression implements ResolvedExpression { private final FunctionDefinition functionDefinition; private final List args; private final @Nullable CallExpres

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jark Wu
Hi Liu, Jingsong, Regarding the agg with filter, I think in theory we can support pushing such a pattern into source. We don't need to support it in the first version, but in the long term, we can support it. The designed interface should be future proof. Considering filter arg and distinct flag

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Sebastian Liu
Hi Jingsong, Jark, Thx so much for our discussion, and the cases mentioned above are really worthy for further discussion. 1. For aggregate with filter expressions: eg: select COUNT(1) FILTER(WHERE cc_call_center_sk > 3) from call_center; For the current Blink Planner, the optimized plan will be:

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
> I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them.Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when upgradi

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them. Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when upgradi

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi, I'm also curious about aggregate with filter (COUNT(1) FILTER(WHERE d > 1)). Can we push it down? I'm not sure that a single call expression can express it, and how we should embody it and convey it to users. Best, Jingsong On Wed, Jan 6, 2021 at 1:36 PM Jingsong Li wrote: > Hi Jark, > > I

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Jark, I don't want to limit this interface to LocalAgg Push down. Actually, sometimes, we can push whole aggregation to source too. So, this rule can do something more advanced. For example, we can push down group sets to source too, for the SQL: "GROUP BY GROUPING SETS (f1, f2)". Then, we nee

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think this may be over designed. We should have confidence in the interface we design, the interface should be stable. Wrapping things in a big context has a cost of losing user convenience. Foremost, we don't see any parameters to add in the future. Do you know any potential parameters? Best, J

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Sebastian, Well, I mean: `boolean applyAggregates(int[] groupingFields, List aggregateExpressions, DataType producedDataType);` VS ``` boolean applyAggregates(Aggregation agg); interface Aggregation { int[] groupingFields(); List aggregateExpressions(); DataType producedDataType(); } ``

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Sebastian Liu
Hi Jinsong, Thx a lot for your suggestion. These points really need to be clear in the proposal. For the semantic problem, I think the main point is the different returned data types for the target aggregate function and the row format returned by the underlying storage. That's why we provide the

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
Thanks for the update. The proposal looks good to me now. Best, Jark On Tue, 5 Jan 2021 at 14:44, Jingsong Li wrote: > Thanks for your proposal! Sebastian. > > +1 for SupportsAggregatePushDown. The above wonderful discussion has > solved many of my concerns. > > ## Semantic problems > > We may

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Jingsong Li
Thanks for your proposal! Sebastian. +1 for SupportsAggregatePushDown. The above wonderful discussion has solved many of my concerns. ## Semantic problems We may need to add some mechanisms or comments, because as far as I know, the semantics of each database is actually different, which may nee

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Hi Jark, Thx for your further feedback and help. The interface of SupportsAggregatePushDown may indeed need some adjustments. For (1) Agree: Yeah, the upstream only need to know if the TableSource can handle all of the aggregates. It's better to just return a boolean type to indicate whether all

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Thanks for the clarification. I have resolved all of the comments and added a conclusion section. Looking forward to the further feedback from our community. If we get consensus on the design doc, I can push the implementation related work. Kurt Young 于2021年1月5日周二 上午11:04写道: > Sorry for the typ

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Jark Wu
I'm also +1 for idea#2. Regarding to the updated interface, Result applyAggregates(List aggregateExpressions, int[] groupSet, DataType aggOutputDataType); final class Result { private final List acceptedAggregates; private final List remainingAggregates; } I have following co

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Kurt Young
Sorry for the typo -_-! I meant idea #2. Best, Kurt On Tue, Jan 5, 2021 at 10:59 AM Sebastian Liu wrote: > Hi Kurt, > > Thx a lot for your feedback. If local aggregation is more like a physical > operator rather than logical > operator, I think your suggestion should be idea #2 which handle a

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Sebastian Liu
Hi Kurt, Thx a lot for your feedback. If local aggregation is more like a physical operator rather than logical operator, I think your suggestion should be idea #2 which handle all in the physical optimization phase? Looking forward for the further discussion. Kurt Young 于2021年1月5日周二 上午9:52写道:

Re: Support local aggregate push down for Blink batch planner

2021-01-04 Thread Kurt Young
Local aggregation is more like a physical operator rather than logical operator. I would suggest going with idea #1. Best, Kurt On Wed, Dec 30, 2020 at 8:31 PM Sebastian Liu wrote: > Hi Jark, Thx a lot for your quick reply and valuable suggestions. > For (1): Agree: Since we are in the period

Re: Support local aggregate push down for Blink batch planner

2020-12-30 Thread Sebastian Liu
Hi Jark, Thx a lot for your quick reply and valuable suggestions. For (1): Agree: Since we are in the period of upgrading the new table source api, we really should consider the new interface for the new optimize rule. If the new rule doesn't use the new api, we'll have to upgrade it sooner or late

Re: Support local aggregate push down for Blink batch planner

2020-12-28 Thread Jark Wu
Hi Sebastian, Thanks for the proposal. I think this is a great improvement for Flink SQL. I went through the design doc and have following thoughts: 1) Flink has deprecated the legacy TableSource in 1.11 and proposed a new set of DynamicTableSource interfaces. Could you update your proposal to u