Re: Re: Re: [DISCUSS] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

cao zou Mon, 09 May 2022 03:31:59 -0700

Hi Xuyang, thanks for your explanation.

For more information about inheritPath and the logic of propagating, you
can see `RelOptUtil`. As the doc says, The hint will be propagated from the
root node to the children nodes. if we have an AST as follows:


LogicalProject    Hint1[]<shuffle(t1)>
> +- LogicalJoin    Hint1[0]<shuffle(t1)>
>    :- LogicalProject
>    :  +- LogicalJoin  Hint1[0,0,0]<shuffle(t1)>

   :     +- LogicalTableScan(table=[[default_catalog, default_database,
> t1]])
>    :     +- LogicalTableScan(table=[[default_catalog, default_database,
> t2]])
>    +- LogicalProject  Hint2[]<broadcast(t3)>
>       +- LogicalJoin  Hint1[0,0,0]<shuffle(t1)>,Hint2[0]<broadcast(t3)>

         :- LogicalTableScan(table=[[default_catalog, default_database,
> t3]])
>          +- LogicalTableScan(table=[[default_catalog, default_database,
> t1]])


The child node (t1 join t2) and (t3 join t4) can inherit Hint1 from the
parent node, such that we can not apply the Hint1 to (t1 join t2) and (t3
join t1) in this case, the hint1 should only be applied to ((t1 join t2)
join (t3 join t1)), and hint2 should be applied to (t3 join t1). We can
choose the join node whose hint inheritPath is [0] to apply.
We can use inheritPath to help to determine. But when facing the cascade
join, like:

> select /*+ SHUFFLE_HASH(t2,t4) */ xxx
> from t1
> join t2 on xxx
> join t3 on xxx

join t4 on xxx

Join node with estimated inheritPath:

> LogicalProject Hint[]<shuffle(t2, t4)>
> +- LogicalJoin  Hint[0]<shuffle(t2, t4)>
>    :- LogicalJoin  Hint[0,0]<shuffle(t2, t4)>
>    :  :- LogicalJoin Hint[0,0,0]<shuffle(t2, t4)>
>    :  :  :- LogicalTableScan(table=[[default_catalog, default_database,
> t1]])
>    :  :  +- LogicalTableScan(table=[[default_catalog, default_database,
> t2]])
>    :  +- LogicalTableScan(table=[[default_catalog, default_database, t3]])
>    +- LogicalTableScan(table=[[default_catalog, default_database, t4]])

In this situation, all the join nodes will get the hint which has a
different inheritPath, we need to rely on the capability of propagation
because only one hint can be specified at the top join node.
The join node (t1 join t2) from the above two situations all inherit the
hint from their parent nodes, but in situation 1, we should not apply it,
although the join node contains the table t1. In situation 2, we should
apply the hint to (t1 join t2), although his inheritPath is [0,0,0].
I think we need to rely on inherited information to aid in judgment but it
is not easy, please correct me if I have made some mistakes.


For the table alias.
Since we record the view name to help to verify, could we record the
relationship between table name and table alias?

[1]
https://github.com/apache/calcite/blob/b9c2099ea92a575084b55a206efc5dd341c0df62/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L4009

Xuyang <[email protected]> 于2022年5月9日周一 15:11写道：

> Thanks for your attention, Li. I agree with you and Cao that in theory
> join hint should support alias just like other popular computing engines
> and dbs. But the main difficult is that currently, calcite will ignore the
> alias when converting SqlNode to RelNode, so the information about alias
> will not be retained. I think it's not a good idea to fix it in Flink
> because this work will change many codes that belong to Calcite and cause
> many copied files from Calcite. We can support this feature in the future
> and temporarily throw an exception just like the behavior that the name of
> the table or view could not be found.
>
>
>
>
> What do you think about this.
>
>
>
>
> --
>
>     Best！
>     Xuyang
>
>
>
>
>
> At 2022-05-09 12:50:28, "Jingsong Li" <[email protected]> wrote:
> >Thanks Xuyang for driving.
> >
> >zoucao also mentioned alias.
> >
> >Can you explain in the FLIP why alias is not supported? What are the
> >difficulties and maybe we can try to overcome them. Or how do we need
> >to report errors if we don't support it.
> >
> >Best,
> >Jingsong
> >
> >On Mon, May 9, 2022 at 10:53 AM Xuyang <[email protected]> wrote:
> >>
> >> Hi, Jark. Thanks for your review. <br/>&gt;Join Hint is a public API
> for SQL syntax. It should work for both streaming and batch SQL.<br/>I
> agree with your opinion. But currently, only in batch the optimizer has
> different Join strategies for Join and there is no choice of join
> strategies in the stream. The join hints listed in the current flip should
> be ignored (maybe can be warned) in streaming mode.  When in the future the
> stream mode has the choice of join strategies, I think that's a good time
> to discuss that the join hint can affect the streaming
> SQL.<br/><br/>&gt;Besides that, could you move your design docs into the
> wiki?<br/>Thanks for your reminder, I have moved the content from doc to
> the wiki.
> >> At 2022-05-07 12:46:18, "Jark Wu" <[email protected]> wrote:
> >> >Hi Xuyang,
> >> >
> >> >Thanks for starting this discussion. Join Hint is a long-time requested
> >> >feature.
> >> >I have briefly gone through the design doc. Join Hint is a public API
> for
> >> >SQL syntax.
> >> >It should work for both streaming and batch SQL. I understand some
> special
> >> >hints
> >> >may only work for batch SQL. Could you demonstrate how the hints affect
> >> >stream SQL as well?
> >> >
> >> >Besides that, could you move your design docs into the wiki?
> >> >Google docs are usually used for offline discussion.
> >> >The discussion on google docs is not very visible to the community.
> >> >So we would like to move designs to the wiki and move discussions to
> the
> >> >mailing list.
> >> >
> >> >Best,
> >> >Jark
> >> >
> >> >
> >> >
> >> >
> >> >On Fri, 6 May 2022 at 11:07, Xuyang <[email protected]> wrote:
> >> >
> >> >> Hi, all.
> >> >>     I want to start a discussion about the FLIP-229: Introduces Join
> Hint
> >> >> for Flink SQL Batch Job(The cwiki[1] is not ready completely but you
> can
> >> >> see the whole details in docs[2]).
> >> >> Join Hint is a common solution in many popular computing engines and
> DBs
> >> >> to improve the shortcomings of the optimizer by intervening in
> optimizing
> >> >> the plan. By Join Hint, users can intervene in the selection of the
> join
> >> >> strategy in optimizer, and manually optimize the execution plan to
> improve
> >> >> the performance of the query.
> >> >>     In this FLIP, we propose some join hints by the existing join
> >> >> strategies in Flink SQL for Batch job.
> >> >> I'm look forward to your feedback about FLIP-229.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >>     Best！
> >> >>     Xuyang
> >> >>
> >> >>
> >> >> [1]
> >> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> >> >> [2]
> >> >>
> https://docs.google.com/document/d/1IL00ME0Z0nlXGDWTUPODobVQMAm94PAPr9pw9EdGkoQ/edit?usp=sharing
>

Re: Re: Re: [DISCUSS] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

Reply via email to