Re:Re: Re: Re: [DISCUSS] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

Xuyang Mon, 09 May 2022 21:14:20 -0700

Hi Cao. 
If i guess right, what you mean is the scope that the join hint takes affect.
In your example 1, the expected result is that the hint will not takes affect 
both on (t1 join t2) and (t3 join t1) if we have the sql following,
"SELECT /*+ SHUFFLE_HASH(test1) */ t1.a FROM (select test1.* from test1 join 
test2 on test1.a = test2.a) t1 join (select test3.* from test3 join test1 on 
test3.a = test1.a) t2 ON t1.a = t2.a"
The hint will throw an exception that "test1" will not exist because the test1 
is in the subquery and not in the top-level query. The only trouble is to avoid 
the propagate the hint into the view / subquery.



In you example 2 following (note that I added the serial number in 
LogicalJoin), the hint will only take affect on LogicalJoin3 (the arg 't4' in 
'shuffle(t2, t4)' will be ignored and will choose 't2' as the build side) and 
LogicalJoin1(the arg 't2' in 'shuffle(t2, t4)' will be ignored and will choose 
't4' as the build side).
> LogicalProject Hint[]<shuffle(t2, t4)>
> +- LogicalJoin1  Hint[0]<shuffle(t2, t4)> 
>    :- LogicalJoin2  Hint[0,0]<shuffle(t2, t4)>
>    :  :- LogicalJoin3 Hint[0,0,0]<shuffle(t2, t4)>
>    :  :  :- LogicalTableScan(table=[[default_catalog, default_database,
> t1]])
>    :  :  +- LogicalTableScan(table=[[default_catalog, default_database,
> t2]])
>    :  +- LogicalTableScan(table=[[default_catalog, default_database, t3]])

>    +- LogicalTableScan(table=[[default_catalog, default_database, t4]])


So i think it seems no need to check the inherit path. Please correct me if I 
am wrong.
For the table alias.
> Since we record the view name to help to verify, could we record the 
> relationship between table name and table alias?
I think you're right and we can also use a internel information to tag alias. I 
open an issue in Calcite https://issues.apache.org/jira/browse/CALCITE-5144 and 
you can follow it if necessary. But for the initial version, we can temporarily 
not support it. And we can support it as a improvement work.



--

    Best！
    Xuyang





At 2022-05-09 18:31:40, "cao zou" <zoucao...@gmail.com> wrote:
>Hi Xuyang, thanks for your explanation.
>
>For more information about inheritPath and the logic of propagating, you
>can see `RelOptUtil`. As the doc says, The hint will be propagated from the
>root node to the children nodes. if we have an AST as follows:
>
>LogicalProject    Hint1[]<shuffle(t1)>
>> +- LogicalJoin    Hint1[0]<shuffle(t1)>
>>    :- LogicalProject
>>    :  +- LogicalJoin  Hint1[0,0,0]<shuffle(t1)>
>
>   :     +- LogicalTableScan(table=[[default_catalog, default_database,
>> t1]])
>>    :     +- LogicalTableScan(table=[[default_catalog, default_database,
>> t2]])
>>    +- LogicalProject  Hint2[]<broadcast(t3)>
>>       +- LogicalJoin  Hint1[0,0,0]<shuffle(t1)>,Hint2[0]<broadcast(t3)>
>
>         :- LogicalTableScan(table=[[default_catalog, default_database,
>> t3]])
>>          +- LogicalTableScan(table=[[default_catalog, default_database,
>> t1]])
>
>
>The child node (t1 join t2) and (t3 join t4) can inherit Hint1 from the
>parent node, such that we can not apply the Hint1 to (t1 join t2) and (t3
>join t1) in this case, the hint1 should only be applied to ((t1 join t2)
>join (t3 join t1)), and hint2 should be applied to (t3 join t1). We can
>choose the join node whose hint inheritPath is [0] to apply.
>We can use inheritPath to help to determine. But when facing the cascade
>join, like:
>
>> select /*+ SHUFFLE_HASH(t2,t4) */ xxx
>> from t1
>> join t2 on xxx
>> join t3 on xxx
>
>join t4 on xxx
>
>Join node with estimated inheritPath:
>
>> LogicalProject Hint[]<shuffle(t2, t4)>
>> +- LogicalJoin  Hint[0]<shuffle(t2, t4)>
>>    :- LogicalJoin  Hint[0,0]<shuffle(t2, t4)>
>>    :  :- LogicalJoin Hint[0,0,0]<shuffle(t2, t4)>
>>    :  :  :- LogicalTableScan(table=[[default_catalog, default_database,
>> t1]])
>>    :  :  +- LogicalTableScan(table=[[default_catalog, default_database,
>> t2]])
>>    :  +- LogicalTableScan(table=[[default_catalog, default_database, t3]])
>>    +- LogicalTableScan(table=[[default_catalog, default_database, t4]])
>
>In this situation, all the join nodes will get the hint which has a
>different inheritPath, we need to rely on the capability of propagation
>because only one hint can be specified at the top join node.
>The join node (t1 join t2) from the above two situations all inherit the
>hint from their parent nodes, but in situation 1, we should not apply it,
>although the join node contains the table t1. In situation 2, we should
>apply the hint to (t1 join t2), although his inheritPath is [0,0,0].
>I think we need to rely on inherited information to aid in judgment but it
>is not easy, please correct me if I have made some mistakes.
>
>
>For the table alias.
>Since we record the view name to help to verify, could we record the
>relationship between table name and table alias?
>
>[1]
>https://github.com/apache/calcite/blob/b9c2099ea92a575084b55a206efc5dd341c0df62/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L4009
>
>Xuyang <xyzhong...@163.com> 于2022年5月9日周一 15:11写道：
>
>> Thanks for your attention, Li. I agree with you and Cao that in theory
>> join hint should support alias just like other popular computing engines
>> and dbs. But the main difficult is that currently, calcite will ignore the
>> alias when converting SqlNode to RelNode, so the information about alias
>> will not be retained. I think it's not a good idea to fix it in Flink
>> because this work will change many codes that belong to Calcite and cause
>> many copied files from Calcite. We can support this feature in the future
>> and temporarily throw an exception just like the behavior that the name of
>> the table or view could not be found.
>>
>>
>>
>>
>> What do you think about this.
>>
>>
>>
>>
>> --
>>
>>     Best！
>>     Xuyang
>>
>>
>>
>>
>>
>> At 2022-05-09 12:50:28, "Jingsong Li" <jingsongl...@gmail.com> wrote:
>> >Thanks Xuyang for driving.
>> >
>> >zoucao also mentioned alias.
>> >
>> >Can you explain in the FLIP why alias is not supported? What are the
>> >difficulties and maybe we can try to overcome them. Or how do we need
>> >to report errors if we don't support it.
>> >
>> >Best,
>> >Jingsong
>> >
>> >On Mon, May 9, 2022 at 10:53 AM Xuyang <xyzhong...@163.com> wrote:
>> >>
>> >> Hi, Jark. Thanks for your review. <br/>&gt;Join Hint is a public API
>> for SQL syntax. It should work for both streaming and batch SQL.<br/>I
>> agree with your opinion. But currently, only in batch the optimizer has
>> different Join strategies for Join and there is no choice of join
>> strategies in the stream. The join hints listed in the current flip should
>> be ignored (maybe can be warned) in streaming mode.  When in the future the
>> stream mode has the choice of join strategies, I think that's a good time
>> to discuss that the join hint can affect the streaming
>> SQL.<br/><br/>&gt;Besides that, could you move your design docs into the
>> wiki?<br/>Thanks for your reminder, I have moved the content from doc to
>> the wiki.
>> >> At 2022-05-07 12:46:18, "Jark Wu" <imj...@gmail.com> wrote:
>> >> >Hi Xuyang,
>> >> >
>> >> >Thanks for starting this discussion. Join Hint is a long-time requested
>> >> >feature.
>> >> >I have briefly gone through the design doc. Join Hint is a public API
>> for
>> >> >SQL syntax.
>> >> >It should work for both streaming and batch SQL. I understand some
>> special
>> >> >hints
>> >> >may only work for batch SQL. Could you demonstrate how the hints affect
>> >> >stream SQL as well?
>> >> >
>> >> >Besides that, could you move your design docs into the wiki?
>> >> >Google docs are usually used for offline discussion.
>> >> >The discussion on google docs is not very visible to the community.
>> >> >So we would like to move designs to the wiki and move discussions to
>> the
>> >> >mailing list.
>> >> >
>> >> >Best,
>> >> >Jark
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >On Fri, 6 May 2022 at 11:07, Xuyang <xyzhong...@163.com> wrote:
>> >> >
>> >> >> Hi, all.
>> >> >>     I want to start a discussion about the FLIP-229: Introduces Join
>> Hint
>> >> >> for Flink SQL Batch Job(The cwiki[1] is not ready completely but you
>> can
>> >> >> see the whole details in docs[2]).
>> >> >> Join Hint is a common solution in many popular computing engines and
>> DBs
>> >> >> to improve the shortcomings of the optimizer by intervening in
>> optimizing
>> >> >> the plan. By Join Hint, users can intervene in the selection of the
>> join
>> >> >> strategy in optimizer, and manually optimize the execution plan to
>> improve
>> >> >> the performance of the query.
>> >> >>     In this FLIP, we propose some join hints by the existing join
>> >> >> strategies in Flink SQL for Batch job.
>> >> >> I'm look forward to your feedback about FLIP-229.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >>     Best！
>> >> >>     Xuyang
>> >> >>
>> >> >>
>> >> >> [1]
>> >> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
>> >> >> [2]
>> >> >>
>> https://docs.google.com/document/d/1IL00ME0Z0nlXGDWTUPODobVQMAm94PAPr9pw9EdGkoQ/edit?usp=sharing
>>

Re:Re: Re: Re: [DISCUSS] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

Reply via email to