[ 
https://issues.apache.org/jira/browse/CALCITE-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809082#comment-16809082
 ] 

Jesus Camacho Rodriguez commented on CALCITE-2967:
--------------------------------------------------

Same mechanism as for materialized view rewriting in presence of PK-FK 
relationships is used to implement a similar optimization rule in Hive (cf 
HIVE-17040 and some refactoring in HIVE-20744). The rule contains logic to 
remove and transform inner/left/right joins, which relies on information 
extracted via metadata providers.
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
It was pushed to Hive recently and I did not have the chance to contribute it 
back to Calcite yet, since it needs some refactoring of the materialized view 
rewriting rule at the same time (it basically relies on some internal logic of 
{{AbstractMaterializedViewRule}} that I had to duplicate on Hive side in 
{{HiveRelOptUtil}} and that [~julianhyde] is referring to).

> New rule to remove SemiJoin
> ---------------------------
>
>                 Key: CALCITE-2967
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2967
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Ruben Quesada Lopez
>            Assignee: Ruben Quesada Lopez
>            Priority: Major
>
> (As far as I know, there is no rule to achieve what I am about to describe, 
> if there exists already a way to do it, please let me know).
> In some specific situations, a SemiJoin can be completely removed and 
> replaced by its left child with an appropriate transformed filter.
> Let us say we want to retrieve all employees whose department satisfy a 
> certain condition, i.e.:
> {code}
> SELECT * from Employee e
> WHERE e.deptno IN
>   (SELECT d.deptno FROM Department d
>     WHERE <condition>)
> {code}
> Which would translate to something like:
> {code}
> SemiJoin (e.deptno=d.deptno)
>     Scan (table=Employee as e)
>     Filter(<condition>)
>         Scan (table=Department as d)
> {code}
> In a "normal" scenario, e.g. "all employees from Sales department", the plan 
> could no be simplified:
> {code}
> SemiJoin (e.deptno=d.deptno)
>     Scan (table=Employee as e)
>     Filter(d.name="Sales")
>         Scan (table=Department as d)
> {code}
> But with a specific condition, based on deptno, e.g. "all employees whose 
> deptno is greater than 10":
> {code}
> SemiJoin (e.deptno=d.deptno)
>     Scan (table=Employee as e)
>     Filter(d.deptno>10)
>         Scan (table=Department as d)
> {code}
> The plan could be simplified: the SemiJoin is not actually needed, we can 
> perform the query with a single scan and a converted filter:
> {code}
> Filter(e.deptno>10)
>     Scan (table=Employee as e)
> {code}
> The goal would be to provide a new rule to achieve that (since there is 
> already a SemiJoinRemoveRule, we could name this new rule as e.g. 
> SemiJoinSimplifyRule?).
> I know that, ideally, this rule should not be needed because the plan could 
> be directly written without the SemiJoin, but let's say that we are in a 
> situation were the plan is systematically generated with the same pattern, 
> and there is no way to know in advanced the filter condition that will be 
> used within.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to