[
https://issues.apache.org/jira/browse/CALCITE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17981836#comment-17981836
]
Silun Dong commented on CALCITE-7029:
-------------------------------------
This PR mainly includes the following changes:
# Based on the paper {_}On the correct and complete enumeration of the core
search space{_}, the conflict detection-c algorithm is implemented.
## According to the concepts in the paper, define classes for conflict rule
and total eligibility set, and used as fields of HyperEdge.
## When JoinToHyperGraphRule builds a hypergraph, calculate the conflict rules.
## When Dphyp enumerates a csgcmp pair, verify whether the pair is legal
according to the conflict rules.
# Limit the size of the hypergraph inputs (node bitmap uses long type,
cannot exceed 64 nodes) and the size of the dp table (if the number of
connected subgraphs is too large, the enumeration is terminated early).
# Added RexNodeAndFieldIndex class, temporarily replace RexInputRef during the
dphyp algorithm. (the inputs order is frequently adjusted during enumerating,
it is difficult to maintain the correct RexInputRef).
# Added a logger for Dphyp.
Looking forward to more reviews.
> Support DPhyp to handle various join types
> ------------------------------------------
>
> Key: CALCITE-7029
> URL: https://issues.apache.org/jira/browse/CALCITE-7029
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.39.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
>
> Add conflict detection algorithm CD-C to DPhyp so that it can handle various
> join types.
> DpHyp algorithm is a join reorder algorithm based on dynamic programming. It
> can enumerate all possibilities of the query graph without duplication, and
> it can theoretically handle complex join predicates and various types of
> joins (outer, semi, anti, etc.).
> CALCITE-6846 completed the basic DpHyp algorithm, defined the hypergraph
> structure and enumeration process. It can enumerate the query graph without
> duplication and handle complex join predicates, but is limited to inner join.
> The ability to handle various types of joins requires conflict detection.
> This pr implements the CD-C conflict detection algorithm based on the paper
> {_}On the correct and complete enumeration of the core search space{_}. The
> conflict detection algorithm does not change the enumeration process of
> DpHyp. It calculates the conflict rules for each join operator in the process
> of constructing the hypergraph from the plan tree, and verifies the
> applicability of csg-cmp according to the conflict rules when DpHyp
> enumerates csg-cmp.
>
> The following is an example of sql and the expected plan:
> sql
> {code:java}
> select emp.empno from
> emp_address inner join emp on emp_address.empno = emp.empno
> left join dept on emp.deptno = dept.deptno
> inner join dept_nested on dept.deptno = dept_nested.deptno {code}
> initial plan
> {code:java}
> LogicalProject(EMPNO=[$3])
> LogicalJoin(condition=[=($12, $14)], joinType=[inner])
> LogicalJoin(condition=[=($10, $12)], joinType=[left])
> LogicalJoin(condition=[=($0, $3)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> build hypergraph
> {code:java}
> LogicalProject(EMPNO=[$3])
> HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0),
> vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7),
> vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0),
> vertex(3)_field(0))]——{3}])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> after dphyp
> {code:java}
> LogicalProject(EMPNO=[$4])
> LogicalJoin(condition=[=($15, $4)], joinType=[inner])
> LogicalJoin(condition=[=($13, $0)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> LogicalJoin(condition=[=($7, $9)], joinType=[left])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)