[
https://issues.apache.org/jira/browse/CALCITE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953301#comment-17953301
]
Yu Xu commented on CALCITE-7029:
--------------------------------
In my opinion, this is not entirely related to this issue. The algorithm may
well be very good, but there are currently no benchmarks or end-to-end tests
covering all Calcite rule scenarios, so it is difficult to say how much benefit
an algorithm or rule brings to users, and in which scenarios. Some other
engines, such as ClickHouse, at least have slow-query tests. Good ideas may
need a complete verification system before they can be fully validated.
> Support DPhyp to handle various join types
> ------------------------------------------
>
> Key: CALCITE-7029
> URL: https://issues.apache.org/jira/browse/CALCITE-7029
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.39.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
>
> Add the conflict detection algorithm CD-C to DPhyp so that it can handle
> various join types.
> DPhyp is a join reordering algorithm based on dynamic programming. It can
> enumerate all possibilities of the query graph without duplication, and it
> can theoretically handle complex join predicates and various types of joins
> (outer, semi, anti, etc.).
> CALCITE-6846 implemented the basic DPhyp algorithm and defined the hypergraph
> structure and the enumeration process. It can enumerate the query graph
> without duplication and handle complex join predicates, but it is limited to
> inner joins. Handling other join types requires conflict detection.
> This PR implements the CD-C conflict detection algorithm from the paper
> "On the correct and complete enumeration of the core search space". The
> conflict detection algorithm does not change the enumeration process of
> DPhyp. It computes the conflict rules for each join operator while the
> hypergraph is constructed from the plan tree, and it checks each csg-cmp pair
> against those conflict rules when DPhyp enumerates csg-cmp pairs.
>
> The following is an example SQL query and the expected plans:
> SQL
> {code:java}
> select emp.empno from
> emp_address inner join emp on emp_address.empno = emp.empno
> left join dept on emp.deptno = dept.deptno
> inner join dept_nested on dept.deptno = dept_nested.deptno {code}
> initial plan
> {code:java}
> LogicalProject(EMPNO=[$3])
>   LogicalJoin(condition=[=($12, $14)], joinType=[inner])
>     LogicalJoin(condition=[=($10, $12)], joinType=[left])
>       LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> build hypergraph
> {code:java}
> LogicalProject(EMPNO=[$3])
>   HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0),
> vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7),
> vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0),
> vertex(3)_field(0))]——{3}])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> after DPhyp
> {code:java}
> LogicalProject(EMPNO=[$4])
>   LogicalJoin(condition=[=($15, $4)], joinType=[inner])
>     LogicalJoin(condition=[=($13, $0)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
>       LogicalJoin(condition=[=($7, $9)], joinType=[left])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}
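
For readers unfamiliar with conflict rules, the applicability check that the
issue description mentions can be sketched roughly as follows. This is only an
illustration under assumptions, not the code in the PR: the class and method
names (ConflictRuleSketch, Rule, Operator, applicable) are hypothetical, table
sets are plain long bitmaps where bit i stands for vertex i of the example
hypergraph, and the {1, 2}--INNER--{3} hyperedge is assumed to encode the
required left and right table sets of that join. A rule T1 -> T2 is read as:
if the combined csg-cmp set touches T1, it must contain all of T2.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a CD-C style applicability test; not Calcite's API. */
final class ConflictRuleSketch {

  /** Conflict rule T1 -> T2 over table bitmaps (bit i = vertex i). */
  static final class Rule {
    final long t1;
    final long t2;
    Rule(long t1, long t2) {
      this.t1 = t1;
      this.t2 = t2;
    }
  }

  /** A join operator with required left/right table sets and conflict rules. */
  static final class Operator {
    final long leftTes;
    final long rightTes;
    final List<Rule> rules;
    Operator(long leftTes, long rightTes, List<Rule> rules) {
      this.leftTes = leftTes;
      this.rightTes = rightTes;
      this.rules = rules;
    }
  }

  /** Builds a bitmap from vertex indexes. */
  static long bits(int... tables) {
    long b = 0L;
    for (int t : tables) {
      b |= 1L << t;
    }
    return b;
  }

  /** Returns true if every table in a is also in b. */
  static boolean subset(long a, long b) {
    return (a & ~b) == 0;
  }

  /** Checks whether op may join the csg (left input) with the cmp (right input). */
  static boolean applicable(Operator op, long csg, long cmp) {
    if (!subset(op.leftTes, csg) || !subset(op.rightTes, cmp)) {
      return false;
    }
    long all = csg | cmp;
    for (Rule r : op.rules) {
      // The rule fires only if the combined set touches T1.
      if ((r.t1 & all) != 0 && !subset(r.t2, all)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Assumption: the {1, 2}--INNER--{3} edge means left = {EMP, DEPT},
    // right = {DEPT_NESTED}.
    Operator byTes =
        new Operator(bits(1, 2), bits(3), Collections.<Rule>emptyList());
    // The same restriction written as a hypothetical rule: DEPT (2) needs EMP (1).
    Operator byRule = new Operator(bits(2), bits(3),
        Collections.singletonList(new Rule(bits(2), bits(1))));

    System.out.println(applicable(byTes, bits(2), bits(3)));
    System.out.println(applicable(byTes, bits(1, 2), bits(3)));
    System.out.println(applicable(byRule, bits(2), bits(3)));
    System.out.println(applicable(byRule, bits(1, 2), bits(3)));
  }
}
```

In both encodings the pair ({DEPT}, {DEPT_NESTED}) is rejected and the pair
({EMP, DEPT}, {DEPT_NESTED}) is accepted, which matches the "after DPhyp" plan
above, where DEPT_NESTED is joined with the result of EMP left join DEPT rather
than with DEPT alone.
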
--
This message was sent by Atlassian Jira
(v8.20.10#820010)