[ https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933873#comment-17933873 ]
Stamatis Zampetakis commented on CALCITE-6846: ---------------------------------------------- For the record, my [previous question above|https://issues.apache.org/jira/browse/CALCITE-6846?focusedCommentId=17932252&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17932252] was not performance oriented. Theoretically since DPhyp algorithm explores all possible join orders there should be a test that shows that DPhyp picks the best join order while the greedy LOpt algorithm fails to do so. The performance comparison of the two algorithms would be useful so that we could derive an automatic way of picking between the two (if not is always better than the other) based on the number of joins or other criteria. All of the above, are improvements that could be done in follow-up work if some contributor has time and interest. > Support basic DPhyp join reorder algorithm > ------------------------------------------ > > Key: CALCITE-6846 > URL: https://issues.apache.org/jira/browse/CALCITE-6846 > Project: Calcite > Issue Type: New Feature > Components: core > Affects Versions: 1.38.0 > Reporter: Silun Dong > Assignee: Silun Dong > Priority: Major > Labels: pull-request-available > Fix For: 1.39.0 > > > Supports the basic dphyp join reorder algorithm. > For example : > {code:java} > SELECT > i_item_id > FROM store_sales, customer_demographics, date_dim, item, promotion > WHERE ss_sold_date_sk = d_date_sk AND > ss_item_sk = i_item_sk AND > ss_cdemo_sk = cd_demo_sk AND > ss_promo_sk = p_promo_sk {code} > The plan tree after pushing down filter : > {code:java} > LogicalProject(i_item_id=[$61]) > LogicalJoin(condition=[=($7, $82)], joinType=[inner]) > LogicalJoin(condition=[=($1, $60)], joinType=[inner]) > LogicalJoin(condition=[=($22, $32)], joinType=[inner]) > LogicalJoin(condition=[=($3, $23)], joinType=[inner]) > LogicalTableScan(table=[[tpcds, store_sales]]) > LogicalTableScan(table=[[tpcds, customer_demographics]]) > LogicalTableScan(table=[[tpcds, date_dim]]) > LogicalTableScan(table=[[tpcds, item]]) > LogicalTableScan(table=[[tpcds, promotion]]){code} > Convert Joins into one HyperGraph : > {code:java} > LogicalProject(i_item_id=[$61]) > > HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}]) > LogicalTableScan(table=[[tpcds, store_sales]]) > LogicalTableScan(table=[[tpcds, customer_demographics]]) > LogicalTableScan(table=[[tpcds, date_dim]]) > LogicalTableScan(table=[[tpcds, item]]) > LogicalTableScan(table=[[tpcds, promotion]]) {code} > After dphyp join reorder (with trimming fields and pushing down Project), the > plan is : > {code:java} > LogicalProject(i_item_id=[$1]) > LogicalJoin(condition=[=($0, $2)], joinType=[inner]) > LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2]) > LogicalJoin(condition=[=($1, $3)], joinType=[inner]) > LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4]) > LogicalJoin(condition=[=($0, $3)], joinType=[inner]) > LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], > ss_sold_date_sk=[$3]) > LogicalJoin(condition=[=($2, $4)], joinType=[inner]) > LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], > ss_promo_sk=[$7], ss_sold_date_sk=[$22]) > LogicalTableScan(table=[[tpcds, store_sales]]) > LogicalProject(p_promo_sk=[$0]) > LogicalTableScan(table=[[tpcds, promotion]]) > LogicalProject(i_item_sk=[$0], i_item_id=[$1]) > LogicalTableScan(table=[[tpcds, item]]) > LogicalProject(d_date_sk=[$0]) > LogicalTableScan(table=[[tpcds, date_dim]]) > LogicalProject(cd_demo_sk=[$0]) > LogicalTableScan(table=[[tpcds, customer_demographics]]) {code} > The main enumeration process of dphyp will be implemented in pr. However, it > only can process inner join for now and the simplification of hypergraph has > not yet been implemented. -- This message was sent by Atlassian Jira (v8.20.10#820010)