I have tried this approach, but the decorrelator fails to decorrelate many queries. Here is a simple query where after decorrelation there are still LogicalCorrelate nodes left:
SELECT 3 in (SELECT empno FROM emp) Maybe I am missing some optimization steps? >From my reading of the code, the Calcite decorrelator seems to be a heuristic >decorrelator, which only handles a few types of patterns of correlated >subqueries. If your query doesn't fall into one of these patterns, the >decorrelator leaves it unchanged. For example, this query, after converting it to use LogicalCorrelate as you have described, becomes: LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211 LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}]), id = 213 LogicalValues(tuples=[[{ 0 }]]), id = 180 LogicalAggregate(group=[{0}]), id = 207 LogicalProject(cs=[true]), id = 205 LogicalFilter(condition=[=(3, $0)]), id = 203 LogicalProject(EMPNO=[$0]), id = 201 LogicalTableScan(table=[[schema, EMP]]), id = 182 The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot decorrelate this query. Do you have a solution that handles this case? BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.: https://issues.apache.org/jira/browse/CALCITE-5627 Thank you, Mihai ________________________________ From: suibianwanwan33 <suibianwanwa...@foxmail.com> Sent: Tuesday, August 13, 2024 10:31 PM To: dev <dev@calcite.apache.org> Subject: Re:Query planner structure Hi, this is one of the classic optimization cases in my work. First you need to add all CORRELATE optimizations such as FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a decorrelateQuery, and finally use a HepPlanner to add the JOIN_CONDITION_PUSH and the FILTER_INTO_JOIN. my guess is that the Cartesian product of this query will be optimized ------------------ Original ------------------ From: "dev" <mbu...@gmail.com>; Date: Wed, Aug 14, 2024 12:51 PM To: "dev@calcite.apache.org"<dev@calcite.apache.org>; Subject: Query planner structure Hello all, Calcite contains many optimization rules. It is not obvious to me how to assemble them into a good planner. Can people recommend tested query planners based on Calcite that are open-source and that can be used as a starting point? In particular, I am having trouble making a HEP-planner produce a good plan for Q21 from TPC-H; the planner I assembled produces many cartesian products. For reference, here is the variant of Q21 that I am using: create view q21 ( s_name, numwait ) as select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'GERMANY' group by s_name order by numwait desc, s_name LIMIT 100; Thank you for any suggestions! Mihai