By the way, LoptOptimizeMultiJoinRule was written by Zelaine Fong, who implemented a similar rule in Postgres (when it was a university research project). Read her thesis, " The design and implementation of the POSTGRES query optimizer”: https://dsf.berkeley.edu/papers/UCB-MS-zfong.pdf
> On Mar 14, 2025, at 10:43 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote: > > Since these are inner joins, you could convert > > Join(i0, i1, c0 and c1 and c2) > > into > > Filter(c2) > Filter(c1) > Filter(c0) > CrossJoin(i0, i1) > > without changing semantics. > > I believe that’s what LoptOptimizeMultiJoinRule does - it shreds the joins > into a join graph and reassembles them using a greedy algorithm. > > >> On Mar 14, 2025, at 8:29 AM, Steven Phillips <ste...@dremio.com.INVALID> >> wrote: >> >> You might try looking into the MultiJoinOptimizeBushy rule. I think it also >> is not doing exactly what you want, but it might be a good starting point >> for a rule that does. >> >> >> On Fri, Mar 14, 2025 at 1:35 AM Mads Sejer Pedersen <s...@itu.dk.invalid> >> wrote: >> >>> Hi people, >>> >>> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang >>> that requires typically multiconditional joins to be split into "binary" >>> joins ala: >>> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))], >>> joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375 >>> LogicalJoin(condition=[=($0, $11)], joinType=[inner]): >>> rowcount = 351.5625, cumulative cost = 820.3125 >>> LogicalJoin(condition=[=($0, $3)], joinType=[inner]): >>> rowcount = 93.75, cumulative cost = 343.75 >>> LogicalFilter(condition=[SEARCH($1, >>> Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount >>> = 25.0, cumulative cost = 125.0 >>> LogicalTableScan(table=[[postgres, site]]): rowcount >>> = 100.0, cumulative cost = 100.0 >>> LogicalFilter(condition=[SEARCH($6, >>> Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0 >>> LogicalTableScan(table=[[postgres, so_user]]): >>> rowcount = 100.0, cumulative cost = 100.0 >>> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]): >>> rowcount = 25.0, cumulative cost = 125.0 >>> LogicalTableScan(table=[[postgres, question]]): >>> rowcount = 100.0, cumulative cost = 100.0 >>> LogicalTableScan(table=[[postgres, answer]]): rowcount = >>> 100.0, cumulative cost = 100.0 >>> >>> >>> BinaryJoin(condition=[=($60, $2)], joinType=[inner]) >>> BinaryJoin(condition=[=($10, $41)], joinType=[inner]) >>> BinaryJoin(condition=[=($0, $27)], joinType=[inner]) >>> LogicalJoin(condition=[=($0, $11)], joinType=[inner]) >>> LogicalJoin(condition=[=($0, $3)], joinType=[inner]) >>> LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11), >>> 'gaming':CHAR(11), 'mathematica']:CHAR(11))]) >>> LogicalTableScan(table=[[postgres, site]]) >>> LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])]) >>> LogicalTableScan(table=[[postgres, so_user]]) >>> LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]) >>> LogicalTableScan(table=[[postgres, question]]) >>> LogicalTableScan(table=[[postgres, answer]]) >>> LogicalTableScan(table=[[postgres, answer]]) >>> LogicalTableScan(table=[[postgres, answer]]) >>> >>> Is this something that is already supported in Calcite? I have looked at >>> current Calcite rules; JoinToMultiJoinRule, LoptOptimizeJoinRule, but they >>> don't quite fit my use case. >>> Furthermore, if it is not supported, how would one go about implementing >>> such a split? I have looked at a rules-based implementation using the >>> hep-planner. But I am having issues with how to translate the RexInputRef's >>> indexes to the "right" place, as I need the indexes to always point to the >>> new joining table rows. >>> >>> >