Re: Splitting multiconditional joins into binary joins?

Julian Hyde Fri, 14 Mar 2025 11:09:00 -0700

By the way, LoptOptimizeMultiJoinRule was written by Zelaine Fong, who 
implemented a similar rule in Postgres (when it was a university research 
project). Read her thesis, " The design and implementation of the POSTGRES 
query optimizer”: https://dsf.berkeley.edu/papers/UCB-MS-zfong.pdf



> On Mar 14, 2025, at 10:43 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> 
> Since these are inner joins, you could convert
> 
>  Join(i0, i1, c0 and c1 and c2)
> 
> into
> 
>  Filter(c2)
>    Filter(c1)
>      Filter(c0)
>        CrossJoin(i0, i1)
> 
> without changing semantics.
> 
> I believe that’s what LoptOptimizeMultiJoinRule does - it shreds the joins 
> into a join graph and reassembles them using a greedy algorithm.
> 
> 
>> On Mar 14, 2025, at 8:29 AM, Steven Phillips <ste...@dremio.com.INVALID> 
>> wrote:
>> 
>> You might try looking into the MultiJoinOptimizeBushy rule. I think it also
>> is not doing exactly what you want, but it might be a good starting point
>> for a rule that does.
>> 
>> 
>> On Fri, Mar 14, 2025 at 1:35 AM Mads Sejer Pedersen <s...@itu.dk.invalid>
>> wrote:
>> 
>>> Hi people,
>>> 
>>> I am doing some benchmarking with Calcite for the sql-api in Apache Wayang
>>> that requires typically multiconditional joins to be split into "binary"
>>> joins ala:
>>> LogicalJoin(condition=[AND(=($0, $27), =($10, $28), =($34, $2))],
>>> joinType=[inner]): rowcount = 118.65234375, cumulative cost = 1038.96484375
>>>               LogicalJoin(condition=[=($0, $11)], joinType=[inner]):
>>> rowcount = 351.5625, cumulative cost = 820.3125
>>>                 LogicalJoin(condition=[=($0, $3)], joinType=[inner]):
>>> rowcount = 93.75, cumulative cost = 343.75
>>>                   LogicalFilter(condition=[SEARCH($1,
>>> Sarg['cs':CHAR(11), 'gaming':CHAR(11), 'mathematica']:CHAR(11))]): rowcount
>>> = 25.0, cumulative cost = 125.0
>>>                     LogicalTableScan(table=[[postgres, site]]): rowcount
>>> = 100.0, cumulative cost = 100.0
>>>                   LogicalFilter(condition=[SEARCH($6,
>>> Sarg[[10..100000]])]): rowcount = 25.0, cumulative cost = 125.0
>>>                     LogicalTableScan(table=[[postgres, so_user]]):
>>> rowcount = 100.0, cumulative cost = 100.0
>>>                 LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])]):
>>> rowcount = 25.0, cumulative cost = 125.0
>>>                   LogicalTableScan(table=[[postgres, question]]):
>>> rowcount = 100.0, cumulative cost = 100.0
>>>               LogicalTableScan(table=[[postgres, answer]]): rowcount =
>>> 100.0, cumulative cost = 100.0
>>> 
>>> 
>>> BinaryJoin(condition=[=($60, $2)], joinType=[inner])
>>> BinaryJoin(condition=[=($10, $41)], joinType=[inner])
>>>   BinaryJoin(condition=[=($0, $27)], joinType=[inner])
>>>     LogicalJoin(condition=[=($0, $11)], joinType=[inner])
>>>       LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>>>         LogicalFilter(condition=[SEARCH($1, Sarg['cs':CHAR(11),
>>> 'gaming':CHAR(11), 'mathematica']:CHAR(11))])
>>>           LogicalTableScan(table=[[postgres, site]])
>>>         LogicalFilter(condition=[SEARCH($6, Sarg[[10..100000]])])
>>>           LogicalTableScan(table=[[postgres, so_user]])
>>>       LogicalFilter(condition=[SEARCH($6, Sarg[[0..100]])])
>>>         LogicalTableScan(table=[[postgres, question]])
>>>     LogicalTableScan(table=[[postgres, answer]])
>>>   LogicalTableScan(table=[[postgres, answer]])
>>> LogicalTableScan(table=[[postgres, answer]])
>>> 
>>> Is this something that is already supported in Calcite? I have looked at
>>> current Calcite rules; JoinToMultiJoinRule, LoptOptimizeJoinRule, but they
>>> don't quite fit my use case.
>>> Furthermore, if it is not supported, how would one go about implementing
>>> such a split? I have looked at a rules-based implementation using the
>>> hep-planner. But I am having issues with how to translate the RexInputRef's
>>> indexes to the "right" place, as I need the indexes to always point to the
>>> new joining table rows.
>>> 
>>> 
>

Re: Splitting multiconditional joins into binary joins?

Reply via email to