> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.

You have a point. However, is it really a concern? How often will it
occur, especially for an operator like Join or Filter that has the same
input rel but a different rex node, $1=$1 vs $2=$2? Most likely the two
operators have different inputs anyway. Even if we consider AND/OR over
many of these sub-expressions, e.g. OR($1=$1, $2=$2, $3=$3, ...), things
might get bad if we want to dedup RexNode children using Set<RexNode>,
but how often will we see this in production? I haven't seen it. Greenplum
Database has been using this strategy for many years, and I haven't seen
any performance issue caused by it.
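To make the collision concrete, here is a small, self-contained Java
illustration (hypothetical names like XorHashDemo and xorEqualsHash, not
Calcite code) of why an XOR-combined operand hash sends every
self-comparison $i=$i to the same value:

    // Why XOR-based operand hashing collides on symmetric expressions:
    // the hash of "$i = $i" degenerates to op ^ h($i) ^ h($i) = op
    // for every i, so ($1=$1) and ($2=$2) always collide.
    public class XorHashDemo {
      /** Stand-in for an input-ref hash such as RexInputRef.hashCode(). */
      static int refHash(int index) {
        return Integer.hashCode(index);
      }

      /** Order-insensitive hash of "left = right" built with XOR. */
      static int xorEqualsHash(int left, int right) {
        return "=".hashCode() ^ refHash(left) ^ refHash(right);
      }

      public static void main(String[] args) {
        System.out.println(xorEqualsHash(1, 1) == xorEqualsHash(2, 2)); // true
        // The feature being bought: $1=$2 and $2=$1 hash alike.
        System.out.println(xorEqualsHash(1, 2) == xorEqualsHash(2, 1)); // true
      }
    }

The collision only matters when such expressions land in the same hash
container, e.g. deduping children of OR via a Set<RexNode>, which is
exactly the (rare) case discussed above.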
> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).

If it is just swapping join order, I doubt normalization really helps. A
Join generated by join reordering, e.g. innerjoin(S, R), is not equivalent
to the original innerjoin(R, S); they are in different RelSets, and with a
different input rel order we never get the chance to compare the two
joins, because the input order, rel type and hashcode all differ.

> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.

I agree. I guess you mean that every time we call equals, it will
normalize again and again. However, it is a tradeoff, and still better
than normalizing in the RexNode constructor. Sometimes we want to specify
the exact operand order, e.g. AND($2 > 10, $1 < 5). If $2 > 10 is much
more selective and can filter out more tuples, it will help query
performance a lot. In that case, I don't want Calcite to reorder it for
me. (A sketch of caching the normalized form so equals does not recompute
it follows the quoted message below.)

On 2020/07/15 18:06:43, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> I agree that extensibility might be helpful, however:
>
> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.
> 2) "$2 > $1 is reordered to $1 < $2, so that predicate a > b and b < a can
> be reduced to a > b."
> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).
> That is why ability to normalize < into > helps like it helps for $1=$2 vs
> $2=$1
>
> > when just computing the hash code?
>
> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.
> Previous code (the one from CALCITE-2450) never computed the normalization
> multiple times per RexNode.
> It looks like now we losing that feature.
>
> Vladimir
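For reference, here is a minimal, hypothetical sketch of the lazy-caching
idea (illustrative names only; ComparisonRex and normalizedDigest are not
Calcite code), where the normalized form is computed at most once per
node, so repeated equals/hashCode calls stay cheap while the constructor
still preserves the operand order the caller wrote:

    // Caches the normalized digest so "$2 > $1" and "$1 < $2" compare
    // equal without re-normalizing on every equals/hashCode call.
    final class ComparisonRex {
      final String op;  // "<" or ">"
      final int left;   // input ref indexes, standing in for operands
      final int right;

      private String digest; // computed lazily, at most once

      ComparisonRex(String op, int left, int right) {
        this.op = op;
        this.left = left;
        this.right = right;
      }

      /** Rewrites e.g. "$2 > $1" into the canonical "$1 < $2" form. */
      private String normalizedDigest() {
        if (digest == null) {
          if (left > right) {
            digest = "$" + right + (op.equals(">") ? "<" : ">") + "$" + left;
          } else {
            digest = "$" + left + op + "$" + right;
          }
        }
        return digest;
      }

      @Override public boolean equals(Object o) {
        return o instanceof ComparisonRex
            && normalizedDigest().equals(((ComparisonRex) o).normalizedDigest());
      }

      @Override public int hashCode() {
        return normalizedDigest().hashCode();
      }
    }

Under this shape, AND($2 > 10, $1 < 5) keeps its written operand order,
and the normalization cost is paid only on the first comparison.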