+1 to moving RexNode normalization out of the constructor.
On second thought, I can't help wondering: do we really need RexNode
normalization?
There are 2 kinds of normalization in the current codebase:
1. Reversing the operator
$2 > $1 is reordered to $1 < $2, so that the predicates a > b and b < a can be
deduplicated into a single predicate, a > b.
In my experience this is very rare unless a human writes it on purpose, so it
may not be worth the optimization. Perhaps other people will find it useful in
their specific scenarios.
2. Simply flipping the two sides
$2 = $1 is normalized to $1 = $2, which can help deduplicate predicates that
differ only in operand order and are generated during optimization, e.g. when
inferring predicates from a Join.
However, it is of very limited use, because:
- It only applies to RexInputRef, not to general expressions.
- It is not extensible. Custom SQL operators can't benefit from it, e.g. the
geospatial intersect operator:
boolean &&( geometry A , geometry B )
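To make the two existing kinds concrete, here is a minimal Java 16+ sketch on a toy expression model. InputRef, Call and Normalizer are hypothetical stand-ins, not Calcite's RexInputRef/RexCall or its real normalization code; the sketch only shows the idea, including the limitation that both rewrites fire only when both operands are input refs.

```java
import java.util.List;

// Hypothetical stand-ins for RexNode / RexInputRef / RexCall (NOT Calcite classes).
interface Node {}

record InputRef(int index) implements Node {
  @Override public String toString() { return "$" + index; }
}

record Call(String op, List<Node> operands) implements Node {
  @Override public String toString() {
    return operands.get(0) + " " + op + " " + operands.get(1);
  }
}

final class Normalizer {
  /** Returns a canonical form of a binary call; anything else is returned unchanged. */
  static Node normalize(Node node) {
    if (!(node instanceof Call call) || call.operands().size() != 2) {
      return node;
    }
    Node left = call.operands().get(0);
    Node right = call.operands().get(1);
    // Limitation: only applies when BOTH operands are input refs, and only when
    // they are out of order ($higher op $lower).
    if (!(left instanceof InputRef l) || !(right instanceof InputRef r)
        || l.index() <= r.index()) {
      return node;
    }
    switch (call.op()) {
      case ">":  return new Call("<", List.of(right, left)); // kind 1: reverse the operator
      case "<":  return new Call(">", List.of(right, left)); // kind 1: reverse the operator
      case "=":
      case "<>": return new Call(call.op(), List.of(right, left)); // kind 2: flip the sides
      default:   return node;
    }
  }

  public static void main(String[] args) {
    System.out.println(normalize(new Call(">", List.of(new InputRef(2), new InputRef(1))))); // $1 < $2
    System.out.println(normalize(new Call("=", List.of(new InputRef(2), new InputRef(1))))); // $1 = $2
  }
}
```

Because Call is a record, the normalized $2 > $1 and the original $1 < $2 compare equal, which is what allows the dedup.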
So here is another approach I have in mind (for case 2 only):
Add a default, overridable method to SqlOperator:
  boolean inputOrderSensitive() {
    return true;
  }
SQL operators such as EQUALS, NOT_EQUALS, AND and OR should return false to
indicate that they are not sensitive to input order.
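A minimal Java sketch of how such a hook could look, assuming a heavily simplified operator class. SimpleSqlOperator, SymmetricSqlOperator and Operators are hypothetical names for illustration only, not Calcite's SqlOperator hierarchy.

```java
// Hypothetical, simplified stand-in for SqlOperator; only the proposed hook is modeled.
class SimpleSqlOperator {
  private final String name;

  SimpleSqlOperator(String name) { this.name = name; }

  String getName() { return name; }

  /** Proposed hook: by default, operand order matters. */
  boolean inputOrderSensitive() {
    return true;
  }
}

/** Operators whose result does not depend on operand order opt out by overriding. */
class SymmetricSqlOperator extends SimpleSqlOperator {
  SymmetricSqlOperator(String name) { super(name); }

  @Override boolean inputOrderSensitive() {
    return false;
  }
}

final class Operators {
  static final SimpleSqlOperator EQUALS = new SymmetricSqlOperator("=");
  static final SimpleSqlOperator NOT_EQUALS = new SymmetricSqlOperator("<>");
  static final SimpleSqlOperator AND = new SymmetricSqlOperator("AND");
  static final SimpleSqlOperator OR = new SymmetricSqlOperator("OR");
  // A custom operator, e.g. the geospatial "&&", could opt out the same way.
  static final SimpleSqlOperator GEO_INTERSECT = new SymmetricSqlOperator("&&");
  static final SimpleSqlOperator GREATER_THAN = new SimpleSqlOperator(">");

  public static void main(String[] args) {
    SimpleSqlOperator[] ops = {EQUALS, NOT_EQUALS, AND, OR, GEO_INTERSECT, GREATER_THAN};
    for (SimpleSqlOperator op : ops) {
      System.out.println(op.getName() + " inputOrderSensitive=" + op.inputOrderSensitive());
    }
  }
}
```

Unlike case 2 today, this is extensible: any custom operator decides for itself, and nothing depends on the operands being RexInputRefs.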
When computing the hash code of a RexCall whose SQL operator is input-order
insensitive, we can combine the operands' hash codes with XOR:
  op0.hashCode() ^ op1.hashCode() ^ ...
XOR is usually a poor choice for combining hash codes, but this scenario suits
it very well: XOR is commutative, so whatever the operand order, it produces the
same hash code, which is exactly what we want.
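A minimal Java sketch of the hashing idea, assuming a hypothetical callHash helper that takes the operator name, the proposed inputOrderSensitive flag and the operands (strings stand in for RexNodes). It is an illustration, not Calcite's RexCall.hashCode().

```java
import java.util.List;
import java.util.Objects;

final class OrderInsensitiveHash {
  /** Hypothetical helper: hash of a call, order-independent when the operator allows it. */
  static int callHash(String op, boolean inputOrderSensitive, List<?> operands) {
    int h = op.hashCode();
    if (inputOrderSensitive) {
      // Order-dependent combination (List.hashCode() uses 31 * h + e).
      return 31 * h + operands.hashCode();
    }
    int x = 0;
    for (Object operand : operands) {
      x ^= Objects.hashCode(operand); // XOR is commutative and associative
    }
    return 31 * h + x;
  }

  public static void main(String[] args) {
    int a = callHash("=", false, List.of("$1", "$2"));
    int b = callHash("=", false, List.of("$2", "$1"));
    System.out.println(a == b); // true: same hash regardless of operand order
    int c = callHash("-", true, List.of("$1", "$2"));
    int d = callHash("-", true, List.of("$2", "$1"));
    System.out.println(c == d); // false: order matters for "-"
  }
}
```

No operand is reordered or rewritten here; the order independence comes entirely from the hash combination.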
This way, we can avoid eager normalization when computing the hash code; after
all, there may not be any equivalent RexNodes with different input orders at all.
If the hash codes are the same, we then need to call equals(). We don't need to
sort the operands, which may be hard anyway in some cases, e.g. when the 2
operands of EQUALS are not RexInputRefs. We just check that the 2 RexNodes have
the same number of distinct operands and that each operand occurs the same
number of times. And if we are concerned about performance when an operator
like OR has thousands of operands, we can limit this treatment to calls with
2, 3, 5 or whatever number of operands.
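A minimal Java sketch of the equals() check described above: multiset comparison by counting occurrences, with no sorting, plus a cap on the operand count. The operandsEqual helper and the cap value of 5 are hypothetical choices for illustration; above the cap the sketch falls back to positional comparison, which stays consistent with the XOR hash (positionally equal lists always XOR to the same value).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderInsensitiveEquals {
  /** Hypothetical cap; above it we fall back to positional comparison. */
  static final int MAX_UNORDERED_OPERANDS = 5;

  static boolean operandsEqual(boolean inputOrderSensitive, List<?> a, List<?> b) {
    if (a.size() != b.size()) {
      return false;
    }
    if (inputOrderSensitive || a.size() > MAX_UNORDERED_OPERANDS) {
      return a.equals(b); // positional comparison
    }
    // Count how often each operand occurs in a; this needs only equals()/hashCode(),
    // so it works for arbitrary expressions, not just input refs.
    Map<Object, Integer> counts = new HashMap<>();
    for (Object o : a) {
      counts.merge(o, 1, Integer::sum);
    }
    // Consume those counts with b's operands.
    for (Object o : b) {
      Integer c = counts.get(o);
      if (c == null) {
        return false; // o occurs more often in b than in a
      }
      if (c == 1) {
        counts.remove(o);
      } else {
        counts.put(o, c - 1);
      }
    }
    return counts.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(operandsEqual(false, List.of("$1", "$2", "$1"), List.of("$1", "$1", "$2"))); // true
    System.out.println(operandsEqual(false, List.of("$1", "$2", "$2"), List.of("$1", "$1", "$2"))); // false
    System.out.println(operandsEqual(true, List.of("$1", "$2"), List.of("$2", "$1")));             // false
  }
}
```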
Thanks,
Haisheng Yuan
On 2020/07/13 07:58:19, Danny Chan <[email protected]> wrote:
> Yes, it is. We can keep it as a built-in promotion.
>
> Best,
> Danny Chan
> On Jul 13, 2020, 3:48 PM +0800, Vladimir Sitnikov <[email protected]> wrote:
> > > Hi all, I'm planning to disable RexNode normalization by default in
> > > CALCITE-4073. If you have any objections, please let me know within 24
> > > hours. Thanks so much ~
> >
> > I assume it would still normalize RexNodes when building the plan digest. Is
> > that the case?
> >
> > Vladimir
>