+1 to moving RexNode normalization out of the constructor.

On second thought, I can't help wondering: do we really need RexNode 
normalization?

There are 2 kinds of normalization in the current codebase:
1. Reversing the operator
$2 > $1 is reordered to $1 < $2, so that the predicates a > b and b < a can 
be reduced to a > b.
In my experience this is very rare unless a human writes it on purpose, so it 
may not be worth the optimization. Perhaps other people find it useful in 
their specific scenarios.

2. Simply flipping the 2 sides
$2 = $1 is normalized to $1 = $2, which helps dedup predicates with 
different operand orders that are generated during optimization, e.g. when 
inferring predicates from a Join.

However, it is very limited, because:
- It only applies to RexInputRef, not to general expressions.
- It is not extensible. Custom SQL operators can't benefit from it, e.g. the 
geospatial intersection operator:
  boolean &&( geometry A , geometry B )

So here is another approach I have in mind (only for case 2):

Add a default overridable method to SqlOperator:
boolean inputOrderSensitive() {
  return true;
}

SQL operators like EQUALS, NOT_EQUALS, AND, OR should return false to 
indicate that they are not input order sensitive.

When computing the hash code of a RexCall with an input-order-insensitive 
SQL operator, we can use XOR:
op0.hashCode() ^ op1.hashCode() ...

Usually XOR is a poor choice for combining hash codes, but here it fits very 
well, because the XOR produces the same hash code no matter what order the 
operands are in, which is exactly what we want.
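
To make the idea concrete, here is a minimal standalone sketch. It does not 
use the Calcite classes: OrderInsensitiveHash and its hash() method are 
hypothetical stand-ins, with the operator passed as a plain string and the 
orderSensitive flag playing the role of the proposed inputOrderSensitive().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class OrderInsensitiveHash {
  /**
   * Hypothetical helper: hashes an operator name together with its operands.
   * When orderSensitive is false, operand hashes are combined with XOR so
   * that any permutation of the operands yields the same value.
   */
  static int hash(String op, boolean orderSensitive, List<?> operands) {
    int h;
    if (orderSensitive) {
      h = operands.hashCode(); // List.hashCode() depends on element order
    } else {
      h = 0;
      for (Object operand : operands) {
        h ^= Objects.hashCode(operand); // XOR is commutative and associative
      }
    }
    return 31 * op.hashCode() + h;
  }

  public static void main(String[] args) {
    int eq12 = hash("=", false, Arrays.asList("$1", "$2"));
    int eq21 = hash("=", false, Arrays.asList("$2", "$1"));
    System.out.println(eq12 == eq21); // prints true

    int gt12 = hash(">", true, Arrays.asList("$1", "$2"));
    int gt21 = hash(">", true, Arrays.asList("$2", "$1"));
    System.out.println(gt12 == gt21); // prints false
  }
}
```

One known weakness of XOR is that duplicate operands cancel out (x ^ x == 0), 
so some different operand lists hash to the same value. That only costs an 
extra equals() call; it does not affect correctness.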

This way, we avoid early normalization when computing the hash code, which is 
wasted work when there are no equivalent RexNodes with a different input 
order at all. Only if the hash codes are the same do we need to check the 
equals() method. We don't need to sort the operands, and in some cases, e.g. 
when the 2 operands of EQUALS are not RexInputRef, they may be hard to sort. 
We just check that the 2 RexNodes have the same distinct operands and that 
each operand occurs the same number of times. And we can limit this to calls 
with at most 2, 3, 5... whatever number of operands, if we are concerned 
about performance when a SQL operator like OR has thousands of operands.
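
A rough sketch of that equals() check, again outside Calcite: the class 
UnorderedOperands, the method sameOperands() and the cap 
MAX_UNORDERED_OPERANDS are all hypothetical names, and the value 5 is only 
an example. It counts occurrences instead of sorting, and falls back to 
plain ordered comparison above the cap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UnorderedOperands {
  /** Hypothetical cap; longer operand lists are only compared in order. */
  static final int MAX_UNORDERED_OPERANDS = 5;

  /** Returns true if both lists hold the same operands, ignoring order. */
  static boolean sameOperands(List<?> left, List<?> right) {
    if (left.size() != right.size()) {
      return false;
    }
    if (left.equals(right)) {
      return true; // same order: cheap path, no counting needed
    }
    if (left.size() > MAX_UNORDERED_OPERANDS) {
      return false; // too many operands: treat them as order sensitive
    }
    // Count how often each operand occurs on the left ...
    Map<Object, Integer> counts = new HashMap<>();
    for (Object operand : left) {
      counts.merge(operand, 1, Integer::sum);
    }
    // ... and consume those counts with the operands on the right.
    for (Object operand : right) {
      Integer count = counts.get(operand);
      if (count == null) {
        return false; // operand missing or occurs more often on the right
      }
      if (count == 1) {
        counts.remove(operand);
      } else {
        counts.put(operand, count - 1);
      }
    }
    return counts.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(
        sameOperands(Arrays.asList("$1", "$2"), Arrays.asList("$2", "$1")));
    System.out.println(
        sameOperands(Arrays.asList("a", "a", "b"), Arrays.asList("a", "b", "b")));
  }
}
```

Falling back to ordered comparison above the cap stays consistent with the 
XOR hash: equal calls still get equal hash codes, only the number of 
collisions goes up.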


Thanks,
Haisheng Yuan

On 2020/07/13 07:58:19, Danny Chan <yuzhao....@gmail.com> wrote: 
> Yes, it is. We can keep it as a builtin promotion.
> 
> Best,
> Danny Chan
> On July 13, 2020 at 3:48 PM +0800, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> > > Hi, all, I’m planning to default disable the RexNode normalization in
> > CALCITE-4073, if you have any objections, please let me know in 24 hours,
> > thanks so much ~
> >
> > I assume it would still normalize RexNodes when building plan digest. Is it
> > the case?
> >
> > Vladimir
> 
