+1 to moving RexNode normalization out of the constructor. On second thought, I can't help wondering: do we really need RexNode normalization?
There are 2 kinds of normalization in the current codebase:

1. Reversing the operator
   $2 > $1 is reordered to $1 < $2, so that the predicates a > b and b < a can be reduced to a > b. In my experience this is very rare unless written by a human on purpose, so it may not be worth the optimization. Perhaps other people will find it useful in their specific scenarios.

2. Simply flipping the 2 sides
   $2 = $1 is normalized to $1 = $2, which helps dedup predicates with different operand orders that are generated during optimization, e.g. when inferring predicates from a Join. However, it is very limited, because:
   - It only applies to RexInputRef, not to general expressions.
   - It is not extensible. Customized SQL operators can't benefit from it, e.g. the geospatial intersect operator:
     boolean &&( geometry A , geometry B )

So here is another approach I have in mind (only for case 2):

Add a default overridable method to SqlOperator:

    boolean inputOrderSensitive() { return true; }

SQL operators like EQUALS, NOT_EQUALS, AND, OR should return false to indicate that they are not input order sensitive. When computing the hash code of a RexCall with an input-order-insensitive SQL operator, we can use XOR:

    op0.hashCode() ^ op1.hashCode() ...

Usually XOR is not a good choice for a hash code, but here it fits very well: no matter what order the operands are in, XOR generates the same hash code, which is exactly what we want. This way we avoid early normalization when computing the hash code, which matters because there may not be any equivalent RexNodes with a different input order at all.

If the hash codes are the same, we then need to check equals(). We don't need to sort the operands, and in some cases, e.g. when the 2 operands of EQUALS are not RexInputRefs, they may be hard to sort. We just check that the 2 RexNodes have the same set of distinct operands and that each operand occurs the same number of times. And we can limit the number of operands to 2, 3, 5... whatever, if we are concerned about performance when a SQL operator like OR has thousands of operands.

Thanks,
Haisheng Yuan

On 2020/07/13 07:58:19, Danny Chan <yuzhao....@gmail.com> wrote:
> Yes, it is. We can keep it as a builtin promotion.
>
> Best,
> Danny Chan
> On July 13, 2020 at 3:48 PM +0800, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> >
> > Hi, all, I’m planning to default disable the RexNode normalization in
> > CALCITE-4073, if you have any objections, please let me know in 24 hours,
> > thanks so much ~
> >
> > I assume it would still normalize RexNodes when building plan digest. Is it
> > the case?
> >
> > Vladimir
>
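
For illustration, the order-insensitive hash code and equals() check proposed above could be sketched roughly as follows. This is a standalone sketch, not Calcite code: the Operator and Call classes are hypothetical stand-ins for SqlOperator and RexCall, operands are plain objects, and only the inputOrderSensitive() name comes from the proposal.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class OrderInsensitiveCallSketch {

  /** Hypothetical stand-in for SqlOperator (not the real Calcite class). */
  static class Operator {
    final String name;
    private final boolean orderSensitive;

    Operator(String name, boolean orderSensitive) {
      this.name = name;
      this.orderSensitive = orderSensitive;
    }

    /** The overridable flag from the proposal; true means operand order matters. */
    boolean inputOrderSensitive() {
      return orderSensitive;
    }
  }

  /** Hypothetical stand-in for RexCall. */
  static final class Call {
    final Operator op;
    final List<Object> operands;

    Call(Operator op, Object... operands) {
      this.op = op;
      this.operands = Arrays.asList(operands);
    }

    @Override public int hashCode() {
      if (op.inputOrderSensitive()) {
        // Order matters: use the ordinary list hash.
        return Objects.hash(op.name, operands);
      }
      // Order does not matter: XOR is commutative, so any operand
      // order yields the same value without sorting or normalizing.
      int h = 0;
      for (Object operand : operands) {
        h ^= Objects.hashCode(operand);
      }
      return 31 * op.name.hashCode() + h;
    }

    @Override public boolean equals(Object obj) {
      if (this == obj) {
        return true;
      }
      if (!(obj instanceof Call)) {
        return false;
      }
      Call that = (Call) obj;
      if (!op.name.equals(that.op.name)
          || operands.size() != that.operands.size()) {
        return false;
      }
      if (op.inputOrderSensitive()) {
        return operands.equals(that.operands);
      }
      // Same distinct operands, each occurring the same number of times.
      return counts(operands).equals(counts(that.operands));
    }

    private static Map<Object, Integer> counts(List<Object> operands) {
      Map<Object, Integer> result = new HashMap<>();
      for (Object operand : operands) {
        result.merge(operand, 1, Integer::sum);
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Operator eq = new Operator("=", false);
    Operator gt = new Operator(">", true);
    Call a = new Call(eq, "$1", "$2");
    Call b = new Call(eq, "$2", "$1");
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    System.out.println(
        new Call(gt, "$1", "$2").equals(new Call(gt, "$2", "$1"))); // false
  }
}
```

One caveat with plain XOR: duplicated operands cancel each other out (x ^ x == 0), so calls such as OR(a, a, b) and OR(c, c, b) land in the same hash bucket. That only costs an extra equals() call; correctness is still decided by the operand-count comparison.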