> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.

You have a point. However, is it really a concern? How often will it
occur, especially for an operator like Join or Filter that has the same
input rel but a different rex node, $1=$1 vs $2=$2? Most likely the two
operators have different inputs anyway. Even if we consider AND/OR over
many of these sub-expressions, e.g. OR($1=$1, $2=$2, $3=$3, ...), things
might get bad if we want to dedup RexNode children using Set<RexNode>,
but how often will we see this in production? I haven't seen it. Greenplum
Database has been using this strategy for many years, and I haven't seen
any performance issue caused by it.
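To make the collision concrete, here is a small, self-contained Java
illustration (hypothetical names like XorHashDemo and xorEqualsHash, not
Calcite code) of why an XOR-combined operand hash sends every
self-comparison $i=$i to the same value:

    // Why XOR-based operand hashing collides on symmetric expressions:
    // the hash of "$i = $i" degenerates to op ^ h($i) ^ h($i) = op
    // for every i, so ($1=$1) and ($2=$2) always collide.
    public class XorHashDemo {
      /** Stand-in for an input-ref hash such as RexInputRef.hashCode(). */
      static int refHash(int index) {
        return Integer.hashCode(index);
      }

      /** Order-insensitive hash of "left = right" built with XOR. */
      static int xorEqualsHash(int left, int right) {
        return "=".hashCode() ^ refHash(left) ^ refHash(right);
      }

      public static void main(String[] args) {
        System.out.println(xorEqualsHash(1, 1) == xorEqualsHash(2, 2)); // true
        // The feature being bought: $1=$2 and $2=$1 hash alike.
        System.out.println(xorEqualsHash(1, 2) == xorEqualsHash(2, 1)); // true
      }
    }

The collision only matters when such expressions land in the same hash
container, e.g. deduping children of OR via a Set<RexNode>, which is
exactly the (rare) case discussed above.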
> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).

If it is just swapping join order, I doubt normalization really helps. A
Join generated by join reordering, e.g. innerjoin(S, R), is not equivalent
to the original innerjoin(R, S); they are in different RelSets, and with a
different input rel order we never get the chance to compare the two
joins, because the input order, rel type and hashcode all differ.

> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.

I agree. I guess you mean that every time we call equals, it will
normalize again and again. However, it is a tradeoff, and still better
than normalizing in the RexNode constructor. Sometimes we want to specify
the exact operand order, e.g. AND($2 > 10, $1 < 5). If $2 > 10 is much
more selective and can filter out more tuples, it will help query
performance a lot. In that case, I don't want Calcite to reorder it for
me. (A sketch of caching the normalized form so equals does not recompute
it follows the quoted message below.)

On 2020/07/15 18:06:43, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> I agree that extensibility might be helpful, however:
>
> 1) I do not like the idea of XOR-based hash code, because it would make
> ($1=$1) have the same hashcode as ($2=$2) and so on.
> 2) "$2 > $1 is reordered to $1 < $2, so that predicate a > b and b < a can
> be reduced to a > b."
> This reverting can easily happen as rule does its transformations (e.g.
> swap join order and so on).
> That is why ability to normalize < into > helps like it helps for $1=$2 vs
> $2=$1
>
> > when just computing the hash code?
>
> What I do not like with the current code is it does perform
> compute-intensive operations when calling equals.
> Previous code (the one from CALCITE-2450) never computed the normalization
> multiple times per RexNode.
> It looks like now we losing that feature.
>
> Vladimir
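For reference, here is a minimal, hypothetical sketch of the lazy-caching
idea (illustrative names only; ComparisonRex and normalizedDigest are not
Calcite code), where the normalized form is computed at most once per
node, so repeated equals/hashCode calls stay cheap while the constructor
still preserves the operand order the caller wrote:

    // Caches the normalized digest so "$2 > $1" and "$1 < $2" compare
    // equal without re-normalizing on every equals/hashCode call.
    final class ComparisonRex {
      final String op;  // "<" or ">"
      final int left;   // input ref indexes, standing in for operands
      final int right;

      private String digest; // computed lazily, at most once

      ComparisonRex(String op, int left, int right) {
        this.op = op;
        this.left = left;
        this.right = right;
      }

      /** Rewrites e.g. "$2 > $1" into the canonical "$1 < $2" form. */
      private String normalizedDigest() {
        if (digest == null) {
          if (left > right) {
            digest = "$" + right + (op.equals(">") ? "<" : ">") + "$" + left;
          } else {
            digest = "$" + left + op + "$" + right;
          }
        }
        return digest;
      }

      @Override public boolean equals(Object o) {
        return o instanceof ComparisonRex
            && normalizedDigest().equals(((ComparisonRex) o).normalizedDigest());
      }

      @Override public int hashCode() {
        return normalizedDigest().hashCode();
      }
    }

Under this shape, AND($2 > 10, $1 < 5) keeps its written operand order,
and the normalization cost is paid only on the first comparison.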