Ruben Q L created CALCITE-7635:
----------------------------------
Summary: Simplification result of conjunction of comparisons
depends on terms order
Key: CALCITE-7635
URL: https://issues.apache.org/jira/browse/CALCITE-7635
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.42.0
Reporter: Ruben Q L
Calling RexSimplify with unknown=FALSE on this expression:
{noformat}
x < 10 AND x > 0 AND x < 20
i.e.
AND( <(x, 10), >(x, 0), <(x, 20) )
{noformat}
Should return {{x < 10 AND x > 0}}
And this happens if we run the following test inside RexProgramTest
(simplification result is returned on SEARCH format):
{code:java}
@Test void testSimplifyAndComparison() {
checkSimplifyFilter(
and(
gt(vInt(), literal(0)),
lt(vInt(), literal(10)),
lt(vInt(), literal(20))),
"SEARCH(?0.int0, Sarg[(0..10)])"); // runs OK
}
{code}
However, if we alter the order of the terms, and run instead:
{code:java}
@Test void testSimplifyAndComparison() {
checkSimplifyFilter(
and(
lt(vInt(), literal(10)),
gt(vInt(), literal(0)),
lt(vInt(), literal(20))),
"SEARCH(?0.int0, Sarg[(0..10)])"); // fails
}
{code}
If fails with:
{noformat}
Expected: with toString() is "SEARCH(?0.int0, Sarg[(0..10)])"
but: toString() was "SEARCH(?0.int0, Sarg[(0..10); NULL AS FALSE])"
{noformat}
The problem is that, in the second case, the simplification result (before
being converted into a SEARCH) is not {{x < 10 AND x > 0}} , but {{x < 10 AND x
> 0 AND X IS NOT NULL}}; this extra IS NOT NULL turns the final SEARCH
expression into "NULL AS FALSE" (instead of the default "NULL AS UNKNOWN" that
we got on the first case). This "NULL AS FALSE" will make that, if this SEARCH
expression gets later expanded, we'll get again {{x < 10 AND x > 0 AND X IS NOT
NULL}}; instead of the expected {{x < 10 AND x > 0}};
The problem originates on {{RexSimplify#simplifyAnd}}, with unknownAs == FALSE
and predicateElimination (be default true), we get into {{simplifyAndTerms}}:
{code:java}}
RexNode simplifyAnd(RexCall e, RexUnknownAs unknownAs) {
List<RexNode> operands = RelOptUtil.conjunctions(e);
if (unknownAs == FALSE && predicateElimination) {
simplifyAndTerms(operands, FALSE);
}
...
{code}
Inside this method, each term is simplified using the previous terms as
"predicates" into the RexSimplify:
{code:java}
private void simplifyAndTerms(List<RexNode> terms, RexUnknownAs unknownAs) {
RexSimplify simplify = this;
for (int i = 0; i < terms.size(); i++) {
RexNode t = terms.get(i);
if (Predicate.of(t) == null) {
continue;
}
terms.set(i, simplify.simplify(t, unknownAs)); // simplifies using
previous terms as predicates
RelOptPredicateList newPredicates =
simplify.predicates.union(rexBuilder,
RelOptPredicateList.of(rexBuilder, terms.subList(i, i + 1)));
simplify = simplify.withPredicates(newPredicates);
}
...
{code}
When simplifying each term (using the previous ones as predicates), since they
are comparisons, we'll get inside RexSimplify#simplifyComparison, which calls
at the end {{simplifyUsingPredicates}}:
{code:java}
private <C extends Comparable<C>> RexNode simplifyComparison(RexCall e,
RexUnknownAs unknownAs, Class<C> clazz) {
...
return simplifyUsingPredicates(e2, clazz);
}
{code}
And {{simplifyUsingPredicates}} calls {{residue}} method, which can return
{{RangeSets.rangeSetAll()}} (which results in returning IS NOT NULL, because
"nullability might be problematic"), but only depending on the order in which
the terms have been processed:
{code:java}
private <C extends Comparable<C>> RexNode simplifyUsingPredicates(RexNode e,
Class<C> clazz) {
...
final C v0 = comparison.literal.getValueAs(clazz);
...
final RangeSet<C> rangeSet = rangeSet(comparison.kind, v0);
final RangeSet<C> rangeSet2 =
residue(comparison.ref, rangeSet, predicates.pulledUpPredicates,
clazz);
if (rangeSet2.isEmpty()) {
// Term is impossible to satisfy given these predicates
return rexBuilder.makeLiteral(false);
} else if (rangeSet2.equals(rangeSet)) {
// no change
return e;
} else if (rangeSet2.equals(RangeSets.rangeSetAll())) {
// Range is always satisfied given these predicates; but nullability might
// be problematic
return simplify(
rexBuilder.makeCall(RexUtil.getPos(e),
SqlStdOperatorTable.IS_NOT_NULL, comparison.ref),
RexUnknownAs.UNKNOWN); // [1]
...
} else {
// range has been reduced but it's not worth simplifying
return e; // [2]
}
{code}
For example:
- Processing x<10, x>0, x<20; will return in X is NOT NULL (via [1]) when x<20
is processed here
- Processing x<10, x<20, x>0; will return in X is NOT NULL (via [1]) when x<20
is processed here
- Processing x>0, x<10, x<20; will return x<20 (via [2]) when x<20 is processed
here
- Processing x>0, x<20, x<10; will return x<20 (via [2]) when x<20 is processed
here (and x<10 when x<10 is processed)
- Processing x<20, x>0, x<10; will return x<20 (via [2]) when x<20 is processed
here (and x<10 when x<10 is processed)
Basically, the problem is: when the expression x<20 (i.e. range (-INF, 20)) is
processed and the list of previous predicates contains as first item a
predicate enclosed by it [3], in this case x<10 (i.e. range(-INF, 10)),
{{residue}} will consider intermediate result as {{RangeSets.rangeSetAll()}}
(i.e. (-INF, +INF), which will enclose any other predicate on subsequent
iterations, so this will be the result to be returned:
{code:java}
private static <C extends Comparable<C>> RangeSet<C> residue(RexNode ref,
RangeSet<C> r0, List<RexNode> predicates, Class<C> clazz) {
RangeSet<C> result = r0;
for (RexNode predicate : predicates) {
...
final RexCall call = (RexCall) predicate;
final Comparison comparison = Comparison.of(call);
if (comparison != null && comparison.ref.equals(ref)) {
final C c1 = comparison.literal.getValueAs(clazz);
...
final Range<C> r1 = range(comparison.kind, c1);
if (result.encloses(r1)) {
// Given these predicates, term is always satisfied.
// e.g. r0 is "$0 < 10", r1 is "$0 < 5"
result = RangeSets.rangeSetAll(); [3]
continue;
}
result = result.subRangeSet(r1); [4]
...
}
return result;
}
{code}
Notice that, in case we process x<20 (i.e. range (-INF, 20)), but the list of
predicates is not [(-INF, 10), (0, INF)] , but instead the other way around
[(0, INF), (-INF, 10)]; then result will never be (-INF, +INF), but (0,10),
since it will be accumulated as (0, 20) via [2] on the first iteration (when
processing predicate (-INF, 10)); and then (0, 10) on the second iteration,
again via [2] when processing this intermediate result with the predicate
(-INF, 10).
To sum up, the fact that RexSimplify#residue result can differ depending on the
order of its list of predicates parameter, results in a total different result
on the initial RexSimplify#simplify call.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)