Ruben Q L created CALCITE-3846:
----------------------------------

             Summary: EnumerableMergeJoin: wrong comparison of composite key with null values
                 Key: CALCITE-3846
                 URL: https://issues.apache.org/jira/browse/CALCITE-3846
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.22.0
            Reporter: Ruben Q L


The problem can be reproduced with the following test in EnumerablesTest.java:
{code}
  @Test public void testMergeJoinWithCompositeKeyAndNull() {
    assertThat(
        EnumerableDefaults.mergeJoin(
            Linq4j.asEnumerable(
                Arrays.asList(
                    new Emp(10, "A"),
                    new Emp(10, "B"),
                    new Emp(10, "C"),
                    new Emp(10, "D"),
                    new Emp(40, "X"),
                    new Emp(50, "A"))),
            Linq4j.asEnumerable(
                Arrays.asList(
                    new Dept(10, "C"),
                    new Dept(10, null),
                    new Dept(30, "A"),
                    new Dept(40, "X"))),
            e -> (Comparable) FlatLists.of(e.deptno, e.name),
            d -> (Comparable) FlatLists.of(d.deptno, d.name),
            (v0, v1) -> v0 + ", " + v1, false, false).toList().toString(),
        equalTo("[Emp(10, C), Dept(10, C),"
            + " Emp(40, X), Dept(40, X)]"));
  }
{code}

The test fails with the following exception:
{code}
java.lang.IllegalStateException: mergeJoin assumes input sorted in ascending order, however [10, C] is greater than [10, null]
{code}

The problem is that the EnumerableMergeJoin implementation (i.e. 
EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending 
order, nulls last (see EnumerableMergeJoinRule). In the case of a composite key, 
EnumerableMergeJoin represents keys as JavaRowFormat.LIST, which is a 
comparable list whose comparison is implemented via 
FlatLists.ComparableListImpl#compare. This method compares both lists item 
by item, but it considers a null item to be less than a non-null item. 
This is a de facto nulls-first collation, which contradicts the prerequisite 
of the mergeJoin algorithm.
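
To illustrate the mismatch, here is a minimal standalone sketch that uses only the JDK. It is not the Calcite code: the class CompositeKeyOrder and the helper compareLists are hypothetical names, and the item-by-item comparison only approximates the behaviour described above. It shows that under a nulls-first item ordering [10, C] compares greater than [10, null], whereas under the nulls-last ordering that mergeJoin expects it compares less.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration class; not part of Calcite.
class CompositeKeyOrder {
  // Natural ordering of non-null items (assumes both items are mutually comparable).
  @SuppressWarnings({"unchecked", "rawtypes"})
  static final Comparator<Object> NATURAL =
      (x, y) -> ((Comparable) x).compareTo(y);

  // Null items sort before non-null items (the de facto behaviour described above).
  static final Comparator<Object> NULLS_FIRST = Comparator.nullsFirst(NATURAL);

  // Null items sort after non-null items (what mergeJoin expects).
  static final Comparator<Object> NULLS_LAST = Comparator.nullsLast(NATURAL);

  /** Compares two lists item by item; a shorter list that is a prefix sorts first. */
  static int compareLists(List<?> a, List<?> b, Comparator<Object> itemComparator) {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int c = itemComparator.compare(a.get(i), b.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.size(), b.size());
  }

  public static void main(String[] args) {
    List<?> key1 = Arrays.asList(10, "C");
    List<?> key2 = Arrays.asList(10, null);

    // Nulls first: [10, C] is greater than [10, null], so the right input
    // ([10, C] followed by [10, null]) looks unsorted.
    System.out.println(compareLists(key1, key2, NULLS_FIRST) > 0);

    // Nulls last: [10, C] is less than [10, null], so the same input is in order.
    System.out.println(compareLists(key1, key2, NULLS_LAST) < 0);
  }
}
```

Both lines print true. This matches the exception above: the right-hand input is sorted nulls last, but the key comparison judges it with nulls-first semantics.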



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
