Ruben Q L created CALCITE-3846:
----------------------------------
Summary: EnumerableMergeJoin: wrong comparison of composite key
with null values
Key: CALCITE-3846
URL: https://issues.apache.org/jira/browse/CALCITE-3846
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.22.0
Reporter: Ruben Q L
The problem can be reproduced with the following test in EnumerablesTest.java:
{code}
@Test public void testMergeJoinWithCompositeKeyAndNull() {
  assertThat(
      EnumerableDefaults.mergeJoin(
          Linq4j.asEnumerable(
              Arrays.asList(
                  new Emp(10, "A"),
                  new Emp(10, "B"),
                  new Emp(10, "C"),
                  new Emp(10, "D"),
                  new Emp(40, "X"),
                  new Emp(50, "A"))),
          Linq4j.asEnumerable(
              Arrays.asList(
                  new Dept(10, "C"),
                  new Dept(10, null),
                  new Dept(30, "A"),
                  new Dept(40, "X"))),
          e -> (Comparable) FlatLists.of(e.deptno, e.name),
          d -> (Comparable) FlatLists.of(d.deptno, d.name),
          (v0, v1) -> v0 + ", " + v1, false, false).toList().toString(),
      equalTo("[Emp(10, C), Dept(10, C),"
          + " Emp(40, X), Dept(40, X)]"));
}
{code}
The test fails with the following exception:
{code}
java.lang.IllegalStateException: mergeJoin assumes input sorted in ascending
order, however [10, C] is greater than [10, null]
{code}
The problem is that the EnumerableMergeJoin implementation (i.e.
EnumerableDefaults#mergeJoin) expects its inputs to be sorted in ascending
order, nulls last (see EnumerableMergeJoinRule). In the case of a composite
key, EnumerableMergeJoin represents keys as JavaRowFormat.LIST, which is a
comparable list whose comparison is implemented via
FlatLists.ComparableListImpl#compare. This method compares both lists item
by item, but it considers a null item to be less than a non-null item.
This is a de facto nulls-first collation, which contradicts the prerequisite
of the mergeJoin algorithm. In the test above, the right input is sorted
nulls last ([10, C] before [10, null]), yet the list comparison reports
[10, C] as greater than [10, null], so the sortedness check fails.
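
The mismatch can be illustrated without Calcite. The following is a minimal
sketch, not the FlatLists code: the class CompositeKeyOrdering and its methods
lexicographic and compareKeys are hypothetical names, and the key items are
simplified to strings. It compares the two keys from the exception message
item by item, once treating null as the smallest item and once as the largest.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-alone illustration; it does not use Calcite's FlatLists.
class CompositeKeyOrdering {
  /** Builds a comparator that compares two lists item by item. */
  static <T> Comparator<List<T>> lexicographic(Comparator<T> itemComparator) {
    return (a, b) -> {
      int n = Math.min(a.size(), b.size());
      for (int i = 0; i < n; i++) {
        int c = itemComparator.compare(a.get(i), b.get(i));
        if (c != 0) {
          return c; // the first differing item decides the result
        }
      }
      return Integer.compare(a.size(), b.size());
    };
  }

  /** Returns the sign (-1, 0 or 1) of comparing a with b. */
  static int compareKeys(boolean nullsLast, List<String> a, List<String> b) {
    Comparator<String> natural = Comparator.naturalOrder();
    Comparator<String> item = nullsLast
        ? Comparator.nullsLast(natural)
        : Comparator.nullsFirst(natural);
    return Integer.signum(lexicographic(item).compare(a, b));
  }

  public static void main(String[] args) {
    // Keys simplified to strings for this sketch.
    List<String> left = Arrays.asList("10", "C");
    List<String> right = Arrays.asList("10", null);
    // Null treated as smallest: [10, C] is greater than [10, null].
    System.out.println("nulls first: " + compareKeys(false, left, right)); // prints 1
    // Null treated as largest: [10, C] is less than [10, null].
    System.out.println("nulls last:  " + compareKeys(true, left, right)); // prints -1
  }
}
```

Under the nulls-first comparison the sequence [10, C], [10, null] looks
descending, which matches the exception above. Under the nulls-last comparison
the same sequence is ascending, as mergeJoin expects.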