[ https://issues.apache.org/jira/browse/CALCITE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Bertolacci updated CALCITE-6882: ------------------------------------ Description: There is some overly conservative logic in determining uniqueness of columns in RelNodes which have joins beneath them (not necessarily as children) In this case, this occurs when a filter condition involving the non-unique side of a join can engage in predicate pullup/refinement. Demonstration: inner join between emp and dept testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter after the filter dep.name using `=` (this demonstrates the bug) testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter after the filter dep.name using `!=` In all three cases, the field position 0 (empno) should be listed as unique. But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports it as not-unique. Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if column 0 is unique after the project, which asks if column 0 is unique after the filter; the pullup then adds column 10 to the bitset for unique columns. Column 10 is not unique, and so the result is that column 0 is marked as not unique. In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=` operator precludes the predicate pullup/refinement, and so the bitset is unmodified, and the analysis correctly marks column 0 after the join as unique. it seems like there is a combination of issues: # Join being too conservative # Incorrect processing for columns which can engage in predicate pullup/refinement. [Spawned by discussion in mailing list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33] Tests inside RelMetadataTest: {code:java} @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); } @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Filter(condition=[=($10, 'foo')]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno\n" + "where dept.name = 'foo'" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); } @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Filter(condition=[!=($10, 'foo')]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno\n" + "where dept.name <> 'foo'" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); }{code} was: There is some overly conservative logic in determining uniqueness of columns in RelNodes which have joins beneath them (not necessarily as children) In this case, this occurs when a filter condition involving the non-unique side of a join can engage in predicate pullup/refinement. Demonstration: inner join between emp and dept testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter after the filter dep.name using `=` (this demonstrates the bug) testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter after the filter dep.name using `!=` In all three cases, the field position 0 (empno) should be listed as unique. But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports it as not-unique. Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if column 0 is unique after the project, which asks if column 0 is unique after the filter; the pullup then adds column 10 to the bitset for unique columns. Column 10 is not unique, and so the result is that column 0 is marked as not unique. In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=` operator precludes the predicate pullup/refinement, and so the bitset is unmodified, and the analysis correctly marks column 0 after the join as unique. it seems like there is a combination of issues: # Join being too conservative # Incorrect processing for columns which can engage in predicate pullup/refinement. [Spawned by discussion in mailing list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33] Tests inside RelMetadataTest: {code:java} @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); } @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Filter(condition=[=($10, 'foo')]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno\n" + "where dept.name = 'foo'" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); } @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() { /** Project(EMPNO=[$0], DEPTNO=[$9]) * Filter(condition=[!=($10, 'foo')]) * Join(condition=[=($7, $9)], joinType=[inner]) * TableScan(table=[[CATALOG, SALES, EMP]]) * TableScan(table=[[CATALOG, SALES, DEPT]]) */ sql( "select emp.empno, dept.deptno\n" + "from emp\n" + " inner join dept\n" + "on emp.deptno = dept.deptno\n" + "where dept.name <> 'foo'" ) .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); }{code} > RelMdColumnUniqueness incorrectly claims fields are not unique if constant > refinement occurs in a node above join > ----------------------------------------------------------------------------------------------------------------- > > Key: CALCITE-6882 > URL: https://issues.apache.org/jira/browse/CALCITE-6882 > Project: Calcite > Issue Type: Bug > Reporter: Ian Bertolacci > Priority: Major > > There is some overly conservative logic in determining uniqueness of columns > in RelNodes which have joins beneath them (not necessarily as children) > In this case, this occurs when a filter condition involving the non-unique > side of a join can engage in predicate pullup/refinement. > Demonstration: inner join between emp and dept > testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join > testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a > filter after the filter dep.name using `=` (this demonstrates the bug) > testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a > filter after the filter dep.name using `!=` > In all three cases, the field position 0 (empno) should be listed as unique. > But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports > it as not-unique. > Specifically, > testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if > column 0 is unique after the project, which asks if column 0 is unique after > the filter; the pullup then adds column 10 to the bitset for unique columns. > Column 10 is not unique, and so the result is that column 0 is marked as not > unique. > In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the > `!=` operator precludes the predicate pullup/refinement, and so the bitset is > unmodified, and the analysis correctly marks column 0 after the join as > unique. > it seems like there is a combination of issues: > # Join being too conservative > # Incorrect processing for columns which can engage in predicate > pullup/refinement. > [Spawned by discussion in mailing > list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33] > Tests inside RelMetadataTest: > {code:java} > @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() { > /** Project(EMPNO=[$0], DEPTNO=[$9]) > * Join(condition=[=($7, $9)], joinType=[inner]) > * TableScan(table=[[CATALOG, SALES, EMP]]) > * TableScan(table=[[CATALOG, SALES, DEPT]]) > */ > sql( > "select emp.empno, dept.deptno\n" > + "from emp\n" > + " inner join dept\n" > + "on emp.deptno = dept.deptno" > ) > .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) > .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); > } > @Test void > testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() { > /** Project(EMPNO=[$0], DEPTNO=[$9]) > * Filter(condition=[=($10, 'foo')]) > * Join(condition=[=($7, $9)], joinType=[inner]) > * TableScan(table=[[CATALOG, SALES, EMP]]) > * TableScan(table=[[CATALOG, SALES, DEPT]]) > */ > sql( > "select emp.empno, dept.deptno\n" > + "from emp\n" > + " inner join dept\n" > + "on emp.deptno = dept.deptno\n" > + "where dept.name = 'foo'" > ) > .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) > .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); > } > @Test void > testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() { > /** Project(EMPNO=[$0], DEPTNO=[$9]) > * Filter(condition=[!=($10, 'foo')]) > * Join(condition=[=($7, $9)], joinType=[inner]) > * TableScan(table=[[CATALOG, SALES, EMP]]) > * TableScan(table=[[CATALOG, SALES, DEPT]]) > */ > sql( > "select emp.empno, dept.deptno\n" > + "from emp\n" > + " inner join dept\n" > + "on emp.deptno = dept.deptno\n" > + "where dept.name <> 'foo'" > ) > .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true)) > .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false)); > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)