[ 
https://issues.apache.org/jira/browse/CALCITE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Bertolacci updated CALCITE-6882:
------------------------------------
    Description: 
There is some overly conservative logic in determining uniqueness of columns in 
RelNodes which have joins beneath them (not necessarily as children)

In this case, this occurs when a filter condition involving the non-unique side 
of a join can engage in predicate pullup/refinement.

Demonstration: inner join between emp and dept

testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join

testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter 
after the filter dep.name using `=` (this demonstrates the bug)

testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter 
after the filter dep.name using `!=`

In all three cases, the field position 0 (empno) should be listed as unique.

But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports 
it as not-unique.

Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter 
asks if column 0 is unique after the project, which asks if column 0 is unique 
after the filter; the pullup then adds column 10 to the bitset for unique 
columns. Column 10 is not unique, and so the result is that column 0 is marked 
as not unique. 

In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=` 
operator precludes the predicate pullup/refinement, and so the bitset is 
unmodified, and the analysis correctly marks column 0 after the join as unique.

it seems like there is a combination of issues:
 # Join being too conservative
 # Incorrect processing for columns which can engage in predicate 
pullup/refinement.

[Spawned by discussion in mailing 
list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]

Tests inside RelMetadataTest:
{code:java}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Join(condition=[=($7, $9)], joinType=[inner])
    *     TableScan(table=[[CATALOG, SALES, EMP]])
    *     TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
      "select emp.empno, dept.deptno\n"
          + "from emp\n"
          + "  inner join dept\n"
          + "on emp.deptno = dept.deptno"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}

@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Filter(condition=[=($10, 'foo')])
    *     Join(condition=[=($7, $9)], joinType=[inner])
    *       TableScan(table=[[CATALOG, SALES, EMP]])
    *       TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
    "select emp.empno, dept.deptno\n"
        + "from emp\n"
        + "  inner join dept\n"
        + "on emp.deptno = dept.deptno\n"
        + "where dept.name = 'foo'"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}

@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Filter(condition=[!=($10, 'foo')])
    *     Join(condition=[=($7, $9)], joinType=[inner])
    *       TableScan(table=[[CATALOG, SALES, EMP]])
    *       TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
      "select emp.empno, dept.deptno\n"
          + "from emp\n"
          + "  inner join dept\n"
          + "on emp.deptno = dept.deptno\n"
          + "where dept.name <> 'foo'"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}{code}

  was:
There is some overly conservative logic in determining uniqueness of columns in 
RelNodes which have joins beneath them (not necessarily as children)

In this case, this occurs when a filter condition involving the non-unique side 
of a join can engage in predicate pullup/refinement.

 

 

Demonstration: inner join between emp and dept

testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join

testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter 
after the filter dep.name using `=` (this demonstrates the bug)

testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter 
after the filter dep.name using `!=`

In all three cases, the field position 0 (empno) should be listed as unique.

But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports 
it as not-unique.

Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter 
asks if column 0 is unique after the project, which asks if column 0 is unique 
after the filter; the pullup then adds column 10 to the bitset for unique 
columns. Column 10 is not unique, and so the result is that column 0 is marked 
as not unique. 

In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=` 
operator precludes the predicate pullup/refinement, and so the bitset is 
unmodified, and the analysis correctly marks column 0 after the join as unique.

it seems like there is a combination of issues:
 # Join being too conservative
 # Incorrect processing for columns which can engage in predicate 
pullup/refinement.

[Spawned by discussion in mailing 
list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]

Tests inside RelMetadataTest:
{code:java}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Join(condition=[=($7, $9)], joinType=[inner])
    *     TableScan(table=[[CATALOG, SALES, EMP]])
    *     TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
      "select emp.empno, dept.deptno\n"
          + "from emp\n"
          + "  inner join dept\n"
          + "on emp.deptno = dept.deptno"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}

@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Filter(condition=[=($10, 'foo')])
    *     Join(condition=[=($7, $9)], joinType=[inner])
    *       TableScan(table=[[CATALOG, SALES, EMP]])
    *       TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
    "select emp.empno, dept.deptno\n"
        + "from emp\n"
        + "  inner join dept\n"
        + "on emp.deptno = dept.deptno\n"
        + "where dept.name = 'foo'"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}

@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
  /** Project(EMPNO=[$0], DEPTNO=[$9])
    *   Filter(condition=[!=($10, 'foo')])
    *     Join(condition=[=($7, $9)], joinType=[inner])
    *       TableScan(table=[[CATALOG, SALES, EMP]])
    *       TableScan(table=[[CATALOG, SALES, DEPT]])
    */
  sql(
      "select emp.empno, dept.deptno\n"
          + "from emp\n"
          + "  inner join dept\n"
          + "on emp.deptno = dept.deptno\n"
          + "where dept.name <> 'foo'"
  )
      .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
      .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}{code}


> RelMdColumnUniqueness incorrectly claims fields are not unique if constant 
> refinement occurs in a node above join
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6882
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6882
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ian Bertolacci
>            Priority: Major
>
> There is some overly conservative logic in determining uniqueness of columns 
> in RelNodes which have joins beneath them (not necessarily as children)
> In this case, this occurs when a filter condition involving the non-unique 
> side of a join can engage in predicate pullup/refinement.
> Demonstration: inner join between emp and dept
> testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a 
> filter after the filter dep.name using `=` (this demonstrates the bug)
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a 
> filter after the filter dep.name using `!=`
> In all three cases, the field position 0 (empno) should be listed as unique.
> But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports 
> it as not-unique.
> Specifically, 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if 
> column 0 is unique after the project, which asks if column 0 is unique after 
> the filter; the pullup then adds column 10 to the bitset for unique columns. 
> Column 10 is not unique, and so the result is that column 0 is marked as not 
> unique. 
> In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the 
> `!=` operator precludes the predicate pullup/refinement, and so the bitset is 
> unmodified, and the analysis correctly marks column 0 after the join as 
> unique.
> it seems like there is a combination of issues:
>  # Join being too conservative
>  # Incorrect processing for columns which can engage in predicate 
> pullup/refinement.
> [Spawned by discussion in mailing 
> list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]
> Tests inside RelMetadataTest:
> {code:java}
> @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Join(condition=[=($7, $9)], joinType=[inner])
>     *     TableScan(table=[[CATALOG, SALES, EMP]])
>     *     TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>       "select emp.empno, dept.deptno\n"
>           + "from emp\n"
>           + "  inner join dept\n"
>           + "on emp.deptno = dept.deptno"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Filter(condition=[=($10, 'foo')])
>     *     Join(condition=[=($7, $9)], joinType=[inner])
>     *       TableScan(table=[[CATALOG, SALES, EMP]])
>     *       TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>     "select emp.empno, dept.deptno\n"
>         + "from emp\n"
>         + "  inner join dept\n"
>         + "on emp.deptno = dept.deptno\n"
>         + "where dept.name = 'foo'"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Filter(condition=[!=($10, 'foo')])
>     *     Join(condition=[=($7, $9)], joinType=[inner])
>     *       TableScan(table=[[CATALOG, SALES, EMP]])
>     *       TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>       "select emp.empno, dept.deptno\n"
>           + "from emp\n"
>           + "  inner join dept\n"
>           + "on emp.deptno = dept.deptno\n"
>           + "where dept.name <> 'foo'"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to