[ https://issues.apache.org/jira/browse/CALCITE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933163#comment-17933163 ]
Ian Bertolacci commented on CALCITE-3428: ----------------------------------------- Sorry to ping a 4 year old thread, but I think there is something off about the approach taken that affects other rules. For example: {code:java} Filter(condition=[=($1, 12345)]) └── TableScan(table=[T1], Schema=[ID, Foo]){code} In our system, ID is always unique, so the unique bitset list is {{`[ \{0} ]`}} at the the table scan. If I perform {{{}`RelMdColumnUniqueness.areColumnsUnique( filter, \{0}, ... )`{}}}, it will add {{`1`}} into the bitset (making {{`\{0, 1}`) }}then invoke {{`RelMdColumnUniqueness.areColumnsUnique( tableScan, \{0, 1}, ... )`.}} For table scans, if *any* of the columns are unique, table.isKey will return true (since at least one of the columns is unique, it does not matter the uniquness of the other columns) So ultimately, the will tell me that the column set {{`0`}} is unique after the filter. But lets add a join: {code:java} Filter(condition=[=($1, 12345)]) └── Join(condition=[=($0, $3)], joinType=[inner]) ├── TableScan(table=[T1, Schema=[ID, Foo]) └── TableScan(table=[T2], Schema=[ID, ForeignKey]){code} And in this case I want to ask, is column $2 (T2.ID) unique? It should be, since it is unique and is the "driver" of the join, making $0 (T1.ID) non-unique, since it may be repeated by repeated values of $3 (T2.ForeignKey) But what happens is that RelMdColumnUniqueness.areColumnsUnique for the filter will add $1 to the column bitset, and ask the join "are columns {{`\{1, 2}`}} unique? For joins, a column set is unique if *all* columns are unique. Because `1` is not unique, then the column set {{`\{1, 2}`}} is not unique. But thats not what we really care about. We want to know if `2` is unique, which it should be. If you remove the filter entirely, or change the condition to anything other than EQUALS, then the filter check does not add `1` to the column set, and so {{`2`}} is confirmed to be unique. This specifically comes up in AggregateRemoveRule. {code:java} Aggregate(group=[{2}], ...]) └── Filter(condition=[=($1, 12345)]) └── Join(condition=[=($0, $3)], joinType=[inner]) ├── TableScan(table=[T1, Schema=[ID, Foo]) └── TableScan(table=[T2], Schema=[ID, ForeignKey]) {code} In this case, we should be able to remove the aggregate because column 2 (T2.id) under the aggregate is unique, but it is not confirmed as such because of the filter refinement. Am I missing something? I feel like maybe the refinement should have been a check for if the column set that are given as a parameter to the function intersects with the predicate pull up. > Refine RelMdColumnUniqueness for Filter by considering constant columns > ----------------------------------------------------------------------- > > Key: CALCITE-3428 > URL: https://issues.apache.org/jira/browse/CALCITE-3428 > Project: Calcite > Issue Type: Improvement > Reporter: Jin Xing > Assignee: Jin Xing > Priority: Major > Labels: pull-request-available > Fix For: 1.22.0 > > Time Spent: 50m > Remaining Estimate: 0h > > AggregateRemoveRule fails to remove the top Aggregate for below SQL > {code:java} > select mgr, sum(sum_sal) > from > (select mgr, deptno, sum(sal) sum_sal > from sales.emp > group by mgr, deptno) > where deptno=100 > group by mgr > {code} > The reason is that RelMdColumnUniqueness doesn't take the filtering condition > into consideration when checking uniqueness of columns. > This PR proposes to refine RelMdColumnUniqueness for Filter, thus to > strengthen AggregateRemoveRule. > Resolving this Jira will help a lot for CALCITE-3334 by removing the > redundant compensation Aggregate when doing materialization matching -- This message was sent by Atlassian Jira (v8.20.10#820010)