Hi,

I made a comment in CALCITE-3428 [1] about what I believe to be an “oddity” in the uniqueness analysis in RelMdColumnUniqueness, and I wanted to surface it here in case a comment on a four-year-old closed Jira does not get any traffic.

Basically: if a filter has an equality condition against an input, that input is *added* [2] to the set of columns whose uniqueness is being checked. I don’t think that’s correct, since that input may not be unique, and its uniqueness in the input does not impact the uniqueness of the columns originally being asked about (at least that is my belief).
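The belief above can be sanity-checked at the row level, independently of Calcite. The sketch below is plain Java and does not call any Calcite API. It uses the same table shape as the example query further down (T1 joined to T2 on T1.foreignKey = T2.ID, filtered on T2.foo). The row values and the helper names (`joinAndFilter`, `idsUnique`) are made up for illustration, and it assumes T1.id and T2.ID are unique while foreignKey and foo are not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class JoinUniquenessCheck {
    // Hypothetical data. T1 rows are {id, foreignKey}; id is unique, foreignKey is not.
    static final int[][] T1 = {{1, 10}, {2, 10}, {3, 20}, {4, 30}};
    // T2 rows are {ID, foo}; ID is unique, foo is not (1234 appears twice).
    static final int[][] T2 = {{10, 1234}, {20, 1234}, {30, 5678}};

    // Inner join on T1.foreignKey = T2.ID, keeping rows whose T2 side passes the filter.
    // Each output row is {T1.id, T2.foo}.
    static List<int[]> joinAndFilter(Predicate<int[]> t2Filter) {
        List<int[]> out = new ArrayList<>();
        for (int[] left : T1) {
            for (int[] right : T2) {
                if (left[1] == right[0] && t2Filter.test(right)) {
                    out.add(new int[] {left[0], right[1]});
                }
            }
        }
        return out;
    }

    // True if no T1.id value appears more than once in the joined rows.
    static boolean idsUnique(List<int[]> rows) {
        Set<Integer> seen = new HashSet<>();
        for (int[] row : rows) {
            if (!seen.add(row[0])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("no filter:   " + idsUnique(joinAndFilter(r -> true)));
        System.out.println("foo = 1234:  " + idsUnique(joinAndFilter(r -> r[1] == 1234)));
        System.out.println("foo != 1234: " + idsUnique(joinAndFilter(r -> r[1] != 1234)));
    }
}
```

With this data, T1.id stays unique in all three cases, even though foo itself repeats in the `foo = 1234` result. This only shows that the data-level property holds for one small instance; it says nothing about how the metadata handlers arrive at their answer.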
We are specifically hitting this when filters on the native-key side of a foreign-key/native-key join occur above the join. For example:

`select T1.id from T1 inner join T2 on T1.foreignKey = T2.ID where T2.foo = 1234`

(Assume that T1.ID is listed as a unique key in the TableScan statistics, but neither foreignKey nor foo is listed as unique in any way.)

If you ask RelMdColumnUniqueness whether T1.id is unique for this query, the answer will be no. But if you remove the filter, the answer is yes, and if you change the filter to something like `T2.foo != 1234`, the answer is again yes. I don’t think this is correct, since foo’s uniqueness does not impact the uniqueness of id (in fact, the pairing of (id, foo) is still unique, since id is unique).

This has a lot of impact on rules where uniqueness is important, such as AggregateRemoveRule (which, ironically, this Jira was meant to help). For example, if the query is

`select T1.id, count(*) from T1 inner join T2 on T1.foreignKey = T2.ID group by T1.id`

the aggregate is removed entirely. But if the query is

`select T1.id, count(*) from T1 inner join T2 on T1.foreignKey = T2.ID where T2.foo = 1234 group by T1.id`

the aggregate is not removed. And if the query is

`select T1.id, count(*) from T1 inner join T2 on T1.foreignKey = T2.ID where T2.foo != 1234 group by T1.id`

the aggregate is again removed entirely.

This is also not limited to filters: it looks like nearly all areColumnsUnique methods union the parameter columns with the result of decorateWithConstantColumnsFromPredicates, so I imagine this impacts other RelNode types and optimizations.

Is my understanding of this correct?

Thanks!
-Ian Bertolacci

[1] https://issues.apache.org/jira/browse/CALCITE-3428?focusedCommentId=17933163&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17933163
[2] https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L112-L113