Questions about RelMdColumnUniqueness

Ian Bertolacci Fri, 07 Mar 2025 10:41:42 -0800

Hi, I made a comment in CALCITE-3428 [1] about what I believe to be an “oddity” 
in the uniqueness analysis in RelMdColumnUniqueness, and I wanted to surface it 
here in case a comment on a 4 year old closed Jira was not going to get any 
traffic.
Basically: if a filter has an equality condition against an input, that input is 
*added* [2] to the set of columns whose uniqueness is being checked, and I 
don’t think that’s correct, since that input may not be unique, and its 
uniqueness in the input does not impact the uniqueness of the columns 
originally being asked about (at least that is my belief).


We are specifically hitting this when filters on the native-key side of a 
foreign-key/native-key join occur above the join.
For example: `select T1.id from T1 inner join T2 on T1.foreignKey = T2.ID where 
T2.foo = 1234`
(Assume that T1.id is listed as a unique key in the TableScan statistics, but 
neither foreignKey nor foo is listed as unique in any way.)
If you ask RelMdColumnUniqueness if T1.id is unique for this query, the answer 
will be no.
But if you remove the filter, the answer is yes, and if you change the filter 
to something like `T2.foo != 1234` the answer is again yes.

I don’t think this is correct, since foo’s uniqueness does not impact the 
uniqueness of id (in fact, the pairing of (id, foo) is still unique, since id is 
unique).
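
As a data-level sanity check of that claim, here is a minimal sketch using 
Python's standard sqlite3 module. It does not exercise Calcite at all. The table 
and column names come from the example above; the rows are made up for 
illustration, with foreignKey and foo deliberately non-unique. It runs the join 
with no filter, with `T2.foo = 1234`, and with `T2.foo != 1234`, and checks that 
neither T1.id nor the (id, foo) pair ever repeats.

```python
import sqlite3

# Join from the example; foo is selected too so the (id, foo) pair can be checked.
JOIN_SQL = "SELECT T1.id, T2.foo FROM T1 INNER JOIN T2 ON T1.foreignKey = T2.ID"

# The three filter variants discussed above.
VARIANTS = {
    "no filter": "",
    "foo = 1234": " WHERE T2.foo = 1234",
    "foo != 1234": " WHERE T2.foo != 1234",
}


def build_db():
    """Create an in-memory database with made-up rows (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE T2 (ID INTEGER PRIMARY KEY, foo INTEGER);
        CREATE TABLE T1 (id INTEGER PRIMARY KEY, foreignKey INTEGER);
        -- foo is not unique: 1234 appears twice.
        INSERT INTO T2 VALUES (10, 1234), (20, 1234), (30, 5678);
        -- foreignKey is not unique: 10 and 30 each appear twice.
        INSERT INTO T1 VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30);
        """
    )
    return conn


def check_uniqueness(conn):
    """Map each variant to (id is unique, (id, foo) pair is unique)."""
    results = {}
    for label, where in VARIANTS.items():
        rows = conn.execute(JOIN_SQL + where).fetchall()
        ids = [row[0] for row in rows]
        results[label] = (len(ids) == len(set(ids)), len(rows) == len(set(rows)))
    return results


if __name__ == "__main__":
    for label, flags in check_uniqueness(build_db()).items():
        print(label, flags)
```

This only shows that the data stays unique on one sample; it says nothing 
about where in the metadata handler the answer changes.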

This has a lot of impact on rules where uniqueness is important, such as 
AggregateRemoveRule (which ironically this Jira was meant to help).
For example, if the query is `select T1.id, count(*) from T1 inner join T2 on 
T1.foreignKey = T2.ID group by T1.id`, the aggregate is removed entirely.
But if the query is `select T1.id, count(*) from T1 inner join T2 on 
T1.foreignKey = T2.ID where T2.foo = 1234 group by T1.id`, the aggregate is not 
removed.
And if the query is `select T1.id, count(*) from T1 inner join T2 on 
T1.foreignKey = T2.ID where T2.foo != 1234 group by T1.id`, the aggregate is 
again removed entirely.
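
To illustrate why removing the aggregate would be safe in all three cases, here 
is a small sketch, again using Python's sqlite3 module rather than Calcite, on 
made-up rows. Because T1.id is unique in the join output, every group has 
exactly one row, so the grouped query returns the same rows as a plain 
projection of `T1.id` with a constant 1. The helper name 
`aggregate_is_redundant` is hypothetical and is not part of any Calcite API.

```python
import sqlite3

BASE = "FROM T1 INNER JOIN T2 ON T1.foreignKey = T2.ID"

# No filter, equality filter, inequality filter (the three queries above).
FILTERS = ["", "WHERE T2.foo = 1234", "WHERE T2.foo != 1234"]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T2 (ID INTEGER PRIMARY KEY, foo INTEGER);
    CREATE TABLE T1 (id INTEGER PRIMARY KEY, foreignKey INTEGER);
    -- Made-up rows: foo and foreignKey both contain duplicates.
    INSERT INTO T2 VALUES (10, 1234), (20, 1234), (30, 5678);
    INSERT INTO T1 VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30);
    """
)


def aggregate_is_redundant(conn, where):
    """True if GROUP BY T1.id with count(*) equals a projection of (id, 1)."""
    grouped = conn.execute(
        f"SELECT T1.id, count(*) {BASE} {where} GROUP BY T1.id ORDER BY T1.id"
    ).fetchall()
    plain = conn.execute(
        f"SELECT T1.id, 1 {BASE} {where} ORDER BY T1.id"
    ).fetchall()
    return grouped == plain


if __name__ == "__main__":
    for where in FILTERS:
        print(repr(where), aggregate_is_redundant(conn, where))
```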
This is also not limited to Filter: it looks like nearly all areColumnsUnique 
methods union the parameter columns with the result of 
decorateWithConstantColumnsFromPredicates, so I imagine this impacts other 
RelNode types and optimizations.

Is my understanding of this correct?
Thanks!
-Ian Bertolacci

[1] 
https://issues.apache.org/jira/browse/CALCITE-3428?focusedCommentId=17933163&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17933163
[2] 
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L112-L113
