[ 
https://issues.apache.org/jira/browse/CALCITE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022714#comment-18022714
 ] 

Ruben Q L edited comment on CALCITE-7199 at 9/25/25 9:25 AM:
-------------------------------------------------------------

IIUC an example showing this scenario would be:
 - nation (n_nationkey, n_name, n_regionkey) => n_nationkey is PK (i.e. unique)
 - region (r_regionkey, r_name) => r_regionkey is PK (i.e. unique)

Considering:
{code:sql}
SELECT * FROM nation n INNER JOIN region r ON n_regionkey = r_regionkey
{code}
If we compute columnUniqueness on {{{{}n_nationkey, r_name{}}}} the current 
algorithm will return false (because r_name is not unique on its table), 
however they are in fact unique, as explained in the description (n_nationkey 
is unique, the RHS join key r_regionkey is unique, and we are dealing with an 
INNER JOIN so no NULLs from the LHS)


was (Author: rubenql):
IIUC an example showing this scenario would be:
 - nation (n_nationkey, n_name, n_regionkey) => n_nationkey is PK (i.e. unique)
 - region (r_regionkey, r_name) => r_regionkey is PK (i.e. unique)

Considering:
{code:sql}
SELECT * FROM nation n INNER JOIN region r ON n_regionkey = r_regionkey
{code}
If we compute columnUniqueness on {{{{}n_nationkey, r_name{}}}} the current 
algorithm will return false (because r_name is not unique on its table), 
however they are in fact unique, as explained in the description (n_regionkey 
is unique, the RHS join key r_regionkey is unique, and we are dealing with an 
INNER JOIN so no NULLs from the LHS)

> RelMdColumnUniqueness joins handler is invalid for columns in both sides
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-7199
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7199
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Claude Brisson
>            Priority: Major
>              Labels: pull-request-available
>
> {{RelMdColumnUniqueness.areColumnsUnique()}} handler for joins contains the 
> following code:
> {code}
>     // If the original column mask contains columns from both the left and
>     // right hand side, then the columns are unique if and only if they're
>     // unique for their respective join inputs
>     Boolean leftUnique = mq.areColumnsUnique(left, leftColumns, ignoreNulls);
>     Boolean rightUnique = mq.areColumnsUnique(right, rightColumns, 
> ignoreNulls);
>     if ((leftColumns.cardinality() > 0)
>         && (rightColumns.cardinality() > 0)) {
>       if ((leftUnique == null) || (rightUnique == null)) {
>         return null;
>       } else {
>         return leftUnique && rightUnique;
>       }
>     }
> {code}
> This is not correct. Uniqueness on both sides is a sufficient condition for 
> the columns to be unique in the join result, but not a necessary one.
> The columns will also be unique in the joins result if the following 
> conditions are all met:
> * the queried columns coming from the left input are unique
> * the columns implied in the right side of the join equi-condition are unique
> * the join does not generate nulls on the left
> (and the same with left and right reversed).
> The fact that the join is done on a unique key on the right side will 
> guarantee that the uniqueness of the queried columns on the left is 
> preserved, and whatever columns we add in the queried subset, it will remain 
> unique.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to