[
https://issues.apache.org/jira/browse/CALCITE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022714#comment-18022714
]
Ruben Q L edited comment on CALCITE-7199 at 9/25/25 9:25 AM:
-------------------------------------------------------------
IIUC an example showing this scenario would be:
- nation (n_nationkey, n_name, n_regionkey) => n_nationkey is PK (i.e. unique)
- region (r_regionkey, r_name) => r_regionkey is PK (i.e. unique)
Considering:
{code:sql}
SELECT * FROM nation n INNER JOIN region r ON n_regionkey = r_regionkey
{code}
If we compute columnUniqueness on {{{{}n_nationkey, r_name{}}}} the current
algorithm will return false (because r_name is not unique on its table),
however they are in fact unique, as explained in the description (n_nationkey
is unique, the RHS join key r_regionkey is unique, and we are dealing with an
INNER JOIN so no NULLs from the LHS)
was (Author: rubenql):
IIUC an example showing this scenario would be:
- nation (n_nationkey, n_name, n_regionkey) => n_nationkey is PK (i.e. unique)
- region (r_regionkey, r_name) => r_regionkey is PK (i.e. unique)
Considering:
{code:sql}
SELECT * FROM nation n INNER JOIN region r ON n_regionkey = r_regionkey
{code}
If we compute columnUniqueness on {{{{}n_nationkey, r_name{}}}} the current
algorithm will return false (because r_name is not unique on its table),
however they are in fact unique, as explained in the description (n_regionkey
is unique, the RHS join key r_regionkey is unique, and we are dealing with an
INNER JOIN so no NULLs from the LHS)
> RelMdColumnUniqueness joins handler is invalid for columns in both sides
> ------------------------------------------------------------------------
>
> Key: CALCITE-7199
> URL: https://issues.apache.org/jira/browse/CALCITE-7199
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.40.0
> Reporter: Claude Brisson
> Priority: Major
> Labels: pull-request-available
>
> {{RelMdColumnUniqueness.areColumnsUnique()}} handler for joins contains the
> following code:
> {code}
> // If the original column mask contains columns from both the left and
> // right hand side, then the columns are unique if and only if they're
> // unique for their respective join inputs
> Boolean leftUnique = mq.areColumnsUnique(left, leftColumns, ignoreNulls);
> Boolean rightUnique = mq.areColumnsUnique(right, rightColumns,
> ignoreNulls);
> if ((leftColumns.cardinality() > 0)
> && (rightColumns.cardinality() > 0)) {
> if ((leftUnique == null) || (rightUnique == null)) {
> return null;
> } else {
> return leftUnique && rightUnique;
> }
> }
> {code}
> This is not correct. Uniqueness on both sides is a sufficient condition for
> the columns to be unique in the join result, but not a necessary one.
> The columns will also be unique in the joins result if the following
> conditions are all met:
> * the queried columns coming from the left input are unique
> * the columns implied in the right side of the join equi-condition are unique
> * the join does not generate nulls on the left
> (and the same with left and right reversed).
> The fact that the join is done on a unique key on the right side will
> guarantee that the uniqueness of the queried columns on the left is
> preserved, and whatever columns we add in the queried subset, it will remain
> unique.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)