[
https://issues.apache.org/jira/browse/CALCITE-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713104#comment-17713104
]
Stamatis Zampetakis commented on CALCITE-5646:
----------------------------------------------
I agree with [~lchistov1987] that the problem seems to be in the rule and not
in the Strong class.
Conceptually I would say that the following holds:
{noformat}
COALESCE(t1.name, t2.name) is strong in {t1.name, t2.name}
COALESCE(t1.name, t2.name) is not strong in {t1.name}
COALESCE(t1.name, t2.name) is not strong in {t2.name}
COALESCE(t1.name, t2.name) is not strong in {}
{noformat}
The rule currently checks the first case (considers null all
inputs/attributes) and given that the predicate is deemed strong under {t1,t2}
it wrongly infers that: IS NOT NULL(t1.name) and IS NOT NULL(t2.name)
I think it makes sense to consider the inputs in isolation as [~lchistov1987]
did in the PR.
Few other general comments about Strong class.
The definition of Strong/Null rejecting seems to defer a bit from the database
literature. The following definition is taken from "On the Correct and Complete
Enumeration of the Core Search Space":
{quote}Definition 1. A predicate is null rejecting for a set of attributes A if
it evaluates to false or unknown on every tuple in which all attributes in A
are null.{quote}
I think this definition coincides with what Strong#isNotTrue method is doing
but not with Strong#isStrong and this caused a bit of confusion for me.
The current javadoc of Strong#isNotTrue contains some contradiction;
"definitely not return true... equivalently, will definitely not return null or
false". Suggested rephrase below:
{code:java}
/** Returns whether the expression evaluates to false or null when all
specified columns are null. */
{code}
The current implementation of Strong#isStrong is not different from
Strong#isNull yet they have a kind of different javadoc/definition. I think we
would be better if we removed one of them.
Probably we can defer javadoc/refactoring/definition comments in other PRs but
I though it would be good to discuss now since more than one people have the
context around these classes on our mind.
> JoinDeriveIsNotNullFilterRule incorrectly handles COALESCE in join condition
> ----------------------------------------------------------------------------
>
> Key: CALCITE-5646
> URL: https://issues.apache.org/jira/browse/CALCITE-5646
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.34.0
> Reporter: Leonid Chistov
> Assignee: Leonid Chistov
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Consider query
> {code:java}
> select t1.deptno from empnullables t1 inner join
> empnullables t2 on coalesce(t1.ename, t2.ename) = 'abc' {code}
> When JoinDeriveIsNotNullFilterRule is applied to it, it is incorrectly
> transformed to query plan
> {code:java}
> LogicalProject(DEPTNO=[$7])
> LogicalJoin(condition=[=(CASE(IS NOT NULL($1), $1, $10), 'abc')],
> joinType=[inner])
> LogicalFilter(condition=[IS NOT NULL($1)])
> LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
> LogicalFilter(condition=[IS NOT NULL($1)])
> LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]]) {code}
> It is not valid to deduce that join keys from the both sides cannot have null
> values. All that we can deduce from the join condition, is that they cannot
> be null in the same time.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)