[ https://issues.apache.org/jira/browse/FLINK-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206121#comment-16206121 ]
Xingcan Cui edited comment on FLINK-7730 at 10/16/17 4:17 PM:
--------------------------------------------------------------

Hi [~fhueske], I have found more of the reasons behind these issues.

1. The first problem is most likely caused by incorrect identification of the equi-predicate. I think we need a more formal definition of this term. If {{'a.cast(Types.STRING) === 'd}} is a valid one, how about (1) {{'a.cast(Types.STRING).cast(Types.INT) === 'd}}, (2) {{'a === ('d - 1)}} and (3) {{'a.cast(Types.STRING) === 'd.cast(Types.STRING)}}? IMO, they can all be summarized as {{leftLocalExpression === rightLocalExpression}}, right?

2. The second problem is caused by the decorrelation optimization. For instance, the original plan
{code:java}
LogicalProject(c=[$2], s=[$3], a=[$0])
  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
    LogicalTableScan(table=[[_DataSetTable_0]])
    LogicalFilter(condition=[=($cor0.c, $0)])
      LogicalTableFunctionScan(invocation=[XXX($2)], rowType=[RecordType(VARCHAR(65536) s)], elementType=[class [Ljava.lang.Object;])
{code}
will be decorrelated to
{code:java}
LogicalProject(c=[$2], s=[$3], a=[$0])
  LogicalJoin(condition=[=($2, $3)], joinType=[left])
    LogicalTableScan(table=[[_DataSetTable_0]])
    LogicalFilter(condition=[IS NOT NULL($0)])
      LogicalTableFunctionScan(invocation=[XXX($2)], rowType=[RecordType(VARCHAR(65536) s)], elementType=[class [Ljava.lang.Object;])
{code}
where the {{LogicalCorrelate}} is eliminated. Since the {{LogicalTableFunctionScan}} can only be handled by the DataSet/DataStream {{CorrelateRule}}, the decorrelated plan cannot be translated properly, and that raises the exception.

3. The reason for the third issue has been discussed before.

To be honest, the workload is heavier than I expected. I suggest creating sub-issues to track them. What do you think?

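To make the proposed definition in point 1 concrete, here is a minimal sketch of a check for {{leftLocalExpression === rightLocalExpression}}: a predicate qualifies if it is an equality whose two sides each reference fields of exactly one join input, and the two sides refer to different inputs. The {{Expr}} class and all helper names are hypothetical and exist only for illustration; they are not Flink's or Calcite's expression classes, and the actual validation logic in Flink may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EquiPredicateSketch {

    /** Hypothetical minimal expression node; NOT Flink's Expression class. */
    public static final class Expr {
        final String kind;          // "field", "literal", "call" or "equals"
        final String name;          // field name; null for non-field nodes
        final List<Expr> children;  // operands; empty for leaves

        Expr(String kind, String name, Expr... children) {
            this.kind = kind;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    public static Expr field(String name) { return new Expr("field", name); }
    public static Expr literal() { return new Expr("literal", null); }
    /** Any scalar call, e.g. a cast or an arithmetic operation. */
    public static Expr call(Expr... operands) { return new Expr("call", null, operands); }
    public static Expr eq(Expr left, Expr right) { return new Expr("equals", null, left, right); }

    /** Collects the names of all fields referenced at or below the given node. */
    static void collectFields(Expr e, Set<String> out) {
        if (e.kind.equals("field")) {
            out.add(e.name);
        }
        for (Expr child : e.children) {
            collectFields(child, out);
        }
    }

    /** True if e references at least one field and only fields of the given input. */
    static boolean isLocalTo(Expr e, Set<String> inputFields) {
        Set<String> refs = new HashSet<>();
        collectFields(e, refs);
        return !refs.isEmpty() && inputFields.containsAll(refs);
    }

    /** True if p has the shape leftLocalExpression === rightLocalExpression (in either order). */
    public static boolean isLocalEquiPredicate(Expr p, Set<String> left, Set<String> right) {
        if (!p.kind.equals("equals")) {
            return false;
        }
        Expr l = p.children.get(0);
        Expr r = p.children.get(1);
        return (isLocalTo(l, left) && isLocalTo(r, right))
            || (isLocalTo(l, right) && isLocalTo(r, left));
    }

    public static void main(String[] args) {
        Set<String> left = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> right = new HashSet<>(Arrays.asList("d"));

        // 'a.cast(Types.STRING) === 'd
        System.out.println(isLocalEquiPredicate(eq(call(field("a")), field("d")), left, right));
        // 'a === ('d - 1)
        System.out.println(isLocalEquiPredicate(eq(field("a"), call(field("d"), literal())), left, right));
        // ('a - 'd) === 'd mixes both inputs on one side
        System.out.println(isLocalEquiPredicate(eq(call(field("a"), field("d")), field("d")), left, right));
    }
}
```

Under this definition, all three variants listed in point 1 are accepted, while a predicate whose side mixes fields from both inputs is rejected. Whether casts and arithmetic should really be treated alike is exactly the open question raised above.
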
> TableFunction LEFT OUTER joins with ON predicates are broken
> ------------------------------------------------------------
>
>                 Key: FLINK-7730
>                 URL: https://issues.apache.org/jira/browse/FLINK-7730
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Fabian Hueske
>            Assignee: Xingcan Cui
>            Priority: Critical
>
> TableFunction left outer joins with predicates in the ON clause are broken. Apparently, there are no tests for this and it has never worked. I observed issues on several layers:
> - Table Function does not correctly validate equality predicate: {{leftOuterJoin(func1('c) as 'd, 'a.cast(Types.STRING) === 'd)}} is rejected because the predicate is not considered an equality predicate (the cast needs to be pushed down).
> - Plans cannot be correctly translated: {{leftOuterJoin(func1('c) as 'd, 'c === 'd)}} gives an optimizer exception.
> - SQL queries get translated but produce incorrect results. For example, {{SELECT a, b, c, d FROM MyTable LEFT OUTER JOIN LATERAL TABLE(tfunc(c)) AS T(d) ON d = c}} returns an empty result if the condition {{d = c}} never returns true. However, the outer side should be preserved and padded with nulls.
> So there seem to be many issues with table function outer joins. In particular, the wrong results produced by SQL queries need to be fixed quickly.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
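
For reference, the result expected in the third symptom of the issue description (the outer side is preserved and padded with nulls when the ON condition never holds) can be sketched in plain Java. This only models the expected semantics under simplifying assumptions: rows are single values, and {{leftOuterLateral}} and its parameters are hypothetical names, not part of Flink's or Calcite's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

public class LeftOuterLateralSketch {

    /**
     * Models LEFT OUTER JOIN LATERAL TABLE(f(l)) ON cond(l, r).
     * Each result row is {leftValue, rightValue}; rightValue is null when
     * no row produced by the table function satisfies the ON condition.
     */
    public static <L, R> List<Object[]> leftOuterLateral(
            List<L> leftRows,
            Function<L, List<R>> tableFunction,
            BiPredicate<L, R> onCondition) {
        List<Object[]> result = new ArrayList<>();
        for (L left : leftRows) {
            boolean matched = false;
            for (R right : tableFunction.apply(left)) {
                if (onCondition.test(left, right)) {
                    result.add(new Object[] {left, right});
                    matched = true;
                }
            }
            if (!matched) {
                // Preserve the outer (left) row and pad the function side with null.
                result.add(new Object[] {left, null});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("x", "y");
        // Hypothetical table function: emits two derived values per input row.
        Function<String, List<String>> tfunc = c -> Arrays.asList(c + "1", c + "2");
        // ON d = c never holds here, so every left row must still appear once.
        List<Object[]> rows =
            LeftOuterLateralSketch.<String, String>leftOuterLateral(table, tfunc, (c, d) -> d.equals(c));
        for (Object[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

An implementation that applies the ON condition as a plain filter after the correlate would instead drop every left row in this example, which matches the empty result reported above.
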