[ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Hueske updated FLINK-5498: --------------------------------- Priority: Minor (was: Critical) > OuterJoins with non-equality predicates compute wrong results > ------------------------------------------------------------- > > Key: FLINK-5498 > URL: https://issues.apache.org/jira/browse/FLINK-5498 > Project: Flink > Issue Type: New Feature > Components: Table API & SQL > Affects Versions: 1.3.0 > Reporter: lincoln.lee > Assignee: lincoln.lee > Priority: Minor > > I found the expected result of a unit test case incorrect compare to that in > a RDMBS, > see > flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala > {code:title=JoinITCase.scala} > def testRightJoinWithNotOnlyEquiJoin(): Unit = { > ... > val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, > 'c) > val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, > 'f, 'g, 'h) > val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c, 'g) > > val expected = "Hello world,BCD\n" > val results = joinT.toDataSet[Row].collect() > TestBaseUtils.compareResultAsText(results.asJava, expected) > } > {code} > Then I took some time to learn about the ‘outer join’ in relational > databases, the right result of above case should be(tested in SQL Server and > MySQL, the results are same): > {code} > > select c, g from tuple3 right outer join tuple5 on a=f and b<h; > c g > -------------------------------- -------------------------------- > NULL Hallo > NULL Hallo Welt > NULL Hallo Welt wie > NULL Hallo Welt wie gehts? > NULL ABC > Hello world BCD > NULL CDE > NULL DEF > NULL EFG > NULL FGH > NULL GHI > NULL HIJ > NULL IJK > NULL JKL > NULL KLM > {code} > the join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent > to {{rightOuterJoin('a === 'd).where('b < 'h)}}. > The problem is rooted in the code-generated {{JoinFunction}} (see > {{DataSetJoin.translateToPlan()}}, line 188). If the join condition does not > match, we must emit the outer row padded with nulls instead of returning from > the function without emitting anything. > The code-generated {{JoinFunction}} does also include equality predicates. > These should be removed before generating the code, e.g., in > {{DataSetJoinRule}} when generating the {{DataSetJoin}} with help of > {{JoinInfo.getRemaining()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)