[ https://issues.apache.org/jira/browse/FLINK-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268970#comment-16268970 ]
ASF GitHub Bot commented on FLINK-8158: --------------------------------------- Github user xccui commented on the issue: https://github.com/apache/flink/pull/5094 Hi @hequn8128, thanks for looking into this. I've checked the current implementation and found that it really may emit late data. However, that was caused by the checkings below: https://github.com/apache/flink/blob/427dfe42e2bea891b40e662bc97cdea57cdae3f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/TimeBoundedStreamInnerJoin.scala#L173 and https://github.com/apache/flink/blob/427dfe42e2bea891b40e662bc97cdea57cdae3f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/TimeBoundedStreamInnerJoin.scala#L234 In some situations, they will not forbid the late rows from being calculated and emitted. Honestly, I cannot think out a solution in a short time. Do you want to continue working on that? Or I could take it over, if you don't mind. Thanks, Xingcan > Rowtime window inner join emits late data > ----------------------------------------- > > Key: FLINK-8158 > URL: https://issues.apache.org/jira/browse/FLINK-8158 > Project: Flink > Issue Type: Bug > Components: Table API & SQL > Reporter: Hequn Cheng > Assignee: Hequn Cheng > Attachments: screenshot-1xxx.png > > > When executing the join, the join operator needs to make sure that no late > data is emitted. Currently, this achieved by holding back watermarks. > However, the window border is not handled correctly. For the sql bellow: > {quote} > val sqlQuery = > """ > SELECT t2.key, t2.id, t1.id > FROM T1 as t1 join T2 as t2 ON > t1.key = t2.key AND > t1.rt BETWEEN t2.rt - INTERVAL '5' SECOND AND > t2.rt + INTERVAL '1' SECOND > """.stripMargin > val data1 = new mutable.MutableList[(String, String, Long)] > // for boundary test > data1.+=(("A", "LEFT1", 6000L)) > val data2 = new mutable.MutableList[(String, String, Long)] > data2.+=(("A", "RIGHT1", 6000L)) > {quote} > Join will output a watermark with timestamp 1000, but if left comes with > another data ("A", "LEFT1", 1000L), join will output a record with timestamp > 1000 which equals previous watermark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)