[ https://issues.apache.org/jira/browse/FLINK-23740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400932#comment-17400932 ]
Fu Kai commented on FLINK-23740: -------------------------------- Hi [~TsReaper] Thank you for the response. # The update we expect is that updates in source table can be reflected in the final output record(existing ones in our case). The data causing problem is not special with others and has strict schema enforcement with Avro; # We do not have any TTL setup for the states. The left join one has running for month, and there is no such issue detected and it appears in full outer join after 1 or 2 days; We're currently going to explore to exposing the JOIN states in state views with query stable interface and inspect the state in each operator stage by stage to see where the update gets lost. Except for this approach, do you have any other suggestion on how to better inspect the internal states, or any other ways to trouble shooting the problem? > SQL Full Outer Join bug > ----------------------- > > Key: FLINK-23740 > URL: https://issues.apache.org/jira/browse/FLINK-23740 > Project: Flink > Issue Type: Bug > Components: Table SQL / Runtime > Affects Versions: 1.13.1, 1.13.2 > Reporter: Fu Kai > Priority: Critical > > Hi team, > We encountered an issue about FULL OUTER JOIN of Flink SQL, which happens > occasionally at very low probability that join output records cannot be > correctly updated. We cannot locate the root cause for now by glancing at the > SQL join logic in > [StreamingJoinOperator.|https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/join/stream/StreamingJoinOperator.java#L198] > It cannot be stably reproduced and it does happen with massive data volume. > The reason we suspect it's the FULL OUER join problem instead of others like > LEFT OUTER join is because the issue only arises after we introduced FULL > OUTER into the join flow. The query we are using is like the following. The > are two join code pieces below, the fist one contains solely left join(though > with nested) and there is no issue detected; the second one contains both > left and full outer join(nested as well), and the problem is that sometimes > update from the left table A(and other tables before the full outer join > operator) cannot be reflected in the final output. We suspect it could be the > introduce of full outer join that caused the problem, although at a very low > probability(~10 out of ~30million). > The root cause of the bug could be something else, the suspecting of FULL OUT > join is based on the result of our current experiment and observation. > {code:java} > create table A( > k1 int, > k2 int, > k3 int, > k4 int, > k5 int, > PRIMARY KEY (k1, k2, k3, k4, k5) NOT ENFORCED > ) WITH (); > create table B( > k1 int, > k2 int, > k3 int, > PRIMARY KEY (k1, k2, k3) NOT ENFORCED > ) WITH (); > create table C( > k1 int, > k2 int, > k3 int, > PRIMARY KEY (k1, k2, k3) NOT ENFORCED > ) WITH (); > create table D( > k1 int, > k2 int, > PRIMARY KEY (k1, k2) NOT ENFORCED > ) WITH (); > // query with left join, no issue detected > select * from A > left outer join > (select * from B > left outer join C > on > B.k1 = C.k1 > B.k2 = C.k2 > B.k3 = C.k3 > ) as BC > on > A.k1 = BC.k1 > A.k2 = BC.k2 > A.k3 = BC.k3 > left outer join D > on > A.k1 = D.k1 > A.k2 = D.k2 > ; > // query with full outer join combined with left outer join, record updates > from left table A cannot be updated in the final output record some times > select * from A > left outer join > (select * from B > full outer join C > on > B.k1 = C.k1 > B.k2 = C.k2 > B.k3 = C.k3 > ) as BC > on > A.k1 = BC.k1 > A.k2 = BC.k2 > A.k3 = BC.k3 > left outer join D > on > A.k1 = D.k1 > A.k2 = D.k2 > ; > {code} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)