Roman Boyko created FLINK-34694:
-----------------------------------

             Summary: Delete num of associations for streaming outer join
                 Key: FLINK-34694
                 URL: https://issues.apache.org/jira/browse/FLINK-34694
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Runtime
    Affects Versions: 1.18.1, 1.19.0, 1.17.2, 1.16.3
            Reporter: Roman Boyko
         Attachments: image-2024-03-15-19-51-29-282.png, 
image-2024-03-15-19-52-24-391.png

Currently in StreamingJoinOperator (non-window) in case of OUTER JOIN the 
OuterJoinRecordStateView is used to store additional field - the number of 
associations for every record. This leads to store additional Tuple2 and 
Integer data for every record in outer state.

This functionality is used only for sending:
 * -D[nullPaddingRecord] in case of first Accumulate record
 * +I[nullPaddingRecord] in case of last Revoke record

The overhead of storing additional data and updating the counter for 
associations can be avoided by checking the input state for these events.

 

The proposed solution can be found here - 
[https://github.com/rovboyko/flink/commit/1ca2f5bdfc2d44b99d180abb6a4dda123e49d423]

 

According to the nexmark q20 test (changed to OUTER JOIN) it could increase the 
performance up to 20%:
 * Before:

!image-2024-03-15-19-52-24-391.png!
 * After:

!image-2024-03-15-19-51-29-282.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to