Zhongmin Qiao created FLINK-35874:
-------------------------------------

             Summary: Check pureBinlogPhaseTables set before calling 
getBinlogPosition method in BinlogSplitReader
                 Key: FLINK-35874
                 URL: https://issues.apache.org/jira/browse/FLINK-35874
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
            Reporter: Zhongmin Qiao
         Attachments: image-2024-07-22-19-26-59-158.png, 
image-2024-07-22-19-27-19-366.png, image-2024-07-22-19-30-08-989.png, 
image-2024-07-22-19-36-20-481.png, image-2024-07-22-19-36-40-581.png, 
image-2024-07-22-19-37-35-542.png, image-2024-07-22-21-12-03-316.png

The getBinlogPosition method of RecordUtil, which is called by 
BinlogSplitReader.shouldEmit, is expensive. It iterates through the 
sourceOffset map of the SourceRecord and calls toString() on each value during 
the iteration. It then calls the putAll method of BinlogOffsetBuilder to copy 
all of the collected elements into the offsetMap, which involves another map 
traversal and hash code computation. Despite this cost, getBinlogPosition is 
currently called for every DataChangeRecord that is emitted, which reduces the 
data processing efficiency of Flink CDC.
!image-2024-07-22-19-26-59-158.png|width=545,height=222!

!image-2024-07-22-19-27-19-366.png|width=545,height=119!

However, we can avoid most invocations of getBinlogPosition by performing the 
pureBinlogPhaseTables.contains(tableId) check, which currently lives in the 
hasEnterPureBinlogPhase method, before getBinlogPosition is called. This way, 
if the SourceRecord belongs to a table that is already in the pure binlog 
phase, we can return true directly without paying for getBinlogPosition.
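
The reordering can be illustrated with a minimal, self-contained sketch. It 
does not reproduce the actual BinlogSplitReader code: the class name 
PureBinlogCheckSketch, the String table ids, the long positions, the "pos" key 
and the offsetExtractions counter are assumptions made for the example, and the 
real shouldEmit performs further checks that are omitted here. It only shows 
that a cheap set lookup placed first keeps the expensive offset extraction off 
the hot path once a table has entered the pure binlog phase.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PureBinlogCheckSketch {
    // Counts calls to the expensive extraction (illustration only).
    static int offsetExtractions = 0;

    // Simplified stand-ins for the reader's state; the real code uses
    // TableId and BinlogOffset rather than String and long.
    static final Set<String> pureBinlogPhaseTables = new HashSet<>();
    static final Map<String, Long> maxSplitHighWatermarkMap = new HashMap<>();

    // Hypothetical stand-in for getBinlogPosition: walks the sourceOffset
    // map, converts every value with toString(), and copies it into a new map.
    static long getBinlogPosition(Map<String, ?> sourceOffset) {
        offsetExtractions++;
        Map<String, String> offsetMap = new HashMap<>();
        for (Map.Entry<String, ?> entry : sourceOffset.entrySet()) {
            offsetMap.put(entry.getKey(), entry.getValue().toString());
        }
        return Long.parseLong(offsetMap.get("pos"));
    }

    // Simplified shouldEmit: the set lookup runs before the extraction.
    static boolean shouldEmit(String tableId, Map<String, ?> sourceOffset) {
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true; // fast path, no offset extraction needed
        }
        long position = getBinlogPosition(sourceOffset);
        Long highWatermark = maxSplitHighWatermarkMap.get(tableId);
        if (highWatermark != null && position >= highWatermark) {
            pureBinlogPhaseTables.add(tableId);
            return true;
        }
        return false; // the real method continues with further checks
    }

    public static void main(String[] args) {
        maxSplitHighWatermarkMap.put("db.orders", 100L);
        Map<String, Object> offset = new HashMap<>();
        offset.put("file", "mysql-bin.000001");
        offset.put("pos", 150L);
        for (int i = 0; i < 1000; i++) {
            shouldEmit("db.orders", offset);
        }
        // Only the first record pays for the extraction.
        System.out.println("offset extractions: " + offsetExtractions);
    }
}
```

In this sketch, 1000 records for the same table trigger a single extraction; 
with the original ordering each record would trigger one.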

Diff:

!image-2024-07-22-21-12-03-316.png|width=548,height=236!

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
