[ 
https://issues.apache.org/jira/browse/FLINK-35874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu resolved FLINK-35874.
--------------------------------
    Resolution: Implemented

Implemented via master(3.2-SNAPSHOT): ea71b2302ddc5f9b7be65843dbf3f5bed4ca9d8e

> Check pureBinlogPhaseTables set before call getBinlogPosition method in 
> BinlogSplitReader
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-35874
>                 URL: https://issues.apache.org/jira/browse/FLINK-35874
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>            Reporter: Zhongmin Qiao
>            Assignee: Zhongmin Qiao
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: cdc-3.2.0
>
>         Attachments: image-2024-07-22-19-26-59-158.png, 
> image-2024-07-22-19-27-19-366.png, image-2024-07-22-19-30-08-989.png, 
> image-2024-07-22-19-36-20-481.png, image-2024-07-22-19-36-40-581.png, 
> image-2024-07-22-19-37-35-542.png, image-2024-07-22-21-12-03-316.png
>
>
> The method getBinlogPosition of RecordUtil, which is called by 
> BinlogSplitReader.shouldEmit, is expensive. It iterates over the 
> sourceOffset map of the SourceRecord and calls toString() on each value 
> during the iteration. It then calls the putAll method of 
> BinlogOffsetBuilder to copy all the collected entries into the offsetMap 
> (which involves another map traversal and hash code computation). Despite 
> this cost, getBinlogPosition is currently called for every 
> DataChangeRecord that is emitted, which reduces the efficiency of data 
> processing in Flink CDC.
> !image-2024-07-22-19-26-59-158.png|width=545,height=222!
> !image-2024-07-22-19-27-19-366.png|width=545,height=119!
> However, we can avoid most invocations of getBinlogPosition by performing 
> the pureBinlogPhaseTables.contains(tableId) check, which currently lives 
> in the hasEnterPureBinlogPhase method, before calling getBinlogPosition. 
> This way, if the SourceRecord belongs to a table that is already in the 
> pure binlog phase, we can return true directly and skip the expensive 
> getBinlogPosition call.
> Diff:
> !image-2024-07-22-21-12-03-316.png|width=548,height=236!
>  
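
The reordering described in the issue can be sketched as follows. This is a minimal, self-contained illustration and not the actual Flink CDC code: the class PureBinlogCheckSketch, the String table ids, the positionLookups counter, and the passesRemainingChecks placeholder are all assumptions made for the example. Only the names pureBinlogPhaseTables, getBinlogPosition, and shouldEmit come from the issue text, and their bodies here are simplified stand-ins.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PureBinlogCheckSketch {
    // Illustration only: counts how often the expensive conversion runs.
    static int positionLookups = 0;

    // Tables already known to be in the pure binlog phase.
    static final Set<String> pureBinlogPhaseTables = new HashSet<>();

    // Hypothetical stand-in for the costly getBinlogPosition call:
    // walks the offset map, stringifies every value, and copies the
    // entries into a new map.
    static Map<String, String> getBinlogPosition(Map<String, ?> sourceOffset) {
        positionLookups++;
        Map<String, String> offsetMap = new HashMap<>();
        for (Map.Entry<String, ?> entry : sourceOffset.entrySet()) {
            offsetMap.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return offsetMap;
    }

    // Cheap set lookup first; the expensive conversion runs only for
    // tables that are not yet in the pure binlog phase.
    static boolean shouldEmit(String tableId, Map<String, ?> sourceOffset) {
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true;
        }
        Map<String, String> position = getBinlogPosition(sourceOffset);
        return passesRemainingChecks(tableId, position);
    }

    // Hypothetical placeholder for the position-based checks that still
    // need the binlog position; always false in this sketch.
    static boolean passesRemainingChecks(String tableId, Map<String, String> position) {
        return false;
    }
}
```

With a table id present in pureBinlogPhaseTables, shouldEmit returns true without touching the offset map, so the per-record cost for such tables drops to a single hash set lookup.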



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
