[ https://issues.apache.org/jira/browse/FLINK-35874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonard Xu resolved FLINK-35874.
--------------------------------
    Resolution: Implemented

Implemented via master(3.2-SNAPSHOT): ea71b2302ddc5f9b7be65843dbf3f5bed4ca9d8e

> Check pureBinlogPhaseTables set before call getBinlogPosition method in BinlogSplitReader
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-35874
>                 URL: https://issues.apache.org/jira/browse/FLINK-35874
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>            Reporter: Zhongmin Qiao
>            Assignee: Zhongmin Qiao
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: cdc-3.2.0
>
>         Attachments: image-2024-07-22-19-26-59-158.png,
> image-2024-07-22-19-27-19-366.png, image-2024-07-22-19-30-08-989.png,
> image-2024-07-22-19-36-20-481.png, image-2024-07-22-19-36-40-581.png,
> image-2024-07-22-19-37-35-542.png, image-2024-07-22-21-12-03-316.png
>
>
> The getBinlogPosition method of RecordUtil, which is called by
> BinlogSplitReader.shouldEmit, is expensive. It iterates through the
> sourceOffset map of the SourceRecord and performs a toString() conversion
> on each value during the iteration. It then calls the putAll method of
> BinlogOffsetBuilder to put all the elements obtained from the iteration
> into the offsetMap, which involves another map traversal and hashcode
> computation. Despite this cost, getBinlogPosition is currently called for
> every DataChangeRecord that is emitted, which reduces the efficiency of
> data processing in Flink CDC.
> !image-2024-07-22-19-26-59-158.png|width=545,height=222!
> !image-2024-07-22-19-27-19-366.png|width=545,height=119!
> However, we can avoid these frequent invocations of getBinlogPosition by
> moving the pureBinlogPhaseTables.contains(tableId) check in the
> hasEnterPureBinlogPhase method so that it runs before getBinlogPosition
> is called.
> This way, if the SourceRecord belongs to a pure binlog phase table, we can
> return true directly without calling the expensive getBinlogPosition
> method.
> diff
> !image-2024-07-22-21-12-03-316.png|width=548,height=236!
>
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
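
The reordering described in the issue can be illustrated with a minimal, self-contained Java sketch. It is not the actual Flink CDC code: the class PureBinlogCheckSketch, the String table ids, the "pos" offset key, the isAfterHighWatermark helper and the step that adds a table to the set once it passes its high watermark are all assumptions made for illustration. Only the idea comes from the description above: test pureBinlogPhaseTables.contains(tableId) first, and build the binlog position only when that cheap lookup fails.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for the check order; not the real BinlogSplitReader. */
final class PureBinlogCheckSketch {

    /** Counts calls to the expensive conversion so the saving is observable. */
    static int positionCalls = 0;

    /**
     * Hypothetical stand-in for getBinlogPosition: walks the source offset map
     * and converts every value to a String, as the issue describes.
     */
    static Map<String, String> getBinlogPosition(Map<String, ?> sourceOffset) {
        positionCalls++;
        Map<String, String> offsetMap = new HashMap<>();
        for (Map.Entry<String, ?> entry : sourceOffset.entrySet()) {
            offsetMap.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return offsetMap;
    }

    /** Hypothetical placeholder for the position comparison (assumed "pos" key). */
    static boolean isAfterHighWatermark(Map<String, String> position, long highWatermarkPos) {
        return Long.parseLong(position.getOrDefault("pos", "0")) >= highWatermarkPos;
    }

    static boolean shouldEmit(
            String tableId,
            Map<String, ?> sourceOffset,
            Set<String> pureBinlogPhaseTables,
            long highWatermarkPos) {
        // Cheap set lookup first: tables already in the pure binlog phase
        // never need their position computed.
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true;
        }
        // Only now pay for the expensive conversion.
        Map<String, String> position = getBinlogPosition(sourceOffset);
        if (isAfterHighWatermark(position, highWatermarkPos)) {
            // Assumption: remember the table so later records take the fast path.
            pureBinlogPhaseTables.add(tableId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> pureBinlogPhaseTables = new HashSet<>();
        Map<String, Object> offset = new HashMap<>();
        offset.put("file", "mysql-bin.000003");
        offset.put("pos", 1500L);

        for (int i = 0; i < 3; i++) {
            shouldEmit("db.orders", offset, pureBinlogPhaseTables, 1000L);
        }
        System.out.println("getBinlogPosition calls: " + positionCalls);
    }
}
```

In this sketch, three records for the same table trigger the position conversion only once; the remaining two are answered by the set lookup alone.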