LiuZeshan created FLINK-35291:
---------------------------------

             Summary: Improve the ROW data deserialization performance of 
DebeziumEventDeserializationScheme
                 Key: FLINK-35291
                 URL: https://issues.apache.org/jira/browse/FLINK-35291
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: 1.20.0
            Reporter: LiuZeshan
             Fix For: 1.20.0
         Attachments: cdc-3.0-1c-2.html, cdc-3.0-1c.html, 
image-2024-05-06-00-29-34-618.png, image-2024-05-06-00-37-16-028.png

We are performance testing Flink CDC 3.0 and found, using the Arthas profiler, 
a significant performance bottleneck in the serialization of row data. The main 
cost is the String.format call in the BinaryRecordDataGenerator class, so we 
have made a simple performance optimization.
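
One common way for String.format to end up on a hot path is when an error 
message is formatted eagerly for a check that almost always passes. The sketch 
below only illustrates that general pattern. It is not the actual 
BinaryRecordDataGenerator code, and the names checkEager, checkLazy, types and 
serializers are hypothetical.

```java
import java.util.function.Supplier;

final class PreconditionFormatSketch {

    // Eager style: the caller formats the message before the check runs,
    // so String.format is paid on every call, even when the check passes.
    static void checkEager(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Lazy style: the message is built only on the failure path.
    static void checkLazy(boolean condition, Supplier<String> message) {
        if (!condition) {
            throw new IllegalArgumentException(message.get());
        }
    }

    public static void main(String[] args) {
        int types = 8;        // hypothetical field count
        int serializers = 8;  // hypothetical serializer count
        int iterations = 1_000_000;

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checkEager(
                    types == serializers,
                    String.format("types=%d, serializers=%d", types, serializers));
        }
        long eagerNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checkLazy(
                    types == serializers,
                    () -> String.format("types=%d, serializers=%d", types, serializers));
        }
        long lazyNanos = System.nanoTime() - start;

        // Timings are machine dependent; this is a rough comparison, not a benchmark.
        System.out.printf("eager: %d ms, lazy: %d ms%n",
                eagerNanos / 1_000_000, lazyNanos / 1_000_000);
    }
}
```

If the real hot spot has this shape, deferring the formatting to the failure 
path removes the cost from the per-record path without changing the error 
message.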

Test environment:
 * flink: 1.20-SNAPSHOT master
 * flink-cdc: 3.2-SNAPSHOT master
 * 1CU minicluster mode

{code:yaml}
source:
  type: mysql
  hostname: localhost
  port: 3308
  username: root
  password: 123456
  tables: test.user_behavior
  server-id: 5400-5404
  #server-time-zone: UTC
  scan.startup.mode: earliest-offset
  debezium.poll.interval.ms: 10

sink:
  type: values
  name: Values Sink
  materialized.in.memory: false
  print.enabled: false

pipeline:
  name: Sync MySQL Database to Values
  parallelism: 1{code}
 

*Before optimization: 3.5w/s (about 35,000 records/s)* 
!https://bytedance.larkoffice.com/space/api/box/stream/download/asynccode/?code=MTRjZGIyNWYyYmVlY2YwNDNmYjExZDE4MjRhMGYyYzlfcVRuM0JBYXpTem9qUWRxdkY0NGZmVkpWc1cxMnlzaE9fVG9rZW46RklTbWJUNkVYb2s0WGF4eEttWWN6M0hIbjJTXzE3MTQ5MjU4OTY6MTcxNDkyOTQ5Nl9WNA|width=361,height=179!

[^cdc-3.0-1c.html]

The flame graph shows that approximately 24.45% of the time is spent in 
String.format.

!image-2024-05-06-00-29-34-618.png|width=583,height=171!

 

*After optimization: 5w/s (about 50,000 records/s)* 

!https://bytedance.larkoffice.com/space/api/box/stream/download/asynccode/?code=YjRkMDRmYTkzNzRiNjBmMzVmN2VlYTYyMGRmMGU0ZDRfcFIyNGNGMEViSzRjektpdVFWYTYyUnJQbWJjd1lnb3dfVG9rZW46V2ZXVGJ2T3lDb3dCSmF4WVZvTGMzc2h2bmpmXzE3MTQ5MjU5NTM6MTcxNDkyOTU1M19WNA|width=363,height=174!
 
 [^cdc-3.0-1c-2.html]

After optimization, 4.7% of the time (extractBeforeDataRecord and 
extractAfterDataRecord combined) is still spent in 
org/apache/flink/cdc/runtime/typeutils/BinaryRecordDataGenerator.<init>. 
Perhaps this can be optimized further.

!image-2024-05-06-00-37-16-028.png|width=379,height=107!
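
One possible direction for the remaining constructor cost, offered only as a 
sketch, is to build the generator once per schema and reuse it instead of 
constructing one per record. The RowGenerator class and generatorFor method 
below are hypothetical stand-ins, not Flink CDC APIs, and the sketch assumes 
the cache is owned by a single deserializer instance and used from one thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GeneratorCacheSketch {

    // Hypothetical stand-in for an object whose constructor is expensive.
    static final class RowGenerator {
        private final int arity;

        RowGenerator(List<String> fieldTypes) {
            this.arity = fieldTypes.size();
        }

        Object[] generate(Object... values) {
            if (values.length != arity) {
                throw new IllegalArgumentException(
                        "expected " + arity + " values but got " + values.length);
            }
            return values.clone();
        }
    }

    // Keyed by the list of field types; List equality is content based.
    private final Map<List<String>, RowGenerator> cache = new HashMap<>();
    private int constructions = 0;

    // Returns the cached generator for this schema, constructing it only once.
    RowGenerator generatorFor(List<String> fieldTypes) {
        RowGenerator cached = cache.get(fieldTypes);
        if (cached == null) {
            constructions++;
            cached = new RowGenerator(fieldTypes);
            // Store a defensive copy so later changes to the caller's list
            // cannot corrupt the map key.
            cache.put(new ArrayList<>(fieldTypes), cached);
        }
        return cached;
    }

    int constructions() {
        return constructions;
    }

    public static void main(String[] args) {
        GeneratorCacheSketch sketch = new GeneratorCacheSketch();
        List<String> schema = Arrays.asList("INT", "STRING");
        for (int i = 0; i < 1000; i++) {
            sketch.generatorFor(schema).generate(i, "row-" + i);
        }
        System.out.println("constructions: " + sketch.constructions()); // prints "constructions: 1"
    }
}
```

Whether this applies depends on whether the real generator can safely be 
reused across records, which would need to be checked against the actual 
class.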

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
