[ https://issues.apache.org/jira/browse/FLINK-35291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-35291:
-----------------------------------
    Labels: pull-request-available  (was: )

> Improve the ROW data deserialization performance of DebeziumEventDeserializationScheme
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-35291
>                 URL: https://issues.apache.org/jira/browse/FLINK-35291
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: 1.20.0
>            Reporter: LiuZeshan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>         Attachments: cdc-3.0-1c-2.html, cdc-3.0-1c.html, image-2024-05-06-00-29-34-618.png, image-2024-05-06-00-37-16-028.png
>
>
> While performance testing Flink CDC 3.0, we profiled the job with Arthas and
> found a significant bottleneck in the serialization of row data. The main
> cost is the String.format call in the BinaryRecordDataGenerator class, so we
> have made a simple performance optimization.
> Test environment:
>  * flink: 1.20-SNAPSHOT master
>  * flink-cdc: 3.2-SNAPSHOT master
>  * 1CU minicluster mode
> {code:java}
> source:
>   type: mysql
>   hostname: localhost
>   port: 3308
>   username: root
>   password: 123456
>   tables: test.user_behavior
>   server-id: 5400-5404
>   #server-time-zone: UTC
>   scan.startup.mode: earliest-offset
>   debezium.poll.interval.ms: 10
> sink:
>   type: values
>   name: Values Sink
>   materialized.in.memory: false
>   print.enabled: false
> pipeline:
>   name: Sync MySQL Database to Values
>   parallelism: 1{code}
>
> *before optimization: 3.5w/s (w = 10,000, i.e. about 35,000/s)*
> !https://bytedance.larkoffice.com/space/api/box/stream/download/asynccode/?code=MTRjZGIyNWYyYmVlY2YwNDNmYjExZDE4MjRhMGYyYzlfcVRuM0JBYXpTem9qUWRxdkY0NGZmVkpWc1cxMnlzaE9fVG9rZW46RklTbWJUNkVYb2s0WGF4eEttWWN6M0hIbjJTXzE3MTQ5MjU4OTY6MTcxNDkyOTQ5Nl9WNA|width=361,height=179!
> [^cdc-3.0-1c.html]
> ^The flame graph shows that approximately 24.45% of the time is spent in
> String.format.^
> !image-2024-05-06-00-29-34-618.png|width=583,height=171!
>
> *after optimization: 5w/s (about 50,000/s)*
> !https://bytedance.larkoffice.com/space/api/box/stream/download/asynccode/?code=YjRkMDRmYTkzNzRiNjBmMzVmN2VlYTYyMGRmMGU0ZDRfcFIyNGNGMEViSzRjektpdVFWYTYyUnJQbWJjd1lnb3dfVG9rZW46V2ZXVGJ2T3lDb3dCSmF4WVZvTGMzc2h2bmpmXzE3MTQ5MjU5NTM6MTcxNDkyOTU1M19WNA|width=363,height=174!
>
> [^cdc-3.0-1c-2.html]
> After optimization, 4.7% of the time (extractBeforeDataRecord +
> extractAfterDataRecord) is still spent in
> org/apache/flink/cdc/runtime/typeutils/BinaryRecordDataGenerator.<init>.
> Perhaps this can be optimized further.
> !image-2024-05-06-00-37-16-028.png|width=379,height=107!
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
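
Note on the String.format cost described in the issue: the following is a minimal Java sketch, not the actual Flink CDC code or the patch attached to this ticket. It assumes the hot-path String.format call builds an error message for a precondition check that almost never fails. The class LazyMessageCheck, its methods, and the message template are hypothetical names introduced only for illustration. The sketch contrasts formatting the message on every call with formatting it only when the check fails.

```java
public class LazyMessageCheck {

    // Hypothetical message template; the real message text is not given in the issue.
    static final String TEMPLATE =
            "types and values must have the same length: types=%d, values=%d";

    /** Eager style: the caller has already paid for String.format before this runs. */
    public static void checkEager(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Lazy style: the template is formatted only when the check actually fails. */
    public static void checkLazy(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    /** Per-record path that formats the message on every call, even on success. */
    public static int generateEager(int typeCount, Object[] fields) {
        checkEager(typeCount == fields.length,
                String.format(TEMPLATE, typeCount, fields.length));
        return fields.length;
    }

    /** Per-record path that only boxes the arguments; no formatting on success. */
    public static int generateLazy(int typeCount, Object[] fields) {
        checkLazy(typeCount == fields.length, TEMPLATE, typeCount, fields.length);
        return fields.length;
    }

    public static void main(String[] args) {
        Object[] row = {1L, "click", 42};
        int iterations = 2_000_000;
        long sink = 0; // accumulate results so the JIT cannot drop the loops

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += generateEager(3, row);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += generateLazy(3, row);
        }
        long t2 = System.nanoTime();

        // Rough timing only; a harness such as JMH would give trustworthy numbers.
        System.out.printf("eager: %d ms, lazy: %d ms (sink=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
    }
}
```

The lazy variant still allocates a small varargs array and boxes the integers on every call, so it is not free, but it avoids parsing the format string per record. Whether this matches the change made for FLINK-35291 should be checked against the linked pull request.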