[ https://issues.apache.org/jira/browse/FLINK-37120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonard Xu reassigned FLINK-37120: ---------------------------------- Assignee: JunboWang > MySqlSnapshotSplitAssigner assign the ending chunk early to avoid TaskManager > OOM > --------------------------------------------------------------------------------- > > Key: FLINK-37120 > URL: https://issues.apache.org/jira/browse/FLINK-37120 > Project: Flink > Issue Type: Improvement > Components: Flink CDC > Affects Versions: cdc-3.2.1 > Reporter: JunboWang > Assignee: JunboWang > Priority: Minor > Labels: pull-request-available > > When synchronizing a large table, the ending chunk is always executed last, > and splitEnd is null. This causing the task to scan too much data and > eventually TaskManager OOM. > > Related log: > {code:java} > // code placeholder > 2025-01-13T20:52:01.926+0800: 136.022: [Full GC (Allocation Failure) > 2025-01-13T20:52:01.926+0800: 136.022: [Tenured: > 1713535K->1713535K(1713536K), 3.6111578 secs] 2484607K->2482139K(2484608K), > [Metaspace: 89121K->89121K(1130496K)], 3.6113026 secs] [Times: user=3.20 > sys=0.40, real=3.61 secs] > 2025-01-13T20:52:05.555+0800: 139.651: [Full GC (Allocation Failure) > 2025-01-13T20:52:05.555+0800: 139.651: [Tenured: > 1713535K->1713535K(1713536K), 3.9733441 secs] 2484607K->2482375K(2484608K), > [Metaspace: 89133K->89133K(1130496K)], 3.9734511 secs] [Times: user=3.52 > sys=0.45, real=3.98 secs] > 2025-01-13T20:52:09.548+0800: 143.644: [Full GC (Allocation Failure) > 2025-01-13T20:52:09.548+0800: 143.644: [Tenured: > 1713535K->1713535K(1713536K), 3.3805432 secs] 2484607K->2482897K(2484608K), > [Metaspace: 89134K->89134K(1130496K)], 3.3806496 secs] [Times: user=3.36 > sys=0.02, real=3.38 secs] {code} > {code:java} > // code placeholder > 2025-01-13 20:49:54,563 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.context.StatefulTaskContext > [] - Starting offset is initialized to {ts_sec=0, file=, pos=0, > kind=EARLIEST, row=0, event=0} > 2025-01-13 20:49:54,631 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Snapshot step 1 - Determining low watermark {ts_sec=0, > file=mysql-bin.xxxxx, pos=xxxxx, kind=SPECIFIC, gtids=xxxxxxxxx, row=0, > event=0} for split MySqlSnapshotSplit{tableId=xxxxdb.xxxxx_test_table, > splitId='xxxxdb.xxxxx_test_table:159959', splitKeyType=[`id` BIGINT NOT > NULL], splitStart=[1333738235], splitEnd=null, highWatermark=null} > 2025-01-13 20:49:54,636 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Snapshot step 2 - Snapshotting data > 2025-01-13 20:49:54,636 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exporting data from split 'xxxxdb.xxxxx_test_table:159959' of table > xxxxdb.xxxxx_test_table > 2025-01-13 20:49:54,637 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - For split 'xxxxdb.xxxxx_test_table:159959' of table > xxxxdb.xxxxx_test_table using select statement: 'SELECT * FROM > `xxxxdb`.`xxxxx_test_table` WHERE `id` >= ?' > 2025-01-13 20:50:17,482 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exported 167463 records for split 'xxxxdb.xxxxx_test_table:159959' > after 00:00:22.846 > 2025-01-13 20:50:31,627 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exported 409419 records for split 'xxxxdb.xxxxx_test_table:159959' > after 00:00:36.991 > 2025-01-13 20:50:41,805 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exported 510663 records for split 'xxxxdb.xxxxx_test_table:159959' > after 00:00:47.169 > 2025-01-13 20:50:55,184 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exported 588220 records for split 'xxxxdb.xxxxx_test_table:159959' > after 00:01:00.548 > 2025-01-13 20:51:05,580 INFO > org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask > [] - Exported 615374 records for split 'xxxxdb.xxxxx_test_table:159959' > after 00:01:10.944 {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)