[ 
https://issues.apache.org/jira/browse/FLINK-37120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Xu reassigned FLINK-37120:
----------------------------------

    Assignee: JunboWang

> MySqlSnapshotSplitAssigner assign the ending chunk early to avoid TaskManager 
> OOM
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-37120
>                 URL: https://issues.apache.org/jira/browse/FLINK-37120
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: cdc-3.2.1
>            Reporter: JunboWang
>            Assignee: JunboWang
>            Priority: Minor
>              Labels: pull-request-available
>
> When synchronizing a large table, the ending chunk is always executed last, 
> and its splitEnd is null. This causes the task to scan too much data and 
> eventually leads to a TaskManager OOM.
>  
> Related log:
> {code:java}
> 2025-01-13T20:52:01.926+0800: 136.022: [Full GC (Allocation Failure) 
> 2025-01-13T20:52:01.926+0800: 136.022: [Tenured: 
> 1713535K->1713535K(1713536K), 3.6111578 secs] 2484607K->2482139K(2484608K), 
> [Metaspace: 89121K->89121K(1130496K)], 3.6113026 secs] [Times: user=3.20 
> sys=0.40, real=3.61 secs]
> 2025-01-13T20:52:05.555+0800: 139.651: [Full GC (Allocation Failure) 
> 2025-01-13T20:52:05.555+0800: 139.651: [Tenured: 
> 1713535K->1713535K(1713536K), 3.9733441 secs] 2484607K->2482375K(2484608K), 
> [Metaspace: 89133K->89133K(1130496K)], 3.9734511 secs] [Times: user=3.52 
> sys=0.45, real=3.98 secs]
> 2025-01-13T20:52:09.548+0800: 143.644: [Full GC (Allocation Failure) 
> 2025-01-13T20:52:09.548+0800: 143.644: [Tenured: 
> 1713535K->1713535K(1713536K), 3.3805432 secs] 2484607K->2482897K(2484608K), 
> [Metaspace: 89134K->89134K(1130496K)], 3.3806496 secs] [Times: user=3.36 
> sys=0.02, real=3.38 secs] {code}
> {code:java}
> 2025-01-13 20:49:54,563 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.context.StatefulTaskContext
>  [] - Starting offset is initialized to {ts_sec=0, file=, pos=0, 
> kind=EARLIEST, row=0, event=0}
> 2025-01-13 20:49:54,631 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Snapshot step 1 - Determining low watermark {ts_sec=0, 
> file=mysql-bin.xxxxx, pos=xxxxx, kind=SPECIFIC, gtids=xxxxxxxxx, row=0, 
> event=0} for split MySqlSnapshotSplit{tableId=xxxxdb.xxxxx_test_table, 
> splitId='xxxxdb.xxxxx_test_table:159959', splitKeyType=[`id` BIGINT NOT 
> NULL], splitStart=[1333738235], splitEnd=null, highWatermark=null}
> 2025-01-13 20:49:54,636 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Snapshot step 2 - Snapshotting data
> 2025-01-13 20:49:54,636 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exporting data from split 'xxxxdb.xxxxx_test_table:159959' of table 
> xxxxdb.xxxxx_test_table
> 2025-01-13 20:49:54,637 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - For split 'xxxxdb.xxxxx_test_table:159959' of table 
> xxxxdb.xxxxx_test_table using select statement: 'SELECT * FROM 
> `xxxxdb`.`xxxxx_test_table` WHERE `id` >= ?'
> 2025-01-13 20:50:17,482 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exported 167463 records for split 'xxxxdb.xxxxx_test_table:159959' 
> after 00:00:22.846
> 2025-01-13 20:50:31,627 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exported 409419 records for split 'xxxxdb.xxxxx_test_table:159959' 
> after 00:00:36.991
> 2025-01-13 20:50:41,805 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exported 510663 records for split 'xxxxdb.xxxxx_test_table:159959' 
> after 00:00:47.169
> 2025-01-13 20:50:55,184 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exported 588220 records for split 'xxxxdb.xxxxx_test_table:159959' 
> after 00:01:00.548
> 2025-01-13 20:51:05,580 INFO 
> org.apache.flink.cdc.connectors.mysql.debezium.task.MySqlSnapshotSplitReadTask
>  [] - Exported 615374 records for split 'xxxxdb.xxxxx_test_table:159959' 
> after 00:01:10.944 {code}
>  
>  
>  
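
The reordering named in the issue title can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the Flink CDC implementation: `Chunk`, `whereClause`, and `endingChunkFirst` are hypothetical names, and the WHERE clauses are simplified. It only models what the log above shows, namely that the ending chunk has splitEnd=null and is read with an unbounded `WHERE id >= ?` query, and it hands that chunk out before the bounded ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EndingChunkFirst {

    /** Hypothetical stand-in for a snapshot split; a null bound means "unbounded". */
    public static final class Chunk {
        public final Long start; // null for the first chunk
        public final Long end;   // null for the ending chunk

        public Chunk(Long start, Long end) {
            this.start = start;
            this.end = end;
        }

        /** True when the chunk has no upper bound (splitEnd == null). */
        public boolean isEnding() {
            return end == null;
        }
    }

    /** Simplified WHERE clause for a chunk on an assumed key column `id`. */
    public static String whereClause(Chunk c) {
        if (c.start == null && c.end == null) {
            return ""; // single chunk covering the whole table
        }
        if (c.start == null) {
            return "WHERE `id` < ?";
        }
        if (c.end == null) {
            return "WHERE `id` >= ?"; // unbounded, as in the log above
        }
        return "WHERE `id` >= ? AND `id` < ?";
    }

    /** Returns a new list with unbounded chunks first; the others keep their order. */
    public static List<Chunk> endingChunkFirst(List<Chunk> chunks) {
        List<Chunk> ordered = new ArrayList<>(chunks.size());
        for (Chunk c : chunks) {
            if (c.isEnding()) {
                ordered.add(c);
            }
        }
        for (Chunk c : chunks) {
            if (!c.isEnding()) {
                ordered.add(c);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = Arrays.asList(
                new Chunk(null, 1000L),
                new Chunk(1000L, 2000L),
                new Chunk(2000L, null));
        for (Chunk c : endingChunkFirst(chunks)) {
            System.out.println("[" + c.start + ", " + c.end + ") " + whereClause(c));
        }
    }
}
```

Running `main` lists the unbounded chunk first, followed by the bounded chunks in their original order. Why early assignment reduces the scanned data is not spelled out in the issue; one plausible reading, stated here as an assumption, is that fewer rows have been appended past the chunk's splitStart when it is read early.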



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
