Yanquan Lv created FLINK-36682:
----------------------------------

             Summary: Add split assign strategy to avoid OOM error in TaskManager
                 Key: FLINK-36682
                 URL: https://issues.apache.org/jira/browse/FLINK-36682
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.3.0
            Reporter: Yanquan Lv
             Fix For: cdc-3.3.0


During the snapshot reading phase, we split the table into chunks and assign 
them to the split readers in the TaskManager.

With even chunk splitting, the chunks are assigned in ascending order. For 
example, a table whose primary key is id may be split into chunks such as 
[-∞, 10000), [10000, 20000), [20000, 30000), ..., [1500000, +∞). However, 
while the snapshot is being read, more records may be inserted and id may grow 
much higher, so the last split may need to fetch far too many records. For 
example, the last split may need to fetch records in the range 
[1500000, 3000000], which can cause the TaskManager to run out of memory.

So I propose adding a strategy that lets users configure how splits are 
assigned. By default, we can send the last split to the split reader first.
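
To make the proposed ordering concrete, here is a minimal sketch in plain Java. 
It is not Flink CDC code: the class SplitAssignSketch, the enum AssignStrategy 
with its two values, and the helpers evenChunks and assignmentOrder are all 
hypothetical names chosen for illustration. The sketch only shows the unbounded 
last chunk being moved to the front of the assignment order, presumably so that 
it is read before many new rows accumulate in it.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; none of these names come from Flink CDC. */
public class SplitAssignSketch {

    /** Hypothetical strategy names for the proposed option. */
    enum AssignStrategy { ASCENDING_ORDER, LAST_SPLIT_FIRST }

    /** A chunk [start, end); null means unbounded on that side. */
    static final class Chunk {
        final Long start; // inclusive lower bound, null = -inf
        final Long end;   // exclusive upper bound, null = +inf

        Chunk(Long start, Long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + (start == null ? "-inf" : start)
                    + ", " + (end == null ? "+inf" : end) + ")";
        }
    }

    /** Builds evenly sized chunks in ascending order, open at both ends. */
    static List<Chunk> evenChunks(long firstBoundary, long lastBoundary, long step) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk(null, firstBoundary));
        for (long lo = firstBoundary; lo < lastBoundary; lo += step) {
            chunks.add(new Chunk(lo, Math.min(lo + step, lastBoundary)));
        }
        chunks.add(new Chunk(lastBoundary, null));
        return chunks;
    }

    /** Returns a copy of the chunks in the order they would be assigned. */
    static List<Chunk> assignmentOrder(List<Chunk> ascending, AssignStrategy strategy) {
        List<Chunk> ordered = new ArrayList<>(ascending);
        if (strategy == AssignStrategy.LAST_SPLIT_FIRST && ordered.size() > 1) {
            // Move the unbounded last chunk to the front; the rest stay ascending.
            ordered.add(0, ordered.remove(ordered.size() - 1));
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Boundaries taken from the example in this issue.
        List<Chunk> chunks = evenChunks(10_000L, 1_500_000L, 10_000L);
        List<Chunk> ordered = assignmentOrder(chunks, AssignStrategy.LAST_SPLIT_FIRST);
        System.out.println(ordered.get(0)); // [1500000, +inf)
        System.out.println(ordered.get(1)); // [-inf, 10000)
    }
}
```

With ASCENDING_ORDER the list is returned unchanged, which corresponds to the 
current behavior described above. How the real option would be named and 
exposed to users is left open here.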



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
