Rui Fan created FLINK-33354:
-------------------------------
Summary: Reuse the TaskInformation for multiple slots
Key: FLINK-33354
URL: https://issues.apache.org/jira/browse/FLINK-33354
Project: Flink
Issue Type: Sub-task
Components: Runtime / Task
Affects Versions: 1.17.1, 1.18.0
Reporter: Rui Fan
Assignee: Rui Fan
The background is similar to FLINK-33315.
A Hive table has a lot of data, and its HiveSource#partitionBytes is 281 MB.
When slotPerTM = 4, one TM will run 4 HiveSources at the same time.
How does the TaskExecutor submit a large task?
# TaskExecutor#loadBigData reads all bytes from the blob file into a SerializedValue<TaskInformation>
** The SerializedValue<TaskInformation> holds a byte[]
** It costs heap memory
** It is greater than 281 MB, because it stores not only HiveSource#partitionBytes but also the other fields of TaskInformation
# The TaskInformation is generated from the SerializedValue<TaskInformation>
** TaskExecutor#submitTask calls tdd.getSerializedTaskInformation().deserializeValue()
** tdd.getSerializedTaskInformation() is the SerializedValue<TaskInformation>
** It generates the TaskInformation
** The TaskInformation includes the Configuration taskConfiguration
** The taskConfiguration includes StreamConfig#SERIALIZEDUDF

Based on the above process, TM memory holds 2 big byte arrays for each task:
* The SerializedValue<TaskInformation>
* The deserialized TaskInformation (its taskConfiguration carries the serialized UDF bytes)
When one TM runs 4 HiveSources at the same time, it will have 8 big byte arrays, at least 8 x 281 MB (about 2.2 GB) of heap just for these copies.
In our production environment, this situation often leads to TM OOM. A simplified sketch of this path is shown below.
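The following is a minimal, self-contained illustration of why both copies are alive at the same time. It is not actual Flink code: a local file and plain Java serialization stand in for the BlobServer blob and Flink's SerializedValue<TaskInformation>.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: mimics the shape of the current submission path.
// "taskInfoFile" stands in for the offloaded TaskInformation blob.
public class TwoCopiesPerSubmission {

    public static void main(String[] args) throws Exception {
        Path taskInfoFile = Paths.get(args[0]);

        // Copy 1: the raw serialized bytes (analogous to SerializedValue<TaskInformation>),
        // loaded fully into heap by the equivalent of TaskExecutor#loadBigData.
        byte[] serialized = Files.readAllBytes(taskInfoFile);

        // Copy 2: the deserialized object graph (analogous to TaskInformation, whose
        // taskConfiguration holds the serialized UDF bytes). Both copies are alive at once.
        Object taskInformation;
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            taskInformation = in.readObject();
        }

        // With 4 slots submitting the same large task, this pair exists 4 times per TM.
        System.out.println("serialized bytes: " + serialized.length
                + ", deserialized: " + taskInformation.getClass());
    }
}
{code}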
h2. Solution:
These data are exactly the same because the PermanentBlobKey is the same. We can add a cache for them to reduce the memory and CPU cost.
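A minimal sketch of the idea, assuming a cache in the TaskExecutor keyed by the PermanentBlobKey. The key and loader types here are placeholders, not the real Flink classes:
{code:java}
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch: tasks whose TaskInformation comes from the same PermanentBlobKey share one
// deserialized copy instead of each building their own. In Flink this would live in the
// TaskExecutor and be keyed by PermanentBlobKey; here K and V are generic placeholders.
public final class TaskInformationCache<K, V> {

    private final ConcurrentHashMap<K, SoftReference<V>> cache = new ConcurrentHashMap<>();

    /**
     * Returns the cached value for the given blob key, deserializing at most once while the
     * value stays reachable; a SoftReference lets the JVM drop it under memory pressure.
     * (A race may deserialize twice; acceptable for a sketch.)
     */
    public V getOrLoad(K blobKey, Supplier<V> loader) {
        SoftReference<V> ref = cache.get(blobKey);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.get();
            cache.put(blobKey, new SoftReference<>(value));
        }
        return value;
    }

    /** Drops the entry, e.g. when the last task for the job leaves the TaskExecutor. */
    public void release(K blobKey) {
        cache.remove(blobKey);
    }
}
{code}
With something like this, the 4 HiveSource submissions on one TM that share the same PermanentBlobKey would deserialize the TaskInformation once and reuse it, avoiding both the per-slot byte arrays and the per-slot deserialization CPU cost.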