[ https://issues.apache.org/jira/browse/FLINK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-33354:
-----------------------------------
    Labels: pull-request-available  (was: )

> Cache TaskInformation and JobInformation to avoid deserializing duplicate big objects
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-33354
>                 URL: https://issues.apache.org/jira/browse/FLINK-33354
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Task
>    Affects Versions: 1.18.0, 1.17.1
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major
>              Labels: pull-request-available
>
> The background is similar to FLINK-33315.
> A Hive table holds a lot of data, and HiveSource#partitionBytes is 281 MB.
> When slotPerTM = 4, one TM runs 4 HiveSources at the same time.
>
> How does the TaskExecutor submit a large task?
> # TaskExecutor#loadBigData reads all bytes from the file into a SerializedValue<TaskInformation>
> ** The SerializedValue<TaskInformation> holds a byte[]
> ** It consumes heap memory
> ** It is larger than 281 MB, because it stores not only HiveSource#partitionBytes but also the other information in TaskInformation.
> # Generate the TaskInformation from the SerializedValue<TaskInformation>
> ** TaskExecutor#submitTask calls tdd.getSerializedTaskInformation().deserializeValue()
> ** tdd.getSerializedTaskInformation() is the SerializedValue<TaskInformation>
> ** It generates the TaskInformation
> ** TaskInformation includes the Configuration taskConfiguration
> ** The taskConfiguration includes StreamConfig#SERIALIZEDUDF
>
> Based on the above process, TM memory holds 2 big byte arrays for each task:
> * The SerializedValue<TaskInformation>
> * The TaskInformation
> When one TM runs 4 HiveSources at the same time, it holds 8 big byte arrays.
> In our production environment, this situation often leads to TM OOM.
> h2. Solution:
> This data is identical for every task because the PermanentBlobKey is the same. We can add a cache for it to reduce the memory and CPU cost.
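
The caching idea in the Solution can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Flink change: `BlobKeyedCache`, `getOrDeserialize`, and `deserializationCount` are hypothetical names, the key type is generic (a plain String stands in for PermanentBlobKey in the usage note), and standard Java serialization stands in for SerializedValue. The bytes are supplied lazily, so on a cache hit neither the serialized byte[] nor a second deserialized copy is created.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical cache: one deserialized object per blob key, shared by all tasks in a process. */
final class BlobKeyedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger deserializations = new AtomicInteger();

    /**
     * Returns the cached value for the key. The loader is invoked and its bytes
     * are deserialized only when the key is not cached yet.
     */
    V getOrDeserialize(K key, Supplier<byte[]> bytesLoader) {
        // computeIfAbsent runs the mapping function at most once per absent key.
        return cache.computeIfAbsent(key, k -> {
            deserializations.incrementAndGet();
            return deserialize(bytesLoader.get());
        });
    }

    /** Number of times bytes were actually deserialized (for illustration only). */
    int deserializationCount() {
        return deserializations.get();
    }

    @SuppressWarnings("unchecked")
    private V deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Helper used only to produce sample bytes for the usage example. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With such a cache, four tasks that request the same key, for example `cache.getOrDeserialize("blob-1", () -> bytes)`, would share one deserialized instance instead of holding four. The sketch never evicts entries; a real implementation would need some eviction policy, which is left out here.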