Re: [PR] [FLINK-33354][runtime] Cache TaskInformation and JobInformation to avoid deserializing duplicate big objects [flink]

via GitHub Sun, 05 Nov 2023 23:53:18 -0800


1996fanrui commented on code in PR #23599:
URL: https://github.com/apache/flink/pull/23599#discussion_r1382890456



##########
flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java:
##########
@@ -253,13 +274,19 @@ public void loadBigData(
 
             Preconditions.checkNotNull(blobService);
 
-            final File dataFile = blobService.getFile(jobId, jobInfoKey);
-            // NOTE: Do not delete the job info BLOB since it may be needed again during recovery.
-            //       (it is deleted automatically on the BLOB server and cache when the job
-            //       enters a terminal state)
-            SerializedValue<JobInformation> serializedValue =
-                    SerializedValue.fromBytes(FileUtils.readAllBytes(dataFile.toPath()));
-            serializedJobInformation = new NonOffloaded<>(serializedValue);
+            JobInformation jobInformation = jobInformationCache.get(jobId, jobInfoKey);
+            if (jobInformation == null) {
+                final File dataFile = blobService.getFile(jobId, jobInfoKey);
+                // NOTE: Do not delete the job info BLOB since it may be needed again during
+                // recovery. (it is deleted automatically on the BLOB server and cache when the job
+                // enters a terminal state)
+                jobInformation =
+                        InstantiationUtil.deserializeObject(
+                                new BufferedInputStream(Files.newInputStream(dataFile.toPath())),
+                                getClass().getClassLoader());
+                jobInformationCache.put(jobId, jobInfoKey, jobInformation);
+            }
+            this.jobInformation = jobInformation.deepCopy();

Review Comment:
   Hi @huwh, thanks a lot for your review!
   
   > Can we use this.jobInformation = jobInformation here ?
   
   Yes, we can. Your suggestion works, and both approaches are fine with me.
   
   Let me explain the background of the design. Defining `deepCopy` for `JobInformation` makes it easier for other developers who add new fields to `JobInformation` to notice whether each new field requires a deep copy. Suppose a field added in the future can be changed by tasks. Sharing one cached instance among tasks would then be really dangerous, so it is necessary to make developers aware of this risk.
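To make the risk concrete, here is a minimal sketch of the cache-then-copy pattern. It is only an illustration under stated assumptions. `JobInfo`, `JobInfoCache`, and `getOrLoad` are hypothetical names; they are not Flink's `JobInformation` or the cache class added in this PR. The `Supplier` stands in for reading and deserializing the BLOB file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for JobInformation: one immutable field that can be
// shared, and one mutable field that a task could change.
final class JobInfo {
    final String jobName;
    final List<String> mutableTags;

    JobInfo(String jobName, List<String> mutableTags) {
        this.jobName = jobName;
        this.mutableTags = mutableTags;
    }

    // Each new field forces a decision here: share the reference or copy it.
    JobInfo deepCopy() {
        return new JobInfo(jobName, new ArrayList<>(mutableTags));
    }
}

// Hypothetical cache keyed by (jobId, blobKey). A real cache would also evict
// entries, e.g. when the job reaches a terminal state.
final class JobInfoCache {
    private final Map<List<String>, JobInfo> entries = new ConcurrentHashMap<>();

    // The loader (stand-in for reading and deserializing the BLOB file) runs
    // only on a cache miss; callers always receive their own deep copy, so the
    // cached instance itself is never handed out.
    JobInfo getOrLoad(String jobId, String blobKey, Supplier<JobInfo> loader) {
        JobInfo cached = entries.computeIfAbsent(List.of(jobId, blobKey), k -> loader.get());
        return cached.deepCopy();
    }
}
```

In this sketch, `computeIfAbsent` folds the diff's `get` / `null` check / `put` sequence into one atomic step on a `ConcurrentHashMap`. Assigning the cached instance directly (`this.jobInformation = jobInformation`) behaves the same only while every field stays immutable. `deepCopy()` is the single place where a future mutable field has to be handled.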
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
