Sxnan commented on code in PR #19653:
URL: https://github.com/apache/flink/pull/19653#discussion_r879216461


##########
flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java:
##########
@@ -84,6 +86,33 @@ CompletableFuture<T> registerPartitionWithProducer(
             PartitionDescriptor partitionDescriptor,
             ProducerDescriptor producerDescriptor);
 
+    /**
+     * Returns all the shuffle descriptors for the partitions in the 
intermediate data set with the
+     * given id.
+     *
+     * @param intermediateDataSetID The id of the intermediate data set.
+     * @return all the shuffle descriptors for the partitions in the 
intermediate data set. Null if
+     *     not exist.
+     */
+    default Collection<T> getClusterPartitionShuffleDescriptors(

Review Comment:
   Thanks for the clarification.
   
   If I understand correctly, the `ResourceManagerPartitionTracker` currently 
only tracks the cluster partition in TaskManager. If we want to unify the 
cluster partition management, we have to extend the scope of the 
`ResourceManagerPartitionTracker` so that it also tracks the cluster partitions 
for remote shuffle service and keeps track of the ShuffleDescriptors of the 
cluster partitions. The ShuffleDescriptor can be encapsulated in the Heartbeat 
payload `ClusterPartitionReport` from Task Executor to ResourceManager.
   
   In order to take care of the case when a cluster partition gets lost in the 
remote shuffle cluster, ShuffleMaster may need to have access to the 
ResourceManager and report the status of the cluster partition. I think 
supporting cluster partition for remote shuffle can be another PR to not making 
this PR more complicated.
   
   Please let me know if I understand correctly, then I will go ahead and make 
the change.



##########
flink-runtime/src/main/java/org/apache/flink/runtime/shuffle/ShuffleMaster.java:
##########
@@ -84,6 +86,33 @@ CompletableFuture<T> registerPartitionWithProducer(
             PartitionDescriptor partitionDescriptor,
             ProducerDescriptor producerDescriptor);
 
+    /**
+     * Returns all the shuffle descriptors for the partitions in the 
intermediate data set with the
+     * given id.
+     *
+     * @param intermediateDataSetID The id of the intermediate data set.
+     * @return all the shuffle descriptors for the partitions in the 
intermediate data set. Null if
+     *     not exist.
+     */
+    default Collection<T> getClusterPartitionShuffleDescriptors(

Review Comment:
   Thanks for the clarification.
   
   If I understand correctly, the `ResourceManagerPartitionTracker` currently 
only tracks the cluster partition in TaskManager. If we want to unify the 
cluster partition management, we have to extend the scope of the 
`ResourceManagerPartitionTracker` so that it also tracks the cluster partitions 
for remote shuffle service and keeps track of the ShuffleDescriptors of the 
cluster partitions. The ShuffleDescriptor can be encapsulated in the Heartbeat 
payload `ClusterPartitionReport` from Task Executor to ResourceManager.
   
   In order to take care of the case when a cluster partition gets lost in the 
remote shuffle cluster, ShuffleMaster may need to have access to the 
ResourceManager and report the status of the cluster partition. I think 
supporting cluster partition for remote shuffle can be another PR to not make 
this PR more complicated.
   
   Please let me know if I understand correctly, then I will go ahead and make 
the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to