jinxing64 commented on pull request #16118:
URL: https://github.com/apache/flink/pull/16118#issuecomment-880417258


   Thanks for the comments, @guoweiM.
   Currently there are two data structures in JobMasterPartitionTrackerImpl:
   1. `partitionTable`, which maintains the mapping from tmId to the result 
partitions it produced;
   2. `partitionInfos`, which records all the partitions being tracked.
   
   When a TM is gone, its related partitions in `partitionTable` are 
cleared, which means no partitions are hosted on it any more.
   At the same time, only the TM-internal partitions are cleared from 
`partitionInfos`; external partitions are kept, because they remain 
available and are decoupled from the lifecycle of the TM.
   I think that's where the `inconsistencies` you mentioned come from. And I 
agree that it takes some words to explain the underlying mechanism and might be 
hard for newcomers to understand.
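
   To make the current behavior concrete, here is a minimal sketch. It is not the actual `JobMasterPartitionTrackerImpl` code: the class name `CurrentTrackerSketch`, the method `onTaskManagerGone`, and the use of plain `String` ids and a `boolean external` flag are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of the current bookkeeping; not the real Flink class. */
final class CurrentTrackerSketch {
    // tmId -> ids of the partitions produced on that TM (internal and external alike)
    private final Map<String, Set<String>> partitionTable = new HashMap<>();
    // partitionId -> true if the partition is held by an external shuffle service
    private final Map<String, Boolean> partitionInfos = new HashMap<>();

    void startTracking(String tmId, String partitionId, boolean external) {
        partitionTable.computeIfAbsent(tmId, k -> new HashSet<>()).add(partitionId);
        partitionInfos.put(partitionId, external);
    }

    void onTaskManagerGone(String tmId) {
        // Every entry for the TM is dropped from partitionTable ...
        Set<String> onTm = partitionTable.remove(tmId);
        if (onTm == null) {
            return;
        }
        // ... but only the TM-internal partitions are dropped from partitionInfos.
        for (String partitionId : onTm) {
            if (Boolean.FALSE.equals(partitionInfos.get(partitionId))) {
                partitionInfos.remove(partitionId);
            }
        }
    }

    boolean isTrackingPartitionsFor(String tmId) {
        return partitionTable.containsKey(tmId);
    }

    boolean isPartitionTracked(String partitionId) {
        return partitionInfos.containsKey(partitionId);
    }
}
```

   In this model, after `onTaskManagerGone("tm-1")` an external partition produced on `tm-1` is still reported by `isPartitionTracked`, while `isTrackingPartitionsFor("tm-1")` returns `false`. That is the mismatch between the two structures described above.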
   
   > If JobMasterPartitionTracker has found that 
ResultPartitionDeploymentDescriptor is not an internal shuffle, don’t maintain 
the mapping from TM to ResultPartition at the beginning.
   
   I agree with the proposal: `partitionTable` only maintains TM-internal 
partitions from the start, and `partitionInfos` records all external 
and internal partitions. With this change, the semantics of the interfaces 
below will be:
   1. `boolean isTrackingPartitionsFor(K key);`: whether there are internal 
partitions tracked for the `key`;
   2. `boolean isPartitionTracked(ResultPartitionID resultPartitionID)`: whether 
the resultPartitionID is tracked anywhere, no matter internally or externally.
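
   A rough sketch of the proposed semantics, under the same simplifying assumptions (hypothetical class `ProposedTrackerSketch`, `String` ids, and a `boolean external` flag standing in for the check on the deployment descriptor); the real change would of course go into the existing tracker classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the proposed bookkeeping; not the real Flink class. */
final class ProposedTrackerSketch {
    // tmId -> ids of the TM-internal partitions only
    private final Map<String, Set<String>> partitionTable = new HashMap<>();
    // ids of all tracked partitions, internal and external
    private final Set<String> partitionInfos = new HashSet<>();

    void startTracking(String tmId, String partitionId, boolean external) {
        partitionInfos.add(partitionId);
        // External partitions are never mapped to a TM in the first place.
        if (!external) {
            partitionTable.computeIfAbsent(tmId, k -> new HashSet<>()).add(partitionId);
        }
    }

    void onTaskManagerGone(String tmId) {
        // Everything mapped to the TM is internal, so it can all be dropped.
        Set<String> onTm = partitionTable.remove(tmId);
        if (onTm != null) {
            partitionInfos.removeAll(onTm);
        }
    }

    /** True only if internal partitions are tracked for the given TM. */
    boolean isTrackingPartitionsFor(String tmId) {
        return partitionTable.containsKey(tmId);
    }

    /** True if the partition is tracked anywhere, internally or externally. */
    boolean isPartitionTracked(String partitionId) {
        return partitionInfos.contains(partitionId);
    }
}
```

   With this shape, a TM that only produced external partitions never shows up in `partitionTable`, so losing that TM needs no special-casing and the two structures stay consistent.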
   


