[ https://issues.apache.org/jira/browse/FLINK-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arvid Heise updated FLINK-19774:
--------------------------------
    Fix Version/s: 1.13.0

> Introduce Sub Partition View Version for Approximate Local Recovery
> -------------------------------------------------------------------
>
>                 Key: FLINK-19774
>                 URL: https://issues.apache.org/jira/browse/FLINK-19774
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Task
>            Reporter: Yuan Mei
>            Assignee: Yuan Mei
>            Priority: Major
>             Fix For: 1.13.0
>
>
> This ticket is to solve a corner case where a downstream task continuously
> fails multiple times, or an orphan task execution may exist for a short
> period of time after a new execution is running (as described in the FLIP).
>
> Here is an idea of how to cleanly and thoroughly solve this kind of problem:
>
> 1. We go with the simplified release view version: only release the view
> before a new creation (in thread2). That is, we won't clean up the view when
> the downstream task disconnects ({{releaseView}} would not be called from the
> reference copy of the view) (in thread1 or 2).
> ** This would greatly simplify the threading model.
> ** This won't cause any resource leak, since view release only notifies the
> upstream result partition to releaseOnConsumption when all subpartitions are
> consumed in PipelinedSubPartitionView. In our case, we do not release the
> result partition on consumption anyway (the result partition is tracked in
> the JobMaster, similar to the ResultPartition blocking type).
>
> 2. Each view is associated with a downstream task execution version.
> ** This makes sense because we actually have different versions of the view
> now, corresponding to the vertex.version of the downstream task.
> ** createView is performed only if the new version to create is greater than
> the existing one.
> ** If we decide to create a new view, the old view should be released.
>
> I think this way, we can completely disconnect the old view from the
> subpartition.
> Besides that, the working handler in use would always hold the freshest
> view reference.
>
> Point 1 has already been addressed in FLINK-19632. This ticket is to address
> Point 2.
> Detailed discussion in [https://github.com/apache/flink/pull/13648]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
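
The versioning rule in Point 2 can be illustrated with a minimal Java sketch. This is not Flink code: the class names VersionedViewSketch, View and Subpartition are hypothetical stand-ins, and returning the existing view for a stale or duplicate request is an assumption, since the ticket only says that createView is performed when the requested version is greater than the existing one.

```java
public class VersionedViewSketch {

    /** Hypothetical stand-in for a subpartition view tagged with a downstream execution version. */
    static final class View {
        final int version;
        private boolean released;

        View(int version) {
            this.version = version;
        }

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    /** Hypothetical stand-in for a subpartition that keeps at most one live view. */
    static final class Subpartition {
        private final Object lock = new Object();
        private View current; // null until the first view is created

        View createView(int requestedVersion) {
            synchronized (lock) {
                // Stale or duplicate request: keep the existing view.
                // (Assumption: the ticket does not say what is returned here.)
                if (current != null && requestedVersion <= current.version) {
                    return current;
                }
                // A newer execution is connecting: release the old view first,
                // so it is fully disconnected from the subpartition.
                if (current != null) {
                    current.release();
                }
                current = new View(requestedVersion);
                return current;
            }
        }
    }

    public static void main(String[] args) {
        Subpartition subpartition = new Subpartition();
        View first = subpartition.createView(1);
        View second = subpartition.createView(2); // newer execution replaces the view
        View orphan = subpartition.createView(1); // late request from an old execution

        System.out.println("first released: " + first.isReleased());
        System.out.println("live view version: " + second.version);
        System.out.println("orphan got live view: " + (orphan == second));
    }
}
```

Because the release of the old view and the creation of the new one happen under the same lock, a late request from an orphan execution can never replace a view that belongs to a newer execution.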