[ https://issues.apache.org/jira/browse/FLINK-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhu Zhu updated FLINK-19142:
----------------------------
    Comment: was deleted

(was: This issue was marked "stale-assigned" 7 days ago and has not received an update. I have automatically removed the current assignee from the issue so others in the community may pick it up. If you are still working on this ticket, please ask a committer to reassign you and provide an update about your current status.
)

> Investigate slot hijacking from preceding pipelined regions after failover
> --------------------------------------------------------------------------
>
>                 Key: FLINK-19142
>                 URL: https://issues.apache.org/jira/browse/FLINK-19142
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.0
>            Reporter: Andrey Zagrebin
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> The ticket originates from [this PR
> discussion|https://github.com/apache/flink/pull/13181#discussion_r481087221].
> The previous AllocationIDs are used by
> PreviousAllocationSlotSelectionStrategy to schedule subtasks into the slot
> where they were previously executed before a failover. If the previous slot
> (AllocationID) is not available, we do not want subtasks to take previous
> slots (AllocationIDs) of other subtasks.
> The MergingSharedSlotProfileRetriever gets all previous AllocationIDs of the
> bulk from SlotSharingExecutionSlotAllocator but only from the current bulk.
> The previous AllocationIDs of other bulks stay unknown. Therefore, the
> current bulk can potentially hijack the previous slots from the preceding
> bulks. On the other hand the previous AllocationIDs of other tasks should be
> taken if the other tasks are not going to run at the same time, e.g. not
> enough resources after failover or other bulks are done.
> One way to do it may be to give to MergingSharedSlotProfileRetriever all
> previous AllocationIDs of bulks which are going to run at the same time.
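
The hijacking scenario described in the ticket can be illustrated with a small, self-contained sketch. This is not Flink code: the class SlotSelectionSketch, the method selectSlot, and the use of plain strings as AllocationIDs are all hypothetical, and returning an empty result when only reserved slots are free (so the caller would request a new slot) is an assumption made for the example. It only shows how the choice of the "reserved" set changes the outcome when a subtask's previous slot is gone.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; none of these names are Flink APIs.
final class SlotSelectionSketch {

    /**
     * Picks a free slot for one subtask.
     *
     * @param freeSlots           allocation IDs of currently free slots
     * @param previousAllocation  the subtask's allocation before failover, or null
     * @param reservedAllocations previous allocations the caller knows about and
     *                            wants to keep for their former owners
     */
    static Optional<String> selectSlot(
            Set<String> freeSlots,
            String previousAllocation,
            Set<String> reservedAllocations) {
        // Prefer the slot the subtask ran in before the failover.
        if (previousAllocation != null && freeSlots.contains(previousAllocation)) {
            return Optional.of(previousAllocation);
        }
        // Otherwise take a free slot that is not reserved for another subtask.
        // Sorting only makes the example deterministic.
        // Assumption: if nothing unreserved is free, return empty.
        return freeSlots.stream()
                .sorted()
                .filter(slot -> !reservedAllocations.contains(slot))
                .findFirst();
    }

    public static void main(String[] args) {
        // Before failover: a subtask of bulk A ran in s1, a subtask of bulk B in s2.
        // After failover: s2 is lost; s1 and s3 are free.
        Set<String> free = Set.of("s1", "s3");

        // Bulk B only knows its own previous allocations: it may take s1.
        System.out.println(selectSlot(free, "s2", Set.of("s2")));

        // Bulk B also knows the previous allocations of bulk A: s1 is spared.
        System.out.println(selectSlot(free, "s2", Set.of("s1", "s2")));
    }
}
```

In the first call the reserved set holds only the current bulk's previous allocations, so the subtask ends up in s1, which belonged to the other bulk. In the second call the reserved set covers both bulks, which corresponds to the proposal at the end of the description, and the subtask gets s3 instead. If bulk A were not going to run at the same time, leaving s1 out of the reserved set would be the desired behaviour.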
-- This message was sent by Atlassian Jira (v8.3.4#803005)