[ 
https://issues.apache.org/jira/browse/FLINK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289776#comment-17289776
 ] 

Robert Metzger commented on FLINK-21450:
----------------------------------------

Thanks a lot for offering your help. 
Even though I added this ticket, I have to admit that I don't know right 
away how to add this feature. If I were to implement this, I would look at the 
proposed extensions of the Adaptive Scheduler states in the FLIP 
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler),
 and also at how the DefaultScheduler / SchedulerBase implements this.
I think it would be best if you could either describe in the Jira how you are 
conceptually planning to add this feature, or implement a prototype and share 
it here.

We will also need to see how quickly we can implement and stabilize this 
feature if we want to include it in the 1.13 release (the feature freeze is by 
the end of March).

> Add local recovery support to adaptive scheduler
> ------------------------------------------------
>
>                 Key: FLINK-21450
>                 URL: https://issues.apache.org/jira/browse/FLINK-21450
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Coordination
>            Reporter: Robert Metzger
>            Priority: Major
>
> Local recovery means that, on a failure, we are able to re-use the state in a 
> TaskManager instead of loading it again from distributed storage (which 
> means the scheduler needs to know where each piece of state is located, and 
> schedule tasks accordingly).
> The Adaptive Scheduler currently does not respect the location of state, so 
> failures require re-loading the state from distributed storage.
> Adding this feature will allow us to enable the {{Local recovery and sticky 
> scheduling end-to-end test}} for adaptive scheduler again.
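
To make the idea in the description above concrete, here is a minimal sketch 
of state-aware ("sticky") assignment. It is only an illustration under stated 
assumptions and not how Flink implements it: the class StickyAssignmentSketch, 
the method assign, and the string task / TaskManager ids are all hypothetical. 
The sketch assumes the scheduler remembers which TaskManager last ran each 
task and receives a flat list of free slots.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Flink. */
public class StickyAssignmentSketch {

    /**
     * @param tasks            task ids to reschedule after a failure
     * @param previousLocation task id -> TaskManager id that held the task's local state
     * @param freeSlots        one entry per free slot, naming the TaskManager offering it
     * @return task id -> TaskManager id chosen for that task
     */
    public static Map<String, String> assign(
            List<String> tasks,
            Map<String, String> previousLocation,
            List<String> freeSlots) {
        List<String> remainingSlots = new ArrayList<>(freeSlots);
        Map<String, String> assignment = new LinkedHashMap<>();
        List<String> unplaced = new ArrayList<>();

        // Pass 1: put a task back on the TaskManager that still holds its state,
        // provided that TaskManager offers a free slot.
        for (String task : tasks) {
            String previous = previousLocation.get(task);
            if (previous != null && remainingSlots.remove(previous)) {
                assignment.put(task, previous);
            } else {
                unplaced.add(task);
            }
        }

        // Pass 2: the remaining tasks take any leftover slot; they would have to
        // restore their state from distributed storage.
        for (String task : unplaced) {
            if (remainingSlots.isEmpty()) {
                break; // fewer slots than tasks: leave the rest unassigned
            }
            assignment.put(task, remainingSlots.remove(0));
        }
        return assignment;
    }
}
```

Running the sticky pass first keeps a task that has lost its old TaskManager 
from taking a slot that another task could have used to recover locally.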



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
