[jira] [Commented] (FLINK-9480) Let local recovery support rescaling

Stefan Richter (JIRA) Wed, 30 May 2018 01:52:54 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494892#comment-16494892
 ]


Stefan Richter commented on FLINK-9480:
---------------------------------------

Can you give some more details on why you think this is useful or important, or 
which use case you want to improve? The main goal of local recovery is faster 
recovery after failures, and recovery itself does not give you any opportunity 
to rescale. So we are talking about restarts from local state, which already 
makes things a bit tricky. For local recovery, you need to know where tasks 
were previously scheduled. That information might get lost when the job is 
stopped and the JM goes away, so we would need to persist it, e.g. in 
ZooKeeper. Even then, you can still run into the problem that the previous 
locations have been occupied by another job in the meantime. Also, when can 
you finally let go of the local state under this approach? Or are we talking 
about some form of rescaling that does not terminate the previous job / JM?
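
To make the scheduling concern concrete, here is a minimal sketch, not Flink 
code. It assumes the previous subtask-to-host mapping was persisted somewhere 
(a plain dict stands in for e.g. ZooKeeper), that each host offers one slot, 
and that parallelism is unchanged. The names plan_restore and RestorePlan are 
hypothetical. A subtask can use local state only if it returns to its previous 
host; otherwise it has to fall back to the remote checkpoint.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class RestorePlan:
    subtask: int
    host: Optional[str]    # host chosen for the restart, None if no host is free
    use_local_state: bool  # True only when the subtask returns to its previous host


def plan_restore(previous_hosts: Dict[int, str],
                 free_hosts: Set[str],
                 parallelism: int) -> Dict[int, RestorePlan]:
    """Hypothetical planner: prefer previous hosts, else restore remotely."""
    available = set(free_hosts)
    plan: Dict[int, RestorePlan] = {}

    # Pass 1: give every subtask its previous host if that host is still free.
    for subtask in range(parallelism):
        host = previous_hosts.get(subtask)
        if host is not None and host in available:
            available.remove(host)
            plan[subtask] = RestorePlan(subtask, host, True)

    # Pass 2: remaining subtasks take any free host and restore remotely.
    for subtask in range(parallelism):
        if subtask in plan:
            continue
        host = min(available) if available else None
        if host is not None:
            available.remove(host)
        plan[subtask] = RestorePlan(subtask, host, False)

    return plan


if __name__ == "__main__":
    # host-b was taken by another job in the meantime, so subtask 1 goes remote.
    previous = {0: "host-a", 1: "host-b"}
    for entry in plan_restore(previous, {"host-a", "host-c"}, 2).values():
        print(entry)
```

If the persisted mapping is lost (an empty dict here), every subtask ends up in 
the second pass and nothing is restored locally. The sketch also does not 
answer when the local state on an abandoned host can be deleted.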
I want to point out that this could complicate things quite a bit. In this 
context, we could also think about replicating state to pre-warm nodes, or 
about keeping more alternatives with local state in case a node goes down. But 
that is also a new feature in itself.
Bottom line: personally, I currently still see many features (timer service, 
TTL state, ...) that I would consider higher priority, but eventually we can 
surely think about improved rescaling and/or replication.

> Let local recovery support rescaling
> ------------------------------------
>
>                 Key: FLINK-9480
>                 URL: https://issues.apache.org/jira/browse/FLINK-9480
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Priority: Major
>
> Currently, local recovery only supports restoring from a checkpoint without 
> rescaling. Maybe we should enable it to support rescaling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
