zhangjing created FLINK-4356:
--------------------------------
Summary: JobMaster HA
Key: FLINK-4356
URL: https://issues.apache.org/jira/browse/FLINK-4356
Project: Flink
Issue Type: Sub-task
Reporter: zhangjing
1. For standalone mode, the LocalDispatcher watches the JobMaster:
   a. The LocalDispatcher detects the failure of the JobMaster, recovers the
      JobGraph and libraries from persistent storage, and spawns a new JobManager.
   b. The new JobMaster competes for leadership and saves its address in ZooKeeper.
   c. The new JobMaster registers at the ResourceManager.
   d. The new JobMaster recovers the execution of its job (its ExecutionGraph)
      from the latest completed checkpoint.
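The standalone recovery sequence above can be modeled in a few lines. What follows is a minimal in-memory Python sketch, not Flink's implementation. `PersistentStore`, `LeaderStore` (a stand-in for a per-job ZooKeeper leader node), `check()`, and the checkpoint representation are all hypothetical names chosen for illustration. Failure detection is reduced to an `alive` flag instead of heartbeats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Checkpoint:
    checkpoint_id: int
    state: Dict[str, int]
    completed: bool

class PersistentStore:
    """Hypothetical durable storage for JobGraphs, libraries and checkpoints."""
    def __init__(self) -> None:
        self.job_graphs: Dict[str, str] = {}
        self.libraries: Dict[str, List[str]] = {}
        self.checkpoints: Dict[str, List[Checkpoint]] = {}

    def latest_completed_checkpoint(self, job_id: str) -> Optional[Checkpoint]:
        done = [c for c in self.checkpoints.get(job_id, []) if c.completed]
        return max(done, key=lambda c: c.checkpoint_id, default=None)

class LeaderStore:
    """Hypothetical in-memory stand-in for a per-job ZooKeeper leader node."""
    def __init__(self) -> None:
        self.leaders: Dict[str, str] = {}

    def try_acquire(self, job_id: str, address: str) -> bool:
        # The first contender wins and publishes its address.
        if job_id in self.leaders:
            return False
        self.leaders[job_id] = address
        return True

    def release(self, job_id: str) -> None:
        # Models the ephemeral node vanishing when the leader's session dies.
        self.leaders.pop(job_id, None)

class ResourceManager:
    def __init__(self) -> None:
        self.registered: List[str] = []

@dataclass
class JobMaster:
    job_id: str
    address: str
    job_graph: str
    libraries: List[str]
    alive: bool = True
    restored_from: Optional[int] = None
    state: Dict[str, int] = field(default_factory=dict)

    def start(self, leaders: LeaderStore, rm: ResourceManager, store: PersistentStore) -> bool:
        if not leaders.try_acquire(self.job_id, self.address):
            return False  # another JobMaster already leads this job
        rm.registered.append(self.address)  # register at the ResourceManager
        checkpoint = store.latest_completed_checkpoint(self.job_id)
        if checkpoint is not None:  # restore from the latest completed checkpoint
            self.restored_from = checkpoint.checkpoint_id
            self.state = dict(checkpoint.state)
        return True

class LocalDispatcher:
    def __init__(self, store: PersistentStore, leaders: LeaderStore, rm: ResourceManager) -> None:
        self.store, self.leaders, self.rm = store, leaders, rm
        self.job_masters: Dict[str, JobMaster] = {}
        self.spawned = 0

    def submit(self, job_id: str, job_graph: str, libraries: List[str]) -> None:
        self.store.job_graphs[job_id] = job_graph
        self.store.libraries[job_id] = libraries
        self._spawn(job_id)

    def _spawn(self, job_id: str) -> None:
        # Rebuild the JobMaster purely from what persistent storage holds.
        self.spawned += 1
        jm = JobMaster(job_id, f"jobmaster-{self.spawned}",
                       self.store.job_graphs[job_id], list(self.store.libraries[job_id]))
        if jm.start(self.leaders, self.rm, self.store):
            self.job_masters[job_id] = jm

    def check(self) -> None:
        # Watch every JobMaster and replace any that has failed.
        for job_id, jm in list(self.job_masters.items()):
            if not jm.alive:
                self.leaders.release(job_id)
                self._spawn(job_id)

store, leaders, rm = PersistentStore(), LeaderStore(), ResourceManager()
dispatcher = LocalDispatcher(store, leaders, rm)
dispatcher.submit("job-1", "graph-v1", ["udfs.jar"])
store.checkpoints["job-1"] = [Checkpoint(1, {"count": 10}, True), Checkpoint(2, {"count": 20}, True),
                              Checkpoint(3, {"count": 25}, False)]  # #3 never completed
dispatcher.job_masters["job-1"].alive = False  # simulate a JobMaster crash
dispatcher.check()
recovered = dispatcher.job_masters["job-1"]
print(recovered.address, recovered.restored_from, recovered.state)  # prints: jobmaster-2 2 {'count': 20}
```

The new JobMaster is built only from persistent storage, so the dispatcher needs no in-memory state from the failed one. Releasing the leader entry models what ZooKeeper does on its own when the old session expires.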
2. For YARN mode, the YarnApplicationMasterRunner creates a ProcessReaper for the
   JobMaster:
   a. The ProcessReaper monitors the JobMaster and kills the JVM when the
      JobMaster terminates.
   b. YARN then starts a new AppMaster containing a new JobManager; the JobGraph
      and libraries are retrieved as startup artifacts.
   c. The new JobMaster competes for leadership and saves its address in ZooKeeper.
   d. The new JobMaster registers at the ResourceManager.
   e. The new JobMaster recovers the execution of its job (its ExecutionGraph)
      from the latest completed checkpoint.
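In YARN mode the failure is not repaired inside the process. The whole AppMaster process dies and YARN starts a fresh attempt. The Python sketch below imitates this with threads. `JobMasterThread`, `STARTUP_ARTIFACTS`, and `yarn_run_application` are hypothetical names. `ProcessReaper` is named in the issue, but its constructor here is an assumption that only mirrors the role described above. The sketch also simplifies YARN's restart decision to "start a new AppMaster after a non-zero exit, up to `max_attempts`". Real YARN additionally considers whether the AppMaster unregistered and the configured attempt limit (`yarn.resourcemanager.am.max-attempts`).

```python
import os
import threading
from typing import Callable, Dict, List

# Hypothetical startup artifacts that YARN localizes for every AppMaster attempt.
STARTUP_ARTIFACTS: Dict[str, object] = {"job_graph": "graph-v1", "libraries": ["udfs.jar"]}


class JobMasterThread(threading.Thread):
    """Placeholder JobMaster; leader election, ResourceManager registration and
    checkpoint recovery would happen inside run()."""

    def __init__(self, job_graph: object, libraries: object, succeed: bool) -> None:
        super().__init__(daemon=True)
        self.job_graph, self.libraries = job_graph, libraries
        self.succeed = succeed
        self.succeeded = False

    def run(self) -> None:
        self.succeeded = self.succeed  # finish normally or "crash"


class ProcessReaper:
    """Watches one JobMaster and terminates the whole process when it ends."""

    def __init__(self, watched: JobMasterThread, exit_fn: Callable[[int], None] = os._exit) -> None:
        self.watched = watched
        self.exit_fn = exit_fn  # injectable so the sketch can run without exiting
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def join(self) -> None:
        self._thread.join()

    def _run(self) -> None:
        self.watched.join()  # block until the JobMaster terminates
        self.exit_fn(0 if self.watched.succeeded else 1)


def yarn_run_application(outcomes: List[bool], max_attempts: int = 2) -> List[int]:
    """Hypothetical stand-in for YARN restarting the AppMaster after a
    non-zero exit; returns the exit code of every attempt."""
    exit_codes: List[int] = []
    for succeed in outcomes[:max_attempts]:
        codes: List[int] = []
        # Each attempt re-reads the JobGraph and libraries from startup artifacts.
        jm = JobMasterThread(STARTUP_ARTIFACTS["job_graph"], STARTUP_ARTIFACTS["libraries"], succeed)
        jm.start()  # a thread must be started before another thread can join() it
        reaper = ProcessReaper(jm, exit_fn=codes.append)
        reaper.start()
        reaper.join()
        exit_codes.append(codes[0])
        if codes[0] == 0:
            break  # in this simplified model, a clean exit ends the application
    return exit_codes


print(yarn_run_application([False, True]))  # prints [1, 0]
```

Killing the JVM instead of restarting the JobMaster in place appears to keep recovery uniform. Every attempt begins from the same startup artifacts, ZooKeeper entries, and checkpoints, just like the first one.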
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)