Hi Yow,

I think there is another failover strategy in Flink: region failover [1], but I could not find documentation for it; maybe someone else can help here. Before you use region failover, please take a look at this issue [2]. You can also take a look at FLIP-1 [3].

[1] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/RestartPipelinedRegionStrategy.java
[2] https://issues.apache.org/jira/browse/FLINK-10712
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures
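For reference, in more recent Flink releases the region strategy is selected through flink-conf.yaml roughly as below; I have not verified how well this applies to 1.7, so please check the documentation and FLINK-10712 for your version first:

    # restart only the affected pipelined region instead of the whole job
    jobmanager.execution.failover-strategy: region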
Siew Wai Yow <wai_...@hotmail.com> wrote on Sun, Jan 13, 2019 at 8:49 AM:

> Hi Qiu, thanks again!
> Based on my experience with Flink 1.3, when one of the TMs crashes the whole
> cluster needs to be restarted, so I guess this is the recovery you mentioned.
> But that seems to defeat the purpose of a cluster, since one TM crash should
> not bring down the whole cluster. May I know whether this is still the case in
> Flink 1.7? The restart strategy covers job failures, though, not TM failures.
>
> Thanks!
>
> Hi Siew Wai Yow,
>
> Yes, David is correct: the TM must be recovered, and the number of TMs before
> and after the crash must be the same.
> In my last reply I meant that the states may not end up on the same TM after
> the crash. Sorry for the unclear description.
>
> Siew Wai Yow <wai_...@hotmail.com> wrote on Sat, Jan 12, 2019 at 6:44 PM:
>
>> Thanks Qiu, but David has a different view on Stack Overflow. He mentioned
>> that the crashed TM must be recovered.
>>
>> https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash/54153686?noredirect=1#comment95144500_54153686
>>
>> "The crashed TM must be recovered, and state updates since the last
>> checkpoint will be lost. Of course that lost state should be recreated as
>> the job rewinds and resumes."
>>
>> Hi Siew Wai Yow,
>>
>> While the job is running, the states are stored in the local RocksDB
>> instances; Flink copies all the needed state to the checkpointPath when it
>> takes a checkpoint.
>> If there is any failure, the job is restored from the last *successful*
>> checkpoint, and the restored states are assigned to the current TMs
>> (these TMs do not need to be the same as before).
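To make that setup concrete, here is a minimal sketch of a job configured with the RocksDB state backend and periodic checkpoints. The checkpoint path, interval, and class name are placeholders of mine, not from this thread; the backend class comes from the flink-statebackend-rocksdb dependency:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Working state lives in RocksDB on each TaskManager's local disk;
            // each checkpoint copies the needed state to this durable path
            // (HDFS/S3), which is what survives a TaskManager crash.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true)); // true = incremental

            // Checkpoint every 60 s; on failure the job rewinds to the latest
            // successful checkpoint and state is reassigned to the available TMs.
            env.enableCheckpointing(60_000);

            // ... build the topology and call env.execute(...) here ...
        }
    }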
>> Siew Wai Yow <wai_...@hotmail.com> wrote on Sat, Jan 12, 2019 at 11:24 AM:
>>
>>> Thanks, but this is something I already know. I would like to know whether
>>> the other TMs take over the crashed TM's state so that the data is complete
>>> (say the state is keyed, and different keys' state is stored on different
>>> TMs), or whether the crashed TM needs to be recovered before processing can
>>> continue.
>>>
>>> For example, 5 records:
>>> rec1:KEYA
>>> rec2:KEYB
>>> rec3:KEYA
>>> rec4:KEYC
>>> rec5:KEYB
>>>
>>> TM1 stores state for rec1:KEYA, rec3:KEYA
>>> TM2 stores state for rec2:KEYB, rec5:KEYB  <-- crashed; what happens to the state stored in TM2?
>>> TM3 stores state for rec4:KEYC
>>>
>>> If TM2 crashes, will rec2 and rec5 be assigned to another TM, or will those
>>> records only be recovered once TM2 itself is recovered?
>>>
>>> Thanks.
>>>
>>> ------------------------------
>>> From: Jamie Grier <jgr...@lyft.com>
>>> Sent: Saturday, January 12, 2019 2:26 AM
>>> To: Siew Wai Yow
>>> Cc: user@flink.apache.org
>>> Subject: Re: What happen to state in Flink Task Manager when crash?
>>>
>>> Flink is designed such that local state is backed up to a highly available
>>> system such as HDFS or S3. When a TaskManager fails, state is recovered from
>>> there.
>>>
>>> I suggest reading this:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html
>>>
>>> On Fri, Jan 11, 2019 at 7:33 AM Siew Wai Yow <wai_...@hotmail.com> wrote:
>>>
>>> Hello,
>>>
>>> May I know what happens to the state stored in a Flink Task Manager when
>>> that Task Manager crashes? Say the state storage is RocksDB; would that data
>>> be transferred to the other running Task Managers so that the complete
>>> state is available for data processing?
>>>
>>> Regards,
>>> Yow

--
Best,
Congxian
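Coming back to the keyed example in the thread (rec1:KEYA, rec2:KEYB, ...): keyed state belongs to the key, not to a particular TaskManager, so after a restore from the last successful checkpoint the state for KEYB ends up on whichever TM runs that key's subtask, as described above. A minimal sketch of such keyed state follows; the class name and the per-key counting logic are purely illustrative, not something from this thread:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    // Counts records per key. The count lives in keyed state, so on recovery it
    // follows the key to whichever TaskManager runs that key's subtask, rather
    // than staying on the machine that crashed.
    public class CountPerKey extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void flatMap(Tuple2<String, Long> record, Collector<Tuple2<String, Long>> out) throws Exception {
            Long current = count.value();                 // null for the first record of a key
            long updated = (current == null ? 0L : current) + 1;
            count.update(updated);
            out.collect(Tuple2.of(record.f0, updated));
        }
    }

    // usage: records.keyBy(0).flatMap(new CountPerKey())
    // where field 0 holds the key (KEYA / KEYB / KEYC in the example above)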