Hi, Yow
I think there is another restart strategy in flink: region failover[1], but
I could not find the documentation, maybe someone else may help here, For
region failover, please take a look at this issue[2] before you use it. And
you can take a look at this FLIP[3].

[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/RestartPipelinedRegionStrategy.java
[2] https://issues.apache.org/jira/browse/FLINK-10712
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures

Siew Wai Yow <wai_...@hotmail.com> 于2019年1月13日周日 上午8:49写道:

> Hi Qiu thanks again!
> Based on my experience on Flink 1.3, when one of the TM crash the whole
> cluster need to be restarted so i guess this is the recovery you mentioned.
> But it sounds defeat the purpose of cluster as one TM crash should not
> crash the whole cluster. May i know is this still the same in Flink 1.7?
> Restart strategy is for job though not for TM failure.
>
> Thanks!
>
> Hi, Siew Wai Yow
>
> Yes, David is correct, the TM must be recovered, the number of TMs before
> and after the crash must be the same.
> In my last reply, I want to say that the states may not on the same TM
> after the crash. Sorry for the unclear description.
>
> Siew Wai Yow <wai_...@hotmail.com> 于2019年1月12日周六 下午6:44写道:
>
>> Thanks Qiu but David has different view from stackoverflow. He mentioned
>> the Crashed TM must be recovered.
>>
>>
>> https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash/54153686?noredirect=1#comment95144500_54153686
>>
>> "The crashed TM must be recovered, and state updates since the last
>> checkpoint will be lost. Of course that lost state should be recreated as
>> the job rewinds and resumes."
>>
>> Hi, Siew Wai Yow
>>
>> When the job is running, the states are stored in the local RocksDB,
>> Flink will copy all the needed states to checkpointPath when doing a
>> Checkpoint.
>> If there have any failures, the job will be restored from the last
>> previously *Successfully* checkpoint, and assign the restored states to all
>> the current TM
>> (These TMs do not need to be the same as before) .
>>
>> Siew Wai Yow <wai_...@hotmail.com> 于2019年1月12日周六 上午11:24写道:
>>
>>> Thanks. But this is something I know. I would like to know will the
>>> other TM take over the crashed TM's state to ensure data completion(say the
>>> state BYKEY, different key state will be stored in different TM) OR the
>>> crashed TM need to be recovered to continue?
>>>
>>> For example, 5 records,
>>> rec1:KEYA
>>> rec2:KEYB
>>> rec3:KEYA
>>> rec4:KEYC
>>> rec5:KEYB
>>>
>>> TM1 stored state for rec1:KEYA, rec3:KEYA
>>> TM2 stored state for rec2:KEYB, rec5:KEYB <--crashed, what happen to
>>> those state stored inTM2
>>> TM3 stored state for rec4:KEYC
>>>
>>> In case TM2 crashed, rec2 and rec5 will be assigned to other TM? Or
>>> those record only recover when TM2 being recover?
>>>
>>> Thanks.
>>>
>>>
>>> ------------------------------
>>> *From:* Jamie Grier <jgr...@lyft.com>
>>> *Sent:* Saturday, January 12, 2019 2:26 AM
>>> *To:* Siew Wai Yow
>>> *Cc:* user@flink.apache.org
>>> *Subject:* Re: What happen to state in Flink Task Manager when crash?
>>>
>>> Flink is designed such that local state is backed up to a highly
>>> available system such as HDFS or S3.  When a TaskManager fails state is
>>> recovered from there.
>>>
>>> I suggest reading this:
>>> <https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html
>>>
>>>
>>> On Fri, Jan 11, 2019 at 7:33 AM Siew Wai Yow <wai_...@hotmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> May i know what happen to state stored in Flink Task Manager when this
>>> Task manager crash. Say the state storage is rocksdb, would those data
>>> transfer to other running Task Manager so that complete state data is ready
>>> for data processing?
>>>
>>> Regards,
>>> Yow
>>>
>>>
>>
>> --
>> Best,
>> Congxian
>>
>>
>
> --
> Best,
> Congxian
>
>

-- 
Best,
Congxian

Reply via email to