Hi,

Have you looked into fine-grained recovery? https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+:+Fine+Grained+Recovery+from+Task+Failures
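
If it helps as a starting point, fine-grained recovery is controlled by the failover strategy in flink-conf.yaml. A minimal sketch — please double-check the option name and supported values against the docs for the Flink version you are running:

    # flink-conf.yaml
    # restart only the tasks/region affected by a failure instead of the full job
    jobmanager.execution.failover-strategy: region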
Stefan cc'ed might be able to give you some pointers about configuration.

Best,
Aljoscha

> On 6. Mar 2018, at 22:35, Ashish Pokharel <ashish...@yahoo.com> wrote:
>
> Hi Gordon,
>
> The issue really is we are trying to avoid checkpointing as datasets are
> really heavy and all of the states are really transient in a few of our apps
> (flushed within few seconds). So high volume/velocity and transient nature of
> state make those app good candidates to just not have checkpoints.
>
> We do have offsets committed to Kafka AND we have “some” tolerance for gap /
> duplicate. However, we do want to handle “graceful” restarts / shutdown. For
> shutdown, we have been taking savepoints (which works great) but for restart,
> we just can’t find a way.
>
> Bottom line - we are trading off resiliency for resource utilization and
> performance but would like to harden apps for production deployments as much
> as we can.
>
> Hope that makes sense.
>
> Thanks, Ashish
>
>> On Mar 6, 2018, at 10:19 PM, Tzu-Li Tai <tzuli...@gmail.com> wrote:
>>
>> Hi Ashish,
>>
>> Could you elaborate a bit more on why you think the restart of all operators
>> lead to data loss?
>>
>> When restart occurs, Flink will restart the job from the latest complete
>> checkpoint. All operator states will be reloaded with state written in that
>> checkpoint, and the position of the input stream will also be re-winded.
>>
>> I don't think there is a way to force a checkpoint before restarting occurs,
>> but as I mentioned, that should not be required, because the last complete
>> checkpoint will be used.
>> Am I missing something in your particular setup?
>>
>> Cheers,
>> Gordon
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
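
As a side note, the setup you describe (no checkpointing, offsets committed to Kafka, some tolerance for gaps/duplicates, plus a restart strategy for resilience) would look roughly like the sketch below. This is only an illustration assuming the Kafka 0.11 connector and a recent (1.4-ish) API; topic name, group id, broker address, and restart values are placeholders, not anything from your job:

    import java.util.Properties;
    import java.util.concurrent.TimeUnit;

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    public class TransientStateJobSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpointing is deliberately left disabled; small gaps/duplicates are tolerated.

            // Let failed tasks restart a few times before the job gives up (placeholder values).
            env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder
            props.setProperty("group.id", "transient-state-job");  // placeholder
            // With checkpointing disabled, offset committing falls back to Kafka's periodic auto-commit.
            props.setProperty("enable.auto.commit", "true");
            props.setProperty("auto.commit.interval.ms", "5000");

            FlinkKafkaConsumer011<String> consumer =
                    new FlinkKafkaConsumer011<>("events", new SimpleStringSchema(), props);
            // On a restart, resume from the offsets last committed to Kafka for this group.
            consumer.setStartFromGroupOffsets();

            // (the actual transient, quickly-flushed state logic would go here)
            env.addSource(consumer).print();

            env.execute("transient-state job (sketch)");
        }
    }

With something like this, a restart — whether triggered by the restart strategy or by resubmitting the job — picks up from whatever offsets were last auto-committed, which matches the "some tolerance for gaps/duplicates" trade-off rather than exactly-once.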