Thanks, Stefan, for the answers above. These are really helpful. I have a few follow-up questions:
1. I see my savepoints are created in a folder, which has a _metadata file and another file. Looking at the code <https://github.com/apache/flink/blob/6642768ad8f8c5d1856742a6d148f7724c20666c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/savepoint/SavepointStore.java#L191> it seems that the metadata file contains the task states, operator states, and master states <https://github.com/apache/flink/blob/6642768ad8f8c5d1856742a6d148f7724c20666c/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/savepoint/SavepointV2.java#L87>. What is the purpose of the other file in the savepoint folder? My guess is that it is a checkpoint file, is that correct?

2. I am planning to use s3 as my state backend, so I want to ensure that application restarts are not affected by the read-after-write consistency of s3 (if I use s3 as a savepoint backend). I am curious how Flink restores data from the _metadata file and the other file. Does the _metadata file contain paths to these other files, or would it do a listing on the s3 folder?

I have also pasted a small sketch of my planned checkpoint configuration at the very end of this mail, below the quoted reply, in case it helps.

Please let me know.

Thanks,
Vipul

On Tue, Sep 26, 2017 at 2:36 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Hi,
>
> I have answered your questions inline:
>
> 1. It seems to me that checkpoints can be treated as a Flink-internal recovery mechanism, and savepoints act more as user-defined recovery points. Would that be a correct assumption?
>
> You could see it that way, but I would describe savepoints more as user-defined *restart* points than *recovery* points. Please take a look at my answers in this thread, because they cover most of your question:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/difference-between-checkpoints-amp-savepoints-td14787.html
>
> 2. While cancelling an application with the -s option, it specifies the savepoint location. Is there a way during application startup to identify the last known savepoint from a folder by itself, and restart from there? Since I am saving my savepoints on s3, I want to avoid issues arising from the *ls* command on s3 due to the read-after-write consistency of s3.
>
> I don't think that this feature exists; you have to specify the savepoint.
>
> 3. Suppose my application has a checkpoint at point t1, and say I cancel this application sometime in the future before the next available checkpoint (say t1+x). If I start the application without specifying a savepoint, it will start from the last known checkpoint (at t1), which won't have the application state saved, since I had cancelled the application. Would this be a correct assumption?
>
> If you restart a cancelled application it will not consider checkpoints. They are only considered in recovery on failure. You need to specify a savepoint or externalized checkpoint for restarts to make explicit that you intend to restart a job, and not to run a new instance of the job.
>
> 4. Would using ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION be the same as manually saving regular savepoints?
>
> Not the same, because checkpoints and savepoints are different in certain aspects, but both methods leave you with something that survives job cancellation and can be used to restart from a certain state.
>
> Best,
> Stefan

--
Thanks,
Vipul
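
P.S. For context, below is roughly the checkpoint configuration I have in mind (a minimal sketch on my side; the s3 path, checkpoint interval, and class/job names are just placeholders, not anything prescribed by Flink):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60000);

        // Keep checkpoint data on s3 via the filesystem state backend
        // (placeholder bucket and path).
        env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));

        // Retain the externalized checkpoint when the job is cancelled, so
        // it can later be used to restart the job.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... rest of the pipeline, then env.execute("my-job");
    }
}

For savepoints I am currently using the CLI, i.e. "flink cancel -s <targetDirectory> <jobId>" to take a savepoint while cancelling, and "flink run -s <savepointPath> ..." to restart from it.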