Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Aljoscha Krettek Fri, 14 Jun 2019 07:04:05 -0700

Please also see my comment on 
https://issues.apache.org/jira/browse/FLINK-12619?focusedCommentId=16864098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16864098


For FLIP-41, this means we go forward with the design basically as is, but 
should call it “Unified Format” or something like it.

If no-one else comments, we should proceed to a [VOTE] thread to formally adopt 
the FLIP.

Aljoscha

> On 14. Jun 2019, at 15:40, Yu Li <l...@apache.org> wrote:
> 
> Hi Aljoscha and all,
> 
> My 2 cents here:
> 
> 1. Conceptually, introducing an optimized snapshot format for savepoints
> for now (i.e. using the checkpoint format in savepoints) deserves a second
> thought, just as it is not recommended to use a snapshot as a backup in a
> database (although in practice it could be implemented).
> 
> 2. The stop-with-checkpoint mechanism is like stopping a database instance
> with a data flush, and is thus (IMHO) a different story from the
> checkpoint/savepoint (db snapshot/backup) distinction.
> 
> 3. In the long run, we may improve checkpoints to allow a short enough
> interval that they become a kind of transactional log; then we could
> enable checkpoint-based savepoints (like transactional-log-based backup).
> So I agree to still call the new format in FLIP-41 a "Unified Format",
> although in the short term it only unifies savepoints.
> 
> I've also written a document [1] with more details; please refer to it if
> interested. Thanks!
> 
> [1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j
> 
> Best Regards,
> Yu
> 
> 
> On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
>> Btw, I think this FLIP is a very good effort; we just need to reframe it
>> a tiny bit. +1
>> 
>>> On 6. Jun 2019, at 13:41, Aljoscha Krettek <aljos...@apache.org> wrote:
>>> 
>>> Hi,
>>> 
>>> I had a brief discussion with Stephan that helped me sort my thoughts on
>>> the broader topics of checkpoints, savepoints, binary formats,
>>> user-triggered checkpoints, and periodic savepoints. I’ll try to summarise
>>> my stance on this and also comment with the same message on the other
>>> relevant Jira issues and threads.
>>> 
>>> For reference, the relevant FLIP and Jira issues are these:
>>> 
>>> - https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints: Unified Savepoint Format
>>> - https://issues.apache.org/jira/browse/FLINK-12619: Add support for stop-with-checkpoint
>>> - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered checkpoints
>>> - https://issues.apache.org/jira/browse/FLINK-4620: Automatically creating savepoints
>>> - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic savepoints
>>> 
>>> There are roughly two different dimensions in the topic of
>>> savepoints/checkpoints (I’ll use “snapshot” as the generic term for both):
>>> 1) who controls the snapshot
>>> 2) what’s the (binary) format of the snapshot
>>> 
>>> For 1), we currently have checkpoints and savepoints. Checkpoints are
>>> created by the system for fault tolerance. They are managed by the system,
>>> and the system is free to discard them when it sees fit. Savepoints are
>>> under the control of the user. A user can choose to create a savepoint,
>>> delete it, and restore from it at will. The system will not clean up
>>> savepoints. We should try to keep this separation and not muddle the two
>>> concepts.
>>> 
>>> For 2), we currently have various formats across the different state
>>> backends, and even within the same backend; e.g. RocksDB can do full or
>>> incremental snapshots, local snapshots, and probably more.
>>> 
>>> FLIP-41 aims at introducing a unified “savepoint” format that is
>>> interchangeable between the different state backends. In light of the
>>> above points, we should say that FLIP-41 aims to introduce a canonical
>>> format that is interchangeable between different backends. This doesn’t
>>> mean that we should tie this format strictly to savepoints, though. For
>>> performance reasons, users might choose to take savepoints that use one of
>>> the optimised formats that the backends offer, for example incremental
>>> snapshots. Or they might choose to use the canonical format for regular
>>> checkpoints so that they can always switch between backends using
>>> periodically created externalised checkpoints.
>>> 
>>> The motivation behind FLINK-12619 is to have a more lightweight
>>> alternative to stop-with-savepoint, for example using the incremental
>>> snapshot format that RocksDB has. With the above in mind, however, this
>>> becomes “Add support for choosing the snapshot format for
>>> stop-with-savepoint”. It should not be stop-with-checkpoint, because
>>> checkpoints are managed by the system and are not something that the user
>>> should trigger. The same is true for FLINK-6755, whose motivation is, I
>>> think, the same; that change should instead be called “Add support for
>>> choosing the snapshot format for savepoints”.
>>> 
>>> For the last two Jira issues mentioned above (FLINK-4620 and FLINK-4511),
>>> it should be quite clear what I think. I do, however, see a need for
>>> potentially different, overlapping checkpoint periods or intervals. Users
>>> might want their regular checkpoints to use an optimised format, but also
>>> want a “canonical format” checkpoint every now and then so that the
>>> lineage of incremental checkpoints does not become too unwieldy.
>>> 
>>> Please let me know what you think!
>>> 
>>> Aljoscha
>>> 
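The two dimensions described in the message above (who controls a snapshot vs. its binary format), together with the idea of an occasional canonical-format checkpoint among optimised ones, can be sketched as a small model. This is a hypothetical illustration only: the names `Owner`, `Format`, `Snapshot`, and `checkpoint_format`, and the "every N-th checkpoint is canonical" rule, are assumptions made for this sketch and are not Flink APIs or part of FLIP-41.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    """Dimension 1: who controls the snapshot (hypothetical names)."""
    SYSTEM = "checkpoint"  # created and managed by the system for fault tolerance
    USER = "savepoint"     # created, restored from, and deleted by the user


class Format(Enum):
    """Dimension 2: the (binary) format of the snapshot (hypothetical names)."""
    CANONICAL = "canonical"                      # interchangeable between backends
    BACKEND_FULL = "backend-full"                # backend-specific full snapshot
    BACKEND_INCREMENTAL = "backend-incremental"  # e.g. RocksDB incremental


@dataclass(frozen=True)
class Snapshot:
    owner: Owner
    fmt: Format

    def system_may_discard(self) -> bool:
        # The system may clean up checkpoints, but never savepoints.
        return self.owner is Owner.SYSTEM

    def portable_across_backends(self) -> bool:
        # Only the canonical format can be restored into a different backend.
        return self.fmt is Format.CANONICAL


def checkpoint_format(checkpoint_id: int, canonical_every: int) -> Format:
    """Assumed schedule: optimised (incremental) checkpoints by default, with
    every `canonical_every`-th checkpoint taken in the canonical format."""
    if checkpoint_id < 1 or canonical_every < 1:
        raise ValueError("checkpoint_id and canonical_every must be >= 1")
    if checkpoint_id % canonical_every == 0:
        return Format.CANONICAL
    return Format.BACKEND_INCREMENTAL


if __name__ == "__main__":
    # A lightweight savepoint in an optimised format: user-owned, not portable.
    sp = Snapshot(Owner.USER, Format.BACKEND_INCREMENTAL)
    print(sp.system_may_discard(), sp.portable_across_backends())  # False False
    # Checkpoints 1..6 with a canonical checkpoint every third one.
    print([checkpoint_format(i, 3).value for i in range(1, 7)])
```

Keeping the two enums independent mirrors the point that the choice of format should not be tied to who owns the snapshot: any `Owner` can be paired with any `Format`.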
>>>> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>> 
>>>> I want to quickly bump this discussion to gather more consensus from
>>>> others on the FLIP, and to see whether we want to target the upcoming
>>>> 1.9.0 release. The proposal touches the binary formats of savepoints,
>>>> which are a major part of Flink's public user interface, so having
>>>> explicit approval from other members of the community would be nice here.
>>>> 
>>>> Cheers,
>>>> Gordon
>>>> 
>>>> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>> 
>>>>> I should also point out something that I forgot to mention in the
>>>>> initial post: Stefan has helped a lot in understanding the current
>>>>> status of the state backends and was also heavily involved in the
>>>>> design choices for the FLIP :)
>>>>> 
>>>>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>>> 
>>>>>> Hi Flink devs,
>>>>>> 
>>>>>> Congxian, Kostas, and I have recently been discussing unifying the
>>>>>> binary formats for keyed state in savepoints, which would allow more
>>>>>> operational flexibility, such as swapping state backends across
>>>>>> restores.
>>>>>> 
>>>>>> As part of this FLIP, another main proposal is to start allowing
>>>>>> checkpoints and savepoints to have different formats. In the future,
>>>>>> savepoint formats should be designed with interoperability in mind,
>>>>>> where a reasonable snapshot/restore overhead is tolerable, while
>>>>>> checkpoints are allowed to be backend-specific for more efficient
>>>>>> snapshots and restores. Judging from recent state backend proposals such
>>>>>> as the disk-spilling heap backend [1], this flexibility seems reasonable.
>>>>>> 
>>>>>> The main user-facing aspects affected by this are, of course, the
>>>>>> binary formats of savepoints, as well as the fact that we will no longer
>>>>>> guarantee functional parity between savepoints and full checkpoints in
>>>>>> the future (w.r.t. operational features related to upgrading
>>>>>> applications; so far they have equal functionality).
>>>>>> 
>>>>>> Therefore, we would like to collect feedback on the proposal before
>>>>>> continuing efforts.
>>>>>> 
>>>>>> This is the FLIP:
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
>>>>>> 
>>>>>> I'm happy to discuss details and look forward to any feedback.
>>>>>> 
>>>>>> Cheers,
>>>>>> Gordon
>>>>>> 
>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
>>>>>> 
>>>>> 
>>> 
>> 
>>
