Re: [DISCUSS] FLIP-41: Unify Keyed State Snapshot Binary Format for Savepoints

Yu Li Fri, 14 Jun 2019 06:40:46 -0700

Hi Aljoscha and all,

My 2 cents here:


1. Conceptually, it is worth a second thought whether to introduce an
optimized snapshot format for now (i.e. use the checkpoint format in
savepoints), just as it's not recommended to use a snapshot for backup in a
database (although practically it could be implemented).

2. The stop-with-checkpoint mechanism is like stopping a database instance
with a data flush, and thus (IMHO) a different story from the
checkpoint/savepoint (db snapshot/backup) distinction.

3. In the long run we may improve checkpoints to allow a short enough
interval that they become some form of transactional log; then we could
enable checkpoint-based savepoints (like transactional-log-based backup), so
I agree to still call the new format in FLIP-41 a "Unified Format" although
in the short term it only unifies savepoints.

I've also written a document [1] with more details; please refer to it if
interested. Thanks!

[1] https://docs.google.com/document/d/1uE4R3wNal6e67FkDe0UvcnsIMMDpr35j

Best Regards,
Yu


On Thu, 6 Jun 2019 at 19:42, Aljoscha Krettek <aljos...@apache.org> wrote:

> Btw, I think this FLIP is a very good effort, we just need to reframe the
> effort a tiny bit. +1
>
> > On 6. Jun 2019, at 13:41, Aljoscha Krettek <aljos...@apache.org> wrote:
> >
> > Hi,
> >
> > I had a brief discussion with Stephan that helped me sort my thoughts on
> the broader topics of checkpoints, savepoints, binary formats,
> user-triggered checkpoints, and periodic savepoints. I’ll try to summarise
> my stance on this and also comment with the same message on the other
> relevant Jira Issues and threads.
> >
> > For reference, the relevant FLIP and Jira issues are these:
> >
> > -
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints:
> Unified Savepoint Format
> > - https://issues.apache.org/jira/browse/FLINK-12619: Add support for
> stop-with-checkpoint
> > - https://issues.apache.org/jira/browse/FLINK-6755: User-triggered
> checkpoints
> > - https://issues.apache.org/jira/browse/FLINK-4620: Automatically
> creating savepoints
> > - https://issues.apache.org/jira/browse/FLINK-4511: Schedule periodic
> savepoints
> >
> > There are roughly two different dimensions in the topic of
> savepoints/checkpoints (I’ll use snapshot as the generic term for both):
> > 1) who controls the snapshot
> > 2) what’s the (binary) format of the snapshot
> >
> > For 1), we currently have checkpoints and savepoints. Checkpoints are
> created by the system for fault tolerance. They are managed by the system
> and the system is free to discard them when it sees fit. Savepoints are in
> the control of the user. A user can choose to create a savepoint, they can
> delete them, they can restore from them at will. The system will not clean
> up savepoints. We should try and keep this separation and not muddle the
> two concepts.
> >
> > For 2), we currently have various different formats between the
> different state backends and also for the same backend. I.e. RocksDB can do
> full or incremental snapshots, local snapshots, and probably more.
> >
> > FLIP-41 aims at introducing a unified “savepoint” format that is
> interchangeable between the different state backends. In light of the above
> points, we should say that FLIP-41 aims to introduce a canonical format
> that is interchangeable between different backends. This doesn’t mean that
> we should tie this format strictly to savepoints, though. For performance
> reasons, users might choose to do savepoints that use one of the optimised
> formats that the backends offer, for example incremental snapshots. Or they
> might choose to use the canonical format for regular checkpoints so that
> they can always switch between backends using periodically created
> externalised checkpoints.
> >
> > The motivation behind FLINK-12619 is to have a more lightweight
> alternative for stop-with-savepoint, for example using the incremental
> snapshot format that RocksDB has. With the above in mind, however, this
> becomes “Add support for choosing the snapshot format for
> stop-with-savepoint”. It should not be stop-with-checkpoint, because
> checkpoints are something that the system manages and not something that
> the user should trigger. The same is true for FLINK-6755, the motivation is
> the same I think. The change should be called “Add support for choosing the
> snapshot format for savepoints”, however.
> >
> > For the last two Jira issues mentioned above it should be quite clear
> what I think. I do, however, see a need for potentially different
> overlapping checkpoint periods or intervals. Users might want to have their
> regular checkpoints use an optimised format but they also want to have a
> “canonical format” checkpoint every now and then so that the lineage of
> incremental checkpoints does not become too unwieldy.
> >
> > Please let me know what you think!
> >
> > Aljoscha
> >
> >> On 5. Jun 2019, at 10:36, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
> >>
> >> I want to quickly bump this discussion to gather more consensus from
> others
> >> on the FLIP, and see if we want to aim this for the upcoming 1.9.0
> release.
> >> The proposal touches binary formats of savepoints, which is a major
> part of
> >> Flink's public user interface, so having explicit approval from other
> >> members of the community would be nice here.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On Wed, May 29, 2019 at 11:45 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >> wrote:
> >>
> >>> I also should point out something that I forgot to mention in the
> initial
> >>> post:
> >>> Stefan has helped a lot in understanding the current status of state
> >>> backends and also participated a lot in design choices for the FLIP :)
> >>>
> >>> On Wed, May 29, 2019 at 5:02 AM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >>> wrote:
> >>>
> >>>> Hi Flink devs,
> >>>>
> >>>> Congxian, Kostas, and I have recently been discussing unifying the
> binary
> >>>> formats for keyed state in savepoints, which would allow for more
> >>>> operational flexibility such as swapping state backends across
> restores.
> >>>>
> >>>> As part of this FLIP, another main proposal is to start allowing
> >>>> checkpoints and savepoints to have different formats. Savepoint
> formats
> >>>> should in the future be designed with interoperability in mind, accepting
> >>>> reasonable snapshot / restore overhead, while
> checkpoints are
> >>>> allowed to be backend specific for more efficient snapshots and
> restores.
> >>>> From recent proposals in the state backends such as disk-spilling heap
> >>>> backend [1], this flexibility seems to be reasonable.
> >>>>
> >>>> The main user-facing API this would affect is, of course, the binary
> >>>> formats of savepoints, as well as the fact that we will no longer be
> >>>> guaranteeing functional parity between savepoints and full
> checkpoints in
> >>>> the future (w.r.t. operational features related to upgrading
> applications;
> >>>> so far they have equal functionality).
> >>>>
> >>>> Therefore, we would like to collect feedback on the proposal before
> >>>> continuing efforts.
> >>>>
> >>>> This is the FLIP:
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Keyed+State+Snapshot+Binary+Format+for+Savepoints
> >>>> .
> >>>>
> >>>> I'm happy to discuss details and am looking forward to any feedback.
> >>>>
> >>>> Cheers,
> >>>> Gordon
> >>>>
> >>>> [1]
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Proposal-to-support-disk-spilling-in-HeapKeyedStateBackend-td29109.html
> >>>>
> >>>
> >
>
>
