Re: [DISCUSS] FLIP-203: Incremental savepoints

David Morávek Fri, 21 Jan 2022 00:58:03 -0800

As per an offline discussion with Piotr: we have some ways to deal with
possible incompatibilities (hiding them behind a feature flag, or shipping
a new state backend implementation). Faster adoption of new Flink releases
is more valuable in this context than possible overhead on the
implementation side.
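
To make the feature-flag option a bit more concrete, here is a minimal sketch in Python (not Flink code) of a key-group-to-subtask assignment that keeps contiguous-range partitioning as the default and only switches to consistent hashing when a flag is set. MAX_PARALLELISM, the flag name use_consistent_hashing, the ring construction and all function names are assumptions made for this illustration; none of it reflects an actual Flink API or an agreed design.

```python
import bisect
import hashlib

MAX_PARALLELISM = 128  # number of key groups; an assumed value for this sketch


def range_owner(key_group: int, parallelism: int) -> int:
    """Range partitioning: each subtask owns one contiguous block of key groups."""
    return key_group * parallelism // MAX_PARALLELISM


def _h(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def build_ring(parallelism: int, vnodes: int = 32) -> list:
    """Hash ring with `vnodes` points per subtask, sorted by ring position."""
    return sorted((_h(f"subtask-{s}/vnode-{v}"), s)
                  for s in range(parallelism) for v in range(vnodes))


def ring_owner(key_group: int, ring: list) -> int:
    """Consistent hashing: owner is the first ring point after the key group's hash."""
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, _h(f"key-group-{key_group}")) % len(ring)
    return ring[i][1]


def assign(parallelism: int, use_consistent_hashing: bool = False) -> dict:
    """Map every key group to a subtask; the (hypothetical) flag selects the layout."""
    if use_consistent_hashing:
        ring = build_ring(parallelism)
        return {kg: ring_owner(kg, ring) for kg in range(MAX_PARALLELISM)}
    return {kg: range_owner(kg, parallelism) for kg in range(MAX_PARALLELISM)}


def moved(before: dict, after: dict) -> set:
    """Key groups whose owning subtask changed between two assignments."""
    return {kg for kg in before if before[kg] != after[kg]}
```

With range partitioning, going from 4 to 5 subtasks also shifts key groups between the existing subtasks. With the ring, the only key groups that move are the ones picked up by the new subtask, which is the property that should help re-scaling. Because the flag defaults to the old layout, existing snapshots would stay readable unless a user opts in.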


+1 for the effort, no more concerns from my side

D.

On Wed, Jan 19, 2022 at 9:54 AM David Morávek <d...@apache.org> wrote:

>> For some users canonical savepoints are prohibitively expensive either to
>> take or to recover from. To a point where the system is unable to complete
>> them before some timeout/failure happens.
>
>
> I'd say the same logic applies to re-scaling the job from a RocksDB-based
> checkpoint / savepoint. I'm still not positive that we can solve the
> re-scaling performance without disrupting the backward compatibility. Just
> for more context, this is something we need to solve for getting reactive
> mode / auto-scaling broadly adopted.
>
> Overall, I like the idea of making the migration path as smooth as
> possible for end users as this may allow for faster adoption of new Flink
> versions, but there are new problems this might introduce and we should be
> aware of them.
>
> D.
>
> On Wed, Jan 19, 2022 at 9:04 AM Piotr Nowojski <pnowoj...@apache.org>
> wrote:
>
>> Hi David,
>>
>> I didn't mean "best effort". It's just that we would be relying on a 3rd
>> party system, which, quoting [1]:
>>
>> > RocksDB goes to great lengths to ensure data remains both forward- and
>> backward-compatible
>>
>> But it's still a 3rd party system that we do not control. It's the same
>> with KafkaClient for example. Yes, it's supposed to be backward/forward
>> compatible, but we do not control it. Moreover, even for things that we do
>> control, and we claim that they are "stable" "public" and we do guarantee
>> backward compatibility, we might break compatibility from time to time. We
>> did things like that in the past if there was no way of fixing a bug
>> without some breaking change.
>>
>> > Why are the canonical savepoints not good enough for supporting minor
>> version upgrades?
>>
>> For some users canonical savepoints are prohibitively expensive either to
>> take or to recover from. To a point where the system is unable to complete
>> them before some timeout/failure happens.
>>
>> Besides, I would really like for the sake of simplicity (from the user's
>> perspective) to keep both canonical and native savepoints as close
>> together
>> as possible.
>>
>> Best,
>> Piotrek
>>
>> [1] https://dl.acm.org/doi/fullHtml/10.1145/3483840
>>
>>
>> On Tue, Jan 18, 2022 at 7:27 PM David Morávek <d...@apache.org> wrote:
>>
>> > Hi Piotr,
>> >
>> > thanks for following up on this,
>> >
>> > > Regarding the RocksDB incompatibility, I think we can claim that Flink
>> > > version upgrades are/will be supported. If we ever need to break
>> > > backward compatibility by bumping the RocksDB version in such a way that
>> > > RocksDB won't be able to provide that compatibility, we will need to make
>> > > this a prominent notice in the release notes.
>> >
>> >
>> > I'm still not sure about this. Would that mean giving up flexibility for
>> > the future improvements to the state backend infrastructure? For example
>> > I'm experimenting along the lines of switching range partitioning for
>> > consistent hashing to speed up re-scaling. Just this simple change (or
>> any
>> > other change to layout in general) would make the checkpoints
>> incompatible
>> > or it would involve non-trivial effort to support migration from older
>> > versions.
>> >
>> > Maybe it's just about how it's phrased, what you're suggesting fits
>> > somewhere between "best effort compatibility" & "guaranteed
>> compatibility".
>> > If we say it's best effort ("we're free to change anything, but this
>> > shouldn't be done blindly"), then it should provide the flexibility we
>> > need.
>> >
>> > Why are the canonical savepoints not good enough for supporting minor
>> > version upgrades? Yes with some larger state this might be time
>> consuming,
>> > but wouldn't these use-cases benefit more from further optimizations to
>> the
>> > state backend? Also how often are users doing these updates (I'd say max
>> > twice a year if they follow-up the release cycle)?
>> >
>> > Best,
>> > D.
>> >
>> > On Tue, Jan 18, 2022 at 3:51 PM Piotr Nowojski <pnowoj...@apache.org>
>> > wrote:
>> >
>> > > Hi All,
>> > >
>> > > Yu, sorry I was somehow confused about the configuration. I've changed
>> > > the FLIP as you suggested.
>> > >
>> > > Yu/Yun Tang:
>> > >
>> > > Regarding the RocksDB incompatibility, I think we can claim that Flink
>> > > version upgrades are/will be supported. If we ever need to break
>> > > backward compatibility by bumping the RocksDB version in such a way that
>> > > RocksDB won't be able to provide that compatibility, we will need to make
>> > > this a prominent notice in the release notes.
>> > >
>> > > > I have used the State Processor API with aligned, full checkpoints.
>> > There
>> > > it has worked just fine.
>> > >
>> > > Thanks for this information.
>> > >
>> > > > 0) What exactly does the "State Processor API" row mean? Is it: Can
>> it
>> > be
>> > > > read by the State Processor API? Can it be written by the State
>> > Processor
>> > > > API? Both? Something else?
>> > >
>> > > Good question. I'm not sure how the State Processor API works. Can someone
>> > > help answer what we guarantee/support right now and what we can
>> > > reasonably support?
>> > >
>> > > > 1) and 2)
>> > >
>> > > I guess you are simply in favour of the 2nd proposal? So
>> > >
>> > > * rescaling
>> > > * Job upgrade w/o changing graph shape and record types
>> > > * Flink bug/patch (1.14.x → 1.14.y) version upgrade
>> > >
>> > > + Flink minor (1.x → 1.y) version upgrade
>> > >
>> > > which I think is important for native savepoints to be true savepoints.
>> > >
>> > > > 3) Should "Job upgrade w/o changing graph shape and record types" be
>> > > split? I guess "record types" is only relevant for unaligned
>> checkpoints.
>> > >
>> > > Shape of the job graph is also an issue with unaligned checkpoints.
>> > > Changing record types/serialisation causes obvious problems with the
>> > > in-flight records, but if you change job graph via for example
>> changing
>> > > type of the network connection (like random -> broadcast, keyed -> non
>> > > keyed), or remove some operators, we also have problems with the
>> > in-flight
>> > > records in the affected connections.
>> > >
>> > > > 4)
>> > >
>> > > I think configuration change should be always supported. If you think
>> > > that's important, I can add this to the documentation/FLIP proposal
>> as a
>> > > separate row.
>> > >
>> > > > 5) Do the guarantees that a Savepoint/Checkpoint provide change when
>> > > > generalized incremental checkpoints [1] are enabled? My
>> understanding
>> > is:
>> > > > No, the same guarantees apply.
>> > >
>> > > This will be more tricky and it will highly depend on the FLIP-158
>> > > implementation.
>> > >
>> > > Yun:
>> > >
>> > > >   1.  From my understanding, the native savepoint appears much closer to
>> > > > the current aligned checkpoint. What's the difference between them?
>> > >
>> > > Technically there would be no difference, but we might decide to limit
>> > what
>> > > we officially support, to allow us easier changes in the future. Just
>> as
>> > > for the most part so far between savepoint and checkpoints there was
>> very
>> > > little difference. For us, as developers, the fewer things we
>> officially
>> > > support and claim are stable, the better.
>> > >
>> > > > 2.  If self-contained and relocatable are the most important
>> > difference,
>> > > why not include them in the proposal table?
>> > >
>> > > Good point. I will add this.
>> > >
>> > > >  What does "Job full upgrade" mean?
>> > >
>> > > I have clarified it to:
>> > > > Arbitrary job upgrade (changed graph shape/record types)
>> > >
>> > > It's an arbitrary job change. Anything that doesn't fall into the
>> second
>> > > category "Job upgrade w/o changing graph shape and record types"
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > On Mon, Jan 17, 2022 at 11:25 AM Yun Tang <myas...@live.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > Thanks to Piotr for driving this topic.
>> > > >
>> > > > I have several questions on this FLIP.
>> > > >
>> > > >   1.  From my understanding, the native savepoint appears much closer to
>> > > >       the current aligned checkpoint. What's the difference between them?
>> > > >   2.  If being self-contained and relocatable is the most important
>> > > >       difference, why not include them in the proposal table?
>> > > >   3.  What does "Job full upgrade" mean?
>> > > >
>> > > > Regarding RocksDB upgrades: this depends on RocksDB's backward
>> > > > compatibility [1], which has proven to work very well, as the
>> > > > documentation says.
>> > > >
>> > > >
>> > > > [1] https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases
>> > > >
>> > > > Best,
>> > > > Yun Tang
>> > > >
>> > > >
>> > > >
>> > > > ________________________________
>> > > > From: Konstantin Knauf <konstan...@ververica.com>
>> > > > Sent: Friday, January 14, 2022 20:39
>> > > > To: dev <dev@flink.apache.org>; Seth Wiesman <s...@ververica.com>;
>> > Nico
>> > > > Kruber <n...@ververica.com>; dander...@apache.org <
>> > dander...@apache.org>
>> > > > Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints
>> > > >
>> > > > Hi everyone,
>> > > >
>> > > > Thank you, Piotr. Please find my thoughts on the topic below:
>> > > >
>> > > > 0) What exactly does the "State Processor API" row mean? Is it: Can
>> it
>> > be
>> > > > read by the State Processor API? Can it be written by the State
>> > Processor
>> > > > API? Both? Something else?
>> > > >
>> > > > 1) If we take the assumption from FLIP-193 "that ownership should be
>> > the
>> > > > only difference between Checkpoints and Savepoints.", we would need
>> to
>> > > work
>> > > > in the direction of "Proposal 2". The distinction would then be the
>> > > > following:
>> > > >
>> > > > * Canonical Savepoint = Guarantees A
>> > > > * Canonical Checkpoint = Guarantees A (in theory; does not exist)
>> > > > * Aligned, Native Checkpoint = Guarantees B
>> > > > * Aligned, Native Savepoint = Guarantees B
>> > > > * Unaligned, Native Checkpoint = Guarantees C
>> > > > * Unaligned, Native Savepoint = Guarantees C (if this would exist in
>> > the
>> > > > future)
>> > > >
>> > > > I think it is important not to make this matrix too complicated, e.g.
>> > > > ending up with 8 different sets of guarantees depending on all kinds of
>> > > > more or less well-known configuration options.
>> > > >
>> > > > 2) With respect to the concrete guarantees, I believe, it's
>> important
>> > > that
>> > > > we can cover all important use cases in "green", so that users can
>> rely
>> > > on
>> > > > official, tested behavior in regular operations. In my experience
>> this
>> > > > includes manual recovery of a Job from a retained checkpoint. I
>> would
>> > > argue
>> > > > that most users operating a long-running, stateful Apache Flink
>> > > application
>> > > > have been in the situation, where a graceful "stop" was not possible
>> > > > anymore, because the Job was unable to take a Savepoint. This could
>> be,
>> > > > because the Job is frequently restarting (e.g. poison pill) or
>> because
>> > it
>> > > > fails on taking the Savepoint itself for some reason (e.g. unable to
>> > > commit
>> > > > a transaction to an external system). The solution strategy in this
>> > > > scenario is to cancel the job, make some changes to the Job or
>> > > > configuration that fix the problem and restore from the last
>> successful
>> > > > (retained) checkpoint. I think the following changes would need to
>> be
>> > > > officially supported for Native Checkpoints/Savepoint (Guarantees B,
>> > > > ideally also Guarantees C), in order to fix a Job in most of these
>> > cases.
>> > > >
>> > > > * rescaling
>> > > > * Job upgrade w/o changing graph shape and record types
>> > > > * Flink bug/patch (1.14.x → 1.14.y) version upgrade
>> > > >
>> > > > I would be very interested to hear from users as well as people like
>> > > Seth,
>> > > > Nico or David (cc), who work with many users, what  in their
>> experience
>> > > > would be needed here.
>> > > >
>> > > > 3) Should "Job upgrade w/o changing graph shape and record types" be
>> > > split?
>> > > > I guess "record types" is only relevant for unaligned checkpoints.
>> > > >
>> > > > 4) Does it make sense to consider Flink configuration changes
>> besides
>> > the
>> > > > statebackend type as another row? Maybe split by "pipeline.*"
>> options,
>> > > > "execution.*" options, and whichever other categories would make
>> sense.
>> > > > Just to give a few examples: it should be *officially* supported to take a
>> > > > native retained checkpoint and restart the Job with a different
>> > > > pipeline.auto-watermark-interval and different high-availability
>> > > > configurations.
>> > > >
>> > > > 5) Do the guarantees that a Savepoint/Checkpoint provide change when
>> > > > generalized incremental checkpoints [1] are enabled? My
>> understanding
>> > is:
>> > > > No, the same guarantees apply.
>> > > >
>> > > > Cheers and thank you,
>> > > >
>> > > > Konstantin
>> > > >
>> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints?src=contextnavpagetreemode
>> > > >
>> > > > On Fri, Jan 14, 2022 at 11:24 AM David Anderson <
>> dander...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > > I have a very similar question to State Processor API. Is it the
>> > same
>> > > > > scenario in this case?
>> > > > > > Should it also be working with checkpoints but might be just
>> > > untested?
>> > > > >
>> > > > > I have used the State Processor API with aligned, full
>> checkpoints.
>> > > There
>> > > > > it has worked just fine.
>> > > > >
>> > > > > David
>> > > > >
>> > > > > On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski <
>> > pnowoj...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Thanks for the comments and questions. Starting from the top:
>> > > > > >
>> > > > > > Seth: good point about schema evolution. Actually, I have a very similar
>> > > > > > question about the State Processor API. Is it the same scenario in this
>> > > > > > case? Should it also work with checkpoints, but might just be untested?
>> > > > > >
>> > > > > > And next question, should we commit to supporting those two
>> things
>> > > > (State
>> > > > > > Processor API and schema evolution) for native savepoints? What
>> > about
>> > > > > > aligned checkpoints? (please check [1] for that).
>> > > > > >
>> > > > > > Yu Li: 1, 2 and 4 done.
>> > > > > >
>> > > > > > > 3. How about changing the description of "the default
>> > configuration
>> > > > of
>> > > > > > the
>> > > > > > > checkpoints will be used to determine whether the savepoint
>> > should
>> > > be
>> > > > > > > incremental or not" to something like "the
>> > > > `state.backend.incremental`
>> > > > > > > setting now denotes the type of native format snapshot and
>> will
>> > > take
>> > > > > > effect
>> > > > > > > for both checkpoint and savepoint (with native type)", to
>> prevent
>> > > > > concept
>> > > > > > > confusion between checkpoint and savepoint?
>> > > > > >
>> > > > > > Is `state.backend.incremental` the only configuration parameter
>> > that
>> > > > can
>> > > > > be
>> > > > > > used in this context? I would guess not? What about for example
>> > > > > > "state.storage.fs.memory-threshold" or all of the Advanced
>> RocksDB
>> > > > State
>> > > > > > Backends Options [2]?
>> > > > > >
>> > > > > > David:
>> > > > > >
>> > > > > > > does this mean that we need to keep the checkpoints compatible
>> > > across
>> > > > > > minor
>> > > > > > > versions? Or can we say, that the minor version upgrades are
>> only
>> > > > > > > guaranteed with canonical savepoints?
>> > > > > >
>> > > > > > Good question. Frankly I was always assuming that this is
>> > implicitly
>> > > > > given.
>> > > > > > Otherwise users would not be able to recover jobs that are
>> failing
>> > > > > because
>> > > > > > of bugs in Flink. But I'm pretty sure that was never explicitly
>> > > stated.
>> > > > > >
>> > > > > > As Konstantin suggested, I've written down the pre-existing
>> > > guarantees
>> > > > of
>> > > > > > checkpoints and savepoints followed by two proposals on how they
>> > > should
>> > > > > be
>> > > > > > changed [1]. Could you take a look?
>> > > > > >
>> > > > > > I'm especially unsure about the following things:
>> > > > > > a) What about RocksDB upgrades? If we bump RocksDB version
>> between
>> > > > Flink
>> > > > > > versions, do we support recovering from a native format snapshot
>> > > > > > (incremental checkpoint)?
>> > > > > > b) State Processor API - both pre-existing and what do we want
>> to
>> > > > provide
>> > > > > > in the future
>> > > > > > c) Schema Evolution - both pre-existing and what do we want to
>> > > provide
>> > > > in
>> > > > > > the future
>> > > > > >
>> > > > > > Best,
>> > > > > > Piotrek
>> > > > > >
>> > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
>> > > > > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
>> > > > > >
>> > > > > > On Tue, Jan 11, 2022 at 9:45 AM Konstantin Knauf <kna...@apache.org> wrote:
>> > > > > >
>> > > > > > > Hi Piotr,
>> > > > > > >
>> > > > > > > would it be possible to provide a table that shows the
>> > > > > > > compatibility guarantees provided by the different snapshots
>> > going
>> > > > > > forward?
>> > > > > > > Like type of change (Topology, State Schema, Parallelism, ...) in one
>> > > > > > > dimension, and type of snapshot as the other dimension. Based on that, it
>> > > > > > > would be easier to discuss those guarantees, I believe.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > >
>> > > > > > > Konstantin
>> > > > > > >
>> > > > > > > On Mon, Jan 3, 2022 at 9:11 AM David Morávek <d...@apache.org> wrote:
>> > > > > > >
>> > > > > > > > Hi Piotr,
>> > > > > > > >
>> > > > > > > > does this mean that we need to keep the checkpoints
>> compatible
>> > > > across
>> > > > > > > minor
>> > > > > > > > versions? Or can we say, that the minor version upgrades are
>> > only
>> > > > > > > > guaranteed with canonical savepoints?
>> > > > > > > >
>> > > > > > > > My concern is especially if we'd want to change layout of
>> the
>> > > > > > checkpoint.
>> > > > > > > >
>> > > > > > > > D.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, Dec 29, 2021 at 5:19 AM Yu Li <car...@gmail.com>
>> > wrote:
>> > > > > > > >
>> > > > > > > > > Thanks for the proposal Piotr! Overall I'm +1 for the
>> idea,
>> > and
>> > > > > below
>> > > > > > > are
>> > > > > > > > > my two cents:
>> > > > > > > > >
>> > > > > > > > > 1. How about adding a "Term Definition" section and
>> clarify
>> > > what
>> > > > > > > "native
>> > > > > > > > > format" (the "native" data persistence format of the
>> current
>> > > > state
>> > > > > > > > backend)
>> > > > > > > > > and "canonical format" (the "uniform" format that supports
>> > > > > switching
>> > > > > > > > state
>> > > > > > > > > backends) means?
>> > > > > > > > >
>> > > > > > > > > 2. IIUC, currently the FLIP proposes to only support
>> > > incremental
>> > > > > > > > savepoint
>> > > > > > > > > with native format, and there's no plan to add such
>> support
>> > for
>> > > > > > > canonical
>> > > > > > > > > format, right? If so, how about writing this down
>> explicitly
>> > in
>> > > > the
>> > > > > > > FLIP
>> > > > > > > > > doc, maybe in a "Limitations" section, plus the fact that
>> > > > > > > > > `HashMapStateBackend` cannot support incremental savepoints before
>> > > > > > > > > FLIP-151 is done? (Side note: @Roman, just a kind reminder to please
>> > > > > > > > > take FLIP-203 into account when implementing FLIP-151.)
>> > > > > > > > >
>> > > > > > > > > 3. How about changing the description of "the default
>> > > > configuration
>> > > > > > of
>> > > > > > > > the
>> > > > > > > > > checkpoints will be used to determine whether the
>> savepoint
>> > > > should
>> > > > > be
>> > > > > > > > > incremental or not" to something like "the
>> > > > > > `state.backend.incremental`
>> > > > > > > > > setting now denotes the type of native format snapshot and
>> > will
>> > > > > take
>> > > > > > > > effect
>> > > > > > > > > for both checkpoint and savepoint (with native type)", to
>> > > prevent
>> > > > > > > concept
>> > > > > > > > > confusion between checkpoint and savepoint?
>> > > > > > > > >
>> > > > > > > > > 4. How about putting the notes of behavior change (the
>> > default
>> > > > type
>> > > > > > of
>> > > > > > > > > savepoint will be changed to `native` in the future, and
>> by
>> > > then
>> > > > > the
>> > > > > > > > taken
>> > > > > > > > > savepoint cannot be used to switch state backends by
>> default)
>> > > to
>> > > > a
>> > > > > > more
>> > > > > > > > > obvious place, for example moving from the "CLI" section
>> to
>> > the
>> > > > > > > > > "Compatibility" section? (although it will only happen in
>> > 1.16
>> > > > > > release
>> > > > > > > > > based on the proposed plan)
>> > > > > > > > >
>> > > > > > > > > And all above suggestions apply for our user-facing
>> document
>> > > > after
>> > > > > > the
>> > > > > > > > FLIP
>> > > > > > > > > is (partially or completely, accordingly) done, if taken
>> > > (smile).
>> > > > > > > > >
>> > > > > > > > > Best Regards,
>> > > > > > > > > Yu
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, 21 Dec 2021 at 22:23, Seth Wiesman <
>> > > sjwies...@gmail.com>
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > >> AFAIK state schema evolution should work both for
>> native
>> > > and
>> > > > > > > > canonical
>> > > > > > > > > > >> savepoints.
>> > > > > > > > > >
>> > > > > > > > > > Schema evolution does technically work for both
>> formats, it
>> > > > > happens
>> > > > > > > > after
>> > > > > > > > > > the code paths have been unified, but the community has
>> up
>> > > > until
>> > > > > > this
>> > > > > > > > > point
>> > > > > > > > > > considered that an unsupported feature. From my
>> perspective
>> > > > > making
>> > > > > > > this
>> > > > > > > > > > supported could be as simple as adding test coverage but
>> > > that's
>> > > > > an
>> > > > > > > > active
>> > > > > > > > > > decision we'd need to make.
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski <
>> > > > > > pnowoj...@apache.org
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Konstantin,
>> > > > > > > > > > >
>> > > > > > > > > > > > In this context: will the native format support
>> state
>> > > > schema
>> > > > > > > > > evolution?
>> > > > > > > > > > > If
>> > > > > > > > > > > > not, I am not sure, we can let the format default to
>> > > > native.
>> > > > > > > > > > >
>> > > > > > > > > > > AFAIK state schema evolution should work both for
>> native
>> > > and
>> > > > > > > > canonical
>> > > > > > > > > > > savepoints.
>> > > > > > > > > > >
>> > > > > > > > > > > Regarding what is/will be supported we will document
>> as
>> > > part
>> > > > of
>> > > > > > > this
>> > > > > > > > > > > FLIP-203. But it's not as simple as just the
>> difference
>> > > > between
>> > > > > > > > native
>> > > > > > > > > > and
>> > > > > > > > > > > canonical formats.
>> > > > > > > > > > >
>> > > > > > > > > > > Best, Piotrek
>> > > > > > > > > > >
>> > > > > > > > > > > On Mon, Dec 20, 2021 at 2:28 PM Konstantin Knauf <kna...@apache.org> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Piotr,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks a lot for starting the discussion. Big +1.
>> > > > > > > > > > > >
>> > > > > > > > > > > > In my understanding, this FLIP introduces the
>> snapshot
>> > > > format
>> > > > > > as
>> > > > > > > a
>> > > > > > > > > > > *really*
>> > > > > > > > > > > > user facing concept. IMO it is important that we
>> > document
>> > > > > > > > > > > >
>> > > > > > > > > > > > a) that it is no longer the checkpoint/savepoint characteristics that
>> > > > > > > > > > > > determine the kind of changes that a snapshot allows (user code, state
>> > > > > > > > > > > > schema evolution, topology changes); this now becomes a property of the
>> > > > > > > > > > > > format, regardless of whether it is a savepoint or a checkpoint
>> > > > > > > > > > > > b) the exact changes that each format allows (code, state schema,
>> > > > > > > > > > > > topology, state backend, max parallelism)
>> > > > > > > > > > > >
>> > > > > > > > > > > > In this context: will the native format support
>> state
>> > > > schema
>> > > > > > > > > evolution?
>> > > > > > > > > > > If
>> > > > > > > > > > > > not, I am not sure, we can let the format default to
>> > > > native.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Konstantin
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski <
>> > > > > > > > pnowoj...@apache.org
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Hi devs,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > I would like to start a discussion about a previously announced
>> > > > > > > > > > > > > > follow-up to FLIP-193 [1], namely allowing savepoints to be in native
>> > > > > > > > > > > > > > format and incremental. The changes do not seem invasive. The full
>> > > > > > > > > > > > > > proposal is written down as FLIP-203: Incremental savepoints [2].
>> > > > > > > > > > > > > > Please take a look, and let me know what you think.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > Piotrek
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > [1]
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>> > > > > > > > > > > > > [2]
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > --
>> > > > > > > > > > > >
>> > > > > > > > > > > > Konstantin Knauf
>> > > > > > > > > > > >
>> > > > > > > > > > > > https://twitter.com/snntrable
>> > > > > > > > > > > >
>> > > > > > > > > > > > https://github.com/knaufk
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Konstantin Knauf
>> > > > > > >
>> > > > > > > https://twitter.com/snntrable
>> > > > > > >
>> > > > > > > https://github.com/knaufk
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Konstantin Knauf | Head of Product
>> > > >
>> > >
>> >
>>
>
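
For reference, the snapshot-type matrix from Konstantin's mail quoted above can be written down as a small lookup table. This is only a sketch in Python: the format and kind labels, the change names and the function is_listed are hypothetical spellings chosen for the example. Only the list for "Guarantees B" is filled in, using the changes named in the thread plus the minor-version upgrade Piotr proposed adding; the contents of sets A and C are not spelled out in the thread, so they are left as unknown rather than guessed.

```python
from typing import Optional

# (format, kind) -> guarantee set, following the matrix quoted in the thread.
GUARANTEE_SET = {
    ("canonical", "savepoint"): "A",
    ("canonical", "checkpoint"): "A",         # in theory; does not exist
    ("aligned-native", "checkpoint"): "B",
    ("aligned-native", "savepoint"): "B",
    ("unaligned-native", "checkpoint"): "C",
    ("unaligned-native", "savepoint"): "C",   # if this would exist in the future
}

# Changes discussed for "Guarantees B". The last entry is the proposed
# addition from the thread, not something already agreed on.
LISTED_CHANGES = {
    "B": {
        "rescaling",
        "job upgrade w/o changing graph shape and record types",
        "flink patch version upgrade",
        "flink minor version upgrade",
    },
    # "A" and "C" are intentionally missing: the thread does not list them.
}


def is_listed(fmt: str, kind: str, change: str) -> Optional[bool]:
    """True/False if the guarantee set is spelled out, None if it is unknown."""
    label = GUARANTEE_SET[(fmt, kind)]
    changes = LISTED_CHANGES.get(label)
    if changes is None:
        return None
    return change in changes
```

For example, is_listed("aligned-native", "savepoint", "rescaling") returns True, while any query against a canonical savepoint returns None until set A is written down. Keeping one entry per guarantee set, rather than one per configuration option, is what keeps the matrix small.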
