> I have a very similar question to State Processor API. Is it the same scenario in this case?
> Should it also be working with checkpoints but might be just untested?

I have used the State Processor API with aligned, full checkpoints. There it has worked just fine.

David

On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski <pnowoj...@apache.org> wrote:

> Hi,
>
> Thanks for the comments and questions. Starting from the top:
>
> Seth: good point about schema evolution. Actually, I have a very similar question to State Processor API. Is it the same scenario in this case? Should it also be working with checkpoints but might be just untested?
>
> And next question, should we commit to supporting those two things (State Processor API and schema evolution) for native savepoints? What about aligned checkpoints? (please check [1] for that).
>
> Yu Li: 1, 2 and 4 done.
>
> > 3. How about changing the description of "the default configuration of the checkpoints will be used to determine whether the savepoint should be incremental or not" to something like "the `state.backend.incremental` setting now denotes the type of native format snapshot and will take effect for both checkpoint and savepoint (with native type)", to prevent concept confusion between checkpoint and savepoint?
>
> Is `state.backend.incremental` the only configuration parameter that can be used in this context? I would guess not? What about for example "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State Backends Options [2]?
>
> David:
>
> > does this mean that we need to keep the checkpoints compatible across minor versions? Or can we say, that the minor version upgrades are only guaranteed with canonical savepoints?
>
> Good question. Frankly I was always assuming that this is implicitly given. Otherwise users would not be able to recover jobs that are failing because of bugs in Flink. But I'm pretty sure that was never explicitly stated.
>
> As Konstantin suggested, I've written down the pre-existing guarantees of checkpoints and savepoints followed by two proposals on how they should be changed [1]. Could you take a look?
>
> I'm especially unsure about the following things:
> a) What about RocksDB upgrades? If we bump RocksDB version between Flink versions, do we support recovering from a native format snapshot (incremental checkpoint)?
> b) State Processor API - both pre-existing and what do we want to provide in the future
> c) Schema Evolution - both pre-existing and what do we want to provide in the future
>
> Best,
> Piotrek
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
> [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
>
> On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf <kna...@apache.org> wrote:
>
> > Hi Piotr,
> >
> > would it be possible to provide a table that shows the compatibility guarantees provided by the different snapshots going forward? Like type of change (Topology, State Schema, Parallelism, ..) in one dimension, and type of snapshot as the other dimension. Based on that, it would be easier to discuss those guarantees, I believe.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Jan 3, 2022 at 9:11 AM David Morávek <d...@apache.org> wrote:
> >
> > > Hi Piotr,
> > >
> > > does this mean that we need to keep the checkpoints compatible across minor versions? Or can we say, that the minor version upgrades are only guaranteed with canonical savepoints?
> > >
> > > My concern is especially if we'd want to change layout of the checkpoint.
> > >
> > > D.
> > >
> > > On Wed, Dec 29, 2021 at 5:19 AM Yu Li <car...@gmail.com> wrote:
> > >
> > > > Thanks for the proposal Piotr!
> > > > Overall I'm +1 for the idea, and below are my two cents:
> > > >
> > > > 1. How about adding a "Term Definition" section and clarifying what "native format" (the "native" data persistence format of the current state backend) and "canonical format" (the "uniform" format that supports switching state backends) mean?
> > > >
> > > > 2. IIUC, currently the FLIP proposes to only support incremental savepoint with native format, and there's no plan to add such support for canonical format, right? If so, how about writing this down explicitly in the FLIP doc, maybe in a "Limitations" section, plus the fact that `HashMapStateBackend` cannot support incremental savepoint before FLIP-151 is done? (side note: @Roman just a kind reminder: please take FLIP-203 into account when implementing FLIP-151)
> > > >
> > > > 3. How about changing the description of "the default configuration of the checkpoints will be used to determine whether the savepoint should be incremental or not" to something like "the `state.backend.incremental` setting now denotes the type of native format snapshot and will take effect for both checkpoint and savepoint (with native type)", to prevent concept confusion between checkpoint and savepoint?
> > > >
> > > > 4. How about putting the notes of behavior change (the default type of savepoint will be changed to `native` in the future, and by then the taken savepoint cannot be used to switch state backends by default) to a more obvious place, for example moving from the "CLI" section to the "Compatibility" section?
> > > > (although it will only happen in 1.16 release based on the proposed plan)
> > > >
> > > > And all above suggestions apply for our user-facing document after the FLIP is (partially or completely, accordingly) done, if taken (smile).
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > On Tue, 21 Dec 2021 at 22:23, Seth Wiesman <sjwies...@gmail.com> wrote:
> > > >
> > > > > >> AFAIK state schema evolution should work both for native and canonical savepoints.
> > > > >
> > > > > Schema evolution does technically work for both formats, it happens after the code paths have been unified, but the community has up until this point considered that an unsupported feature. From my perspective making this supported could be as simple as adding test coverage but that's an active decision we'd need to make.
> > > > >
> > > > > On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > > In this context: will the native format support state schema evolution? If not, I am not sure, we can let the format default to native.
> > > > > >
> > > > > > AFAIK state schema evolution should work both for native and canonical savepoints.
> > > > > >
> > > > > > Regarding what is/will be supported we will document as part of this FLIP-203. But it's not as simple as just the difference between native and canonical formats.
> > > > > >
> > > > > > Best, Piotrek
> > > > > >
> > > > > > On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf <kna...@apache.org> wrote:
> > > > > >
> > > > > > > Hi Piotr,
> > > > > > >
> > > > > > > Thanks a lot for starting the discussion. Big +1.
> > > > > > >
> > > > > > > In my understanding, this FLIP introduces the snapshot format as a *really* user facing concept. IMO it is important that we document
> > > > > > >
> > > > > > > a) that it is no longer the checkpoint/savepoint characteristics that determine the kind of changes that a snapshot allows (user code, state schema evolution, topology changes), but now this becomes a property of the format regardless of whether this is a savepoint or a checkpoint
> > > > > > > b) the exact changes that each format allows (code, state schema, topology, state backend, max parallelism)
> > > > > > >
> > > > > > > In this context: will the native format support state schema evolution? If not, I am not sure, we can let the format default to native.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Konstantin
> > > > > > >
> > > > > > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi devs,
> > > > > > > >
> > > > > > > > I would like to start a discussion about a previously announced follow up of the FLIP-193 [1], namely allowing savepoints to be in native format and incremental. The changes do not seem invasive. The full proposal is written down as FLIP-203: Incremental savepoints [2]. Please take a look, and let me know what you think.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Piotrek
> > > > > > > >
> > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> > > > > > >
> > > > > > > --
> > > > > > > Konstantin Knauf
> > > > > > > https://twitter.com/snntrable
> > > > > > > https://github.com/knaufk
> >
> > --
> > Konstantin Knauf
> > https://twitter.com/snntrable
> > https://github.com/knaufk