[ https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731784#comment-15731784 ]
ASF GitHub Bot commented on FLINK-5051:
---------------------------------------

GitHub user StefanRRichter opened a pull request:

    https://github.com/apache/flink/pull/2962

    [FLINK-5051] Backwards compatibility for serializers in backends

    This PR sits on top of PR #2781 and introduces future backwards
    compatibility for state serializers and backends.

    We do so by providing version compatibility checking for TypeSerializer
    and by making the serializers a mandatory part of a keyed backend's meta
    data in checkpoints (so that we have everything required to reconstruct
    states in a self-contained way).

    A serialization proxy is introduced for the keyed state backend and the
    operator state backend. Currently this serialization proxy covers the
    meta data, not yet the actual data.

    For the most part, the PR moves functionality to a different place or
    makes formats more explicit.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StefanRRichter/flink serializer-backwards-compatibility-operator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2962

----
commit a373585c2fe71b467f49f0e295dc647b43ab7a9c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-01T11:29:01Z

    Backwards compatibility 1.1 -> 1.2

commit 8e4e4bcede50e66a95928ec854e51d45a7df28bf
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-09T13:54:35Z

    Removing some unecessary code from migration classes

commit 78bd66fade7f836eafbab978329caf1ea26f2ffc
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-09T17:21:13Z

    MultiStreamStateHandle

commit a9355679c3476dd890b54312e1696b61c7839873
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-10T13:18:55Z

    Added migration unit test

commit d079bd4bdb762c307a3c5cd084590804b90996b1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-10T13:45:58Z

    rebase fixes

commit 9f47bac9c25fc33993c3942a57462039cc578dcd
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-11T13:46:39Z

    Minor cleanups: deleting more unnecessary classes

commit 2bbe66386d28c7914c62e2c3829ff3ab6840164c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T13:15:33Z

    Versioned serialization

commit 6460e27717ab208aada988ba2c83d5628b31b310
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T17:59:45Z

    Common meta info introduced to keyed backends

commit e7d66377730339523bad8e3e6e75865ea5a29a6b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T21:40:26Z

    Introducing isCompatibleWith to TypeSerializers

commit 89e3779d231fd0dadb01782791c92ec8ebb15a81
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T22:33:42Z

    Splitting / Introducing interface for versiond and compatibile

commit 434f9424e5cd0e01d45e51f44b917306606c5fb1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T10:59:01Z

    Cleanup and documentation

commit 5cb40348dac235dfbe6c6fda532f2b87a6aee7f9
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T16:19:51Z

    Better abstractions

commit 500361fb07a428034cb96deb46ece3531d277080
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T16:29:24Z

    Serialization proxies for operator state backend meta data

commit 614ab7531a644eaf9edbc420383936bb6e39a34b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-01T14:12:40Z

    handle one forgotten type of state handle

commit 11792a84c6dbf5057a46fc98b5190abf2cfad014
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-01T14:37:29Z

    Serialization Proxies for KeyedBackends, OperatorBackends, TypeSerializers. Still needs integration in OperatorBackend.

commit 89dcc375bd732a370609c10bcaaf5c3b42e93b98
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:19:44Z

    isCompatibleWith, code dedup and cleanup

commit 3bf993be2aaf17d7f71f0b3709b15aeb78baed6b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:20:05Z

    Tests

commit 336fbedf8699792da3586ebe5d30d2644f0abe08
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:43:22Z

    Some compatibility logic for compount tuple serializer

commit 4ad6fc7884bede64f15d6251e752c97222ac665a
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T11:29:43Z

    Partial rollback, going for the simpelest approach. Also including some info about state type to serialization

commit d7eed9bc8b13803aa01fc682b22c100ba0e76072
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T12:15:59Z

    fixup for base branch, todo cherrypick

commit e962751de45e00d25e063d2f6f19f82dff8f4838
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T15:58:24Z

    Fix for 1/0 if UDF present

commit d827f3dd27054844dd154e620e7a1cd75d43fdf5
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T16:00:28Z

    Fix for Unknown State type in statemetainfo

commit 5fb822e478acb6bd046b7376834e9e16e80dffd1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-05T13:23:49Z

    Introduce Eager restore and serialization proxies in DefaultOperatorStateBackend

commit dbb54e765bc163c9b33f014b60ecfed8bc98f65c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-06T15:00:59Z

    WIP offset stream

commit 2331f62ff3a21d9a1337d336573c3eb4b8305e7e
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T14:37:23Z

    Backwards compatibility for JobVertexID generation.

commit 820f1fde8899d6068a6ad08cf1f93312d3f238c9
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:25:59Z

    Backwards compatibility for JobVertexID generation -> StateAssignmentOperation.

commit 3e2c877bb3a4049a817a23f7144d3234e65af0f4
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:34:32Z

    Backwards compatibility for JobVertexID generation -> Fixups.

commit 6182fb4d40ab5bfb64730276e53435d8206c1373
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:55:39Z

    unit test for legacy jobvertexid

commit a8753e57054c0e043c01eb4b29558ad3467e0de5
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T20:23:35Z

    [FLINK-5283] Fix closing streams when restoring old savepoint in keyed backends

commit 5f4bd4c352a27a3dadd606508ade3ba33c729ad3
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T20:25:29Z

    [FLINK-5282] Fix closing streams on exception in SavepointV0Serializer

----

> Backwards compatibility for serializers in backend state
> --------------------------------------------------------
>
>                 Key: FLINK-5051
>                 URL: https://issues.apache.org/jira/browse/FLINK-5051
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>
> When a new state is registered, e.g. in a keyed backend via
> `getPartitionedState`, the caller has to provide all type serializers
> required for the persistence of state components. Explicitly passing the
> serializers on state creation already allows for potential version upgrades
> of serializers.
> However, those serializers are currently not part of any snapshot and are
> only provided at runtime, when the state is newly registered or restored. For
> backwards compatibility, this has strong implications: checkpoints are not
> self-contained, in that state is currently a black box without knowledge of
> its corresponding serializers. Most cases where we would need to restructure
> the state are basically lost. We could only convert them lazily at runtime
> and only once the user is registering the concrete state, which might happen
> at unpredictable points.
> I suggest adapting our solution as follows:
> - As now, all states are registered with their set of serializers.
> - Unlike now, all serializers are written to the snapshot. This makes
> savepoints self-contained and also allows creating inspection tools for
> savepoints at some point in the future.
> - Introduce an interface {{Versioned}} with {{long getVersion()}} and
> {{boolean isCompatible(Versioned v)}}, which is then implemented by
> serializers. Compatible serializers must ensure that they can deserialize
> older versions and can then serialize them in their new format. This is how
> we upgrade.
> We need to find the right tradeoff in how many places we need to store the
> serializers. I suggest writing them once per parallel operator instance for
> each state, i.e. we have a map with state_name -> tuple3<serializer<KEY>,
> serializer<NAMESPACE>, serializer<STATE>>. This could go before all
> key-groups are written, right at the head of the file. Then, for each file we
> see on restore, we can first read the serializer map from the head of the
> stream, then go through the key groups by offset.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
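
For readers of the archive, the {{Versioned}} proposal and the per-state serializer map from the issue description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: only the two method signatures of Versioned come from the issue text. The class VersionedSketch, the stand-in DemoSerializer, the helpers writeHeader/readHeader, and the header layout are hypothetical, and the sketch stores only version numbers where the proposal would store the serializers themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class VersionedSketch {

    /** The interface as described in the issue text. */
    interface Versioned {
        long getVersion();
        boolean isCompatible(Versioned v);
    }

    /** Hypothetical stand-in for a serializer; it only carries a version. */
    static final class DemoSerializer implements Versioned {
        private final long version;

        DemoSerializer(long version) {
            this.version = version;
        }

        @Override
        public long getVersion() {
            return version;
        }

        /** Assumed rule: same serializer kind, and the other version is not newer. */
        @Override
        public boolean isCompatible(Versioned v) {
            return v instanceof DemoSerializer && v.getVersion() <= version;
        }
    }

    /**
     * Writes state_name -> (key, namespace, state) serializer versions at the
     * head of a stream. Each long[] is assumed to hold exactly three entries.
     */
    static byte[] writeHeader(Map<String, long[]> meta) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(meta.size());
            for (Map.Entry<String, long[]> entry : meta.entrySet()) {
                out.writeUTF(entry.getKey());
                for (int i = 0; i < 3; i++) {
                    out.writeLong(entry.getValue()[i]);
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the map back before any key-group data would be touched. */
    static Map<String, long[]> readHeader(byte[] data) throws IOException {
        Map<String, long[]> meta = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int states = in.readInt();
            for (int s = 0; s < states; s++) {
                String name = in.readUTF();
                long[] versions = new long[3];
                for (int i = 0; i < 3; i++) {
                    versions[i] = in.readLong();
                }
                meta.put(name, versions);
            }
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        Map<String, long[]> written = new LinkedHashMap<>();
        written.put("count", new long[] {1L, 1L, 1L});

        Map<String, long[]> restored = readHeader(writeHeader(written));

        // On restore, compare the stored state-serializer version with the current one.
        DemoSerializer current = new DemoSerializer(2L);
        DemoSerializer stored = new DemoSerializer(restored.get("count")[2]);
        System.out.println("can restore 'count': " + current.isCompatible(stored));
    }
}
```

Putting the map at the head of the stream means a restore can decide about compatibility before reading any state bytes, which is the self-contained property the description asks for.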