[ https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731784#comment-15731784 ]

ASF GitHub Bot commented on FLINK-5051:
---------------------------------------

GitHub user StefanRRichter opened a pull request:

    https://github.com/apache/flink/pull/2962

    [FLINK-5051] Backwards compatibility for serializers in backends

    This PR sits on top of PR #2781 and introduces future backwards
    compatibility for state serializers and backends. We do so by providing
    version compatibility checking for TypeSerializer and by making the
    serializers a mandatory part of a keyed backend's meta data in checkpoints,
    so that everything required to reconstruct states is self-contained. A
    serialization proxy is introduced for the keyed state backend and the
    operator state backend. Currently this serialization proxy covers only the
    meta data, not yet the actual data. For the most part, the PR moves
    functionality to a different place or makes formats more explicit.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StefanRRichter/flink serializer-backwards-compatibility-operator

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2962
    
----
commit a373585c2fe71b467f49f0e295dc647b43ab7a9c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-01T11:29:01Z

    Backwards compatibility 1.1 -> 1.2

commit 8e4e4bcede50e66a95928ec854e51d45a7df28bf
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-09T13:54:35Z

    Removing some unnecessary code from migration classes

commit 78bd66fade7f836eafbab978329caf1ea26f2ffc
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-09T17:21:13Z

    MultiStreamStateHandle

commit a9355679c3476dd890b54312e1696b61c7839873
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-10T13:18:55Z

    Added migration unit test

commit d079bd4bdb762c307a3c5cd084590804b90996b1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-10T13:45:58Z

    rebase fixes

commit 9f47bac9c25fc33993c3942a57462039cc578dcd
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-11T13:46:39Z

    Minor cleanups: deleting more unnecessary classes

commit 2bbe66386d28c7914c62e2c3829ff3ab6840164c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T13:15:33Z

    Versioned serialization

commit 6460e27717ab208aada988ba2c83d5628b31b310
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T17:59:45Z

    Common meta info introduced to keyed backends

commit e7d66377730339523bad8e3e6e75865ea5a29a6b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T21:40:26Z

    Introducing isCompatibleWith to TypeSerializers

commit 89e3779d231fd0dadb01782791c92ec8ebb15a81
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-23T22:33:42Z

    Splitting / Introducing interface for versioned and compatible

commit 434f9424e5cd0e01d45e51f44b917306606c5fb1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T10:59:01Z

    Cleanup and documentation

commit 5cb40348dac235dfbe6c6fda532f2b87a6aee7f9
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T16:19:51Z

    Better abstractions

commit 500361fb07a428034cb96deb46ece3531d277080
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-11-24T16:29:24Z

    Serialization proxies for operator state backend meta data

commit 614ab7531a644eaf9edbc420383936bb6e39a34b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-01T14:12:40Z

    handle one forgotten type of state handle

commit 11792a84c6dbf5057a46fc98b5190abf2cfad014
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-01T14:37:29Z

    Serialization Proxies for KeyedBackends, OperatorBackends, TypeSerializers.
    Still needs integration in OperatorBackend.

commit 89dcc375bd732a370609c10bcaaf5c3b42e93b98
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:19:44Z

    isCompatibleWith, code dedup and cleanup

commit 3bf993be2aaf17d7f71f0b3709b15aeb78baed6b
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:20:05Z

    Tests

commit 336fbedf8699792da3586ebe5d30d2644f0abe08
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T00:43:22Z

    Some compatibility logic for compound tuple serializer

commit 4ad6fc7884bede64f15d6251e752c97222ac665a
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T11:29:43Z

    Partial rollback, going for the simplest approach. Also including some
    info about state type to serialization

commit d7eed9bc8b13803aa01fc682b22c100ba0e76072
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T12:15:59Z

    fixup for base branch, todo cherrypick

commit e962751de45e00d25e063d2f6f19f82dff8f4838
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T15:58:24Z

    Fix for 1/0 if UDF present

commit d827f3dd27054844dd154e620e7a1cd75d43fdf5
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-02T16:00:28Z

    Fix for Unknown State type in statemetainfo

commit 5fb822e478acb6bd046b7376834e9e16e80dffd1
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-05T13:23:49Z

    Introduce Eager restore and serialization proxies in
    DefaultOperatorStateBackend

commit dbb54e765bc163c9b33f014b60ecfed8bc98f65c
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-06T15:00:59Z

    WIP offset stream

commit 2331f62ff3a21d9a1337d336573c3eb4b8305e7e
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T14:37:23Z

    Backwards compatibility for JobVertexID generation.

commit 820f1fde8899d6068a6ad08cf1f93312d3f238c9
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:25:59Z

    Backwards compatibility for JobVertexID generation ->
    StateAssignmentOperation.

commit 3e2c877bb3a4049a817a23f7144d3234e65af0f4
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:34:32Z

    Backwards compatibility for JobVertexID generation -> Fixups.

commit 6182fb4d40ab5bfb64730276e53435d8206c1373
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T15:55:39Z

    unit test for legacy jobvertexid

commit a8753e57054c0e043c01eb4b29558ad3467e0de5
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T20:23:35Z

    [FLINK-5283] Fix closing streams when restoring old savepoint in keyed
    backends

commit 5f4bd4c352a27a3dadd606508ade3ba33c729ad3
Author: Stefan Richter <s.rich...@data-artisans.com>
Date:   2016-12-07T20:25:29Z

    [FLINK-5282] Fix closing streams on exception in SavepointV0Serializer

----


> Backwards compatibility for serializers in backend state
> --------------------------------------------------------
>
>                 Key: FLINK-5051
>                 URL: https://issues.apache.org/jira/browse/FLINK-5051
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>
> When a new state is registered, e.g. in a keyed backend via
> `getPartitionedState`, the caller has to provide all type serializers
> required for the persistence of state components. Explicitly passing the
> serializers on state creation already allows for potential version upgrades
> of serializers.
> However, those serializers are currently not part of any snapshot and are
> only provided at runtime, when the state is newly registered or restored.
> For backwards compatibility, this has strong implications: checkpoints are
> not self-contained, because state is currently a black box without knowledge
> of its corresponding serializers. This rules out most cases in which we
> would need to restructure the state on restore: we could only convert it
> lazily at runtime, and only once the user registers the concrete state,
> which might happen at unpredictable points.
> I suggest adapting our solution as follows:
> - As now, all states are registered with their set of serializers.
> - Unlike now, all serializers are written to the snapshot. This makes
> savepoints self-contained and also allows building inspection tools for
> savepoints at some point in the future.
> - Introduce an interface {{Versioned}} with {{long getVersion()}} and
> {{boolean isCompatible(Versioned v)}}, implemented by serializers.
> Compatible serializers must ensure that they can deserialize data written by
> older versions and then serialize it in their new format. This is how we
> upgrade.
> We need to find the right tradeoff for how many places we store the
> serializers in. I suggest writing them once per parallel operator instance
> for each state, i.e. we have a map state_name -> tuple3<serializer<KEY>,
> serializer<NAMESPACE>, serializer<STATE>>. This map could be written right
> at the head of the file, before any key groups. Then, for each file we see
> on restore, we can first read the serializer map from the head of the
> stream and then go through the key groups by offset.

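The proposed file layout has a serializer map at the head of the stream, followed by key groups located by offset. It could look like the following Java sketch. SnapshotLayoutSketch and SerializerTriple are hypothetical names, and serializers are represented by class-name strings. The key-group offsets are returned to the caller, on the assumption that they are stored outside the file, next to its handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed layout: serializer header, then key groups. */
final class SnapshotLayoutSketch {

    /** Stand-in for tuple3<serializer<KEY>, serializer<NAMESPACE>, serializer<STATE>>. */
    static final class SerializerTriple {
        final String key;
        final String namespace;
        final String state;

        SerializerTriple(String key, String namespace, String state) {
            this.key = key;
            this.namespace = namespace;
            this.state = state;
        }
    }

    /** Writes the header once, then each key group; offsets[kg] receives its start position. */
    static byte[] write(Map<String, SerializerTriple> header, List<byte[]> keyGroups, long[] offsets) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(header.size());
            for (Map.Entry<String, SerializerTriple> e : header.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue().key);
                out.writeUTF(e.getValue().namespace);
                out.writeUTF(e.getValue().state);
            }
            for (int kg = 0; kg < keyGroups.size(); kg++) {
                offsets[kg] = out.size(); // bytes written so far = start of this key group
                out.writeInt(keyGroups.get(kg).length);
                out.write(keyGroups.get(kg));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** On restore, first read the serializer map from the head of the stream. */
    static Map<String, SerializerTriple> readHeader(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int count = in.readInt();
            Map<String, SerializerTriple> header = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String key = in.readUTF();
                String namespace = in.readUTF();
                String state = in.readUTF();
                header.put(name, new SerializerTriple(key, namespace, state));
            }
            return header;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Then jump directly to a key group by its offset. */
    static byte[] readKeyGroup(byte[] file, long offset) {
        int start = (int) offset;
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(file, start, file.length - start))) {
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The header is read once per file, before any key group. The serializers are therefore available eagerly on restore, instead of only once user code registers the state.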


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
