[ https://issues.apache.org/jira/browse/FLINK-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15748060#comment-15748060 ]
ASF GitHub Bot commented on FLINK-5051: --------------------------------------- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2962 Thanks for changing! I just realised, however, that we already have tests in `StateBackendTestBase` that check for restoring from wrong serialisers so we don't actually need those. I will also split the changes into 4 commits, you already split into FLINK-5282 and FLINK-5283, the later commits I will split into "[FLINK-5041] Savepoint Backwards Compatibility 1.1 -> 1.2" since these are essentially fixups for that and are not related to FLINK-5051. The last commit will contain the actual changes for FLINK-5051. What do you think? > Backwards compatibility for serializers in backend state > -------------------------------------------------------- > > Key: FLINK-5051 > URL: https://issues.apache.org/jira/browse/FLINK-5051 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing > Reporter: Stefan Richter > Assignee: Stefan Richter > > When a new state is register, e.g. in a keyed backend via > `getPartitionedState`, the caller has to provide all type serializers > required for the persistence of state components. Explicitly passing the > serializers on state creation already allows for potentiall version upgrades > of serializers. > However, those serializers are currently not part of any snapshot and are > only provided at runtime, when the state is registered newly or restored. For > backwards compatibility, this has strong implications: checkpoints are not > self contained in that state is currently a blackbox without knowledge about > it's corresponding serializers. Most cases where we would need to restructure > the state are basically lost. We could only convert them lazily at runtime > and only once the user is registering the concrete state, which might happen > at unpredictable points. > I suggest to adapt our solution as follows: > - As now, all states are registered with their set of serializers. > - Unlike now, all serializers are written to the snapshot. This makes > savepoints self-contained and also allows to create inspection tools for > savepoints at some point in the future. > - Introduce an interface {{Versioned}} with {{long getVersion()}} and > {{boolean isCompatible(Versioned v)}} which is then implemented by > serializers. Compatible serializers must ensure that they can deserialize > older versions, and can then serialize them in their new format. This is how > we upgrade. > We need to find the right tradeoff in how many places we need to store the > serializers. I suggest to write them once per parallel operator instance for > each state, i.e. we have a map with state_name -> tuple3<serializer<KEY>, > serializer<NAMESPACE>, serializer<STATE>>. This could go before all > key-groups are written, right at the head of the file. Then, for each file we > see on restore, we can first read the serializer map from the head of the > stream, then go through the key groups by offset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)