[ 
https://issues.apache.org/jira/browse/FLINK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641469#comment-16641469
 ] 

ASF GitHub Bot commented on FLINK-9377:
---------------------------------------

tzulitai commented on issue #6711: [FLINK-9377] [core, state backends] Remove 
serializers from checkpoints
URL: https://github.com/apache/flink/pull/6711#issuecomment-427744344
 
 
   Thanks a lot for the detailed reviews and discussions, @dawidwys and 
@StephanEwen.
   
   @dawidwys and I went through some offline second-pass discussions on the 
suggestions, and we finally came up with this (implemented in the follow-up 
e2cbe99):
   
   1. To incorporate all the desired changes to the current 
`TypeSerializerConfigSnapshot` class (e.g. not explicitly setting the 
classloader, dump using the `VersionedIOReadableWritable` as base class, etc.), 
we came up with introducing `TypeSerializerSnapshot` which will eventually be 
the new snapshot interface once we completely remove 
`TypeSerializerConfigSnapshot`:
   
   ```
   @PublicEvolving
   public interface TypeSerializerSnapshot<T> {
       int getCurrentVersion();
       void write(DataOutputView out) throws IOException;
       void read(int readVersion, DataInputView in, ClassLoader 
userCodeClassLoader) throws IOException;
       TypeSerializer<T> restoreSerializer();
       <NS extends TypeSerializer<T>> TypeSerializerSchemaCompatibility<T, NS> 
resolveSchemaCompatibility(NS newSerializer);
   }
   ```
   
   If a serializer's snapshot is still the legacy 
`TypeSerializerConfigSnapshot`, then the prior serializer will always be 
written along with it so that the dummy `restoreSerializer()` factory method 
works with requiring the user for any changes. Once the user upgrades to 
directly implementing the `TypeSerializerSnapshot`, then the 
`restoreSerializer()` method will be strictly required to be implemented, and 
the prior serializer will no longer be written anymore.
   
   It is only possible to change a snapshot from `TypeSerializerConfigSnapshot` 
to `TypeSerializerSnapshot`, and not the other way around.
   
   This allows us to incrementally upgrade each serializer snapshot class 
currently in Flink.
   We can still release if only partially some of the serializers have been 
upgraded.
   
   2. The `BackwardsCompatibleConfigSnapshot` wrapper class is still required, 
so that for Flink version <= 1.2 (which only wrote the serializer instance and 
no snapshots), we can wrap the serializer instance inside a snapshot 
implementation.
   
   Overall, the new changes quite closely follows your suggestions, only with a 
different approach to how the new / old interfaces work together.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove writing serializers as part of the checkpoint meta information
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9377
>                 URL: https://issues.apache.org/jira/browse/FLINK-9377
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> When writing meta information of a state in savepoints, we currently write 
> both the state serializer as well as the state serializer's configuration 
> snapshot.
> Writing both is actually redundant, as most of the time they have identical 
> information.
>  Moreover, the fact that we use Java serialization to write the serializer 
> and rely on it to be re-readable on the restore run, already poses problems 
> for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202) 
> to perform even a compatible upgrade.
> The proposal here is to leave only the config snapshot as meta information, 
> and use that as the single source of truth of information about the schema of 
> serialized state.
>  The config snapshot should be treated as a factory (or provided to a 
> factory) to re-create serializers capable of reading old, serialized state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to