[ 
https://issues.apache.org/jira/browse/FLINK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534771#comment-16534771
 ] 

ASF GitHub Bot commented on FLINK-9377:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/6235
  
    Took a look at this WIP and I think it goes into a good direction.
    
    My most important comment is that I think it would help to move the 
"ensureCompatibility" into the config snapshot, for the following reasons:
      - Clearer separation of concerns, the serializer has only the 
serialization logic, and creating the snapshot. Compatibility is not the 
serializers immediate concern.
      - The current design means that the serializer mutates internal fields on 
reconfiguration. That is error prone. Consider a serializer like the 
KryoSerializer, where the configuration is not fully deep copied on duplication 
(makes sense, it is read only during serialization). Mutating that 
configuration would change the behavior of other previously duplicated 
serializers as well, which is unexpected.
    
    Thoughts for improvements with lower priority:
    
      - Can we avoid setting the ClassLoader into a field in the config 
snapshot, and then deserializing? I think such solutions are fragile and should 
be avoided if possible. The ClassLoader is not really part of the snapshots 
state, it is an auxiliary to the deserialization and should, as such, be passed 
as an argument to the read method: read(in, classloader). This means that the 
TypeSerializerConfigSnapshot would not implement `IOReadableWritable`, but that 
might be not a problem.
    
      - Is the TypeSerializerConfigSnapshotSerializationProxy needed? It seems 
like an unnecessary indirection given that it is used exclusively in the 
TypeSerializerSerializationUtil and could be a static util method instead.



> Remove writing serializers as part of the checkpoint meta information
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9377
>                 URL: https://issues.apache.org/jira/browse/FLINK-9377
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.6.0
>
>
> When writing meta information of a state in savepoints, we currently write 
> both the state serializer as well as the state serializer's configuration 
> snapshot.
> Writing both is actually redundant, as most of the time they have identical 
> information.
>  Moreover, the fact that we use Java serialization to write the serializer 
> and rely on it to be re-readable on the restore run, already poses problems 
> for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202) 
> to perform even a compatible upgrade.
> The proposal here is to leave only the config snapshot as meta information, 
> and use that as the single source of truth of information about the schema of 
> serialized state.
>  The config snapshot should be treated as a factory (or provided to a 
> factory) to re-create serializers capable of reading old, serialized state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to