Regarding this comment: "The version in the state is the serializer version, 
and applies to the entire state, independent of what it contains. If you use 
Kryo2 for reading and Kryo5 for writing (which also implies writing the new 
serializer version into state), then I'd assume that a migration is an 
all-or-nothing kind of deal."

Much of Flink uses the TypeSerializerSnapshot classes for serialization. With 
that, the fully qualified package+class name of a subclass of 
TypeSerializerSnapshot is written to the state as a string. The pull-request 
uses this class name to determine the correct version of Kryo to use. Flink up 
to and including 1.17.x uses 
`org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshot` for 
Kryo 2.x serialized data. The pull request uses 
`org.apache.flink.api.java.typeutils.runtime.kryo5.KryoSerializerSnapshot` for 
Kryo 5.x serialized data. Serialized state mixes different types of snapshots 
and if it has both Kryo 2.x and Kryo 5.x snapshot data, that works without 
problems and uses the correct version of Kryo to deserialize successfully.

The state version number is used to determine the serialized Kryo version at 
only one point in the source code where the Snapshot classes are not used:

https://github.com/kurtostfeld/flink/blob/e013e9e95096efb41d376f3b6584b5d3d78dc916/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/StateTableByKeyGroupReaders.java#L73

This seems to work from my testing. If I can find a scenario where this doesn't 
work I can come up with a revised solution.

I'd like to conclude that this pull-request demonstrates that a backward 
compatible Kryo upgrade is possible and is mostly done. More testing from a 
wider pool of people would be needed to proceed, but this demonstrates it is 
possible. However, whether this proceeds at all is up to the Flink project. If 
the plan of the Flink project is to drop all backward compatibility anyway for 
a 2.0 release as Martijn Visser suggested in this thread, then the Kryo upgrade 
can be done in a much much simpler fashion, and doing a more complex backward 
compatible upgrade seems unnecessary.


Reply via email to