Regarding this comment: "The version in the state is the serializer version, and applies to the entire state, independent of what it contains. If you use Kryo2 for reading and Kryo5 for writing (which also implies writing the new serializer version into state), then I'd assume that a migration is an all-or-nothing kind of deal."
Much of Flink uses the TypeSerializerSnapshot classes for serialization. With that, the fully qualified package+class name of a subclass of TypeSerializerSnapshot is written to the state as a string. The pull-request uses this class name to determine the correct version of Kryo to use. Flink up to and including 1.17.x uses `org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshot` for Kryo 2.x serialized data. The pull request uses `org.apache.flink.api.java.typeutils.runtime.kryo5.KryoSerializerSnapshot` for Kryo 5.x serialized data. Serialized state mixes different types of snapshots and if it has both Kryo 2.x and Kryo 5.x snapshot data, that works without problems and uses the correct version of Kryo to deserialize successfully. The state version number is used to determine the serialized Kryo version at only one point in the source code where the Snapshot classes are not used: https://github.com/kurtostfeld/flink/blob/e013e9e95096efb41d376f3b6584b5d3d78dc916/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/StateTableByKeyGroupReaders.java#L73 This seems to work from my testing. If I can find a scenario where this doesn't work I can come up with a revised solution. I'd like to conclude that this pull-request demonstrates that a backward compatible Kryo upgrade is possible and is mostly done. More testing from a wider pool of people would be needed to proceed, but this demonstrates it is possible. However, whether this proceeds at all is up to the Flink project. If the plan of the Flink project is to drop all backward compatibility anyway for a 2.0 release as Martijn Visser suggested in this thread, then the Kryo upgrade can be done in a much much simpler fashion, and doing a more complex backward compatible upgrade seems unnecessary.