I am unable to restore a 1.9 savepoint into a 1.11 runtime, for the interesting
reason that the Savepoint class was renamed and repackaged between those two
releases. Apparently a Kryo serializer in the 1.9 runtime has that class
registered. I can’t think of a good reason for that class to be registered with
Kryo; none of the job operators reference it. Yet there it is, causing the
following exception and preventing the upgrade to the new runtime.

Caused by: java.lang.IllegalStateException: Missing value for the key 'org.apache.flink.runtime.checkpoint.savepoint.Savepoint'
    at org.apache.flink.util.LinkedOptionalMap.unwrapOptionals(LinkedOptionalMap.java:190) ~[flink-dist_2.11-1.11.3.jar:1.11.3]
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshot.restoreSerializer(KryoSerializerSnapshot.java:86) ~[flink-dist_2.11-1.11.3.jar:1.11.3]

There doesn’t seem to be any way to unregister a class from Kryo. And the
mechanism for dealing with missing classes looks to me like it has never worked
as advertised: instead of a dummy class being registered for a missing class
name, a null gets registered, which leads to the exception above and prevents
the savepoint from being restored. The code that returns a null instead of a
dummy is here:
https://github.com/apache/flink/blob/e8cfe6701b9768d1f1fe4488640cba5f9b42d73f/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/kryo/KryoSerializerSnapshotData.java#L263

This results in the following log entry:

2021-07-27 18:38:11,703 WARN org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshotData [] - Cannot find registered class org.apache.flink.runtime.checkpoint.savepoint.Savepoint for Kryo serialization in classpath; using a dummy class as a placeholder.
java.lang.ClassNotFoundException: org.apache.flink.runtime.checkpoint.savepoint.Savepoint

One way or another, I need to be able to restore a 1.9 savepoint into 1.11.
Perhaps the Kryo registration needs to be removed from wherever it is stored
in the 1.9 savepoint, or an effective dummy needs to be substituted when
reading it into 1.11.
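
To make the failure mode concrete, here is a minimal standalone sketch in plain
Java. It is not Flink code and does not use any Flink API: the class
RegistrationRestoreSketch, its methods resolve and unwrap, and the Placeholder
class are all hypothetical names that only model the behaviour I described
above, under the assumption that my reading of the Flink source is right. It
shows that a null entry for an unresolvable class name produces the "Missing
value for the key" failure, while a substituted dummy class would not.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegistrationRestoreSketch {

    /** Hypothetical stand-in for a registered class that can no longer be loaded. */
    public static final class Placeholder {}

    /**
     * Resolves each registered class name in order. A name that cannot be
     * loaded maps either to null (the behaviour I believe I am seeing) or to
     * Placeholder.class (what the warning message promises).
     */
    public static Map<String, Class<?>> resolve(Iterable<String> names, boolean usePlaceholder) {
        Map<String, Class<?>> resolved = new LinkedHashMap<>();
        for (String name : names) {
            Class<?> clazz;
            try {
                clazz = Class.forName(name);
            } catch (ClassNotFoundException e) {
                clazz = usePlaceholder ? Placeholder.class : null;
            }
            resolved.put(name, clazz);
        }
        return resolved;
    }

    /** Models the restore-time check: every registered key must have a value. */
    public static Map<String, Class<?>> unwrap(Map<String, Class<?>> resolved) {
        for (Map.Entry<String, Class<?>> entry : resolved.entrySet()) {
            if (entry.getValue() == null) {
                throw new IllegalStateException(
                        "Missing value for the key '" + entry.getKey() + "'");
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Assumes no Flink 1.9 jar is on the classpath, so the second name is unresolvable.
        List<String> names = Arrays.asList(
                "java.lang.String",
                "org.apache.flink.runtime.checkpoint.savepoint.Savepoint");

        try {
            unwrap(resolve(names, false));
            System.out.println("restore succeeded with null substitution");
        } catch (IllegalStateException e) {
            System.out.println("restore failed: " + e.getMessage());
        }

        System.out.println("with placeholder: " + unwrap(resolve(names, true)));
    }
}
```

If the real restore path behaved like resolve(names, true), the stale
registration would keep its position in the registration order and the
savepoint could presumably still be read.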

Has anyone else encountered this problem, or does anyone have advice to offer?





