Hi Weston,

I have never looked into the savepoint migration code paths myself, but I know that savepoint migration across multiple versions is not supported (1.9 can only migrate to 1.10, not 1.11). We have test coverage for these migrations, and I would be surprised if this "Savepoint" class migration is not covered in these tests.
Have you tried upgrading from 1.9 to 1.10, and then from 1.10 to 1.11?

On Fri, Jul 30, 2021 at 11:53 PM Weston Woods <wwo...@spireon.com> wrote:

> I am unable to restore a 1.9 savepoint into a 1.11 runtime for the very
> interesting reason that the Savepoint class was renamed and repackaged
> between those two releases. Apparently a Kryo serializer has that class
> registered in the 1.9 runtime. I can’t think of a good reason for that
> class to be registered with Kryo; none of the job operators reference any
> such thing. Yet there it is causing the following exception and
> preventing upgrade to a new runtime.
>
> Caused by: java.lang.IllegalStateException: Missing value for the key 'org.apache.flink.runtime.checkpoint.savepoint.Savepoint'
>     at org.apache.flink.util.LinkedOptionalMap.unwrapOptionals(LinkedOptionalMap.java:190) ~[flink-dist_2.11-1.11.3.jar:1.11.3]
>     at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshot.restoreSerializer(KryoSerializerSnapshot.java:86) ~[flink-dist_2.11-1.11.3.jar:1.11.3]
>
> There doesn’t seem to be any way to unregister a class from Kryo. And
> the mechanism for dealing with missing classes looks to me like it has
> never worked as advertised. Instead of registering a dummy class for a
> missing class name a null gets registered instead, leading to the exception
> which prevents restoring the savepoint. The code that returns a null
> instead of a dummy is here -
> https://github.com/apache/flink/blob/e8cfe6701b9768d1f1fe4488640cba5f9b42d73f/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/kryo/KryoSerializerSnapshotData.java#L263
>
> Resulting in this log.
>
> 2021-07-27 18:38:11,703 WARN org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshotData [] - Cannot find registered class org.apache.flink.runtime.checkpoint.savepoint.Savepoint for Kryo serialization in classpath; using a dummy class as a placeholder.
> java.lang.ClassNotFoundException: org.apache.flink.runtime.checkpoint.savepoint.Savepoint
>
> One way or another I need to be able to restore a 1.9 savepoint into
> 1.11. Perhaps the Kryo registration needs to be cleansed from wherever it
> is lurking in the 1.9 savepoint, or an effective dummy needs to be
> substituted when reading it into 1.11.
>
> Has anyone else encountered this problem, or have any advice to offer?
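
To make the failure mode described above easier to discuss, here is a minimal, self-contained sketch of the general pattern: class names are resolved to optional values, and a strict unwrap step rejects any entry that could not be resolved. This is not Flink's code. `MissingRegistrationSketch`, `resolve`, `unwrapStrict`, `unwrapWithPlaceholder`, and `DummyPlaceholder` are all hypothetical names invented for illustration, and only the wording of the exception message is borrowed from the stack trace. Whether Flink's restore path actually behaves this way is Weston's reading of the linked source and has not been confirmed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustration only: NOT Flink code. All names here are hypothetical.
class MissingRegistrationSketch {

    /** Hypothetical stand-in for a registered class that can no longer be loaded. */
    static final class DummyPlaceholder {}

    /** Resolves a class name, yielding an empty Optional when it is not on the classpath. */
    static Optional<Class<?>> resolve(String className) {
        try {
            Class<?> c = Class.forName(className);
            return Optional.of(c);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    /** Strict unwrap: throws on the first entry whose class could not be resolved. */
    static Map<String, Class<?>> unwrapStrict(Map<String, Optional<Class<?>>> resolved) {
        Map<String, Class<?>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Optional<Class<?>>> e : resolved.entrySet()) {
            Class<?> c = e.getValue().orElseThrow(
                    () -> new IllegalStateException(
                            "Missing value for the key '" + e.getKey() + "'"));
            out.put(e.getKey(), c);
        }
        return out;
    }

    /** Lenient unwrap: substitutes a placeholder class for every unresolved entry. */
    static Map<String, Class<?>> unwrapWithPlaceholder(Map<String, Optional<Class<?>>> resolved) {
        Map<String, Class<?>> out = new LinkedHashMap<>();
        resolved.forEach((name, c) -> out.put(name, c.orElse(DummyPlaceholder.class)));
        return out;
    }

    public static void main(String[] args) {
        // Assumes the Flink 1.9 runtime classes are NOT on this program's classpath.
        String gone = "org.apache.flink.runtime.checkpoint.savepoint.Savepoint";

        Map<String, Optional<Class<?>>> registrations = new LinkedHashMap<>();
        registrations.put("java.lang.String", resolve("java.lang.String"));
        registrations.put(gone, resolve(gone));

        try {
            unwrapStrict(registrations);
        } catch (IllegalStateException e) {
            System.out.println("strict unwrap failed: " + e.getMessage());
        }
        System.out.println("lenient unwrap: " + unwrapWithPlaceholder(registrations));
    }
}
```

The lenient variant corresponds to the behavior the WARN message promises ("using a dummy class as a placeholder"), while the strict variant corresponds to what the exception suggests happens when a registration ends up with no value.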