Hi Federico,

It seems like the state cannot be restored because the class of the state type 
(i.e., Event) had been modified since the savepoint, and therefore has a 
conflicting serialVersionUID with whatever it is in the savepoint.
This can happen if Java serialization is used for some part of your state, and 
the class of the written data was modified while a fixed serialVersionUID was 
not explicitly specified for that class.

To avoid this, you should explicitly set a serialVersionUID for the Event class.
You can actually also do that now without losing state while also incorporating 
the modifications you were trying to do for your updated job.
Explicitly declare the serialVersionUID of the Event class to what is was 
before your modifications (i.e., 8728793377941765980, according to your error 
log).

One side question: are you experiencing this restore failure for one of your 
custom operator states, or is this failing state part of some Flink built-in 
operator / connector?
I’m asking just to have an idea of which Flink built-in operator / connectors 
still use Java serialization for user state; ideally we would want that to be 
completed removed in the future.

Cheers,
Gordon

On 28 November 2017 at 10:02:19 PM, Federico D'Ambrosio 
(federico.dambro...@smartlab.ws) wrote:

Hi,

I recently had to do a code update of a long running Flink Stream job (1.3.2) 
and on the restart from the savepoint I had to deal with:

java.lang.IllegalStateException: Could not initialize keyed state backend.

Caused by: java.io.InvalidClassException: lab.vardata.events.Event; local class 
incompatible: stream classdesc serial                                           
        VersionUID = 8728793377341765980, local class serialVersionUID = 
-4253404384162522764

because I have changed a method used to convert the Event to a Cassandra 
writable Tuple (in particular, I changed the return type from Tuple10 to 
Tuple11, after adding a field). I reverted those changes back since it wasn't 
much of a problem per se.

Now, I understand the root cause of this issue and I wanted to ask if there are 
any "best practices" to prevent this kind of issues, without losing the state 
of the job, because of restarting it from the very beginning.

--
Federico D'Ambrosio

Reply via email to