Re: non-deserializable root cause in DeclineCheckpoint

Terry Wang Sun, 22 Sep 2019 05:04:07 -0700

Hi, Jeffrey~

I think the two fixes you mentioned may not work in your case.
The problem in https://issues.apache.org/jira/browse/FLINK-14076 is, at its
root, caused by the TM and JM having inconsistent jar environments or
inconsistent jar-loading behavior.
Since you mentioned that your program ran normally on Flink 1.8, maybe the
behavior of the standalone cluster's dynamic class loader changed in Flink 1.9.
Just a thought from me.
Hope this helps~


Best,
Terry Wang



> On Sep 21, 2019, at 2:58 AM, Jeffrey Martin <jeffrey.martin...@gmail.com> wrote:
> 
> JIRA ticket: https://issues.apache.org/jira/browse/FLINK-14076
> 
> I'm on Flink v1.9 with the Kafka connector and a standalone JM.
> 
> If FlinkKafkaProducer fails while checkpointing, it throws a KafkaException
> which gets wrapped in a CheckpointException which is sent to the JM as a
> DeclineCheckpoint. KafkaException isn't on the JM default classpath, so the
> JM throws a fairly cryptic ClassNotFoundException. The details of the
> KafkaException wind up suppressed so it's impossible to figure out what
> actually went wrong.
> 
> I can think of two fixes that would prevent this from occurring in the
> Kafka or other connectors in the future:
> 1. DeclineCheckpoint should always send a SerializedThrowable to the JM
> rather than allowing CheckpointExceptions with non-deserializable root
> causes to slip through
> 2. CheckpointException should always capture its wrapped exception as a
> SerializedThrowable (i.e., use 'super(new SerializedThrowable(cause))'
> rather than 'super(cause)').
> 
> Thoughts?
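
For readers following the thread, the second fix proposed in the quoted message can be sketched roughly as below. This is a plain-Java illustration, not Flink code. PortableThrowable, DeclineReason, ConnectorOnlyException and JmLikeInputStream are hypothetical stand-ins (for SerializedThrowable, CheckpointException, KafkaException and the JM's class loading respectively), and the "JM" is simulated by an ObjectInputStream that refuses to resolve the connector exception class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.PrintWriter;
import java.io.StringWriter;

class DeclineSketch {
    // Serialize an object the way a sender (the TM side) would.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Deserialize with a stream that cannot resolve the connector class.
    static Object readOnJm(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new JmLikeInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable kafkaLike = new ConnectorOnlyException("broker unreachable");

        // Equivalent of 'super(cause)': the raw cause travels with the message.
        try {
            readOnJm(toBytes(new Exception("checkpoint declined", kafkaLike)));
        } catch (ClassNotFoundException e) {
            System.out.println("raw cause failed to deserialize: " + e.getMessage());
        }

        // Equivalent of 'super(new SerializedThrowable(cause))'.
        DeclineReason received = (DeclineReason)
                readOnJm(toBytes(new DeclineReason("checkpoint declined", kafkaLike)));
        System.out.println("wrapped cause: " + received.getCause().getMessage());
    }
}

// Hypothetical stand-in for an exception that exists only on the TM classpath.
class ConnectorOnlyException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    ConnectorOnlyException(String message) { super(message); }
}

// Hypothetical stand-in for SerializedThrowable: keeps only JDK types
// (strings and stack trace elements), never the original exception object.
class PortableThrowable extends Exception {
    private static final long serialVersionUID = 1L;
    private final String originalClassName;
    private final String originalStackTrace;

    PortableThrowable(Throwable original) {
        super(original.getClass().getName() + ": " + original.getMessage());
        this.originalClassName = original.getClass().getName();
        StringWriter text = new StringWriter();
        original.printStackTrace(new PrintWriter(text, true));
        this.originalStackTrace = text.toString();
        setStackTrace(original.getStackTrace());
    }

    String getOriginalClassName() { return originalClassName; }
    String getOriginalStackTrace() { return originalStackTrace; }
}

// Hypothetical stand-in for CheckpointException with fix 2 applied.
class DeclineReason extends Exception {
    private static final long serialVersionUID = 1L;
    DeclineReason(String message, Throwable cause) {
        super(message, new PortableThrowable(cause)); // instead of super(message, cause)
    }
}

// Simulates a receiver whose classpath lacks the connector jar.
class JmLikeInputStream extends ObjectInputStream {
    JmLikeInputStream(InputStream in) throws IOException { super(in); }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (desc.getName().endsWith("ConnectorOnlyException")) {
            throw new ClassNotFoundException(desc.getName());
        }
        return super.resolveClass(desc);
    }
}
```

The point of the sketch is only that the wrapped variant carries the original class name, message and stack trace as plain strings, so the receiver can still report what went wrong without having the connector jar. Whether this addresses the underlying classpath inconsistency raised in the reply above is a separate question.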
