Re: [FLINK-14076] non-deserializable root cause in DeclineCheckpoint

2019-09-23 Thread Piotr Nowojski
Hi, I guess the TaskManager should have logged the original exception somewhere (I’m not saying that we shouldn’t solve this, just making sure that the basics are covered), so you should already be able to deduce the reason for the failure, right? I think that option 2. would not only be easier, b

Re: [FLINK-14076] non-deserializable root cause in DeclineCheckpoint

2019-09-23 Thread Till Rohrmann
Hi Jeffrey, thanks for reporting this issue and starting a discussion on how to solve this problem. I've pulled in Piotr, who is working on the checkpointing part of Flink. If a user-generated exception can get reported, then we need to make sure that it is properly handled. Approach 2. would be easi

Re: [FLINK-14076] non-deserializable root cause in DeclineCheckpoint

2019-09-20 Thread Jeffrey Martin
To be clear -- I'm happy to make a PR for either option below. (Either is a <10-line diff.) It's just that the contributor guidelines say to get consensus first and then only make a PR if I'm assigned to do the work. On Fri, Sep 20, 2019 at 12:23 PM Jeffrey Martin wrote: > (possible dupe; I wasn't su

[FLINK-14076] non-deserializable root cause in DeclineCheckpoint

2019-09-20 Thread Jeffrey Martin
(possible dupe; I wasn't subscribed before and the previous message didn't seem to go through) I'm on Flink v1.9 with the Kafka connector and a standalone JM. If FlinkKafkaProducer fails while checkpointing, it throws a KafkaException which gets wrapped in a CheckpointException which is sent to th