Hi Robert,

Uncaught exceptions that cause the job to fall into a fail-and-restart loop
are similar to the corrupt record case I mentioned.

With exactly-once guarantees, the job will roll back to the last complete
checkpoint, which "resets" the Flink consumer to some earlier Kafka
partition offset. Eventually, the failing record will be processed again.
There is currently no way to manipulate the offset used on restore from
failure; it is always reset to the offset stored in the last complete
checkpoint, otherwise exactly-once would be violated.
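
To illustrate, here is a minimal sketch assuming the FlinkKafkaConsumer
connector (topic name, group id, and checkpoint interval are placeholders):
the start-position settings on the consumer only take effect when the job
starts without restored state; on a restore, the checkpointed offsets win.

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class CheckpointedKafkaJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Exactly-once requires checkpointing; the Kafka partition offsets
            // are snapshotted as part of each checkpoint.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", "my-group");                // placeholder

            FlinkKafkaConsumer<String> consumer =
                    new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

            // Start-position settings like this only apply when the job starts
            // WITHOUT restored state. On restore from failure, the consumer
            // always resumes from the offsets stored in the checkpoint.
            consumer.setStartFromGroupOffsets();

            env.addSource(consumer).print();
            env.execute("checkpointed-kafka-job");
        }
    }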


Rob wrote
> Or maybe the recipe is to manually retrieve the record at
> partitionX/offsetY for the group and then restart?

This would not work, as exactly-once is achieved with the offsets that Flink
stores in its checkpoints, not the offsets that are committed back to Kafka.
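
Continuing the sketch above (same consumer object), the offsets committed
back to Kafka only exist to expose progress to external tools; Flink never
reads them back on recovery:

    // Committing offsets to Kafka on checkpoint completion is purely for
    // external monitoring (e.g. Kafka's consumer group tooling). Flink does
    // not read these back on recovery, so editing the group's committed
    // offsets does not change which records are reprocessed after a failure.
    consumer.setCommitOffsetsOnCheckpoints(true);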

Cheers,
Gordon


