Hello,

I'm wondering how, in the event of a poison pill record on Kafka, to
advance a partition's checkpointed offsets by 1 when using the TableAPI/SQL.

It is my understanding that when checkpointing is enabled Flink uses its
own checkpoint committed offsets and not the offsets committed to Kafka
when starting a job from a checkpoint.

In the event that there is a poison pill record in Kafka that is crashing
the Flink job, we may want to simply advance our checkpointed offsets by 1
for the partition, past the poison record, and then continue operation as
normal. We do not want to lose any other state in Flink however.

I'm wondering how to go about this then. It's easy enough to have Kafka
advance its committed offsets. Is there a way to tell Flink to ignore
checkpointed offsets and instead respect the offsets committed to Kafka for
a consumer group when restoring from a checkpoint?
If so we could:
1. Advance Kafka's offsets.
2. Run our job from the checkpoint and have it use Kafka's offsets and then
checkpoint with new Kafka offsets.
3. Stop the job, and rerun it using Flink's committed, now advanced,
offsets.

Is this possible? Are there any better strategies?

Thanks!

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Reply via email to