Re: Manage Kafka Offsets for Fault Tolerance Without Checkpointing

Roman Khachatryan Mon, 12 Apr 2021 05:31:38 -0700

Hi,

Could you please explain what you mean by internal restarts?


If you commit offsets or timestamps from the sink only after the
records have been emitted to the external system, there should be no
data loss. Otherwise (if you commit offsets earlier), you have to
persist in-flight records to avoid data loss (i.e. enable checkpointing).
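
To illustrate the first option, here is a minimal sketch of the
bookkeeping in plain Java. Async I/O can finish records out of order,
so the offset that is safe to commit is the smallest offset that has
not yet been emitted, not the latest one that has. The class and
method names (SafeOffsetTracker, onRead, onEmitted, safeOffset) are
hypothetical and not part of Flink or Kafka. The sketch also assumes
that reads and emits are visible in one place, which is not the case
by default in a Flink job where source and sink are separate tasks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not a Flink or Kafka API.
// Tracks, per partition, the offset from which a restart loses no records.
class SafeOffsetTracker {
    // Offsets that were read but not yet written to the external system.
    private final Map<Integer, TreeSet<Long>> inFlight = new HashMap<>();
    // Highest offset read so far, per partition.
    private final Map<Integer, Long> highestRead = new HashMap<>();

    // Call when a record is read from Kafka.
    synchronized void onRead(int partition, long offset) {
        inFlight.computeIfAbsent(partition, p -> new TreeSet<>()).add(offset);
        highestRead.merge(partition, offset, Math::max);
    }

    // Call from the sink after the write to the external system succeeded.
    synchronized void onEmitted(int partition, long offset) {
        TreeSet<Long> pending = inFlight.get(partition);
        if (pending != null) {
            pending.remove(offset);
        }
    }

    // Offset of the next record to read after a restart:
    // the smallest unfinished offset, or highest read + 1 if none is pending.
    // Returns -1 if nothing has been read from this partition.
    synchronized long safeOffset(int partition) {
        TreeSet<Long> pending = inFlight.get(partition);
        if (pending != null && !pending.isEmpty()) {
            return pending.first();
        }
        Long highest = highestRead.get(partition);
        return highest == null ? -1L : highest + 1;
    }
}
```

The value returned by safeOffset() is what you would store externally
(e.g. in Cassandra) and use as the start position on restart. Records
after that offset which were already emitted get replayed, which is
acceptable for at-least-once.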

Regards,
Roman

On Mon, Apr 12, 2021 at 12:11 PM Rahul Patwari
<[email protected]> wrote:
>
> Hello,
>
> Context:
>
> We have a stateless Flink Pipeline which reads from Kafka topics.
> The pipeline has a Windowing operator (used only for introducing a delay in
> processing records) and Async I/O operators (used for lookup/enrichment).
>
> "At least once" processing semantics are needed for the pipeline to avoid data
> loss.
>
> Checkpointing is disabled and we are dependent on the auto offset commit of 
> Kafka consumer for fault tolerance currently.
>
> Because auto offset commit indicates that "the record is successfully read"
> rather than "the record is successfully processed", data is lost if a restart
> happens after an offset is committed to Kafka but before the record is
> processed by the Flink pipeline: the record is NOT replayed when the pipeline
> is restarted.
>
> Checkpointing can solve this problem. But, since the pipeline is stateless, 
> we do not want to use checkpointing, which will persist all the records in 
> Windowing Operator and in-flight Async I/O calls.
>
> Question:
>
> We are looking for other ways to guarantee "at least once" processing without 
> checkpointing. One such way is to manage Kafka Offsets Externally.
>
> We can maintain the offset of each partition of each topic in Cassandra (or
> maintain a timestamp such that all records with earlier timestamps have been
> successfully processed) and configure the Kafka consumer start position with
> setStartFromSpecificOffsets() or setStartFromTimestamp(), respectively.
>
> This will be helpful if the pipeline is manually restarted (say, the JobManager
> pod is restarted). But how can we avoid data loss in case of internal restarts?
>
> Has anyone used this approach?
> What are other ways to guarantee "at least once" processing without 
> checkpointing for a stateless Flink pipeline?
>
> Thanks,
> Rahul
