Hello,

*Context*:

We have a stateless Flink pipeline which reads from Kafka topics.
The pipeline has a windowing operator (used only to introduce a delay in
processing records) and Async I/O operators (used for lookup/enrichment).

"At least once" processing semantics are needed for the pipeline to avoid
data loss.

Checkpointing is disabled, and we currently depend on the Kafka consumer's
auto offset commit for fault tolerance.

Since an auto offset commit only indicates that "the record was
successfully read", not that "the record was successfully processed",
data will be lost if a restart occurs after the offset is committed to
Kafka but before the record is successfully processed by the Flink
pipeline, because the record is NOT replayed when the pipeline restarts.
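For reference, the setup described above corresponds roughly to consumer
properties like the following (the broker address and group id are
placeholders, not our actual values):

```java
import java.util.Properties;

public class ConsumerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder broker
        props.setProperty("group.id", "enrichment-pipeline");  // placeholder group id
        // Offsets are committed automatically on a timer, regardless of
        // whether downstream processing has finished -- this is the source
        // of the data-loss window described above.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }
}
```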

Checkpointing can solve this problem. But since the pipeline is stateless,
we do not want to use checkpointing, which would persist all the records in
the windowing operator and the in-flight Async I/O calls.

*Question*:

We are looking for other ways to guarantee "at least once" processing
without checkpointing. One such way is to manage Kafka Offsets Externally.

We can maintain the offsets of each partition of each topic in Cassandra
(or maintain a timestamp such that all records with timestamps less than it
have been successfully processed) and configure the Kafka consumer start
position
<https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration>
with setStartFromTimestamp() or setStartFromSpecificOffsets().
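A minimal sketch of that startup path, assuming the offsets have already
been written to Cassandra by the sink and read back into a map at job
startup (the topic name "events", the broker address, and the literal
offsets here are placeholders; the Cassandra read itself is not shown):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartition;

public class ExternalOffsetStartup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder
        props.setProperty("group.id", "enrichment-pipeline");  // placeholder

        // Hypothetical values: in practice these would be loaded from
        // Cassandra, one entry per partition of each topic.
        Map<KafkaTopicPartition, Long> offsets = new HashMap<>();
        offsets.put(new KafkaTopicPartition("events", 0), 42L);
        offsets.put(new KafkaTopicPartition("events", 1), 57L);

        FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);
        // Resume from the externally tracked offsets instead of the
        // consumer group's committed offsets.
        consumer.setStartFromSpecificOffsets(offsets);
        // Timestamp-based alternative:
        // consumer.setStartFromTimestamp(lastProcessedTimestampMillis);

        env.addSource(consumer).print();
        env.execute("external-offset-startup");
    }
}
```

Note that, per the linked documentation, these start-position settings only
apply when the job is started fresh; they do not affect how the job resumes
after an automatic failure recovery.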

This will be helpful if the pipeline is manually restarted (say, the
JobManager pod is restarted). *But how can we avoid data loss in the case
of internal restarts?*

Has anyone used this approach?
What are other ways to guarantee "at least once" processing without
checkpointing for a stateless Flink pipeline?

Thanks,
Rahul
