I’m working with a Kafka environment where I’m limited to 100 partitions at
1 GB log.retention.bytes each.  I’m looking to implement exactly-once
processing from this Kafka source to an S3 sink.

If I have understood correctly, Flink will only commit the Kafka offsets
once a checkpoint completes, i.e. once the data has been saved to S3.
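For reference, this is roughly how I am wiring up the checkpointed Kafka
source (topic name, bootstrap servers, group id and the 60s checkpoint
interval below are just placeholders, not my real values):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceSetup {

    // Build a checkpointed Kafka source; all names and intervals are placeholders.
    static DataStream<String> kafkaSource(StreamExecutionEnvironment env) {
        // Exactly-once checkpoints every 60 seconds; offsets are committed
        // back to Kafka only when a checkpoint completes successfully.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "flink-s3-sink");       // placeholder

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
        // On recovery Flink restores offsets from its own checkpoint state;
        // the offsets committed to Kafka are only used for monitoring.
        consumer.setCommitOffsetsOnCheckpoints(true);

        return env.addSource(consumer);
    }
}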

Have I understood correctly that, for Flink checkpoints and exactly-once to
work, the assumption is that the number and size of partitions
(log.retention.bytes) in Kafka are sufficient that, should the job need to
be rolled back to a checkpoint, the data still exists in Kafka (i.e. it
hasn’t already been deleted because of the retention limit)?  In my case
that is roughly 100 partitions × 1 GB ≈ 100 GB of the most recent data per
topic available for replay.

If the above is true, and I am using a DateTimeBucketer (roughly as
sketched below), then the bucket sizes will directly influence how big the
partitions need to be in Kafka, because larger buckets will result in less
frequent commits of the offsets?
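The sink side I have in mind looks roughly like this (the S3 path, date
format and roll/inactivity sizes are placeholders for illustration):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class S3SinkSetup {

    // Attach a BucketingSink that writes into time-based bucket directories.
    static void addS3Sink(DataStream<String> stream) {
        BucketingSink<String> sink =
                new BucketingSink<>("s3://my-bucket/flink-output"); // placeholder path
        // One bucket directory per hour of processing time; a coarser
        // format (e.g. per day) means fewer, larger buckets.
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
        // Roll over to a new part file once the current one reaches ~400 MB.
        sink.setBatchSize(400L * 1024 * 1024);
        // Close part files in buckets that have received no data for 10 minutes,
        // checking for inactivity once a minute.
        sink.setInactiveBucketThreshold(10 * 60 * 1000L);
        sink.setInactiveBucketCheckInterval(60 * 1000L);
        stream.addSink(sink);
    }
}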

Many thanks,

Chris
