Hello,

Spark Streaming's Kafka 0.10 integration provides an option to commit offsets to
Kafka using the commitAsync() API.
This call only records the offset commit request. The actual commit is performed
in compute(), after the RDD for the next batch is created.
Why is this so? Why not commit right when the API is called?
The commit process itself is already asynchronous, with an option to provide a
callback handler, so an immediate commit would not block the driver.
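
For reference, here is a rough sketch of the commit pattern I'm describing,
using the documented 0.10 direct-stream API. The topic, group id, broker
address, and sink logic are placeholders for my actual setup:

import org.apache.kafka.clients.consumer.{OffsetAndMetadata, OffsetCommitCallback}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val ssc = new StreamingContext(new SparkConf().setAppName("commit-demo"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
  "value.deserializer" -> classOf[org.apache.kafka.common.serialization.StringDeserializer],
  "group.id" -> "example-group",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("example-topic"), kafkaParams))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... write the batch to the external sink here ...

  // commitAsync only enqueues these offsets; the queued commit is sent
  // later, when compute() creates the KafkaRDD for the next batch.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges,
    new OffsetCommitCallback {
      override def onComplete(offsets: java.util.Map[TopicPartition, OffsetAndMetadata],
                              e: Exception): Unit = {
        if (e != null) {
          // Commit failed; log or handle as appropriate.
        }
      }
    })
}

ssc.start()
ssc.awaitTermination()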

This adds a window in which the application has requested a commit but the
offsets are not yet recorded in Kafka's internal offsets topic.
Any failure during that window will cause the last batch to be recomputed.

My app sinks to an external store that can't be made idempotent, so the writes
are assumed to be at-least-once.
Shrinking this window seems like one place where duplicates can be reduced.

Srikanth
