Have a look at this: https://github.com/koeninger/kafka-exactly-once

especially:
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerBatch.scala
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartition.scala
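The core pattern in those examples: with the direct stream, each batch's RDD carries its Kafka offset ranges, which you can read and store alongside your output. A minimal sketch of that idea (the `saveOffsets` helper and the HDFS path are hypothetical placeholders, not from the repo):

```scala
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // RDDs produced by the direct stream implement HasOffsetRanges
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // Write the batch's data to HDFS first...
  rdd.map(_._2).saveAsTextFile(s"hdfs:///data/batch-${System.currentTimeMillis}")

  // ...then record the ending offsets in your external store.
  // For exactly-once you'd write results and offsets in a single
  // transaction, as TransactionalPerBatch.scala does; saveOffsets
  // here is a hypothetical helper for your DB.
  saveOffsets(offsetRanges)
}
```

Note the cast must happen on the stream's RDD directly (before any shuffle or transformation), since only the original KafkaRDD carries the offset ranges.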

On Fri, Oct 23, 2015 at 5:07 AM, Ramkumar V <ramkumar.c...@gmail.com> wrote:


> Hi,
>
> I have written a Spark Streaming application that consumes a Kafka stream
> and writes to HDFS every hour (the batch interval). I would like to know
> how to get or commit the Kafka offsets while writing to HDFS, so that
> after a failure or redeployment I can restart from the last successfully
> committed offset. I want to store the offsets in an external DB or
> something similar, not in ZooKeeper. If I want to resume the Kafka stream
> from a particular offset, how do I do that in Spark?
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
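
For the resume part of the question: the Spark 1.x direct API lets you pass explicit starting offsets via the `fromOffsets` argument of `KafkaUtils.createDirectStream`. A hedged sketch, assuming a hypothetical `loadOffsetsFromDb()` helper that returns whatever offsets you stored externally:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// loadOffsetsFromDb() is a hypothetical helper returning the last
// committed offset per partition, e.g. Map(TopicAndPartition("t", 0) -> 1234L)
val fromOffsets: Map[TopicAndPartition, Long] = loadOffsetsFromDb()

// Start the direct stream exactly at the stored offsets rather than
// at the earliest/latest position.
val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))
```

If no offsets exist yet (first run), you would fall back to the simpler overload of `createDirectStream` that takes a topic set and uses `auto.offset.reset`.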
