Have a look at this: https://github.com/koeninger/kafka-exactly-once
especially:

https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerBatch.scala
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartition.scala

On Fri, Oct 23, 2015 at 5:07 AM, Ramkumar V <ramkumar.c...@gmail.com> wrote:

> Hi,
>
> I have written a Spark Streaming application that consumes a Kafka stream
> and writes to HDFS every hour (the batch interval). I would like to know
> how to get or commit the offsets of the Kafka stream while writing to HDFS,
> so that if there is any issue or a redeployment, I can start from the point
> of the last successful offset commit. I want to store the offsets in an
> external DB or something like that, not in ZooKeeper. If I want to resume
> the Kafka stream from a particular offset, how do I do that in Spark?
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
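For reference, here is a minimal sketch of the pattern those examples follow, using the Spark Streaming Kafka direct API (`KafkaUtils.createDirectStream` with a `fromOffsets` map, and `HasOffsetRanges` to read back the offsets each batch processed). The `readOffsetsFromDb`/`saveOffsetsToDb` helpers are hypothetical stand-ins for whatever external store you choose; see the linked examples for a full transactional implementation:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object ResumeFromDbOffsets {
  // Hypothetical helpers: persist offsets in your own store (JDBC, HBase, ...),
  // ideally in the same transaction as the results themselves.
  def readOffsetsFromDb(): Map[TopicAndPartition, Long] = ???
  def saveOffsetsToDb(ranges: Array[OffsetRange]): Unit = ???

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("resume-from-db-offsets")
    val ssc = new StreamingContext(conf, Seconds(3600))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // Start the direct stream exactly where the previous run left off.
    val fromOffsets = readOffsetsFromDb()
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)

    stream.foreachRDD { rdd =>
      // RDDs from the direct stream carry the offset ranges they cover.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.map(_._2).saveAsTextFile(s"hdfs:///data/batch-${System.currentTimeMillis}")
      // Commit offsets only after the HDFS write succeeds, so a failed
      // batch is replayed from the last committed offsets on restart.
      saveOffsetsToDb(ranges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is ordering: save the data first, then the offsets (per-batch), or store both in one transaction (per-partition) as the linked examples do.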