Questions about Spark's Kafka integration should probably be directed to the Spark user mailing list, not this one. I don't monitor Kafka mailing lists as closely, for instance.
For the direct stream, Spark doesn't keep any state regarding offsets unless you enable checkpointing.

Have you read https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md ?

On Mon, Oct 26, 2015 at 3:43 AM, Charan Ganga Phani Adabala <char...@eiqnetworks.com> wrote:

> Hi All,
>
> We are working on Apache Spark with Kafka integration, and in this use case we
> are using the DirectStream approach. We want to avoid data loss with this
> approach, so we take the offsets and save them into MongoDB.
>
> We want some clarification on whether Spark stores any offsets internally. Let us
> explain with an *example*:
>
> For the first RDD batch *we get offsets 0 to 5 of events to be processed*,
> but the application *unexpectedly* crashes. We then start the application
> again. Does this job *fetch events 0 to 5 again, or start from where the
> previous job stopped?*
>
> *We are not committing any offsets in the above process, because we have
> to commit offsets manually in the DirectStream approach. Does the new job
> fetch events from the 0th position?*
>
> Thanks & Regards,
>
> *Ganga Phani Charan Adabala | Software Engineer*
> *EiQ Networks®, Inc.* | www.eiqnetworks.com
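
To make the restart behaviour concrete, here is a minimal sketch in plain Python. It is not Spark code: `OffsetStore` and `run_batch` are hypothetical names, the in-memory dict stands in for your MongoDB collection, and the list `log` stands in for a single Kafka partition. It assumes you save offsets only after a batch has been processed, and that on startup you read the saved offsets back and start from there yourself.

```python
from typing import Dict, List, Tuple


class OffsetStore:
    """Hypothetical stand-in for an external offset store such as MongoDB."""

    def __init__(self) -> None:
        # Maps (topic, partition) to the next offset that should be read.
        self._next: Dict[Tuple[str, int], int] = {}

    def load(self, topic: str, partition: int, default: int = 0) -> int:
        return self._next.get((topic, partition), default)

    def save(self, topic: str, partition: int, until_offset: int) -> None:
        self._next[(topic, partition)] = until_offset


def run_batch(log: List[str], store: OffsetStore, topic: str, partition: int,
              batch_size: int, sink: List[str],
              crash_before_save: bool = False) -> Tuple[int, int]:
    """Process one batch and return its (from_offset, until_offset) range."""
    start = store.load(topic, partition)          # resume point comes from the store
    until = min(start + batch_size, len(log))     # until_offset is exclusive
    sink.extend(log[start:until])                 # "process" the batch
    if crash_before_save:
        raise RuntimeError("simulated crash after processing, before saving offsets")
    store.save(topic, partition, until)           # save only after processing
    return start, until


if __name__ == "__main__":
    log = [f"event-{i}" for i in range(12)]
    store, sink = OffsetStore(), []
    try:
        run_batch(log, store, "events", 0, 6, sink, crash_before_save=True)
    except RuntimeError as err:
        print("first run:", err)
    # Restart: nothing was saved, so the job reads offsets 0 to 5 again.
    print("after restart:", run_batch(log, store, "events", 0, 6, sink))
    print("next batch:", run_batch(log, store, "events", 0, 6, sink))
```

With this ordering, a crash between processing and saving means the batch is fetched again, so the output step sees events 0 to 5 twice. The blog post linked above covers making that step idempotent or transactional.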