Hi All,

We are working with Apache Spark's Kafka integration, using the DirectStream approach. To avoid data loss with this approach, we fetch the offsets ourselves and save them to MongoDB. We would like some clarification: does Spark store any offsets internally? Let us explain with an example: for the first RDD batch we get events at offsets 0 to 5 to be processed, but the application crashes unexpectedly. When we start the application again, does the new job fetch events 0 to 5 again, or does it resume from where the previous job stopped? We are not committing any offsets in the above process, because offsets have to be committed manually in the DirectStream approach. Does the new job fetch events from the 0th position?
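To make the scenario concrete, here is a minimal simulation (not Spark code) of the save-offsets-after-processing flow described above. A plain dict stands in for MongoDB, a list stands in for the Kafka partition, and every name (`load_offset`, `save_offset`, `run_batch`) is hypothetical, not a Spark or Kafka API. It illustrates the at-least-once behaviour in question: if the application dies before the offset is persisted, a restart replays the same events.

```python
# Simulation of manual offset management with a direct stream.
# A dict stands in for MongoDB; the list `log` stands in for a Kafka
# topic partition. All names here are hypothetical, not a Spark API.

log = [f"event-{i}" for i in range(12)]   # the Kafka "partition"
mongo = {}                                # offset store ("MongoDB")

def load_offset(store):
    # Where the previous run left off; 0 if nothing was ever saved.
    return store.get("offset", 0)

def save_offset(store, offset):
    store["offset"] = offset

def run_batch(store, batch_size=6, crash_before_save=False):
    """Process one batch, persisting the offset only after processing."""
    start = load_offset(store)
    batch = log[start:start + batch_size]
    processed = list(batch)               # "process" the events
    if crash_before_save:
        # Application dies before the new offset is persisted.
        return processed
    save_offset(store, start + len(batch))
    return processed

# First run crashes before committing: offsets 0-5 are processed
# but never recorded in the store.
first = run_batch(mongo, crash_before_save=True)

# After restart nothing was committed, so the same events come back:
# at-least-once delivery, exactly the duplication asked about.
second = run_batch(mongo)
print(first == second)        # → True: offsets 0-5 are replayed
print(load_offset(mongo))     # → 6, only after a successful commit
```

The key point the sketch encodes is the ordering: the offset is written only after the batch is fully processed, so a crash in between yields reprocessing rather than data loss.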
Thanks & Regards,
Ganga Phani Charan Adabala | Software Engineer
o: +91-40-23116680 | c: +91-9491418099
e: char...@eiqnetworks.com
EiQ Networks(r), Inc. | www.eiqnetworks.com
www.socvue.com | www.eiqfederal.com
Blog: http://blog.eiqnetworks.com/ | Twitter: https://twitter.com/eiqnetworks | LinkedIn: http://www.linkedin.com/company/eiqnetworks | Facebook: http://www.facebook.com/eiqnetworks