I think before doing a code update you would want to gracefully shut down your streaming job and checkpoint the processed offsets (and any state that you maintain) in a database or HDFS. When you start the job back up, it should read this checkpoint, rebuild the necessary state, and resume processing from the last offset processed.
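For what it's worth, a rough sketch of the graceful-shutdown side in Scala (Spark 1.x streaming API). The shutdown marker path and the saveOffsets helper are hypothetical, and the external store (HDFS file, DB table, ZooKeeper, etc.) is your choice:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.OffsetRange
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: persist the offset ranges of the batch just
// processed to your own store (DB table, HDFS file, ...), not Spark's
// checkpoint directory.
def saveOffsets(ranges: Array[OffsetRange]): Unit = ???

val ssc = new StreamingContext(new SparkConf(), Seconds(10))

// ... build your direct stream here, and inside foreachRDD call
// saveOffsets(rdd.asInstanceOf[HasOffsetRanges].offsetRanges) after
// each batch is fully processed ...

ssc.start()

// Poll for an external shutdown marker instead of killing the job, so
// the in-flight batch finishes and its offsets are recorded before exit.
val fs = FileSystem.get(ssc.sparkContext.hadoopConfiguration)
val marker = new Path("/tmp/streaming-shutdown-marker") // hypothetical path
var stopped = false
while (!stopped) {
  stopped = ssc.awaitTerminationOrTimeout(10000)
  if (!stopped && fs.exists(marker)) {
    // Let the current batch complete, then stop everything cleanly.
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    stopped = true
  }
}

With stopGracefully = true, Spark finishes processing the data it has already received before shutting down, so the last offsets you saved line up with what was actually processed.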
Another approach would be to checkpoint the processed offsets in the streaming job yourself whenever you read from Kafka. Then, before reading the next batch, instead of relying on the Spark checkpoint for offsets, start from the last processed offset that you saved. (A sketch of this follows below the quoted thread.)

Regards,
Soumitra

> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli <gss.su...@gmail.com> wrote:
>
> Okie. That makes sense.
>
> Any recommendations on how to manage changes to my Spark Streaming app
> while achieving fault tolerance at the same time?
>
>> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu
>> <shixi...@databricks.com> wrote:
>>
>> You cannot. Streaming doesn't support it because code changes will break
>> Java serialization.
>>
>>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli <gss.su...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I am writing a Spark Streaming application to read data from Kafka. I am
>>> using the no-receiver (direct) approach and have enabled checkpointing to
>>> make sure I am not reading messages again in case of failure
>>> (exactly-once semantics).
>>>
>>> I have a quick question: how does checkpointing need to be configured to
>>> handle code changes in my Spark Streaming app?
>>>
>>> Can you please suggest? Hope the question makes sense.
>>>
>>> Thank you
>>>
>>> Regards
>>> Shiv
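As promised above, a sketch of the second approach with the Kafka direct API (Spark 1.x, Kafka 0.8 decoders): record the offset ranges yourself after each batch and, on startup, seed createDirectStream with the last offsets you stored. The readOffsets/saveOffsets helpers are hypothetical and stand in for your own DB/HDFS-backed storage:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// Hypothetical helpers backed by your own store, independent of Spark's
// checkpoint directory, so a code change does not invalidate them.
def readOffsets(): Map[TopicAndPartition, Long] = ???
def saveOffsets(ranges: Array[OffsetRange]): Unit = ???

def createStream(ssc: StreamingContext, kafkaParams: Map[String, String]) = {
  // Start each partition from the last offset *you* recorded.
  val fromOffsets = readOffsets()
  val messageHandler =
    (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
    (String, String)](ssc, kafkaParams, fromOffsets, messageHandler)
}

// Per batch: process first, then record how far you got.
// stream.foreachRDD { rdd =>
//   val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
//   // ... your processing ...
//   saveOffsets(ranges)
// }

Since the offsets live outside Spark's checkpoint, you can deploy new code, point the job at the same offset store, and pick up exactly where the old version left off.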