I think before doing a code update you would want to gracefully shut down your 
streaming job and checkpoint the processed offsets (and any state that you 
maintain) in a database or HDFS.
When you start the job back up, it should read this checkpoint, rebuild the 
necessary state, and begin processing from the last processed offset.
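
If it helps, Spark 1.4+ can do the graceful part for you: with 
spark.streaming.stopGracefullyOnShutdown set, a SIGTERM (e.g. from your deploy 
script) lets in-flight batches finish before the job stops. A rough sketch in 
Scala (the app name is just a placeholder):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("my-streaming-app")
      // Drain in-flight batches on shutdown instead of killing them
      // mid-batch, so the offsets/state you persist match what was
      // actually processed.
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

You can also trigger it yourself with 
ssc.stop(stopSparkContext = true, stopGracefully = true).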

Another approach would be to checkpoint the processed offsets from within the 
streaming job whenever you read from Kafka. Then, before reading the next 
batch, instead of relying on the Spark checkpoint for offsets, start from the 
last processed offset that you saved.
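
Roughly like this with the Kafka direct stream (untested sketch against the 
Spark 1.6 / Kafka 0.8 API; loadOffsets, saveOffsets and process are 
hypothetical stand-ins for your offset store and business logic):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.StreamingContext
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

    object OffsetManagedStream {
      // Hypothetical helpers backed by your database/HDFS of choice.
      def loadOffsets(): Map[TopicAndPartition, Long] = ???
      def saveOffsets(ranges: Array[OffsetRange]): Unit = ???
      def process(rdd: RDD[(String, String)]): Unit = ???

      def start(ssc: StreamingContext, kafkaParams: Map[String, String]): Unit = {
        // Resume from the offsets we saved ourselves, not from Spark's checkpoint.
        val fromOffsets = loadOffsets()
        val messageHandler =
          (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

        val stream = KafkaUtils.createDirectStream[
          String, String, StringDecoder, StringDecoder, (String, String)](
          ssc, kafkaParams, fromOffsets, messageHandler)

        stream.foreachRDD { rdd =>
          // Grab this batch's offset ranges before any transformation/shuffle.
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          process(rdd)
          // Persist the new high-water marks only after processing succeeds.
          saveOffsets(offsetRanges)
        }
      }
    }

One gotcha: the cast to HasOffsetRanges only works on the RDDs produced 
directly by the direct stream, so grab the ranges in the first foreachRDD, 
before any other transformation.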

Regards
Soumitra

> On Apr 11, 2016, at 8:31 PM, Siva Gudavalli <gss.su...@gmail.com> wrote:
> 
> Okay, that makes sense. 
> 
> Any recommendations on how to manage changes to my Spark Streaming app while 
> achieving fault tolerance at the same time?
> 
>> On Mon, Apr 11, 2016 at 8:16 PM, Shixiong(Ryan) Zhu 
>> <shixi...@databricks.com> wrote:
>> You cannot. Streaming doesn't support it because code changes will break 
>> Java serialization.
>> 
>>> On Mon, Apr 11, 2016 at 4:30 PM, Siva Gudavalli <gss.su...@gmail.com> wrote:
>>> hello,
>>> 
>>> I am writing a Spark Streaming application to read data from Kafka. I am 
>>> using the no-receiver (direct) approach and have enabled checkpointing to 
>>> make sure I am not reading messages again in case of failure (exactly-once 
>>> semantics). 
>>> 
>>> I have a quick question: how does checkpointing need to be configured to 
>>> handle code changes in my Spark Streaming app? 
>>> 
>>> Can you please suggest? Hope the question makes sense.
>>> 
>>> Thank you 
>>> 
>>> regards
>>> shiv
> 
