Re: Why do checkpoints work the way they do?

Hugo Reinwald Tue, 12 Sep 2017 09:39:22 -0700

Thanks, Tathagata, for the clarification. +1 on documenting the limitations
of checkpoints in Structured Streaming.


Hugo

On Mon, Sep 11, 2017 at 7:13 PM, Dmitry Naumenko <dm.naume...@gmail.com>
wrote:

> +1 from me on this question. If there are any constraints on restoring
> from a checkpoint in Structured Streaming, they should be documented.
>
>
> 2017-08-31 9:20 GMT+03:00 张万新 <kevinzwx1...@gmail.com>:
>
>> So is there any documentation describing under which conditions my
>> application can recover from the same checkpoint, and under which it
>> cannot?
>>
>> Tathagata Das <tathagata.das1...@gmail.com> wrote on Wed, Aug 30, 2017
>> at 1:20 PM:
>>
>>> Hello,
>>>
>>> This was an unfortunate design decision on my part when I was building
>>> DStreams :)
>>>
>>> Fortunately, we learnt from our mistakes and built Structured Streaming
>>> the correct way. Checkpointing in Structured Streaming stores only the
>>> progress information (offsets, etc.), and the user can change their
>>> application code (within certain constraints, of course) and still restart
>>> from checkpoints (unlike DStreams). If you are just building out your
>>> streaming applications, then I highly recommend that you try out
>>> Structured Streaming instead of DStreams (which is effectively in
>>> maintenance mode).
>>>
>>>
>>> On Fri, Aug 25, 2017 at 7:41 PM, Hugo Reinwald <hugo.reinw...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am implementing a Spark Streaming solution with Kafka and read that
>>>> checkpoints cannot be used across application code changes - here
>>>> <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html>
>>>>
>>>> I tested changes in application code and got the error message
>>>> below:
>>>>
>>>> 17/08/25 15:10:47 WARN CheckpointReader: Error reading checkpoint from
>>>> file file:/tmp/checkpoint/checkpoint-1503641160000.bk
>>>> java.io.InvalidClassException: scala.collection.mutable.ArrayBuffer;
>>>> local class incompatible: stream classdesc serialVersionUID =
>>>> -2927962711774871866, local class serialVersionUID = 1529165946227428979
>>>>
>>>> While I understand that this is by design, why does checkpointing
>>>> verify class signatures the way it does? Would it not be easier to let
>>>> the developer decide whether to use the old checkpoints, depending on
>>>> the change in application logic, e.g. changes in code unrelated to
>>>> Spark/Kafka such as logging or configuration changes?
>>>>
>>>> This is my first post in the group. Apologies if I am asking a question
>>>> that has already been answered; I did a Nabble search and it didn't turn
>>>> up an answer.
>>>>
>>>> Thanks for the help.
>>>> Hugo
>>>>
>>>
>>>
>
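
For anyone reading this thread later: the InvalidClassException quoted in
the original question is raised by Java serialization itself. The stream
records a serialVersionUID for each class, and deserialization refuses to
load the object when that value differs from the UID of the class on the
current classpath. Below is a minimal, stdlib-only sketch of that check.
It does not use Spark. SerialUidDemo, Payload and the helper methods are
hypothetical names made up for this illustration, and the "changed class"
is simulated by flipping one bit of the UID stored in the serialized
bytes rather than by recompiling anything.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerialUidDemo {

    // Hypothetical stand-in for a class whose instances end up in a checkpoint.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final int batch;
        Payload(int batch) { this.batch = batch; }
    }

    // Serializes an object to bytes with plain Java serialization.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Simulates "the class changed after the checkpoint was written" by
    // altering the serialVersionUID recorded in the stream.
    static byte[] withDifferentUid(byte[] data, Class<?> cls) {
        byte[] copy = data.clone();
        // Stream layout: 4-byte header, TC_OBJECT, TC_CLASSDESC, 2-byte name
        // length, the (ASCII) class name, then the 8-byte serialVersionUID.
        int uidOffset = 4 + 1 + 1 + 2 + cls.getName().length();
        copy[uidOffset + 7] ^= 0x01; // flip the lowest bit of the stored UID
        return copy;
    }

    // Returns true if the bytes deserialize, false if the UID check rejects them.
    static boolean loads(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // "local class incompatible: stream classdesc serialVersionUID = ..."
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] original = serialize(new Payload(7));
        System.out.println("unchanged UID loads: " + loads(original));
        System.out.println("altered UID loads:   "
            + loads(withDifferentUid(original, Payload.class)));
    }
}
```

The first call loads and the second is rejected. A checkpoint that
contains serialized objects is subject to the same check for every class
in the stream, which is consistent with the reply above that Structured
Streaming checkpoints store only progress information instead.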
