Thanks Tathagata for the clarification. +1 on documenting limitations of checkpoints on structured streaming.
Hugo On Mon, Sep 11, 2017 at 7:13 PM, Dmitry Naumenko <dm.naume...@gmail.com> wrote: > +1 for me for this question. If there any constraints in restoring > checkpoint for Structured Streaming, they should be documented. > > > 2017-08-31 9:20 GMT+03:00 张万新 <kevinzwx1...@gmail.com>: > >> So is there any documents demonstrating in what condition can my >> application recover from the same checkpoint and in what condition not? >> >> Tathagata Das <tathagata.das1...@gmail.com>于2017年8月30日周三 下午1:20写道: >> >>> Hello, >>> >>> This is an unfortunate design on my part when I was building DStreams :) >>> >>> Fortunately, we learnt from our mistakes and built Structured Streaming >>> the correct way. Checkpointing in Structured Streaming stores only the >>> progress information (offsets, etc.), and the user can change their >>> application code (within certain constraints, of course) and still restart >>> from checkpoints (unlike DStreams). If you are just building out your >>> streaming applications, then I highly recommend you to try out Structured >>> Streaming instead of DStreams (which is effectively in maintenance mode). >>> >>> >>> On Fri, Aug 25, 2017 at 7:41 PM, Hugo Reinwald <hugo.reinw...@gmail.com> >>> wrote: >>> >>>> Hello, >>>> >>>> I am implementing a spark streaming solution with Kafka and read that >>>> checkpoints cannot be used across application code changes - here >>>> <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> >>>> >>>> I tested changes in application code and got the error message as b >>>> below - >>>> >>>> 17/08/25 15:10:47 WARN CheckpointReader: Error reading checkpoint from >>>> file file:/tmp/checkpoint/checkpoint-1503641160000.bk >>>> java.io.InvalidClassException: scala.collection.mutable.ArrayBuffer; >>>> local class incompatible: stream classdesc serialVersionUID = >>>> -2927962711774871866, local class serialVersionUID = 1529165946227428979 >>>> >>>> While I understand that this is as per design, can I know why does >>>> checkpointing work the way that it does verifying the class signatures? >>>> Would it not be easier to let the developer decide if he/she wants to use >>>> the old checkpoints depending on what is the change in application logic >>>> e.g. changes in code unrelated to spark/kafka - Logging / conf changes etc >>>> >>>> This is first post in the group. Apologies if I am asking the question >>>> again, I did a nabble search and it didnt throw up the answer. >>>> >>>> Thanks for the help. >>>> Hugo >>>> >>> >>> >