Storing offsets transactionally with results in your own data store, with a schema that makes sense for you, is the optimal solution.
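For concreteness, a minimal sketch of that pattern against the 1.x direct stream API (spark-streaming-kafka) is below. The JDBC connection, the table layout, and the surrounding ssc / kafkaParams / topics values are assumptions for illustration; the point is that a partition's results and its ending offset commit or roll back together:

    import java.sql.DriverManager
    import kafka.serializer.StringDecoder
    import org.apache.spark.TaskContext
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // offset ranges are exposed only on the RDDs produced by the direct stream
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { iter =>
        // KafkaRDD partitions map 1:1 to offset ranges
        val osr = offsetRanges(TaskContext.get.partitionId)
        val conn = DriverManager.getConnection(jdbcUrl) // hypothetical JDBC URL
        conn.setAutoCommit(false)
        try {
          // 1) write this partition's results
          // 2) upsert (osr.topic, osr.partition, osr.untilOffset) into an offsets table
          conn.commit()
        } catch {
          case e: Exception => conn.rollback(); throw e
        } finally {
          conn.close()
        }
      }
    }

On restart, the job seeds the stream from the stored offsets instead of a checkpoint; see the sketch at the bottom of the thread.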
On Fri, Sep 25, 2015 at 11:05 AM, Radu Brumariu <bru...@gmail.com> wrote:

> Right, I understand why the exceptions happen.
> However, it seems less useful to have checkpointing that only works in
> the case of an application restart. IMO, code changes happen quite often,
> and not being able to pick up where the previous job left off is quite a
> hindrance.
>
> The solutions you mention would partially solve the problem, while
> bringing new problems along (increased resource utilization, difficulty
> in managing multiple jobs consuming the same data, etc.).
>
> The solution that we currently employ is committing the offsets to
> durable storage and making sure that the job reads the offsets from there
> upon restart, while forsaking checkpointing.
>
> The scenario does not seem to be an edge case, which is why I was asking
> whether it could be handled by the Spark Kafka API instead of having
> everyone come up with their own, sub-optimal solutions.
>
> Radu
>
> On Fri, Sep 25, 2015 at 5:06 AM, Adrian Tanase <atan...@adobe.com> wrote:
>
>> Hi Radu,
>>
>> The problem itself is not checkpointing the data – if your operations
>> are stateless then you are only checkpointing the Kafka offsets, you are
>> right. The problem is that you are also checkpointing metadata –
>> including the actual code and serialized Java classes – that’s why
>> you’ll see ser/deser exceptions on restart with upgrade.
>>
>> If you’re not using stateful operations, you might get away with using
>> the old Kafka receiver w/o WAL – but you accept “at least once”
>> semantics. As soon as you add in the WAL you are forced to checkpoint,
>> and you’re better off with the DirectReceiver approach.
>>
>> I believe the simplest way to get around it is to support running 2
>> versions in parallel – with some app-level control of a barrier (e.g. v1
>> reads events up to 3:00am, v2 after that). Manual state management is
>> also supported by the framework, but it’s harder to control because:
>>
>> - you’re not guaranteed to shut down gracefully
>> - you may have a bug that prevents the state from being saved, and you
>> can’t restart the app w/o an upgrade
>>
>> Less than ideal, yes :)
>>
>> -adrian
>>
>> From: Radu Brumariu
>> Date: Friday, September 25, 2015 at 1:31 AM
>> To: Cody Koeninger
>> Cc: "user@spark.apache.org"
>> Subject: Re: kafka direct streaming with checkpointing
>>
>> Would changing the direct stream API to support committing the offsets
>> to Kafka's ZK (like a regular consumer) as a fallback mechanism, in case
>> recovering from checkpoint fails, be an accepted solution?
>>
>> On Thursday, September 24, 2015, Cody Koeninger <c...@koeninger.org>
>> wrote:
>>
>>> This has been discussed numerous times; TD's response has consistently
>>> been that it's unlikely to be possible.
>>>
>>> On Thu, Sep 24, 2015 at 12:26 PM, Radu Brumariu <bru...@gmail.com>
>>> wrote:
>>>
>>>> It seems to me that this scenario I'm facing is quite common for
>>>> Spark jobs using Kafka.
>>>> Is there a ticket to add this sort of semantics to checkpointing?
>>>> Does it even make sense to add it there?
>>>>
>>>> Thanks,
>>>> Radu
>>>>
>>>> On Thursday, September 24, 2015, Cody Koeninger <c...@koeninger.org>
>>>> wrote:
>>>>
>>>>> No, you can't use checkpointing across code changes. Either store
>>>>> offsets yourself, or start up your new app code and let it catch up
>>>>> before killing the old one.
>>>>>
>>>>> On Thu, Sep 24, 2015 at 8:40 AM, Radu Brumariu <bru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> in my application I use Kafka direct streaming and I have also
>>>>>> enabled checkpointing.
>>>>>> This seems to work fine if the application is restarted. However,
>>>>>> if I change the code and resubmit the application, it cannot start
>>>>>> because the checkpointed data is of different class versions.
>>>>>> Is there any way I can use checkpointing that can survive across
>>>>>> application version changes?
>>>>>>
>>>>>> Thanks,
>>>>>> Radu
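For reference, the workaround Radu describes above (persist offsets yourself and skip checkpoint recovery on upgrade) maps onto the fromOffsets overload of KafkaUtils.createDirectStream in the 1.x API. A minimal sketch, where loadOffsets() is a hypothetical helper that reads back the offsets table written in the snippet at the top of the thread:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // hypothetical helper: load the (topic, partition, offset) rows the job persisted
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()

    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc,
      kafkaParams,
      fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

Because the starting point comes from your own store rather than from serialized checkpoint state, the application code can change freely between restarts; exactly-once output then hinges on writing offsets in the same transaction as the results.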