Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

Eno Thereska Mon, 06 Feb 2017 07:06:48 -0800
Makes sense, thanks.

Eno
> On 6 Feb 2017, at 15:01, Damian Guy <[email protected]> wrote:
> 
> Hi Eno,
> 
> The state is on local disk, so having the checkpoint in a topic won't help.
> If the host fails permanently, then all of the local state is gone.
> Starting on another host requires restoring from the earliest offset.
> 
> Thanks,
> Damian
> 
> On Mon, 6 Feb 2017 at 14:58 Eno Thereska <[email protected]> wrote:
> 
>> Hi Damian,
>> 
>> I am trying to figure out if this handles a common enough failure
>> scenario. It seems to me this handles transient failures: a server with an
>> instance fails, then comes back up shortly and the instance recovers
>> quickly by reading the checkpoint file.
>> 
>> Permanent failures, where the server fails and the instance is migrated
>> onto another server, are not helped, since the checkpoint file is lost
>> with the server. Even if the server eventually comes back up (a transient
>> failure, but the instance has already migrated), the checkpoint file on
>> the original server doesn't help.
>> 
>> I was thinking a topic-based implementation would handle all scenarios.
>> What am I missing? I'm basically worried that the file-based implementation
>> addresses a niche problem, but can be convinced otherwise.
>> 
>> Thanks
>> Eno
>> 
>> 
>> 
>>> On 3 Feb 2017, at 10:03, Damian Guy <[email protected]> wrote:
>>> 
>>> Hi Matthias,
>>> 
>>> It possibly doesn't make sense to disable it, but then I'm sure someone
>>> will come up with a reason they don't want it!
>>> I'm happy to change it such that the checkpoint interval must be > 0.
>>> 
>>> Cheers,
>>> Damian
>>> 
>>> On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax <[email protected]> wrote:
>>> 
>>>> Thanks Damian.
>>>> 
>>>> One more question: "Checkpointing is disabled if the checkpoint interval
>>>> is set to a value <=0."
>>>> 
>>>> 
>>>> Does it make sense to disable checkpointing? What's the tradeoff here?
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> On 2/2/17 1:51 AM, Damian Guy wrote:
>>>>> Hi Matthias,
>>>>> 
>>>>> Thanks for the comments.
>>>>> 
>>>>> 1. TBD - I need to do some performance tests and try to work out a
>>>>> sensible default.
>>>>> 2. Yes, you are correct. It could be a multiple of the commit.interval.ms.
>>>>> But that would also mean that if you change the commit interval (say you
>>>>> lower it), you might also need to change the checkpoint setting (i.e., you
>>>>> still only want to checkpoint every n minutes).
>>>>> 
>>>>> On Wed, 1 Feb 2017 at 23:46 Matthias J. Sax <[email protected]> wrote:
>>>>> 
>>>>>> Thanks for the KIP Damian.
>>>>>> 
>>>>>> I am wondering about two things:
>>>>>> 
>>>>>> 1. what should be the default value for the new parameter?
>>>>>> 2. why is the new parameter provided in ms?
>>>>>> 
>>>>>> About (2): because
>>>>>> 
>>>>>> "the minimum checkpoint interval will be the value of
>>>>>> commit.interval.ms. In effect the actual checkpoint interval will be a
>>>>>> multiple of the commit interval"
>>>>>> 
>>>>>> it might be easier to just use a parameter that is
>>>>>> "number-of-commit-intervals".
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> On 2/1/17 7:29 AM, Damian Guy wrote:
>>>>>>> Thanks for the comments Eno.
>>>>>>> As for exactly once, I don't believe this matters as we are just
>>>>>>> restoring the change-log, i.e., the result of the aggregations that
>>>>>>> previously ran etc. So once initialized the state store will be in the
>>>>>>> same state as it was before.
>>>>>>> Having the checkpoint in a Kafka topic is not ideal as the state is per
>>>>>>> Kafka Streams instance. So each instance would need to start with a
>>>>>>> unique id that is persistent.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Damian
>>>>>>> 
>>>>>>> On Wed, 1 Feb 2017 at 13:20 Eno Thereska <[email protected]> wrote:
>>>>>>> 
>>>>>>>> As a follow up to my previous comment, have you thought about writing
>>>>>>>> the checkpoint to a topic instead of a local file? That would have the
>>>>>>>> advantage that all metadata continues to be managed by Kafka, as well as
>>>>>>>> fit with EoS. The potential disadvantage would be higher latency;
>>>>>>>> however, if it is periodic as you mention, I'm not sure that would be a
>>>>>>>> show stopper.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Eno
>>>>>>>>> On 1 Feb 2017, at 12:58, Eno Thereska <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Damian, this is a good idea and will reduce the restore time.
>>>>>>>>> Looking forward, with exactly once and support for transactions in
>>>>>>>>> Kafka, I believe we'll have to add some support for rolling back
>>>>>>>>> checkpoints, e.g., when a transaction is aborted. We need to be aware of
>>>>>>>>> that and ideally anticipate those needs a bit in the KIP.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Eno
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 1 Feb 2017, at 10:18, Damian Guy <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> I would like to start the discussion on KIP-116:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Damian
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>>
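
The point in the thread about the checkpoint interval being "a multiple of the commit interval" can be made concrete with a small sketch. This is not Kafka Streams code. It assumes commits fire at perfectly regular intervals and that a checkpoint is written on the first commit at or after the requested interval has elapsed. The function names are hypothetical, and the "<= 0 disables checkpointing" rule follows the KIP text quoted above, which the thread suggests may change.

```python
from typing import Optional


def effective_checkpoint_interval_ms(checkpoint_interval_ms: int,
                                     commit_interval_ms: int) -> Optional[int]:
    """Illustrative model only (hypothetical helper, not a Kafka API).

    Returns the wall-clock gap between checkpoints, assuming a checkpoint
    can only be written when a commit happens.
    """
    if commit_interval_ms <= 0:
        raise ValueError("commit interval must be positive")
    if checkpoint_interval_ms <= 0:
        # Per the KIP text quoted in the thread: <= 0 disables checkpointing.
        return None
    # Ceiling division: number of commits needed to cover the interval.
    commits_needed = -(-checkpoint_interval_ms // commit_interval_ms)
    return commits_needed * commit_interval_ms


def count_based_interval_ms(num_commit_intervals: int,
                            commit_interval_ms: int) -> int:
    """The alternative raised in the thread: configure a number of commit
    intervals instead of a duration (hypothetical helper)."""
    return num_commit_intervals * commit_interval_ms


# A 60 s checkpoint interval with a 45 s commit interval rounds up to 90 s.
print(effective_checkpoint_interval_ms(60_000, 45_000))
# The minimum is the commit interval itself.
print(effective_checkpoint_interval_ms(10_000, 30_000))
# Lowering the commit interval from 30 s to 10 s leaves a ms-based setting
# at 60 s, but shrinks a count-based setting of 2 from 60 s to 20 s.
print(effective_checkpoint_interval_ms(60_000, 10_000))
print(count_based_interval_ms(2, 30_000), count_based_interval_ms(2, 10_000))
```

Under these assumptions, a duration-based setting keeps "checkpoint every n minutes" stable when the commit interval is lowered, whereas a count-based setting would have to be retuned, which is the tradeoff Damian describes.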