Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

Matthias J. Sax Thu, 09 Feb 2017 10:49:27 -0800

But 5 min means, that we (in the worst case) need to reply data from the
last 5 minutes to get the store ready.


So why not go with the min possible value of 30 seconds to speed up this
process if the impact is negligible anyway?

What do you gain by being conservative?


-Matthias

On 2/9/17 2:54 AM, Damian Guy wrote:
> Why shouldn't it be 5 minutes? ;-)
> It is a finger in the air number. Based on the testing i did it shows that
> there isn't much, if any, overhead when checkpointing a single store on the
> commit interval. The default commit interval is 30 seconds, so it could
> possibly be set to that. However, i'd prefer to be a little conservative so
> 5 minutes seemed reasonable.
> 
> 
> On Thu, 9 Feb 2017 at 10:25 Michael Noll <mich...@confluent.io> wrote:
> 
>> Damian,
>>
>> could you elaborate briefly why the default value should be 5 minutes?
>> What are the considerations, assumptions, etc. that go into picking this
>> value?
>>
>> Right now, in the KIP and in this discussion, "5 mins" looks like a magic
>> number to me. :-)
>>
>> -Michael
>>
>>
>>
>> On Thu, Feb 9, 2017 at 11:03 AM, Damian Guy <damian....@gmail.com> wrote:
>>
>>> I've ran the SimpleBenchmark with checkpoint on and off to see what the
>>> impact is. It appears that there is very little impact, if any. The
>> numbers
>>> with checkpointing on actually look better, but that is likely largely
>> due
>>> to external influences.
>>>
>>> In any case, i'm going to suggest we go with a default checkpoint
>> interval
>>> of 5 minutes. I've update the KIP with this.
>>>
>>> commit every 10 seconds (no checkpoint)
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/34798/287372.83751939767/29.570664980746017
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/35942/278226.0308274442/28.62945857214401
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/34677/288375.58035585546/29.673847218617528
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/34677/288375.58035585546/29.673847218617528
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/31192/320595.02436522185/32.98922800718133
>>>
>>>
>>> checkpoint every 10 seconds (same as commit interval)
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/36997/270292.185852907/27.81306592426413
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/32087/311652.69423754164/32.069062237043035
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/32895/303997.5680194558/31.281349749202004
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/33476/298721.4720994145/30.738439479029754
>>> Streams Performance [records/latency/rec-sec/MB-sec source+store]:
>>> 10000000/33196/301241.1133871551/30.99771056753826
>>>
>>> On Wed, 8 Feb 2017 at 09:02 Damian Guy <damian....@gmail.com> wrote:
>>>
>>>> Matthias,
>>>>
>>>> Fair point. I'll update it the KIP.
>>>> Thanks
>>>>
>>>> On Wed, 8 Feb 2017 at 05:49 Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>>
>>>> Damian,
>>>>
>>>> I am not strict about it either. However, if there is no advantage in
>>>> disabling it, we might not want to allow it. This would have the
>>>> advantage to guard users to accidentally switch it off.
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 2/3/17 2:03 AM, Damian Guy wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> It possibly doesn't make sense to disable it, but then i'm sure
>> someone
>>>>> will come up with a reason they don't want it!
>>>>> I'm happy to change it such that the checkpoint interval must be > 0.
>>>>>
>>>>> Cheers,
>>>>> Damian
>>>>>
>>>>> On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>>
>>>>>> Thanks Damian.
>>>>>>
>>>>>> One more question: "Checkpointing is disabled if the checkpoint
>>> interval
>>>>>> is set to a value <=0."
>>>>>>
>>>>>>
>>>>>> Does it make sense to disable check pointing? What's the tradeoff
>>> here?
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>> On 2/2/17 1:51 AM, Damian Guy wrote:
>>>>>>> Hi Matthias,
>>>>>>>
>>>>>>> Thanks for the comments.
>>>>>>>
>>>>>>> 1. TBD - i need to do some performance tests and try and work out a
>>>>>>> sensible default.
>>>>>>> 2. Yes, you are correct. It could be a multiple of the
>>>>>> commit.interval.ms.
>>>>>>> But, that would also mean if you change the commit interval - say
>> you
>>>>>> lower
>>>>>>> it, then you might also need to change the checkpoint setting (i.e,
>>> you
>>>>>>> still only want to checkpoint every n minutes).
>>>>>>>
>>>>>>> On Wed, 1 Feb 2017 at 23:46 Matthias J. Sax <matth...@confluent.io
>>>
>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the KIP Damian.
>>>>>>>>
>>>>>>>> I am wondering about two things:
>>>>>>>>
>>>>>>>> 1. what should be the default value for the new parameter?
>>>>>>>> 2. why is the new parameter provided in ms?
>>>>>>>>
>>>>>>>> About (2): because
>>>>>>>>
>>>>>>>> "the minimum checkpoint interval will be the value of
>>>>>>>> commit.interval.ms. In effect the actual checkpoint interval will
>>> be
>>>> a
>>>>>>>> multiple of the commit interval"
>>>>>>>>
>>>>>>>> it might be easier to just use an parameter that is
>>> "number-or-commit
>>>>>>>> intervals".
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/1/17 7:29 AM, Damian Guy wrote:
>>>>>>>>> Thanks for the comments Eno.
>>>>>>>>> As for exactly once, i don't believe this matters as we are just
>>>>>>>> restoring
>>>>>>>>> the change-log, i.e, the result of the aggregations that
>> previously
>>>> ran
>>>>>>>>> etc. So once initialized the state store will be in the same
>> state
>>> as
>>>>>> it
>>>>>>>>> was before.
>>>>>>>>> Having the checkpoint in a kafka topic is not ideal as the state
>> is
>>>> per
>>>>>>>>> kafka streams instance. So each instance would need to start
>> with a
>>>>>>>> unique
>>>>>>>>> id that is persistent.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>> On Wed, 1 Feb 2017 at 13:20 Eno Thereska <eno.there...@gmail.com
>>>
>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> As a follow up to my previous comment, have you thought about
>>>> writing
>>>>>>>> the
>>>>>>>>>> checkpoint to a topic instead of a local file? That would have
>> the
>>>>>>>>>> advantage that all metadata continues to be managed by Kafka, as
>>>> well
>>>>>> as
>>>>>>>>>> fit with EoS. The potential disadvantage would be a slower
>>> latency,
>>>>>>>> however
>>>>>>>>>> if it is periodic as you mention, I'm not sure that would be a
>>> show
>>>>>>>> stopper.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Eno
>>>>>>>>>>> On 1 Feb 2017, at 12:58, Eno Thereska <eno.there...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks Damian, this is a good idea and will reduce the restore
>>>> time.
>>>>>>>>>> Looking forward, with exactly once and support for transactions
>> in
>>>>>>>> Kafka, I
>>>>>>>>>> believe we'll have to add some support for rolling back
>>> checkpoints,
>>>>>>>> e.g.,
>>>>>>>>>> when a transaction is aborted. We need to be aware of that and
>>>> ideally
>>>>>>>>>> anticipate a bit those needs in the KIP.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Eno
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 1 Feb 2017, at 10:18, Damian Guy <damian....@gmail.com>
>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to start the discussion on KIP-116:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 116+-+Add+State+Store+Checkpoint+Interval+Configuration
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Damian
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

Reply via email to