Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-15 Thread Damian Guy
Ok. lets close this KIP off then as it isn't needed at the moment. We can revive later if needed. On Tue, 14 Feb 2017 at 04:16 Eno Thereska wrote: > Even if users commit on every record, the expensive part will not be the > checkpointing proposed in this KIP, but the rest of the commit. > > Eno

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-14 Thread Eno Thereska
Even if users commit on every record, the expensive part will not be the checkpointing proposed in this KIP, but the rest of the commit. Eno > On 13 Feb 2017, at 23:46, Guozhang Wang wrote: > > I think I'm OK to always enable checkpointing, but I'm not sure if we want > to checkpoint on every

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-13 Thread Guozhang Wang
I think I'm OK to always enable checkpointing, but I'm not sure if we want to checkpoint on every commit. Since in the extreme case users can commit on completed processing each record. So I think it is still valuable to have a checkpoint internal config in this KIP, which can be ignored if EOS is

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Damian Guy
I'm fine with that. Gouzhang? On Fri, 10 Feb 2017 at 19:45, Matthias J. Sax wrote: > I am actually supporting Eno's view: checkpoint on every commit. > > @Dhwani: I understand your view and did raise the same question about > performance trade-off with checkpoiting enabled/disabled etc. However,

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Matthias J. Sax
I am actually supporting Eno's view: checkpoint on every commit. @Dhwani: I understand your view and did raise the same question about performance trade-off with checkpoiting enabled/disabled etc. However, it seems that writing the checkpoint file is super cheap -- thus, there is nothing to gain p

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Damian Guy
Gouzhang, Thanks for the clarification. Understood. Eno, you are correct if we just used commit interval then we wouldn't need a KIP. But, then we'd have no way of turning it off. On Fri, 10 Feb 2017 at 17:14 Eno Thereska wrote: > A quick check: the checkpoint file is not new, we're just exposi

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Eno Thereska
A quick check: the checkpoint file is not new, we're just exposing a knob on when to set it, right? Would turning if off still do what it does today (i.e., write the checkpoint at the end when the user quits?) So it's not a new feature as such, I was only recommending we dial up the frequency by

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Guozhang Wang
Damian, I was thinking if it is a new failure scenarios but as Eno pointed out it was not. Another thing I was considering is if it has any impact for incorporating KIP-98 to avoid duplicates: if there is a failure in the middle of a transaction, then upon recovery we cannot rely on the local sta

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Eno Thereska
The overhead of writing to the checkpoint file should be much, much smaller than the overall overhead of doing a commit, so I think tuning the commit time is sufficient to guide performance tradeoffs. Eno > On 10 Feb 2017, at 13:08, Dhwani Katagade > wrote: > > May be for fine tuning the pe

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Dhwani Katagade
May be for fine tuning the performance. Say we don't need the checkpointing and would like to gain the lil bit of performance improvement by turning it off. The trade off is between giving people control knobs vs complicating the complete set of knobs. -dk On Friday 10 February 2017 04:05 PM,

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Eno Thereska
I can't see why users would care to turn it off. Eno > On 10 Feb 2017, at 10:29, Damian Guy wrote: > > Hi Eno, > > Sounds good to me. The only reason i can think of is if we want to be able > to turn it off. > Gouzhang - thoughts? > > On Fri, 10 Feb 2017 at 10:28 Eno Thereska wrote: > >> Qu

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Damian Guy
Hi Eno, Sounds good to me. The only reason i can think of is if we want to be able to turn it off. Gouzhang - thoughts? On Fri, 10 Feb 2017 at 10:28 Eno Thereska wrote: > Question: if checkpointing is so cheap why not do it every commit > interval? That way we can get rid of this extra config v

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Eno Thereska
Question: if checkpointing is so cheap why not do it every commit interval? That way we can get rid of this extra config variable and just use the existing commit interval. Less tuning knobs. Eno > On 10 Feb 2017, at 09:27, Damian Guy wrote: > > Gouzhang, > > You've confused me. The failure

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-10 Thread Damian Guy
Gouzhang, You've confused me. The failure scenarios you have described are the same as they are today. With the checkpoint files in place less data will be replayed, so there will be fewer duplicates. Are you saying you'd like the option to turn checkpointing off? Thanks, Damian On Thu, 9 Feb 2

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Guozhang Wang
Eno, You are right, it is not a new scenario. Thinking a bit more on how we could incorporate KIP-98 in Streams, I feel that if EOS is turned on inside Streams, then we probably cannot always resume from the checkpointed offsets as it is not guaranteed to be "consistent"; but since EOS may not be

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Eno Thereska
Hi Guozhang, It seems to me we have the same semantics today. Are you saying there is a new failure scenario? Thanks, Eno > On 9 Feb 2017, at 19:42, Guozhang Wang wrote: > > More specifically, here is my reasoning of failure cases, and would like to > get your feedbacks: > > *StreamTask* >

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Guozhang Wang
More specifically, here is my reasoning of failure cases, and would like to get your feedbacks: *StreamTask* For stream-task, the committing order is 1) flush state (may send more records to changelog in producer), 2) flush producer, 3) commit upstream offsets. My understanding is that the writin

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Guozhang Wang
A quick question re: `We will add the above config parameter to *StreamsConfig*. During *StreamTask#commit()*, *StandbyTask#commit()*, and *GlobalUpdateStateTask#flushState()* we will check if the checkpoint interval has elapsed and write the checkpoint file.` Will the writing of the checkpoint fi

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Matthias J. Sax
But 5 min means, that we (in the worst case) need to reply data from the last 5 minutes to get the store ready. So why not go with the min possible value of 30 seconds to speed up this process if the impact is negligible anyway? What do you gain by being conservative? -Matthias On 2/9/17 2:54

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Damian Guy
Why shouldn't it be 5 minutes? ;-) It is a finger in the air number. Based on the testing i did it shows that there isn't much, if any, overhead when checkpointing a single store on the commit interval. The default commit interval is 30 seconds, so it could possibly be set to that. However, i'd pre

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Michael Noll
Damian, could you elaborate briefly why the default value should be 5 minutes? What are the considerations, assumptions, etc. that go into picking this value? Right now, in the KIP and in this discussion, "5 mins" looks like a magic number to me. :-) -Michael On Thu, Feb 9, 2017 at 11:03 AM,

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-09 Thread Damian Guy
I've ran the SimpleBenchmark with checkpoint on and off to see what the impact is. It appears that there is very little impact, if any. The numbers with checkpointing on actually look better, but that is likely largely due to external influences. In any case, i'm going to suggest we go with a defa

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-08 Thread Damian Guy
Matthias, Fair point. I'll update it the KIP. Thanks On Wed, 8 Feb 2017 at 05:49 Matthias J. Sax wrote: > Damian, > > I am not strict about it either. However, if there is no advantage in > disabling it, we might not want to allow it. This would have the > advantage to guard users to accidental

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-07 Thread Matthias J. Sax
Damian, I am not strict about it either. However, if there is no advantage in disabling it, we might not want to allow it. This would have the advantage to guard users to accidentally switch it off. -Matthias On 2/3/17 2:03 AM, Damian Guy wrote: > Hi Matthias, > > It possibly doesn't make sens

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-06 Thread Michael Noll
> Starting on another host requires restoring from the earliest offset. Btw, there's a special scenario where a full restore is not required: When the local storage (volume) is being re-used, e.g. when a container uses a storage mount that will be re-used by a new container in case the original o

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-06 Thread Eno Thereska
Makes sense, thanks. Eno > On 6 Feb 2017, at 15:01, Damian Guy wrote: > > Hi Eno, > > The state is on local disk, so having the checkpoint in a topic won't help. > If the host fails permanently, then all of the local state is gone. > Starting on another host requires restoring from the earliest

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-06 Thread Damian Guy
Hi Eno, The state is on local disk, so having the checkpoint in a topic won't help. If the host fails permanently, then all of the local state is gone. Starting on another host requires restoring from the earliest offset. Thanks, Damian On Mon, 6 Feb 2017 at 14:58 Eno Thereska wrote: > Hi Dami

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-06 Thread Eno Thereska
Hi Damian, I am trying to figure out if this handles a common enough failure scenario. It seems to me this handles transient failures: a server with an instance fails, then comes back up shortly and the instance recovers quickly by reading the checkpoint file. Permanent failures, where the ser

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-03 Thread Damian Guy
Hi Matthias, It possibly doesn't make sense to disable it, but then i'm sure someone will come up with a reason they don't want it! I'm happy to change it such that the checkpoint interval must be > 0. Cheers, Damian On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax wrote: > Thanks Damian. > > One m

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-02 Thread Matthias J. Sax
Thanks Damian. One more question: "Checkpointing is disabled if the checkpoint interval is set to a value <=0." Does it make sense to disable check pointing? What's the tradeoff here? -Matthias On 2/2/17 1:51 AM, Damian Guy wrote: > Hi Matthias, > > Thanks for the comments. > > 1. TBD - i

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-02 Thread Damian Guy
Hi Matthias, Thanks for the comments. 1. TBD - i need to do some performance tests and try and work out a sensible default. 2. Yes, you are correct. It could be a multiple of the commit.interval.ms. But, that would also mean if you change the commit interval - say you lower it, then you might als

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-01 Thread Matthias J. Sax
Thanks for the KIP Damian. I am wondering about two things: 1. what should be the default value for the new parameter? 2. why is the new parameter provided in ms? About (2): because "the minimum checkpoint interval will be the value of commit.interval.ms. In effect the actual checkpoint interva

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-01 Thread Damian Guy
Thanks for the comments Eno. As for exactly once, i don't believe this matters as we are just restoring the change-log, i.e, the result of the aggregations that previously ran etc. So once initialized the state store will be in the same state as it was before. Having the checkpoint in a kafka topic

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-01 Thread Eno Thereska
As a follow up to my previous comment, have you thought about writing the checkpoint to a topic instead of a local file? That would have the advantage that all metadata continues to be managed by Kafka, as well as fit with EoS. The potential disadvantage would be a slower latency, however if it

Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-01 Thread Eno Thereska
Thanks Damian, this is a good idea and will reduce the restore time. Looking forward, with exactly once and support for transactions in Kafka, I believe we'll have to add some support for rolling back checkpoints, e.g., when a transaction is aborted. We need to be aware of that and ideally antic

[DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-01 Thread Damian Guy
Hi all, I would like to start the discussion on KIP-116: https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration Thanks, Damian