Ok. lets close this KIP off then as it isn't needed at the moment. We can
revive later if needed.
On Tue, 14 Feb 2017 at 04:16 Eno Thereska wrote:
> Even if users commit on every record, the expensive part will not be the
> checkpointing proposed in this KIP, but the rest of the commit.
>
> Eno
Even if users commit on every record, the expensive part will not be the
checkpointing proposed in this KIP, but the rest of the commit.
Eno
> On 13 Feb 2017, at 23:46, Guozhang Wang wrote:
>
> I think I'm OK to always enable checkpointing, but I'm not sure if we want
> to checkpoint on every
I think I'm OK to always enable checkpointing, but I'm not sure if we want
to checkpoint on every commit. Since in the extreme case users can commit
on completed processing each record. So I think it is still valuable to
have a checkpoint internal config in this KIP, which can be ignored if EOS
is
I'm fine with that. Gouzhang?
On Fri, 10 Feb 2017 at 19:45, Matthias J. Sax wrote:
> I am actually supporting Eno's view: checkpoint on every commit.
>
> @Dhwani: I understand your view and did raise the same question about
> performance trade-off with checkpoiting enabled/disabled etc. However,
I am actually supporting Eno's view: checkpoint on every commit.
@Dhwani: I understand your view and did raise the same question about
performance trade-off with checkpoiting enabled/disabled etc. However,
it seems that writing the checkpoint file is super cheap -- thus, there
is nothing to gain p
Gouzhang, Thanks for the clarification. Understood.
Eno, you are correct if we just used commit interval then we wouldn't need
a KIP. But, then we'd have no way of turning it off.
On Fri, 10 Feb 2017 at 17:14 Eno Thereska wrote:
> A quick check: the checkpoint file is not new, we're just exposi
A quick check: the checkpoint file is not new, we're just exposing a knob on
when to set it, right? Would turning if off still do what it does today (i.e.,
write the checkpoint at the end when the user quits?) So it's not a new feature
as such, I was only recommending we dial up the frequency by
Damian,
I was thinking if it is a new failure scenarios but as Eno pointed out it
was not.
Another thing I was considering is if it has any impact for incorporating
KIP-98 to avoid duplicates: if there is a failure in the middle of a
transaction, then upon recovery we cannot rely on the local sta
The overhead of writing to the checkpoint file should be much, much smaller
than the overall overhead of doing a commit, so I think tuning the commit time
is sufficient to guide performance tradeoffs.
Eno
> On 10 Feb 2017, at 13:08, Dhwani Katagade
> wrote:
>
> May be for fine tuning the pe
May be for fine tuning the performance.
Say we don't need the checkpointing and would like to gain the lil bit
of performance improvement by turning it off.
The trade off is between giving people control knobs vs complicating the
complete set of knobs.
-dk
On Friday 10 February 2017 04:05 PM,
I can't see why users would care to turn it off.
Eno
> On 10 Feb 2017, at 10:29, Damian Guy wrote:
>
> Hi Eno,
>
> Sounds good to me. The only reason i can think of is if we want to be able
> to turn it off.
> Gouzhang - thoughts?
>
> On Fri, 10 Feb 2017 at 10:28 Eno Thereska wrote:
>
>> Qu
Hi Eno,
Sounds good to me. The only reason i can think of is if we want to be able
to turn it off.
Gouzhang - thoughts?
On Fri, 10 Feb 2017 at 10:28 Eno Thereska wrote:
> Question: if checkpointing is so cheap why not do it every commit
> interval? That way we can get rid of this extra config v
Question: if checkpointing is so cheap why not do it every commit interval?
That way we can get rid of this extra config variable and just use the existing
commit interval.
Less tuning knobs.
Eno
> On 10 Feb 2017, at 09:27, Damian Guy wrote:
>
> Gouzhang,
>
> You've confused me. The failure
Gouzhang,
You've confused me. The failure scenarios you have described are the same
as they are today. With the checkpoint files in place less data will be
replayed, so there will be fewer duplicates.
Are you saying you'd like the option to turn checkpointing off?
Thanks,
Damian
On Thu, 9 Feb 2
Eno,
You are right, it is not a new scenario.
Thinking a bit more on how we could incorporate KIP-98 in Streams, I feel
that if EOS is turned on inside Streams, then we probably cannot always
resume from the checkpointed offsets as it is not guaranteed to be
"consistent"; but since EOS may not be
Hi Guozhang,
It seems to me we have the same semantics today. Are you saying there is a new
failure scenario?
Thanks,
Eno
> On 9 Feb 2017, at 19:42, Guozhang Wang wrote:
>
> More specifically, here is my reasoning of failure cases, and would like to
> get your feedbacks:
>
> *StreamTask*
>
More specifically, here is my reasoning of failure cases, and would like to
get your feedbacks:
*StreamTask*
For stream-task, the committing order is 1) flush state (may send more
records to changelog in producer), 2) flush producer, 3) commit upstream
offsets. My understanding is that the writin
A quick question re: `We will add the above config parameter to
*StreamsConfig*. During *StreamTask#commit()*, *StandbyTask#commit()*, and
*GlobalUpdateStateTask#flushState()* we will check if the checkpoint
interval has elapsed and write the checkpoint file.`
Will the writing of the checkpoint fi
But 5 min means, that we (in the worst case) need to reply data from the
last 5 minutes to get the store ready.
So why not go with the min possible value of 30 seconds to speed up this
process if the impact is negligible anyway?
What do you gain by being conservative?
-Matthias
On 2/9/17 2:54
Why shouldn't it be 5 minutes? ;-)
It is a finger in the air number. Based on the testing i did it shows that
there isn't much, if any, overhead when checkpointing a single store on the
commit interval. The default commit interval is 30 seconds, so it could
possibly be set to that. However, i'd pre
Damian,
could you elaborate briefly why the default value should be 5 minutes?
What are the considerations, assumptions, etc. that go into picking this
value?
Right now, in the KIP and in this discussion, "5 mins" looks like a magic
number to me. :-)
-Michael
On Thu, Feb 9, 2017 at 11:03 AM,
I've ran the SimpleBenchmark with checkpoint on and off to see what the
impact is. It appears that there is very little impact, if any. The numbers
with checkpointing on actually look better, but that is likely largely due
to external influences.
In any case, i'm going to suggest we go with a defa
Matthias,
Fair point. I'll update it the KIP.
Thanks
On Wed, 8 Feb 2017 at 05:49 Matthias J. Sax wrote:
> Damian,
>
> I am not strict about it either. However, if there is no advantage in
> disabling it, we might not want to allow it. This would have the
> advantage to guard users to accidental
Damian,
I am not strict about it either. However, if there is no advantage in
disabling it, we might not want to allow it. This would have the
advantage to guard users to accidentally switch it off.
-Matthias
On 2/3/17 2:03 AM, Damian Guy wrote:
> Hi Matthias,
>
> It possibly doesn't make sens
> Starting on another host requires restoring from the earliest offset.
Btw, there's a special scenario where a full restore is not required: When
the local storage (volume) is being re-used, e.g. when a container uses a
storage mount that will be re-used by a new container in case the original
o
Makes sense, thanks.
Eno
> On 6 Feb 2017, at 15:01, Damian Guy wrote:
>
> Hi Eno,
>
> The state is on local disk, so having the checkpoint in a topic won't help.
> If the host fails permanently, then all of the local state is gone.
> Starting on another host requires restoring from the earliest
Hi Eno,
The state is on local disk, so having the checkpoint in a topic won't help.
If the host fails permanently, then all of the local state is gone.
Starting on another host requires restoring from the earliest offset.
Thanks,
Damian
On Mon, 6 Feb 2017 at 14:58 Eno Thereska wrote:
> Hi Dami
Hi Damian,
I am trying to figure out if this handles a common enough failure scenario. It
seems to me this handles transient failures: a server with an instance fails,
then comes back up shortly and the instance recovers quickly by reading the
checkpoint file.
Permanent failures, where the ser
Hi Matthias,
It possibly doesn't make sense to disable it, but then i'm sure someone
will come up with a reason they don't want it!
I'm happy to change it such that the checkpoint interval must be > 0.
Cheers,
Damian
On Fri, 3 Feb 2017 at 01:29 Matthias J. Sax wrote:
> Thanks Damian.
>
> One m
Thanks Damian.
One more question: "Checkpointing is disabled if the checkpoint interval
is set to a value <=0."
Does it make sense to disable check pointing? What's the tradeoff here?
-Matthias
On 2/2/17 1:51 AM, Damian Guy wrote:
> Hi Matthias,
>
> Thanks for the comments.
>
> 1. TBD - i
Hi Matthias,
Thanks for the comments.
1. TBD - i need to do some performance tests and try and work out a
sensible default.
2. Yes, you are correct. It could be a multiple of the commit.interval.ms.
But, that would also mean if you change the commit interval - say you lower
it, then you might als
Thanks for the KIP Damian.
I am wondering about two things:
1. what should be the default value for the new parameter?
2. why is the new parameter provided in ms?
About (2): because
"the minimum checkpoint interval will be the value of
commit.interval.ms. In effect the actual checkpoint interva
Thanks for the comments Eno.
As for exactly once, i don't believe this matters as we are just restoring
the change-log, i.e, the result of the aggregations that previously ran
etc. So once initialized the state store will be in the same state as it
was before.
Having the checkpoint in a kafka topic
As a follow up to my previous comment, have you thought about writing the
checkpoint to a topic instead of a local file? That would have the advantage
that all metadata continues to be managed by Kafka, as well as fit with EoS.
The potential disadvantage would be a slower latency, however if it
Thanks Damian, this is a good idea and will reduce the restore time. Looking
forward, with exactly once and support for transactions in Kafka, I believe
we'll have to add some support for rolling back checkpoints, e.g., when a
transaction is aborted. We need to be aware of that and ideally antic
Hi all,
I would like to start the discussion on KIP-116:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-116+-+Add+State+Store+Checkpoint+Interval+Configuration
Thanks,
Damian
36 matches
Mail list logo