erTasksFinished-TriggeringCheckpointsAfterTasksFinished
--
From:Yun Gao
Send Time:2021 Jan. 13 (Wed.) 16:09
To:dev ; user
Subject:Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
Hi all,
I updated the FLIP[1] to reflect the
eteBeforeFinish
--
From:Yun Gao
Send Time:2021 Jan. 12 (Tue.) 10:30
To:Khachatryan Roman
Cc:dev ; user
Subject:Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
Hi Roman,
Very thanks for th
see explicit problems for waiting for the flush of pipeline
result partition.
Glad that we have the same viewpoints on this issue. :)
Best,
Yun
----------
From:Khachatryan Roman
Send Time:2021 Jan. 11 (Mon.) 19:14
To:Yun Gao
Cc:de
d be ok for us to view it as an optimization and
> postpone it to future versions ?
>
> Best,
> Yun
>
>
>
> --
> From:Khachatryan Roman
> Send Time:2021 Jan. 11 (Mon.) 05:46
> To:Yun
e: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished
Thanks a lot for your answers Yun,
> In detail, support we have a job with the graph A -> B -> C, support in one
> checkpoint A has reported FINISHED, CheckpointCoordinator would
> choose B as the new "source"
tChannel.onBuffer() is called
> with
> EndOfPartition) and then taking snapshot for the input channels, as the
> normal unaligned checkpoints does for the InputChannel side. Then
> we would be able to ensure the finished tasks always have an empty state.
>
> I'll also optimi
Hi Roman,
Very thanks for the feedbacks! I'll try to answer the issues inline:
> 1. Option 1 is said to be not preferable because it wastes resources and adds
> complexity (new event).
> However, the resources would be wasted for a relatively short time until the
> job finishes completely.
Thanks for starting this discussion (and sorry for probably duplicated
questions, I couldn't find them answered in FLIP or this thread).
1. Option 1 is said to be not preferable because it wastes resources and
adds complexity (new event).
However, the resources would be wasted for a relatively sho
>
> We could introduce an interface, sth like `RequiresFinalization` or
> `FinalizationListener` (all bad names). The operator itself knows when
> it is ready to completely shut down, Async I/O would wait for all
> requests, sink would potentially wait for a given number of checkpoints.
> The inter
This is somewhat unrelated to the discussion about how to actually do
the triggering when sources shut down, I'll write on that separately. I
just wanted to get this quick thought out.
For letting operators decide whether they actually want to wait for a
final checkpoint, which is relevant at
Hi Avrid,
Very thanks for the feedbacks!
For the second issue, sorry I think I might not make it very clear,
I'm initially thinking the case that for example for a job with graph A -> B ->
C, when we compute which tasks to trigger, A is still running, so we trigger A
to
Hi Yun,
1. I'd think that this is an orthogonal issue, which I'd solve separately.
My gut feeling says that this is something we should only address for new
sinks where we decouple the semantics of commits and checkpoints
anyways. @Aljoscha
Krettek any idea on this one?
2. I'm not sure I get it
Hi all,
I tested the previous PoC with the current tests and I found some new
issues that might cause divergence, and sorry for there might also be some
reversal for some previous problems:
1. Which operators should wait for one more checkpoint before close ?
One motiv
Hi Aljoscha,
Very thanks for the feedbacks! For the remaining issues:
> 1. You mean we would insert "artificial" barriers for barrier 2 in case
we receive EndOfPartition while other inputs have already received barrier 2?
I think that makes sense, yes.
Yes, exactly, I
Thanks for the thorough update! I'll answer inline.
On 14.12.20 16:33, Yun Gao wrote:
1. To include EndOfPartition into consideration for barrier alignment at
the TM side, we now tend to decouple the logic for EndOfPartition with the
normal alignment behaviors to avoid the complex interfe
Hi all,
I would like to resume this discussion for supporting checkpoints after
tasks Finished :) Based on the previous discussion, we now implement a version
of PoC [1] to try the idea. During the PoC we also met with some possible
issues:
1. To include EndOfPartition into considerat
Hi Till,
Very thanks for the feedbacks !
> 1) When restarting all tasks independent of the status at checkpoint time
> (finished, running, scheduled), we might allocate more resources than we
> actually need to run the remaining job. From a scheduling perspective it
> would be easier if we alrea
Thanks for starting this discussion Yun Gao,
I have three comments/questions:
1) When restarting all tasks independent of the status at checkpoint time
(finished, running, scheduled), we might allocate more resources than we
actually need to run the remaining job. From a scheduling perspective it
Hi Arvid,
Very thanks for the comments!
>>> 4) Yes, the interaction is not trivial and also I have not completely
>>> thought it through. But in general, I'm currently at the point where I
>>> think that we also need non-checkpoint related events in unaligned
>>> checkpoints. So just keep that i
Hi Yun,
4) Yes, the interaction is not trivial and also I have not completely
thought it through. But in general, I'm currently at the point where I
think that we also need non-checkpoint related events in unaligned
checkpoints. So just keep that in mind, that we might converge anyhow at
this poin
Hi Arvid,
Very thanks for the insightful comments! I added the responses for this issue
under the quota:
>> 1) You call the tasks that get the barriers injected leaf nodes, which would
>> make the > sinks the root nodes. That is very similar to how graphs in
>> relational algebra are labeled. H
Hi Yun,
Thank you for starting the discussion. This will solve one of the
long-standing issues [1] that confuse users. I'm also a big fan of option
3. It is also a bit closer to Chandy-Lamport again.
A couple of comments:
1) You call the tasks that get the barriers injected leaf nodes, which
wou
Hi, devs & users
Very sorry for the spoiled formats, I resent the discussion as follows.
As discussed in FLIP-131[1], Flink will make DataStream the unified API for
processing bounded and unbounded data in both streaming and blocking modes.
However, one long-standing problem for the streaming m
23 matches
Mail list logo