Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-14 Thread Yun Gao
erTasksFinished-TriggeringCheckpointsAfterTasksFinished -- From:Yun Gao Send Time:2021 Jan. 13 (Wed.) 16:09 To:dev ; user Subject:Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished Hi all, I updated the FLIP[1] to reflect the

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-13 Thread Yun Gao
eteBeforeFinish -- From:Yun Gao Send Time:2021 Jan. 12 (Tue.) 10:30 To:Khachatryan Roman Cc:dev ; user Subject:Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished Hi Roman, Very thanks for th

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-11 Thread Yun Gao
see explicit problems for waiting for the flush of pipeline result partition. Glad that we have the same viewpoints on this issue. :) Best, Yun ---------- From:Khachatryan Roman Send Time:2021 Jan. 11 (Mon.) 19:14 To:Yun Gao Cc:de

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-11 Thread Khachatryan Roman
d be ok for us to view it as an optimization and > postpone it to future versions ? > > Best, > Yun > > > > -- > From:Khachatryan Roman > Send Time:2021 Jan. 11 (Mon.) 05:46 > To:Yun

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-11 Thread Yun Gao
e: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished Thanks a lot for your answers Yun, > In detail, support we have a job with the graph A -> B -> C, support in one > checkpoint A has reported FINISHED, CheckpointCoordinator would > choose B as the new "source"

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-10 Thread Khachatryan Roman
tChannel.onBuffer() is called > with > EndOfPartition) and then taking snapshot for the input channels, as the > normal unaligned checkpoints does for the InputChannel side. Then > we would be able to ensure the finished tasks always have an empty state. > > I'll also optimi

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-07 Thread Yun Gao
Hi Roman, Very thanks for the feedbacks! I'll try to answer the issues inline: > 1. Option 1 is said to be not preferable because it wastes resources and adds > complexity (new event). > However, the resources would be wasted for a relatively short time until the > job finishes completely.

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-07 Thread Khachatryan Roman
Thanks for starting this discussion (and sorry for probably duplicated questions, I couldn't find them answered in FLIP or this thread). 1. Option 1 is said to be not preferable because it wastes resources and adds complexity (new event). However, the resources would be wasted for a relatively sho

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-07 Thread Arvid Heise
> > We could introduce an interface, sth like `RequiresFinalization` or > `FinalizationListener` (all bad names). The operator itself knows when > it is ready to completely shut down, Async I/O would wait for all > requests, sink would potentially wait for a given number of checkpoints. > The inter

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-07 Thread Aljoscha Krettek
This is somewhat unrelated to the discussion about how to actually do the triggering when sources shut down, I'll write on that separately. I just wanted to get this quick thought out. For letting operators decide whether they actually want to wait for a final checkpoint, which is relevant at

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Yun Gao
Hi Avrid, Very thanks for the feedbacks! For the second issue, sorry I think I might not make it very clear, I'm initially thinking the case that for example for a job with graph A -> B -> C, when we compute which tasks to trigger, A is still running, so we trigger A to

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Arvid Heise
Hi Yun, 1. I'd think that this is an orthogonal issue, which I'd solve separately. My gut feeling says that this is something we should only address for new sinks where we decouple the semantics of commits and checkpoints anyways. @Aljoscha Krettek any idea on this one? 2. I'm not sure I get it

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-24 Thread Yun Gao
Hi all, I tested the previous PoC with the current tests and I found some new issues that might cause divergence, and sorry for there might also be some reversal for some previous problems: 1. Which operators should wait for one more checkpoint before close ? One motiv

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-15 Thread Yun Gao
Hi Aljoscha, Very thanks for the feedbacks! For the remaining issues: > 1. You mean we would insert "artificial" barriers for barrier 2 in case we receive EndOfPartition while other inputs have already received barrier 2? I think that makes sense, yes. Yes, exactly, I

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-15 Thread Aljoscha Krettek
Thanks for the thorough update! I'll answer inline. On 14.12.20 16:33, Yun Gao wrote: 1. To include EndOfPartition into consideration for barrier alignment at the TM side, we now tend to decouple the logic for EndOfPartition with the normal alignment behaviors to avoid the complex interfe

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-12-14 Thread Yun Gao
Hi all, I would like to resume this discussion for supporting checkpoints after tasks Finished :) Based on the previous discussion, we now implement a version of PoC [1] to try the idea. During the PoC we also met with some possible issues: 1. To include EndOfPartition into considerat

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Yun Gao
Hi Till, Very thanks for the feedbacks ! > 1) When restarting all tasks independent of the status at checkpoint time > (finished, running, scheduled), we might allocate more resources than we > actually need to run the remaining job. From a scheduling perspective it > would be easier if we alrea

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Till Rohrmann
Thanks for starting this discussion Yun Gao, I have three comments/questions: 1) When restarting all tasks independent of the status at checkpoint time (finished, running, scheduled), we might allocate more resources than we actually need to run the remaining job. From a scheduling perspective it

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Yun Gao
Hi Arvid, Very thanks for the comments! >>> 4) Yes, the interaction is not trivial and also I have not completely >>> thought it through. But in general, I'm currently at the point where I >>> think that we also need non-checkpoint related events in unaligned >>> checkpoints. So just keep that i

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-12 Thread Arvid Heise
Hi Yun, 4) Yes, the interaction is not trivial and also I have not completely thought it through. But in general, I'm currently at the point where I think that we also need non-checkpoint related events in unaligned checkpoints. So just keep that in mind, that we might converge anyhow at this poin

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-12 Thread Yun Gao
Hi Arvid, Very thanks for the insightful comments! I added the responses for this issue under the quota: >> 1) You call the tasks that get the barriers injected leaf nodes, which would >> make the > sinks the root nodes. That is very similar to how graphs in >> relational algebra are labeled. H

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-12 Thread Arvid Heise
Hi Yun, Thank you for starting the discussion. This will solve one of the long-standing issues [1] that confuse users. I'm also a big fan of option 3. It is also a bit closer to Chandy-Lamport again. A couple of comments: 1) You call the tasks that get the barriers injected leaf nodes, which wou

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-08 Thread Yun Gao
Hi, devs & users Very sorry for the spoiled formats, I resent the discussion as follows. As discussed in FLIP-131[1], Flink will make DataStream the unified API for processing bounded and unbounded data in both streaming and blocking modes. However, one long-standing problem for the streaming m