Hi Till,

Thanks a lot for the review and the comments!

1) First, I think we could indeed do the computation outside of the main
thread. The current implementation keeps it in the main thread mainly because
the computation is in general fast and we initially wanted a simplified
first version.

The main requirement here is to have a consistent view of the state of the
tasks. Otherwise, suppose we have A -> B: if A is running when we check
whether A needs to be triggered, we will mark A as to-be-triggered, but if A
has finished by the time we check B, we will also mark B as to-be-triggered.
B would then receive both the RPC trigger and the checkpoint barrier, which
would break some assumptions on the task side and complicate the
implementation.

But to cope with this issue, we could in fact first take a snapshot of the
tasks' states and then do the computation on that snapshot; neither of the
two steps needs to run in the main thread. A minimal sketch of this two-step
approach follows.
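To make the two steps concrete, roughly something like the sketch below;
the class, type, and helper names here are hypothetical illustrations (and
the trigger decision is deliberately simplified), not the actual FLIP-147
code:

    import java.util.*;

    // Hypothetical sketch: freeze the task states first, then compute the
    // trigger decision from the frozen copy, so no task can flip from
    // RUNNING to FINISHED halfway through the computation.
    class TriggerPlanSketch {
        enum TaskState { CREATED, RUNNING, FINISHED }

        // Step 1: copy the live states in one pass (the consistent view).
        static Map<String, TaskState> snapshotStates(Map<String, TaskState> live) {
            return new HashMap<>(live);
        }

        // Step 2: compute the plan purely from the copy. (Trivially "all
        // running tasks" here; the real logic also considers predecessors.)
        static Set<String> tasksToTrigger(Map<String, TaskState> snapshot) {
            Set<String> toTrigger = new HashSet<>();
            for (Map.Entry<String, TaskState> e : snapshot.entrySet()) {
                if (e.getValue() == TaskState.RUNNING) {
                    toTrigger.add(e.getKey());
                }
            }
            return toTrigger;
        }
    }

Both calls could then run on, e.g., an I/O executor via
CompletableFuture.supplyAsync instead of occupying the main thread.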

2) For the computation logic, we currently benefit a lot from shortcuts on
all-to-all edges and on job vertices whose tasks are all running: these
shortcuts do checks at the job vertex level first and can skip some job
vertices as a whole. With this optimization we have an O(V) algorithm (V
being the number of job vertices), and the current worst-case running time
for a job with 320,000 tasks is less than 100 ms. For typical graph sizes
the time would be reduced further, roughly linearly, as illustrated by the
sketch below.
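Roughly, the shortcut works like the following sketch; the names are
illustrative only, not the actual DefaultCheckpointPlanCalculator code:

    import java.util.*;

    // Hypothetical sketch of why the shortcut keeps the loop O(V): a vertex
    // that passes the vertex-level check is handled in O(1) as a whole, and
    // only the remaining vertices pay a per-task cost.
    class ShortcutSketch {
        static class VertexSummary {
            boolean allTasksRunning;
            boolean allInputsAllToAll;
            List<String> taskIds = new ArrayList<>();
        }

        static List<String> tasksNeedingPerTaskChecks(List<VertexSummary> vertices) {
            List<String> perTask = new ArrayList<>();
            for (VertexSummary v : vertices) {
                if (v.allTasksRunning && v.allInputsAllToAll) {
                    continue; // the whole job vertex is decided at once
                }
                perTask.addAll(v.taskIds); // fall back to per-task checks
            }
            return perTask;
        }
    }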

If we did the computation incrementally based on the last set of triggered
tasks, we could not easily encode this information into the job-vertex-level
shortcuts. And since the time seems short, perhaps it is enough to recompute
from scratch, considering the tradeoff between performance and complexity?

3) We are going to emit the EndOfInput event exactly after the finish()
method and before the last snapshotState() method, so that we can shut down
the whole topology with a single final checkpoint. Sorry for not including
enough details on this part; I'll complement the FLIP with the details of
the final checkpoint / savepoint process. The intended ordering would look
roughly like the sketch below.
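For reference, the ordering on the task side would be along these lines; the
interfaces are hypothetical stand-ins for the operator lifecycle, only to
make the sequence explicit:

    // Hypothetical sketch: finish() flushes the remaining records, the
    // EndOfInput event then tells downstream that no more input follows,
    // and snapshotState() takes part in the single final checkpoint.
    interface Operator {
        void finish() throws Exception;
        void snapshotState(long checkpointId) throws Exception;
    }

    interface Output {
        void emitEndOfInput();
    }

    class FinalCheckpointSketch {
        static void shutDown(Operator op, Output out, long checkpointId)
                throws Exception {
            op.finish();                    // 1. flush remaining output
            out.emitEndOfInput();           // 2. notify downstream tasks
            op.snapshotState(checkpointId); // 3. the final checkpoint
        }
    }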

Best,
Yun



------------------------------------------------------------------
From:Till Rohrmann <trohrm...@apache.org>
Send Time:2021 Jul. 14 (Wed.) 22:05
To:dev <dev@flink.apache.org>
Subject:Re: [RESULT][VOTE] FLIP-147: Support Checkpoint After Tasks Finished

Hi everyone,

I am a bit late to the voting party but let me ask three questions:

1) Why do we execute the trigger plan computation in the main thread if we
cannot guarantee that all tasks are still running when triggering the
checkpoint? Couldn't we do the computation in a different thread in order
to relieve the main thread a bit?

2) The implementation of the DefaultCheckpointPlanCalculator seems to go
over the whole topology for every calculation. Wouldn't it be more
efficient to maintain the set of current tasks to trigger and check whether
anything has changed and if so check the succeeding tasks until we have
found the current checkpoint trigger frontier?

3) When are we going to send the endOfInput events to a downstream task? If
this happens after we call finish on the upstream operator but before
snapshotState then it would be possible to shut down the whole topology
with a single final checkpoint. I think this part could benefit from a bit
more detailed description in the FLIP.

Cheers,
Till

On Fri, Jul 2, 2021 at 8:36 AM Yun Gao <yungao...@aliyun.com.invalid> wrote:

> Hi there,
>
> Since the voting time of FLIP-147[1] has passed, I'm closing the vote now.
>
> There were seven +1 votes ( 6 / 7 are bindings) and no -1 votes:
>
> - Dawid Wysakowicz (binding)
> - Piotr Nowojski (binding)
> - Jiangang Liu (binding)
> - Arvid Heise (binding)
> - Jing Zhang (binding)
> - Leonard Xu (non-binding)
> - Guowei Ma (binding)
>
> Thus I'm happy to announce that the update to the FLIP-147 is accepted.
>
> Very thanks everyone!
>
> Best,
> Yun
>
> [1]  https://cwiki.apache.org/confluence/x/mw-ZCQ
