Hi all, With some offline discussion with Becket and Zhipeng, we have made some modification to the API to provide a unified method for iteration on the bounded streams, and also allow users to construct the iteration with mixed types of operator lifecycle (namely whether we re-create users' operators for each round). We have updated the FLIP to reflect the change.
If we do not have other concerns, we would start the vote tomorrow. Very thanks! Best, Yun ------------------------------------------------------------------ From:Yun Gao <yungao...@aliyun.com> Send Time:2021 Oct. 5 (Tue.) 14:27 To:"David Morávek" <d...@apache.org>; dev <dev@flink.apache.org> Subject:Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML) Hi David, Very thanks for the feedback and glad to see that we have the same opinions on a lot of points of the iteration! :) And for the checkpoint, basically to support the checkpoint for a job with feedback edges, we need to also include the records on the feedback edges into the checkpoint snapshot, as described in [1]. We do this by exploiting the reference count mechanism provided by the raw states so that the asynchronous phase would wait until we finish writing all the feedback records into the raw states, which is also similar to the implementation in the statefun. Including the feedback records into snapshot is enough for the unbounded iteration, but for the bounded iteration, we would also need 1. Checkpoint after tasks finished: since for an iteration job with bounded inputs, most time of the execution is spent after all the sources are finished and the iteration body is executing, we would need to support checkpoints during this period. Fortunately in 1.14 we have implemented the first version of this functionality. 2. Keep the notification of round increment exactly-once: for bounded iteration we would notify the round end for each operator via onEpochWatermarkIncrement(), this is done by insert epoch watermarks at the end of each round. We would like to keep the notification of onEpochWatermarkIncrement() exactly-once to simplify the algorithms' development. This is done by ensuring that the epoch watermarks with the same epoch value and the barriers of the same checkpoint always have the same order when transmitting in the iteration body. With this condition, after failover all the operators inside the iteration body must have received the same amount of notifications, and we could start with the next one. Also since the epoch watermarks might also be snapshot in the feedback edge snapshot, we disable the rescaling of the head / tail operators for the bounded iteration. Best, Yun [1] https://arxiv.org/abs/1506.08603 ------------------------------------------------------------------ From:David Morávek <d...@apache.org> Send Time:2021 Oct. 4 (Mon.) 14:05 To:dev <dev@flink.apache.org>; Yun Gao <yungao...@aliyun.com> Subject:Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML) Hi Yun, I did a quick pass over the design doc and it addresses all of the problems with the current iterations I'm aware of. It's great to see that you've been able to workaround the need of vectorized watermarks by giving up nested iterations (which IMO is more of an academic concept than something with a solid use-case). I'll try to give it some more thoughts, but from a first pass it looks great +1 ;) One thing that I'm unsure about, how do you plan to implement exactly-once checkpointing of the feedback edge? Best, D. On Mon, Oct 4, 2021 at 4:42 AM Yun Gao <yungao...@aliyun.com.invalid> wrote: > Hi all, > > If we do not have other concerns on this FLIP, we would like to start the > voting by the end of oct 8th. > > Best, > Yun. > > > ------------------------------------------------------------------ > From:Yun Gao <yungao...@aliyun.com> > Send Time:2021 Sep. 15 (Wed.) 20:47 > To:dev <dev@flink.apache.org> > Subject:[DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink > ML) > > > Hi all, > > DongLin, ZhiPeng and I are opening this thread to propose designing and > implementing a new iteration library inside the Flink-ML project, as > described in > FLIP-176[1]. > > Iteration serves as a fundamental functionality required to support the > implementation > of ML algorithms. Previously Flink supports bounded iteration on top of > the > DataSet API and unbounded iteration on top of the DataStream API. However, > since we are going to deprecated the dataset API and the current unbounded > iteration > API on top of the DataStream API is not fully complete, thus we are > proposing > to add the new unified iteration library on top of DataStream API to > support both > unbounded and bounded iterations. > > Very thanks for your feedbacks! > > [1] https://cwiki.apache.org/confluence/x/hAEBCw > > Best,Yun ------------------------------------------------------------------ From:David Morávek <d...@apache.org> Send Time:2021 Oct. 4 (Mon.) 14:05 To:dev <dev@flink.apache.org>; Yun Gao <yungao...@aliyun.com> Subject:Re: [DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink ML) Hi Yun, I did a quick pass over the design doc and it addresses all of the problems with the current iterations I'm aware of. It's great to see that you've been able to workaround the need of vectorized watermarks by giving up nested iterations (which IMO is more of an academic concept than something with a solid use-case). I'll try to give it some more thoughts, but from a first pass it looks great +1 ;) One thing that I'm unsure about, how do you plan to implement exactly-once checkpointing of the feedback edge? Best, D. On Mon, Oct 4, 2021 at 4:42 AM Yun Gao <yungao...@aliyun.com.invalid> wrote: > Hi all, > > If we do not have other concerns on this FLIP, we would like to start the > voting by the end of oct 8th. > > Best, > Yun. > > > ------------------------------------------------------------------ > From:Yun Gao <yungao...@aliyun.com> > Send Time:2021 Sep. 15 (Wed.) 20:47 > To:dev <dev@flink.apache.org> > Subject:[DISCUSS] FLIP-176: Unified Iteration to Support Algorithms (Flink > ML) > > > Hi all, > > DongLin, ZhiPeng and I are opening this thread to propose designing and > implementing a new iteration library inside the Flink-ML project, as > described in > FLIP-176[1]. > > Iteration serves as a fundamental functionality required to support the > implementation > of ML algorithms. Previously Flink supports bounded iteration on top of > the > DataSet API and unbounded iteration on top of the DataStream API. However, > since we are going to deprecated the dataset API and the current unbounded > iteration > API on top of the DataStream API is not fully complete, thus we are > proposing > to add the new unified iteration library on top of DataStream API to > support both > unbounded and bounded iterations. > > Very thanks for your feedbacks! > > [1] https://cwiki.apache.org/confluence/x/hAEBCw > > Best,Yun