Note^2: `InputSelectable` is a `@PublicEvolving` API, so it can be used. However, as Timo pointed out, it would block checkpointing. If I remember correctly, there is a `checkState` that does not allow using `InputSelectable` with checkpointing enabled.
Piotrek

On Wed, 17 Feb 2021 at 16:46, Kezhu Wang <kez...@gmail.com> wrote:

> Hi all,
>
> Thanks Arvid and Timo for more candidates.
>
> I also think “buffering until control side ready” should be more canonical
> in the current stage of Flink.
>
> Timo has created FLINK-21392 [1] for exposing a user-friendly DataStream API
> to block one input temporarily.
>
> If one really wants to go deep down the rabbit hole as Arvid said, I have
> one approach off the top of my head.
>
> A combination of `MultipleInputStreamOperator`, `BoundedMultiInput`,
> `InputSelectable`, a FLIP-27 source and `ChainingStrategy.HEAD_WITH_SOURCES`
> should achieve the goal without interfering with checkpointing, but the
> control side must not be bounded before FLIP-147 is delivered.
>
> [1] FLINK-21392: https://issues.apache.org/jira/browse/FLINK-21392
>
> Best,
> Kezhu Wang
>
> On February 17, 2021 at 22:58:23, Arvid Heise (ar...@apache.org) wrote:
>
> Note that the question is also posted on SO [1].
>
> [1]
> https://stackoverflow.com/questions/66236004/connectedstreams-paused-until-control-stream-ready/
>
> On Wed, Feb 17, 2021 at 3:31 PM Timo Walther <twal...@apache.org> wrote:
>
>> Hi Kezhu,
>>
>> `InputSelectable` is currently not exposed in the DataStream API because
>> it might have side effects that need to be considered (e.g. do
>> checkpoints still go through?). In any case, we don't have a good story
>> for blocking a control stream yet. The best option is to buffer the
>> other stream in state until the control stream is ready. You can also
>> artificially slow down the other stream until then (e.g. by sleeping) to
>> not buffer too much state.
>>
>> I hope this helps.
>>
>> Regards,
>> Timo
>>
>>
>> On 17.02.21 14:35, Kezhu Wang wrote:
>> > A combination of `BoundedMultiInput` and `InputSelectable` could help.
>> > You could see
>> > `org.apache.flink.table.runtime.operators.join.HashJoinOperator`
>> > for a usage example. The control topic does not have to be bounded.
>> >
>> > There may be other approaches in later responses. I could not tell
>> > whether it is canonical or not.
>> >
>> > Best,
>> > Kezhu Wang
>> >
>> > On February 17, 2021 at 13:03:42, Salva Alcántara
>> > (salcantara...@gmail.com) wrote:
>> >
>> >> What is the canonical way to accomplish this:
>> >>
>> >> >Given a ConnectedStreams, e.g., a CoFlatMap UDF, how to prevent any
>> >> >processing of the data stream until the control stream is "ready", so
>> >> >to speak
>> >>
>> >> My particular use case is as follows: I have a CoFlatMap function. The
>> >> data stream contains elements that need to be enriched with additional
>> >> information (they come with some fields empty). The missing
>> >> information is taken from the control stream, whose elements come
>> >> through a Kafka source. Essentially, what I want is to pause any
>> >> processing until having read the full (control) topic, otherwise (at
>> >> least initially) the output elements will not be enriched as expected.
>> >>
>> >> --
>> >> Sent from:
>> >> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
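
For readers who want to see the "buffer the other stream until the control stream is ready" option from Timo's reply in concrete form, here is a minimal sketch. It is deliberately Flink-free so it can be run on its own: the plain `Map`, `List` and `boolean` fields stand in for what would presumably be `MapState`, `ListState` and `ValueState` inside a `KeyedCoProcessFunction` in a real job. The class name, the method names and the explicit `onControlReady()` signal are all hypothetical; the thread does not say how "ready" would be detected for the Kafka control topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Flink-free simulation of "buffer the data side until the control side is ready". */
class BufferUntilReady {
    // Stand-in for MapState<String, String>: enrichment info from the control stream.
    private final Map<String, String> control = new HashMap<>();
    // Stand-in for ListState<String>: data elements that arrived before "ready".
    private final List<String> buffered = new ArrayList<>();
    // Stand-in for ValueState<Boolean>: has the control side finished its initial load?
    private boolean ready = false;
    // Stand-in for the Collector: everything emitted downstream, in order.
    private final List<String> output = new ArrayList<>();

    /** Data side (the role of flatMap1): buffer while not ready, otherwise enrich and emit. */
    void onData(String key) {
        if (!ready) {
            buffered.add(key);
            return;
        }
        emit(key);
    }

    /** Control side (the role of flatMap2): remember the enrichment value for a key. */
    void onControl(String key, String value) {
        control.put(key, value);
    }

    /** Hypothetical "control topic fully read" signal: flush the buffer in arrival order. */
    void onControlReady() {
        ready = true;
        for (String key : buffered) {
            emit(key);
        }
        buffered.clear();
    }

    // Emit "key=value"; "?" marks an element for which no control entry exists.
    private void emit(String key) {
        output.add(key + "=" + control.getOrDefault(key, "?"));
    }

    List<String> output() {
        return output;
    }

    int bufferedCount() {
        return buffered.size();
    }

    public static void main(String[] args) {
        BufferUntilReady op = new BufferUntilReady();
        op.onData("a");          // arrives too early, gets buffered
        op.onControl("a", "1");
        op.onControl("b", "2");
        op.onData("b");          // still buffered, control side not ready yet
        op.onControlReady();     // flushes "a" and "b", now enriched
        op.onData("c");          // processed immediately from here on
        System.out.println(op.output());
    }
}
```

In an actual job the buffer lives in checkpointed state, so it survives failures, but it also grows until the control side reports ready. That is the reason Timo suggests slowing the data side down in the meantime.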