Hi Chirag,

Broadcast state is checkpointed, hence the savepoint would contain it.
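
For the first option in the quoted mail below (buffering the "fast" stream until the relevant broadcast events have arrived), the core logic might look roughly like the following. This is only a sketch in plain Java with no Flink classes. The class `BufferingLookup`, its method names and the `referenceReady` flag are hypothetical and chosen for illustration. In a real job the map would presumably live in broadcast state and the buffer in keyed state of a KeyedBroadcastProcessFunction, and deciding when the reference data is "ready" remains use-case specific.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/**
 * Plain-Java model of the buffering option: hold back "fast" events until the
 * reference data is considered complete. No Flink APIs are used here.
 */
class BufferingLookup {
    // Stand-in for the broadcast state (reference data by key).
    private final Map<String, String> reference = new HashMap<>();
    // Stand-in for state that holds events which arrived too early.
    private final Queue<String> buffered = new ArrayDeque<>();
    // Hypothetical readiness flag; when to set it depends on the use case.
    private boolean referenceReady = false;

    /** Roughly the role of processBroadcastElement: store one reference entry. */
    void onReference(String key, String value) {
        reference.put(key, value);
    }

    /** Marks the reference data as complete and flushes all buffered events in arrival order. */
    List<String> markReferenceReady() {
        referenceReady = true;
        List<String> out = new ArrayList<>();
        while (!buffered.isEmpty()) {
            out.add(enrich(buffered.poll()));
        }
        return out;
    }

    /** Roughly the role of processElement: emit directly if ready, otherwise buffer. */
    List<String> onEvent(String eventKey) {
        List<String> out = new ArrayList<>();
        if (referenceReady) {
            out.add(enrich(eventKey));
        } else {
            buffered.add(eventKey);
        }
        return out;
    }

    /** Number of events currently held back. */
    int bufferedCount() {
        return buffered.size();
    }

    // Looks up the event key in the reference data; unmatched keys get a placeholder.
    private String enrich(String eventKey) {
        return eventKey + "=" + reference.getOrDefault(eventKey, "<no match>");
    }
}
```

Events passed to `onEvent` before `markReferenceReady()` is called are held back and returned by that call. Later events are enriched and returned immediately. The buffer is unbounded here, which mirrors the state-size concern mentioned for this option.
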

Best,

Konstantin

On Wed, Feb 13, 2019 at 4:04 PM Chirag Dewan <chirag.dewa...@yahoo.in> wrote:

> Hi Konstantin,
>
> For the second solution, would the savepoint persist the broadcast state in
> the state backend? I was under the impression that broadcast state is not
> checkpointed.
>
> Is that correct?
>
> Thanks,
>
> Chirag
>
> On Mon, 11 Feb 2019 at 2:39 PM, Konstantin Knauf
> <konstan...@ververica.com> wrote:
>
> > Hi Chirag, Hi Vadim,
> >
> > off the top of my head, I see two options here:
> >
> > * Buffer the "fast" stream inside the KeyedBroadcastProcessFunction until
> > the relevant (whatever this means in your use case) broadcast events have
> > arrived. Advantage: operationally easy, events are emitted as early as
> > possible. Disadvantage: the state size might become very large, and
> > depending on the nature of the broadcast stream it might be hard to know
> > when the "relevant broadcast events have arrived".
> >
> > * Start your job and only consume the broadcast stream (by configuration).
> > Once the stream is "fully processed", i.e. has caught up, take a savepoint.
> > Finally, start the job from this savepoint with the correct "fast" stream.
> > There is a small race condition between taking the savepoint and restarting
> > the job, which might matter (or not) depending on your use case.
> >
> > This topic is related to event-time alignment in sources, which has been
> > actively discussed in the community in the past, and we might be able to
> > solve this in a similar way in the future.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Fri, Feb 8, 2019 at 5:48 PM Chirag Dewan <chirag.dewa...@yahoo.in>
> > wrote:
> >
> > > Hi Vadim,
> > >
> > > I would be interested in this too.
> > >
> > > Presently, I have to read my lookup source in the *open* method and keep
> > > it in a cache. By doing that I cannot make use of the broadcast state
> > > until, of course, the first emit comes on the *Broadcast* stream.
> > >
> > > The problem with making the event stream wait is that I have no way of
> > > knowing when I have read all the data from the lookup source. There is no
> > > possibility of having a special marker in the data either for my use case.
> > >
> > > So preloading the data seems to be the only option right now.
> > >
> > > Thanks,
> > >
> > > Chirag
> > >
> > > On Friday, 8 February, 2019, 7:45:37 pm IST, Vadim Vararu
> > > <vadim.var...@adswizz.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I need to use the broadcast state mechanism
> > > > (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html)
> > > > for the following scenario.
> > > >
> > > > I have a reference data stream (slow) and an events stream (fast
> > > > running), and I want to do a kind of lookup in the reference stream for
> > > > each event. The broadcast state mechanism seems to fit the scenario
> > > > perfectly.
> > > >
> > > > From the documentation:
> > > > *As an example where broadcast state can emerge as a natural fit, one can
> > > > imagine a low-throughput stream containing a set of rules which we want to
> > > > evaluate against all elements coming from another stream.*
> > > >
> > > > However, I am not sure what the correct way is to delay the consumption
> > > > of the fast running stream until the slow one is fully read (in case of a
> > > > file) or until a marker is emitted (in case of some other source). Is
> > > > there any way to accomplish that? It doesn't seem to be a rare use case.
> > > >
> > > > Thanks, Vadim.
> >
> > --
> > Konstantin Knauf | Solutions Architect
> > +49 160 91394525
> > <https://www.ververica.com/>
> > Follow us @VervericaData
> > --
> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> > Conference
> > Stream Processing | Event Driven | Real Time
> > --
> > Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > --
> > Data Artisans GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen