I'd second Chesnay's suggestion to use a custom source. It would be a piece
of cake with FLIP-27 [1], but unfortunately we are not there yet. It will
probably land in Flink 1.11 (mid-year) if you can wait.

For now, the way to go is a source that wraps the two Kafka consumers and
blocks the normal (CDC) consumer from emitting elements until the dump is
finished. Here is a quick-and-dirty solution that I threw together:
https://gist.github.com/AHeise/d7a8662f091e5a135c5ccfd6630634dd .
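
The core gating logic can be sketched in plain Java without any Flink
dependency. This is only an illustration of the idea, not the gist itself:
`RecordFeed` and `PrioritizedSource` are made-up names for the example, and a
real Kafka consumer does not simply return null when "done" — detecting the
end of the dump is the part you still have to supply (e.g. an end-of-dump
marker record).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PrioritizedSource {
    /** Hypothetical stand-in for a Kafka consumer; returns null when exhausted. */
    interface RecordFeed {
        String poll();
    }

    private final RecordFeed dumpFeed; // high priority (the SELECT ... dump)
    private final RecordFeed cdcFeed;  // low priority, gated until the dump is done
    private boolean dumpDone = false;

    PrioritizedSource(RecordFeed dumpFeed, RecordFeed cdcFeed) {
        this.dumpFeed = dumpFeed;
        this.cdcFeed = cdcFeed;
    }

    /** One step of the loop: prefer the dump feed, open the gate when it dries up. */
    String pollNext() {
        if (!dumpDone) {
            String rec = dumpFeed.poll();
            if (rec != null) {
                return rec;
            }
            dumpDone = true; // dump exhausted: unblock the CDC feed
        }
        return cdcFeed.poll();
    }

    public static void main(String[] args) {
        Queue<String> dump = new ArrayDeque<>(List.of("d1", "d2"));
        Queue<String> cdc = new ArrayDeque<>(List.of("c1", "c2"));
        PrioritizedSource src = new PrioritizedSource(dump::poll, cdc::poll);
        List<String> out = new ArrayList<>();
        String rec;
        while ((rec = src.pollNext()) != null) {
            out.add(rec); // dump records come out first, then CDC records
        }
        System.out.println(out);
    }
}
```

In an actual Flink job, the equivalent of pollNext() would sit inside the
custom SourceFunction's run() loop and emit records via ctx.collect() under
the checkpoint lock, as the gist does.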

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface

On Mon, Jan 6, 2020 at 1:16 PM David Morin <morin.david....@gmail.com>
wrote:

> My naive solution can't work because a dump can be quite long.
> So, yes, I have to find a way to stop consumption from the topic used for
> streaming mode while a dump is in progress :(
> Terry, I am trying to implement something based on your reply and on this
> thread:
> https://stackoverflow.com/questions/59201503/flink-kafka-gracefully-close-flink-consuming-messages-from-kafka-source-after-a
> Any suggestions are welcome.
> Thx.
>
> David
>
> On 2020/01/06 09:35:37, David Morin <morin.david....@gmail.com> wrote:
> > Hi,
> >
> > Thanks for your replies.
> > Yes, Terry, you are right: I can try to create a custom source.
> > But given my use case, I figured out that I can use a technical field in
> > my data. It is a timestamp, and I think I just have to ignore late
> > events, either with watermarks or later in the pipeline, according to
> > metadata stored in the Flink state. I am testing it now...
> > Thx
> >
> > David
> >
> > On 2020/01/03 15:44:08, Chesnay Schepler <ches...@apache.org> wrote:
> > > Are you asking how to detect from within the job whether the dump is
> > > complete, or how to combine these 2 jobs?
> > >
> > > If you had a way to notice whether the dump is complete, then I would
> > > suggest creating a custom source that wraps 2 Kafka sources and
> > > switching between them at will based on your conditions.
> > >
> > >
> > > On 03/01/2020 03:53, Terry Wang wrote:
> > > > Hi,
> > > >
> > > > I’d like to share my opinion here. It seems that you need to adjust
> > > > the Kafka consumers so that they can communicate with each other.
> > > > When you begin the dump process, you need to notify the CDC-topic
> > > > consumer to wait idle.
> > > >
> > > >
> > > > Best,
> > > > Terry Wang
> > > >
> > > >
> > > >
> > > >> On 2 Jan 2020, at 16:49, David Morin <morin.david....@gmail.com> wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> Is there a way to temporarily stop consuming one Kafka source in
> > > >> streaming mode?
> > > >> Use case: I have to consume 2 topics, but one of them has higher
> > > >> priority.
> > > >> One topic is dedicated to ingesting data from a db (change data
> > > >> capture) and the other is dedicated to synchronization (a dump, i.e.
> > > >> a SELECT ... from the db). At the moment the latter is performed by
> > > >> one Flink job, and we start it manually after stopping the previous
> > > >> one (CDC).
> > > >> I want to merge these 2 modes and automatically stop consumption of
> > > >> the topic dedicated to CDC mode while a dump is in progress.
> > > >> How can I handle that with Flink in a streaming way? Backpressure? ...
> > > >> Thx in advance for your insights
> > > >> Thx in advance for your insights
> > > >>
> > > >> David
> > > >
> > >
> > >
> >
>
