RE: Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

Ryan van Huuksloot via user Fri, 02 Sep 2022 09:08:20 -0700

Hi Mason,

First off, thanks for putting this FLIP together! Sorry for the delay. Full
disclosure Mason and I chatted a little bit at Flink Forward 2022 but I
have tried to capture the questions I had for him then.


I'll start the conversation with a few questions:

1. The concept of streamIds is not clear to me in the proposal and could
use some more information. If I understand correctly, they will be used in
the MetadataService to link KafkaClusters to ones you want to use? If you
assign stream ids using `setStreamIds`, how can you dynamically increase
the number of clusters you consume if the list of StreamIds is static? I am
basing this off of your example .setStreamIds(List.of("my-stream-1",
"my-stream-2")) so I could be off base with my assumption. If you don't
mind clearing up the intention, that would be great!

2. How would offsets work if you wanted to use this MultiClusterKafkaSource
with a file based backfill? In the case I am thinking of, you have a bucket
backed archive of Kafka data per cluster. and you want to pick up from the
last offset in the archived system, how would you set OffsetInitializers
"per cluster" potentially as a function or are you limited to setting an
OffsetInitializer for the entire Source?

3. Just to make sure - because this system will layer on top of Flink-27
and use KafkaSource for some aspects under the hood, the watermark
alignment that was introduced in FLIP-182 / Flink 1.15 would be possible
across multiple clusters if you assign them to the same alignment group?

Thanks!
Ryan

On 2022/06/28 07:21:15 Mason Chen wrote:
> Hi all,
>
> Thanks for the feedback! I'm adding the users, who responded in the user
> mailing list, to this thread.
>
> @Qingsheng - Yes, I would prefer to reuse the existing Kafka connector
> module. It makes a lot of sense since the dependencies are the same and
the
> implementation can also extend and improve some of the test utilities you
> have been working on for the FLIP 27 Kafka Source. I will enumerate the
> migration steps in the FLIP template.
>
> @Ryan - I don't have a public branch available yet, but I would appreciate
> your review on the FLIP design! When the FLIP design is approved by devs
> and the community, I can start to commit our implementation to a fork.
>
> @Andrew - Yup, one of the requirements of the connector is to read
multiple
> clusters within a single source, so it should be able to work well with
> your use case.
>
> @Devs - what do I need to get started on the FLIP design? I see the FLIP
> template and I have an account (mason6345), but I don't have access to
> create a page.
>
> Best,
> Mason
>
>
>
>
> On Sun, Jun 26, 2022 at 8:08 PM Qingsheng Ren <[email protected]> wrote:
>
> > Hi Mason,
> >
> > It sounds like an exciting enhancement to the Kafka source and will
> > benefit a lot of users I believe.
> >
> > Would you prefer to reuse the existing flink-connector-kafka module or
> > create a new one for the new multi-cluster feature? Personally I prefer
the
> > former one because users won’t need to introduce another dependency
module
> > to their projects in order to use the feature.
> >
> > Thanks for the effort on this and looking forward to your FLIP!
> >
> > Best,
> > Qingsheng
> >
> > > On Jun 24, 2022, at 09:43, Mason Chen <[email protected]> wrote:
> > >
> > > Hi community,
> > >
> > > We have been working on a Multi Cluster Kafka Source and are looking
to
> > > contribute it upstream. I've given a talk about the features and
design
> > at
> > > a Flink meetup: https://youtu.be/H1SYOuLcUTI.
> > >
> > > The main features that it provides is:
> > > 1. Reading multiple Kafka clusters within a single source.
> > > 2. Adjusting the clusters and topics the source consumes from
> > dynamically,
> > > without Flink job restart.
> > >
> > > Some of the challenging use cases that these features solve are:
> > > 1. Transparent Kafka cluster migration without Flink job restart.
> > > 2. Transparent Kafka topic migration without Flink job restart.
> > > 3. Direct integration with Hybrid Source.
> > >
> > > In addition, this is designed with wrapping and managing the existing
> > > KafkaSource components to enable these features, so it can continue to
> > > benefit from KafkaSource improvements and bug fixes. It can be
considered
> > > as a form of a composite source.
> > >
> > > I think the contribution of this source could benefit a lot of users
who
> > > have asked in the mailing list about Flink handling Kafka migrations
and
> > > removing topics in the past. I would love to hear and address your
> > thoughts
> > > and feedback, and if possible drive a FLIP!
> > >
> > > Best,
> > > Mason
> >
> >
>

RE: Re: [DISCUSS] Contribution of Multi Cluster Kafka Source

Reply via email to