Re: [DISCUSS] KIP-382: MirrorMaker 2.0

Jun Rao Mon, 17 Dec 2018 13:41:12 -0800

Hi, Sonke, Ryanne,

Thanks for the explanation. To me, the single connect cluster model could
be useful for any connector, not just MM. So, if we want to add it, it
seems it would be useful to do it in a backward compatible way in the
connect framework, rather than sth specific to MM. I am not sure what the
best approach is. For example, one other option is KIP-296. If we feel that's
adding too much work in this KIP, it might be ok to leave this part out in
this KIP.


Jun

On Fri, Dec 14, 2018 at 1:25 PM Ryanne Dolan <[email protected]> wrote:

> Thanks Sönke, you're spot-on. I don't want MM2 to wait for Connect features
> that don't exist yet, especially if MM2 is the primary use case for them.
> Moreover, I think MM2 can drive and inform some of these features, which
> only makes sense if we adopt MM2 first.
>
> Ryanne
>
> On Fri, Dec 14, 2018, 9:03 AM Sönke Liebau
> <[email protected] wrote:
>
> > Hi Jun,
> >
> > I believe Ryanne's idea is to run multiple workers per MM cluster-node,
> one
> > per target cluster. So in essence you'd specify three clusters in the MM
> > config and MM would then instantiate one worker per cluster. Every MM
> > connector would then be deployed to the appropriate (internal) worker
> that
> > is configured for the cluster in question. Thus there would be no changes
> > necessary to the Connect framework itself, everything would be handled
> by a
> > new layer around existing Connect code (probably a sibling implementation
> > to the DistributedHerder if I understood him correctly). Ryanne, please
> > correct/expand if I misunderstood your intentions.
> >
> > To briefly summarize the discussion that Ryanne and I had around this
> > earlier, my opinion was that the extra layer could potentially be avoided
> > by extending Connect instead, which would benefit all connectors.
> >
> > My proposal was to add a configuration option to the worker config that
> > allows defining "external clusters" which can then be referenced from the
> > connector config.
> >
> > For example:
> >
> > # Core cluster config stays the same and is used for status, config and
> > offsets as usual
> > bootstrap.servers=localkafka1:9092,localkafka2:9092
> >
> > # Allow defining extra remote clusters
> >
> >
> externalcluster.kafka_europe.bootstrap.servers=europekafka1:9092,europekafka2:9092
> > externalcluster.kafka_europe.security.protocol=SSL
> >
> >
> externalcluster.kafka_europe.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
> > ...
> >
> >
> externalcluster.kafka_asia.bootstrap.servers=asiakafka1:9092,asiakafka2:9092
> >
> >
> > When starting a connector you could now reference these pre-configured
> > clusters in the config:
> > {
> >   "name": "file-source",
> >   "config": {
> >     "connector.class": "FileStreamSource",
> >     "file": "/tmp/test.txt",
> >     "topic": "connect-test",
> >     "name": "file-source",
> >     "cluster": "kafka_asia"
> >   }
> > }
> >
> > When omitting the "cluster" parameter current behavior of Connect remains
> > unchanged. This way we could address multiple remote clusters from
> within a
> > single worker without adding the extra layer for MirrorMaker. I believe
> > that this could be done without major structural changes to the Connect
> > codebase, but I freely admit that this opinion is based on 10 minutes
> > poking through the code not any real expertise.
> >
> > Ryanne's main concern with this approach was that there are additional
> > worker setting that apply to all connectors and that no truly universal
> > approach would be feasible while running a single worker per Connect
> node.
> > Also he feels that from a development perspective it would be preferable
> to
> > have independent MM code and contribute applicable features back to
> > Connect.
> > While I agree that this would make development of MM easier it will also
> > create a certain amount of extra code (can probably be kept at a minimum,
> > but still) that could be avoided by using "vanilla" Connect for MM.
> >
> > I hope I summarized your views accurately Ryanne, if not please feel free
> > to correct me!
> >
> > Best regards,
> > Sönke
> >
> >
> > On Fri, Dec 14, 2018 at 1:55 AM Jun Rao <[email protected]> wrote:
> >
> > > Hi, Ryanne,
> > >
> > > Regarding the single connect cluster model, yes, the co-existence of a
> > MM2
> > > REST API and the nearly identical Connect API is one of my concerns.
> > > Implementation wise, my understanding is that the producer URL in a
> > > SourceTask is always obtained from the connect worker's configuration.
> > So,
> > > not sure how you would customize the producer URL for individual
> > SourceTask
> > > w/o additional support from the Connect framework.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Dec 10, 2018 at 1:17 PM Ryanne Dolan <[email protected]>
> > > wrote:
> > >
> > > > Jun, thanks for your time reviewing the KIP.
> > > >
> > > > > In a MirrorSourceConnector, it seems that the offsets of the source
> > > will
> > > > be stored in a different cluster from the target cluster?
> > > >
> > > > Jan Filipiak raised this issue as well, and suggested that no state
> be
> > > > tracked in the source cluster. I've since implemented
> > > MirrorSourceConnector
> > > > accordingly. And actually, this issue coincides with another major
> > > weakness
> > > > of legacy MirrorMaker: "rebalance storm". In both cases, the problem
> is
> > > due
> > > > to MirrorMaker using high-level consumer groups for replication.
> > > >
> > > > MM2 does not use consumer groups at all, but instead manages its own
> > > > partition assignments and offsets. MirrorSourceConnector monitors
> > > > topic-partitions and assigns them to MirrorSourceTasks directly --
> > there
> > > > are no high-level subscriptions and therefore no rebalances.
> Likewise,
> > > > MirrorSourceConnector stores its own offsets in the target cluster,
> so
> > no
> > > > state information is lost if the source cluster disappears. Both of
> > these
> > > > features are facilitated by the Connect framework and were inspired
> by
> > > > Uber's uReplicator.
> > > >
> > > > > If the single connect cluster model is indeed useful, it seems that
> > we
> > > > should support it in the general connect framework since it can be
> > useful
> > > > for managing other types connectors.
> > > >
> > > > Sönke Liebau suggested this as well. I've spent some time looking
> into
> > > > this, and I do believe it would be possible to bring these features
> to
> > > > Connect in general without breaking the existing APIs. For example,
> > > maybe a
> > > > connector config could specify which worker to use as a property like
> > > > worker.name=foo, and otherwise a default worker would be used. In
> this
> > > > case, a "MirrorMaker cluster" would just be a Connect cluster with a
> > > > pre-configured set of workers.
> > > >
> > > > My plan is to contribute MM2 and then help pull features from MM2
> into
> > > > Connect. I don't think it would make sense to prime Connect first,
> nor
> > > do I
> > > > want to propose a bunch of changes to Connect in this one KIP. If the
> > > > concern is primarily around the co-existence of a MM2 REST API and
> the
> > > > nearly identical Connect API, perhaps it would make sense to split
> off
> > > the
> > > > "MirrorMaker clusters" section of this KIP into a separate KIP aimed
> at
> > > > Connect in general? Would love to hear your thoughts on this.
> > > >
> > > > > Could you provide a bit more details on the content of the
> heartbeat
> > > > topic?
> > > >
> > > > At present the heartbeat is just a timestamp and the alias of the
> > cluster
> > > > of origin. This is more powerful than existing Connector-level
> metrics,
> > > as
> > > > these heartbeats are themselves replicated and can be traced across
> > > > multiple hops in the replication topology. I'll add this to the KIP.
> > > >
> > > > > Also, if this is useful, should we just add it add in the connect
> > > > framework, instead of just mirror maker?
> > > >
> > > > Same deal, I'd love to see this, but I don't think we should try to
> > prime
> > > > Connect before adopting MM2.
> > > >
> > > > > RemoteClusterUtils. Since this is part of the public interface,
> could
> > > you
> > > > document the public APIs?
> > > >
> > > > Will do, thanks.
> > > >
> > > > > source.cluster.bootstrap.servers/target.cluster.bootstrap.servers:
> > > Does a
> > > > Source/Sink connect need both?
> > > >
> > > > Sort of. I'm using this to construct an AdminClient for topic ACL and
> > > > configuration sync, since the Connect framework doesn't expose it. I
> > > intend
> > > > to follow-up KIP-382 with a proposal to expose this info to
> Connectors.
> > > > There's also KIP-158, but it deals with topic creation only.
> > > >
> > > > Thanks again for the feedback!
> > > >
> > > > Ryanne
> > > >
> > > >
> > > >
> > > > On Fri, Dec 7, 2018 at 6:22 PM Jun Rao <[email protected]> wrote:
> > > >
> > > > > Hi, Ryanne,
> > > > >
> > > > > Thanks for the KIP. At the high level, this looks like a reasonable
> > > > > proposal. A few comments below.
> > > > >
> > > > > 1. About using a single connector cluster to manage connectors
> > > accessing
> > > > > multiple Kafka clusters. It's good that you brought this up.  The
> > > > following
> > > > > are the tradeoffs that I see. The benefit of using a single connect
> > > > cluster
> > > > > is that it simplifies the management. There are a couple of
> potential
> > > > > downsides.
> > > > > (a) In a MirrorSourceConnector, it seems that the offsets of the
> > source
> > > > > will be stored in a different cluster from the target cluster? If
> the
> > > > data
> > > > > in the target Kafka cluster is lost (say the whole cluster is wiped
> > > out),
> > > > > one has to manually reset the offset to re-mirror the missing data.
> > (2)
> > > > If
> > > > > the offsets are stored in a separate cluster from the produced
> data,
> > it
> > > > > prevents the connector from running features such as EOS since
> > > currently
> > > > > EOS doesn't span Kafka clusters. If the single connect cluster
> model
> > is
> > > > > indeed useful, it seems that we should support it in the general
> > > connect
> > > > > framework since it can be useful for managing other types
> connectors.
> > > > This
> > > > > could be related to KIP-296 since it allows connector level
> > > > > producer/consumer customization.
> > > > >
> > > > > 2. The heartbeats topic. Could you provide a bit more details on
> the
> > > > > content of the heartbeat topic? I am not sure how that's different
> > from
> > > > the
> > > > > connector level metrics. Also, if this is useful, should we just
> add
> > it
> > > > add
> > > > > in the connect framework, instead of just mirror maker?
> > > > >
> > > > > 3. RemoteClusterUtils. Since this is part of the public interface,
> > > could
> > > > > you document the public APIs?
> > > > >
> > > > > 4.
> source.cluster.bootstrap.servers/target.cluster.bootstrap.servers:
> > > > Does
> > > > > a Source/Sink connect need both? Currently, the producer URL used
> in
> > a
> > > > > SourceWorker always comes from the Worker configuration. Are you
> > > > proposing
> > > > > to change that?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Dec 7, 2018 at 12:18 PM Ryanne Dolan <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Michael, thanks for the comments!
> > > > > >
> > > > > > >  would like to see support for this to be done by hops, as well
> > > [...]
> > > > > > This then allows ring (hops = number of brokers in the ring),
> mesh
> > > > (every
> > > > > > cluster interconnected so hop=1), or even a tree (more fine
> grained
> > > > > setup)
> > > > > > cluster topology.
> > > > > >
> > > > > > That's a good idea, though we can do this at the topic level
> > without
> > > > > > tagging individual records. A max.hop of 1 would mean "A.topic1"
> is
> > > > > > allowed, but not "B.A.topic1". I think the default behavior would
> > > need
> > > > to
> > > > > > be max.hops = 1 to avoid unexpectedly creating a bunch of
> > D.C.B.A...
> > > > > topics
> > > > > > when you create a fully-connected mesh topology.
> > > > > >
> > > > > > Looking ahead a bit, I can imagine an external tool computing the
> > > > > spanning
> > > > > > tree of topics among a set of clusters based on inter-cluster
> > > > replication
> > > > > > lag, and setting up MM2 accordingly. But that's probably outside
> > the
> > > > > scope
> > > > > > of this KIP :)
> > > > > >
> > > > > > >  ...standalone MirrorMaker connector...
> > > > > > >     ./bin/kafka-mirror-maker-2.sh --consumer
> consumer.properties
> > > > > > --producer producer.properties
> > > > > >
> > > > > > Eventually, I'd like MM2 to completely replace legacy MM,
> including
> > > the
> > > > > > ./bin/kafka-mirror-maker.sh script. In the meantime, it's a good
> > idea
> > > > to
> > > > > > include a standalone driver. Something like
> > > > > > ./bin/connect-mirror-maker-standalone.sh with the same high-level
> > > > > > configuration file. I'll do that, thanks.
> > > > > >
> > > > > > > I see no section on providing support for mirror maker
> Handlers,
> > > > today
> > > > > > people can add handlers to have a little extra custom logic if
> > > needed,
> > > > > and
> > > > > > the handler api is public today so should be supported going
> > forwards
> > > > so
> > > > > > people are not on mass re-writing these.
> > > > > >
> > > > > > Great point. Connect offers single-message transformations and
> > > > converters
> > > > > > for this purpose, but I agree that we should honor the existing
> API
> > > if
> > > > > > possible. This might be as easy as providing an adapter class
> > between
> > > > > > connect's Transformation and mirror-maker's Handler. Maybe file a
> > > Jira
> > > > > > ticket to track this?
> > > > > >
> > > > > > Really appreciate your feedback!
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 6, 2018 at 7:03 PM Michael Pearce <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Re hops to stop the cycle and to allow a range of multi cluster
> > > > > > > topologies, see
> > https://www.rabbitmq.com/federated-exchanges.html
> > > > > where
> > > > > > > very similar was done in rabbit.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 12/7/18, 12:47 AM, "Michael Pearce" <[email protected]
> >
> > > > wrote:
> > > > > > >
> > > > > > >     Nice proposal.
> > > > > > >
> > > > > > >     Some comments.
> > > > > > >
> > > > > > >
> > > > > > >     On the section around cycle detection.
> > > > > > >
> > > > > > >     I would like to see support for this to be done by hops, as
> > > well
> > > > > e.g.
> > > > > > > using approach is to use a header for the number of hops, as
> the
> > > mm2
> > > > > > > replicates it increases the hop count and you can make the mm2
> > > > > > configurable
> > > > > > > to only produce messages onwards where hops are less than x.
> > > > > > >     This then allows ring (hops = number of brokers in the
> ring),
> > > > mesh
> > > > > > > (every cluster interconnected so hop=1), or even a tree (more
> > fine
> > > > > > grained
> > > > > > > setup) cluster topology.
> > > > > > >     FYI we do this currently with the current mirror maker,
> > using a
> > > > > > custom
> > > > > > > handler.
> > > > > > >
> > > > > > >
> > > > > > >     On the section around running a standalone MirrorMaker
> > > connector
> > > > > > >
> > > > > > >     I would suggest making this as easy to run as the
> > mirrormakers
> > > > are
> > > > > > > today, with a simple single sh script.
> > > > > > >     I assume this is what is proposed in section "Running
> > > MirrorMaker
> > > > > in
> > > > > > > legacy mode" but I would even do this before MM would be
> removed,
> > > > with
> > > > > a
> > > > > > -2
> > > > > > > varient.
> > > > > > >     e.g.
> > > > > > >     ./bin/kafka-mirror-maker-2.sh --consumer
> consumer.properties
> > > > > > > --producer producer.properties
> > > > > > >
> > > > > > >     Lastly
> > > > > > >
> > > > > > >     I see no section on providing support for mirror maker
> > > Handlers,
> > > > > > today
> > > > > > > people can add handlers to have a little extra custom logic if
> > > > needed,
> > > > > > and
> > > > > > > the handler api is public today so should be supported going
> > > forwards
> > > > > so
> > > > > > > people are not on mass re-writing these.
> > > > > > >
> > > > > > >     On 12/5/18, 5:36 PM, "Ryanne Dolan" <[email protected]
> >
> > > > wrote:
> > > > > > >
> > > > > > >         Sönke,
> > > > > > >
> > > > > > >         > The only thing that I could come up with is the
> > > limitation
> > > > > to a
> > > > > > > single
> > > > > > >         offset commit interval
> > > > > > >
> > > > > > >         Yes, and other internal properties, e.g. those used by
> > the
> > > > > > internal
> > > > > > >         consumers and producers, which, granted, probably are
> not
> > > > often
> > > > > > > changed
> > > > > > >         from their defaults, but that apply to Connectors
> across
> > > the
> > > > > > > entire cluster.
> > > > > > >
> > > > > > >         Ryanne
> > > > > > >
> > > > > > >         On Wed, Dec 5, 2018 at 3:21 AM Sönke Liebau
> > > > > > >         <[email protected]> wrote:
> > > > > > >
> > > > > > >         > Hi Ryanne,
> > > > > > >         >
> > > > > > >         > when you say "Currently worker configs apply across
> the
> > > > > entire
> > > > > > > cluster,
> > > > > > >         > which is limiting even for use-cases involving a
> single
> > > > Kafka
> > > > > > > cluster.",
> > > > > > >         > may I ask you to elaborate on those limitations a
> > little?
> > > > > > >         > The only thing that I could come up with is the
> > > limitation
> > > > > to a
> > > > > > > single
> > > > > > >         > offset commit interval value for all running
> > connectors.
> > > > > > >         > Maybe also the limitation to shared config
> providers..
> > > > > > >         >
> > > > > > >         > But you sound like you had painful experiences with
> > this
> > > > > > before,
> > > > > > > maybe
> > > > > > >         > you'd like to share the burden :)
> > > > > > >         >
> > > > > > >         > Best regards,
> > > > > > >         > Sönke
> > > > > > >         >
> > > > > > >         > On Wed, Dec 5, 2018 at 5:15 AM Ryanne Dolan <
> > > > > > > [email protected]> wrote:
> > > > > > >         >
> > > > > > >         > > Sönke,
> > > > > > >         > >
> > > > > > >         > > I think so long as we can keep the differences at a
> > > very
> > > > > high
> > > > > > > level (i.e.
> > > > > > >         > > the "control plane"), there is little downside to
> MM2
> > > and
> > > > > > > Connect
> > > > > > >         > > coexisting. I do expect them to converge to some
> > > extent,
> > > > > with
> > > > > > > features
> > > > > > >         > from
> > > > > > >         > > MM2 being pulled into Connect whenever this is
> > possible
> > > > > > > without breaking
> > > > > > >         > > things.
> > > > > > >         > >
> > > > > > >         > > I could definitely see your idea re hierarchies or
> > > groups
> > > > > of
> > > > > > > connectors
> > > > > > >         > > being useful outside MM2. Currently "worker
> configs"
> > > > apply
> > > > > > > across the
> > > > > > >         > > entire cluster, which is limiting even for
> use-cases
> > > > > > involving
> > > > > > > a single
> > > > > > >         > > Kafka cluster. If Connect supported multiple
> workers
> > in
> > > > the
> > > > > > > same cluster,
> > > > > > >         > > it would start to look a lot like a MM2 cluster.
> > > > > > >         > >
> > > > > > >         > > Ryanne
> > > > > > >         > >
> > > > > > >         > > On Tue, Dec 4, 2018 at 3:26 PM Sönke Liebau
> > > > > > >         > > <[email protected]> wrote:
> > > > > > >         > >
> > > > > > >         > > > Hi Ryanne,
> > > > > > >         > > >
> > > > > > >         > > > thanks for your response!
> > > > > > >         > > >
> > > > > > >         > > > It seems like you have already done a lot of
> > > > > investigation
> > > > > > > into the
> > > > > > >         > > > existing code and the solution design and all of
> > what
> > > > you
> > > > > > > write makes
> > > > > > >         > > sense
> > > > > > >         > > > to me. Would it potentially be worth adding this
> to
> > > the
> > > > > > KIP,
> > > > > > > now that
> > > > > > >         > you
> > > > > > >         > > > had to write it up because of me anyway?
> > > > > > >         > > >
> > > > > > >         > > > However, I am afraid that I am still not entirely
> > > > > convinced
> > > > > > > of the
> > > > > > >         > > > fundamental benefit this provides over an
> extended
> > > > > Connect
> > > > > > > that has the
> > > > > > >         > > > following functionality:
> > > > > > >         > > > - allow for organizing connectors into a
> > hierarchical
> > > > > > > structure -
> > > > > > >         > > > "clusters/us-west/..."
> > > > > > >         > > > - allow defining external Kafka clusters to be
> used
> > > by
> > > > > > > Source and Sink
> > > > > > >         > > > connectors instead of the local cluster
> > > > > > >         > > >
> > > > > > >         > > > Personally I think both of these features are
> > useful
> > > > > > > additions to
> > > > > > >         > > Connect,
> > > > > > >         > > > I'll address both separately below.
> > > > > > >         > > >
> > > > > > >         > > > Allowing to structure connectors in a hierarchy
> > > > > > >         > > > Organizing running connectors will grow more
> > > important
> > > > as
> > > > > > > corporate
> > > > > > >         > > > customers adapt Connect and installations grow in
> > > size.
> > > > > > > Additionally
> > > > > > >         > this
> > > > > > >         > > > could be useful for ACLs in case they are ever
> > added
> > > to
> > > > > > > Connect, as you
> > > > > > >         > > > could allow specific users access only to
> specific
> > > > > > > namespaces (and
> > > > > > >         > until
> > > > > > >         > > > ACLs are added it would facilitate using a
> reverse
> > > > proxy
> > > > > > for
> > > > > > > the same
> > > > > > >         > > > effect).
> > > > > > >         > > >
> > > > > > >         > > > Allow accessing multiple external clusters
> > > > > > >         > > > The reasoning for this feature is pretty much the
> > > same
> > > > as
> > > > > > > for a central
> > > > > > >         > > > Mirror Maker cluster, if a company has multiple
> > > > clusters
> > > > > > for
> > > > > > > whatever
> > > > > > >         > > > reason but wants to have ingest centralized in
> one
> > > > system
> > > > > > > aka one
> > > > > > >         > Connect
> > > > > > >         > > > cluster they would need the ability to read from
> > and
> > > > > write
> > > > > > > to an
> > > > > > >         > > arbitrary
> > > > > > >         > > > number of Kafka clusters.
> > > > > > >         > > > I haven't really looked at the code, just poked
> > > around
> > > > a
> > > > > > > couple of
> > > > > > >         > > minutes,
> > > > > > >         > > > but it appears like this could be done with
> fairly
> > > low
> > > > > > > effort. My
> > > > > > >         > general
> > > > > > >         > > > idea would be to leave the existing configuration
> > > > options
> > > > > > > untouched -
> > > > > > >         > > > Connect will always need a "primary" cluster that
> > is
> > > > used
> > > > > > > for storage
> > > > > > >         > of
> > > > > > >         > > > internal data (config, offsets, status) there is
> no
> > > > need
> > > > > to
> > > > > > > break
> > > > > > >         > > existing
> > > > > > >         > > > configs. But additionally allow adding named
> extra
> > > > > clusters
> > > > > > > by
> > > > > > >         > specifying
> > > > > > >         > > > options like
> > > > > > >         > > >   external.sales_cluster.bootstrap_servers=...
> > > > > > >         > > >
>  external.sales_cluster.ssl.keystore.location=...
> > > > > > >         > > >
>  external.marketing_cluster.bootstrap_servers=...
> > > > > > >         > > >
> > > > > > >         > > > The code for status, offset and config storage is
> > > > mostly
> > > > > > > isolated in
> > > > > > >         > the
> > > > > > >         > > > Kafka[Offset|Status|Config]BackingStore classes
> and
> > > > could
> > > > > > > remain pretty
> > > > > > >         > > > much unchanged.
> > > > > > >         > > >
> > > > > > >         > > > Producer and consumer creation for Tasks is done
> in
> > > the
> > > > > > > Worker as of
> > > > > > >         > > > KAFKA-7551 and is isolated in two functions. We
> > could
> > > > > add a
> > > > > > > two more
> > > > > > >         > > > functions with an extra argument for the external
> > > > cluster
> > > > > > > name to be
> > > > > > >         > used
> > > > > > >         > > > and return fitting consumers/producers.
> > > > > > >         > > > The source and sink config would then simply gain
> > an
> > > > > > > optional setting
> > > > > > >         > to
> > > > > > >         > > > specify the cluster name.
> > > > > > >         > > >
> > > > > > >         > > > I am very sure that I am missing a few large
> issues
> > > > with
> > > > > > > these ideas,
> > > > > > >         > I'm
> > > > > > >         > > > mostly back-of-the-napkin designing here, but it
> > > might
> > > > be
> > > > > > > worth a
> > > > > > >         > second
> > > > > > >         > > > look.
> > > > > > >         > > >
> > > > > > >         > > > Once we decide to diverge into two clusters:
> > > > MirrorMaker
> > > > > > and
> > > > > > > Connect, I
> > > > > > >         > > > think realistically the chance of those two ever
> > > being
> > > > > > > merged again
> > > > > > >         > > because
> > > > > > >         > > > they grow back together is practically zero -
> hence
> > > my
> > > > > > > hesitation.
> > > > > > >         > > >
> > > > > > >         > > > ----
> > > > > > >         > > >
> > > > > > >         > > > All of that being said, I am absolutely happy to
> > > agree
> > > > to
> > > > > > > disagree, I
> > > > > > >         > > think
> > > > > > >         > > > to a certain extent this is down to a question of
> > > > > personal
> > > > > > >         > > > style/preference. And as this is your baby and
> you
> > > have
> > > > > put
> > > > > > > a lot more
> > > > > > >         > > > effort and thought into it than I ever will I'll
> > shut
> > > > up
> > > > > > now
> > > > > > > :)
> > > > > > >         > > >
> > > > > > >         > > > Again, thanks for all your good work!
> > > > > > >         > > >
> > > > > > >         > > > Best regards,
> > > > > > >         > > > Sönke
> > > > > > >         > > >
> > > > > > >         > > > On Fri, Nov 30, 2018 at 9:00 PM Ryanne Dolan <
> > > > > > > [email protected]>
> > > > > > >         > > > wrote:
> > > > > > >         > > >
> > > > > > >         > > > > Thanks Sönke.
> > > > > > >         > > > >
> > > > > > >         > > > > > it just feels to me like an awful lot of
> > Connect
> > > > > > > functionality
> > > > > > >         > would
> > > > > > >         > > > need
> > > > > > >         > > > > to be reimplemented or at least wrapped
> > > > > > >         > > > >
> > > > > > >         > > > > Connect currently has two drivers,
> > > ConnectDistributed
> > > > > and
> > > > > > >         > > > > ConnectStandalone. Both set up a Herder, which
> > > > manages
> > > > > > > Workers. I've
> > > > > > >         > > > > implemented a third driver which sets up
> multiple
> > > > > > Herders,
> > > > > > > one for
> > > > > > >         > each
> > > > > > >         > > > > Kafka cluster as specified in a config file.
> From
> > > the
> > > > > > > Herder level
> > > > > > >         > > down,
> > > > > > >         > > > > nothing is changed or duplicated -- it's just
> > > > Connect.
> > > > > > >         > > > >
> > > > > > >         > > > > For the REST API, Connect wraps a Herder in a
> > > > > RestServer
> > > > > > > class, which
> > > > > > >         > > > > creates a Jetty server with a few JAX-RS
> > resources.
> > > > One
> > > > > > of
> > > > > > > these
> > > > > > >         > > > resources
> > > > > > >         > > > > is ConnectorsResource, which is the real meat
> of
> > > the
> > > > > REST
> > > > > > > API,
> > > > > > >         > enabling
> > > > > > >         > > > > start, stop, creation, deletion, and
> > configuration
> > > of
> > > > > > > Connectors.
> > > > > > >         > > > >
> > > > > > >         > > > > I've added MirrorRestServer, which wraps a set
> of
> > > > > Herders
> > > > > > > instead of
> > > > > > >         > > one.
> > > > > > >         > > > > The server exposes a single resource,
> > > > ClustersResource,
> > > > > > > which is
> > > > > > >         > only a
> > > > > > >         > > > few
> > > > > > >         > > > > lines of code:
> > > > > > >         > > > >
> > > > > > >         > > > > @GET
> > > > > > >         > > > > @Path("/")
> > > > > > >         > > > > public Collection<String> listClusters() {
> > > > > > >         > > > >   return clusters.keySet();
> > > > > > >         > > > > }
> > > > > > >         > > > >
> > > > > > >         > > > > @Path("/{cluster}")
> > > > > > >         > > > > public ConnectorsResource
> > > > > > >         > getConnectorsForCluster(@PathParam("cluster")
> > > > > > >         > > > > cluster) {
> > > > > > >         > > > >   return new
> > > > ConnectorsResource(clusters.get(cluster));
> > > > > > >         > > > > }
> > > > > > >         > > > >
> > > > > > >         > > > > (simplified a bit and subject to change)
> > > > > > >         > > > >
> > > > > > >         > > > > The ClustersResource defers to the existing
> > > > > > > ConnectorsResource, which
> > > > > > >         > > > again
> > > > > > >         > > > > is most of the Connect API. With this in
> place, I
> > > can
> > > > > > make
> > > > > > > requests
> > > > > > >         > > like:
> > > > > > >         > > > >
> > > > > > >         > > > > GET /clusters
> > > > > > >         > > > >
> > > > > > >         > > > > GET /clusters/us-west/connectors
> > > > > > >         > > > >
> > > > > > >         > > > > PUT /clusters/us-west/connectors/us-east/config
> > > > > > >         > > > > { "topics" : "topic1" }
> > > > > > >         > > > >
> > > > > > >         > > > > etc.
> > > > > > >         > > > >
> > > > > > >         > > > > So on the whole, very little code is involved
> in
> > > > > > > implementing
> > > > > > >         > > > "MirrorMaker
> > > > > > >         > > > > clusters". I won't rule out adding additional
> > > > features
> > > > > on
> > > > > > > top of this
> > > > > > >         > > > basic
> > > > > > >         > > > > API, but nothing should require re-implementing
> > > what
> > > > is
> > > > > > > already in
> > > > > > >         > > > Connect.
> > > > > > >         > > > >
> > > > > > >         > > > > > Wouldn't it be a viable alternative to look
> > into
> > > > > > > extending Connect
> > > > > > >         > > > itself
> > > > > > >         > > > >
> > > > > > >         > > > > Maybe Connect will evolve to the point where
> > > Connect
> > > > > > > clusters and
> > > > > > >         > > > > MirrorMaker clusters are indistinguishable,
> but I
> > > > think
> > > > > > > this is
> > > > > > >         > > unlikely,
> > > > > > >         > > > > since really no use-case outside replication
> > would
> > > > > > benefit
> > > > > > > from the
> > > > > > >         > > added
> > > > > > >         > > > > complexity. Moreover, I think support for
> > multiple
> > > > > Kafka
> > > > > > > clusters
> > > > > > >         > would
> > > > > > >         > > > be
> > > > > > >         > > > > hard to add without significant changes to the
> > > > existing
> > > > > > > APIs and
> > > > > > >         > > configs,
> > > > > > >         > > > > which all assume a single Kafka cluster. I
> think
> > > > > > > Connect-as-a-Service
> > > > > > >         > > and
> > > > > > >         > > > > Replication-as-a-Service are sufficiently
> > different
> > > > > > > use-cases that we
> > > > > > >         > > > > should expect the APIs and configuration files
> to
> > > be
> > > > at
> > > > > > > least
> > > > > > >         > slightly
> > > > > > >         > > > > different, even if both use the same framework
> > > > > > underneath.
> > > > > > > That
> > > > > > >         > said, I
> > > > > > >         > > > do
> > > > > > >         > > > > plan to contribute a few improvements to the
> > > Connect
> > > > > > > framework in
> > > > > > >         > > support
> > > > > > >         > > > > of MM2 -- just nothing within the scope of the
> > > > current
> > > > > > KIP.
> > > > > > >         > > > >
> > > > > > >         > > > > Thanks again!
> > > > > > >         > > > > Ryanne
> > > > > > >         > > > >
> > > > > > >         > > > >
> > > > > > >         > > > > On Fri, Nov 30, 2018 at 3:47 AM Sönke Liebau
> > > > > > >         > > > > <[email protected]> wrote:
> > > > > > >         > > > >
> > > > > > >         > > > > > Hi Ryanne,
> > > > > > >         > > > > >
> > > > > > >         > > > > > thanks. I missed the remote to remote
> > replication
> > > > > > > scenario in my
> > > > > > >         > > train
> > > > > > >         > > > of
> > > > > > >         > > > > > thought, you are right.
> > > > > > >         > > > > >
> > > > > > >         > > > > > That being said I have to admit that I am not
> > yet
> > > > > fully
> > > > > > > on board
> > > > > > >         > with
> > > > > > >         > > > the
> > > > > > >         > > > > > concept, sorry. But I might just be
> > > > misunderstanding
> > > > > > > what your
> > > > > > >         > > > intention
> > > > > > >         > > > > > is. Let me try and explain what I think it is
> > you
> > > > are
> > > > > > > trying to do
> > > > > > >         > > and
> > > > > > >         > > > > why
> > > > > > >         > > > > > I am on the fence about that and take it from
> > > > there.
> > > > > > >         > > > > >
> > > > > > >         > > > > > You want to create an extra mirrormaker
> driver
> > > > class
> > > > > > > which will
> > > > > > >         > take
> > > > > > >         > > > > > multiple clusters as configuration options.
> > Based
> > > > on
> > > > > > > these clusters
> > > > > > >         > > it
> > > > > > >         > > > > will
> > > > > > >         > > > > > then reuse the connect workers and create as
> > many
> > > > as
> > > > > > > necessary to
> > > > > > >         > be
> > > > > > >         > > > able
> > > > > > >         > > > > > to replicate to/from each of those configured
> > > > > clusters.
> > > > > > > It will
> > > > > > >         > then
> > > > > > >         > > > > > expose a rest api (since you stated subset of
> > > > Connect
> > > > > > > rest api I
> > > > > > >         > > assume
> > > > > > >         > > > > it
> > > > > > >         > > > > > will be a new / own one?)  that allows users
> to
> > > > send
> > > > > > > requests like
> > > > > > >         > > > > > "replicate topic a from cluster 1 to cluster
> 1"
> > > and
> > > > > > > start a
> > > > > > >         > connector
> > > > > > >         > > > on
> > > > > > >         > > > > > the relevant worker that can offer this
> > "route".
> > > > > > >         > > > > > This can be extended to a cluster by starting
> > > > mirror
> > > > > > > maker drivers
> > > > > > >         > on
> > > > > > >         > > > > other
> > > > > > >         > > > > > nodes with the same config and it would offer
> > all
> > > > the
> > > > > > > connect
> > > > > > >         > > features
> > > > > > >         > > > of
> > > > > > >         > > > > > balancing restarting in case of failure etc.
> > > > > > >         > > > > >
> > > > > > >         > > > > > If this understanding is correct then it just
> > > feels
> > > > > to
> > > > > > > me like an
> > > > > > >         > > awful
> > > > > > >         > > > > lot
> > > > > > >         > > > > > of Connect functionality would need to be
> > > > > reimplemented
> > > > > > > or at least
> > > > > > >         > > > > > wrapped, which potentially could mean
> > additional
> > > > > effort
> > > > > > > for
> > > > > > >         > > maintaining
> > > > > > >         > > > > and
> > > > > > >         > > > > > extending Connect down the line. Wouldn't it
> > be a
> > > > > > viable
> > > > > > >         > alternative
> > > > > > >         > > to
> > > > > > >         > > > > > look into extending Connect itself to allow
> > > > defining
> > > > > > > "remote
> > > > > > >         > > clusters"
> > > > > > >         > > > > > which can then be specified in the connector
> > > config
> > > > > to
> > > > > > > be used
> > > > > > >         > > instead
> > > > > > >         > > > of
> > > > > > >         > > > > > the local cluster? I imagine that change
> itself
> > > > would
> > > > > > > not be too
> > > > > > >         > > > > extensive,
> > > > > > >         > > > > > the main effort would probably be in coming
> up
> > > > with a
> > > > > > > sensible
> > > > > > >         > config
> > > > > > >         > > > > > structure and ensuring backwards
> compatibility
> > > with
> > > > > > > existing
> > > > > > >         > > connector
> > > > > > >         > > > > > configs.
> > > > > > >         > > > > > This would still allow to use a regular
> Connect
> > > > > cluster
> > > > > > > for an
> > > > > > >         > > > arbitrary
> > > > > > >         > > > > > number of clusters, thus still having a
> > dedicated
> > > > > > > MirrorMaker
> > > > > > >         > cluster
> > > > > > >         > > > by
> > > > > > >         > > > > > running only MirrorMaker Connectors in there
> if
> > > you
> > > > > > want
> > > > > > > the
> > > > > > >         > > > isolation. I
> > > > > > >         > > > > > agree that it would not offer the level of
> > > > > abstraction
> > > > > > > around
> > > > > > >         > > > replication
> > > > > > >         > > > > > that your concept would enable to implement,
> > but
> > > I
> > > > > > think
> > > > > > > if would
> > > > > > >         > be
> > > > > > >         > > > far
> > > > > > >         > > > > > less implementation and maintenance effort.
> > > > > > >         > > > > >
> > > > > > >         > > > > > But again, all of that is based on my,
> > > potentially
> > > > > > > flawed,
> > > > > > >         > > > understanding
> > > > > > >         > > > > of
> > > > > > >         > > > > > your proposal, please feel free to correct me
> > :)
> > > > > > >         > > > > >
> > > > > > >         > > > > > Best regards,
> > > > > > >         > > > > > Sönke
> > > > > > >         > > > > >
> > > > > > >         > > > > > On Fri, Nov 30, 2018 at 1:39 AM Ryanne Dolan
> <
> > > > > > >         > [email protected]>
> > > > > > >         > > > > > wrote:
> > > > > > >         > > > > >
> > > > > > >         > > > > > > Sönke, thanks for the feedback!
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > >  the renaming policy [...] can be
> disabled
> > > > [...]
> > > > > > The
> > > > > > > KIP itself
> > > > > > >         > > > does
> > > > > > >         > > > > > not
> > > > > > >         > > > > > > mention this
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > Good catch. I've updated the KIP to call
> this
> > > > out.
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > > "MirrorMaker clusters" I am not sure I
> > fully
> > > > > > > understand the
> > > > > > >         > issue
> > > > > > >         > > > you
> > > > > > >         > > > > > > are trying to solve
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > MirrorMaker today is not scalable from an
> > > > > operational
> > > > > > >         > perspective.
> > > > > > >         > > > > Celia
> > > > > > >         > > > > > > Kung at LinkedIn does a great job of
> > explaining
> > > > > this
> > > > > > > problem [1],
> > > > > > >         > > > which
> > > > > > >         > > > > > has
> > > > > > >         > > > > > > caused LinkedIn to drop MirrorMaker in
> favor
> > of
> > > > > > > Brooklin. With
> > > > > > >         > > > > Brooklin,
> > > > > > >         > > > > > a
> > > > > > >         > > > > > > single cluster, single API, and single UI
> > > > controls
> > > > > > > replication
> > > > > > >         > > flows
> > > > > > >         > > > > for
> > > > > > >         > > > > > an
> > > > > > >         > > > > > > entire data center. With MirrorMaker 2.0,
> the
> > > > > vision
> > > > > > > is much the
> > > > > > >         > > > same.
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > If your data center consists of a small
> > number
> > > of
> > > > > > > Kafka clusters
> > > > > > >         > > and
> > > > > > >         > > > an
> > > > > > >         > > > > > > existing Connect cluster, it might make
> more
> > > > sense
> > > > > to
> > > > > > > re-use the
> > > > > > >         > > > > Connect
> > > > > > >         > > > > > > cluster with MirrorSource/SinkConnectors.
> > > There's
> > > > > > > nothing wrong
> > > > > > >         > > with
> > > > > > >         > > > > this
> > > > > > >         > > > > > > approach for small deployments, but this
> > model
> > > > also
> > > > > > > doesn't
> > > > > > >         > scale.
> > > > > > >         > > > This
> > > > > > >         > > > > > is
> > > > > > >         > > > > > > because Connect clusters are built around a
> > > > single
> > > > > > > Kafka cluster
> > > > > > >         > --
> > > > > > >         > > > > what
> > > > > > >         > > > > > I
> > > > > > >         > > > > > > call the "primary" cluster -- and all
> > > Connectors
> > > > in
> > > > > > > the cluster
> > > > > > >         > > must
> > > > > > >         > > > > > either
> > > > > > >         > > > > > > consume from or produce to this single
> > cluster.
> > > > If
> > > > > > you
> > > > > > > have more
> > > > > > >         > > than
> > > > > > >         > > > > one
> > > > > > >         > > > > > > "active" Kafka cluster in each data center,
> > > > you'll
> > > > > > end
> > > > > > > up needing
> > > > > > >         > > > > > multiple
> > > > > > >         > > > > > > Connect clusters there as well.
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > The problem with Connect clusters for
> > > replication
> > > > > is
> > > > > > > way less
> > > > > > >         > > severe
> > > > > > >         > > > > > > compared to legacy MirrorMaker. Generally
> you
> > > > need
> > > > > > one
> > > > > > > Connect
> > > > > > >         > > > cluster
> > > > > > >         > > > > > per
> > > > > > >         > > > > > > active Kafka cluster. As you point out,
> MM2's
> > > > > > > SinkConnector means
> > > > > > >         > > you
> > > > > > >         > > > > can
> > > > > > >         > > > > > > get away with a single Connect cluster for
> > > > > topologies
> > > > > > > that center
> > > > > > >         > > > > around
> > > > > > >         > > > > > a
> > > > > > >         > > > > > > single primary cluster. But each Connector
> > > within
> > > > > > each
> > > > > > > Connect
> > > > > > >         > > > cluster
> > > > > > >         > > > > > must
> > > > > > >         > > > > > > be configured independently, with no
> > high-level
> > > > > view
> > > > > > > of your
> > > > > > >         > > > > replication
> > > > > > >         > > > > > > flows within and between data centers.
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > With MirrorMaker 2.0, a single MirrorMaker
> > > > cluster
> > > > > > > manages
> > > > > > >         > > > replication
> > > > > > >         > > > > > > across any number of Kafka clusters. Much
> > like
> > > > > > > Brooklin, MM2 does
> > > > > > >         > > the
> > > > > > >         > > > > > work
> > > > > > >         > > > > > > of setting up connectors between clusters
> as
> > > > > needed.
> > > > > > > This
> > > > > > >         > > > > > > Replication-as-a-Service is a huge win for
> > > larger
> > > > > > > deployments, as
> > > > > > >         > > > well
> > > > > > >         > > > > as
> > > > > > >         > > > > > > for organizations that haven't adopted
> > Connect.
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > [1]
> > > > > > >         > > > > > >
> > > > > > >         > > > > >
> > > > > > >         > > > >
> > > > > > >         > > >
> > > > > > >         > >
> > > > > > >         >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.slideshare.net/ConfluentInc/more-data-more-problems-scaling-kafkamirroring-pipelines-at-linkedin
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > Keep the questions coming! Thanks.
> > > > > > >         > > > > > > Ryanne
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > On Thu, Nov 29, 2018 at 3:30 AM Sönke
> Liebau
> > <
> > > > > > >         > > > > [email protected]
> > > > > > >         > > > > > >
> > > > > > >         > > > > > > wrote:
> > > > > > >         > > > > > >
> > > > > > >         > > > > > >> Hi Ryanne,
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> first of all, thanks for the KIP, great
> work
> > > > > overall
> > > > > > > and much
> > > > > > >         > > > needed I
> > > > > > >         > > > > > >> think!
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> I have a small comment on the renaming
> > policy,
> > > > in
> > > > > > one
> > > > > > > of the
> > > > > > >         > mails
> > > > > > >         > > > on
> > > > > > >         > > > > > >> this thread you mention that this can be
> > > > disabled
> > > > > > (to
> > > > > > > replicate
> > > > > > >         > > > topic1
> > > > > > >         > > > > > in
> > > > > > >         > > > > > >> cluster A as topic1 on cluster B I
> assume).
> > > The
> > > > > KIP
> > > > > > > itself does
> > > > > > >         > > not
> > > > > > >         > > > > > mention
> > > > > > >         > > > > > >> this, from reading just the KIP one might
> > get
> > > > the
> > > > > > > assumption
> > > > > > >         > that
> > > > > > >         > > > > > renaming
> > > > > > >         > > > > > >> is mandatory. It might be useful to add a
> > > > sentence
> > > > > > or
> > > > > > > two around
> > > > > > >         > > > > > renaming
> > > > > > >         > > > > > >> policies and what is possible here. I
> assume
> > > you
> > > > > > > intend to make
> > > > > > >         > > > these
> > > > > > >         > > > > > >> pluggable?
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> Regarding the latest addition of
> > "MirrorMaker
> > > > > > > clusters" I am not
> > > > > > >         > > > sure
> > > > > > >         > > > > I
> > > > > > >         > > > > > >> fully understand the issue you are trying
> to
> > > > solve
> > > > > > > and what
> > > > > > >         > > exactly
> > > > > > >         > > > > > these
> > > > > > >         > > > > > >> scripts will do - but that may just me
> being
> > > > dense
> > > > > > > about it :)
> > > > > > >         > > > > > >> I understand the limitation to a single
> > source
> > > > and
> > > > > > > target
> > > > > > >         > cluster
> > > > > > >         > > > that
> > > > > > >         > > > > > >> Connect imposes, but isn't this worked
> > around
> > > by
> > > > > the
> > > > > > > fact that
> > > > > > >         > you
> > > > > > >         > > > > have
> > > > > > >         > > > > > >> MirrorSource- and MirrorSinkConnectors and
> > one
> > > > > part
> > > > > > > of the
> > > > > > >         > > equation
> > > > > > >         > > > > will
> > > > > > >         > > > > > >> always be under your control?
> > > > > > >         > > > > > >> The way I understood your intention was
> that
> > > > there
> > > > > > is
> > > > > > > a
> > > > > > >         > (regular,
> > > > > > >         > > > not
> > > > > > >         > > > > > MM)
> > > > > > >         > > > > > >> Connect Cluster somewhere next to a Kafka
> > > > Cluster
> > > > > A
> > > > > > > and if you
> > > > > > >         > > > deploy
> > > > > > >         > > > > a
> > > > > > >         > > > > > >> MirrorSourceTask to that it will read
> > messages
> > > > > from
> > > > > > a
> > > > > > > remote
> > > > > > >         > > > cluster B
> > > > > > >         > > > > > and
> > > > > > >         > > > > > >> replicate them into the local cluster A.
> If
> > > you
> > > > > > > deploy a
> > > > > > >         > > > > MirrorSinkTask
> > > > > > >         > > > > > it
> > > > > > >         > > > > > >> will read from local cluster A and
> replicate
> > > > into
> > > > > > > cluster B.
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> Since in both causes the configuration for
> > > > > cluster B
> > > > > > > will be
> > > > > > >         > > passed
> > > > > > >         > > > > into
> > > > > > >         > > > > > >> the connector in the ConnectorConfig
> > contained
> > > > in
> > > > > > the
> > > > > > > rest
> > > > > > >         > > request,
> > > > > > >         > > > > > what's
> > > > > > >         > > > > > >> to stop us from starting a third connector
> > > with
> > > > a
> > > > > > >         > MirrorSourceTask
> > > > > > >         > > > > > reading
> > > > > > >         > > > > > >> from cluster C?
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> I am a bit hesitant about the entire
> concept
> > > of
> > > > > > > having extra
> > > > > > >         > > scripts
> > > > > > >         > > > > to
> > > > > > >         > > > > > >> run an entire separate Connect cluster -
> I'd
> > > > much
> > > > > > > prefer an
> > > > > > >         > option
> > > > > > >         > > > to
> > > > > > >         > > > > > use a
> > > > > > >         > > > > > >> regular connect cluster from an ops point
> of
> > > > view.
> > > > > > Is
> > > > > > > it maybe
> > > > > > >         > > worth
> > > > > > >         > > > > > >> spending some time investigating whether
> we
> > > can
> > > > > come
> > > > > > > up with a
> > > > > > >         > > > change
> > > > > > >         > > > > to
> > > > > > >         > > > > > >> connect that enables what MM would need?
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> Best regards,
> > > > > > >         > > > > > >> Sönke
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >> On Tue, Nov 27, 2018 at 10:02 PM Ryanne
> > Dolan
> > > <
> > > > > > >         > > > [email protected]>
> > > > > > >         > > > > > >> wrote:
> > > > > > >         > > > > > >>
> > > > > > >         > > > > > >>> Hey y'all, I'd like you draw your
> attention
> > > to
> > > > a
> > > > > > new
> > > > > > > section in
> > > > > > >         > > > > KIP-382
> > > > > > >         > > > > > >>> re
> > > > > > >         > > > > > >>> MirrorMaker Clusters:
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > >
> > > > > > >         > > > >
> > > > > > >         > > >
> > > > > > >         > >
> > > > > > >         >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382:+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-MirrorMakerClusters
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> A common concern I hear about using
> Connect
> > > for
> > > > > > > replication is
> > > > > > >         > > that
> > > > > > >         > > > > all
> > > > > > >         > > > > > >>> SourceConnectors in a Connect cluster
> must
> > > use
> > > > > the
> > > > > > > same target
> > > > > > >         > > > Kafka
> > > > > > >         > > > > > >>> cluster, and likewise all SinkConnectors
> > must
> > > > use
> > > > > > > the same
> > > > > > >         > source
> > > > > > >         > > > > Kafka
> > > > > > >         > > > > > >>> cluster. In order to use multiple Kafka
> > > > clusters
> > > > > > > from Connect,
> > > > > > >         > > > there
> > > > > > >         > > > > > are
> > > > > > >         > > > > > >>> two possible approaches:
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> 1) use an intermediate Kafka cluster, K.
> > > > > > > SourceConnectors (A,
> > > > > > >         > B,
> > > > > > >         > > C)
> > > > > > >         > > > > > write
> > > > > > >         > > > > > >>> to K and SinkConnectors (X, Y, Z) read
> from
> > > K.
> > > > > This
> > > > > > > enables
> > > > > > >         > flows
> > > > > > >         > > > > like
> > > > > > >         > > > > > A
> > > > > > >         > > > > > >>> ->
> > > > > > >         > > > > > >>> K - > X but means that some topologies
> > > require
> > > > > > > extraneous hops,
> > > > > > >         > > and
> > > > > > >         > > > > > means
> > > > > > >         > > > > > >>> that K must be scaled to handle records
> > from
> > > > all
> > > > > > > sources and
> > > > > > >         > > sinks.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> 2) use multiple Connect clusters, one for
> > > each
> > > > > > > target cluster.
> > > > > > >         > > Each
> > > > > > >         > > > > > >>> cluster
> > > > > > >         > > > > > >>> has multiple SourceConnectors, one for
> each
> > > > > source
> > > > > > > cluster.
> > > > > > >         > This
> > > > > > >         > > > > > enables
> > > > > > >         > > > > > >>> direct replication of A -> X but means
> > there
> > > > is a
> > > > > > > proliferation
> > > > > > >         > > of
> > > > > > >         > > > > > >>> Connect
> > > > > > >         > > > > > >>> clusters, each of which must be managed
> > > > > separately.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> Both options are viable for small
> > deployments
> > > > > > > involving a small
> > > > > > >         > > > > number
> > > > > > >         > > > > > of
> > > > > > >         > > > > > >>> Kafka clusters in a small number of data
> > > > centers.
> > > > > > > However,
> > > > > > >         > > neither
> > > > > > >         > > > is
> > > > > > >         > > > > > >>> scalable, especially from an operational
> > > > > > standpoint.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> KIP-382 now introduces "MirrorMaker
> > > clusters",
> > > > > > which
> > > > > > > are
> > > > > > >         > distinct
> > > > > > >         > > > > from
> > > > > > >         > > > > > >>> Connect clusters. A single MirrorMaker
> > > cluster
> > > > > > > provides
> > > > > > >         > > > > > >>> "Replication-as-a-Service" among any
> number
> > > of
> > > > > > Kafka
> > > > > > > clusters
> > > > > > >         > > via a
> > > > > > >         > > > > > >>> high-level REST API based on the Connect
> > API.
> > > > > Under
> > > > > > > the hood,
> > > > > > >         > > > > > MirrorMaker
> > > > > > >         > > > > > >>> sets up Connectors between each pair of
> > Kafka
> > > > > > > clusters. The
> > > > > > >         > REST
> > > > > > >         > > > API
> > > > > > >         > > > > > >>> enables on-the-fly reconfiguration of
> each
> > > > > > > Connector, including
> > > > > > >         > > > > updates
> > > > > > >         > > > > > >>> to
> > > > > > >         > > > > > >>> topic whitelists/blacklists.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> To configure MirrorMaker 2.0, you need a
> > > > > > > configuration file
> > > > > > >         > that
> > > > > > >         > > > > lists
> > > > > > >         > > > > > >>> connection information for each Kafka
> > cluster
> > > > > > > (broker lists,
> > > > > > >         > SSL
> > > > > > >         > > > > > settings
> > > > > > >         > > > > > >>> etc). At a minimum, this looks like:
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> clusters=us-west, us-east
> > > > > > >         > > > > > >>>
> > > > > > cluster.us-west.broker.list=us-west-kafka-server:9092
> > > > > > >         > > > > > >>>
> > > > > > cluster.us-east.broker.list=us-east-kafka-server:9092
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> You can specify topic whitelists and
> other
> > > > > > > connector-level
> > > > > > >         > > settings
> > > > > > >         > > > > > here
> > > > > > >         > > > > > >>> too, or you can use the REST API to
> > > > > remote-control
> > > > > > a
> > > > > > > running
> > > > > > >         > > > cluster.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> I've also updated the KIP with minor
> > changes
> > > to
> > > > > > > bring it in
> > > > > > >         > line
> > > > > > >         > > > with
> > > > > > >         > > > > > the
> > > > > > >         > > > > > >>> current implementation.
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> Looking forward to your feedback, thanks!
> > > > > > >         > > > > > >>> Ryanne
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> On Mon, Nov 19, 2018 at 10:26 PM Ryanne
> > > Dolan <
> > > > > > >         > > > [email protected]
> > > > > > >         > > > > >
> > > > > > >         > > > > > >>> wrote:
> > > > > > >         > > > > > >>>
> > > > > > >         > > > > > >>> > Dan, you've got it right. ACL sync will
> > be
> > > > done
> > > > > > by
> > > > > > > MM2
> > > > > > >         > > > > automatically
> > > > > > >         > > > > > >>> > (unless disabled) according to simple
> > > rules:
> > > > > > >         > > > > > >>> >
> > > > > > >         > > > > > >>> > - If a principal has READ access on a
> > topic
> > > > in
> > > > > a
> > > > > > > source
> > > > > > >         > > cluster,
> > > > > > >         > > > > the
> > > > > > >         > > > > > >>> same
> > > > > > >         > > > > > >>> > principal should have READ access on
> > > > downstream
> > > > > > > replicated
> > > > > > >         > > topics
> > > > > > >         > > > > > >>> ("remote
> > > > > > >         > > > > > >>> > topics").
> > > > > > >         > > > > > >>> > - Only MM2 has WRITE access on "remote
> > > > topics".
> > > > > > >         > > > > > >>> >
> > > > > > >         > > > > > >>> > This covers sync from upstream topics
> > like
> > > > > > > "topic1" to
> > > > > > >         > > downstream
> > > > > > >         >
>

Re: [DISCUSS] KIP-382: MirrorMaker 2.0

Reply via email to