Hello again, the reviewers and I want to point out that PR-6295
includes the minor API change discussed in KIP-416, a new parameter in
SourceTask.commitRecord(), since KIP-382 depends on it. Consensus is to not
deprecate anything or alter any existing semantics. I've updated both KIPs
to refle
Hey y'all, I'm happy to announce that the PR for "MirrorMaker 2.0" is ready
for review, after a long spell in "draft".
https://github.com/apache/kafka/pull/6295
MirrorMaker 2.0 is in the Kafka 2.3.0 release plan. Please take a look so
we can get this merged.
Also, shameless plug: I'm giving a ta
Pippin, thanks for your interest. I will publish a PR soon (several
days?) which you'll be able to build and play with. Watch this space
:)
Ryanne
On Thu, Jan 24, 2019 at 5:19 PM Pippin Wallace wrote:
>
> I see that the Current state of KIP-382 recently changed from Voting to
> Accepted on Conf
I see that the Current state of KIP-382 recently changed from Voting to
Accepted on Confluence page
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
I am just looking for a best guess as to when this might make it into an alpha,
beta, or GA release?
Regards,
Pippin
Thanks Dong.
> 1. Currently if there is a topic created with "." in the topic name, would
it cause a correctness issue for this KIP?
Yes, RemoteClusterUtils would be confused by existing topics that have a
period, and MM2 might try to send records to existing topics if they happen
to be prefixed with
Hey Ryanne,
Sorry I am late here. Thanks much for all the work! After reading through
the latest KIP and all the previous discussion, I have some questions below:
1. Currently if there is a topic created with "." in the topic name, would it
cause a correctness issue for this KIP? For example, will co
sal makes a lot of sense
> > > > > > > > > > to me. Unifying Kafka clusters in different locations has been a very
> > > > > > > > > > important scenario. Right now open source community does not have a
> > > > > > > > > > standard solutio
> > > > > > > offset.
> > > > > > > > Does the connector need to keep a mapping between each source offset to
> > > > > > > > destination offset? If so how would that be done?
> complicates your consumer that expects a single topic to be compacted
> > > > > > > > across all clusters, since log-compaction does not span topics.
> > > > > > > >
> > > > > > > > Notice
> make this trivial!
> > > > > > >
> > > > > > > With (2) you still get the nice DR semantics. The KTable will represent the
> > > > get
> > > > > > a foo.aggregate topic automatically. Personally, I think Streams is better
> > > > > > suited than Connect for this, but I won't rule it out for a future KIP.
> > >
ccounts are distributed across area based on a number of factors
> > > > > >
> > > > > > Account A + B are in Ireland
> > > > > > Account C + D are in Germany
> > > > > > Account E are in UK
> > > > > >
account_state
> > > > > Great!
> > > > >
> > > > > Now imagine we take a planned or unplanned outage in the UK.
> > > > > We move/shift the transactional processing of account E to Ireland.
> > > > >
> > > > > Now we end up
using
> > > > *.account_state
> > > >
> > > > It will get both current state and also outdated state for E, which state
> > > > would it use? And this gets worse as it scales and you move the processing
> > > > of
Sönke, I can probably get a KIP together in the next several weeks, but
you're welcome to beat me to it :)
Ryanne
On Fri, Dec 21, 2018, 3:59 AM Sönke Liebau wrote:
> Hi Ryanne,
>
> just to briefly check in, am I understanding your mail correctly, that
> you want to pick up the "multi-cluster/herder/worke
Hi Ryanne,
just to briefly check in, am I understanding your mail correctly, that
you want to pick up the "multi-cluster/herder/worker features" in a
different KIP at some time? If yes, please feel free to let me know if
I can provide any help on that front. Otherwise, I am also happy to
draft a p
Jun, let's leave the REST API out of the KIP then.
I have been arguing that Connect wouldn't benefit from the
multi-cluster/herder/worker features we need in MM2, and that the effort
would result in a needlessly complex Connect REST API. But certainly two
separate APIs is inherently more complex t
Hi, Ryanne,
Hmm, there are still quite a few MM2-specific REST APIs. Overall, I am
still not sure that having a separate dedicated MM2 cluster is better. From
an operational perspective, if someone is already running a Connect cluster
for other connectors, it seems that it's better to just run MM2
>
> > > > > Now imagine we take a planned or unplanned outage in the UK.
> > > > > We move/shift the transactional processing of account E to Ireland.
> > > > >
> > > > > Now we end up with
> > > > > Z.account_
state and also outdated state for E, which state
> > > > would it use? And this gets worse as it scales and you move the processing
> > > > of accounts around over time.
> > > >
> > > >
> > > > Likewise the issue is the same w
r the state changes? As order guarantee can be given within a single
> > > topic partition only. You only care for order by the account (not for
> > > offsets being same, we simply care that updates are in order and latest
> > > state is at head on restart we ignor
Ryanne, thank you, this looks great and will be really appreciated by the
community.
My only comment at this point: is the REST API strictly necessary for this
KIP? Perhaps consider moving that to a different KIP since the main
contribution is more than sufficient. However that is just a suggestio
> So, if we want to add it, it seems it would be useful to do it in a
backward compatible way in the connect framework, rather than sth specific
to MM
Jun, that sgtm. The MirrorMaker driver I have right now creates multiple
Herders (for multiple Kafka clusters) and exposes them through a high-leve
Hi, Sonke, Ryanne,
Thanks for the explanation. To me, the single connect cluster model could
be useful for any connector, not just MM. So, if we want to add it, it
seems it would be useful to do it in a backward compatible way in the
connect framework, rather than sth specific to MM. I am not sure
Thanks Sönke, you're spot-on. I don't want MM2 to wait for Connect features
that don't exist yet, especially if MM2 is the primary use case for them.
Moreover, I think MM2 can drive and inform some of these features, which
only makes sense if we adopt MM2 first.
Ryanne
On Fri, Dec 14, 2018, 9:03
Hi Jun,
I believe Ryanne's idea is to run multiple workers per MM cluster-node, one
per target cluster. So in essence you'd specify three clusters in the MM
config and MM would then instantiate one worker per cluster. Every MM
connector would then be deployed to the appropriate (internal) worker t
Hi, Ryanne,
Regarding the single connect cluster model, yes, the co-existence of a MM2
REST API and the nearly identical Connect API is one of my concerns.
Implementation wise, my understanding is that the producer URL in a
SourceTask is always obtained from the connect worker's configuration. So,
B1, A3, B2, E1, E2 and in cluster Y you
> > see B1, B2, A1, E1, A2, A3, E2 as the ordering of the updates by account is
> > preserved.
> >
> > With the topic solution you're suggesting we would have no true way of
> > replaying and re-constituting the order between X.acc
_state and
> Z.account_state topics in the case of E, as messages will be in different
> topics and partitions.
>
>
>
>
>
>
> -Original Message-
> From: Ryanne Dolan
> Sent: Wednesday, December 12, 2018 4:37 PM
> To: dev@kafka.apache.org
> Subject: Re:
4:37 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
> You haven’t described how one would handle the ordering issues and also
the compaction issues where transactional processing is master-master in
regions, where the processing is sticky to region but of failure
> into MM2.
>
>
>
> -Original Message-----
> From: Ryanne Dolan
> Sent: Wednesday, December 12, 2018 6:41 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
>
> > One based on hops using headers, and another based on topic nami
> Wikimedia currently implements 'master <-> master' replication by
manually prefixing topics with datacenter names, and then configuring
MirrorMaker to only replicate topics that begin with a DC name to another.
Andrew, this is a common approach and solves some of the problems I've
mentioned, but
you keep compatibility of the handler api interface in MM into
MM2.
-Original Message-
From: Ryanne Dolan
Sent: Wednesday, December 12, 2018 6:41 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-382: MirrorMaker 2.0
> One based on hops using headers, and another based on to
T1 and C43F4SA
> could be processing in germany region currently and C43F2SA could be in uk
> region currently.
>
>
>
> Sent from my Samsung Galaxy smartphone.
> -------- Original message From: Andrew Otto
> Date: 11/12/2018 14:28 (GMT+00:00) To: dev@kafka.apache.o
: dev@kafka.apache.org Subject: Re: [DISCUSS]
KIP-382: MirrorMaker 2.0
Wikimedia currently implements 'master <-> master' replication by manually
prefixing topics with datacenter names, and then configuring MirrorMaker to
only replicate topics that begin with a DC name to another.
Hi Ryanne,
We had an IM exchange about KIP-382 and Mirus a few weeks back, but I also
want to post here to publicly express my support. I'm the primary developer
of Mirus, which is a Kafka Connect based replication tool we wrote at
Salesforce to replace Mirror Maker internally. We open-sourced Mir
Wikimedia currently implements 'master <-> master' replication by manually
prefixing topics with datacenter names, and then configuring MirrorMaker to
only replicate topics that begin with a DC name to another.
While having topics named with topological details is manageable, I
wouldn't say it is
So this is indeed what using headers with hops avoids: creating lots and lots
of topics __, so you can have more complex topology setups.
I ask, why not support two ways of setting this up instead of closing the door?
One based on hops using headers, and another based on topic naming. After all
fle
Hey Ryanne,
Thanks much for the KIP!
Though I don't have time to review this KIP in detail at this stage, I
think this KIP will be very useful to Apache Kafka users (particularly
global enterprise users) who need geo replication capability. Currently
Kafka users have to set up and manage MM cluste
Jun, thanks for your time reviewing the KIP.
> In a MirrorSourceConnector, it seems that the offsets of the source will
be stored in a different cluster from the target cluster?
Jan Filipiak raised this issue as well, and suggested that no state be
tracked in the source cluster. I've since implem
Hi, Ryanne,
Thanks for the KIP. At the high level, this looks like a reasonable
proposal. A few comments below.
1. About using a single connector cluster to manage connectors accessing
multiple Kafka clusters. It's good that you brought this up. The following
are the tradeoffs that I see. The be
Michael, thanks for the comments!
> would like to see support for this to be done by hops, as well [...]
This then allows ring (hops = number of brokers in the ring), mesh (every
cluster interconnected so hop=1), or even a tree (more fine grained setup)
cluster topology.
That's a good idea, thou
Re hops to stop the cycle and to allow a range of multi-cluster topologies, see
https://www.rabbitmq.com/federated-exchanges.html where something very similar was done
in RabbitMQ.
On 12/7/18, 12:47 AM, "Michael Pearce" wrote:
Nice proposal.
Some comments.
On the section around cycle det
Nice proposal.
Some comments.
On the section around cycle detection.
I would like to see support for this to be done by hops as well, e.g. one
approach is to use a header for the number of hops: as the mm2 replicates, it
increases the hop count and you can make the mm2 configurable to only p
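A minimal sketch of the hop-count idea above, in Python for brevity. It is not MM2 code: the header name "mm2-hops", the function name, and the rule that a record is dropped once the count would exceed a configured limit are all assumptions made for illustration, since the message is cut off before it says what the configuration does.

```python
from typing import Dict, Optional

# Hypothetical header name; nothing in the thread fixes one.
HOPS_HEADER = "mm2-hops"


def next_headers(headers: Dict[str, bytes], max_hops: int) -> Optional[Dict[str, bytes]]:
    """Return headers for the replicated record, or None if it should be dropped.

    Each replication step increments the hop count. A record whose new count
    would exceed max_hops is not forwarded (assumed policy).
    """
    hops = int(headers.get(HOPS_HEADER, b"0").decode()) + 1
    if hops > max_hops:
        return None
    forwarded = dict(headers)  # copy so the source record is left untouched
    forwarded[HOPS_HEADER] = str(hops).encode()
    return forwarded
```

Under these assumptions a limit of 1 stops a record after one replication step, and a larger limit lets it travel further before it is dropped.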
Sönke,
> The only thing that I could come up with is the limitation to a single
offset commit interval
Yes, and other internal properties, e.g. those used by the internal
consumers and producers, which, granted, probably are not often changed
from their defaults, but that apply to Connectors acro
Hi Ryanne,
when you say "Currently worker configs apply across the entire cluster,
which is limiting even for use-cases involving a single Kafka cluster.",
may I ask you to elaborate on those limitations a little?
The only thing that I could come up with is the limitation to a single
offset commit
Sönke,
I think so long as we can keep the differences at a very high level (i.e.
the "control plane"), there is little downside to MM2 and Connect
coexisting. I do expect them to converge to some extent, with features from
MM2 being pulled into Connect whenever this is possible without breaking
th
Hi Ryanne,
thanks for your response!
It seems like you have already done a lot of investigation into the
existing code and the solution design and all of what you write makes sense
to me. Would it potentially be worth adding this to the KIP, now that you
had to write it up because of me anyway?
Thanks Sönke.
> it just feels to me like an awful lot of Connect functionality would need
to be reimplemented or at least wrapped
Connect currently has two drivers, ConnectDistributed and
ConnectStandalone. Both set up a Herder, which manages Workers. I've
implemented a third driver which sets up
Hi Ryanne,
thanks. I missed the remote to remote replication scenario in my train of
thought, you are right.
That being said I have to admit that I am not yet fully on board with the
concept, sorry. But I might just be misunderstanding what your intention
is. Let me try and explain what I think i
Sönke, thanks for the feedback!
> the renaming policy [...] can be disabled [...] The KIP itself does not
mention this
Good catch. I've updated the KIP to call this out.
> "MirrorMaker clusters" I am not sure I fully understand the issue you are
trying to solve
MirrorMaker today is not scalabl
Hi Ryanne,
first of all, thanks for the KIP, great work overall and much needed I
think!
I have a small comment on the renaming policy, in one of the mails on this
thread you mention that this can be disabled (to replicate topic1 in
cluster A as topic1 on cluster B I assume). The KIP itself does
Hey y'all, I'd like to draw your attention to a new section in KIP-382 re
MirrorMaker Clusters:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382:+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-MirrorMakerClusters
A common concern I hear about using Connect for replication is that all
SourceConn
Dan, you've got it right. ACL sync will be done by MM2 automatically
(unless disabled) according to simple rules:
- If a principal has READ access on a topic in a source cluster, the same
principal should have READ access on downstream replicated topics ("remote
topics").
- Only MM2 has WRITE acce
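To make the first rule concrete, here is a rough Python sketch, not the MM2 implementation. The TopicAcl tuple, the function name, and the "source.topic" naming of remote topics are illustrative assumptions. Only the READ rule is modelled; every other operation is simply not propagated, which is a guess, because the second rule is cut off in this message.

```python
from typing import List, NamedTuple


class TopicAcl(NamedTuple):
    """Hypothetical, simplified ACL entry used only for this sketch."""
    principal: str
    operation: str
    topic: str


def downstream_acls(source_acls: List[TopicAcl], source_alias: str) -> List[TopicAcl]:
    """Derive ACLs for remote topics from ACLs on the source cluster.

    READ on a source topic becomes READ on the prefixed remote topic.
    All other operations (including WRITE) are not propagated (assumption).
    """
    return [
        TopicAcl(acl.principal, "READ", f"{source_alias}.{acl.topic}")
        for acl in source_acls
        if acl.operation == "READ"
    ]
```

For example, a READ grant for a principal on "orders" in a cluster aliased "us-east" would yield a READ grant for the same principal on "us-east.orders" downstream.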
Hi guys,
This is an exciting topic. Could I have a word here?
I understand there are many scenarios where we can apply MirrorMaker. I am at
the moment working on an active/active DC solution using MirrorMaker; our goal is
to allow the clients to fail over and connect to the other Kafka cluster (on the
Alex, thanks for the feedback.
> Would it be possible to utilize the
> Message Headers feature to prevent infinite recursion
This isn't necessary due to the topic renaming feature which already
prevents infinite recursion.
If you turn off topic renaming you lose cycle detection, so maybe we coul
Hey Ryanne,
Awesome KIP, excited to see improvements in MirrorMaker land, I particularly
like the reuse of Connect framework! Would it be possible to utilize the
Message Headers feature to prevent infinite recursion? For example, MM2
could stamp every message with a special header payload (e.g.
MM2
Thanks Harsha. Done.
On Fri, Oct 19, 2018 at 1:03 AM Harsha Chintalapani wrote:
> Ryanne,
>Makes sense. Can you please add this under rejected alternatives so
> that everyone has context on why it wasn’t picked.
>
> Thanks,
> Harsha
> On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan ,
> wro
Ryanne,
Makes sense. Can you please add this under rejected alternatives so that
everyone has context on why it wasn’t picked.
Thanks,
Harsha
On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan wrote:
> Harsha, concerning uReplicator specifically, the project is a major
> inspiration for MM2,
Harsha, concerning uReplicator specifically, the project is a major
inspiration for MM2, but I don't think it is a good foundation for anything
included in Apache Kafka. uReplicator uses Helix to solve problems that
Connect also solves, e.g. REST API, live configuration changes, cluster
management,
Jan, thanks for the share. Also similar are Pulsar's concepts of namespaces
and global topics. I don't think these need to be supported in Kafka
itself, but there are many benefits to adopting naming conventions along
these lines, esp for tooling, dashboards etc.
> use it to copy my messages from
then I just hope that in the midst of all these new features I can still
at least use it to copy my messages from A to B later.
Another hint you should be aware of:
https://cwiki.apache.org/confluence/display/KAFKA/Hierarchical+Topics
That was always a design I admired, with active / active re
Harsha, yes I can do that. I'll update the KIP accordingly, thanks.
Ryanne
On Wed, Oct 17, 2018 at 12:18 PM Harsha wrote:
> Hi Ryanne,
>Thanks for the KIP. I am also curious about why not use the
> uReplicator design as the foundation given it already resolves some of the
> fun
Hi Ryanne,
Thanks for the KIP. I am also curious about why not use the
uReplicator design as the foundation given it already resolves some of the
fundamental issues in current MirrorMaker, updating the configs on the fly and
running the mirror maker agents in a worker model whic
Jan, these are two separate issues.
1) consumer coordination should not, ideally, involve unreliable or slow
connections. Naively, a KafkaSourceConnector would coordinate via the
source cluster. We can do better than this, but I'm deferring this
optimization for now.
2) exactly-once between two c
This is not a performance optimisation. It's a fundamental design choice.
I never really took a look at how Streams does exactly-once (it's a trap
anyway, and you can usually deal with at-least-once downstream pretty
easily). But I am very certain it's not gonna get anywhere if offset
commit and rec
> Oh - got it, it checks the entire prefix, which seems obvious to me in
retrospect :)
Rhys, I've changed the wording to make this more clear, thanks for calling
it out.
Ryanne
On Tue, Oct 16, 2018 at 4:16 PM McCaig, Rhys
wrote:
>
> > In your example, us-west.us-east.us-central.us-west.topic i
> In your example, us-west.us-east.us-central.us-west.topic is an invalid
> "remote topic" name because us-west appears twice. MM2 will not replicate
> us-east.us-central.us-west.topic into us-west a second time, because the
> source topic already has us-west in the prefix. This is what I mean by
> Could you comment on the approach of
> your method vs. using other open source tools like Uber's uReplicator or
> the recently open-sourced Mirus from Salesforce?
Eno, a primary differentiator is that KIP-382 is "opinionated" about how
replication should be done, e.g. by applying topic renaming
> But one big obstacle in this was
always that group coordination happened on the source cluster.
Jan, thank you for bringing up this issue with legacy MirrorMaker. I
totally agree with you. This is one of several problems with MirrorMaker I
intend to solve in MM2, and I already have a design and
no worries,
glad i could clarify
On 16.10.2018 15:14, Andrew Otto wrote:
> O ok apologies. Interesting!
>
> On Tue, Oct 16, 2018 at 9:06 AM Jan Filipiak
> wrote:
>
>> Hi Andrew,
>>
>> thanks for your message, you missed my point.
>>
>> Mirrormaker collocation with target is for sure correct.
O ok apologies. Interesting!
On Tue, Oct 16, 2018 at 9:06 AM Jan Filipiak
wrote:
> Hi Andrew,
>
> thanks for your message, you missed my point.
>
> Mirrormaker collocation with target is for sure correct.
> But then group coordination happens across WAN which is unnecessary.
> And I request
Hi Andrew,
thanks for your message, you missed my point.
Mirrormaker collocation with target is for sure correct.
But then group coordination happens across WAN which is unnecessary.
And I request that this be thought about again.
I made a PR back then for zk Consumer to allow having 2 zookeeper
conne
> I would generally say a LAN is better than a WAN for doing group
> coordination
For sure, but a LAN is better than a WAN for producing messages too. If
there is network congestion during network production, messages will be
dropped. With MirrorMaker currently, you can either skip these dropped
Hi,
Currently MirrorMaker is usually run collocated with the target cluster.
This is all nice and good. But one big obstacle in this was
always that group coordination happened on the source cluster. So when
the network was congested, you sometimes lose group membership and
have to rebalance
This update is much needed, thank you! Could you comment on the approach of
your method vs. using other open source tools like Uber's uReplicator or
the recently open-sourced Mirus from Salesforce? (
https://engineering.salesforce.com/open-sourcing-mirus-3ec2c8a38537). I
strongly believe Mirrormake
Rhys, thanks for your enthusiasm!
In your example, us-west.us-east.us-central.us-west.topic is an invalid
"remote topic" name because us-west appears twice. MM2 will not replicate
us-east.us-central.us-west.topic into us-west a second time, because the
source topic already has us-west in the prefi
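The prefix check described above can be sketched roughly as follows. This is Python for illustration only, not the MM2 code: the function names and the idea of recognising cluster names from a known set of aliases are assumptions. As noted elsewhere in this thread, ordinary topics that contain a period could confuse this kind of parsing.

```python
from typing import List, Set


def source_prefix(topic: str, cluster_aliases: Set[str]) -> List[str]:
    """Return the leading cluster aliases of a (possibly remote) topic name.

    The last dot-separated segment is always treated as part of the topic
    name itself, never as a cluster alias.
    """
    prefix = []
    for part in topic.split(".")[:-1]:
        if part not in cluster_aliases:
            break
        prefix.append(part)
    return prefix


def is_valid_remote_topic(topic: str, cluster_aliases: Set[str]) -> bool:
    """A remote topic name is invalid if any cluster appears twice in its prefix."""
    prefix = source_prefix(topic, cluster_aliases)
    return len(prefix) == len(set(prefix))


def should_replicate(topic: str, target: str, cluster_aliases: Set[str]) -> bool:
    """Do not replicate into a cluster that already appears in the topic's prefix."""
    return target not in source_prefix(topic, cluster_aliases)
```

With aliases us-west, us-east and us-central, us-east.us-central.us-west.topic is not replicated into us-west, so the name with us-west appearing twice is never created.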
Hi Ryanne,
This KIP is fantastic. It provides a great vision for how MirrorMaker should
evolve in the Kafka project.
I have a question on cycle detection - In a scenario where I have 3 clusters
replicating between each other, it seems it may be easy to misconfigure the
connectors if auto topic
Hey y'all!
Please take a look at KIP-382:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
Thanks for your feedback and support.
Ryanne