Hi Ryanne,

first of all, thanks for the KIP, great work overall and much needed, I think!
I have a small comment on the renaming policy. In one of the mails on this thread you mention that this can be disabled (to replicate topic1 on cluster A as topic1 on cluster B, I assume). The KIP itself does not mention this; from reading just the KIP one might get the impression that renaming is mandatory. It might be useful to add a sentence or two around renaming policies and what is possible here. I assume you intend to make these pluggable?

Regarding the latest addition of "MirrorMaker clusters", I am not sure I fully understand the issue you are trying to solve and what exactly these scripts will do - but that may just be me being dense about it :) I understand the limitation to a single source and target cluster that Connect imposes, but isn't this worked around by the fact that you have MirrorSource- and MirrorSinkConnectors, so one side of the equation will always be under your control? The way I understood your intention was that there is a (regular, not MM) Connect cluster somewhere next to a Kafka cluster A: if you deploy a MirrorSourceTask to it, it will read messages from a remote cluster B and replicate them into the local cluster A; if you deploy a MirrorSinkTask, it will read from the local cluster A and replicate into cluster B. Since in both cases the configuration for cluster B will be passed into the connector in the ConnectorConfig contained in the REST request, what's to stop us from starting a third connector with a MirrorSourceTask reading from cluster C?

I am a bit hesitant about the entire concept of having extra scripts to run an entire separate Connect cluster - I'd much prefer an option to use a regular Connect cluster from an ops point of view. Is it maybe worth spending some time investigating whether we can come up with a change to Connect that enables what MM would need?
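To make the scenario concrete, this is roughly the kind of connector config I had in mind POSTing to a regular Connect cluster's REST API (the connector class and property names here are my guesses based on the KIP, not confirmed configuration keys):

```json
{
  "name": "mirror-from-cluster-c",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "C",
    "source.cluster.bootstrap.servers": "cluster-c-kafka:9092",
    "topics": "topic1,topic2"
  }
}
```

Nothing in the Connect REST API itself seems to prevent adding a third connector like this alongside the first two, which is why I am unsure what the separate MirrorMaker cluster scripts add.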
Best regards,
Sönke

On Tue, Nov 27, 2018 at 10:02 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Hey y'all, I'd like to draw your attention to a new section in KIP-382 re
> MirrorMaker Clusters:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382:+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-MirrorMakerClusters
>
> A common concern I hear about using Connect for replication is that all
> SourceConnectors in a Connect cluster must use the same target Kafka
> cluster, and likewise all SinkConnectors must use the same source Kafka
> cluster. In order to use multiple Kafka clusters from Connect, there are
> two possible approaches:
>
> 1) use an intermediate Kafka cluster, K. SourceConnectors (A, B, C) write
> to K and SinkConnectors (X, Y, Z) read from K. This enables flows like
> A -> K -> X, but means that some topologies require extraneous hops, and
> means that K must be scaled to handle records from all sources and sinks.
>
> 2) use multiple Connect clusters, one for each target cluster. Each
> cluster has multiple SourceConnectors, one for each source cluster. This
> enables direct replication of A -> X, but means there is a proliferation
> of Connect clusters, each of which must be managed separately.
>
> Both options are viable for small deployments involving a small number of
> Kafka clusters in a small number of data centers. However, neither is
> scalable, especially from an operational standpoint.
>
> KIP-382 now introduces "MirrorMaker clusters", which are distinct from
> Connect clusters. A single MirrorMaker cluster provides
> "Replication-as-a-Service" among any number of Kafka clusters via a
> high-level REST API based on the Connect API. Under the hood, MirrorMaker
> sets up Connectors between each pair of Kafka clusters. The REST API
> enables on-the-fly reconfiguration of each Connector, including updates
> to topic whitelists/blacklists.
> To configure MirrorMaker 2.0, you need a configuration file that lists
> connection information for each Kafka cluster (broker lists, SSL settings
> etc). At a minimum, this looks like:
>
> clusters=us-west, us-east
> cluster.us-west.broker.list=us-west-kafka-server:9092
> cluster.us-east.broker.list=us-east-kafka-server:9092
>
> You can specify topic whitelists and other connector-level settings here
> too, or you can use the REST API to remote-control a running cluster.
>
> I've also updated the KIP with minor changes to bring it in line with the
> current implementation.
>
> Looking forward to your feedback, thanks!
> Ryanne
>
> On Mon, Nov 19, 2018 at 10:26 PM Ryanne Dolan <ryannedo...@gmail.com>
> wrote:
>
> > Dan, you've got it right. ACL sync will be done by MM2 automatically
> > (unless disabled) according to simple rules:
> >
> > - If a principal has READ access on a topic in a source cluster, the
> > same principal should have READ access on downstream replicated topics
> > ("remote topics").
> > - Only MM2 has WRITE access on "remote topics".
> >
> > This covers sync from upstream topics like "topic1" to downstream
> > remote topics like "us-west.topic1". What's missing from the KIP, as
> > you point out, is ACL sync between normal topics (non-remote). If a
> > consumer has READ access to topic1 in an upstream cluster, should it
> > have READ access to topic1 in a downstream cluster?
> >
> > I think the answer generally is no, you don't want to give principals
> > blanket permissions across all DCs automatically. For example, I've
> > seen scenarios where certain topics are replicated between an internal
> > and an external Kafka cluster. You don't want to accidentally push ACL
> > changes across this boundary.
> >
> > Moreover, it's clear that MM2 "owns" downstream remote topics like
> > "us-west.topic1" -- MM2 is the only producer and the only admin of
> > these topics -- so it's natural to have MM2 set the ACL for these
> > topics.
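[The two ACL sync rules quoted above can be sketched as a small pure function. This is a minimal illustration only, assuming ACLs are modeled as (principal, operation, topic) tuples; the MM2 principal name and the "alias.topic" rename are taken from the examples in the thread, not from a confirmed implementation.]

```python
def sync_acls(upstream_acls, source_alias, mm2_principal="User:mm2"):
    """Derive downstream ACLs for remote topics, per the rules above:
    READ on an upstream topic carries over to the renamed remote topic,
    and only MM2 itself gets WRITE on remote topics."""
    def remote(topic):
        # remote topics are prefixed with the source cluster alias
        return f"{source_alias}.{topic}"

    downstream = [
        (principal, "READ", remote(topic))
        for principal, operation, topic in upstream_acls
        if operation == "READ"
    ]
    # MM2 is the sole producer of each remote topic.
    for topic in sorted({t for _, _, t in upstream_acls}):
        downstream.append((mm2_principal, "WRITE", remote(topic)))
    return downstream
```

Note how an upstream WRITE grant is deliberately not propagated: a producer principal on the source cluster gets no access at all on the remote topic downstream.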
> > But I think it would be surprising if MM2 tried to manipulate topics
> > it doesn't own. So I think granting permissions across DCs is probably
> > outside MM2's purview, but I agree it'd be nice to have tooling to
> > help with this.
> >
> > Thanks.
> > Ryanne
> >
> > --
> > www.ryannedolan.info
> >
> >
> > On Mon, Nov 19, 2018 at 3:58 PM daniel.loci...@gmail.com <
> > daniel.loci...@gmail.com> wrote:
> >
> >> Hi guys,
> >>
> >> This is an exciting topic. Could I have a word here?
> >> I understand there are many scenarios where we can apply MirrorMaker.
> >> I am at the moment working on an active/active DC solution using
> >> MirrorMaker; our goal is to allow the clients to fail over and connect
> >> to the other Kafka cluster (on the other DC) when an incident happens.
> >>
> >> To do this, I need:
> >> 1. MirrorMaker to replicate the partitioned messages in sequential
> >> order (in a timely fashion) to the same partition on the other
> >> cluster (and also to keep the promise that both clusters create the
> >> same number of partitions for a topic) - so that a consumer can pick
> >> up the latest messages in the right order.
> >> 2. MirrorMaker to replicate the local consumer offsets to the other
> >> side - so that the consumer knows where the offsets / latest messages
> >> are.
> >> 3. MirrorMaker to provide cycle detection for messages across the DCs.
> >>
> >> I can see the possibility for remote topics to solve all these
> >> problems, as long as the consumer can see the remote topic equally as
> >> the local topic, i.e. for a consumer which has permission to consume
> >> topic1, on a subscribe event it can automatically subscribe to both
> >> remote.topic1 and local.topic1. First we need to find a way to grant
> >> topic ACLs for the consumer across the DCs. Secondly, the consumer
> >> needs to be able to subscribe to topics with a wildcard or suffix.
> >> Last but not least, the consumer has to deal with the timely ordering
> >> of the messages from the two topics.
> >>
> >> My understanding is, all of these should be configurable to be turned
> >> on or off, to fit different use cases.
> >>
> >> Interestingly, I was going to support topic messages with extra
> >> headers of source DC info, for cycle detection...
> >>
> >> Looking forward to your reply.
> >>
> >> Regards,
> >>
> >> Dan
> >>
> >> On 2018/10/23 19:56:02, Ryanne Dolan <ryannedo...@gmail.com> wrote:
> >> > Alex, thanks for the feedback.
> >> >
> >> > > Would it be possible to utilize the
> >> > > Message Headers feature to prevent infinite recursion
> >> >
> >> > This isn't necessary due to the topic renaming feature, which
> >> > already prevents infinite recursion.
> >> >
> >> > If you turn off topic renaming you lose cycle detection, so maybe
> >> > we could provide message headers as an optional second mechanism.
> >> > I'm not opposed to that idea, but there are ways to improve
> >> > efficiency if we don't need to modify or inspect individual records.
> >> >
> >> > Ryanne
> >> >
> >> > On Tue, Oct 23, 2018 at 6:06 AM Alex Mironov <alexandr...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hey Ryanne,
> >> > >
> >> > > Awesome KIP, excited to see improvements in MirrorMaker land; I
> >> > > particularly like the reuse of the Connect framework! Would it be
> >> > > possible to utilize the Message Headers feature to prevent
> >> > > infinite recursion? For example, MM2 could stamp every message
> >> > > with a special header payload (e.g. MM2="cluster-name-foo") so in
> >> > > case another MM2 instance sees this message and it is configured
> >> > > to replicate data into "cluster-name-foo" it would just skip it
> >> > > instead of replicating it back.
> >> > >
> >> > > On Sat, Oct 20, 2018 at 5:48 AM Ryanne Dolan
> >> > > <ryannedo...@gmail.com> wrote:
> >> > >
> >> > > > Thanks Harsha. Done.
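[The header-based cycle detection Alex proposes above could look roughly like the following sketch. The header key and its comma-separated format are hypothetical, not from the KIP: the replicator stamps each record with the clusters it has been written to, and skips records whose header already names the target cluster.]

```python
VISITED_HEADER = "mm2.visited"  # hypothetical header key, not from the KIP

def should_replicate(headers, target_cluster):
    """Skip records that were already written to the target cluster."""
    visited = headers.get(VISITED_HEADER)
    if not visited:
        return True
    return target_cluster not in visited.split(",")

def mark_replicated(headers, target_cluster):
    """Return a copy of the headers with the target cluster appended."""
    visited = headers.get(VISITED_HEADER)
    stamped = f"{visited},{target_cluster}" if visited else target_cluster
    return {**headers, VISITED_HEADER: stamped}
```

This trades a per-record header inspection for cycle safety, which is the efficiency concern Ryanne raises above about having to modify or inspect individual records.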
> >> > > >
> >> > > > On Fri, Oct 19, 2018 at 1:03 AM Harsha Chintalapani
> >> > > > <ka...@harsha.io> wrote:
> >> > > >
> >> > > > > Ryanne,
> >> > > > > Makes sense. Can you please add this under rejected
> >> > > > > alternatives so that everyone has context on why it wasn't
> >> > > > > picked.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Harsha
> >> > > > > On Oct 18, 2018, 8:02 AM -0700, Ryanne Dolan
> >> > > > > <ryannedo...@gmail.com>, wrote:
> >> > > > >
> >> > > > > Harsha, concerning uReplicator specifically: the project is a
> >> > > > > major inspiration for MM2, but I don't think it is a good
> >> > > > > foundation for anything included in Apache Kafka. uReplicator
> >> > > > > uses Helix to solve problems that Connect also solves, e.g.
> >> > > > > REST API, live configuration changes, cluster management,
> >> > > > > coordination etc. This also means that existing tooling,
> >> > > > > dashboards etc. that work with Connectors do not work with
> >> > > > > uReplicator, and any future tooling would need to treat
> >> > > > > uReplicator as a special case.
> >> > > > >
> >> > > > > Ryanne
> >> > > > >
> >> > > > > On Wed, Oct 17, 2018 at 12:30 PM Ryanne Dolan
> >> > > > > <ryannedo...@gmail.com> wrote:
> >> > > > >
> >> > > > >> Harsha, yes I can do that. I'll update the KIP accordingly,
> >> > > > >> thanks.
> >> > > > >>
> >> > > > >> Ryanne
> >> > > > >>
> >> > > > >> On Wed, Oct 17, 2018 at 12:18 PM Harsha <ka...@harsha.io>
> >> > > > >> wrote:
> >> > > > >>
> >> > > > >>> Hi Ryanne,
> >> > > > >>> Thanks for the KIP.
> >> > > > >>> I am also curious about why not use the uReplicator design
> >> > > > >>> as the foundation, given it already resolves some of the
> >> > > > >>> fundamental issues in the current MirrorMaker: updating the
> >> > > > >>> configs on the fly and running the MirrorMaker agents in a
> >> > > > >>> worker model which can be deployed in Mesos or container
> >> > > > >>> orchestrations. If possible, can you document in the
> >> > > > >>> rejected alternatives what the missing parts are that made
> >> > > > >>> you consider a new design from the ground up.
> >> > > > >>>
> >> > > > >>> Thanks,
> >> > > > >>> Harsha
> >> > > > >>>
> >> > > > >>> On Wed, Oct 17, 2018, at 8:34 AM, Ryanne Dolan wrote:
> >> > > > >>> > Jan, these are two separate issues.
> >> > > > >>> >
> >> > > > >>> > 1) consumer coordination should not, ideally, involve
> >> > > > >>> > unreliable or slow connections. Naively, a
> >> > > > >>> > KafkaSourceConnector would coordinate via the source
> >> > > > >>> > cluster. We can do better than this, but I'm deferring
> >> > > > >>> > this optimization for now.
> >> > > > >>> >
> >> > > > >>> > 2) exactly-once between two clusters is mind-bending. But
> >> > > > >>> > keep in mind that transactions are managed by the
> >> > > > >>> > producer, not the consumer. In fact, it's the producer
> >> > > > >>> > that requests that offsets be committed for the current
> >> > > > >>> > transaction. Obviously, these offsets are committed in
> >> > > > >>> > whatever cluster the producer is sending to.
> >> > > > >>> >
> >> > > > >>> > These two issues are closely related. They are both
> >> > > > >>> > resolved by not coordinating or committing via the source
> >> > > > >>> > cluster.
> >> > > > >>> > And in fact, this is the general model of
> >> > > > >>> > SourceConnectors anyway, since most SourceConnectors
> >> > > > >>> > _only_ have a destination cluster.
> >> > > > >>> >
> >> > > > >>> > If there is a lot of interest here, I can expound further
> >> > > > >>> > on this aspect of MM2, but again I think this is
> >> > > > >>> > premature until this first KIP is approved. I intend to
> >> > > > >>> > address each of these in separate KIPs following this one.
> >> > > > >>> >
> >> > > > >>> > Ryanne
> >> > > > >>> >
> >> > > > >>> > On Wed, Oct 17, 2018 at 7:09 AM Jan Filipiak
> >> > > > >>> > <jan.filip...@trivago.com> wrote:
> >> > > > >>> >
> >> > > > >>> > > This is not a performance optimisation. It's a
> >> > > > >>> > > fundamental design choice.
> >> > > > >>> > >
> >> > > > >>> > > I never really took a look at how Streams does exactly
> >> > > > >>> > > once. (It's a trap anyway, and you can usually deal
> >> > > > >>> > > with at least once downstream pretty easily.) But I am
> >> > > > >>> > > very certain it's not gonna get somewhere if the offset
> >> > > > >>> > > commit and record produce cluster are not the same.
> >> > > > >>> > >
> >> > > > >>> > > Pretty sure without this _design choice_ you can skip
> >> > > > >>> > > on that exactly once already.
> >> > > > >>> > >
> >> > > > >>> > > Best Jan
> >> > > > >>> > >
> >> > > > >>> > > On 16.10.2018 18:16, Ryanne Dolan wrote:
> >> > > > >>> > > > > But one big obstacle in this was
> >> > > > >>> > > > > always that group coordination happened on the
> >> > > > >>> > > > > source cluster.
> >> > > > >>> > > >
> >> > > > >>> > > > Jan, thank you for bringing up this issue with legacy
> >> > > > >>> > > > MirrorMaker. I totally agree with you.
> >> > > > >>> > > > This is one of several problems with MirrorMaker I
> >> > > > >>> > > > intend to solve in MM2, and I already have a design
> >> > > > >>> > > > and prototype that solves this and related issues.
> >> > > > >>> > > > But as you pointed out, this KIP is already rather
> >> > > > >>> > > > complex, and I want to focus on the core feature set
> >> > > > >>> > > > rather than performance optimizations for now. If we
> >> > > > >>> > > > can agree on what MM2 looks like, it will be very
> >> > > > >>> > > > easy to agree to improve its performance and
> >> > > > >>> > > > reliability.
> >> > > > >>> > > >
> >> > > > >>> > > > That said, I look forward to your support on a
> >> > > > >>> > > > subsequent KIP that addresses consumer coordination
> >> > > > >>> > > > and rebalance issues. Stay tuned!
> >> > > > >>> > > >
> >> > > > >>> > > > Ryanne
> >> > > > >>> > > >
> >> > > > >>> > > > On Tue, Oct 16, 2018 at 6:58 AM Jan Filipiak
> >> > > > >>> > > > <jan.filip...@trivago.com
> >> > > > >>> > > > <mailto:jan.filip...@trivago.com>> wrote:
> >> > > > >>> > > >
> >> > > > >>> > > > > Hi,
> >> > > > >>> > > > >
> >> > > > >>> > > > > Currently MirrorMaker is usually run collocated
> >> > > > >>> > > > > with the target cluster. This is all nice and good.
> >> > > > >>> > > > > But one big obstacle in this was always that group
> >> > > > >>> > > > > coordination happened on the source cluster. So
> >> > > > >>> > > > > when the network was congested, you sometimes lose
> >> > > > >>> > > > > group membership and have to rebalance and all
> >> > > > >>> > > > > this.
> >> > > > >>> > > > >
> >> > > > >>> > > > > So one big request from us would be the support of
> >> > > > >>> > > > > having coordination cluster != source cluster.
> >> > > > >>> > > > >
> >> > > > >>> > > > > I would generally say a LAN is better than a WAN
> >> > > > >>> > > > > for doing group coordination, and there is no
> >> > > > >>> > > > > reason we couldn't have a group consuming topics
> >> > > > >>> > > > > from a different cluster and committing offsets to
> >> > > > >>> > > > > another one, right?
> >> > > > >>> > > > >
> >> > > > >>> > > > > Other than that: it feels like the KIP has too many
> >> > > > >>> > > > > features, where many of them are not really wanted
> >> > > > >>> > > > > and counterproductive, but I will just wait and see
> >> > > > >>> > > > > how the discussion goes.
> >> > > > >>> > > > >
> >> > > > >>> > > > > Best Jan
> >> > > > >>> > > > >
> >> > > > >>> > > > > On 15.10.2018 18:16, Ryanne Dolan wrote:
> >> > > > >>> > > > > > Hey y'all!
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Please take a look at KIP-382:
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Thanks for your feedback and support.
> >> > > > >>> > > > > >
> >> > > > >>> > > > > > Ryanne
> >> > >
> >> > > --
> >> > > Best,
> >> > > Alex Mironov
> >>

--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany