Re: Mirror Maker 2.0 Queries

Ryanne Dolan Thu, 20 Aug 2020 06:09:18 -0700

Ananya, see responses below.

> Can this number of workers be configured?

The number of workers is not exactly configurable, but you can control it
by spinning up drivers and using the '--clusters' flag. A driver instance
without '--clusters' will run one worker for each A->B replication flow. So
e.g. if you've got two clusters being replicated bidirectionally, you'll
have an A->B worker and a B->A worker on each MM2 driver.

You can use the '--clusters' flag to limit which clusters are targeted by a
given driver, which is useful in many ways, including limiting the number
of workers on a given driver. So, e.g., if you've got 10 clusters all being
replicated in a full mesh, you can run a driver with '--clusters A' and it
will have only 9 workers, one for each of the other clusters replicating
into A.
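To make the worker arithmetic concrete, here is a small Python sketch that enumerates the A->B flows (one worker each) that a single driver would run. It assumes, per the description above, that '--clusters' restricts a driver to flows whose target is in the given set. The function name `flows_for_driver` is hypothetical, and nothing here calls MM2 itself.

```python
from itertools import permutations


def flows_for_driver(clusters, targets=None):
    """List the A->B flows (one worker each) a driver would run.

    `targets` stands in for the '--clusters' flag (an assumption of this
    sketch): when given, only flows whose target cluster is in `targets`
    are kept. With no '--clusters', every ordered pair of distinct
    clusters is a flow.
    """
    return [
        f"{src}->{dst}"
        for src, dst in permutations(clusters, 2)
        if targets is None or dst in targets
    ]
```

With two clusters this yields `['A->B', 'B->A']`. With a 10-cluster full mesh it yields 90 flows, or 9 when restricted to target A, matching the example above.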

Also note that there is a configuration property 'tasks.max' that controls
the number of tasks available to workers. Each A->B flow is replicated by a
herd of workers (in Connect terminology), and herds work on tasks. By
default, 'tasks.max' is one, which means there will only be one task for
each herd, regardless of how many drivers and workers you spin up. You
definitely want to change this property. You can tweak it for each A->B
replication flow independently to strike the right balance. If 'tasks.max'
is equal to or greater than the total number of topic-partitions being
replicated, each topic-partition will be replicated by a dedicated task,
which is probably not an efficient use of resources given the per-task
overhead.
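Per-flow tuning might look like the following Python sketch, which renders a minimal mm2.properties fragment. It assumes MM2's convention of overriding a property for a single flow by prefixing it with `source->target.` (e.g. `A->B.tasks.max`). The function name `mm2_properties` is hypothetical, and bootstrap servers and other required settings are deliberately omitted.

```python
def mm2_properties(clusters, default_tasks_max, per_flow_tasks_max=None):
    """Render a minimal, hypothetical mm2.properties fragment.

    The top-level 'tasks.max' applies to every flow. Each entry in
    `per_flow_tasks_max`, keyed like "A->B", becomes a flow-prefixed
    override such as 'A->B.tasks.max' (assumed prefix convention).
    Bootstrap servers and other required settings are not included.
    """
    lines = [
        f"clusters = {', '.join(clusters)}",
        f"tasks.max = {default_tasks_max}",
    ]
    # Sort overrides so the output is deterministic.
    for flow, n in sorted((per_flow_tasks_max or {}).items()):
        lines.append(f"{flow}.tasks.max = {n}")
    return "\n".join(lines)


print(mm2_properties(["A", "B"], 4, {"A->B": 16}))
```

This lets a high-volume A->B flow run more tasks than a quieter B->A flow while both share the same default.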

> Does every topic partition get a new task?

No, topic-partitions are spread out across tasks. Each topic's partitions
are divided round-robin among available tasks. However, keep in mind that
if 'tasks.max' is too high, you could end up with one topic-partition in
each task.
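The spreading described above can be sketched in Python as follows. It is a simplified model, assuming the flattened list of topic-partitions is dealt out in order and that the number of tasks is min('tasks.max', number of partitions). The helper name `assign_partitions` is hypothetical, not an MM2 API.

```python
def assign_partitions(topic_partitions, tasks_max):
    """Deal topic-partitions round-robin over at most `tasks_max` tasks.

    The task count is capped at the number of partitions, so a very high
    tasks_max degenerates to one partition per task.
    """
    num_tasks = min(tasks_max, len(topic_partitions))
    tasks = [[] for _ in range(num_tasks)]
    for i, tp in enumerate(topic_partitions):
        tasks[i % num_tasks].append(tp)
    return tasks


# Two topics: "orders" with 4 partitions, "clicks" with 2.
tps = [("orders", p) for p in range(4)] + [("clicks", p) for p in range(2)]
print(assign_partitions(tps, 4))
```

With six partitions and `tasks_max=4`, two tasks get two partitions and two get one. With `tasks_max=100` you get six tasks holding one partition each, which is the degenerate case mentioned above.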

> Does every consumer group-topic pair get a new task for replicating
> offsets?

No, consumer groups are also spread out across tasks. As with
topic-partitions, 'tasks.max' caps the number of tasks they are spread across.
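A Python sketch of spreading consumer groups over tasks is below. It treats the consumer group as the unit of work, as in the reply, and splits the groups into contiguous, near-equal chunks in the style of Kafka Connect's `ConnectorUtils.groupPartitions` helper. Whether MM2's checkpoint connector splits groups exactly this way is an assumption here, and `assign_groups` is a hypothetical name.

```python
def assign_groups(consumer_groups, tasks_max):
    """Split consumer groups into at most `tasks_max` contiguous chunks.

    Chunk sizes differ by at most one; the first `leftover` chunks get
    the extra group. The task count is capped at the number of groups.
    """
    num_tasks = min(tasks_max, len(consumer_groups))
    if num_tasks == 0:
        return []
    per_task, leftover = divmod(len(consumer_groups), num_tasks)
    chunks, start = [], 0
    for t in range(num_tasks):
        size = per_task + (1 if t < leftover else 0)
        chunks.append(consumer_groups[start:start + size])
        start += size
    return chunks


groups = [f"g{i}" for i in range(5)]
print(assign_groups(groups, 2))
```

Here five groups over two tasks become `['g0', 'g1', 'g2']` and `['g3', 'g4']`. As with partitions, a 'tasks.max' at or above the group count gives one group per task.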

> How can I scale up the mirror maker instance so that I can have very
> little lag?

Tweak 'tasks.max' and spin up more driver instances.
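The two knobs interact: 'tasks.max' fixes how many units of work exist, and extra drivers only help if there are tasks to hand them. The Python sketch below deals task ids round-robin across hypothetical driver names. It is a simplification that assumes an even spread; Connect's real rebalancing protocol differs in detail.

```python
def spread_tasks(num_tasks, drivers):
    """Deal task ids round-robin across driver instances (simplified model).

    Real Connect rebalancing is more involved; this only illustrates that
    once there are more drivers than tasks, the extra drivers sit idle.
    """
    placement = {d: [] for d in drivers}
    for task_id in range(num_tasks):
        placement[drivers[task_id % len(drivers)]].append(task_id)
    return placement


drivers = ["d1", "d2", "d3"]
print(spread_tasks(1, drivers))  # default tasks.max: one busy driver
print(spread_tasks(6, drivers))  # six tasks: two per driver
```

With the default of one task, adding drivers leaves all but one idle; with six tasks across three drivers, each runs two.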

Ryanne

On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com> wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> Mirror Maker 2.0 is based on the Kafka Connect framework. In Kafka
> Connect we have multiple workers, and each worker has some assigned tasks. To
> map this to Mirror Maker 2.0, a Mirror Maker driver will have some workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Does every topic partition get a new task?
> 4) Does every consumer group-topic pair get a new task for replicating
> offsets?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster, and each
> topic has a high volume of data, with new data being written at high
> throughput. Now I want to set up Mirror Maker 2.0 on this cluster to
> replicate all the old data (which is retained in the topic) as well as the
> new incoming data to a backup cluster. How can I scale up the mirror maker
> instance so that I can have very little lag?
>
> On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > Hi
> >
> > I was exploring Mirror Maker 2.0. I read through this documentation
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > and I have a few questions.
> >
> >    1. For running mirror maker as a dedicated mirror maker cluster, the
> >    documentation specifies a config file and a starter script. Is this
> >    mirror maker process distributed?
> >    2. I could not find any port configuration for the above mirror maker
> >    process, so can we configure mirror maker itself to run as a cluster,
> >    i.e. run the process instance across multiple servers to avoid downtime
> >    due to a server crash?
> >    3. If we could somehow run mirror maker as a distributed process,
> >    does that mean that topic and consumer offset replication will be
> >    shared among those mirror maker processes?
> >    4. What is the default port of this mirror maker process, and how can
> >    we override it?
> >
> > Looking forward to your reply.
> >
> >
> > Thanks & Regards
> > Ananya Sen
> >
>
