Re: Mirror Maker 2.0 Queries

Ananya Sen Thu, 20 Aug 2020 10:31:25 -0700

Thanks, Ryanne. That answers my questions. I was actually missing this
"tasks.max" property. Thanks for pointing that out.


Furthermore, as per the KIP for Mirror Maker 2.0, there are 3 types of
connectors in a Mirror Maker cluster:

   1. MirrorSourceConnector - focuses on replicating topic partitions
   2. MirrorCheckpointConnector - focuses on replicating consumer groups
   3. MirrorHeartbeatConnector - focuses on checking cluster availability

*Can we configure tasks.max for each of these connectors separately? That
is, can I have 3 tasks for MirrorSourceConnector, 5
for MirrorCheckpointConnector, and 1 for MirrorHeartbeatConnector?*



Regards
Ananya Sen

On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Ananya, see responses below.
>
> > Can this number of workers be configured?
>
> The number of workers is not exactly configurable, but you can control it
> by spinning up drivers and using the '--clusters' flag. A driver instance
> without '--clusters' will run one worker for each A->B replication flow. So
> e.g. if you've got two clusters being replicated bidirectionally, you'll
> have an A->B worker and a B->A worker on each MM2 driver.
>
> You can use the '--clusters' flag to limit which clusters are targeted by a
> given driver, which is useful in many ways, including to limit the number
> of workers for a given driver. So e.g. if you've got 10 clusters all being
> replicated in a full mesh you can run a driver with '--clusters A' and it
> will have only 9 workers, one for each of the other clusters.
>
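
The worker counts described above can be checked with a small sketch. This
is only an illustrative model, not MirrorMaker code: the function
flows_for_driver and its targets parameter are hypothetical names, and the
sketch assumes a full mesh where '--clusters' keeps only the flows whose
target cluster is listed.

```python
from itertools import permutations


def flows_for_driver(clusters, targets=None):
    """Return the (source, target) replication flows one driver would run.

    Assumption: full-mesh replication, one worker per flow. `targets`
    models the '--clusters' flag; None means the flag was not given.
    """
    flows = list(permutations(clusters, 2))
    if targets is None:
        return flows
    return [(src, dst) for src, dst in flows if dst in targets]


# Ten clusters named A..J, replicated in a full mesh.
clusters = [chr(ord("A") + i) for i in range(10)]

print(len(flows_for_driver(["A", "B"])))              # 2
print(len(flows_for_driver(clusters)))                # 90
print(len(flows_for_driver(clusters, targets={"A"})))  # 9
```
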
> Also note that there is a configuration property 'tasks.max' that controls
> the number of tasks available to workers. Each A->B flow is replicated by a
> Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> default, 'tasks.max' is one, which means there will only be one task for
> each Herd, regardless of how many drivers and workers you spin up. You
> definitely want to change this property. You can tweak this for each A->B
> replication flow independently to strike the right balance. If 'tasks.max'
> is equal to or greater than the total number of topic-partitions being
> replicated, each topic-partition will be replicated in a dedicated
> task, which is probably not an efficient use of resources.
>
> > Is every topic partition given a new task?
>
> No, topic-partitions are spread out across tasks. Each topic's partitions
> are divided round-robin among available tasks. However, keep in mind that
> if 'tasks.max' is too high, you could end up with one topic-partition in
> each task.
>
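
A minimal sketch of the round-robin idea described above, assuming a
simplified model rather than the actual MirrorMaker implementation. The
function name assign_round_robin and the topic names are made up for
illustration.

```python
def assign_round_robin(topic_partitions, tasks_max):
    """Spread topic-partitions across at most `tasks_max` tasks.

    Simplified model: no more tasks are created than there are
    topic-partitions, and partitions are dealt out one at a time.
    """
    num_tasks = min(tasks_max, len(topic_partitions))
    tasks = [[] for _ in range(num_tasks)]
    for i, tp in enumerate(topic_partitions):
        tasks[i % num_tasks].append(tp)
    return tasks


# Hypothetical topics: "orders" with 3 partitions, "payments" with 2.
partitions = ["orders-0", "orders-1", "orders-2", "payments-0", "payments-1"]

print(assign_round_robin(partitions, 2))
# [['orders-0', 'orders-2', 'payments-1'], ['orders-1', 'payments-0']]

# With tasks_max above the partition count, each task holds one partition.
print(len(assign_round_robin(partitions, 8)))  # 5
```
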
> > Is every consumer group - topic pair given a new task for replicating
> > offsets?
>
> No, consumer-groups are also spread out across tasks. As with
> topic-partitions, 'tasks.max' applies.
>
> > How can I scale up the mirror maker instance so that I can have very
> > little lag?
>
> Tweak 'tasks.max' and spin up more driver instances.
>
> Ryanne
>
> On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com> wrote:
>
> > Thank you Ryanne for the quick response.
> > I further want to clarify a few points.
> >
> > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > Connect we have multiple workers and each worker has some assigned tasks.
> > To map this to Mirror Maker 2.0, a Mirror Maker driver will have some
> > workers.
> >
> > 1) Can this number of workers be configured?
> > 2) What is the default value of this worker configuration?
> > 3) Is every topic partition given a new task?
> > 4) Is every consumer group - topic pair given a new task for
> > replicating offsets?
> >
> > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > each topic has a high amount of data + new data is being written at high
> > throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> > replicate all the old data (which is retained in the topic) as well as
> > the new incoming data in a backup cluster. How can I scale up the mirror
> > maker instance so that I can have very little lag?
> >
> > On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > > Hi
> > >
> > > I was exploring the Mirror maker 2.0. I read through this
> > > documentation:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > and I have a few questions.
> > >
> > >    1. For running mirror maker as a dedicated mirror maker cluster, the
> > >    documentation specifies a config file and a starter script. Is this
> > >    mirror maker process distributed?
> > >    2. I could not find any port configuration for the above mirror
> > >    maker process. So can we configure mirror maker itself to run as a
> > >    cluster, i.e. running the process instance across multiple servers to
> > >    avoid downtime due to a server crash?
> > >    3. If we could somehow run the mirror maker as a distributed process,
> > >    does that mean that topic and consumer offset replication will be
> > >    shared among those mirror maker processes?
> > >    4. What is the default port of this mirror maker process and how can
> > >    we override it?
> > >
> > > Looking forward to your reply.
> > >
> > >
> > > Thanks & Regards
> > > Ananya Sen
> > >
> >
>
