Thanks, Ryanne. That answers my questions. I was actually missing this "tasks.max" property. Thanks for pointing that out.
Furthermore, as per the KIP of Mirror Maker 2.0, there are 3 types of connectors in a Mirror Maker cluster:

1. KafkaSourceConnector - focuses on replicating topic partitions
2. KafkaCheckpointConnector - focuses on replicating consumer groups
3. KafkaHeartbeatConnector - focuses on checking cluster availability

*Can we configure tasks.max for each of these connectors separately? That is, can I have 3 tasks for KafkaSourceConnector, 5 for KafkaCheckpointConnector, and 1 for KafkaHeartbeatConnector?*

Regards
Ananya Sen

On Thu, Aug 20, 2020 at 6:39 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:

> Ananya, see responses below.
>
> > Can this number of workers be configured?
>
> The number of workers is not exactly configurable, but you can control it
> by spinning up drivers and using the '--clusters' flag. A driver instance
> without '--clusters' will run one worker for each A->B replication flow. So
> e.g. if you've got two clusters being replicated bidirectionally, you'll
> have an A->B worker and a B->A worker on each MM2 driver.
>
> You can use the '--clusters' flag to limit what clusters are targeted for a
> given driver, which is useful in many ways, including to limit the number
> of workers for a given driver. So e.g. if you've got 10 clusters all being
> replicated in a full mesh you can run a driver with '--clusters A' and it
> will have only 9 workers, one for each of the other clusters.
>
> Also note that there is a configuration property 'tasks.max' that controls
> the number of tasks available to workers. Each A->B flow is replicated by a
> Herd of Workers (in Connect terminology), and Herds work on Tasks. By
> default, 'tasks.max' is one, which means there will only be one task for
> each Herd, regardless of how many drivers and workers you spin up. You
> definitely want to change this property. You can tweak this for each A->B
> replication flow independently to strike the right balance.
> If 'tasks.max' is the same or more than the total number of
> topic-partitions being replicated, it will mean each topic-partition is
> replicated in a dedicated task, which is probably not an efficient use of
> resources.
>
> > Is every topic partition given a new task?
>
> No, topic-partitions are spread out across tasks. Each topic's partitions
> are divided round-robin among available tasks. However, keep in mind that
> if 'tasks.max' is too high, you could end up with one topic-partition in
> each task.
>
> > Is every consumer group - topic pair given a new task for replicating
> > offsets?
>
> No, consumer-groups are also spread out across tasks. As with
> topic-partitions, 'tasks.max' applies.
>
> > How can I scale up the mirror maker instance so that I can have very
> > little lag?
>
> Tweak 'tasks.max' and spin up more driver instances.
>
> Ryanne
>
> On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com> wrote:
>
> > Thank you Ryanne for the quick response.
> > I further want to clarify a few points.
> >
> > The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> > Connect we have multiple workers and each worker has some assigned tasks.
> > To map this to Mirror Maker 2.0, a Mirror Maker driver will have some
> > workers.
> >
> > 1) Can this number of workers be configured?
> > 2) What is the default value of this worker configuration?
> > 3) Is every topic partition given a new task?
> > 4) Is every consumer group - topic pair given a new task for
> > replicating offsets?
> >
> > Also, consider a case where I have 1000 topics in a Kafka cluster and
> > each topic has a high amount of data + new data is being written at high
> > throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> > replicate all the old data (which is retained in the topic) as well as
> > the new incoming data in a backup cluster. How can I scale up the mirror
> > maker instance so that I can have very little lag?
> >
> > On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > > Hi
> > >
> > > I was exploring the Mirror maker 2.0. I read through this
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > > documentation and I have a few questions.
> > >
> > > 1. For running mirror maker as a dedicated mirror maker cluster, the
> > > documentation specifies a config file and a starter script. Is this
> > > mirror maker process distributed?
> > > 2. I could not find any port configuration for the above mirror maker
> > > process. So can we configure mirror maker itself to run as a cluster,
> > > i.e. running the process instance across multiple servers to avoid
> > > downtime due to a server crash?
> > > 3. If we could somehow run the mirror maker as a distributed process,
> > > then does that mean that topic and consumer offset replication will be
> > > shared among those mirror maker processes?
> > > 4. What is the default port of this mirror maker process and how can
> > > we override it?
> > >
> > > Looking forward to your reply.
> > >
> > > Thanks & Regards
> > > Ananya Sen
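
For anyone reading this thread later, here is a rough sketch of the point made in the reply about 'tasks.max': topic-partitions are divided round-robin among the available tasks, and once 'tasks.max' reaches the number of topic-partitions, each one ends up in its own task. This is only an illustration and not MirrorMaker's actual code. The function name `assign_round_robin` and the topic names are made up for the example.

```python
def assign_round_robin(topic_partitions, tasks_max):
    """Spread (topic, partition) pairs over at most tasks_max tasks.

    Hypothetical helper for illustration only; it mimics the round-robin
    behaviour described in the thread, not MirrorMaker's implementation.
    """
    if tasks_max < 1:
        raise ValueError("tasks_max must be at least 1")
    # Assumption: no more tasks are created than there are partitions.
    num_tasks = min(tasks_max, len(topic_partitions))
    assignment = [[] for _ in range(num_tasks)]
    for i, tp in enumerate(topic_partitions):
        assignment[i % num_tasks].append(tp)
    return assignment


# Two made-up topics with three partitions each: six topic-partitions.
partitions = [(t, p) for t in ("orders", "payments") for p in range(3)]

for tasks_max in (1, 4, 10):
    tasks = assign_round_robin(partitions, tasks_max)
    # Print how many topic-partitions each task received.
    print(tasks_max, [len(t) for t in tasks])
```

With the default of one task, all six topic-partitions land in a single task. With a very high value, each task carries exactly one topic-partition, which is the case the reply warns is probably inefficient.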