Ananya, see responses below.

> Can this number of workers be configured?

The number of workers is not exactly configurable, but you can control it by spinning up drivers and using the '--clusters' flag. A driver instance without '--clusters' will run one worker for each A->B replication flow. So e.g. if you've got two clusters being replicated bidirectionally, you'll have an A->B worker and a B->A worker on each MM2 driver.

You can use the '--clusters' flag to limit which clusters are targeted by a given driver, which is useful for several reasons, including limiting the number of workers on a given driver. So e.g. if you've got 10 clusters all being replicated in a full mesh, you can run a driver with '--clusters A' and it will have only 9 workers, one for each of the other clusters.

Also note that there is a configuration property 'tasks.max' that controls the number of tasks available to workers. Each A->B flow is replicated by a Herd of Workers (in Connect terminology), and Herds work on Tasks. By default, 'tasks.max' is one, which means there will only be one task for each Herd, regardless of how many drivers and workers you spin up. You definitely want to change this property. You can tweak it for each A->B replication flow independently to strike the right balance. If 'tasks.max' is equal to or greater than the total number of topic-partitions being replicated, each topic-partition is replicated in a dedicated task, which is probably not an efficient use of resources given the per-task overhead.

> Does every topic partition given a new task?

No, topic-partitions are spread out across tasks. Each topic's partitions are divided round-robin among available tasks. However, keep in mind that if 'tasks.max' is too high, you could end up with one topic-partition in each task.

> Does every consumer group - topic pair given a new task for replicating offset?

No, consumer-groups are also spread out across tasks. As with topic-partitions, 'tasks.max' applies.

> How can I scale up the mirror maker instance so that I can have very little lag?

Tweak 'tasks.max' and spin up more driver instances.

Ryanne

On Sat, Aug 8, 2020 at 1:43 AM Ananya Sen <ananya281...@gmail.com> wrote:

> Thank you Ryanne for the quick response.
> I further want to clarify a few points.
>
> The mirror maker 2.0 is based on the Kafka Connect framework. In Kafka
> connect we have multiple workers and each worker has some assigned task. To
> map this to Mirror Maker 2.0, A mirror Maker will driver have some workers.
>
> 1) Can this number of workers be configured?
> 2) What is the default value of this worker configuration?
> 3) Does every topic partition given a new task?
> 4) Does every consumer group - topic pair given a new task for replicating
> offset?
>
> Also, consider a case where I have 1000 topics in a Kafka cluster and each
> topic has a high amount of data + new data is being written at high
> throughput. Now I want to set up a mirror maker 2.0 on this cluster to
> replicate all the old data (which is retained in the topic) as well as the
> new incoming data in a backup cluster. How can I scale up the mirror maker
> instance so that I can have very little lag?
>
> On 2020/07/11 06:37:56, Ananya Sen <ananya281...@gmail.com> wrote:
> > Hi
> >
> > I was exploring the Mirror maker 2.0. I read through this
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> > documentation
> > and I have a few questions.
> >
> > 1. For running mirror maker as a dedicated mirror maker cluster, the
> > documentation specifies a config file and a starter script. Is this mirror
> > maker process distributed ?
> > 2. I could not find any port configuration for the above mirror maker
> > process, So can we configure mirror maker itself to run as a cluster i.e
> > running the process instance across multiple server to avoid downtime due
> > to server crash.
> > 3. If we could somehow run the mirror maker as a distributed process
> > then does that mean that topic and consumer offset replication will be
> > shared among those mirror maker processes?
> > 4. What is the default port of this mirror maker process and how can we
> > override it?
> >
> > Looking forward to your reply.
> >
> >
> > Thanks & Regards
> > Ananya Sen
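
For readers who want to sanity-check the numbers in the reply at the top of this thread, here is a rough Python sketch of the arithmetic. It is not MM2 code. The function names are made up for illustration, and two things are assumptions drawn from the reply rather than from the MM2 source: that a driver started with '--clusters' runs only the flows whose target is in that list, and that the number of tasks in a flow is capped at the number of topic-partitions (which is how a high 'tasks.max' ends up with one topic-partition per task).

```python
from itertools import permutations


def replication_flows(clusters):
    """Every directed A->B flow in a full mesh of the given clusters."""
    return list(permutations(clusters, 2))


def workers_for_driver(clusters, targets=None):
    """Flows (one worker each) a single driver would run.

    targets=None models a driver started without '--clusters'.
    Otherwise only flows whose target cluster is in `targets` are kept
    (assumed reading of the '--clusters' flag).
    """
    flows = replication_flows(clusters)
    if targets is None:
        return flows
    return [(src, dst) for src, dst in flows if dst in targets]


def assign_round_robin(topic_partitions, tasks_max):
    """Spread topic-partitions over tasks round-robin.

    Assumption: no more tasks are created than there are topic-partitions.
    """
    num_tasks = min(tasks_max, len(topic_partitions))
    tasks = [[] for _ in range(num_tasks)]
    for i, tp in enumerate(topic_partitions):
        tasks[i % num_tasks].append(tp)
    return tasks


if __name__ == "__main__":
    mesh = list("ABCDEFGHIJ")  # 10 clusters in a full mesh
    print(len(workers_for_driver(mesh)))         # driver without '--clusters'
    print(len(workers_for_driver(mesh, {"A"})))  # driver with '--clusters A'

    partitions = [("orders", p) for p in range(6)]
    for tasks_max in (1, 4, 10):
        sizes = [len(t) for t in assign_round_robin(partitions, tasks_max)]
        print(tasks_max, sizes)
```

With the default 'tasks.max' of one, all six partitions land in a single task no matter how many drivers are running. Raising it spreads the load until each partition has its own task, after which extra tasks buy nothing.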