Hi Peter,

these are remarkable numbers, but to be honest I do not understand where you
run the Mirror Maker processes.
Do you run them near the remote clusters or near the target (core?) datacenter
cluster?

As I understand it, you run 30 MirrorMaker instances (one for each remote
cluster) on each of the 100 Kafka nodes of your core datacenter cluster.
So you run the Mirror Maker on the same machines as the Kafka nodes and do not
use dedicated machines for the Mirror Maker process?


Best regards,
  Franz
 

Sent: Tuesday, March 12, 2019 at 4:24 PM
From: "Peter Bukowinski" <pmb...@gmail.com>
To: users@kafka.apache.org
Subject: Re: Kafka Mirror Maker place of execution
I have a setup with about 30 remote kafka clusters and one cluster in a core 
datacenter where I aggregate data from all the remote clusters. The remote 
clusters have 30 nodes each with moderate specs. The core cluster has 100 nodes 
with lots of cpu, ram, and ssd storage per node.

I run MirrorMaker directly on the core brokers. Each broker runs one 
MirrorMaker instance per edge cluster, sharing the same group.id. Since I’m 
running 100 instances per edge cluster, the number of threads I use = (total 
partition count of topics I am mirroring) / 100. In practice, each MM instance 
runs with about 25 threads, so each broker runs 25*30=750 threads of 
MirrorMaker.
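
As a rough sketch of that arithmetic in Python: the partition count of 2500 per
edge cluster is an assumed figure, picked only so that the result matches the
roughly 25 threads mentioned above, and the function name is hypothetical.

```python
import math

def threads_per_instance(total_partitions: int, instances: int) -> int:
    """Threads each MirrorMaker instance needs so that, across all
    instances in the group, there is about one thread per partition."""
    return math.ceil(total_partitions / instances)

core_brokers = 100                  # one MM instance per broker per edge cluster
edge_clusters = 30                  # "about 30" remote clusters
partitions_per_edge_cluster = 2500  # ASSUMPTION: illustrative value only

per_instance = threads_per_instance(partitions_per_edge_cluster, core_brokers)
per_broker = per_instance * edge_clusters

print(f"threads per MM instance: {per_instance}")
print(f"MM threads per broker:   {per_broker}")
```

Rounding up keeps every partition covered when the partition count does not
divide evenly by the number of instances.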

I’ve been running this setup for many months and it’s proved to be stable with 
very low consumer lag.

--
Peter Bukowinski

> On Mar 12, 2019, at 6:42 AM, Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> Franz, you can run MM on or near either source or target cluster, but it's
> more efficient near the target because this minimizes producer latency. If
> latency is high, producers will block waiting on ACKs for in-flight records,
> which reduces throughput.
>
> I recommend running MM near the target cluster but not necessarily on the
> same machines, because often Kafka nodes are relatively expensive, with SSD
> arrays and huge IO bandwidth etc, which isn't necessary for MM.
>
> Ryanne
>
> On Tue, Mar 12, 2019, 8:13 AM Franz van Betteraey <fvbetter...@web.de>
> wrote:
>
>> Hi all,
>>
>> there are best practices out there which recommend running the Mirror Maker
>> on the target cluster.
>>
>> https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
>>
>> I wonder why this recommendation exists, because ultimately all data must
>> cross the border between the clusters, regardless of whether it is
>> consumed at the target or produced at the source. One reason I can imagine
>> is that the Mirror Maker supports multiple consumers but only one producer,
>> so consuming data over the higher-latency link might be sped up by
>> the use of multiple consumers.
>>
>> If performance through multithreading is a factor, would it be useful
>> to use several producers (one per consumer) to replicate the data (with a
>> custom replication process)? Does anyone know why the Mirror Maker shares
>> a single producer among all consumers?
>>
>> My use case is the replication of data from several source clusters (~10)
>> to a single target cluster. I would prefer to run the replication process
>> on the source clusters to avoid too many replication processes (one per
>> source) on the target cluster.
>>
>> Hints and suggestions on this topic are very welcome.
>>
>> Best regards
>> Franz
>>
>> If you would like to earn some SO recommendation points feel free to
>> answer this question on SO ;-)
>> https://stackoverflow.com/q/55122268/367285
>>
