The bottleneck can be CPU, network, or disk I/O; you just need to monitor the
load on each. For example, by monitoring per-thread CPU load in MM, you can
figure out whether a single thread is the bottleneck. You can then look at the
I/O load on the target broker and see whether I/O is saturated. If it isn't,
increasing the batch size in the producer will likely help.
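
For example, a minimal way to watch per-thread CPU on Linux, assuming MM was
started via the kafka.tools.MirrorMaker class so pgrep -f can find it:

  # show per-thread CPU usage for the MirrorMaker process
  top -H -p $(pgrep -f kafka.tools.MirrorMaker)

  # dump all thread stacks; each thread's native id appears as nid=0x<hex>,
  # which matches the (decimal) PID column of a hot thread in top -H
  jstack $(pgrep -f kafka.tools.MirrorMaker)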

The metadata refresh interval only applies to existing topics. The producer
always refreshes metadata immediately for a new topic it has never seen.

Thanks,

Jun


On Fri, Aug 23, 2013 at 7:08 AM, Rajasekar Elango <rela...@salesforce.com> wrote:

> Thanks Jun,
>
> What troubleshooting steps can we take to identify whether the bottleneck is
> in consuming or producing..? Does changing anything in the log4j
> configuration, or any JMX MBeans, provide insight into it..? Does the
> metadata refresh interval affect picking up new partitions only for existing
> topics, or does it also affect picking up new topics..?
>
> Thanks,
> Raja.
>
> ---------- Forwarded message ----------
> From: Jun Rao <jun...@gmail.com>
> Date: Fri, Aug 23, 2013 at 12:08 AM
> Subject: Re: Tuning mirror maker performance
> To: "users@kafka.apache.org" <users@kafka.apache.org>
>
>
> You have to determine whether the bottleneck is in the consumer or the
> producer.
>
> To improve the performance of the former, you can increase the total # of
> consumer streams. The # of streams is capped by the total # of partitions,
> so you may need to increase the # of partitions.
>
> To improve the performance of the latter, you can (a) increase the batch
> size in async mode and/or (b) run more instances of producers.
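>
> For example, a minimal producer.properties sketch for (a), assuming the
> 0.8-era async producer (verify the property names against your version):
>
>   producer.type=async
>   # number of messages batched per send in async mode
>   batch.num.messages=1000
>   # max time (ms) to buffer messages before flushing a batch
>   queue.buffering.max.ms=5000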
>
> Metadata refresh interval is configurable. It's mainly for the producer to
> pick up newly available partitions.
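>
> For example, in the 0.8 producer the setting is
> topic.metadata.refresh.interval.ms, whose default should be 600000 ms
> (10 minutes); verify the name and default against your version:
>
>   # how often the producer refreshes metadata for existing topics
>   topic.metadata.refresh.interval.ms=600000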
>
> Thanks,
>
> Jun
>
>
> On Thu, Aug 22, 2013 at 1:44 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
>
> > I am trying to tune mirrormaker configurations based on this doc
> > <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes>
> > and would like to know your recommendations.
> >
> > Our configuration: We are doing inter-datacenter replication with 5
> > brokers in the source and destination DCs and 2 mirrormakers doing the
> > replication. We have about 4 topics with 4 partitions each.
> > I have been using ConsumerOffsetChecker to analyze lag while tuning.
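> >
> > The invocation looks roughly like this (the ZooKeeper address is a
> > placeholder):
> >
> >   bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
> >     --group mirrormakerProd --zkconnect <zk-host>:2181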
> >
> >
> >    1. num.streams: We have set num.streams=2 so that the 4 partitions are
> >    shared between the 2 mirrormakers. Increasing num.streams beyond this
> >    did not improve performance; is this correct?
> >    2. num.producers: We initially set num.producers=4 (assuming one
> >    producer thread per topic), then bumped num.producers to 16, but did
> >    not see any improvement in performance..? Is this correct..? How do we
> >    determine the optimum value for num.producers?
> >    3. socket.buffersize: We initially had default values for these; then I
> >    changed socket.send.buffer.bytes on the source broker;
> >    socket.receive.buffer.bytes and fetch.message.max.bytes in the
> >    mirrormaker consumer properties; and socket.receive.buffer.bytes and
> >    socket.request.max.bytes on the destination broker, all to
> >    1024*1024*1024 (1073741824); see the sketch after this list. This did
> >    improve performance, but I could not get the lag below 100.
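> >
> >    Consolidated as a sketch (property names as in the 0.8 docs; the
> >    values are the ones described above):
> >
> >      # mirrormaker consumer.properties
> >      socket.receive.buffer.bytes=1073741824
> >      fetch.message.max.bytes=1073741824
> >
> >      # source broker server.properties
> >      socket.send.buffer.bytes=1073741824
> >
> >      # destination broker server.properties
> >      socket.receive.buffer.bytes=1073741824
> >      socket.request.max.bytes=1073741824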
> >
> >    Here is what our lag looks like after the above changes:
> >
> > Group           Topic            Pid Offset      logSize     Lag    Owner
> > mirrormakerProd FunnelProto      0   554704539   554717088   12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd FunnelProto      1   547370573   547383136   12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd FunnelProto      2   553124930   553125742   812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd FunnelProto      3   552990834   552991650   816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd agent            0   35438       35440       2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd agent            1   35447       35448       1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd agent            2   35375       35375       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd agent            3   35336       35336       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd internal_metrics 0   1930852823  1930917418  64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd internal_metrics 1   1937237324  1937301841  64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd internal_metrics 2   1945894901  1945904067  9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd internal_metrics 3   1946906932  1946915928  8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd jmx              0   485270038   485280882   10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd jmx              1   486363914   486374759   10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd jmx              2   491783842   491784826   984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd jmx              3   485675629   485676643   1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> >
> > In the mirrormaker logs, I see topic metadata being fetched every 10
> > minutes and the producer connections being re-established. Is this
> > normal? If it's continuously producing, why does it need to reconnect to
> > the destination brokers..?
> > What else can we tune to bring the lag below 100..? This is just a small
> > set of data we are currently testing with; the real production traffic
> > will be much larger. How can we compute the optimum configuration as data
> > traffic increases..?
> >
> > Thanks for help,
> >
> > Thanks,
> > Raja.
> >
>
>
>
> --
> Thanks,
> Raja.
>
