You have to determine whether the bottleneck is in the consumer or the
producer.

To improve the performance of the former (the consumer), you can increase the
total number of consumer streams. The number of streams is capped by the total
number of partitions, so you may need to increase the number of partitions
first.
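
For example, roughly (a sketch against the 0.8 MirrorMaker tool; the values
below are just illustrative, so double-check the option names against your
release):

  # run MirrorMaker with more consumer streams (capped by the partition count)
  bin/kafka-run-class.sh kafka.tools.MirrorMaker \
    --consumer.config sourceCluster.consumer.properties \
    --producer.config targetCluster.producer.properties \
    --whitelist=".*" \
    --num.streams 4 \
    --num.producers 4

  # if streams are still capped, first raise the partition count on the
  # source topics (e.g. with the add-partitions admin tool shipped with 0.8;
  # check its exact usage for your release)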

To improve the performance of the latter (the producer), you can (a) increase
the batch size in async mode and/or (b) run more producer instances.
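
A rough sketch of what (a) might look like in the MirrorMaker
producer.properties (these are 0.8 producer config names with illustrative
values, so verify them against your version):

  # buffer and batch messages instead of sending them one at a time
  producer.type=async
  # number of messages to send in one batch (tune against your throughput)
  batch.num.messages=1000
  # max time to buffer before forcing a send
  queue.buffering.max.ms=100
  # optionally compress batches to reduce cross-DC bandwidth
  compression.codec=snappy

For (b), you can either raise --num.producers on the MirrorMaker command line
or run additional MirrorMaker instances.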

The metadata refresh interval is configurable. It exists mainly so that the
producer can pick up newly available partitions.
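
That interval is controlled by topic.metadata.refresh.interval.ms on the
producer side (10 minutes by default in 0.8, which matches the reconnects you
are seeing in the logs). For example:

  # producer.properties: refresh topic metadata every 30 minutes instead of 10
  topic.metadata.refresh.interval.ms=1800000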

Thanks,

Jun


On Thu, Aug 22, 2013 at 1:44 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

> I am trying to tune MirrorMaker configurations based on this doc
> <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes>
> and would like to know your recommendations.
>
> Our configuration: We are doing inter-datacenter replication with 5 brokers
> in the source and destination DCs and 2 MirrorMakers doing the replication.
> We have about 4 topics with 4 partitions each.
> I have been using ConsumerOffsetChecker to analyze lag as I tune.
>
>
>    1. num.streams: We have set num.streams=2 so that the 4 partitions will
>       be shared between the 2 MirrorMakers. Increasing num.streams beyond
>       this did not improve performance; is that expected?
>    2. num.producers: We initially set num.producers=4 (assuming one producer
>       thread per topic), then bumped it to num.producers=16, but did not see
>       any performance improvement. Is that expected? How do we determine the
>       optimum value for num.producers?
>    3. socket.buffersize: We initially had default values for these, then I
>       changed socket.send.buffer.bytes on the source brokers;
>       socket.receive.buffer.bytes and fetch.message.max.bytes in the
>       MirrorMaker consumer properties; and socket.receive.buffer.bytes and
>       socket.request.max.bytes on the destination brokers, all to
>       1024*1024*1024 (1073741824). This did improve performance, but I still
>       could not get the lag below 100.
>
>    Here is what our lag looks like after the above changes:
>
> Group            Topic             Pid  Offset       logSize      Lag    Owner
> mirrormakerProd  FunnelProto       0    554704539    554717088    12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  FunnelProto       1    547370573    547383136    12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  FunnelProto       2    553124930    553125742    812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  FunnelProto       3    552990834    552991650    816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  agent             0    35438        35440        2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  agent             1    35447        35448        1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  agent             2    35375        35375        0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  agent             3    35336        35336        0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  internal_metrics  0    1930852823   1930917418   64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  internal_metrics  1    1937237324   1937301841   64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  internal_metrics  2    1945894901   1945904067   9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  internal_metrics  3    1946906932   1946915928   8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  jmx               0    485270038    485280882    10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  jmx               1    486363914    486374759    10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  jmx               2    491783842    491784826    984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  jmx               3    485675629    485676643    1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
>
> In the MirrorMaker logs, I see topic metadata being fetched every 10 minutes
> and the connections to the producers being re-established. Is this normal?
> If it is continuously producing, why does it need to reconnect to the
> destination brokers?
> What else can we tune to bring the lag below 100? This is just a small set
> of data we are currently testing with; real production traffic will be much
> larger. How can we compute the optimum configuration as traffic increases?
>
> Thanks for help,
>
> Thanks,
> Raja.
>
