You have to determine whether the bottleneck is in the consumer or the producer.
To improve the performance of the former, you can increase the total number of consumer streams. The number of streams is capped by the total number of partitions, so you may need to increase the number of partitions. To improve the performance of the latter, you can (a) increase the batch size in async mode and/or (b) run more producer instances.

The metadata refresh interval is configurable. It is mainly there for the producer to pick up newly available partitions.
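For illustration, a minimal sketch of where these knobs live, assuming the 0.8-era MirrorMaker command line and old producer property names (verify against your Kafka version; the values shown are placeholders, not recommendations):

    # Consumer streams and producer instances are MirrorMaker command-line options.
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config consumer.properties \
      --producer.config producer.properties \
      --num.streams 4 \
      --num.producers 4 \
      --whitelist '.*'

    # producer.properties: batch more aggressively in async mode.
    producer.type=async
    batch.num.messages=1000                      # messages per async batch (default 200)
    # topic.metadata.refresh.interval.ms=600000  # default 10 min; the periodic
                                                 # refresh described in the message below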
Thanks,

Jun


On Thu, Aug 22, 2013 at 1:44 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

> I am trying to tune the MirrorMaker configuration based on this doc
> <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes>
> and would like to know your recommendations.
>
> Our configuration: we are doing inter-datacenter replication with 5
> brokers in the source and destination DCs and 2 MirrorMakers doing the
> replication. We have about 4 topics with 4 partitions each. I have been
> using ConsumerOffsetChecker to analyze the lag as I tune.
>
> 1. num.streams: We have set num.streams=2 so that the 4 partitions are
>    shared between the 2 MirrorMakers. Increasing num.streams beyond this
>    did not improve performance; is this expected?
> 2. num.producers: We initially set num.producers=4 (assuming one producer
>    thread per topic), then bumped it to num.producers=16, but did not see
>    any improvement in performance. Is this expected? How do we determine
>    the optimum value for num.producers?
> 3. socket.buffersize: We initially had the default values; then I set
>    socket.send.buffer.bytes on the source brokers;
>    socket.receive.buffer.bytes and fetch.message.max.bytes in the
>    MirrorMaker consumer properties; and socket.receive.buffer.bytes and
>    socket.request.max.bytes on the destination brokers, all to
>    1024*1024*1024 (1073741824). This did improve performance, but I could
>    not get the lag below 100. (These settings are sketched after this
>    message.)
>
> Here is how our lag looks after the above changes:
>
> Group            Topic             Pid  Offset      logSize     Lag    Owner
> mirrormakerProd  FunnelProto       0    554704539   554717088   12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  FunnelProto       1    547370573   547383136   12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  FunnelProto       2    553124930   553125742   812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  FunnelProto       3    552990834   552991650   816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  agent             0    35438       35440       2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  agent             1    35447       35448       1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  agent             2    35375       35375       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  agent             3    35336       35336       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  internal_metrics  0    1930852823  1930917418  64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  internal_metrics  1    1937237324  1937301841  64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  internal_metrics  2    1945894901  1945904067  9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  internal_metrics  3    1946906932  1946915928  8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> mirrormakerProd  jmx               0    485270038   485280882   10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> mirrormakerProd  jmx               1    486363914   486374759   10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> mirrormakerProd  jmx               2    491783842   491784826   984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> mirrormakerProd  jmx               3    485675629   485676643   1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
>
> In the MirrorMaker logs, I see that topic metadata is fetched every 10
> minutes and the producer connections are re-established. Is this normal?
> If it is continuously producing, why does it need to reconnect to the
> destination brokers?
> What else can we tune to bring the lag under 100? This is just a small
> data set we are currently testing with; the real production traffic will
> be much larger. How can we compute the optimum configuration as the
> traffic increases?
>
> Thanks for the help,
>
> Thanks,
> Raja.
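For reference, a minimal sketch of the buffer settings and the lag check described in the message above, assuming 0.8-era property names and tooling; the ZooKeeper address is a placeholder:

    # Source/destination broker server.properties (values from the message above).
    socket.send.buffer.bytes=1073741824        # source brokers
    socket.receive.buffer.bytes=1073741824     # source and destination brokers
    socket.request.max.bytes=1073741824        # destination brokers

    # MirrorMaker consumer.properties.
    socket.receive.buffer.bytes=1073741824
    fetch.message.max.bytes=1073741824

    # Per-partition lag, as in the table above (ConsumerOffsetChecker ships
    # with 0.8; --zkconnect points at the source cluster's ZooKeeper).
    bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
      --group mirrormakerProd --zkconnect zk1:2181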