The bottleneck can be CPU, network, or disk I/O; you need to monitor the load on each. For example, if you monitor per-thread CPU load in MM, you can figure out whether a single thread is the bottleneck. Then you can look at the I/O load on the target broker and see whether I/O is saturated. If not, increasing the batch size in the producer will likely help.
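As a rough sketch of how that monitoring might look (the commands assume a Linux host and a 0.8-era MirrorMaker started via kafka.tools.MirrorMaker; the property names in the comments are illustrative, not settings prescribed in this thread):

    # Per-thread CPU usage of the MirrorMaker JVM; a hot thread id can be
    # cross-referenced with a jstack dump to see whether a single consumer
    # or producer thread is the bottleneck.
    top -H -p "$(pgrep -f kafka.tools.MirrorMaker | head -1)"

    # Disk I/O saturation on the target broker (watch %util and await).
    iostat -x 5

    # If I/O is not saturated, a larger async batch in the MirrorMaker
    # producer config may help, e.g. in producer.properties (0.8 names):
    #   producer.type=async
    #   batch.num.messages=1000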
Refreshing metadata is only for existing topics. The producer always refreshes metadata on new topics that it has never seen.

Thanks,

Jun

On Fri, Aug 23, 2013 at 7:08 AM, Rajasekar Elango <rela...@salesforce.com> wrote:

> Thanks Jun,
>
> What troubleshooting steps can we take to identify whether the bottleneck is with
> consuming or producing? Does changing anything in the log4j configuration, or
> a JMX mbean, provide insight into it? Does the metadata refresh interval
> affect picking up new partitions only for existing topics, or does it also affect
> picking up new topics?
>
> Thanks,
> Raja.
>
> ---------- Forwarded message ----------
> From: Jun Rao <jun...@gmail.com>
> Date: Fri, Aug 23, 2013 at 12:08 AM
> Subject: Re: Tuning mirror maker performance
> To: "users@kafka.apache.org" <users@kafka.apache.org>
>
>
> You have to determine whether the bottleneck is in the consumer or the
> producer.
>
> To improve the performance of the former, you can increase the total # of
> consumer streams. The # of streams is capped by the total # of partitions, so you
> may need to increase the # of partitions.
>
> To improve the performance of the latter, you can (a) increase the batch
> size in async mode and/or (b) run more instances of producers.
>
> The metadata refresh interval is configurable. It's mainly for the producer to
> pick up newly available partitions.
>
> Thanks,
>
> Jun
>
>
> On Thu, Aug 22, 2013 at 1:44 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
>
> > I am trying to tune mirrormaker configurations based on this doc:
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+mirroring+(MirrorMaker)#Kafkamirroring%28MirrorMaker%29-Consumerandsourceclustersocketbuffersizes
> > and would like to know your recommendations.
> >
> > Our configuration: We are doing inter-datacenter replication with 5 brokers
> > in the source and destination DCs and 2 mirrormakers doing the replication. We have
> > about 4 topics with 4 partitions each.
> > I have been using ConsumerOffsetChecker to analyze lag while tuning.
> >
> > 1. num.streams: We have set num.streams=2 so that the 4 partitions will
> >    be shared between the 2 mirrormakers. Increasing num.streams beyond this did
> >    not improve performance; is this correct?
> > 2. num.producers: We initially set num.producers=4 (assuming one
> >    producer thread per topic), then bumped num.producers to 16, but did
> >    not see any improvement in performance. Is this correct? How do we
> >    determine the optimum value for num.producers?
> > 3. socket.buffersize: We initially had default values for these, then
> >    I changed socket.send.buffer.bytes on the source broker;
> >    socket.receive.buffer.bytes and fetch.message.max.bytes in the mirrormaker
> >    consumer properties; and socket.receive.buffer.bytes and
> >    socket.request.max.bytes on the destination broker, all to
> >    1024*1024*1024 (1073741824). This did improve the performance, but I
> >    could not get the lag to < 100.
> >
> > Here is how our lag looks after the above changes:
> >
> > Group            Topic             Pid  Offset      logSize     Lag    Owner
> > mirrormakerProd  FunnelProto       0    554704539   554717088   12549  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd  FunnelProto       1    547370573   547383136   12563  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd  FunnelProto       2    553124930   553125742   812    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd  FunnelProto       3    552990834   552991650   816    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd  agent             0    35438       35440       2      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd  agent             1    35447       35448       1      mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd  agent             2    35375       35375       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd  agent             3    35336       35336       0      mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd  internal_metrics  0    1930852823  1930917418  64595  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd  internal_metrics  1    1937237324  1937301841  64517  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd  internal_metrics  2    1945894901  1945904067  9166   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd  internal_metrics  3    1946906932  1946915928  8996   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> > mirrormakerProd  jmx               0    485270038   485280882   10844  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-0
> > mirrormakerProd  jmx               1    486363914   486374759   10845  mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1377192412490-38a53dc9-1
> > mirrormakerProd  jmx               2    491783842   491784826   984    mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-0
> > mirrormakerProd  jmx               3    485675629   485676643   1014   mirrormakerProd_ops-mmrs1-2-asg.ops.sfdc.net-1377193322178-7262ed87-1
> >
> > In the mirrormaker logs, I see that topic metadata is fetched every 10 minutes and
> > the connection to the producers is re-established for producing. Is this normal? If
> > it's continuously producing, why does it need to reconnect to the destination
> > brokers for producing?
> > What else can we tune to bring the lag to < 100? This is just a small set of
> > data we are currently testing with; the real production traffic will be much
> > larger. How can we compute the optimum configuration as data traffic increases?
> >
> > Thanks for help,
> >
> > Thanks,
> > Raja.
> >
>
>
> --
> Thanks,
> Raja.
>
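Putting the knobs discussed in this thread together, a minimal sketch of a 0.8-era MirrorMaker setup might look like the following. The flag and property names assume that version, and the host names and values shown are illustrative assumptions only, not settings recommended anywhere above:

    # Run MirrorMaker with separate consumer (source) and producer (target) configs.
    bin/kafka-run-class.sh kafka.tools.MirrorMaker \
      --consumer.config source-cluster.consumer.properties \
      --producer.config target-cluster.producer.properties \
      --num.streams 4 \
      --num.producers 4 \
      --whitelist '.*'

    # source-cluster.consumer.properties (consumer side of the mirror; illustrative values)
    #   zookeeper.connect=source-zk:2181
    #   group.id=mirrormakerProd
    #   socket.receive.buffer.bytes=2097152
    #   fetch.message.max.bytes=10485760

    # target-cluster.producer.properties (producer side of the mirror; illustrative values)
    #   metadata.broker.list=target-broker1:9092,target-broker2:9092
    #   producer.type=async
    #   batch.num.messages=1000
    #   queue.buffering.max.ms=500

As noted earlier in the thread, the number of useful streams is capped by the total number of partitions, so raising --num.streams only helps if there are partitions left to assign.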