Mirror mare is placed to close to target and send/receive buffer size set to 10MB which is the result of bandwidth-delay product. OS level tcp buffer has also been increased to 16MB max
On Wed, 6 Dec 2017 at 15:19 Jan Filipiak <jan.filip...@trivago.com> wrote: > Hi, > > two questions. Is your MirrorMaker collocated with the source or the > target? > what are the send and receive buffer sizes on the connections that do span > across WAN? > > Hope we can get you some help. > > Best jan > > > > On 06.12.2017 14:36, Xu, Zhaohui wrote: > > Any update on this issue? > > > > We also run into similar situation recently. The mirrormaker is > leveraged to replicate messages between clusters in different dc. But > sometimes a portion of partitions are with high consumer lag and tcpdump > also shows similar packet delivery pattern. The behavior is sort of weird > and is not self-explaining. Wondering whether it has anything to do with > the fact that number of consumers is too large? In our example, we have > around 100 consumer connections per broker. > > > > Regards, > > Jeff > > > > On 12/5/17, 10:14 AM, "tao xiao" <xiaotao...@gmail.com> wrote: > > > > Hi, > > > > any pointer will be highly appreciated > > > > On Thu, 30 Nov 2017 at 14:56 tao xiao <xiaotao...@gmail.com> wrote: > > > > > Hi There, > > > > > > > > > > > > We are running into a weird situation when using Mirrormaker to > replicate > > > messages between Kafka clusters across datacenter and reach you > for help in > > > case you also encountered this kind of problem before or have > some insights > > > in this kind of issue. > > > > > > > > > > > > Here is the scenario. We have setup a deployment where we run 30 > > > Mirrormaker instances on 30 different nodes. Each Mirrormaker > instance is > > > configure with num.streams=1 thus only one consumer runs. The > topics to > > > replicate is configure with 100 partitions and data is almost > evenly > > > distributed across all partitions. After running a period of > time, weird > > > things happened that some of the Mirrormaker instances seems to > slow down > > > and consume at a relative slow speed from source Kafka cluster. > The output > > > of tcptrack shows the consume rate of problematic instances > dropped to > > > ~1MB/s, while the other healthy instances consume at a rate of > ~3MB/s. As > > > a result, the consumer lag for corresponding partitions are going > high. > > > > > > > > > > > > > > > After triggering a tcpdump, we noticed the traffic pattern in tcp > > > connection of problematic Mirrmaker instances is very different > from > > > others. Packets flowing in problematic tcp connections are > relatively small > > > and seq and ack packets are basically coming in one after > another. On the > > > other hand, the packets in healthy tcp connections are coming in a > > > different pattern, basically several seq packets comes with an > ack packets. > > > Below screenshot shows the situation, and these two captures are > got on the > > > same mirrormaker node. > > > > > > > > > > > > problematic connection. ps. 10.kfk.kfk.kfk is kafka broker, > 10.mm.mm.mm > > > is Mirrormaker node > > > > > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fimgur.com%2FZ3odjjT&data=02%7C01%7Czhaohxu%40ebay.com%7Ca8efe84f9feb47ecb5fd08d53b85d7ac%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C636480368398154028&sdata=2DdGcjPWD7QI7lZ7v7QDN6I53P9tsSTMzEGdw6IywmU%3D&reserved=0 > > > > > > > > > healthy connection > > > > > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fimgur.com%2Fw0A6qHT&data=02%7C01%7Czhaohxu%40ebay.com%7Ca8efe84f9feb47ecb5fd08d53b85d7ac%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C636480368398154028&sdata=v52DmmY9LHN2%2F59Hb5Xo77JuLreOA3lfDyq8eHKmISQ%3D&reserved=0 > > > > > > > > > If we stop the problematic Mirrormaker instance and when other > instances > > > take over the lagged partitions, they can consume messages > quickly and > > > catch up the lag soon. So the broker in source Kafaka cluster is > supposed > > > to be good. But if Mirrormaker itself causes the issue, how can > one tcp > > > connection is good but others are problematic since the > connections are all > > > established in the same manner by Kafka library. > > > > > > > > > > > > Consumer configuration for Mirrormaker instance as below. > > > > > > auto.offset.reset=earliest > > > > > > > > > > partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor > > > > > > heartbeat.interval.ms=10000 > > > > > > session.timeout.ms=120000 > > > > > > request.timeout.ms=150000 > > > > > > receive.buffer.bytes=1048576 > > > > > > max.partition.fetch.bytes=2097152 > > > > > > fetch.min.bytes=1048576 > > > > > > > > > > > > Kafka version is 0.10.0.0 and we have Kafka and Mirrormaker run > on Ubuntu > > > 14.04 > > > > > > > > > > > > Any response is appreciated. > > > > > > Regards, > > > > > > Tao > > > > > > > > >