Hi Theo,

You can try setting replica.fetch.min.bytes to a larger value (it
defaults to 1) and increasing replica.fetch.wait.max.ms (it defaults to
500) and see if that helps. In general, with 4 fetchers and min.bytes
at 1, the replicas will effectively exchange many small packets over
the wire.
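For example, something along these lines in server.properties (the
values here are only illustrative starting points, not recommendations;
you would want to tune them against your own traffic):

    # don't answer a replica fetch request until at least this many
    # bytes have accumulated (default 1, i.e. respond immediately)
    replica.fetch.min.bytes=65536

    # ...or until this much time has passed, whichever comes first
    # (default 500)
    replica.fetch.wait.max.ms=1000

The trade-off is a little extra replication latency in exchange for
fewer, larger fetch responses. You would also want to keep
replica.fetch.wait.max.ms well below replica.lag.time.max.ms so that
followers on a quiet partition are not dropped from the ISR.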
Guozhang

On Mon, Sep 1, 2014 at 11:06 PM, Theo Hultberg <t...@iconara.net> wrote:

> Hi Guozhang,
>
> We're using the default on all of those, except num.replica.fetchers
> which is set to 4.
>
> T#
>
>
> On Mon, Sep 1, 2014 at 9:41 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Theo,
> >
> > What are the values for your "replica.fetch.max.bytes",
> > "replica.fetch.min.bytes", "replica.fetch.wait.max.ms" and
> > "num.replica.fetchers" configs?
> >
> > Guozhang
> >
> >
> > On Mon, Sep 1, 2014 at 2:52 AM, Theo Hultberg <t...@iconara.net> wrote:
> >
> > > Hi,
> > >
> > > We're evaluating Kafka, and have a problem with it using more
> > > bandwidth than we can explain. From what we can tell the replication
> > > uses at least twice the bandwidth it should.
> > >
> > > We have four producer nodes and three broker nodes. We have enabled
> > > 3x replication, so each node will get a copy of all data in this
> > > setup. The producers have Snappy compression enabled and send batches
> > > of 200 messages. The messages are around 1 KiB each. The cluster runs
> > > using mostly default configuration, and the Kafka version is 0.8.1.1.
> > >
> > > When we run iftop on the broker nodes we see that each Kafka node
> > > receives around 6-7 Mbit from each producer node (or around 25-30
> > > Mbit in total), but then sends around 50 Mbit to each other Kafka
> > > node (or 100 Mbit in total). This is twice what we expected to see,
> > > and it seems to saturate the bandwidth on our m1.xlarge machines. In
> > > other words, we expected the incoming 25 Mbit to be amplified to 50
> > > Mbit, not 100.
> > >
> > > One thing that could explain it, and that we don't really know how to
> > > verify, is that the inter-node communication is not compressed. We
> > > aren't sure about what compression ratio we get on the incoming data,
> > > but 50% sounds reasonable. Could this explain what we're seeing? Is
> > > there a configuration property to enable compression on the
> > > replication traffic that we've missed?
> > >
> > > yours
> > > Theo
> > >
> >
> >
> > --
> > -- Guozhang
> >
>

--
-- Guozhang
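For reference, the expectation in Theo's report works out as follows, as
a back-of-the-envelope sketch (the ~50% Snappy compression ratio is his
own guess from the thread, not a measured figure):

    inbound per broker:    4 producers x 6-7 Mbit/s  = ~25-30 Mbit/s
    expected replication:  2 followers x 25-30 Mbit/s = ~50-60 Mbit/s out
    observed replication:  2 brokers x ~50 Mbit/s     = ~100 Mbit/s out

If the inbound data really is compressed to about 50% of its original
size and the inter-broker fetches carried it uncompressed, the expected
outbound traffic would double: 2 x (25 / 0.5) = 100 Mbit/s, which is
consistent with what iftop shows.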