Hey guys,

I was running the end-to-end latency test (kafka.TestEndToEndLatency) and
saw something a little weird. This test runs a producer and a consumer,
sends a single message at a time, and measures the round-trip time from the
producer's send to the consumer receiving the message.
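
For anyone who hasn't read the test, the measurement loop has roughly the
following shape. This is a minimal in-process sketch, not the actual
kafka.TestEndToEndLatency code: the two queues and the `measure_round_trips`
name are hypothetical stand-ins for the broker and the test harness.

```python
import queue
import threading
import time


def measure_round_trips(num_messages, report_every=1000):
    """Send one message at a time and time each send-to-receive round trip."""
    to_broker = queue.Queue()    # hypothetical stand-in for the produce path
    to_consumer = queue.Queue()  # hypothetical stand-in for the fetch path

    def broker():
        # Forward each message to the consumer side until told to stop.
        while True:
            msg = to_broker.get()
            if msg is None:
                break
            to_consumer.put(msg)

    worker = threading.Thread(target=broker, daemon=True)
    worker.start()

    latencies_ms = []
    for i in range(num_messages):
        start = time.perf_counter()
        to_broker.put(i)              # producer send
        received = to_consumer.get()  # block until the consumer sees it
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        assert received == i          # exactly one message in flight
        latencies_ms.append(elapsed_ms)
        if (i + 1) % report_every == 0:
            print(f"{i + 1}\t{elapsed_ms:.1f} ms")

    to_broker.put(None)  # stop the broker thread
    worker.join()
    return latencies_ms
```

Because only one message is ever in flight, any stall on the broker side shows
up directly in the measured round trip.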

With replication-factor=1 I see very consistent performance, with end-to-end
latency at 0.4-0.5 ms, which is extremely good.

But with replication-factor=2 I see something like this:

count   latency
1000    1.9 ms
2000    1.8 ms
3000    1.4 ms
4000    1.7 ms
5000    102.6 ms
6000    101.4 ms
7000    102.4 ms
8000    1.6 ms
9000    101.5 ms

This pattern is very reproducible: essentially every 4-5k messages things
slow down to an average round trip of 100 ms and then pick back up again.

Note that this test is not using the new producer.

Have we seen this before? The issue could be in the producer
acknowledgement, the high watermark advancement, or the fetch request, but I
notice that the default fetch max wait is 100 ms, which makes me think there
is a bug in the async request handling that causes it to wait until the
timeout. Any ideas? If not I'll file a bug...
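
To make that hypothesis concrete, here is a toy model of a long-poll fetch,
assuming the fetch is parked until either new data is signalled or the max
wait elapses. It is a sketch in Python, not Kafka's actual request handling:
`ParkedFetch`, `run_fetch`, and the 2 ms data delay are hypothetical.

```python
import threading
import time


class ParkedFetch:
    """Toy long-poll: returns when data is signalled or max_wait_s elapses."""

    def __init__(self, max_wait_s):
        self.max_wait_s = max_wait_s
        self._ready = threading.Event()

    def signal_data(self):
        # Called when new data becomes visible to the fetcher.
        self._ready.set()

    def wait(self):
        start = time.perf_counter()
        woke = self._ready.wait(self.max_wait_s)  # False if it timed out
        return woke, time.perf_counter() - start


def run_fetch(max_wait_s, data_delay_s, notify):
    """Park one fetch; optionally signal it after data_delay_s seconds."""
    fetch = ParkedFetch(max_wait_s)
    if notify:
        threading.Timer(data_delay_s, fetch.signal_data).start()
    # With notify=False the wake-up is "lost" and only the timeout fires.
    return fetch.wait()
```

With `run_fetch(0.1, 0.002, True)` the fetch returns shortly after the data
arrives; with `notify=False` it returns only once the 100 ms max wait expires,
which is the kind of behaviour that would produce the slow round trips above.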

-Jay
