Hey guys, I was running the end-to-end latency test (kafka.TestEndToEndLatency) and saw something a little weird. The test runs a producer and a consumer, sends a single message at a time, and measures the round-trip time from the producer's send to the consumer receiving the message.
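For anyone who hasn't looked at the test, the measurement loop is basically a ping-pong: send one message, block on the ack, poll until it comes back, repeat. Here is a rough sketch of that idea, written against the Java clients just for illustration (the actual test uses the old Scala clients, so the names and defaults differ, and the topic name here is made up):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PingPongLatency {
    public static void main(String[] args) throws Exception {
        String topic = "latency-test"; // hypothetical topic, created with the replication factor under test

        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("acks", "all"); // wait for the full ISR so replication is on the measured path
        p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "latency-test");
        c.put("auto.offset.reset", "latest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p);
             KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList(topic));
            consumer.poll(Duration.ofSeconds(1)); // let the consumer join the group before timing starts

            int messages = 10_000;
            long bucketNanos = 0;
            for (int i = 1; i <= messages; i++) {
                byte[] payload = ("msg-" + i).getBytes();
                long start = System.nanoTime();
                // send one message and block until the broker acknowledges it
                producer.send(new ProducerRecord<>(topic, payload)).get();
                // poll until that message comes back on the consumer side
                ConsumerRecords<byte[], byte[]> records;
                do {
                    records = consumer.poll(Duration.ofMillis(100));
                } while (records.isEmpty());
                bucketNanos += System.nanoTime() - start;

                // print the average round trip over the last 1000 messages
                if (i % 1000 == 0) {
                    System.out.printf("%d\t%.1f ms%n", i, bucketNanos / 1_000_000.0 / 1000);
                    bucketNanos = 0;
                }
            }
        }
    }
}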
With replication-factor=1 I see very consistent performance, with end-to-end latency of 0.4-0.5 ms, which is extremely good. But with replication-factor=2 I see something like this:

count   latency
1000    1.9 ms
2000    1.8 ms
3000    1.4 ms
4000    1.7 ms
5000    102.6 ms
6000    101.4 ms
7000    102.4 ms
8000    1.6 ms
9000    101.5 ms

This pattern is very reproducible: essentially every 4-5k messages things slow down to an average round trip of about 100 ms and then pick back up again. Note that this test is not using the new producer.

Have we seen this before? The issue could be in the producer acknowledgement, in the high watermark advancement, or in the fetch request, but I notice that the default fetch max wait is 100 ms, which makes me think there is a bug in the async request handling that causes it to wait until the timeout.

Any ideas? If not I'll file a bug...

-Jay
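P.S. If it really is the fetch path, shrinking the consumer's fetch max wait should move the 100 ms plateau with it. A quick sketch of the override I'd try, assuming the 0.8-era old-consumer property names (fetch.wait.max.ms, defaulting to 100 ms; the broker-side replica fetcher has its own replica.fetch.wait.max.ms knob):

import java.util.Properties;

// Hypothetical consumer overrides to test the fetch-wait theory; the property
// names are the 0.8-era old-consumer names and are an assumption on my part.
public class FetchWaitOverride {
    public static Properties consumerOverrides() {
        Properties props = new Properties();
        // default is 100 ms; if the plateau drops to ~10 ms, the fetch timeout is implicated
        props.put("fetch.wait.max.ms", "10");
        // the default: return as soon as any data at all is available
        props.put("fetch.min.bytes", "1");
        return props;
    }

    public static void main(String[] args) {
        consumerOverrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}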