I’m comparing the performance of two clusters that are theoretically identical except for the physical networks they’re on (configs and hardware are identical, and each has a local ZooKeeper ensemble). I noticed that the slower cluster is exhibiting significantly higher times in the kafka.network.RequestMetrics.FetchFollower MBean. Notice that the percentile and mean values are roughly 2.5x to 4x higher on the “Cluster B” broker:

Metric            Cluster A broker   Cluster B broker
50thPercentile                  30                106
75thPercentile                  68                263
95thPercentile                 161                539
98thPercentile                 219                695
999thPercentile                459               1151
99thPercentile                 289                777
Count                     51591577             518675
Max                          10004               2554
Mean                            50                170
Min                              0                  0
StdDev                          59                187

Are these the times that brokers with follower partitions are taking to fetch updates from the leaders? If someone familiar with this metric can shed some light on what it’s measuring, I’d really appreciate it. I’d like to be able to characterize more specifically what this metric is measuring to better isolate the latencies (disk, network, ZooKeeper round trips, something else…?).

If there’s a guide to what the specific MBeans are exposing, pointers to that would be fantastic.

Thanks,
-Ian

--
Ian Kallen / @spidaman
“Whatever comes out of these gates, we've got a better chance of survival if we work together.” - General Maximus