I apologize for sending this to dev. Reposting to the Users mailing list.

---------- Forwarded message ----------
I was investigating some performance issues we're seeing in one of our production clusters, and I ran into extremely unbalanced partitions in the __consumer_offsets topic. I only pasted the top 8 below, out of 50 total. As you can see, the brokers hosting the top 4 partitions have to handle about 84% of the commit volume, and brokers 9 and 10 show up repeatedly as leader as well as replicas.

Partition  Offsets         Percentage  Leader  Replicas   ISR
6          52,761,610,477  34.24%      10      (10,6,7)   (7,6,10)
5          46,196,021,230  29.98%      9       (9,5,6)    (5,6,9)
42         17,530,298,423  11.38%      10      (10,9,11)  (10,11,9)
31         12,927,081,106  8.39%       11      (11,9,10)  (10,11,9)
0          8,557,903,671   5.55%       4       (4,12,1)   (4,12,1)
2          3,969,232,652   2.58%       6       (6,2,3)    (6,3,2)
49         3,555,754,347   2.31%       5       (5,11,7)   (5,7,11)
33         2,273,951,745   1.48%       1       (1,11,12)  (1,12,11)

Those brokers (9, 10 and 11) also happen to be the ones we're having performance issues with. We can't be sure yet whether this is the cause of the performance issues, but it's looking extremely likely.

So, I was wondering, what can be done to "rebalance" these consumer offsets? As far as I know, this assignment was decided automatically; I don't believe we ever changed a setting related to it. I also don't believe we can influence which partition gets which offsets when consuming.

It would also be interesting to know what algorithm or pattern is used to decide the consumer offset partition, and whether this is something we can change or influence.

Thanks,
Marcos Juarez
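
P.S. For anyone who wants to check the numbers, here is a minimal Python sketch that adds up the percentage column from the table above and counts how often each broker appears in the replica lists of the busiest partitions. The row data is copied from the table; the function names (`cumulative_share`, `replica_counts`) are only illustrative and not part of any Kafka tooling.

```python
from collections import Counter

# (partition, percentage of commits, leader, replicas), copied from the table above
ROWS = [
    (6, 34.24, 10, (10, 6, 7)),
    (5, 29.98, 9, (9, 5, 6)),
    (42, 11.38, 10, (10, 9, 11)),
    (31, 8.39, 11, (11, 9, 10)),
    (0, 5.55, 4, (4, 12, 1)),
    (2, 2.58, 6, (6, 2, 3)),
    (49, 2.31, 5, (5, 11, 7)),
    (33, 1.48, 1, (1, 11, 12)),
]


def cumulative_share(rows, k):
    """Sum the percentage column over the k busiest partitions."""
    return round(sum(pct for _, pct, _, _ in rows[:k]), 2)


def replica_counts(rows, k):
    """Count how many of the k busiest partitions each broker holds a replica of."""
    counts = Counter()
    for _, _, _, replicas in rows[:k]:
        counts.update(replicas)
    return counts


if __name__ == "__main__":
    for k in (4, 5, 8):
        print(f"top {k} partitions: {cumulative_share(ROWS, k)}% of commits")
    print("replica appearances in top 4:", dict(replica_counts(ROWS, 4)))
```

On the eight rows listed, brokers 9 and 10 each hold a replica of three of the four busiest partitions.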