Hi Johnny,

If committing offsets puts this much load on the cluster, you might want to consider committing them elsewhere, for example a key-value store. Or, if you send the data you read from Kafka to a transactional store, you can write the offsets there as well, together with the data.
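To make it more concrete, here is a rough sketch of the pattern I mean (the class, topic and broker names are placeholders, and the in-memory OffsetStore below just stands in for your real key-value or transactional store; ideally the result and the offset would be written in one transaction). The idea is to disable auto-commit, seek to the externally stored offsets when partitions are assigned, and save the next offset together with the processed result, so none of this traffic hits __consumer_offsets:

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ExternalOffsetConsumer {

    // Stand-in for your external store. In practice this would read/write a
    // key-value store, or share a transaction with wherever the data goes.
    static class OffsetStore {
        private final Map<TopicPartition, Long> offsets = new ConcurrentHashMap<>();
        long readOffset(TopicPartition tp) { return offsets.getOrDefault(tp, 0L); }
        void saveOffsetWithData(TopicPartition tp, long nextOffset, String result) {
            // Ideally the result and the offset are written in one transaction.
            offsets.put(tp, nextOffset);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder
        props.put("group.id", "my-group");                 // still used for rebalancing
        props.put("enable.auto.commit", "false");          // never commit to __consumer_offsets
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        OffsetStore store = new OffsetStore();
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        consumer.subscribe(Collections.singletonList("busy-topic"),
            new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Nothing to commit to Kafka; the offsets already live in the store.
                }
                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Resume each assigned partition from the externally stored offset.
                    for (TopicPartition tp : partitions) {
                        consumer.seek(tp, store.readOffset(tp));
                    }
                }
            });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(500);
            for (ConsumerRecord<String, String> record : records) {
                String result = record.value().toUpperCase();  // placeholder "processing"
                TopicPartition tp = new TopicPartition(record.topic(), record.partition());
                store.saveOffsetWithData(tp, record.offset() + 1, result);
            }
        }
    }
}

If the offsets live in the same transactional store as the output, you also get a clean restart point after a failure, without duplicates.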
I hope this helps,
Andras

On Wed, Mar 21, 2018 at 1:28 AM, Johnny Lou <john...@campaignmonitor.com> wrote:

> Hi Andras,
>
> Thanks for that information. Handcrafting the group.id to make sure the
> groups are spread across brokers is one way; I will give that a go.
>
> I understand the benefit of consumer groups; my concern at the moment is
> the potential to create a hot spot on one of the brokers.
>
> Thanks,
>
> Johnny Luo
>
> On 20/3/18, 10:03 pm, "Andras Beni" <andrasb...@cloudera.com> wrote:
>
> Hi Johnny,
>
> As you already mentioned, it depends on the group.id which broker will be
> the group leader.
> You can change the group.id to modify which __consumer_offsets partition
> the group will belong to, and thus change which broker will manage the
> group. You can check which partition a group.id is assigned to using
>
> Utils.toPositive(Utils.murmur2(groupIdAsByteArray)) % partitionCount
>
> A consumer group is a way to distribute work across equivalent consumers.
> I would assume it is a good idea, but it depends on your architecture and
> use case.
>
> Best regards,
> Andras
>
> On Sat, Mar 17, 2018 at 12:55 PM, Johnny Luo <john...@campaignmonitor.com>
> wrote:
>
> > Hello,
> >
> > We are running a 16-node Kafka cluster on AWS. Each node is an m4.xlarge
> > EC2 instance with a 2TB EBS (ST1) disk. The Kafka version is 0.10.1.0,
> > and we have about 100 topics at the moment. Some busy topics get about
> > 2 billion events every day, while some low-volume topics only get
> > thousands per day.
> >
> > Most of our topics use a UUID as the partition key when we produce the
> > messages, so the partitions are quite evenly distributed.
> >
> > We have quite a lot of consumers consuming from this cluster using
> > consumer groups. Each consumer has a unique group id. Some consumer
> > groups commit offsets every 500ms, some commit offsets synchronously as
> > soon as they finish processing a batch of messages.
> >
> > Recently we observed that some of the brokers are far busier than the
> > others. With some digging, we found out that quite a lot of the traffic
> > goes to "__consumer_offsets", so we created a tool to see the high
> > watermark of each partition in "__consumer_offsets", which revealed
> > that the partitions are very unevenly distributed.
> >
> > Based on this link "Consumer offset management in Kafka"
> >
> > it seems this is intended behaviour: each consumer group has only one
> > leader, so committed offsets all need to go to this leader, and only
> > "group.id" is used to decide the partition.
> >
> > Given that we have some consumers consuming from those very busy topics,
> > their offset commits will cause a lot of traffic to the
> > "__consumer_offsets" topic on the broker that handles the consumer
> > group.
> >
> > My questions are:
> > 1. Is there a way we can make sure that the consumer groups that consume
> > from busy topics don't fall onto the same broker? We don't want to
> > create a hotspot.
> > 2. For consumers that consume from busy topics (topics that have
> > billions of messages per day), is it a good idea to use consumer groups?
> >
> > Thanks in advance,
> >
> > Johnny Luo