Hi Johnny,

If committing offsets puts this much load on the cluster, you might want to
consider committing them elsewhere, for example a key-value store. Or, if you
send the data you read from Kafka to a transactional store, you can write the
offsets there as well.
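
A rough sketch of that second approach, assuming a JDBC-style store; the
readStoredOffset / insertEvent / upsertOffset helpers are hypothetical. The
point is that the data and its offset commit in one transaction, and the
consumer seeks back to the stored offset on startup:

    import java.sql.Connection;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // Sketch only: keep offsets in the same transactional store as the
    // processed data, so nothing is committed to __consumer_offsets.
    void run(KafkaConsumer<String, String> consumer, Connection conn,
             TopicPartition partition) throws Exception {
        consumer.assign(Collections.singletonList(partition));
        consumer.seek(partition, readStoredOffset(conn, partition)); // resume from the DB

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(1000);
            conn.setAutoCommit(false);
            for (ConsumerRecord<String, String> record : records) {
                insertEvent(conn, record.value());                  // the data itself
                upsertOffset(conn, partition, record.offset() + 1); // next offset to read
            }
            conn.commit(); // data and offset become durable atomically
        }
    }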

I hope this helps,
Andras

On Wed, Mar 21, 2018 at 1:28 AM, Johnny Luo <john...@campaignmonitor.com>
wrote:

> Hi Andras,
>
>    Thanks for that information. Handcrafting the group.id to make sure the
> groups spread across the brokers is one way; I will give that a go.
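>
> A quick brute-force sketch of that handcrafting, using the hash Andras
> describes below (assuming the default 50 __consumer_offsets partitions; the
> base name and wantedPartition are made up for illustration):
>
>     // Try numeric suffixes until a candidate group.id lands on the desired
>     // __consumer_offsets partition (and hence a different broker).
>     int wantedPartition = 7; // hypothetical target partition
>     for (int i = 0; i < 10000; i++) {
>         String candidate = "billing-consumer-" + i; // hypothetical base name
>         int hash = candidate.hashCode();
>         int p = (hash == Integer.MIN_VALUE ? 0 : Math.abs(hash)) % 50;
>         if (p == wantedPartition) {
>             System.out.println("use group.id " + candidate);
>             break;
>         }
>     }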
>
>   I understand the benefit of consumer groups; my concern at the moment is
> the potential of creating a hotspot on one of the brokers...
>
> Thanks,
>
> Johnny Luo
>
> On 20/3/18, 10:03 pm, "Andras Beni" <andrasb...@cloudera.com> wrote:
>
>     Hi Johnny,
>
>     As you already mentioned, it depends on the group.id which broker will
>     be the group coordinator.
>     You can change the group.id to modify which __consumer_offsets partition
>     the group will belong to, and thus change which broker will manage the
>     group. You can check which partition a group.id is assigned to using
>
>     Math.abs(groupId.hashCode()) % partitionCount
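>
>     For example (this mirrors GroupMetadataManager.partitionFor in the
>     Kafka sources; 50 is the default offsets.topic.num.partitions, and the
>     group name is made up):
>
>         // Which __consumer_offsets partition (and hence which coordinator
>         // broker) a given group.id maps to.
>         public class GroupOffsetsPartition {
>             public static void main(String[] args) {
>                 String groupId = "my-busy-group"; // hypothetical group.id
>                 int partitionCount = 50;          // offsets.topic.num.partitions
>                 int hash = groupId.hashCode();
>                 int partition =
>                     (hash == Integer.MIN_VALUE ? 0 : Math.abs(hash)) % partitionCount;
>                 System.out.println(groupId + " -> __consumer_offsets-" + partition);
>             }
>         }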
>
>     A consumer group is a way to distribute work across equivalent
>     consumers. I would assume it is a good idea, but it depends on your
>     architecture and use case.
>
>     Best regards,
>     Andras
>
>     On Sat, Mar 17, 2018 at 12:55 PM, Johnny Luo <john...@campaignmonitor.com>
>     wrote:
>
>     > Hello,
>     >
>     > We are running a 16-node Kafka cluster on AWS; each node is an
>     > m4.xlarge EC2 instance with a 2TB EBS (st1) volume. The Kafka version
>     > is 0.10.1.0, and we have about 100 topics at the moment. Some busy
>     > topics see about 2 billion events every day, while some low-volume
>     > topics only see thousands per day.
>     >
>     > Most of our topics use a UUID as the partition key when we produce
>     > messages, so the partitions are quite evenly distributed.
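>     >
>     > (Roughly like this; the topic name and payload are illustrative:)
>     >
>     >     // Keyed by a random UUID, so the default partitioner's key hash
>     >     // spreads records evenly across the topic's partitions.
>     >     String key = UUID.randomUUID().toString();
>     >     producer.send(new ProducerRecord<>("page-views", key, payload));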
>     >
>     > We have quite a lot of consumers consuming from this cluster using
>     > consumer groups. Each consumer has a unique group id. Some consumer
>     > groups commit offsets every 500ms, while others commit offsets
>     > synchronously as soon as they finish processing a batch of messages.
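>     >
>     > (The two styles look roughly like this; the properties and the
>     > processBatch handler are illustrative, not our exact code:)
>     >
>     >     // Style 1: automatic commits on an interval
>     >     props.put("enable.auto.commit", "true");
>     >     props.put("auto.commit.interval.ms", "500");
>     >
>     >     // Style 2: explicit synchronous commit after each batch
>     >     ConsumerRecords<String, String> records = consumer.poll(1000);
>     >     processBatch(records);   // hypothetical handler
>     >     consumer.commitSync();   // commit once the batch is fully processed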
>     >
>     > Recently we observed that some of the brokers are far busier than the
>     > others. With some digging, we found out that quite a lot of traffic
>     > goes to "__consumer_offsets", so we built a tool to look at the high
>     > watermark of each partition in "__consumer_offsets", which revealed
>     > that the partitions are very unevenly distributed.
>     >
>     > Based on the link "Consumer offset management in Kafka", it seems this
>     > is intended behaviour: each consumer group has only one leader, so all
>     > committed offsets need to go to that leader, and only the "group.id" is
>     > used to decide the partition.
>     >
>     > Given that we have some consumers consuming from those very busy
>     > topics, the offset commits will generate a lot of traffic to the
>     > "__consumer_offsets" topic on the broker that handles the consumer
>     > group.
>     >
>     > My questions are:
>     > 1. Is there a way we can make sure that the consumer groups that
>     > consume from busy topics don't all fall onto the same broker? We don't
>     > want to create a hotspot.
>     > 2. For consumers that consume from busy topics (topics with billions of
>     > messages per day), is it a good idea to use consumer groups?
>     >
>     > Thanks in advance
>     >
>     > Johnny Luo
>     >
