Hi again – interesting discussion thanks. Coming from a fairly old school background of MOM and JMS in the 00’s I guess I found Kafka consumer groups “interesting” because:
1 – they enable message fan-out to multiple consumers – i.e. each group gets a copy of each message – this isn’t “normal” for traditional pub-sub/MOM 2 – they enable the sharing of messages for scalability and concurrency – you can easily scale consumer processing by increasing the number of consumers (and partitions) 3 – message processing order – order is only guaranteed per partition 4 – check out the parallel consumer for higher throughput with less consumers/partitions – there are more options for parallel processing and ordering Regards, Paul From: Artem Livshits <alivsh...@confluent.io.INVALID> Date: Friday, 24 January 2025 at 5:29 am To: users@kafka.apache.org <users@kafka.apache.org> Subject: Re: JoinGroup API response timing. [You don't often get email from alivsh...@confluent.io.invalid. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ] EXTERNAL EMAIL - USE CAUTION when clicking links or attachments > I wanted to understand the motivation for consumer group The motivation for the consumer group is to let Kafka manage the workload distribution among consumers. The application then can just run multiple uncoordinated instances and not worry about partitions that have no consumer assigned or consumers that are overloaded / idle -- the consumer group coordinator would coordinate partition assignment across consumers in the same consumer group. If the application doesn't need this functionality it can use KafkaConsumer.assign method to manually assign partitions to consumers and run instance coordination logic to manage liveness / workload balance by itself. > As for offsets, let them be saved locally by consumer or Redis, etc. This can work, unless we need exactly-once semantics (make sure that if we consume a record from a topic and produce a corresponding record to another topic, we either atomically do both or not do both). For exactly-once semantics, the application would need to use Kafka transactions and store offsets in Kafka. -Artem On Wed, Jan 22, 2025 at 5:56 PM Chain Head <mrchainh...@gmail.com> wrote: > I wanted to understand the motivation for consumer group. > > *If number of partitions are known in advance*, then each consumer can > subscribe to individual topic-partition. If such a consumer fails, let > kubernetes reschedule it. (Elsewhere, similar restart mechanism may exist.) > Sharing the load of a failed consumer assumes that data across partitions > are "same". Else, such sharing is needless burden. As for offsets, let them > be saved locally by consumer or Redis, etc. > > The concept of consumer group shifts the responsibility of partition > assignment to broker because only broker knows the number of partitions. > > Best regards. > > On Thu, 23 Jan, 2025, 06:25 Brebner, Paul, <paul.breb...@netapp.com > .invalid> > wrote: > > > Hi – short answer is consumers can read from a specific partition, but in > > general for a consumer group you want to balance the partitions across > the > > available consumers for high throughput – if a consumer fails or is > kicked > > off the group because it times out etc then the remainder of the > consumers > > are rebalanced across the partitions etc. Paul > > > > From: Chain Head <mrchainh...@gmail.com> > > Date: Wednesday, 22 January 2025 at 5:19 pm > > To: users@kafka.apache.org <users@kafka.apache.org> > > Subject: Re: JoinGroup API response timing. > > EXTERNAL EMAIL - USE CAUTION when clicking links or attachments > > > > > > > > > > Thanks for the explanation. > > > > Slight digression and perhaps a silly question w.r.t. consumers. > > > > Since multiple groups are possible, at a high level, the broker > effectively > > sends data of a given topic-partition to multiple consumers while keeping > > track of offsets. So, why not let consumers specify the partition ID they > > want to consume? Is the concept of consumer groups only because the > > consumers wouldn't know in advance how many partitions exist in a topic? > > > > Best regards. > > > > On Wed, Jan 22, 2025 at 7:41 AM Greg Harris <greg.har...@aiven.io.invalid > > > > wrote: > > > > > Hi, > > > > > > Thanks for the follow up. > > > > > > By "classic" I meant the protocol implemented by the > SyncGroup/JoinGroup > > > API [1]. It's a general group protocol that is still fully supported in > > > Kraft, and at this time has no intention of being deprecated. > > > It's called "classic" to distinguish it from the newer KIP-848 [2] > > > ConsumerGroupHeartbeat API and the share group protocol used in KIP-932 > > > [3]. It is my understanding that these other protocols are _not_ a > > > synchronizing barrier in the same way the "classic" protocol is. > > > > > > Hope this helps, > > > Greg > > > > > > [1] > > > > > > > > > https://urldefense.com/v3/__https://github.com/apache/kafka/blob/adb033211497e539725366960e1013a4638de59f/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/classic/ClassicGroup.java__;!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWqMDDfrw$<https://urldefense.com/v3/__https:/github.com/apache/kafka/blob/adb033211497e539725366960e1013a4638de59f/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/classic/ClassicGroup.java__;!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWqMDDfrw$> > > < > > > https://urldefense.com/v3/__https:/github.com/apache/kafka/blob/adb033211497e539725366960e1013a4638de59f/group-coordinator/src/main/java/org/apache/kafka/coordinator/group/classic/ClassicGroup.java__;!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWqMDDfrw$ > > > > > > [2] > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-848*3A*The*Next*Generation*of*the*Consumer*Rebalance*Protocol__;JSsrKysrKysr!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCVcPWDEEg$<https://urldefense.com/v3/__https:/cwiki.apache.org/confluence/display/KAFKA/KIP-848*3A*The*Next*Generation*of*the*Consumer*Rebalance*Protocol__;JSsrKysrKysr!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCVcPWDEEg$> > > < > > > https://urldefense.com/v3/__https:/cwiki.apache.org/confluence/display/KAFKA/KIP-848*3A*The*Next*Generation*of*the*Consumer*Rebalance*Protocol__;JSsrKysrKysr!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCVcPWDEEg$ > > > > > > [3] > > > > > > > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-932*3A*Queues*for*Kafka__;JSsrKw!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWxDptKnA$<https://urldefense.com/v3/__https:/cwiki.apache.org/confluence/display/KAFKA/KIP-932*3A*Queues*for*Kafka__;JSsrKw!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWxDptKnA$> > > < > > > https://urldefense.com/v3/__https:/cwiki.apache.org/confluence/display/KAFKA/KIP-932*3A*Queues*for*Kafka__;JSsrKw!!Nhn8V6BzJA!RUMrj1UUerrRHrSLYNqS64hcjRVWOfQEBJYb0SYerdVNtC92K2MjTB8p00y7-OkYac1rhR0rYeBbneBwtCWxDptKnA$ > > > > > > > > > On Tue, Jan 21, 2025 at 4:41 PM Chain Head <mrchainh...@gmail.com> > > wrote: > > > > > > > Thanks. > > > > > > > > By "classic" you mean pre-KRaft consensus? What is the "current" > Kafka > > > > Group protocol? > > > > > > > > Best regards. > > > > > > > > On Tue, Jan 21, 2025 at 9:55 PM Greg Harris > > <greg.har...@aiven.io.invalid > > > > > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > Yes you are correct. The "classic" Kafka Group Protocol is a > > > > synchronizing > > > > > barrier for all members. > > > > > All JoinGroup member responses are returned after all JoinGroup > > member > > > > > requests are received. > > > > > > > > > > Thanks, > > > > > Greg > > > > > > > > > > On Tue, Jan 21, 2025 at 7:10 AM Chain Head <mrchainh...@gmail.com> > > > > wrote: > > > > > > > > > > > Assume that three consumers of a certain group want to connect > to a > > > > > broker > > > > > > for a topic with 3 partitions. After the FindCoordinator API is > > done, > > > > the > > > > > > consumers send JoinGroup. Since the broker cannot know in advance > > how > > > > > many > > > > > > consumers are expected to join, it waits > > > > > group.initial.rebalance.delay.ms > > > > > > before starting a rebalance. > > > > > > > > > > > > Therefore, does this mean the JoinGroup API response of each > > request > > > is > > > > > > "held" until the waiting period is over? > > > > > > > > > > > > Best regards. > > > > > > > > > > > > > > > > > > > > >