Hi Girish,

Thanks for raising the discussion.

I can confirm that your understanding is correct and that the document is confusing. If four consumers are connected to a partitioned topic with two partitions, each partition will have four connected consumers but only one active consumer. The document's statement that two consumers are connected to each partition is wrong. We will try to improve the document, and your contribution is welcome if you want to improve it.

As for the consumer shift on a partition whose active consumer has not failed, I think it is a load-balancing consideration. Kafka has a consumer group coordinator, which can balance traffic between consumers, but Pulsar doesn't have one. So Pulsar has to reassign the active consumer whenever a consumer leaves, regardless of whether that consumer was active. Frankly, it's not the best policy for all cases. IMO, Pulsar could also offer different policies for assigning active consumers to meet different requirements.

Do you have a real case that the unnecessary consumer shift impacts? That would help us understand the value of introducing different policies. All I can think of at the moment are load balance (if the traffic of the partitions differs greatly) and duplicated messages when switching the active consumer.

Regards,
Penghui

On Mon, Jul 3, 2023 at 2:49 PM Girish Sharma <scrapmachi...@gmail.com> wrote:

> Bumping this up. Would really like to discuss this in the community.
>
> Regards
>
> On Wed, Jun 28, 2023 at 11:49 PM Girish Sharma <scrapmachi...@gmail.com> wrote:
>
> > Hi everyone, I am trying to understand the failover subscription logic a bit more in detail.
> > Specifically, the doc <https://pulsar.apache.org/docs/3.0.x/concepts-messaging/#failover> mentions this part for partitioned topics:
> >
> > *If the number of partitions in a partitioned topic is less than the number of consumers: For example, in the diagram below, this partitioned topic has 2 partitions and there are 4 consumers. Each partition has 1 active consumer and 1 stand-by consumer.*
> >
> > - *For p0, consumer A is the master consumer, while consumer B would be the next consumer in line to receive messages if consumer A is disconnected.*
> > - *For p1, consumer C is the master consumer, while consumer D would be the next consumer in line to receive messages if consumer C is disconnected.*
> >
> > So, as per this, since all four consumers (A, B, C, D) connect to both partitions p0 and p1, the size of the consumers array in AbstractDispatcherSingleActiveConsumer should be 4. Now, based on the consumer index selection logic spanning lines 126-130 <https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractDispatcherSingleActiveConsumer.java#L126-L130>, the consumer index assigned to p0 should be 0 (i.e. A) and to p1 should be 1 (i.e. B). I am assuming here that all 4 consumers have the same priority.
> >
> > Now consider consumer B getting disconnected. The remaining consumer array is (A, C, D). In this case, p1 gets a new consumer at index 1 % 3 = 1, i.e. consumer C. p0's consumer remains the same: 0 % 3 = 0, i.e. A.
> >
> > Next, consider that consumer A also goes down. The remaining consumer array is (C, D). In this case, p0 gets a new consumer at index 0 % 2 = 0, i.e. consumer C, and p1 is shifted to index 1 % 2 = 1, i.e. consumer D. Even though p1's active consumer was untouched, p1 got a consumer shift.
> >
> > So I have a couple of questions:
> >
> > - Am I missing something?
> >   Is my understanding of the logic correct?
> > - If yes, why does the doc say what it says? And why change p1's consumer uselessly in the above example?
> >
> > Regards
> > --
> > Girish Sharma
>
> --
> Girish Sharma
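
For reference, the index arithmetic walked through in the quoted message can be sketched in Python as below. This is only a simplified model of the scenario described in the thread, not the broker code: it assumes all consumers have the same priority, that the consumer list keeps its connection order, and that the active consumer for a partition is the one at `partition_index % len(consumers)`. The names `pick_active` and `assignments` are hypothetical and exist only for this illustration.

```python
def pick_active(consumers, partition_index):
    """Return the active consumer for a partition under the modulo rule
    discussed in the thread, or None if no consumer is connected.
    Assumption: equal priority, list order == connection order."""
    if not consumers:
        return None
    return consumers[partition_index % len(consumers)]


def assignments(consumers, num_partitions=2):
    """Map each partition name (p0, p1, ...) to its active consumer."""
    return {f"p{i}": pick_active(consumers, i) for i in range(num_partitions)}


# The three states from the thread: everyone connected, then B leaves,
# then A leaves as well.
steps = [
    ("all connected", ["A", "B", "C", "D"]),
    ("B disconnects", ["A", "C", "D"]),
    ("A disconnects", ["C", "D"]),
]

for label, consumers in steps:
    print(label, assignments(consumers))
```

In the last step p1 moves from C to D even though C never disconnected, which is the shift the thread is asking about.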