Hi Sophie,

Thanks for your clarification. :)

Luke
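P.S. For anyone who finds this thread later, here is a minimal sketch of the
workaround Sophie describes below: restart the consumers with the eager
StickyAssignor (or another non-cooperative assignor) configured. The broker
address, group id, topic, and class name are placeholders, not values from
this thread.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.StickyAssignor;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class StickyWorkaroundConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            // Per the discussion below, the eager StickyAssignor tracks its
            // previous assignment itself rather than reading "ownedPartitions"
            // from the (possibly corrupted) SubscriptionState, so it avoids
            // the bug tracked in KAFKA-12984.
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                    StickyAssignor.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                // ... normal poll loop ...
            }
        }
    }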
Sophie Blee-Goldman <sop...@confluent.io.invalid> wrote on Thu, Jun 24, 2021 at 8:00 AM:

> Just to clarify, this bug actually does impact only the cooperative-sticky
> assignor. The cooperative-sticky assignor gets its "ownedPartitions" input
> from the (possibly corrupted) SubscriptionState, while the plain sticky
> assignor has to rely on keeping track of these partitions itself, since in
> eager rebalancing the "ownedPartitions" are always empty during a
> rebalance. So you can safely use the regular sticky assignor to avoid this
> issue.
>
> On Wed, Jun 23, 2021 at 4:38 PM Luke Chen <show...@gmail.com> wrote:
>
> > Hi Tao,
> > 1. So this bug only applies to the cooperative-sticky assignor?
> > --> Yes, this bug only applies to the sticky assignors (both eager and
> > cooperative), since they refer to the consumer's previous assignment.
> >
> > 2. Does the assignor strategy (cooperative-sticky vs sticky vs others)
> > really matter in this case?
> > --> No, the assignor strategy won't affect the at-most-once semantics.
> > They are independent concepts.
> >
> > That is, to work around this issue, please change to a non-sticky
> > assignor until the bug is fixed.
> >
> > Thank you.
> > Luke
> >
> > Tao Huang <sandy.huang...@gmail.com> wrote on Wed, Jun 23, 2021 at 9:34 PM:
> >
> > > Thank you Sophie for sharing the details.
> > >
> > > So this bug only applies to the cooperative-sticky assignor? Should I
> > > switch to another strategy (e.g. StickyAssignor) while I am waiting
> > > for the fix?
> > >
> > > On the other hand, my application is using the "auto-commit" mechanism
> > > for "at most once" event consuming. Does the assignor strategy
> > > (cooperative-sticky vs sticky vs others) really matter in this case?
> > > My understanding is that, regardless of which strategy is used, the
> > > members in the group have to rejoin when a rebalance happens.
> > >
> > > Thanks!
> > >
> > > Tao
> > >
> > > On 2021/06/23 02:01:04, Sophie Blee-Goldman <sop...@confluent.io.INVALID>
> > > wrote:
> > > > Here's the ticket: https://issues.apache.org/jira/browse/KAFKA-12984
> > > >
> > > > And the root cause of that itself:
> > > > https://issues.apache.org/jira/browse/KAFKA-12983
> > > >
> > > > On Tue, Jun 22, 2021 at 6:15 PM Sophie Blee-Goldman <sop...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Tao,
> > > > >
> > > > > We recently discovered a bug in the way that the consumer tracks
> > > > > partition metadata which may cause the cooperative-sticky assignor
> > > > > to throw this exception in the case of a consumer that dropped out
> > > > > of the group at some point. I'm just about to file a ticket for
> > > > > it, and it should be fixed in the upcoming releases.
> > > > >
> > > > > The problem is that some consumers are claiming to own partitions
> > > > > that they no longer actually own after having dropped out. If you
> > > > > can narrow down the problematic consumers and restart them, it
> > > > > should resolve the issue. I believe you should be able to tell
> > > > > which consumers are claiming partitions they no longer own based
> > > > > on the logs, but another option is just to restart all the
> > > > > consumers (or do a rolling restart until the problem goes away).
> > > > >
> > > > > I'll follow up here with the ticket link once I've filed it.
> > > > >
> > > > > -Sophie
> > > > >
> > > > > On Tue, Jun 22, 2021 at 12:07 PM Tao Huang <sandy.huang...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks for the feedback.
> > > > >>
> > > > >> It seems the referred bug is on the server (broker) side? I just
> > > > >> checked my Kafka broker version; it is actually 2.4.1, so the bug
> > > > >> does not seem to apply to my case.
> > > > >>
> > > > >> Should I downgrade my client (Java library) version to 2.4.1?
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> On 2021/06/21 20:04:31, Ran Lupovich <ranlupov...@gmail.com> wrote:
> > > > >> > https://issues.apache.org/jira/plugins/servlet/mobile#issue/KAFKA-12890
> > > > >> >
> > > > >> > Check out this jira ticket
> > > > >> >
> > > > >> > On Mon, Jun 21, 2021 at 22:15, Tao Huang <sandy.huang...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Hi There,
> > > > >> > >
> > > > >> > > I am experiencing an intermittent issue where the consumer
> > > > >> > > group gets stuck in the "CompletingRebalance" state. When this
> > > > >> > > is happening, the client throws an error as below:
> > > > >> > >
> > > > >> > > 2021-06-18 13:55:41,086 ERROR io.mylab.adapter.KafkaListener
> > > > >> > > [edfKafkaListener:CIO.PandC.CIPG.InternalLoggingMetadataInfo]
> > > > >> > > Exception on Kafka listener (InternalLoggingMetadataInfo) -
> > > > >> > > Some partitions are unassigned but all consumers are at
> > > > >> > > maximum capacity
> > > > >> > > java.lang.IllegalStateException: Some partitions are unassigned but all consumers are at maximum capacity
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.constrainedAssign(AbstractStickyAssignor.java:248)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.assign(AbstractStickyAssignor.java:81)
> > > > >> > >   at org.apache.kafka.clients.consumer.CooperativeStickyAssignor.assign(CooperativeStickyAssignor.java:64)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor.assign(AbstractPartitionAssignor.java:66)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:589)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:686)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1100(AbstractCoordinator.java:111)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:562)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1151)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1126)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
> > > > >> > >   at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
> > > > >> > >   at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1296)
> > > > >> > >   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1237)
> > > > >> > >   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
> > > > >> > >   at io.mylab.adapter.KafkaListener.run(EdfKafkaListener.java:93)
> > > > >> > >   at java.lang.Thread.run(Thread.java:748)
> > > > >> > >
> > > > >> > > The only option to exit this state is to stop some members of
> > > > >> > > the consumer group.
> > > > >> > >
> > > > >> > > Version: 2.6.1
> > > > >> > > PARTITION_ASSIGNMENT_STRATEGY: CooperativeStickyAssignor
> > > > >> > >
> > > > >> > > Would you please advise what conditions could trigger such an
> > > > >> > > issue?
> > > > >> > >
> > > > >> > > Thanks!
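For completeness, here is a minimal sketch of the "auto-commit" / at-most-once
setup Tao describes above. The broker address, group id, topic, and class name
are placeholders, and the round-robin assignor is just one example of a
non-sticky strategy; as noted above, the delivery semantics do not depend on
the assignor.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.RoundRobinAssignor;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class AtMostOnceConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    StringDeserializer.class.getName());
            // Offsets are committed in the background, possibly before the
            // records have been processed; a crash after the commit skips
            // those records, which is what makes this "at most once".
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
            props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");
            // The assignment strategy is independent of the delivery
            // semantics; any assignor works here.
            props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                    RoundRobinAssignor.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s-%d@%d: %s%n", record.topic(),
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }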