Hi Richard, Yeah, the old 2.8.1 version of Kafka clients used by trunk fs2-kafka is what I think might be the issue, not the wrapper itself, sorry if I was unclear on that.
Please let us know how your testing with the latest fs2-kafka that's using 3.1.0 goes. :) Kind regards, Liam Clarke-Hutchinson On Sun, 20 Mar 2022 at 07:19, Richard Ney <kamisama....@gmail.com> wrote: > I am using the kafka-clients through the fs2-kafka wrapper. Thou the log > message I posted and copied again here > > Notifying assignor about the new Assignment(partitions=[ > platform-data.appquery-platform.aoav3.backfill-28, > platform-data.appquery-platform.aoav3.backfill-31, > platform-data.appquery-platform.aoav3.backfill-34, > platform-data.appquery-platform.aoav3.backfill-37, > platform-data.appquery-platform.aoav3.backfill-40, > platform-data.appquery-platform.aoav3.backfill-43, > platform-data.appquery-platform.aoav3.backfill-46, > > platform-data.appquery-platform.aoav3.backfill-49])","logger_name":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_name":"fs2-kafka-consumer-95","level":"INFO"} > > is not generated by the fs2-kafka wrapper. Based on the kafka-client code; > these are the log messages generated during the initial partition > assignment when my application connects to the Kafka brokers. If this list > contained the full list of partitions listed in the kafka-consumer-groups > output but the lag was increasing on 2 partitions, I would immediately > suspect the fs2-kafka wrapper as the issue. The fact that the notification > messages from the kafka-clients library to the fs2-kafka library are > missing two partitions makes me suspect the issue is in the kafka-clients > library. In this occurrence this happened on 2 of the 5 consumer instances. > The version of the kafka-clients library used by the version of the > fs2-kafka library for this test is *2.8.1*. I'm currently running another > test with the latest fs2-kafka library which is consuming the *3.1.0* > version of the kafka-clients library. Initial partition assignment was > successful. On Monday I'll do a large number of scale-up/scale-down tests > to force rebalancing of partitions to see if I can replicate the issue > using the latest version. > > On Sat, Mar 19, 2022 at 2:06 AM Liam Clarke-Hutchinson < > lclar...@redhat.com> > wrote: > > > So to clarify, you're using kafka-clients directly? Or via fx2-kafka? If > > it's kafka-clients directly, what version please? > > > > On Sat, 19 Mar 2022 at 19:59, Richard Ney <kamisama....@gmail.com> > wrote: > > > > > Hi Liam, > > > > > > Sorry for the mis-identification in the last email. The fun of > answering > > an > > > email on a phone instead of a desktop. I confirmed the upper log > > messages I > > > included in the message come from this location in the `kafka-clients` > > > library > > > > > > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java#L422 > > > > > > And it's only including 8 of the 10 partitions that were assigned to > that > > > consumer instance. > > > > > > -Richard > > > > > > On Fri, Mar 18, 2022 at 11:15 PM Richard Ney <kamisama....@gmail.com> > > > wrote: > > > > > > > Hi Ngā mihi, > > > > > > > > I believe the log entry I included was from the underlying > > kafka-clients > > > > library given that the logger identified is > > > > “org.apache.kafka.clients.consumer.internals.ConsumerCoordinator”. > I’ll > > > > admit at first I thought it also might be the fs2-kafka wrapper given > > > that > > > > the 2.4.0 version is the first version that has correct support for > the > > > > messaging from the ConsumerCoordinator. I am planning to do a test > with > > > the > > > > 3.0.0-M5 version which is built on the updated 3.1.0 kafka-clients > > > library > > > > and will let the list know. > > > > > > > > -Richard Ney > > > > > > > > Sent from my iPhone > > > > > > > > > On Mar 18, 2022, at 10:55 PM, Liam Clarke-Hutchinson < > > > > lclar...@redhat.com> wrote: > > > > > > > > > > Kia ora Richard, > > > > > > > > > > I see support for the Cooperative Sticky Assignor in fs2-kafka is > > quite > > > > > new. Have you discussed this issue with the community of that > client > > at > > > > > all? I ask because I see on GitHub that fs2-kafka is using > > > kafka-clients > > > > > 2.8.1 as the underlying client, and there's been a fair few > bugfixes > > > > around > > > > > the cooperative sticky assignor since that version. > > > > > > > > > > Could you perhaps try overriding the kafka-clients dependency of > > > > fs2-kafka > > > > > and try a higher version, perhaps 3.1.0, and see if the issue > > remains? > > > > I'm > > > > > not sure how well that'll work, but might help narrow down the > issue. > > > > > > > > > > Ngā mihi, > > > > > > > > > > Liam Clarke-Hutchinson > > > > > > > > > >> On Sat, 19 Mar 2022 at 14:24, Richard Ney <kamisama....@gmail.com > > > > > > wrote: > > > > >> > > > > >> Thanks for the additional information Bruno. Does this look like a > > > > possible > > > > >> bug in the CooperativeStickyAssignor? I have 5 consumers reading > > from > > > a > > > > 50 > > > > >> partition topic. Based on the log messages this application > instance > > > is > > > > >> only getting assigned 8 partitions but when I ask the consumer > group > > > for > > > > >> LAG information the consumer group thinks the correct number of 10 > > > > >> partitions were assigned but as should 2 partitions aren't getting > > > read > > > > due > > > > >> to the application not knowing it has them. > > > > >> > > > > >> {"timestamp":"2022-03-19T00:54:46.025Z","message":"[Consumer > > > > >> instanceId=i-0e89c9bee06f71f68, > > > > >> > > > > > > clientId=consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68, > > > > >> groupId=app-query-platform-aoa-backfill-v7] Updating assignment > > > with\n\t > > > > >> Assigned partitions: [ > > > > >> platform-data.appquery-platform.aoav3.backfill-28, > > > > >> platform-data.appquery-platform.aoav3.backfill-43, > > > > >> platform-data.appquery-platform.aoav3.backfill-31, > > > > >> platform-data.appquery-platform.aoav3.backfill-46, > > > > >> platform-data.appquery-platform.aoav3.backfill-34, > > > > >> platform-data.appquery-platform.aoav3.backfill-49, > > > > >> platform-data.appquery-platform.aoav3.backfill-40, > > > > >> platform-data.appquery-platform.aoav3.backfill-37] \n\t > > > > >> Current owned partitions: []\n\t > > > > >> > > > > >> Added partitions (assigned - owned): [ > > > > >> platform-data.appquery-platform.aoav3.backfill-28, > > > > >> platform-data.appquery-platform.aoav3.backfill-43, > > > > >> platform-data.appquery-platform.aoav3.backfill-31, > > > > >> platform-data.appquery-platform.aoav3.backfill-46, > > > > >> platform-data.appquery-platform.aoav3.backfill-34, > > > > >> platform-data.appquery-platform.aoav3.backfill-49, > > > > >> platform-data.appquery-platform.aoav3.backfill-40, > > > > >> platform-data.appquery-platform.aoav3.backfill-37]\n\t > > > > >> > > > > >> Revoked partitions (owned - assigned): > > > > >> > > > > >> > > > > > > > > > > []\n","logger_name":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_name":"fs2-kafka-consumer-95","level":"INFO"} > > > > >> > > > > >> > > > > >> {"timestamp":"2022-03-19T00:54:46.026Z","message":"[Consumer > > > > >> instanceId=i-0e89c9bee06f71f68, > > > > >> > > > > > > clientId=consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68, > > > > >> groupId=app-query-platform-aoa-backfill-v7] > > > > >> > > > > >> Notifying assignor about the new Assignment(partitions=[ > > > > >> platform-data.appquery-platform.aoav3.backfill-28, > > > > >> platform-data.appquery-platform.aoav3.backfill-31, > > > > >> platform-data.appquery-platform.aoav3.backfill-34, > > > > >> platform-data.appquery-platform.aoav3.backfill-37, > > > > >> platform-data.appquery-platform.aoav3.backfill-40, > > > > >> platform-data.appquery-platform.aoav3.backfill-43, > > > > >> platform-data.appquery-platform.aoav3.backfill-46, > > > > >> > > > > >> > > > > > > > > > > platform-data.appquery-platform.aoav3.backfill-49])","logger_name":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_name":"fs2-kafka-consumer-95","level":"INFO"} > > > > >> > > > > >> {"timestamp":"2022-03-19T00:54:46.028Z","message":"[Consumer > > > > >> instanceId=i-0e89c9bee06f71f68, > > > > >> > > > > > > clientId=consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68, > > > > >> groupId=app-query-platform-aoa-backfill-v7] > > > > >> > > > > >> Adding newly assigned partitions: > > > > >> platform-data.appquery-platform.aoav3.backfill-28, > > > > >> platform-data.appquery-platform.aoav3.backfill-43, > > > > >> platform-data.appquery-platform.aoav3.backfill-31, > > > > >> platform-data.appquery-platform.aoav3.backfill-46, > > > > >> platform-data.appquery-platform.aoav3.backfill-34, > > > > >> platform-data.appquery-platform.aoav3.backfill-49, > > > > >> platform-data.appquery-platform.aoav3.backfill-40, > > > > >> > > > > >> > > > > > > > > > > platform-data.appquery-platform.aoav3.backfill-37","logger_name":"org.apache.kafka.clients.consumer.internals.ConsumerCoordinator","thread_name":"fs2-kafka-consumer-95","level":"INFO"} > > > > >> > > > > >> *OUTPUT FROM `/usr/local/opt/kafka/bin/kafka-consumer-groups`* > > > > >> > > > > >> GROUP TOPIC > > > > >> PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG > > > > >> CONSUMER-ID HOST > > > > >> CLIENT-ID > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 40 8369679 > > > > >> 8369696 17 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 37 8369643 > > > > >> 8369658 15 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 46 8368044 > > > > >> 8368055 11 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 34 8379346 > > > > >> 8379358 12 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 28 8374244 > > > > >> 8374247 3 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 49 8364656 > > > > >> 8364665 9 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 43 8369980 > > > > >> 8369988 8 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 25 8369261 > > > > >> 8370063 802 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 31 8368087 > > > > >> 8368097 10 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> app-query-platform-aoa-backfill-v7 > > > > >> platform-data.appquery-platform.aoav3.backfill 22 8370475 > > > > >> 8371319 844 > > > > >> i-0e89c9bee06f71f68-2c1558e4-7975-493e-8c6b-08da7c4bf941 / > > > 10.123.16.69 > > > > >> consumer-app-query-platform-aoa-backfill-v7-i-0e89c9bee06f71f68 > > > > >> > > > > >>> On Thu, Mar 17, 2022 at 2:29 AM Bruno Cadonna < > cado...@apache.org> > > > > wrote: > > > > >>> > > > > >>> Hi Richard, > > > > >>> > > > > >>> The group.instance.id config is orthogonal to the partition > > > assignment > > > > >>> strategy. The group.instance.id is used if you want to have > static > > > > >>> membership which is not related to the partition assignment > > strategy. > > > > >>> > > > > >>> If you think you found a bug, could you please open a JIRA ticket > > > with > > > > >>> steps to reproduce the bug. > > > > >>> > > > > >>> Best, > > > > >>> Bruno > > > > >>> > > > > >>> On 16.03.22 10:01, Luke Chen wrote: > > > > >>>> Hi Richard, > > > > >>>> > > > > >>>> Right, you are not missing any settings beyond the partition > > > > assignment > > > > >>>> strategy and the group instance id. > > > > >>>> You might need to know from the log that why the rebalance > > triggered > > > > to > > > > >>> do > > > > >>>> troubleshooting. > > > > >>>> > > > > >>>> Thank you. > > > > >>>> Luke > > > > >>>> > > > > >>>> On Wed, Mar 16, 2022 at 3:02 PM Richard Ney < > > kamisama....@gmail.com > > > > > > > > >>> wrote: > > > > >>>> > > > > >>>>> Hi Luke, > > > > >>>>> > > > > >>>>> I did end up with a situation where I had two instances > > connecting > > > to > > > > >>> the > > > > >>>>> same consumer group and they ended up in a rebalance trade-off. > > All > > > > >>>>> partitions kept going back and forth between the two > microservice > > > > >>>>> instances. That was a test case where I'd removed the Group > > > Instance > > > > >> Id > > > > >>>>> setting to see what would happen. I stabilized that one by > > reducing > > > > it > > > > >>> to a > > > > >>>>> single consumer after 20+ rebalances. > > > > >>>>> > > > > >>>>> The other issue I'm seeing may be a bug in the Functional Scala > > > > >>> `fs2-kafka` > > > > >>>>> wrapper where I see the partitions cleanly assigned but one or > > more > > > > >>>>> instances isn't ingesting. I found out that they recently added > > > > >> support > > > > >>> for > > > > >>>>> the cooperative sticky assignor for the stream recreation since > > > they > > > > >>> were > > > > >>>>> assuming a full revocation of the partitions. > > > > >>>>> > > > > >>>>> So I basically wanted to make sure I wasn't missing any > settings > > > > >> beyond > > > > >>> the > > > > >>>>> partition assignment strategy and the group instance id. > > > > >>>>> > > > > >>>>> -Richard > > > > >>>>> > > > > >>>>> -Richard > > > > >>>>> > > > > >>>>> On Tue, Mar 15, 2022 at 11:27 PM Luke Chen <show...@gmail.com> > > > > wrote: > > > > >>>>> > > > > >>>>>> Hi Richard, > > > > >>>>>> > > > > >>>>>> To use `CooperativeStickyAssignor`, no other special > > configuration > > > > is > > > > >>>>>> required. > > > > >>>>>> > > > > >>>>>> I'm not sure what does `make the rebalance happen cleanly` > mean. > > > > >>>>>> Did you find any problem during group rebalance? > > > > >>>>>> > > > > >>>>>> Thank you. > > > > >>>>>> Luke > > > > >>>>>> > > > > >>>>>> On Wed, Mar 16, 2022 at 1:00 PM Richard Ney < > > > > richard....@lookout.com > > > > >>>>>> .invalid> > > > > >>>>>> wrote: > > > > >>>>>> > > > > >>>>>>> Trying to find a good sample of what consumer settings > besides > > > > >> setting > > > > >>>>>>> > > > > >>>>>>> ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG to > > > > >>>>>>> org.apache.kafka.clients.consumer.CooperativeStickyAssignor > > > > >>>>>>> > > > > >>>>>>> is needed to make the rebalance happen cleanly. Unable to > find > > > and > > > > >>>>> decent > > > > >>>>>>> documentation or code samples. I have set the Group Instance > Id > > > to > > > > >> the > > > > >>>>>> EC2 > > > > >>>>>>> instance id based on one blog write up I found. > > > > >>>>>>> > > > > >>>>>>> Any help would be appreciated > > > > >>>>>>> > > > > >>>>>>> -Richard > > > > >>>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>>> > > > > >>> > > > > >> > > > > > > > > > >