Luke / James,

I agree that this bug is critical enough to release a new patch. Plus, there are 10 more bug fixes
<https://issues.apache.org/jira/browse/KAFKA-13805?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%202.8.2>
with major/blocker priority waiting to be released in 2.8.2.

I will be happy to assist with or perform the release process for 2.8.2, or to help in any other way I can. Luke, please let me know how we want to proceed on this.

Regards,
Divij Vaidya

On Fri, Apr 29, 2022 at 5:09 AM James Olsen <ja...@inaseq.com> wrote:

> Luke,
>
> I would argue that https://issues.apache.org/jira/browse/KAFKA-13636 is a
> critical defect as it can have a very serious impact.
>
> We run on AWS MSK which supports these versions:
> https://docs.aws.amazon.com/msk/latest/developerguide/supported-kafka-versions.html.
> We are currently on 2.7.2.
>
> I note that MSK does not support any 3.x (maybe they're not ready for the
> Zookeeper removal). So I suspect we're going to need a 2.x if MSK is going
> to adopt it any time soon. I'd be happier with a 2.7.3 incorporating
> KAFKA-13636 in order to minimise the risk of introducing other issues, or
> the 2.8.2 if that's not possible.
>
> What can we do to make this happen ASAP?
>
> Regards, James.
>
> On 29/04/2022, at 14:50, Luke Chen <show...@gmail.com> wrote:
>
> Hi James,
>
> So far, v2.8.2 is not planned, yet. And usually, the patch release only
> has one, that is, v2.8.0, and v2.8.1.
> But there are of course some exceptions that some releases have 2 or 3
> patch releases.
>
> For KAFKA-13658, you can check KAFKA-13658
> <https://issues.apache.org/jira/browse/KAFKA-13658>, which is included in
> v3.0.1, v3.1.1, and v3.2.0.
> So far, the v3.0.1 is released, and v3.1.1 and v3.2.0 will be coming soon.
>
> Thank you.
> Luke
>
> On Fri, Apr 29, 2022 at 8:53 AM James Olsen <ja...@inaseq.com> wrote:
> Luke,
>
> Do you know if 2.8.2 will be released anytime soon? It appears to be
> waiting on https://issues.apache.org/jira/browse/KAFKA-13805 for which
> fixes are available.
>
> Regards, James.
>
> On 11/04/2022, at 14:22, Luke Chen <show...@gmail.com> wrote:
>
> Hi James,
>
> This looks like this known issue KAFKA-13636
> <https://issues.apache.org/jira/browse/KAFKA-13636>, which should be fixed
> in the newer version.
>
> Thank you.
> Luke
>
> On Mon, Apr 11, 2022 at 9:18 AM James Olsen <ja...@inaseq.com> wrote:
>
> I recently observed the following series of events for a particular
> partition (MyTopic-6):
>
> 2022-03-18 03:18:28,562 INFO
> [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
> 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
> groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the
> committed offset FetchPosition{offset=438, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
> (id: 2 rack: use1-az4)], epoch=64}}
>
> -- RESTART (bring up new consumer node)
>
> 2022-04-01 15:17:47,943 INFO
> [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
> 'executor-thread-6' [Consumer clientId=consumer-MyTopicService-group-7,
> groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the
> committed offset FetchPosition{offset=449, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
> (id: 2 rack: use1-az4)], epoch=64}}
>
> -- REBALANCE (drop old consumer node)
>
> 2022-04-01 15:18:24,414 INFO
> [org.apache.kafka.clients.consumer.internals.ConsumerCoordinator]
> 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
> groupId=MyTopicService-group] Found no committed offset for partition
> MyTopic-6
> 2022-04-01 15:18:24,474 INFO
> [org.apache.kafka.clients.consumer.internals.SubscriptionState]
> 'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3,
> groupId=MyTopicService-group] Resetting offset for partition MyTopic-6 to
> position FetchPosition{offset=411, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
> (id: 2 rack: use1-az4)], epoch=64}}.
>
> Seems odd that no offsets were found at 2022-04-01 15:18:24,414 when they
> were clearly present 36 seconds earlier at 2022-04-01 15:17:47,943.
>
> This resulted in message replay from offset 411-449. This was in a test
> system only and we have duplicate detection in place but I'd still like to
> avoid similar occurrences in production if we can.
>
> There has clearly been a low volume of traffic but there have been active
> consumers all the time. We have log.retention.ms=1814400000
> (3 weeks) which I believe explains why it resumed from 411 as messages
> prior to that will have been deleted.
>
> There may not have been any new traffic in the last 7 days (we have the
> default offset retention) so I'm wondering if there is a chance the offsets
> were deleted during the rebalance when I presume there's a brief moment
> when there is no active consumer. My understanding is that they shouldn't
> be deleted until there has been no consumer for 7 days (
> https://kafka.apache.org/27/documentation.html#brokerconfigs_offsets.retention.minutes
> - not using static assignment). Is it possible the logic is actually
> checking for no consumer now and no offsets for 7 days instead?
>
> Server and Client are 2.7.2. Sorry I don't have any more detailed
> server-side logs.
>
> Regards, James.
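
For reference, the retention figures quoted in James's original report can be checked with a few lines of arithmetic. This is only a minimal Python sketch and is not taken from the thread: the constant names are illustrative, and the value 10080 assumes the broker default for offsets.retention.minutes, which James says is in effect. It also assumes that a committed offset denotes the next record to be read, so a reset from 449 back to 411 would re-deliver offsets 411 through 448.

```python
from datetime import timedelta

# Value quoted in the thread for the broker setting log.retention.ms.
LOG_RETENTION_MS = 1_814_400_000
# Assumption: broker default for offsets.retention.minutes (7 days).
OFFSETS_RETENTION_MINUTES = 10_080

ONE_DAY = timedelta(days=1)

# Convert both settings to days so they can be compared directly.
log_retention_days = timedelta(milliseconds=LOG_RETENTION_MS) / ONE_DAY
offsets_retention_days = timedelta(minutes=OFFSETS_RETENTION_MINUTES) / ONE_DAY

# Offsets taken from the quoted consumer logs.
committed_offset = 449  # last "Setting offset ... to the committed offset"
reset_offset = 411      # "Resetting offset ... to position"

# Assumption: the committed offset is the next record to consume, so the
# replayed records are reset_offset .. committed_offset - 1.
replayed_records = committed_offset - reset_offset

print(f"log retention:     {log_retention_days} days")
print(f"offsets retention: {offsets_retention_days} days")
print(f"records replayed:  {replayed_records}")
```

Because the log retention (3 weeks) is longer than the offset retention (7 days), a committed offset can expire while older records are still on disk, which is consistent with the consumer resuming from an earlier position rather than from the end of the log.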