I recently observed the following series of events for a particular partition 
(MyTopic-6):

2022-03-18 03:18:28,562 INFO  
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, 
groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the 
committed offset FetchPosition{offset=438, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
 (id: 2 rack: use1-az4)], epoch=64}}

-- RESTART (bring up new consumer node)

2022-04-01 15:17:47,943 INFO  
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 
'executor-thread-6' [Consumer clientId=consumer-MyTopicService-group-7, 
groupId=MyTopicService-group] Setting offset for partition MyTopic-6 to the 
committed offset FetchPosition{offset=449, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
 (id: 2 rack: use1-az4)], epoch=64}}

-- REBALANCE (drop old consumer node)

2022-04-01 15:18:24,414 INFO  
[org.apache.kafka.clients.consumer.internals.ConsumerCoordinator] 
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, 
groupId=MyTopicService-group] Found no committed offset for partition MyTopic-6
2022-04-01 15:18:24,474 INFO  
[org.apache.kafka.clients.consumer.internals.SubscriptionState] 
'executor-thread-2' [Consumer clientId=consumer-MyTopicService-group-3, 
groupId=MyTopicService-group] Resetting offset for partition MyTopic-6 to 
position FetchPosition{offset=411, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[b-2.redacted.kafka.us-east-1.amazonaws.com:9094
 (id: 2 rack: use1-az4)], epoch=64}}.

It seems odd that no committed offset was found at 2022-04-01 15:18:24,414 when 
one was clearly present 36 seconds earlier, at 2022-04-01 15:17:47,943.

This resulted in messages being replayed from offset 411 up to the previously 
committed offset 449.  This was only in a test system, and we have duplicate 
detection in place, but I'd still like to avoid similar occurrences in 
production if we can.

Traffic has clearly been low, but there have been active consumers the whole 
time.  We have log.retention.ms=1814400000 (3 weeks), which I believe explains 
why it resumed from 411, as messages prior to that will have been deleted.
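
To make the two retention windows concrete, here is a small sketch of the 
arithmetic.  The log retention value is the one quoted above; the offset 
retention value assumes the broker default of offsets.retention.minutes=10080, 
and the class name is made up for illustration.

```java
import java.time.Duration;

final class RetentionWindows {
    // log.retention.ms as configured on our brokers (from this message).
    static final Duration LOG_RETENTION = Duration.ofMillis(1_814_400_000L);
    // Assumed broker default for offsets.retention.minutes (7 days).
    static final Duration OFFSETS_RETENTION = Duration.ofMinutes(10_080L);

    public static void main(String[] args) {
        System.out.println("log retention (days):    " + LOG_RETENTION.toDays());
        System.out.println("offset retention (days): " + OFFSETS_RETENTION.toDays());
        // Window in which a record can still exist after its committed
        // offset would already be eligible for expiry.
        System.out.println("difference (days):       "
                + LOG_RETENTION.minus(OFFSETS_RETENTION).toDays());
    }
}
```

So a record can outlive an expired committed offset by up to two weeks, which 
would be consistent with a reset landing on an old offset such as 411.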

There may not have been any new traffic in the last 7 days (we have the default 
offset retention), so I'm wondering whether the offsets could have been deleted 
during the rebalance, when I presume there is a brief moment with no active 
consumer.  My understanding is that they shouldn't be deleted until there has 
been no consumer for 7 days 
(https://kafka.apache.org/27/documentation.html#brokerconfigs_offsets.retention.minutes
 - not using static assignment).  Is it possible the logic is actually checking 
for no consumer now and no offset commits for 7 days instead?
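
To pin down what I am asking, here is a rough sketch of the two 
interpretations as plain Java predicates.  This is not Kafka's code: the class, 
the method names, and the timestamps (including the UTC assumption and the 
one-second empty window) are all made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;

final class OffsetExpirySketch {
    // Assumed default offset retention of 7 days.
    static final Duration RETENTION = Duration.ofDays(7);

    // Interpretation A (my reading of the docs): offsets expire only once
    // the group has had no consumers for the whole retention period.
    // groupEmptySince is null while the group still has members.
    static boolean expiredIfEmptyForRetention(Instant groupEmptySince, Instant now) {
        return groupEmptySince != null
                && Duration.between(groupEmptySince, now).compareTo(RETENTION) >= 0;
    }

    // Interpretation B (what I suspect might be happening): offsets expire
    // if the group is empty right now and the last commit is older than
    // the retention period.
    static boolean expiredIfEmptyNowAndCommitOld(
            Instant groupEmptySince, Instant lastCommit, Instant now) {
        return groupEmptySince != null
                && Duration.between(lastCommit, now).compareTo(RETENTION) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical values: group empty for one second during the
        // rebalance, last commit eight days earlier.
        Instant now = Instant.parse("2022-04-01T15:18:24Z");
        Instant emptySince = now.minusSeconds(1);
        Instant lastCommit = now.minus(Duration.ofDays(8));

        System.out.println("A: " + expiredIfEmptyForRetention(emptySince, now));
        System.out.println("B: " + expiredIfEmptyNowAndCommitOld(emptySince, lastCommit, now));
    }
}
```

With those made-up values, A keeps the offsets and B discards them, which is 
the difference I would like to confirm or rule out.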

The server and client are both 2.7.2.  Sorry, I don't have any more detailed 
server-side logs.

Regards, James.
