[ 
https://issues.apache.org/jira/browse/KAFKA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9840.
------------------------------------
    Fix Version/s: 2.6.0
       Resolution: Fixed

> Consumer should not use OffsetForLeaderEpoch without current epoch validation
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-9840
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9840
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 2.4.1
>            Reporter: Jason Gustafson
>            Assignee: Boyang Chen
>            Priority: Major
>             Fix For: 2.6.0
>
>
> We have observed a case where the consumer attempted to detect truncation 
> with the OffsetsForLeaderEpoch API against a broker which had become a 
> zombie. In this case, the last epoch known to the consumer was higher than 
> the last epoch known to the zombie broker, so the broker returned -1 as both 
> the end offset and epoch in the response. The consumer did not check for this 
> in the response, which resulted in the following message:
> {code}
> Truncation detected for partition topic-1 at offset 
> FetchPosition{offset=11859, offsetEpoch=Optional[46], 
> currentLeader=LeaderAndEpoch{leader=broker-host (id: 3 rack: null), 
> epoch=-1}}, resetting offset to the first offset known to diverge 
> FetchPosition{offset=-1, offsetEpoch=Optional[-1], 
> currentLeader=LeaderAndEpoch{broker-host (id: 3 rack: null), epoch=-1}} 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState:414)
> {code}
> There are a couple of ways the consumer can handle this situation better. 
> First, the reason we did not detect the zombie broker is that we did not 
> include the current leader epoch in the OffsetForLeaderEpoch request. This 
> was likely because of KAFKA-9212. After that patch, the consumer no longer 
> initializes the current leader epoch from metadata responses because there 
> are cases where it cannot be relied upon. But if the client cannot rely on 
> being able to detect zombies, then the epoch validation is less useful 
> anyway. So the 
> simple solution is to not bother with the validation unless we have a 
> reliable current leader epoch.
> Second, the consumer needs to check for the case where the returned offset 
> and epoch are undefined (both -1, as in the log above). In that case, it has 
> to treat the response as a normal OffsetOutOfRange case and invoke the reset 
> policy. 
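
The two changes proposed in the description can be sketched roughly as follows. This is an illustrative sketch only, not the actual patch: the class, enum, constant, and method names are hypothetical, and it assumes that -1 marks an undefined epoch or end offset (as in the log above) and that truncation is indicated by a returned end offset below the consumer's fetch position.

```java
import java.util.Optional;

// Hypothetical sketch; none of these names come from the Kafka code base.
final class EpochValidationSketch {

    // What the consumer should do with a partition's fetch position.
    enum Action { SKIP_VALIDATION, RESET_BY_POLICY, TRUNCATE_TO_END_OFFSET, CONTINUE_FETCHING }

    // Assumed sentinel values, matching the -1 seen in the log message.
    static final int UNDEFINED_EPOCH = -1;
    static final long UNDEFINED_EPOCH_OFFSET = -1L;

    // First change: only validate when a reliable current leader epoch is
    // known, so that the broker can fence the request if it is a zombie.
    static boolean shouldValidate(Optional<Integer> currentLeaderEpoch) {
        return currentLeaderEpoch.isPresent() && currentLeaderEpoch.get() >= 0;
    }

    // Second change: treat an undefined end offset or epoch as an
    // out-of-range position instead of as a truncation to offset -1.
    static Action handleResponse(Optional<Integer> currentLeaderEpoch,
                                 long fetchOffset,
                                 int endEpoch,
                                 long endOffset) {
        if (!shouldValidate(currentLeaderEpoch)) {
            return Action.SKIP_VALIDATION;
        }
        if (endEpoch == UNDEFINED_EPOCH || endOffset == UNDEFINED_EPOCH_OFFSET) {
            return Action.RESET_BY_POLICY;
        }
        if (endOffset < fetchOffset) {
            // The log diverges before our position: truncation detected.
            return Action.TRUNCATE_TO_END_OFFSET;
        }
        return Action.CONTINUE_FETCHING;
    }
}
```

With the values from the log above (fetch offset 11859, returned epoch and offset both -1), this sketch yields SKIP_VALIDATION when no current leader epoch is known, and RESET_BY_POLICY when one is known, rather than a reset to offset -1.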



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
