[ 
https://issues.apache.org/jira/browse/KAFKA-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-20634.
---------------------------------
    Fix Version/s: 4.1.3
                   4.0.3
                   4.3.1
                   4.4.0
                   4.2.2
       Resolution: Fixed

> Spurious HighWatermarkUpdate failed errors in the group coordinator after 
> partition leadership change
> -----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20634
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20634
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Minor
>             Fix For: 4.1.3, 4.0.3, 4.3.1, 4.4.0, 4.2.2
>
>
> During routine __consumer_offsets partition leadership changes, the group 
> coordinator logs spurious ERROR-level messages such as:
> {noformat}
> [GroupCoordinator id=N] Execution of HighWatermarkUpdate failed due to New 
> committed offset X of __consumer_offsets-N must be less than or equal to Y.
> [GroupCoordinator id=N] Execution of HighWatermarkUpdate failed due to No 
> in-memory snapshot for epoch X. Snapshot epochs are: Y.
> {noformat}
> These messages appear on the group coordinator that lost leadership of a 
> __consumer_offsets partition, and they persist for a few seconds. The 
> exceptions are caught inside CoordinatorInternalEvent and do not propagate 
> to clients, but they create unnecessary and confusing noise.
> Root cause: when a partition transitions to follower, the local log is 
> truncated and then replicates from the new leader, which advances the high 
> watermark (HWM). The group coordinator stays ACTIVE until 
> scheduleUnloadOperation runs, which happens asynchronously. In that window 
> the HWM listener fires with offsets that do not match the coordinator's 
> write boundaries. This violates invariants in 
> SnapshottableCoordinator.updateLastCommittedOffset and in 
> SnapshotRegistry.getSnapshot, resulting in IllegalStateExceptions.
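
The race described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: HwmRaceSketch, ToyCoordinator, write and onHighWatermarkUpdate are hypothetical names invented for illustration, the exception messages only loosely mirror the log lines quoted in the description, and none of this is Kafka's actual implementation or the fix that was applied. It shows why a high watermark that advances independently of the coordinator's own writes trips both invariants.

```java
import java.util.TreeSet;

/** Toy model of the race; hypothetical names, not Kafka's real classes. */
public class HwmRaceSketch {

    /** Stand-in for a coordinator that snapshots its state at each write boundary. */
    static class ToyCoordinator {
        long lastWrittenOffset = 0L;
        long lastCommittedOffset = 0L;
        // Offsets (epochs) at which an in-memory snapshot exists.
        final TreeSet<Long> snapshotEpochs = new TreeSet<>();

        /** Records a write made by this coordinator while it was leader. */
        void write(long newEndOffset) {
            lastWrittenOffset = newEndOffset;
            snapshotEpochs.add(newEndOffset);
        }

        /** Assumed invariants: the committed offset must be a known write boundary. */
        void updateLastCommittedOffset(long offset) {
            if (offset > lastWrittenOffset) {
                throw new IllegalStateException("New committed offset " + offset
                    + " must be less than or equal to " + lastWrittenOffset + ".");
            }
            if (!snapshotEpochs.contains(offset)) {
                throw new IllegalStateException("No in-memory snapshot for epoch " + offset
                    + ". Snapshot epochs are: " + snapshotEpochs + ".");
            }
            lastCommittedOffset = offset;
            // Snapshots strictly older than the committed offset are no longer needed.
            snapshotEpochs.headSet(offset).clear();
        }
    }

    /** Mimics an event that catches the exception instead of propagating it. */
    static String onHighWatermarkUpdate(ToyCoordinator c, long hwm) {
        try {
            c.updateLastCommittedOffset(hwm);
            return "ok";
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyCoordinator c = new ToyCoordinator();
        c.write(10L);
        c.write(20L);
        // While leader: the HWM lands on one of the coordinator's own write boundaries.
        System.out.println(onHighWatermarkUpdate(c, 10L));
        // After becoming follower but before unload: the HWM follows the new leader's log.
        System.out.println(onHighWatermarkUpdate(c, 35L)); // beyond the last local write
        System.out.println(onHighWatermarkUpdate(c, 15L)); // not a local write boundary
    }
}
```

In this model the first update succeeds because offset 10 is a boundary the coordinator wrote itself, while the two later updates fail because offsets 35 and 15 come from a log the coordinator no longer controls.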



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
