[ https://issues.apache.org/jira/browse/KAFKA-20634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot resolved KAFKA-20634.
---------------------------------
Fix Version/s: 4.1.3, 4.0.3, 4.3.1, 4.4.0, 4.2.2
Resolution: Fixed
> Spurious HighWatermarkUpdate failed errors in the group coordinator after
> partition leadership change
> -----------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20634
> URL: https://issues.apache.org/jira/browse/KAFKA-20634
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 4.0.0, 4.1.0, 4.2.0, 4.3.0
> Reporter: David Jacot
> Assignee: David Jacot
> Priority: Minor
> Fix For: 4.1.3, 4.0.3, 4.3.1, 4.4.0, 4.2.2
>
>
> During routine __consumer_offsets partition leadership changes, the group
> coordinator spams ERROR-level logs like:
> {noformat}
> [GroupCoordinator id=N] Execution of HighWatermarkUpdate failed due to New
> committed offset X of __consumer_offsets-N must be less than or equal to Y.
> [GroupCoordinator id=N] Execution of HighWatermarkUpdate failed due to No
> in-memory snapshot for epoch X. Snapshot epochs are: Y.
> {noformat}
> These errors appear for a few seconds on the group coordinator that lost
> leadership of a __consumer_offsets partition. The exceptions are caught
> inside CoordinatorInternalEvent and don't propagate to clients, but they
> create unnecessary and confusing noise.
> Root cause: when a partition transitions to follower, the local log is
> truncated and then replicates from the new leader, which advances the high
> watermark (HWM). The group coordinator stays ACTIVE until
> scheduleUnloadOperation runs asynchronously. In that window the HWM listener
> fires with offsets that don't match the coordinator's write boundaries. This
> violates invariants in SnapshottableCoordinator.updateLastCommittedOffset and
> in SnapshotRegistry.getSnapshot, resulting in IllegalStateExceptions.
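
The two invariants described in the root cause can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyCoordinator, write and tryUpdate are hypothetical names invented here, the snapshot map is a simplification, and none of this is Kafka's actual code or the fix that was merged. It shows how a high watermark that does not line up with the coordinator's own write boundaries could trip each check in turn.

```java
import java.util.TreeMap;

// Toy model only: these classes are hypothetical and are NOT Kafka's implementation.
final class HwmInvariantSketch {

    static final class ToyCoordinator {
        // Snapshot epochs, keyed by the log offset at which each snapshot was taken.
        private final TreeMap<Long, String> snapshots = new TreeMap<>();
        private long lastWrittenOffset = 0L;
        private long lastCommittedOffset = 0L;

        ToyCoordinator() {
            snapshots.put(0L, "state@0");
        }

        // Simulates a coordinator write: advance the write boundary and snapshot there.
        void write(long offset) {
            if (offset <= lastWrittenOffset) {
                throw new IllegalArgumentException("Offsets must increase.");
            }
            lastWrittenOffset = offset;
            snapshots.put(offset, "state@" + offset);
        }

        // Simulates the high watermark listener callback (assumed shape).
        void updateLastCommittedOffset(long offset) {
            // Invariant 1: the committed offset cannot be ahead of what was written.
            if (offset > lastWrittenOffset) {
                throw new IllegalStateException("New committed offset " + offset
                    + " must be less than or equal to " + lastWrittenOffset + ".");
            }
            // Invariant 2: the committed offset must match a known snapshot epoch.
            if (!snapshots.containsKey(offset)) {
                throw new IllegalStateException("No in-memory snapshot for epoch " + offset
                    + ". Snapshot epochs are: " + snapshots.keySet() + ".");
            }
            lastCommittedOffset = offset;
            // Snapshots strictly older than the committed offset are dropped.
            snapshots.headMap(offset, false).clear();
        }

        long lastCommittedOffset() {
            return lastCommittedOffset;
        }
    }

    // Returns the failure message, or null if the update succeeded.
    static String tryUpdate(ToyCoordinator coordinator, long highWatermark) {
        try {
            coordinator.updateLastCommittedOffset(highWatermark);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        coordinator.write(5L);
        coordinator.write(10L);

        // While leader: the HWM lands on a write boundary, so the update succeeds.
        System.out.println(tryUpdate(coordinator, 5L));
        // After losing leadership: the HWM follows the new leader's log instead.
        System.out.println(tryUpdate(coordinator, 12L)); // ahead of the last write
        System.out.println(tryUpdate(coordinator, 7L));  // between snapshot epochs
    }
}
```

In this model the first call succeeds because offset 5 is a write boundary, while offsets 12 and 7 fail the first and second check respectively, mirroring the two log messages quoted above.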
--
This message was sent by Atlassian Jira
(v8.20.10#820010)