David Jacot created KAFKA-20635:
-----------------------------------
Summary: Spurious "Writing records..." failed errors in the group
coordinator after partition leadership change
Key: KAFKA-20635
URL: https://issues.apache.org/jira/browse/KAFKA-20635
Project: Kafka
Issue Type: Bug
Affects Versions: 4.3.0, 4.2.0, 4.1.0, 4.0.0
Reporter: David Jacot
Assignee: David Jacot
During routine __consumer_offsets partition leadership changes, the group
coordinator logs an ERROR-level message for every write that is in flight at
the moment of the transition:
{noformat}
[GroupCoordinator id=N] Writing records to __consumer_offsets-N failed due to:
For requests intended only for the leader, this error indicates that the broker
is not the current leader ...
[GroupCoordinator id=N] Execution of FlushBatch failed due to For requests
intended only for the leader, this error indicates that the broker is not the
current leader ...
{noformat}
These errors appear on the group coordinator that lost leadership and continue
until the queue of in-flight batches has drained. The behavior itself is
correct: NotLeaderOrFollowerException propagates through failCurrentBatch to
the deferred events and is mapped to NOT_COORDINATOR for clients via
CoordinatorOperationExceptionHelper, so clients retry against the new
coordinator. This is purely a logging-noise issue.
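The mapping described above can be pictured with a minimal, self-contained sketch. It does not reproduce CoordinatorOperationExceptionHelper; the exception class, the enum, and the method name below are hypothetical stand-ins, and the fallback error is an assumption made only so the example is complete.

```java
// Hypothetical stand-in for Kafka's NotLeaderOrFollowerException,
// defined here only so the sketch compiles without Kafka on the classpath.
class NotLeaderOrFollowerSketchException extends RuntimeException {
    NotLeaderOrFollowerSketchException(String message) {
        super(message);
    }
}

// Hypothetical, reduced set of client-facing error codes.
enum SketchClientError {
    NOT_COORDINATOR,
    UNKNOWN_SERVER_ERROR
}

final class ErrorMappingSketch {
    private ErrorMappingSketch() { }

    // Translate a write failure into the error returned to the client.
    // A lost-leadership failure becomes NOT_COORDINATOR, which tells the
    // client to look up the new coordinator and retry there.
    static SketchClientError toClientError(Throwable failure) {
        if (failure instanceof NotLeaderOrFollowerSketchException) {
            return SketchClientError.NOT_COORDINATOR;
        }
        // Assumption for illustration: anything else is treated as unexpected.
        return SketchClientError.UNKNOWN_SERVER_ERROR;
    }
}
```

The point of the sketch is that the client-visible outcome is already benign, which is why only the server-side logging needs attention.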
The root cause is the same as in KAFKA-20634: the partition transitions to
follower synchronously, while the coordinator unload is asynchronous. In that
window, partitionWriter.append calls replicaManager.appendRecordsToLeader,
which legitimately rejects writes for a partition no longer led by this broker.
The exception is expected, but it is logged at ERROR both by the catch block in
flushCurrentBatch and by CoordinatorInternalEvent.complete.
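One possible way to reduce the noise, offered only as a sketch and not necessarily the fix adopted for this issue, is to choose the log level from the exception type. The class and method names below are hypothetical, and java.util.logging levels stand in for whatever logging framework the coordinator actually uses.

```java
import java.util.logging.Level;

// Hypothetical stand-in for the exception raised when the broker is no
// longer the partition leader; not the real Kafka class.
class LeadershipLostSketchException extends RuntimeException {
    LeadershipLostSketchException(String message) {
        super(message);
    }
}

final class WriteFailureLogLevelSketch {
    private WriteFailureLogLevelSketch() { }

    // Pick a log level for a failed write.
    // Expected failures during a leadership change are logged quietly
    // (FINE, roughly DEBUG); everything else stays at SEVERE (roughly ERROR).
    static Level levelFor(Throwable failure) {
        if (failure instanceof LeadershipLostSketchException) {
            return Level.FINE;
        }
        return Level.SEVERE;
    }
}
```

Under this assumption, a catch block such as the one in flushCurrentBatch would call levelFor(exception) before logging, so genuine write failures remain visible at ERROR.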