David Jacot created KAFKA-20635:
-----------------------------------

             Summary: Spurious "Writing records..." failed errors in the group 
coordinator after partition leadership change
                 Key: KAFKA-20635
                 URL: https://issues.apache.org/jira/browse/KAFKA-20635
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 4.3.0, 4.2.0, 4.1.0, 4.0.0
            Reporter: David Jacot
            Assignee: David Jacot


During routine __consumer_offsets partition leadership changes, the group 
coordinator logs an ERROR for every write that is in flight at the moment of 
the transition:

{noformat}
[GroupCoordinator id=N] Writing records to __consumer_offsets-N failed due to: 
For requests intended only for the leader, this error indicates that the broker 
is not the current leader ...
[GroupCoordinator id=N] Execution of FlushBatch failed due to For requests 
intended only for the leader, this error indicates that the broker is not the 
current leader ...
{noformat}

These appear on the group coordinator that lost leadership and continue until 
the queue of in-flight batches has drained. The behavior itself is correct: 
NotLeaderOrFollowerException propagates through failCurrentBatch to the 
deferred events and is mapped to NOT_COORDINATOR for clients via 
CoordinatorOperationExceptionHelper, so clients retry against the new 
coordinator. This is purely a logging-noise issue.
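
To make the mapping described above concrete, here is a minimal, 
self-contained sketch. ErrorMappingSketch, ClientError and toClientError are 
hypothetical names, and the nested NotLeaderOrFollowerException is only a 
stand-in for the real Kafka class; this is not the actual 
CoordinatorOperationExceptionHelper code.

```java
// Illustrative sketch only; all names here are hypothetical stand-ins.
final class ErrorMappingSketch {
    // Simplified stand-in for the error codes returned to clients.
    enum ClientError { NOT_COORDINATOR, UNKNOWN_SERVER_ERROR }

    // Stand-in for org.apache.kafka.common.errors.NotLeaderOrFollowerException.
    static class NotLeaderOrFollowerException extends RuntimeException {
        NotLeaderOrFollowerException(String message) {
            super(message);
        }
    }

    // A write rejected because this broker no longer leads the partition is
    // surfaced as NOT_COORDINATOR, which tells the client to rediscover the
    // coordinator and retry there.
    static ClientError toClientError(Throwable error) {
        if (error instanceof NotLeaderOrFollowerException) {
            return ClientError.NOT_COORDINATOR;
        }
        // Assumption for this sketch: everything else is an unexpected failure.
        return ClientError.UNKNOWN_SERVER_ERROR;
    }
}
```

Because the client only ever sees NOT_COORDINATOR, the ERROR lines on the 
broker add no information that the client-facing behavior depends on.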

Same root cause as KAFKA-20634: the partition transitions to follower 
synchronously, while the coordinator unload is asynchronous. In that window, 
partitionWriter.append calls replicaManager.appendRecordsToLeader, which 
legitimately rejects writes for a partition no longer led by this broker. The 
exception is expected, but it is logged at ERROR both by the catch block in 
flushCurrentBatch and by CoordinatorInternalEvent.complete.
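
One possible direction, which this issue does not itself prescribe, is to 
pick the log level from the exception type at the two logging sites. The 
sketch below assumes hypothetical names (LogLevelSketch, levelFor) and uses 
java.util.logging levels only to stay self-contained; the nested exception 
class is a stand-in for the real Kafka one.

```java
import java.util.logging.Level;

// Illustrative sketch only; not the actual Kafka fix.
final class LogLevelSketch {
    // Stand-in for org.apache.kafka.common.errors.NotLeaderOrFollowerException.
    static class NotLeaderOrFollowerException extends RuntimeException {
        NotLeaderOrFollowerException(String message) {
            super(message);
        }
    }

    // Expected during a leadership change: log quietly (FINE, roughly DEBUG).
    // Anything else is a genuine write failure and stays at SEVERE (ERROR).
    static Level levelFor(Throwable error) {
        return (error instanceof NotLeaderOrFollowerException)
            ? Level.FINE
            : Level.SEVERE;
    }
}
```

With such a helper, a failed write caused by lost leadership would no longer 
appear at ERROR, while unexpected failures would still be visible.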



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
