Rajini Sivaram created KAFKA-9171:
-------------------------------------

             Summary: DelayedFetch completion may throw exception, causing 
successful produce to be failed
                 Key: KAFKA-9171
                 URL: https://issues.apache.org/jira/browse/KAFKA-9171
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.4.0
            Reporter: Rajini Sivaram
            Assignee: Rajini Sivaram
             Fix For: 2.4.0


I was looking at the logs of the system test failure of ReassignPartitionsTest.

The producer log shows a ReplicaNotAvailableException produce error for two 
records, but the data logs of all the brokers contain those records. The 
offsets of these records were then returned as successful produce responses 
for two subsequent records, which do not appear in the logs, and hence the 
test failed.

Broker logs of the leader at the time of the reassignment and leader change 
show:

 

{{[2019-11-11 07:23:17,727] ERROR [ReplicaManager broker=3] Error processing 
append operation on partition test_topic-17 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.ReplicaNotAvailableException: Partition 
test_topic-5 is not available}}

The append operation on `test_topic-17` is being failed because a different 
partition, `test_topic-5`, was unavailable for fetch. I think the error comes 
from the fetch path, since produce would have thrown 
NotLeaderForPartitionException rather than ReplicaNotAvailableException.

We don't expect DelayedFetch to throw exceptions and it looks like we are not 
handling `ReplicaNotAvailableException`.
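
To make the suspected failure mode concrete, here is a minimal sketch in plain Java. It is not the broker code: the class, the method names, the exception type and the in-memory "log" are all hypothetical stand-ins, and it assumes that a completed append re-checks any delayed fetch watching the partition on the same call path. Under that assumption, an exception thrown while examining another partition of the fetch escapes into the append, which is then reported as failed even though the record was written. The guarded variant treats the error as a reason to complete the fetch instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Simplified model only; none of these types are Kafka's real classes.
class DelayedFetchSketch {

    // Hypothetical stand-in for an error such as ReplicaNotAvailableException.
    static class PartitionUnavailableException extends RuntimeException {
        PartitionUnavailableException(String partition) {
            super("Partition " + partition + " is not available");
        }
    }

    // Hypothetical lookup: only partitions present in the map are hosted here.
    // The map value plays the role of the log end offset.
    static long logEndOffset(Map<String, Long> hosted, String partition) {
        Long offset = hosted.get(partition);
        if (offset == null) {
            throw new PartitionUnavailableException(partition);
        }
        return offset;
    }

    // Unguarded check: a lookup failure on any fetched partition escapes.
    static boolean tryCompleteUnsafe(Map<String, Long> hosted,
                                     Map<String, Long> fetchOffsets,
                                     long minRecords) {
        long available = 0;
        for (Map.Entry<String, Long> e : fetchOffsets.entrySet()) {
            available += logEndOffset(hosted, e.getKey()) - e.getValue();
        }
        return available >= minRecords;
    }

    // Guarded check: an unavailable partition completes the fetch right away,
    // so the error can be reported to the fetcher and not to the producer.
    static boolean tryCompleteGuarded(Map<String, Long> hosted,
                                      Map<String, Long> fetchOffsets,
                                      long minRecords) {
        try {
            return tryCompleteUnsafe(hosted, fetchOffsets, minRecords);
        } catch (PartitionUnavailableException e) {
            return true;
        }
    }

    // Appends one record, then re-checks the delayed fetch on the same path.
    static String append(Map<String, Long> hosted, String partition,
                         BooleanSupplier delayedFetchCheck) {
        try {
            hosted.merge(partition, 1L, Long::sum); // record is written first
            delayedFetchCheck.getAsBoolean();       // then the fetch is re-checked
            return "OK";
        } catch (RuntimeException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Map<String, Long> hosted = new HashMap<>();
        hosted.put("test_topic-17", 0L);            // test_topic-5 is not hosted
        Map<String, Long> fetch = new HashMap<>();
        fetch.put("test_topic-17", 0L);
        fetch.put("test_topic-5", 0L);

        System.out.println(append(hosted, "test_topic-17",
                () -> tryCompleteUnsafe(hosted, fetch, 10)));
        System.out.println(append(hosted, "test_topic-17",
                () -> tryCompleteGuarded(hosted, fetch, 10)));
    }
}
```

In this model the first append reports an error naming `test_topic-5` although the record has already been added to `test_topic-17`, which mirrors the mismatch between the producer log and the data logs described above. Whether the real fix belongs in the fetch completion check or in the append path is left open here.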

I am not sure if this fixes the issues with ReassignPartitionsTest, but this 
seems to be a scenario that we should fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
