[jira] [Resolved] (KAFKA-790) Kafka server throws UnknownTopicOrPartitionException right after becoming leader for a new partition

Neha Narkhede (JIRA) Thu, 07 Mar 2013 09:51:12 -0800

     [ 
https://issues.apache.org/jira/browse/KAFKA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Neha Narkhede resolved KAFKA-790.
---------------------------------

    Resolution: Won't Fix

This happens in cases when the broker becomes follower faster than another 
broker becomes leader. The root cause is the become-follower state change 
design, where we don't wait until the other broker finishes becoming leader and 
is ready to serve the replica fetch requests. To get rid of this exception, the 
controller would have to send a "start-replica" state change request to the 
followers after it has heard from the leaders about the become-leader 
completion. This would reduce the occurrence of these fake 
UnknownTopicOrPartitionException significantly. However, even after the 
controller receives the response for become-leader and after it sends the 
start-replica to the followers, the leader can die and we can get the same 
errors in the logs. One can argue that this scenario is very rare and the fix 
will work in most cases. On the other hand, implementing the fix will be 
somewhat complex on the controller since now it has to keep track of the 
outstanding leader and isr requests and send the start replica state change 
accordingly. 

I think we can wait and see how annoying this error message gets and then 
thinking about fixing it. Thoughts ?

                
> Kafka server throws UnknownTopicOrPartitionException right after becoming 
> leader for a new partition
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-790
>                 URL: https://issues.apache.org/jira/browse/KAFKA-790
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: kafka-0.8, p1
>
> 2013/03/05 01:24:01.207 INFO [ReplicaManager] [kafka-request-handler-5] 
> [kafka] []  Replica Manager on Broker 273: Completed leader and isr request 
> Name: LeaderAndIsrRequest; Version: 0; CorrelationId: 15
> 38; ClientId: ; AckTimeoutMs: 1000 ms; ControllerEpoch: 3; 
> PartitionStateInfo: (foo,7) -> 
> PartitionStateInfo(LeaderIsrAndControllerEpoch({ "ISR":"273,271", 
> "leader":"273", "leaderEpoch":"0" },3),2),(foo,4) -> 
> PartitionStateInfo(LeaderIsrAndControllerEpoch({ "ISR":"270,273", 
> "leader":"270", "leaderEpoch":"0" },3),2),(foo,1) -> 
> PartitionStateInfo(LeaderIsrAndControllerEpoch({ "ISR":"273,270", 
> "leader":"273", "leaderEpoch":"0" },3),2); Leaders: 
> id:270,host:host270,port:10251,id:273,host:host273
> ,port:10251
> 2013/03/05 01:24:01.473 INFO [Processor] [kafka-processor-10251-4] [kafka] [] 
>  Closing socket connection to /172.20.72.66.
> 2013/03/05 01:24:01.505 INFO [Processor] [kafka-processor-10251-4] [kafka] [] 
>  Closing socket connection to /172.20.72.244.
> 2013/03/05 01:24:01.521 INFO [Processor] [kafka-processor-10251-5] [kafka] [] 
>  Closing socket connection to /172.20.72.243.
> 2013/03/05 01:24:01.555 INFO [Processor] [kafka-processor-10251-3] [kafka] [] 
>  Closing socket connection to /172.20.72.11.
> 2013/03/05 01:24:01.568 INFO [Processor] [kafka-processor-10251-1] [kafka] [] 
>  Closing socket connection to /172.20.72.66.
> 2013/03/05 01:24:01.606 INFO [Processor] [kafka-processor-10251-0] [kafka] [] 
>  Closing socket connection to /172.20.72.66.
> 2013/03/05 01:24:01.758 INFO [Processor] [kafka-processor-10251-0] [kafka] [] 
>  Closing socket connection to /172.20.72.66.
> 2013/03/05 01:24:01.776 ERROR [KafkaApis] [kafka-request-handler-7] [kafka] 
> []  [KafkaApi-273] error when processing request (foo,1,0,1048576)
> kafka.common.UnknownTopicOrPartitionException: Topic foo partition 1 doesn't 
> exist on 273
>         at 
> kafka.server.ReplicaManager.getLeaderReplicaIfLocal(ReplicaManager.scala:170)
>         at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(KafkaApis.scala:301)
>         at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:268)
>         at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(KafkaApis.scala:264)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>         at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>         at scala.collection.immutable.Map$Map2.foreach(Map.scala:127)
>         at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
>         at scala.collection.immutable.Map$Map2.map(Map.scala:110)
>         at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(KafkaApis.scala:264)
>         at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:231)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:61)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:41)
>         at java.lang.Thread.run(Thread.java:619)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (KAFKA-790) Kafka server throws UnknownTopicOrPartitionException right after becoming leader for a new partition

Reply via email to