[ https://issues.apache.org/jira/browse/KAFKA-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353502#comment-14353502 ]
Joel Koshy commented on KAFKA-1987:
-----------------------------------

I actually think it would be worthwhile to improve the error logging. For example, if the failing thread is a replica fetcher thread, then instead of logging an error it could log a more meaningful info message, e.g. "Could not fetch from partition [topicA, partition 30] as the leader may not have created the topic yet." (or something clearer if possible).

> Potential race condition in partition creation
> ----------------------------------------------
>
> Key: KAFKA-1987
> URL: https://issues.apache.org/jira/browse/KAFKA-1987
> Project: Kafka
> Issue Type: Bug
> Components: controller
> Reporter: Todd Palino
>
> I am finding that there appears to be a race condition when creating
> partitions, with replication factor 2 or higher, between the creation of the
> partition on the leader and the follower. What appears to be happening is
> that the follower is processing the command to create the partition before
> the leader does, and when the follower starts the replica fetcher, it fails
> with an UnknownTopicOrPartitionException.
> The situation is that I am creating a large number of partitions on a
> cluster, preparing it for data being mirrored from another cluster. So there
> are a sizeable number of create and alter commands being sent sequentially.
> Eventually, the replica fetchers start up properly. But it seems like the
> controller should issue the command to create the partition to the leader,
> wait for confirmation, and then issue the command to create the partition to
> the followers.
> 2015/02/26 21:11:50.413 INFO [LogManager] [kafka-request-handler-12] [kafka-server] [] Created log for partition [topicA,30] in /path_to/i001_caches with properties {segment.index.bytes -> 10485760, file.delete.delay.ms -> 60000, segment.bytes -> 268435456, flush.ms -> 10000, delete.retention.ms -> 86400000, index.interval.bytes -> 4096, retention.bytes -> -1, min.insync.replicas -> 1, cleanup.policy -> delete, unclean.leader.election.enable -> true, segment.ms -> 43200000, max.message.bytes -> 1000000, flush.messages -> 20000, min.cleanable.dirty.ratio -> 0.5, retention.ms -> 86400000, segment.jitter.ms -> 0}.
> 2015/02/26 21:11:50.418 WARN [Partition] [kafka-request-handler-12] [kafka-server] [] Partition [topicA,30] on broker 1551: No checkpointed highwatermark is found for partition [topicA,30]
> 2015/02/26 21:11:50.418 INFO [ReplicaFetcherManager] [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 1551] Removed fetcher for partitions [topicA,30]
> 2015/02/26 21:11:50.418 INFO [Log] [kafka-request-handler-12] [kafka-server] [] Truncating log topicA-30 to offset 0.
> 2015/02/26 21:11:50.450 INFO [ReplicaFetcherManager] [kafka-request-handler-12] [kafka-server] [] [ReplicaFetcherManager on broker 1551] Added fetcher for partitions List([[topicA,30], initOffset 0 to broker id:1555,host:host1555.example.com,port:10251] )
> 2015/02/26 21:11:50.615 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
> 2015/02/26 21:11:50.616 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
> 2015/02/26 21:11:50.618 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
> 2015/02/26 21:11:50.620 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException
> 2015/02/26 21:11:50.621 ERROR [ReplicaFetcherThread] [ReplicaFetcherThread-0-1555] [kafka-server] [] [ReplicaFetcherThread-0-1555], Error for partition [topicA,30] to broker 1555:class kafka.common.UnknownTopicOrPartitionException

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
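Joel's logging suggestion could look roughly like the minimal Java sketch below. It assumes the caller knows whether it is a replica fetcher and has the simple class name of the error; FetchErrorLogging, describe() and LogDecision are hypothetical names for illustration, not Kafka's actual ReplicaFetcherThread code, and the wording of the INFO message is only a placeholder.

```java
// Hypothetical sketch: FetchErrorLogging, Level, LogDecision and describe()
// are illustrative names, not Kafka classes.
final class FetchErrorLogging {

    enum Level { INFO, ERROR }

    // What the fetcher would log: a level plus the message text.
    static final class LogDecision {
        final Level level;
        final String message;

        LogDecision(Level level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    // Simple class name of the error seen in the log excerpt above.
    static final String UNKNOWN_TOPIC_OR_PARTITION = "UnknownTopicOrPartitionException";

    // Pick a log level and message for a failed fetch of [topic,partition] from leaderId.
    static LogDecision describe(boolean fromReplicaFetcher, String topic, int partition,
                                int leaderId, String errorName) {
        String tp = "[" + topic + "," + partition + "]";
        if (fromReplicaFetcher && UNKNOWN_TOPIC_OR_PARTITION.equals(errorName)) {
            // Likely transient: the leader may not have created the partition yet
            // and the fetcher keeps retrying, so report it at INFO with a hint.
            return new LogDecision(Level.INFO,
                "Could not fetch from partition " + tp + " on broker " + leaderId
                    + " as the leader may not have created the partition yet");
        }
        // Any other error keeps the existing ERROR-level report.
        return new LogDecision(Level.ERROR,
            "Error for partition " + tp + " to broker " + leaderId + ": " + errorName);
    }
}
```

For the partition in the excerpt, describe(true, "topicA", 30, 1555, "UnknownTopicOrPartitionException") yields an INFO decision, while the same error outside a replica fetcher stays at ERROR. A real change would probably also want to rate-limit the message, since the excerpt shows the fetcher retrying several times within a few milliseconds.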
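The ordering proposed in the issue description (create the partition on the leader, wait for confirmation, then create it on the followers) can be sketched with java.util.concurrent.CompletableFuture. This is a minimal sketch under stated assumptions: BrokerChannel, createPartition() and LeaderFirstCreation are hypothetical stand-ins for the controller-to-broker request path, not the Kafka controller's real structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical interface: stands in for however the controller sends a
// "create partition" command to one broker and learns that it completed.
interface BrokerChannel {
    CompletableFuture<Void> createPartition(int brokerId, String topic, int partition, int leaderId);
}

final class LeaderFirstCreation {

    // Ask the leader first; only after it acknowledges are the followers asked,
    // so a follower's fetcher should not start against a leader lacking the partition.
    static CompletableFuture<Void> create(BrokerChannel channel, String topic, int partition,
                                          int leaderId, List<Integer> followerIds) {
        return channel.createPartition(leaderId, topic, partition, leaderId)
            .thenCompose(leaderAck -> {
                // Followers can be contacted in parallel once the leader is ready.
                List<CompletableFuture<Void>> acks = new ArrayList<>();
                for (int follower : followerIds) {
                    acks.add(channel.createPartition(follower, topic, partition, leaderId));
                }
                return CompletableFuture.allOf(acks.toArray(new CompletableFuture<?>[0]));
            });
        // If the leader's future fails, thenCompose never runs the lambda:
        // the followers are not contacted and the returned future fails too.
    }
}
```

The trade-off is one extra round trip per partition before followers hear about it. Because each partition's chain is independent, a controller creating many partitions at once (as in the mirroring setup described above) could still run the chains for different partitions concurrently.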