[ https://issues.apache.org/jira/browse/KAFKA-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363678#comment-14363678 ]
Jun Rao commented on KAFKA-2020:
--------------------------------

The protocol for TopicMetadataResponse is given in the MetadataResponse grammar below. Currently, we do the following:

1. If the leader is not available, we set the partition-level error code to LeaderNotAvailable.
2. If a non-leader replica is not available, we take that replica out of the assigned replica list and the isr in the response. To indicate that we have done so, we set the partition-level error code to ReplicaNotAvailable.

This has a few problems. First, ReplicaNotAvailable probably shouldn't be an error, at least for the normal producer/consumer clients that just want to find out the leader. Second, the leader and another replica can both be unavailable at the same time, and there is no error code to indicate both. Third, even if a replica is not available, it's still useful to return its replica id, since some clients (e.g. the admin tool) may still make use of it.

One way to address this issue is to always return the replica id for the leader, assigned replicas, and isr, regardless of whether the corresponding broker is live or not. Since we also return the list of live brokers, the client can figure out whether a leader or a replica is live and act accordingly. This way, we don't need to set the partition-level error code when the leader or a replica is not available.

This doesn't change the wire protocol, but it does change the semantics, so a new version of the protocol is needed. Since we are already debating evolving TopicMetadataRequest in KIP-4, we can potentially piggyback on that.
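To make the proposed client-side check concrete, here is a minimal sketch of how a client could derive liveness from the live broker list once the response always carries the leader, replica, and isr ids. It is not Kafka client code: the names PartitionMetadata, PartitionLiveness, and classify are hypothetical and only mirror the Leader, Replicas, Isr, and NodeId fields of the response.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PartitionMetadata:
    """Hypothetical stand-in for one PartitionMetadata entry in the response."""
    partition_id: int
    leader: int
    replicas: List[int]
    isr: List[int]


@dataclass
class PartitionLiveness:
    """What a client can infer without any partition-level error code."""
    leader_live: bool
    offline_replicas: List[int]
    offline_isr: List[int]


def classify(partition: PartitionMetadata, live_broker_ids: Set[int]) -> PartitionLiveness:
    """Compare the ids in the partition metadata against the live broker ids.

    live_broker_ids is assumed to be the set of NodeId values taken from the
    Broker list of the same response.
    """
    return PartitionLiveness(
        leader_live=partition.leader in live_broker_ids,
        offline_replicas=[r for r in partition.replicas if r not in live_broker_ids],
        offline_isr=[r for r in partition.isr if r not in live_broker_ids],
    )


# Leader (broker 2) and another replica (broker 3) are both down: the case
# that a single partition-level error code cannot express today.
live = {1}
partition = PartitionMetadata(partition_id=0, leader=2, replicas=[1, 2, 3], isr=[1, 2])
result = classify(partition, live)
print(result)
```

A producer or consumer would only look at leader_live, while an admin tool could also report offline_replicas and offline_isr.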
{code}
MetadataResponse => [Broker][TopicMetadata]
  Broker => NodeId Host Port  (any number of brokers may be returned)
    NodeId => int32
    Host => string
    Port => int32
  TopicMetadata => TopicErrorCode TopicName [PartitionMetadata]
    TopicErrorCode => int16
  PartitionMetadata => PartitionErrorCode PartitionId Leader Replicas Isr
    PartitionErrorCode => int16
    PartitionId => int32
    Leader => int32
    Replicas => [int32]
    Isr => [int32]
{code}

> I expect ReplicaNotAvailableException to have proper Javadocs
> -------------------------------------------------------------
>
>                 Key: KAFKA-2020
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2020
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: Chris Riccomini
>            Assignee: Neha Narkhede
>
> It looks like ReplicaNotAvailableException was copy and pasted from
> LeaderNotAvailable exception. The Javadocs were never changed. This means
> that users think that ReplicaNotAvailableException signifies leaders are not
> available. This is very different from, "I can ignore this exception," which
> is what the Kafka protocol docs say to do with ReplicaNotAvailableException.
> Related: what's the point of ReplicaNotAvailableException if it's supposed to
> be ignored?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)