Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Jun Rao
Yes, the TopicMetadataResponse format is a bit weird. The main reason that it's done that way is that we don't want to return null replica objects to the client. An alternative is to have the broker return all the replica ids and a list of live brokers, and let the client decide what to do with rep …
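As an illustration of the alternative Jun describes, here is a minimal Python sketch (illustrative names only, not kafka-python's actual API) of a client resolving raw replica ids against the list of live brokers itself, rather than the broker substituting an error code for the offline replica:

```python
# Hypothetical client-side replica resolution, assuming the metadata
# response carried the raw replica ids plus the set of live brokers.
# All names here are illustrative, not the real wire-protocol fields.

def resolve_replicas(replica_ids, live_brokers):
    """Split a partition's assigned replicas into live and offline ids.

    replica_ids:  list of broker ids assigned to the partition
    live_brokers: dict mapping broker_id -> (host, port) for brokers
                  returned in the metadata response
    """
    live = [rid for rid in replica_ids if rid in live_brokers]
    offline = [rid for rid in replica_ids if rid not in live_brokers]
    # The client, not the broker, decides whether an offline replica is
    # fatal (raise) or merely informational (log and continue).
    return live, offline


live, offline = resolve_replicas([1, 2, 3], {1: ("host1", 9092), 3: ("host3", 9092)})
print(live, offline)  # [1, 3] [2]
```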

Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Jun Rao
Dana, Perhaps you can make the change in the wiki first and then let us know for review? Thanks, Jun On Wed, Jan 14, 2015 at 6:35 PM, Dana Powers wrote: > Thanks -- I see that this was more of a bug in 0.8.1 than a regression in > 0.8.2. But I do think the 0.8.2 bug fix to the metadata cache …

Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Jay Kreps
I agree. Also, is this behavior a good one? It seems kind of hacky to give an error code and a result both, no? -Jay On Wed, Jan 14, 2015 at 6:35 PM, Dana Powers wrote: > Thanks -- I see that this was more of a bug in 0.8.1 than a regression in > 0.8.2. But I do think the 0.8.2 bug fix to the …

Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Dana Powers
Thanks -- I see that this was more of a bug in 0.8.1 than a regression in 0.8.2. But I do think the 0.8.2 bug fix to the metadata cache means that the very common scenario of a single broker failure (and subsequent partition leadership change) will now return error codes in the MetadataResponse -- …
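For a client, this means the partition-level code has to be treated as informational rather than fatal whenever the leader is still known. A hedged sketch of that handling (the attribute layout and the numeric code 9 for ReplicaNotAvailable are assumptions based on the 0.8-era protocol, not kafka-python's actual objects):

```python
REPLICA_NOT_AVAILABLE = 9  # assumed 0.8-era code for ReplicaNotAvailableError

def partition_is_usable(error_code, leader):
    """Decide whether a partition can still be produced to / fetched from.

    A ReplicaNotAvailable code alongside a valid leader only means some
    replica is offline (e.g. after a single broker failure and leader
    switch), so the partition itself remains usable.
    """
    if error_code == 0:
        return True
    if error_code == REPLICA_NOT_AVAILABLE and leader is not None:
        return True   # leader known; treat the error as informational
    return False      # e.g. LeaderNotAvailable: refresh metadata and retry


print(partition_is_usable(REPLICA_NOT_AVAILABLE, leader=2))  # True
print(partition_is_usable(5, leader=None))                   # False
```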

Re: 0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Jun Rao
Hi, Dana, Thanks for reporting this. I investigated this a bit more. What you observed is the following: a client getting a partition-level error code of ReplicaNotAvailableError in a TopicMetadataResponse when one of the replicas is offline. The short story is that that behavior can already happen in …
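A rough Python sketch of why the code gets set at all (this paraphrases the behavior described above; it is not the broker's actual Scala code, and the field names are illustrative): when building PartitionMetadata the broker can only describe replicas whose broker details it currently has, so an offline replica is surfaced as a partition-level error code rather than a null replica object.

```python
# Illustrative sketch of the broker-side decision, under the assumption
# that offline replicas simply cannot be rendered as broker objects.
REPLICA_NOT_AVAILABLE = 9

def build_partition_metadata(partition, leader_id, replica_ids, live_brokers):
    known = [rid for rid in replica_ids if rid in live_brokers]
    error_code = 0
    if len(known) < len(replica_ids):
        # One of the assigned replicas is offline: flag it with an error
        # code while still returning the leader and the live replicas.
        error_code = REPLICA_NOT_AVAILABLE
    return {
        "partition": partition,
        "error_code": error_code,
        "leader": leader_id if leader_id in live_brokers else None,
        "replicas": known,
    }
```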

0.8.2.0 behavior change: ReplicaNotAvailableError

2015-01-14 Thread Dana Powers
Overall the 0.8.2.0 release candidate looks really good. All of the kafka-python integration tests pass as they do w/ prior servers, except one... When testing recovery from a broker failure / leader switch, we now see a ReplicaNotAvailableError in broker metadata / PartitionMetadata, which we do …
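For concreteness, here is a schematic of the metadata the failing test observes, with illustrative data rather than actual kafka-python objects: before the broker failure the partition reports no error; after the failure and leader switch it still has a usable leader but carries the ReplicaNotAvailable code for the offline replica.

```python
# Illustrative snapshots (not real kafka-python structures) of the
# PartitionMetadata seen before and after killing broker 1, assuming
# ReplicaNotAvailable is error code 9.
before_failure = {
    "topic": "test-topic",
    "partition": 0,
    "error_code": 0,
    "leader": 1,
    "isr": [1, 2],
}

after_failure = {
    "topic": "test-topic",
    "partition": 0,
    "error_code": 9,   # ReplicaNotAvailable: broker 1 is offline
    "leader": 2,       # leadership switched to broker 2
    "isr": [2],
}
```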