Also strange: if I start this broker back up and then issue a kafkacat metadata request, I do not see any ‘Broker: Replica not available’, even though this broker’s preferred partitions have not yet replicated back in sync and it is not yet their leader. Everything seems normal.
Somehow this broker being offline makes the rest of the cluster think that none of its replicas are available.

> On Aug 15, 2015, at 11:18, Andrew Otto <ao...@wikimedia.org> wrote:
>
> I am having trouble with a single broker causing consumers to lag. As I am
> troubleshooting this issue, I have stopped this broker in the hopes that
> other replicas will take over as leader for this broker’s preferred
> partitions. However, when I do so, Camus reports:
>
> kafka.CamusJob: Skipping the creation of ETL request for Topic : webrequest_text and Partition : 3 Exception : kafka.common.ReplicaNotAvailableException
>
> kafka-topics.sh --describe shows:
>
> Topic: webrequest_text Partition: 3 Leader: 22 Replicas: 22,21,12 Isr: 22,21
>
> However, when I use kafkacat to look at metadata (which asks for metadata
> from Kafka rather than Zookeeper), I see:
>
> partition 3, leader 22, replicas: 22,21, isrs: 22,21, Broker: Replica not available
>
> Doh! Clearly there is a replica available. I can use kafkacat and
> kafka-simple-consumer-shell to consume from this partition from either in
> sync replica just fine.
>
> This happens for all partitions for whom the stopped broker was previously
> the leader.
>
> Anyone know why I’d see something like this? I have not seen this error
> before upgrading to 0.8.2.1.
>
> Thanks,
> -Andrew
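
For anyone who wants to script the comparison I was doing by eye on the kafka-topics.sh output above, here is a minimal Python sketch. It parses one partition line and lists the brokers that are assigned as replicas but missing from the ISR. The function name `out_of_sync_replicas` and the regex are my own assumptions based on the line format quoted above; nothing here ships with Kafka.

```python
import re


def out_of_sync_replicas(describe_line):
    """Return broker ids listed under Replicas but absent from Isr."""
    # Assumed line shape, taken from the describe output quoted above:
    # "Topic: t Partition: 3 Leader: 22 Replicas: 22,21,12 Isr: 22,21"
    match = re.search(r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)", describe_line)
    if match is None:
        raise ValueError("not a partition line: %r" % describe_line)
    replicas = [int(b) for b in match.group(1).split(",") if b]
    isr = {int(b) for b in match.group(2).split(",") if b}
    # Keep the assignment order so the preferred replica stays first.
    return [b for b in replicas if b not in isr]


line = ("Topic: webrequest_text Partition: 3 Leader: 22 "
        "Replicas: 22,21,12 Isr: 22,21")
print(out_of_sync_replicas(line))  # [12]
```

For the partition line above this reports broker 12, which is the replica that Zookeeper still lists but that is no longer in sync.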