I know we ran into the same issue with Camus at LinkedIn and it has since been fixed. I hope that we committed the patch to open source.
Are you running the latest version of Camus?

-Clark

Sent from my iPhone

> On Aug 15, 2015, at 10:25 AM, Andrew Otto <ao...@wikimedia.org> wrote:
>
> Hm, interesting. So my real issue is more with Camus than with cluster
> problems? It seems that Camus won’t consume if it encounters a
> ReplicaNotAvailableException.
>
>
>> On Aug 15, 2015, at 12:02, Clark Haskins <cl...@kafka.guru> wrote:
>>
>> Replica not available is not a fatal exception. This simply means that
>> there is a replica that is down.
>>
>> If you get Leader not available, that means the partition is offline.
>>
>> -Clark
>>
>> Sent from my iPhone
>>
>>> On Aug 15, 2015, at 8:41 AM, Andrew Otto <ao...@wikimedia.org> wrote:
>>>
>>> Also strange: If I start this broker back up, and then issue a kafkacat
>>> metadata request, I do not see any 'Broker: Replica not available',
>>> even though this broker’s preferred partitions have not yet replicated
>>> back in sync, and are not the leader. Everything seems normal.
>>>
>>> Somehow this broker being offline makes the rest of the cluster think
>>> that none of its replicas are available.
>>>
>>>
>>>> On Aug 15, 2015, at 11:18, Andrew Otto <ao...@wikimedia.org> wrote:
>>>>
>>>> I am having trouble with a single broker causing consumers to lag. As I
>>>> am troubleshooting this issue, I have stopped this broker in the hopes
>>>> that other replicas will take over as leader for this broker’s preferred
>>>> partitions. However, when I do so, Camus reports:
>>>>
>>>> kafka.CamusJob: Skipping the creation of ETL request for Topic : webrequest_text and Partition : 3 Exception : kafka.common.ReplicaNotAvailableException
>>>>
>>>> kafka-topics.sh --describe shows:
>>>>
>>>> Topic: webrequest_text Partition: 3 Leader: 22 Replicas: 22,21,12 Isr: 22,21
>>>>
>>>> However, when I use kafkacat to look at metadata (which asks for metadata
>>>> from Kafka rather than Zookeeper), I see:
>>>>
>>>> partition 3, leader 22, replicas: 22,21, isrs: 22,21, Broker: Replica not available
>>>>
>>>>
>>>> Doh! Clearly there is a replica available. I can use kafkacat and
>>>> kafka-simple-consumer-shell to consume from this partition from either
>>>> in-sync replica just fine.
>>>>
>>>> This happens for all partitions for which the stopped broker was
>>>> previously the leader.
>>>>
>>>> Anyone know why I’d see something like this? I have not seen this error
>>>> before upgrading to 0.8.2.1.
>>>>
>>>> Thanks,
>>>> -Andrew
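
For reference, the distinction drawn in this thread (Replica not available is not fatal, Leader not available means the partition is offline) can be sketched as a check on the per-partition error code in a metadata response. This is only an illustration, not Camus code: the `PartitionMetadata` class and `is_consumable` function are hypothetical names, and the only facts taken from Kafka are the protocol error codes 5 (LeaderNotAvailable) and 9 (ReplicaNotAvailable).

```python
from dataclasses import dataclass
from typing import List

# Kafka protocol error codes relevant to partition metadata.
NO_ERROR = 0
LEADER_NOT_AVAILABLE = 5
REPLICA_NOT_AVAILABLE = 9


@dataclass
class PartitionMetadata:
    """Hypothetical container for one partition entry of a metadata response."""
    topic: str
    partition: int
    leader: int          # broker id of the leader, -1 if there is none
    replicas: List[int]
    isr: List[int]
    error_code: int


def is_consumable(meta: PartitionMetadata) -> bool:
    """Return True if a consumer can still fetch from this partition.

    Assumption for this sketch: a partition is readable whenever it has a
    leader, so ReplicaNotAvailable is tolerated, while LeaderNotAvailable
    (or any other error) causes the partition to be skipped.
    """
    if meta.leader < 0 or meta.error_code == LEADER_NOT_AVAILABLE:
        return False
    return meta.error_code in (NO_ERROR, REPLICA_NOT_AVAILABLE)


# The partition reported by kafkacat in this thread: it has a leader and two
# in-sync replicas, but carries the "Replica not available" error.
reported = PartitionMetadata(
    topic="webrequest_text",
    partition=3,
    leader=22,
    replicas=[22, 21],
    isr=[22, 21],
    error_code=REPLICA_NOT_AVAILABLE,
)
print(is_consumable(reported))  # True
```

Under this rule the partition from the original report would still be consumed from leader 22 rather than skipped.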