Re: kafka.server.ReplicaManager error

2015-02-05, svante karlsson
In our case, unclean leader election was enabled. As the cluster should have been empty, I can't really say that we did not lose any data, but as I wrote earlier, I could not get the log messages to stop until I took down all brokers at the same time.

2015-02-05 22:16 GMT+01:00 Kyle Banker:

2015-02-05, Kyle Banker
Thanks for sharing, svante. We're also running 0.8.2. Our cluster appears to be completely unusable at this point. We tried restarting the "down" broker with a clean log directory, and it's doing nothing. It doesn't seem to be able to get topic data, which this Zookeeper message appears to confirm

2015-02-05, svante karlsson
I believe I've had the same problem on 0.8.2 rc2. We had an idle test cluster with unknown health status, and I applied rc3 without first checking that everything was OK. Since that cluster had been doing nothing for a couple of days and the retention time was 48 hours, it's reasonable to assume th

2015-02-05, Kyle Banker
Digging in a bit more, it appears that the "down" broker had likely only partially failed. Thus, it was still attempting to fetch offsets that no longer exist. Does this make sense as an explanation of the behavior described above?

On Thu, Feb 5, 2015 at 10:58 AM, Kyle Banker wrote:
> Dug into this

2015-02-05, Kyle Banker
Dug into this a bit more, and it turns out that we lost one of our 9 brokers at the exact moment this started happening. At the time we lost the broker, we had no under-replicated partitions. Since the broker disappeared, we've had a fairly constant number of under-replicated partitions.
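A partition counts as under-replicated when fewer of its assigned replicas are in sync than are assigned. So losing one broker marks every partition that has a replica on that broker, and the count then stays flat until the broker returns or the partitions are reassigned. The toy sketch here models that observation under loud simplifying assumptions: the topic name `events`, the replica assignment, and the helper `under_replicated` are all hypothetical, broker 9 stands in for the lost broker, and every live assigned replica is treated as in sync (in a real cluster, ISR membership also depends on whether a follower keeps up with its leader).

```python
# Toy model of under-replicated partitions after losing one broker.
# Hypothetical data: the assignment is invented, not taken from the thread.

def under_replicated(assignment, live_brokers):
    """Return the sorted partitions whose in-sync replicas are fewer than assigned.

    Simplifying assumption: an assigned replica is in sync iff its broker is live.
    """
    result = []
    for partition, replicas in assignment.items():
        in_sync = [b for b in replicas if b in live_brokers]
        if len(in_sync) < len(replicas):
            result.append(partition)
    return sorted(result)


# Hypothetical topic "events": 9 partitions, replication factor 2, brokers 1..9.
assignment = {
    ("events", p): [p % 9 + 1, (p + 1) % 9 + 1] for p in range(9)
}

all_brokers = set(range(1, 10))
after_loss = all_brokers - {9}  # broker 9 disappears and does not come back

print(under_replicated(assignment, all_brokers))  # []
print(under_replicated(assignment, after_loss))   # [('events', 7), ('events', 8)]
```

In this model, nothing changes the result while broker 9 stays missing, so the number of under-replicated partitions holds steady at two, mirroring the "fairly constant" count described in the message.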