In our case unclean leader election was enabled.
As the cluster should have been empty, I can't really say that we did not
lose any data, but as I wrote earlier, I could not get the log messages to
stop until I took down all brokers at the same time.
2015-02-05 22:16 GMT+01:00 Kyle Banker :
Thanks for sharing, svante. We're also running 0.8.2.
Our cluster appears to be completely unusable at this point. We tried
restarting the "down" broker with a clean log directory, and it's doing
nothing. It doesn't seem to be able to get topic data, which this Zookeeper
message appears to confirm
I believe I've had the same problem on the 0.8.2 rc2. We had an idle test
cluster with unknown health status and I applied rc3 without checking if
everything was ok before. Since that cluster had been doing nothing for a
couple of days and the retention time was 48 hours it's reasonable to
assume th
Digging in a bit more, it appears that the "down" broker had likely
partially failed. Thus, it was still attempting to fetch offsets that no
longer exist. Does this make sense as an explanation of the
above-mentioned behavior?
On Thu, Feb 5, 2015 at 10:58 AM, Kyle Banker wrote:
> Dug into this
Dug into this a bit more, and it turns out that we lost one of our 9
brokers at the exact moment when this started happening. At the time that
we lost the broker, we had no under-replicated partitions. Since the broker
disappeared, we've had a fairly constant number of under-replicated
partitions.
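
For anyone following along: a partition counts as under-replicated when its
in-sync replica set (ISR) is smaller than its assigned replica list. Below is
a minimal sketch of that check in plain Python. It does not talk to Kafka or
ZooKeeper; the PartitionState type, the under_replicated helper, the topic
name and the broker ids are all hypothetical, made up only to illustrate how
losing one broker leaves a steady count of under-replicated partitions.

```python
from typing import Dict, List, NamedTuple, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition number)


class PartitionState(NamedTuple):
    """Hypothetical snapshot of one partition's replication state."""
    replicas: List[int]  # broker ids assigned to hold the partition
    isr: List[int]       # broker ids currently in sync with the leader


def under_replicated(
    partitions: Dict[TopicPartition, PartitionState],
) -> List[TopicPartition]:
    """Return the partitions whose ISR is smaller than the replica list."""
    return sorted(
        tp for tp, state in partitions.items()
        if len(state.isr) < len(state.replicas)
    )


# Made-up snapshot: broker 9 has dropped out of every ISR it belonged to.
snapshot = {
    ("events", 0): PartitionState(replicas=[1, 9], isr=[1]),
    ("events", 1): PartitionState(replicas=[2, 3], isr=[2, 3]),
    ("events", 2): PartitionState(replicas=[9, 4], isr=[4]),
}

result = under_replicated(snapshot)
print(result)
```

In this made-up snapshot, only the partitions that had a replica on broker 9
are reported, and they stay under-replicated until that broker rejoins the ISR
or the replicas are reassigned.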