Dug into this a bit more, and it turns out that we lost one of our 9 brokers at the exact moment this started happening. At the time we lost the broker, we had no under-replicated partitions. Since the broker disappeared, we've had a fairly constant number of under-replicated partitions. That part makes some sense, of course. Still, the log message doesn't.

On Thu, Feb 5, 2015 at 10:39 AM, Kyle Banker <kyleban...@gmail.com> wrote:
> I have a 9-node Kafka cluster, and all of the brokers just started
> spouting the following error:
>
> ERROR [Replica Manager on Broker 1]: Error when processing fetch request
> for partition [mytopic,57] offset 0 from follower with correlation id
> 58166. Possible cause: Request for offset 0 but we only have log segments
> in the range 39 to 39. (kafka.server.ReplicaManager)
>
> The "mytopic" topic has a replication factor of 3, and metrics are showing
> a large number of under-replicated partitions.
>
> My assumption is that a log aged out but that the replicas weren't aware
> of it.
>
> In any case, this problem isn't fixing itself, and the volume of log
> messages of this type is enormous.
>
> What might have caused this? How does one resolve it?
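To make the quoted log message concrete: it reads like a bounds check in which the leader rejects any fetch offset outside the range of offsets it still has on disk. The following is a hypothetical Python model of that check, not Kafka's actual code. The names `LogRange` and `check_fetch` are made up for illustration, and treating the range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass


@dataclass
class LogRange:
    """Hypothetical model of the offsets a leader still retains for a partition."""
    start_offset: int  # earliest offset still on disk (older segments deleted by retention)
    end_offset: int    # offset the next appended message would get


def check_fetch(requested: int, log: LogRange) -> str:
    """Return "ok" if the requested offset falls inside the retained range,
    otherwise an error string shaped like the broker message quoted above.
    Assumption: both ends of the range are treated as valid fetch offsets."""
    if requested < log.start_offset or requested > log.end_offset:
        return (
            f"Request for offset {requested} but we only have log segments "
            f"in the range {log.start_offset} to {log.end_offset}."
        )
    return "ok"


if __name__ == "__main__":
    retained = LogRange(start_offset=39, end_offset=39)
    # A follower still asking for offset 0 after everything before 39 aged out.
    print(check_fetch(0, retained))
    # A follower asking for the current start offset is served normally.
    print(check_fetch(39, retained))
```

In this model, a follower that keeps fetching from offset 0 is rejected on every request, which would match the steady flood of errors. It doesn't explain why the follower never moves its fetch offset forward, though.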