Relevant parameters from server.properties:

log.dir=/var/lib/fk-3p-kafka/logs
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=536870912
log.cleanup.interval.mins=1
log.retention.hours=336
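Note that log.retention.hours appears twice. Assuming the broker reads server.properties through java.util.Properties, each line is stored with a put(), so the last occurrence wins and the effective value is 336 hours. That is 14 days, the same as the topic's retention.ms of 1209600000 ms (1209600000 / 3600000 = 336). The sketch below checks both numbers; the class RetentionCheck and its helpers are hypothetical, not part of Kafka.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Kafka: shows which value
// java.util.Properties reports when a key is repeated.
public class RetentionCheck {

    // Loads properties text. For a repeated key, Properties.load keeps the
    // value from the last occurrence, since each line replaces the previous put().
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Converts a retention.ms value to whole hours.
    static long msToHours(long ms) {
        return TimeUnit.MILLISECONDS.toHours(ms);
    }

    public static void main(String[] args) {
        // The retention-related lines from the server.properties above.
        String serverProperties = String.join("\n",
                "log.retention.hours=168",
                "log.segment.bytes=536870912",
                "log.retention.hours=336");

        Properties props = load(serverProperties);
        long brokerHours = Long.parseLong(props.getProperty("log.retention.hours"));
        long topicHours = msToHours(1209600000L); // topic-level retention.ms

        System.out.println("broker log.retention.hours = " + brokerHours);
        System.out.println("topic retention.ms in hours = " + topicHours);
    }
}
```

Both lines print 336, so the configured broker and topic retention agree; the stale 168 entry is still worth deleting so nobody misreads it.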
On Thu, Jul 24, 2014 at 10:34 PM, Kashyap Paidimarri <kashy...@gmail.com> wrote:

> We just noticed that one of our topics has been horribly misbehaving.
>
> *retention.ms* for the topic is set to 1209600000 ms.
>
> However, segments are getting scheduled for deletion as soon as a new one
> is rolled over. Naturally, consumers run into a
> kafka.common.OffsetOutOfRangeException whenever this happens.
>
> Is this a known bug? It is incredibly serious. We seem to have lost about
> 40 million messages on a single topic and have yet to figure out which
> other topics are affected.
>
> I thought of restarting Kafka but figured I'd leave it untouched while I
> work out what I can capture to find the root cause.
>
> Meanwhile, to keep from losing any more data, I have a periodic
> job that does a *'cp -al'* of the partitions into a separate folder.
> That way Kafka goes ahead and deletes the segment, but the data is not lost
> from the filesystem.
>
> If this is a previously unseen bug, what should I save from the running
> instance?
>
> By the way, this has affected all partitions and replicas of the topic, not
> just a specific host.
>
> --
> “The difference between raman and varelse is not in the creature judged,
> but in the creature judging. When we declare an alien species to be raman,
> it does not mean that *they* have passed a threshold of moral maturity.
> It means that *we* have.
> —Demosthenes, *Letter to the Framlings*”
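The 'cp -al' workaround works because a hard link is a second directory entry for the same file: when Kafka removes its own entry for a segment, the data stays on disk as long as another link exists. A rough Java equivalent of that periodic job is sketched below, assuming the partition directories sit under the log.dir above and the backup folder is on the same filesystem (hard links cannot cross filesystems). The class HardLinkSnapshot, its snapshot method, and the command-line arguments are hypothetical; running 'cp -al' from cron does the same job.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical sketch of the periodic 'cp -al' job: mirror a partition
// directory into a backup directory using hard links instead of copies.
public class HardLinkSnapshot {

    // Returns the number of new hard links created. Files already present in
    // target are skipped, so the job can be re-run on a schedule.
    static int snapshot(Path source, Path target) throws IOException {
        int created = 0;
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    // Files.walk visits a directory before its contents.
                    Files.createDirectories(dest);
                } else if (Files.isRegularFile(p) && !Files.exists(dest)) {
                    try {
                        // A second name for the same inode: if Kafka later
                        // deletes its name, the data stays reachable via dest.
                        Files.createLink(dest, p);
                        created++;
                    } catch (NoSuchFileException e) {
                        // Segment was deleted between the walk and the link; skip it.
                    }
                }
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HardLinkSnapshot <partition-dir> <backup-dir>
        int n = snapshot(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("linked " + n + " new file(s)");
    }
}
```

Skipping files that already exist in the backup keeps repeated runs cheap: only segments rolled since the last run get new links.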