We are running Kafka 0.7.2 with log.roll.hours=1. I hoped that meant logs
would be rolled at least once an hour. However, sometimes logs that are many
hours (sometimes days) old have more data appended to them. This perturbs
our systems for reasons I won't get into.

Have others observed this? Is it a bug? Is there a planned fix?

I don't know Scala or Kafka well, but I have a proposal for why this might
happen: upon restart, a broker forgets when its log files were last appended
to ("firstAppendTime"). Then, a potentially unbounded amount of time later,
the restarted broker receives another message for that particular (topic,
partition) and starts the clock again. It will then roll that log over an
hour later, as the sketch below illustrates.
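
To make that timeline concrete, here is a minimal, runnable Scala sketch of
the hypothesized sequence. Segment and its methods are illustrative stand-ins
for the real classes in Log.scala, not actual Kafka code:

  object RollHypothesis {
    // Stand-in for a log segment's in-memory state.
    case class Segment(var firstAppendTime: Option[Long] = None)

    // Mirrors updateFirstAppendTime(): set the timestamp only if empty.
    def append(seg: Segment, now: Long): Unit =
      if (seg.firstAppendTime.isEmpty) seg.firstAppendTime = Some(now)

    // Mirrors the time-based half of maybeRoll().
    def shouldRoll(seg: Segment, now: Long, rollMs: Long): Boolean =
      seg.firstAppendTime.exists(now - _ > rollMs)

    def main(args: Array[String]): Unit = {
      val hour = 60L * 60 * 1000
      val seg = Segment()
      append(seg, 0L)                 // first append at t = 0
      seg.firstAppendTime = None      // broker restart: in-memory state lost
      val quiet = 72 * hour           // three days pass with no traffic
      append(seg, quiet)              // next message restarts the clock
      println(shouldRoll(seg, quiet, hour))            // false: just appended
      println(shouldRoll(seg, quiet + hour + 1, hour)) // true, an hour after the late append
    }
  }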

https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/server/KafkaConfig.scala
says:

  /* the maximum time before a new log segment is rolled out */
  val logRollHours = Utils.getIntInRange(props, "log.roll.hours", 24*7, (1, Int.MaxValue))
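
Note the default there is 24*7 = 168 hours, i.e. one week. Our override is
the single broker property below (shown as a server.properties line):

  # Roll each log segment after at most one hour.
  log.roll.hours=1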

https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/log/Log.scala
has maybeRoll, whose time-based check only fires when segment.firstAppendTime
is defined. It also has updateFirstAppendTime(), which sets firstAppendTime
only if it is currently empty.
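
Paraphrasing those two pieces (from my reading of the linked file; the
identifiers here are approximate, not an exact quote):

  // Time-based roll happens only when firstAppendTime is defined.
  private def maybeRoll(segment: LogSegment): LogSegment =
    if (segment.size > maxSize ||
        segment.firstAppendTime.exists(t => time.milliseconds - t > rollIntervalMs))
      roll()
    else
      segment

  // firstAppendTime is set only when empty, and it lives only in memory,
  // so a restart clears it until the next append.
  private def updateFirstAppendTime(): Unit = {
    val segment = segments.view.last
    if (segment.firstAppendTime.isEmpty)
      segment.firstAppendTime = Some(time.milliseconds)
  }

On this reading, a segment appended to before a restart comes back with
firstAppendTime == None, so the hour-long clock doesn't start until the next
append, which matches what we observe.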
