Commented on the jira. Thanks,
Jun

On Thu, May 2, 2013 at 2:22 PM, Dan Frankowski <dfran...@gmail.com> wrote:

> Someone pointed out a particularly easy fix: don't reuse files after a
> restart. Done. I really like that. Simple. Any chance of this happening
> any time soon?

On Sun, Apr 28, 2013 at 2:04 AM, Swapnil Ghike <sgh...@linkedin.com> wrote:

> @Dan: Upon restart of the broker, if a segment already has data, the
> broker resets the firstAppendTime of the segment to the time when that
> segment's file handles are being loaded into memory. Thus, as you
> correctly explained, every time you shut down a broker, it essentially
> forgets the firstAppendTime. This behavior is present in both 0.7.2 and
> 0.8.
>
> As Jun said, ideally we should set firstAppendTime to the file creation
> time. Unfortunately, Java NIO can provide that information only if the
> underlying filesystem implementation supports the notion of file
> creation time.
>
> Thanks for filing the JIRA, these are good suggestions.
>
> @Jason: Thanks for pointing out that log.roll.hours is not documented on
> the website. This config was added late in 0.7 and we probably forgot to
> update the website. We have filed KAFKA-834/KAFKA-835 to update the
> configs and other documentation on the website in general. Please let us
> know if you see any other missing piece.
>
> Thanks,
> Swapnil

On 4/27/13 2:36 PM, "Dan Frankowski" <dfran...@gmail.com> wrote:

> I believe there is a separate watcher thread. The only issue is that
> upon restart the broker forgets when the file was created. The behavior
> I described (files can be appended to indefinitely) is awkward for us.
> We have tried to work around it.

On Fri, Apr 26, 2013 at 10:32 AM, Adam Talaat <atal...@extole.com> wrote:

> I don't know how Kafka's rollover algorithm is implemented, but this is
> common behavior for other logging frameworks.
> You would need a separate watcher/scheduled thread to roll over a log
> file even if no events are coming in. Logback (and probably log4j, by
> the same author) dispenses with the watcher thread. Instead, it checks
> each message as it comes in and decides whether the message belongs in a
> new file. If it does, a rollover of the old file is triggered and the
> message is deposited in the new file. But no rollover will occur until a
> message that belongs in a new file arrives.
>
> Cheers,
> Adam

On Fri, Apr 26, 2013 at 9:52 AM, Jason Rosenberg <j...@squareup.com> wrote:

> By the way, is there a reason why 'log.roll.hours' is not documented on
> the Apache configuration page: http://kafka.apache.org/configuration.html ?
>
> It's possible to find this setting (and several other undocumented
> settings) by looking at the source code. I'm just not sure why the
> complete set of options is not documented on the site (is it meant to be
> experimental?).
>
> Jason

On Fri, Apr 26, 2013 at 8:19 AM, Dan Frankowski <dfran...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/KAFKA-881
>
> Thanks.

On Fri, Apr 26, 2013 at 7:40 AM, Jun Rao <jun...@gmail.com> wrote:

> Yes, for a low-volume topic, the time-based rolling can be imprecise.
> Could you file a jira and describe your suggestions there? Ideally, we
> should set firstAppendTime to the file creation time. However, it
> doesn't seem you can get the creation time in Java.
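A quick aside on the creation-time question: java.nio.file (Java 7+) does expose a creationTime attribute via BasicFileAttributes, but, as noted in the thread, its value is only meaningful where the underlying filesystem actually records creation time; elsewhere implementations commonly fall back to the last-modified time. A minimal probe, with a throwaway temp file standing in for a log segment:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class CreationTimeProbe {
    public static void main(String[] args) throws IOException {
        // Temp file standing in for a Kafka segment file (illustrative only).
        Path segment = Files.createTempFile("00000000000000000000", ".kafka");
        BasicFileAttributes attrs =
                Files.readAttributes(segment, BasicFileAttributes.class);
        // On filesystems without creation-time support, this value may just
        // mirror the last-modified time, so it cannot be trusted as a true
        // "first append" timestamp across platforms.
        System.out.println("creationTime = " + attrs.creationTime());
        System.out.println("lastModified = " + attrs.lastModifiedTime());
        Files.delete(segment);
    }
}
```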
> Thanks,
> Jun

On Thu, Apr 25, 2013 at 11:12 PM, Dan Frankowski <dfran...@gmail.com> wrote:

> We have high-volume topics and low-volume topics. The problem occurs
> more often for low-volume topics, to be sure.
>
> But if my hypothesis about why it is happening is correct, here is a
> case where rolling takes longer than an hour, even on a high-volume
> topic:
>
> - write to a topic for 20 minutes
> - restart the broker
> - wait for 5 days
> - write to a topic for 20 minutes
> - restart the broker
> - write to a topic for an hour
>
> The rollover time was now 5 days, 1 hour, 40 minutes. You can make it as
> long as you want. Does this make sense?
>
> We would like the rollover time to be no more than an hour, even if the
> broker is restarted or the topic is low-volume.
>
> The cleanest way to do that might be to roll over on the hour no matter
> when the file was started. That would be too fast sometimes, but that's
> fine. A second way would be to embed the first append time in the file
> name. A third way (not perfect, but an approximation at least) would be
> not to write to a segment if firstAppendTime is not defined and the
> timestamp on the file is more than an hour old. There are probably other
> ways.
>
> What say you?
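The third workaround proposed above (don't append to a segment whose firstAppendTime is unknown and whose file timestamp is stale) can be sketched as a pre-append check. All names here are illustrative, not Kafka's actual API:

```java
import java.io.File;

public class RollCheck {
    /**
     * Approximation of the third proposal: if the broker has no recorded
     * firstAppendTime for a segment (e.g. right after a restart), fall back
     * to the file's last-modified timestamp; if that is older than the roll
     * window, roll to a new segment instead of appending.
     */
    static boolean shouldRollBeforeAppend(File segmentFile,
                                          Long firstAppendTimeMs,
                                          long rollMs,
                                          long nowMs) {
        if (firstAppendTimeMs == null) {
            // Restarted broker: mtime approximates the last write time.
            return nowMs - segmentFile.lastModified() > rollMs;
        }
        return nowMs - firstAppendTimeMs > rollMs;
    }
}
```

It is only an approximation because mtime records the last write, not the first append, but it does bound how stale a reopened segment can be before new data lands in it.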
On Thu, Apr 25, 2013 at 9:49 PM, Jun Rao <jun...@gmail.com> wrote:

> That logic in 0.7.2 seems correct. Basically, firstAppendTime is set on
> the first append to a log segment. Then, later on, when a new message is
> appended and the elapsed time since firstAppendTime is larger than the
> roll time, a new segment is rolled. Is your data constantly being
> produced?
>
> Thanks,
> Jun

On Thu, Apr 25, 2013 at 12:44 PM, Dan Frankowski <dfran...@gmail.com> wrote:

> We are running Kafka 0.7.2. We set log.roll.hours=1. I hoped that meant
> logs would be rolled every hour, or more often. Only, sometimes logs
> that are many hours (sometimes days) old have more data added to them.
> This perturbs our systems for reasons I won't get into.
>
> Have others observed this? Is it a bug? Is there a planned fix?
>
> I don't know Scala or Kafka well, but I have a proposal for why this
> might happen: upon restart, a broker forgets when its log files have
> been appended to ("firstAppendTime"). Then, a potentially infinite
> amount of time later, the restarted broker receives another message for
> the particular (topic, partition) and starts the clock again.
> It will then roll over that log after an hour.
>
> https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/server/KafkaConfig.scala says:
>
>     /* the maximum time before a new log segment is rolled out */
>     val logRollHours = Utils.getIntInRange(props, "log.roll.hours", 24*7,
>       (1, Int.MaxValue))
>
> https://svn.apache.org/repos/asf/kafka/branches/0.7/core/src/main/scala/kafka/log/Log.scala has
> maybeRoll, which needs segment.firstAppendTime defined. It also has
> updateFirstAppendTime(), which says if it's empty, then set it.
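To make the hypothesis concrete, here is a stripped-down sketch of roll-on-append logic in the spirit of maybeRoll/updateFirstAppendTime (illustrative Java, not the actual Scala source). Because the clock starts at the first append the current process sees, a restart silently extends the rollover window:

```java
public class SegmentRollSketch {
    private Long firstAppendTimeMs = null; // in-memory only: lost on restart
    private final long rollMs;

    SegmentRollSketch(long rollMs) { this.rollMs = rollMs; }

    /** Returns true if this append rolled to a new segment first. */
    boolean append(long nowMs) {
        // maybeRoll: rolls only if firstAppendTime is defined and stale.
        if (firstAppendTimeMs != null && nowMs - firstAppendTimeMs > rollMs) {
            firstAppendTimeMs = nowMs; // new segment, clock restarts
            return true;
        }
        // updateFirstAppendTime: if it's empty, set it.
        if (firstAppendTimeMs == null) {
            firstAppendTimeMs = nowMs;
        }
        return false;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L, day = 24 * hour;
        SegmentRollSketch log = new SegmentRollSketch(hour);
        log.append(0L);                        // clock starts
        // "Restart": a new instance has forgotten firstAppendTime,
        // so even 5 days later this append does not trigger a roll.
        log = new SegmentRollSketch(hour);
        System.out.println("rolled: " + log.append(5 * day)); // prints false
    }
}
```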