Hey Jason,

As Jun says, we haven't seen that issue and no one else has reported it, but it sounds like a bug of some kind.
In 0.7 we don't do any preallocation of anything. The only time files shrink is during recovery: we re-checksum all messages that may not have been flushed, and if any invalid messages are found we truncate them off. This would only occur after an unclean shutdown (kill -9), and you would get a bunch of errors in the log warning you this was happening.

-Jay

On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg <j...@squareup.com> wrote:

> An update on this. It appears that the phenomenon I'm seeing is that disk
> space is freed on restart, but it's not due to files getting deleted on
> restart; instead, files are getting truncated on restart. It appears
> that log files get pre-allocated to a larger size than is used right away.
> Upon restart, they get truncated to the size of the file that actually
> contains data. Does this make sense?
>
> Before restart, I see a large number of log files of size 2.1Gb. Upon
> restart, the disk space reclaimed drops to almost half that, on average.
>
> Thoughts?
>
> Jason
>
> On Thu, May 23, 2013 at 8:55 PM, Jun Rao <jun...@gmail.com> wrote:
>
> > I haven't seen this issue before. We do have ~1K topics in one of the
> > Kafka clusters at LinkedIn.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <j...@squareup.com>
> > wrote:
> >
> > > Yeah, that's what it looks like to me (looking at the code). So, I'm
> > > guessing it's some OS-level caching or resource recycling. Have you ever
> > > heard of this happening? One thing that might be different in my usage
> > > from the norm is a relatively large number of topics (e.g. ~2K topics).
> > >
> > > Jason
> > >
> > > On Thu, May 23, 2013 at 7:14 AM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > Jason,
> > > >
> > > > Kafka closes the handles of all deleted files. Otherwise, the broker
> > > > will run out of file handles quickly.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg <j...@squareup.com>
> > > > wrote:
> > > >
> > > > > So, does this indicate Kafka (or the JVM itself) is not aggressively
> > > > > closing file handles of deleted files? Is there a fix for this? Or is
> > > > > there not likely anything to be done? What happens if the disk fills
> > > > > up with file handles for phantom deleted files?
> > > > >
> > > > > Jason
> > > > >
> > > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan Creasy <j...@box.com> wrote:
> > > > >
> > > > > > It isn't uncommon: if a process has an open file handle on a file
> > > > > > that is deleted, the space is not freed until the handle is closed.
> > > > > > So restarting the process that has a handle on the file would cause
> > > > > > the space to be freed also.
> > > > > >
> > > > > > You can troubleshoot that with lsof.
> > > > > >
> > > > > > Normally, I see 2-4 log segments deleted every hour in my brokers. I
> > > > > > see log lines like this:
> > > > > >
> > > > > > 2013-05-23 04:40:06,857 INFO [kafka-logcleaner-0] log.LogManager - Deleting log segment 00000000035434043157.kafka from <redacted topic>
> > > > > >
> > > > > > However, it seems like if I restart the broker, a massive amount of
> > > > > > disk space is freed (without a corresponding flood of these log
> > > > > > segment deleted messages). Is there an explanation for this? Does
> > > > > > Kafka keep references to file segments around, and reuse them as
> > > > > > needed or something? And then on restart, the references to those
> > > > > > free segment files are dropped?
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > This is with 0.7.2.
> > > > > >
> > > > > > Jason
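
For anyone reading this thread later, here is a rough sketch of the recovery behavior Jay describes at the top: walk the log, re-checksum each message, and truncate the file at the first one that fails. This is only an illustration, not Kafka's code. The record layout (4-byte length, 4-byte CRC32, payload) and the names `encode`, `valid_prefix_length`, and `recover` are assumptions made up for the example and do not match the real 0.7 on-disk format.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
# This is NOT the Kafka 0.7 message format, just a stand-in for the sketch.
HEADER = struct.Struct(">II")


def encode(payload: bytes) -> bytes:
    """Frame one payload as a length- and checksum-prefixed record."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix_length(data: bytes) -> int:
    """Return the byte length of the longest prefix made only of intact records."""
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(data):
            break  # record was cut short, e.g. by an unclean shutdown
        if zlib.crc32(data[start:end]) != crc:
            break  # checksum mismatch: treat this and everything after as invalid
        pos = end
    return pos


def recover(path: str) -> int:
    """Truncate the file at the first invalid record; return the bytes removed."""
    with open(path, "r+b") as f:
        data = f.read()
        good = valid_prefix_length(data)
        if good < len(data):
            f.truncate(good)
    return len(data) - good
```

With a layout like this, a file only shrinks when its tail fails the check, which is consistent with Jay's point that truncation should only show up after an unclean shutdown.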