Interesting. Yes, it will respect whatever setting it is given for new segments created from that point on.
-Jay

On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Ok,
>
> An update on this. It seems we are using XFS, which is available in newer
> versions of CentOS. It definitely does pre-allocate space as a file grows,
> see:
>
> http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file
>
> Apparently it's not hard-allocated space, and would be released under
> resource pressure....seems we may need to update how we monitor disk space
> usage, etc....
>
> But it seems that the default log file size of 1.1GB causes it to jump to
> preallocate an extra GB. So, in theory, if I set a strategic log file
> size to be just under the threshold that forces it to exponentially double
> the size from 1GB to 2GB, I should be able to mostly solve this issue.
> E.g. use 950MB instead of 1.1GB max log file size.
>
> If I change the max log file size for a broker, and restart it, will it
> respect the new size going forward?
>
> Jason
>
> On Sun, Jul 14, 2013 at 9:44 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey Jason,
> >
> > As Jun says, we haven't seen that issue and no one else has reported that,
> > but it sounds like a bug of some kind.
> >
> > In 0.7 we don't do any preallocation of anything. The only time files
> > shrink is during recovery--we re-checksum all messages that may not have
> > been flushed and if any invalid messages are found we truncate them off.
> > This would only occur after an unclean shutdown (kill -9) and you would get
> > a bunch of errors in the log warning you this was happening.
> >
> > -Jay
> >
> > On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg <j...@squareup.com> wrote:
> >
> > > An update on this. It appears that the phenomenon I'm seeing is that disk
> > > space is freed on restart, but it's not due to files getting deleted on
> > > restart; instead, files are getting truncated on restart. It appears
> > > that log files get pre-allocated to a larger size than is used right away.
> > > Upon restart, they get truncated to the size of the file that actually
> > > contains data. Does this make sense?
> > >
> > > Before restart, I see a large number of log files of size 2.1GB. Upon
> > > restart, the disk space reclaimed drops to almost half that, on average.
> > >
> > > Thoughts?
> > >
> > > Jason
> > >
> > > On Thu, May 23, 2013 at 8:55 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > I haven't seen this issue before. We do have ~1K topics in one of the
> > > > Kafka clusters at LinkedIn.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <j...@squareup.com> wrote:
> > > >
> > > > > Yeah, that's what it looks like to me (looking at the code). So, I'm
> > > > > guessing it's some OS-level caching, resource recycling. Have you ever
> > > > > heard of this happening? One thing that might be different in my usage
> > > > > from the norm is a relatively large number of topics (e.g. ~2K topics).
> > > > >
> > > > > Jason
> > > > >
> > > > > On Thu, May 23, 2013 at 7:14 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Jason,
> > > > > >
> > > > > > Kafka closes the handles of all deleted files. Otherwise, the broker
> > > > > > will run out of file handles quickly.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg <j...@squareup.com> wrote:
> > > > > >
> > > > > > > So, does this indicate Kafka (or the JVM itself) is not aggressively
> > > > > > > closing file handles of deleted files? Is there a fix for this? Or is
> > > > > > > there not likely anything to be done? What happens if the disk fills
> > > > > > > up with file handles for phantom deleted files?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan Creasy <j...@box.com> wrote:
> > > > > > >
> > > > > > > > It isn't uncommon: if a process has an open file handle on a file
> > > > > > > > that is deleted, the space is not freed until the handle is closed.
> > > > > > > > So restarting the process that has a handle on the file would cause
> > > > > > > > the space to be freed also.
> > > > > > > >
> > > > > > > > You can troubleshoot that with lsof.
> > > > > > > >
> > > > > > > > Normally, I see 2-4 log segments deleted every hour in my brokers.
> > > > > > > > I see log lines like this:
> > > > > > > >
> > > > > > > > 2013-05-23 04:40:06,857 INFO [kafka-logcleaner-0] log.LogManager -
> > > > > > > > Deleting log segment 00000000035434043157.kafka from <redacted topic>
> > > > > > > >
> > > > > > > > However, it seems like if I restart the broker, a massive amount of
> > > > > > > > disk space is freed (without a corresponding flood of these log
> > > > > > > > segment deleted messages). Is there an explanation for this? Does
> > > > > > > > Kafka keep references to file segments around, and reuse them as
> > > > > > > > needed or something? And then on restart, the references to those
> > > > > > > > free segment files are dropped?
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > This is with 0.7.2.
> > > > > > > >
> > > > > > > > Jason
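
The thread attributes the extra disk usage to filesystem preallocation, so that a segment file can occupy more blocks on disk than its length suggests, and notes that disk-space monitoring may need to change. One way to see this per file is to compare the apparent size with the allocated size. The following is a minimal sketch, not anything from Kafka itself: it assumes Linux (where `st_blocks` is counted in 512-byte units), and the function names `segment_usage` and `report` are made up for illustration.

```python
import os


def segment_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a single file.

    apparent_bytes is the file length (what `ls -l` shows).
    allocated_bytes is the space actually reserved on disk.
    Assumption: Linux, where st_blocks is in 512-byte units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def report(log_dir):
    """Yield (path, apparent_bytes, allocated_bytes) for each file under log_dir."""
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            if os.path.isfile(path):
                apparent, allocated = segment_usage(path)
                yield path, apparent, allocated
```

Calling `report()` on a broker's log directory and looking for files whose allocated size is far above the apparent size would show which segments currently carry preallocated space. GNU `du` exposes the same distinction: plain `du` reports allocated space, while `du --apparent-size` reports file lengths.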