Jason, Thanks, we'll give that option a try.
Mike On Mon, Sep 9, 2013 at 2:41 PM, Jason Rosenberg <j...@squareup.com> wrote: > Sorry, I forgot to close the loop on my experiences with this.... > > We solved this issue by setting the 'allocsize' mount option, in the fstab. > E.g. allocsize=16M. > > Jason > > > On Mon, Sep 9, 2013 at 1:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote: > > > This could certainly be done. It would be slightly involved since you > would > > need to implement some kind of file-handle cache for both indexes and log > > files and re-open them on demand when a read occurs. If someone wants to > > take a shot at this, the first step would be to get a design wiki in > place > > on how this would work. This is potentially nice to reduce the open file > > count (though open files are pretty cheap). > > > > That said this issue only impacts xfs and it seems to be fixed by that > > setting jonathan found. I wonder if you could give that a try and see if > it > > works for you too? I feel dealing with closed files does add a lot of > > complexity so if there is an easy fix I would probably rather avoid it. > > > > -Jay > > > > > > On Mon, Sep 9, 2013 at 8:17 AM, Mike Heffner <m...@librato.com> wrote: > > > > > We are also seeing this problem with version 0.7.1 and logs on an XFS > > > partition. At our largest scale we can frequently free over 600GB of > disk > > > usage by simply restarting Kafka. We've examined the `lsof` output from > > the > > > Kafka process and while it does appear to have FDs open for all log > files > > > on disk (even those long past read from), it does not have any files > open > > > that were previously deleted from disk. > > > > > > Du output agrees that the seen size is much larger than apparent-size > > size: > > > > > > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h > > > 00000000242666442619.kafka > > > 1.1G 00000000242666442619.kafka > > > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h --apparent-size > > > 00000000242666442619.kafka > > > 513M 00000000242666442619.kafka > > > > > > > > > Our log size/retention policy is: > > > > > > log.file.size=536870912 > > > log.retention.hours=96 > > > > > > We tried dropping the caches from the Stack Overflow suggestion (sync; > > echo > > > 3 > /proc/sys/vm/drop_caches) but that didn't seem to clear up the > extra > > > space. Haven't had the chance to try remounting with the allocsize > > option. > > > > > > In summary, it would be great if Kafka would close FD's to log files > that > > > hadn't been read from for some period of time if it addresses this > issue. > > > > > > Cheers, > > > > > > Mike > > > > > > On Fri, Jul 26, 2013 at 5:03 PM, Jay Kreps <jay.kr...@gmail.com> > wrote: > > > > > > > Cool, good to know. > > > > > > > > > > > > On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg <j...@squareup.com> > > > wrote: > > > > > > > > > Jay, > > > > > > > > > > My only experience so far with this is using XFS. It appears the > XFS > > > > > behavior is evolving, and in fact, we see somewhat different > behavior > > > > from > > > > > 2 of our CentOS kernel versions in use. I've been trying to ask > > > > questions > > > > > about all this on the XFS.org mailing list, but so far, having not > > much > > > > > luck understanding the xfs versioning correlated to CentOS > versions. > > > > > > > > > > Anyway, yes, I think it would definitely be worth trying the > solution > > > you > > > > > suggest, which would be to close the file on rotation, and re-open > > > > > read-only. Or to close files after a few hours of not being > > accessed. > > > > If > > > > > a patch for one of these approaches can be cobbled together, I'd > love > > > to > > > > > test it out on our staging environment. I'd be willing to > experiment > > > > with > > > > > such a patch myself, although I'm not 100% of all the places to > look > > > (but > > > > > might dive in). > > > > > > > > > > Xfs appears to the option of using dynamic, speculative > > preallocation, > > > in > > > > > which case it progressively doubles the amount of space reserved > for > > a > > > > > file, as the file grows. It does do this for all open files. If > the > > > > file > > > > > is closed, it will then release the preallocated space not in use. > > > It's > > > > > not clear whether this releasing of space happens immediately on > > close, > > > > and > > > > > whether re-opening the file read-only immediately, will keep it > from > > > > > releasing space (still trying to gather more info on that). > > > > > > > > > > I haven't looked too much at the index files, but those too appear > to > > > > have > > > > > this behavior (e.g. preallocated size is always on the order of > > double > > > > the > > > > > actual size, until the app is restarted). > > > > > > > > > > Jason > > > > > > > > > > > > > > > On Fri, Jul 26, 2013 at 12:46 PM, Jay Kreps <jay.kr...@gmail.com> > > > wrote: > > > > > > > > > > > Interesting. > > > > > > > > > > > > Yes, Kafka keeps all log files open indefinitely. There is no > > > inherent > > > > > > reason this needs to be the case, though, it would be possible to > > LRU > > > > out > > > > > > old file descriptors and close them if they are not accessed for > a > > > few > > > > > > hours and then reopen on the first access. We just haven't > > > implemented > > > > > > anything like that. > > > > > > > > > > > > It would be good to understand this a little better. Does xfs > > > > > pre-allocate > > > > > > space for all open files? Perhaps just closing the file on log > role > > > and > > > > > > opening it read-only would solve the issue? Is this at all > related > > to > > > > the > > > > > > use of sparse files for the indexes (i.e. > > > > > RandomAccessFile.setLength(10MB) > > > > > > when we create the index)? Does this effect other filesystems or > > just > > > > > xfs? > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > On Fri, Jul 26, 2013 at 12:42 AM, Jason Rosenberg < > > j...@squareup.com> > > > > > > wrote: > > > > > > > > > > > > > It looks like xfs will reclaim the preallocated space for a > file, > > > > after > > > > > > it > > > > > > > is closed. > > > > > > > > > > > > > > Does kafka close a file after it has reached it's max size and > > > > started > > > > > > > writing to the next log file in sequence? Or does it keep all > > open > > > > > until > > > > > > > they are deleted, or the server quits (that's what it seems > > like). > > > > > > > > > > > > > > I could imagine that it might need to keep log files open, in > > order > > > > to > > > > > > > allow consumers access to them. But does it keep them open > > > > > indefinitely, > > > > > > > after there is no longer any data to be written to them, and no > > > > > consumers > > > > > > > are currently attempting to read from them? > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2013 at 4:32 PM, Jay Kreps < > jay.kr...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > Interesting. Yes it will respect whatever setting it is given > > for > > > > new > > > > > > > > segments created from that point on. > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg < > > > > j...@squareup.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Ok, > > > > > > > > > > > > > > > > > > An update on this. It seems we are using XFS, which is > > > available > > > > > in > > > > > > > > newer > > > > > > > > > versions of Centos. It definitely does pre-allocate space > > as a > > > > > file > > > > > > > > grows, > > > > > > > > > see: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file > > > > > > > > > > > > > > > > > > Apparently it's not hard-allocated space, and would be > > released > > > > > under > > > > > > > > > resource pressure....seems we may need to update how we > > monitor > > > > > disk > > > > > > > > space > > > > > > > > > usage, etc.... > > > > > > > > > > > > > > > > > > But, it seems that the default log file size of 1.1Gb, > causes > > > it > > > > to > > > > > > > jump > > > > > > > > to > > > > > > > > > preallocate an extra Gb. So, in theory, if I set a > > strategic > > > > log > > > > > > file > > > > > > > > > size to be just under the threshold that forces it to > > > > exponentially > > > > > > > > double > > > > > > > > > the size from 1Gb to 2Gb, I should be able to mostly solve > > this > > > > > > issue. > > > > > > > > > E.g. use 950Mb instead of 1.1Gb max log file size. > > > > > > > > > > > > > > > > > > If I change the max log file size for a broker, and restart > > it, > > > > > will > > > > > > it > > > > > > > > > respect the new size going forward? > > > > > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Jul 14, 2013 at 9:44 AM, Jay Kreps < > > > jay.kr...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hey Jason, > > > > > > > > > > > > > > > > > > > > As Jun says, we haven't seen that issue and no one else > has > > > > > > reported > > > > > > > > that > > > > > > > > > > but it sounds like a bug of some kind. > > > > > > > > > > > > > > > > > > > > In 0.7 we don't do any preallocation of anything. The > only > > > time > > > > > > files > > > > > > > > > > shrink is during recovery--we re-checksum all messages > that > > > may > > > > > not > > > > > > > > have > > > > > > > > > > been flushed and if any invalid messages are found we > > > truncate > > > > > them > > > > > > > > off. > > > > > > > > > > This would only occur after an unclean shutdown (kill -9) > > and > > > > you > > > > > > > would > > > > > > > > > get > > > > > > > > > > a bunch of errors in log warning you this was happening. > > > > > > > > > > > > > > > > > > > > -Jay > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg < > > > > > j...@squareup.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > An update on this. It appears that the phenomenon I'm > > > seeing > > > > > is > > > > > > > that > > > > > > > > > > disk > > > > > > > > > > > space is freed on restart, but it's not due files > getting > > > > > deleted > > > > > > > on > > > > > > > > > > > restart, but instead files are getting truncated on > > > restart. > > > > > It > > > > > > > > > appears > > > > > > > > > > > that log files get pre-allocated to a larger size than > is > > > > used > > > > > > > right > > > > > > > > > > away. > > > > > > > > > > > Upon restart, they get truncated to the size of the > file > > > > that > > > > > > > > actually > > > > > > > > > > > contains data. Does this make sense? > > > > > > > > > > > > > > > > > > > > > > Before restart, I see a large number of log files size > > > 2.1Gb. > > > > > > Upon > > > > > > > > > > > restart, the disk space reclaimed drops to almost half > > > that, > > > > on > > > > > > > > > average. > > > > > > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 23, 2013 at 8:55 PM, Jun Rao < > > jun...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > I haven't seen this issue before. We do have ~1K > topics > > > in > > > > > one > > > > > > of > > > > > > > > the > > > > > > > > > > > Kafka > > > > > > > > > > > > clusters at LinkedIn. > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg < > > > > > > > > j...@squareup.com> > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Yeah, that's what it looks like to me (looking at > the > > > > > code). > > > > > > > So, > > > > > > > > > I'm > > > > > > > > > > > > > guessing it's some os level caching, resource > > > recycling. > > > > > > Have > > > > > > > > you > > > > > > > > > > ever > > > > > > > > > > > > > heard of this happening? One thing that might be > > > > different > > > > > > in > > > > > > > my > > > > > > > > > > usage > > > > > > > > > > > > > from the norm is a relatively large number of > topics > > > > (e.g. > > > > > > ~2K > > > > > > > > > > topics). > > > > > > > > > > > > > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, May 23, 2013 at 7:14 AM, Jun Rao < > > > > jun...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kafka closes the handler of all delete files. > > > > Otherwise, > > > > > > the > > > > > > > > > broker > > > > > > > > > > > > will > > > > > > > > > > > > > > run out of file handler quickly. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, May 22, 2013 at 10:17 PM, Jason > Rosenberg < > > > > > > > > > > j...@squareup.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, does this indicate kafka (or the jvm > itself) > > is > > > > not > > > > > > > > > > > aggressively > > > > > > > > > > > > > > > closing file handles of deleted files? Is > there > > a > > > > fix > > > > > > for > > > > > > > > > this? > > > > > > > > > > > Or > > > > > > > > > > > > is > > > > > > > > > > > > > > > there not likely anything to be done? What > > happens > > > > if > > > > > > the > > > > > > > > disk > > > > > > > > > > > fills > > > > > > > > > > > > > up > > > > > > > > > > > > > > > with file handles for phantom deleted files? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan > Creasy > > < > > > > > > > j...@box.com > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It isn't uncommon if a process has an open > file > > > > > handle > > > > > > > on a > > > > > > > > > > file > > > > > > > > > > > > that > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > deleted, the space is not freed until the > > handle > > > is > > > > > > > closed. > > > > > > > > > So > > > > > > > > > > > > > > restarting > > > > > > > > > > > > > > > > the process that has a handle on the file > would > > > > cause > > > > > > the > > > > > > > > > space > > > > > > > > > > > to > > > > > > > > > > > > be > > > > > > > > > > > > > > > freed > > > > > > > > > > > > > > > > also. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You can troubleshoot that with lsof. > > > > > > > > > > > > > > > > Normally, I see 2-4 log segments deleted > every > > > hour > > > > > in > > > > > > my > > > > > > > > > > > brokers. > > > > > > > > > > > > I > > > > > > > > > > > > > > see > > > > > > > > > > > > > > > > log lines like this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2013-05-23 04:40:06,857 INFO > > > [kafka-logcleaner-0] > > > > > > > > > > > log.LogManager - > > > > > > > > > > > > > > > > Deleting log segment > 00000000035434043157.kafka > > > > from > > > > > > > > > <redacted > > > > > > > > > > > > topic> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, it seems like if I restart the > > broker, a > > > > > > massive > > > > > > > > > > amount > > > > > > > > > > > of > > > > > > > > > > > > > > disk > > > > > > > > > > > > > > > > space is freed (without a corresponding flood > > of > > > > > these > > > > > > > log > > > > > > > > > > > segment > > > > > > > > > > > > > > > deleted > > > > > > > > > > > > > > > > messages). Is there an explanation for this? > > > Does > > > > > > kafka > > > > > > > > > keep > > > > > > > > > > > > > > reference > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > file segments around, and reuse them as > needed > > or > > > > > > > > something? > > > > > > > > > > And > > > > > > > > > > > > > then > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > restart, the references to those free segment > > > files > > > > > are > > > > > > > > > > dropped? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This is with 0.7.2. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Mike Heffner <m...@librato.com> > > > Librato, Inc. > > > > > > -- Mike Heffner <m...@librato.com> Librato, Inc.