Sorry, I forgot to close the loop on my experiences with this. We solved the issue by setting the 'allocsize' mount option in fstab, e.g. allocsize=16M.
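For reference, a hypothetical fstab entry with that option might look like the line below (the device, mount point, and the other options are placeholders for whatever you already use; only allocsize=16m is the setting in question). The filesystem has to be remounted for the option to take effect:

    /dev/sdb1  /raid0  xfs  defaults,noatime,allocsize=16m  0  0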
Jason

On Mon, Sep 9, 2013 at 1:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

This could certainly be done. It would be slightly involved, since you would need to implement some kind of file-handle cache for both indexes and log files and re-open them on demand when a read occurs. If someone wants to take a shot at this, the first step would be to get a design wiki in place on how this would work. This is potentially nice for reducing the open file count (though open files are pretty cheap).

That said, this issue only impacts XFS and it seems to be fixed by the setting Jonathan found. I wonder if you could give that a try and see if it works for you too? I feel that dealing with closed files does add a lot of complexity, so if there is an easy fix I would probably rather avoid it.

-Jay

On Mon, Sep 9, 2013 at 8:17 AM, Mike Heffner <m...@librato.com> wrote:

We are also seeing this problem with version 0.7.1 and logs on an XFS partition. At our largest scale we can frequently free over 600GB of disk usage simply by restarting Kafka. We've examined the `lsof` output from the Kafka process, and while it does appear to have FDs open for all log files on disk (even those long past being read from), it does not have any files open that were previously deleted from disk.

du output confirms that the on-disk size is much larger than the apparent size:

root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h 00000000242666442619.kafka
1.1G    00000000242666442619.kafka
root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h --apparent-size 00000000242666442619.kafka
513M    00000000242666442619.kafka

Our log size/retention policy is:

log.file.size=536870912
log.retention.hours=96

We tried dropping the caches per the Stack Overflow suggestion (sync; echo 3 > /proc/sys/vm/drop_caches), but that didn't seem to clear up the extra space. We haven't had the chance to try remounting with the allocsize option yet.

In summary, it would be great if Kafka would close FDs to log files that haven't been read from for some period of time, if that addresses this issue.

Cheers,

Mike

On Fri, Jul 26, 2013 at 5:03 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Cool, good to know.

On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg <j...@squareup.com> wrote:

Jay,

My only experience so far with this is using XFS. It appears the XFS behavior is evolving, and in fact we see somewhat different behavior from two of the CentOS kernel versions we use. I've been trying to ask questions about all this on the xfs.org mailing list, but so far I haven't had much luck correlating the XFS versioning to CentOS versions.

Anyway, yes, I think it would definitely be worth trying the solution you suggest, which would be to close the file on rotation and re-open it read-only, or to close files after a few hours of not being accessed. If a patch for one of these approaches can be cobbled together, I'd love to test it out in our staging environment. I'd be willing to experiment with such a patch myself, although I'm not 100% sure of all the places to look (but I might dive in).

XFS appears to have the option of using dynamic, speculative preallocation, in which case it progressively doubles the amount of space reserved for a file as the file grows. It does do this for all open files. If the file is closed, it will then release the preallocated space not in use. It's not clear whether this releasing of space happens immediately on close, or whether immediately re-opening the file read-only will keep it from releasing the space (I'm still trying to gather more info on that).

I haven't looked too much at the index files, but those too appear to have this behavior (e.g. the preallocated size is always on the order of double the actual size, until the app is restarted).

Jason
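To make the close-idle-handles-and-reopen-on-demand idea discussed above a bit more concrete, here is a rough sketch in Java. It is purely illustrative: the class and method names are invented, and none of this is existing Kafka code. It keeps read-only channels in an LRU map, closes the least recently used one when the limit is exceeded (which is what would let XFS release any speculative preallocation for that file), and re-opens a channel on the next read:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache of read-only segment file handles: the least recently
// used channel is closed (freeing its fd, and with it any space the filesystem
// has speculatively preallocated for that file) and re-opened on the next read.
public class SegmentHandleCache {
    private final LinkedHashMap<String, FileChannel> open;

    public SegmentHandleCache(final int maxOpen) {
        // accessOrder = true makes iteration order track recency of access (LRU)
        this.open = new LinkedHashMap<String, FileChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, FileChannel> eldest) {
                if (size() <= maxOpen)
                    return false;
                try {
                    eldest.getValue().close(); // drop the least recently used handle
                } catch (IOException e) {
                    // best effort: evict the entry even if close() fails
                }
                return true;
            }
        };
    }

    // Return a read-only channel for the given segment, re-opening it if it was
    // evicted (or never opened) since the last access.
    public synchronized FileChannel channelFor(String path) throws IOException {
        FileChannel ch = open.get(path);
        if (ch == null || !ch.isOpen()) {
            ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ);
            open.put(path, ch);
        }
        return ch;
    }

    // Explicitly drop a handle, e.g. when the segment is deleted by retention.
    public synchronized void evict(String path) throws IOException {
        FileChannel ch = open.remove(path);
        if (ch != null)
            ch.close();
    }
}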
On Fri, Jul 26, 2013 at 12:46 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Interesting.

Yes, Kafka keeps all log files open indefinitely. There is no inherent reason this needs to be the case, though; it would be possible to LRU out old file descriptors, closing them if they are not accessed for a few hours and reopening them on the first access. We just haven't implemented anything like that.

It would be good to understand this a little better. Does XFS pre-allocate space for all open files? Perhaps just closing the file on log roll and opening it read-only would solve the issue? Is this at all related to the use of sparse files for the indexes (i.e. RandomAccessFile.setLength(10MB) when we create the index)? Does this affect other filesystems or just XFS?

-Jay

On Fri, Jul 26, 2013 at 12:42 AM, Jason Rosenberg <j...@squareup.com> wrote:

It looks like XFS will reclaim the preallocated space for a file after it is closed.

Does Kafka close a file after it has reached its max size and started writing to the next log file in the sequence? Or does it keep them all open until they are deleted or the server quits (that's what it seems like)?

I could imagine that it might need to keep log files open in order to allow consumers access to them. But does it keep them open indefinitely, after there is no longer any data to be written to them and no consumers are currently attempting to read from them?

Jason

On Tue, Jul 16, 2013 at 4:32 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

Interesting. Yes, it will respect whatever setting it is given for new segments created from that point on.

-Jay

On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg <j...@squareup.com> wrote:

Ok,

An update on this. It seems we are using XFS, which is available in newer versions of CentOS. It definitely does pre-allocate space as a file grows, see:

http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file

Apparently it's not hard-allocated space and would be released under resource pressure... it seems we may need to update how we monitor disk space usage, etc.

But it seems that the default log file size of 1.1Gb causes it to jump to preallocating an extra Gb. So, in theory, if I set a strategic log file size just under the threshold that forces it to exponentially double the size from 1Gb to 2Gb, I should be able to mostly solve this issue. E.g. use 950Mb instead of 1.1Gb max log file size.

If I change the max log file size for a broker and restart it, will it respect the new size going forward?

Jason
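Concretely, if that theory holds (and assuming the same log.file.size property shown in Mike's config above), the broker config change would look something like this; the exact point at which XFS doubles its speculative preallocation is an assumption here:

# 950 * 1024 * 1024 = 996147200 bytes, intended to stay just under the size
# at which XFS doubles its speculative preallocation again
log.file.size=996147200

Per Jay's reply above, a restarted broker applies the new size only to segments created from that point on.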
On Sun, Jul 14, 2013 at 9:44 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

Hey Jason,

As Jun says, we haven't seen that issue and no one else has reported it, but it sounds like a bug of some kind.

In 0.7 we don't do any preallocation of anything. The only time files shrink is during recovery: we re-checksum all messages that may not have been flushed, and if any invalid messages are found we truncate them off. This would only occur after an unclean shutdown (kill -9), and you would get a bunch of errors in the log warning you this was happening.

-Jay

On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg <j...@squareup.com> wrote:

An update on this. It appears that the phenomenon I'm seeing is that disk space is freed on restart, but it's not due to files getting deleted on restart; instead, files are getting truncated on restart. It appears that log files get pre-allocated to a larger size than is used right away. Upon restart, they get truncated to the size of the file that actually contains data. Does this make sense?

Before restart, I see a large number of log files of size 2.1Gb. Upon restart, as the space is reclaimed, that drops to almost half on average.

Thoughts?
Jason

On Thu, May 23, 2013 at 8:55 PM, Jun Rao <jun...@gmail.com> wrote:

I haven't seen this issue before. We do have ~1K topics in one of the Kafka clusters at LinkedIn.

Thanks,

Jun

On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <j...@squareup.com> wrote:

Yeah, that's what it looks like to me (looking at the code). So I'm guessing it's some OS-level caching or resource recycling. Have you ever heard of this happening? One thing that might be different in my usage from the norm is a relatively large number of topics (e.g. ~2K topics).

Jason

On Thu, May 23, 2013 at 7:14 AM, Jun Rao <jun...@gmail.com> wrote:

Jason,

Kafka closes the handles of all deleted files. Otherwise, the broker would run out of file handles quickly.

Thanks,

Jun

On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg <j...@squareup.com> wrote:

So, does this indicate that Kafka (or the JVM itself) is not aggressively closing file handles of deleted files? Is there a fix for this? Or is there not likely anything to be done? What happens if the disk fills up with file handles for phantom deleted files?

Jason

On Wed, May 22, 2013 at 9:50 PM, Jonathan Creasy <j...@box.com> wrote:

It isn't uncommon, if a process has an open file handle on a file that is deleted, for the space not to be freed until the handle is closed. So restarting the process that has a handle on the file would also cause the space to be freed. You can troubleshoot that with lsof.
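As an illustrative example of that kind of check (the PID is a placeholder), something like the following lists files a process still holds open after they have been unlinked; lsof marks them with "(deleted)" in its output, and lsof +L1 is another common way to list open files whose link count has dropped to zero:

lsof -p <kafka-pid> | grep '(deleted)'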
Normally, I see 2-4 log segments deleted every hour in my brokers. I see log lines like this:

2013-05-23 04:40:06,857 INFO [kafka-logcleaner-0] log.LogManager - Deleting log segment 00000000035434043157.kafka from <redacted topic>

However, it seems like if I restart the broker, a massive amount of disk space is freed (without a corresponding flood of these log segment deleted messages). Is there an explanation for this? Does Kafka keep references to file segments around and reuse them as needed or something? And then on restart, the references to those free segment files are dropped?

Thoughts?

This is with 0.7.2.

Jason

--
Mike Heffner <m...@librato.com>
Librato, Inc.