Re: large amount of disk space freed on restart

2013-09-09 Thread Jason Rosenberg
Sorry, I forgot to close the loop on my experiences with this. We solved this issue by setting the 'allocsize' mount option in the fstab, e.g. allocsize=16M. Jason On Mon, Sep 9, 2013 at 1:47 PM, Jay Kreps wrote: > This could certainly be done. It would be slightly involved since you wou

Re: large amount of disk space freed on restart

2013-09-09 Thread Mike Heffner
Jason, Thanks, we'll give that option a try. Mike On Mon, Sep 9, 2013 at 2:41 PM, Jason Rosenberg wrote: > Sorry, I forgot to close the loop on my experiences with this > > We solved this issue by setting the 'allocsize' mount option, in the fstab. > E.g. allocsize=16M. > > Jason > > > O

Re: large amount of disk space freed on restart

2013-09-09 Thread Jay Kreps
This could certainly be done. It would be slightly involved since you would need to implement some kind of file-handle cache for both indexes and log files and re-open them on demand when a read occurs. If someone wants to take a shot at this, the first step would be to get a design wiki in place o

Re: large amount of disk space freed on restart

2013-09-09 Thread Mike Heffner
We are also seeing this problem with version 0.7.1 and logs on an XFS partition. At our largest scale we can frequently free over 600GB of disk usage by simply restarting Kafka. We've examined the `lsof` output from the Kafka process and while it does appear to have FDs open for all log files on di

Re: large amount of disk space freed on restart

2013-07-26 Thread Jay Kreps
Cool, good to know. On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg wrote: > Jay, > > My only experience so far with this is using XFS. It appears the XFS > behavior is evolving, and in fact, we see somewhat different behavior from > 2 of our CentOS kernel versions in use. I've been trying t

Re: large amount of disk space freed on restart

2013-07-26 Thread Jason Rosenberg
Jay, My only experience so far with this is using XFS. It appears the XFS behavior is evolving, and in fact, we see somewhat different behavior from 2 of our CentOS kernel versions in use. I've been trying to ask questions about all this on the XFS.org mailing list, but so far, having not much l

Re: large amount of disk space freed on restart

2013-07-26 Thread Jay Kreps
Interesting. Yes, Kafka keeps all log files open indefinitely. There is no inherent reason this needs to be the case, though; it would be possible to evict old file descriptors LRU-style, closing them if they are not accessed for a few hours and reopening them on the first access. We just haven't implemente
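To make the file-descriptor eviction idea above concrete, here is a minimal sketch in Python. It is not Kafka code: the class name `FileHandleCache` and its methods are hypothetical, and it evicts by a fixed capacity rather than by the "not accessed for a few hours" idle timeout described in the message, which would need a timestamp per entry.

```python
from collections import OrderedDict


class FileHandleCache:
    """Keep at most `capacity` files open, closing the least recently used one.

    Hypothetical illustration only; eviction is by count, not by idle time.
    """

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._open = OrderedDict()  # path -> open file object, oldest first

    def get(self, path: str):
        """Return an open read handle for `path`, reopening it on demand."""
        handle = self._open.get(path)
        if handle is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return handle
        handle = open(path, "rb")
        self._open[path] = handle
        if len(self._open) > self.capacity:
            # Close the least recently used handle so the OS can release
            # whatever it ties to an open descriptor.
            _, evicted = self._open.popitem(last=False)
            evicted.close()
        return handle

    def close_all(self):
        """Close every cached handle."""
        while self._open:
            _, handle = self._open.popitem()
            handle.close()
```

A caller would go through `cache.get(path)` for every read instead of holding handles directly, so a segment that has been evicted is simply reopened on its next access.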

Re: large amount of disk space freed on restart

2013-07-26 Thread Jason Rosenberg
It looks like XFS will reclaim the preallocated space for a file after it is closed. Does Kafka close a file after it has reached its max size and started writing to the next log file in sequence? Or does it keep them all open until they are deleted or the server quits? (That's what it seems like.)

Re: large amount of disk space freed on restart

2013-07-16 Thread Jay Kreps
Interesting. Yes, it will respect whatever setting it is given for new segments created from that point on. -Jay On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg wrote: > Ok, > > An update on this. It seems we are using XFS, which is available in newer > versions of CentOS. It definitely doe

Re: large amount of disk space freed on restart

2013-07-16 Thread Jason Rosenberg
Ok, An update on this. It seems we are using XFS, which is available in newer versions of CentOS. It definitely does pre-allocate space as a file grows, see: http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file Apparently it's

Re: large amount of disk space freed on restart

2013-07-14 Thread Jay Kreps
Hey Jason, As Jun says, we haven't seen that issue and no one else has reported that but it sounds like a bug of some kind. In 0.7 we don't do any preallocation of anything. The only time files shrink is during recovery--we re-checksum all messages that may not have been flushed and if any invali

Re: large amount of disk space freed on restart

2013-07-14 Thread Jason Rosenberg
An update on this. It appears that the phenomenon I'm seeing is that disk space is freed on restart, but it's not due to files getting deleted on restart; instead, files are getting truncated on restart. It appears that log files get pre-allocated to a larger size than is used right away. Upon r
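One way to check whether files are holding more disk space than their contents need is to compare each file's apparent size with the blocks actually allocated to it. The following is a rough sketch in Python, assuming a POSIX system where `st_blocks` is reported in 512-byte units (as on Linux); the function names and the example directory are hypothetical.

```python
import os


def allocation_overhead(path: str) -> tuple[int, int]:
    """Return (apparent size in bytes, bytes allocated on disk) for one file."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, independent of the
    # filesystem's own block size. This holds on Linux.
    return st.st_size, st.st_blocks * 512


def report(log_dir: str) -> list[tuple[str, int, int]]:
    """List (path, apparent, allocated) for all files, largest excess first."""
    rows = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            size, allocated = allocation_overhead(path)
            rows.append((path, size, allocated))
    # Sort by how much more space is allocated than the file length implies.
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows
```

Running `report("/var/kafka-logs")` (a placeholder path) before and after a restart would show which segments had allocated far more than their length.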

Re: large amount of disk space freed on restart

2013-05-23 Thread Jun Rao
I haven't seen this issue before. We do have ~1K topics in one of the Kafka clusters at LinkedIn. Thanks, Jun On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg wrote: > Yeah, that's what it looks like to me (looking at the code). So, I'm > guessing it's some os level caching, resource recycl

Re: large amount of disk space freed on restart

2013-05-23 Thread Jason Rosenberg
Yeah, that's what it looks like to me (looking at the code). So, I'm guessing it's some OS-level caching or resource recycling. Have you ever heard of this happening? One thing that might be different in my usage from the norm is a relatively large number of topics (e.g. ~2K topics). Jason On T

Re: large amount of disk space freed on restart

2013-05-23 Thread Jun Rao
Jason, Kafka closes the handles of all deleted files. Otherwise, the broker would run out of file handles quickly. Thanks, Jun On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg wrote: > So, does this indicate kafka (or the jvm itself) is not aggressively > closing file handles of deleted files

Re: large amount of disk space freed on restart

2013-05-22 Thread Jason Rosenberg
No, nothing outside of Kafka would look at those files. I'm wondering if it's an OS-level thing too. On Wed, May 22, 2013 at 10:25 PM, Jonathan Creasy wrote: > Well, it sounds like files were deleted while Kafka still had them open. > Or something else opened them while Kafka deleted the

Re: large amount of disk space freed on restart

2013-05-22 Thread Jonathan Creasy
Well, it sounds like files were deleted while Kafka still had them open. Or something else opened them while Kafka deleted them. I haven't noticed this on our systems but we haven't looked for it either. Is anything outside of Kafka deleting or reading those files? On May 23, 2013 1:17 AM, "Jason

Re: large amount of disk space freed on restart

2013-05-22 Thread Jason Rosenberg
So, does this indicate Kafka (or the JVM itself) is not aggressively closing file handles of deleted files? Is there a fix for this? Or is there not likely anything to be done? What happens if the disk fills up with file handles for phantom deleted files? Jason On Wed, May 22, 2013 at 9:50 PM

Re: large amount of disk space freed on restart

2013-05-22 Thread Jonathan Creasy
This isn't uncommon: if a process has an open file handle on a file that is deleted, the space is not freed until the handle is closed. So restarting the process that has a handle on the file would also cause the space to be freed. You can troubleshoot that with lsof. Normally, I see 2-4 log segments
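The behaviour described above can be reproduced in a few lines. This is a small Python sketch, assuming a POSIX system (on Windows an open file generally cannot be deleted this way); the function name is hypothetical. It deletes a file while a descriptor is still open and shows that the data remains readable through that descriptor, which is why the space cannot be released until the descriptor is closed.

```python
import os
import tempfile


def unlinked_file_still_readable(payload: bytes = b"x" * 4096) -> tuple[int, bytes]:
    """Delete a file while it is open, then read it back via the open descriptor.

    Returns (link count after the delete, bytes read back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the directory entry is gone, but fd still refers to the file
        links = os.fstat(fd).st_nlink
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return links, data
    finally:
        # Only after this close can the filesystem free the file's blocks.
        os.close(fd)
```

On a live system, `lsof +L1` lists open files whose link count is below one, which is a quick way to spot deleted files that a process is still holding.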