Re: large amount of disk space freed on restart

Mike Heffner Mon, 09 Sep 2013 14:14:11 -0700

Jason,

Thanks, we'll give that option a try.


Mike


On Mon, Sep 9, 2013 at 2:41 PM, Jason Rosenberg <j...@squareup.com> wrote:

> Sorry, I forgot to close the loop on my experiences with this....
>
> We solved this issue by setting the 'allocsize' mount option, in the fstab.
>  E.g. allocsize=16M.
>
> Jason
>
>
> On Mon, Sep 9, 2013 at 1:47 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > This could certainly be done. It would be slightly involved since you
> would
> > need to implement some kind of file-handle cache for both indexes and log
> > files and re-open them on demand when a read occurs. If someone wants to
> > take a shot at this, the first step would be to get a design wiki in
> place
> > on how this would work. This is potentially nice to reduce the open file
> > count (though open files are pretty cheap).
> >
> > That said this issue only impacts xfs and it seems to be fixed by that
> > setting jonathan found. I wonder if you could give that a try and see if
> it
> > works for you too? I feel dealing with closed files does add a lot of
> > complexity so if there is an easy fix I would probably rather avoid it.
> >
> > -Jay
> >
> >
> > On Mon, Sep 9, 2013 at 8:17 AM, Mike Heffner <m...@librato.com> wrote:
> >
> > > We are also seeing this problem with version 0.7.1 and logs on an XFS
> > > partition. At our largest scale we can frequently free over 600GB of
> disk
> > > usage by simply restarting Kafka. We've examined the `lsof` output from
> > the
> > > Kafka process and while it does appear to have FDs open for all log
> files
> > > on disk (even those long past read from), it does not have any files
> open
> > > that were previously deleted from disk.
> > >
> > > Du output agrees that the seen size is much larger than apparent-size
> > size:
> > >
> > > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h
> > > 00000000242666442619.kafka
> > > 1.1G 00000000242666442619.kafka
> > > root@kafkanode-1:/raid0/kafka-logs/measures-0# du -h --apparent-size
> > > 00000000242666442619.kafka
> > > 513M 00000000242666442619.kafka
> > >
> > >
> > > Our log size/retention policy is:
> > >
> > > log.file.size=536870912
> > > log.retention.hours=96
> > >
> > > We tried dropping the caches from the Stack Overflow suggestion (sync;
> > echo
> > > 3 > /proc/sys/vm/drop_caches) but that didn't seem to clear up the
> extra
> > > space. Haven't had the chance to try remounting with the allocsize
> > option.
> > >
> > > In summary, it would be great if Kafka would close FD's to log files
> that
> > > hadn't been read from for some period of time if it addresses this
> issue.
> > >
> > > Cheers,
> > >
> > > Mike
> > >
> > > On Fri, Jul 26, 2013 at 5:03 PM, Jay Kreps <jay.kr...@gmail.com>
> wrote:
> > >
> > > > Cool, good to know.
> > > >
> > > >
> > > > On Fri, Jul 26, 2013 at 2:00 PM, Jason Rosenberg <j...@squareup.com>
> > > wrote:
> > > >
> > > > > Jay,
> > > > >
> > > > > My only experience so far with this is using XFS.  It appears the
> XFS
> > > > > behavior is evolving, and in fact, we see somewhat different
> behavior
> > > > from
> > > > > 2 of our CentOS kernel versions in use.  I've been trying to ask
> > > > questions
> > > > > about all this on the XFS.org mailing list, but so far, having not
> > much
> > > > > luck understanding the xfs versioning correlated to CentOS
> versions.
> > > > >
> > > > > Anyway, yes, I think it would definitely be worth trying the
> solution
> > > you
> > > > > suggest, which would be to close the file on rotation, and re-open
> > > > > read-only.  Or to close files after a few hours of not being
> > accessed.
> > > > If
> > > > > a patch for one of these approaches can be cobbled together, I'd
> love
> > > to
> > > > > test it out on our staging environment.  I'd be willing to
> experiment
> > > > with
> > > > > such a patch myself, although I'm not 100% of all the places to
> look
> > > (but
> > > > > might dive in).
> > > > >
> > > > > Xfs appears to the option of using dynamic, speculative
> > preallocation,
> > > in
> > > > > which case it progressively doubles the amount of space reserved
> for
> > a
> > > > > file, as the file grows.  It does do this for all open files.  If
> the
> > > > file
> > > > > is closed, it will then release the preallocated space not in use.
> > >  It's
> > > > > not clear whether this releasing of space happens immediately on
> > close,
> > > > and
> > > > > whether re-opening the file read-only immediately, will keep it
> from
> > > > > releasing space (still trying to gather more info on that).
> > > > >
> > > > > I haven't looked too much at the index files, but those too appear
> to
> > > > have
> > > > > this behavior (e.g. preallocated size is always on the order of
> > double
> > > > the
> > > > > actual size, until the app is restarted).
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Fri, Jul 26, 2013 at 12:46 PM, Jay Kreps <jay.kr...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Interesting.
> > > > > >
> > > > > > Yes, Kafka keeps all log files open indefinitely. There is no
> > > inherent
> > > > > > reason this needs to be the case, though, it would be possible to
> > LRU
> > > > out
> > > > > > old file descriptors and close them if they are not accessed for
> a
> > > few
> > > > > > hours and then reopen on the first access. We just haven't
> > > implemented
> > > > > > anything like that.
> > > > > >
> > > > > > It would be good to understand this a little better. Does xfs
> > > > > pre-allocate
> > > > > > space for all open files? Perhaps just closing the file on log
> role
> > > and
> > > > > > opening it read-only would solve the issue? Is this at all
> related
> > to
> > > > the
> > > > > > use of sparse files for the indexes (i.e.
> > > > > RandomAccessFile.setLength(10MB)
> > > > > > when we create the index)? Does this effect other filesystems or
> > just
> > > > > xfs?
> > > > > >
> > > > > > -Jay
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 26, 2013 at 12:42 AM, Jason Rosenberg <
> > j...@squareup.com>
> > > > > > wrote:
> > > > > >
> > > > > > > It looks like xfs will reclaim the preallocated space for a
> file,
> > > > after
> > > > > > it
> > > > > > > is closed.
> > > > > > >
> > > > > > > Does kafka close a file after it has reached it's max size and
> > > > started
> > > > > > > writing to the next log file in sequence?  Or does it keep all
> > open
> > > > > until
> > > > > > > they are deleted, or the server quits (that's what it seems
> > like).
> > > > > > >
> > > > > > > I could imagine that it might need to keep log files open, in
> > order
> > > > to
> > > > > > > allow consumers access to them.  But does it keep them open
> > > > > indefinitely,
> > > > > > > after there is no longer any data to be written to them, and no
> > > > > consumers
> > > > > > > are currently attempting to read from them?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 16, 2013 at 4:32 PM, Jay Kreps <
> jay.kr...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Interesting. Yes it will respect whatever setting it is given
> > for
> > > > new
> > > > > > > > segments created from that point on.
> > > > > > > >
> > > > > > > > -Jay
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg <
> > > > j...@squareup.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Ok,
> > > > > > > > >
> > > > > > > > > An update on this.  It seems we are using XFS, which is
> > > available
> > > > > in
> > > > > > > > newer
> > > > > > > > > versions of Centos.  It definitely does pre-allocate space
> > as a
> > > > > file
> > > > > > > > grows,
> > > > > > > > > see:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file
> > > > > > > > >
> > > > > > > > > Apparently it's not hard-allocated space, and would be
> > released
> > > > > under
> > > > > > > > > resource pressure....seems we may need to update how we
> > monitor
> > > > > disk
> > > > > > > > space
> > > > > > > > > usage, etc....
> > > > > > > > >
> > > > > > > > > But, it seems that the default log file size of 1.1Gb,
> causes
> > > it
> > > > to
> > > > > > > jump
> > > > > > > > to
> > > > > > > > > preallocate an extra Gb.  So, in theory, if  I set a
> > strategic
> > > > log
> > > > > > file
> > > > > > > > > size to be just under the threshold that forces it to
> > > > exponentially
> > > > > > > > double
> > > > > > > > > the size from 1Gb to 2Gb, I should be able to mostly solve
> > this
> > > > > > issue.
> > > > > > > > >  E.g. use 950Mb instead of 1.1Gb max log file size.
> > > > > > > > >
> > > > > > > > > If I change the max log file size for a broker, and restart
> > it,
> > > > > will
> > > > > > it
> > > > > > > > > respect the new size going forward?
> > > > > > > > >
> > > > > > > > > Jason
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Jul 14, 2013 at 9:44 AM, Jay Kreps <
> > > jay.kr...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Jason,
> > > > > > > > > >
> > > > > > > > > > As Jun says, we haven't seen that issue and no one else
> has
> > > > > > reported
> > > > > > > > that
> > > > > > > > > > but it sounds like a bug of some kind.
> > > > > > > > > >
> > > > > > > > > > In 0.7 we don't do any preallocation of anything. The
> only
> > > time
> > > > > > files
> > > > > > > > > > shrink is during recovery--we re-checksum all messages
> that
> > > may
> > > > > not
> > > > > > > > have
> > > > > > > > > > been flushed and if any invalid messages are found we
> > > truncate
> > > > > them
> > > > > > > > off.
> > > > > > > > > > This would only occur after an unclean shutdown (kill -9)
> > and
> > > > you
> > > > > > > would
> > > > > > > > > get
> > > > > > > > > > a bunch of errors in log warning you this was happening.
> > > > > > > > > >
> > > > > > > > > > -Jay
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg <
> > > > > j...@squareup.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > An update on this.  It appears that the phenomenon I'm
> > > seeing
> > > > > is
> > > > > > > that
> > > > > > > > > > disk
> > > > > > > > > > > space is freed on restart, but it's not due files
> getting
> > > > > deleted
> > > > > > > on
> > > > > > > > > > > restart, but instead files are getting truncated on
> > > restart.
> > > > >  It
> > > > > > > > > appears
> > > > > > > > > > > that log files get pre-allocated to a larger size than
> is
> > > > used
> > > > > > > right
> > > > > > > > > > away.
> > > > > > > > > > >  Upon restart, they get truncated to the size of the
> file
> > > > that
> > > > > > > > actually
> > > > > > > > > > > contains data.  Does this make sense?
> > > > > > > > > > >
> > > > > > > > > > > Before restart, I see a large number of log files size
> > > 2.1Gb.
> > > > > >  Upon
> > > > > > > > > > > restart, the disk space reclaimed drops to almost half
> > > that,
> > > > on
> > > > > > > > > average.
> > > > > > > > > > >
> > > > > > > > > > > Thoughts?
> > > > > > > > > > >
> > > > > > > > > > > Jason
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, May 23, 2013 at 8:55 PM, Jun Rao <
> > jun...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I haven't seen this issue before. We do have ~1K
> topics
> > > in
> > > > > one
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > Kafka
> > > > > > > > > > > > clusters at LinkedIn.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <
> > > > > > > > j...@squareup.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Yeah, that's what it looks like to me (looking at
> the
> > > > > code).
> > > > > > >  So,
> > > > > > > > > I'm
> > > > > > > > > > > > > guessing it's some os level caching, resource
> > > recycling.
> > > > > >  Have
> > > > > > > > you
> > > > > > > > > > ever
> > > > > > > > > > > > > heard of this happening?  One thing that might be
> > > > different
> > > > > > in
> > > > > > > my
> > > > > > > > > > usage
> > > > > > > > > > > > > from the norm is a relatively large number of
> topics
> > > > (e.g.
> > > > > > ~2K
> > > > > > > > > > topics).
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jason
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, May 23, 2013 at 7:14 AM, Jun Rao <
> > > > jun...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Jason,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kafka closes the handler of all delete files.
> > > > Otherwise,
> > > > > > the
> > > > > > > > > broker
> > > > > > > > > > > > will
> > > > > > > > > > > > > > run out of file handler quickly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, May 22, 2013 at 10:17 PM, Jason
> Rosenberg <
> > > > > > > > > > j...@squareup.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So, does this indicate kafka (or the jvm
> itself)
> > is
> > > > not
> > > > > > > > > > > aggressively
> > > > > > > > > > > > > > > closing file handles of deleted files?  Is
> there
> > a
> > > > fix
> > > > > > for
> > > > > > > > > this?
> > > > > > > > > > >  Or
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > there not likely anything to be done?  What
> > happens
> > > > if
> > > > > > the
> > > > > > > > disk
> > > > > > > > > > > fills
> > > > > > > > > > > > > up
> > > > > > > > > > > > > > > with file handles for phantom deleted files?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jason
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan
> Creasy
> > <
> > > > > > > j...@box.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It isn't uncommon if a process has an open
> file
> > > > > handle
> > > > > > > on a
> > > > > > > > > > file
> > > > > > > > > > > > that
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > deleted, the space is not freed until the
> > handle
> > > is
> > > > > > > closed.
> > > > > > > > > So
> > > > > > > > > > > > > > restarting
> > > > > > > > > > > > > > > > the process that has a handle on the file
> would
> > > > cause
> > > > > > the
> > > > > > > > > space
> > > > > > > > > > > to
> > > > > > > > > > > > be
> > > > > > > > > > > > > > > freed
> > > > > > > > > > > > > > > > also.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You can troubleshoot that with lsof.
> > > > > > > > > > > > > > > > Normally, I see 2-4 log segments deleted
> every
> > > hour
> > > > > in
> > > > > > my
> > > > > > > > > > > brokers.
> > > > > > > > > > > >  I
> > > > > > > > > > > > > > see
> > > > > > > > > > > > > > > > log lines like this:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2013-05-23 04:40:06,857  INFO
> > > [kafka-logcleaner-0]
> > > > > > > > > > > log.LogManager -
> > > > > > > > > > > > > > > > Deleting log segment
> 00000000035434043157.kafka
> > > > from
> > > > > > > > > <redacted
> > > > > > > > > > > > topic>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > However, it seems like if I restart the
> > broker, a
> > > > > > massive
> > > > > > > > > > amount
> > > > > > > > > > > of
> > > > > > > > > > > > > > disk
> > > > > > > > > > > > > > > > space is freed (without a corresponding flood
> > of
> > > > > these
> > > > > > > log
> > > > > > > > > > > segment
> > > > > > > > > > > > > > > deleted
> > > > > > > > > > > > > > > > messages).  Is there an explanation for this?
> > >  Does
> > > > > > kafka
> > > > > > > > > keep
> > > > > > > > > > > > > > reference
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > file segments around, and reuse them as
> needed
> > or
> > > > > > > > something?
> > > > > > > > > >  And
> > > > > > > > > > > > > then
> > > > > > > > > > > > > > or
> > > > > > > > > > > > > > > > restart, the references to those free segment
> > > files
> > > > > are
> > > > > > > > > > dropped?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This is with 0.7.2.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Jason
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > >   Mike Heffner <m...@librato.com>
> > >   Librato, Inc.
> > >
> >
>



-- 

  Mike Heffner <m...@librato.com>
  Librato, Inc.

Re: large amount of disk space freed on restart

Reply via email to