Interesting. Yes, the broker will respect whatever setting it is given for
new segments created from that point on.

-Jay
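
For anyone adjusting their disk monitoring after reading the XFS notes quoted
below: a minimal sketch, not anything Kafka ships, of comparing a file's
apparent size with the space the filesystem has actually allocated to it.
It assumes Linux, where st_blocks is counted in 512-byte units. The function
names and the threshold parameter are made up for illustration.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # st_size is the logical length of the file (what `ls -l` shows).
    # st_blocks counts 512-byte blocks actually allocated (what `du` shows),
    # which can exceed st_size when the filesystem preallocates past EOF.
    return st.st_size, st.st_blocks * 512


def scan_log_dir(log_dir):
    """Walk log_dir and return a list of (path, apparent, allocated)."""
    rows = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                apparent, allocated = size_report(path)
            except OSError:
                # The file may have been deleted between listing and stat.
                continue
            rows.append((path, apparent, allocated))
    return rows


def overallocated(rows, threshold_bytes):
    """Keep rows whose allocation exceeds the apparent size by the threshold."""
    return [r for r in rows if r[2] - r[1] >= threshold_bytes]
```

Calling overallocated(scan_log_dir("/path/to/kafka-logs"), 512 * 1024 * 1024)
would list segments carrying at least 512MB of extra allocation; the path is a
placeholder. Whether that extra space should count against a disk alert is a
judgment call, since XFS is described below as releasing it under pressure.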


On Tue, Jul 16, 2013 at 11:23 AM, Jason Rosenberg <j...@squareup.com> wrote:

> Ok,
>
> An update on this.  It seems we are using XFS, which is available in newer
> versions of CentOS.  It definitely does pre-allocate space as a file grows,
> see:
>
> http://serverfault.com/questions/406069/why-are-my-xfs-filesystems-suddenly-consuming-more-space-and-full-of-sparse-file
>
> Apparently it's not hard-allocated space, and would be released under
> resource pressure.  It seems we may need to update how we monitor disk
> space usage.
>
> But it seems that the default log file size of 1.1GB causes XFS to jump to
> preallocating an extra GB.  So, in theory, if I set the max log file size
> just under the threshold at which the preallocation doubles from 1GB to
> 2GB, I should be able to mostly solve this issue.  E.g. use 950MB instead
> of 1.1GB as the max log file size.
>
> If I change the max log file size for a broker, and restart it, will it
> respect the new size going forward?
>
> Jason
>
>
> On Sun, Jul 14, 2013 at 9:44 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey Jason,
> >
> > As Jun says, we haven't seen that issue and no one else has reported that
> > but it sounds like a bug of some kind.
> >
> > In 0.7 we don't do any preallocation of anything. The only time files
> > shrink is during recovery--we re-checksum all messages that may not have
> > been flushed and if any invalid messages are found we truncate them off.
> > This would only occur after an unclean shutdown (kill -9) and you would get
> > a bunch of errors in the log warning you this was happening.
> >
> > -Jay
> >
> >
> >
> > On Sun, Jul 14, 2013 at 1:36 AM, Jason Rosenberg <j...@squareup.com> wrote:
> >
> > > An update on this.  It appears that the phenomenon I'm seeing is that disk
> > > space is freed on restart, but it's not due to files getting deleted on
> > > restart; instead, files are getting truncated on restart.  It appears
> > > that log files get pre-allocated to a larger size than is used right away.
> > > Upon restart, they get truncated to the size of the data they actually
> > > contain.  Does this make sense?
> > >
> > > Before restart, I see a large number of log files of size 2.1GB.  Upon
> > > restart, the disk space reclaimed drops to almost half that, on average.
> > >
> > > Thoughts?
> > >
> > > Jason
> > >
> > >
> > > On Thu, May 23, 2013 at 8:55 PM, Jun Rao <jun...@gmail.com> wrote:
> > >
> > > > I haven't seen this issue before. We do have ~1K topics in one of the
> > > > Kafka clusters at LinkedIn.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Thu, May 23, 2013 at 11:05 AM, Jason Rosenberg <j...@squareup.com>
> > > > wrote:
> > > >
> > > > > Yeah, that's what it looks like to me (looking at the code).  So, I'm
> > > > > guessing it's some OS-level caching or resource recycling.  Have you ever
> > > > > heard of this happening?  One thing that might be different in my usage
> > > > > from the norm is a relatively large number of topics (e.g. ~2K topics).
> > > > >
> > > > > Jason
> > > > >
> > > > >
> > > > > On Thu, May 23, 2013 at 7:14 AM, Jun Rao <jun...@gmail.com> wrote:
> > > > >
> > > > > > Jason,
> > > > > >
> > > > > > Kafka closes the file handles of all deleted files. Otherwise, the
> > > > > > broker would quickly run out of file handles.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Wed, May 22, 2013 at 10:17 PM, Jason Rosenberg <j...@squareup.com>
> > > > > > wrote:
> > > > > >
> > > > > > > So, does this indicate Kafka (or the JVM itself) is not aggressively
> > > > > > > closing file handles of deleted files?  Is there a fix for this?  Or is
> > > > > > > there not likely anything to be done?  What happens if the disk fills up
> > > > > > > with phantom deleted files whose handles are still open?
> > > > > > >
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 22, 2013 at 9:50 PM, Jonathan Creasy <j...@box.com> wrote:
> > > > > > >
> > > > > > > > It isn't uncommon: if a process has an open file handle on a file that
> > > > > > > > is deleted, the space is not freed until the handle is closed. So
> > > > > > > > restarting the process that has a handle on the file would cause the
> > > > > > > > space to be freed also.
> > > > > > > >
> > > > > > > > You can troubleshoot that with lsof.
> > > > > > > >
> > > > > > > > Normally, I see 2-4 log segments deleted every hour in my brokers.  I
> > > > > > > > see log lines like this:
> > > > > > > >
> > > > > > > > 2013-05-23 04:40:06,857  INFO [kafka-logcleaner-0] log.LogManager - Deleting log segment 00000000035434043157.kafka from <redacted topic>
> > > > > > > >
> > > > > > > > However, it seems like if I restart the broker, a massive amount of
> > > > > > > > disk space is freed (without a corresponding flood of these "log
> > > > > > > > segment deleted" messages).  Is there an explanation for this?  Does
> > > > > > > > Kafka keep references to file segments around, and reuse them as
> > > > > > > > needed or something?  And then on restart, the references to those
> > > > > > > > free segment files are dropped?
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > > This is with 0.7.2.
> > > > > > > >
> > > > > > > > Jason
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
