On 06/10/2016 07:07 PM, Henk Slager wrote:
On Thu, Jun 9, 2016 at 5:41 PM, Duncan <1i5t5.dun...@cox.net> wrote:
Hans van Kranenburg posted on Thu, 09 Jun 2016 01:10:46 +0200 as
excerpted:

The next question is what files these extents belong to. To find out, I
need to open up the extent items I get back and follow a backreference
to an inode object. Might do that tomorrow, fun.

To be honest, I suspect /var/log and/or the file storage of mailman to
be the cause of the fragmentation, since there's logging from postfix,
mailman and nginx going on all day long at a slow but steady pace.
While using btrfs for a number of use cases at work now, we normally
don't use it for the root filesystem. And the cases where it's used as
root filesystem don't do much logging or mail.

FWIW, that's one reason I have a dedicated partition (and filesystem) for
logs, here.  (The other reason is that should something go runaway log-
spewing, I get a warning much sooner, when my log filesystem fills up,
rather than much later, with much worse implications, when the main
filesystem fills up!)

Well, there it is:
https://syrinx.knorrie.org/~knorrie/btrfs/keep/2016-06-11-extents_ichiban_77621886976.txt

Playing around a bit with the search ioctl:
https://github.com/knorrie/btrfs-heatmap/blob/master/chunk-contents.py
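It boils down to calling the TREE_SEARCH ioctl on the extent tree for the
byte range of a chunk and decoding what comes back. A stripped-down sketch
of that idea (the struct layouts should match the kernel's btrfs ioctl.h,
but double-check them before trusting the numbers; needs root):

#!/usr/bin/python3
# Sketch: list EXTENT_ITEMs in a byte range by calling the TREE_SEARCH
# ioctl on the extent tree. Struct layouts are assumed to match
# struct btrfs_ioctl_search_args / _header in btrfs ioctl.h.
import fcntl
import os
import struct
import sys

BTRFS_IOC_TREE_SEARCH = 0xd0009411  # _IOWR(0x94, 17, 4096 byte args)
EXTENT_TREE_OBJECTID = 2
EXTENT_ITEM_KEY = 168

# struct btrfs_ioctl_search_key: 104 bytes, rest of the 4096 is result buffer
SEARCH_KEY = struct.Struct('=QQQQQQQLLLL32x')
SEARCH_HEADER = struct.Struct('=QQQLL')   # transid, objectid, offset, type, len
EXTENT_ITEM = struct.Struct('=QQQ')       # refs, generation, flags

def extent_items(fd, first_byte, last_byte):
    min_objectid = first_byte
    while min_objectid <= last_byte:
        buf = bytearray(4096)
        SEARCH_KEY.pack_into(buf, 0, EXTENT_TREE_OBJECTID,
                             min_objectid, last_byte,    # min/max objectid
                             0, 2**64 - 1,               # min/max offset
                             0, 2**64 - 1,               # min/max transid
                             EXTENT_ITEM_KEY, EXTENT_ITEM_KEY,
                             4096, 0)                    # nr_items, unused
        fcntl.ioctl(fd, BTRFS_IOC_TREE_SEARCH, buf)
        nr_items = struct.unpack_from('=L', buf, 64)[0]
        if nr_items == 0:
            return
        pos = SEARCH_KEY.size
        for _ in range(nr_items):
            transid, objectid, offset, type_, length = \
                SEARCH_HEADER.unpack_from(buf, pos)
            pos += SEARCH_HEADER.size
            if type_ == EXTENT_ITEM_KEY:
                refs, generation, flags = EXTENT_ITEM.unpack_from(buf, pos)
                # key objectid is the start byte, key offset is the length
                yield objectid, offset, refs, generation, flags
            pos += length
            min_objectid = objectid + 1

if __name__ == '__main__':
    fd = os.open(sys.argv[1], os.O_RDONLY)  # any path inside the filesystem
    # e.g. the start of the chunk from the dump above, plus 1 GiB
    for start, length, refs, gen, flags in \
            extent_items(fd, 77621886976, 77621886976 + 2**30):
        print(start, length, refs, gen, flags)

The file names in the dump come from following the data backref in each
extent item back into the subvolume tree to find the inode; I left that
part out here to keep it short.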

This is clearly primarily logging and mailman mbox files. All kinds of small extents, and a huge amount of fragmented free space in between.

And no, autodefrag is not in the mount options currently. Would that be
helpful in this case?

It should be helpful, yes.  Be aware that autodefrag works best with
smaller (sub-half-gig) files, however, and that it used to cause
performance issues with larger database and VM files, in particular.

I don't know why you relate file size and autodefrag. Maybe it's because
you say '... used to cause ...'.

Log files grow to a few tens of MBs and logrotate will copy the contents into gzipped files (defragging everything as a side effect) every once in a while, so the only concern is the current logfiles.

autodefrag detects random writes and then tries to defrag a certain
range. Its scope size is 256K as far as I can see from the code, and over
time you see VM images that are on a btrfs fs (CoW, hourly ro
snapshots) end up with a lot of 256K (or slightly smaller) extents
according to what filefrag reports. I once wanted to try changing
the 256K to 1M or even 4M, but I haven't gotten around to it.
A 32G VM image would consist of 131072 extents at 256K, 32768 extents
at 1M, or 8192 extents at 4M.
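Just to spell out that arithmetic:

# extents needed for a fully fragmented 32 GiB image at each extent size
image_size = 32 * 2**30
for extent_size in (256 * 2**10, 1 * 2**20, 4 * 2**20):
    print(extent_size // 2**10, 'KiB ->', image_size // extent_size, 'extents')
# 256 KiB -> 131072, 1024 KiB -> 32768, 4096 KiB -> 8192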

Aha.

There used to be a warning on the wiki about that, which was recently
removed, so apparently it's not the issue it once was.  But you might wish
to monitor any databases or VMs with gig-plus files to see whether it
becomes a performance issue once you turn on autodefrag.

For very active databases, I don't know what the effects are, with or
without autodefrag (either on SSD or HDD).
At least on HDD-only, so no persistent SSD caching and no autodefrag,
VMs will soon end up with unacceptable performance.

The other issue with autodefrag is that if it hasn't been on and things
are heavily fragmented, it can at first drive down performance as it
rewrites all these heavily fragmented files, until it catches up and is
mostly dealing only with the normal refragmentation load.

I assume you mean that one only gets a performance drop if you
actually do new writes to the fragmented files after turning autodefrag
on. It shouldn't start defragging by itself AFAIK.

As far as I understand, it only considers new writes, yes.

So I can manually defrag the mbox files (which get data appended slowly all the time) and turn on autodefrag, which will also take care of the log files. After the next logrotate, all old fragmented extents will be freed.
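For the existing files, roughly something like this (assuming the mboxes
live under /var/lib/mailman here; the 32M target extent size is a more or
less arbitrary pick):

#!/usr/bin/python3
# Defragment the existing mailman files one by one.
# /var/lib/mailman and the 32M target extent size are just my choices here.
import os
import subprocess

TOP = '/var/lib/mailman'

for dirpath, dirnames, filenames in os.walk(TOP):
    for name in filenames:
        path = os.path.join(dirpath, name)
        # rewrite the file into larger extents; without snapshots the old
        # fragmented extents can be freed as soon as the new copy is written
        subprocess.call(['btrfs', 'filesystem', 'defragment', '-t', '32M', path])

(btrfs filesystem defragment -r on the top directory would do the same
thing in one go; this way I can watch progress per file.)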

Of course the
best way around that is to run autodefrag from the first time you mount
the filesystem and start writing to it, so it never gets overly
fragmented in the first place.  For a currently in-use and highly
fragmented filesystem, you have two choices: either back up and do a fresh
mkfs.btrfs so you can start with a clean filesystem and autodefrag from
the beginning, or do a manual defrag.

However, be aware that if you have snapshots locking down the old extents
in their fragmented form, a manual defrag will copy the data to new
extents without releasing the old ones as they're locked in place by the
snapshots, thus using additional space.  Worse, if the filesystem is
already heavily fragmented and snapshots are locking most of those
fragments in place, defrag likely won't help a lot, because the free
space will be heavily fragmented as well.  So starting off with a clean
and new filesystem and using autodefrag from the beginning really is your
best bet.

No snapshots here.

If it is about a multi-TB fs, I think the most important thing is to have
enough unfragmented free space available, preferably at the beginning of
the device if it is a plain HDD. Maybe a balance -ddrange=1M..<20% of
device> can do that; I haven't tried.

I'm going to enable autodefrag now, and defrag the existing mbox files, and then do some balance to compact the used space.
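In command form that plan looks roughly like this (the usage=50 cutoff for
balance is just a guess at a reasonable value, nothing scientific):

#!/usr/bin/python3
# Sketch of the remaining steps; '/' is the filesystem in question and
# the usage=50 cutoff is an arbitrary but reasonable pick.
import subprocess

# enable autodefrag on the running system (and add it to fstab to keep it)
subprocess.check_call(['mount', '-o', 'remount,autodefrag', '/'])

# rewrite data block groups that are at most 50% used, so the data gets
# packed together again and the mostly-empty chunks are released
subprocess.check_call(['btrfs', 'balance', 'start', '-dusage=50', '/'])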

A question remains of course... Even when slowly appending data to e.g. a log file... what causes all the free space in between the newly written data extents...?! 300kB?! 4MB?!

78081548288 78081875967    327680 0.03% free space

78081875968 78081896447     20480 0.00% extent item
        extent refs 1 gen 155003 flags DATA
        extent data backref root 257 objectid 901223 names ['access.log.1']

78081896448 78081904639      8192 0.00% extent item
        extent refs 1 gen 155003 flags DATA
        extent data backref root 257 objectid 901223 names ['access.log.1']

78081904640 78082236415    331776 0.03% free space

78082236416 78082256895     20480 0.00% extent item
        extent refs 1 gen 155004 flags DATA
        extent data backref root 257 objectid 901223 names ['access.log.1']

78082256896 78082596863    339968 0.03% free space

78082596864 78082621439     24576 0.00% extent item
        extent refs 1 gen 155005 flags DATA
        extent data backref root 257 objectid 901223 names ['access.log.1']

78082621440 78087327743   4706304 0.44% free space

78087327744 78087335935      8192 0.00% extent item


To be continued...

--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com