On 09/17/2017 04:17 AM, Kai Krakow wrote:
> Am Sun, 17 Sep 2017 01:20:45 -0500
> schrieb Dan Douglas <orm...@gmail.com>:
> 
>> On 09/16/2017 07:06 AM, Kai Krakow wrote:
>>> Am Fri, 15 Sep 2017 14:28:49 -0400
>>> schrieb Rich Freeman <ri...@gentoo.org>:
>>>   
>>>> On Fri, Sep 8, 2017 at 3:16 PM, Kai Krakow <hurikha...@gmail.com>
>>>> wrote:  
>>  [...]  
>>>>
>>>> True, but keep in mind that this applies in general in btrfs to any
>>>> kind of modification to a file.  If you modify 1MB in the middle
>>>> of a 10GB file on ext4 you end up with it taking up 10GB of space.  If
>>>> you do the same thing in btrfs you'll probably end up with the
>>>> file taking up 10.001GB.  Since btrfs doesn't overwrite files
>>>> in-place it will typically allocate a new extent for the
>>>> additional 1MB, and the original content at that position within
>>>> the file is still on disk in the original extent.  It works a bit
>>>> like a log-based filesystem in this regard (which is also
>>>> effectively copy on write).  
>>>
>>> Good point, this makes sense. I never thought about that.
>>>
>>> But I guess that btrfs doesn't use 10G sized extents? And I also
>>> guess, this is where autodefrag jumps in.  
>>
>> According to btrfs-filesystem(8), defragmentation breaks reflinks, in
>> all but a few old kernel versions where I guess they tried to fix the
>> problem and apparently failed.
> 
> It was splitting and splicing all the reflinks which is actually a tree
> walk with more and more extents coming into the equation, and ended up
> doing a lot of small IO and needing a lot of memory. I think you really
> cannot fix this when working with extents.

I figured by "break up" they meant it eliminates the reflink by making
a full copy... so the increased space they're talking about isn't really
double that of the original data in other words.
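
To put rough numbers on both effects (Rich's 1MB overwrite in a 10GB file,
and defrag breaking a reflink), here is a toy model. It is only a sketch:
the Extent and Ref types and every function name are made up for
illustration, it is not how btrfs lays out data (real extents are far
smaller than 10G), and it assumes an extent stays fully allocated as long
as any part of it is still referenced. Sizes use binary units.

```python
from dataclasses import dataclass
from itertools import count

MIB = 1024 ** 2
GIB = 1024 ** 3
_next_id = count(1)  # hypothetical extent id generator


@dataclass(frozen=True)
class Extent:
    ident: int
    size: int  # bytes allocated on disk


@dataclass(frozen=True)
class Ref:
    extent: Extent
    start: int   # offset inside the extent
    length: int  # bytes of the extent this file uses


def new_file(size):
    """A file backed by one fresh extent."""
    return [Ref(Extent(next(_next_id), size), 0, size)]


def logical_size(refs):
    return sum(r.length for r in refs)


def disk_usage(*files):
    """Each referenced extent counts once, and in full (assumption)."""
    return sum(e.size for e in {r.extent for f in files for r in f})


def cow_overwrite(refs, offset, length):
    """Replace [offset, offset+length) with a new extent; assumes the
    range lies inside the file. Old extents are trimmed, never rewritten."""
    new_ref = Ref(Extent(next(_next_id), length), 0, length)
    out, pos, placed, end = [], 0, False, offset + length
    for r in refs:
        r_end = pos + r.length
        if pos < offset:  # part before the write survives
            out.append(Ref(r.extent, r.start, min(r.length, offset - pos)))
        if not placed and r_end > offset:
            out.append(new_ref)
            placed = True
        if r_end > end:  # part after the write survives
            skip = max(0, end - pos)
            out.append(Ref(r.extent, r.start + skip, r.length - skip))
        pos = r_end
    return out


def defragment(refs):
    """Toy defrag: copy the data into one fresh, unshared extent."""
    return new_file(logical_size(refs))


base = new_file(10 * GIB)
clone = list(base)                          # reflink copy shares the extent
edited = cow_overwrite(base, 5 * GIB, MIB)  # Rich's 1MB-in-the-middle case
print(disk_usage(base, clone) / GIB)
print(disk_usage(edited) / GIB)
print(disk_usage(base, defragment(clone)) / GIB)
```

In this model the clone costs nothing until it is defragmented, at which
point the pair takes twice the space of the original data.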

> 
>> This really makes much of what btrfs
>> does altogether pointless if you ever defragment manually or have
>> autodefrag enabled. Deduplication is broken for the same reason.
> 
> It's much easier to fix this for deduplication: Just write your common
> denominator of an extent to a tmp file, then walk all the reflinks and
> share them with parts of this extent.
> 
> If you carefully select what to defragment, there should be no problem.
> A defrag tool could simply skip all the shared extents. A few fragments
> do not hurt performance at all, but what's important is spatial
> locality. A lot of small fragments may hurt performance a lot, so one
> could give the defragger a hint when to ignore the rule and still
> defragment the extent. Also, when your deduplication window is 1M you
> could probably safely defrag all extents smaller than 1M.
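
The selection rule suggested above could look roughly like this. It is a
minimal sketch under stated assumptions: ExtentInfo and defrag_candidates
are hypothetical names, not part of any btrfs tool, and the "shared" flag
is assumed to come from something like the FIEMAP extent flags that
filefrag -v reports.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass(frozen=True)
class ExtentInfo:  # hypothetical record, one per extent of a file
    offset: int    # logical offset in the file, bytes
    length: int    # extent length, bytes
    shared: bool   # True if another file or snapshot references it


def defrag_candidates(extents, dedupe_window=MIB):
    """Pick extents to defragment: every unshared extent, plus shared
    extents smaller than the dedupe window (the hint that overrides
    the skip-shared rule)."""
    return [e for e in extents
            if not e.shared or e.length < dedupe_window]


extents = [
    ExtentInfo(0, 4 * MIB, False),          # unshared: candidate
    ExtentInfo(4 * MIB, 8 * MIB, True),     # large shared: skipped
    ExtentInfo(12 * MIB, 64 * 1024, True),  # small shared: candidate
]
for e in defrag_candidates(extents):
    print(e.offset, e.length, e.shared)
```

Setting dedupe_window=0 turns the override off, so shared extents are
never touched.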

Yeah this sort of hurts with the way I deal with KVM image snapshots. I
have raw base images as backing files with lots of shared and null
data, so I run `fallocate --dig-holes' followed by `duperemove
--dedupe-options=same' on the cow-enabled base images and hope that
btrfs defrag can clean up the resulting fragmented mess, but it's a slow
process and doesn't seem to do a good job.
