On Mon, Apr 27, 2020 at 12:20 PM <tu...@posteo.de> wrote:
>
> The kernel keeps track of what has already been fstrimmed and avoids
> retrimming the same data.
> This knowledge is lost when the PC is power-cycled or rebooted.
>

I imagine this is filesystem-specific.  When I looked at the ext4
source I didn't think to check whether those flags are stored on disk
or only kept in some kind of in-memory cache.

I wouldn't be surprised if this data is also lost by simply unmounting
the filesystem.
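
For reference, the userspace side of all this is just the FITRIM ioctl
that fstrim(8) issues, and the kernel dispatches it to the filesystem,
which is why any "already trimmed" bookkeeping is the filesystem's
problem rather than the block layer's.  A minimal sketch of the call
(error handling kept short):

/* Roughly what fstrim(8) does: ask the filesystem mounted at the given
 * path to discard its unused blocks via the FITRIM ioctl. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */

int main(int argc, char **argv)
{
    struct fstrim_range range;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    memset(&range, 0, sizeof(range));
    range.len = (unsigned long long)-1;   /* whole filesystem */

    if (ioctl(fd, FITRIM, &range) < 0) {
        perror("FITRIM");
        return 1;
    }

    /* On return the kernel has filled in range.len with the number of
     * bytes it reported as trimmed - the figure being discussed here. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);
    return 0;
}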

> I think the reported amount of fstrimmed data does not reflect the
> amount of data which gets physically fstrimmed by the SSD
> controller.

Yup.  Though I'd take issue with the term "physically fstrimmed" - I
don't think a concept like that really exists.  The only physical
operations are reading, writing, and erasing.  TRIM is really a
logical operation at its heart.

It wouldn't make sense for a TRIM to automatically trigger some kind
of erase operation all the time.  Suppose blocks 1-32 are in a single
erase group.  You send a TRIM command for block 1 only.  It makes no
sense to have the device read blocks 2-32, erase blocks 1-32, and then
write blocks 2-32 back.  That does erase block 1, but it costs a bunch
of I/O and only replicates the worst-case scenario of what would
happen if you overwrote block 1 in place without trimming it first.
You might argue that block 1 can now be written later without having
to do another erase, but that is only true if the drive can remember
that it was already erased - otherwise every write has to be preceded
by a read just to check whether the block is already empty.
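
To make that concrete, here is a toy FTL sketch - purely illustrative,
not how any particular controller works - where TRIM is nothing more
than invalidating a logical-to-physical mapping entry.  The "this page
is known to be empty" state lives in the mapping table, so writes never
need a read-before-write check, and an erase only happens once a whole
erase block is dead and costs no copy-out I/O:

/* Toy FTL model for the block 1-32 scenario above (illustrative only). */
#include <stdio.h>
#include <stdint.h>

#define PAGES_PER_ERASE_BLOCK 32
#define UNMAPPED ((uint32_t)-1)

struct ftl {
    uint32_t map[PAGES_PER_ERASE_BLOCK]; /* logical page -> physical page */
    int valid_count;                     /* live pages in this erase block */
};

static void trim(struct ftl *f, uint32_t lpage)
{
    if (f->map[lpage] != UNMAPPED) {
        f->map[lpage] = UNMAPPED;  /* pure bookkeeping, no flash I/O */
        f->valid_count--;
    }
}

static void maybe_gc(struct ftl *f)
{
    /* Only when the whole erase block is invalid is the erase "free":
     * nothing has to be read out and rewritten first. */
    if (f->valid_count == 0)
        printf("erase block reclaimed with zero copy-out I/O\n");
}

int main(void)
{
    struct ftl f = { .valid_count = PAGES_PER_ERASE_BLOCK };
    uint32_t i;

    for (i = 0; i < PAGES_PER_ERASE_BLOCK; i++)
        f.map[i] = i;   /* pretend every logical page is mapped */

    trim(&f, 0);        /* "TRIM block 1 only": a map update, nothing else */
    maybe_gc(&f);       /* no-op, 31 pages are still live */

    for (i = 1; i < PAGES_PER_ERASE_BLOCK; i++)
        trim(&f, i);    /* once everything is trimmed... */
    maybe_gc(&f);       /* ...the erase needs no extra reads or writes */
    return 0;
}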

Maybe that is how they actually do it, but it seems like it would make
more sense for a drive to look for opportunities to erase entire
blocks that don't require a read first, or to keep track of these
unused areas as extents that are cheaper to manage.  The drive already
has to do a lot of mapping for the sake of wear leveling.

Really, though, a better solution than any of this is for the
filesystem to be more SSD-aware and only perform writes on entire
erase regions at a time.  If the drive is told to write blocks 1-32
then it can just blindly erase their contents first, because it knows
everything there is getting overwritten anyway.  Likewise a filesystem
could do its own wear leveling, especially on something like flash
where the cost of fragmentation is low.  I'm not sure how well either
zfs or ext4 performs in these roles.  Obviously a solution like f2fs,
designed for flash storage, is going to excel here.
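
As a rough sketch of what "write entire erase regions at a time" could
look like from the filesystem side (the 2 MiB segment size, the
append-only layout, and the image.bin target are all assumptions for
illustration, loosely in the spirit of a log-structured design like
f2fs):

/* Sketch: buffer writes into erase-block-sized segments and only ever
 * issue full, aligned segment writes.  Assumed geometry, no error
 * handling - not a real filesystem. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SEGMENT_SIZE (2u * 1024 * 1024)   /* assumed erase-block size */

struct segment_writer {
    unsigned char *buf;   /* one segment's worth of pending data */
    size_t used;
    long next_offset;     /* always a multiple of SEGMENT_SIZE */
    FILE *dev;
};

/* Pad and write out one full, aligned segment.  The device never sees
 * a partial overwrite, so it can erase-then-program blindly. */
static void flush_segment(struct segment_writer *w)
{
    if (w->used == 0)
        return;
    memset(w->buf + w->used, 0, SEGMENT_SIZE - w->used);
    fseek(w->dev, w->next_offset, SEEK_SET);
    fwrite(w->buf, 1, SEGMENT_SIZE, w->dev);
    w->next_offset += SEGMENT_SIZE;
    w->used = 0;
}

/* Append-only: data always lands on fresh, aligned space, which is
 * also most of what filesystem-side wear leveling needs. */
static void append(struct segment_writer *w, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len > 0) {
        size_t n = SEGMENT_SIZE - w->used;
        if (n > len)
            n = len;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        len -= n;
        if (w->used == SEGMENT_SIZE)
            flush_segment(w);
    }
}

int main(void)
{
    struct segment_writer w = { malloc(SEGMENT_SIZE), 0, 0,
                                fopen("image.bin", "wb") };
    char record[4096] = "some filesystem data";
    int i;

    for (i = 0; i < 1024; i++)
        append(&w, record, sizeof(record));   /* exactly two segments */
    flush_segment(&w);                        /* pad out any leftover tail */
    fclose(w.dev);
    free(w.buf);
    return 0;
}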

-- 
Rich
