On Mon, Apr 27, 2020 at 12:20 PM <tu...@posteo.de> wrote:
>
> The kernel is keep track of all, which already has been fstrimmed and
> avoids to retrimm the same data.
> This knowledge gets lost, when the PC is powercycled or rebooted.
>

I imagine this is filesystem-specific. When I checked the ext4 source I
didn't think to actually check whether those flags are stored on disk
or only in some kind of cache. I wouldn't be surprised if this data is
also lost by simply unmounting the filesystem.

> I think, the value of the amount of fstrimmed data does not reflect
> the amount of data, which gets physically fstrimmed by the SSD
> controller.

Yup. Though I'd take issue with the term "physically fstrimmed" - I
don't think that a concept like this really exists. The only physical
operations are reading, writing, and erasing. TRIM is really a logical
operation at its heart.

It wouldn't make sense for a TRIM to automatically trigger some kind of
erase operation all the time. Suppose blocks 1-32 are in a single erase
group. You send a TRIM command for block 1 only. It makes no sense to
have the device read blocks 2-32, erase blocks 1-32, and then write
blocks 2-32 back. That does erase block 1, but it costs a bunch of IO
and it only replicates the worst-case scenario of what would happen if
you overwrote block 1 in place without trimming it first.

You might argue that now block 1 can be written later without having to
do another erase, but this is only true if the drive can remember that
it was already erased - otherwise all writes have to be preceded by
reads just to see if the block is already empty. Maybe that is how they
actually do it, but it seems like it would make more sense for a drive
to look for opportunities to erase entire blocks that don't require a
read first, or to keep these unused areas in some kind of extents that
are less expensive to track. The drive already has to do a lot of
mapping for the sake of wear leveling.

Really though, a better solution than any of this is for the filesystem
to be more SSD-aware and only perform writes on entire erase regions at
one time.
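To put rough numbers on the 32-block example, here is a minimal sketch
in Python. It is a toy cost model, not a description of how any real
SSD controller works: the names (IoCost, eager_trim, overwrite_in_place,
full_group_write) are made up for illustration, and it assumes that
each block read, block write, and group erase counts as one operation.

```python
from dataclasses import dataclass

GROUP_SIZE = 32  # blocks per erase group, taken from the example (assumed)


@dataclass(frozen=True)
class IoCost:
    reads: int    # blocks read back before the erase
    erases: int   # erase operations on the whole group
    writes: int   # blocks written after the erase
    payload: int  # how many of those writes carry new data

    @property
    def overhead(self) -> int:
        # IO that only preserves old data: every read, plus every
        # write that is not new payload.
        return self.reads + self.writes - self.payload


def _check(count: int, group: int) -> None:
    if not 0 < count <= group:
        raise ValueError("block count must be between 1 and the group size")


def eager_trim(trimmed: int, group: int = GROUP_SIZE) -> IoCost:
    """Hypothetical drive that erases immediately on every TRIM."""
    _check(trimmed, group)
    survivors = group - trimmed
    return IoCost(reads=survivors, erases=1, writes=survivors, payload=0)


def overwrite_in_place(changed: int, group: int = GROUP_SIZE) -> IoCost:
    """Overwrite some blocks of a full group without trimming first."""
    _check(changed, group)
    survivors = group - changed
    return IoCost(reads=survivors, erases=1, writes=group, payload=changed)


def full_group_write(group: int = GROUP_SIZE) -> IoCost:
    """Filesystem rewrites the entire erase group in one go."""
    return IoCost(reads=0, erases=1, writes=group, payload=group)


if __name__ == "__main__":
    for name, cost in [
        ("eager TRIM of block 1", eager_trim(1)),
        ("in-place overwrite of block 1", overwrite_in_place(1)),
        ("write of the whole group", full_group_write()),
    ]:
        print(f"{name}: {cost}, overhead={cost.overhead}")
```

Under these assumptions, trimming block 1 with an immediate erase costs
31 reads and 31 writes of data that did not change, which is the same
overhead as overwriting block 1 in place, while a write covering the
whole group has none.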
If the drive is told to write blocks 1-32 then it can just blindly erase
their contents first because it knows everything there is getting
overwritten anyway.

Likewise a filesystem could do its own wear-leveling, especially on
something like flash where the cost of fragmentation is not high. I'm
not sure how well either zfs or ext4 performs in these roles. Obviously
a solution like f2fs designed for flash storage is going to excel here.

--
Rich