Mouse <mo...@rodents-montreal.org> writes:

>> The semantics of fdiscard(2) are a bit messy, with TRIM and undefined
>> contents.
>
> The messy bits apply only when discarding space on a device, as I read
> the (9.1) manpage. I suspect that's because that's how devices work:
> dropped blocks return, essentially, whatever the implementation finds
> most convenient. Requiring anything else would mean it simply couldn't
> work on existing devices.

Agreed about devices, but fdiscard is a filesystem operation. I find it
bizarre that a fs op whose basic semantics are:

  turn this region of a file into a hole, more or less as if it were
  not written to

should change to

  for this part of the file, either make it a hole, or tell the
  underlying device (whatever that means) that you are ok with those
  data blocks having UB when read

The essence of fdiscard, as I read it, is to drop those blocks from the
inode. What happens to those blocks after that is another matter.

>> That's surprising to me, as I see telling the hardware that blocks
>> are no longer needed seems separable from a file no longer
>> referencing those blocks.
>
> Conceptually, so do I. But, on a device with a filesystem on it, when
> discarding part of a file, the blocks are no longer used, so they might
> as well be discarded; when discarding blocks, they'd better no longer
> be part of a file. So tying them together makes at least some sense
> when discarding space in a filesystem file.

I agree that while separable, there can be a sequential relationship. It
is reasonable to tell a device that the OS no longer cares about block
contents when those blocks are placed on the free list, via fdiscard(2),
just as via (the last) unlink(2). It does not seem reasonable to tell a
device the OS no longer cares if the blocks are still allocated to a file.

I could see defining a fmakeub(2) call that tells the OS that it's ok to
return arbitrary bits for that range, instead of making them a hole, while
keeping the blocks in the file. But I can't see anyone wanting to use
that, except maybe a database that's essentially implementing a
filesystem within a file.

I would want to understand what kinds of discard operations are available
in relatively modern SSDs and flash drives (which I suspect differ), what
they actually do, and whether there is any point in calling them, before
completing a design for fdiscard(2).

I could also see a reasonable option of implementing fdiscard(2) as a FS-layer operation, and not making any TRIM-like calls.
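
To make the split concrete, here is a toy model in Python. It is only a sketch of the semantics I'm arguing for, not of any real implementation: ToyFS, its fields, and the trim flag are all made up for illustration, and a "block" is just one bytes object. The point is that fdiscard first unmaps blocks from the inode and frees them, and that telling the device is a separate, optional step taken only afterwards.

```python
# Toy model only: none of these names exist in any kernel.
HOLE = None  # marker for an unallocated (sparse) block in a file


class ToyFS:
    def __init__(self, nblocks):
        self.device = [b"\0"] * nblocks   # one payload per device block
        self.free = list(range(nblocks))  # free list of device block numbers
        self.trimmed = []                 # blocks the "device" was told to drop

    def write(self, inode, index, data):
        """Store data at file block `index`, allocating a block if needed."""
        while len(inode) <= index:
            inode.append(HOLE)
        if inode[index] is HOLE:
            inode[index] = self.free.pop(0)
        self.device[inode[index]] = data

    def read(self, inode, index):
        """Holes read back as zeros, as if never written."""
        if index >= len(inode) or inode[index] is HOLE:
            return b"\0"
        return self.device[inode[index]]

    def fdiscard(self, inode, start, count, trim=False):
        """FS-layer discard: unmap blocks from the inode and free them.

        Notifying the device (trim=True) is optional and only happens
        once the block no longer belongs to the file.
        """
        for index in range(start, min(start + count, len(inode))):
            blk = inode[index]
            if blk is HOLE:
                continue
            inode[index] = HOLE
            self.free.append(blk)
            if trim:
                self.trimmed.append(blk)
```

With trim left at False this is the pure FS-layer variant: the range reads back as a hole, the blocks are on the free list, and the device is never told anything. Whatever stale bits remain in the freed device blocks are invisible through the file.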