On Thu, Dec 05, 2024 at 10:26:18PM +0700, Max Nikulin wrote:
> On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > 1. SSDs have some self-healing capabilities (discarding defective sectors) which are performed when the drive is not mounted. Therefore, enter the BIOS of the computer and let it run for about an hour. Then restart the computer.
> 
> I am curious how the OS notifies a drive that it is mounted. I believe that drivers read and write blocks, and maybe switch power-save states, but mounting is performed at a higher level.

It doesn't: leaving the filesystem unmounted ensures that the drive is idle, but in general that's not necessary; just leaving the system alone will usually have the same effect unless you've got a runaway process chewing on the disk. The SSD will do maintenance tasks when it's idle, or under pressure (when it has no other choice because there are no writable blocks available).

The relevant limitation is that an SSD physical block can only be written once and then needs to be erased before another write. Changing a logical block therefore means writing its new contents to a different physical location. Physical blocks vary in size but are many times the size of a 512-byte logically addressable block. Many logical blocks (or versions of the same logical block) can be written to a physical block, and logical blocks that change leave unused older copies behind on the physical block. The entire physical block must be erased before anything can be written to those now-unused portions, which means copying all of the in-use logical blocks to a different physical block before erasing the original.

The drive tries to keep a pool of writable physical locations, and has a cache of faster storage to hold data pending a write to slower storage. Ideally your writes fit in the cache, and the drive can do the erasing and moving while it is idle. If you write more data than can be cached, and there are no erased blocks to move data into, the drive has to relocate existing logical blocks to free up and erase physical blocks before it can write the new data. This has a significant performance impact if you're trying to write faster than the drive can relocate and erase.
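
If it helps to picture the bookkeeping, here's a rough Python sketch of the idea. Everything in it is invented for illustration (the class, the tiny block sizes, the victim-selection policy); real firmware adds caching, wear levelling, and over-provisioned spare space, but the write-elsewhere-then-garbage-collect pattern is the point:

    # Toy flash translation layer: logical blocks are remapped on every
    # write, stale copies accumulate, and a whole physical (erase) block
    # must be reclaimed before its pages can be reused.
    PAGES_PER_ERASE_BLOCK = 4          # real erase blocks hold far more

    class ToySSD:
        def __init__(self, num_erase_blocks):
            # Each erase block is a list of (logical_block, data) pages.
            # Pages can only be appended until the whole block is erased.
            self.blocks = [[] for _ in range(num_erase_blocks)]
            self.mapping = {}          # logical block -> (erase block, page)

        def _writable_block(self):
            for i, blk in enumerate(self.blocks):
                if len(blk) < PAGES_PER_ERASE_BLOCK:
                    return i
            return None                # every block is full (live or stale)

        def _live_pages(self, i):
            # Pages in erase block i that the mapping still points at.
            return [p for p, (lb, _) in enumerate(self.blocks[i])
                    if self.mapping.get(lb) == (i, p)]

        def _garbage_collect(self):
            # Slow path: pick the erase block with the fewest live pages,
            # relocate those pages, and erase it.  Assumes at least one
            # stale page exists somewhere (real drives reserve spare
            # space to guarantee this).
            victim = min(range(len(self.blocks)),
                         key=lambda i: len(self._live_pages(i)))
            survivors = [self.blocks[victim][p]
                         for p in self._live_pages(victim)]
            self.blocks[victim] = []             # "erase" the block
            for lb, data in survivors:           # relocate live data
                self.write(lb, data)
            return victim

        def write(self, logical, data):
            target = self._writable_block()
            if target is None:
                target = self._garbage_collect()  # erase before writing
            self.blocks[target].append((logical, data))
            self.mapping[logical] = (target, len(self.blocks[target]) - 1)

    # Rewriting one logical block eventually fills both erase blocks with
    # stale copies; the ninth write has to garbage-collect first.
    ssd = ToySSD(num_erase_blocks=2)
    for version in range(9):
        ssd.write(0, "v%d" % version)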

If you use fstrim/discard, you notify the drive that certain logical blocks are not in use, allowing the physical block to be erased without the need to read and relocate those logical blocks.

A physical block is marked unavailable/bad if it fails, and won't be used again. This happens transparently if a block fails on erase/write: the data is simply written to a different physical block and the logical block is unaffected. The drive will also notice if a physical block is readable but degrading, and will stop using it once any logical blocks it contains have been written to a new physical block. If a block totally fails on read (much less common), it can't be relocated, and the OS will get very non-transparent errors every time it tries to read that logical block.

If you have a logical block that can't be read, discarding it can effectively make it disappear (i.e., the drive marks it as unused without needing to read it, and it becomes available again after it is next written). You may be able to revitalize a drive with a troublesome bad block (e.g., one underneath a directory entry, so it can't be deleted and trimmed) by trimming the entire drive and restoring from backup. This is rare; across hundreds of TB of SSD I've encountered that situation exactly once. In that case it may be just a fluke that won't recur, but I probably wouldn't use that drive again; then again, if it really was a fluke, the drive is likely fine and retiring it is overly paranoid. The right course of action depends on budget and risk tolerance.
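
In the same toy sketch (same invented names, same caveats), trim/discard boils down to the drive forgetting the logical-to-physical mapping, which is why nothing needs to be read or relocated, and why an unreadable logical block stops mattering once nothing maps to it. In practice the notification comes from the fstrim utility or a filesystem mounted with the discard option, not from anything like this, but as a model:

    class ToySSDWithTrim(ToySSD):
        # Discard/TRIM in the toy model: drop the mapping entry.  The old
        # page becomes stale garbage, to be erased during normal garbage
        # collection, without the drive ever having to read it, which is
        # why trimming can sidestep a logical block that no longer reads.
        def discard(self, logical):
            self.mapping.pop(logical, None)

    ssd = ToySSDWithTrim(num_erase_blocks=2)
    ssd.write(0, "data")
    ssd.discard(0)   # e.g. the filesystem freed the block and fstrim ran
    # The page holding "data" is now unreferenced; the next garbage
    # collection can erase its block without relocating anything.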
