On Thu, Dec 05, 2024 at 10:26:18PM +0700, Max Nikulin wrote:
> On 05/12/2024 16:19, Jörg-Volker Peetz wrote:
> > 1. SSDs have some self-healing capabilities (discarding defective
> > sectors) which are performed when the drive is not mounted.
> > Therefore, enter the BIOS of the computer and let it run for ca.
> > an hour. Then restart the computer.
>
> I am curious which way the OS notifies a drive that it is mounted. I
> believed that drivers read and write blocks, and maybe switch
> power-save states, but mount is performed at a higher level.
It doesn't: leaving the filesystem unmounted ensures that the drive is
idle, but in general that's not necessary--just leaving the system
alone will usually have the same result unless you've got a runaway
process chewing on the disk. The SSD will do maintenance tasks when
it's idle, or when it's under pressure (it has no other choice because
there are no writable blocks available).
The relevant limitation is that an SSD physical block can only be
written once and then needs to be erased before another write. Changing
a logical block means writing the logical block to a different physical
location. Physical blocks vary in size but are many times the size of a
512-byte logically-addressable block. Many logical blocks (or versions
of the same logical block) can be written to a physical block, and
logical blocks that change leave unused older copies on the physical
block. The entire physical block must be erased to write anything to
the now-unused portions. This means copying all of the in-use logical
blocks to a different physical block before erasing the original
physical block.

The drive will try to keep a pool of writable physical locations, and
has a cache of faster storage to hold data pending a write to slower
storage. Ideally your writes fit in cache, and the drive can do the
erasing and moving while the drive is idle. If you write more data than
can be cached, and there are no erased blocks to move data into, the
drive needs to relocate existing logical blocks to free up and erase
physical blocks before writing the new data. This has a significant
performance impact if you're trying to write faster than the drive can
relocate/erase.
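
To make that bookkeeping concrete, here is a deliberately tiny Python
model of the idea. It is not how any real drive's firmware works (the
class name, block counts, page sizes, and victim-selection policy are
all invented for illustration), but it shows why overwriting logical
blocks on a mostly full drive eventually forces it to copy live data
around and erase physical blocks before it can accept new writes:

  PAGES_PER_BLOCK = 4   # logical pages per physical erase block (tiny on purpose)
  NUM_BLOCKS = 8        # physical erase blocks on the "drive"

  class ToyFTL:
      def __init__(self):
          # Each physical block holds (lba, data) pages, appended in write order.
          self.blocks = [[] for _ in range(NUM_BLOCKS)]
          self.mapping = {}             # logical address -> (block, page) of live copy
          self.erase_counts = [0] * NUM_BLOCKS
          self.gc_copies = 0            # pages relocated by garbage collection

      def _writable_block(self):
          for i, blk in enumerate(self.blocks):
              if len(blk) < PAGES_PER_BLOCK:
                  return i
          return None                   # every physical block is full

      def write(self, lba, data):
          # Out-of-place write: a changed logical block goes to a fresh page.
          target = self._writable_block()
          if target is None:
              self._garbage_collect()   # must erase something before writing
              target = self._writable_block()
              if target is None:
                  raise RuntimeError("no free space even after GC (drive full)")
          self.blocks[target].append((lba, data))
          # Any previous copy is now stale garbage; only the mapping moves.
          self.mapping[lba] = (target, len(self.blocks[target]) - 1)

      def read(self, lba):
          blk, page = self.mapping[lba]
          return self.blocks[blk][page][1]

      def _live(self, i):
          # Pages in block i that the mapping still points at; the rest are stale.
          return [(lba, data) for p, (lba, data) in enumerate(self.blocks[i])
                  if self.mapping.get(lba) == (i, p)]

      def _garbage_collect(self):
          # Erase the block with the fewest live pages, relocating those first.
          victim = min(range(NUM_BLOCKS), key=lambda i: len(self._live(i)))
          survivors = self._live(victim)
          self.blocks[victim] = []      # "erase" the whole physical block
          self.erase_counts[victim] += 1
          for lba, data in survivors:   # extra internal writes = write amplification
              self.gc_copies += 1
              self.write(lba, data)

  if __name__ == "__main__":
      ftl = ToyFTL()
      # Keep roughly 3/4 of the drive occupied and overwrite it repeatedly.
      for version in range(20):
          for lba in range(24):
              ftl.write(lba, "v%d" % version)
      print("host page writes:       ", 20 * 24)
      print("pages copied by GC:     ", ftl.gc_copies)
      print("erases per phys. block: ", ftl.erase_counts)

The gc_copies counter is the toy's version of write amplification: the
closer the live data gets to the raw capacity (and the less trimmed or
never-written space there is), the more the drive has to copy before
each erase, which is the slowdown you see when you outrun its cache and
its pool of already-erased blocks.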
If you use fstrim/discard you'll notify the drive that certain logical
blocks are not in use, allowing the physical block to be erased without
the need to read and relocate those logical blocks.

A block is marked unavailable/bad if it fails, and won't be used again.
This happens transparently if a block fails on erase/write (the data is
simply written to a different physical block and the logical block is
unaffected). The drive will also notice if a physical block is readable
but degrading, and will stop using it once any logical blocks it
contains have been written to a new physical block. If a block fails
completely on read (much less common) it can't be relocated, and the OS
will get very non-transparent errors every time it tries to read that
logical block.

If you have a logical block that can't be read, discarding it can
effectively make it disappear (i.e., the drive marks it as unused
without needing to read it, and it becomes available again once it is
written to).
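
In the toy model above, a discard is just forgetting the mapping: the
old physical copy becomes stale garbage that garbage collection can
erase without ever reading it, which is also why discarding an
unreadable logical block can make it "go away". Again, this is only an
illustrative sketch building on the ToyFTL class from the earlier
example, not real firmware behaviour:

  class ToyFTLWithDiscard(ToyFTL):
      def discard(self, lba):
          # TRIM: the host declares this logical block unused. Only the
          # mapping is dropped; the old physical copy becomes stale
          # garbage, so GC can erase it without reading or relocating it.
          self.mapping.pop(lba, None)

      def read(self, lba):
          if lba not in self.mapping:
              raise IOError("LBA %d is unmapped (discarded or never written)" % lba)
          return super().read(lba)

  if __name__ == "__main__":
      d = ToyFTLWithDiscard()
      d.write(7, "data standing in for an unreadable block")
      d.discard(7)              # e.g. the filesystem trimmed it
      d.write(7, "fresh data")  # lands on a new physical page; old copy is garbage
      print(d.read(7))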
You may be able to revitalize a drive with a troublesome bad block
(e.g., one underneath a directory entry, so it can't be deleted and
trimmed) by trimming the entire drive and restoring from backup. This
is rare; in hundreds of TB of SSD I've encountered that situation
exactly once. In that case it may be just a fluke that won't recur, but
I probably wouldn't use that drive again (then again, if it was just a
fluke, the drive is likely fine and not using it is overly paranoid;
the right course of action depends on your budget and risk tolerance).
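
For what it's worth, on Linux the whole-drive version of that procedure
looks roughly like the sketch below. The device path, filesystem, and
restore step are placeholders for whatever your setup actually uses,
and blkdiscard destroys everything on the device, so be very sure of
the target before running anything like this:

  # Rough outline only: trim every logical block on the drive, then rebuild.
  # /dev/sdX and /mnt/restore are placeholders; double-check the device name,
  # since blkdiscard wipes the whole device.
  import os
  import subprocess

  DEV = "/dev/sdX"       # hypothetical: the SSD with the unreadable block
  MNT = "/mnt/restore"   # hypothetical mount point for the restore

  subprocess.run(["blkdiscard", DEV], check=True)  # drive marks every LBA unused
  subprocess.run(["mkfs.ext4", DEV], check=True)   # or whatever filesystem you use
  os.makedirs(MNT, exist_ok=True)
  subprocess.run(["mount", DEV, MNT], check=True)
  # ...restore from backup here with your usual tool...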