On Saturday 16 November 2024 14:36:02 GMT Rich Freeman wrote:
> On Sat, Nov 16, 2024 at 6:02 AM Michael <confabul...@kintzios.com> wrote:
> > I assume (simplistically) with DM-SMRs the
> > discard-garbage collection is managed wholly by the onboard drive
> > controller, while with HM-SMRs the OS will signal the drive to start
> > trimming when the workload is low in order to distribute the timing
> > overheads to the system's idle time.
> 
> I'll admit I haven't looked into the details as I have no need for SMR
> and there aren't any good FOSS solutions for using it that I'm aware
> of (just a few that might be slightly less terrible).  However, this
> doesn't seem correct for two reasons:
> 
> First, I'm not sure why HM-SMR would even need a discard function.
> The discard command is used to tell a drive that a block is safe to
> overwrite without preservation.  A host-managed SMR drive doesn't need
> to know what data is disposable and what data is not.  It simply needs
> to write data when the host instructs it to do so, destroying other
> data in the process, and it is the host's job to not destroy anything
> it cares about.  If a write requires a prior read, then the host needs
> to first do the read, then adjust the written data appropriately so
> that nothing is lost.

As I understand it from reading various articles, the constraint of having to 
rewrite a whole band sequentially when a random block within it changes applies 
to both HM-SMR and the more common DM-SMR.  What differs with HM-SMR is that 
the host is meant to take over the management of random writes and submit them 
to the drive as sequential whole-band streams, to be committed without a 
read-modify-write penalty.  I suppose having the host read the whole band from 
the drive, modify it in memory and then submit it back to be written as a 
whole band will be faster than letting the drive manage this operation 
internally and filling up its internal cache.  This will not absolve the drive 
firmware from having to manage its own trim operations and the impact metadata 
changes could have on the drive, but some timing optimisation is perhaps 
achievable.  I can't recall where I read this bit - perhaps some presentation 
on XFS or ext4 - not sure.
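
For what it's worth, on Linux an HM-SMR drive shows up as a zoned block 
device, and the host can query each zone's write pointer and issue its writes 
sequentially from there.  A minimal sketch in C, assuming a hypothetical 
/dev/sdX and skipping most error handling (BLKREPORTZONE and struct blk_zone 
come from linux/blkzoned.h):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(void)
{
    /* Hypothetical HM-SMR device node; a real writer would open it
     * O_RDWR (and usually O_DIRECT). */
    int fd = open("/dev/sdX", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Ask the drive to report its first zone, starting at sector 0. */
    struct blk_zone_report *rep =
        calloc(1, sizeof(*rep) + sizeof(struct blk_zone));
    if (rep == NULL) return 1;
    rep->sector = 0;
    rep->nr_zones = 1;
    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    /* All values are reported in 512-byte sectors. */
    struct blk_zone *z = &rep->zones[0];
    printf("zone: start %llu len %llu write pointer %llu\n",
           (unsigned long long)z->start,
           (unsigned long long)z->len,
           (unsigned long long)z->wp);

    /* A sequential-write-required zone only accepts writes at the
     * write pointer, so the host would position its next write at
     * z->wp * 512 (e.g. with pwrite()) and advance from there. */

    free(rep);
    close(fd);
    return 0;
}

A filesystem or application would normally go through the kernel's zoned 
block layer rather than raw ioctls, but the constraint is the same: append at 
the write pointer, or reset the whole zone.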


> Second, there is no reason that any drive of any kind (SMR or SSD)
> NEEDS to do discard/trim operations when the drive is idle, because
> discard/trim is entirely a metadata operation that doesn't require IO
> with the drive data itself.  Now, some drives might CHOOSE to
> implement it that way, but they don't have to.  On an SSD, a discard
> command does not mean that the drive needs to erase or move any data
> at all.  It just means that if there is a subsequent erase that would
> impact that block, it isn't necessary to first read the data and
> re-write it afterwards.  A discard could be implemented entirely in
> non-volatile metadata storage, such as with a bitmap.  For a DM-SMR
> using flash for this purpose would make a lot of sense - you wouldn't
> need much of it.

I don't know if SMR drives use flash to record their STL (shingled translation 
layer) status and the allocation of data between their persistent cache and 
the shingled storage space.  I would think yes, or at least they ought to.  
Without metadata kept on separate media, for such a small random write to take 
place atomically a whole SMR band has to be read, modified in memory, written 
to a new temporary location and finally written back over the original SMR 
band.
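
The read-modify-write cycle itself is simple enough to sketch.  A rough 
illustration in C, with a hypothetical 256 MiB band size and a plain 
pread()/pwrite() pair standing in for what the firmware does internally (a 
real STL would stage the band and update its metadata so the overwrite is 
atomic):

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical band size; real drives use bands in the tens to
 * hundreds of MiB. */
#define BAND_SIZE (256UL * 1024 * 1024)

/* Overwrite len bytes at absolute offset off, where band_start is the
 * beginning of the SMR band containing that offset. */
int band_rmw(int fd, off_t band_start, off_t off, const void *data, size_t len)
{
    char *band = malloc(BAND_SIZE);
    if (band == NULL)
        return -1;

    /* 1. Read the whole band into memory. */
    if (pread(fd, band, BAND_SIZE, band_start) != (ssize_t)BAND_SIZE)
        goto fail;

    /* 2. Modify only the affected bytes. */
    memcpy(band + (off - band_start), data, len);

    /* 3. Write the band back sequentially.  Firmware would first write
     *    it to a temporary location and update the STL metadata, so a
     *    power loss mid-write cannot lose the old band. */
    if (pwrite(fd, band, BAND_SIZE, band_start) != (ssize_t)BAND_SIZE)
        goto fail;

    free(band);
    return 0;
fail:
    free(band);
    return -1;
}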


> This is probably why you have endless arguing online about whether
> discard/trim is helpful for SSDs. It completely depends on how the
> drive implements the command.  The drives I've owned can discard
> blocks without any impact on IO, but I've heard some have a terrible
> impact on IO.  It is just like how you can complete the same sorting
> operation in seconds or hours depending on how dumb your sorting
> algorithm is.

I have an old OCZ SSD which would increase IO latency to many seconds, if not 
minutes, whenever trim was running, to the point where users started 
complaining I had 'broken' their PC.  As if I would do such a thing.  LOL!  
Never mind trying to write anything, even reading from the disk would take 
ages, and the drive IO LED on the case stayed on for many minutes while TRIM 
was running.  I reformatted with btrfs, overprovisioned enough spare capacity 
and reduced the cron job for trim to once a month, which stopped the 
complaints.  I don't know if the firmware was trying to deterministically 
write zeros to the trimmed blocks, instead of just de-allocating them.
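
For reference, a periodic trim job like that boils down to fstrim(8), which 
issues the FITRIM ioctl on the mounted filesystem and leaves it to the fs to 
send batched discards for its free space.  A minimal sketch in C, assuming a 
hypothetical /mnt/data mount point:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */

int main(void)
{
    /* Hypothetical mount point of the filesystem to trim. */
    int fd = open("/mnt/data", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct fstrim_range range = {
        .start  = 0,
        .len    = UINT64_MAX,  /* consider the whole filesystem */
        .minlen = 0,           /* no minimum extent size */
    };
    if (ioctl(fd, FITRIM, &range) < 0) { perror("FITRIM"); return 1; }

    /* The kernel updates range.len to the number of bytes trimmed. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);
    close(fd);
    return 0;
}

The filesystem decides which free ranges to discard, so the latency hit lands 
in one scheduled window rather than on every delete.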


> In any case, to really take advantage of SMR the OS needs to
> understand exactly how to structure its writes so as to not take a
> penalty, and that requires information about the implementation of the
> storage that isn't visible in a DM-SMR.  

Yes, I think all the OS can do is seek to minimise random writes, and from 
what I read an SMR-friendlier fs will try to do exactly that.


> Sure, some designs will do
> better on SMR even without this information, but I don't think they'll
> ever be all that efficient.  It is no different from putting f2fs on a
> flash drive with a brain-dead discard implementation - even if the OS
> does all its discards in nice consolidated contiguous operations it
> doesn't mean that the drive will handle that in milliseconds instead
> of just blocking all IO for an hour - sure, the drive COULD do the
> operation quickly, but that doesn't mean that the firmware designers
> didn't just ignore the simplest use case in favor of just optimizing
> around the assumption that NTFS is the only filesystem in the world.

For all I know, consumer-grade USB sticks with their cheap controller chips 
use no wear levelling at all:

https://support-en.sandisk.com/app/answers/detailweb/a_id/25185/~/learn-about-trim-support-for-usb-flash%2C-memory-cards%2C-and-ssd-on-windows-and

Consequently, all a flash-friendly fs can do is perhaps compress data and 
batch its writes to minimise write operations.

I can see how an SMR drive would be a suitable solution for storing media 
files, but I don't know if the proximity of the shingled tracks would cause 
magnetic leakage between them and eventually start losing data.  I haven't 
seen any reliability reports on this technology.
