On 19/02/2024 18:06, cglogic wrote:
Hello Warner,

Thank you for detailed explanation. Looks like it has no sense for me to worry about disabling NCQ TRIM on that old PC.

However it would be great to have possibility to apply such hardware bug workarounds for affected controller/drive pairs only. Or, at least, have possibility to disable such quirks on per drive basis via sysctl. I didn't find such a possibility.

Yeah, we do not have a way to control ata_da (and scsi_da) quirks via configuration. It would be nice to have it, indeed.

Also, perhaps what ADA_Q_NCQ_TRIM_BROKEN could do is prefer a different delete method by default. That way, NCQ_DSM_TRIM option would not disappear completely and it would be possible to set with kern.cam.ada.N.delete_method sysctl.

On Monday, February 19th, 2024 at 5:18 PM, Warner Losh <i...@bsdimp.com> wrote:


On Mon, Feb 19, 2024, 4:29 AM cglogic <cglo...@protonmail.com <mailto:cglo...@protonmail.com>> wrote:

    On Monday, February 19th, 2024 at 1:03 PM, Andriy Gapon <a...@freebsd.org>
    wrote:

    > On 19/02/2024 12:47, cglogic wrote:
    >
    > > Hello Andriy,
    > >
    > > I use ZFS with autotrim enabled on Samsung 860 PRO connected to Intel
    AHCI SATA controller for 9 years without any issue.
    >
    >
    > I think that it was released in 2018?

    You are right, it's 9 years old computer, SSD was added later.

    >
    > > Can I disable this quirk locally or have I revert the patch to
    continue use NCQ TRIM with this SSD?
    >
    >
    > I don't think that there is a way to disable the quirk, so you'll have
    to revert.
    > Note that ATA TRIM is still enabled, it's only NCQ TRIM that got disabled.

    From my understanding ATA TRIM is non-queued TRIM as opposed to NCQ TRIM.
    With only ATA TRIM enabled, drive will have increased latency when large
    amount of files deleted.

    However I'm not sure how ZFS handles TRIMs, maybe it groups several TRIMs
    and then sends when drive is idle.
    Another question is how this affects TRIM enabled swap partition.

    In general, if I'm correct here, disabling NCQ TRIM should not be an issue
    for desktop usage, except for large git repository updates, etc.


In general, it won't matter a ton. Unless you are seeing stalls because of read/write I/O waiting behind trims that have to go to the drive one at a time, the effects are in the noise. If you are seeing that, though, the effects can be quite helpful in terms of latency... assuming the drive is cooperative. The benefits of it vary a lot from drive to drive as well. Some early drives still serialize the trims, so the latency benefit is not so great. Some enterprise drives do trims instantly and keep some internal state that is helped a lot by them. But if NCQ trims are unstable, we likely should err on the side of disabling.

The errors that Andriy was seeing typically are cable or controller issues. In his case, it was a controller issue (apparently, and surprisingly, paired with a specific type of drive). Linux has a specific workaround for the pair, which is unusual. But there's two issues at play.

The first issue is that these drives should have NCQ Trim turned off, according to Linux's commit log. This may be firmware revision based or may just be some people are luckier with newer firmware (it's hard to say from the mixed reports, but they read more like luck than version for the sampling I did). So Andriy's change is good. The second issue, and one I was confused on earlier, is that Linux goes a step further and disables NCQ entirely for these old controllers for 860 and 870 drives. This isn't too surprising. We have a workaround for PMP for these controllers already.... One can disable NCQ entirely by setting the tags to 0... Given the specificity of the problem, and the age of the controllers, I'm inclined to not implement the Linux workaround for this specific pair.


--
Andriy Gapon


Reply via email to