Public bug reported:

Setting the drive's device/queue_depth from 32 to 31 resolved an issue where I had numerous zpool and ATA errors, but only under high load (zpool scrub) or when trimming the drives. I was able to reduce the incidence with libata.force=noncqtrim, and to resolve it entirely with libata.force=noncq, but with an obvious performance impact.
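As a workaround, the queue depth can be capped at 31 per drive via sysfs. The helper below is a minimal sketch, not a tested fix; the sysfs attribute is the standard block-layer queue_depth file, but the actual device name (sda, sdb, ...) is an assumption and depends on the system:

```shell
#!/bin/sh
# Sketch: drop an ATA drive's NCQ queue depth from 32 to 31.
# Only writes when the current depth is 32, so it is safe to re-run.
clamp_queue_depth() {
    qd_file="$1"    # e.g. /sys/block/sda/device/queue_depth
    if [ "$(cat "$qd_file")" = "32" ]; then
        echo 31 > "$qd_file"
    fi
}

# Example invocation (requires root on a real system):
# clamp_queue_depth /sys/block/sda/device/queue_depth
```

To persist across reboots, a udev rule along these lines (untested, device match pattern assumed) should apply the same clamp automatically: ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd?", ATTR{device/queue_depth}=="32", ATTR{device/queue_depth}="31"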
The upstream kernel seems to be aware of this issue, so I'm assuming this is a downstream or udev configuration issue; see:
https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ

A scrub repaired all errors, but, because repairs were made, it seems it is not just a communications issue, and that there is the potential for DATA LOSS on non-redundant and/or non-ZFS configurations.

Sample syslog error:

[ 33.688898] ata1.00: exception Emask 0x50 SAct 0x1003000 SErr 0x4c0900 action 0x6 frozen
[ 33.688908] ata1.00: irq_stat 0x08000000, interface fatal error
[ 33.688913] ata1: SError: { UnrecovData HostInt CommWake 10B8B Handshk }
[ 33.688917] ata1.00: failed command: WRITE FPDMA QUEUED
[ 33.688923] ata1.00: cmd 61/00:60:df:28:3d/01:00:2a:00:00/40 tag 12 ncq dma 131072 out
[ 33.688929] ata1.00: status: { DRDY }
[ 33.688931] ata1.00: failed command: WRITE FPDMA QUEUED
[ 33.688937] ata1.00: cmd 61/08:68:18:a9:d2/00:00:04:00:00/40 tag 13 ncq dma 4096 out
[ 33.688942] ata1.00: status: { DRDY }
[ 33.688945] ata1.00: failed command: WRITE FPDMA QUEUED
[ 33.688951] ata1.00: cmd 61/00:c0:df:27:3d/01:00:2a:00:00/40 tag 24 ncq dma 131072 out
[ 33.688956] ata1.00: status: { DRDY }
[ 33.688963] ata1: hard resetting link

** Affects: linux (Ubuntu)
   Importance: Undecided
       Status: Incomplete

** Attachment added: "lspci-vnvn.log"
   https://bugs.launchpad.net/bugs/1894230/+attachment/5407655/+files/lspci-vnvn.log

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1894230

Title:
  Device queue depth should be 31 not 32

Status in linux package in Ubuntu:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1894230/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp