Public bug reported:

Setting the drive's device/queue_depth from 32 to 31 resolved an issue
where I saw numerous zpool and ATA errors, but only under high load
(zpool scrub) or when trimming the drives.  I was able to reduce the
incidence by booting with libata.force=noncqtrim, and to resolve it
with libata.force=noncq, but with an obvious performance impact.
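
For reference, the runtime workaround is just a sysfs write, and a udev
rule along the same lines should make it persistent (this is a sketch,
not tested here; sdX and the rule filename are placeholders):

  # Apply immediately to the affected drive (sdX is a placeholder)
  echo 31 | sudo tee /sys/block/sdX/device/queue_depth

  # One way to persist it, e.g. /etc/udev/rules.d/99-ncq-depth.rules
  ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/queue_depth}="31"

  # The libata alternatives above go on the kernel command line
  # (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then update-grub):
  #   libata.force=noncqtrim   (reduced the errors)
  #   libata.force=noncq       (resolved them, at a performance cost)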

The upstream kernel seems to be aware of this issue, so I'm assuming
this is a downstream or udev configuration issue; see:
https://ata.wiki.kernel.org/index.php/Libata_FAQ#Enabling.2C_disabling_and_checking_NCQ
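
To check what the kernel actually negotiated (per the FAQ above), the
current depth can be read back from sysfs; again, sdX is a placeholder:

  # 31 or 32 means NCQ is in use; 1 means it has been disabled
  cat /sys/block/sdX/device/queue_depth

  # Boot-time NCQ negotiation also shows up in the kernel log
  dmesg | grep -i ncq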

The scrub repaired all errors, but because repairs were needed, it
seems this is not just a communications issue and that there is the
potential for DATA LOSS on non-redundant and/or non-ZFS configurations.
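
For completeness, the scrub and the check that showed the repairs were
roughly the following ("tank" is a placeholder pool name):

  sudo zpool scrub tank
  sudo zpool status -v tank   # the "scan:" line reports what was repaired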

Sample syslog error:

[   33.688898] ata1.00: exception Emask 0x50 SAct 0x1003000 SErr 0x4c0900 action 0x6 frozen
[   33.688908] ata1.00: irq_stat 0x08000000, interface fatal error
[   33.688913] ata1: SError: { UnrecovData HostInt CommWake 10B8B Handshk }
[   33.688917] ata1.00: failed command: WRITE FPDMA QUEUED
[   33.688923] ata1.00: cmd 61/00:60:df:28:3d/01:00:2a:00:00/40 tag 12 ncq dma 131072 out
[   33.688929] ata1.00: status: { DRDY }
[   33.688931] ata1.00: failed command: WRITE FPDMA QUEUED
[   33.688937] ata1.00: cmd 61/08:68:18:a9:d2/00:00:04:00:00/40 tag 13 ncq dma 4096 out
[   33.688942] ata1.00: status: { DRDY }
[   33.688945] ata1.00: failed command: WRITE FPDMA QUEUED
[   33.688951] ata1.00: cmd 61/00:c0:df:27:3d/01:00:2a:00:00/40 tag 24 ncq dma 131072 out
[   33.688956] ata1.00: status: { DRDY }
[   33.688963] ata1: hard resetting link

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Attachment added: "lspci-vnvn.log"
   https://bugs.launchpad.net/bugs/1894230/+attachment/5407655/+files/lspci-vnvn.log

Title:
  Device queue depth should be 31 not 32

Status in linux package in Ubuntu:
  Incomplete

