Various people have reported kernel diagnostic assertion
"ccb->ccb_xa.state == ATA_S_ONCHIP" panics with ahci.  In short, this happens
when a queued command fails, we ask the device which command failed, and it
gives us the wrong answer.  The ccb_xa.state assertion fails if the command
the device names was not active.

For non-queued commands, we handle this by failing all active commands (since
r1.157 in 2010), and every other driver I've looked at does this too for both
queued and non-queued commands, so I think it makes sense to handle queued
command errors the same way.
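
As a rough sketch of the idea (this is not the driver code; the types, the
names and the simplified "fail one command" path below are made up for
illustration, and the real error path does more work than this):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's ccb states; not from ahci.c. */
enum slot_state { SLOT_IDLE, SLOT_ONCHIP, SLOT_ERROR };

#define NSLOTS 32

struct port {
	enum slot_state state[NSLOTS];
};

/*
 * Handle an NCQ error.  err_slot is the slot the device claims failed,
 * active is a bitmask of the slots that were issued (like ci_saved).
 * Returns the number of commands marked as failed.
 */
static int
handle_ncq_error(struct port *p, int err_slot, uint32_t active)
{
	int i, failed = 0;

	assert(err_slot >= 0 && err_slot < NSLOTS);

	if (p->state[err_slot] == SLOT_ONCHIP) {
		/* The device's answer is plausible: fail just that command. */
		p->state[err_slot] = SLOT_ERROR;
		return 1;
	}

	/* The device named an idle slot: fail everything that was active. */
	for (i = 0; i < NSLOTS; i++) {
		if ((active & (UINT32_C(1) << i)) &&
		    p->state[i] == SLOT_ONCHIP) {
			p->state[i] = SLOT_ERROR;
			failed++;
		}
	}
	return failed;
}
```

With slots 1 and 3 active, an error reported for slot 3 fails only slot 3,
while an error reported for idle slot 5 fails both slots 1 and 3 instead of
tripping an assertion.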

This diff came out of the most recent thread about this on bugs@, where it
seems to have made a slight improvement.  It has been in snaps for over a
week, so I think it should go in.

ok?


Index: ahci.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/ahci.c,v
retrieving revision 1.28
diff -u -p -u -p -r1.28 ahci.c
--- ahci.c      2 Oct 2016 18:56:05 -0000       1.28
+++ ahci.c      27 Feb 2017 07:10:40 -0000
@@ -2158,6 +2158,12 @@ ahci_port_intr(struct ahci_port *ap, u_i
                                PORTNAME(ap), err_slot);
 
                        ccb = &ap->ap_ccbs[err_slot];
+                       if (ccb->ccb_xa.state != ATA_S_ONCHIP) {
+                               printf("%s: NCQ errored slot %d is idle"
+                                   " (%08x active)\n", PORTNAME(ap), err_slot,
+                                   ci_saved);
+                               goto failall;
+                       }
                } else {
                        /* Didn't reset, could gather extended info from log. */
                }
