git: a8b49e7c6629 - main - cam: Add 3e/3 as a fatal code

Warner Losh Fri, 17 Jan 2025 13:11:11 -0800

The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=a8b49e7c66292852339481536f039719e7914200


commit a8b49e7c66292852339481536f039719e7914200
Author:     Warner Losh <i...@freebsd.org>
AuthorDate: 2025-01-17 21:06:32 +0000
Commit:     Warner Losh <i...@freebsd.org>
CommitDate: 2025-01-17 21:07:40 +0000

    cam: Add 3e/3 as a fatal code
    
    We see this error:
    
    (da4:mps0:0:3:0): SCSI sense: HARDWARE FAILURE asc:3e,3 (Logical unit failed self-test)
    
    for drives that have failed. Our vendor tells us there's no recovery
    from that state, though we can still grab logs from the drives and run
    their diagnostics. Drives in this state basically need to be
    remanufactured because some part of them has failed. The prior default
    behavior was to retry, and the retries take a long time to run
    out. Instead, short-circuit the retries and fail right away. I selected
    ENXIO because no I/O to LBAs is possible for drives in this state (both
    my experience and per vendor). Some googling suggests that other vendors
    behave identically, but it was inconclusive. Should this be too
    pessimistic, we can adjust in the future. Also, this is with some aging
    drives in our fleet, and if we have more than one drive in this state,
    our systems take so long to get to mountroot that the watchdog fires
    sometimes. Adding this patch makes them boot reliably again.
    
    MFC After:              1 week
    Sponsored by:           Netflix
    Reviewed by:            mav
    Differential Revision:  https://reviews.freebsd.org/D48505
---
 sys/cam/scsi/scsi_all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sys/cam/scsi/scsi_all.c b/sys/cam/scsi/scsi_all.c
index a26354e3dd97..0f31757cae25 100644
--- a/sys/cam/scsi/scsi_all.c
+++ b/sys/cam/scsi/scsi_all.c
@@ -2308,7 +2308,7 @@ static struct asc_table_entry asc_table[] = {
        { SST(0x3E, 0x02, SS_RDEF,
            "Timeout on logical unit") },
        /* DTLPWROMAEBKVF */
-       { SST(0x3E, 0x03, SS_RDEF,      /* XXX TBD */
+       { SST(0x3E, 0x03, SS_FATAL | ENXIO,
            "Logical unit failed self-test") },
        /* DTLPWROMAEBKVF */
        { SST(0x3E, 0x04, SS_RDEF,      /* XXX TBD */
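
The effect of the one-line change can be illustrated with a small standalone sketch. It assumes a simplified action word that packs an action, qualifier bits, and an errno into one integer; every ex_/EX_ name and every bit value below is hypothetical, chosen for illustration and not copied from the CAM headers. The point is only that the 3e/3 entry goes from "retry, then report EIO" to "fail immediately with ENXIO".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified encoding of a sense action word.
 * These values are illustrative assumptions, not the real CAM definitions.
 */
enum {
	EX_ACT_RETRY   = 0x010000,	/* retry the command */
	EX_ACT_FAIL    = 0x020000,	/* give up immediately */
	EX_ACT_MASK    = 0xff0000,
	EX_Q_DECREMENT = 0x000100,	/* consume one retry per attempt */
	EX_Q_PRINT     = 0x000800,	/* print the sense data */
	EX_ERR_MASK    = 0x0000ff	/* errno reported to the caller */
};

/* Stand-ins for the "retry by default" and "fatal" shorthands. */
#define EX_RDEF		(EX_ACT_RETRY | EX_Q_DECREMENT | EX_Q_PRINT | EIO)
#define EX_FATAL	(EX_ACT_FAIL | EX_Q_PRINT)

struct ex_asc_entry {
	uint8_t		 asc;
	uint8_t		 ascq;
	uint32_t	 action;
	const char	*desc;
};

/* Two entries mirroring the hunk above, with 3e/3 in its post-change form. */
static const struct ex_asc_entry ex_table[] = {
	{ 0x3E, 0x02, EX_RDEF,          "Timeout on logical unit" },
	{ 0x3E, 0x03, EX_FATAL | ENXIO, "Logical unit failed self-test" },
};

/* Linear search for an asc/ascq pair; returns NULL when there is no entry. */
static const struct ex_asc_entry *
ex_lookup(uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
		if (ex_table[i].asc == asc && ex_table[i].ascq == ascq)
			return (&ex_table[i]);
	}
	return (NULL);
}

/* Nonzero when the entry asks for the command to be retried. */
static int
ex_should_retry(const struct ex_asc_entry *e)
{
	return ((e->action & EX_ACT_MASK) == EX_ACT_RETRY);
}

/* The errno encoded in the low bits of the action word. */
static int
ex_errno(const struct ex_asc_entry *e)
{
	return ((int)(e->action & EX_ERR_MASK));
}
```

In this sketch, ex_should_retry() is false for 3e/3 and ex_errno() yields ENXIO, while 3e/2 still retries and reports EIO, which matches the intent described in the commit message.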
