The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=a8b49e7c66292852339481536f039719e7914200

commit a8b49e7c66292852339481536f039719e7914200
Author:     Warner Losh <i...@freebsd.org>
AuthorDate: 2025-01-17 21:06:32 +0000
Commit:     Warner Losh <i...@freebsd.org>
CommitDate: 2025-01-17 21:07:40 +0000

    cam: Add 3e/3 as a fatal code

    We see this error:
    (da4:mps0:0:3:0): SCSI sense: HARDWARE FAILURE asc:3e,3 (Logical unit failed self-test)
    for drives that have failed. Our vendor tells us there's no recovery
    from that state, though we can still grab logs from the drives and run
    their diagnostics. Drives in this state basically need to be
    remanufactured because some part of them has failed.

    The prior default behavior was to retry, and the retries take a long
    time to play out. Instead, short-circuit the retries and fail right
    away. I selected ENXIO because no I/O to LBAs is possible for drives in
    this state (both in my experience and per the vendor). Some googling
    suggests that other vendors behave identically, but it was
    inconclusive. Should this be too pessimistic, we can adjust in the
    future.

    Also, this is with some aging drives in our fleet, and if we have more
    than one drive in this state, our systems take so long to get to
    mountroot that the watchdog sometimes fires. Adding this patch makes
    them boot reliably again.

    MFC After:              1 week
    Sponsored by:           Netflix
    Reviewed by:            mav
    Differential Revision:  https://reviews.freebsd.org/D48505
---
 sys/cam/scsi/scsi_all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sys/cam/scsi/scsi_all.c b/sys/cam/scsi/scsi_all.c
index a26354e3dd97..0f31757cae25 100644
--- a/sys/cam/scsi/scsi_all.c
+++ b/sys/cam/scsi/scsi_all.c
@@ -2308,7 +2308,7 @@ static struct asc_table_entry asc_table[] = {
 	{ SST(0x3E, 0x02, SS_RDEF,
 	    "Timeout on logical unit") },
 	/* DTLPWROMAEBKVF */
-	{ SST(0x3E, 0x03, SS_RDEF,	/* XXX TBD */
+	{ SST(0x3E, 0x03, SS_FATAL | ENXIO,
 	    "Logical unit failed self-test") },
 	/* DTLPWROMAEBKVF */
 	{ SST(0x3E, 0x04, SS_RDEF,	/* XXX TBD */
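
For readers unfamiliar with the sense table, the effect of moving an entry from a retry default to a fatal action with an errno can be sketched in standalone userland C. This is only an illustration of the idea described in the commit message, not the CAM implementation: every `sk_`/`SK_` name, the bit values, the two-entry table, and the EIO fallback for unknown codes are assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical action bits (illustrative values, not copied from CAM). */
#define SK_RETRY        0x010000	/* retry while retries remain */
#define SK_FAIL         0x020000	/* fail the I/O immediately */
#define SK_ACTION_MASK  0xff0000
#define SK_PRINT_SENSE  0x000800	/* log the sense data */
#define SK_ERRNO_MASK   0x0000ff	/* low byte carries the errno */

/* Assumed analogues of the retry default and the fatal action. */
#define SK_RDEF   (SK_RETRY | SK_PRINT_SENSE | EIO)
#define SK_FATAL  (SK_FAIL | SK_PRINT_SENSE)

struct sk_entry {
	uint8_t     asc;	/* additional sense code */
	uint8_t     ascq;	/* additional sense code qualifier */
	uint32_t    action;	/* action bits OR'ed with an errno */
	const char *desc;
};

/* Two-entry stand-in for the table touched by the diff. */
static const struct sk_entry sk_table[] = {
	{ 0x3E, 0x02, SK_RDEF,          "Timeout on logical unit" },
	{ 0x3E, 0x03, SK_FATAL | ENXIO, "Logical unit failed self-test" },
};

/* Linear lookup by ASC/ASCQ; returns NULL when the pair is not listed. */
static const struct sk_entry *
sk_lookup(uint8_t asc, uint8_t ascq)
{
	size_t i;

	for (i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++) {
		if (sk_table[i].asc == asc && sk_table[i].ascq == ascq)
			return (&sk_table[i]);
	}
	return (NULL);
}

/*
 * Returns 0 when the request should be retried, otherwise the errno to
 * hand back to the caller. Unknown codes fall back to EIO (an assumption
 * of this sketch).
 */
static int
sk_disposition(uint8_t asc, uint8_t ascq, int retries_left)
{
	const struct sk_entry *e = sk_lookup(asc, ascq);

	if (e == NULL)
		return (EIO);
	if ((e->action & SK_ACTION_MASK) == SK_RETRY && retries_left > 0)
		return (0);
	return ((int)(e->action & SK_ERRNO_MASK));
}
```

In this sketch, `sk_disposition(0x3E, 0x03, 4)` returns ENXIO without consuming any retries, while `sk_disposition(0x3E, 0x02, 4)` still asks for a retry and only yields EIO once the retry budget reaches zero. That is the short-circuit the commit message describes.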