https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237463

--- Comment #5 from Leandro Lupori <lup...@freebsd.org> ---
I've noticed that the AIF interrupts always occur about 5 minutes after a
reboot.
Luckily, they occur on Petitboot too, which made it possible to collect the
following information about the remaining issue:


/ # dmesg | tail -20
[   40.494002] sd 1:2:23:0: [sdi] 4096-byte physical blocks
[   40.494004] scsi 1:3:123:0: Enclosure         ADAPTEC  Smart Adapter    4.02 PQ: 0 ANSI: 5
[   40.495376] sd 1:2:23:0: [sdi] Write Protect is off
[   40.495379] sd 1:2:23:0: [sdi] Mode Sense: 46 00 10 08
[   40.495520] scsi 1:3:123:0: Attached scsi generic sg11 type 13
[   40.498220] sd 1:2:23:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   40.533826] udevd[2649]: inotify_add_watch(6, /dev/dm-8, 10) failed: No such file or directory
[   40.585006] sd 1:2:23:0: [sdi] Attached SCSI disk
[   41.437318] udevd[2688]: inotify_add_watch(6, /dev/dm-11, 10) failed: No such file or directory
[  321.101655] sd 1:2:16:0: [sdb] Synchronizing SCSI cache
[  321.102364] sd 1:2:16:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  334.245061] scsi 1:2:16:0: Direct-Access     ATA      ST4000NM0115-1YZ SN04 PQ: 0 ANSI: 6
[  334.250710] sd 1:2:16:0: Attached scsi generic sg2 type 0
[  334.260739] sd 1:2:16:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[  334.260742] sd 1:2:16:0: [sdb] 4096-byte physical blocks
[  334.261614] sd 1:2:16:0: [sdb] Write Protect is off
[  334.261616] sd 1:2:16:0: [sdb] Mode Sense: 46 00 10 08
[  334.264430] sd 1:2:16:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[  334.325386]  sdb: sdb1 sdb2 sdb3
[  334.349896] sd 1:2:16:0: [sdb] Attached SCSI disk


/var/petitboot/mnt/dev/sda2/bsd # ./arcconf getlogs 1 event
Controllers found: 1
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
    <eventlog>
        <event message="Previous Firmware Lockup Detected, Lockup Code=227 Detail=0x00000000"
               eventTag="1" relativeControllerTime="4" eventClassCode="12"
               eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Cache battery/Super cap is missing"
               eventTag="2" relativeControllerTime="4" eventClassCode="2"
               eventSubClassCode="4" eventDetailCode="2"/>
        <event message="Encryption Self-Test failed"
               eventTag="3" relativeControllerTime="4" eventClassCode="2"
               eventSubClassCode="10" eventDetailCode="0"/>
        <event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E"
               eventTag="4" relativeControllerTime="335" eventClassCode="1"
               eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14"
               eventTag="5" relativeControllerTime="335" eventClassCode="4"
               eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E"
               eventTag="6" relativeControllerTime="348" eventClassCode="1"
               eventSubClassCode="0" eventDetailCode="1"/>
        <event message="Drive is re-enabled, Port=C0 Box=1 Bay=0"
               eventTag="7" relativeControllerTime="348" eventClassCode="4"
               eventSubClassCode="0" eventDetailCode="3"/>
    </eventlog>
</ControllerLog>


So, the AIFs report the drive being removed and then re-inserted roughly 13
seconds later (controller time 335 to 348, matching the gap in dmesg), which
explains the "Target Selection Timeout" errors that were being seen right after
the AIF interrupts occurred.
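
To make the removal/re-insertion pairing easier to check on other logs, here is
a minimal Python sketch that reads arcconf event log XML and reports the gap
between each "Hot-plug drive removed" and the next "Hot-plug drive inserted"
event for the same Port/Box/Bay. It is not part of arcconf or the driver. The
helper name hotplug_gaps is hypothetical, the sample is trimmed from the output
above, and it assumes relativeControllerTime is in seconds (not confirmed here,
although it lines up with the dmesg timestamps) and that the input is the
unwrapped tool output rather than the mail-wrapped copy.

```python
import re
import xml.etree.ElementTree as ET

# Sample trimmed from the arcconf output above; unused attributes are omitted.
SAMPLE = """Controllers found: 1
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
  <eventlog>
    <event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E" eventTag="4" relativeControllerTime="335"/>
    <event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14" eventTag="5" relativeControllerTime="335"/>
    <event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E" eventTag="6" relativeControllerTime="348"/>
    <event message="Drive is re-enabled, Port=C0 Box=1 Bay=0" eventTag="7" relativeControllerTime="348"/>
  </eventlog>
</ControllerLog>"""

# Slot identifier as it appears inside the event message text.
SLOT_RE = re.compile(r"Port=(\S+) Box=(\S+) Bay=(\S+)")


def hotplug_gaps(text):
    """Return a list of (slot, removed_at, inserted_at, gap) tuples.

    Times are taken from relativeControllerTime, assumed to be seconds.
    """
    # Skip any banner lines (e.g. "Controllers found: 1") before the XML.
    start = text.find("<ControllerLog")
    if start < 0:
        raise ValueError("no <ControllerLog> element found")
    root = ET.fromstring(text[start:])

    pending = {}  # slot -> time of the last unmatched removal
    gaps = []
    for ev in root.iter("event"):
        msg = ev.get("message", "")
        match = SLOT_RE.search(msg)
        if match is None:
            continue
        slot = match.groups()
        when = int(ev.get("relativeControllerTime"))
        if msg.startswith("Hot-plug drive removed"):
            pending[slot] = when
        elif msg.startswith("Hot-plug drive inserted") and slot in pending:
            removed = pending.pop(slot)
            gaps.append((slot, removed, when, when - removed))
    return gaps


if __name__ == "__main__":
    for (port, box, bay), removed, inserted, gap in hotplug_gaps(SAMPLE):
        print(f"Port={port} Box={box} Bay={bay}: "
              f"removed at {removed}, inserted at {inserted}, gap {gap}")
```

Running it over logs from several reboots would show whether the same bay
always drops out and whether the gap is constant, which could help separate a
cabling problem from a firmware or configuration one.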

However, further investigation is needed to understand why the drive is being
removed. It could be due to a bad HDD/SAS expander cable, a write cache issue,
a setup issue with the 2 SAS controllers/cabling on the machine, or something
else.

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
