Hi!

Long story short: it looks like r360843 can cause a kernel panic during disk
initialization on 11.4-STABLE (12-STABLE is probably affected too, but I have
not tested it).

Long story longer: after a routine upgrade from 11.2-STABLE to 11.4-STABLE,
the host panics during disk initialization. Hardware: Dell R530 with an onboard
PERC adapter and two drives exported to the system as JBOD / SYSPD:

mfi0 Adapter:
    Product Name: PERC H730 Mini
   Serial Number: 83M024U
        Firmware: 25.5.5.0005

mfi0 Configuration: 0 arrays, 0 volumes, 0 spares

mfi0 Physical Drives:
 0 (  447G) JBOD <SSDSC2KB480G7R DL58 serial=BTYS81210DYR480BGN> SATA E1:S0
 1 (  447G) JBOD <SSDSC2KB480G7R DL58 serial=BTYS81200F9U480BGN> SATA E1:S1

(A ZFS mirror is no worse than a PERC one.)

=========== console log starts (a bit garbled by other devices' init output) ========
mfisyspd0 numa-domain 0 on mfi0
mfisyspd0: 457862MB (937703088 sectors) SYSPD volume (deviceid: 0)
mfisyspd0:  SYSPD volume attached
mfi0: DJA NA XXX SYSPDIO


ses0 at ahciem0 bus 0 scbus4 target 0 lun 0
Fatal trap 12: page fault while in kernel mode
uhub0: ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
cpuid = 18; <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
apic id = 18
ses1 at ahciem1 bus 0 scbus11 target 0 lun 0
ses1: uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
fault virtual address   = 0x0
<AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses1: SEMB SES Device
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff803daacb
pass1 at ahcich9 bus 0 scbus10 target 0 lun 0
pass1: <PLDS DVD+-RW DS-8ABSH LD5M> Removable CD-ROM SCSI device
stack pointer           = 0x28:0xfffffe07c2f457f0
frame pointer           = 0x28:0xfffffe07c2f45820
pass1: Serial Number JD6H1PLC0084O50KOA00
pass1: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
ses1: pass1 in 'Slot 05', SATA Slot: scbus10 target 0
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)
trap number             = 12
panic: page fault
cpuid = 18
KDB: stack backtrace:
#0 0xffffffff805e5a45 at kdb_backtrace+0x65
#1 0xffffffff8059fd7e at vpanic+0x15e
#2 0xffffffff8059fc13 at panic+0x43
#3 0xffffffff80861515 at trap_fatal+0x365
#4 0xffffffff80861569 at trap_pfault+0x49
#5 0xffffffff80860c1e at trap+0x27e
#6 0xffffffff808427af at calltrap+0x8
#7 0xffffffff803d39f8 at mfi_send_frame+0x28
#8 0xffffffff803d395f at mfi_data_cb+0x2bf
#9 0xffffffff805de0be at bus_dmamap_load_bio+0xae
#10 0xffffffff803d351e at mfi_mapcmd+0xae
#11 0xffffffff803d292b at mfi_startio+0xeb
#12 0xffffffff803d8a39 at mfi_syspd_strategy+0x99
#13 0xffffffff804f8c99 at g_disk_start+0x369
#14 0xffffffff804fc3c3 at g_io_schedule_down+0x173
#15 0xffffffff804fcc5c at g_down_procbody+0x6c
#16 0xffffffff8056b0de at fork_exit+0x7e
#17 0xffffffff808437ce at fork_trampoline+0xe
Uptime: 1s
========================= console log ends ============================

This line from the log

mfi0: DJA NA XXX SYSPDIO

suggests that, instead of proceeding to initialize req_desc (line 1111:
https://svnweb.freebsd.org/base/stable/11/sys/dev/mfi/mfi_tbolt.c?revision=360843&view=markup#l1110),
the code just prints this message and continues to MFI_WRITE (line 1141) with
req_desc still set to NULL (line 1093), which is consistent with the fault
virtual address of 0x0 above.

Manually reverting the mentioned patch produces the following warning
during compilation:

cc -target x86_64-unknown-freebsd11.4 --sysroot=/usr/obj/usr/src/tmp 
-B/usr/obj/usr/src/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing  -g -nostdinc  
-I. -I/usr/src/sys -I/usr/src/sys/contrib/libfdt -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -fno-omit-frame-pointer 
-mno-omit-leaf-frame-pointer -MD  -MF.depend.mfi_tbolt.o -MTmfi_tbolt.o 
-mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector 
-gdwarf-2 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes 
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef 
-Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs 
-fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error-tautological-compare 
-Wno-error-empty-body -Wno-error-parentheses-equality 
-Wno-error-unused-function -Wno-error-pointer-sign 
-Wno-error-shift-negative-value -Wno-address-of-packed-member  -mno-aes 
-mno-avx  -std=iso9899:1999 -Werror  /usr/src/sys/dev/mfi/mfi_tbolt.c
/usr/src/sys/dev/mfi/mfi_tbolt.c:1110:22: warning: overlapping comparisons
      always evaluate to true [-Wtautological-overlap-compare]
                if (cdb[0] != 0x28 || cdb[0] != 0x2A) {
                    ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
1 warning generated.

However, the system boots and works just fine (all values of cdb[0] are now
translated to a correct req_desc).

Attempting to return an error when cdb[0] is 0x28 or 0x2A leads to numerous
read errors in the console log and an inability to boot (GEOM thinks the
partition table is broken, ZFS is unable to find the pool), so that is not an
option:

[......]
mfisyspd0 numa-domain 0 on mfi0
mfisyspd0: 457862MB (937703088 sectors) SYSPD volume (deviceid: 0)
ses0 at ahciem0 bus 0 scbus4 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
mfi0: mfisyspd0:  SYSPD volume attached
ses0: SEMB SES Device
ugen1.1: <Intel EHCI root HUB> at usbus1
ugen0.1: <Intel EHCI root HUB> at usbus0
DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
ses1 at ahciem1 bus 0 scbus11 target 0 lun 0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
mfi0: I/O error, cmd=0xfffffe00010e2660, error=0x1
ses1: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
mfisyspd0: hard error cmd=read 937703025-937703028
ses1: SEMB SES Device
mfi0: DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
pass1 at ahcich9 bus 0 scbus10 target 0 lun 0
pass1: <PLDS DVD+-RW DS-8ABSH LD5M> Removable CD-ROM SCSI device
mfi0: I/O error, cmd=0xfffffe00010e26e8, error=0x1
mfisyspd0: hard error cmd=read fsbn 937703086
pass1: Serial Number JD6H1PLC0084O50KOA00
pass1: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
mfi0: DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
ses1: pass1 in 'Slot 05', SATA Slot: scbus10 target 0
mfi0: I/O error, cmd=0xfffffe00010e2770, error=0x1
mfisyspd0: hard error cmd=read fsbn 937703087
mfi0: DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
mfisyspd1 numa-domain 0 on mfi0
mfi0: I/O error, cmd=0xfffffe00010e2880, error=0x1
mfisyspd0: hard error cmd=read fsbn 937703086
mfi0: DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
mfi0: I/O error, cmd=0xfffffe00010e2908, error=0x1
mfisyspd0: hard error cmd=read fsbn 937703087
mfisyspd1: 457862MB (937703088 sectors) SYSPD volume (deviceid: 1)
mfisyspd1:  SYSPD volume attached
mfi0: DJA NA XXX SYSPDIO
error 1 in callback from mfi_send_frame
[.....]


_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
