On 08.11.2024 4:15, Mark Millard wrote:
[I narrowed the artifact kernel range for the change in the type of
failure that happens.]

On Nov 7, 2024, at 17:43, Mark Millard <mark...@yahoo.com> wrote:

[The change to LLVM 19 is what leads to the 'Alignment
Fault' on write failure. Details later below.]

On Nov 7, 2024, at 01:42, Mark Millard <mark...@yahoo.com> wrote:

Note: Unfortunately, the panics here are too early for a
dump device to be available.

Context: the PkgBase upgrade started from:

# uname -apKU
FreeBSD OPiP2E-RPi2v1p1 15.0-CURRENT FreeBSD 15.0-CURRENT main-n272821-37798b1d5dd1 GENERIC-NODEBUG arm armv7 1500025 1500025

Installed packages to be UPGRADED:
       FreeBSD-dtb: 15.snap20241009161500 -> 15.snap20241028121139 [base]
       FreeBSD-kernel-generic: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-kernel-generic-dbg: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-kernel-generic-mmccam: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-kernel-generic-mmccam-dbg: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-kernel-generic-nodebug: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-kernel-generic-nodebug-dbg: 15.snap20241011221604 -> 15.snap20241106134422 [base]
       FreeBSD-src-sys: 15.snap20241011221604 -> 15.snap20241106160110 [base]

(Those were installed but the FreeBSD-dtb had linux 6.4
dtb files, not the 6.8 ones. 6.8 ones from a personal build
were copied to where they need to be. I've separately
reported the 6.4 vs. 6.8 issue.)

# ~/pkgbase-snapshot-list.sh
Via pkg-static info -C -x '^FreeBSD-' . . .
  1 FreeBSD-*-15.snap20241106160110
  6 FreeBSD-*-15.snap20241106134422
  1 FreeBSD-*-15.snap20241028121139
  3 FreeBSD-*-15.snap20241011221604
  2 FreeBSD-*-15.snap20241011210446
38 FreeBSD-*-15.snap20241011182434
  4 FreeBSD-*-15.snap20241011073851
  5 FreeBSD-*-15.snap20241010141501
  1 FreeBSD-*-15.snap20241010120743
296 FreeBSD-*-15.snap20241009161500
Instead via /var/cache/pkg/*.snap*.pkg . . .
  1 FreeBSD-*-15.snap20241106160110
  6 FreeBSD-*-15.snap20241106134422
  1 FreeBSD-*-15.snap20241028121139
10 FreeBSD-*-15.snap20241011221604
  2 FreeBSD-*-15.snap20241011210446
38 FreeBSD-*-15.snap20241011182434
  4 FreeBSD-*-15.snap20241011073851
  5 FreeBSD-*-15.snap20241010141501
  1 FreeBSD-*-15.snap20241010120743
297 FreeBSD-*-15.snap20241009161500


The failure (kernel-GENERIC-NODEBUG):

. . .
Root mount waiting for: usbus3 CAM
Fatal kernel mode data abort: 'Alignment Fault' on write
trapframe: 0xc6c9ac10
FSR=00000801, FAR=db23209b, spsr=20000013
r0 =db232080, r1 =00000000, r2 =00000006, r3 =00000024
r4 =db19e280, r5 =00000000, r6 =00000001, r7 =00000006
r8 =c6c9ad20, r9 =c0b7973c, r10=c092074c, r11=c6c9acb8
r12=00000000, ssp=c6c9aca0, slr=c01b01d8, pc =c01aff88

panic: Fatal abort
cpuid = 1
time = 3
KDB: stack backtrace:
db_trace_self() at db_trace_self
        pc = 0xc0667004  lr = 0xc0078630 (db_trace_self_wrapper+0x30)
        sp = 0xc6c9a9c8  fp = 0xc6c9aae0
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
        pc = 0xc0078630  lr = 0xc0328db8 (vpanic+0x140)
        sp = 0xc6c9aae8  fp = 0xc6c9ab08
        r4 = 0x00000100  r5 = 0x00000000
        r6 = 0xc084d1f1  r7 = 0xc0b69a94
vpanic() at vpanic+0x140
        pc = 0xc0328db8  lr = 0xc0328c78 (vpanic)
        sp = 0xc6c9ab10  fp = 0xc6c9ab14
        r4 = 0xc6c9ac10  r5 = 0x00000013
        r6 = 0xdb23209b  r7 = 0x00000001
        r8 = 0x00000801  r9 = 0x00000013
       r10 = 0xdb23209b
vpanic() at vpanic
        pc = 0xc0328c78  lr = 0xc068c8e8 (abort_align)
        sp = 0xc6c9ab1c  fp = 0xc6c9ab48
        r4 = 0x00000001  r5 = 0x00000801
        r6 = 0x00000013  r7 = 0xdb23209b
        r8 = 0xc6c9ab14  r9 = 0xc0328c78
       r10 = 0xc6c9ab1c
abort_align() at abort_align
        pc = 0xc068c8e8  lr = 0xc068c958 (abort_align+0x70)
        sp = 0xc6c9ab50  fp = 0xc6c9ab68
        r4 = 0xc6d21c00 r10 = 0xdb23209b
abort_align() at abort_align+0x70
        pc = 0xc068c958  lr = 0xc068c5e0 (abort_handler+0x430)
        sp = 0xc6c9ab70  fp = 0xc6c9ac08
        r4 = 0x00000000 r10 = 0xdb23209b
abort_handler() at abort_handler+0x430
        pc = 0xc068c5e0  lr = 0xc0669868 (exception_exit)
        sp = 0xc6c9ac10  fp = 0xc6c9acb8
        r4 = 0xdb19e280  r5 = 0x00000000
        r6 = 0x00000001  r7 = 0x00000006
        r8 = 0xc6c9ad20  r9 = 0xc0b7973c
       r10 = 0xc092074c
exception_exit() at exception_exit
        pc = 0xc0669868  lr = 0xc01b01d8 (usb_msc_auto_quirk+0xfc)
        sp = 0xc6c9aca0  fp = 0xc6c9acb8
        r0 = 0xdb232080  r1 = 0x00000000
        r2 = 0x00000006  r3 = 0x00000024
        r4 = 0xdb19e280  r5 = 0x00000000
        r6 = 0x00000001  r7 = 0x00000006
        r8 = 0xc6c9ad20  r9 = 0xc0b7973c
       r10 = 0xc092074c r12 = 0x00000000
bbb_command_start() at bbb_command_start+0x4c
        pc = 0xc01aff88  lr = 0xc01b01d8 (usb_msc_auto_quirk+0xfc)
        sp = 0xc6c9acc0  fp = 0xc6c9acf8
        r4 = 0xdb16d800  r5 = 0xdb19e280
        r6 = 0x00000001 r10 = 0xc092074c
usb_msc_auto_quirk() at usb_msc_auto_quirk+0xfc
        pc = 0xc01b01d8  lr = 0xc01a4bd8 (usb_alloc_device+0x9c4)
        sp = 0xc6c9ad00  fp = 0xc6c9ad68
        r4 = 0x00000000  r5 = 0x00000001
        r6 = 0x00000000  r7 = 0x00000002
        r8 = 0xdb16d800  r9 = 0xda241c78
       r10 = 0x000003ee
usb_alloc_device() at usb_alloc_device+0x9c4
        pc = 0xc01a4bd8  lr = 0xc01ad16c (uhub_explore+0x494)
        sp = 0xc6c9ad70  fp = 0xc6c9adc0
        r4 = 0x00000000  r5 = 0x00000000
        r6 = 0xdb16e800  r7 = 0x00000000
        r8 = 0xdb18c200  r9 = 0x00000001
       r10 = 0x00000000
uhub_explore() at uhub_explore+0x494
        pc = 0xc01ad16c  lr = 0xc0198654 (usb_bus_explore+0x1d4)
        sp = 0xc6c9adc8  fp = 0xc6c9add8
        r4 = 0xda241c78  r5 = 0xdb16e800
        r6 = 0x00000000  r7 = 0xda241d6c
        r8 = 0xc09b0b5f  r9 = 0x00000001
       r10 = 0xda241d1c
usb_bus_explore() at usb_bus_explore+0x1d4
        pc = 0xc0198654  lr = 0xc01b22d0 (usb_process+0x124)
        sp = 0xc6c9ade0  fp = 0xc6c9ae10
        r4 = 0xda241d0c  r5 = 0xda241d14
usb_process() at usb_process+0x124
        pc = 0xc01b22d0  lr = 0xc02da4f0 (fork_exit+0xb0)
        sp = 0xc6c9ae18  fp = 0xc6c9ae38
        r4 = 0xc6c9ae40  r5 = 0xc6d21c00
        r6 = 0xc6d08740  r7 = 0xda241d0c
        r8 = 0xc01b21ac  r9 = 0x00000000
       r10 = 0x00000000
fork_exit() at fork_exit+0xb0
        pc = 0xc02da4f0  lr = 0xc06697fc (swi_exit)
        sp = 0xc6c9ae40  fp = 0x00000000
        r4 = 0xc01b21ac  r5 = 0xda241d0c
        r6 = 0x00000000  r7 = 0x00000000
        r8 = 0x00000000 r10 = 0x00000000
swi_exit() at swi_exit
        pc = 0xc06697fc  lr = 0xc06697fc (swi_exit)
        sp = 0xc6c9ae40  fp = 0x00000000
KDB: enter: panic
[ thread pid 14 tid 100069 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!
db>
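
For reference, the FSR=00000801 above (and the FSR=00000005 in the
later 'Translation Fault (L1)' dumps) can be decoded by hand. The
below is only a minimal sketch, assuming the ARMv7 short-descriptor
fault status layout (status in bits 10 and 3:0, WnR in bit 11). The
function names are hypothetical and are not taken from the FreeBSD
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fault status FS[4:0] = { FSR bit 10, FSR bits 3:0 } (assumed layout). */
static unsigned
fsr_status(uint32_t fsr)
{
    return ((((fsr >> 10) & 0x1) << 4) | (fsr & 0xf));
}

/* WnR, FSR bit 11: set when the aborted access was a write. */
static bool
fsr_is_write(uint32_t fsr)
{
    return (((fsr >> 11) & 0x1) != 0);
}

/* Names only for the status values that matter for the logs here. */
static const char *
fsr_name(uint32_t fsr)
{
    switch (fsr_status(fsr)) {
    case 0x01:
        return ("Alignment Fault");
    case 0x05:
        return ("Translation Fault (L1)");
    case 0x07:
        return ("Translation Fault (L2)");
    default:
        return ("other");
    }
}

/* Check the decoding against the two FSR values from the console logs. */
static void
fsr_self_check(void)
{
    assert(strcmp(fsr_name(0x00000801), "Alignment Fault") == 0);
    assert(fsr_is_write(0x00000801));
    assert(strcmp(fsr_name(0x00000005), "Translation Fault (L1)") == 0);
    assert(!fsr_is_write(0x00000005));
}
```

Calling fsr_self_check() from a small main() is enough to confirm that
both values decode to the fault names the kernel printed.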

Using only the available official artifact kernels for testing,
I've established that 0953460ce149 (and various commits from
before that) does not have the problem:

Wed, 23 Oct 2024
    • git: 5c92f84bb607 - main - LinuxKPI: update rcu_dereference_*() and 
lockdep_is_held() Bjoern A. Zeeb
    • git: 6fa91acca40d - main - conf/NOTES: Remove trailing whitespace Li-Wen 
Hsu
    • git: 91b7b225b2ce - main - LINT: Add mac_do Li-Wen Hsu
    • git: 419249c1cacc - main - Revert "LINT: Add mac_do" Li-Wen Hsu
    • Re: git: 419249c1cacc - main - Revert "LINT: Add mac_do" Baptiste 
Daroussin
    • Re: git: 13da1af1cd67 - main - libcxxrt: Update to upstream 698997bfde1f 
John Baldwin
    • Re: git: 419249c1cacc - main - Revert "LINT: Add mac_do" John Baldwin
    • git: 0953460ce149 - main - libc: fix access mode tests in fmemopen(3) Ed 
Maste

So the above one worked.

The next available kernel to test was f3dbef108212 (the bump for LLVM19
at the end of the below):

    • RE: git: 6a07e67fb7a8 - main - vm_meter: Fix laundry accounting Mark 
Millard
    • git: 6b9f7133aba4 - main - libc: Add one more check in new fmemopen test 
Ed Maste
    • git: 0fca6ea1d4ee - main - Merge llvm-project main 
llvmorg-19-init-18630-gf2ccf80136a0 Dimitry Andric
    • git: 36b606ae6aa4 - main - Merge llvm-project release/19.x 
llvmorg-19.1.0-rc1-0-ga4902a36d5c2 Dimitry Andric
    • git: 3f157662c0ef - main - Tentatively apply 
https://github.com/llvm/llvm-project/pull/101403 Dimitry Andric
    • git: d575077527d4 - main - bsd.sys.mk: for clang >= 19, similar to gcc >= 
8.1, turn off -Werror for -Wcast-function-type-mismatch. Dimitry Andric
    • git: 36d486cc2ecd - main - Fix enum warning in ath_hal's ar9002 Dimitry 
Andric
    • git: 6846ab2fb663 - main - libcxx simd_utils.h: only enable 
_LIBCPP_HAS_ALGORITHM_VECTOR_UTILS for clang >= 15, since older versions do not 
support the required builtins. Dimitry Andric
    • git: 81e300df5e65 - main - libcxx atomic_ref.h: add typename keyword for 
difference_type declarations, otherwise older clang versions cannot compile 
this header. Dimitry Andric
    • git: 6b4981df6008 - main - libcxx cstdlib, cwchar: avoid using long long 
functions if not supported, even for older compilers that do not support the 
using_if_exists attribute. Dimitry Andric
    • git: 2f6d6eaf2d51 - main - libcxx-compat: revert 
llvmorg-19-init-18063-g561246e90282: Dimitry Andric
    • git: 04f5b79cfa49 - main - libcxx-compat: revert 
llvmorg-19-init-18062-g4dfa75c663e5: Dimitry Andric
    • git: e8054e44f4ca - main - libcxx-compat: revert 
llvmorg-19-init-17853-g578c6191eff7: Dimitry Andric
    • git: 0bec0529b1d7 - main - libcxx-compat: revert 
llvmorg-19-init-17728-g30cc12cd818d: Dimitry Andric
    • git: e8847079df1b - main - libcxx-compat: revert 
llvmorg-19-init-17727-g0eebb48fcfbc: Dimitry Andric
    • git: 2f2ebe758bea - main - libcxx-compat: revert 
llvmorg-19-init-17473-g69fecaa1a455: Dimitry Andric
    • git: 1199d38d8ec7 - main - libcxx-compat: revert 
llvmorg-19-init-8667-g472b612ccbed: Dimitry Andric
    • git: a7b2d7f261b8 - main - libcxx-compat: revert 
llvmorg-19-init-5639-ga10aa4485e83: Dimitry Andric
    • git: f3859a1a13a1 - main - libcxx-compat: revert 
llvmorg-19-init-4504-g937a5396cf3e: Dimitry Andric
    • git: 072b5fb698ab - main - libcxx-compat: revert 
llvmorg-19-init-4003-g55357160d0e1: Dimitry Andric
    • git: b60301d8b594 - main - libcxx-compat: don't remove headers that were 
reintroduced by reverts Dimitry Andric
    • git: 2e861daab905 - main - libcxx-compat: install headers that were 
reintroduced by reverts Dimitry Andric
    • git: ff6c8447844b - main - libcxx-compat: update libcxx.imp for headers 
that were reintroduced by reverts Dimitry Andric
    • git: 52418fc2be8e - main - Merge llvm-project release/19.x 
llvmorg-19.1.0-rc2-0-gd033ae172d1c Dimitry Andric
    • git: 62987288060f - main - Merge llvm-project release/19.x 
llvmorg-19.1.0-rc3-0-g437434df21d8 Dimitry Andric
    • git: 6c4b055cfb6b - main - Merge llvm-project release/19.x 
llvmorg-19.1.0-rc4-0-g0c641568515a Dimitry Andric
    • git: 835c3a3e69af - main - Merge commit 6dbdb8430b49 from llvm git (by 
Nikolas Klauser): Dimitry Andric
    • git: c80e69b00d97 - main - Merge llvm-project release/19.x 
llvmorg-19.1.0-0-ga4bf6cd7cfb1 Dimitry Andric
    • git: 6e516c87b6d7 - main - Merge llvm-project release/19.x 
llvmorg-19.1.1-0-gd401987fe349 Dimitry Andric
    • git: 5deeebd8c6ca - main - Merge llvm-project release/19.x 
llvmorg-19.1.2-0-g7ba7d8e2f7b6 Dimitry Andric
    • git: f3dbef108212 - main - Bump __FreeBSD_version for llvm 19.1.2 merge 
Dimitry Andric

f3dbef108212 gets the:

"Fatal kernel mode data abort: 'Alignment Fault' on write"

boot failure for the artifact kernel. 6b9f7133aba4 does not
seem a likely source of the problem, basically leaving the
LLVM changes as what is at issue.

I'll note that artifact kernels are witness kernels. So
this exploration adds to the distinctions observed
compared to the prior notes.

Looking at bbb_command_start()'s pc:

# llvm-addr2line -e /boot/kernel.GENERIC-NODEBUG/kernel 0xc01aff88
/home/pkgbuild/worktrees/main/sys/dev/usb/usb_msctest.c:554

What leads to that line is:

/*------------------------------------------------------------------------*
*      bbb_command_start - execute a SCSI command synchronously
*
* Return values
* 0: Success
* Else: Failure
*------------------------------------------------------------------------*/
static int
bbb_command_start(struct bbb_transfer *sc, uint8_t dir, uint8_t lun,
   void *data_ptr, size_t data_len, void *cmd_ptr, size_t cmd_len,
   usb_timeout_t data_timeout)
{
       sc->lun = lun;
       sc->dir = data_len ? dir : DIR_NONE;
       sc->data_ptr = data_ptr;
       sc->data_len = data_len;
       sc->data_rem = data_len;
       sc->data_timeout = (data_timeout + USB_MS_HZ);
       sc->actlen = 0;
       sc->error = 0;
       sc->cmd_len = cmd_len;
       memset(&sc->cbw->CBWCDB, 0, sizeof(sc->cbw->CBWCDB));

The memset line is line 554 of sys/dev/usb/usb_msctest.c .
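
The arithmetic behind the fault address can be sketched as follows.
This is only an illustration under stated assumptions: struct
cbw_sketch is a hypothetical stand-in for the real CBW type (its field
sizes follow the USB Bulk-Only Transport layout, where the 16-byte
command block starts at byte 15 of the 31-byte wrapper), and r0 from
the register dump is assumed to hold the CBW base address. The helper
names are made up for this note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CBW: all byte fields, so no padding. */
struct cbw_sketch {
    uint8_t signature[4];
    uint8_t tag[4];
    uint8_t data_len[4];
    uint8_t flags;
    uint8_t lun;
    uint8_t cdb_len;
    uint8_t cdb[16];    /* the field the memset() clears */
};

/* Byte offset of the command block inside the wrapper. */
static size_t
cdb_offset(void)
{
    return (offsetof(struct cbw_sketch, cdb));
}

/* Distance of the faulting address from the assumed base. */
static uint32_t
fault_offset(uint32_t far, uint32_t base)
{
    return (far - base);
}

static bool
is_word_aligned(uint32_t addr)
{
    return ((addr & 0x3) == 0);
}

/* Values from the register dump: FAR=db23209b, r0=db232080. */
static void
cbw_self_check(void)
{
    uint32_t off = fault_offset(0xdb23209b, 0xdb232080);

    assert(sizeof(struct cbw_sketch) == 31);
    assert(cdb_offset() == 15);
    assert(off == 27);
    /* 27 is 12 bytes into the field: the last 4 of its 16 bytes. */
    assert(off - cdb_offset() + 4 == sizeof(((struct cbw_sketch *)0)->cdb));
    assert(!is_word_aligned(0xdb23209b));
}
```

Under those assumptions the faulting store would be a 4-byte write
covering the last word of the 16-byte field, at an address that is not
a multiple of 4.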

The below looks to be a separate problem based on
some later FreeBSD kernel update than the above.

I'll note that attempting to use the WITNESS variant of the kernel
( /boot/kernel/ ) gets a different, even earlier failure:

. . .
VT: init without driver.
panic: acquiring blockable sleep lock with spinlock or critical section held 
(sleep mutex) pmap @ /home/pkgbuild/worktrees/main/sys/arm/arm/pmap-v6.c:6455

I do know that d021d3b3c675 at the end of the below
shows this failure --before the system has a chance
to get the usb related write alignment failure
reported above.

I have not explored where in the below range the
behavior changes (for what is available as an
official artifact kernel). It seems unlikely that
any of the below would actually boot: it is likely
a question of which of the 2 (or more) failure
types happen for each instead.

The last commit before "Thu, 24 Oct 2024" was:

         • git: 8b2e7da70855 - main - llvm19: permit incremental builds from 
llvm18 Brooks Davis

That is the last available artifact kernel that gets the
original usb related write alignment type of failure.

Thu, 24 Oct 2024
    • git: 34951b0b9e78 - main - swap_pager: move scan_all_shadowed, use 
iterators Doug Moore
    • git: 2ac21f2c98ed - main - x86 specialreg.h: visually align %cr4 and 
MSR_EFER bit mask definitions Konstantin Belousov
    • git: cc11bc1150d5 - main - x86 specialreg.h: add all defined bits for 
%cr4 Konstantin Belousov
    • git: cc4b25f10211 - main - x86 specialreg: reorder %cr3 bits masks 
definitions by value Konstantin Belousov
    • git: 5999b74e9637 - main - x86 specialreg: add bit masks definitions for 
LAM in %cr3 Konstantin Belousov
    • git: 6308db659f2a - main - x86 specialreg: add bit masks definitions for 
EFER features Konstantin Belousov
    • git: 9f718b57b846 - main - x86 specialreg: add bit masks definitions for 
LASS and LAM features Konstantin Belousov
    • git: 3360a15898ce - main - net: route: convert routing statistics to a 
sysctl Kyle Evans
    • Re: git: 3360a15898ce - main - net: route: convert routing statistics to 
a sysctl Kyle Evans
    • git: 77b70ad751df - main - e1000: Move I219 LM19/V19 to ADL Kevin Bowling

The last above is the first available artifact kernel that
gets the different error. There are no armv7 artifact
kernels between 8b2e7da70855 and 77b70ad751df .

So something from 34951b0b9e78 .. 77b70ad751df leads to
the change in the type of failure. I've no clue what.

It looked to me like the x86 commits and e1000 commit had
no chance of contributing to the armv7 context. That is
how I chose whom to add to the CC and whom to leave out.

    • git: d64442a89896 - main - arm{,64}: use genassym for INTR_ROOT_* values 
Kyle Evans
    • git: 536c8d948e85 - main - intrng: change multi-interrupt root support 
type to enum Kyle Evans
    • git: 4f12b529f404 - main - sys/intr.h: formally depend on machine/intr.h 
Kyle Evans
    • git: a5b1eecbed07 - main - Apply workaround for building llvm-project 
with WITHOUT_LLVM_ASSERTIONS Dimitry Andric
    • git: 1c83996beda7 - main - Adjust LLVM_ENABLE_ABI_BREAKING_CHECKS 
depending on NDEBUG Dimitry Andric
    • git: b2dd4970c7b5 - main - dev/gpio: Mask all pl011 interrupts Andrew 
Turner
    • git: 3b03e1bb8615 - main - intrng: Store the IPI priority Andrew Turner
    • git: 6204391e99ca - main - arm64: Check TDP_NOFAULTING in a data abort 
Andrew Turner
    • git: a84653c5db25 - main - arm64: Don't enable interrupts when in a 
spinlock Andrew Turner
    • git: d7f930b80e89 - main - arm64: Implement efi_rt_arch_call Andrew Turner
    • git: 8efb1500d4f1 - main - arm64: Enable handling EFI runtime service 
faults Andrew Turner
    • git: 9693241188aa - main - sound: Call DSP_REGISTERED before 
PCM_DETACHING Christos Margiolis
    • git: bb5e3ac1a7b7 - main - sound: Use DSP_REGISTERED in dsp_clone() 
Christos Margiolis
    • git: a4111e9dc722 - main - sound: Change PCMDIR_* numbering Christos 
Margiolis
    • git: 802c78f5194e - main - sound: Untangle dsp_cdevs[] and 
dsp_unit2name() confusion Christos Margiolis
    • git: b1bb6934bb87 - main - sound: Fix build error in chm_mkname() KASSERT 
Christos Margiolis
    • git: ce20b48a60fb - main - sctp: improve debug output Michael Tuexen
    • git: e4ac0183a1a8 - main - sctp: cleanup Michael Tuexen
    • git: 8c8ebbb04518 - main - bhyve ahci: Improve robustness of TRIM 
handling John Baldwin
    • git: f0bc751d6fb4 - main - csa: Use pci_find_device to simplify 
clkrun_hack John Baldwin
    • git: d96ba5a62365 - main - config: Remove a stray semicolon Zhenlei Huang
    • git: 56b17de1e836 - main - makefs: Remove a stray semicolon Zhenlei Huang
    • git: 88b71d1fe054 - main - arm64: rockchip: Remove a stray semicolon 
Zhenlei Huang
    • git: b4856b8e9d87 - main - LinuxKPI: Remove stray semicolons Zhenlei Huang
    • git: 75ff90814aec - main - enic: Remove a stray semicolon Zhenlei Huang
    • git: 6ccf4f4071c5 - main - mana: Remove stray semicolons Zhenlei Huang
    • git: 86a2c910c05c - main - mpi3mr: Remove a stray semicolon Zhenlei Huang
    • git: 36756195a342 - main - ocs_fc: Remove a stray semicolon Zhenlei Huang
    • git: 2f395cfda8b5 - main - tcp cc: Remove a stray semicolon Zhenlei Huang
    • git: f3a097d0312c - main - netstat: switch to using the sysctl-exported 
stats for live stats Kyle Evans
    • git: 656991b0c629 - main - locks: augment lock_class with lc_trylock 
method Gleb Smirnoff
    • git: efcb2ec8cb81 - main - callout: provide CALLOUT_TRYLOCK flag Gleb 
Smirnoff
    • git: bffebc336f4e - main - tcp: use CALLOUT_TRYLOCK for the TCP callout 
Gleb Smirnoff
    • git: d021d3b3c675 - main - tcp: get rid of TDP_INTCPCALLOUT Gleb Smirnoff



cpuid = 0
time = 1
KDB: stack backtrace:
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc0f14568
FSR=00000005, FAR=db7fcfb1, spsr=200001d3
r0 =c0f1465c, r1 =00000001, r2 =db7fcfae, r3 =1b000a4e
r4 =c07fc55c, r5 =8fce1b89, r6 =00006f3e, r7 =81000000
r8 =c07c4b6c, r9 =c094ace8, r10=c09741d8, r11=c0f14618
r12=c0f146c4, ssp=c0f145fc, slr=c0601428, pc =c062686c

panic: Fatal abort
cpuid = 0
time = 1
KDB: stack backtrace:
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc0f141f0
FSR=00000005, FAR=db7fcfb1, spsr=200001d3
r0 =c0f142e4, r1 =00000001, r2 =db7fcfae, r3 =1b000a4e
r4 =c07fc55c, r5 =8fce1b89, r6 =00006f3e, r7 =81000000
r8 =c07c4b6c, r9 =c094ace8, r10=c09741d8, r11=c0f142a0
r12=c0f1434c, ssp=c0f14284, slr=c0601428, pc =c062686c

panic: Fatal abort
cpuid = 0
time = 1
KDB: stack backtrace:
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc0f13e78
FSR=00000005, FAR=db7fcfb1, spsr=200001d3
r0 =c0f13f6c, r1 =00000001, r2 =db7fcfae, r3 =1b000a4e
r4 =c07fc55c, r5 =8fce1b89, r6 =00006f3e, r7 =81000000
r8 =c07c4b6c, r9 =c094ace8, r10=c09741d8, r11=c0f13f28
r12=c0f13fd4, ssp=c0f13f0c, slr=c0601428, pc =c062686c

panic: Fatal abort
cpuid = 0
time = 1
KDB: stack backtrace:
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc0f13b00
FSR=00000005, FAR=db7fcfb1, spsr=200001d3
r0 =c0f13bf4, r1 =00000001, r2 =db7fcfae, r3 =1b000a4e
r4 =c07fc55c, r5 =8fce1b89, r6 =00006f3e, r7 =81000000
r8 =c07c4b6c, r9 =c094ace8, r10=c09741d8, r11=c0f13bb0
r12=c0f13c5c, ssp=c0f13b94, slr=c0601428, pc =c062686c

panic: Fatal abort
cpuid = 0
time = 1
KDB: stack backtrace:
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc0f13788
FSR=00000005, FAR=db7fcfb1, spsr=200001d3
r0 =c0f1387c, r1 =00000001, r2 =db7fcfae, r3 =1b000a4e
r4 =c07fc55c, r5 =8fce1b89, r6 =00006f3e, r7 =81000000
r8 =c07c4b6c, r9 =c094ace8, r10=c09741d8, r11=c0f13838
r12=c0f138e4, ssp=c0f1381c, slr=c0601428, pc =c062686c

. . .

Looking:

# llvm-addr2line -e /boot/kernel.GENERIC-NODEBUG/kernel 0xc062686c
/home/pkgbuild/worktrees/main/sys/vm/uma_core.c:5676

static int
sysctl_handle_uma_zone_frees(SYSCTL_HANDLER_ARGS)
{
       uma_zone_t zone = arg1;
       uint64_t cur;

       cur = uma_zone_get_frees(zone);
       return (sysctl_handle_64(oidp, &cur, 0, req));
}

The "return" line is 5676 of sys/vm/uma_core.c .


Also, for what leads up to:

/home/pkgbuild/worktrees/main/sys/arm/arm/pmap-v6.c:6455

/*
*  The implementation of pmap_fault() uses IN_RANGE2() macro which
*  depends on the fact that given range size is a power of 2.
*/
CTASSERT(powerof2(NB_IN_PT1));
CTASSERT(powerof2(PT2MAP_SIZE));

#define IN_RANGE2(addr, start, size)    \
   ((vm_offset_t)(start) == ((vm_offset_t)(addr) & ~((size) - 1)))

/*
*  Handle access and R/W emulation faults.
*/
int
pmap_fault(pmap_t pmap, vm_offset_t far, uint32_t fsr, int idx, bool usermode)
{
       pt1_entry_t *pte1p, pte1;
       pt2_entry_t *pte2p, pte2;

       if (pmap == NULL)
               pmap = kernel_pmap;

       /*
        * In kernel, we should never get abort with FAR which is in range of
        * pmap->pm_pt1 or PT2MAP address spaces. If it happens, stop here
        * and print out a useful abort message and even get to the debugger
        * otherwise it likely ends with never ending loop of aborts.
        */
       if (__predict_false(IN_RANGE2(far, pmap->pm_pt1, NB_IN_PT1))) {
               /*
                * All L1 tables should always be mapped and present.
                * However, we check only current one herein. For user mode,
                * only permission abort from malicious user is not fatal.
                * And alignment abort as it may have higher priority.
                */
               if (!usermode || (idx != FAULT_ALIGN && idx != FAULT_PERM_L2)) {
                       CTR4(KTR_PMAP, "%s: pmap %#x pm_pt1 %#x far %#x",
                           __func__, pmap, pmap->pm_pt1, far);
                       panic("%s: pm_pt1 abort", __func__);
               }
               return (KERN_INVALID_ADDRESS);
       }
       if (__predict_false(IN_RANGE2(far, PT2MAP, PT2MAP_SIZE))) {
               /*
                * PT2MAP should be always mapped and present in current
                * L1 table. However, only existing L2 tables are mapped
                * in PT2MAP. For user mode, only L2 translation abort and
                * permission abort from malicious user is not fatal.
                * And alignment abort as it may have higher priority.
                */
               if (!usermode || (idx != FAULT_ALIGN &&
                   idx != FAULT_TRAN_L2 && idx != FAULT_PERM_L2)) {
                       CTR4(KTR_PMAP, "%s: pmap %#x PT2MAP %#x far %#x",
                           __func__, pmap, PT2MAP, far);
                       panic("%s: PT2MAP abort", __func__);
               }
               return (KERN_INVALID_ADDRESS);
       }

       /*
        * A pmap lock is used below for handling of access and R/W emulation
        * aborts. They were handled by atomic operations before so some
        * analysis of new situation is needed to answer the following question:
        * Is it safe to use the lock even for these aborts?
        *
        * There may happen two cases in general:
        *
        * (1) Aborts while the pmap lock is locked already - this should not
        * happen as pmap lock is not recursive. However, under pmap lock only
        * internal kernel data should be accessed and such data should be
        * mapped with A bit set and NM bit cleared. If double abort happens,
        * then a mapping of data which has caused it must be fixed. Further,
        * all new mappings are always made with A bit set and the bit can be
        * cleared only on managed mappings.
        *
        * (2) Aborts while another lock(s) is/are locked - this already can
        * happen. However, there is no difference here if it's either access or
        * R/W emulation abort, or if it's some other abort.
        */

       PMAP_LOCK(pmap);

That "PMAP_LOCK(pmap);" line is line 6455 of sys/arm/arm/pmap-v6.c .
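
As an aside on the IN_RANGE2() check used at the top of pmap_fault():
the below is a minimal userland sketch of the same masking test, with
uintptr_t standing in for vm_offset_t and made-up addresses and sizes
(not the real pm_pt1 or PT2MAP values). It shows why the CTASSERTs
insist on a power-of-2 size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel macro, using uintptr_t in userland. */
#define IN_RANGE2(addr, start, size) \
    ((uintptr_t)(start) == ((uintptr_t)(addr) & ~((uintptr_t)(size) - 1)))

static bool
in_range2(uintptr_t addr, uintptr_t start, uintptr_t size)
{
    return (IN_RANGE2(addr, start, size));
}

static void
in_range2_self_check(void)
{
    /* Power-of-2 size, start aligned to it: [0x4000, 0x8000). */
    assert(in_range2(0x4000, 0x4000, 0x4000));
    assert(in_range2(0x7fff, 0x4000, 0x4000));
    assert(!in_range2(0x8000, 0x4000, 0x4000));
    assert(!in_range2(0x3fff, 0x4000, 0x4000));

    /*
     * Size 0x3000 is not a power of 2: 0x3800 lies in
     * [0x3000, 0x6000) but the mask test misses it.
     */
    assert(!in_range2(0x3800, 0x3000, 0x3000));
}
```

Masking off the low bits only identifies the containing block when the
size is a power of 2 and the start is aligned to that size. Otherwise
the test silently gives wrong answers, as the last assert shows.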


FYI: Running the prior kernel.GENERIC-NODEBUG/ ( called
kernel.GENERIC-NODEBUG.good/ ) continues to operate
normally. I do not have the older PkgBase kernel/ around
to try, unfortunately.

As a reminder, this is from using official FreeBSD builds
of the kernel versions that I tested, not from my personal
build context.


===
Mark Millard
marklmi at yahoo.com


Hi Mark,

Please see https://reviews.freebsd.org/D47485

Unfortunately, I see 2 problems with llvm 19.

The first is a regression: the compiler generates an inline memset() that accesses non-aligned data with sub-optimal instructions (word accesses). This regression triggers a bug in the kernel (which should be fixed by D47485).

The second, the "panic: acquiring blockable sleep lock", is due to a bug in lld. It mis-links the ".ARM.exidx" section in the output binary, which is used by the stack unwinder in the kernel. I don't have a fix for this yet, so you have to use the linker from llvm18 as a workaround.

I'm not sure if I have enough free cycles to manage both issues on the llvm side...

Michal
