On Feb 15, 2023, at 16:08, Mark Millard <mark...@yahoo.com> wrote:

> Kornel Dulęba <kd_at_FreeBSD.org> wrote on
> Date: Sat, 04 Feb 2023 19:22:23 UTC :
>
>> The branch main has been updated by kd:
>>
>> URL:
>> https://cgit.FreeBSD.org/src/commit/?id=6926e2699ae55080f860488895a2a9aa6e6d9b4d
>>
>> commit 6926e2699ae55080f860488895a2a9aa6e6d9b4d
>> Author:     Kornel Dulęba <k...@freebsd.org>
>> AuthorDate: 2023-02-04 12:59:30 +0000
>> Commit:     Kornel Dulęba <k...@freebsd.org>
>> CommitDate: 2023-02-04 19:21:43 +0000
>>
>>     arm: Add support for using VFP in kernel
>>
>>     Add missing logic to allow in-kernel VFP usage for ARMv7 NEON.
>>     The implementation is strongly based on arm64 code.
>>     It introduces a family of fpu_kern_* functions to enable the usage
>>     of VFP instructions in kernel.
>>     Apart from that the existing armv7 VFP logic was modified,
>>     taking into account that the state of the VFP registers can now
>>     be modified in the kernel.
>>
>>     Co-developed by: Wojciech Macek <w...@freebsd.org>
>>     Sponsored by: Stormshield
>>     Obtained from: Semihalf
>>     Reviewed by: andrew
>>     Differential Revision: https://reviews.freebsd.org/D37419
>> ---
>>  lib/libthread_db/arch/arm/libpthread_md.c |  21 ++--
>>  sys/arm/arm/exec_machdep.c                |  49 ++++----
>>  sys/arm/arm/machdep.c                     |   1 +
>>  sys/arm/arm/machdep_kdb.c                 |  31 ++++-
>>  sys/arm/arm/swtch-v6.S                    |   8 +-
>>  sys/arm/arm/swtch.S                       |   8 +-
>>  sys/arm/arm/vfp.c                         | 182 +++++++++++++++++++++++++++++-
>>  sys/arm/arm/vm_machdep.c                  |   6 +-
>>  sys/arm/include/fpu.h                     |   7 ++
>>  sys/arm/include/pcb.h                     |   5 +
>>  sys/arm/include/reg.h                     |  12 +-
>>  sys/arm/include/vfp.h                     |  17 +++
>>  12 files changed, 293 insertions(+), 54 deletions(-)
>
> [This is a somewhat adjusted version of a note replying
> to a Warner note about a panic someone got during a
> process coredump that was happening.]
>
> Just a possible point, given recent kernel floating
> point work:
>
> I tried to do a typical build and test of some
> benchmark programs that I sometimes use that involve
> floating point in some of the programs, some use with
> multithreading involved. (As FreeBSD and g++ progress
> I tend to do this once in a while, not as often on
> armv7 as on aarch64.)
>
> On armv7, I now usually get a message about a failure
> of an internal cross-check, which also leads to the
> program being stopped early. The messaging from run
> to run varies what the failure is, but the runs should
> not vary and should not fail the cross-checks --and
> previously did not, including when I last tried armv7.
> (Not recently.)
>
> For the specific example failures, the initial serial
> (single thread) test with float involved works but the
> following multi-thread test in the same program fails
> and causes the program to stop when it notices there
> is a problem. (On occasion the cross-check does
> not detect a problem.)
>
> The programs that do not test floating point do not
> fail. (Same algorithm on integral types.) These can
> involve floating point outside the algorithm
> benchmarked, but with no multi-threading involved for
> such and no floating point based cross-checks involved.
>
> At this point it is far from obvious to me how I
> would track down the specifics of what leads to the
> failed cross-checks. But the above is suggestive of
> there being problems for armv7 handling of saving
> and restoring floating point context for
> multi-threading in a process, at least. I've no clue
> if such are strictly limited to the floating point
> values that show up vs. if there might be wider
> memory handling problems that result in the process.
>
Further runs of the benchmark program show that I also get
cross-check failures for the single-threaded test (the first
way it tests). But it turns out that, even for single-threaded
execution of the benchmarked algorithm, it is not run on the
process's initial thread but on a created thread.

It also turns out that, on a debug armv7 kernel (debug is not
what I normally run), attempting a bt in gdb can lead to a
kernel panic (the td == curthread assertion failing) related
to floating point handling:

. . .
(gdb) br serial_kernel_runner
Breakpoint 1 at 0x1db34: serial_kernel_runner. (6 locations)
(gdb) br parallel_kernel_runner
Breakpoint 2 at 0x1b43c: parallel_kernel_runner. (6 locations)
(gdb) run
Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++-cpulockdown
. . .
Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...)
    at acpphint_kernelrunners.cpp:69
69      static auto serial_kernel_runner
(gdb) bt
#0  serial_panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
cpuid = 3
time = 1676519530
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
         sp = 0xe28ea960  fp = 0xe28eaa78
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
         sp = 0xe28eaa80  fp = 0xe28eaaa0
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc0790bb4  r7 = 0xc0b1b930
vpanic() at vpanic+0x140
         pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
         sp = 0xe28eaaa8  fp = 0xe28eaaac
         r4 = 0xe28eaad0  r5 = 0xbfbfe150
         r6 = 0xe28eaad0  r7 = 0xc076a096
         r8 = 0xdb8a47f4  r9 = 0x00000016
        r10 = 0x00000040
dump_savectx() at dump_savectx
         pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
         sp = 0xe28eaab4  fp = 0xe28eaac8
get_vfpcontext() at get_vfpcontext+0xb8
         pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
         sp = 0xe28eaad0  fp = 0xe28eabe8
         r4 = 0xdb75cba0  r5 = 0xbfbfe150
cpu_ptrace() at cpu_ptrace+0x38
         pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
         sp = 0xe28eabf0  fp = 0xe28eac70
         r4 = 0xe583dba0  r5 = 0x00000000
         r6 = 0xdb8a47a8 r10 = 0x00000040
kern_ptrace() at kern_ptrace+0x810
         pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
         sp = 0xe28eac78  fp = 0xe28eadc0
         r4 = 0xe583de5c  r5 = 0xe583dba0
         r6 = 0xbfbfe150  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xe583de50
        r10 = 0xdb756730
sys_ptrace() at sys_ptrace+0x1cc
         pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
         sp = 0xe28eadc8  fp = 0xe28eae38
         r4 = 0xe583dba0  r5 = 0x00000001
         r6 = 0xc090b220  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xe583de50
swi_handler() at swi_handler+0x170
         pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe28eae40  fp = 0xbfbfe128
         r4 = 0x00000042  r5 = 0x22e61c20
         r6 = 0xbfbfe150  r7 = 0x0000001a
         r8 = 0x00424124  r9 = 0x00000108
        r10 = 0x00000040
swi_exit() at swi_exit
         pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe28eae40  fp = 0xbfbfe128
KDB: enter: panic
[ thread pid 5438 tid 106943 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!

Note: the code was built via g++12 but using libc++, not
libstdc++.

So I tried the program variant that does not try to lock down
which CPUs are used by the threads (a purely standard C++20
variant whose source code is not FreeBSD specific). Failure
again . . .

(gdb) br serial_kernel_runner
Breakpoint 1 at 0x1c1bc: serial_kernel_runner. (6 locations)
(gdb) br parallel_kernel_runner
Breakpoint 2 at 0x19ac8: parallel_kernel_runner. (6 locations)
(gdb) run
Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++
. . .
Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...)
    at acpphint_kernelrunners.cpp:69
69      static auto serial_kernel_runner
(gdb) bt
#0  serial_kernel_runner<float, unsigned short> (clock_info=...,panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103
cpuid = 0
time = 1676520400
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05f04a0  lr = 0xc007ab0c (db_trace_self_wrapper+0x30)
         sp = 0xe2964960  fp = 0xe2964a78
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007ab0c  lr = 0xc02ddc44 (vpanic+0x140)
         sp = 0xe2964a80  fp = 0xe2964aa0
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc0790bb4  r7 = 0xc0b1b930
vpanic() at vpanic+0x140
         pc = 0xc02ddc44  lr = 0xc02dda28 (dump_savectx)
         sp = 0xe2964aa8  fp = 0xe2964aac
         r4 = 0xe2964ad0  r5 = 0xbfbfe158
         r6 = 0xe2964ad0  r7 = 0xc076a096
         r8 = 0xdb7a511c  r9 = 0x00000016
        r10 = 0x00000040
dump_savectx() at dump_savectx
         pc = 0xc02dda28  lr = 0xc05f3354 (get_vfpcontext+0xb8)
         sp = 0xe2964ab4  fp = 0xe2964ac8
get_vfpcontext() at get_vfpcontext+0xb8
         pc = 0xc05f3354  lr = 0xc0611148 (cpu_ptrace+0x38)
         sp = 0xe2964ad0  fp = 0xe2964be8
         r4 = 0xdb7ca3e0  r5 = 0xbfbfe158
cpu_ptrace() at cpu_ptrace+0x38
         pc = 0xc0611148  lr = 0xc0360f4c (kern_ptrace+0x810)
         sp = 0xe2964bf0  fp = 0xe2964c70
         r4 = 0xdb76fba0  r5 = 0x00000000
         r6 = 0xdb7a50d0 r10 = 0x00000040
kern_ptrace() at kern_ptrace+0x810
         pc = 0xc0360f4c  lr = 0xc0360550 (sys_ptrace+0x1cc)
         sp = 0xe2964c78  fp = 0xe2964dc0
         r4 = 0xdb76fe5c  r5 = 0xdb76fba0
         r6 = 0xbfbfe158  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xdb76fe50
        r10 = 0xdb754000
sys_ptrace() at sys_ptrace+0x1cc
         pc = 0xc0360550  lr = 0xc0613b48 (swi_handler+0x170)
         sp = 0xe2964dc8  fp = 0xe2964e38
         r4 = 0xdb76fba0  r5 = 0x00000001
         r6 = 0xc090b220  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xdb76fe50
swi_handler() at swi_handler+0x170
         pc = 0xc0613b48  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe2964e40  fp = 0xbfbfe130
         r4 = 0x00000042  r5 = 0x22e61c20
         r6 = 0xbfbfe158  r7 = 0x0000001a
         r8 = 0x00424124  r9 = 0x00000108
        r10 = 0x00000040
swi_exit() at swi_exit
         pc = 0xc05f2d90  lr = 0xc05f2d90 (swi_exit)
         sp = 0xe2964e40  fp = 0xbfbfe130
KDB: enter: panic
[ thread pid 1107 tid 100140 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!

For reference (whitespace may not have been preserved):

void
get_vfpcontext(struct thread *td, mcontext_vfp_t *vfp)
{
        struct pcb *pcb;

        MPASS(td == curthread);

        pcb = td->td_pcb;
        if ((pcb->pcb_fpflags & PCB_FP_STARTED) != 0) {
                critical_enter();
                vfp_store(&pcb->pcb_vfpstate, false);
                critical_exit();
        }
        KASSERT(pcb->pcb_vfpsaved == &pcb->pcb_vfpstate,
            ("Called get_vfpcontext while the kernel is using the VFP"));
        memcpy(vfp->mcv_reg, pcb->pcb_vfpstate.reg, sizeof(vfp->mcv_reg));
        vfp->mcv_fpscr = pcb->pcb_vfpstate.fpscr;
}

Unfortunately the benchmark program is far from being a
minimalist/simple example. I'm not sure what FreeBSD might
have around that would have floating point in use but be
simple, and possibly standardly available, to see if a
simpler context is available for analogous testing.

===
Mark Millard
marklmi at yahoo.com
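
A possible starting point for the simpler test asked about above
is sketched here in C. It is only a sketch under stated
assumptions: the names fp_job, fp_thread_main and fp_crosscheck
are hypothetical (they come from neither the benchmark nor
FreeBSD), and whether it reproduces the armv7 failures is
untested. It runs a deterministic float loop once on the calling
thread and again on created threads, then compares the results
bit for bit, which is roughly the shape of the cross-check
described earlier.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical names throughout; illustration only. */
struct fp_job {
    unsigned iterations;   /* input: loop length */
    float    result;       /* output: final accumulator value */
};

/*
 * Thread entry point. The same function body is used for the
 * reference run and for every created thread, so a correct
 * system should produce identical result bits each time.
 */
static void *fp_thread_main(void *arg)
{
    struct fp_job *job = arg;
    float acc = 1.0f;

    for (unsigned i = 0; i < job->iterations; i++)
        acc = acc * 1.0000001f + 0.5f / (float)(i + 1);
    job->result = acc;
    return NULL;
}

/*
 * Returns the number of created threads whose result differs
 * from the reference, or -1 on a bad argument or if a thread
 * could not be created.
 */
int fp_crosscheck(unsigned nthreads, unsigned iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct fp_job jobs[MAX_THREADS];
    struct fp_job ref = { iterations, 0.0f };
    unsigned started = 0;
    int mismatches = 0;

    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1;

    /* Reference value, computed directly on the calling thread. */
    fp_thread_main(&ref);

    for (; started < nthreads; started++) {
        jobs[started].iterations = iterations;
        jobs[started].result = 0.0f;
        if (pthread_create(&tids[started], NULL, fp_thread_main,
            &jobs[started]) != 0)
            break;
    }

    /* Join everything that started; compare raw bits, not values. */
    for (unsigned i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (memcmp(&jobs[i].result, &ref.result, sizeof(float)) != 0)
            mismatches++;
    }
    return (started == nthreads) ? mismatches : -1;
}
```

Calling fp_crosscheck(4, 1000000) in a loop from main() and
stopping at the first nonzero return would approximate the
benchmark's behavior; building with -pthread is assumed. A
return of 0 means every created thread matched the reference.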