On Wed, 18 Dec 2024 15:57:02 +0100 Ludwig Rydberg <ludwig.rydb...@gaisler.com> wrote:
> Dear maintainers,
>
> When I try to enable the function tracer using Linux 6.13.0-rc3 on some
> 32-bit systems (tested on qemu-riscv32 and LEON4-sparc32) a BUG message
> about spinlock recursion is printed and the system becomes unresponsive.
>
> Steps to reproduce the issue:
>
> # mount -t tracefs nodev /sys/kernel/tracing
> # echo function > /sys/kernel/tracing/current_tracer
> [   16.204882] BUG: spinlock recursion on CPU#0, sh/117
> [   16.205758]  lock: atomic64_lock+0x0/0x400, .magic: dead4ead, .owner: sh/117, .owner_cpu: 0
> [   16.206564] CPU: 0 UID: 0 PID: 117 Comm: sh Not tainted 6.13.0-rc3 #7
> [   16.206777] Hardware name: riscv-virtio,qemu (DT)
> [   16.206966] Call Trace:
> [   16.207245] dump_backtrace (arch/riscv/kernel/stacktrace.c:131)
> [   16.207392] show_stack (arch/riscv/kernel/stacktrace.c:137)
> [   16.207497] dump_stack_lvl (lib/dump_stack.c:122)
> [   16.207623] dump_stack (lib/dump_stack.c:130)
> [   16.207745] spin_dump (kernel/locking/spinlock_debug.c:71)
> [   16.207859] do_raw_spin_lock (kernel/locking/spinlock_debug.c:78 kernel/locking/spinlock_debug.c:87 kernel/locking/spinlock_debug.c:115)
> [   16.207999] _raw_spin_lock_irqsave (kernel/locking/spinlock.c:163)
> [   16.208139] generic_atomic64_read (lib/atomic64.c:51)

Grumble.

This is due to your architecture using the atomic64 code that takes
spin locks. I'm not bringing back the logic that the commit you
specified removed.

Hmm, we do have recursion protection, but it allows one loop to handle
transitions between normal and interrupt context. If we stop that
transition for archs that use the generic atomic64, I wonder if that
would fix things.

Can you try this patch?
-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7e257e855dd1..c402874a979b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3935,6 +3935,9 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 	bit = RB_CTX_NORMAL - bit;
 
 	if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
+		/* Do not allow any recursion for archs using locks for atomic64 */
+		if (IS_ENABLED(CONFIG_GENERIC_ATOMIC64))
+			return true;
 		/*
 		 * It is possible that this was called by transitioning
 		 * between interrupt context, and preempt_count() has not