Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On 16.11.22 11:26, David Hildenbrand wrote:
> FOLL_FORCE is really only for ptrace access. According to commit
> 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
> writable"), get_vaddr_frames() currently pins all pages writable as a
> workaround for issues with read-only buffers.
>
> FOLL_FORCE, however, seems to be a legacy leftover as it predates
> commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
> always writable"). Let's just remove it.
>
> Once the read-only buffer issue has been resolved, FOLL_WRITE could
> again be set depending on the DMA direction.
>
> Cc: Hans Verkuil
> Cc: Marek Szyprowski
> Cc: Tomasz Figa
> Cc: Marek Szyprowski
> Cc: Mauro Carvalho Chehab
> Signed-off-by: David Hildenbrand
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 542dde9d2609..062e98148c53 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  	start = untagged_addr(start);
>
>  	ret = pin_user_pages_fast(start, nr_frames,
> -				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +				  FOLL_WRITE | FOLL_LONGTERM,
>  				  (struct page **)(vec->ptrs));
>  	if (ret > 0) {
>  		vec->got_ref = true;

Hi Andrew,

see the discussion at [1] regarding a conflict and how to proceed with
upstreaming. The conflict would be easy to resolve, however, also
the patch description doesn't make sense anymore with [1].

On top of mm-unstable, reverting this patch and applying [1] gives me
an updated patch:

From 1e66c25f1467c1f1e5f275312f2c6df29308d4df Mon Sep 17 00:00:00 2001
From: David Hildenbrand
Date: Wed, 16 Nov 2022 11:26:55 +0100
Subject: [PATCH] mm/frame-vector: remove FOLL_FORCE usage

GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).

Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.

Reviewed-by: Daniel Vetter
Acked-by: Hans Verkuil
Cc: Hans Verkuil
Cc: Marek Szyprowski
Cc: Tomasz Figa
Cc: Marek Szyprowski
Cc: Mauro Carvalho Chehab
Signed-off-by: David Hildenbrand
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index aad72640f055..8606fdacf5b8 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -41,7 +41,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, bool write,
 	int ret_pin_user_pages_fast = 0;
 	int ret = 0;
 	int err;
-	unsigned int gup_flags = FOLL_FORCE | FOLL_LONGTERM;
+	unsigned int gup_flags = FOLL_LONGTERM;

 	if (nr_frames == 0)
 		return 0;
-- 
2.38.1

Please let me know how you want to proceed. Ideally, you'd pick up
[1] and apply this updated patch. Also, please tell me if I should
send this updated patch in a separate mail (e.g., as reply to this mail).

[1] https://lkml.kernel.org/r/71bdd3cf-b044-3f12-df58-7c16d5749...@xs4all.nl

-- 
Thanks,

David / dhildenb
Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu
Zhouyi,

On Sun, Nov 27 2022 at 10:45, Zhouyi Zhou wrote:
> On Sun, Nov 27, 2022 at 1:05 AM Thomas Gleixner wrote:
>
> So, I should construct my patch as:
> We avoid ... by ...

Not "We avoid". Avoid this behaviour by

>> No. We are not exporting this just to make a bogus test case happy.
>>
>> Fix the torture code to handle -EBUSY correctly.
> I am going to do a study on this, for now, I do a grep in the kernel tree:
> find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> The result of the grep command shows that there are 268
> cpuhp_setup_state* cases.
> which may make our task more complicated.

Why?

The whole point of this torture thing is to stress the
infrastructure.

There are quite some reasons why a CPU-hotplug or a hot-unplug operation
can fail, which is not a fatal problem, really.

So if a CPU hotplug operation fails, then why can't the torture test just
move on and validate that the system still behaves correctly?

That gives us more coverage than just testing the good case and giving
up when something unexpected happens.

I even argue that the torture test should inject random failures into
the hotplug state machine to achieve extended code coverage.

Thanks,

        tglx
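The "move on after -EBUSY" idea from the reply above can be sketched in plain userspace C. This is only an illustration of the control flow, not the torture code itself: every name here (hotplug_stats, torture_offline_pass, fake_offline) is hypothetical, and fake_offline stands in for whatever call actually takes a CPU offline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Counters a torture loop could keep; all names here are hypothetical. */
struct hotplug_stats {
	unsigned long attempts;
	unsigned long ok;
	unsigned long busy;       /* -EBUSY: tolerated, the test continues */
	unsigned long unexpected; /* any other error code */
};

/* Stand-in for "take this CPU offline"; returns 0 or a negative errno. */
typedef int (*offline_fn)(unsigned int cpu);

/* Classify one result instead of giving up on the first failure. */
static void account_offline_result(struct hotplug_stats *st, int ret)
{
	assert(st != NULL);
	st->attempts++;
	if (ret == 0)
		st->ok++;
	else if (ret == -EBUSY)
		st->busy++;
	else
		st->unexpected++;
}

/* One pass over all CPUs; a refusal is recorded and the loop moves on. */
static void torture_offline_pass(struct hotplug_stats *st,
				 offline_fn try_offline,
				 unsigned int nr_cpus)
{
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		account_offline_result(st, try_offline(cpu));
}

/* Fake backend: CPU 0 refuses with -EBUSY, every other CPU succeeds. */
static int fake_offline(unsigned int cpu)
{
	return cpu == 0 ? -EBUSY : 0;
}
```

Keeping a separate counter for -EBUSY means a refusal shows up in the statistics without being treated as a test failure, which is the behaviour the reply asks for.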
[PATCH 00/17] powerpc: Remove STACK_FRAME_OVERHEAD
Since RFC:
- Fix a compile bug.
- Fix BookE KVM properly. Hopefully -- I don't have a BookE KVM
  environment to test. Can QEMU do it? Is it still tested?
- Drop the last two patches that changed the stack layout, they can be
  done later.
- Drop the load/store-multiple change to 32-bit.

Thanks,
Nick

Nicholas Piggin (17):
  KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support
  powerpc/64: Remove asm interrupt tracing call helpers
  powerpc/perf: callchain validate kernel stack pointer bounds
  powerpc: Rearrange copy_thread child stack creation
  powerpc/pseries: hvcall stack frame overhead
  powerpc: simplify ppc_save_regs
  powerpc: add definition for pt_regs offset within an interrupt frame
  powerpc: add a definition for the marker offset within the interrupt
    frame
  powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
  powerpc: add a define for the user interrupt frame size
  powerpc: add a define for the switch frame size and regs offset
  powerpc: copy_thread fill in interrupt frame marker and back chain
  powerpc: copy_thread add a back chain to the switch stack frame
  powerpc: split validate_sp into two functions
  powerpc: allow minimum sized kernel stack frames
  powerpc/64: ELFv2 use minimal stack frames in int and switch frame
    sizes
  powerpc: remove STACK_FRAME_OVERHEAD

 arch/powerpc/include/asm/irqflags.h           | 58 -
 arch/powerpc/include/asm/kvm_ppc.h            | 12 +++
 arch/powerpc/include/asm/processor.h          | 15 +++-
 arch/powerpc/include/asm/ptrace.h             | 37 ++---
 arch/powerpc/kernel/asm-offsets.c             |  9 +-
 arch/powerpc/kernel/entry_32.S                | 14 ++--
 arch/powerpc/kernel/exceptions-64e.S          | 44 +-
 arch/powerpc/kernel/exceptions-64s.S          | 82 +--
 arch/powerpc/kernel/head_32.h                 |  4 +-
 arch/powerpc/kernel/head_40x.S                |  2 +-
 arch/powerpc/kernel/head_44x.S                |  6 +-
 arch/powerpc/kernel/head_64.S                 |  6 +-
 arch/powerpc/kernel/head_85xx.S               |  8 +-
 arch/powerpc/kernel/head_8xx.S                |  2 +-
 arch/powerpc/kernel/head_book3s_32.S          |  4 +-
 arch/powerpc/kernel/head_booke.h              |  4 +-
 arch/powerpc/kernel/interrupt_64.S            | 32
 arch/powerpc/kernel/irq.c                     |  4 +-
 arch/powerpc/kernel/kgdb.c                    |  2 +-
 arch/powerpc/kernel/misc_32.S                 |  2 +-
 arch/powerpc/kernel/misc_64.S                 |  4 +-
 arch/powerpc/kernel/optprobes_head.S          |  4 +-
 arch/powerpc/kernel/ppc_save_regs.S           | 57 -
 arch/powerpc/kernel/process.c                 | 54 +++-
 arch/powerpc/kernel/smp.c                     |  2 +-
 arch/powerpc/kernel/stacktrace.c              | 10 +--
 arch/powerpc/kernel/tm.S                      |  8 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S   |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |  2 +-
 arch/powerpc/kvm/booke.c                      |  3 +
 arch/powerpc/kvm/bookehv_interrupts.S         |  9 --
 .../lib/test_emulate_step_exec_instr.S        |  2 +-
 arch/powerpc/perf/callchain.c                 |  9 +-
 arch/powerpc/platforms/pseries/hvCall.S       | 38 +
 arch/powerpc/xmon/xmon.c                      | 10 +--
 35 files changed, 259 insertions(+), 302 deletions(-)

-- 
2.37.2
[PATCH 01/17] KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support
32-bit does not trace_irqs_off() to match the trace_irqs_on() call in
kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice
in the trace, and the irqs-off region between guest exit and the host
enabling local irqs again is not properly traced.

64-bit code does call this, but from asm code where volatiles are live
and so incorrectly get clobbered.

Move the irq reconcile into C to fix both problems.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/kvm_ppc.h    | 12
 arch/powerpc/kvm/booke.c              |  3 +++
 arch/powerpc/kvm/bookehv_interrupts.S |  9 -
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bfacf12784dd..eae9619b6190 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -1014,6 +1014,18 @@ static inline void kvmppc_fix_ee_before_entry(void)
 #endif
 }

+static inline void kvmppc_fix_ee_after_exit(void)
+{
+#ifdef CONFIG_PPC64
+	/* Only need to enable IRQs by hard enabling them after this */
+	local_paca->irq_happened = PACA_IRQ_HARD_DIS;
+	irq_soft_mask_set(IRQS_ALL_DISABLED);
+#endif
+
+	trace_hardirqs_off();
+}
+
+
 static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu *vcpu, int ra, int rb)
 {
 	ulong ea;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7b4920e9fd26..0dce93ccaadf 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1015,6 +1015,9 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	u32 last_inst = KVM_INST_FETCH_FAILED;
 	enum emulation_result emulated = EMULATE_DONE;

+	/* Fix irq state (pairs with kvmppc_fix_ee_before_entry()) */
+	kvmppc_fix_ee_after_exit();
+
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 8262c14fc9e6..b5fe6fb53c66 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -424,15 +424,6 @@ _GLOBAL(kvmppc_resume_host)
 	mtspr	SPRN_EPCR, r3
 	isync

-#ifdef CONFIG_64BIT
-	/*
-	 * We enter with interrupts disabled in hardware, but
-	 * we need to call RECONCILE_IRQ_STATE to ensure
-	 * that the software state is kept in sync.
-	 */
-	RECONCILE_IRQ_STATE(r3,r5)
-#endif
-
 	/* Switch to kernel stack and jump to handler. */
 	mr	r3, r4
 	mr	r5, r14 /* intno */
-- 
2.37.2
[PATCH 02/17] powerpc/64: Remove asm interrupt tracing call helpers
These are now unused. Remove.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/include/asm/irqflags.h | 58 -
 1 file changed, 58 deletions(-)

diff --git a/arch/powerpc/include/asm/irqflags.h b/arch/powerpc/include/asm/irqflags.h
index 1a6c1ce17735..47d46712928a 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -11,64 +11,6 @@
  */
 #include

-#else
-#ifdef CONFIG_TRACE_IRQFLAGS
-#ifdef CONFIG_IRQSOFF_TRACER
-/*
- * Since the ftrace irqsoff latency trace checks CALLER_ADDR1,
- * which is the stack frame here, we need to force a stack frame
- * in case we came from user space.
- */
-#define TRACE_WITH_FRAME_BUFFER(func)		\
-	mflr	r0;				\
-	stdu	r1, -STACK_FRAME_OVERHEAD(r1);	\
-	std	r0, 16(r1);			\
-	stdu	r1, -STACK_FRAME_OVERHEAD(r1);	\
-	bl func;				\
-	ld	r1, 0(r1);			\
-	ld	r1, 0(r1);
-#else
-#define TRACE_WITH_FRAME_BUFFER(func)		\
-	bl func;
-#endif
-
-/*
- * These are calls to C code, so the caller must be prepared for volatiles to
- * be clobbered.
- */
-#define TRACE_ENABLE_INTS	TRACE_WITH_FRAME_BUFFER(trace_hardirqs_on)
-#define TRACE_DISABLE_INTS	TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off)
-
-/*
- * This is used by assembly code to soft-disable interrupts first and
- * reconcile irq state.
- *
- * NB: This may call C code, so the caller must be prepared for volatiles to
- * be clobbered.
- */
-#define RECONCILE_IRQ_STATE(__rA, __rB)		\
-	lbz	__rA,PACAIRQSOFTMASK(r13);	\
-	lbz	__rB,PACAIRQHAPPENED(r13);	\
-	andi.	__rA,__rA,IRQS_DISABLED;	\
-	li	__rA,IRQS_DISABLED;		\
-	ori	__rB,__rB,PACA_IRQ_HARD_DIS;	\
-	stb	__rB,PACAIRQHAPPENED(r13);	\
-	bne	44f;				\
-	stb	__rA,PACAIRQSOFTMASK(r13);	\
-	TRACE_DISABLE_INTS;			\
-44:
-
-#else
-#define TRACE_ENABLE_INTS
-#define TRACE_DISABLE_INTS
-
-#define RECONCILE_IRQ_STATE(__rA, __rB)		\
-	lbz	__rA,PACAIRQHAPPENED(r13);	\
-	li	__rB,IRQS_DISABLED;		\
-	ori	__rA,__rA,PACA_IRQ_HARD_DIS;	\
-	stb	__rB,PACAIRQSOFTMASK(r13);	\
-	stb	__rA,PACAIRQHAPPENED(r13)
-#endif
 #endif

 #endif
-- 
2.37.2
[PATCH 03/17] powerpc/perf: callchain validate kernel stack pointer bounds
The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.

The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. Possibly this could cause a crash at least. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.

Signed-off-by: Nicholas Piggin
---
Not sure if my attack scenario is actually valid, but I think there is
some concern here...

Thanks,
Nick

 arch/powerpc/perf/callchain.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 082f6d0308a4..8718289c051d 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -61,6 +61,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 		next_sp = fp[0];

 		if (next_sp == sp + STACK_INT_FRAME_SIZE &&
+		    validate_sp(sp, current, STACK_INT_FRAME_SIZE) &&
 		    fp[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
 			/*
 			 * This looks like an interrupt frame for an
-- 
2.37.2
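To make the bounds argument above concrete, here is a small userspace sketch of a size-aware stack pointer check. It is not the kernel's validate_sp(); demo_validate_sp and both DEMO_* sizes are assumptions picked for illustration. It shows how a pointer can pass a check sized for a minimal frame and still fail once the whole interrupt frame has to fit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, for illustration only. */
#define DEMO_MIN_FRAME_SIZE 112UL /* small frame the existing check covers */
#define DEMO_INT_FRAME_SIZE 512UL /* frame that also holds the saved regs */

/*
 * Return true if [sp, sp + nbytes) lies inside the stack
 * [stack_base, stack_base + stack_size). The comparison is written
 * as a subtraction so that sp + nbytes cannot wrap around.
 */
static bool demo_validate_sp(unsigned long sp, unsigned long stack_base,
			     unsigned long stack_size, unsigned long nbytes)
{
	unsigned long stack_end = stack_base + stack_size;

	assert(stack_end >= stack_base);
	if (sp < stack_base || sp > stack_end)
		return false;
	return nbytes <= stack_end - sp;
}
```

With sp placed 256 bytes below the end of the stack, the check passes for DEMO_MIN_FRAME_SIZE and fails for DEMO_INT_FRAME_SIZE, which is the gap the added validate_sp() call in the patch is meant to close before the marker and the regs are read.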
[PATCH 04/17] powerpc: Rearrange copy_thread child stack creation
This makes it a bit clearer where the stack frame is created, and will
allow easier use of some of the stack offset constants in a later
change.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/process.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 67da147fe34d..acfa197fb2df 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1726,13 +1726,16 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)

 	klp_init_thread_info(p);

+	/* Create initial stack frame. */
+	sp -= (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD);
+	((unsigned long *)sp)[0] = 0;
+
 	/* Copy registers */
-	sp -= sizeof(struct pt_regs);
-	childregs = (struct pt_regs *) sp;
+	childregs = (struct pt_regs *)(sp + STACK_FRAME_OVERHEAD);
 	if (unlikely(args->fn)) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
-		childregs->gpr[1] = sp + sizeof(struct pt_regs);
+		childregs->gpr[1] = sp + (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD);
 		/* function */
 		if (args->fn)
 			childregs->gpr[14] = ppc_function_entry((void *)args->fn);
@@ -1767,7 +1770,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		f = ret_from_fork;
 	}
 	childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
-	sp -= STACK_FRAME_OVERHEAD;

 	/*
 	 * The way this works is that at some point in the future
@@ -1777,7 +1779,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	 * do some house keeping and then return from the fork or clone
 	 * system call, using the stack frame created above.
 	 */
-	((unsigned long *)sp)[0] = 0;
 	sp -= sizeof(struct pt_regs);
 	kregs = (struct pt_regs *) sp;
 	sp -= STACK_FRAME_OVERHEAD;
-- 
2.37.2
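The rearrangement above is meant to leave the resulting layout unchanged. A quick way to convince oneself is to redo the pointer arithmetic in userspace. This is a sketch under assumed sizes: DEMO_PT_REGS_SIZE is made up, and the demo_* helpers only mirror the arithmetic of the old and new sequences, not the rest of copy_thread().

```c
#include <assert.h>

#define DEMO_PT_REGS_SIZE   384UL /* hypothetical sizeof(struct pt_regs) */
#define DEMO_FRAME_OVERHEAD 112UL /* the 64-bit value quoted in this series */

struct demo_layout {
	unsigned long frame_sp; /* base of the initial stack frame */
	unsigned long regs;     /* address of the child's saved registers */
	unsigned long gpr1;     /* value stored in childregs->gpr[1] */
};

/* Old sequence: carve out pt_regs first, the frame overhead later. */
static struct demo_layout demo_old_layout(unsigned long sp)
{
	struct demo_layout l;

	assert(sp >= DEMO_PT_REGS_SIZE + DEMO_FRAME_OVERHEAD);
	sp -= DEMO_PT_REGS_SIZE;
	l.regs = sp;
	l.gpr1 = sp + DEMO_PT_REGS_SIZE;
	sp -= DEMO_FRAME_OVERHEAD;
	l.frame_sp = sp;
	return l;
}

/* New sequence: create the whole frame once, then index into it. */
static struct demo_layout demo_new_layout(unsigned long sp)
{
	struct demo_layout l;

	assert(sp >= DEMO_PT_REGS_SIZE + DEMO_FRAME_OVERHEAD);
	sp -= DEMO_PT_REGS_SIZE + DEMO_FRAME_OVERHEAD;
	l.frame_sp = sp;
	l.regs = sp + DEMO_FRAME_OVERHEAD;
	l.gpr1 = sp + DEMO_PT_REGS_SIZE + DEMO_FRAME_OVERHEAD;
	return l;
}
```

Both helpers return the same three addresses for any starting sp, and gpr1 equals the original top of stack in each case, which matches the claim that only the place where the frame is created moves.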
[PATCH 05/17] powerpc/pseries: hvcall stack frame overhead
This call may use the min size stack frame. The scratch space used is in the caller's parameter area frame, not this function's frame. Signed-off-by: Nicholas Piggin --- arch/powerpc/platforms/pseries/hvCall.S | 38 + 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/platforms/pseries/hvCall.S b/arch/powerpc/platforms/pseries/hvCall.S index 762eb15d3bd4..783c16ad648b 100644 --- a/arch/powerpc/platforms/pseries/hvCall.S +++ b/arch/powerpc/platforms/pseries/hvCall.S @@ -27,7 +27,9 @@ hcall_tracepoint_refcount: /* * precall must preserve all registers. use unused STK_PARAM() - * areas to save snapshots and opcode. + * areas to save snapshots and opcode. STK_PARAM() in the caller's + * frame will be available even on ELFv2 because these are all + * variadic functions. */ #define HCALL_INST_PRECALL(FIRST_REG) \ mflrr0; \ @@ -41,29 +43,29 @@ hcall_tracepoint_refcount: std r10,STK_PARAM(R10)(r1); \ std r0,16(r1); \ addir4,r1,STK_PARAM(FIRST_REG); \ - stdur1,-STACK_FRAME_OVERHEAD(r1); \ + stdur1,-STACK_FRAME_MIN_SIZE(r1); \ bl __trace_hcall_entry;\ - ld r3,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1); \ - ld r4,STACK_FRAME_OVERHEAD+STK_PARAM(R4)(r1); \ - ld r5,STACK_FRAME_OVERHEAD+STK_PARAM(R5)(r1); \ - ld r6,STACK_FRAME_OVERHEAD+STK_PARAM(R6)(r1); \ - ld r7,STACK_FRAME_OVERHEAD+STK_PARAM(R7)(r1); \ - ld r8,STACK_FRAME_OVERHEAD+STK_PARAM(R8)(r1); \ - ld r9,STACK_FRAME_OVERHEAD+STK_PARAM(R9)(r1); \ - ld r10,STACK_FRAME_OVERHEAD+STK_PARAM(R10)(r1) + ld r3,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1); \ + ld r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1); \ + ld r5,STACK_FRAME_MIN_SIZE+STK_PARAM(R5)(r1); \ + ld r6,STACK_FRAME_MIN_SIZE+STK_PARAM(R6)(r1); \ + ld r7,STACK_FRAME_MIN_SIZE+STK_PARAM(R7)(r1); \ + ld r8,STACK_FRAME_MIN_SIZE+STK_PARAM(R8)(r1); \ + ld r9,STACK_FRAME_MIN_SIZE+STK_PARAM(R9)(r1); \ + ld r10,STACK_FRAME_MIN_SIZE+STK_PARAM(R10)(r1) /* * postcall is performed immediately before function return which * allows liberal use of volatile registers. 
*/ #define __HCALL_INST_POSTCALL \ - ld r0,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1); \ - std r3,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1); \ + ld r0,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1); \ + std r3,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1); \ mr r4,r3; \ mr r3,r0; \ bl __trace_hcall_exit; \ - ld r0,STACK_FRAME_OVERHEAD+16(r1); \ - addir1,r1,STACK_FRAME_OVERHEAD; \ + ld r0,STACK_FRAME_MIN_SIZE+16(r1); \ + addir1,r1,STACK_FRAME_MIN_SIZE; \ ld r3,STK_PARAM(R3)(r1); \ mtlrr0 @@ -303,14 +305,14 @@ plpar_hcall9_trace: mr r7,r8 mr r8,r9 mr r9,r10 - ld r10,STACK_FRAME_OVERHEAD+STK_PARAM(R11)(r1) - ld r11,STACK_FRAME_OVERHEAD+STK_PARAM(R12)(r1) - ld r12,STACK_FRAME_OVERHEAD+STK_PARAM(R13)(r1) + ld r10,STACK_FRAME_MIN_SIZE+STK_PARAM(R11)(r1) + ld r11,STACK_FRAME_MIN_SIZE+STK_PARAM(R12)(r1) + ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R13)(r1) HVSC mr r0,r12 - ld r12,STACK_FRAME_OVERHEAD+STK_PARAM(R4)(r1) + ld r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1) std r4,0(r12) std r5,8(r12) std r6,16(r12) -- 2.37.2
[PATCH 06/17] powerpc: simplify ppc_save_regs
Adjust the pt_regs pointer so the interrupt frame offsets can be used to save registers. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/ppc_save_regs.S | 57 - 1 file changed, 15 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/kernel/ppc_save_regs.S b/arch/powerpc/kernel/ppc_save_regs.S index 2d4d21bb46a9..6e86f3bf4673 100644 --- a/arch/powerpc/kernel/ppc_save_regs.S +++ b/arch/powerpc/kernel/ppc_save_regs.S @@ -21,60 +21,33 @@ * different ABIs, though). */ _GLOBAL(ppc_save_regs) - PPC_STL r0,0*SZL(r3) + /* This allows stack frame accessor macros and offsets to be used */ + subir3,r3,STACK_FRAME_OVERHEAD + PPC_STL r0,GPR0(r3) #ifdef CONFIG_PPC32 - stmwr2, 2*SZL(r3) + stmwr2,GPR2(r3) #else - PPC_STL r2,2*SZL(r3) - PPC_STL r3,3*SZL(r3) - PPC_STL r4,4*SZL(r3) - PPC_STL r5,5*SZL(r3) - PPC_STL r6,6*SZL(r3) - PPC_STL r7,7*SZL(r3) - PPC_STL r8,8*SZL(r3) - PPC_STL r9,9*SZL(r3) - PPC_STL r10,10*SZL(r3) - PPC_STL r11,11*SZL(r3) - PPC_STL r12,12*SZL(r3) - PPC_STL r13,13*SZL(r3) - PPC_STL r14,14*SZL(r3) - PPC_STL r15,15*SZL(r3) - PPC_STL r16,16*SZL(r3) - PPC_STL r17,17*SZL(r3) - PPC_STL r18,18*SZL(r3) - PPC_STL r19,19*SZL(r3) - PPC_STL r20,20*SZL(r3) - PPC_STL r21,21*SZL(r3) - PPC_STL r22,22*SZL(r3) - PPC_STL r23,23*SZL(r3) - PPC_STL r24,24*SZL(r3) - PPC_STL r25,25*SZL(r3) - PPC_STL r26,26*SZL(r3) - PPC_STL r27,27*SZL(r3) - PPC_STL r28,28*SZL(r3) - PPC_STL r29,29*SZL(r3) - PPC_STL r30,30*SZL(r3) - PPC_STL r31,31*SZL(r3) + SAVE_GPRS(2, 31, r3) lbz r0,PACAIRQSOFTMASK(r13) - PPC_STL r0,SOFTE-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,SOFTE(r3) #endif /* go up one stack frame for SP */ PPC_LL r4,0(r1) - PPC_STL r4,1*SZL(r3) + PPC_STL r4,GPR1(r3) /* get caller's LR */ PPC_LL r0,LRSAVE(r4) - PPC_STL r0,_LINK-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_LINK(r3) mflrr0 - PPC_STL r0,_NIP-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_NIP(r3) mfmsr r0 - PPC_STL r0,_MSR-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_MSR(r3) mfctr r0 - PPC_STL r0,_CTR-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_CTR(r3) 
mfxer r0 - PPC_STL r0,_XER-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_XER(r3) mfcrr0 - PPC_STL r0,_CCR-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_CCR(r3) li r0,0 - PPC_STL r0,_TRAP-STACK_FRAME_OVERHEAD(r3) - PPC_STL r0,ORIG_GPR3-STACK_FRAME_OVERHEAD(r3) + PPC_STL r0,_TRAP(r3) + PPC_STL r0,ORIG_GPR3(r3) blr -- 2.37.2
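The pointer adjustment in this patch works because the frame accessor offsets are defined as the frame overhead plus the field's offset inside pt_regs. A userspace sketch of that relationship follows. The struct, the DEMO_* macros and demo_field() are all hypothetical stand-ins; the real offsets come from asm-offsets.c.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FRAME_OVERHEAD 112 /* the 64-bit value quoted in this series */

/* Cut-down stand-in for struct pt_regs. */
struct demo_pt_regs {
	unsigned long gpr[32];
	unsigned long nip;
	unsigned long msr;
};

/* Stand-in for an interrupt frame: overhead area, then the registers. */
struct demo_frame {
	char overhead[DEMO_FRAME_OVERHEAD];
	struct demo_pt_regs regs;
};

/* Frame-relative offsets, in the style of STACK_PT_REGS_OFFSET(). */
#define DEMO_NIP (DEMO_FRAME_OVERHEAD + offsetof(struct demo_pt_regs, nip))
#define DEMO_MSR (DEMO_FRAME_OVERHEAD + offsetof(struct demo_pt_regs, msr))

/*
 * Given a pointer to the regs embedded in a demo_frame, step back by
 * the overhead (the C analogue of "subi r3,r3,STACK_FRAME_OVERHEAD")
 * and then apply a frame-relative offset.
 */
static unsigned long *demo_field(struct demo_pt_regs *regs, size_t frame_off)
{
	char *frame = (char *)regs - DEMO_FRAME_OVERHEAD;

	assert(offsetof(struct demo_frame, regs) == DEMO_FRAME_OVERHEAD);
	return (unsigned long *)(frame + frame_off);
}
```

After the adjustment the frame-relative names can be used directly, which is why the patch can drop the "-STACK_FRAME_OVERHEAD" from every store.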
[PATCH 07/17] powerpc: add definition for pt_regs offset within an interrupt frame
This is a common offset that currently uses the overloaded STACK_FRAME_OVERHEAD constant. It's easier to read and more flexible to use a specific regs offset for this. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 2 + arch/powerpc/kernel/asm-offsets.c | 7 +- arch/powerpc/kernel/entry_32.S| 6 +- arch/powerpc/kernel/exceptions-64e.S | 42 +- arch/powerpc/kernel/exceptions-64s.S | 80 +-- arch/powerpc/kernel/head_32.h | 2 +- arch/powerpc/kernel/head_85xx.S | 4 +- arch/powerpc/kernel/head_booke.h | 2 +- arch/powerpc/kernel/interrupt_64.S| 22 ++--- arch/powerpc/kernel/kgdb.c| 2 +- arch/powerpc/kernel/optprobes_head.S | 4 +- arch/powerpc/kernel/ppc_save_regs.S | 2 +- arch/powerpc/kernel/process.c | 4 +- arch/powerpc/kernel/tm.S | 8 +- arch/powerpc/kernel/trace/ftrace_mprofile.S | 2 +- .../lib/test_emulate_step_exec_instr.S| 2 +- arch/powerpc/perf/callchain.c | 2 +- arch/powerpc/xmon/xmon.c | 7 +- 18 files changed, 100 insertions(+), 100 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 2efec6d87049..a4ae67aa9b76 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -124,6 +124,7 @@ struct pt_regs #define STACK_FRAME_LR_SAVE2 /* Location of LR in stack frame */ #define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + \ STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE) +#define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_FRAME_MARKER 12 #ifdef CONFIG_PPC64_ELF_ABI_V2 @@ -143,6 +144,7 @@ struct pt_regs #define STACK_FRAME_OVERHEAD 16 /* size of minimum stack frame */ #define STACK_FRAME_LR_SAVE1 /* Location of LR in stack frame */ #define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) +#define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_FRAME_MARKER 2 #define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 4ce2a4aa3985..db5e66c1d031 100644 --- 
a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -72,7 +72,7 @@ #endif #define STACK_PT_REGS_OFFSET(sym, val) \ - DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val)) + DEFINE(sym, STACK_INT_FRAME_REGS + offsetof(struct pt_regs, val)) int main(void) { @@ -167,9 +167,8 @@ int main(void) OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr); OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave); OFFSET(THREAD_CKFPSTATE, thread_struct, ckfp_state.fpr); - /* Local pt_regs on stack for Transactional Memory funcs. */ - DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD + - sizeof(struct pt_regs) + 16); + /* Local pt_regs on stack in int frame form, plus 16 bytes for TM */ + DEFINE(TM_FRAME_SIZE, STACK_INT_FRAME_SIZE + 16); #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ OFFSET(TI_LOCAL_FLAGS, thread_info, local_flags); diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 3fc7c9886bb7..24c8d84a56c9 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -123,12 +123,12 @@ transfer_to_syscall: kuep_lock /* Calling convention has r3 = regs, r4 = orig r0 */ - addir3,r1,STACK_FRAME_OVERHEAD + addir3,r1,STACK_INT_FRAME_REGS mr r4,r0 bl system_call_exception ret_from_syscall: - addir4,r1,STACK_FRAME_OVERHEAD + addir4,r1,STACK_INT_FRAME_REGS li r5,0 bl syscall_exit_prepare #ifdef CONFIG_PPC_47x @@ -293,7 +293,7 @@ _ASM_NOKPROBE_SYMBOL(fast_exception_return) .globl interrupt_return interrupt_return: lwz r4,_MSR(r1) - addir3,r1,STACK_FRAME_OVERHEAD + addir3,r1,STACK_INT_FRAME_REGS andi. 
r0,r4,MSR_PR beq .Lkernel_interrupt_return bl interrupt_exit_user_prepare diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 2f68fb2ee4fc..62033d022e0a 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -455,7 +455,7 @@ exc_##n##_bad_stack: \ EXCEPTION_COMMON(trapnum) \ ack(r8);\ CHECK_NAPPING();\ - addir3,r1,STACK_FRAME_OVERHEAD; \ + addir3,r1,STACK_INT_FRAME_REGS;
[PATCH 08/17] powerpc: add a definition for the marker offset within the interrupt frame
Define a constant rather than open-code the offset for the "regs" marker. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 2 ++ arch/powerpc/kernel/entry_32.S | 2 +- arch/powerpc/kernel/exceptions-64e.S| 2 +- arch/powerpc/kernel/exceptions-64s.S| 2 +- arch/powerpc/kernel/head_32.h | 2 +- arch/powerpc/kernel/head_booke.h| 2 +- arch/powerpc/kernel/interrupt_64.S | 10 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +- 8 files changed, 13 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index a4ae67aa9b76..8a9f4cf8c4c5 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -125,6 +125,7 @@ struct pt_regs #define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + \ STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD +#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) #define STACK_FRAME_MARKER 12 #ifdef CONFIG_PPC64_ELF_ABI_V2 @@ -145,6 +146,7 @@ struct pt_regs #define STACK_FRAME_LR_SAVE1 /* Location of LR in stack frame */ #define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD +#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8) #define STACK_FRAME_MARKER 2 #define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 24c8d84a56c9..2f61b7d3677c 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -114,7 +114,7 @@ transfer_to_syscall: addir12,r12,STACK_FRAME_REGS_MARKER@l stw r9,_MSR(r1) li r2, INTERRUPT_SYSCALL - stw r12,8(r1) + stw r12,STACK_INT_FRAME_MARKER(r1) stw r2,_TRAP(r1) SAVE_GPR(0, r1) SAVE_GPRS(3, 8, r1) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 62033d022e0a..b9cec22df9f9 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S 
@@ -391,7 +391,7 @@ exc_##n##_common: \ std r10,_CCR(r1); /* store orig CR in stackframe */ \ std r9,GPR1(r1);/* store stack frame back link */ \ std r11,SOFTE(r1); /* and save it to stackframe */ \ - std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ \ + std r12,STACK_INT_FRAME_MARKER(r1); /* mark the frame */\ std r3,_TRAP(r1); /* set trap number */ \ std r0,RESULT(r1); /* clear regs->result */\ SAVE_NVGPRS(r1); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 29b78536ca59..ac3b0580224e 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -591,7 +591,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) li r10,0 LOAD_REG_IMMEDIATE(r11, STACK_FRAME_REGS_MARKER) std r10,RESULT(r1) /* clear regs->result */ - std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame */ + std r11,STACK_INT_FRAME_MARKER(r1) /* mark the frame*/ .endm /* diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h index 117d25330e13..f8e2911478a7 100644 --- a/arch/powerpc/kernel/head_32.h +++ b/arch/powerpc/kernel/head_32.h @@ -112,7 +112,7 @@ _ASM_NOKPROBE_SYMBOL(\name\()_virt) stw r0,GPR0(r1) lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ addir10,r10,STACK_FRAME_REGS_MARKER@l - stw r10,8(r1) + stw r10,STACK_INT_FRAME_MARKER(r1) li r10, \trapno stw r10,_TRAP(r1) SAVE_GPRS(3, 8, r1) diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 3149ac20b18e..37d43c172676 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -84,7 +84,7 @@ END_BTB_FLUSH_SECTION stw r0,GPR0(r1) lis r10, STACK_FRAME_REGS_MARKER@ha /* exception frame marker */ addir10, r10, STACK_FRAME_REGS_MARKER@l - stw r10, 8(r1) + stw r10, STACK_INT_FRAME_MARKER(r1) li r10, \trapno stw r10,_TRAP(r1) SAVE_GPRS(3, 8, r1) diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S index 49d585eae7c8..321992c1c9f9 100644 --- 
a/arch/powerpc/kernel/interrupt_64.S +++ b/arch/powerpc/kernel/interrupt_64.S @@ -77,11 +77,11 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) std r11,_TRAP(r1) std r12,_CCR(r1) std r3,ORIG_GPR3(r1) + LOAD_REG_IMMEDIATE(r
[PATCH 09/17] powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
This is a count of longs from the stack pointer to the regs marker. Rename it to make it more distinct from the other byte offsets. It can be derived from the byte offset definitions just added. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 4 ++-- arch/powerpc/kernel/process.c | 2 +- arch/powerpc/kernel/stacktrace.c | 2 +- arch/powerpc/perf/callchain.c | 2 +- arch/powerpc/xmon/xmon.c | 3 +-- 5 files changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 8a9f4cf8c4c5..fdd50648df56 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -126,7 +126,6 @@ struct pt_regs STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) -#define STACK_FRAME_MARKER 12 #ifdef CONFIG_PPC64_ELF_ABI_V2 #define STACK_FRAME_MIN_SIZE 32 @@ -147,7 +146,6 @@ struct pt_regs #define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8) -#define STACK_FRAME_MARKER 2 #define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD /* Size of stack frame allocated when calling signal handler. */ @@ -155,6 +153,8 @@ struct pt_regs #endif /* __powerpc64__ */ +#define STACK_INT_FRAME_MARKER_LONGS (STACK_INT_FRAME_MARKER/sizeof(long)) + #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index e7010f71de24..b0a9e5eeec4c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2234,7 +2234,7 @@ void __no_sanitize_address show_stack(struct task_struct *tsk, * We look for the "regs" marker in the current frame. 
*/ if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) - && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { + && stack[STACK_INT_FRAME_MARKER_LONGS] == STACK_FRAME_REGS_MARKER) { struct pt_regs *regs = (struct pt_regs *) (sp + STACK_INT_FRAME_REGS); diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c index a2443d61728e..7efa0ec9dd77 100644 --- a/arch/powerpc/kernel/stacktrace.c +++ b/arch/powerpc/kernel/stacktrace.c @@ -136,7 +136,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum /* Mark stacktraces with exception frames as unreliable. */ if (sp <= stack_end - STACK_INT_FRAME_SIZE && - stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { + stack[STACK_INT_FRAME_MARKER_LONGS] == STACK_FRAME_REGS_MARKER) { return -EINVAL; } diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c index 9e254aed1f61..b01497ed5173 100644 --- a/arch/powerpc/perf/callchain.c +++ b/arch/powerpc/perf/callchain.c @@ -62,7 +62,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re if (next_sp == sp + STACK_INT_FRAME_SIZE && validate_sp(sp, current, STACK_INT_FRAME_SIZE) && - fp[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) { + fp[STACK_INT_FRAME_MARKER_LONGS] == STACK_FRAME_REGS_MARKER) { /* * This looks like an interrupt frame for an * interrupt that occurred in the kernel diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index e403f14eb6eb..bbdaa42ba4ba 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1720,7 +1720,6 @@ static void get_function_bounds(unsigned long pc, unsigned long *startp, } #define LRSAVE_OFFSET (STACK_FRAME_LR_SAVE * sizeof(unsigned long)) -#define MARKER_OFFSET (STACK_FRAME_MARKER * sizeof(unsigned long)) static void xmon_show_stack(unsigned long sp, unsigned long lr, unsigned long pc) @@ -1783,7 +1782,7 @@ static void xmon_show_stack(unsigned long sp, unsigned long lr, /* Look for "regs" marker to see if 
this is an exception frame. */ - if (mread(sp + MARKER_OFFSET, &marker, sizeof(unsigned long)) + if (mread(sp + STACK_INT_FRAME_MARKER, &marker, sizeof(unsigned long)) && marker == STACK_FRAME_REGS_MARKER) { if (mread(sp + STACK_INT_FRAME_REGS, &regs, sizeof(regs)) != sizeof(regs)) { printf("Couldn't read registers at %lx\n", -- 2.37.2
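As a quick sanity check of the derivation, here is a stand-alone sketch (not kernel code) showing that dividing the marker byte offsets by the word size reproduces the removed hard-coded indices 12 and 2. The DEMO_ names and the explicit word-size argument are assumptions for illustration only; the patch itself divides by sizeof(long).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the STACK_FRAME_OVERHEAD values in the diff. */
#define DEMO_PPC64_FRAME_OVERHEAD	112
#define DEMO_PPC32_FRAME_OVERHEAD	16

/* Byte offsets of the marker, mirroring STACK_INT_FRAME_MARKER. */
#define DEMO_PPC64_MARKER_BYTES		(DEMO_PPC64_FRAME_OVERHEAD - 16)
#define DEMO_PPC32_MARKER_BYTES		(DEMO_PPC32_FRAME_OVERHEAD - 8)

/* The removed STACK_FRAME_MARKER constants were 12 (64-bit) and 2 (32-bit). */
static_assert(DEMO_PPC64_MARKER_BYTES / 8 == 12, "64-bit marker index");
static_assert(DEMO_PPC32_MARKER_BYTES / 4 == 2, "32-bit marker index");

/* Convert a byte offset into an index into an array of longs. */
static size_t demo_marker_longs(size_t marker_bytes, size_t long_size)
{
	return marker_bytes / long_size;
}
```

Passing the word size explicitly lets both ABIs be checked on one host; in the kernel only one of the two branches is ever compiled.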
[PATCH 10/17] powerpc: add a define for the user interrupt frame size
The user interrupt frame is a different size from the kernel frame, so give it its own name. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 6 +++--- arch/powerpc/kernel/process.c | 6 +++--- arch/powerpc/kernel/stacktrace.c | 4 ++-- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index fdd50648df56..705ce26ae887 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -122,8 +122,7 @@ struct pt_regs #define STACK_FRAME_OVERHEAD 112 /* size of minimum stack frame */ #define STACK_FRAME_LR_SAVE 2 /* Location of LR in stack frame */ -#define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + \ - STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE) +#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) @@ -143,7 +142,7 @@ struct pt_regs #define KERNEL_REDZONE_SIZE 0 #define STACK_FRAME_OVERHEAD 16 /* size of minimum stack frame */ #define STACK_FRAME_LR_SAVE 1 /* Location of LR in stack frame */ -#define STACK_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) +#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8) #define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD @@ -153,6 +152,7 @@ struct pt_regs #endif /* __powerpc64__ */ +#define STACK_INT_FRAME_SIZE (KERNEL_REDZONE_SIZE + STACK_USER_INT_FRAME_SIZE) #define STACK_INT_FRAME_MARKER_LONGS (STACK_INT_FRAME_MARKER/sizeof(long)) #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index b0a9e5eeec4c..d6daf0d073b3 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1727,15 +1727,15 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args) klp_init_thread_info(p); /* Create initial stack frame. */ - sp -= (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD); + sp -= STACK_USER_INT_FRAME_SIZE; ((unsigned long *)sp)[0] = 0; /* Copy registers */ - childregs = (struct pt_regs *)(sp + STACK_FRAME_OVERHEAD); + childregs = (struct pt_regs *)(sp + STACK_INT_FRAME_REGS); if (unlikely(args->fn)) { /* kernel thread */ memset(childregs, 0, sizeof(struct pt_regs)); - childregs->gpr[1] = sp + (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD); + childregs->gpr[1] = sp + STACK_USER_INT_FRAME_SIZE; /* function */ if (args->fn) childregs->gpr[14] = ppc_function_entry((void *)args->fn); diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c index 7efa0ec9dd77..453ac317a6cf 100644 --- a/arch/powerpc/kernel/stacktrace.c +++ b/arch/powerpc/kernel/stacktrace.c @@ -77,7 +77,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum /* * For user tasks, this is the SP value loaded on * kernel entry, see "PACAKSAVE(r13)" in _switch() and -* system_call_common()/EXCEPTION_PROLOG_COMMON(). +* system_call_common(). * * Likewise for non-swapper kernel threads, * this also happens to be the top of the stack @@ -88,7 +88,7 @@ int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consum * an unreliable stack trace until it's been * _switch()'ed to for the first time. */ - stack_end -= STACK_FRAME_OVERHEAD + sizeof(struct pt_regs); + stack_end -= STACK_USER_INT_FRAME_SIZE; } else { /* * idle tasks have a custom stack layout, -- 2.37.2
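To illustrate the relationship between the two sizes, a small user-space sketch follows. The struct layout and the DEMO_ names are assumptions (the real struct pt_regs is larger and architecture-defined); 112 and 288 are the 64-bit STACK_FRAME_OVERHEAD and KERNEL_REDZONE_SIZE values quoted elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register block, standing in for struct pt_regs. */
struct demo_pt_regs {
	unsigned long gpr[32];
	unsigned long nip;
	unsigned long msr;
};

#define DEMO_FRAME_OVERHEAD		112
#define DEMO_KERNEL_REDZONE_SIZE	288

#define DEMO_USER_INT_FRAME_SIZE \
	(sizeof(struct demo_pt_regs) + DEMO_FRAME_OVERHEAD)
#define DEMO_INT_FRAME_SIZE \
	(DEMO_KERNEL_REDZONE_SIZE + DEMO_USER_INT_FRAME_SIZE)

/* The two frame sizes differ only by the kernel red zone. */
static_assert(DEMO_INT_FRAME_SIZE - DEMO_USER_INT_FRAME_SIZE ==
	      DEMO_KERNEL_REDZONE_SIZE, "frames differ by the red zone");

/* Pick the frame size depending on which context was interrupted. */
static size_t demo_int_frame_size(int interrupted_kernel)
{
	return interrupted_kernel ? DEMO_INT_FRAME_SIZE
				  : DEMO_USER_INT_FRAME_SIZE;
}
```

On 32-bit, where the red zone is 0, the two sizes would collapse to the same value.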
[PATCH 11/17] powerpc: add a define for the switch frame size and regs offset
This is open-coded in process.c, ppc32 uses a different define with the same value, and the C definition is named differently, which makes it an extra indirection to grep for. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 6 -- arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/entry_32.S | 6 +++--- arch/powerpc/kernel/process.c | 12 4 files changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 705ce26ae887..412ef0749775 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -97,8 +97,6 @@ struct pt_regs #endif -#define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct pt_regs)) - // Always displays as "REGS" in memory dumps #ifdef CONFIG_CPU_BIG_ENDIAN #define STACK_FRAME_REGS_MARKER ASM_CONST(0x52454753) @@ -125,6 +123,8 @@ struct pt_regs #define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) +#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) +#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD #ifdef CONFIG_PPC64_ELF_ABI_V2 #define STACK_FRAME_MIN_SIZE 32 @@ -146,6 +146,8 @@ struct pt_regs #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8) #define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD +#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) +#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD /* Size of stack frame allocated when calling signal handler. 
*/ #define __SIGNAL_FRAMESIZE 64 diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index db5e66c1d031..f7dff906c24b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -260,7 +260,7 @@ int main(void) /* Interrupt register frame */ DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE); - DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_WITH_PT_REGS); + DEFINE(SWITCH_FRAME_SIZE, STACK_SWITCH_FRAME_SIZE); STACK_PT_REGS_OFFSET(GPR0, gpr[0]); STACK_PT_REGS_OFFSET(GPR1, gpr[1]); STACK_PT_REGS_OFFSET(GPR2, gpr[2]); diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S index 2f61b7d3677c..6e99ec10be89 100644 --- a/arch/powerpc/kernel/entry_32.S +++ b/arch/powerpc/kernel/entry_32.S @@ -215,9 +215,9 @@ ret_from_kernel_thread: * in arch/ppc/kernel/process.c */ _GLOBAL(_switch) - stwu r1,-INT_FRAME_SIZE(r1) + stwu r1,-SWITCH_FRAME_SIZE(r1) mflr r0 - stw r0,INT_FRAME_SIZE+4(r1) + stw r0,SWITCH_FRAME_SIZE+4(r1) /* r3-r12 are caller saved -- Cort */ SAVE_NVGPRS(r1) stw r0,_NIP(r1) /* Return to switch caller */ @@ -248,7 +248,7 @@ _GLOBAL(_switch) lwz r4,_NIP(r1) /* Return to _switch caller in new task */ mtlr r4 - addi r1,r1,INT_FRAME_SIZE + addi r1,r1,SWITCH_FRAME_SIZE blr .globl fast_exception_return diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index d6daf0d073b3..a097879b0474 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1779,10 +1779,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) * do some house keeping and then return from the fork or clone * system call, using the stack frame created above. 
*/ - sp -= sizeof(struct pt_regs); - kregs = (struct pt_regs *) sp; - sp -= STACK_FRAME_OVERHEAD; + sp -= STACK_SWITCH_FRAME_SIZE; + kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS); p->thread.ksp = sp; + #ifdef CONFIG_HAVE_HW_BREAKPOINT for (i = 0; i < nr_wp_slots(); i++) p->thread.ptrace_bps[i] = NULL; @@ -2232,8 +2232,12 @@ void __no_sanitize_address show_stack(struct task_struct *tsk, /* * See if this is an exception frame. * We look for the "regs" marker in the current frame. +* +* STACK_SWITCH_FRAME_SIZE being the smallest frame that +* could hold a pt_regs, if that does not fit then it can't +* have regs. */ - if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) + if (validate_sp(sp, tsk, STACK_SWITCH_FRAME_SIZE) && stack[STACK_INT_FRAME_MARKER_LONGS] == STACK_FRAME_REGS_MARKER) { struct pt_regs *regs = (struct pt_regs *) (sp + STACK_INT_FRAME_REGS); -- 2.37.2
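The copy_thread() hunk above replaces two separate decrements with one decrement plus an offset. A minimal sketch, assuming a made-up pt_regs size and DEMO_ names throughout, suggests the two forms yield the same stack pointer and the same regs address:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register block, standing in for struct pt_regs. */
struct demo_pt_regs {
	unsigned long words[44];
};

#define DEMO_FRAME_OVERHEAD	112UL
#define DEMO_SWITCH_FRAME_SIZE	(sizeof(struct demo_pt_regs) + DEMO_FRAME_OVERHEAD)
#define DEMO_SWITCH_FRAME_REGS	DEMO_FRAME_OVERHEAD

struct demo_switch_frame {
	unsigned long sp;	/* value stored in thread.ksp */
	unsigned long kregs;	/* address of the register block */
};

/* Old open-coded form: step over pt_regs, then over the frame overhead. */
static struct demo_switch_frame demo_old_layout(unsigned long sp)
{
	struct demo_switch_frame f;

	sp -= sizeof(struct demo_pt_regs);
	f.kregs = sp;
	sp -= DEMO_FRAME_OVERHEAD;
	f.sp = sp;
	return f;
}

/* New form: one decrement by the frame size, regs at a fixed offset. */
static struct demo_switch_frame demo_new_layout(unsigned long sp)
{
	struct demo_switch_frame f;

	sp -= DEMO_SWITCH_FRAME_SIZE;
	f.kregs = sp + DEMO_SWITCH_FRAME_REGS;
	f.sp = sp;
	return f;
}
```

The equivalence holds only while the regs offset equals the frame overhead, which is the case for the definitions in this patch.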
[PATCH 12/17] powerpc: copy_thread fill in interrupt frame marker and back chain
Backtraces will not recognise the fork system call interrupt without
the regs marker. And regular interrupt entry from userspace creates the
back chain to the user stack, so do this for the initial fork frame too,
to be consistent.

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/process.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a097879b0474..27956831fa5d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1728,12 +1728,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 
 	/* Create initial stack frame. */
 	sp -= STACK_USER_INT_FRAME_SIZE;
-	((unsigned long *)sp)[0] = 0;
+	*(unsigned long *)(sp + STACK_INT_FRAME_MARKER) = STACK_FRAME_REGS_MARKER;
 
 	/* Copy registers */
 	childregs = (struct pt_regs *)(sp + STACK_INT_FRAME_REGS);
 	if (unlikely(args->fn)) {
 		/* kernel thread */
+		((unsigned long *)sp)[0] = 0;
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gpr[1] = sp + STACK_USER_INT_FRAME_SIZE;
 		/* function */
@@ -1753,6 +1754,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		*childregs = *regs;
 		if (usp)
 			childregs->gpr[1] = usp;
+		((unsigned long *)sp)[0] = childregs->gpr[1];
 		p->thread.regs = childregs;
 		/* 64s sets this in ret_from_fork */
 		if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
-- 
2.37.2
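A rough user-space model of what the patch writes into the initial frame may help. Everything named DEMO_ or demo_ is hypothetical, including the marker value and the frame size in words; the marker index 12 is the 64-bit value derived earlier in the series.

```c
#include <assert.h>
#include <string.h>

#define DEMO_FRAME_WORDS	64		/* hypothetical frame size */
#define DEMO_MARKER_LONGS	12		/* marker slot index, 64-bit */
#define DEMO_REGS_MARKER	0x52454753UL	/* illustrative marker value */

/*
 * Fill in an initial frame: the marker is always written, while the
 * back chain is 0 for kernel threads and the user stack pointer for
 * user threads.
 */
static void demo_init_frame(unsigned long *frame, int kernel_thread,
			    unsigned long user_sp)
{
	memset(frame, 0, DEMO_FRAME_WORDS * sizeof(*frame));
	frame[DEMO_MARKER_LONGS] = DEMO_REGS_MARKER;
	frame[0] = kernel_thread ? 0 : user_sp;
}

/* What a backtracer checks to recognise an interrupt frame. */
static int demo_is_int_frame(const unsigned long *frame)
{
	return frame[DEMO_MARKER_LONGS] == DEMO_REGS_MARKER;
}
```

Without the marker store, demo_is_int_frame() would return 0 for the fork frame, which corresponds to the unrecognised interrupt described in the changelog.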
[PATCH 13/17] powerpc: copy_thread add a back chain to the switch stack frame
Stack unwinders need LR and the back chain as a minimum. The switch
stack uses regs->nip for its return pointer rather than lrsave, so
that was not set in the fork frame, and neither was the back chain.
This change sets those fields in the stack.

With this and the previous change, a stack trace in the switch or
interrupt stack goes from looking like this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 3 PID: 90 Comm: systemd Not tainted
  NIP: c0011060 LR: c0010f68 CTR: 7fff
  [ ... regs ... ]
  NIP [c0011060] _switch+0x160/0x17c
  LR [c0010f68] _switch+0x68/0x17c
  Call Trace:

To this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 93 Comm: systemd Not tainted
  NIP: c0011060 LR: c0010f68 CTR: 7fff
  [ ... regs ... ]
  NIP [c0011060] _switch+0x160/0x17c
  LR [c0010f68] _switch+0x68/0x17c
  Call Trace:
  [c5a93e10] [c000cdbc] ret_from_fork_scv+0x0/0x54
  --- interrupt: 3000 at 0x7fffa72f56d8
  NIP: 7fffa72f56d8 LR: CTR:
  [ ... regs ... ]
  NIP [7fffa72f56d8] 0x7fffa72f56d8
  LR [] 0x0
  --- interrupt: 3000

Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 27956831fa5d..6cb3982a11ef 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1781,7 +1781,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	 * do some house keeping and then return from the fork or clone
 	 * system call, using the stack frame created above.
 	 */
+	((unsigned long *)sp)[STACK_FRAME_LR_SAVE] = (unsigned long)f;
 	sp -= STACK_SWITCH_FRAME_SIZE;
+	((unsigned long *)sp)[0] = sp + STACK_SWITCH_FRAME_SIZE;
 	kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
 	p->thread.ksp = sp;
 
-- 
2.37.2
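To see why an empty back chain gives an empty "Call Trace:", consider this toy unwinder over a simulated stack. It is only a sketch: the stack is an array of words, back chains are array indices rather than addresses, and demo_unwind() is a hypothetical name, not the kernel's unwinder. The LR save slot index 2 is the 64-bit STACK_FRAME_LR_SAVE value from the diffs.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LR_SAVE	2	/* index of the LR save slot in a frame */

/*
 * Follow back chains starting at frame index sp. For each caller frame
 * reached, record the return address saved in its LR slot. A back chain
 * of 0, or one that does not move towards higher indices, ends the walk.
 * Returns the number of entries written to ips[].
 */
static size_t demo_unwind(const unsigned long *stack, size_t nwords,
			  size_t sp, unsigned long *ips, size_t max)
{
	size_t n = 0;

	while (n < max && sp + DEMO_LR_SAVE < nwords) {
		size_t next = stack[sp];

		if (next == 0 || next <= sp || next + DEMO_LR_SAVE >= nwords)
			break;
		ips[n++] = stack[next + DEMO_LR_SAVE];
		sp = next;
	}
	return n;
}
```

With the switch frame's back chain left at 0 the walk stops immediately; once it points at the fork frame and that frame's LR slot is filled in, one entry is produced, analogous to the ret_from_fork_scv line in the second trace.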
[PATCH 14/17] powerpc: split validate_sp into two functions
Most callers just want to validate an arbitrary kernel stack pointer, some need a particular size. Make the size case the exceptional one with an extra function. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/processor.h | 15 --- arch/powerpc/kernel/process.c| 23 ++- arch/powerpc/kernel/stacktrace.c | 2 +- arch/powerpc/perf/callchain.c| 6 +++--- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 631802999d59..e96c9b8c2a60 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -374,9 +374,18 @@ static inline unsigned long __pack_fe01(unsigned int fpmode) #endif -/* Check that a certain kernel stack pointer is valid in task_struct p */ -int validate_sp(unsigned long sp, struct task_struct *p, - unsigned long nbytes); +/* + * Check that a certain kernel stack pointer is a valid (minimum sized) + * stack frame in task_struct p. + */ +int validate_sp(unsigned long sp, struct task_struct *p); + +/* + * validate the stack frame of a particular minimum size, used for when we are + * looking at a certain object in the stack beyond the minimum. + */ +int validate_sp_size(unsigned long sp, struct task_struct *p, +unsigned long nbytes); /* * Prefetch macros. diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 6cb3982a11ef..6820d90744c3 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2128,9 +2128,12 @@ static inline int valid_emergency_stack(unsigned long sp, struct task_struct *p, return 0; } - -int validate_sp(unsigned long sp, struct task_struct *p, - unsigned long nbytes) +/* + * validate the stack frame of a particular minimum size, used for when we are + * looking at a certain object in the stack beyond the minimum. 
+ */ +int validate_sp_size(unsigned long sp, struct task_struct *p, +unsigned long nbytes) { unsigned long stack_page = (unsigned long)task_stack_page(p); @@ -2146,7 +2149,10 @@ int validate_sp(unsigned long sp, struct task_struct *p, return valid_emergency_stack(sp, p, nbytes); } -EXPORT_SYMBOL(validate_sp); +int validate_sp(unsigned long sp, struct task_struct *p) +{ + return validate_sp_size(sp, p, STACK_FRAME_OVERHEAD); +} static unsigned long ___get_wchan(struct task_struct *p) { @@ -2154,13 +2160,12 @@ static unsigned long ___get_wchan(struct task_struct *p) int count = 0; sp = p->thread.ksp; - if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD)) + if (!validate_sp(sp, p)) return 0; do { sp = READ_ONCE_NOCHECK(*(unsigned long *)sp); - if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) || - task_is_running(p)) + if (!validate_sp(sp, p) || task_is_running(p)) return 0; if (count > 0) { ip = READ_ONCE_NOCHECK(((unsigned long *)sp)[STACK_FRAME_LR_SAVE]); @@ -2214,7 +2219,7 @@ void __no_sanitize_address show_stack(struct task_struct *tsk, lr = 0; printk("%sCall Trace:\n", loglvl); do { - if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD)) + if (!validate_sp(sp, tsk)) break; stack = (unsigned long *) sp; @@ -2241,7 +2246,7 @@ void __no_sanitize_address show_stack(struct task_struct *tsk, * could hold a pt_regs, if that does not fit then it can't * have regs. 
*/ - if (validate_sp(sp, tsk, STACK_SWITCH_FRAME_SIZE) + if (validate_sp_size(sp, tsk, STACK_SWITCH_FRAME_SIZE) && stack[STACK_INT_FRAME_MARKER_LONGS] == STACK_FRAME_REGS_MARKER) { struct pt_regs *regs = (struct pt_regs *) (sp + STACK_INT_FRAME_REGS); diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c index 453ac317a6cf..1dbbf30f265e 100644 --- a/arch/powerpc/kernel/stacktrace.c +++ b/arch/powerpc/kernel/stacktrace.c @@ -43,7 +43,7 @@ void __no_sanitize_address arch_stack_walk(stack_trace_consume_fn consume_entry, unsigned long *stack = (unsigned long *) sp; unsigned long newsp, ip; - if (!validate_sp(sp, task, STACK_FRAME_OVERHEAD)) + if (!validate_sp(sp, task)) return; newsp = stack[0]; diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c index b01497ed5173..6b4434dd0ff3 100644 --- a/arch/powerpc/perf/callchain.c +++ b/arch/powerpc/perf/callchain.c @@ -27,7 +27,7 @@ static int valid_next_sp(unsigned long sp, unsigned long prev_sp) { if (sp & 0xf)
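The shape of the split can be sketched in user space as a size-aware check plus a thin wrapper for the common case. This is a simplification under stated assumptions: it models only the task stack (the real function also considers IRQ and emergency stacks), takes the stack base as a plain argument, and uses hypothetical DEMO_ constants.

```c
#include <assert.h>

#define DEMO_THREAD_SIZE	16384UL	/* hypothetical kernel stack size */
#define DEMO_FRAME_MIN		112UL	/* minimum frame size checked by default */

/* Exceptional case: a frame of nbytes at sp must fit inside the stack. */
static int demo_validate_sp_size(unsigned long sp, unsigned long stack_base,
				 unsigned long nbytes)
{
	if (sp < stack_base)
		return 0;
	return sp <= stack_base + DEMO_THREAD_SIZE - nbytes;
}

/* Common case: only require room for a minimum-sized frame. */
static int demo_validate_sp(unsigned long sp, unsigned long stack_base)
{
	return demo_validate_sp_size(sp, stack_base, DEMO_FRAME_MIN);
}
```

Callers that go on to read an object beyond the minimum frame, such as pt_regs, would use the _size variant with the larger size.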
[PATCH 15/17] powerpc: allow minimum sized kernel stack frames
This affects only 64-bit ELFv2 kernels, and reduces the minimum asm-created stack frame size from 112 to 32 bytes on those kernels. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/head_40x.S | 2 +- arch/powerpc/kernel/head_44x.S | 6 +++--- arch/powerpc/kernel/head_64.S | 6 +++--- arch/powerpc/kernel/head_85xx.S | 4 ++-- arch/powerpc/kernel/head_8xx.S | 2 +- arch/powerpc/kernel/head_book3s_32.S | 4 ++-- arch/powerpc/kernel/irq.c | 4 ++-- arch/powerpc/kernel/misc_32.S | 2 +- arch/powerpc/kernel/misc_64.S | 4 ++-- arch/powerpc/kernel/process.c | 2 +- arch/powerpc/kernel/smp.c | 2 +- arch/powerpc/kernel/stacktrace.c | 2 +- 12 files changed, 20 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S index 088f500896c7..918547b93b5e 100644 --- a/arch/powerpc/kernel/head_40x.S +++ b/arch/powerpc/kernel/head_40x.S @@ -602,7 +602,7 @@ start_here: lis r1,init_thread_union@ha addi r1,r1,init_thread_union@l li r0,0 - stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) + stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1) bl early_init /* We have to do this with MMU on */ diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S index f15cb9fdb692..63a85c16fef4 100644 --- a/arch/powerpc/kernel/head_44x.S +++ b/arch/powerpc/kernel/head_44x.S @@ -109,7 +109,7 @@ _GLOBAL(_start); lis r1,init_thread_union@h ori r1,r1,init_thread_union@l li r0,0 - stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) + stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1) bl early_init @@ -1012,7 +1012,7 @@ _GLOBAL(start_secondary_47x) */ lis r1,temp_boot_stack@h ori r1,r1,temp_boot_stack@l - addi r1,r1,1024-STACK_FRAME_OVERHEAD + addi r1,r1,1024-STACK_FRAME_MIN_SIZE li r0,0 stw r0,0(r1) bl mmu_init_secondary @@ -1025,7 +1025,7 @@ _GLOBAL(start_secondary_47x) lwz r1,TASK_STACK(r2) /* Current stack pointer */ - addi r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD + addi r1,r1,THREAD_SIZE-STACK_FRAME_MIN_SIZE li r0,0 stw r0,0(r1) diff --git a/arch/powerpc/kernel/head_64.S 
b/arch/powerpc/kernel/head_64.S index dedcc6fe2263..b513d13bf79e 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -424,7 +424,7 @@ generic_secondary_common_init: /* Create a temp kernel stack for use before relocation is on. */ ld r1,PACAEMERGSP(r13) - subi r1,r1,STACK_FRAME_OVERHEAD + subi r1,r1,STACK_FRAME_MIN_SIZE /* See if we need to call a cpu state restore handler */ LOAD_REG_ADDR(r23, cur_cpu_spec) @@ -780,7 +780,7 @@ _GLOBAL(pmac_secondary_start) /* Create a temp kernel stack for use before relocation is on. */ ld r1,PACAEMERGSP(r13) - subi r1,r1,STACK_FRAME_OVERHEAD + subi r1,r1,STACK_FRAME_MIN_SIZE b __secondary_start @@ -958,7 +958,7 @@ start_here_multiplatform: LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) add r1,r3,r1 li r0,0 - stdu r0,-STACK_FRAME_OVERHEAD(r1) + stdu r0,-STACK_FRAME_MIN_SIZE(r1) /* * Do very early kernel initializations, including initial hash table diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S index 24f39abf81df..d9bd377dec91 100644 --- a/arch/powerpc/kernel/head_85xx.S +++ b/arch/powerpc/kernel/head_85xx.S @@ -229,7 +229,7 @@ set_ivor: lis r1,init_thread_union@h ori r1,r1,init_thread_union@l li r0,0 - stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) + stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1) #ifdef CONFIG_SMP stw r24, TASK_CPU(r2) @@ -1044,7 +1044,7 @@ __secondary_start: lwz r1,TASK_STACK(r2) /* stack */ - addi r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD + addi r1,r1,THREAD_SIZE-STACK_FRAME_MIN_SIZE li r0,0 stw r0,0(r1) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 0b05f2be66b9..cf546d0e5c40 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -537,7 +537,7 @@ start_here: ori r0, r0, STACK_END_MAGIC@l stw r0, 0(r1) li r0,0 - stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1) + stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1) lis r6, swapper_pg_dir@ha tophys(r6,r6) diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S index 519b60695167..40854d092dd3 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -840,7 +840,7 @@ __secondary_start:
[PATCH 16/17] powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes
Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size, plus pt_regs, plus 16 bytes for the aligned regs marker for the int frame (and the switch frame needs to match that because it uses the same regs offset as the int frame). This saves 80 bytes of kernel stack per interrupt. It's the principle of getting our accounting right that's more important than the practical saving. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 412ef0749775..a9dfce62a5eb 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -120,16 +119,26 @@ struct pt_regs #define STACK_FRAME_OVERHEAD 112 /* size of minimum stack frame */ #define STACK_FRAME_LR_SAVE 2 /* Location of LR in stack frame */ + +#ifdef CONFIG_PPC64_ELF_ABI_V2 +#define STACK_FRAME_MIN_SIZE 32 +#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE + 16) +#define STACK_INT_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16) +#define STACK_INT_FRAME_MARKER STACK_FRAME_MIN_SIZE +#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE + 16) +#define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16) +#else +/* + * The ELFv1 ABI specifies 48 bytes plus a minimum 64 byte parameter save + * area. This parameter area is not used by calls to C from interrupt entry, + * so the second from last one of those is used for the frame marker. 
+ */ +#define STACK_FRAME_MIN_SIZE 112 #define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) #define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) #define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD - -#ifdef CONFIG_PPC64_ELF_ABI_V2 -#define STACK_FRAME_MIN_SIZE 32 -#else -#define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD #endif /* Size of dummy stack frame allocated when calling signal handler. */ -- 2.37.2
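For the ELFv2 branch of the diff, the resulting offsets can be checked with a few compile-time assertions. This is a sketch with DEMO_ names mirroring the new definitions; the alignment check reflects the "aligned regs marker" wording in the changelog and is my reading of it, not a statement from the ABI document.

```c
#include <assert.h>

/* Mirrors of the ELFv2 definitions added by the patch. */
#define DEMO_FRAME_MIN_SIZE	32
#define DEMO_INT_FRAME_MARKER	DEMO_FRAME_MIN_SIZE
#define DEMO_INT_FRAME_REGS	(DEMO_FRAME_MIN_SIZE + 16)
#define DEMO_SWITCH_FRAME_REGS	(DEMO_FRAME_MIN_SIZE + 16)

/* The marker occupies a 16-byte slot directly above the minimum frame. */
static_assert(DEMO_INT_FRAME_MARKER + 16 == DEMO_INT_FRAME_REGS,
	      "marker slot sits directly below the regs");
/* The register block therefore starts on a 16-byte boundary. */
static_assert(DEMO_INT_FRAME_REGS % 16 == 0, "regs offset is 16-byte aligned");
/* Switch and interrupt frames share the same regs offset. */
static_assert(DEMO_SWITCH_FRAME_REGS == DEMO_INT_FRAME_REGS,
	      "switch frame matches int frame");

/* Address of the register block for a frame starting at sp. */
static unsigned long demo_regs_addr(unsigned long sp)
{
	return sp + DEMO_INT_FRAME_REGS;
}
```

So for a frame at 0x1000, the marker would sit at 0x1020 and the registers at 0x1030 under these definitions.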
[PATCH 17/17] powerpc: remove STACK_FRAME_OVERHEAD
This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences with STACK_FRAME_MIN_SIZE. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ptrace.h | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index a9dfce62a5eb..a53c580388e2 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -117,7 +117,6 @@ struct pt_regs #define USER_REDZONE_SIZE 512 #define KERNEL_REDZONE_SIZE 288 -#define STACK_FRAME_OVERHEAD 112 /* size of minimum stack frame */ #define STACK_FRAME_LR_SAVE 2 /* Location of LR in stack frame */ #ifdef CONFIG_PPC64_ELF_ABI_V2 @@ -134,11 +133,11 @@ struct pt_regs * so the second from last one of those is used for the frame marker. */ #define STACK_FRAME_MIN_SIZE 112 -#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) -#define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD -#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16) -#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) -#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD +#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE) +#define STACK_INT_FRAME_REGS STACK_FRAME_MIN_SIZE +#define STACK_INT_FRAME_MARKER (STACK_FRAME_MIN_SIZE - 16) +#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE) +#define STACK_SWITCH_FRAME_REGS STACK_FRAME_MIN_SIZE #endif /* Size of dummy stack frame allocated when calling signal handler. 
*/ @@ -149,14 +148,13 @@ struct pt_regs #define USER_REDZONE_SIZE 0 #define KERNEL_REDZONE_SIZE 0 -#define STACK_FRAME_OVERHEAD 16 /* size of minimum stack frame */ +#define STACK_FRAME_MIN_SIZE 16 #define STACK_FRAME_LR_SAVE 1 /* Location of LR in stack frame */ -#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) -#define STACK_INT_FRAME_REGS STACK_FRAME_OVERHEAD -#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8) -#define STACK_FRAME_MIN_SIZE STACK_FRAME_OVERHEAD -#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD) -#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD +#define STACK_USER_INT_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE) +#define STACK_INT_FRAME_REGS STACK_FRAME_MIN_SIZE +#define STACK_INT_FRAME_MARKER (STACK_FRAME_MIN_SIZE - 8) +#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE) +#define STACK_SWITCH_FRAME_REGS STACK_FRAME_MIN_SIZE /* Size of stack frame allocated when calling signal handler. */ #define __SIGNAL_FRAMESIZE 64 -- 2.37.2
Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu
On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:

[ . . . ]

> >> No. We are not exporting this just to make a bogus test case happy.
> >>
> >> Fix the torture code to handle -EBUSY correctly.
> > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > The result of the grep command shows that there are 268
> > cpuhp_setup_state* cases.
> > which may make our task more complicated.
>
> Why? The whole point of this torture thing is to stress the
> infrastructure.

Indeed.

> There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> can fail, which is not a fatal problem, really.
>
> So if a CPU hotplug operation fails, then why can't the torture test
> just move on and validate that the system still behaves correctly?
>
> That gives us more coverage than just testing the good case and giving
> up when something unexpected happens.

Agreed, with access to a function like the tick_nohz_full_timekeeper()
suggested earlier in this email thread, then yes, it would make sense
to try to offline the CPU anyway, then forgive the failure in cases where
the CPU matches that indicated by tick_nohz_full_timekeeper().

> I even argue that the torture test should inject random failures into
> the hotplug state machine to achieve extended code coverage.

I could imagine torture_onoff() telling various CPU-hotplug notifiers
to refuse the transition using some TBD interface. That would better
test the CPU-hotplug common code's ability to deal with failures.

Or did you have something else/additional in mind?

Thanx, Paul
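For concreteness, the "try anyway, then forgive" policy discussed above might look roughly like the following user-space model. All names here are hypothetical: demo_timekeeper_cpu() stands in for the tick_nohz_full_timekeeper() helper that was only suggested in this thread, and demo_cpu_down() fakes an offline attempt that refuses the timekeeper with -EBUSY.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical: which CPU currently holds the timekeeping duty. */
static int demo_timekeeper_cpu(void)
{
	return 0;
}

/* Hypothetical offline attempt: the timekeeper CPU refuses with -EBUSY. */
static int demo_cpu_down(int cpu)
{
	return cpu == demo_timekeeper_cpu() ? -EBUSY : 0;
}

enum demo_result { DEMO_OFFLINED, DEMO_FORGIVEN, DEMO_FAILED };

/*
 * Attempt the offline unconditionally. An -EBUSY on the timekeeper CPU
 * is an expected refusal and is forgiven; any other error is reported
 * as a genuine failure.
 */
static enum demo_result demo_torture_offline(int cpu)
{
	int ret = demo_cpu_down(cpu);

	if (ret == 0)
		return DEMO_OFFLINED;
	if (ret == -EBUSY && cpu == demo_timekeeper_cpu())
		return DEMO_FORGIVEN;
	return DEMO_FAILED;
}
```

The point of the design is that the failure path itself gets exercised instead of being avoided up front.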
[powerpc:next-test] BUILD SUCCESS 4eef1c9ccd19132c34fd55e79b104ace87ff09d4
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 4eef1c9ccd19132c34fd55e79b104ace87ff09d4 selftests/powerpc: Account for offline cpus in perf-hwbreak test

elapsed time: 743m

configs tested: 58
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arc        defconfig
alpha      defconfig
um         i386_defconfig
x86_64     randconfig-a011
x86_64     randconfig-a004
x86_64     rhel-8.3-kselftests
x86_64     randconfig-a002
x86_64     rhel-8.3-func
um         x86_64_defconfig
x86_64     rhel-8.3
x86_64     randconfig-a013
powerpc    allnoconfig
x86_64     randconfig-a006
x86_64     defconfig
i386       defconfig
i386       randconfig-a014
sh         allmodconfig
arc        randconfig-r043-20221127
i386       randconfig-a001
riscv      randconfig-r042-20221127
x86_64     randconfig-a015
x86_64     allyesconfig
i386       randconfig-a003
i386       randconfig-a005
s390       randconfig-r044-20221127
ia64       allmodconfig
x86_64     rhel-8.3-kvm
i386       randconfig-a012
s390       defconfig
i386       randconfig-a016
s390       allmodconfig
i386       allyesconfig
x86_64     rhel-8.3-syz
m68k       allyesconfig
x86_64     rhel-8.3-kunit
s390       allyesconfig
mips       allyesconfig
powerpc    allmodconfig
alpha      allyesconfig
arc        allyesconfig
m68k       allmodconfig
arm        defconfig
arm        allyesconfig
arm64      allyesconfig

clang tested configs:
hexagon    randconfig-r045-20221127
hexagon    randconfig-r041-20221127
x86_64     randconfig-a012
x86_64     randconfig-a005
x86_64     randconfig-a001
x86_64     randconfig-a016
x86_64     randconfig-a003
i386       randconfig-a013
i386       randconfig-a011
i386       randconfig-a004
i386       randconfig-a002
i386       randconfig-a006
x86_64     randconfig-a014
i386       randconfig-a015

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
[powerpc:topic/ppc-kvm] BUILD SUCCESS a96b20758b23be7e9f693218908228d6100c3c26
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/ppc-kvm
branch HEAD: a96b20758b23be7e9f693218908228d6100c3c26 KVM: PPC: Book3S HV: Use the bitmap API to allocate bitmaps

elapsed time: 743m

configs tested: 2
configs skipped: 100

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
powerpc    allnoconfig
powerpc    allmodconfig

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
[powerpc:fixes-test] BUILD SUCCESS 2e7ec190a0e38aaa8a6d87fd5f804ec07947febc
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 2e7ec190a0e38aaa8a6d87fd5f804ec07947febc powerpc/64s: Add missing declaration for machine_check_early_boot()

elapsed time: 746m

configs tested: 58
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
x86_64     rhel-8.3-func
x86_64     rhel-8.3-kselftests
x86_64     rhel-8.3-kunit
x86_64     rhel-8.3-kvm
x86_64     randconfig-a013
x86_64     rhel-8.3-syz
x86_64     randconfig-a011
x86_64     randconfig-a015
um         i386_defconfig
um         x86_64_defconfig
arc        defconfig
i386       randconfig-a001
s390       allmodconfig
x86_64     defconfig
alpha      defconfig
i386       randconfig-a003
sh         allmodconfig
i386       defconfig
powerpc    allmodconfig
x86_64     randconfig-a006
i386       randconfig-a016
mips       allyesconfig
ia64       allmodconfig
s390       defconfig
i386       randconfig-a005
i386       randconfig-a012
s390       allyesconfig
x86_64     rhel-8.3
x86_64     allyesconfig
i386       randconfig-a014
arc        randconfig-r043-20221127
m68k       allmodconfig
powerpc    allnoconfig
arc        allyesconfig
i386       allyesconfig
x86_64     randconfig-a002
alpha      allyesconfig
riscv      randconfig-r042-20221127
m68k       allyesconfig
x86_64     randconfig-a004
s390       randconfig-r044-20221127
arm        defconfig
arm        allyesconfig
arm64      allyesconfig

clang tested configs:
x86_64     randconfig-a014
x86_64     randconfig-a012
x86_64     randconfig-a016
hexagon    randconfig-r045-20221127
hexagon    randconfig-r041-20221127
x86_64     randconfig-a005
i386       randconfig-a002
i386       randconfig-a015
i386       randconfig-a006
i386       randconfig-a013
i386       randconfig-a004
i386       randconfig-a011
x86_64     randconfig-a001
x86_64     randconfig-a003

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
Re: [PATCH 2/3] powerpc/book3e: remove #include
Thomas Weißschuh writes:
> On 2022-11-26 07:36+, Christophe Leroy wrote:
>> On 26/11/2022 at 06:10, Thomas Weißschuh wrote:
>>> Commit 7ad4bd887d27 ("powerpc/book3e: get rid of #include
>>> ")
>>> removed the usage of the define UTS_VERSION but forgot to drop the
>>> include.
>>
>> What about:
>> arch/powerpc/platforms/52xx/efika.c
>> arch/powerpc/platforms/amigaone/setup.c
>> arch/powerpc/platforms/chrp/setup.c
>> arch/powerpc/platforms/powermac/bootx_init.c
>>
>> I believe you can do a lot more than what you did in your series.
>
> The commit messages are wrong.
> They should have said UTS_RELEASE instead of UTS_VERSION.
>
> Could the maintainers fix this up when applying?
> I also changed it locally so it will be fixed for v2.

I'll take this patch, but not the others.

cheers
Re: [PATCH] powerpc/64s: Add missing declaration for machine_check_early_boot()
On Fri Nov 25, 2022 at 11:25 PM AEST, Michael Ellerman wrote: > There's no declaration for machine_check_early_boot(), which leads to a > build failure with W=1. Add one. > > Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler") > Signed-off-by: Michael Ellerman Acked-by: Nicholas Piggin > --- > arch/powerpc/include/asm/interrupt.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/include/asm/interrupt.h > b/arch/powerpc/include/asm/interrupt.h > index 4745bb9998bd..6d8492b6e2b8 100644 > --- a/arch/powerpc/include/asm/interrupt.h > +++ b/arch/powerpc/include/asm/interrupt.h > @@ -602,6 +602,7 @@ ##func(struct pt_regs *regs) > /* kernel/traps.c */ > DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception); > #ifdef CONFIG_PPC_BOOK3S_64 > +DECLARE_INTERRUPT_HANDLER_RAW(machine_check_early_boot); > DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception_async); > #endif > DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception); > -- > 2.38.1
Re: [PATCH v2 1/4] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig
On Tue Nov 8, 2022 at 12:28 AM AEST, Christophe Leroy wrote:
>
>
> On 07/11/2022 at 04:31, Rohan McLure wrote:
> > Add Kconfig option for enabling clearing of registers on arrival in an
> > interrupt handler. This reduces the speculation influence of registers
> > on kernel internals. The option will be consumed by 64-bit systems that
> > feature speculation and wish to implement this mitigation.
> >
> > This patch only introduces the Kconfig option, no actual mitigations.
>
> If that has to do with speculation, do we need a new Kconfig option ?
> Can't we use CONFIG_PPC_BARRIER_NOSPEC for that ?

The NOSPEC barrier adds a runtime-patchable hardware barrier, and that
config is a build implementation detail. Also, that spec barrier is for
bounds-check speculation, where it is easy to get the kernel to do
something like speculatively branch to an arbitrary address.

Interrupt/syscall register sanitization is more handwavy. It could be
a bandaid for cases where the above speculation barrier was missed, for
example. But at some point, at least for syscalls, registers have to
contain some values influenced by userspace, so if we were paranoid we
would have to put barriers before every branch while any registers
contained a value from userspace.

A security option menu might be a good idea though. There are some other
build-time options, like rop protection, that we might want to add.

Thanks,
Nick
Re: [PATCH v2 2/4] powerpc/64s: Clear gprs on interrupt routine entry on Book3S
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote: > Zero user state in gprs (assign to zero) to reduce the influence of user > registers on speculation within kernel syscall handlers. Clears occur > at the very beginning of the sc and scv 0 interrupt handlers, with > restores occurring following the execution of the syscall handler. > > Zero GPRS r0, r2-r11, r14-r31, on entry into the kernel for all > other interrupt sources. The remaining gprs are overwritten by > entry macros to interrupt handlers, irrespective of whether or not a > given handler consumes these register values. > > Prior to this commit, r14-r31 are restored on a per-interrupt basis at > exit, but now they are always restored on 64bit Book3S. Remove explicit > REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear > user registers on interrupt, and continue to depend on the return value > of interrupt_exit_user_prepare to determine whether or not to restore > non-volatiles. > > The mmap_bench benchmark in selftests should rapidly invoke pagefaults. > See ~0.8% performance regression with this mitigation, but this > indicates the worst-case performance due to heavier-weight interrupt > handlers. This mitigation is able to be enabled/disabled through > CONFIG_INTERRUPT_SANITIZE_REGISTERS. I think it looks good. You could put those macros into a .h file shared by exceptions-64s.S and interrupt_64.S. Also interrupt_64.S could use the HANDLER_RESTORE_NVGPRS macro to kill a few ifdefs I think? The IMSR_R12 change *could* be done in a separate patch, if you're doing another spin... sorry for the late feedback. Reviewed-by: Nicholas Piggin > > Signed-off-by: Rohan McLure > --- > Resubmitting patches as their own series after v6 partially merged: > Link: > https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ > > v2: REST_NVGPRS should be conditional on mitigation in scv handler. 
Fix > improper multi-line preprocessor macro in interrupt_64.S > --- > arch/powerpc/kernel/exceptions-64s.S | 47 +- > arch/powerpc/kernel/interrupt_64.S | 36 > 2 files changed, 74 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64s.S > b/arch/powerpc/kernel/exceptions-64s.S > index 651c36b056bd..0605018762d1 100644 > --- a/arch/powerpc/kernel/exceptions-64s.S > +++ b/arch/powerpc/kernel/exceptions-64s.S > @@ -21,6 +21,19 @@ > #include > #include > > +/* > + * macros for handling user register sanitisation > + */ > +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS > +#define SANITIZE_ZEROIZE_NVGPRS()ZEROIZE_NVGPRS() > +#define SANITIZE_RESTORE_NVGPRS()REST_NVGPRS(r1) > +#define HANDLER_RESTORE_NVGPRS() > +#else > +#define SANITIZE_ZEROIZE_NVGPRS() > +#define SANITIZE_RESTORE_NVGPRS() > +#define HANDLER_RESTORE_NVGPRS() REST_NVGPRS(r1) > +#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */ > + > /* > * Following are fixed section helper macros. > * > @@ -111,6 +124,7 @@ name: > #define ISTACK .L_ISTACK_\name\() /* Set regular kernel > stack */ > #define __ISTACK(name) .L_ISTACK_ ## name > #define IKUAP.L_IKUAP_\name\() /* Do KUAP lock */ > +#define IMSR_R12 .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */ > > #define INT_DEFINE_BEGIN(n) \ > .macro int_define_ ## n name > @@ -176,6 +190,9 @@ do_define_int n > .ifndef IKUAP > IKUAP=1 > .endif > + .ifndef IMSR_R12 > + IMSR_R12=0 > + .endif > .endm > > /* > @@ -502,6 +519,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text) > std r10,0(r1) /* make stack chain pointer */ > std r0,GPR0(r1) /* save r0 in stackframe*/ > std r10,GPR1(r1)/* save r1 in stackframe*/ > + ZEROIZE_GPR(0) > > /* Mark our [H]SRRs valid for return */ > li r10,1 > @@ -544,8 +562,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) > std r9,GPR11(r1) > std r10,GPR12(r1) > std r11,GPR13(r1) > + .if !IMSR_R12 > + ZEROIZE_GPRS(9, 12) > + .else > + ZEROIZE_GPRS(9, 11) > + .endif > > SAVE_NVGPRS(r1) > + SANITIZE_ZEROIZE_NVGPRS() > > .if IDAR > 
.if IISIDE > @@ -577,8 +601,8 @@ BEGIN_FTR_SECTION > END_FTR_SECTION_IFSET(CPU_FTR_CFAR) > ld r10,IAREA+EX_CTR(r13) > std r10,_CTR(r1) > - std r2,GPR2(r1) /* save r2 in stackframe*/ > - SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe */ > + SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe */ > + ZEROIZE_GPRS(2, 8) > mflrr9 /* Get LR, later save to stack */ > LOAD_PACA_TOC() /* get kernel TOC into r2 */ > std r9,_LINK(r1) > @@ -696,6 +720,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR) >
Re: [PATCH v2 3/4] powerpc/64e: Clear gprs on interrupt routine entry on Book3E
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote: > Zero GPRS r14-r31 on entry into the kernel for interrupt sources to > limit influence of user-space values in potential speculation gadgets. > Prior to this commit, all other GPRS are reassigned during the common > prologue to interrupt handlers and so need not be zeroised explicitly. > > This may be done safely, without loss of register state prior to the > interrupt, as the common prologue saves the initial values of > non-volatiles, which are unconditionally restored in interrupt_64.S. In the case of ret_from_crit_except and ret_from_mc_except, it looks like those are restored by ret_from_level_except, so that's fine. And fast_interrupt_return you added NVGPRS restore in the previous patch too. Maybe actually you could move that interrupt_64.h code that applies to both 64s and 64e in patch 1. So then the 64s/e enablement patches are independent and apply to exactly that subarch. But code-wise I think this looks good. Reviewed-by: Nicholas Piggin > Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS. > > Signed-off-by: Rohan McLure > --- > Resubmitting patches as their own series after v6 partially merged: > Link: > https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/ > --- > arch/powerpc/kernel/exceptions-64e.S | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/exceptions-64e.S > b/arch/powerpc/kernel/exceptions-64e.S > index 2f68fb2ee4fc..91d8019123c2 100644 > --- a/arch/powerpc/kernel/exceptions-64e.S > +++ b/arch/powerpc/kernel/exceptions-64e.S > @@ -358,6 +358,11 @@ ret_from_mc_except: > std r14,PACA_EXMC+EX_R14(r13); \ > std r15,PACA_EXMC+EX_R15(r13) > > +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS > +#define SANITIZE_ZEROIZE_NVGPRS()ZEROIZE_NVGPRS() > +#else > +#define SANITIZE_ZEROIZE_NVGPRS() > +#endif Could possibly share these macros. > > /* Core exception code for all exceptions except TLB misses. 
*/ > #define EXCEPTION_COMMON_LVL(n, scratch, excf) > \ > @@ -394,7 +399,8 @@ exc_##n##_common: > \ > std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */ \ > std r3,_TRAP(r1); /* set trap number */ \ > std r0,RESULT(r1); /* clear regs->result */\ > - SAVE_NVGPRS(r1); > + SAVE_NVGPRS(r1);\ > + SANITIZE_ZEROIZE_NVGPRS(); /* minimise speculation influence */ > > #define EXCEPTION_COMMON(n) \ > EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN) > -- > 2.34.1
Re: [PATCH v2 4/4] powerpc/64s: Sanitise user registers on interrupt in pseries
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote:
> Cause pseries platforms to default to zeroising all potentially user-defined
> registers when entering the kernel by means of any interrupt source,
> reducing user-influence of the kernel and the likelihood of producing
> speculation gadgets.

For POWERNV as well?

Thanks,
Nick

>
> Signed-off-by: Rohan McLure
> ---
> Resubmitting patches as their own series after v6 partially merged:
> Link: https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9d3d20c6f365..2eb328b25e49 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -532,7 +532,7 @@ config HOTPLUG_CPU
>  config INTERRUPT_SANITIZE_REGISTERS
>  	bool "Clear gprs on interrupt arrival"
>  	depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
> -	default PPC_BOOK3E_64
> +	default PPC_BOOK3E_64 || PPC_PSERIES
>  	help
>  	  Reduce the influence of user register state on interrupt handlers and
>  	  syscalls through clearing user state from registers before handling
> --
> 2.34.1
Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > rtas_os_term() is called during panic. Its behavior depends on a > couple of conditions in the /rtas node of the device tree, the > traversal of which entails locking and local IRQ state changes. If the > kernel panics while devtree_lock is held, rtas_os_term() as currently > written could hang. Nice. > > Instead of discovering the relevant characteristics at panic time, > cache them in file-static variables at boot. Note the lookup for > "ibm,extended-os-term" is converted to of_property_read_bool() since > it is a boolean property, not a RTAS function token. Small nit, but you could do that at the query site unless you were going to start using ibm,os-term without the extended capability. Reviewed-by: Nicholas Piggin > > Signed-off-by: Nathan Lynch > --- > arch/powerpc/kernel/rtas.c | 14 +++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index c12dd5ed5e00..81e4996012b7 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -947,6 +947,8 @@ void __noreturn rtas_halt(void) > > /* Must be in the RMO region, so we place it here */ > static char rtas_os_term_buf[2048]; > +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE; > +static bool ibm_extended_os_term; > > void rtas_os_term(char *str) > { > @@ -958,14 +960,13 @@ void rtas_os_term(char *str) >* this property may terminate the partition which we want to avoid >* since it interferes with panic_timeout. 
>*/ > - if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term") || > - RTAS_UNKNOWN_SERVICE == rtas_token("ibm,extended-os-term")) > + if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE || !ibm_extended_os_term) > return; > > snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str); > > do { > - status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL, > + status = rtas_call(ibm_os_term_token, 1, 1, NULL, > __pa(rtas_os_term_buf)); > } while (rtas_busy_delay(status)); > > @@ -1335,6 +1336,13 @@ void __init rtas_initialize(void) > no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry); > rtas.entry = no_entry ? rtas.base : entry; > > + /* > + * Discover these now to avoid device tree lookups in the > + * panic path. > + */ > + ibm_os_term_token = rtas_token("ibm,os-term"); > + ibm_extended_os_term = of_property_read_bool(rtas.dev, > "ibm,extended-os-term"); > + > /* If RTAS was found, allocate the RMO buffer for it and look for >* the stop-self token if any >*/ > -- > 2.37.1
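For readers skimming the thread, the pattern in this patch (look things up once at boot, read only cached values at panic time) can be sketched in plain userspace C. This is only an illustration under stated assumptions: ex_slow_token_lookup() is a hypothetical stand-in for a device tree lookup that takes a lock, the token value 42 is made up, and none of the ex_* names exist in the kernel.

```c
#include <stdbool.h>
#include <string.h>

#define EX_UNKNOWN_SERVICE (-1)	/* stand-in for RTAS_UNKNOWN_SERVICE */

/* Counts how often the (hypothetical) slow lookup runs. */
int ex_lookups;

/* Hypothetical stand-in for a device tree lookup that would take a lock. */
static int ex_slow_token_lookup(const char *name)
{
	ex_lookups++;
	return strcmp(name, "ibm,os-term") == 0 ? 42 : EX_UNKNOWN_SERVICE;
}

/* Written once at init, only read afterwards. */
static int ex_os_term_token = EX_UNKNOWN_SERVICE;
static bool ex_extended_os_term;

/* Boot path: taking locks is fine here, so do the lookups once. */
void ex_init(bool has_extended_property)
{
	ex_os_term_token = ex_slow_token_lookup("ibm,os-term");
	ex_extended_os_term = has_extended_property;
}

/*
 * Panic path: consults only the cached values, so it cannot block on
 * a lock held by the context that panicked.
 * Returns the token to call, or EX_UNKNOWN_SERVICE to skip the call.
 */
int ex_os_term_token_for_panic(void)
{
	if (ex_os_term_token == EX_UNKNOWN_SERVICE || !ex_extended_os_term)
		return EX_UNKNOWN_SERVICE;
	return ex_os_term_token;
}
```

The panic-path function performs no lookups of its own, which is the property the commit message is after.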
Re: [PATCH 04/13] powerpc/rtas: avoid scheduling in rtas_os_term()
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> It's unsafe to use rtas_busy_delay() to handle a busy status from
> the ibm,os-term RTAS function in rtas_os_term():
>
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 2, expected: 0
> CPU: 7 PID: 1 Comm: swapper/0 Tainted: G D 6.0.0-rc5-02182-gf8553a572277-dirty #9
> Call Trace:
> [c7b8f000] [c1337110] dump_stack_lvl+0xb4/0x110 (unreliable)
> [c7b8f040] [c02440e4] __might_resched+0x394/0x3c0
> [c7b8f0e0] [c004f680] rtas_busy_delay+0x120/0x1b0
> [c7b8f100] [c0052d04] rtas_os_term+0xb8/0xf4
> [c7b8f180] [c01150fc] pseries_panic+0x50/0x68
> [c7b8f1f0] [c0036354] ppc_panic_platform_handler+0x34/0x50
> [c7b8f210] [c02303c4] notifier_call_chain+0xd4/0x1c0
> [c7b8f2b0] [c02306cc] atomic_notifier_call_chain+0xac/0x1c0
> [c7b8f2f0] [c01d62b8] panic+0x228/0x4d0
> [c7b8f390] [c01e573c] do_exit+0x140c/0x1420
> [c7b8f480] [c01e586c] make_task_dead+0xdc/0x200
>
> Use rtas_busy_delay_time() instead, which signals without side effects
> whether to attempt the ibm,os-term RTAS call again.

rtas_busy_delay should probably be renamed to rtas_busy_sleep, to make
it self-documenting that it can schedule. You could then add a
rtas_busy_delay which doesn't sleep, which a few other places could
use...

But that's a bigger change and there is precedent for using this call
this way, so looks okay to me. Maybe you could open-code an mdelay
though, although I guess firmware should be tolerant of calling it in
a loop.

Reviewed-by: Nicholas Piggin

>
> Signed-off-by: Nathan Lynch
> ---
>  arch/powerpc/kernel/rtas.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 81e4996012b7..51f0508593a7 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -965,10 +965,15 @@ void rtas_os_term(char *str)
>
>  	snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
>
> +	/*
> +	 * Keep calling as long as RTAS returns a "try again" status,
> +	 * but don't use rtas_busy_delay(), which potentially
> +	 * schedules.
> +	 */
>  	do {
>  		status = rtas_call(ibm_os_term_token, 1, 1, NULL,
>  				   __pa(rtas_os_term_buf));
> -	} while (rtas_busy_delay(status));
> +	} while (rtas_busy_delay_time(status));
>
>  	if (status != 0)
>  		printk(KERN_EMERG "ibm,os-term call failed %d\n", status);
> --
> 2.37.1
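To illustrate why the replacement loop is safe in atomic context, here is a rough userspace sketch. It assumes the helper is a pure function that maps a status to a suggested wait in milliseconds (zero meaning "not busy") and that the loop uses the result only as a retry condition. The status constants are assumptions modelled on the RTAS busy and extended-delay codes, ex_firmware_call() is hypothetical, and no ex_* name is real kernel API.

```c
#define EX_BUSY			(-2)	/* assumed to mirror RTAS_BUSY */
#define EX_EXTENDED_DELAY_MIN	9900	/* assumed extended-delay range */
#define EX_EXTENDED_DELAY_MAX	9905

/* Pure function: suggested wait in ms for a status, 0 if not busy. */
unsigned int ex_busy_delay_time(int status)
{
	unsigned int ms = 0;

	if (status == EX_BUSY) {
		ms = 1;
	} else if (status >= EX_EXTENDED_DELAY_MIN &&
		   status <= EX_EXTENDED_DELAY_MAX) {
		int order = status - EX_EXTENDED_DELAY_MIN;

		/* 990x means "wait about 10^x ms" */
		for (ms = 1; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/* Number of times the hypothetical firmware call was made. */
int ex_calls;

/* Hypothetical firmware call: reports busy twice, then succeeds. */
static int ex_firmware_call(void)
{
	return ++ex_calls < 3 ? EX_BUSY : 0;
}

/* Retry without sleeping: the helper's result only decides "again?". */
int ex_os_term(void)
{
	int status;

	do {
		status = ex_firmware_call();
	} while (ex_busy_delay_time(status));

	return status;
}
```

Nothing in the loop sleeps or schedules, so it spins until the call stops reporting a busy status, which matches the tradeoff discussed in the review.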
Re: [PATCH 11/13] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > Make do_enter_rtas() take a pointer to struct rtas_args and do the > __pa() conversion in one place instead of leaving it to callers. This > also makes it possible to introduce enter/exit tracepoints that access > the rtas_args struct fields. > > There's no apparent reason to force inlining of do_enter_rtas() > either, and it seems to bloat the code a bit. Let the compiler decide. Reviewed-by: Nicholas Piggin > > Signed-off-by: Nathan Lynch > --- > arch/powerpc/kernel/rtas.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index a88db3b3486f..198366d641d0 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -522,7 +522,7 @@ static const struct rtas_function > *rtas_token_to_function(s32 token) > /* This is here deliberately so it's only used in this file */ > void enter_rtas(unsigned long); > > -static inline void do_enter_rtas(unsigned long args) > +static void do_enter_rtas(struct rtas_args *args) > { > unsigned long msr; > > @@ -537,7 +537,7 @@ static inline void do_enter_rtas(unsigned long args) > > hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */ > > - enter_rtas(args); > + enter_rtas(__pa(args)); > > srr_regs_clobbered(); /* rtas uses SRRs, invalidate */ > } > @@ -908,7 +908,7 @@ static char *__fetch_rtas_last_error(char *altbuf) > save_args = rtas.args; > rtas.args = err_args; > > - do_enter_rtas(__pa(&rtas.args)); > + do_enter_rtas(&rtas.args); > > err_args = rtas.args; > rtas.args = save_args; > @@ -955,7 +955,7 @@ va_rtas_call_unlocked(struct rtas_args *args, int token, > int nargs, int nret, > for (i = 0; i < nret; ++i) > args->rets[i] = 0; > > - do_enter_rtas(__pa(args)); > + do_enter_rtas(args); > } > > void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int > nret, ...) 
> @@ -1731,7 +1731,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) > flags = lock_rtas(); > > rtas.args = args; > - do_enter_rtas(__pa(&rtas.args)); > + do_enter_rtas(&rtas.args); > args = rtas.args; > > /* A -1 return code indicates that the last command couldn't > -- > 2.37.1
[RFC PATCH 08/13] powerpc/dexcr: Add enforced userspace ROP protection config
The DEXCR Non-Privileged Hash Instruction Enable (NPHIE) aspect controls whether the hashst and hashchk instructions are treated as no-ops by the CPU. NPHIE behaviour per ISA 3.1B: 0: hashst and hashchk instructions are executed as no-ops (even when allowed by PCR) 1: hashst and hashchk instructions are executed normally (if allowed by PCR) Currently this aspect may be set per-process by prctl() or enforced globally by the hypervisor. Add a kernel config option PPC_USER_ROP_PROTECT to enforce DEXCR[NPHIE] globally regardless of prctl() or hypervisor. If set, don't report NPHIE as editable via prctl(), as the prctl() value can never take effect. Signed-off-by: Benjamin Gray --- arch/powerpc/Kconfig| 5 + arch/powerpc/kernel/dexcr.c | 15 +++ 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 699df27b0e2f..ba3458d07744 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -434,6 +434,11 @@ config PGTABLE_LEVELS default 2 if !PPC64 default 4 +config PPC_USER_ROP_PROTECT + bool + depends on PPC_BOOK3S_64 + default y + source "arch/powerpc/sysdev/Kconfig" source "arch/powerpc/platforms/Kconfig" diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c index 8239bcc92026..394140fc23aa 100644 --- a/arch/powerpc/kernel/dexcr.c +++ b/arch/powerpc/kernel/dexcr.c @@ -2,6 +2,7 @@ #include #include #include +#include #include #include #include @@ -18,8 +19,8 @@ #define DEFAULT_DEXCR 0 /* Allow process configuration of these by default */ -#define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \ - DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE) +static unsigned long dexcr_prctl_editable __ro_after_init = + DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE; /* * Lock to protect system DEXCR override from concurrent updates. 
@@ -83,6 +84,12 @@ static int __init dexcr_init(void) if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE)) update_userspace_system_dexcr(DEXCR_PRO_SBHE, spec_branch_hint_enable); + if (early_cpu_has_feature(CPU_FTR_DEXCR_NPHIE) && + IS_ENABLED(CONFIG_PPC_USER_ROP_PROTECT)) { + update_userspace_system_dexcr(DEXCR_PRO_NPHIE, 1); + dexcr_prctl_editable &= ~DEXCR_PRO_NPHIE; + } + return 0; } early_initcall(dexcr_init); @@ -131,7 +138,7 @@ static int dexcr_aspect_get(struct task_struct *task, unsigned int aspect) { int ret = 0; - if (aspect & DEXCR_PRCTL_EDITABLE) + if (aspect & dexcr_prctl_editable) ret |= PR_PPC_DEXCR_PRCTL; if (aspect & task->thread.dexcr_mask) { @@ -174,7 +181,7 @@ int dexcr_prctl_get(struct task_struct *task, unsigned long which) static int dexcr_aspect_set(struct task_struct *task, unsigned int aspect, unsigned long ctrl) { - if (!(aspect & DEXCR_PRCTL_EDITABLE)) + if (!(aspect & dexcr_prctl_editable)) return -ENXIO; /* Aspect is not allowed to be changed by prctl */ if (aspect & task->thread.dexcr_forced) -- 2.38.1
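The effect of clearing NPHIE from the editable mask can be sketched as follows. This is an illustrative userspace model only: the bit values are hypothetical placeholders rather than the real DEXCR layout, and the ex_* names are invented for the example.

```c
#include <errno.h>

/* Hypothetical bit values for illustration; NOT the real DEXCR layout. */
#define EX_SBHE		(1UL << 0)
#define EX_IBRTPD	(1UL << 1)
#define EX_SRAPD	(1UL << 2)
#define EX_NPHIE	(1UL << 3)

/* Aspects a process may change through prctl(); all of them by default. */
static unsigned long ex_prctl_editable =
	EX_SBHE | EX_IBRTPD | EX_SRAPD | EX_NPHIE;

/* Aspects forced on for every process by the (modelled) kernel. */
static unsigned long ex_system_enforced;

/* Init-time step, as when the enforcing config option is enabled. */
void ex_enforce_nphie(void)
{
	ex_system_enforced |= EX_NPHIE;
	ex_prctl_editable &= ~EX_NPHIE;	/* prctl() could never take effect */
}

/* Setter check: refuse aspects that are not editable. */
int ex_aspect_set(unsigned long aspect)
{
	if (!(aspect & ex_prctl_editable))
		return -ENXIO;
	return 0;
}
```

Turning the compile-time macro into a variable is what lets the init code withdraw a single aspect while leaving the others editable.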
[RFC PATCH 04/13] powerpc/dexcr: Support userspace ROP protection
The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR to hold a key used in the hash calculation. This key should be different for each process to make it harder for a malicious process to recreate valid hash values for a victim process. Add support for storing a per-thread hash key, and setting/clearing HASHKEYR appropriately. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/book3s/64/kexec.h | 3 +++ arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/kernel/process.c | 12 4 files changed, 17 insertions(+) diff --git a/arch/powerpc/include/asm/book3s/64/kexec.h b/arch/powerpc/include/asm/book3s/64/kexec.h index 563baf94a962..163de935df28 100644 --- a/arch/powerpc/include/asm/book3s/64/kexec.h +++ b/arch/powerpc/include/asm/book3s/64/kexec.h @@ -24,6 +24,9 @@ static inline void reset_sprs(void) if (cpu_has_feature(CPU_FTR_ARCH_31)) mtspr(SPRN_DEXCR, 0); + if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) + mtspr(SPRN_HASHKEYR, 0); + /* Do we need isync()? 
We are going via a kexec reset */ isync(); } diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index c17ec1e44c86..2381217c95dc 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -264,6 +264,7 @@ struct thread_struct { unsigned long mmcr3; unsigned long sier2; unsigned long sier3; + unsigned long hashkeyr; #endif }; diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index cdd1f174c399..854664cf844f 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -384,6 +384,7 @@ #define SPRN_HRMOR 0x139 /* Real mode offset register */ #define SPRN_HSRR0 0x13A /* Hypervisor Save/Restore 0 */ #define SPRN_HSRR1 0x13B /* Hypervisor Save/Restore 1 */ +#define SPRN_HASHKEYR 0x1D4 /* Non-privileged hashst/hashchk key register */ #define SPRN_ASDR 0x330 /* Access segment descriptor register */ #define SPRN_DEXCR 0x33C /* Dynamic execution control register */ #define DEXCR_PRO_MASK(aspect) __MASK(63 - (32 + (aspect)))/* Aspect number to problem state aspect mask */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 17d26f652b80..4d7b0c7641d0 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1229,6 +1229,9 @@ static inline void restore_sprs(struct thread_struct *old_thread, old_thread->tidr != new_thread->tidr) mtspr(SPRN_TIDR, new_thread->tidr); + if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) + mtspr(SPRN_HASHKEYR, new_thread->hashkeyr); + if (cpu_has_feature(CPU_FTR_ARCH_31)) { unsigned long new_dexcr = get_thread_dexcr(new_thread); @@ -1818,6 +1821,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args) childregs->ppr = DEFAULT_PPR; p->thread.tidr = 0; +#endif +#ifdef CONFIG_PPC_BOOK3S_64 + if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) + p->thread.hashkeyr = current->thread.hashkeyr; #endif /* * Run with the current AMR value of the kernel @@ -1947,6 +1954,11 @@ 
void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp) current->thread.load_tm = 0; #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */ #ifdef CONFIG_PPC_BOOK3S_64 + if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) { + current->thread.hashkeyr = get_random_long(); + mtspr(SPRN_HASHKEYR, current->thread.hashkeyr); + } + if (cpu_has_feature(CPU_FTR_ARCH_31)) mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread)); #endif /* CONFIG_PPC_BOOK3S_64 */ -- 2.38.1
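The key lifecycle in this patch (fresh key on exec, inherited key on fork/clone, key reloaded on context switch) can be modelled in userspace roughly as below. Everything here is an assumption made for illustration: the SPR is simulated by a global, ex_random_u64() is a toy xorshift generator standing in for get_random_long() and is not suitable for real keys, and the ex_* names are hypothetical.

```c
#include <stdint.h>

struct ex_thread {
	uint64_t hashkeyr;	/* per-thread copy of the hash key */
};

uint64_t ex_spr_hashkeyr;	/* simulated per-CPU HASHKEYR register */

/* Toy stand-in for get_random_long(): xorshift64, NOT secure. */
static uint64_t ex_random_u64(void)
{
	static uint64_t s = 0x9e3779b97f4a7c15ULL;

	s ^= s << 13;
	s ^= s >> 7;
	s ^= s << 17;
	return s;
}

/* exec: the new program image gets a fresh key, loaded immediately. */
void ex_start_thread(struct ex_thread *t)
{
	t->hashkeyr = ex_random_u64();
	ex_spr_hashkeyr = t->hashkeyr;
}

/* fork/clone: the child starts with a copy of the parent's key. */
void ex_copy_thread(struct ex_thread *child, const struct ex_thread *parent)
{
	child->hashkeyr = parent->hashkeyr;
}

/* Context switch: load the incoming thread's key into the register. */
void ex_restore_sprs(const struct ex_thread *next)
{
	ex_spr_hashkeyr = next->hashkeyr;
}
```

One plausible reading of the inherit-on-fork choice is that a child returns through stack frames whose hashes were stored under the parent's key, so a new key at fork time would make those checks fail; the patch description itself does not state this.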
[RFC PATCH 03/13] powerpc/dexcr: Handle hashchk exception
Recognise and pass the appropriate signal to the user program when a hashchk instruction triggers. This is independent of allowing configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect regardless of the kernel. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/include/asm/processor.h | 6 ++ arch/powerpc/kernel/dexcr.c | 22 ++ arch/powerpc/kernel/traps.c | 6 ++ 4 files changed, 35 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 21e33e46f4b8..89b316466ed1 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -215,6 +215,7 @@ #define OP_31_XOP_STFSX663 #define OP_31_XOP_STFSUX695 #define OP_31_XOP_STFDX 727 +#define OP_31_XOP_HASHCHK 754 #define OP_31_XOP_STFDUX759 #define OP_31_XOP_LHBRX 790 #define OP_31_XOP_LFIWAX855 diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 0a8a793b8b8b..c17ec1e44c86 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -448,10 +448,16 @@ void *exit_vmx_ops(void *dest); #ifdef CONFIG_PPC_BOOK3S_64 +bool is_hashchk_trap(struct pt_regs const *regs); unsigned long get_thread_dexcr(struct thread_struct const *t); #else +static inline bool is_hashchk_trap(struct pt_regs const *regs) +{ + return false; +} + static inline unsigned long get_thread_dexcr(struct thread_struct const *t) { return 0; diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c index 32a0a69ff638..11515e67afac 100644 --- a/arch/powerpc/kernel/dexcr.c +++ b/arch/powerpc/kernel/dexcr.c @@ -3,6 +3,9 @@ #include #include +#include +#include +#include #include #include @@ -19,6 +22,25 @@ static int __init dexcr_init(void) } early_initcall(dexcr_init); +bool is_hashchk_trap(struct pt_regs const *regs) +{ + ppc_inst_t insn; + + if (!cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) + return false; + + if (get_user_instr(insn, (void 
__user *)regs->nip)) { + WARN_ON(1); + return false; + } + + if (ppc_inst_primary_opcode(insn) == 31 && + get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK) + return true; + + return false; +} + unsigned long get_thread_dexcr(struct thread_struct const *t) { return DEFAULT_DEXCR; diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 9bdd79aa51cf..b83f5b382f24 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -1516,6 +1516,12 @@ static void do_program_check(struct pt_regs *regs) return; } } + + if (user_mode(regs) && is_hashchk_trap(regs)) { + _exception(SIGILL, regs, ILL_ILLOPN, regs->nip); + return; + } + _exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip); return; } -- 2.38.1
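The decode step in is_hashchk_trap() can be tried out in userspace with a small sketch. It assumes the usual X-form layout, with the primary opcode in the top six bits of the 32-bit instruction word and the 10-bit extended opcode just above the lowest bit; the value 754 comes from the patch, while the ex_* helpers are hypothetical stand-ins for ppc_inst_primary_opcode() and get_xop().

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_OP_31_XOP_HASHCHK	754	/* extended opcode, from the patch */

/* Primary opcode: the top 6 bits of the instruction word. */
static inline unsigned int ex_primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* X-form extended opcode: 10 bits, just above the lowest bit. */
static inline unsigned int ex_xop(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}

/* True if the word decodes as opcode 31 with the hashchk XO. */
bool ex_is_hashchk(uint32_t insn)
{
	return ex_primary_opcode(insn) == 31 &&
	       ex_xop(insn) == EX_OP_31_XOP_HASHCHK;
}
```

Register and offset fields sit between the two opcode fields, so they do not affect the match. An unconditional trap word (0x7fe00008) shares primary opcode 31 but has a different extended opcode and is correctly rejected.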
[RFC PATCH 01/13] powerpc/book3s: Add missing include
The functions here use struct thread_struct fields, so need to import the full definition from . The header that defines current only forward declares struct thread_struct. Failing to include this header leads to a compilation error when a translation unit does not also include indirectly. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/book3s/64/kup.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h index 54cf46808157..84c09e546115 100644 --- a/arch/powerpc/include/asm/book3s/64/kup.h +++ b/arch/powerpc/include/asm/book3s/64/kup.h @@ -194,6 +194,7 @@ #else /* !__ASSEMBLY__ */ #include +#include DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); -- 2.38.1
[RFC PATCH 02/13] powerpc: Add initial Dynamic Execution Control Register (DEXCR) support
ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It is a per-cpu register that allows control over various CPU behaviours including branch hint usage, indirect branch speculation, and hashst/hashchk support. Though introduced in 3.1B, no CPUs using 3.1 were released, so CPU_FTR_ARCH_31 is used to determine support for the register itself. Support for each DEXCR bit (aspect) is reported separately by the firmware. Add various definitions and basic support for the DEXCR in the kernel. Right now it just initialises and maintains the DEXCR on process creation/swap, and clears it in reset_sprs(). Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/book3s/64/kexec.h | 3 +++ arch/powerpc/include/asm/cputable.h| 8 ++- arch/powerpc/include/asm/processor.h | 13 +++ arch/powerpc/include/asm/reg.h | 6 ++ arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/dexcr.c| 25 ++ arch/powerpc/kernel/dt_cpu_ftrs.c | 4 arch/powerpc/kernel/process.c | 13 ++- arch/powerpc/kernel/prom.c | 4 9 files changed, 75 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/kernel/dexcr.c diff --git a/arch/powerpc/include/asm/book3s/64/kexec.h b/arch/powerpc/include/asm/book3s/64/kexec.h index d4b9d476ecba..563baf94a962 100644 --- a/arch/powerpc/include/asm/book3s/64/kexec.h +++ b/arch/powerpc/include/asm/book3s/64/kexec.h @@ -21,6 +21,9 @@ static inline void reset_sprs(void) plpar_set_ciabr(0); } + if (cpu_has_feature(CPU_FTR_ARCH_31)) + mtspr(SPRN_DEXCR, 0); + /* Do we need isync()? 
We are going via a kexec reset */ isync(); } diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index 757dbded11dc..03bc192f2d8b 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -192,6 +192,10 @@ static inline void cpu_feature_keys_init(void) { } #define CPU_FTR_P9_RADIX_PREFETCH_BUG LONG_ASM_CONST(0x0002) #define CPU_FTR_ARCH_31 LONG_ASM_CONST(0x0004) #define CPU_FTR_DAWR1 LONG_ASM_CONST(0x0008) +#define CPU_FTR_DEXCR_SBHE LONG_ASM_CONST(0x0010) +#define CPU_FTR_DEXCR_IBRTPD LONG_ASM_CONST(0x0020) +#define CPU_FTR_DEXCR_SRAPDLONG_ASM_CONST(0x0040) +#define CPU_FTR_DEXCR_NPHIELONG_ASM_CONST(0x0080) #ifndef __ASSEMBLY__ @@ -451,7 +455,9 @@ static inline void cpu_feature_keys_init(void) { } CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ - CPU_FTR_DAWR | CPU_FTR_DAWR1) + CPU_FTR_DAWR | CPU_FTR_DAWR1 | \ + CPU_FTR_DEXCR_SBHE | CPU_FTR_DEXCR_IBRTPD | CPU_FTR_DEXCR_SRAPD | \ + CPU_FTR_DEXCR_NPHIE) #define CPU_FTRS_CELL (CPU_FTR_LWSYNC | \ CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \ CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \ diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 631802999d59..0a8a793b8b8b 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -446,6 +446,19 @@ int exit_vmx_usercopy(void); int enter_vmx_ops(void); void *exit_vmx_ops(void *dest); +#ifdef CONFIG_PPC_BOOK3S_64 + +unsigned long get_thread_dexcr(struct thread_struct const *t); + +#else + +static inline unsigned long get_thread_dexcr(struct thread_struct const *t) +{ + return 0; +} + +#endif /* CONFIG_PPC_BOOK3S_64 */ + #endif /* __KERNEL__ */ #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_PROCESSOR_H */ diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 1e8b2e04e626..cdd1f174c399 100644 --- 
a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -385,6 +385,12 @@ #define SPRN_HSRR0 0x13A /* Hypervisor Save/Restore 0 */ #define SPRN_HSRR1 0x13B /* Hypervisor Save/Restore 1 */ #define SPRN_ASDR 0x330 /* Access segment descriptor register */ +#define SPRN_DEXCR 0x33C /* Dynamic execution control register */ +#define DEXCR_PRO_MASK(aspect) __MASK(63 - (32 + (aspect)))/* Aspect number to problem state aspect mask */ +#define DEXCR_PRO_SBHE DEXCR_PRO_MASK(0) /* Speculative Branch Hint Enable */ +#define DEXCR_PRO_IBRTPD DEXCR_PRO_MASK(3) /* Indirect Branch Recurrent Target Prediction Disable */ +#define DEXCR_PRO_SRAPD DEXCR_PRO_MASK(4) /* Subroutine Return Address Prediction Disable */ +#define DEXCR_PRO_NPHIE DEXCR_PRO_MASK(5) /* Non-Privileged Hash Instruction Enable */ #define SPRN_IC
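As a worked example of the DEXCR_PRO_MASK() arithmetic: the Power ISA numbers bits from the most significant end, so problem-state aspect n is bit 32 + n of the 64-bit register, which is position 63 - (32 + n) counting from the least significant bit. The sketch below assumes the kernel's __MASK(x) is a single bit at position x; EX_MASK and the other ex_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed equivalent of the kernel's __MASK(): one bit at position x. */
#define EX_MASK(x)			(UINT64_C(1) << (x))

/* Same formula as DEXCR_PRO_MASK() in the patch. */
#define EX_DEXCR_PRO_MASK(aspect)	EX_MASK(63 - (32 + (aspect)))

/* Function wrapper so the value can be inspected at run time. */
uint64_t ex_dexcr_pro_mask(unsigned int aspect)
{
	return EX_DEXCR_PRO_MASK(aspect);
}

/*
 * Aspect numbers from the patch and the masks the formula gives:
 *   SBHE   (0) -> 0x80000000
 *   IBRTPD (3) -> 0x10000000
 *   SRAPD  (4) -> 0x08000000
 *   NPHIE  (5) -> 0x04000000
 */
```

All four masks land in the low 32 bits, consistent with the formula addressing the problem-state half of the register.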
[RFC PATCH 09/13] selftests/powerpc: Add more utility macros
Adds more assertion variants to provide more context behind why a
failure occurred. The SIGSAFE_FAIL_* variants are to allow safely
asserting conditions in a signal handler (though we are about to exit,
so it's unlikely to run into an issue with regular FAIL_IF_EXIT).

Also adds an ARRAY_SIZE macro.

These will be used by the following DEXCR selftests.

Signed-off-by: Benjamin Gray
---
 .../testing/selftests/powerpc/include/utils.h | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h
index 95f3a24a4569..b03d2192c6f6 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -9,12 +9,19 @@
 #define __cacheline_aligned __attribute__((aligned(128)))

 #include
+#include
 #include
+#include
+#include
 #include
 #include
 #include
 #include "reg.h"

+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
 /* Avoid headaches with PRI?64 - just use %ll? always */
 typedef unsigned long long u64;
 typedef signed long long s64;
@@ -111,6 +118,16 @@ do { \
 	} \
 } while (0)

+#define FAIL_IF_MSG(x, msg) \
+do { \
+	if ((x)) { \
+		fprintf(stderr, \
+		"[FAIL] Test FAILED on line %d: %s\n", \
+		__LINE__, msg); \
+		return 1; \
+	} \
+} while (0)
+
 #define FAIL_IF_EXIT(x) \
 do { \
 	if ((x)) { \
@@ -120,6 +137,16 @@ do { \
 	} \
 } while (0)

+#define FAIL_IF_EXIT_MSG(x, msg) \
+do { \
+	if ((x)) { \
+		fprintf(stderr, \
+		"[FAIL] Test FAILED on line %d: %s\n", \
+		__LINE__, msg); \
+		_exit(1); \
+	} \
+} while (0)
+
 /* The test harness uses this, yes it's gross */
 #define MAGIC_SKIP_RETURN_VALUE 99

@@ -149,6 +176,23 @@ do { \
 	ssize_t nbytes __attribute__((unused)); \
 	nbytes = write(STDERR_FILENO, msg, strlen(msg)); })

+#define SIGSAFE_FAIL_IF_EXIT(x) \
+do { \
+	if ((x)) { \
+		sigsafe_err("[FAIL] Test FAILED on line " str(__LINE__) "\n"); \
+		_exit(1); \
+	} \
+} while (0)
+
+#define SIGSAFE_FAIL_IF_EXIT_MSG(x, msg) \
+do { \
+	if ((x)) { \
+		sigsafe_err("[FAIL] Test FAILED on line " \
+		str(__LINE__) ": " msg "\n"); \
+		_exit(1); \
+	} \
+} while (0)
+
 /* POWER9 feature */
 #ifndef PPC_FEATURE2_ARCH_3_00
 #define PPC_FEATURE2_ARCH_3_00 0x0080
--
2.38.1
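As a rough illustration of how the new macros might be used in a test body (not taken from the series), the sketch below copies ARRAY_SIZE and FAIL_IF_MSG from the patch and applies them in two hypothetical functions, sketch_test_table_sorted() and sketch_test_always_fails(). FAIL_IF_MSG returns 1 from the enclosing function, so it only makes sense inside functions returning int.

```c
#include <stddef.h>
#include <stdio.h>

/* Copied from the patch above. */
#ifndef ARRAY_SIZE
# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif

#define FAIL_IF_MSG(x, msg)					\
do {								\
	if ((x)) {						\
		fprintf(stderr,					\
			"[FAIL] Test FAILED on line %d: %s\n",	\
			__LINE__, msg);				\
		return 1;					\
	}							\
} while (0)

/* Hypothetical test body: returns 0 on success, 1 on the first failure. */
static int sketch_test_table_sorted(void)
{
	static const int table[] = { 1, 2, 3, 5, 8 };

	for (size_t i = 1; i < ARRAY_SIZE(table); i++)
		FAIL_IF_MSG(table[i - 1] > table[i], "table is not sorted");

	return 0;
}

/* Hypothetical failing case, to show the early return and the message. */
static int sketch_test_always_fails(void)
{
	FAIL_IF_MSG(1, "deliberate failure");
	return 0;
}
```

The _EXIT variants behave the same way but call _exit(1) instead of returning, so they can also be used in helpers that return void.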
[RFC PATCH 06/13] powerpc/dexcr: Add prctl implementation
Adds an initial prctl interface implementation. Unprivileged processes
can query the current prctl setting, including whether an aspect is
implemented by the hardware or is permitted to be modified by a setter
prctl. Editable aspects can be changed by a CAP_SYS_ADMIN privileged
process.

The prctl setting represents what the process itself has requested, and
does not account for any overrides. Either the kernel or a hypervisor
may enforce a different setting for an aspect.

Userspace can access a readonly view of the current DEXCR via SPR 812,
and a readonly view of the aspects enforced by the hypervisor via
SPR 455. A bitwise OR of these two SPRs will give the effective
DEXCR aspect state of the process.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/include/asm/processor.h | 13 +++
 arch/powerpc/kernel/dexcr.c | 133 ++-
 arch/powerpc/kernel/process.c | 6 ++
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 2381217c95dc..4c995258f668 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -265,6 +265,9 @@ struct thread_struct {
 	unsigned long sier2;
 	unsigned long sier3;
 	unsigned long hashkeyr;
+	unsigned int dexcr_override;
+	unsigned int dexcr_mask;
+	unsigned int dexcr_forced;
 #endif
 };

@@ -338,6 +341,16 @@ extern int set_endian(struct task_struct *tsk, unsigned int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);

+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define PPC_GET_DEXCR_ASPECT(tsk, asp) dexcr_prctl_get((tsk), (asp))
+#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) dexcr_prctl_set((tsk), (asp), (val))
+
+int dexcr_prctl_get(struct task_struct *tsk, unsigned long asp);
+int dexcr_prctl_set(struct task_struct *tsk, unsigned long asp, unsigned long val);
+
+#endif
+
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 11515e67afac..9290beed722a 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,5 +1,8 @@
 #include
+#include
 #include
+#include
+#include
 #include
 #include
@@ -11,6 +14,10 @@

 #define DEFAULT_DEXCR 0

+/* Allow process configuration of these by default */
+#define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \
+			      DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE)
+
 static int __init dexcr_init(void)
 {
 	if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
@@ -43,5 +50,129 @@ bool is_hashchk_trap(struct pt_regs const *regs)

 unsigned long get_thread_dexcr(struct thread_struct const *t)
 {
-	return DEFAULT_DEXCR;
+	unsigned long dexcr = DEFAULT_DEXCR;
+
+	/* Apply prctl overrides */
+	dexcr = (dexcr & ~t->dexcr_mask) | t->dexcr_override;
+
+	return dexcr;
+}
+
+static void update_dexcr_on_cpu(void *info)
+{
+	mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+}
+
+static int dexcr_aspect_get(struct task_struct *task, unsigned int aspect)
+{
+	int ret = 0;
+
+	if (aspect & DEXCR_PRCTL_EDITABLE)
+		ret |= PR_PPC_DEXCR_PRCTL;
+
+	if (aspect & task->thread.dexcr_mask) {
+		if (aspect & task->thread.dexcr_override) {
+			if (aspect & task->thread.dexcr_forced)
+				ret |= PR_PPC_DEXCR_FORCE_SET_ASPECT;
+			else
+				ret |= PR_PPC_DEXCR_SET_ASPECT;
+		} else {
+			ret |= PR_PPC_DEXCR_CLEAR_ASPECT;
+		}
+	}
+
+	return ret;
+}
+
+int dexcr_prctl_get(struct task_struct *task, unsigned long which)
+{
+	switch (which) {
+	case PR_PPC_DEXCR_SBHE:
+		if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+			return -ENODEV;
+		return dexcr_aspect_get(task, DEXCR_PRO_SBHE);
+	case PR_PPC_DEXCR_IBRTPD:
+		if (!cpu_has_feature(CPU_FTR_DEXCR_IBRTPD))
+			return -ENODEV;
+		return dexcr_aspect_get(task, DEXCR_PRO_IBRTPD);
+	case PR_PPC_DEXCR_SRAPD:
+		if (!cpu_has_feature(CPU_FTR_DEXCR_SRAPD))
+			return -ENODEV;
+		return dexcr_aspect_get(task, DEXCR_PRO_SRAPD);
+	case PR_PPC_DEXCR_NPHIE:
+		if (!cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+			return -ENODEV;
+		return dexcr_aspect_get(task, DEXCR_PRO_NPHIE);
+	default:
+		return -ENODEV;
+	}
+}
+
+static int dexcr_aspect_set(struct task_struct *task, unsigned int aspect, unsigned long ctrl)
+{
+	if (!(aspect & DEXCR_
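To make the mask/override/forced bookkeeping in this patch easier to follow, here is a standalone userspace model. It is only a sketch, not the kernel code: struct sketch_thread stands in for the three new thread_struct fields, the flag values are the ones proposed in patch 05/13, and the `editable` parameter plays the role of DEXCR_PRCTL_EDITABLE.

```c
#include <stdint.h>

/* Flag values as proposed in patch 05/13 of this RFC. */
#define PR_PPC_DEXCR_PRCTL		(1 << 0)
#define PR_PPC_DEXCR_SET_ASPECT		(1 << 1)
#define PR_PPC_DEXCR_FORCE_SET_ASPECT	(1 << 2)
#define PR_PPC_DEXCR_CLEAR_ASPECT	(1 << 3)

/* Hypothetical stand-in for the fields added to thread_struct. */
struct sketch_thread {
	uint32_t dexcr_override;	/* value to apply for masked aspects */
	uint32_t dexcr_mask;		/* aspects the process has configured */
	uint32_t dexcr_forced;		/* aspects that were force-set */
};

/* Mirrors get_thread_dexcr(): masked aspects take their override value. */
static uint32_t sketch_thread_dexcr(const struct sketch_thread *t, uint32_t base)
{
	return (base & ~t->dexcr_mask) | t->dexcr_override;
}

/* Mirrors dexcr_aspect_get(): build the status bits for one aspect mask. */
static int sketch_aspect_status(const struct sketch_thread *t,
				uint32_t aspect, uint32_t editable)
{
	int ret = 0;

	if (aspect & editable)
		ret |= PR_PPC_DEXCR_PRCTL;

	if (aspect & t->dexcr_mask) {
		if (aspect & t->dexcr_override)
			ret |= (aspect & t->dexcr_forced) ?
				PR_PPC_DEXCR_FORCE_SET_ASPECT :
				PR_PPC_DEXCR_SET_ASPECT;
		else
			ret |= PR_PPC_DEXCR_CLEAR_ASPECT;
	}

	return ret;
}
```

An aspect that was never configured has its mask bit clear, so it reports neither SET nor CLEAR and simply keeps the default value.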
[RFC PATCH 11/13] selftests/powerpc: Add DEXCR prctl, sysctl interface test
Test the prctl and sysctl interfaces of the DEXCR. This adds a new capabilities util for getting and setting CAP_SYS_ADMIN. Adding this avoids depending on an external libcap package. There is a similar implementation (and reason) in the tools/testing/selftests/bpf subtree but there's no obvious place to move it for sharing. Signed-off-by: Benjamin Gray --- .../selftests/powerpc/dexcr/.gitignore| 1 + .../testing/selftests/powerpc/dexcr/Makefile | 4 +- tools/testing/selftests/powerpc/dexcr/cap.c | 72 ++ tools/testing/selftests/powerpc/dexcr/cap.h | 18 ++ tools/testing/selftests/powerpc/dexcr/dexcr.h | 2 + .../selftests/powerpc/dexcr/dexcr_test.c | 241 ++ 6 files changed, 336 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.c create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.h create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore index 37adb7f47832..035a1fcd8fb3 100644 --- a/tools/testing/selftests/powerpc/dexcr/.gitignore +++ b/tools/testing/selftests/powerpc/dexcr/.gitignore @@ -1 +1,2 @@ +dexcr_test hashchk_user diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile index 4b4380d4d986..9814e72a4afa 100644 --- a/tools/testing/selftests/powerpc/dexcr/Makefile +++ b/tools/testing/selftests/powerpc/dexcr/Makefile @@ -1,4 +1,4 @@ -TEST_GEN_PROGS := hashchk_test +TEST_GEN_PROGS := dexcr_test hashchk_test TEST_FILES := settings top_srcdir = ../../../../.. 
@@ -6,4 +6,4 @@ include ../../lib.mk HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect) -$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c +$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c ./cap.c diff --git a/tools/testing/selftests/powerpc/dexcr/cap.c b/tools/testing/selftests/powerpc/dexcr/cap.c new file mode 100644 index ..3c9b1f27345d --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/cap.c @@ -0,0 +1,72 @@ +#include +#include +#include + +#include "cap.h" +#include "utils.h" + +struct kernel_capabilities { + struct __user_cap_header_struct header; + + struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3]; +}; + +static void get_caps(struct kernel_capabilities *caps) +{ + FAIL_IF_EXIT_MSG(syscall(SYS_capget, &caps->header, &caps->data), +"cannot get capabilities"); +} + +static void set_caps(struct kernel_capabilities *caps) +{ + FAIL_IF_EXIT_MSG(syscall(SYS_capset, &caps->header, &caps->data), +"cannot set capabilities"); +} + +static void init_caps(struct kernel_capabilities *caps, pid_t pid) +{ + memset(caps, 0, sizeof(*caps)); + + caps->header.version = _LINUX_CAPABILITY_VERSION_3; + caps->header.pid = pid; + + get_caps(caps); +} + +static bool has_cap(struct kernel_capabilities *caps, size_t cap) +{ + size_t data_index = cap / 32; + size_t offset = cap % 32; + + FAIL_IF_EXIT_MSG(data_index >= ARRAY_SIZE(caps->data), "cap out of range"); + + return caps->data[data_index].effective & (1 << offset); +} + +static void drop_cap(struct kernel_capabilities *caps, size_t cap) +{ + size_t data_index = cap / 32; + size_t offset = cap % 32; + + FAIL_IF_EXIT_MSG(data_index >= ARRAY_SIZE(caps->data), "cap out of range"); + + caps->data[data_index].effective &= ~(1 << offset); +} + +bool check_cap_sysadmin(void) +{ + struct kernel_capabilities caps; + + init_caps(&caps, 0); + + return has_cap(&caps, CAP_SYS_ADMIN); +} + +void drop_cap_sysadmin(void) +{ + struct kernel_capabilities caps; + + init_caps(&caps, 0); + drop_cap(&caps, 
CAP_SYS_ADMIN); + set_caps(&caps); +} diff --git a/tools/testing/selftests/powerpc/dexcr/cap.h b/tools/testing/selftests/powerpc/dexcr/cap.h new file mode 100644 index ..41f41dda9862 --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/cap.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Simple capabilities getter/setter + * + * This header file contains helper functions and macros + * required to get and set capabilities(7). Introduced so + * we aren't the first to rely on libcap. + */ +#ifndef _SELFTESTS_POWERPC_DEXCR_CAP_H +#define _SELFTESTS_POWERPC_DEXCR_CAP_H + +#include + +bool check_cap_sysadmin(void); + +void drop_cap_sysadmin(void); + +#endif /* _SELFTESTS_POWERPC_DEXCR_CAP_H */ diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.h b/tools/testing/selftests/powerpc/dexcr/dexcr.h index fb8007bf19f8..b90633ae49e9 100644 --- a/tools/testing/selftests/powerpc/dexcr/dexcr.h +++ b/tools/testing/selftests/powerpc/dexcr/dexcr.h @@ -21,6 +21,8 @@ #define DEXCR_PRO_SRAPDDEXCR_PRO_MASK(4) #define DEXCR_PRO_NPHIEDEXCR_PRO_MA
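The word/bit indexing used by has_cap() and drop_cap() in cap.c can be modelled without the capget/capset syscalls. The following is a sketch only: struct sketch_caps keeps just the effective words, SKETCH_CAP_WORDS mirrors _LINUX_CAPABILITY_U32S_3 (2) and SKETCH_CAP_SYS_ADMIN mirrors CAP_SYS_ADMIN (21). Unlike the patch, it shifts an unsigned 1 and returns false for an out-of-range capability rather than exiting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP_WORDS	2	/* mirrors _LINUX_CAPABILITY_U32S_3 */
#define SKETCH_CAP_SYS_ADMIN	21	/* mirrors CAP_SYS_ADMIN */

/* Hypothetical reduced view: only the effective set, one word per 32 caps. */
struct sketch_caps {
	uint32_t effective[SKETCH_CAP_WORDS];
};

/* True if capability number `cap` is in the effective set. */
static bool sketch_has_cap(const struct sketch_caps *caps, size_t cap)
{
	size_t word = cap / 32;
	size_t bit = cap % 32;

	if (word >= SKETCH_CAP_WORDS)
		return false;

	return (caps->effective[word] >> bit) & 1u;
}

/* Return a copy of `caps` with capability `cap` removed from the effective set. */
static struct sketch_caps sketch_drop_cap(struct sketch_caps caps, size_t cap)
{
	size_t word = cap / 32;
	size_t bit = cap % 32;

	if (word < SKETCH_CAP_WORDS)
		caps.effective[word] &= ~(UINT32_C(1) << bit);

	return caps;
}
```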
[RFC PATCH 12/13] selftests/powerpc: Add DEXCR status utility lsdexcr
Add a utility 'lsdexcr' to print the current DEXCR status. Useful for quickly checking the status when debugging test failures, using the sysctl interfaces manually, or just wanting to check it. Example output: Requested: 8400 (SBHE, NPHIE) Hypervisor enforced: Effective: 8400 (SBHE, NPHIE) SBHE * (0): set, prctl editable (Speculative branch hint enable) IBRTPD (3): clear, prctl editable (Indirect branch recurrent target prediction disable) SRAPD (4): clear, prctl editable (Subroutine return address prediction disable) NPHIE * (5): set (Non-privileged hash instruction enable) Global SBHE override: 1 (set) Signed-off-by: Benjamin Gray --- .../selftests/powerpc/dexcr/.gitignore| 1 + .../testing/selftests/powerpc/dexcr/Makefile | 2 + .../testing/selftests/powerpc/dexcr/lsdexcr.c | 178 ++ 3 files changed, 181 insertions(+) create mode 100644 tools/testing/selftests/powerpc/dexcr/lsdexcr.c diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore index 035a1fcd8fb3..7dd2fad93732 100644 --- a/tools/testing/selftests/powerpc/dexcr/.gitignore +++ b/tools/testing/selftests/powerpc/dexcr/.gitignore @@ -1,2 +1,3 @@ dexcr_test hashchk_user +lsdexcr diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile index 9814e72a4afa..8cb732cda7e7 100644 --- a/tools/testing/selftests/powerpc/dexcr/Makefile +++ b/tools/testing/selftests/powerpc/dexcr/Makefile @@ -1,4 +1,5 @@ TEST_GEN_PROGS := dexcr_test hashchk_test +TEST_GEN_FILES := lsdexcr TEST_FILES := settings top_srcdir = ../../../../.. 
@@ -7,3 +8,4 @@ include ../../lib.mk HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect) $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c ./cap.c +$(TEST_GEN_FILES): ../utils.c ./dexcr.c diff --git a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c new file mode 100644 index ..c9f0035f8e2e --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c @@ -0,0 +1,178 @@ +#include +#include +#include +#include +#include + +#include "dexcr.h" +#include "utils.h" + +static unsigned int requested; +static unsigned int enforced; +static unsigned int effective; + +struct dexcr_aspect { + const char *name; + const char *desc; + unsigned int index; + unsigned long pr_val; +}; + +static const struct dexcr_aspect aspects[] = { + { + .name = "SBHE", + .desc = "Speculative branch hint enable", + .index = 0, + .pr_val = PR_PPC_DEXCR_SBHE, + }, + { + .name = "IBRTPD", + .desc = "Indirect branch recurrent target prediction disable", + .index = 3, + .pr_val = PR_PPC_DEXCR_IBRTPD, + }, + { + .name = "SRAPD", + .desc = "Subroutine return address prediction disable", + .index = 4, + .pr_val = PR_PPC_DEXCR_SRAPD, + }, + { + .name = "NPHIE", + .desc = "Non-privileged hash instruction enable", + .index = 5, + .pr_val = PR_PPC_DEXCR_NPHIE, + }, +}; + +#define NUM_ASPECTS (sizeof(aspects) / sizeof(struct dexcr_aspect)) + +static void print_list(const char *list[], size_t len) +{ + for (size_t i = 0; i < len; i++) { + printf("%s", list[i]); + if (i + 1 < len) + printf(", "); + } +} + +static void print_dexcr(char *name, unsigned int bits) +{ + const char *enabled_aspects[32] = {NULL}; + size_t j = 0; + + printf("%s: %08x", name, bits); + + if (bits == 0) { + printf("\n"); + return; + } + + for (size_t i = 0; i < NUM_ASPECTS; i++) { + unsigned int mask = pr_aspect_to_dexcr_mask(aspects[i].pr_val); + if (bits & mask) { + enabled_aspects[j++] = aspects[i].name; + bits &= ~mask; + } + } + + if (bits) + enabled_aspects[j++] = 
"unknown"; + + printf(" ("); + print_list(enabled_aspects, j); + printf(")\n"); +} + +static void print_aspect(const struct dexcr_aspect *aspect) +{ + const char *attributes[32] = {NULL}; + size_t j = 0; + unsigned long mask; + int pr_status; + + /* Kernel-independent info about aspect */ + mask = pr_aspect_to_dexcr_mask(aspect->pr_val); + if (requested & mask) + attributes[j++] = "set"; + if (enforced & mask) + attributes[j++] = "hypervisor enforced"; + if (!(effective & mask)) + attributes[j++] = "clear"; + + /* Kernel understanding of the aspect */ + pr_stat
[RFC PATCH 00/13] Add DEXCR support
This series is based on initial work by Chris Riedl that was not sent
to the list.

Adds a kernel interface for userspace to interact with the DEXCR.
The DEXCR is a SPR that allows control over various execution
'aspects', such as indirect branch prediction and enabling the
hashst/hashchk instructions. Further details are in ISA 3.1B
Book 3 chapter 12.

This RFC proposes an interface for users to interact with the DEXCR.
It aims to support

* Querying supported aspects
* Getting/setting aspects on a per-process level
* Allowing global overrides across all processes

There are some parts that I'm not sure on the best way to approach (hence RFC):

* The feature names in arch/powerpc/kernel/dt_cpu_ftrs.c appear to be unimplemented
  in skiboot, so are being defined by this series. Is being so verbose fine?
* What aspects should be editable by a process? E.g., SBHE has
  effects that potentially bleed into other processes. Should
  it only be system wide configurable?
* Should configuring certain aspects for the process be non-privileged? E.g.,
  Is there harm in always allowing configuration of IBRTPD, SRAPD? The *FORCE_SET*
  action prevents further process local changes regardless of privilege.
* The tests fail Patchwork CI because of the new prctl macros, and the CI
  doesn't run headers_install and add -isystem /usr/include to the make command.
* On handling an exception, I don't check if the NPHIE bit is enabled in the DEXCR.
  To do so would require reading both the DEXCR and HDEXCR, for little gain (it
  should only matter that the current instruction was a hashchk. If so, the only
  reason it would cause an exception is the failed check. If the instruction is
  rewritten between exception and check we'd be wrong anyway).

The series is based on the earlier selftest utils series[1], so the tests won't build
at all without applying that first. The kernel side should build fine on ppc/next
247f34f7b80357943234f93f247a1ae6b6c3a740 though.

[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20221122231103.15829-1-bg...@linux.ibm.com/

Benjamin Gray (13):
  powerpc/book3s: Add missing include
  powerpc: Add initial Dynamic Execution Control Register (DEXCR) support
  powerpc/dexcr: Handle hashchk exception
  powerpc/dexcr: Support userspace ROP protection
  prctl: Define PowerPC DEXCR interface
  powerpc/dexcr: Add prctl implementation
  powerpc/dexcr: Add sysctl entry for SBHE system override
  powerpc/dexcr: Add enforced userspace ROP protection config
  selftests/powerpc: Add more utility macros
  selftests/powerpc: Add hashst/hashchk test
  selftests/powerpc: Add DEXCR prctl, sysctl interface test
  selftests/powerpc: Add DEXCR status utility lsdexcr
  Documentation: Document PowerPC kernel DEXCR interface

 Documentation/powerpc/dexcr.rst | 183 +++
 Documentation/powerpc/index.rst | 1 +
 arch/powerpc/Kconfig | 5 +
 arch/powerpc/include/asm/book3s/64/kexec.h | 6 +
 arch/powerpc/include/asm/book3s/64/kup.h | 1 +
 arch/powerpc/include/asm/cputable.h | 8 +-
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/include/asm/processor.h | 33 ++
 arch/powerpc/include/asm/reg.h | 7 +
 arch/powerpc/kernel/Makefile | 1 +
 arch/powerpc/kernel/dexcr.c | 310 ++
 arch/powerpc/kernel/dt_cpu_ftrs.c | 4 +
 arch/powerpc/kernel/process.c | 31 +-
 arch/powerpc/kernel/prom.c | 4 +
 arch/powerpc/kernel/traps.c | 6 +
 include/uapi/linux/prctl.h | 14 +
 kernel/sys.c | 16 +
 tools/testing/selftests/powerpc/Makefile | 1 +
 .../selftests/powerpc/dexcr/.gitignore | 3 +
 .../testing/selftests/powerpc/dexcr/Makefile | 11 +
 tools/testing/selftests/powerpc/dexcr/cap.c | 72
 tools/testing/selftests/powerpc/dexcr/cap.h | 18 +
 tools/testing/selftests/powerpc/dexcr/dexcr.c | 118 +++
 tools/testing/selftests/powerpc/dexcr/dexcr.h | 54 +++
 .../selftests/powerpc/dexcr/dexcr_test.c | 241 ++
 .../selftests/powerpc/dexcr/hashchk_test.c | 229 +
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 178 ++
 tools/testing/selftests/powerpc/include/reg.h | 4 +
 .../testing/selftests/powerpc/include/utils.h | 44 +++
 29 files changed, 1602 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/dexcr.rst
 create mode 100644 arch/powerpc/kernel/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/dexcr/Makefile
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.h
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/de
[RFC PATCH 13/13] Documentation: Document PowerPC kernel DEXCR interface
Describe the DEXCR and document how to interact with it via the
prctl and sysctl interfaces.

Signed-off-by: Benjamin Gray
---
 Documentation/powerpc/dexcr.rst | 183
 Documentation/powerpc/index.rst | 1 +
 2 files changed, 184 insertions(+)
 create mode 100644 Documentation/powerpc/dexcr.rst

diff --git a/Documentation/powerpc/dexcr.rst b/Documentation/powerpc/dexcr.rst
new file mode 100644
index ..3c995f4b9fe0
--- /dev/null
+++ b/Documentation/powerpc/dexcr.rst
@@ -0,0 +1,183 @@
+==========================================
+DEXCR (Dynamic Execution Control Register)
+==========================================
+
+Overview
+========
+
+The DEXCR is a privileged special purpose register (SPR) introduced in
+PowerPC ISA 3.1B (Power10) that allows per-cpu control over several dynamic
+execution behaviours. These behaviours include speculation (e.g., indirect
+branch target prediction) and enabling return-oriented programming (ROP)
+protection instructions.
+
+The execution control is exposed in hardware as up to 32 bits ('aspects') in
+the DEXCR. Each aspect controls a certain behaviour, and can be set or cleared
+to enable/disable the aspect. There are several variants of the DEXCR for
+different purposes:
+
+DEXCR
+    A privileged SPR that can control aspects for userspace and kernel space
+HDEXCR
+    A hypervisor-privileged SPR that can control aspects for the hypervisor and
+    enforce aspects for the kernel and userspace.
+UDEXCR
+    An optional ultravisor-privileged SPR that can control aspects for the ultravisor.
+
+Userspace can examine the current DEXCR state using a dedicated SPR that
+provides a non-privileged read-only view of the userspace DEXCR aspects.
+There is also an SPR that provides a read-only view of the hypervisor enforced
+aspects, which ORed with the userspace DEXCR view gives the effective DEXCR
+state for a process.
+
+
+User API
+========
+
+prctl()
+-------
+
+A process can control its own userspace DEXCR value using the
+``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of
+:manpage:`prctl(2)` commands. These calls have the form::
+
+    prctl(PR_PPC_GET_DEXCR, unsigned long aspect, 0, 0, 0);
+    prctl(PR_PPC_SET_DEXCR, unsigned long aspect, unsigned long flags, 0, 0);
+
+Where ``aspect`` (``arg1``) is a constant and ``flags`` (``arg2``) is a bitfield.
+The possible aspect and flag values are as follows. Note there is no relation
+between aspect value and ``prctl()`` constant value.
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 7 1
+
+   * - ``prctl()`` constant
+     - Aspect name
+     - Aspect bit
+
+   * - ``PR_PPC_DEXCR_SBHE``
+     - Speculative Branch Hint Enable (SBHE)
+     - 0
+
+   * - ``PR_PPC_DEXCR_IBRTPD``
+     - Indirect Branch Recurrent Target Prediction Disable (IBRTPD)
+     - 3
+
+   * - ``PR_PPC_DEXCR_SRAPD``
+     - Subroutine Return Address Prediction Disable (SRAPD)
+     - 4
+
+   * - ``PR_PPC_DEXCR_NPHIE``
+     - Non-Privileged Hash Instruction Enable (NPHIE)
+     - 5
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - ``prctl()`` flag
+     - Meaning
+
+   * - ``PR_PPC_DEXCR_PRCTL``
+     - This aspect can be configured with ``prctl(PR_PPC_SET_DEXCR, ...)``
+
+   * - ``PR_PPC_DEXCR_SET_ASPECT``
+     - This aspect is set
+
+   * - ``PR_PPC_DEXCR_FORCE_SET_ASPECT``
+     - This aspect is set and cannot be undone. A subsequent
+       ``prctl(..., PR_PPC_DEXCR_CLEAR_ASPECT)`` will fail.
+
+   * - ``PR_PPC_DEXCR_CLEAR_ASPECT``
+     - This aspect is clear
+
+Note that
+
+* The ``*_SET_ASPECT`` / ``*_CLEAR_ASPECT`` refers to setting/clearing the bit in the DEXCR.
+  For example::
+
+      prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_SET_ASPECT, 0, 0);
+
+  will set the IBRTPD aspect bit in the DEXCR, causing indirect branch prediction
+  to be disabled.
+
+* The status returned by ``PR_PPC_GET_DEXCR`` does not include any alternative
+  config overrides. To see the true DEXCR state software should read the appropriate
+  SPRs directly.
+
+* A forced aspect will still report ``PR_PPC_DEXCR_PRCTL`` if it would
+  otherwise be editable.
+
+* The aspect state when starting a process is copied from the parent's
+  state on :manpage:`fork(2)` and :manpage:`execve(2)`. Aspects may also be set
+  or cleared by the kernel on process creation.
+
+Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_SET_ASPECT``,
+``PR_PPC_DEXCR_FORCE_SET_ASPECT``, or ``PR_PPC_DEXCR_CLEAR_ASPECT`` to edit a
+given aspect.
+
+Common error codes for both getting and setting the DEXCR are as follows:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+     - Meaning
+
+   * - ``EINVAL``
+     - The DEXCR is not supported by the kernel.
+
+   * - ``ENODEV``
+     - The aspect is not recognised by the kernel or not supported by the hardware.
+
+``PR_PPC_SET_DEXCR`` may also report the following error codes:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Err
[RFC PATCH 10/13] selftests/powerpc: Add hashst/hashchk test
Test the kernel DEXCR[NPHIE] interface and hashchk exception handling. Introduces with it a DEXCR utils library for common DEXCR operations. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/Makefile | 1 + .../selftests/powerpc/dexcr/.gitignore| 1 + .../testing/selftests/powerpc/dexcr/Makefile | 9 + tools/testing/selftests/powerpc/dexcr/dexcr.c | 118 + tools/testing/selftests/powerpc/dexcr/dexcr.h | 52 .../selftests/powerpc/dexcr/hashchk_test.c| 229 ++ tools/testing/selftests/powerpc/include/reg.h | 4 + 7 files changed, 414 insertions(+) create mode 100644 tools/testing/selftests/powerpc/dexcr/.gitignore create mode 100644 tools/testing/selftests/powerpc/dexcr/Makefile create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.c create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.h create mode 100644 tools/testing/selftests/powerpc/dexcr/hashchk_test.c diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile index 6ba95cd19e42..00dbd000ee01 100644 --- a/tools/testing/selftests/powerpc/Makefile +++ b/tools/testing/selftests/powerpc/Makefile @@ -17,6 +17,7 @@ SUB_DIRS = alignment \ benchmarks \ cache_shape \ copyloops\ + dexcr\ dscr \ mm \ nx-gzip \ diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore new file mode 100644 index ..37adb7f47832 --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/.gitignore @@ -0,0 +1 @@ +hashchk_user diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile new file mode 100644 index ..4b4380d4d986 --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/Makefile @@ -0,0 +1,9 @@ +TEST_GEN_PROGS := hashchk_test + +TEST_FILES := settings +top_srcdir = ../../../../.. 
+include ../../lib.mk + +HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect) + +$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.c b/tools/testing/selftests/powerpc/dexcr/dexcr.c new file mode 100644 index ..3e7cb581d4a2 --- /dev/null +++ b/tools/testing/selftests/powerpc/dexcr/dexcr.c @@ -0,0 +1,118 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#include "dexcr.h" +#include "reg.h" +#include "utils.h" + +long sysctl_get_sbhe(void) +{ + long value; + + FAIL_IF_EXIT_MSG(read_long(SYSCTL_DEXCR_SBHE, &value, 10), +"failed to read " SYSCTL_DEXCR_SBHE); + + return value; +} + +void sysctl_set_sbhe(long value) +{ + FAIL_IF_EXIT_MSG(write_long(SYSCTL_DEXCR_SBHE, value, 10), +"failed to write to " SYSCTL_DEXCR_SBHE); +} + +unsigned int pr_aspect_to_dexcr_mask(unsigned long which) +{ + switch (which) { + case PR_PPC_DEXCR_SBHE: + return DEXCR_PRO_SBHE; + case PR_PPC_DEXCR_IBRTPD: + return DEXCR_PRO_IBRTPD; + case PR_PPC_DEXCR_SRAPD: + return DEXCR_PRO_SRAPD; + case PR_PPC_DEXCR_NPHIE: + return DEXCR_PRO_NPHIE; + default: + FAIL_IF_EXIT_MSG(true, "unknown PR aspect"); + } +} + +static inline unsigned int get_dexcr_pro(void) +{ + return mfspr(SPRN_DEXCR); +} + +static inline unsigned int get_dexcr_enf(void) +{ + return mfspr(SPRN_HDEXCR); +} + +static inline unsigned int get_dexcr_eff(void) +{ + return get_dexcr_pro() | get_dexcr_enf(); +} + +unsigned int get_dexcr(enum DexcrSource source) +{ + switch (source) { + case UDEXCR: + return get_dexcr_pro(); + case ENFORCED: + return get_dexcr_enf(); + case EFFECTIVE: + return get_dexcr_eff(); + default: + FAIL_IF_EXIT_MSG(true, "bad DEXCR source"); + } +} + +bool pr_aspect_supported(unsigned long which) +{ + return prctl(PR_PPC_GET_DEXCR, which, 0, 0, 0) >= 0; +} + +bool pr_aspect_editable(unsigned long which) +{ + int ret = prctl(PR_PPC_GET_DEXCR, which, 0, 0, 0); + return ret > 0 && (ret & PR_PPC_DEXCR_PRCTL) > 0; +} 
+ +bool pr_aspect_edit(unsigned long which, unsigned long ctrl) +{ + return prctl(PR_PPC_SET_DEXCR, which, ctrl, 0, 0) == 0; +} + +bool pr_aspect_check(unsigned long which, enum DexcrSource source) +{ + unsigned int dexcr = get_dexcr(source); + unsigned int aspect = pr_aspect_to_dexcr_mask(which); + return (dexcr & aspect) != 0; +} + +int pr_aspect_get(unsigned long pr_aspect) +{ + int ret = prctl(PR_PPC_GET_DEXCR, pr_aspect, 0, 0, 0); + FAIL_IF_EXIT_MSG(ret < 0, "prctl failed"); + return ret; +} + +bool dexcr_pro_check(unsigned int
[RFC PATCH 05/13] prctl: Define PowerPC DEXCR interface
Adds the definitions and generic handler for prctl control of the
PowerPC Dynamic Execution Control Register (DEXCR).

Signed-off-by: Benjamin Gray
---
 include/uapi/linux/prctl.h | 14 ++
 kernel/sys.c | 16
 2 files changed, 30 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..b4720e8de6f3 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,20 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK 0x
 # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */

+/* PowerPC Dynamic Execution Control Register (DEXCR) controls */
+#define PR_PPC_GET_DEXCR 65
+#define PR_PPC_SET_DEXCR 66
+/* DEXCR aspect to act on */
+# define PR_PPC_DEXCR_SBHE 0 /* Speculative branch hint enable */
+# define PR_PPC_DEXCR_IBRTPD 1 /* Indirect branch recurrent target prediction disable */
+# define PR_PPC_DEXCR_SRAPD 2 /* Subroutine return address prediction disable */
+# define PR_PPC_DEXCR_NPHIE 3 /* Non-privileged hash instruction enable */
+/* Action to apply / return */
+# define PR_PPC_DEXCR_PRCTL (1 << 0)
+# define PR_PPC_DEXCR_SET_ASPECT (1 << 1)
+# define PR_PPC_DEXCR_FORCE_SET_ASPECT (1 << 2)
+# define PR_PPC_DEXCR_CLEAR_ASPECT (1 << 3)
+
 #define PR_SET_VMA 0x53564d41
 # define PR_SET_VMA_ANON_NAME 0

diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..55b8f7369059 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -139,6 +139,12 @@
 #ifndef GET_TAGGED_ADDR_CTRL
 # define GET_TAGGED_ADDR_CTRL() (-EINVAL)
 #endif
+#ifndef PPC_GET_DEXCR_ASPECT
+# define PPC_GET_DEXCR_ASPECT(a, b) (-EINVAL)
+#endif
+#ifndef PPC_SET_DEXCR_ASPECT
+# define PPC_SET_DEXCR_ASPECT(a, b, c) (-EINVAL)
+#endif

 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2623,6 +2629,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		error = sched_core_share_pid(arg2, arg3, arg4, arg5);
 		break;
 #endif
+	case PR_PPC_GET_DEXCR:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = PPC_GET_DEXCR_ASPECT(me, arg2);
+		break;
+	case PR_PPC_SET_DEXCR:
+		if (arg4 || arg5)
+			return -EINVAL;
+		error = PPC_SET_DEXCR_ASPECT(me, arg2, arg3);
+		break;
 	case PR_SET_VMA:
 		error = prctl_set_vma(arg2, arg3, arg4, arg5);
 		break;
--
2.38.1
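For readers wondering how userspace would interpret the value returned by PR_PPC_GET_DEXCR, here is a small decoding sketch. It is not from the series: the flag values are copied from the prctl.h hunk above, and sketch_aspect_state() and sketch_aspect_editable() are hypothetical helpers. The sketch deliberately does not call prctl() itself, since the option numbers 65 and 66 are only what this RFC proposes and may mean something else on a given kernel.

```c
/* Flag values copied from the prctl.h hunk above. */
#define PR_PPC_DEXCR_PRCTL		(1 << 0)
#define PR_PPC_DEXCR_SET_ASPECT		(1 << 1)
#define PR_PPC_DEXCR_FORCE_SET_ASPECT	(1 << 2)
#define PR_PPC_DEXCR_CLEAR_ASPECT	(1 << 3)

/*
 * Describe the requested state encoded in a PR_PPC_GET_DEXCR return value.
 * A negative value is treated as "the query failed" (e.g. -1 from prctl()).
 */
static const char *sketch_aspect_state(int status)
{
	if (status < 0)
		return "unsupported";
	if (status & PR_PPC_DEXCR_FORCE_SET_ASPECT)
		return "force-set";
	if (status & PR_PPC_DEXCR_SET_ASPECT)
		return "set";
	if (status & PR_PPC_DEXCR_CLEAR_ASPECT)
		return "clear";
	return "unconfigured";
}

/* Non-zero if the aspect may be changed with PR_PPC_SET_DEXCR. */
static int sketch_aspect_editable(int status)
{
	return status > 0 && (status & PR_PPC_DEXCR_PRCTL) != 0;
}
```

A caller would pass in the result of prctl(PR_PPC_GET_DEXCR, aspect, 0, 0, 0) on a kernel carrying this series.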
[RFC PATCH 07/13] powerpc/dexcr: Add sysctl entry for SBHE system override
The DEXCR Speculative Branch Hint Enable (SBHE) aspect controls whether
the hints provided by BO field of Branch instructions are obeyed during
speculative execution.

SBHE behaviour per ISA 3.1B:

0: The hints provided by BO field of Branch instructions may be
   ignored during speculative execution

1: The hints provided by BO field of Branch instructions are obeyed
   during speculative execution

Add a sysctl entry to allow changing this aspect globally in the system
at runtime:

	/proc/sys/kernel/speculative_branch_hint_enable

Three values are supported:

-1: Disable DEXCR SBHE sysctl override
 0: Override and set DEXCR[SBHE] aspect to 0
 1: Override and set DEXCR[SBHE] aspect to 1

Internally, introduces a mechanism to apply arbitrary system wide
overrides on top of the prctl() config.

Signed-off-by: Benjamin Gray
---
 arch/powerpc/kernel/dexcr.c | 125
 1 file changed, 125 insertions(+)

diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 9290beed722a..8239bcc92026 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,8 +1,11 @@
 #include
 #include
+#include
 #include
 #include
 #include
+#include
+#include
 #include
 #include
@@ -18,6 +21,58 @@
 #define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \
 			      DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE)

+/*
+ * Lock to protect system DEXCR override from concurrent updates.
+ * RCU semantics: writers take lock, readers are unlocked.
+ * Writers ensure the memory update is atomic, readers read
+ * atomically.
+ */ +static DEFINE_SPINLOCK(dexcr_sys_enforced_write_lock); + +struct mask_override { + union { + struct { + unsigned int mask; + unsigned int override; + }; + + /* Raw access for atomic read/write */ + unsigned long all; + }; +}; + +static struct mask_override dexcr_sys_enforced; + +static int spec_branch_hint_enable = -1; + +static void update_userspace_system_dexcr(unsigned int pro_mask, int value) +{ + struct mask_override update = { .all = 0 }; + + switch (value) { + case -1: /* Clear the mask bit, clear the override bit */ + break; + case 0: /* Set the mask bit, clear the override bit */ + update.mask |= pro_mask; + break; + case 1: /* Set the mask bit, set the override bit */ + update.mask |= pro_mask; + update.override |= pro_mask; + break; + } + + spin_lock(&dexcr_sys_enforced_write_lock); + + /* Use the existing values for the non-updated bits */ + update.mask |= dexcr_sys_enforced.mask & ~pro_mask; + update.override |= dexcr_sys_enforced.override & ~pro_mask; + + /* Atomically update system enforced aspects */ + WRITE_ONCE(dexcr_sys_enforced.all, update.all); + + spin_unlock(&dexcr_sys_enforced_write_lock); +} + static int __init dexcr_init(void) { if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) @@ -25,6 +80,9 @@ static int __init dexcr_init(void) mtspr(SPRN_DEXCR, DEFAULT_DEXCR); + if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE)) + update_userspace_system_dexcr(DEXCR_PRO_SBHE, spec_branch_hint_enable); + return 0; } early_initcall(dexcr_init); @@ -52,9 +110,15 @@ unsigned long get_thread_dexcr(struct thread_struct const *t) { unsigned long dexcr = DEFAULT_DEXCR; + /* Atomically read enforced mask & override */ + struct mask_override enforced = READ_ONCE(dexcr_sys_enforced); + /* Apply prctl overrides */ dexcr = (dexcr & ~t->dexcr_mask) | t->dexcr_override; + /* Apply system overrides */ + dexcr = (dexcr & ~enforced.mask) | enforced.override; + return dexcr; } @@ -176,3 +240,64 @@ int dexcr_prctl_set(struct task_struct *task, unsigned long which, unsigned 
long return 0; } + +#ifdef CONFIG_SYSCTL + +static const int min_sysctl_val = -1; + +static int sysctl_dexcr_sbhe_handler(struct ctl_table *table, int write, +void *buf, size_t *lenp, loff_t *ppos) +{ + int err; + int prev = spec_branch_hint_enable; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE)) + return -ENODEV; + + err = proc_dointvec_minmax(table, write, buf, lenp, ppos); + if (err) + return err; + + if (prev != spec_branch_hint_enable && write) { + update_userspace_system_dexcr(DEXCR_PRO_SBHE, spec_branch_hint_enable); + cpus_read_lock(); + on_each_cpu(update_dexcr_on_cpu, NULL, 1); + cpus_read_unlock(); + } + + return 0; +} + +static struct ctl_table dexcr_sbhe_ctl_table[] = { + {
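For readers following the mask/override logic in update_userspace_system_dexcr() and get_thread_dexcr(), a minimal userspace model may help. It only reproduces the bit arithmetic; the spinlock and the single-word READ_ONCE/WRITE_ONCE access from the patch are left out. The names apply_sysctl_value() and effective_dexcr() are invented for this sketch, and callers pass whatever aspect bit they like, since the real DEXCR_PRO_* values are not shown in this message.

```c
#include <stdint.h>

/* Same shape as the struct in the patch, minus the raw 'all' view. */
struct mask_override {
	uint32_t mask;		/* which aspect bits are being forced */
	uint32_t override;	/* the value forced for those bits */
};

/*
 * Model of the sysctl update for one aspect bit:
 *   -1: stop overriding the aspect
 *    0: override the aspect to 0
 *    1: override the aspect to 1
 * Bits belonging to other aspects are carried over unchanged.
 */
static struct mask_override apply_sysctl_value(struct mask_override cur,
						uint32_t bit, int value)
{
	struct mask_override next = {
		.mask = cur.mask & ~bit,
		.override = cur.override & ~bit,
	};

	if (value == 0 || value == 1)
		next.mask |= bit;
	if (value == 1)
		next.override |= bit;

	return next;
}

/* Apply the prctl overrides first, then the system overrides on top. */
static uint32_t effective_dexcr(uint32_t dflt, struct mask_override prctl_cfg,
				struct mask_override sys_cfg)
{
	uint32_t dexcr = (dflt & ~prctl_cfg.mask) | prctl_cfg.override;

	return (dexcr & ~sys_cfg.mask) | sys_cfg.override;
}
```

The ordering is the point of the design: because the system override is applied last, a sysctl value of 0 or 1 wins over whatever a process configured through prctl(), while -1 leaves the per-process setting visible.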
Re: [PATCH 12/13] powerpc/tracing: tracepoints for RTAS entry and exit
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > Add two sets of tracepoints to be used around RTAS entry: > > * rtas_input/rtas_output, which emit the function name, its inputs, > the returned status, and any other outputs. These produce an API-level > record of OS<->RTAS activity. > > * rtas_ll_entry/rtas_ll_exit, which are lower-level and emit the > entire contents of the parameter block (aka rtas_args) on entry and > exit. Likely useful only for debugging. > > With uses of these tracepoints in do_enter_rtas() to be added in the > following patch, examples of get-time-of-day and event-scan functions > as rendered by trace-cmd (with some multi-line formatting manually > imposed on the rtas_ll_* entries to avoid extremely long lines in the > commit message): > > cat-36800 [059] 4978.518303: rtas_input: get-time-of-day arguments: > cat-36800 [059] 4978.518306: rtas_ll_entry:token=3 nargs=0 nret=8 > params: [0]=0x > [1]=0x [2]=0x [3]=0x > [4]=0x > [5]=0x [6]=0x [7]=0x > [8]=0x > [9]=0x [10]=0x [11]=0x > [12]=0x > [13]=0x [14]=0x [15]=0x > cat-36800 [059] 4978.518366: rtas_ll_exit: token=3 nargs=0 nret=8 > params: [0]=0x > [1]=0x07e6 [2]=0x000b [3]=0x0001 > [4]=0x > [5]=0x000e [6]=0x0008 [7]=0x2e0dac40 > [8]=0x > [9]=0x [10]=0x [11]=0x > [12]=0x > [13]=0x [14]=0x [15]=0x > cat-36800 [059] 4978.518366: rtas_output: get-time-of-day status: > 0, other outputs: 2022 11 1 0 14 8 772648000 > > kworker/39:1-336 [039] 4982.731623: rtas_input: event-scan > arguments: 4294967295 0 80484920 2048 > kworker/39:1-336 [039] 4982.731626: rtas_ll_entry:token=6 nargs=4 > nret=1 > params: > [0]=0x [1]=0x [2]=0x04cc1a38 [3]=0x0800 > > [4]=0x [5]=0x000e [6]=0x0008 [7]=0x2e0dac40 > > [8]=0x [9]=0x [10]=0x [11]=0x > > [12]=0x [13]=0x [14]=0x [15]=0x > kworker/39:1-336 [039] 4982.731676: rtas_ll_exit: token=6 nargs=4 > nret=1 > params: > [0]=0x [1]=0x [2]=0x04cc1a38 [3]=0x0800 > > [4]=0x0001 [5]=0x000e [6]=0x0008 [7]=0x2e0dac40 > > [8]=0x [9]=0x [10]=0x [11]=0x > > [12]=0x [13]=0x 
[14]=0x [15]=0x > kworker/39:1-336 [039] 4982.731677: rtas_output: event-scan > status: 1, other outputs: > > Signed-off-by: Nathan Lynch > --- > arch/powerpc/include/asm/trace.h | 116 +++ > 1 file changed, 116 insertions(+) > > diff --git a/arch/powerpc/include/asm/trace.h > b/arch/powerpc/include/asm/trace.h > index 08cd60cd70b7..e7a301c9eb95 100644 > --- a/arch/powerpc/include/asm/trace.h > +++ b/arch/powerpc/include/asm/trace.h > @@ -119,6 +119,122 @@ TRACE_EVENT_FN_COND(hcall_exit, > ); > #endif > > +#ifdef CONFIG_PPC_RTAS > + > +#include > + > +/* > + * Since stop-self is how CPUs go offline on RTAS platforms, > + * these tracepoints are conditional. > + */ > + > +TRACE_EVENT_CONDITION(rtas_input, > + > + TP_PROTO(struct rtas_args *rtas_args, const char *name), > + > + TP_ARGS(rtas_args, name), > + > + TP_CONDITION(cpu_online(raw_smp_processor_id())), > + > + TP_STRUCT__entry( > + __field(__u32, nargs) > + __string(name, name) > + __dynamic_array(__u32, inputs, be32_to_cpu(rtas_args->nargs)) > + ), > + > + TP_fast_assign( > + __entry->nargs = be32_to_cpu(rtas_args->nargs); > + __assign_str(name, name); > + be32_to_cpu_array(__get_dynamic_array(inputs), rtas_args->args, > __entry->nargs); > + ), > + > + TP_printk("%s arguments: %s", __get_str(name), > + __print_array(__get_dynamic_array(inputs), __en
Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu
Thank you all for your guidance and encouragement! I have learned how to
construct a commit message properly, and how important a role the torture
test framework plays for the Linux kernel. I hope my work can be of benefit
to the community.

I am going to continue studying this topic and the torture test framework,
and will wait for your further instructions.

Best Regards
Zhouyi

On Mon, Nov 28, 2022 at 1:53 AM Paul E. McKenney wrote:
>
> On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
>
> [ . . . ]
>
> > >> No. We are not exporting this just to make a bogus test case happy.
> > >>
> > >> Fix the torture code to handle -EBUSY correctly.
> > > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > > The result of the grep command shows that there are 268
> > > cpuhp_setup_state* cases.
> > > which may make our task more complicated.
> >
> > Why? The whole point of this torture thing is to stress the
> > infrastructure.
>
> Indeed.
>
> > There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> > can fail, which is not a fatal problem, really.
> >
> > So if a CPU hotplug operation fails, then why can't the torture test
> > just move on and validate that the system still behaves correctly?
> >
> > That gives us more coverage than just testing the good case and giving
> > up when something unexpected happens.
>
> Agreed, with access to a function like the tick_nohz_full_timekeeper()
> suggested earlier in this email thread, then yes, it would make sense to
> try to offline the CPU anyway, then forgive the failure in cases where
> the CPU matches that indicated by tick_nohz_full_timekeeper().
>
> > I even argue that the torture test should inject random failures into
> > the hotplug state machine to achieve extended code coverage.
>
> I could imagine torture_onoff() telling various CPU-hotplug notifiers
> to refuse the transition using some TBD interface. That would better
> test the CPU-hotplug common code's ability to deal with failures.
>
> Or did you have something else/additional in mind?
>
> Thanx, Paul
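To make the "forgive the failure" idea from the quoted discussion concrete, here is a tiny standalone model of the decision. It is only a sketch: offline_result_acceptable() is a made-up name, and timekeeper_cpu stands in for whatever a helper like the tick_nohz_full_timekeeper() mentioned above would report. No such interface is assumed to exist in the tree.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy check for a torture test: an offline attempt is
 * acceptable if it succeeded, or if it failed with -EBUSY on the CPU that
 * currently does the timekeeping. Any other failure is still reported.
 */
static bool offline_result_acceptable(int ret, int cpu, int timekeeper_cpu)
{
	if (ret == 0)
		return true;

	return ret == -EBUSY && cpu == timekeeper_cpu;
}
```

With a check like this the test can always attempt the offline operation, which exercises the failure path in the hotplug core instead of avoiding it.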
Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote: > Call the just-added rtas tracepoints in do_enter_rtas(), taking care > to avoid function name lookups in the CPU offline path. > > Signed-off-by: Nathan Lynch > --- > arch/powerpc/kernel/rtas.c | 23 +++ > 1 file changed, 23 insertions(+) > > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 198366d641d0..3487b42cfbf7 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -38,6 +38,7 @@ > #include > #include > #include > +#include > #include > > enum rtas_function_flags { > @@ -525,6 +526,7 @@ void enter_rtas(unsigned long); > static void do_enter_rtas(struct rtas_args *args) > { > unsigned long msr; > + const char *name = NULL; > > /* >* Make sure MSR[RI] is currently enabled as it will be forced later > @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args) > > hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */ > > + if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) { > + /* > + * rtas_token_to_function() uses xarray which uses RCU, > + * but this code can run in the CPU offline path > + * (e.g. stop-self), after it's become invalid to call > + * RCU APIs. > + */ We can call this in real-mode via pseries_machine_check_realmode -> fwnmi_release_errinfo, so tracing should be disabled for that case too... Does this_cpu_set_ftrace_enabled(0) in the early machine check handler cover that sufficiently? Thanks, Nick
[PATCH v3 real 01/17] powerpc/qspinlock: powerpc qspinlock implementation
Add a powerpc specific implementation of queued spinlocks. This is the build framework with a very simple (non-queued) spinlock implementation to begin with. Later changes add queueing, and other features and optimisations one-at-a-time. It is done this way to more easily see how the queued spinlocks are built, and to make performance and correctness bisects more useful. Signed-off-by: Nicholas Piggin --- Missed the first patch sending the series :( Here is the real patch 1. Thanks, NIck arch/powerpc/Kconfig | 1 - arch/powerpc/include/asm/paravirt.h | 3 +- arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/qspinlock.h | 87 +++ arch/powerpc/include/asm/qspinlock_paravirt.h | 7 -- arch/powerpc/include/asm/qspinlock_types.h| 13 +++ arch/powerpc/include/asm/spinlock.h | 2 +- arch/powerpc/include/asm/spinlock_types.h | 2 +- arch/powerpc/lib/Makefile | 4 +- arch/powerpc/lib/qspinlock.c | 17 arch/powerpc/platforms/pseries/vas.c | 1 + 11 files changed, 67 insertions(+), 71 deletions(-) delete mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h create mode 100644 arch/powerpc/include/asm/qspinlock_types.h create mode 100644 arch/powerpc/lib/qspinlock.c diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 2ca5418457ed..1d5b4f280feb 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -155,7 +155,6 @@ config PPC select ARCH_USE_CMPXCHG_LOCKREF if PPC64 select ARCH_USE_MEMTEST select ARCH_USE_QUEUED_RWLOCKS if PPC_QUEUED_SPINLOCKS - select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT select ARCH_WANT_IPC_PARSE_VERSION select ARCH_WANT_IRQS_OFF_ACTIVATE_MM diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h index f5ba1a3c41f8..119b44b8e81b 100644 --- a/arch/powerpc/include/asm/paravirt.h +++ b/arch/powerpc/include/asm/paravirt.h @@ -3,14 +3,13 @@ #define _ASM_POWERPC_PARAVIRT_H #include -#include #ifdef CONFIG_PPC64 #include #include #endif #ifdef 
CONFIG_PPC_SPLPAR -#include +#include #include #include diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 631802999d59..640d9a35661c 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -39,6 +39,7 @@ #ifndef __ASSEMBLY__ #include #include +#include #include #include diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h index b676c4fb90fd..b1443aab2145 100644 --- a/arch/powerpc/include/asm/qspinlock.h +++ b/arch/powerpc/include/asm/qspinlock.h @@ -2,83 +2,54 @@ #ifndef _ASM_POWERPC_QSPINLOCK_H #define _ASM_POWERPC_QSPINLOCK_H -#include -#include +#include +#include +#include -#define _Q_PENDING_LOOPS (1 << 9) /* not tuned */ - -#ifdef CONFIG_PARAVIRT_SPINLOCKS -extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); -extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); -extern void __pv_queued_spin_unlock(struct qspinlock *lock); - -static __always_inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) +static __always_inline int queued_spin_is_locked(struct qspinlock *lock) { - if (!is_shared_processor()) - native_queued_spin_lock_slowpath(lock, val); - else - __pv_queued_spin_lock_slowpath(lock, val); + return atomic_read(&lock->val); } -#define queued_spin_unlock queued_spin_unlock -static inline void queued_spin_unlock(struct qspinlock *lock) +static __always_inline int queued_spin_value_unlocked(struct qspinlock lock) { - if (!is_shared_processor()) - smp_store_release(&lock->locked, 0); - else - __pv_queued_spin_unlock(lock); + return !atomic_read(&lock.val); } -#else -extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); -#endif - -static __always_inline void queued_spin_lock(struct qspinlock *lock) +static __always_inline int queued_spin_is_contended(struct qspinlock *lock) { - u32 val = 0; - - if (likely(arch_atomic_try_cmpxchg_lock(&lock->val, &val, 
_Q_LOCKED_VAL))) - return; - - queued_spin_lock_slowpath(lock, val); + return 0; } -#define queued_spin_lock queued_spin_lock -#ifdef CONFIG_PARAVIRT_SPINLOCKS -#define SPIN_THRESHOLD (1<<15) /* not tuned */ - -static __always_inline void pv_wait(u8 *ptr, u8 val) +static __always_inline int queued_spin_trylock(struct qspinlock *lock) { - if (*ptr != val) - return; - yield_to_any(); - /* -* We could pass in a CPU here if waiting in the queue and yield to -* the previous CPU in the queue. -*/ + retur
Re: [PATCH] pseries/mobility: reset the RCU watchdogs after a LPM
On Sat Nov 26, 2022 at 3:32 AM AEST, Laurent Dufour wrote: > The RCU watchdog timer should be reset when restarting the CPU after a Live > Partition Mobility operation. > > Signed-off-by: Laurent Dufour Looks okay to me. xmon touches the softlockup watchdog explicitly but is that for architectures with unsynchronized clocks maybe. Acked-by: Nicholas Piggin > --- > arch/powerpc/platforms/pseries/mobility.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/mobility.c > b/arch/powerpc/platforms/pseries/mobility.c > index 634fac5db3f9..9e10f38dd9ad 100644 > --- a/arch/powerpc/platforms/pseries/mobility.c > +++ b/arch/powerpc/platforms/pseries/mobility.c > @@ -636,8 +636,10 @@ static int do_join(void *arg) > } > /* >* Execution may have been suspended for several seconds, so > - * reset the watchdog. > + * reset the watchdogs. >*/ > + rcu_cpu_stall_reset(); > + /* touch_nmi_watchdog() also touch the soft lockup watchdog */ > touch_nmi_watchdog(); > return ret; > } > -- > 2.38.1
Re: [RFC PATCH 00/13] Add DEXCR support
On Mon, 2022-11-28 at 13:44 +1100, Benjamin Gray wrote: > This series is based on initial work by Chris Riedl that was not sent > to the list. > > Adds a kernel interface for userspace to interact with the DEXCR. > The DEXCR is a SPR that allows control over various execution > 'aspects', such as indirect branch prediction and enabling the > hashst/hashchk instructions. Further details are in ISA 3.1B > Book 3 chapter 12. > > This RFC proposes an interface for users to interact with the DEXCR. > It aims to support > > * Querying supported aspects > * Getting/setting aspects on a per-process level > * Allowing global overrides across all processes > > There are some parts that I'm not sure on the best way to approach > (hence RFC): > > * The feature names in arch/powerpc/kernel/dt_cpu_ftrs.c appear to be > unimplemented > in skiboot, so are being defined by this series. Is being so > verbose fine? These are going to need to be added to skiboot before they can be referenced in the kernel. Inclusion in skiboot makes them ABI, the kernel is just a consumer. > * What aspects should be editable by a process? E.g., SBHE has > effects that potentially bleed into other processes. Should > it only be system wide configurable? For context, ISA 3.1B p1358 says: In some micro-architectures, the execution behav- ior controlled by aspect 0 is difficult to change with any degree of timing precision. The change may also bleed over into other threads on the same pro- cessor. Any environment that has a dependence on the more secure setting of aspect 0 should not change the value, and ideally should share a pro- cessor only with similar threads. For other environ- ments, changes to the effective value of aspect 0 represent a relative risk tolerance for its aspect of execution behavior, with the understanding that there will be significant hysteresis in the execution behavior. 
If a process sets SBHE for itself and all it takes is context switching from a process with SBHE unset to cause exposure, then yeah I think it should just be global. I doubt branch hints have enough impact for process granularity to be especially desirable anyway. > * Should configuring certain aspects for the process be non- > privileged? E.g., > Is there harm in always allowing configuration of IBRTPD, SRAPD? > The *FORCE_SET* > action prevents further process local changes regardless of > privilege. I'm not aware of a reason why it would be a problem to allow unprivileged configuration as long as there's a way to prevent further changes. The concerning case is if a mitigation is set by a trusted process context, and then untrusted code is executed that manages to turn the mitigation off again. > * The tests fail Patchwork CI because of the new prctl macros, and > the CI > doesn't run headers_install and add -isystem > /usr/include to > the make command. The CI runs on x86 and cross compiles the kernel and selftests, and boots are done in qemu tcg. Maybe we can skip the build if the symbols are undefined or do something like #ifndef PR_PPC_DEXCR_... return KSFT_SKIP; #endif in the test itself? > * On handling an exception, I don't check if the NPHIE bit is enabled > in the DEXCR. > To do so would require reading both the DEXCR and HDEXCR, for > little gain (it > should only matter that the current instruction was a hashchk. If > so, the only > reason it would cause an exception is the failed check. If the > instruction is > rewritten between exception and check we'd be wrong anyway). For context, the hashst and hashchk instructions are implemented using previously reserved nops. I'm not aware of any reason a nop could trap (i.e. we could check for a trap that came from hashchk even if NPHIE is not set), but afaik that'd be the only reason we would have to check. 
> > The series is based on the earlier selftest utils series[1], so the > tests won't build > at all without applying that first. The kernel side should build fine > on ppc/next > 247f34f7b80357943234f93f247a1ae6b6c3a740 though. > > [1]: > https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20221122231103.15829-1-bg...@linux.ibm.com/ > > Benjamin Gray (13): > powerpc/book3s: Add missing include > powerpc: Add initial Dynamic Execution Control Register (DEXCR) > support > powerpc/dexcr: Handle hashchk exception > powerpc/dexcr: Support userspace ROP protection > prctl: Define PowerPC DEXCR interface > powerpc/dexcr: Add prctl implementation > powerpc/dexcr: Add sysctl entry for SBHE system override > powerpc/dexcr: Add enforced userspace ROP protection config > selftests/powerpc: Add more utility macros > selftests/powerpc: Add hashst/hashchk test > selftests/powerpc: Add DEXCR prctl, sysctl interface test > selftests/powerpc: Add DEXCR status utility lsdexcr > Documentation: Document PowerPC kernel DEXCR interface > > Documentation/powe
[PATCH v6 0/4] Option to build big-endian with ELFv2 ABI
This is hopefully the final attempt. Luis was happy for the module patch
to go via the powerpc tree, so I've put the ELFv2 for big endian build
patches into the series. Hopefully we can deprecate the ELFv1 ABI.

Since v5, I cleaned up patch 2 as per Christophe's review. In patch 4
I removed the EXPERT dependency so it's easier to test. It's marked as
experimental, but we should soon make it default and try to deprecate
the v1 ABI so we can eventually remove it.

Thanks,
Nick

Nicholas Piggin (4):
  module: add module_elf_check_arch for module-specific checks
  powerpc/64: Add module check for ELF ABI version
  powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation
  powerpc/64: Option to build big-endian with ELFv2 ABI

 arch/powerpc/Kconfig                   | 21
 arch/powerpc/kernel/module_64.c        | 10
 arch/powerpc/platforms/Kconfig.cputype |  4
 drivers/crypto/vmx/Makefile            | 12
 drivers/crypto/vmx/ppc-xlate.pl        | 10
 include/linux/moduleloader.h           |  3
 kernel/module/main.c                   | 10
 7 files changed, 63 insertions(+), 7 deletions(-)

-- 
2.37.2
[PATCH v6 2/4] powerpc/64: Add module check for ELF ABI version
Override the generic module ELF check to provide a check for the ELF ABI
version. This becomes important if we allow big-endian ELF ABI V2 builds
but it doesn't hurt to check now.

Cc: Jessica Yu
Signed-off-by: Michael Ellerman
[np: split patch, added changelog, adjust to Jessica's proposal]
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/kernel/module_64.c | 10
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7e45dc98df8a..ff045644f13f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -31,6 +31,16 @@
    this, and makes other things simpler.  Anton?
    --RR.  */
 
+bool module_elf_check_arch(Elf_Ehdr *hdr)
+{
+	unsigned long abi_level = hdr->e_flags & 0x3;
+
+	if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+		return abi_level == 2;
+	else
+		return abi_level < 2;
+}
+
 #ifdef CONFIG_PPC64_ELF_ABI_V2
 
 static func_desc_t func_desc(unsigned long addr)
-- 
2.37.2
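The check above can be tried out in userspace with a small model. This is only a sketch: abi_level_ok() is a made-up name, and the kernel_is_elfv2 parameter stands in for IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2). The low two bits of e_flags carry the ABI level, where 0 means unspecified, 1 means ELFv1 and 2 means ELFv2.

```c
#include <stdbool.h>

/*
 * Userspace model of the module ABI check: an ELFv2 kernel accepts only
 * modules marked ELFv2, while an ELFv1 kernel accepts modules marked
 * ELFv1 or left unspecified.
 */
static bool abi_level_ok(unsigned long e_flags, bool kernel_is_elfv2)
{
	unsigned long abi_level = e_flags & 0x3;

	if (kernel_is_elfv2)
		return abi_level == 2;

	return abi_level < 2;
}
```

Note the asymmetry: an unspecified ABI level (0) is accepted by an ELFv1 kernel but rejected by an ELFv2 kernel.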
[PATCH v6 1/4] module: add module_elf_check_arch for module-specific checks
The elf_check_arch() function is also used to test compatibility of
usermode binaries. Kernel modules may have more specific requirements,
for example powerpc would like to test for ABI version compatibility.

Add a weak module_elf_check_arch() that defaults to true, and call it
from elf_validity_check().

Cc: Michael Ellerman
Signed-off-by: Jessica Yu
[np: added changelog, adjust name, rebase]
Acked-by: Luis Chamberlain
Signed-off-by: Nicholas Piggin
---
 include/linux/moduleloader.h |  3
 kernel/module/main.c         | 10
 2 files changed, 13 insertions(+)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..7b4587a19189 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,9 @@
  * must be implemented by each architecture.
  */
 
+/* arch may override to do additional checking of ELF header architecture */
+bool module_elf_check_arch(Elf_Ehdr *hdr);
+
 /* Adjust arch-specific sections.  Return 0 on success. */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
 			      Elf_Shdr *sechdrs,
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d02d39c7174e..7b3f6fb0d428 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1674,6 +1674,11 @@ static int elf_validity_check(struct load_info *info)
 		       info->hdr->e_machine);
 		goto no_exec;
 	}
+	if (!module_elf_check_arch(info->hdr)) {
+		pr_err("Invalid module architecture in ELF header: %u\n",
+		       info->hdr->e_machine);
+		goto no_exec;
+	}
 	if (info->hdr->e_shentsize != sizeof(Elf_Shdr)) {
 		pr_err("Invalid ELF section header size\n");
 		goto no_exec;
@@ -2247,6 +2252,11 @@ static void flush_module_icache(const struct module *mod)
 			   (unsigned long)mod->core_layout.base + mod->core_layout.size);
 }
 
+bool __weak module_elf_check_arch(Elf_Ehdr *hdr)
+{
+	return true;
+}
+
 int __weak module_frob_arch_sections(Elf_Ehdr *hdr,
 				     Elf_Shdr *sechdrs,
 				     char *secstrings,
-- 
2.37.2
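The weak-default pattern used here can be shown outside the kernel too. The sketch below assumes a GCC or Clang toolchain, since __attribute__((weak)) is a compiler extension (the kernel's __weak expands to it). All demo_ names and the struct are invented for the example and are not kernel interfaces.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the ELF header; only here so the hook has something to inspect. */
struct demo_ehdr {
	unsigned long e_flags;
};

/*
 * Weak default: accept every header. If another object file provides a
 * normal (strong) definition of the same symbol, the linker picks that
 * one instead, which is how an architecture would override the check.
 */
__attribute__((weak)) bool demo_module_elf_check_arch(const struct demo_ehdr *hdr)
{
	(void)hdr;
	return true;
}

/* Caller modelled on the new check in elf_validity_check(). */
static int demo_validity_check(const struct demo_ehdr *hdr)
{
	if (!demo_module_elf_check_arch(hdr))
		return -ENOEXEC;

	return 0;
}
```

With only the weak definition linked in, demo_validity_check() always succeeds, which matches the "defaults to true" behaviour described in the changelog.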
[PATCH v6 3/4] powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation
This allows asm generation for big-endian ELFv2 builds. Signed-off-by: Nicholas Piggin --- drivers/crypto/vmx/Makefile | 12 +++- drivers/crypto/vmx/ppc-xlate.pl | 10 ++ 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile index 2560cfea1dec..e33c7238e7f8 100644 --- a/drivers/crypto/vmx/Makefile +++ b/drivers/crypto/vmx/Makefile @@ -2,8 +2,18 @@ obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o +ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y) +override flavour := linux-ppc64le +else +ifdef CONFIG_PPC64_ELF_ABI_V2 +override flavour := linux-ppc64-elfv2 +else +override flavour := linux-ppc64 +endif +endif + quiet_cmd_perl = PERL$@ - cmd_perl = $(PERL) $< $(if $(CONFIG_CPU_LITTLE_ENDIAN), linux-ppc64le, linux-ppc64) > $@ + cmd_perl = $(PERL) $< $(flavour) > $@ targets += aesp8-ppc.S ghashp8-ppc.S diff --git a/drivers/crypto/vmx/ppc-xlate.pl b/drivers/crypto/vmx/ppc-xlate.pl index 36db2ef09e5b..b583898c11ae 100644 --- a/drivers/crypto/vmx/ppc-xlate.pl +++ b/drivers/crypto/vmx/ppc-xlate.pl @@ -9,6 +9,8 @@ open STDOUT,">$output" || die "can't open $output: $!"; my %GLOBALS; my $dotinlocallabels=($flavour=~/linux/)?1:0; +my $elfv2abi=(($flavour =~ /linux-ppc64le/) or ($flavour =~ /linux-ppc64-elfv2/))?1:0; +my $dotfunctions=($elfv2abi=~1)?0:1; # directives which need special treatment on different platforms @@ -40,7 +42,7 @@ my $globl = sub { }; my $text = sub { my $ret = ($flavour =~ /aix/) ? 
".csect\t.text[PR],7" : ".text"; -$ret = ".abiversion2\n".$ret if ($flavour =~ /linux.*64le/); +$ret = ".abiversion2\n".$ret if ($elfv2abi); $ret; }; my $machine = sub { @@ -56,8 +58,8 @@ my $size = sub { if ($flavour =~ /linux/) { shift; my $name = shift; $name =~ s|^[\.\_]||; - my $ret = ".size $name,.-".($flavour=~/64$/?".":"").$name; - $ret .= "\n.size.$name,.-.$name" if ($flavour=~/64$/); + my $ret = ".size $name,.-".($dotfunctions?".":"").$name; + $ret .= "\n.size.$name,.-.$name" if ($dotfunctions); $ret; } else @@ -142,7 +144,7 @@ my $vmr = sub { # Some ABIs specify vrsave, special-purpose register #256, as reserved # for system use. -my $no_vrsave = ($flavour =~ /linux-ppc64le/); +my $no_vrsave = ($elfv2abi); my $mtspr = sub { my ($f,$idx,$ra) = @_; if ($idx == 256 && $no_vrsave) { -- 2.37.2
[PATCH v6 4/4] powerpc/64: Option to build big-endian with ELFv2 ABI
Provide an option to build big-endian kernels using the ELFv2 ABI. This
works on GCC only for now. Clang is rumored to support this, but core
build files need updating first, at least.

This gives big-endian kernels useful advantages of the ELFv2 ABI, e.g.,
less stack usage, -mprofile-kernel support, better compatibility with
eBPF tools.

BE+ELFv2 is not officially supported by the GNU toolchain, but it works
fine in testing and has been used by some userspace for some time (e.g.,
Void Linux).

Tested-by: Michal Suchánek
Reviewed-by: Segher Boessenkool
Signed-off-by: Nicholas Piggin
---
 arch/powerpc/Kconfig                   | 21
 arch/powerpc/platforms/Kconfig.cputype |  4
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ca5418457ed..2d0d80bcc24a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 source "arch/powerpc/platforms/Kconfig.cputype"
 
+config CC_HAS_ELFV2
+	def_bool PPC64 && $(cc-option, -mabi=elfv2)
+
 config 32BIT
 	bool
 	default y if PPC32
@@ -583,6 +586,24 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
 	def_bool KEXEC_FILE
 
+config PPC64_BIG_ENDIAN_ELF_ABI_V2
+	bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
+	depends on PPC64 && CPU_BIG_ENDIAN
+	depends on CC_HAS_ELFV2
+	depends on LD_IS_BFD && LD_VERSION >= 22400
+	default n
+	help
+	  This builds the kernel image using the "Power Architecture 64-Bit ELF
+	  V2 ABI Specification", which has a reduced stack overhead and faster
+	  function calls. This internal kernel ABI option does not affect
+	  userspace compatibility.
+
+	  The V2 ABI is standard for 64-bit little-endian, but for big-endian
+	  it is less well tested by kernel and toolchain. However some distros
+	  build userspace this way, and it can produce a functioning kernel.
+
+	  This requires GCC and binutils 2.24 or newer.
+
 config RELOCATABLE
 	bool "Build a relocatable kernel"
 	depends on PPC64 || (FLATMEM && (44x || PPC_85xx))
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 0c4eed9aea80..6e94d45f3baa 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -575,10 +575,10 @@ config CPU_LITTLE_ENDIAN
 endchoice
 
 config PPC64_ELF_ABI_V1
-	def_bool PPC64 && CPU_BIG_ENDIAN
+	def_bool PPC64 && (CPU_BIG_ENDIAN && !PPC64_BIG_ENDIAN_ELF_ABI_V2)
 
 config PPC64_ELF_ABI_V2
-	def_bool PPC64 && CPU_LITTLE_ENDIAN
+	def_bool PPC64 && !PPC64_ELF_ABI_V1
 
 config PPC64_BOOT_WRAPPER
 	def_bool n
-- 
2.37.2
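The resulting selection between PPC64_ELF_ABI_V1 and PPC64_ELF_ABI_V2 can be written out as plain boolean logic. The sketch below is a model of the Kconfig.cputype expressions only. It ignores the CC_HAS_ELFV2 and linker version dependencies, and the names pick_abi() and struct abi_choice are invented for the example.

```c
#include <stdbool.h>

struct abi_choice {
	bool v1;	/* models PPC64_ELF_ABI_V1 */
	bool v2;	/* models PPC64_ELF_ABI_V2 */
};

/*
 * be_elfv2_opt models the user selecting PPC64_BIG_ENDIAN_ELF_ABI_V2.
 * The option can only take effect on a 64-bit big-endian configuration,
 * mirroring its "depends on PPC64 && CPU_BIG_ENDIAN" line.
 */
static struct abi_choice pick_abi(bool ppc64, bool big_endian, bool be_elfv2_opt)
{
	struct abi_choice c;
	bool be_elfv2 = be_elfv2_opt && ppc64 && big_endian;

	c.v1 = ppc64 && (big_endian && !be_elfv2);
	c.v2 = ppc64 && !c.v1;

	return c;
}
```

Defining V2 as "PPC64 and not V1" keeps the two symbols mutually exclusive on 64-bit, so little-endian builds and big-endian builds with the new option both end up on V2 without listing each case separately.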
[PATCH v3 2/7] selftests/powerpc: Add ptrace setup_core_pattern() null-terminator
- malloc() does not zero the buffer,
- fread() does not null-terminate its output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
  not inherently null-terminated

So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.

Signed-off-by: Benjamin Gray
---
 tools/testing/selftests/powerpc/ptrace/core-pkey.c | 4
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index bbc05ffc5860..5c82ed9e7c65 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -383,7 +383,7 @@ static int setup_core_pattern(char **core_pattern_, bool *changed_)
 		goto out;
 	}
 
-	ret = fread(core_pattern, 1, PATH_MAX, f);
+	ret = fread(core_pattern, 1, PATH_MAX - 1, f);
 	fclose(f);
 	if (!ret) {
 		perror("Error reading core_pattern file");
@@ -391,6 +391,8 @@ static int setup_core_pattern(char **core_pattern_, bool *changed_)
 		goto out;
 	}
 
+	core_pattern[ret] = '\0';
+
 	/* Check whether we can predict the name of the core file. */
 	if (!strcmp(core_pattern, "core") || !strcmp(core_pattern, "core.%p"))
 		*changed_ = false;
-- 
2.38.1
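The same pattern, reading at most one byte less than the buffer size and then terminating at the number of bytes actually read, can be wrapped in a small helper. This is a sketch for illustration only; read_terminated() is a made-up name and is not one of the selftest utilities.

```c
#include <stdio.h>

/*
 * Read up to size - 1 bytes from f into buf and always null-terminate.
 * Returns the number of bytes read, not counting the terminator.
 * The buffer does not need to be zeroed beforehand.
 */
static size_t read_terminated(FILE *f, char *buf, size_t size)
{
	size_t n;

	if (size == 0)
		return 0;

	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';	/* n <= size - 1, so this index is always in bounds */

	return n;
}
```

Reserving the last byte up front is what makes the later strcmp() calls safe, even when the file fills the whole buffer.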
[PATCH v3 0/7] Expand selftest utils
Started this when writing tests for a feature I'm working on, needing a
way to read/write numbers to system files. After writing some utils to
safely handle file IO and parsing, I realised I'd made the ~6th file
read/write implementation and only(?) number parser that checks all the
failure modes when expecting to parse a single number from a file.

So these utils ended up becoming this series. I also modified some other
test utils I came across while doing so. My understanding is selftests
are not expected to be backported, so I wasn't concerned about only
introducing new utils and leaving the existing implementations be.

V3:
  * Add reviewed-by from previous version
  * Fix write(2) call to include creation mode

Benjamin Gray (7):
  selftests/powerpc: Use mfspr/mtspr macros
  selftests/powerpc: Add ptrace setup_core_pattern() null-terminator
  selftests/powerpc: Add generic read/write file util
  selftests/powerpc: Add read/write debugfs file, int
  selftests/powerpc: Parse long/unsigned long value safely
  selftests/powerpc: Add {read,write}_{long,ulong}
  selftests/powerpc: Add automatically allocating read_file

 tools/testing/selftests/powerpc/dscr/dscr.h              |  56
 .../selftests/powerpc/dscr/dscr_sysfs_test.c             |  23
 .../testing/selftests/powerpc/include/utils.h            |  18
 .../selftests/powerpc/nx-gzip/gzfht_test.c               |  52
 tools/testing/selftests/powerpc/pmu/lib.c                |  35
 .../selftests/powerpc/ptrace/core-pkey.c                 |  28
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c            |   6
 .../testing/selftests/powerpc/ptrace/ptrace.h            |   5
 .../selftests/powerpc/security/entry_flush.c             |  12
 .../selftests/powerpc/security/flush_utils.c             |   3
 .../selftests/powerpc/security/rfi_flush.c               |  12
 .../powerpc/security/uaccess_flush.c                     |  18
 .../selftests/powerpc/syscalls/Makefile                  |   2
 .../selftests/powerpc/syscalls/rtas_filter.c             |  80
 tools/testing/selftests/powerpc/utils.c                  | 314
 15 files changed, 341 insertions(+), 323 deletions(-)

base-commit: 247f34f7b80357943234f93f247a1ae6b6c3a740
-- 
2.38.1
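To give an idea of what "checks all the failure modes" can mean when a file is expected to hold a single number, here is a minimal strtol() based sketch. It is not the implementation from this series. The name parse_single_long(), the decimal-only base and the choice of error codes are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse exactly one decimal long from s. Leading whitespace is skipped by
 * strtol(); trailing whitespace (such as the newline most sysfs and procfs
 * files end with) is tolerated. Returns 0 on success or a negative errno:
 *   -EINVAL  no digits at all, or trailing non-whitespace characters
 *   -ERANGE  the value does not fit in a long
 * *out is only written on success.
 */
static int parse_single_long(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(s, &end, 10);

	if (end == s)
		return -EINVAL;
	if (errno == ERANGE)
		return -ERANGE;

	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -EINVAL;

	*out = val;
	return 0;
}
```

The checks that are easy to miss are the empty-input case (strtol() returns 0 without setting errno) and trailing junk, both of which a bare strtol() or atoi() call would silently accept.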
[PATCH v3 1/7] selftests/powerpc: Use mfspr/mtspr macros
No need to write inline asm for mtspr/mfspr, we have macros for this in reg.h Signed-off-by: Benjamin Gray Reviewed-by: Andrew Donnellan --- tools/testing/selftests/powerpc/dscr/dscr.h | 17 + .../selftests/powerpc/ptrace/ptrace-hwbreak.c | 6 ++ tools/testing/selftests/powerpc/ptrace/ptrace.h | 5 + .../selftests/powerpc/security/flush_utils.c| 3 ++- 4 files changed, 10 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h b/tools/testing/selftests/powerpc/dscr/dscr.h index 13e9b9e28e2c..b703714e7d98 100644 --- a/tools/testing/selftests/powerpc/dscr/dscr.h +++ b/tools/testing/selftests/powerpc/dscr/dscr.h @@ -23,6 +23,7 @@ #include #include +#include "reg.h" #include "utils.h" #define THREADS100 /* Max threads */ @@ -41,31 +42,23 @@ /* Prilvilege state DSCR access */ inline unsigned long get_dscr(void) { - unsigned long ret; - - asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_DSCR_PRIV)); - - return ret; + return mfspr(SPRN_DSCR_PRIV); } inline void set_dscr(unsigned long val) { - asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR_PRIV)); + mtspr(SPRN_DSCR_PRIV, val); } /* Problem state DSCR access */ inline unsigned long get_dscr_usr(void) { - unsigned long ret; - - asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_DSCR)); - - return ret; + return mfspr(SPRN_DSCR); } inline void set_dscr_usr(unsigned long val) { - asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR)); + mtspr(SPRN_DSCR, val); } /* Default DSCR access */ diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c index a0635a3819aa..1345e9b9af0f 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c +++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c @@ -23,6 +23,7 @@ #include #include #include "ptrace.h" +#include "reg.h" #define SPRN_PVR 0x11F #define PVR_8xx0x0050 @@ -620,10 +621,7 @@ static int ptrace_hwbreak(void) int main(int argc, char **argv, char 
**envp) { - int pvr = 0; - asm __volatile__ ("mfspr %0,%1" : "=r"(pvr) : "i"(SPRN_PVR)); - if (pvr == PVR_8xx) - is_8xx = true; + is_8xx = mfspr(SPRN_PVR) == PVR_8xx; return test_harness(ptrace_hwbreak, "ptrace-hwbreak"); } diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace.h b/tools/testing/selftests/powerpc/ptrace/ptrace.h index 4e0233c0f2b3..04788e5fc504 100644 --- a/tools/testing/selftests/powerpc/ptrace/ptrace.h +++ b/tools/testing/selftests/powerpc/ptrace/ptrace.h @@ -745,10 +745,7 @@ int show_tm_spr(pid_t child, struct tm_spr_regs *out) /* Analyse TEXASR after TM failure */ inline unsigned long get_tfiar(void) { - unsigned long ret; - - asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_TFIAR)); - return ret; + return mfspr(SPRN_TFIAR); } void analyse_texasr(unsigned long texasr) diff --git a/tools/testing/selftests/powerpc/security/flush_utils.c b/tools/testing/selftests/powerpc/security/flush_utils.c index 4d95965cb751..9c5c00e04f63 100644 --- a/tools/testing/selftests/powerpc/security/flush_utils.c +++ b/tools/testing/selftests/powerpc/security/flush_utils.c @@ -14,6 +14,7 @@ #include #include #include +#include "reg.h" #include "utils.h" #include "flush_utils.h" @@ -79,5 +80,5 @@ void set_dscr(unsigned long val) init = 1; } - asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR)); + mtspr(SPRN_DSCR, val); } -- 2.38.1
[PATCH v3 4/7] selftests/powerpc: Add read/write debugfs file, int
Debugfs files are not always integers, so make *_file return/write a byte buffer, and *_int deal with int values specifically. This increases consistency with the other file read/write helpers. Signed-off-by: Benjamin Gray --- .../testing/selftests/powerpc/include/utils.h | 6 ++-- .../selftests/powerpc/security/entry_flush.c | 12 +++ .../selftests/powerpc/security/rfi_flush.c| 12 +++ .../powerpc/security/uaccess_flush.c | 18 +- tools/testing/selftests/powerpc/utils.c | 34 --- 5 files changed, 47 insertions(+), 35 deletions(-) diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index 70885e5814a8..de5e3790f397 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -35,8 +35,10 @@ int pick_online_cpu(void); int read_file(const char *path, char *buf, size_t count, size_t *len); int write_file(const char *path, const char *buf, size_t count); -int read_debugfs_file(char *debugfs_file, int *result); -int write_debugfs_file(char *debugfs_file, int result); +int read_debugfs_file(const char *debugfs_file, char *buf, size_t count); +int write_debugfs_file(const char *debugfs_file, const char *buf, size_t count); +int read_debugfs_int(const char *debugfs_file, int *result); +int write_debugfs_int(const char *debugfs_file, int result); int read_sysfs_file(char *debugfs_file, char *result, size_t result_size); int perf_event_open_counter(unsigned int type, unsigned long config, int group_fd); diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c index 68ce377b205e..e01c573deadd 100644 --- a/tools/testing/selftests/powerpc/security/entry_flush.c +++ b/tools/testing/selftests/powerpc/security/entry_flush.c @@ -34,18 +34,18 @@ int entry_flush_test(void) // The PMU event we use only works on Power7 or later SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06)); - if 
(read_debugfs_file("powerpc/rfi_flush", &rfi_flush_orig) < 0) { + if (read_debugfs_int("powerpc/rfi_flush", &rfi_flush_orig) < 0) { perror("Unable to read powerpc/rfi_flush debugfs file"); SKIP_IF(1); } - if (read_debugfs_file("powerpc/entry_flush", &entry_flush_orig) < 0) { + if (read_debugfs_int("powerpc/entry_flush", &entry_flush_orig) < 0) { perror("Unable to read powerpc/entry_flush debugfs file"); SKIP_IF(1); } if (rfi_flush_orig != 0) { - if (write_debugfs_file("powerpc/rfi_flush", 0) < 0) { + if (write_debugfs_int("powerpc/rfi_flush", 0) < 0) { perror("error writing to powerpc/rfi_flush debugfs file"); FAIL_IF(1); } @@ -105,7 +105,7 @@ int entry_flush_test(void) if (entry_flush == entry_flush_orig) { entry_flush = !entry_flush_orig; - if (write_debugfs_file("powerpc/entry_flush", entry_flush) < 0) { + if (write_debugfs_int("powerpc/entry_flush", entry_flush) < 0) { perror("error writing to powerpc/entry_flush debugfs file"); return 1; } @@ -120,12 +120,12 @@ int entry_flush_test(void) set_dscr(0); - if (write_debugfs_file("powerpc/rfi_flush", rfi_flush_orig) < 0) { + if (write_debugfs_int("powerpc/rfi_flush", rfi_flush_orig) < 0) { perror("unable to restore original value of powerpc/rfi_flush debugfs file"); return 1; } - if (write_debugfs_file("powerpc/entry_flush", entry_flush_orig) < 0) { + if (write_debugfs_int("powerpc/entry_flush", entry_flush_orig) < 0) { perror("unable to restore original value of powerpc/entry_flush debugfs file"); return 1; } diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c index f73484a6470f..6bedc86443a6 100644 --- a/tools/testing/selftests/powerpc/security/rfi_flush.c +++ b/tools/testing/selftests/powerpc/security/rfi_flush.c @@ -34,18 +34,18 @@ int rfi_flush_test(void) // The PMU event we use only works on Power7 or later SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06)); - if (read_debugfs_file("powerpc/rfi_flush", &rfi_flush_orig) < 0) { + if 
(read_debugfs_int("powerpc/rfi_flush", &rfi_flush_orig) < 0) { perror("Unable to read powerpc/rfi_flush debugfs file"); SKIP_IF(1); } - if (read_debugfs_file("powerpc/entry_flush", &entry_flush_orig) < 0) { + if (read_debugfs_int("powerpc/entry_flush", &entry_flush_orig) < 0) { have_entry_flush = 0; } else { have_entry_flush = 1; if (entry_f
[PATCH v3 5/7] selftests/powerpc: Parse long/unsigned long value safely
Often a file is expected to hold an integral value. Existing functions will use a C stdlib function like atoi or strtol to parse the file. These operations are error prone, with complicated error conditions (atoi returns 0 if not a number, and is undefined behaviour if not in range. strtol returns 0 if not a number, and LONG_MIN/MAX if not in range + sets errno to ERANGE). Add a dedicated parse function that accounts for these error conditions so tests can safely parse numbers without undetected bad data. It's a bit ugly to generate the functions through a macro, but it beats copying the error check logic multiple times over. Signed-off-by: Benjamin Gray --- .../testing/selftests/powerpc/include/utils.h | 5 ++ tools/testing/selftests/powerpc/pmu/lib.c | 9 ++-- tools/testing/selftests/powerpc/utils.c | 53 +-- 3 files changed, 59 insertions(+), 8 deletions(-) diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index de5e3790f397..b82e143a07c6 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -33,6 +33,11 @@ void *get_auxv_entry(int type); int pick_online_cpu(void); +int parse_int(const char *buffer, size_t count, int *result, int base); +int parse_long(const char *buffer, size_t count, long *result, int base); +int parse_uint(const char *buffer, size_t count, unsigned int *result, int base); +int parse_ulong(const char *buffer, size_t count, unsigned long *result, int base); + int read_file(const char *path, char *buf, size_t count, size_t *len); int write_file(const char *path, const char *buf, size_t count); int read_debugfs_file(const char *debugfs_file, char *buf, size_t count); diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index e8960e7a1271..771658278f55 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -192,16 +192,15 @@ bool 
require_paranoia_below(int level) { int err; long current; - char *end, buf[16]; + char buf[16] = {0}; + char *end; - if ((err = read_file(PARANOID_PATH, buf, sizeof(buf), NULL))) { + if ((err = read_file(PARANOID_PATH, buf, sizeof(buf) - 1, NULL))) { printf("Couldn't read " PARANOID_PATH "?\n"); return false; } - current = strtol(buf, &end, 10); - - if (end == buf) { + if ((err = parse_long(buf, sizeof(buf), &current, 10))) { printf("Couldn't parse " PARANOID_PATH "?\n"); return false; } diff --git a/tools/testing/selftests/powerpc/utils.c b/tools/testing/selftests/powerpc/utils.c index 8593e67ce779..c82539fd44f1 100644 --- a/tools/testing/selftests/powerpc/utils.c +++ b/tools/testing/selftests/powerpc/utils.c @@ -8,6 +8,8 @@ #include #include #include +#include +#include #include #include #include @@ -113,6 +115,53 @@ int write_debugfs_file(const char *subpath, const char *buf, size_t count) return write_file(path, buf, count); } +#define TYPE_MIN(x)\ + _Generic((x), \ + int:INT_MIN,\ + long: LONG_MIN, \ + unsigned int: 0, \ + unsigned long: 0) + +#define TYPE_MAX(x)\ + _Generic((x), \ + int:INT_MAX,\ + long: LONG_MAX, \ + unsigned int: INT_MAX,\ + unsigned long: LONG_MAX) + +#define define_parse_number(fn, type, super_type) \ + int fn(const char *buffer, size_t count, type *result, int base) \ + { \ + char *end; \ + super_type parsed; \ + \ + errno = 0; \ + parsed = _Generic(parsed, \ + intmax_t: strtoimax, \ + uintmax_t:strtoumax)(buffer, &end, base); \ + \ + if (errno == ERANGE || \ + parsed < TYPE_MIN(*result) || parsed > TYPE_MAX(*result)) \ + return ERANGE; \ + \ +
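For readers following along, the failure modes listed in the commit message can be sketched as a single standalone function. This is not the macro-generated code from the patch (which is cut off above); parse_long_sketch is a hypothetical name, and both the requirement that the text be NUL-terminated within count bytes and the tolerance for one trailing newline are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for parse_long(): returns 0 or an errno value. */
int parse_long_sketch(const char *buffer, size_t count, long *result, int base)
{
	char *end;
	intmax_t parsed;

	assert(buffer && result);

	/* Assumption: the text must be NUL-terminated within count bytes. */
	if (!memchr(buffer, '\0', count))
		return EINVAL;

	errno = 0;
	parsed = strtoimax(buffer, &end, base);

	/* Out of range for intmax_t, or for the narrower target type. */
	if (errno == ERANGE || parsed < LONG_MIN || parsed > LONG_MAX)
		return ERANGE;

	/* No characters consumed: the input is not a number at all. */
	if (end == buffer)
		return EINVAL;

	/* Assumption: one trailing newline is fine, any other junk is not. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return EINVAL;

	*result = (long)parsed;
	return 0;
}
```

Unlike a bare strtol() call followed by an end == buf check, this also rejects inputs such as "12x" and reports overflow as ERANGE instead of silently clamping.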
[PATCH v3 7/7] selftests/powerpc: Add automatically allocating read_file
A couple of tests roll their own auto-allocating file read logic. Add a generic implementation and convert them to use it. Signed-off-by: Benjamin Gray --- .../testing/selftests/powerpc/include/utils.h | 1 + .../selftests/powerpc/nx-gzip/gzfht_test.c| 37 + .../selftests/powerpc/syscalls/Makefile | 2 +- .../selftests/powerpc/syscalls/rtas_filter.c | 80 +++ tools/testing/selftests/powerpc/utils.c | 63 +++ 5 files changed, 75 insertions(+), 108 deletions(-) diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index 044b0236df38..95f3a24a4569 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -40,6 +40,7 @@ int parse_ulong(const char *buffer, size_t count, unsigned long *result, int bas int read_file(const char *path, char *buf, size_t count, size_t *len); int write_file(const char *path, const char *buf, size_t count); +int read_file_alloc(const char *path, char **buf, size_t *len); int read_long(const char *path, long *result, int base); int write_long(const char *path, long result, int base); int read_ulong(const char *path, unsigned long *result, int base); diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c index a6a226e1b8ba..4de079923ccb 100644 --- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c +++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c @@ -143,41 +143,6 @@ int gzip_header_blank(char *buf) return i; } -/* Caller must free the allocated buffer return nonzero on error. 
*/ -int read_alloc_input_file(char *fname, char **buf, size_t *bufsize) -{ - int err; - struct stat statbuf; - char *p; - size_t num_bytes; - - if (stat(fname, &statbuf)) { - perror(fname); - return -1; - } - - assert(NULL != (p = (char *) malloc(statbuf.st_size))); - - if ((err = read_file(fname, p, statbuf.st_size, &num_bytes))) { - fprintf(stderr, "Failed to read file: %s\n", strerror(err)); - goto fail; - } - - if (num_bytes != statbuf.st_size) { - fprintf(stderr, "Actual bytes != expected bytes\n"); - err = -1; - goto fail; - } - - *buf = p; - *bufsize = num_bytes; - return 0; - -fail: - free(p); - return err; -} - /* * Z_SYNC_FLUSH as described in zlib.h. * Returns number of appended bytes @@ -244,7 +209,7 @@ int compress_file(int argc, char **argv, void *handle) fprintf(stderr, "usage: %s \n", argv[0]); exit(-1); } - if (read_alloc_input_file(argv[1], &inbuf, &inlen)) + if (read_file_alloc(argv[1], &inbuf, &inlen)) exit(-1); fprintf(stderr, "file %s read, %ld bytes\n", argv[1], inlen); diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile b/tools/testing/selftests/powerpc/syscalls/Makefile index b63f8459c704..54ff5cfffc63 100644 --- a/tools/testing/selftests/powerpc/syscalls/Makefile +++ b/tools/testing/selftests/powerpc/syscalls/Makefile @@ -6,4 +6,4 @@ CFLAGS += -I../../../../../usr/include top_srcdir = ../../../../.. 
include ../../lib.mk -$(TEST_GEN_PROGS): ../harness.c +$(TEST_GEN_PROGS): ../harness.c ../utils.c diff --git a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c index 03b487f18d00..05f25f12556f 100644 --- a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c +++ b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -50,70 +51,16 @@ struct region { struct region *next; }; -int read_entire_file(int fd, char **buf, size_t *len) -{ - size_t buf_size = 0; - size_t off = 0; - int rc; - - *buf = NULL; - do { - buf_size += BLOCK_SIZE; - if (*buf == NULL) - *buf = malloc(buf_size); - else - *buf = realloc(*buf, buf_size); - - if (*buf == NULL) - return -ENOMEM; - - rc = read(fd, *buf + off, BLOCK_SIZE); - if (rc < 0) - return -EIO; - - off += rc; - } while (rc == BLOCK_SIZE); - - if (len) - *len = off; - - return 0; -} - -static int open_prop_file(const char *prop_path, const char *prop_name, int *fd) -{ - char *path; - int len; - - /* allocate enough for two string, a slash and trailing NULL */ - len = strlen(prop_path) + strlen(prop_name) + 1 + 1; - path = malloc(len); - if (path == NULL) - return -ENOMEM; - - snprintf(path, len, "%s/%s", prop_path, prop_name); - -
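As an illustration of what an auto-allocating read can look like, here is a minimal sketch. It is not the read_file_alloc() from this patch (the utils.c hunk is not shown above); read_file_alloc_sketch and CHUNK_SKETCH are hypothetical names, and the choice to return positive errno values mirrors how callers of read_file() use strerror(err) elsewhere in the series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SKETCH 4096 /* hypothetical growth step */

/* Hypothetical sketch: on success the caller owns *buf and must free() it. */
int read_file_alloc_sketch(const char *path, char **buf, size_t *len)
{
	size_t cap = 0, used = 0;
	char *data = NULL, *tmp;
	ssize_t rc;
	int fd, err = 0;

	assert(path && buf);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	for (;;) {
		/* Grow only when the buffer is full; keep the old pointer on failure. */
		if (used == cap) {
			tmp = realloc(data, cap + CHUNK_SKETCH);
			if (!tmp) {
				err = ENOMEM;
				break;
			}
			data = tmp;
			cap += CHUNK_SKETCH;
		}

		rc = read(fd, data + used, cap - used);
		if (rc < 0) {
			err = errno;
			break;
		}
		if (rc == 0) /* EOF */
			break;
		used += (size_t)rc;
	}

	close(fd);

	if (err) {
		free(data);
		return err;
	}

	*buf = data;
	if (len)
		*len = used;
	return 0;
}
```

Looping until read() returns 0 avoids relying on stat() sizes, which matters for procfs/sysfs style files, and going through a temporary pointer avoids leaking the old buffer when realloc() fails.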
[PATCH v3 6/7] selftests/powerpc: Add {read,write}_{long,ulong}
Add helper functions to read and write (unsigned) long values directly from/to files. One of the kernel interfaces uses hex strings, so we need to allow passing a base too. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/dscr/dscr.h | 9 +-- .../selftests/powerpc/dscr/dscr_sysfs_test.c | 12 ++-- .../testing/selftests/powerpc/include/utils.h | 4 ++ tools/testing/selftests/powerpc/pmu/lib.c | 11 +--- tools/testing/selftests/powerpc/utils.c | 62 +++ 5 files changed, 76 insertions(+), 22 deletions(-) diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h b/tools/testing/selftests/powerpc/dscr/dscr.h index 9a69d473ffdf..b5166ddcf26a 100644 --- a/tools/testing/selftests/powerpc/dscr/dscr.h +++ b/tools/testing/selftests/powerpc/dscr/dscr.h @@ -65,26 +65,21 @@ inline void set_dscr_usr(unsigned long val) unsigned long get_default_dscr(void) { int err; - char buf[16] = {0}; unsigned long val; - if ((err = read_file(DSCR_DEFAULT, buf, sizeof(buf) - 1, NULL))) { + if ((err = read_ulong(DSCR_DEFAULT, &val, 16))) { fprintf(stderr, "get_default_dscr() read failed: %s\n", strerror(err)); exit(1); } - sscanf(buf, "%lx", &val); return val; } void set_default_dscr(unsigned long val) { int err; - char buf[16]; - sprintf(buf, "%lx\n", val); - - if ((err = write_file(DSCR_DEFAULT, buf, strlen(buf)))) { + if ((err = write_ulong(DSCR_DEFAULT, val, 16))) { fprintf(stderr, "set_default_dscr() write failed: %s\n", strerror(err)); exit(1); } diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c index 310946262a24..3ac176888feb 100644 --- a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c +++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c @@ -12,15 +12,15 @@ static int check_cpu_dscr_default(char *file, unsigned long val) { - char buf[10] = {0}; - int rc; + unsigned long cpu_dscr; + int err; - if ((rc = read_file(file, buf, sizeof(buf) - 1, NULL))) - return rc; + if ((err = read_ulong(file, 
&cpu_dscr, 16))) + return err; - if (strtol(buf, NULL, 16) != val) { + if (cpu_dscr != val) { printf("DSCR match failed: %ld (system) %ld (cpu)\n", - val, strtol(buf, NULL, 16)); + val, cpu_dscr); return 1; } return 0; diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index b82e143a07c6..044b0236df38 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -40,6 +40,10 @@ int parse_ulong(const char *buffer, size_t count, unsigned long *result, int bas int read_file(const char *path, char *buf, size_t count, size_t *len); int write_file(const char *path, const char *buf, size_t count); +int read_long(const char *path, long *result, int base); +int write_long(const char *path, long result, int base); +int read_ulong(const char *path, unsigned long *result, int base); +int write_ulong(const char *path, unsigned long result, int base); int read_debugfs_file(const char *debugfs_file, char *buf, size_t count); int write_debugfs_file(const char *debugfs_file, const char *buf, size_t count); int read_debugfs_int(const char *debugfs_file, int *result); diff --git a/tools/testing/selftests/powerpc/pmu/lib.c b/tools/testing/selftests/powerpc/pmu/lib.c index 771658278f55..55481c5b6995 100644 --- a/tools/testing/selftests/powerpc/pmu/lib.c +++ b/tools/testing/selftests/powerpc/pmu/lib.c @@ -192,16 +192,9 @@ bool require_paranoia_below(int level) { int err; long current; - char buf[16] = {0}; - char *end; - if ((err = read_file(PARANOID_PATH, buf, sizeof(buf) - 1, NULL))) { - printf("Couldn't read " PARANOID_PATH "?\n"); - return false; - } - - if ((err = parse_long(buf, sizeof(buf), &current, 10))) { - printf("Couldn't parse " PARANOID_PATH "?\n"); + if ((err = read_long(PARANOID_PATH, &current, 10))) { + fprintf(stderr, "Couldn't read " PARANOID_PATH ": %s\n", strerror(err)); return false; } diff --git a/tools/testing/selftests/powerpc/utils.c 
b/tools/testing/selftests/powerpc/utils.c index c82539fd44f1..b2906dd71cf5 100644 --- a/tools/testing/selftests/powerpc/utils.c +++ b/tools/testing/selftests/powerpc/utils.c @@ -162,6 +162,68 @@ define_parse_number(parse_long, long, intmax_t); define_parse_number(parse_uint, unsigned int, uintmax_t); define_parse_number(parse_ulong, unsigned long, uintmax_t); +int read_long(const char *path, long *result, int base) +{ + int err; +
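To make the role of the base parameter concrete, here is a rough sketch of a hex round trip through a file. It does not reproduce the helpers from this patch (their bodies are cut off above); write_ulong_sketch and read_ulong_sketch are hypothetical names, stdio is used purely to keep the example self-contained, and supporting only bases 10 and 16 on the write side is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: write value as text in base 10 or 16, newline-terminated. */
int write_ulong_sketch(const char *path, unsigned long value, int base)
{
	FILE *f;
	int err = 0;

	assert(path);

	if (base != 10 && base != 16)
		return EINVAL;

	f = fopen(path, "w");
	if (!f)
		return errno;

	if (fprintf(f, base == 16 ? "%lx\n" : "%lu\n", value) < 0)
		err = EIO;
	if (fclose(f) != 0 && !err)
		err = errno;

	return err;
}

/* Hypothetical sketch: read one unsigned number, rejecting junk and overflow. */
int read_ulong_sketch(const char *path, unsigned long *result, int base)
{
	char buf[32] = {0};
	char *end;
	unsigned long parsed;
	FILE *f;

	assert(path && result);

	f = fopen(path, "r");
	if (!f)
		return errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return EIO;
	}
	fclose(f);

	/* strtoul() silently wraps negative input, so refuse a minus sign. */
	if (strchr(buf, '-'))
		return EINVAL;

	errno = 0;
	parsed = strtoul(buf, &end, base);
	if (errno == ERANGE)
		return ERANGE;
	if (end == buf || (*end != '\n' && *end != '\0'))
		return EINVAL;

	*result = parsed;
	return 0;
}
```

Reading the same file back with the wrong base fails instead of returning a bogus value, which is the kind of undetected bad data the series is trying to rule out.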
[PATCH v3 3/7] selftests/powerpc: Add generic read/write file util
File read/write is reimplemented in about 5 different ways in the various PowerPC selftests. This indicates it should be a common util. Add a common read_file / write_file implementation and convert users to it where (easily) possible. Signed-off-by: Benjamin Gray --- tools/testing/selftests/powerpc/dscr/dscr.h | 36 ++ .../selftests/powerpc/dscr/dscr_sysfs_test.c | 19 +-- .../testing/selftests/powerpc/include/utils.h | 2 + .../selftests/powerpc/nx-gzip/gzfht_test.c| 49 +++- tools/testing/selftests/powerpc/pmu/lib.c | 27 + .../selftests/powerpc/ptrace/core-pkey.c | 30 ++--- tools/testing/selftests/powerpc/utils.c | 108 ++ 7 files changed, 107 insertions(+), 164 deletions(-) diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h b/tools/testing/selftests/powerpc/dscr/dscr.h index b703714e7d98..9a69d473ffdf 100644 --- a/tools/testing/selftests/powerpc/dscr/dscr.h +++ b/tools/testing/selftests/powerpc/dscr/dscr.h @@ -64,48 +64,30 @@ inline void set_dscr_usr(unsigned long val) /* Default DSCR access */ unsigned long get_default_dscr(void) { - int fd = -1, ret; - char buf[16]; + int err; + char buf[16] = {0}; unsigned long val; - if (fd == -1) { - fd = open(DSCR_DEFAULT, O_RDONLY); - if (fd == -1) { - perror("open() failed"); - exit(1); - } - } - memset(buf, 0, sizeof(buf)); - lseek(fd, 0, SEEK_SET); - ret = read(fd, buf, sizeof(buf)); - if (ret == -1) { - perror("read() failed"); + if ((err = read_file(DSCR_DEFAULT, buf, sizeof(buf) - 1, NULL))) { + fprintf(stderr, "get_default_dscr() read failed: %s\n", strerror(err)); exit(1); } + sscanf(buf, "%lx", &val); - close(fd); return val; } void set_default_dscr(unsigned long val) { - int fd = -1, ret; + int err; char buf[16]; - if (fd == -1) { - fd = open(DSCR_DEFAULT, O_RDWR); - if (fd == -1) { - perror("open() failed"); - exit(1); - } - } sprintf(buf, "%lx\n", val); - ret = write(fd, buf, strlen(buf)); - if (ret == -1) { - perror("write() failed"); + + if ((err = write_file(DSCR_DEFAULT, buf, strlen(buf)))) { + 
fprintf(stderr, "set_default_dscr() write failed: %s\n", strerror(err)); exit(1); } - close(fd); } double uniform_deviate(int seed) diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c index fbbdffdb2e5d..310946262a24 100644 --- a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c +++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c @@ -12,23 +12,12 @@ static int check_cpu_dscr_default(char *file, unsigned long val) { - char buf[10]; - int fd, rc; + char buf[10] = {0}; + int rc; - fd = open(file, O_RDWR); - if (fd == -1) { - perror("open() failed"); - return 1; - } - - rc = read(fd, buf, sizeof(buf)); - if (rc == -1) { - perror("read() failed"); - return 1; - } - close(fd); + if ((rc = read_file(file, buf, sizeof(buf) - 1, NULL))) + return rc; - buf[rc] = '\0'; if (strtol(buf, NULL, 16) != val) { printf("DSCR match failed: %ld (system) %ld (cpu)\n", val, strtol(buf, NULL, 16)); diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h index e222a5858450..70885e5814a8 100644 --- a/tools/testing/selftests/powerpc/include/utils.h +++ b/tools/testing/selftests/powerpc/include/utils.h @@ -33,6 +33,8 @@ void *get_auxv_entry(int type); int pick_online_cpu(void); +int read_file(const char *path, char *buf, size_t count, size_t *len); +int write_file(const char *path, const char *buf, size_t count); int read_debugfs_file(char *debugfs_file, int *result); int write_debugfs_file(char *debugfs_file, int result); int read_sysfs_file(char *debugfs_file, char *result, size_t result_size); diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c index 095195a25687..a6a226e1b8ba 100644 --- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c +++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c @@ -146,49 +146,36 @@ int gzip_header_blank(char *buf) /* Caller must free the 
allocated buffer return nonzero on error. */ int read_alloc_input_file(char *fname, char **buf, size_t *bufsize) { + int err; struct stat statbuf; - FILE *fp; char *p; size_t num_bytes; if (stat(fname, &statbuf)
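For context, a common read_file / write_file pair matching the prototypes added to utils.h might look roughly like the sketch below. This is not the implementation from the patch (the utils.c hunk is not visible above); the _sketch names are hypothetical, and the single read() call, the O_CREAT | O_TRUNC flags with mode 0644, and treating a short write as EIO are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: read up to count bytes; *len (optional) gets the byte count. */
int read_file_sketch(const char *path, char *buf, size_t count, size_t *len)
{
	ssize_t rc;
	int fd, err;

	assert(path && buf);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	rc = read(fd, buf, count);
	err = rc < 0 ? errno : 0;
	close(fd);
	if (err)
		return err;

	if (len)
		*len = (size_t)rc;
	return 0;
}

/* Hypothetical sketch: write exactly count bytes, creating the file if needed. */
int write_file_sketch(const char *path, const char *buf, size_t count)
{
	ssize_t rc;
	int fd, err;

	assert(path && buf);

	/* O_CREAT requires the third (mode) argument to open(). */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return errno;

	rc = write(fd, buf, count);
	if (rc < 0)
		err = errno;
	else if ((size_t)rc != count)
		err = EIO; /* assumption: a short write counts as failure */
	else
		err = 0;

	close(fd);
	return err;
}
```

Returning the errno value directly is what lets converted callers print strerror(err), and passing sizeof(buf) - 1 into a zero-initialised buffer, as the converted callers do, keeps the result NUL-terminated.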
[RFC PATCH] Disable Book-E KVM support?
BookE KVM is in a deep maintenance state, I'm not sure how much testing it gets. I don't have a test setup, and it does not look like QEMU has any HV architecture enabled. It hasn't been too painful but there are some cases where not being able to test causes a bit of a problem, e.g., https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-November/251452.html Time to begin the removal process, or are there still people using it? I'm happy to keep making occasional patches to try to keep it going if there are people testing upstream. Getting HV support into QEMU would help with long term support, not sure how big of a job that would be. Thanks, Nick --- arch/powerpc/kvm/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index a9f57dad6d91..6c9458741cb3 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -191,6 +191,7 @@ config KVM_EXIT_TIMING config KVM_E500V2 bool "KVM support for PowerPC E500v2 processors" + depends on false depends on PPC_E500 && !PPC_E500MC depends on !CONTEXT_TRACKING_USER select KVM @@ -207,6 +208,7 @@ config KVM_E500V2 config KVM_E500MC bool "KVM support for PowerPC E500MC/E5500/E6500 processors" + depends on false depends on PPC_E500MC depends on !CONTEXT_TRACKING_USER select KVM -- 2.37.2