Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage

2022-11-27 Thread David Hildenbrand

On 16.11.22 11:26, David Hildenbrand wrote:

FOLL_FORCE is really only for ptrace access. According to commit
707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
writable"), get_vaddr_frames() currently pins all pages writable as a
workaround for issues with read-only buffers.

FOLL_FORCE, however, seems to be a legacy leftover as it predates
commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
always writable"). Let's just remove it.

Once the read-only buffer issue has been resolved, FOLL_WRITE could
again be set depending on the DMA direction.

Cc: Hans Verkuil 
Cc: Marek Szyprowski 
Cc: Tomasz Figa 
Cc: Marek Szyprowski 
Cc: Mauro Carvalho Chehab 
Signed-off-by: David Hildenbrand 
---
  drivers/media/common/videobuf2/frame_vector.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 542dde9d2609..062e98148c53 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -50,7 +50,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	start = untagged_addr(start);
 
 	ret = pin_user_pages_fast(start, nr_frames,
-				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  FOLL_WRITE | FOLL_LONGTERM,
 				  (struct page **)(vec->ptrs));
 	if (ret > 0) {
 		vec->got_ref = true;



Hi Andrew,

See the discussion at [1] regarding a conflict and how to proceed with
upstreaming. The conflict would be easy to resolve; however, the patch
description no longer makes sense once [1] is applied.


On top of mm-unstable, reverting this patch and applying [1] gives me
an updated patch:


From 1e66c25f1467c1f1e5f275312f2c6df29308d4df Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Wed, 16 Nov 2022 11:26:55 +0100
Subject: [PATCH] mm/frame-vector: remove FOLL_FORCE usage

GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).

Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.

Reviewed-by: Daniel Vetter 
Acked-by: Hans Verkuil 
Cc: Hans Verkuil 
Cc: Marek Szyprowski 
Cc: Tomasz Figa 
Cc: Marek Szyprowski 
Cc: Mauro Carvalho Chehab 
Signed-off-by: David Hildenbrand 
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index aad72640f055..8606fdacf5b8 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -41,7 +41,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, bool write,
 	int ret_pin_user_pages_fast = 0;
 	int ret = 0;
 	int err;
-	unsigned int gup_flags = FOLL_FORCE | FOLL_LONGTERM;
+	unsigned int gup_flags = FOLL_LONGTERM;
 
 	if (nr_frames == 0)
 		return 0;
--
2.38.1



Please let me know how you want to proceed. Ideally, you'd pick up
[1] and apply this updated patch. Also, please tell me if I should
send this updated patch in a separate mail (e.g., as reply to this mail).


[1] https://lkml.kernel.org/r/71bdd3cf-b044-3f12-df58-7c16d5749...@xs4all.nl

--
Thanks,

David / dhildenb
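
To illustrate the flag change in the updated patch, here is a minimal
userspace sketch, not kernel code. The flag values and the helper name
frame_vector_gup_flags() are assumptions made for the example; the real
definitions live in the kernel headers and the real function is
get_vaddr_frames(). The sketch also assumes, based on the `bool write`
parameter visible in the hunk header, that FOLL_WRITE is only added when
the caller asks for write access.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real values may differ. */
#define FOLL_WRITE    0x01u
#define FOLL_FORCE    0x10u
#define FOLL_LONGTERM 0x100u

/* Hypothetical helper modelling how the pin flags are chosen after the patch. */
static unsigned int frame_vector_gup_flags(bool write)
{
	unsigned int gup_flags = FOLL_LONGTERM;	/* FOLL_FORCE is no longer set */

	if (write)
		gup_flags |= FOLL_WRITE;	/* assumed: only for writable buffers */

	/* The point of the patch: FOLL_FORCE never appears in the result. */
	assert(!(gup_flags & FOLL_FORCE));
	return gup_flags;
}
```

With this shape, a read-only long-term pin requests FOLL_LONGTERM alone,
which is what the updated commit message describes as sufficient.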



Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

2022-11-27 Thread Thomas Gleixner
Zhouyi,

On Sun, Nov 27 2022 at 10:45, Zhouyi Zhou wrote:
> On Sun, Nov 27, 2022 at 1:05 AM Thomas Gleixner  wrote:
>
> So, I should construct my patch as:
> We avoid ... by ...

Not "We avoid".

Avoid this behaviour by 

>> No. We are not exporting this just to make a bogus test case happy.
>>
>> Fix the torture code to handle -EBUSY correctly.
> I am going to do a study on this, for now, I do a grep in the kernel tree:
> find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> The result of the grep command shows that there are 268
> cpuhp_setup_state* cases.
> which may make our task more complicated.

Why? The whole point of this torture thing is to stress the
infrastructure.

There are quite a few reasons why a CPU hotplug or hot-unplug operation
can fail, and such a failure is not really a fatal problem.

So if a CPU hotplug operation fails, then why can't the torture test
just move on and validate that the system still behaves correctly?

That gives us more coverage than just testing the good case and giving
up when something unexpected happens.

I even argue that the torture test should inject random failures into
the hotplug state machine to achieve extended code coverage.

Thanks,

tglx
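
The suggestion above, to let the torture test record a failed hotplug
operation and carry on, could look roughly like the following userspace
sketch. Everything here is hypothetical: fake_cpu_offline() stands in for
the real offline operation, and the counters are only one possible way of
keeping score. It is not the torture code itself.

```c
#include <assert.h>
#include <errno.h>

/* Counters a hypothetical torture loop might keep. */
struct hotplug_stats {
	int attempts;
	int successes;
	int busy;		/* refused with -EBUSY, not treated as fatal */
	int other_errors;
};

/* Stand-in for the real offline operation: pretend CPU 0 refuses to go offline. */
static int fake_cpu_offline(int cpu)
{
	return cpu == 0 ? -EBUSY : 0;
}

/* Try to offline each CPU in turn; record failures and keep going. */
static void torture_offline_pass(int ncpus, struct hotplug_stats *st)
{
	for (int cpu = 0; cpu < ncpus; cpu++) {
		int ret = fake_cpu_offline(cpu);

		st->attempts++;
		if (ret == 0)
			st->successes++;
		else if (ret == -EBUSY)
			st->busy++;		/* expected refusal: move on */
		else
			st->other_errors++;
	}
	/* Holds as long as the caller passed in consistent (e.g. zeroed) counters. */
	assert(st->attempts == st->successes + st->busy + st->other_errors);
}
```

A test built this way keeps exercising the remaining CPUs after a refusal
instead of giving up at the first unexpected return value.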





[PATCH 00/17] powerpc: Remove STACK_FRAME_OVERHEAD

2022-11-27 Thread Nicholas Piggin
Since RFC:
- Fix a compile bug.
- Fix BookE KVM properly. Hopefully -- I don't have a BookE
  KVM environment to test. Can QEMU do it? Is it still tested?
- Drop the last two patches that changed the stack layout, they
  can be done later.
- Drop the load/store-multiple change to 32-bit.

Thanks,
Nick

Nicholas Piggin (17):
  KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support
  powerpc/64: Remove asm interrupt tracing call helpers
  powerpc/perf: callchain validate kernel stack pointer bounds
  powerpc: Rearrange copy_thread child stack creation
  powerpc/pseries: hvcall stack frame overhead
  powerpc: simplify ppc_save_regs
  powerpc: add definition for pt_regs offset within an interrupt frame
  powerpc: add a definition for the marker offset within the interrupt
frame
  powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset
  powerpc: add a define for the user interrupt frame size
  powerpc: add a define for the switch frame size and regs offset
  powerpc: copy_thread fill in interrupt frame marker and back chain
  powerpc: copy_thread add a back chain to the switch stack frame
  powerpc: split validate_sp into two functions
  powerpc: allow minimum sized kernel stack frames
  powerpc/64: ELFv2 use minimal stack frames in int and switch frame
sizes
  powerpc: remove STACK_FRAME_OVERHEAD

 arch/powerpc/include/asm/irqflags.h   | 58 -
 arch/powerpc/include/asm/kvm_ppc.h| 12 +++
 arch/powerpc/include/asm/processor.h  | 15 +++-
 arch/powerpc/include/asm/ptrace.h | 37 ++---
 arch/powerpc/kernel/asm-offsets.c |  9 +-
 arch/powerpc/kernel/entry_32.S| 14 ++--
 arch/powerpc/kernel/exceptions-64e.S  | 44 +-
 arch/powerpc/kernel/exceptions-64s.S  | 82 +--
 arch/powerpc/kernel/head_32.h |  4 +-
 arch/powerpc/kernel/head_40x.S|  2 +-
 arch/powerpc/kernel/head_44x.S|  6 +-
 arch/powerpc/kernel/head_64.S |  6 +-
 arch/powerpc/kernel/head_85xx.S   |  8 +-
 arch/powerpc/kernel/head_8xx.S|  2 +-
 arch/powerpc/kernel/head_book3s_32.S  |  4 +-
 arch/powerpc/kernel/head_booke.h  |  4 +-
 arch/powerpc/kernel/interrupt_64.S| 32 
 arch/powerpc/kernel/irq.c |  4 +-
 arch/powerpc/kernel/kgdb.c|  2 +-
 arch/powerpc/kernel/misc_32.S |  2 +-
 arch/powerpc/kernel/misc_64.S |  4 +-
 arch/powerpc/kernel/optprobes_head.S  |  4 +-
 arch/powerpc/kernel/ppc_save_regs.S   | 57 -
 arch/powerpc/kernel/process.c | 54 +++-
 arch/powerpc/kernel/smp.c |  2 +-
 arch/powerpc/kernel/stacktrace.c  | 10 +--
 arch/powerpc/kernel/tm.S  |  8 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S   |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  2 +-
 arch/powerpc/kvm/booke.c  |  3 +
 arch/powerpc/kvm/bookehv_interrupts.S |  9 --
 .../lib/test_emulate_step_exec_instr.S|  2 +-
 arch/powerpc/perf/callchain.c |  9 +-
 arch/powerpc/platforms/pseries/hvCall.S   | 38 +
 arch/powerpc/xmon/xmon.c  | 10 +--
 35 files changed, 259 insertions(+), 302 deletions(-)

-- 
2.37.2



[PATCH 01/17] KVM: PPC: Book3E: Fix CONFIG_TRACE_IRQFLAGS support

2022-11-27 Thread Nicholas Piggin
32-bit does not call trace_hardirqs_off() to match the trace_hardirqs_on() call in
kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice
in the trace, and the irqs-off region between guest exit and the host
enabling local irqs again is not properly traced.

64-bit code does call this, but from asm code where volatiles are live
and so incorrectly get clobbered.

Move the irq reconcile into C to fix both problems.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h| 12 
 arch/powerpc/kvm/booke.c  |  3 +++
 arch/powerpc/kvm/bookehv_interrupts.S |  9 -
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bfacf12784dd..eae9619b6190 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -1014,6 +1014,18 @@ static inline void kvmppc_fix_ee_before_entry(void)
 #endif
 }
 
+static inline void kvmppc_fix_ee_after_exit(void)
+{
+#ifdef CONFIG_PPC64
+   /* Only need to enable IRQs by hard enabling them after this */
+   local_paca->irq_happened = PACA_IRQ_HARD_DIS;
+   irq_soft_mask_set(IRQS_ALL_DISABLED);
+#endif
+
+   trace_hardirqs_off();
+}
+
+
 static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu *vcpu, int ra, int rb)
 {
ulong ea;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 7b4920e9fd26..0dce93ccaadf 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1015,6 +1015,9 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
u32 last_inst = KVM_INST_FETCH_FAILED;
enum emulation_result emulated = EMULATE_DONE;
 
+   /* Fix irq state (pairs with kvmppc_fix_ee_before_entry()) */
+   kvmppc_fix_ee_after_exit();
+
/* update before a new last_exit_type is rewritten */
kvmppc_update_timing_stats(vcpu);
 
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 8262c14fc9e6..b5fe6fb53c66 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -424,15 +424,6 @@ _GLOBAL(kvmppc_resume_host)
mtspr   SPRN_EPCR, r3
isync
 
-#ifdef CONFIG_64BIT
-   /*
-* We enter with interrupts disabled in hardware, but
-* we need to call RECONCILE_IRQ_STATE to ensure
-* that the software state is kept in sync.
-*/
-   RECONCILE_IRQ_STATE(r3,r5)
-#endif
-
/* Switch to kernel stack and jump to handler. */
mr  r3, r4
mr  r5, r14 /* intno */
-- 
2.37.2
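
The 32-bit problem described in the commit message, an "irqs on" event
with no matching "irqs off" after guest exit, can be modelled with a toy
tracer. This is only a sketch: struct irq_trace and the model_*()
functions are invented for illustration and are not the kernel's irqflags
tracing code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the irq-tracing state; not the kernel's implementation. */
struct irq_trace {
	bool enabled;		/* what the tracer currently believes */
	int redundant_on;	/* "on" events seen while already enabled */
};

static void model_trace_on(struct irq_trace *t)
{
	if (t->enabled)
		t->redundant_on++;
	t->enabled = true;
}

static void model_trace_off(struct irq_trace *t)
{
	t->enabled = false;
}

/*
 * One guest entry/exit cycle followed by the host enabling irqs again.
 * pair_exit selects whether the exit path traces irqs off.
 */
static void model_guest_cycle(struct irq_trace *t, bool pair_exit)
{
	model_trace_on(t);		/* before guest entry */
	if (pair_exit)
		model_trace_off(t);	/* after guest exit */
	model_trace_on(t);		/* host enables local irqs again */
	assert(t->enabled);
}
```

Without the paired "off" event the second "on" is redundant, which
corresponds to irqs appearing to be enabled twice in the trace.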



[PATCH 02/17] powerpc/64: Remove asm interrupt tracing call helpers

2022-11-27 Thread Nicholas Piggin
These are now unused. Remove.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/irqflags.h | 58 -
 1 file changed, 58 deletions(-)

diff --git a/arch/powerpc/include/asm/irqflags.h b/arch/powerpc/include/asm/irqflags.h
index 1a6c1ce17735..47d46712928a 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -11,64 +11,6 @@
  */
 #include 
 
-#else
-#ifdef CONFIG_TRACE_IRQFLAGS
-#ifdef CONFIG_IRQSOFF_TRACER
-/*
- * Since the ftrace irqsoff latency trace checks CALLER_ADDR1,
- * which is the stack frame here, we need to force a stack frame
- * in case we came from user space.
- */
-#define TRACE_WITH_FRAME_BUFFER(func)  \
-   mflr r0; \
-   stdu r1, -STACK_FRAME_OVERHEAD(r1);  \
-   std r0, 16(r1); \
-   stdu r1, -STACK_FRAME_OVERHEAD(r1);  \
-   bl func;\
-   ld  r1, 0(r1);  \
-   ld  r1, 0(r1);
-#else
-#define TRACE_WITH_FRAME_BUFFER(func)  \
-   bl func;
-#endif
-
-/*
- * These are calls to C code, so the caller must be prepared for volatiles to
- * be clobbered.
- */
-#define TRACE_ENABLE_INTS  TRACE_WITH_FRAME_BUFFER(trace_hardirqs_on)
-#define TRACE_DISABLE_INTS TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off)
-
-/*
- * This is used by assembly code to soft-disable interrupts first and
- * reconcile irq state.
- *
- * NB: This may call C code, so the caller must be prepared for volatiles to
- * be clobbered.
- */
-#define RECONCILE_IRQ_STATE(__rA, __rB)\
-   lbz __rA,PACAIRQSOFTMASK(r13);  \
-   lbz __rB,PACAIRQHAPPENED(r13);  \
-   andi.   __rA,__rA,IRQS_DISABLED;\
-   li  __rA,IRQS_DISABLED; \
-   ori __rB,__rB,PACA_IRQ_HARD_DIS;\
-   stb __rB,PACAIRQHAPPENED(r13);  \
-   bne 44f;\
-   stb __rA,PACAIRQSOFTMASK(r13);  \
-   TRACE_DISABLE_INTS; \
-44:
-
-#else
-#define TRACE_ENABLE_INTS
-#define TRACE_DISABLE_INTS
-
-#define RECONCILE_IRQ_STATE(__rA, __rB)\
-   lbz __rA,PACAIRQHAPPENED(r13);  \
-   li  __rB,IRQS_DISABLED; \
-   ori __rA,__rA,PACA_IRQ_HARD_DIS;\
-   stb __rB,PACAIRQSOFTMASK(r13);  \
-   stb __rA,PACAIRQHAPPENED(r13)
-#endif
 #endif
 
 #endif
-- 
2.37.2



[PATCH 03/17] powerpc/perf: callchain validate kernel stack pointer bounds

2022-11-27 Thread Nicholas Piggin
The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.

The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. Possibly this could cause a crash at least. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.

Signed-off-by: Nicholas Piggin 
---
Not sure if my attack scenario is actually valid, but I think there is
some concern here...

Thanks,
Nick

 arch/powerpc/perf/callchain.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 082f6d0308a4..8718289c051d 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -61,6 +61,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
next_sp = fp[0];
 
if (next_sp == sp + STACK_INT_FRAME_SIZE &&
+   validate_sp(sp, current, STACK_INT_FRAME_SIZE) &&
fp[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
/*
 * This looks like an interrupt frame for an
-- 
2.37.2
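
The gap this patch closes can be shown with a small bounds-check model.
The function sp_range_in_stack() below is a hypothetical stand-in for
validate_sp(), and the sizes used in the usage note are illustrative, not
the kernel's actual constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for validate_sp(): is [sp, sp + nbytes) inside the stack? */
static bool sp_range_in_stack(uintptr_t sp, uintptr_t stack_base,
			      uintptr_t stack_size, uintptr_t nbytes)
{
	if (nbytes > stack_size)
		return false;
	if (sp < stack_base)
		return false;
	/* Compare offsets rather than computing sp + nbytes, to avoid overflow. */
	return sp - stack_base <= stack_size - nbytes;
}
```

For example, with a 16 KiB stack, a frame pointer 112 bytes below the top
passes a check for a 112-byte minimum frame but fails a check for a
512-byte interrupt frame, so loading a marker or pt_regs from it would
read past the end of the stack unless the larger size is validated.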



[PATCH 04/17] powerpc: Rearrange copy_thread child stack creation

2022-11-27 Thread Nicholas Piggin
This makes it a bit clearer where the stack frame is created, and will
allow easier use of some of the stack offset constants in a later
change.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 67da147fe34d..acfa197fb2df 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1726,13 +1726,16 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 
klp_init_thread_info(p);
 
+   /* Create initial stack frame. */
+   sp -= (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD);
+   ((unsigned long *)sp)[0] = 0;
+
/* Copy registers */
-   sp -= sizeof(struct pt_regs);
-   childregs = (struct pt_regs *) sp;
+   childregs = (struct pt_regs *)(sp + STACK_FRAME_OVERHEAD);
if (unlikely(args->fn)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
-   childregs->gpr[1] = sp + sizeof(struct pt_regs);
+   childregs->gpr[1] = sp + (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD);
/* function */
if (args->fn)
childregs->gpr[14] = ppc_function_entry((void *)args->fn);
@@ -1767,7 +1770,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
f = ret_from_fork;
}
childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
-   sp -= STACK_FRAME_OVERHEAD;
 
/*
 * The way this works is that at some point in the future
@@ -1777,7 +1779,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 * do some house keeping and then return from the fork or clone
 * system call, using the stack frame created above.
 */
-   ((unsigned long *)sp)[0] = 0;
sp -= sizeof(struct pt_regs);
kregs = (struct pt_regs *) sp;
sp -= STACK_FRAME_OVERHEAD;
-- 
2.37.2
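
The rearrangement does not move anything; it only computes the same
addresses in a different order. A userspace sketch of the arithmetic,
using made-up sizes (MODEL_FRAME_OVERHEAD and MODEL_PT_REGS_SIZE are
assumptions, as are the struct and function names):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sizes only; the real values depend on the ABI and config. */
#define MODEL_FRAME_OVERHEAD 112u
#define MODEL_PT_REGS_SIZE   400u

struct child_layout {
	uintptr_t sp;		/* base of the initial frame (back chain slot) */
	uintptr_t childregs;	/* where the child's pt_regs starts */
	uintptr_t kthread_r1;	/* gpr[1] value stored for a kernel thread */
};

/* New ordering: carve out the whole frame first, then locate the regs. */
static struct child_layout model_child_frame(uintptr_t stack_top)
{
	struct child_layout l;

	l.sp = stack_top - (MODEL_PT_REGS_SIZE + MODEL_FRAME_OVERHEAD);
	l.childregs = l.sp + MODEL_FRAME_OVERHEAD;
	l.kthread_r1 = l.sp + (MODEL_PT_REGS_SIZE + MODEL_FRAME_OVERHEAD);

	assert(l.kthread_r1 == stack_top);
	return l;
}

/* Old ordering: regs first, frame overhead subtracted later. */
static struct child_layout model_child_frame_old(uintptr_t stack_top)
{
	struct child_layout l;
	uintptr_t sp = stack_top;

	sp -= MODEL_PT_REGS_SIZE;
	l.childregs = sp;
	l.kthread_r1 = sp + MODEL_PT_REGS_SIZE;
	sp -= MODEL_FRAME_OVERHEAD;
	l.sp = sp;
	return l;
}
```

Both orderings yield the same frame base, regs address and stored r1,
which is why the patch can be read as a pure refactoring.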



[PATCH 05/17] powerpc/pseries: hvcall stack frame overhead

2022-11-27 Thread Nicholas Piggin
This call may use the min size stack frame. The scratch space used is
in the caller's parameter area frame, not this function's frame.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/pseries/hvCall.S | 38 +
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hvCall.S b/arch/powerpc/platforms/pseries/hvCall.S
index 762eb15d3bd4..783c16ad648b 100644
--- a/arch/powerpc/platforms/pseries/hvCall.S
+++ b/arch/powerpc/platforms/pseries/hvCall.S
@@ -27,7 +27,9 @@ hcall_tracepoint_refcount:
 
 /*
  * precall must preserve all registers.  use unused STK_PARAM()
- * areas to save snapshots and opcode.
+ * areas to save snapshots and opcode. STK_PARAM() in the caller's
+ * frame will be available even on ELFv2 because these are all
+ * variadic functions.
  */
 #define HCALL_INST_PRECALL(FIRST_REG)  \
mflr r0; \
@@ -41,29 +43,29 @@ hcall_tracepoint_refcount:
std r10,STK_PARAM(R10)(r1); \
std r0,16(r1);  \
addi r4,r1,STK_PARAM(FIRST_REG); \
-   stdu r1,-STACK_FRAME_OVERHEAD(r1);   \
+   stdu r1,-STACK_FRAME_MIN_SIZE(r1);   \
bl  __trace_hcall_entry;\
-   ld  r3,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1);  \
-   ld  r4,STACK_FRAME_OVERHEAD+STK_PARAM(R4)(r1);  \
-   ld  r5,STACK_FRAME_OVERHEAD+STK_PARAM(R5)(r1);  \
-   ld  r6,STACK_FRAME_OVERHEAD+STK_PARAM(R6)(r1);  \
-   ld  r7,STACK_FRAME_OVERHEAD+STK_PARAM(R7)(r1);  \
-   ld  r8,STACK_FRAME_OVERHEAD+STK_PARAM(R8)(r1);  \
-   ld  r9,STACK_FRAME_OVERHEAD+STK_PARAM(R9)(r1);  \
-   ld  r10,STACK_FRAME_OVERHEAD+STK_PARAM(R10)(r1)
+   ld  r3,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1);  \
+   ld  r4,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1);  \
+   ld  r5,STACK_FRAME_MIN_SIZE+STK_PARAM(R5)(r1);  \
+   ld  r6,STACK_FRAME_MIN_SIZE+STK_PARAM(R6)(r1);  \
+   ld  r7,STACK_FRAME_MIN_SIZE+STK_PARAM(R7)(r1);  \
+   ld  r8,STACK_FRAME_MIN_SIZE+STK_PARAM(R8)(r1);  \
+   ld  r9,STACK_FRAME_MIN_SIZE+STK_PARAM(R9)(r1);  \
+   ld  r10,STACK_FRAME_MIN_SIZE+STK_PARAM(R10)(r1)
 
 /*
  * postcall is performed immediately before function return which
  * allows liberal use of volatile registers.
  */
 #define __HCALL_INST_POSTCALL  \
-   ld  r0,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1);  \
-   std r3,STACK_FRAME_OVERHEAD+STK_PARAM(R3)(r1);  \
+   ld  r0,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1);  \
+   std r3,STACK_FRAME_MIN_SIZE+STK_PARAM(R3)(r1);  \
mr  r4,r3;  \
mr  r3,r0;  \
bl  __trace_hcall_exit; \
-   ld  r0,STACK_FRAME_OVERHEAD+16(r1); \
-   addi r1,r1,STACK_FRAME_OVERHEAD; \
+   ld  r0,STACK_FRAME_MIN_SIZE+16(r1); \
+   addi r1,r1,STACK_FRAME_MIN_SIZE; \
ld  r3,STK_PARAM(R3)(r1);   \
mtlr r0
 
@@ -303,14 +305,14 @@ plpar_hcall9_trace:
mr  r7,r8
mr  r8,r9
mr  r9,r10
-   ld  r10,STACK_FRAME_OVERHEAD+STK_PARAM(R11)(r1)
-   ld  r11,STACK_FRAME_OVERHEAD+STK_PARAM(R12)(r1)
-   ld  r12,STACK_FRAME_OVERHEAD+STK_PARAM(R13)(r1)
+   ld  r10,STACK_FRAME_MIN_SIZE+STK_PARAM(R11)(r1)
+   ld  r11,STACK_FRAME_MIN_SIZE+STK_PARAM(R12)(r1)
+   ld  r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R13)(r1)
 
HVSC
 
mr  r0,r12
-   ld  r12,STACK_FRAME_OVERHEAD+STK_PARAM(R4)(r1)
+   ld  r12,STACK_FRAME_MIN_SIZE+STK_PARAM(R4)(r1)
std r4,0(r12)
std r5,8(r12)
std r6,16(r12)
-- 
2.37.2



[PATCH 06/17] powerpc: simplify ppc_save_regs

2022-11-27 Thread Nicholas Piggin
Adjust the pt_regs pointer so the interrupt frame offsets can be used
to save registers.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/ppc_save_regs.S | 57 -
 1 file changed, 15 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/kernel/ppc_save_regs.S b/arch/powerpc/kernel/ppc_save_regs.S
index 2d4d21bb46a9..6e86f3bf4673 100644
--- a/arch/powerpc/kernel/ppc_save_regs.S
+++ b/arch/powerpc/kernel/ppc_save_regs.S
@@ -21,60 +21,33 @@
  * different ABIs, though).
  */
 _GLOBAL(ppc_save_regs)
-   PPC_STL r0,0*SZL(r3)
+   /* This allows stack frame accessor macros and offsets to be used */
+   subi r3,r3,STACK_FRAME_OVERHEAD
+   PPC_STL r0,GPR0(r3)
 #ifdef CONFIG_PPC32
-   stmw r2, 2*SZL(r3)
+   stmw r2,GPR2(r3)
 #else
-   PPC_STL r2,2*SZL(r3)
-   PPC_STL r3,3*SZL(r3)
-   PPC_STL r4,4*SZL(r3)
-   PPC_STL r5,5*SZL(r3)
-   PPC_STL r6,6*SZL(r3)
-   PPC_STL r7,7*SZL(r3)
-   PPC_STL r8,8*SZL(r3)
-   PPC_STL r9,9*SZL(r3)
-   PPC_STL r10,10*SZL(r3)
-   PPC_STL r11,11*SZL(r3)
-   PPC_STL r12,12*SZL(r3)
-   PPC_STL r13,13*SZL(r3)
-   PPC_STL r14,14*SZL(r3)
-   PPC_STL r15,15*SZL(r3)
-   PPC_STL r16,16*SZL(r3)
-   PPC_STL r17,17*SZL(r3)
-   PPC_STL r18,18*SZL(r3)
-   PPC_STL r19,19*SZL(r3)
-   PPC_STL r20,20*SZL(r3)
-   PPC_STL r21,21*SZL(r3)
-   PPC_STL r22,22*SZL(r3)
-   PPC_STL r23,23*SZL(r3)
-   PPC_STL r24,24*SZL(r3)
-   PPC_STL r25,25*SZL(r3)
-   PPC_STL r26,26*SZL(r3)
-   PPC_STL r27,27*SZL(r3)
-   PPC_STL r28,28*SZL(r3)
-   PPC_STL r29,29*SZL(r3)
-   PPC_STL r30,30*SZL(r3)
-   PPC_STL r31,31*SZL(r3)
+   SAVE_GPRS(2, 31, r3)
lbz r0,PACAIRQSOFTMASK(r13)
-   PPC_STL r0,SOFTE-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,SOFTE(r3)
 #endif
/* go up one stack frame for SP */
PPC_LL  r4,0(r1)
-   PPC_STL r4,1*SZL(r3)
+   PPC_STL r4,GPR1(r3)
/* get caller's LR */
PPC_LL  r0,LRSAVE(r4)
-   PPC_STL r0,_LINK-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_LINK(r3)
mflr r0
-   PPC_STL r0,_NIP-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_NIP(r3)
mfmsr   r0
-   PPC_STL r0,_MSR-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_MSR(r3)
mfctr   r0
-   PPC_STL r0,_CTR-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_CTR(r3)
mfxer   r0
-   PPC_STL r0,_XER-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_XER(r3)
mfcr r0
-   PPC_STL r0,_CCR-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_CCR(r3)
li  r0,0
-   PPC_STL r0,_TRAP-STACK_FRAME_OVERHEAD(r3)
-   PPC_STL r0,ORIG_GPR3-STACK_FRAME_OVERHEAD(r3)
+   PPC_STL r0,_TRAP(r3)
+   PPC_STL r0,ORIG_GPR3(r3)
blr
-- 
2.37.2
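
The trick in this patch is that the interrupt-frame offsets (GPR0, _NIP
and so on) already include the frame overhead, so biasing the pt_regs
pointer down by the same amount makes them land on the right fields. A
userspace sketch of that arithmetic follows; struct model_pt_regs is a
cut-down stand-in, MODEL_FRAME_OVERHEAD is an illustrative value, and the
macro and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FRAME_OVERHEAD 112u	/* illustrative value */

/* Cut-down stand-in for struct pt_regs; the real layout has more fields. */
struct model_pt_regs {
	unsigned long gpr[32];
	unsigned long nip;
	unsigned long msr;
};

/* Frame-relative offset of a pt_regs field, in the style of STACK_PT_REGS_OFFSET(). */
#define MODEL_FRAME_OFFSET(field) \
	(MODEL_FRAME_OVERHEAD + offsetof(struct model_pt_regs, field))

/* Address reached by "OFFSET(r3)" after r3 has been biased down by the overhead. */
static uintptr_t model_biased_addr(uintptr_t regs, size_t frame_offset)
{
	uintptr_t biased = regs - MODEL_FRAME_OVERHEAD;	/* models the subi */

	return biased + frame_offset;
}
```

The overhead cancels out, so each frame-relative offset applied to the
biased pointer addresses the same field as a plain offsetof() applied to
the original pointer.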



[PATCH 07/17] powerpc: add definition for pt_regs offset within an interrupt frame

2022-11-27 Thread Nicholas Piggin
This is a common offset that currently uses the overloaded
STACK_FRAME_OVERHEAD constant. It's easier to read and more
flexible to use a specific regs offset for this.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h |  2 +
 arch/powerpc/kernel/asm-offsets.c |  7 +-
 arch/powerpc/kernel/entry_32.S|  6 +-
 arch/powerpc/kernel/exceptions-64e.S  | 42 +-
 arch/powerpc/kernel/exceptions-64s.S  | 80 +--
 arch/powerpc/kernel/head_32.h |  2 +-
 arch/powerpc/kernel/head_85xx.S   |  4 +-
 arch/powerpc/kernel/head_booke.h  |  2 +-
 arch/powerpc/kernel/interrupt_64.S| 22 ++---
 arch/powerpc/kernel/kgdb.c|  2 +-
 arch/powerpc/kernel/optprobes_head.S  |  4 +-
 arch/powerpc/kernel/ppc_save_regs.S   |  2 +-
 arch/powerpc/kernel/process.c |  4 +-
 arch/powerpc/kernel/tm.S  |  8 +-
 arch/powerpc/kernel/trace/ftrace_mprofile.S   |  2 +-
 .../lib/test_emulate_step_exec_instr.S|  2 +-
 arch/powerpc/perf/callchain.c |  2 +-
 arch/powerpc/xmon/xmon.c  |  7 +-
 18 files changed, 100 insertions(+), 100 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 2efec6d87049..a4ae67aa9b76 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -124,6 +124,7 @@ struct pt_regs
 #define STACK_FRAME_LR_SAVE2   /* Location of LR in stack frame */
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + \
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
+#define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_FRAME_MARKER 12
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
@@ -143,6 +144,7 @@ struct pt_regs
 #define STACK_FRAME_OVERHEAD   16  /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE1   /* Location of LR in stack frame */
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
+#define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_FRAME_MARKER 2
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4ce2a4aa3985..db5e66c1d031 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -72,7 +72,7 @@
 #endif
 
 #define STACK_PT_REGS_OFFSET(sym, val) \
-   DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val))
+   DEFINE(sym, STACK_INT_FRAME_REGS + offsetof(struct pt_regs, val))
 
 int main(void)
 {
@@ -167,9 +167,8 @@ int main(void)
OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
OFFSET(THREAD_CKFPSTATE, thread_struct, ckfp_state.fpr);
-   /* Local pt_regs on stack for Transactional Memory funcs. */
-   DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD +
-  sizeof(struct pt_regs) + 16);
+   /* Local pt_regs on stack in int frame form, plus 16 bytes for TM */
+   DEFINE(TM_FRAME_SIZE, STACK_INT_FRAME_SIZE + 16);
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
OFFSET(TI_LOCAL_FLAGS, thread_info, local_flags);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 3fc7c9886bb7..24c8d84a56c9 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -123,12 +123,12 @@ transfer_to_syscall:
kuep_lock
 
/* Calling convention has r3 = regs, r4 = orig r0 */
-   addi r3,r1,STACK_FRAME_OVERHEAD
+   addi r3,r1,STACK_INT_FRAME_REGS
mr  r4,r0
bl  system_call_exception
 
 ret_from_syscall:
-   addi r4,r1,STACK_FRAME_OVERHEAD
+   addi r4,r1,STACK_INT_FRAME_REGS
li  r5,0
bl  syscall_exit_prepare
 #ifdef CONFIG_PPC_47x
@@ -293,7 +293,7 @@ _ASM_NOKPROBE_SYMBOL(fast_exception_return)
.globl interrupt_return
 interrupt_return:
lwz r4,_MSR(r1)
-   addi r3,r1,STACK_FRAME_OVERHEAD
+   addi r3,r1,STACK_INT_FRAME_REGS
andi.   r0,r4,MSR_PR
beq .Lkernel_interrupt_return
bl  interrupt_exit_user_prepare
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 2f68fb2ee4fc..62033d022e0a 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -455,7 +455,7 @@ exc_##n##_bad_stack:    \
        EXCEPTION_COMMON(trapnum)   \
        ack(r8);\
        CHECK_NAPPING();\
-   addi r3,r1,STACK_FRAME_OVERHEAD; \
+   addi r3,r1,STACK_INT_FRAME_REGS;

[PATCH 08/17] powerpc: add a definition for the marker offset within the interrupt frame

2022-11-27 Thread Nicholas Piggin
Define a constant rather than open-code the offset for the
"regs" marker.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h   |  2 ++
 arch/powerpc/kernel/entry_32.S  |  2 +-
 arch/powerpc/kernel/exceptions-64e.S|  2 +-
 arch/powerpc/kernel/exceptions-64s.S|  2 +-
 arch/powerpc/kernel/head_32.h   |  2 +-
 arch/powerpc/kernel/head_booke.h|  2 +-
 arch/powerpc/kernel/interrupt_64.S  | 10 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 +-
 8 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index a4ae67aa9b76..8a9f4cf8c4c5 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -125,6 +125,7 @@ struct pt_regs
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + \
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
+#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
 #define STACK_FRAME_MARKER 12
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
@@ -145,6 +146,7 @@ struct pt_regs
 #define STACK_FRAME_LR_SAVE1   /* Location of LR in stack frame */
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
+#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8)
 #define STACK_FRAME_MARKER 2
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
 
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 24c8d84a56c9..2f61b7d3677c 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -114,7 +114,7 @@ transfer_to_syscall:
addi r12,r12,STACK_FRAME_REGS_MARKER@l
stw r9,_MSR(r1)
li  r2, INTERRUPT_SYSCALL
-   stw r12,8(r1)
+   stw r12,STACK_INT_FRAME_MARKER(r1)
stw r2,_TRAP(r1)
SAVE_GPR(0, r1)
SAVE_GPRS(3, 8, r1)
diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 62033d022e0a..b9cec22df9f9 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -391,7 +391,7 @@ exc_##n##_common:   \
std r10,_CCR(r1);   /* store orig CR in stackframe */   \
std r9,GPR1(r1);/* store stack frame back link */   \
std r11,SOFTE(r1);  /* and save it to stackframe */ \
-   std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */   \
+   std r12,STACK_INT_FRAME_MARKER(r1); /* mark the frame */\
std r3,_TRAP(r1);   /* set trap number  */  \
std r0,RESULT(r1);  /* clear regs->result */\
SAVE_NVGPRS(r1);
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 29b78536ca59..ac3b0580224e 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -591,7 +591,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
li  r10,0
LOAD_REG_IMMEDIATE(r11, STACK_FRAME_REGS_MARKER)
std r10,RESULT(r1)  /* clear regs->result   */
-   std r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame   */
+   std r11,STACK_INT_FRAME_MARKER(r1) /* mark the frame*/
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 117d25330e13..f8e2911478a7 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -112,7 +112,7 @@ _ASM_NOKPROBE_SYMBOL(\name\()_virt)
stw r0,GPR0(r1)
lis r10,STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
addi r10,r10,STACK_FRAME_REGS_MARKER@l
-   stw r10,8(r1)
+   stw r10,STACK_INT_FRAME_MARKER(r1)
li  r10, \trapno
stw r10,_TRAP(r1)
SAVE_GPRS(3, 8, r1)
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 3149ac20b18e..37d43c172676 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -84,7 +84,7 @@ END_BTB_FLUSH_SECTION
stw r0,GPR0(r1)
lis r10, STACK_FRAME_REGS_MARKER@ha /* exception frame marker */
addi r10, r10, STACK_FRAME_REGS_MARKER@l
-   stw r10, 8(r1)
+   stw r10, STACK_INT_FRAME_MARKER(r1)
li  r10, \trapno
stw r10,_TRAP(r1)
SAVE_GPRS(3, 8, r1)
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 49d585eae7c8..321992c1c9f9 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -77,11 +77,11 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
std r11,_TRAP(r1)
std r12,_CCR(r1)
std r3,ORIG_GPR3(r1)
+   LOAD_REG_IMMEDIATE(r

[PATCH 09/17] powerpc: Rename STACK_FRAME_MARKER and derive it from frame offset

2022-11-27 Thread Nicholas Piggin
This is a count of longs from the stack pointer to the regs marker.
Rename it to make it more distinct from the other byte offsets. It
can be derived from the byte offset definitions just added.

Signed-off-by: Nicholas Piggin 
---
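As a worked check of the derivation (a sketch, not part of the patch):
the helper below, whose name is invented for illustration, divides the
marker's byte offset by the word size. It assumes a 64-bit
STACK_FRAME_OVERHEAD of 112, a value not shown in these hunks; the 32-bit
value of 16 appears in the ptrace.h hunks earlier in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Index, in stack words, of the marker given its byte offset and the word size. */
static size_t model_marker_longs(size_t marker_byte_offset, size_t long_size)
{
	/* The byte offset must be a whole number of words. */
	assert(long_size != 0 && marker_byte_offset % long_size == 0);
	return marker_byte_offset / long_size;
}
```

With those inputs, (112 - 16) / 8 gives 12 and (16 - 8) / 4 gives 2,
matching the two hard-coded STACK_FRAME_MARKER values that the patch
removes.
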
 arch/powerpc/include/asm/ptrace.h | 4 ++--
 arch/powerpc/kernel/process.c | 2 +-
 arch/powerpc/kernel/stacktrace.c  | 2 +-
 arch/powerpc/perf/callchain.c | 2 +-
 arch/powerpc/xmon/xmon.c  | 3 +--
 5 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 8a9f4cf8c4c5..fdd50648df56 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -126,7 +126,6 @@ struct pt_regs
 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
-#define STACK_FRAME_MARKER 12
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
 #define STACK_FRAME_MIN_SIZE   32
@@ -147,7 +146,6 @@ struct pt_regs
 #define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8)
-#define STACK_FRAME_MARKER 2
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
 
 /* Size of stack frame allocated when calling signal handler. */
@@ -155,6 +153,8 @@ struct pt_regs
 
 #endif /* __powerpc64__ */
 
+#define STACK_INT_FRAME_MARKER_LONGS   (STACK_INT_FRAME_MARKER/sizeof(long))
+
 #ifndef __ASSEMBLY__
 #include 
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e7010f71de24..b0a9e5eeec4c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2234,7 +2234,7 @@ void __no_sanitize_address show_stack(struct task_struct 
*tsk,
 * We look for the "regs" marker in the current frame.
 */
if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
-   && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+   && stack[STACK_INT_FRAME_MARKER_LONGS] == 
STACK_FRAME_REGS_MARKER) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_INT_FRAME_REGS);
 
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index a2443d61728e..7efa0ec9dd77 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -136,7 +136,7 @@ int __no_sanitize_address 
arch_stack_walk_reliable(stack_trace_consume_fn consum
 
/* Mark stacktraces with exception frames as unreliable. */
if (sp <= stack_end - STACK_INT_FRAME_SIZE &&
-   stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+   stack[STACK_INT_FRAME_MARKER_LONGS] == 
STACK_FRAME_REGS_MARKER) {
return -EINVAL;
}
 
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 9e254aed1f61..b01497ed5173 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -62,7 +62,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, 
struct pt_regs *re
 
if (next_sp == sp + STACK_INT_FRAME_SIZE &&
validate_sp(sp, current, STACK_INT_FRAME_SIZE) &&
-   fp[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+   fp[STACK_INT_FRAME_MARKER_LONGS] == 
STACK_FRAME_REGS_MARKER) {
/*
 * This looks like an interrupt frame for an
 * interrupt that occurred in the kernel
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index e403f14eb6eb..bbdaa42ba4ba 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1720,7 +1720,6 @@ static void get_function_bounds(unsigned long pc, 
unsigned long *startp,
 }
 
 #define LRSAVE_OFFSET  (STACK_FRAME_LR_SAVE * sizeof(unsigned long))
-#define MARKER_OFFSET  (STACK_FRAME_MARKER * sizeof(unsigned long))
 
 static void xmon_show_stack(unsigned long sp, unsigned long lr,
unsigned long pc)
@@ -1783,7 +1782,7 @@ static void xmon_show_stack(unsigned long sp, unsigned 
long lr,
 
/* Look for "regs" marker to see if this is
   an exception frame. */
-   if (mread(sp + MARKER_OFFSET, &marker, sizeof(unsigned long))
+   if (mread(sp + STACK_INT_FRAME_MARKER, &marker, sizeof(unsigned long))
&& marker == STACK_FRAME_REGS_MARKER) {
if (mread(sp + STACK_INT_FRAME_REGS, &regs, sizeof(regs)) != sizeof(regs)) {
printf("Couldn't read registers at %lx\n",
-- 
2.37.2



[PATCH 10/17] powerpc: add a define for the user interrupt frame size

2022-11-27 Thread Nicholas Piggin
The user interrupt frame is a different size from the kernel frame, so
give it its own name.
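
A minimal sketch of the relationship the new defines express, assuming the
64-bit values shown in the diff below. The two function names are
hypothetical, and pt_regs_size stands in for sizeof(struct pt_regs), which is
not reproduced here.

```c
#include <stddef.h>

#define KERNEL_REDZONE_SIZE	288	/* 64-bit value from asm/ptrace.h */
#define STACK_FRAME_OVERHEAD	112	/* 64-bit value at this point in the series */

/* Mirrors STACK_USER_INT_FRAME_SIZE: pt_regs plus the frame overhead. */
size_t user_int_frame_size(size_t pt_regs_size)
{
	return pt_regs_size + STACK_FRAME_OVERHEAD;
}

/* Mirrors STACK_INT_FRAME_SIZE: the user frame plus the kernel red zone. */
size_t kernel_int_frame_size(size_t pt_regs_size)
{
	return KERNEL_REDZONE_SIZE + user_int_frame_size(pt_regs_size);
}
```

On 32-bit the red zone is 0, so the two sizes coincide there.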

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h | 6 +++---
 arch/powerpc/kernel/process.c | 6 +++---
 arch/powerpc/kernel/stacktrace.c  | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index fdd50648df56..705ce26ae887 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -122,8 +122,7 @@ struct pt_regs
 
 #define STACK_FRAME_OVERHEAD   112 /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE 2   /* Location of LR in stack frame */
-#define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + \
-STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
+#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
 
@@ -143,7 +142,7 @@ struct pt_regs
 #define KERNEL_REDZONE_SIZE 0
 #define STACK_FRAME_OVERHEAD   16  /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE 1   /* Location of LR in stack frame */
-#define STACK_INT_FRAME_SIZE   (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
+#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8)
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
@@ -153,6 +152,7 @@ struct pt_regs
 
 #endif /* __powerpc64__ */
 
+#define STACK_INT_FRAME_SIZE   (KERNEL_REDZONE_SIZE + STACK_USER_INT_FRAME_SIZE)
 #define STACK_INT_FRAME_MARKER_LONGS   (STACK_INT_FRAME_MARKER/sizeof(long))
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index b0a9e5eeec4c..d6daf0d073b3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1727,15 +1727,15 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
klp_init_thread_info(p);
 
/* Create initial stack frame. */
-   sp -= (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD);
+   sp -= STACK_USER_INT_FRAME_SIZE;
((unsigned long *)sp)[0] = 0;
 
/* Copy registers */
-   childregs = (struct pt_regs *)(sp + STACK_FRAME_OVERHEAD);
+   childregs = (struct pt_regs *)(sp + STACK_INT_FRAME_REGS);
if (unlikely(args->fn)) {
/* kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
-   childregs->gpr[1] = sp + (sizeof(struct pt_regs) + 
STACK_FRAME_OVERHEAD);
+   childregs->gpr[1] = sp + STACK_USER_INT_FRAME_SIZE;
/* function */
if (args->fn)
childregs->gpr[14] = ppc_function_entry((void 
*)args->fn);
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 7efa0ec9dd77..453ac317a6cf 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -77,7 +77,7 @@ int __no_sanitize_address 
arch_stack_walk_reliable(stack_trace_consume_fn consum
/*
 * For user tasks, this is the SP value loaded on
 * kernel entry, see "PACAKSAVE(r13)" in _switch() and
-* system_call_common()/EXCEPTION_PROLOG_COMMON().
+* system_call_common().
 *
 * Likewise for non-swapper kernel threads,
 * this also happens to be the top of the stack
@@ -88,7 +88,7 @@ int __no_sanitize_address 
arch_stack_walk_reliable(stack_trace_consume_fn consum
 * an unreliable stack trace until it's been
 * _switch()'ed to for the first time.
 */
-   stack_end -= STACK_FRAME_OVERHEAD + sizeof(struct pt_regs);
+   stack_end -= STACK_USER_INT_FRAME_SIZE;
} else {
/*
 * idle tasks have a custom stack layout,
-- 
2.37.2



[PATCH 11/17] powerpc: add a define for the switch frame size and regs offset

2022-11-27 Thread Nicholas Piggin
This is open-coded in process.c, ppc32 uses a different define with the
same value, and the C definition is named differently, which makes it an
extra indirection to grep for.
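
To make the equivalence concrete, here is a sketch comparing the open-coded
computation in copy_thread() with the form using the new defines. The struct
and function names are hypothetical, the 64-bit overhead of 112 is assumed,
and pt_regs_size stands in for sizeof(struct pt_regs).

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FRAME_OVERHEAD	112	/* 64-bit value at this point in the series */

/* Hypothetical holder for the two values copy_thread() computes. */
struct switch_frame {
	uintptr_t ksp;		/* becomes p->thread.ksp */
	uintptr_t kregs;	/* address of the pt_regs inside the frame */
};

/* Old open-coded form: step over pt_regs, then over the frame overhead. */
struct switch_frame old_layout(uintptr_t sp, size_t pt_regs_size)
{
	struct switch_frame f;

	sp -= pt_regs_size;
	f.kregs = sp;
	sp -= STACK_FRAME_OVERHEAD;
	f.ksp = sp;
	return f;
}

/* New form: one subtraction of the frame size, regs at a fixed offset. */
struct switch_frame new_layout(uintptr_t sp, size_t pt_regs_size)
{
	struct switch_frame f;

	sp -= pt_regs_size + STACK_FRAME_OVERHEAD;	/* STACK_SWITCH_FRAME_SIZE */
	f.kregs = sp + STACK_FRAME_OVERHEAD;		/* STACK_SWITCH_FRAME_REGS */
	f.ksp = sp;
	return f;
}
```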

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h |  6 --
 arch/powerpc/kernel/asm-offsets.c |  2 +-
 arch/powerpc/kernel/entry_32.S|  6 +++---
 arch/powerpc/kernel/process.c | 12 
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 705ce26ae887..412ef0749775 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -97,8 +97,6 @@ struct pt_regs
 #endif
 
 
-#define STACK_FRAME_WITH_PT_REGS (STACK_FRAME_OVERHEAD + sizeof(struct pt_regs))
-
 // Always displays as "REGS" in memory dumps
 #ifdef CONFIG_CPU_BIG_ENDIAN
 #define STACK_FRAME_REGS_MARKER ASM_CONST(0x52454753)
@@ -125,6 +123,8 @@ struct pt_regs
 #define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
+#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
+#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
 #define STACK_FRAME_MIN_SIZE   32
@@ -146,6 +146,8 @@ struct pt_regs
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8)
 #define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
+#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
+#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD
 
 /* Size of stack frame allocated when calling signal handler. */
 #define __SIGNAL_FRAMESIZE 64
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index db5e66c1d031..f7dff906c24b 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -260,7 +260,7 @@ int main(void)
 
/* Interrupt register frame */
DEFINE(INT_FRAME_SIZE, STACK_INT_FRAME_SIZE);
-   DEFINE(SWITCH_FRAME_SIZE, STACK_FRAME_WITH_PT_REGS);
+   DEFINE(SWITCH_FRAME_SIZE, STACK_SWITCH_FRAME_SIZE);
STACK_PT_REGS_OFFSET(GPR0, gpr[0]);
STACK_PT_REGS_OFFSET(GPR1, gpr[1]);
STACK_PT_REGS_OFFSET(GPR2, gpr[2]);
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 2f61b7d3677c..6e99ec10be89 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -215,9 +215,9 @@ ret_from_kernel_thread:
  * in arch/ppc/kernel/process.c
  */
 _GLOBAL(_switch)
-   stwu r1,-INT_FRAME_SIZE(r1)
+   stwu r1,-SWITCH_FRAME_SIZE(r1)
mflr r0
-   stw r0,INT_FRAME_SIZE+4(r1)
+   stw r0,SWITCH_FRAME_SIZE+4(r1)
/* r3-r12 are caller saved -- Cort */
SAVE_NVGPRS(r1)
stw r0,_NIP(r1) /* Return to switch caller */
@@ -248,7 +248,7 @@ _GLOBAL(_switch)
 
lwz r4,_NIP(r1) /* Return to _switch caller in new task */
mtlr r4
-   addi r1,r1,INT_FRAME_SIZE
+   addi r1,r1,SWITCH_FRAME_SIZE
blr
 
.globl  fast_exception_return
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index d6daf0d073b3..a097879b0474 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1779,10 +1779,10 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
 * do some house keeping and then return from the fork or clone
 * system call, using the stack frame created above.
 */
-   sp -= sizeof(struct pt_regs);
-   kregs = (struct pt_regs *) sp;
-   sp -= STACK_FRAME_OVERHEAD;
+   sp -= STACK_SWITCH_FRAME_SIZE;
+   kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
p->thread.ksp = sp;
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
for (i = 0; i < nr_wp_slots(); i++)
p->thread.ptrace_bps[i] = NULL;
@@ -2232,8 +2232,12 @@ void __no_sanitize_address show_stack(struct task_struct 
*tsk,
/*
 * See if this is an exception frame.
 * We look for the "regs" marker in the current frame.
+*
+* STACK_SWITCH_FRAME_SIZE being the smallest frame that
+* could hold a pt_regs, if that does not fit then it can't
+* have regs.
 */
-   if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
+   if (validate_sp(sp, tsk, STACK_SWITCH_FRAME_SIZE)
&& stack[STACK_INT_FRAME_MARKER_LONGS] == 
STACK_FRAME_REGS_MARKER) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_INT_FRAME_REGS);
-- 
2.37.2



[PATCH 12/17] powerpc: copy_thread fill in interrupt frame marker and back chain

2022-11-27 Thread Nicholas Piggin
Backtraces will not recognise the fork system call interrupt without
the regs marker. And regular interrupt entry from userspace creates
the back chain to the user stack, so do this for the initial fork
frame too, to be consistent.
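
A small user-space sketch of what the patch fills in, operating on a byte
buffer instead of a real kernel stack. The function names are hypothetical,
the marker offset assumes the 64-bit layout (112 - 16), and the marker value
is the big-endian "REGS" constant quoted earlier in the series.

```c
#include <string.h>

#define FRAME_MARKER_OFFSET	96		/* STACK_INT_FRAME_MARKER, 64-bit: 112 - 16 */
#define REGS_MARKER		0x52454753UL	/* "REGS" marker value from asm/ptrace.h */

/*
 * Fill in the head of a simulated fork frame. user_sp is the child's user
 * stack pointer, or 0 for a kernel thread (which gets a NULL back chain).
 */
void init_fork_frame(unsigned char *frame, unsigned long user_sp)
{
	unsigned long marker = REGS_MARKER;

	memcpy(frame, &user_sp, sizeof(user_sp));	/* back chain at offset 0 */
	memcpy(frame + FRAME_MARKER_OFFSET, &marker, sizeof(marker));
}

/* What a backtrace does to decide whether the frame holds a pt_regs. */
int frame_has_regs(const unsigned char *frame)
{
	unsigned long marker;

	memcpy(&marker, frame + FRAME_MARKER_OFFSET, sizeof(marker));
	return marker == REGS_MARKER;
}
```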

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a097879b0474..27956831fa5d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1728,12 +1728,13 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
 
/* Create initial stack frame. */
sp -= STACK_USER_INT_FRAME_SIZE;
-   ((unsigned long *)sp)[0] = 0;
+   *(unsigned long *)(sp + STACK_INT_FRAME_MARKER) = 
STACK_FRAME_REGS_MARKER;
 
/* Copy registers */
childregs = (struct pt_regs *)(sp + STACK_INT_FRAME_REGS);
if (unlikely(args->fn)) {
/* kernel thread */
+   ((unsigned long *)sp)[0] = 0;
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gpr[1] = sp + STACK_USER_INT_FRAME_SIZE;
/* function */
@@ -1753,6 +1754,7 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
*childregs = *regs;
if (usp)
childregs->gpr[1] = usp;
+   ((unsigned long *)sp)[0] = childregs->gpr[1];
p->thread.regs = childregs;
/* 64s sets this in ret_from_fork */
if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
-- 
2.37.2



[PATCH 13/17] powerpc: copy_thread add a back chain to the switch stack frame

2022-11-27 Thread Nicholas Piggin
Stack unwinders need LR and the back chain as a minimum. The switch
stack uses regs->nip for its return pointer rather than lrsave, so
that was not set in the fork frame, and neither was the back chain.
This change sets those fields in the stack.
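
For readers unfamiliar with why both fields matter, a deliberately simplified
unwinder over a simulated stack is sketched here. It is not the kernel's
show_stack(): there is no stack pointer validation, and walk_back_chain() is a
hypothetical name. Word 0 of each frame is the back chain, and the word at
LR_SAVE_WORD is the saved return address.

```c
#include <stddef.h>
#include <stdint.h>

#define LR_SAVE_WORD	2	/* STACK_FRAME_LR_SAVE on 64-bit */

/*
 * Follow back chains starting at sp, recording the saved LR of each frame.
 * A NULL back chain ends the walk. Returns the number of entries stored.
 */
size_t walk_back_chain(const uintptr_t *sp, uintptr_t *ips, size_t max)
{
	size_t n = 0;

	while (sp && n < max) {
		ips[n++] = sp[LR_SAVE_WORD];
		sp = (const uintptr_t *)sp[0];
	}
	return n;
}
```

If the back chain of the first frame is left unset (zero), the walk ends
after that frame, which is roughly the situation the "before" trace shows.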

With this and the previous change, a stack trace in the switch or
interrupt stack goes from looking like this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 3 PID: 90 Comm: systemd Not tainted
  NIP:  c0011060 LR: c0010f68 CTR: 7fff
  [ ... regs ... ]
  NIP [c0011060] _switch+0x160/0x17c
  LR [c0010f68] _switch+0x68/0x17c
  Call Trace:

To this:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  CPU: 0 PID: 93 Comm: systemd Not tainted
  NIP:  c0011060 LR: c0010f68 CTR: 7fff
  [ ... regs ... ]
  NIP [c0011060] _switch+0x160/0x17c
  LR [c0010f68] _switch+0x68/0x17c
  Call Trace:
  [c5a93e10] [c000cdbc] ret_from_fork_scv+0x0/0x54
  --- interrupt: 3000 at 0x7fffa72f56d8
  NIP:  7fffa72f56d8 LR:  CTR: 
  [ ... regs ... ]
  NIP [7fffa72f56d8] 0x7fffa72f56d8
  LR [] 0x0
  --- interrupt: 3000

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 27956831fa5d..6cb3982a11ef 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1781,7 +1781,9 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
 * do some house keeping and then return from the fork or clone
 * system call, using the stack frame created above.
 */
+   ((unsigned long *)sp)[STACK_FRAME_LR_SAVE] = (unsigned long)f;
sp -= STACK_SWITCH_FRAME_SIZE;
+   ((unsigned long *)sp)[0] = sp + STACK_SWITCH_FRAME_SIZE;
kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
p->thread.ksp = sp;
 
-- 
2.37.2



[PATCH 14/17] powerpc: split validate_sp into two functions

2022-11-27 Thread Nicholas Piggin
Most callers just want to validate an arbitrary kernel stack pointer,
some need a particular size. Make the size case the exceptional one
with an extra function.
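
The shape of the split can be sketched as follows. This is a simplified
stand-in, not the kernel code: it checks only the task stack (the IRQ and
emergency stack checks are omitted), takes the stack base as a plain argument
instead of a task_struct, and assumes a hypothetical 16KB THREAD_SIZE. The
_sketch suffixes mark the names as illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FRAME_OVERHEAD	112	/* 64-bit value at this point in the series */
#define THREAD_SIZE_SKETCH	16384	/* assumption: the real value is config dependent */

/* Size-specific check: does a frame of nbytes starting at sp fit in the stack? */
int validate_sp_size_sketch(uintptr_t sp, uintptr_t stack_base, size_t nbytes)
{
	if (nbytes > THREAD_SIZE_SKETCH)
		return 0;
	return sp >= stack_base &&
	       sp <= stack_base + THREAD_SIZE_SKETCH - nbytes;
}

/* Common case: callers only need room for a minimum-sized frame. */
int validate_sp_sketch(uintptr_t sp, uintptr_t stack_base)
{
	return validate_sp_size_sketch(sp, stack_base, STACK_FRAME_OVERHEAD);
}
```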

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/processor.h | 15 ---
 arch/powerpc/kernel/process.c| 23 ++-
 arch/powerpc/kernel/stacktrace.c |  2 +-
 arch/powerpc/perf/callchain.c|  6 +++---
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 631802999d59..e96c9b8c2a60 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -374,9 +374,18 @@ static inline unsigned long __pack_fe01(unsigned int 
fpmode)
 
 #endif
 
-/* Check that a certain kernel stack pointer is valid in task_struct p */
-int validate_sp(unsigned long sp, struct task_struct *p,
-   unsigned long nbytes);
+/*
+ * Check that a certain kernel stack pointer is a valid (minimum sized)
+ * stack frame in task_struct p.
+ */
+int validate_sp(unsigned long sp, struct task_struct *p);
+
+/*
+ * validate the stack frame of a particular minimum size, used for when we are
+ * looking at a certain object in the stack beyond the minimum.
+ */
+int validate_sp_size(unsigned long sp, struct task_struct *p,
+unsigned long nbytes);
 
 /*
  * Prefetch macros.
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 6cb3982a11ef..6820d90744c3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2128,9 +2128,12 @@ static inline int valid_emergency_stack(unsigned long 
sp, struct task_struct *p,
return 0;
 }
 
-
-int validate_sp(unsigned long sp, struct task_struct *p,
-  unsigned long nbytes)
+/*
+ * validate the stack frame of a particular minimum size, used for when we are
+ * looking at a certain object in the stack beyond the minimum.
+ */
+int validate_sp_size(unsigned long sp, struct task_struct *p,
+unsigned long nbytes)
 {
unsigned long stack_page = (unsigned long)task_stack_page(p);
 
@@ -2146,7 +2149,10 @@ int validate_sp(unsigned long sp, struct task_struct *p,
return valid_emergency_stack(sp, p, nbytes);
 }
 
-EXPORT_SYMBOL(validate_sp);
+int validate_sp(unsigned long sp, struct task_struct *p)
+{
+   return validate_sp_size(sp, p, STACK_FRAME_OVERHEAD);
+}
 
 static unsigned long ___get_wchan(struct task_struct *p)
 {
@@ -2154,13 +2160,12 @@ static unsigned long ___get_wchan(struct task_struct *p)
int count = 0;
 
sp = p->thread.ksp;
-   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD))
+   if (!validate_sp(sp, p))
return 0;
 
do {
sp = READ_ONCE_NOCHECK(*(unsigned long *)sp);
-   if (!validate_sp(sp, p, STACK_FRAME_OVERHEAD) ||
-   task_is_running(p))
+   if (!validate_sp(sp, p) || task_is_running(p))
return 0;
if (count > 0) {
ip = READ_ONCE_NOCHECK(((unsigned long 
*)sp)[STACK_FRAME_LR_SAVE]);
@@ -2214,7 +2219,7 @@ void __no_sanitize_address show_stack(struct task_struct 
*tsk,
lr = 0;
printk("%sCall Trace:\n", loglvl);
do {
-   if (!validate_sp(sp, tsk, STACK_FRAME_OVERHEAD))
+   if (!validate_sp(sp, tsk))
break;
 
stack = (unsigned long *) sp;
@@ -2241,7 +2246,7 @@ void __no_sanitize_address show_stack(struct task_struct 
*tsk,
 * could hold a pt_regs, if that does not fit then it can't
 * have regs.
 */
-   if (validate_sp(sp, tsk, STACK_SWITCH_FRAME_SIZE)
+   if (validate_sp_size(sp, tsk, STACK_SWITCH_FRAME_SIZE)
&& stack[STACK_INT_FRAME_MARKER_LONGS] == 
STACK_FRAME_REGS_MARKER) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_INT_FRAME_REGS);
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 453ac317a6cf..1dbbf30f265e 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -43,7 +43,7 @@ void __no_sanitize_address 
arch_stack_walk(stack_trace_consume_fn consume_entry,
unsigned long *stack = (unsigned long *) sp;
unsigned long newsp, ip;
 
-   if (!validate_sp(sp, task, STACK_FRAME_OVERHEAD))
+   if (!validate_sp(sp, task))
return;
 
newsp = stack[0];
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index b01497ed5173..6b4434dd0ff3 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -27,7 +27,7 @@ static int valid_next_sp(unsigned long sp, unsigned long 
prev_sp)
 {
if (sp & 0xf)

[PATCH 15/17] powerpc: allow minimum sized kernel stack frames

2022-11-27 Thread Nicholas Piggin
This affects only 64-bit ELFv2 kernels, and reduces the minimum
asm-created stack frame size from 112 to 32 bytes on those kernels.
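
As a quick check of the arithmetic, the initial stack pointer set up by the
head_*.S code below can be modelled as follows. initial_sp() is a hypothetical
helper, and the 16KB THREAD_SIZE is an assumption chosen only for the example.

```c
#include <stdint.h>

#define THREAD_SIZE_SKETCH	16384	/* assumption: the real value is config dependent */
#define ELFV1_FRAME_MIN		112	/* previous minimum frame size */
#define ELFV2_FRAME_MIN		32	/* STACK_FRAME_MIN_SIZE on 64-bit ELFv2 */

/* Top of the thread stack minus one minimum-sized frame. */
uintptr_t initial_sp(uintptr_t stack_base, unsigned int frame_min)
{
	return stack_base + THREAD_SIZE_SKETCH - frame_min;
}
```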

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_40x.S   | 2 +-
 arch/powerpc/kernel/head_44x.S   | 6 +++---
 arch/powerpc/kernel/head_64.S| 6 +++---
 arch/powerpc/kernel/head_85xx.S  | 4 ++--
 arch/powerpc/kernel/head_8xx.S   | 2 +-
 arch/powerpc/kernel/head_book3s_32.S | 4 ++--
 arch/powerpc/kernel/irq.c| 4 ++--
 arch/powerpc/kernel/misc_32.S| 2 +-
 arch/powerpc/kernel/misc_64.S| 4 ++--
 arch/powerpc/kernel/process.c| 2 +-
 arch/powerpc/kernel/smp.c| 2 +-
 arch/powerpc/kernel/stacktrace.c | 2 +-
 12 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
index 088f500896c7..918547b93b5e 100644
--- a/arch/powerpc/kernel/head_40x.S
+++ b/arch/powerpc/kernel/head_40x.S
@@ -602,7 +602,7 @@ start_here:
lis r1,init_thread_union@ha
addi r1,r1,init_thread_union@l
li  r0,0
-   stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+   stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1)
 
bl  early_init  /* We have to do this with MMU on */
 
diff --git a/arch/powerpc/kernel/head_44x.S b/arch/powerpc/kernel/head_44x.S
index f15cb9fdb692..63a85c16fef4 100644
--- a/arch/powerpc/kernel/head_44x.S
+++ b/arch/powerpc/kernel/head_44x.S
@@ -109,7 +109,7 @@ _GLOBAL(_start);
lis r1,init_thread_union@h
ori r1,r1,init_thread_union@l
li  r0,0
-   stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+   stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1)
 
bl  early_init
 
@@ -1012,7 +1012,7 @@ _GLOBAL(start_secondary_47x)
 */
lis r1,temp_boot_stack@h
ori r1,r1,temp_boot_stack@l
-   addi r1,r1,1024-STACK_FRAME_OVERHEAD
+   addi r1,r1,1024-STACK_FRAME_MIN_SIZE
li  r0,0
stw r0,0(r1)
bl  mmu_init_secondary
@@ -1025,7 +1025,7 @@ _GLOBAL(start_secondary_47x)
lwz r1,TASK_STACK(r2)
 
/* Current stack pointer */
-   addi r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
+   addi r1,r1,THREAD_SIZE-STACK_FRAME_MIN_SIZE
li  r0,0
stw r0,0(r1)
 
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index dedcc6fe2263..b513d13bf79e 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -424,7 +424,7 @@ generic_secondary_common_init:
 
/* Create a temp kernel stack for use before relocation is on.  */
ld  r1,PACAEMERGSP(r13)
-   subi r1,r1,STACK_FRAME_OVERHEAD
+   subi r1,r1,STACK_FRAME_MIN_SIZE
 
/* See if we need to call a cpu state restore handler */
LOAD_REG_ADDR(r23, cur_cpu_spec)
@@ -780,7 +780,7 @@ _GLOBAL(pmac_secondary_start)
 
/* Create a temp kernel stack for use before relocation is on.  */
ld  r1,PACAEMERGSP(r13)
-   subi r1,r1,STACK_FRAME_OVERHEAD
+   subi r1,r1,STACK_FRAME_MIN_SIZE
 
b   __secondary_start
 
@@ -958,7 +958,7 @@ start_here_multiplatform:
LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
add r1,r3,r1
li  r0,0
-   stdu r0,-STACK_FRAME_OVERHEAD(r1)
+   stdu r0,-STACK_FRAME_MIN_SIZE(r1)
 
/*
 * Do very early kernel initializations, including initial hash table
diff --git a/arch/powerpc/kernel/head_85xx.S b/arch/powerpc/kernel/head_85xx.S
index 24f39abf81df..d9bd377dec91 100644
--- a/arch/powerpc/kernel/head_85xx.S
+++ b/arch/powerpc/kernel/head_85xx.S
@@ -229,7 +229,7 @@ set_ivor:
lis r1,init_thread_union@h
ori r1,r1,init_thread_union@l
li  r0,0
-   stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+   stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1)
 
 #ifdef CONFIG_SMP
stw r24, TASK_CPU(r2)
@@ -1044,7 +1044,7 @@ __secondary_start:
lwz r1,TASK_STACK(r2)
 
/* stack */
-   addi r1,r1,THREAD_SIZE-STACK_FRAME_OVERHEAD
+   addi r1,r1,THREAD_SIZE-STACK_FRAME_MIN_SIZE
li  r0,0
stw r0,0(r1)
 
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0b05f2be66b9..cf546d0e5c40 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -537,7 +537,7 @@ start_here:
ori r0, r0, STACK_END_MAGIC@l
stw r0, 0(r1)
li  r0,0
-   stwu r0,THREAD_SIZE-STACK_FRAME_OVERHEAD(r1)
+   stwu r0,THREAD_SIZE-STACK_FRAME_MIN_SIZE(r1)
 
lis r6, swapper_pg_dir@ha
tophys(r6,r6)
diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 519b60695167..40854d092dd3 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -840,7 +840,7 @@ __secondary_start:
 

[PATCH 16/17] powerpc/64: ELFv2 use minimal stack frames in int and switch frame sizes

2022-11-27 Thread Nicholas Piggin
Adjust the ELFv2 interrupt and switch frames to the minimum C ABI size,
plus pt_regs, plus 16 bytes for the aligned regs marker for the int
frame (and the switch frame needs to match that because it uses the same
regs offset as the int frame).
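
The resulting ELFv2 offsets can be spelled out as below. The macro values are
taken from the diff in this patch; the two size functions are hypothetical
wrappers, with pt_regs_size standing in for sizeof(struct pt_regs).

```c
#include <assert.h>
#include <stddef.h>

/* ELFv2 values from this patch. */
#define STACK_FRAME_MIN_SIZE	32
#define STACK_INT_FRAME_MARKER	STACK_FRAME_MIN_SIZE
#define STACK_INT_FRAME_REGS	(STACK_FRAME_MIN_SIZE + 16)
#define STACK_SWITCH_FRAME_REGS	(STACK_FRAME_MIN_SIZE + 16)

/* The marker sits in a 16-byte slot directly below pt_regs. */
static_assert(STACK_INT_FRAME_REGS - STACK_INT_FRAME_MARKER == 16, "marker slot");
/* The switch frame uses the same regs offset as the interrupt frame. */
static_assert(STACK_SWITCH_FRAME_REGS == STACK_INT_FRAME_REGS, "shared regs offset");

/* Mirrors STACK_USER_INT_FRAME_SIZE for ELFv2. */
size_t user_int_frame_size(size_t pt_regs_size)
{
	return pt_regs_size + STACK_FRAME_MIN_SIZE + 16;
}

/* Mirrors STACK_SWITCH_FRAME_SIZE for ELFv2. */
size_t switch_frame_size(size_t pt_regs_size)
{
	return pt_regs_size + STACK_FRAME_MIN_SIZE + 16;
}
```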

This saves 80 bytes of kernel stack per interrupt. It's the principle of
getting our accounting right that's more important than the practical
saving.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index 412ef0749775..a9dfce62a5eb 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -120,16 +119,26 @@ struct pt_regs
 
 #define STACK_FRAME_OVERHEAD   112 /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE 2   /* Location of LR in stack frame */
+
+#ifdef CONFIG_PPC64_ELF_ABI_V2
+#define STACK_FRAME_MIN_SIZE   32
+#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE + 16)
+#define STACK_INT_FRAME_REGS   (STACK_FRAME_MIN_SIZE + 16)
+#define STACK_INT_FRAME_MARKER STACK_FRAME_MIN_SIZE
+#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE + 16)
+#define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)
+#else
+/*
+ * The ELFv1 ABI specifies 48 bytes plus a minimum 64 byte parameter save
+ * area. This parameter area is not used by calls to C from interrupt entry,
+ * so the second from last one of those is used for the frame marker.
+ */
+#define STACK_FRAME_MIN_SIZE   112
 #define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
 #define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
 #define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
 #define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD
-
-#ifdef CONFIG_PPC64_ELF_ABI_V2
-#define STACK_FRAME_MIN_SIZE   32
-#else
-#define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
 #endif
 
 /* Size of dummy stack frame allocated when calling signal handler. */
-- 
2.37.2



[PATCH 17/17] powerpc: remove STACK_FRAME_OVERHEAD

2022-11-27 Thread Nicholas Piggin
This is equal to STACK_FRAME_MIN_SIZE on 32-bit and 64-bit ELFv1, and no
longer used in 64-bit ELFv2, so replace STACK_FRAME_OVERHEAD occurrences
with STACK_FRAME_MIN_SIZE.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ptrace.h | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index a9dfce62a5eb..a53c580388e2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -117,7 +117,6 @@ struct pt_regs
 #define USER_REDZONE_SIZE  512
 #define KERNEL_REDZONE_SIZE 288
 
-#define STACK_FRAME_OVERHEAD   112 /* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE 2   /* Location of LR in stack frame */
 
 #ifdef CONFIG_PPC64_ELF_ABI_V2
@@ -134,11 +133,11 @@ struct pt_regs
  * so the second from last one of those is used for the frame marker.
  */
 #define STACK_FRAME_MIN_SIZE   112
-#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
-#define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
-#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 16)
-#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
-#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD
+#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE)
+#define STACK_INT_FRAME_REGS   STACK_FRAME_MIN_SIZE
+#define STACK_INT_FRAME_MARKER (STACK_FRAME_MIN_SIZE - 16)
+#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE)
+#define STACK_SWITCH_FRAME_REGS STACK_FRAME_MIN_SIZE
 #endif
 
 /* Size of dummy stack frame allocated when calling signal handler. */
@@ -149,14 +148,13 @@ struct pt_regs
 
 #define USER_REDZONE_SIZE  0
 #define KERNEL_REDZONE_SIZE 0
-#define STACK_FRAME_OVERHEAD   16  /* size of minimum stack frame */
+#define STACK_FRAME_MIN_SIZE   16
 #define STACK_FRAME_LR_SAVE 1   /* Location of LR in stack frame */
-#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
-#define STACK_INT_FRAME_REGS   STACK_FRAME_OVERHEAD
-#define STACK_INT_FRAME_MARKER (STACK_FRAME_OVERHEAD - 8)
-#define STACK_FRAME_MIN_SIZE   STACK_FRAME_OVERHEAD
-#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_OVERHEAD)
-#define STACK_SWITCH_FRAME_REGS STACK_FRAME_OVERHEAD
+#define STACK_USER_INT_FRAME_SIZE  (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE)
+#define STACK_INT_FRAME_REGS   STACK_FRAME_MIN_SIZE
+#define STACK_INT_FRAME_MARKER (STACK_FRAME_MIN_SIZE - 8)
+#define STACK_SWITCH_FRAME_SIZE (sizeof(struct pt_regs) + STACK_FRAME_MIN_SIZE)
+#define STACK_SWITCH_FRAME_REGS STACK_FRAME_MIN_SIZE
 
 /* Size of stack frame allocated when calling signal handler. */
 #define __SIGNAL_FRAMESIZE 64
-- 
2.37.2



Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

2022-11-27 Thread Paul E. McKenney
On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:

[ . . . ]

> >> No. We are not exporting this just to make a bogus test case happy.
> >>
> >> Fix the torture code to handle -EBUSY correctly.
> > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > The result of the grep command shows that there are 268
> > cpuhp_setup_state* cases.
> > which may make our task more complicated.
> 
> Why? The whole point of this torture thing is to stress the
> infrastructure.

Indeed.

> There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> can fail, which is not a fatal problem, really.
> 
> So if a CPU hotplug operation fails, then why can't the torture test
> just move on and validate that the system still behaves correctly?
> 
> That gives us more coverage than just testing the good case and giving
> up when something unexpected happens.

Agreed, with access to a function like the tick_nohz_full_timekeeper()
suggested earlier in this email thread, then yes, it would make sense to
try to offline the CPU anyway, then forgive the failure in cases where
the CPU matches that indicated by tick_nohz_full_timekeeper().
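
The "forgive the failure" policy described above might look roughly like this.
It is only a sketch: offline_result_ok() is a hypothetical helper, the
timekeeper CPU is passed in as a plain argument because
tick_nohz_full_timekeeper() is merely a suggestion from this thread, and the
real torture code would also need to log and count such cases.

```c
#include <errno.h>

/*
 * Decide whether the result of an offline attempt should be treated as
 * acceptable by the torture test. ret is the value returned by the offline
 * operation, cpu is the CPU that was targeted, and timekeeper_cpu is the CPU
 * currently holding the timekeeping duty (assumed to be known to the caller).
 */
int offline_result_ok(int ret, int cpu, int timekeeper_cpu)
{
	if (ret == 0)
		return 1;	/* offline succeeded */
	if (ret == -EBUSY && cpu == timekeeper_cpu)
		return 1;	/* expected refusal, forgive it */
	return 0;		/* anything else is still reported */
}
```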

> I even argue that the torture test should inject random failures into
> the hotplug state machine to achieve extended code coverage.

I could imagine torture_onoff() telling various CPU-hotplug notifiers
to refuse the transition using some TBD interface.  That would better
test the CPU-hotplug common code's ability to deal with failures.

Or did you have something else/additional in mind?

Thanx, Paul


[powerpc:next-test] BUILD SUCCESS 4eef1c9ccd19132c34fd55e79b104ace87ff09d4

2022-11-27 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 4eef1c9ccd19132c34fd55e79b104ace87ff09d4  selftests/powerpc: Account for offline cpus in perf-hwbreak test

elapsed time: 743m

configs tested: 58
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arc       defconfig
alpha     defconfig
um        i386_defconfig
x86_64    randconfig-a011
x86_64    randconfig-a004
x86_64    rhel-8.3-kselftests
x86_64    randconfig-a002
x86_64    rhel-8.3-func
um        x86_64_defconfig
x86_64    rhel-8.3
x86_64    randconfig-a013
powerpc   allnoconfig
x86_64    randconfig-a006
x86_64    defconfig
i386      defconfig
i386      randconfig-a014
sh        allmodconfig
arc       randconfig-r043-20221127
i386      randconfig-a001
riscv     randconfig-r042-20221127
x86_64    randconfig-a015
x86_64    allyesconfig
i386      randconfig-a003
i386      randconfig-a005
s390      randconfig-r044-20221127
ia64      allmodconfig
x86_64    rhel-8.3-kvm
i386      randconfig-a012
s390      defconfig
i386      randconfig-a016
s390      allmodconfig
i386      allyesconfig
x86_64    rhel-8.3-syz
m68k      allyesconfig
x86_64    rhel-8.3-kunit
s390      allyesconfig
mips      allyesconfig
powerpc   allmodconfig
alpha     allyesconfig
arc       allyesconfig
m68k      allmodconfig
arm       defconfig
arm       allyesconfig
arm64     allyesconfig

clang tested configs:
hexagon   randconfig-r045-20221127
hexagon   randconfig-r041-20221127
x86_64    randconfig-a012
x86_64    randconfig-a005
x86_64    randconfig-a001
x86_64    randconfig-a016
x86_64    randconfig-a003
i386      randconfig-a013
i386      randconfig-a011
i386      randconfig-a004
i386      randconfig-a002
i386      randconfig-a006
x86_64    randconfig-a014
i386      randconfig-a015

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


[powerpc:topic/ppc-kvm] BUILD SUCCESS a96b20758b23be7e9f693218908228d6100c3c26

2022-11-27 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
topic/ppc-kvm
branch HEAD: a96b20758b23be7e9f693218908228d6100c3c26  KVM: PPC: Book3S HV: Use 
the bitmap API to allocate bitmaps

elapsed time: 743m

configs tested: 2
configs skipped: 100

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
powerpc   allnoconfig
powerpc   allmodconfig

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


[powerpc:fixes-test] BUILD SUCCESS 2e7ec190a0e38aaa8a6d87fd5f804ec07947febc

2022-11-27 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
fixes-test
branch HEAD: 2e7ec190a0e38aaa8a6d87fd5f804ec07947febc  powerpc/64s: Add missing 
declaration for machine_check_early_boot()

elapsed time: 746m

configs tested: 58
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
x86_64    rhel-8.3-func
x86_64    rhel-8.3-kselftests
x86_64    rhel-8.3-kunit
x86_64    rhel-8.3-kvm
x86_64    randconfig-a013
x86_64    rhel-8.3-syz
x86_64    randconfig-a011
x86_64    randconfig-a015
um        i386_defconfig
um        x86_64_defconfig
arc       defconfig
i386      randconfig-a001
s390      allmodconfig
x86_64    defconfig
alpha     defconfig
i386      randconfig-a003
sh        allmodconfig
i386      defconfig
powerpc   allmodconfig
x86_64    randconfig-a006
i386      randconfig-a016
mips      allyesconfig
ia64      allmodconfig
s390      defconfig
i386      randconfig-a005
i386      randconfig-a012
s390      allyesconfig
x86_64    rhel-8.3
x86_64    allyesconfig
i386      randconfig-a014
arc       randconfig-r043-20221127
m68k      allmodconfig
powerpc   allnoconfig
arc       allyesconfig
i386      allyesconfig
x86_64    randconfig-a002
alpha     allyesconfig
riscv     randconfig-r042-20221127
m68k      allyesconfig
x86_64    randconfig-a004
s390      randconfig-r044-20221127
arm       defconfig
arm       allyesconfig
arm64     allyesconfig

clang tested configs:
x86_64    randconfig-a014
x86_64    randconfig-a012
x86_64    randconfig-a016
hexagon   randconfig-r045-20221127
hexagon   randconfig-r041-20221127
x86_64    randconfig-a005
i386      randconfig-a002
i386      randconfig-a015
i386      randconfig-a006
i386      randconfig-a013
i386      randconfig-a004
i386      randconfig-a011
x86_64    randconfig-a001
x86_64    randconfig-a003

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


Re: [PATCH 2/3] powerpc/book3e: remove #include

2022-11-27 Thread Michael Ellerman
Thomas Weißschuh  writes:
> On 2022-11-26 07:36+, Christophe Leroy wrote:
>> Le 26/11/2022 à 06:10, Thomas Weißschuh a écrit :
>>> Commit 7ad4bd887d27 ("powerpc/book3e: get rid of #include 
>>> ")
>>> removed the usage of the define UTS_VERSION but forgot to drop the
>>> include.
>> 
>> What about:
>> arch/powerpc/platforms/52xx/efika.c
>> arch/powerpc/platforms/amigaone/setup.c
>> arch/powerpc/platforms/chrp/setup.c
>> arch/powerpc/platforms/powermac/bootx_init.c
>> 
>> I believe you can do a lot more than what you did in your series.
>
> The commit messages are wrong.
> They should have said UTS_RELEASE instead of UTS_VERSION.
>
> Could the maintainers fix this up when applying?
> I also changed it locally so it will be fixed for v2.

I'll take this patch, but not the others.

cheers


Re: [PATCH] powerpc/64s: Add missing declaration for machine_check_early_boot()

2022-11-27 Thread Nicholas Piggin
On Fri Nov 25, 2022 at 11:25 PM AEST, Michael Ellerman wrote:
> There's no declaration for machine_check_early_boot(), which leads to a
> build failure with W=1. Add one.
>
> Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler")
> Signed-off-by: Michael Ellerman 

Acked-by: Nicholas Piggin 

> ---
>  arch/powerpc/include/asm/interrupt.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/include/asm/interrupt.h 
> b/arch/powerpc/include/asm/interrupt.h
> index 4745bb9998bd..6d8492b6e2b8 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -602,6 +602,7 @@ ##func(struct pt_regs *regs)
>  /* kernel/traps.c */
>  DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception);
>  #ifdef CONFIG_PPC_BOOK3S_64
> +DECLARE_INTERRUPT_HANDLER_RAW(machine_check_early_boot);
>  DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception_async);
>  #endif
>  DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
> -- 
> 2.38.1



Re: [PATCH v2 1/4] powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig

2022-11-27 Thread Nicholas Piggin
On Tue Nov 8, 2022 at 12:28 AM AEST, Christophe Leroy wrote:
>
>
> Le 07/11/2022 à 04:31, Rohan McLure a écrit :
> > Add Kconfig option for enabling clearing of registers on arrival in an
> > interrupt handler. This reduces the speculation influence of registers
> > on kernel internals. The option will be consumed by 64-bit systems that
> > feature speculation and wish to implement this mitigation.
> > 
> > This patch only introduces the Kconfig option, no actual mitigations.
>
> If that has to do with speculation, do we need a new Kconfig option ? 
> Can't we use CONFIG_PPC_BARRIER_NOSPEC for that ?

NOSPEC barrier adds runtime-patchable hardware barrier and that config
is a build implementation detail. Also that spec barrier is for bounds
checks speculation that is easy to get the kernel to do something like
speculatively branch to arbitrary address.
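
To make "bounds checks speculation" concrete, here is a userspace sketch of the branchless index-clamping pattern, in the spirit of the kernel's array_index_nospec(). The names `index_mask` and `load_clamped` are hypothetical. This is only an illustration of the arithmetic: it is not the hardware barrier that CONFIG_PPC_BARRIER_NOSPEC patches in, and a userspace compiler is free to transform it, so it makes no promise about speculation by itself.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return all ones when index < size and zero otherwise, without a
 * conditional branch. Assumes size is non-zero and at most LONG_MAX.
 * Relies on unsigned-to-signed conversion and arithmetic right shift
 * of a negative long behaving as they do on common compilers (both
 * are implementation-defined in ISO C).
 */
unsigned long index_mask(unsigned long index, unsigned long size)
{
	assert(size != 0 && size <= (unsigned long)LONG_MAX);

	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Load table[index] if it is in bounds, else return -1. Even if the
 * bounds check were mispredicted, the masked index could only be 0.
 */
int load_clamped(const int *table, unsigned long size, unsigned long index)
{
	if (index >= size)
		return -1;

	index &= index_mask(index, size);
	return table[index];
}
```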

Interrupt/syscall register sanitization is more handwavy. It could be
a bandaid for cases where the above speculation barrier was missed
for example. But at some point, at least for syscalls, registers have to
contain some values influenced by userspace so if we were paranoid
we would have to put barriers before every branch while any registers
contained a value from userspace.

A security option menu might be a good idea though. There's some other
build time options like rop protection that we might want to add.

Thanks,
Nick



Re: [PATCH v2 2/4] powerpc/64s: Clear gprs on interrupt routine entry on Book3S

2022-11-27 Thread Nicholas Piggin
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote:
> Zero user state in gprs (assign to zero) to reduce the influence of user
> registers on speculation within kernel syscall handlers. Clears occur
> at the very beginning of the sc and scv 0 interrupt handlers, with
> restores occurring following the execution of the syscall handler.
>
> Zero GPRS r0, r2-r11, r14-r31, on entry into the kernel for all
> other interrupt sources. The remaining gprs are overwritten by
> entry macros to interrupt handlers, irrespective of whether or not a
> given handler consumes these register values.
>
> Prior to this commit, r14-r31 are restored on a per-interrupt basis at
> exit, but now they are always restored on 64bit Book3S. Remove explicit
> REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear
> user registers on interrupt, and continue to depend on the return value
> of interrupt_exit_user_prepare to determine whether or not to restore
> non-volatiles.
>
> The mmap_bench benchmark in selftests should rapidly invoke pagefaults.
> See ~0.8% performance regression with this mitigation, but this
> indicates the worst-case performance due to heavier-weight interrupt
> handlers. This mitigation is able to be enabled/disabled through
> CONFIG_INTERRUPT_SANITIZE_REGISTERS.

I think it looks good. You could put those macros into a .h file shared
by exceptions-64s.S and interrupt_64.S. Also interrupt_64.S could use
the HANDLER_RESTORE_NVGPRS macro to kill a few ifdefs I think? The
IMSR_R12 change *could* be done in a separate patch, if you're doing
another spin... sorry for the late feedback.

Reviewed-by: Nicholas Piggin 

>
> Signed-off-by: Rohan McLure 
> ---
> Resubmitting patches as their own series after v6 partially merged:
> Link: 
> https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
>
> v2: REST_NVGPRS should be conditional on mitigation in scv handler. Fix
> improper multi-line preprocessor macro in interrupt_64.S
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 47 +-
>  arch/powerpc/kernel/interrupt_64.S   | 36 
>  2 files changed, 74 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index 651c36b056bd..0605018762d1 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -21,6 +21,19 @@
>  #include 
>  #include 
>  
> +/*
> + * macros for handling user register sanitisation
> + */
> +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
> +#define SANITIZE_ZEROIZE_NVGPRS()ZEROIZE_NVGPRS()
> +#define SANITIZE_RESTORE_NVGPRS()REST_NVGPRS(r1)
> +#define HANDLER_RESTORE_NVGPRS()
> +#else
> +#define SANITIZE_ZEROIZE_NVGPRS()
> +#define SANITIZE_RESTORE_NVGPRS()
> +#define HANDLER_RESTORE_NVGPRS() REST_NVGPRS(r1)
> +#endif /* CONFIG_INTERRUPT_SANITIZE_REGISTERS */
> +
>  /*
>   * Following are fixed section helper macros.
>   *
> @@ -111,6 +124,7 @@ name:
>  #define ISTACK   .L_ISTACK_\name\()  /* Set regular kernel 
> stack */
>  #define __ISTACK(name)   .L_ISTACK_ ## name
>  #define IKUAP.L_IKUAP_\name\()   /* Do KUAP lock */
> +#define IMSR_R12 .L_IMSR_R12_\name\()/* Assumes MSR saved to r12 */
>  
>  #define INT_DEFINE_BEGIN(n)  \
>  .macro int_define_ ## n name
> @@ -176,6 +190,9 @@ do_define_int n
>   .ifndef IKUAP
>   IKUAP=1
>   .endif
> + .ifndef IMSR_R12
> + IMSR_R12=0
> + .endif
>  .endm
>  
>  /*
> @@ -502,6 +519,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real, text)
>   std r10,0(r1)   /* make stack chain pointer */
>   std r0,GPR0(r1) /* save r0 in stackframe*/
>   std r10,GPR1(r1)/* save r1 in stackframe*/
> + ZEROIZE_GPR(0)
>  
>   /* Mark our [H]SRRs valid for return */
>   li  r10,1
> @@ -544,8 +562,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>   std r9,GPR11(r1)
>   std r10,GPR12(r1)
>   std r11,GPR13(r1)
> + .if !IMSR_R12
> + ZEROIZE_GPRS(9, 12)
> + .else
> + ZEROIZE_GPRS(9, 11)
> + .endif
>  
>   SAVE_NVGPRS(r1)
> + SANITIZE_ZEROIZE_NVGPRS()
>  
>   .if IDAR
>   .if IISIDE
> @@ -577,8 +601,8 @@ BEGIN_FTR_SECTION
>  END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>   ld  r10,IAREA+EX_CTR(r13)
>   std r10,_CTR(r1)
> - std r2,GPR2(r1) /* save r2 in stackframe*/
> - SAVE_GPRS(3, 8, r1) /* save r3 - r8 in stackframe   */
> + SAVE_GPRS(2, 8, r1) /* save r2 - r8 in stackframe   */
> + ZEROIZE_GPRS(2, 8)
>   mflrr9  /* Get LR, later save to stack  */
>   LOAD_PACA_TOC() /* get kernel TOC into r2   */
>   std r9,_LINK(r1)
> @@ -696,6 +720,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>

Re: [PATCH v2 3/4] powerpc/64e: Clear gprs on interrupt routine entry on Book3E

2022-11-27 Thread Nicholas Piggin
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote:
> Zero GPRS r14-r31 on entry into the kernel for interrupt sources to
> limit influence of user-space values in potential speculation gadgets.
> Prior to this commit, all other GPRS are reassigned during the common
> prologue to interrupt handlers and so need not be zeroised explicitly.
>
> This may be done safely, without loss of register state prior to the
> interrupt, as the common prologue saves the initial values of
> non-volatiles, which are unconditionally restored in interrupt_64.S.

In the case of ret_from_crit_except and ret_from_mc_except, it looks
like those are restored by ret_from_level_except, so that's fine.
And fast_interrupt_return you added NVGPRS restore in the previous
patch too.

Maybe actually you could move that interrupt_64.h code that applies to
both 64s and 64e in patch 1. So then the 64s/e enablement patches are
independent and apply to exactly that subarch.

But code-wise I think this looks good.

Reviewed-by: Nicholas Piggin 

> Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.
>
> Signed-off-by: Rohan McLure 
> ---
> Resubmitting patches as their own series after v6 partially merged:
> Link: 
> https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
> ---
>  arch/powerpc/kernel/exceptions-64e.S | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64e.S 
> b/arch/powerpc/kernel/exceptions-64e.S
> index 2f68fb2ee4fc..91d8019123c2 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -358,6 +358,11 @@ ret_from_mc_except:
>   std r14,PACA_EXMC+EX_R14(r13);  \
>   std r15,PACA_EXMC+EX_R15(r13)
>  
> +#ifdef CONFIG_INTERRUPT_SANITIZE_REGISTERS
> +#define SANITIZE_ZEROIZE_NVGPRS()ZEROIZE_NVGPRS()
> +#else
> +#define SANITIZE_ZEROIZE_NVGPRS()
> +#endif

Could possibly share these macros.

>  
>  /* Core exception code for all exceptions except TLB misses. */
>  #define EXCEPTION_COMMON_LVL(n, scratch, excf)   
> \
> @@ -394,7 +399,8 @@ exc_##n##_common: 
> \
>   std r12,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame */   \
>   std r3,_TRAP(r1);   /* set trap number  */  \
>   std r0,RESULT(r1);  /* clear regs->result */\
> - SAVE_NVGPRS(r1);
> + SAVE_NVGPRS(r1);\
> + SANITIZE_ZEROIZE_NVGPRS();  /* minimise speculation influence */
>  
>  #define EXCEPTION_COMMON(n) \
>   EXCEPTION_COMMON_LVL(n, SPRN_SPRG_GEN_SCRATCH, PACA_EXGEN)
> -- 
> 2.34.1



Re: [PATCH v2 4/4] powerpc/64s: Sanitise user registers on interrupt in pseries

2022-11-27 Thread Nicholas Piggin
On Mon Nov 7, 2022 at 1:32 PM AEST, Rohan McLure wrote:
> Cause pseries platforms to default to zeroising all potentially user-defined
> registers when entering the kernel by means of any interrupt source,
> reducing user-influence of the kernel and the likelihood of producing
> speculation gadgets.

For POWERNV as well?

Thanks,
Nick

>
> Signed-off-by: Rohan McLure 
> ---
> Resubmitting patches as their own series after v6 partially merged:
> Link: 
> https://lore.kernel.org/all/166488988686.779920.13794870102696416283.b4...@ellerman.id.au/t/
> ---
>  arch/powerpc/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 9d3d20c6f365..2eb328b25e49 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -532,7 +532,7 @@ config HOTPLUG_CPU
>  config INTERRUPT_SANITIZE_REGISTERS
>   bool "Clear gprs on interrupt arrival"
>   depends on PPC64 && ARCH_HAS_SYSCALL_WRAPPER
> - default PPC_BOOK3E_64
> + default PPC_BOOK3E_64 || PPC_PSERIES
>   help
> Reduce the influence of user register state on interrupt handlers and
> syscalls through clearing user state from registers before handling
> -- 
> 2.34.1



Re: [PATCH 03/13] powerpc/rtas: avoid device tree lookups in rtas_os_term()

2022-11-27 Thread Nicholas Piggin
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> rtas_os_term() is called during panic. Its behavior depends on a
> couple of conditions in the /rtas node of the device tree, the
> traversal of which entails locking and local IRQ state changes. If the
> kernel panics while devtree_lock is held, rtas_os_term() as currently
> written could hang.

Nice.

>
> Instead of discovering the relevant characteristics at panic time,
> cache them in file-static variables at boot. Note the lookup for
> "ibm,extended-os-term" is converted to of_property_read_bool() since
> it is a boolean property, not a RTAS function token.

Small nit, but you could do that at the query site unless you
were going to start using ibm,os-term without the extended
capability.
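
A minimal userspace sketch of that nit, assuming the token is only ever wanted when the extended capability is present: the boolean check is folded into the boot-time lookup, so the panic path is left with a single comparison. All names here (`model_rtas_node`, `model_rtas_init`, `model_os_term_usable`) are hypothetical stand-ins, not the functions in rtas.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_UNKNOWN_SERVICE	(-1)	/* stand-in for RTAS_UNKNOWN_SERVICE */

/* Hypothetical snapshot of what the /rtas node advertises. */
struct model_rtas_node {
	bool has_extended_os_term;	/* boolean property present? */
	int os_term_token;		/* token, or MODEL_UNKNOWN_SERVICE */
};

/* Cached at boot; the panic path reads only this variable. */
static int cached_os_term_token = MODEL_UNKNOWN_SERVICE;

/* Boot-time step: keep the token only if the extended property exists. */
void model_rtas_init(const struct model_rtas_node *node)
{
	assert(node != NULL);

	cached_os_term_token = MODEL_UNKNOWN_SERVICE;
	if (node->has_extended_os_term)
		cached_os_term_token = node->os_term_token;
}

/* Panic-time step: no lookups and no locks, just one comparison. */
bool model_os_term_usable(void)
{
	return cached_os_term_token != MODEL_UNKNOWN_SERVICE;
}
```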

Reviewed-by: Nicholas Piggin 

>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index c12dd5ed5e00..81e4996012b7 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -947,6 +947,8 @@ void __noreturn rtas_halt(void)
>  
>  /* Must be in the RMO region, so we place it here */
>  static char rtas_os_term_buf[2048];
> +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
> +static bool ibm_extended_os_term;
>  
>  void rtas_os_term(char *str)
>  {
> @@ -958,14 +960,13 @@ void rtas_os_term(char *str)
>* this property may terminate the partition which we want to avoid
>* since it interferes with panic_timeout.
>*/
> - if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term") ||
> - RTAS_UNKNOWN_SERVICE == rtas_token("ibm,extended-os-term"))
> + if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE || !ibm_extended_os_term)
>   return;
>  
>   snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
>  
>   do {
> - status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL,
> + status = rtas_call(ibm_os_term_token, 1, 1, NULL,
>  __pa(rtas_os_term_buf));
>   } while (rtas_busy_delay(status));
>  
> @@ -1335,6 +1336,13 @@ void __init rtas_initialize(void)
>   no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry);
>   rtas.entry = no_entry ? rtas.base : entry;
>  
> + /*
> +  * Discover these now to avoid device tree lookups in the
> +  * panic path.
> +  */
> + ibm_os_term_token = rtas_token("ibm,os-term");
> + ibm_extended_os_term = of_property_read_bool(rtas.dev, 
> "ibm,extended-os-term");
> +
>   /* If RTAS was found, allocate the RMO buffer for it and look for
>* the stop-self token if any
>*/
> -- 
> 2.37.1



Re: [PATCH 04/13] powerpc/rtas: avoid scheduling in rtas_os_term()

2022-11-27 Thread Nicholas Piggin
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> It's unsafe to use rtas_busy_delay() to handle a busy status from
> the ibm,os-term RTAS function in rtas_os_term():
>
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> BUG: sleeping function called from invalid context at 
> arch/powerpc/kernel/rtas.c:618
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 2, expected: 0
> CPU: 7 PID: 1 Comm: swapper/0 Tainted: G  D
> 6.0.0-rc5-02182-gf8553a572277-dirty #9
> Call Trace:
> [c7b8f000] [c1337110] dump_stack_lvl+0xb4/0x110 (unreliable)
> [c7b8f040] [c02440e4] __might_resched+0x394/0x3c0
> [c7b8f0e0] [c004f680] rtas_busy_delay+0x120/0x1b0
> [c7b8f100] [c0052d04] rtas_os_term+0xb8/0xf4
> [c7b8f180] [c01150fc] pseries_panic+0x50/0x68
> [c7b8f1f0] [c0036354] ppc_panic_platform_handler+0x34/0x50
> [c7b8f210] [c02303c4] notifier_call_chain+0xd4/0x1c0
> [c7b8f2b0] [c02306cc] atomic_notifier_call_chain+0xac/0x1c0
> [c7b8f2f0] [c01d62b8] panic+0x228/0x4d0
> [c7b8f390] [c01e573c] do_exit+0x140c/0x1420
> [c7b8f480] [c01e586c] make_task_dead+0xdc/0x200
>
> Use rtas_busy_delay_time() instead, which signals without side effects
> whether to attempt the ibm,os-term RTAS call again.

rtas_busy_delay should probably be renamed to rtas_busy_sleep, to make
that self-documenting that it can schedule. You could then add a
rtas_busy_delay which doesn't sleep, which a few other places could
use...

But that's a bigger change and there is precedent for using this call
this way, so looks okay to me. Maybe you could open-code an mdelay
though, although I guess firmware should be tolerant of calling it in
a loop.
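
A userspace sketch of the open-coded variant, assuming the retry decision comes from a pure status-to-milliseconds mapping and the wait itself never sleeps. The status values (-2 for busy, 9900 to 9905 for extended delay) are written from memory of the kernel's RTAS definitions and should be treated as assumptions; `model_fw` and the other `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RTAS_BUSY			(-2)
#define MODEL_RTAS_EXTENDED_DELAY_MIN	9900
#define MODEL_RTAS_EXTENDED_DELAY_MAX	9905

/* Map a status to a suggested wait in ms; 0 means "do not retry". */
unsigned int model_busy_delay_time(int status)
{
	unsigned int ms = 0;
	int order;

	if (status == MODEL_RTAS_BUSY) {
		ms = 1;
	} else if (status >= MODEL_RTAS_EXTENDED_DELAY_MIN &&
		   status <= MODEL_RTAS_EXTENDED_DELAY_MAX) {
		order = status - MODEL_RTAS_EXTENDED_DELAY_MIN;
		for (ms = 1; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/* Hypothetical firmware model: reports busy for the first few calls. */
struct model_fw {
	int busy_calls_left;
	unsigned int waited_ms;
};

int model_fw_call(struct model_fw *fw)
{
	if (fw->busy_calls_left > 0) {
		fw->busy_calls_left--;
		return MODEL_RTAS_BUSY;
	}
	return 0;
}

/* Retry until the status is final, accounting for the wait each time. */
int model_os_term(struct model_fw *fw)
{
	unsigned int ms;
	int status;

	assert(fw != NULL);

	do {
		status = model_fw_call(fw);
		ms = model_busy_delay_time(status);
		fw->waited_ms += ms;	/* stands in for a non-sleeping mdelay(ms) */
	} while (ms);

	return status;
}
```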

Reviewed-by: Nicholas Piggin 

>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 81e4996012b7..51f0508593a7 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -965,10 +965,15 @@ void rtas_os_term(char *str)
>  
>   snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
>  
> + /*
> +  * Keep calling as long as RTAS returns a "try again" status,
> +  * but don't use rtas_busy_delay(), which potentially
> +  * schedules.
> +  */
>   do {
>   status = rtas_call(ibm_os_term_token, 1, 1, NULL,
>  __pa(rtas_os_term_buf));
> - } while (rtas_busy_delay(status));
> + } while (rtas_busy_delay_time(status));
>  
>   if (status != 0)
>   printk(KERN_EMERG "ibm,os-term call failed %d\n", status);
> -- 
> 2.37.1



Re: [PATCH 11/13] powerpc/rtas: strengthen do_enter_rtas() type safety, drop inline

2022-11-27 Thread Nicholas Piggin
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> Make do_enter_rtas() take a pointer to struct rtas_args and do the
> __pa() conversion in one place instead of leaving it to callers. This
> also makes it possible to introduce enter/exit tracepoints that access
> the rtas_args struct fields.
>
> There's no apparent reason to force inlining of do_enter_rtas()
> either, and it seems to bloat the code a bit. Let the compiler decide.

Reviewed-by: Nicholas Piggin 

>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index a88db3b3486f..198366d641d0 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -522,7 +522,7 @@ static const struct rtas_function 
> *rtas_token_to_function(s32 token)
>  /* This is here deliberately so it's only used in this file */
>  void enter_rtas(unsigned long);
>  
> -static inline void do_enter_rtas(unsigned long args)
> +static void do_enter_rtas(struct rtas_args *args)
>  {
>   unsigned long msr;
>  
> @@ -537,7 +537,7 @@ static inline void do_enter_rtas(unsigned long args)
>  
>   hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
>  
> - enter_rtas(args);
> + enter_rtas(__pa(args));
>  
>   srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
>  }
> @@ -908,7 +908,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
>   save_args = rtas.args;
>   rtas.args = err_args;
>  
> - do_enter_rtas(__pa(&rtas.args));
> + do_enter_rtas(&rtas.args);
>  
>   err_args = rtas.args;
>   rtas.args = save_args;
> @@ -955,7 +955,7 @@ va_rtas_call_unlocked(struct rtas_args *args, int token, 
> int nargs, int nret,
>   for (i = 0; i < nret; ++i)
>   args->rets[i] = 0;
>  
> - do_enter_rtas(__pa(args));
> + do_enter_rtas(args);
>  }
>  
>  void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int 
> nret, ...)
> @@ -1731,7 +1731,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   flags = lock_rtas();
>  
>   rtas.args = args;
> - do_enter_rtas(__pa(&rtas.args));
> + do_enter_rtas(&rtas.args);
>   args = rtas.args;
>  
>   /* A -1 return code indicates that the last command couldn't
> -- 
> 2.37.1



[RFC PATCH 08/13] powerpc/dexcr: Add enforced userspace ROP protection config

2022-11-27 Thread Benjamin Gray
The DEXCR Non-Privileged Hash Instruction Enable (NPHIE) aspect controls
whether the hashst and hashchk instructions are treated as no-ops by the
CPU.

NPHIE behaviour per ISA 3.1B:

0:  hashst and hashchk instructions are executed as no-ops
(even when allowed by PCR)

1:  hashst and hashchk instructions are executed normally
(if allowed by PCR)

Currently this aspect may be set per-process by prctl() or enforced
globally by the hypervisor.

Add a kernel config option PPC_USER_ROP_PROTECT to enforce DEXCR[NPHIE]
globally regardless of prctl() or hypervisor. If set, don't report
NPHIE as editable via prctl(), as the prctl() value can never take
effect.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Kconfig|  5 +
 arch/powerpc/kernel/dexcr.c | 15 +++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 699df27b0e2f..ba3458d07744 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -434,6 +434,11 @@ config PGTABLE_LEVELS
default 2 if !PPC64
default 4
 
+config PPC_USER_ROP_PROTECT
+   bool
+   depends on PPC_BOOK3S_64
+   default y
+
 source "arch/powerpc/sysdev/Kconfig"
 source "arch/powerpc/platforms/Kconfig"
 
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 8239bcc92026..394140fc23aa 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,8 +19,8 @@
 #define DEFAULT_DEXCR  0
 
 /* Allow process configuration of these by default */
-#define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \
- DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE)
+static unsigned long dexcr_prctl_editable __ro_after_init =
+   DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE;
 
 /*
  * Lock to protect system DEXCR override from concurrent updates.
@@ -83,6 +84,12 @@ static int __init dexcr_init(void)
if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
update_userspace_system_dexcr(DEXCR_PRO_SBHE, 
spec_branch_hint_enable);
 
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_NPHIE) &&
+   IS_ENABLED(CONFIG_PPC_USER_ROP_PROTECT)) {
+   update_userspace_system_dexcr(DEXCR_PRO_NPHIE, 1);
+   dexcr_prctl_editable &= ~DEXCR_PRO_NPHIE;
+   }
+
return 0;
 }
 early_initcall(dexcr_init);
@@ -131,7 +138,7 @@ static int dexcr_aspect_get(struct task_struct *task, 
unsigned int aspect)
 {
int ret = 0;
 
-   if (aspect & DEXCR_PRCTL_EDITABLE)
+   if (aspect & dexcr_prctl_editable)
ret |= PR_PPC_DEXCR_PRCTL;
 
if (aspect & task->thread.dexcr_mask) {
@@ -174,7 +181,7 @@ int dexcr_prctl_get(struct task_struct *task, unsigned long 
which)
 
 static int dexcr_aspect_set(struct task_struct *task, unsigned int aspect, 
unsigned long ctrl)
 {
-   if (!(aspect & DEXCR_PRCTL_EDITABLE))
+   if (!(aspect & dexcr_prctl_editable))
return -ENXIO;  /* Aspect is not allowed to be changed by prctl 
*/
 
if (aspect & task->thread.dexcr_forced)
-- 
2.38.1
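
A userspace sketch of the mechanism in this patch, assuming only two aspects for brevity: when enforcement is configured and the CPU supports the aspect, NPHIE is forced on and removed from the set that prctl() may edit. The mask formula follows DEXCR_PRO_MASK() from this series; the aspect numbers (SBHE = 0, NPHIE = 5) are assumptions based on a reading of ISA 3.1B, and every `model_*`/`MODEL_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same shape as DEXCR_PRO_MASK(): __MASK(63 - (32 + aspect)). */
#define MODEL_PRO_MASK(aspect)	(1UL << (63 - (32 + (aspect))))

/* Aspect numbers are assumptions, see the note above. */
#define MODEL_ASPECT_SBHE	0
#define MODEL_ASPECT_NPHIE	5

struct model_dexcr {
	unsigned long editable;		/* aspects prctl() may change */
	unsigned long forced_on;	/* aspects enforced system-wide */
};

/* Boot-time setup: optionally enforce NPHIE and make it non-editable. */
void model_dexcr_init(struct model_dexcr *d, bool cpu_has_nphie,
		      bool rop_protect)
{
	assert(d != NULL);

	d->editable = MODEL_PRO_MASK(MODEL_ASPECT_SBHE) |
		      MODEL_PRO_MASK(MODEL_ASPECT_NPHIE);
	d->forced_on = 0;

	if (cpu_has_nphie && rop_protect) {
		d->forced_on |= MODEL_PRO_MASK(MODEL_ASPECT_NPHIE);
		d->editable &= ~MODEL_PRO_MASK(MODEL_ASPECT_NPHIE);
	}
}

/* Returns 0 if the aspect may be changed, -1 (think -ENXIO) if not. */
int model_aspect_set(const struct model_dexcr *d, unsigned long aspect_mask)
{
	assert(d != NULL);

	if (!(aspect_mask & d->editable))
		return -1;
	return 0;
}
```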



[RFC PATCH 04/13] powerpc/dexcr: Support userspace ROP protection

2022-11-27 Thread Benjamin Gray
The ISA 3.1B hashst and hashchk instructions use a per-cpu SPR HASHKEYR
to hold a key used in the hash calculation. This key should be different
for each process to make it harder for a malicious process to recreate
valid hash values for a victim process.

Add support for storing a per-thread hash key, and setting/clearing
HASHKEYR appropriately.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/book3s/64/kexec.h |  3 +++
 arch/powerpc/include/asm/processor.h   |  1 +
 arch/powerpc/include/asm/reg.h |  1 +
 arch/powerpc/kernel/process.c  | 12 
 4 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/kexec.h 
b/arch/powerpc/include/asm/book3s/64/kexec.h
index 563baf94a962..163de935df28 100644
--- a/arch/powerpc/include/asm/book3s/64/kexec.h
+++ b/arch/powerpc/include/asm/book3s/64/kexec.h
@@ -24,6 +24,9 @@ static inline void reset_sprs(void)
if (cpu_has_feature(CPU_FTR_ARCH_31))
mtspr(SPRN_DEXCR, 0);
 
+   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   mtspr(SPRN_HASHKEYR, 0);
+
/*  Do we need isync()? We are going via a kexec reset */
isync();
 }
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index c17ec1e44c86..2381217c95dc 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -264,6 +264,7 @@ struct thread_struct {
unsigned long   mmcr3;
unsigned long   sier2;
unsigned long   sier3;
+   unsigned long   hashkeyr;
 
 #endif
 };
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index cdd1f174c399..854664cf844f 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -384,6 +384,7 @@
 #define SPRN_HRMOR 0x139   /* Real mode offset register */
 #define SPRN_HSRR0 0x13A   /* Hypervisor Save/Restore 0 */
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
+#define SPRN_HASHKEYR  0x1D4   /* Non-privileged hashst/hashchk key register */
 #define SPRN_ASDR  0x330   /* Access segment descriptor register */
 #define SPRN_DEXCR 0x33C   /* Dynamic execution control register */
 #define   DEXCR_PRO_MASK(aspect)   __MASK(63 - (32 + (aspect)))/* 
Aspect number to problem state aspect mask */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 17d26f652b80..4d7b0c7641d0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1229,6 +1229,9 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
old_thread->tidr != new_thread->tidr)
mtspr(SPRN_TIDR, new_thread->tidr);
 
+   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   mtspr(SPRN_HASHKEYR, new_thread->hashkeyr);
+
if (cpu_has_feature(CPU_FTR_ARCH_31)) {
unsigned long new_dexcr = get_thread_dexcr(new_thread);
 
@@ -1818,6 +1821,10 @@ int copy_thread(struct task_struct *p, const struct 
kernel_clone_args *args)
childregs->ppr = DEFAULT_PPR;
 
p->thread.tidr = 0;
+#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   p->thread.hashkeyr = current->thread.hashkeyr;
 #endif
/*
 * Run with the current AMR value of the kernel
@@ -1947,6 +1954,11 @@ void start_thread(struct pt_regs *regs, unsigned long 
start, unsigned long sp)
current->thread.load_tm = 0;
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 #ifdef CONFIG_PPC_BOOK3S_64
+   if (cpu_has_feature(CPU_FTR_DEXCR_NPHIE)) {
+   current->thread.hashkeyr = get_random_long();
+   mtspr(SPRN_HASHKEYR, current->thread.hashkeyr);
+   }
+
if (cpu_has_feature(CPU_FTR_ARCH_31))
mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
 #endif /* CONFIG_PPC_BOOK3S_64 */
-- 
2.38.1
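
A userspace model of the key lifecycle this patch sets up, as far as it can be read from the diff: the key is copied on fork/clone, regenerated on exec, and written to the per-CPU register on context switch. The `model_*` names are hypothetical, and the random value is supplied by the caller here rather than by get_random_long().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_thread {
	uint64_t hashkeyr;	/* per-thread copy of the hash key */
};

struct model_cpu {
	uint64_t hashkeyr_spr;	/* stands in for the HASHKEYR SPR */
};

/* fork/clone: the child starts with the parent's key. */
void model_copy_thread(struct model_thread *child,
		       const struct model_thread *parent)
{
	assert(child != NULL && parent != NULL);
	child->hashkeyr = parent->hashkeyr;
}

/* exec: the new program image gets a fresh key, live immediately. */
void model_start_thread(struct model_cpu *cpu, struct model_thread *t,
			uint64_t fresh_key)
{
	assert(cpu != NULL && t != NULL);
	t->hashkeyr = fresh_key;
	cpu->hashkeyr_spr = fresh_key;
}

/* Context switch: load the incoming thread's key into the register. */
void model_restore_sprs(struct model_cpu *cpu, const struct model_thread *next)
{
	assert(cpu != NULL && next != NULL);
	cpu->hashkeyr_spr = next->hashkeyr;
}
```

Copying the key on fork presumably keeps hashes already stored on the duplicated stack verifiable in the child; that rationale is an inference, not something the commit message states.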



[RFC PATCH 03/13] powerpc/dexcr: Handle hashchk exception

2022-11-27 Thread Benjamin Gray
Recognise and pass the appropriate signal to the user program when a
hashchk instruction triggers. This is independent of allowing
configuration of DEXCR[NPHIE], as a hypervisor can enforce this aspect
regardless of the kernel.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/include/asm/processor.h  |  6 ++
 arch/powerpc/kernel/dexcr.c   | 22 ++
 arch/powerpc/kernel/traps.c   |  6 ++
 4 files changed, 35 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 21e33e46f4b8..89b316466ed1 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -215,6 +215,7 @@
 #define OP_31_XOP_STFSX663
 #define OP_31_XOP_STFSUX695
 #define OP_31_XOP_STFDX 727
+#define OP_31_XOP_HASHCHK   754
 #define OP_31_XOP_STFDUX759
 #define OP_31_XOP_LHBRX 790
 #define OP_31_XOP_LFIWAX855
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 0a8a793b8b8b..c17ec1e44c86 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -448,10 +448,16 @@ void *exit_vmx_ops(void *dest);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 
+bool is_hashchk_trap(struct pt_regs const *regs);
 unsigned long get_thread_dexcr(struct thread_struct const *t);
 
 #else
 
+static inline bool is_hashchk_trap(struct pt_regs const *regs)
+{
+   return false;
+}
+
 static inline unsigned long get_thread_dexcr(struct thread_struct const *t)
 {
return 0;
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 32a0a69ff638..11515e67afac 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -3,6 +3,9 @@
 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include 
 #include 
 
@@ -19,6 +22,25 @@ static int __init dexcr_init(void)
 }
 early_initcall(dexcr_init);
 
+bool is_hashchk_trap(struct pt_regs const *regs)
+{
+   ppc_inst_t insn;
+
+   if (!cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   return false;
+
+   if (get_user_instr(insn, (void __user *)regs->nip)) {
+   WARN_ON(1);
+   return false;
+   }
+
+   if (ppc_inst_primary_opcode(insn) == 31 &&
+   get_xop(ppc_inst_val(insn)) == OP_31_XOP_HASHCHK)
+   return true;
+
+   return false;
+}
+
 unsigned long get_thread_dexcr(struct thread_struct const *t)
 {
return DEFAULT_DEXCR;
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9bdd79aa51cf..b83f5b382f24 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1516,6 +1516,12 @@ static void do_program_check(struct pt_regs *regs)
return;
}
}
+
+   if (user_mode(regs) && is_hashchk_trap(regs)) {
+   _exception(SIGILL, regs, ILL_ILLOPN, regs->nip);
+   return;
+   }
+
_exception(SIGTRAP, regs, TRAP_BRKPT, regs->nip);
return;
}
-- 
2.38.1



[RFC PATCH 01/13] powerpc/book3s: Add missing include

2022-11-27 Thread Benjamin Gray
The functions here use struct thread_struct fields, so need to import
the full definition from . The  header
that defines current only forward declares struct thread_struct.

Failing to include this  header leads to a compilation
error when a translation unit does not also include 
indirectly.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/book3s/64/kup.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/book3s/64/kup.h 
b/arch/powerpc/include/asm/book3s/64/kup.h
index 54cf46808157..84c09e546115 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -194,6 +194,7 @@
 #else /* !__ASSEMBLY__ */
 
 #include 
+#include 
 
 DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
 
-- 
2.38.1



[RFC PATCH 02/13] powerpc: Add initial Dynamic Execution Control Register (DEXCR) support

2022-11-27 Thread Benjamin Gray
ISA 3.1B introduces the Dynamic Execution Control Register (DEXCR). It
is a per-cpu register that allows control over various CPU behaviours
including branch hint usage, indirect branch speculation, and
hashst/hashchk support.

Though introduced in 3.1B, no CPUs using 3.1 were released, so
CPU_FTR_ARCH_31 is used to determine support for the register itself.
Support for each DEXCR bit (aspect) is reported separately by the
firmware.

Add various definitions and basic support for the DEXCR in the kernel.
Right now it just initialises and maintains the DEXCR on process
creation/swap, and clears it in reset_sprs().

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/book3s/64/kexec.h |  3 +++
 arch/powerpc/include/asm/cputable.h|  8 ++-
 arch/powerpc/include/asm/processor.h   | 13 +++
 arch/powerpc/include/asm/reg.h |  6 ++
 arch/powerpc/kernel/Makefile   |  1 +
 arch/powerpc/kernel/dexcr.c| 25 ++
 arch/powerpc/kernel/dt_cpu_ftrs.c  |  4 
 arch/powerpc/kernel/process.c  | 13 ++-
 arch/powerpc/kernel/prom.c |  4 
 9 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kernel/dexcr.c

diff --git a/arch/powerpc/include/asm/book3s/64/kexec.h 
b/arch/powerpc/include/asm/book3s/64/kexec.h
index d4b9d476ecba..563baf94a962 100644
--- a/arch/powerpc/include/asm/book3s/64/kexec.h
+++ b/arch/powerpc/include/asm/book3s/64/kexec.h
@@ -21,6 +21,9 @@ static inline void reset_sprs(void)
plpar_set_ciabr(0);
}
 
+   if (cpu_has_feature(CPU_FTR_ARCH_31))
+   mtspr(SPRN_DEXCR, 0);
+
/*  Do we need isync()? We are going via a kexec reset */
isync();
 }
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 757dbded11dc..03bc192f2d8b 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -192,6 +192,10 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_RADIX_PREFETCH_BUG  LONG_ASM_CONST(0x0002)
 #define CPU_FTR_ARCH_31                LONG_ASM_CONST(0x0004)
 #define CPU_FTR_DAWR1  LONG_ASM_CONST(0x0008)
+#define CPU_FTR_DEXCR_SBHE LONG_ASM_CONST(0x0010)
+#define CPU_FTR_DEXCR_IBRTPD   LONG_ASM_CONST(0x0020)
+#define CPU_FTR_DEXCR_SRAPDLONG_ASM_CONST(0x0040)
+#define CPU_FTR_DEXCR_NPHIELONG_ASM_CONST(0x0080)
 
 #ifndef __ASSEMBLY__
 
@@ -451,7 +455,9 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
-   CPU_FTR_DAWR | CPU_FTR_DAWR1)
+   CPU_FTR_DAWR | CPU_FTR_DAWR1 | \
+   CPU_FTR_DEXCR_SBHE | CPU_FTR_DEXCR_IBRTPD | CPU_FTR_DEXCR_SRAPD | \
+   CPU_FTR_DEXCR_NPHIE)
 #define CPU_FTRS_CELL  (CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 631802999d59..0a8a793b8b8b 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -446,6 +446,19 @@ int exit_vmx_usercopy(void);
 int enter_vmx_ops(void);
 void *exit_vmx_ops(void *dest);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
+unsigned long get_thread_dexcr(struct thread_struct const *t);
+
+#else
+
+static inline unsigned long get_thread_dexcr(struct thread_struct const *t)
+{
+   return 0;
+}
+
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
 #endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_PROCESSOR_H */
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 1e8b2e04e626..cdd1f174c399 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -385,6 +385,12 @@
 #define SPRN_HSRR0 0x13A   /* Hypervisor Save/Restore 0 */
 #define SPRN_HSRR1 0x13B   /* Hypervisor Save/Restore 1 */
 #define SPRN_ASDR  0x330   /* Access segment descriptor register */
+#define SPRN_DEXCR 0x33C   /* Dynamic execution control register */
+#define   DEXCR_PRO_MASK(aspect)   __MASK(63 - (32 + (aspect))) /* Aspect number to problem state aspect mask */
+#define   DEXCR_PRO_SBHE   DEXCR_PRO_MASK(0)   /* Speculative Branch Hint Enable */
+#define   DEXCR_PRO_IBRTPD DEXCR_PRO_MASK(3)   /* Indirect Branch Recurrent Target Prediction Disable */
+#define   DEXCR_PRO_SRAPD  DEXCR_PRO_MASK(4)   /* Subroutine Return Address Prediction Disable */
+#define   DEXCR_PRO_NPHIE  DEXCR_PRO_MASK(5)   /* Non-Privileged Hash Instruction Enable */
 #define SPRN_IC 

[RFC PATCH 09/13] selftests/powerpc: Add more utility macros

2022-11-27 Thread Benjamin Gray
Adds more assertion variants to provide more context behind why a
failure occurred.

The SIGSAFE_FAIL_* variants are to allow safely asserting conditions
in a signal handler (though we are about to exit, so it's unlikely to
run into an issue with regular FAIL_IF_EXIT).

Also adds an ARRAY_SIZE macro.

These will be used by the following DEXCR selftests.
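
As a rough illustration of how the returning variant reads at a call
site, here is a minimal sketch. The macro bodies are local copies of the
ones in this patch so the snippet builds outside the selftests tree;
check_aspect_indices() is a made-up helper, not something in the series.

```c
#include <stddef.h>
#include <stdio.h>

/* Local copies of the macros from this patch, for a standalone build. */
#ifndef ARRAY_SIZE
# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif

#define FAIL_IF_MSG(x, msg)                                     \
do {                                                            \
	if ((x)) {                                              \
		fprintf(stderr,                                 \
			"[FAIL] Test FAILED on line %d: %s\n",  \
			__LINE__, msg);                         \
		return 1;                                       \
	}                                                       \
} while (0)

/*
 * Hypothetical test body: every aspect index must fit in the 32 bits
 * that make up the problem state half of the DEXCR.
 * Returns 0 on success, 1 (via FAIL_IF_MSG) on the first bad index.
 */
static int check_aspect_indices(const unsigned int *indices, size_t count)
{
	for (size_t i = 0; i < count; i++)
		FAIL_IF_MSG(indices[i] > 31, "aspect index out of range");

	return 0;
}
```

Because FAIL_IF_MSG returns from the enclosing function, it only suits
functions returning int; the _EXIT variants are for everything else.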

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/include/utils.h | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index 95f3a24a4569..b03d2192c6f6 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -9,12 +9,19 @@
 #define __cacheline_aligned __attribute__((aligned(128)))
 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
 #include "reg.h"
 
+#ifndef ARRAY_SIZE
+# define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
 /* Avoid headaches with PRI?64 - just use %ll? always */
 typedef unsigned long long u64;
 typedef   signed long long s64;
@@ -111,6 +118,16 @@ do {   \
}   \
 } while (0)
 
+#define FAIL_IF_MSG(x, msg)\
+do {   \
+   if ((x)) {  \
+   fprintf(stderr, \
+   "[FAIL] Test FAILED on line %d: %s\n",  \
+   __LINE__, msg); \
+   return 1;   \
+   }   \
+} while (0)
+
 #define FAIL_IF_EXIT(x)\
 do {   \
if ((x)) {  \
@@ -120,6 +137,16 @@ do {   \
}   \
 } while (0)
 
+#define FAIL_IF_EXIT_MSG(x, msg)   \
+do {   \
+   if ((x)) {  \
+   fprintf(stderr, \
+   "[FAIL] Test FAILED on line %d: %s\n",  \
+   __LINE__, msg); \
+   _exit(1);   \
+   }   \
+} while (0)
+
 /* The test harness uses this, yes it's gross */
 #define MAGIC_SKIP_RETURN_VALUE99
 
@@ -149,6 +176,23 @@ do {   \
ssize_t nbytes __attribute__((unused)); \
nbytes = write(STDERR_FILENO, msg, strlen(msg)); })
 
+#define SIGSAFE_FAIL_IF_EXIT(x)                                        \
+do {                                                                   \
+   if ((x)) {                                                          \
+       sigsafe_err("[FAIL] Test FAILED on line " str(__LINE__) "\n");  \
+       _exit(1);                                                       \
+   }                                                                   \
+} while (0)
+
+#define SIGSAFE_FAIL_IF_EXIT_MSG(x, msg)                               \
+do {                                                                   \
+   if ((x)) {                                                          \
+       sigsafe_err("[FAIL] Test FAILED on line "                       \
+                   str(__LINE__) ": " msg "\n");                       \
+       _exit(1);                                                       \
+   }                                                                   \
+} while (0)
+
 /* POWER9 feature */
 #ifndef PPC_FEATURE2_ARCH_3_00
 #define PPC_FEATURE2_ARCH_3_00 0x0080
-- 
2.38.1



[RFC PATCH 06/13] powerpc/dexcr: Add prctl implementation

2022-11-27 Thread Benjamin Gray
Adds an initial prctl interface implementation. Unprivileged processes
can query the current prctl setting, including whether an aspect is
implemented by the hardware or is permitted to be modified by a setter
prctl. Editable aspects can be changed by a CAP_SYS_ADMIN privileged
process.

The prctl setting represents what the process itself has requested, and
does not account for any overrides. Either the kernel or a hypervisor
may enforce a different setting for an aspect.

Userspace can access a read-only view of the current DEXCR via SPR 812,
and a read-only view of the aspects enforced by the hypervisor via
SPR 455. A bitwise OR of these two SPRs will give the effective
DEXCR aspect state of the process.
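
The two combinations described above can be sketched in plain C as
follows. This is only an illustration: struct dexcr_prctl_state is a
stand-in for the thread_struct fields added by this patch, the function
names are made up, and the sketch assumes every bit set in 'override' is
also set in 'mask'.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-thread prctl fields in this patch. */
struct dexcr_prctl_state {
	uint32_t override;	/* aspect values requested via prctl */
	uint32_t mask;		/* aspects that have a prctl request */
};

/*
 * Same expression as get_thread_dexcr() in the diff: aspects covered by
 * 'mask' take their value from 'override', the rest keep the default.
 */
static uint32_t apply_prctl_overrides(uint32_t base,
				      const struct dexcr_prctl_state *s)
{
	return (base & ~s->mask) | s->override;
}

/*
 * Effective state as seen from userspace: the process view ORed with
 * the hypervisor enforced view (the two read-only SPRs).
 */
static uint32_t effective_dexcr(uint32_t dexcr_view, uint32_t hdexcr_view)
{
	return dexcr_view | hdexcr_view;
}
```

A clear request is therefore a bit present in 'mask' but absent from
'override', which is why two fields are needed rather than one.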

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/processor.h |  13 +++
 arch/powerpc/kernel/dexcr.c  | 133 ++-
 arch/powerpc/kernel/process.c|   6 ++
 3 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 2381217c95dc..4c995258f668 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -265,6 +265,9 @@ struct thread_struct {
unsigned long   sier2;
unsigned long   sier3;
unsigned long   hashkeyr;
+   unsigned intdexcr_override;
+   unsigned intdexcr_mask;
+   unsigned intdexcr_forced;
 
 #endif
 };
@@ -338,6 +341,16 @@ extern int set_endian(struct task_struct *tsk, unsigned int val);
 extern int get_unalign_ctl(struct task_struct *tsk, unsigned long adr);
 extern int set_unalign_ctl(struct task_struct *tsk, unsigned int val);
 
+#ifdef CONFIG_PPC_BOOK3S_64
+
+#define PPC_GET_DEXCR_ASPECT(tsk, asp) dexcr_prctl_get((tsk), (asp))
+#define PPC_SET_DEXCR_ASPECT(tsk, asp, val) dexcr_prctl_set((tsk), (asp), (val))
+
+int dexcr_prctl_get(struct task_struct *tsk, unsigned long asp);
+int dexcr_prctl_set(struct task_struct *tsk, unsigned long asp, unsigned long val);
+
+#endif
+
 extern void load_fp_state(struct thread_fp_state *fp);
 extern void store_fp_state(struct thread_fp_state *fp);
 extern void load_vr_state(struct thread_vr_state *vr);
diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 11515e67afac..9290beed722a 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,5 +1,8 @@
 #include 
+#include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -11,6 +14,10 @@
 
 #define DEFAULT_DEXCR  0
 
+/* Allow process configuration of these by default */
+#define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \
+ DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE)
+
 static int __init dexcr_init(void)
 {
if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
@@ -43,5 +50,129 @@ bool is_hashchk_trap(struct pt_regs const *regs)
 
 unsigned long get_thread_dexcr(struct thread_struct const *t)
 {
-   return DEFAULT_DEXCR;
+   unsigned long dexcr = DEFAULT_DEXCR;
+
+   /* Apply prctl overrides */
+   dexcr = (dexcr & ~t->dexcr_mask) | t->dexcr_override;
+
+   return dexcr;
+}
+
+static void update_dexcr_on_cpu(void *info)
+{
+   mtspr(SPRN_DEXCR, get_thread_dexcr(&current->thread));
+}
+
+static int dexcr_aspect_get(struct task_struct *task, unsigned int aspect)
+{
+   int ret = 0;
+
+   if (aspect & DEXCR_PRCTL_EDITABLE)
+   ret |= PR_PPC_DEXCR_PRCTL;
+
+   if (aspect & task->thread.dexcr_mask) {
+   if (aspect & task->thread.dexcr_override) {
+   if (aspect & task->thread.dexcr_forced)
+   ret |= PR_PPC_DEXCR_FORCE_SET_ASPECT;
+   else
+   ret |= PR_PPC_DEXCR_SET_ASPECT;
+   } else {
+   ret |= PR_PPC_DEXCR_CLEAR_ASPECT;
+   }
+   }
+
+   return ret;
+}
+
+int dexcr_prctl_get(struct task_struct *task, unsigned long which)
+{
+   switch (which) {
+   case PR_PPC_DEXCR_SBHE:
+   if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   return -ENODEV;
+   return dexcr_aspect_get(task, DEXCR_PRO_SBHE);
+   case PR_PPC_DEXCR_IBRTPD:
+   if (!cpu_has_feature(CPU_FTR_DEXCR_IBRTPD))
+   return -ENODEV;
+   return dexcr_aspect_get(task, DEXCR_PRO_IBRTPD);
+   case PR_PPC_DEXCR_SRAPD:
+   if (!cpu_has_feature(CPU_FTR_DEXCR_SRAPD))
+   return -ENODEV;
+   return dexcr_aspect_get(task, DEXCR_PRO_SRAPD);
+   case PR_PPC_DEXCR_NPHIE:
+   if (!cpu_has_feature(CPU_FTR_DEXCR_NPHIE))
+   return -ENODEV;
+   return dexcr_aspect_get(task, DEXCR_PRO_NPHIE);
+   default:
+   return -ENODEV;
+   }
+}
+
+static int dexcr_aspect_set(struct task_struct *task, unsigned int aspect, unsigned long ctrl)
+{
+   if (!(aspect & DEXCR_

[RFC PATCH 11/13] selftests/powerpc: Add DEXCR prctl, sysctl interface test

2022-11-27 Thread Benjamin Gray
Test the prctl and sysctl interfaces of the DEXCR.

This adds a new capabilities util for getting and setting CAP_SYS_ADMIN.
Adding this avoids depending on an external libcap package. There is a
similar implementation (and reason) in the tools/testing/selftests/bpf
subtree but there's no obvious place to move it for sharing.
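
For readers unfamiliar with the capability layout, the indexing the
helper relies on can be sketched without any syscalls. This is an
illustration only: struct cap_words is a simplified stand-in for the
data array passed to capget/capset, and the names are made up. The
sketch assumes two 32-bit words (capability ABI version 3) and uses the
value 21 for CAP_SYS_ADMIN as defined in <linux/capability.h>.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAP_WORDS	2	/* assumption: ABI v3, two 32-bit words */
#define SKETCH_CAP_SYS_ADMIN	21	/* CAP_SYS_ADMIN in <linux/capability.h> */

/* Simplified stand-in: only the effective set, one word per 32 caps. */
struct cap_words {
	uint32_t effective[SKETCH_CAP_WORDS];
};

/* Capability N lives in word N / 32 at bit N % 32. */
static bool cap_is_set(const struct cap_words *caps, size_t cap)
{
	size_t word = cap / 32;
	size_t bit = cap % 32;

	if (word >= SKETCH_CAP_WORDS)
		return false;

	return (caps->effective[word] >> bit) & 1U;
}

static void cap_clear(struct cap_words *caps, size_t cap)
{
	size_t word = cap / 32;
	size_t bit = cap % 32;

	if (word >= SKETCH_CAP_WORDS)
		return;

	/* Unsigned constant so that bit 31 is shifted without overflow. */
	caps->effective[word] &= ~(UINT32_C(1) << bit);
}
```

The sketch returns quietly on an out-of-range capability, whereas the
helpers in the patch exit the test through FAIL_IF_EXIT_MSG.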

Signed-off-by: Benjamin Gray 
---
 .../selftests/powerpc/dexcr/.gitignore|   1 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   4 +-
 tools/testing/selftests/powerpc/dexcr/cap.c   |  72 ++
 tools/testing/selftests/powerpc/dexcr/cap.h   |  18 ++
 tools/testing/selftests/powerpc/dexcr/dexcr.h |   2 +
 .../selftests/powerpc/dexcr/dexcr_test.c  | 241 ++
 6 files changed, 336 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.h
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr_test.c

diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore 
b/tools/testing/selftests/powerpc/dexcr/.gitignore
index 37adb7f47832..035a1fcd8fb3 100644
--- a/tools/testing/selftests/powerpc/dexcr/.gitignore
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -1 +1,2 @@
+dexcr_test
 hashchk_user
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile 
b/tools/testing/selftests/powerpc/dexcr/Makefile
index 4b4380d4d986..9814e72a4afa 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := hashchk_test
+TEST_GEN_PROGS := dexcr_test hashchk_test
 
 TEST_FILES := settings
 top_srcdir = ../../../../..
@@ -6,4 +6,4 @@ include ../../lib.mk
 
 HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect)
 
-$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c
+$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c ./cap.c
diff --git a/tools/testing/selftests/powerpc/dexcr/cap.c 
b/tools/testing/selftests/powerpc/dexcr/cap.c
new file mode 100644
index ..3c9b1f27345d
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/cap.c
@@ -0,0 +1,72 @@
+#include 
+#include 
+#include 
+
+#include "cap.h"
+#include "utils.h"
+
+struct kernel_capabilities {
+   struct __user_cap_header_struct header;
+
+   struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
+};
+
+static void get_caps(struct kernel_capabilities *caps)
+{
+   FAIL_IF_EXIT_MSG(syscall(SYS_capget, &caps->header, &caps->data),
+"cannot get capabilities");
+}
+
+static void set_caps(struct kernel_capabilities *caps)
+{
+   FAIL_IF_EXIT_MSG(syscall(SYS_capset, &caps->header, &caps->data),
+"cannot set capabilities");
+}
+
+static void init_caps(struct kernel_capabilities *caps, pid_t pid)
+{
+   memset(caps, 0, sizeof(*caps));
+
+   caps->header.version = _LINUX_CAPABILITY_VERSION_3;
+   caps->header.pid = pid;
+
+   get_caps(caps);
+}
+
+static bool has_cap(struct kernel_capabilities *caps, size_t cap)
+{
+   size_t data_index = cap / 32;
+   size_t offset = cap % 32;
+
+   FAIL_IF_EXIT_MSG(data_index >= ARRAY_SIZE(caps->data), "cap out of range");
+
+   return caps->data[data_index].effective & (1 << offset);
+}
+
+static void drop_cap(struct kernel_capabilities *caps, size_t cap)
+{
+   size_t data_index = cap / 32;
+   size_t offset = cap % 32;
+
+   FAIL_IF_EXIT_MSG(data_index >= ARRAY_SIZE(caps->data), "cap out of range");
+
+   caps->data[data_index].effective &= ~(1 << offset);
+}
+
+bool check_cap_sysadmin(void)
+{
+   struct kernel_capabilities caps;
+
+   init_caps(&caps, 0);
+
+   return has_cap(&caps, CAP_SYS_ADMIN);
+}
+
+void drop_cap_sysadmin(void)
+{
+   struct kernel_capabilities caps;
+
+   init_caps(&caps, 0);
+   drop_cap(&caps, CAP_SYS_ADMIN);
+   set_caps(&caps);
+}
diff --git a/tools/testing/selftests/powerpc/dexcr/cap.h 
b/tools/testing/selftests/powerpc/dexcr/cap.h
new file mode 100644
index ..41f41dda9862
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/cap.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Simple capabilities getter/setter
+ *
+ * This header file contains helper functions and macros
+ * required to get and set capabilities(7). Introduced so
+ * we aren't the first to rely on libcap.
+ */
+#ifndef _SELFTESTS_POWERPC_DEXCR_CAP_H
+#define _SELFTESTS_POWERPC_DEXCR_CAP_H
+
+#include 
+
+bool check_cap_sysadmin(void);
+
+void drop_cap_sysadmin(void);
+
+#endif  /* _SELFTESTS_POWERPC_DEXCR_CAP_H */
diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.h 
b/tools/testing/selftests/powerpc/dexcr/dexcr.h
index fb8007bf19f8..b90633ae49e9 100644
--- a/tools/testing/selftests/powerpc/dexcr/dexcr.h
+++ b/tools/testing/selftests/powerpc/dexcr/dexcr.h
@@ -21,6 +21,8 @@
 #define DEXCR_PRO_SRAPDDEXCR_PRO_MASK(4)
 #define DEXCR_PRO_NPHIEDEXCR_PRO_MA

[RFC PATCH 12/13] selftests/powerpc: Add DEXCR status utility lsdexcr

2022-11-27 Thread Benjamin Gray
Add a utility 'lsdexcr' to print the current DEXCR status. Useful for
quickly checking the status when debugging test failures, using the
sysctl interfaces manually, or just wanting to check it.

Example output:

  Requested: 8400 (SBHE, NPHIE)
Hypervisor enforced: 
  Effective: 8400 (SBHE, NPHIE)

SBHE * (0): set, prctl editable (Speculative branch hint enable)
  IBRTPD   (3): clear, prctl editable   (Indirect branch recurrent target prediction disable)
   SRAPD   (4): clear, prctl editable   (Subroutine return address prediction disable)
   NPHIE * (5): set (Non-privileged hash instruction enable)

Global SBHE override: 1 (set)
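
The mapping from register bits to the names in parentheses can be
sketched as below. It is a simplified take on what print_dexcr() does,
not the utility's actual code: the table and function names are made up,
and the mask is derived from the DEXCR_PRO_MASK() definition earlier in
the series, on the assumption that __MASK(x) expands to a single bit at
position x (so aspect n sits at bit 31 - n of the 32-bit view).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct aspect_name {
	const char *name;
	unsigned int index;	/* aspect number as listed in the output */
};

static const struct aspect_name aspect_names[] = {
	{ "SBHE", 0 }, { "IBRTPD", 3 }, { "SRAPD", 4 }, { "NPHIE", 5 },
};

/* Aspect n maps to bit (31 - n), e.g. SBHE is 0x80000000. */
static uint32_t aspect_mask(unsigned int index)
{
	return UINT32_C(1) << (31 - index);
}

/*
 * Write a comma separated list of the aspects set in 'bits' to 'buf'.
 * 'len' must be at least 1; output is cut short if it does not fit.
 */
static void format_aspects(uint32_t bits, char *buf, size_t len)
{
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(aspect_names) / sizeof(aspect_names[0]); i++) {
		int n;

		if (!(bits & aspect_mask(aspect_names[i].index)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? ", " : "", aspect_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* error or truncated, stop here */
		used += (size_t)n;
	}
}
```

Under that mask definition, SBHE and NPHIE together are 0x84000000.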

Signed-off-by: Benjamin Gray 
---
 .../selftests/powerpc/dexcr/.gitignore|   1 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   2 +
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 178 ++
 3 files changed, 181 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/lsdexcr.c

diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore 
b/tools/testing/selftests/powerpc/dexcr/.gitignore
index 035a1fcd8fb3..7dd2fad93732 100644
--- a/tools/testing/selftests/powerpc/dexcr/.gitignore
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -1,2 +1,3 @@
 dexcr_test
 hashchk_user
+lsdexcr
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile 
b/tools/testing/selftests/powerpc/dexcr/Makefile
index 9814e72a4afa..8cb732cda7e7 100644
--- a/tools/testing/selftests/powerpc/dexcr/Makefile
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -1,4 +1,5 @@
 TEST_GEN_PROGS := dexcr_test hashchk_test
+TEST_GEN_FILES := lsdexcr
 
 TEST_FILES := settings
 top_srcdir = ../../../../..
@@ -7,3 +8,4 @@ include ../../lib.mk
 HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect)
 
 $(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c ./cap.c
+$(TEST_GEN_FILES): ../utils.c ./dexcr.c
diff --git a/tools/testing/selftests/powerpc/dexcr/lsdexcr.c 
b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c
new file mode 100644
index ..c9f0035f8e2e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/lsdexcr.c
@@ -0,0 +1,178 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dexcr.h"
+#include "utils.h"
+
+static unsigned int requested;
+static unsigned int enforced;
+static unsigned int effective;
+
+struct dexcr_aspect {
+   const char *name;
+   const char *desc;
+   unsigned int index;
+   unsigned long pr_val;
+};
+
+static const struct dexcr_aspect aspects[] = {
+   {
+   .name = "SBHE",
+   .desc = "Speculative branch hint enable",
+   .index = 0,
+   .pr_val = PR_PPC_DEXCR_SBHE,
+   },
+   {
+   .name = "IBRTPD",
+   .desc = "Indirect branch recurrent target prediction disable",
+   .index = 3,
+   .pr_val = PR_PPC_DEXCR_IBRTPD,
+   },
+   {
+   .name = "SRAPD",
+   .desc = "Subroutine return address prediction disable",
+   .index = 4,
+   .pr_val = PR_PPC_DEXCR_SRAPD,
+   },
+   {
+   .name = "NPHIE",
+   .desc = "Non-privileged hash instruction enable",
+   .index = 5,
+   .pr_val = PR_PPC_DEXCR_NPHIE,
+   },
+};
+
+#define NUM_ASPECTS (sizeof(aspects) / sizeof(struct dexcr_aspect))
+
+static void print_list(const char *list[], size_t len)
+{
+   for (size_t i = 0; i < len; i++) {
+   printf("%s", list[i]);
+   if (i + 1 < len)
+   printf(", ");
+   }
+}
+
+static void print_dexcr(char *name, unsigned int bits)
+{
+   const char *enabled_aspects[32] = {NULL};
+   size_t j = 0;
+
+   printf("%s: %08x", name, bits);
+
+   if (bits == 0) {
+   printf("\n");
+   return;
+   }
+
+   for (size_t i = 0; i < NUM_ASPECTS; i++) {
+   unsigned int mask = pr_aspect_to_dexcr_mask(aspects[i].pr_val);
+   if (bits & mask) {
+   enabled_aspects[j++] = aspects[i].name;
+   bits &= ~mask;
+   }
+   }
+
+   if (bits)
+   enabled_aspects[j++] = "unknown";
+
+   printf(" (");
+   print_list(enabled_aspects, j);
+   printf(")\n");
+}
+
+static void print_aspect(const struct dexcr_aspect *aspect)
+{
+   const char *attributes[32] = {NULL};
+   size_t j = 0;
+   unsigned long mask;
+   int pr_status;
+
+   /* Kernel-independent info about aspect */
+   mask = pr_aspect_to_dexcr_mask(aspect->pr_val);
+   if (requested & mask)
+   attributes[j++] = "set";
+   if (enforced & mask)
+   attributes[j++] = "hypervisor enforced";
+   if (!(effective & mask))
+   attributes[j++] = "clear";
+
+   /* Kernel understanding of the aspect */
+   pr_stat

[RFC PATCH 00/13] Add DEXCR support

2022-11-27 Thread Benjamin Gray
This series is based on initial work by Chris Riedl that was not sent
to the list.

Adds a kernel interface for userspace to interact with the DEXCR.
The DEXCR is a SPR that allows control over various execution
'aspects', such as indirect branch prediction and enabling the
hashst/hashchk instructions. Further details are in ISA 3.1B
Book 3 chapter 12.

This RFC proposes an interface for users to interact with the DEXCR.
It aims to support

* Querying supported aspects
* Getting/setting aspects on a per-process level
* Allowing global overrides across all processes

There are some parts that I'm not sure on the best way to approach (hence RFC):

* The feature names in arch/powerpc/kernel/dt_cpu_ftrs.c appear to be
  unimplemented in skiboot, so are being defined by this series. Is being
  so verbose fine?
* What aspects should be editable by a process? E.g., SBHE has
  effects that potentially bleed into other processes. Should
  it only be system wide configurable?
* Should configuring certain aspects for the process be non-privileged?
  E.g., is there harm in always allowing configuration of IBRTPD, SRAPD?
  The *FORCE_SET* action prevents further process local changes
  regardless of privilege.
* The tests fail Patchwork CI because of the new prctl macros, and the CI
  doesn't run headers_install and add -isystem /usr/include to
  the make command.
* On handling an exception, I don't check if the NPHIE bit is enabled in
  the DEXCR. To do so would require reading both the DEXCR and HDEXCR,
  for little gain (it should only matter that the current instruction
  was a hashchk. If so, the only reason it would cause an exception is
  the failed check. If the instruction is rewritten between exception
  and check we'd be wrong anyway).
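
To make the last point concrete, the instruction test amounts to
comparing two opcode fields. A standalone sketch follows; the function
names are made up, and the field positions are assumptions that mirror
the kernel's ppc_inst_primary_opcode() and get_xop() helpers (primary
opcode in the top 6 bits, extended opcode in the 10 bits above the
lowest bit). The value 754 is the one patch 03 adds as
OP_31_XOP_HASHCHK.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OP_31_XOP_HASHCHK 754	/* value added to ppc-opcode.h */

/* Assumed layout: primary opcode is the top 6 bits of the word. */
static unsigned int primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* Assumed layout: extended opcode is the 10 bits above the lowest bit. */
static unsigned int extended_opcode(uint32_t insn)
{
	return (insn >> 1) & 0x3ff;
}

/* True if the word decodes as hashchk, whatever its operand fields. */
static bool looks_like_hashchk(uint32_t insn)
{
	return primary_opcode(insn) == 31 &&
	       extended_opcode(insn) == SKETCH_OP_31_XOP_HASHCHK;
}
```

Nothing here consults the DEXCR or HDEXCR, which is the trade-off the
item above describes.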

The series is based on the earlier selftest utils series[1], so the tests
won't build at all without applying that first. The kernel side should
build fine on ppc/next 247f34f7b80357943234f93f247a1ae6b6c3a740 though.

[1]: 
https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20221122231103.15829-1-bg...@linux.ibm.com/

Benjamin Gray (13):
  powerpc/book3s: Add missing  include
  powerpc: Add initial Dynamic Execution Control Register (DEXCR)
support
  powerpc/dexcr: Handle hashchk exception
  powerpc/dexcr: Support userspace ROP protection
  prctl: Define PowerPC DEXCR interface
  powerpc/dexcr: Add prctl implementation
  powerpc/dexcr: Add sysctl entry for SBHE system override
  powerpc/dexcr: Add enforced userspace ROP protection config
  selftests/powerpc: Add more utility macros
  selftests/powerpc: Add hashst/hashchk test
  selftests/powerpc: Add DEXCR prctl, sysctl interface test
  selftests/powerpc: Add DEXCR status utility lsdexcr
  Documentation: Document PowerPC kernel DEXCR interface

 Documentation/powerpc/dexcr.rst   | 183 +++
 Documentation/powerpc/index.rst   |   1 +
 arch/powerpc/Kconfig  |   5 +
 arch/powerpc/include/asm/book3s/64/kexec.h|   6 +
 arch/powerpc/include/asm/book3s/64/kup.h  |   1 +
 arch/powerpc/include/asm/cputable.h   |   8 +-
 arch/powerpc/include/asm/ppc-opcode.h |   1 +
 arch/powerpc/include/asm/processor.h  |  33 ++
 arch/powerpc/include/asm/reg.h|   7 +
 arch/powerpc/kernel/Makefile  |   1 +
 arch/powerpc/kernel/dexcr.c   | 310 ++
 arch/powerpc/kernel/dt_cpu_ftrs.c |   4 +
 arch/powerpc/kernel/process.c |  31 +-
 arch/powerpc/kernel/prom.c|   4 +
 arch/powerpc/kernel/traps.c   |   6 +
 include/uapi/linux/prctl.h|  14 +
 kernel/sys.c  |  16 +
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../selftests/powerpc/dexcr/.gitignore|   3 +
 .../testing/selftests/powerpc/dexcr/Makefile  |  11 +
 tools/testing/selftests/powerpc/dexcr/cap.c   |  72 
 tools/testing/selftests/powerpc/dexcr/cap.h   |  18 +
 tools/testing/selftests/powerpc/dexcr/dexcr.c | 118 +++
 tools/testing/selftests/powerpc/dexcr/dexcr.h |  54 +++
 .../selftests/powerpc/dexcr/dexcr_test.c  | 241 ++
 .../selftests/powerpc/dexcr/hashchk_test.c| 229 +
 .../testing/selftests/powerpc/dexcr/lsdexcr.c | 178 ++
 tools/testing/selftests/powerpc/include/reg.h |   4 +
 .../testing/selftests/powerpc/include/utils.h |  44 +++
 29 files changed, 1602 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/dexcr.rst
 create mode 100644 arch/powerpc/kernel/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/dexcr/Makefile
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/cap.h
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/de

[RFC PATCH 13/13] Documentation: Document PowerPC kernel DEXCR interface

2022-11-27 Thread Benjamin Gray
Describe the DEXCR and document how to interact with it via the
prctl and sysctl interfaces.

Signed-off-by: Benjamin Gray 
---
 Documentation/powerpc/dexcr.rst | 183 
 Documentation/powerpc/index.rst |   1 +
 2 files changed, 184 insertions(+)
 create mode 100644 Documentation/powerpc/dexcr.rst

diff --git a/Documentation/powerpc/dexcr.rst b/Documentation/powerpc/dexcr.rst
new file mode 100644
index ..3c995f4b9fe0
--- /dev/null
+++ b/Documentation/powerpc/dexcr.rst
@@ -0,0 +1,183 @@
+==========================================
+DEXCR (Dynamic Execution Control Register)
+==========================================
+
+Overview
+========
+
+The DEXCR is a privileged special purpose register (SPR) introduced in
+PowerPC ISA 3.1B (Power10) that allows per-cpu control over several dynamic
+execution behaviours. These behaviours include speculation (e.g., indirect
+branch target prediction) and enabling return-oriented programming (ROP)
+protection instructions.
+
+The execution control is exposed in hardware as up to 32 bits ('aspects') in
+the DEXCR. Each aspect controls a certain behaviour, and can be set or cleared
+to enable/disable the aspect. There are several variants of the DEXCR for
+different purposes:
+
+DEXCR
+    A privileged SPR that can control aspects for userspace and kernel space.
+HDEXCR
+    A hypervisor-privileged SPR that can control aspects for the hypervisor and
+    enforce aspects for the kernel and userspace.
+UDEXCR
+    An optional ultravisor-privileged SPR that can control aspects for the ultravisor.
+
+Userspace can examine the current DEXCR state using a dedicated SPR that
+provides a non-privileged read-only view of the userspace DEXCR aspects.
+There is also an SPR that provides a read-only view of the hypervisor enforced
+aspects, which ORed with the userspace DEXCR view gives the effective DEXCR
+state for a process.
+
+
+User API
+========
+
+prctl()
+-------
+
+A process can control its own userspace DEXCR value using the
+``PR_PPC_GET_DEXCR`` and ``PR_PPC_SET_DEXCR`` pair of
+:manpage:`prctl(2)` commands. These calls have the form::
+
+prctl(PR_PPC_GET_DEXCR, unsigned long aspect, 0, 0, 0);
+prctl(PR_PPC_SET_DEXCR, unsigned long aspect, unsigned long flags, 0, 0);
+
+Where ``aspect`` (``arg1``) is a constant and ``flags`` (``arg2``) is a bitfield.
+The possible aspect and flag values are as follows. Note there is no relation
+between aspect value and ``prctl()`` constant value.
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 7 1
+
+   * - ``prctl()`` constant
+ - Aspect name
+ - Aspect bit
+
+   * - ``PR_PPC_DEXCR_SBHE``
+ - Speculative Branch Hint Enable (SBHE)
+ - 0
+
+   * - ``PR_PPC_DEXCR_IBRTPD``
+ - Indirect Branch Recurrent Target Prediction Disable (IBRTPD)
+ - 3
+
+   * - ``PR_PPC_DEXCR_SRAPD``
+ - Subroutine Return Address Prediction Disable (SRAPD)
+ - 4
+
+   * - ``PR_PPC_DEXCR_NPHIE``
+ - Non-Privileged Hash Instruction Enable (NPHIE)
+ - 5
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - ``prctl()`` flag
+ - Meaning
+
+   * - ``PR_PPC_DEXCR_PRCTL``
+ - This aspect can be configured with ``prctl(PR_PPC_SET_DEXCR, ...)``
+
+   * - ``PR_PPC_DEXCR_SET_ASPECT``
+ - This aspect is set
+
+   * - ``PR_PPC_DEXCR_FORCE_SET_ASPECT``
+ - This aspect is set and cannot be undone. A subsequent
+   ``prctl(..., PR_PPC_DEXCR_CLEAR_ASPECT)`` will fail.
+
+   * - ``PR_PPC_DEXCR_CLEAR_ASPECT``
+ - This aspect is clear
+
+Note that
+
+* The ``*_SET_ASPECT`` / ``*_CLEAR_ASPECT`` refers to setting/clearing the bit in the DEXCR.
+  For example::
+
+      prctl(PR_PPC_SET_DEXCR, PR_PPC_DEXCR_IBRTPD, PR_PPC_DEXCR_SET_ASPECT, 0, 0);
+
+  will set the IBRTPD aspect bit in the DEXCR, causing indirect branch prediction
+  to be disabled.
+
+* The status returned by ``PR_PPC_GET_DEXCR`` does not include any alternative
+  config overrides. To see the true DEXCR state software should read the appropriate
+  SPRs directly.
+
+* A forced aspect will still report ``PR_PPC_DEXCR_PRCTL`` if it would
+  otherwise be editable.
+
+* The aspect state when starting a process is copied from the parent's
+  state on :manpage:`fork(2)` and :manpage:`execve(2)`. Aspects may also be set
+  or cleared by the kernel on process creation.
+
+Use ``PR_PPC_SET_DEXCR`` with one of ``PR_PPC_DEXCR_SET_ASPECT``,
+``PR_PPC_DEXCR_FORCE_SET_ASPECT``, or ``PR_PPC_DEXCR_CLEAR_ASPECT`` to edit a
+given aspect.
+
+Common error codes for both getting and setting the DEXCR are as follows:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Error
+ - Meaning
+
+   * - ``EINVAL``
+ - The DEXCR is not supported by the kernel.
+
+   * - ``ENODEV``
+ - The aspect is not recognised by the kernel or not supported by the hardware.
+
+``PR_PPC_SET_DEXCR`` may also report the following error codes:
+
+.. flat-table::
+   :header-rows: 1
+   :widths: 2 8
+
+   * - Err

[RFC PATCH 10/13] selftests/powerpc: Add hashst/hashchk test

2022-11-27 Thread Benjamin Gray
Test the kernel DEXCR[NPHIE] interface and hashchk exception handling.

Introduces with it a DEXCR utils library for common DEXCR operations.

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../selftests/powerpc/dexcr/.gitignore|   1 +
 .../testing/selftests/powerpc/dexcr/Makefile  |   9 +
 tools/testing/selftests/powerpc/dexcr/dexcr.c | 118 +
 tools/testing/selftests/powerpc/dexcr/dexcr.h |  52 
 .../selftests/powerpc/dexcr/hashchk_test.c| 229 ++
 tools/testing/selftests/powerpc/include/reg.h |   4 +
 7 files changed, 414 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/dexcr/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/dexcr/Makefile
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.c
 create mode 100644 tools/testing/selftests/powerpc/dexcr/dexcr.h
 create mode 100644 tools/testing/selftests/powerpc/dexcr/hashchk_test.c

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 6ba95cd19e42..00dbd000ee01 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -17,6 +17,7 @@ SUB_DIRS = alignment  \
   benchmarks   \
   cache_shape  \
   copyloops\
+  dexcr\
   dscr \
   mm   \
   nx-gzip  \
diff --git a/tools/testing/selftests/powerpc/dexcr/.gitignore b/tools/testing/selftests/powerpc/dexcr/.gitignore
new file mode 100644
index ..37adb7f47832
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/.gitignore
@@ -0,0 +1 @@
+hashchk_user
diff --git a/tools/testing/selftests/powerpc/dexcr/Makefile b/tools/testing/selftests/powerpc/dexcr/Makefile
new file mode 100644
index ..4b4380d4d986
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/Makefile
@@ -0,0 +1,9 @@
+TEST_GEN_PROGS := hashchk_test
+
+TEST_FILES := settings
+top_srcdir = ../../../../..
+include ../../lib.mk
+
+HASHCHK_TEST_CFLAGS = -no-pie $(call cc-option,-mno-rop-protect)
+
+$(TEST_GEN_PROGS): ../harness.c ../utils.c ./dexcr.c
diff --git a/tools/testing/selftests/powerpc/dexcr/dexcr.c b/tools/testing/selftests/powerpc/dexcr/dexcr.c
new file mode 100644
index ..3e7cb581d4a2
--- /dev/null
+++ b/tools/testing/selftests/powerpc/dexcr/dexcr.c
@@ -0,0 +1,118 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dexcr.h"
+#include "reg.h"
+#include "utils.h"
+
+long sysctl_get_sbhe(void)
+{
+   long value;
+
+   FAIL_IF_EXIT_MSG(read_long(SYSCTL_DEXCR_SBHE, &value, 10),
+"failed to read " SYSCTL_DEXCR_SBHE);
+
+   return value;
+}
+
+void sysctl_set_sbhe(long value)
+{
+   FAIL_IF_EXIT_MSG(write_long(SYSCTL_DEXCR_SBHE, value, 10),
+"failed to write to " SYSCTL_DEXCR_SBHE);
+}
+
+unsigned int pr_aspect_to_dexcr_mask(unsigned long which)
+{
+   switch (which) {
+   case PR_PPC_DEXCR_SBHE:
+   return DEXCR_PRO_SBHE;
+   case PR_PPC_DEXCR_IBRTPD:
+   return DEXCR_PRO_IBRTPD;
+   case PR_PPC_DEXCR_SRAPD:
+   return DEXCR_PRO_SRAPD;
+   case PR_PPC_DEXCR_NPHIE:
+   return DEXCR_PRO_NPHIE;
+   default:
+   FAIL_IF_EXIT_MSG(true, "unknown PR aspect");
+   }
+}
+
+static inline unsigned int get_dexcr_pro(void)
+{
+   return mfspr(SPRN_DEXCR);
+}
+
+static inline unsigned int get_dexcr_enf(void)
+{
+   return mfspr(SPRN_HDEXCR);
+}
+
+static inline unsigned int get_dexcr_eff(void)
+{
+   return get_dexcr_pro() | get_dexcr_enf();
+}
+
+unsigned int get_dexcr(enum DexcrSource source)
+{
+   switch (source) {
+   case UDEXCR:
+   return get_dexcr_pro();
+   case ENFORCED:
+   return get_dexcr_enf();
+   case EFFECTIVE:
+   return get_dexcr_eff();
+   default:
+   FAIL_IF_EXIT_MSG(true, "bad DEXCR source");
+   }
+}
+
+bool pr_aspect_supported(unsigned long which)
+{
+   return prctl(PR_PPC_GET_DEXCR, which, 0, 0, 0) >= 0;
+}
+
+bool pr_aspect_editable(unsigned long which)
+{
+   int ret = prctl(PR_PPC_GET_DEXCR, which, 0, 0, 0);
+   return ret > 0 && (ret & PR_PPC_DEXCR_PRCTL) > 0;
+}
+
+bool pr_aspect_edit(unsigned long which, unsigned long ctrl)
+{
+   return prctl(PR_PPC_SET_DEXCR, which, ctrl, 0, 0) == 0;
+}
+
+bool pr_aspect_check(unsigned long which, enum DexcrSource source)
+{
+   unsigned int dexcr = get_dexcr(source);
+   unsigned int aspect = pr_aspect_to_dexcr_mask(which);
+   return (dexcr & aspect) != 0;
+}
+
+int pr_aspect_get(unsigned long pr_aspect)
+{
+   int ret = prctl(PR_PPC_GET_DEXCR, pr_aspect, 0, 0, 0);
+   FAIL_IF_EXIT_MSG(ret < 0, "prctl failed");
+   return ret;
+}
+
+bool dexcr_pro_check(unsigned int

[RFC PATCH 05/13] prctl: Define PowerPC DEXCR interface

2022-11-27 Thread Benjamin Gray
Adds the definitions and generic handler for prctl control of the
PowerPC Dynamic Execution Control Register (DEXCR).

Signed-off-by: Benjamin Gray 
---
 include/uapi/linux/prctl.h | 14 ++
 kernel/sys.c   | 16 
 2 files changed, 30 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a5e06dcbba13..b4720e8de6f3 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -281,6 +281,20 @@ struct prctl_mm_map {
 # define PR_SME_VL_LEN_MASK0x
 # define PR_SME_VL_INHERIT (1 << 17) /* inherit across exec */
 
+/* PowerPC Dynamic Execution Control Register (DEXCR) controls */
+#define PR_PPC_GET_DEXCR   65
+#define PR_PPC_SET_DEXCR   66
+/* DEXCR aspect to act on */
+# define PR_PPC_DEXCR_SBHE     0 /* Speculative branch hint enable */
+# define PR_PPC_DEXCR_IBRTPD   1 /* Indirect branch recurrent target prediction disable */
+# define PR_PPC_DEXCR_SRAPD    2 /* Subroutine return address prediction disable */
+# define PR_PPC_DEXCR_NPHIE    3 /* Non-privileged hash instruction enable */
+/* Action to apply / return */
+# define PR_PPC_DEXCR_PRCTL            (1 << 0)
+# define PR_PPC_DEXCR_SET_ASPECT       (1 << 1)
+# define PR_PPC_DEXCR_FORCE_SET_ASPECT (1 << 2)
+# define PR_PPC_DEXCR_CLEAR_ASPECT     (1 << 3)
+
 #define PR_SET_VMA 0x53564d41
 # define PR_SET_VMA_ANON_NAME  0
 
diff --git a/kernel/sys.c b/kernel/sys.c
index 5fd54bf0e886..55b8f7369059 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -139,6 +139,12 @@
 #ifndef GET_TAGGED_ADDR_CTRL
 # define GET_TAGGED_ADDR_CTRL()        (-EINVAL)
 #endif
+#ifndef PPC_GET_DEXCR_ASPECT
+# define PPC_GET_DEXCR_ASPECT(a, b)    (-EINVAL)
+#endif
+#ifndef PPC_SET_DEXCR_ASPECT
+# define PPC_SET_DEXCR_ASPECT(a, b, c) (-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2623,6 +2629,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
error = sched_core_share_pid(arg2, arg3, arg4, arg5);
break;
 #endif
+   case PR_PPC_GET_DEXCR:
+   if (arg3 || arg4 || arg5)
+   return -EINVAL;
+   error = PPC_GET_DEXCR_ASPECT(me, arg2);
+   break;
+   case PR_PPC_SET_DEXCR:
+   if (arg4 || arg5)
+   return -EINVAL;
+   error = PPC_SET_DEXCR_ASPECT(me, arg2, arg3);
+   break;
case PR_SET_VMA:
error = prctl_set_vma(arg2, arg3, arg4, arg5);
break;
-- 
2.38.1



[RFC PATCH 07/13] powerpc/dexcr: Add sysctl entry for SBHE system override

2022-11-27 Thread Benjamin Gray
The DEXCR Speculative Branch Hint Enable (SBHE) aspect controls whether
the hints provided by BO field of Branch instructions are obeyed during
speculative execution.

SBHE behaviour per ISA 3.1B:

0:  The hints provided by BO field of Branch instructions may be
ignored during speculative execution

1:  The hints provided by BO field of Branch instructions are obeyed
during speculative execution

Add a sysctl entry to allow changing this aspect globally in the system
at runtime:

/proc/sys/kernel/speculative_branch_hint_enable

Three values are supported:

-1: Disable DEXCR SBHE sysctl override
 0: Override and set DEXCR[SBHE] aspect to 0
 1: Override and set DEXCR[SBHE] aspect to 1

Internally, introduce a mechanism to apply arbitrary system-wide
overrides on top of the prctl() config.
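
As a rough userspace illustration of the layering described above, each
layer replaces the bits its mask selects, and the system layer is applied
last. This is only a sketch: `struct layer`, `apply_layer()`,
`sysctl_to_layer()` and `effective_dexcr()` are hypothetical names that do
not appear in the patch, and the sysctl values follow the -1/0/1 meaning
listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; the kernel patch uses its own types. */
struct layer {
	uint32_t mask;     /* bits this layer takes control of */
	uint32_t override; /* values forced for the controlled bits */
};

/* Replace the bits selected by l.mask with the matching bits of l.override. */
static uint32_t apply_layer(uint32_t dexcr, struct layer l)
{
	return (dexcr & ~l.mask) | (l.override & l.mask);
}

/* Map a sysctl value (-1, 0 or 1) to a layer for a single aspect bit. */
static struct layer sysctl_to_layer(uint32_t aspect_bit, int value)
{
	struct layer l = { 0, 0 };

	assert(value >= -1 && value <= 1);
	if (value >= 0)
		l.mask = aspect_bit;     /* 0 or 1: take control of the bit */
	if (value == 1)
		l.override = aspect_bit; /* 1: force the bit on */
	return l;
}

/* The system layer is applied last, so it wins over the prctl() layer. */
static uint32_t effective_dexcr(uint32_t def, struct layer prctl_l,
				struct layer sys_l)
{
	return apply_layer(apply_layer(def, prctl_l), sys_l);
}
```

With a sysctl value of -1 the system layer has an empty mask, so whatever
the prctl() layer chose passes through unchanged.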

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/dexcr.c | 125 
 1 file changed, 125 insertions(+)

diff --git a/arch/powerpc/kernel/dexcr.c b/arch/powerpc/kernel/dexcr.c
index 9290beed722a..8239bcc92026 100644
--- a/arch/powerpc/kernel/dexcr.c
+++ b/arch/powerpc/kernel/dexcr.c
@@ -1,8 +1,11 @@
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -18,6 +21,58 @@
 #define DEXCR_PRCTL_EDITABLE (DEXCR_PRO_SBHE | DEXCR_PRO_IBRTPD | \
  DEXCR_PRO_SRAPD | DEXCR_PRO_NPHIE)
 
+/*
+ * Lock to protect system DEXCR override from concurrent updates.
+ * RCU semantics: writers take lock, readers are unlocked.
+ * Writers ensure the memory update is atomic, readers read
+ * atomically.
+ */
+static DEFINE_SPINLOCK(dexcr_sys_enforced_write_lock);
+
+struct mask_override {
+   union {
+   struct {
+   unsigned int mask;
+   unsigned int override;
+   };
+
+   /* Raw access for atomic read/write */
+   unsigned long all;
+   };
+};
+
+static struct mask_override dexcr_sys_enforced;
+
+static int spec_branch_hint_enable = -1;
+
+static void update_userspace_system_dexcr(unsigned int pro_mask, int value)
+{
+   struct mask_override update = { .all = 0 };
+
+   switch (value) {
+   case -1:  /* Clear the mask bit, clear the override bit */
+   break;
+   case 0:  /* Set the mask bit, clear the override bit */
+   update.mask |= pro_mask;
+   break;
+   case 1:  /* Set the mask bit, set the override bit */
+   update.mask |= pro_mask;
+   update.override |= pro_mask;
+   break;
+   }
+
+   spin_lock(&dexcr_sys_enforced_write_lock);
+
+   /* Use the existing values for the non-updated bits */
+   update.mask |= dexcr_sys_enforced.mask & ~pro_mask;
+   update.override |= dexcr_sys_enforced.override & ~pro_mask;
+
+   /* Atomically update system enforced aspects */
+   WRITE_ONCE(dexcr_sys_enforced.all, update.all);
+
+   spin_unlock(&dexcr_sys_enforced_write_lock);
+}
+
 static int __init dexcr_init(void)
 {
if (!early_cpu_has_feature(CPU_FTR_ARCH_31))
@@ -25,6 +80,9 @@ static int __init dexcr_init(void)
 
mtspr(SPRN_DEXCR, DEFAULT_DEXCR);
 
+   if (early_cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   update_userspace_system_dexcr(DEXCR_PRO_SBHE, spec_branch_hint_enable);
+
return 0;
 }
 early_initcall(dexcr_init);
@@ -52,9 +110,15 @@ unsigned long get_thread_dexcr(struct thread_struct const *t)
 {
unsigned long dexcr = DEFAULT_DEXCR;
 
+   /* Atomically read enforced mask & override */
+   struct mask_override enforced = READ_ONCE(dexcr_sys_enforced);
+
/* Apply prctl overrides */
dexcr = (dexcr & ~t->dexcr_mask) | t->dexcr_override;
 
+   /* Apply system overrides */
+   dexcr = (dexcr & ~enforced.mask) | enforced.override;
+
return dexcr;
 }
 
@@ -176,3 +240,64 @@ int dexcr_prctl_set(struct task_struct *task, unsigned long which, unsigned long
 
return 0;
 }
+
+#ifdef CONFIG_SYSCTL
+
+static const int min_sysctl_val = -1;
+
+static int sysctl_dexcr_sbhe_handler(struct ctl_table *table, int write,
+void *buf, size_t *lenp, loff_t *ppos)
+{
+   int err;
+   int prev = spec_branch_hint_enable;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (!cpu_has_feature(CPU_FTR_DEXCR_SBHE))
+   return -ENODEV;
+
+   err = proc_dointvec_minmax(table, write, buf, lenp, ppos);
+   if (err)
+   return err;
+
+   if (prev != spec_branch_hint_enable && write) {
+   update_userspace_system_dexcr(DEXCR_PRO_SBHE, spec_branch_hint_enable);
+   cpus_read_lock();
+   on_each_cpu(update_dexcr_on_cpu, NULL, 1);
+   cpus_read_unlock();
+   }
+
+   return 0;
+}
+
+static struct ctl_table dexcr_sbhe_ctl_table[] = {
+   {

Re: [PATCH 12/13] powerpc/tracing: tracepoints for RTAS entry and exit

2022-11-27 Thread Nicholas Piggin
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> Add two sets of tracepoints to be used around RTAS entry:
>
> * rtas_input/rtas_output, which emit the function name, its inputs,
>   the returned status, and any other outputs. These produce an API-level
>   record of OS<->RTAS activity.
>
> * rtas_ll_entry/rtas_ll_exit, which are lower-level and emit the
>   entire contents of the parameter block (aka rtas_args) on entry and
>   exit. Likely useful only for debugging.
>
> With uses of these tracepoints in do_enter_rtas() to be added in the
> following patch, examples of get-time-of-day and event-scan functions
> as rendered by trace-cmd (with some multi-line formatting manually
> imposed on the rtas_ll_* entries to avoid extremely long lines in the
> commit message):
>
> cat-36800 [059]  4978.518303: rtas_input:   get-time-of-day arguments:
> cat-36800 [059]  4978.518306: rtas_ll_entry:token=3 nargs=0 nret=8
> params: [0]=0x 
> [1]=0x [2]=0x [3]=0x
> [4]=0x 
> [5]=0x [6]=0x [7]=0x
>   [8]=0x 
> [9]=0x [10]=0x [11]=0x
>   [12]=0x 
> [13]=0x [14]=0x [15]=0x
> cat-36800 [059]  4978.518366: rtas_ll_exit: token=3 nargs=0 nret=8
> params: [0]=0x 
> [1]=0x07e6 [2]=0x000b [3]=0x0001
>   [4]=0x 
> [5]=0x000e [6]=0x0008 [7]=0x2e0dac40
>   [8]=0x 
> [9]=0x [10]=0x [11]=0x
>   [12]=0x 
> [13]=0x [14]=0x [15]=0x
> cat-36800 [059]  4978.518366: rtas_output:  get-time-of-day status: 
> 0, other outputs: 2022 11 1 0 14 8 772648000
>
> kworker/39:1-336   [039]  4982.731623: rtas_input:   event-scan 
> arguments: 4294967295 0 80484920 2048
> kworker/39:1-336   [039]  4982.731626: rtas_ll_entry:token=6 nargs=4 
> nret=1
>  params: 
> [0]=0x [1]=0x [2]=0x04cc1a38 [3]=0x0800
>
> [4]=0x [5]=0x000e [6]=0x0008 [7]=0x2e0dac40
>
> [8]=0x [9]=0x [10]=0x [11]=0x
>
> [12]=0x [13]=0x [14]=0x [15]=0x
> kworker/39:1-336   [039]  4982.731676: rtas_ll_exit: token=6 nargs=4 
> nret=1
>  params: 
> [0]=0x [1]=0x [2]=0x04cc1a38 [3]=0x0800
>
> [4]=0x0001 [5]=0x000e [6]=0x0008 [7]=0x2e0dac40
>
> [8]=0x [9]=0x [10]=0x [11]=0x
>
> [12]=0x [13]=0x [14]=0x [15]=0x
> kworker/39:1-336   [039]  4982.731677: rtas_output:  event-scan 
> status: 1, other outputs:
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/include/asm/trace.h | 116 +++
>  1 file changed, 116 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/trace.h 
> b/arch/powerpc/include/asm/trace.h
> index 08cd60cd70b7..e7a301c9eb95 100644
> --- a/arch/powerpc/include/asm/trace.h
> +++ b/arch/powerpc/include/asm/trace.h
> @@ -119,6 +119,122 @@ TRACE_EVENT_FN_COND(hcall_exit,
>  );
>  #endif
>  
> +#ifdef CONFIG_PPC_RTAS
> +
> +#include 
> +
> +/*
> + * Since stop-self is how CPUs go offline on RTAS platforms,
> + * these tracepoints are conditional.
> + */
> +
> +TRACE_EVENT_CONDITION(rtas_input,
> +
> + TP_PROTO(struct rtas_args *rtas_args, const char *name),
> +
> + TP_ARGS(rtas_args, name),
> +
> + TP_CONDITION(cpu_online(raw_smp_processor_id())),
> +
> + TP_STRUCT__entry(
> + __field(__u32, nargs)
> + __string(name, name)
> + __dynamic_array(__u32, inputs, be32_to_cpu(rtas_args->nargs))
> + ),
> +
> + TP_fast_assign(
> + __entry->nargs = be32_to_cpu(rtas_args->nargs);
> + __assign_str(name, name);
> + be32_to_cpu_array(__get_dynamic_array(inputs), rtas_args->args, 
> __entry->nargs);
> + ),
> +
> + TP_printk("%s arguments: %s", __get_str(name),
> +   __print_array(__get_dynamic_array(inputs), __en

Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

2022-11-27 Thread Zhouyi Zhou
Thank you all for your guidance and encouragement!

I have learned how to construct a commit message properly, and how
important a role the torture test framework plays for the Linux kernel.
I hope my work can be of benefit to the community.

I am going to continue to study this topic and study the torture test
framework, and wait for your further instructions.

Best Regards
Zhouyi
On Mon, Nov 28, 2022 at 1:53 AM Paul E. McKenney  wrote:
>
> On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
>
> [ . . . ]
>
> > >> No. We are not exporting this just to make a bogus test case happy.
> > >>
> > >> Fix the torture code to handle -EBUSY correctly.
> > > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > > The result of the grep command shows that there are 268
> > > cpuhp_setup_state* cases.
> > > which may make our task more complicated.
> >
> > Why? The whole point of this torture thing is to stress the
> > infrastructure.
>
> Indeed.
>
> > There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> > can fail, which is not a fatal problem, really.
> >
> > So if a CPU hotplug operation fails, then why can't the torture test
> > just move on and validate that the system still behaves correctly?
> >
> > That gives us more coverage than just testing the good case and giving
> > up when something unexpected happens.
>
> Agreed, with access to a function like the tick_nohz_full_timekeeper()
> suggested earlier in this email thread, then yes, it would make sense to
> try to offline the CPU anyway, then forgive the failure in cases where
> the CPU matches that indicated by tick_nohz_full_timekeeper().
>
> > I even argue that the torture test should inject random failures into
> > the hotplug state machine to achieve extended code coverage.
>
> I could imagine torture_onoff() telling various CPU-hotplug notifiers
> to refuse the transition using some TBD interface.  That would better
> test the CPU-hotplug common code's ability to deal with failures.
>
> Or did you have something else/additional in mind?
>
> Thanx, Paul


Re: [PATCH 13/13] powerpc/rtas: place tracepoints in do_enter_rtas()

2022-11-27 Thread Nicholas Piggin
On Sat Nov 19, 2022 at 1:07 AM AEST, Nathan Lynch wrote:
> Call the just-added rtas tracepoints in do_enter_rtas(), taking care
> to avoid function name lookups in the CPU offline path.
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 198366d641d0..3487b42cfbf7 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  enum rtas_function_flags {
> @@ -525,6 +526,7 @@ void enter_rtas(unsigned long);
>  static void do_enter_rtas(struct rtas_args *args)
>  {
>   unsigned long msr;
> + const char *name = NULL;
>  
>   /*
>* Make sure MSR[RI] is currently enabled as it will be forced later
> @@ -537,9 +539,30 @@ static void do_enter_rtas(struct rtas_args *args)
>  
>   hard_irq_disable(); /* Ensure MSR[EE] is disabled on PPC64 */
>  
> + if ((trace_rtas_input_enabled() || trace_rtas_output_enabled())) {
> + /*
> +  * rtas_token_to_function() uses xarray which uses RCU,
> +  * but this code can run in the CPU offline path
> +  * (e.g. stop-self), after it's become invalid to call
> +  * RCU APIs.
> +  */

We can call this in real-mode via pseries_machine_check_realmode
-> fwnmi_release_errinfo, so tracing should be disabled for that
case too... Does this_cpu_set_ftrace_enabled(0) in the early
machine check handler cover that sufficiently?

Thanks,
Nick


[PATCH v3 real 01/17] powerpc/qspinlock: powerpc qspinlock implementation

2022-11-27 Thread Nicholas Piggin
Add a powerpc specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinlocks are built, and to make performance and correctness
bisects more useful.

Signed-off-by: Nicholas Piggin 
---
I missed the first patch when sending the series :( Here is the real patch 1.

Thanks,
Nick

 arch/powerpc/Kconfig  |  1 -
 arch/powerpc/include/asm/paravirt.h   |  3 +-
 arch/powerpc/include/asm/processor.h  |  1 +
 arch/powerpc/include/asm/qspinlock.h  | 87 +++
 arch/powerpc/include/asm/qspinlock_paravirt.h |  7 --
 arch/powerpc/include/asm/qspinlock_types.h| 13 +++
 arch/powerpc/include/asm/spinlock.h   |  2 +-
 arch/powerpc/include/asm/spinlock_types.h |  2 +-
 arch/powerpc/lib/Makefile |  4 +-
 arch/powerpc/lib/qspinlock.c  | 17 
 arch/powerpc/platforms/pseries/vas.c  |  1 +
 11 files changed, 67 insertions(+), 71 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
 create mode 100644 arch/powerpc/include/asm/qspinlock_types.h
 create mode 100644 arch/powerpc/lib/qspinlock.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ca5418457ed..1d5b4f280feb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -155,7 +155,6 @@ config PPC
select ARCH_USE_CMPXCHG_LOCKREF if PPC64
select ARCH_USE_MEMTEST
select ARCH_USE_QUEUED_RWLOCKS  if PPC_QUEUED_SPINLOCKS
-   select ARCH_USE_QUEUED_SPINLOCKSif PPC_QUEUED_SPINLOCKS
select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
select ARCH_WANT_IPC_PARSE_VERSION
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index f5ba1a3c41f8..119b44b8e81b 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -3,14 +3,13 @@
 #define _ASM_POWERPC_PARAVIRT_H
 
 #include 
-#include 
 #ifdef CONFIG_PPC64
 #include 
 #include 
 #endif
 
 #ifdef CONFIG_PPC_SPLPAR
-#include 
+#include 
 #include 
 #include 
 
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 631802999d59..640d9a35661c 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -39,6 +39,7 @@
 #ifndef __ASSEMBLY__
 #include 
 #include 
+#include 
 #include 
 #include 
 
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
index b676c4fb90fd..b1443aab2145 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -2,83 +2,54 @@
 #ifndef _ASM_POWERPC_QSPINLOCK_H
 #define _ASM_POWERPC_QSPINLOCK_H
 
-#include 
-#include 
+#include 
+#include 
+#include 
 
-#define _Q_PENDING_LOOPS   (1 << 9) /* not tuned */
-
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
-extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
-extern void __pv_queued_spin_unlock(struct qspinlock *lock);
-
-static __always_inline void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
 {
-   if (!is_shared_processor())
-   native_queued_spin_lock_slowpath(lock, val);
-   else
-   __pv_queued_spin_lock_slowpath(lock, val);
+   return atomic_read(&lock->val);
 }
 
-#define queued_spin_unlock queued_spin_unlock
-static inline void queued_spin_unlock(struct qspinlock *lock)
+static __always_inline int queued_spin_value_unlocked(struct qspinlock lock)
 {
-   if (!is_shared_processor())
-   smp_store_release(&lock->locked, 0);
-   else
-   __pv_queued_spin_unlock(lock);
+   return !atomic_read(&lock.val);
 }
 
-#else
-extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
-#endif
-
-static __always_inline void queued_spin_lock(struct qspinlock *lock)
+static __always_inline int queued_spin_is_contended(struct qspinlock *lock)
 {
-   u32 val = 0;
-
-   if (likely(arch_atomic_try_cmpxchg_lock(&lock->val, &val, _Q_LOCKED_VAL)))
-   return;
-
-   queued_spin_lock_slowpath(lock, val);
+   return 0;
 }
-#define queued_spin_lock queued_spin_lock
 
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-#define SPIN_THRESHOLD (1<<15) /* not tuned */
-
-static __always_inline void pv_wait(u8 *ptr, u8 val)
+static __always_inline int queued_spin_trylock(struct qspinlock *lock)
 {
-   if (*ptr != val)
-   return;
-   yield_to_any();
-   /*
-* We could pass in a CPU here if waiting in the queue and yield to
-* the previous CPU in the queue.
-*/
+   retur

Re: [PATCH] pseries/mobility: reset the RCU watchdogs after a LPM

2022-11-27 Thread Nicholas Piggin
On Sat Nov 26, 2022 at 3:32 AM AEST, Laurent Dufour wrote:
> The RCU watchdog timer should be reset when restarting the CPU after a Live
> Partition Mobility operation.
>
> Signed-off-by: Laurent Dufour 

Looks okay to me. xmon touches the softlockup watchdog explicitly, but
maybe that is for architectures with unsynchronized clocks.

Acked-by: Nicholas Piggin 

> ---
>  arch/powerpc/platforms/pseries/mobility.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
> b/arch/powerpc/platforms/pseries/mobility.c
> index 634fac5db3f9..9e10f38dd9ad 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -636,8 +636,10 @@ static int do_join(void *arg)
>   }
>   /*
>* Execution may have been suspended for several seconds, so
> -  * reset the watchdog.
> +  * reset the watchdogs.
>*/
> + rcu_cpu_stall_reset();
> + /* touch_nmi_watchdog() also touch the soft lockup watchdog */
>   touch_nmi_watchdog();
>   return ret;
>  }
> -- 
> 2.38.1



Re: [RFC PATCH 00/13] Add DEXCR support

2022-11-27 Thread Russell Currey
On Mon, 2022-11-28 at 13:44 +1100, Benjamin Gray wrote:
> This series is based on initial work by Chris Riedl that was not sent
> to the list.
> 
> Adds a kernel interface for userspace to interact with the DEXCR.
> The DEXCR is a SPR that allows control over various execution
> 'aspects', such as indirect branch prediction and enabling the
> hashst/hashchk instructions. Further details are in ISA 3.1B
> Book 3 chapter 12.
> 
> This RFC proposes an interface for users to interact with the DEXCR.
> It aims to support
> 
> * Querying supported aspects
> * Getting/setting aspects on a per-process level
> * Allowing global overrides across all processes
> 
> There are some parts that I'm not sure on the best way to approach
> (hence RFC):
> 
> * The feature names in arch/powerpc/kernel/dt_cpu_ftrs.c appear to be unimplemented
>   in skiboot, so are being defined by this series. Is being so verbose fine?

These are going to need to be added to skiboot before they can be
referenced in the kernel.  Inclusion in skiboot makes them ABI, the
kernel is just a consumer.

> * What aspects should be editable by a process? E.g., SBHE has
>   effects that potentially bleed into other processes. Should
>   it only be system wide configurable?

For context, ISA 3.1B p1358 says: 

   In some micro-architectures, the execution behav-
   ior controlled by aspect 0 is difficult to change with
   any degree of timing precision. The change may
   also bleed over into other threads on the same pro-
   cessor. Any environment that has a dependence on
   the more secure setting of aspect 0 should not
   change the value, and ideally should share a pro-
   cessor only with similar threads. For other environ-
   ments, changes to the effective value of aspect 0
   represent a relative risk tolerance for its aspect of
   execution behavior, with the understanding that
   there will be significant hysteresis in the execution
   behavior.
   
If a process sets SBHE for itself and all it takes is context switching
from a process with SBHE unset to cause exposure, then yeah I think it
should just be global.  I doubt branch hints have enough impact for
process granularity to be especially desirable anyway.

> * Should configuring certain aspects for the process be non-privileged? E.g.,
>   Is there harm in always allowing configuration of IBRTPD, SRAPD? The *FORCE_SET*
>   action prevents further process local changes regardless of privilege.

I'm not aware of a reason why it would be a problem to allow
unprivileged configuration as long as there's a way to prevent further
changes.  The concerning case is if a mitigation is set by a trusted
process context, and then untrusted code is executed that manages to
turn the mitigation off again.

> * The tests fail Patchwork CI because of the new prctl macros, and the CI
>   doesn't run headers_install and add -isystem /usr/include to
>   the make command.

The CI runs on x86 and cross compiles the kernel and selftests, and
boots are done in qemu tcg.  Maybe we can skip the build if the symbols
are undefined or do something like

#ifndef PR_PPC_DEXCR_...
return KSFT_SKIP;
#endif

in the test itself?
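
Spelled out a little further, the compile-time skip could look like the
sketch below. It is only an illustration: `dexcr_test_entry()` is a
hypothetical name, and `KSFT_PASS`/`KSFT_SKIP` are redefined locally (with
the values used by the kernel's kselftest.h) so the fragment stands alone.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Local copies of the kselftest exit codes, so the sketch stands alone. */
enum { KSFT_PASS = 0, KSFT_SKIP = 4 };

/* Hypothetical test entry point; the real test body is elided. */
int dexcr_test_entry(void)
{
#ifndef PR_PPC_GET_DEXCR
	/* Installed headers lack the proposed macros: report a skip. */
	return KSFT_SKIP;
#else
	/* The real DEXCR checks would run here. */
	return KSFT_PASS;
#endif
}
```

This keeps the build working on hosts with older headers, at the cost of
silently skipping the test there.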

> * On handling an exception, I don't check if the NPHIE bit is enabled in the DEXCR.
>   To do so would require reading both the DEXCR and HDEXCR, for little gain (it
>   should only matter that the current instruction was a hashchk. If so, the only
>   reason it would cause an exception is the failed check. If the instruction is
>   rewritten between exception and check we'd be wrong anyway).

For context, the hashst and hashchk instructions are implemented using
previously reserved nops.  I'm not aware of any reason a nop could trap
(i.e. we could check for a trap that came from hashchk even if NPHIE is
not set), but afaik that'd be the only reason we would have to check.

> 
> The series is based on the earlier selftest utils series[1], so the tests won't build
> at all without applying that first. The kernel side should build fine on ppc/next
> 247f34f7b80357943234f93f247a1ae6b6c3a740 though.
> 
> [1]:
> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/20221122231103.15829-1-bg...@linux.ibm.com/
> 
> Benjamin Gray (13):
>   powerpc/book3s: Add missing  include
>   powerpc: Add initial Dynamic Execution Control Register (DEXCR)
>     support
>   powerpc/dexcr: Handle hashchk exception
>   powerpc/dexcr: Support userspace ROP protection
>   prctl: Define PowerPC DEXCR interface
>   powerpc/dexcr: Add prctl implementation
>   powerpc/dexcr: Add sysctl entry for SBHE system override
>   powerpc/dexcr: Add enforced userspace ROP protection config
>   selftests/powerpc: Add more utility macros
>   selftests/powerpc: Add hashst/hashchk test
>   selftests/powerpc: Add DEXCR prctl, sysctl interface test
>   selftests/powerpc: Add DEXCR status utility lsdexcr
>   Documentation: Document PowerPC kernel DEXCR interface
> 
>  Documentation/powe

[PATCH v6 0/4] Option to build big-endian with ELFv2 ABI

2022-11-27 Thread Nicholas Piggin
This is hopefully the final attempt. Luis was happy for the module
patch to go via the powerpc tree, so I've put the ELFv2 for big
endian build patches into the series. Hopefully we can deprecate
the ELFv1 ABI.

Since v5, I cleaned up patch 2 as per Christophe's review. In patch 4,
I removed the EXPERT dependency so it's easier to test. It's marked as
experimental, but we should soon make it default and try to deprecate
the v1 ABI so we can eventually remove it.

Thanks,
Nick

Nicholas Piggin (4):
  module: add module_elf_check_arch for module-specific checks
  powerpc/64: Add module check for ELF ABI version
  powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation
  powerpc/64: Option to build big-endian with ELFv2 ABI

 arch/powerpc/Kconfig   | 21 +
 arch/powerpc/kernel/module_64.c| 10 ++
 arch/powerpc/platforms/Kconfig.cputype |  4 ++--
 drivers/crypto/vmx/Makefile| 12 +++-
 drivers/crypto/vmx/ppc-xlate.pl| 10 ++
 include/linux/moduleloader.h   |  3 +++
 kernel/module/main.c   | 10 ++
 7 files changed, 63 insertions(+), 7 deletions(-)

-- 
2.37.2



[PATCH v6 2/4] powerpc/64: Add module check for ELF ABI version

2022-11-27 Thread Nicholas Piggin
Override the generic module ELF check to provide a check for the ELF ABI
version. This becomes important if we allow big-endian ELF ABI V2 builds
but it doesn't hurt to check now.
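
The rule can be sketched in userspace as follows. This is a minimal
illustration, assuming (as the patch does) that the low two bits of
`e_flags` carry the ABI level; `abi_level_ok()` and its `kernel_is_elfv2`
parameter are hypothetical stand-ins for the function in the patch and
the `CONFIG_PPC64_ELF_ABI_V2` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a module with the given ELF header flags may be loaded
 * into a kernel built for ELFv2 (kernel_is_elfv2) or ELFv1 (otherwise).
 */
static bool abi_level_ok(uint32_t e_flags, bool kernel_is_elfv2)
{
	uint32_t abi_level = e_flags & 0x3; /* low two bits: ABI level */

	if (kernel_is_elfv2)
		return abi_level == 2;  /* only ELFv2 modules accepted */
	return abi_level < 2;           /* 0 (unspecified) or 1 (ELFv1) */
}
```

Note that an ELFv1 kernel accepts level 0 as well as level 1, while an
ELFv2 kernel accepts only level 2.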

Cc: Jessica Yu 
Signed-off-by: Michael Ellerman 
[np: split patch, added changelog, adjust to Jessica's proposal]
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/module_64.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7e45dc98df8a..ff045644f13f 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -31,6 +31,16 @@
this, and makes other things simpler.  Anton?
--RR.  */
 
+bool module_elf_check_arch(Elf_Ehdr *hdr)
+{
+   unsigned long abi_level = hdr->e_flags & 0x3;
+
+   if (IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2))
+   return abi_level == 2;
+   else
+   return abi_level < 2;
+}
+
 #ifdef CONFIG_PPC64_ELF_ABI_V2
 
 static func_desc_t func_desc(unsigned long addr)
-- 
2.37.2



[PATCH v6 1/4] module: add module_elf_check_arch for module-specific checks

2022-11-27 Thread Nicholas Piggin
The elf_check_arch() function is also used to test compatibility of
usermode binaries. Kernel modules may have more specific requirements,
for example powerpc would like to test for ABI version compatibility.

Add a weak module_elf_check_arch() that defaults to true, and call it
from elf_validity_check().
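
For readers unfamiliar with the weak-symbol pattern described above, here is
a minimal userspace sketch. It is not the kernel code: struct fake_ehdr,
arch_header_ok() and validate_header() are hypothetical names made up for
illustration, and __attribute__((weak)) is a GCC/Clang extension.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an ELF header. */
struct fake_ehdr {
	unsigned int e_machine;
	unsigned int e_flags;
};

/*
 * Default implementation. Marking it weak (a GCC/Clang extension) lets
 * another object file supply a strong definition with the same name,
 * which the linker then prefers over this one.
 */
__attribute__((weak)) bool arch_header_ok(const struct fake_ehdr *hdr)
{
	(void)hdr;
	return true;
}

/* Generic validation path: returns 0 if accepted, -1 otherwise. */
int validate_header(const struct fake_ehdr *hdr)
{
	if (!arch_header_ok(hdr)) {
		fprintf(stderr, "rejected: e_flags=%#x\n", hdr->e_flags);
		return -1;
	}
	return 0;
}
```

With only the weak default linked in, validate_header() accepts every
header. An architecture would add its own non-weak arch_header_ok() in a
separate file to tighten the check.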

Cc: Michael Ellerman 
Signed-off-by: Jessica Yu 
[np: added changelog, adjust name, rebase]
Acked-by: Luis Chamberlain 
Signed-off-by: Nicholas Piggin 
---
 include/linux/moduleloader.h |  3 +++
 kernel/module/main.c | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..7b4587a19189 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,9 @@
  * must be implemented by each architecture.
  */
 
+/* arch may override to do additional checking of ELF header architecture */
+bool module_elf_check_arch(Elf_Ehdr *hdr);
+
 /* Adjust arch-specific sections.  Return 0 on success.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
  Elf_Shdr *sechdrs,
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d02d39c7174e..7b3f6fb0d428 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1674,6 +1674,11 @@ static int elf_validity_check(struct load_info *info)
   info->hdr->e_machine);
goto no_exec;
}
+   if (!module_elf_check_arch(info->hdr)) {
+   pr_err("Invalid module architecture in ELF header: %u\n",
+  info->hdr->e_machine);
+   goto no_exec;
+   }
if (info->hdr->e_shentsize != sizeof(Elf_Shdr)) {
pr_err("Invalid ELF section header size\n");
goto no_exec;
@@ -2247,6 +2252,11 @@ static void flush_module_icache(const struct module *mod)
   (unsigned long)mod->core_layout.base + 
mod->core_layout.size);
 }
 
+bool __weak module_elf_check_arch(Elf_Ehdr *hdr)
+{
+   return true;
+}
+
 int __weak module_frob_arch_sections(Elf_Ehdr *hdr,
 Elf_Shdr *sechdrs,
 char *secstrings,
-- 
2.37.2



[PATCH v6 3/4] powerpc/64: Add big-endian ELFv2 flavour to crypto VMX asm generation

2022-11-27 Thread Nicholas Piggin
This allows asm generation for big-endian ELFv2 builds.

Signed-off-by: Nicholas Piggin 
---
 drivers/crypto/vmx/Makefile | 12 +++-
 drivers/crypto/vmx/ppc-xlate.pl | 10 ++
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile
index 2560cfea1dec..e33c7238e7f8 100644
--- a/drivers/crypto/vmx/Makefile
+++ b/drivers/crypto/vmx/Makefile
@@ -2,8 +2,18 @@
 obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o
 vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o 
aes_xts.o ghash.o
 
+ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
+override flavour := linux-ppc64le
+else
+ifdef CONFIG_PPC64_ELF_ABI_V2
+override flavour := linux-ppc64-elfv2
+else
+override flavour := linux-ppc64
+endif
+endif
+
 quiet_cmd_perl = PERL$@
-  cmd_perl = $(PERL) $< $(if $(CONFIG_CPU_LITTLE_ENDIAN), linux-ppc64le, 
linux-ppc64) > $@
+  cmd_perl = $(PERL) $< $(flavour) > $@
 
 targets += aesp8-ppc.S ghashp8-ppc.S
 
diff --git a/drivers/crypto/vmx/ppc-xlate.pl b/drivers/crypto/vmx/ppc-xlate.pl
index 36db2ef09e5b..b583898c11ae 100644
--- a/drivers/crypto/vmx/ppc-xlate.pl
+++ b/drivers/crypto/vmx/ppc-xlate.pl
@@ -9,6 +9,8 @@ open STDOUT,">$output" || die "can't open $output: $!";
 
 my %GLOBALS;
 my $dotinlocallabels=($flavour=~/linux/)?1:0;
+my $elfv2abi=(($flavour =~ /linux-ppc64le/) or ($flavour =~ 
/linux-ppc64-elfv2/))?1:0;
+my $dotfunctions=($elfv2abi=~1)?0:1;
 
 
 # directives which need special treatment on different platforms
@@ -40,7 +42,7 @@ my $globl = sub {
 };
 my $text = sub {
 my $ret = ($flavour =~ /aix/) ? ".csect\t.text[PR],7" : ".text";
-$ret = ".abiversion2\n".$ret   if ($flavour =~ /linux.*64le/);
+$ret = ".abiversion2\n".$ret   if ($elfv2abi);
 $ret;
 };
 my $machine = sub {
@@ -56,8 +58,8 @@ my $size = sub {
 if ($flavour =~ /linux/)
 {  shift;
my $name = shift; $name =~ s|^[\.\_]||;
-   my $ret  = ".size   $name,.-".($flavour=~/64$/?".":"").$name;
-   $ret .= "\n.size.$name,.-.$name" if ($flavour=~/64$/);
+   my $ret  = ".size   $name,.-".($dotfunctions?".":"").$name;
+   $ret .= "\n.size.$name,.-.$name" if ($dotfunctions);
$ret;
 }
 else
@@ -142,7 +144,7 @@ my $vmr = sub {
 
 # Some ABIs specify vrsave, special-purpose register #256, as reserved
 # for system use.
-my $no_vrsave = ($flavour =~ /linux-ppc64le/);
+my $no_vrsave = ($elfv2abi);
 my $mtspr = sub {
 my ($f,$idx,$ra) = @_;
 if ($idx == 256 && $no_vrsave) {
-- 
2.37.2



[PATCH v6 4/4] powerpc/64: Option to build big-endian with ELFv2 ABI

2022-11-27 Thread Nicholas Piggin
Provide an option to build big-endian kernels using the ELFv2 ABI. This
works on GCC only for now. Clang is rumored to support this, but core
build files need updating first, at least.

This gives big-endian kernels useful advantages of the ELFv2 ABI, e.g.,
less stack usage, -mprofile-kernel support, better compatibility with
eBPF tools.

BE+ELFv2 is not officially supported by the GNU toolchain, but it works
fine in testing and has been used by some userspace for some time (e.g.,
Void Linux).

Tested-by: Michal Suchánek 
Reviewed-by: Segher Boessenkool 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig   | 21 +
 arch/powerpc/platforms/Kconfig.cputype |  4 ++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ca5418457ed..2d0d80bcc24a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0
 source "arch/powerpc/platforms/Kconfig.cputype"
 
+config CC_HAS_ELFV2
+   def_bool PPC64 && $(cc-option, -mabi=elfv2)
+
 config 32BIT
bool
default y if PPC32
@@ -583,6 +586,24 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config PPC64_BIG_ENDIAN_ELF_ABI_V2
+   bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
+   depends on PPC64 && CPU_BIG_ENDIAN
+   depends on CC_HAS_ELFV2
+   depends on LD_IS_BFD && LD_VERSION >= 22400
+   default n
+   help
+ This builds the kernel image using the "Power Architecture 64-Bit ELF
+ V2 ABI Specification", which has a reduced stack overhead and faster
+ function calls. This internal kernel ABI option does not affect
+  userspace compatibility.
+
+ The V2 ABI is standard for 64-bit little-endian, but for big-endian
+ it is less well tested by kernel and toolchain. However some distros
+ build userspace this way, and it can produce a functioning kernel.
+
+ This requires GCC and binutils 2.24 or newer.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on PPC64 || (FLATMEM && (44x || PPC_85xx))
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 0c4eed9aea80..6e94d45f3baa 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -575,10 +575,10 @@ config CPU_LITTLE_ENDIAN
 endchoice
 
 config PPC64_ELF_ABI_V1
-   def_bool PPC64 && CPU_BIG_ENDIAN
+   def_bool PPC64 && (CPU_BIG_ENDIAN && !PPC64_BIG_ENDIAN_ELF_ABI_V2)
 
 config PPC64_ELF_ABI_V2
-   def_bool PPC64 && CPU_LITTLE_ENDIAN
+   def_bool PPC64 && !PPC64_ELF_ABI_V1
 
 config PPC64_BOOT_WRAPPER
def_bool n
-- 
2.37.2



[PATCH v3 2/7] selftests/powerpc: Add ptrace setup_core_pattern() null-terminator

2022-11-27 Thread Benjamin Gray
- malloc() does not zero the buffer,
- fread() does not null-terminate its output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
  not inherently null-terminated

So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.
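
The pattern can be sketched in isolation as below. read_cstring() is a
hypothetical helper written for this illustration, not something the patch
adds. It reserves one byte of the buffer for the terminator, mirroring the
PATH_MAX - 1 change in the diff.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): read at most cap - 1 bytes
 * from f into buf and always null-terminate the result, so that string
 * functions such as strcmp() are safe to use on it afterwards.
 * Returns the number of bytes read, excluding the terminator.
 */
size_t read_cstring(FILE *f, char *buf, size_t cap)
{
	size_t n;

	if (cap == 0)
		return 0;

	/* Leave room for the '\0' that fread() will not write for us. */
	n = fread(buf, 1, cap - 1, f);
	buf[n] = '\0';

	return n;
}
```

Because fread() returns the number of bytes actually read, buf[n] is always
inside the buffer, including when the file is empty (n == 0).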

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/ptrace/core-pkey.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c 
b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index bbc05ffc5860..5c82ed9e7c65 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -383,7 +383,7 @@ static int setup_core_pattern(char **core_pattern_, bool 
*changed_)
goto out;
}
 
-   ret = fread(core_pattern, 1, PATH_MAX, f);
+   ret = fread(core_pattern, 1, PATH_MAX - 1, f);
fclose(f);
if (!ret) {
perror("Error reading core_pattern file");
@@ -391,6 +391,8 @@ static int setup_core_pattern(char **core_pattern_, bool 
*changed_)
goto out;
}
 
+   core_pattern[ret] = '\0';
+
/* Check whether we can predict the name of the core file. */
if (!strcmp(core_pattern, "core") || !strcmp(core_pattern, "core.%p"))
*changed_ = false;
-- 
2.38.1



[PATCH v3 0/7] Expand selftest utils

2022-11-27 Thread Benjamin Gray
Started this when writing tests for a feature I'm working on, needing a way to
read/write numbers to system files. After writing some utils to safely handle
file IO and parsing, I realised I'd made the ~6th file read/write implementation
and only(?) number parser that checks all the failure modes when expecting to
parse a single number from a file.

So these utils ended up becoming this series. I also modified some other test
utils I came across while doing so. My understanding is selftests are not
expected to be backported, so I wasn't concerned about only introducing new
utils and leaving the existing implementations be.

V3: * Add reviewed-by from previous version
* Fix write(2) call to include creation mode

Benjamin Gray (7):
  selftests/powerpc: Use mfspr/mtspr macros
  selftests/powerpc: Add ptrace setup_core_pattern() null-terminator
  selftests/powerpc: Add generic read/write file util
  selftests/powerpc: Add read/write debugfs file, int
  selftests/powerpc: Parse long/unsigned long value safely
  selftests/powerpc: Add {read,write}_{long,ulong}
  selftests/powerpc: Add automatically allocating read_file

 tools/testing/selftests/powerpc/dscr/dscr.h   |  56 +---
 .../selftests/powerpc/dscr/dscr_sysfs_test.c  |  23 +-
 .../testing/selftests/powerpc/include/utils.h |  18 +-
 .../selftests/powerpc/nx-gzip/gzfht_test.c|  52 +--
 tools/testing/selftests/powerpc/pmu/lib.c |  35 +-
 .../selftests/powerpc/ptrace/core-pkey.c  |  28 +-
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c |   6 +-
 .../testing/selftests/powerpc/ptrace/ptrace.h |   5 +-
 .../selftests/powerpc/security/entry_flush.c  |  12 +-
 .../selftests/powerpc/security/flush_utils.c  |   3 +-
 .../selftests/powerpc/security/rfi_flush.c|  12 +-
 .../powerpc/security/uaccess_flush.c  |  18 +-
 .../selftests/powerpc/syscalls/Makefile   |   2 +-
 .../selftests/powerpc/syscalls/rtas_filter.c  |  80 +
 tools/testing/selftests/powerpc/utils.c   | 314 ++
 15 files changed, 341 insertions(+), 323 deletions(-)


base-commit: 247f34f7b80357943234f93f247a1ae6b6c3a740
--
2.38.1


[PATCH v3 1/7] selftests/powerpc: Use mfspr/mtspr macros

2022-11-27 Thread Benjamin Gray
No need to write inline asm for mtspr/mfspr; we have macros for this
in reg.h.

Signed-off-by: Benjamin Gray 
Reviewed-by: Andrew Donnellan 
---
 tools/testing/selftests/powerpc/dscr/dscr.h | 17 +
 .../selftests/powerpc/ptrace/ptrace-hwbreak.c   |  6 ++
 tools/testing/selftests/powerpc/ptrace/ptrace.h |  5 +
 .../selftests/powerpc/security/flush_utils.c|  3 ++-
 4 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h 
b/tools/testing/selftests/powerpc/dscr/dscr.h
index 13e9b9e28e2c..b703714e7d98 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr.h
+++ b/tools/testing/selftests/powerpc/dscr/dscr.h
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include "reg.h"
 #include "utils.h"
 
 #define THREADS100 /* Max threads */
@@ -41,31 +42,23 @@
 /* Prilvilege state DSCR access */
 inline unsigned long get_dscr(void)
 {
-   unsigned long ret;
-
-   asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_DSCR_PRIV));
-
-   return ret;
+   return mfspr(SPRN_DSCR_PRIV);
 }
 
 inline void set_dscr(unsigned long val)
 {
-   asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR_PRIV));
+   mtspr(SPRN_DSCR_PRIV, val);
 }
 
 /* Problem state DSCR access */
 inline unsigned long get_dscr_usr(void)
 {
-   unsigned long ret;
-
-   asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_DSCR));
-
-   return ret;
+   return mfspr(SPRN_DSCR);
 }
 
 inline void set_dscr_usr(unsigned long val)
 {
-   asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR));
+   mtspr(SPRN_DSCR, val);
 }
 
 /* Default DSCR access */
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
index a0635a3819aa..1345e9b9af0f 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include "ptrace.h"
+#include "reg.h"
 
 #define SPRN_PVR   0x11F
 #define PVR_8xx0x0050
@@ -620,10 +621,7 @@ static int ptrace_hwbreak(void)
 
 int main(int argc, char **argv, char **envp)
 {
-   int pvr = 0;
-   asm __volatile__ ("mfspr %0,%1" : "=r"(pvr) : "i"(SPRN_PVR));
-   if (pvr == PVR_8xx)
-   is_8xx = true;
+   is_8xx = mfspr(SPRN_PVR) == PVR_8xx;
 
return test_harness(ptrace_hwbreak, "ptrace-hwbreak");
 }
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace.h 
b/tools/testing/selftests/powerpc/ptrace/ptrace.h
index 4e0233c0f2b3..04788e5fc504 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace.h
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace.h
@@ -745,10 +745,7 @@ int show_tm_spr(pid_t child, struct tm_spr_regs *out)
 /* Analyse TEXASR after TM failure */
 inline unsigned long get_tfiar(void)
 {
-   unsigned long ret;
-
-   asm volatile("mfspr %0,%1" : "=r" (ret) : "i" (SPRN_TFIAR));
-   return ret;
+   return mfspr(SPRN_TFIAR);
 }
 
 void analyse_texasr(unsigned long texasr)
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.c 
b/tools/testing/selftests/powerpc/security/flush_utils.c
index 4d95965cb751..9c5c00e04f63 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.c
+++ b/tools/testing/selftests/powerpc/security/flush_utils.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include "reg.h"
 #include "utils.h"
 #include "flush_utils.h"
 
@@ -79,5 +80,5 @@ void set_dscr(unsigned long val)
init = 1;
}
 
-   asm volatile("mtspr %1,%0" : : "r" (val), "i" (SPRN_DSCR));
+   mtspr(SPRN_DSCR, val);
 }
-- 
2.38.1



[PATCH v3 4/7] selftests/powerpc: Add read/write debugfs file, int

2022-11-27 Thread Benjamin Gray
Debugfs files are not always integers, so make *_file return/write a
byte buffer, and *_int deal with int values specifically. This increases
consistency with the other file read/write helpers.

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/include/utils.h |  6 ++--
 .../selftests/powerpc/security/entry_flush.c  | 12 +++
 .../selftests/powerpc/security/rfi_flush.c| 12 +++
 .../powerpc/security/uaccess_flush.c  | 18 +-
 tools/testing/selftests/powerpc/utils.c   | 34 ---
 5 files changed, 47 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index 70885e5814a8..de5e3790f397 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -35,8 +35,10 @@ int pick_online_cpu(void);
 
 int read_file(const char *path, char *buf, size_t count, size_t *len);
 int write_file(const char *path, const char *buf, size_t count);
-int read_debugfs_file(char *debugfs_file, int *result);
-int write_debugfs_file(char *debugfs_file, int result);
+int read_debugfs_file(const char *debugfs_file, char *buf, size_t count);
+int write_debugfs_file(const char *debugfs_file, const char *buf, size_t 
count);
+int read_debugfs_int(const char *debugfs_file, int *result);
+int write_debugfs_int(const char *debugfs_file, int result);
 int read_sysfs_file(char *debugfs_file, char *result, size_t result_size);
 int perf_event_open_counter(unsigned int type,
unsigned long config, int group_fd);
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c 
b/tools/testing/selftests/powerpc/security/entry_flush.c
index 68ce377b205e..e01c573deadd 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -34,18 +34,18 @@ int entry_flush_test(void)
// The PMU event we use only works on Power7 or later
SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
 
-   if (read_debugfs_file("powerpc/rfi_flush", &rfi_flush_orig) < 0) {
+   if (read_debugfs_int("powerpc/rfi_flush", &rfi_flush_orig) < 0) {
perror("Unable to read powerpc/rfi_flush debugfs file");
SKIP_IF(1);
}
 
-   if (read_debugfs_file("powerpc/entry_flush", &entry_flush_orig) < 0) {
+   if (read_debugfs_int("powerpc/entry_flush", &entry_flush_orig) < 0) {
perror("Unable to read powerpc/entry_flush debugfs file");
SKIP_IF(1);
}
 
if (rfi_flush_orig != 0) {
-   if (write_debugfs_file("powerpc/rfi_flush", 0) < 0) {
+   if (write_debugfs_int("powerpc/rfi_flush", 0) < 0) {
perror("error writing to powerpc/rfi_flush debugfs 
file");
FAIL_IF(1);
}
@@ -105,7 +105,7 @@ int entry_flush_test(void)
 
if (entry_flush == entry_flush_orig) {
entry_flush = !entry_flush_orig;
-   if (write_debugfs_file("powerpc/entry_flush", entry_flush) < 0) 
{
+   if (write_debugfs_int("powerpc/entry_flush", entry_flush) < 0) {
perror("error writing to powerpc/entry_flush debugfs 
file");
return 1;
}
@@ -120,12 +120,12 @@ int entry_flush_test(void)
 
set_dscr(0);
 
-   if (write_debugfs_file("powerpc/rfi_flush", rfi_flush_orig) < 0) {
+   if (write_debugfs_int("powerpc/rfi_flush", rfi_flush_orig) < 0) {
perror("unable to restore original value of powerpc/rfi_flush 
debugfs file");
return 1;
}
 
-   if (write_debugfs_file("powerpc/entry_flush", entry_flush_orig) < 0) {
+   if (write_debugfs_int("powerpc/entry_flush", entry_flush_orig) < 0) {
perror("unable to restore original value of powerpc/entry_flush 
debugfs file");
return 1;
}
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c 
b/tools/testing/selftests/powerpc/security/rfi_flush.c
index f73484a6470f..6bedc86443a6 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -34,18 +34,18 @@ int rfi_flush_test(void)
// The PMU event we use only works on Power7 or later
SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
 
-   if (read_debugfs_file("powerpc/rfi_flush", &rfi_flush_orig) < 0) {
+   if (read_debugfs_int("powerpc/rfi_flush", &rfi_flush_orig) < 0) {
perror("Unable to read powerpc/rfi_flush debugfs file");
SKIP_IF(1);
}
 
-   if (read_debugfs_file("powerpc/entry_flush", &entry_flush_orig) < 0) {
+   if (read_debugfs_int("powerpc/entry_flush", &entry_flush_orig) < 0) {
have_entry_flush = 0;
} else {
have_entry_flush = 1;
 
if (entry_f

[PATCH v3 5/7] selftests/powerpc: Parse long/unsigned long value safely

2022-11-27 Thread Benjamin Gray
Often a file is expected to hold an integral value. Existing functions
will use a C stdlib function like atoi or strtol to parse the file.
These operations are error prone, with complicated error conditions
(atoi returns 0 if not a number, and is undefined behaviour if not in
range. strtol returns 0 if not a number, and LONG_MIN/MAX if not in
range + sets errno to ERANGE).

Add a dedicated parse function that accounts for these error conditions
so tests can safely parse numbers without undetected bad data. It's a
bit ugly to generate the functions through a macro, but it beats copying
the error check logic multiple times over.
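
To make the failure modes concrete, here is a rough standalone sketch of
strict parsing with plain strtol(). It is not the macro-generated function
from this patch: parse_long_strict() is a hypothetical name, it assumes a
null-terminated input (the patch's functions take an explicit count), and
accepting one trailing newline is an assumption based on how sysfs-style
files usually look.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse exactly one long from s.
 * Returns 0 on success, ERANGE if the value does not fit in a long,
 * and EINVAL if there are no digits or trailing garbage.
 */
int parse_long_strict(const char *s, long *result, int base)
{
	char *end;
	long parsed;

	errno = 0;
	parsed = strtol(s, &end, base);

	/* Out of range: strtol() clamps to LONG_MIN/LONG_MAX and sets errno. */
	if (errno == ERANGE)
		return ERANGE;

	/* No conversion performed: strtol() returns 0 and leaves end == s. */
	if (end == s)
		return EINVAL;

	/* Assumption: tolerate a single trailing newline. */
	if (*end == '\n')
		end++;

	/* Anything left over means the input was not a single number. */
	if (*end != '\0')
		return EINVAL;

	*result = parsed;
	return 0;
}
```

Note that *result is only written on success, so a caller cannot mistake a
failed parse for a legitimate 0.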

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/include/utils.h |  5 ++
 tools/testing/selftests/powerpc/pmu/lib.c |  9 ++--
 tools/testing/selftests/powerpc/utils.c   | 53 +--
 3 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index de5e3790f397..b82e143a07c6 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -33,6 +33,11 @@ void *get_auxv_entry(int type);
 
 int pick_online_cpu(void);
 
+int parse_int(const char *buffer, size_t count, int *result, int base);
+int parse_long(const char *buffer, size_t count, long *result, int base);
+int parse_uint(const char *buffer, size_t count, unsigned int *result, int 
base);
+int parse_ulong(const char *buffer, size_t count, unsigned long *result, int 
base);
+
 int read_file(const char *path, char *buf, size_t count, size_t *len);
 int write_file(const char *path, const char *buf, size_t count);
 int read_debugfs_file(const char *debugfs_file, char *buf, size_t count);
diff --git a/tools/testing/selftests/powerpc/pmu/lib.c 
b/tools/testing/selftests/powerpc/pmu/lib.c
index e8960e7a1271..771658278f55 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.c
+++ b/tools/testing/selftests/powerpc/pmu/lib.c
@@ -192,16 +192,15 @@ bool require_paranoia_below(int level)
 {
int err;
long current;
-   char *end, buf[16];
+   char buf[16] = {0};
+   char *end;
 
-   if ((err = read_file(PARANOID_PATH, buf, sizeof(buf), NULL))) {
+   if ((err = read_file(PARANOID_PATH, buf, sizeof(buf) - 1, NULL))) {
printf("Couldn't read " PARANOID_PATH "?\n");
return false;
}
 
-   current = strtol(buf, &end, 10);
-
-   if (end == buf) {
+   if ((err = parse_long(buf, sizeof(buf), &current, 10))) {
printf("Couldn't parse " PARANOID_PATH "?\n");
return false;
}
diff --git a/tools/testing/selftests/powerpc/utils.c 
b/tools/testing/selftests/powerpc/utils.c
index 8593e67ce779..c82539fd44f1 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -113,6 +115,53 @@ int write_debugfs_file(const char *subpath, const char 
*buf, size_t count)
return write_file(path, buf, count);
 }
 
+#define TYPE_MIN(x)\
+   _Generic((x),   \
+   int:INT_MIN,\
+   long:   LONG_MIN,   \
+   unsigned int:   0,  \
+   unsigned long:  0)
+
+#define TYPE_MAX(x)\
+   _Generic((x),   \
+   int:INT_MAX,\
+   long:   LONG_MAX,   \
+   unsigned int:   INT_MAX,\
+   unsigned long:  LONG_MAX)
+
+#define define_parse_number(fn, type, super_type)  \
+   int fn(const char *buffer, size_t count, type *result, int base)    \
+   {   \
+   char *end;  \
+   super_type parsed;  \
+   \
+   errno = 0;  \
+   parsed = _Generic(parsed,   \
+ intmax_t: strtoimax,  \
+ uintmax_t:strtoumax)(buffer, &end, base); \
+   \
+   if (errno == ERANGE ||  \
+   parsed < TYPE_MIN(*result) || parsed > TYPE_MAX(*result))   \
+   return ERANGE;  \
+   \
+   

[PATCH v3 7/7] selftests/powerpc: Add automatically allocating read_file

2022-11-27 Thread Benjamin Gray
A couple of tests roll their own auto-allocating file read logic.

Add a generic implementation and convert them to use it.
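
One way such a helper could work is sketched below. This is not the
read_file_alloc() from this patch (its body is not shown in this archive).
read_stream_alloc() and CHUNK are hypothetical names, and the sketch takes
an already-open FILE * rather than a path to keep it short.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096	/* arbitrary growth step for this sketch */

/*
 * Hypothetical sketch: read all remaining bytes of f into a freshly
 * allocated buffer. On success returns 0 and the caller must free(*buf).
 * On failure returns a positive errno-style code and allocates nothing.
 */
int read_stream_alloc(FILE *f, char **buf, size_t *len)
{
	char *data = NULL;
	size_t cap = 0, used = 0;

	for (;;) {
		char *tmp;

		/* Grow by one chunk; realloc(NULL, n) behaves like malloc(n). */
		tmp = realloc(data, cap + CHUNK);
		if (!tmp) {
			free(data);
			return ENOMEM;
		}
		data = tmp;
		cap += CHUNK;

		used += fread(data + used, 1, cap - used, f);

		/* A short read means end of file or an error. */
		if (used < cap)
			break;
	}

	if (ferror(f)) {
		free(data);
		return EIO;
	}

	*buf = data;
	*len = used;
	return 0;
}
```

Growing the buffer on demand avoids relying on stat() sizes, which may not
reflect the real content length for some procfs and sysfs files.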

Signed-off-by: Benjamin Gray 
---
 .../testing/selftests/powerpc/include/utils.h |  1 +
 .../selftests/powerpc/nx-gzip/gzfht_test.c| 37 +
 .../selftests/powerpc/syscalls/Makefile   |  2 +-
 .../selftests/powerpc/syscalls/rtas_filter.c  | 80 +++
 tools/testing/selftests/powerpc/utils.c   | 63 +++
 5 files changed, 75 insertions(+), 108 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index 044b0236df38..95f3a24a4569 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -40,6 +40,7 @@ int parse_ulong(const char *buffer, size_t count, unsigned 
long *result, int bas
 
 int read_file(const char *path, char *buf, size_t count, size_t *len);
 int write_file(const char *path, const char *buf, size_t count);
+int read_file_alloc(const char *path, char **buf, size_t *len);
 int read_long(const char *path, long *result, int base);
 int write_long(const char *path, long result, int base);
 int read_ulong(const char *path, unsigned long *result, int base);
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
index a6a226e1b8ba..4de079923ccb 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -143,41 +143,6 @@ int gzip_header_blank(char *buf)
return i;
 }
 
-/* Caller must free the allocated buffer return nonzero on error. */
-int read_alloc_input_file(char *fname, char **buf, size_t *bufsize)
-{
-   int err;
-   struct stat statbuf;
-   char *p;
-   size_t num_bytes;
-
-   if (stat(fname, &statbuf)) {
-   perror(fname);
-   return -1;
-   }
-
-   assert(NULL != (p = (char *) malloc(statbuf.st_size)));
-
-   if ((err = read_file(fname, p, statbuf.st_size, &num_bytes))) {
-   fprintf(stderr, "Failed to read file: %s\n", strerror(err));
-   goto fail;
-   }
-
-   if (num_bytes != statbuf.st_size) {
-   fprintf(stderr, "Actual bytes != expected bytes\n");
-   err = -1;
-   goto fail;
-   }
-
-   *buf = p;
-   *bufsize = num_bytes;
-   return 0;
-
-fail:
-   free(p);
-   return err;
-}
-
 /*
  * Z_SYNC_FLUSH as described in zlib.h.
  * Returns number of appended bytes
@@ -244,7 +209,7 @@ int compress_file(int argc, char **argv, void *handle)
fprintf(stderr, "usage: %s \n", argv[0]);
exit(-1);
}
-   if (read_alloc_input_file(argv[1], &inbuf, &inlen))
+   if (read_file_alloc(argv[1], &inbuf, &inlen))
exit(-1);
fprintf(stderr, "file %s read, %ld bytes\n", argv[1], inlen);
 
diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile 
b/tools/testing/selftests/powerpc/syscalls/Makefile
index b63f8459c704..54ff5cfffc63 100644
--- a/tools/testing/selftests/powerpc/syscalls/Makefile
+++ b/tools/testing/selftests/powerpc/syscalls/Makefile
@@ -6,4 +6,4 @@ CFLAGS += -I../../../../../usr/include
 top_srcdir = ../../../../..
 include ../../lib.mk
 
-$(TEST_GEN_PROGS): ../harness.c
+$(TEST_GEN_PROGS): ../harness.c ../utils.c
diff --git a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c 
b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
index 03b487f18d00..05f25f12556f 100644
--- a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
+++ b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -50,70 +51,16 @@ struct region {
struct region *next;
 };
 
-int read_entire_file(int fd, char **buf, size_t *len)
-{
-   size_t buf_size = 0;
-   size_t off = 0;
-   int rc;
-
-   *buf = NULL;
-   do {
-   buf_size += BLOCK_SIZE;
-   if (*buf == NULL)
-   *buf = malloc(buf_size);
-   else
-   *buf = realloc(*buf, buf_size);
-
-   if (*buf == NULL)
-   return -ENOMEM;
-
-   rc = read(fd, *buf + off, BLOCK_SIZE);
-   if (rc < 0)
-   return -EIO;
-
-   off += rc;
-   } while (rc == BLOCK_SIZE);
-
-   if (len)
-   *len = off;
-
-   return 0;
-}
-
-static int open_prop_file(const char *prop_path, const char *prop_name, int 
*fd)
-{
-   char *path;
-   int len;
-
-   /* allocate enough for two string, a slash and trailing NULL */
-   len = strlen(prop_path) + strlen(prop_name) + 1 + 1;
-   path = malloc(len);
-   if (path == NULL)
-   return -ENOMEM;
-
-   snprintf(path, len, "%s/%s", prop_path, prop_name);
-
- 

[PATCH v3 6/7] selftests/powerpc: Add {read,write}_{long,ulong}

2022-11-27 Thread Benjamin Gray
Add helper functions to read and write (unsigned) long values directly
from/to files. One of the kernel interfaces uses hex strings, so we need
to allow passing a base too.
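
As a rough illustration of the write side, the value has to be formatted in
the requested base before it goes to the file. The helper below is a
hypothetical sketch, not the write_ulong() added by this patch.
format_ulong() is an invented name, and supporting only bases 10 and 16 is
an assumption made to keep it short.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical sketch: format val into buf in base 10 or 16.
 * Returns 0 on success, EINVAL for an unsupported base, and ERANGE if
 * the text (plus terminator) does not fit in cap bytes.
 */
int format_ulong(char *buf, size_t cap, unsigned long val, int base)
{
	int n;

	switch (base) {
	case 10:
		n = snprintf(buf, cap, "%lu", val);
		break;
	case 16:
		n = snprintf(buf, cap, "%lx", val);
		break;
	default:
		return EINVAL;
	}

	/* snprintf() returns the length it wanted to write. */
	if (n < 0 || (size_t)n >= cap)
		return ERANGE;

	return 0;
}
```

The resulting string could then be handed to a generic write helper such as
the write_file() introduced earlier in the series.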

Signed-off-by: Benjamin Gray 
---
 tools/testing/selftests/powerpc/dscr/dscr.h   |  9 +--
 .../selftests/powerpc/dscr/dscr_sysfs_test.c  | 12 ++--
 .../testing/selftests/powerpc/include/utils.h |  4 ++
 tools/testing/selftests/powerpc/pmu/lib.c | 11 +---
 tools/testing/selftests/powerpc/utils.c   | 62 +++
 5 files changed, 76 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h 
b/tools/testing/selftests/powerpc/dscr/dscr.h
index 9a69d473ffdf..b5166ddcf26a 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr.h
+++ b/tools/testing/selftests/powerpc/dscr/dscr.h
@@ -65,26 +65,21 @@ inline void set_dscr_usr(unsigned long val)
 unsigned long get_default_dscr(void)
 {
int err;
-   char buf[16] = {0};
unsigned long val;
 
-   if ((err = read_file(DSCR_DEFAULT, buf, sizeof(buf) - 1, NULL))) {
+   if ((err = read_ulong(DSCR_DEFAULT, &val, 16))) {
fprintf(stderr, "get_default_dscr() read failed: %s\n", 
strerror(err));
exit(1);
}
 
-   sscanf(buf, "%lx", &val);
return val;
 }
 
 void set_default_dscr(unsigned long val)
 {
int err;
-   char buf[16];
 
-   sprintf(buf, "%lx\n", val);
-
-   if ((err = write_file(DSCR_DEFAULT, buf, strlen(buf)))) {
+   if ((err = write_ulong(DSCR_DEFAULT, val, 16))) {
fprintf(stderr, "set_default_dscr() write failed: %s\n", 
strerror(err));
exit(1);
}
diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c 
b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
index 310946262a24..3ac176888feb 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
+++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
@@ -12,15 +12,15 @@
 
 static int check_cpu_dscr_default(char *file, unsigned long val)
 {
-   char buf[10] = {0};
-   int rc;
+   unsigned long cpu_dscr;
+   int err;
 
-   if ((rc = read_file(file, buf, sizeof(buf) - 1, NULL)))
-   return rc;
+   if ((err = read_ulong(file, &cpu_dscr, 16)))
+   return err;
 
-   if (strtol(buf, NULL, 16) != val) {
+   if (cpu_dscr != val) {
printf("DSCR match failed: %ld (system) %ld (cpu)\n",
-   val, strtol(buf, NULL, 16));
+   val, cpu_dscr);
return 1;
}
return 0;
diff --git a/tools/testing/selftests/powerpc/include/utils.h 
b/tools/testing/selftests/powerpc/include/utils.h
index b82e143a07c6..044b0236df38 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -40,6 +40,10 @@ int parse_ulong(const char *buffer, size_t count, unsigned 
long *result, int bas
 
 int read_file(const char *path, char *buf, size_t count, size_t *len);
 int write_file(const char *path, const char *buf, size_t count);
+int read_long(const char *path, long *result, int base);
+int write_long(const char *path, long result, int base);
+int read_ulong(const char *path, unsigned long *result, int base);
+int write_ulong(const char *path, unsigned long result, int base);
 int read_debugfs_file(const char *debugfs_file, char *buf, size_t count);
 int write_debugfs_file(const char *debugfs_file, const char *buf, size_t 
count);
 int read_debugfs_int(const char *debugfs_file, int *result);
diff --git a/tools/testing/selftests/powerpc/pmu/lib.c 
b/tools/testing/selftests/powerpc/pmu/lib.c
index 771658278f55..55481c5b6995 100644
--- a/tools/testing/selftests/powerpc/pmu/lib.c
+++ b/tools/testing/selftests/powerpc/pmu/lib.c
@@ -192,16 +192,9 @@ bool require_paranoia_below(int level)
 {
int err;
long current;
-   char buf[16] = {0};
-   char *end;
 
-   if ((err = read_file(PARANOID_PATH, buf, sizeof(buf) - 1, NULL))) {
-   printf("Couldn't read " PARANOID_PATH "?\n");
-   return false;
-   }
-
-   if ((err = parse_long(buf, sizeof(buf), &current, 10))) {
-   printf("Couldn't parse " PARANOID_PATH "?\n");
+   if ((err = read_long(PARANOID_PATH, &current, 10))) {
+   fprintf(stderr, "Couldn't read " PARANOID_PATH ": %s\n", 
strerror(err));
return false;
}
 
diff --git a/tools/testing/selftests/powerpc/utils.c 
b/tools/testing/selftests/powerpc/utils.c
index c82539fd44f1..b2906dd71cf5 100644
--- a/tools/testing/selftests/powerpc/utils.c
+++ b/tools/testing/selftests/powerpc/utils.c
@@ -162,6 +162,68 @@ define_parse_number(parse_long, long, intmax_t);
 define_parse_number(parse_uint, unsigned int, uintmax_t);
 define_parse_number(parse_ulong, unsigned long, uintmax_t);
 
+int read_long(const char *path, long *result, int base)
+{
+   int err;
+  

[PATCH v3 3/7] selftests/powerpc: Add generic read/write file util

2022-11-27 Thread Benjamin Gray
File read/write is reimplemented in about 5 different ways in the
various PowerPC selftests. This indicates it should be a common util.

Add a common read_file / write_file implementation and convert users
to it where (easily) possible.

Signed-off-by: Benjamin Gray 
---
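For readers skimming the thread, here is a rough sketch of what a common
read_file / write_file pair with the signatures declared in utils.h might
look like. It is not the code from this patch (the utils.c hunk is not
shown here). It assumes the helpers return 0 on success and a positive
errno value on failure, which matches how callers in this series pass the
result to strerror(). The O_CREAT | O_TRUNC flags and the 0644 mode are
assumptions made so the sketch also works on regular files.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: read up to `count` bytes from `path` into `buf`
 * with a single read() call. Returns 0 on success or an errno value on
 * failure. If `len` is non-NULL it receives the number of bytes read.
 * The buffer is NOT NUL-terminated here.
 */
int read_file(const char *path, char *buf, size_t count, size_t *len)
{
	ssize_t rc;
	int fd, err = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	rc = read(fd, buf, count);
	if (rc < 0)
		err = errno;
	else if (len)
		*len = (size_t)rc;

	/* Report a close() failure only if nothing failed earlier. */
	if (close(fd) && !err)
		err = errno;

	return err;
}

/*
 * Hypothetical sketch: write exactly `count` bytes from `buf` to `path`.
 * Returns 0 on success or an errno value on failure. A short write is
 * reported as EIO (an assumption of this sketch).
 */
int write_file(const char *path, const char *buf, size_t count)
{
	ssize_t rc;
	int fd, err = 0;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return errno;

	rc = write(fd, buf, count);
	if (rc < 0)
		err = errno;
	else if ((size_t)rc != count)
		err = EIO;

	if (close(fd) && !err)
		err = errno;

	return err;
}
```

Since a helper of this shape does not terminate the buffer, the converted
callers below zero-initialise buf and pass sizeof(buf) - 1 as the count, so
the result can be handed straight to sscanf() or strtol().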
 tools/testing/selftests/powerpc/dscr/dscr.h   |  36 ++
 .../selftests/powerpc/dscr/dscr_sysfs_test.c  |  19 +--
 .../testing/selftests/powerpc/include/utils.h |   2 +
 .../selftests/powerpc/nx-gzip/gzfht_test.c|  49 +++-
 tools/testing/selftests/powerpc/pmu/lib.c |  27 +
 .../selftests/powerpc/ptrace/core-pkey.c  |  30 ++---
 tools/testing/selftests/powerpc/utils.c   | 108 ++
 7 files changed, 107 insertions(+), 164 deletions(-)

diff --git a/tools/testing/selftests/powerpc/dscr/dscr.h b/tools/testing/selftests/powerpc/dscr/dscr.h
index b703714e7d98..9a69d473ffdf 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr.h
+++ b/tools/testing/selftests/powerpc/dscr/dscr.h
@@ -64,48 +64,30 @@ inline void set_dscr_usr(unsigned long val)
 /* Default DSCR access */
 unsigned long get_default_dscr(void)
 {
-   int fd = -1, ret;
-   char buf[16];
+   int err;
+   char buf[16] = {0};
unsigned long val;
 
-   if (fd == -1) {
-   fd = open(DSCR_DEFAULT, O_RDONLY);
-   if (fd == -1) {
-   perror("open() failed");
-   exit(1);
-   }
-   }
-   memset(buf, 0, sizeof(buf));
-   lseek(fd, 0, SEEK_SET);
-   ret = read(fd, buf, sizeof(buf));
-   if (ret == -1) {
-   perror("read() failed");
+   if ((err = read_file(DSCR_DEFAULT, buf, sizeof(buf) - 1, NULL))) {
+   fprintf(stderr, "get_default_dscr() read failed: %s\n", strerror(err));
exit(1);
}
+
sscanf(buf, "%lx", &val);
-   close(fd);
return val;
 }
 
 void set_default_dscr(unsigned long val)
 {
-   int fd = -1, ret;
+   int err;
char buf[16];
 
-   if (fd == -1) {
-   fd = open(DSCR_DEFAULT, O_RDWR);
-   if (fd == -1) {
-   perror("open() failed");
-   exit(1);
-   }
-   }
sprintf(buf, "%lx\n", val);
-   ret = write(fd, buf, strlen(buf));
-   if (ret == -1) {
-   perror("write() failed");
+
+   if ((err = write_file(DSCR_DEFAULT, buf, strlen(buf)))) {
+   fprintf(stderr, "set_default_dscr() write failed: %s\n", strerror(err));
exit(1);
}
-   close(fd);
 }
 
 double uniform_deviate(int seed)
diff --git a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
index fbbdffdb2e5d..310946262a24 100644
--- a/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
+++ b/tools/testing/selftests/powerpc/dscr/dscr_sysfs_test.c
@@ -12,23 +12,12 @@
 
 static int check_cpu_dscr_default(char *file, unsigned long val)
 {
-   char buf[10];
-   int fd, rc;
+   char buf[10] = {0};
+   int rc;
 
-   fd = open(file, O_RDWR);
-   if (fd == -1) {
-   perror("open() failed");
-   return 1;
-   }
-
-   rc = read(fd, buf, sizeof(buf));
-   if (rc == -1) {
-   perror("read() failed");
-   return 1;
-   }
-   close(fd);
+   if ((rc = read_file(file, buf, sizeof(buf) - 1, NULL)))
+   return rc;
 
-   buf[rc] = '\0';
if (strtol(buf, NULL, 16) != val) {
printf("DSCR match failed: %ld (system) %ld (cpu)\n",
val, strtol(buf, NULL, 16));
diff --git a/tools/testing/selftests/powerpc/include/utils.h b/tools/testing/selftests/powerpc/include/utils.h
index e222a5858450..70885e5814a8 100644
--- a/tools/testing/selftests/powerpc/include/utils.h
+++ b/tools/testing/selftests/powerpc/include/utils.h
@@ -33,6 +33,8 @@ void *get_auxv_entry(int type);
 
 int pick_online_cpu(void);
 
+int read_file(const char *path, char *buf, size_t count, size_t *len);
+int write_file(const char *path, const char *buf, size_t count);
 int read_debugfs_file(char *debugfs_file, int *result);
 int write_debugfs_file(char *debugfs_file, int result);
 int read_sysfs_file(char *debugfs_file, char *result, size_t result_size);
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
index 095195a25687..a6a226e1b8ba 100644
--- a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -146,49 +146,36 @@ int gzip_header_blank(char *buf)
 /* Caller must free the allocated buffer return nonzero on error. */
 int read_alloc_input_file(char *fname, char **buf, size_t *bufsize)
 {
+   int err;
struct stat statbuf;
-   FILE *fp;
char *p;
size_t num_bytes;
 
if (stat(fname, &statbuf)

[RFC PATCH] Disable Book-E KVM support?

2022-11-27 Thread Nicholas Piggin

BookE KVM is in deep maintenance mode, and I'm not sure how much testing
it gets. I don't have a test setup, and it does not look like QEMU has
any HV architecture enabled. It hasn't been too painful, but there are
some cases where not being able to test causes a bit of a problem, e.g.,

https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-November/251452.html

Time to begin the removal process, or are there still people using it?
I'm happy to keep making occasional patches to try to keep it going if
there are people testing upstream. Getting HV support into QEMU would
help with long-term support, though I'm not sure how big a job that
would be.

Thanks,
Nick
---
 arch/powerpc/kvm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index a9f57dad6d91..6c9458741cb3 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -191,6 +191,7 @@ config KVM_EXIT_TIMING
 
 config KVM_E500V2
bool "KVM support for PowerPC E500v2 processors"
+   depends on false
depends on PPC_E500 && !PPC_E500MC
depends on !CONTEXT_TRACKING_USER
select KVM
@@ -207,6 +208,7 @@ config KVM_E500V2
 
 config KVM_E500MC
bool "KVM support for PowerPC E500MC/E5500/E6500 processors"
+   depends on false
depends on PPC_E500MC
depends on !CONTEXT_TRACKING_USER
select KVM
-- 
2.37.2