Re: Re: Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-27 Thread Jani Nikula
On Mon, 27 Jan 2025, Joel Granados  wrote:
> On Wed, Jan 22, 2025 at 01:41:35PM +0100, Ard Biesheuvel wrote:
>> On Wed, 22 Jan 2025 at 13:25, Joel Granados  wrote:
>> >
>> > On Tue, Jan 21, 2025 at 02:40:16PM +0100, Alexander Gordeev wrote:
>> > > On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
>> > >
>> > > Hi Joel,
>> > >
>> > > > Add the const qualifier to all the ctl_tables in the tree except for
>> > > > watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
>> > > > loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
>> > > > drivers/infiniband dirs). These are special cases as they use a
>> > > > registration function with a non-const qualified ctl_table argument or
>> > > > modify the arrays before passing them on to the registration function.
>> > > >
>> > > > Constifying ctl_table structs will prevent the modification of
>> > > > proc_handler function pointers as the arrays would reside in .rodata.
>> > > > This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
>> > > > constify the ctl_table argument of proc_handlers") constified all the
>> > > > proc_handlers.
>> > >
>> > > I could identify at least these occurrences in s390 code as well:
>> > Hey Alexander
>> >
>> > Thx for bringing these to my attention. I had completely missed them as
>> > the spatch only deals with ctl_tables outside functions.
>> >
>> > Short answer:
>> > These should not be included in the current patch because they are a
>> > different pattern from how sysctl tables are usually used. So I will not
>> > include them.
>> >
>> > With that said, I think it might be interesting to look closer at them
>> > as they seem to be complicating the proc_handler (I have to look at them
>> > closer).
>> >
>> > I see that they are defining a ctl_table struct within the functions and
>> > just using the data (from the incoming ctl_table) to forward things down
>> > to proc_do{u,}intvec_* functions. This is very odd and I have only seen
>> > it done in order to change the incoming ctl_table (which is not what is
>> > being done here).
>> >
>> > I will take a closer look after the merge window and circle back with
>> > more info. Might take me a while as I'm not very familiar with s390
>> > code; any additional information on why those are being used inside the
>> > functions would be helpful.
>> >
>> 
>> Using const data on the stack is not as useful, because the stack is
>> always mapped writable.
>> 
>> Global data structures marked 'const' will be moved into an ELF
>> section that is typically mapped read-only in its entirety, and so the
>> data cannot be modified by writing to it directly. No such protection
>> is possible for the stack, and so the constness there is only enforced
>> at compile time.
> I completely agree with you. There is no reason to use const within those
> functions. But why define those ctl_tables inside a function to begin with?
> Can't you just use the ones that are defined outside the functions?

You could have static const within functions too. You get the rodata
protection and function local scope, best of both worlds?
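For illustration, a minimal sketch of that approach (all names invented; it
assumes the registration path accepts a const table, as the patch under
discussion makes it):

#include <linux/sysctl.h>

static int example_value;

/* Hypothetical example: a function-local "static const" sysctl table.
 * Being static const, the array lands in .rodata and gets the read-only
 * mapping, unlike a const table placed on the stack. */
static int example_sysctl_init(void)
{
	static const struct ctl_table example_table[] = {
		{
			.procname	= "example_value",
			.data		= &example_value,
			.maxlen		= sizeof(int),
			.mode		= 0644,
			.proc_handler	= proc_dointvec,
		},
	};

	return register_sysctl("kernel", example_table) ? 0 : -ENOMEM;
}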

BR,
Jani.


-- 
Jani Nikula, Intel



Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Dmitry V. Levin
On Sat, Jan 25, 2025 at 11:18:06PM +1100, Michael Ellerman wrote:
> Alexey Gladkov  writes:
> >
> ...
> > I'm not a powerpc expert, but shouldn't regs->gpr[3] be accessed via
> > regs_return_value() in system_call_exception()?
> 
> Yes I agree.
> 
> > notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
> > {
> > ...
> > r0 = do_syscall_trace_enter(regs);
> > if (unlikely(r0 >= NR_syscalls))
> > return regs->gpr[3];
> 
> This is the case where we're expecting the r3 value to be a negative
> error code, to match the in-kernel semantics. But after this change it
> would be a positive error value. It is probably harmless with the
> current code structure, but that's just luck.

I'm afraid that's not just luck.  do_seccomp() from the very beginning
supports both the generic kernel -ERRORCODE return value ABI and the
powerpc sc syscall return ABI, thanks to syscall_exit_prepare() that
converts the former to the latter.  Given that this inconsistency was
exposed to user space via PTRACE_EVENT_SECCOMP tracers for so many years,
I suppose backwards compatibility has to be provided.  Consequently, since
the point of __secure_computing() invocation and up to the point of
conversion in syscall_exit_prepare(), gpr[3] may be set according to
either of these two ABIs.  Unfortunately, this means any future attempt
to avoid the inconsistency would be inherently incomplete.

For this reason, I doubt it would make sense to include into the patch
any changes that are needed only to address this consistency issue.


-- 
ldv



Re: [PATCH v2] selftests: livepatch: handle PRINTK_CALLER in check_result()

2025-01-27 Thread Petr Mladek
On Sun 2025-01-19 22:02:38, Madhavan Srinivasan wrote:
> Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER,
> which adds the caller id to each dmesg line. With the recent
> util-linux update 467a5b3192f16 ('dmesg: add caller_id support'),
> the standard "dmesg" tool has been enhanced to print PRINTK_CALLER fields.
> 
> Due to this, even though the expected and observed outputs are the same,
> the end testcase results are reported as failures.
> 
>  -% insmod test_modules/test_klp_livepatch.ko
>  -livepatch: enabling patch 'test_klp_livepatch'
>  -livepatch: 'test_klp_livepatch': initializing patching transition
>  -livepatch: 'test_klp_livepatch': starting patching transition
>  -livepatch: 'test_klp_livepatch': completing patching transition
>  -livepatch: 'test_klp_livepatch': patching complete
>  -% echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>  -livepatch: 'test_klp_livepatch': initializing unpatching transition
>  -livepatch: 'test_klp_livepatch': starting unpatching transition
>  -livepatch: 'test_klp_livepatch': completing unpatching transition
>  -livepatch: 'test_klp_livepatch': unpatching complete
>  -% rmmod test_klp_livepatch
>  +[   T3659] % insmod test_modules/test_klp_livepatch.ko
>  +[   T3682] livepatch: enabling patch 'test_klp_livepatch'
>  +[   T3682] livepatch: 'test_klp_livepatch': initializing patching transition
>  +[   T3682] livepatch: 'test_klp_livepatch': starting patching transition
>  +[T826] livepatch: 'test_klp_livepatch': completing patching transition
>  +[T826] livepatch: 'test_klp_livepatch': patching complete
>  +[   T3659] % echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>  +[   T3659] livepatch: 'test_klp_livepatch': initializing unpatching 
> transition
>  +[   T3659] livepatch: 'test_klp_livepatch': starting unpatching transition
>  +[T789] livepatch: 'test_klp_livepatch': completing unpatching transition
>  +[T789] livepatch: 'test_klp_livepatch': unpatching complete
>  +[   T3659] % rmmod test_klp_livepatch
> 
>   ERROR: livepatch kselftest(s) failed
>  not ok 1 selftests: livepatch: test-livepatch.sh # exit=1
> 
> Currently, check_result() handles removal of the "[time]" prefix from
> the dmesg output. Enhance the check to also handle removal of the
> "[Thread Id]" or "[CPU Id]" field.
> 
> Signed-off-by: Madhavan Srinivasan 

JFYI, the patch has been committed into livepatch.git,
branch for-6.14/selftests-dmesg.

I am going to create a pull request for Linus' master by
the end of the week or next week.

Best Regards,
Petr



Re: [PATCH 00/34] address all -Wunused-const warnings

2025-01-27 Thread Andy Shevchenko
On Wed, Apr 03, 2024 at 10:06:18AM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> Compilers traditionally warn for unused 'static' variables, but not
> if they are constant. The reason here is a custom for C++ programmers
> to define named constants as 'static const' variables in header files
> instead of using macros or enums.
> 
> In W=1 builds, we get warnings only for static const variables in C
> files, but not in headers, which is a good compromise, but this still
> produces warning output in at least 30 files. These warnings are
> almost all harmless, but also trivial to fix, and there is no
> good reason to warn only about the non-const variables being unused.
> 
> I've gone through all the files that I found using randconfig and
> allmodconfig builds and created patches to avoid these warnings,
> with the goal of retaining a clean build once the option is enabled
> by default.
> 
> Unfortunately, there is one fairly large patch ("drivers: remove
> incorrect of_match_ptr/ACPI_PTR annotations") that touches
> 34 individual drivers that all need the same one-line change.
> If necessary, I can split it up by driver or by subsystem,
> but at least for reviewing I would keep it as one piece for
> the moment.
> 
> Please merge the individual patches through subsystem trees.
> I expect that some of these will have to go through multiple
> revisions before they are picked up, so anything that gets
> applied early saves me from resending.
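For reference, a tiny standalone illustration of the warning behaviour
described in the cover letter above (hypothetical example.c, not from the
series):

/* -Wunused-variable reports the unused non-const static below; the unused
 * const one is only reported when -Wunused-const-variable is enabled,
 * which is what the W=1 behaviour above refers to. */
static const int example_limit = 16;	/* needs -Wunused-const-variable */
static int example_counter;		/* caught by -Wunused-variable */

int main(void)
{
	return 0;	/* neither variable is referenced */
}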

Arnd, can you refresh this one? It seems a few cases are still missed...
I have got 3+ 0-day reports against one of the mux drivers.

https://lore.kernel.org/all/?q=adg792a.c

-- 
With Best Regards,
Andy Shevchenko





Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Dmitry V. Levin
On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:
> Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
> > On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
> > [...]
> >> To add a bit more to the confusion,
> > 
> > Looks like there is no end to it:
> > 
> > static inline long regs_return_value(struct pt_regs *regs)
> > {
> >  if (trap_is_scv(regs))
> >  return regs->gpr[3];
> > 
> >  if (is_syscall_success(regs))
> >  return regs->gpr[3];
> >  else
> >  return -regs->gpr[3];
> > }
> > 
> > static inline void regs_set_return_value(struct pt_regs *regs, unsigned 
> > long rc)
> > {
> >  regs->gpr[3] = rc;
> > }
> > 
> > This doesn't look consistent, does it?
> > 
> > 
> 
> That regs_set_return_value() looks pretty similar to 
> syscall_get_return_value().

Yes, but here similarities end, and differences begin.

> regs_set_return_value() documentation in asm-generic/syscall.h 
> explicitly says: This value is meaningless if syscall_get_error()
> returned nonzero
> 
> Is it the same with regs_set_return_value(), only meaningful where
> there is no error?

Did you mean syscall_set_return_value?  No, it explicitly has two
arguments, "int error" and "long val", so it can be used to either
clear or set the error condition as specified by the caller.

> By the way, why have two very similar APIs, one in syscall.h one in 
> ptrace.h ?

I have no polite answer to this, sorry.


-- 
ldv



Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Christophe Leroy




Le 27/01/2025 à 12:44, Dmitry V. Levin a écrit :

On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:

Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :

On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
[...]

To add a bit more to the confusion,


Looks like there is no end to it:

static inline long regs_return_value(struct pt_regs *regs)
{
  if (trap_is_scv(regs))
  return regs->gpr[3];

  if (is_syscall_success(regs))
  return regs->gpr[3];
  else
  return -regs->gpr[3];
}

static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
  regs->gpr[3] = rc;
}

This doesn't look consistent, does it?




That regs_set_return_value() looks pretty similar to
syscall_get_return_value().


Yes, but here similarities end, and differences begin.


regs_set_return_value() documentation in asm-generic/syscall.h
explicitly says: This value is meaningless if syscall_get_error()
returned nonzero

Is it the same with regs_set_return_value(), only meaningful where
there is no error?


Did you mean syscall_set_return_value?  No, it explicitly has two
arguments, "int error" and "long val", so it can be used to either
clear or set the error condition as specified by the caller.


Sorry, I mean syscall_get_return_value() here.

static inline long syscall_get_return_value(struct task_struct *task,
struct pt_regs *regs)
{
return regs->gpr[3];
}

Versus

static inline void regs_set_return_value(struct pt_regs *regs, unsigned 
long rc)

{
regs->gpr[3] = rc;
}




By the way, why have two very similar APIs, one in syscall.h one in
ptrace.h ?


I have no polite answer to this, sorry.







Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Christophe Leroy




Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :

On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
[...]

To add a bit more to the confusion,


Looks like there is no end to it:

static inline long regs_return_value(struct pt_regs *regs)
{
 if (trap_is_scv(regs))
 return regs->gpr[3];

 if (is_syscall_success(regs))
 return regs->gpr[3];
 else
 return -regs->gpr[3];
}

static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
 regs->gpr[3] = rc;
}

This doesn't look consistent, does it?




That regs_set_return_value() looks pretty similar to 
syscall_get_return_value().


regs_set_return_value() documentation in asm-generic/syscall.h 
explicitly says: This value is meaningless if syscall_get_error()
returned nonzero


Is it the same with regs_set_return_value(), only meaningful where
there is no error?


By the way, why have two very similar APIs, one in syscall.h one in 
ptrace.h ?




Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Dmitry V. Levin
On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
[...]
> To add a bit more to the confusion,

Looks like there is no end to it:

static inline long regs_return_value(struct pt_regs *regs)
{
if (trap_is_scv(regs))
return regs->gpr[3];

if (is_syscall_success(regs))
return regs->gpr[3];
else
return -regs->gpr[3];
}

static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
regs->gpr[3] = rc;
}

This doesn't look consistent, does it?


-- 
ldv



Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-27 Thread Dmitry V. Levin
On Mon, Jan 27, 2025 at 01:04:27PM +0100, Christophe Leroy wrote:
> 
> 
> Le 27/01/2025 à 12:44, Dmitry V. Levin a écrit :
> > On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:
> >> Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
> >>> On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
> >>> [...]
>  To add a bit more to the confusion,
> >>>
> >>> Looks like there is no end to it:
> >>>
> >>> static inline long regs_return_value(struct pt_regs *regs)
> >>> {
> >>>   if (trap_is_scv(regs))
> >>>   return regs->gpr[3];
> >>>
> >>>   if (is_syscall_success(regs))
> >>>   return regs->gpr[3];
> >>>   else
> >>>   return -regs->gpr[3];
> >>> }
> >>>
> >>> static inline void regs_set_return_value(struct pt_regs *regs, unsigned 
> >>> long rc)
> >>> {
> >>>   regs->gpr[3] = rc;
> >>> }
> >>>
> >>> This doesn't look consistent, does it?
> >>>
> >>>
> >>
> >> That regs_set_return_value() looks pretty similar to
> >> syscall_get_return_value().
> > 
> > Yes, but here similarities end, and differences begin.
> > 
> >> regs_set_return_value() documentation in asm-generic/syscall.h
> >> explicitly says: This value is meaningless if syscall_get_error()
> >> returned nonzero
> >>
> >> Is it the same with regs_set_return_value(), only meaningful where
> >> there is no error?
> > 
> > Did you mean syscall_set_return_value?  No, it explicitly has two
> > arguments, "int error" and "long val", so it can be used to either
> > clear or set the error condition as specified by the caller.
> 
> Sorry, I mean syscall_get_return_value() here.
> 
> static inline long syscall_get_return_value(struct task_struct *task,
>   struct pt_regs *regs)
> {
>   return regs->gpr[3];
> }
> 
> Versus
> 
> static inline void regs_set_return_value(struct pt_regs *regs, unsigned 
> long rc)
> {
>   regs->gpr[3] = rc;
> }

The asm/syscall.h API provides two functions to obtain the return value:
syscall_get_error() and syscall_get_return_value().  The first one is used
to obtain the error code when the error condition is set.  When the error
condition is not set, it returns 0.  The second function is used to obtain
the return value when the error condition is not set.  When the error
condition is set, its return value is undefined.
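As a purely illustrative usage sketch (not kernel code; the reporting helper
is made up), the two accessors are typically combined like this on syscall
exit:

#include <asm/syscall.h>
#include <linux/printk.h>
#include <linux/sched.h>

/* Hypothetical helper: report the outcome of a syscall on exit using the
 * two accessors described above. */
static void example_report_syscall_exit(struct task_struct *task,
					struct pt_regs *regs)
{
	long err = syscall_get_error(task, regs);	/* 0 or -ERRORCODE */

	if (err)
		pr_debug("syscall failed with %ld\n", err);
	else	/* the return value is only meaningful when there is no error */
		pr_debug("syscall returned %ld\n",
			 syscall_get_return_value(task, regs));
}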


-- 
ldv



Re: Re: Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-27 Thread Joel Granados
On Wed, Jan 22, 2025 at 01:41:35PM +0100, Ard Biesheuvel wrote:
> On Wed, 22 Jan 2025 at 13:25, Joel Granados  wrote:
> >
> > On Tue, Jan 21, 2025 at 02:40:16PM +0100, Alexander Gordeev wrote:
> > > On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
> > >
> > > Hi Joel,
> > >
> > > > Add the const qualifier to all the ctl_tables in the tree except for
> > > > watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
> > > > loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
> > > > drivers/infiniband dirs). These are special cases as they use a
> > > > registration function with a non-const qualified ctl_table argument or
> > > > modify the arrays before passing them on to the registration function.
> > > >
> > > > Constifying ctl_table structs will prevent the modification of
> > > > proc_handler function pointers as the arrays would reside in .rodata.
> > > > This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
> > > > constify the ctl_table argument of proc_handlers") constified all the
> > > > proc_handlers.
> > >
> > > I could identify at least these occurrences in s390 code as well:
> > Hey Alexander
> >
> > Thx for bringing these to my attention. I had completely missed them as
> > the spatch only deals with ctl_tables outside functions.
> >
> > Short answer:
> > These should not be included in the current patch because they are a
> > different pattern from how sysctl tables are usually used. So I will not
> > include them.
> >
> > With that said, I think it might be interesting to look closer at them
> > as they seem to be complicating the proc_handler (I have to look at them
> > closer).
> >
> > I see that they are defining a ctl_table struct within the functions and
> > just using the data (from the incoming ctl_table) to forward things down
> > to proc_do{u,}intvec_* functions. This is very odd and I have only seen
> > it done in order to change the incoming ctl_table (which is not what is
> > being done here).
> >
> > I will take a closer look after the merge window and circle back with
> > more info. Might take me a while as I'm not very familiar with s390
> > code; any additional information on why those are being used inside the
> > functions would be helpful.
> >
> 
> Using const data on the stack is not as useful, because the stack is
> always mapped writable.
> 
> Global data structures marked 'const' will be moved into an ELF
> section that is typically mapped read-only in its entirety, and so the
> data cannot be modified by writing to it directly. No such protection
> is possible for the stack, and so the constness there is only enforced
> at compile time.
I completely agree with you. There is no reason to use const within those
functions. But why define those ctl_tables inside a function to begin with?
Can't you just use the ones that are defined outside the functions?
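For reference, a rough sketch of the kind of in-function pattern described
above (purely illustrative, not the actual s390 code): the handler forwards a
local copy of the value through a ctl_table built on the stack.

#include <linux/sysctl.h>

static unsigned int example_setting;

/* Hypothetical proc_handler: build a local ctl_table pointing at a local
 * copy of the value, forward it to proc_douintvec(), and write the result
 * back on a successful write. */
static int example_handler(const struct ctl_table *table, int write,
			   void *buffer, size_t *lenp, loff_t *ppos)
{
	unsigned int val = example_setting;
	struct ctl_table local = {
		.procname	= table->procname,
		.data		= &val,
		.maxlen		= sizeof(val),
		.mode		= table->mode,
	};
	int rc = proc_douintvec(&local, write, buffer, lenp, ppos);

	if (!rc && write)
		example_setting = val;
	return rc;
}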

Best


-- 

Joel Granados



Re: [PATCH v3] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-01-27 Thread Christophe Leroy




Le 25/01/2025 à 11:49, Sourabh Jain a écrit :

The fadump kernel boots with limited memory solely to collect the kernel
core dump. Having gigantic hugepages in the fadump kernel is of no use.
Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). hugepages_supported() returns false if fadump is
active.


Maybe you could explain how that's done. As far as I can see there is no 
powerpc specific version of hugepages_supported(), so it relies on 
hpage_shift being 0, but I was not able to quickly spot where that is done.




Returning early from arch_hugetlb_valid_size() not only disables
gigantic hugepages but also avoids unnecessary hstate initialization for
every hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
==

dmesg | grep -i "hugetlb"
-
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
-
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
===

dmesg | grep -i "hugetlb"
-
[0.00] HugeTLB: unsupported hugepagesz=1G
[0.00] HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
[0.706375] HugeTLB support is disabled!
[0.773530] hugetlbfs: disabling because there are no supported hugepage 
sizes

$ cat /proc/meminfo | grep -i "hugetlb"
--


Cc: Christophe Leroy 
Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Ritesh Harjani (IBM)" 
Signed-off-by: Sourabh Jain 


Reviewed-by: Christophe Leroy 


---

Changelog:

v1:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhj...@linux.ibm.com/

v2:
https://lore.kernel.org/all/20250124103220.111303-1-sourabhj...@linux.ibm.com/
  - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()

v3:
  - Do not modify the initialization of the shift variable

---
  arch/powerpc/mm/hugetlbpage.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 6b043180220a..88cfd182db4e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -138,6 +138,9 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
int shift = __ffs(size);
int mmu_psize;
  
+	if (!hugepages_supported())
+		return false;
+
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))





Re: [PATCH RFC 9/9] dt-bindings: nand: Convert fsl,elbc bindings to YAML

2025-01-27 Thread Krzysztof Kozlowski
On Sun, Jan 26, 2025 at 07:59:04PM +0100, J. Neuschäfer wrote:
> Convert the Freescale localbus controller bindings from text form to
> YAML. The list of compatible strings reflects current usage.

simple-bus and 20 other compatibles you used were not present in the
original binding. Does the above "list of compatible strings" mean you just
added them?

> 
> Changes compared to the txt version:
>  - removed the board-control (fsl,mpc8272ads-bcsr) node because it only
>appears in this example and nowhere else
>  - added a new example with NAND flash
> 
> Remaining issues:
>  - The localbus is not really a simple-bus: Unit addresses are not simply
>addresses on a memory bus. Instead, they have a format: The first cell
>is a chip select number, the remaining one or two cells are bus
>addresses.
> 
> Signed-off-by: J. Neuschäfer 
> ---
>  .../devicetree/bindings/mtd/fsl,elbc-fcm-nand.yaml |  61 +
>  .../bindings/powerpc/fsl/fsl,elbc-gpcm-uio.yaml|  55 

Please split the conversion from adding new bindings. For example, the above
file and its compatible fsl,elbc-gpcm-uio were not documented in the original
TXT.

...

> diff --git a/Documentation/devicetree/bindings/powerpc/fsl/fsl,elbc.yaml 
> b/Documentation/devicetree/bindings/powerpc/fsl/fsl,elbc.yaml
> new file mode 100644
> index 
> ..6bbceb82c77826499abe85879e9189b18d396eea
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/powerpc/fsl/fsl,elbc.yaml
> @@ -0,0 +1,150 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/powerpc/fsl/fsl,elbc.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Freescale Enhanced Local Bus Controller

What sort of bus is it? Memory bus? Then place it with others, see
memory directory.

> +
> +maintainers:
> +  - J. Neuschäfer 
> +
> +properties:
> +  $nodename:
> +pattern: "^localbus@[0-9a-f]+$"
> +
> +  compatible:
> +oneOf:
> +  - items:
> +  - enum:
> +  - fsl,mpc8313-elbc
> +  - fsl,mpc8315-elbc
> +  - fsl,mpc8377-elbc
> +  - fsl,mpc8378-elbc
> +  - fsl,mpc8379-elbc
> +  - fsl,mpc8536-elbc
> +  - fsl,mpc8569-elbc
> +  - fsl,mpc8572-elbc
> +  - fsl,p1020-elbc
> +  - fsl,p1021-elbc
> +  - fsl,p1023-elbc
> +  - fsl,p2020-elbc
> +  - fsl,p2041-elbc
> +  - fsl,p3041-elbc
> +  - fsl,p4080-elbc
> +  - fsl,p5020-elbc
> +  - fsl,p5040-elbc
> +  - const: fsl,elbc
> +  - const: simple-bus
> +
> +  - items:
> +  - const: fsl,mpc8272-localbus
> +  - const: fsl,pq2-localbus
> +
> +  - items:
> +  - enum:
> +  - fsl,mpc8247-localbus
> +  - fsl,mpc8248-localbus
> +  - fsl,mpc8360-localbus
> +  - const: fsl,pq2pro-localbus
> +  - const: simple-bus
> +
> +  - items:
> +  - enum:
> +  - fsl,mpc8540-localbus
> +  - fsl,mpc8544-lbc
> +  - fsl,mpc8544-localbus
> +  - fsl,mpc8548-lbc
> +  - fsl,mpc8548-localbus
> +  - fsl,mpc8560-localbus
> +  - fsl,mpc8568-localbus
> +  - const: fsl,pq3-localbus
> +  - const: simple-bus
> +
> +  reg:
> +maxItems: 1
> +
> +  interrupts:
> +maxItems: 1
> +
> +  "#address-cells":
> +enum: [2, 3]
> +description: |
> +  The first cell is the chipselect number, and the remaining cells are 
> the
> +  offset into the chipselect.
> +
> +  "#size-cells":
> +enum: [1, 2]
> +description: |
> +  Either one or two, depending on how large each chipselect can be.
> +
> +  ranges:
> +description: |
> +  Each range corresponds to a single chipselect, and covers the entire
> +  access window as configured.
> +
> +patternProperties:
> +  "^.*@.*$":
> +type: object

And probably you need 
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +localbus@f0010100 {
> +compatible = "fsl,mpc8272-localbus",
> + "fsl,pq2-localbus";
> +#address-cells = <2>;
> +#size-cells = <1>;
> +reg = <0xf0010100 0x40>;

compatible, then reg - see DTS coding style.

> +
> +ranges = <0x0 0x0 0xfe00 0x0200
> +  0x1 0x0 0xf450 0x8000
> +  0x2 0x0 0xfd81 0x0001>;
> +
> +flash@0,0 {
> +compatible = "jedec-flash";
> +reg = <0x0 0x0 0x200>;

Well, here it is correct

> +bank-width = <4>;
> +device-width = <1>;
> +};
> +
> +simple-periph@2,0 {
> +compatible = "fsl,elbc-gpcm-uio";
> +reg = <0x2 0x0 0x1>;
> +elbc-gpcm-br = <0xfd810800>;
> +elbc-gpcm-or = <0x09f7>;
> 

Re: [PATCH v2] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-01-27 Thread Christophe Leroy




Le 25/01/2025 à 10:32, Sourabh Jain a écrit :

Hello Christophe


On 24/01/25 19:44, Christophe Leroy wrote:



Le 24/01/2025 à 11:32, Sourabh Jain a écrit :

The fadump kernel boots with limited memory solely to collect the kernel
core dump. Having gigantic hugepages in the fadump kernel is of no use.
Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). hugepages_supported() returns false if fadump is
active.

Returning early from arch_hugetlb_valid_size() not only disables
gigantic hugepages but also avoids unnecessary hstate initialization for
every hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
==

dmesg | grep -i "hugetlb"
-
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
-
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
===

dmesg | grep -i "hugetlb"
-
[    0.00] HugeTLB: unsupported hugepagesz=1G
[    0.00] HugeTLB: hugepages=1 does not follow a valid 
hugepagesz, ignoring

[    0.706375] HugeTLB support is disabled!
[    0.773530] hugetlbfs: disabling because there are no supported 
hugepage sizes


$ cat /proc/meminfo | grep -i "hugetlb"
--


Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Ritesh Harjani (IBM)" 
Signed-off-by: Sourabh Jain 
---

Changelog:

v1:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhjain@linux.ibm.com/


v2:
  - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()

---
  arch/powerpc/mm/hugetlbpage.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/ 
hugetlbpage.c

index 6b043180220a..087a8df32416 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -135,8 +135,12 @@ int __init alloc_bootmem_huge_page(struct hstate 
*h, int nid)

    bool __init arch_hugetlb_valid_size(unsigned long size)
  {
-    int shift = __ffs(size);
-    int mmu_psize;
+    int shift, mmu_psize;
+
+    if (!hugepages_supported())
+    return false;
+
+    shift = __ffs(size);


Why change the declaration/init of shift ?


I did this to avoid running code that wasn't necessary.


The compiler does it for you; when you look at the assembly, you can see that
the calculation is done only when needed.







It should be enough to leave things as they are and just add

if (!hugepages_supported())
    return false;


Sure.





    /* Check that it is a page size supported by the hardware and
   * that it fits within pagetable and slice limits. */




Thanks for the review.

- Sourabh Jain





Re: [PATCH v2] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-01-27 Thread Sourabh Jain





On 27/01/25 13:18, Christophe Leroy wrote:



Le 25/01/2025 à 10:32, Sourabh Jain a écrit :

Hello Christophe


On 24/01/25 19:44, Christophe Leroy wrote:



Le 24/01/2025 à 11:32, Sourabh Jain a écrit :
The fadump kernel boots with limited memory solely to collect the 
kernel
core dump. Having gigantic hugepages in the fadump kernel is of no 
use.

Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). hugepages_supported() returns false if 
fadump is

active.

Returning early from arch_hugetlb_valid_size() not only disables
gigantic hugepages but also avoids unnecessary hstate 
initialization for

every hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
==

dmesg | grep -i "hugetlb"
-
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
-
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
===

dmesg | grep -i "hugetlb"
-
[    0.00] HugeTLB: unsupported hugepagesz=1G
[    0.00] HugeTLB: hugepages=1 does not follow a valid 
hugepagesz, ignoring

[    0.706375] HugeTLB support is disabled!
[    0.773530] hugetlbfs: disabling because there are no supported 
hugepage sizes


$ cat /proc/meminfo | grep -i "hugetlb"
--


Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Ritesh Harjani (IBM)" 
Signed-off-by: Sourabh Jain 
---

Changelog:

v1:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhjain@linux.ibm.com/



v2:
  - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()

---
  arch/powerpc/mm/hugetlbpage.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/ 
hugetlbpage.c

index 6b043180220a..087a8df32416 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -135,8 +135,12 @@ int __init alloc_bootmem_huge_page(struct 
hstate *h, int nid)

    bool __init arch_hugetlb_valid_size(unsigned long size)
  {
-    int shift = __ffs(size);
-    int mmu_psize;
+    int shift, mmu_psize;
+
+    if (!hugepages_supported())
+    return false;
+
+    shift = __ffs(size);


Why change the declaration/init of shift ?


I did this to avoid running code that wasn't necessary.


The compiler does it for you; when you look at the assembly, you can see that
the calculation is done only when needed.


Yeah I didn't consider that. Thanks.

- Sourabh Jain




Re: [PATCH v3] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-01-27 Thread Sourabh Jain

Hello Christophe,


On 27/01/25 13:23, Christophe Leroy wrote:



Le 25/01/2025 à 11:49, Sourabh Jain a écrit :

The fadump kernel boots with limited memory solely to collect the kernel
core dump. Having gigantic hugepages in the fadump kernel is of no use.
Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). hugepages_supported() returns false if fadump is
active.


Maybe you could explain how that's done. As far as I can see there is 
no powerpc specific version of hugepages_supported(), so it relies on 
hpage_shift being 0, but I was not able to quickly spot where that is 
done.



hugepages_supported() is defined for powerpc in
arch/powerpc/include/asm/hugetlb.h

static inline bool hugepages_supported(void)
{
    if (hugetlb_disabled)
    return false;

    return HPAGE_SHIFT != 0;
}

But yes, I will add a point about how it is disabled for the fadump kernel in
the commit message.






Returning early from arch_hugetlb_valid_size() not only disables
gigantic hugepages but also avoids unnecessary hstate initialization for
every hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
==

dmesg | grep -i "hugetlb"
-
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
-
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
===

dmesg | grep -i "hugetlb"
-
[    0.00] HugeTLB: unsupported hugepagesz=1G
[    0.00] HugeTLB: hugepages=1 does not follow a valid 
hugepagesz, ignoring

[    0.706375] HugeTLB support is disabled!
[    0.773530] hugetlbfs: disabling because there are no supported 
hugepage sizes


$ cat /proc/meminfo | grep -i "hugetlb"
--


Cc: Christophe Leroy 
Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Ritesh Harjani (IBM)" 
Signed-off-by: Sourabh Jain 


Reviewed-by: Christophe Leroy 



Thank you for the Review!

- Sourabh Jain





---

Changelog:

v1:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhj...@linux.ibm.com/ 



v2:
https://lore.kernel.org/all/20250124103220.111303-1-sourabhj...@linux.ibm.com/ 


  - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()

v3:
  - Do not modify the initialization of the shift variable

---
  arch/powerpc/mm/hugetlbpage.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c 
b/arch/powerpc/mm/hugetlbpage.c

index 6b043180220a..88cfd182db4e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -138,6 +138,9 @@ bool __init arch_hugetlb_valid_size(unsigned long 
size)

  int shift = __ffs(size);
  int mmu_psize;
  +    if (!hugepages_supported())
+    return false;
+
  /* Check that it is a page size supported by the hardware and
   * that it fits within pagetable and slice limits. */
  if (size <= PAGE_SIZE || !is_power_of_2(size))







Re: Re: Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-27 Thread Matthew Wilcox
On Mon, Jan 27, 2025 at 04:55:58PM +0200, Jani Nikula wrote:
> You could have static const within functions too. You get the rodata
> protection and function local scope, best of both worlds?

timer_active is on the stack, so it can't be static const.

Does this really need to be cc'd to such a wide distribution list?



[PATCH] kbuild: Add missing $(objtree) prefix to powerpc crtsavres.o artifact

2025-01-27 Thread Kienan Stewart
In the upstream commit 214c0eea43b2ea66bcd6467ea57e47ce8874191b
("kbuild: add $(objtree)/ prefix to some in-kernel build artifacts")
artifacts required for building out-of-tree kernel modules had
$(objtree) prepended to them to prepare for building in other
directories.

When building external modules for powerpc,
arch/powerpc/lib/crtsavres.o is required for certain
configurations. This artifact is missing the prepended $(objtree).

External modules may work around this omission for v6.13 by setting MO=$KDIR.

Signed-off-by: Kienan Stewart 
---
 arch/powerpc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 
f3804103c56ccfdb16289468397ccaea71bf721e..9933b98df69d7f7b9aaf33d36155cc61ab4460c7
 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -58,7 +58,7 @@ ifeq ($(CONFIG_PPC64)$(CONFIG_LD_IS_BFD),yy)
 # There is a corresponding test in arch/powerpc/lib/Makefile
 KBUILD_LDFLAGS_MODULE += --save-restore-funcs
 else
-KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
+KBUILD_LDFLAGS_MODULE += $(objtree)/arch/powerpc/lib/crtsavres.o
 endif
 
 ifdef CONFIG_CPU_LITTLE_ENDIAN

---
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
change-id: 20250127-buildfix-extmod-powerpc-a744e1331f83

Best regards,
-- 
Kienan Stewart 




Re: [PATCH v6 1/2] kexec: Consolidate machine_kexec_mask_interrupts() implementation

2025-01-27 Thread Will Deacon
On Wed, Dec 04, 2024 at 02:20:02PM +, Eliav Farber wrote:
> Consolidate the machine_kexec_mask_interrupts implementation into a common
> function located in a new file: kernel/irq/kexec.c. This removes duplicate
> implementations from architecture-specific files in arch/arm, arch/arm64,
> arch/powerpc, and arch/riscv, reducing code duplication and improving
> maintainability.
> 
> The new implementation retains architecture-specific behavior for
> CONFIG_GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD, which was previously implemented
> for ARM64. When enabled (currently for ARM64), it clears the active state
> of interrupts forwarded to virtual machines (VMs) before handling other
> interrupt masking operations.
> 
> Signed-off-by: Eliav Farber 
> ---
> V5 -> V6:
>  - Change GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD to not be user selectable.
> 
>  arch/arm/kernel/machine_kexec.c   | 23 --
>  arch/arm64/Kconfig|  1 +
>  arch/arm64/kernel/machine_kexec.c | 31 
>  arch/powerpc/include/asm/kexec.h  |  1 -
>  arch/powerpc/kexec/core.c | 22 -
>  arch/powerpc/kexec/core_32.c  |  1 +
>  arch/riscv/kernel/machine_kexec.c | 23 --
>  include/linux/irq.h   |  3 +++
>  kernel/irq/Kconfig|  6 +
>  kernel/irq/Makefile   |  2 +-
>  kernel/irq/kexec.c| 40 +++
>  11 files changed, 52 insertions(+), 101 deletions(-)
>  create mode 100644 kernel/irq/kexec.c
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index 80ceb5bd2680..dd430477e7c1 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -127,29 +127,6 @@ void crash_smp_send_stop(void)
>   cpus_stopped = 1;
>  }
>  
> -static void machine_kexec_mask_interrupts(void)
> -{
> - unsigned int i;
> - struct irq_desc *desc;
> -
> - for_each_irq_desc(i, desc) {
> - struct irq_chip *chip;
> -
> - chip = irq_desc_get_chip(desc);
> - if (!chip)
> - continue;
> -
> - if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
> - chip->irq_eoi(&desc->irq_data);
> -
> - if (chip->irq_mask)
> - chip->irq_mask(&desc->irq_data);
> -
> - if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
> - chip->irq_disable(&desc->irq_data);
> - }
> -}
> -
>  void machine_crash_shutdown(struct pt_regs *regs)
>  {
>   local_irq_disable();
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 100570a048c5..dcc3551cf6c2 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -149,6 +149,7 @@ config ARM64
>   select GENERIC_IDLE_POLL_SETUP
>   select GENERIC_IOREMAP
>   select GENERIC_IRQ_IPI
> + select GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD

Is this really specific to VMs? In the new code, it's just controlling
this part of the old code:

> diff --git a/arch/arm64/kernel/machine_kexec.c 
> b/arch/arm64/kernel/machine_kexec.c
> index 82e2203d86a3..6f121a0164a4 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -207,37 +207,6 @@ void machine_kexec(struct kimage *kimage)
>   BUG(); /* Should never get here. */
>  }
>  
> -static void machine_kexec_mask_interrupts(void)
> -{
> - unsigned int i;
> - struct irq_desc *desc;
> -
> - for_each_irq_desc(i, desc) {
> - struct irq_chip *chip;
> - int ret;
> -
> - chip = irq_desc_get_chip(desc);
> - if (!chip)
> - continue;
> -
> - /*
> -  * First try to remove the active state. If this
> -  * fails, try to EOI the interrupt.
> -  */
> - ret = irq_set_irqchip_state(i, IRQCHIP_STATE_ACTIVE, false);

which is about active interrupts. Why can't that happen for the host?

Will



[PATCH 1/2] powerpc: properly negate error in syscall_set_return_value() in sc case

2025-01-27 Thread Dmitry V. Levin
According to the Power Architecture Linux system call ABI documented in
[1], when the syscall is made with the sc instruction, both a value and an
error condition are returned, where r3 register contains the return value,
and cr0.SO bit specifies the error condition.  When cr0.SO is clear, the
syscall succeeded and r3 is the return value.  When cr0.SO is set, the
syscall failed and r3 is the error value.  This syscall return semantics
was implemented from the very beginning of Power Architecture on Linux,
and syscall tracers and debuggers like strace that read or modify syscall
return information also rely on this ABI.

r3 and cr0.SO are exposed directly via struct pt_regs where gpr[3] and
(ccr & 0x1000) correspond to r3 and cr0.SO, respectively.
For example, here is an excerpt from check_syscall_restart() that assigns
these members of struct pt_regs:
regs->result = -EINTR;
regs->gpr[3] = EINTR;
regs->ccr |= 0x1000;
In this example, the semantics of negative ERRORCODE that's being used
virtually everywhere in generic kernel code is translated to powerpc sc
syscall return ABI which uses positive ERRORCODE and cr0.SO bit.

Also, r3 and cr0.SO are exposed indirectly via helpers.
For example, here is an excerpt from syscall_get_error():
/*
 * If the system call failed,
 * regs->gpr[3] contains a positive ERRORCODE.
 */
return (regs->ccr & 0x1000UL) ? -regs->gpr[3] : 0;
and here is another example, from regs_return_value():
if (is_syscall_success(regs))
return regs->gpr[3];
else
return -regs->gpr[3];
In these examples, the powerpc sc syscall return ABI which uses positive
ERRORCODE and cr0.SO bit is translated to the semantics of negative
ERRORCODE that's being used virtually everywhere in generic kernel code.

Up to a certain point in time the kernel managed to implement the powerpc
sc syscall return ABI in all cases where struct pt_regs was exposed to user
space.

The situation changed when SECCOMP_RET_TRACE support was introduced.
At this point the -ERRORCODE semantics that was used under the hood to
implement seccomp on powerpc became exposed to user space.  The tracer
handling PTRACE_EVENT_SECCOMP is not just able to observe -ENOSYS in gpr[3]
- this is relatively harmless as at this stage there is no syscall return
yet so the powerpc sc syscall return ABI does not apply.  What's important
is that the tracer can change the syscall number to -1 thus making the
syscall fail, and at this point the tracer is also able to specify the
error value.  This has to be done in accordance with the syscall return
ABI, however, the current implementation of do_seccomp() supports both the
generic kernel -ERRORCODE return value ABI and the powerpc sc syscall
return ABI, thanks to syscall_exit_prepare() that converts the former to
the latter.  Consequently, seccomp_bpf selftest passes both with and
without this change.

Now comes the moment when PTRACE_SET_SYSCALL_INFO is going to be
introduced.  PTRACE_SET_SYSCALL_INFO is a generic ptrace API that
complements PTRACE_GET_SYSCALL_INFO by letting the ptracer modify
the details of the system calls the tracee is blocked in.

One of the helpers that have to be used to implement
PTRACE_SET_SYSCALL_INFO is syscall_set_return_value().
This helper complements other two helpers, syscall_get_error() and
syscall_get_return_value(), that are currently used to implement
PTRACE_GET_SYSCALL_INFO on syscall return.  When syscall_set_return_value()
is used to set an error code, the caller specifies it as a negative value
in -ERRORCODE format.

Unfortunately, this does not work well on powerpc since commit 1b1a3702a65c
("powerpc: Don't negate error in syscall_set_return_value()") because
syscall_set_return_value() does not follow the powerpc sc syscall return
ABI:
/*
 * In the general case it's not obvious that we must deal with
 * CCR here, as the syscall exit path will also do that for us.
 * However there are some places, eg. the signal code, which
 * check ccr to decide if the value in r3 is actually an error.
 */
if (error) {
regs->ccr |= 0x1000L;
regs->gpr[3] = error;
} else {
regs->ccr &= ~0x1000L;
regs->gpr[3] = val;
}

The reason why this syscall_set_return_value() implementation was able to
get away with violating the powerpc sc syscall return ABI is the following:
Up to now, syscall_set_return_value() on powerpc could be called only from
do_syscall_trace_enter() via do_seccomp(), there was no way it could be
called from do_syscall_trace_leave() which is the point where tracers on
syscall return are activated and the powerpc sc syscall return ABI has
to be respected.

Introduction of PTRACE_SET_SYSCALL_INFO necessitates a change of
syscall_set_return_value() to comply with the powerpc sc syscall return
ABI.  Without the 

[PATCH 2/2] powerpc: fix inconsistencies in syscall error return handling

2025-01-27 Thread Dmitry V. Levin
Since the introduction of SECCOMP_RET_TRACE support, the kernel supports
simultaneously both the generic kernel -ERRORCODE return value ABI and
the powerpc sc syscall return ABI for PTRACE_EVENT_SECCOMP tracers.
This change is an attempt to address the code inconsistencies in syscall
error return handling that were introduced as a side effect of the dual
ABI support.

Signed-off-by: Dmitry V. Levin 
---
 arch/powerpc/kernel/ptrace/ptrace.c | 23 ---
 arch/powerpc/kernel/signal.c| 11 +++
 arch/powerpc/kernel/syscall.c   |  6 +++---
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/ptrace/ptrace.c 
b/arch/powerpc/kernel/ptrace/ptrace.c
index 727ed4a14545..3778775bf6ba 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
 * syscall parameter. This is different to the ptrace ABI where
 * both r3 and orig_gpr3 contain the first syscall parameter.
 */
-   regs->gpr[3] = -ENOSYS;
+   syscall_set_return_value(current, regs, -ENOSYS, 0);
 
/*
 * We use the __ version here because we have already checked
@@ -215,8 +215,18 @@ static int do_seccomp(struct pt_regs *regs)
 * have already loaded -ENOSYS into r3, or seccomp has put
 * something else in r3 (via SECCOMP_RET_ERRNO/TRACE).
 */
-   if (__secure_computing(NULL))
+   if (__secure_computing(NULL)) {
+
+   /*
+* Traditionally, both the generic kernel -ERRORCODE return
+* value ABI and the powerpc sc syscall return ABI is
+* supported.  For consistency, if the former is detected,
+* convert it to the latter.
+*/
+   if (!trap_is_scv(regs) && IS_ERR_VALUE(regs->gpr[3]))
+   syscall_set_return_value(current, regs, regs->gpr[3], 
0);
return -1;
+   }
 
/*
 * The syscall was allowed by seccomp, restore the register
@@ -226,6 +236,13 @@ static int do_seccomp(struct pt_regs *regs)
 * allow the syscall to proceed.
 */
regs->gpr[3] = regs->orig_gpr3;
+   if (!trap_is_scv(regs)) {
+   /*
+* Clear SO bit that was set in this function earlier by
+* syscall_set_return_value.
+*/
+   regs->ccr &= ~0x1000L;
+   }
 
return 0;
 }
@@ -315,7 +332,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
 * If we are aborting explicitly, or if the syscall number is
 * now invalid, set the return value to -ENOSYS.
 */
-   regs->gpr[3] = -ENOSYS;
+   syscall_set_return_value(current, regs, -ENOSYS, 0);
return -1;
 }
 
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index aa17e62f3754..1a38d6bcaed6 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "signal.h"
 
@@ -229,14 +230,8 @@ static void check_syscall_restart(struct pt_regs *regs, 
struct k_sigaction *ka,
regs_add_return_ip(regs, -4);
regs->result = 0;
} else {
-   if (trap_is_scv(regs)) {
-   regs->result = -EINTR;
-   regs->gpr[3] = -EINTR;
-   } else {
-   regs->result = -EINTR;
-   regs->gpr[3] = EINTR;
-   regs->ccr |= 0x1000;
-   }
+   regs->result = -EINTR;
+   syscall_set_return_value(current, regs, -EINTR, 0);
}
 }
 
diff --git a/arch/powerpc/kernel/syscall.c b/arch/powerpc/kernel/syscall.c
index be159ad4b77b..2fe47191e509 100644
--- a/arch/powerpc/kernel/syscall.c
+++ b/arch/powerpc/kernel/syscall.c
@@ -122,7 +122,7 @@ notrace long system_call_exception(struct pt_regs *regs, 
unsigned long r0)
if (unlikely(trap_is_unsupported_scv(regs))) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
-   return regs->gpr[3];
+   return regs_return_value(regs);
}
/*
 * We use the return value of do_syscall_trace_enter() as the
@@ -133,13 +133,13 @@ notrace long system_call_exception(struct pt_regs *regs, 
unsigned long r0)
 */
r0 = do_syscall_trace_enter(regs);
if (unlikely(r0 >= NR_syscalls))
-   return regs->gpr[3];
+   return regs_return_value(regs);
 
} else if (unlikely(r0 >= NR_syscalls)) {
if (unlikely(trap_is_unsupported_scv(regs))) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, reg

Re: [PATCH 2/4] seccomp: kill the dead code in the !CONFIG_HAVE_ARCH_SECCOMP_FILTER version of __secure_computing()

2025-01-27 Thread Kees Cook
On Tue, Jan 21, 2025 at 03:30:39PM +0100, Oleg Nesterov wrote:
> How about
> 
>   __secure_computing()
>   {
>   return secure_computing_strict(syscall_get_nr(...));
>   }
> 
> in the "#ifndef CONFIG_HAVE_ARCH_SECCOMP_FILTER" section near
> secure_computing_strict() in kernel/seccomp.c ?

Yeah, that should be good.

-- 
Kees Cook



[PATCH v4] powerpc/hugetlb: Disable gigantic hugepages if fadump is active

2025-01-27 Thread Sourabh Jain
The fadump kernel boots with limited memory solely to collect the kernel
core dump. Having gigantic hugepages in the fadump kernel is of no use.
Many times, the fadump kernel encounters OOM (Out of Memory) issues if
gigantic hugepages are allocated.

To address this, disable gigantic hugepages if fadump is active by
returning early from arch_hugetlb_valid_size() using
hugepages_supported(). When fadump is active, the global variable
hugetlb_disabled is set to true, which is later used by the
PowerPC-specific hugepages_supported() function to determine hugepage
support.

Returning early from arch_hugetlb_valid_size() not only disables
gigantic hugepages but also avoids unnecessary hstate initialization for
every hugepage size supported by the platform.

kernel logs related to hugepages with this patch included:
kernel argument passed: hugepagesz=1G hugepages=1

First kernel: gigantic hugepage got allocated
==

dmesg | grep -i "hugetlb"
-
HugeTLB: registered 1.00 GiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page

$ cat /proc/meminfo | grep -i "hugetlb"
-
Hugetlb: 1048576 kB

Fadump kernel: gigantic hugepage not allocated
===

dmesg | grep -i "hugetlb"
-
[0.00] HugeTLB: unsupported hugepagesz=1G
[0.00] HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
[0.706375] HugeTLB support is disabled!
[0.773530] hugetlbfs: disabling because there are no supported hugepage 
sizes

$ cat /proc/meminfo | grep -i "hugetlb"
--


Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Ritesh Harjani (IBM)" 
Reviewed-by: Christophe Leroy 
Signed-off-by: Sourabh Jain 
---
Changelog:

v1:
https://lore.kernel.org/all/20250121150419.1342794-1-sourabhj...@linux.ibm.com/

v2:
https://lore.kernel.org/all/20250124103220.111303-1-sourabhj...@linux.ibm.com/
 - disable gigantic hugepage in arch code, arch_hugetlb_valid_size()

v3:
https://lore.kernel.org/all/20250125104928.1-1-sourabhj...@linux.ibm.com/
 - Do not modify the initialization of the shift variable

v4:
- Update commit message to include how hugepages_supported() detects
  hugepages support when fadump is active
- Add Reviewed-by tag
- NO functional change

---
 arch/powerpc/mm/hugetlbpage.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 6b043180220a..88cfd182db4e 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -138,6 +138,9 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
int shift = __ffs(size);
int mmu_psize;
 
+   if (!hugepages_supported())
+   return false;
+
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-- 
2.48.1




[PATCH 27/27] mm/hugetlb: enable bootmem allocation from CMA areas

2025-01-27 Thread Frank van der Linden
If hugetlb_cma_only is enabled, we know that hugetlb pages
can only be allocated from CMA. Now that there is an interface
to do early reservations from a CMA area (returning memblock
memory), it can be used to allocate hugetlb pages from CMA.

This also allows for doing pre-HVO on these pages (if enabled).

Make sure to initialize the page structures and associated data
correctly. Create a flag to signal that a hugetlb page has been
allocated from CMA to make things a little easier.

Some configurations of powerpc have a special hugetlb bootmem
allocator, so introduce a boolean arch_specific_huge_bootmem_alloc
that returns true if such an allocator is present. In that case,
CMA bootmem allocations can't be used, so check that function
before trying.

Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Frank van der Linden 
---
 arch/powerpc/mm/hugetlbpage.c |   5 ++
 include/linux/hugetlb.h   |   7 ++
 mm/hugetlb.c  | 135 +-
 3 files changed, 114 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index d3c1b749dcfc..e53e4b4c8ef6 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -121,6 +121,11 @@ bool __init hugetlb_node_alloc_supported(void)
 {
return false;
 }
+
+bool __init arch_specific_huge_bootmem_alloc(struct hstate *h)
+{
+   return (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled());
+}
 #endif
 
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2512463bca49..bca3052fb175 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -591,6 +591,7 @@ enum hugetlb_page_flags {
HPG_freed,
HPG_vmemmap_optimized,
HPG_raw_hwp_unreliable,
+   HPG_cma,
__NR_HPAGEFLAGS,
 };
 
@@ -650,6 +651,7 @@ HPAGEFLAG(Temporary, temporary)
 HPAGEFLAG(Freed, freed)
 HPAGEFLAG(VmemmapOptimized, vmemmap_optimized)
 HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable)
+HPAGEFLAG(Cma, cma)
 
 #ifdef CONFIG_HUGETLB_PAGE
 
@@ -678,14 +680,18 @@ struct hstate {
char name[HSTATE_NAME_LEN];
 };
 
+struct cma;
+
 struct huge_bootmem_page {
struct list_head list;
struct hstate *hstate;
unsigned long flags;
+   struct cma *cma;
 };
 
 #define HUGE_BOOTMEM_HVO   0x0001
 #define HUGE_BOOTMEM_ZONES_VALID   0x0002
+#define HUGE_BOOTMEM_CMA   0x0004
 
 bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 
@@ -711,6 +717,7 @@ bool __init hugetlb_node_alloc_supported(void);
 
 void __init hugetlb_add_hstate(unsigned order);
 bool __init arch_hugetlb_valid_size(unsigned long size);
+bool __init arch_specific_huge_bootmem_alloc(struct hstate *h);
 struct hstate *size_to_hstate(unsigned long size);
 
 #ifndef HUGE_MAX_HSTATE
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 32ebde9039e2..183e8d0c2fb4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -61,7 +61,7 @@ static struct cma *hugetlb_cma[MAX_NUMNODES];
 static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
 #endif
 static bool hugetlb_cma_only;
-static unsigned long hugetlb_cma_size __initdata;
+static unsigned long hugetlb_cma_size;
 
 __initdata struct list_head huge_boot_pages[MAX_NUMNODES];
 __initdata unsigned long hstate_boot_nrinvalid[HUGE_MAX_HSTATE];
@@ -132,8 +132,10 @@ static void hugetlb_free_folio(struct folio *folio)
 #ifdef CONFIG_CMA
int nid = folio_nid(folio);
 
-   if (cma_free_folio(hugetlb_cma[nid], folio))
+   if (folio_test_hugetlb_cma(folio)) {
+   WARN_ON(!cma_free_folio(hugetlb_cma[nid], folio));
return;
+   }
 #endif
folio_put(folio);
 }
@@ -1509,6 +1511,9 @@ static struct folio *alloc_gigantic_folio(struct hstate 
*h, gfp_t gfp_mask,
break;
}
}
+
+   if (folio)
+   folio_set_hugetlb_cma(folio);
}
 #endif
if (!folio) {
@@ -3175,6 +3180,63 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct 
*vma,
return ERR_PTR(-ENOSPC);
 }
 
+/*
+ * Some architectures do their own bootmem allocation, so they can't use
+ * early CMA allocation. So, allow for this function to be redefined.
+ */
+bool __init __attribute((weak))
+arch_specific_huge_bootmem_alloc(struct hstate *h)
+{
+   return false;
+}
+
+static bool __init hugetlb_early_cma(struct hstate *h)
+{
+   if (arch_specific_huge_bootmem_alloc(h))
+   return false;
+
+   return (hstate_is_gigantic(h) && hugetlb_cma_size && hugetlb_cma_only);
+}
+
+static __init void *alloc_bootmem(struct hstate *h, int nid)
+{
+   struct huge_bootmem_page *m;
+   unsigned long flags;
+   struct cma *cma;
+
+#ifdef CONFIG_CMA
+   if (hugetlb_early_cma(h)) {
+   flags = HUGE_BOOTMEM_CMA;
+   cma = hugetlb_cma[nid];
+ 

[PATCH v3 1/6] powerpc/pseries: Define common functions for RTAS sequence calls

2025-01-27 Thread Haren Myneni
The RTAS call can be a normal call, which retrieves the data from the
hypervisor in a single invocation, or a sequence-based call, which
has to be issued multiple times until the complete data is obtained.
For some of these sequence-based RTAS calls, the OS should not
interleave calls with different input until the sequence is
completed. The data is collected for each call and copied to a buffer
for the entire sequence during the ioctl() handler, and this buffer
is then exposed to user space with the read() handler.

One such sequence-based RTAS call is ibm,get-vpd, and its support is
already included in the current code. To add similar support for
other sequence-based calls, move the common functions into a separate
file and update the papr_rtas_sequence struct with the following
callbacks so that RTAS-call-specific code can be defined and executed
to complete the sequence.

struct papr_rtas_sequence {
	int error;
	void *params;
	void (*begin)(struct papr_rtas_sequence *, void *);
	void (*end)(struct papr_rtas_sequence *);
	const char *(*work)(struct papr_rtas_sequence *, size_t *);
};

params:	Input parameters passed to the RTAS call.
Begin:	RTAS-call-specific function that initializes data,
	including work area allocation.
End:	RTAS-call-specific function that frees up resources
	(frees the work area) after the sequence is completed.
Work:	The actual RTAS-call-specific function that collects
	the data from the hypervisor.
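
As a rough illustration (the foo_* names below are hypothetical and
not part of this patch; the real conversion of ibm,get-vpd is in
papr-vpd.c), an RTAS-call-specific user of the common code is
expected to wire up the callbacks roughly like this:

  /* Hypothetical call-specific input parameters. */
  struct foo_params {
          unsigned int token;
  };

  static void foo_seq_begin(struct papr_rtas_sequence *seq, void *arg)
  {
          /* allocate the RTAS work area, record call-specific state */
  }

  static void foo_seq_end(struct papr_rtas_sequence *seq)
  {
          /* free the RTAS work area */
  }

  static const char *foo_seq_work(struct papr_rtas_sequence *seq,
                                  size_t *len)
  {
          /* issue one RTAS call; return a data chunk, or NULL when done */
          *len = 0;
          return NULL;
  }

  /* In the ioctl handler, before running the sequence: */
  struct foo_params params = { .token = 0 };
  struct papr_rtas_sequence seq = {
          .begin  = foo_seq_begin,
          .end    = foo_seq_end,
          .work   = foo_seq_work,
          .params = &params,
  };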

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 .../platforms/pseries/papr-rtas-common.c  | 243 
 .../platforms/pseries/papr-rtas-common.h  |  45 +++
 arch/powerpc/platforms/pseries/papr-vpd.c | 270 +++---
 4 files changed, 335 insertions(+), 225 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.c
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.h

diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 7bf506f6b8c8..697c216b70dc 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o rtas-work-area.o papr-sysparm.o \
-  papr-vpd.o \
+  papr-rtas-common.o papr-vpd.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobility.o rng.o \
   pci.o pci_dlpar.o eeh_pseries.o msi.o \
diff --git a/arch/powerpc/platforms/pseries/papr-rtas-common.c 
b/arch/powerpc/platforms/pseries/papr-rtas-common.c
new file mode 100644
index ..0209ec09d1fd
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr-rtas-common.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "papr-common: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "papr-rtas-common.h"
+
+/*
+ * A sequence-based RTAS call has to be issued multiple times to
+ * retrieve the complete data from the hypervisor. For some of these
+ * RTAS calls, the OS should not interleave calls with different input
+ * until the sequence is completed. So data is collected for these
+ * calls during the ioctl handler and exported to user space with the
+ * read() handler. This file provides common functions needed for such
+ * sequence-based RTAS calls, e.g. ibm,get-vpd and ibm,get-indices.
+ */
+
+bool papr_rtas_blob_has_data(const struct papr_rtas_blob *blob)
+{
+   return blob->data && blob->len;
+}
+
+void papr_rtas_blob_free(const struct papr_rtas_blob *blob)
+{
+   if (blob) {
+   kvfree(blob->data);
+   kfree(blob);
+   }
+}
+
+/**
+ * papr_rtas_blob_extend() - Append data to a &struct papr_rtas_blob.
+ * @blob: The blob to extend.
+ * @data: The new data to append to @blob.
+ * @len:  The length of @data.
+ *
+ * Context: May sleep.
+ * Return: -ENOMEM on allocation failure, 0 otherwise.
+ */
+static int papr_rtas_blob_extend(struct papr_rtas_blob *blob,
+   const char *data, size_t len)
+{
+   const size_t new_len = blob->len + len;
+   const size_t old_len = blob->len;
+   const char *old_ptr = blob->data;
+   char *new_ptr;
+
+   new_ptr = kvrealloc(old_ptr, new_len, GFP_KERNEL_ACCOUNT);
+   if (!new_ptr)
+   return -ENOMEM;
+
+   memcpy(&new_ptr[old_len], data, len);
+   blob->data = new_ptr;
+   blob->len = new_len;
+   return 0;
+}
+
+/**
+ * papr_rtas_blob_generate() - Construct a new &struct papr_rtas_blob.
+ * @seq: work function of the caller that is called to obtain
+ *   data with the caller RTAS call.
+ *
+ * The @work callback is invoked until it returns NULL. @seq is
+ * passed to @work in its first argument o

[PATCH v3 6/6] powerpc/pseries: Add papr-platform-dump character driver for dump retrieval

2025-01-27 Thread Haren Myneni
The ibm,platform-dump RTAS call, in combination with a writable
mapping of /dev/mem, is issued to collect a platform dump from the
hypervisor and may need multiple calls to retrieve the complete dump.
The current implementation uses the rtas_platform_dump() API provided
by the librtas library to issue these RTAS calls. But /dev/mem access
from user space is prohibited under system lockdown.

The solution is to restrict RTAS function access in user space and
provide kernel interfaces to collect the dump. This patch adds the
papr-platform-dump character driver and exposes standard interfaces
such as open/ioctl/read to user space in ways that are compatible
with lockdown.

PAPR (7.3.3.4.1 ibm,platform-dump) provides a method to obtain
the complete dump:
- Each dump is identified by an ID called the dump tag.
- A sequence of RTAS calls has to be issued until the complete dump
  is retrieved. The hypervisor expects the first RTAS call with
  sequence 0 and the subsequent calls with the sequence number
  returned from the previous call.
- The hypervisor returns "dump complete" status once the complete
  dump is retrieved, but expects one more RTAS call from the
  partition with a NULL buffer to invalidate the dump, which means
  the dump will be removed in the hypervisor.
- Sequences of calls are allowed with different dump IDs at the
  same time, but not with the same dump ID.

Expose these interfaces to user space with a /dev/papr-platform-dump
character device using the following programming model:

   int devfd = open("/dev/papr-platform-dump", O_RDONLY);
   int fd = ioctl(devfd, PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id)
	- Restricts user space access for the same dump ID. Typically
	  we do not expect user space to request the dump again for
	  the same dump ID.
   char *buf = malloc(size);
   length = read(fd, buf, size);
	- size should be a minimum of 1K based on PAPR and <= 4K based
	  on the RTAS work area size. It is restricted to the RTAS
	  work area size. A 4K work area is used, matching the current
	  implementation in the librtas library.
	- Each read() issues an RTAS call to get the data based on the
	  size requirement and returns the number of bytes returned
	  from the hypervisor.
	- If the previous call returned the dump-complete status, the
	  next read() returns 0, like EOF.
   ret = ioctl(PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id)
	- Issues the RTAS call with a NULL buffer to invalidate the
	  dump.

The read API should use the file descriptor obtained from the ioctl
based on the dump ID so that it reads the dump contents for the
corresponding dump ID. Support for this new ABI has been implemented
in librtas (rtas_platform_dump()) to support system lockdown.
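
A minimal user-space sketch of this model (illustrative only; error
handling trimmed, and the invalidate ioctl is assumed here to be
issued on the device fd):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <asm/papr-platform-dump.h>

  static int fetch_dump(__u64 dump_id, FILE *out)
  {
          char buf[4096];         /* <= RTAS work area size */
          ssize_t len;
          int devfd, fd, rc;

          devfd = open("/dev/papr-platform-dump", O_RDONLY);
          if (devfd < 0)
                  return -1;

          fd = ioctl(devfd, PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id);
          if (fd < 0) {
                  close(devfd);
                  return -1;
          }

          /* read() returns 0 (EOF) once the dump-complete status is seen */
          while ((len = read(fd, buf, sizeof(buf))) > 0)
                  fwrite(buf, 1, len, out);

          /* tell the hypervisor the dump can be removed */
          rc = ioctl(devfd, PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id);

          close(fd);
          close(devfd);
          return (len < 0 || rc < 0) ? -1 : 0;
  }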

Signed-off-by: Haren Myneni 
---
 .../include/uapi/asm/papr-platform-dump.h |  15 +
 arch/powerpc/platforms/pseries/Makefile   |   1 +
 .../platforms/pseries/papr-platform-dump.c| 409 ++
 3 files changed, 425 insertions(+)
 create mode 100644 arch/powerpc/include/uapi/asm/papr-platform-dump.h
 create mode 100644 arch/powerpc/platforms/pseries/papr-platform-dump.c

diff --git a/arch/powerpc/include/uapi/asm/papr-platform-dump.h 
b/arch/powerpc/include/uapi/asm/papr-platform-dump.h
new file mode 100644
index ..a1d89c290dab
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr-platform-dump.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_PLATFORM_DUMP_H_
+#define _UAPI_PAPR_PLATFORM_DUMP_H_
+
+#include 
+#include 
+
+/*
+ * ioctl for /dev/papr-platform-dump. Returns a platform-dump handle fd
+ * corresponding to dump tag.
+ */
+#define PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE _IOW(PAPR_MISCDEV_IOC_ID, 6, __u64)
+#define PAPR_PLATFORM_DUMP_IOC_INVALIDATE _IOW(PAPR_MISCDEV_IOC_ID, 7, __u64)
+
+#endif /* _UAPI_PAPR_PLATFORM_DUMP_H_ */
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index e1db61877bb9..c82c94e0a73c 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -4,6 +4,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o rtas-work-area.o papr-sysparm.o \
   papr-rtas-common.o papr-vpd.o papr-indices.o \
+  papr-platform-dump.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobility.o rng.o \
   pci.o pci_dlpar.o eeh_pseries.o msi.o \
diff --git a/arch/powerpc/platforms/pseries/papr-platform-dump.c 
b/arch/powerpc/platforms/pseries/papr-platform-dump.c
new file mode 100644
index ..6b6c4313b300
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr-platform-dump.c
@@ -0,0 +1,409 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "papr-platform-dump: " fmt
+
+#include 
+#includ

[PATCH v3 3/6] powerpc/pseries: Add papr-indices char driver for ibm,get-indices

2025-01-27 Thread Haren Myneni
The RTAS call ibm,get-indices is used to obtain indices and
location codes for a specified indicator or sensor token. The
current implementation uses the rtas_get_indices() API provided by
the librtas library, which allocates an RMO buffer and issues this
RTAS call in user space. But writable mapping of /dev/mem from user
space is prohibited under system lockdown.

To overcome the restricted access in user space, the kernel provides
interfaces to collect indices data from the hypervisor. This patch
adds the papr-indices character driver and exposes standard
interfaces such as open/ioctl/read to user space in ways that are
compatible with lockdown.

PAPR (2.13 7.3.17 ibm,get-indices RTAS Call) describes the
following steps to retrieve all indices data:
- User input parameters to the RTAS call: sensor or indicator,
  and indice type.
- ibm,get-indices is a sequence-based RTAS call, which means it has
  to be issued multiple times to get the entire list of indicators
  or sensors of a particular type. The hypervisor expects the first
  RTAS call with sequence 1 and the subsequent calls with the
  sequence number returned from the previous calls.
- The OS may not interleave calls to ibm,get-indices for different
  indicator or sensor types. This means other RTAS calls with a
  different type should not be issued while the sequence for the
  previous type is in progress. So the entire list of indices is
  collected and copied to a buffer BLOB during ioctl(), and this
  buffer is exposed to user space with the file descriptor.
- The hypervisor fills the work area with a specific format but
  does not return the number of bytes written to the buffer.
  Instead of parsing the data for each call to determine the data
  length, copy the work area size (RTAS_GET_INDICES_BUF_SIZE) to
  the buffer. Return a work-area size of data to user space for
  each read() call.

Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:

 int devfd = open("/dev/papr-indices", O_RDONLY);
 int fd = ioctl(devfd, PAPR_INDICES_IOC_GET,
		struct papr_indices_io_block)
  - Collects all indices data for the specified token into a buffer
 char *buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
 length = read(fd, buf, RTAS_GET_INDICES_BUF_SIZE)
  - RTAS_GET_INDICES_BUF_SIZE of data is returned to user space.
  - User space retrieves the indices and their location codes
    from the buffer.
  - Multiple read() calls should be issued until the end of the
    BLOB buffer is reached.

The read() should use the file descriptor obtained from the ioctl
to get the data that is exposed through that file descriptor.
Support for this new ABI has been implemented in librtas
(rtas_get_indices()) for system lockdown.
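
A minimal user-space sketch of this model (illustrative only; error
handling and parsing of the returned work-area format are trimmed):

  #include <fcntl.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <asm/papr-indices.h>

  static int dump_indices(__u8 is_sensor, __u32 type)
  {
          struct papr_indices_io_block block = { 0 };
          char *buf;
          ssize_t len;
          int devfd, fd;

          devfd = open("/dev/papr-indices", O_RDONLY);
          if (devfd < 0)
                  return -1;

          block.indices.is_sensor = is_sensor;
          block.indices.indice_type = type;

          fd = ioctl(devfd, PAPR_INDICES_IOC_GET, &block);
          if (fd < 0) {
                  close(devfd);
                  return -1;
          }

          buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
          if (!buf) {
                  close(fd);
                  close(devfd);
                  return -1;
          }

          /* each read() returns one work-area-sized chunk; 0 means EOF */
          while ((len = read(fd, buf, RTAS_GET_INDICES_BUF_SIZE)) > 0) {
                  /* parse indices and location codes from buf here */
          }

          free(buf);
          close(fd);
          close(devfd);
          return len < 0 ? -1 : 0;
  }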

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |   1 +
 arch/powerpc/kernel/rtas.c|   2 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 369 ++
 4 files changed, 372 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/papr-indices.c

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 04406162fc5a..7dc527a5aaac 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -515,6 +515,7 @@ extern char rtas_data_buf[RTAS_DATA_BUF_SIZE];
 extern unsigned long rtas_rmo_buf;
 
 extern struct mutex rtas_ibm_get_vpd_lock;
+extern struct mutex rtas_ibm_get_indices_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index d31c9799cab2..76c634b92cb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -93,11 +93,11 @@ struct rtas_function {
  */
 static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
 static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
-static DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
 static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
+DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 697c216b70dc..e1db61877bb9 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o rtas-work-area.o papr-sysparm.o \
-  papr-rtas-common.o papr-vpd.o \
+  papr-rtas-common.o papr-vpd.o papr-indices.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobilit

[PATCH v3 2/6] powerpc/pseries: Define papr_indices_io_block for papr-indices ioctls

2025-01-27 Thread Haren Myneni
To issue ibm,get-indices, ibm,set-dynamic-indicator and
ibm,get-dynamic-sensor-state from user space, an RMO buffer is
allocated for the work area, which is restricted under system
lockdown. So instead of user space execution, the kernel will
provide the /dev/papr-indices interface to execute these RTAS calls.

User space assigns data in the papr_indices_io_block struct
depending on the specific RTAS call and passes it to the following
ioctls (a brief usage sketch follows the list):

PAPR_INDICES_IOC_GET:		Used for ibm,get-indices. Returns a
				get-indices handle fd to read data.
PAPR_DYNAMIC_SENSOR_IOC_GET:	Used for ibm,get-dynamic-sensor-state.
				Updates the sensor state in
				papr_indices_io_block.dynamic_param.state

PAPR_DYNAMIC_INDICATOR_IOC_SET:	Used for ibm,set-dynamic-indicator.
				Sets the new state for the input
				indicator.
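
As a brief user-space sketch (illustrative values only), the union is
populated differently depending on which ioctl is being issued:

  #include <string.h>
  #include <linux/types.h>
  #include <asm/papr-indices.h>

  /* PAPR_INDICES_IOC_GET: list all sensors (or indicators) of one type. */
  static void fill_get_indices(struct papr_indices_io_block *b, __u32 type)
  {
          memset(b, 0, sizeof(*b));
          b->indices.is_sensor = 1;       /* 1 for sensors, 0 for indicators */
          b->indices.indice_type = type;  /* caller-supplied type token */
  }

  /* PAPR_DYNAMIC_SENSOR_IOC_GET / PAPR_DYNAMIC_INDICATOR_IOC_SET. */
  static void fill_dynamic(struct papr_indices_io_block *b, __u32 token,
                           __u32 state, const char *loc_code)
  {
          memset(b, 0, sizeof(*b));
          b->dynamic_param.token = token;
          b->dynamic_param.state = state; /* input state only used for set */
          strncpy(b->dynamic_param.location_code_str, loc_code,
                  LOC_CODE_SIZE - 1);
  }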

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/uapi/asm/papr-indices.h | 41 
 1 file changed, 41 insertions(+)
 create mode 100644 arch/powerpc/include/uapi/asm/papr-indices.h

diff --git a/arch/powerpc/include/uapi/asm/papr-indices.h 
b/arch/powerpc/include/uapi/asm/papr-indices.h
new file mode 100644
index ..c580025fe201
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr-indices.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_INDICES_H_
+#define _UAPI_PAPR_INDICES_H_
+
+#include 
+#include 
+#include 
+
+#define LOC_CODE_SIZE  80
+#define RTAS_GET_INDICES_BUF_SIZE  SZ_4K
+
+struct papr_indices_io_block {
+   union {
+   struct {
+   __u8 is_sensor; /* 0 for indicator and 1 for sensor */
+   __u32 indice_type;
+   } indices;
+   struct {
+   __u32 token; /* Sensor or indicator token */
+   __u32 state; /* get / set state */
+   /*
+* PAPR+ 12.3.2.4 Converged Location Code Rules - Length
+* Restrictions. 79 characters plus null.
+*/
+   char location_code_str[LOC_CODE_SIZE]; /* location code */
+   } dynamic_param;
+   };
+};
+
+/*
+ * ioctls for /dev/papr-indices.
+ * PAPR_INDICES_IOC_GET:   Returns a get-indices handle fd to read data
+ * PAPR_DYNAMIC_SENSOR_IOC_GET:Gets the state of the input sensor
+ * PAPR_DYNAMIC_INDICATOR_IOC_SET: Sets the new state for the input indicator
+ */
+#define PAPR_INDICES_IOC_GET   _IOW(PAPR_MISCDEV_IOC_ID, 3, struct papr_indices_io_block)
+#define PAPR_DYNAMIC_SENSOR_IOC_GET _IOWR(PAPR_MISCDEV_IOC_ID, 4, struct papr_indices_io_block)
+#define PAPR_DYNAMIC_INDICATOR_IOC_SET _IOW(PAPR_MISCDEV_IOC_ID, 5, struct papr_indices_io_block)
+
+
+#endif /* _UAPI_PAPR_INDICES_H_ */
-- 
2.43.5




[PATCH v3 5/6] powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support

2025-01-27 Thread Haren Myneni
The RTAS call ibm,get-dynamic-sensor-state is used to get the
sensor state identified by the location code and the sensor
token. The librtas library provides an API,
rtas_get_dynamic_sensor(), which uses /dev/mem access for work
area allocation, but that access is restricted under system
lockdown.

This patch adds a new ioctl, PAPR_DYNAMIC_SENSOR_IOC_GET, to the
papr-indices character driver, which executes this RTAS call and
copies the sensor state into the user-specified ioctl buffer.

Refer to PAPR 7.3.19 ibm,get-dynamic-sensor-state for more
information on this RTAS call.
- User input parameters to the RTAS call: location code string
  and the sensor token

Expose these interfaces to user space with the /dev/papr-indices
character device using the following programming model:
 int fd = open("/dev/papr-indices", O_RDWR);
 int ret = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET,
		struct papr_indices_io_block)
  - User space specifies the input parameters in the
    papr_indices_io_block struct
  - The returned state for the specified sensor is copied to
    papr_indices_io_block.dynamic_param.state
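
A minimal user-space sketch of this model (illustrative only; error
handling trimmed):

  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <asm/papr-indices.h>

  static int get_sensor_state(__u32 token, const char *loc_code, __u32 *state)
  {
          struct papr_indices_io_block block = { 0 };
          int fd, rc;

          fd = open("/dev/papr-indices", O_RDWR);
          if (fd < 0)
                  return -1;

          block.dynamic_param.token = token;
          strncpy(block.dynamic_param.location_code_str, loc_code,
                  LOC_CODE_SIZE - 1);

          rc = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET, &block);
          if (!rc)
                  *state = block.dynamic_param.state; /* filled by the kernel */

          close(fd);
          return rc;
  }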

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |  1 +
 arch/powerpc/kernel/rtas.c|  2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 67 +++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 2da52f59e4c6..fcd822f0e1d7 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -517,6 +517,7 @@ extern unsigned long rtas_rmo_buf;
 extern struct mutex rtas_ibm_get_vpd_lock;
 extern struct mutex rtas_ibm_get_indices_lock;
 extern struct mutex rtas_ibm_set_dynamic_indicator_lock;
+extern struct mutex rtas_ibm_get_dynamic_sensor_state_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 88fa416730af..a4848e7f248e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -92,12 +92,12 @@ struct rtas_function {
  * Per-function locks for sequence-based RTAS functions.
  */
 static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
-static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
 DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
+DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/papr-indices.c 
b/arch/powerpc/platforms/pseries/papr-indices.c
index e4b4eb8ece84..ca6ff2529017 100644
--- a/arch/powerpc/platforms/pseries/papr-indices.c
+++ b/arch/powerpc/platforms/pseries/papr-indices.c
@@ -441,6 +441,67 @@ static long papr_dynamic_indicator_ioc_set(struct 
papr_indices_io_block __user *
return ret;
 }
 
+/**
+ * papr_dynamic_sensor_ioc_get - ibm,get-dynamic-sensor-state RTAS Call
+ * PAPR 2.13 7.3.19
+ *
+ * @ubuf: Input parameters to the RTAS call, such as the sensor token.
+ *        The returned state is copied to the user space buffer.
+ *
+ *
+ * Returns success or -errno.
+ */
+
+static long papr_dynamic_sensor_ioc_get(struct papr_indices_io_block __user 
*ubuf)
+{
+   struct papr_indices_io_block kbuf;
+   struct rtas_work_area *work_area;
+   s32 fwrc, token, ret;
+   u32 rets;
+
+   token = rtas_function_token(RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE);
+   if (token == RTAS_UNKNOWN_SERVICE)
+   return -ENOENT;
+
+   mutex_lock(&rtas_ibm_get_dynamic_sensor_state_lock);
+   work_area = papr_dynamic_indice_buf_from_user(ubuf, &kbuf);
+   if (IS_ERR(work_area)) {
+   ret = PTR_ERR(work_area);
+   goto out;
+   }
+
+   do {
+   fwrc = rtas_call(token, 2, 2, &rets,
+   kbuf.dynamic_param.token,
+   rtas_work_area_phys(work_area));
+   } while (rtas_busy_delay(fwrc));
+
+   rtas_work_area_free(work_area);
+
+   switch (fwrc) {
+   case RTAS_IBM_DYNAMIC_INDICE_SUCCESS:
+   if (put_user(rets, &ubuf->dynamic_param.state))
+   ret = -EFAULT;
+   else
+   ret = 0;
+   break;
+   case RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR:  /* No such indicator */
+   ret = -EOPNOTSUPP;
+   break;
+   default:
+   pr_err("unexpected ibm,get-dynamic-sensor result %d\n",
+   fwrc);
+   fallthrough;
+   case RTAS_IBM_DYNAMIC_INDICE_HW_ERROR:  /* Hardware/platform error */
+   ret = -EIO;
+   break;
+   }
+
+out:
+   mutex_unlock(&rtas_ibm_get_dynamic_sensor_state_lock);
+   return ret;
+}
+

[PATCH v3 4/6] powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support

2025-01-27 Thread Haren Myneni
The RTAS call ibm,set-dynamic-indicator is used to set a new
indicator state identified by a location code. The current
implementation uses the rtas_set_dynamic_indicator() API provided by
the librtas library, which allocates an RMO buffer and issues this
RTAS call in user space. But /dev/mem access from user space is
prohibited under system lockdown.

This patch adds a new ioctl, PAPR_DYNAMIC_INDICATOR_IOC_SET, to the
papr-indices character driver and exposes this interface to user
space in a way that is compatible with lockdown.

Refer to PAPR 7.3.18 ibm,set-dynamic-indicator for more
information on this RTAS call.
- User input parameters to the RTAS call: location code
  string, indicator token and new state

Expose these interfaces to user space with the /dev/papr-indices
character device using the following programming model:
 int fd = open("/dev/papr-indices", O_RDWR);
 int ret = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET,
		struct papr_indices_io_block)
  - User space passes the input parameters in the
    papr_indices_io_block struct
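
A minimal user-space sketch of this model (illustrative only; error
handling trimmed):

  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <asm/papr-indices.h>

  static int set_indicator(__u32 token, __u32 new_state, const char *loc_code)
  {
          struct papr_indices_io_block block = { 0 };
          int fd, rc;

          fd = open("/dev/papr-indices", O_RDWR);
          if (fd < 0)
                  return -1;

          block.dynamic_param.token = token;
          block.dynamic_param.state = new_state;
          strncpy(block.dynamic_param.location_code_str, loc_code,
                  LOC_CODE_SIZE - 1);

          rc = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET, &block);
          close(fd);
          return rc;
  }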

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |   1 +
 arch/powerpc/kernel/rtas.c|   2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 122 ++
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 7dc527a5aaac..2da52f59e4c6 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -516,6 +516,7 @@ extern unsigned long rtas_rmo_buf;
 
 extern struct mutex rtas_ibm_get_vpd_lock;
 extern struct mutex rtas_ibm_get_indices_lock;
+extern struct mutex rtas_ibm_set_dynamic_indicator_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 76c634b92cb2..88fa416730af 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -95,9 +95,9 @@ static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
 static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
-static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
 DEFINE_MUTEX(rtas_ibm_get_indices_lock);
+DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/papr-indices.c 
b/arch/powerpc/platforms/pseries/papr-indices.c
index c5cd63dae609..e4b4eb8ece84 100644
--- a/arch/powerpc/platforms/pseries/papr-indices.c
+++ b/arch/powerpc/platforms/pseries/papr-indices.c
@@ -27,6 +27,15 @@
 #define RTAS_IBM_GET_INDICES_MORE_DATA   1 /* More data is available. */
 #define RTAS_IBM_GET_INDICES_START_OVER -4 /* Indices list changed, restart call sequence. */
 
+/*
+ * Function-specific return values for ibm,set-dynamic-indicator and
+ * ibm,get-dynamic-sensor-state RTAS calls, derived from PAPR+
+ * v2.13 7.3.18 and 7.3.19.
+ */
+#define RTAS_IBM_DYNAMIC_INDICE_SUCCESS0
+#define RTAS_IBM_DYNAMIC_INDICE_HW_ERROR   -1
+#define RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR   -3
+
 /**
  * struct rtas_get_indices_params - Parameters (in and out) for
  *  ibm,get-indices.
@@ -328,6 +337,110 @@ static long papr_indices_create_handle(struct 
papr_indices_io_block __user *ubuf
return fd;
 }
 
+/*
+ * Create work area with the input parameters. This function is used
+ * for both ibm,set-dynamic-indicator and ibm,get-dynamic-sensor-state
+ * RTAS Calls.
+ */
+static struct rtas_work_area *
+papr_dynamic_indice_buf_from_user(struct papr_indices_io_block __user *ubuf,
+   struct papr_indices_io_block *kbuf)
+{
+   struct rtas_work_area *work_area;
+   u32 length;
+   __be32 len_be;
+
+   if (copy_from_user(kbuf, ubuf, sizeof(*kbuf)))
+   return ERR_PTR(-EFAULT);
+
+
+   if (!string_is_terminated(kbuf->dynamic_param.location_code_str,
+   ARRAY_SIZE(kbuf->dynamic_param.location_code_str)))
+   return ERR_PTR(-EINVAL);
+
+   /*
+* The input data in the work area should be as follows:
+* - 32-bit integer length of the location code string,
+*   including NULL.
+* - Location code string, NULL terminated, identifying the
+*   token (sensor or indicator).
+* PAPR 2.13 - R1–7.3.18–5 ibm,set-dynamic-indicator
+*   - R1–7.3.19–5 ibm,get-dynamic-sensor-state
+*/
+   /*
+* Length that user space passed should also include NULL
+* terminator.
+*/
+   length = strlen(kbuf->dynamic_param.location_code_str) + 1;
+   if (length > LOC_CODE_SIZE)
+   return ERR_PTR(-EINVAL);
+
+   len_be = cpu_to_be32(length);
+
+   work_area 

[PATCH v3 0/6] Add character devices for indices and platform-dump RTAS

2025-01-27 Thread Haren Myneni
Several APIs such as rtas_get_indices(), rtas_get_dynamic_sensor(),
rtas_set_dynamic_indicator() and rtas_platform_dump() provided by
the librtas library are implemented in user space using the rtas
syscall in combination with writable mappings of /dev/mem. But this
implementation is not compatible with system lockdown, which
prohibits /dev/mem access. The current kernel already provides
char-based driver interfaces for several RTAS calls, such as VPD
and system parameters, to support the lockdown feature.

This patch series adds new char-based drivers: /dev/papr-indices
for the ibm,get-indices, ibm,get-dynamic-sensor-state and
ibm,set-dynamic-indicator RTAS calls, and /dev/papr-platform-dump
for ibm,platform-dump. They provide similar open/ioctl/read
interfaces to user space as in the case of VPD and system
parameters.

I have made changes to the librtas library to use the new kernel
interfaces if the corresponding device entry is available.

This patch series has the following patches:
powerpc/pseries: Define common functions for RTAS sequence calls
- For some sequence-based RTAS calls, the OS should not start
  another sequence with different input until the previous sequence
  is completed. So the sequence should be completed during ioctl()
  and the entire buffer exposed during read(). ibm,get-indices is a
  sequence-based RTAS function similar to ibm,get-vpd, and we
  already have the corresponding implementation in the VPD driver.
  So update the papr_rtas_sequence struct with RTAS-call-specific
  functions and move the top-level sequence functions into a
  separate file.

powerpc/pseries: Define papr_indices_io_block for papr-indices ioctls
- The /dev/papr-indices driver supports the ibm,get-indices,
  ibm,get-dynamic-sensor-state and ibm,set-dynamic-indicator RTAS
  calls. papr-indices.h introduces 3 different ioctls for these RTAS
  calls and the corresponding ioctl input buffer.

powerpc/pseries: Add papr-indices char driver for ibm,get-indices
- Introduce /dev/papr-indices char based driver and add support for
  get-indices RTAS function

powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support
- Update /dev/papr-indices for set-dynamic-indicator RTAS function

powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support
- Update /dev/papr-indices for get-dynamic-sensor-state RTAS function

powerpc/pseries: Add papr-platform-dump character driver for dump
   retrieval
- Introduce the /dev/papr-platform-dump char driver and add support
  for ibm,platform-dump. Received suggestions from the previous post
  of this as a separate patch; updated the patch to invalidate the
  dump using a separate ioctl.

Changelog:

v3:
- Call put_unused_fd() in the later failure path only after
  get_unused_fd() has succeeded ("Add papr-platform-dump character
  driver for dump retrieval" patch).

v2:
- Added unlock rtas_ibm_set_dynamic_indicator_lock and
  rtas_ibm_get_dynamic_sensor_state_lock mutex for failure cases
  as reported by Dan Carpenter
- Fixed build warnings for the proper function parameter descriptions
  as reported by kernel test robot 

Haren Myneni (6):
  powerpc/pseries: Define common functions for RTAS sequence calls
  powerpc/pseries: Define papr_indices_io_block for papr-indices ioctls
  powerpc/pseries: Add papr-indices char driver for ibm,get-indices
  powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support
  powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support
  powerpc/pseries: Add papr-platform-dump character driver for dump
retrieval

 arch/powerpc/include/asm/rtas.h   |   3 +
 arch/powerpc/include/uapi/asm/papr-indices.h  |  41 ++
 .../include/uapi/asm/papr-platform-dump.h |  15 +
 arch/powerpc/kernel/rtas.c|   6 +-
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 558 ++
 .../platforms/pseries/papr-platform-dump.c| 409 +
 .../platforms/pseries/papr-rtas-common.c  | 243 
 .../platforms/pseries/papr-rtas-common.h  |  45 ++
 arch/powerpc/platforms/pseries/papr-vpd.c | 270 ++---
 10 files changed, 1365 insertions(+), 228 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr-indices.h
 create mode 100644 arch/powerpc/include/uapi/asm/papr-platform-dump.h
 create mode 100644 arch/powerpc/platforms/pseries/papr-indices.c
 create mode 100644 arch/powerpc/platforms/pseries/papr-platform-dump.c
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.c
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.h

-- 
2.43.5