Re: [kvm-unit-tests PATCH v1 2/5] configure: Display the default processor for arm and arm64

2025-01-15 Thread Andrew Jones
On Wed, Jan 15, 2025 at 09:55:14AM +, Alexandru Elisei wrote:
> Hi Drew,
> 
> On Tue, Jan 14, 2025 at 07:51:04PM +0100, Andrew Jones wrote:
> > On Tue, Jan 14, 2025 at 05:17:28PM +, Alexandru Elisei wrote:
> > ...
> > > > > +# $arch will have changed when cross-compiling.
> > > > > +[ -z "$processor" ] && processor=$(get_default_processor $arch)
> > > > 
> > > > The fact that $arch and $processor are wrong until they've had a chance 
> > > > to
> > > 
> > > $processor is never wrong. $processor is unset until either the user sets 
> > > it
> > > with --processor, or until this line. This patch introduces 
> > > $default_processor
> > > only for the purpose of having an accurate help text, it doesn't change 
> > > when and
> > > how $processor is assigned.
> > 
> > I should have said "The fact that $arch and $default_processor are wrong..."
> > 
> > > 
> > > > be converted might be another reason for the $do_help idea. But it'll
> > > > always be fragile since another change that does some sort of conversion
> > > > could end up getting added after the '[ $do_help ] && usage' someday.
> > > 
> > > configure needs to distinguish between:
> > > 
> > > 1. The user not having specified --processor when doing ./configure.
> > > 2. The user having set --processor.
> > > 
> > > If 1, then kvm-unit-tests can use the default $processor value for $arch,
> > > which could have also been specified by the user.
> > > 
> > > If 2, then kvm-unit-tests should not touch $processor because that's what 
> > > the
> > > user wants.
> > > 
> > > Do you see something wrong with that reasoning?
> > 
> > If we output $default_processor in usage() before it's had a chance to be
> > set correctly based on a given cross arch, then it won't display the
> > correct name.
> > 
> > > 
> > > Also, I don't understand why you say it's fragile, since configure doesn't
> > 
> > I wrote "it'll always be fragile" where 'it' refers to the most recent
> > object of my paragraph ("the $do_help idea"). But, TBH, I'm not sure
> > how important it is to get the help text accurate, so we can just not
> > care if we call usage() with the wrong strings sometimes.
> 
> Got it now, thanks for explaining it.
> 
> My opinion is that a help text is there to help the user, and in my experience
> an inaccurate help text can be very frustrating - think comments that say
> one thing, and the code does something else.
> 
> How about this:
> 
> diff --git a/configure b/configure
> index 3ab0ec208e10..5dbe189816b2 100755
> --- a/configure
> +++ b/configure
> @@ -51,7 +51,6 @@ page_size=
>  earlycon=
>  efi=
>  efi_direct=
> -default_processor=$(get_default_processor $arch)
> 
>  # Enable -Werror by default for git repositories only (i.e. developer builds)
>  if [ -e "$srcdir"/.git ]; then
> @@ -61,13 +60,14 @@ else
>  fi
> 
>  usage() {
> +[ -z "$processor" ] && processor=$(get_default_processor $arch)
>  cat <<-EOF
> Usage: $0 [options]
> 
> Options include:
> --arch=ARCH            architecture to compile for ($arch). ARCH 
> can be one of:
>arm, arm64/aarch64, i386, ppc64, riscv32, 
> riscv64, s390x, x86_64
> -   --processor=PROCESSOR  processor to compile for 
> ($default_processor). For arm and arm64, the
> +   --processor=PROCESSOR  processor to compile for ($processor). For 
> arm and arm64, the
>value 'max' is special and it will be 
> passed directly to
>qemu, bypassing the compiler. In this 
> case, --cflags can be
>used to compile for a specific processor.
> 
> Should be accurate enough, as far as I can tell. And I don't think there's
> a need for $do_help: if the user does ./configure --help --arch=arm64, then
> I think it's reasonable to expect that --help will be interpreted before
> --arch is parsed.
>

Sounds good.

Thanks,
drew



Re: [kvm-unit-tests PATCH v1 2/5] configure: Display the default processor for arm and arm64

2025-01-15 Thread Alexandru Elisei
Hi Drew,

On Tue, Jan 14, 2025 at 07:51:04PM +0100, Andrew Jones wrote:
> On Tue, Jan 14, 2025 at 05:17:28PM +, Alexandru Elisei wrote:
> ...
> > > > +# $arch will have changed when cross-compiling.
> > > > +[ -z "$processor" ] && processor=$(get_default_processor $arch)
> > > 
> > > The fact that $arch and $processor are wrong until they've had a chance to
> > 
> > $processor is never wrong. $processor is unset until either the user sets it
> > with --processor, or until this line. This patch introduces 
> > $default_processor
> > only for the purpose of having an accurate help text, it doesn't change 
> > when and
> > how $processor is assigned.
> 
> I should have said "The fact that $arch and $default_processor are wrong..."
> 
> > 
> > > be converted might be another reason for the $do_help idea. But it'll
> > > always be fragile since another change that does some sort of conversion
> > > could end up getting added after the '[ $do_help ] && usage' someday.
> > 
> > configure needs to distinguish between:
> > 
> > 1. The user not having specified --processor when doing ./configure.
> > 2. The user having set --processor.
> > 
> > If 1, then kvm-unit-tests can use the default $processor value for $arch,
> > which could have also been specified by the user.
> > 
> > If 2, then kvm-unit-tests should not touch $processor because that's what 
> > the
> > user wants.
> > 
> > Do you see something wrong with that reasoning?
> 
> If we output $default_processor in usage() before it's had a chance to be
> set correctly based on a given cross arch, then it won't display the
> correct name.
> 
> > 
> > Also, I don't understand why you say it's fragile, since configure doesn't
> 
> I wrote "it'll always be fragile" where 'it' refers to the most recent
> object of my paragraph ("the $do_help idea"). But, TBH, I'm not sure
> how important it is to get the help text accurate, so we can just not
> care if we call usage() with the wrong strings sometimes.

Got it now, thanks for explaining it.

My opinion is that a help text is there to help the user, and in my experience
an inaccurate help text can be very frustrating - think comments that say
one thing, and the code does something else.

How about this:

diff --git a/configure b/configure
index 3ab0ec208e10..5dbe189816b2 100755
--- a/configure
+++ b/configure
@@ -51,7 +51,6 @@ page_size=
 earlycon=
 efi=
 efi_direct=
-default_processor=$(get_default_processor $arch)

 # Enable -Werror by default for git repositories only (i.e. developer builds)
 if [ -e "$srcdir"/.git ]; then
@@ -61,13 +60,14 @@ else
 fi

 usage() {
+[ -z "$processor" ] && processor=$(get_default_processor $arch)
 cat <<-EOF
Usage: $0 [options]

Options include:
--arch=ARCH            architecture to compile for ($arch). ARCH 
can be one of:
   arm, arm64/aarch64, i386, ppc64, riscv32, 
riscv64, s390x, x86_64
-   --processor=PROCESSOR  processor to compile for 
($default_processor). For arm and arm64, the
+   --processor=PROCESSOR  processor to compile for ($processor). For 
arm and arm64, the
   value 'max' is special and it will be passed 
directly to
   qemu, bypassing the compiler. In this case, 
--cflags can be
   used to compile for a specific processor.

Should be accurate enough, as far as I can tell. And I don't think there's
a need for $do_help: if the user does ./configure --help --arch=arm64, then
I think it's reasonable to expect that --help will be interpreted before
--arch is parsed.

Thanks,
Alex



Re: [kvm-unit-tests PATCH v1 1/5] configure: Document that the architecture name 'aarch64' is also supported

2025-01-15 Thread Alexandru Elisei
Hi Drew,

On Tue, Jan 14, 2025 at 07:39:49PM +0100, Andrew Jones wrote:
> On Tue, Jan 14, 2025 at 05:03:20PM +, Alexandru Elisei wrote:
> ...
> > diff --git a/configure b/configure
> > index 86cf1da36467..1362b68dd68b 100755
> > --- a/configure
> > +++ b/configure
> > @@ -15,8 +15,8 @@ objdump=objdump
> >  readelf=readelf
> >  ar=ar
> >  addr2line=addr2line
> > -arch=$(uname -m | sed -e 
> > 's/i.86/i386/;s/arm64/aarch64/;s/arm.*/arm/;s/ppc64.*/ppc64/')
> > -host=$arch
> > +host=$(uname -m | sed -e 
> > 's/i.86/i386/;s/arm64/aarch64/;s/arm.*/arm/;s/ppc64.*/ppc64/')
> > +arch=$(echo $host | sed -e 's/aarch64/arm64/')
> 
> Sure, or avoid the second sed and just do
> 
> host=$(...)
> arch=$host
> [ "$arch" = "aarch64" ] && arch="arm64"

Yep, thanks.

Alex



Re: [PATCH 0/2] PCI: Simplify few things

2025-01-15 Thread Krzysztof Wilczyński
Hello,

> Few code simplifications without functional impact.  Not tested on
> hardware.

Applied to controller/dwc for v6.14, thank you!

Krzysztof



Re: [PATCH V2] tools/perf/builtin-lock: Fix return code for functions in __cmd_contention

2025-01-15 Thread Athira Rajeev



> On 14 Jan 2025, at 3:47 AM, Namhyung Kim  wrote:
> 
> On Fri, Jan 10, 2025 at 03:07:30PM +0530, Athira Rajeev wrote:
>> perf lock contention returns zero exit value even if the lock contention
>> BPF setup failed.
>> 
>>  # ./perf lock con -b true
>>  libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
>> CONFIG_DEBUG_INFO_BTF enabled?
>>  libbpf: failed to find '.BTF' ELF section in 
>> /lib/modules/6.13.0-rc3+/build/vmlinux
>>  libbpf: failed to find valid kernel BTF
>>  libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
>> CONFIG_DEBUG_INFO_BTF enabled?
>>  libbpf: failed to find '.BTF' ELF section in 
>> /lib/modules/6.13.0-rc3+/build/vmlinux
>>  libbpf: failed to find valid kernel BTF
>>  libbpf: Error loading vmlinux BTF: -ESRCH
>>  libbpf: failed to load object 'lock_contention_bpf'
>>  libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH
>>  Failed to load lock-contention BPF skeleton
>>  lock contention BPF setup failed
>>  # echo $?
>>   0
>> 
>> Fix this by saving the return code of lock_contention_prepare()
>> so that the command exits with the proper return code. Similarly, set the
>> return code properly for two other functions in builtin-lock, namely
>> setup_output_field() and select_key().
>> 
>> Signed-off-by: Athira Rajeev 
> 
> Reviewed-by: Namhyung Kim 
> 
> Thanks,
> Namhyung
Hi Namhyung,

Thanks for the reviewed-by 

Athira

> 
>> ---
>> Changelog:
>> v1 -> v2
>> Fixed return code in functions: setup_output_field()
>> and select_key() as pointed out by Namhyung.
>> 
>> tools/perf/builtin-lock.c | 11 ---
>> 1 file changed, 8 insertions(+), 3 deletions(-)
>> 
>> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
>> index 208c482daa56..94a2bc15a2fa 100644
>> --- a/tools/perf/builtin-lock.c
>> +++ b/tools/perf/builtin-lock.c
>> @@ -2049,7 +2049,8 @@ static int __cmd_contention(int argc, const char 
>> **argv)
>> goto out_delete;
>> }
>> 
>> - if (lock_contention_prepare(&con) < 0) {
>> + err = lock_contention_prepare(&con);
>> + if (err < 0) {
>> pr_err("lock contention BPF setup failed\n");
>> goto out_delete;
>> }
>> @@ -2070,10 +2071,14 @@ static int __cmd_contention(int argc, const char 
>> **argv)
>> }
>> }
>> 
>> - if (setup_output_field(true, output_fields))
>> + err = setup_output_field(true, output_fields);
>> + if (err) {
>> + pr_err("Failed to setup output field\n");
>> goto out_delete;
>> + }
>> 
>> - if (select_key(true))
>> + err = select_key(true);
>> + if (err)
>> goto out_delete;
>> 
>> if (symbol_conf.field_sep) {
>> -- 
>> 2.43.5
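
The fix above follows the usual pattern of capturing the helper's return value
in the function-local err and jumping to the shared cleanup label so that the
failure propagates to the process exit code. A minimal standalone sketch of
that pattern, with a hypothetical prepare() standing in for
lock_contention_prepare() (this is not the actual builtin-lock.c code):

#include <stdio.h>

/* hypothetical setup step that fails, standing in for lock_contention_prepare() */
static int prepare(void) { return -1; }

static int cmd(void)
{
	int err = 0;

	err = prepare();
	if (err < 0) {		/* the old code tested prepare() directly and */
		fprintf(stderr, "setup failed\n");	/* jumped here with err still 0 */
		goto out;
	}
	/* ... main work ... */
out:
	/* ... cleanup ... */
	return err;		/* the failure now reaches the caller and the exit code */
}

int main(void)
{
	return cmd() ? 1 : 0;
}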





RE: [PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-01-15 Thread Wei Fang
> -Original Message-
> From: Jakub Kicinski 
> Sent: January 16, 2025 10:41
> To: Wei Fang 
> Cc: Claudiu Manoil ; Vladimir Oltean
> ; Clark Wang ;
> andrew+net...@lunn.ch; da...@davemloft.net; eduma...@google.com;
> pab...@redhat.com; christophe.le...@csgroup.eu; net...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-ker...@lists.infradead.org; i...@lists.linux.dev
> Subject: Re: [PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95
> ENETC PF
> 
> On Thu, 16 Jan 2025 02:24:10 + Wei Fang wrote:
> > > Why create full ops for something this trivial?
> >
> > We add enetc_pf_hw_ops to implement different hardware ops
> > for different chips so that they can be called in common functions.
> > Although the change is minor, it is consistent with the original
> > intention of adding enetc_pf_hw_ops.
> 
> In other words you prefer ops.
> 
> Now imagine you have to refactor such a piece of code in 10 drivers
> and each of them has 2 layers of indirect ops like you do.
> Unnecessary complexity.

Okay, I will remove them from ops.



[PATCH v5 00/15] powerpc/objtool: uaccess validation for PPC32 (v5)

2025-01-15 Thread Christophe Leroy
This series adds UACCESS validation for PPC32. It includes
a dozen changes to the objtool core.

It applies on top of series "Cleanup/Optimise KUAP (v3)"
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=363368&state=*

It is almost mature and performs code analysis for all of PPC32.

In this version objtool switch table lookup has been enhanced to
handle nested switch tables.

Most object files are correctly decoded; only a few
'unreachable instruction' warnings remain, due to more complex
functions which include back and forth jumps or branches. Two types
of switch tables are missed for the time being:
- When the switch table address is temporarily saved on the stack before
being used.
- When there are backward jumps in the path.

It allowed detecting some UACCESS mess in a few files. Those have been
fixed through other patches.

Changes in v5:
- Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
branch tip/objtool/core
- Use generic annotation infrastructure to annotate uaccess begin and end 
instructions

Changes in v4:
- Split series in two parts, the powerpc uaccess rework is submitted
separately, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=363368&state=*
- Support of UACCESS on all PPC32 including book3s/32 which was missing in v3.
- More elaborate switch table lookup.
- Patches 2, 7, 8, 9, 10, 11 are new
- Patch 11 in series v3 is now removed.

Changes in v3:
- Rebased on top of a merge of powerpc tree and tip/objtool/core tree
- Simplified support for relative switch tables based on relocation type
- Taken comments from Peter


Christophe Leroy (15):
  objtool: Fix generic annotation infrastructure cross build
  objtool: Move back misplaced comment
  objtool: Allow an architecture to disable objtool on ASM files
  objtool: Fix JUMP_ENTRY_SIZE for bi-arch like powerpc
  objtool: Add INSN_RETURN_CONDITIONAL
  objtool: Add support for relative switch tables
  objtool: Merge mark_func_jump_tables() and add_func_jump_tables()
  objtool: Track general purpose register used for switch table base
  objtool: Find end of switch table directly
  objtool: When looking for switch tables also follow conditional and
dynamic jumps
  objtool: .rodata.cst{2/4/8/16} are not switch tables
  objtool: Add support for more complex UACCESS control
  objtool: Prepare noreturns.h for more architectures
  powerpc/bug: Annotate reachable after warning trap
  powerpc: Implement UACCESS validation on PPC32

 arch/Kconfig  |   5 +
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/include/asm/book3s/32/kup.h  |   2 +
 arch/powerpc/include/asm/bug.h|  14 +-
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |   4 +-
 arch/powerpc/include/asm/nohash/kup-booke.h   |   4 +-
 arch/powerpc/kexec/core_32.c  |   4 +-
 arch/powerpc/mm/nohash/kup.c  |   2 +
 include/linux/objtool.h   |   3 +
 include/linux/objtool_types.h |   2 +
 scripts/Makefile.lib  |   4 +
 tools/include/linux/objtool_types.h   |   2 +
 tools/objtool/arch/powerpc/decode.c   | 150 +-
 .../arch/powerpc/include/arch/noreturns.h |  11 ++
 .../arch/powerpc/include/arch/special.h   |  11 +-
 tools/objtool/arch/powerpc/special.c  |  40 -
 .../objtool/arch/x86/include/arch/noreturns.h |  20 +++
 tools/objtool/arch/x86/special.c  |   8 +-
 tools/objtool/check.c | 129 ++-
 tools/objtool/include/objtool/arch.h  |   1 +
 tools/objtool/include/objtool/check.h |   6 +-
 tools/objtool/include/objtool/special.h   |   3 +-
 tools/objtool/noreturns.h |  14 +-
 tools/objtool/special.c   |  55 ---
 24 files changed, 386 insertions(+), 110 deletions(-)
 create mode 100644 tools/objtool/arch/powerpc/include/arch/noreturns.h
 create mode 100644 tools/objtool/arch/x86/include/arch/noreturns.h

-- 
2.47.0




[PATCH v5 09/15] objtool: Find end of switch table directly

2025-01-15 Thread Christophe Leroy
At the time being, the end of a switch table can only be known
once the start of the following switch table has been located.

This is a problem when switch tables are nested because, until the first
switch table is properly added, the second one cannot be located, as
the backward walk will abut on the dynamic jump of the previous one.

So perform a first forward walk through the code in order to locate all
possible relocations to switch tables and build a local table with
those relocations. Later on, once a switch table is found, go through
this local table to know where the next switch table starts.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/check.c | 63 ---
 1 file changed, 47 insertions(+), 16 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 72b977f81dd6..0ad2bdd92232 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2058,14 +2058,30 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
}
 }
 
+static struct reloc *find_next_table(struct instruction *insn,
+struct reloc **table, unsigned int size)
+{
+   unsigned long offset = reloc_offset(insn_jump_table(insn));
+   int i;
+   struct reloc *reloc = NULL;
+
+   for (i = 0; i < size; i++) {
+   if (reloc_offset(table[i]) > offset &&
+   (!reloc || reloc_offset(table[i]) < reloc_offset(reloc)))
+   reloc = table[i];
+   }
+   return reloc;
+}
+
 /*
  * First pass: Mark the head of each jump table so that in the next pass,
  * we know when a given jump table ends and the next one starts.
  */
 static int mark_add_func_jump_tables(struct objtool_file *file,
-struct symbol *func)
+struct symbol *func,
+struct reloc **table, unsigned int size)
 {
-   struct instruction *insn, *last = NULL, *insn_t1 = NULL, *insn_t2;
+   struct instruction *insn, *last = NULL;
int ret = 0;
 
func_for_each_insn(file, func, insn) {
@@ -2094,23 +2110,11 @@ static int mark_add_func_jump_tables(struct 
objtool_file *file,
if (!insn_jump_table(insn))
continue;
 
-   if (!insn_t1) {
-   insn_t1 = insn;
-   continue;
-   }
-
-   insn_t2 = insn;
-
-   ret = add_jump_table(file, insn_t1, insn_jump_table(insn_t2));
+   ret = add_jump_table(file, insn, find_next_table(insn, table, 
size));
if (ret)
return ret;
-
-   insn_t1 = insn_t2;
}
 
-   if (insn_t1)
-   ret = add_jump_table(file, insn_t1, NULL);
-
return ret;
 }
 
@@ -2123,15 +2127,42 @@ static int add_jump_table_alts(struct objtool_file 
*file)
 {
struct symbol *func;
int ret;
+   struct instruction *insn;
+   unsigned int size = 0, i = 0;
+   struct reloc **table = NULL;
 
if (!file->rodata)
return 0;
 
+   for_each_insn(file, insn) {
+   struct instruction *dest_insn;
+   struct reloc *reloc;
+   unsigned long table_size;
+
+   func = insn_func(insn) ? insn_func(insn)->pfunc : NULL;
+   reloc = arch_find_switch_table(file, insn, &table_size, NULL);
+   /*
+* Each table entry has a rela associated with it.  The rela
+* should reference text in the same function as the original
+* instruction.
+*/
+   if (!reloc)
+   continue;
+   dest_insn = find_insn(file, reloc->sym->sec, 
reloc_addend(reloc));
+   if (!dest_insn || !insn_func(dest_insn) || 
insn_func(dest_insn)->pfunc != func)
+   continue;
+   if (i == size) {
+   size += 1024;
+   table = realloc(table, size * sizeof(*table));
+   }
+   table[i++] = reloc;
+   }
+
for_each_sym(file, func) {
if (func->type != STT_FUNC)
continue;
 
-   ret = mark_add_func_jump_tables(file, func);
+   ret = mark_add_func_jump_tables(file, func, table, i);
if (ret)
return ret;
}
-- 
2.47.0
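
To make the successor search used by find_next_table() above concrete, here is
a small standalone sketch (plain offsets stand in for struct reloc, and the
names are illustrative): the end of the current switch table is the smallest
collected table offset that is still greater than the current table's offset;
no successor means the table is the last one and runs to the end of the data.

#include <stdio.h>

static long find_next(long current, const long *offsets, int n)
{
	long next = -1;

	for (int i = 0; i < n; i++)
		if (offsets[i] > current && (next < 0 || offsets[i] < next))
			next = offsets[i];
	return next;	/* -1: no following table */
}

int main(void)
{
	long tables[] = { 0x40, 0x00, 0x90 };	/* discovery order is not sorted */

	printf("table at 0x00 ends at 0x%lx\n", find_next(0x00, tables, 3));	/* 0x40 */
	printf("table at 0x90 has no successor: %ld\n", find_next(0x90, tables, 3)); /* -1 */
	return 0;
}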




[PATCH v5 01/15] objtool: Fix generic annotation infrastructure cross build

2025-01-15 Thread Christophe Leroy
Cross build for powerpc/32 on x86_64 leads to:

  CC  init/main.o
init/main.o: warning: objtool: early_randomize_kstack_offset+0xf0: Unknown annotation type: 134217728
init/main.o: warning: objtool: start_kernel+0x4a8: Unknown annotation type: 134217728
init/main.o: warning: objtool: do_one_initcall+0x178: Unknown annotation type: 134217728

Fix byte order.

Fixes: 2116b349e29a ("objtool: Generic annotation infrastructure")
Signed-off-by: Christophe Leroy 
---
v5: New in v5
---
 tools/objtool/check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e92c5564d9ca..129c4e2245ae 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2272,7 +2272,7 @@ static int read_annotate(struct objtool_file *file,
}
 
for_each_reloc(sec->rsec, reloc) {
-   type = *(u32 *)(sec->data->d_buf + (reloc_idx(reloc) * 
sec->sh.sh_entsize) + 4);
+   type = bswap_if_needed(file->elf, *(u32 *)(sec->data->d_buf + 
(reloc_idx(reloc) * sec->sh.sh_entsize) + 4));
 
offset = reloc->sym->offset + reloc_addend(reloc);
insn = find_insn(file, reloc->sym->sec, offset);
-- 
2.47.0
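
For reference, the magic number in the warnings above is just the annotation
type read with the wrong byte order: 0x08000000 is ANNOTYPE_REACHABLE (8)
byte-swapped. A small standalone illustration of the mismatch (not objtool
code; the real fix uses bswap_if_needed() so the value is only swapped when
host and target endianness differ):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* bytes as laid out in a big-endian (powerpc) object file for the value 8 */
	unsigned char raw[4] = { 0x00, 0x00, 0x00, 0x08 };
	uint32_t type;

	memcpy(&type, raw, sizeof(type));
	printf("read as-is on a little-endian host: %u\n", type);	/* 134217728 */
	printf("byte-swapped: %u\n", (unsigned int)__builtin_bswap32(type));	/* 8 */
	return 0;
}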




[PATCH v5 11/15] objtool: .rodata.cst{2/4/8/16} are not switch tables

2025-01-15 Thread Christophe Leroy
Exclude sections named
  .rodata.cst2
  .rodata.cst4
  .rodata.cst8
  .rodata.cst16
as they won't contain switch tables.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 87b81d8e01c0..91436f4b3622 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2506,7 +2506,8 @@ static void mark_rodata(struct objtool_file *file)
 */
for_each_sec(file, sec) {
if (!strncmp(sec->name, ".rodata", 7) &&
-   !strstr(sec->name, ".str1.")) {
+   !strstr(sec->name, ".str1.") &&
+   !strstr(sec->name, ".cst")) {
sec->rodata = true;
found = true;
}
-- 
2.47.0




[PATCH v5 14/15] powerpc/bug: Annotate reachable after warning trap

2025-01-15 Thread Christophe Leroy
This commit is copied from commit bfb1a7c91fb7 ("x86/bug: Merge
annotate_reachable() into _BUG_FLAGS() asm")

'twi 31,0,0' is a BUG instruction, which is by default a dead end.

But the same instruction is used for WARNINGs and the execution
resumes with the following instruction. Mark it reachable so
that objtool knows that it is not a dead end in that case.

Also replace the unreachable() annotation with __builtin_unreachable(),
since objtool already knows that a BUG instruction is a dead end.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/bug.h | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1db485aacbd9..c41e9f903b5b 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -4,6 +4,7 @@
 #ifdef __KERNEL__
 
 #include 
+#include 
 
 #ifdef CONFIG_BUG
 
@@ -51,10 +52,11 @@
".previous\n"
 #endif
 
-#define BUG_ENTRY(insn, flags, ...)\
+#define BUG_ENTRY(insn, flags, extra, ...) \
__asm__ __volatile__(   \
"1: " insn "\n" \
_EMIT_BUG_ENTRY \
+   extra   \
: : "i" (__FILE__), "i" (__LINE__), \
  "i" (flags),  \
  "i" (sizeof(struct bug_entry)),   \
@@ -67,12 +69,12 @@
  */
 
 #define BUG() do { \
-   BUG_ENTRY("twi 31, 0, 0", 0);   \
-   unreachable();  \
+   BUG_ENTRY("twi 31, 0, 0", 0, "");   \
+   __builtin_unreachable();\
 } while (0)
 #define HAVE_ARCH_BUG
 
-#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | 
(flags))
+#define __WARN_FLAGS(flags) BUG_ENTRY("twi 31, 0, 0", BUGFLAG_WARNING | 
(flags), ANNOTATE_REACHABLE(1b))
 
 #ifdef CONFIG_PPC64
 #define BUG_ON(x) do { \
@@ -80,7 +82,7 @@
if (x)  \
BUG();  \
} else {\
-   BUG_ENTRY(PPC_TLNEI " %4, 0", 0, "r" ((__force long)(x)));  
\
+   BUG_ENTRY(PPC_TLNEI " %4, 0", 0, "", "r" ((__force long)(x)));  
\
}   \
 } while (0)
 
@@ -92,7 +94,7 @@
} else {\
BUG_ENTRY(PPC_TLNEI " %4, 0",   \
  BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),  \
- "r" (__ret_warn_on)); \
+ "", "r" (__ret_warn_on)); \
}   \
unlikely(__ret_warn_on);\
 })
-- 
2.47.0




[PATCH v5 10/15] objtool: When looking for switch tables also follow conditional and dynamic jumps

2025-01-15 Thread Christophe Leroy
When walking backward to find the base address of a switch table,
also take into account conditional branches and dynamic jumps from
a previous switch table.

To avoid mis-routing, break when stumbling on a function return.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/check.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0ad2bdd92232..87b81d8e01c0 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1998,6 +1998,8 @@ static int add_jump_table(struct objtool_file *file, 
struct instruction *insn,
alt->next = insn->alts;
insn->alts = alt;
prev_offset = reloc_offset(reloc);
+   if (!dest_insn->first_jump_src)
+   dest_insn->first_jump_src = insn;
}
 
if (!prev_offset) {
@@ -2032,6 +2034,9 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
insn->gpr == orig_insn->gpr)
break;
 
+   if (insn->type == INSN_RETURN)
+   break;
+
/* allow small jumps within the range */
if (insn->type == INSN_JUMP_UNCONDITIONAL &&
insn->jump_dest &&
@@ -2093,8 +2098,7 @@ static int mark_add_func_jump_tables(struct objtool_file 
*file,
 * that find_jump_table() can back-track using those and
 * avoid some potentially confusing code.
 */
-   if (insn->type == INSN_JUMP_UNCONDITIONAL && insn->jump_dest &&
-   insn->offset > last->offset &&
+   if (is_static_jump(insn) && insn->jump_dest &&
insn->jump_dest->offset > insn->offset &&
!insn->jump_dest->first_jump_src) {
 
-- 
2.47.0




[PATCH v5 06/15] objtool: Add support for relative switch tables

2025-01-15 Thread Christophe Leroy
On powerpc, switch tables are relative, which means the address of the
table is added to the value of the entry in order to get the pointed
address: (r10 is the table address, r4 the index in the table)

  lis r10,0 <== Load r10 with upper part of .rodata address
  R_PPC_ADDR16_HA .rodata
  addi    r10,r10,0 <== Add lower part of .rodata address
  R_PPC_ADDR16_LO .rodata
  lwzx    r8,r10,r4 <== Read table entry at r10 + r4 into r8
  add     r10,r8,r10    <== Add table address to read value
  mtctr   r10   <== Save calculated address in CTR
  bctr  <== Branch to address in CTR

  RELOCATION RECORDS FOR [.rodata]:
  OFFSET   TYPE  VALUE
  0000 R_PPC_REL32   .text+0x054c
  0004 R_PPC_REL32   .text+0x03d0
...

But for c_jump_tables it is not the case: they contain the
pointed address directly:

  lis r28,0 <== Load r28 with upper .rodata..c_jump_table
  R_PPC_ADDR16_HA   .rodata..c_jump_table
  addi    r28,r28,0 <== Add lower part of .rodata..c_jump_table
  R_PPC_ADDR16_LO   .rodata..c_jump_table
  lwzx    r10,r28,r10   <== Read table entry at r10 + r28 into r10
  mtctr   r10   <== Save read value in CTR
  bctr  <== Branch to address in CTR

  RELOCATION RECORDS FOR [.rodata..c_jump_table]:
  OFFSET   TYPE  VALUE
  0000 R_PPC_ADDR32  .text+0x0dc8
  0004 R_PPC_ADDR32  .text+0x0dc8
...

Add support to objtool for relative tables, based on the relocation
type which is R_PPC_REL32 for switch tables and R_PPC_ADDR32 for
C jump tables. Do the comparison using R_ABS32 and R_ABS64 which are
architecture agnostic.

And use correct size for 'long' instead of hard coding a size of '8'.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/check.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 10979d68103d..4495e7823b29 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1950,7 +1950,7 @@ static int add_jump_table(struct objtool_file *file, 
struct instruction *insn,
struct symbol *pfunc = insn_func(insn)->pfunc;
struct reloc *table = insn_jump_table(insn);
struct instruction *dest_insn;
-   unsigned int prev_offset = 0;
+   unsigned int offset, prev_offset = 0;
struct reloc *reloc = table;
struct alternative *alt;
 
@@ -1967,7 +1967,7 @@ static int add_jump_table(struct objtool_file *file, 
struct instruction *insn,
break;
 
/* Make sure the table entries are consecutive: */
-   if (prev_offset && reloc_offset(reloc) != prev_offset + 8)
+   if (prev_offset && reloc_offset(reloc) != prev_offset + 
elf_addr_size(file->elf))
break;
 
/* Detect function pointers from contiguous objects: */
@@ -1975,7 +1975,12 @@ static int add_jump_table(struct objtool_file *file, 
struct instruction *insn,
reloc_addend(reloc) == pfunc->offset)
break;
 
-   dest_insn = find_insn(file, reloc->sym->sec, 
reloc_addend(reloc));
+   if (reloc_type(reloc) == R_ABS32 || reloc_type(reloc) == 
R_ABS64)
+   offset = reloc_addend(reloc);
+   else
+   offset = reloc_addend(reloc) + reloc_offset(table) - 
reloc_offset(reloc);
+
+   dest_insn = find_insn(file, reloc->sym->sec, offset);
if (!dest_insn)
break;
 
-- 
2.47.0




[PATCH v5 04/15] objtool: Fix JUMP_ENTRY_SIZE for bi-arch like powerpc

2025-01-15 Thread Christophe Leroy
struct jump_entry {
s32 code;
s32 target;
long key;
};

This means that the size of the third member depends on
whether we are building a 32-bit or a 64-bit kernel.

Therefore JUMP_ENTRY_SIZE must depend on elf_addr_size(elf).

To allow that, the entries[] table must be initialised at runtime. This is
easily done by moving it into its only user, which is special_get_alts().

Signed-off-by: Christophe Leroy 
Acked-by: Peter Zijlstra (Intel) 
---
 .../arch/powerpc/include/arch/special.h   |  2 +-
 tools/objtool/special.c   | 55 +--
 2 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/tools/objtool/arch/powerpc/include/arch/special.h 
b/tools/objtool/arch/powerpc/include/arch/special.h
index ffef9ada7133..b17802dcf436 100644
--- a/tools/objtool/arch/powerpc/include/arch/special.h
+++ b/tools/objtool/arch/powerpc/include/arch/special.h
@@ -6,7 +6,7 @@
 #define EX_ORIG_OFFSET 0
 #define EX_NEW_OFFSET 4
 
-#define JUMP_ENTRY_SIZE 16
+#define JUMP_ENTRY_SIZE (8 + elf_addr_size(elf)) /* 12 on PPC32, 16 on PPC64 */
 #define JUMP_ORIG_OFFSET 0
 #define JUMP_NEW_OFFSET 4
 #define JUMP_KEY_OFFSET 8
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index 097a69db82a0..7780ed8a084a 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -26,34 +26,6 @@ struct special_entry {
unsigned char key; /* jump_label key */
 };
 
-static const struct special_entry entries[] = {
-   {
-   .sec = ".altinstructions",
-   .group = true,
-   .size = ALT_ENTRY_SIZE,
-   .orig = ALT_ORIG_OFFSET,
-   .orig_len = ALT_ORIG_LEN_OFFSET,
-   .new = ALT_NEW_OFFSET,
-   .new_len = ALT_NEW_LEN_OFFSET,
-   .feature = ALT_FEATURE_OFFSET,
-   },
-   {
-   .sec = "__jump_table",
-   .jump_or_nop = true,
-   .size = JUMP_ENTRY_SIZE,
-   .orig = JUMP_ORIG_OFFSET,
-   .new = JUMP_NEW_OFFSET,
-   .key = JUMP_KEY_OFFSET,
-   },
-   {
-   .sec = "__ex_table",
-   .size = EX_ENTRY_SIZE,
-   .orig = EX_ORIG_OFFSET,
-   .new = EX_NEW_OFFSET,
-   },
-   {},
-};
-
 void __weak arch_handle_alternative(unsigned short feature, struct special_alt 
*alt)
 {
 }
@@ -144,6 +116,33 @@ int special_get_alts(struct elf *elf, struct list_head 
*alts)
unsigned int nr_entries;
struct special_alt *alt;
int idx, ret;
+   const struct special_entry entries[] = {
+   {
+   .sec = ".altinstructions",
+   .group = true,
+   .size = ALT_ENTRY_SIZE,
+   .orig = ALT_ORIG_OFFSET,
+   .orig_len = ALT_ORIG_LEN_OFFSET,
+   .new = ALT_NEW_OFFSET,
+   .new_len = ALT_NEW_LEN_OFFSET,
+   .feature = ALT_FEATURE_OFFSET,
+   },
+   {
+   .sec = "__jump_table",
+   .jump_or_nop = true,
+   .size = JUMP_ENTRY_SIZE,
+   .orig = JUMP_ORIG_OFFSET,
+   .new = JUMP_NEW_OFFSET,
+   .key = JUMP_KEY_OFFSET,
+   },
+   {
+   .sec = "__ex_table",
+   .size = EX_ENTRY_SIZE,
+   .orig = EX_ORIG_OFFSET,
+   .new = EX_NEW_OFFSET,
+   },
+   {},
+   };
 
INIT_LIST_HEAD(alts);
 
-- 
2.47.0
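
As a quick standalone illustration of the size dependency described above
(not objtool code): with two s32 members followed by a native long, the entry
is 12 bytes on a 32-bit target and 16 bytes on a 64-bit one, which is exactly
what the '8 + elf_addr_size(elf)' expression computes for the target ELF.

#include <stdint.h>
#include <stdio.h>

struct jump_entry {
	int32_t code;
	int32_t target;
	long    key;	/* 4 bytes on a 32-bit target, 8 bytes on a 64-bit one */
};

int main(void)
{
	/* prints 12 when built with -m32 and 16 when built with -m64 (typical ABIs) */
	printf("sizeof(struct jump_entry) = %zu\n", sizeof(struct jump_entry));
	return 0;
}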




[PATCH v5 12/15] objtool: Add support for more complex UACCESS control

2025-01-15 Thread Christophe Leroy
On x86, UACCESS is controlled by two instructions: STAC and CLAC.
The STAC instruction enables UACCESS while CLAC disables it.
This makes it simple enough for objtool to locate UACCESS enable and
disable.

But on powerpc it is a bit more complex: the same instruction is
used for enabling and disabling UACCESS, and that instruction
can also be used for many other things. It would be too complex to rely
exclusively on instruction decoding.

To help objtool, annotate such instructions, on the same principle as
reachable/unreachable instructions. And add ANNOTATE_UACCESS_BEGIN
and ANNOTATE_UACCESS_END macros to be used in inline assembly code to
annotate UACCESS enable and UACCESS disable instructions.

Signed-off-by: Christophe Leroy 
---
v5: Use generic annotation infrastructure
---
 include/linux/objtool.h | 3 +++
 include/linux/objtool_types.h   | 2 ++
 tools/include/linux/objtool_types.h | 2 ++
 tools/objtool/check.c   | 8 
 4 files changed, 15 insertions(+)

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index c722a921165b..7efd731da2a2 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -183,6 +183,9 @@
  */
 #define ANNOTATE_REACHABLE(label)  __ASM_ANNOTATE(label, 
ANNOTYPE_REACHABLE)
 
+#define ANNOTATE_UACCESS_BEGIN ASM_ANNOTATE(ANNOTYPE_UACCESS_BEGIN)
+#define ANNOTATE_UACCESS_END   ASM_ANNOTATE(ANNOTYPE_UACCESS_END)
+
 #else
 #define ANNOTATE_NOENDBR   ANNOTATE type=ANNOTYPE_NOENDBR
 #define ANNOTATE_RETPOLINE_SAFEANNOTATE 
type=ANNOTYPE_RETPOLINE_SAFE
diff --git a/include/linux/objtool_types.h b/include/linux/objtool_types.h
index df5d9fa84dba..28da3d989e65 100644
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -65,5 +65,7 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS   6
 #define ANNOTYPE_INTRA_FUNCTION_CALL   7
 #define ANNOTYPE_REACHABLE 8
+#define ANNOTYPE_UACCESS_BEGIN 9
+#define ANNOTYPE_UACCESS_END   10
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
diff --git a/tools/include/linux/objtool_types.h 
b/tools/include/linux/objtool_types.h
index df5d9fa84dba..28da3d989e65 100644
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -65,5 +65,7 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS   6
 #define ANNOTYPE_INTRA_FUNCTION_CALL   7
 #define ANNOTYPE_REACHABLE 8
+#define ANNOTYPE_UACCESS_BEGIN 9
+#define ANNOTYPE_UACCESS_END   10
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 91436f4b3622..54625f09d831 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2422,6 +2422,14 @@ static int __annotate_late(struct objtool_file *file, 
int type, struct instructi
insn->dead_end = false;
break;
 
+   case ANNOTYPE_UACCESS_BEGIN:
+   insn->type = INSN_STAC;
+   break;
+
+   case ANNOTYPE_UACCESS_END:
+   insn->type = INSN_CLAC;
+   break;
+
default:
WARN_INSN(insn, "Unknown annotation type: %d", type);
break;
-- 
2.47.0




[PATCH v5 03/15] objtool: Allow an architecture to disable objtool on ASM files

2025-01-15 Thread Christophe Leroy
Supporting objtool on ASM files requires quite an effort.

Features like UACCESS validation don't require validation of ASM files.

In order to allow architectures to enable objtool validation
without spending unnecessary effort on cleaning up ASM files,
provide an option to disable objtool validation on ASM files.

Suggested-by: Naveen N Rao 
Signed-off-by: Christophe Leroy 
---
 arch/Kconfig | 5 +
 scripts/Makefile.lib | 4 
 2 files changed, 9 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 6682b2a53e34..137ef643e865 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1305,6 +1305,11 @@ config ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 config HAVE_OBJTOOL
bool
 
+config ARCH_OBJTOOL_SKIP_ASM
+   bool
+   help
+ Architecture doesn't support objtool on ASM files
+
 config HAVE_JUMP_LABEL_HACK
bool
 
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 7395200538da..3c5e6de76b11 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -325,7 +325,11 @@ define rule_cc_o_c
 endef
 
 quiet_cmd_as_o_S = AS $(quiet_modtag)  $@
+ifndef CONFIG_ARCH_OBJTOOL_SKIP_ASM
   cmd_as_o_S = $(CC) $(a_flags) -c -o $@ $< $(cmd_objtool)
+else
+  cmd_as_o_S = $(CC) $(a_flags) -c -o $@ $<
+endif
 
 define rule_as_o_S
$(call cmd_and_fixdep,as_o_S)
-- 
2.47.0




[PATCH v5 13/15] objtool: Prepare noreturns.h for more architectures

2025-01-15 Thread Christophe Leroy
noreturns.h is a mix of x86 specific functions and more generic
core functions.

In preparation of inclusion of powerpc, split x86 functions out of
noreturns.h into arch/noreturns.h

Signed-off-by: Christophe Leroy 
---
 .../objtool/arch/x86/include/arch/noreturns.h | 20 +++
 tools/objtool/noreturns.h | 14 ++---
 2 files changed, 22 insertions(+), 12 deletions(-)
 create mode 100644 tools/objtool/arch/x86/include/arch/noreturns.h

diff --git a/tools/objtool/arch/x86/include/arch/noreturns.h 
b/tools/objtool/arch/x86/include/arch/noreturns.h
new file mode 100644
index ..a4262aff3917
--- /dev/null
+++ b/tools/objtool/arch/x86/include/arch/noreturns.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This is a (sorted!) list of all known __noreturn functions in arch/x86.
+ * It's needed for objtool to properly reverse-engineer the control flow graph.
+ *
+ * Yes, this is unfortunate.  A better solution is in the works.
+ */
+NORETURN(cpu_bringup_and_idle)
+NORETURN(ex_handler_msr_mce)
+NORETURN(hlt_play_dead)
+NORETURN(hv_ghcb_terminate)
+NORETURN(machine_real_restart)
+NORETURN(rewind_stack_and_make_dead)
+NORETURN(sev_es_terminate)
+NORETURN(snp_abort)
+NORETURN(x86_64_start_kernel)
+NORETURN(x86_64_start_reservations)
+NORETURN(xen_cpu_bringup_again)
+NORETURN(xen_start_kernel)
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index f37614cc2c1b..dfee1c91a70d 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -1,5 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include 
+
 /*
  * This is a (sorted!) list of all known __noreturn functions in the kernel.
  * It's needed for objtool to properly reverse-engineer the control flow graph.
@@ -19,33 +21,21 @@ NORETURN(__x64_sys_exit_group)
 NORETURN(arch_cpu_idle_dead)
 NORETURN(bch2_trans_in_restart_error)
 NORETURN(bch2_trans_restart_error)
-NORETURN(cpu_bringup_and_idle)
 NORETURN(cpu_startup_entry)
 NORETURN(do_exit)
 NORETURN(do_group_exit)
 NORETURN(do_task_dead)
-NORETURN(ex_handler_msr_mce)
-NORETURN(hlt_play_dead)
-NORETURN(hv_ghcb_terminate)
 NORETURN(kthread_complete_and_exit)
 NORETURN(kthread_exit)
 NORETURN(kunit_try_catch_throw)
-NORETURN(machine_real_restart)
 NORETURN(make_task_dead)
 NORETURN(mpt_halt_firmware)
 NORETURN(nmi_panic_self_stop)
 NORETURN(panic)
 NORETURN(panic_smp_self_stop)
 NORETURN(rest_init)
-NORETURN(rewind_stack_and_make_dead)
 NORETURN(rust_begin_unwind)
 NORETURN(rust_helper_BUG)
-NORETURN(sev_es_terminate)
-NORETURN(snp_abort)
 NORETURN(start_kernel)
 NORETURN(stop_this_cpu)
 NORETURN(usercopy_abort)
-NORETURN(x86_64_start_kernel)
-NORETURN(x86_64_start_reservations)
-NORETURN(xen_cpu_bringup_again)
-NORETURN(xen_start_kernel)
-- 
2.47.0




[PATCH v5 07/15] objtool: Merge mark_func_jump_tables() and add_func_jump_tables()

2025-01-15 Thread Christophe Leroy
Those two functions both loop over the instructions of a function.
Merge the two loops in order to ease the enhancement of switch table end
detection in a following patch.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/check.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 4495e7823b29..613d169eb6b8 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2061,10 +2061,11 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
  * First pass: Mark the head of each jump table so that in the next pass,
  * we know when a given jump table ends and the next one starts.
  */
-static void mark_func_jump_tables(struct objtool_file *file,
-   struct symbol *func)
+static int mark_add_func_jump_tables(struct objtool_file *file,
+struct symbol *func)
 {
-   struct instruction *insn, *last = NULL;
+   struct instruction *insn, *last = NULL, *insn_t1 = NULL, *insn_t2;
+   int ret = 0;
 
func_for_each_insn(file, func, insn) {
if (!last)
@@ -2088,16 +2089,7 @@ static void mark_func_jump_tables(struct objtool_file 
*file,
continue;
 
find_jump_table(file, func, insn);
-   }
-}
 
-static int add_func_jump_tables(struct objtool_file *file,
- struct symbol *func)
-{
-   struct instruction *insn, *insn_t1 = NULL, *insn_t2;
-   int ret = 0;
-
-   func_for_each_insn(file, func, insn) {
if (!insn_jump_table(insn))
continue;
 
@@ -2138,8 +2130,7 @@ static int add_jump_table_alts(struct objtool_file *file)
if (func->type != STT_FUNC)
continue;
 
-   mark_func_jump_tables(file, func);
-   ret = add_func_jump_tables(file, func);
+   ret = mark_add_func_jump_tables(file, func);
if (ret)
return ret;
}
-- 
2.47.0




[PATCH v5 15/15] powerpc: Implement UACCESS validation on PPC32

2025-01-15 Thread Christophe Leroy
In order to implement UACCESS validation, objtool support
for powerpc needs to be enhanced to decode more instructions.

It also requires implementing switch table finding.
On PPC32 it is similar to x86: switch tables are anonymous in .rodata,
the difference being that each entry holds a value relative to the table
rather than an absolute address.

Another big difference is that several switch tables can be nested,
so the register containing the table base address also needs to be
tracked and taken into account.

Don't activate it for Clang for now because its switch tables are
different from GCC's switch tables.

Then come the UACCESS enabling/disabling instructions. On booke and
8xx this is done with a mtspr instruction. For 8xx that's in SPRN_MD_AP,
for booke that's in SPRN_PID. Annotate those instructions.

No work has been done for ASM files; they are not used for UACCESS,
so for the moment just tell objtool to ignore ASM files.

For relocatable code, the .got2 relocation preceding each global
function needs to be marked as ignored because some versions of GCC
do this:

 120:   00 00 00 00 .long 0x0
 120: R_PPC_REL32   .got2+0x7ff0

0124 :
 124:   94 21 ff f0 stwu    r1,-16(r1)
 128:   7c 08 02 a6 mflr    r0
 12c:   42 9f 00 05 bcl 20,4*cr7+so,130 
 130:   39 00 00 00 li  r8,0
 134:   39 20 00 08 li  r9,8
 138:   93 c1 00 08 stw r30,8(r1)
 13c:   7f c8 02 a6 mflr    r30
 140:   90 01 00 14 stw r0,20(r1)
 144:   80 1e ff f0 lwz r0,-16(r30)
 148:   7f c0 f2 14 add r30,r0,r30
 14c:   81 5e 80 00 lwz r10,-32768(r30)
 150:   80 fe 80 04 lwz r7,-32764(r30)

Also declare longjmp() and start_secondary_resume() as global noreturn
functions, and declare __copy_tofrom_user() and __arch_clear_user()
as UACCESS safe.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |   2 +
 arch/powerpc/include/asm/book3s/32/kup.h  |   2 +
 arch/powerpc/include/asm/nohash/32/kup-8xx.h  |   4 +-
 arch/powerpc/include/asm/nohash/kup-booke.h   |   4 +-
 arch/powerpc/kexec/core_32.c  |   4 +-
 arch/powerpc/mm/nohash/kup.c  |   2 +
 tools/objtool/arch/powerpc/decode.c   | 150 +-
 .../arch/powerpc/include/arch/noreturns.h |  11 ++
 .../arch/powerpc/include/arch/special.h   |   9 ++
 tools/objtool/arch/powerpc/special.c  |  37 -
 tools/objtool/check.c |   6 +-
 11 files changed, 216 insertions(+), 15 deletions(-)
 create mode 100644 tools/objtool/arch/powerpc/include/arch/noreturns.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a0ce777f9706..525ab52b79fb 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -162,6 +162,7 @@ config PPC
select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if PPC_RADIX_MMU
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+   select ARCH_OBJTOOL_SKIP_ASM
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
select ARCH_SPLIT_ARG64 if PPC32
@@ -267,6 +268,7 @@ config PPC
select HAVE_OPTPROBES
select HAVE_OBJTOOL if 
ARCH_USING_PATCHABLE_FUNCTION_ENTRY || MPROFILE_KERNEL || PPC32
select HAVE_OBJTOOL_MCOUNT  if HAVE_OBJTOOL
+   select HAVE_UACCESS_VALIDATION  if HAVE_OBJTOOL && PPC_KUAP && 
PPC32 && CC_IS_GCC
select HAVE_PERF_EVENTS
select HAVE_PERF_EVENTS_NMI if PPC64
select HAVE_PERF_REGS
diff --git a/arch/powerpc/include/asm/book3s/32/kup.h 
b/arch/powerpc/include/asm/book3s/32/kup.h
index 4e14a5427a63..9e158b1dd3a6 100644
--- a/arch/powerpc/include/asm/book3s/32/kup.h
+++ b/arch/powerpc/include/asm/book3s/32/kup.h
@@ -34,6 +34,7 @@ static __always_inline void uaccess_begin_32s(unsigned long 
addr)
asm volatile(ASM_MMU_FTR_IFSET(
"mfsrin %0, %1;"
"rlwinm %0, %0, 0, %2;"
+   ANNOTATE_UACCESS_BEGIN
"mtsrin %0, %1;"
"isync", "", %3)
: "=&r"(tmp)
@@ -48,6 +49,7 @@ static __always_inline void uaccess_end_32s(unsigned long 
addr)
asm volatile(ASM_MMU_FTR_IFSET(
"mfsrin %0, %1;"
"oris %0, %0, %2;"
+   ANNOTATE_UACCESS_END
"mtsrin %0, %1;"
"isync", "", %3)
: "=&r"(tmp)
diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h 
b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
index 46bc5925e5fd..8f3a859fe0a1 100644
--- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
@@ -39,13 +39,13 @@ static __always_inline unsigned long 
__kuap_get_and_assert_locked(void)
 
 static __always_inline void uacces

[PATCH v5 08/15] objtool: Track general purpose register used for switch table base

2025-01-15 Thread Christophe Leroy
A function can contain nested switch tables using different registers
as base address.

In order to avoid failure in tracking those switch tables, the register
containing the base address needs to be taken into account.

To do so, add a 5-bit field to struct instruction that will hold the
ID of the register containing the base address of the switch table, and
take that register into account during the backward search in order
not to stop the walk when encountering a jump related to another switch
table.

On architectures not handling it, the ID stays zero and has no impact
on the search.

To enable that, also provide to arch_find_switch_table() the dynamic
jump instruction related to the table search.

Also allow prev_insn_same_sec() to be used outside check.c so that
architectures can walk backward through instructions to find out which
register is used as the base address for a switch table.

Signed-off-by: Christophe Leroy 
---
 tools/objtool/arch/powerpc/special.c| 3 ++-
 tools/objtool/arch/x86/special.c| 3 ++-
 tools/objtool/check.c   | 9 +
 tools/objtool/include/objtool/check.h   | 6 --
 tools/objtool/include/objtool/special.h | 3 ++-
 5 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/objtool/arch/powerpc/special.c 
b/tools/objtool/arch/powerpc/special.c
index 51610689abf7..0b3a766c4842 100644
--- a/tools/objtool/arch/powerpc/special.c
+++ b/tools/objtool/arch/powerpc/special.c
@@ -14,7 +14,8 @@ bool arch_support_alt_relocation(struct special_alt 
*special_alt,
 
 struct reloc *arch_find_switch_table(struct objtool_file *file,
 struct instruction *insn,
-unsigned long *table_size)
+unsigned long *table_size,
+struct instruction *orig_insn)
 {
exit(-1);
 }
diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
index 76c7933bcb19..b0147923a70c 100644
--- a/tools/objtool/arch/x86/special.c
+++ b/tools/objtool/arch/x86/special.c
@@ -110,7 +110,8 @@ bool arch_support_alt_relocation(struct special_alt 
*special_alt,
  */
 struct reloc *arch_find_switch_table(struct objtool_file *file,
 struct instruction *insn,
-unsigned long *table_size)
+unsigned long *table_size,
+struct instruction *orig_insn)
 {
struct reloc  *text_reloc, *rodata_reloc;
struct section *table_sec;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 613d169eb6b8..72b977f81dd6 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -81,8 +81,8 @@ static struct instruction *next_insn_same_func(struct 
objtool_file *file,
return find_insn(file, func->cfunc->sec, func->cfunc->offset);
 }
 
-static struct instruction *prev_insn_same_sec(struct objtool_file *file,
- struct instruction *insn)
+struct instruction *prev_insn_same_sec(struct objtool_file *file,
+  struct instruction *insn)
 {
if (insn->idx == 0) {
if (insn->prev_len)
@@ -2028,7 +2028,8 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
 insn && insn_func(insn) && insn_func(insn)->pfunc == func;
 insn = insn->first_jump_src ?: prev_insn_same_sym(file, insn)) {
 
-   if (insn != orig_insn && insn->type == INSN_JUMP_DYNAMIC)
+   if (insn != orig_insn && insn->type == INSN_JUMP_DYNAMIC &&
+   insn->gpr == orig_insn->gpr)
break;
 
/* allow small jumps within the range */
@@ -2038,7 +2039,7 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
 insn->jump_dest->offset > orig_insn->offset))
break;
 
-   table_reloc = arch_find_switch_table(file, insn, &table_size);
+   table_reloc = arch_find_switch_table(file, insn, &table_size, 
orig_insn);
if (!table_reloc)
continue;
 
diff --git a/tools/objtool/include/objtool/check.h 
b/tools/objtool/include/objtool/check.h
index e1cd13cd28a3..8b68f840dddb 100644
--- a/tools/objtool/include/objtool/check.h
+++ b/tools/objtool/include/objtool/check.h
@@ -63,8 +63,9 @@ struct instruction {
noendbr : 1,
unret   : 1,
visited : 4,
-   no_reloc: 1;
-   /* 10 bit hole */
+   no_reloc: 1,
+   gpr : 5;
+   /* 5 bit hole */
 
struct alt_group *alt_group;
struct instruction *jump_dest;
@@ -118,6 +119,7 @@ struct instruction *find_insn(struct objtool_file *file,
  struct section *sec, unsigne

[PATCH v5 02/15] objtool: Move back misplaced comment

2025-01-15 Thread Christophe Leroy
A comment was introduced by commit 113d4bc90483 ("objtool: Fix
clang switch table edge case") and wrongly moved by
commit d871f7b5a6a2 ("objtool: Refactor jump table code to support
other architectures") without the piece of code added with the
comment in the original commit.

Fixes: d871f7b5a6a2 ("objtool: Refactor jump table code to support other 
architectures")
Signed-off-by: Christophe Leroy 
---
 tools/objtool/arch/x86/special.c | 5 -
 tools/objtool/check.c| 6 ++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
index 9c1c9df09aaa..76c7933bcb19 100644
--- a/tools/objtool/arch/x86/special.c
+++ b/tools/objtool/arch/x86/special.c
@@ -142,11 +142,6 @@ struct reloc *arch_find_switch_table(struct objtool_file 
*file,
strcmp(table_sec->name, C_JUMP_TABLE_SECTION))
return NULL;
 
-   /*
-* Each table entry has a rela associated with it.  The rela
-* should reference text in the same function as the original
-* instruction.
-*/
rodata_reloc = find_reloc_by_dest(file->elf, table_sec, table_offset);
if (!rodata_reloc)
return NULL;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 129c4e2245ae..58d9b1a750e3 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2036,6 +2036,12 @@ static void find_jump_table(struct objtool_file *file, 
struct symbol *func,
table_reloc = arch_find_switch_table(file, insn, &table_size);
if (!table_reloc)
continue;
+
+   /*
+* Each table entry has a rela associated with it.  The rela
+* should reference text in the same function as the original
+* instruction.
+*/
dest_insn = find_insn(file, table_reloc->sym->sec, 
reloc_addend(table_reloc));
if (!dest_insn || !insn_func(dest_insn) || 
insn_func(dest_insn)->pfunc != func)
continue;
-- 
2.47.0




Re: [PATCH 1/2] PCI: dwc: dra7xx: Use syscon_regmap_lookup_by_phandle_args

2025-01-15 Thread Bjorn Helgaas
On Sun, Jan 12, 2025 at 02:39:02PM +0100, Krzysztof Kozlowski wrote:
> Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over
> syscon_regmap_lookup_by_phandle() combined with getting the syscon
> argument.  Apart from simpler code, this annotates within one line that the
> given phandle has arguments, so grepping for the code is easier.
> 
> There is also no real benefit in printing errors on missing syscon
> argument, because this is done just too late: runtime check on
> static/build-time data.  Dtschema and Devicetree bindings offer the
> static/build-time check for this already.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/pci/controller/dwc/pci-dra7xx.c | 27 ++-
>  1 file changed, 6 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
> b/drivers/pci/controller/dwc/pci-dra7xx.c
> index 
> 5c62e1a3ba52919afe96fbcbc6edaf70775a69cb..33d6bf460ffe5bb724a061558dd93ec7bdadc336
>  100644
> --- a/drivers/pci/controller/dwc/pci-dra7xx.c
> +++ b/drivers/pci/controller/dwc/pci-dra7xx.c
> @@ -635,30 +635,20 @@ static int dra7xx_pcie_unaligned_memaccess(struct 
> device *dev)
>  {
>   int ret;
>   struct device_node *np = dev->of_node;
> - struct of_phandle_args args;
> + unsigned int args[2];
>   struct regmap *regmap;
>  
> - regmap = syscon_regmap_lookup_by_phandle(np,
> -  "ti,syscon-unaligned-access");
> + regmap = syscon_regmap_lookup_by_phandle_args(np, 
> "ti,syscon-unaligned-access",
> +   2, args);
>   if (IS_ERR(regmap)) {
>   dev_dbg(dev, "can't get ti,syscon-unaligned-access\n");
>   return -EINVAL;
>   }
>  
> - ret = of_parse_phandle_with_fixed_args(np, "ti,syscon-unaligned-access",
> -2, 0, &args);
> - if (ret) {
> - dev_err(dev, "failed to parse ti,syscon-unaligned-access\n");
> - return ret;
> - }
> -
> - ret = regmap_update_bits(regmap, args.args[0], args.args[1],
> -  args.args[1]);
> + ret = regmap_update_bits(regmap, args[0], args[1], args[1]);
>   if (ret)
>   dev_err(dev, "failed to enable unaligned access\n");
>  
> - of_node_put(args.np);
> -
>   return ret;
>  }
>  
> @@ -671,18 +661,13 @@ static int dra7xx_pcie_configure_two_lane(struct device 
> *dev,
>   u32 mask;
>   u32 val;
>  
> - pcie_syscon = syscon_regmap_lookup_by_phandle(np, "ti,syscon-lane-sel");
> + pcie_syscon = syscon_regmap_lookup_by_phandle_args(np, 
> "ti,syscon-lane-sel",
> +1, &pcie_reg);
>   if (IS_ERR(pcie_syscon)) {
>   dev_err(dev, "unable to get ti,syscon-lane-sel\n");
>   return -EINVAL;
>   }
>  
> - if (of_property_read_u32_index(np, "ti,syscon-lane-sel", 1,
> -&pcie_reg)) {
> - dev_err(dev, "couldn't get lane selection reg offset\n");
> - return -EINVAL;
> - }

Wow.  I believe you that syscon_regmap_lookup_by_phandle_args() is
equivalent to both:

  - syscon_regmap_lookup_by_phandle() followed by
of_parse_phandle_with_fixed_args(), and

  - syscon_regmap_lookup_by_phandle() followed by
of_property_read_u32_index()

but I can't say it's obvious to this syscon- and OF-naive reviewer,
even after tracing a few layers in :)

Bjorn



Re: [PATCH] selftests: livepatch: handle PRINTK_CALLER in check_result()

2025-01-15 Thread Madhavan Srinivasan



On 1/15/25 11:40 PM, Joe Lawrence wrote:
> On Tue, Jan 14, 2025 at 08:01:44PM +0530, Madhavan Srinivasan wrote:
>> Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which
>> adds the caller id as part of the dmesg output. Due to this, even though
>> the expected and observed outputs are the same, the testcase results are
>> reported as failed.
>>
>>  -% insmod test_modules/test_klp_livepatch.ko
>>  -livepatch: enabling patch 'test_klp_livepatch'
>>  -livepatch: 'test_klp_livepatch': initializing patching transition
>>  -livepatch: 'test_klp_livepatch': starting patching transition
>>  -livepatch: 'test_klp_livepatch': completing patching transition
>>  -livepatch: 'test_klp_livepatch': patching complete
>>  -% echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>>  -livepatch: 'test_klp_livepatch': initializing unpatching transition
>>  -livepatch: 'test_klp_livepatch': starting unpatching transition
>>  -livepatch: 'test_klp_livepatch': completing unpatching transition
>>  -livepatch: 'test_klp_livepatch': unpatching complete
>>  -% rmmod test_klp_livepatch
>>  +[   T3659] % insmod test_modules/test_klp_livepatch.ko
>>  +[   T3682] livepatch: enabling patch 'test_klp_livepatch'
>>  +[   T3682] livepatch: 'test_klp_livepatch': initializing patching 
>> transition
>>  +[   T3682] livepatch: 'test_klp_livepatch': starting patching transition
>>  +[T826] livepatch: 'test_klp_livepatch': completing patching transition
>>  +[T826] livepatch: 'test_klp_livepatch': patching complete
>>  +[   T3659] % echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>>  +[   T3659] livepatch: 'test_klp_livepatch': initializing unpatching 
>> transition
>>  +[   T3659] livepatch: 'test_klp_livepatch': starting unpatching transition
>>  +[T789] livepatch: 'test_klp_livepatch': completing unpatching 
>> transition
>>  +[T789] livepatch: 'test_klp_livepatch': unpatching complete
>>  +[   T3659] % rmmod test_klp_livepatch
>>
>>   ERROR: livepatch kselftest(s) failed
>>  not ok 1 selftests: livepatch: test-livepatch.sh # exit=1
>>
>> Currently the check_result() handles the "[time]" removal from
>> the dmesg. Enhance the check to handle removal of "[Tid]" also.
>>
>> Signed-off-by: Madhavan Srinivasan 
>> ---
>>  tools/testing/selftests/livepatch/functions.sh | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/livepatch/functions.sh 
>> b/tools/testing/selftests/livepatch/functions.sh
>> index e5d06fb40233..a1730c1864a4 100644
>> --- a/tools/testing/selftests/livepatch/functions.sh
>> +++ b/tools/testing/selftests/livepatch/functions.sh
>> @@ -306,7 +306,8 @@ function check_result {
>>  result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { 
>> p=1 }' | \
>>   grep -e 'livepatch:' -e 'test_klp' | \
>>   grep -v '\(tainting\|taints\) kernel' | \
>> - sed 's/^\[[ 0-9.]*\] //')
>> + sed 's/^\[[ 0-9.]*\] //' | \
>> + sed 's/^\[[ ]*T[0-9]*\] //')
> 
> Thanks for adding this to the filter.
> 
> If I read the PRINTK_CALLER docs correctly, there is a potential CPU
> identifier as well.  Are there any instances where the livepatching code
> will use the "[C$processor_id]" (out of task context) prefix?  Or would
> it hurt to future proof with [CT][0-9]?

Thanks for the review.

Yeah, I saw that case, but in my current build and boot tests I have only seen
the thread id added, so I sent this out to fix that. I did not get to create a
"processor id" scenario, so I can't test it at this point. A combined filter
along the lines you suggest is sketched below.
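
An untested sketch of a combined filter covering both prefixes (your [CT]
suggestion) would be something like:

  # untested: strip "[  12.345678] ", "[   T1234] " and "[  C7] " prefixes
  dmesg | sed 's/^\[[ 0-9.]*\] //; s/^\[ *[CT][0-9]*\] //'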

Maddy

> 
> Acked-by: Joe Lawrence 
> 
> --
> Joe
> 
>>  
>>  if [[ "$expect" == "$result" ]] ; then
>>  echo "ok"
>> -- 
>> 2.47.0
>>
> 




Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()

2025-01-15 Thread Charlie Jenkins
On Mon, Jan 13, 2025 at 07:11:40PM +0200, Dmitry V. Levin wrote:
> These functions are going to be needed on all HAVE_ARCH_TRACEHOOK
> architectures to implement PTRACE_SET_SYSCALL_INFO API.
> 
> This partially reverts commit 7962c2eddbfe ("arch: remove unused
> function syscall_set_arguments()") by reusing some of old
> syscall_set_arguments() implementations.
> 
> Signed-off-by: Dmitry V. Levin 
> ---
> 
> Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by
> looking at mips_get_syscall_arg() and the result passes tests in qemu on
> mips O32, mips64 O32, mips64 N32, and mips64 N64.
> 
>  arch/arc/include/asm/syscall.h| 14 +++
>  arch/arm/include/asm/syscall.h| 13 ++
>  arch/arm64/include/asm/syscall.h  | 13 ++
>  arch/csky/include/asm/syscall.h   | 13 ++
>  arch/hexagon/include/asm/syscall.h| 14 +++
>  arch/loongarch/include/asm/syscall.h  |  8 ++
>  arch/mips/include/asm/syscall.h   | 32 
>  arch/nios2/include/asm/syscall.h  | 11 
>  arch/openrisc/include/asm/syscall.h   |  7 ++
>  arch/parisc/include/asm/syscall.h | 12 +
>  arch/powerpc/include/asm/syscall.h| 10 
>  arch/riscv/include/asm/syscall.h  |  9 +++
>  arch/s390/include/asm/syscall.h   | 12 +
>  arch/sh/include/asm/syscall_32.h  | 12 +
>  arch/sparc/include/asm/syscall.h  | 10 
>  arch/um/include/asm/syscall-generic.h | 14 +++
>  arch/x86/include/asm/syscall.h| 36 +++
>  arch/xtensa/include/asm/syscall.h | 11 
>  include/asm-generic/syscall.h | 16 
>  19 files changed, 267 insertions(+)
> 
> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
> index 9709256e31c8..89c1e1736356 100644
> --- a/arch/arc/include/asm/syscall.h
> +++ b/arch/arc/include/asm/syscall.h
> @@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct 
> pt_regs *regs,
>   }
>  }
>  
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> +   unsigned long *args)
> +{
> + unsigned long *inside_ptregs = ®s->r0;
> + unsigned int n = 6;
> + unsigned int i = 0;
> +
> + while (n--) {
> + *inside_ptregs = args[i++];
> + inside_ptregs--;
> + }
> +}
> +
>  static inline int
>  syscall_get_arch(struct task_struct *task)
>  {
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index fe4326d938c1..21927fa0ae2b 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct 
> task_struct *task,
>   memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0]));
>  }
>  
> +static inline void syscall_set_arguments(struct task_struct *task,
> +  struct pt_regs *regs,
> +  const unsigned long *args)
> +{
> + memcpy(®s->ARM_r0, args, 6 * sizeof(args[0]));
> + /*
> +  * Also copy the first argument into ARM_ORIG_r0
> +  * so that syscall_get_arguments() would return it
> +  * instead of the previous value.
> +  */
> + regs->ARM_ORIG_r0 = regs->ARM_r0;
> +}
> +
>  static inline int syscall_get_arch(struct task_struct *task)
>  {
>   /* ARM tasks don't change audit architectures on the fly. */
> diff --git a/arch/arm64/include/asm/syscall.h 
> b/arch/arm64/include/asm/syscall.h
> index ab8e14b96f68..76020b66286b 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct 
> task_struct *task,
>   memcpy(args, ®s->regs[1], 5 * sizeof(args[0]));
>  }
>  
> +static inline void syscall_set_arguments(struct task_struct *task,
> +  struct pt_regs *regs,
> +  const unsigned long *args)
> +{
> + memcpy(®s->regs[0], args, 6 * sizeof(args[0]));
> + /*
> +  * Also copy the first argument into orig_x0
> +  * so that syscall_get_arguments() would return it
> +  * instead of the previous value.
> +  */
> + regs->orig_x0 = regs->regs[0];
> +}
> +
>  /*
>   * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
>   * AArch64 has the same system calls both on little- and big- endian.
> diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
> index 0de5734950bf..30403f7a0487 100644
> --- a/arch/csky/include/asm/syscall.h
> +++ b/arch/csky/include/asm/syscall.h
> @@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct 
> pt_regs *regs,
>   memcpy(args, ®s->a1, 5 * sizeof(args[0]));
>  }
>  
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> +   const unsigned long *args)

Re: [PATCH v2 4/7] syscall.h: introduce syscall_set_nr()

2025-01-15 Thread Charlie Jenkins
On Mon, Jan 13, 2025 at 07:11:51PM +0200, Dmitry V. Levin wrote:
> Similar to syscall_set_arguments() that complements
> syscall_get_arguments(), introduce syscall_set_nr()
> that complements syscall_get_nr().
> 
> syscall_set_nr() is going to be needed along with
> syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
> architectures to implement PTRACE_SET_SYSCALL_INFO API.
> 
> Signed-off-by: Dmitry V. Levin 
> ---
>  arch/arc/include/asm/syscall.h| 11 +++
>  arch/arm/include/asm/syscall.h| 24 
>  arch/arm64/include/asm/syscall.h  | 16 
>  arch/hexagon/include/asm/syscall.h|  7 +++
>  arch/loongarch/include/asm/syscall.h  |  7 +++
>  arch/m68k/include/asm/syscall.h   |  7 +++
>  arch/microblaze/include/asm/syscall.h |  7 +++
>  arch/mips/include/asm/syscall.h   | 14 ++
>  arch/nios2/include/asm/syscall.h  |  5 +
>  arch/openrisc/include/asm/syscall.h   |  6 ++
>  arch/parisc/include/asm/syscall.h |  7 +++
>  arch/powerpc/include/asm/syscall.h| 10 ++
>  arch/riscv/include/asm/syscall.h  |  7 +++
>  arch/s390/include/asm/syscall.h   | 12 
>  arch/sh/include/asm/syscall_32.h  | 12 
>  arch/sparc/include/asm/syscall.h  | 12 
>  arch/um/include/asm/syscall-generic.h |  5 +
>  arch/x86/include/asm/syscall.h|  7 +++
>  arch/xtensa/include/asm/syscall.h |  7 +++
>  include/asm-generic/syscall.h | 14 ++
>  20 files changed, 197 insertions(+)
> 
> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
> index 89c1e1736356..728d625a10f1 100644
> --- a/arch/arc/include/asm/syscall.h
> +++ b/arch/arc/include/asm/syscall.h
> @@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs 
> *regs)
>   return -1;
>  }
>  
> +static inline void
> +syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + /*
> +  * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> +  * the target task is stopped for tracing on entering syscall, so
> +  * there is no need to have the same check syscall_get_nr() has.
> +  */
> + regs->r8 = nr;
> +}
> +
>  static inline void
>  syscall_rollback(struct task_struct *task, struct pt_regs *regs)
>  {
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index 21927fa0ae2b..18b102a30741 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct 
> task_struct *task,
>   regs->ARM_r0 = (long) error ? error : val;
>  }
>  
> +static inline void syscall_set_nr(struct task_struct *task,
> +   struct pt_regs *regs,
> +   int nr)
> +{
> + if (nr == -1) {
> + task_thread_info(task)->abi_syscall = -1;
> + /*
> +  * When the syscall number is set to -1, the syscall will be
> +  * skipped.  In this case the syscall return value has to be
> +  * set explicitly, otherwise the first syscall argument is
> +  * returned as the syscall return value.
> +  */
> + syscall_set_return_value(task, regs, -ENOSYS, 0);
> + return;
> + }
> + if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
> + task_thread_info(task)->abi_syscall = nr;
> + return;
> + }
> + task_thread_info(task)->abi_syscall =
> + (task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
> + (nr & __NR_SYSCALL_MASK);
> +}
> +
>  #define SYSCALL_MAX_ARGS 7
>  
>  static inline void syscall_get_arguments(struct task_struct *task,
> diff --git a/arch/arm64/include/asm/syscall.h 
> b/arch/arm64/include/asm/syscall.h
> index 76020b66286b..712daa90e643 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct 
> task_struct *task,
>   regs->regs[0] = val;
>  }
>  
> +static inline void syscall_set_nr(struct task_struct *task,
> +   struct pt_regs *regs,
> +   int nr)
> +{
> + regs->syscallno = nr;
> + if (nr == -1) {
> + /*
> +  * When the syscall number is set to -1, the syscall will be
> +  * skipped.  In this case the syscall return value has to be
> +  * set explicitly, otherwise the first syscall argument is
> +  * returned as the syscall return value.
> +  */
> + syscall_set_return_value(task, regs, -ENOSYS, 0);
> + }
> +}
> +
>  #define SYSCALL_MAX_ARGS 6
>  
>  static inline void syscall_get_arguments(struct task_struct *task,
> diff --git a/arch/hexagon/include/asm/syscall.h 
> b/

RE: [PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-01-15 Thread Wei Fang
> On Mon, 13 Jan 2025 16:22:39 +0800 Wei Fang wrote:
> > Add Receive side scaling (RSS) support for i.MX95 ENETC PF to improve the
> > network performance and balance the CPU load. In addition, since both
> > ENETC v1 and ENETC v4 only support the Toeplitz algorithm, a check for
> > hfunc was added.
> 
> This and previous commits are a bit hard to follow. You plumb some
> stuff thru in the previous commit. In this one you reshuffle things,
> again. Try to separate code movement / restructuring in one commit.
> And new additions more clearly in the next.

Okay, I will.

> 
> > +static void enetc4_set_rss_key(struct enetc_hw *hw, const u8 *key)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
> > +   enetc_port_wr(hw, ENETC4_PRSSKR(i), ((u32 *)key)[i]);
> > +}
> > +
> > +static void enetc4_get_rss_key(struct enetc_hw *hw, u8 *key)
> > +{
> > +   int i;
> > +
> > +   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
> > +   ((u32 *)key)[i] = enetc_port_rd(hw, ENETC4_PRSSKR(i));
> > +}
> 
> Isn't the only difference between the chips the register offset?
Yes.

> Why create full ops for something this trivial?

We add enetc_pf_hw_ops to implement different hardware ops for different
chips, so that they can be called from common functions (a rough sketch of
the pattern is below). Although the change is minor, it is consistent with
the original intention of adding enetc_pf_hw_ops.
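
A minimal sketch of that pattern (names are illustrative, not the real
enetc structures):

  struct example_hw;

  struct example_pf_hw_ops {
          void (*set_rss_key)(struct example_hw *hw, const unsigned char *key);
  };

  struct example_pf {
          struct example_hw *hw;
          const struct example_pf_hw_ops *ops;    /* per-chip: v1 ops or v4 ops */
  };

  /* Common code stays chip-agnostic and never checks the revision. */
  static inline void example_set_rss_key(struct example_pf *pf,
                                         const unsigned char *key)
  {
          pf->ops->set_rss_key(pf->hw, key);
  }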

> 
> > +static int enetc4_get_rxnfc(struct net_device *ndev, struct ethtool_rxnfc
> *rxnfc,
> > +   u32 *rule_locs)
> > +{
> > +   struct enetc_ndev_priv *priv = netdev_priv(ndev);
> > +
> > +   switch (rxnfc->cmd) {
> > +   case ETHTOOL_GRXRINGS:
> > +   rxnfc->data = priv->num_rx_rings;
> > +   break;
> > +   case ETHTOOL_GRXFH:
> > +   return enetc_get_rsshash(rxnfc);
> > +   default:
> > +   return -EOPNOTSUPP;
> > +   }
> > +
> > +   return 0;
> > +}
> 
> Why add a new function instead of returning EOPNOTSUPP for new chips
> in the existing one?

We will add ETHTOOL_G/SRXCLSXXX in the future, but both the hardware and
software implementations of ENETC4 are different from ENETC1, and we don't
want to mix them in one function, which would look a bit messy.

> 
> > @@ -712,6 +730,12 @@ static int enetc_set_rxfh(struct net_device *ndev,
> > struct enetc_hw *hw = &si->hw;
> > int err = 0;
> >
> > +   if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
> > +   rxfh->hfunc != ETH_RSS_HASH_TOP) {
> > +   netdev_err(ndev, "Only toeplitz hash function is supported\n");
> > +   return -EOPNOTSUPP;
> 
> Should be a separate commit.
> --
> pw-bot: cr



[PATCH v5 05/15] objtool: Add INSN_RETURN_CONDITIONAL

2025-01-15 Thread Christophe Leroy
Most functions have an unconditional return at the end, like
this one:

 :
   0:   81 22 04 d0 lwz r9,1232(r2)
   4:   38 60 00 00 li  r3,0
   8:   2c 09 00 00 cmpwi   r9,0
   c:   4d 82 00 20 beqlr   <== Conditional return
  10:   80 69 00 a0 lwz r3,160(r9)
  14:   54 63 00 36 clrrwi  r3,r3,4
  18:   68 63 04 00 xorir3,r3,1024
  1c:   7c 63 00 34 cntlzw  r3,r3
  20:   54 63 d9 7e srwir3,r3,5
  24:   4e 80 00 20 blr <== Unconditional return

But other functions like this other one below only have
conditional returns:

0028 :
  28:   81 25 00 00 lwz r9,0(r5)
  2c:   2c 08 00 00 cmpwi   r8,0
  30:   7d 29 30 78 andcr9,r9,r6
  34:   7d 27 3b 78 or  r7,r9,r7
  38:   54 84 65 3a rlwinm  r4,r4,12,20,29
  3c:   81 23 00 18 lwz r9,24(r3)
  40:   41 82 00 58 beq 98 
  44:   7d 29 20 2e lwzxr9,r9,r4
  48:   55 29 07 3a rlwinm  r9,r9,0,28,29
  4c:   2c 09 00 0c cmpwi   r9,12
  50:   41 82 00 08 beq 58 
  54:   39 00 00 80 li  r8,128
  58:   2c 08 00 01 cmpwi   r8,1
  5c:   90 e5 00 00 stw r7,0(r5)
  60:   4d a2 00 20 beqlr+  <== Conditional return
  64:   7c e9 3b 78 mr  r9,r7
  68:   39 40 00 00 li  r10,0
  6c:   39 4a 00 04 addir10,r10,4
  70:   7c 0a 40 00 cmpwr10,r8
  74:   91 25 00 04 stw r9,4(r5)
  78:   91 25 00 08 stw r9,8(r5)
  7c:   38 a5 00 10 addir5,r5,16
  80:   91 25 ff fc stw r9,-4(r5)
  84:   4c 80 00 20 bgelr   <== Conditional return
  88:   55 49 60 26 slwir9,r10,12
  8c:   7d 29 3a 14 add r9,r9,r7
  90:   91 25 00 00 stw r9,0(r5)
  94:   4b ff ff d8 b   6c 
  98:   39 00 00 04 li  r8,4
  9c:   4b ff ff bc b   58 

If conditional returns are decoded as INSN_OTHER, objtool considers
that the second function never returns.

If conditional returns are decoded as INSN_RETURN, objtool considers
that code after that conditional return is dead.

To overcome this situation, introduce INSN_RETURN_CONDITIONAL, which is
taken as confirmation that a function is not noreturn while still treating
the code that follows as reachable.
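
As a stand-alone illustration of the distinction (this is only a sketch of
the idea, not the decoder added by this series), the bclr forms from the
listings above can be classified like this:

  #include <stdio.h>
  #include <stdint.h>

  enum insn_type { INSN_OTHER, INSN_RETURN, INSN_RETURN_CONDITIONAL };

  static enum insn_type classify(uint32_t insn)
  {
          /* bclr[l]: primary opcode 19, extended opcode 16 */
          if ((insn >> 26) == 19 && ((insn >> 1) & 0x3ff) == 16) {
                  uint32_t bo = (insn >> 21) & 0x1f;

                  /* BO = 1z1zz means "branch always", i.e. a plain blr */
                  return (bo & 0x14) == 0x14 ? INSN_RETURN
                                             : INSN_RETURN_CONDITIONAL;
          }
          return INSN_OTHER;
  }

  int main(void)
  {
          printf("blr   -> %d\n", classify(0x4e800020)); /* 1: INSN_RETURN */
          printf("beqlr -> %d\n", classify(0x4d820020)); /* 2: INSN_RETURN_CONDITIONAL */
          printf("bgelr -> %d\n", classify(0x4c800020)); /* 2: INSN_RETURN_CONDITIONAL */
          return 0;
  }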

Signed-off-by: Christophe Leroy 
Acked-by: Peter Zijlstra (Intel) 
---
 tools/objtool/check.c| 2 +-
 tools/objtool/include/objtool/arch.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 58d9b1a750e3..10979d68103d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -279,7 +279,7 @@ static bool __dead_end_function(struct objtool_file *file, 
struct symbol *func,
func_for_each_insn(file, func, insn) {
empty = false;
 
-   if (insn->type == INSN_RETURN)
+   if (insn->type == INSN_RETURN || insn->type == 
INSN_RETURN_CONDITIONAL)
return false;
}
 
diff --git a/tools/objtool/include/objtool/arch.h 
b/tools/objtool/include/objtool/arch.h
index d63b46a19f39..900601e2f22b 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -19,6 +19,7 @@ enum insn_type {
INSN_CALL,
INSN_CALL_DYNAMIC,
INSN_RETURN,
+   INSN_RETURN_CONDITIONAL,
INSN_CONTEXT_SWITCH,
INSN_BUG,
INSN_NOP,
-- 
2.47.0




Re: [PATCH RFC v2 01/29] mm: asi: Make some utility functions noinstr compatible

2025-01-15 Thread Borislav Petkov
On Fri, Jan 10, 2025 at 06:40:27PM +, Brendan Jackman wrote:
> Subject: Re: [PATCH RFC v2 01/29] mm: asi: Make some utility functions 
> noinstr compatible

The tip tree preferred format for patch subject prefixes is
'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
'genirq/core:'. Please do not use file names or complete file paths as
prefix. 'git log path/to/file' should give you a reasonable hint in most
cases.

So I guess "x86/asm:" or so.

> Some existing utility functions would need to be called from a noinstr
> context in the later patches. So mark these as either noinstr or
> __always_inline.
> 
> An earlier version of this by Junaid had a macro that was intended to
> tell the compiler "either inline this function, or call it in the
> noinstr section", which basically boiled down to:
> 
>  #define inline_or_noinstr noinline __section(".noinstr.text")
> 
> Unfortunately Thomas pointed out this will prevent the function from
> being inlined at call sites in .text.
> 
> So far I haven't been able[1] to find a formulation that lets us :
> 1. avoid calls from .noinstr.text -> .text,
> 2. while also letting the compiler freely decide what to inline.
> 
> 1 is a functional requirement so here I'm just giving up on 2. Existing
> callsites of this code are just forced inline. For the incoming code
> that needs to call it from noinstr, they will be out-of-line calls.

I'm not sure some of that belongs in the commit message - if you want to have
it in the submission, you should put it under the --- line below, right above
the diffstat.

> [1] 
> https://lore.kernel.org/lkml/ca+i-1c1z35m8wa_4awmq7--c1ogjnolgtkn4+td5gkg7qqa...@mail.gmail.com/
> 
> Checkpatch-args: --ignore=COMMIT_LOG_LONG_LINE

Yeah, you can drop those. People should not turn off brain, use checkpatch and
point at all the silly errors it spits anyway.

> Signed-off-by: Brendan Jackman 
> ---
>  arch/x86/include/asm/processor.h |  2 +-
>  arch/x86/include/asm/special_insns.h |  8 
>  arch/x86/include/asm/tlbflush.h  |  3 +++
>  arch/x86/mm/tlb.c| 13 +
>  4 files changed, 17 insertions(+), 9 deletions(-)

So I was just about to look at the below diff but then booting the patch in my
guest causes it to stop at:

[1.110988] sr 2:0:0:0: Attached scsi generic sg1 type 5
[1.114210] PM: Image not found (code -22)
[1.114903] clk: Disabling unused clocks
[1.119397] EXT4-fs (sda2): mounted filesystem 
90868bc4-a017-4fa2-ac81-931ba260346f ro with ordered data mode. Quota mode: 
disabled.
[1.121069] VFS: Mounted root (ext4 filesystem) readonly on device 8:2.
<--- EOF

with the below call stack.

Booting it on Linus' master branch is ok but this is tip/master with all that
we've accumulated for the next merge window along with other stuff I'm poking
at...

Long story short, lemme try to poke around tomorrow to try to figure out what
actually happens. It could be caused by the part of Rik's patches and this one
inlining things. We'll see...

native_flush_tlb_one_user (addr=2507219558400) at arch/x86/mm/tlb.c:1177
1177if (!static_cpu_has(X86_FEATURE_PTI))
(gdb) bt
#0  native_flush_tlb_one_user (addr=2507219558400) at arch/x86/mm/tlb.c:1177
#1  0x8128206e in flush_tlb_one_user (addr=addr@entry=2507219558400) at 
arch/x86/mm/tlb.c:1196
#2  flush_tlb_one_kernel (addr=addr@entry=2507219558400) at 
arch/x86/mm/tlb.c:1151
#3  0x812820b7 in do_kernel_range_flush (info=0x88807dc311c0) at 
arch/x86/mm/tlb.c:1092
#4  0x8137beb6 in csd_do_func (csd=0x0 , 
info=0x88807dc311c0, 
func=0x81282090 ) at kernel/smp.c:134
#5  smp_call_function_many_cond (mask=, 
func=func@entry=0x81282090 , 
info=0x88807dc311c0, scf_flags=scf_flags@entry=3, 
cond_func=cond_func@entry=0x0 )
at kernel/smp.c:876
#6  0x8137c254 in on_each_cpu_cond_mask (cond_func=cond_func@entry=0x0 
, 
func=func@entry=0x81282090 , info=, wait=wait@entry=true, 
mask=) at kernel/smp.c:1052
#7  0x81282020 in on_each_cpu (wait=1, info=, 
func=0x81282090 )
at ./include/linux/smp.h:71
#8  flush_tlb_kernel_range (start=start@entry=18446683600570097664, 
end=, end@entry=18446683600579907584)
at arch/x86/mm/tlb.c:1106
#9  0x81481c3f in __purge_vmap_area_lazy (start=18446683600570097664, 
start@entry=18446744073709551615, 
end=18446683600579907584, end@entry=0, 
full_pool_decay=full_pool_decay@entry=false) at mm/vmalloc.c:2284
#10 0x81481fde in _vm_unmap_aliases 
(start=start@entry=18446744073709551615, end=end@entry=0, 
flush=, flush@entry=0) at mm/vmalloc.c:2899
#11 0x81482049 in vm_unmap_aliases () at mm/vmalloc.c:2922
#12 0x81284d9f in change_page_attr_set_clr (addr=0xc901fef0, 
numpages=, mask_set=..., 
mask_clr=..., force_split=, in_flag=0, pages=0x0 
)
at arch/x86/mm/pat/set_memory.c:1881
#13 0x81285c52 in change_page_attr_set (array=0, mask=

Re: [PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-01-15 Thread Jakub Kicinski
On Thu, 16 Jan 2025 02:24:10 + Wei Fang wrote:
> > Why create full ops for something this trivial?  
> 
> We add enetc_pf_hw_ops to implement different hardware ops
> for different chips. So that they can be called in common functions.
> Although the change is minor, it is consistent with the original
> intention of adding enetc_pf_hw_ops.

In other words you prefer ops.

Now imagine you have to refactor such a piece of code in 10 drivers,
and each of them has 2 layers of indirect ops like you do.
Unnecessary complexity.



Re: [PATCH v5 00/15] powerpc/objtool: uaccess validation for PPC32 (v5)

2025-01-15 Thread Christophe Leroy




Le 15/01/2025 à 23:42, Christophe Leroy a écrit :

This series adds UACCESS validation for PPC32. It includes
a dozen changes to the objtool core.

It applies on top of series "Cleanup/Optimise KUAP (v3)"
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=363368&state=*


I forgot to remove that sentence. That was merged a long time ago, so the 
series doesn't have any dependency anymore; it applies standalone on 
top of git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git branch 
tip/objtool/core (HEAD 41a1e976623e ("x86/mm: Convert unreachable() to 
BUG()")).




It is almost mature and performs code analysis for all of PPC32.

In this version objtool switch table lookup has been enhanced to
handle nested switch tables.

Most object files are correctly decoded; only a few
'unreachable instruction' warnings remain, due to more complex
functions which include back and forth jumps or branches. Two types
of switch tables are missed for the time being:
- When the switch table address is temporarily saved on the stack before
being used.
- When there are backward jumps in the path.

It allowed detecting some UACCESS mess in a few files. They've been
fixed through other patches.

Changes in v5:
- Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
branch tip/objtool/core
- Use generic annotation infrastructure to annotate uaccess begin and end 
instructions





[PATCH] powerpc/configs/64s: Enable CONFIG_KALLSYMS_ALL

2025-01-15 Thread Madhavan Srinivasan
This adds all the symbols required for use cases like
livepatching. Distros already enable this config,
and enabling it increases build time by 3%
(on a power9 128-CPU setup) with almost no size
change for vmlinux.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/configs/powernv_defconfig | 1 +
 arch/powerpc/configs/ppc64_defconfig   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig 
b/arch/powerpc/configs/powernv_defconfig
index ee84ade7a033..c92c2abb5680 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -343,3 +343,4 @@ CONFIG_KVM_BOOK3S_64_HV=m
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
+CONFIG_KALLSYMS_ALL=y
diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index f39c0d000c43..2800f6181332 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -471,3 +471,4 @@ CONFIG_TEST_MEMCAT_P=m
 CONFIG_TEST_MEMINIT=m
 CONFIG_TEST_FREE_PAGES=m
 CONFIG_MEMTEST=y
+CONFIG_KALLSYMS_ALL=y
-- 
2.47.0




Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-15 Thread Wei Liu
On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
[...]
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 7a35c82976e0..9453f0c26f2a 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -141,7 +141,7 @@ static int sysctl_record_panic_msg = 1;
>   * sysctl option to allow the user to control whether kmsg data should be
>   * reported to Hyper-V on panic.
>   */
> -static struct ctl_table hv_ctl_table[] = {
> +static const struct ctl_table hv_ctl_table[] = {
>   {
>   .procname   = "hyperv_record_panic_msg",
>   .data   = &sysctl_record_panic_msg,

Acked-by: Wei Liu 



[PATCH v2 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU

2025-01-15 Thread Vaibhav Jain
Introduce a new PMU named 'kvm-hv' to report Book3s kvm-hv specific
performance counters. This will expose KVM-HV specific performance
attributes to user-space via the kernel's PMU infrastructure and will enable
users to monitor active kvm-hv based guests.

The patch creates the necessary scaffolding for the new PMU callbacks and
introduces two new exports, kvmppc_{,un}register_pmu(), that are called from
the kvm-hv init and exit functions to perform initialization and cleanup for
the 'kvm-hv' PMU. The patch doesn't introduce any perf-events yet; those will
be introduced in later patches.

Signed-off-by: Vaibhav Jain 

---
Changelog

v1->v2:
* Fixed an issue of kvm-hv not loading on baremetal kvm [Gautam]
---
 arch/powerpc/include/asm/kvm_book3s.h |  12 +++
 arch/powerpc/kvm/Makefile |   6 ++
 arch/powerpc/kvm/book3s_hv.c  |   9 ++
 arch/powerpc/kvm/book3s_hv_pmu.c  | 133 ++
 4 files changed, 160 insertions(+)
 create mode 100644 arch/powerpc/kvm/book3s_hv_pmu.c

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index e1ff291ba891..cf91a1493159 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -334,6 +334,9 @@ static inline bool kvmhv_is_nestedv1(void)
return !static_branch_likely(&__kvmhv_is_nestedv2);
 }
 
+int kvmppc_register_pmu(void);
+void kvmppc_unregister_pmu(void);
+
 #else
 
 static inline bool kvmhv_is_nestedv2(void)
@@ -346,6 +349,15 @@ static inline bool kvmhv_is_nestedv1(void)
return false;
 }
 
+static int kvmppc_register_pmu(void)
+{
+   return 0;
+}
+
+static void kvmppc_unregister_pmu(void)
+{
+}
+
 #endif
 
 int __kvmhv_nestedv2_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs 
*regs);
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 4bd9d1230869..094c3916d9d0 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -92,6 +92,12 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) 
+= \
$(kvm-book3s_64-builtin-tm-objs-y) \
$(kvm-book3s_64-builtin-xics-objs-y)
 
+# enable kvm_hv perf events
+ifdef CONFIG_HAVE_PERF_EVENTS
+kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
+   book3s_hv_pmu.o
+endif
+
 obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
 endif
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 25429905ae90..6365b8126574 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6662,6 +6662,14 @@ static int kvmppc_book3s_init_hv(void)
return r;
}
 
+   r = kvmppc_register_pmu();
+   if (r == -EOPNOTSUPP) {
+   pr_info("KVM-HV: PMU not supported %d\n", r);
+   } else if (r) {
+   pr_err("KVM-HV: Unable to register PMUs %d\n", r);
+   goto err;
+   }
+
kvm_ops_hv.owner = THIS_MODULE;
kvmppc_hv_ops = &kvm_ops_hv;
 
@@ -6676,6 +6684,7 @@ static int kvmppc_book3s_init_hv(void)
 
 static void kvmppc_book3s_exit_hv(void)
 {
+   kvmppc_unregister_pmu();
kvmppc_uvmem_free();
kvmppc_free_host_rm_ops();
if (kvmppc_radix_possible())
diff --git a/arch/powerpc/kvm/book3s_hv_pmu.c b/arch/powerpc/kvm/book3s_hv_pmu.c
new file mode 100644
index ..8c6ed30b7654
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_pmu.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Description: PMUs specific to running nested KVM-HV guests
+ * on Book3S processors (specifically POWER9 and later).
+ */
+
+#define pr_fmt(fmt)  "kvmppc-pmu: " fmt
+
+#include "asm-generic/local64.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+enum kvmppc_pmu_eventid {
+   KVMPPC_EVENT_MAX,
+};
+
+static struct attribute *kvmppc_pmu_events_attr[] = {
+   NULL,
+};
+
+static const struct attribute_group kvmppc_pmu_events_group = {
+   .name = "events",
+   .attrs = kvmppc_pmu_events_attr,
+};
+
+PMU_FORMAT_ATTR(event, "config:0");
+static struct attribute *kvmppc_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group kvmppc_pmu_format_group = {
+   .name = "format",
+   .attrs = kvmppc_pmu_format_attr,
+};
+
+static const struct attribute_group *kvmppc_pmu_attr_groups[] = {
+   &kvmppc_pmu_events_group,
+   &kvmppc_pmu_format_group,
+   NULL,
+};
+
+static int kvmppc_pmu_event_init(struct perf_event *event)
+{
+   unsigned int config = event->attr.config;
+
+   pr_debug("%s: Event(%p) id=%llu cpu=%x on_cpu=%x config=%u",
+__func__, event, event->id, event->cpu,
+event->oncpu, config);
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   if (config >= KVMPPC_EVENT_MAX)
+ 

[PATCH v2 2/6] kvm powerpc/book3s-apiv2: Add support for Hostwide GSB elements

2025-01-15 Thread Vaibhav Jain
Add support for adding and parsing Hostwide elements to the
Guest-state-buffer data structure used in apiv2. These elements are used to
share meta-information pertaining to the entire L1-Lpar, and this
meta-information is maintained by the L0-PowerVM hypervisor. Examples of this
include the amount of page-table memory currently used by L0-PowerVM
for hosting the Shadow-Pagetable of all active L2-Guests. More of these are
documented in the kernel documentation at [1]. The Hostwide GSB elements are
currently only supported with the H_GUEST_SET_STATE hcall, with a special
flag namely 'KVMPPC_GS_FLAGS_HOST_WIDE'.

The patch introduces new defs for the 5 new Hostwide GSB elements, including
their GSIDs, as well as a new class of GSB elements, namely
'KVMPPC_GS_CLASS_HOSTWIDE', to indicate them to the GSB construction/parsing
infrastructure in 'kvm/guest-state-buffer.c'. Also,
gs_msg_ops_vcpu_get_size(), kvmppc_gsid_type() and
kvmppc_gse_{flatten,unflatten}_iden() are updated to appropriately indicate
the needed size for these Hostwide GSB elements as well as how to
flatten/unflatten their GSIDs so that they can be marked as available in the
GSB bitmap.

[1] Documentation/arch/powerpc/kvm-nested.rst

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/include/asm/guest-state-buffer.h | 35 ++---
 arch/powerpc/include/asm/hvcall.h | 13 ---
 arch/powerpc/kvm/book3s_hv_nestedv2.c |  6 +++
 arch/powerpc/kvm/guest-state-buffer.c | 39 +++
 4 files changed, 81 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/guest-state-buffer.h 
b/arch/powerpc/include/asm/guest-state-buffer.h
index d107abe1468f..acd61eb36d59 100644
--- a/arch/powerpc/include/asm/guest-state-buffer.h
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -28,6 +28,21 @@
  /* Process Table Info */
 #define KVMPPC_GSID_PROCESS_TABLE  0x0006
 
+/* Guest Management Heap Size */
+#define KVMPPC_GSID_L0_GUEST_HEAP  0x0800
+
+/* Guest Management Heap Max Size */
+#define KVMPPC_GSID_L0_GUEST_HEAP_MAX  0x0801
+
+/* Guest Pagetable Size */
+#define KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE  0x0802
+
+/* Guest Pagetable Max Size */
+#define KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX  0x0803
+
+/* Guest Pagetable Reclaim in bytes */
+#define KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM   0x0804
+
 /* H_GUEST_RUN_VCPU input buffer Info */
 #define KVMPPC_GSID_RUN_INPUT  0x0C00
 /* H_GUEST_RUN_VCPU output buffer Info */
@@ -106,6 +121,11 @@
 #define KVMPPC_GSE_GUESTWIDE_COUNT \
(KVMPPC_GSE_GUESTWIDE_END - KVMPPC_GSE_GUESTWIDE_START + 1)
 
+#define KVMPPC_GSE_HOSTWIDE_START KVMPPC_GSID_L0_GUEST_HEAP
+#define KVMPPC_GSE_HOSTWIDE_END KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM
+#define KVMPPC_GSE_HOSTWIDE_COUNT \
+   (KVMPPC_GSE_HOSTWIDE_END - KVMPPC_GSE_HOSTWIDE_START + 1)
+
 #define KVMPPC_GSE_META_START KVMPPC_GSID_RUN_INPUT
 #define KVMPPC_GSE_META_END KVMPPC_GSID_VPA
 #define KVMPPC_GSE_META_COUNT (KVMPPC_GSE_META_END - KVMPPC_GSE_META_START + 1)
@@ -130,7 +150,8 @@
(KVMPPC_GSE_INTR_REGS_END - KVMPPC_GSE_INTR_REGS_START + 1)
 
 #define KVMPPC_GSE_IDEN_COUNT \
-   (KVMPPC_GSE_GUESTWIDE_COUNT + KVMPPC_GSE_META_COUNT + \
+   (KVMPPC_GSE_HOSTWIDE_COUNT + \
+KVMPPC_GSE_GUESTWIDE_COUNT + KVMPPC_GSE_META_COUNT + \
 KVMPPC_GSE_DW_REGS_COUNT + KVMPPC_GSE_W_REGS_COUNT + \
 KVMPPC_GSE_VSRS_COUNT + KVMPPC_GSE_INTR_REGS_COUNT)
 
@@ -139,10 +160,11 @@
  */
 enum {
KVMPPC_GS_CLASS_GUESTWIDE = 0x01,
-   KVMPPC_GS_CLASS_META = 0x02,
-   KVMPPC_GS_CLASS_DWORD_REG = 0x04,
-   KVMPPC_GS_CLASS_WORD_REG = 0x08,
-   KVMPPC_GS_CLASS_VECTOR = 0x10,
+   KVMPPC_GS_CLASS_HOSTWIDE = 0x02,
+   KVMPPC_GS_CLASS_META = 0x04,
+   KVMPPC_GS_CLASS_DWORD_REG = 0x08,
+   KVMPPC_GS_CLASS_WORD_REG = 0x10,
+   KVMPPC_GS_CLASS_VECTOR = 0x18,
KVMPPC_GS_CLASS_INTR = 0x20,
 };
 
@@ -164,6 +186,7 @@ enum {
  */
 enum {
KVMPPC_GS_FLAGS_WIDE = 0x01,
+   KVMPPC_GS_FLAGS_HOST_WIDE = 0x02,
 };
 
 /**
@@ -287,7 +310,7 @@ struct kvmppc_gs_msg_ops {
  * struct kvmppc_gs_msg - a guest state message
  * @bitmap: the guest state ids that should be included
  * @ops: modify message behavior for reading and writing to buffers
- * @flags: guest wide or thread wide
+ * @flags: host wide, guest wide or thread wide
  * @data: location where buffer data will be written to or from.
  *
  * A guest state message is allows flexibility in sending in receiving data
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 65d1f291393d..1c12713538a4 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -489,14 +489,15 @@
 #define H_RPTI_PAGE_ALL (-1UL)
 
 /* Flags for H_GUEST_{S,G}_STATE */
-#define H_GUEST_FLAGS_WIDE (1UL<<(63-0))
+#define H_GUEST_FLAGS_WIDE (1UL << (63 - 0))
+#define H_GUEST_FLAGS_HOST_WIDE(1UL << (63 - 1))
 
 /* Fl

[PATCH v2 3/6] kvm powerpc/book3s-apiv2: Add kunit tests for Hostwide GSB elements

2025-01-15 Thread Vaibhav Jain
Update 'test-guest-state-buffer.c' to add two new KUNIT test cases for
validating correctness of changes to Guest-state-buffer management
infrastructure for adding support for Hostwide GSB elements.

The newly introduced test test_gs_hostwide_msg() checks if the Hostwide
elements can be set and parsed from a Guest-state-buffer. The second kunit
test test_gs_hostwide_counters() checks if the Hostwide GSB elements can be
sent to the L0-PowerVM hypervisor via the H_GUEST_SET_STATE hcall and
ensures that the returned guest-state-buffer has all the 5 Hostwide stat
counters present.

Below is the KTAP test report with the newly added KUNIT tests:

KTAP version 1
# Subtest: guest_state_buffer_test
# module: test_guest_state_buffer
1..7
ok 1 test_creating_buffer
ok 2 test_adding_element
ok 3 test_gs_bitmap
ok 4 test_gs_parsing
ok 5 test_gs_msg
ok 6 test_gs_hostwide_msg
# test_gs_hostwide_counters: Guest Heap Size=0 bytes
# test_gs_hostwide_counters: Guest Heap Size Max=10995367936 bytes
# test_gs_hostwide_counters: Guest Page-table Size=2178304 bytes
# test_gs_hostwide_counters: Guest Page-table Size Max=2147483648 bytes
# test_gs_hostwide_counters: Guest Page-table Reclaim Size=0 bytes
ok 7 test_gs_hostwide_counters
 # guest_state_buffer_test: pass:7 fail:0 skip:0 total:7
 # Totals: pass:7 fail:0 skip:0 total:7
 ok 1 guest_state_buffer_test

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/kvm/test-guest-state-buffer.c | 210 +
 1 file changed, 210 insertions(+)

diff --git a/arch/powerpc/kvm/test-guest-state-buffer.c 
b/arch/powerpc/kvm/test-guest-state-buffer.c
index bfd225329a18..99a3d4b12843 100644
--- a/arch/powerpc/kvm/test-guest-state-buffer.c
+++ b/arch/powerpc/kvm/test-guest-state-buffer.c
@@ -141,6 +141,16 @@ static void test_gs_bitmap(struct kunit *test)
i++;
}
 
+   for (u16 iden = KVMPPC_GSID_L0_GUEST_HEAP;
+iden <= KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM; iden++) {
+   kvmppc_gsbm_set(&gsbm, iden);
+   kvmppc_gsbm_set(&gsbm1, iden);
+   KUNIT_EXPECT_TRUE(test, kvmppc_gsbm_test(&gsbm, iden));
+   kvmppc_gsbm_clear(&gsbm, iden);
+   KUNIT_EXPECT_FALSE(test, kvmppc_gsbm_test(&gsbm, iden));
+   i++;
+   }
+
for (u16 iden = KVMPPC_GSID_RUN_INPUT; iden <= KVMPPC_GSID_VPA;
 iden++) {
kvmppc_gsbm_set(&gsbm, iden);
@@ -309,12 +319,212 @@ static void test_gs_msg(struct kunit *test)
kvmppc_gsm_free(gsm);
 }
 
+/* Test data struct for hostwide/L0 counters */
+struct kvmppc_gs_msg_test_hostwide_data {
+   u64 guest_heap;
+   u64 guest_heap_max;
+   u64 guest_pgtable_size;
+   u64 guest_pgtable_size_max;
+   u64 guest_pgtable_reclaim;
+};
+
+static size_t test_hostwide_get_size(struct kvmppc_gs_msg *gsm)
+
+{
+   size_t size = 0;
+   u16 ids[] = {
+   KVMPPC_GSID_L0_GUEST_HEAP,
+   KVMPPC_GSID_L0_GUEST_HEAP_MAX,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM
+   };
+
+   for (int i = 0; i < ARRAY_SIZE(ids); i++)
+   size += kvmppc_gse_total_size(kvmppc_gsid_size(ids[i]));
+   return size;
+}
+
+static int test_hostwide_fill_info(struct kvmppc_gs_buff *gsb,
+  struct kvmppc_gs_msg *gsm)
+{
+   struct kvmppc_gs_msg_test_hostwide_data *data = gsm->data;
+
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_HEAP))
+   kvmppc_gse_put_u64(gsb, KVMPPC_GSID_L0_GUEST_HEAP,
+  data->guest_heap);
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_HEAP_MAX))
+   kvmppc_gse_put_u64(gsb, KVMPPC_GSID_L0_GUEST_HEAP_MAX,
+  data->guest_heap_max);
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE))
+   kvmppc_gse_put_u64(gsb, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE,
+  data->guest_pgtable_size);
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX))
+   kvmppc_gse_put_u64(gsb, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX,
+  data->guest_pgtable_size_max);
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM))
+   kvmppc_gse_put_u64(gsb, KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM,
+  data->guest_pgtable_reclaim);
+
+   return 0;
+}
+
+static int test_hostwide_refresh_info(struct kvmppc_gs_msg *gsm,
+ struct kvmppc_gs_buff *gsb)
+{
+   struct kvmppc_gs_parser gsp = { 0 };
+   struct kvmppc_gs_msg_test_hostwide_data *data = gsm->data;
+   struct kvmppc_gs_elem *gse;
+   int rc;
+
+   rc = kvmppc_gse_parse(&gsp, gsb);
+   if (rc < 0)
+   

[PATCH v2 0/5] kvm powerpc/book3s-hv: Expose Hostwide counters as perf-events

2025-01-15 Thread Vaibhav Jain
Changes from V1
Link: https://lore.kernel.org/all/20241222140247.174998-1-vaib...@linux.ibm.com

* Fixed an issue preventing loading of kvm-hv on PowerNV [Gautam]
* Improved the error handling of GSB callback hostwide_fill_info() [Gautam]
* Tweaks to documentation of Hostwide counters [Gautam]
* Proposed Qemu-TCG emulation for Hostwide counters [3]
===

This patch-series adds support for reporting Hostwide (L1-Lpar) counters via
perf-events. With the support for running KVM Guests in a PSeries-Lpar using
nested-APIv2 via [1], the underlying L0-PowerVM hypervisor holds some state
information pertaining to all running L2-KVM Guests in an L1-Lpar. This
state information is held in a pre-allocated memory area that's owned by
L0-PowerVM and is termed the Guest-Management-Area (GMA). The GMA is
allocated per L1-LPAR and is only allocated if the lpar is KVM enabled. The
size of this area is a fixed percentage of the memory assigned to the KVM
enabled L1-lpar and is composed of two major components, Guest Management
Space (Host-Heap) and Guest Page Table Management Space (Host-Pagetable).

The Host-Heap holds the various data-structures allocated by L0-PowerVM for
L2-KVM Guests running in the L1-Lpar. The Host-Pagetable holds the Radix
pagetable[2] for the L2-KVM Guest, which is used by L0-PowerVM to handle
page faults. Since the size of both of these areas is limited and fixed via
the partition boot profile, it puts an upper bound on the number of L2-KVM
Guests that can be run in an LPAR. Also, due to the limited size of the
Host-Pagetable area, L0-PowerVM is at times forced to perform a reclaim
operation on it. This reclaim operation is usually performed when running a
large number of L2-KVM Guests which are memory bound and increase
Host-Pagetable utilization.

In light of the above, it's recommended to track usage of these areas to
ensure consistent L2-KVM Guest performance. Hence this patch-series
attempts to expose the max-size and current-usage of these areas, as well as
the cumulative amount of bytes reclaimed from the Host-Pagetable, as
perf-events that can be queried via perf-stat.

The patch series introduces a new 'kvm-hv' PMU which exports the
perf-events mentioned below. Since the exported perf-events represent the
state of the whole L1-Lpar and not that of a specific L2-KVM guest,
the 'kvm-hv' PMU's scope is set to PERF_PMU_SCOPE_SYS_WIDE (System-Wide).

New perf-events introduced
==

* kvm-hv/host_heap/ : The currently used bytes in the
  Hypervisor's Guest Management Space
  associated with the Host Partition.
* kvm-hv/host_heap_max/ : The maximum bytes available in the
  Hypervisor's Guest Management Space
  associated with the Host Partition.
* kvm-hv/host_pagetable/: The currently used bytes in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.
* kvm-hv/host_pagetable_max/: The maximum bytes available in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.
* kvm-hv/host_pagetable_reclaim/: The amount of space in bytes that has
  been reclaimed due to overcommit in the
  Hypervisor's Guest Page Table Management
  Space associated with the Host Partition.

Structure of this patch series
==
Start with documenting and updating the KVM nested-APIv2 hcall
specifications for H_GUEST_GET_STATE hcall and Hostwide guest-state-buffer
elements.

Subsequent patches add support for adding and parsing Hostwide
guest-state-buffer elements in the existing kvm-hv apiv2 infrastructure, and
also add a kunit test case to verify correctness of the changes introduced.

The next set of patches in the patch-set introduces a new PMU for kvm-hv on
pseries named 'kvm-hv', implements the plumbing between the kvm-hv module
and the initialization of this new PMU, and adds the necessary setup code in
the kvm-hv pmu to create, populate and parse a guest-state-buffer holding
the Hostwide counters returned from L0-PowerVM.

The final patch in the series creates the five new perf-events, which then
leverage the kernel's perf-event infrastructure to report the Hostwide
counters returned from L0-PowerVM to the perf tool.

Output
==
Once the patch-set is integrated, perf-stat should report the Hostwide
counters for a kvm-enabled pseries lpar as below:

$ sudo perf stat -e 'kvm-hv/host_heap/'  -e 'kvm-hv/host_heap_max/' \
  -e 'kvm-hv/host_pagetable/' -e 'kvm-hv/host_pagetable_max/' \
  -e 'kvm-hv/host_pagetable_reclaim/' -- sleep 0

Performance counter stats for 'system wide':

 0  kvm-hv/host_heap/
10,995,367,936  kvm-hv/host_heap_max/
 2,178,304  kvm-hv/host_pagetab

[PATCH v2 6/6] kvm powerpc/book3s-hv-pmu: Add perf-events for Hostwide counters

2025-01-15 Thread Vaibhav Jain
Update 'book3s_hv_pmu.c' to add five new perf-events mapped to the five
Hostwide counters. Since these newly introduced perf events are system-wide
in scope and can be read from any L1-Lpar CPU, 'kvmppc_pmu's scope and
capabilities are updated appropriately.

Also introduce two new helpers. The first is kvmppc_update_l0_stats(), which
uses the infrastructure introduced in previous patches to issue the
H_GUEST_GET_STATE hcall to L0-PowerVM to fetch the guest-state-buffer holding
the latest values of these counters, which is then parsed and the 'l0_stats'
variable updated.

The second helper is kvmppc_pmu_event_update(), which is called from the
'kvmppc_pmu' callbacks and uses kvmppc_update_l0_stats() to update
'l0_stats' and then update the 'struct perf_event's event counter.

Some minor updates to kvmppc_pmu_{add, del, read}() to remove some debug
scaffolding code.

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/kvm/book3s_hv_pmu.c | 92 +++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_pmu.c b/arch/powerpc/kvm/book3s_hv_pmu.c
index 0107ed3b03e3..4c14c5885269 100644
--- a/arch/powerpc/kvm/book3s_hv_pmu.c
+++ b/arch/powerpc/kvm/book3s_hv_pmu.c
@@ -30,6 +30,11 @@
 #include "asm/guest-state-buffer.h"
 
 enum kvmppc_pmu_eventid {
+   KVMPPC_EVENT_HOST_HEAP,
+   KVMPPC_EVENT_HOST_HEAP_MAX,
+   KVMPPC_EVENT_HOST_PGTABLE,
+   KVMPPC_EVENT_HOST_PGTABLE_MAX,
+   KVMPPC_EVENT_HOST_PGTABLE_RECLAIM,
KVMPPC_EVENT_MAX,
 };
 
@@ -51,8 +56,14 @@ static DEFINE_SPINLOCK(lock_l0_stats);
 /* GSB related structs needed to talk to L0 */
 static struct kvmppc_gs_msg *gsm_l0_stats;
 static struct kvmppc_gs_buff *gsb_l0_stats;
+static struct kvmppc_gs_parser gsp_l0_stats;
 
 static struct attribute *kvmppc_pmu_events_attr[] = {
+   KVMPPC_PMU_EVENT_ATTR(host_heap, KVMPPC_EVENT_HOST_HEAP),
+   KVMPPC_PMU_EVENT_ATTR(host_heap_max, KVMPPC_EVENT_HOST_HEAP_MAX),
+   KVMPPC_PMU_EVENT_ATTR(host_pagetable, KVMPPC_EVENT_HOST_PGTABLE),
+   KVMPPC_PMU_EVENT_ATTR(host_pagetable_max, 
KVMPPC_EVENT_HOST_PGTABLE_MAX),
+   KVMPPC_PMU_EVENT_ATTR(host_pagetable_reclaim, 
KVMPPC_EVENT_HOST_PGTABLE_RECLAIM),
NULL,
 };
 
@@ -61,7 +72,7 @@ static const struct attribute_group kvmppc_pmu_events_group = 
{
.attrs = kvmppc_pmu_events_attr,
 };
 
-PMU_FORMAT_ATTR(event, "config:0");
+PMU_FORMAT_ATTR(event, "config:0-5");
 static struct attribute *kvmppc_pmu_format_attr[] = {
&format_attr_event.attr,
NULL,
@@ -78,6 +89,79 @@ static const struct attribute_group 
*kvmppc_pmu_attr_groups[] = {
NULL,
 };
 
+/*
+ * Issue the hcall to get the L0-host stats.
+ * Should be called with l0-stat lock held
+ */
+static int kvmppc_update_l0_stats(void)
+{
+   int rc;
+
+   /* With HOST_WIDE flags guestid and vcpuid will be ignored */
+   rc = kvmppc_gsb_recv(gsb_l0_stats, KVMPPC_GS_FLAGS_HOST_WIDE);
+   if (rc)
+   goto out;
+
+   /* Parse the guest state buffer is successful */
+   rc = kvmppc_gse_parse(&gsp_l0_stats, gsb_l0_stats);
+   if (rc)
+   goto out;
+
+   /* Update the l0 returned stats*/
+   memset(&l0_stats, 0, sizeof(l0_stats));
+   rc = kvmppc_gsm_refresh_info(gsm_l0_stats, gsb_l0_stats);
+
+out:
+   return rc;
+}
+
+/* Update the value of the given perf_event */
+static int kvmppc_pmu_event_update(struct perf_event *event)
+{
+   int rc;
+   u64 curr_val, prev_val;
+   unsigned long flags;
+   unsigned int config = event->attr.config;
+
+   /* Ensure no one else is modifying the l0_stats */
+   spin_lock_irqsave(&lock_l0_stats, flags);
+
+   rc = kvmppc_update_l0_stats();
+   if (!rc) {
+   switch (config) {
+   case KVMPPC_EVENT_HOST_HEAP:
+   curr_val = l0_stats.guest_heap;
+   break;
+   case KVMPPC_EVENT_HOST_HEAP_MAX:
+   curr_val = l0_stats.guest_heap_max;
+   break;
+   case KVMPPC_EVENT_HOST_PGTABLE:
+   curr_val = l0_stats.guest_pgtable_size;
+   break;
+   case KVMPPC_EVENT_HOST_PGTABLE_MAX:
+   curr_val = l0_stats.guest_pgtable_size_max;
+   break;
+   case KVMPPC_EVENT_HOST_PGTABLE_RECLAIM:
+   curr_val = l0_stats.guest_pgtable_reclaim;
+   break;
+   default:
+   rc = -ENOENT;
+   break;
+   }
+   }
+
+   spin_unlock_irqrestore(&lock_l0_stats, flags);
+
+   /* If no error than update the perf event */
+   if (!rc) {
+   prev_val = local64_xchg(&event->hw.prev_count, curr_val);
+   if (curr_val > prev_val)
+   local64_add(curr_val - prev_val, &event->count);
+   }
+
+   return rc;
+}
+
 static int kvmppc_pmu_event_init(

[PATCH v2 5/6] powerpc/book3s-hv-pmu: Implement GSB message-ops for hostwide counters

2025-01-15 Thread Vaibhav Jain
Implement and set up the necessary structures to send a prepopulated
Guest-State-Buffer (GSB) requesting hostwide counters to L0-PowerVM and have
the returned GSB holding the values of these counters parsed. This is done
via the existing GSB implementation and with the newly added support for
Hostwide elements in the GSB.

The request to L0-PowerVM to return Hostwide counters is done using a
pre-allocated GSB named 'gsb_l0_stats'. To be able to populate this GSB
with the needed Guest-State-Elements (GSIDs), an instance of 'struct
kvmppc_gs_msg' named 'gsm_l0_stats' is introduced. The 'gsm_l0_stats' is
tied to an instance of 'struct kvmppc_gs_msg_ops' named 'gsb_ops_l0_stats',
which holds various callbacks to compute the size (hostwide_get_size()),
populate the GSB (hostwide_fill_info()) and refresh
(hostwide_refresh_info()) the contents of 'l0_stats', which holds the
Hostwide counters returned from L0-PowerVM.

To protect these structures from simultaneous access, a spinlock
'lock_l0_stats' has been introduced. The allocation and initialization of
the above structures is done in the newly introduced kvmppc_init_hostwide()
and similarly the cleanup is performed in the newly introduced
kvmppc_cleanup_hostwide().

Signed-off-by: Vaibhav Jain 

---
Changelog

v1->v2:
* Added error handling to hostwide_fill_info() [Gautam]
---
 arch/powerpc/kvm/book3s_hv_pmu.c | 199 +++
 1 file changed, 199 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_pmu.c b/arch/powerpc/kvm/book3s_hv_pmu.c
index 8c6ed30b7654..0107ed3b03e3 100644
--- a/arch/powerpc/kvm/book3s_hv_pmu.c
+++ b/arch/powerpc/kvm/book3s_hv_pmu.c
@@ -27,10 +27,31 @@
 #include 
 #include 
 
+#include "asm/guest-state-buffer.h"
+
 enum kvmppc_pmu_eventid {
KVMPPC_EVENT_MAX,
 };
 
+#define KVMPPC_PMU_EVENT_ATTR(_name, _id) \
+   PMU_EVENT_ATTR_ID(_name, power_events_sysfs_show, _id)
+
+/* Holds the hostwide stats */
+static struct kvmppc_hostwide_stats {
+   u64 guest_heap;
+   u64 guest_heap_max;
+   u64 guest_pgtable_size;
+   u64 guest_pgtable_size_max;
+   u64 guest_pgtable_reclaim;
+} l0_stats;
+
+/* Protect access to l0_stats */
+static DEFINE_SPINLOCK(lock_l0_stats);
+
+/* GSB related structs needed to talk to L0 */
+static struct kvmppc_gs_msg *gsm_l0_stats;
+static struct kvmppc_gs_buff *gsb_l0_stats;
+
 static struct attribute *kvmppc_pmu_events_attr[] = {
NULL,
 };
@@ -90,6 +111,177 @@ static void kvmppc_pmu_read(struct perf_event *event)
 {
 }
 
+/* Return the size of the needed guest state buffer */
+static size_t hostwide_get_size(struct kvmppc_gs_msg *gsm)
+
+{
+   size_t size = 0;
+   const u16 ids[] = {
+   KVMPPC_GSID_L0_GUEST_HEAP,
+   KVMPPC_GSID_L0_GUEST_HEAP_MAX,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM
+   };
+
+   for (int i = 0; i < ARRAY_SIZE(ids); i++)
+   size += kvmppc_gse_total_size(kvmppc_gsid_size(ids[i]));
+   return size;
+}
+
+/* Populate the request guest state buffer */
+static int hostwide_fill_info(struct kvmppc_gs_buff *gsb,
+ struct kvmppc_gs_msg *gsm)
+{
+   int rc = 0;
+   struct kvmppc_hostwide_stats  *stats = gsm->data;
+
+   /*
+* It doesn't matter what values are put into request buffer as
+* they are going to be overwritten anyways. But for the sake of
+* testcode and symmetry contents of existing stats are put
+* populated into the request guest state buffer.
+*/
+   if (kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_HEAP))
+   rc = kvmppc_gse_put_u64(gsb,
+   KVMPPC_GSID_L0_GUEST_HEAP,
+   stats->guest_heap);
+
+   if (!rc && kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_HEAP_MAX))
+   rc = kvmppc_gse_put_u64(gsb,
+   KVMPPC_GSID_L0_GUEST_HEAP_MAX,
+   stats->guest_heap_max);
+
+   if (!rc && kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE))
+   rc = kvmppc_gse_put_u64(gsb,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE,
+   stats->guest_pgtable_size);
+   if (!rc &&
+   kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX))
+   rc = kvmppc_gse_put_u64(gsb,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_SIZE_MAX,
+   stats->guest_pgtable_size_max);
+   if (!rc &&
+   kvmppc_gsm_includes(gsm, KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM))
+   rc = kvmppc_gse_put_u64(gsb,
+   KVMPPC_GSID_L0_GUEST_PGTABLE_RECLAIM,
+   stats->guest_pgtable_reclaim);
+
+   return rc;
+}
+
+/* Pa

[PATCH v2 1/6] powerpc: Document APIv2 KVM hcall spec for Hostwide counters

2025-01-15 Thread Vaibhav Jain
Update the kvm-nested APIv2 documentation to include five new
Guest-State-Elements to fetch the hostwide counters. These counters are
per L1-Lpar and indicate the amount of Heap/Page-table memory allocated
and available, and the amount of Page-table memory reclaimed, for all
active L2-Guest instances.

Cc: linux-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan 
Cc: Nicholas Piggin 
Signed-off-by: Vaibhav Jain 

---
Changelog

v1->v2:
* Reworded section on GSID [Gautam]
---
 Documentation/arch/powerpc/kvm-nested.rst | 40 +--
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/Documentation/arch/powerpc/kvm-nested.rst 
b/Documentation/arch/powerpc/kvm-nested.rst
index 5defd13cc6c1..8e468a4db0dc 100644
--- a/Documentation/arch/powerpc/kvm-nested.rst
+++ b/Documentation/arch/powerpc/kvm-nested.rst
@@ -208,13 +208,9 @@ associated values for each ID in the GSB::
   flags:
  Bit 0: getGuestWideState: Request state of the Guest instead
of an individual VCPU.
- Bit 1: takeOwnershipOfVcpuState Indicate the L1 is taking
-   over ownership of the VCPU state and that the L0 can free
-   the storage holding the state. The VCPU state will need to
-   be returned to the Hypervisor via H_GUEST_SET_STATE prior
-   to H_GUEST_RUN_VCPU being called for this VCPU. The data
-   returned in the dataBuffer is in a Hypervisor internal
-   format.
+ Bit 1: getHostWideState: Request stats of the Host. This causes
+   the guestId and vcpuId parameters to be ignored and attempting
+   to get the VCPU/Guest state will cause an error.
  Bits 2-63: Reserved
   guestId: ID obtained from H_GUEST_CREATE
   vcpuId: ID of the vCPU pass to H_GUEST_CREATE_VCPU
@@ -406,9 +402,10 @@ the partition like the timebase offset and partition 
scoped page
 table information.
 
 ++---+++--+
-|   ID   | Size  | RW | Thread | Details  |
-|| Bytes || Guest  |  |
-||   || Scope  |  |
+|   ID   | Size  | RW |(H)ost  | Details  |
+|| Bytes ||(G)uest |  |
+||   ||(T)hread|  |
+||   ||Scope   |  |
 ++===+++==+
 | 0x |   | RW |   TG   | NOP element  |
 ++---+++--+
@@ -434,6 +431,29 @@ table information.
 ||   |||- 0x8 Table size. |
 ++---+++--+
 | 0x0007-|   ||| Reserved |
+| 0x07FF |   |||  |
+++---+++--+
+| 0x0800 | 0x08  | R  |   H| Current usage in bytes of the|
+||   ||| L0's Guest Management Space  |
+||   ||| for an L1-Lpar.  |
+++---+++--+
+| 0x0801 | 0x08  | R  |   H| Max bytes available in the   |
+||   ||| L0's Guest Management Space for  |
+||   ||| an L1-Lpar   |
+++---+++--+
+| 0x0802 | 0x08  | R  |   H| Current usage in bytes of the|
+||   ||| L0's Guest Page Table Management |
+||   ||| Space for an L1-Lpar |
+++---+++--+
+| 0x0803 | 0x08  | R  |   H| Max bytes available in the L0's  |
+||   ||| Guest Page Table Management  |
+||   ||| Space for an L1-Lpar |
+++---+++--+
+| 0x0804 | 0x08  | R  |   H| Amount of reclaimed L0 Guest's   |
+||   ||| Page Table Management Space due  |
+||   ||| to overcommit for an L1-Lpar |
+++---+++--+
+| 0x0805-|   ||| Reserved |
 | 0x0BFF |   |||  |
 ++---+++--+
 | 0x0C00 | 0x10  | RW |   T|Run vCPU Input Buffer:|
-- 
2.47.1




[PATCH v4] powerpc/pseries/eeh: Fix get PE state translation

2025-01-15 Thread Narayana Murty N
The PE Reset State "0" returned by RTAS calls
"ibm_read_slot_reset_[state|state2]" indicates that the reset is
deactivated and the PE is in a state where MMIO and DMA are allowed.
However, the current implementation of "pseries_eeh_get_state()" does
not reflect this, causing drivers to incorrectly assume that MMIO and
DMA operations cannot be resumed.

Userspace drivers performing EEH recovery through VFIO ioctls fail
to detect when the recovery process is complete. The VFIO_EEH_PE_GET_STATE
ioctl does not report the expected EEH_PE_STATE_NORMAL state, preventing
userspace drivers from functioning properly on pseries systems.

The patch addresses this issue by updating 'pseries_eeh_get_state()'
to include "EEH_STATE_MMIO_ENABLED" and "EEH_STATE_DMA_ENABLED" in
the result mask for PE Reset State "0". This ensures correct state
reporting to the callers, aligning the behavior with the PAPR specification
and fixing the bug in EEH recovery for VFIO user workflows.
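
For reference, a minimal sketch of the userspace side this affects
(illustrative only; container/group setup is omitted and the helper name
is made up, it is not part of this patch):

	#include <linux/vfio.h>
	#include <sys/ioctl.h>

	/* Poll the PE recovery state through VFIO. Without this fix,
	 * pseries never reports NORMAL here even though MMIO and DMA
	 * are already allowed again.
	 */
	static int pe_recovered(int container_fd)
	{
		struct vfio_eeh_pe_op op = {
			.argsz = sizeof(op),
			.op = VFIO_EEH_PE_GET_STATE,
		};

		return ioctl(container_fd, VFIO_EEH_PE_OP, &op) ==
		       VFIO_EEH_PE_STATE_NORMAL;
	}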

Fixes: 00ba05a12b3c ("powerpc/pseries: Cleanup on pseries_eeh_get_state()")
Cc: 
Signed-off-by: Narayana Murty N 

---
Changelog:
V1:https://lore.kernel.org/all/20241107042027.338065-1-nnmli...@linux.ibm.com/
--added Fixes tag for "powerpc/pseries: Cleanup on
pseries_eeh_get_state()".
V2:https://lore.kernel.org/stable/20241212075044.10563-1-nnmlinux%40linux.ibm.com
--Updated the patch description to include it in the stable kernel tree.
V3:https://lore.kernel.org/all/87v7vm8pwz@gmail.com/
--Updated commit description.
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index 1893f66371fa..b12ef382fec7 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -580,8 +580,10 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int 
*delay)
 
switch(rets[0]) {
case 0:
-   result = EEH_STATE_MMIO_ACTIVE |
-EEH_STATE_DMA_ACTIVE;
+   result = EEH_STATE_MMIO_ACTIVE  |
+EEH_STATE_DMA_ACTIVE   |
+EEH_STATE_MMIO_ENABLED |
+EEH_STATE_DMA_ENABLED;
break;
case 1:
result = EEH_STATE_RESET_ACTIVE |
-- 
2.47.1




Re: [PATCH V2 2/5] tools/testing/selftests/powerpc: Add check for power11 pvr for pmu selfests

2025-01-15 Thread Disha Goel

On 13/01/25 1:28 pm, Athira Rajeev wrote:

Some of the tests depend on the pvr value to choose
the event. Example:
- event_alternatives_tests_p10: the alternative event depends
  on the registered PMU driver, which is based on pvr
- generic_events_valid_test: varies based on platform
- bhrb_filter_map_test: again, it depends on the pmu to
  decide which bhrb filter to use
- reserved_bits_mmcra_sample_elig_mode: the random sampling
  mode reserved bits also vary based on platform

Signed-off-by: Athira Rajeev 

I have tested the patches on PowerPC by compiling and running the pmu selftests.

For the series:
Tested-by: Disha Goel

---
Changelog:
 v1 -> v2
 No code changes. Rebased to latest upstream

 .../pmu/event_code_tests/event_alternatives_tests_p10.c| 3 ++-
 .../pmu/event_code_tests/generic_events_valid_test.c   | 3 ++-
 .../reserved_bits_mmcra_sample_elig_mode_test.c| 3 ++-
 .../powerpc/pmu/sampling_tests/bhrb_filter_map_test.c  | 7 +--
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/event_alternatives_tests_p10.c
 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/event_alternatives_tests_p10.c
index 8be7aada6523..355f8bbe06c3 100644
--- 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/event_alternatives_tests_p10.c
+++ 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/event_alternatives_tests_p10.c
@@ -26,6 +26,7 @@ static int event_alternatives_tests_p10(void)
 {
struct event *e, events[5];
int i;
+   int pvr = PVR_VER(mfspr(SPRN_PVR));

/* Check for platform support for the test */
SKIP_IF(platform_check_for_tests());
@@ -36,7 +37,7 @@ static int event_alternatives_tests_p10(void)
 * code and using PVR will work correctly for all cases
 * including generic compat mode.
 */
-   SKIP_IF(PVR_VER(mfspr(SPRN_PVR)) != POWER10);
+   SKIP_IF((pvr != POWER10) && (pvr != POWER11));

SKIP_IF(check_for_generic_compat_pmu());

diff --git 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/generic_events_valid_test.c
 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/generic_events_valid_test.c
index 0d237c15d3f2..a378fa9a5a7b 100644
--- 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/generic_events_valid_test.c
+++ 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/generic_events_valid_test.c
@@ -17,6 +17,7 @@
 static int generic_events_valid_test(void)
 {
struct event event;
+   int pvr = PVR_VER(mfspr(SPRN_PVR));

/* Check for platform support for the test */
SKIP_IF(platform_check_for_tests());
@@ -31,7 +32,7 @@ static int generic_events_valid_test(void)
 * - PERF_COUNT_HW_STALLED_CYCLES_BACKEND
 * - PERF_COUNT_HW_REF_CPU_CYCLES
 */
-   if (PVR_VER(mfspr(SPRN_PVR)) == POWER10) {
+   if ((pvr == POWER10) || (pvr == POWER11)) {
event_init_opts(&event, PERF_COUNT_HW_CPU_CYCLES, PERF_TYPE_HARDWARE, 
"event");
FAIL_IF(event_open(&event));
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/reserved_bits_mmcra_sample_elig_mode_test.c
 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/reserved_bits_mmcra_sample_elig_mode_test.c
index 4c119c821b99..7bb26a232fbe 100644
--- 
a/tools/testing/selftests/powerpc/pmu/event_code_tests/reserved_bits_mmcra_sample_elig_mode_test.c
+++ 
b/tools/testing/selftests/powerpc/pmu/event_code_tests/reserved_bits_mmcra_sample_elig_mode_test.c
@@ -21,6 +21,7 @@
 static int reserved_bits_mmcra_sample_elig_mode(void)
 {
struct event event;
+   int pvr = PVR_VER(mfspr(SPRN_PVR));

/* Check for platform support for the test */
SKIP_IF(platform_check_for_tests());
@@ -59,7 +60,7 @@ static int reserved_bits_mmcra_sample_elig_mode(void)
 * is reserved in power10 and 0xC is reserved in
 * power9.
 */
-   if (PVR_VER(mfspr(SPRN_PVR)) == POWER10) {
+   if ((pvr == POWER10) || (pvr == POWER11)) {
event_init(&event, 0x100401e0);
FAIL_IF(!event_open(&event));
} else if (PVR_VER(mfspr(SPRN_PVR)) == POWER9) {
diff --git 
a/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
index 3f43c315c666..64ab9784f9b1 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/bhrb_filter_map_test.c
@@ -83,13 +83,16 @@ static int bhrb_filter_map_test(void)
 * using PVR will work correctly for all cases including generic
 * compat mode.
 */
-   if (PVR_VER(mfspr(SPRN_PVR)) == POWER10) {
+   switch (PVR_VER(mfspr(SPRN_PVR))) {
+   case POWER11:
+   case POWER10:
for (i = 0; i < ARRAY_SIZE(bhrb_filter_map_valid_p10); i++) {
ev

Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-15 Thread Thomas Gleixner
On Fri, Jan 10 2025 at 15:16, Joel Granados wrote:
> sed:
> sed --in-place \
>   -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table 
> = \&uts_kern/" \
>   kernel/utsname_sysctl.c
>
> Reviewed-by: Song Liu 
> Acked-by: Steven Rostedt (Google)  # for kernel/trace/
> Reviewed-by: Martin K. Petersen  # SCSI
> Reviewed-by: Darrick J. Wong  # xfs
> Acked-by: Jani Nikula 
> Acked-by: Corey Minyard 
> Signed-off-by: Joel Granados 

Acked-by: Thomas Gleixner 



Re: [PATCH v5 1/6] elf: Define note name macros

2025-01-15 Thread Dave Martin
Hi,

On Wed, Jan 15, 2025 at 02:47:58PM +0900, Akihiko Odaki wrote:
> elf.h had a comment saying:
> > Notes used in ET_CORE. Architectures export some of the arch register
> > sets using the corresponding note types via the PTRACE_GETREGSET and
> > PTRACE_SETREGSET requests.
> > The note name for these types is "LINUX", except NT_PRFPREG that is
> > named "CORE".
> 
> However, NT_PRSTATUS is also named "CORE". It is also unclear what
> "these types" refers to.
> 
> To fix these problems, define a name for each note type. The added
> definitions are macros so the kernel and userspace can directly refer to
> them to remove their duplicate definitions of note names.
> 
> Signed-off-by: Akihiko Odaki 
> Acked-by: Baoquan He 
> ---
>  include/uapi/linux/elf.h | 89 
> +---
>  1 file changed, 84 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b44069d29cec..592507aa9b3a 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -368,101 +368,180 @@ typedef struct elf64_shdr {
>  #define ELF_OSABI ELFOSABI_NONE
>  #endif
>  
> +/* Note definitions: NN_ defines names. NT_ defines types. */
> +
> +#define NN_GNU_PROPERTY_TYPE_0   "GNU"
> +#define NT_GNU_PROPERTY_TYPE_0   5
> +

I guess this also works as a neutral way of saying that
NT_GNU_PROPERTY_TYPE_0 isn't _specifically_ for coredumps.

I would worry that moving this block is going to generate unwanted
context noise with other patches that may be in flight and add new
NT_ definitions.

But (a) changing the comments will cause that anyway, and
(b) if there are any new NT_ definitions in flight, we want people to
notice the conflict and add the accompanying NN_ definition.

So, perhaps context noise is not such a bad thing in this instance.

[...]

> +#define NN_LOONGARCH_HW_WATCH"LINUX"
>  #define NT_LOONGARCH_HW_WATCH0xa06   /* LoongArch hardware 
> watchpoint registers */
>  
> -/* Note types with note name "GNU" */
> -#define NT_GNU_PROPERTY_TYPE_0   5
> -
>  /* Note header in a PT_NOTE section */
>  typedef struct elf32_note {
>Elf32_Word n_namesz;   /* Name size */

Reviewed-by: Dave Martin 

Cheers
---Dave



Re: [PATCH] selftests: livepatch: handle PRINTK_CALLER in check_result()

2025-01-15 Thread Joe Lawrence
On Tue, Jan 14, 2025 at 08:01:44PM +0530, Madhavan Srinivasan wrote:
> Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which
> adds the caller id to each dmesg line. Due to this, even though the
> expected and observed outputs are the same, the test cases end up being
> reported as failed.
> 
>  -% insmod test_modules/test_klp_livepatch.ko
>  -livepatch: enabling patch 'test_klp_livepatch'
>  -livepatch: 'test_klp_livepatch': initializing patching transition
>  -livepatch: 'test_klp_livepatch': starting patching transition
>  -livepatch: 'test_klp_livepatch': completing patching transition
>  -livepatch: 'test_klp_livepatch': patching complete
>  -% echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>  -livepatch: 'test_klp_livepatch': initializing unpatching transition
>  -livepatch: 'test_klp_livepatch': starting unpatching transition
>  -livepatch: 'test_klp_livepatch': completing unpatching transition
>  -livepatch: 'test_klp_livepatch': unpatching complete
>  -% rmmod test_klp_livepatch
>  +[   T3659] % insmod test_modules/test_klp_livepatch.ko
>  +[   T3682] livepatch: enabling patch 'test_klp_livepatch'
>  +[   T3682] livepatch: 'test_klp_livepatch': initializing patching transition
>  +[   T3682] livepatch: 'test_klp_livepatch': starting patching transition
>  +[T826] livepatch: 'test_klp_livepatch': completing patching transition
>  +[T826] livepatch: 'test_klp_livepatch': patching complete
>  +[   T3659] % echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
>  +[   T3659] livepatch: 'test_klp_livepatch': initializing unpatching 
> transition
>  +[   T3659] livepatch: 'test_klp_livepatch': starting unpatching transition
>  +[T789] livepatch: 'test_klp_livepatch': completing unpatching transition
>  +[T789] livepatch: 'test_klp_livepatch': unpatching complete
>  +[   T3659] % rmmod test_klp_livepatch
> 
>   ERROR: livepatch kselftest(s) failed
>  not ok 1 selftests: livepatch: test-livepatch.sh # exit=1
> 
> Currently check_result() handles removal of the "[time]" prefix from
> the dmesg output. Enhance the check to also handle removal of the
> "[Tid]" prefix.
> 
> Signed-off-by: Madhavan Srinivasan 
> ---
>  tools/testing/selftests/livepatch/functions.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/livepatch/functions.sh 
> b/tools/testing/selftests/livepatch/functions.sh
> index e5d06fb40233..a1730c1864a4 100644
> --- a/tools/testing/selftests/livepatch/functions.sh
> +++ b/tools/testing/selftests/livepatch/functions.sh
> @@ -306,7 +306,8 @@ function check_result {
>   result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { 
> p=1 }' | \
>grep -e 'livepatch:' -e 'test_klp' | \
>grep -v '\(tainting\|taints\) kernel' | \
> -  sed 's/^\[[ 0-9.]*\] //')
> +  sed 's/^\[[ 0-9.]*\] //' | \
> +  sed 's/^\[[ ]*T[0-9]*\] //')

Thanks for adding this to the filter.

If I read the PRINTK_CALLER docs correctly, there is a potential CPU
identifier as well.  Are there any instances where the livepatching code
will use the "[C$processor_id]" (out of task context) prefix?  Or would
it hurt to future proof with [CT][0-9]?
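
Something like the following would cover both prefixes in one pass (just
a sketch, untested):

	# strip an optional "[   12.345678] " timestamp and an optional
	# "[  T3659] " / "[  C4] " caller id prefix
	sed -e 's/^\[[ 0-9.]*\] //' -e 's/^\[ *[CT][0-9]*\] //'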

Acked-by: Joe Lawrence 

--
Joe

>  
>   if [[ "$expect" == "$result" ]] ; then
>   echo "ok"
> -- 
> 2.47.0
> 




Re: [PATCH v2] treewide: const qualify ctl_tables where applicable

2025-01-15 Thread Bill O'Donnell
On Fri, Jan 10, 2025 at 03:16:08PM +0100, Joel Granados wrote:
> Add the const qualifier to all the ctl_tables in the tree except for
> watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
> loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
> drivers/infiniband dirs). These are special cases as they use a
> registration function with a non-const qualified ctl_table argument or
> modify the arrays before passing them on to the registration function.
> 
> Constifying ctl_table structs will prevent the modification of
> proc_handler function pointers as the arrays would reside in .rodata.
> This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
> constify the ctl_table argument of proc_handlers") constified all the
> proc_handlers.
> 
> Created this by running an spatch followed by a sed command:
> Spatch:
> virtual patch
> 
> @
> depends on !(file in "net")
> disable optional_qualifier
> @
> identifier table_name != 
> {watchdog_hardlockup_sysctl,iwcm_ctl_table,ucma_ctl_table,memory_allocation_profiling_sysctls,loadpin_sysctl_table};
> @@
> 
> + const
> struct ctl_table table_name [] = { ... };
> 
> sed:
> sed --in-place \
>   -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table 
> = \&uts_kern/" \
>   kernel/utsname_sysctl.c
> 
> Reviewed-by: Song Liu 
> Acked-by: Steven Rostedt (Google)  # for kernel/trace/
> Reviewed-by: Martin K. Petersen  # SCSI
> Reviewed-by: Darrick J. Wong  # xfs
> Acked-by: Jani Nikula 
> Acked-by: Corey Minyard 
> Signed-off-by: Joel Granados 
> ---

For xfs bits...
Reviewed-by: Bill O'Donnell 


> This treewide commit builds upon the work Thomas began a few releases
> ago [1], where he laid the groundwork for constifying ctl_tables. We
> implement constification throughout the tree, with the exception of the
> ctl_tables in the "net" directory. Those are special in that they treat
> the ctl_table as non-const, but we can tackle them at a later point.
> 
> Upstreaming:
> ===
> It is late in the release cycle, but I'm hopeful that we can get this
> in for the upcoming merge window and this is why:
> 1. We don't use linux-next: As with previous treewide changes similar to
>this one [1], we avoid using linux-next in order to avoid unwanted
>merge conflicts
> 2. This is a non-functional change: which lowers the probability of
>unforeseen errors or regressions.
> 3. It will have at least 2 weeks to be tested/reviewed: The PULL should
>be sent at the end of the merge window, giving it at least 2 weeks.
>And if there are more release candidates after rc6, there will be
>more time.
> 
> Testing:
> 
> 1. Currently being tested in 0-day
> 2. sysctl self-tests/kunit-tests
> 
> Reduced To/Cc:
> ==
> b4 originally gave me 200 ppl that this should go out to (which seems a
> bit overkill from my point of view). So I left the mailing lists and
> reduced the To: the ppl previously involved in the effort and sysctl
> maintainers. Please tell me if I missed someone important to the
> constification effort.
> 
> Comments are greatly appreciated.
> 
> Changes in v2:
> - watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
>   loadpin_sysctl_table, iwcm_ctl_table and ucma_ctl_table where removed
>   from patchset as they change the sysctl array before registration.
> - Added reviewed-by tags
> - Link to v1: 
> https://lore.kernel.org/r/20250109-jag-ctl_table_const-v1-1-622aea723...@kernel.org
> Best
> 
> [1] https://lore.kernel.org/20240724210014.mc6nima6cekgi...@joels2.panther.com
> 
> --
> ---
> 
> ---
>  arch/arm/kernel/isa.c | 2 +-
>  arch/arm64/kernel/fpsimd.c| 4 ++--
>  arch/arm64/kernel/process.c   | 2 +-
>  arch/powerpc/kernel/idle.c| 2 +-
>  arch/powerpc/platforms/pseries/mobility.c | 2 +-
>  arch/riscv/kernel/process.c   | 2 +-
>  arch/riscv/kernel/vector.c| 2 +-
>  arch/s390/appldata/appldata_base.c| 2 +-
>  arch/s390/kernel/debug.c  | 2 +-
>  arch/s390/kernel/hiperdispatch.c  | 2 +-
>  arch/s390/kernel/topology.c   | 2 +-
>  arch/s390/mm/cmm.c| 2 +-
>  arch/s390/mm/pgalloc.c| 2 +-
>  arch/x86/entry/vdso/vdso32-setup.c| 2 +-
>  arch/x86/kernel/cpu/bus_lock.c| 2 +-
>  arch/x86/kernel/itmt.c| 2 +-
>  crypto/fips.c | 2 +-
>  drivers/base/firmware_loader/fallback_table.c | 2 +-
>  drivers/cdrom/cdrom.c | 2 +-
>  drivers/char/hpet.c   | 2 +-
>  drivers/char/ipmi/ipmi_poweroff.c | 2 +-
>  drivers/char/random.c | 2 +-
>  drivers/gpu/drm/i915/i915_perf.c  | 2 +-
>  drivers/gpu/drm/xe/xe_observation.c   | 2 +-
>  drive

Re: [PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-01-15 Thread Jakub Kicinski
On Mon, 13 Jan 2025 16:22:39 +0800 Wei Fang wrote:
> Add Receive side scaling (RSS) support for the i.MX95 ENETC PF to improve
> network performance and balance CPU load. In addition, since both
> ENETC v1 and ENETC v4 only support the Toeplitz algorithm, a check for
> hfunc was added.

This and the previous commits are a bit hard to follow. You plumb some
stuff thru in the previous commit. In this one you reshuffle things
again. Try to separate code movement / restructuring into one commit,
and put new additions more clearly in the next.

> +static void enetc4_set_rss_key(struct enetc_hw *hw, const u8 *key)
> +{
> + int i;
> +
> + for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
> + enetc_port_wr(hw, ENETC4_PRSSKR(i), ((u32 *)key)[i]);
> +}
> +
> +static void enetc4_get_rss_key(struct enetc_hw *hw, u8 *key)
> +{
> + int i;
> +
> + for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
> + ((u32 *)key)[i] = enetc_port_rd(hw, ENETC4_PRSSKR(i));
> +}

Isn't the only difference between the chips the register offset?
Why create full ops for something this trivial?
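
Something along these lines would keep a single helper and only vary the
offset (rough sketch; ENETC_V1_PRSSKR() stands in for whatever the legacy
key register macro is called, and enetc_si_is_v4() is an assumed
predicate, neither is taken from the patch):

	static void enetc_set_rss_key_common(struct enetc_si *si, const u8 *key)
	{
		struct enetc_hw *hw = &si->hw;
		int i;

		for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++) {
			u32 off = enetc_si_is_v4(si) ? ENETC4_PRSSKR(i) :
						       ENETC_V1_PRSSKR(i);

			enetc_port_wr(hw, off, ((u32 *)key)[i]);
		}
	}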

> +static int enetc4_get_rxnfc(struct net_device *ndev, struct ethtool_rxnfc 
> *rxnfc,
> + u32 *rule_locs)
> +{
> + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> +
> + switch (rxnfc->cmd) {
> + case ETHTOOL_GRXRINGS:
> + rxnfc->data = priv->num_rx_rings;
> + break;
> + case ETHTOOL_GRXFH:
> + return enetc_get_rsshash(rxnfc);
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}

Why add a new function instead of returning EOPNOTSUPP for new chips 
in the existing one?
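
i.e. something like (sketch only; enetc_si_is_v4() and a split-out
enetc_get_cls_rules() helper are assumed, not taken from the patch):

	static int enetc_get_rxnfc(struct net_device *ndev,
				   struct ethtool_rxnfc *rxnfc, u32 *rule_locs)
	{
		struct enetc_ndev_priv *priv = netdev_priv(ndev);

		switch (rxnfc->cmd) {
		case ETHTOOL_GRXRINGS:
			rxnfc->data = priv->num_rx_rings;
			return 0;
		case ETHTOOL_GRXFH:
			return enetc_get_rsshash(rxnfc);
		case ETHTOOL_GRXCLSRLCNT:
		case ETHTOOL_GRXCLSRULE:
		case ETHTOOL_GRXCLSRLALL:
			/* flow steering only exists on the older chips */
			if (enetc_si_is_v4(priv->si))
				return -EOPNOTSUPP;
			return enetc_get_cls_rules(priv, rxnfc, rule_locs);
		default:
			return -EOPNOTSUPP;
		}
	}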

> @@ -712,6 +730,12 @@ static int enetc_set_rxfh(struct net_device *ndev,
>   struct enetc_hw *hw = &si->hw;
>   int err = 0;
>  
> + if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
> + rxfh->hfunc != ETH_RSS_HASH_TOP) {
> + netdev_err(ndev, "Only toeplitz hash function is supported\n");
> + return -EOPNOTSUPP;

Should be a separate commit.
-- 
pw-bot: cr