Re: [PATCH] fs: introduce getfsxattrat and setfsxattrat syscalls

2025-01-13 Thread Jan Kara
On Thu 09-01-25 18:45:40, Andrey Albershteyn wrote:
> From: Andrey Albershteyn 
> 
> Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> extended attributes/flags. The syscalls take parent directory FD and
> path to the child together with struct fsxattr.
> 
> This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the
> difference that the file does not need to be opened. This lets us
> manipulate inode extended attributes not only on normal files but
> also on special ones. That is not possible with the FS_IOC_FSSETXATTR
> ioctl, as opening a special file returns the VFS special inode
> instead of the underlying filesystem one.
> 
> This patch adds two new syscalls which allow userspace to set
> extended inode attributes on special files by using the parent
> directory to open the FS inode.
> 
> Also, as vfs_fileattr_set() will now be called on special files too,
> let's forbid any attributes other than projid and nextents (a symlink
> can have an extent).
> 
> CC: linux-...@vger.kernel.org
> Signed-off-by: Andrey Albershteyn 

Couple of comments below:

> @@ -2953,3 +2956,105 @@ umode_t mode_strip_sgid(struct mnt_idmap *idmap,
>   return mode & ~S_ISGID;
>  }
>  EXPORT_SYMBOL(mode_strip_sgid);
> +
> +SYSCALL_DEFINE4(getfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr *, fsx, int, at_flags)
   ^^^ at_flags should probably be
unsigned - at least they seem to be for other syscalls.

> +{
> + struct fd dir;
> + struct fileattr fa;
> + struct path filepath;
> + struct inode *inode;
> + int error;
> +
> + if (at_flags)
> + return -EINVAL;

Shouldn't we support basic path resolve flags like AT_SYMLINK_NOFOLLOW or
AT_EMPTY_PATH? I didn't put too much thought into this, but intuitively I'd say
we should follow what path_setxattrat() does.
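Roughly the shape I have in mind - an untested sketch only, and the exact
flag-to-LOOKUP_* translation should mirror whatever the xattrat syscalls do:

	unsigned int lookup_flags = LOOKUP_FOLLOW;

	/* Sketch: accept only the basic path-resolution flags. */
	if (at_flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH))
		return -EINVAL;

	if (at_flags & AT_SYMLINK_NOFOLLOW)
		lookup_flags &= ~LOOKUP_FOLLOW;
	if (at_flags & AT_EMPTY_PATH)
		lookup_flags |= LOOKUP_EMPTY;

	error = user_path_at(dfd, filename, lookup_flags, &filepath);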

> +
> + if (!capable(CAP_FOWNER))
> + return -EPERM;

Why? Firstly this does not handle user namespaces at all, secondly it
doesn't match the check done during ioctl, and thirdly vfs_fileattr_get()
should do all the needed checks?

> +
> + dir = fdget(dfd);
> + if (!fd_file(dir))
> + return -EBADF;
> +
> + if (!S_ISDIR(file_inode(fd_file(dir))->i_mode)) {
> + error = -EBADF;
> + goto out;
> + }
> +
> + error = user_path_at(dfd, filename, at_flags, &filepath);
> + if (error)
> + goto out;

I guess this is OK for now but allowing full flexibility of the "_at"
syscall (e.g. like setxattrat() does) would be preferred. Mostly so that
userspace programmer doesn't have to read manpage in detail and think
whether the particular combination of path arguments is supported by a
particular syscall. Admittedly VFS could make this a bit simpler. Currently
the boilerplate code that's needed in path_setxattrat() &
filename_setxattr() / file_setxattr() is offputting.
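To illustrate what "full flexibility" would mean here - a very rough sketch
only, where file_getfsxattr() and path_getfsxattr() are hypothetical helpers
standing in for the file/path split the xattrat code does:

	/* Sketch: handle dfd-only lookups like the *xattrat() syscalls do. */
	if (at_flags & AT_EMPTY_PATH) {
		CLASS(fd, f)(dfd);

		if (fd_empty(f))
			return -EBADF;
		return file_getfsxattr(fd_file(f), fsx);	/* hypothetical helper */
	}
	return path_getfsxattr(dfd, filename, at_flags, fsx);	/* hypothetical helper */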

> +
> + inode = filepath.dentry->d_inode;
> + if (file_inode(fd_file(dir))->i_sb->s_magic != inode->i_sb->s_magic) {
> + error = -EBADF;
> + goto out_path;
> + }

What's the motivation for this check?

> +
> + error = vfs_fileattr_get(filepath.dentry, &fa);
> + if (error)
> + goto out_path;
> +
> + if (copy_fsxattr_to_user(&fa, fsx))
> + error = -EFAULT;
> +
> +out_path:
> + path_put(&filepath);
> +out:
> + fdput(dir);
> + return error;
> +}
> +
> +SYSCALL_DEFINE4(setfsxattrat, int, dfd, const char __user *, filename,
> + struct fsxattr *, fsx, int, at_flags)
> +{

Same comments as for getfsxattrat() apply here as well.

> -static int copy_fsxattr_from_user(struct fileattr *fa,
> -   struct fsxattr __user *ufa)
> +int copy_fsxattr_from_user(struct fileattr *fa, struct fsxattr __user *ufa)
>  {
>   struct fsxattr xfa;
>  
> @@ -574,6 +573,7 @@ static int copy_fsxattr_from_user(struct fileattr *fa,
>  
>   return 0;
>  }
> +EXPORT_SYMBOL(copy_fsxattr_from_user);

I guess no need to export this function? The code you call it from cannot
be compiled as a module.

Honza
-- 
Jan Kara 
SUSE Labs, CR



Re: [PATCH] powerpc/pseries/iommu: IOMMU incorrectly marks MMIO range in DDW

2025-01-13 Thread Madhavan Srinivasan
On Fri, 06 Dec 2024 15:00:39 -0600, Gaurav Batra wrote:
> The Power Hypervisor can possibly allocate an MMIO window intersecting
> with the Dynamic DMA Window (DDW) range, which is above 32-bit addressing.
> 
> These MMIO pages need to be marked as reserved so that the IOMMU doesn't
> map DMA buffers in this range.
> 
> The current code is not marking these pages correctly, which results in
> the LPAR oopsing while booting. The stack trace is below:
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/pseries/iommu: IOMMU incorrectly marks MMIO range in DDW
  https://git.kernel.org/powerpc/c/8f70caad82e9c088ed93b4fea48d941ab6441886

Thanks



Re: [PATCH] selftests/powerpc: Fix argument order to timer_sub()

2025-01-13 Thread Madhavan Srinivasan
On Wed, 18 Dec 2024 22:43:47 +1100, Michael Ellerman wrote:
> Commit c814bf958926 ("powerpc/selftests: Use timersub() for
> gettimeofday()"), got the order of arguments to timersub() wrong,
> leading to a negative time delta being reported, eg:
> 
>   test: gettimeofday
>   tags: git_version:v6.12-rc5-409-gdddf291c3030
>   time = -3.297781
>   success: gettimeofday
> 
> [...]

Applied to powerpc/next.

[1/1] selftests/powerpc: Fix argument order to timer_sub()
  https://git.kernel.org/powerpc/c/2bf66e66d2e6feece6175ec09ec590a0a8563bdd

Thanks



Re: [PATCH] powerpc/prom_init: Use IS_ENABLED()

2025-01-13 Thread Madhavan Srinivasan
On Wed, 18 Dec 2024 22:31:59 +1100, Michael Ellerman wrote:
> Use IS_ENABLED() for the device tree checks, so that more code is
> checked by the compiler without having to build all the different
> configurations.
> 
> 

Applied to powerpc/next.

[1/1] powerpc/prom_init: Use IS_ENABLED()
  https://git.kernel.org/powerpc/c/200f22fa48a8c670a1ba66d18d810c51055e6ae9

Thanks



Re: [PATCH V2] tools/perf/tests/base_probe: Fix check for the count of existing probes in test_adding_kernel

2025-01-13 Thread Arnaldo Carvalho de Melo
On Mon, Jan 13, 2025 at 11:21:24AM +0100, Veronika Molnarova wrote:
> On 1/10/25 10:43, Athira Rajeev wrote:
> > But if there are other probes in the system, the log will also
> > contain references to those existing probes. Hence change the
> > usage of check_all_lines_matched.pl to check_all_patterns_found.pl.
> > This makes sure the expected string appears in the result.
> > 
> > Signed-off-by: Athira Rajeev 
> 
> Acked-by: Veronika Molnarova 

Thanks, applied to perf-tools-next,

- Arnaldo



Re: [kvm-unit-tests PATCH v1 2/5] configure: Display the default processor for arm and arm64

2025-01-13 Thread Andrew Jones
On Fri, Jan 10, 2025 at 01:58:45PM +, Alexandru Elisei wrote:
> The help text for the --processor option displays the architecture name as
> the default processor type. But the default for arm is cortex-a15, and for
> arm64 is cortex-a57. Teach configure to display the correct default
> processor type for these two architectures.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  configure | 30 ++
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/configure b/configure
> index 5b0a2d7f39c0..138840c3f76d 100755
> --- a/configure
> +++ b/configure
> @@ -5,6 +5,24 @@ if [ -z "${BASH_VERSINFO[0]}" ] || [ "${BASH_VERSINFO[0]}" 
> -lt 4 ] ; then
>  exit 1
>  fi
>  
> +function get_default_processor()
> +{
> +local arch="$1"
> +
> +case "$arch" in
> +"arm")
> +default_processor="cortex-a15"
> +;;
> +"arm64" | "aarch64")
> +default_processor="cortex-a57"
> +;;
> +*)
> +default_processor=$arch
> +esac
> +
> +echo "$default_processor"
> +}
> +
>  srcdir=$(cd "$(dirname "$0")"; pwd)
>  prefix=/usr/local
>  cc=gcc
> @@ -33,6 +51,7 @@ page_size=
>  earlycon=
>  efi=
>  efi_direct=
> +default_processor=$(get_default_processor $arch)
>  
>  # Enable -Werror by default for git repositories only (i.e. developer builds)
>  if [ -e "$srcdir"/.git ]; then
> @@ -48,7 +67,7 @@ usage() {
>   Options include:
>   --arch=ARCHarchitecture to compile for ($arch). ARCH 
> can be one of:
>  arm, arm64/aarch64, i386, ppc64, riscv32, 
> riscv64, s390x, x86_64
> - --processor=PROCESSOR  processor to compile for ($arch)
> + --processor=PROCESSOR  processor to compile for ($default_processor)
>   --target=TARGETtarget platform that the tests will be 
> running on (qemu or
>  kvmtool, default is qemu) (arm/arm64 only)
>   --cross-prefix=PREFIX  cross compiler prefix
> @@ -283,13 +302,8 @@ else
>  fi
>  fi
>  
> -[ -z "$processor" ] && processor="$arch"
> -
> -if [ "$processor" = "arm64" ]; then
> -processor="cortex-a57"
> -elif [ "$processor" = "arm" ]; then
> -processor="cortex-a15"
> -fi
> +# $arch will have changed when cross-compiling.
> +[ -z "$processor" ] && processor=$(get_default_processor $arch)

The fact that $arch and $processor are wrong until they've had a chance to
be converted might be another reason for the $do_help idea. But it'll
always be fragile since another change that does some sort of conversion
could end up getting added after the '[ $do_help ] && usage' someday.

Thanks,
drew

>  
>  if [ "$arch" = "i386" ] || [ "$arch" = "x86_64" ]; then
>  testdir=x86
> -- 
> 2.47.1
> 
> 
> -- 
> kvm-riscv mailing list
> kvm-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kvm-riscv



Re: [PATCH v6 12/26] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> In preparation for using insert_page() for DAX, enhance
> insert_page_into_pte_locked() to handle establishing writable
> mappings.  Recall that DAX returns VM_FAULT_NOPAGE after installing a
> PTE which bypasses the typical set_pte_range() in finish_fault.
> 
> Signed-off-by: Alistair Popple 
> Suggested-by: Dan Williams 
> 
> ---
> 
> Changes for v5:
>  - Minor comment/formatting fixes suggested by David Hildenbrand
> 
> Changes since v2:
> 
>  - New patch split out from "mm/memory: Add dax_insert_pfn"
> ---
>  mm/memory.c | 37 +
>  1 file changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 06bb29e..8531acb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2126,19 +2126,40 @@ static int validate_page_before_insert(struct 
> vm_area_struct *vma,
>  }
>  
>  static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t 
> *pte,
> - unsigned long addr, struct page *page, pgprot_t prot)
> + unsigned long addr, struct page *page,
> + pgprot_t prot, bool mkwrite)
>  {
>   struct folio *folio = page_folio(page);
> + pte_t entry = ptep_get(pte);
>   pte_t pteval;
>  
> - if (!pte_none(ptep_get(pte)))
> - return -EBUSY;
> + if (!pte_none(entry)) {
> + if (!mkwrite)
> + return -EBUSY;
> +
> + /* see insert_pfn(). */
> + if (pte_pfn(entry) != page_to_pfn(page)) {
> + WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
> + return -EFAULT;
> + }
> + entry = maybe_mkwrite(entry, vma);
> + entry = pte_mkyoung(entry);
> + if (ptep_set_access_flags(vma, addr, pte, entry, 1))
> + update_mmu_cache(vma, addr, pte);
> + return 0;
> + }

This hunk feels like it is begging to be unified with insert_pfn() after
pfn_t dies. Perhaps a TODO to remember to come back and unify them, or
you can go append that work to your pfn_t removal series?

Other than that you can add:

Reviewed-by: Dan Williams 



Re: [PATCH v6 16/26] huge_memory: Add vmf_insert_folio_pmd()

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Currently DAX folio/page reference counts are managed differently to
> normal pages. To allow these to be managed the same as normal pages
> introduce vmf_insert_folio_pmd. This will map the entire PMD-sized folio
> and take references as it would for a normally mapped page.
> 
> This is distinct from the current mechanism, vmf_insert_pfn_pmd, which
> simply inserts a special devmap PMD entry into the page table without
> holding a reference to the page for the mapping.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Changes for v5:
>  - Minor code cleanup suggested by David
> ---
>  include/linux/huge_mm.h |  1 +-
>  mm/huge_memory.c| 54 ++
>  2 files changed, 45 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5bd1ff7..3633bd3 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -39,6 +39,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct 
> vm_area_struct *vma,
>  
>  vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
>  vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> +vm_fault_t vmf_insert_folio_pmd(struct vm_fault *vmf, struct folio *folio, 
> bool write);
>  vm_fault_t vmf_insert_folio_pud(struct vm_fault *vmf, struct folio *folio, 
> bool write);
>  
>  enum transparent_hugepage_flag {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 256adc3..d1ea76e 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1381,14 +1381,12 @@ static void insert_pfn_pmd(struct vm_area_struct 
> *vma, unsigned long addr,
>  {
>   struct mm_struct *mm = vma->vm_mm;
>   pmd_t entry;
> - spinlock_t *ptl;
>  
> - ptl = pmd_lock(mm, pmd);

Apply this comment to the previous patch too, but I think this would be
more self-documenting as:

   lockdep_assert_held(pmd_lockptr(mm, pmd));

...to make it clear in this diff and into the future what the locking
constraints of this function are.

After that you can add:

Reviewed-by: Dan Williams 



Re: [PATCH v6 17/26] memremap: Add is_devdax_page() and is_fsdax_page() helpers

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Add helpers to determine if a page or folio is a devdax or fsdax page
> or folio.
> 
> Signed-off-by: Alistair Popple 
> Acked-by: David Hildenbrand 
> 
> ---
> 
> Changes for v5:
>  - Renamed is_device_dax_page() to is_devdax_page() for consistency.
> ---
>  include/linux/memremap.h | 22 ++
>  1 file changed, 22 insertions(+)

Patch does what it says on the tin, but I am not a fan of patches this
tiny. Fold it in with the first user.



Re: [PATCH v6 14/26] rmap: Add support for PUD sized mappings to rmap

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> The rmap doesn't currently support adding a PUD mapping of a
> folio. This patch adds support for entire PUD mappings of folios,
> primarily to allow for more standard refcounting of device DAX
> folios. Currently DAX is the only user of this and it doesn't require
> support for partially mapped PUD-sized folios, so we don't support
> that for now.
> 
> Signed-off-by: Alistair Popple 
> Acked-by: David Hildenbrand 
> 
> ---
> 
> Changes for v6:
> 
>  - Minor comment formatting fix
>  - Add an additional check for CONFIG_TRANSPARENT_HUGEPAGE to fix a
>build breakage when CONFIG_PGTABLE_HAS_HUGE_LEAVES is not defined.
> 
> Changes for v5:
> 
>  - Fixed accounting as suggested by David.
> 
> Changes for v4:
> 
>  - New for v4, split out rmap changes as suggested by David.
> ---
>  include/linux/rmap.h | 15 ++-
>  mm/rmap.c| 67 ++---
>  2 files changed, 78 insertions(+), 4 deletions(-)

Looks mechanically correct to me.

Reviewed-by: Dan Williams 



Re: [PATCH v13 5/5] rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS

2025-01-13 Thread Masahiro Yamada
On Tue, Jan 14, 2025 at 5:04 AM Sami Tolvanen  wrote:
>
> Hi Masahiro,
>
> On Fri, Jan 10, 2025 at 6:26 PM Masahiro Yamada  wrote:
> >
> > On Sat, Jan 4, 2025 at 2:37 AM Matthew Maurer  wrote:
> > >
> > > From: Sami Tolvanen 
> > >
> > > Previously, two things stopped Rust from using MODVERSIONS:
> > > 1. Rust symbols are occasionally too long to be represented in the
> > >original versions table
> > > 2. Rust types cannot be properly hashed by the existing genksyms
> > >approach because:
> > > * Looking up type definitions in Rust is more complex than C
> > > * Type layout is potentially dependent on the compiler in Rust,
> > >   not just the source type declaration.
> > >
> > > CONFIG_EXTENDED_MODVERSIONS addresses the first point, and
> > > CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow
> > > it to do so by selecting both features.
> > >
> > > Signed-off-by: Sami Tolvanen 
> > > Co-developed-by: Matthew Maurer 
> > > Signed-off-by: Matthew Maurer 
> > > ---
> > >  init/Kconfig  |  3 ++-
> > >  rust/Makefile | 34 --
> > >  2 files changed, 34 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index 
> > > c1f9eb3d5f2e892e977ba1425599502dc830f552..b60acfd9431e0ac2bf401ecb6523b5104ad31150
> > >  100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1959,7 +1959,8 @@ config RUST
> > > bool "Rust support"
> > > depends on HAVE_RUST
> > > depends on RUST_IS_AVAILABLE
> > > -   depends on !MODVERSIONS
> > > +   select EXTENDED_MODVERSIONS if MODVERSIONS
> > > +   depends on !MODVERSIONS || GENDWARFKSYMS
> > > depends on !GCC_PLUGIN_RANDSTRUCT
> > > depends on !RANDSTRUCT
> > > depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
> > > diff --git a/rust/Makefile b/rust/Makefile
> > > index 
> > > a40a3936126d603836e0ec9b42a1285916b60e45..80f970ad81f7989afe5ff2b5f633f50feb7f6006
> > >  100644
> > > --- a/rust/Makefile
> > > +++ b/rust/Makefile
> > > @@ -329,10 +329,11 @@ $(obj)/bindings/bindings_helpers_generated.rs: 
> > > private bindgen_target_extra = ;
> > >  $(obj)/bindings/bindings_helpers_generated.rs: $(src)/helpers/helpers.c 
> > > FORCE
> > > $(call if_changed_dep,bindgen)
> > >
> > > +rust_exports = $(NM) -p --defined-only $(1) | awk '$$2~/(T|R|D|B)/ && 
> > > $$3!~/__cfi/ { printf $(2),$(3) }'
> > > +
> > >  quiet_cmd_exports = EXPORTS $@
> > >cmd_exports = \
> > > -   $(NM) -p --defined-only $< \
> > > -   | awk '$$2~/(T|R|D|B)/ && $$3!~/__cfi/ {printf 
> > > "EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3}' > $@
> > > +   $(call rust_exports,$<,"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3) > $@
> >
> > I noticed a nit:
> >
> > Both of the two callsites of rust_exports pass
> > '$$3' to the last parameter instead of hardcoding it.
> >
> > Is it a flexibility for future extensions?
> >
> > I cannot think of any other use except for printing
> > the third column, i.e. symbol name.
>
> Good catch, the last parameter isn't necessary anymore. It was used in
> early versions of the series to also pass symbol addresses to
> gendwarfksyms, but that's not needed since we read the symbol table
> directly now.

If you submit a diff, I will squash it to 5/5.
(You do not need to input commit description body)


-- 
Best Regards
Masahiro Yamada



Re: [PATCH] fs: introduce getfsxattrat and setfsxattrat syscalls

2025-01-13 Thread Andrey Albershteyn
On 2025-01-13 12:19:36, Jan Kara wrote:
> On Thu 09-01-25 18:45:40, Andrey Albershteyn wrote:
> > From: Andrey Albershteyn 
> > 
> > Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
> > extended attributes/flags. The syscalls take parent directory FD and
> > path to the child together with struct fsxattr.
> > 
> > This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the
> > difference that the file does not need to be opened. This lets us
> > manipulate inode extended attributes not only on normal files but
> > also on special ones. That is not possible with the FS_IOC_FSSETXATTR
> > ioctl, as opening a special file returns the VFS special inode
> > instead of the underlying filesystem one.
> > 
> > This patch adds two new syscalls which allow userspace to set
> > extended inode attributes on special files by using the parent
> > directory to open the FS inode.
> > 
> > Also, as vfs_fileattr_set() will now be called on special files too,
> > let's forbid any attributes other than projid and nextents (a symlink
> > can have an extent).
> > 
> > CC: linux-...@vger.kernel.org
> > Signed-off-by: Andrey Albershteyn 
> 
> Couple of comments below:
> 
> > @@ -2953,3 +2956,105 @@ umode_t mode_strip_sgid(struct mnt_idmap *idmap,
> > return mode & ~S_ISGID;
> >  }
> >  EXPORT_SYMBOL(mode_strip_sgid);
> > +
> > +SYSCALL_DEFINE4(getfsxattrat, int, dfd, const char __user *, filename,
> > +   struct fsxattr *, fsx, int, at_flags)
>  ^^^ at_flags should probably be
> unsigned - at least they seem to be for other syscalls.

sure

> 
> > +{
> > +   struct fd dir;
> > +   struct fileattr fa;
> > +   struct path filepath;
> > +   struct inode *inode;
> > +   int error;
> > +
> > +   if (at_flags)
> > +   return -EINVAL;
> 
> Shouldn't we support basic path resolve flags like AT_SYMLINK_NOFOLLOW or
> AT_EMPTY_PATH? I didn't put too much thought into this, but intuitively I'd say
> we should follow what path_setxattrat() does.

Hmm, yeah, you are right these two can be passed. I thought about
setting AT_SYMLINK_NOFOLLOW by default (which is also missing here),
but allowing these to be passed seems fine.

> 
> > +
> > +   if (!capable(CAP_FOWNER))
> > +   return -EPERM;
> 
> Why? Firstly this does not handle user namespaces at all, secondly it
> doesn't match the check done during ioctl, and thirdly vfs_fileattr_get()
> should do all the needed checks?

Sorry, I misunderstood how this works, I will remove this from both
get/set. get*() doesn't need it and set*() checks capabilities in
vfs_fileattr_set(). Thanks!

> 
> > +
> > +   dir = fdget(dfd);
> > +   if (!fd_file(dir))
> > +   return -EBADF;
> > +
> > +   if (!S_ISDIR(file_inode(fd_file(dir))->i_mode)) {
> > +   error = -EBADF;
> > +   goto out;
> > +   }
> > +
> > +   error = user_path_at(dfd, filename, at_flags, &filepath);
> > +   if (error)
> > +   goto out;
> 
> I guess this is OK for now but allowing full flexibility of the "_at"
> syscall (e.g. like setxattrat() does) would be preferred. Mostly so that
> userspace programmer doesn't have to read manpage in detail and think
> whether the particular combination of path arguments is supported by a
> particular syscall. Admittedly VFS could make this a bit simpler. Currently
> the boilerplate code that's needed in path_setxattrat() &
> filename_setxattr() / file_setxattr() is offputting.
> 
> > +
> > +   inode = filepath.dentry->d_inode;
> > +   if (file_inode(fd_file(dir))->i_sb->s_magic != inode->i_sb->s_magic) {
> > +   error = -EBADF;
> > +   goto out_path;
> > +   }
> 
> What's the motivation for this check?

This was one of the comments on the ioctl() patch, that it doesn't
make much sense to allow ioctl() to be called over different
filesystems. But for a syscall it probably makes less sense to
restrict it like that. I will drop it.

> 
> > +
> > +   error = vfs_fileattr_get(filepath.dentry, &fa);
> > +   if (error)
> > +   goto out_path;
> > +
> > +   if (copy_fsxattr_to_user(&fa, fsx))
> > +   error = -EFAULT;
> > +
> > +out_path:
> > +   path_put(&filepath);
> > +out:
> > +   fdput(dir);
> > +   return error;
> > +}
> > +
> > +SYSCALL_DEFINE4(setfsxattrat, int, dfd, const char __user *, filename,
> > +   struct fsxattr *, fsx, int, at_flags)
> > +{
> 
> Same comments as for getfsxattrat() apply here as well.
> 
> > -static int copy_fsxattr_from_user(struct fileattr *fa,
> > - struct fsxattr __user *ufa)
> > +int copy_fsxattr_from_user(struct fileattr *fa, struct fsxattr __user *ufa)
> >  {
> > struct fsxattr xfa;
> >  
> > @@ -574,6 +573,7 @@ static int copy_fsxattr_from_user(struct fileattr *fa,
> >  
> > return 0;
> >  }
> > +EXPORT_SYMBOL(copy_fsxattr_from_user);
> 
> I guess no need to export this function? The code you call it from cannot
> be compiled as a module.

Yes, that's true, I added this because copy_fsxattr_to_user()

Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Sean Christopherson
On Mon, Jan 13, 2025, Marc Zyngier wrote:
> On Mon, 13 Jan 2025 18:58:45 +,
> Sean Christopherson  wrote:
> > 
> > On Mon, Jan 13, 2025, Marc Zyngier wrote:
> > > On Mon, 13 Jan 2025 15:44:28 +,
> > > Sean Christopherson  wrote:
> > > > 
> > > > On Sat, Jan 11, 2025, Marc Zyngier wrote:
> > > > > Yet, you don't amend arm64 to publish that flag. Not that I think this
> > > > > causes any issue (even if you save the state at that point without
> > > > > reentering the guest, it will be still be consistent), but that
> > > > > directly contradicts the documentation (isn't that ironic? ;-).
> > > > 
> > > > It does cause issues, I missed this code in kvm_arch_vcpu_ioctl_run():
> > > > 
> > > > if (run->exit_reason == KVM_EXIT_MMIO) {
> > > > ret = kvm_handle_mmio_return(vcpu);
> > > > if (ret <= 0)
> > > > return ret;
> > > > }
> > > 
> > > That's satisfying a load from the guest forwarded to userspace.
> > 
> > And MMIO stores, no?  I.e. PC needs to be incremented on stores as well.
> 
> Yes, *after* the store has completed. If you replay the instruction,
> the same store comes out.
> 
> > > If the VMM did a save of the guest at this stage, restored and resumed it,
> > > *nothing* bad would happen, as PC still points to the instruction that got
> > > forwarded. You'll see the same load again.
> > 
> > But replaying an MMIO store could cause all kinds of problems, and even MMIO
> > loads could theoretically be problematic, e.g. if there are side effects in 
> > the
> > device that trigger on access to a device register.
> 
> But that's the VMM's problem. If it has modified its own state and
> doesn't return to the guest to complete the instruction, that's just
> as bad as a load, which *do* have side effects as well.

Agreed, just wanted to make sure I wasn't completely misunderstanding something
about arm64.

> Overall, the guest state exposed by KVM is always correct, and
> replaying the instruction is not going to change that. It is if the
> VMM is broken that things turn ugly *for the VMM itself*, 
> and I claim that no amount of flag being added is going to help that.

On x86 at least, adding KVM_RUN_NEEDS_COMPLETION reduces the chances for human
error.  x86 has had bugs in both KVM (patch 1) and userspace (Google's VMM when
handling MSR exits) that would have been avoided if KVM_RUN_NEEDS_COMPLETION 
existed.
Unless the VMM is doing something decidely odd, userspace needs to write code 
once
(maybe even just once for all architectures).  For KVM, the flag is set based on
whether or not the vCPU has a valid completion callback, i.e. will be correct so
long as the underlying KVM code is correct.

Contrast that with the current approach, where the KVM developer needs to get
the KVM code correct and remember to update KVM's documentation.  Documentation
is especially problematic, because in practice it can't be tested, i.e. is much
more likely to be missed by the developer and the maintainer.  The VMM either
needs to blindly redo KVM_RUN (as selftests do, and apparently as QEMU does), or
the developer adding VMM support needs to be diligent in reading KVM's 
documentation.
And like KVM documentation, testing that the VMM is implemented to KVM's "spec"
is effectively impossible in practice, because the overwhelming majority of the
time userspace exits and save/restore will work just fine.

I do agree that the VMM is likely going to run into problems sooner or later if
the developers/maintainers don't fundamentally understand the need to redo 
KVM_RUN,
but I also think there's significant value in reducing the chances for simple
human error to result in broken VMs.
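As a concrete illustration of how little userspace code this ends up being -
just a sketch, assuming the KVM_RUN_NEEDS_COMPLETION kvm_run flag proposed in
this series plus the existing immediate_exit mechanism, with save_vcpu_state()
as a hypothetical VMM helper:

	/* Before saving vCPU state, complete any pending userspace exit. */
	if (run->flags & KVM_RUN_NEEDS_COMPLETION) {
		run->immediate_exit = 1;
		ioctl(vcpu_fd, KVM_RUN, 0);	/* completes the exit, returns -EINTR */
		run->immediate_exit = 0;
	}
	save_vcpu_state(vcpu_fd);		/* VMM-specific, hypothetical */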



Re: [PATCH V2] tools/perf/builtin-lock: Fix return code for functions in __cmd_contention

2025-01-13 Thread Namhyung Kim
On Fri, Jan 10, 2025 at 03:07:30PM +0530, Athira Rajeev wrote:
> perf lock contention returns zero exit value even if the lock contention
> BPF setup failed.
> 
>   # ./perf lock con -b true
>   libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
> CONFIG_DEBUG_INFO_BTF enabled?
>   libbpf: failed to find '.BTF' ELF section in 
> /lib/modules/6.13.0-rc3+/build/vmlinux
>   libbpf: failed to find valid kernel BTF
>   libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
> CONFIG_DEBUG_INFO_BTF enabled?
>   libbpf: failed to find '.BTF' ELF section in 
> /lib/modules/6.13.0-rc3+/build/vmlinux
>   libbpf: failed to find valid kernel BTF
>   libbpf: Error loading vmlinux BTF: -ESRCH
>   libbpf: failed to load object 'lock_contention_bpf'
>   libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH
>   Failed to load lock-contention BPF skeleton
>   lock contention BPF setup failed
>   # echo $?
>0
> 
> Fix this by saving the return code for lock_contention_prepare
> so that command exits with proper return code. Similarly set the
> return code properly for two other functions in builtin-lock, namely
> setup_output_field() and select_key().
> 
> Signed-off-by: Athira Rajeev 

Reviewed-by: Namhyung Kim 

Thanks,
Namhyung

> ---
> Changelog:
>  v1 -> v2
>  Fixed return code in functions: setup_output_field()
>  and select_key() as pointed out by Namhyung.
> 
>  tools/perf/builtin-lock.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index 208c482daa56..94a2bc15a2fa 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -2049,7 +2049,8 @@ static int __cmd_contention(int argc, const char **argv)
>   goto out_delete;
>   }
>  
> - if (lock_contention_prepare(&con) < 0) {
> + err = lock_contention_prepare(&con);
> + if (err < 0) {
>   pr_err("lock contention BPF setup failed\n");
>   goto out_delete;
>   }
> @@ -2070,10 +2071,14 @@ static int __cmd_contention(int argc, const char 
> **argv)
>   }
>   }
>  
> - if (setup_output_field(true, output_fields))
> + err = setup_output_field(true, output_fields);
> + if (err) {
> + pr_err("Failed to setup output field\n");
>   goto out_delete;
> + }
>  
> - if (select_key(true))
> + err = select_key(true);
> + if (err)
>   goto out_delete;
>  
>   if (symbol_conf.field_sep) {
> -- 
> 2.43.5
> 



Re: [PATCH v6 13/26] mm/memory: Add vmf_insert_page_mkwrite()

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
> creates a special devmap PTE entry for the pfn but does not take a
> reference on the underlying struct page for the mapping. This is
> because DAX page refcounts are treated specially, as indicated by the
> presence of a devmap entry.
> 
> To allow DAX page refcounts to be managed the same as normal page
> refcounts introduce vmf_insert_page_mkwrite(). This will take a
> reference on the underlying page much the same as vmf_insert_page,
> except it also permits upgrading an existing mapping to be writable if
> requested/possible.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Updates from v2:
> 
>  - Rename function to make not DAX specific
> 
>  - Split the insert_page_into_pte_locked() change into a separate
>patch.
> 
> Updates from v1:
> 
>  - Re-arrange code in insert_page_into_pte_locked() based on comments
>from Jan Kara.
> 
>  - Call mkdrity/mkyoung for the mkwrite case, also suggested by Jan.
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c| 36 
>  2 files changed, 38 insertions(+)

Looks good to me, you can add:

Reviewed-by: Dan Williams 



Re: [PATCH v6 08/26] fs/dax: Remove PAGE_MAPPING_DAX_SHARED mapping flag

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> PAGE_MAPPING_DAX_SHARED is the same as PAGE_MAPPING_ANON. 

I think a bit more detail is warranted, how about?

The page ->mapping pointer can have magic values like
PAGE_MAPPING_DAX_SHARED and PAGE_MAPPING_ANON for page owner specific
usage. In fact, PAGE_MAPPING_DAX_SHARED and PAGE_MAPPING_ANON alias the
same value.

> This isn't currently a problem because FS DAX pages are treated
> specially.

s/are treated specially/are never seen by the anonymous mapping code and
vice versa/

> However a future change will make FS DAX pages more like
> normal pages, so folio_test_anon() must not return true for a FS DAX
> page.
> 
> We could explicitly test for a FS DAX page in folio_test_anon(),
> etc. however the PAGE_MAPPING_DAX_SHARED flag isn't actually
> needed. Instead we can use the page->mapping field to implicitly track
> the first mapping of a page. If page->mapping is non-NULL it implies
> the page is associated with a single mapping at page->index. If the
> page is associated with a second mapping clear page->mapping and set
> page->share to 1.
> 
> This is possible because a shared mapping implies the file-system
> implements dax_holder_operations which makes the ->mapping and
> ->index, which is a union with ->share, unused.
> 
> The page is considered shared when page->mapping == NULL and
> page->share > 0 or page->mapping != NULL, implying it is present in at
> least one address space. This also makes it easier for a future change
> to detect when a page is first mapped into an address space which
> requires special handling.
> 
> Signed-off-by: Alistair Popple 
> ---
>  fs/dax.c   | 45 +--
>  include/linux/page-flags.h |  6 +-
>  2 files changed, 29 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 4e49cc4..d35dbe1 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -351,38 +351,41 @@ static unsigned long dax_end_pfn(void *entry)
>   for (pfn = dax_to_pfn(entry); \
>   pfn < dax_end_pfn(entry); pfn++)
>  
> +/*
> + * A DAX page is considered shared if it has no mapping set and ->share 
> (which
> + * shares the ->index field) is non-zero. Note this may return false even if 
> the
> + * page is shared between multiple files but has not yet actually been mapped
> + * into multiple address spaces.
> + */
>  static inline bool dax_page_is_shared(struct page *page)
>  {
> - return page->mapping == PAGE_MAPPING_DAX_SHARED;
> + return !page->mapping && page->share;
>  }
>  
>  /*
> - * Set the page->mapping with PAGE_MAPPING_DAX_SHARED flag, increase the
> - * refcount.
> + * Increase the page share refcount, warning if the page is not marked as 
> shared.
>   */
>  static inline void dax_page_share_get(struct page *page)
>  {
> - if (page->mapping != PAGE_MAPPING_DAX_SHARED) {
> - /*
> -  * Reset the index if the page was already mapped
> -  * regularly before.
> -  */
> - if (page->mapping)
> - page->share = 1;
> - page->mapping = PAGE_MAPPING_DAX_SHARED;
> - }
> + WARN_ON_ONCE(!page->share);
> + WARN_ON_ONCE(page->mapping);

Given the only caller of this function is dax_associate_entry() it seems
like overkill to check that a function only a few lines away manipulated
->mapping correctly.

I don't see much reason for dax_page_share_get() to exist after your
changes.

Perhaps all that is needed is a dax_make_shared() helper that does the
initial fiddling of '->mapping = NULL' and '->share = 1'?
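Something like this, purely as a sketch of what I mean (field names as in the
current code):

static inline void dax_make_shared(struct page *page)
{
	/*
	 * Drop the single-mapping association and start counting shares;
	 * ->share is the field union'd with ->index.
	 */
	page->mapping = NULL;
	page->share = 1;
}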

>   page->share++;
>  }
>  
>  static inline unsigned long dax_page_share_put(struct page *page)
>  {
> + WARN_ON_ONCE(!page->share);
>   return --page->share;
>  }
>  
>  /*
> - * When it is called in dax_insert_entry(), the shared flag will indicate 
> that
> - * whether this entry is shared by multiple files.  If so, set the 
> page->mapping
> - * PAGE_MAPPING_DAX_SHARED, and use page->share as refcount.
> + * When it is called in dax_insert_entry(), the shared flag will indicate
> + * whether this entry is shared by multiple files. If the page has not
> + * previously been associated with any mappings the ->mapping and ->index
> + * fields will be set. If it has already been associated with a mapping
> + * the mapping will be cleared and the share count set. It's then up to the
> + * file-system to track which mappings contain which pages, ie. by 
> implementing
> + * dax_holder_operations.

This feels like a good comment for a new dax_make_shared() not
dax_associate_entry().

I would also:

s/up to the file-system to track which mappings contain which pages, ie. by 
implementing
 dax_holder_operations/up to reverse map users like memory_failure() to
call back into the filesystem to recover ->mapping and ->index
information/

>   */
>  static void dax_associate_entry(void *entry, struct address_space *mapping,
>   struct vm_area_struct *vma, 

Re: [PATCH v6 15/26] huge_memory: Add vmf_insert_folio_pud()

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Currently DAX folio/page reference counts are managed differently to
> normal pages. To allow these to be managed the same as normal pages
> introduce vmf_insert_folio_pud. This will map the entire PUD-sized folio
> and take references as it would for a normally mapped page.
> 
> This is distinct from the current mechanism, vmf_insert_pfn_pud, which
> simply inserts a special devmap PUD entry into the page table without
> holding a reference to the page for the mapping.
> 
> Signed-off-by: Alistair Popple 

Looks correct for what it is:

Reviewed-by: Dan Williams 



Re: [PATCH v6 20/26] mm/mlock: Skip ZONE_DEVICE PMDs during mlock

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> At present mlock skips ptes mapping ZONE_DEVICE pages. A future change
> to remove pmd_devmap will allow pmd_trans_huge_lock() to return
> ZONE_DEVICE folios so make sure we continue to skip those.
> 
> Signed-off-by: Alistair Popple 
> Acked-by: David Hildenbrand 

This looks like a fix in that mlock_pte_range() *does* call mlock_folio() 
when pmd_trans_huge_lock() returns a non-NULL @ptl.

So it is not in preparation for a future change; it is making the pte and
pmd cases behave the same and drop mlock requests.

The code change looks good, but do add a Fixes tag and reword the
changelog a bit before adding:

Reviewed-by: Dan Williams 



Re: [PATCH v6 18/26] mm/gup: Don't allow FOLL_LONGTERM pinning of FS DAX pages

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Longterm pinning of FS DAX pages should already be disallowed by
> various pXX_devmap checks. However a future change will cause these
> checks to be invalid for FS DAX pages so make
> folio_is_longterm_pinnable() return false for FS DAX pages.
> 
> Signed-off-by: Alistair Popple 
> Reviewed-by: John Hubbard 
> Acked-by: David Hildenbrand 
> ---
>  include/linux/mm.h | 4 
>  1 file changed, 4 insertions(+)

> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f267b06..01edca9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2078,6 +2078,10 @@ static inline bool folio_is_longterm_pinnable(struct 
> folio *folio)
>   if (folio_is_device_coherent(folio))
>   return false;
>  
> + /* DAX must also always allow eviction. */

This 'eviction' terminology seems like it was copied from the
device-memory comment, but with fsdax it does not fit. How about:

/*
 * Filesystems can only tolerate transient delays to truncate and
 * hole-punch operations
 */

> + if (folio_is_fsdax(folio))
> + return false;
> +

After the comment fixup you can add:

Reviewed-by: Dan Williams 



Re: [PATCH v5 05/17] arm64: pgtable: use mmu gather to free p4d level page table

2025-01-13 Thread Qi Zheng

Hi Will,

On 2025/1/14 00:26, Will Deacon wrote:

On Wed, Jan 08, 2025 at 02:57:21PM +0800, Qi Zheng wrote:

Like other levels of page tables, also use mmu gather mechanism to free
p4d level page table.

Signed-off-by: Qi Zheng 
Originally-by: Peter Zijlstra (Intel) 
Reviewed-by: Kevin Brodsky 
Cc: linux-arm-ker...@lists.infradead.org
---
  arch/arm64/include/asm/pgalloc.h |  1 -
  arch/arm64/include/asm/tlb.h | 14 ++
  2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 2965f5a7e39e3..1b4509d3382c6 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -85,7 +85,6 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t 
*pgdp, p4d_t *p4dp)
__pgd_populate(pgdp, __pa(p4dp), pgdval);
  }
  
-#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)

  #else
  static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t 
prot)
  {
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index a947c6e784ed2..445282cde9afb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -111,4 +111,18 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pudp,
  }
  #endif
  
+#if CONFIG_PGTABLE_LEVELS > 4

+static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4dp,
+ unsigned long addr)
+{
+   struct ptdesc *ptdesc = virt_to_ptdesc(p4dp);
+
+   if (!pgtable_l5_enabled())
+   return;
+
+   pagetable_p4d_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
+}


Should we update p4d_free() to call the destructor, too? It looks like
it just does free_page() atm.


Patch #3 introduces the generic p4d_free() and lets arm64 use it.
Patch #4 adds the destructor to the generic p4d_free(). So IIUC, there
is no problem here.

Thanks!



Will




Re: [PATCH v6 19/26] proc/task_mmu: Mark devdax and fsdax pages as always unpinned

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> The procfs mmu files such as smaps and pagemap currently ignore devdax and
> fsdax pages because these pages are considered special. A future change
> will start treating these as normal pages, meaning they can be exposed via
> smaps and pagemap.
> 
> The only difference is that devdax and fsdax pages can never be pinned for
> DMA via FOLL_LONGTERM, so add an explicit check in pte_is_pinned() to
> reflect that.

I don't understand this patch.

pin_user_pages() is also used for Direct-I/O page pinning, so the
comment about FOLL_LONGTERM is wrong, and I otherwise do not understand
what goes wrong if the only pte_is_pinned() user correctly detects the
pin state?



Re: [PATCH v6 21/26] fs/dax: Properly refcount fs dax pages

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Currently fs dax pages are considered free when the refcount drops to
> one and their refcounts are not increased when mapped via PTEs or
> decreased when unmapped. This requires special logic in mm paths to
> detect that these pages should not be properly refcounted, and to
> detect when the refcount drops to one instead of zero.
> 
> On the other hand get_user_pages(), etc. will properly refcount fs dax
> pages by taking a reference and dropping it when the page is
> unpinned.
> 
> Tracking this special behaviour requires extra PTE bits
> (eg. pte_devmap) and introduces rules that are potentially confusing
> and specific to FS DAX pages. To fix this, and to possibly allow
> removal of the special PTE bits in future, convert the fs dax page
> refcounts to be zero based and instead take a reference on the page
> each time it is mapped as is currently the case for normal pages.
> 
> This may also allow a future clean-up to remove the pgmap refcounting
> that is currently done in mm/gup.c.

This patch depends on FS_DAX_LIMITED being abandoned first, so do
include the patch at the bottom of this reply in your series before this
patch.

> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Changes since v2:
> 
> Based on some questions from Dan I attempted to have the FS DAX page
> cache (ie. address space) hold a reference to the folio whilst it was
> mapped. However I came to the strong conclusion that this was not the
> right thing to do.
> 
> If the page refcount == 0 it means the page is:
> 
> 1. not mapped into user-space
> 2. not subject to other access via DMA/GUP/etc.
> 
> Ie. From the core MM perspective the page is not in use.
> 
> The fact a page may or may not be present in one or more address space
> mappings is irrelevant for core MM. It just means the page is still in
> use or valid from the file system perspective, and it's a
> responsiblity of the file system to remove these mappings if the pfn
> mapping becomes invalid (along with first making sure the MM state,
> ie. page->refcount, is idle). So we shouldn't be trying to track that
> lifetime with MM refcounts.
> 
> Doing so just makes DMA-idle tracking more complex because there is
> now another thing (one or more address spaces) which can hold
> references on a page. And FS DAX can't even keep track of all the
> address spaces which might contain a reference to the page in the
> XFS/reflink case anyway.
> 
> We could do this if we made file systems invalidate all address space
> mappings prior to calling dax_break_layouts(), but that isn't
> currently necessary and would lead to increased faults just so we
> could do some superfluous refcounting which the file system already
> does.
> 
> I have however put the page sharing checks and WARN_ON's back which
> also turned out to be useful for figuring out when to re-initialising
> a folio.

I feel like these comments are a useful analysis that deserve not to be
lost to the sands of time on the list.

Perhaps capture a flavor of this relevant for future consideration in a
"DAX page Lifetime" section of Documentation/filesystems/dax.rst?

> ---
>  drivers/nvdimm/pmem.c|   4 +-
>  fs/dax.c | 212 +++-
>  fs/fuse/virtio_fs.c  |   3 +-
>  fs/xfs/xfs_inode.c   |   2 +-
>  include/linux/dax.h  |   6 +-
>  include/linux/mm.h   |  27 +-
>  include/linux/mm_types.h |   7 +-
>  mm/gup.c |   9 +--
>  mm/huge_memory.c |   6 +-
>  mm/internal.h|   2 +-
>  mm/memory-failure.c  |   6 +-
>  mm/memory.c  |   6 +-
>  mm/memremap.c|  47 -
>  mm/mm_init.c |   9 +--
>  mm/swap.c|   2 +-
>  15 files changed, 183 insertions(+), 165 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index d81faa9..785b2d2 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -513,7 +513,7 @@ static int pmem_attach_disk(struct device *dev,
>  
>   pmem->disk = disk;
>   pmem->pgmap.owner = pmem;
> - pmem->pfn_flags = PFN_DEV;
> + pmem->pfn_flags = 0;
>   if (is_nd_pfn(dev)) {
>   pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
>   pmem->pgmap.ops = &fsdax_pagemap_ops;
> @@ -522,7 +522,6 @@ static int pmem_attach_disk(struct device *dev,
>   pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
>   pmem->pfn_pad = resource_size(res) -
>   range_len(&pmem->pgmap.range);
> - pmem->pfn_flags |= PFN_MAP;
>   bb_range = pmem->pgmap.range;
>   bb_range.start += pmem->data_offset;
>   } else if (pmem_should_map_pages(dev)) {
> @@ -532,7 +531,6 @@ static int pmem_attach_disk(struct device *dev,
>   pmem->pgmap.type = MEMORY_DEVICE_FS_DAX;
>   pmem->pgmap.ops = &fsdax_pagemap_ops;
>   addr = devm_memremap_pages(dev, &pmem->pgmap);
> - pmem->pfn_f

[PATCH v2 net-next 04/13] net: enetc: add MAC filter for i.MX95 ENETC PF

2025-01-13 Thread Wei Fang
The i.MX95 ENETC supports both a MAC hash filter and a MAC exact filter.
The MAC hash filter is implemented through a 64-bit hash table which is
matched against the hashed addresses; the PF and VFs each have two MAC
hash tables, one for unicast and the other for multicast. The MAC exact
filter, by contrast, is shared between the SIs (PF and VFs); each table
entry contains a MAC address that may be unicast or multicast, and the
entry also contains an SI bitmap field indicating for which SIs the
entry is valid.

For i.MX95 ENETC, the MAC exact filter only has 4 entries. Observation of
the system's default network configuration shows that the MAC filter will
be configured with multiple multicast addresses, so the MAC exact filter
does not have enough entries to implement multicast filtering. Therefore,
the MAC exact filter is currently only used for unicast filtering. If the
number of unicast addresses exceeds 4, the MAC hash filter is used instead
(see the sketch below).
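In pseudo-code, the selection policy is roughly the following (helper names
here are illustrative only, not the driver's actual API; mac_filter_num is 4
on i.MX95):

	/* Unicast: prefer the exact-match table, else fall back to the
	 * hash filter. Multicast always uses the hash filter.
	 */
	if (uc_addr_cnt <= pf->caps.mac_filter_num)
		enetc4_pf_set_uc_exact_filter(pf, uc_list);	/* illustrative */
	else
		enetc4_pf_set_uc_hash_filter(pf, uc_list);	/* illustrative */
	enetc4_pf_set_mc_hash_filter(pf, mc_list);		/* illustrative */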

Note that both the MAC hash filter and the MAC exact filter can only be
accessed by the PF; a VF can ask the PF to set its corresponding MAC
filter through ENETC's mailbox mechanism. Currently the MAC filter is
only added for the i.MX95 ENETC PF; MAC filter support for ENETC VFs will
be added in subsequent patches.

Signed-off-by: Wei Fang 
---
v2 changes:
Fix the compile warning.
---
 drivers/net/ethernet/freescale/enetc/enetc.h  |   2 +
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |   8 +
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 411 +-
 .../net/ethernet/freescale/enetc/enetc_hw.h   |   6 +
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  11 +
 5 files changed, 437 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 9380d3e8ca01..4dba91408e3d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -316,6 +316,8 @@ struct enetc_si {
const struct enetc_si_ops *ops;
 
struct enetc_mac_filter mac_filter[MADDR_TYPE];
+   struct workqueue_struct *workqueue;
+   struct work_struct rx_mode_task;
 };
 
 #define ENETC_SI_ALIGN 32
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h 
b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
index 695cb07c74bc..826359004850 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
@@ -99,6 +99,14 @@
 #define ENETC4_PSICFGR2(a) ((a) * 0x80 + 0x2018)
 #define  PSICFGR2_NUM_MSIX GENMASK(5, 0)
 
+/* Port station interface a unicast MAC hash filter register 0/1 */
+#define ENETC4_PSIUMHFR0(a)((a) * 0x80 + 0x2050)
+#define ENETC4_PSIUMHFR1(a)((a) * 0x80 + 0x2054)
+
+/* Port station interface a multicast MAC hash filter register 0/1 */
+#define ENETC4_PSIMMHFR0(a)((a) * 0x80 + 0x2058)
+#define ENETC4_PSIMMHFR1(a)((a) * 0x80 + 0x205c)
+
 #define ENETC4_PMCAPR  0x4004
 #define  PMCAPR_HD BIT(8)
 #define  PMCAPR_FP GENMASK(10, 9)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index b957e92e3a00..7e69c9be36a8 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -11,6 +11,15 @@
 
 #define ENETC_SI_MAX_RING_NUM  8
 
+#define ENETC_MAC_FILTER_TYPE_UC   BIT(0)
+#define ENETC_MAC_FILTER_TYPE_MC   BIT(1)
+#define ENETC_MAC_FILTER_TYPE_ALL  (ENETC_MAC_FILTER_TYPE_UC | \
+ENETC_MAC_FILTER_TYPE_MC)
+
+struct enetc_mac_addr {
+   u8 addr[ETH_ALEN];
+};
+
 static void enetc4_get_port_caps(struct enetc_pf *pf)
 {
struct enetc_hw *hw = &pf->si->hw;
@@ -26,6 +35,9 @@ static void enetc4_get_port_caps(struct enetc_pf *pf)
 
val = enetc_port_rd(hw, ENETC4_PMCAPR);
pf->caps.half_duplex = (val & PMCAPR_HD) ? 1 : 0;
+
+   val = enetc_port_rd(hw, ENETC4_PSIMAFCAPR);
+   pf->caps.mac_filter_num = val & PSIMAFCAPR_NUM_MAC_AFTE;
 }
 
 static void enetc4_pf_set_si_primary_mac(struct enetc_hw *hw, int si,
@@ -71,9 +83,33 @@ static int enetc4_pf_struct_init(struct enetc_si *si)
 
enetc4_get_port_caps(pf);
 
+   INIT_HLIST_HEAD(&pf->mac_list);
+   mutex_init(&pf->mac_list_lock);
+
return 0;
 }
 
+static void enetc4_pf_destroy_mac_list(struct enetc_pf *pf)
+{
+   struct enetc_mac_list_entry *entry;
+   struct hlist_node *tmp;
+
+   scoped_guard(mutex, &pf->mac_list_lock) {
+   hlist_for_each_entry_safe(entry, tmp, &pf->mac_list, node) {
+   hlist_del(&entry->node);
+   kfree(entry);
+   }
+
+   pf->num_mfe = 0;
+   }
+}
+
+static void enetc4_pf_struct_free(struct enetc_pf *pf)
+{
+   enetc4_pf_destroy_mac_list(pf);
+   mutex_destroy(&pf->mac_list_lock);
+}
+
 static u32 enetc4_psicfgr0_

[PATCH v2 net-next 05/13] net: enetc: add debugfs interface to dump MAC filter

2025-01-13 Thread Wei Fang
ENETC's MAC filter consists of a hash MAC filter and an exact MAC filter.
The hash MAC filter is a 64-entry hash table made up of two 32-bit
registers. The exact MAC filter is implemented by configuring the MAC
address filter table through the command BD ring. The table is stored in
ENETC's internal memory and needs to be read back through the command BD
ring. To facilitate debugging, add a debugfs interface to dump the
relevant information about the MAC filter.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/Makefile |  1 +
 drivers/net/ethernet/freescale/enetc/enetc.h  |  1 +
 .../ethernet/freescale/enetc/enetc4_debugfs.c | 93 +++
 .../ethernet/freescale/enetc/enetc4_debugfs.h | 20 
 .../net/ethernet/freescale/enetc/enetc4_pf.c  |  4 +
 5 files changed, 119 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.h

diff --git a/drivers/net/ethernet/freescale/enetc/Makefile 
b/drivers/net/ethernet/freescale/enetc/Makefile
index 707a68e26971..f1c5ad45fd76 100644
--- a/drivers/net/ethernet/freescale/enetc/Makefile
+++ b/drivers/net/ethernet/freescale/enetc/Makefile
@@ -16,6 +16,7 @@ fsl-enetc-$(CONFIG_FSL_ENETC_QOS) += enetc_qos.o
 
 obj-$(CONFIG_NXP_ENETC4) += nxp-enetc4.o
 nxp-enetc4-y := enetc4_pf.o
+nxp-enetc4-$(CONFIG_DEBUG_FS) += enetc4_debugfs.o
 
 obj-$(CONFIG_FSL_ENETC_VF) += fsl-enetc-vf.o
 fsl-enetc-vf-y := enetc_vf.o
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4dba91408e3d..ca1bc85c0ac9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -318,6 +318,7 @@ struct enetc_si {
struct enetc_mac_filter mac_filter[MADDR_TYPE];
struct workqueue_struct *workqueue;
struct work_struct rx_mode_task;
+   struct dentry *debugfs_root;
 };
 
 #define ENETC_SI_ALIGN 32
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
new file mode 100644
index ..3a660c80344a
--- /dev/null
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* Copyright 2025 NXP */
+
+#include 
+#include 
+#include 
+
+#include "enetc_pf.h"
+#include "enetc4_debugfs.h"
+
+#define is_en(x)   (x) ? "Enabled" : "Disabled"
+
+static void enetc_show_si_mac_hash_filter(struct seq_file *s, int i)
+{
+   struct enetc_si *si = s->private;
+   struct enetc_hw *hw = &si->hw;
+   u32 hash_h, hash_l;
+
+   hash_l = enetc_port_rd(hw, ENETC4_PSIUMHFR0(i));
+   hash_h = enetc_port_rd(hw, ENETC4_PSIUMHFR1(i));
+   seq_printf(s, "SI %d unicast MAC hash filter: 0x%08x%08x\n",
+  i, hash_h, hash_l);
+
+   hash_l = enetc_port_rd(hw, ENETC4_PSIMMHFR0(i));
+   hash_h = enetc_port_rd(hw, ENETC4_PSIMMHFR1(i));
+   seq_printf(s, "SI %d multicast MAC hash filter: 0x%08x%08x\n",
+  i, hash_h, hash_l);
+}
+
+static int enetc_mac_filter_show(struct seq_file *s, void *data)
+{
+   struct maft_entry_data maft_data;
+   struct enetc_si *si = s->private;
+   struct enetc_hw *hw = &si->hw;
+   struct maft_keye_data *keye;
+   struct enetc_pf *pf;
+   int i, err, num_si;
+   u32 val;
+
+   pf = enetc_si_priv(si);
+   num_si = pf->caps.num_vsi + 1;
+
+   val = enetc_port_rd(hw, ENETC4_PSIPMMR);
+   for (i = 0; i < num_si; i++) {
+   seq_printf(s, "SI %d Unicast Promiscuous mode: %s\n",
+  i, is_en(PSIPMMR_SI_MAC_UP(i) & val));
+   seq_printf(s, "SI %d Multicast Promiscuous mode: %s\n",
+  i, is_en(PSIPMMR_SI_MAC_MP(i) & val));
+   }
+
+   /* MAC hash filter table */
+   for (i = 0; i < num_si; i++)
+   enetc_show_si_mac_hash_filter(s, i);
+
+   if (!pf->num_mfe)
+   return 0;
+
+   /* MAC address filter table */
+   seq_puts(s, "Show MAC address filter table\n");
+   for (i = 0; i < pf->num_mfe; i++) {
+   memset(&maft_data, 0, sizeof(maft_data));
+   err = ntmp_maft_query_entry(&si->ntmp.cbdrs, i, &maft_data);
+   if (err)
+   return err;
+
+   keye = &maft_data.keye;
+   seq_printf(s, "Entry %d, MAC: %pM, SI bitmap: 0x%04x\n", i,
+  keye->mac_addr, 
le16_to_cpu(maft_data.cfge.si_bitmap));
+   }
+
+   return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(enetc_mac_filter);
+
+void enetc_create_debugfs(struct enetc_si *si)
+{
+   struct net_device *ndev = si->ndev;
+   struct dentry *root;
+
+   root = debugfs_create_dir(netdev_name(ndev), NULL);
+   if (IS_ERR(root))
+   return;
+
+   si->debugfs_root = root;
+
+   debugfs_create_file("mac_filter", 0444, root, si, 
&enetc_mac_filter_fops);
+}
+
+vo

[PATCH v2 net-next 03/13] net: enetc: move generic MAC filtering interfaces to enetc-core

2025-01-13 Thread Wei Fang
Although only ENETC PF can access the MAC address filter table, the table
entries can specify MAC address filtering for one or more SIs based on
SI_BITMAP, which means that the table also supports MAC address filtering
for VFs.

Currently, only the ENETC v1 PF driver supports MAC address filtering. In
order to add the MAC address filtering support for the ENETC v4 PF driver
and VF driver in the future, the relevant generic interfaces are moved to
the enetc-core driver. At the same time, the struct enetc_mac_filter is
moved from enetc_pf to enetc_si, because enetc_si is a structure shared by
PF and VFs. This lays the basis for i.MX95 ENETC PF and VFs to support
MAC address filtering.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 36 ++
 drivers/net/ethernet/freescale/enetc/enetc.h  | 17 +++
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 49 +++
 .../net/ethernet/freescale/enetc/enetc_pf.h   | 14 --
 4 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 6a6fc819dfde..6d21c133e418 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -36,6 +36,42 @@ static void enetc_change_preemptible_tcs(struct 
enetc_ndev_priv *priv,
enetc_mm_commit_preemptible_tcs(priv);
 }
 
+static int enetc_mac_addr_hash_idx(const u8 *addr)
+{
+   u64 fold = __swab64(ether_addr_to_u64(addr)) >> 16;
+   u64 mask = 0;
+   int res = 0;
+   int i;
+
+   for (i = 0; i < 8; i++)
+   mask |= BIT_ULL(i * 6);
+
+   for (i = 0; i < 6; i++)
+   res |= (hweight64(fold & (mask << i)) & 0x1) << i;
+
+   return res;
+}
+
+void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
+ const unsigned char *addr)
+{
+   int idx = enetc_mac_addr_hash_idx(addr);
+
+   /* add hash table entry */
+   __set_bit(idx, filter->mac_hash_table);
+   filter->mac_addr_cnt++;
+}
+EXPORT_SYMBOL_GPL(enetc_add_mac_addr_ht_filter);
+
+void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter)
+{
+   filter->mac_addr_cnt = 0;
+
+   bitmap_zero(filter->mac_hash_table,
+   ENETC_MADDR_HASH_TBL_SZ);
+}
+EXPORT_SYMBOL_GPL(enetc_reset_mac_addr_filter);
+
 static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
 {
int num_tx_rings = priv->num_tx_rings;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4ff0957e69be..9380d3e8ca01 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -23,6 +23,18 @@
 
 #define ENETC_CBD_DATA_MEM_ALIGN 64
 
+#define ENETC_MADDR_HASH_TBL_SZ 64
+
+enum enetc_mac_addr_type {UC, MC, MADDR_TYPE};
+
+struct enetc_mac_filter {
+   union {
+   char mac_addr[ETH_ALEN];
+   DECLARE_BITMAP(mac_hash_table, ENETC_MADDR_HASH_TBL_SZ);
+   };
+   int mac_addr_cnt;
+};
+
 struct enetc_tx_swbd {
union {
struct sk_buff *skb;
@@ -302,6 +314,8 @@ struct enetc_si {
int hw_features;
const struct enetc_drvdata *drvdata;
const struct enetc_si_ops *ops;
+
+   struct enetc_mac_filter mac_filter[MADDR_TYPE];
 };
 
 #define ENETC_SI_ALIGN 32
@@ -484,6 +498,9 @@ int enetc_alloc_si_resources(struct enetc_ndev_priv *priv);
 void enetc_free_si_resources(struct enetc_ndev_priv *priv);
 int enetc_configure_si(struct enetc_ndev_priv *priv);
 int enetc_get_driver_data(struct enetc_si *si);
+void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
+ const unsigned char *addr);
+void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter);
 
 int enetc_open(struct net_device *ndev);
 int enetc_close(struct net_device *ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index a214749a4af6..cc3e52bd3096 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -72,30 +72,6 @@ static void enetc_set_isol_vlan(struct enetc_hw *hw, int si, 
u16 vlan, u8 qos)
enetc_port_wr(hw, ENETC_PSIVLANR(si), val);
 }
 
-static int enetc_mac_addr_hash_idx(const u8 *addr)
-{
-   u64 fold = __swab64(ether_addr_to_u64(addr)) >> 16;
-   u64 mask = 0;
-   int res = 0;
-   int i;
-
-   for (i = 0; i < 8; i++)
-   mask |= BIT_ULL(i * 6);
-
-   for (i = 0; i < 6; i++)
-   res |= (hweight64(fold & (mask << i)) & 0x1) << i;
-
-   return res;
-}
-
-static void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter)
-{
-   filter->mac_addr_cnt = 0;
-
-   bitmap_zero(filter->mac_hash_table,
-   ENETC_MADDR_HASH_TBL_SZ);
-}
-
 static void enetc_add_mac_addr
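
The diff is truncated here. As a usage illustration (not part of the patch), the moved helpers are typically driven from an rx-mode handler that syncs the netdev multicast list into the hash filter; the wrapper function below is hypothetical, only the helper signatures and the mac_filter[] array come from this patch:

static void example_sync_mc_filter(struct enetc_si *si, struct net_device *ndev)
{
	struct enetc_mac_filter *filter = &si->mac_filter[MC];
	struct netdev_hw_addr *ha;

	/* start from an empty 64-bit hash table */
	enetc_reset_mac_addr_filter(filter);

	/* fold every multicast address into its 6-bit hash bucket */
	netdev_for_each_mc_addr(ha, ndev)
		enetc_add_mac_addr_ht_filter(filter, ha->addr);

	/* the PF driver then programs filter->mac_hash_table into the
	 * SI's multicast MAC hash filter registers
	 */
}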

[PATCH v2 net-next 01/13] net: enetc: add initial netc-lib driver to support NTMP

2025-01-13 Thread Wei Fang
Some NETC functionality is controlled using control messages sent to the
hardware over a BD ring interface with 32B descriptors, similar to the
transmit BD ring used on ENETC. This BD ring interface is referred to as
the command BD ring. It is used to configure functionality where the
underlying resources may be shared between different entities or are too
large to configure using direct registers. Therefore, a messaging protocol called
NETC Table Management Protocol (NTMP) is provided for exchanging
configuration and management information between the software and the
hardware using the command BD ring interface.

For i.MX95, NTMP has been upgraded to version 2.0, which is incompatible
with LS1028A, because the message formats have been changed. Therefore,
add the netc-lib driver to support NTMP 2.0 to operate various tables.
Note that, only MAC address filter table and RSS table are supported at
the moment. More tables will be supported in subsequent patches.

It is worth mentioning that the purpose of the netc-lib driver is to
provide some NTMP-based generic interfaces for ENETC and NETC Switch
drivers. Currently, it only supports the configuration of some tables.
Interfaces such as tc flower and debugfs will be added in the future.

Signed-off-by: Wei Fang 
---
v2 changes:
Change NTMP_FILL_CRD() and NTMP_FILL_CRD_EID to functions.
---
 drivers/net/ethernet/freescale/enetc/Kconfig  |  11 +
 drivers/net/ethernet/freescale/enetc/Makefile |   3 +
 drivers/net/ethernet/freescale/enetc/ntmp.c   | 468 ++
 .../ethernet/freescale/enetc/ntmp_formats.h   |  59 +++
 include/linux/fsl/ntmp.h  | 178 +++
 5 files changed, 719 insertions(+)
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp_formats.h
 create mode 100644 include/linux/fsl/ntmp.h

diff --git a/drivers/net/ethernet/freescale/enetc/Kconfig 
b/drivers/net/ethernet/freescale/enetc/Kconfig
index 6c2779047dcd..94db8e8d0eb3 100644
--- a/drivers/net/ethernet/freescale/enetc/Kconfig
+++ b/drivers/net/ethernet/freescale/enetc/Kconfig
@@ -15,6 +15,16 @@ config NXP_ENETC_PF_COMMON
 
  If compiled as module (M), the module name is nxp-enetc-pf-common.
 
+config NXP_NETC_LIB
+   tristate "NETC Library"
+   help
+ This module provides common functionalities for both ENETC and NETC
+ Switch, such as NETC Table Management Protocol (NTMP) 2.0, common tc
+ flower and debugfs interfaces and so on.
+
+ If compiled as module (M), the module name is nxp-netc-lib.
+
+
 config FSL_ENETC
tristate "ENETC PF driver"
depends on PCI_MSI
@@ -40,6 +50,7 @@ config NXP_ENETC4
select FSL_ENETC_CORE
select FSL_ENETC_MDIO
select NXP_ENETC_PF_COMMON
+   select NXP_NETC_LIB
select PHYLINK
select DIMLIB
help
diff --git a/drivers/net/ethernet/freescale/enetc/Makefile 
b/drivers/net/ethernet/freescale/enetc/Makefile
index 6fd27ee4fcd1..707a68e26971 100644
--- a/drivers/net/ethernet/freescale/enetc/Makefile
+++ b/drivers/net/ethernet/freescale/enetc/Makefile
@@ -6,6 +6,9 @@ fsl-enetc-core-y := enetc.o enetc_cbdr.o enetc_ethtool.o
 obj-$(CONFIG_NXP_ENETC_PF_COMMON) += nxp-enetc-pf-common.o
 nxp-enetc-pf-common-y := enetc_pf_common.o
 
+obj-$(CONFIG_NXP_NETC_LIB) += nxp-netc-lib.o
+nxp-netc-lib-y := ntmp.o
+
 obj-$(CONFIG_FSL_ENETC) += fsl-enetc.o
 fsl-enetc-y := enetc_pf.o
 fsl-enetc-$(CONFIG_PCI_IOV) += enetc_msg.o
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp.c 
b/drivers/net/ethernet/freescale/enetc/ntmp.c
new file mode 100644
index ..ba8a2ac9d4b4
--- /dev/null
+++ b/drivers/net/ethernet/freescale/enetc/ntmp.c
@@ -0,0 +1,468 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/*
+ * NETC NTMP (NETC Table Management Protocol) 2.0 Library
+ * Copyright 2025 NXP
+ */
+
+#include 
+#include 
+#include 
+
+#include "ntmp_formats.h"
+
+#define NETC_CBDR_TIMEOUT  1000 /* us */
+#define NETC_CBDR_MR_EN BIT(31)
+
+#define NTMP_BASE_ADDR_ALIGN   128
+#define NTMP_DATA_ADDR_ALIGN   32
+
+/* Define NTMP Table ID */
+#define NTMP_MAFT_ID   1
+#define NTMP_RSST_ID   3
+
+/* Generic Update Actions for most tables */
+#define NTMP_GEN_UA_CFGEU  BIT(0)
+#define NTMP_GEN_UA_STSEU  BIT(1)
+
+#define NTMP_ENTRY_ID_SIZE 4
+#define RSST_ENTRY_NUM 64
+#define RSST_STSE_DATA_SIZE(n) ((n) * 8)
+#define RSST_CFGE_DATA_SIZE(n) (n)
+
+int netc_setup_cbdr(struct device *dev, int cbd_num,
+   struct netc_cbdr_regs *regs,
+   struct netc_cbdr *cbdr)
+{
+   int size;
+
+   size = cbd_num * sizeof(union netc_cbd) + NTMP_BASE_ADDR_ALIGN;
+
+   cbdr->addr_base = dma_alloc_coherent(dev, size, &cbdr->dma_base,
+GFP_KERNEL);
+   if (!cbdr->addr_
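
The file is cut off at this point. As a rough usage sketch (an illustration, not taken from the patch), a consumer of the library first sets up a command BD ring with netc_setup_cbdr() and can then issue NTMP requests, for example querying a MAC address filter table (MAFT) entry. The wrapper below and its argument types are assumptions; only the called helpers and structures appear elsewhere in this series:

static int example_dump_maft_entry(struct device *dev, struct ntmp_priv *ntmp,
				   int entry_id)
{
	struct maft_entry_data maft = {};
	int err;

	/* query one entry of the MAC address filter table via NTMP */
	err = ntmp_maft_query_entry(&ntmp->cbdrs, entry_id, &maft);
	if (err)
		return err;

	dev_info(dev, "MAFT %d: MAC %pM, SI bitmap 0x%04x\n", entry_id,
		 maft.keye.mac_addr, le16_to_cpu(maft.cfge.si_bitmap));

	return 0;
}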

[PATCH v2 net-next 02/13] net: enetc: add command BD ring support for i.MX95 ENETC

2025-01-13 Thread Wei Fang
The command BD ring is used to configure functionality where the
underlying resources may be shared between different entities or are
too large to configure using direct registers (such as lookup tables).

Because the command BD and table formats of i.MX95 and LS1028A are very
different, the software processing logic is also different. In order to
ensure driver compatibility, struct enetc_si_ops is introduced. This
structure defines some hooks shared by VSI and PSI. Different hardware
drivers will register different hooks. For example, setup_cbdr() is used
to initialize the command BD ring, and teardown_cbdr() is used to free
the command BD ring.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.h  | 27 --
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 47 -
 .../net/ethernet/freescale/enetc/enetc_cbdr.c | 51 ---
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 13 +++--
 .../net/ethernet/freescale/enetc/enetc_vf.c   | 13 +++--
 5 files changed, 132 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 4ad4eb5c5a74..4ff0957e69be 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -266,6 +267,19 @@ struct enetc_platform_info {
const struct enetc_drvdata *data;
 };
 
+struct enetc_si;
+
+/*
+ * This structure defines some common hooks for the ENETC PSI and VSI.
+ * In addition, since the VSI only uses struct enetc_si as its private
+ * driver data, this structure also defines some hooks specifically
+ * for the VSI. VSI-specific hooks use the "vf_*()" naming format.
+ */
+struct enetc_si_ops {
+   int (*setup_cbdr)(struct enetc_si *si);
+   void (*teardown_cbdr)(struct enetc_si *si);
+};
+
 /* PCI IEP device data */
 struct enetc_si {
struct pci_dev *pdev;
@@ -274,7 +288,10 @@ struct enetc_si {
 
struct net_device *ndev; /* back ref. */
 
-   struct enetc_cbdr cbd_ring;
+   union {
+   struct enetc_cbdr cbd_ring; /* Only ENETC 1.0 */
+   struct ntmp_priv ntmp; /* ENETC 4.1 and later */
+   };
 
int num_rx_rings; /* how many rings are available in the SI */
int num_tx_rings;
@@ -284,6 +301,7 @@ struct enetc_si {
u16 revision;
int hw_features;
const struct enetc_drvdata *drvdata;
+   const struct enetc_si_ops *ops;
 };
 
 #define ENETC_SI_ALIGN 32
@@ -490,9 +508,10 @@ void enetc_mm_link_state_update(struct enetc_ndev_priv 
*priv, bool link);
 void enetc_mm_commit_preemptible_tcs(struct enetc_ndev_priv *priv);
 
 /* control buffer descriptor ring (CBDR) */
-int enetc_setup_cbdr(struct device *dev, struct enetc_hw *hw, int bd_count,
-struct enetc_cbdr *cbdr);
-void enetc_teardown_cbdr(struct enetc_cbdr *cbdr);
+int enetc_setup_cbdr(struct enetc_si *si);
+void enetc_teardown_cbdr(struct enetc_si *si);
+int enetc4_setup_cbdr(struct enetc_si *si);
+void enetc4_teardown_cbdr(struct enetc_si *si);
 int enetc_set_mac_flt_entry(struct enetc_si *si, int index,
char *mac_addr, int si_map);
 int enetc_clear_mac_flt_entry(struct enetc_si *si, int index);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index fc41078c4f5d..b957e92e3a00 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -260,6 +260,23 @@ static void enetc4_configure_port(struct enetc_pf *pf)
enetc4_enable_trx(pf);
 }
 
+static int enetc4_init_ntmp_priv(struct enetc_si *si)
+{
+   struct ntmp_priv *ntmp = &si->ntmp;
+
+   ntmp->dev_type = NETC_DEV_ENETC;
+
+   /* For ENETC 4.1, all table versions are 0 */
+   memset(&ntmp->cbdrs.tbl, 0, sizeof(ntmp->cbdrs.tbl));
+
+   return si->ops->setup_cbdr(si);
+}
+
+static void enetc4_free_ntmp_priv(struct enetc_si *si)
+{
+   si->ops->teardown_cbdr(si);
+}
+
 static int enetc4_pf_init(struct enetc_pf *pf)
 {
struct device *dev = &pf->si->pdev->dev;
@@ -272,11 +289,22 @@ static int enetc4_pf_init(struct enetc_pf *pf)
return err;
}
 
+   err = enetc4_init_ntmp_priv(pf->si);
+   if (err) {
+   dev_err(dev, "Failed to init CBDR\n");
+   return err;
+   }
+
enetc4_configure_port(pf);
 
return 0;
 }
 
+static void enetc4_pf_free(struct enetc_pf *pf)
+{
+   enetc4_free_ntmp_priv(pf->si);
+}
+
 static const struct net_device_ops enetc4_ndev_ops = {
.ndo_open   = enetc_open,
.ndo_stop   = enetc_close,
@@ -688,6 +716,11 @@ static void enetc4_pf_netdev_destroy(struct enetc_si *si)
free_netdev(ndev);
 }
 
+static const struct enetc_si_ops enetc4_psi_ops = {
+   .setup_cbdr
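
The diff is truncated here. To make the intent concrete (an illustrative sketch, not the literal hunk), each generation fills in its own enetc_si_ops table and the common code only dispatches through si->ops, so it no longer cares which CBDR flavour is behind it; the example_init_si() wrapper is hypothetical:

/* ENETC v1 PSI hooks (the real table lives in enetc_pf.c) */
static const struct enetc_si_ops enetc_psi_ops = {
	.setup_cbdr	= enetc_setup_cbdr,
	.teardown_cbdr	= enetc_teardown_cbdr,
};

static int example_init_si(struct enetc_si *si)
{
	/* hardware-agnostic common code just calls the registered hook */
	return si->ops->setup_cbdr(si);
}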

[PATCH v2 net-next 00/13] Add more features for ENETC v4 - round 2

2025-01-13 Thread Wei Fang
This patch set adds the following features.
1. Compared with ENETC v1, the formats of tables and command BD of ENETC
v4 have changed significantly, and the two are not compatible. Therefore,
in order to support the NETC Table Management Protocol (NTMP) v2.0, we
introduced the netc-lib driver and added support for MAC address filter
table and RSS table.
2. Add MAC filter and VLAN filter support for i.MX95 ENETC PF.
3. Add RSS support for i.MX95 ENETC PF.
4. Add loopback support for i.MX95 ENETC PF.

---
v1 Link: https://lore.kernel.org/imx/20250103060610.2233908-1-wei.f...@nxp.com/
---

Wei Fang (13):
  net: enetc: add initial netc-lib driver to support NTMP
  net: enetc: add command BD ring support for i.MX95 ENETC
  net: enetc: move generic MAC filtering interfaces to enetc-core
  net: enetc: add MAC filter for i.MX95 ENETC PF
  net: enetc: add debugfs interface to dump MAC filter
  net: enetc: make enetc_set_rxfh() and enetc_get_rxfh() reusable
  net: enetc: add RSS support for i.MX95 ENETC PF
  net: enetc: enable RSS feature by default
  net: enetc: move generic VLAN filter interfaces to enetc-core
  net: enetc: move generic VLAN hash filter functions to
enetc_pf_common.c
  net: enetc: add VLAN filtering support for i.MX95 ENETC PF
  net: enetc: add loopback support for i.MX95 ENETC PF
  MAINTAINERS: add new file ntmp.h to ENETC driver

 MAINTAINERS   |   1 +
 drivers/net/ethernet/freescale/enetc/Kconfig  |  11 +
 drivers/net/ethernet/freescale/enetc/Makefile |   4 +
 drivers/net/ethernet/freescale/enetc/enetc.c  | 103 +++-
 drivers/net/ethernet/freescale/enetc/enetc.h  |  58 +-
 .../ethernet/freescale/enetc/enetc4_debugfs.c |  93 +++
 .../ethernet/freescale/enetc/enetc4_debugfs.h |  20 +
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  12 +
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 537 +-
 .../net/ethernet/freescale/enetc/enetc_cbdr.c |  65 ++-
 .../ethernet/freescale/enetc/enetc_ethtool.c  |  71 ++-
 .../net/ethernet/freescale/enetc/enetc_hw.h   |   6 +
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 140 ++---
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  32 +-
 .../freescale/enetc/enetc_pf_common.c |  46 +-
 .../freescale/enetc/enetc_pf_common.h |   2 +
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  19 +-
 drivers/net/ethernet/freescale/enetc/ntmp.c   | 468 +++
 .../ethernet/freescale/enetc/ntmp_formats.h   |  59 ++
 include/linux/fsl/ntmp.h  | 178 ++
 20 files changed, 1729 insertions(+), 196 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/enetc4_debugfs.h
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp.c
 create mode 100644 drivers/net/ethernet/freescale/enetc/ntmp_formats.h
 create mode 100644 include/linux/fsl/ntmp.h

-- 
2.34.1




Re: watchdog: BUG: soft lockup

2025-01-13 Thread wzs
Thanks for the tip!

Doug Anderson  于2025年1月9日周四 01:33写道:
>
> Hi,
>
> On Sun, Dec 22, 2024 at 10:32 PM wzs  wrote:
> >
> > Hello,
> > when fuzzing the Linux kernel,
> > I triggered many "watchdog: BUG: soft lockup" warnings.
> > I am not sure whether this is an issue with the kernel or with the
> > fuzzing program I ran.
> > (The same fuzzing program, when tested on kernel versions from
> > Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup'
> > warning on some versions, while others do not. Linux 6.12.0 is the
> > latest stable release where this error occurs.)
> >
> > The bug information I provided below is from the Linux-6.12.0 kernel.
> > If you need bug information from other versions, I would be happy to 
> > provide it.
> >
> > kernel config :https://pastebin.com/i4LPXNAN
> > console output :https://pastebin.com/uKVpvJ78
>
> IMO it's nearly always a bug if userspace can cause the kernel to soft
> lockup. I'd expect this isn't a bug in the soft lockup detector but a
> problem in whatever part of the kernel you're fuzzing. For some
> details of the soft lockup detector, see
> `Documentation/admin-guide/lockup-watchdogs.rst`.
>
> Presumably you're fuzzing the kernel in a way that causes it to enter
> a big loop while preemption is disabled, or something like that.
> Presumably the kernel should be detecting something invalid that
> userspace did and that would keep it from looping so long.
>
> I tried looking at your pastebin and probably what's going on is
> somewhere hidden in there, but unfortunately the beginning of the logs
> are a bit jumbled since it looks like the RCU warning and the soft
> lockup warning happened at about the same time and their stuff is
> jumbled. There's also a lot of tasks to go through. Honestly, it's
> probably less work just to look at whatever you were trying to fuzz to
> help you pinpoint the problem.
>
> I'll also note that you seem to be using KASAN and are running in a
> virtual machine. It's not inconceivable that's contributing to your
> problems. KASAN makes things _a lot_ slower and a VM may be getting
> its time stolen by the host.
>
> -Doug
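
To illustrate the failure pattern described above (an illustrative sketch, not taken from the report): a kernel path that busy-waits with preemption disabled for longer than the threshold (2 * watchdog_thresh, typically 20 seconds with the default watchdog_thresh of 10) will trip the soft lockup detector:

	/* problematic with preemption disabled: nothing else, including the
	 * watchdog's work, can run on this CPU while the loop spins
	 */
	while (!READ_ONCE(done))
		cpu_relax();

	/* tolerable in preemptible process context: yields on each pass */
	while (!READ_ONCE(done))
		cond_resched();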



[PATCH v2 net-next 08/13] net: enetc: enable RSS feature by default

2025-01-13 Thread Wei Fang
Receive side scaling (RSS) is a network driver technology that enables
the efficient distribution of network receive processing across multiple
CPUs in multiprocessor systems. Therefore, it is better to enable RSS by
default so that the CPU load can be balanced and network performance can
be improved when then network is enabled.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 35 ++-
 .../freescale/enetc/enetc_pf_common.c |  4 ++-
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  4 ++-
 3 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 233f58e57a20..e27b031c4f46 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2378,6 +2378,22 @@ static void enetc_set_lso_flags_mask(struct enetc_hw *hw)
enetc_wr(hw, ENETC4_SILSOSFMR1, 0);
 }
 
+static int enetc_set_rss(struct net_device *ndev, int en)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_hw *hw = &priv->si->hw;
+   u32 reg;
+
+   enetc_wr(hw, ENETC_SIRBGCR, priv->num_rx_rings);
+
+   reg = enetc_rd(hw, ENETC_SIMR);
+   reg &= ~ENETC_SIMR_RSSE;
+   reg |= (en) ? ENETC_SIMR_RSSE : 0;
+   enetc_wr(hw, ENETC_SIMR, reg);
+
+   return 0;
+}
+
 int enetc_configure_si(struct enetc_ndev_priv *priv)
 {
struct enetc_si *si = priv->si;
@@ -2398,6 +2414,9 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
err = enetc_setup_default_rss_table(si, priv->num_rx_rings);
if (err)
return err;
+
+   if (priv->ndev->features & NETIF_F_RXHASH)
+   enetc_set_rss(priv->ndev, true);
}
 
return 0;
@@ -3190,22 +3209,6 @@ struct net_device_stats *enetc_get_stats(struct 
net_device *ndev)
 }
 EXPORT_SYMBOL_GPL(enetc_get_stats);
 
-static int enetc_set_rss(struct net_device *ndev, int en)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_hw *hw = &priv->si->hw;
-   u32 reg;
-
-   enetc_wr(hw, ENETC_SIRBGCR, priv->num_rx_rings);
-
-   reg = enetc_rd(hw, ENETC_SIMR);
-   reg &= ~ENETC_SIMR_RSSE;
-   reg |= (en) ? ENETC_SIMR_RSSE : 0;
-   enetc_wr(hw, ENETC_SIMR, reg);
-
-   return 0;
-}
-
 static void enetc_enable_rxvlan(struct net_device *ndev, bool en)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index c346e0e3ad37..a737a7f8c79e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -128,8 +128,10 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
if (si->hw_features & ENETC_SI_F_LSO)
priv->active_offloads |= ENETC_F_LSO;
 
-   if (si->num_rss)
+   if (si->num_rss) {
ndev->hw_features |= NETIF_F_RXHASH;
+   ndev->features |= NETIF_F_RXHASH;
+   }
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
if (!is_enetc_rev1(si)) {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index 072e5b40a199..3372a9a779a6 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -155,8 +155,10 @@ static void enetc_vf_netdev_setup(struct enetc_si *si, 
struct net_device *ndev,
ndev->vlan_features = NETIF_F_SG | NETIF_F_HW_CSUM |
  NETIF_F_TSO | NETIF_F_TSO6;
 
-   if (si->num_rss)
+   if (si->num_rss) {
ndev->hw_features |= NETIF_F_RXHASH;
+   ndev->features |= NETIF_F_RXHASH;
+   }
 
/* pick up primary MAC address from SI */
enetc_load_primary_mac_addr(&si->hw, ndev);
-- 
2.34.1




[PATCH v2 net-next 10/13] net: enetc: move generic VLAN hash filter functions to enetc_pf_common.c

2025-01-13 Thread Wei Fang
The VLAN hash filter of ENETC v1 and v4 is basically the same; the
only difference is the offset of the VLAN hash filter registers. So, the
.set_si_vlan_hash_filter() hook is added to struct enetc_pf_ops to set
the registers of the corresponding platform. In addition, the common VLAN
hash filter functions enetc_vlan_rx_add_vid() and enetc_vlan_rx_del_vid()
are moved to enetc_pf_common.c.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 34 ++-
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  1 +
 .../freescale/enetc/enetc_pf_common.c | 34 +++
 .../freescale/enetc/enetc_pf_common.h |  2 ++
 4 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index c0aaf6349b0b..d9c1ebd180db 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -215,43 +215,12 @@ static void enetc_pf_set_rx_mode(struct net_device *ndev)
enetc_port_wr(hw, ENETC_PSIPMR, psipmr);
 }
 
-static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, int si_idx,
-unsigned long hash)
+static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, int si_idx, u64 hash)
 {
enetc_port_wr(hw, ENETC_PSIVHFR0(si_idx), lower_32_bits(hash));
enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), upper_32_bits(hash));
 }
 
-static int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_si *si = priv->si;
-   struct enetc_hw *hw = &si->hw;
-   int idx;
-
-   __set_bit(vid, si->active_vlans);
-
-   idx = enetc_vid_hash_idx(vid);
-   if (!__test_and_set_bit(idx, si->vlan_ht_filter))
-   enetc_set_vlan_ht_filter(hw, 0, *si->vlan_ht_filter);
-
-   return 0;
-}
-
-static int enetc_vlan_rx_del_vid(struct net_device *ndev, __be16 prot, u16 vid)
-{
-   struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_si *si = priv->si;
-   struct enetc_hw *hw = &si->hw;
-
-   if (__test_and_clear_bit(vid, si->active_vlans)) {
-   enetc_refresh_vlan_ht_filter(si);
-   enetc_set_vlan_ht_filter(hw, 0, *si->vlan_ht_filter);
-   }
-
-   return 0;
-}
-
 static void enetc_set_loopback(struct net_device *ndev, bool en)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
@@ -971,6 +940,7 @@ static const struct enetc_pf_ops enetc_pf_ops = {
.enable_psfp = enetc_psfp_enable,
.set_rss_key = enetc_set_rss_key,
.get_rss_key = enetc_get_rss_key,
+   .set_si_vlan_hash_filter = enetc_set_vlan_ht_filter,
 };
 
 static int enetc_pf_probe(struct pci_dev *pdev,
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.h 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
index d56b381b9da9..7a0fa5fba8bf 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.h
@@ -39,6 +39,7 @@ struct enetc_pf_ops {
int (*enable_psfp)(struct enetc_ndev_priv *priv);
void (*set_rss_key)(struct enetc_hw *hw, const u8 *key);
void (*get_rss_key)(struct enetc_hw *hw, u8 *key);
+   void (*set_si_vlan_hash_filter)(struct enetc_hw *hw, int si, u64 hash);
 };
 
 struct enetc_pf {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index a737a7f8c79e..9f812c1af7a3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -343,5 +343,39 @@ void enetc_phylink_destroy(struct enetc_ndev_priv *priv)
 }
 EXPORT_SYMBOL_GPL(enetc_phylink_destroy);
 
+int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
+   int idx;
+
+   __set_bit(vid, si->active_vlans);
+
+   idx = enetc_vid_hash_idx(vid);
+   if (!__test_and_set_bit(idx, si->vlan_ht_filter))
+   pf->ops->set_si_vlan_hash_filter(hw, 0, *si->vlan_ht_filter);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(enetc_vlan_rx_add_vid);
+
+int enetc_vlan_rx_del_vid(struct net_device *ndev, __be16 prot, u16 vid)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
+
+   if (__test_and_clear_bit(vid, si->active_vlans)) {
+   enetc_refresh_vlan_ht_filter(si);
+   pf->ops->set_si_vlan_hash_filter(hw, 0, *si->vlan_ht_filter);
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(enetc_vlan_rx_del_vid);
+
 MODULE_DESCRIPTION("NXP ENETC PF common functi

[PATCH v2 net-next 11/13] net: enetc: add VLAN filtering support for i.MX95 ENETC PF

2025-01-13 Thread Wei Fang
Add VLAN hash filter support for i.MX95 ENETC PF. If VLAN filtering is
disabled, then VLAN promiscuous mode will be enabled, which means that
the PF qualifies for reception of all VLAN tags.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  4 
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 20 +++
 .../freescale/enetc/enetc_pf_common.c |  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h 
b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
index 826359004850..aa25b445d301 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
@@ -107,6 +107,10 @@
 #define ENETC4_PSIMMHFR0(a)((a) * 0x80 + 0x2058)
 #define ENETC4_PSIMMHFR1(a)((a) * 0x80 + 0x205c)
 
+/* Port station interface a VLAN hash filter register 0/1 */
+#define ENETC4_PSIVHFR0(a) ((a) * 0x80 + 0x2060)
+#define ENETC4_PSIVHFR1(a) ((a) * 0x80 + 0x2064)
+
 #define ENETC4_PMCAPR  0x4004
 #define  PMCAPR_HD BIT(8)
 #define  PMCAPR_FP GENMASK(10, 9)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index adb5819c091f..65e6e3742ada 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -85,11 +85,19 @@ static void enetc4_get_rss_key(struct enetc_hw *hw, u8 *key)
((u32 *)key)[i] = enetc_port_rd(hw, ENETC4_PRSSKR(i));
 }
 
+static void enetc4_pf_set_si_vlan_hash_filter(struct enetc_hw *hw,
+ int si, u64 hash)
+{
+   enetc_port_wr(hw, ENETC4_PSIVHFR0(si), lower_32_bits(hash));
+   enetc_port_wr(hw, ENETC4_PSIVHFR1(si), upper_32_bits(hash));
+}
+
 static const struct enetc_pf_ops enetc4_pf_ops = {
.set_si_primary_mac = enetc4_pf_set_si_primary_mac,
.get_si_primary_mac = enetc4_pf_get_si_primary_mac,
.set_rss_key = enetc4_set_rss_key,
.get_rss_key = enetc4_get_rss_key,
+   .set_si_vlan_hash_filter = enetc4_pf_set_si_vlan_hash_filter,
 };
 
 static int enetc4_pf_struct_init(struct enetc_si *si)
@@ -704,6 +712,16 @@ static void enetc4_pf_set_rx_mode(struct net_device *ndev)
 static int enetc4_pf_set_features(struct net_device *ndev,
  netdev_features_t features)
 {
+   netdev_features_t changed = ndev->features ^ features;
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_hw *hw = &priv->si->hw;
+
+   if (changed & NETIF_F_HW_VLAN_CTAG_FILTER) {
+   bool promisc_en = !(features & NETIF_F_HW_VLAN_CTAG_FILTER);
+
+   enetc4_pf_set_si_vlan_promisc(hw, 0, promisc_en);
+   }
+
enetc_set_features(ndev, features);
 
return 0;
@@ -717,6 +735,8 @@ static const struct net_device_ops enetc4_ndev_ops = {
.ndo_set_mac_address= enetc_pf_set_mac_addr,
.ndo_set_rx_mode= enetc4_pf_set_rx_mode,
.ndo_set_features   = enetc4_pf_set_features,
+   .ndo_vlan_rx_add_vid= enetc_vlan_rx_add_vid,
+   .ndo_vlan_rx_kill_vid   = enetc_vlan_rx_del_vid,
 };
 
 static struct phylink_pcs *
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 9f812c1af7a3..3f7ccc482301 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -135,7 +135,7 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
if (!is_enetc_rev1(si)) {
-   ndev->hw_features &= ~(NETIF_F_HW_VLAN_CTAG_FILTER | 
NETIF_F_LOOPBACK);
+   ndev->hw_features &= ~NETIF_F_LOOPBACK;
goto end;
}
 
-- 
2.34.1




[PATCH v2 net-next 12/13] net: enetc: add loopback support for i.MX95 ENETC PF

2025-01-13 Thread Wei Fang
Add internal loopback support for the i.MX95 ENETC PF. The default loopback
mode is MAC-level loopback, where the MAC Tx data is looped back onto the Rx.
The MAC interface runs at a fixed 1:8 ratio of NETC clock in MAC-level
loopback mode, with no dependency on Tx clock.

Signed-off-by: Wei Fang 
---
 .../net/ethernet/freescale/enetc/enetc4_pf.c   | 18 ++
 .../ethernet/freescale/enetc/enetc_pf_common.c |  4 +---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index 65e6e3742ada..948d2f796bfb 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -709,6 +709,21 @@ static void enetc4_pf_set_rx_mode(struct net_device *ndev)
queue_work(si->workqueue, &si->rx_mode_task);
 }
 
+static void enetc4_pf_set_loopback(struct net_device *ndev, bool en)
+{
+   struct enetc_ndev_priv *priv = netdev_priv(ndev);
+   struct enetc_si *si = priv->si;
+   u32 val;
+
+   val = enetc_port_mac_rd(si, ENETC4_PM_CMD_CFG(0));
+   val = u32_replace_bits(val, en ? 1 : 0, PM_CMD_CFG_LOOP_EN);
+   /* Default to select MAC level loopback mode if loopback is enabled. */
+   val = u32_replace_bits(val, en ? LPBCK_MODE_MAC_LEVEL : 0,
+  PM_CMD_CFG_LPBK_MODE);
+
+   enetc_port_mac_wr(si, ENETC4_PM_CMD_CFG(0), val);
+}
+
 static int enetc4_pf_set_features(struct net_device *ndev,
  netdev_features_t features)
 {
@@ -722,6 +737,9 @@ static int enetc4_pf_set_features(struct net_device *ndev,
enetc4_pf_set_si_vlan_promisc(hw, 0, promisc_en);
}
 
+   if (changed & NETIF_F_LOOPBACK)
+   enetc4_pf_set_loopback(ndev, !!(features & NETIF_F_LOOPBACK));
+
enetc_set_features(ndev, features);
 
return 0;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 3f7ccc482301..0a2b8769a175 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -134,10 +134,8 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct 
net_device *ndev,
}
 
/* TODO: currently, i.MX95 ENETC driver does not support advanced 
features */
-   if (!is_enetc_rev1(si)) {
-   ndev->hw_features &= ~NETIF_F_LOOPBACK;
+   if (!is_enetc_rev1(si))
goto end;
-   }
 
ndev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
 NETDEV_XDP_ACT_NDO_XMIT | NETDEV_XDP_ACT_RX_SG |
-- 
2.34.1




[PATCH v2 net-next 07/13] net: enetc: add RSS support for i.MX95 ENETC PF

2025-01-13 Thread Wei Fang
Add Receive side scaling (RSS) support for i.MX95 ENETC PF to improve the
network performance and balance the CPU load. In addition, since both
ENETC v1 and ENETC v4 only support the Toeplitz algorithm, a check for
hfunc was added.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  |  7 +---
 drivers/net/ethernet/freescale/enetc/enetc.h  |  4 ++
 .../net/ethernet/freescale/enetc/enetc4_pf.c  | 37 +++
 .../net/ethernet/freescale/enetc/enetc_cbdr.c | 14 +++
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 33 -
 .../net/ethernet/freescale/enetc/enetc_pf.c   |  2 +
 .../freescale/enetc/enetc_pf_common.c |  6 +--
 .../net/ethernet/freescale/enetc/enetc_vf.c   |  2 +
 8 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index 6d21c133e418..233f58e57a20 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2363,7 +2363,7 @@ static int enetc_setup_default_rss_table(struct enetc_si 
*si, int num_groups)
for (i = 0; i < si->num_rss; i++)
rss_table[i] = i % num_groups;
 
-   enetc_set_rss_table(si, rss_table, si->num_rss);
+   si->ops->set_rss_table(si, rss_table, si->num_rss);
 
kfree(rss_table);
 
@@ -2394,10 +2394,7 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
if (si->hw_features & ENETC_SI_F_LSO)
enetc_set_lso_flags_mask(hw);
 
-   /* TODO: RSS support for i.MX95 will be supported later, and the
-* is_enetc_rev1() condition will be removed
-*/
-   if (si->num_rss && is_enetc_rev1(si)) {
+   if (si->num_rss) {
err = enetc_setup_default_rss_table(si, priv->num_rx_rings);
if (err)
return err;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index fb53fb961364..2b0d27ed924d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -290,6 +290,8 @@ struct enetc_si;
 struct enetc_si_ops {
int (*setup_cbdr)(struct enetc_si *si);
void (*teardown_cbdr)(struct enetc_si *si);
+   int (*get_rss_table)(struct enetc_si *si, u32 *table, int count);
+   int (*set_rss_table)(struct enetc_si *si, const u32 *table, int count);
 };
 
 /* PCI IEP device data */
@@ -540,6 +542,8 @@ int enetc_set_fs_entry(struct enetc_si *si, struct 
enetc_cmd_rfse *rfse,
 int enetc_get_rss_table(struct enetc_si *si, u32 *table, int count);
 int enetc_set_rss_table(struct enetc_si *si, const u32 *table, int count);
 int enetc_send_cmd(struct enetc_si *si, struct enetc_cbd *cbd);
+int enetc4_get_rss_table(struct enetc_si *si, u32 *table, int count);
+int enetc4_set_rss_table(struct enetc_si *si, const u32 *table, int count);
 
 static inline void *enetc_cbd_alloc_data_mem(struct enetc_si *si,
 struct enetc_cbd *cbd,
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
index 798c69e83c8f..adb5819c091f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_pf.c
@@ -69,9 +69,27 @@ static void enetc4_pf_get_si_primary_mac(struct enetc_hw 
*hw, int si,
put_unaligned_le16(lower, addr + 4);
 }
 
+static void enetc4_set_rss_key(struct enetc_hw *hw, const u8 *key)
+{
+   int i;
+
+   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
+   enetc_port_wr(hw, ENETC4_PRSSKR(i), ((u32 *)key)[i]);
+}
+
+static void enetc4_get_rss_key(struct enetc_hw *hw, u8 *key)
+{
+   int i;
+
+   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
+   ((u32 *)key)[i] = enetc_port_rd(hw, ENETC4_PRSSKR(i));
+}
+
 static const struct enetc_pf_ops enetc4_pf_ops = {
.set_si_primary_mac = enetc4_pf_set_si_primary_mac,
.get_si_primary_mac = enetc4_pf_get_si_primary_mac,
+   .set_rss_key = enetc4_set_rss_key,
+   .get_rss_key = enetc4_get_rss_key,
 };
 
 static int enetc4_pf_struct_init(struct enetc_si *si)
@@ -263,14 +281,6 @@ static void enetc4_set_trx_frame_size(struct enetc_pf *pf)
enetc4_pf_reset_tc_msdu(&si->hw);
 }
 
-static void enetc4_set_rss_key(struct enetc_hw *hw, const u8 *bytes)
-{
-   int i;
-
-   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
-   enetc_port_wr(hw, ENETC4_PRSSKR(i), ((u32 *)bytes)[i]);
-}
-
 static void enetc4_set_default_rss_key(struct enetc_pf *pf)
 {
u8 hash_key[ENETC_RSSHASH_KEY_SIZE] = {0};
@@ -691,6 +701,14 @@ static void enetc4_pf_set_rx_mode(struct net_device *ndev)
queue_work(si->workqueue, &si->rx_mode_task);
 }
 
+static int enetc4_pf_set_features(struct net_device *ndev,
+ netdev_features_t features)
+{
+   enetc_set_f
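
The remainder of this diff is not shown in the archive. For reference, the hfunc check mentioned in the commit message typically takes the form below inside enetc_set_rxfh(); this snippet is an illustrative sketch, not the literal hunk:

	/* only Toeplitz hashing is implemented on ENETC v1 and v4 */
	if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
	    rxfh->hfunc != ETH_RSS_HASH_TOP)
		return -EOPNOTSUPP;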

[PATCH v2 net-next 09/13] net: enetc: move generic VLAN filter interfaces to enetc-core

2025-01-13 Thread Wei Fang
For ENETC, each SI has a corresponding VLAN hash table. That is to say,
both the PF and VFs can support VLAN filtering. However, currently only the
ENETC v1 PF driver supports it. In order to make the i.MX95 ENETC (v4) PF and
VF drivers also support VLAN filtering, some related macros are moved from
enetc_pf.h to enetc.h, and the related structure variables are moved from
enetc_pf to enetc_si.

Besides, enetc_vid_hash_idx() is a generic function, so it is moved to enetc.c.
Extract enetc_refresh_vlan_ht_filter() from enetc_sync_vlan_ht_filter()
so that it can be shared by PF and VF drivers. This will make it easier
to add VLAN filter support for i.MX95 ENETC later.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 25 ++
 drivers/net/ethernet/freescale/enetc/enetc.h  |  6 +++
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 46 +--
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  4 --
 4 files changed, 42 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c 
b/drivers/net/ethernet/freescale/enetc/enetc.c
index e27b031c4f46..8b4a004f51a4 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -72,6 +72,31 @@ void enetc_reset_mac_addr_filter(struct enetc_mac_filter 
*filter)
 }
 EXPORT_SYMBOL_GPL(enetc_reset_mac_addr_filter);
 
+int enetc_vid_hash_idx(unsigned int vid)
+{
+   int res = 0;
+   int i;
+
+   for (i = 0; i < 6; i++)
+   res |= (hweight8(vid & (BIT(i) | BIT(i + 6))) & 0x1) << i;
+
+   return res;
+}
+EXPORT_SYMBOL_GPL(enetc_vid_hash_idx);
+
+void enetc_refresh_vlan_ht_filter(struct enetc_si *si)
+{
+   int i;
+
+   bitmap_zero(si->vlan_ht_filter, ENETC_VLAN_HT_SIZE);
+   for_each_set_bit(i, si->active_vlans, VLAN_N_VID) {
+   int hidx = enetc_vid_hash_idx(i);
+
+   __set_bit(hidx, si->vlan_ht_filter);
+   }
+}
+EXPORT_SYMBOL_GPL(enetc_refresh_vlan_ht_filter);
+
 static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
 {
int num_tx_rings = priv->num_tx_rings;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2b0d27ed924d..0ecec9da6148 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -24,6 +24,7 @@
 #define ENETC_CBD_DATA_MEM_ALIGN 64
 
 #define ENETC_MADDR_HASH_TBL_SZ 64
+#define ENETC_VLAN_HT_SIZE 64
 
 enum enetc_mac_addr_type {UC, MC, MADDR_TYPE};
 
@@ -321,6 +322,9 @@ struct enetc_si {
struct workqueue_struct *workqueue;
struct work_struct rx_mode_task;
struct dentry *debugfs_root;
+
+   DECLARE_BITMAP(vlan_ht_filter, ENETC_VLAN_HT_SIZE);
+   DECLARE_BITMAP(active_vlans, VLAN_N_VID);
 };
 
 #define ENETC_SI_ALIGN 32
@@ -506,6 +510,8 @@ int enetc_get_driver_data(struct enetc_si *si);
 void enetc_add_mac_addr_ht_filter(struct enetc_mac_filter *filter,
  const unsigned char *addr);
 void enetc_reset_mac_addr_filter(struct enetc_mac_filter *filter);
+int enetc_vid_hash_idx(unsigned int vid);
+void enetc_refresh_vlan_ht_filter(struct enetc_si *si);
 
 int enetc_open(struct net_device *ndev);
 int enetc_close(struct net_device *ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 59039d087695..c0aaf6349b0b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -222,45 +222,18 @@ static void enetc_set_vlan_ht_filter(struct enetc_hw *hw, 
int si_idx,
enetc_port_wr(hw, ENETC_PSIVHFR1(si_idx), upper_32_bits(hash));
 }
 
-static int enetc_vid_hash_idx(unsigned int vid)
-{
-   int res = 0;
-   int i;
-
-   for (i = 0; i < 6; i++)
-   res |= (hweight8(vid & (BIT(i) | BIT(i + 6))) & 0x1) << i;
-
-   return res;
-}
-
-static void enetc_sync_vlan_ht_filter(struct enetc_pf *pf, bool rehash)
-{
-   int i;
-
-   if (rehash) {
-   bitmap_zero(pf->vlan_ht_filter, ENETC_VLAN_HT_SIZE);
-
-   for_each_set_bit(i, pf->active_vlans, VLAN_N_VID) {
-   int hidx = enetc_vid_hash_idx(i);
-
-   __set_bit(hidx, pf->vlan_ht_filter);
-   }
-   }
-
-   enetc_set_vlan_ht_filter(&pf->si->hw, 0, *pf->vlan_ht_filter);
-}
-
 static int enetc_vlan_rx_add_vid(struct net_device *ndev, __be16 prot, u16 vid)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_pf *pf = enetc_si_priv(priv->si);
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
int idx;
 
-   __set_bit(vid, pf->active_vlans);
+   __set_bit(vid, si->active_vlans);
 
idx = enetc_vid_hash_idx(vid);
-   if (!__test_and_set_bit(idx, pf->vlan_ht_filter))
-   enetc_sync_vlan_ht_filter(pf, false);
+   if (!__test_and

[PATCH v2 net-next 06/13] net: enetc: make enetc_set_rxfh() and enetc_get_rxfh() reusable

2025-01-13 Thread Wei Fang
Both ENETC v1 and v4 support Receive Side Scaling (RSS), but the offset
of the RSS key registers is different. In order to make enetc_get_rxfh()
and enetc_set_rxfh() reusable by ENETC v4, the .set_rss_key() and
.get_rss_key() interfaces are added to enetc_pf_ops.

Signed-off-by: Wei Fang 
---
 drivers/net/ethernet/freescale/enetc/enetc.h  |  1 -
 .../ethernet/freescale/enetc/enetc_ethtool.c  | 42 +--
 .../net/ethernet/freescale/enetc/enetc_pf.c   | 18 
 .../net/ethernet/freescale/enetc/enetc_pf.h   |  2 +
 4 files changed, 39 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h 
b/drivers/net/ethernet/freescale/enetc/enetc.h
index ca1bc85c0ac9..fb53fb961364 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -537,7 +537,6 @@ int enetc_set_mac_flt_entry(struct enetc_si *si, int index,
 int enetc_clear_mac_flt_entry(struct enetc_si *si, int index);
 int enetc_set_fs_entry(struct enetc_si *si, struct enetc_cmd_rfse *rfse,
   int index);
-void enetc_set_rss_key(struct enetc_hw *hw, const u8 *bytes);
 int enetc_get_rss_table(struct enetc_si *si, u32 *table, int count);
 int enetc_set_rss_table(struct enetc_si *si, const u32 *table, int count);
 int enetc_send_cmd(struct enetc_si *si, struct enetc_cbd *cbd);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c 
b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index bf34b5bb1e35..56ba82830279 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -4,7 +4,8 @@
 #include 
 #include 
 #include 
-#include "enetc.h"
+
+#include "enetc_pf.h"
 
 static const u32 enetc_si_regs[] = {
ENETC_SIMR, ENETC_SIPMAR0, ENETC_SIPMAR1, ENETC_SICBDRMR,
@@ -681,51 +682,46 @@ static int enetc_get_rxfh(struct net_device *ndev,
  struct ethtool_rxfh_param *rxfh)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_hw *hw = &priv->si->hw;
-   int err = 0, i;
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
+   int err = 0;
 
/* return hash function */
rxfh->hfunc = ETH_RSS_HASH_TOP;
 
/* return hash key */
-   if (rxfh->key && hw->port)
-   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
-   ((u32 *)rxfh->key)[i] = enetc_port_rd(hw,
- ENETC_PRSSK(i));
+   if (rxfh->key && enetc_si_is_pf(si)) {
+   struct enetc_pf *pf = enetc_si_priv(si);
+
+   pf->ops->get_rss_key(hw, rxfh->key);
+   }
 
/* return RSS table */
if (rxfh->indir)
-   err = enetc_get_rss_table(priv->si, rxfh->indir,
- priv->si->num_rss);
+   err = enetc_get_rss_table(si, rxfh->indir, si->num_rss);
 
return err;
 }
 
-void enetc_set_rss_key(struct enetc_hw *hw, const u8 *bytes)
-{
-   int i;
-
-   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
-   enetc_port_wr(hw, ENETC_PRSSK(i), ((u32 *)bytes)[i]);
-}
-EXPORT_SYMBOL_GPL(enetc_set_rss_key);
-
 static int enetc_set_rxfh(struct net_device *ndev,
  struct ethtool_rxfh_param *rxfh,
  struct netlink_ext_ack *extack)
 {
struct enetc_ndev_priv *priv = netdev_priv(ndev);
-   struct enetc_hw *hw = &priv->si->hw;
+   struct enetc_si *si = priv->si;
+   struct enetc_hw *hw = &si->hw;
int err = 0;
 
/* set hash key, if PF */
-   if (rxfh->key && hw->port)
-   enetc_set_rss_key(hw, rxfh->key);
+   if (rxfh->key && enetc_si_is_pf(si)) {
+   struct enetc_pf *pf = enetc_si_priv(si);
+
+   pf->ops->set_rss_key(hw, rxfh->key);
+   }
 
/* set RSS table */
if (rxfh->indir)
-   err = enetc_set_rss_table(priv->si, rxfh->indir,
- priv->si->num_rss);
+   err = enetc_set_rss_table(si, rxfh->indir, si->num_rss);
 
return err;
 }
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c 
b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index cc3e52bd3096..f050cf039733 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -512,6 +512,22 @@ static void enetc_mac_enable(struct enetc_si *si, bool en)
enetc_port_mac_wr(si, ENETC_PM0_CMD_CFG, val);
 }
 
+static void enetc_set_rss_key(struct enetc_hw *hw, const u8 *key)
+{
+   int i;
+
+   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
+   enetc_port_wr(hw, ENETC_PRSSK(i), ((u32 *)key)[i]);
+}
+
+static void enetc_get_rss_key(struct enetc_hw *hw, u8 *key)
+{
+   int i;
+
+   for (i = 0; i < ENETC_RSSHASH_KEY_SIZE / 4; i++)
+   ((u32 *)key)[i] 

[PATCH v2 net-next 13/13] MAINTAINERS: add new file ntmp.h to ENETC driver

2025-01-13 Thread Wei Fang
Add the new file ntmp.h to the ENETC driver.

Signed-off-by: Wei Fang 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1579124ef426..ac28154f7eb5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9099,6 +9099,7 @@ F:
Documentation/devicetree/bindings/net/nxp,netc-blk-ctrl.yaml
 F: drivers/net/ethernet/freescale/enetc/
 F: include/linux/fsl/enetc_mdio.h
 F: include/linux/fsl/netc_global.h
+F: include/linux/fsl/ntmp.h
 
 FREESCALE eTSEC ETHERNET DRIVER (GIANFAR)
 M: Claudiu Manoil 
-- 
2.34.1




[PATCH] select: Fix unbalanced user_access_end()

2025-01-13 Thread Christophe Leroy
While working on implementing user access validation on powerpc
I got the following warnings on a pmac32_defconfig build:

  CC  fs/select.o
fs/select.o: warning: objtool: sys_pselect6+0x1bc: redundant UACCESS 
disable
fs/select.o: warning: objtool: sys_pselect6_time32+0x1bc: redundant 
UACCESS disable

On powerpc/32s, user_read_access_begin/end() are no-ops, but the
failure path has a user_access_end() instead of user_read_access_end()
which means an access end without any prior access begin.

Replace that user_access_end() by user_read_access_end().

Fixes: 7e71609f64ec ("pselect6() and friends: take handling the combined 
6th/7th args into helper")
Signed-off-by: Christophe Leroy 
---
 fs/select.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index e223d1fe9d55..7da531b1cf6b 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -786,7 +786,7 @@ static inline int get_sigset_argpack(struct sigset_argpack 
*to,
}
return 0;
 Efault:
-   user_access_end();
+   user_read_access_end();
return -EFAULT;
 }
 
@@ -1355,7 +1355,7 @@ static inline int get_compat_sigset_argpack(struct 
compat_sigset_argpack *to,
}
return 0;
 Efault:
-   user_access_end();
+   user_read_access_end();
return -EFAULT;
 }
 
-- 
2.47.0
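
For background, the pairing rule the fix restores looks like this inside a helper such as get_sigset_argpack(); the snippet is an illustrative sketch (the "from" pointer and "field" name are placeholders), not taken from the patch:

	if (!user_read_access_begin(from, sizeof(*from)))
		return -EFAULT;
	unsafe_get_user(val, &from->field, Efault);
	user_read_access_end();
	return 0;
Efault:
	user_read_access_end();	/* must match the read variant, not user_access_end() */
	return -EFAULT;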




Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Binbin Wu





On 1/13/2025 10:09 AM, Chao Gao wrote:

On Fri, Jan 10, 2025 at 05:24:48PM -0800, Sean Christopherson wrote:

Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
that KVM_RUN needs to be re-executed prior to save/restore in order to
complete the instruction/operation that triggered the userspace exit.

KVM's current approach of adding notes in the Documentation is beyond
brittle, e.g. there is at least one known case where a KVM developer added
a new userspace exit type, and then that same developer forgot to handle
completion when adding userspace support.

This answers one question I had:
https://lore.kernel.org/kvm/z1bmucedoz87w...@intel.com/

In current QEMU code, it always returns back to KVM via KVM_RUN after it
successfully handled a KVM exit reason, no matter what the exit reason is.
The complete_userspace_io() callback will be called if it has been set up.
So if a new kvm exit reason is added in QEMU, it seems QEMU doesn't need
special handling for the complete_userspace_io() callback to be called.

However, QEMU is not the only userspace VMM that supports KVM, it makes
sense to make the solution generic and clear for different userspace VMMs.

Regarding the support of MapGPA for TDX when live migration is considered,
since a big range will be split into 2MB chunks, in order for the status to
be right after TD live migration, it needs to set the return code to retry
with the next_gpa in the complete_userspace_io() callback if vcpu->wants_to_run
is false or vcpu->run->immediate_exit__unsafe is set; otherwise, the TDX guest
will see the return code as successful and think the whole range has been
converted successfully.

@@ -1093,7 +1093,8 @@ static int tdx_complete_vmcall_map_gpa(struct kvm_vcpu 
*vcpu)
 * immediately after STI or MOV/POP SS.
 */
    if (pi_has_pending_interrupt(vcpu) ||
-   kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending) {
+   kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending ||
+   !vcpu->wants_to_run) {
    tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_RETRY);
    tdx->vp_enter_args.r11 = tdx->map_gpa_next;
    return 1;

Of course, it can be addressed later when TD live migration is supported.




So, it is the VMM's (i.e., QEMU's) responsibility to re-execute KVM_RUN in this
case.

Btw, can this flag be used to address the issue [*] with steal time accounting?
We can set the new flag for each vCPU in the PM notifier and we need to change
the re-execution to handle steal time accounting (not just IO completion).

[*]: https://lore.kernel.org/kvm/z36xjl1oaahvk...@google.com/

one nit below,


--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -104,9 +104,10 @@ struct kvm_ioapic_state {
#define KVM_IRQCHIP_IOAPIC   2
#define KVM_NR_IRQCHIPS  3

-#define KVM_RUN_X86_SMM (1 << 0)
-#define KVM_RUN_X86_BUS_LOCK (1 << 1)
-#define KVM_RUN_X86_GUEST_MODE   (1 << 2)
+#define KVM_RUN_X86_SMM(1 << 0)
+#define KVM_RUN_X86_BUS_LOCK   (1 << 1)
+#define KVM_RUN_X86_GUEST_MODE (1 << 2)
+#define KVM_RUN_X86_NEEDS_COMPLETION   (1 << 2)

This X86_NEEDS_COMPLETION should be dropped. It is never used.
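
To make the expected userspace flow concrete, a VMM could do something like the sketch below before saving vCPU state; this is an illustration only, KVM_RUN_NEEDS_COMPLETION is proposed by this series rather than an existing UAPI flag, and error handling is omitted:

	if (run->flags & KVM_RUN_NEEDS_COMPLETION) {
		/* re-enter KVM_RUN once so the pending exit is completed;
		 * immediate_exit makes KVM return right after that
		 */
		run->immediate_exit = 1;
		ioctl(vcpu_fd, KVM_RUN, NULL);
		run->immediate_exit = 0;
	}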






[PATCH 0/2] ASoC: fsl: Support MQS on i.MX943

2025-01-13 Thread Shengjiu Wang
There are two MQS instances on the i.MX943 platform.
The definitions of the bit positions in the control register are
different. In order to support these MQS modules, define
two compatible strings to distinguish them.

Shengjiu Wang (2):
  ASoC: fsl_mqs: Add i.MX943 platform support
  ASoC: dt-bindings: fsl,mqs: Add compatible string for i.MX943 platform

 .../devicetree/bindings/sound/fsl,mqs.yaml|  2 ++
 sound/soc/fsl/fsl_mqs.c   | 28 +++
 2 files changed, 30 insertions(+)

-- 
2.34.1




[PATCH 2/2] ASoC: dt-bindings: fsl,mqs: Add compatible string for i.MX943 platform

2025-01-13 Thread Shengjiu Wang
There are two MQS instances on the i.MX943 platform.
The definitions of the bit positions in the control register are
different. In order to support these MQS modules, define
two compatible strings to distinguish them.

One instance is in the always-on domain and the other is in the
wakeup domain, so the compatible strings are
"fsl,imx943-aonmix-mqs" and "fsl,imx943-wakeupmix-mqs".

Signed-off-by: Shengjiu Wang 
---
 Documentation/devicetree/bindings/sound/fsl,mqs.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/sound/fsl,mqs.yaml 
b/Documentation/devicetree/bindings/sound/fsl,mqs.yaml
index 030ccc173130..8c22e8348b14 100644
--- a/Documentation/devicetree/bindings/sound/fsl,mqs.yaml
+++ b/Documentation/devicetree/bindings/sound/fsl,mqs.yaml
@@ -23,6 +23,8 @@ properties:
   - fsl,imx8qm-mqs
   - fsl,imx8qxp-mqs
   - fsl,imx93-mqs
+  - fsl,imx943-aonmix-mqs
+  - fsl,imx943-wakeupmix-mqs
   - fsl,imx95-aonmix-mqs
   - fsl,imx95-netcmix-mqs
 
-- 
2.34.1




[PATCH 1/2] ASoC: fsl_mqs: Add i.MX943 platform support

2025-01-13 Thread Shengjiu Wang
There are two MQS instances on the i.MX943 platform.
The definitions of the bit positions in the control register are
different. In order to support these MQS modules, define
two compatible strings to distinguish them.

On i.MX943, one instance is in the Always-on mix and the other is in the
Wakeup mix.

Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/fsl_mqs.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/sound/soc/fsl/fsl_mqs.c b/sound/soc/fsl/fsl_mqs.c
index 0513e9e8402e..e34e5ea98de5 100644
--- a/sound/soc/fsl/fsl_mqs.c
+++ b/sound/soc/fsl/fsl_mqs.c
@@ -410,12 +410,40 @@ static const struct fsl_mqs_soc_data 
fsl_mqs_imx95_netc_data = {
.div_shift = 9,
 };
 
+static const struct fsl_mqs_soc_data fsl_mqs_imx943_aon_data = {
+   .type = TYPE_REG_SM,
+   .ctrl_off = 0x88,
+   .en_mask  = BIT(1),
+   .en_shift = 1,
+   .rst_mask = BIT(2),
+   .rst_shift = 2,
+   .osr_mask = BIT(3),
+   .osr_shift = 3,
+   .div_mask = GENMASK(15, 8),
+   .div_shift = 8,
+};
+
+static const struct fsl_mqs_soc_data fsl_mqs_imx943_wakeup_data = {
+   .type = TYPE_REG_GPR,
+   .ctrl_off = 0x10,
+   .en_mask  = BIT(1),
+   .en_shift = 1,
+   .rst_mask = BIT(2),
+   .rst_shift = 2,
+   .osr_mask = BIT(3),
+   .osr_shift = 3,
+   .div_mask = GENMASK(15, 8),
+   .div_shift = 8,
+};
+
 static const struct of_device_id fsl_mqs_dt_ids[] = {
{ .compatible = "fsl,imx8qm-mqs", .data = &fsl_mqs_imx8qm_data },
{ .compatible = "fsl,imx6sx-mqs", .data = &fsl_mqs_imx6sx_data },
{ .compatible = "fsl,imx93-mqs", .data = &fsl_mqs_imx93_data },
{ .compatible = "fsl,imx95-aonmix-mqs", .data = &fsl_mqs_imx95_aon_data 
},
{ .compatible = "fsl,imx95-netcmix-mqs", .data = 
&fsl_mqs_imx95_netc_data },
+   { .compatible = "fsl,imx943-aonmix-mqs", .data = 
&fsl_mqs_imx943_aon_data },
+   { .compatible = "fsl,imx943-wakeupmix-mqs", .data = 
&fsl_mqs_imx943_wakeup_data },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_mqs_dt_ids);
-- 
2.34.1




[PATCH V2 4/5] selftests/powerpc/pmu: Add interface test for extended reg support

2025-01-13 Thread Athira Rajeev
From: Kajol Jain 

The testcase uses the check_extended_regs_support() and
perf_get_platform_reg_mask() functions to check whether the
platform has extended reg support. This helps to
check whether the sampling PMU selftests are enabled for
a given platform.

Signed-off-by: Kajol Jain 
Signed-off-by: Athira Rajeev 
---
Changelog:
 v1 -> v2
 No code changes. Rebased to latest upstream

 .../powerpc/pmu/sampling_tests/Makefile   |  3 +-
 .../sampling_tests/check_extended_reg_test.c  | 35 +++
 .../powerpc/pmu/sampling_tests/misc.c |  2 +-
 .../powerpc/pmu/sampling_tests/misc.h |  2 ++
 4 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 
tools/testing/selftests/powerpc/pmu/sampling_tests/check_extended_reg_test.c

diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/Makefile 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/Makefile
index 9f79bec5fce7..0c4ed299c3b8 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/Makefile
@@ -5,7 +5,8 @@ TEST_GEN_PROGS := mmcr0_exceptionbits_test mmcr0_cc56run_test 
mmcr0_pmccext_test
   mmcr3_src_test mmcra_thresh_marked_sample_test 
mmcra_thresh_cmp_test \
   mmcra_bhrb_ind_call_test mmcra_bhrb_any_test 
mmcra_bhrb_cond_test \
   mmcra_bhrb_disable_test bhrb_no_crash_wo_pmu_test 
intr_regs_no_crash_wo_pmu_test \
-  bhrb_filter_map_test mmcr1_sel_unit_cache_test 
mmcra_bhrb_disable_no_branch_test
+  bhrb_filter_map_test mmcr1_sel_unit_cache_test 
mmcra_bhrb_disable_no_branch_test \
+  check_extended_reg_test
 
 top_srcdir = ../../../../../..
 include ../../../lib.mk
diff --git 
a/tools/testing/selftests/powerpc/pmu/sampling_tests/check_extended_reg_test.c 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/check_extended_reg_test.c
new file mode 100644
index ..865bc69f920c
--- /dev/null
+++ 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/check_extended_reg_test.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2024, Kajol Jain, IBM Corp.
+ */
+
+#include 
+#include 
+
+#include "../event.h"
+#include "misc.h"
+#include "utils.h"
+
+/*
+ * A perf sampling test to check extended
+ * reg support.
+ */
+static int check_extended_reg_test(void)
+{
+   /* Check for platform support for the test */
+   SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
+
+/* Skip for Generic compat PMU */
+   SKIP_IF(check_for_generic_compat_pmu());
+
+   /* Check if platform supports extended regs */
+   platform_extended_mask = perf_get_platform_reg_mask();
+   FAIL_IF(check_extended_regs_support());
+
+   return 0;
+}
+
+int main(void)
+{
+   return test_harness(check_extended_reg_test, "check_extended_reg_test");
+}
diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
index c52d8bc2a5dc..1ba675802ee9 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
@@ -92,7 +92,7 @@ static void init_ev_encodes(void)
 }
 
 /* Return the extended regs mask value */
-static u64 perf_get_platform_reg_mask(void)
+u64 perf_get_platform_reg_mask(void)
 {
if (have_hwcap2(PPC_FEATURE2_ARCH_3_1))
return PERF_POWER10_MASK;
diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
index 09c5abe237af..357e9f0fc0f7 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
@@ -39,6 +39,8 @@ extern int pvr;
 extern u64 platform_extended_mask;
 extern int check_pvr_for_sampling_tests(void);
 extern int platform_check_for_tests(void);
+extern int check_extended_regs_support(void);
+extern u64 perf_get_platform_reg_mask(void);
 
 /*
  * Event code field extraction macro.
-- 
2.43.5




[PATCH V2 1/5] tools/testing/selftests/powerpc: Enable pmu selftests for power11

2025-01-13 Thread Athira Rajeev
Add a check for the power11 PVR in the selftest utility
functions. The selftests use the PVR value to check for platform
support in order to run the tests. The PVR is also used to
send the extended mask value to capture sampling registers.

Update some of the utility functions to use hwcap2 in order
to return platform-specific bits from sampling registers.

Signed-off-by: Athira Rajeev 
---
Changelog:
 v1 -> v2
 No code changes. Rebased to latest upstream

 .../selftests/powerpc/pmu/sampling_tests/misc.c   | 11 ++-
 .../selftests/powerpc/pmu/sampling_tests/misc.h   | 10 ++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
index eac6420abdf1..c52d8bc2a5dc 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.c
@@ -59,6 +59,7 @@ static void init_ev_encodes(void)
ev_shift_thd_stop = 32;
 
switch (pvr) {
+   case POWER11:
case POWER10:
ev_mask_thd_cmp = 0x3;
ev_shift_thd_cmp = 0;
@@ -129,8 +130,14 @@ int platform_check_for_tests(void)
 * Check for supported platforms
 * for sampling test
 */
-   if ((pvr != POWER10) && (pvr != POWER9))
+   switch (pvr) {
+   case POWER11:
+   case POWER10:
+   case POWER9:
+   break;
+   default:
goto out;
+   }
 
/*
 * Check PMU driver registered by looking for
@@ -499,6 +506,8 @@ static bool auxv_generic_compat_pmu(void)
base_pvr = POWER9;
else if (!strcmp(auxv_base_platform(), "power10"))
base_pvr = POWER10;
+   else if (!strcmp(auxv_base_platform(), "power11"))
+   base_pvr = POWER11;
 
return (!base_pvr);
 }
diff --git a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h 
b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
index 64e25cce1435..09c5abe237af 100644
--- a/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
+++ b/tools/testing/selftests/powerpc/pmu/sampling_tests/misc.h
@@ -8,10 +8,12 @@
 #include 
 #include "../event.h"
 
+#define POWER11 0x82
 #define POWER10 0x80
 #define POWER9  0x4e
 #define PERF_POWER9_MASK0x7f8
 #define PERF_POWER10_MASK   0x7ff
+#define PERF_POWER11_MASK   PERF_POWER10_MASK
 
 #define MMCR0_FC56  0x0010UL /* freeze counters 5 and 6 */
 #define MMCR0_PMCCEXT   0x0200UL /* PMCCEXT control */
@@ -165,21 +167,21 @@ static inline int get_mmcr2_fcta(u64 mmcr2, int pmc)
 
 static inline int get_mmcr2_l2l3(u64 mmcr2, int pmc)
 {
-   if (pvr == POWER10)
+   if (have_hwcap2(PPC_FEATURE2_ARCH_3_1))
return ((mmcr2 & 0xf8) >> 3);
return 0;
 }
 
 static inline int get_mmcr3_src(u64 mmcr3, int pmc)
 {
-   if (pvr != POWER10)
+   if (!have_hwcap2(PPC_FEATURE2_ARCH_3_1))
return 0;
	return ((mmcr3 >> ((49 - (15 * ((pmc) - 1))))) & 0x7fff);
 }
 
 static inline int get_mmcra_thd_cmp(u64 mmcra, int pmc)
 {
-   if (pvr == POWER10)
+   if (have_hwcap2(PPC_FEATURE2_ARCH_3_1))
return ((mmcra >> 45) & 0x7ff);
return ((mmcra >> 45) & 0x3ff);
 }
@@ -191,7 +193,7 @@ static inline int get_mmcra_sm(u64 mmcra, int pmc)
 
 static inline u64 get_mmcra_bhrb_disable(u64 mmcra, int pmc)
 {
-   if (pvr == POWER10)
+   if (have_hwcap2(PPC_FEATURE2_ARCH_3_1))
return mmcra & BHRB_DISABLE;
return 0;
 }
-- 
2.43.5




Re: [PATCH 0/2] ASoC: fsl: Support MQS on i.MX943

2025-01-13 Thread Daniel Baluta
On Mon, Jan 13, 2025 at 11:04 AM Shengjiu Wang  wrote:
>
> There are two MQS instances on the i.MX943 platform.
> The definitions of the bit positions in the control register are
> different. In order to support these MQS modules, define
> two compatible strings to distinguish them.
>
> Shengjiu Wang (2):
>   ASoC: fsl_mqs: Add i.MX943 platform support
>   ASoC: dt-bindings: fsl,mqs: Add compatible string for i.MX943 platform


For entire patchseries:

Reviewed-by: Daniel Baluta 



Re: [PATCH] select: Fix unbalanced user_access_end()

2025-01-13 Thread Christian Brauner
On Mon, 13 Jan 2025 09:37:24 +0100, Christophe Leroy wrote:
> While working on implementing user access validation on powerpc
> I got the following warnings on a pmac32_defconfig build:
> 
> CC  fs/select.o
>   fs/select.o: warning: objtool: sys_pselect6+0x1bc: redundant UACCESS 
> disable
>   fs/select.o: warning: objtool: sys_pselect6_time32+0x1bc: redundant 
> UACCESS disable
> 
> [...]

Applied to the vfs-6.14.misc branch of the vfs/vfs.git tree.
Patches in the vfs-6.14.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.14.misc

[1/1] select: Fix unbalanced user_access_end()
  https://git.kernel.org/vfs/vfs/c/83e724bcabc5



Re: [PATCH v5 09/17] arm64: pgtable: move pagetable_dtor() to __tlb_remove_table()

2025-01-13 Thread Will Deacon
On Wed, Jan 08, 2025 at 02:57:25PM +0800, Qi Zheng wrote:
> Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
> table pages can be freed together (regardless of whether RCU is used).
> This prevents the use-after-free problem where the ptlock is freed
> immediately but the page table pages are freed later via RCU.
> 
> Page tables shouldn't have swap cache, so use pagetable_free() instead of
> free_page_and_swap_cache() to free page table pages.
> 
> Signed-off-by: Qi Zheng 
> Suggested-by: Peter Zijlstra (Intel) 
> Reviewed-by: Kevin Brodsky 
> Cc: linux-arm-ker...@lists.infradead.org
> ---
>  arch/arm64/include/asm/tlb.h | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Acked-by: Will Deacon 

Will



Re: [PATCH v6 07/26] fs/dax: Ensure all pages are idle prior to filesystem unmount

2025-01-13 Thread Darrick J. Wong
On Mon, Jan 13, 2025 at 04:48:31PM +1100, Alistair Popple wrote:
> On Sun, Jan 12, 2025 at 06:49:40PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 13, 2025 at 11:57:18AM +1100, Alistair Popple wrote:
> > > On Fri, Jan 10, 2025 at 08:50:19AM -0800, Darrick J. Wong wrote:
> > > > On Fri, Jan 10, 2025 at 05:00:35PM +1100, Alistair Popple wrote:
> > > > > File systems call dax_break_mapping() prior to reallocating file
> > > > > system blocks to ensure the page is not undergoing any DMA or other
> > > > > accesses. Generally this is needed when a file is truncated to ensure
> > > > > that if a block is reallocated nothing is writing to it. However
> > > > > filesystems currently don't call this when an FS DAX inode is evicted.
> > > > > 
> > > > > This can cause problems when the file system is unmounted as a page
> > > > > can continue to be undergoing DMA or other remote access after
> > > > > unmount. This means if the file system is remounted any truncate or
> > > > > other operation which requires the underlying file system block to be
> > > > > freed will not wait for the remote access to complete. Therefore a
> > > > > busy block may be reallocated to a new file leading to corruption.
> > > > > 
> > > > > Signed-off-by: Alistair Popple 
> > > > > 
> > > > > ---
> > > > > 
> > > > > Changes for v5:
> > > > > 
> > > > >  - Don't wait for pages to be idle in non-DAX mappings
> > > > > ---
> > > > >  fs/dax.c| 29 +
> > > > >  fs/ext4/inode.c | 32 ++--
> > > > >  fs/xfs/xfs_inode.c  |  9 +
> > > > >  fs/xfs/xfs_inode.h  |  1 +
> > > > >  fs/xfs/xfs_super.c  | 18 ++
> > > > >  include/linux/dax.h |  2 ++
> > > > >  6 files changed, 73 insertions(+), 18 deletions(-)
> > > > > 
> > > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > > index 7008a73..4e49cc4 100644
> > > > > --- a/fs/dax.c
> > > > > +++ b/fs/dax.c
> > > > > @@ -883,6 +883,14 @@ static int wait_page_idle(struct page *page,
> > > > >   TASK_INTERRUPTIBLE, 0, 0, cb(inode));
> > > > >  }
> > > > >  
> > > > > +static void wait_page_idle_uninterruptible(struct page *page,
> > > > > + void (cb)(struct inode *),
> > > > > + struct inode *inode)
> > > > > +{
> > > > > + ___wait_var_event(page, page_ref_count(page) == 1,
> > > > > + TASK_UNINTERRUPTIBLE, 0, 0, cb(inode));
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Unmaps the inode and waits for any DMA to complete prior to 
> > > > > deleting the
> > > > >   * DAX mapping entries for the range.
> > > > > @@ -911,6 +919,27 @@ int dax_break_mapping(struct inode *inode, 
> > > > > loff_t start, loff_t end,
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(dax_break_mapping);
> > > > >  
> > > > > +void dax_break_mapping_uninterruptible(struct inode *inode,
> > > > > + void (cb)(struct inode *))
> > > > > +{
> > > > > + struct page *page;
> > > > > +
> > > > > + if (!dax_mapping(inode->i_mapping))
> > > > > + return;
> > > > > +
> > > > > + do {
> > > > > + page = dax_layout_busy_page_range(inode->i_mapping, 0,
> > > > > + LLONG_MAX);
> > > > > + if (!page)
> > > > > + break;
> > > > > +
> > > > > + wait_page_idle_uninterruptible(page, cb, inode);
> > > > > + } while (true);
> > > > > +
> > > > > + dax_delete_mapping_range(inode->i_mapping, 0, LLONG_MAX);
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(dax_break_mapping_uninterruptible);
> > > > > +
> > > > >  /*
> > > > >   * Invalidate DAX entry if it is clean.
> > > > >   */
> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > index ee8e83f..fa35161 100644
> > > > > --- a/fs/ext4/inode.c
> > > > > +++ b/fs/ext4/inode.c
> > > > > @@ -163,6 +163,18 @@ int ext4_inode_is_fast_symlink(struct inode 
> > > > > *inode)
> > > > >  (inode->i_size < EXT4_N_BLOCKS * 4);
> > > > >  }
> > > > >  
> > > > > +static void ext4_wait_dax_page(struct inode *inode)
> > > > > +{
> > > > > + filemap_invalidate_unlock(inode->i_mapping);
> > > > > + schedule();
> > > > > + filemap_invalidate_lock(inode->i_mapping);
> > > > > +}
> > > > > +
> > > > > +int ext4_break_layouts(struct inode *inode)
> > > > > +{
> > > > > + return dax_break_mapping_inode(inode, ext4_wait_dax_page);
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Called at the last iput() if i_nlink is zero.
> > > > >   */
> > > > > @@ -181,6 +193,8 @@ void ext4_evict_inode(struct inode *inode)
> > > > >  
> > > > >   trace_ext4_evict_inode(inode);
> > > > >  
> > > > > + dax_break_mapping_uninterruptible(inode, ext4_wait_dax_page);
> > > > > +
> > > > >   if (EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)
> > > > >   ext4_evict_ea_inode(inode);
> > > > >   if (inode->i_nlink) {
> > >

[PATCH v2 0/7] ptrace: introduce PTRACE_SET_SYSCALL_INFO API

2025-01-13 Thread Dmitry V. Levin
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.

This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.

Current implementation supports changing only those bits of system call
information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.

Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later
if needed, by using struct ptrace_syscall_info.flags to specify
the additional details that should be set.  Currently, flags and reserved
fields of struct ptrace_syscall_info must be initialized with zeroes;
arch, instruction_pointer, and stack_pointer fields are ignored.

PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.

Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen.  The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was the apparent
failure to provide an API for changing the first system call argument
on the riscv architecture [1].

ptrace(2) man page:

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
   Modify information about the system call that caused the stop.
   The "data" argument is a pointer to struct ptrace_syscall_info
   that specifies the system call information to be set.
   The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).

[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055ee...@gmail.com/

---

Notes:
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Merge two patches adding syscall_set_arguments() implementations
  from different sources into a single patch
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
  on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
  and sparc
* Remove redundant ptrace_syscall_info.op assignments in
  ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
  ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
  in ptrace_set_syscall_info_entry() 
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
  and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test

Dmitry V. Levin (7):
  powerpc: properly negate error in syscall_set_return_value()
  mips: fix mips_get_syscall_arg() for O32 and N32
  syscall.h: add syscall_set_arguments() and syscall_set_return_value()
  syscall.h: introduce syscall_set_nr()
  ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
  ptrace: introduce PTRACE_SET_SYSCALL_INFO request
  selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO

 arch/arc/include/asm/syscall.h|  25 +
 arch/arm/include/asm/syscall.h|  37 ++
 arch/arm64/include/asm/syscall.h  |  29 ++
 arch/csky/include/asm/syscall.h   |  13 +
 arch/hexagon/include/asm/syscall.h|  21 +
 arch/loongarch/include/asm/syscall.h  |  15 +
 arch/m68k/include/asm/syscall.h   |   7 +
 arch/microblaze/include/asm/syscall.h |   7 +
 arch/mips/include/asm/syscall.h   |  72 ++-
 arch/nios2/include/asm/syscall.h  |  16 +
 arch/openrisc/include/asm/syscall.h   |  13 +
 arch/parisc/include/asm/syscall.h |  19 +
 arch/powerpc/include/asm/syscall.h|  26 +-
 arch/riscv/include/asm/syscall.h  |  16 +
 arch/s390/include/asm/syscall.h   |  24 +
 arch/sh/include/asm/syscall_32.h  |  24 +
 arch/sparc/include/asm/syscall.h  |  22 +
 arch/um/include/asm/syscall-generic.h |  19 +
 arch/x86/include/asm/syscall.h|  43 ++
 arch/xtensa/include/asm/syscall.h |  18 +
 include/asm-generic/syscall.h |  30 ++
 include/linux/ptrace.h|   3 +
 include/uapi/linux/ptrace.h   |   4 +-
 kernel/ptrace.c   | 153 +-
 tools/testing/selftests/ptrace/Makefile   |   2 +-
 .../selftests/ptrace/set_syscall_info.c   | 441 ++
 26 files changed, 1052 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c

-- 
l

[PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-13 Thread Dmitry V. Levin
Bring syscall_set_return_value() in sync with syscall_get_error(),
and let upcoming ptrace/set_syscall_info selftest pass on powerpc.

This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
syscall_set_return_value()").

Signed-off-by: Dmitry V. Levin 
---
 arch/powerpc/include/asm/syscall.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index 3dd36c5e334a..422d7735ace6 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct 
task_struct *task,
 */
if (error) {
		regs->ccr |= 0x10000000L;
-   regs->gpr[3] = error;
+   /*
+* In case of an error regs->gpr[3] contains
+* a positive ERRORCODE.
+*/
+   regs->gpr[3] = -error;
} else {
		regs->ccr &= ~0x10000000L;
regs->gpr[3] = val;
-- 
ldv



Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Sean Christopherson
On Mon, Jan 13, 2025, Chao Gao wrote:
> On Fri, Jan 10, 2025 at 05:24:48PM -0800, Sean Christopherson wrote:
> >Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
> >that KVM_RUN needs to be re-executed prior to save/restore in order to
> >complete the instruction/operation that triggered the userspace exit.
> >
> >KVM's current approach of adding notes in the Documentation is beyond
> >brittle, e.g. there is at least one known case where a KVM developer added
> >a new userspace exit type, and then that same developer forgot to handle
> >completion when adding userspace support.
> 
> This answers one question I had:
> https://lore.kernel.org/kvm/z1bmucedoz87w...@intel.com/
> 
> So, it is the VMM's (i.e., QEMU's) responsibility to re-execute KVM_RUN in 
> this
> case.

Yep.

> Btw, can this flag be used to address the issue [*] with steal time 
> accounting?
> We can set the new flag for each vCPU in the PM notifier and we need to change
> the re-execution to handle steal time accounting (not just IO completion).
> 
> [*]: https://lore.kernel.org/kvm/z36xjl1oaahvk...@google.com/

Uh, hmm.  Partially?  And not without creating new, potentially worse problems.

I like the idea, but (a) there's no guarantee a vCPU would be "in" KVM_RUN at
the time of suspend, and (b) KVM would need to take vcpu->mutex in the PM 
notifier
in order to avoid clobbering the current completion callback, which is 
definitely
a net negative (hello, deadlocks).

E.g. if a vCPU task is in userspace processing emulated MMIO at the time of
suspend+resume, KVM's completion callback will be non-zero and must be 
preserved.
And if a vCPU task is in userspace processing an exit that _doesn't_ require
completion, setting KVM_RUN_NEEDS_COMPLETION would likely be missed by 
userspace,
e.g. if userspace checks the flag only after regaining control from KVM_RUN.

In general, I think setting KVM_RUN_NEEDS_COMPLETION outside of KVM_RUN would 
add
too much complexity.

> one nit below,
> 
> >--- a/arch/x86/include/uapi/asm/kvm.h
> >+++ b/arch/x86/include/uapi/asm/kvm.h
> >@@ -104,9 +104,10 @@ struct kvm_ioapic_state {
> > #define KVM_IRQCHIP_IOAPIC   2
> > #define KVM_NR_IRQCHIPS  3
> > 
> >-#define KVM_RUN_X86_SMM  (1 << 0)
> >-#define KVM_RUN_X86_BUS_LOCK (1 << 1)
> >-#define KVM_RUN_X86_GUEST_MODE   (1 << 2)
> >+#define KVM_RUN_X86_SMM (1 << 0)
> >+#define KVM_RUN_X86_BUS_LOCK(1 << 1)
> >+#define KVM_RUN_X86_GUEST_MODE  (1 << 2)
> >+#define KVM_RUN_X86_NEEDS_COMPLETION(1 << 2)
> 
> This X86_NEEDS_COMPLETION should be dropped. It is never used.

Gah, thanks!



[PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()

2025-01-13 Thread Dmitry V. Levin
These functions are going to be needed on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.

This partially reverts commit 7962c2eddbfe ("arch: remove unused
function syscall_set_arguments()") by reusing some of old
syscall_set_arguments() implementations.

Signed-off-by: Dmitry V. Levin 
---

Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by
looking at mips_get_syscall_arg() and the result passes tests in qemu on
mips O32, mips64 O32, mips64 N32, and mips64 N64.

 arch/arc/include/asm/syscall.h| 14 +++
 arch/arm/include/asm/syscall.h| 13 ++
 arch/arm64/include/asm/syscall.h  | 13 ++
 arch/csky/include/asm/syscall.h   | 13 ++
 arch/hexagon/include/asm/syscall.h| 14 +++
 arch/loongarch/include/asm/syscall.h  |  8 ++
 arch/mips/include/asm/syscall.h   | 32 
 arch/nios2/include/asm/syscall.h  | 11 
 arch/openrisc/include/asm/syscall.h   |  7 ++
 arch/parisc/include/asm/syscall.h | 12 +
 arch/powerpc/include/asm/syscall.h| 10 
 arch/riscv/include/asm/syscall.h  |  9 +++
 arch/s390/include/asm/syscall.h   | 12 +
 arch/sh/include/asm/syscall_32.h  | 12 +
 arch/sparc/include/asm/syscall.h  | 10 
 arch/um/include/asm/syscall-generic.h | 14 +++
 arch/x86/include/asm/syscall.h| 36 +++
 arch/xtensa/include/asm/syscall.h | 11 
 include/asm-generic/syscall.h | 16 
 19 files changed, 267 insertions(+)

diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 9709256e31c8..89c1e1736356 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct 
pt_regs *regs,
}
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *args)
+{
+   unsigned long *inside_ptregs = &regs->r0;
+   unsigned int n = 6;
+   unsigned int i = 0;
+
+   while (n--) {
+   *inside_ptregs = args[i++];
+   inside_ptregs--;
+   }
+}
+
 static inline int
 syscall_get_arch(struct task_struct *task)
 {
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fe4326d938c1..21927fa0ae2b 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct 
*task,
	memcpy(args, &regs->ARM_r0 + 1, 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+struct pt_regs *regs,
+const unsigned long *args)
+{
+   memcpy(&regs->ARM_r0, args, 6 * sizeof(args[0]));
+   /*
+* Also copy the first argument into ARM_ORIG_r0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->ARM_ORIG_r0 = regs->ARM_r0;
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
/* ARM tasks don't change audit architectures on the fly. */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..76020b66286b 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct 
*task,
	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+struct pt_regs *regs,
+const unsigned long *args)
+{
+   memcpy(&regs->regs[0], args, 6 * sizeof(args[0]));
+   /*
+* Also copy the first argument into orig_x0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->orig_x0 = regs->regs[0];
+}
+
 /*
  * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
  * AArch64 has the same system calls both on little- and big- endian.
diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
index 0de5734950bf..30403f7a0487 100644
--- a/arch/csky/include/asm/syscall.h
+++ b/arch/csky/include/asm/syscall.h
@@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct 
pt_regs *regs,
	memcpy(args, &regs->a1, 5 * sizeof(args[0]));
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ const unsigned long *args)
+{
+   memcpy(&regs->a0, args, 6 * sizeof(regs->a0));
+   /*
+* Also copy the first argument into orig_x0
+* so that syscall_get_arguments() would return it
+* instead of the previous value.
+*/
+   regs->orig

[PATCH v2 4/7] syscall.h: introduce syscall_set_nr()

2025-01-13 Thread Dmitry V. Levin
Similar to syscall_set_arguments() that complements
syscall_get_arguments(), introduce syscall_set_nr()
that complements syscall_get_nr().

syscall_set_nr() is going to be needed along with
syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.
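
In the PTRACE_SET_SYSCALL_INFO entry handler (patch 6/7) the two
setters are then expected to be used together, roughly like this
sketch (names are illustrative, regs would come from
task_pt_regs(child), and args from info->entry.args[]; per the series
changelog, a syscall number of -1 skips setting the arguments):

	syscall_set_nr(child, regs, info->entry.nr);
	if ((int)info->entry.nr != -1)
		syscall_set_arguments(child, regs, args);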

Signed-off-by: Dmitry V. Levin 
---
 arch/arc/include/asm/syscall.h| 11 +++
 arch/arm/include/asm/syscall.h| 24 
 arch/arm64/include/asm/syscall.h  | 16 
 arch/hexagon/include/asm/syscall.h|  7 +++
 arch/loongarch/include/asm/syscall.h  |  7 +++
 arch/m68k/include/asm/syscall.h   |  7 +++
 arch/microblaze/include/asm/syscall.h |  7 +++
 arch/mips/include/asm/syscall.h   | 14 ++
 arch/nios2/include/asm/syscall.h  |  5 +
 arch/openrisc/include/asm/syscall.h   |  6 ++
 arch/parisc/include/asm/syscall.h |  7 +++
 arch/powerpc/include/asm/syscall.h| 10 ++
 arch/riscv/include/asm/syscall.h  |  7 +++
 arch/s390/include/asm/syscall.h   | 12 
 arch/sh/include/asm/syscall_32.h  | 12 
 arch/sparc/include/asm/syscall.h  | 12 
 arch/um/include/asm/syscall-generic.h |  5 +
 arch/x86/include/asm/syscall.h|  7 +++
 arch/xtensa/include/asm/syscall.h |  7 +++
 include/asm-generic/syscall.h | 14 ++
 20 files changed, 197 insertions(+)

diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 89c1e1736356..728d625a10f1 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs 
*regs)
return -1;
 }
 
+static inline void
+syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+   /*
+* Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+* the target task is stopped for tracing on entering syscall, so
+* there is no need to have the same check syscall_get_nr() has.
+*/
+   regs->r8 = nr;
+}
+
 static inline void
 syscall_rollback(struct task_struct *task, struct pt_regs *regs)
 {
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 21927fa0ae2b..18b102a30741 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct 
task_struct *task,
regs->ARM_r0 = (long) error ? error : val;
 }
 
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+   if (nr == -1) {
+   task_thread_info(task)->abi_syscall = -1;
+   /*
+* When the syscall number is set to -1, the syscall will be
+* skipped.  In this case the syscall return value has to be
+* set explicitly, otherwise the first syscall argument is
+* returned as the syscall return value.
+*/
+   syscall_set_return_value(task, regs, -ENOSYS, 0);
+   return;
+   }
+   if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
+   task_thread_info(task)->abi_syscall = nr;
+   return;
+   }
+   task_thread_info(task)->abi_syscall =
+   (task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
+   (nr & __NR_SYSCALL_MASK);
+}
+
 #define SYSCALL_MAX_ARGS 7
 
 static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 76020b66286b..712daa90e643 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct 
task_struct *task,
regs->regs[0] = val;
 }
 
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+   regs->syscallno = nr;
+   if (nr == -1) {
+   /*
+* When the syscall number is set to -1, the syscall will be
+* skipped.  In this case the syscall return value has to be
+* set explicitly, otherwise the first syscall argument is
+* returned as the syscall return value.
+*/
+   syscall_set_return_value(task, regs, -ENOSYS, 0);
+   }
+}
+
 #define SYSCALL_MAX_ARGS 6
 
 static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/hexagon/include/asm/syscall.h 
b/arch/hexagon/include/asm/syscall.h
index 1024a6548d78..70637261817a 100644
--- a/arch/hexagon/include/asm/syscall.h
+++ b/arch/hexagon/include/asm/syscall.h
@@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct

[PATCH] powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7

2025-01-13 Thread Andreas Schwab
Similar to the PowerMac3,1, the PowerBook6,7 is missing the #size-cells
property on the i2s node.

Depends-on: 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells 
handling")
Signed-off-by: Andreas Schwab 
---
 arch/powerpc/kernel/prom_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 8e776ba39497..24b1523eeea8 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2898,11 +2898,11 @@ static void __init fixup_device_tree_pmac(void)
char type[8];
phandle node;
 
-   // Some pmacs are missing #size-cells on escc nodes
+   // Some pmacs are missing #size-cells on escc or i2s nodes
for (node = 0; prom_next_node(&node); ) {
type[0] = '\0';
prom_getprop(node, "device_type", type, sizeof(type));
-   if (prom_strcmp(type, "escc"))
+   if (prom_strcmp(type, "escc") && prom_strcmp(type, "i2s"))
continue;
 
if (prom_getproplen(node, "#size-cells") != PROM_ERROR)
-- 
2.48.0


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: [PATCH v6 05/26] fs/dax: Create a common implementation to break DAX layouts

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Prior to freeing a block file systems supporting FS DAX must check
> that the associated pages are both unmapped from user-space and not
> undergoing DMA or other access from eg. get_user_pages(). This is
> achieved by unmapping the file range and scanning the FS DAX
> page-cache to see if any pages within the mapping have an elevated
> refcount.
> 
> This is done using two functions - dax_layout_busy_page_range() which
> returns a page to wait for the refcount to become idle on. Rather than
> open-code this introduce a common implementation to both unmap and
> wait for the page to become idle.
> 
> Signed-off-by: Alistair Popple 

After resolving my confusion about retries, you can add:

Reviewed-by: Dan Williams 

...although some bikeshedding below that can take or leave as you wish.

> 
> ---
> 
> Changes for v5:
> 
>  - Don't wait for idle pages on non-DAX mappings
> 
> Changes for v4:
> 
>  - Fixed some build breakage due to missing symbol exports reported by
>John Hubbard (thanks!).
> ---
>  fs/dax.c| 33 +
>  fs/ext4/inode.c | 10 +-
>  fs/fuse/dax.c   | 27 +++
>  fs/xfs/xfs_inode.c  | 23 +--
>  fs/xfs/xfs_inode.h  |  2 +-
>  include/linux/dax.h | 21 +
>  mm/madvise.c|  8 
>  7 files changed, 68 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index d010c10..9c3bd07 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -845,6 +845,39 @@ int dax_delete_mapping_entry(struct address_space 
> *mapping, pgoff_t index)
>   return ret;
>  }
>  
> +static int wait_page_idle(struct page *page,
> + void (cb)(struct inode *),
> + struct inode *inode)
> +{
> + return ___wait_var_event(page, page_ref_count(page) == 1,
> + TASK_INTERRUPTIBLE, 0, 0, cb(inode));
> +}
> +
> +/*
> + * Unmaps the inode and waits for any DMA to complete prior to deleting the
> + * DAX mapping entries for the range.
> + */
> +int dax_break_mapping(struct inode *inode, loff_t start, loff_t end,
> + void (cb)(struct inode *))
> +{
> + struct page *page;
> + int error;
> +
> + if (!dax_mapping(inode->i_mapping))
> + return 0;
> +
> + do {
> + page = dax_layout_busy_page_range(inode->i_mapping, start, end);
> + if (!page)
> + break;
> +
> + error = wait_page_idle(page, cb, inode);
> + } while (error == 0);
> +
> + return error;
> +}
> +EXPORT_SYMBOL_GPL(dax_break_mapping);

It is not clear why this is called "mapping" vs "layout". The detail
about the file that is being "broken" is whether there are any live
subscriptions to the "layout" of the file, the pfn storage layout, not
the memory mapping.

For example the bulk of dax_break_layout() is performed after
invalidate_inode_pages() has torn down the memory mapping.



Re: [PATCH] MAINTAINERS: powerpc: Update my status

2025-01-13 Thread Michael Ellerman
Stephen Rothwell  writes:
> Hi Michael,
>
> On Sat, 11 Jan 2025 10:57:38 +1100 Michael Ellerman  
> wrote:
>>
>> Maddy is taking over the day-to-day maintenance of powerpc. I will still
>> be around to help, and as a backup.
>> 
>> Re-order the main POWERPC list to put Maddy first to reflect that.
>> 
>> KVM/powerpc patches will be handled by Maddy via the powerpc tree with
>> review from Nick, so replace myself with Maddy there.
>
> I have added Maddy as a contact on the powerpc and powerpc-fixes tree
> in linux-next and replaced you with Maddy for the kvm-ppc tree.  Are
> there any other changes needed?

Ah thanks sfr, I forgot about that. I think that's all that's needed for
now.

cheers



Re: [PATCH v6 05/26] fs/dax: Create a common implementation to break DAX layouts

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Prior to freeing a block file systems supporting FS DAX must check
> that the associated pages are both unmapped from user-space and not
> undergoing DMA or other access from eg. get_user_pages(). This is
> achieved by unmapping the file range and scanning the FS DAX
> page-cache to see if any pages within the mapping have an elevated
> refcount.
> 
> This is done using two functions - dax_layout_busy_page_range() which
> returns a page to wait for the refcount to become idle on. Rather than
> open-code this introduce a common implementation to both unmap and
> wait for the page to become idle.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
[..]

Whoops, I hit send on the last mail before seeing this:

> diff --git a/mm/madvise.c b/mm/madvise.c
> index 49f3a75..1f4c99e 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c

This hunk needs to move to the devmap removal patch, right?

With that fixed up the Reviewed-by still stands.



Re: [PATCH v6 06/26] fs/dax: Always remove DAX page-cache entries when breaking layouts

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Prior to any truncation operations file systems call
> dax_break_mapping() to ensure pages in the range are not under going
> DMA. Later DAX page-cache entries will be removed by
> truncate_folio_batch_exceptionals() in the generic page-cache code.
> 
> However this makes it possible for folios to be removed from the
> page-cache even though they are still DMA busy if the file-system
> hasn't called dax_break_mapping(). It also means they can never be
> waited on in future because FS DAX will lose track of them once the
> page-cache entry has been deleted.
> 
> Instead it is better to delete the FS DAX entry when the file-system
> calls dax_break_mapping() as part of its truncate operation. This
> ensures only idle pages can be removed from the FS DAX page-cache and
> makes it easy to detect if a file-system hasn't called
> dax_break_mapping() prior to a truncate operation.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Ideally I think we would move the whole wait-for-idle logic directly
> into the truncate paths. However this is difficult for a few
> reasons. Each filesystem needs its own wait callback, although a new
> address space operation could address that. More problematic is that
> the wait-for-idle can fail as the wait is TASK_INTERRUPTIBLE, but none
> of the generic truncate paths allow for failure.
> 
> So it ends up being easier to continue to let file systems call this
> and check that they behave as expected.
> ---
>  fs/dax.c| 33 +
>  fs/xfs/xfs_inode.c  |  6 ++
>  include/linux/dax.h |  2 ++
>  mm/truncate.c   | 16 +++-
>  4 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 9c3bd07..7008a73 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -845,6 +845,36 @@ int dax_delete_mapping_entry(struct address_space 
> *mapping, pgoff_t index)
>   return ret;
>  }
>  
> +void dax_delete_mapping_range(struct address_space *mapping,
> + loff_t start, loff_t end)
> +{
> + void *entry;
> + pgoff_t start_idx = start >> PAGE_SHIFT;
> + pgoff_t end_idx;
> + XA_STATE(xas, &mapping->i_pages, start_idx);
> +
> + /* If end == LLONG_MAX, all pages from start to till end of file */
> + if (end == LLONG_MAX)
> + end_idx = ULONG_MAX;
> + else
> + end_idx = end >> PAGE_SHIFT;
> +
> + xas_lock_irq(&xas);
> + xas_for_each(&xas, entry, end_idx) {
> + if (!xa_is_value(entry))
> + continue;
> + entry = wait_entry_unlocked_exclusive(&xas, entry);
> + if (!entry)
> + continue;
> + dax_disassociate_entry(entry, mapping, true);
> + xas_store(&xas, NULL);
> + mapping->nrpages -= 1UL << dax_entry_order(entry);
> + put_unlocked_entry(&xas, entry, WAKE_ALL);
> + }
> + xas_unlock_irq(&xas);
> +}
> +EXPORT_SYMBOL_GPL(dax_delete_mapping_range);
> +
>  static int wait_page_idle(struct page *page,
>   void (cb)(struct inode *),
>   struct inode *inode)
> @@ -874,6 +904,9 @@ int dax_break_mapping(struct inode *inode, loff_t start, 
> loff_t end,
>   error = wait_page_idle(page, cb, inode);
>   } while (error == 0);
>  
> + if (!page)
> + dax_delete_mapping_range(inode->i_mapping, start, end);
> +

Just reinforcing the rename comment on the last patch...

I think this is an example where the
s/dax_break_mapping/dax_break_layout/ rename helps disambiguate what is
related to layout cleanup and what is related to mapping cleanup, as
dax_break_layout calls dax_delete_mapping.

>   return error;
>  }
>  EXPORT_SYMBOL_GPL(dax_break_mapping);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 295730a..4410b42 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2746,6 +2746,12 @@ xfs_mmaplock_two_inodes_and_break_dax_layout(
>   goto again;
>   }
>  
> + /*
> +  * Normally xfs_break_dax_layouts() would delete the mapping entries as 
> well so
> +  * do that here.
> +  */
> + dax_delete_mapping_range(VFS_I(ip2)->i_mapping, 0, LLONG_MAX);
> +

I think it is unfortunate that dax_break_mapping is so close to being
useful for this case... how about this incremental cleanup?

diff --git a/fs/dax.c b/fs/dax.c
index facddd6c6bbb..1fa5521e5a2e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -942,12 +942,15 @@ static void wait_page_idle_uninterruptible(struct page 
*page,
 /*
  * Unmaps the inode and waits for any DMA to complete prior to deleting the
  * DAX mapping entries for the range.
+ *
+ * For NOWAIT behavior, pass @cb as NULL to early-exit on first found
+ * busy page
  */
 int dax_break_mapping(struct inode *inode, loff_t start, loff_t end,
void (cb)(struct inode *))
 {
struct page *page;
-   int error;
+   int error = 0;
 
   

Re: [PATCH v6 07/26] fs/dax: Ensure all pages are idle prior to filesystem unmount

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> File systems call dax_break_mapping() prior to reallocating file
> system blocks to ensure the page is not undergoing any DMA or other
> accesses. Generally this is needed when a file is truncated to ensure
> that if a block is reallocated nothing is writing to it. However
> filesystems currently don't call this when an FS DAX inode is evicted.
> 
> This can cause problems when the file system is unmounted as a page
> can continue to be undergoing DMA or other remote access after
> unmount. This means if the file system is remounted any truncate or
> other operation which requires the underlying file system block to be
> freed will not wait for the remote access to complete. Therefore a
> busy block may be reallocated to a new file leading to corruption.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Changes for v5:
> 
>  - Don't wait for pages to be idle in non-DAX mappings
> ---
>  fs/dax.c| 29 +
>  fs/ext4/inode.c | 32 ++--
>  fs/xfs/xfs_inode.c  |  9 +
>  fs/xfs/xfs_inode.h  |  1 +
>  fs/xfs/xfs_super.c  | 18 ++
>  include/linux/dax.h |  2 ++
>  6 files changed, 73 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 7008a73..4e49cc4 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -883,6 +883,14 @@ static int wait_page_idle(struct page *page,
>   TASK_INTERRUPTIBLE, 0, 0, cb(inode));
>  }
>  
> +static void wait_page_idle_uninterruptible(struct page *page,
> + void (cb)(struct inode *),
> + struct inode *inode)
> +{
> + ___wait_var_event(page, page_ref_count(page) == 1,
> + TASK_UNINTERRUPTIBLE, 0, 0, cb(inode));
> +}
> +
>  /*
>   * Unmaps the inode and waits for any DMA to complete prior to deleting the
>   * DAX mapping entries for the range.
> @@ -911,6 +919,27 @@ int dax_break_mapping(struct inode *inode, loff_t start, 
> loff_t end,
>  }
>  EXPORT_SYMBOL_GPL(dax_break_mapping);
>  
> +void dax_break_mapping_uninterruptible(struct inode *inode,
> + void (cb)(struct inode *))
> +{
> + struct page *page;
> +
> + if (!dax_mapping(inode->i_mapping))
> + return;
> +
> + do {
> + page = dax_layout_busy_page_range(inode->i_mapping, 0,
> + LLONG_MAX);
> + if (!page)
> + break;
> +
> + wait_page_idle_uninterruptible(page, cb, inode);
> + } while (true);
> +
> + dax_delete_mapping_range(inode->i_mapping, 0, LLONG_MAX);
> +}
> +EXPORT_SYMBOL_GPL(dax_break_mapping_uninterruptible);

Riffing off of Darrick's feedback, how about call this
dax_break_layout_final()?




Re: [PATCH v2 09/18] riscv: vdso: Switch to generic storage implementation

2025-01-13 Thread Conor Dooley
On Fri, Jan 10, 2025 at 04:23:48PM +0100, Thomas Weißschuh wrote:
> The generic storage implementation provides the same features as the
> custom one. However it can be shared between architectures, making
> maintenance easier.
> 
> Co-developed-by: Nam Cao 
> Signed-off-by: Nam Cao 
> Signed-off-by: Thomas Weißschuh 

For rv64, nommu:
  LD  vmlinux
ld.lld: error: undefined symbol: vmf_insert_pfn
>>> referenced by datastore.c
>>>   lib/vdso/datastore.o:(vvar_fault) in archive vmlinux.a

ld.lld: error: undefined symbol: _install_special_mapping
>>> referenced by datastore.c
>>>   lib/vdso/datastore.o:(vdso_install_vvar_mapping) in archive 
>>> vmlinux.a

Later patches in the series don't make it build again.
rv32 builds now though, so thanks for fixing that.

Cheers,
Conor.

> ---
>  arch/riscv/Kconfig |  3 +-
>  arch/riscv/include/asm/vdso.h  |  2 +-
>  .../include/asm/vdso/{time_data.h => arch_data.h}  |  8 +-
>  arch/riscv/include/asm/vdso/gettimeofday.h | 14 +---
>  arch/riscv/include/asm/vdso/vsyscall.h |  9 ---
>  arch/riscv/kernel/sys_hwprobe.c|  3 +-
>  arch/riscv/kernel/vdso.c   | 90 
> +-
>  arch/riscv/kernel/vdso/hwprobe.c   |  6 +-
>  arch/riscv/kernel/vdso/vdso.lds.S  |  7 +-
>  9 files changed, 18 insertions(+), 124 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 
> d4a7ca0388c071b536df59c0eb11d55f9080c7cd..335cbbd4dddb17e5ccaa2cddaefc298cb559dbc0
>  100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -52,7 +52,7 @@ config RISCV
>   select ARCH_HAS_SYSCALL_WRAPPER
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>   select ARCH_HAS_UBSAN
> - select ARCH_HAS_VDSO_TIME_DATA
> + select ARCH_HAS_VDSO_ARCH_DATA
>   select ARCH_KEEP_MEMBLOCK if ACPI
>   select ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE if 64BIT && MMU
>   select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> @@ -115,6 +115,7 @@ config RISCV
>   select GENERIC_SCHED_CLOCK
>   select GENERIC_SMP_IDLE_THREAD
>   select GENERIC_TIME_VSYSCALL if MMU && 64BIT
> + select GENERIC_VDSO_DATA_STORE
>   select GENERIC_VDSO_TIME_NS if HAVE_GENERIC_VDSO
>   select HARDIRQS_SW_RESEND
>   select HAS_IOPORT if MMU
> diff --git a/arch/riscv/include/asm/vdso.h b/arch/riscv/include/asm/vdso.h
> index 
> f891478829a52c41e06240f67611694cc28197d9..c130d8100232cbe50e52e35eb418e354bd114cb7
>  100644
> --- a/arch/riscv/include/asm/vdso.h
> +++ b/arch/riscv/include/asm/vdso.h
> @@ -14,7 +14,7 @@
>   */
>  #ifdef CONFIG_MMU
>  
> -#define __VVAR_PAGES2
> +#define __VDSO_PAGES4
>  
>  #ifndef __ASSEMBLY__
>  #include 
> diff --git a/arch/riscv/include/asm/vdso/time_data.h 
> b/arch/riscv/include/asm/vdso/arch_data.h
> similarity index 71%
> rename from arch/riscv/include/asm/vdso/time_data.h
> rename to arch/riscv/include/asm/vdso/arch_data.h
> index 
> dfa65228999bed41dfd6c5e36cb678e1e055eec8..da57a3786f7a53c866fc00948826b4a2d839940f
>  100644
> --- a/arch/riscv/include/asm/vdso/time_data.h
> +++ b/arch/riscv/include/asm/vdso/arch_data.h
> @@ -1,12 +1,12 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef __RISCV_ASM_VDSO_TIME_DATA_H
> -#define __RISCV_ASM_VDSO_TIME_DATA_H
> +#ifndef __RISCV_ASM_VDSO_ARCH_DATA_H
> +#define __RISCV_ASM_VDSO_ARCH_DATA_H
>  
>  #include 
>  #include 
>  #include 
>  
> -struct arch_vdso_time_data {
> +struct vdso_arch_data {
>   /* Stash static answers to the hwprobe queries when all CPUs are 
> selected. */
>   __u64 all_cpu_hwprobe_values[RISCV_HWPROBE_MAX_KEY + 1];
>  
> @@ -14,4 +14,4 @@ struct arch_vdso_time_data {
>   __u8 homogeneous_cpus;
>  };
>  
> -#endif /* __RISCV_ASM_VDSO_TIME_DATA_H */
> +#endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
> diff --git a/arch/riscv/include/asm/vdso/gettimeofday.h 
> b/arch/riscv/include/asm/vdso/gettimeofday.h
> index 
> ba3283cf7accaa93a38512d2c17eda0eefde0612..29164f84f93cec6e28251e6a0adfbc341ac88241
>  100644
> --- a/arch/riscv/include/asm/vdso/gettimeofday.h
> +++ b/arch/riscv/include/asm/vdso/gettimeofday.h
> @@ -69,7 +69,7 @@ int clock_getres_fallback(clockid_t _clkid, struct 
> __kernel_timespec *_ts)
>  #endif /* CONFIG_GENERIC_TIME_VSYSCALL */
>  
>  static __always_inline u64 __arch_get_hw_counter(s32 clock_mode,
> -  const struct vdso_data *vd)
> +  const struct vdso_time_data 
> *vd)
>  {
>   /*
>* The purpose of csr_read(CSR_TIME) is to trap the system into
> @@ -79,18 +79,6 @@ static __always_inline u64 __arch_get_hw_counter(s32 
> clock_mode,
>   return csr_read(CSR_TIME);
>  }
>  
> -static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
> -{
> - return _vdso_data;
> -}
> -
> -#ifdef CONFIG_TIME_NS
> -static __alw

Re: [PATCH] select: Fix unbalanced user_access_end()

2025-01-13 Thread Al Viro
On Mon, Jan 13, 2025 at 09:37:24AM +0100, Christophe Leroy wrote:
> While working on implementing user access validation on powerpc
> I got the following warnings on a pmac32_defconfig build:
> 
> CC  fs/select.o
>   fs/select.o: warning: objtool: sys_pselect6+0x1bc: redundant UACCESS 
> disable
>   fs/select.o: warning: objtool: sys_pselect6_time32+0x1bc: redundant 
> UACCESS disable
> 
> On powerpc/32s, user_read_access_begin/end() are no-ops, but the
> failure path has a user_access_end() instead of user_read_access_end()
> which means an access end without any prior access begin.
> 
> Replace that user_access_end() by user_read_access_end().

ACK.



Re: [PATCH v13 5/5] rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS

2025-01-13 Thread Sami Tolvanen
Hi Masahiro,

On Fri, Jan 10, 2025 at 6:26 PM Masahiro Yamada  wrote:
>
> On Sat, Jan 4, 2025 at 2:37 AM Matthew Maurer  wrote:
> >
> > From: Sami Tolvanen 
> >
> > Previously, two things stopped Rust from using MODVERSIONS:
> > 1. Rust symbols are occasionally too long to be represented in the
> >original versions table
> > 2. Rust types cannot be properly hashed by the existing genksyms
> >approach because:
> > * Looking up type definitions in Rust is more complex than C
> > * Type layout is potentially dependent on the compiler in Rust,
> >   not just the source type declaration.
> >
> > CONFIG_EXTENDED_MODVERSIONS addresses the first point, and
> > CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow
> > it to do so by selecting both features.
> >
> > Signed-off-by: Sami Tolvanen 
> > Co-developed-by: Matthew Maurer 
> > Signed-off-by: Matthew Maurer 
> > ---
> >  init/Kconfig  |  3 ++-
> >  rust/Makefile | 34 --
> >  2 files changed, 34 insertions(+), 3 deletions(-)
> >
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 
> > c1f9eb3d5f2e892e977ba1425599502dc830f552..b60acfd9431e0ac2bf401ecb6523b5104ad31150
> >  100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1959,7 +1959,8 @@ config RUST
> > bool "Rust support"
> > depends on HAVE_RUST
> > depends on RUST_IS_AVAILABLE
> > -   depends on !MODVERSIONS
> > +   select EXTENDED_MODVERSIONS if MODVERSIONS
> > +   depends on !MODVERSIONS || GENDWARFKSYMS
> > depends on !GCC_PLUGIN_RANDSTRUCT
> > depends on !RANDSTRUCT
> > depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
> > diff --git a/rust/Makefile b/rust/Makefile
> > index 
> > a40a3936126d603836e0ec9b42a1285916b60e45..80f970ad81f7989afe5ff2b5f633f50feb7f6006
> >  100644
> > --- a/rust/Makefile
> > +++ b/rust/Makefile
> > @@ -329,10 +329,11 @@ $(obj)/bindings/bindings_helpers_generated.rs: 
> > private bindgen_target_extra = ;
> >  $(obj)/bindings/bindings_helpers_generated.rs: $(src)/helpers/helpers.c 
> > FORCE
> > $(call if_changed_dep,bindgen)
> >
> > +rust_exports = $(NM) -p --defined-only $(1) | awk '$$2~/(T|R|D|B)/ && 
> > $$3!~/__cfi/ { printf $(2),$(3) }'
> > +
> >  quiet_cmd_exports = EXPORTS $@
> >cmd_exports = \
> > -   $(NM) -p --defined-only $< \
> > -   | awk '$$2~/(T|R|D|B)/ && $$3!~/__cfi/ {printf 
> > "EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3}' > $@
> > +   $(call rust_exports,$<,"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3) > $@
>
> I noticed a nit:
>
> Both of the two callsites of rust_exports pass
> '$$3' to the last parameter instead of hardcoding it.
>
> Is it a flexibility for future extensions?
>
> I cannot think of any other use except for printing
> the third column, i.e. symbol name.

Good catch, the last parameter isn't necessary anymore. It was used in
early versions of the series to also pass symbol addresses to
gendwarfksyms, but that's not needed since we read the symbol table
directly now.

Sami



Re: [PATCH v6 05/26] fs/dax: Create a common implementation to break DAX layouts

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Prior to freeing a block file systems supporting FS DAX must check
> that the associated pages are both unmapped from user-space and not
> undergoing DMA or other access from eg. get_user_pages(). This is
> achieved by unmapping the file range and scanning the FS DAX
> page-cache to see if any pages within the mapping have an elevated
> refcount.
> 
> This is done using two functions - dax_layout_busy_page_range() which
> returns a page to wait for the refcount to become idle on. Rather than
> open-code this introduce a common implementation to both unmap and
> wait for the page to become idle.
> 
> Signed-off-by: Alistair Popple 
> 
> ---
> 
> Changes for v5:
> 
>  - Don't wait for idle pages on non-DAX mappings
> 
> Changes for v4:
> 
>  - Fixed some build breakage due to missing symbol exports reported by
>John Hubbard (thanks!).
[..]
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cc1acb1..ee8e83f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3917,15 +3917,7 @@ int ext4_break_layouts(struct inode *inode)
>   if (WARN_ON_ONCE(!rwsem_is_locked(&inode->i_mapping->invalidate_lock)))
>   return -EINVAL;
>  
> - do {
> - page = dax_layout_busy_page(inode->i_mapping);
> - if (!page)
> - return 0;
> -
> - error = dax_wait_page_idle(page, ext4_wait_dax_page, inode);
> - } while (error == 0);
> -
> - return error;
> + return dax_break_mapping_inode(inode, ext4_wait_dax_page);

I hit this in my compile testing:

fs/ext4/inode.c: In function ‘ext4_break_layouts’:
fs/ext4/inode.c:3915:13: error: unused variable ‘error’ 
[-Werror=unused-variable]
 3915 | int error;
  | ^
fs/ext4/inode.c:3914:22: error: unused variable ‘page’ [-Werror=unused-variable]
 3914 | struct page *page;
  |  ^~~~
cc1: all warnings being treated as errors

...which gets fixed up later on, but bisect breakage is unwanted.

The bots will probably find this too eventually.



[PATCH v6 3/3] powerpc: Document details on H_HTM hcall

2025-01-13 Thread adubey
From: Abhishek Dubey 

Add documentation to 'papr_hcalls.rst' describing the
input, output and return values of the H_HTM hcall as
per the internal specification.

v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-3-ma...@linux.ibm.com/

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
---
 Documentation/arch/powerpc/papr_hcalls.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/arch/powerpc/papr_hcalls.rst 
b/Documentation/arch/powerpc/papr_hcalls.rst
index 80d2c0aadab5..805e1cb9bab9 100644
--- a/Documentation/arch/powerpc/papr_hcalls.rst
+++ b/Documentation/arch/powerpc/papr_hcalls.rst
@@ -289,6 +289,17 @@ to be issued multiple times in order to be completely 
serviced. The
 subsequent hcalls to the hypervisor until the hcall is completely serviced
 at which point H_SUCCESS or other error is returned by the hypervisor.
 
+**H_HTM**
+
+| Input: flags, target, operation (op), op-param1, op-param2, op-param3
+| Out: *dumphtmbufferdata*
+| Return Value: *H_Success,H_Busy,H_LongBusyOrder,H_Partial,H_Parameter,
+H_P2,H_P3,H_P4,H_P5,H_P6,H_State,H_Not_Available,H_Authority*
+
+H_HTM supports setup, configuration, control and dumping of Hardware Trace
+Macro (HTM) function and its data. HTM buffer stores tracing data for functions
+like core instruction, core LLAT and nest.
+
 References
 ==
 .. [1] "Power Architecture Platform Reference"
-- 
2.39.3




[PATCH v6 1/3] powerpc/pseries: Macros and wrapper functions for H_HTM call

2025-01-13 Thread adubey
From: Abhishek Dubey 

Define macros and wrapper functions to handle the
H_HTM (Hardware Trace Macro) hypervisor call.
H_HTM is a new hcall added to export data from the
Hardware Trace Macro (HTM) function.
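
A consumer that wants to pull HTM data for one core then only has to
supply the topology indexes and a buffer description, roughly as in
this sketch (names are illustrative; it assumes the hcall takes the
physical address of the dump buffer, as the debugfs dumper added in
the next patch does):

	long rc;

	rc = htm_get_dump_hardware(nodeindex, nodalchipindex,
				   coreindexonchip, htmtype,
				   virt_to_phys(htm_buf), to_read, offset);
	if (rc != H_SUCCESS)
		pr_debug("H_HTM dump failed, rc=%ld\n", rc);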

v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-1-ma...@linux.ibm.com/

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hvcall.h | 34 +++
 arch/powerpc/include/asm/plpar_wrappers.h | 21 ++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 65d1f291393d..eeef13db2770 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -348,6 +348,7 @@
 #define H_SCM_FLUSH0x44C
 #define H_GET_ENERGY_SCALE_INFO0x450
 #define H_PKS_SIGNED_UPDATE0x454
+#define H_HTM   0x458
 #define H_WATCHDOG 0x45C
 #define H_GUEST_GET_CAPABILITIES 0x460
 #define H_GUEST_SET_CAPABILITIES 0x464
@@ -498,6 +499,39 @@
 #define H_GUEST_CAP_POWER11(1UL<<(63-3))
 #define H_GUEST_CAP_BITMAP2(1UL<<(63-63))
 
+/*
+ * Defines for H_HTM - Macros for hardware trace macro (HTM) function.
+ */
+#define H_HTM_FLAGS_HARDWARE_TARGET(1ul << 63)
+#define H_HTM_FLAGS_LOGICAL_TARGET (1ul << 62)
+#define H_HTM_FLAGS_PROCID_TARGET  (1ul << 61)
+#define H_HTM_FLAGS_NOWRAP (1ul << 60)
+
+#define H_HTM_OP_SHIFT (63-15)
+#define H_HTM_OP(x)((unsigned long)(x)<<H_HTM_OP_SHIFT)
 
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 71648c126970..91be7b885944 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -65,6 +65,27 @@ static inline long register_dtl(unsigned long cpu, unsigned 
long vpa)
return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+static inline long htm_call(unsigned long flags, unsigned long target,
+   unsigned long operation, unsigned long param1,
+   unsigned long param2, unsigned long param3)
+{
+   return plpar_hcall_norets(H_HTM, flags, target, operation,
+ param1, param2, param3);
+}
+
+static inline long htm_get_dump_hardware(unsigned long nodeindex,
+   unsigned long nodalchipindex, unsigned long coreindexonchip,
+   unsigned long type, unsigned long addr, unsigned long size,
+   unsigned long offset)
+{
+   return htm_call(H_HTM_FLAGS_HARDWARE_TARGET,
+   H_HTM_TARGET_NODE_INDEX(nodeindex) |
+   H_HTM_TARGET_NODAL_CHIP_INDEX(nodalchipindex) |
+   H_HTM_TARGET_CORE_INDEX_ON_CHIP(coreindexonchip),
+   H_HTM_OP(H_HTM_OP_DUMP_DATA) | H_HTM_TYPE(type),
+   addr, size, offset);
+}
+
 extern void vpa_init(int cpu);
 
 static inline long plpar_pte_enter(unsigned long flags,
-- 
2.39.3




[PATCH v6 2/3] powerpc/pseries: Export hardware trace macro dump via debugfs

2025-01-13 Thread adubey
From: Abhishek Dubey 

This patch adds a debugfs interface to export Hardware Trace Macro (HTM)
function data in an LPAR. A new hypervisor call "H_HTM" has been
defined to set up, configure, control and dump the HTM data.
This patch supports only dumping of HTM data in an LPAR.
A new debugfs folder called "htmdump" has been added under the
/sys/kernel/debug/arch path; it contains the files needed to
pass the required parameters to the H_HTM dump function. A new Kconfig
option called "CONFIG_HTMDUMP" is added in platforms/pseries
for this.

With this module loaded, the list of files in the debugfs path is:

/sys/kernel/debug/powerpc/htmdump
coreindexonchip  htmtype  nodalchipindex  nodeindex  trace
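
As an illustration, a minimal usage sketch (not from the patch; the parameter
values below are assumptions):

  # modprobe htmdump
  # cd /sys/kernel/debug/powerpc/htmdump
  # echo 0 > nodeindex
  # echo 0 > nodalchipindex
  # echo 0 > coreindexonchip
  # echo 1 > htmtype
  # dd if=trace of=/tmp/htm.bin bs=4096   # each read issues an H_HTM dump hcall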

Changelog:
  v5->v6 : Header file inclusion
  v4->v5 : Removed offset from the available calculation, as offset is
   always zero, leading to buffer-size reads.
   Edited comments and commit message

v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-2-ma...@linux.ibm.com/

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/platforms/pseries/Kconfig   |   9 ++
 arch/powerpc/platforms/pseries/Makefile  |   1 +
 arch/powerpc/platforms/pseries/htmdump.c | 121 +++
 3 files changed, 131 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/htmdump.c

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 42fc66e97539..b839e87408aa 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -128,6 +128,15 @@ config CMM
  will be reused for other LPARs. The interface allows firmware to
  balance memory across many LPARs.
 
+config HTMDUMP
+   tristate "PowerVM data dumper"
+   depends on PPC_PSERIES && DEBUG_FS
+   default m
+   help
+ Select this option, if you want to enable the kernel debugfs
+ interface to dump the Hardware Trace Macro (HTM) function data
+ in the LPAR.
+
 config HV_PERF_CTRS
bool "Hypervisor supplied PMU events (24x7 & GPCI)"
default y
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 7bf506f6b8c8..3f3e3492e436 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
 obj-$(CONFIG_HVCS) += hvcserver.o
 obj-$(CONFIG_HCALL_STATS)  += hvCall_inst.o
 obj-$(CONFIG_CMM)  += cmm.o
+obj-$(CONFIG_HTMDUMP)  += htmdump.o
 obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
 obj-$(CONFIG_LPARCFG)  += lparcfg.o
 obj-$(CONFIG_IBMVIO)   += vio.o
diff --git a/arch/powerpc/platforms/pseries/htmdump.c 
b/arch/powerpc/platforms/pseries/htmdump.c
new file mode 100644
index ..57fc1700f604
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) IBM Corporation, 2024
+ */
+
+#define pr_fmt(fmt) "htmdump: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void *htm_buf;
+static u32 nodeindex;
+static u32 nodalchipindex;
+static u32 coreindexonchip;
+static u32 htmtype;
+static struct dentry *htmdump_debugfs_dir;
+
+static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
+size_t count, loff_t *ppos)
+{
+   void *htm_buf = filp->private_data;
+   unsigned long page, read_size, available;
+   loff_t offset;
+   long rc;
+
+   page = ALIGN_DOWN(*ppos, PAGE_SIZE);
+   offset = (*ppos) % PAGE_SIZE;
+
+   rc = htm_get_dump_hardware(nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, virt_to_phys(htm_buf), PAGE_SIZE, 
page);
+
+   switch (rc) {
+   case H_SUCCESS:
+   /* H_PARTIAL for the case where all available data can't be
+* returned due to buffer size constraint.
+*/
+   case H_PARTIAL:
+   break;
+   /* H_NOT_AVAILABLE indicates reading from an offset outside the range,
+* i.e. past end of file.
+*/
+   case H_NOT_AVAILABLE:
+   return 0;
+   case H_BUSY:
+   case H_LONG_BUSY_ORDER_1_MSEC:
+   case H_LONG_BUSY_ORDER_10_MSEC:
+   case H_LONG_BUSY_ORDER_100_MSEC:
+   case H_LONG_BUSY_ORDER_1_SEC:
+   case H_LONG_BUSY_ORDER_10_SEC:
+   case H_LONG_BUSY_ORDER_100_SEC:
+   return -EBUSY;
+   case H_PARAMETER:
+   case H_P2:
+   case H_P3:
+   case H_P4:
+   case H_P5:
+   case H_P6:
+   return -EINVAL;
+   case H_STATE:
+   return -EIO;
+   case H_AUTHORITY:
+   return -EPERM;
+   }
+
+   available = PAGE_SIZE;
+   read_size = min(count, available);
+   *ppos += read_size;
+   return simple_read_from_buffer(ubuf, count, &offset, htm_buf, 
avai

Re: [PATCH v6 2/3] powerpc/pseries: Export hardware trace macro dump via debugfs

2025-01-13 Thread Athira Rajeev



> On 13 Jan 2025, at 4:10 PM, adu...@linux.ibm.com wrote:
> 
> From: Abhishek Dubey 
> 
> This patch adds debugfs interface to export Hardware Trace Macro (HTM)
> function data in a LPAR. New hypervisor call "H_HTM" has been
> defined to setup, configure, control and dump the HTM data.
> This patch supports only dumping of HTM data in a LPAR.
> New debugfs folder called "htmdump" has been added under
> /sys/kernel/debug/arch path which contains files need to
> pass required parameters for the H_HTM dump function. New Kconfig
> option called "CONFIG_HTMDUMP" is added in platform/pseries
> for the same.
> 
> With this module loaded, list of files in debugfs path
> 
> /sys/kernel/debug/powerpc/htmdump
> coreindexonchip  htmtype  nodalchipindex  nodeindex  trace
> 
> Changelog:
>  v5->v6 : Header file inclusion
>  v4->v5 : Removed offset from available calculation, as offset is
>   always zero leading to buffur size reads.
>   Edited comments and commit message
> 
> v3 patch:
>  
> https://lore.kernel.org/linuxppc-dev/20240828085223.42177-2-ma...@linux.ibm.com/

Please move the changelog and the v3 patch reference to below the "---"
separator so they won't end up in the git log.
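
I.e., something like this (a sketch; material after the "---" separator is
dropped by git am and does not land in the commit message):

  Signed-off-by: Abhishek Dubey 
  Co-developed-by: Madhavan Srinivasan 
  Signed-off-by: Madhavan Srinivasan 
  ---
  Changelog:
    v5->v6 : Header file inclusion

  v3 patch: https://lore.kernel.org/linuxppc-dev/20240828085223.42177-2-ma...@linux.ibm.com/

   arch/powerpc/platforms/pseries/Kconfig   |   9 ++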

> 
> Signed-off-by: Abhishek Dubey 
> Co-developed-by: Madhavan Srinivasan 
> Signed-off-by: Madhavan Srinivasan 
> —

Here we can add the changelog.

With that change,
Reviewed-by: Athira Rajeev 

Thanks
Athira
> arch/powerpc/platforms/pseries/Kconfig   |   9 ++
> arch/powerpc/platforms/pseries/Makefile  |   1 +
> arch/powerpc/platforms/pseries/htmdump.c | 121 +++
> 3 files changed, 131 insertions(+)
> create mode 100644 arch/powerpc/platforms/pseries/htmdump.c
> 
> diff --git a/arch/powerpc/platforms/pseries/Kconfig 
> b/arch/powerpc/platforms/pseries/Kconfig
> index 42fc66e97539..b839e87408aa 100644
> --- a/arch/powerpc/platforms/pseries/Kconfig
> +++ b/arch/powerpc/platforms/pseries/Kconfig
> @@ -128,6 +128,15 @@ config CMM
>  will be reused for other LPARs. The interface allows firmware to
>  balance memory across many LPARs.
> 
> +config HTMDUMP
> + tristate "PowerVM data dumper"
> + depends on PPC_PSERIES && DEBUG_FS
> + default m
> + help
> +  Select this option, if you want to enable the kernel debugfs
> +  interface to dump the Hardware Trace Macro (HTM) function data
> +  in the LPAR.
> +
> config HV_PERF_CTRS
> bool "Hypervisor supplied PMU events (24x7 & GPCI)"
> default y
> diff --git a/arch/powerpc/platforms/pseries/Makefile 
> b/arch/powerpc/platforms/pseries/Makefile
> index 7bf506f6b8c8..3f3e3492e436 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
> obj-$(CONFIG_HVCS) += hvcserver.o
> obj-$(CONFIG_HCALL_STATS) += hvCall_inst.o
> obj-$(CONFIG_CMM) += cmm.o
> +obj-$(CONFIG_HTMDUMP) += htmdump.o
> obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
> obj-$(CONFIG_LPARCFG) += lparcfg.o
> obj-$(CONFIG_IBMVIO) += vio.o
> diff --git a/arch/powerpc/platforms/pseries/htmdump.c 
> b/arch/powerpc/platforms/pseries/htmdump.c
> new file mode 100644
> index ..57fc1700f604
> --- /dev/null
> +++ b/arch/powerpc/platforms/pseries/htmdump.c
> @@ -0,0 +1,121 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) IBM Corporation, 2024
> + */
> +
> +#define pr_fmt(fmt) "htmdump: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static void *htm_buf;
> +static u32 nodeindex;
> +static u32 nodalchipindex;
> +static u32 coreindexonchip;
> +static u32 htmtype;
> +static struct dentry *htmdump_debugfs_dir;
> +
> +static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
> + size_t count, loff_t *ppos)
> +{
> + void *htm_buf = filp->private_data;
> + unsigned long page, read_size, available;
> + loff_t offset;
> + long rc;
> +
> + page = ALIGN_DOWN(*ppos, PAGE_SIZE);
> + offset = (*ppos) % PAGE_SIZE;
> +
> + rc = htm_get_dump_hardware(nodeindex, nodalchipindex, coreindexonchip,
> +   htmtype, virt_to_phys(htm_buf), PAGE_SIZE, page);
> +
> + switch (rc) {
> + case H_SUCCESS:
> + /* H_PARTIAL for the case where all available data can't be
> + * returned due to buffer size constraint.
> + */
> + case H_PARTIAL:
> + break;
> + /* H_NOT_AVAILABLE indicates reading from an offset outside the range,
> + * i.e. past end of file.
> + */
> + case H_NOT_AVAILABLE:
> + return 0;
> + case H_BUSY:
> + case H_LONG_BUSY_ORDER_1_MSEC:
> + case H_LONG_BUSY_ORDER_10_MSEC:
> + case H_LONG_BUSY_ORDER_100_MSEC:
> + case H_LONG_BUSY_ORDER_1_SEC:
> + case H_LONG_BUSY_ORDER_10_SEC:
> + case H_LONG_BUSY_ORDER_100_SEC:
> + return -EBUSY;
> + case H_PARAMETER:
> + case H_P2:
> + case H_P3:
> + case H_P4:
> + case H_P5:
> + case H_P6:
> + return -EINVAL;
> + case H_STATE:
> + return -EIO;
> + case H_AUTHORITY:
> + return -EPERM;
> + }
> +
> + available = PAGE_SIZE;
> + read_size = min(count, available);
> + *ppos += read_size;
> + return simple_read_from_buffer(ubuf, coun

Re: [PATCH] fadump: Use str_yes_no() helper in fadump_show_config()

2025-01-13 Thread IBM
Thorsten Blum  writes:

> Remove hard-coded strings by using the str_yes_no() helper function.
>
> Signed-off-by: Thorsten Blum 
> ---
>  arch/powerpc/kernel/fadump.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)


In the fadump.c file we have an implicit include of string_choices.h, i.e.

include/linux/seq_file.h -> linux/string_helpers.h -> linux/string_choices.h

Directly including string_choices.h would be better:
#include <linux/string_choices.h>

However, no hard preference. The patch functionally looks correct to me.

Please feel free to add - 
Reviewed-by: Ritesh Harjani (IBM) 


>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4b371c738213..8c531533dd3e 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -289,10 +289,8 @@ static void __init fadump_show_config(void)
>   if (!fw_dump.fadump_supported)
>   return;
>  
> - pr_debug("Fadump enabled: %s\n",
> - (fw_dump.fadump_enabled ? "yes" : "no"));
> - pr_debug("Dump Active   : %s\n",
> - (fw_dump.dump_active ? "yes" : "no"));
> + pr_debug("Fadump enabled: %s\n", 
> str_yes_no(fw_dump.fadump_enabled));
> + pr_debug("Dump Active   : %s\n", str_yes_no(fw_dump.dump_active));
>   pr_debug("Dump section sizes:\n");
>   pr_debug("CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
>   pr_debug("HPTE region size   : %lx\n", fw_dump.hpte_region_size);
> -- 
> 2.47.1



Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Sean Christopherson
On Sat, Jan 11, 2025, Marc Zyngier wrote:
> On Sat, 11 Jan 2025 01:24:48 +,
> Sean Christopherson  wrote:
> > 
> > Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
> > that KVM_RUN needs to be re-executed prior to save/restore in order to
> > complete the instruction/operation that triggered the userspace exit.
> > 
> > KVM's current approach of adding notes in the Documentation is beyond
> > brittle, e.g. there is at least one known case where a KVM developer added
> > a new userspace exit type, and then that same developer forgot to handle
> > completion when adding userspace support.
> 
> Is this going to fix anything? If they couldn't be bothered to read
> the documentation, let alone update it, how is that going to be
> improved by extra rules and regulations?
> 
> I don't see how someone ignoring the documented behaviour of a given
> exit reason is, all of a sudden, have an epiphany and take a *new*
> flag into account.

The idea is to reduce the probability of introducing bugs, in KVM or userspace,
every time KVM attaches a completion callback.  Yes, userspace would need to be
updated to handle KVM_RUN_NEEDS_COMPLETION, but once that flag is merged, neither
KVM's documentation nor userspace would ever need to be updated again.  And if
all architectures took an approach of handling completion via function callback,
I'm pretty sure we'd never need to manually update KVM itself either.
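
For illustration, a minimal userspace sketch of what handling the flag could
look like (KVM_RUN_NEEDS_COMPLETION is only proposed in this series; vcpu_fd
and the helper name are assumptions):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hedged sketch: flush a pending completion before saving vCPU state. */
static void flush_pending_completion(int vcpu_fd, struct kvm_run *run)
{
	if (run->flags & KVM_RUN_NEEDS_COMPLETION) {
		run->immediate_exit = 1;
		ioctl(vcpu_fd, KVM_RUN, 0);	/* completes the pending op, no guest entry */
		run->immediate_exit = 0;
	}
	/* state fetched via the KVM_GET_* ioctls is now consistent for save/restore */
}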

> > +7.37 KVM_CAP_NEEDS_COMPLETION
> > +-
> > +
> > +:Architectures: all
> > +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> > +
> > +The presence of this capability indicates that KVM_RUN will set
> > +KVM_RUN_NEEDS_COMPLETION in kvm_run.flags if KVM requires userspace to 
> > re-enter
> > +the kernel KVM_RUN to complete the exit.
> > +
> > +For select exits, userspace must re-enter the kernel with KVM_RUN to 
> > complete
> > +the corresponding operation, only after which is guest state guaranteed to 
> > be
> > +consistent.  On such a KVM_RUN, the kernel side will first finish 
> > incomplete
> > +operations and then check for pending signals.
> > +
> > +The pending state of the operation for such exits is not preserved in state
> > +which is visible to userspace, thus userspace should ensure that the 
> > operation
> > +is completed before performing state save/restore, e.g. for live migration.
> > +Userspace can re-enter the guest with an unmasked signal pending or with 
> > the
> > +immediate_exit field set to complete pending operations without allowing 
> > any
> > +further instructions to be executed.
> > +
> > +Without KVM_CAP_NEEDS_COMPLETION, KVM_RUN_NEEDS_COMPLETION will never be 
> > set
> > +and userspace must assume that exits of type KVM_EXIT_IO, KVM_EXIT_MMIO,
> > +KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, KVM_EXIT_EPR, 
> > KVM_EXIT_X86_RDMSR,
> > +KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL require completion.
> 
> So once you advertise KVM_CAP_NEEDS_COMPLETION, the completion flag
> must be present for all of these exits, right? And from what I can
> tell, this capability is unconditionally advertised.
> 
> Yet, you don't amend arm64 to publish that flag. Not that I think this
> causes any issue (even if you save the state at that point without
> reentering the guest, it will be still be consistent), but that
> directly contradicts the documentation (isn't that ironic? ;-).

It does cause issues; I missed this code in kvm_arch_vcpu_ioctl_run():

if (run->exit_reason == KVM_EXIT_MMIO) {
ret = kvm_handle_mmio_return(vcpu);
if (ret <= 0)
return ret;
}

> Or is your intent to *relax* the requirements on arm64 (and anything
> else but x86 and POWER)?



Re: [kvm-unit-tests PATCH v1 5/5] configure: arm64: Make 'max' the default for --processor

2025-01-13 Thread Andrew Jones
On Fri, Jan 10, 2025 at 01:58:48PM +, Alexandru Elisei wrote:
> Newer architecture features are supported by qemu TCG on newer CPUs. When
> writing a test for such architecture features, it is necessary to pass the
> correct -cpu argument to qemu. Make it easier on users and test authors
> alike by making 'max' the default value for --processor. The 'max' CPU
> model contains all the features of the cortex-a57 CPU (the old default), so
> no regression should be possible.
> 
> A side effect is that, by default, the compiler will not receive a -mcpu
> argument for compiling the code. The expectation is that this is fine,
> since support for -mcpu=$PROCESSOR has only been added for arm64 in the
> last commit.
> 
> The default for arm (cortex-a15) has been kept unchanged, because passing
> --processor=max will cause compilation to break. If the user wants the qemu
> CPU model to be 'max', the user will also have to supply a suitable compile
> CPU target via --cflags=-mcpu= configure option.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 46964d36a7d8..3ab0ec208e10 100755
> --- a/configure
> +++ b/configure
> @@ -14,7 +14,7 @@ function get_default_processor()
>  default_processor="cortex-a15"
>  ;;
>  "arm64" | "aarch64")
> -default_processor="cortex-a57"
> +default_processor="max"
>  ;;
>  *)
>  default_processor=$arch
> -- 
> 2.47.1
>

Another reason to introduce a new parameter (qemu_cpu) is that we can also
change arm32 to 'max', reducing divergence between arm32 and arm64.

Thanks,
drew



Re: [kvm-unit-tests PATCH v1 3/5] arm64: Implement the ./configure --processor option

2025-01-13 Thread Andrew Jones
On Fri, Jan 10, 2025 at 01:58:46PM +, Alexandru Elisei wrote:
> The help text for the ./configure --processor option says:
> 
> --processor=PROCESSOR  processor to compile for (cortex-a57)
> 
> but, unlike arm, the build system does not pass a -mcpu argument to the
> compiler. Fix it, and bring arm64 at parity with arm.
> 
> Note that this introduces a regression, which is also present on arm: if
> the --processor argument is something that the compiler doesn't understand,
> but qemu does (like 'max'), then compilation fails. This will be fixed in a
> following patch; another fix is to specify a CPU model that gcc implements
> by using --cflags=-mcpu=.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  arm/Makefile.arm| 1 -
>  arm/Makefile.common | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arm/Makefile.arm b/arm/Makefile.arm
> index 7fd39f3ada64..d6250b7fb686 100644
> --- a/arm/Makefile.arm
> +++ b/arm/Makefile.arm
> @@ -12,7 +12,6 @@ $(error Cannot build arm32 tests as EFI apps)
>  endif
>  
>  CFLAGS += $(machine)
> -CFLAGS += -mcpu=$(PROCESSOR)
>  CFLAGS += -mno-unaligned-access
>  
>  ifeq ($(TARGET),qemu)
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index f828dbe01d33..a5d97bcf477a 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -25,6 +25,7 @@ AUXFLAGS ?= 0x0
>  # stack.o relies on frame pointers.
>  KEEP_FRAME_POINTER := y
>  
> +CFLAGS += -mcpu=$(PROCESSOR)
>  CFLAGS += -std=gnu99
>  CFLAGS += -ffreestanding
>  CFLAGS += -O2
> -- 
> 2.47.1
>

Reviewed-by: Andrew Jones 



Re: [kvm-unit-tests PATCH v1 1/5] configure: Document that the architecture name 'aarch64' is also supported

2025-01-13 Thread Andrew Jones
On Fri, Jan 10, 2025 at 01:58:44PM +, Alexandru Elisei wrote:
> $arch, on arm64, defaults to 'aarch64', and later in the script is replaced
> by 'arm64'. Intentional or not, document that the name 'aarch64' is also
> supported when configuring for the arm64 architecture. This has been the
> case since the initial commit that added support for the arm64
> architecture, commit 39ac3f8494be ("arm64: initial drop").
> 
> The help text for --arch changes from*:
> 
>--arch=ARCHarchitecture to compile for (aarch64). ARCH can be 
> one of:
>arm, arm64, i386, ppc64, riscv32, riscv64, s390x, 
> x86_64
> 
> to:
> 
> --arch=ARCHarchitecture to compile for (aarch64). ARCH can be 
> one of:
>arm, arm64/aarch64, i386, ppc64, riscv32, riscv64, 
> s390x, x86_64
> 
> *Worth pointing out that the default architecture is 'aarch64', even though
> the rest of the help text doesn't have it as one of the supported
> architectures.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 86cf1da36467..5b0a2d7f39c0 100755
> --- a/configure
> +++ b/configure
> @@ -47,7 +47,7 @@ usage() {
>  
>   Options include:
>   --arch=ARCHarchitecture to compile for ($arch). ARCH 
> can be one of:
> -arm, arm64, i386, ppc64, riscv32, riscv64, 
> s390x, x86_64
> +arm, arm64/aarch64, i386, ppc64, riscv32, 
> riscv64, s390x, x86_64
>   --processor=PROCESSOR  processor to compile for ($arch)
>   --target=TARGETtarget platform that the tests will be 
> running on (qemu or
>  kvmtool, default is qemu) (arm/arm64 only)
> -- 
> 2.47.1
>

I'd prefer to support --arch=aarch64, but then always refer to it as only
arm64 everywhere else. We need to support arch=aarch64 since that's what
'uname -m' returns, but I don't think we need to change the help text for
it. If we don't want to trust our users to figure out arm64==aarch64,
then we can do something like

@@ -216,12 +197,12 @@ while [[ $optno -le $argc ]]; do
werror=
;;
--help)
-   usage
+   do_help=1
;;
*)
echo "Unknown option '$opt'"
echo
-   usage
+   do_help=1
;;
 esac
 done

And then only do

 if [ $do_help ]; then
usage
 fi

after $arch and other variables have had a chance to be converted.

Thanks,
drew



Re: [PATCH] tools/perf: Fix segfault during perf record --off-cpu when debuginfo is not enabled

2025-01-13 Thread Arnaldo Carvalho de Melo
On Mon, Jan 06, 2025 at 01:25:32PM -0800, Namhyung Kim wrote:
> On Fri, Dec 27, 2024 at 04:18:32PM +0530, Athira Rajeev wrote:
> > 
> > 
> > > On 23 Dec 2024, at 7:28 PM, Athira Rajeev  
> > > wrote:
> > > 
> > > When kernel is built without debuginfo, running perf record with
> > > --off-cpu results in segfault as below:
> > > 
> > >   ./perf record --off-cpu -e dummy sleep 1
> > >   libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
> > > CONFIG_DEBUG_INFO_BTF enabled?
> > >   libbpf: failed to find '.BTF' ELF section in 
> > > /lib/modules/6.13.0-rc3+/build/vmlinux
> > >   libbpf: failed to find valid kernel BTF
> > >   Segmentation fault (core dumped)
> > > 
> > > The backtrace pointed to:
> > > 
> > >   #0  0x100fb17c in btf.type_cnt ()
> > >   #1  0x100fc1a8 in btf_find_by_name_kind ()
> > >   #2  0x100fc38c in btf.find_by_name_kind ()
> > >   #3  0x102ee3ac in off_cpu_prepare ()
> > >   #4  0x1002f78c in cmd_record ()
> > >   #5  0x100aee78 in run_builtin ()
> > >   #6  0x100af3e4 in handle_internal_command ()
> > >   #7  0x1001004c in main ()
> > > 
> > > Code sequence is:
> > >   static void check_sched_switch_args(void)
> > >   {
> > >struct btf *btf = btf__load_vmlinux_btf();
> > >const struct btf_type *t1, *t2, *t3;
> > >u32 type_id;
> > > 
> > >type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch",
> > > BTF_KIND_TYPEDEF);
> > > 
> > > btf__load_vmlinux_btf fails when CONFIG_DEBUG_INFO_BTF is not enabled.
> > > Here bpf__find_by_name_kind calls btf__type_cnt with NULL btf
> > > value and results in segfault. To fix this, add a check to see if
> > > btf is not NULL before invoking bpf__find_by_name_kind
> > > 
> > > Signed-off-by: Athira Rajeev 
> 
> Reviewed-by: Namhyung Kim 

Thanks, applied to perf-tools-next,

- Arnaldo



Re: [kvm-unit-tests PATCH v1 4/5] arm/arm64: Add support for --processor=max

2025-01-13 Thread Andrew Jones
On Fri, Jan 10, 2025 at 01:58:47PM +, Alexandru Elisei wrote:
> For arm64, newer architecture features are supported only on newer CPUs.
> Instead of expecting the user to know which CPU model supports which
> feature when using the TCG accelerator for qemu, let's make it easier and
> add support for the --processor 'max' value.
> 
> The --processor value is passed to the compiler's -mcpu argument and to
> qemu's -cpu argument. 'max' is a special value that only qemu understands -
> it means that all CPU features that qemu implements are supported by the
> guest CPU, and passing it to the compiler causes a build error. So omit the
> -mcpu argument when $PROCESSOR=max.
> 
> This affects only the TCG accelerator; when using KVM or HVF,
> kvm-unit-tests sets the cpu model to 'host'.
> 
> Note that using --processor=max with a 32 bit compiler will cause a build
> error: the CPU model that the compiler defaults to when the -mcpu argument
> is missing lacks support for some of the instructions that kvm-unit-tests
> uses. The solution in the case is to specify a CPU model for the compiler
> using --cflags:
> 
>   ./configure --arch=arm --processor=max --cflags=-mcpu=
> 
> This patch doesn't introduce a regression for arm when --processor=max is
> used, it's only the error that changes: from an unknown processor type to
> using instructions that are not available on the processor.
> 
> Signed-off-by: Alexandru Elisei 
> ---
>  arm/Makefile.common | 2 ++
>  configure   | 5 -
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/Makefile.common b/arm/Makefile.common
> index a5d97bcf477a..b757250dc9ae 100644
> --- a/arm/Makefile.common
> +++ b/arm/Makefile.common
> @@ -25,7 +25,9 @@ AUXFLAGS ?= 0x0
>  # stack.o relies on frame pointers.
>  KEEP_FRAME_POINTER := y
>  
> +ifneq ($(PROCESSOR),max)
>  CFLAGS += -mcpu=$(PROCESSOR)
> +endif
>  CFLAGS += -std=gnu99
>  CFLAGS += -ffreestanding
>  CFLAGS += -O2
> diff --git a/configure b/configure
> index 138840c3f76d..46964d36a7d8 100755
> --- a/configure
> +++ b/configure
> @@ -67,7 +67,10 @@ usage() {
>   Options include:
>   --arch=ARCHarchitecture to compile for ($arch). ARCH 
> can be one of:
>  arm, arm64/aarch64, i386, ppc64, riscv32, 
> riscv64, s390x, x86_64
> - --processor=PROCESSOR  processor to compile for ($default_processor)
> + --processor=PROCESSOR  processor to compile for 
> ($default_processor). For arm and arm64, the
> +value 'max' is special and it will be passed 
> directly to
> +qemu, bypassing the compiler. In this case, 
> --cflags can be
> +used to compile for a specific processor.
>   --target=TARGETtarget platform that the tests will be 
> running on (qemu or
>  kvmtool, default is qemu) (arm/arm64 only)
>   --cross-prefix=PREFIX  cross compiler prefix
> -- 
> 2.47.1
>

I don't think we want to overload processor this way. While -mcpu and QEMU
could both understand the same cpu names, it was mostly fine
(although it probably shouldn't have been overloaded before either). Now
that we want one name for compiling and another for running, I think
we need another configure parameter, something like --qemu-cpu.
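
A hedged sketch of what that split could look like (--qemu-cpu is only the
suggestion above, not an existing option):

  # compile for a known CPU, but let QEMU TCG expose every feature it implements
  ./configure --arch=arm64 --processor=cortex-a57 --qemu-cpu=max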

Thanks,
drew

> 
> -- 
> kvm-riscv mailing list
> kvm-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kvm-riscv



Re: [PATCH 2/2] PCI: dwc: layerscape: Use syscon_regmap_lookup_by_phandle_args

2025-01-13 Thread Frank Li
On Sun, Jan 12, 2025 at 02:39:03PM +0100, Krzysztof Kozlowski wrote:
> Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over
> syscon_regmap_lookup_by_phandle() combined with getting the syscon
> argument.  Except simpler code this annotates within one line that given
> phandle has arguments, so grepping for code would be easier.
>
> Signed-off-by: Krzysztof Kozlowski 

Reviewed-by: Frank Li 

> ---
>  drivers/pci/controller/dwc/pci-layerscape.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index 
> ee6f5256813374bdf656bef4f9b96e1b8760d1b5..239a05b36e8e6291b195f1253289af79f4a86d36
>  100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -329,7 +329,6 @@ static int ls_pcie_probe(struct platform_device *pdev)
>   struct ls_pcie *pcie;
>   struct resource *dbi_base;
>   u32 index[2];
> - int ret;
>
>   pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
>   if (!pcie)
> @@ -355,16 +354,15 @@ static int ls_pcie_probe(struct platform_device *pdev)
>   pcie->pf_lut_base = pci->dbi_base + pcie->drvdata->pf_lut_off;
>
>   if (pcie->drvdata->scfg_support) {
> - pcie->scfg = syscon_regmap_lookup_by_phandle(dev->of_node, 
> "fsl,pcie-scfg");
> + pcie->scfg =
> + syscon_regmap_lookup_by_phandle_args(dev->of_node,
> +  "fsl,pcie-scfg", 2,
> +  index);
>   if (IS_ERR(pcie->scfg)) {
>   dev_err(dev, "No syscfg phandle specified\n");
>   return PTR_ERR(pcie->scfg);
>   }
>
> - ret = of_property_read_u32_array(dev->of_node, "fsl,pcie-scfg", 
> index, 2);
> - if (ret)
> - return ret;
> -
>   pcie->index = index[1];
>   }
>
>
> --
> 2.43.0
>



Re: [kvm-unit-tests PATCH v1 0/5] arm64: Change the default --processor to max

2025-01-13 Thread Vladimir Murzin
On 1/10/25 13:58, Alexandru Elisei wrote:
> (CC'ing everyone from MAINTAINERS because I'm touching configure)
> 
> Vladimir sent a test for MTE [1], which didn't work on the default -cpu
> model, cortex-a57, because that CPU didn't implement MTE. There were two
> options to get it working:
> 
> 1. Add -cpu max to the extra_params unittest parameter.
> 2. Make the default value for the configure --processor option 'max'.
> 
> We decided that the second option was preferable, so here it is.
> 
> The first patch might look unrelated, but when I was writing the function
> to select the default processor based on the architecture I noticed that
> for arm64, $arch is first equal to aarch64, then it gets changed to arm64.
> My first instinct was to have it be arm64 from the start, but then I
> realized that, despite the help text, --arch=aarch64 has been supported
> ever since arm64 was added to kvm-unit-tests. So I decided that it might be
> more prudent to go with it and document it.
> 
> [1] 
> https://lore.kernel.org/all/20241212103447.34593-1-vladimir.mur...@arm.com/
> 

Thanks Alex! That removes the extra hassle of setting up -cpu to match the required
feature. My MTE test continues working fine and requires one less configuration
option - an undeniable improvement in user experience!

FWIW:
Tested-by: Vladimir Murzin  # arm64

Vladimir

> Alexandru Elisei (5):
>   configure: Document that the architecture name 'aarch64' is also
> supported
>   configure: Display the default processor for arm and arm64
>   arm64: Implement the ./configure --processor option
>   arm/arm64: Add support for --processor=max
>   configure: arm64: Make 'max' the default for --processor
> 
>  arm/Makefile.arm|  1 -
>  arm/Makefile.common |  3 +++
>  configure   | 35 ++-
>  3 files changed, 29 insertions(+), 10 deletions(-)
> 
> 
> base-commit: 0ed2cdf3c80ee803b9150898e687e77e4d6f5db2
> -- 2.47.1
> 




Re: [PATCH V2] tools/perf/tests/base_probe: Fix check for the count of existing probes in test_adding_kernel

2025-01-13 Thread Veronika Molnarova



On 1/10/25 10:43, Athira Rajeev wrote:
> perftool-testsuite_probe fails in test_adding_kernel as below:
>   Regexp not found: "probe:inode_permission_11"
>   -- [ FAIL ] -- perf_probe :: test_adding_kernel :: force-adding probes 
> ::
>   second probe adding (with force) (output regexp parsing)
>   event syntax error: 'probe:inode_permission_11'
> \___ unknown tracepoint
> 
>   Error:  File /sys/kernel/tracing//events/probe/inode_permission_11
>   not found.
>   Hint:   Perhaps this kernel misses some CONFIG_ setting to
>   enable this feature?.
> 
> The test does the following:
> 1) Adds a probe point first using :
> $CMD_PERF probe --add $TEST_PROBE
> 2) Then tries to add same probe again without —force
> and expects it to fail. Next tries to add same probe again
> with —force. In this case, perf probe succeeds and adds
> the probe with a suffix number. Example:
> 
>  ./perf probe --add inode_permission
>  Added new event:
>   probe:inode_permission (on inode_permission)
> 
>  ./perf probe --add inode_permission --force
>  Added new event:
>   probe:inode_permission_1 (on inode_permission)
> 
>   ./perf probe --add inode_permission --force
>  Added new event:
>   probe:inode_permission_2 (on inode_permission)
> 
> Each time, suffix is added to existing probe name.
> To get the suffix number, test cases uses :
> NO_OF_PROBES=`$CMD_PERF probe -l | wc -l`
> 
> This will work if there is no other probe existing
> in the system. If there are any other probes other than
> kernel probes or inode_permission, ( example: any probe),
> "perf probe -l" will include count for other probes too.
> 
> Example, in the system where this failed, already some
> probes were default added. So count became 10
>   ./perf probe -l | wc -l
>   10
> 
> So to be specific for "inode_permission", restrict the
> probe count check to that probe point alone using :
> NO_OF_PROBES=`$CMD_PERF probe -l $TEST_PROBE| wc -l`
> 
> Similarly while removing the probe using "probe --del *",
> ( removing all probes ), check uses:
> 
>  ../common/check_all_lines_matched.pl "Removed event: probe:$TEST_PROBE"
> 
> But if there are other probes in the system, the log will
> contain reference to other existing probe too. Hence change
> usage of check_all_lines_matched.pl to check_all_patterns_found.pl
> This will make sure expecting string comes in the result
> 
> Signed-off-by: Athira Rajeev 

Acked-by: Veronika Molnarova 

Thanks,
Veronika

> ---
> Changelog:
>  v1 -> v2:
>  No code changes. After being reviewed by Michael Petlan, since
>  initial patch was posted in 2024-10-14, rebased on top of latest
>  perf-tools-next
> 
>  tools/perf/tests/shell/base_probe/test_adding_kernel.sh | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/base_probe/test_adding_kernel.sh 
> b/tools/perf/tests/shell/base_probe/test_adding_kernel.sh
> index d541ffd44a93..f8b5f096d0d7 100755
> --- a/tools/perf/tests/shell/base_probe/test_adding_kernel.sh
> +++ b/tools/perf/tests/shell/base_probe/test_adding_kernel.sh
> @@ -169,7 +169,7 @@ print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE 
> "force-adding probes :: second pr
>  (( TEST_RESULT += $? ))
>  
>  # adding existing probe with '--force' should pass
> -NO_OF_PROBES=`$CMD_PERF probe -l | wc -l`
> +NO_OF_PROBES=`$CMD_PERF probe -l $TEST_PROBE| wc -l`
>  $CMD_PERF probe --force --add $TEST_PROBE 2> 
> $LOGS_DIR/adding_kernel_forceadd_03.err
>  PERF_EXIT_CODE=$?
>  
> @@ -205,7 +205,7 @@ print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "using 
> doubled probe"
>  $CMD_PERF probe --del \* 2> $LOGS_DIR/adding_kernel_removing_wildcard.err
>  PERF_EXIT_CODE=$?
>  
> -../common/check_all_lines_matched.pl "Removed event: probe:$TEST_PROBE" 
> "Removed event: probe:${TEST_PROBE}_1" < 
> $LOGS_DIR/adding_kernel_removing_wildcard.err
> +../common/check_all_patterns_found.pl "Removed event: probe:$TEST_PROBE" 
> "Removed event: probe:${TEST_PROBE}_1" < 
> $LOGS_DIR/adding_kernel_removing_wildcard.err
>  CHECK_EXIT_CODE=$?
>  
>  print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "removing multiple probes"




Re: [PATCH] fadump: Use str_yes_no() helper in fadump_show_config()

2025-01-13 Thread Sourabh Jain





On 31/12/24 03:11, Thorsten Blum wrote:

Remove hard-coded strings by using the str_yes_no() helper function.

Signed-off-by: Thorsten Blum 
---
  arch/powerpc/kernel/fadump.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4b371c738213..8c531533dd3e 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -289,10 +289,8 @@ static void __init fadump_show_config(void)
if (!fw_dump.fadump_supported)
return;
  
-	pr_debug("Fadump enabled: %s\n",
-   (fw_dump.fadump_enabled ? "yes" : "no"));
-   pr_debug("Dump Active   : %s\n",
-   (fw_dump.dump_active ? "yes" : "no"));
+   pr_debug("Fadump enabled: %s\n", 
str_yes_no(fw_dump.fadump_enabled));
+   pr_debug("Dump Active   : %s\n", str_yes_no(fw_dump.dump_active));
pr_debug("Dump section sizes:\n");
pr_debug("CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
pr_debug("HPTE region size   : %lx\n", fw_dump.hpte_region_size);


Yes, it is better to use `str_yes_no()` instead of hard-coded strings.

I have also tested your patch, and everything is working fine.

Reviewed-by: Sourabh Jain 

Thanks,
Sourabh Jain



Re: [PATCH v6 22/26] device/dax: Properly refcount device dax pages when mapping

2025-01-13 Thread Dan Williams
Alistair Popple wrote:
> Device DAX pages are currently not reference counted when mapped,
> instead relying on the devmap PTE bit to ensure mapping code will not
> get/put references. This requires special handling in various page
> table walkers, particularly GUP, to manage references on the
> underlying pgmap to ensure the pages remain valid.
> 
> However there is no reason these pages can't be refcounted properly at
> map time. Doing so eliminates the need for the devmap PTE bit,
> freeing up a precious PTE bit. It also simplifies GUP as it no longer
> needs to manage the special pgmap references and can instead just
> treat the pages normally as defined by vm_normal_page().
> 
> Signed-off-by: Alistair Popple 
> ---
>  drivers/dax/device.c | 15 +--
>  mm/memremap.c| 13 ++---
>  2 files changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 6d74e62..fd22dbf 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -126,11 +126,12 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax 
> *dev_dax,
>   return VM_FAULT_SIGBUS;
>   }
>  
> - pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
> + pfn = phys_to_pfn_t(phys, 0);
>  
>   dax_set_mapping(vmf, pfn, fault_size);
>  
> - return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
> + return vmf_insert_page_mkwrite(vmf, pfn_t_to_page(pfn),
> + vmf->flags & FAULT_FLAG_WRITE);
>  }
>  
>  static vm_fault_t __dev_dax_pmd_fault(struct dev_dax *dev_dax,
> @@ -169,11 +170,12 @@ static vm_fault_t __dev_dax_pmd_fault(struct dev_dax 
> *dev_dax,
>   return VM_FAULT_SIGBUS;
>   }
>  
> - pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
> + pfn = phys_to_pfn_t(phys, 0);
>  
>   dax_set_mapping(vmf, pfn, fault_size);
>  
> - return vmf_insert_pfn_pmd(vmf, pfn, vmf->flags & FAULT_FLAG_WRITE);
> + return vmf_insert_folio_pmd(vmf, page_folio(pfn_t_to_page(pfn)),
> + vmf->flags & FAULT_FLAG_WRITE);

This looks suspect without initializing the compound page metadata.

This might be getting compound pages by default with
CONFIG_ARCH_WANT_OPTIMIZE_DAX_VMEMMAP. The device-dax unit tests are ok
so far, but that is not super comforting until I can think about this a
bit more... but not tonight.

Might as well fix up device-dax refcounts in this series too, but I
won't ask you to do that, will send you something to include.



Re: [PATCH V2] tools/perf/tests/base_probe: Fix check for the count of existing probes in test_adding_kernel

2025-01-13 Thread Athira Rajeev



> On 13 Jan 2025, at 8:36 PM, Arnaldo Carvalho de Melo  wrote:
> 
> On Mon, Jan 13, 2025 at 11:21:24AM +0100, Veronika Molnarova wrote:
>> On 1/10/25 10:43, Athira Rajeev wrote:
>>> But if there are other probes in the system, the log will
>>> contain reference to other existing probe too. Hence change
>>> usage of check_all_lines_matched.pl to check_all_patterns_found.pl
>>> This will make sure expecting string comes in the result
>>> 
>>> Signed-off-by: Athira Rajeev 
>> 
>> Acked-by: Veronika Molnarova 
> 
> Thanks, applied to perf-tools-next,
> 
> - Arnaldo
Thanks Veronika for the ack and thanks Arnaldo for pulling in the patch

Thanks
Athira


Re: [PATCH] tools/perf: Fix segfault during perf record --off-cpu when debuginfo is not enabled

2025-01-13 Thread Athira Rajeev



> On 13 Jan 2025, at 8:59 PM, Arnaldo Carvalho de Melo  wrote:
> 
> On Mon, Jan 06, 2025 at 01:25:32PM -0800, Namhyung Kim wrote:
>> On Fri, Dec 27, 2024 at 04:18:32PM +0530, Athira Rajeev wrote:
>>> 
>>> 
 On 23 Dec 2024, at 7:28 PM, Athira Rajeev  
 wrote:
 
 When kernel is built without debuginfo, running perf record with
 --off-cpu results in segfault as below:
 
  ./perf record --off-cpu -e dummy sleep 1
  libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was 
 CONFIG_DEBUG_INFO_BTF enabled?
  libbpf: failed to find '.BTF' ELF section in 
 /lib/modules/6.13.0-rc3+/build/vmlinux
  libbpf: failed to find valid kernel BTF
  Segmentation fault (core dumped)
 
 The backtrace pointed to:
 
  #0  0x100fb17c in btf.type_cnt ()
  #1  0x100fc1a8 in btf_find_by_name_kind ()
  #2  0x100fc38c in btf.find_by_name_kind ()
  #3  0x102ee3ac in off_cpu_prepare ()
  #4  0x1002f78c in cmd_record ()
  #5  0x100aee78 in run_builtin ()
  #6  0x100af3e4 in handle_internal_command ()
  #7  0x1001004c in main ()
 
 Code sequence is:
  static void check_sched_switch_args(void)
  {
   struct btf *btf = btf__load_vmlinux_btf();
   const struct btf_type *t1, *t2, *t3;
   u32 type_id;
 
   type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch",
BTF_KIND_TYPEDEF);
 
 btf__load_vmlinux_btf fails when CONFIG_DEBUG_INFO_BTF is not enabled.
 Here bpf__find_by_name_kind calls btf__type_cnt with NULL btf
 value and results in segfault. To fix this, add a check to see if
 btf is not NULL before invoking bpf__find_by_name_kind
 
 Signed-off-by: Athira Rajeev 
>> 
>> Reviewed-by: Namhyung Kim 
> 
> Thanks, applied to perf-tools-next,
> 
> - Arnaldo

Thanks Namhyung and Arnaldo 

Athira.





[RESEND v4 1/3] mm/pkey: Add PKEY_UNRESTRICTED macro

2025-01-13 Thread Yury Khrustalev
The memory protection keys (pkeys) uapi has two macros for pkey restrictions:

 - PKEY_DISABLE_ACCESS 0x1
 - PKEY_DISABLE_WRITE  0x2

with an implicit literal value of 0x0 that means "unrestricted". Code that
works with pkeys has to use this literal value when implying that a pkey
imposes no restrictions. This may reduce readability because 0 can be
written in various ways (e.g. 0x0 or 0) and also because 0 in the context
of pkeys can be mistaken for "no permissions" (akin to PROT_NONE) while it
actually means "no restrictions". This is important because pkeys are
oftentimes used near mprotect(), which uses the PROT_ macros.

This patch adds PKEY_UNRESTRICTED macro defined as 0x0.
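
As a quick illustration (a hedged userspace sketch, not part of the patch; it
assumes glibc's pkey_* wrappers and defines the new macro locally until the
uapi header change lands):

#define _GNU_SOURCE
#include <sys/mman.h>

#ifndef PKEY_UNRESTRICTED
#define PKEY_UNRESTRICTED 0x0	/* the macro this patch adds */
#endif

int main(void)
{
	int pkey = pkey_alloc(0, PKEY_UNRESTRICTED);	/* "no restrictions", not "no permissions" */

	if (pkey < 0)
		return 1;				/* pkeys not supported */
	pkey_set(pkey, PKEY_DISABLE_WRITE);		/* later: disallow writes through this key */
	pkey_free(pkey);
	return 0;
}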

Signed-off-by: Yury Khrustalev 
Acked-by: Dave Hansen 
---
 include/uapi/asm-generic/mman-common.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/asm-generic/mman-common.h 
b/include/uapi/asm-generic/mman-common.h
index 1ea2c4c33b86..ef1c27fa3c57 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -85,6 +85,7 @@
 /* compatibility flags */
 #define MAP_FILE   0
 
+#define PKEY_UNRESTRICTED  0x0
 #define PKEY_DISABLE_ACCESS0x1
 #define PKEY_DISABLE_WRITE 0x2
 #define PKEY_ACCESS_MASK   (PKEY_DISABLE_ACCESS |\
-- 
2.39.5




[RESEND v4 0/3] mm/pkey: Add PKEY_UNRESTRICTED macro

2025-01-13 Thread Yury Khrustalev
Add PKEY_UNRESTRICTED macro to mman.h and use it in selftests.

For context, this change will also allow for more consistent update of the
Glibc manual which in turn will help with introducing memory protection
keys on AArch64 targets.

Applies to 5bc55a333a2f (tag: v6.13-rc7).

Note that I couldn't build the ppc tests, so I would appreciate it if someone
could check the 3rd patch. Thank you!

Signed-off-by: Yury Khrustalev 

---
Changes in v4:
 - Removed change to tools/include/uapi/asm-generic/mman-common.h as it is not
   necessary.

Link to v3: 
https://lore.kernel.org/all/20241028090715.509527-1-yury.khrusta...@arm.com/

Changes in v3:
 - Replaced previously missed 0-s tools/testing/selftests/mm/mseal_test.c
 - Replaced previously missed 0-s in tools/testing/selftests/mm/mseal_test.c

Link to v2: 
https://lore.kernel.org/linux-arch/20241027170006.464252-2-yury.khrusta...@arm.com/

Changes in v2:
 - Update tools/include/uapi/asm-generic/mman-common.h as well
 - Add usages of the new macro to selftests.

Link to v1: 
https://lore.kernel.org/linux-arch/20241022120128.359652-1-yury.khrusta...@arm.com/

---

Yury Khrustalev (3):
  mm/pkey: Add PKEY_UNRESTRICTED macro
  selftests/mm: Use PKEY_UNRESTRICTED macro
  selftests/powerpc: Use PKEY_UNRESTRICTED macro

 include/uapi/asm-generic/mman-common.h   | 1 +
 tools/testing/selftests/mm/mseal_test.c  | 6 +++---
 tools/testing/selftests/mm/pkey-helpers.h| 3 ++-
 tools/testing/selftests/mm/pkey_sighandler_tests.c   | 4 ++--
 tools/testing/selftests/mm/protection_keys.c | 2 +-
 tools/testing/selftests/powerpc/include/pkeys.h  | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c  | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_siginfo.c| 2 +-
 tools/testing/selftests/powerpc/ptrace/core-pkey.c   | 6 +++---
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 6 +++---
 10 files changed, 18 insertions(+), 16 deletions(-)

-- 
2.39.5




[RESEND v4 3/3] selftests/powerpc: Use PKEY_UNRESTRICTED macro

2025-01-13 Thread Yury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions
are used in mm selftests for memory protection keys for ppc target.

Signed-off-by: Yury Khrustalev 
Suggested-by: Kevin Brodsky 
Reviewed-by: Kevin Brodsky 

---
Note that I couldn't build these tests so I would appreciate if someone
could check this patch. Thank you!
---
 tools/testing/selftests/powerpc/include/pkeys.h  | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_exec_prot.c  | 2 +-
 tools/testing/selftests/powerpc/mm/pkey_siginfo.c| 2 +-
 tools/testing/selftests/powerpc/ptrace/core-pkey.c   | 6 +++---
 tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c | 6 +++---
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/powerpc/include/pkeys.h 
b/tools/testing/selftests/powerpc/include/pkeys.h
index 51729d9a7111..430cb4bd7472 100644
--- a/tools/testing/selftests/powerpc/include/pkeys.h
+++ b/tools/testing/selftests/powerpc/include/pkeys.h
@@ -85,7 +85,7 @@ int pkeys_unsupported(void)
SKIP_IF(!hash_mmu);
 
/* Check if the system call is supported */
-   pkey = sys_pkey_alloc(0, 0);
+   pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
SKIP_IF(pkey < 0);
sys_pkey_free(pkey);
 
diff --git a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c 
b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
index 0af4f02669a1..29b91b7456eb 100644
--- a/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
+++ b/tools/testing/selftests/powerpc/mm/pkey_exec_prot.c
@@ -72,7 +72,7 @@ static void segv_handler(int signum, siginfo_t *sinfo, void 
*ctx)
 
switch (fault_type) {
case PKEY_DISABLE_ACCESS:
-   pkey_set_rights(fault_pkey, 0);
+   pkey_set_rights(fault_pkey, PKEY_UNRESTRICTED);
break;
case PKEY_DISABLE_EXECUTE:
/*
diff --git a/tools/testing/selftests/powerpc/mm/pkey_siginfo.c 
b/tools/testing/selftests/powerpc/mm/pkey_siginfo.c
index 2db76e56d4cb..e89a164c686b 100644
--- a/tools/testing/selftests/powerpc/mm/pkey_siginfo.c
+++ b/tools/testing/selftests/powerpc/mm/pkey_siginfo.c
@@ -83,7 +83,7 @@ static void segv_handler(int signum, siginfo_t *sinfo, void 
*ctx)
mprotect(pgstart, pgsize, PROT_EXEC))
_exit(1);
else
-   pkey_set_rights(pkey, 0);
+   pkey_set_rights(pkey, PKEY_UNRESTRICTED);
 
fault_count++;
 }
diff --git a/tools/testing/selftests/powerpc/ptrace/core-pkey.c 
b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
index f6da4cb30cd6..64c985445cb7 100644
--- a/tools/testing/selftests/powerpc/ptrace/core-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/core-pkey.c
@@ -124,16 +124,16 @@ static int child(struct shared_info *info)
/* Get some pkeys so that we can change their bits in the AMR. */
pkey1 = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
if (pkey1 < 0) {
-   pkey1 = sys_pkey_alloc(0, 0);
+   pkey1 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
FAIL_IF(pkey1 < 0);
 
disable_execute = false;
}
 
-   pkey2 = sys_pkey_alloc(0, 0);
+   pkey2 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
FAIL_IF(pkey2 < 0);
 
-   pkey3 = sys_pkey_alloc(0, 0);
+   pkey3 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
FAIL_IF(pkey3 < 0);
 
info->amr |= 3ul << pkeyshift(pkey1) | 2ul << pkeyshift(pkey2);
diff --git a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c 
b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
index d89474377f11..37794f82ed66 100644
--- a/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
+++ b/tools/testing/selftests/powerpc/ptrace/ptrace-pkey.c
@@ -81,16 +81,16 @@ static int child(struct shared_info *info)
/* Get some pkeys so that we can change their bits in the AMR. */
pkey1 = sys_pkey_alloc(0, PKEY_DISABLE_EXECUTE);
if (pkey1 < 0) {
-   pkey1 = sys_pkey_alloc(0, 0);
+   pkey1 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
CHILD_FAIL_IF(pkey1 < 0, &info->child_sync);
 
disable_execute = false;
}
 
-   pkey2 = sys_pkey_alloc(0, 0);
+   pkey2 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
CHILD_FAIL_IF(pkey2 < 0, &info->child_sync);
 
-   pkey3 = sys_pkey_alloc(0, 0);
+   pkey3 = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
CHILD_FAIL_IF(pkey3 < 0, &info->child_sync);
 
info->amr1 |= 3ul << pkeyshift(pkey1);
-- 
2.39.5




[RESEND v4 2/3] selftests/mm: Use PKEY_UNRESTRICTED macro

2025-01-13 Thread Yury Khrustalev
Replace literal 0 with macro PKEY_UNRESTRICTED where pkey_*() functions
are used in mm selftests for memory protection keys.

Signed-off-by: Yury Khrustalev 
Suggested-by: Joey Gouly 
Acked-by: Dave Hansen 
---
 tools/testing/selftests/mm/mseal_test.c| 6 +++---
 tools/testing/selftests/mm/pkey-helpers.h  | 3 ++-
 tools/testing/selftests/mm/pkey_sighandler_tests.c | 4 ++--
 tools/testing/selftests/mm/protection_keys.c   | 2 +-
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/mm/mseal_test.c 
b/tools/testing/selftests/mm/mseal_test.c
index 01675c412b2a..30ea37e8ecf8 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -218,7 +218,7 @@ bool seal_support(void)
 bool pkey_supported(void)
 {
 #if defined(__i386__) || defined(__x86_64__) /* arch */
-   int pkey = sys_pkey_alloc(0, 0);
+   int pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
 
if (pkey > 0)
return true;
@@ -1671,7 +1671,7 @@ static void test_seal_discard_ro_anon_on_pkey(bool seal)
setup_single_address_rw(size, &ptr);
FAIL_TEST_IF_FALSE(ptr != (void *)-1);
 
-   pkey = sys_pkey_alloc(0, 0);
+   pkey = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
FAIL_TEST_IF_FALSE(pkey > 0);
 
ret = sys_mprotect_pkey((void *)ptr, size, PROT_READ | PROT_WRITE, 
pkey);
@@ -1683,7 +1683,7 @@ static void test_seal_discard_ro_anon_on_pkey(bool seal)
}
 
/* sealing doesn't take effect if PKRU allow write. */
-   set_pkey(pkey, 0);
+   set_pkey(pkey, PKEY_UNRESTRICTED);
ret = sys_madvise(ptr, size, MADV_DONTNEED);
FAIL_TEST_IF_FALSE(!ret);
 
diff --git a/tools/testing/selftests/mm/pkey-helpers.h 
b/tools/testing/selftests/mm/pkey-helpers.h
index f7cfe163b0ff..10fa3ca9b05b 100644
--- a/tools/testing/selftests/mm/pkey-helpers.h
+++ b/tools/testing/selftests/mm/pkey-helpers.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../kselftest.h"
 
@@ -224,7 +225,7 @@ static inline u32 *siginfo_get_pkey_ptr(siginfo_t *si)
 static inline int kernel_has_pkeys(void)
 {
/* try allocating a key and see if it succeeds */
-   int ret = sys_pkey_alloc(0, 0);
+   int ret = sys_pkey_alloc(0, PKEY_UNRESTRICTED);
if (ret <= 0) {
return 0;
}
diff --git a/tools/testing/selftests/mm/pkey_sighandler_tests.c 
b/tools/testing/selftests/mm/pkey_sighandler_tests.c
index c593a426341c..2015ed7e0928 100644
--- a/tools/testing/selftests/mm/pkey_sighandler_tests.c
+++ b/tools/testing/selftests/mm/pkey_sighandler_tests.c
@@ -314,7 +314,7 @@ static void 
test_sigsegv_handler_with_different_pkey_for_stack(void)
__write_pkey_reg(pkey_reg);
 
/* Protect the new stack with MPK 1 */
-   pkey = pkey_alloc(0, 0);
+   pkey = pkey_alloc(0, PKEY_UNRESTRICTED);
pkey_mprotect(stack, STACK_SIZE, PROT_READ | PROT_WRITE, pkey);
 
/* Set up alternate signal stack that will use the default MPK */
@@ -487,7 +487,7 @@ static void test_pkru_sigreturn(void)
__write_pkey_reg(pkey_reg);
 
/* Protect the stack with MPK 2 */
-   pkey = pkey_alloc(0, 0);
+   pkey = pkey_alloc(0, PKEY_UNRESTRICTED);
pkey_mprotect(stack, STACK_SIZE, PROT_READ | PROT_WRITE, pkey);
 
/* Set up alternate signal stack that will use the default MPK */
diff --git a/tools/testing/selftests/mm/protection_keys.c 
b/tools/testing/selftests/mm/protection_keys.c
index 4990f7ab4cb7..cca7435a7bc5 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -491,7 +491,7 @@ int sys_pkey_alloc(unsigned long flags, unsigned long 
init_val)
 int alloc_pkey(void)
 {
int ret;
-   unsigned long init_val = 0x0;
+   unsigned long init_val = PKEY_UNRESTRICTED;
 
dprintf1("%s()::%d, pkey_reg: 0x%016llx shadow: %016llx\n",
__func__, __LINE__, __read_pkey_reg(), shadow_pkey_reg);
-- 
2.39.5




Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Sean Christopherson
On Mon, Jan 13, 2025, Marc Zyngier wrote:
> On Mon, 13 Jan 2025 15:44:28 +,
> Sean Christopherson  wrote:
> > 
> > On Sat, Jan 11, 2025, Marc Zyngier wrote:
> > > On Sat, 11 Jan 2025 01:24:48 +,
> > > Sean Christopherson  wrote:
> > > > 
> > > > Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to 
> > > > userspace
> > > > that KVM_RUN needs to be re-executed prior to save/restore in order to
> > > > complete the instruction/operation that triggered the userspace exit.
> > > > 
> > > > KVM's current approach of adding notes in the Documentation is beyond
> > > > brittle, e.g. there is at least one known case where a KVM developer 
> > > > added
> > > > a new userspace exit type, and then that same developer forgot to handle
> > > > completion when adding userspace support.
> > > 
> > > Is this going to fix anything? If they couldn't be bothered to read
> > > the documentation, let alone update it, how is that going to be
> > > improved by extra rules and regulations?
> > > 
> > > I don't see how someone ignoring the documented behaviour of a given
> > > exit reason is, all of a sudden, have an epiphany and take a *new*
> > > flag into account.
> > 
> > The idea is to reduce the probability of introducing bugs, in KVM or 
> > userspace,
> > every time KVM attaches a completion callback.  Yes, userspace would need 
> > to be
> > updated to handle KVM_RUN_NEEDS_COMPLETION, but once that flag is merged, 
> > neither
> > KVM's documentation nor userspace would never need to be updated again.  
> > And if
> > all architectures took an approach of handling completion via function 
> > callback,
> > I'm pretty sure we'd never need to manually update KVM itself either.
> 
> You are assuming that we need this completion, and I dispute this
> assertion.

Ah, gotcha.

> > > > +The pending state of the operation for such exits is not preserved in 
> > > > state
> > > > +which is visible to userspace, thus userspace should ensure that the 
> > > > operation
> > > > +is completed before performing state save/restore, e.g. for live 
> > > > migration.
> > > > +Userspace can re-enter the guest with an unmasked signal pending or 
> > > > with the
> > > > +immediate_exit field set to complete pending operations without 
> > > > allowing any
> > > > +further instructions to be executed.
> > > > +
> > > > +Without KVM_CAP_NEEDS_COMPLETION, KVM_RUN_NEEDS_COMPLETION will never 
> > > > be set
> > > > +and userspace must assume that exits of type KVM_EXIT_IO, 
> > > > KVM_EXIT_MMIO,
> > > > +KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, KVM_EXIT_EPR, 
> > > > KVM_EXIT_X86_RDMSR,
> > > > +KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL require completion.
> > > 
> > > So once you advertise KVM_CAP_NEEDS_COMPLETION, the completion flag
> > > must be present for all of these exits, right? And from what I can
> > > tell, this capability is unconditionally advertised.
> > > 
> > > Yet, you don't amend arm64 to publish that flag. Not that I think this
> > > causes any issue (even if you save the state at that point without
> > > reentering the guest, it will be still be consistent), but that
> > > directly contradicts the documentation (isn't that ironic? ;-).
> > 
> > It does cause issues, I missed this code in kvm_arch_vcpu_ioctl_run():
> > 
> > if (run->exit_reason == KVM_EXIT_MMIO) {
> > ret = kvm_handle_mmio_return(vcpu);
> > if (ret <= 0)
> > return ret;
> > }
> 
> That's satisfying a load from the guest forwarded to userspace.

And MMIO stores, no?  I.e. PC needs to be incremented on stores as well.

> If the VMM did a save of the guest at this stage, restored and resumed it,
> *nothing* bad would happen, as PC still points to the instruction that got
> forwarded. You'll see the same load again.

But replaying an MMIO store could cause all kinds of problems, and even MMIO
loads could theoretically be problematic, e.g. if there are side effects in the
device that trigger on access to a device register.
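
For concreteness, a rough userspace sketch of the completion dance the
proposed documentation describes (illustrative only; the flag, the
capability and the helper name below exist only in this series and in
this sketch):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Before save/restore, flush any exit that KVM says needs completion. */
    static void flush_pending_completion(int vcpu_fd, struct kvm_run *run)
    {
            if (!(run->flags & KVM_RUN_NEEDS_COMPLETION))
                    return;

            /* Re-enter KVM_RUN without letting further guest instructions run. */
            run->immediate_exit = 1;
            ioctl(vcpu_fd, KVM_RUN, 0);
            run->immediate_exit = 0;
    }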



Re: [PATCH v5 05/17] arm64: pgtable: use mmu gather to free p4d level page table

2025-01-13 Thread Will Deacon
On Wed, Jan 08, 2025 at 02:57:21PM +0800, Qi Zheng wrote:
> Like other levels of page tables, also use mmu gather mechanism to free
> p4d level page table.
> 
> Signed-off-by: Qi Zheng 
> Originally-by: Peter Zijlstra (Intel) 
> Reviewed-by: Kevin Brodsky 
> Cc: linux-arm-ker...@lists.infradead.org
> ---
>  arch/arm64/include/asm/pgalloc.h |  1 -
>  arch/arm64/include/asm/tlb.h | 14 ++
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/pgalloc.h 
> b/arch/arm64/include/asm/pgalloc.h
> index 2965f5a7e39e3..1b4509d3382c6 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -85,7 +85,6 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t 
> *pgdp, p4d_t *p4dp)
>   __pgd_populate(pgdp, __pa(p4dp), pgdval);
>  }
>  
> -#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
>  #else
>  static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t 
> prot)
>  {
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index a947c6e784ed2..445282cde9afb 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -111,4 +111,18 @@ static inline void __pud_free_tlb(struct mmu_gather 
> *tlb, pud_t *pudp,
>  }
>  #endif
>  
> +#if CONFIG_PGTABLE_LEVELS > 4
> +static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4dp,
> +   unsigned long addr)
> +{
> + struct ptdesc *ptdesc = virt_to_ptdesc(p4dp);
> +
> + if (!pgtable_l5_enabled())
> + return;
> +
> + pagetable_p4d_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
> +}

Should we update p4d_free() to call the destructor, too? It looks like
it just does free_page() atm.
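
Something along those lines, presumably; a purely illustrative sketch (not
a patch), mirroring the __p4d_free_tlb() logic quoted above:

    static inline void p4d_free(struct mm_struct *mm, p4d_t *p4dp)
    {
            struct ptdesc *ptdesc = virt_to_ptdesc(p4dp);

            if (!pgtable_l5_enabled())
                    return;

            pagetable_p4d_dtor(ptdesc);
            pagetable_free(ptdesc);
    }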

Will



[PATCH v6 RESEND 1/3] powerpc/pseries: Macros and wrapper functions for H_HTM call

2025-01-13 Thread adubey
From: Abhishek Dubey 

Define macros and wrapper functions to handle the
H_HTM (Hardware Trace Macro) hypervisor call.
H_HTM is a new hcall added to export data from the
Hardware Trace Macro (HTM) function.

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
Reviewed-by: Athira Rajeev 
---
v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-1-ma...@linux.ibm.com/

 arch/powerpc/include/asm/hvcall.h | 34 +++
 arch/powerpc/include/asm/plpar_wrappers.h | 21 ++
 2 files changed, 55 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 65d1f291393d..eeef13db2770 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -348,6 +348,7 @@
 #define H_SCM_FLUSH             0x44C
 #define H_GET_ENERGY_SCALE_INFO 0x450
 #define H_PKS_SIGNED_UPDATE     0x454
+#define H_HTM   0x458
 #define H_WATCHDOG 0x45C
 #define H_GUEST_GET_CAPABILITIES 0x460
 #define H_GUEST_SET_CAPABILITIES 0x464
@@ -498,6 +499,39 @@
 #define H_GUEST_CAP_POWER11 (1UL<<(63-3))
 #define H_GUEST_CAP_BITMAP2 (1UL<<(63-63))
 
+/*
+ * Defines for H_HTM - Macros for hardware trace macro (HTM) function.
+ */
+#define H_HTM_FLAGS_HARDWARE_TARGET (1ul << 63)
+#define H_HTM_FLAGS_LOGICAL_TARGET (1ul << 62)
+#define H_HTM_FLAGS_PROCID_TARGET  (1ul << 61)
+#define H_HTM_FLAGS_NOWRAP (1ul << 60)
+
+#define H_HTM_OP_SHIFT (63-15)
+#define H_HTM_OP(x)((unsigned long)(x)<
 
diff --git a/arch/powerpc/include/asm/plpar_wrappers.h 
b/arch/powerpc/include/asm/plpar_wrappers.h
index 71648c126970..91be7b885944 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -65,6 +65,27 @@ static inline long register_dtl(unsigned long cpu, unsigned 
long vpa)
return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+static inline long htm_call(unsigned long flags, unsigned long target,
+   unsigned long operation, unsigned long param1,
+   unsigned long param2, unsigned long param3)
+{
+   return plpar_hcall_norets(H_HTM, flags, target, operation,
+ param1, param2, param3);
+}
+
+static inline long htm_get_dump_hardware(unsigned long nodeindex,
+   unsigned long nodalchipindex, unsigned long coreindexonchip,
+   unsigned long type, unsigned long addr, unsigned long size,
+   unsigned long offset)
+{
+   return htm_call(H_HTM_FLAGS_HARDWARE_TARGET,
+   H_HTM_TARGET_NODE_INDEX(nodeindex) |
+   H_HTM_TARGET_NODAL_CHIP_INDEX(nodalchipindex) |
+   H_HTM_TARGET_CORE_INDEX_ON_CHIP(coreindexonchip),
+   H_HTM_OP(H_HTM_OP_DUMP_DATA) | H_HTM_TYPE(type),
+   addr, size, offset);
+}
+
 extern void vpa_init(int cpu);
 
 static inline long plpar_pte_enter(unsigned long flags,
-- 
2.39.3




Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-13 Thread Christophe Leroy




On 13/01/2025 at 18:10, Dmitry V. Levin wrote:

Bring syscall_set_return_value() in sync with syscall_get_error(),
and let upcoming ptrace/set_syscall_info selftest pass on powerpc.

This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
syscall_set_return_value()").


There is a clear detailed explanation in that commit of why it needs to 
be done.


If you think that commit is wrong, you have to explain why with at least 
the same level of detail.




Signed-off-by: Dmitry V. Levin 
---
  arch/powerpc/include/asm/syscall.h | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index 3dd36c5e334a..422d7735ace6 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct 
task_struct *task,
 */
if (error) {
regs->ccr |= 0x1000L;
-   regs->gpr[3] = error;
+   /*
+* In case of an error regs->gpr[3] contains
+* a positive ERRORCODE.
+*/
+   regs->gpr[3] = -error;
} else {
regs->ccr &= ~0x1000L;
regs->gpr[3] = val;





Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()

2025-01-13 Thread Dmitry V. Levin
On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> On 13/01/2025 at 18:10, Dmitry V. Levin wrote:
> > Bring syscall_set_return_value() in sync with syscall_get_error(),
> > and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > 
> > This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > syscall_set_return_value()").
> 
> There is a clear detailed explanation in that commit of why it needs to 
> be done.
> 
> If you think that commit is wrong, you have to explain why with at least 
> the same level of detail.

I'm sorry, I'm not by any means a powerpc expert, so I can't explain why
that commit was added in the first place; I wish Michael were able to do it
himself.  All I can say is that, for some mysterious reason, the current
syscall_set_return_value() implementation assumes that in case of an error
regs->gpr[3] has to be negative, while, according to the well-tested
syscall_get_error(), it has to be positive.
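
To spell out the sign convention in question (simplified illustration only,
not the literal powerpc code; the SO-bit bookkeeping is omitted):

    /* Sketch: how the getter interprets gpr[3] on failure. */
    static inline long illustrative_get_error(long gpr3, int so_bit_set)
    {
            /* syscall_get_error(): on failure gpr[3] holds a positive errno */
            return so_bit_set ? -gpr3 : 0;
    }

    /*
     * Example: the tracer requests error = -ENOENT (-2).
     *   old setter stores gpr[3] = -2 -> getter reports  2 (nonsense)
     *   new setter stores gpr[3] =  2 -> getter reports -2 (-ENOENT)
     */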

This is very visible with PTRACE_SET_SYSCALL_INFO that exposes
syscall_set_return_value() to userspace, and, in particular, with the
architecture-agnostic ptrace/set_syscall_info selftest added later in the
series.

> > diff --git a/arch/powerpc/include/asm/syscall.h 
> > b/arch/powerpc/include/asm/syscall.h
> > index 3dd36c5e334a..422d7735ace6 100644
> > --- a/arch/powerpc/include/asm/syscall.h
> > +++ b/arch/powerpc/include/asm/syscall.h
> > @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct 
> > task_struct *task,
> >  */
> > if (error) {
> > regs->ccr |= 0x1000L;
> > -   regs->gpr[3] = error;
> > +   /*
> > +* In case of an error regs->gpr[3] contains
> > +* a positive ERRORCODE.
> > +*/
> > +   regs->gpr[3] = -error;
> > } else {
> > regs->ccr &= ~0x1000L;
> > regs->gpr[3] = val;

-- 
ldv



Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Marc Zyngier
On Mon, 13 Jan 2025 15:44:28 +,
Sean Christopherson  wrote:
> 
> On Sat, Jan 11, 2025, Marc Zyngier wrote:
> > On Sat, 11 Jan 2025 01:24:48 +,
> > Sean Christopherson  wrote:
> > > 
> > > Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
> > > that KVM_RUN needs to be re-executed prior to save/restore in order to
> > > complete the instruction/operation that triggered the userspace exit.
> > > 
> > > KVM's current approach of adding notes in the Documentation is beyond
> > > brittle, e.g. there is at least one known case where a KVM developer added
> > > a new userspace exit type, and then that same developer forgot to handle
> > > completion when adding userspace support.
> > 
> > Is this going to fix anything? If they couldn't be bothered to read
> > the documentation, let alone update it, how is that going to be
> > improved by extra rules and regulations?
> > 
> > I don't see how someone ignoring the documented behaviour of a given
> > exit reason is, all of a sudden, have an epiphany and take a *new*
> > flag into account.
> 
> The idea is to reduce the probability of introducing bugs, in KVM or 
> userspace,
> every time KVM attaches a completion callback.  Yes, userspace would need to 
> be
> updated to handle KVM_RUN_NEEDS_COMPLETION, but once that flag is merged, 
> neither
> KVM's documentation nor userspace would never need to be updated again.  And 
> if
> all architectures took an approach of handling completion via function 
> callback,
> I'm pretty sure we'd never need to manually update KVM itself either.

You are assuming that we need this completion, and I dispute this
assertion.

>
> > > +7.37 KVM_CAP_NEEDS_COMPLETION
> > > +-
> > > +
> > > +:Architectures: all
> > > +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> > > +
> > > +The presence of this capability indicates that KVM_RUN will set
> > > +KVM_RUN_NEEDS_COMPLETION in kvm_run.flags if KVM requires userspace to 
> > > re-enter
> > > +the kernel KVM_RUN to complete the exit.
> > > +
> > > +For select exits, userspace must re-enter the kernel with KVM_RUN to 
> > > complete
> > > +the corresponding operation, only after which is guest state guaranteed 
> > > to be
> > > +consistent.  On such a KVM_RUN, the kernel side will first finish 
> > > incomplete
> > > +operations and then check for pending signals.
> > > +
> > > +The pending state of the operation for such exits is not preserved in 
> > > state
> > > +which is visible to userspace, thus userspace should ensure that the 
> > > operation
> > > +is completed before performing state save/restore, e.g. for live 
> > > migration.
> > > +Userspace can re-enter the guest with an unmasked signal pending or with 
> > > the
> > > +immediate_exit field set to complete pending operations without allowing 
> > > any
> > > +further instructions to be executed.
> > > +
> > > +Without KVM_CAP_NEEDS_COMPLETION, KVM_RUN_NEEDS_COMPLETION will never be 
> > > set
> > > +and userspace must assume that exits of type KVM_EXIT_IO, KVM_EXIT_MMIO,
> > > +KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, KVM_EXIT_EPR, 
> > > KVM_EXIT_X86_RDMSR,
> > > +KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL require completion.
> > 
> > So once you advertise KVM_CAP_NEEDS_COMPLETION, the completion flag
> > must be present for all of these exits, right? And from what I can
> > tell, this capability is unconditionally advertised.
> > 
> > Yet, you don't amend arm64 to publish that flag. Not that I think this
> > causes any issue (even if you save the state at that point without
> > reentering the guest, it will be still be consistent), but that
> > directly contradicts the documentation (isn't that ironic? ;-).
> 
> It does cause issues, I missed this code in kvm_arch_vcpu_ioctl_run():
> 
>   if (run->exit_reason == KVM_EXIT_MMIO) {
>   ret = kvm_handle_mmio_return(vcpu);
>   if (ret <= 0)
>   return ret;
>   }

That's satisfying a load from the guest forwarded to userspace. If the
VMM did a save of the guest at this stage, restored and resumed it,
*nothing* bad would happen, as PC still points to the instruction that
got forwarded. You'll see the same load again.

As for all arm64 synchronous exceptions, they are idempotent, and can
be repeated as often as you want without side effects.

M.

-- 
Without deviation from the norm, progress is not possible.



Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

2025-01-13 Thread Marc Zyngier
On Mon, 13 Jan 2025 18:58:45 +,
Sean Christopherson  wrote:
> 
> On Mon, Jan 13, 2025, Marc Zyngier wrote:
> > On Mon, 13 Jan 2025 15:44:28 +,
> > Sean Christopherson  wrote:
> > > 
> > > On Sat, Jan 11, 2025, Marc Zyngier wrote:
> > > > On Sat, 11 Jan 2025 01:24:48 +,
> > > > Sean Christopherson  wrote:
> > > > > 
> > > > > Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to 
> > > > > userspace
> > > > > that KVM_RUN needs to be re-executed prior to save/restore in order to
> > > > > complete the instruction/operation that triggered the userspace exit.
> > > > > 
> > > > > KVM's current approach of adding notes in the Documentation is beyond
> > > > > brittle, e.g. there is at least one known case where a KVM developer 
> > > > > added
> > > > > a new userspace exit type, and then that same developer forgot to 
> > > > > handle
> > > > > completion when adding userspace support.
> > > > 
> > > > Is this going to fix anything? If they couldn't be bothered to read
> > > > the documentation, let alone update it, how is that going to be
> > > > improved by extra rules and regulations?
> > > > 
> > > > I don't see how someone ignoring the documented behaviour of a given
> > > > exit reason is, all of a sudden, have an epiphany and take a *new*
> > > > flag into account.
> > > 
> > > The idea is to reduce the probability of introducing bugs, in KVM or 
> > > userspace,
> > > every time KVM attaches a completion callback.  Yes, userspace would need 
> > > to be
> > > updated to handle KVM_RUN_NEEDS_COMPLETION, but once that flag is merged, 
> > > neither
> > > KVM's documentation nor userspace would never need to be updated again.  
> > > And if
> > > all architectures took an approach of handling completion via function 
> > > callback,
> > > I'm pretty sure we'd never need to manually update KVM itself either.
> > 
> > You are assuming that we need this completion, and I dispute this
> > assertion.
> 
> Ah, gotcha.
> 
> > > > > +The pending state of the operation for such exits is not preserved 
> > > > > in state
> > > > > +which is visible to userspace, thus userspace should ensure that the 
> > > > > operation
> > > > > +is completed before performing state save/restore, e.g. for live 
> > > > > migration.
> > > > > +Userspace can re-enter the guest with an unmasked signal pending or 
> > > > > with the
> > > > > +immediate_exit field set to complete pending operations without 
> > > > > allowing any
> > > > > +further instructions to be executed.
> > > > > +
> > > > > +Without KVM_CAP_NEEDS_COMPLETION, KVM_RUN_NEEDS_COMPLETION will 
> > > > > never be set
> > > > > +and userspace must assume that exits of type KVM_EXIT_IO, 
> > > > > KVM_EXIT_MMIO,
> > > > > +KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, KVM_EXIT_EPR, 
> > > > > KVM_EXIT_X86_RDMSR,
> > > > > +KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL require completion.
> > > > 
> > > > So once you advertise KVM_CAP_NEEDS_COMPLETION, the completion flag
> > > > must be present for all of these exits, right? And from what I can
> > > > tell, this capability is unconditionally advertised.
> > > > 
> > > > Yet, you don't amend arm64 to publish that flag. Not that I think this
> > > > causes any issue (even if you save the state at that point without
> > > > reentering the guest, it will be still be consistent), but that
> > > > directly contradicts the documentation (isn't that ironic? ;-).
> > > 
> > > It does cause issues, I missed this code in kvm_arch_vcpu_ioctl_run():
> > > 
> > >   if (run->exit_reason == KVM_EXIT_MMIO) {
> > >   ret = kvm_handle_mmio_return(vcpu);
> > >   if (ret <= 0)
> > >   return ret;
> > >   }
> > 
> > That's satisfying a load from the guest forwarded to userspace.
> 
> And MMIO stores, no?  I.e. PC needs to be incremented on stores as well.

Yes, *after* the store has completed. If you replay the instruction,
the same store comes out.

>
> > If the VMM did a save of the guest at this stage, restored and resumed it,
> > *nothing* bad would happen, as PC still points to the instruction that got
> > forwarded. You'll see the same load again.
> 
> But replaying an MMIO store could cause all kinds of problems, and even MMIO
> loads could theoretically be problematic, e.g. if there are side effects in 
> the
> device that trigger on access to a device register.

But that's the VMM's problem. If it has modified its own state and
doesn't return to the guest to complete the instruction, that's just
as bad as with a load, and loads *do* have side effects as well.

Overall, the guest state exposed by KVM is always correct, and
replaying the instruction is not going to change that. It is if the
VMM is broken that things turn ugly *for the VMM itself*, and I claim
that no amount of flag being added is going to help that.

M.

-- 
Without deviation from the norm, progress is not possible.



[PATCH v6 RESEND 2/3] powerpc/pseries: Export hardware trace macro dump via debugfs

2025-01-13 Thread adubey
From: Abhishek Dubey 

This patch adds a debugfs interface to export Hardware Trace Macro (HTM)
function data in an LPAR. A new hypervisor call "H_HTM" has been
defined to set up, configure, control and dump the HTM data.
This patch supports only dumping of HTM data in an LPAR.
A new debugfs folder called "htmdump" has been added under the
/sys/kernel/debug/powerpc path; it contains the files needed to
pass the required parameters to the H_HTM dump function. A new Kconfig
option called "CONFIG_HTMDUMP" has been added under platforms/pseries
for this purpose.

With this module loaded, the list of files in the debugfs path is:

/sys/kernel/debug/powerpc/htmdump
coreindexonchip  htmtype  nodalchipindex  nodeindex  trace
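
A plausible userspace sequence for pulling a dump out of this interface
(illustrative only and untested; it assumes the parameter files accept
decimal writes, that "trace" returns the HTM buffer contents, and the
example index/type values are made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define HTMDUMP_DIR "/sys/kernel/debug/powerpc/htmdump/"

    static void write_param(const char *name, unsigned int val)
    {
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path), HTMDUMP_DIR "%s", name);
            f = fopen(path, "w");
            if (f) {
                    fprintf(f, "%u\n", val);
                    fclose(f);
            }
    }

    int main(void)
    {
            char buf[4096];
            ssize_t n;
            int fd;

            write_param("nodeindex", 0);
            write_param("nodalchipindex", 0);
            write_param("coreindexonchip", 0);
            write_param("htmtype", 2);        /* example value only */

            fd = open(HTMDUMP_DIR "trace", O_RDONLY);
            if (fd < 0)
                    return 1;

            while ((n = read(fd, buf, sizeof(buf))) > 0)
                    fwrite(buf, 1, (size_t)n, stdout);

            close(fd);
            return 0;
    }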

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
Reviewed-by: Athira Rajeev 
---
Changelog:
  v5->v6 : Header file inclusion
  v4->v5 : Removed offset from the available-size calculation, as offset is
   always zero, leading to buffer-size reads.
   Edited comments and commit message

v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-2-ma...@linux.ibm.com/

 arch/powerpc/platforms/pseries/Kconfig   |   9 ++
 arch/powerpc/platforms/pseries/Makefile  |   1 +
 arch/powerpc/platforms/pseries/htmdump.c | 121 +++
 3 files changed, 131 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/htmdump.c

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 42fc66e97539..b839e87408aa 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -128,6 +128,15 @@ config CMM
  will be reused for other LPARs. The interface allows firmware to
  balance memory across many LPARs.
 
+config HTMDUMP
+   tristate "PowerVM data dumper"
+   depends on PPC_PSERIES && DEBUG_FS
+   default m
+   help
+ Select this option, if you want to enable the kernel debugfs
+ interface to dump the Hardware Trace Macro (HTM) function data
+ in the LPAR.
+
 config HV_PERF_CTRS
bool "Hypervisor supplied PMU events (24x7 & GPCI)"
default y
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 7bf506f6b8c8..3f3e3492e436 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_HVC_CONSOLE) += hvconsole.o
 obj-$(CONFIG_HVCS) += hvcserver.o
 obj-$(CONFIG_HCALL_STATS)  += hvCall_inst.o
 obj-$(CONFIG_CMM)  += cmm.o
+obj-$(CONFIG_HTMDUMP)  += htmdump.o
 obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
 obj-$(CONFIG_LPARCFG)  += lparcfg.o
 obj-$(CONFIG_IBMVIO)   += vio.o
diff --git a/arch/powerpc/platforms/pseries/htmdump.c 
b/arch/powerpc/platforms/pseries/htmdump.c
new file mode 100644
index ..57fc1700f604
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) IBM Corporation, 2024
+ */
+
+#define pr_fmt(fmt) "htmdump: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void *htm_buf;
+static u32 nodeindex;
+static u32 nodalchipindex;
+static u32 coreindexonchip;
+static u32 htmtype;
+static struct dentry *htmdump_debugfs_dir;
+
+static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
+size_t count, loff_t *ppos)
+{
+   void *htm_buf = filp->private_data;
+   unsigned long page, read_size, available;
+   loff_t offset;
+   long rc;
+
+   page = ALIGN_DOWN(*ppos, PAGE_SIZE);
+   offset = (*ppos) % PAGE_SIZE;
+
+   rc = htm_get_dump_hardware(nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, virt_to_phys(htm_buf), PAGE_SIZE, page);
+
+   switch (rc) {
+   case H_SUCCESS:
+   /* H_PARTIAL for the case where all available data can't be
+* returned due to buffer size constraint.
+*/
+   case H_PARTIAL:
+   break;
+   /* H_NOT_AVAILABLE indicates reading from an offset outside the range,
+* i.e. past end of file.
+*/
+   case H_NOT_AVAILABLE:
+   return 0;
+   case H_BUSY:
+   case H_LONG_BUSY_ORDER_1_MSEC:
+   case H_LONG_BUSY_ORDER_10_MSEC:
+   case H_LONG_BUSY_ORDER_100_MSEC:
+   case H_LONG_BUSY_ORDER_1_SEC:
+   case H_LONG_BUSY_ORDER_10_SEC:
+   case H_LONG_BUSY_ORDER_100_SEC:
+   return -EBUSY;
+   case H_PARAMETER:
+   case H_P2:
+   case H_P3:
+   case H_P4:
+   case H_P5:
+   case H_P6:
+   return -EINVAL;
+   case H_STATE:
+   return -EIO;
+   case H_AUTHORITY:
+   return -EPERM;
+   }
+
+   available = PAGE_SIZE;
+   read_size = min(count, available);
+   *ppos += read_size;
+   return simple_read_from_buffer(ubuf, co

[PATCH v6 RESEND 3/3] powerpc: Document details on H_HTM hcall

2025-01-13 Thread adubey
From: Abhishek Dubey 

Add documentation to 'papr_hcalls.rst' describing the
input, output and return values of the H_HTM hcall as
per the internal specification.

Signed-off-by: Abhishek Dubey 
Co-developed-by: Madhavan Srinivasan 
Signed-off-by: Madhavan Srinivasan 
Reviewed-by: Athira Rajeev 
---
 v3 patch:
  
https://lore.kernel.org/linuxppc-dev/20240828085223.42177-3-ma...@linux.ibm.com/

 Documentation/arch/powerpc/papr_hcalls.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/arch/powerpc/papr_hcalls.rst 
b/Documentation/arch/powerpc/papr_hcalls.rst
index 80d2c0aadab5..805e1cb9bab9 100644
--- a/Documentation/arch/powerpc/papr_hcalls.rst
+++ b/Documentation/arch/powerpc/papr_hcalls.rst
@@ -289,6 +289,17 @@ to be issued multiple times in order to be completely 
serviced. The
 subsequent hcalls to the hypervisor until the hcall is completely serviced
 at which point H_SUCCESS or other error is returned by the hypervisor.
 
+**H_HTM**
+
+| Input: flags, target, operation (op), op-param1, op-param2, op-param3
+| Out: *dumphtmbufferdata*
+| Return Value: *H_Success,H_Busy,H_LongBusyOrder,H_Partial,H_Parameter,
+H_P2,H_P3,H_P4,H_P5,H_P6,H_State,H_Not_Available,H_Authority*
+
+H_HTM supports setup, configuration, control and dumping of Hardware Trace
+Macro (HTM) function and its data. HTM buffer stores tracing data for functions
+like core instruction, core LLAT and nest.
+
 References
 ==
 .. [1] "Power Architecture Platform Reference"
-- 
2.39.3




Re: [PATCH 0/2] ASoC: fsl: Support MQS on i.MX943

2025-01-13 Thread Mark Brown
On Mon, 13 Jan 2025 17:03:19 +0800, Shengjiu Wang wrote:
> There are two MQS instances on the i.MX943 platform.
> The definition of bit positions in the control register are
> different. In order to support these MQS modules, define
> two compatible strings to distinguish them.
> 
> Shengjiu Wang (2):
>   ASoC: fsl_mqs: Add i.MX943 platform support
>   ASoC: dt-bindings: fsl,mqs: Add compatible string for i.MX943 platform
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/2] ASoC: fsl_mqs: Add i.MX943 platform support
  commit: 6f490e6b2c34792e363685bacb48a759e7e40cd1
[2/2] ASoC: dt-bindings: fsl,mqs: Add compatible string for i.MX943 platform
  commit: a1a771e5f1e31e4764d9a225c02e93969d3f5389

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark