[PATCH] fadump: Use str_yes_no() helper in fadump_show_config()

2024-12-30 Thread Thorsten Blum
Remove hard-coded strings by using the str_yes_no() helper function.

Signed-off-by: Thorsten Blum 
---
 arch/powerpc/kernel/fadump.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 4b371c738213..8c531533dd3e 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -289,10 +289,8 @@ static void __init fadump_show_config(void)
if (!fw_dump.fadump_supported)
return;
 
-   pr_debug("Fadump enabled: %s\n",
-   (fw_dump.fadump_enabled ? "yes" : "no"));
-   pr_debug("Dump Active   : %s\n",
-   (fw_dump.dump_active ? "yes" : "no"));
+   pr_debug("Fadump enabled: %s\n", 
str_yes_no(fw_dump.fadump_enabled));
+   pr_debug("Dump Active   : %s\n", str_yes_no(fw_dump.dump_active));
pr_debug("Dump section sizes:\n");
pr_debug("CPU state data size: %lx\n", fw_dump.cpu_state_data_size);
pr_debug("HPTE region size   : %lx\n", fw_dump.hpte_region_size);
-- 
2.47.1
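
For reference, the str_yes_no() helper being used above comes from
include/linux/string_choices.h and is essentially the following (a minimal
sketch, not the verbatim kernel source):

```
/* Sketch of the helper the patch switches to. */
static inline const char *str_yes_no(bool v)
{
	return v ? "yes" : "no";
}
```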




Re: [PATCH] net: ethernet: toshiba: ps3_gelic_wireless: Remove driver using deprecated API wext

2024-12-30 Thread Johannes Berg
On Tue, 2024-12-24 at 09:07 +0100, Philipp Hortmann wrote:
> Driver was contributed in 2008.
> 
> The following reasons lead to the removal:
> - This driver generates maintenance workload for itself and for API wext

So I've been wondering, why are you so concerned about this? And in
particular, more concerned about it than the people actually doing the
maintenance? :)

We got here because I removed a *staging* driver that was in the way of
some wext cleanups, but that had a thousand other reasons to never go
anywhere anyway.

> - wext is deprecated and only used by two wireless drivers in
>   mainline kernel

true

> - no progress changing to mac80211

It fundamentally cannot be converted to mac80211, it has a whole
different model. In fact it cannot even be converted to cfg80211 because
some APIs it uses just never existed there, and likely never will.

> Tested a rebased version of this patch on the Playstation 3. Used
> T2 Linux with Kernel 6.12.5 to test the Ethernet connection.
> 

Arguably that's a pretty strong argument for *not* removing it, if it's
actually relatively simple today to bring up the latest kernel on a PS3.

johannes



[PATCH v4 00/15] move pagetable_*_dtor() to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Changes in v4:
 - remove [PATCH v3 15/17] and [PATCH v3 16/17] (Mike Rapoport)
   (the tlb_remove_page_ptdesc() and tlb_remove_ptdesc() are intermediate
products of the project: https://kernelnewbies.org/MatthewWilcox/Memdescs,
so keep them)
 - collect Acked-by

Changes in v3:
 - take patch #5 and #6 from Kevin Brodsky's patch series below.
   Link: https://lore.kernel.org/lkml/20241219164425.2277022-1-kevin.brod...@arm.com/
 - separate the statistics part from [PATCH v2 02/15] as [PATCH v3 04/17], and
   replace the rest part with Kevin Brodsky's patch #6
   (Alexander Gordeev and Kevin Brodsky)
 - change the commit message of [PATCH v2 10/15] and [PATCH v2 11/15]
   (Alexander Gordeev)
 - fix the bug introduced by [PATCH v2 11/15]
   (Peter Zijlstra)
 - rebase onto the next-20241220

Changes in v2:
 - add [PATCH v2 13|14|15/15] (suggested by Peter Zijlstra)
 - add Originally-bys and Suggested-bys
 - rebase onto the next-20241218

Hi all,

As proposed [1] by Peter Zijlstra below, this patch series aims to move
pagetable_*_dtor() into __tlb_remove_table(). This cleans up
pagetable_*_dtor() a bit and more gracefully fixes the UAF issue [2]
reported by syzbot.
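
To make the ordering problem concrete, here is the pre-series arm64
arrangement, assembled as a sketch from the '-' lines of patches 07 and 09
below (not verbatim): the ptlock is torn down immediately in the
*_free_tlb() path, while the page itself is only released after an RCU
grace period, so a lockless page table walker can still take an
already-freed ptlock.

```
/* Pre-series shape (sketch assembled from removed lines in patches 07/09). */
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
				  unsigned long addr)
{
	struct ptdesc *ptdesc = page_ptdesc(pte);

	pagetable_pte_dtor(ptdesc);	/* ptlock freed immediately here ...   */
	tlb_remove_ptdesc(tlb, ptdesc);	/* ... page queued for deferred free   */
}

static inline void __tlb_remove_table(void *_table)
{
	/* runs only after the RCU grace period; the page is freed here */
	free_page_and_swap_cache((struct page *)_table);
}
```

After the series, the destructor runs together with the page free in
__tlb_remove_table(), which closes that window.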

```
Notably:

 - s390 pud isn't calling the existing pagetable_pud_[cd]tor()
 - none of the p4d things have pagetable_p4d_[cd]tor() (x86,arm64,s390,riscv)
   and they have inconsistent accounting
 - while much of the _ctor calls are in generic code, many of the _dtor
   calls are in arch code for hysterial raisins, this could easily be
   fixed
 - if we fix ptlock_free() to handle NULL, then all the _dtor()
   functions can use it, and we can observe they're all identical
   and can be folded

after all that cleanup, you can move the _dtor from *_free_tlb() into
tlb_remove_table() -- which for the above case, would then have it
called from __tlb_remove_table_free().
```

And hi Andrew, I developed the code based on the latest linux-next, so I
reverted the "mm: pgtable: make ptlock be freed by RCU" patch first. Once the
review of this patch series is completed, "mm: pgtable: make ptlock be freed
by RCU" can be dropped directly from the mm tree, and this revert patch will
not be needed.

This series is based on next-20241220. I tested this patch series on x86 and
only cross-compiled it on arm, arm64, powerpc, riscv, s390 and sparc.

Comments and suggestions are welcome!

Thanks,
Qi

[1]. https://lore.kernel.org/all/20241211133433.gc12...@noisy.programming.kicks-ass.net/
[2]. https://lore.kernel.org/all/67548279.050a0220.a30f1.015b@google.com/

Kevin Brodsky (2):
  riscv: mm: Skip pgtable level check in {pud,p4d}_alloc_one
  asm-generic: pgalloc: Provide generic p4d_{alloc_one,free}

Qi Zheng (13):
  Revert "mm: pgtable: make ptlock be freed by RCU"
  mm: pgtable: add statistics for P4D level page table
  arm64: pgtable: use mmu gather to free p4d level page table
  s390: pgtable: add statistics for PUD and P4D level page table
  mm: pgtable: introduce pagetable_dtor()
  arm: pgtable: move pagetable_dtor() to __tlb_remove_table()
  arm64: pgtable: move pagetable_dtor() to __tlb_remove_table()
  riscv: pgtable: move pagetable_dtor() to __tlb_remove_table()
  x86: pgtable: move pagetable_dtor() to __tlb_remove_table()
  s390: pgtable: also move pagetable_dtor() of PxD to
__tlb_remove_table()
  mm: pgtable: introduce generic __tlb_remove_table()
  mm: pgtable: move __tlb_remove_table_one() in x86 to generic file
  mm: pgtable: introduce generic pagetable_dtor_free()

 Documentation/mm/split_page_table_lock.rst |  4 +-
 arch/arm/include/asm/tlb.h | 10 
 arch/arm64/include/asm/pgalloc.h   | 18 --
 arch/arm64/include/asm/tlb.h   | 21 ---
 arch/csky/include/asm/pgalloc.h|  2 +-
 arch/hexagon/include/asm/pgalloc.h |  2 +-
 arch/loongarch/include/asm/pgalloc.h   |  2 +-
 arch/m68k/include/asm/mcf_pgalloc.h|  4 +-
 arch/m68k/include/asm/sun3_pgalloc.h   |  2 +-
 arch/m68k/mm/motorola.c|  2 +-
 arch/mips/include/asm/pgalloc.h|  2 +-
 arch/nios2/include/asm/pgalloc.h   |  2 +-
 arch/openrisc/include/asm/pgalloc.h|  2 +-
 arch/powerpc/include/asm/tlb.h |  1 +
 arch/powerpc/mm/book3s64/mmu_context.c |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c |  2 +-
 arch/powerpc/mm/pgtable-frag.c |  4 +-
 arch/riscv/include/asm/pgalloc.h   | 69 +-
 arch/riscv/include/asm/tlb.h   | 18 --
 arch/riscv/mm/init.c   |  4 +-
 arch/s390/include/asm/pgalloc.h| 31 +++---
 arch/s390/include/asm/tlb.h| 43 +++---
 arch/s390/mm/pgalloc.c | 23 +---
 arch/sh/include/asm/pgalloc.h  |  2 +-
 arch/sparc/include/asm/tlb_32.h|  1 +
 arch/sparc/include/asm/tlb_64.h|  1 +
 arch/sparc/mm/init_64.c|  2 +-
 arch/sparc/mm/srmmu.c

[PATCH v4 02/15] riscv: mm: Skip pgtable level check in {pud,p4d}_alloc_one

2024-12-30 Thread Qi Zheng
From: Kevin Brodsky 

{pmd,pud,p4d}_alloc_one() is never called if the corresponding page
table level is folded, as {pmd,pud,p4d}_alloc() already does the
required check. We can therefore remove the runtime page table level
checks in {pud,p4d}_alloc_one. The PUD helper becomes equivalent to
the generic version, so we remove it altogether.

This is consistent with the way arm64 and x86 handle this situation
(runtime check in p4d_free() only).

Signed-off-by: Kevin Brodsky 
Acked-by: Dave Hansen 
Signed-off-by: Qi Zheng 
Acked-by: Palmer Dabbelt 
---
 arch/riscv/include/asm/pgalloc.h | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index f52264304f772..8ad0bbe838a24 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -12,7 +12,6 @@
 #include 
 
 #ifdef CONFIG_MMU
-#define __HAVE_ARCH_PUD_ALLOC_ONE
 #define __HAVE_ARCH_PUD_FREE
 #include 
 
@@ -88,15 +87,6 @@ static inline void pgd_populate_safe(struct mm_struct *mm, 
pgd_t *pgd,
}
 }
 
-#define pud_alloc_one pud_alloc_one
-static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   if (pgtable_l4_enabled)
-   return __pud_alloc_one(mm, addr);
-
-   return NULL;
-}
-
 #define pud_free pud_free
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
@@ -118,15 +108,11 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 #define p4d_alloc_one p4d_alloc_one
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-   if (pgtable_l5_enabled) {
-   gfp_t gfp = GFP_PGTABLE_USER;
-
-   if (mm == &init_mm)
-   gfp = GFP_PGTABLE_KERNEL;
-   return (p4d_t *)get_zeroed_page(gfp);
-   }
+   gfp_t gfp = GFP_PGTABLE_USER;
 
-   return NULL;
+   if (mm == &init_mm)
+   gfp = GFP_PGTABLE_KERNEL;
+   return (p4d_t *)get_zeroed_page(gfp);
 }
 
 static inline void __p4d_free(struct mm_struct *mm, p4d_t *p4d)
-- 
2.20.1
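
The "required check" mentioned in the commit message is the one in the
generic {pmd,pud,p4d}_alloc() helpers. Paraphrased from include/linux/mm.h
(a sketch for illustration, not the exact source):

```
/*
 * p4d_alloc_one() is only reached through __p4d_alloc(), and __p4d_alloc()
 * is only called when pgd_none(*pgd) is true -- which can never happen for
 * a folded level, so no runtime check is needed in p4d_alloc_one() itself.
 */
static inline p4d_t *p4d_alloc(struct mm_struct *mm, pgd_t *pgd,
			       unsigned long address)
{
	return (unlikely(pgd_none(*pgd)) && __p4d_alloc(mm, pgd, address)) ?
		NULL : p4d_offset(pgd, address);
}
```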




[PATCH v4 01/15] Revert "mm: pgtable: make ptlock be freed by RCU"

2024-12-30 Thread Qi Zheng
This reverts commit 2f3443770437e49abc39af26962d293851cbab6d.

Signed-off-by: Qi Zheng 
---
 include/linux/mm.h   |  2 +-
 include/linux/mm_types.h |  9 +
 mm/memory.c  | 22 ++
 3 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d61b9c7a3a7b0..c49bc7b764535 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2925,7 +2925,7 @@ void ptlock_free(struct ptdesc *ptdesc);
 
 static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
 {
-   return &(ptdesc->ptl->ptl);
+   return ptdesc->ptl;
 }
 #else /* ALLOC_SPLIT_PTLOCKS */
 static inline void ptlock_cache_init(void)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 90ab8293d714a..6b27db7f94963 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -434,13 +434,6 @@ FOLIO_MATCH(flags, _flags_2a);
 FOLIO_MATCH(compound_head, _head_2a);
 #undef FOLIO_MATCH
 
-#if ALLOC_SPLIT_PTLOCKS
-struct pt_lock {
-   spinlock_t ptl;
-   struct rcu_head rcu;
-};
-#endif
-
 /**
  * struct ptdesc -Memory descriptor for page tables.
  * @__page_flags: Same as page flags. Powerpc only.
@@ -489,7 +482,7 @@ struct ptdesc {
union {
unsigned long _pt_pad_2;
 #if ALLOC_SPLIT_PTLOCKS
-   struct pt_lock *ptl;
+   spinlock_t *ptl;
 #else
spinlock_t ptl;
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index b9b05c3f93f11..9423967b24180 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7034,34 +7034,24 @@ static struct kmem_cache *page_ptl_cachep;
 
 void __init ptlock_cache_init(void)
 {
-   page_ptl_cachep = kmem_cache_create("page->ptl", sizeof(struct pt_lock), 0,
+   page_ptl_cachep = kmem_cache_create("page->ptl", sizeof(spinlock_t), 0,
SLAB_PANIC, NULL);
 }
 
 bool ptlock_alloc(struct ptdesc *ptdesc)
 {
-   struct pt_lock *pt_lock;
+   spinlock_t *ptl;
 
-   pt_lock = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
-   if (!pt_lock)
+   ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
+   if (!ptl)
return false;
-   ptdesc->ptl = pt_lock;
+   ptdesc->ptl = ptl;
return true;
 }
 
-static void ptlock_free_rcu(struct rcu_head *head)
-{
-   struct pt_lock *pt_lock;
-
-   pt_lock = container_of(head, struct pt_lock, rcu);
-   kmem_cache_free(page_ptl_cachep, pt_lock);
-}
-
 void ptlock_free(struct ptdesc *ptdesc)
 {
-   struct pt_lock *pt_lock = ptdesc->ptl;
-
-   call_rcu(&pt_lock->rcu, ptlock_free_rcu);
+   kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
 }
 #endif
 
-- 
2.20.1




[PATCH v4 07/15] mm: pgtable: introduce pagetable_dtor()

2024-12-30 Thread Qi Zheng
The pagetable_p*_dtor() helpers are exactly the same except for the handling
of the ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is
NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify
pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor()
to do this.

Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that
ptlock and page table pages can be freed together (regardless of whether
RCU is used). This prevents the use-after-free problem where the ptlock
is freed immediately but the page table pages are freed later via RCU.

Signed-off-by: Qi Zheng 
Originally-by: Peter Zijlstra (Intel) 
---
 Documentation/mm/split_page_table_lock.rst |  4 +-
 arch/arm/include/asm/tlb.h |  4 +-
 arch/arm64/include/asm/tlb.h   |  8 ++--
 arch/csky/include/asm/pgalloc.h|  2 +-
 arch/hexagon/include/asm/pgalloc.h |  2 +-
 arch/loongarch/include/asm/pgalloc.h   |  2 +-
 arch/m68k/include/asm/mcf_pgalloc.h|  4 +-
 arch/m68k/include/asm/sun3_pgalloc.h   |  2 +-
 arch/m68k/mm/motorola.c|  2 +-
 arch/mips/include/asm/pgalloc.h|  2 +-
 arch/nios2/include/asm/pgalloc.h   |  2 +-
 arch/openrisc/include/asm/pgalloc.h|  2 +-
 arch/powerpc/mm/book3s64/mmu_context.c |  2 +-
 arch/powerpc/mm/book3s64/pgtable.c |  2 +-
 arch/powerpc/mm/pgtable-frag.c |  4 +-
 arch/riscv/include/asm/pgalloc.h   |  8 ++--
 arch/riscv/mm/init.c   |  4 +-
 arch/s390/include/asm/pgalloc.h|  6 +--
 arch/s390/include/asm/tlb.h|  6 +--
 arch/s390/mm/pgalloc.c |  2 +-
 arch/sh/include/asm/pgalloc.h  |  2 +-
 arch/sparc/mm/init_64.c|  2 +-
 arch/sparc/mm/srmmu.c  |  2 +-
 arch/um/include/asm/pgalloc.h  |  6 +--
 arch/x86/mm/pgtable.c  | 12 ++---
 include/asm-generic/pgalloc.h  |  8 ++--
 include/linux/mm.h | 52 --
 mm/memory.c|  3 +-
 28 files changed, 62 insertions(+), 95 deletions(-)

diff --git a/Documentation/mm/split_page_table_lock.rst 
b/Documentation/mm/split_page_table_lock.rst
index 581446d4a4eba..8e1ceb0a6619a 100644
--- a/Documentation/mm/split_page_table_lock.rst
+++ b/Documentation/mm/split_page_table_lock.rst
@@ -62,7 +62,7 @@ Support of split page table lock by an architecture
 ===
 
 There's no need in special enabling of PTE split page table lock: everything
-required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
+required is done by pagetable_pte_ctor() and pagetable_dtor(), which
 must be called on PTE table allocation / freeing.
 
 Make sure the architecture doesn't use slab allocator for page table
@@ -73,7 +73,7 @@ PMD split lock only makes sense if you have more than two 
page table
 levels.
 
 PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table
-allocation and pagetable_pmd_dtor() on freeing.
+allocation and pagetable_dtor() on freeing.
 
 Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
 pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index f40d06ad5d2a3..ef79bf1e8563f 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -41,7 +41,7 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
unsigned long addr)
 {
struct ptdesc *ptdesc = page_ptdesc(pte);
 
-   pagetable_pte_dtor(ptdesc);
+   pagetable_dtor(ptdesc);
 
 #ifndef CONFIG_ARM_LPAE
/*
@@ -61,7 +61,7 @@ __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned 
long addr)
 #ifdef CONFIG_ARM_LPAE
struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pagetable_pmd_dtor(ptdesc);
+   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 #endif
 }
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 445282cde9afb..408d0f36a8a8f 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -82,7 +82,7 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t pte,
 {
struct ptdesc *ptdesc = page_ptdesc(pte);
 
-   pagetable_pte_dtor(ptdesc);
+   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 
@@ -92,7 +92,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 {
struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pagetable_pmd_dtor(ptdesc);
+   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
@@ -106,7 +106,7 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pudp,
if (!pgtable_l4_enabled())
return;
 
-   pagetable_pud_dtor(ptdesc);
+   pagetable_dtor(ptdesc);
tlb_remove_pt
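
The include/linux/mm.h hunk that actually adds pagetable_dtor() is cut off
in this archive copy. Based on the description above and the per-level
destructors shown in patch 04, its shape is roughly the following (a
reconstruction, not the verbatim hunk):

```
/* Sketch of the unified destructor introduced by this patch. */
static inline void pagetable_dtor(struct ptdesc *ptdesc)
{
	struct folio *folio = ptdesc_folio(ptdesc);

	ptlock_free(ptdesc);	/* now tolerates ptdesc->ptl == NULL */
	__folio_clear_pgtable(folio);
	lruvec_stat_sub_folio(folio, NR_PAGETABLE);
}
```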

[PATCH v4 03/15] asm-generic: pgalloc: Provide generic p4d_{alloc_one,free}

2024-12-30 Thread Qi Zheng
From: Kevin Brodsky 

Four architectures currently implement 5-level pgtables: arm64,
riscv, x86 and s390. The first three have essentially the same
implementation for p4d_alloc_one() and p4d_free(), so we've got an
opportunity to reduce duplication like at the lower levels.

Provide a generic version of p4d_alloc_one() and p4d_free(), and
make use of it on those architectures.

Their implementation is the same as at PUD level, except that
p4d_free() performs a runtime check by calling mm_p4d_folded().
5-level pgtables depend on a runtime-detected hardware feature on
all supported architectures, so we might as well include this check
in the generic implementation. No runtime check is required in
p4d_alloc_one() as the top-level p4d_alloc() already does the
required check.

Signed-off-by: Kevin Brodsky 
Acked-by: Dave Hansen 
Signed-off-by: Qi Zheng 
---
 arch/arm64/include/asm/pgalloc.h | 17 
 arch/riscv/include/asm/pgalloc.h | 23 
 arch/x86/include/asm/pgalloc.h   | 18 -
 include/asm-generic/pgalloc.h| 45 
 4 files changed, 45 insertions(+), 58 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index e75422864d1bd..2965f5a7e39e3 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -85,23 +85,6 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t 
*pgdp, p4d_t *p4dp)
__pgd_populate(pgdp, __pa(p4dp), pgdval);
 }
 
-static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   gfp_t gfp = GFP_PGTABLE_USER;
-
-   if (mm == &init_mm)
-   gfp = GFP_PGTABLE_KERNEL;
-   return (p4d_t *)get_zeroed_page(gfp);
-}
-
-static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
-{
-   if (!pgtable_l5_enabled())
-   return;
-   BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
-   free_page((unsigned long)p4d);
-}
-
 #define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
 #else
 static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 8ad0bbe838a24..551d614d3369c 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -105,29 +105,6 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
}
 }
 
-#define p4d_alloc_one p4d_alloc_one
-static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   gfp_t gfp = GFP_PGTABLE_USER;
-
-   if (mm == &init_mm)
-   gfp = GFP_PGTABLE_KERNEL;
-   return (p4d_t *)get_zeroed_page(gfp);
-}
-
-static inline void __p4d_free(struct mm_struct *mm, p4d_t *p4d)
-{
-   BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
-   free_page((unsigned long)p4d);
-}
-
-#define p4d_free p4d_free
-static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
-{
-   if (pgtable_l5_enabled)
-   __p4d_free(mm, p4d);
-}
-
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
  unsigned long addr)
 {
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index dcd836b59bebd..dd4841231bb9f 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -147,24 +147,6 @@ static inline void pgd_populate_safe(struct mm_struct *mm, 
pgd_t *pgd, p4d_t *p4
set_pgd_safe(pgd, __pgd(_PAGE_TABLE | __pa(p4d)));
 }
 
-static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
-{
-   gfp_t gfp = GFP_KERNEL_ACCOUNT;
-
-   if (mm == &init_mm)
-   gfp &= ~__GFP_ACCOUNT;
-   return (p4d_t *)get_zeroed_page(gfp);
-}
-
-static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
-{
-   if (!pgtable_l5_enabled())
-   return;
-
-   BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
-   free_page((unsigned long)p4d);
-}
-
 extern void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d);
 
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 7c48f5fbf8aa7..59131629ac9cc 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -215,6 +215,51 @@ static inline void pud_free(struct mm_struct *mm, pud_t 
*pud)
 
 #endif /* CONFIG_PGTABLE_LEVELS > 3 */
 
+#if CONFIG_PGTABLE_LEVELS > 4
+
+static inline p4d_t *__p4d_alloc_one_noprof(struct mm_struct *mm, unsigned long addr)
+{
+   gfp_t gfp = GFP_PGTABLE_USER;
+   struct ptdesc *ptdesc;
+
+   if (mm == &init_mm)
+   gfp = GFP_PGTABLE_KERNEL;
+   gfp &= ~__GFP_HIGHMEM;
+
+   ptdesc = pagetable_alloc_noprof(gfp, 0);
+   if (!ptdesc)
+   return NULL;
+
+   return ptdesc_address(ptdesc);
+}
+#define __p4d_alloc_one(...)   alloc_hooks(__p4d_alloc_one_noprof(__VA_ARGS__))
+
+#ifndef __HAVE_ARCH_P4D
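
The hunk is truncated at this point. Based on the commit message and the
PUD-level pattern in the same file, the remainder of the generic
implementation is roughly as follows (a sketch; the exact macro names such
as __HAVE_ARCH_P4D_ALLOC_ONE/__HAVE_ARCH_P4D_FREE are an assumption):

```
/* Reconstructed continuation of the asm-generic/pgalloc.h addition. */
#ifndef __HAVE_ARCH_P4D_ALLOC_ONE
static inline p4d_t *p4d_alloc_one_noprof(struct mm_struct *mm, unsigned long addr)
{
	return __p4d_alloc_one_noprof(mm, addr);
}
#define p4d_alloc_one(...)	alloc_hooks(p4d_alloc_one_noprof(__VA_ARGS__))
#endif

static inline void __p4d_free(struct mm_struct *mm, p4d_t *p4d)
{
	struct ptdesc *ptdesc = virt_to_ptdesc(p4d);

	BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
	pagetable_free(ptdesc);
}

#ifndef __HAVE_ARCH_P4D_FREE
static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
{
	/* runtime check: 5-level paging is a runtime-detected feature */
	if (!mm_p4d_folded(mm))
		__p4d_free(mm, p4d);
}
#endif
```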

[PATCH v4 05/15] arm64: pgtable: use mmu gather to free p4d level page table

2024-12-30 Thread Qi Zheng
Like other levels of page tables, also use mmu gather mechanism to free
p4d level page table.

Signed-off-by: Qi Zheng 
Originally-by: Peter Zijlstra (Intel) 
Cc: linux-arm-ker...@lists.infradead.org
---
 arch/arm64/include/asm/pgalloc.h |  1 -
 arch/arm64/include/asm/tlb.h | 14 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 2965f5a7e39e3..1b4509d3382c6 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -85,7 +85,6 @@ static inline void pgd_populate(struct mm_struct *mm, pgd_t 
*pgdp, p4d_t *p4dp)
__pgd_populate(pgdp, __pa(p4dp), pgdval);
 }
 
-#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
 #else
 static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t p4dp, pgdval_t prot)
 {
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index a947c6e784ed2..445282cde9afb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -111,4 +111,18 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pudp,
 }
 #endif
 
+#if CONFIG_PGTABLE_LEVELS > 4
+static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4dp,
+ unsigned long addr)
+{
+   struct ptdesc *ptdesc = virt_to_ptdesc(p4dp);
+
+   if (!pgtable_l5_enabled())
+   return;
+
+   pagetable_p4d_dtor(ptdesc);
+   tlb_remove_ptdesc(tlb, ptdesc);
+}
+#endif
+
 #endif
-- 
2.20.1




[PATCH v4 04/15] mm: pgtable: add statistics for P4D level page table

2024-12-30 Thread Qi Zheng
Like other levels of page tables, add statistics for P4D level page table.

Signed-off-by: Qi Zheng 
Originally-by: Peter Zijlstra (Intel) 
---
 arch/riscv/include/asm/pgalloc.h |  6 +-
 arch/x86/mm/pgtable.c|  3 +++
 include/asm-generic/pgalloc.h|  2 ++
 include/linux/mm.h   | 16 
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index 551d614d3369c..3466fbe2e508d 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -108,8 +108,12 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
  unsigned long addr)
 {
-   if (pgtable_l5_enabled)
+   if (pgtable_l5_enabled) {
+   struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
+
+   pagetable_p4d_dtor(ptdesc);
riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
+   }
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 69a357b15974a..3d6e84da45b24 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -94,6 +94,9 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
+   struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
+
+   pagetable_p4d_dtor(ptdesc);
paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 59131629ac9cc..bb482eeca0c3e 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -230,6 +230,7 @@ static inline p4d_t *__p4d_alloc_one_noprof(struct 
mm_struct *mm, unsigned long
if (!ptdesc)
return NULL;
 
+   pagetable_p4d_ctor(ptdesc);
return ptdesc_address(ptdesc);
 }
 #define __p4d_alloc_one(...)   alloc_hooks(__p4d_alloc_one_noprof(__VA_ARGS__))
@@ -247,6 +248,7 @@ static inline void __p4d_free(struct mm_struct *mm, p4d_t 
*p4d)
struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
 
BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
+   pagetable_p4d_dtor(ptdesc);
pagetable_free(ptdesc);
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c49bc7b764535..5d82f42ddd5cc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3175,6 +3175,22 @@ static inline void pagetable_pud_dtor(struct ptdesc 
*ptdesc)
lruvec_stat_sub_folio(folio, NR_PAGETABLE);
 }
 
+static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   __folio_set_pgtable(folio);
+   lruvec_stat_add_folio(folio, NR_PAGETABLE);
+}
+
+static inline void pagetable_p4d_dtor(struct ptdesc *ptdesc)
+{
+   struct folio *folio = ptdesc_folio(ptdesc);
+
+   __folio_clear_pgtable(folio);
+   lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+}
+
 extern void __init pagecache_init(void);
 extern void free_initmem(void);
 
-- 
2.20.1




[PATCH v4 06/15] s390: pgtable: add statistics for PUD and P4D level page table

2024-12-30 Thread Qi Zheng
Like PMD and PTE level page table, also add statistics for PUD and P4D
page table.

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: linux-s...@vger.kernel.org
---
 arch/s390/include/asm/pgalloc.h | 29 +++---
 arch/s390/include/asm/tlb.h | 37 +
 2 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 7b84ef6dc4b6d..a0c1ca5d8423c 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -53,29 +53,42 @@ static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, 
unsigned long address)
 {
unsigned long *table = crst_table_alloc(mm);
 
-   if (table)
-   crst_table_init(table, _REGION2_ENTRY_EMPTY);
+   if (!table)
+   return NULL;
+   crst_table_init(table, _REGION2_ENTRY_EMPTY);
+   pagetable_p4d_ctor(virt_to_ptdesc(table));
+
return (p4d_t *) table;
 }
 
 static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
 {
-   if (!mm_p4d_folded(mm))
-   crst_table_free(mm, (unsigned long *) p4d);
+   if (mm_p4d_folded(mm))
+   return;
+
+   pagetable_p4d_dtor(virt_to_ptdesc(p4d));
+   crst_table_free(mm, (unsigned long *) p4d);
 }
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long address)
 {
unsigned long *table = crst_table_alloc(mm);
-   if (table)
-   crst_table_init(table, _REGION3_ENTRY_EMPTY);
+
+   if (!table)
+   return NULL;
+   crst_table_init(table, _REGION3_ENTRY_EMPTY);
+   pagetable_pud_ctor(virt_to_ptdesc(table));
+
return (pud_t *) table;
 }
 
 static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 {
-   if (!mm_pud_folded(mm))
-   crst_table_free(mm, (unsigned long *) pud);
+   if (mm_pud_folded(mm))
+   return;
+
+   pagetable_pud_dtor(virt_to_ptdesc(pud));
+   crst_table_free(mm, (unsigned long *) pud);
 }
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long vmaddr)
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index e95b2c8081eb8..b946964afce8e 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -110,24 +110,6 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmd,
tlb_remove_ptdesc(tlb, pmd);
 }
 
-/*
- * p4d_free_tlb frees a pud table and clears the CRSTE for the
- * region second table entry from the tlb.
- * If the mm uses a four level page table the single p4d is freed
- * as the pgd. p4d_free_tlb checks the asce_limit against 8PB
- * to avoid the double free of the p4d in this case.
- */
-static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
-   unsigned long address)
-{
-   if (mm_p4d_folded(tlb->mm))
-   return;
-   __tlb_adjust_range(tlb, address, PAGE_SIZE);
-   tlb->mm->context.flush_mm = 1;
-   tlb->freed_tables = 1;
-   tlb_remove_ptdesc(tlb, p4d);
-}
-
 /*
  * pud_free_tlb frees a pud table and clears the CRSTE for the
  * region third table entry from the tlb.
@@ -140,11 +122,30 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 {
if (mm_pud_folded(tlb->mm))
return;
+   pagetable_pud_dtor(virt_to_ptdesc(pud));
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_p4ds = 1;
tlb_remove_ptdesc(tlb, pud);
 }
 
+/*
+ * p4d_free_tlb frees a p4d table and clears the CRSTE for the
+ * region second table entry from the tlb.
+ * If the mm uses a four level page table the single p4d is freed
+ * as the pgd. p4d_free_tlb checks the asce_limit against 8PB
+ * to avoid the double free of the p4d in this case.
+ */
+static inline void p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
+   unsigned long address)
+{
+   if (mm_p4d_folded(tlb->mm))
+   return;
+   pagetable_p4d_dtor(virt_to_ptdesc(p4d));
+   __tlb_adjust_range(tlb, address, PAGE_SIZE);
+   tlb->mm->context.flush_mm = 1;
+   tlb->freed_tables = 1;
+   tlb_remove_ptdesc(tlb, p4d);
+}
 
 #endif /* _S390_TLB_H */
-- 
2.20.1




[PATCH v4 11/15] x86: pgtable: move pagetable_dtor() to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately but the page table pages are freed later via RCU.

Page tables shouldn't have swap cache, so use pagetable_free() instead of
free_page_and_swap_cache() to free page table pages.

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: x...@kernel.org
---
 arch/x86/include/asm/tlb.h | 17 ++---
 arch/x86/kernel/paravirt.c |  1 +
 arch/x86/mm/pgtable.c  | 12 ++--
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 73f0786181cc9..f64730be5ad67 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -31,24 +31,27 @@ static inline void tlb_flush(struct mmu_gather *tlb)
  */
 static inline void __tlb_remove_table(void *table)
 {
-   free_page_and_swap_cache(table);
+   struct ptdesc *ptdesc = (struct ptdesc *)table;
+
+   pagetable_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 #ifdef CONFIG_PT_RECLAIM
 static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = container_of(head, struct page, rcu_head);
-   put_page(page);
+   ptdesc = container_of(head, struct ptdesc, pt_rcu_head);
+   __tlb_remove_table(ptdesc);
 }
 
 static inline void __tlb_remove_table_one(void *table)
 {
-   struct page *page;
+   struct ptdesc *ptdesc;
 
-   page = table;
-   call_rcu(&page->rcu_head, __tlb_remove_table_one_rcu);
+   ptdesc = table;
+   call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
 }
 #define __tlb_remove_table_one __tlb_remove_table_one
 #endif /* CONFIG_PT_RECLAIM */
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 7bdcf152778c0..46d5d325483b0 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -62,6 +62,7 @@ void __init native_pv_lock_init(void)
 #ifndef CONFIG_PT_RECLAIM
 static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
+   pagetable_dtor(table);
tlb_remove_page(tlb, table);
 }
 #else
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a6cd9660e29ec..a0b0e501ba663 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -23,6 +23,7 @@ EXPORT_SYMBOL(physical_mask);
 static inline
 void paravirt_tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
+   pagetable_dtor(table);
tlb_remove_page(tlb, table);
 }
 #else
@@ -60,7 +61,6 @@ early_param("userpte", setup_userpte);
 
 void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
-   pagetable_dtor(page_ptdesc(pte));
paravirt_release_pte(page_to_pfn(pte));
paravirt_tlb_remove_table(tlb, pte);
 }
@@ -68,7 +68,6 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 #if CONFIG_PGTABLE_LEVELS > 2
 void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 {
-   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
/*
 * NOTE! For PAE, any changes to the top page-directory-pointer-table
@@ -77,16 +76,12 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 #ifdef CONFIG_X86_PAE
tlb->need_flush_all = 1;
 #endif
-   pagetable_dtor(ptdesc);
-   paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
+   paravirt_tlb_remove_table(tlb, virt_to_page(pmd));
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
-   struct ptdesc *ptdesc = virt_to_ptdesc(pud);
-
-   pagetable_dtor(ptdesc);
paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
paravirt_tlb_remove_table(tlb, virt_to_page(pud));
 }
@@ -94,9 +89,6 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
-   struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
-
-   pagetable_dtor(ptdesc);
paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
 }
-- 
2.20.1




[PATCH v4 13/15] mm: pgtable: introduce generic __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Several architectures (arm, arm64, riscv and x86) define exactly the
same __tlb_remove_table(), so introduce a generic __tlb_remove_table() to
eliminate this duplication.

The s390 __tlb_remove_table() is nearly the same, so convert s390 to the
generic version as well.

Signed-off-by: Qi Zheng 
---
 arch/arm/include/asm/tlb.h  |  9 -
 arch/arm64/include/asm/tlb.h|  7 ---
 arch/powerpc/include/asm/tlb.h  |  1 +
 arch/riscv/include/asm/tlb.h| 12 
 arch/s390/include/asm/tlb.h |  9 -
 arch/s390/mm/pgalloc.c  |  7 ---
 arch/sparc/include/asm/tlb_32.h |  1 +
 arch/sparc/include/asm/tlb_64.h |  1 +
 arch/x86/include/asm/tlb.h  | 17 -
 include/asm-generic/tlb.h   | 15 +--
 10 files changed, 20 insertions(+), 59 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 264ab635e807a..ea4fbe7b17f6f 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -27,15 +27,6 @@
 #else /* !CONFIG_MMU */
 
 #include 
-
-static inline void __tlb_remove_table(void *_table)
-{
-   struct ptdesc *ptdesc = (struct ptdesc *)_table;
-
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
-}
-
 #include 
 
 static inline void
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 93591a80b5bfb..8d762607285cc 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -10,13 +10,6 @@
 
 #include 
 
-static inline void __tlb_remove_table(void *_table)
-{
-   struct ptdesc *ptdesc = (struct ptdesc *)_table;
-
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
-}
 
 #define tlb_flush tlb_flush
 static void tlb_flush(struct mmu_gather *tlb);
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index 1ca7d4c4b90db..2058e8d3e0138 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -37,6 +37,7 @@ extern void tlb_flush(struct mmu_gather *tlb);
  */
 #define tlb_needs_table_invalidate()   radix_enabled()
 
+#define __HAVE_ARCH_TLB_REMOVE_TABLE
 /* Get the generic bits... */
 #include 
 
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
index ded8724b3c4f7..50b63b5c15bd8 100644
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -10,18 +10,6 @@ struct mmu_gather;
 
 static void tlb_flush(struct mmu_gather *tlb);
 
-#ifdef CONFIG_MMU
-
-static inline void __tlb_remove_table(void *table)
-{
-   struct ptdesc *ptdesc = (struct ptdesc *)table;
-
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
-}
-
-#endif /* CONFIG_MMU */
-
 #define tlb_flush tlb_flush
 #include 
 
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 79df7c0932c56..da4a7d175f69c 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -22,7 +22,6 @@
  * Pages used for the page tables is a different story. FIXME: more
  */
 
-void __tlb_remove_table(void *_table);
 static inline void tlb_flush(struct mmu_gather *tlb);
 static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
struct page *page, bool delay_rmap, int page_size);
@@ -87,7 +86,7 @@ static inline void pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t pte,
tlb->cleared_pmds = 1;
if (mm_alloc_pgste(tlb->mm))
gmap_unlink(tlb->mm, (unsigned long *)pte, address);
-   tlb_remove_ptdesc(tlb, pte);
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pte));
 }
 
 /*
@@ -106,7 +105,7 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmd,
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_puds = 1;
-   tlb_remove_ptdesc(tlb, pmd);
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pmd));
 }
 
 /*
@@ -124,7 +123,7 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_p4ds = 1;
-   tlb_remove_ptdesc(tlb, pud);
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(pud));
 }
 
 /*
@@ -142,7 +141,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, 
p4d_t *p4d,
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
-   tlb_remove_ptdesc(tlb, p4d);
+   tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
 }
 
 #endif /* _S390_TLB_H */
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index c73b89811a264..3e002dea6278f 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -193,13 +193,6 @@ void page_table_free(struct mm_struct *mm, unsigned long 
*table)
pagetable_dtor_free(ptdesc);
 }
 
-void __tlb_remove_table(void *table)
-{
-   struct ptdesc *ptdesc = virt_to_ptdesc(table);
-
-   pagetable_dtor_free(ptdesc);
-}
-
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void pte_free_now(struct rcu_head *head)
 {
diff --git a/arch/sparc
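
The asm-generic/tlb.h side of this patch is not visible in this truncated
copy. Based on the per-arch copies removed above and the hunk context shown
in patch 15, the generic version is essentially (a sketch):

```
/*
 * Generic helper added to include/asm-generic/tlb.h; architectures such as
 * powerpc opt out via __HAVE_ARCH_TLB_REMOVE_TABLE, as the hunk above shows.
 */
#ifndef __HAVE_ARCH_TLB_REMOVE_TABLE
static inline void __tlb_remove_table(void *table)
{
	struct ptdesc *ptdesc = (struct ptdesc *)table;

	pagetable_dtor(ptdesc);
	pagetable_free(ptdesc);
}
#endif
```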

[PATCH v4 12/15] s390: pgtable: also move pagetable_dtor() of PxD to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
To unify the PxD and PTE TLB free path, also move the pagetable_dtor() of
PMD|PUD|P4D to __tlb_remove_table().

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: linux-s...@vger.kernel.org
---
 arch/s390/include/asm/tlb.h |  3 ---
 arch/s390/mm/pgalloc.c  | 14 --
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 74b6fba4c2ee3..79df7c0932c56 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -102,7 +102,6 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmd,
 {
if (mm_pmd_folded(tlb->mm))
return;
-   pagetable_dtor(virt_to_ptdesc(pmd));
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
@@ -122,7 +121,6 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pud,
 {
if (mm_pud_folded(tlb->mm))
return;
-   pagetable_dtor(virt_to_ptdesc(pud));
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
tlb->cleared_p4ds = 1;
@@ -141,7 +139,6 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, 
p4d_t *p4d,
 {
if (mm_p4d_folded(tlb->mm))
return;
-   pagetable_dtor(virt_to_ptdesc(p4d));
__tlb_adjust_range(tlb, address, PAGE_SIZE);
tlb->mm->context.flush_mm = 1;
tlb->freed_tables = 1;
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 569de24d33761..c73b89811a264 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -180,7 +180,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
return table;
 }
 
-static void pagetable_pte_dtor_free(struct ptdesc *ptdesc)
+static void pagetable_dtor_free(struct ptdesc *ptdesc)
 {
pagetable_dtor(ptdesc);
pagetable_free(ptdesc);
@@ -190,20 +190,14 @@ void page_table_free(struct mm_struct *mm, unsigned long 
*table)
 {
struct ptdesc *ptdesc = virt_to_ptdesc(table);
 
-   pagetable_pte_dtor_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 void __tlb_remove_table(void *table)
 {
struct ptdesc *ptdesc = virt_to_ptdesc(table);
-   struct page *page = ptdesc_page(ptdesc);
 
-   if (compound_order(page) == CRST_ALLOC_ORDER) {
-   /* pmd, pud, or p4d */
-   pagetable_free(ptdesc);
-   return;
-   }
-   pagetable_pte_dtor_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -211,7 +205,7 @@ static void pte_free_now(struct rcu_head *head)
 {
struct ptdesc *ptdesc = container_of(head, struct ptdesc, pt_rcu_head);
 
-   pagetable_pte_dtor_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
-- 
2.20.1




[PATCH v4 10/15] riscv: pgtable: move pagetable_dtor() to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately but the page table pages are freed later via RCU.

Page tables shouldn't have swap cache, so use pagetable_free() instead of
free_page_and_swap_cache() to free page table pages.

By the way, move the comment above __tlb_remove_table() to
riscv_tlb_remove_ptdesc(), where it is more appropriate.

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: linux-ri...@lists.infradead.org
---
 arch/riscv/include/asm/pgalloc.h | 38 ++--
 arch/riscv/include/asm/tlb.h | 14 
 2 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index b6793c5c99296..c8907b8317115 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -15,12 +15,22 @@
 #define __HAVE_ARCH_PUD_FREE
 #include 
 
+/*
+ * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
+ * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
+ * SBI to perform TLB shootdown. To keep software pagetable walkers safe in 
this
+ * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
+ * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in 
include/asm-generic/tlb.h
+ * for more details.
+ */
 static inline void riscv_tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
 {
-   if (riscv_use_sbi_for_rfence())
+   if (riscv_use_sbi_for_rfence()) {
tlb_remove_ptdesc(tlb, pt);
-   else
+   } else {
+   pagetable_dtor(pt);
tlb_remove_page_ptdesc(tlb, pt);
+   }
 }
 
 static inline void pmd_populate_kernel(struct mm_struct *mm,
@@ -97,23 +107,15 @@ static inline void pud_free(struct mm_struct *mm, pud_t 
*pud)
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud,
  unsigned long addr)
 {
-   if (pgtable_l4_enabled) {
-   struct ptdesc *ptdesc = virt_to_ptdesc(pud);
-
-   pagetable_dtor(ptdesc);
-   riscv_tlb_remove_ptdesc(tlb, ptdesc);
-   }
+   if (pgtable_l4_enabled)
+   riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(pud));
 }
 
 static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d,
  unsigned long addr)
 {
-   if (pgtable_l5_enabled) {
-   struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
-
-   pagetable_dtor(ptdesc);
+   if (pgtable_l5_enabled)
riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d));
-   }
 }
 #endif /* __PAGETABLE_PMD_FOLDED */
 
@@ -142,10 +144,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd,
  unsigned long addr)
 {
-   struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
-
-   pagetable_dtor(ptdesc);
-   riscv_tlb_remove_ptdesc(tlb, ptdesc);
+   riscv_tlb_remove_ptdesc(tlb, virt_to_ptdesc(pmd));
 }
 
 #endif /* __PAGETABLE_PMD_FOLDED */
@@ -153,10 +152,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmd,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
  unsigned long addr)
 {
-   struct ptdesc *ptdesc = page_ptdesc(pte);
-
-   pagetable_dtor(ptdesc);
-   riscv_tlb_remove_ptdesc(tlb, ptdesc);
+   riscv_tlb_remove_ptdesc(tlb, page_ptdesc(pte));
 }
 #endif /* CONFIG_MMU */
 
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
index 1f6c38420d8e0..ded8724b3c4f7 100644
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -11,19 +11,13 @@ struct mmu_gather;
 static void tlb_flush(struct mmu_gather *tlb);
 
 #ifdef CONFIG_MMU
-#include 
 
-/*
- * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to
- * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use
- * SBI to perform TLB shootdown. To keep software pagetable walkers safe in 
this
- * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the
- * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in 
include/asm-generic/tlb.h
- * for more details.
- */
 static inline void __tlb_remove_table(void *table)
 {
-   free_page_and_swap_cache(table);
+   struct ptdesc *ptdesc = (struct ptdesc *)table;
+
+   pagetable_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 #endif /* CONFIG_MMU */
-- 
2.20.1




[PATCH v4 09/15] arm64: pgtable: move pagetable_dtor() to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately but the page table pages are freed later via RCU.

Page tables shouldn't have swap cache, so use pagetable_free() instead of
free_page_and_swap_cache() to free page table pages.

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: linux-arm-ker...@lists.infradead.org
---
 arch/arm64/include/asm/tlb.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 408d0f36a8a8f..93591a80b5bfb 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -9,11 +9,13 @@
 #define __ASM_TLB_H
 
 #include 
-#include 
 
 static inline void __tlb_remove_table(void *_table)
 {
-   free_page_and_swap_cache((struct page *)_table);
+   struct ptdesc *ptdesc = (struct ptdesc *)_table;
+
+   pagetable_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 #define tlb_flush tlb_flush
@@ -82,7 +84,6 @@ static inline void __pte_free_tlb(struct mmu_gather *tlb, 
pgtable_t pte,
 {
struct ptdesc *ptdesc = page_ptdesc(pte);
 
-   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 
@@ -92,7 +93,6 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 {
struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
@@ -106,7 +106,6 @@ static inline void __pud_free_tlb(struct mmu_gather *tlb, 
pud_t *pudp,
if (!pgtable_l4_enabled())
return;
 
-   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
@@ -120,7 +119,6 @@ static inline void __p4d_free_tlb(struct mmu_gather *tlb, 
p4d_t *p4dp,
if (!pgtable_l5_enabled())
return;
 
-   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 }
 #endif
-- 
2.20.1




[PATCH v4 15/15] mm: pgtable: introduce generic pagetable_dtor_free()

2024-12-30 Thread Qi Zheng
The pte_free(), pmd_free(), __pud_free() and __p4d_free() in
asm-generic/pgalloc.h and the generic __tlb_remove_table() are basically
the same, so let's introduce pagetable_dtor_free() to deduplicate them.

In addition, the pagetable_dtor_free() in s390 does the same thing, so let
s390 also call the generic pagetable_dtor_free().

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
---
 arch/s390/mm/pgalloc.c|  6 --
 include/asm-generic/pgalloc.h | 12 
 include/asm-generic/tlb.h |  3 +--
 include/linux/mm.h|  6 ++
 4 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index 3e002dea6278f..a4e7619020931 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -180,12 +180,6 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
return table;
 }
 
-static void pagetable_dtor_free(struct ptdesc *ptdesc)
-{
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
-}
-
 void page_table_free(struct mm_struct *mm, unsigned long *table)
 {
struct ptdesc *ptdesc = virt_to_ptdesc(table);
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 4afb346eae255..e3977ddca15e4 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -109,8 +109,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
 {
struct ptdesc *ptdesc = page_ptdesc(pte_page);
 
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 
@@ -153,8 +152,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
*pmd)
struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
 
BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 #endif
 
@@ -202,8 +200,7 @@ static inline void __pud_free(struct mm_struct *mm, pud_t 
*pud)
struct ptdesc *ptdesc = virt_to_ptdesc(pud);
 
BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PUD_FREE
@@ -248,8 +245,7 @@ static inline void __p4d_free(struct mm_struct *mm, p4d_t 
*p4d)
struct ptdesc *ptdesc = virt_to_ptdesc(p4d);
 
BUG_ON((unsigned long)p4d & (PAGE_SIZE-1));
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_P4D_FREE
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 69de47c7ef3c5..a96d4b440f3da 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -213,8 +213,7 @@ static inline void __tlb_remove_table(void *table)
 {
struct ptdesc *ptdesc = (struct ptdesc *)table;
 
-   pagetable_dtor(ptdesc);
-   pagetable_free(ptdesc);
+   pagetable_dtor_free(ptdesc);
 }
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cad11fa10c192..94078c488e904 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3001,6 +3001,12 @@ static inline void pagetable_dtor(struct ptdesc *ptdesc)
lruvec_stat_sub_folio(folio, NR_PAGETABLE);
 }
 
+static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
+{
+   pagetable_dtor(ptdesc);
+   pagetable_free(ptdesc);
+}
+
 static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
 {
struct folio *folio = ptdesc_folio(ptdesc);
-- 
2.20.1




[PATCH v4 14/15] mm: pgtable: move __tlb_remove_table_one() in x86 to generic file

2024-12-30 Thread Qi Zheng
The __tlb_remove_table_one() in x86 does not contain architecture-specific
content, so move it to the generic file.

Signed-off-by: Qi Zheng 
---
 arch/x86/include/asm/tlb.h | 19 ---
 mm/mmu_gather.c| 20 ++--
 2 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/tlb.h b/arch/x86/include/asm/tlb.h
index 3858dbf75880e..77f52bc1578a7 100644
--- a/arch/x86/include/asm/tlb.h
+++ b/arch/x86/include/asm/tlb.h
@@ -20,25 +20,6 @@ static inline void tlb_flush(struct mmu_gather *tlb)
flush_tlb_mm_range(tlb->mm, start, end, stride_shift, 
tlb->freed_tables);
 }
 
-#ifdef CONFIG_PT_RECLAIM
-static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
-{
-   struct ptdesc *ptdesc;
-
-   ptdesc = container_of(head, struct ptdesc, pt_rcu_head);
-   __tlb_remove_table(ptdesc);
-}
-
-static inline void __tlb_remove_table_one(void *table)
-{
-   struct ptdesc *ptdesc;
-
-   ptdesc = table;
-   call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
-}
-#define __tlb_remove_table_one __tlb_remove_table_one
-#endif /* CONFIG_PT_RECLAIM */
-
 static inline void invlpg(unsigned long addr)
 {
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 1e21022bcf339..7aa6f18c500b2 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -311,13 +311,29 @@ static inline void tlb_table_invalidate(struct mmu_gather 
*tlb)
}
 }
 
-#ifndef __tlb_remove_table_one
+#ifdef CONFIG_PT_RECLAIM
+static inline void __tlb_remove_table_one_rcu(struct rcu_head *head)
+{
+   struct ptdesc *ptdesc;
+
+   ptdesc = container_of(head, struct ptdesc, pt_rcu_head);
+   __tlb_remove_table(ptdesc);
+}
+
+static inline void __tlb_remove_table_one(void *table)
+{
+   struct ptdesc *ptdesc;
+
+   ptdesc = table;
+   call_rcu(&ptdesc->pt_rcu_head, __tlb_remove_table_one_rcu);
+}
+#else
 static inline void __tlb_remove_table_one(void *table)
 {
tlb_remove_table_sync_one();
__tlb_remove_table(table);
 }
-#endif
+#endif /* CONFIG_PT_RECLAIM */
 
 static void tlb_remove_table_one(void *table)
 {
-- 
2.20.1




Re: [PATCH 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU

2024-12-30 Thread Gautam Menghani
On Sun, Dec 22, 2024 at 07:32:32PM +0530, Vaibhav Jain wrote:
> Introduce a new PMU named 'kvm-hv' to report Book3s kvm-hv specific
> performance counters. This will expose KVM-HV specific performance
> attributes to user-space via kernel's PMU infrastructure and would enable
> users to monitor active kvm-hv based guests.
> 
> The patch creates necessary scaffolding to for the new PMU callbacks and
> introduces two new exports kvmppc_{,un}register_pmu() that are called from
> kvm-hv init and exit function to perform initialize and cleanup for the
> 'kvm-hv' PMU. The patch doesn't introduce any perf-events yet, which will
> be introduced in later patches
> 
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/include/asm/kvm_book3s.h |  12 +++
>  arch/powerpc/kvm/Makefile |   6 ++
>  arch/powerpc/kvm/book3s_hv.c  |   7 ++
>  arch/powerpc/kvm/book3s_hv_pmu.c  | 133 ++
>  4 files changed, 158 insertions(+)
>  create mode 100644 arch/powerpc/kvm/book3s_hv_pmu.c
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index e1ff291ba891..cf91a1493159 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -334,6 +334,9 @@ static inline bool kvmhv_is_nestedv1(void)
>   return !static_branch_likely(&__kvmhv_is_nestedv2);
>  }
>  
> +int kvmppc_register_pmu(void);
> +void kvmppc_unregister_pmu(void);
> +
>  #else
>  
>  static inline bool kvmhv_is_nestedv2(void)
> @@ -346,6 +349,15 @@ static inline bool kvmhv_is_nestedv1(void)
>   return false;
>  }
>  
> +static int kvmppc_register_pmu(void)
> +{
> + return 0;
> +}
> +
> +static void kvmppc_unregister_pmu(void)
> +{
> +}
> +
>  #endif
>  
>  int __kvmhv_nestedv2_reload_ptregs(struct kvm_vcpu *vcpu, struct pt_regs 
> *regs);
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 4bd9d1230869..094c3916d9d0 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -92,6 +92,12 @@ kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) 
> += \
>   $(kvm-book3s_64-builtin-tm-objs-y) \
>   $(kvm-book3s_64-builtin-xics-objs-y)
>  
> +# enable kvm_hv perf events
> +ifdef CONFIG_HAVE_PERF_EVENTS
> +kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \
> + book3s_hv_pmu.o
> +endif
> +
>  obj-$(CONFIG_GUEST_STATE_BUFFER_TEST) += test-guest-state-buffer.o
>  endif
>  
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 25429905ae90..83bcce2fb557 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -6662,6 +6662,12 @@ static int kvmppc_book3s_init_hv(void)
>   return r;
>   }
>  
> + r = kvmppc_register_pmu();
> + if (r) {
> + pr_err("KVM-HV: Unable to register PMUs %d\n", r);
> + goto err;
> + }
> +
>   kvm_ops_hv.owner = THIS_MODULE;
>   kvmppc_hv_ops = &kvm_ops_hv;
>  
> @@ -6676,6 +6682,7 @@ static int kvmppc_book3s_init_hv(void)
>  
>  static void kvmppc_book3s_exit_hv(void)
>  {
> + kvmppc_unregister_pmu();
>   kvmppc_uvmem_free();
>   kvmppc_free_host_rm_ops();
>   if (kvmppc_radix_possible())
> diff --git a/arch/powerpc/kvm/book3s_hv_pmu.c 
> b/arch/powerpc/kvm/book3s_hv_pmu.c
> new file mode 100644
> index ..e72542d5e750
> --- /dev/null
> +++ b/arch/powerpc/kvm/book3s_hv_pmu.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Description: PMUs specific to running nested KVM-HV guests
> + * on Book3S processors (specifically POWER9 and later).
> + */
> +
> +#define pr_fmt(fmt)  "kvmppc-pmu: " fmt
> +
> +#include "asm-generic/local64.h"
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +enum kvmppc_pmu_eventid {
> + KVMPPC_EVENT_MAX,
> +};
> +
> +static struct attribute *kvmppc_pmu_events_attr[] = {
> + NULL,
> +};
> +
> +static const struct attribute_group kvmppc_pmu_events_group = {
> + .name = "events",
> + .attrs = kvmppc_pmu_events_attr,
> +};
> +
> +PMU_FORMAT_ATTR(event, "config:0");
> +static struct attribute *kvmppc_pmu_format_attr[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group kvmppc_pmu_format_group = {
> + .name = "format",
> + .attrs = kvmppc_pmu_format_attr,
> +};
> +
> +static const struct attribute_group *kvmppc_pmu_attr_groups[] = {
> + &kvmppc_pmu_events_group,
> + &kvmppc_pmu_format_group,
> + NULL,
> +};
> +
> +static int kvmppc_pmu_event_init(struct perf_event *event)
> +{
> + unsigned int config = event->attr.config;
> +
> + pr_debug("%s: Event(%p) id=%llu cpu=%x on_cpu=%x config=%u",
> +  __func__, event, event->id, event->cpu,
> +  event->oncpu, config);

[PATCH v4 08/15] arm: pgtable: move pagetable_dtor() to __tlb_remove_table()

2024-12-30 Thread Qi Zheng
Move pagetable_dtor() to __tlb_remove_table(), so that ptlock and page
table pages can be freed together (regardless of whether RCU is used).
This prevents the use-after-free problem where the ptlock is freed
immediately while the page table pages are freed later via RCU.

Page tables shouldn't have swap cache, so use pagetable_free() instead of
free_page_and_swap_cache() to free page table pages.

Signed-off-by: Qi Zheng 
Suggested-by: Peter Zijlstra (Intel) 
Cc: linux-arm-ker...@lists.infradead.org
---
 arch/arm/include/asm/tlb.h | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index ef79bf1e8563f..264ab635e807a 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -26,12 +26,14 @@
 
 #else /* !CONFIG_MMU */
 
-#include 
 #include 
 
 static inline void __tlb_remove_table(void *_table)
 {
-   free_page_and_swap_cache((struct page *)_table);
+   struct ptdesc *ptdesc = (struct ptdesc *)_table;
+
+   pagetable_dtor(ptdesc);
+   pagetable_free(ptdesc);
 }
 
 #include 
@@ -41,8 +43,6 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
unsigned long addr)
 {
struct ptdesc *ptdesc = page_ptdesc(pte);
 
-   pagetable_dtor(ptdesc);
-
 #ifndef CONFIG_ARM_LPAE
/*
 * With the classic ARM MMU, a pte page has two corresponding pmd
@@ -61,7 +61,6 @@ __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned 
long addr)
 #ifdef CONFIG_ARM_LPAE
struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
 
-   pagetable_dtor(ptdesc);
tlb_remove_ptdesc(tlb, ptdesc);
 #endif
 }
-- 
2.20.1




Re: [PATCH v2 0/3] sysfs: constify bin_attribute argument of sysfs_bin_attr_simple_read()

2024-12-30 Thread Alexei Starovoitov
On Sat, Dec 28, 2024 at 12:43 AM Thomas Weißschuh  wrote:
>
> Most users use this function through the BIN_ATTR_SIMPLE* macros,
> they can handle the switch transparently.
>
> This series is meant to be merged through the driver core tree.

hmm. why?

I'd rather take patches 2 and 3 into bpf-next to avoid
potential conflicts.
Patch 1 looks orthogonal and independent.



[PATCH v12 1/5] modules: Support extended MODVERSIONS info

2024-12-30 Thread Matthew Maurer
Adds a new format for MODVERSIONS that stores each field in a separate
ELF section. This initially adds support for variable-length names, but
could later be used to add additional fields to MODVERSIONS in a
backwards-compatible way if needed. Any new fields will be ignored by
old user tooling, unlike the current format, where user tooling cannot
tolerate adjustments (for example, making the name field longer).

Since PPC munges its version records to strip leading dots, we reproduce
the munging for the new format. Other architectures do not appear to
have architecture-specific usage of this information.

Reviewed-by: Sami Tolvanen 
Signed-off-by: Matthew Maurer 
---
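Illustrative usage (not part of the patch): a consumer holding the
module's load_info would walk the parallel CRC/name columns roughly as
below; the loop body here is hypothetical.

	struct modversion_info_ext version;

	for_each_modversion_info_ext(version, info) {
		/*
		 * version.name is a NUL-terminated string of arbitrary
		 * length; version.crc points at the CRC stored at the
		 * same index in the CRC section.
		 */
		pr_debug("import %s, crc 0x%08x\n",
			 version.name, *version.crc);
	}
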
 arch/powerpc/kernel/module_64.c | 24 ++-
 kernel/module/internal.h| 11 +
 kernel/module/main.c| 92 +
 kernel/module/version.c | 45 
 4 files changed, 162 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 
45dac7b46aa3cdcb2058a2320b88c0d67e5586b3..34a5aec4908fba3b91a02e914264cb525918942a
 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -369,6 +369,24 @@ static void dedotify_versions(struct modversion_info *vers,
}
 }
 
+/* Same as normal versions, remove a leading dot if present. */
+static void dedotify_ext_version_names(char *str_seq, unsigned long size)
+{
+   unsigned long out = 0;
+   unsigned long in;
+   char last = '\0';
+
+   for (in = 0; in < size; in++) {
+   /* Skip one leading dot */
+   if (last == '\0' && str_seq[in] == '.')
+   in++;
+   last = str_seq[in];
+   str_seq[out++] = last;
+   }
+   /* Zero the trailing portion of the names table for robustness */
+   memset(&str_seq[out], 0, size - out);
+}
+
 /*
  * Undefined symbols which refer to .funcname, hack to funcname. Make .TOC.
  * seem to be defined (value set later).
@@ -438,10 +456,12 @@ int module_frob_arch_sections(Elf64_Ehdr *hdr,
me->arch.toc_section = i;
if (sechdrs[i].sh_addralign < 8)
sechdrs[i].sh_addralign = 8;
-   }
-   else if (strcmp(secstrings+sechdrs[i].sh_name,"__versions")==0)
+   } else if (strcmp(secstrings + sechdrs[i].sh_name, 
"__versions") == 0)
dedotify_versions((void *)hdr + sechdrs[i].sh_offset,
  sechdrs[i].sh_size);
+   else if (strcmp(secstrings + sechdrs[i].sh_name, 
"__version_ext_names") == 0)
+   dedotify_ext_version_names((void *)hdr + 
sechdrs[i].sh_offset,
+  sechdrs[i].sh_size);
 
if (sechdrs[i].sh_type == SHT_SYMTAB)
dedotify((void *)hdr + sechdrs[i].sh_offset,
diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 
f10dc3ea7ff883b1c91e036b427cdb90502933b8..8ccb922f765cbed85dafee01e074503154ca05d5
 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -86,6 +86,8 @@ struct load_info {
unsigned int vers;
unsigned int info;
unsigned int pcpu;
+   unsigned int vers_ext_crc;
+   unsigned int vers_ext_name;
} index;
 };
 
@@ -389,6 +391,15 @@ void module_layout(struct module *mod, struct 
modversion_info *ver, struct kerne
   struct kernel_symbol *ks, struct tracepoint * const *tp);
 int check_modstruct_version(const struct load_info *info, struct module *mod);
 int same_magic(const char *amagic, const char *bmagic, bool has_crcs);
+struct modversion_info_ext {
+   size_t remaining;
+   const s32 *crc;
+   const char *name;
+};
+void modversion_ext_start(const struct load_info *info, struct 
modversion_info_ext *ver);
+void modversion_ext_advance(struct modversion_info_ext *ver);
+#define for_each_modversion_info_ext(ver, info) \
+   for (modversion_ext_start(info, &ver); ver.remaining > 0; 
modversion_ext_advance(&ver))
 #else /* !CONFIG_MODVERSIONS */
 static inline int check_version(const struct load_info *info,
const char *symname,
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 
59ea20102b400b5acd68347f61eef48a677c77ce..029c91c01cd0f637910446c7efa3b9a776d95e72
 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2073,6 +2073,82 @@ static int elf_validity_cache_index_str(struct load_info 
*info)
return 0;
 }
 
+/**
+ * elf_validity_cache_index_versions() - Validate and cache version indices
+ * @info:  Load info to cache version indices in.
+ * Must have &load_info->sechdrs and &load_info->secstrings populated.
+ * @flags: Load flags, relevant to suppress version loading, see
+ * uapi/linux

[PATCH v12 0/5] Extended MODVERSIONS Support

2024-12-30 Thread Matthew Maurer
This patch series is intended for use alongside the Implement DWARF
modversions series [1] to enable RUST and MODVERSIONS at the same
time.

Elsewhere, we've seen a desire to support long LTO symbol names [2],
and the previous series came up [3] as a possible alternative to
hashing, to which some have objected [4].

This series adds a MODVERSIONS format that uses a section per column.
This avoids breaking userspace tools if we need to make a similar change
to the format in the future: we would do so by adding a new section
rather than editing the struct definition. In the new format, the name
section is formatted as a concatenated sequence of NUL-terminated
strings, which allows for arbitrary-length names.
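
As an illustration only (hypothetical import list and values), the two
columns for a module importing two symbols end up laid out as:

	/*
	 *   __version_ext_crcs : 0x8a2f1c3d, 0x5b6e9740
	 *   __version_ext_names: "foo\0an_arbitrarily_long_symbol_name\0"
	 *
	 * Entry i in the CRC column pairs with the i-th NUL-terminated
	 * name, so names of any length fit without changing the record
	 * size of the CRC column.
	 */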

Emitting the extended format is guarded by CONFIG_EXTENDED_MODVERSIONS,
but the kernel always knows how to validate both the original and
extended formats.

Emitting the existing format is now guarded by CONFIG_BASIC_MODVERSIONS,
but it is enabled by default when MODVERSIONS is enabled and must be
explicitly disabled by the user.

Disabling CONFIG_BASIC_MODVERSIONS may cause some userspace tools to be
unable to retrieve CRCs until they are patched to understand the new
location. Even with CONFIG_BASIC_MODVERSIONS enabled, those tools will
be unable to read the CRCs for long symbols until they are updated to
read the new format. This is not expected to interfere with normal
operation, as the primary use for CRCs embedded in the module is
load-time verification by the kernel. Recording and monitoring of CRCs
is typically done through Module.symvers.

Selecting RUST and MODVERSIONS is now possible if GENDWARFKSYMS is
selected, and will implicitly select EXTENDED_MODVERSIONS.

This series depends upon DWARF-based versions [1] and Masahiro's u32
fixup patch [5].

[1] 
https://lore.kernel.org/lkml/20241219210736.2990838-20-samitolva...@google.com/ 

[2] https://lore.kernel.org/lkml/20240605032120.3179157-1-s...@kernel.org/
[3] https://lore.kernel.org/lkml/zoxbeesk40asi...@bombadil.infradead.org/
[4] https://lore.kernel.org/lkml/0b2697fd-7ab4-469f-83a6-ec9ebc701...@suse.com/
[5] 
https://lore.kernel.org/linux-kbuild/20241228154603.2234284-1-masahi...@kernel.org

Changes in v12:
- Rebased on top of Masahiro's cleanup patch
- Switched modpost to Masahiro's new types, including using u32 instead
  of s32
- Eliminated comment noise per Masahiro's suggestion
- Fixed typo in patch 3 commit message
- Set default of BASIC_MODVERSIONS to y instead of MODVERSIONS, per
  Masahiro's suggestion

v11: 
https://lore.kernel.org/r/20241223-extended-modversions-v11-0-221d184ee...@google.com
- Fixed documentation about where strings are stored per Petr's
  suggestion.
- Rebased onto the latest version of Sami's series on linux-next

v10: 
https://lore.kernel.org/r/20241123-extended-modversions-v10-0-0fa754ffd...@google.com
- Fixed accidental selects / default confusion in previous patch
- Re-ran tests (check for section presence in Y/Y, Y/N, N/Y, N/N, check
  all module kinds load)

v9: 
https://lore.kernel.org/r/20241123-extended-modversions-v9-0-bc0403f05...@google.com
- Rebased onto the latest version of Sami's series, on top of linux-next
- Added BASIC_MODVERSIONS to allow using *only* EXTENDED_MODVERSIONS
- Documented where symbol data is stored and format limitations

v8: 
https://lore.kernel.org/r/20241030-extended-modversions-v8-0-93acdef62...@google.com
- Rebased onto latest version of Sami's series, on top of v6.12-rc5
- Pass --stable when KBUILD_GENDWARFKSYMS_STABLE is set.
- Flipped MODVERSIONS/GENDWARFKSYMS order in deps for CONFIG_RUST
- Picked up trailers

v7: 
https://lore.kernel.org/r/20241023-extended-modversions-v7-0-339787b43...@google.com
- Fix modpost to detect EXTENDED_MODVERSIONS based on a flag
- Drop patches to fix export_report.pl
- Switch from conditional compilation in .mod.c to conditional emission
  in modpost
- Factored extended modversion emission into its own function
- Allow RUST + MODVERSIONS if GENDWARFKSYMS is enabled by selecting
  EXTENDED_MODVERSIONS

v6: https://lore.kernel.org/lkml/20241015231925.3854230-1-mmau...@google.com/
- Splits verification refactor Luis requested out to a separate change
- Clarifies commits around export_report.pl repairs
- Add CONFIG_EXTENDED_MODVERSIONS to control whether extended
  information is included in the module, per Luis's request.

v5: https://lore.kernel.org/all/20240925233854.90072-1-mmau...@google.com/
- Addresses Sami's comments from v3 that I missed in v4 (missing early
  return, extra parens)

v4: https://lore.kernel.org/asahi/20240924212024.540574-1-mmau...@google.com/
- Fix incorrect dot munging in PPC

v3: https://lore.kernel.org/lkml/87le0w2hop.fsf@mail.lhotse/T/
- Split up the module verification refactor into smaller patches, per
  Greg K-H's suggestion.

v2: https://lore.kernel.org/all/20231118025748.2778044-1-mmau...@google.com/
- Add loading/verification refactor before modifying, per Luis's request

v1: 

[PATCH v12 2/5] modpost: Produce extended MODVERSIONS information

2024-12-30 Thread Matthew Maurer
Generate both the existing modversions format and the new extended one
when running modpost. Presence of this metadata in the final .ko is
guarded by CONFIG_EXTENDED_MODVERSIONS.

We no longer generate an error on long symbols in modpost if
CONFIG_EXTENDED_MODVERSIONS is set, as they can now be appropriately
encoded in the extended section. Such symbols are simply skipped in the
basic encoding. An error is still generated if
CONFIG_EXTENDED_MODVERSIONS is not set.

Reviewed-by: Sami Tolvanen 
Signed-off-by: Matthew Maurer 
---
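For reference, the stanza this adds to the generated .mod.c looks
roughly like the following (symbol names and CRC values are made up for
illustration):

	static const u32 version_ext_crcs[]
	__used __section("__version_ext_crcs") = {
		0x8a2f1c3d,
		0x5b6e9740,
	};
	static const char version_ext_names[]
	__used __section("__version_ext_names") =
		"some_short_symbol\0"
		"an_arbitrarily_long_mangled_symbol_name_over_the_old_limit\0"
	;

Entry i of version_ext_crcs pairs with the i-th NUL-terminated string in
version_ext_names, which is why a symbol skipped in one table must also
be skipped in the other.
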
 kernel/module/Kconfig| 10 
 scripts/Makefile.modpost |  1 +
 scripts/mod/modpost.c| 62 
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 
d443fc504ffca0d1001f880ec496ab1f21fe979e..9568b629a03ce8289d3f3597eefc66fc96445720
 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -207,6 +207,16 @@ config ASM_MODVERSIONS
  assembly. This can be enabled only when the target architecture
  supports it.
 
+config EXTENDED_MODVERSIONS
+   bool "Extended Module Versioning Support"
+   depends on MODVERSIONS
+   help
+ This enables extended MODVERSIONs support, allowing long symbol
+ names to be versioned.
+
+ The most likely reason you would enable this is to enable Rust
+ support. If unsure, say N.
+
 config MODULE_SRCVERSION_ALL
bool "Source checksum for all modules"
help
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 
ab0e94ea62496e11dbaa3ffc289ce546862795ca..40426fc6350985780c0092beb49c6cc29b9eff62
 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -43,6 +43,7 @@ MODPOST = $(objtree)/scripts/mod/modpost
 modpost-args = 
\
$(if $(CONFIG_MODULES),-M)  
\
$(if $(CONFIG_MODVERSIONS),-m)  
\
+   $(if $(CONFIG_EXTENDED_MODVERSIONS),-x) 
\
$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a)
\
$(if $(CONFIG_SECTION_MISMATCH_WARN_ONLY),,-E)  
\
$(if $(KBUILD_MODPOST_WARN),-w) 
\
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 
e489983bb8b2850c0f95bcbdfd82f684d4e7f0c3..6324b30f6b97ac24dc517b9229f227c6c369f7d5
 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -33,6 +33,8 @@ static bool module_enabled;
 static bool modversions;
 /* Is CONFIG_MODULE_SRCVERSION_ALL set? */
 static bool all_versions;
+/* Is CONFIG_EXTENDED_MODVERSIONS set? */
+static bool extended_modversions;
 /* If we are modposting external module set to 1 */
 static bool external_module;
 /* Only warn about unresolved symbols */
@@ -1804,6 +1806,49 @@ static void add_exported_symbols(struct buffer *buf, 
struct module *mod)
}
 }
 
+/**
+ * Record CRCs for unresolved symbols, supporting long names
+ */
+static void add_extended_versions(struct buffer *b, struct module *mod)
+{
+   struct symbol *s;
+
+   if (!extended_modversions)
+   return;
+
+   buf_printf(b, "\n");
+   buf_printf(b, "static const u32 version_ext_crcs[]\n");
+   buf_printf(b, "__used __section(\"__version_ext_crcs\") = {\n");
+   list_for_each_entry(s, &mod->unresolved_symbols, list) {
+   if (!s->module)
+   continue;
+   if (!s->crc_valid) {
+   warn("\"%s\" [%s.ko] has no CRC!\n",
+   s->name, mod->name);
+   continue;
+   }
+   buf_printf(b, "\t0x%08x,\n", s->crc);
+   }
+   buf_printf(b, "};\n");
+
+   buf_printf(b, "static const char version_ext_names[]\n");
+   buf_printf(b, "__used __section(\"__version_ext_names\") =\n");
+   list_for_each_entry(s, &mod->unresolved_symbols, list) {
+   if (!s->module)
+   continue;
+   if (!s->crc_valid)
+   /*
+* We already warned on this when producing the crc
+* table.
+* We need to skip its name too, as the indexes in
+* both tables need to align.
+*/
+   continue;
+   buf_printf(b, "\t\"%s\\0\"\n", s->name);
+   }
+   buf_printf(b, ";\n");
+}
+
 /**
  * Record CRCs for unresolved symbols
  **/
@@ -1827,9 +1872,14 @@ static void add_versions(struct buffer *b, struct module 
*mod)
continue;
}
if (strlen(s->name) >= MODULE_NAME_LEN) {
-   error("too long symbol \"%

[PATCH v12 4/5] Documentation/kbuild: Document storage of symbol information

2024-12-30 Thread Matthew Maurer
Document where information about exported and imported symbols is
stored, the available format options, and their limitations.

Signed-off-by: Matthew Maurer 
---
 Documentation/kbuild/modules.rst | 20 
 1 file changed, 20 insertions(+)

diff --git a/Documentation/kbuild/modules.rst b/Documentation/kbuild/modules.rst
index 
101de236cd0c9abe1f5684d80063ff3f9a7fc673..a42f00d8cb90ff6ee44677c1278287ef25a84c89
 100644
--- a/Documentation/kbuild/modules.rst
+++ b/Documentation/kbuild/modules.rst
@@ -423,6 +423,26 @@ Symbols From the Kernel (vmlinux + modules)
1) It lists all exported symbols from vmlinux and all modules.
2) It lists the CRC if CONFIG_MODVERSIONS is enabled.
 
+Version Information Formats
+---
+
+   Exported symbols have information stored in __ksymtab or __ksymtab_gpl
+   sections. Symbol names and namespaces are stored in __ksymtab_strings,
+   using a format similar to the string table used for ELF. If
+   CONFIG_MODVERSIONS is enabled, the CRCs corresponding to exported
+   symbols will be added to the __kcrctab or __kcrctab_gpl.
+
+   If CONFIG_BASIC_MODVERSIONS is enabled (default with
+   CONFIG_MODVERSIONS), imported symbols will have their symbol name and
+   CRC stored in the __versions section of the importing module. This
+   mode only supports symbols of length up to 64 bytes.
+
+   If CONFIG_EXTENDED_MODVERSIONS is enabled (required to enable both
+   CONFIG_MODVERSIONS and CONFIG_RUST at the same time), imported symbols
+   will have their symbol name recorded in the __version_ext_names
+   section as a series of concatenated, null-terminated strings. CRCs for
+   these symbols will be recorded in the __version_ext_crcs section.
+
 Symbols and External Modules
 
 

-- 
2.47.1.613.gc27f4b7a9f-goog




[PATCH v12 3/5] modules: Allow extended modversions without basic MODVERSIONS

2024-12-30 Thread Matthew Maurer
If you know that your kernel modules will only ever be loaded by a
kernel new enough to understand the extended format, you can disable
BASIC_MODVERSIONS to save space. This also allows easy creation of test
modules to see how tooling will respond to modules that only have the
new format.

Signed-off-by: Matthew Maurer 
---
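As an illustration, a configuration that emits only the extended format
(and whose modules an older kernel may therefore treat as unversioned)
would look something like:

	CONFIG_MODULES=y
	CONFIG_MODVERSIONS=y
	# CONFIG_BASIC_MODVERSIONS is not set
	CONFIG_EXTENDED_MODVERSIONS=y
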
 kernel/module/Kconfig| 15 +++
 scripts/Makefile.modpost |  1 +
 scripts/mod/modpost.c|  9 +++--
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/module/Kconfig b/kernel/module/Kconfig
index 
9568b629a03ce8289d3f3597eefc66fc96445720..4538f3af63e1ca531d0f74ef45a6f5268e505aec
 100644
--- a/kernel/module/Kconfig
+++ b/kernel/module/Kconfig
@@ -217,6 +217,21 @@ config EXTENDED_MODVERSIONS
  The most likely reason you would enable this is to enable Rust
  support. If unsure, say N.
 
+config BASIC_MODVERSIONS
+   bool "Basic Module Versioning Support"
+   depends on MODVERSIONS
+   default y
+   help
+ This enables basic MODVERSIONS support, allowing older tools or
+ kernels to potentially load modules.
+
+ Disabling this may cause older `modprobe` or `kmod` to be unable
+ to read MODVERSIONS information from built modules. With this
+ disabled, older kernels may treat this module as unversioned.
+
+ This is enabled by default when MODVERSIONS are enabled.
+ If unsure, say Y.
+
 config MODULE_SRCVERSION_ALL
bool "Source checksum for all modules"
help
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 
40426fc6350985780c0092beb49c6cc29b9eff62..d7d45067d08b94a82451d66a64eae29b6826e139
 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -43,6 +43,7 @@ MODPOST = $(objtree)/scripts/mod/modpost
 modpost-args = 
\
$(if $(CONFIG_MODULES),-M)  
\
$(if $(CONFIG_MODVERSIONS),-m)  
\
+   $(if $(CONFIG_BASIC_MODVERSIONS),-b)
\
$(if $(CONFIG_EXTENDED_MODVERSIONS),-x) 
\
$(if $(CONFIG_MODULE_SRCVERSION_ALL),-a)
\
$(if $(CONFIG_SECTION_MISMATCH_WARN_ONLY),,-E)  
\
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 
6324b30f6b97ac24dc517b9229f227c6c369f7d5..3784f1e08104dc2ca1da10d45ed92bb8adf4826a
 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -33,6 +33,8 @@ static bool module_enabled;
 static bool modversions;
 /* Is CONFIG_MODULE_SRCVERSION_ALL set? */
 static bool all_versions;
+/* Is CONFIG_BASIC_MODVERSIONS set? */
+static bool basic_modversions;
 /* Is CONFIG_EXTENDED_MODVERSIONS set? */
 static bool extended_modversions;
 /* If we are modposting external module set to 1 */
@@ -1856,7 +1858,7 @@ static void add_versions(struct buffer *b, struct module 
*mod)
 {
struct symbol *s;
 
-   if (!modversions)
+   if (!basic_modversions)
return;
 
buf_printf(b, "\n");
@@ -2176,7 +2178,7 @@ int main(int argc, char **argv)
LIST_HEAD(dump_lists);
struct dump_list *dl, *dl2;
 
-   while ((opt = getopt(argc, argv, "ei:MmnT:to:au:WwENd:x")) != -1) {
+   while ((opt = getopt(argc, argv, "ei:MmnT:to:au:WwENd:xb")) != -1) {
switch (opt) {
case 'e':
external_module = true;
@@ -2225,6 +2227,9 @@ int main(int argc, char **argv)
case 'd':
missing_namespace_deps = optarg;
break;
+   case 'b':
+   basic_modversions = true;
+   break;
case 'x':
extended_modversions = true;
break;

-- 
2.47.1.613.gc27f4b7a9f-goog




[PATCH v12 5/5] rust: Use gendwarfksyms + extended modversions for CONFIG_MODVERSIONS

2024-12-30 Thread Matthew Maurer
From: Sami Tolvanen 

Previously, two things stopped Rust from using MODVERSIONS:
1. Rust symbols are occasionally too long to be represented in the
   original versions table
2. Rust types cannot be properly hashed by the existing genksyms
   approach because:
* Looking up type definitions in Rust is more complex than in C
* Type layout is potentially dependent on the compiler in Rust,
  not just the source type declaration.

CONFIG_EXTENDED_MODVERSIONS addresses the first point, and
CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow
it to do so by selecting both features.

Signed-off-by: Sami Tolvanen 
Co-developed-by: Matthew Maurer 
Signed-off-by: Matthew Maurer 
---
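Expanding the make variables, cmd_gendwarfksyms amounts to roughly the
following pipeline for each versioned Rust object (object path shown for
illustration; --stable/--symtypes are added only when configured):

	nm -p --defined-only rust/kernel.o \
		| awk '$2~/(T|R|D|B)/ && $3!~/__cfi/ { printf "%s\n", $3 }' \
		| scripts/gendwarfksyms/gendwarfksyms rust/kernel.o \
		>> rust/.kernel.o.cmd

i.e. the object's exported symbols are listed with nm, filtered, and fed
to gendwarfksyms, which derives their versions from the object's DWARF
and appends the result to that object's .cmd file.
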
 init/Kconfig  |  3 ++-
 rust/Makefile | 34 --
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 
c1f9eb3d5f2e892e977ba1425599502dc830f552..b60acfd9431e0ac2bf401ecb6523b5104ad31150
 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1959,7 +1959,8 @@ config RUST
bool "Rust support"
depends on HAVE_RUST
depends on RUST_IS_AVAILABLE
-   depends on !MODVERSIONS
+   select EXTENDED_MODVERSIONS if MODVERSIONS
+   depends on !MODVERSIONS || GENDWARFKSYMS
depends on !GCC_PLUGIN_RANDSTRUCT
depends on !RANDSTRUCT
depends on !DEBUG_INFO_BTF || PAHOLE_HAS_LANG_EXCLUDE
diff --git a/rust/Makefile b/rust/Makefile
index 
a40a3936126d603836e0ec9b42a1285916b60e45..80f970ad81f7989afe5ff2b5f633f50feb7f6006
 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -329,10 +329,11 @@ $(obj)/bindings/bindings_helpers_generated.rs: private 
bindgen_target_extra = ;
 $(obj)/bindings/bindings_helpers_generated.rs: $(src)/helpers/helpers.c FORCE
$(call if_changed_dep,bindgen)
 
+rust_exports = $(NM) -p --defined-only $(1) | awk '$$2~/(T|R|D|B)/ && 
$$3!~/__cfi/ { printf $(2),$(3) }'
+
 quiet_cmd_exports = EXPORTS $@
   cmd_exports = \
-   $(NM) -p --defined-only $< \
-   | awk '$$2~/(T|R|D|B)/ && $$3!~/__cfi/ {printf 
"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3}' > $@
+   $(call rust_exports,$<,"EXPORT_SYMBOL_RUST_GPL(%s);\n",$$3) > $@
 
 $(obj)/exports_core_generated.h: $(obj)/core.o FORCE
$(call if_changed,exports)
@@ -401,11 +402,36 @@ ifneq ($(or $(CONFIG_ARM64),$(and 
$(CONFIG_RISCV),$(CONFIG_64BIT))),)
__ashlti3 __lshrti3
 endif
 
+ifdef CONFIG_MODVERSIONS
+cmd_gendwarfksyms = $(if $(skip_gendwarfksyms),, \
+   $(call rust_exports,$@,"%s\n",$$3) | \
+   scripts/gendwarfksyms/gendwarfksyms \
+   $(if $(KBUILD_GENDWARFKSYMS_STABLE), --stable) \
+   $(if $(KBUILD_SYMTYPES), --symtypes $(@:.o=.symtypes),) \
+   $@ >> $(dot-target).cmd)
+endif
+
 define rule_rustc_library
$(call cmd_and_fixdep,rustc_library)
$(call cmd,gen_objtooldep)
+   $(call cmd,gendwarfksyms)
 endef
 
+define rule_rust_cc_library
+   $(call if_changed_rule,cc_o_c)
+   $(call cmd,force_checksrc)
+   $(call cmd,gendwarfksyms)
+endef
+
+# helpers.o uses the same export mechanism as Rust libraries, so ensure symbol
+# versions are calculated for the helpers too.
+$(obj)/helpers/helpers.o: $(src)/helpers/helpers.c $(recordmcount_source) FORCE
+   +$(call if_changed_rule,rust_cc_library)
+
+# Disable symbol versioning for exports.o to avoid conflicts with the actual
+# symbol versions generated from Rust objects.
+$(obj)/exports.o: private skip_gendwarfksyms = 1
+
 $(obj)/core.o: private skip_clippy = 1
 $(obj)/core.o: private skip_flags = -Wunreachable_pub
 $(obj)/core.o: private rustc_objcopy = $(foreach 
sym,$(redirect-intrinsics),--redefine-sym $(sym)=__rust$(sym))
@@ -417,13 +443,16 @@ ifneq ($(or $(CONFIG_X86_64),$(CONFIG_X86_32)),)
 $(obj)/core.o: scripts/target.json
 endif
 
+$(obj)/compiler_builtins.o: private skip_gendwarfksyms = 1
 $(obj)/compiler_builtins.o: private rustc_objcopy = -w -W '__*'
 $(obj)/compiler_builtins.o: $(src)/compiler_builtins.rs $(obj)/core.o FORCE
+$(call if_changed_rule,rustc_library)
 
+$(obj)/build_error.o: private skip_gendwarfksyms = 1
 $(obj)/build_error.o: $(src)/build_error.rs $(obj)/compiler_builtins.o FORCE
+$(call if_changed_rule,rustc_library)
 
+$(obj)/ffi.o: private skip_gendwarfksyms = 1
 $(obj)/ffi.o: $(src)/ffi.rs $(obj)/compiler_builtins.o FORCE
+$(call if_changed_rule,rustc_library)
 
@@ -435,6 +464,7 @@ $(obj)/bindings.o: $(src)/bindings/lib.rs \
+$(call if_changed_rule,rustc_library)
 
 $(obj)/uapi.o: private rustc_target_flags = --extern ffi
+$(obj)/uapi.o: private skip_gendwarfksyms = 1
 $(obj)/uapi.o: $(src)/uapi/lib.rs \
 $(obj)/ffi.o \
 $(obj)/uapi/uapi_generated.rs FORCE

-- 
2.47.1.613.gc27f4b7a9f-goog




Re: [PATCH v4 00/15] move pagetable_*_dtor() to __tlb_remove_table()

2024-12-30 Thread Andrew Morton
On Mon, 30 Dec 2024 17:07:35 +0800 Qi Zheng  wrote:

> Changes in v4:
>  - remove [PATCH v3 15/17] and [PATCH v3 16/17] (Mike Rapoport)
>(the tlb_remove_page_ptdesc() and tlb_remove_ptdesc() are intermediate
> products of the project: https://kernelnewbies.org/MatthewWilcox/Memdescs,
> so keep them)
>  - collect Acked-by

Thanks, I've updated mm.git to v5.