Stefan Berger <stef...@linux.ibm.com> writes:

> I bisected Linux between 6.13.0 and 6.12.0 due to failing kexec on a 
> Power8 baremetal host on 6.13.0:
>
> 8fec58f503b296af87ffca3898965e3054f2b616 is the first bad commit
> commit 8fec58f503b296af87ffca3898965e3054f2b616
> Author: Ritesh Harjani (IBM) <ritesh.l...@gmail.com>
> Date:   Fri Oct 18 22:59:50 2024 +0530
>
>      book3s64/hash: Add kfence functionality
>
>      Now that linear map functionality of debug_pagealloc is made generic,
>      enable kfence to use this generic infrastructure.
>
>      1. Define kfence related linear map variables.
>         - u8 *linear_map_kf_hash_slots;
>         - unsigned long linear_map_kf_hash_count;
>         - DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
>      2. The linear map size allocated in RMA region is quite small
>         (KFENCE_POOL_SIZE >> PAGE_SHIFT) which is 512 bytes by default.
>      3. kfence pool memory is reserved using memblock_phys_alloc(), which
>         can come from anywhere.
>         (default 255 objects => ((1+255) * 2) << PAGE_SHIFT = 32MB)
>      4. The hash slot information for kfence memory gets added in linear map
>         in hash_linear_map_add_slot() (which also adds for debug_pagealloc).
>
>      Reported-by: Pavithra Prakash <pavra...@linux.vnet.ibm.com>
>      Signed-off-by: Ritesh Harjani (IBM) <ritesh.l...@gmail.com>
>      Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
>      Link: 
> https://patch.msgid.link/5c2b61941b344077a2b8654dab46efa0322af3af.1729271995.git.ritesh.l...@gmail.com
>
>   arch/powerpc/include/asm/kfence.h     |   5 ---
>   arch/powerpc/mm/book3s64/hash_utils.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>   2 files changed, 149 insertions(+), 18 deletions(-)
>
>
> Reverting part of this patch by applying the following changes to 6.13.0 
> resolves the issue:

Sorry for the delay in getting back to this; we have been going back and
forth on a few other work priorities. Nevertheless, Aboorva (cc'd)
helped analyze and has root-caused the reported problem. Let me
summarize the findings. Aboorva, please add anything I missed here:

1. The reported issue is not related to the above-mentioned patch. It
can also be reproduced on v6.12 and older kernels, which do not have
this series (with CONFIG_KFENCE=y).

2. The issue occurs during kexec_sequence(), i.e.
kexec_copy_flush() -> copy_segments() -> copy_page(dest, addr). Note
that the dest address passed to copy_page() is obtained at kexec load
time.

The root cause is that this dest address falls inside a kfence region,
which triggers the page fault.

Now, per kexec_sequence(), we must not take a page fault on that path
after local_paca->data_offset has been changed to a poison value. We do
disable the MMU in most other cases, except for Hash on pSeries. That
is also why the issue is only seen with Hash on pSeries and not with
Radix.


On debugging further, we found that kexec on powerpc locates the memory
region using "for_each_mem_range_rev()" rather than
"for_each_free_mem_range_reverse()", i.e.

- Ƒ __locate_mem_hole_top_down
  - Ƒ locate_mem_hole_top_down_ppc64
    - Ƒ arch_kexec_locate_mem_hole
      - Ƒ kexec_add_buffer
        - Ƒ kexec_purgatory_setup_kbuf
          - Ƒ kexec_load_purgatory
            + Ƒ elf64_load


IMO, it should be something like the diff below, so that we avoid using
a region which is already in use by someone else, e.g. kfence.

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index dc65c1391157..771b7dbaae0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -66,7 +66,7 @@ static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
        phys_addr_t start, end;
        u64 i;
 
-       for_each_mem_range_rev(i, &start, &end) {
+       for_each_free_mem_range_reverse(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
                /*
                 * memblock uses [start, end) convention while it is
                 * [start, end] here. Fix the off-by-one to have the
@@ -165,7 +165,7 @@ static int __locate_mem_hole_bottom_up(struct kexec_buf *kbuf,
        phys_addr_t start, end;
        u64 i;
 
-       for_each_mem_range(i, &start, &end) {
+       for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
                /*
                 * memblock uses [start, end) convention while it is
                 * [start, end] here. Fix the off-by-one to have the

3. In the latest v6.15-rc1 release, these custom arch-specific
functions ("__locate_mem_hole_top_down()" etc.) have been removed [1]
(for a different reason, though). Aboorva also verified that the issue
is no longer seen on v6.15-rc1.

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e5250eaa665fac681926251a8d7f5f418103399


@Stefan,
Can you give v6.15-rc1 a try to confirm that the issue is indeed fixed
at your end too? No custom changes are needed for this; v6.15-rc1 with
kfence enabled on Hash should not show any issues during kexec.


Next steps:
- We still need to work out how to fix this in the older stable
  kernels. Can we backport the necessary patches from this series [2],
  or do we need a separate change to backport?

[2]: 
https://lore.kernel.org/all/20250131113830.925179-5-sourabhj...@linux.ibm.com/


-ritesh




>
>
> diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
> index 1f7cab58ab2c..e7981719313c 100644
> --- a/arch/powerpc/include/asm/kfence.h
> +++ b/arch/powerpc/include/asm/kfence.h
> @@ -10,6 +10,7 @@
>
>   #include <linux/mm.h>
>   #include <asm/pgtable.h>
> +#include <asm/mmu.h>
>
>   #ifdef CONFIG_PPC64_ELF_ABI_V1
>   #define ARCH_FUNC_PREFIX "."
> @@ -25,6 +26,10 @@ static inline void disable_kfence(void)
>
>   static inline bool arch_kfence_init_pool(void)
>   {
> +#ifdef CONFIG_PPC64
> +       if (!radix_enabled())
> +               return false;
> +#endif
>          return !kfence_disabled;
>   }
