Stefan Berger <stef...@linux.ibm.com> writes:

> I bisected Linux between 6.13.0 and 6.12.0 due to failing kexec on a
> Power8 baremetal host on 6.13.0:
>
> 8fec58f503b296af87ffca3898965e3054f2b616 is the first bad commit
> commit 8fec58f503b296af87ffca3898965e3054f2b616
> Author: Ritesh Harjani (IBM) <ritesh.l...@gmail.com>
> Date:   Fri Oct 18 22:59:50 2024 +0530
>
>     book3s64/hash: Add kfence functionality
>
>     Now that linear map functionality of debug_pagealloc is made generic,
>     enable kfence to use this generic infrastructure.
>
>     1. Define kfence related linear map variables.
>        - u8 *linear_map_kf_hash_slots;
>        - unsigned long linear_map_kf_hash_count;
>        - DEFINE_RAW_SPINLOCK(linear_map_kf_hash_lock);
>     2. The linear map size allocated in RMA region is quite small
>        (KFENCE_POOL_SIZE >> PAGE_SHIFT) which is 512 bytes by default.
>     3. kfence pool memory is reserved using memblock_phys_alloc() which can
>        come from anywhere.
>        (default 255 objects => ((1+255) * 2) << PAGE_SHIFT = 32MB)
>     4. The hash slot information for kfence memory gets added in linear map
>        in hash_linear_map_add_slot() (which also adds for debug_pagealloc).
>
>     Reported-by: Pavithra Prakash <pavra...@linux.vnet.ibm.com>
>     Signed-off-by: Ritesh Harjani (IBM) <ritesh.l...@gmail.com>
>     Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
>     Link: https://patch.msgid.link/5c2b61941b344077a2b8654dab46efa0322af3af.1729271995.git.ritesh.l...@gmail.com
>
>  arch/powerpc/include/asm/kfence.h     |   5 ---
>  arch/powerpc/mm/book3s64/hash_utils.c | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 149 insertions(+), 18 deletions(-)
>
> Reverting part of this patch by applying the following changes to 6.13.0
> resolves the issue:
Sorry for the delay in getting back to this; we have been going back and
forth on a few other work priorities. Aboorva (cc'd) helped analyze and
root-cause the reported problem. Let me summarize the findings. Aboorva,
please add anything I missed:

1. The reported issue is not related to the above-mentioned patch. It can
   also be reproduced on v6.12 and older kernels, which do not have this
   series (with CONFIG_KFENCE=y).

2. The failure happens during kexec_sequence(), i.e. kexec_copy_flush()
   -> copy_segments() -> copy_page(dest, addr). Note that the dest
   address passed to copy_page() is determined at kexec load time.

   The root cause is that this dest address falls inside a kfence region,
   which triggers a page fault. But as per kexec_sequence(), we must not
   take a page fault on that path once local_paca->data_offset has been
   set to a poison value. We do disable the MMU for most other
   configurations, except for Hash on pseries, which is also why the
   issue is only seen with Hash on pseries and not with Radix.

   On debugging further, we found that kexec on ppc tries to find the
   memory region using for_each_mem_range_rev() rather than
   for_each_free_mem_range_reverse(), i.e. the call chain (innermost
   function first):

     __locate_mem_hole_top_down
       locate_mem_hole_top_down_ppc64
         arch_kexec_locate_mem_hole
           kexec_add_buffer
             kexec_purgatory_setup_kbuf
               kexec_load_purgatory
                 elf64_load

   IMO, it should be something like the diff below, so that we avoid
   handing out a region that is already in use by someone else, e.g.
   kfence (see the sketch after the diff for why the iterator choice
   matters).

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index dc65c1391157..771b7dbaae0b 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -66,7 +66,7 @@ static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
 	phys_addr_t start, end;
 	u64 i;
 
-	for_each_mem_range_rev(i, &start, &end) {
+	for_each_free_mem_range_reverse(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
 		/*
 		 * memblock uses [start, end) convention while it is
 		 * [start, end] here. Fix the off-by-one to have the
@@ -165,7 +165,7 @@ static int __locate_mem_hole_bottom_up(struct kexec_buf *kbuf,
 	phys_addr_t start, end;
 	u64 i;
 
-	for_each_mem_range(i, &start, &end) {
+	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
 		/*
 		 * memblock uses [start, end) convention while it is
 		 * [start, end] here. Fix the off-by-one to have the
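To make the difference concrete, here is a minimal sketch of a top-down
hole search over only free memblock ranges. This is not the actual kexec
code; the helper name, parameters and fit check below are mine:

#include <linux/memblock.h>
#include <linux/numa.h>

/*
 * Sketch: find the highest free physical range that can fit a buffer of
 * @size bytes at a power-of-2 @align. Returns 0 if nothing fits.
 */
static phys_addr_t find_hole_top_down(phys_addr_t size, phys_addr_t align)
{
	phys_addr_t start, end;
	u64 i;

	/*
	 * This iterates memblock.memory minus memblock.reserved, so a
	 * range reserved via memblock_phys_alloc() (e.g. the kfence
	 * pool) is never handed back here. for_each_mem_range_rev(), in
	 * contrast, walks memblock.memory alone and would still return
	 * such a range.
	 */
	for_each_free_mem_range_reverse(i, NUMA_NO_NODE, MEMBLOCK_NONE,
					&start, &end, NULL) {
		phys_addr_t base;

		if (end - start < size)
			continue;

		/* end is exclusive; place the buffer at the top of the range */
		base = round_down(end - size, align);
		if (base >= start)
			return base;
	}

	return 0;
}

Note that this keeps memblock's [start, end) convention; the actual kexec
code converts to the inclusive [start, end] it uses internally, hence the
off-by-one comment in the hunks above.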
3. In the latest v6.15-rc1 release, these custom arch-specific functions
   (__locate_mem_hole_top_down() etc.) have been removed [1] (for a
   different reason, though). Aboorva also verified that the issue is no
   longer seen on v6.15-rc1.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e5250eaa665fac681926251a8d7f5f418103399

@Stefan, can you give v6.15-rc1 a try to confirm that the issue is indeed
fixed at your end too? No custom changes are needed for this; v6.15-rc1
with kfence enabled on Hash should not hit any issues during kexec.

Next steps:
- We still need to work out how this should be fixed in the older stable
  kernels. Can we backport the necessary patches from this series [2], or
  do we need a separate change for the backport?

[2]: https://lore.kernel.org/all/20250131113830.925179-5-sourabhj...@linux.ibm.com/

-ritesh

> diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
> index 1f7cab58ab2c..e7981719313c 100644
> --- a/arch/powerpc/include/asm/kfence.h
> +++ b/arch/powerpc/include/asm/kfence.h
> @@ -10,6 +10,7 @@
>  
>  #include <linux/mm.h>
>  #include <asm/pgtable.h>
> +#include <asm/mmu.h>
>  
>  #ifdef CONFIG_PPC64_ELF_ABI_V1
>  #define ARCH_FUNC_PREFIX "."
> @@ -25,6 +26,10 @@ static inline void disable_kfence(void)
>  
>  static inline bool arch_kfence_init_pool(void)
>  {
> +#ifdef CONFIG_PPC64
> +	if (!radix_enabled())
> +		return false;
> +#endif
>  	return !kfence_disabled;
>  }
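For context, with the partial revert above applied, arch_kfence_init_pool()
reads roughly as follows (reassembled from the quoted diff, comment mine):

static inline bool arch_kfence_init_pool(void)
{
#ifdef CONFIG_PPC64
	/* Hash MMU, i.e. !radix: refuse to set up the kfence pool */
	if (!radix_enabled())
		return false;
#endif
	return !kfence_disabled;
}

i.e. kfence never initializes its pool under the hash MMU, so there is no
kfence region left for the kexec destination to collide with, at the cost
of losing kfence coverage on Hash entirely.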