Re: [PATCH v5 11/17] x86: pgtable: convert __tlb_remove_table() to use struct ptdesc
On 08/01/2025 07:57, Qi Zheng wrote:
> Convert __tlb_remove_table() to use struct ptdesc, which will help to move
> pagetable_dtor() to __tlb_remove_table().
>
> And page tables shouldn't have swap cache, so use pagetable_free() instead
> of free_page_and_swap_cache() to free page table pages.
>
> Signed-off-by: Qi Zheng

Definitely a good idea to have split patch 11 from v4.

Reviewed-by: Kevin Brodsky

- Kevin
[PATCH v3 20/28] powerpc/ftrace: Use RCU in all users of __module_text_address().
__module_text_address() can be invoked within an RCU section, so there is
no requirement to have preemption disabled. Replace the preempt_disable()
section around __module_text_address() with RCU.

Cc: Christophe Leroy
Cc: Madhavan Srinivasan
Cc: Mark Rutland
Cc: Masami Hiramatsu
Cc: Michael Ellerman
Cc: Naveen N Rao
Cc: Nicholas Piggin
Cc: Steven Rostedt
Cc: linux-trace-ker...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Sebastian Andrzej Siewior
---
 arch/powerpc/kernel/trace/ftrace.c       | 6 ++----
 arch/powerpc/kernel/trace/ftrace_64_pg.c | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 5ccd791761e8f..558d7f4e4bea6 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -115,10 +115,8 @@ static unsigned long ftrace_lookup_module_stub(unsigned long ip, unsigned long a
 {
 	struct module *mod = NULL;
 
-	preempt_disable();
-	mod = __module_text_address(ip);
-	preempt_enable();
-
+	scoped_guard(rcu)
+		mod = __module_text_address(ip);
 	if (!mod)
 		pr_err("No module loaded at addr=%lx\n", ip);
 
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 98787376eb87c..531d40f10c8a1 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -120,10 +120,8 @@ static struct module *ftrace_lookup_module(struct dyn_ftrace *rec)
 {
 	struct module *mod;
 
-	preempt_disable();
-	mod = __module_text_address(rec->ip);
-	preempt_enable();
-
+	scoped_guard(rcu)
+		mod = __module_text_address(rec->ip);
 	if (!mod)
 		pr_err("No module loaded at addr=%lx\n", rec->ip);
-- 
2.47.1
[PATCH v3 01/28] module: Extend the preempt disabled section in dereference_symbol_descriptor().
dereference_symbol_descriptor() needs to obtain the module pointer
belonging to the pointer in order to resolve that pointer. The returned
mod pointer is obtained under RCU-sched/preempt_disable() guarantees and
needs to be used within this section to ensure that the module is not
removed in the meantime.

Extend the preempt_disable() section to also cover
dereference_module_function_descriptor().

Fixes: 04b8eb7a4ccd9 ("symbol lookup: introduce dereference_symbol_descriptor()")
Cc: James E.J. Bottomley
Cc: Christophe Leroy
Cc: Helge Deller
Cc: Madhavan Srinivasan
Cc: Michael Ellerman
Cc: Naveen N Rao
Cc: Nicholas Piggin
Cc: Sergey Senozhatsky
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Sergey Senozhatsky
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/kallsyms.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
index c3f075e8f60cb..1c6a6c1704d8d 100644
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -57,10 +57,10 @@ static inline void *dereference_symbol_descriptor(void *ptr)
 
 	preempt_disable();
 	mod = __module_address((unsigned long)ptr);
-	preempt_enable();
 
 	if (mod)
 		ptr = dereference_module_function_descriptor(mod, ptr);
+	preempt_enable();
 #endif
 	return ptr;
 }
-- 
2.47.1
[PATCH] PCI/AER: Add kernel.aer_print_skip_mask to control aer log
Sometimes certain PCIe devices installed on some servers occasionally
produce a large number of AER correctable error logs, which is quite
annoying.

Add the sysctl parameter kernel.aer_print_skip_mask to skip printing AER
errors of certain severities. The AER severity can be 0 (NONFATAL),
1 (FATAL) or 2 (CORRECTABLE), and the three low bits of the mask are used
to skip these three severities: setting bit 0 skips printing NONFATAL AER
errors, setting bit 1 skips printing FATAL AER errors, and setting bit 2
skips printing CORRECTABLE AER errors. Multiple bits can be set to skip
multiple severities.

Signed-off-by: Bijie Xu
---
 drivers/pci/pcie/aer.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 80c5ba8d8296..b46973526bcf 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -698,6 +698,7 @@ static void __aer_print_error(struct pci_dev *dev,
 	pci_dev_aer_stats_incr(dev, info);
 }
 
+unsigned int aer_print_skip_mask __read_mostly;
 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 {
 	int layer, agent;
@@ -710,6 +711,9 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 		goto out;
 	}
 
+	if ((1 << info->severity) & aer_print_skip_mask)
+		goto out;
+
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
 	agent = AER_GET_AGENT(info->severity, info->status);
 
@@ -1596,3 +1600,22 @@ int __init pcie_aer_init(void)
 		return -ENXIO;
 	return pcie_port_service_register(&aerdriver);
 }
+
+static const struct ctl_table aer_print_skip_mask_sysctls[] = {
+	{
+		.procname	= "aer_print_skip_mask",
+		.data		= &aer_print_skip_mask,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= &proc_douintvec,
+	},
+	{}
+};
+
+static int __init aer_print_skip_mask_sysctl_init(void)
+{
+	register_sysctl_init("kernel", aer_print_skip_mask_sysctls);
+	return 0;
+}
+
+late_initcall(aer_print_skip_mask_sysctl_init);
-- 
2.25.1
[PATCH] perf machine: Don't ignore _etext when not a text symbol
Depending on how vmlinux.lds is written, _etext might be the very first
data symbol instead of the very last text symbol. Don't require it to be
a text symbol, accept any symbol type.

Fixes: ed9adb2035b5 ("perf machine: Read also the end of the kernel")
Signed-off-by: Christophe Leroy
---
 tools/perf/util/machine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 27d5345d2b30..9be2f4479f52 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1003,7 +1003,7 @@ static int machine__get_running_kernel_start(struct machine *machine,
 
 	err = kallsyms__get_symbol_start(filename, "_edata", &addr);
 	if (err)
-		err = kallsyms__get_function_start(filename, "_etext", &addr);
+		err = kallsyms__get_symbol_start(filename, "_etext", &addr);
 	if (!err)
 		*end = addr;
-- 
2.47.0
Re: [PATCH RESEND v1 1/5] crash: remove an unused argument from reserve_crashkernel_generic()
On 01/08/25 at 03:44pm, Sourabh Jain wrote: > cmdline argument is not used in reserve_crashkernel_generic() so remove > it. Correspondingly, all the callers have been updated as well. > > No functional change intended. > > Cc: Andrew Morton > Cc: Baoquan he > Cc: Hari Bathini > CC: Madhavan Srinivasan > Cc: Michael Ellerman > Cc: ke...@lists.infradead.org > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Sourabh Jain > --- > arch/arm64/mm/init.c | 6 ++ > arch/loongarch/kernel/setup.c | 5 ++--- > arch/riscv/mm/init.c | 6 ++ > arch/x86/kernel/setup.c | 6 ++ > include/linux/crash_reserve.h | 11 +-- > kernel/crash_reserve.c| 9 - > 6 files changed, 17 insertions(+), 26 deletions(-) Acked-by: Baoquan He
Re: [PATCH RESEND v1 5/5] crash: option to let arch decide mem range is usable
On 08/01/25 17:08, Baoquan he wrote:
> On 01/08/25 at 03:44pm, Sourabh Jain wrote:
> ...snip...
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index f0e9f8eda7a3..407f8b0346aa 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -205,6 +205,15 @@ static inline int arch_kimage_file_post_load_cleanup(struct kimage *image)
>>  }
>>  #endif
>>
>> +#ifndef arch_check_excluded_range
>> +static inline int arch_check_excluded_range(struct kimage *image,
>> +					    unsigned long start,
>> +					    unsigned long end)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_KEXEC_SIG
>>  #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION
>>  int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len);
>> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
>> index 3eedb8c226ad..52e1480dbfa1 100644
>> --- a/kernel/kexec_file.c
>> +++ b/kernel/kexec_file.c
>> @@ -464,6 +464,12 @@ static int locate_mem_hole_top_down(unsigned long start, unsigned long end,
>>  			continue;
>>  		}
>>
>> +		/* Make sure this does not conflict exclude range */
>                    ^ Make sure this doesn't conflict with excluded range?
>> +		if (arch_check_excluded_range(image, temp_start, temp_end)) {
>> +			temp_start = temp_start - PAGE_SIZE;
>> +			continue;
>> +		}
>> +
>>  		/* We found a suitable memory range */
>>  		break;
>>  	} while (1);
>> @@ -498,6 +504,12 @@ static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end,
>>  			continue;
>>  		}
>>
>> +		/* Make sure this does not conflict exclude range */
>                    ^ Ditto.

I will update both comments.

Thanks,
Sourabh Jain
Re: [PATCH RESEND v1 2/5] crash: let arch decide crash memory export to iomem_resource
Hello Baoquan,

On 08/01/25 16:55, Baoquan he wrote:
> Hi,
>
> On 01/08/25 at 03:44pm, Sourabh Jain wrote:
>> insert_crashkernel_resources() adds crash memory to iomem_resource if
>> generic crashkernel reservation is enabled on an architecture.
>>
>> On PowerPC, system RAM is added to iomem_resource. See commit
>> c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"). Enabling
>> generic crashkernel reservation on PowerPC leads to a conflict when
>> system RAM is added to iomem_resource, because a part of the system
>> RAM, the crashkernel memory, has already been added to iomem_resource.
>>
>> The next commit in the series, "powerpc/crash: use generic crashkernel
>> reservation", enables generic crashkernel reservation on PowerPC. If
>> the crashkernel is added to iomem_resource, the kernel fails to add
>> system RAM to /proc/iomem and prints the following traces:
>>
>> CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2+
>> ...snip...
>> NIP [c2016b3c] add_system_ram_resources+0xf0/0x15c
>> LR [c2016b34] add_system_ram_resources+0xe8/0x15c
>> Call Trace:
>> [c484bbc0] [c2016b34] add_system_ram_resources+0xe8/0x15c
>> [c484bc20] [c0010a4c] do_one_initcall+0x7c/0x39c
>> [c484bd00] [c2005418] do_initcalls+0x144/0x18c
>> [c484bd90] [c2005714] kernel_init_freeable+0x21c/0x290
>> [c484bdf0] [c00110f4] kernel_init+0x2c/0x1b8
>> [c484be50] [c000dd3c] ret_from_kernel_user_thread+0x14/0x1c
>>
>> To avoid this, an architecture hook is added in
>> insert_crashkernel_resources(), allowing the architecture to decide
>> whether crashkernel memory should be added to iomem_resource.
>
> Have you tried defining HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY in ppc
> to add the crashkernel region to iomem early?

I didn't try, but I think even if I do, the kernel will run into the
same issue, because as soon as the crashkernel is added to iomem there
will be a resource conflict when the PowerPC code tries to add system
RAM to iomem, which happens during subsys_initcall. Regardless, I will
give it a try.

> Now there are two branches in the existing code; adding a hook will
> make three ways.

I agree. I can try updating the powerpc code to not consider
crashkernel memory as an iomem conflict.

Thanks for the review.

- Sourabh Jain
Re: [PATCH v3 5/6] s390/crash: Use note name macros
On Wed, Jan 08, 2025 at 01:53:51PM +0900, Akihiko Odaki wrote:
> On 2025/01/08 1:17, Dave Martin wrote:
> > Hi,
> >
> > On Tue, Jan 07, 2025 at 09:45:56PM +0900, Akihiko Odaki wrote:
> > > Use note name macros to match with the userspace's expectation.
> > >
> > > Signed-off-by: Akihiko Odaki
> > > ---
> > >  arch/s390/kernel/crash_dump.c | 62 ---
> > >  1 file changed, 23 insertions(+), 39 deletions(-)
> > >
> > > diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> >
> > [...]
> > > +#define NT_INIT(buf, type, desc) \
> > > +	(nt_init_name((buf), NT_ ## type, &(desc), sizeof(desc), NN_ ## type))

[...]

> > (Note also, the outer parentheses and the parentheses around (buf)
> > appear redundant -- although harmless?)
>
> They only make a difference in trivial corner cases and may look
> needlessly verbose.

(In case there was a misunderstanding here, I meant that some
parentheses can be removed without affecting correctness:

#define NT_INIT(buf, type, desc) \
	nt_init_name(buf, NT_ ## type, &(desc), sizeof(desc), NN_ ## type)

It still doesn't matter though -- and some people do prefer to be
defensive anyway and err on the side of having too many parentheses
rather than too few.)

[...]

Cheers
---Dave
Re: [PATCH v8 0/7] PCI: Consolidate TLP Log reading and printing
On Wed, Dec 18, 2024 at 04:37:40PM +0200, Ilpo Järvinen wrote:
> This series has the remaining patches of the AER & DPC TLP Log handling
> consolidation and now includes a few minor improvements to the earlier
> accepted TLP Logging code.
>
> v8:
> - Added missing parameter to kerneldoc.
> - Dropped last patch due to conflict with the pci_printk() cleanup
>   series (will move the patch into that series).
>
> v7:
> - Explain in commit message reasoning why eetlp_prefix_max stores Max
>   End-End TLP Prefixes value instead of limiting it by the bridge/RP
>   imposed limits
> - Take account TLP Prefix Log Present flag.
> - Align PCI_ERR_CAP_* flags in pci_regs.h
> - Add EE_PREFIX_STR define to be able to take its sizeof() for output
>   char[] sizing.
>
> v6:
> - Preserve "AER:"/"DPC:" prefix on the printed TLP line
> - New patch to add "AER:" also on other lines of the AER error dump
>
> v5:
> - Fix build with AER=y and DPC=n
> - Match kerneldoc and function parameter name
>
> v4:
> - Added patches:
>   - Remove EXPORT of pcie_read_tlp_log()
>   - Moved code to pcie/tlp.c and build only with AER enabled
> - Match variables in prototype and function
> - int -> unsigned int conversion
> - eetlp_prefix_max into own patch
> - struct pcie_tlp_log param consistently called "log" within tlp.c
> - Moved function prototypes into drivers/pci/pci.h
> - Describe AER/DPC differences more clearly in one commit message
>
> v3:
> - Small rewording in a commit message
>
> v2:
> - Don't add EXPORT()s
> - Don't include igxbe changes
> - Don't use pr_cont() as it's incompatible with pci_err() and according
>   to Andy Shevchenko should not be used in the first place
>
> Ilpo Järvinen (7):
>   PCI: Don't expose pcie_read_tlp_log() outside of PCI subsystem
>   PCI: Move TLP Log handling to own file
>   PCI: Make pcie_read_tlp_log() signature same
>   PCI: Use unsigned int i in pcie_read_tlp_log()
>   PCI: Store # of supported End-End TLP Prefixes
>   PCI: Add TLP Prefix reading into pcie_read_tlp_log()
>   PCI: Create helper to print TLP Header and Prefix Log
>
>  drivers/pci/ats.c             |   2 +-
>  drivers/pci/pci.c             |  28 --------
>  drivers/pci/pci.h             |   9 +++
>  drivers/pci/pcie/Makefile     |   2 +-
>  drivers/pci/pcie/aer.c        |  15 ++---
>  drivers/pci/pcie/dpc.c        |  14 ++---
>  drivers/pci/pcie/tlp.c        | 113 ++++++++++++++++++++++++++++++++++
>  drivers/pci/probe.c           |  14 +++--
>  include/linux/aer.h           |   3 +-
>  include/linux/pci.h           |   2 +-
>  include/uapi/linux/pci_regs.h |  11 ++--
>  11 files changed, 153 insertions(+), 60 deletions(-)
>  create mode 100644 drivers/pci/pcie/tlp.c

Can you please include a base commit in future revisions? I was able to
apply the set to v6.13-rc6, but I was trying a couple of the PCI repo
branches before which didn't apply.

Thanks,
Yazen
Re: [PATCH RESEND v1 4/5] powerpc/crash: use generic crashkernel reservation
On 2025-01-08 15:44:57 Wed, Sourabh Jain wrote:
> Commit 0ab97169aa05 ("crash_core: add generic function to do
> reservation") added a generic function to reserve crashkernel memory.
> So let's use the same function on powerpc and remove the
> architecture-specific code that essentially does the same thing.
>
> The generic crashkernel reservation also provides a way to split the
> crashkernel reservation into high and low memory reservations, which can
> be enabled for powerpc in the future.
>
> Along with moving to the generic crashkernel reservation, the code
> related to finding the base address for the crashkernel has been
> separated into its own function named get_crash_base() for better
> readability and maintainability.
>
> To prevent crashkernel memory from being added to iomem_resource, the
> function arch_add_crash_res_to_iomem() has been introduced. For further
> details on why this should not be done for the PowerPC architecture,
> please refer to the previous commit titled "crash: let arch decide crash
> memory export to iomem_resource".
>
> Cc: Andrew Morton
> Cc: Baoquan he
> Cc: Hari Bathini
> CC: Madhavan Srinivasan
> Cc: Michael Ellerman
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Sourabh Jain
> ---
>  arch/powerpc/Kconfig                     |  3 +
>  arch/powerpc/include/asm/crash_reserve.h | 18 +++++
>  arch/powerpc/include/asm/kexec.h         |  4 +-
>  arch/powerpc/kernel/prom.c               |  2 +-
>  arch/powerpc/kexec/core.c                | 90 ++++++++++------------
>  5 files changed, 63 insertions(+), 54 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/crash_reserve.h

[...]

> @@ -113,9 +113,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem
>
>  #ifdef CONFIG_CRASH_RESERVE
>  int __init overlaps_crashkernel(unsigned long start, unsigned long size);
> -extern void reserve_crashkernel(void);
> +extern void arch_reserve_crashkernel(void);

Do we really need to rename this? It is still called from the powerpc
arch and not from the common code.

>  #else
> -static inline void reserve_crashkernel(void) {}
> +static inline void arch_reserve_crashkernel(void) {}
>  static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; }
>  #endif
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index e0059842a1c6..9ed9dde7d231 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -860,7 +860,7 @@ void __init early_init_devtree(void *params)
>  	 */
>  	if (fadump_reserve_mem() == 0)
>  #endif
> -		reserve_crashkernel();
> +		arch_reserve_crashkernel();
>  	early_reserve_mem();
>
>  	if (memory_limit > memblock_phys_mem_size())

Rest looks good to me.

Reviewed-by: Mahesh Salgaonkar

Thanks,
-Mahesh.
Re: [PATCH RESEND v1 3/5] powerpc/kdump: preserve user-specified memory limit
On 2025-01-08 15:44:56 Wed, Sourabh Jain wrote:
> Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system
> memory bug") fails crashkernel parsing if the crash size is found to be
> higher than system RAM, which makes the memory_limit adjustment code
> ineffective due to an early exit from reserve_crashkernel().
>
> Regardless, let's not violate the user-specified memory limit by
> adjusting it. Remove this adjustment to ensure all reservations stay
> within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update
> the user-specified memory limit") did the same for fadump.

Agreed.

Reviewed-by: Mahesh Salgaonkar

Thanks,
-Mahesh.
Re: [PATCH v2] perf: Fix display of kernel symbols
On Wed, Jan 08, 2025 at 10:54:20AM +0100, Christophe Leroy wrote:
> Since commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily
> sorted array for addresses"), perf no longer displays kernel symbols
> on powerpc, although it still detects them as kernel addresses.
>
> # Overhead  Command     Shared Object  Symbol
> # ........  ..........  .............  ..............................
> #
>     80.49%  Coeur main  [unknown]      [k] 0xc005f0f8
>      3.91%  Coeur main  gau            [.] engine_loop.constprop.0.isra.0
>      1.72%  Coeur main  [unknown]      [k] 0xc005f11c
>      1.09%  Coeur main  [unknown]      [k] 0xc01f82c8
>      0.44%  Coeur main  libc.so.6      [.] epoll_wait
>      0.38%  Coeur main  [unknown]      [k] 0xc0011718
>      0.36%  Coeur main  [unknown]      [k] 0xc01f45c0
>
> This is because function maps__find_next_entry() now returns the
> current entry instead of the next entry, leading to the kernel map end
> address getting mis-configured with its own start address instead of
> the start address of the following map.
>
> Fix it by really taking the next entry, and also make sure that entry
> follows the current one by making sure entries are sorted.
>
> Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
> Signed-off-by: Christophe Leroy
> Reviewed-by: Arnaldo Carvalho de Melo
> ---
> v2: Make sure the entries are sorted, if not sort them.

Since you have changed what I reviewed I'll have to re-review :-) Will
try to do it after some calls.

- Arnaldo

> ---
>  tools/perf/util/maps.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
> index 432399cbe5dd..09c9cc326c08 100644
> --- a/tools/perf/util/maps.c
> +++ b/tools/perf/util/maps.c
> @@ -1136,8 +1136,13 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map)
>  	struct map *result = NULL;
>  
>  	down_read(maps__lock(maps));
> +	while (!maps__maps_by_address_sorted(maps)) {
> +		up_read(maps__lock(maps));
> +		maps__sort_by_address(maps);
> +		down_read(maps__lock(maps));
> +	}
>  	i = maps__by_address_index(maps, map);
> -	if (i < maps__nr_maps(maps))
> +	if (++i < maps__nr_maps(maps))
>  		result = map__get(maps__maps_by_address(maps)[i]);
>  
>  	up_read(maps__lock(maps));
> --
> 2.47.0
[PATCH] powerpc/32: Stop printing Kernel virtual memory layout
Printing of the kernel virtual memory layout was added for debug
purposes by commit f637a49e507c ("powerpc: Minor cleanups of kernel virt
address space definitions").

For security reasons, don't display the kernel's virtual memory layout.
Other architectures have removed it through the following commits:

071929dbdd86 ("arm64: Stop printing the virtual memory layout")
1c31d4e96b8c ("ARM: 8820/1: mm: Stop printing the virtual memory layout")
3182f798 ("m68k/mm: Stop printing the virtual memory layout")
fd8d0ca25631 ("parisc: Hide virtual kernel memory layout")
681ff0181bbf ("x86/mm/init/32: Stop printing the virtual memory layout")

Commit 681ff0181bbf ("x86/mm/init/32: Stop printing the virtual memory
layout") thought x86 was the last one, but in reality powerpc/32 still
had it. So remove it now on powerpc/32 as well.

Cc: Arvind Sankar
Cc: Kees Cook
Signed-off-by: Christophe Leroy
---
 arch/powerpc/mm/mem.c | 22 ----------------------
 1 file changed, 22 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c7708c8fad29..34806c858e54 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -319,28 +319,6 @@ void __init mem_init(void)
 	per_cpu(next_tlbcam_idx, smp_processor_id()) =
 		(mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY) - 1;
 #endif
-
-#ifdef CONFIG_PPC32
-	pr_info("Kernel virtual memory layout:\n");
-#ifdef CONFIG_KASAN
-	pr_info("  * 0x%08lx..0x%08lx  : kasan shadow mem\n",
-		KASAN_SHADOW_START, KASAN_SHADOW_END);
-#endif
-	pr_info("  * 0x%08lx..0x%08lx  : fixmap\n", FIXADDR_START, FIXADDR_TOP);
-#ifdef CONFIG_HIGHMEM
-	pr_info("  * 0x%08lx..0x%08lx  : highmem PTEs\n",
-		PKMAP_BASE, PKMAP_ADDR(LAST_PKMAP));
-#endif /* CONFIG_HIGHMEM */
-	if (ioremap_bot != IOREMAP_TOP)
-		pr_info("  * 0x%08lx..0x%08lx  : early ioremap\n",
-			ioremap_bot, IOREMAP_TOP);
-	pr_info("  * 0x%08lx..0x%08lx  : vmalloc & ioremap\n",
-		VMALLOC_START, VMALLOC_END);
-#ifdef MODULES_VADDR
-	pr_info("  * 0x%08lx..0x%08lx  : modules\n",
-		MODULES_VADDR, MODULES_END);
-#endif
-#endif /* CONFIG_PPC32 */
 }
 
 void free_initmem(void)
-- 
2.47.0
Re: [PATCH v2] perf: Fix display of kernel symbols
On Wed, Jan 08, 2025 at 06:06:03PM +0100, Christophe Leroy wrote:
> Le 08/01/2025 à 15:53, Arnaldo Carvalho de Melo a écrit :
> > On Wed, Jan 08, 2025 at 10:54:20AM +0100, Christophe Leroy wrote:
> > > Since commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily
> > > sorted array for addresses"), perf no longer displays kernel symbols
> > > on powerpc, although it still detects them as kernel addresses.
> > >
> > > # Overhead  Command     Shared Object  Symbol
> > > # ........  ..........  .............  ..............................
> > > #
> > >     80.49%  Coeur main  [unknown]      [k] 0xc005f0f8
> > >      3.91%  Coeur main  gau            [.] engine_loop.constprop.0.isra.0
> > >      1.72%  Coeur main  [unknown]      [k] 0xc005f11c
> > >      1.09%  Coeur main  [unknown]      [k] 0xc01f82c8
> > >      0.44%  Coeur main  libc.so.6      [.] epoll_wait
> > >      0.38%  Coeur main  [unknown]      [k] 0xc0011718
> > >      0.36%  Coeur main  [unknown]      [k] 0xc01f45c0
> > >
> > > This is because function maps__find_next_entry() now returns the
> > > current entry instead of the next entry, leading to the kernel map
> > > end address getting mis-configured with its own start address
> > > instead of the start address of the following map.
> > >
> > > Fix it by really taking the next entry, and also make sure that
> > > entry follows the current one by making sure entries are sorted.
> > >
> > > Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
> > > Signed-off-by: Christophe Leroy
> > > Reviewed-by: Arnaldo Carvalho de Melo
> > > ---
> > > v2: Make sure the entries are sorted, if not sort them.
> >
> > Since you have changed what I reviewed I'll have to re-review :-) Will
> > try to do it after some calls.
>
> Ah yes sorry, should have removed your Reviewed-by.
>
> Based on Ian's feedback "Using the next entry in this way won't work if
> the entries aren't sorted", I added the following block in front of the
> initial change:
>
> +	while (!maps__maps_by_address_sorted(maps)) {
> +		up_read(maps__lock(maps));
> +		maps__sort_by_address(maps);
> +		down_read(maps__lock(maps));
> +	}

It's OK, I'll keep it now that I looked at it. Thanks!

- Arnaldo
Re: [PATCH v5 00/25] fs/dax: Fix ZONE_DEVICE page reference counts
On Tue, Jan 07, 2025 at 02:42:16PM +1100, Alistair Popple wrote: > Main updates since v4: > > - Removed most of the devdax/fsdax checks in fs/proc/task_mmu.c. This >means smaps/pagemap may contain DAX pages. > > - Fixed rmap accounting of PUD mapped pages. > > - Minor code clean-ups. > > Main updates since v3: > > - Rebased onto next-20241216. Hi Alistair- This set passes the ndctl/dax unit tests when applied to next-20241216 Tested-by: Alison Schofield -- snip
Re: [PATCH v5 01/25] fuse: Fix dax truncate/punch_hole fault path
On Wed, Jan 08, 2025 at 02:30:24PM -0800, Dan Williams wrote:
> Alistair Popple wrote:
> > FS DAX requires file systems to call into the DAX layout prior to
> > unlinking inodes to ensure there is no ongoing DMA or other remote
> > access to the direct mapped page. The fuse file system implements
> > fuse_dax_break_layouts() to do this which includes a comment
> > indicating that passing dmap_end == 0 leads to unmapping of the whole
> > file.
> >
> > However this is not true - passing dmap_end == 0 will not unmap
> > anything before dmap_start, and furthermore
> > dax_layout_busy_page_range() will not scan any of the range to see if
> > there may be ongoing DMA access to the range.
>
> It would be useful to clarify that this bug was found by inspection and
> that there are no known end user reports of trouble, but that the
> failure mode would look like random fs corruption. The window is hard
> to hit because a block needs to be truncated, reallocated to a file,
> and written to before stale DMA could corrupt it. So that may
> contribute to the fact that fuse-dax users have not reported an issue
> since v5.10.
>
> > Fix this by checking for dmap_end == 0 in fuse_dax_break_layouts() and
> > pass the entire file range to dax_layout_busy_page_range().
>
> That's not what this patch does, maybe a rebase error that pushed the
> @dmap_end fixup after the call to dax_layout_busy_page_range?

Ha. Yep. I spotted this when doing the conversion to
dax_layout_busy_page_range() and then had to rebase it into a stand
alone patch for easy review. Obviously I put the check in the wrong
spot, although it ends up in the right spot at the end of the series.

> However, I don't think this is quite the right fix, more below...

Yeah, I like your version better so will respin with that.
> > Signed-off-by: Alistair Popple > > Fixes: 6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault > > path") > > Cc: Vivek Goyal > > > > --- > > > > I am not at all familiar with the fuse file system driver so I have no > > idea if the comment is relevant or not and whether the documented > > behaviour for dmap_end == 0 is ever relied upon. However this seemed > > like the safest fix unless someone more familiar with fuse can confirm > > that dmap_end == 0 is never used. > > It is used in several places and has been wrong since day one. I believe > the original commit simply misunderstood that > dax_layout_busy_page_range() semantics are analogous to > invalidate_inode_pages2_range() semantics in terms of what @start and > @end mean. > > You can add: > > Co-developed-by: Dan Williams > Signed-off-by: Dan Williams > > ...if you end up doing a resend, or I will add it on applying to > nvdimm.git if the rebase does not end up being too prickly. Looks like I accidentally dropped a PPC fix so will do a respin for that. And the kernel build bot was complaining about incorrect documentation so will fix that while I'm at it. 
> -- 8< -- > diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c > index c5d1feaa239c..455c4a16080b 100644 > --- a/fs/fuse/dax.c > +++ b/fs/fuse/dax.c > @@ -681,7 +681,6 @@ static int __fuse_dax_break_layouts(struct inode *inode, > bool *retry, > 0, 0, fuse_wait_dax_page(inode)); > } > > -/* dmap_end == 0 leads to unmapping of whole file */ > int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, > u64 dmap_end) > { > @@ -693,10 +692,6 @@ int fuse_dax_break_layouts(struct inode *inode, u64 > dmap_start, > ret = __fuse_dax_break_layouts(inode, &retry, dmap_start, > dmap_end); > } while (ret == 0 && retry); > - if (!dmap_end) { > - dmap_start = 0; > - dmap_end = LLONG_MAX; > - } > > return ret; > } > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c > index 0b2f8567ca30..bc6c8936c529 100644 > --- a/fs/fuse/dir.c > +++ b/fs/fuse/dir.c > @@ -1936,7 +1936,7 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct > dentry *dentry, > if (FUSE_IS_DAX(inode) && is_truncate) { > filemap_invalidate_lock(mapping); > fault_blocked = true; > - err = fuse_dax_break_layouts(inode, 0, 0); > + err = fuse_dax_break_layouts(inode, 0, -1); > if (err) { > filemap_invalidate_unlock(mapping); > return err; > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > index 082ee374f694..cef7a8f75821 100644 > --- a/fs/fuse/file.c > +++ b/fs/fuse/file.c > @@ -253,7 +253,7 @@ static int fuse_open(struct inode *inode, struct file > *file) > > if (dax_truncate) { > filemap_invalidate_lock(inode->i_mapping); > - err = fuse_dax_break_layouts(inode, 0, 0); > + err = fuse_dax_break_layouts(inode, 0, -1); > if (err) > goto out_inode_unlock; > } > @@ -2890,7 +2890,7 @@ static long fus
Re: [PATCH RESEND v1 4/5] powerpc/crash: use generic crashkernel reservation
Hello Mahesh,

On 08/01/25 22:35, Mahesh J Salgaonkar wrote:
> On 2025-01-08 15:44:57 Wed, Sourabh Jain wrote:
>> Commit 0ab97169aa05 ("crash_core: add generic function to do
>> reservation") added a generic function to reserve crashkernel memory.
>> So let's use the same function on powerpc and remove the
>> architecture-specific code that essentially does the same thing.
>>
>> The generic crashkernel reservation also provides a way to split the
>> crashkernel reservation into high and low memory reservations, which
>> can be enabled for powerpc in the future.
>>
>> Along with moving to the generic crashkernel reservation, the code
>> related to finding the base address for the crashkernel has been
>> separated into its own function named get_crash_base() for better
>> readability and maintainability.
>>
>> To prevent crashkernel memory from being added to iomem_resource, the
>> function arch_add_crash_res_to_iomem() has been introduced. For
>> further details on why this should not be done for the PowerPC
>> architecture, please refer to the previous commit titled "crash: let
>> arch decide crash memory export to iomem_resource".
>>
>> Cc: Andrew Morton
>> Cc: Baoquan he
>> Cc: Hari Bathini
>> CC: Madhavan Srinivasan
>> Cc: Michael Ellerman
>> Cc: ke...@lists.infradead.org
>> Cc: linux-ker...@vger.kernel.org
>> Signed-off-by: Sourabh Jain
>> ---
>>  arch/powerpc/Kconfig                     |  3 +
>>  arch/powerpc/include/asm/crash_reserve.h | 18 +++++
>>  arch/powerpc/include/asm/kexec.h         |  4 +-
>>  arch/powerpc/kernel/prom.c               |  2 +-
>>  arch/powerpc/kexec/core.c                | 90 ++++++++++------------
>>  5 files changed, 63 insertions(+), 54 deletions(-)
>>  create mode 100644 arch/powerpc/include/asm/crash_reserve.h
>
> [...]
>
>> @@ -113,9 +113,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem
>>
>>  #ifdef CONFIG_CRASH_RESERVE
>>  int __init overlaps_crashkernel(unsigned long start, unsigned long size);
>> -extern void reserve_crashkernel(void);
>> +extern void arch_reserve_crashkernel(void);
>
> Do we really need to rename this? It is still called from the powerpc
> arch and not from the common code.
You are right, we don't. However, all architectures (x86, RISC-V, LoongArch, ARM64) that use the generic crashkernel reservation have named their architecture-specific function `arch_reserve_crashkernel()`. So I did the same for PowerPC; it keeps the naming consistent across architectures. Maybe I should justify the name change in the commit message to avoid confusion. Please let me know your opinion. #else -static inline void reserve_crashkernel(void) {} +static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } #endif diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index e0059842a1c6..9ed9dde7d231 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -860,7 +860,7 @@ void __init early_init_devtree(void *params) */ if (fadump_reserve_mem() == 0) #endif - reserve_crashkernel(); + arch_reserve_crashkernel(); early_reserve_mem(); if (memory_limit > memblock_phys_mem_size()) Rest looks good to me. Reviewed-by: Mahesh Salgaonkar Thank you for reviewing this. - Sourabh Jain
Re: [PATCH] kexec: Initialize ELF lowest address to ULONG_MAX
Hello Baoquan and Eric, On 12/12/24 08:25, Baoquan he wrote: On 12/10/24 at 02:43pm, Sourabh Jain wrote: kexec_elf_load() loads an ELF executable and sets the address of the lowest PT_LOAD section to the address held by the lowest_load_addr function argument. To determine the lowest PT_LOAD address, a local variable lowest_addr (type unsigned long) is initialized to UINT_MAX. After loading each PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD address is lower, lowest_addr is updated. However, setting lowest_addr to UINT_MAX won't work when the kernel image is loaded above 4G, as the returned lowest PT_LOAD address would be invalid. This is resolved by initializing lowest_addr to ULONG_MAX instead. This issue was discovered while implementing crashkernel high/low reservation on the PowerPC architecture. Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- kernel/kexec_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c index d3689632e8b9..3a5c25b2adc9 100644 --- a/kernel/kexec_elf.c +++ b/kernel/kexec_elf.c @@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr *ehdr, struct kexec_buf *kbuf, unsigned long *lowest_load_addr) { - unsigned long lowest_addr = UINT_MAX; + unsigned long lowest_addr = ULONG_MAX; Great catch. Acked-by: Baoquan He Thank you for the Ack! The upcoming two patch series, which aim to enable generic crashkernel reservation, depends on this fix. One of them is already posted for upstream review: https://lore.kernel.org/all/20250108101458.406806-1-sourabhj...@linux.ibm.com/ I was wondering if you could guide us on how to get this fix pushed to the mainline tree. Thanks, Sourabh Jain
Re: [PATCH v5 03/25] fs/dax: Don't skip locked entries when scanning entries
On Wed, Jan 08, 2025 at 02:50:36PM -0800, Dan Williams wrote: > Alistair Popple wrote: > > Several functions internal to FS DAX use the following pattern when > > trying to obtain an unlocked entry: > > > > xas_for_each(&xas, entry, end_idx) { > > if (dax_is_locked(entry)) > > entry = get_unlocked_entry(&xas, 0); > > > > This is problematic because get_unlocked_entry() will get the next > > present entry in the range, and the next entry may not be > > locked. Therefore any processing of the original locked entry will be > > skipped. This can cause dax_layout_busy_page_range() to miss DMA-busy > > pages in the range, leading file systems to free blocks whilst DMA > > operations are ongoing which can lead to file system corruption. > > > > Instead callers from within a xas_for_each() loop should be waiting > > for the current entry to be unlocked without advancing the XArray > > state so a new function is introduced to wait. > > Oh wow, good eye! > > Did this trip up an xfstest, or did you see this purely by inspection? Oh this was a "fun" one to track down :-) The other half of the story is in "fs/dax: Always remove DAX page-cache entries when breaking layouts". With just that patch applied xfstest triggered the new WARN_ON_ONCE in truncate_folio_batch_exceptionals(). That made no sense, because that patch makes breaking layouts also remove the DAX page-cache entries. Therefore no DAX page-cache entries should be found in truncate_folio_batch_exceptionals() which is now more of a sanity check. However due to the bug addressed by this patch DAX page-cache entries which should have been deleted as part of breaking layouts were being observed in truncate_folio_batch_exceptionals(). Prior to this series nothing would have noticed these being skipped because dax_delete_mapping_entry() doesn't check if the page is DMA idle. 
I believe this could lead to filesystem corruption if the locked entry was DMA-busy because the filesystem would assume the page was DMA-idle and therefore the underlying block free to be reallocated. However writing a test to actually prove this is tricky, and I didn't get time to do so. > > Also while we are here rename get_unlocked_entry() to > > get_next_unlocked_entry() to make it clear that it may advance the > > iterator state. > > Outside of the above clarification of how found / end user effect you > can add: > > Reviewed-by: Dan Williams
[PATCH] spi: fsl-spi: Remove display of virtual address
The following appears in kernel log at boot: fsl_spi b01004c0.spi: at 0x(ptrval) (irq = 51), QE mode This is useless, so remove the display of that virtual address and display the MMIO address instead, just like serial core does. Signed-off-by: Christophe Leroy --- drivers/spi/spi-fsl-spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c index 856a4a9def66..2f2082652a1a 100644 --- a/drivers/spi/spi-fsl-spi.c +++ b/drivers/spi/spi-fsl-spi.c @@ -618,7 +618,7 @@ static struct spi_controller *fsl_spi_probe(struct device *dev, if (ret < 0) goto err_probe; - dev_info(dev, "at 0x%p (irq = %d), %s mode\n", reg_base, + dev_info(dev, "at MMIO %pa (irq = %d), %s mode\n", &mem->start, mpc8xxx_spi->irq, mpc8xxx_spi_strmode(mpc8xxx_spi->flags)); return host; -- 2.47.0
[PATCH] powerpc/ipic: Stop printing address of registers
The following line appears at boot: IPIC (128 IRQ sources) at (ptrval) This is pointless, so remove the printing of the virtual address and replace it with the matching physical address. Signed-off-by: Christophe Leroy --- arch/powerpc/sysdev/ipic.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c index 5f69e2d50f26..037b04bf9a9f 100644 --- a/arch/powerpc/sysdev/ipic.c +++ b/arch/powerpc/sysdev/ipic.c @@ -762,8 +762,7 @@ struct ipic * __init ipic_init(struct device_node *node, unsigned int flags) ipic_write(ipic->regs, IPIC_SIMSR_H, 0); ipic_write(ipic->regs, IPIC_SIMSR_L, 0); - printk ("IPIC (%d IRQ sources) at %p\n", NR_IPIC_INTS, - primary_ipic->regs); + pr_info("IPIC (%d IRQ sources) at MMIO %pa\n", NR_IPIC_INTS, &res.start); return ipic; } -- 2.47.0
Re: [PATCH v5 05/25] fs/dax: Create a common implementation to break DAX layouts
On Wed, Jan 08, 2025 at 04:14:20PM -0800, Dan Williams wrote: > Alistair Popple wrote: > > Prior to freeing a block file systems supporting FS DAX must check > > that the associated pages are both unmapped from user-space and not > > undergoing DMA or other access from eg. get_user_pages(). This is > > achieved by unmapping the file range and scanning the FS DAX > > page-cache to see if any pages within the mapping have an elevated > > refcount. > > > > This is done using two functions - dax_layout_busy_page_range() which > > returns a page to wait for the refcount to become idle on. Rather than > > open-code this introduce a common implementation to both unmap and > > wait for the page to become idle. > > > > Signed-off-by: Alistair Popple > > > > --- > > > > Changes for v5: > > > > - Don't wait for idle pages on non-DAX mappings > > > > Changes for v4: > > > > - Fixed some build breakage due to missing symbol exports reported by > >John Hubbard (thanks!). > > --- > > fs/dax.c| 33 + > > fs/ext4/inode.c | 10 +- > > fs/fuse/dax.c | 29 + > > fs/xfs/xfs_inode.c | 23 +-- > > fs/xfs/xfs_inode.h | 2 +- > > include/linux/dax.h | 21 + > > mm/madvise.c| 8 > > 7 files changed, 70 insertions(+), 56 deletions(-) > > > > diff --git a/fs/dax.c b/fs/dax.c > > index d010c10..9c3bd07 100644 > > --- a/fs/dax.c > > +++ b/fs/dax.c > > @@ -845,6 +845,39 @@ int dax_delete_mapping_entry(struct address_space > > *mapping, pgoff_t index) > > return ret; > > } > > > > +static int wait_page_idle(struct page *page, > > + void (cb)(struct inode *), > > + struct inode *inode) > > +{ > > + return ___wait_var_event(page, page_ref_count(page) == 1, > > + TASK_INTERRUPTIBLE, 0, 0, cb(inode)); > > +} > > + > > +/* > > + * Unmaps the inode and waits for any DMA to complete prior to deleting the > > + * DAX mapping entries for the range. 
> > + */ > > +int dax_break_mapping(struct inode *inode, loff_t start, loff_t end, > > + void (cb)(struct inode *)) > > +{ > > + struct page *page; > > + int error; > > + > > + if (!dax_mapping(inode->i_mapping)) > > + return 0; > > + > > + do { > > + page = dax_layout_busy_page_range(inode->i_mapping, start, end); > > + if (!page) > > + break; > > + > > + error = wait_page_idle(page, cb, inode); > > This implementations removes logic around @retry found in the XFS and > FUSE implementations, I think that is a mistake, and EXT4 has > apparently been broken in this regard. I think both implementations are equivalent though, just that the XFS/FUSE ones are spread across two functions with the retry happening in the outer function whilst the EXT4 implementation is implemented in a single function with a do/ while loop. Both exit early if dax_layout_busy_page() doesn't find a DMA-busy page, and both call dax_layout_busy_page() a second time after waiting on a page to become idle. So I don't think anything is broken here, unless I've missed something. > wait_page_idle() returns after @page is idle, but that does not mean > @inode is DMA idle. After one found page from > dax_layout_busy_page_range() is waited upon a new call to > dax_break_mapping() needs to made to check if another DMA started, or if > there were originally more pages active. > > > + } while (error == 0); > > + > > + return error; > > Surprised that the compiler does not warn about an uninitialized > variable here? So am I. Turns out this is built with -Wno-maybe-uninitialized and it's not certain error is used uninitialized because we may bail early if this is not a dax_mapping. So it's only maybe used uninitialized which isn't warned about.
Re: [PATCH 4/6] powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support
Hi Haren, kernel test robot noticed the following build warnings: https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Haren-Myneni/powerpc-pseries-Define-common-functions-for-RTAS-sequence-HCALLs/20250105-045010 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20250104204652.388720-5-haren%40linux.ibm.com patch subject: [PATCH 4/6] powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support config: powerpc64-randconfig-r071-20250108 (https://download.01.org/0day-ci/archive/20250109/202501090337.xkcgrblc-...@intel.com/config) compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 096551537b2a747a3387726ca618ceeb3950e9bc) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Reported-by: Dan Carpenter | Closes: https://lore.kernel.org/r/202501090337.xkcgrblc-...@intel.com/ smatch warnings: arch/powerpc/platforms/pseries/papr-indices.c:438 papr_dynamic_indicator_ioc_set() warn: inconsistent returns 'global &rtas_ibm_set_dynamic_indicator_lock'. 
vim +438 arch/powerpc/platforms/pseries/papr-indices.c 3f48afd07934e4 Haren Myneni 2025-01-04 397 static long papr_dynamic_indicator_ioc_set(struct papr_indices_io_block __user *ubuf) 3f48afd07934e4 Haren Myneni 2025-01-04 398 { 3f48afd07934e4 Haren Myneni 2025-01-04 399 struct papr_indices_io_block kbuf; 3f48afd07934e4 Haren Myneni 2025-01-04 400 struct rtas_work_area *work_area; 3f48afd07934e4 Haren Myneni 2025-01-04 401 s32 fwrc, token, ret; 3f48afd07934e4 Haren Myneni 2025-01-04 402 3f48afd07934e4 Haren Myneni 2025-01-04 403 token = rtas_function_token(RTAS_FN_IBM_SET_DYNAMIC_INDICATOR); 3f48afd07934e4 Haren Myneni 2025-01-04 404 if (token == RTAS_UNKNOWN_SERVICE) 3f48afd07934e4 Haren Myneni 2025-01-04 405 return -ENOENT; 3f48afd07934e4 Haren Myneni 2025-01-04 406 3f48afd07934e4 Haren Myneni 2025-01-04 407 mutex_lock(&rtas_ibm_set_dynamic_indicator_lock); 3f48afd07934e4 Haren Myneni 2025-01-04 408 work_area = papr_dynamic_indice_buf_from_user(ubuf, &kbuf); 3f48afd07934e4 Haren Myneni 2025-01-04 409 if (IS_ERR(work_area)) 3f48afd07934e4 Haren Myneni 2025-01-04 410 return PTR_ERR(work_area); mutex_unlock(&rtas_ibm_set_dynamic_indicator_lock); before returning 3f48afd07934e4 Haren Myneni 2025-01-04 411 3f48afd07934e4 Haren Myneni 2025-01-04 412 do { 3f48afd07934e4 Haren Myneni 2025-01-04 413 fwrc = rtas_call(token, 3, 1, NULL, 3f48afd07934e4 Haren Myneni 2025-01-04 414 kbuf.dynamic_param.token, 3f48afd07934e4 Haren Myneni 2025-01-04 415 kbuf.dynamic_param.state, 3f48afd07934e4 Haren Myneni 2025-01-04 416 rtas_work_area_phys(work_area)); 3f48afd07934e4 Haren Myneni 2025-01-04 417 } while (rtas_busy_delay(fwrc)); 3f48afd07934e4 Haren Myneni 2025-01-04 418 3f48afd07934e4 Haren Myneni 2025-01-04 419 rtas_work_area_free(work_area); 3f48afd07934e4 Haren Myneni 2025-01-04 420 mutex_unlock(&rtas_ibm_set_dynamic_indicator_lock); 3f48afd07934e4 Haren Myneni 2025-01-04 421 3f48afd07934e4 Haren Myneni 2025-01-04 422 switch (fwrc) { 3f48afd07934e4 Haren Myneni 2025-01-04 423 
case RTAS_IBM_DYNAMIC_INDICE_SUCCESS: 3f48afd07934e4 Haren Myneni 2025-01-04 424 ret = 0; 3f48afd07934e4 Haren Myneni 2025-01-04 425 break; 3f48afd07934e4 Haren Myneni 2025-01-04 426 case RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR: /* No such indicator */ 3f48afd07934e4 Haren Myneni 2025-01-04 427 ret = -EOPNOTSUPP; 3f48afd07934e4 Haren Myneni 2025-01-04 428 break; 3f48afd07934e4 Haren Myneni 2025-01-04 429 default: 3f48afd07934e4 Haren Myneni 2025-01-04 430 pr_err("unexpected ibm,set-dynamic-indicator result %d\n", 3f48afd07934e4 Haren Myneni 2025-01-04 431 fwrc); 3f48afd07934e4 Haren Myneni 2025-01-04 432 fallthrough; 3f48afd07934e4 Haren Myneni 2025-01-04 433 case RTAS_IBM_DYNAMIC_INDICE_HW_ERROR: /* Hardware/platform error */ 3f48afd07934e4 Haren Myneni 2025-01-04 434 ret = -EIO; 3f48afd07934e4 Haren Myneni 2025-01-04 435 break; 3f48afd07934e4 Haren Myneni 2025-01-04 436 } 3f48afd07934e4 Haren Myneni 2025-01-04 437 3f48afd07934e4 Haren Myneni 2025-01-04 @438 return ret; 3f48afd07934e4 Haren Myneni 2025-01-04 439 } -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Re: [PATCH 5/6] powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support
Hi Haren, kernel test robot noticed the following build warnings: https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Haren-Myneni/powerpc-pseries-Define-common-functions-for-RTAS-sequence-HCALLs/20250105-045010 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20250104204652.388720-6-haren%40linux.ibm.com patch subject: [PATCH 5/6] powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support config: powerpc64-randconfig-r071-20250108 (https://download.01.org/0day-ci/archive/20250109/202501090552.uzefb4qu-...@intel.com/config) compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 096551537b2a747a3387726ca618ceeb3950e9bc) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Reported-by: Dan Carpenter | Closes: https://lore.kernel.org/r/202501090552.uzefb4qu-...@intel.com/ New smatch warnings: arch/powerpc/platforms/pseries/papr-indices.c:496 papr_dynamic_sensor_ioc_get() warn: inconsistent returns 'global &rtas_ibm_get_dynamic_sensor_state_lock'. Old smatch warnings: arch/powerpc/platforms/pseries/papr-indices.c:438 papr_dynamic_indicator_ioc_set() warn: inconsistent returns 'global &rtas_ibm_set_dynamic_indicator_lock'. 
vim +496 arch/powerpc/platforms/pseries/papr-indices.c e44fb25ad9fa03 Haren Myneni 2025-01-04 452 static long papr_dynamic_sensor_ioc_get(struct papr_indices_io_block __user *ubuf) e44fb25ad9fa03 Haren Myneni 2025-01-04 453 { e44fb25ad9fa03 Haren Myneni 2025-01-04 454 struct papr_indices_io_block kbuf; e44fb25ad9fa03 Haren Myneni 2025-01-04 455 struct rtas_work_area *work_area; e44fb25ad9fa03 Haren Myneni 2025-01-04 456 s32 fwrc, token, ret; e44fb25ad9fa03 Haren Myneni 2025-01-04 457 u32 rets; e44fb25ad9fa03 Haren Myneni 2025-01-04 458 e44fb25ad9fa03 Haren Myneni 2025-01-04 459 token = rtas_function_token(RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE); e44fb25ad9fa03 Haren Myneni 2025-01-04 460 if (token == RTAS_UNKNOWN_SERVICE) e44fb25ad9fa03 Haren Myneni 2025-01-04 461 return -ENOENT; e44fb25ad9fa03 Haren Myneni 2025-01-04 462 e44fb25ad9fa03 Haren Myneni 2025-01-04 463 mutex_lock(&rtas_ibm_get_dynamic_sensor_state_lock); e44fb25ad9fa03 Haren Myneni 2025-01-04 464 work_area = papr_dynamic_indice_buf_from_user(ubuf, &kbuf); e44fb25ad9fa03 Haren Myneni 2025-01-04 465 if (IS_ERR(work_area)) e44fb25ad9fa03 Haren Myneni 2025-01-04 466 return PTR_ERR(work_area); Add an unlock, same as with the _set() function. 
e44fb25ad9fa03 Haren Myneni 2025-01-04 467 e44fb25ad9fa03 Haren Myneni 2025-01-04 468 do { e44fb25ad9fa03 Haren Myneni 2025-01-04 469 fwrc = rtas_call(token, 2, 2, &rets, e44fb25ad9fa03 Haren Myneni 2025-01-04 470 kbuf.dynamic_param.token, e44fb25ad9fa03 Haren Myneni 2025-01-04 471 rtas_work_area_phys(work_area)); e44fb25ad9fa03 Haren Myneni 2025-01-04 472 } while (rtas_busy_delay(fwrc)); e44fb25ad9fa03 Haren Myneni 2025-01-04 473 e44fb25ad9fa03 Haren Myneni 2025-01-04 474 rtas_work_area_free(work_area); e44fb25ad9fa03 Haren Myneni 2025-01-04 475 mutex_unlock(&rtas_ibm_get_dynamic_sensor_state_lock); e44fb25ad9fa03 Haren Myneni 2025-01-04 476 e44fb25ad9fa03 Haren Myneni 2025-01-04 477 switch (fwrc) { e44fb25ad9fa03 Haren Myneni 2025-01-04 478 case RTAS_IBM_DYNAMIC_INDICE_SUCCESS: e44fb25ad9fa03 Haren Myneni 2025-01-04 479 if (put_user(rets, &ubuf->dynamic_param.state)) e44fb25ad9fa03 Haren Myneni 2025-01-04 480 ret = -EFAULT; e44fb25ad9fa03 Haren Myneni 2025-01-04 481 else e44fb25ad9fa03 Haren Myneni 2025-01-04 482 ret = 0; e44fb25ad9fa03 Haren Myneni 2025-01-04 483 break; e44fb25ad9fa03 Haren Myneni 2025-01-04 484 case RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR: /* No such indicator */ e44fb25ad9fa03 Haren Myneni 2025-01-04 485 ret = -EOPNOTSUPP; e44fb25ad9fa03 Haren Myneni 2025-01-04 486 break; e44fb25ad9fa03 Haren Myneni 2025-01-04 487 default: e44fb25ad9fa03 Haren Myneni 2025-01-04 488 pr_err("unexpected ibm,get-dynamic-sensor result %d\n", e44fb25ad9fa03 Haren Myneni 2025-01-04 489 fwrc); e44fb25ad9fa03 Haren Myneni 2025-01-04 490 fallthrough; e44fb25ad9fa03 Haren Myneni 2025-01-04 491 case RTAS_IBM_DYNAMIC_INDICE_HW_ERROR: /* Hardware/platform error */ e44fb25ad9fa03 Haren Myneni 2025-01-04 492 ret = -EIO; e44fb25ad9fa03 Haren Myneni 2025-01-04 493 break; e44fb25ad9fa03 Haren Myneni 2025-01-04 494 } e44fb25ad9fa03 Haren Myneni
Re: [PATCH v2 0/3] sysfs: constify bin_attribute argument of sysfs_bin_attr_simple_read()
On Wed, Jan 08, 2025 at 09:45:45AM -0800, Alexei Starovoitov wrote: > On Tue, Dec 31, 2024 at 2:30 AM Thomas Weißschuh wrote: > > > > On 2024-12-30 16:50:41-0800, Alexei Starovoitov wrote: > > > On Sat, Dec 28, 2024 at 12:43 AM Thomas Weißschuh > > > wrote: > > > > > > > > Most users use this function through the BIN_ATTR_SIMPLE* macros, > > > > they can handle the switch transparently. > > > > > > > > This series is meant to be merged through the driver core tree. > > > > > > hmm. why? > > > > Patch 1 changes the signature of sysfs_bin_attr_simple_read(). > > Before patch 1 sysfs_bin_attr_simple_read() needs to be assigned to the > > callback member .read, after patch 1 it's .read_new. > > (Both callbacks work exactly the same, except for their signature, > > .read_new is only a transition mechanism and will go away again) > > > > > I'd rather take patches 2 and 3 into bpf-next to avoid > > > potential conflicts. > > > Patch 1 looks orthogonal and independent. > > > > If you pick up 2 and 3 through bpf-next you would need to adapt these > > assignments. As soon as both patch 1 and the modified 2 and 3 hit > > Linus' tree, the build would break due to mismatches function pointers. > > (Casting function pointers to avoid the mismatch will blow up with KCFI) > > I see. All these steps to constify is frankly a mess. > You're wasting cpu and memory for this read vs read_new > when const is not much more than syntactic sugar in C. > You should have done one tree wide patch without doing this _new() hack. > > Anyway, rant over. Carry patches 2,3. Hopefully they won't conflict. > But I don't want to see any constification patches in bpf land > that come with such pointless runtime penalty. 
The "pointless" penalty will go away once we convert all instances, and really, it's just one pointer check, sysfs files should NOT be a hot path for anything real, and one more pointer check should be cached and not measurable compared to the real logic behind the binary data coming from the hardware/kernel, right? sysfs is NOT tuned for speed at all, so adding more checks like this should be fine. thanks, greg k-h
Re: [PATCH] perf machine: Don't ignore _etext when not a text symbol
Le 08/01/2025 à 21:14, Arnaldo Carvalho de Melo a écrit : On Wed, Jan 08, 2025 at 10:15:24AM +0100, Christophe Leroy wrote: Depending on how vmlinux.lds is written, _etext might be the very first data symbol instead of the very last text symbol. Don't require it to be a text symbol, accept any symbol type. I'm adding a Link: Link: https://lore.kernel.org/all/752a31b0-4370-4f52-b7cc-45f0078c1d6c@csgroup.eu To give more context as to where this has been observed, and also add a snippet of your explanation there, this: # grep -e _stext -e _etext -e _edata /proc/kallsyms c000 T _stext c08b8000 D _etext So there is no _edata and _etext is not text. For the absence of _edata, I sent another patch, will you take it as well? : https://lore.kernel.org/linux-perf-users/2fec8c50c271dff59f0177ff0884b6c374486ba5.1736327770.git.christophe.le...@csgroup.eu/T/#u Thanks Christophe
Re: [PATCH v3 5/6] s390/crash: Use note name macros
On 2025/01/08 22:50, Dave Martin wrote: On Wed, Jan 08, 2025 at 01:53:51PM +0900, Akihiko Odaki wrote: On 2025/01/08 1:17, Dave Martin wrote: Hi, On Tue, Jan 07, 2025 at 09:45:56PM +0900, Akihiko Odaki wrote: Use note name macros to match with the userspace's expectation. Signed-off-by: Akihiko Odaki --- arch/s390/kernel/crash_dump.c | 62 --- 1 file changed, 23 insertions(+), 39 deletions(-) diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c [...] +#define NT_INIT(buf, type, desc) \ + (nt_init_name((buf), NT_ ## type, &(desc), sizeof(desc), NN_ ## type)) [...] (Note also, the outer parentheses and the parentheses around (buf) appear redundant -- although harmless?) They only make a difference in trivial corner cases and may look needlessly verbose. (In case there was a misunderstanding here, I meant that some parentheses can be removed without affecting correctness: #define NT_INIT(buf, type, desc) \ nt_init_name(buf, NT_ ## type, &(desc), sizeof(desc), NN_ ## type)) It still doesn't matter though -- and some people do prefer to be defensive anyway and err on the side of having too many parentheses rather than too few.) Well, being very pedantic, there are some cases where these parentheses have some effect. If you omit the outer parentheses, the following code will have different consequences: a->NT_INIT(buf, PRSTATUS, desc) The parentheses around buf will make a difference for the following code: #define COMMA , NT_INIT(NULL COMMA buf, PRSTATUS, desc) But nobody will write such code. Regards, Akihiko Odaki
Re: [PATCH] kexec: Initialize ELF lowest address to ULONG_MAX
On Thu, 9 Jan 2025 09:42:14 +0530 Sourabh Jain wrote: > Hello Baoquan and Eric, > > > On 12/12/24 08:25, Baoquan he wrote: > > On 12/10/24 at 02:43pm, Sourabh Jain wrote: > >> kexec_elf_load() loads an ELF executable and sets the address of the > >> lowest PT_LOAD section to the address held by the lowest_load_addr > >> function argument. > >> > >> To determine the lowest PT_LOAD address, a local variable lowest_addr > >> (type unsigned long) is initialized to UINT_MAX. After loading each > >> PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD > >> address is lower, lowest_addr is updated. However, setting lowest_addr > >> to UINT_MAX won't work when the kernel image is loaded above 4G, as the > >> returned lowest PT_LOAD address would be invalid. This is resolved by > >> initializing lowest_addr to ULONG_MAX instead. > >> > >> This issue was discovered while implementing crashkernel high/low > >> reservation on the PowerPC architecture. > >> > >> Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") > >> Cc: Baoquan he > >> Cc: Hari Bathini > >> CC: Madhavan Srinivasan > >> Cc: Michael Ellerman > >> Cc: ke...@lists.infradead.org > >> Cc: linuxppc-dev@lists.ozlabs.org > >> Cc: linux-ker...@vger.kernel.org > >> Signed-off-by: Sourabh Jain > >> --- > >> kernel/kexec_elf.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c > >> index d3689632e8b9..3a5c25b2adc9 100644 > >> --- a/kernel/kexec_elf.c > >> +++ b/kernel/kexec_elf.c > >> @@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr > >> *ehdr, > >> struct kexec_buf *kbuf, > >> unsigned long *lowest_load_addr) > >> { > >> - unsigned long lowest_addr = UINT_MAX; > >> + unsigned long lowest_addr = ULONG_MAX; > > Great catch. > > > > Acked-by: Baoquan He > Thank you for the Ack! The upcoming two patch series, which aim to > enable generic crashkernel reservation, depends on this fix. 
One of them > is already posted for upstream review: > https://lore.kernel.org/all/20250108101458.406806-1-sourabhj...@linux.ibm.com/ > > I was wondering if you could guide us on how to get this fix pushed to > the mainline tree. Please include this patch (with Baoquan's ack) in whichever tree contains the powerpc patches which depend upon it.
Re: [PATCH] kexec: Initialize ELF lowest address to ULONG_MAX
Hello Andrew, On 09/01/25 10:58, Andrew Morton wrote: On Thu, 9 Jan 2025 09:42:14 +0530 Sourabh Jain wrote: Hello Baoquan and Eric, On 12/12/24 08:25, Baoquan he wrote: On 12/10/24 at 02:43pm, Sourabh Jain wrote: kexec_elf_load() loads an ELF executable and sets the address of the lowest PT_LOAD section to the address held by the lowest_load_addr function argument. To determine the lowest PT_LOAD address, a local variable lowest_addr (type unsigned long) is initialized to UINT_MAX. After loading each PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD address is lower, lowest_addr is updated. However, setting lowest_addr to UINT_MAX won't work when the kernel image is loaded above 4G, as the returned lowest PT_LOAD address would be invalid. This is resolved by initializing lowest_addr to ULONG_MAX instead. This issue was discovered while implementing crashkernel high/low reservation on the PowerPC architecture. Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- kernel/kexec_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c index d3689632e8b9..3a5c25b2adc9 100644 --- a/kernel/kexec_elf.c +++ b/kernel/kexec_elf.c @@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr *ehdr, struct kexec_buf *kbuf, unsigned long *lowest_load_addr) { - unsigned long lowest_addr = UINT_MAX; + unsigned long lowest_addr = ULONG_MAX; Great catch. Acked-by: Baoquan He Thank you for the Ack! The upcoming two patch series, which aim to enable generic crashkernel reservation, depends on this fix. 
One of them is already posted for upstream review: https://lore.kernel.org/all/20250108101458.406806-1-sourabhj...@linux.ibm.com/ I was wondering if you could guide us on how to get this fix pushed to the mainline tree. Please include this patch (with Baoquan's ack) in whichever tree contains the powerpc patches which depend upon it. Sure, I will include this patch in the respective patch series. Thanks, Sourabh Jain
Re: [PATCH RESEND v1 1/5] crash: remove an unused argument from reserve_crashkernel_generic()
Hello Baoquan, On 08/01/25 16:46, Baoquan he wrote: On 01/08/25 at 03:44pm, Sourabh Jain wrote: cmdline argument is not used in reserve_crashkernel_generic() so remove it. Correspondingly, all the callers have been updated as well. No functional change intended. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/arm64/mm/init.c | 6 ++ arch/loongarch/kernel/setup.c | 5 ++--- arch/riscv/mm/init.c | 6 ++ arch/x86/kernel/setup.c | 6 ++ include/linux/crash_reserve.h | 11 +-- kernel/crash_reserve.c| 9 - 6 files changed, 17 insertions(+), 26 deletions(-) Acked-by: Baoquan He Thanks for the Ack! - Sourabh Jain
Re: [PATCH v2] perf: Fix display of kernel symbols
Le 08/01/2025 à 15:53, Arnaldo Carvalho de Melo a écrit : On Wed, Jan 08, 2025 at 10:54:20AM +0100, Christophe Leroy wrote: Since commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses"), perf doesn't display anymore kernel symbols on powerpc, allthough it still detects them as kernel addresses. # Overhead Command Shared Object Symbol # .. . .. # 80.49% Coeur main [unknown] [k] 0xc005f0f8 3.91% Coeur main gau[.] engine_loop.constprop.0.isra.0 1.72% Coeur main [unknown] [k] 0xc005f11c 1.09% Coeur main [unknown] [k] 0xc01f82c8 0.44% Coeur main libc.so.6 [.] epoll_wait 0.38% Coeur main [unknown] [k] 0xc0011718 0.36% Coeur main [unknown] [k] 0xc01f45c0 This is because function maps__find_next_entry() now returns current entry instead of next entry, leading to kernel map end address getting mis-configured with its own start address instead of the start address of the following map. Fix it by really taking the next entry, also make sure that entry follows current one by making sure entries are sorted. Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: Christophe Leroy Reviewed-by: Arnaldo Carvalho de Melo --- v2: Make sure the entries are sorted, if not sort them. Since you have changed what I reviewed I'll have to re-review :-) Will try to do it after some calls. Ah yes sorry, should have removed your Reviewed-by. Based on Ian's feedback "Using the next entry in this way won't work if the entries aren't sorted", I added the following block in front of the initial change: + while (!maps__maps_by_address_sorted(maps)) { + up_read(maps__lock(maps)); + maps__sort_by_address(maps); + down_read(maps__lock(maps)); + } Christophe
Re: Raptor Engineering dedicating resources to KVM on PowerNV + KVM CI/CD
Hi Alex, On 1/7/25 5:45 AM, Alex Williamson wrote: > Hi, > > What are you supposing the value to the community is for a CI pipeline > that always fails? Are you hoping the community will address the > failing tests or monitor the failures to try to make them not become > worse? The failing tests are all isolated to issues with the specific AMD graphics hardware that the test machine is using for the VFIO and host GPU tests, and are likely isolated to the amdgpu driver itself. We have filed bugs with the amdgpu folks. The non-failing tests, however, provide value for regression monitoring, including VM boot smoke tests for both little-endian and big-endian ppc64/pseries targets, as well as the vfio-*-attach tests that ensure hardware can be successfully bound to the vfio-pci driver on a PowerNV host. The test artifacts also include full dmesg output from the host and guest machine (when applicable) to assist with debugging. The data could definitely be presented in an easier-to-digest way to make it more obvious which failures are regressions and which are due to the aforementioned amdgpu issues, so that's an area for improvement. > > I would imagine that CI against key developer branches or linux-next > would be more useful than finding problems after we've merged with > mainline, but it's not clear there's any useful baseline here to > monitor for regressions. Thanks, > That's a good point -- I'll definitely look into adding at least linux-next, as well as any other branch requests from developers. > Alex Thanks, Shawn
Re: [PATCH v2] perf: Fix display of kernel symbols
On Wed, Jan 8, 2025 at 1:54 AM Christophe Leroy wrote: > > Since commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily > sorted array for addresses"), perf doesn't display anymore kernel > symbols on powerpc, allthough it still detects them as kernel addresses. > > # Overhead Command Shared Object Symbol > # .. . > .. > # > 80.49% Coeur main [unknown] [k] 0xc005f0f8 > 3.91% Coeur main gau[.] > engine_loop.constprop.0.isra.0 > 1.72% Coeur main [unknown] [k] 0xc005f11c > 1.09% Coeur main [unknown] [k] 0xc01f82c8 > 0.44% Coeur main libc.so.6 [.] epoll_wait > 0.38% Coeur main [unknown] [k] 0xc0011718 > 0.36% Coeur main [unknown] [k] 0xc01f45c0 > > This is because function maps__find_next_entry() now returns current > entry instead of next entry, leading to kernel map end address > getting mis-configured with its own start address instead of the > start address of the following map. > > Fix it by really taking the next entry, also make sure that entry > follows current one by making sure entries are sorted. > > Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array > for addresses") > Signed-off-by: Christophe Leroy > Reviewed-by: Arnaldo Carvalho de Melo Reviewed-by: Ian Rogers Thanks! Ian > --- > v2: Make sure the entries are sorted, if not sort them. 
> --- > tools/perf/util/maps.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c > index 432399cbe5dd..09c9cc326c08 100644 > --- a/tools/perf/util/maps.c > +++ b/tools/perf/util/maps.c > @@ -1136,8 +1136,13 @@ struct map *maps__find_next_entry(struct maps *maps, > struct map *map) > struct map *result = NULL; > > down_read(maps__lock(maps)); > + while (!maps__maps_by_address_sorted(maps)) { > + up_read(maps__lock(maps)); > + maps__sort_by_address(maps); > + down_read(maps__lock(maps)); > + } > i = maps__by_address_index(maps, map); > - if (i < maps__nr_maps(maps)) > + if (++i < maps__nr_maps(maps)) > result = map__get(maps__maps_by_address(maps)[i]); > > up_read(maps__lock(maps)); > -- > 2.47.0 >
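The two halves of the v2 fix — advancing past the current entry with `++i`, and lazily sorting before the lookup — can be modeled in a small userspace sketch. Names mirror the perf code, but the locking dance (drop the read lock, sort, re-take it) is elided for brevity; this is an illustration of the logic, not the actual tools/perf implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Miniature model of maps__find_next_entry() after the v2 fix.
 * Real perf drops the read lock, sorts under the write lock, and
 * re-takes the read lock; locking is elided here for brevity. */
struct maps {
	int *by_address;
	int nr;
	bool sorted;
};

static int cmp_addr(const void *a, const void *b)
{
	return *(const int *)a - *(const int *)b;
}

static void maps__sort_by_address(struct maps *maps)
{
	qsort(maps->by_address, maps->nr, sizeof(int), cmp_addr);
	maps->sorted = true;
}

/* Return the entry following @map, or -1 if @map is the last one. */
static int maps__find_next_entry(struct maps *maps, int map)
{
	int i, result = -1;

	if (!maps->sorted)	/* Ian's point: a neighbour lookup needs sorted entries */
		maps__sort_by_address(maps);

	for (i = 0; i < maps->nr; i++)
		if (maps->by_address[i] == map)
			break;

	if (++i < maps->nr)	/* the fix: ++i skips the current entry */
		result = maps->by_address[i];
	return result;
}
```

With entries {0x3000, 0x1000, 0x2000}, the entry after 0x1000 is 0x2000 only once the array is sorted; without the `++i`, the function hands back 0x1000 itself, which is exactly how the kernel map's end address ended up equal to its own start address.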
Re: [PATCH v5 07/17] mm: pgtable: introduce pagetable_dtor()
On Wed, Jan 08, 2025 at 02:57:23PM +0800, Qi Zheng wrote: > The pagetable_p*_dtor() are exactly the same except for the handling of > ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is > NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify > pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() > to do this. > > Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that > ptlock and page table pages can be freed together (regardless of whether > RCU is used). This prevents the use-after-free problem where the ptlock > is freed immediately but the page table pages is freed later via RCU. > > Signed-off-by: Qi Zheng > Originally-by: Peter Zijlstra (Intel) > Reviewed-by: Kevin Brodsky > --- ... > arch/s390/include/asm/pgalloc.h| 6 +-- > arch/s390/include/asm/tlb.h| 6 +-- > arch/s390/mm/pgalloc.c | 2 +- ... > include/asm-generic/pgalloc.h | 8 ++-- > include/linux/mm.h | 52 -- > mm/memory.c| 3 +- > 28 files changed, 62 insertions(+), 95 deletions(-) ... For s390: Acked-by: Alexander Gordeev Thanks!
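The shape of the unification described in the commit message can be sketched in plain userspace C: once the ptlock-freeing helper tolerates a missing ptlock, the per-level destructors collapse into one. The types here are hypothetical simplifications; the real `ptdesc`/ptlock machinery lives in include/linux/mm.h and does more bookkeeping than shown.

```c
#include <stdlib.h>

/* Illustrative userspace sketch (not kernel code). */
struct ptl { int dummy; };

struct ptdesc {
	struct ptl *ptl;	/* only some table levels allocate one */
	long refcount;
};

static void ptlock_free(struct ptdesc *pt)
{
	/* The key enabler: handle the no-ptlock case here instead of
	 * in every per-level destructor. */
	if (!pt->ptl)
		return;
	free(pt->ptl);
	pt->ptl = NULL;
}

/* One destructor for all levels, standing in for the unified
 * pagetable_dtor() that replaces pagetable_p*_dtor(). */
static void pagetable_dtor(struct ptdesc *pt)
{
	ptlock_free(pt);
	pt->refcount--;	/* placeholder for the real accounting */
}
```

Because the NULL case lives in `ptlock_free()`, callers no longer need to know which levels carry a ptlock, which is what lets the later patches move the single destructor into the TLB-removal path.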
Re: [PATCH v5 13/17] s390: pgtable: consolidate PxD and PTE TLB free paths
On Wed, Jan 08, 2025 at 02:57:29PM +0800, Qi Zheng wrote: > Call pagetable_dtor() for PMD|PUD|P4D tables just before ptdesc is > freed - same as it is done for PTE tables. That allows consolidating > TLB free paths for all table types. > > Signed-off-by: Qi Zheng > Suggested-by: Peter Zijlstra (Intel) > Reviewed-by: Kevin Brodsky > Cc: linux-s...@vger.kernel.org > --- > arch/s390/include/asm/tlb.h | 3 --- > arch/s390/mm/pgalloc.c | 14 -- > 2 files changed, 4 insertions(+), 13 deletions(-) > > diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h > index dde847a5be545..d5b27a2445c96 100644 > --- a/arch/s390/include/asm/tlb.h > +++ b/arch/s390/include/asm/tlb.h > @@ -102,7 +102,6 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, > pmd_t *pmd, > { > if (mm_pmd_folded(tlb->mm)) > return; > - pagetable_dtor(virt_to_ptdesc(pmd)); > __tlb_adjust_range(tlb, address, PAGE_SIZE); > tlb->mm->context.flush_mm = 1; > tlb->freed_tables = 1; > @@ -122,7 +121,6 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, > p4d_t *p4d, > { > if (mm_p4d_folded(tlb->mm)) > return; > - pagetable_dtor(virt_to_ptdesc(p4d)); > __tlb_adjust_range(tlb, address, PAGE_SIZE); > tlb->mm->context.flush_mm = 1; > tlb->freed_tables = 1; > @@ -141,7 +139,6 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, > pud_t *pud, > { > if (mm_pud_folded(tlb->mm)) > return; > - pagetable_dtor(virt_to_ptdesc(pud)); > tlb->mm->context.flush_mm = 1; > tlb->freed_tables = 1; > tlb->cleared_p4ds = 1; > diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c > index 569de24d33761..c73b89811a264 100644 > --- a/arch/s390/mm/pgalloc.c > +++ b/arch/s390/mm/pgalloc.c > @@ -180,7 +180,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm) > return table; > } > > -static void pagetable_pte_dtor_free(struct ptdesc *ptdesc) > +static void pagetable_dtor_free(struct ptdesc *ptdesc) > { > pagetable_dtor(ptdesc); > pagetable_free(ptdesc); > @@ -190,20 +190,14 @@ void 
page_table_free(struct mm_struct *mm, unsigned > long *table) > { > struct ptdesc *ptdesc = virt_to_ptdesc(table); > > - pagetable_pte_dtor_free(ptdesc); > + pagetable_dtor_free(ptdesc); > } > > void __tlb_remove_table(void *table) > { > struct ptdesc *ptdesc = virt_to_ptdesc(table); > - struct page *page = ptdesc_page(ptdesc); > > - if (compound_order(page) == CRST_ALLOC_ORDER) { > - /* pmd, pud, or p4d */ > - pagetable_free(ptdesc); > - return; > - } > - pagetable_pte_dtor_free(ptdesc); > + pagetable_dtor_free(ptdesc); > } > > #ifdef CONFIG_TRANSPARENT_HUGEPAGE > @@ -211,7 +205,7 @@ static void pte_free_now(struct rcu_head *head) > { > struct ptdesc *ptdesc = container_of(head, struct ptdesc, pt_rcu_head); > > - pagetable_pte_dtor_free(ptdesc); > + pagetable_dtor_free(ptdesc); > } > > void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable) Acked-by: Alexander Gordeev Thanks!
Re: [PATCH v8 1/7] PCI: Don't expose pcie_read_tlp_log() outside of PCI subsystem
On Wed, Dec 18, 2024 at 04:37:41PM +0200, Ilpo Järvinen wrote: > pcie_read_tlp_log() was exposed by the commit 0a5a46a6a61b ("PCI/AER: > Generalize TLP Header Log reading") but this is now considered a > mistake. No drivers outside of PCI subsystem should build their own > diagnostic logging but should rely on PCI core doing it for them. > > There's currently one driver (ixgbe) doing it independently which was > the initial reason why the export was added but it was decided by the > PCI maintainer that it's something that should be eliminated. > > Remove the unwanted EXPORT of pcie_read_tlp_log() and remove it from > include/linux/aer.h. > > Link: https://lore.kernel.org/all/20240322193011.GA701027@bhelgaas/ > Signed-off-by: Ilpo Järvinen > Reviewed-by: Jonathan Cameron Reviewed-by: Yazen Ghannam Thanks, Yazen
Re: [PATCH v5 06/17] s390: pgtable: add statistics for PUD and P4D level page table
On Wed, Jan 08, 2025 at 02:57:22PM +0800, Qi Zheng wrote: > Like PMD and PTE level page table, also add statistics for PUD and P4D > page table. > > Signed-off-by: Qi Zheng > Suggested-by: Peter Zijlstra (Intel) > Reviewed-by: Kevin Brodsky > Cc: linux-s...@vger.kernel.org > --- > arch/s390/include/asm/pgalloc.h | 29 + > arch/s390/include/asm/tlb.h | 2 ++ > 2 files changed, 23 insertions(+), 8 deletions(-) > > diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h > index 7b84ef6dc4b6d..a0c1ca5d8423c 100644 > --- a/arch/s390/include/asm/pgalloc.h > +++ b/arch/s390/include/asm/pgalloc.h > @@ -53,29 +53,42 @@ static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, > unsigned long address) > { > unsigned long *table = crst_table_alloc(mm); > > - if (table) > - crst_table_init(table, _REGION2_ENTRY_EMPTY); > + if (!table) > + return NULL; > + crst_table_init(table, _REGION2_ENTRY_EMPTY); > + pagetable_p4d_ctor(virt_to_ptdesc(table)); > + > return (p4d_t *) table; > } > > static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d) > { > - if (!mm_p4d_folded(mm)) > - crst_table_free(mm, (unsigned long *) p4d); > + if (mm_p4d_folded(mm)) > + return; > + > + pagetable_p4d_dtor(virt_to_ptdesc(p4d)); > + crst_table_free(mm, (unsigned long *) p4d); > } > > static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long > address) > { > unsigned long *table = crst_table_alloc(mm); > - if (table) > - crst_table_init(table, _REGION3_ENTRY_EMPTY); > + > + if (!table) > + return NULL; > + crst_table_init(table, _REGION3_ENTRY_EMPTY); > + pagetable_pud_ctor(virt_to_ptdesc(table)); > + > return (pud_t *) table; > } > > static inline void pud_free(struct mm_struct *mm, pud_t *pud) > { > - if (!mm_pud_folded(mm)) > - crst_table_free(mm, (unsigned long *) pud); > + if (mm_pud_folded(mm)) > + return; > + > + pagetable_pud_dtor(virt_to_ptdesc(pud)); > + crst_table_free(mm, (unsigned long *) pud); > } > > static inline pmd_t *pmd_alloc_one(struct 
mm_struct *mm, unsigned long > vmaddr) > diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h > index e95b2c8081eb8..907d57a68145c 100644 > --- a/arch/s390/include/asm/tlb.h > +++ b/arch/s390/include/asm/tlb.h > @@ -122,6 +122,7 @@ static inline void p4d_free_tlb(struct mmu_gather *tlb, > p4d_t *p4d, > { > if (mm_p4d_folded(tlb->mm)) > return; > + pagetable_p4d_dtor(virt_to_ptdesc(p4d)); > __tlb_adjust_range(tlb, address, PAGE_SIZE); > tlb->mm->context.flush_mm = 1; > tlb->freed_tables = 1; > @@ -140,6 +141,7 @@ static inline void pud_free_tlb(struct mmu_gather *tlb, > pud_t *pud, > { > if (mm_pud_folded(tlb->mm)) > return; > + pagetable_pud_dtor(virt_to_ptdesc(pud)); > tlb->mm->context.flush_mm = 1; > tlb->freed_tables = 1; > tlb->cleared_p4ds = 1; Acked-by: Alexander Gordeev Thanks!
Re: [PATCH v8 2/7] PCI: Move TLP Log handling to own file
On Wed, Dec 18, 2024 at 04:37:42PM +0200, Ilpo Järvinen wrote: > TLP Log is PCIe feature and is processed only by AER and DPC. > Configwise, DPC depends AER being enabled. In lack of better place, the > TLP Log handling code was initially placed into pci.c but it can be > easily placed in a separate file. > > Move TLP Log handling code to own file under pcie/ subdirectory and > include it only when AER is enabled. > > Signed-off-by: Ilpo Järvinen > Reviewed-by: Jonathan Cameron > --- Reviewed-by: Yazen Ghannam Overall, looks good to me, but I have one idea below. > drivers/pci/pci.c | 27 --- > drivers/pci/pci.h | 2 +- > drivers/pci/pcie/Makefile | 2 +- > drivers/pci/pcie/tlp.c| 39 +++ > 4 files changed, 41 insertions(+), 29 deletions(-) > create mode 100644 drivers/pci/pcie/tlp.c > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index e0fdc9d10f91..02cd4c7eb80b 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1099,33 +1099,6 @@ static void pci_enable_acs(struct pci_dev *dev) > pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl); > } > > -/** > - * pcie_read_tlp_log - read TLP Header Log > - * @dev: PCIe device > - * @where: PCI Config offset of TLP Header Log > - * @tlp_log: TLP Log structure to fill > - * > - * Fill @tlp_log from TLP Header Log registers, e.g., AER or DPC. > - * > - * Return: 0 on success and filled TLP Log structure, <0 on error. > - */ > -int pcie_read_tlp_log(struct pci_dev *dev, int where, > - struct pcie_tlp_log *tlp_log) > -{ > - int i, ret; > - > - memset(tlp_log, 0, sizeof(*tlp_log)); > - > - for (i = 0; i < 4; i++) { > - ret = pci_read_config_dword(dev, where + i * 4, > - &tlp_log->dw[i]); > - if (ret) > - return pcibios_err_to_errno(ret); > - } > - > - return 0; > -} > - > /** > * pci_restore_bars - restore a device's BAR values (e.g. 
after wake-up) > * @dev: PCI device to have its BARs restored > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 8a60fc9e7786..55fcf3bac4f7 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -549,9 +549,9 @@ struct aer_err_info { > > int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info > *info); > void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); > -#endif /* CONFIG_PCIEAER */ > > int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log > *log); > +#endif /* CONFIG_PCIEAER */ > > #ifdef CONFIG_PCIEPORTBUS > /* Cached RCEC Endpoint Association */ > diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile > index 53ccab62314d..173829aa02e6 100644 > --- a/drivers/pci/pcie/Makefile > +++ b/drivers/pci/pcie/Makefile > @@ -7,7 +7,7 @@ pcieportdrv-y := portdrv.o rcec.o > obj-$(CONFIG_PCIEPORTBUS)+= pcieportdrv.o bwctrl.o > > obj-y+= aspm.o > -obj-$(CONFIG_PCIEAER)+= aer.o err.o > +obj-$(CONFIG_PCIEAER)+= aer.o err.o tlp.o > obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o > obj-$(CONFIG_PCIE_PME) += pme.o > obj-$(CONFIG_PCIE_DPC) += dpc.o > diff --git a/drivers/pci/pcie/tlp.c b/drivers/pci/pcie/tlp.c > new file mode 100644 > index ..3f053cc62290 > --- /dev/null > +++ b/drivers/pci/pcie/tlp.c > @@ -0,0 +1,39 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * PCIe TLP Log handling > + * > + * Copyright (C) 2024 Intel Corporation > + */ > + > +#include > +#include > +#include > + > +#include "../pci.h" > + > +/** > + * pcie_read_tlp_log - read TLP Header Log > + * @dev: PCIe device > + * @where: PCI Config offset of TLP Header Log > + * @tlp_log: TLP Log structure to fill > + * > + * Fill @tlp_log from TLP Header Log registers, e.g., AER or DPC. > + * > + * Return: 0 on success and filled TLP Log structure, <0 on error. 
> + */ > +int pcie_read_tlp_log(struct pci_dev *dev, int where, > + struct pcie_tlp_log *tlp_log) > +{ > + int i, ret; > + > + memset(tlp_log, 0, sizeof(*tlp_log)); > + Can we include a define for the number of registers? > + for (i = 0; i < 4; i++) { This '4' is "MIN_TLP_REGS" or something similar. > + ret = pci_read_config_dword(dev, where + i * 4, This '4' is the register offset factor. Another thought is to make the offset a variable and adjust it in the for-loop conditions. int i, ret, offset = where; for (i = 0; i < MIN_TLP_REGS; i++, offset += 4) { ret = pci_read_config_dword(dev, offset, &tlp_log->dw[i]); I think this will help as variable-size TLP logs are added in later patches. > + &tlp_log->dw[i]); > + if (ret) > + return pcibios_err_to_errno(ret); > + } > + > + return 0
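Yazen's suggestion — a named register count plus an offset variable advanced in the loop header — would look roughly like this. This is a userspace sketch with a stubbed config-space reader; `PCIE_TLP_HDR_DWORDS` is a hypothetical name for the magic '4', not an existing kernel define.

```c
#include <stdint.h>
#include <string.h>

#define PCIE_TLP_HDR_DWORDS 4	/* hypothetical name for the '4' */

struct pcie_tlp_log { uint32_t dw[PCIE_TLP_HDR_DWORDS]; };

/* Stub standing in for pci_read_config_dword(): config space is
 * modeled as an array of dwords indexed by byte offset. */
static int read_config_dword(const uint32_t *cfg_space, int offset, uint32_t *val)
{
	*val = cfg_space[offset / 4];
	return 0;
}

static int read_tlp_log(const uint32_t *cfg_space, int where,
			struct pcie_tlp_log *log)
{
	int i, ret, offset = where;

	memset(log, 0, sizeof(*log));
	for (i = 0; i < PCIE_TLP_HDR_DWORDS; i++, offset += 4) {
		ret = read_config_dword(cfg_space, offset, &log->dw[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

Keeping the byte offset in its own variable means a later patch that adds variable-size logs only has to change the loop bound, not re-derive `where + i * 4` arithmetic.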
Re: Perf doesn't display kernel symbols anymore (bisected commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses"))
On 06/01/2025 at 22:46, Namhyung Kim wrote: And in System.map I have: c136a000 D __start___bug_table c1377c14 D __stop___bug_table c1378000 B __bss_start c1378000 B _edata c1378000 B initcall_debug c1378004 B reset_devices Should perf try to locate the very last symbol when it doesn't find _edata? Or should the architecture's link script be modified? Otherwise commit 69a87a32f5cd ("perf machine: Include data symbols in the kernel map") is just pointless. Let's go with kallsyms__get_symbol_start(). I think it's the most straightforward and simplest fix. Ok, I did that, see patch https://lore.kernel.org/linux-perf-users/b3ee1994d95257cb7f2de037c5030ba7d1bed404.1736327613.git.christophe.le...@csgroup.eu/T/#u And for the _edata which is sometimes missing, I sent patch https://lore.kernel.org/linux-perf-users/2fec8c50c271dff59f0177ff0884b6c374486ba5.1736327770.git.christophe.le...@csgroup.eu/T/#u Christophe
Re: [PATCH v3 5/6] s390/crash: Use note name macros
On Wed, Jan 08, 2025 at 01:53:51PM +0900, Akihiko Odaki wrote: > On 2025/01/08 1:17, Dave Martin wrote: > > > +#define NT_INIT(buf, type, desc) \ > > > + (nt_init_name((buf), NT_ ## type, &(desc), sizeof(desc), NN_ ## type)) > > > > Nit: this macro name clashes with the naming scheme in elf.h. > > > > I think that there is a (weak) convention that macros with upper-case > > names don't expand to a C function call; thus, a macro with an upper- > > case name can be invoked in places where a C function call would not be > > allowed. (This convention is not followed everywhere, though -- it's > > up to the maintainer what they prefer here.) > > I wanted to clarify it is a macro as it concatenates tokens with ##, but I > also find there are many macros that are named lower-case and perform token > concatenation. > > S390 maintainers, please tell us your opinion. Just make the new macros lower case to avoid the naming scheme clashes, please. Otherwise it doesn't matter too much. > > > +#define NT_SIZE(type, desc) (nt_size_name(sizeof(desc), NN_ ## type)) > > > > Nit: name prefix clash (again); possibly redundant parentheses. Same here.
> > > - size = nt_size(NT_PRSTATUS, sizeof(struct elf_prstatus)); > > > - size += nt_size(NT_PRFPREG, sizeof(elf_fpregset_t)); > > > - size += nt_size(NT_S390_TIMER, sizeof(sa->timer)); > > > - size += nt_size(NT_S390_TODCMP, sizeof(sa->todcmp)); > > > - size += nt_size(NT_S390_TODPREG, sizeof(sa->todpreg)); > > > - size += nt_size(NT_S390_CTRS, sizeof(sa->ctrs)); > > > - size += nt_size(NT_S390_PREFIX, sizeof(sa->prefix)); > > > + size = NT_SIZE(PRSTATUS, struct elf_prstatus); > > > + size += NT_SIZE(PRFPREG, elf_fpregset_t); > > > + size += NT_SIZE(S390_TIMER, sa->timer); > > > + size += NT_SIZE(S390_TODCMP, sa->todcmp); > > > + size += NT_SIZE(S390_TODPREG, sa->todpreg); > > > + size += NT_SIZE(S390_CTRS, sa->ctrs); > > > + size += NT_SIZE(S390_PREFIX, sa->prefix); > > > > It might be worth fixing the funny spacing on these lines, since all > > the affected lines are being replaced. Yes, please! Besides that this looks good: Acked-by: Heiko Carstens
Re: [PATCH] KVM: allow NULL writable argument to __kvm_faultin_pfn
On Mon, Jan 06, 2025, Sean Christopherson wrote: > On Wed, Jan 01, 2025, Paolo Bonzini wrote: > > kvm_follow_pfn() is able to work with NULL in the .map_writable field > > of the homonymous struct. But __kvm_faultin_pfn() rejects the combo > > despite KVM for e500 trying to use it. Indeed .map_writable is not > > particularly useful if the flags include FOLL_WRITE and readonly > > guest memory is not supported, so add support to __kvm_faultin_pfn() > > for this case. > > I would prefer to keep the sanity check to minimize the risk of a page fault > handler not supporting opportunistic write mappings. e500 is definitely the > odd one out here. Per a quick chat at PUCK, Paolo is going to try and fix the e500 code to actually use the @writable param as it's intended.
Re: [PATCH v3 2/6] binfmt_elf: Use note name macros
Hi, On Wed, Jan 08, 2025 at 01:34:24PM +0900, Akihiko Odaki wrote: > On 2025/01/08 1:18, Dave Martin wrote: > > On Tue, Jan 07, 2025 at 09:45:53PM +0900, Akihiko Odaki wrote: > > > Use note name macros to match with the userspace's expectation. > > > > Also (and more importantly) get rid of duplicated knowledge about the > > mapping of note types to note names, so that elf.h is the authoritative > > source of this information? > > > > > > > > Signed-off-by: Akihiko Odaki > > > Acked-by: Baoquan He > > > --- > > > fs/binfmt_elf.c | 21 ++--- > > > fs/binfmt_elf_fdpic.c | 8 > > > 2 files changed, 14 insertions(+), 15 deletions(-) > > > > > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > > > index 106f0e8af177..5b4a92e5e508 100644 > > > --- a/fs/binfmt_elf.c > > > +++ b/fs/binfmt_elf.c > > > > [...] > > > > > @@ -1538,7 +1538,7 @@ static int elf_fdpic_core_dump(struct > > > coredump_params *cprm) > > > do > > > i += 2; > > > while (auxv[i - 2] != AT_NULL); > > > - fill_note(&auxv_note, "CORE", NT_AUXV, i * sizeof(elf_addr_t), auxv); > > > + fill_note(&auxv_note, NN_AUXV, NT_AUXV, i * sizeof(elf_addr_t), auxv); > > > thread_status_size += notesize(&auxv_note); > > > offset = sizeof(*elf); /* ELF header */ > > > > Looking at this code, it appears that the right name is explicitly > > taken from elf.h for a few specific notes, but for those that are > > specified by the arch code (e.g., in struct user_regset entries) the > > name is still guessed locally: > > > > static int fill_thread_core_info(...) { > > > > ... > > > > fill_note(&t->notes[note_iter], is_fpreg ? "CORE" : "LINUX", > > note_type, ret, data); > > > > > > It would be preferable to clean this up if we want elf.h to be the > > authoritative source for the names. > > If we want elf.h to be the authoritative source, yes, but I like the current > form as it ensures nobody adds a note with a name different from "LINUX" and > it is also simpler. 
There is a trade-off so I'd like to keep the current > form unless anyone has a strong preference for one option. > > Regards, > Akihiko Odaki I can see where you're coming from here. It would be nice to at least be able to check that elf.h is consistent with the behaviour here, but you're right -- there is a tradeoff. Maybe add a comment in elf.h at the end of the block of #defines saying that new Linux-specific entries should use the name "LINUX"? Either way, I don't think it's a huge deal. If people are happy with this code as-is, then I don't have an issue with it. I might follow up with a separate patch if this series is merged, and people can consider it on its own merits (or lack thereof). Cheers ---Dave
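The macro pattern under discussion can be sketched in isolation: a single token-pasting macro derives both the `NT_*` type and the `NN_*` name from one tag, so the two cannot drift apart, and it is spelled lower-case per Heiko's request for the s390 side. The note values and the `fill_note()` shape below are simplified stand-ins for the real elf.h/binfmt_elf definitions.

```c
#include <string.h>

/* Illustrative subset of the elf.h pairs (values simplified). */
#define NT_PRSTATUS 1
#define NN_PRSTATUS "CORE"
#define NT_AUXV     6
#define NN_AUXV     "CORE"

struct memelfnote {
	const char *name;
	int type;
	unsigned int datasz;
	const void *data;
};

static void fill_note(struct memelfnote *note, const char *name, int type,
		      unsigned int sz, const void *data)
{
	note->name = name;
	note->type = type;
	note->datasz = sz;
	note->data = data;
}

/* Lower-case name to avoid clashing with elf.h's NT_* scheme;
 * ## pastes the tag onto both the NN_ and NT_ prefixes. */
#define fill_note_t(note, tag, sz, data) \
	fill_note((note), NN_ ## tag, NT_ ## tag, (sz), (data))
```

A call site then reads `fill_note_t(&auxv_note, AUXV, len, auxv)`, and the name/type pairing is checked by the compiler rather than by review.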
Re: [PATCH 03/14] ibmvnic: simplify ibmvnic_set_queue_affinity()y
On Tue, Jan 07, 2025 at 03:04:40PM -0800, Yury Norov wrote: > On Tue, Jan 07, 2025 at 02:43:01PM -0800, Yury Norov wrote: > > On Tue, Jan 07, 2025 at 04:37:17PM -0600, Nick Child wrote: > > > On Sat, Dec 28, 2024 at 10:49:35AM -0800, Yury Norov wrote: > > > > A loop based on cpumask_next_wrap() opencodes the dedicated macro > > > > for_each_online_cpu_wrap(). Using the macro allows to avoid setting > > > > bits affinity mask more than once when stride >= num_online_cpus. > > > > > > > > This also helps to drop cpumask handling code in the caller function. > > > > > > > > Signed-off-by: Yury Norov > > > > --- > > > > drivers/net/ethernet/ibm/ibmvnic.c | 17 ++--- > > > > 1 file changed, 10 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > > > > b/drivers/net/ethernet/ibm/ibmvnic.c > > > > index e95ae0d39948..4cfd90fb206b 100644 > > > > --- a/drivers/net/ethernet/ibm/ibmvnic.c > > > > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > > > > @@ -234,11 +234,16 @@ static int ibmvnic_set_queue_affinity(struct > > > > ibmvnic_sub_crq_queue *queue, > > > > (*stragglers)--; > > > > } > > > > /* atomic write is safer than writing bit by bit directly */ > > > > - for (i = 0; i < stride; i++) { > > > > - cpumask_set_cpu(*cpu, mask); > > > > - *cpu = cpumask_next_wrap(*cpu, cpu_online_mask, > > > > -nr_cpu_ids, false); > > > > + for_each_online_cpu_wrap(i, *cpu) { > > > > + if (!stride--) > > > > + break; > > > > + cpumask_set_cpu(i, mask); > > > > } > > > > + > > > > + /* For the next queue we start from the first unused CPU in > > > > this queue */ > > > > + if (i < nr_cpu_ids) > > > > + *cpu = i + 1; > > > > + > > > This should read '*cpu = i'. Since the loop breaks after incrementing i. > > > Thanks! > > > > cpumask_next_wrap() makes '+ 1' for you. The for_each_cpu_wrap() starts > > exactly where you point. So, this '+1' needs to be explicit now. > > > > Does that make sense? > > Ah, I think I see what you mean. 
It should be like this, right? > > for_each_online_cpu_wrap(i, *cpu) { > if (!stride--) { > *cpu = i + 1; > break; > } > cpumask_set_cpu(i, mask); > } Not quite, for_each_online_cpu_wrap will increment i to point to the next online cpu, then enter the body of the loop. When we break (beacuse stride is zero), we exit the loop early before i is added to any mask, i is the next unassigned online cpu. I tested this to make sure, we see unused cpus (#7, #23) with the patch as is: IRQ : 256 -> ibmvnic-3003-tx0 /proc/irq/256/smp_affinity_list:0-6 IRQ : 257 -> ibmvnic-3003-tx1 /proc/irq/257/smp_affinity_list:16-22 IRQ : 258 -> ibmvnic-3003-rx0 /proc/irq/258/smp_affinity_list:8-14 IRQ : 259 -> ibmvnic-3003-rx1 /proc/irq/259/smp_affinity_list:24-30
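Nick's point — that when the loop breaks because `stride` hit zero, `i` already names the first unconsumed CPU, so no `+ 1` is needed on that path — can be checked with a userspace model of the wrap iteration. `NR_CPUS`, the boolean online map, and the function name are all illustrative; this is not the ibmvnic code itself.

```c
#include <stdbool.h>

#define NR_CPUS 32

/* Walk online CPUs starting (and wrapping) at *cpu, assign 'stride'
 * of them into @mask, and leave *cpu at the first CPU NOT consumed. */
static void assign_queue_cpus(const bool *online, bool *mask,
			      int *cpu, int stride)
{
	int n, i = *cpu;

	for (n = 0; n < NR_CPUS; n++, i = (i + 1) % NR_CPUS) {
		if (!online[i])
			continue;	/* models for_each_online_cpu_wrap() */
		if (!stride--) {
			*cpu = i;	/* 'i' is already the next unused CPU */
			return;
		}
		mask[i] = true;
	}
	/* stride >= number of online CPUs: *cpu is left unchanged,
	 * mirroring the 'if (i < nr_cpu_ids)' guard in the patch. */
}
```

Starting at CPU 5 with CPU 7 offline and a stride of 3, the loop assigns CPUs 5, 6 and 8, and the next queue starts at CPU 9 — the break-path iterator value, with no extra increment.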
Re: watchdog: BUG: soft lockup
Hi, On Sun, Dec 22, 2024 at 10:32 PM wzs wrote: > > Hello, > when fuzzing the Linux kernel, > I triggered many "watchdog: BUG: soft lockup" warnings. > I am not sure whether this is an issue with the kernel or with the > fuzzing program I ran. > (The same fuzzing program, when tested on kernel versions from > Linux-6.7.0 to 6.12.0, triggers the 'watchdog: BUG: soft lockup' > warning on some versions, while others do not. Linux 6.12.0 is the > latest stable release where this error occurs.) > > The bug information I provided below is from the Linux-6.12.0 kernel. > If you need bug information from other versions, I would be happy to provide > it. > > kernel config: https://pastebin.com/i4LPXNAN > console output: https://pastebin.com/uKVpvJ78 IMO it's nearly always a bug if userspace can cause the kernel to soft lockup. I'd expect this isn't a bug in the soft lockup detector but a problem in whatever part of the kernel you're fuzzing. For some details of the soft lockup detector, see `Documentation/admin-guide/lockup-watchdogs.rst`. Presumably you're fuzzing the kernel in a way that causes it to enter a big loop while preemption is disabled, or something like that. Presumably the kernel should be detecting something invalid that userspace did and that would keep it from looping so long. I tried looking at your pastebin and probably what's going on is somewhere hidden in there, but unfortunately the beginning of the logs is a bit jumbled since it looks like the RCU warning and the soft lockup warning happened at about the same time and their output is interleaved. There's also a lot of tasks to go through. Honestly, it's probably less work just to look at whatever you were trying to fuzz to help you pinpoint the problem. I'll also note that you seem to be using KASAN and are running in a virtual machine. It's not inconceivable that's contributing to your problems. KASAN makes things _a lot_ slower and a VM may be getting its time stolen by the host. -Doug
[PATCH] powerpc/pseries/iommu: create DDW for devices with DMA mask less than 64-bits
Starting with PAPR level 2.13, the platform supports placing the PHB in limited address mode. Devices that support DMA masks less than 64-bit but greater than 32-bit are placed in limited address mode. In this mode, the starting DMA address returned by the DDW is 4GB. When the device driver calls dma_supported with a mask less than 64-bit, the PowerPC IOMMU driver places the PHB in Limited Addressing Mode before creating the DDW. Signed-off-by: Gaurav Batra --- arch/powerpc/platforms/pseries/iommu.c | 110 + 1 file changed, 94 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 534cd159e9ab..551e9ca4dcc2 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -52,7 +52,8 @@ enum { enum { DDW_EXT_SIZE = 0, DDW_EXT_RESET_DMA_WIN = 1, - DDW_EXT_QUERY_OUT_SIZE = 2 + DDW_EXT_QUERY_OUT_SIZE = 2, + DDW_EXT_LIMITED_ADDR_MODE = 3 }; static struct iommu_table *iommu_pseries_alloc_table(int node) @@ -1331,6 +1332,54 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn) ret); } +/* + * Platforms support placing PHB in limited address mode starting with LoPAR + * level 2.13 implement. In this mode, the DMA address returned by DDW is over + * 4GB but, less than 64-bits. This benefits IO adapters that don't support + * 64-bits for DMA addresses.
+ */ +static int limited_dma_window(struct pci_dev *dev, struct device_node *par_dn) +{ + int ret; + u32 cfg_addr, reset_dma_win, las_supported; + u64 buid; + struct device_node *dn; + struct pci_dn *pdn; + + ret = ddw_read_ext(par_dn, DDW_EXT_RESET_DMA_WIN, &reset_dma_win); + if (ret) + goto out; + + ret = ddw_read_ext(par_dn, DDW_EXT_LIMITED_ADDR_MODE, &las_supported); + + /* Limited Address Space extension available on the platform but DDW in +* limited addressing mode not supported +*/ + if (!ret && !las_supported) + ret = -EPROTO; + + if (ret) { + dev_info(&dev->dev, "Limited Address Space for DDW not Supported, err: %d", ret); + goto out; + } + + dn = pci_device_to_OF_node(dev); + pdn = PCI_DN(dn); + buid = pdn->phb->buid; + cfg_addr = (pdn->busno << 16) | (pdn->devfn << 8); + + ret = rtas_call(reset_dma_win, 4, 1, NULL, cfg_addr, BUID_HI(buid), + BUID_LO(buid), 1); + if (ret) + dev_info(&dev->dev, +"ibm,reset-pe-dma-windows(%x) for Limited Addr Support: %x %x %x returned %d ", +reset_dma_win, cfg_addr, BUID_HI(buid), BUID_LO(buid), +ret); + +out: + return ret; +} + /* Return largest page shift based on "IO Page Sizes" output of ibm,query-pe-dma-window. */ static int iommu_get_page_shift(u32 query_page_size) { @@ -1398,7 +1447,7 @@ static struct property *ddw_property_create(const char *propname, u32 liobn, u64 * * returns true if can map all pages (direct mapping), false otherwise.. 
*/ -static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) +static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn, u64 dma_mask) { int len = 0, ret; int max_ram_len = order_base_2(ddw_memory_hotplug_max()); @@ -1417,6 +1466,9 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) bool pmem_present; struct pci_dn *pci = PCI_DN(pdn); struct property *default_win = NULL; + bool limited_addr_req = false, limited_addr_enabled = false; + int dev_max_ddw; + int ddw_sz; dn = of_find_node_by_type(NULL, "ibm,pmemory"); pmem_present = dn != NULL; @@ -1443,7 +1495,6 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) * the ibm,ddw-applicable property holds the tokens for: * ibm,query-pe-dma-window * ibm,create-pe-dma-window -* ibm,remove-pe-dma-window * for the given node in that order. * the property is actually in the parent, not the PE */ @@ -1463,6 +1514,20 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn) if (ret != 0) goto out_failed; + /* DMA Limited Addressing required? This is when the driver has +* requested to create DDW but supports mask which is less than 64-bits +*/ + limited_addr_req = (dma_mask != DMA_BIT_MASK(64)); + + /* place the PHB in Limited Addressing mode */ + if (limited_addr_req) { + if (limited_dma_window(dev, pdn)) + goto out_failed; + + /* PHB is in Limited address mode */ + limited_addr_enabled = true; + } + /* * If there is no window available, remove the default DMA w
Re: [PATCH v2 0/3] sysfs: constify bin_attribute argument of sysfs_bin_attr_simple_read()
On Tue, Dec 31, 2024 at 2:30 AM Thomas Weißschuh wrote: > > On 2024-12-30 16:50:41-0800, Alexei Starovoitov wrote: > > On Sat, Dec 28, 2024 at 12:43 AM Thomas Weißschuh > > wrote: > > > > > > Most users use this function through the BIN_ATTR_SIMPLE* macros, > > > they can handle the switch transparently. > > > > > > This series is meant to be merged through the driver core tree. > > > > hmm. why? > > Patch 1 changes the signature of sysfs_bin_attr_simple_read(). > Before patch 1 sysfs_bin_attr_simple_read() needs to be assigned to the > callback member .read, after patch 1 it's .read_new. > (Both callbacks work exactly the same, except for their signature, > .read_new is only a transition mechanism and will go away again) > > > I'd rather take patches 2 and 3 into bpf-next to avoid > > potential conflicts. > > Patch 1 looks orthogonal and independent. > > If you pick up 2 and 3 through bpf-next you would need to adapt these > assignments. As soon as both patch 1 and the modified 2 and 3 hit > Linus' tree, the build would break due to mismatches function pointers. > (Casting function pointers to avoid the mismatch will blow up with KCFI) I see. All these steps to constify is frankly a mess. You're wasting cpu and memory for this read vs read_new when const is not much more than syntactic sugar in C. You should have done one tree wide patch without doing this _new() hack. Anyway, rant over. Carry patches 2,3. Hopefully they won't conflict. But I don't want to see any constification patches in bpf land that come with such pointless runtime penalty.
Re: [PATCH] perf machine: Don't ignore _etext when not a text symbol
On Wed, Jan 08, 2025 at 10:15:24AM +0100, Christophe Leroy wrote: > Depending on how vmlinux.lds is written, _etext might be the very > first data symbol instead of the very last text symbol. > > Don't require it to be a text symbol, accept any symbol type. I'm adding a Link: Link: https://lore.kernel.org/all/752a31b0-4370-4f52-b7cc-45f0078c1...@csgroup.eu To give more context as where this has been observed, and also add a snippet of your explanation there, this: # grep -e _stext -e _etext -e _edata /proc/kallsyms c000 T _stext c08b8000 D _etext So there is no _edata and _etext is not text $ ppc-linux-objdump -x vmlinux | grep -e _stext -e _etext -e _edata c000 g .head.text _stext c08b8000 g .rodata _etext c1378000 g .sbss _edata Thanks, - Arnaldo > Fixes: ed9adb2035b5 ("perf machine: Read also the end of the kernel") > Signed-off-by: Christophe Leroy > --- > tools/perf/util/machine.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c > index 27d5345d2b30..9be2f4479f52 100644 > --- a/tools/perf/util/machine.c > +++ b/tools/perf/util/machine.c > @@ -1003,7 +1003,7 @@ static int machine__get_running_kernel_start(struct > machine *machine, > > err = kallsyms__get_symbol_start(filename, "_edata", &addr); > if (err) > - err = kallsyms__get_function_start(filename, "_etext", &addr); > + err = kallsyms__get_symbol_start(filename, "_etext", &addr); > if (!err) > *end = addr; > > -- > 2.47.0
Re: [PATCH v8 3/7] PCI: Make pcie_read_tlp_log() signature same
On Wed, Dec 18, 2024 at 04:37:43PM +0200, Ilpo Järvinen wrote: > pcie_read_tlp_log()'s prototype and function signature diverged due to > changes made while applying. > > Make the parameters of pcie_read_tlp_log() named identically. > > Signed-off-by: Ilpo Järvinen > Reviewed-by: Jonathan Cameron Reviewed-by: Yazen Ghannam Just wondering, could this be squashed into the previous patch? Or is the goal to be strict with one logical change per patch? Thanks, Yazen
Re: [PATCH v8 6/7] PCI: Add TLP Prefix reading into pcie_read_tlp_log()
On Wed, Dec 18, 2024 at 04:37:46PM +0200, Ilpo Järvinen wrote: > pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log > (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present. > > Generalize pcie_read_tlp_log() and struct pcie_tlp_log to handle also > TLP Prefix Log. The relevant registers are formatted identically in AER > and DPC Capability, but has these variations: > > a) The offsets of TLP Prefix Log registers vary. > b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs. > c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate >Prefix Log is not present. > > Therefore callers must pass the offset of the TLP Prefix Log register > and the entire length to pcie_read_tlp_log() to be able to read the > correct number of TLP Prefix DWORDs from the correct offset. > > Signed-off-by: Ilpo Järvinen > Reviewed-by: Jonathan Cameron > --- > drivers/pci/pci.h | 5 +++- > drivers/pci/pcie/aer.c| 5 +++- > drivers/pci/pcie/dpc.c| 13 + > drivers/pci/pcie/tlp.c| 51 +++ > include/linux/aer.h | 1 + > include/uapi/linux/pci_regs.h | 10 --- > 6 files changed, 67 insertions(+), 18 deletions(-) > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 55fcf3bac4f7..7797b2544d00 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -550,7 +550,9 @@ struct aer_err_info { > int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info > *info); > void aer_print_error(struct pci_dev *dev, struct aer_err_info *info); > > -int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log > *log); > +int pcie_read_tlp_log(struct pci_dev *dev, int where, int where2, > + unsigned int tlp_len, struct pcie_tlp_log *log); > +unsigned int aer_tlp_log_len(struct pci_dev *dev, u32 aercc); > #endif /* CONFIG_PCIEAER */ > > #ifdef CONFIG_PCIEPORTBUS > @@ -569,6 +571,7 @@ void pci_dpc_init(struct pci_dev *pdev); > void dpc_process_error(struct pci_dev *pdev); > pci_ers_result_t dpc_reset_link(struct pci_dev *pdev); > bool 
pci_dpc_recovered(struct pci_dev *pdev); > +unsigned int dpc_tlp_log_len(struct pci_dev *dev); > #else > static inline void pci_save_dpc_state(struct pci_dev *dev) { } > static inline void pci_restore_dpc_state(struct pci_dev *dev) { } > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 80c5ba8d8296..656dbf1ac45b 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1248,7 +1248,10 @@ int aer_get_device_error_info(struct pci_dev *dev, > struct aer_err_info *info) > > if (info->status & AER_LOG_TLP_MASKS) { > info->tlp_header_valid = 1; > - pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG, > &info->tlp); > + pcie_read_tlp_log(dev, aer + PCI_ERR_HEADER_LOG, > + aer + PCI_ERR_PREFIX_LOG, > + aer_tlp_log_len(dev, aercc), > + &info->tlp); > } > } > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index 2b6ef7efa3c1..7933b3cedb59 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -190,7 +190,7 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev) > static void dpc_process_rp_pio_error(struct pci_dev *pdev) > { > u16 cap = pdev->dpc_cap, dpc_status, first_error; > - u32 status, mask, sev, syserr, exc, log, prefix; > + u32 status, mask, sev, syserr, exc, log; > struct pcie_tlp_log tlp_log; > int i; > > @@ -217,20 +217,19 @@ static void dpc_process_rp_pio_error(struct pci_dev > *pdev) > > if (pdev->dpc_rp_log_size < 4) > goto clear_status; > - pcie_read_tlp_log(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG, &tlp_log); > + pcie_read_tlp_log(pdev, cap + PCI_EXP_DPC_RP_PIO_HEADER_LOG, > + cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG, > + dpc_tlp_log_len(pdev), &tlp_log); > pci_err(pdev, "TLP Header: %#010x %#010x %#010x %#010x\n", > tlp_log.dw[0], tlp_log.dw[1], tlp_log.dw[2], tlp_log.dw[3]); > + for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) > + pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, > tlp_log.prefix[i]); > > if (pdev->dpc_rp_log_size < 5) > goto clear_status; > pci_read_config_dword(pdev, 
cap + PCI_EXP_DPC_RP_PIO_IMPSPEC_LOG, &log); > pci_err(pdev, "RP PIO ImpSpec Log %#010x\n", log); > > - for (i = 0; i < pdev->dpc_rp_log_size - 5; i++) { > - pci_read_config_dword(pdev, > - cap + PCI_EXP_DPC_RP_PIO_TLPPREFIX_LOG + i * 4, > &prefix); > - pci_err(pdev, "TLP Prefix Header: dw%d, %#010x\n", i, prefix); > - } > clear_status: > pci_write_config_dword(pdev, cap + PCI_EXP_DPC_RP_PIO_STATUS, stat
Re: [PATCH v8 5/7] PCI: Store # of supported End-End TLP Prefixes
On Wed, Dec 18, 2024 at 04:37:45PM +0200, Ilpo Järvinen wrote: > eetlp_prefix_path in the struct pci_dev tells if End-End TLP Prefixes > are supported by the path or not, the value is only calculated if > CONFIG_PCI_PASID is set. > > The Max End-End TLP Prefixes field in the Device Capabilities Register > 2 also tells how many (1-4) End-End TLP Prefixes are supported (PCIe > r6.2 sec 7.5.3.15). The number of supported End-End Prefixes is useful > for reading correct number of DWORDs from TLP Prefix Log register in AER > capability (PCIe r6.2 sec 7.8.4.12). > > Replace eetlp_prefix_path with eetlp_prefix_max and determine the > number of supported End-End Prefixes regardless of CONFIG_PCI_PASID so > that an upcoming commit generalizing TLP Prefix Log register reading > does not have to read extra DWORDs for End-End Prefixes that never will > be there. > > The value stored into eetlp_prefix_max is directly derived from > device's Max End-End TLP Prefixes and does not consider limitations > imposed by bridges or the Root Port beyond supported/not supported > flags. This is intentional for two reasons: > > 1) PCIe r6.2 spec sections r6.1 2.2.10.4 & 6.2.4.4 indicate that TLP > is handled malformed only if the number of prefixes exceed the number > of Max End-End TLP Prefixes, which seems to be the case even if the > device could never receive that many prefixes due to smaller maximum > imposed by a bridge or the Root Port. If TLP parsing is later added, > this distinction is significant in interpreting what is logged by the > TLP Prefix Log registers and the value matching to the Malformed TLP > threshold is going to be more useful. > > 2) TLP Prefix handling happens autonomously on a low layer and the > value in eetlp_prefix_max is not programmed anywhere by the kernel > (i.e., there is no limiter OS can control to prevent sending more > than n TLP Prefixes). > > Signed-off-by: Ilpo Järvinen Reviewed-by: Yazen Ghannam Thanks, Yazen
Re: [PATCH] of: address: Unify resource bounds overflow checking
Hi, >> Thomas Weißschuh writes: >> > The members "start" and "end" of struct resource are of type >> > "resource_size_t" which can be 32bit wide. >> > Values read from OF however are always 64bit wide. >> > >> > Refactor the diff overflow checks into a helper function. >> > Also extend the checks to validate each calculation step. >> > >> > Signed-off-by: Thomas Weißschuh >> > --- >> > drivers/of/address.c | 45 ++--- >> > 1 file changed, 26 insertions(+), 19 deletions(-) >> > >> > diff --git a/drivers/of/address.c b/drivers/of/address.c >> > index 7e59283a4472..df854bb427ce 100644 >> > --- a/drivers/of/address.c >> > +++ b/drivers/of/address.c >> > @@ -198,6 +198,25 @@ static u64 of_bus_pci_map(__be32 *addr, const __be32 >> > *range, int na, int ns, >> > >> > #endif /* CONFIG_PCI */ >> > >> > +static int __of_address_resource_bounds(struct resource *r, u64 start, >> > u64 size) >> > +{ >> > + u64 end = start; >> > + >> > + if (overflows_type(start, r->start)) >> > + return -EOVERFLOW; >> > + if (size == 0) >> > + return -EOVERFLOW; >> > + if (check_add_overflow(end, size - 1, &end)) >> > + return -EOVERFLOW; >> > + if (overflows_type(end, r->end)) >> > + return -EOVERFLOW; >> >> This breaks PCI on powerpc qemu. Part of the PCI probe reads a resource >> that's zero sized, which used to succeed but now fails due to the size >> check above. >> >> The diff below fixes it for me. > > I fixed it up with your change. This commit is breaking Ethernet functionality on the TI AM57xx platform due to zero byte SRAM block size allocation during initialization. Prior to this patch, zero byte block sizes were handled properly. The issue is with the following line of code: if (size && check_add_overflow(end, size - 1, &end)) // check_add_overflow not called when size is zero We feel check_add_overflow() should be invoked even when the size is zero to ensure correct block size allocation. Thanks & Best Regards, Basharath
Re: [PATCH v5 05/25] fs/dax: Create a common implementation to break DAX layouts
Alistair Popple wrote: > Prior to freeing a block file systems supporting FS DAX must check > that the associated pages are both unmapped from user-space and not > undergoing DMA or other access from eg. get_user_pages(). This is > achieved by unmapping the file range and scanning the FS DAX > page-cache to see if any pages within the mapping have an elevated > refcount. > > This is done using two functions - dax_layout_busy_page_range() which > returns a page to wait for the refcount to become idle on. Rather than > open-code this introduce a common implementation to both unmap and > wait for the page to become idle. > > Signed-off-by: Alistair Popple > > --- > > Changes for v5: > > - Don't wait for idle pages on non-DAX mappings > > Changes for v4: > > - Fixed some build breakage due to missing symbol exports reported by >John Hubbard (thanks!). > --- > fs/dax.c| 33 + > fs/ext4/inode.c | 10 +- > fs/fuse/dax.c | 29 + > fs/xfs/xfs_inode.c | 23 +-- > fs/xfs/xfs_inode.h | 2 +- > include/linux/dax.h | 21 + > mm/madvise.c| 8 > 7 files changed, 70 insertions(+), 56 deletions(-) > > diff --git a/fs/dax.c b/fs/dax.c > index d010c10..9c3bd07 100644 > --- a/fs/dax.c > +++ b/fs/dax.c > @@ -845,6 +845,39 @@ int dax_delete_mapping_entry(struct address_space > *mapping, pgoff_t index) > return ret; > } > > +static int wait_page_idle(struct page *page, > + void (cb)(struct inode *), > + struct inode *inode) > +{ > + return ___wait_var_event(page, page_ref_count(page) == 1, > + TASK_INTERRUPTIBLE, 0, 0, cb(inode)); > +} > + > +/* > + * Unmaps the inode and waits for any DMA to complete prior to deleting the > + * DAX mapping entries for the range. 
> + */ > +int dax_break_mapping(struct inode *inode, loff_t start, loff_t end, > + void (cb)(struct inode *)) > +{ > + struct page *page; > + int error; > + > + if (!dax_mapping(inode->i_mapping)) > + return 0; > + > + do { > + page = dax_layout_busy_page_range(inode->i_mapping, start, end); > + if (!page) > + break; > + > + error = wait_page_idle(page, cb, inode); This implementations removes logic around @retry found in the XFS and FUSE implementations, I think that is a mistake, and EXT4 has apparently been broken in this regard. wait_page_idle() returns after @page is idle, but that does not mean @inode is DMA idle. After one found page from dax_layout_busy_page_range() is waited upon a new call to dax_break_mapping() needs to made to check if another DMA started, or if there were originally more pages active. > + } while (error == 0); > + > + return error; Surprised that the compiler does not warn about an uninitialized variable here?
Re: [PATCH] powerpc/32: Stop printing Kernel virtual memory layout
On Wed, Jan 08, 2025 at 07:40:38PM +0100, Christophe Leroy wrote: > Printing of Kernel virtual memory layout was added for debug purpose > by commit f637a49e507c ("powerpc: Minor cleanups of kernel virt > address space definitions") > > For security reasons, don't display the kernel's virtual memory layout. > > Other architectures have removed it through following commits. > > 071929dbdd86 ("arm64: Stop printing the virtual memory layout") > 1c31d4e96b8c ("ARM: 8820/1: mm: Stop printing the virtual memory layout") > 3182f798 ("m68k/mm: Stop printing the virtual memory layout") > fd8d0ca25631 ("parisc: Hide virtual kernel memory layout") > 681ff0181bbf ("x86/mm/init/32: Stop printing the virtual memory layout") > > Commit 681ff0181bbf ("x86/mm/init/32: Stop printing the virtual memory > layout") thought x86 was the last one, but in reality powerpc/32 still > had it. > > So remove it now on powerpc/32 as well. > > Cc: Arvind Sankar > Cc: Kees Cook > Signed-off-by: Christophe Leroy Reviewed-by: Kees Cook -- Kees Cook
Re: [PATCH v5 00/25] fs/dax: Fix ZONE_DEVICE page reference counts
Andrew Morton wrote: > On Tue, 7 Jan 2025 14:42:16 +1100 Alistair Popple wrote: > > > Device and FS DAX pages have always maintained their own page > > reference counts without following the normal rules for page reference > > counting. In particular pages are considered free when the refcount > > hits one rather than zero and refcounts are not added when mapping the > > page. > > > > Tracking this requires special PTE bits (PTE_DEVMAP) and a secondary > > mechanism for allowing GUP to hold references on the page (see > > get_dev_pagemap). However there doesn't seem to be any reason why FS > > DAX pages need their own reference counting scheme. > > > > By treating the refcounts on these pages the same way as normal pages > > we can remove a lot of special checks. In particular pXd_trans_huge() > > becomes the same as pXd_leaf(), although I haven't made that change > > here. It also frees up a valuable SW define PTE bit on architectures > > that have devmap PTE bits defined. > > > > It also almost certainly allows further clean-up of the devmap managed > > functions, but I have left that as a future improvment. It also > > enables support for compound ZONE_DEVICE pages which is one of my > > primary motivators for doing this work. > > > > https://lkml.kernel.org/r/wysuus23bqmjtwkfu3zutqtmkse3ki3erf45x32yezlrl24qto@xlqt7qducyld > made me expect merge/build/runtime issues, however this series merges > and builds OK on mm-unstable. Did something change? What's the story > here? > > Oh well, it built so I'll ship it! So my plan is to review this latest set on top of -next as is and then rebase (or ask Alistair to rebase) on a mainline tag so I can identify the merge conflicts with -mm and communicate those to Linus. I will double check that you have pulled these back out of mm-unstable before doing that to avoid a double-commit conflicts in -next, but for now exposure in mm-unstable is good to flush out issues.
linux-next: manual merge of the powerpc tree with the mm tree
Hi all, Today's linux-next merge of the powerpc tree got a conflict in: arch/powerpc/Kconfig between commit: c0c3319917db ("mm: remove devmap related functions and page table bits") from the mm-unstable branch of the mm tree and commit: 00199ed6f2ca ("powerpc: Add preempt lazy support") from the powerpc tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc arch/powerpc/Kconfig index 85409ec1fd83,db9f7b2d07bf.. --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@@ -145,6 -145,8 +145,7 @@@ config PP select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PHYS_TO_DMA select ARCH_HAS_PMEM_API + select ARCH_HAS_PREEMPT_LAZY - select ARCH_HAS_PTE_DEVMAP if PPC_BOOK3S_64 select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64 select ARCH_HAS_SET_MEMORY
Re: [PATCH RESEND v1 3/5] powerpc/kdump: preserve user-specified memory limit
Hello Mahesh, On 08/01/25 22:36, Mahesh J Salgaonkar wrote: On 2025-01-08 15:44:56 Wed, Sourabh Jain wrote: Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless, let's not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Agreed. Reviewed-by: Mahesh Salgaonkar Thanks for the review. - Sourabh Jain
Re: [PATCH v3 01/28] module: Extend the preempt disabled section in dereference_symbol_descriptor().
On 2025-01-08 10:55:04 [+0100], Helge Deller wrote: > Nice catch. > > Acked-by: Helge Deller > > This patch really should be backported. > Can you add a Cc: stable tag? It should be picked due to the fixes tag. I add it if I am going to repost it. > Helge Sebastian
[PATCH] powerpc/vmlinux: Remove etext, edata and end
etext is not used anymore since commit 843a1ffaf6f2 ("powerpc/mm: use core_kernel_text() helper") edata and end have not been used since the merge of arch/ppc/ and arch/ppc64/ Remove the three and remove macro PROVIDE32. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/vmlinux.lds.S | 9 - arch/powerpc/kvm/book3s_32_mmu_host.c | 2 -- 2 files changed, 11 deletions(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index b4c9decc7a75..de6ee7d35cff 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -1,10 +1,4 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifdef CONFIG_PPC64 -#define PROVIDE32(x) PROVIDE(__unused__##x) -#else -#define PROVIDE32(x) PROVIDE(x) -#endif - #define BSS_FIRST_SECTIONS *(.bss.prominit) #define EMITS_PT_NOTE #define RO_EXCEPTION_TABLE_ALIGN 0 @@ -127,7 +121,6 @@ SECTIONS . = ALIGN(PAGE_SIZE); _etext = .; - PROVIDE32 (etext = .); /* Read-only data */ RO_DATA(PAGE_SIZE) @@ -394,7 +387,6 @@ SECTIONS . = ALIGN(PAGE_SIZE); _edata = .; - PROVIDE32 (edata = .); /* * And finally the bss @@ -404,7 +396,6 @@ SECTIONS . = ALIGN(PAGE_SIZE); _end = . ; - PROVIDE32 (end = .); DWARF_DEBUG ELF_DETAILS diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 5b7212edbb13..c7e4b62642ea 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -125,8 +125,6 @@ static u32 *kvmppc_mmu_get_pteg(struct kvm_vcpu *vcpu, u32 vsid, u32 eaddr, return (u32*)pteg; } -extern char etext[]; - int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte, bool iswrite) { -- 2.47.0
[PATCH] powerpc/44x: Declare primary_uic static in uic.c
primary_uic is not used outside uic.c, declare it static. Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202411101746.ld8ydvzy-...@intel.com/ Fixes: e58923ed1437 ("[POWERPC] Add arch/powerpc driver for UIC, PPC4xx interrupt controller") Signed-off-by: Christophe Leroy --- arch/powerpc/platforms/44x/uic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/44x/uic.c b/arch/powerpc/platforms/44x/uic.c index e3e148b9dd18..8b03ae4cb3f6 100644 --- a/arch/powerpc/platforms/44x/uic.c +++ b/arch/powerpc/platforms/44x/uic.c @@ -37,7 +37,7 @@ #define UIC_VR 0x7 #define UIC_VCR0x8 -struct uic *primary_uic; +static struct uic *primary_uic; struct uic { int index; -- 2.47.0
Re: [PATCH v5 03/17] asm-generic: pgalloc: Provide generic p4d_{alloc_one,free}
On Wed, Jan 8, 2025, at 07:57, Qi Zheng wrote: > From: Kevin Brodsky > > Four architectures currently implement 5-level pgtables: arm64, > riscv, x86 and s390. The first three have essentially the same > implementation for p4d_alloc_one() and p4d_free(), so we've got an > opportunity to reduce duplication like at the lower levels. > > Provide a generic version of p4d_alloc_one() and p4d_free(), and > make use of it on those architectures. > > Their implementation is the same as at PUD level, except that > p4d_free() performs a runtime check by calling mm_p4d_folded(). > 5-level pgtables depend on a runtime-detected hardware feature on > all supported architectures, so we might as well include this check > in the generic implementation. No runtime check is required in > p4d_alloc_one() as the top-level p4d_alloc() already does the > required check. > > Signed-off-by: Kevin Brodsky > Acked-by: Dave Hansen > Signed-off-by: Qi Zheng > --- > arch/arm64/include/asm/pgalloc.h | 17 > arch/riscv/include/asm/pgalloc.h | 23 > arch/x86/include/asm/pgalloc.h | 18 - > include/asm-generic/pgalloc.h| 45 > 4 files changed, 45 insertions(+), 58 deletions(-) Acked-by: Arnd Bergmann # asm-generic
[PATCH] kallsyms: Always include _edata
Since commit 69a87a32f5cd ("perf machine: Include data symbols in the kernel map"), perf needs _edata symbol. Make sure it is always included in /proc/kallsyms and not only when CONFIG_KALLSYMS_ALL is selected. Signed-off-by: Christophe Leroy Fixes: 69a87a32f5cd ("perf machine: Include data symbols in the kernel map") --- scripts/kallsyms.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index 03852da3d249..391ab7ebce7f 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -215,6 +215,8 @@ static int symbol_valid(const struct sym_entry *s) if (string_starts_with(name, "__start_") || string_starts_with(name, "__stop_")) return 1; + if (!strcmp(name, "_edata")) + return 1; if (symbol_in_range(s, text_ranges, ARRAY_SIZE(text_ranges)) == 0) -- 2.47.0
[PATCH RESEND v1 3/5] powerpc/kdump: preserve user-specified memory limit
Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless, let's not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/powerpc/kexec/core.c | 8 1 file changed, 8 deletions(-) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index b8333a49ea5d..4945b33322ae 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -150,14 +150,6 @@ void __init reserve_crashkernel(void) return; } - /* Crash kernel trumps memory limit */ - if (memory_limit && memory_limit <= crashk_res.end) { - memory_limit = crashk_res.end + 1; - total_mem_sz = memory_limit; - printk("Adjusted memory limit for crashkernel, now 0x%llx\n", - memory_limit); - } - printk(KERN_INFO "Reserving %ldMB of memory at %ldMB " "for crashkernel (System RAM: %ldMB)\n", (unsigned long)(crash_size >> 20), -- 2.47.1
[PATCH RESEND v1 0/5] powerpc/crash: use generic crashkernel reservation
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Additionally move powerpc to use generic APIs to locate memory hole for kexec segments while loading kdump kernel. Patch series summary: = Patch 1-2: generic changes Patch 3-4: powerpc changes Patch 5: generic + powerpc changes Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Cc: sourabhj...@linux.ibm.com Changelog: v1 Resend: - Rebased on top of 6.13-rc6 Sourabh Jain (5): crash: remove an unused argument from reserve_crashkernel_generic() crash: let arch decide crash memory export to iomem_resource powerpc/kdump: preserve user-specified memory limit powerpc/crash: use generic crashkernel reservation crash: option to let arch decide mem range is usable arch/arm64/mm/init.c | 6 +- arch/loongarch/kernel/setup.c| 5 +- arch/powerpc/Kconfig | 3 + arch/powerpc/include/asm/crash_reserve.h | 18 ++ arch/powerpc/include/asm/kexec.h | 10 +- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kexec/core.c| 96 - arch/powerpc/kexec/file_load_64.c| 259 +-- arch/riscv/mm/init.c | 6 +- arch/x86/kernel/setup.c | 6 +- include/linux/crash_reserve.h| 22 +- include/linux/kexec.h| 9 + kernel/crash_reserve.c | 12 +- kernel/kexec_file.c | 12 ++ 14 files changed, 127 insertions(+), 339 deletions(-) create mode 100644 arch/powerpc/include/asm/crash_reserve.h -- 2.47.1
[PATCH RESEND v1 2/5] crash: let arch decide crash memory export to iomem_resource
insert_crashkernel_resources() adds crash memory to iomem_resource if generic crashkernel reservation is enabled on an architecture. On PowerPC, system RAM is added to iomem_resource. See commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"). Enabling generic crashkernel reservation on PowerPC leads to a conflict when system RAM is added to iomem_resource because a part of the system RAM, the crashkernel memory, has already been added to iomem_resource. The next commit in the series "powerpc/crash: use generic crashkernel reservation" enables generic crashkernel reservation on PowerPC. If the crashkernel is added to iomem_resource, the kernel fails to add system RAM to /proc/iomem and prints the following traces: CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2+ snip... NIP [c2016b3c] add_system_ram_resources+0xf0/0x15c LR [c2016b34] add_system_ram_resources+0xe8/0x15c Call Trace: [c484bbc0] [c2016b34] add_system_ram_resources+0xe8/0x15c [c484bc20] [c0010a4c] do_one_initcall+0x7c/0x39c [c484bd00] [c2005418] do_initcalls+0x144/0x18c [c484bd90] [c2005714] kernel_init_freeable+0x21c/0x290 [c484bdf0] [c00110f4] kernel_init+0x2c/0x1b8 [c484be50] [c000dd3c] ret_from_kernel_user_thread+0x14/0x1c To avoid this, an architecture hook is added in insert_crashkernel_resources(), allowing the architecture to decide whether crashkernel memory should be added to iomem_resource. 
Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- include/linux/crash_reserve.h | 11 +++ kernel/crash_reserve.c| 3 +++ 2 files changed, 14 insertions(+) diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h index 1fe7e7d1b214..f1205d044dae 100644 --- a/include/linux/crash_reserve.h +++ b/include/linux/crash_reserve.h @@ -18,6 +18,17 @@ int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, unsigned long long *crash_size, unsigned long long *crash_base, unsigned long long *low_size, bool *high); +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + +#ifndef arch_add_crash_res_to_iomem +static inline bool arch_add_crash_res_to_iomem(void) +{ + return true; +} +#endif + +#endif + #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION #ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE #define DEFAULT_CRASH_KERNEL_LOW_SIZE (128UL << 20) diff --git a/kernel/crash_reserve.c b/kernel/crash_reserve.c index aff7c0fdbefa..190104f32fe1 100644 --- a/kernel/crash_reserve.c +++ b/kernel/crash_reserve.c @@ -460,6 +460,9 @@ void __init reserve_crashkernel_generic(unsigned long long crash_size, #ifndef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY static __init int insert_crashkernel_resources(void) { + if (!arch_add_crash_res_to_iomem()) + return 0; + if (crashk_res.start < crashk_res.end) insert_resource(&iomem_resource, &crashk_res); -- 2.47.1
[PATCH RESEND v1 4/5] powerpc/crash: use generic crashkernel reservation
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Along with moving to the generic crashkernel reservation, the code related to finding the base address for the crashkernel has been separated into its own function name get_crash_base() for better readability and maintainability. To prevent crashkernel memory from being added to iomem_resource, the function arch_add_crash_res_to_iomem() has been introduced. For further details on why this should not be done for the PowerPC architecture, please refer to the previous commit titled "crash: let arch decide crash memory export to iomem_resource. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/powerpc/Kconfig | 3 + arch/powerpc/include/asm/crash_reserve.h | 18 + arch/powerpc/include/asm/kexec.h | 4 +- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kexec/core.c| 90 ++-- 5 files changed, 63 insertions(+), 54 deletions(-) create mode 100644 arch/powerpc/include/asm/crash_reserve.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index a0ce777f9706..8df8871215f8 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -717,6 +717,9 @@ config ARCH_SUPPORTS_CRASH_HOTPLUG def_bool y depends on PPC64 +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + def_bool CRASH_RESERVE + config FA_DUMP bool "Firmware-assisted dump" depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV) diff --git a/arch/powerpc/include/asm/crash_reserve.h b/arch/powerpc/include/asm/crash_reserve.h new file mode 100644 
index ..f5e60721de41 --- /dev/null +++ b/arch/powerpc/include/asm/crash_reserve.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_CRASH_RESERVE_H +#define _ASM_POWERPC_CRASH_RESERVE_H + +/* crash kernel regions are page-size aligned */ +#define CRASH_ALIGN PAGE_SIZE + +#ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + +static inline bool arch_add_crash_res_to_iomem(void) +{ + return false; +} +#define arch_add_crash_res_to_iomem arch_add_crash_res_to_iomem +#endif + +#endif /* _ASM_POWERPC_CRASH_RESERVE_H */ + diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 270ee93a0f7d..64741558071f 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -113,9 +113,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); -extern void reserve_crashkernel(void); +extern void arch_reserve_crashkernel(void); #else -static inline void reserve_crashkernel(void) {} +static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } #endif diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index e0059842a1c6..9ed9dde7d231 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -860,7 +860,7 @@ void __init early_init_devtree(void *params) */ if (fadump_reserve_mem() == 0) #endif - reserve_crashkernel(); + arch_reserve_crashkernel(); early_reserve_mem(); if (memory_limit > memblock_phys_mem_size()) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index 4945b33322ae..b21cfa814492 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -80,38 +80,20 @@ void machine_kexec(struct kimage *image) } #ifdef CONFIG_CRASH_RESERVE -void __init reserve_crashkernel(void) -{ - unsigned long long crash_size, 
crash_base, total_mem_sz; - int ret; - total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); - /* use common parsing */ - ret = parse_crashkernel(boot_command_line, total_mem_sz, - &crash_size, &crash_base, NULL, NULL); - if (ret == 0 && crash_size > 0) { - crashk_res.start = crash_base; - crashk_res.end = crash_base + crash_size - 1; - } - - if (crashk_res.end == crashk_res.start) { - crashk_res.start = crashk_res.end = 0; - return; - } - - /* We might have got these values via the command line or the -* device tree, either way sanitise them now. */ - - crash_size = resource_size
[PATCH RESEND v1 1/5] crash: remove an unused argument from reserve_crashkernel_generic()
cmdline argument is not used in reserve_crashkernel_generic() so remove it. Correspondingly, all the callers have been updated as well. No functional change intended. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/arm64/mm/init.c | 6 ++ arch/loongarch/kernel/setup.c | 5 ++--- arch/riscv/mm/init.c | 6 ++ arch/x86/kernel/setup.c | 6 ++ include/linux/crash_reserve.h | 11 +-- kernel/crash_reserve.c| 9 - 6 files changed, 17 insertions(+), 26 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ccdef53872a0..2e496209af80 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -98,21 +98,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit) diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c index 56934fe58170..ece9c4266c3f 100644 --- a/arch/loongarch/kernel/setup.c +++ b/arch/loongarch/kernel/setup.c @@ -259,18 +259,17 @@ static void __init arch_reserve_crashkernel(void) int ret; unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, 
memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static void __init fdt_setup(void) diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index fc53ce748c80..750991df9e52 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1377,21 +1377,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } void __init paging_init(void) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f1fea506e20f..15b6823556c8 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -469,14 +469,13 @@ static void __init memblock_x86_reserve_range_setup_data(void) static void __init arch_reserve_crashkernel(void) { unsigned long long crash_base, crash_size, low_size = 0; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) @@ -487,8 +486,7 @@ static void __init arch_reserve_crashkernel(void) return; } - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static struct resource 
standard_io_resources[] = { diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h index 5a9df944fb80..1fe7e7d1b214 100644 -
[PATCH RESEND v1 5/5] crash: option to let arch decide mem range is usable
On PowerPC, the memory reserved for the crashkernel can contain components like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec segments into crashkernel memory. Due to these special components, PowerPC has its own set of functions to locate holes in the crashkernel memory for loading kexec segments for kdump. However, for loading kexec segments in the kexec case, PowerPC uses generic functions to locate holes. So, let's use generic functions to locate memory holes for kdump on PowerPC by adding an arch hook to handle such special regions while loading kexec segments, and remove the PowerPC functions to locate holes. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/powerpc/include/asm/kexec.h | 6 +- arch/powerpc/kexec/file_load_64.c | 259 ++ include/linux/kexec.h | 9 ++ kernel/kexec_file.c | 12 ++ 4 files changed, 34 insertions(+), 252 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 64741558071f..5e4680f9ff35 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -95,8 +95,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long int arch_kimage_file_post_load_cleanup(struct kimage *image); #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); -#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end); +#define arch_check_excluded_range arch_check_excluded_range + int load_crashdump_segments_ppc64(struct kimage *image, struct kexec_buf *kbuf); diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index dc65c1391157..e7ef8b2a2554 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ 
b/arch/powerpc/kexec/file_load_64.c @@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * __locate_mem_hole_top_down - Looks top down for a large enough memory hole - * in the memory regions between buf_min & buf_max - * for the buffer. If found, sets kbuf->mem. - * @kbuf: Buffer contents and memory parameters. - * @buf_min:Minimum address for the buffer. - * @buf_max:Maximum address for the buffer. - * - * Returns 0 on success, negative errno on error. - */ -static int __locate_mem_hole_top_down(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max) -{ - int ret = -EADDRNOTAVAIL; - phys_addr_t start, end; - u64 i; - - for_each_mem_range_rev(i, &start, &end) { - /* -* memblock uses [start, end) convention while it is -* [start, end] here. Fix the off-by-one to have the -* same convention. -*/ - end -= 1; - - if (start > buf_max) - continue; - - /* Memory hole not found */ - if (end < buf_min) - break; - - /* Adjust memory region based on the given range */ - if (start < buf_min) - start = buf_min; - if (end > buf_max) - end = buf_max; - - start = ALIGN(start, kbuf->buf_align); - if (start < end && (end - start + 1) >= kbuf->memsz) { - /* Suitable memory range found. Set kbuf->mem */ - kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1, - kbuf->buf_align); - ret = 0; - break; - } - } - - return ret; -} - -/** - * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a - * suitable buffer with top down approach. - * @kbuf: Buffer contents and memory parameters. - * @buf_min:Minimum address for the buffer. - * @buf_max:Maximum address for the buffer. - * @emem: Exclude memory ranges. - * - * Returns 0 on success, negative errno on error. - */ -static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max, - const struct crash_mem *emem) +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end) { - int i, ret = 0, err = -EAD
Re: [PATCH v5 14/17] mm: pgtable: introduce generic __tlb_remove_table()
On Wed, Jan 8, 2025, at 07:57, Qi Zheng wrote: > Several architectures (arm, arm64, riscv and x86) define exactly the > same __tlb_remove_table(), just introduce generic __tlb_remove_table() to > eliminate these duplications. > > The s390 __tlb_remove_table() is nearly the same, so also make s390 > __tlb_remove_table() version generic. > > Signed-off-by: Qi Zheng > Reviewed-by: Kevin Brodsky > Acked-by: Andreas Larsson # sparc > Acked-by: Alexander Gordeev # s390 Acked-by: Arnd Bergmann # asm-generic
Re: [PATCH RESEND v1 5/5] crash: option to let arch decide mem range is usable
On 01/08/25 at 03:44pm, Sourabh Jain wrote: ...snip... > diff --git a/include/linux/kexec.h b/include/linux/kexec.h > index f0e9f8eda7a3..407f8b0346aa 100644 > --- a/include/linux/kexec.h > +++ b/include/linux/kexec.h > @@ -205,6 +205,15 @@ static inline int > arch_kimage_file_post_load_cleanup(struct kimage *image) > } > #endif > > +#ifndef arch_check_excluded_range > +static inline int arch_check_excluded_range(struct kimage *image, > + unsigned long start, > + unsigned long end) > +{ > + return 0; > +} > +#endif > + > #ifdef CONFIG_KEXEC_SIG > #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION > int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len); > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c > index 3eedb8c226ad..52e1480dbfa1 100644 > --- a/kernel/kexec_file.c > +++ b/kernel/kexec_file.c > @@ -464,6 +464,12 @@ static int locate_mem_hole_top_down(unsigned long start, > unsigned long end, > continue; > } > > + /* Make sure this does not conflict exclude range */ ^ Make sure this doesn't conflict with excluded range? > + if (arch_check_excluded_range(image, temp_start, temp_end)) { > + temp_start = temp_start - PAGE_SIZE; > + continue; > + } > + > /* We found a suitable memory range */ > break; > } while (1); > @@ -498,6 +504,12 @@ static int locate_mem_hole_bottom_up(unsigned long > start, unsigned long end, > continue; > } > > + /* Make sure this does not conflict exclude range */ ^ Ditto. > + if (arch_check_excluded_range(image, temp_start, temp_end)) { > + temp_start = temp_start + PAGE_SIZE; > + continue; > + } > + > /* We found a suitable memory range */ > break; > } while (1); > -- > 2.47.1 >
[PATCH] powerpc/64s: Rewrite __real_pte() as a static inline
Rewrite __real_pte() as a static inline in order to avoid the following warning/error when building with 4k page size: CC arch/powerpc/mm/book3s64/hash_tlb.o arch/powerpc/mm/book3s64/hash_tlb.c: In function 'hpte_need_flush': arch/powerpc/mm/book3s64/hash_tlb.c:49:16: error: variable 'offset' set but not used [-Werror=unused-but-set-variable] 49 | int i, offset; |^~ cc1: all warnings being treated as errors Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202501081741.ayfwybsq-...@intel.com/ Fixes: ff31e105464d ("powerpc/mm/hash64: Store the slot information at the right offset for hugetlb") Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/book3s/64/hash-4k.h | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index c3efacab4b94..a7a68ba9c71b 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -77,7 +77,10 @@ /* * With 4K page size the real_pte machinery is all nops. */ -#define __real_pte(e, p, o)((real_pte_t){(e)}) +static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep, int offset) +{ + return (real_pte_t){pte}; +} #define __rpte_to_pte(r) ((r).pte) #define __rpte_to_hidx(r,index)(pte_val(__rpte_to_pte(r)) >> H_PAGE_F_GIX_SHIFT) -- 2.47.0
[PATCH v2] perf: Fix display of kernel symbols
Since commit 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses"), perf no longer displays kernel symbols on powerpc, although it still detects them as kernel addresses. # Overhead Command Shared Object Symbol # .. . .. # 80.49% Coeur main [unknown] [k] 0xc005f0f8 3.91% Coeur main gau[.] engine_loop.constprop.0.isra.0 1.72% Coeur main [unknown] [k] 0xc005f11c 1.09% Coeur main [unknown] [k] 0xc01f82c8 0.44% Coeur main libc.so.6 [.] epoll_wait 0.38% Coeur main [unknown] [k] 0xc0011718 0.36% Coeur main [unknown] [k] 0xc01f45c0 This is because maps__find_next_entry() now returns the current entry instead of the next one, leading to the kernel map end address getting mis-configured with its own start address instead of the start address of the following map. Fix it by really taking the next entry, and also make sure that entry follows the current one by making sure the entries are sorted. Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses") Signed-off-by: Christophe Leroy Reviewed-by: Arnaldo Carvalho de Melo --- v2: Make sure the entries are sorted, if not sort them. --- tools/perf/util/maps.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c index 432399cbe5dd..09c9cc326c08 100644 --- a/tools/perf/util/maps.c +++ b/tools/perf/util/maps.c @@ -1136,8 +1136,13 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map) struct map *result = NULL; down_read(maps__lock(maps)); + while (!maps__maps_by_address_sorted(maps)) { + up_read(maps__lock(maps)); + maps__sort_by_address(maps); + down_read(maps__lock(maps)); + } i = maps__by_address_index(maps, map); - if (i < maps__nr_maps(maps)) + if (++i < maps__nr_maps(maps)) result = map__get(maps__maps_by_address(maps)[i]); up_read(maps__lock(maps)); -- 2.47.0
Re: [PATCH v3 01/28] module: Extend the preempt disabled section in dereference_symbol_descriptor().
On 1/8/25 10:04, Sebastian Andrzej Siewior wrote: dereference_symbol_descriptor() needs to obtain the module pointer belonging to the pointer in order to resolve it. The returned mod pointer is obtained under RCU-sched/preempt_disable() guarantees and needs to be used within this section to ensure that the module is not removed in the meantime. Extend the preempt_disable() section to also cover dereference_module_function_descriptor(). Fixes: 04b8eb7a4ccd9 ("symbol lookup: introduce dereference_symbol_descriptor()") Cc: James E.J. Bottomley Cc: Christophe Leroy Cc: Helge Deller Cc: Madhavan Srinivasan Cc: Michael Ellerman Cc: Naveen N Rao Cc: Nicholas Piggin Cc: Sergey Senozhatsky Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Sergey Senozhatsky Acked-by: Peter Zijlstra (Intel) Signed-off-by: Sebastian Andrzej Siewior Nice catch. Acked-by: Helge Deller This patch really should be backported. Can you add a Cc: stable tag? Helge --- include/linux/kallsyms.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index c3f075e8f60cb..1c6a6c1704d8d 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -57,10 +57,10 @@ static inline void *dereference_symbol_descriptor(void *ptr) preempt_disable(); mod = __module_address((unsigned long)ptr); - preempt_enable(); if (mod) ptr = dereference_module_function_descriptor(mod, ptr); + preempt_enable(); #endif return ptr; }
Re: [PATCH RESEND v1 2/5] crash: let arch decide crash memory export to iomem_resource
Hi, On 01/08/25 at 03:44pm, Sourabh Jain wrote: > insert_crashkernel_resources() adds crash memory to iomem_resource if > generic crashkernel reservation is enabled on an architecture. > > On PowerPC, system RAM is added to iomem_resource. See commit > c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"). > > Enabling generic crashkernel reservation on PowerPC leads to a conflict > when system RAM is added to iomem_resource because a part of the system > RAM, the crashkernel memory, has already been added to iomem_resource. > > The next commit in the series "powerpc/crash: use generic crashkernel > reservation" enables generic crashkernel reservation on PowerPC. If the > crashkernel is added to iomem_resource, the kernel fails to add > system RAM to /proc/iomem and prints the following traces: > > CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc2+ > snip... > NIP [c2016b3c] add_system_ram_resources+0xf0/0x15c > LR [c2016b34] add_system_ram_resources+0xe8/0x15c > Call Trace: > [c484bbc0] [c2016b34] add_system_ram_resources+0xe8/0x15c > [c484bc20] [c0010a4c] do_one_initcall+0x7c/0x39c > [c484bd00] [c2005418] do_initcalls+0x144/0x18c > [c484bd90] [c2005714] kernel_init_freeable+0x21c/0x290 > [c484bdf0] [c00110f4] kernel_init+0x2c/0x1b8 > [c484be50] [c000dd3c] ret_from_kernel_user_thread+0x14/0x1c > > To avoid this, an architecture hook is added in > insert_crashkernel_resources(), allowing the architecture to decide > whether crashkernel memory should be added to iomem_resource. Have you tried defining HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY in ppc to add crashkernel region to iomem early? Now there are two branches in the existing code, adding a hook will make three ways.
Re: [PATCH v5 03/25] fs/dax: Don't skip locked entries when scanning entries
Alistair Popple wrote: > Several functions internal to FS DAX use the following pattern when > trying to obtain an unlocked entry: > > xas_for_each(&xas, entry, end_idx) { > if (dax_is_locked(entry)) > entry = get_unlocked_entry(&xas, 0); > > This is problematic because get_unlocked_entry() will get the next > present entry in the range, and the next entry may not be > locked. Therefore any processing of the original locked entry will be > skipped. This can cause dax_layout_busy_page_range() to miss DMA-busy > pages in the range, leading file systems to free blocks whilst DMA > operations are ongoing which can lead to file system corruption. > > Instead callers from within a xas_for_each() loop should be waiting > for the current entry to be unlocked without advancing the XArray > state so a new function is introduced to wait. Oh wow, good eye! Did this trip up an xfstest, or did you see this purely by inspection? > Also while we are here rename get_unlocked_entry() to > get_next_unlocked_entry() to make it clear that it may advance the > iterator state. Outside of the above clarification of how found / end user effect you can add: Reviewed-by: Dan Williams
Re: [PATCH] of: address: Unify resource bounds overflow checking
On Wed, Jan 8, 2025 at 8:04 AM Basharath Hussain Khaja wrote: > > Hi, > > >> Thomas Weißschuh writes: > >> > The members "start" and "end" of struct resource are of type > >> > "resource_size_t" which can be 32bit wide. > >> > Values read from OF however are always 64bit wide. > >> > > >> > Refactor the diff overflow checks into a helper function. > >> > Also extend the checks to validate each calculation step. > >> > > >> > Signed-off-by: Thomas Weißschuh > >> > --- > >> > drivers/of/address.c | 45 ++--- > >> > 1 file changed, 26 insertions(+), 19 deletions(-) > >> > > >> > diff --git a/drivers/of/address.c b/drivers/of/address.c > >> > index 7e59283a4472..df854bb427ce 100644 > >> > --- a/drivers/of/address.c > >> > +++ b/drivers/of/address.c > >> > @@ -198,6 +198,25 @@ static u64 of_bus_pci_map(__be32 *addr, const > >> > __be32 *range, int na, int ns, > >> > > >> > #endif /* CONFIG_PCI */ > >> > > >> > +static int __of_address_resource_bounds(struct resource *r, u64 start, > >> > u64 size) > >> > +{ > >> > + u64 end = start; > >> > + > >> > + if (overflows_type(start, r->start)) > >> > + return -EOVERFLOW; > >> > + if (size == 0) > >> > + return -EOVERFLOW; > >> > + if (check_add_overflow(end, size - 1, &end)) > >> > + return -EOVERFLOW; > >> > + if (overflows_type(end, r->end)) > >> > + return -EOVERFLOW; > >> > >> This breaks PCI on powerpc qemu. Part of the PCI probe reads a resource > >> that's zero sized, which used to succeed but now fails due to the size > >> check above. > >> > >> The diff below fixes it for me. > > > > I fixed it up with your change. > > > This commit is breaking Ethernet functionality on the TI AM57xx platform due > to zero byte SRAM block size allocation during initialization. Prior to this > patch, zero byte block sizes were handled properly. What driver and where exactly? Rob
Re: [PATCH v8 3/7] PCI: Make pcie_read_tlp_log() signature same
On Wed, Jan 08, 2025 at 03:40:11PM -0500, Yazen Ghannam wrote: > On Wed, Dec 18, 2024 at 04:37:43PM +0200, Ilpo Järvinen wrote: > > pcie_read_tlp_log()'s prototype and function signature diverged due to > > changes made while applying. > > > > Make the parameters of pcie_read_tlp_log() named identically. > > > > Signed-off-by: Ilpo Järvinen > > Reviewed-by: Jonathan Cameron > > Reviewed-by: Yazen Ghannam > > Just wondering, could this be squashed into the previous patch? Or is > the goal to be strict with one logical change per patch? I haven't looked closely enough to opine on whether this should be squashed, but I generally do prefer one logical change per patch. It's much easier for me to squash things when merging than it is to untangle things that should be separate patches. Bjorn
Re: [PATCH v5 01/25] fuse: Fix dax truncate/punch_hole fault path
Alistair Popple wrote: > FS DAX requires file systems to call into the DAX layout prior to > unlinking inodes to ensure there is no ongoing DMA or other remote > access to the direct mapped page. The fuse file system implements > fuse_dax_break_layouts() to do this which includes a comment > indicating that passing dmap_end == 0 leads to unmapping of the whole > file. > > However this is not true - passing dmap_end == 0 will not unmap > anything before dmap_start, and further more > dax_layout_busy_page_range() will not scan any of the range to see if > there maybe ongoing DMA access to the range. It would be useful to clarify that this bug was found by inspection and that there are no known end user reports of trouble, but that the failure mode would look like random fs corruption. The window is hard to hit because a block needs to be truncated, reallocated to a file, and written to before stale DMA could corrupt it. So that may contribute to the fact that fuse-dax users have not reported an issue since v5.10. > Fix this by checking for dmap_end == 0 in fuse_dax_break_layouts() and > pass the entire file range to dax_layout_busy_page_range(). That's not what this patch does, maybe a rebase error that pushed the @dmap_end fixup after the call to dax_layout_busy_page_range? However, I don't think this is quite the right fix, more below... > Signed-off-by: Alistair Popple > Fixes: 6ae330cad6ef ("virtiofs: serialize truncate/punch_hole and dax fault > path") > Cc: Vivek Goyal > > --- > > I am not at all familiar with the fuse file system driver so I have no > idea if the comment is relevant or not and whether the documented > behaviour for dmap_end == 0 is ever relied upon. However this seemed > like the safest fix unless someone more familiar with fuse can confirm > that dmap_end == 0 is never used. It is used in several places and has been wrong since day one. 
I believe the original commit simply misunderstood that dax_layout_busy_page_range() semantics are analogous to invalidate_inode_pages2_range() semantics in terms of what @start and @end mean. You can add: Co-developed-by: Dan Williams Signed-off-by: Dan Williams ...if you end up doing a resend, or I will add it on applying to nvdimm.git if the rebase does not end up being too prickly. -- 8< -- diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c index c5d1feaa239c..455c4a16080b 100644 --- a/fs/fuse/dax.c +++ b/fs/fuse/dax.c @@ -681,7 +681,6 @@ static int __fuse_dax_break_layouts(struct inode *inode, bool *retry, 0, 0, fuse_wait_dax_page(inode)); } -/* dmap_end == 0 leads to unmapping of whole file */ int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end) { @@ -693,10 +692,6 @@ int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, ret = __fuse_dax_break_layouts(inode, &retry, dmap_start, dmap_end); } while (ret == 0 && retry); - if (!dmap_end) { - dmap_start = 0; - dmap_end = LLONG_MAX; - } return ret; } diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 0b2f8567ca30..bc6c8936c529 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1936,7 +1936,7 @@ int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (FUSE_IS_DAX(inode) && is_truncate) { filemap_invalidate_lock(mapping); fault_blocked = true; - err = fuse_dax_break_layouts(inode, 0, 0); + err = fuse_dax_break_layouts(inode, 0, -1); if (err) { filemap_invalidate_unlock(mapping); return err; diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 082ee374f694..cef7a8f75821 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -253,7 +253,7 @@ static int fuse_open(struct inode *inode, struct file *file) if (dax_truncate) { filemap_invalidate_lock(inode->i_mapping); - err = fuse_dax_break_layouts(inode, 0, 0); + err = fuse_dax_break_layouts(inode, 0, -1); if (err) goto out_inode_unlock; } @@ -2890,7 +2890,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, 
inode_lock(inode); if (block_faults) { filemap_invalidate_lock(inode->i_mapping); - err = fuse_dax_break_layouts(inode, 0, 0); + err = fuse_dax_break_layouts(inode, 0, -1); if (err) goto out; }
Re: [PATCH v5 02/25] fs/dax: Return unmapped busy pages from dax_layout_busy_page_range()
Alistair Popple wrote: > dax_layout_busy_page_range() is used by file systems to scan the DAX > page-cache to unmap mapping pages from user-space and to determine if > any pages in the given range are busy, either due to ongoing DMA or > other get_user_pages() usage. > > Currently it checks to see the file mapping is mapped into user-space > with mapping_mapped() and returns early if not, skipping the check for > DMA busy pages. This is wrong as pages may still be undergoing DMA > access even if they have subsequently been unmapped from > user-space. Fix this by dropping the check for mapping_mapped(). > > Signed-off-by: Alistair Popple > Suggested-by: Dan Williams Reviewed-by: Dan Williams