Re: [PATCH v2] powerpc: Remove eieio() in PowerPC IO functions
On Wed, Jan 29, 2025, at 12:14, Segher Boessenkool wrote: > On Wed, Jan 29, 2025 at 10:45:10AM +0100, Julian Vetter wrote: >> Remove the eieio() calls in IO functions for PowerPC. While other >> architectures permit prefetching, combining, and reordering, the eieio() >> calls on PowerPC prevent such optimizations. > > Yes, and it is crucial to prevent combining, it is part of the semantics > of these functions. This is a much bigger problem on PowerPC than on > architectures which optimise memory accesses much less. So most other > archs can get away with it much easier (but it is still completely wrong > there). Unfortunately it's not well documented what the actual behavior is supposed to be across architectures, so part of the work that Julian is doing is to make them behave consistently. My impression is that we actually do want combining (and reordering) here in memcpy_fromio, unless there are specific drivers that depend on the non-combining behavior. My earlier observations towards this are: - memcpy_fromio() is almost always used on RAM behind a bus, not MMIO registers. - there has been a push towards using instruction sequences on arm64 make sure we get the most (write) combining for the I/O string functions to get better performance - There is only an eieio in powerpc memcpy_fromio() but no barrier inside the memcpy_toio() or memset_io() loops. If it was necessary to prevent combining, this would likely be for both load and store loops here. - I tried to trace the history of memcpy_fromio() and found that it was originally just a loop around readl(), which at the time was eieio() and a pointer dereference and byteswap. The readl() definition has changed many times after that, but memcpy_fromio() never changed again, which makes it most likely that this was just forgotten in the conversion rather than intentional. If you can think of callers of memcpy_fromio() that depend on the eieio(), we probably need to fix other architectures as well and go back to Julian's original idea of adding a new barrier into the common code and have an architecture specific definition for that barrier. For the insb()/insw()/insl() case, I think you are correct that the eieio is required on powerpc and likely others as well, since the CPU combining multiple reads on a single address into a single one would clearly break the concept. On Arm and other architectures that prevent combining of MMIO reads based on page table flags, we don't need a barrier here, but there is a good chance that these barriers are in fact missing on some alpha and some mips implementations of readsl() etc: The alpha ioread32_rep() and insl() have barriers inside, but it also uses the generic readsl() which does not. > You are keeping the trap;isync things, which a) have a way bigger > performance impact, and b) are merely a debugging aid (if some i/o > access kills the system, it will be clear where that came from). And > that isn't even the biggest thing of course, there is a heavyweight > sync in there as well. The barriers around memcpy_toio()/memcpy_fromio() are a separate question that we should address as well. I somehow thought that other architectures had the same barriers as readl/writel around the string functions, but it does seem like it's only powerpc at this point. Intuitively I would have expected that a memcpy_toio() etc has the same barriers as a writel() around it for consistency, on the other hand it's only powerpc that has these, and if the functions are indeed only meant to work on RAM behind MMIO, they would not not have to serialize against DMA the same way that writel does because there are no side-effects. I'm still looking for more insights on that, but if we can agree that memcpy_{from,to}io don't need the outer barriers, it seems best to address this at the same time as the internal eieio() in memcpy_fromio(), provided we agree on removing those as well. Arnd
Re: [PATCH 0/9] YAML conversion of several Freescale/PowerPC DT bindings
On Wed, Jan 29, 2025 at 05:29:41PM -0500, Frank Li wrote: > On Sun, Jan 26, 2025 at 07:58:55PM +0100, J. Neuschäfer wrote: > > This is a spin-off of the series titled > > "powerpc: MPC83xx cleanup and LANCOM NWAPP2 board". > > > > During the development of that series, it became clear that many > > devicetree bindings for Freescale MPC8xxx platforms are still in the old > > plain-text format, or don't exist at all, and in any case don't mention > > all valid compatible strings. > > > > Signed-off-by: J. Neuschäfer > > Please cc i...@lists.linux.dev next time > > Frank Will do. Best regards, J. Neuschäfer
[PATCH v3 2/7] crash: remove an unused argument from reserve_crashkernel_generic()
cmdline argument is not used in reserve_crashkernel_generic() so remove it. Correspondingly, all the callers have been updated as well. No functional change intended. Cc: Andrew Morton Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Acked-by: Hari Bathini Acked-by: Baoquan He Signed-off-by: Sourabh Jain --- arch/arm64/mm/init.c | 6 ++ arch/loongarch/kernel/setup.c | 5 ++--- arch/riscv/mm/init.c | 6 ++ arch/x86/kernel/setup.c | 6 ++ include/linux/crash_reserve.h | 11 +-- kernel/crash_reserve.c| 9 - 6 files changed, 17 insertions(+), 26 deletions(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index ccdef53872a0..2e496209af80 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -98,21 +98,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit) diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c index 56934fe58170..ece9c4266c3f 100644 --- a/arch/loongarch/kernel/setup.c +++ b/arch/loongarch/kernel/setup.c @@ -259,18 +259,17 @@ static void __init arch_reserve_crashkernel(void) int ret; unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static void __init fdt_setup(void) diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index fc53ce748c80..750991df9e52 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -1377,21 +1377,19 @@ static void __init arch_reserve_crashkernel(void) { unsigned long long low_size = 0; unsigned long long crash_base, crash_size; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) return; - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } void __init paging_init(void) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index f1fea506e20f..15b6823556c8 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -469,14 +469,13 @@ static void __init memblock_x86_reserve_range_setup_data(void) static void __init arch_reserve_crashkernel(void) { unsigned long long crash_base, crash_size, low_size = 0; - char *cmdline = boot_command_line; bool high = false; int ret; if (!IS_ENABLED(CONFIG_CRASH_RESERVE)) return; - ret = parse_crashkernel(cmdline, memblock_phys_mem_size(), + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(), &crash_size, &crash_base, &low_size, &high); if (ret) @@ -487,8 +486,7 @@ static void __init arch_reserve_crashkernel(void) return; } - reserve_crashkernel_generic(cmdline, crash_size, crash_base, - low_size, high); + reserve_crashkernel_generic(crash_size, crash_base, low_size, high); } static struct resource standard_io_resources[] = { diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h index
[PATCH v3 4/7] powerpc/crash: Use generic APIs to locate memory hole for kdump
On PowerPC, the memory reserved for the crashkernel can contain components like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec segments into crashkernel memory. Due to these special components, PowerPC has its own set of APIs to locate holes in the crashkernel memory for loading kexec segments for kdump. However, for loading kexec segments in the kexec case, PowerPC already uses generic APIs to locate holes. The previous patch in this series, titled "crash: Let arch decide usable memory range in reserved area," introduced arch-specific hook to handle such special regions in the crashkernel area. So, switch PowerPC to use the generic APIs to locate memory holes for kdump and remove the redundant PowerPC-specific APIs. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/powerpc/include/asm/kexec.h | 6 +- arch/powerpc/kexec/file_load_64.c | 259 ++ 2 files changed, 13 insertions(+), 252 deletions(-) diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 270ee93a0f7d..8d9d20e0b02b 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -95,8 +95,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long int arch_kimage_file_post_load_cleanup(struct kimage *image); #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); -#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end); +#define arch_check_excluded_range arch_check_excluded_range + int load_crashdump_segments_ppc64(struct kimage *image, struct kexec_buf *kbuf); diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index dc65c1391157..e7ef8b2a2554 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = { NULL }; -/** - * __locate_mem_hole_top_down - Looks top down for a large enough memory hole - * in the memory regions between buf_min & buf_max - * for the buffer. If found, sets kbuf->mem. - * @kbuf: Buffer contents and memory parameters. - * @buf_min:Minimum address for the buffer. - * @buf_max:Maximum address for the buffer. - * - * Returns 0 on success, negative errno on error. - */ -static int __locate_mem_hole_top_down(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max) -{ - int ret = -EADDRNOTAVAIL; - phys_addr_t start, end; - u64 i; - - for_each_mem_range_rev(i, &start, &end) { - /* -* memblock uses [start, end) convention while it is -* [start, end] here. Fix the off-by-one to have the -* same convention. -*/ - end -= 1; - - if (start > buf_max) - continue; - - /* Memory hole not found */ - if (end < buf_min) - break; - - /* Adjust memory region based on the given range */ - if (start < buf_min) - start = buf_min; - if (end > buf_max) - end = buf_max; - - start = ALIGN(start, kbuf->buf_align); - if (start < end && (end - start + 1) >= kbuf->memsz) { - /* Suitable memory range found. Set kbuf->mem */ - kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1, - kbuf->buf_align); - ret = 0; - break; - } - } - - return ret; -} - -/** - * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a - * suitable buffer with top down approach. - * @kbuf: Buffer contents and memory parameters. - * @buf_min:Minimum address for the buffer. - * @buf_max:Maximum address for the buffer. - * @emem: Exclude memory ranges. - * - * Returns 0 on success, negative errno on error. - */ -static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf, - u64 buf_min, u64 buf_max, - const struct crash_mem *emem) +int arch_check_excluded_range(struct kimage *image, unsigned long start, + unsigned long end)
[PATCH v3 1/7] kexec: Initialize ELF lowest address to ULONG_MAX
kexec_elf_load() loads an ELF executable and sets the address of the lowest PT_LOAD section to the address held by the lowest_load_addr function argument. To determine the lowest PT_LOAD address, a local variable lowest_addr (type unsigned long) is initialized to UINT_MAX. After loading each PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD address is lower, lowest_addr is updated. However, setting lowest_addr to UINT_MAX won't work when the kernel image is loaded above 4G, as the returned lowest PT_LOAD address would be invalid. This is resolved by initializing lowest_addr to ULONG_MAX instead. This issue was discovered while implementing crashkernel high/low reservation on the PowerPC architecture. Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()") Cc: Andrew Morton Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-ker...@vger.kernel.org Acked-by: Hari Bathini Acked-by: Baoquan He Signed-off-by: Sourabh Jain --- kernel/kexec_elf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c index d3689632e8b9..3a5c25b2adc9 100644 --- a/kernel/kexec_elf.c +++ b/kernel/kexec_elf.c @@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr *ehdr, struct kexec_buf *kbuf, unsigned long *lowest_load_addr) { - unsigned long lowest_addr = UINT_MAX; + unsigned long lowest_addr = ULONG_MAX; int ret; size_t i; -- 2.48.1
[PATCH v3 7/7] powerpc/crash: use generic crashkernel reservation
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Along with moving to the generic crashkernel reservation, the code related to finding the base address for the crashkernel has been separated into its own function name get_crash_base() for better readability and maintainability. Cc: Andrew Morton Cc: Baoquan he CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Acked-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar Signed-off-by: Sourabh Jain --- arch/powerpc/Kconfig | 3 + arch/powerpc/include/asm/crash_reserve.h | 8 +++ arch/powerpc/include/asm/kexec.h | 4 +- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kexec/core.c| 90 ++-- 5 files changed, 53 insertions(+), 54 deletions(-) create mode 100644 arch/powerpc/include/asm/crash_reserve.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index db9f7b2d07bf..880d35fadf40 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -718,6 +718,9 @@ config ARCH_SUPPORTS_CRASH_HOTPLUG def_bool y depends on PPC64 +config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION + def_bool CRASH_RESERVE + config FA_DUMP bool "Firmware-assisted dump" depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV) diff --git a/arch/powerpc/include/asm/crash_reserve.h b/arch/powerpc/include/asm/crash_reserve.h new file mode 100644 index ..6467ce29b1fa --- /dev/null +++ b/arch/powerpc/include/asm/crash_reserve.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_POWERPC_CRASH_RESERVE_H +#define _ASM_POWERPC_CRASH_RESERVE_H + +/* crash kernel regions are Page size agliged */ +#define CRASH_ALIGN PAGE_SIZE + +#endif /* _ASM_POWERPC_CRASH_RESERVE_H */ diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 8d9d20e0b02b..5e4680f9ff35 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -115,9 +115,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); -extern void reserve_crashkernel(void); +extern void arch_reserve_crashkernel(void); #else -static inline void reserve_crashkernel(void) {} +static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } #endif diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index e0059842a1c6..9ed9dde7d231 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -860,7 +860,7 @@ void __init early_init_devtree(void *params) */ if (fadump_reserve_mem() == 0) #endif - reserve_crashkernel(); + arch_reserve_crashkernel(); early_reserve_mem(); if (memory_limit > memblock_phys_mem_size()) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index 4945b33322ae..b21cfa814492 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -80,38 +80,20 @@ void machine_kexec(struct kimage *image) } #ifdef CONFIG_CRASH_RESERVE -void __init reserve_crashkernel(void) -{ - unsigned long long crash_size, crash_base, total_mem_sz; - int ret; - total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size(); - /* use common parsing */ - ret = parse_crashkernel(boot_command_line, total_mem_sz, - &crash_size, &crash_base, NULL, NULL); - if (ret == 0 && crash_size > 0) { - crashk_res.start = crash_base; - crashk_res.end = crash_base + crash_size - 1; - } - - if (crashk_res.end == crashk_res.start) { - crashk_res.start = crashk_res.end = 0; - return; - } - - /* We might have got these values via the command line or the -* device tree, either way sanitise them now. */ - - crash_size = resource_size(&crashk_res); +static unsigned long long __init get_crash_base(unsigned long long crash_base) +{ #ifndef CONFIG_NONSTATIC_KERNEL - if (crashk_res.start != KDUMP_KERNELBASE) + if (crash_base != KDUMP_KERNELBASE) printk("Crash kernel location must be 0x%x\n", KDUMP_KERNELBASE); - crashk_res.start = KDUMP_KERNELBASE; + return KDUMP_KERNELBASE; #else - if (!crashk_res.start) { + unsigned long long crash_base_a
[PATCH v3 6/7] powerpc: insert System RAM resource to prevent crashkernel conflict
The next patch in the series with title "powerpc/crash: use generic crashkernel reservation" enables powerpc to use generic crashkernel reservation instead of custom implementation. This leads to exporting of `Crash Kernel` memory to iomem_resource (/proc/iomem) via insert_crashkernel_resources():kernel/crash_reserve.c or at another place in the same file if HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set. The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM` to iomem_resource using request_resource(). This creates a conflict with `Crash Kernel`, which is added by the generic crashkernel reservation code. As a result, the kernel ultimately fails to add `System RAM` to iomem_resource. Consequently, it does not appear in /proc/iomem. There are multiple approches tried to avoid this: 1. Don't add Crash Kernel to iomem_resource: - This has two issues. First, it requires adding an architecture-specific hook in the generic code. There are already two code paths to choose when to add `Crash Kernel` to iomem_resource. This adds one more code path to skip it. Second, what if `Crash Kernel` is required in /proc/iomem in the future? Many architectures do export it. 2. Don't add `System RAM` to iomem_resource by reverting commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"): - It's not ideal to export `System RAM` via /proc/iomem, but since it already done ealier and userspace tools like kdump and kdump-utils rely on `System RAM` from /proc/iomem, removing it will break userspace. 3. Add Crash Kernel along with System RAM to /proc/iomem: This patch takes the third approach by updating add_system_ram_resources() to use insert_resource() instead of the request_resource() API to add the `System RAM` resource to iomem_resource. insert_resource() allows inserting resources even if they overlap with existing ones. Since `Crash Kernel` and `System RAM` resources are added to iomem_resource early in the boot, any other conflict is not expected. With the changes introduced here and in the next patch, "powerpc/crash: use generic crashkernel reservation," /proc/iomem now exports `System RAM` and `Crash Kernel` as shown below: $ cat /proc/iomem -3 : System RAM 1000-4fff : Crash kernel The kdump script is capable of handling `System RAM` and `Crash Kernel` in the above format. The same format is used in other architectures. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- arch/powerpc/mm/mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index c7708c8fad29..615966d71425 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -376,7 +376,7 @@ static int __init add_system_ram_resources(void) */ res->end = end - 1; res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; - WARN_ON(request_resource(&iomem_resource, res) < 0); + WARN_ON(insert_resource(&iomem_resource, res) < 0); } } -- 2.48.1
[PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area
Although the crashkernel area is reserved, on architectures like PowerPC, it is possible for the crashkernel reserved area to contain components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments over these components, PowerPC has its own set of APIs to locate holes in the crashkernel reserved area. Add an arch hook in the generic locate mem hole APIs so that architectures can handle such special regions in the crashkernel area while locating memory holes for kexec segments using generic APIs. With this, a lot of redundant arch-specific code can be removed, as it performs the exact same job as the generic APIs. To keep the generic and arch-specific changes separate, the changes related to moving PowerPC to use the generic APIs and the removal of PowerPC-specific APIs for memory hole allocation are done in a subsequent patch titled "powerpc/crash: Use generic APIs to locate memory hole for kdump. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- include/linux/kexec.h | 9 + kernel/kexec_file.c | 12 2 files changed, 21 insertions(+) diff --git a/include/linux/kexec.h b/include/linux/kexec.h index f0e9f8eda7a3..407f8b0346aa 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -205,6 +205,15 @@ static inline int arch_kimage_file_post_load_cleanup(struct kimage *image) } #endif +#ifndef arch_check_excluded_range +static inline int arch_check_excluded_range(struct kimage *image, + unsigned long start, + unsigned long end) +{ + return 0; +} +#endif + #ifdef CONFIG_KEXEC_SIG #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len); diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c index 3eedb8c226ad..fba686487e3b 100644 --- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -464,6 +464,12 @@ static int locate_mem_hole_top_down(unsigned long start, unsigned long end, continue; } + /* Make sure this does not conflict with exclude range */ + if (arch_check_excluded_range(image, temp_start, temp_end)) { + temp_start = temp_start - PAGE_SIZE; + continue; + } + /* We found a suitable memory range */ break; } while (1); @@ -498,6 +504,12 @@ static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end, continue; } + /* Make sure this does not conflict with exclude range */ + if (arch_check_excluded_range(image, temp_start, temp_end)) { + temp_start = temp_start + PAGE_SIZE; + continue; + } + /* We found a suitable memory range */ break; } while (1); -- 2.48.1
Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates
On Wed, Jan 29, 2025 at 06:18:54PM +1100, Paul Mackerras wrote: > Interesting. I looked in my copy of v2.07 (PowerISA_V2.07_PUBLIC.pdf) > and it mentions rfscv in a couple of places, but has no description of > scv or rfscv. I'll change it to v3.0. Huh, rfscv is 3.0 and later according to later versions of the ISA. Segher
Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates
On Wed, Jan 29, 2025 at 06:20:31PM +1000, Nicholas Piggin wrote: > Perfectly reasonable to not add broadcast tlbie in microwatt. If you call "the easy way out" reasonable, then sure. This pretty trivial hardware addition causes so many software headaches whenn missing, it isn't funny. "Friends don't let friends skimp on minimal system support features", etc. :-) Segher
Re: [PATCH 0/5] Microwatt updates
Hi! On Wed, Jan 29, 2025 at 09:49:49AM +1100, Paul Mackerras wrote: > This patch series updates the kernel support for the Microwatt > soft-core and its implementation on FPGA systems, particularly the > Digilent Arty A7-100 FPGA development board. > > Microwatt now supports almost all of the features of the SFFS (Scalar > Fixed-poin and Floating-point Subset) compliancy subset of Power ISA > version 3.1C, including prefixed instructions and the fixed-point hash > (ROP mitigation) instructions. It is also now SMP-capable, and a > dual-core system will fit on the Arty A7-100 board. Congratulations! > Microwatt does not have broadcast TLB invalidations in SMP systems; So it isn't *really* SMP. Compare 603 vs. 604. With enough software (OS) trickery you can make some things work, but :-) (There have been many 603 multiprocessor systems as well, to draw the analogy further than wanted :-) ) > the kernel already has code to deal with this. One of the patches in > this series provides a config option to allow platforms to select > unconditionally the behaviour where cross-CPU TLB invalidations are > handled using inter-processor interrupts. Are there plans to broadcast the (SMP cache invalidation) messages? Will uwatt support some real bus protocol, for example? Again, congrats on this great milestone! Does this floating point support do square roots as well (aka "gpopt"; does it do "gfxopt" for that matter, fsel?) fsqrt is kinda tricky to get to work fully correctly :-) Segher
Re: [PATCH 4/5] powerpc: Define config option for processors without broadcast TLBIE
On Wed, Jan 29, 2025 at 06:10:43PM +1100, Paul Mackerras wrote: > > Hate to bikeshed, but would it be annoying to make this an affirmative > > option? > > I guess we'd have to make all the platforms that do have broadcast > tlbie (and a book3s-64 MMU with radix) select that option. Which > would be powernv and pseries, I would think. If that's correct then > it's probably not too annoying. Should I do that in v2? Such an option would mean "this platform implements the Power architecture correctly". Interesting :-) Segher
Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates
On Wed, Jan 29, 2025 at 09:52:09AM +1100, Paul Mackerras wrote: > - isa = <3000>; > + isa = <3010>; Does this mean 3.1, or 3.01? If the former, can this also encode 3.1C? Should uwatt say to support that? > little-endian { > - isa = <2050>; > - usable-privilege = <3>; > + isa = <0>; > + usable-privilege = <7>; > + os-support = <0>; > hwcap-bit-nr = <1>; > }; What does "isa" in this node mean? Why is it changed to 0? (I don't know this node at all, I only know a property with the same name). > cache-inhibited-large-page { > - isa = <2040>; > - usable-privilege = <2>; > + isa = <0>; > + usable-privilege = <6>; > + os-support = <0>; > }; Similar question here. > - isa = <2010>; > + isa = <0x00>; > usable-privilege = <2>; > + os-support = <0>; > }; And here. That's a quite woolly way to say "0" btw ;-) Segher
Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates
On Wed, Jan 29, 2025 at 04:36:14PM +1000, Nicholas Piggin wrote: > On Wed Jan 29, 2025 at 8:52 AM AEST, Paul Mackerras wrote: > > Microwatt now implements ISA v3.1 (SFFS compliancy subset), including > > prefixed instructions, scv/rfscv, and the FSCR, HFSCR, TAR, and CTRL > > registers. The privileged mode of operation is now hypervisor mode > > and there is no privileged non-hypervisor mode; the MSR[HV] bit is > > forced to 1. > > Cool. Lots of development in microwatt. > > Come to think of it we should have put a broadcast-tlbie feature > in there and you wouldn't need the other patch. That can go on > the todo list I guess. > > system-call-vectored was available in ISA v3.0. Not that we do much > with it at the moment IIRC, but there were dreams of wiring it in for > compat guests. With that fixed, This patch says 2.07 for it (if that is what 2070 mens(if that is what 2070 means). Something's not right :-) > > + system-call-vectored { > > + isa = <2070>; > > + usable-privilege = <7>; > > + os-support = <1>; > > + fscr-bit-nr = <12>; > > + hwcap-bit-nr = <52>; > > }; Segher
Re: [PATCH 5/9] dt-bindings: dma: Convert fsl,elo*-dma bindings to YAML
On Sun, Jan 26, 2025 at 10:47:35PM -0600, Rob Herring wrote: > On Sun, Jan 26, 2025 at 07:59:00PM +0100, J. Neuschäfer wrote: > > The devicetree bindings for Freescale DMA engines have so far existed as > > a text file. This patch converts them to YAML, and specifies all the > > compatible strings currently in use in arch/powerpc/boot/dts. > > > > Signed-off-by: J. Neuschäfer > > --- > > .../devicetree/bindings/dma/fsl,elo-dma.yaml | 129 + > > .../devicetree/bindings/dma/fsl,elo3-dma.yaml | 105 +++ > > .../devicetree/bindings/dma/fsl,eloplus-dma.yaml | 120 > > .../devicetree/bindings/powerpc/fsl/dma.txt| 204 > > - > > 4 files changed, 354 insertions(+), 204 deletions(-) [...] > > +patternProperties: > > + "^dma-channel@.*$": > > +type: object > >additionalProperties: false I'll add it. > (The tools should have highlighted this) With dtschema 2024.11 installed, "make dt_binding_check DT_SCHEMA_FILES=fsl,elo-dma.yaml" does not highlight this. > > + > > +properties: > > + compatible: > > +items: > > + - enum: > > + - fsl,mpc8315-dma-channel > > + - fsl,mpc8323-dma-channel > > + - fsl,mpc8347-dma-channel > > + - fsl,mpc8349-dma-channel > > + - fsl,mpc8360-dma-channel > > + - fsl,mpc8377-dma-channel > > + - fsl,mpc8378-dma-channel > > + - fsl,mpc8379-dma-channel > > + - const: fsl,elo-dma-channel > > + > > + reg: > > +maxItems: 1 > > + > > + cell-index: > > +description: DMA channel index starts at 0. > > + > > + interrupts: true > > You have to define how many interrupts and what they are. Will do. (and the same for the other two files) Best regards, J. Neuschäfer
Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function
Hi! > +static void microwatt_idle(void) > +{ > + if (!prep_irq_for_idle()) > + return; > + > + __asm__ __volatile__ ("wait"); > +} All asm without outputs is always implicitly volatile (if it wasn't, it could always be transfirmed whatever way you want, like, optimised away completely). It can still be useful for documentation purposes of course, but here it is the other way around really, it is just cargo cult :-( Segher
Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function
On Wed, Jan 29, 2025 at 04:06:03PM +1000, Nicholas Piggin wrote: > Does wait cause MSR[EE] to be set? If not, do you need to use > prep_irq_for_idle_irqsoff() here maybe? Assuming this does implement the standard ISA 2.03 wait instruction (and it better), this does not do anything other than to stop fetching and execution until some later event happens. What values of the WC field does uwatt implement? Segher
Re: [PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area
On 01/31/25 at 05:08pm, Sourabh Jain wrote: > Although the crashkernel area is reserved, on architectures like > PowerPC, it is possible for the crashkernel reserved area to contain > components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments > over these components, PowerPC has its own set of APIs to locate holes > in the crashkernel reserved area. > > Add an arch hook in the generic locate mem hole APIs so that > architectures can handle such special regions in the crashkernel area > while locating memory holes for kexec segments using generic APIs. > With this, a lot of redundant arch-specific code can be removed, as it > performs the exact same job as the generic APIs. > > To keep the generic and arch-specific changes separate, the changes > related to moving PowerPC to use the generic APIs and the removal of > PowerPC-specific APIs for memory hole allocation are done in a > subsequent patch titled "powerpc/crash: Use generic APIs to locate > memory hole for kdump. > > Cc: Andrew Morton > Cc: Baoquan he > Cc: Hari Bathini > Cc: Madhavan Srinivasan > Cc: Mahesh Salgaonkar > Cc: Michael Ellerman > Cc: ke...@lists.infradead.org > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Sourabh Jain > --- > include/linux/kexec.h | 9 + > kernel/kexec_file.c | 12 > 2 files changed, 21 insertions(+) LGTM, Acked-by: Baoquan He
Re: [PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area
Hello Baoquan, On 01/02/25 09:52, Baoquan he wrote: On 01/31/25 at 05:08pm, Sourabh Jain wrote: Although the crashkernel area is reserved, on architectures like PowerPC, it is possible for the crashkernel reserved area to contain components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments over these components, PowerPC has its own set of APIs to locate holes in the crashkernel reserved area. Add an arch hook in the generic locate mem hole APIs so that architectures can handle such special regions in the crashkernel area while locating memory holes for kexec segments using generic APIs. With this, a lot of redundant arch-specific code can be removed, as it performs the exact same job as the generic APIs. To keep the generic and arch-specific changes separate, the changes related to moving PowerPC to use the generic APIs and the removal of PowerPC-specific APIs for memory hole allocation are done in a subsequent patch titled "powerpc/crash: Use generic APIs to locate memory hole for kdump. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini Cc: Madhavan Srinivasan Cc: Mahesh Salgaonkar Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Sourabh Jain --- include/linux/kexec.h | 9 + kernel/kexec_file.c | 12 2 files changed, 21 insertions(+) LGTM, Acked-by: Baoquan He Thanks for the Ack! - Sourabh Jain
Re: [PATCH 2/2] powerpc/fadump: fix additional param memory reservation for HASH MMU
On 23/01/25 7:54 pm, Avnish Chouhan wrote: On 2025-01-23 15:26, Hari Bathini wrote: On 20/01/25 11:05 pm, Sourabh Jain wrote: Commit 683eab94da75bc ("powerpc/fadump: setup additional parameters for dump capture kernel") introduced the additional parameter feature in fadump for HASH MMU with the understanding that GRUB does not use the memory area between 640MB and 768MB for its operation. However, the patch ("powerpc: increase MIN RMA size for CAS negotiation") changes the MIN RMA size to 768MB, allowing GRUB to use memory up to 768MB. This makes the fadump reservation for the additional parameter feature for HASH MMU unreliable. To address this, adjust the memory range for the additional parameter in fadump for HASH MMU. This will ensure that GRUB does not overwrite the memory reserved for fadump's additional parameter in HASH MMU. The new policy for the memory range for the additional parameter in HASH MMU is that the first memory block must be larger than the MIN_RMA size, as the bootloader can use memory up to the MIN_RMA size. The range should be between MIN_RMA and the RMA size (ppc64_rma_size), and it must not overlap with the fadump reserved area. IIRC, even memory above MIN_RMA is used by the bootloader except for 640MB to 768MB (assuming RMA size is >768MB). So, how does this change guarantee that the bootloader is not using memory reserved for bootargs? Avnish, earlier, bootloader was using RUNTIME_MIN_SPACE (128MB) starting top-down at 768MB earlier. With MIN_RMA changed to 768MB, is bootloader still using the concept of RUNTIME_MIN_SPACE to set aside some memory for kernel to use. If yes, where exactly is it allocating this space now? Also, rtas instantiates top-down at 768MB. Would that not have a conflict with grub allocations without RUNTIME_MIN_SPACE at 768MB? - Hari Hi Hari, Hi Avnish, The RUNTIME_MIN_SPACE is the space left aside by Grub is within the MIN_RMA size. Grub won't use memory beyond the MIN_RMA. With this change, we haven't changed the RUNTIME_MIN_SPACE behavior. Grub will still keep the 128 MB space in MIN_RMA for loading stock kernel and initrd. IIUC, you mean, 640MB to 768MB is not used by Grub even if MIN_RMA is at 768MB? If that is true, this change is not needed, as fadump could still use the memory between 640MB to 768MB, right? Am I missing something here.. - Hari
Re: [PATCH 2/9] dt-bindings: ata: Convert fsl,pq-sata binding to YAML
On Mon, Jan 27, 2025 at 08:22:55AM +0900, Damien Le Moal wrote: > On 1/27/25 03:58, J. Neuschäfer via B4 Relay wrote: > > From: "J. Neuschäfer" > > > > Convert the Freescale PowerQUICC SATA controller binding from text form > > to YAML. The list of compatible strings reflects current usage. > > > > Signed-off-by: J. Neuschäfer > > --- > > .../devicetree/bindings/ata/fsl,pq-sata.yaml | 59 > > ++ [...] > > +description: | > > + SATA nodes are defined to describe on-chip Serial ATA controllers. > > + Each SATA port should have its own node. > > Very unclear. The SATA nodes define ports or controllers ? Normally, a single > controller can have multiple ports, so the distinction is important. I'll change it to "Each SATA controller ...", see below. > > + cell-index: > > +$ref: /schemas/types.yaml#/definitions/uint32 > > +enum: [1, 2, 3, 4] > > +description: | > > + 1 for controller @ 0x18000 > > + 2 for controller @ 0x19000 > > + 3 for controller @ 0x1a000 > > + 4 for controller @ 0x1b000 > > Are you sure these are different controllers ? Are they not different ports of > the same controller ? Given that the previous text description define this as > "controller index", I suspect these are the port offsets and you SATA nodes > define ports, and not controllers. They have no shared registers, and each instance has the same register set (at a different base address). The MPC8315E reference manual (for example) documents them as: SATA 1 Controller—Block Base Address 0x1_8000 SATA 2 Controller—Block Base Address 0x1_9000 (table A.24 Serial ATA (SATA) Controller) Section 15.2 Command Operation implies that each SATA controller supports a single port: The SATA controller maintains a queue consisting of up to 16 commands. These commands can be distributed to a single attached device or, if the system contains a port multiplier, over each of the attached devices. So, in conclusion, I'm fairly sure "controller" is the right description. Best regards, J. Neuschäfer
[PATCH v3 0/7] powerpc/crash: use generic crashkernel reservation
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation") added a generic function to reserve crashkernel memory. So let's use the same function on powerpc and remove the architecture-specific code that essentially does the same thing. The generic crashkernel reservation also provides a way to split the crashkernel reservation into high and low memory reservations, which can be enabled for powerpc in the future. Additionally move powerpc to use generic APIs to locate memory hole for kexec segments while loading kdump kernel. Patch series summary: = Patch 1-3: generic changes Patch 4-7: powerpc changes Changelog: v1: https://lore.kernel.org/all/20241217064613.1042866-1-sourabhj...@linux.ibm.com/ v1 Resend: https://lore.kernel.org/all/20250108101458.406806-1-sourabhj...@linux.ibm.com/ - Rebased on top of 6.13-rc6 v2: https://lore.kernel.org/all/20250121115442.1278458-1-sourabhj...@linux.ibm.com/ - 01/06 new patch, fixes a bug with ELF load address It was posted in community as individual path but since it align with this patch series so include here: https://lore.kernel.org/all/20241210091314.185785-1-sourabhj...@linux.ibm.com/ - Fixed a typo 06/06 - Added Acked-by and Reviewed-by tags - Rebased it on 6.13 - It was suggested to try HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY and see if 03/06 patch can be avoided: https://lore.kernel.org/all/Z35gnO2N%2FLFt1E7E@MiWiFi-R3L-srv/ But after adding HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY the issue persisted and kernel prints below logs on boot: ...snip ... v3: - Split the patch that adds the arch hook to handle special regions in crashkernel reserved: Now 03/07 and 04/07. - Dropped the patch that adds arch hooks to skip `Crash Kernel` from iomem_resource (/proc/iomem). Instead, a patch is added to include both `System RAM` and `Crash Kernel` memory ranges in iomem_resource: 06/07. - Rearranged the patches to ensure that generic patches come first, followed by PowerPC-specific patches. - Added Acked-by tags. Cc: Andrew Morton Cc: Baoquan he Cc: Hari Bathini CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Cc: sourabhj...@linux.ibm.com Sourabh Jain (7): kexec: Initialize ELF lowest address to ULONG_MAX crash: remove an unused argument from reserve_crashkernel_generic() crash: Let arch decide usable memory range in reserved area powerpc/crash: Use generic APIs to locate memory hole for kdump powerpc/crash: preserve user-specified memory limit powerpc: insert System RAM resource to prevent crashkernel conflict powerpc/crash: use generic crashkernel reservation arch/arm64/mm/init.c | 6 +- arch/loongarch/kernel/setup.c| 5 +- arch/powerpc/Kconfig | 3 + arch/powerpc/include/asm/crash_reserve.h | 8 + arch/powerpc/include/asm/kexec.h | 10 +- arch/powerpc/kernel/prom.c | 2 +- arch/powerpc/kexec/core.c| 96 - arch/powerpc/kexec/file_load_64.c| 259 +-- arch/powerpc/mm/mem.c| 2 +- arch/riscv/mm/init.c | 6 +- arch/x86/kernel/setup.c | 6 +- include/linux/crash_reserve.h| 11 +- include/linux/kexec.h| 9 + kernel/crash_reserve.c | 9 +- kernel/kexec_elf.c | 2 +- kernel/kexec_file.c | 12 ++ 16 files changed, 105 insertions(+), 341 deletions(-) create mode 100644 arch/powerpc/include/asm/crash_reserve.h -- 2.48.1
[PATCH v3 5/7] powerpc/crash: preserve user-specified memory limit
Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory bug") fails crashkernel parsing if the crash size is found to be higher than system RAM, which makes the memory_limit adjustment code ineffective due to an early exit from reserve_crashkernel(). Regardless lets not violate the user-specified memory limit by adjusting it. Remove this adjustment to ensure all reservations stay within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the user-specified memory limit") did the same for fadump. Cc: Andrew Morton Cc: Baoquan he CC: Madhavan Srinivasan Cc: Michael Ellerman Cc: ke...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Acked-by: Hari Bathini Reviewed-by: Mahesh Salgaonkar Signed-off-by: Sourabh Jain --- arch/powerpc/kexec/core.c | 8 1 file changed, 8 deletions(-) diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index b8333a49ea5d..4945b33322ae 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -150,14 +150,6 @@ void __init reserve_crashkernel(void) return; } - /* Crash kernel trumps memory limit */ - if (memory_limit && memory_limit <= crashk_res.end) { - memory_limit = crashk_res.end + 1; - total_mem_sz = memory_limit; - printk("Adjusted memory limit for crashkernel, now 0x%llx\n", - memory_limit); - } - printk(KERN_INFO "Reserving %ldMB of memory at %ldMB " "for crashkernel (System RAM: %ldMB)\n", (unsigned long)(crash_size >> 20), -- 2.48.1
Re: [PATCH 4/5] powerpc: Define config option for processors without broadcast TLBIE
On Wed, Jan 29, 2025 at 09:53:44AM +1100, Paul Mackerras wrote: > Power ISA v3.1 implementations in the Linux Compliancy Subset and > lower are not required to implement broadcast TLBIE, and in fact > Microwatt doesn't. But this pretty much means that such systems cannot be SMP systems at all. Implementing the necessary synchronisation using some cobbled-together rickety jury-rigged contraption is not anyone's goal. Interesting that you did not see any performance loss, btw! Well you didn't try it on anything bigger than a simple dual CPU :-) Segher
Re: [PATCH 0/5] Microwatt updates
On Fri, Jan 31, 2025 at 10:13:43AM -0600, Segher Boessenkool wrote: > Hi! > > On Wed, Jan 29, 2025 at 09:49:49AM +1100, Paul Mackerras wrote: > > This patch series updates the kernel support for the Microwatt > > soft-core and its implementation on FPGA systems, particularly the > > Digilent Arty A7-100 FPGA development board. > > > > Microwatt now supports almost all of the features of the SFFS (Scalar > > Fixed-poin and Floating-point Subset) compliancy subset of Power ISA > > version 3.1C, including prefixed instructions and the fixed-point hash > > (ROP mitigation) instructions. It is also now SMP-capable, and a > > dual-core system will fit on the Arty A7-100 board. > > Congratulations! Thanks! > > Microwatt does not have broadcast TLB invalidations in SMP systems; > > So it isn't *really* SMP. Compare 603 vs. 604. With enough software Actually, the term "SMP" is about latency to memory, indicating that all CPUs have access to memory with similar latency. It doesn't say anything about coherency, either of memory caches or TLBs. So yes, Microwatt is SMP. And for the record, the instruction and data caches are coherent, which is what matters to user-space. Stuff to do with the TLB is not visible to user-space. And the ISA explicitly says "TLBs are non-coherent caches of the HTABs and Radix Trees" (Book III section 6.10.1). > (OS) trickery you can make some things work, but :-) (There have been > many 603 multiprocessor systems as well, to draw the analogy further > than wanted :-) ) 603 was a looong time ago, I don't recall the details. Regarding broadcast TLBIEs, the protocols and mechanisms for doing that are known to be complex and slow in the IBM Power processors (ask Derek Williams about that :). Anton found that in fact doing only local TLBIEs and using IPIs gave *better* performance on IBM Power systems than using hardware broadcast TLBIEs in many cases (the reason being that software knows which other CPUs might have a given TLB entry, often quite a small set, whereas hardware doesn't, and has to send the invalidation to every CPU and wait for a response from every CPU). Add to that, that most other SMP-capable CPU architectures don't do broadcast TLB invalidations, Intel x86 for example. > > the kernel already has code to deal with this. One of the patches in > > this series provides a config option to allow platforms to select > > unconditionally the behaviour where cross-CPU TLB invalidations are > > handled using inter-processor interrupts. > > Are there plans to broadcast the (SMP cache invalidation) messages? Cache (i.e. instruction and data cache) - yes, they *are* coherent. More precisely, the D caches are write-through, and all I and D caches snoop writes to memory (including DMA writes) and invalidate any cache lines being written to. > Will uwatt support some real bus protocol, for example? "Real" meaning using tri-state bus drivers, like we did in the 90s? :) > Again, congrats on this great milestone! Does this floating point > support do square roots as well (aka "gpopt"; does it do "gfxopt" for > that matter, fsel?) fsqrt is kinda tricky to get to work fully > correctly :-) Yes, fsqrt and fsel are implemented in hardware, and are accurate to the last bit. Also, the FPU handles denormalized values in hardware (both input and output) and implements all exception handling as per the ISA, including the trap-enabled overflow cases. Feel free to run whatever tests you like and report bugs. But we're getting a bit off-topic from the kernel patches. :) Paul.
Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function
On Fri, Jan 31, 2025 at 10:32:55AM -0600, Segher Boessenkool wrote: > On Wed, Jan 29, 2025 at 04:06:03PM +1000, Nicholas Piggin wrote: > > Does wait cause MSR[EE] to be set? If not, do you need to use > > prep_irq_for_idle_irqsoff() here maybe? > > Assuming this does implement the standard ISA 2.03 wait instruction ISA 2.03? I don't have a copy of 2.03. I looked in 2.04 and the wait instruction there has a different extended opcode from the ISA 3.0/3.1 wait instruction. Why is ISA 2.03 at all relevant to anything here? In any case, the description of the wait instruction in 2.04 doesn't actually say that it waits for anything other than all previous instructions being finished. > (and it better), this does not do anything other than to stop fetching > and execution until some later event happens. > > What values of the WC field does uwatt implement? Just WC=0; for other values it's a no-op. (Which is still arguably correct given that execution is allowed to resume when an implementation-dependent event occurs; P9 for instance just stops for a few microseconds, if I recall correctly, for any WC value.) Paul.
Re: [PATCH 5/9] dt-bindings: dma: Convert fsl,elo*-dma bindings to YAML
On Fri, Jan 31, 2025 at 8:03 AM J. Neuschäfer wrote: > > On Sun, Jan 26, 2025 at 10:47:35PM -0600, Rob Herring wrote: > > On Sun, Jan 26, 2025 at 07:59:00PM +0100, J. Neuschäfer wrote: > > > The devicetree bindings for Freescale DMA engines have so far existed as > > > a text file. This patch converts them to YAML, and specifies all the > > > compatible strings currently in use in arch/powerpc/boot/dts. > > > > > > Signed-off-by: J. Neuschäfer > > > --- > > > .../devicetree/bindings/dma/fsl,elo-dma.yaml | 129 + > > > .../devicetree/bindings/dma/fsl,elo3-dma.yaml | 105 +++ > > > .../devicetree/bindings/dma/fsl,eloplus-dma.yaml | 120 > > > .../devicetree/bindings/powerpc/fsl/dma.txt| 204 > > > - > > > 4 files changed, 354 insertions(+), 204 deletions(-) > [...] > > > +patternProperties: > > > + "^dma-channel@.*$": > > > +type: object > > > >additionalProperties: false > > I'll add it. > > > (The tools should have highlighted this) > > With dtschema 2024.11 installed, "make dt_binding_check > DT_SCHEMA_FILES=fsl,elo-dma.yaml" does not highlight this. Actually, it's the top-level 'addtionalProperties: true' that disables the check here. That should be false as well. Rob