Re: [PATCH v2] powerpc: Remove eieio() in PowerPC IO functions

2025-01-31 Thread Arnd Bergmann
On Wed, Jan 29, 2025, at 12:14, Segher Boessenkool wrote:
> On Wed, Jan 29, 2025 at 10:45:10AM +0100, Julian Vetter wrote:
>> Remove the eieio() calls in IO functions for PowerPC. While other
>> architectures permit prefetching, combining, and reordering, the eieio()
>> calls on PowerPC prevent such optimizations.
>
> Yes, and it is crucial to prevent combining, it is part of the semantics
> of these functions.  This is a much bigger problem on PowerPC than on
> architectures which optimise memory accesses much less.  So most other
> archs can get away with it much easier (but it is still completely wrong
> there).

Unfortunately it's not well documented what the actual behavior is
supposed to be across architectures, so part of the work that Julian
is doing is to make them behave consistently. My impression is that
we actually do want combining (and reordering) here in memcpy_fromio,
unless there are specific drivers that depend on the non-combining
behavior. My earlier observations towards this are:

- memcpy_fromio() is almost always used on RAM behind a bus, not MMIO
  registers.

- there has been a push towards using instruction sequences on arm64
  make sure we get the most (write) combining for the I/O string functions
  to get better performance

- There is only an eieio in powerpc memcpy_fromio() but no barrier
  inside the memcpy_toio() or memset_io() loops. If it was necessary
  to prevent combining, this would likely be for both load and store
  loops here.

- I tried to trace the history of memcpy_fromio() and found that
  it was originally just a loop around readl(), which at the time
  was eieio() and a pointer dereference and byteswap. The readl()
  definition has changed many times after that, but memcpy_fromio()
  never changed again, which makes it most likely that this was
  just forgotten in the conversion rather than intentional.

If you can think of callers of memcpy_fromio() that depend on the
eieio(), we probably need to fix other architectures as well and
go back to Julian's original idea of adding a new barrier into the
common code and have an architecture specific definition for that
barrier.

For the insb()/insw()/insl() case, I think you are correct that
the eieio is required on powerpc and likely others as well, since
the CPU combining multiple reads on a single address into a single
one would clearly break the concept.
On Arm and other architectures that prevent combining of MMIO
reads based on page table flags, we don't need a barrier here,
but there is a good chance that these barriers are in fact
missing on some alpha and some mips implementations of
readsl() etc: The alpha ioread32_rep() and insl() have barriers
inside, but it also uses the generic readsl() which does not.

> You are keeping the trap;isync things, which a) have a way bigger
> performance impact, and b) are merely a debugging aid (if some i/o
> access kills the system, it will be clear where that came from).  And
> that isn't even the biggest thing of course, there is a heavyweight
> sync in there as well.

The barriers around memcpy_toio()/memcpy_fromio() are a separate
question that we should address as well. I somehow thought that
other architectures had the same barriers as readl/writel around
the string functions, but it does seem like it's only powerpc
at this point.

Intuitively I would have expected that a memcpy_toio() etc
has the same barriers as a writel() around it for consistency, on
the other hand it's only powerpc that has these, and if the
functions are indeed only meant to work on RAM behind MMIO,
they would not not have to serialize against DMA the same way
that writel does because there are no side-effects.

I'm still looking for more insights on that, but if we can
agree that memcpy_{from,to}io don't need the outer barriers,
it seems best to address this at the same time as the internal
eieio() in memcpy_fromio(), provided we agree on removing
those as well.

  Arnd



Re: [PATCH 0/9] YAML conversion of several Freescale/PowerPC DT bindings

2025-01-31 Thread J . Neuschäfer
On Wed, Jan 29, 2025 at 05:29:41PM -0500, Frank Li wrote:
> On Sun, Jan 26, 2025 at 07:58:55PM +0100, J. Neuschäfer wrote:
> > This is a spin-off of the series titled
> > "powerpc: MPC83xx cleanup and LANCOM NWAPP2 board".
> >
> > During the development of that series, it became clear that many
> > devicetree bindings for Freescale MPC8xxx platforms are still in the old
> > plain-text format, or don't exist at all, and in any case don't mention
> > all valid compatible strings.
> >
> > Signed-off-by: J. Neuschäfer 
> 
> Please cc i...@lists.linux.dev next time
> 
> Frank

Will do.

Best regards,
J. Neuschäfer



[PATCH v3 2/7] crash: remove an unused argument from reserve_crashkernel_generic()

2025-01-31 Thread Sourabh Jain
cmdline argument is not used in reserve_crashkernel_generic() so remove
it. Correspondingly, all the callers have been updated as well.

No functional change intended.

Cc: Andrew Morton 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
Signed-off-by: Sourabh Jain 
---
 arch/arm64/mm/init.c  |  6 ++
 arch/loongarch/kernel/setup.c |  5 ++---
 arch/riscv/mm/init.c  |  6 ++
 arch/x86/kernel/setup.c   |  6 ++
 include/linux/crash_reserve.h | 11 +--
 kernel/crash_reserve.c|  9 -
 6 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ccdef53872a0..2e496209af80 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -98,21 +98,19 @@ static void __init arch_reserve_crashkernel(void)
 {
unsigned long long low_size = 0;
unsigned long long crash_base, crash_size;
-   char *cmdline = boot_command_line;
bool high = false;
int ret;
 
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
 
-   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
+   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base,
&low_size, &high);
if (ret)
return;
 
-   reserve_crashkernel_generic(cmdline, crash_size, crash_base,
-   low_size, high);
+   reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
 }
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index 56934fe58170..ece9c4266c3f 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -259,18 +259,17 @@ static void __init arch_reserve_crashkernel(void)
int ret;
unsigned long long low_size = 0;
unsigned long long crash_base, crash_size;
-   char *cmdline = boot_command_line;
bool high = false;
 
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
 
-   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
+   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base, &low_size, &high);
if (ret)
return;
 
-   reserve_crashkernel_generic(cmdline, crash_size, crash_base, low_size, 
high);
+   reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
 }
 
 static void __init fdt_setup(void)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fc53ce748c80..750991df9e52 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1377,21 +1377,19 @@ static void __init arch_reserve_crashkernel(void)
 {
unsigned long long low_size = 0;
unsigned long long crash_base, crash_size;
-   char *cmdline = boot_command_line;
bool high = false;
int ret;
 
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
 
-   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
+   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base,
&low_size, &high);
if (ret)
return;
 
-   reserve_crashkernel_generic(cmdline, crash_size, crash_base,
-   low_size, high);
+   reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
 }
 
 void __init paging_init(void)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index f1fea506e20f..15b6823556c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -469,14 +469,13 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
 static void __init arch_reserve_crashkernel(void)
 {
unsigned long long crash_base, crash_size, low_size = 0;
-   char *cmdline = boot_command_line;
bool high = false;
int ret;
 
if (!IS_ENABLED(CONFIG_CRASH_RESERVE))
return;
 
-   ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
+   ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base,
&low_size, &high);
if (ret)
@@ -487,8 +486,7 @@ static void __init arch_reserve_crashkernel(void)
return;
}
 
-   reserve_crashkernel_generic(cmdline, crash_size, crash_base,
-   low_size, high);
+   reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
 }
 
 static struct resource standard_io_resources[] = {
diff --git a/include/linux/crash_reserve.h b/include/linux/crash_reserve.h
index 

[PATCH v3 4/7] powerpc/crash: Use generic APIs to locate memory hole for kdump

2025-01-31 Thread Sourabh Jain
On PowerPC, the memory reserved for the crashkernel can contain
components like RTAS, TCE, OPAL, etc., which should be avoided when
loading kexec segments into crashkernel memory. Due to these special
components, PowerPC has its own set of APIs to locate holes in the
crashkernel memory for loading kexec segments for kdump. However, for
loading kexec segments in the kexec case, PowerPC already uses generic
APIs to locate holes.

The previous patch in this series, titled "crash: Let arch decide usable
memory range in reserved area," introduced arch-specific hook to handle
such special regions in the crashkernel area. So, switch PowerPC to use
the generic APIs to locate memory holes for kdump and remove the
redundant PowerPC-specific APIs.

Cc: Andrew Morton 
Cc: Baoquan he 
Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Sourabh Jain 
---
 arch/powerpc/include/asm/kexec.h  |   6 +-
 arch/powerpc/kexec/file_load_64.c | 259 ++
 2 files changed, 13 insertions(+), 252 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 270ee93a0f7d..8d9d20e0b02b 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -95,8 +95,10 @@ int arch_kexec_kernel_image_probe(struct kimage *image, void 
*buf, unsigned long
 int arch_kimage_file_post_load_cleanup(struct kimage *image);
 #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
 
-int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
-#define arch_kexec_locate_mem_hole arch_kexec_locate_mem_hole
+int arch_check_excluded_range(struct kimage *image, unsigned long start,
+ unsigned long end);
+#define arch_check_excluded_range  arch_check_excluded_range
+
 
 int load_crashdump_segments_ppc64(struct kimage *image,
  struct kexec_buf *kbuf);
diff --git a/arch/powerpc/kexec/file_load_64.c 
b/arch/powerpc/kexec/file_load_64.c
index dc65c1391157..e7ef8b2a2554 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -49,201 +49,18 @@ const struct kexec_file_ops * const kexec_file_loaders[] = 
{
NULL
 };
 
-/**
- * __locate_mem_hole_top_down - Looks top down for a large enough memory hole
- *  in the memory regions between buf_min & buf_max
- *  for the buffer. If found, sets kbuf->mem.
- * @kbuf:   Buffer contents and memory parameters.
- * @buf_min:Minimum address for the buffer.
- * @buf_max:Maximum address for the buffer.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int __locate_mem_hole_top_down(struct kexec_buf *kbuf,
- u64 buf_min, u64 buf_max)
-{
-   int ret = -EADDRNOTAVAIL;
-   phys_addr_t start, end;
-   u64 i;
-
-   for_each_mem_range_rev(i, &start, &end) {
-   /*
-* memblock uses [start, end) convention while it is
-* [start, end] here. Fix the off-by-one to have the
-* same convention.
-*/
-   end -= 1;
-
-   if (start > buf_max)
-   continue;
-
-   /* Memory hole not found */
-   if (end < buf_min)
-   break;
-
-   /* Adjust memory region based on the given range */
-   if (start < buf_min)
-   start = buf_min;
-   if (end > buf_max)
-   end = buf_max;
-
-   start = ALIGN(start, kbuf->buf_align);
-   if (start < end && (end - start + 1) >= kbuf->memsz) {
-   /* Suitable memory range found. Set kbuf->mem */
-   kbuf->mem = ALIGN_DOWN(end - kbuf->memsz + 1,
-  kbuf->buf_align);
-   ret = 0;
-   break;
-   }
-   }
-
-   return ret;
-}
-
-/**
- * locate_mem_hole_top_down_ppc64 - Skip special memory regions to find a
- *  suitable buffer with top down approach.
- * @kbuf:   Buffer contents and memory parameters.
- * @buf_min:Minimum address for the buffer.
- * @buf_max:Maximum address for the buffer.
- * @emem:   Exclude memory ranges.
- *
- * Returns 0 on success, negative errno on error.
- */
-static int locate_mem_hole_top_down_ppc64(struct kexec_buf *kbuf,
- u64 buf_min, u64 buf_max,
- const struct crash_mem *emem)
+int arch_check_excluded_range(struct kimage *image, unsigned long start,
+ unsigned long end)

[PATCH v3 1/7] kexec: Initialize ELF lowest address to ULONG_MAX

2025-01-31 Thread Sourabh Jain
kexec_elf_load() loads an ELF executable and sets the address of the
lowest PT_LOAD section to the address held by the lowest_load_addr
function argument.

To determine the lowest PT_LOAD address, a local variable lowest_addr
(type unsigned long) is initialized to UINT_MAX. After loading each
PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD
address is lower, lowest_addr is updated. However, setting lowest_addr
to UINT_MAX won't work when the kernel image is loaded above 4G, as the
returned lowest PT_LOAD address would be invalid. This is resolved by
initializing lowest_addr to ULONG_MAX instead.

This issue was discovered while implementing crashkernel high/low
reservation on the PowerPC architecture.

Fixes: a0458284f062 ("powerpc: Add support code for kexec_file_load()")
Cc: Andrew Morton 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Hari Bathini 
Acked-by: Baoquan He 
Signed-off-by: Sourabh Jain 
---
 kernel/kexec_elf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_elf.c b/kernel/kexec_elf.c
index d3689632e8b9..3a5c25b2adc9 100644
--- a/kernel/kexec_elf.c
+++ b/kernel/kexec_elf.c
@@ -390,7 +390,7 @@ int kexec_elf_load(struct kimage *image, struct elfhdr 
*ehdr,
 struct kexec_buf *kbuf,
 unsigned long *lowest_load_addr)
 {
-   unsigned long lowest_addr = UINT_MAX;
+   unsigned long lowest_addr = ULONG_MAX;
int ret;
size_t i;
 
-- 
2.48.1




[PATCH v3 7/7] powerpc/crash: use generic crashkernel reservation

2025-01-31 Thread Sourabh Jain
Commit 0ab97169aa05 ("crash_core: add generic function to do
reservation") added a generic function to reserve crashkernel memory.
So let's use the same function on powerpc and remove the
architecture-specific code that essentially does the same thing.

The generic crashkernel reservation also provides a way to split the
crashkernel reservation into high and low memory reservations, which can
be enabled for powerpc in the future.

Along with moving to the generic crashkernel reservation, the code
related to finding the base address for the crashkernel has been
separated into its own function name get_crash_base() for better
readability and maintainability.

Cc: Andrew Morton 
Cc: Baoquan he 
CC: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Hari Bathini 
Reviewed-by: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
 arch/powerpc/Kconfig |  3 +
 arch/powerpc/include/asm/crash_reserve.h |  8 +++
 arch/powerpc/include/asm/kexec.h |  4 +-
 arch/powerpc/kernel/prom.c   |  2 +-
 arch/powerpc/kexec/core.c| 90 ++--
 5 files changed, 53 insertions(+), 54 deletions(-)
 create mode 100644 arch/powerpc/include/asm/crash_reserve.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index db9f7b2d07bf..880d35fadf40 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -718,6 +718,9 @@ config ARCH_SUPPORTS_CRASH_HOTPLUG
def_bool y
depends on PPC64
 
+config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION
+   def_bool CRASH_RESERVE
+
 config FA_DUMP
bool "Firmware-assisted dump"
depends on CRASH_DUMP && PPC64 && (PPC_RTAS || PPC_POWERNV)
diff --git a/arch/powerpc/include/asm/crash_reserve.h 
b/arch/powerpc/include/asm/crash_reserve.h
new file mode 100644
index ..6467ce29b1fa
--- /dev/null
+++ b/arch/powerpc/include/asm/crash_reserve.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_CRASH_RESERVE_H
+#define _ASM_POWERPC_CRASH_RESERVE_H
+
+/* crash kernel regions are Page size agliged */
+#define CRASH_ALIGN PAGE_SIZE
+
+#endif /* _ASM_POWERPC_CRASH_RESERVE_H */
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 8d9d20e0b02b..5e4680f9ff35 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -115,9 +115,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void 
*fdt, struct crash_mem
 
 #ifdef CONFIG_CRASH_RESERVE
 int __init overlaps_crashkernel(unsigned long start, unsigned long size);
-extern void reserve_crashkernel(void);
+extern void arch_reserve_crashkernel(void);
 #else
-static inline void reserve_crashkernel(void) {}
+static inline void arch_reserve_crashkernel(void) {}
 static inline int overlaps_crashkernel(unsigned long start, unsigned long 
size) { return 0; }
 #endif
 
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index e0059842a1c6..9ed9dde7d231 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -860,7 +860,7 @@ void __init early_init_devtree(void *params)
 */
if (fadump_reserve_mem() == 0)
 #endif
-   reserve_crashkernel();
+   arch_reserve_crashkernel();
early_reserve_mem();
 
if (memory_limit > memblock_phys_mem_size())
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index 4945b33322ae..b21cfa814492 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -80,38 +80,20 @@ void machine_kexec(struct kimage *image)
 }
 
 #ifdef CONFIG_CRASH_RESERVE
-void __init reserve_crashkernel(void)
-{
-   unsigned long long crash_size, crash_base, total_mem_sz;
-   int ret;
 
-   total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
-   /* use common parsing */
-   ret = parse_crashkernel(boot_command_line, total_mem_sz,
-   &crash_size, &crash_base, NULL, NULL);
-   if (ret == 0 && crash_size > 0) {
-   crashk_res.start = crash_base;
-   crashk_res.end = crash_base + crash_size - 1;
-   }
-
-   if (crashk_res.end == crashk_res.start) {
-   crashk_res.start = crashk_res.end = 0;
-   return;
-   }
-
-   /* We might have got these values via the command line or the
-* device tree, either way sanitise them now. */
-
-   crash_size = resource_size(&crashk_res);
+static unsigned long long __init get_crash_base(unsigned long long crash_base)
+{
 
 #ifndef CONFIG_NONSTATIC_KERNEL
-   if (crashk_res.start != KDUMP_KERNELBASE)
+   if (crash_base != KDUMP_KERNELBASE)
printk("Crash kernel location must be 0x%x\n",
KDUMP_KERNELBASE);
 
-   crashk_res.start = KDUMP_KERNELBASE;
+   return KDUMP_KERNELBASE;
 #else
-   if (!crashk_res.start) {
+   unsigned long long crash_base_a

[PATCH v3 6/7] powerpc: insert System RAM resource to prevent crashkernel conflict

2025-01-31 Thread Sourabh Jain
The next patch in the series with title "powerpc/crash: use generic
crashkernel reservation" enables powerpc to use generic crashkernel
reservation instead of custom implementation. This leads to exporting
of `Crash Kernel` memory to iomem_resource (/proc/iomem) via
insert_crashkernel_resources():kernel/crash_reserve.c or at another
place in the same file if
HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set.

The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM`
to iomem_resource using request_resource(). This creates a conflict with
`Crash Kernel`, which is added by the generic crashkernel reservation
code. As a result, the kernel ultimately fails to add `System RAM` to
iomem_resource. Consequently, it does not appear in /proc/iomem.

There are multiple approches tried to avoid this:

1. Don't add Crash Kernel to iomem_resource:
- This has two issues.
  First, it requires adding an architecture-specific hook in the
  generic code. There are already two code paths to choose when to
  add `Crash Kernel` to iomem_resource. This adds one more code path
  to skip it.

  Second, what if `Crash Kernel` is required in /proc/iomem in the
  future? Many architectures do export it.

2. Don't add `System RAM` to iomem_resource by reverting commit
   c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"):
- It's not ideal to export `System RAM` via /proc/iomem, but since
  it already done ealier and userspace tools like kdump and
  kdump-utils rely on `System RAM` from /proc/iomem, removing it
  will break userspace.

3. Add Crash Kernel along with System RAM to /proc/iomem:

This patch takes the third approach by updating
add_system_ram_resources() to use insert_resource() instead of the
request_resource() API to add the `System RAM` resource to
iomem_resource. insert_resource() allows inserting resources even if
they overlap with existing ones. Since `Crash Kernel` and `System RAM`
resources are added to iomem_resource early in the boot, any other
conflict is not expected.

With the changes introduced here and in the next patch, "powerpc/crash:
use generic crashkernel reservation," /proc/iomem now exports
`System RAM` and `Crash Kernel` as shown below:

$ cat /proc/iomem
-3 : System RAM
  1000-4fff : Crash kernel

The kdump script is capable of handling `System RAM` and `Crash Kernel`
in the above format. The same format is used in other architectures.

Cc: Andrew Morton 
Cc: Baoquan he 
Cc: Hari Bathini 
CC: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Sourabh Jain 
---
 arch/powerpc/mm/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index c7708c8fad29..615966d71425 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -376,7 +376,7 @@ static int __init add_system_ram_resources(void)
 */
res->end = end - 1;
res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-   WARN_ON(request_resource(&iomem_resource, res) < 0);
+   WARN_ON(insert_resource(&iomem_resource, res) < 0);
}
}
 
-- 
2.48.1




[PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area

2025-01-31 Thread Sourabh Jain
Although the crashkernel area is reserved, on architectures like
PowerPC, it is possible for the crashkernel reserved area to contain
components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments
over these components, PowerPC has its own set of APIs to locate holes
in the crashkernel reserved area.

Add an arch hook in the generic locate mem hole APIs so that
architectures can handle such special regions in the crashkernel area
while locating memory holes for kexec segments using generic APIs.
With this, a lot of redundant arch-specific code can be removed, as it
performs the exact same job as the generic APIs.

To keep the generic and arch-specific changes separate, the changes
related to moving PowerPC to use the generic APIs and the removal of
PowerPC-specific APIs for memory hole allocation are done in a
subsequent patch titled "powerpc/crash: Use generic APIs to locate
memory hole for kdump.

Cc: Andrew Morton 
Cc: Baoquan he 
Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Sourabh Jain 
---
 include/linux/kexec.h |  9 +
 kernel/kexec_file.c   | 12 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index f0e9f8eda7a3..407f8b0346aa 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -205,6 +205,15 @@ static inline int 
arch_kimage_file_post_load_cleanup(struct kimage *image)
 }
 #endif
 
+#ifndef arch_check_excluded_range
+static inline int arch_check_excluded_range(struct kimage *image,
+   unsigned long start,
+   unsigned long end)
+{
+   return 0;
+}
+#endif
+
 #ifdef CONFIG_KEXEC_SIG
 #ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION
 int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len);
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 3eedb8c226ad..fba686487e3b 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -464,6 +464,12 @@ static int locate_mem_hole_top_down(unsigned long start, 
unsigned long end,
continue;
}
 
+   /* Make sure this does not conflict with exclude range */
+   if (arch_check_excluded_range(image, temp_start, temp_end)) {
+   temp_start = temp_start - PAGE_SIZE;
+   continue;
+   }
+
/* We found a suitable memory range */
break;
} while (1);
@@ -498,6 +504,12 @@ static int locate_mem_hole_bottom_up(unsigned long start, 
unsigned long end,
continue;
}
 
+   /* Make sure this does not conflict with exclude range */
+   if (arch_check_excluded_range(image, temp_start, temp_end)) {
+   temp_start = temp_start + PAGE_SIZE;
+   continue;
+   }
+
/* We found a suitable memory range */
break;
} while (1);
-- 
2.48.1




Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 06:18:54PM +1100, Paul Mackerras wrote:
> Interesting.  I looked in my copy of v2.07 (PowerISA_V2.07_PUBLIC.pdf)
> and it mentions rfscv in a couple of places, but has no description of
> scv or rfscv.  I'll change it to v3.0.

Huh, rfscv is 3.0 and later according to later versions of the ISA.


Segher



Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 06:20:31PM +1000, Nicholas Piggin wrote:
> Perfectly reasonable to not add broadcast tlbie in microwatt.

If you call "the easy way out" reasonable, then sure.  This pretty
trivial hardware addition causes so many software headaches whenn
missing, it isn't funny.  "Friends don't let friends skimp on minimal
system support features", etc. :-)


Segher



Re: [PATCH 0/5] Microwatt updates

2025-01-31 Thread Segher Boessenkool
Hi!

On Wed, Jan 29, 2025 at 09:49:49AM +1100, Paul Mackerras wrote:
> This patch series updates the kernel support for the Microwatt
> soft-core and its implementation on FPGA systems, particularly the
> Digilent Arty A7-100 FPGA development board.
> 
> Microwatt now supports almost all of the features of the SFFS (Scalar
> Fixed-poin and Floating-point Subset) compliancy subset of Power ISA
> version 3.1C, including prefixed instructions and the fixed-point hash
> (ROP mitigation) instructions.  It is also now SMP-capable, and a
> dual-core system will fit on the Arty A7-100 board.

Congratulations!

> Microwatt does not have broadcast TLB invalidations in SMP systems;

So it isn't *really* SMP.  Compare 603 vs. 604.  With enough software
(OS) trickery you can make some things work, but :-)  (There have been
many 603 multiprocessor systems as well, to draw the analogy further
than wanted :-) )

> the kernel already has code to deal with this.  One of the patches in
> this series provides a config option to allow platforms to select
> unconditionally the behaviour where cross-CPU TLB invalidations are
> handled using inter-processor interrupts.

Are there plans to broadcast the (SMP cache invalidation) messages?
Will uwatt support some real bus protocol, for example?

Again, congrats on this great milestone!  Does this floating point
support do square roots as well (aka "gpopt"; does it do "gfxopt" for
that matter, fsel?)  fsqrt is kinda tricky to get to work fully
correctly :-)


Segher



Re: [PATCH 4/5] powerpc: Define config option for processors without broadcast TLBIE

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 06:10:43PM +1100, Paul Mackerras wrote:
> > Hate to bikeshed, but would it be annoying to make this an affirmative
> > option?
> 
> I guess we'd have to make all the platforms that do have broadcast
> tlbie (and a book3s-64 MMU with radix) select that option.  Which
> would be powernv and pseries, I would think.  If that's correct then
> it's probably not too annoying.  Should I do that in v2?

Such an option would mean "this platform implements the Power
architecture correctly".  Interesting :-)


Segher



Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 09:52:09AM +1100, Paul Mackerras wrote:
> - isa = <3000>;
> + isa = <3010>;

Does this mean 3.1, or 3.01?  If the former, can this also encode 3.1C?
Should uwatt say to support that?

>   little-endian {
> - isa = <2050>;
> - usable-privilege = <3>;
> + isa = <0>;
> + usable-privilege = <7>;
> + os-support = <0>;
>   hwcap-bit-nr = <1>;
>   };

What does "isa" in this node mean?  Why is it changed to 0?  (I don't
know this node at all, I only know a property with the same name).

>   cache-inhibited-large-page {
> - isa = <2040>;
> - usable-privilege = <2>;
> + isa = <0>;
> + usable-privilege = <6>;
> + os-support = <0>;
>   };

Similar question here.

> - isa = <2010>;
> + isa = <0x00>;
>   usable-privilege = <2>;
> + os-support = <0>;
>   };

And here.  That's a quite woolly way to say "0" btw ;-)


Segher



Re: [PATCH 2/5] powerpc/microwatt: Device-tree updates

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 04:36:14PM +1000, Nicholas Piggin wrote:
> On Wed Jan 29, 2025 at 8:52 AM AEST, Paul Mackerras wrote:
> > Microwatt now implements ISA v3.1 (SFFS compliancy subset), including
> > prefixed instructions, scv/rfscv, and the FSCR, HFSCR, TAR, and CTRL
> > registers.  The privileged mode of operation is now hypervisor mode
> > and there is no privileged non-hypervisor mode; the MSR[HV] bit is
> > forced to 1.
> 
> Cool. Lots of development in microwatt.
> 
> Come to think of it we should have put a broadcast-tlbie feature
> in there and you wouldn't need the other patch. That can go on
> the todo list I guess.
> 
> system-call-vectored was available in ISA v3.0. Not that we do much
> with it at the moment IIRC, but there were dreams of wiring it in for
> compat guests. With that fixed,

This patch says 2.07 for it (if that is what 2070 mens(if that is what
2070 means).  Something's not right :-)

> > +   system-call-vectored {
> > +   isa = <2070>;
> > +   usable-privilege = <7>;
> > +   os-support = <1>;
> > +   fscr-bit-nr = <12>;
> > +   hwcap-bit-nr = <52>;
> > };


Segher



Re: [PATCH 5/9] dt-bindings: dma: Convert fsl,elo*-dma bindings to YAML

2025-01-31 Thread J . Neuschäfer
On Sun, Jan 26, 2025 at 10:47:35PM -0600, Rob Herring wrote:
> On Sun, Jan 26, 2025 at 07:59:00PM +0100, J. Neuschäfer wrote:
> > The devicetree bindings for Freescale DMA engines have so far existed as
> > a text file. This patch converts them to YAML, and specifies all the
> > compatible strings currently in use in arch/powerpc/boot/dts.
> > 
> > Signed-off-by: J. Neuschäfer 
> > ---
> >  .../devicetree/bindings/dma/fsl,elo-dma.yaml   | 129 +
> >  .../devicetree/bindings/dma/fsl,elo3-dma.yaml  | 105 +++
> >  .../devicetree/bindings/dma/fsl,eloplus-dma.yaml   | 120 
> >  .../devicetree/bindings/powerpc/fsl/dma.txt| 204 
> > -
> >  4 files changed, 354 insertions(+), 204 deletions(-)
[...]
> > +patternProperties:
> > +  "^dma-channel@.*$":
> > +type: object
> 
>additionalProperties: false

I'll add it.

> (The tools should have highlighted this)

With dtschema 2024.11 installed, "make dt_binding_check
DT_SCHEMA_FILES=fsl,elo-dma.yaml" does not highlight this.

> > +
> > +properties:
> > +  compatible:
> > +items:
> > +  - enum:
> > +  - fsl,mpc8315-dma-channel
> > +  - fsl,mpc8323-dma-channel
> > +  - fsl,mpc8347-dma-channel
> > +  - fsl,mpc8349-dma-channel
> > +  - fsl,mpc8360-dma-channel
> > +  - fsl,mpc8377-dma-channel
> > +  - fsl,mpc8378-dma-channel
> > +  - fsl,mpc8379-dma-channel
> > +  - const: fsl,elo-dma-channel
> > +
> > +  reg:
> > +maxItems: 1
> > +
> > +  cell-index:
> > +description: DMA channel index starts at 0.
> > +
> > +  interrupts: true
> 
> You have to define how many interrupts and what they are.

Will do.

(and the same for the other two files)


Best regards,
J. Neuschäfer



Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function

2025-01-31 Thread Segher Boessenkool
Hi!

> +static void microwatt_idle(void)
> +{
> + if (!prep_irq_for_idle())
> + return;
> +
> + __asm__ __volatile__ ("wait");
> +}

All asm without outputs is always implicitly volatile (if it wasn't, it
could always be transfirmed whatever way you want, like, optimised away
completely).

It can still be useful for documentation purposes of course, but here it
is the other way around really, it is just cargo cult :-(


Segher



Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 04:06:03PM +1000, Nicholas Piggin wrote:
> Does wait cause MSR[EE] to be set? If not, do you need to use
> prep_irq_for_idle_irqsoff() here maybe?

Assuming this does implement the standard ISA 2.03 wait instruction
(and it better), this does not do anything other than to stop fetching
and execution until some later event happens.

What values of the WC field does uwatt implement?


Segher



Re: [PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area

2025-01-31 Thread Baoquan he
On 01/31/25 at 05:08pm, Sourabh Jain wrote:
> Although the crashkernel area is reserved, on architectures like
> PowerPC, it is possible for the crashkernel reserved area to contain
> components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments
> over these components, PowerPC has its own set of APIs to locate holes
> in the crashkernel reserved area.
> 
> Add an arch hook in the generic locate mem hole APIs so that
> architectures can handle such special regions in the crashkernel area
> while locating memory holes for kexec segments using generic APIs.
> With this, a lot of redundant arch-specific code can be removed, as it
> performs the exact same job as the generic APIs.
> 
> To keep the generic and arch-specific changes separate, the changes
> related to moving PowerPC to use the generic APIs and the removal of
> PowerPC-specific APIs for memory hole allocation are done in a
> subsequent patch titled "powerpc/crash: Use generic APIs to locate
> memory hole for kdump.
> 
> Cc: Andrew Morton 
> Cc: Baoquan he 
> Cc: Hari Bathini 
> Cc: Madhavan Srinivasan 
> Cc: Mahesh Salgaonkar 
> Cc: Michael Ellerman 
> Cc: ke...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Sourabh Jain 
> ---
>  include/linux/kexec.h |  9 +
>  kernel/kexec_file.c   | 12 
>  2 files changed, 21 insertions(+)

LGTM,

Acked-by: Baoquan He 




Re: [PATCH v3 3/7] crash: Let arch decide usable memory range in reserved area

2025-01-31 Thread Sourabh Jain

Hello Baoquan,


On 01/02/25 09:52, Baoquan he wrote:

On 01/31/25 at 05:08pm, Sourabh Jain wrote:

Although the crashkernel area is reserved, on architectures like
PowerPC, it is possible for the crashkernel reserved area to contain
components like RTAS, TCE, OPAL, etc. To avoid placing kexec segments
over these components, PowerPC has its own set of APIs to locate holes
in the crashkernel reserved area.

Add an arch hook in the generic locate mem hole APIs so that
architectures can handle such special regions in the crashkernel area
while locating memory holes for kexec segments using generic APIs.
With this, a lot of redundant arch-specific code can be removed, as it
performs the exact same job as the generic APIs.

To keep the generic and arch-specific changes separate, the changes
related to moving PowerPC to use the generic APIs and the removal of
PowerPC-specific APIs for memory hole allocation are done in a
subsequent patch titled "powerpc/crash: Use generic APIs to locate
memory hole for kdump.

Cc: Andrew Morton 
Cc: Baoquan he 
Cc: Hari Bathini 
Cc: Madhavan Srinivasan 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Sourabh Jain 
---
  include/linux/kexec.h |  9 +
  kernel/kexec_file.c   | 12 
  2 files changed, 21 insertions(+)

LGTM,

Acked-by: Baoquan He 


Thanks for the Ack!

- Sourabh Jain




Re: [PATCH 2/2] powerpc/fadump: fix additional param memory reservation for HASH MMU

2025-01-31 Thread Hari Bathini




On 23/01/25 7:54 pm, Avnish Chouhan wrote:

On 2025-01-23 15:26, Hari Bathini wrote:

On 20/01/25 11:05 pm, Sourabh Jain wrote:

Commit 683eab94da75bc ("powerpc/fadump: setup additional parameters for
dump capture kernel") introduced the additional parameter feature in
fadump for HASH MMU with the understanding that GRUB does not use the
memory area between 640MB and 768MB for its operation.

However, the patch ("powerpc: increase MIN RMA size for CAS
negotiation") changes the MIN RMA size to 768MB, allowing GRUB to use
memory up to 768MB. This makes the fadump reservation for the additional
parameter feature for HASH MMU unreliable.

To address this, adjust the memory range for the additional parameter in
fadump for HASH MMU. This will ensure that GRUB does not overwrite the
memory reserved for fadump's additional parameter in HASH MMU.




The new policy for the memory range for the additional parameter in HASH
MMU is that the first memory block must be larger than the MIN_RMA size,
as the bootloader can use memory up to the MIN_RMA size. The range
should be between MIN_RMA and the RMA size (ppc64_rma_size), and it must
not overlap with the fadump reserved area.


IIRC, even memory above MIN_RMA is used by the bootloader except for
640MB to 768MB (assuming RMA size is >768MB). So, how does this change
guarantee that the bootloader is not using memory reserved for bootargs?

Avnish, earlier, bootloader was using RUNTIME_MIN_SPACE (128MB) starting
top-down at 768MB earlier. With MIN_RMA changed to 768MB, is bootloader
still using the concept of RUNTIME_MIN_SPACE to set aside some memory
for kernel to use. If yes, where exactly is it allocating this space
now? Also, rtas instantiates top-down at 768MB. Would that not have
a conflict with grub allocations without RUNTIME_MIN_SPACE at 768MB?

- Hari


Hi Hari,


Hi Avnish,

The RUNTIME_MIN_SPACE is the space left aside by Grub is within the 
MIN_RMA size. Grub won't use memory beyond the MIN_RMA. With this 
change, we haven't changed the RUNTIME_MIN_SPACE behavior. Grub will 
still keep the 128 MB space in MIN_RMA for loading stock kernel and initrd.


IIUC, you mean, 640MB to 768MB is not used by Grub even if MIN_RMA
is at 768MB? If that is true, this change is not needed, as fadump
could still use the memory between 640MB to 768MB, right?
Am I missing something here..

- Hari



Re: [PATCH 2/9] dt-bindings: ata: Convert fsl,pq-sata binding to YAML

2025-01-31 Thread J . Neuschäfer
On Mon, Jan 27, 2025 at 08:22:55AM +0900, Damien Le Moal wrote:
> On 1/27/25 03:58, J. Neuschäfer via B4 Relay wrote:
> > From: "J. Neuschäfer" 
> > 
> > Convert the Freescale PowerQUICC SATA controller binding from text form
> > to YAML. The list of compatible strings reflects current usage.
> > 
> > Signed-off-by: J. Neuschäfer 
> > ---
> >  .../devicetree/bindings/ata/fsl,pq-sata.yaml   | 59 
> > ++
[...]
> > +description: |
> > +  SATA nodes are defined to describe on-chip Serial ATA controllers.
> > +  Each SATA port should have its own node.
> 
> Very unclear. The SATA nodes define ports or controllers ? Normally, a single
> controller can have multiple ports, so the distinction is important.

I'll change it to "Each SATA controller ...", see below.


> > +  cell-index:
> > +$ref: /schemas/types.yaml#/definitions/uint32
> > +enum: [1, 2, 3, 4]
> > +description: |
> > +  1 for controller @ 0x18000
> > +  2 for controller @ 0x19000
> > +  3 for controller @ 0x1a000
> > +  4 for controller @ 0x1b000
> 
> Are you sure these are different controllers ? Are they not different ports of
> the same controller ? Given that the previous text description define this as
> "controller index", I suspect these are the port offsets and you SATA nodes
> define ports, and not controllers.

They have no shared registers, and each instance has the same register
set (at a different base address).

The MPC8315E reference manual (for example) documents them as:

SATA 1 Controller—Block Base Address 0x1_8000
SATA 2 Controller—Block Base Address 0x1_9000

(table A.24 Serial ATA (SATA) Controller)

Section 15.2 Command Operation implies that each SATA controller
supports a single port:

The SATA controller maintains a queue consisting of up to 16
commands. These commands can be distributed to a single attached
device or, if the system contains a port multiplier, over each
of the attached devices.

So, in conclusion, I'm fairly sure "controller" is the right description.


Best regards,
J. Neuschäfer



[PATCH v3 0/7] powerpc/crash: use generic crashkernel reservation

2025-01-31 Thread Sourabh Jain
Commit 0ab97169aa05 ("crash_core: add generic function to do
reservation") added a generic function to reserve crashkernel memory.
So let's use the same function on powerpc and remove the
architecture-specific code that essentially does the same thing.

The generic crashkernel reservation also provides a way to split the
crashkernel reservation into high and low memory reservations, which can
be enabled for powerpc in the future.

Additionally move powerpc to use generic APIs to locate memory hole for
kexec segments while loading kdump kernel.

Patch series summary:
=
Patch 1-3: generic changes
Patch 4-7: powerpc changes

Changelog:

v1:
 https://lore.kernel.org/all/20241217064613.1042866-1-sourabhj...@linux.ibm.com/

v1 Resend:
 https://lore.kernel.org/all/20250108101458.406806-1-sourabhj...@linux.ibm.com/
 - Rebased on top of 6.13-rc6

v2:
 https://lore.kernel.org/all/20250121115442.1278458-1-sourabhj...@linux.ibm.com/
 - 01/06 new patch, fixes a bug with ELF load address
   It was posted in community as individual path but
   since it align with this patch series so include here:
   
https://lore.kernel.org/all/20241210091314.185785-1-sourabhj...@linux.ibm.com/
 - Fixed a typo 06/06

 - Added Acked-by and Reviewed-by tags

 - Rebased it on 6.13

 - It was suggested to try HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY and
   see if 03/06 patch can be avoided:
   https://lore.kernel.org/all/Z35gnO2N%2FLFt1E7E@MiWiFi-R3L-srv/
   But after adding HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY the issue
   persisted and kernel prints below logs on boot:
   ...snip ...

v3:
 - Split the patch that adds the arch hook to handle special regions in
   crashkernel reserved: Now 03/07 and 04/07.

 - Dropped the patch that adds arch hooks to skip `Crash Kernel` from
   iomem_resource (/proc/iomem). Instead, a patch is added to include
   both `System RAM` and `Crash Kernel` memory ranges in iomem_resource: 06/07.

 - Rearranged the patches to ensure that generic patches come first,
   followed by PowerPC-specific patches.

 - Added Acked-by tags.

Cc: Andrew Morton 
Cc: Baoquan he 
Cc: Hari Bathini 
CC: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: sourabhj...@linux.ibm.com

Sourabh Jain (7):
  kexec: Initialize ELF lowest address to ULONG_MAX
  crash: remove an unused argument from reserve_crashkernel_generic()
  crash: Let arch decide usable memory range in reserved area
  powerpc/crash: Use generic APIs to locate memory hole for kdump
  powerpc/crash: preserve user-specified memory limit
  powerpc: insert System RAM resource to prevent crashkernel conflict
  powerpc/crash: use generic crashkernel reservation

 arch/arm64/mm/init.c |   6 +-
 arch/loongarch/kernel/setup.c|   5 +-
 arch/powerpc/Kconfig |   3 +
 arch/powerpc/include/asm/crash_reserve.h |   8 +
 arch/powerpc/include/asm/kexec.h |  10 +-
 arch/powerpc/kernel/prom.c   |   2 +-
 arch/powerpc/kexec/core.c|  96 -
 arch/powerpc/kexec/file_load_64.c| 259 +--
 arch/powerpc/mm/mem.c|   2 +-
 arch/riscv/mm/init.c |   6 +-
 arch/x86/kernel/setup.c  |   6 +-
 include/linux/crash_reserve.h|  11 +-
 include/linux/kexec.h|   9 +
 kernel/crash_reserve.c   |   9 +-
 kernel/kexec_elf.c   |   2 +-
 kernel/kexec_file.c  |  12 ++
 16 files changed, 105 insertions(+), 341 deletions(-)
 create mode 100644 arch/powerpc/include/asm/crash_reserve.h

-- 
2.48.1




[PATCH v3 5/7] powerpc/crash: preserve user-specified memory limit

2025-01-31 Thread Sourabh Jain
Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system
memory bug") fails crashkernel parsing if the crash size is found to be
higher than system RAM, which makes the memory_limit adjustment code
ineffective due to an early exit from reserve_crashkernel().

Regardless lets not violate the user-specified memory limit by
adjusting it. Remove this adjustment to ensure all reservations stay
within the limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update
the user-specified memory limit") did the same for fadump.

Cc: Andrew Morton 
Cc: Baoquan he 
CC: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: ke...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Acked-by: Hari Bathini 
Reviewed-by: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kexec/core.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index b8333a49ea5d..4945b33322ae 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -150,14 +150,6 @@ void __init reserve_crashkernel(void)
return;
}
 
-   /* Crash kernel trumps memory limit */
-   if (memory_limit && memory_limit <= crashk_res.end) {
-   memory_limit = crashk_res.end + 1;
-   total_mem_sz = memory_limit;
-   printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
-  memory_limit);
-   }
-
printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
"for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
-- 
2.48.1




Re: [PATCH 4/5] powerpc: Define config option for processors without broadcast TLBIE

2025-01-31 Thread Segher Boessenkool
On Wed, Jan 29, 2025 at 09:53:44AM +1100, Paul Mackerras wrote:
> Power ISA v3.1 implementations in the Linux Compliancy Subset and
> lower are not required to implement broadcast TLBIE, and in fact
> Microwatt doesn't.

But this pretty much means that such systems cannot be SMP systems at
all.  Implementing the necessary synchronisation using some
cobbled-together rickety jury-rigged contraption is not anyone's goal.

Interesting that you did not see any performance loss, btw!  Well you
didn't try it on anything bigger than a simple dual CPU :-)


Segher



Re: [PATCH 0/5] Microwatt updates

2025-01-31 Thread Paul Mackerras
On Fri, Jan 31, 2025 at 10:13:43AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 29, 2025 at 09:49:49AM +1100, Paul Mackerras wrote:
> > This patch series updates the kernel support for the Microwatt
> > soft-core and its implementation on FPGA systems, particularly the
> > Digilent Arty A7-100 FPGA development board.
> > 
> > Microwatt now supports almost all of the features of the SFFS (Scalar
> > Fixed-poin and Floating-point Subset) compliancy subset of Power ISA
> > version 3.1C, including prefixed instructions and the fixed-point hash
> > (ROP mitigation) instructions.  It is also now SMP-capable, and a
> > dual-core system will fit on the Arty A7-100 board.
> 
> Congratulations!

Thanks!

> > Microwatt does not have broadcast TLB invalidations in SMP systems;
> 
> So it isn't *really* SMP.  Compare 603 vs. 604.  With enough software

Actually, the term "SMP" is about latency to memory, indicating that
all CPUs have access to memory with similar latency.  It doesn't say
anything about coherency, either of memory caches or TLBs.  So yes,
Microwatt is SMP.

And for the record, the instruction and data caches are coherent,
which is what matters to user-space.  Stuff to do with the TLB is not
visible to user-space.  And the ISA explicitly says "TLBs are
non-coherent caches of the HTABs and Radix Trees" (Book III section
6.10.1).

> (OS) trickery you can make some things work, but :-)  (There have been
> many 603 multiprocessor systems as well, to draw the analogy further
> than wanted :-) )

603 was a looong time ago, I don't recall the details.

Regarding broadcast TLBIEs, the protocols and mechanisms for doing
that are known to be complex and slow in the IBM Power processors (ask
Derek Williams about that :).  Anton found that in fact doing only
local TLBIEs and using IPIs gave *better* performance on IBM Power
systems than using hardware broadcast TLBIEs in many cases (the reason
being that software knows which other CPUs might have a given TLB
entry, often quite a small set, whereas hardware doesn't, and has to
send the invalidation to every CPU and wait for a response from every
CPU).  Add to that, that most other SMP-capable CPU architectures
don't do broadcast TLB invalidations, Intel x86 for example.

> > the kernel already has code to deal with this.  One of the patches in
> > this series provides a config option to allow platforms to select
> > unconditionally the behaviour where cross-CPU TLB invalidations are
> > handled using inter-processor interrupts.
> 
> Are there plans to broadcast the (SMP cache invalidation) messages?

Cache (i.e. instruction and data cache) - yes, they *are* coherent.
More precisely, the D caches are write-through, and all I and D caches
snoop writes to memory (including DMA writes) and invalidate any cache
lines being written to.

> Will uwatt support some real bus protocol, for example?

"Real" meaning using tri-state bus drivers, like we did in the 90s? :)

> Again, congrats on this great milestone!  Does this floating point
> support do square roots as well (aka "gpopt"; does it do "gfxopt" for
> that matter, fsel?)  fsqrt is kinda tricky to get to work fully
> correctly :-)

Yes, fsqrt and fsel are implemented in hardware, and are accurate to
the last bit.  Also, the FPU handles denormalized values in hardware
(both input and output) and implements all exception handling as per
the ISA, including the trap-enabled overflow cases.  Feel free to run
whatever tests you like and report bugs.  But we're getting a bit
off-topic from the kernel patches. :)

Paul.



Re: [PATCH 3/5] powerpc/microwatt: Define an idle power-save function

2025-01-31 Thread Paul Mackerras
On Fri, Jan 31, 2025 at 10:32:55AM -0600, Segher Boessenkool wrote:
> On Wed, Jan 29, 2025 at 04:06:03PM +1000, Nicholas Piggin wrote:
> > Does wait cause MSR[EE] to be set? If not, do you need to use
> > prep_irq_for_idle_irqsoff() here maybe?
> 
> Assuming this does implement the standard ISA 2.03 wait instruction

ISA 2.03?  I don't have a copy of 2.03.  I looked in 2.04 and the wait
instruction there has a different extended opcode from the ISA 3.0/3.1
wait instruction.  Why is ISA 2.03 at all relevant to anything here?
In any case, the description of the wait instruction in 2.04 doesn't
actually say that it waits for anything other than all previous
instructions being finished.

> (and it better), this does not do anything other than to stop fetching
> and execution until some later event happens.
> 
> What values of the WC field does uwatt implement?

Just WC=0; for other values it's a no-op.  (Which is still arguably
correct given that execution is allowed to resume when an
implementation-dependent event occurs; P9 for instance just stops for
a few microseconds, if I recall correctly, for any WC value.)

Paul.



Re: [PATCH 5/9] dt-bindings: dma: Convert fsl,elo*-dma bindings to YAML

2025-01-31 Thread Rob Herring
On Fri, Jan 31, 2025 at 8:03 AM J. Neuschäfer  wrote:
>
> On Sun, Jan 26, 2025 at 10:47:35PM -0600, Rob Herring wrote:
> > On Sun, Jan 26, 2025 at 07:59:00PM +0100, J. Neuschäfer wrote:
> > > The devicetree bindings for Freescale DMA engines have so far existed as
> > > a text file. This patch converts them to YAML, and specifies all the
> > > compatible strings currently in use in arch/powerpc/boot/dts.
> > >
> > > Signed-off-by: J. Neuschäfer 
> > > ---
> > >  .../devicetree/bindings/dma/fsl,elo-dma.yaml   | 129 +
> > >  .../devicetree/bindings/dma/fsl,elo3-dma.yaml  | 105 +++
> > >  .../devicetree/bindings/dma/fsl,eloplus-dma.yaml   | 120 
> > >  .../devicetree/bindings/powerpc/fsl/dma.txt| 204 
> > > -
> > >  4 files changed, 354 insertions(+), 204 deletions(-)
> [...]
> > > +patternProperties:
> > > +  "^dma-channel@.*$":
> > > +type: object
> >
> >additionalProperties: false
>
> I'll add it.
>
> > (The tools should have highlighted this)
>
> With dtschema 2024.11 installed, "make dt_binding_check
> DT_SCHEMA_FILES=fsl,elo-dma.yaml" does not highlight this.

Actually, it's the top-level 'addtionalProperties: true' that disables
the check here. That should be false as well.

Rob