[PATCH v14 2/6] crash: make CPU and Memory hotplug support reporting flexible

2023-12-11 Thread Sourabh Jain
Architectures' specific functions `arch_crash_hotplug_cpu_support()` and
`arch_crash_hotplug_memory_support()` advertise the kernel's capability
to update the kdump image on CPU and Memory hotplug events to userspace
via the sysfs interface. These architecture-specific functions need to
access various attributes of the `kexec_crash_image` object to determine
whether the kernel can update the kdump image and advertise this
information to userspace accordingly.

As the architecture-specific code is not exposed to the APIs required to
acquire the lock for accessing the `kexec_crash_image` object, it calls
a generic function, `crash_check_update_elfcorehdr()`, to determine
whether the kernel can update the kdump image or not.

The lack of access to the `kexec_crash_image` object in
architecture-specific code restricts architectures from performing
additional architecture-specific checks required to determine if the
kdump image is updatable or not. For instance, on PowerPC, the kernel
can update the kdump image only if both the elfcorehdr and FDT are
marked as updatable for the `kexec_load` system call.

So define two generic functions, `crash_hotplug_cpu_support()` and
`crash_hotplug_memory_support()`, which are called when userspace
attempts to read the crash CPU and Memory hotplug support via the sysfs
interface. These functions take the necessary locks needed to access the
`kexec_crash_image` object and then forward it to the
architecture-specific handler to do the rest.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/x86/include/asm/kexec.h |  8 
 arch/x86/kernel/crash.c  | 20 +++-
 include/linux/kexec.h| 13 +++--
 kernel/crash_core.c  | 23 +--
 4 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 9bb6607e864e..5c88d27b086d 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -212,13 +212,13 @@ void arch_crash_handle_hotplug_event(struct kimage 
*image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void);
-#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
+int arch_crash_hotplug_cpu_support(struct kimage *image);
+#define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void);
-#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
+int arch_crash_hotplug_memory_support(struct kimage *image);
+#define arch_crash_hotplug_memory_support arch_crash_hotplug_memory_support
 #endif
 
 unsigned int arch_crash_get_elfcorehdr_size(void);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 0d7b2657beb6..ad5941665589 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -398,18 +398,28 @@ int crash_load_segments(struct kimage *image)
 #undef pr_fmt
 #define pr_fmt(fmt) "crash hp: " fmt
 
-/* These functions provide the value for the sysfs crash_hotplug nodes */
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+static int crash_hotplug_support(struct kimage *image)
+{
+   if (image->file_mode)
+   return 1;
+
+   return image->update_elfcorehdr;
+}
+#endif
+
 #ifdef CONFIG_HOTPLUG_CPU
-int arch_crash_hotplug_cpu_support(void)
+/* These functions provide the value for the sysfs crash_hotplug nodes */
+int arch_crash_hotplug_cpu_support(struct kimage *image)
 {
-   return crash_check_update_elfcorehdr();
+   return crash_hotplug_support(image);
 }
 #endif
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int arch_crash_hotplug_memory_support(void)
+int arch_crash_hotplug_memory_support(struct kimage *image)
 {
-   return crash_check_update_elfcorehdr();
+   return crash_hotplug_support(image);
 }
 #endif
 
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ee28c09a7fb0..0f6ea35879ee 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -486,16 +486,17 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, 
unsigned int pages) {
 static inline void arch_crash_handle_hotplug_event(struct kimage *image, void 
*arg) { }
 #endif
 
-int crash_check_update_elfcorehdr(void);
-
-#ifndef crash_hotplug_cpu_support
-static inline int crash_hotplug_cpu_support(void) { return 0; }
+#ifndef arch_crash_hotplug_cpu_support
+static inline int arch_crash_hotplug_cpu_sup

[PATCH v14 1/6] crash: forward memory_notify arg to arch crash hotplug handler

2023-12-11 Thread Sourabh Jain
In the event of memory hotplug or online/offline events, the crash
memory hotplug notifier `crash_memhp_notifier()` receives a
`memory_notify` object but doesn't forward that object to the
generic and architecture-specific crash hotplug handler.

The `memory_notify` object contains the starting PFN (Page Frame Number)
and the number of pages in the hot-removed memory. This information is
necessary for architectures like PowerPC to update/recreate the kdump
image, specifically `elfcorehdr`.

So update the function signature of `crash_handle_hotplug_event()` and
`arch_crash_handle_hotplug_event()` to accept the `memory_notify` object
as an argument from crash memory hotplug notifier.

Since no such object is available in the case of CPU hotplug event, the
crash CPU hotplug notifier `crash_cpuhp_online()` passes NULL to the
crash hotplug handler.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/kernel/crash.c  |  3 ++-
 include/linux/kexec.h|  2 +-
 kernel/crash_core.c  | 14 +++---
 4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index c9f6a6c5de3c..9bb6607e864e 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -208,7 +208,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage 
*image);
 extern void kdump_nmi_shootdown_cpus(void);
 
 #ifdef CONFIG_CRASH_HOTPLUG
-void arch_crash_handle_hotplug_event(struct kimage *image);
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg);
 #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index c92d88680dbf..0d7b2657beb6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -428,10 +428,11 @@ unsigned int arch_crash_get_elfcorehdr_size(void)
 /**
  * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
  * @image: a pointer to kexec_crash_image
+ * @arg: struct memory_notify handler for memory hotplug case and NULL for CPU 
hotplug case.
  *
  * Prepare the new elfcorehdr and replace the existing elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image)
+void arch_crash_handle_hotplug_event(struct kimage *image, void *arg)
 {
void *elfbuf = NULL, *old_elfcorehdr;
unsigned long nr_mem_ranges;
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8227455192b7..ee28c09a7fb0 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -483,7 +483,7 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, 
unsigned int pages) {
 #endif
 
 #ifndef arch_crash_handle_hotplug_event
-static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+static inline void arch_crash_handle_hotplug_event(struct kimage *image, void 
*arg) { }
 #endif
 
 int crash_check_update_elfcorehdr(void);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index efe87d501c8c..b9190265fe52 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -935,7 +935,7 @@ int crash_check_update_elfcorehdr(void)
  * list of segments it checks (since the elfcorehdr changes and thus
  * would require an update to purgatory itself to update the digest).
  */
-static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int 
cpu)
+static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int 
cpu, void *arg)
 {
struct kimage *image;
 
@@ -997,7 +997,7 @@ static void crash_handle_hotplug_event(unsigned int 
hp_action, unsigned int cpu)
image->hp_action = hp_action;
 
/* Now invoke arch-specific update handler */
-   arch_crash_handle_hotplug_event(image);
+   arch_crash_handle_hotplug_event(image, arg);
 
/* No longer handling a hotplug event */
image->hp_action = KEXEC_CRASH_HP_NONE;
@@ -1013,17 +1013,17 @@ static void crash_handle_hotplug_event(unsigned int 
hp_action, unsigned int cpu)
crash_hotplug_unlock();
 }
 
-static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, 
void *v)
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, 
void *arg)
 {
switch (val) {
case MEM_ONLINE:
crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
-   KEXEC_CRASH_HP_INVALID_CPU);
+   KEXEC_CRASH_HP_INVALID_CPU, arg);
break;
 
ca

[PATCH v14 4/6] powerpc/kexec: turn some static helper functions public

2023-12-11 Thread Sourabh Jain
Move the functions update_cpus_node and get_crash_memory_ranges from
kexec/file_load_64.c to kexec/core_64.c to make these functions usable
by other kexec components.

get_crash_memory_ranges uses functions defined in ranges.c, so take
ranges.c out of CONFIG_KEXEC_FILE.

Later in the series, these functions are utilized for in-kernel updates
to kdump image during CPU/Memory hotplug or online/offline events for
both kexec_load and kexec_file_load syscalls.

There is no intended functional change.

Signed-off-by: Sourabh Jain 
Reviewed-by: Laurent Dufour 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/powerpc/include/asm/kexec.h  |   6 ++
 arch/powerpc/kexec/Makefile   |   4 +-
 arch/powerpc/kexec/core_64.c  | 166 ++
 arch/powerpc/kexec/file_load_64.c | 162 -
 4 files changed, 174 insertions(+), 164 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index e1b43aa12175..562e1bb689da 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -108,6 +108,12 @@ void crash_free_reserved_phys_range(unsigned long begin, 
unsigned long end);
 #endif /* CONFIG_PPC_RTAS */
 #endif /* CONFIG_CRASH_DUMP */
 
+#ifdef CONFIG_PPC64
+struct crash_mem;
+int update_cpus_node(void *fdt);
+int get_crash_memory_ranges(struct crash_mem **mem_ranges);
+#endif /* CONFIG_PPC64 */
+
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
 
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 0c2abe7f9908..f2ed5b85b912 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -3,11 +3,11 @@
 # Makefile for the linux kernel.
 #
 
-obj-y  += core.o crash.o core_$(BITS).o
+obj-y  += core.o crash.o ranges.o core_$(BITS).o
 
 obj-$(CONFIG_PPC32)+= relocate_32.o
 
-obj-$(CONFIG_KEXEC_FILE)   += file_load.o ranges.o file_load_$(BITS).o 
elf_$(BITS).o
+obj-$(CONFIG_KEXEC_FILE)   += file_load.o file_load_$(BITS).o elf_$(BITS).o
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_core_$(BITS).o := n
diff --git a/arch/powerpc/kexec/core_64.c b/arch/powerpc/kexec/core_64.c
index 0bee7ca9a77c..9966b51d9aa8 100644
--- a/arch/powerpc/kexec/core_64.c
+++ b/arch/powerpc/kexec/core_64.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -30,6 +32,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -377,6 +381,168 @@ void default_machine_kexec(struct kimage *image)
/* NOTREACHED */
 }
 
+/**
+ * get_crash_memory_ranges - Get crash memory ranges. This list includes
+ *   first/crashing kernel's memory regions that
+ *   would be exported via an elfcore.
+ * @mem_ranges:  Range list to add the memory ranges to.
+ *
+ * Returns 0 on success, negative errno on error.
+ */
+int get_crash_memory_ranges(struct crash_mem **mem_ranges)
+{
+   phys_addr_t base, end;
+   struct crash_mem *tmem;
+   u64 i;
+   int ret;
+
+   for_each_mem_range(i, &base, &end) {
+   u64 size = end - base;
+
+   /* Skip backup memory region, which needs a separate entry */
+   if (base == BACKUP_SRC_START) {
+   if (size > BACKUP_SRC_SIZE) {
+   base = BACKUP_SRC_END + 1;
+   size -= BACKUP_SRC_SIZE;
+   } else
+   continue;
+   }
+
+   ret = add_mem_range(mem_ranges, base, size);
+   if (ret)
+   goto out;
+
+   /* Try merging adjacent ranges before reallocation attempt */
+   if ((*mem_ranges)->nr_ranges == (*mem_ranges)->max_nr_ranges)
+   sort_memory_ranges(*mem_ranges, true);
+   }
+
+   /* Reallocate memory ranges if there is no space to split ranges */
+   tmem = *mem_ranges;
+   if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) {
+   tmem = realloc_mem_ranges(mem_ranges);
+   if (!tmem)
+   goto out;
+   }
+
+   /* Exclude crashkernel region */
+   ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end);
+   if (ret)
+   goto out;
+
+   /*
+* FIXME: For now, stay in parity with kexec-tools bu

[PATCH v14 0/6] powerpc/crash: Kernel handling of CPU and memory hotplug

2023-12-11 Thread Sourabh Jain
Commit 247262756121 ("crash: add generic infrastructure for crash
hotplug support") added a generic infrastructure that allows
architectures to selectively update the kdump image component during CPU
or memory add/remove events within the kernel itself.

This patch series adds crash hotplug handler for PowerPC and enable
support to update the kdump image on CPU/Memory add/remove events.

Among the 6 patches in this series, the first three patches make changes
to the generic crash hotplug handler to assist PowerPC in adding support
for this feature. The last three patches add support for this feature.

The following section outlines the problem addressed by this patch
series, along with the current solution, its shortcomings, and the
proposed resolution.

Problem:

Due to CPU/Memory hotplug or online/offline events the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward CPU hotplug or online/offline events are referred as
CPU/Memory add/remove events.

Existing solution and its shortcoming:
==
The current solution to address the above issue involves monitoring the
CPU/memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

Proposed solution:
==
Instead of initiating a full kdump image reload from userspace on
CPU/Memory hotplug and online/offline events, the proposed solution aims
to update only the necessary kdump image component within the kernel
itself.

Git tree for testing:
=
Git tree rebased on top of v6.7-rc4:
https://github.com/sourabhjains/linux/tree/kdump-in-kernel-crash-update-v14

To realize this feature, the kdump udev rule must be updated. On RHEL,
add the following two lines at the top of the
"/usr/lib/udev/rules.d/98-kexec.rules" file.

SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

With the above change to the kdump udev rule, kdump reload is avoided
during CPU/Memory add/remove events if this feature is enabled in the
kernel.

Note: only kexec_file_load syscall will work. For kexec_load minor changes
are required in kexec tool.

Changelog:
--

v14:
  - Fix build warnings by including necessary header files
  - Rebase to v6.7-rc5

v13:
  - Fix a build warning, take ranges.c out of CONFIG_KEXEC_FILE
  - Rebase to v6.7-rc4

v12:
  - A patch to add new kexec flags to support this feature on kexec_load
system call
  - Change in the way this feature is advertise to userspace for both
kexec_load syscall
  - Rebase to v6.6-rc7

v11:
  - Rebase to v6.4-rc6
  - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been
removed. The config is now part of common configuration:
https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
  - Drop the patch that adds fdt_index attribute to struct kimage_arch
Find the fdt segment index when needed.
  - Added more details into commits messages.
  - Rebased onto 6.3.0-rc5

v9:
  - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
The patch is moved to generic patch series that introduces generic
infrastructure for in kernel crash update.
  - Removed patch to pass the hotplug action type to the arch crash
hotplug handler function. The generic patch series has introduced
the hotplug action type in kimage struct.
  - Add detail commit message for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load
it work for both kexec_load and kexec_file_load.[3/8] Laurent Dufour

  - Updated the logic to find the number of offline core. [6/8]

  - Changed the logic to find the elfcore program header to accommodate
future memory ranges due memory hotplug events. [8/8]

v7
  - added a new config to configure this feature
  - pass hotplug action type to arch specific handler

v6
  - Added crash memory hotplug support

v5:
  - Replace COFNIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs
post kexec load. Refer "[RFC v4 PATCH 4/5] powerpc/crash hp: add crash
hotplug support for ke

[PATCH v14 3/6] crash: add a new kexec flag for FDT update

2023-12-11 Thread Sourabh Jain
The commit a72bbec70da2 ("crash: hotplug support for kexec_load()")
introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses
this flag to indicate kernel that it is safe to modify the elfcorehdr
of kdump image loaded using kexec_load system call.

Similarly, add a new kexec flag, `KEXEC_UPDATE_FDT`, for another kdump
component named FDT (Flatten Device Tree). Architectures like PowerPC
need to update FDT kdump image component on CPU hotplug events. Kexec
tool passing `KEXEC_UPDATE_FDT` will be an indication to kernel that FDT
segment is not part of SHA calculation hence it is safe to update it.

With the `KEXEC_UPDATE_ELFCOREHDR` and `KEXEC_UPDATE_FDT` kexec flags,
crash hotplug support can be added to PowerPC for the kexec_load syscall
while maintaining the backward compatibility with older kexec tools that
do not have these newly introduced flags.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 include/linux/kexec.h  | 6 --
 include/uapi/linux/kexec.h | 1 +
 kernel/kexec.c | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 0f6ea35879ee..bcedb7625b1f 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -319,6 +319,7 @@ struct kimage {
 #ifdef CONFIG_CRASH_HOTPLUG
/* If set, allow changes to elfcorehdr of kexec_load'd image */
unsigned int update_elfcorehdr:1;
+   unsigned int update_fdt:1;
 #endif
 
 #ifdef ARCH_HAS_KIMAGE_ARCH
@@ -396,9 +397,10 @@ bool kexec_load_permitted(int kexec_image_type);
 
 /* List of defined/legal kexec flags */
 #ifndef CONFIG_KEXEC_JUMP
-#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
+#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR | 
KEXEC_UPDATE_FDT)
 #else
-#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | 
KEXEC_UPDATE_ELFCOREHDR)
+#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | 
KEXEC_UPDATE_ELFCOREHDR | \
+   KEXEC_UPDATE_FDT)
 #endif
 
 /* List of defined/legal kexec file flags */
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 01766dd839b0..3d5b3d757bed 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -13,6 +13,7 @@
 #define KEXEC_ON_CRASH 0x0001
 #define KEXEC_PRESERVE_CONTEXT 0x0002
 #define KEXEC_UPDATE_ELFCOREHDR0x0004
+#define KEXEC_UPDATE_FDT   0x0008
 #define KEXEC_ARCH_MASK0x
 
 /*
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 8f35a5a42af8..97eb151cd931 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -132,6 +132,8 @@ static int do_kexec_load(unsigned long entry, unsigned long 
nr_segments,
 #ifdef CONFIG_CRASH_HOTPLUG
if (flags & KEXEC_UPDATE_ELFCOREHDR)
image->update_elfcorehdr = 1;
+   if (flags & KEXEC_UPDATE_FDT)
+   image->update_fdt = 1;
 #endif
 
ret = machine_kexec_prepare(image);
-- 
2.41.0



[PATCH v14 5/6] powerpc: add crash CPU hotplug support

2023-12-11 Thread Sourabh Jain
Due to CPU/Memory hotplug or online/offline events the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.

Going forward CPU hotplug or online/offlice events are referred as
CPU/Memory add/remvoe events.

The current solution to address the above issue involves monitoring the
CPU/memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.

To address the aforementioned issue, commit 247262756121 ("crash: add
generic infrastructure for crash hotplug support") added a generic
infrastructure that allows architectures to selectively update the kdump
image component during CPU or memory add/remove events within the kernel
itself.

In the event of a CPU or memory add/remove event, the generic crash
hotplug event handler, `crash_handle_hotplug_event()`, is triggered. It
then acquires the necessary locks to update the kdump image and invokes
the architecture-specific crash hotplug handler,
`arch_crash_handle_hotplug_event()`, to update the required kdump image
components.

This patch adds crash hotplug handler for PowerPC and enable support to
update the kdump image on CPU add/remove events. Support for memory
add/remove events is added in a subsequent patch with the title
"powerpc: add crash memory hotplug support."

As mentioned earlier, only the elfcorehdr and FDT kdump image components
need to be updated in the event of CPU or memory add/remove events.
However, the PowerPC architecture crash hotplug handler only updates the
FDT to enable crash hotplug support for CPU add/remove events. Here's
why.

The Elfcorehdr on PowerPC is built with possible CPUs, and thus, it does
not need an update on CPU add/remove events. On the other hand, the FDT
needs to be updated on CPU add events to include the newly added CPU. If
the FDT is not updated and the kernel crashes on a newly added CPU, the
kdump kernel will fail to boot due to the unavailability of the crashing
CPU in the FDT. During the early boot, it is expected that the boot CPU
must be a part of the FDT; otherwise, the kernel will raise a BUG and
fail to boot. For more information, refer to commit 36ae37e3436b0
("powerpc: Make boot_cpuid common between 32 and 64-bit"). Since it is
okay to have an offline CPU in the kdump FDT, no action is taken in case
of CPU removal.

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. Few changes have been made to ensure kernel can
safely update the kdump FDT for both system calls.

For kexec_file_load syscall the kdump image is prepared in kernel. So to
support an increasing number of CPUs, the FDT is constructed with extra
buffer space to ensure it can accommodate a possible number of CPU
nodes. Additionally, a call to fdt_pack (which trims the unused space
once the FDT is prepared) is avoided for kdump image loading if this
feature is enabled.

For the kexec_load syscall, the FDT is updated only if both the
KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to
the kernel by the kexec tool. Passing these flags to the kernel
indicates that the FDT is built to accommodate possible CPUs, and the
FDT segment is not considered for SHA calculation, making it safe to
update the FDT.

Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
added a sysfs interface to indicate userspace (kdump udev rule) that
kernel will update the kdump image on CPU hotplug events, so kdump
reload can be avoided. Implement arch specific function
`arch_crash_hotplug_cpu_support()` to correctly advertise kernel
capability to update kdump image.

This feature is advertised to userspace when the following conditions
are met:

1. Kdump image is loaded using kexec_file_load system call.
2. Kdump image is loaded using kexec_load system and both
   KEXEC_UPATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are
   passed to kernel.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc:

[PATCH v14 6/6] powerpc: add crash memory hotplug support

2023-12-11 Thread Sourabh Jain
Extend the arch crash hotplug handler, as introduced by the patch title
("powerpc: add crash CPU hotplug support"), to also support memory
add/remove events.

Elfcorehdr describes the memory of the crash kernel to capture the
kernel; hence, it needs to be updated if memory resources change due to
memory add/remove events. Therefore, arch_crash_handle_hotplug_event()
is updated to recreate the elfcorehdr and replace it with the previous
one on memory add/remove events.

The memblock list is used to prepare the elfcorehdr. In the case of
memory hot removal, the memblock list is updated after the arch crash
hotplug handler is triggered, as depicted in Figure 1. Thus, the
hot-removed memory is explicitly removed from the crash memory ranges
to ensure that the memory ranges added to elfcorehdr do not include the
hot-removed memory.

Memory remove
  |
  v
Offline pages
  |
  v
 Initiate memory notify call <> crash hotplug handler
 chain for MEM_OFFLINE event
  |
  v
 Update memblock list

Figure 1

There are two system calls, `kexec_file_load` and `kexec_load`, used to
load the kdump image. A few changes have been made to ensure that the
kernel can safely update the elfcorehdr component of the kdump image for
both system calls.

For the kexec_file_load syscall, kdump image is prepared in the kernel.
To support an increasing number of memory regions, the elfcorehdr is
built with extra buffer space to ensure that it can accommodate
additional memory ranges in future.

For the kexec_load syscall, the elfcorehdr is updated only if both the
KEXEC_UPDATE_FDT and KEXEC_UPDATE_ELFCOREHDR kexec flags are passed to
the kernel by the kexec tool. Passing these flags to the kernel
indicates that the elfcorehdr is built to accommodate additional memory
ranges and the elfcorehdr segment is not considered for SHA calculation,
making it safe to update it.

Commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes")
added a sysfs interface to indicate userspace (kdump udev rule) that
kernel will update the kdump image on memory hotplug events, so kdump
reload can be avoided. Implement arch specific function
`arch_crash_hotplug_memory_support()` to correctly advertise kernel
capability to update kdump image.

This feature is advertised to usersapce when following conditions are
met:

1. Kdump image is loaded using kexec_file_load system call.
2. Kdump image is loaded using kexec_load system and both
   KEXEC_UPATE_ELFCOREHDR and KEXEC_UPDATE_FDT kexec flags are passed
   to kernel.

The changes related to this feature are kept under the CRASH_HOTPLUG
config, and it is enabled by default.

Signed-off-by: Sourabh Jain 
Cc: Akhil Raj 
Cc: Andrew Morton 
Cc: Aneesh Kumar K.V 
Cc: Baoquan He 
Cc: Borislav Petkov (AMD) 
Cc: Boris Ostrovsky 
Cc: Christophe Leroy 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: David Hildenbrand 
Cc: Eric DeVolder 
Cc: Greg Kroah-Hartman 
Cc: Hari Bathini 
Cc: Laurent Dufour 
Cc: Mahesh Salgaonkar 
Cc: Michael Ellerman 
Cc: Mimi Zohar 
Cc: Naveen N Rao 
Cc: Oscar Salvador 
Cc: Thomas Gleixner 
Cc: Valentin Schneider 
Cc: Vivek Goyal 
Cc: ke...@lists.infradead.org
Cc: x...@kernel.org
---
 arch/powerpc/include/asm/kexec.h|   8 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c| 126 ++--
 arch/powerpc/kexec/file_load_64.c   |  34 ++-
 arch/powerpc/kexec/ranges.c |  85 
 5 files changed, 245 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 7823ab10d323..c8d6cfda523c 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -122,6 +122,14 @@ int arch_crash_hotplug_cpu_support(struct kimage *image);
 #define arch_crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
 #endif
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+int arch_crash_hotplug_memory_support(struct kimage *image);
+#define arch_crash_hotplug_memory_support arch_crash_hotplug_memory_support
+#endif
+
+unsigned int arch_crash_get_elfcorehdr_size(void);
+#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
+
 #endif /*CONFIG_CRASH_HOTPLUG */
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/include/asm/kexec_ranges.h 
b/arch/powerpc/include/asm/kexec_ranges.h
index f83866a19e87..802abf580cf0 100644
--- a/arch/powerpc/include/asm/kexec_ranges.h
+++ b/arch/powerpc/include/asm/kexec_ranges.h
@@ -7,6 +7,7 @@
 void sort_memory_ranges(struct crash_mem *mrngs, bool merge);
 struct crash_mem *realloc_mem_ranges(struct crash_mem **mem_ranges);
 int add_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
+int remove_mem_range(struct crash_mem **mem_ranges, u64 base, u64 size);
 int add_tce_mem_ranges(struct crash_mem **mem_ranges);
 int add_initrd_mem_range(struct crash_mem **mem_ranges);
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kexec/core_64.c b/arch/power

Re: [PATCH 1/1] powerpc/debug: implement HAVE_USER_RETURN_NOTIFIER

2023-12-11 Thread Michael Ellerman
Luming Yu  writes:

> The support for user return notifier infrastructure
> is hooked into powerpc architecture.
> ---
>  arch/powerpc/Kconfig|  1 +
>  arch/powerpc/include/asm/entry-common.h | 16 
>  arch/powerpc/include/asm/thread_info.h  |  2 ++
>  arch/powerpc/kernel/process.c   |  2 ++
>  4 files changed, 21 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/entry-common.h
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index c10229c0243c..b968068cc04a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -277,6 +277,7 @@ config PPC
>   select HAVE_STACKPROTECTOR  if PPC64 && 
> $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>   select HAVE_STATIC_CALL if PPC32
>   select HAVE_SYSCALL_TRACEPOINTS
> + select HAVE_USER_RETURN_NOTIFIER
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HAVE_VIRT_CPU_ACCOUNTING_GEN
>   select HOTPLUG_SMT  if HOTPLUG_CPU
> diff --git a/arch/powerpc/include/asm/entry-common.h 
> b/arch/powerpc/include/asm/entry-common.h
> new file mode 100644
> index ..51f1eb767696
> --- /dev/null
> +++ b/arch/powerpc/include/asm/entry-common.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef ARCH_POWERPC_ENTRY_COMMON_H
> +#define ARCH_POWERPC_ENTRY_COMMON_H
> +
> +#include 
> +
> +static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
> +   unsigned long ti_work)
> +{
> + if (ti_work & _TIF_USER_RETURN_NOTIFY)
> + fire_user_return_notifiers();
> +}
> +
> +#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare

AFAICS this is never called because we don't use CONFIG_GENERIC_ENTRY ?

cheers


Re: [PATCH v4 0/5] powerpc/smp: Topology and shared processor optimizations

2023-12-11 Thread Michael Ellerman
Srikar Dronamraju  writes:
> * Srikar Dronamraju  [2023-11-09 11:19:28]:
>
> Hi Michael,
>
>> PowerVM systems configured in shared processors mode have some unique
>> challenges. Some device-tree properties will be missing on a shared
>> processor. Hence some sched domains may not make sense for shared processor
>> systems.
>> 
>> 
>
> Did you get a chance to look at this patchset?
> Do you see this getting pulled into your merge branch?
> I havent seen any comments that requires a change from the current patchset.

I was assuming you'd send another version to at least incorporate the
clarifications you posted.

And I wasn't really sure the discussion about the prink_once() change
was resolved. Anyway I'll check with Aneesh.

cheers


Re: [PATCH 1/2] powerpc/locking: implement this_cpu_cmpxchg local API

2023-12-11 Thread Michael Ellerman
Hi Luming Yu,

Luming Yu  writes:
> ppc appears to have already supported cmpxchg-local atomic semantics
> that is defined by the kernel convention of the feature.
> Add this_cpu_cmpxchg ppc local for the performance benefit of arch
> sepcific implementation than asm-generic c verison of the locking API.

Implementing this has been suggested before but it wasn't clear that it
was actually going to perform better than the generic version.

On 64-bit we have interrupt soft masking, so disabling interrupts is
relatively cheap. So the generic this_cpu_cmpxhg using irq disable just
becomes a few loads & stores, no atomic ops required.

In contrast implementing it using __cmpxchg_local() will use
ldarx/stdcx etc. which will be more expensive.

Have you done any performance measurements?

It probably is a bit fishy that we don't mask PMU interrupts vs
this_cpu_cmpxchg() etc., but I don't think it's actually a bug given the
few places using this_cpu_cmpxchg(). Though I could be wrong about that.

> diff --git a/arch/powerpc/include/asm/percpu.h 
> b/arch/powerpc/include/asm/percpu.h
> index 8e5b7d0b851c..ceab5df6e7ab 100644
> --- a/arch/powerpc/include/asm/percpu.h
> +++ b/arch/powerpc/include/asm/percpu.h
> @@ -18,5 +18,22 @@
>  #include 
>  
>  #include 
> +#include 
> +#ifdef this_cpu_cmpxchg_1
> +#undef this_cpu_cmpxchg_1
 
If we need to undef then I think something has gone wrong with the
header inclusion order, the model should be that the arch defines what
it has and the generic code provides fallbacks if the arch didn't define
anything.

> +#define this_cpu_cmpxchg_1(pcp, oval, nval)  
> __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 1)

I think this is unsafe vs preemption. The raw_cpu_ptr() can generate the
per-cpu address, but then the task can be preempted and moved to a
different CPU before the ldarx/stdcx do the cmpxchg.

The arm64 implementation had the same bug until they fixed it.

> +#endif 
> +#ifdef this_cpu_cmpxchg_2
> +#undef this_cpu_cmpxchg_2
> +#define this_cpu_cmpxchg_2(pcp, oval, nval)  
> __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 2)
> +#endif
> +#ifdef this_cpu_cmpxchg_4
> +#undef this_cpu_cmpxchg_4
> +#define this_cpu_cmpxchg_4(pcp, oval, nval)  
> __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 4)
> +#endif
> +#ifdef this_cpu_cmpxchg_8
> +#undef this_cpu_cmpxchg_8
> +#define this_cpu_cmpxchg_8(pcp, oval, nval)  
> __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 8)
> +#endif
>  
>  #endif /* _ASM_POWERPC_PERCPU_H_ */

cheers


Re: [RFC PATCH 10/12] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Michael Ellerman
Hi Samuel,

Thanks for trying to clean all this up.

One problem below.

Samuel Holland  writes:
> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
>
> Signed-off-by: Samuel Holland 
...
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 4ae4720535a5..b64f917174ca 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -87,20 +78,9 @@ void dc_fpu_begin(const char *function_name, const int 
> line)
>   WARN_ON_ONCE(!in_task());
>   preempt_disable();
>   depth = __this_cpu_inc_return(fpu_recursion_depth);
> -
>   if (depth == 1) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> + BUG_ON(!kernel_fpu_available());
>   kernel_fpu_begin();
> -#elif defined(CONFIG_PPC64)
> - if (cpu_has_feature(CPU_FTR_VSX_COMP))
> - enable_kernel_vsx();
> - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
> - enable_kernel_altivec();
 
Note altivec.

> - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> - enable_kernel_fp();
> -#elif defined(CONFIG_ARM64)
> - kernel_neon_begin();
> -#endif
>   }
>  
>   TRACE_DCN_FPU(true, function_name, line, depth);
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile 
> b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index ea7d60f9a9b4..5aad0f572ba3 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -25,40 +25,8 @@
>  # It provides the general basic services required by other DAL
>  # subcomponents.
>  
> -ifdef CONFIG_X86
> -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
> -dml_ccflags := $(dml_ccflags-y) -msse
> -endif
> -
> -ifdef CONFIG_PPC64
> -dml_ccflags := -mhard-float -maltivec
> -endif

And altivec is enabled in the flags there.

That doesn't match your implementation for powerpc in patch 7, which
only deals with float.

I suspect the AMD driver actually doesn't need altivec enabled, but I
don't know that for sure. It compiles without it, but I don't have a GPU
to actually test. I've added Timothy on Cc who added the support for
powerpc to the driver originally, hopefully he has a test system.

Anyway if that's true that it doesn't need altivec we should probably do
a lead-up patch that drops altivec from the AMD driver explicitly, eg.
as below.

cheers


diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP))
-   enable_kernel_vsx();
-   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-   enable_kernel_altivec();
-   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-   if (cpu_has_feature(CPU_FTR_VSX_COMP))
-   disable_kernel_vsx();
-   else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-   disable_kernel_altivec();
-   else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+   if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile 
b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 6042a5a6a44f..554c39024a40 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile 
b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) 

[PATCH AUTOSEL 6.6 12/47] ASoC: fsl_xcvr: Enable 2 * TX bit clock for spdif only case

2023-12-11 Thread Sasha Levin
From: Shengjiu Wang 

[ Upstream commit c33fd110424dfcb544cf55a1b312f43fe1918235 ]

The bit 10 in TX_DPTH_CTRL register controls the TX clock rate.
If this bit is set, TX datapath clock should be = 2* TX bit rate.
If this bit is not set, TX datapath clock should be 10* TX bit rate.

As the spdif only case, we always use 2 * TX bit clock, so
this bit need to be set.

Signed-off-by: Shengjiu Wang 
Link: 
https://lore.kernel.org/r/1700617373-6472-1-git-send-email-shengjiu.w...@nxp.com
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_xcvr.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index fa0a15263c66d..77f8e2394bf93 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -414,6 +414,16 @@ static int fsl_xcvr_prepare(struct snd_pcm_substream 
*substream,
 
switch (xcvr->mode) {
case FSL_XCVR_MODE_SPDIF:
+   if (xcvr->soc_data->spdif_only && tx) {
+   ret = regmap_update_bits(xcvr->regmap, 
FSL_XCVR_TX_DPTH_CTRL_SET,
+
FSL_XCVR_TX_DPTH_CTRL_BYPASS_FEM,
+
FSL_XCVR_TX_DPTH_CTRL_BYPASS_FEM);
+   if (ret < 0) {
+   dev_err(dai->dev, "Failed to set bypass fem: 
%d\n", ret);
+   return ret;
+   }
+   }
+   fallthrough;
case FSL_XCVR_MODE_ARC:
if (tx) {
ret = fsl_xcvr_en_aud_pll(xcvr, fout);
-- 
2.42.0



[PATCH AUTOSEL 6.6 15/47] ASoC: fsl_xcvr: refine the requested phy clock frequency

2023-12-11 Thread Sasha Levin
From: Shengjiu Wang 

[ Upstream commit 347ecf29a68cc8958fbcbd26ef410d07fe9d82f4 ]

As the input phy clock frequency will divided by 2 by default
on i.MX8MP with the implementation of clk-imx8mp-audiomix driver,
So the requested frequency need to be updated.

The relation of phy clock is:
sai_pll_ref_sel
   sai_pll
  sai_pll_bypass
 sai_pll_out
sai_pll_out_div2
   earc_phy_cg

Signed-off-by: Shengjiu Wang 
Reviewed-by: Iuliana Prodan 
Link: 
https://lore.kernel.org/r/1700702093-8008-1-git-send-email-shengjiu.w...@nxp.com
Signed-off-by: Mark Brown 
Signed-off-by: Sasha Levin 
---
 sound/soc/fsl/fsl_xcvr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index 77f8e2394bf93..f0fb33d719c25 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -358,7 +358,7 @@ static int fsl_xcvr_en_aud_pll(struct fsl_xcvr *xcvr, u32 
freq)
struct device *dev = &xcvr->pdev->dev;
int ret;
 
-   freq = xcvr->soc_data->spdif_only ? freq / 10 : freq;
+   freq = xcvr->soc_data->spdif_only ? freq / 5 : freq;
clk_disable_unprepare(xcvr->phy_clk);
ret = clk_set_rate(xcvr->phy_clk, freq);
if (ret < 0) {
@@ -409,7 +409,7 @@ static int fsl_xcvr_prepare(struct snd_pcm_substream 
*substream,
bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
u32 m_ctl = 0, v_ctl = 0;
u32 r = substream->runtime->rate, ch = substream->runtime->channels;
-   u32 fout = 32 * r * ch * 10 * 2;
+   u32 fout = 32 * r * ch * 10;
int ret = 0;
 
switch (xcvr->mode) {
-- 
2.42.0



Re: [RFC PATCH 01/12] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 02/12] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
> --- /dev/null
> +++ b/arch/arm/include/asm/fpu.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * linux/arch/arm/include/asm/fpu.h

Please don't add the file name to top of the file comments.  It serves
no purpose and easily gets out of date.

Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 03/12] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:33PM -0800, Samuel Holland wrote:
> Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
> tree, use it instead of duplicating the flags here.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 04/12] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
> + * linux/arch/arm64/include/asm/fpu.h

Same comment as for arm here.  Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
> +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)

Btw, do we even really need the extra variables for compiler flags
to remove?  Don't gcc/clang options work so that if you add a
no-prefixed version of the option later it transparently gets removed?

Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 06/12] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:36PM -0800, Samuel Holland wrote:
> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
> the CFLAGS adjustments.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 07/12] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:37PM -0800, Samuel Holland wrote:
> PowerPC provides an equivalent to the common kernel-mode FPU API, but in
> a different header and using different function names. The PowerPC API
> also requires a non-preemptible context. Add a wrapper header, and
> export the CFLAGS adjustments.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 08/12] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 09/12] riscv: Add support for kernel-mode FPU

2023-12-11 Thread Christoph Hellwig
> +#ifdef __riscv_f
> +
> +#define kernel_fpu_begin() \
> + static_assert(false, "floating-point code must use a separate 
> translation unit")
> +#define kernel_fpu_end() kernel_fpu_begin()
> +
> +#else
> +
> +void kernel_fpu_begin(void);
> +void kernel_fpu_end(void);
> +
> +#endif

I'll assume this is related to trick that places code in a separate
translation unit, but I fail to understand it.  Can you add a comment
explaining it?



Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Samuel Holland
On 2023-12-11 10:07 AM, Christoph Hellwig wrote:
>> +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
> 
> Btw, do we even really need the extra variables for compiler flags
> to remove?  Don't gcc/clang options work so that if you add a
> no-prefixed version of the option later it transparently gets removed?

Unfortunately, not all of the relevant options can be no-prefixed:

$ cat float.c 
int main(void) { volatile float f = 123.456; return f / 10; }
$ aarch64-linux-musl-gcc float.c 
$ aarch64-linux-musl-gcc -mgeneral-regs-only float.c 
float.c: In function 'main':
float.c:1:33: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
1 | int main(void) { volatile float f = 123.456; return f / 10; }
  | ^
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
1 | int main(void) { volatile float f = 123.456; return f / 10; }
  | ~~^~~~
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of 
floating-point types
$ aarch64-linux-musl-gcc -mgeneral-regs-only -mno-general-regs-only float.c 
aarch64-linux-musl-gcc: error: unrecognized command-line option 
'-mno-general-regs-only'; did you mean '-mgeneral-regs-only'?
$ 



Re: [RFC PATCH 09/12] riscv: Add support for kernel-mode FPU

2023-12-11 Thread Samuel Holland
On 2023-12-11 10:11 AM, Christoph Hellwig wrote:
>> +#ifdef __riscv_f
>> +
>> +#define kernel_fpu_begin() \
>> +static_assert(false, "floating-point code must use a separate 
>> translation unit")
>> +#define kernel_fpu_end() kernel_fpu_begin()
>> +
>> +#else
>> +
>> +void kernel_fpu_begin(void);
>> +void kernel_fpu_end(void);
>> +
>> +#endif
> 
> I'll assume this is related to trick that places code in a separate
> translation unit, but I fail to understand it.  Can you add a comment
> explaining it?

Yes, I can add a comment. Here, __riscv_f refers to RISC-V's F extension for
single-precision floating point, which is enabled by CC_FLAGS_FPU.



Re: [RFC PATCH 11/12] selftests/fpu: Move FP code to a separate translation unit

2023-12-11 Thread Christoph Hellwig
>  obj-$(CONFIG_TEST_FPU) += test_fpu.o
> -CFLAGS_test_fpu.o += $(FPU_CFLAGS)
> +test_fpu-y := test_fpu_glue.o test_fpu_impl.o
> +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)

Btw, I really wonder if having a

modname-fpu += foo.o

syntax in kbuild wouldn't be preferable to this.  Of coure that requires
someone who understands kbuild inside out.

> +int test_fpu(void);

This needs to go into a header.

And I think I underatand your way to enforce the use of a separate
compilation unit in the riscv patch now.

Can we just make that generic, e.g. have a  that wraps
 that does the guard based on a
-D_LINUX_FPU_COMPILATION_UNIT=1 on the command line so that all the
code becomes fully portable?  Any legacy arch specific fpu users not
using  would not be affected by it, although it would be
great to eventually migrate them to the common scheme.



Re: [RFC PATCH 12/12] selftests/fpu: Allow building on other architectures

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v6 4/4] PCI: layerscape: Add suspend/resume for ls1043a

2023-12-11 Thread Frank Li
On Mon, Dec 04, 2023 at 11:08:29AM -0500, Frank Li wrote:
> Add suspend/resume support for Layerscape LS1043a.
> 
> In the suspend path, PME_Turn_Off message is sent to the endpoint to
> transition the link to L2/L3_Ready state. In this SoC, there is no way to
> check if the controller has received the PME_To_Ack from the endpoint or
> not. So to be on the safer side, the driver just waits for
> PCIE_PME_TO_L2_TIMEOUT_US before asserting the SoC specific PMXMTTURNOFF
> bit to complete the PME_Turn_Off handshake. Then the link would enter L2/L3
> state depending on the VAUX supply.
> 
> In the resume path, the link is brought back from L2 to L0 by doing a
> software reset.
> 
> Acked-by: Roy Zang 
> Reviewed-by: Manivannan Sadhasivam 
> Signed-off-by: Frank Li 

@Lorenzo:

   Could you please pick up these patches? Mani already reviewed and only
impact layerscape platform.

Frank

> ---
> 
> Notes:
> Chagne from v5 to v6
> - none
> Change from v4 to v5
> - update commit message
> - use comments
> /* Reset the PEX wrapper to bring the link out of L2 */
> 
> Change from v3 to v4
> - Call scfg_pcie_send_turnoff_msg() shared with ls1021a
> - update commit message
> 
> Change from v2 to v3
> - Remove ls_pcie_lut_readl(writel) function
> 
> Change from v1 to v2
> - Update subject 'a' to 'A'
> 
>  drivers/pci/controller/dwc/pci-layerscape.c | 63 -
>  1 file changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
> b/drivers/pci/controller/dwc/pci-layerscape.c
> index f3dfb70066fb7..7cdada200de7e 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape.c
> @@ -41,6 +41,15 @@
>  #define SCFG_PEXSFTRSTCR 0x190
>  #define PEXSR(idx)   BIT(idx)
>  
> +/* LS1043A PEX PME control register */
> +#define SCFG_PEXPMECR0x144
> +#define PEXPME(idx)  BIT(31 - (idx) * 4)
> +
> +/* LS1043A PEX LUT debug register */
> +#define LS_PCIE_LDBG 0x7fc
> +#define LDBG_SR  BIT(30)
> +#define LDBG_WE  BIT(31)
> +
>  #define PCIE_IATU_NUM6
>  
>  struct ls_pcie_drvdata {
> @@ -224,6 +233,45 @@ static int ls1021a_pcie_exit_from_l2(struct dw_pcie_rp 
> *pp)
>   return scfg_pcie_exit_from_l2(pcie->scfg, SCFG_PEXSFTRSTCR, 
> PEXSR(pcie->index));
>  }
>  
> +static void ls1043a_pcie_send_turnoff_msg(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> +
> + scfg_pcie_send_turnoff_msg(pcie->scfg, SCFG_PEXPMECR, 
> PEXPME(pcie->index));
> +}
> +
> +static int ls1043a_pcie_exit_from_l2(struct dw_pcie_rp *pp)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> + struct ls_pcie *pcie = to_ls_pcie(pci);
> + u32 val;
> +
> + /*
> +  * Reset the PEX wrapper to bring the link out of L2.
> +  * LDBG_WE: allows the user to have write access to the PEXDBG[SR] for 
> both setting and
> +  *  clearing the soft reset on the PEX module.
> +  * LDBG_SR: When SR is set to 1, the PEX module enters soft reset.
> +  */
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val |= LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_SR;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + val = ls_pcie_pf_lut_readl(pcie, LS_PCIE_LDBG);
> + val &= ~LDBG_WE;
> + ls_pcie_pf_lut_writel(pcie, LS_PCIE_LDBG, val);
> +
> + return 0;
> +}
> +
>  static const struct dw_pcie_host_ops ls_pcie_host_ops = {
>   .host_init = ls_pcie_host_init,
>   .pme_turn_off = ls_pcie_send_turnoff_msg,
> @@ -241,6 +289,19 @@ static const struct ls_pcie_drvdata ls1021a_drvdata = {
>   .exit_from_l2 = ls1021a_pcie_exit_from_l2,
>  };
>  
> +static const struct dw_pcie_host_ops ls1043a_pcie_host_ops = {
> + .host_init = ls_pcie_host_init,
> + .pme_turn_off = ls1043a_pcie_send_turnoff_msg,
> +};
> +
> +static const struct ls_pcie_drvdata ls1043a_drvdata = {
> + .pf_lut_off = 0x1,
> + .pm_support = true,
> + .scfg_support = true,
> + .ops = &ls1043a_pcie_host_ops,
> + .exit_from_l2 = ls1043a_pcie_exit_from_l2,
> +};
> +
>  static const struct ls_pcie_drvdata layerscape_drvdata = {
>   .pf_lut_off = 0xc,
>   .pm_support = true,
> @@ -252,7 +313,7 @@ static const struct of_device_id ls_pcie_of_match[] = {
>   { .compatible = "fsl,ls1012a-pcie", .data = &layerscape_drvdata },
>   { .compatible = "fsl,ls1021a-pcie", .data = &ls1021a_drvdata },
>   { .compatible = "fsl,ls1028a-pcie", .data = &layerscape_drvdata },
> - { .compatible = "fsl,ls1043a-pcie", .data = &ls1021a_drvdata },

[PATCH v3 18/35] powerpc: optimize arch code by using atomic find_bit() API

2023-12-11 Thread Yury Norov
Use find_and_{set,clear}_bit() where appropriate and simplify the logic.

Signed-off-by: Yury Norov 
---
 arch/powerpc/mm/book3s32/mmu_context.c | 10 ++---
 arch/powerpc/platforms/pasemi/dma_lib.c| 45 +-
 arch/powerpc/platforms/powernv/pci-sriov.c | 12 ++
 3 files changed, 17 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu_context.c 
b/arch/powerpc/mm/book3s32/mmu_context.c
index 1922f9a6b058..7db19f173c2e 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -50,13 +50,11 @@ static unsigned long context_map[LAST_CONTEXT / 
BITS_PER_LONG + 1];
 
 unsigned long __init_new_context(void)
 {
-   unsigned long ctx = next_mmu_context;
+   unsigned long ctx;
 
-   while (test_and_set_bit(ctx, context_map)) {
-   ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-   if (ctx > LAST_CONTEXT)
-   ctx = 0;
-   }
+   ctx = find_and_set_next_bit(context_map, LAST_CONTEXT + 1, 
next_mmu_context);
+   if (ctx > LAST_CONTEXT)
+   ctx = 0;
next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 
return ctx;
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c 
b/arch/powerpc/platforms/pasemi/dma_lib.c
index 1be1f18f6f09..906dabee0132 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -118,14 +118,9 @@ static int pasemi_alloc_tx_chan(enum pasemi_dmachan_type 
type)
limit = MAX_TXCH;
break;
}
-retry:
-   bit = find_next_bit(txch_free, MAX_TXCH, start);
-   if (bit >= limit)
-   return -ENOSPC;
-   if (!test_and_clear_bit(bit, txch_free))
-   goto retry;
-
-   return bit;
+
+   bit = find_and_clear_next_bit(txch_free, MAX_TXCH, start);
+   return bit < limit ? bit : -ENOSPC;
 }
 
 static void pasemi_free_tx_chan(int chan)
@@ -136,15 +131,9 @@ static void pasemi_free_tx_chan(int chan)
 
 static int pasemi_alloc_rx_chan(void)
 {
-   int bit;
-retry:
-   bit = find_first_bit(rxch_free, MAX_RXCH);
-   if (bit >= MAX_TXCH)
-   return -ENOSPC;
-   if (!test_and_clear_bit(bit, rxch_free))
-   goto retry;
-
-   return bit;
+   int bit = find_and_clear_bit(rxch_free, MAX_RXCH);
+
+   return bit < MAX_TXCH ? bit : -ENOSPC;
 }
 
 static void pasemi_free_rx_chan(int chan)
@@ -374,16 +363,9 @@ EXPORT_SYMBOL(pasemi_dma_free_buf);
  */
 int pasemi_dma_alloc_flag(void)
 {
-   int bit;
+   int bit = find_and_clear_bit(flags_free, MAX_FLAGS);
 
-retry:
-   bit = find_first_bit(flags_free, MAX_FLAGS);
-   if (bit >= MAX_FLAGS)
-   return -ENOSPC;
-   if (!test_and_clear_bit(bit, flags_free))
-   goto retry;
-
-   return bit;
+   return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_flag);
 
@@ -439,16 +421,9 @@ EXPORT_SYMBOL(pasemi_dma_clear_flag);
  */
 int pasemi_dma_alloc_fun(void)
 {
-   int bit;
-
-retry:
-   bit = find_first_bit(fun_free, MAX_FLAGS);
-   if (bit >= MAX_FLAGS)
-   return -ENOSPC;
-   if (!test_and_clear_bit(bit, fun_free))
-   goto retry;
+   int bit = find_and_clear_bit(fun_free, MAX_FLAGS);
 
-   return bit;
+   return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_fun);
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c 
b/arch/powerpc/platforms/powernv/pci-sriov.c
index 59882da3e742..640e387e6d83 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -397,18 +397,12 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb 
*phb,
 
 static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
 {
-   int win;
+   int win = find_and_set_bit(&phb->ioda.m64_bar_alloc, 
phb->ioda.m64_bar_idx + 1);
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   return -1;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   if (win >= phb->ioda.m64_bar_idx + 1)
+   return -1;
 
set_bit(win, iov->used_m64_bar_mask);
-
return win;
 }
 
-- 
2.40.1



[PATCH v3 00/35] bitops: add atomic find_bit() operations

2023-12-11 Thread Yury Norov
Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

for (idx = 0; idx < nbits; idx++)
if (test_and_clear_bit(idx, bitmap))
do_something(idx);

Or like this:

do {
bit = find_first_bit(bitmap, nbits);
if (bit >= nbits)
return nbits;
} while (!test_and_clear_bit(bit, bitmap));
return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

for_each_test_and_clear_bit(idx, bitmap, nbits)
do_something(idx);

Or:
return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

  for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
  val = addr[idx];
  if (val) {
  sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
  break;
  }
  }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

In some cases concurrent operations with plain find_bit() are acceptable.
For example:

 - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
   value, because underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In last 2 cases find_bit() may not return a correct bit number, but
it may be OK if caller requires any (not exactly the first) set or clear
bit, correspondingly.

In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency people should carefully review their
code and likely protect critical sections or switch to atomic
find_and_bit(), as appropriate.

The 1st patch of the series adds the following atomic primitives:

find_and_set_bit(addr, nbits);
find_and_set_next_bit(addr, nbits, start);
...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

In [1] Jan reported 2% slowdown in a single-thread search test when
switching find_bit() function to treat bitmaps as volatile arrays. On
the other hand, kernel robot in the same thread reported +3.7% to the
performance of will-it-scale.per_thread_ops test.

Assuming that our compilers are sane and generate better code against
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_bit(), and vice-versa.

So, all users of find_bit() API, where heavy concurrency is expected,
are encouraged to switch to atomic find_and_bit() as appropriate.

The 1st patch of this series adds atomic find_and_bit() API, 2nd adds
a basic test for new API, and all the following patches spread it over
the kernel.

They can be applied separately from each other on per-subsystems basis,
or I can pull them in bitmap tree, as appropriate.

[1] 
https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21...@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615

---
v1: 
https://lore.kernel.org/netdev/2023111

[PATCH v3 02/35] lib/find: add test for atomic find_bit() ops

2023-12-11 Thread Yury Norov
Add basic functionality test for new API.

Signed-off-by: Yury Norov 
---
 lib/test_bitmap.c | 61 +++
 1 file changed, 61 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 65f22c2578b0..277e1ca9fd28 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -221,6 +221,65 @@ static void __init test_zero_clear(void)
expect_eq_pbl("", bmap, 1024);
 }
 
+static void __init test_find_and_bit(void)
+{
+   unsigned long w, w_part, bit, cnt = 0;
+   DECLARE_BITMAP(bmap, EXP1_IN_BITS);
+
+   /*
+* Test find_and_clear{_next}_bit() and corresponding
+* iterators
+*/
+   bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+   w = bitmap_weight(bmap, EXP1_IN_BITS);
+
+   for_each_test_and_clear_bit(bit, bmap, EXP1_IN_BITS)
+   cnt++;
+
+   expect_eq_uint(w, cnt);
+   expect_eq_uint(0, bitmap_weight(bmap, EXP1_IN_BITS));
+
+   bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+   w = bitmap_weight(bmap, EXP1_IN_BITS);
+   w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+
+   cnt = 0;
+   bit = EXP1_IN_BITS / 3;
+   for_each_test_and_clear_bit_from(bit, bmap, EXP1_IN_BITS)
+   cnt++;
+
+   expect_eq_uint(bitmap_weight(bmap, EXP1_IN_BITS), bitmap_weight(bmap, 
EXP1_IN_BITS / 3));
+   expect_eq_uint(w_part, bitmap_weight(bmap, EXP1_IN_BITS));
+   expect_eq_uint(w - w_part, cnt);
+
+   /*
+* Test find_and_set{_next}_bit() and corresponding
+* iterators
+*/
+   bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+   w = bitmap_weight(bmap, EXP1_IN_BITS);
+   cnt = 0;
+
+   for_each_test_and_set_bit(bit, bmap, EXP1_IN_BITS)
+   cnt++;
+
+   expect_eq_uint(EXP1_IN_BITS - w, cnt);
+   expect_eq_uint(EXP1_IN_BITS, bitmap_weight(bmap, EXP1_IN_BITS));
+
+   bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+   w = bitmap_weight(bmap, EXP1_IN_BITS);
+   w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+   cnt = 0;
+
+   bit = EXP1_IN_BITS / 3;
+   for_each_test_and_set_bit_from(bit, bmap, EXP1_IN_BITS)
+   cnt++;
+
+   expect_eq_uint(EXP1_IN_BITS - bitmap_weight(bmap, EXP1_IN_BITS),
+   EXP1_IN_BITS / 3 - bitmap_weight(bmap, EXP1_IN_BITS / 
3));
+   expect_eq_uint(EXP1_IN_BITS * 2 / 3 - (w - w_part), cnt);
+}
+
 static void __init test_find_nth_bit(void)
 {
unsigned long b, bit, cnt = 0;
@@ -1273,6 +1332,8 @@ static void __init selftest(void)
test_for_each_clear_bitrange_from();
test_for_each_set_clump8();
test_for_each_set_bit_wrap();
+
+   test_find_and_bit();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.40.1



[PATCH v3 01/35] lib/find: add atomic find_bit() primitives

2023-12-11 Thread Yury Norov
Add helpers around test_and_{set,clear}_bit() that allow to search for
clear or set bits and flip them atomically.

The target patterns may look like this:

for (idx = 0; idx < nbits; idx++)
if (test_and_clear_bit(idx, bitmap))
do_something(idx);

Or like this:

do {
bit = find_first_bit(bitmap, nbits);
if (bit >= nbits)
return nbits;
} while (!test_and_clear_bit(bit, bitmap));
return bit;

In both cases, the opencoded loop may be converted to a single function
or iterator call. Correspondingly:

for_each_test_and_clear_bit(idx, bitmap, nbits)
do_something(idx);

Or:
return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themself, the
less probability to make a mistake.

Those are not only handy helpers but also resolve a non-trivial
issue of using non-atomic find_bit() together with atomic
test_and_{set,clear)_bit().

The trick is that find_bit() implies that the bitmap is a regular
non-volatile piece of memory, and compiler is allowed to use such
optimization techniques like re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like this:

  for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
  val = addr[idx];
  if (val) {
  sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
  break;
  }
  }

On register-memory architectures, like x86, compiler may decide to
access memory twice - first time to compare against 0, and second time
to fetch its value to pass it to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in-between, and for instance, it may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

Now that KCSAN is catching exactly this type of situations and warns on
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

In some cases concurrent operations with plain find_bit() are acceptable.
For example:

 - two threads running find_*_bit(): safe wrt ffs(0) and returns correct
   value, because underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In last 2 cases find_bit() may not return a correct bit number, but
it may be OK if caller requires any (not exactly the first) set or clear
bit, correspondingly.

In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency people should carefully review their
code and likely protect critical sections or switch to atomic
find_and_bit(), as appropriate.

The 1st patch of the series adds the following atomic primitives:

find_and_set_bit(addr, nbits);
find_and_set_next_bit(addr, nbits, start);
...

Here find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

CC: Bart Van Assche 
CC: Sergey Shtylyov 
Signed-off-by: Yury Norov 
---
 include/linux/find.h | 293 +++
 lib/find_bit.c   |  85 +
 2 files changed, 378 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5e4f39ef2e72..237513356ffa 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -32,6 +32,16 @@ extern unsigned long _find_first_and_bit(const unsigned long 
*addr1,
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned 
long size);
 extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long 
size);
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long 
nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned 
long nbits,
+   unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned 
long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, 
unsigned long nbits,
+  

Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
On Mon, Dec 11, 2023 at 10:12:27AM -0600, Samuel Holland wrote:
> On 2023-12-11 10:07 AM, Christoph Hellwig wrote:
> 
> Unfortunately, not all of the relevant options can be no-prefixed:

Ok.  That is another good argument for having the obj-fpu += syntax
I proposed.  You might need help from the kbuild maintainers from that
as trying to understand the kbuild magic isn't something I'd expect
from a normal contributor (including myself..).



Re: [PATCH 3/3] drm/amd/display: Support DRM_AMD_DC_FP on RISC-V

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 10:49:53PM -0600, Samuel Holland wrote:
> Actually tracking all possibly-FPU-tainted functions and their call sites is
> probably possible, but a much larger task.

I think objtool should be able to do that reasonably easily, it already
does it for checking section where userspace address access is enabled
or not, which is very similar.