[PATCH V2 1/9] powerpc/pseries/htmdump: Add htm_hcall_wrapper to integrate other htm operations

2025-03-21 Thread Athira Rajeev
The H_HTM (Hardware Trace Macro) hypervisor call is an HCALL that exports
data from the Hardware Trace Macro (HTM) function. The debugfs interface
for HTM in an lpar currently supports only dumping HTM data. To add
support for setup, configuration and control of the HTM function via the
debugfs interface, generalize the hcall wrapper: rename
htm_get_dump_hardware() to htm_hcall_wrapper() and add an "htm_op"
parameter so that it can be used for other HTM operations as well. Also
move the hcall return-code check in the htmdump module into a separate
function, htm_return_check(), so that it can be reused by the other
operations.
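
As a rough illustration of the contract documented for htm_return_check() in this patch, here is a user-space sketch. The SK_* enum values and the sketch_* functions are hypothetical stand-ins: the real hypervisor return codes live in the kernel's hvcall.h and their numeric values are not reproduced. How unknown codes are handled is also an assumption, since the patch's own fallback branch is not visible in this excerpt.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the hcall return codes named in the patch. */
enum sk_hcall_rc {
	SK_SUCCESS,       /* H_SUCCESS */
	SK_PARTIAL,       /* H_PARTIAL: not all data fit in the buffer */
	SK_NOT_AVAILABLE, /* H_NOT_AVAILABLE */
	SK_BUSY,          /* hypervisor reported busy */
	SK_BAD_PARAMETER, /* invalid parameter or operation */
	SK_NO_VET_CODE,   /* HTM Virtualization Engine Technology code not applied */
	SK_BAD_STATE,     /* HTM state not valid for the request */
};

/* 1 = success/partial, 0 = not available, negative errno otherwise. */
static long sketch_htm_return_check(enum sk_hcall_rc rc)
{
	switch (rc) {
	case SK_SUCCESS:
	case SK_PARTIAL:
		return 1;
	case SK_NOT_AVAILABLE:
		return 0;
	case SK_BUSY:
		return -EBUSY;
	case SK_BAD_PARAMETER:
		return -EINVAL;
	case SK_NO_VET_CODE:
		return -EPERM;
	case SK_BAD_STATE:
		return -EIO;
	}
	return -EIO; /* assumption: unknown codes treated as an invalid state */
}

/* Caller pattern used by each debugfs handler: stop on 0 or an error. */
static long sketch_handler(enum sk_hcall_rc rc)
{
	long ret = sketch_htm_return_check(rc);

	if (ret <= 0)
		return ret;
	return 0; /* operation succeeded; cached state would be updated here */
}
```

Returning a positive value for success keeps 0 free to mean "nothing available", which is why every caller tests `ret <= 0` rather than `ret < 0`.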

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/plpar_wrappers.h | 18 +---
 arch/powerpc/platforms/pseries/htmdump.c  | 55 +--
 2 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 91be7b885944..f3efa9946b3c 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -65,6 +65,14 @@ static inline long register_dtl(unsigned long cpu, unsigned long vpa)
return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+/*
+ * Invokes H_HTM hcall with parameters passed from htm_hcall_wrapper.
+ * flags: Set to hardwareTarget.
+ * target: Specifies target using node index, nodal chip index and core index.
+ * operation : action to perform ie configure, start, stop, deconfigure, trace
+ * based on the HTM type.
+ * param1, param2, param3: parameters for each action.
+ */
 static inline long htm_call(unsigned long flags, unsigned long target,
unsigned long operation, unsigned long param1,
unsigned long param2, unsigned long param3)
@@ -73,17 +81,17 @@ static inline long htm_call(unsigned long flags, unsigned long target,
  param1, param2, param3);
 }
 
-static inline long htm_get_dump_hardware(unsigned long nodeindex,
+static inline long htm_hcall_wrapper(unsigned long nodeindex,
unsigned long nodalchipindex, unsigned long coreindexonchip,
-   unsigned long type, unsigned long addr, unsigned long size,
-   unsigned long offset)
+  unsigned long type, unsigned long htm_op, unsigned long param1, unsigned long param2,
+  unsigned long param3)
 {
return htm_call(H_HTM_FLAGS_HARDWARE_TARGET,
H_HTM_TARGET_NODE_INDEX(nodeindex) |
H_HTM_TARGET_NODAL_CHIP_INDEX(nodalchipindex) |
H_HTM_TARGET_CORE_INDEX_ON_CHIP(coreindexonchip),
-   H_HTM_OP(H_HTM_OP_DUMP_DATA) | H_HTM_TYPE(type),
-   addr, size, offset);
+  H_HTM_OP(htm_op) | H_HTM_TYPE(type),
+  param1, param2, param3);
 }
 
 extern void vpa_init(int cpu);
diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index 57fc1700f604..f13a5570446c 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -18,20 +18,19 @@ static u32 coreindexonchip;
 static u32 htmtype;
 static struct dentry *htmdump_debugfs_dir;
 
-static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
-size_t count, loff_t *ppos)
+/*
+ * Check the return code for H_HTM hcall.
+ * Return non-zero value (1) if either H_PARTIAL or H_SUCCESS
+ * is returned. For other return codes:
+ * Return zero if H_NOT_AVAILABLE.
+ * Return -EBUSY if hcall return busy.
+ * Return -EINVAL if any parameter or operation is not valid.
+ * Return -EPERM if HTM Virtualization Engine Technology code
+ * is not applied.
+ * Return -EIO if the HTM state is not valid.
+ */
+static ssize_t htm_return_check(long rc)
 {
-   void *htm_buf = filp->private_data;
-   unsigned long page, read_size, available;
-   loff_t offset;
-   long rc;
-
-   page = ALIGN_DOWN(*ppos, PAGE_SIZE);
-   offset = (*ppos) % PAGE_SIZE;
-
-   rc = htm_get_dump_hardware(nodeindex, nodalchipindex, coreindexonchip,
-  htmtype, virt_to_phys(htm_buf), PAGE_SIZE, page);
-
switch (rc) {
case H_SUCCESS:
/* H_PARTIAL for the case where all available data can't be
@@ -65,6 +64,36 @@ static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
return -EPERM;
}
 
+   /*
+* Return 1 for H_SUCCESS/H_PARTIAL
+*/
+   return 1;
+}
+
+static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
+size_t count, loff_t *ppos)
+{
+   void *htm_buf = filp->private_data;
+   unsigned long page, read_size, available;
+   loff_t offset;
+   long rc, ret;
+
+   page = ALIGN_DOWN(*ppos, PAGE_SIZE);
+   offset = (*ppos) % PAGE_SIZE;
+
+   /*
+* Invoke H_HTM call with
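
The read path above splits the file position into a page-aligned offset, which is passed to the hcall, and an offset within that page. A minimal user-space sketch of that arithmetic follows. SKETCH_PAGE_SIZE, split_pos() and chunk_len() are hypothetical; a 4 KiB page is assumed only for readability (pseries kernels are commonly built with 64 KiB pages), and chunk_len() is an assumption about how the elided remainder of htmdump_read() bounds each copy.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration only. */
#define SKETCH_PAGE_SIZE 4096ULL

struct dump_window {
	uint64_t page;   /* page-aligned offset handed to the hcall */
	uint64_t offset; /* position inside that page to copy from */
};

static struct dump_window split_pos(uint64_t ppos)
{
	struct dump_window w;

	/* Equivalent to ALIGN_DOWN(ppos, PAGE_SIZE) for a power-of-2 size. */
	w.page = ppos & ~(SKETCH_PAGE_SIZE - 1);
	w.offset = ppos % SKETCH_PAGE_SIZE;
	return w;
}

/* Bytes one read can return without crossing into the next page. */
static uint64_t chunk_len(uint64_t ppos, uint64_t count)
{
	uint64_t left_in_page = SKETCH_PAGE_SIZE - split_pos(ppos).offset;

	return count < left_in_page ? count : left_in_page;
}
```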

[PATCH V2 0/9] Add support for configure and control of Hardware Trace Macro(HTM)

2025-03-21 Thread Athira Rajeev
The H_HTM (Hardware Trace Macro) hypervisor call is an HCALL that exports
data from the Hardware Trace Macro (HTM) function. The debugfs interface
for HTM in a partition currently supports only dumping HTM data. This
patchset adds support for configuring and controlling the HTM function
via the debugfs interface.

With this patchset applied, the following files are present after
loading the htmdump module:

ls /sys/kernel/debug/powerpc/htmdump/
  coreindexonchip  htmcaps  htmconfigure  htmflags  htminfo  htmsetup
  htmstart  htmstatus  htmtype  nodalchipindex  nodeindex  trace

- nodeindex, nodalchipindex, coreindexonchip: specify which node,
  chip and core to configure the HTM for.
- htmtype: specifies the type of HTM. The supported target is
  hardwareTarget.
- trace: reads the HTM data.
- htmconfigure: configures/deconfigures the HTM. Writing 1 to the
  file configures the trace; writing 0 deconfigures it.
- htmstart: starts/stops the HTM. Writing 1 to the file starts
  tracing; writing 0 stops it.
- htmstatus: gets the status of the HTM. This is needed to understand
  the HTM state after each operation.
- htmsetup: sets the HTM buffer size, given as a power of 2 (the
  value written is the exponent).
- htminfo: provides the system processor configuration details.
  This is needed to determine appropriate values for nodeindex,
  nodalchipindex and coreindexonchip.
- htmcaps: provides the HTM capabilities, such as the minimum/maximum
  buffer size and the kinds of tracing the HTM supports.
- htmflags: passes flags to the hcall. Currently it supports
  controlling the wrapping of the HTM buffer.

Example usage:
To collect HTM traces for the core identified by nodeindex 0,
nodalchipindex 1 and coreindexonchip 12:

  # cd /sys/kernel/debug/powerpc/htmdump/
  # echo 2 > htmtype
  # echo 0 > nodeindex
  # echo 1 > nodalchipindex
  # echo 12 > coreindexonchip
  # echo 1 > htmflags # to set noWrap for HTM buffers
  # echo 1 > htmconfigure # Configure the HTM
  # echo 1 > htmstart # Start the HTM
  # echo 0 > htmstart # Stop the HTM
  # echo 0 > htmconfigure # Deconfigure the HTM
  # cat htmstatus # Dump the status of HTM entries as data
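
For scripted collection, the same sequence can be driven from C. This is a sketch only: htm_write(), struct htm_step and htm_run_steps() are hypothetical helpers, the values mirror the example above, and a real tool would also run the workload while tracing is active and read the trace file, which this sketch omits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Debugfs directory listed above; pass it as "dir" on a real system. */
#define HTMDUMP_DIR "/sys/kernel/debug/powerpc/htmdump"

/* Hypothetical helper: write "<val>\n" to <dir>/<name>, like `echo val > name`. */
static int htm_write(const char *dir, const char *name, unsigned long long val)
{
	char path[512];
	FILE *f;
	int len, ok;

	len = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", val) > 0;
	/* The buffered data is written (and the kernel handler runs) when the
	 * stream is flushed, so an error from the handler surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

struct htm_step {
	const char *file;
	unsigned long long val;
};

/* Same order and values as the example usage above. */
static const struct htm_step htm_collect_steps[] = {
	{ "htmtype", 2 },
	{ "nodeindex", 0 },
	{ "nodalchipindex", 1 },
	{ "coreindexonchip", 12 },
	{ "htmflags", 1 },     /* noWrap */
	{ "htmconfigure", 1 },
	{ "htmstart", 1 },
	/* a real tool would run the workload here */
	{ "htmstart", 0 },
	{ "htmconfigure", 0 },
};

/* Returns the index of the first step that failed, or -1 if all succeeded. */
static int htm_run_steps(const char *dir, const struct htm_step *steps, int n)
{
	for (int i = 0; i < n; i++)
		if (htm_write(dir, steps[i].file, steps[i].val) != 0)
			return i;
	return -1;
}
```

On a real system this needs root and a mounted debugfs, with HTMDUMP_DIR passed as `dir`; the returned step index tells which write the kernel rejected.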

Changelog:
V2: Venkat reported that patch 7 failed to apply on powerpc-next.
Fixed that in V2.

Athira Rajeev (9):
  powerpc/pseries/htmdump: Add htm_hcall_wrapper to integrate other htm
operations
  powerpc/pseries/htmdump: Add htm configure support to htmdump module
  powerpc/pseries/htmdump: Add htm start support to htmdump module
  powerpc/pseries/htmdump: Add htm status support to htmdump module
  powerpc/pseries/htmdump: Add htm info support to htmdump module
  powerpc/pseries/htmdump: Add htm setup support to htmdump module
  powerpc/pseries/htmdump: Add htm flags support to htmdump module
  powerpc/pseries/htmdump: Add htm capabilities support to htmdump
module
  powerpc/pseries/htmdump: Add documentation for H_HTM debugfs interface

 Documentation/arch/powerpc/htm.rst| 104 ++
 arch/powerpc/include/asm/plpar_wrappers.h |  20 +-
 arch/powerpc/platforms/pseries/htmdump.c  | 370 +-
 3 files changed, 475 insertions(+), 19 deletions(-)
 create mode 100644 Documentation/arch/powerpc/htm.rst

-- 
2.43.5




[PATCH V2 4/9] powerpc/pseries/htmdump: Add htm status support to htmdump module

2025-03-21 Thread Athira Rajeev
Support dumping the status of the Hardware Trace Macro (HTM) function
via the debugfs interface. Under the debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add the file "htmstatus".
The file is read-only and presents the content of the HTM status
buffer returned by the hcall. Offset 16 (0x10) of the HTM status
buffer holds the number of HTM entries in the status buffer. Each
nest HTM status entry is 0x6 bytes, whereas each core HTM status
entry is 0x8 bytes. Calculate the number of bytes to read based on
this detail.
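
The size calculation described above can be sketched in user space as follows. The macro and function names are hypothetical; the header size, count offset and entry sizes are taken from the patch. Unlike htmstatus_read(), the sketch clamps the result to the buffer length, a safeguard the patch itself does not have: it passes the hypervisor-derived count straight to simple_read_from_buffer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTM_STATUS_HDR_BYTES  32   /* fixed header ("first 7 fields") */
#define HTM_STATUS_NUM_OFFSET 0x10 /* big-endian u64 entry count */
#define HTM_STATUS_CORE_ENTRY 8    /* entry size when htmtype == 2 (core) */
#define HTM_STATUS_NEST_ENTRY 6    /* entry size for nest HTM */

/* Decode a big-endian 64-bit value byte by byte. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Bytes of the status buffer worth returning, never more than buf_len. */
static size_t htm_status_bytes(const unsigned char *buf, size_t buf_len,
			       unsigned int htmtype)
{
	uint64_t entries, entry_size;

	if (buf_len < HTM_STATUS_HDR_BYTES)
		return 0;
	entries = load_be64(buf + HTM_STATUS_NUM_OFFSET);
	entry_size = (htmtype == 2) ? HTM_STATUS_CORE_ENTRY : HTM_STATUS_NEST_ENTRY;

	/* A count that would not fit (or would overflow) is clamped. */
	if (entries > (buf_len - HTM_STATUS_HDR_BYTES) / entry_size)
		return buf_len;
	return (size_t)(HTM_STATUS_HDR_BYTES + entries * entry_size);
}
```

For example, three core entries give 32 + 3 * 8 = 56 bytes, and three nest entries give 32 + 3 * 6 = 50 bytes.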

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/platforms/pseries/htmdump.c | 55 
 1 file changed, 55 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index 9210645ec3d3..ef530c092ae4 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -12,6 +12,7 @@
 #include 
 
 static void *htm_buf;
+static void *htm_status_buf;
 static u32 nodeindex;
 static u32 nodalchipindex;
 static u32 coreindexonchip;
@@ -197,6 +198,51 @@ static int htmstart_get(void *data, u64 *val)
return 0;
 }
 
+static ssize_t htmstatus_read(struct file *filp, char __user *ubuf,
+size_t count, loff_t *ppos)
+{
+   void *htm_status_buf = filp->private_data;
+   long rc, ret;
+   u64 *num_entries;
+   u64 to_copy;
+   int htmstatus_flag;
+
+   /*
+* Invoke H_HTM call with:
+* - operation as htm status (H_HTM_OP_STATUS)
+* - last three values as addr, size and offset
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, H_HTM_OP_STATUS, virt_to_phys(htm_status_buf),
+  PAGE_SIZE, 0);
+
+   ret = htm_return_check(rc);
+   if (ret <= 0)
+   return ret;
+
+   /*
+* HTM status buffer, start of buffer + 0x10 gives the
+* number of HTM entries in the buffer. Each nest htm status
+* entry is 0x6 bytes whereas each core htm status entry is
+* 0x8 bytes.
+* So total count to copy is:
+* 32 bytes (for first 7 fields) + (number of HTM entries * entry size)
+*/
+   num_entries = htm_status_buf + 0x10;
+   if (htmtype == 0x2)
+   htmstatus_flag = 0x8;
+   else
+   htmstatus_flag = 0x6;
+   to_copy = 32 + (be64_to_cpu(*num_entries) * htmstatus_flag);
+   return simple_read_from_buffer(ubuf, count, ppos, htm_status_buf, to_copy);
+}
+
+static const struct file_operations htmstatus_fops = {
+   .llseek = NULL,
+   .read   = htmstatus_read,
+   .open   = simple_open,
+};
+
 DEFINE_SIMPLE_ATTRIBUTE(htmconfigure_fops, htmconfigure_get, htmconfigure_set, "%llu\n");
 DEFINE_SIMPLE_ATTRIBUTE(htmstart_fops, htmstart_get, htmstart_set, "%llu\n");
 
@@ -227,6 +273,15 @@ static int htmdump_init_debugfs(void)
debugfs_create_file("htmconfigure", 0600, htmdump_debugfs_dir, NULL, &htmconfigure_fops);
debugfs_create_file("htmstart", 0600, htmdump_debugfs_dir, NULL, &htmstart_fops);
 
+   /* Debugfs interface file to present status of HTM */
+   htm_status_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   if (!htm_status_buf) {
+   pr_err("Failed to allocate htmstatus buf\n");
+   return -ENOMEM;
+   }
+
+   debugfs_create_file("htmstatus", 0400, htmdump_debugfs_dir, htm_status_buf, &htmstatus_fops);
+
return 0;
 }
 
-- 
2.43.5




[PATCH V2 2/9] powerpc/pseries/htmdump: Add htm configure support to htmdump module

2025-03-21 Thread Athira Rajeev
Support configuring the Hardware Trace Macro (HTM) function via the
debugfs interface. Under the debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add the file "htmconfigure".
Writing "1" to this file configures the HTM and writing "0"
deconfigures it. Any other value returns -EINVAL.
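
The htmconfigure_set() handler in this patch follows a pattern shared by the other write handlers: reject values other than 0 and 1, issue the hcall, and record the new value only once the call succeeds. A user-space sketch of that pattern follows; the hcall plus htm_return_check() step is replaced by a hypothetical callback (htm_op_fn), and sketch_htmconfigure, op_ok() and op_busy() are likewise illustrative names, so failures can be injected.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for "hcall + htm_return_check()":
 * returns 1 on success, 0 if not available, negative errno on error. */
typedef long (*htm_op_fn)(int configure);

static uint32_t sketch_htmconfigure; /* last value that succeeded */

static int sketch_htmconfigure_set(htm_op_fn op, uint64_t val)
{
	long ret;

	if (val > 1)
		return -EINVAL;      /* only 0 and 1 are accepted */

	ret = op(val == 1);      /* 1: configure, 0: deconfigure */
	if (ret <= 0)
		return (int)ret;     /* 0 or -errno is passed back unchanged */

	sketch_htmconfigure = (uint32_t)val; /* update only on success */
	return 0;
}

static long op_ok(int configure)   { (void)configure; return 1; }
static long op_busy(int configure) { (void)configure; return -EBUSY; }
```

As in the patch, a check result of 0 (H_NOT_AVAILABLE) is returned as-is, so the write reports success to user space even though the cached value is left unchanged.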

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/platforms/pseries/htmdump.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index f13a5570446c..c623836f7054 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -16,6 +16,7 @@ static u32 nodeindex;
 static u32 nodalchipindex;
 static u32 coreindexonchip;
 static u32 htmtype;
+static u32 htmconfigure;
 static struct dentry *htmdump_debugfs_dir;
 
 /*
@@ -106,6 +107,52 @@ static const struct file_operations htmdump_fops = {
.open   = simple_open,
 };
 
+static int  htmconfigure_set(void *data, u64 val)
+{
+   long rc, ret;
+
+   /*
+* value as 1 : configure HTM.
+* value as 0 : deconfigure HTM. Return -EINVAL for
+* other values.
+*/
+   if (val == 1) {
+   /*
+* Invoke H_HTM call with:
+* - operation as htm configure (H_HTM_OP_CONFIGURE)
+* - last three values are unused, hence set to zero
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, H_HTM_OP_CONFIGURE, 0, 0, 0);
+   } else if (val == 0) {
+   /*
+* Invoke H_HTM call with:
+* - operation as htm deconfigure (H_HTM_OP_DECONFIGURE)
+* - last three values are unused, hence set to zero
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   htmtype, H_HTM_OP_DECONFIGURE, 0, 0, 0);
+   } else
+   return -EINVAL;
+
+   ret = htm_return_check(rc);
+   if (ret <= 0)
+   return ret;
+
+   /* Set htmconfigure if operation succeeds */
+   htmconfigure = val;
+
+   return 0;
+}
+
+static int htmconfigure_get(void *data, u64 *val)
+{
+   *val = htmconfigure;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(htmconfigure_fops, htmconfigure_get, htmconfigure_set, "%llu\n");
+
 static int htmdump_init_debugfs(void)
 {
htm_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
@@ -127,6 +174,11 @@ static int htmdump_init_debugfs(void)
htmdump_debugfs_dir, &htmtype);
debugfs_create_file("trace", 0400, htmdump_debugfs_dir, htm_buf, &htmdump_fops);
 
+   /*
+* Debugfs interface files to control HTM operations:
+*/
+   debugfs_create_file("htmconfigure", 0600, htmdump_debugfs_dir, NULL, &htmconfigure_fops);
+
return 0;
 }
 
-- 
2.43.5




[PATCH V2 3/9] powerpc/pseries/htmdump: Add htm start support to htmdump module

2025-03-21 Thread Athira Rajeev
Support starting the Hardware Trace Macro (HTM) function via the
debugfs interface. Under the debugfs folder
"/sys/kernel/debug/powerpc/htmdump", add the file "htmstart".
Writing "1" to this file starts HTM tracing and writing "0" stops
it. Any other value returns -EINVAL.
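
The hypervisor, not the driver, decides which HTM transitions are legal. The sketch below only models the order used in the example usage (configure, start, stop, deconfigure) and assumes, without confirmation from the patch, that an out-of-order request comes back as an invalid-state error, which htm_return_check() maps to -EIO. The enums and htm_apply() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the HTM lifecycle used in the example usage. */
enum htm_state { HTM_IDLE, HTM_CONFIGURED, HTM_TRACING };
enum htm_file { HTM_FILE_CONFIGURE, HTM_FILE_START };

/*
 * Apply a write of val to htmconfigure or htmstart. Returns 0 and updates
 * *s on an allowed transition, -EINVAL for values other than 0/1, and
 * -EIO (assumed) for a request that does not fit the current state.
 */
static int htm_apply(enum htm_state *s, enum htm_file file, unsigned long long val)
{
	if (val > 1)
		return -EINVAL;

	if (file == HTM_FILE_CONFIGURE) {
		if (val == 1 && *s == HTM_IDLE) {
			*s = HTM_CONFIGURED;
			return 0;
		}
		if (val == 0 && *s == HTM_CONFIGURED) {
			*s = HTM_IDLE;
			return 0;
		}
	} else {
		if (val == 1 && *s == HTM_CONFIGURED) {
			*s = HTM_TRACING;
			return 0;
		}
		if (val == 0 && *s == HTM_TRACING) {
			*s = HTM_CONFIGURED;
			return 0;
		}
	}
	return -EIO;
}
```

Reading htmstatus after each write, as the cover letter suggests, is the reliable way to confirm the real state rather than relying on a model like this.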

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/platforms/pseries/htmdump.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index c623836f7054..9210645ec3d3 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -17,6 +17,7 @@ static u32 nodalchipindex;
 static u32 coreindexonchip;
 static u32 htmtype;
 static u32 htmconfigure;
+static u32 htmstart;
 static struct dentry *htmdump_debugfs_dir;
 
 /*
@@ -151,7 +152,53 @@ static int htmconfigure_get(void *data, u64 *val)
return 0;
 }
 
+static int  htmstart_set(void *data, u64 val)
+{
+   long rc, ret;
+
+   /*
+* value as 1: start HTM
+* value as 0: stop HTM
+* Return -EINVAL for other values.
+*/
+   if (val == 1) {
+   /*
+* Invoke H_HTM call with:
+* - operation as htm start (H_HTM_OP_START)
+* - last three values are unused, hence set to zero
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, H_HTM_OP_START, 0, 0, 0);
+
+   } else if (val == 0) {
+   /*
+* Invoke H_HTM call with:
+* - operation as htm stop (H_HTM_OP_STOP)
+* - last three values are unused, hence set to zero
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   htmtype, H_HTM_OP_STOP, 0, 0, 0);
+   } else
+   return -EINVAL;
+
+   ret = htm_return_check(rc);
+   if (ret <= 0)
+   return ret;
+
+   /* Set htmstart if H_HTM_OP_START/H_HTM_OP_STOP operation succeeds */
+   htmstart = val;
+
+   return 0;
+}
+
+static int htmstart_get(void *data, u64 *val)
+{
+   *val = htmstart;
+   return 0;
+}
+
 DEFINE_SIMPLE_ATTRIBUTE(htmconfigure_fops, htmconfigure_get, htmconfigure_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(htmstart_fops, htmstart_get, htmstart_set, "%llu\n");
 
 static int htmdump_init_debugfs(void)
 {
@@ -178,6 +225,7 @@ static int htmdump_init_debugfs(void)
 * Debugfs interface files to control HTM operations:
 */
debugfs_create_file("htmconfigure", 0600, htmdump_debugfs_dir, NULL, &htmconfigure_fops);
+   debugfs_create_file("htmstart", 0600, htmdump_debugfs_dir, NULL, &htmstart_fops);
 
return 0;
 }
-- 
2.43.5




[PATCH V2 7/9] powerpc/pseries/htmdump: Add htm flags support to htmdump module

2025-03-21 Thread Athira Rajeev
Under the debugfs folder "/sys/kernel/debug/powerpc/htmdump", add the
file "htmflags". The currently supported flag controls HTM buffer
wrapping and is used together with "configure" to prevent the HTM
buffer from wrapping: writing 1 sets noWrap when the HTM is
configured.
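
How the flags value reaches the hcall can be modelled as below. SKETCH_FLAGS_HARDWARE_TARGET is a placeholder bit, not the real H_HTM_FLAGS_HARDWARE_TARGET value, and build_configure_args() is a hypothetical helper; the -1/0 choice for param1 and param2 follows htmconfigure_set() in this patch, where `unsigned long param1 = -1` is all-ones.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for H_HTM_FLAGS_HARDWARE_TARGET (real value not shown here). */
#define SKETCH_FLAGS_HARDWARE_TARGET (UINT64_C(1) << 63)

struct htm_configure_args {
	uint64_t flags;  /* first hcall argument */
	uint64_t param1; /* HTM mode register mask, per the patch comment */
	uint64_t param2; /* HTM mode register value */
	uint64_t param3; /* unused for configure */
};

static struct htm_configure_args build_configure_args(uint64_t htmflags)
{
	struct htm_configure_args a;

	/* User flags are ORed with the hardware-target flag. */
	a.flags = SKETCH_FLAGS_HARDWARE_TARGET | htmflags;
	/* All-ones (-1) asks for the default mode register mask/value when
	 * flags are set; otherwise both parameters are zero. */
	a.param1 = htmflags ? UINT64_MAX : 0;
	a.param2 = htmflags ? UINT64_MAX : 0;
	a.param3 = 0;
	return a;
}
```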

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/plpar_wrappers.h |  4 +-
 arch/powerpc/platforms/pseries/htmdump.c  | 55 +++
 2 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index f3efa9946b3c..f2b6cc4341bb 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -81,12 +81,12 @@ static inline long htm_call(unsigned long flags, unsigned long target,
  param1, param2, param3);
 }
 
-static inline long htm_hcall_wrapper(unsigned long nodeindex,
+static inline long htm_hcall_wrapper(unsigned long flags, unsigned long nodeindex,
unsigned long nodalchipindex, unsigned long coreindexonchip,
   unsigned long type, unsigned long htm_op, unsigned long param1, unsigned long param2,
   unsigned long param3)
 {
-   return htm_call(H_HTM_FLAGS_HARDWARE_TARGET,
+   return htm_call(H_HTM_FLAGS_HARDWARE_TARGET | flags,
H_HTM_TARGET_NODE_INDEX(nodeindex) |
H_HTM_TARGET_NODAL_CHIP_INDEX(nodalchipindex) |
H_HTM_TARGET_CORE_INDEX_ON_CHIP(coreindexonchip),
diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index 24e597fb85d8..45f8254fe322 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -21,6 +21,7 @@ static u32 htmtype;
 static u32 htmconfigure;
 static u32 htmstart;
 static u32 htmsetup;
+static u64 htmflags;
 
 static struct dentry *htmdump_debugfs_dir;
 
@@ -92,7 +93,7 @@ static ssize_t htmdump_read(struct file *filp, char __user *ubuf,
 * - operation as htm dump (H_HTM_OP_DUMP_DATA)
 * - last three values are address, size and offset
 */
-   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   rc = htm_hcall_wrapper(htmflags, nodeindex, nodalchipindex, coreindexonchip,
   htmtype, H_HTM_OP_DUMP_DATA, virt_to_phys(htm_buf),
   PAGE_SIZE, page);
 
@@ -115,6 +116,7 @@ static const struct file_operations htmdump_fops = {
 static int  htmconfigure_set(void *data, u64 val)
 {
long rc, ret;
+   unsigned long param1 = -1, param2 = -1;
 
/*
 * value as 1 : configure HTM.
@@ -125,17 +127,25 @@ static int  htmconfigure_set(void *data, u64 val)
/*
 * Invoke H_HTM call with:
 * - operation as htm configure (H_HTM_OP_CONFIGURE)
+* - If htmflags is set, param1 and param2 will be -1
+*   which is an indicator to use default htm mode reg mask
+*   and htm mode reg value.
 * - last three values are unused, hence set to zero
 */
-   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
-  htmtype, H_HTM_OP_CONFIGURE, 0, 0, 0);
+   if (!htmflags) {
+   param1 = 0;
+   param2 = 0;
+   }
+
+   rc = htm_hcall_wrapper(htmflags, nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, H_HTM_OP_CONFIGURE, param1, param2, 0);
} else if (val == 0) {
/*
 * Invoke H_HTM call with:
 * - operation as htm deconfigure (H_HTM_OP_DECONFIGURE)
 * - last three values are unused, hence set to zero
 */
-   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   rc = htm_hcall_wrapper(htmflags, nodeindex, nodalchipindex, coreindexonchip,
htmtype, H_HTM_OP_DECONFIGURE, 0, 0, 0);
} else
return -EINVAL;
@@ -171,7 +181,7 @@ static int  htmstart_set(void *data, u64 val)
 * - operation as htm start (H_HTM_OP_START)
 * - last three values are unused, hence set to zero
 */
-   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   rc = htm_hcall_wrapper(htmflags, nodeindex, nodalchipindex, coreindexonchip,
   htmtype, H_HTM_OP_START, 0, 0, 0);
 
} else if (val == 0) {
@@ -180,7 +190,7 @@ static int  htmstart_set(void *data, u64 val)
 * - operation as htm stop (H_HTM_OP_STOP)
 * - last three values are unused, hence set to zero
 */
-   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexon

[PATCH V2 9/9] powerpc/pseries/htmdump: Add documentation for H_HTM debugfs interface

2025-03-21 Thread Athira Rajeev
Add documentation for the HTM (Hardware Trace Macro) debugfs interface,
describing how it can be used to configure and control HTM operations.

Signed-off-by: Athira Rajeev 
---
 Documentation/arch/powerpc/htm.rst | 104 +
 1 file changed, 104 insertions(+)
 create mode 100644 Documentation/arch/powerpc/htm.rst

diff --git a/Documentation/arch/powerpc/htm.rst b/Documentation/arch/powerpc/htm.rst
new file mode 100644
index ..fcb4eb6306b1
--- /dev/null
+++ b/Documentation/arch/powerpc/htm.rst
@@ -0,0 +1,104 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. _htm:
+
+==========================
+HTM (Hardware Trace Macro)
+==========================
+
+Athira Rajeev, 2 Mar 2025
+
+.. contents::
+   :depth: 3
+
+
+Basic overview
+==============
+
+H_HTM is used as an interface for executing Hardware Trace Macro (HTM)
+functions, including setup, configuration, control and dumping of the HTM data.
+To use HTM, the HTM buffers must first be set up; HTM operations can then
+be controlled using the H_HTM hcall. The hcall can be invoked for any core/chip
+of the system from within a partition itself. To use this feature, a debugfs
+folder called "htmdump" is present under /sys/kernel/debug/powerpc.
+
+
+HTM debugfs example usage
+=========================
+
+.. code-block:: sh
+
+  #  ls /sys/kernel/debug/powerpc/htmdump/
+  coreindexonchip  htmcaps  htmconfigure  htmflags  htminfo  htmsetup
+  htmstart  htmstatus  htmtype  nodalchipindex  nodeindex  trace
+
+Details on each file:
+
+* nodeindex, nodalchipindex, coreindexonchip: specify which node, chip and core to configure the HTM for.
+* htmtype: specifies the type of HTM. The supported target is hardwareTarget.
+* trace: reads the HTM data.
+* htmconfigure: configures/deconfigures the HTM. Writing 1 to the file configures the trace; writing 0 deconfigures it.
+* htmstart: starts/stops the HTM. Writing 1 to the file starts tracing; writing 0 stops it.
+* htmstatus: gets the status of the HTM. This is needed to understand the HTM state after each operation.
+* htmsetup: sets the HTM buffer size, given as a power of 2 (the value written is the exponent).
+* htminfo: provides the system processor configuration details. This is needed to determine appropriate values for nodeindex, nodalchipindex and coreindexonchip.
+* htmcaps: provides the HTM capabilities, such as the minimum/maximum buffer size and the kinds of tracing the HTM supports.
+* htmflags: passes flags to the hcall. Currently it supports controlling the wrapping of the HTM buffer.
+
+To see the system processor configuration details:
+
+.. code-block:: sh
+
+  # cat /sys/kernel/debug/powerpc/htmdump/htminfo > htminfo_file
+
+The result can be interpreted using hexdump.
+
+To collect HTM traces for the core identified by nodeindex 0,
+nodalchipindex 1 and coreindexonchip 12, first set up the HTM buffer:
+
+.. code-block:: sh
+
+  # cd /sys/kernel/debug/powerpc/htmdump/
+  # echo 2 > htmtype
+  # echo 33 > htmsetup # sets an 8GB HTM buffer (the value is the size as a power of 2)
+
+This requires a CEC reboot to get the HTM buffers allocated.
+
+.. code-block:: sh
+
+  # cd /sys/kernel/debug/powerpc/htmdump/
+  # echo 2 > htmtype
+  # echo 0 > nodeindex
+  # echo 1 > nodalchipindex
+  # echo 12 > coreindexonchip
+  # echo 1 > htmflags # to set noWrap for HTM buffers
+  # echo 1 > htmconfigure # Configure the HTM
+  # echo 1 > htmstart # Start the HTM
+  # echo 0 > htmstart # Stop the HTM
+  # echo 0 > htmconfigure # Deconfigure the HTM
+  # cat htmstatus # Dump the status of HTM entries as data
+
+The commands above set the htmtype and core details, and then execute the respective HTM operations.
+
+Read the HTM trace data
+-----------------------
+
+After starting the trace collection, run the workload
+of interest. Stop the trace collection after required period
+of time, and read the trace file.
+
+.. code-block:: sh
+
+  # cat /sys/kernel/debug/powerpc/htmdump/trace > trace_file
+
+This trace file contains the relevant instruction traces
+collected during the workload execution, and can be used as
+an input file for trace decoders to interpret the data.
+
+Benefits of using HTM debugfs interface
+=======================================
+
+It is now possible to collect traces for a particular core/chip
+from within any partition of the system and decode them. This allows
+a small partition to be dedicated to collecting and analyzing the
+trace data, providing important information for performance
+analysis, software tuning, or hardware debug.
-- 
2.43.5




[PATCH V2 8/9] powerpc/pseries/htmdump: Add htm capabilities support to htmdump module

2025-03-21 Thread Athira Rajeev
Support dumping HTM capabilities information from the Hardware
Trace Macro (HTM) function via the debugfs interface. Under the
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add the
file "htmcaps".

The file is read-only and presents the content of the HTM
capabilities buffer returned by the hcall.
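
The patch does not describe the layout of the 0x80-byte Capabilities Output Buffer, so a user-space tool can only present it raw, much as the documentation suggests using hexdump for htminfo. A small formatter for that follows; hex_lines() is a hypothetical helper and HTM_CAPS_BYTES simply mirrors the 0x80 size used by htmcaps_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define HTM_CAPS_BYTES 0x80 /* size htmcaps_read() requests and returns */

/*
 * Format len bytes as "OFFSET: xx xx ..." lines, 16 bytes per line, into
 * out (NUL-terminated). Returns the characters written, or -1 if out is
 * too small.
 */
static int hex_lines(const unsigned char *buf, size_t len,
		     char *out, size_t out_len)
{
	size_t pos = 0;
	int n;

	if (out_len == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < len; i++) {
		/* Start a new line, prefixed with its offset, every 16 bytes. */
		if (i % 16 == 0) {
			n = snprintf(out + pos, out_len - pos, "%s%04zx:",
				     i ? "\n" : "", i);
			if (n < 0 || (size_t)n >= out_len - pos)
				return -1;
			pos += (size_t)n;
		}
		n = snprintf(out + pos, out_len - pos, " %02x", buf[i]);
		if (n < 0 || (size_t)n >= out_len - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

A tool would read up to HTM_CAPS_BYTES from the htmcaps file and pass the buffer to hex_lines().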

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/platforms/pseries/htmdump.c | 38 
 1 file changed, 38 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index 45f8254fe322..f21d738ddecb 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -14,6 +14,7 @@
 static void *htm_buf;
 static void *htm_status_buf;
 static void *htm_info_buf;
+static void *htm_caps_buf;
 static u32 nodeindex;
 static u32 nodalchipindex;
 static u32 coreindexonchip;
@@ -290,12 +291,41 @@ static ssize_t htminfo_read(struct file *filp, char __user *ubuf,
return simple_read_from_buffer(ubuf, count, ppos, htm_info_buf, to_copy);
 }
 
+static ssize_t htmcaps_read(struct file *filp, char __user *ubuf,
+size_t count, loff_t *ppos)
+{
+   void *htm_caps_buf = filp->private_data;
+   long rc, ret;
+
+   /*
+* Invoke H_HTM call with:
+* - operation as htm capabilities (H_HTM_OP_CAPABILITIES)
+* - last three values as addr, size (0x80 for the Capabilities
+*   Output Buffer) and zero
+*/
+   rc = htm_hcall_wrapper(htmflags, nodeindex, nodalchipindex, coreindexonchip,
+  htmtype, H_HTM_OP_CAPABILITIES, virt_to_phys(htm_caps_buf),
+  0x80, 0);
+
+   ret = htm_return_check(rc);
+   if (ret <= 0)
+   return ret;
+
+   return simple_read_from_buffer(ubuf, count, ppos, htm_caps_buf, 0x80);
+}
+
 static const struct file_operations htminfo_fops = {
.llseek = NULL,
.read   = htminfo_read,
.open   = simple_open,
 };
 
+static const struct file_operations htmcaps_fops = {
+   .llseek = NULL,
+   .read   = htmcaps_read,
+   .open   = simple_open,
+};
+
 static int  htmsetup_set(void *data, u64 val)
 {
long rc, ret;
@@ -401,8 +431,16 @@ static int htmdump_init_debugfs(void)
return -ENOMEM;
}
 
+   /* Debugfs interface file to present HTM capabilities */
+   htm_caps_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
+   if (!htm_caps_buf) {
+   pr_err("Failed to allocate htm caps buf\n");
+   return -ENOMEM;
+   }
+
debugfs_create_file("htmstatus", 0400, htmdump_debugfs_dir, htm_status_buf, &htmstatus_fops);
debugfs_create_file("htminfo", 0400, htmdump_debugfs_dir, htm_info_buf, &htminfo_fops);
+   debugfs_create_file("htmcaps", 0400, htmdump_debugfs_dir, htm_caps_buf, &htmcaps_fops);
 
return 0;
 }
-- 
2.43.5




Re: [PATCH] bus: fsl-mc: Remove deadcode

2025-03-21 Thread Ioana Ciornei
On Fri, Nov 15, 2024 at 03:20:55PM +, li...@treblig.org wrote:
> From: "Dr. David Alan Gilbert" 
> 
> fsl_mc_allocator_driver_exit() was added explicitly by
> commit 1e8ac83b6caf ("bus: fsl-mc: add fsl_mc_allocator cleanup function")
> but was never used.
> 
> Remove it.
> 
> fsl_mc_portal_reset() was added in 2015 by
> commit 197f4d6a4a00 ("staging: fsl-mc: fsl-mc object allocator driver")
> but was never used.
> 
> Remove it.
> 
> fsl_mc_portal_reset() was the only caller of dpmcp_reset().
> 
> Remove it.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Acked-by: Ioana Ciornei 




[PATCH V2 6/9] powerpc/pseries/htmdump: Add htm setup support to htmdump module

2025-03-21 Thread Athira Rajeev
Add HTM setup support to the htmdump module. To use the
HTM (Hardware Trace Macro), the HTM buffer has to be allocated.
Support setup of HTM buffers via the debugfs interface. Under the
debugfs folder "/sys/kernel/debug/powerpc/htmdump", add the file
"htmsetup". Writing the HTM buffer size, expressed as a power of 2,
to the "htmsetup" file sets up the HTM buffer.
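
Because htmsetup takes the base-2 exponent of the buffer size (0x21, i.e. 33, for 8GB per the comment in htmsetup_set()), a tool accepting a byte count has to convert it first. A minimal sketch; htm_size_from_exp() and htm_exp_from_size() are hypothetical helpers, and rejecting sizes that are not powers of 2 is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Buffer size in bytes for an exponent written to htmsetup (2^exp). */
static uint64_t htm_size_from_exp(unsigned int exp)
{
	return exp < 64 ? UINT64_C(1) << exp : 0; /* 0 means out of range */
}

/* Exponent to write for a given size, or -1 if size is not a power of 2. */
static int htm_exp_from_size(uint64_t size)
{
	int exp = 0;

	if (size == 0 || (size & (size - 1)) != 0)
		return -1;
	while (size >>= 1)
		exp++;
	return exp;
}
```

For example, htm_exp_from_size(8ULL << 30) yields 33, the value used in the documentation.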

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/platforms/pseries/htmdump.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/htmdump.c b/arch/powerpc/platforms/pseries/htmdump.c
index b387e46f2b8e..24e597fb85d8 100644
--- a/arch/powerpc/platforms/pseries/htmdump.c
+++ b/arch/powerpc/platforms/pseries/htmdump.c
@@ -20,6 +20,8 @@ static u32 coreindexonchip;
 static u32 htmtype;
 static u32 htmconfigure;
 static u32 htmstart;
+static u32 htmsetup;
+
 static struct dentry *htmdump_debugfs_dir;
 
 /*
@@ -284,8 +286,41 @@ static const struct file_operations htminfo_fops = {
.open   = simple_open,
 };
 
+static int  htmsetup_set(void *data, u64 val)
+{
+   long rc, ret;
+
+   /*
+* Input value: HTM buffer size in the power of 2
+* example: hex value 0x21 ( decimal: 33 ) is for
+* 8GB
+* Invoke H_HTM call with:
+* - operation as htm setup (H_HTM_OP_SETUP)
+* - parameter 1 set to input value.
+* - last two values are unused, hence set to zero
+*/
+   rc = htm_hcall_wrapper(nodeindex, nodalchipindex, coreindexonchip,
+   htmtype, H_HTM_OP_SETUP, val, 0, 0);
+
+   ret = htm_return_check(rc);
+   if (ret <= 0)
+   return ret;
+
+   /* Set htmsetup if H_HTM_OP_SETUP operation succeeds */
+   htmsetup = val;
+
+   return 0;
+}
+
+static int htmsetup_get(void *data, u64 *val)
+{
+   *val = htmsetup;
+   return 0;
+}
+
 DEFINE_SIMPLE_ATTRIBUTE(htmconfigure_fops, htmconfigure_get, htmconfigure_set, "%llu\n");
 DEFINE_SIMPLE_ATTRIBUTE(htmstart_fops, htmstart_get, htmstart_set, "%llu\n");
+DEFINE_SIMPLE_ATTRIBUTE(htmsetup_fops, htmsetup_get, htmsetup_set, "%llu\n");
 
 static int htmdump_init_debugfs(void)
 {
@@ -313,6 +348,7 @@ static int htmdump_init_debugfs(void)
 */
debugfs_create_file("htmconfigure", 0600, htmdump_debugfs_dir, NULL, &htmconfigure_fops);
debugfs_create_file("htmstart", 0600, htmdump_debugfs_dir, NULL, &htmstart_fops);
+   debugfs_create_file("htmsetup", 0600, htmdump_debugfs_dir, NULL, &htmsetup_fops);
 
/* Debugfs interface file to present status of HTM */
htm_status_buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
-- 
2.43.5




Re: [RFC 3/3] mm: integrate GCMA with CMA using dt-bindings

2025-03-21 Thread Conor Dooley
On Thu, Mar 20, 2025 at 10:39:31AM -0700, Suren Baghdasaryan wrote:
> This patch introduces a new "guarantee" property for shared-dma-pool.
> With this property, admin can create specific memory pool as
> GCMA-based CMA if they care about allocation success rate and latency.
> The downside of GCMA is that it can host only clean file-backed pages
> since it's using cleancache as its secondary user.
> 
> Signed-off-by: Minchan Kim 
> Signed-off-by: Suren Baghdasaryan 
> ---
>  arch/powerpc/kernel/fadump.c |  2 +-
>  include/linux/cma.h  |  2 +-
>  kernel/dma/contiguous.c  | 11 ++-
>  mm/cma.c | 33 ++---
>  mm/cma.h |  1 +
>  mm/cma_sysfs.c   | 10 ++
>  6 files changed, 49 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4b371c738213..4eb7be0cdcdb 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -111,7 +111,7 @@ void __init fadump_cma_init(void)
>   return;
>   }
>  
> - rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma);
> + rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma, false);
>   if (rc) {
>   pr_err("Failed to init cma area for firmware-assisted dump,%d\n", rc);
>   /*
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 62d9c1cf6326..3207db979e94 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -46,7 +46,7 @@ extern int __init cma_declare_contiguous_multi(phys_addr_t size,
>  extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   unsigned int order_per_bit,
>   const char *name,
> - struct cma **res_cma);
> + struct cma **res_cma, bool gcma);
>  extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> bool no_warn);
>  extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index 055da410ac71..a68b3123438c 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -459,6 +459,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
>   unsigned long node = rmem->fdt_node;
>   bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
>   struct cma *cma;
> + bool gcma;
>   int err;
>  
>   if (size_cmdline != -1 && default_cma) {
> @@ -476,7 +477,15 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
>   return -EINVAL;
>   }
>  
> - err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name, &cma);
> + gcma = !!of_get_flat_dt_prop(node, "guarantee", NULL);

When this (or if I guess) this goes !RFC, you will need to document this
new property that you're adding.




Re: [PATCH V2 2/2] tools/perf/powerpc/util: Add support to handle compatible mode PVR for perf json events

2025-03-21 Thread Christophe Leroy

Hi,

On 10/10/2024 at 16:51, Athira Rajeev wrote:

perf list picks the events supported for a specific platform
from pmu-events/arch/powerpc/. For example, power10 events
are in pmu-events/arch/powerpc/power10, and power9 events are part
of pmu-events/arch/powerpc/power9. Which platform to pick
is determined by the PVR value on powerpc.
The PVR value is matched against pmu-events/arch/powerpc/mapfile.csv.

Example:

Format:
PVR,Version,JSON/file/pathname,Type

0x004[bcd][[:xdigit:]]{4},1,power8,core
0x0066[[:xdigit:]]{4},1,power8,core
0x004e[[:xdigit:]]{4},1,power9,core
0x0080[[:xdigit:]]{4},1,power10,core
0x0082[[:xdigit:]]{4},1,power10,core
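
The lookup of a PVR in these rows can be sketched in userspace C as below. This is a minimal illustration under stated assumptions, not perf's matching code: it assumes the first mapfile column is a POSIX extended regular expression that must match the whole PVR string, and that the PVR is formatted with "0x%.8lx" as in the get_cpuid_str() change. The helper names format_pvr() and pvr_matches() are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format a PVR value the same way as the patch ("0x%.8lx"). */
static void format_pvr(unsigned long pvr, char *buf, size_t len)
{
	snprintf(buf, len, "0x%.8lx", pvr);
}

/*
 * Return true if the mapfile pattern (compiled as a POSIX ERE) matches
 * the entire PVR string, not just a substring of it.
 */
static bool pvr_matches(const char *pattern, const char *pvr_str)
{
	regex_t re;
	regmatch_t m;
	bool ok;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return false;	/* invalid pattern: treat as no match */

	ok = regexec(&re, pvr_str, 1, &m, 0) == 0 &&
	     m.rm_so == 0 && (size_t)m.rm_eo == strlen(pvr_str);
	regfree(&re);
	return ok;
}
```

With this formatting, a power10 PVR such as 0x00801200 becomes "0x00801200" and matches the 0x0080 row, while a compat-mode value of 0x00ff becomes "0x000000ff", which none of the power8/power9/power10 patterns listed above match.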

The code gets the PVR from the system using the get_cpuid_str function
in arch/powerpc/util/header.c (from SPRN_PVR) and compares it
with the values from mapfile.csv.
In compat mode, for example when the partition is booted in power9
mode on a power10 system, this picks the wrong events, because the
PVR points to power10 whereas the events should be picked from the power9
folder. To support generic events, add a new folder
pmu-events/arch/powerpc/compat to contain the ISA architected events
which are supported in compat mode. Also return 0x00ff as the PVR
when booted in compat mode. Based on this PVR value, the JSON events
will be picked from pmu-events/arch/powerpc/compat.
Suggested-by: Madhavan Srinivasan 
Signed-off-by: Athira Rajeev 


I see this patch was merged into mainline although it had CI failures,
and it still has them.

Could you please fix it?

arch/powerpc/util/header.c: In function 'is_compat_mode':
Error: arch/powerpc/util/header.c:20:14: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   20 |  if (!strcmp((char *)platform, (char *)base_platform))
      |              ^
Error: arch/powerpc/util/header.c:20:32: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   20 |  if (!strcmp((char *)platform, (char *)base_platform))
      |                                ^
cc1: all warnings being treated as errors
make[6]: *** [/linux/tools/build/Makefile.build:86: /output/arch/powerpc/util/header.o] Error 1



The following fixes it, but it may not be the right fix, as
getauxval() actually returns an unsigned long, not a u64.


diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index c7df534dbf8f..1b045d410f31 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -17,7 +17,7 @@ static bool is_compat_mode(void)
u64 base_platform = getauxval(AT_BASE_PLATFORM);
u64 platform = getauxval(AT_PLATFORM);

-   if (!strcmp((char *)platform, (char *)base_platform))
+   if (!strcmp((char *)(long)platform, (char *)(long)base_platform))
return false;

return true;
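
A sketch of the compat-mode check that keeps getauxval()'s unsigned long return type, so the cast to a pointer has the same width on 32-bit and 64-bit builds. The split into platforms_differ() (a hypothetical helper) only makes the comparison testable. Treating a missing auxv entry as "not compat" is an assumption of this sketch: getauxval() returns 0 when an entry is absent, and strcmp() must not be given a NULL pointer.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/auxv.h>

/* Compat mode if both platform strings exist and differ. */
static bool platforms_differ(const char *platform, const char *base_platform)
{
	/* Assumption: a missing entry (getauxval() returned 0) means "not compat". */
	if (!platform || !base_platform)
		return false;
	return strcmp(platform, base_platform) != 0;
}

static bool is_compat_mode(void)
{
	/* AT_PLATFORM/AT_BASE_PLATFORM hold string addresses as unsigned long. */
	unsigned long base_platform = getauxval(AT_BASE_PLATFORM);
	unsigned long platform = getauxval(AT_PLATFORM);

	return platforms_differ((const char *)platform,
				(const char *)base_platform);
}
```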


Thanks
Christophe


---
changelog:
V1 -> V2:
Corrected commit message and subject line

  tools/perf/arch/powerpc/util/header.c | 32 ++-
  1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/powerpc/util/header.c b/tools/perf/arch/powerpc/util/header.c
index 6b00efd53638..adc82c479443 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -10,6 +10,18 @@
  #include "utils_header.h"
  #include "metricgroup.h"
  #include 
+#include 
+
+static bool is_compat_mode(void)
+{
+   u64 base_platform = getauxval(AT_BASE_PLATFORM);
+   u64 platform = getauxval(AT_PLATFORM);
+
+   if (!strcmp((char *)platform, (char *)base_platform))
+   return false;
+
+   return true;
+}
  
  int
  get_cpuid(char *buffer, size_t sz)
@@ -33,8 +45,26 @@ char *
  get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
  {
char *bufp;
+   unsigned long pvr;
+
+   /*
+* IBM Power System supports compatible mode. That is
+* Nth generation platform can support previous generation
+* OS in a mode called compatible mode. For ex. LPAR can be
+* booted in a Power9 mode when the system is a Power10.
+*
+* In the compatible mode, care must be taken when generating
+* PVR value. When read, PVR will be of the AT_BASE_PLATFORM.
+* To support generic events, return 0x00ff as pvr when
+* booted in compat mode. Based on this pvr value, json will
+* pick events from pmu-events/arch/powerpc/compat
+*/
+   if (!is_compat_mode())
+   pvr = mfspr(SPRN_PVR);
+   else
+   pvr = 0x00ff;
  
-	if (asprintf(&bufp, "0x%.8lx", mfspr(SPRN_PVR)) < 0)
+   if (asprintf(&bufp, "0x%.8lx", pvr) < 0)
bufp = NULL;
  
  	return bufp;





Re: [PATCH v2 32/57] irqdomain: ppc: Switch to irq_domain_create_*()

2025-03-21 Thread Christophe Leroy

On 19/03/2025 at 10:29, Jiri Slaby (SUSE) wrote:

The irq_domain_add_*() interfaces are obsolete and are going away.
Switch to the preferred irq_domain_create_*() ones. Those differ in the
node parameter: they take the more generic struct fwnode_handle instead of
struct device_node. Therefore, of_fwnode_handle() is added around the
original parameter.
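
To illustrate the of_fwnode_handle() wrapping, here is a simplified userspace model; every type and function below is a stand-in, not the kernel's definition. In the kernel, of_fwnode_handle() is a macro in <linux/of.h> that yields the address of the struct fwnode_handle embedded in a struct device_node, or NULL for a NULL node. The model writes it as an inline function and pairs it with a container_of()-based reverse lookup, loosely modeled on to_of_node() but without its fwnode type check.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (hypothetical model). */
struct fwnode_handle {
	int unused;
};

struct device_node {
	const char *name;
	struct fwnode_handle fwnode;	/* generic handle embedded in the OF node */
};

/* Recover the outer struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Model of of_fwnode_handle(): NULL-safe address of the embedded member. */
static inline struct fwnode_handle *model_of_fwnode_handle(struct device_node *np)
{
	return np ? &np->fwnode : NULL;
}

/* Model of the reverse direction, without the kernel's is_of_node() check. */
static inline struct device_node *model_to_of_node(struct fwnode_handle *fwnode)
{
	return fwnode ? container_of(fwnode, struct device_node, fwnode) : NULL;
}
```

Because the handle is embedded rather than separately allocated, the conversion is free and reversible, which is why wrapping the existing node argument is enough for the irq_domain_create_*() calls.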

Note that some of the users can likely use dev->fwnode directly instead
of the indirect of_fwnode_handle(dev->of_node). But dev->fwnode is not
guaranteed to be set for all of them, so this has to be investigated on
a case-by-case basis (by people who can actually test with the HW).

Signed-off-by: Jiri Slaby (SUSE) 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Naveen N Rao 
Cc: Anatolij Gustschin 
Cc: Scott Wood 
Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Christophe Leroy  # For 8xx

---
  arch/powerpc/platforms/44x/uic.c | 5 +++--
  arch/powerpc/platforms/512x/mpc5121_ads_cpld.c   | 3 ++-
  arch/powerpc/platforms/52xx/media5200.c  | 2 +-
  arch/powerpc/platforms/52xx/mpc52xx_gpt.c| 4 ++--
  arch/powerpc/platforms/52xx/mpc52xx_pic.c| 2 +-
  arch/powerpc/platforms/85xx/socrates_fpga_pic.c  | 2 +-
  arch/powerpc/platforms/8xx/cpm1-ic.c | 3 ++-
  arch/powerpc/platforms/8xx/pic.c | 3 ++-
  arch/powerpc/platforms/embedded6xx/flipper-pic.c | 5 +++--
  arch/powerpc/platforms/embedded6xx/hlwd-pic.c| 5 +++--
  arch/powerpc/platforms/powermac/pic.c| 5 +++--
  arch/powerpc/platforms/powernv/opal-irqchip.c| 3 ++-
  arch/powerpc/sysdev/cpm2_pic.c   | 3 ++-
  arch/powerpc/sysdev/ehv_pic.c| 5 +++--
  arch/powerpc/sysdev/fsl_msi.c| 2 +-
  arch/powerpc/sysdev/ge/ge_pic.c  | 5 +++--
  arch/powerpc/sysdev/i8259.c  | 4 ++--
  arch/powerpc/sysdev/ipic.c   | 5 +++--
  arch/powerpc/sysdev/mpic.c   | 6 +++---
  arch/powerpc/sysdev/tsi108_pci.c | 4 ++--
  arch/powerpc/sysdev/xive/common.c| 2 +-
  21 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/platforms/44x/uic.c b/arch/powerpc/platforms/44x/uic.c
index 31f760c2ec5d..481ec25ce78f 100644
--- a/arch/powerpc/platforms/44x/uic.c
+++ b/arch/powerpc/platforms/44x/uic.c
@@ -254,8 +254,9 @@ static struct uic * __init uic_init_one(struct device_node *node)
}
uic->dcrbase = *dcrreg;
  
-	uic->irqhost = irq_domain_add_linear(node, NR_UIC_INTS, &uic_host_ops,
-uic);
+   uic->irqhost = irq_domain_create_linear(of_fwnode_handle(node),
+   NR_UIC_INTS, &uic_host_ops,
+   uic);
if (! uic->irqhost)
return NULL; /* FIXME: panic? */
  
diff --git a/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c b/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
index e995eb30bf09..2cf3c6237337 100644
--- a/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
+++ b/arch/powerpc/platforms/512x/mpc5121_ads_cpld.c
@@ -188,7 +188,8 @@ mpc5121_ads_cpld_pic_init(void)
  
  	cpld_pic_node = of_node_get(np);
  
-	cpld_pic_host = irq_domain_add_linear(np, 16, &cpld_pic_host_ops, NULL);
+   cpld_pic_host = irq_domain_create_linear(of_fwnode_handle(np), 16,
+&cpld_pic_host_ops, NULL);
if (!cpld_pic_host) {
printk(KERN_ERR "CPLD PIC: failed to allocate irq host!\n");
goto end;
diff --git a/arch/powerpc/platforms/52xx/media5200.c b/arch/powerpc/platforms/52xx/media5200.c
index 19626cd42406..bc7f83cfec1d 100644
--- a/arch/powerpc/platforms/52xx/media5200.c
+++ b/arch/powerpc/platforms/52xx/media5200.c
@@ -168,7 +168,7 @@ static void __init media5200_init_irq(void)
  
  	spin_lock_init(&media5200_irq.lock);
  
-	media5200_irq.irqhost = irq_domain_add_linear(fpga_np,
+   media5200_irq.irqhost = irq_domain_create_linear(of_fwnode_handle(fpga_np),
MEDIA5200_NUM_IRQS, &media5200_irq_ops, &media5200_irq);
if (!media5200_irq.irqhost)
goto out;
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
index 1ea591ec6083..f042b21b2b73 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_gpt.c
@@ -247,9 +247,9 @@ mpc52xx_gpt_irq_setup(struct mpc52xx_gpt_priv *gpt, struct device_node *node)
if (!cascade_virq)
return;
  
-	gpt->irqhost = irq_domain_add_linear(node, 1, &mpc52xx_gpt_irq_ops, gpt);
+   gpt->irqhost = irq_domain_create_linear(of_fwnode_handle(node), 1, &mpc52xx_gpt_irq_ops, gpt);
if (!gpt->irqhost) {
-   dev_err(gpt->dev, "irq_domain_add_linear() failed\n");
+   dev_err(gpt->dev, "irq_domain_create_line

[PATCH v4 1/3] lsm: introduce new hooks for setting/getting inode fsxattr

2025-03-21 Thread Andrey Albershteyn
Introduce new hooks for setting and getting filesystem extended
attributes on an inode (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR).

Cc: seli...@vger.kernel.org
Cc: Paul Moore 

Signed-off-by: Andrey Albershteyn 
---
 fs/ioctl.c|  7 ++-
 include/linux/lsm_hook_defs.h |  4 
 include/linux/security.h  | 16 
 security/security.c   | 32 
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 638a36be31c14afc66a7fd6eb237d9545e8ad997..4434c97bc5dff5a3e8635e28745cd99404ff353e 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -525,10 +525,15 @@ EXPORT_SYMBOL(fileattr_fill_flags);
 int vfs_fileattr_get(struct dentry *dentry, struct fileattr *fa)
 {
struct inode *inode = d_inode(dentry);
+   int error;
 
if (!inode->i_op->fileattr_get)
return -ENOIOCTLCMD;
 
+   error = security_inode_getfsxattr(inode, fa);
+   if (error)
+   return error;
+
return inode->i_op->fileattr_get(dentry, fa);
 }
 EXPORT_SYMBOL(vfs_fileattr_get);
@@ -692,7 +697,7 @@ int vfs_fileattr_set(struct mnt_idmap *idmap, struct dentry *dentry,
fa->flags |= old_ma.flags & ~FS_COMMON_FL;
}
err = fileattr_set_prepare(inode, &old_ma, fa);
-   if (!err)
+   if (!err && !security_inode_setfsxattr(inode, fa))
err = inode->i_op->fileattr_set(idmap, dentry, fa);
}
inode_unlock(inode);
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index eb2937599cb029004f491012b3bf5a3d6d2731df..49e64d23e9049568af133bf3f30ca719c9ec5f25 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -157,6 +157,10 @@ LSM_HOOK(int, 0, inode_removexattr, struct mnt_idmap *idmap,
 struct dentry *dentry, const char *name)
 LSM_HOOK(void, LSM_RET_VOID, inode_post_removexattr, struct dentry *dentry,
 const char *name)
+LSM_HOOK(int, 0, inode_setfsxattr, const struct inode *inode,
+const struct fileattr *fa)
+LSM_HOOK(int, 0, inode_getfsxattr, const struct inode *inode,
+const struct fileattr *fa)
 LSM_HOOK(int, 0, inode_set_acl, struct mnt_idmap *idmap,
 struct dentry *dentry, const char *acl_name, struct posix_acl *kacl)
 LSM_HOOK(void, LSM_RET_VOID, inode_post_set_acl, struct dentry *dentry,
diff --git a/include/linux/security.h b/include/linux/security.h
index cbdba435b798660130779d6919388779edd41d54..dd58ace29c6e325ee49470596d0abb6ecc38ba07 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -439,6 +439,10 @@ int security_inode_listxattr(struct dentry *dentry);
 int security_inode_removexattr(struct mnt_idmap *idmap,
   struct dentry *dentry, const char *name);
 void security_inode_post_removexattr(struct dentry *dentry, const char *name);
+int security_inode_setfsxattr(const struct inode *inode,
+ const struct fileattr *fa);
+int security_inode_getfsxattr(const struct inode *inode,
+ const struct fileattr *fa);
 int security_inode_need_killpriv(struct dentry *dentry);
 int security_inode_killpriv(struct mnt_idmap *idmap, struct dentry *dentry);
 int security_inode_getsecurity(struct mnt_idmap *idmap,
@@ -1042,6 +1046,18 @@ static inline void security_inode_post_removexattr(struct dentry *dentry,
   const char *name)
 { }
 
+static inline int security_inode_setfsxattr(const struct inode *inode,
+   const struct fileattr *fa)
+{
+   return 0;
+}
+
+static inline int security_inode_getfsxattr(const struct inode *inode,
+   const struct fileattr *fa)
+{
+   return 0;
+}
+
 static inline int security_inode_need_killpriv(struct dentry *dentry)
 {
return cap_inode_need_killpriv(dentry);
diff --git a/security/security.c b/security/security.c
index 09664e09fec9a1d502a23847aa2e87a6d19837db..d3b527f55ed52209d8e22c05adf278b164374d35 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2617,6 +2617,38 @@ void security_inode_post_removexattr(struct dentry *dentry, const char *name)
call_void_hook(inode_post_removexattr, dentry, name);
 }
 
+/**
+ * security_inode_setfsxattr() - check if setting fsxattr is allowed
+ * @inode: inode to set filesystem extended attributes on
+ * @fa: extended attributes to set on the inode
+ *
+ * Called when setfsxattrat() syscall or FS_IOC_FSSETXATTR ioctl() is called on
+ * inode
+ *
+ * Return: Returns 0 if permission is granted.
+ */
+int security_inode_setfsxattr(const struct inode *inode,
+ const struct fileattr *fa)
+{
+   return call_int_hook(inode_setfsxattr, inode, fa);
+}
+
+/**
+ * security_inode_getfsxattr() - check if retrieving fsxattr is allowed
+ * @inode: inode to retrieve filesyste

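The ordering these hooks introduce (consult the LSM first, and only reach ->fileattr_get()/->fileattr_set() if it returns 0) can be modeled in userspace as below. All names are hypothetical stand-ins, not kernel APIs. In this sketch the hook's return value is stored in err, so a denial such as -EPERM is reported to the caller rather than turning the call into a silent no-op.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request type standing in for struct fileattr. */
struct attr_req {
	unsigned int xflags;
};

static bool policy_allows;	/* the model's "LSM policy" decision */
static int fs_calls;		/* how many times the "filesystem" was reached */

/* Stand-in for security_inode_setfsxattr(): 0 grants, -EPERM denies. */
static int model_security_hook(const struct attr_req *req)
{
	(void)req;
	return policy_allows ? 0 : -EPERM;
}

/* Stand-in for inode->i_op->fileattr_set(). */
static int model_fs_set(struct attr_req *req)
{
	(void)req;
	fs_calls++;
	return 0;
}

/*
 * Consult the hook first; call the filesystem only if it returned 0,
 * and otherwise return the hook's error code unchanged.
 */
static int model_fileattr_set(struct attr_req *req)
{
	int err = model_security_hook(req);

	if (!err)
		err = model_fs_set(req);
	return err;
}
```
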
Re: distro support for CONFIG_KUNIT: [PATCH 0/3] bitmap: convert self-test to KUnit

2025-03-21 Thread Tamir Duberstein
On Fri, Mar 21, 2025 at 2:32 PM Yury Norov  wrote:
>
> On Fri, Mar 21, 2025 at 12:53:36PM -0400, Tamir Duberstein wrote:
> > Hi all, now that the printf and scanf series have been taken via kees'
> > tree[0] and sent in for v6.15-rc1[1], I wonder if we'd like to revisit
> > this discussion.
> >
> > As I understand it, the primary objections to moving bitmap to KUnit were:
> > - Unclear benefits.
> > - Source churn.
> > - Extra dependencies for benchmarks.
> >
> > Hopefully David's enumeration of the benefits of KUnit was compelling.
> > Regarding source churn: it is inevitable, but I did pay attention to
> > this and minimized the diff where possible.
> >
> > The last point is trickiest, because KUnit doesn't have first-class
> > benchmark support, but nor is there a blessed benchmark facility in
> > the kernel generally. I'd prefer not to tie this series to distros
> > enabling CONFIG_KUNIT by default, which will take $time.
> >
> > I think the most sensible thing we can do - if we accept that KUnit
> > has benefits to offer - is to split test_bitmap.c into
> > benchmark_bitmap.c and bitmap_kunit.c.
> >
> > Please let me know your thoughts.
>
> Sure, no problem.
>
> I asked you to answer 4 very simple and specific questions. You
> didn't answer any of them. David sent a lengthy email that doesn't
> address them, either.

OK, that's fair I suppose. Let me try and address them now:

> - What do the tests miss now?

The tests do not _miss_ anything. They are just inconvenient to run,
particularly from automation, because they do not report success in a
way that is trivially understood by automation. In other words, I'm
not aware of something trivial I can run that will exit 0 if and only
if the bitmap tests pass.

> - What do _you_ need from the tests? Describe your test scenario.

I want kernel tests to be easier to run, and for more of them to be
run by existing automation such as LKP[0]. I know for sure that KUnit
tests are automatically run by LKP because other tests I converted to
KUnit subsequently had warnings reported by LKP.

> - How exactly does KUnit help you test bitmaps and friends better?

KUnit reports test results in a standard protocol (KTAP) that is
machine-friendly. It comes with userspace tools that understand this
protocol and produce useful exit codes, as well as human-friendly
output.
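
As a rough illustration of why a line-oriented protocol like KTAP suits automation, the sketch below scans KTAP text and reports success only if every top-level result line is "ok". It is a simplification written for this example, not kunit.py or any KUnit tooling: it ignores indented subtest lines, treats "# SKIP" results as passing, and does not validate the plan line.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true only if at least one top-level result line was seen and
 * none of them was "not ok". Indented (nested subtest) lines are ignored
 * because they do not start with "ok " or "not ok ".
 */
static bool ktap_all_passed(const char *text)
{
	int passed = 0, failed = 0;
	const char *line = text;

	while (line && *line) {
		const char *next = strchr(line, '\n');

		if (strncmp(line, "not ok ", 7) == 0)
			failed++;
		else if (strncmp(line, "ok ", 3) == 0)
			passed++;
		line = next ? next + 1 : NULL;
	}
	return failed == 0 && passed > 0;
}
```

A caller can then map the result to an exit status, for example `return ktap_all_passed(log) ? 0 : 1;`.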

> - Is there a way to meet your needs with a less invasive approach,
> particularly without run-time dependencies?

I'm not aware of such a way, but if you know of one, I would be very
interested to learn.

> None of you guys submitted anything to bitmaps - neither in library,
> nor in tests. Your opinion about what is good for bitmap development
> and what's not is purely theoretical.
>
> Real contributors have never been concerned about the current testing model.
>
> I think that you don't care about bitmaps. If bitmap testing gets
> broken one day, or more complicated, you will not come to help. If I'm
> wrong and you are willing to contribute, you're warmly welcome! I always
> encourage people to increase testing coverage.
>
> If you'd like to add new cases to existing tests - I'll be happy. If
> you'd like to add completely new tests based on KUNITs or whatever
> else - I'll be happy just as well.

I can't speak for David, but you are right about me; I do not have an
interest in bitmap in particular. My interest is in kernel testing
generally, which I hope I have adequately explained above. As for my
willingness to help people obtain and keep good workflows, well,
you're welcome to examine my history in OSS. I've contributed to
dozens of projects, many for far longer than my professional goals
required.

Let's keep talking.
Tamir

[0] https://github.com/intel/lkp-tests



[PATCH v4 2/3] fs: split fileattr/fsxattr converters into helpers

2025-03-21 Thread Andrey Albershteyn
These helpers will be used by the getfsxattrat()/setfsxattrat() syscalls
to convert between fileattr and fsxattr.

Signed-off-by: Andrey Albershteyn 
---
 fs/ioctl.c   | 32 +---
 include/linux/fileattr.h |  2 ++
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 4434c97bc5dff5a3e8635e28745cd99404ff353e..840283d8c406623d8d26790f89b62ebcbd39e2de 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -538,6 +538,16 @@ int vfs_fileattr_get(struct dentry *dentry, struct fileattr *fa)
 }
 EXPORT_SYMBOL(vfs_fileattr_get);
 
+void fileattr_to_fsxattr(const struct fileattr *fa, struct fsxattr *fsx)
+{
+   memset(fsx, 0, sizeof(struct fsxattr));
+   fsx->fsx_xflags = fa->fsx_xflags;
+   fsx->fsx_extsize = fa->fsx_extsize;
+   fsx->fsx_nextents = fa->fsx_nextents;
+   fsx->fsx_projid = fa->fsx_projid;
+   fsx->fsx_cowextsize = fa->fsx_cowextsize;
+}
+
 /**
  * copy_fsxattr_to_user - copy fsxattr to userspace.
  * @fa:fileattr pointer
@@ -549,12 +559,7 @@ int copy_fsxattr_to_user(const struct fileattr *fa, struct fsxattr __user *ufa)
 {
struct fsxattr xfa;
 
-   memset(&xfa, 0, sizeof(xfa));
-   xfa.fsx_xflags = fa->fsx_xflags;
-   xfa.fsx_extsize = fa->fsx_extsize;
-   xfa.fsx_nextents = fa->fsx_nextents;
-   xfa.fsx_projid = fa->fsx_projid;
-   xfa.fsx_cowextsize = fa->fsx_cowextsize;
+   fileattr_to_fsxattr(fa, &xfa);
 
if (copy_to_user(ufa, &xfa, sizeof(xfa)))
return -EFAULT;
@@ -563,6 +568,15 @@ int copy_fsxattr_to_user(const struct fileattr *fa, struct fsxattr __user *ufa)
 }
 EXPORT_SYMBOL(copy_fsxattr_to_user);
 
+void fsxattr_to_fileattr(const struct fsxattr *fsx, struct fileattr *fa)
+{
+   fileattr_fill_xflags(fa, fsx->fsx_xflags);
+   fa->fsx_extsize = fsx->fsx_extsize;
+   fa->fsx_nextents = fsx->fsx_nextents;
+   fa->fsx_projid = fsx->fsx_projid;
+   fa->fsx_cowextsize = fsx->fsx_cowextsize;
+}
+
 static int copy_fsxattr_from_user(struct fileattr *fa,
  struct fsxattr __user *ufa)
 {
@@ -571,11 +585,7 @@ static int copy_fsxattr_from_user(struct fileattr *fa,
if (copy_from_user(&xfa, ufa, sizeof(xfa)))
return -EFAULT;
 
-   fileattr_fill_xflags(fa, xfa.fsx_xflags);
-   fa->fsx_extsize = xfa.fsx_extsize;
-   fa->fsx_nextents = xfa.fsx_nextents;
-   fa->fsx_projid = xfa.fsx_projid;
-   fa->fsx_cowextsize = xfa.fsx_cowextsize;
+   fsxattr_to_fileattr(&xfa, fa);
 
return 0;
 }
diff --git a/include/linux/fileattr.h b/include/linux/fileattr.h
index 47c05a9851d0600964b644c9c7218faacfd865f8..31888fa2edf10050be134f587299256088344365 100644
--- a/include/linux/fileattr.h
+++ b/include/linux/fileattr.h
@@ -33,7 +33,9 @@ struct fileattr {
boolfsx_valid:1;
 };
 
+void fileattr_to_fsxattr(const struct fileattr *fa, struct fsxattr *fsx);
 int copy_fsxattr_to_user(const struct fileattr *fa, struct fsxattr __user *ufa);
+void fsxattr_to_fileattr(const struct fsxattr *fsx, struct fileattr *fa);
 
 void fileattr_fill_xflags(struct fileattr *fa, u32 xflags);
 void fileattr_fill_flags(struct fileattr *fa, u32 flags);

-- 
2.47.2

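The memset() in fileattr_to_fsxattr() matters because struct fsxattr is copied to userspace whole, including its fsx_pad bytes; zeroing it first keeps stale stack contents from leaking out. Below is a userspace model of both helpers. The fsxattr stand-in mirrors the field layout of the uapi struct fsxattr; the fileattr stand-in is a hypothetical subset of the kernel's struct fileattr, and fileattr_fill_xflags() is reduced to a plain copy.

```c
#include <string.h>

/* Stand-in whose fields mirror the uapi struct fsxattr in <linux/fs.h>. */
struct fsxattr_model {
	unsigned int fsx_xflags;
	unsigned int fsx_extsize;
	unsigned int fsx_nextents;
	unsigned int fsx_projid;
	unsigned int fsx_cowextsize;
	unsigned char fsx_pad[8];
};

/* Hypothetical subset of the kernel's struct fileattr. */
struct fileattr_model {
	unsigned int fsx_xflags;
	unsigned int fsx_extsize;
	unsigned int fsx_nextents;
	unsigned int fsx_projid;
	unsigned int fsx_cowextsize;
};

/* Zero the whole destination first so fsx_pad never carries stale bytes. */
static void model_fileattr_to_fsxattr(const struct fileattr_model *fa,
				      struct fsxattr_model *fsx)
{
	memset(fsx, 0, sizeof(*fsx));
	fsx->fsx_xflags = fa->fsx_xflags;
	fsx->fsx_extsize = fa->fsx_extsize;
	fsx->fsx_nextents = fa->fsx_nextents;
	fsx->fsx_projid = fa->fsx_projid;
	fsx->fsx_cowextsize = fa->fsx_cowextsize;
}

/* Reverse direction; the kernel derives flags via fileattr_fill_xflags(). */
static void model_fsxattr_to_fileattr(const struct fsxattr_model *fsx,
				      struct fileattr_model *fa)
{
	fa->fsx_xflags = fsx->fsx_xflags;
	fa->fsx_extsize = fsx->fsx_extsize;
	fa->fsx_nextents = fsx->fsx_nextents;
	fa->fsx_projid = fsx->fsx_projid;
	fa->fsx_cowextsize = fsx->fsx_cowextsize;
}
```
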



Re: distro support for CONFIG_KUNIT: [PATCH 0/3] bitmap: convert self-test to KUnit

2025-03-21 Thread Tamir Duberstein
Hi all, now that the printf and scanf series have been taken via kees'
tree[0] and sent in for v6.15-rc1[1], I wonder if we'd like to revisit
this discussion.

As I understand it, the primary objections to moving bitmap to KUnit were:
- Unclear benefits.
- Source churn.
- Extra dependencies for benchmarks.

Hopefully David's enumeration of the benefits of KUnit was compelling.
Regarding source churn: it is inevitable, but I did pay attention to
this and minimized the diff where possible.

The last point is trickiest, because KUnit doesn't have first-class
benchmark support, but nor is there a blessed benchmark facility in
the kernel generally. I'd prefer not to tie this series to distros
enabling CONFIG_KUNIT by default, which will take $time.

I think the most sensible thing we can do - if we accept that KUnit
has benefits to offer - is to split test_bitmap.c into
benchmark_bitmap.c and bitmap_kunit.c.

Please let me know your thoughts.
Tamir

[0] 
https://web.git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=move-lib-kunit-v6.15-rc1
[1] https://lore.kernel.org/all/202503170842.FFEE75351@keescook/



[PATCH v4 0/3] fs: introduce getfsxattrat and setfsxattrat syscalls

2025-03-21 Thread Andrey Albershteyn
This patchset introduces two new syscalls, getfsxattrat() and
setfsxattrat(). These syscalls are similar to the FS_IOC_FSGETXATTR and
FS_IOC_FSSETXATTR ioctls, except that they use *at() semantics. Therefore,
there's no need to open the file to get an fd.

These syscalls allow userspace to set filesystem inode attributes on
special files. One usage example is XFS project quotas.

XFS has project quotas which can be attached to a directory. All
new inodes in these directories inherit the project ID set on the
parent directory.

The project is created from userspace by opening each inode and calling
FS_IOC_FSSETXATTR on it. This is not possible for special
files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
with an empty project ID. Those inodes are then not counted in the quota
accounting but still exist in the directory. This is not critical, but
when special files are created in a directory with an already
existing project quota, these new inodes inherit extended attributes.
This creates a mix of special files with and without attributes.
Moreover, the attributes on such special files cannot be cleared or
changed. This, in turn, prevents userspace from re-creating the quota
project on these existing files.

Christian, if this gets into a mergeable state, please don't merge it
yet. Amir suggested that these syscalls should use the updated struct
fsxattr with masking from Pali Rohár's patchset, so let's see how it goes.

NAME

getfsxattrat/setfsxattrat - get/set filesystem inode attributes

SYNOPSIS

#include <sys/syscall.h>      /* Definition of SYS_* constants */
#include <unistd.h>

long syscall(SYS_getfsxattrat, int dirfd, const char *pathname,
struct fsxattr *fsx, size_t size,
unsigned int at_flags);
long syscall(SYS_setfsxattrat, int dirfd, const char *pathname,
struct fsxattr *fsx, size_t size,
unsigned int at_flags);

Note: glibc provides no wrappers for getfsxattrat()/setfsxattrat();
use syscall(2) instead.

DESCRIPTION

The syscalls take an fd and a path to the child together with a
struct fsxattr. If the path is absolute, fd is not used. If the path
is empty, the inode referred to by fd is used to get/set attributes on.

This is an alternative to the FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR
ioctls, with the difference that the file does not need to be open,
as it can be referenced by a path instead of an fd. This makes it
possible to manipulate filesystem inode attributes not only on regular
files but also on special ones. This is not possible with the
FS_IOC_FSSETXATTR ioctl, as for special files ioctl() cannot be
called directly on the filesystem inode using a file descriptor.

RETURN VALUE

On success, 0 is returned.  On error, -1 is returned, and errno
is set to indicate the error.

ERRORS

EINVAL  Invalid at_flag specified (only
AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH are
supported).

EINVAL  Size was smaller than any known version of
struct fsxattr.

EINVAL  Invalid combination of parameters provided in
fsxattr for this type of file.

E2BIG   Size of input argument struct fsxattr is too
big.

EBADF   Invalid file descriptor was provided.

EPERM   No permission to change this file.

EOPNOTSUPP  Filesystem does not support setting attributes
on this type of inode.

HISTORY

Added in Linux 6.14.

EXAMPLE

Create directory and file "mkdir ./dir && touch ./dir/foo" and then
execute the following program:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fsxattr, FS_XFLAG_NODUMP */

int
main(int argc, char **argv) {
	int dfd;
	int error;
	struct fsxattr fsx;

	dfd = open("./dir", O_RDONLY);
	if (dfd == -1) {
		printf("can not open ./dir: %s\n", strerror(errno));
		return 1;
	}

	/* 467: getfsxattrat(dirfd, pathname, fsx, size, at_flags) */
	error = syscall(467, dfd, "./foo", &fsx, sizeof(fsx), 0);
	if (error) {
		printf("can not call 467: %s\n", strerror(errno));
		return 1;
	}

	printf("dir/foo flags: %u\n", fsx.fsx_xflags);

	fsx.fsx_xflags |= FS_XFLAG_NODUMP;
	/* 468: setfsxattrat(dirfd, pathname, fsx, size, at_flags) */
	error = syscall(468, dfd, "./foo", &fsx, sizeof(fsx), 0);
	if (error) {
		printf("can not call 468: %s\n", strerror(errno));
		return 1;
	}

	printf("dir/foo flags: %u\n", fsx.fsx_xflags);

	close(dfd);
	return 0;
}

SEE ALSO

ioctl(2), ioctl_iflags(2), ioctl_xfs_fsgetxattr(2)

---
Changes in v4:
- Use getname_maybe_null() f

[PATCH v4 3/3] fs: introduce getfsxattrat and setfsxattrat syscalls

2025-03-21 Thread Andrey Albershteyn
From: Andrey Albershteyn 

Introduce getfsxattrat and setfsxattrat syscalls to manipulate inode
extended attributes/flags. The syscalls take parent directory fd and
path to the child together with struct fsxattr.

This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the difference
that the file does not need to be open, as it can be referenced by a path
instead of an fd. This makes it possible to manipulate inode extended
attributes not only on regular files but also on special ones. This
is not possible with the FS_IOC_FSSETXATTR ioctl, as for special files
ioctl() cannot be called directly on the filesystem inode using an fd.

This patch adds two new syscalls which allow userspace to get/set
extended inode attributes on special files by using a parent directory
and a path, in the style of the *at() syscalls.

CC: linux-...@vger.kernel.org
CC: linux-fsde...@vger.kernel.org
CC: linux-...@vger.kernel.org
Signed-off-by: Andrey Albershteyn 
Acked-by: Arnd Bergmann 
---
 arch/alpha/kernel/syscalls/syscall.tbl  |   2 +
 arch/arm/tools/syscall.tbl  |   2 +
 arch/arm64/tools/syscall_32.tbl |   2 +
 arch/m68k/kernel/syscalls/syscall.tbl   |   2 +
 arch/microblaze/kernel/syscalls/syscall.tbl |   2 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   2 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   2 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   2 +
 arch/parisc/kernel/syscalls/syscall.tbl |   2 +
 arch/powerpc/kernel/syscalls/syscall.tbl|   2 +
 arch/s390/kernel/syscalls/syscall.tbl   |   2 +
 arch/sh/kernel/syscalls/syscall.tbl |   2 +
 arch/sparc/kernel/syscalls/syscall.tbl  |   2 +
 arch/x86/entry/syscalls/syscall_32.tbl  |   2 +
 arch/x86/entry/syscalls/syscall_64.tbl  |   2 +
 arch/xtensa/kernel/syscalls/syscall.tbl |   2 +
 fs/inode.c  | 130 
 include/linux/syscalls.h|   6 ++
 include/uapi/asm-generic/unistd.h   |   8 +-
 include/uapi/linux/fs.h |   3 +
 20 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index c59d53d6d3f3490f976ca179ddfe02e69265ae4d..4b9e687494c16b60c6fd6ca1dc4d6564706a7e25 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -506,3 +506,5 @@
 574common  getxattrat  sys_getxattrat
 575common  listxattrat sys_listxattrat
 576common  removexattrat   sys_removexattrat
+577common  getfsxattratsys_getfsxattrat
+578common  setfsxattratsys_setfsxattrat
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index 49eeb2ad8dbd8e074c6240417693f23fb328afa8..66466257f3c2debb3e2299f0b608c6740c98cab2 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -481,3 +481,5 @@
 464common  getxattrat  sys_getxattrat
 465common  listxattrat sys_listxattrat
 466common  removexattrat   sys_removexattrat
+467common  getfsxattratsys_getfsxattrat
+468common  setfsxattratsys_setfsxattrat
diff --git a/arch/arm64/tools/syscall_32.tbl b/arch/arm64/tools/syscall_32.tbl
index 69a829912a05eb8a3e21ed701d1030e31c0148bc..9c516118b154811d8d11d5696f32817430320dbf 100644
--- a/arch/arm64/tools/syscall_32.tbl
+++ b/arch/arm64/tools/syscall_32.tbl
@@ -478,3 +478,5 @@
 464common  getxattrat  sys_getxattrat
 465common  listxattrat sys_listxattrat
 466common  removexattrat   sys_removexattrat
+467common  getfsxattratsys_getfsxattrat
+468common  setfsxattratsys_setfsxattrat
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index f5ed71f1910d09769c845c2d062d99ee0449437c..159476387f394a92ee5e29db89b118c630372db2 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -466,3 +466,5 @@
 464common  getxattrat  sys_getxattrat
 465common  listxattrat sys_listxattrat
 466common  removexattrat   sys_removexattrat
+467common  getfsxattratsys_getfsxattrat
+468common  setfsxattratsys_setfsxattrat
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 680f568b77f2cbefc3eacb2517f276041f229b1e..a6d59ee740b58cacf823702003cf9bad17c0d3b7 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -472,3 +472,5 @@
 464common  getxattrat  sys_getxattrat
 465common  listxattrat sys_listxattrat
 466common  removexattrat 

Re: [PATCH v4 10/14] s390: Add support for suppressing warning backtraces

2025-03-21 Thread Alessandro Carminati
Hello Guenter,
Sorry for being late to the party.

On Fri, Mar 21, 2025 at 6:06 PM Guenter Roeck  wrote:
>
> On 3/13/25 04:43, Alessandro Carminati wrote:
> > From: Guenter Roeck 
> >
> > Add name of functions triggering warning backtraces to the __bug_table
> > object section to enable support for suppressing WARNING backtraces.
> >
> > To limit image size impact, the pointer to the function name is only added
> > to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
> > CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
> > parameter is replaced with a (dummy) NULL parameter to avoid an image size
> > increase due to unused __func__ entries (this is necessary because
> > __func__ is not a define but a virtual variable).
> >
> > Tested-by: Linux Kernel Functional Testing 
> > Acked-by: Dan Carpenter 
> > Cc: Heiko Carstens 
> > Cc: Vasily Gorbik 
> > Cc: Alexander Gordeev 
> > Signed-off-by: Guenter Roeck 
> > Signed-off-by: Alessandro Carminati 
> > ---
> >   arch/s390/include/asm/bug.h | 17 ++---
> >   1 file changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/s390/include/asm/bug.h b/arch/s390/include/asm/bug.h
> > index c500d45fb465..44d4e9f24ae0 100644
> > --- a/arch/s390/include/asm/bug.h
> > +++ b/arch/s390/include/asm/bug.h
> > @@ -8,6 +8,15 @@
> >
> >   #ifdef CONFIG_DEBUG_BUGVERBOSE
> >
> > +#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE
> > +# define HAVE_BUG_FUNCTION
> > +# define __BUG_FUNC_PTR  "   .long   %0-.\n"
> > +# define __BUG_FUNC  __func__
>
> gcc 7.5.0 on s390 barfs; it doesn't like the use of "__func__" with "%0-."
>
> drivers/gpu/drm/bridge/analogix/analogix-i2c-dptx.c: In function 'anx_dp_aux_transfer':
> ././include/linux/compiler_types.h:492:20: warning: asm operand 0 probably doesn't match constraints
>
> I was unable to find an alternate constraint that the compiler would accept.
>
> I don't know if the same problem is seen with older compilers on other
> architectures, or if the problem is relevant in the first place.
>
> gcc 10.3.0 and later do not have this problem. I also tried s390 builds with
> gcc 9.4 and 9.5 but they both crash for unrelated reasons.
>
> If this is a concern, the best idea I have is to make KUNIT_SUPPRESS_BACKTRACE
> depend on, say,
> depends on CC_IS_CLANG || (CC_IS_GCC && GCC_VERSION >= 100300)
>
> A more complex solution might be to define an architecture flag such
> as HAVE_SUPPRESS_BACKTRACE, make that conditional on the gcc version
> for s390 only, and make KUNIT_SUPPRESS_BACKTRACE depend on it.

I've spent some time trying to better define the problem.
Although it may look like the old compiler simply doesn't work, I
believe the issue is a bit more complex.

So, let me share some code and then comment on it.
$ cat bug-s390.c
#include "bug_entry.h"
#define asm_inline asm __inline
# define __BUG_FUNC_PTR " .long %0-.\n"
# define __BUG_FUNC __func__
#define __EMIT_BUG(x) do { \
	asm_inline volatile( \
		"0: mc 0,0\n" \
		".section .rodata.str,\"aMS\",@progbits,1\n" \
		"1: .asciz \""__FILE__"\"\n" \
		".previous\n" \
		".section __bug_table,\"aw\"\n" \
		"2: .long 0b-.\n" \
		" .long 1b-.\n" \
		__BUG_FUNC_PTR \
		" .short %1,%2\n" \
		" .org 2b+%3\n" \
		".previous\n" \
		: : "i" (__BUG_FUNC), \
		    "i" (__LINE__), \
		    "i" (x), \
		    "i" (sizeof(struct bug_entry))); \
} while (0)

#define BUG() do { \
	__EMIT_BUG(0); \
} while (0)

void f1(){
	BUG();
}
void f2(){
	BUG();
}
int main() {
	BUG();
	f1();
	f2();
	return 0;
}
$ # This is a stripped version of the s390x code for bug
$ ~/x-tools/s390x-ibm-linux-gnu_14/bin/s390x-ibm-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/home/alessandro/x-tools/s390x-ibm-linux-gnu_14/bin/s390x-ibm-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/home/alessandro/x-tools/s390x-ibm-linux-gnu_14/bin/../libexec/gcc/s390x-ibm-linux-gnu/14.2.0/lto-wrapper
Target: s390x-ibm-linux-gnu
Configured with:
/home/alessandro/src/s390x-toolchain/.build/s390x-ibm-linux-gnu/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=s390x-ibm-linux-gnu
--prefix=/home/alessandro/x-tools/s390x-ibm-linux-gnu
--exec_prefix=/home/alessandro/x-tools/s390x-ibm-linux-gnu
--with-sysroot=/home/alessandro/x-tools/s390x-ibm-linux-gnu/s390x-ibm-linux-gnu/sysroot
--enable-languages=c,c++ --with-pkgversion='crosstool-NG
1.27.0.18_7458341' --enable-__cxa_atexit --disable-libmudflap
--disable-libgomp --disable-libssp --disable-libquadmath
--disable-libquadmath-support --disable-libsanitizer --disable-libmpx
--with-gmp=/home/alessandro/src/s390x-toolchain/.build/s390x-ibm-linux-gnu/buildtools
--with-mpfr=/home/alessandro/src/s390x-toolchain/.build/s390x-ibm-linux-gnu/buildtools
--with-mpc=/home/alessandro/src/s390x-toolchain/.build/s390x-ibm-linux-gnu/buildtools
--with-isl=/home/alessandro/src/s390x-toolchain/.build/s390x-ibm-linux-gnu/buildtools
--enable-lto --enable-threads=posix --enable-target-optspace
--disable-plugin --disable-nls --disable-multil
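The `.long 0b-.`, `.long 1b-.` and `__BUG_FUNC_PTR` (`.long %0-.`) directives in the test case above each store a 32-bit offset relative to the address of the field itself, rather than an absolute pointer. The following is a minimal user-space sketch of that encoding, assuming a simplified entry layout; `struct rel_bug_entry`, `rel_store()`, `rel_load()` and `rel_selftest()` are hypothetical names, not the kernel's `struct bug_entry` or its helpers. It only illustrates the data layout and does not model the `"i"` constraint that gcc 7.5 objects to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified model of a __bug_table entry that uses
 * self-relative offsets: each field holds (target - &field), which is
 * what an assembler directive such as ".long 1b-." emits.
 */
struct rel_bug_entry {
	int32_t bug_addr_disp;	/* ".long 0b-."  -> trapping instruction */
	int32_t file_disp;	/* ".long 1b-."  -> file name string */
	int32_t func_disp;	/* ".long %0-."  -> function name (__func__) */
	uint16_t line;		/* ".short %1"   -> __LINE__ */
	uint16_t flags;		/* ".short %2"   -> BUG flags */
};

/* Encode: store the target as a displacement from the field itself. */
static void rel_store(int32_t *field, const void *target)
{
	*field = (int32_t)((const char *)target - (const char *)field);
}

/* Decode: add the stored displacement back to the field's address. */
static const char *rel_load(const int32_t *field)
{
	return (const char *)field + *field;
}

/* Keep entry and strings in one object so the displacements stay small. */
static struct {
	struct rel_bug_entry entry;
	char file[16];
	char func[16];
} blob = { .file = "bug-s390.c", .func = "f1" };

/* Round trip: encode both strings, then resolve them again. */
static void rel_selftest(void)
{
	rel_store(&blob.entry.file_disp, blob.file);
	rel_store(&blob.entry.func_disp, blob.func);
	assert(rel_load(&blob.entry.file_disp) == blob.file);
	assert(strcmp(rel_load(&blob.entry.func_disp), "f1") == 0);
}
```

Self-relative offsets keep each entry position-independent and half the size of 64-bit pointers, at the cost of requiring every target to be within +/-2 GiB of the table.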

Re: [RFC 2/3] mm: introduce GCMA

2025-03-21 Thread Suren Baghdasaryan
On Thu, Mar 20, 2025 at 10:14 PM Christoph Hellwig  wrote:
>
> On Thu, Mar 20, 2025 at 10:39:30AM -0700, Suren Baghdasaryan wrote:
> > From: Minchan Kim 
> >
> > This patch introduces GCMA (Guaranteed Contiguous Memory Allocator)
> > cleancache backend which reserves some amount of memory at boot
> > and then donates it to store clean file-backed pages in the cleancache.
> > GCMA aims to guarantee contiguous memory allocation success as well as
> > low and deterministic allocation latency.
> >
> > Notes:
> > Originally, the idea was posted by SeongJae Park and Minchan Kim [1].
> > Later Minchan reworked it to be used in Android as a reference for
> > Android vendors to use [2].
>
> That is not a very good summary. It needs to explain how you ensure
> that the pages do stay clean forever.

Sure, I'm happy to improve the description. Do you want more details
about how only clean pages end up in the cleancache and how they get
invalidated once the original page gets modified? Or is the concern
that donated pages might be changed by the donor without taking them
away from the cleancache?




Re: [RFC 1/3] mm: implement cleancache

2025-03-21 Thread Suren Baghdasaryan
On Thu, Mar 20, 2025 at 10:14 PM Christoph Hellwig  wrote:
>
> On Thu, Mar 20, 2025 at 10:39:29AM -0700, Suren Baghdasaryan wrote:
> > Cleancache can be thought of as a page-granularity victim cache for clean
>
> Please implement your semantics directly instead of with a single user
> abstraction.  If we ever need an abstraction we can add it once we have
> multiple consumers and know what they need.

If after the conference no other users emerge I will fold it into
GCMA. That's quite easy to do.
Thanks,
Suren.
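Cleancache is described in this thread as a page-granularity victim cache for clean pages: a clean page-cache page is copied into the backend when it is evicted, may be copied back on a later miss, and must be invalidated once the original file page is modified. The following is a minimal user-space model of that contract, not the RFC's API; every name in it (`cc_put()`, `cc_get()`, `cc_invalidate()`, the tiny page size, the round-robin replacement) is a hypothetical choice made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* tiny "page" for illustration only */
#define NR_SLOTS 4

/* One cached copy of a clean file page, keyed by (inode, page index). */
struct cc_slot {
	bool used;
	unsigned long ino;
	unsigned long index;
	unsigned char data[MODEL_PAGE_SIZE];
};

struct cleancache_model {
	struct cc_slot slot[NR_SLOTS];
	size_t next;	/* round-robin replacement cursor */
};

static struct cc_slot *cc_find(struct cleancache_model *cc,
			       unsigned long ino, unsigned long index)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		if (cc->slot[i].used && cc->slot[i].ino == ino &&
		    cc->slot[i].index == index)
			return &cc->slot[i];
	return NULL;
}

/* A clean page is leaving the page cache: keep a copy of it. */
static void cc_put(struct cleancache_model *cc, unsigned long ino,
		   unsigned long index, const unsigned char *page)
{
	struct cc_slot *s = cc_find(cc, ino, index);

	if (!s) {	/* take the next slot round-robin, evicting its content */
		s = &cc->slot[cc->next];
		cc->next = (cc->next + 1) % NR_SLOTS;
	}
	s->used = true;
	s->ino = ino;
	s->index = index;
	memcpy(s->data, page, MODEL_PAGE_SIZE);
}

/* Page-cache miss: on a hit, copy the page back and drop our copy. */
static bool cc_get(struct cleancache_model *cc, unsigned long ino,
		   unsigned long index, unsigned char *page)
{
	struct cc_slot *s = cc_find(cc, ino, index);

	if (!s)
		return false;
	memcpy(page, s->data, MODEL_PAGE_SIZE);
	s->used = false;
	return true;
}

/* The file page was modified or truncated: the cached copy is stale. */
static void cc_invalidate(struct cleancache_model *cc, unsigned long ino,
			  unsigned long index)
{
	struct cc_slot *s = cc_find(cc, ino, index);

	if (s)
		s->used = false;
}

static void cc_selftest(void)
{
	struct cleancache_model cc;
	unsigned char in[MODEL_PAGE_SIZE], out[MODEL_PAGE_SIZE];

	memset(&cc, 0, sizeof(cc));
	memset(in, 0xab, sizeof(in));
	cc_put(&cc, 1, 0, in);
	assert(cc_get(&cc, 1, 0, out) && memcmp(in, out, sizeof(in)) == 0);
	assert(!cc_get(&cc, 1, 0, out));	/* copy was dropped on the hit */
	cc_put(&cc, 1, 1, in);
	cc_invalidate(&cc, 1, 1);		/* e.g. the page was written */
	assert(!cc_get(&cc, 1, 1, out));
}
```

Dropping the copy on a hit (an exclusive cache) is only one possible policy; a backend could equally keep it until it is invalidated or evicted.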




Re: [linux-next-20250320][btrfs] Kernel OOPs while running btrfs/108

2025-03-21 Thread Ritesh Harjani (IBM)


+linux-btrfs

Venkat Rao Bagalkote  writes:

> Greetings!!!
>
>
> I am observing a kernel oops while running the btrfs/108 test case on an IBM Power system.
>
> Repo: Linux-Next (next-20250320)

Looks like this next tag had many btrfs related changes -
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/fs/btrfs?h=next-20250320

>
> Traces:
>
> [  418.392604] run fstests btrfs/108 at 2025-03-21 05:11:21
> [  418.560137] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
> [  418.560156] BUG: Kernel NULL pointer dereference on read at 0x

NULL pointer dereference... 

> [  418.560161] Faulting instruction address: 0xc10ef8b0
> [  418.560166] Oops: Kernel access of bad area, sig: 11 [#1]
> [  418.560169] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
> [  418.560174] Modules linked in: btrfs blake2b_generic xor raid6_pq zstd_compress loop nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 bonding nf_defrag_ipv4 tls rfkill ip_set nf_tables nfnetlink sunrpc pseries_rng vmx_crypto fuse ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth
> [  418.560212] CPU: 1 UID: 0 PID: 37583 Comm: rm Kdump: loaded Not tainted 6.14.0-rc7-next-20250320 #1 VOLUNTARY
> [  418.560218] Hardware name: IBM,9080-HEX Power11
> [  418.560223] NIP:  c10ef8b0 LR: c0080bb190ac CTR: c10ef888
> [  418.560227] REGS: c000a252f5a0 TRAP: 0300   Not tainted (6.14.0-rc7-next-20250320)
> [  418.560232] MSR:  80009033   CR: 44008444  XER: 2004
> [  418.560240] CFAR: c0080bc1df84 DAR:  DSISR: 4000 IRQMASK: 1
> [  418.560240] GPR00: c0080bb190ac c000a252f840 c16a8100
> [  418.560240] GPR04:  0001  fffe
> [  418.560240] GPR08: c0010724aad8 0003 1000 c0080bc1df70
> [  418.560240] GPR12: c10ef888 c00adb00
> [  418.560240] GPR16:
> [  418.560240] GPR20: c000777a8000 c0006a9c9000 c0010724a950 c000777a8000
> [  418.560240] GPR24: fffe c0010724aad8 0001 00a0
> [  418.560240] GPR28: 0001 c00c0048c3c0
> [  418.560287] NIP [c10ef8b0] _raw_spin_lock_irq+0x28/0x98
> [  418.560294] LR [c0080bb190ac] wait_subpage_spinlock+0x64/0xd0 [btrfs]


btrfs has been working on subpage size support for a while now.
Adding +linux-btrfs, in case they are already aware of this problem.

I am not that familiar with btrfs code, but does this look like the
subpage data (folio->private) somehow became NULL here?

-ritesh

> [  418.560339] Call Trace:
> [  418.560342] [c000a252f870] [c0080bb205dc] btrfs_invalidate_folio+0xa8/0x4f0 [btrfs]
> [  418.560384] [c000a252f930] [c04cbcdc] truncate_cleanup_folio+0x110/0x14c
> [  418.560391] [c000a252f960] [c04ccc7c] truncate_inode_pages_range+0x100/0x4dc
> [  418.560397] [c000a252fbd0] [c0080bb20ba8] btrfs_evict_inode+0x74/0x510 [btrfs]
> [  418.560437] [c000a252fc90] [c065c71c] evict+0x164/0x334
> [  418.560443] [c000a252fd30] [c0647c9c] do_unlinkat+0x2f4/0x3a4
> [  418.560449] [c000a252fde0] [c0647da0] sys_unlinkat+0x54/0xac
> [  418.560454] [c000a252fe10] [c0033498] system_call_exception+0x138/0x330
> [  418.560461] [c000a252fe50] [c000d05c] system_call_vectored_common+0x15c/0x2ec
> [  418.560468] --- interrupt: 3000 at 0x7fffb1b366bc
> [  418.560471] NIP:  7fffb1b366bc LR: 7fffb1b366bc CTR:
> [  418.560475] REGS: c000a252fe80 TRAP: 3000   Not tainted (6.14.0-rc7-next-20250320)
> [  418.560479] MSR:  8280f033   CR: 44008804  XER:
> [  418.560490] IRQMASK: 0
> [  418.560490] GPR00: 0124 7cb4e2b0 7fffb1c37d00 ff9c
> [  418.560490] GPR04: 00013d660380   0003
> [  418.560490] GPR08:
> [  418.560490] GPR12:  7fffb1dba5c0 7cb4e538 00011972d0e8
> [  418.560490] GPR16: 00011972d098 00011972d060 00011972d020 00011972cff0
> [  418.560490] GPR20: 00011972d298 00011972cc10  00013d6615a0
> [  418.560490] GPR24: 0002 00011972d0b8 00011972cf98 00011972d1d0
> [  418.560490] GPR28: 7cb4e538 00013d6602f0  0010
> [  418.560532] NIP [7fffb1b366bc] 0x7fffb1b366bc
> [  418.560536] LR [7fffb1b366bc] 0x7fffb1b366bc
> [  418.560538] --- interr

Re: [PATCH v4 10/14] s390: Add support for suppressing warning backtraces

2025-03-21 Thread Arnd Bergmann
On Fri, Mar 21, 2025, at 18:05, Guenter Roeck wrote:
> On 3/13/25 04:43, Alessandro Carminati wrote:
>
> gcc 10.3.0 and later do not have this problem. I also tried s390 builds
> with gcc 9.4 and 9.5 but they both crash for unrelated reasons.
>
> If this is a concern, the best idea I have is to make KUNIT_SUPPRESS_BACKTRACE
> depend on, say,
>   depends on CC_IS_CLANG || (CC_IS_GCC && GCC_VERSION >= 100300)
>
> A more complex solution might be to define an architecture flag such
> as HAVE_SUPPRESS_BACKTRACE, make that conditional on the gcc version
> for s390 only, and make KUNIT_SUPPRESS_BACKTRACE depend on it.

That is probably fine, there are very few users on s390 that build
their own kernels, and they likely all have modern compilers anyway.

I should still send a patch to raise the minimum compiler version to
gcc-8.1, but unfortunately that is not enough here.

 Arnd



Re: [PATCH v4 10/14] s390: Add support for suppressing warning backtraces

2025-03-21 Thread Guenter Roeck

On 3/13/25 04:43, Alessandro Carminati wrote:

From: Guenter Roeck 

Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.

To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).

Tested-by: Linux Kernel Functional Testing 
Acked-by: Dan Carpenter 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Alexander Gordeev 
Signed-off-by: Guenter Roeck 
Signed-off-by: Alessandro Carminati 
---
  arch/s390/include/asm/bug.h | 17 ++---
  1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/bug.h b/arch/s390/include/asm/bug.h
index c500d45fb465..44d4e9f24ae0 100644
--- a/arch/s390/include/asm/bug.h
+++ b/arch/s390/include/asm/bug.h
@@ -8,6 +8,15 @@
  
  #ifdef CONFIG_DEBUG_BUGVERBOSE
  
+#ifdef CONFIG_KUNIT_SUPPRESS_BACKTRACE

+# define HAVE_BUG_FUNCTION
+# define __BUG_FUNC_PTR  "   .long   %0-.\n"
+# define __BUG_FUNC  __func__


gcc 7.5.0 on s390 barfs; it doesn't like the use of "__func__" with "%0-."

drivers/gpu/drm/bridge/analogix/analogix-i2c-dptx.c: In function 'anx_dp_aux_transfer':
././include/linux/compiler_types.h:492:20: warning: asm operand 0 probably doesn't match constraints

I was unable to find an alternate constraint that the compiler would accept.

I don't know if the same problem is seen with older compilers on other
architectures, or if the problem is relevant in the first place.

gcc 10.3.0 and later do not have this problem. I also tried s390 builds with
gcc 9.4 and 9.5 but they both crash for unrelated reasons.

If this is a concern, the best idea I have is to make KUNIT_SUPPRESS_BACKTRACE
depend on, say,
depends on CC_IS_CLANG || (CC_IS_GCC && GCC_VERSION >= 100300)

A more complex solution might be to define an architecture flag such
as HAVE_SUPPRESS_BACKTRACE, make that conditional on the gcc version
for s390 only, and make KUNIT_SUPPRESS_BACKTRACE depend on it.

Guenter
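The threshold in the proposed `depends on` line uses the same major * 10000 + minor * 100 + patchlevel encoding as the kernel's `GCC_VERSION`, so 100300 means gcc 10.3.0 and the failing gcc 7.5.0 is 70500. A small C sketch of that arithmetic and of the proposed condition follows; `gcc_version()` and `suppress_backtrace_allowed()` are hypothetical helpers, and the real check would live in Kconfig, not in C.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Encode a gcc version as major * 10000 + minor * 100 + patchlevel,
 * the encoding used for the kernel's GCC_VERSION (10.3.0 -> 100300).
 */
static unsigned int gcc_version(unsigned int major, unsigned int minor,
				unsigned int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical C mirror of the proposed Kconfig expression:
 *   depends on CC_IS_CLANG || (CC_IS_GCC && GCC_VERSION >= 100300)
 */
static bool suppress_backtrace_allowed(bool cc_is_clang, bool cc_is_gcc,
				       unsigned int version)
{
	return cc_is_clang || (cc_is_gcc && version >= 100300);
}

static void gcc_gate_selftest(void)
{
	assert(gcc_version(10, 3, 0) == 100300);
	assert(!suppress_backtrace_allowed(false, true, gcc_version(7, 5, 0)));
	assert(!suppress_backtrace_allowed(false, true, gcc_version(9, 5, 0)));
	assert(suppress_backtrace_allowed(false, true, gcc_version(10, 3, 0)));
	assert(suppress_backtrace_allowed(true, false, 0));
}
```

Note that the proposed gate also excludes gcc 9.x, which could not be tested either way because those s390 builds crash for unrelated reasons.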




Re: [RFC 3/3] mm: integrate GCMA with CMA using dt-bindings

2025-03-21 Thread Suren Baghdasaryan
On Fri, Mar 21, 2025 at 7:06 AM Conor Dooley  wrote:
>
> On Thu, Mar 20, 2025 at 10:39:31AM -0700, Suren Baghdasaryan wrote:
> > This patch introduces a new "guarantee" property for shared-dma-pool.
> > With this property, admin can create specific memory pool as
> > GCMA-based CMA if they care about allocation success rate and latency.
> > The downside of GCMA is that it can host only clean file-backed pages
> > since it's using cleancache as its secondary user.
> >
> > Signed-off-by: Minchan Kim 
> > Signed-off-by: Suren Baghdasaryan 
> > ---
> >  arch/powerpc/kernel/fadump.c |  2 +-
> >  include/linux/cma.h  |  2 +-
> >  kernel/dma/contiguous.c  | 11 ++-
> >  mm/cma.c | 33 ++---
> >  mm/cma.h |  1 +
> >  mm/cma_sysfs.c   | 10 ++
> >  6 files changed, 49 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> > index 4b371c738213..4eb7be0cdcdb 100644
> > --- a/arch/powerpc/kernel/fadump.c
> > +++ b/arch/powerpc/kernel/fadump.c
> > @@ -111,7 +111,7 @@ void __init fadump_cma_init(void)
> >   return;
> >   }
> >
> > - rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma);
> > + rc = cma_init_reserved_mem(base, size, 0, "fadump_cma", &fadump_cma, false);
> >   if (rc) {
> >   pr_err("Failed to init cma area for firmware-assisted dump,%d\n", rc);
> >   /*
> > diff --git a/include/linux/cma.h b/include/linux/cma.h
> > index 62d9c1cf6326..3207db979e94 100644
> > --- a/include/linux/cma.h
> > +++ b/include/linux/cma.h
> > @@ -46,7 +46,7 @@ extern int __init cma_declare_contiguous_multi(phys_addr_t size,
> >  extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
> >   unsigned int order_per_bit,
> >   const char *name,
> > - struct cma **res_cma);
> > + struct cma **res_cma, bool gcma);
> >  extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> > bool no_warn);
> >  extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
> > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> > index 055da410ac71..a68b3123438c 100644
> > --- a/kernel/dma/contiguous.c
> > +++ b/kernel/dma/contiguous.c
> > @@ -459,6 +459,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
> >   unsigned long node = rmem->fdt_node;
> >   bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
> >   struct cma *cma;
> > + bool gcma;
> >   int err;
> >
> >   if (size_cmdline != -1 && default_cma) {
> > @@ -476,7 +477,15 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
> >   return -EINVAL;
> >   }
> >
> > - err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name, &cma);
> > + gcma = !!of_get_flat_dt_prop(node, "guarantee", NULL);
>
> When (or I guess if) this goes !RFC, you will need to document this
> new property that you're adding.

Definitely. I'll document the cleancache and GCMA as well.
Thanks!
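The new `guarantee` property is checked for presence only: `of_get_flat_dt_prop()` returns a pointer to the property value, or NULL when the property is absent, and `!!` turns that pointer into a boolean. A self-contained sketch of the same pattern, using a hypothetical `fake_get_prop()` in place of the real flattened-DT lookup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one property of a flattened-DT node. */
struct fake_prop {
	const char *name;
	const void *value;	/* non-NULL even for empty (boolean) properties */
};

/*
 * Hypothetical stand-in for of_get_flat_dt_prop(): pointer to the value
 * if the property exists, NULL otherwise.
 */
static const void *fake_get_prop(const struct fake_prop *props, size_t n,
				 const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].value;
	return NULL;
}

/* Presence alone sets the flag; "!!" makes the pointer-to-bool step explicit. */
static bool node_wants_gcma(const struct fake_prop *props, size_t n)
{
	return !!fake_get_prop(props, n, "guarantee");
}

static void gcma_prop_selftest(void)
{
	static const char empty[1] = "";
	const struct fake_prop with[] = { { "reg", "x" }, { "guarantee", empty } };
	const struct fake_prop without[] = { { "reg", "x" } };

	assert(node_wants_gcma(with, 2));
	assert(!node_wants_gcma(without, 1));
}
```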



Re: Using Restricted DMA for virtio-pci

2025-03-21 Thread David Woodhouse
On Fri, 2025-03-21 at 14:32 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 21, 2025 at 03:38:10PM +, David Woodhouse wrote:
> > On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > > This series implements mitigations for lack of DMA access control on
> > > systems without an IOMMU, which could result in the DMA accessing the
> > > system memory at unexpected times and/or unexpected addresses, possibly
> > > leading to data leakage or corruption.
> > 
> > Replying to an ancient (2021) thread which has already been merged...
> > 
> > I'd like to be able to use this facility for virtio devices.
> > 
> > Virtio already has a complicated relationship with the DMA API, because
> > there were a bunch of early VMM bugs where the virtio devices were
> > magically exempted from IOMMU protection, but the VMM lied to the guest
> > and claimed they weren't.
> > 
> > With the advent of confidential computing, and the VMM (or whatever's
> > emulating the virtio device) not being *allowed* to arbitrarily access
> > all of the guest's memory, the DMA API becomes necessary again.
> > 
> > Either a virtual IOMMU needs to determine which guest memory the VMM
> > may access, or the DMA API is a wrapper around operations which
> > share/unshare (or unencrypt/encrypt) the memory in question.
> > 
> > All of which is complicated and slow, if we're looking at a minimal
> > privileged hypervisor stub like pKVM which enforces the lack of guest
> > memory access from VMM.
> > 
> > I'm thinking of defining a new type of virtio-pci device which cannot
> > do DMA to arbitrary system memory. Instead it has an additional memory
> > BAR which is used as a SWIOTLB for bounce buffering.
> > 
> > The driver for it would look much like the existing virtio-pci device
> > except that it would register the restricted-dma region first (and thus
> > the swiotlb dma_ops), and then just go through the rest of the setup
> > like any other virtio device.
> > 
> > That seems like it ought to be fairly simple, and seems like a
> > reasonable way to allow an untrusted VMM to provide virtio devices with
> > restricted DMA access.
> > 
> > While I start actually doing the typing... does anyone want to start
> > yelling at me now? Christoph? mst? :)
> 
> 
> I don't mind as such (though I don't understand completely), but since
> this is changing the device anyway, I am a bit confused why you can't
> just set the VIRTIO_F_ACCESS_PLATFORM feature bit?  This forces DMA API
> which will DTRT for you, will it not?

That would be necessary but not sufficient. The question is *what* does
the DMA API do?

For a real passthrough PCI device, perhaps we'd have a vIOMMU exposed
to the guest so that it can do real protection with two-stage page
tables (IOVA→GPA under control of the guest, GPA→HPA under control of
the hypervisor). For that to work in the pKVM model though, you'd need
pKVM to be walking the guest's stage1 I/O page tables to see if a given
access from the VMM ought to be permitted?

Or for confidential guests there could be DMA ops which are an
'enlightenment'; a hypercall into pKVM to share/unshare pages so that
the VMM can actually access them, or SEV-SNP guests might mark pages
unencrypted to have the same effect with hardware protection.

Doing any of those dynamically to allow the VMM to access buffers in
arbitrary guest memory (when it wouldn't normally have access to
arbitrary guest memory) is complex and doesn't perform very well. It
also exposes a full 4KiB page for any byte that needs to be made available.
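The page-granularity cost is simple arithmetic: sharing or unencrypting works on whole pages, so even a small buffer exposes at least one full page, and a buffer that straddles a page boundary exposes two. A minimal sketch, assuming a 4 KiB sharing granule; `pages_to_share()` and `overexposed_bytes()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define SHARE_GRANULE 4096ULL	/* assumed 4 KiB sharing granularity */

/* How many whole pages must be shared to expose [addr, addr + len)? */
static uint64_t pages_to_share(uint64_t addr, uint64_t len)
{
	if (len == 0)
		return 0;
	return (addr + len - 1) / SHARE_GRANULE - addr / SHARE_GRANULE + 1;
}

/* Bytes the VMM can see beyond what the device actually needed. */
static uint64_t overexposed_bytes(uint64_t addr, uint64_t len)
{
	return pages_to_share(addr, len) * SHARE_GRANULE - len;
}

static void share_selftest(void)
{
	assert(pages_to_share(0x1000, 1) == 1);
	assert(overexposed_bytes(0x1000, 1) == 4095);
	/* An 8-byte buffer straddling a page boundary needs two pages. */
	assert(pages_to_share(0x1ffc, 8) == 2);
	assert(overexposed_bytes(0x1ffc, 8) == 8184);
}
```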

Thus the idea of having a fixed range of memory to use for a SWIOTLB,
which is pretty much what the restricted DMA setup is all about.

We're just proposing that we build it in to a virtio-pci device model,
which automatically uses the extra memory BAR instead of the
restricted-dma-pool DT node.

It's basically just allowing us to expose through PCI, what I believe
we can already do for virtio in DT.
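The SWIOTLB approach replaces per-buffer sharing with a fixed shared region plus copies. A toy model of bounce buffering, assuming a single region (standing in for the proposed extra memory BAR) split into fixed-size slots; `bounce_map()` and `bounce_unmap()` are hypothetical and far simpler than the kernel's swiotlb: there is no alignment handling, and it always copies in on map and out on unmap, as for a bidirectional mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 256	/* hypothetical size of the shared region */
#define SLOT_SIZE 64

/* "shared" is the only memory the (untrusted) device side may touch. */
struct bounce_pool {
	unsigned char shared[POOL_SIZE];
	unsigned char in_use[POOL_SIZE / SLOT_SIZE];
};

/* Map: claim a free slot and copy the driver's buffer into it. */
static unsigned char *bounce_map(struct bounce_pool *p, const void *buf,
				 size_t len)
{
	if (len > SLOT_SIZE)
		return NULL;
	for (size_t i = 0; i < POOL_SIZE / SLOT_SIZE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			memcpy(&p->shared[i * SLOT_SIZE], buf, len);
			return &p->shared[i * SLOT_SIZE];
		}
	}
	return NULL;	/* pool exhausted */
}

/* Unmap: copy the (possibly device-modified) data back, free the slot. */
static void bounce_unmap(struct bounce_pool *p, unsigned char *dev_addr,
			 void *buf, size_t len)
{
	size_t slot = (size_t)(dev_addr - p->shared) / SLOT_SIZE;

	memcpy(buf, dev_addr, len);
	p->in_use[slot] = 0;
}

static void bounce_selftest(void)
{
	struct bounce_pool pool;
	char msg[16] = "hello";
	unsigned char *dev;

	memset(&pool, 0, sizeof(pool));
	dev = bounce_map(&pool, msg, sizeof(msg));
	assert(dev != NULL);
	assert(dev >= pool.shared && dev < pool.shared + POOL_SIZE);
	dev[0] = 'J';			/* the device writes via the shared slot */
	bounce_unmap(&pool, dev, msg, sizeof(msg));
	assert(strcmp(msg, "Jello") == 0);
	assert(pool.in_use[0] == 0);
}
```

The design trade-off is the one described above: an extra copy per transfer, in exchange for never having to share or unshare arbitrary guest pages at runtime.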




Re: [linux-next-20250320][btrfs] Kernel OOPs while running btrfs/108

2025-03-21 Thread Qu Wenruo




On 2025/3/22 02:26, Ritesh Harjani (IBM) wrote:


+linux-btrfs

Venkat Rao Bagalkote  writes:


Greetings!!!


I am observing a kernel oops while running the btrfs/108 test case on an IBM Power system.

Repo: Linux-Next (next-20250320)


Looks like this next tag had many btrfs related changes -
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/fs/btrfs?h=next-20250320



Traces:

[  418.392604] run fstests btrfs/108 at 2025-03-21 05:11:21
[  418.560137] Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
[  418.560156] BUG: Kernel NULL pointer dereference on read at 0x


NULL pointer dereference...


[  418.560161] Faulting instruction address: 0xc10ef8b0
[  418.560166] Oops: Kernel access of bad area, sig: 11 [#1]
[  418.560169] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[  418.560174] Modules linked in: btrfs blake2b_generic xor raid6_pq zstd_compress loop nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 bonding nf_defrag_ipv4 tls rfkill ip_set nf_tables nfnetlink sunrpc pseries_rng vmx_crypto fuse ext4 mbcache jbd2 sd_mod sg ibmvscsi scsi_transport_srp ibmveth
[  418.560212] CPU: 1 UID: 0 PID: 37583 Comm: rm Kdump: loaded Not tainted 6.14.0-rc7-next-20250320 #1 VOLUNTARY
[  418.560218] Hardware name: IBM,9080-HEX Power11
[  418.560223] NIP:  c10ef8b0 LR: c0080bb190ac CTR: c10ef888
[  418.560227] REGS: c000a252f5a0 TRAP: 0300   Not tainted (6.14.0-rc7-next-20250320)
[  418.560232] MSR:  80009033   CR: 44008444  XER: 2004
[  418.560240] CFAR: c0080bc1df84 DAR:  DSISR: 4000 IRQMASK: 1
[  418.560240] GPR00: c0080bb190ac c000a252f840 c16a8100
[  418.560240] GPR04:  0001  fffe
[  418.560240] GPR08: c0010724aad8 0003 1000 c0080bc1df70
[  418.560240] GPR12: c10ef888 c00adb00
[  418.560240] GPR16:
[  418.560240] GPR20: c000777a8000 c0006a9c9000 c0010724a950 c000777a8000
[  418.560240] GPR24: fffe c0010724aad8 0001 00a0
[  418.560240] GPR28: 0001 c00c0048c3c0
[  418.560287] NIP [c10ef8b0] _raw_spin_lock_irq+0x28/0x98
[  418.560294] LR [c0080bb190ac] wait_subpage_spinlock+0x64/0xd0 [btrfs]



btrfs has been working on subpage size support for a while now.
Adding +linux-btrfs, in case they are already aware of this problem.

I am not that familiar with btrfs code, but does this look like the
subpage data (folio->private) somehow became NULL here?


The for-next branch seems to have some conflicts; IIRC the following two
commits are no longer in our tree:

btrfs: kill EXTENT_FOLIO_PRIVATE
btrfs: add mapping_set_release_always to inode's mapping

I believe those two may be the cause.

Would you mind testing with our current for-next branch? That's where all
of our development happens, and I run subpage fstests on it daily to make
sure at least that branch is safe:

  https://github.com/btrfs/linux/tree/for-next

I'd appreciate it if you could verify whether the NULL pointer dereference
is still there on that branch.

Thanks,
Qu



-ritesh


[  418.560339] Call Trace:
[  418.560342] [c000a252f870] [c0080bb205dc] btrfs_invalidate_folio+0xa8/0x4f0 [btrfs]
[  418.560384] [c000a252f930] [c04cbcdc] truncate_cleanup_folio+0x110/0x14c
[  418.560391] [c000a252f960] [c04ccc7c] truncate_inode_pages_range+0x100/0x4dc
[  418.560397] [c000a252fbd0] [c0080bb20ba8] btrfs_evict_inode+0x74/0x510 [btrfs]
[  418.560437] [c000a252fc90] [c065c71c] evict+0x164/0x334
[  418.560443] [c000a252fd30] [c0647c9c] do_unlinkat+0x2f4/0x3a4
[  418.560449] [c000a252fde0] [c0647da0] sys_unlinkat+0x54/0xac
[  418.560454] [c000a252fe10] [c0033498] system_call_exception+0x138/0x330
[  418.560461] [c000a252fe50] [c000d05c] system_call_vectored_common+0x15c/0x2ec
[  418.560468] --- interrupt: 3000 at 0x7fffb1b366bc
[  418.560471] NIP:  7fffb1b366bc LR: 7fffb1b366bc CTR:
[  418.560475] REGS: c000a252fe80 TRAP: 3000   Not tainted (6.14.0-rc7-next-20250320)
[  418.560479] MSR:  8280f033   CR: 44008804  XER:
[  418.560490] IRQMASK: 0
[  418.560490] GPR00: 0124 7cb4e2b0 7fffb1c37d00 ff9c
[  418.560490] GPR04: 00013d660380   0003
[  418.560490] GPR08:
[  418.560490] GPR12:  7fffb1dba5c0 7cb4e538 00011972d0e8
[  418.560490] GPR16: 00011972d098 00011972d060 00011972

Re: [PATCH] vfio: pci: Advertise INTx only if LINE is connected

2025-03-21 Thread Alex Williamson
On Thu, 20 Mar 2025 23:24:49 +0530
Shivaprasad G Bhat  wrote:

> On 3/18/25 11:28 PM, Alex Williamson wrote:
> > On Tue, 18 Mar 2025 17:29:21 +
> > Shivaprasad G Bhat  wrote:
> >  
> >> On POWER systems, when the device is behind the io expander,
> >> not all PCI slots would have the PCI_INTERRUPT_LINE connected.
> >> The firmware assigns a valid PCI_INTERRUPT_PIN though. In such
> >> configuration, the irq_info ioctl currently advertizes the
> >> irq count as 1 as the PCI_INTERRUPT_PIN is valid.
> >>
> >> The patch adds the additional check[1] if the irq is assigned
> >> for the PIN which is done iff the LINE is connected.
> >>
> >> [1]: 
> >> https://lore.kernel.org/qemu-devel/20250131150201.048aa3bf.alex.william...@redhat.com/
> >>
> >> Signed-off-by: Shivaprasad G Bhat 
> >> Suggested-By: Alex Williamson 
> >> ---
> >>   drivers/vfio/pci/vfio_pci_core.c |4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index 586e49efb81b..4ce70f05b4a8 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -734,6 +734,10 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
> >>return 0;
> >>   
> >>pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> >> +#if IS_ENABLED(CONFIG_PPC64)
> >> +  if (!vdev->pdev->irq)
> >> +  pin = 0;
> >> +#endif
> >>   
> >>return pin ? 1 : 0;
> >>} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> >>
> >>  
> > See:
> >
> > https://lore.kernel.org/all/20250311230623.1264283-1-alex.william...@redhat.com/
> >
> > Do we need to expand that to test !vdev->pdev->irq in
> > vfio_config_init()?  
> 
> Yes, that looks like the better option. I tried it and it works.
> 
> I see your patch has already got a Reviewed-by. Are you planning
> a v2, or do you want me to post a separate patch with this new check?

It seems worth noting this as an additional vector for virtualizing the
PIN register since we'd often expect the PIN is already zero if
pdev->irq is zero.  I posted a patch[1], please review/test.  Thanks,

Alex

[1] https://lore.kernel.org/all/20250320194145.2816379-1-alex.william...@redhat.com/
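The condition under discussion reduces to this: report one INTx interrupt only if PCI_INTERRUPT_PIN is non-zero and the platform actually routed an IRQ (non-zero `pdev->irq`). A minimal model of that rule, with hypothetical names (`struct model_pdev`, `model_intx_count()`); per the thread, the preferred fix virtualizes the PIN register instead of special-casing the IRQ count, and the patch in [1] is the authoritative version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of the state that matters here. */
struct model_pdev {
	uint8_t interrupt_pin;	/* PCI_INTERRUPT_PIN: 0 = none, 1..4 = INTA..INTD */
	unsigned int irq;	/* IRQ routed by firmware/platform; 0 = none */
};

/*
 * Number of INTx IRQs to advertise: the PIN is treated as 0 (no INTx)
 * when no IRQ was routed, even if firmware programmed a valid PIN.
 */
static int model_intx_count(const struct model_pdev *pdev)
{
	uint8_t pin = pdev->interrupt_pin;

	if (!pdev->irq)
		pin = 0;
	return pin ? 1 : 0;
}

static void intx_selftest(void)
{
	const struct model_pdev routed = { .interrupt_pin = 1, .irq = 42 };
	const struct model_pdev unrouted = { .interrupt_pin = 1, .irq = 0 };

	assert(model_intx_count(&routed) == 1);
	assert(model_intx_count(&unrouted) == 0);	/* the POWER case */
}
```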




Re: [linux-next-20250320][btrfs] Kernel OOPs while running btrfs/108

2025-03-21 Thread Venkat Rao Bagalkote



On 21/03/25 3:50 pm, Venkat Rao Bagalkote wrote:

Greetings!!!


I am observing a kernel oops while running the btrfs/108 test case on an IBM Power system.


Repo: Linux-Next (next-20250320)



Additional Info:

BTRFS tool: btrfs-progs v6.12

BTRFS tool repo: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

xfstests repo: https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/



Traces:

[  418.392604] run fstests btrfs/108 at 2025-03-21 05:11:21
[  418.560137] Kernel attempted to read user page (0) - exploit 
attempt? (uid: 0)

[  418.560156] BUG: Kernel NULL pointer dereference on read at 0x
[  418.560161] Faulting instruction address: 0xc10ef8b0
[  418.560166] Oops: Kernel access of bad area, sig: 11 [#1]
[  418.560169] LE PAGE_SIZE=64K MMU=Radix  SMP NR_CPUS=8192 NUMA pSeries
[  418.560174] Modules linked in: btrfs blake2b_generic xor raid6_pq 
zstd_compress loop nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib 
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 bonding 
nf_defrag_ipv4 tls rfkill ip_set nf_tables nfnetlink sunrpc 
pseries_rng vmx_crypto fuse ext4 mbcache jbd2 sd_mod sg ibmvscsi 
scsi_transport_srp ibmveth
[  418.560212] CPU: 1 UID: 0 PID: 37583 Comm: rm Kdump: loaded Not 
tainted 6.14.0-rc7-next-20250320 #1 VOLUNTARY

[  418.560218] Hardware name: IBM,9080-HEX Power11
[  418.560223] NIP:  c10ef8b0 LR: c0080bb190ac CTR: 
c10ef888
[  418.560227] REGS: c000a252f5a0 TRAP: 0300   Not tainted 
(6.14.0-rc7-next-20250320)
[  418.560232] MSR:  80009033   CR: 
44008444  XER: 2004
[  418.560240] CFAR: c0080bc1df84 DAR:  DSISR: 
4000 IRQMASK: 1
[  418.560240] GPR00: c0080bb190ac c000a252f840 
c16a8100 
[  418.560240] GPR04:  0001 
 fffe
[  418.560240] GPR08: c0010724aad8 0003 
1000 c0080bc1df70
[  418.560240] GPR12: c10ef888 c00adb00 
 
[  418.560240] GPR16:   
 
[  418.560240] GPR20: c000777a8000 c0006a9c9000 
c0010724a950 c000777a8000
[  418.560240] GPR24: fffe c0010724aad8 
0001 00a0
[  418.560240] GPR28: 0001 c00c0048c3c0 
 

[  418.560287] NIP [c10ef8b0] _raw_spin_lock_irq+0x28/0x98
[  418.560294] LR [c0080bb190ac] wait_subpage_spinlock+0x64/0xd0 
[btrfs]

[  418.560339] Call Trace:
[  418.560342] [c000a252f870] [c0080bb205dc] 
btrfs_invalidate_folio+0xa8/0x4f0 [btrfs]
[  418.560384] [c000a252f930] [c04cbcdc] 
truncate_cleanup_folio+0x110/0x14c
[  418.560391] [c000a252f960] [c04ccc7c] 
truncate_inode_pages_range+0x100/0x4dc
[  418.560397] [c000a252fbd0] [c0080bb20ba8] 
btrfs_evict_inode+0x74/0x510 [btrfs]

[  418.560437] [c000a252fc90] [c065c71c] evict+0x164/0x334
[  418.560443] [c000a252fd30] [c0647c9c] 
do_unlinkat+0x2f4/0x3a4
[  418.560449] [c000a252fde0] [c0647da0] 
sys_unlinkat+0x54/0xac
[  418.560454] [c000a252fe10] [c0033498] 
system_call_exception+0x138/0x330
[  418.560461] [c000a252fe50] [c000d05c] 
system_call_vectored_common+0x15c/0x2ec

[  418.560468] --- interrupt: 3000 at 0x7fffb1b366bc
[  418.560471] NIP:  7fffb1b366bc LR: 7fffb1b366bc CTR: 
[  418.560475] REGS: c000a252fe80 TRAP: 3000   Not tainted (6.14.0-rc7-next-20250320)
[  418.560479] MSR:  8280f033   CR: 44008804  XER: 
[  418.560490] IRQMASK: 0
[  418.560490] GPR00: 0124 7cb4e2b0 7fffb1c37d00 ff9c
[  418.560490] GPR04: 00013d660380   0003
[  418.560490] GPR08:    
[  418.560490] GPR12:  7fffb1dba5c0 7cb4e538 00011972d0e8
[  418.560490] GPR16: 00011972d098 00011972d060 00011972d020 00011972cff0
[  418.560490] GPR20: 00011972d298 00011972cc10  00013d6615a0
[  418.560490] GPR24: 0002 00011972d0b8 00011972cf98 00011972d1d0
[  418.560490] GPR28: 7cb4e538 00013d6602f0  0010

[  418.560532] NIP [7fffb1b366bc] 0x7fffb1b366bc
[  418.560536] LR [7fffb1b366bc] 0x7fffb1b366bc
[  418.560538] --- interrupt: 3000
[  418.560541] Code: 7c0803a6 4e800020 3c4c005c 38428878 7c0802a6 6000 3921 992d0932 a12d0008 3ce0fffe 5529083c 61290001 <7d001829> 7d063879 40c20018 7d063838

[  418.560555] ---[ end trace  ]---


If you happen to fix this, please add the tag below.


Reported-by: Venkat Rao Bagalkote 


Regards,

Venkat.






Re: Using Restricted DMA for virtio-pci

2025-03-21 Thread Michael S. Tsirkin
On Fri, Mar 21, 2025 at 03:38:10PM +, David Woodhouse wrote:
> On Tue, 2021-02-09 at 14:21 +0800, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> 
> Replying to an ancient (2021) thread which has already been merged...
> 
> I'd like to be able to use this facility for virtio devices.
> 
> Virtio already has a complicated relationship with the DMA API, because
> there were a bunch of early VMM bugs where the virtio devices were
> magically exempted from IOMMU protection, but the VMM lied to the guest
> and claimed they weren't.
> 
> With the advent of confidential computing, and the VMM (or whatever's
> emulating the virtio device) not being *allowed* to arbitrarily access
> all of the guest's memory, the DMA API becomes necessary again.
> 
> Either a virtual IOMMU needs to determine which guest memory the VMM
> may access, or the DMA API is a wrapper around operations which
> share/unshare (or unencrypt/encrypt) the memory in question.
> 
> All of which is complicated and slow, if we're looking at a minimal
> privileged hypervisor stub like pKVM which enforces the lack of guest
> memory access from VMM.
> 
> I'm thinking of defining a new type of virtio-pci device which cannot
> do DMA to arbitrary system memory. Instead it has an additional memory
> BAR which is used as a SWIOTLB for bounce buffering.
> 
> The driver for it would look much like the existing virtio-pci device
> except that it would register the restricted-dma region first (and thus
> the swiotlb dma_ops), and then just go through the rest of the setup
> like any other virtio device.
> 
> That seems like it ought to be fairly simple, and seems like a
> reasonable way to allow an untrusted VMM to provide virtio devices with
> restricted DMA access.
> 
> While I start actually doing the typing... does anyone want to start
> yelling at me now? Christoph? mst? :)


I don't mind as such (though I don't understand completely), but since
this is changing the device anyway, I am a bit confused: why can't you
just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces the DMA
API, which will DTRT (do the right thing) for you, will it not?

-- 
MST




Re: distro support for CONFIG_KUNIT: [PATCH 0/3] bitmap: convert self-test to KUnit

2025-03-21 Thread Yury Norov
On Fri, Mar 21, 2025 at 12:53:36PM -0400, Tamir Duberstein wrote:
> Hi all, now that the printf and scanf series have been taken via kees'
> tree[0] and sent in for v6.15-rc1[1], I wonder if we'd like to revisit
> this discussion.
> 
> As I understand it, the primary objections to moving bitmap to KUnit were:
> - Unclear benefits.
> - Source churn.
> - Extra dependencies for benchmarks.
> 
> Hopefully David's enumeration of the benefits of KUnit was compelling.
> Regarding source churn: it is inevitable, but I did pay attention to
> this and minimized the diff where possible.
> 
> The last point is trickiest, because KUnit doesn't have first-class
> benchmark support, but nor is there a blessed benchmark facility in
> the kernel generally. I'd prefer not to tie this series to distros
> enabling CONFIG_KUNIT by default, which will take $time.
> 
> I think the most sensible thing we can do - if we accept that KUnit
> has benefits to offer - is to split test_bitmap.c into
> benchmark_bitmap.c and bitmap_kunit.c.
> 
> Please let me know your thoughts.

Sure, no problem.

I asked you to answer 4 very simple and specific questions. You
didn't answer any of them. David sent a lengthy email that doesn't
address them, either.

None of you guys have submitted anything to bitmaps, neither to the
library nor to the tests. Your opinion about what is good for bitmap
development and what's not is purely theoretical.

Real contributors have never been concerned about the current testing model.

I think that you don't care about bitmaps. If bitmap testing gets
broken one day, or becomes more complicated, you will not come to help.
If I'm wrong and you are willing to contribute, you're warmly welcome!
I always encourage people to increase testing coverage.

If you'd like to add new cases to the existing tests, I'll be happy. If
you'd like to add completely new tests based on KUnit or whatever
else, I'll be happy as well.

Thanks,
Yury