[linux-next-20250307] Build Failure

2025-03-09 Thread Venkat Rao Bagalkote

Greetings!!,

I see linux-next-20250307 fails to build on IBM Power9 and Power10 servers.


Errors:

In file included from <command-line>:
./usr/include/cxl/features.h:11:10: fatal error: uuid/uuid.h: No such file or directory

   11 | #include <uuid/uuid.h>
      |          ^~~~~~~~~~~~~
compilation terminated.
make[4]: *** [usr/include/Makefile:85: usr/include/cxl/features.hdrtest] 
Error 1

make[3]: *** [scripts/Makefile.build:461: usr/include] Error 2
make[2]: *** [scripts/Makefile.build:461: usr] Error 2
make[2]: *** Waiting for unfinished jobs
arch/powerpc/kernel/switch.o: warning: objtool: .text+0x4: 
intra_function_call not a direct call
arch/powerpc/crypto/ghashp8-ppc.o: warning: objtool: .text+0x22c: 
unannotated intra-function call
arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe84: 
intra_function_call not a direct call

make[1]: *** [/home/linux_src/linux/Makefile:1997: .] Error 2
make: *** [Makefile:251: __sub-make] Error 2

Please add the below tag if you happen to fix this issue.

Reported-by: Venkat Rao Bagalkote 


Regards,

Venkat.




Re: [PATCH] book3s64/radix : Align section vmemmap start address to PAGE_SIZE

2025-03-09 Thread Donet Tom



On 3/8/25 9:16 AM, Aneesh Kumar K.V wrote:

Donet Tom  writes:


A vmemmap altmap is a device-provided region used to provide
backing storage for struct pages. For each namespace, the altmap
should belong to that same namespace. If the namespaces are
created unaligned, there is a chance that the section vmemmap
start address could also be unaligned. If the section vmemmap
start address is unaligned, the altmap page allocated from the
current namespace might be used by the previous namespace also.
During the free operation, since the altmap is shared between two
namespaces, the previous namespace may detect that the page does
not belong to its altmap and incorrectly assume that the page is a
normal page. It then attempts to free the normal page, which leads
to a kernel crash.

In this patch, we are aligning the section vmemmap start address
to PAGE_SIZE. After alignment, the start address will not be
part of the current namespace, and a normal page will be allocated
for the vmemmap mapping of the current section. For the remaining
sections, altmaps will be allocated. During the free operation,
the normal page will be correctly freed.

Without this patch
==================
 NS1 start                     NS2 start
  _____________________________________________________
 |               NS1               |       NS2        |
  -----------------------------------------------------
 | Altmap | Altmap | ..........| Altmap | Altmap | ...
 |  NS1   |  NS1   |           |  NS2   |  NS2   |

In the above scenario, NS1 and NS2 are two namespaces. The vmemmap
for NS1 comes from Altmap NS1, which belongs to NS1, and the
vmemmap for NS2 comes from Altmap NS2, which belongs to NS2.

The vmemmap start for NS2 is not aligned, so Altmap NS2 is shared
by both NS1 and NS2. During the free operation in NS1, Altmap NS2
is not part of NS1's altmap, causing it to attempt to free an
invalid page.

With this patch
===============
 NS1 start                 NS2 start
  _____________________________________________________
 |             NS1             |        NS2            |
  -----------------------------------------------------
 | Altmap | Altmap | ......| Normal | Altmap | Altmap | ...
 |  NS1   |  NS1   |       |  Page  |  NS2   |  NS2   |

If the vmemmap start for NS2 is not aligned, a normal page is
allocated for it. The NS1 and NS2 vmemmaps will then be freed correctly.
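
For illustration, a small standalone sketch (user-space C with purely
illustrative constants, not the kernel code) of the rounding the patch
applies to the section vmemmap start address:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE        0x10000UL  /* 64K pages, as on ppc64 radix */
#define ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

int main(void)
{
        uint64_t unaligned_start = 0xc00c000000a12340ULL; /* hypothetical */
        uint64_t aligned_start = ALIGN_DOWN(unaligned_start, PAGE_SIZE);

        /* The aligned start is what the altmap boundary check now sees,
         * so the first vmemmap page is backed by a normal RAM page. */
        printf("0x%llx -> 0x%llx\n",
               (unsigned long long)unaligned_start,
               (unsigned long long)aligned_start);
        return 0;
}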

Fixes: 368a0590d954 ("powerpc/book3s64/vmemmap: switch radix to use a different 
vmemmap handling function")
Co-developed-by: Ritesh Harjani (IBM) 
Signed-off-by: Ritesh Harjani (IBM) 
Signed-off-by: Donet Tom 
---
  arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 311e2112d782..b22d5f6147d2 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1120,6 +1120,8 @@ int __meminit radix__vmemmap_populate(unsigned long 
start, unsigned long end, in
pmd_t *pmd;
pte_t *pte;


/*
 * Make sure we align the start vmemmap addr so that we calculate
 * the correct start_pfn in the altmap boundary check to decide whether
 * we should use altmap or RAM based backing memory allocation. Also
 * the address needs to be aligned for the set_pte operation.
 *
 * If the start addr is already PMD_SIZE aligned we will try to use
 * a pmd mapping. We don't want to be too aggressive here because
 * that will cause more allocations in RAM. So only if the namespace
 * vmemmap start addr is PMD_SIZE aligned will we use a PMD mapping.
 */

Maybe with some comments as above?



Sure, I will update it and send a V3.





+   start = ALIGN_DOWN(start, PAGE_SIZE);
+
for (addr = start; addr < end; addr = next) {
next = pmd_addr_end(addr, end);
  
--

2.43.5




[PATCH v7 5/7] powerpc/pseries: Add ibm,get-dynamic-sensor-state RTAS call support

2025-03-09 Thread Haren Myneni
The RTAS call ibm,get-dynamic-sensor-state is used to get the
sensor state identified by the location code and the sensor
token. The librtas library provides an API,
rtas_get_dynamic_sensor(), which uses /dev/mem access for work
area allocation, but that access is restricted under system lockdown.

This patch adds a new ioctl, PAPR_DYNAMIC_SENSOR_IOC_GET, to the
papr-indices character driver, which executes this RTAS call and
copies the sensor state into the user-specified ioctl buffer.

Refer to PAPR 7.3.19 ibm,get-dynamic-sensor-state for more
information on this RTAS call.
- User input parameters to the RTAS call: location code string
  and the sensor token

Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
 int fd = open("/dev/papr-indices", O_RDWR);
 int ret = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET,
struct papr_indices_io_block)
  - The user space specifies input parameters in
papr_indices_io_block struct
  - Returned state for the specified sensor is copied to
papr_indices_io_block.dynamic_param.state
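
For reference, a minimal user-space sketch of this model; the uapi header
path and the exact field types are assumptions, only the fields named
above are taken from the patch:

/* Sketch of PAPR_DYNAMIC_SENSOR_IOC_GET usage as described above. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <asm/papr-indices.h>   /* assumed install path of the uapi header */

static int get_sensor_state(__u32 token, const char *loc_code, __u32 *state)
{
        struct papr_indices_io_block b;
        int fd, ret;

        fd = open("/dev/papr-indices", O_RDWR);
        if (fd < 0)
                return -1;

        memset(&b, 0, sizeof(b));
        b.dynamic_param.token = token;
        strncpy(b.dynamic_param.location_code_str, loc_code,
                sizeof(b.dynamic_param.location_code_str) - 1);

        ret = ioctl(fd, PAPR_DYNAMIC_SENSOR_IOC_GET, &b);
        if (!ret)
                *state = b.dynamic_param.state; /* state filled by the driver */

        close(fd);
        return ret;
}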

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |  1 +
 arch/powerpc/kernel/rtas.c|  2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 67 +++
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 2da52f59e4c6..fcd822f0e1d7 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -517,6 +517,7 @@ extern unsigned long rtas_rmo_buf;
 extern struct mutex rtas_ibm_get_vpd_lock;
 extern struct mutex rtas_ibm_get_indices_lock;
 extern struct mutex rtas_ibm_set_dynamic_indicator_lock;
+extern struct mutex rtas_ibm_get_dynamic_sensor_state_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 88fa416730af..a4848e7f248e 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -92,12 +92,12 @@ struct rtas_function {
  * Per-function locks for sequence-based RTAS functions.
  */
 static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
-static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
 DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
+DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/papr-indices.c 
b/arch/powerpc/platforms/pseries/papr-indices.c
index a5f344cc2e53..fb4586bf3c69 100644
--- a/arch/powerpc/platforms/pseries/papr-indices.c
+++ b/arch/powerpc/platforms/pseries/papr-indices.c
@@ -372,6 +372,67 @@ static long papr_dynamic_indicator_ioc_set(struct 
papr_indices_io_block __user *
return ret;
 }
 
+/**
+ * papr_dynamic_sensor_ioc_get - ibm,get-dynamic-sensor-state RTAS Call
+ * PAPR 2.13 7.3.19
+ *
+ * @ubuf: Input parameters to the RTAS call such as the sensor token.
+ *        Copies the returned sensor state into the user space buffer.
+ *
+ * Returns 0 on success or -errno.
+ */
+static long papr_dynamic_sensor_ioc_get(struct papr_indices_io_block __user 
*ubuf)
+{
+   struct papr_indices_io_block kbuf;
+   struct rtas_work_area *work_area;
+   s32 fwrc, token, ret;
+   u32 rets;
+
+   token = rtas_function_token(RTAS_FN_IBM_GET_DYNAMIC_SENSOR_STATE);
+   if (token == RTAS_UNKNOWN_SERVICE)
+   return -ENOENT;
+
+   mutex_lock(&rtas_ibm_get_dynamic_sensor_state_lock);
+   work_area = papr_dynamic_indice_buf_from_user(ubuf, &kbuf);
+   if (IS_ERR(work_area)) {
+   ret = PTR_ERR(work_area);
+   goto out;
+   }
+
+   do {
+   fwrc = rtas_call(token, 2, 2, &rets,
+   kbuf.dynamic_param.token,
+   rtas_work_area_phys(work_area));
+   } while (rtas_busy_delay(fwrc));
+
+   rtas_work_area_free(work_area);
+
+   switch (fwrc) {
+   case RTAS_SUCCESS:
+   if (put_user(rets, &ubuf->dynamic_param.state))
+   ret = -EFAULT;
+   else
+   ret = 0;
+   break;
+   case RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR:  /* No such indicator */
+   ret = -EOPNOTSUPP;
+   break;
+   default:
+   pr_err("unexpected ibm,get-dynamic-sensor result %d\n",
+   fwrc);
+   fallthrough;
+   case RTAS_HARDWARE_ERROR:   /* Hardware/platform error */
+   ret = -EIO;
+   break;
+   }
+
+out:
+   mutex_unlock(&rtas_ibm_get_dynamic_sensor_state_lock);
+   return ret;
+}
+
 /*
  * Top-level ioctl han

[PATCH v7 1/7] powerpc/pseries: Define common functions for RTAS sequence calls

2025-03-09 Thread Haren Myneni
An RTAS call can be a normal call, which retrieves the data from the
hypervisor in a single invocation, or a sequence-based call, which
has to be issued multiple times until the complete data is obtained.
For some of these sequence RTAS calls, the OS should not interleave
calls with different input until the sequence is completed.
The data is collected for each call and copied into a buffer for the
entire sequence during the ioctl() handler, and this buffer is then
exposed to user space with the read() handler.

One such sequence RTAS call is ibm,get-vpd, and its support is
already included in the current code. To add similar support for
other sequence-based calls, move the common functions into a
separate file and update the papr_rtas_sequence struct with the
following callbacks so that RTAS-call-specific code can be
defined and executed to complete the sequence.

struct papr_rtas_sequence {
        int error;
        void *params;
        void (*begin)(struct papr_rtas_sequence *);
        void (*end)(struct papr_rtas_sequence *);
        const char *(*work)(struct papr_rtas_sequence *, size_t *);
};

params: Input parameters passed to the RTAS call.
begin:  RTAS call specific function to initialize data,
        including work area allocation.
end:    RTAS call specific function to free up resources
        (free work area) after the sequence is completed.
work:   The actual RTAS call specific function which collects
        the data from the hypervisor.
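
For illustration, a small standalone user-space sketch of how the
begin/work/end callbacks drive a sequence; everything except the struct
layout above is hypothetical:

#include <stddef.h>
#include <stdio.h>

struct papr_rtas_sequence {
        int error;
        void *params;
        void (*begin)(struct papr_rtas_sequence *);
        void (*end)(struct papr_rtas_sequence *);
        const char *(*work)(struct papr_rtas_sequence *, size_t *);
};

static void demo_begin(struct papr_rtas_sequence *seq)
{
        seq->error = 0;         /* e.g. allocate a work area here */
}

static void demo_end(struct papr_rtas_sequence *seq)
{
        (void)seq;              /* e.g. free the work area here */
}

static const char *demo_work(struct papr_rtas_sequence *seq, size_t *len)
{
        static int calls;

        (void)seq;
        if (calls++)            /* pretend the second call ends the sequence */
                return NULL;
        *len = 5;
        return "chunk";
}

int main(void)
{
        struct papr_rtas_sequence seq = {
                .begin = demo_begin, .end = demo_end, .work = demo_work,
        };
        const char *data;
        size_t len;

        seq.begin(&seq);
        /* The work callback is invoked until it returns NULL. */
        while ((data = seq.work(&seq, &len)) != NULL)
                printf("got %zu bytes: %s\n", len, data);
        seq.end(&seq);
        return seq.error;
}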

Signed-off-by: Haren Myneni 
---
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 .../platforms/pseries/papr-rtas-common.c  | 313 
 .../platforms/pseries/papr-rtas-common.h  |  61 +++
 arch/powerpc/platforms/pseries/papr-vpd.c | 350 +++---
 4 files changed, 418 insertions(+), 308 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.c
 create mode 100644 arch/powerpc/platforms/pseries/papr-rtas-common.h

diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 7bf506f6b8c8..697c216b70dc 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o rtas-work-area.o papr-sysparm.o \
-  papr-vpd.o \
+  papr-rtas-common.o papr-vpd.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobility.o rng.o \
   pci.o pci_dlpar.o eeh_pseries.o msi.o \
diff --git a/arch/powerpc/platforms/pseries/papr-rtas-common.c 
b/arch/powerpc/platforms/pseries/papr-rtas-common.c
new file mode 100644
index ..a01a4d913ead
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/papr-rtas-common.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) "papr-common: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "papr-rtas-common.h"
+
+/*
+ * A sequence based RTAS call has to be issued multiple times to retrieve
+ * the complete data from the hypervisor. For some of these RTAS calls,
+ * the OS should not interleave calls with different input until the
+ * sequence is completed. So data is collected for these calls during the
+ * ioctl handler and exported to user space with the read() handler.
+ * This file provides common functions needed for such sequence based
+ * RTAS calls, e.g. ibm,get-vpd and ibm,get-indices.
+ */
+
+bool papr_rtas_blob_has_data(const struct papr_rtas_blob *blob)
+{
+   return blob->data && blob->len;
+}
+
+void papr_rtas_blob_free(const struct papr_rtas_blob *blob)
+{
+   if (blob) {
+   kvfree(blob->data);
+   kfree(blob);
+   }
+}
+
+/**
+ * papr_rtas_blob_extend() - Append data to a &struct papr_rtas_blob.
+ * @blob: The blob to extend.
+ * @data: The new data to append to @blob.
+ * @len:  The length of @data.
+ *
+ * Context: May sleep.
+ * Return: -ENOMEM on allocation failure, 0 otherwise.
+ */
+static int papr_rtas_blob_extend(struct papr_rtas_blob *blob,
+   const char *data, size_t len)
+{
+   const size_t new_len = blob->len + len;
+   const size_t old_len = blob->len;
+   const char *old_ptr = blob->data;
+   char *new_ptr;
+
+   new_ptr = kvrealloc(old_ptr, new_len, GFP_KERNEL_ACCOUNT);
+   if (!new_ptr)
+   return -ENOMEM;
+
+   memcpy(&new_ptr[old_len], data, len);
+   blob->data = new_ptr;
+   blob->len = new_len;
+   return 0;
+}
+
+/**
+ * papr_rtas_blob_generate() - Construct a new &struct papr_rtas_blob.
+ * @seq: work function of the caller that is called to obtain
+ *   data with the caller RTAS call.
+ *
+ * The @work callback is invoked until it returns NULL. @seq is
+ * passed to @work in its first argument on each c

[PATCH v7 6/7] powerpc/pseries: Add papr-platform-dump character driver for dump retrieval

2025-03-09 Thread Haren Myneni
The ibm,platform-dump RTAS call, in combination with a writable
mapping of /dev/mem, is issued to collect a platform dump from the
hypervisor and may need multiple calls to get the complete dump. The
current implementation uses the rtas_platform_dump() API provided by
the librtas library to issue these RTAS calls. But /dev/mem access by
user space is prohibited under system lockdown.

The solution is to restrict access to the RTAS function in user
space and provide kernel interfaces to collect the dump. This patch
adds the papr-platform-dump character driver and exposes standard
interfaces such as open/ioctl/read to user space in ways that
are compatible with lockdown.

PAPR (7.3.3.4.1 ibm,platform-dump) provides a method to obtain
the complete dump:
- Each dump is identified by an ID called a dump tag.
- A sequence of RTAS calls has to be issued until the complete
  dump is retrieved. The hypervisor expects the first RTAS call
  with sequence 0 and the subsequent calls with the sequence
  number returned from the previous calls.
- The hypervisor returns "dump complete" status once the complete
  dump is retrieved, but expects one more RTAS call from the
  partition with a NULL buffer to invalidate the dump, which means
  the dump will be removed in the hypervisor.
- Sequences of calls are allowed with different dump IDs at the
  same time, but not with the same dump ID.

Expose these interfaces to user space with a /dev/papr-platform-dump
character device using the following programming model:

   int devfd = open("/dev/papr-platform-dump", O_RDONLY);
   int fd = ioctl(devfd,PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id)
- Restrict user space to access with the same dump ID.
  Typically we do not expect user space requests the dump
  again for the same dump ID.
   char *buf = malloc(size);
   length = read(fd, buf, size);
- size should be minimum 1K based on PAPR and  <= 4K based
  on RTAS work area size. It will be restrict to RTAS work
  area size. Using 4K work area based on the current
  implementation in librtas library
- Each read call issue RTAS call to get the data based on
  the size requirement and returns bytes returned from the
  hypervisor
- If the previous call returns dump complete status, the
  next read returns 0 like EOF.
   ret = ioctl(PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id)
- RTAS call with NULL buffer to invalidates the dump.

The read API should use the file descriptor obtained from the ioctl
for a given dump ID so that it gets the dump contents for the
corresponding dump ID. Support for this new ABI has been implemented
in librtas (rtas_platform_dump()) to support system lockdown.
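
For reference, a minimal user-space sketch of the model above; the uapi
header path and the fd on which the invalidate ioctl is issued are
assumptions:

/* Sketch of the platform-dump read loop described above. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <asm/papr-platform-dump.h>  /* assumed install path */

static int fetch_dump(__u64 dump_id)
{
        char buf[4096];         /* <= RTAS work area size */
        ssize_t n;
        int devfd, fd;

        devfd = open("/dev/papr-platform-dump", O_RDONLY);
        if (devfd < 0)
                return -1;

        fd = ioctl(devfd, PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE, &dump_id);
        if (fd < 0) {
                close(devfd);
                return -1;
        }

        /* Each read() returns the next chunk of the dump until EOF. */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, n, stdout);

        /* Dump complete: invalidate it (issued on the device fd here,
         * which is an assumption). */
        if (n == 0)
                ioctl(devfd, PAPR_PLATFORM_DUMP_IOC_INVALIDATE, &dump_id);

        close(fd);
        close(devfd);
        return n < 0 ? -1 : 0;
}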

Signed-off-by: Haren Myneni 
---
 .../userspace-api/ioctl/ioctl-number.rst  |   2 +
 .../include/uapi/asm/papr-platform-dump.h |  15 +
 arch/powerpc/platforms/pseries/Makefile   |   1 +
 .../platforms/pseries/papr-platform-dump.c| 411 ++
 4 files changed, 429 insertions(+)
 create mode 100644 arch/powerpc/include/uapi/asm/papr-platform-dump.h
 create mode 100644 arch/powerpc/platforms/pseries/papr-platform-dump.c

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index f9332b634116..1b661436aa7c 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -365,6 +365,8 @@ Code  Seq#Include File  
 Comments
  

 0xB2  03-05 arch/powerpc/include/uapi/asm/papr-indices.h 
powerpc/pseries indices API
  

+0xB2  06-07 arch/powerpc/include/uapi/asm/papr-platform-dump.h   
powerpc/pseries Platform Dump API
+ 

 0xB3  00 linux/mmc/ioctl.h
 0xB4  00-0F  linux/gpio.h

 0xB5  00-0F  uapi/linux/rpmsg.h  

diff --git a/arch/powerpc/include/uapi/asm/papr-platform-dump.h 
b/arch/powerpc/include/uapi/asm/papr-platform-dump.h
new file mode 100644
index ..a1d89c290dab
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr-platform-dump.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_PLATFORM_DUMP_H_
+#define _UAPI_PAPR_PLATFORM_DUMP_H_
+
+#include 
+#include 
+
+/*
+ * ioctl for /dev/papr-platform-dump. Returns a platform-dump handle fd
+ * corresponding to dump tag.
+ */
+#define PAPR_PLATFORM_DUMP_IOC_CREATE_HANDLE _IOW(PAPR_MISCDEV_IOC_ID, 6, 
__u64)
+#define PAPR_PLATFORM_DUMP_IOC_INVALIDATE_IOW(PAPR_MISCDEV_IOC_ID, 7, 
__u64)
+
+#endif /* _UAPI_PAPR_

[PATCH v7 3/7] powerpc/pseries: Add papr-indices char driver for ibm,get-indices

2025-03-09 Thread Haren Myneni
The RTAS call ibm,get-indices is used to obtain indices and
location codes for a specified indicator or sensor token. The
current implementation uses the rtas_get_indices() API provided by
the librtas library, which allocates an RMO buffer and issues this
RTAS call in user space. But writable /dev/mem mappings by user
space are prohibited under system lockdown.

To overcome the restricted access in user space, the kernel
provides interfaces to collect indices data from the hypervisor.
This patch adds the papr-indices character driver and exposes
standard interfaces such as open/ioctl/read to user space in ways
that are compatible with lockdown.

PAPR (2.13 7.3.17 ibm,get-indices RTAS Call) describes the
following steps to retrieve all indices data:
- User input parameters to the RTAS call: sensor or indicator,
  and indice type.
- ibm,get-indices is a sequence RTAS call, which means it has to be
  issued multiple times to get the entire list of indicators or
  sensors of a particular type. The hypervisor expects the first
  RTAS call with sequence 1 and the subsequent calls with the
  sequence number returned from the previous calls.
- The OS may not interleave calls to ibm,get-indices for different
  indicator or sensor types. This means other RTAS calls with a
  different type should not be issued while the previous type's
  sequence is in progress. So the entire list of indices is
  collected and copied to a blob buffer during ioctl(), and this
  buffer is exposed to user space with the file descriptor.
- The hypervisor fills the work area with a specific format but
  does not return the number of bytes written to the buffer.
  Instead of parsing the data for each call to determine the data
  length, copy the work area size (RTAS_GET_INDICES_BUF_SIZE) of
  data to the buffer. The work-area size of data is returned to
  user space for each read() call.

Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:

 int devfd = open("/dev/papr-indices", O_RDONLY);
 int fd = ioctl(devfd, PAPR_INDICES_IOC_GET,
struct papr_indices_io_block)
  - Collect all indices data for the specified token to the buffer
 char *buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
 length = read(fd, buf,  RTAS_GET_INDICES_BUF_SIZE)
  - RTAS_GET_INDICES_BUF_SIZE of data is returned to the user
space.
  - The user space retrieves the indices and their location codes
from the buffer
  - Should issue multiple read() calls until the end of the blob
buffer is reached.

The read() should use the file descriptor obtained from the ioctl to
get the data exposed through that file descriptor. Support for this
new ABI has been implemented in librtas (rtas_get_indices()) for
system lockdown.
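
For reference, a minimal user-space sketch of this model; the uapi header
path, and the assumption that RTAS_GET_INDICES_BUF_SIZE is visible to user
space, are not taken from the patch:

/* Sketch of the ibm,get-indices ioctl + read loop described above.
 * The caller fills 'in' with the sensor/indicator token and indice type. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <asm/papr-indices.h>   /* assumed install path */

static int dump_indices(struct papr_indices_io_block *in)
{
        char *buf;
        ssize_t n = -1;
        int devfd, fd, ret = -1;

        devfd = open("/dev/papr-indices", O_RDONLY);
        if (devfd < 0)
                return -1;

        fd = ioctl(devfd, PAPR_INDICES_IOC_GET, in);
        if (fd < 0)
                goto out_dev;

        buf = malloc(RTAS_GET_INDICES_BUF_SIZE);
        if (!buf)
                goto out_fd;

        /* Each read() returns one work-area-sized chunk until EOF. */
        while ((n = read(fd, buf, RTAS_GET_INDICES_BUF_SIZE)) > 0)
                fwrite(buf, 1, n, stdout); /* parse indices/location codes here */

        ret = n < 0 ? -1 : 0;
        free(buf);
out_fd:
        close(fd);
out_dev:
        close(devfd);
        return ret;
}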

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |   1 +
 arch/powerpc/kernel/rtas.c|   2 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 302 ++
 4 files changed, 305 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/papr-indices.c

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 04406162fc5a..7dc527a5aaac 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -515,6 +515,7 @@ extern char rtas_data_buf[RTAS_DATA_BUF_SIZE];
 extern unsigned long rtas_rmo_buf;
 
 extern struct mutex rtas_ibm_get_vpd_lock;
+extern struct mutex rtas_ibm_get_indices_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index d31c9799cab2..76c634b92cb2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -93,11 +93,11 @@ struct rtas_function {
  */
 static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
 static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
-static DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
 static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
+DEFINE_MUTEX(rtas_ibm_get_indices_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 697c216b70dc..e1db61877bb9 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG) += -DDEBUG
 
 obj-y  := lpar.o hvCall.o nvram.o reconfig.o \
   of_helpers.o rtas-work-area.o papr-sysparm.o \
-  papr-rtas-common.o papr-vpd.o \
+  papr-rtas-common.o papr-vpd.o papr-indices.o \
   setup.o iommu.o event_sources.o ras.o \
   firmware.o power.o dlpar.o mobilit

[PATCH v7 4/7] powerpc/pseries: Add ibm,set-dynamic-indicator RTAS call support

2025-03-09 Thread Haren Myneni
The RTAS call ibm,set-dynamic-indicator is used to set a new
indicator state identified by a location code. The current
implementation uses the rtas_set_dynamic_indicator() API provided by
the librtas library, which allocates an RMO buffer and issues this
RTAS call in user space. But /dev/mem access by user space
is prohibited under system lockdown.

This patch adds a new ioctl,
PAPR_DYNAMIC_INDICATOR_IOC_SET, to the papr-indices character
driver and exposes this interface to user space in a way that is
compatible with lockdown.

Refer to PAPR 7.3.18 ibm,set-dynamic-indicator for more
information on this RTAS call.
-  User input parameters to the RTAS call: location code
   string, indicator token and new state

Expose these interfaces to user space with a /dev/papr-indices
character device using the following programming model:
 int fd = open("/dev/papr-indices", O_RDWR);
 int ret = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET,
struct papr_indices_io_block)
  - The user space passes input parameters in papr_indices_io_block
struct
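
For reference, a minimal user-space sketch of this model; the uapi header
path, the exact field types, and which field carries the new state are
assumptions:

/* Sketch of PAPR_DYNAMIC_INDICATOR_IOC_SET usage as described above. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <asm/papr-indices.h>   /* assumed install path */

static int set_indicator(__u32 token, const char *loc_code, __u32 new_state)
{
        struct papr_indices_io_block b;
        int fd, ret;

        fd = open("/dev/papr-indices", O_RDWR);
        if (fd < 0)
                return -1;

        memset(&b, 0, sizeof(b));
        b.dynamic_param.token = token;
        b.dynamic_param.state = new_state;  /* assumed field for the new state */
        strncpy(b.dynamic_param.location_code_str, loc_code,
                sizeof(b.dynamic_param.location_code_str) - 1);

        ret = ioctl(fd, PAPR_DYNAMIC_INDICATOR_IOC_SET, &b);
        close(fd);
        return ret;
}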

Signed-off-by: Haren Myneni 
---
 arch/powerpc/include/asm/rtas.h   |   1 +
 arch/powerpc/kernel/rtas.c|   2 +-
 arch/powerpc/platforms/pseries/papr-indices.c | 120 ++
 3 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 7dc527a5aaac..2da52f59e4c6 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -516,6 +516,7 @@ extern unsigned long rtas_rmo_buf;
 
 extern struct mutex rtas_ibm_get_vpd_lock;
 extern struct mutex rtas_ibm_get_indices_lock;
+extern struct mutex rtas_ibm_set_dynamic_indicator_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 76c634b92cb2..88fa416730af 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -95,9 +95,9 @@ static DEFINE_MUTEX(rtas_ibm_activate_firmware_lock);
 static DEFINE_MUTEX(rtas_ibm_get_dynamic_sensor_state_lock);
 static DEFINE_MUTEX(rtas_ibm_lpar_perftools_lock);
 static DEFINE_MUTEX(rtas_ibm_physical_attestation_lock);
-static DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 DEFINE_MUTEX(rtas_ibm_get_vpd_lock);
 DEFINE_MUTEX(rtas_ibm_get_indices_lock);
+DEFINE_MUTEX(rtas_ibm_set_dynamic_indicator_lock);
 
 static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__CHECK_EXCEPTION] = {
diff --git a/arch/powerpc/platforms/pseries/papr-indices.c 
b/arch/powerpc/platforms/pseries/papr-indices.c
index 11aa0e6879e8..a5f344cc2e53 100644
--- a/arch/powerpc/platforms/pseries/papr-indices.c
+++ b/arch/powerpc/platforms/pseries/papr-indices.c
@@ -20,6 +20,13 @@
 #include 
 #include "papr-rtas-common.h"
 
+/*
+ * Function-specific return values for ibm,set-dynamic-indicator and
+ * ibm,get-dynamic-sensor-state RTAS calls.
+ * PAPR+ v2.13 7.3.18 and 7.3.19.
+ */
+#define RTAS_IBM_DYNAMIC_INDICE_NO_INDICATOR   -3
+
 /**
  * struct rtas_get_indices_params - Parameters (in and out) for
  *  ibm,get-indices.
@@ -261,6 +268,110 @@ static long papr_indices_create_handle(struct 
papr_indices_io_block __user *ubuf
return fd;
 }
 
+/*
+ * Create work area with the input parameters. This function is used
+ * for both ibm,set-dynamic-indicator and ibm,get-dynamic-sensor-state
+ * RTAS Calls.
+ */
+static struct rtas_work_area *
+papr_dynamic_indice_buf_from_user(struct papr_indices_io_block __user *ubuf,
+   struct papr_indices_io_block *kbuf)
+{
+   struct rtas_work_area *work_area;
+   u32 length;
+   __be32 len_be;
+
+   if (copy_from_user(kbuf, ubuf, sizeof(*kbuf)))
+   return ERR_PTR(-EFAULT);
+
+
+   if (!string_is_terminated(kbuf->dynamic_param.location_code_str,
+   ARRAY_SIZE(kbuf->dynamic_param.location_code_str)))
+   return ERR_PTR(-EINVAL);
+
+   /*
+* The input data in the work area should be as follows:
+* - 32-bit integer length of the location code string,
+*   including NULL.
+* - Location code string, NULL terminated, identifying the
+*   token (sensor or indicator).
+* PAPR 2.13 - R1–7.3.18–5 ibm,set-dynamic-indicator
+*   - R1–7.3.19–5 ibm,get-dynamic-sensor-state
+*/
+   /*
+* Length that user space passed should also include NULL
+* terminator.
+*/
+   length = strlen(kbuf->dynamic_param.location_code_str) + 1;
+   if (length > LOC_CODE_SIZE)
+   return ERR_PTR(-EINVAL);
+
+   len_be = cpu_to_be32(length);
+
+   work_area = rtas_work_area_alloc(LOC_CODE_SIZE + sizeof(u32));
+   memcpy(rtas_work_area_raw_buf(work_area), &len_be, sizeof(u32));
+   memcpy((rtas_work_area_raw_buf(work_area) + sizeof(u32)),
+   &kbuf->dynamic_param.location_cod

[PATCH v7 7/7] powerpc/pseries: Add a char driver for physical-attestation RTAS

2025-03-09 Thread Haren Myneni
The RTAS call ibm,physical-attestation is used to retrieve
information about the trusted boot state of the firmware and
hypervisor on the system, and also Trusted Platform Module (TPM)
data if the system is TCG 2.0 compliant.

This RTAS interface expects the caller to define different command
structs, such as RetrieveTPMLog, RetrievePlatformCertificate, etc.,
in a work area with a maximum size of 4K bytes, and the response
buffer will be returned in the same work area.

The current implementation of this RTAS function is in user
space, but allocation of the work area is restricted under system
lockdown. So this patch implements this RTAS function in the kernel
and exposes it to user space with open/ioctl/read interfaces.

PAPR (2.13+ 21.3 ibm,physical-attestation) defines the RTAS function:
- Pass the command struct to obtain the response buffer for the
  specific command.
- This RTAS function is a sequence RTAS call and has to be issued
  multiple times to get the complete response buffer (max 64K).
  The hypervisor expects the first RTAS call with sequence 1 and
  the subsequent calls with the sequence number returned from the
  previous calls.

Expose these interfaces to user space with a
/dev/papr-physical-attestation character device using the following
programming model:

 int devfd = open("/dev/papr-physical-attestation");
 int fd = ioctl(devfd, PAPR_PHY_ATTEST_IOC_HANDLE,
  struct papr_phy_attest_io_block);
 - The user space defines the command struct and requests the
   response for any command.
 - The complete response buffer is obtained and returned as a
   blob through the command-specific fd.
 size = read(fd, buf, len);
 - The response buffer can be retrieved in one or multiple read()
   calls until the end of the blob buffer.

Support for this new kernel ABI has been implemented in the librtas
library for system lockdown.
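
For reference, a minimal user-space sketch of this model; the open()
flags and the uapi header path are assumptions:

/* Sketch of the physical-attestation ioctl + read loop described above. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <asm/papr-physical-attestation.h>  /* assumed install path */

static int run_attest_command(const struct papr_phy_attest_io_block *cmd)
{
        struct papr_phy_attest_io_block b = *cmd; /* command struct, <= 4K */
        char buf[4096];
        ssize_t n;
        int devfd, fd;

        devfd = open("/dev/papr-physical-attestation", O_RDONLY);
        if (devfd < 0)
                return -1;

        fd = ioctl(devfd, PAPR_PHY_ATTEST_IOC_HANDLE, &b);
        if (fd < 0) {
                close(devfd);
                return -1;
        }

        /* Read the response blob until EOF (complete response is max 64K). */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, n, stdout);

        close(fd);
        close(devfd);
        return n < 0 ? -1 : 0;
}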

Signed-off-by: Haren Myneni 
---
 .../userspace-api/ioctl/ioctl-number.rst  |   2 +
 arch/powerpc/include/asm/rtas.h   |   1 +
 .../uapi/asm/papr-physical-attestation.h  |  31 ++
 arch/powerpc/kernel/rtas.c|   2 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 .../platforms/pseries/papr-phy-attest.c   | 287 ++
 6 files changed, 323 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/uapi/asm/papr-physical-attestation.h
 create mode 100644 arch/powerpc/platforms/pseries/papr-phy-attest.c

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 1b661436aa7c..504d059970d4 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -367,6 +367,8 @@ Code  Seq#Include File  
 Comments
  

 0xB2  06-07 arch/powerpc/include/uapi/asm/papr-platform-dump.h   
powerpc/pseries Platform Dump API
  

+0xB2  08  arch/powerpc/include/uapi/asm/papr-physical-attestation.h  
powerpc/pseries Physical Attestation API
+ 

 0xB3  00 linux/mmc/ioctl.h
 0xB4  00-0F  linux/gpio.h

 0xB5  00-0F  uapi/linux/rpmsg.h  

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index fcd822f0e1d7..75fa0293c508 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -518,6 +518,7 @@ extern struct mutex rtas_ibm_get_vpd_lock;
 extern struct mutex rtas_ibm_get_indices_lock;
 extern struct mutex rtas_ibm_set_dynamic_indicator_lock;
 extern struct mutex rtas_ibm_get_dynamic_sensor_state_lock;
+extern struct mutex rtas_ibm_physical_attestation_lock;
 
 #define GLOBAL_INTERRUPT_QUEUE 9005
 
diff --git a/arch/powerpc/include/uapi/asm/papr-physical-attestation.h 
b/arch/powerpc/include/uapi/asm/papr-physical-attestation.h
new file mode 100644
index ..ea746837bb9a
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr-physical-attestation.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_PAPR_PHYSICAL_ATTESTATION_H_
+#define _UAPI_PAPR_PHYSICAL_ATTESTATION_H_
+
+#include 
+#include 
+#include 
+
+#define PAPR_PHYATTEST_MAX_INPUT 4084 /* Max 4K buffer: 4K-12 */
+
+/*
+ * Defined in PAPR 2.13+ 21.6 Attestation Command Structures.
+ * User space pass this struct and the max size should be 4K.
+ */
+struct papr_phy_attest_io_block {
+   __u8 version;
+   __u8 command;
+   __u8 TCG_major_ver;
+   __u8 TCG_minor_ver;
+   __be32 length;
+   __be32 correlator;
+   __u8 payload[PAPR_PHYATTEST_MAX_INPUT];
+};
+

Re: [PATCH 02/13] csky: move setup_initrd() to setup.c

2025-03-09 Thread Guo Ren
Moving setup_initrd from mem_init into memblock_init LGTM.

Acked-by: Guo Ren (csky) 

On Fri, Mar 7, 2025 at 2:52 AM Mike Rapoport  wrote:
>
> From: "Mike Rapoport (Microsoft)" 
>
> Memory used by initrd should be reserved as soon as possible before
> there are any memblock allocations that might overwrite that memory.
>
> This will also help with pulling out memblock_free_all() to the generic
> code and reducing code duplication in arch::mem_init().
>
> Signed-off-by: Mike Rapoport (Microsoft) 
> ---
>  arch/csky/kernel/setup.c | 43 
>  arch/csky/mm/init.c  | 43 
>  2 files changed, 43 insertions(+), 43 deletions(-)
>
> diff --git a/arch/csky/kernel/setup.c b/arch/csky/kernel/setup.c
> index fe715b707fd0..e0d6ca86ea8c 100644
> --- a/arch/csky/kernel/setup.c
> +++ b/arch/csky/kernel/setup.c
> @@ -12,6 +12,45 @@
>  #include 
>  #include 
>
> +#ifdef CONFIG_BLK_DEV_INITRD
> +static void __init setup_initrd(void)
> +{
> +   unsigned long size;
> +
> +   if (initrd_start >= initrd_end) {
> +   pr_err("initrd not found or empty");
> +   goto disable;
> +   }
> +
> +   if (__pa(initrd_end) > PFN_PHYS(max_low_pfn)) {
> +   pr_err("initrd extends beyond end of memory");
> +   goto disable;
> +   }
> +
> +   size = initrd_end - initrd_start;
> +
> +   if (memblock_is_region_reserved(__pa(initrd_start), size)) {
> +   pr_err("INITRD: 0x%08lx+0x%08lx overlaps in-use memory 
> region",
> +  __pa(initrd_start), size);
> +   goto disable;
> +   }
> +
> +   memblock_reserve(__pa(initrd_start), size);
> +
> +   pr_info("Initial ramdisk at: 0x%p (%lu bytes)\n",
> +   (void *)(initrd_start), size);
> +
> +   initrd_below_start_ok = 1;
> +
> +   return;
> +
> +disable:
> +   initrd_start = initrd_end = 0;
> +
> +   pr_err(" - disabling initrd\n");
> +}
> +#endif
> +
>  static void __init csky_memblock_init(void)
>  {
> unsigned long lowmem_size = PFN_DOWN(LOWMEM_LIMIT - 
> PHYS_OFFSET_OFFSET);
> @@ -40,6 +79,10 @@ static void __init csky_memblock_init(void)
> max_low_pfn = min_low_pfn + sseg_size;
> }
>
> +#ifdef CONFIG_BLK_DEV_INITRD
> +   setup_initrd();
> +#endif
> +
> max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
>
> mmu_init(min_low_pfn, max_low_pfn);
> diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
> index bde7cabd23df..ab51acbc19b2 100644
> --- a/arch/csky/mm/init.c
> +++ b/arch/csky/mm/init.c
> @@ -42,45 +42,6 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned 
> long)]
> __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);
>
> -#ifdef CONFIG_BLK_DEV_INITRD
> -static void __init setup_initrd(void)
> -{
> -   unsigned long size;
> -
> -   if (initrd_start >= initrd_end) {
> -   pr_err("initrd not found or empty");
> -   goto disable;
> -   }
> -
> -   if (__pa(initrd_end) > PFN_PHYS(max_low_pfn)) {
> -   pr_err("initrd extends beyond end of memory");
> -   goto disable;
> -   }
> -
> -   size = initrd_end - initrd_start;
> -
> -   if (memblock_is_region_reserved(__pa(initrd_start), size)) {
> -   pr_err("INITRD: 0x%08lx+0x%08lx overlaps in-use memory 
> region",
> -  __pa(initrd_start), size);
> -   goto disable;
> -   }
> -
> -   memblock_reserve(__pa(initrd_start), size);
> -
> -   pr_info("Initial ramdisk at: 0x%p (%lu bytes)\n",
> -   (void *)(initrd_start), size);
> -
> -   initrd_below_start_ok = 1;
> -
> -   return;
> -
> -disable:
> -   initrd_start = initrd_end = 0;
> -
> -   pr_err(" - disabling initrd\n");
> -}
> -#endif
> -
>  void __init mem_init(void)
>  {
>  #ifdef CONFIG_HIGHMEM
> @@ -92,10 +53,6 @@ void __init mem_init(void)
>  #endif
> high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
>
> -#ifdef CONFIG_BLK_DEV_INITRD
> -   setup_initrd();
> -#endif
> -
> memblock_free_all();
>
>  #ifdef CONFIG_HIGHMEM
> --
> 2.47.2
>


-- 
Best Regards
 Guo Ren



Re: [PATCH v4 4/6] kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU

2025-03-09 Thread Athira Rajeev



> On 24 Feb 2025, at 6:45 PM, Vaibhav Jain  wrote:
> 
> Introduce a new PMU named 'kvm-hv' inside a new module named 'kvm-hv-pmu'
> to report Book3s kvm-hv specific performance counters. This will expose
> KVM-HV specific performance attributes to user-space via kernel's PMU
> infrastructure and would enable users to monitor active kvm-hv based guests.
> 
> The patch creates the necessary scaffolding for the new PMU callbacks and
> introduces the new kernel module named 'kvm-hv-pmu', which is built with
> CONFIG_KVM_BOOK3S_HV_PMU. The patch doesn't introduce any perf-events yet;
> those will be introduced in later patches.
> 
> Signed-off-by: Vaibhav Jain 
> 
> ---
> Changelog
> 
> v3->v4:
> * Introduced a new kernel module named 'kmv-hv-pmu' to host the new PMU
> instead of building the as part of KVM-HV module. [ Maddy ]
> * Moved the code from arch/powerpc/kvm to arch/powerpc/perf [ Athira ]
> * Added a new config named KVM_BOOK3S_HV_PMU to arch/powerpc/kvm/Kconfig
> 
> v2->v3:
> * Fixed a build warning reported by kernel build robot.
> Link:
> https://lore.kernel.org/oe-kbuild-all/202501171030.3x0gqw8g-...@intel.com
> 
> v1->v2:
> * Fixed an issue of kvm-hv not loading on baremetal kvm [Gautam]
> ---
> arch/powerpc/kvm/Kconfig   |  13 
> arch/powerpc/perf/Makefile |   2 +
> arch/powerpc/perf/kvm-hv-pmu.c | 138 +
> 3 files changed, 153 insertions(+)
> create mode 100644 arch/powerpc/perf/kvm-hv-pmu.c
> 
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index dbfdc126bf14..5f0ce19e7e27 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -83,6 +83,7 @@ config KVM_BOOK3S_64_HV
> depends on KVM_BOOK3S_64 && PPC_POWERNV
> select KVM_BOOK3S_HV_POSSIBLE
> select KVM_GENERIC_MMU_NOTIFIER
> + select KVM_BOOK3S_HV_PMU
> select CMA
> help
>  Support running unmodified book3s_64 guest kernels in
> @@ -171,6 +172,18 @@ config KVM_BOOK3S_HV_NESTED_PMU_WORKAROUND
>  those buggy L1s which saves the L2 state, at the cost of performance
>  in all nested-capable guest entry/exit.
> 
> +config KVM_BOOK3S_HV_PMU
> + tristate "Hypervisor Perf events for KVM Book3s-HV"
> + depends on KVM_BOOK3S_64_HV && HV_PERF_CTRS
> + help
> +  Enable Book3s-HV Hypervisor Perf events PMU named 'kvm-hv'. These
> +  Perf events give an overview of hypervisor performance overall
> +  instead of a specific guests. Currently the PMU reports
> +  L0-Hypervisor stats on a kvm-hv enabled PSeries LPAR like:
> +  * Total/Used Guest-Heap
> +  * Total/Used Guest Page-table Memory
> +  * Total amount of Guest Page-table Memory reclaimed
> +
> config KVM_BOOKE_HV
> bool
> 
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index ac2cf58d62db..7f53fcb7495a 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -18,6 +18,8 @@ obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o 
> hv-common.o
> 
> obj-$(CONFIG_VPA_PMU) += vpa-pmu.o
> 
> +obj-$(CONFIG_KVM_BOOK3S_HV_PMU) += kvm-hv-pmu.o
> +
> obj-$(CONFIG_PPC_8xx) += 8xx-pmu.o
> 
> obj-$(CONFIG_PPC64) += $(obj64-y)
> diff --git a/arch/powerpc/perf/kvm-hv-pmu.c b/arch/powerpc/perf/kvm-hv-pmu.c
> new file mode 100644
> index ..c154f54e09e2
> --- /dev/null
> +++ b/arch/powerpc/perf/kvm-hv-pmu.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Description: PMUs specific to running nested KVM-HV guests
> + * on Book3S processors (specifically POWER9 and later).
> + */
> +
> +#define pr_fmt(fmt)  "kvmppc-pmu: " fmt
> +
> +#include "asm-generic/local64.h"
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +enum kvmppc_pmu_eventid {
> + KVMPPC_EVENT_MAX,
> +};
> +
> +static struct attribute *kvmppc_pmu_events_attr[] = {
> + NULL,
> +};
> +
> +static const struct attribute_group kvmppc_pmu_events_group = {
> + .name = "events",
> + .attrs = kvmppc_pmu_events_attr,
> +};
> +
> +PMU_FORMAT_ATTR(event, "config:0");
> +static struct attribute *kvmppc_pmu_format_attr[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group kvmppc_pmu_format_group = {
> + .name = "format",
> + .attrs = kvmppc_pmu_format_attr,
> +};
> +
> +static const struct attribute_group *kvmppc_pmu_attr_groups[] = {
> + &kvmppc_pmu_events_group,
> + &kvmppc_pmu_format_group,
> + NULL,
> +};
> +
> +static int kvmppc_pmu_event_init(struct perf_event *event)
> +{
> + unsigned int config = event->attr.config;
> +
> + pr_debug("%s: Event(%p) id=%llu cpu=%x on_cpu=%x config=%u",
> + __func__, event, event->id, event->cpu,
> + event->oncpu, config);
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + if (config >= KVMPPC_EVENT_MAX)
> + return -EINVAL;
> +
> + local64_set(&event->hw.prev_count, 0);
> + local64_set(&event->count, 0);
> +
> + return 0;

Re: [PATCH v4 6/6] powerpc/kvm-hv-pmu: Add perf-events for Hostwide counters

2025-03-09 Thread Athira Rajeev



> On 24 Feb 2025, at 6:45 PM, Vaibhav Jain  wrote:
> 
> Update 'kvm-hv-pmu.c' to add five new perf-events mapped to the five
> Hostwide counters. Since these newly introduced perf events are at system
> wide scope and can be read from any L1-Lpar CPU, 'kvmppc_pmu' scope and
> capabilities are updated appropriately.
> 
> Also introduce two new helpers. The first is kvmppc_update_l0_stats(), which
> uses the infrastructure introduced in previous patches to issue the
> H_GUEST_GET_STATE hcall to L0-PowerVM to fetch the guest-state-buffer holding
> the latest values of these counters, which is then parsed and the 'l0_stats'
> variable updated.
> 
> The second helper is kvmppc_pmu_event_update(), which is called from the
> 'kvmppc_pmu' callbacks and uses kvmppc_update_l0_stats() to update
> 'l0_stats' and then update the 'struct perf_event's event counter.
> 
> Some minor updates to kvmppc_pmu_{add, del, read}() to remove some debug
> scaffolding code.
> 
> Signed-off-by: Vaibhav Jain 
> ---
> Changelog
> 
> v3->v4:
> * Minor tweaks to patch description and code as its now being built as a
> separate kernel module.
> 
> v2->v3:
> None
> 
> v1->v2:
> None
> ---
> arch/powerpc/perf/kvm-hv-pmu.c | 92 +-
> 1 file changed, 91 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/perf/kvm-hv-pmu.c b/arch/powerpc/perf/kvm-hv-pmu.c
> index ed371454f7b5..274459bb32d6 100644
> --- a/arch/powerpc/perf/kvm-hv-pmu.c
> +++ b/arch/powerpc/perf/kvm-hv-pmu.c
> @@ -30,6 +30,11 @@
> #include "asm/guest-state-buffer.h"
> 
> enum kvmppc_pmu_eventid {
> + KVMPPC_EVENT_HOST_HEAP,
> + KVMPPC_EVENT_HOST_HEAP_MAX,
> + KVMPPC_EVENT_HOST_PGTABLE,
> + KVMPPC_EVENT_HOST_PGTABLE_MAX,
> + KVMPPC_EVENT_HOST_PGTABLE_RECLAIM,
> KVMPPC_EVENT_MAX,
> };
> 
> @@ -61,8 +66,14 @@ static DEFINE_SPINLOCK(lock_l0_stats);
> /* GSB related structs needed to talk to L0 */
> static struct kvmppc_gs_msg *gsm_l0_stats;
> static struct kvmppc_gs_buff *gsb_l0_stats;
> +static struct kvmppc_gs_parser gsp_l0_stats;
> 
> static struct attribute *kvmppc_pmu_events_attr[] = {
> + KVMPPC_PMU_EVENT_ATTR(host_heap, KVMPPC_EVENT_HOST_HEAP),
> + KVMPPC_PMU_EVENT_ATTR(host_heap_max, KVMPPC_EVENT_HOST_HEAP_MAX),
> + KVMPPC_PMU_EVENT_ATTR(host_pagetable, KVMPPC_EVENT_HOST_PGTABLE),
> + KVMPPC_PMU_EVENT_ATTR(host_pagetable_max, KVMPPC_EVENT_HOST_PGTABLE_MAX),
> + KVMPPC_PMU_EVENT_ATTR(host_pagetable_reclaim, 
> KVMPPC_EVENT_HOST_PGTABLE_RECLAIM),
> NULL,
> };
> 
> @@ -71,7 +82,7 @@ static const struct attribute_group kvmppc_pmu_events_group 
> = {
> .attrs = kvmppc_pmu_events_attr,
> };
> 
> -PMU_FORMAT_ATTR(event, "config:0");
> +PMU_FORMAT_ATTR(event, "config:0-5");
> static struct attribute *kvmppc_pmu_format_attr[] = {
> &format_attr_event.attr,
> NULL,
> @@ -88,6 +99,79 @@ static const struct attribute_group 
> *kvmppc_pmu_attr_groups[] = {
> NULL,
> };
> 
> +/*
> + * Issue the hcall to get the L0-host stats.
> + * Should be called with l0-stat lock held
> + */
> +static int kvmppc_update_l0_stats(void)
> +{
> + int rc;
> +
> + /* With HOST_WIDE flags guestid and vcpuid will be ignored */
> + rc = kvmppc_gsb_recv(gsb_l0_stats, KVMPPC_GS_FLAGS_HOST_WIDE);
> + if (rc)
> + goto out;
> +
> + /* Parse the guest state buffer if the receive was successful */
> + rc = kvmppc_gse_parse(&gsp_l0_stats, gsb_l0_stats);
> + if (rc)
> + goto out;
> +
> + /* Update the l0 returned stats */
> + memset(&l0_stats, 0, sizeof(l0_stats));
> + rc = kvmppc_gsm_refresh_info(gsm_l0_stats, gsb_l0_stats);
> +
> +out:
> + return rc;
> +}
> +
> +/* Update the value of the given perf_event */
> +static int kvmppc_pmu_event_update(struct perf_event *event)
> +{
> + int rc;
> + u64 curr_val, prev_val;
> + unsigned long flags;
> + unsigned int config = event->attr.config;
> +
> + /* Ensure no one else is modifying the l0_stats */
> + spin_lock_irqsave(&lock_l0_stats, flags);
> +
> + rc = kvmppc_update_l0_stats();
> + if (!rc) {
> + switch (config) {
> + case KVMPPC_EVENT_HOST_HEAP:
> + curr_val = l0_stats.guest_heap;
> + break;
> + case KVMPPC_EVENT_HOST_HEAP_MAX:
> + curr_val = l0_stats.guest_heap_max;
> + break;
> + case KVMPPC_EVENT_HOST_PGTABLE:
> + curr_val = l0_stats.guest_pgtable_size;
> + break;
> + case KVMPPC_EVENT_HOST_PGTABLE_MAX:
> + curr_val = l0_stats.guest_pgtable_size_max;
> + break;
> + case KVMPPC_EVENT_HOST_PGTABLE_RECLAIM:
> + curr_val = l0_stats.guest_pgtable_reclaim;
> + break;
> + default:
> + rc = -ENOENT;
> + break;
> + }
> + }
> +
> + spin_unlock_irqrestore(&lock_l0_stats, flags);
> +
> + /* If no error then update the perf event */
> + if (!rc) {
> + prev_val = local64_xchg(&event->hw.prev_count, curr_val);
> + if (curr_val > prev_val)
> + local64_add(curr_val - prev_val, &event->count);
> + }
> +
> + return rc;
> +}
> +
> static int kvmppc_pmu_event_init(struct perf_event *event)
> {
> unsigned int config = event->attr.config;
> @@ -110,15 +194,19 @@ static int kvmppc_pmu_event_init(struct perf_event 
> *event)
> 
> static void kvmppc_pmu_del(struct perf_event *event, int flags)