date:20230904

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-04 Thread Michal Suchánek

Hello,

On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote:
> Michal Suchánek  writes:
> > On Thu, Aug 31, 2023 at 09:37:12PM +1000, Michael Ellerman wrote:
> >> Michal Suchánek  writes:
> >> > On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> >> >> Michal Suchánek  writes:
> >> >> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay 
> >> >> > wrote:
> >> >> >> From: Nathan Lynch 
> >> >> >> 
> >> >> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> >> >> components using the ibm,get-vpd RTAS function.
> >> >> >> 
> >> >> >> We can expose this to user space with a /dev/papr-vpd character
> >> >> >> device, where the programming model is:
> >> >> >> 
> >> >> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD 
> >> >> >> */
> >> >> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >> >> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >> >> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >> >> >>   char *buf = malloc(size);
> >> >> >>   pread(devfd, buf, size, 0);
> >> >> >> 
> >> >> >> When a file descriptor is obtained from 
> >> >> >> ioctl(PAPR_VPD_CREATE_HANDLE),
> >> >> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >> >> >
> >> >> > Could this be somewhat less obfuscated?
> >> >> >
> >> >> > What the caller wants is the result of "ibm,get-vpd", which is a
> >> >> > well-known string identifier of the rtas call.
> >> >> 
> >> >> Not really. What the caller wants is *the VPD*. Currently that's done
> >> >> by calling the RTAS "ibm,get-vpd" function, but that could change in
> >> >> future. There's RTAS calls that have been replaced with a "version 2" in
> >> >> the past, that could happen here too. Or the RTAS call could be replaced
> >> >> by a hypercall (though unlikely).
> >> >> 
> >> >> But hopefully if the underlying mechanism changed the kernel would be
> >> >> able to hide that detail behind this new API, and users would not need
> >> >> to change at all.
> >> >> 
> >> >> > Yet this identifier is never passed in. Instead we have this new
> >> >> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific 
> >> >> > to
> >> >> > this call only as is the /dev/papr-vpd device name, another new
> >> >> > identifier.
> >> >> >
> >> >> > Maybe the interface could provide a way to specify the service name?
> >> >> >
> >> >> >> file contents are immutable from the POV of user space. To get a new
> >> >> >> view of VPD, clients must create a new handle.
> >> >> >
> >> >> > Which is basically the same as creating a file descriptor with open().
> >> >> 
> >> >> Sort of. But much cleaner becuase you don't need to create a file in the
> >> >> filesystem and tell userspace how to find it.
> >> >
> >> > You very much do. There is the /dev/papr-vpd and PAPR_VPD_CREATE_HANDLE
> >> > which userspace has to know about, the PAPR_VPD_CREATE_HANDLE is not
> >> > even possible to find at all.
> >> 
> >> Well yeah you need the device itself :)
> >
> > And as named it's specific to this call, and new devices will be needed
> > for any additional rtas called implemented.
> >
> >> 
> >> And yes the ioctl is defined in a header, not in the filesystem, but
> >> that's entirely normal for an ioctl based API.
> >
> > Of course, because the ioctl API has no safe way of passing a string
> > identifier for the function. Then it needs to create these obscure IDs.
> >
> > Other APIs that don't have this problem exist.
> 
> Looking at the cover letter for the series, I wonder if my framing and
> word choice is confusing? Instead of "new character devices for RTAS
> functions", what I would really like to convey is "new character devices
> for platform features that are currently only accessible through the
> rtas() syscall". But that's too long :-)

and does that really change anything?

> You (Michal) seem to favor a kernel-user ABI where user space is allowed
> to invoke arbitrary RTAS functions by name. But we already have that in
> the form of the rtas() syscall. (User space looks up function tokens by
> name in the DT.) The point of the series is that we need to move away
> from that. It's too low-level and user space has to use /dev/mem when
> invoking any of the less-simple RTAS functions.

We don't have that, directly accessing /dev/mem does not really work.
And that's what needs fixing in my view.

The rtas calls are all mechanically the same, the function implemented
here should be able to call any of them if there was a way to specify
the call.

Given that there is desire to have access to multiple calls I don't
think it makes sense to allocate a separate device with different name
for each.

The ioclt interface is not nice for this. however.

Given that different calls have different parameters having one ioctl
for all of them would be quite messy.

Still even as is the ioctl takes a string argument which is problematic
on its own.

Given that there is an argument to the call it ca

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-04 Thread Philippe Mathieu-Daudé


Hi,

On 1/9/23 04:42, Yuan Tan wrote:

MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is set
for convenience.

Suggested-by: Zhangjin Wu 
Signed-off-by: Yuan Tan 
---
  arch/mips/Kconfig | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..13bacbd05125 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -547,6 +547,9 @@ config MIPS_MALTA
select MIPS_L1_CACHE_SHIFT_6
select MIPS_MSC
select PCI_GT64XXX_PCI0
+   select PCI if POWER_RESET
+   select PCI_QUIRKS if POWER_RESET
+   select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
select SMP_UP if SMP
select SWAP_IO_SPACE
select SYS_HAS_CPU_MIPS32_R1


Shouldn't we also update the _defconfig files?

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-04 Thread Michal Suchánek

On Thu, Aug 31, 2023 at 03:34:37PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > Hello,
> >
> > thanks for working on this.
> >
> > On Tue, Aug 22, 2023 at 04:33:39PM -0500, Nathan Lynch via B4 Relay wrote:
> >> From: Nathan Lynch 
> >> 
> >> PowerVM LPARs may retrieve Vital Product Data (VPD) for system
> >> components using the ibm,get-vpd RTAS function.
> >> 
> >> We can expose this to user space with a /dev/papr-vpd character
> >> device, where the programming model is:
> >> 
> >>   struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
> >>   int devfd = open("/dev/papr-vpd", O_WRONLY);
> >>   int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
> >>   size_t size = lseek(vpdfd, 0, SEEK_END);
> >>   char *buf = malloc(size);
> >>   pread(devfd, buf, size, 0);
> >> 
> >> When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
> >> the file contains the result of a complete ibm,get-vpd sequence. The
> >
> > Could this be somewhat less obfuscated?
> >
> > What the caller wants is the result of "ibm,get-vpd", which is a
> > well-known string identifier of the rtas call.
> 
> Not really. What the caller wants is *the VPD*. Currently that's done
> by calling the RTAS "ibm,get-vpd" function, but that could change in
> future. There's RTAS calls that have been replaced with a "version 2" in
> the past, that could happen here too. Or the RTAS call could be replaced
> by a hypercall (though unlikely).
> 
> But hopefully if the underlying mechanism changed the kernel would be
> able to hide that detail behind this new API, and users would not need
> to change at all.

With the device named rtas-vpd it's clearly tied to rtas.

If 'version 2' of the call happens it's more likely than not going to
have new data format because limit of current format was reached. Then
emulating that old call with the new one would be counterproductive or
impossible.

Even if the same data is available through different call there is no
problem. If the user really used the well-known "ibm,get-vpd" identifier
documented in the specification then the kernel can translate it
internally to whatever new method for obtaining the data exists. The
current revisions of the specification are not going to go away, and the
identifier is still well-known and documented, even if newer revisions
of the platform use different way to provide the data to the kernel.

Sure, with the current syscall interface it would not work because the
user translates this well-known identifier into a function token, and
passes that to the kernel. With that if the "ibm,get-vpd" is gone the
kernel cannot provide the data anymore.

That was done to make it possible to call functions that were not yet
known when the kernel was written. This is no longer allowed, and the
kernel has functionality for translating function names to tokens for
the functions it does know about. Then it can do the translation for
userspace as well, and when the call is implemented differently in the
future abstract that detail away.

> > Yet this identifier is never passed in. Instead we have this new
> > PAPR_VPD_CREATE_HANDLE. This is a completely new identifier, specific to
> > this call only as is the /dev/papr-vpd device name, another new
> > identifier.
> >
> > Maybe the interface could provide a way to specify the service name?
> >
> >> file contents are immutable from the POV of user space. To get a new
> >> view of VPD, clients must create a new handle.
> >
> > Which is basically the same as creating a file descriptor with open().
> 
> Sort of. But much cleaner becuase you don't need to create a file in the
> filesystem and tell userspace how to find it.

Instead, you create a device in the filesystem, and assign an IOCTL, and
need to tell the userspace how to find both.

> 
> This pattern of creating file descriptors from existing file descriptors
> to model a hiearachy of objects is well established in eg. the KVM and
> DRM APIs.

Yet there is no object hierarchy to speak of here. There is one device
with one ioctl on it. The device name is tied to this specific call so a
different call will need both a new device and new IOCTL.

> 
> > Maybe creating a directory in sysfs or procfs with filenames
> > corresponding to rtas services would do the same job without extra
> > obfuscation?
> 
> It's not obfuscation, it's abstraction. The kernel talks to firmware to
> do things, and provides an API to user space. Not all the details of how
> the firmware works are relevant to user space, including the exact
> firmware calls required to implement a certain feature.

Well, it's not static data, it's a call. There might be different ways
to go around passing the arguments in and getting the results out.

Hiding the buffer management and chunking of the resulting data is an
abstraction, great.

Hiding the translation of well-known function name to the
system-specific token is an abstraction, great.

Translating the rtas error codes to well-known e

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-04 Thread Philippe Mathieu-Daudé


On 4/9/23 11:24, Yuan Tan wrote:

Hi,

On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:

Hi,

On 1/9/23 04:42, Yuan Tan wrote:

MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is set
for convenience.

Suggested-by: Zhangjin Wu 
Signed-off-by: Yuan Tan 
---
  arch/mips/Kconfig | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..13bacbd05125 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -547,6 +547,9 @@ config MIPS_MALTA
  select MIPS_L1_CACHE_SHIFT_6
  select MIPS_MSC
  select PCI_GT64XXX_PCI0
+    select PCI if POWER_RESET
+    select PCI_QUIRKS if POWER_RESET
+    select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
  select SMP_UP if SMP
  select SWAP_IO_SPACE
  select SYS_HAS_CPU_MIPS32_R1


Shouldn't we also update the _defconfig files?

Sorry, in my last email, I forgot to reply to all. So I am now resending 
this email.


In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already been 
set and PCI_QUIRKS is also selected by FSL_PCI [=n].


So shutdown and reboot with malta_defconfig is working and there is no 
need to update the malta_defconfig 🙂


Since the dependency is now enforced by Kconfig, the defconfig can
be simplified:

--- a/arch/mips/configs/malta_defconfig
+++ b/arch/mips/configs/malta_defconfig
@@ -306,3 +306,2 @@ CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_POWER_RESET=y
-CONFIG_POWER_RESET_PIIX4_POWEROFF=y
 CONFIG_POWER_RESET_SYSCON=y

But maybe we don't care, I don't know.

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-04 Thread Christophe Leroy



Le 04/09/2023 à 12:51, Philippe Mathieu-Daudé a écrit :
> On 4/9/23 11:24, Yuan Tan wrote:
>> Hi,
>>
>> On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:
>>> Hi,
>>>
>>> On 1/9/23 04:42, Yuan Tan wrote:
 MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
 POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is set
 for convenience.

 Suggested-by: Zhangjin Wu 
 Signed-off-by: Yuan Tan 
 ---
   arch/mips/Kconfig | 3 +++
   1 file changed, 3 insertions(+)

 diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
 index bc8421859006..13bacbd05125 100644
 --- a/arch/mips/Kconfig
 +++ b/arch/mips/Kconfig
 @@ -547,6 +547,9 @@ config MIPS_MALTA
   select MIPS_L1_CACHE_SHIFT_6
   select MIPS_MSC
   select PCI_GT64XXX_PCI0
 +    select PCI if POWER_RESET
 +    select PCI_QUIRKS if POWER_RESET
 +    select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
   select SMP_UP if SMP
   select SWAP_IO_SPACE
   select SYS_HAS_CPU_MIPS32_R1
>>>
>>> Shouldn't we also update the _defconfig files?
>>>
>> Sorry, in my last email, I forgot to reply to all. So I am now 
>> resending this email.
>>
>> In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already 
>> been set and PCI_QUIRKS is also selected by FSL_PCI [=n].
>>
>> So shutdown and reboot with malta_defconfig is working and there is no 
>> need to update the malta_defconfig 🙂
> 
> Since the dependency is now enforced by Kconfig, the defconfig can
> be simplified:
> 
> --- a/arch/mips/configs/malta_defconfig
> +++ b/arch/mips/configs/malta_defconfig
> @@ -306,3 +306,2 @@ CONFIG_SERIAL_8250_CONSOLE=y
>   CONFIG_POWER_RESET=y
> -CONFIG_POWER_RESET_PIIX4_POWEROFF=y
>   CONFIG_POWER_RESET_SYSCON=y
> 
> But maybe we don't care, I don't know.

I understand from what you say that you update malta_defconfig manually ?

defconfigs shouldn't be updated manually.

Once you have the new .config you should use "make savedefconfig" then 
replace your file by the newly generated defconfig file.

Christophe

[PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

2023-09-04 Thread Alexandra Diupina

Process the result of hdlc_open() and call uhdlc_close()
in case of an error. It is necessary to pass the error
code up the control flow, similar to a possible
error in request_irq().
Also add a hdlc_close() call to the uhdlc_close()
because the comment to hdlc_close() says it must be called
by the hardware driver when the HDLC device is being closed

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
Signed-off-by: Alexandra Diupina 
---
v4: undo all the things done prior to hdlc_open() as 
Jakub Kicinski  suggested, 
add hdlc_close() call to the uhdlc_close() to match the function comment, 
add uhdlc_close() declaration to the top of the file not to put the 
uhdlc_close() function definition before uhdlc_open()
v3: Fix the commits tree
v2: Remove the 'rc' variable (stores the return value of the 
hdlc_open()) as Christophe Leroy  suggested
 drivers/net/wan/fsl_ucc_hdlc.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 47c2ad7a3e42..fd999dabdd39 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -34,6 +34,8 @@
 #define TDM_PPPOHT_SLIC_MAXIN
 #define RX_BD_ERRORS (R_CD_S | R_OV_S | R_CR_S | R_AB_S | R_NO_S | R_LG_S)
 
+static int uhdlc_close(struct net_device *dev);
+
 static struct ucc_tdm_info utdm_primary_info = {
.uf_info = {
.tsa = 0,
@@ -731,7 +733,9 @@ static int uhdlc_open(struct net_device *dev)
napi_enable(&priv->napi);
netdev_reset_queue(dev);
netif_start_queue(dev);
-   hdlc_open(dev);
+
+   int rc = hdlc_open(dev);
+   return rc == 0 ? 0 : (uhdlc_close(dev), rc);
}
 
return 0;
@@ -824,6 +828,8 @@ static int uhdlc_close(struct net_device *dev)
netdev_reset_queue(dev);
priv->hdlc_busy = 0;
 
+   hdlc_close(dev);
+
return 0;
 }
 
-- 
2.30.2

Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

2023-09-04 Thread Hari Bathini


Hi Baoquan,

Thanks for the review...

On 03/09/23 9:06 am, Baoquan He wrote:

Hi Hari,

On 09/02/23 at 12:34am, Hari Bathini wrote:

Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. is_kdump_kernel() is used to restrict resources in crash
dump capture kernel but such restrictions may not be desirable for
fadump. Allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore
is usable. So, introduce is_crashdump_kernel() to return true when
elfcorehdr_addr is set and use it for vmcore related checks.


I got what is done in these two patches, but didn't get why they need be
done. vmcore_unusable()/is_vmcore_usable() are only unitilized in ia64.
Why do you care if it's is_crashdump_kernel() or is_kdump_kernel()?
If you want to override the generic is_kdump_kernel() with powerpc's own
is_kdump_kernel(), your below change is enough to allow you to do that.
I can't see why is_crashdump_kernel() is needed. Could you explain that
specifically?


You mean to just remove is_kdump_kernel() check in is_vmcore_usable() &
vmcore_unusable() functions? Replaced generic is_crashdump_kernel()
function instead, that returns true for any dump capturing method,
irrespective of whether is_kdump_kernel() returns true or false.
For fadump case, is_kdump_kernel() will return false after patch 2/2.

Thanks
Hari

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

2023-09-04 Thread Christophe Leroy



Le 03/09/2023 à 23:06, Erhard Furtner a écrit :
> On Fri, 1 Sep 2023 07:43:34 +
> Christophe Leroy  wrote:
> 
 Can you try what happens when you remove the call to kasan_init() at the
 start of setup_arch() in arch/powerpc/kernel/setup-common.c
>>>
>>> Ok, so I left the other patches in place + btext_map() instead of 
>>> btext_unmap() at the end of MMU_init() + Michaels patch and additionally 
>>> commented-out kasan_init() as stated above. The outcome is rather 
>>> interesting! Now I deterministically get this output at boot OF console, 
>>> regardless wheter it's a cold boot or warm boot:
>>
>> Ah, my bad. You also need to remove the call to kasan_late_init() in
>> mem_init() in arch/powerpc/mm/mem.c
> 
> Not tragic. Meanwhile I commented-out kasan_late_init() and updated to kernel 
> v6.5.1.
> 
> dmesg did not change however, getting the same "BUG: KASAN: 
> stack-out-of-bounds in __kernel_poison_pages+0x6c/0xd0" as last time only on 
> v6.5.1.
> 


Ok, so lets come back to normal situation. Can you add back kasan_init() 
and kasan_late_init(), while keeping the btext changes and Michael's patch.

Then see what result you get with CONFIG_KASAN but without 
CONFIG_KASAN_VMALLOC

It would help narrow the problem area because kasan_init() does several 
things based on CONFIG_KASAN_VMALLOC.

Thanks
Christophe

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

2023-09-04 Thread Christophe Leroy



Le 04/09/2023 à 16:48, Christophe Leroy a écrit :
> 
> 
> Le 03/09/2023 à 23:06, Erhard Furtner a écrit :
>> On Fri, 1 Sep 2023 07:43:34 +
>> Christophe Leroy  wrote:
>>
> Can you try what happens when you remove the call to kasan_init() 
> at the
> start of setup_arch() in arch/powerpc/kernel/setup-common.c

 Ok, so I left the other patches in place + btext_map() instead of 
 btext_unmap() at the end of MMU_init() + Michaels patch and 
 additionally commented-out kasan_init() as stated above. The outcome 
 is rather interesting! Now I deterministically get this output at 
 boot OF console, regardless wheter it's a cold boot or warm boot:
>>>
>>> Ah, my bad. You also need to remove the call to kasan_late_init() in
>>> mem_init() in arch/powerpc/mm/mem.c
>>
>> Not tragic. Meanwhile I commented-out kasan_late_init() and updated to 
>> kernel v6.5.1.
>>
>> dmesg did not change however, getting the same "BUG: KASAN: 
>> stack-out-of-bounds in __kernel_poison_pages+0x6c/0xd0" as last time 
>> only on v6.5.1.
>>
> 
> 
> Ok, so lets come back to normal situation. Can you add back kasan_init() 
> and kasan_late_init(), while keeping the btext changes and Michael's patch.
> 
> Then see what result you get with CONFIG_KASAN but without 
> CONFIG_KASAN_VMALLOC
> 
> It would help narrow the problem area because kasan_init() does several 
> things based on CONFIG_KASAN_VMALLOC.
> 



Another thing that could be interesting to test is to remove (or comment 
out) the following line in arch/powerpc/mm/kasan/Makefile :

   obj-$(CONFIG_PPC_BOOK3S_32)  += book3s_32.o

That way, the weak version of kasan_init_region() will be used instead 
of the one in book3s_32.c

Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

2023-09-04 Thread Christophe Leroy



Le 04/09/2023 à 14:31, Alexandra Diupina a écrit :
> Process the result of hdlc_open() and call uhdlc_close()
> in case of an error. It is necessary to pass the error
> code up the control flow, similar to a possible
> error in request_irq().
> Also add a hdlc_close() call to the uhdlc_close()
> because the comment to hdlc_close() says it must be called
> by the hardware driver when the HDLC device is being closed
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
> Signed-off-by: Alexandra Diupina 
> ---
> v4: undo all the things done prior to hdlc_open() as
> Jakub Kicinski  suggested,
> add hdlc_close() call to the uhdlc_close() to match the function comment,
> add uhdlc_close() declaration to the top of the file not to put the
> uhdlc_close() function definition before uhdlc_open()
> v3: Fix the commits tree
> v2: Remove the 'rc' variable (stores the return value of the
> hdlc_open()) as Christophe Leroy  suggested
>   drivers/net/wan/fsl_ucc_hdlc.c | 8 +++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
> index 47c2ad7a3e42..fd999dabdd39 100644
> --- a/drivers/net/wan/fsl_ucc_hdlc.c
> +++ b/drivers/net/wan/fsl_ucc_hdlc.c
> @@ -34,6 +34,8 @@
>   #define TDM_PPPOHT_SLIC_MAXIN
>   #define RX_BD_ERRORS (R_CD_S | R_OV_S | R_CR_S | R_AB_S | R_NO_S | R_LG_S)
>   
> +static int uhdlc_close(struct net_device *dev);
> +
>   static struct ucc_tdm_info utdm_primary_info = {
>   .uf_info = {
>   .tsa = 0,
> @@ -731,7 +733,9 @@ static int uhdlc_open(struct net_device *dev)
>   napi_enable(&priv->napi);
>   netdev_reset_queue(dev);
>   netif_start_queue(dev);
> - hdlc_open(dev);
> +
> + int rc = hdlc_open(dev);

Do not mix declarations and code. Please put all declaration at the top 
of the block.

> + return rc == 0 ? 0 : (uhdlc_close(dev), rc);
>   }

That's not easy to read.

I know that's more changes, but I'd prefer something like:

static int uhdlc_open(struct net_device *dev)
{
u32 cecr_subblock;
hdlc_device *hdlc = dev_to_hdlc(dev);
struct ucc_hdlc_private *priv = hdlc->priv;
struct ucc_tdm *utdm = priv->utdm;
int rc;

if (priv->hdlc_busy != 1)
return 0;

if (request_irq(priv->ut_info->uf_info.irq,
ucc_hdlc_irq_handler, 0, "hdlc", priv))
return -ENODEV;

cecr_subblock = ucc_fast_get_qe_cr_subblock(
priv->ut_info->uf_info.ucc_num);

qe_issue_cmd(QE_INIT_TX_RX, cecr_subblock,
 QE_CR_PROTOCOL_UNSPECIFIED, 0);

ucc_fast_enable(priv->uccf, COMM_DIR_RX | COMM_DIR_TX);

/* Enable the TDM port */
if (priv->tsa)
qe_setbits_8(&utdm->si_regs->siglmr1_h, 0x1 << utdm->tdm_port);

priv->hdlc_busy = 1;
netif_device_attach(priv->ndev);
napi_enable(&priv->napi);
netdev_reset_queue(dev);
netif_start_queue(dev);

rc = hdlc_open(dev);
if (rc)
uhdlc_close(dev);

return rc;
}



>   
>   return 0;
> @@ -824,6 +828,8 @@ static int uhdlc_close(struct net_device *dev)
>   netdev_reset_queue(dev);
>   priv->hdlc_busy = 0;
>   
> + hdlc_close(dev);
> +
>   return 0;
>   }
>   


And while you are looking at the correctness of this code, is it sure 
that uhdlc_open() cannot be called twice in parallele ?
If it can be called in parallèle I think the "if (priv->hdlc_busy != 1)" 
should be replaced by something using cmpxchg()

Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

2023-09-04 Thread Christophe Leroy



Le 04/09/2023 à 19:03, Christophe Leroy a écrit :
> 
> 
> Le 04/09/2023 à 14:31, Alexandra Diupina a écrit :
>> Process the result of hdlc_open() and call uhdlc_close()
>> in case of an error. It is necessary to pass the error
>> code up the control flow, similar to a possible
>> error in request_irq().
>> Also add a hdlc_close() call to the uhdlc_close()
>> because the comment to hdlc_close() says it must be called
>> by the hardware driver when the HDLC device is being closed
>>
>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>
>> Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC")
>> Signed-off-by: Alexandra Diupina 
>> ---
>> v4: undo all the things done prior to hdlc_open() as
>> Jakub Kicinski  suggested,
>> add hdlc_close() call to the uhdlc_close() to match the function comment,
>> add uhdlc_close() declaration to the top of the file not to put the
>> uhdlc_close() function definition before uhdlc_open()
>> v3: Fix the commits tree
>> v2: Remove the 'rc' variable (stores the return value of the
>> hdlc_open()) as Christophe Leroy  suggested
>>   drivers/net/wan/fsl_ucc_hdlc.c | 8 +++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wan/fsl_ucc_hdlc.c 
>> b/drivers/net/wan/fsl_ucc_hdlc.c
>> index 47c2ad7a3e42..fd999dabdd39 100644
>> --- a/drivers/net/wan/fsl_ucc_hdlc.c
>> +++ b/drivers/net/wan/fsl_ucc_hdlc.c
>> @@ -34,6 +34,8 @@
>>   #define TDM_PPPOHT_SLIC_MAXIN
>>   #define RX_BD_ERRORS (R_CD_S | R_OV_S | R_CR_S | R_AB_S | R_NO_S | 
>> R_LG_S)
>> +static int uhdlc_close(struct net_device *dev);
>> +
>>   static struct ucc_tdm_info utdm_primary_info = {
>>   .uf_info = {
>>   .tsa = 0,
>> @@ -731,7 +733,9 @@ static int uhdlc_open(struct net_device *dev)
>>   napi_enable(&priv->napi);
>>   netdev_reset_queue(dev);
>>   netif_start_queue(dev);
>> -    hdlc_open(dev);
>> +
>> +    int rc = hdlc_open(dev);
> 
> Do not mix declarations and code. Please put all declaration at the top 
> of the block.
> 
>> +    return rc == 0 ? 0 : (uhdlc_close(dev), rc);
>>   }
> 
> That's not easy to read.
> 
> I know that's more changes, but I'd prefer something like:
> 
> static int uhdlc_open(struct net_device *dev)
> {
>  u32 cecr_subblock;
>  hdlc_device *hdlc = dev_to_hdlc(dev);
>  struct ucc_hdlc_private *priv = hdlc->priv;
>  struct ucc_tdm *utdm = priv->utdm;
>  int rc;
> 
>  if (priv->hdlc_busy != 1)

Of course should be:

if (priv->hdlc_busy == 1)

>      return 0;
> 
>  if (request_irq(priv->ut_info->uf_info.irq,
>      ucc_hdlc_irq_handler, 0, "hdlc", priv))
>      return -ENODEV;
> 
>  cecr_subblock = ucc_fast_get_qe_cr_subblock(
>      priv->ut_info->uf_info.ucc_num);
> 
>  qe_issue_cmd(QE_INIT_TX_RX, cecr_subblock,
>   QE_CR_PROTOCOL_UNSPECIFIED, 0);
> 
>  ucc_fast_enable(priv->uccf, COMM_DIR_RX | COMM_DIR_TX);
> 
>  /* Enable the TDM port */
>  if (priv->tsa)
>      qe_setbits_8(&utdm->si_regs->siglmr1_h, 0x1 << utdm->tdm_port);
> 
>  priv->hdlc_busy = 1;
>  netif_device_attach(priv->ndev);
>  napi_enable(&priv->napi);
>  netdev_reset_queue(dev);
>  netif_start_queue(dev);
> 
>  rc = hdlc_open(dev);
>  if (rc)
>      uhdlc_close(dev);
> 
>  return rc;
> }
> 
> 
> 
>>   return 0;
>> @@ -824,6 +828,8 @@ static int uhdlc_close(struct net_device *dev)
>>   netdev_reset_queue(dev);
>>   priv->hdlc_busy = 0;
>> +    hdlc_close(dev);
>> +
>>   return 0;
>>   }
> 
> 
> And while you are looking at the correctness of this code, is it sure 
> that uhdlc_open() cannot be called twice in parallele ?
> If it can be called in parallèle I think the "if (priv->hdlc_busy != 1)" 
> should be replaced by something using cmpxchg()

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-04 Thread Yuan Tan




On 9/4/2023 6:58 PM, Christophe Leroy wrote:


Le 04/09/2023 à 12:51, Philippe Mathieu-Daudé a écrit :

On 4/9/23 11:24, Yuan Tan wrote:

Hi,

On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:

Hi,

On 1/9/23 04:42, Yuan Tan wrote:

MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is set
for convenience.

Suggested-by: Zhangjin Wu 
Signed-off-by: Yuan Tan 
---
   arch/mips/Kconfig | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..13bacbd05125 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -547,6 +547,9 @@ config MIPS_MALTA
   select MIPS_L1_CACHE_SHIFT_6
   select MIPS_MSC
   select PCI_GT64XXX_PCI0
+    select PCI if POWER_RESET
+    select PCI_QUIRKS if POWER_RESET
+    select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
   select SMP_UP if SMP
   select SWAP_IO_SPACE
   select SYS_HAS_CPU_MIPS32_R1

Shouldn't we also update the _defconfig files?


Sorry, in my last email, I forgot to reply to all. So I am now
resending this email.

In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already
been set and PCI_QUIRKS is also selected by FSL_PCI [=n].

So shutdown and reboot with malta_defconfig is working and there is no
need to update the malta_defconfig 🙂

Since the dependency is now enforced by Kconfig, the defconfig can
be simplified:

--- a/arch/mips/configs/malta_defconfig
+++ b/arch/mips/configs/malta_defconfig
@@ -306,3 +306,2 @@ CONFIG_SERIAL_8250_CONSOLE=y
   CONFIG_POWER_RESET=y
-CONFIG_POWER_RESET_PIIX4_POWEROFF=y
   CONFIG_POWER_RESET_SYSCON=y

But maybe we don't care, I don't know.

I understand from what you say that you update malta_defconfig manually ?

defconfigs shouldn't be updated manually.

Once you have the new .config you should use "make savedefconfig" then
replace your file by the newly generated defconfig file.

Christophe


To do so, I just unset CONFIG_POWER_RESET and set it again in 
menuconfig, then "make savedefconfig". The POWER_RESET part is simplified.


 CONFIG_POWER_RESET=y
-CONFIG_POWER_RESET_PIIX4_POWEROFF=y
-CONFIG_POWER_RESET_SYSCON=y

 However, I found that there's other changes in this new 
malta_defconfig, for example


CONFIG_NLS_KOI8_U=m CONFIG_CRYPTO_CRYPTD=m -CONFIG_CRYPTO_LRW=m 
-CONFIG_CRYPTO_PCBC=m -CONFIG_CRYPTO_HMAC=y -CONFIG_CRYPTO_XCBC=m 
-CONFIG_CRYPTO_MD4=m -CONFIG_CRYPTO_SHA512=m -CONFIG_CRYPTO_WP512=m 
-CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_CAMELLIA=m


Should I import all these changes in a commit? Or only POWER_RESET part.

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-04 Thread Philippe Mathieu-Daudé


On 4/9/23 19:40, Yuan Tan wrote:


On 9/4/2023 6:58 PM, Christophe Leroy wrote:


Le 04/09/2023 à 12:51, Philippe Mathieu-Daudé a écrit :

On 4/9/23 11:24, Yuan Tan wrote:

Hi,

On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:

Hi,

On 1/9/23 04:42, Yuan Tan wrote:

MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET 
is set

for convenience.

Suggested-by: Zhangjin Wu 
Signed-off-by: Yuan Tan 
---
   arch/mips/Kconfig | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..13bacbd05125 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -547,6 +547,9 @@ config MIPS_MALTA
   select MIPS_L1_CACHE_SHIFT_6
   select MIPS_MSC
   select PCI_GT64XXX_PCI0
+    select PCI if POWER_RESET
+    select PCI_QUIRKS if POWER_RESET
+    select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
   select SMP_UP if SMP
   select SWAP_IO_SPACE
   select SYS_HAS_CPU_MIPS32_R1

Shouldn't we also update the _defconfig files?


Sorry, in my last email, I forgot to reply to all. So I am now
resending this email.

In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already
been set and PCI_QUIRKS is also selected by FSL_PCI [=n].

So shutdown and reboot with malta_defconfig is working and there is no
need to update the malta_defconfig 🙂

Since the dependency is now enforced by Kconfig, the defconfig can
be simplified:

--- a/arch/mips/configs/malta_defconfig
+++ b/arch/mips/configs/malta_defconfig
@@ -306,3 +306,2 @@ CONFIG_SERIAL_8250_CONSOLE=y
   CONFIG_POWER_RESET=y
-CONFIG_POWER_RESET_PIIX4_POWEROFF=y
   CONFIG_POWER_RESET_SYSCON=y

But maybe we don't care, I don't know.

I understand from what you say that you update malta_defconfig manually ?

defconfigs shouldn't be updated manually.

Once you have the new .config you should use "make savedefconfig" then
replace your file by the newly generated defconfig file.

Christophe


To do so, I just unset CONFIG_POWER_RESET and set it again in 
menuconfig, then "make savedefconfig". The POWER_RESET part is simplified.


  CONFIG_POWER_RESET=y
-CONFIG_POWER_RESET_PIIX4_POWEROFF=y
-CONFIG_POWER_RESET_SYSCON=y

  However, I found that there's other changes in this new 
malta_defconfig, for example


CONFIG_NLS_KOI8_U=m CONFIG_CRYPTO_CRYPTD=m -CONFIG_CRYPTO_LRW=m 
-CONFIG_CRYPTO_PCBC=m -CONFIG_CRYPTO_HMAC=y -CONFIG_CRYPTO_XCBC=m 
-CONFIG_CRYPTO_MD4=m -CONFIG_CRYPTO_SHA512=m -CONFIG_CRYPTO_WP512=m 
-CONFIG_CRYPTO_ANUBIS=m CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_CAMELLIA=m


Should I import all these changes in a commit? Or only POWER_RESET part.


I'd first update the defconfigs with mainline (as a cleanup)
then apply your series on top, re-running 'make savedefconfig'
you should get only the changes relevant to your work.

Re: [PATCH] powerpc/smp: Dynamically build powerpc topology

2023-09-04 Thread Peter Zijlstra

On Wed, Aug 30, 2023 at 05:56:14PM +0530, Srikar Dronamraju wrote:
> Currently there are four powerpc specific sched topologies.  These are
> all statically defined.  However not all these topologies are used by
> all powerpc systems.
> 
> To avoid unnecessary degenerations by the scheduler , masks and flags
> are compared. However if the sched topologies are build dynamically then
> the code is simpler and there are greater chances of avoiding
> degenerations.
> 
> Even x86 builds its sched topologies dynamically and new changes are
> very similar to the way x86 is building its topologies.

You're not stating it explicitly, but you're doing this as a performance
optimization, right? The x86 thing didn't particularly care about
avoiding degenerate topologies -- it's just that the fixed tables method
grew unwieldy due to combinatorics.

And how does this patch relate to the other series touching this?

  powerpc/smp: Shared processor sched optimizations

Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

2023-09-04 Thread Baoquan He

On 09/04/23 at 08:04pm, Hari Bathini wrote:
> Hi Baoquan,
> 
> Thanks for the review...
> 
> On 03/09/23 9:06 am, Baoquan He wrote:
> > Hi Hari,
> > 
> > On 09/02/23 at 12:34am, Hari Bathini wrote:
> > > Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
> > > While elfcorehdr_addr is set for kexec based kernel dump mechanism,
> > > alternate dump capturing methods like fadump [1] also set it to export
> > > the vmcore. is_kdump_kernel() is used to restrict resources in crash
> > > dump capture kernel but such restrictions may not be desirable for
> > > fadump. Allow is_kdump_kernel() to be defined differently for such
> > > scenarios. With this, is_kdump_kernel() could be false while vmcore
> > > is usable. So, introduce is_crashdump_kernel() to return true when
> > > elfcorehdr_addr is set and use it for vmcore related checks.
> > 
> > I got what is done in these two patches, but didn't get why they need be
> > done. vmcore_unusable()/is_vmcore_usable() are only unitilized in ia64.
> > Why do you care if it's is_crashdump_kernel() or is_kdump_kernel()?
> > If you want to override the generic is_kdump_kernel() with powerpc's own
> > is_kdump_kernel(), your below change is enough to allow you to do that.
> > I can't see why is_crashdump_kernel() is needed. Could you explain that
> > specifically?
> 
> You mean to just remove is_kdump_kernel() check in is_vmcore_usable() &
> vmcore_unusable() functions? Replaced generic is_crashdump_kernel()
> function instead, that returns true for any dump capturing method,
> irrespective of whether is_kdump_kernel() returns true or false.
> For fadump case, is_kdump_kernel() will return false after patch 2/2.

OK, I could understand what you want to achieve. You want to make
is_kdump_kernel() only return true for kdump, while is_vmcore_usable()
returns true for both kdump and fadump.

IIUC, can we change as below? It could make code clearer and more
straightforward. I don't think adding another is_crashdump_kernel()
is a good idea, that would be a torture for non-powerpc people reading
code when they need differentiate between kdump and crashdump.

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..102a8b710b38 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
 #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || 
vmcore_elf_check_arch_cross(x))
 #endif
 
+#ifndef is_kdump_active
 /*
  * is_kdump_kernel() checks whether this kernel is booting after a panic of
  * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,14 @@ static inline bool is_kdump_kernel(void)
 {
return elfcorehdr_addr != ELFCORE_ADDR_MAX;
 }
+#endif
+
+#ifndef is_fadump_active
+static inline bool is_fadump_active(void)
+{
+   return false;
+}
+#endif
 
 /* is_vmcore_usable() checks if the kernel is booting after a panic and
  * the vmcore region is usable.
@@ -75,7 +84,8 @@ static inline bool is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-   return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+   return (is_kdump_kernel() || is_fadump_active())
+   && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,
@@ -84,7 +94,7 @@ static inline int is_vmcore_usable(void)
 
 static inline void vmcore_unusable(void)
 {
-   if (is_kdump_kernel())
+   if (is_kdump_kernel() || is_fadump_active())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }

Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()

2023-09-04 Thread Michael Ellerman

Christophe Leroy  writes:
> Commit 45201c879469 ("powerpc/nohash: Remove hash related code from
> nohash headers.") replaced:
>
>   if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
>   return 0;
>
> By:
>
>   if (pte_young(*ptep))
>   return 0;
>
> But it should be:
>
>   if (!pte_young(*ptep))
>   return 0;

That seems bad :)

But I don't know off the top of my head where
__ptep_test_and_clear_young() is used, and so what the symptoms could
be. Presumably nothing too bad or someone would have noticed?

cheers

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-04 Thread Michael Ellerman

Michal Suchánek  writes:
> On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote:
...
>> You (Michal) seem to favor a kernel-user ABI where user space is allowed
>> to invoke arbitrary RTAS functions by name. But we already have that in
>> the form of the rtas() syscall. (User space looks up function tokens by
>> name in the DT.) The point of the series is that we need to move away
>> from that. It's too low-level and user space has to use /dev/mem when
>> invoking any of the less-simple RTAS functions.
>
> We don't have that, directly accessing /dev/mem does not really work.
> And that's what needs fixing in my view.
>
> The rtas calls are all mechanically the same, the function implemented
> here should be able to call any of them if there was a way to specify
> the call.
>
> Given that there is desire to have access to multiple calls I don't
> think it makes sense to allocate a separate device with different name
> for each.

I think it does make sense.

We explicitly don't want a general "call any RTAS function" API.

We want tightly scoped APIs that do one thing, or a family of related
things, but not anything & everything.

Having different devices for each of those APIs means permissions can be
granted separately on those devices. So a user/group can be given access
to the "papr-vpd" device, but not some other unrelated device that also
happens to expose an RTAS service (eg. error injection).

cheers

Re: [PATCH 0/4] ppc, fbdev: Clean up fbdev mmap helper

2023-09-04 Thread Michael Ellerman

Thomas Zimmermann  writes:
> Refactor fb_pgprotect() in PowerPC to work without struct file. Then
> clean up and rename fb_pgprotect(). This change has been discussed at
> [1] in the context of refactoring fbdev's mmap code.
>
> The first three patches adapt PowerPC's internal interfaces to
> provide a phys_mem_access_prot() that works without struct file. Neither
> the architecture code or fbdev helpers need the parameter.
>
> Patch 4 replaces fbdev's fb_pgprotect() with fb_pgprot_device() on
> all architectures. The new helper with its stream-lined interface
> enables more refactoring within fbdev's mmap implementation.

The content of this series is OK, but the way it's structured makes it a
real headache to merge, because it's mostly powerpc changes and then a
dependant cross architecture patch at the end.

It would be simpler if patch 4 was first and just passed file=NULL to
the powerpc helper, with an explanation that it's unused and will be
dropped in a future cleanup.

We could then put the first patch (previously patch 4) in a topic branch
that is shared between the powerpc tree and the fbdev tree, and then the
powerpc changes could be staged on top of that through the powerpc tree.

cheers

[PATCH v4 00/11] KVM: PPC: Nested APIv2 guest support

2023-09-04 Thread Jordan Niethe

A nested-HV API for PAPR has been developed based on the KVM-specific
nested-HV API that is upstream in Linux/KVM and QEMU. The PAPR API had
to break compatibility to accommodate implementation in other
hypervisors and partitioning firmware. The existing KVM-specific API
will be known as the Nested APIv1 and the PAPR API will be known as the
Nested APIv2. 

The control flow and interrupt processing between L0, L1, and L2 in
the Nested APIv2 are conceptually unchanged. Where Nested APIv1 is almost
stateless, the Nested APIv2 is stateful, with the L1 registering L2 virtual
machines and vCPUs with the L0. Supervisor-privileged register switching
duty is now the responsibility for the L0, which holds canonical L2
register state and handles all switching. This new register handling
motivates the "getters and setters" wrappers to assist in syncing the
L2s state in the L1 and the L0.

Broadly, the new hcalls will be used for  creating and managing guests
by a regular partition in the following way:

  - L1 and L0 negotiate capabilities with
H_GUEST_{G,S}ET_CAPABILITIES

  - L1 requests the L0 create a L2 with
H_GUEST_CREATE and receives a handle to use in future hcalls

  - L1 requests the L0 create a L2 vCPU with
H_GUEST_CREATE_VCPU

  - L1 sets up the L2 using H_GUEST_SET and the
H_GUEST_VCPU_RUN input buffer

  - L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN

  - L2 returns to L1 with an exit reason and L1 reads the
H_GUEST_VCPU_RUN output buffer populated by the L0

  - L1 handles the exit using H_GET_STATE if necessary

  - L1 reruns L2 vCPU with H_GUEST_VCPU_RUN

  - L1 frees the L2 in the L0 with H_GUEST_DELETE

Further details are available in Documentation/powerpc/kvm-nested.rst.

This series adds KVM support for using this hcall interface as a regular
PAPR partition, i.e. the L1. It does not add support for running as the
L0.

The new hcalls have been implemented in the spapr qemu model for
testing.

This is available at https://github.com/planetharsh/qemu/tree/upstream-0714-kop

There are scripts available to assist in setting up an environment for
testing nested guests at https://github.com/iamjpn/kvm-powervm-test

A tree with this series is available at
https://github.com/iamjpn/linux/tree/features/kvm-nestedv2-v4

Thanks to Amit Machhiwal, Kautuk Consul, Vaibhav Jain, Michael Neuling,
Shivaprasad Bhat, Harsh Prateek Bora, Paul Mackerras and Nicholas
Piggin.

Change overview in v4:
  - Split previous "KVM: PPC: Use getters and setters for vcpu register
state" into a number of seperate patches
- Remove _hv suffix from VCORE wrappers
- Do not create arch_compat and lpcr setters, use the existing ones
- Use #ifdef ALTIVEC
  - KVM: PPC: Rename accessor generator macros
- Fix typo
  - KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long
- Use u64
- Change format strings instead of casting
  - KVM: PPC: Add support for nestedv2 guests
- Batch H_GUEST_GET calls in kvmhv_nestedv2_reload_ptregs()
- Fix compile without CONFIG_PSERIES
- Fix maybe uninitialized 'trap' in kvmhv_p9_guest_entry()
- Extend existing setters for arch_compat and lpcr


Change overview in v3:
  - KVM: PPC: Use getters and setters for vcpu register state
  - Do not add a helper for pvr
  - Use an expression when declaring variable in case
  - Squash in all getters and setters
  - Pass vector registers by reference
  - KVM: PPC: Rename accessor generator macros
  - New to series
  - KVM: PPC: Add helper library for Guest State Buffers
  - Use EXPORT_SYMBOL_GPL()
  - Use the kvmppc namespace
  - Move kvmppc_gsb_reset() out of kvmppc_gsm_fill_info()
  - Comments for GSID elements
  - Pass vector elements by reference
  - Remove generic put and get functions
  - KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long
  - New to series
  - KVM: PPC: Add support for nestedv2 guests
  - Use EXPORT_SYMBOL_GPL()
  - Change to kvmhv_nestedv2 namespace
  - Make kvmhv_enable_nested() return -ENODEV on NESTEDv2 L1 hosts
  - s/kvmhv_on_papr/kvmhv_is_nestedv2/
  - mv book3s_hv_papr.c book3s_hv_nestedv2.c
  - Handle shared regs without a guest state id in the same wrapper
  - Use a static key for API version
  - Add a positive test for NESTEDv1
  - Give the amor a static value
  - s/struct kvmhv_nestedv2_host/struct kvmhv_nestedv2_io/
  - Propagate failure in kvmhv_vcpu_entry_nestedv2()
  - WARN if getters and setters fail
  - Progagate failure from kvmhv_nestedv2_parse_output()
  - Replace delay with sleep in plpar_guest_{create,delete,create_vcpu}()
  - Add logical PVR handling
  - Replace kvmppc_gse_{get,put} with specific version
  - docs: powerpc: Document nested KVM on POWER
  - Fix typos


Change overview in v2:
  - Rebase on top of kvm ppc prefix instruction support
  - Make documentation an individual patch
  - Move guest state buffer files from arch/powerpc/l

[PATCH v4 01/11] KVM: PPC: Always use the GPR accessors

2023-09-04 Thread Jordan Niethe

Always use the GPR accessor functions. This will be important later for
Nested APIv2 support which requires additional functionality for
accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe 
---
v4:
  - Split into unique patch
---
 arch/powerpc/kvm/book3s_64_vio.c | 4 ++--
 arch/powerpc/kvm/book3s_hv.c | 8 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c | 6 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 8 
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 4 ++--
 arch/powerpc/kvm/book3s_xive.c   | 4 ++--
 6 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 93b695b289e9..4ba048f272f2 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -786,12 +786,12 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned 
long liobn,
idx = (ioba >> stt->page_shift) - stt->offset;
page = stt->pages[idx / TCES_PER_PAGE];
if (!page) {
-   vcpu->arch.regs.gpr[4] = 0;
+   kvmppc_set_gpr(vcpu, 4, 0);
return H_SUCCESS;
}
tbl = (u64 *)page_address(page);
 
-   vcpu->arch.regs.gpr[4] = tbl[idx % TCES_PER_PAGE];
+   kvmppc_set_gpr(vcpu, 4, tbl[idx % TCES_PER_PAGE]);
 
return H_SUCCESS;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 130bafdb1430..4af5b68cf7f8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1267,10 +1267,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
return RESUME_HOST;
break;
 #endif
-   case H_RANDOM:
-   if (!arch_get_random_seed_longs(&vcpu->arch.regs.gpr[4], 1))
+   case H_RANDOM: {
+   unsigned long rand;
+
+   if (!arch_get_random_seed_longs(&rand, 1))
ret = H_HARDWARE;
+   kvmppc_set_gpr(vcpu, 4, rand);
break;
+   }
case H_RPT_INVALIDATE:
ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
  kvmppc_get_gpr(vcpu, 5),
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 0f5b021fa559..f3afe194e616 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -182,9 +182,13 @@ EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
 
 long kvmppc_rm_h_random(struct kvm_vcpu *vcpu)
 {
+   unsigned long rand;
+
if (ppc_md.get_random_seed &&
-   ppc_md.get_random_seed(&vcpu->arch.regs.gpr[4]))
+   ppc_md.get_random_seed(&rand)) {
+   kvmppc_set_gpr(vcpu, 4, rand);
return H_SUCCESS;
+   }
 
return H_HARDWARE;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c 
b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9182324dbef9..17cb75a127b0 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -776,8 +776,8 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long 
flags,
r = rev[i].guest_rpte | (r & (HPTE_R_R | HPTE_R_C));
r &= ~HPTE_GR_RESERVED;
}
-   vcpu->arch.regs.gpr[4 + i * 2] = v;
-   vcpu->arch.regs.gpr[5 + i * 2] = r;
+   kvmppc_set_gpr(vcpu, 4 + i * 2, v);
+   kvmppc_set_gpr(vcpu, 5 + i * 2, r);
}
return H_SUCCESS;
 }
@@ -824,7 +824,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned 
long flags,
}
}
}
-   vcpu->arch.regs.gpr[4] = gr;
+   kvmppc_set_gpr(vcpu, 4, gr);
ret = H_SUCCESS;
  out:
unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
@@ -872,7 +872,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned 
long flags,
kvmppc_set_dirty_from_hpte(kvm, v, gr);
}
}
-   vcpu->arch.regs.gpr[4] = gr;
+   kvmppc_set_gpr(vcpu, 4, gr);
ret = H_SUCCESS;
  out:
unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index e165bfa842bf..e42984878503 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -481,7 +481,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, 
struct kvmppc_icp *icp,
 
 unsigned long xics_rm_h_xirr_x(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.regs.gpr[5] = get_tb();
+   kvmppc_set_gpr(vcpu, 5, get_tb());
return xics_rm_h_xirr(vcpu);
 }
 
@@ -518,7 +518,7 @@ unsigned long xics_rm_h_xirr(struct kvm_vcpu *vcpu)
} while (!icp_rm_try_update(icp, old_state, new_state));
 
/* Return the result in GPR4 */
-   vcpu->arch.regs.gpr[4] = xirr;
+   kvmppc_set_gpr(vcpu, 4, xirr);
 
return check_too_hard(xics, icp);
 }
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/boo

[PATCH v4 02/11] KVM: PPC: Introduce FPR/VR accessor functions

2023-09-04 Thread Jordan Niethe

Introduce accessor functions for floating point and vector registers
like the ones that exist for GPRs. Use these to replace the existing FPR
and VR accessor macros.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe 
---
v4:
  - Split into unique patch
---
 arch/powerpc/include/asm/kvm_book3s.h | 55 
 arch/powerpc/include/asm/kvm_booke.h  | 10 
 arch/powerpc/kvm/book3s.c | 16 +++---
 arch/powerpc/kvm/emulate_loadstore.c  |  2 +-
 arch/powerpc/kvm/powerpc.c| 72 +--
 5 files changed, 110 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index bbf5e2c5fe09..109a5f56767a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -403,6 +403,61 @@ static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu 
*vcpu)
return vcpu->arch.fault_dar;
 }
 
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+   return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+   vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpscr(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.fp.fpscr;
+}
+
+static inline void kvmppc_set_fpscr(struct kvm_vcpu *vcpu, u64 val)
+{
+   vcpu->arch.fp.fpscr = val;
+}
+
+
+static inline u64 kvmppc_get_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j)
+{
+   return vcpu->arch.fp.fpr[i][j];
+}
+
+static inline void kvmppc_set_vsx_fpr(struct kvm_vcpu *vcpu, int i, int j,
+ u64 val)
+{
+   vcpu->arch.fp.fpr[i][j] = val;
+}
+
+#ifdef CONFIG_ALTIVEC
+static inline void kvmppc_get_vsx_vr(struct kvm_vcpu *vcpu, int i, vector128 
*v)
+{
+   *v =  vcpu->arch.vr.vr[i];
+}
+
+static inline void kvmppc_set_vsx_vr(struct kvm_vcpu *vcpu, int i,
+vector128 *val)
+{
+   vcpu->arch.vr.vr[i] = *val;
+}
+
+static inline u32 kvmppc_get_vscr(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.vr.vscr.u[3];
+}
+
+static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, u32 val)
+{
+   vcpu->arch.vr.vscr.u[3] = val;
+}
+#endif
+
 /* Expiry time of vcpu DEC relative to host TB */
 static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index 0c3401b2e19e..7c3291aa8922 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -89,6 +89,16 @@ static inline ulong kvmppc_get_pc(struct kvm_vcpu *vcpu)
return vcpu->arch.regs.nip;
 }
 
+static inline void kvmppc_set_fpr(struct kvm_vcpu *vcpu, int i, u64 val)
+{
+   vcpu->arch.fp.fpr[i][TS_FPROFFSET] = val;
+}
+
+static inline u64 kvmppc_get_fpr(struct kvm_vcpu *vcpu, int i)
+{
+   return vcpu->arch.fp.fpr[i][TS_FPROFFSET];
+}
+
 #ifdef CONFIG_BOOKE
 static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 686d8d9eda3e..c080dd2e96ac 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -636,17 +636,17 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
break;
case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
i = id - KVM_REG_PPC_FPR0;
-   *val = get_reg_val(id, VCPU_FPR(vcpu, i));
+   *val = get_reg_val(id, kvmppc_get_fpr(vcpu, i));
break;
case KVM_REG_PPC_FPSCR:
-   *val = get_reg_val(id, vcpu->arch.fp.fpscr);
+   *val = get_reg_val(id, kvmppc_get_fpscr(vcpu));
break;
 #ifdef CONFIG_VSX
case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31:
if (cpu_has_feature(CPU_FTR_VSX)) {
i = id - KVM_REG_PPC_VSR0;
-   val->vsxval[0] = vcpu->arch.fp.fpr[i][0];
-   val->vsxval[1] = vcpu->arch.fp.fpr[i][1];
+   val->vsxval[0] = kvmppc_get_vsx_fpr(vcpu, i, 0);
+   val->vsxval[1] = kvmppc_get_vsx_fpr(vcpu, i, 1);
} else {
r = -ENXIO;
}
@@ -724,7 +724,7 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
break;
case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31:
i = id - KVM_REG_PPC_FPR0;
-   VCPU_FPR(vcpu, i) = set_reg_val(id, *val);
+   kvmppc_set_fpr(vcpu, i, set_reg_val(id, *val));
break;
case KVM_REG_PPC_FPSCR:

[PATCH v4 03/11] KVM: PPC: Rename accessor generator macros

2023-09-04 Thread Jordan Niethe

More "wrapper" style accessor generating macros will be introduced for
the nestedv2 guest support. Rename the existing macros with more
descriptive names now so there is a consistent naming convention.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Jordan Niethe 
---
v3:
  - New to series
v4:
  - Fix ACESSOR typo
---
 arch/powerpc/include/asm/kvm_ppc.h | 60 +++---
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index d16d80ad2ae4..d554bc56e7f3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -927,19 +927,19 @@ static inline bool kvmppc_shared_big_endian(struct 
kvm_vcpu *vcpu)
 #endif
 }
 
-#define SPRNG_WRAPPER_GET(reg, bookehv_spr)\
+#define KVMPPC_BOOKE_HV_SPRNG_ACCESSOR_GET(reg, bookehv_spr)   \
 static inline ulong kvmppc_get_##reg(struct kvm_vcpu *vcpu)\
 {  \
return mfspr(bookehv_spr);  \
 }  \
 
-#define SPRNG_WRAPPER_SET(reg, bookehv_spr)\
+#define KVMPPC_BOOKE_HV_SPRNG_ACCESSOR_SET(reg, bookehv_spr)   \
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, ulong val)  \
 {  \
mtspr(bookehv_spr, val);
\
 }  \
 
-#define SHARED_WRAPPER_GET(reg, size)  \
+#define KVMPPC_VCPU_SHARED_REGS_ACCESSOR_GET(reg, size)
\
 static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
 {  \
if (kvmppc_shared_big_endian(vcpu)) \
@@ -948,7 +948,7 @@ static inline u##size kvmppc_get_##reg(struct kvm_vcpu 
*vcpu)   \
   return le##size##_to_cpu(vcpu->arch.shared->reg);\
 }  \
 
-#define SHARED_WRAPPER_SET(reg, size)  \
+#define KVMPPC_VCPU_SHARED_REGS_ACCESSOR_SET(reg, size)
\
 static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
\
 {  \
if (kvmppc_shared_big_endian(vcpu)) \
@@ -957,36 +957,36 @@ static inline void kvmppc_set_##reg(struct kvm_vcpu 
*vcpu, u##size val)   \
   vcpu->arch.shared->reg = cpu_to_le##size(val);   \
 }  \
 
-#define SHARED_WRAPPER(reg, size)  \
-   SHARED_WRAPPER_GET(reg, size)   \
-   SHARED_WRAPPER_SET(reg, size)   \
+#define KVMPPC_VCPU_SHARED_REGS_ACCESSOR(reg, size)\
+   KVMPPC_VCPU_SHARED_REGS_ACCESSOR_GET(reg, size) \
+   KVMPPC_VCPU_SHARED_REGS_ACCESSOR_SET(reg, size) \
 
-#define SPRNG_WRAPPER(reg, bookehv_spr)
\
-   SPRNG_WRAPPER_GET(reg, bookehv_spr) \
-   SPRNG_WRAPPER_SET(reg, bookehv_spr) \
+#define KVMPPC_BOOKE_HV_SPRNG_ACCESSOR(reg, bookehv_spr)   \
+   KVMPPC_BOOKE_HV_SPRNG_ACCESSOR_GET(reg, bookehv_spr)\
+   KVMPPC_BOOKE_HV_SPRNG_ACCESSOR_SET(reg, bookehv_spr)\
 
 #ifdef CONFIG_KVM_BOOKE_HV
 
-#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)   \
-   SPRNG_WRAPPER(reg, bookehv_spr) \
+#define KVMPPC_BOOKE_HV_SPRNG_OR_VCPU_SHARED_REGS_ACCESSOR(reg, size, 
bookehv_spr) \
+   KVMPPC_BOOKE_HV_SPRNG_ACCESSOR(reg, bookehv_spr)\
 
 #else
 
-#define SHARED_SPRNG_WRAPPER(reg, size, bookehv_spr)   \
-   SHARED_WRAPPER(reg, size)   \
+#define KVMPPC_BOOKE_HV_SPRNG_OR_VCPU_SHARED_REGS_ACCESSOR(reg, size, 
bookehv_spr) \
+   KVMPPC_VCPU_SHARED_REGS_ACCESSOR(reg, size) \
 
 #endif
 
-SHARED_WRAPPER(critical, 64)
-SHARED_SPRNG_WRAPPER(sprg0, 64, SPRN_GSPRG0)
-SHARED_SPRNG_WRAPPER(sprg1, 64, SPRN_GSPRG1)
-SHARED_SPRNG_WRAPPER(sprg2, 64, SPRN_GSPRG2)
-SHARED_SPRNG_WRAPPER(sprg3, 64, SPRN_GSPRG3)
-SHARED_SPRNG_WRAPPER(srr0, 64, SPRN_GSRR0)
-SHARED_SPRNG_WRAPPER(srr1, 64, SPRN_GSRR1)
-SHARED_SPRNG_WRAPPER(dar, 64, SPRN_GDEAR)
-SHARED_SPRNG_WRAPPER(esr, 64, SPRN_GESR)
-SHARED_WRAPPER_GET(msr, 64)
+KVMPPC_VCPU_SHARED_REGS_ACCESSOR(critical, 64)
+KVMPPC_BOOKE_HV_SPRNG_OR_VCPU_SHARE

[PATCH v4 04/11] KVM: PPC: Use accessors for VCPU registers

2023-09-04 Thread Jordan Niethe

Introduce accessor generator macros for VCPU registers. Use the accessor
functions to replace direct accesses to this registers.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe 
---
v4:
  - Split to unique patch
---
 arch/powerpc/include/asm/kvm_book3s.h  | 37 +-
 arch/powerpc/kvm/book3s.c  | 22 +++
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  4 +--
 arch/powerpc/kvm/book3s_hv.c   | 12 -
 arch/powerpc/kvm/book3s_hv_p9_entry.c  |  4 +--
 arch/powerpc/kvm/powerpc.c |  4 +--
 6 files changed, 59 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 109a5f56767a..1a220cd63227 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -458,10 +458,45 @@ static inline void kvmppc_set_vscr(struct kvm_vcpu *vcpu, 
u32 val)
 }
 #endif
 
+#define KVMPPC_BOOK3S_VCPU_ACCESSOR_SET(reg, size) \
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
\
+{  \
+   \
+   vcpu->arch.reg = val;   \
+}
+
+#define KVMPPC_BOOK3S_VCPU_ACCESSOR_GET(reg, size) \
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
+{  \
+   return vcpu->arch.reg;  \
+}
+
+#define KVMPPC_BOOK3S_VCPU_ACCESSOR(reg, size) \
+   KVMPPC_BOOK3S_VCPU_ACCESSOR_SET(reg, size)  \
+   KVMPPC_BOOK3S_VCPU_ACCESSOR_GET(reg, size)  \
+
+KVMPPC_BOOK3S_VCPU_ACCESSOR(pid, 32)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(tar, 64)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(ebbhr, 64)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(ebbrr, 64)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(bescr, 64)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(ic, 64)
+KVMPPC_BOOK3S_VCPU_ACCESSOR(vrsave, 64)
+
+static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.dec_expires;
+}
+
+static inline void kvmppc_set_dec_expires(struct kvm_vcpu *vcpu, u64 val)
+{
+   vcpu->arch.dec_expires = val;
+}
+
 /* Expiry time of vcpu DEC relative to host TB */
 static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.dec_expires - vcpu->arch.vcore->tb_offset;
+   return kvmppc_get_dec_expires(vcpu) - vcpu->arch.vcore->tb_offset;
 }
 
 static inline bool is_kvmppc_resume_guest(int r)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c080dd2e96ac..6cd20ab9e94e 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -565,7 +565,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, 
struct kvm_regs *regs)
regs->msr = kvmppc_get_msr(vcpu);
regs->srr0 = kvmppc_get_srr0(vcpu);
regs->srr1 = kvmppc_get_srr1(vcpu);
-   regs->pid = vcpu->arch.pid;
+   regs->pid = kvmppc_get_pid(vcpu);
regs->sprg0 = kvmppc_get_sprg0(vcpu);
regs->sprg1 = kvmppc_get_sprg1(vcpu);
regs->sprg2 = kvmppc_get_sprg2(vcpu);
@@ -683,19 +683,19 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
*val = get_reg_val(id, vcpu->arch.fscr);
break;
case KVM_REG_PPC_TAR:
-   *val = get_reg_val(id, vcpu->arch.tar);
+   *val = get_reg_val(id, kvmppc_get_tar(vcpu));
break;
case KVM_REG_PPC_EBBHR:
-   *val = get_reg_val(id, vcpu->arch.ebbhr);
+   *val = get_reg_val(id, kvmppc_get_ebbhr(vcpu));
break;
case KVM_REG_PPC_EBBRR:
-   *val = get_reg_val(id, vcpu->arch.ebbrr);
+   *val = get_reg_val(id, kvmppc_get_ebbrr(vcpu));
break;
case KVM_REG_PPC_BESCR:
-   *val = get_reg_val(id, vcpu->arch.bescr);
+   *val = get_reg_val(id, kvmppc_get_bescr(vcpu));
break;
case KVM_REG_PPC_IC:
-   *val = get_reg_val(id, vcpu->arch.ic);
+   *val = get_reg_val(id, kvmppc_get_ic(vcpu));
break;
default:
r = -EINVAL;
@@ -768,19 +768,19 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
kvmppc_set_fpscr(vcpu, set_reg_val(id, *val));
break;
case KVM_REG_PPC_TAR:
-   vcpu->arch.tar = set_reg_val(id, *val);
+   kvmppc_set_tar(vcpu, set_reg_v

[PATCH v4 05/11] KVM: PPC: Use accessors VCORE registers

2023-09-04 Thread Jordan Niethe

Introduce accessor generator macros for VCORE registers. Use the accessor
functions to replace direct accesses to this registers.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe 
---
v4:
  - Split to unique patch
  - Remove _hv suffix
  - Do not generate for setter arch_compat and lpcr
---
 arch/powerpc/include/asm/kvm_book3s.h | 25 -
 arch/powerpc/kvm/book3s_hv.c  | 24 
 arch/powerpc/kvm/book3s_hv_ras.c  |  4 ++--
 arch/powerpc/kvm/book3s_xive.c|  4 +---
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 1a220cd63227..4c6558d5fefe 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -483,6 +483,29 @@ KVMPPC_BOOK3S_VCPU_ACCESSOR(bescr, 64)
 KVMPPC_BOOK3S_VCPU_ACCESSOR(ic, 64)
 KVMPPC_BOOK3S_VCPU_ACCESSOR(vrsave, 64)
 
+
+#define KVMPPC_BOOK3S_VCORE_ACCESSOR_SET(reg, size)\
+static inline void kvmppc_set_##reg(struct kvm_vcpu *vcpu, u##size val)
\
+{  \
+   vcpu->arch.vcore->reg = val;\
+}
+
+#define KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(reg, size)\
+static inline u##size kvmppc_get_##reg(struct kvm_vcpu *vcpu)  \
+{  \
+   return vcpu->arch.vcore->reg;   \
+}
+
+#define KVMPPC_BOOK3S_VCORE_ACCESSOR(reg, size)
\
+   KVMPPC_BOOK3S_VCORE_ACCESSOR_SET(reg, size) \
+   KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(reg, size) \
+
+
+KVMPPC_BOOK3S_VCORE_ACCESSOR(vtb, 64)
+KVMPPC_BOOK3S_VCORE_ACCESSOR(tb_offset, 64)
+KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(arch_compat, 32)
+KVMPPC_BOOK3S_VCORE_ACCESSOR_GET(lpcr, 64)
+
 static inline u64 kvmppc_get_dec_expires(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.dec_expires;
@@ -496,7 +519,7 @@ static inline void kvmppc_set_dec_expires(struct kvm_vcpu 
*vcpu, u64 val)
 /* Expiry time of vcpu DEC relative to host TB */
 static inline u64 kvmppc_dec_expires_host_tb(struct kvm_vcpu *vcpu)
 {
-   return kvmppc_get_dec_expires(vcpu) - vcpu->arch.vcore->tb_offset;
+   return kvmppc_get_dec_expires(vcpu) - kvmppc_get_tb_offset(vcpu);
 }
 
 static inline bool is_kvmppc_resume_guest(int r)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 27faecad1e3b..73d9a9eb376f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -794,7 +794,7 @@ static void kvmppc_update_vpa_dispatch(struct kvm_vcpu 
*vcpu,
 
vpa->enqueue_dispatch_tb = 
cpu_to_be64(be64_to_cpu(vpa->enqueue_dispatch_tb) + stolen);
 
-   __kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + vc->tb_offset, 
stolen);
+   __kvmppc_create_dtl_entry(vcpu, vpa, vc->pcpu, now + 
kvmppc_get_tb_offset(vcpu), stolen);
 
vcpu->arch.vpa.dirty = true;
 }
@@ -845,9 +845,9 @@ static bool kvmppc_doorbell_pending(struct kvm_vcpu *vcpu)
 
 static bool kvmppc_power8_compatible(struct kvm_vcpu *vcpu)
 {
-   if (vcpu->arch.vcore->arch_compat >= PVR_ARCH_207)
+   if (kvmppc_get_arch_compat(vcpu) >= PVR_ARCH_207)
return true;
-   if ((!vcpu->arch.vcore->arch_compat) &&
+   if ((!kvmppc_get_arch_compat(vcpu)) &&
cpu_has_feature(CPU_FTR_ARCH_207S))
return true;
return false;
@@ -2283,7 +2283,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.vcore->dpdes);
break;
case KVM_REG_PPC_VTB:
-   *val = get_reg_val(id, vcpu->arch.vcore->vtb);
+   *val = get_reg_val(id, kvmppc_get_vtb(vcpu));
break;
case KVM_REG_PPC_DAWR:
*val = get_reg_val(id, vcpu->arch.dawr0);
@@ -2342,11 +2342,11 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
spin_unlock(&vcpu->arch.vpa_update_lock);
break;
case KVM_REG_PPC_TB_OFFSET:
-   *val = get_reg_val(id, vcpu->arch.vcore->tb_offset);
+   *val = get_reg_val(id, kvmppc_get_tb_offset(vcpu));
break;
case KVM_REG_PPC_LPCR:
case KVM_REG_PPC_LPCR_64:
-   *val = get_reg_val(id, vcpu->arch.vcore->lpcr);
+   *val = get_reg_val(id, kvmppc_get_lpcr(vcpu));
break;
case KVM_REG_PPC_PPR:
*val = get_reg_val(id, vcpu->arch.ppr);
@@ -2418,7 +2418,7 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
break;
 #endif
case KVM_REG_PPC_ARCH_COMPAT:
-   *val = get_reg_

[PATCH v4 06/11] KVM: PPC: Book3S HV: Use accessors for VCPU registers

2023-09-04 Thread Jordan Niethe

Introduce accessor generator macros for Book3S HV VCPU registers. Use
the accessor functions to replace direct accesses to this registers.

This will be important later for Nested APIv2 support which requires
additional functionality for accessing and modifying VCPU state.

Signed-off-by: Jordan Niethe 
---
v4:
  - Split to unique patch
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c |   5 +-
 arch/powerpc/kvm/book3s_hv.c   | 148 +
 arch/powerpc/kvm/book3s_hv.h   |  58 ++
 3 files changed, 139 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 5c71d6ae3a7b..ab646f59afd7 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include "book3s_hv.h"
 #include 
 #include 
 #include 
@@ -294,9 +295,9 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t 
eaddr,
} else {
if (!(pte & _PAGE_PRIVILEGED)) {
/* Check AMR/IAMR to see if strict mode is in force */
-   if (vcpu->arch.amr & (1ul << 62))
+   if (kvmppc_get_amr_hv(vcpu) & (1ul << 62))
gpte->may_read = 0;
-   if (vcpu->arch.amr & (1ul << 63))
+   if (kvmppc_get_amr_hv(vcpu) & (1ul << 63))
gpte->may_write = 0;
if (vcpu->arch.iamr & (1ul << 62))
gpte->may_execute = 0;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 73d9a9eb376f..fabe99af0e0b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -868,7 +868,7 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
/* Guests can't breakpoint the hypervisor */
if ((value1 & CIABR_PRIV) == CIABR_PRIV_HYPER)
return H_P3;
-   vcpu->arch.ciabr  = value1;
+   kvmppc_set_ciabr_hv(vcpu, value1);
return H_SUCCESS;
case H_SET_MODE_RESOURCE_SET_DAWR0:
if (!kvmppc_power8_compatible(vcpu))
@@ -879,8 +879,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_UNSUPPORTED_FLAG_START;
if (value2 & DABRX_HYP)
return H_P4;
-   vcpu->arch.dawr0  = value1;
-   vcpu->arch.dawrx0 = value2;
+   kvmppc_set_dawr0_hv(vcpu, value1);
+   kvmppc_set_dawrx0_hv(vcpu, value2);
return H_SUCCESS;
case H_SET_MODE_RESOURCE_SET_DAWR1:
if (!kvmppc_power8_compatible(vcpu))
@@ -895,8 +895,8 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, 
unsigned long mflags,
return H_UNSUPPORTED_FLAG_START;
if (value2 & DABRX_HYP)
return H_P4;
-   vcpu->arch.dawr1  = value1;
-   vcpu->arch.dawrx1 = value2;
+   kvmppc_set_dawr1_hv(vcpu, value1);
+   kvmppc_set_dawrx1_hv(vcpu, value2);
return H_SUCCESS;
case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
/*
@@ -1548,7 +1548,7 @@ static int kvmppc_pmu_unavailable(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.hfscr_permitted & HFSCR_PM))
return EMULATE_FAIL;
 
-   vcpu->arch.hfscr |= HFSCR_PM;
+   kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_PM);
 
return RESUME_GUEST;
 }
@@ -1558,7 +1558,7 @@ static int kvmppc_ebb_unavailable(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.hfscr_permitted & HFSCR_EBB))
return EMULATE_FAIL;
 
-   vcpu->arch.hfscr |= HFSCR_EBB;
+   kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_EBB);
 
return RESUME_GUEST;
 }
@@ -1568,7 +1568,7 @@ static int kvmppc_tm_unavailable(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.hfscr_permitted & HFSCR_TM))
return EMULATE_FAIL;
 
-   vcpu->arch.hfscr |= HFSCR_TM;
+   kvmppc_set_hfscr_hv(vcpu, kvmppc_get_hfscr_hv(vcpu) | HFSCR_TM);
 
return RESUME_GUEST;
 }
@@ -1867,7 +1867,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 * Otherwise, we just generate a program interrupt to the guest.
 */
case BOOK3S_INTERRUPT_H_FAC_UNAVAIL: {
-   u64 cause = vcpu->arch.hfscr >> 56;
+   u64 cause = kvmppc_get_hfscr_hv(vcpu) >> 56;
 
r = EMULATE_FAIL;
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
@@ -2211,64 +2211,64 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
u64 id,
*val = get_reg_val(id, vcpu->arch.dabrx);
break;
case KVM_REG_PPC_DSCR:
-   *val = get_reg_val(id, vcpu->arch.dscr);
+   *val = get_reg_val(id

[PATCH v4 07/11] KVM: PPC: Book3S HV: Introduce low level MSR accessor

2023-09-04 Thread Jordan Niethe

kvmppc_get_msr() and kvmppc_set_msr_fast() serve as accessors for the
MSR. However because the MSR is kept in the shared regs they include a
conditional check for kvmppc_shared_big_endian() and endian conversion.

Within the Book3S HV specific code there are direct reads and writes of
shregs::msr. In preparation for Nested APIv2 these accesses need to be
replaced with accessor functions so it is possible to extend their
behavior. However, using the kvmppc_get_msr() and kvmppc_set_msr_fast()
functions is undesirable because it would introduce a conditional branch
and endian conversion that is not currently present.

kvmppc_set_msr_hv() already exists, it is used for the
kvmppc_ops::set_msr callback.

Introduce a low level accessor __kvmppc_{s,g}et_msr_hv() that simply
gets and sets shregs::msr. This will be extend for Nested APIv2 support.

Signed-off-by: Jordan Niethe 
---
v4:
  - New to series
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  5 ++--
 arch/powerpc/kvm/book3s_hv.c | 34 ++--
 arch/powerpc/kvm/book3s_hv.h | 10 
 arch/powerpc/kvm/book3s_hv_builtin.c |  5 ++--
 4 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index efd0ebf70a5e..fdfc2a62dd67 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -28,6 +28,7 @@
 #include 
 
 #include "book3s.h"
+#include "book3s_hv.h"
 #include "trace_hv.h"
 
 //#define DEBUG_RESIZE_HPT 1
@@ -347,7 +348,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
unsigned long v, orig_v, gr;
__be64 *hptep;
long int index;
-   int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
+   int virtmode = __kvmppc_get_msr_hv(vcpu) & (data ? MSR_DR : MSR_IR);
 
if (kvm_is_radix(vcpu->kvm))
return kvmppc_mmu_radix_xlate(vcpu, eaddr, gpte, data, iswrite);
@@ -385,7 +386,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu 
*vcpu, gva_t eaddr,
 
/* Get PP bits and key for permission check */
pp = gr & (HPTE_R_PP0 | HPTE_R_PP);
-   key = (vcpu->arch.shregs.msr & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
+   key = (__kvmppc_get_msr_hv(vcpu) & MSR_PR) ? SLB_VSID_KP : SLB_VSID_KS;
key &= slb_v;
 
/* Calculate permissions */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fabe99af0e0b..d4db8192753b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1374,7 +1374,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
  */
 static void kvmppc_cede(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.shregs.msr |= MSR_EE;
+   __kvmppc_set_msr_hv(vcpu, __kvmppc_get_msr_hv(vcpu) | MSR_EE);
vcpu->arch.ceded = 1;
smp_mb();
if (vcpu->arch.prodded) {
@@ -1589,7 +1589,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 * That can happen due to a bug, or due to a machine check
 * occurring at just the wrong time.
 */
-   if (vcpu->arch.shregs.msr & MSR_HV) {
+   if (__kvmppc_get_msr_hv(vcpu) & MSR_HV) {
printk(KERN_EMERG "KVM trap in HV mode!\n");
printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n",
vcpu->arch.trap, kvmppc_get_pc(vcpu),
@@ -1640,7 +1640,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 * so that it knows that the machine check occurred.
 */
if (!vcpu->kvm->arch.fwnmi_enabled) {
-   ulong flags = (vcpu->arch.shregs.msr & 0x083c) |
+   ulong flags = (__kvmppc_get_msr_hv(vcpu) & 0x083c) |
(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
kvmppc_core_queue_machine_check(vcpu, flags);
r = RESUME_GUEST;
@@ -1670,7 +1670,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 * as a result of a hypervisor emulation interrupt
 * (e40) getting turned into a 700 by BML RTAS.
 */
-   flags = (vcpu->arch.shregs.msr & 0x1full) |
+   flags = (__kvmppc_get_msr_hv(vcpu) & 0x1full) |
(kvmppc_get_msr(vcpu) & SRR1_PREFIXED);
kvmppc_core_queue_program(vcpu, flags);
r = RESUME_GUEST;
@@ -1680,7 +1680,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
{
int i;
 
-   if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
+   if (unlikely(__kvmppc_get_msr_hv(vcpu) & MSR_PR)) {
/*
 * Guest userspace executed sc 1. This can only be
 * reached by the P9 path because the old path
@@ -1758,7 +1758,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,

[PATCH v4 08/11] KVM: PPC: Add helper library for Guest State Buffers

2023-09-04 Thread Jordan Niethe

The PAPR "Nestedv2" guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request the L0 perform the modification. If the
nested host needs to read L2 state likewise this request must
go through the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns all relevant partition state a unique
identity. Unlike a typical TLV format the length is redundant as the
length of each identity is fixed but is included for checking
correctness.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  ++-
  | Element Count  | Elements...
  ++-

  Element:

   <2 bytes---> <-2 bytes-> <-Length bytes->
  ++---++
  | Guest State ID |  Length   |  Data  |
  ++---++

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the nestedv2 guest support.

Signed-off-by: Jordan Niethe 
---
v2:
  - Add missing #ifdef CONFIG_VSXs
  - Move files from lib/ to kvm/
  - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
  - Use kunit for guest state buffer tests
  - Add configuration option for the tests
  - Use macros for contiguous id ranges like GPRs
  - Add some missing EXPORTs to functions
  - HEIR element is a double word not a word
v3:
  - Use EXPORT_SYMBOL_GPL()
  - Use the kvmppc namespace
  - Move kvmppc_gsb_reset() out of kvmppc_gsm_fill_info()
  - Comments for GSID elements
  - Pass vector elements by reference
  - Remove generic put and get functions
---
 arch/powerpc/Kconfig.debug|  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 904 ++
 arch/powerpc/kvm/Makefile |   3 +
 arch/powerpc/kvm/guest-state-buffer.c | 571 +++
 arch/powerpc/kvm/test-guest-state-buffer.c| 328 +++
 5 files changed, 1818 insertions(+)
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 2a54fadbeaf5..339c3a5f56f1 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
bool "Run self-tests of the MSI bitmap code"
depends on DEBUG_KERNEL
 
+config GUEST_STATE_BUFFER_TEST
+   def_tristate n
+   prompt "Enable Guest State Buffer unit tests"
+   depends on KUNIT
+   depends on KVM_BOOK3S_HV_POSSIBLE
+   default KUNIT_ALL_TESTS
+   help
+ The Guest State Buffer is a data format specified in the PAPR.
+ It is by hcalls to communicate the state of L2 guests between
+ the L1 and L0 hypervisors. Enable unit tests for the library
+ used to create and use guest state buffers.
+
 config PPC_IRQ_SOFT_MASK_DEBUG
bool "Include extra checks for powerpc irq soft masking"
depends on PPC64
diff --git a/arch/powerpc/include/asm/guest-state-buffer.h 
b/arch/powerpc/include/asm/guest-state-buffer.h
new file mode 100644
index ..aaefe1075fc4
--- /dev/null
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -0,0 +1,904 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Interface based on include/net/netlink.h
+ */
+#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
+#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
+
+#include 
+#include 
+#include 
+
+/**
+ * Guest State Buffer Constants
+ **/
+/* Element without a value and any length */
+#define KVMPPC_GSID_BLANK  0x
+/* Size required for the L0's internal VCPU representation */
+#define KVMPPC_GSID_HOST_STATE_SIZE0x0001
+ /* Minimum size for the H_GUEST_RUN_VCPU output buffer */
+#define KVMPPC_GSID_RUN_OUTPUT_MIN_SIZE0x0002
+ /* "Logical" PVR value as defined in the PAPR */
+#define KVMPPC_GSID_LOGICAL_PVR0x0003
+ /* L0 relative timebase offset */
+#define KVMPPC_GSID_TB_OFFSET  0x0004
+ /* Partition Scoped Page Table Info */
+#define KVMPPC_GSID_PARTITION_TABLE0x0005
+ /* Process Table Info */
+#define KVMPPC_GSID_PROCESS_TABLE

[PATCH v4 09/11] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

2023-09-04 Thread Jordan Niethe

The LPID register is 32 bits long. The host keeps the lpids for each
guest in an unsigned word struct kvm_arch. Currently, LPIDs are already
limited by mmu_lpid_bits and KVM_MAX_NESTED_GUESTS_SHIFT.

The nestedv2 API returns a 64 bit "Guest ID" to be used be the L1 host
for each L2 guest. This value is used as an lpid, e.g. it is the
parameter used by H_RPT_INVALIDATE. To minimize needless special casing
it makes sense to keep this "Guest ID" in struct kvm_arch::lpid.

This means that struct kvm_arch::lpid is too small so prepare for this
and make it an unsigned long. This is not a problem for the KVM-HV and
nestedv1 cases as their lpid values are already limited to valid ranges
so in those contexts the lpid can be used as an unsigned word safely as
needed.

In the PAPR, the H_RPT_INVALIDATE pid/lpid parameter is already
specified as an unsigned long so change pseries_rpt_invalidate() to
match that.  Update the callers of pseries_rpt_invalidate() to also take
an unsigned long if they take an lpid value.

Signed-off-by: Jordan Niethe 
---
v3:
  - New to series
v4:
  - Use u64
  - Change format strings instead of casting
---
 arch/powerpc/include/asm/kvm_book3s.h | 10 +-
 arch/powerpc/include/asm/kvm_book3s_64.h  |  2 +-
 arch/powerpc/include/asm/kvm_host.h   |  2 +-
 arch/powerpc/include/asm/plpar_wrappers.h |  4 ++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c| 22 +++---
 arch/powerpc/kvm/book3s_hv_nested.c   |  4 ++--
 arch/powerpc/kvm/book3s_hv_uvmem.c|  2 +-
 arch/powerpc/kvm/book3s_xive.c|  4 ++--
 9 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 4c6558d5fefe..831c23e4f121 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -191,14 +191,14 @@ extern int kvmppc_mmu_radix_translate_table(struct 
kvm_vcpu *vcpu, gva_t eaddr,
 extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
struct kvmppc_pte *gpte, bool data, bool iswrite);
 extern void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
-   unsigned int pshift, unsigned int lpid);
+   unsigned int pshift, u64 lpid);
 extern void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa,
unsigned int shift,
const struct kvm_memory_slot *memslot,
-   unsigned int lpid);
+   u64 lpid);
 extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested,
bool writing, unsigned long gpa,
-   unsigned int lpid);
+   u64 lpid);
 extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
unsigned long gpa,
struct kvm_memory_slot *memslot,
@@ -207,7 +207,7 @@ extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu 
*vcpu,
 extern int kvmppc_init_vm_radix(struct kvm *kvm);
 extern void kvmppc_free_radix(struct kvm *kvm);
 extern void kvmppc_free_pgtable_radix(struct kvm *kvm, pgd_t *pgd,
- unsigned int lpid);
+ u64 lpid);
 extern int kvmppc_radix_init(void);
 extern void kvmppc_radix_exit(void);
 extern void kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot,
@@ -300,7 +300,7 @@ void kvmhv_nested_exit(void);
 void kvmhv_vm_nested_init(struct kvm *kvm);
 long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
 long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
-void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
+void kvmhv_set_ptbl_entry(u64 lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index d49065af08e9..572f9bbf1a25 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -624,7 +624,7 @@ static inline void copy_to_checkpoint(struct kvm_vcpu *vcpu)
 
 extern int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, pte_t pte,
 unsigned long gpa, unsigned int level,
-unsigned long mmu_seq, unsigned int lpid,
+unsigned long mmu_seq, u64 lpid,
 unsigned long *rmapp, struct rmap_nested **n_rmap);
 extern void kvmhv_insert_nest_rmap(struct kvm *kvm, unsigned long *rmapp,
   struct rmap_nested **n_rmap);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 14ee0dece853..429b53bc1773 100

[PATCH v4 10/11] KVM: PPC: Add support for nestedv2 guests

2023-09-04 Thread Jordan Niethe

A series of hcalls have been added to the PAPR which allow a regular
guest partition to create and manage guest partitions of its own. KVM
already had an interface that allowed this on powernv platforms. This
existing interface will now be called "nestedv1". The newly added PAPR
interface will be called "nestedv2".  PHYP will support the nestedv2
interface. At this time the host side of the nestedv2 interface has not
been implemented on powernv but there is no technical reason why it
could not be added.

The nestedv1 interface is still supported.

Add support to KVM to utilize these hcalls to enable running nested
guests as a pseries guest on PHYP.

Overview of the new hcall usage:

- L1 and L0 negotiate capabilities with
  H_GUEST_{G,S}ET_CAPABILITIES()

- L1 requests the L0 create a L2 with
  H_GUEST_CREATE() and receives a handle to use in future hcalls

- L1 requests the L0 create a L2 vCPU with
  H_GUEST_CREATE_VCPU()

- L1 sets up the L2 using H_GUEST_SET and the
  H_GUEST_VCPU_RUN input buffer

- L1 requests the L0 runs the L2 vCPU using H_GUEST_VCPU_RUN()

- L2 returns to L1 with an exit reason and L1 reads the
  H_GUEST_VCPU_RUN output buffer populated by the L0

- L1 handles the exit using H_GET_STATE if necessary

- L1 reruns L2 vCPU with H_GUEST_VCPU_RUN

- L1 frees the L2 in the L0 with H_GUEST_DELETE()

Support for the new API is determined by trying
H_GUEST_GET_CAPABILITIES. On a successful return, use the nestedv2
interface.

Use the vcpu register state setters for tracking modified guest state
elements and copy the thread wide values into the H_GUEST_VCPU_RUN input
buffer immediately before running a L2. The guest wide
elements can not be added to the input buffer so send them with a
separate H_GUEST_SET call if necessary.

Make the vcpu register getter load the corresponding value from the real
host with H_GUEST_GET. To avoid unnecessarily calling H_GUEST_GET, track
which values have already been loaded between H_GUEST_VCPU_RUN calls. If
an element is present in the H_GUEST_VCPU_RUN output buffer it also does
not need to be loaded again.

Signed-off-by: Vaibhav Jain 
Signed-off-by: Gautam Menghani 
Signed-off-by: Kautuk Consul 
Signed-off-by: Amit Machhiwal 
Signed-off-by: Jordan Niethe 
---
v2:
  - Declare op structs as static
  - Guatam: Use expressions in switch case with local variables
  - Do not use the PVR for the LOGICAL PVR ID
  - Kautuk: Handle emul_inst as now a double word, init correctly
  - Use new GPR(), etc macros
  - Amit: Determine PAPR nested capabilities from cpu features
v3:
  - Use EXPORT_SYMBOL_GPL()
  - Change to kvmhv_nestedv2 namespace
  - Make kvmhv_enable_nested() return -ENODEV on NESTEDv2 L1 hosts
  - s/kvmhv_on_papr/kvmhv_is_nestedv2/
  - mv book3s_hv_papr.c book3s_hv_nestedv2.c
  - Handle shared regs without a guest state id in the same wrapper
  - Vaibhav: Use a static key for API version
  - Add a positive test for NESTEDv1
  - Give the amor a static value
  - s/struct kvmhv_nestedv2_host/struct kvmhv_nestedv2_io/
  - Propagate failure in kvmhv_vcpu_entry_nestedv2()
  - WARN if getters and setters fail
  - Progagate failure from kvmhv_nestedv2_parse_output()
  - Replace delay with sleep in plpar_guest_{create,delete,create_vcpu}()
  - Amit: Add logical PVR handling
  - Replace kvmppc_gse_{get,put} with specific version
v4:
  - Batch H_GUEST_GET calls in kvmhv_nestedv2_reload_ptregs()
  - Fix compile without CONFIG_PSERIES
  - Fix maybe uninitialized trap in kvmhv_p9_guest_entry()
  - Extend existing setters for arch_compat and lpcr
---
 arch/powerpc/include/asm/guest-state-buffer.h |  91 ++
 arch/powerpc/include/asm/hvcall.h |  30 +
 arch/powerpc/include/asm/kvm_book3s.h | 137 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h  |   6 +
 arch/powerpc/include/asm/kvm_host.h   |  20 +
 arch/powerpc/include/asm/kvm_ppc.h|  90 +-
 arch/powerpc/include/asm/plpar_wrappers.h | 244 +
 arch/powerpc/kvm/Makefile |   1 +
 arch/powerpc/kvm/book3s_hv.c  | 134 ++-
 arch/powerpc/kvm/book3s_hv.h  |  80 +-
 arch/powerpc/kvm/book3s_hv_nested.c   |  38 +-
 arch/powerpc/kvm/book3s_hv_nestedv2.c | 998 ++
 arch/powerpc/kvm/emulate_loadstore.c  |   4 +-
 arch/powerpc/kvm/guest-state-buffer.c |  50 +
 14 files changed, 1826 insertions(+), 97 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_nestedv2.c

diff --git a/arch/powerpc/include/asm/guest-state-buffer.h 
b/arch/powerpc/include/asm/guest-state-buffer.h
index aaefe1075fc4..808149f31576 100644
--- a/arch/powerpc/include/asm/guest-state-buffer.h
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -5,6 +5,7 @@
 #ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
 #define _ASM_POWERPC_GUEST_STATE_BUFFER_H
 
+#include "asm/hvcall.h"
 #include 
 #include 
 #include 
@@ -313,6 +314,8 @@ struct kvmppc_gs_buff *kvmppc_gsb_new(size_t size, unsigned 
long guest_id,

[PATCH v4 11/11] docs: powerpc: Document nested KVM on POWER

2023-09-04 Thread Jordan Niethe

From: Michael Neuling 

Document support for nested KVM on POWER using the existing API as well
as the new PAPR API. This includes the new HCALL interface and how it
used by KVM.

Signed-off-by: Michael Neuling 
Signed-off-by: Jordan Niethe 
---
v2:
  - Separated into individual patch
v3:
  - Fix typos
---
 Documentation/powerpc/index.rst  |   1 +
 Documentation/powerpc/kvm-nested.rst | 636 +++
 2 files changed, 637 insertions(+)
 create mode 100644 Documentation/powerpc/kvm-nested.rst

diff --git a/Documentation/powerpc/index.rst b/Documentation/powerpc/index.rst
index d33b554ca7ba..23e449994c2a 100644
--- a/Documentation/powerpc/index.rst
+++ b/Documentation/powerpc/index.rst
@@ -26,6 +26,7 @@ powerpc
 isa-versions
 kaslr-booke32
 mpc52xx
+kvm-nested
 papr_hcalls
 pci_iov_resource_on_powernv
 pmu-ebb
diff --git a/Documentation/powerpc/kvm-nested.rst 
b/Documentation/powerpc/kvm-nested.rst
new file mode 100644
index ..8b37981dc3d9
--- /dev/null
+++ b/Documentation/powerpc/kvm-nested.rst
@@ -0,0 +1,636 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+Nested KVM on POWER
+
+
+Introduction
+
+
+This document explains how a guest operating system can act as a
+hypervisor and run nested guests through the use of hypercalls, if the
+hypervisor has implemented them. The terms L0, L1, and L2 are used to
+refer to different software entities. L0 is the hypervisor mode entity
+that would normally be called the "host" or "hypervisor". L1 is a
+guest virtual machine that is directly run under L0 and is initiated
+and controlled by L0. L2 is a guest virtual machine that is initiated
+and controlled by L1 acting as a hypervisor.
+
+Existing API
+
+
+Linux/KVM has had support for Nesting as an L0 or L1 since 2018
+
+The L0 code was added::
+
+   commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:03 2018 +1100
+   KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
+
+The L1 code was added::
+
+   commit 360cae313702cdd0b90f82c261a8302fecef030a
+   Author: Paul Mackerras 
+   Date:   Mon Oct 8 16:31:04 2018 +1100
+   KVM: PPC: Book3S HV: Nested guest entry via hypercall
+
+This API works primarily using a single hcall h_enter_nested(). This
+call made by the L1 to tell the L0 to start an L2 vCPU with the given
+state. The L0 then starts this L2 and runs until an L2 exit condition
+is reached. Once the L2 exits, the state of the L2 is given back to
+the L1 by the L0. The full L2 vCPU state is always transferred from
+and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
+vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
+-> L1 exit).
+
+The only state kept by the L0 is the partition table. The L1 registers
+it's partition table using the h_set_partition_table() hcall. All
+other state held by the L0 about the L2s is cached state (such as
+shadow page tables).
+
+The L1 may run any L2 or vCPU without first informing the L0. It
+simply starts the vCPU using h_enter_nested(). The creation of L2s and
+vCPUs is done implicitly whenever h_enter_nested() is called.
+
+In this document, we call this existing API the v1 API.
+
+New PAPR API
+===
+
+The new PAPR API changes from the v1 API such that the creating L2 and
+associated vCPUs is explicit. In this document, we call this the v2
+API.
+
+h_enter_nested() is replaced with H_GUEST_VCPU_RUN().  Before this can
+be called the L1 must explicitly create the L2 using h_guest_create()
+and any associated vCPUs() created with h_guest_create_vCPU(). Getting
+and setting vCPU state can also be performed using h_guest_{g|s}et
+hcall.
+
+The basic execution flow is for an L1 to create an L2, run it, and
+delete it is:
+
+- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
+  (normally at L1 boot time).
+
+- L1 requests the L0 create an L2 with H_GUEST_CREATE() and receives a token
+
+- L1 requests the L0 create an L2 vCPU with H_GUEST_CREATE_VCPU()
+
+- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
+
+- L1 requests the L0 runs the vCPU running H_GUEST_VCPU_RUN() hcall
+
+- L1 deletes L2 with H_GUEST_DELETE()
+
+More details of the individual hcalls follows:
+
+HCALL Details
+=
+
+This documentation is provided to give an overall understating of the
+API. It doesn't aim to provide all the details required to implement
+an L1 or L0. Latest version of PAPR can be referred to for more details.
+
+All these HCALLs are made by the L1 to the L0.
+
+H_GUEST_GET_CAPABILITIES()
+--
+
+This is called to get the capabilities of the L0 nested
+hypervisor. This includes capabilities such the CPU versions (eg
+POWER9, POWER10) that are supported as L2s::
+
+  H_GUEST_GET_CAPABILITIES(uint64 flags)
+
+  Parameters:
+Input:
+  flags: Reserv

Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()

2023-09-04 Thread Christophe Leroy

Le 05/09/2023 à 04:36, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> Commit 45201c879469 ("powerpc/nohash: Remove hash related code from
>> nohash headers.") replaced:
>>
>>if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
>>  return 0;
>>
>> By:
>>
>>if (pte_young(*ptep))
>>  return 0;
>>
>> But it should be:
>>
>>if (!pte_young(*ptep))
>>  return 0;
> 
> 
> That seems bad :)
> 
> But I don't know off the top of my head where
> __ptep_test_and_clear_young() is used, and so what the symptoms could
> be. Presumably nothing too bad or someone would have noticed?
> 

The two uses in mm/vmscan.c are as follows:

if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
VM_WARN_ON_ONCE(true);

So it seems to be expected to never happen.

The only useful place it is used seems to be folio_check_references() 
which is part of the reclaim process. So probably it messes up swap, but 
to what extent ? ppc64e is for embedded systems. Do embedded systems 
have swap at all ?

Also surprising that it is also called from mm/debug_vm_pgtable.c so 
shouldn't we have noticed earlier ? I'll check if it works.

Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

2023-09-04 Thread Baoquan He

Hi Hari, Michael

On 08/11/23 at 01:23pm, Hari Bathini wrote:
> 
> 
> On 10/08/23 8:31 am, Baoquan He wrote:
> > On 08/10/23 at 10:06am, Ming Lei wrote:
> > > On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > > > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > > > I'm starting to sound like a broken record, but we can't just do 
> > > > > > random
> > > > > > is_kdump checks, and it's not going to get better by resending it 
> > > > > > again and
> > > > > > again.  If kdump kernels limit the number of possible CPUs, it 
> > > > > > needs to
> > > > > > reflected in cpu_possible_map and we need to use that information.
> > > > > > 
> > > > > 
> > > > > Can you look at previous kdump/arch guys' comment about kdump usage &
> > > > > num_possible_cpus?
> > > > > 
> > > > >  
> > > > > https://lore.kernel.org/linux-block/caf+s44ruqswbosy9kmdx35crviqnxoeuvgnsue75bb0y2jg...@mail.gmail.com/
> > > > >  
> > > > > https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > > > 
> > > > > The point is that kdump kernels does not limit the number of possible 
> > > > > CPUs.
> > > > > 
> > > > > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, 
> > > > > since
> > > > > num_possible_cpus becomes 1.
> > > > 
> > > > Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> > > > limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
> > > > number which can be brought up during bootup. We noticed this diference
> > > > because a large number of possible cpus will cost more memory in kdump
> > > > kernel. e.g percpu initialization, even though kdump kernel have set
> > > > "maxcpus=1".
> > > > 
> > > > Currently x86 and arm64 all support "nr_cpus=". Pingfan ever spent much
> > > > effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> > > > dev and maintainers do not care about it. Finally the patches are not
> > > > accepted, and the work is not continued.
> > > > 
> > > > Now, I am wondering what is the barrier to add "nr_cpus=" to power ach.
> > > > Can we reconsider adding 'nr_cpus=' to power arch since real issue
> > > > occurred in kdump kernel?
> > > 
> > > If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> > > 
> > > > 
> > > > As for this patchset, it can be accpeted so that no failure in kdump
> > > > kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
> > > 
> > > IMO 'nr_cpus=' support should be preferred, given it is annoying to
> > > maintain two kinds of implementation for kdump kernel from driver
> > > viewpoint. I guess kdump things can be simplified too with supporting
> > > 'nr_cpus=' only.
> > 
> > Yes, 'nr_cpus=' is ideal. Not sure if there's some underlying concerns so
> > that power people decided to not support it.
> 
> Though "nr_cpus=1" is an ideal solution, maintainer was not happy with
> the patch as the code changes have impact for regular boot path and
> it is likely to cause breakages. So, even if "nr_cpus=1" support for
> ppc64 is revived, the change is going to take time to be accepted
> upstream.

I talked to pingfan recently, he said he posted patches to add 'nr_cpus='
support in powerpc in order to reduce memory amount for kdump kernel.
His patches were rejected by maintainer because maintainer thought the
reason is not sufficient. So up to now, in architectures fedora/RHEL
supports to provide default crashkernel reservation value, powerpc costs
most. Now with this emerging issue, can we reconsider supporting
'nr_cpus=' in powerpc?

> 
> Also, I see is_kdump_kernel() being used irrespective of "nr_cpus=1"
> support for other optimizations in the driver for the special dump
> capture environment kdump is.
> 
> If there is no other downside for driver code, to use is_kdump_kernel(),
> other than the maintainability aspect, I think the above changes are
> worth considering.

Hi Hari,

By the way, will you use the ppc specific is_kdump_kernel() and
is_crashdump_kernel() in your patches to fix this issue?

Thanks
Baoquan

Re: [PATCH] powerpc/smp: Dynamically build powerpc topology

2023-09-04 Thread Srikar Dronamraju

* Peter Zijlstra  [2023-09-05 00:10:04]:

> On Wed, Aug 30, 2023 at 05:56:14PM +0530, Srikar Dronamraju wrote:
> > Currently there are four powerpc specific sched topologies.  These are
> > all statically defined.  However not all these topologies are used by
> > all powerpc systems.
> > 
> > To avoid unnecessary degenerations by the scheduler , masks and flags
> > are compared. However if the sched topologies are build dynamically then
> > the code is simpler and there are greater chances of avoiding
> > degenerations.
> > 
> > Even x86 builds its sched topologies dynamically and new changes are
> > very similar to the way x86 is building its topologies.
> 

Thanks Peter for taking a look.

> You're not stating it explicitly, but you're doing this as a performance
> optimization, right? The x86 thing didn't particularly care about
> avoiding degenerate topologies -- it's just that the fixed tables method
> grew unwieldy due to combinatorics.
> 

Yes, its an optimization. On Powerpc, there is an utility ppc64, which users
would use to set their SMT mode, and whenever they do we end up recreating
the topology. Hence avoiding degenerates esp on large systems, should help.

Also dynamic add of CPUs is more common on Powerpc. Hence there also we
would avoid degenerating unnecessary domains.

> And how does this patch relate to the other series touching this?
> 
>   powerpc/smp: Shared processor sched optimizations
> 

This patch will work independent of that patchset.
However Shared processor sched optimization patchset makes MC domain
avoid degeneration. Hence this patch will benefit from that patchset.

i.e without the Shared processor sched patchset, has_coregroup_support()
will return true on Power10 for even shared processor. And hence the
scheduler will create and destroy MC domains. If the patchset is already
present, on Power10 for shared processors, we will avoid MC domains.
Other that this there wont be any change.

-- 
Thanks and Regards
Srikar Dronamraju

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

[PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Re: KASAN debug kernel fails to boot at early stage when CONFIG_SMP=y is set (kernel 6.5-rc5, PowerMac G4 3,6)

Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

Re: [PATCH] powerpc/smp: Dynamically build powerpc topology

Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

Re: [PATCH 0/4] ppc, fbdev: Clean up fbdev mmap helper

[PATCH v4 00/11] KVM: PPC: Nested APIv2 guest support

[PATCH v4 01/11] KVM: PPC: Always use the GPR accessors

[PATCH v4 02/11] KVM: PPC: Introduce FPR/VR accessor functions

[PATCH v4 03/11] KVM: PPC: Rename accessor generator macros

[PATCH v4 04/11] KVM: PPC: Use accessors for VCPU registers

[PATCH v4 05/11] KVM: PPC: Use accessors VCORE registers

[PATCH v4 06/11] KVM: PPC: Book3S HV: Use accessors for VCPU registers

[PATCH v4 07/11] KVM: PPC: Book3S HV: Introduce low level MSR accessor

[PATCH v4 08/11] KVM: PPC: Add helper library for Guest State Buffers

[PATCH v4 09/11] KVM: PPC: Book3s HV: Hold LPIDs in an unsigned long

[PATCH v4 10/11] KVM: PPC: Add support for nestedv2 guests

[PATCH v4 11/11] docs: powerpc: Document nested KVM on POWER

Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()

Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

Re: [PATCH] powerpc/smp: Dynamically build powerpc topology

33 matches

Site Navigation

Mail list logo

Footer information