Re: [PATCH v4 4/4] pseries/mobility: set NMI watchdog factor during LPM

2022-07-13 Thread Laurent Dufour
Le 12/07/2022 à 18:25, Randy Dunlap a écrit :
> Hi--
> 
> On 7/12/22 07:32, Laurent Dufour wrote:
>> During a LPM, while the memory transfer is in progress on the arrival side,
>> some latencies is generated when accessing not yet transferred pages on the
> 
>  are
> 
>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
> 
> an NMI
> 
>> leading to a kernel panic.
>>
>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>> strong work around, some users would want this timeout to be eventually
>> triggered if the system is hanging even during LPM.
>>
>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
> 
>   an LPM.the CPU is
> 
>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>  watchdog_tresh + factor%
> 
>watchdog_thresh
> 
>>
>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
> 
> watchdog_thresh
> 
>> Once the memory transfer is achieved, the factor is reset to 0.
>>
>> Setting this value to a high number is like disabling the NMI watchdog
>> during a LPM.
> 
>  an LPM.
> 
>>
>> Reviewed-by: Nicholas Piggin 
>> Signed-off-by: Laurent Dufour 
>> ---
>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++
>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +
>>  2 files changed, 55 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
>> b/Documentation/admin-guide/sysctl/kernel.rst
>> index ddccd1077462..0bb0b7f27e96 100644
>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>  Documentation/admin-guide/kernel-parameters.rst).
>>  
> 
> This entire block should be in kernel-parameters.txt, not .rst,
> and it should be formatted like everything else in the .txt file.

Thanks for reviewing this patch.

I'll apply your requests in the next version.

However, regarding the change in kernel-parameters.txt, I'm confused. The
newly introduced parameter is only exposed through sysctl. Not as a kernel
boot option. In that case, should it be mentioned in kernel-parameters.txt?

Documentation/process/4.Coding.rst says:
The file :ref:`Documentation/admin-guide/kernel-parameters.rst`
describes all of the kernel's boot-time parameters.
Any patch which adds new parameters should add the appropriate entries to
this file.

And Documentation/process/submit-checklist.rst says:
16) All new kernel boot parameters are documented in
``Documentation/admin-guide/kernel-parameters.rst``.

What are the rules about editing .txt or .rst files?

>>  
>> +nmi_watchdog_factor (PPC only)
>> +==
>> +
>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
> 
>Factor to apply to the NMI
> 
>> +set to 1). This factor represents the percentage added to
>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
> 
>  during an
> 
>> +LPM. The soft lockup timeout is not impacted.
>> +
>> +A value of 0 means no change. The default value is 200 meaning the NMI
>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>> +
>> +
>>  numa_balancing
>>  ==
>>  
> 
> 



[PATCH] powerpc: dts: turris1x.dts: Add CPLD reboot node

2022-07-13 Thread Pali Rohár
CPLD firmware can reset board by writing value 0x01 at CPLD memory offset
0x0d. Define syscon-reboot node for this reset support.

Fixes: 54c15ec3b738 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár 
---
 arch/powerpc/boot/dts/turris1x.dts | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/dts/turris1x.dts 
b/arch/powerpc/boot/dts/turris1x.dts
index c76b628cf026..12e08271e61f 100644
--- a/arch/powerpc/boot/dts/turris1x.dts
+++ b/arch/powerpc/boot/dts/turris1x.dts
@@ -332,7 +332,7 @@
 * Turris CPLD firmware is open source and available at:
 * 
https://gitlab.nic.cz/turris/hw/turris_cpld/-/blob/master/CZ_NIC_Router_CPLD.v
 */
-   compatible = "cznic,turris1x-cpld", 
"fsl,p1021rdb-pc-cpld", "simple-bus";
+   compatible = "cznic,turris1x-cpld", 
"fsl,p1021rdb-pc-cpld", "simple-bus", "syscon";
reg = <0x3 0x0 0x30>;
#address-cells = <1>;
#size-cells = <1>;
@@ -352,6 +352,14 @@
gpios = <&gpio 11 GPIO_ACTIVE_LOW>;
};
 
+   reboot@d {
+   compatible = "syscon-reboot";
+   reg = <0x0d 0x01>;
+   offset = <0x0d>;
+   mask = <0x01>;
+   value = <0x01>;
+   };
+
led-controller@13 {
/*
 * LEDs are controlled by CPLD firmware.
-- 
2.20.1



Re: [PATCH] macintosh:fix oob read in do_adb_query function

2022-07-13 Thread Greg KH
On Wed, Jul 13, 2022 at 09:40:37PM +0800, NAME wrote:
> From: sohu0106 

For obvious reasons, we need a real name here, and in the signed-off-by
line.

> In do_adb_query function of drivers/macintosh/adb.c,
> req->data is copy form userland. The parameter
> "req->data[2]" is Missing check, the array size of
> adb_handler[] is 16, so "adb_handler[req->data[2]].
> original_address" and "adb_handler[req->data[2]].
> handler_id" will lead to oob read.

You can use all 72 columns, if you want to re-wrap these lines when you
resend.

> 
> Signed-off-by: sohu0106 
> ---
>  drivers/macintosh/adb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
> index 439fab4eaa85..1bbb9ca08d40 100644
> --- a/drivers/macintosh/adb.c
> +++ b/drivers/macintosh/adb.c
> @@ -647,7 +647,7 @@ do_adb_query(struct adb_request *req)
>  
>   switch(req->data[1]) {
>   case ADB_QUERY_GETDEVINFO:
> - if (req->nbytes < 3)
> + if (req->nbytes < 3 || req->data[2] >= 16)

Shouldn't 16 be the array size instead of having this hard coded to a
magic number?

Something like "sizeof(adb_handler) / sizeof(struct adb_handler)"?

Maybe not, that's messy, your choice.

thanks,

greg k-h


[PATCH] macintosh:fix oob read in do_adb_query function

2022-07-13 Thread NAME
From: sohu0106 

In the do_adb_query function of drivers/macintosh/adb.c,
req->data is copied from userland. The parameter
"req->data[2]" is missing a bounds check. The array size of
adb_handler[] is 16, so "adb_handler[req->data[2]].
original_address" and "adb_handler[req->data[2]].
handler_id" will lead to an out-of-bounds read.

Signed-off-by: sohu0106 
---
 drivers/macintosh/adb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
index 439fab4eaa85..1bbb9ca08d40 100644
--- a/drivers/macintosh/adb.c
+++ b/drivers/macintosh/adb.c
@@ -647,7 +647,7 @@ do_adb_query(struct adb_request *req)
 
switch(req->data[1]) {
case ADB_QUERY_GETDEVINFO:
-   if (req->nbytes < 3)
+   if (req->nbytes < 3 || req->data[2] >= 16)
break;
mutex_lock(&adb_handler_mutex);
req->reply[0] = adb_handler[req->data[2]].original_address;
-- 
2.25.1



Re: [PATCH v3 4/4] pseries/mobility: set NMI watchdog factor during LPM

2022-07-13 Thread Laurent Dufour
Le 12/07/2022 à 11:47, Laurent Dufour a écrit :
> Le 12/07/2022 à 03:46, Nicholas Piggin a écrit :
>> Excerpts from Laurent Dufour's message of June 27, 2022 11:53 pm:
>>> During a LPM, while the memory transfer is in progress on the arrival side,
>>> some latencies is generated when accessing not yet transferred pages on the
>>> arrival side. Thus, the NMI watchdog may be triggered too frequently, which
>>> increases the risk to hit a NMI interrupt in a bad place in the kernel,
>>> leading to a kernel panic.
>>>
>>> Disabling the Hard Lockup Watchdog until the memory transfer could be a too
>>> strong work around, some users would want this timeout to be eventually
>>> triggered if the system is hanging even during LPM.
>>>
>>> Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
>>> a factor to the NMI watchdog timeout during a LPM. Just before the CPU are
>>> stopped for the switchover sequence, the NMI watchdog timer is set to
>>>  watchdog_tresh + factor%
>>>
>>> A value of 0 has no effect. The default value is 200, meaning that the NMI
>>> watchdog is set to 30s during LPM (based on a 10s watchdog_tresh value).
>>> Once the memory transfer is achieved, the factor is reset to 0.
>>>
>>> Setting this value to a high number is like disabling the NMI watchdog
>>> during a LPM.
>>>
>>> Signed-off-by: Laurent Dufour 
>>> ---
>>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++
>>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +
>>>  2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
>>> b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>>  Documentation/admin-guide/kernel-parameters.rst).
>>>  
>>>  
>>> +nmi_watchdog_factor (PPC only)
>>> +==
>>> +
>>> +Factor apply to to the NMI watchdog timeout (only when ``nmi_watchdog`` is
>>> +set to 1). This factor represents the percentage added to
>>> +``watchdog_thresh`` when calculating the NMI watchdog timeout during a
>>> +LPM. The soft lockup timeout is not impacted.
>>
>> Could "LPM" or "mobility" be a bit more prominent in the parameter name
>> and documentation? Something else might want to add a factor as well,
>> one day.
> 
> In the V2 version, Nathan suggested "making the user-visible
> name more generic (e.g. "nmi_watchdog_factor") in case it makes sense to
> apply this to other contexts in the future."
> 
> So I made the change to a more generic name. I think this is a good option
> since the documentation is explicit about the LPM particular case.
> If in the future this factor needs to apply during an other operation that
> name will be generic enough.
> 
> Do you agree ?

Nick and I discussed that.
Nick prefers to have LPM in the tunable name, and thinks we can add a new
tunable if a separate user comes up that requires it.

We agree that 'nmi_wd_lpm_factor' is a good name.
I'll send a v5 with that name.

>>
>> Otherwise the code looks okay.
>>
>> Reviewed-by: Nicholas Piggin 
>>
>>> +
>>> +A value of 0 means no change. The default value is 200 meaning the NMI
>>> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
>>> +
>>> +
>>>  numa_balancing
>>>  ==
>>>  
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
>>> b/arch/powerpc/platforms/pseries/mobility.c
>>> index 907a779074d6..649155faafc2 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -48,6 +48,39 @@ struct update_props_workarea {
>>>  #define MIGRATION_SCOPE(1)
>>>  #define PRRN_SCOPE -2
>>>  
>>> +#ifdef CONFIG_PPC_WATCHDOG
>>> +static unsigned int nmi_wd_factor = 200;
>>> +
>>> +#ifdef CONFIG_SYSCTL
>>> +static struct ctl_table nmi_wd_factor_ctl_table[] = {
>>> +   {
>>> +   .procname   = "nmi_watchdog_factor",
>>> +   .data   = &nmi_wd_factor,
>>> +   .maxlen = sizeof(int),
>>> +   .mode   = 0644,
>>> +   .proc_handler   = proc_douintvec_minmax,
>>> +   },
>>> +   {}
>>> +};
>>> +static struct ctl_table nmi_wd_factor_sysctl_root[] = {
>>> +   {
>>> +   .procname   = "kernel",
>>> +   .mode   = 0555,
>>> +   .child  = nmi_wd_factor_ctl_table,
>>> +   },
>>> +   {}
>>> +};
>>> +
>>> +static int __init register_nmi_wd_factor_sysctl(void)
>>> +{
>>> +   register_sysctl_table(nmi_wd_factor_sysctl_root);
>>> +
>>> +   return 0;
>>> +}
>>> +device_initcall(register_nmi_wd_factor_sysctl);
>>> +#endif /* CONFIG_SYSCTL */
>>> +#endif /* CONFIG_PPC_WATCHDOG */
>>> +
>>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>>  {
>>> int rc;
>>> @@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
>>>  static int ps

Re: [PATCH v4 4/4] pseries/mobility: set NMI watchdog factor during LPM

2022-07-13 Thread Randy Dunlap
Hi,

On 7/13/22 03:56, Laurent Dufour wrote:
> Le 12/07/2022 à 18:25, Randy Dunlap a écrit :
>> Hi--
>>
>> On 7/12/22 07:32, Laurent Dufour wrote:

>>>
>>> Reviewed-by: Nicholas Piggin 
>>> Signed-off-by: Laurent Dufour 
>>> ---
>>>  Documentation/admin-guide/sysctl/kernel.rst | 12 ++
>>>  arch/powerpc/platforms/pseries/mobility.c   | 43 +
>>>  2 files changed, 55 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
>>> b/Documentation/admin-guide/sysctl/kernel.rst
>>> index ddccd1077462..0bb0b7f27e96 100644
>>> --- a/Documentation/admin-guide/sysctl/kernel.rst
>>> +++ b/Documentation/admin-guide/sysctl/kernel.rst
>>> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>>>  Documentation/admin-guide/kernel-parameters.rst).
>>>  
>>
>> This entire block should be in kernel-parameters.txt, not .rst,
>> and it should be formatted like everything else in the .txt file.

My apologies. I misread the file name.
I don't see a problem with this part of the patch or its location.

> Thanks for reviewing this patch.
> 
> I'll apply your requests in the next version.
> 
> However, regarding the change in kernel-parameters.txt, I'm confused. The
> newly introduced parameter is only exposed through sysctl. Not as a kernel
> boot option. In that case, should it be mentioned in kernel-parameters.txt?
> 
> Documentation/process/4.Coding.rst says:
> The file :ref:`Documentation/admin-guide/kernel-parameters.rst`
> describes all of the kernel's boot-time parameters.
> Any patch which adds new parameters should add the appropriate entries to
> this file.
> 
> And Documentation/process/submit-checklist.rst says:
> 16) All new kernel boot parameters are documented in
> ``Documentation/admin-guide/kernel-parameters.rst``.
> 
> What are the rules about editing .txt or .rst files?

Yeah, that's a little confusing.
kernel-parameters.txt is included in kernel-parameters.rst when
'make htmldocs' is run, so the produced output looks like it is from
the .rst file.

Kernel boot parameters should be added to the .txt file.
The .rst file is just intro material.

Thanks.

-- 
~Randy


[PATCH] macintosh:fix oob read in do_adb_query function

2022-07-13 Thread Ning Qiang
In the do_adb_query() function of drivers/macintosh/adb.c, req->data is
copied from userland. The parameter req->data[2] is missing a bounds
check. The array size of adb_handler[] is 16, so
adb_handler[req->data[2]].original_address and
adb_handler[req->data[2]].handler_id will lead to an out-of-bounds read.

Signed-off-by: Ning Qiang 
---
 drivers/macintosh/adb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
index 439fab4eaa85..1bbb9ca08d40 100644
--- a/drivers/macintosh/adb.c
+++ b/drivers/macintosh/adb.c
@@ -647,7 +647,7 @@ do_adb_query(struct adb_request *req)
 
switch(req->data[1]) {
case ADB_QUERY_GETDEVINFO:
-   if (req->nbytes < 3)
+   if (req->nbytes < 3 || req->data[2] >= 16)
break;
mutex_lock(&adb_handler_mutex);
req->reply[0] = adb_handler[req->data[2]].original_address;
-- 
2.25.1



Re: [PATCH v5] random: remove CONFIG_ARCH_RANDOM

2022-07-13 Thread Catalin Marinas
On Fri, Jul 08, 2022 at 02:40:32AM +0200, Jason A. Donenfeld wrote:
> When RDRAND was introduced, there was much discussion on whether it
> should be trusted and how the kernel should handle that. Initially, two
> mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
> "nordrand", a boot-time switch.
> 
> Later the thinking evolved. With a properly designed RNG, using RDRAND
> values alone won't harm anything, even if the outputs are malicious.
> Rather, the issue is whether those values are being *trusted* to be good
> or not. And so a new set of options were introduced as the real
> ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
> With these options, RDRAND is used, but it's not always credited. So in
> the worst case, it does nothing, and in the best case, maybe it helps.
> 
> Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
> center and became something certain platforms force-select.
> 
> The old options don't really help with much, and it's a bit odd to have
> special handling for these instructions when the kernel can deal fine
> with the existence or untrusted existence or broken existence or
> non-existence of that CPU capability.
> 
> Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
> ordinary asm-generic fallback pattern instead, keeping the two options
> that are actually used. For now it leaves "nordrand" for now, as the
> removal of that will take a different route.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Michael Ellerman 
> Cc: Alexander Gordeev 
> Cc: Thomas Gleixner 
> Cc: H. Peter Anvin 
> Acked-by: Borislav Petkov 
> Acked-by: Heiko Carstens 
> Acked-by: Greg Kroah-Hartman 
> Signed-off-by: Jason A. Donenfeld 

For arm64:

Acked-by: Catalin Marinas 


[PATCH v5 2/4] watchdog: export lockup_detector_reconfigure

2022-07-13 Thread Laurent Dufour
In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may be helpful before and after an LPAR migration (LPM) is
initiated. Because the migration implies some latencies, the watchdog, and
especially the NMI watchdog, is expected to be triggered during this
operation. Reconfiguring the watchdog with a factor would prevent it from
triggering too frequently during LPM.

Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and
create a new function lockup_detector_reconfigure() calling
__lockup_detector_reconfigure() under the protection of watchdog_mutex.

Cc: Christoph Hellwig 
Signed-off-by: Laurent Dufour 
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 21 -
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 750c7f395ca9..f700ff2df074 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -122,6 +122,8 @@ int watchdog_nmi_probe(void);
 int watchdog_nmi_enable(unsigned int cpu);
 void watchdog_nmi_disable(unsigned int cpu);
 
+void lockup_detector_reconfigure(void);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 20a7a55e62b6..90e6c41d5e33 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -541,7 +541,7 @@ int lockup_detector_offline_cpu(unsigned int cpu)
return 0;
 }
 
-static void lockup_detector_reconfigure(void)
+static void __lockup_detector_reconfigure(void)
 {
cpus_read_lock();
watchdog_nmi_stop();
@@ -561,6 +561,13 @@ static void lockup_detector_reconfigure(void)
__lockup_detector_cleanup();
 }
 
+void lockup_detector_reconfigure(void)
+{
+   mutex_lock(&watchdog_mutex);
+   __lockup_detector_reconfigure();
+   mutex_unlock(&watchdog_mutex);
+}
+
 /*
  * Create the watchdog infrastructure and configure the detector(s).
  */
@@ -577,13 +584,13 @@ static __init void lockup_detector_setup(void)
return;
 
mutex_lock(&watchdog_mutex);
-   lockup_detector_reconfigure();
+   __lockup_detector_reconfigure();
softlockup_initialized = true;
mutex_unlock(&watchdog_mutex);
 }
 
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
-static void lockup_detector_reconfigure(void)
+void __lockup_detector_reconfigure(void)
 {
cpus_read_lock();
watchdog_nmi_stop();
@@ -591,9 +598,13 @@ static void lockup_detector_reconfigure(void)
watchdog_nmi_start();
cpus_read_unlock();
 }
+static inline void lockup_detector_reconfigure(void)
+{
+   __lockup_detector_reconfigure();
+}
 static inline void lockup_detector_setup(void)
 {
-   lockup_detector_reconfigure();
+   __lockup_detector_reconfigure();
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
@@ -633,7 +644,7 @@ static void proc_watchdog_update(void)
 {
/* Remove impossible cpus to keep sysctl output clean. */
cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-   lockup_detector_reconfigure();
+   __lockup_detector_reconfigure();
 }
 
 /*
-- 
2.37.0
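
The split between a locked entry point and an unlocked helper in the patch
above can be illustrated outside the kernel. What follows is only a
single-threaded sketch under stated assumptions: a boolean flag stands in
for watchdog_mutex, and every name in it (cfg_lock, reconfigure_locked,
reconfigure, setup, reconfigure_count) is hypothetical rather than taken
from kernel/watchdog.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for watchdog_mutex (single-threaded model only). */
static bool cfg_lock_held;
/* Counts how many times the reconfiguration body ran. */
static int reconfigure_count;

static void cfg_lock(void)
{
	assert(!cfg_lock_held);	/* no recursive locking */
	cfg_lock_held = true;
}

static void cfg_unlock(void)
{
	assert(cfg_lock_held);
	cfg_lock_held = false;
}

/* Helper: the caller must already hold the lock. */
static void reconfigure_locked(void)
{
	assert(cfg_lock_held);
	reconfigure_count++;
}

/* Exported entry point: takes the lock on behalf of outside callers. */
static void reconfigure(void)
{
	cfg_lock();
	reconfigure_locked();
	cfg_unlock();
}

/* Internal path that holds the lock for other work, so it uses the helper. */
static void setup(void)
{
	cfg_lock();
	reconfigure_locked();
	cfg_unlock();
}
```

Calling reconfigure() from a path that already holds the lock would trip the
assertion in cfg_lock(), which is presumably why the internal callers in the
patch are switched to the double-underscore variant.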



[PATCH v5 3/4] powerpc/watchdog: introduce a NMI watchdog's factor

2022-07-13 Thread Laurent Dufour
Introduce a factor which would apply to the NMI watchdog timeout.

This factor is a percentage added to the watchdog_thresh value. The value
is set under the watchdog_mutex protection and
lockup_detector_reconfigure() is called to recompute wd_panic_timeout_tb.

Once the factor is set, it remains until it is set back to 0, which means
no impact.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/nmi.h |  2 ++
 arch/powerpc/kernel/watchdog.c | 21 -
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nmi.h b/arch/powerpc/include/asm/nmi.h
index ea0e487f87b1..c3c7adef74de 100644
--- a/arch/powerpc/include/asm/nmi.h
+++ b/arch/powerpc/include/asm/nmi.h
@@ -5,8 +5,10 @@
 #ifdef CONFIG_PPC_WATCHDOG
 extern void arch_touch_nmi_watchdog(void);
 long soft_nmi_interrupt(struct pt_regs *regs);
+void watchdog_nmi_set_timeout_pct(u64 pct);
 #else
 static inline void arch_touch_nmi_watchdog(void) {}
+static inline void watchdog_nmi_set_timeout_pct(u64 pct) {}
 #endif
 
 #ifdef CONFIG_NMI_IPI
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..5d903e63f932 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -91,6 +91,10 @@ static cpumask_t wd_smp_cpus_pending;
 static cpumask_t wd_smp_cpus_stuck;
 static u64 wd_smp_last_reset_tb;
 
+#ifdef CONFIG_PPC_PSERIES
+static u64 wd_timeout_pct;
+#endif
+
 /*
  * Try to take the exclusive watchdog action / NMI IPI / printing lock.
  * wd_smp_lock must be held. If this fails, we should return and wait
@@ -527,7 +531,13 @@ static int stop_watchdog_on_cpu(unsigned int cpu)
 
 static void watchdog_calc_timeouts(void)
 {
-   wd_panic_timeout_tb = watchdog_thresh * ppc_tb_freq;
+   u64 threshold = watchdog_thresh;
+
+#ifdef CONFIG_PPC_PSERIES
+   threshold += (READ_ONCE(wd_timeout_pct) * threshold) / 100;
+#endif
+
+   wd_panic_timeout_tb = threshold * ppc_tb_freq;
 
/* Have the SMP detector trigger a bit later */
wd_smp_panic_timeout_tb = wd_panic_timeout_tb * 3 / 2;
@@ -570,3 +580,12 @@ int __init watchdog_nmi_probe(void)
}
return 0;
 }
+
+#ifdef CONFIG_PPC_PSERIES
+void watchdog_nmi_set_timeout_pct(u64 pct)
+{
+   pr_info("Set the NMI watchdog timeout factor to %llu%%\n", pct);
+   WRITE_ONCE(wd_timeout_pct, pct);
+   lockup_detector_reconfigure();
+}
+#endif
-- 
2.37.0
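
As a worked example of the arithmetic in watchdog_calc_timeouts() above,
here is a small userspace sketch. The helper names (effective_thresh,
panic_timeout_tb) and the timebase frequency used in the note below are
assumptions made for illustration; only the formula itself comes from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the threshold computation: the base threshold
 * (in seconds) plus pct percent of it, using integer arithmetic as in
 * the patch.
 */
static uint64_t effective_thresh(uint64_t thresh, uint64_t pct)
{
	return thresh + (pct * thresh) / 100;
}

/* Convert the effective threshold to timebase ticks (tb_freq in Hz). */
static uint64_t panic_timeout_tb(uint64_t thresh, uint64_t pct,
				 uint64_t tb_freq)
{
	assert(tb_freq > 0);
	return effective_thresh(thresh, pct) * tb_freq;
}
```

With a watchdog_thresh of 10 and a factor of 200, effective_thresh() gives
30, matching the 30s figure quoted in this series. Note that the integer
division truncates: a factor of 5 on a threshold of 10 adds nothing.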



[PATCH v5 1/4] powerpc/mobility: wait for memory transfer to complete

2022-07-13 Thread Laurent Dufour
In pseries_migrate_partition(), loop until the memory transfer is
complete. This way the calling drmgr process will not exit earlier,
allowing callbacks to be run only once the migration is fully completed.

If the VASI state is read after the hypervisor has completed the
migration, the HCALL returns H_PARAMETER. We can safely assume that
the memory transfer is complete if this happens.

This will also allow managing the NMI watchdog state in the next commits.

Reviewed-by: Nathan Lynch 
Reviewed-by: Nicholas Piggin 
Signed-off-by: Laurent Dufour 
---
 arch/powerpc/platforms/pseries/mobility.c | 48 ++-
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 78f3f74c7056..6297467072e6 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -427,6 +427,43 @@ static int wait_for_vasi_session_suspending(u64 handle)
return ret;
 }
 
+static void wait_for_vasi_session_completed(u64 handle)
+{
+   unsigned long state = 0;
+   int ret;
+
+   pr_info("waiting for memory transfer to complete...\n");
+
+   /*
+* Wait for transition from H_VASI_RESUMED to H_VASI_COMPLETED.
+*/
+   while (true) {
+   ret = poll_vasi_state(handle, &state);
+
+   /*
+* If the memory transfer is already complete and the migration
+* has been cleaned up by the hypervisor, H_PARAMETER is returned,
+* which is translated into EINVAL by poll_vasi_state().
+*/
+   if (ret == -EINVAL || (!ret && state == H_VASI_COMPLETED)) {
+   pr_info("memory transfer completed.\n");
+   break;
+   }
+
+   if (ret) {
+   pr_err("H_VASI_STATE return error (%d)\n", ret);
+   break;
+   }
+
+   if (state != H_VASI_RESUMED) {
+   pr_err("unexpected H_VASI_STATE result %lu\n", state);
+   break;
+   }
+
+   msleep(500);
+   }
+}
+
 static void prod_single(unsigned int target_cpu)
 {
long hvrc;
@@ -673,9 +710,16 @@ static int pseries_migrate_partition(u64 handle)
vas_migration_handler(VAS_SUSPEND);
 
ret = pseries_suspend(handle);
-   if (ret == 0)
+   if (ret == 0) {
post_mobility_fixup();
-   else
+   /*
+* Wait until the memory transfer is complete, so that the user
+* space process returns from the syscall after the transfer is
+* complete. This allows the user hooks to be executed at the
+* right time.
+*/
+   wait_for_vasi_session_completed(handle);
+   } else
pseries_cancel_migration(handle, ret);
 
vas_migration_handler(VAS_RESUME);
-- 
2.37.0
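
The exit conditions of the polling loop above can be exercised in
userspace. This is a hedged model, not the kernel code: the enum values,
the poll_fn callback type, and the two fake pollers are hypothetical, and
unlike the patch the model returns a status and gives up after max_polls
iterations instead of sleeping 500 ms between polls.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical state values; the real ones are H_VASI_* constants. */
enum vasi_state { VASI_RESUMED, VASI_COMPLETED, VASI_OTHER };

/* Hypothetical stand-in for poll_vasi_state(). */
typedef int (*poll_fn)(void *ctx, enum vasi_state *state);

/* Returns 0 once the transfer is complete, a negative errno otherwise. */
static int wait_for_completed(poll_fn poll, void *ctx, int max_polls)
{
	assert(poll != NULL);

	for (int i = 0; i < max_polls; i++) {
		enum vasi_state state = VASI_OTHER;
		int ret = poll(ctx, &state);

		/* -EINVAL: the hypervisor already cleaned up the session. */
		if (ret == -EINVAL || (!ret && state == VASI_COMPLETED))
			return 0;
		if (ret)
			return ret;		/* any other polling error */
		if (state != VASI_RESUMED)
			return -EIO;		/* unexpected state */
		/* The kernel loop sleeps here; the model polls again. */
	}
	return -ETIMEDOUT;
}

/* Fake poller: reports RESUMED for *ctx polls, then COMPLETED. */
static int fake_poll(void *ctx, enum vasi_state *state)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		*state = VASI_RESUMED;
	} else {
		*state = VASI_COMPLETED;
	}
	return 0;
}

/* Fake poller for a session the hypervisor has already torn down. */
static int gone_poll(void *ctx, enum vasi_state *state)
{
	(void)ctx;
	(void)state;
	return -EINVAL;
}
```

The bounded loop is a choice made for the model so that it always
terminates; the patch itself loops until one of the break conditions is met.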



[PATCH v5 4/4] pseries/mobility: set NMI watchdog factor during an LPM

2022-07-13 Thread Laurent Dufour
During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet transferred pages
on the arrival side. Thus, the NMI watchdog may be triggered too
frequently, which increases the risk of hitting an NMI interrupt in a bad
place in the kernel, leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer completes
could be too strong a workaround: some users would want this timeout to be
eventually triggered if the system is hanging, even during an LPM.

Introduce a new sysctl variable nmi_wd_lpm_factor. It allows a factor to be
applied to the NMI watchdog timeout during an LPM. Just before the CPUs are
stopped for the switchover sequence, the NMI watchdog timer is set to
 watchdog_thresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the NMI
watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value).
Once the memory transfer is complete, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during an LPM.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Laurent Dufour 
---
 Documentation/admin-guide/sysctl/kernel.rst | 12 ++
 arch/powerpc/platforms/pseries/mobility.c   | 43 +
 2 files changed, 55 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
b/Documentation/admin-guide/sysctl/kernel.rst
index ddccd1077462..d73faa619c15 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -592,6 +592,18 @@ to the guest kernel command line (see
 Documentation/admin-guide/kernel-parameters.rst).
 
 
+nmi_wd_lpm_factor (PPC only)
+============================
+
+Factor apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during an
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200 meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
+
+
 numa_balancing
 ==
 
diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 6297467072e6..3d36a8955eaf 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -48,6 +48,39 @@ struct update_props_workarea {
 #define MIGRATION_SCOPE(1)
 #define PRRN_SCOPE -2
 
+#ifdef CONFIG_PPC_WATCHDOG
+static unsigned int nmi_wd_lpm_factor = 200;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = {
+   {
+   .procname   = "nmi_wd_lpm_factor",
+   .data   = &nmi_wd_lpm_factor,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_douintvec_minmax,
+   },
+   {}
+};
+static struct ctl_table nmi_wd_lpm_factor_sysctl_root[] = {
+   {
+   .procname   = "kernel",
+   .mode   = 0555,
+   .child  = nmi_wd_lpm_factor_ctl_table,
+   },
+   {}
+};
+
+static int __init register_nmi_wd_lpm_factor_sysctl(void)
+{
+   register_sysctl_table(nmi_wd_lpm_factor_sysctl_root);
+
+   return 0;
+}
+device_initcall(register_nmi_wd_lpm_factor_sysctl);
+#endif /* CONFIG_SYSCTL */
+#endif /* CONFIG_PPC_WATCHDOG */
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
int rc;
@@ -702,13 +735,20 @@ static int pseries_suspend(u64 handle)
 static int pseries_migrate_partition(u64 handle)
 {
int ret;
+   unsigned int factor = 0;
 
+#ifdef CONFIG_PPC_WATCHDOG
+   factor = nmi_wd_lpm_factor;
+#endif
ret = wait_for_vasi_session_suspending(handle);
if (ret)
return ret;
 
vas_migration_handler(VAS_SUSPEND);
 
+   if (factor)
+   watchdog_nmi_set_timeout_pct(factor);
+
ret = pseries_suspend(handle);
if (ret == 0) {
post_mobility_fixup();
@@ -722,6 +762,9 @@ static int pseries_migrate_partition(u64 handle)
} else
pseries_cancel_migration(handle, ret);
 
+   if (factor)
+   watchdog_nmi_set_timeout_pct(0);
+
vas_migration_handler(VAS_RESUME);
 
return ret;
-- 
2.37.0
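
The set-then-restore handling of the factor in pseries_migrate_partition()
can be sketched in isolation. This is a minimal model under stated
assumptions: active_pct stands in for the watchdog's wd_timeout_pct, and
set_timeout_pct, run_with_factor and the two fake operations are
hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Models the percentage currently applied to the NMI watchdog. */
static unsigned int active_pct;
/* Percentage observed by the fake operation while it was running. */
static unsigned int pct_seen_in_op;

/* Hypothetical stand-in for watchdog_nmi_set_timeout_pct(). */
static void set_timeout_pct(unsigned int pct)
{
	active_pct = pct;
}

/* Fake migration steps that record the factor in effect. */
static int fake_suspend_ok(void)
{
	pct_seen_in_op = active_pct;
	return 0;
}

static int fake_suspend_fail(void)
{
	pct_seen_in_op = active_pct;
	return -1;
}

/*
 * Apply the factor around an operation and reset it to 0 afterwards,
 * whether the operation succeeded or failed. A factor of 0 leaves the
 * watchdog configuration untouched.
 */
static int run_with_factor(unsigned int factor, int (*op)(void))
{
	int ret;

	assert(op != NULL);
	if (factor)
		set_timeout_pct(factor);
	ret = op();
	if (factor)
		set_timeout_pct(0);
	return ret;
}
```

As in the patch, the factor is sampled once before the operation, so a
sysctl write made during the migration would not change whether the reset
to 0 happens.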



[PATCH v5 0/4] Extending NMI watchdog during LPM

2022-07-13 Thread Laurent Dufour
When a partition is transferred, once it arrives at the destination node,
the partition is active but much of its memory must still be transferred
from the source node.

It depends on the activity in the partition, but the more CPUs the
partition has, the more memory is likely to remain to be transferred. This
causes latency when accessing pages that have not yet been transferred and,
for large partitions, it often triggers the NMI watchdog.

The NMI watchdog dumps the stack of the CPU where it appears to be stuck.
In this case, the dump does not provide much information since the stall
can happen during any memory access of the kernel.

In addition, the NMI interrupt mechanism is not safe and can cause a
system dump if the interrupt is taken while MSR[RI]=0.

Depending on the LPAR size and load, it may be useful to extend the NMI
watchdog timer during the LPM.

That is configurable through sysctl with the newly introduced variable
(specific to powerpc) nmi_wd_lpm_factor. This value represents the
percentage added to watchdog_thresh to set the NMI watchdog timeout during
an LPM.

Changes in V5 (no functional changes in this version):
Patch 4/4:
 - Fix typos and grammar issues reported by Randy.
 - Rename the sysctl value from nmi_watchdog_factor to nmi_wd_lpm_factor,
   as per Nick's request.

V4:
https://lore.kernel.org/linuxppc-dev/20220712143202.23144-1-lduf...@linux.ibm.com/

Laurent Dufour (4):
  powerpc/mobility: wait for memory transfer to complete
  watchdog: export lockup_detector_reconfigure
  powerpc/watchdog: introduce a NMI watchdog's factor
  pseries/mobility: set NMI watchdog factor during an LPM

 Documentation/admin-guide/sysctl/kernel.rst | 12 +++
 arch/powerpc/include/asm/nmi.h  |  2 +
 arch/powerpc/kernel/watchdog.c  | 21 -
 arch/powerpc/platforms/pseries/mobility.c   | 91 -
 include/linux/nmi.h |  2 +
 kernel/watchdog.c   | 21 +++--
 6 files changed, 141 insertions(+), 8 deletions(-)

-- 
2.37.0



Re: [PATCH] macintosh:fix oob read in do_adb_query function

2022-07-13 Thread Greg KH
On Wed, Jul 13, 2022 at 11:37:34PM +0800, Ning Qiang wrote:
> In do_adb_query function of drivers/macintosh/adb.c, req->data is copy
> form userland. the  parameter "req->data[2]" is Missing check, the
> array size of adb_handler[] is 16, so "adb_handler[
> req->data[2]].original_address" and "adb_handler[
> req->data[2]].handler_id" will lead to oob read.
> 
> Signed-off-by: Ning Qiang 

Cc: stable 
Reviewed-by: Greg Kroah-Hartman 



Re: [PATCH] macintosh:fix oob read in do_adb_query function

2022-07-13 Thread Kees Cook
On Wed, Jul 13, 2022 at 11:37:34PM +0800, Ning Qiang wrote:
> In do_adb_query function of drivers/macintosh/adb.c, req->data is copy
> form userland. the  parameter "req->data[2]" is Missing check, the
> array size of adb_handler[] is 16, so "adb_handler[
> req->data[2]].original_address" and "adb_handler[
> req->data[2]].handler_id" will lead to oob read.
> 
> Signed-off-by: Ning Qiang 

Thanks for catching this!

Do you have a reproducer for this? I'd expect CONFIG_UBSAN_BOUNDS=y to
notice this at runtime, at least.


> ---
>  drivers/macintosh/adb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
> index 439fab4eaa85..1bbb9ca08d40 100644
> --- a/drivers/macintosh/adb.c
> +++ b/drivers/macintosh/adb.c
> @@ -647,7 +647,7 @@ do_adb_query(struct adb_request *req)
>  
>   switch(req->data[1]) {
>   case ADB_QUERY_GETDEVINFO:
> - if (req->nbytes < 3)
> + if (req->nbytes < 3 || req->data[2] >= 16)

I'd prefer this was:

+   if (req->nbytes < 3 || req->data[2] >= ARRAY_SIZE(adb_handler))

so it's tied to the actual variable (if its size ever changes).
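
A small userspace sketch of that idea, assuming a stand-in ARRAY_SIZE()
macro and a hypothetical 16-entry table in place of adb_handler[] (the
struct and function names below are illustrative, not taken from adb.c):

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table standing in for adb_handler[]. */
struct handler_entry {
    int original_address;
    int handler_id;
};
static struct handler_entry handler_table[16];

/* True only when idx can safely be used to index handler_table[]. */
static bool handler_index_ok(unsigned char idx)
{
    return idx < ARRAY_SIZE(handler_table);
}
```

If the table is ever resized, the check follows automatically, whereas a
literal 16 would have to be updated by hand.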

With that:

Reviewed-by: Kees Cook 

-Kees

>   break;
>   mutex_lock(&adb_handler_mutex);
>   req->reply[0] = adb_handler[req->data[2]].original_address;
> -- 
> 2.25.1
> 

-- 
Kees Cook


Re: [PATCH v5 4/4] pseries/mobility: set NMI watchdog factor during an LPM

2022-07-13 Thread Randy Dunlap
Hi Laurent,

On 7/13/22 08:47, Laurent Dufour wrote:
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
> b/Documentation/admin-guide/sysctl/kernel.rst
> index ddccd1077462..d73faa619c15 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -592,6 +592,18 @@ to the guest kernel command line (see
>  Documentation/admin-guide/kernel-parameters.rst).
>  
>  
> +nmi_wd_lpm_factor (PPC only)
> +
> +
> +Factor apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is

   Factor to apply to

> +set to 1). This factor represents the percentage added to
> +``watchdog_thresh`` when calculating the NMI watchdog timeout during an
> +LPM. The soft lockup timeout is not impacted.
> +
> +A value of 0 means no change. The default value is 200 meaning the NMI
> +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).

-- 
~Randy


[PATCH v3 0/4] pseries-wdt: initial support for H_WATCHDOG-based watchdog timers

2022-07-13 Thread Scott Cheloha
PAPR v2.12 defines a new hypercall, H_WATCHDOG.  This patch series
adds support for this hypercall to powerpc/pseries kernels and
introduces a new watchdog driver, "pseries-wdt", for the virtual
timers exposed by the hypercall.

This series is preceded by the following:

RFC v1: 
https://lore.kernel.org/linux-watchdog/20220413165104.179144-1-chel...@linux.ibm.com/
RFC v2: 
https://lore.kernel.org/linux-watchdog/20220509174357.5448-1-chel...@linux.ibm.com/
PATCH v1: 
https://lore.kernel.org/linux-watchdog/20220520183552.33426-1-chel...@linux.ibm.com/
PATCH v2: 
https://lore.kernel.org/linux-watchdog/20220602175353.68942-1-chel...@linux.ibm.com/

Changes of note from PATCH v2:

- Don't keep a pointer to the platform device at registration
  time.  We don't use the pointer for anything and we cannot
  hotplug the "device".

- Drop the GETFIELD() and SETFIELD() macros: Michael Ellerman really
  doesn't like them.  Use plain integer constants and custom bitfield
  extraction macros for the capability output instead.

  (After making the change I can see the upside to plain constants.)

- Actually use PSERIES_WDTQ_MAX_NUMBER(): check that the hypervisor
  gave us at least one timer to work with.

- Use MSEC_PER_SEC in a few places instead of the literal 1000 to
  show the reader what we're doing.

- Use "reverse xmas tree" sorting for automatic variable declarations.

- Note where the max_timeout of (UINT_MAX / 1000) comes from.

- Nix email addresses from the MODULE_AUTHOR() macros, they tend to
  rot.




[PATCH v3 1/4] powerpc/pseries: hvcall.h: add H_WATCHDOG opcode, H_NOOP return code

2022-07-13 Thread Scott Cheloha
PAPR v2.12 defines a new hypercall, H_WATCHDOG.  The hypercall permits
guest control of one or more virtual watchdog timers.

Add the opcode for the H_WATCHDOG hypercall to hvcall.h.  While here,
add a definition for H_NOOP, a possible return code for H_WATCHDOG.

Signed-off-by: Scott Cheloha 
---
 arch/powerpc/include/asm/hvcall.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index d92a20a85395..4b4f69c35b4f 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -87,6 +87,7 @@
 #define H_P7   -60
 #define H_P8   -61
 #define H_P9   -62
+#define H_NOOP -63
 #define H_TOO_BIG  -64
 #define H_UNSUPPORTED  -67
 #define H_OVERLAP  -68
@@ -324,7 +325,8 @@
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH                0x44C
 #define H_GET_ENERGY_SCALE_INFO    0x450
-#define MAX_HCALL_OPCODE   H_GET_ENERGY_SCALE_INFO
+#define H_WATCHDOG 0x45C
+#define MAX_HCALL_OPCODE   H_WATCHDOG
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
-- 
2.27.0



[PATCH v3 2/4] powerpc/pseries: add FW_FEATURE_WATCHDOG flag

2022-07-13 Thread Scott Cheloha
PAPR v2.12 specifies a new optional function set, "hcall-watchdog",
for the /rtas/ibm,hypertas-functions property.  The presence of this
function set indicates support for the H_WATCHDOG hypercall.

Check for this function set and, if present, set the new
FW_FEATURE_WATCHDOG flag.

Signed-off-by: Scott Cheloha 
---
 arch/powerpc/include/asm/firmware.h   | 3 ++-
 arch/powerpc/platforms/pseries/firmware.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 834b8ecf..398e0b5e485f 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -55,6 +55,7 @@
 #define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0100)
 #define FW_FEATURE_FORM2_AFFINITY ASM_CONST(0x0200)
 #define FW_FEATURE_ENERGY_SCALE_INFO ASM_CONST(0x0400)
+#define FW_FEATURE_WATCHDOG ASM_CONST(0x0800)
 
 #ifndef __ASSEMBLY__
 
@@ -76,7 +77,7 @@ enum {
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
FW_FEATURE_RPT_INVALIDATE | FW_FEATURE_FORM2_AFFINITY |
-   FW_FEATURE_ENERGY_SCALE_INFO,
+   FW_FEATURE_ENERGY_SCALE_INFO | FW_FEATURE_WATCHDOG,
FW_FEATURE_PSERIES_ALWAYS = 0,
FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR,
FW_FEATURE_POWERNV_ALWAYS = 0,
diff --git a/arch/powerpc/platforms/pseries/firmware.c 
b/arch/powerpc/platforms/pseries/firmware.c
index 09c119b2f623..080108d129ed 100644
--- a/arch/powerpc/platforms/pseries/firmware.c
+++ b/arch/powerpc/platforms/pseries/firmware.c
@@ -67,6 +67,7 @@ hypertas_fw_features_table[] = {
{FW_FEATURE_PAPR_SCM,   "hcall-scm"},
{FW_FEATURE_RPT_INVALIDATE, "hcall-rpt-invalidate"},
{FW_FEATURE_ENERGY_SCALE_INFO,  "hcall-energy-scale-info"},
+   {FW_FEATURE_WATCHDOG,   "hcall-watchdog"},
 };
 
 /* Build up the firmware features bitmask using the contents of
-- 
2.27.0



[PATCH v3 3/4] powerpc/pseries: register pseries-wdt device with platform bus

2022-07-13 Thread Scott Cheloha
PAPR v2.12 defines a new hypercall, H_WATCHDOG.  The hypercall permits
guest control of one or more virtual watchdog timers.

These timers do not conform to PowerPC device conventions.  They are
not affixed to any extant bus, nor do they have full representation in
the device tree.

As a workaround we represent them as platform devices.

This patch registers a single platform device, "pseries-wdt", with the
platform bus if the FW_FEATURE_WATCHDOG flag is set.

A driver for this device, "pseries-wdt", will be introduced in a
subsequent patch.

Signed-off-by: Scott Cheloha 
---
 arch/powerpc/platforms/pseries/setup.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index ee4f1db49515..dd9d3f500cff 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -169,6 +170,18 @@ static void __init fwnmi_init(void)
 #endif
 }
 
+/*
+ * Affix a device for the first timer to the platform bus if
+ * we have firmware support for the H_WATCHDOG hypercall.
+ */
+static __init int pseries_wdt_init(void)
+{
+   if (firmware_has_feature(FW_FEATURE_WATCHDOG))
+   platform_device_register_simple("pseries-wdt", 0, NULL, 0);
+   return 0;
+}
+machine_subsys_initcall(pseries, pseries_wdt_init);
+
 static void pseries_8259_cascade(struct irq_desc *desc)
 {
struct irq_chip *chip = irq_desc_get_chip(desc);
-- 
2.27.0



[PATCH v3 4/4] watchdog/pseries-wdt: initial support for H_WATCHDOG-based watchdog timers

2022-07-13 Thread Scott Cheloha
PAPR v2.12 defines a new hypercall, H_WATCHDOG.  The hypercall permits
guest control of one or more virtual watchdog timers.  The timers have
millisecond granularity.  The guest is terminated when a timer
expires.

This patch adds a watchdog driver for these timers, "pseries-wdt".

pseries_wdt_probe() currently assumes the existence of only one
platform device and always assigns it watchdogNumber 1.  If we ever
expose more than one timer to userspace we will need to devise a way
to assign a distinct watchdogNumber to each platform device at device
registration time.

Signed-off-by: Scott Cheloha 
---
 .../watchdog/watchdog-parameters.rst  |  12 +
 drivers/watchdog/Kconfig  |   8 +
 drivers/watchdog/Makefile |   1 +
 drivers/watchdog/pseries-wdt.c| 239 ++
 4 files changed, 260 insertions(+)
 create mode 100644 drivers/watchdog/pseries-wdt.c

diff --git a/Documentation/watchdog/watchdog-parameters.rst 
b/Documentation/watchdog/watchdog-parameters.rst
index 223c99361a30..29153eed6689 100644
--- a/Documentation/watchdog/watchdog-parameters.rst
+++ b/Documentation/watchdog/watchdog-parameters.rst
@@ -425,6 +425,18 @@ pnx833x_wdt:
 
 -
 
+pseries-wdt:
+action:
+   Action taken when watchdog expires: 0 (power off), 1 (restart),
+   2 (dump and restart). (default=1)
+timeout:
+   Initial watchdog timeout in seconds. (default=60)
+nowayout:
+   Watchdog cannot be stopped once started.
+   (default=kernel config parameter)
+
+-
+
 rc32434_wdt:
 timeout:
Watchdog timeout value, in seconds (default=20)
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index 32fd37698932..a2429604a4ab 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -1962,6 +1962,14 @@ config MEN_A21_WDT
 
 # PPC64 Architecture
 
+config PSERIES_WDT
+   tristate "POWER Architecture Platform Watchdog Timer"
+   depends on PPC_PSERIES
+   select WATCHDOG_CORE
+   help
+ Driver for virtual watchdog timers provided by PAPR
+ hypervisors (e.g. PowerVM, KVM).
+
 config WATCHDOG_RTAS
tristate "RTAS watchdog"
depends on PPC_RTAS
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index c324e9d820e9..cdeb119e6e61 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -187,6 +187,7 @@ obj-$(CONFIG_BOOKE_WDT) += booke_wdt.o
 obj-$(CONFIG_MEN_A21_WDT) += mena21_wdt.o
 
 # PPC64 Architecture
+obj-$(CONFIG_PSERIES_WDT) += pseries-wdt.o
 obj-$(CONFIG_WATCHDOG_RTAS) += wdrtas.o
 
 # S390 Architecture
diff --git a/drivers/watchdog/pseries-wdt.c b/drivers/watchdog/pseries-wdt.c
new file mode 100644
index ..7f53b5293409
--- /dev/null
+++ b/drivers/watchdog/pseries-wdt.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2022 International Business Machines, Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME "pseries-wdt"
+
+/*
+ * H_WATCHDOG Input
+ *
+ * R4: "flags":
+ *
+ * Bits 48-55: "operation"
+ */
+#define PSERIES_WDTF_OP_START  0x100UL /* start timer */
+#define PSERIES_WDTF_OP_STOP   0x200UL /* stop timer */
+#define PSERIES_WDTF_OP_QUERY  0x300UL /* query timer capabilities */
+
+/*
+ * Bits 56-63: "timeoutAction" (for "Start Watchdog" only)
+ */
+#define PSERIES_WDTF_ACTION_HARD_POWEROFF  0x1UL   /* poweroff */
+#define PSERIES_WDTF_ACTION_HARD_RESTART   0x2UL   /* restart */
+#define PSERIES_WDTF_ACTION_DUMP_RESTART   0x3UL   /* dump + restart */
+
+/*
+ * H_WATCHDOG Output
+ *
+ * R3: Return code
+ *
+ * H_SUCCESS    The operation completed.
+ *
+ * H_BUSY  The hypervisor is too busy; retry the operation.
+ *
+ * H_PARAMETER  The given "flags" are somehow invalid.  Either the
+ *  "operation" or "timeoutAction" is invalid, or a
+ *  reserved bit is set.
+ *
+ * H_P2 The given "watchdogNumber" is zero or exceeds the
+ *  supported maximum value.
+ *
+ * H_P3 The given "timeoutInMs" is below the supported
+ *  minimum value.
+ *
+ * H_NOOP   The given "watchdogNumber" is already stopped.
+ *
+ * H_HARDWARE   The operation failed for ineffable reasons.
+ *
+ * H_FUNCTION   The H_WATCHDOG hypercall is not supported by this
+ *  hypervisor.
+ *
+ * R4:
+ *
+ * - For the "Query Watchdog Capabilities" operation, a 64-bit
+ *   structure:
+ */
+#define PSERIES_WDTQ_MIN_TIMEOUT(cap)  (((cap) >> 48) & 0x)
+#define PSERIES_WDTQ_MAX_NUMBER(cap)   (((cap) >> 32) & 0x)
+
+static const unsigned long pseries_wdt_action[] = {
+   [0] = PSERIES_WDTF_ACTION_HARD_POWEROFF,
+   [1] = PSERIES_WDTF_ACTION_HARD_R
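
The message is cut off here in the archive. As a hedged sketch of how the
defines above could be combined (the helper wdt_start_flags and the table
name wdt_action are hypothetical, and this is not claimed to be the
driver's code), the "flags" value for a start operation can be built by
OR-ing the operation field with the action selected by the documented
"action" parameter (0, 1 or 2):

```c
#include <stddef.h>

/* Values copied from the defines in the patch above. */
#define PSERIES_WDTF_OP_START              0x100UL
#define PSERIES_WDTF_ACTION_HARD_POWEROFF  0x1UL
#define PSERIES_WDTF_ACTION_HARD_RESTART   0x2UL
#define PSERIES_WDTF_ACTION_DUMP_RESTART   0x3UL

/* Index = value of the documented "action" module parameter. */
static const unsigned long wdt_action[] = {
    PSERIES_WDTF_ACTION_HARD_POWEROFF,
    PSERIES_WDTF_ACTION_HARD_RESTART,
    PSERIES_WDTF_ACTION_DUMP_RESTART,
};

/*
 * Hypothetical helper: build the "flags" argument for a start operation.
 * Returns 0 when the action index is out of range.
 */
static unsigned long wdt_start_flags(size_t action)
{
    if (action >= sizeof(wdt_action) / sizeof(wdt_action[0]))
        return 0;
    return PSERIES_WDTF_OP_START | wdt_action[action];
}
```

The operation value sits one byte above the action value, which is
consistent with the "Bits 48-55" and "Bits 56-63" comments if bit 0 is
taken to be the most significant bit of the 64-bit register.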

Re: [PATCH v3 4/4] watchdog/pseries-wdt: initial support for H_WATCHDOG-based watchdog timers

2022-07-13 Thread Guenter Roeck

On 7/13/22 13:23, Scott Cheloha wrote:
> PAPR v2.12 defines a new hypercall, H_WATCHDOG.  The hypercall permits
> guest control of one or more virtual watchdog timers.  The timers have
> millisecond granularity.  The guest is terminated when a timer
> expires.
>
> This patch adds a watchdog driver for these timers, "pseries-wdt".
>
> pseries_wdt_probe() currently assumes the existence of only one
> platform device and always assigns it watchdogNumber 1.  If we ever
> expose more than one timer to userspace we will need to devise a way
> to assign a distinct watchdogNumber to each platform device at device
> registration time.
>
> Signed-off-by: Scott Cheloha 


Acked-by: Guenter Roeck 



[PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog

2022-07-13 Thread Zhouyi Zhou
Use raw_smp_processor_id() in arch_touch_nmi_watchdog() because, when it
is called from the hung task watchdog (khungtaskd), preemption is enabled.

Signed-off-by: Zhouyi Zhou 
---
Dear PPC developers,

I found this bug while running rcutorture tests in a ppc VM at the
Open Source Lab of Oregon State University.

qemu-system-ppc64  -nographic -smp cores=4,threads=1 -net none  -M pseries 
-nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G -kernel 
/home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.08-22.36.11-torture/results-rcuscale-kvfree/TREE/vmlinux
 -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcuscale.kfree_rcu_test=1 
rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 rcuscale.kfree_loops=1 
torture.disable_onoff_at_boot rcuscale.shutdown=1 rcuscale.verbose=0"

tail /tmp/console.log
[ 1232.433552][   T41] BUG: using smp_processor_id() in preemptible [] 
code: khungtaskd/41
[ 1232.439751][   T41] caller is arch_touch_nmi_watchdog+0x34/0xd0
[ 1232.440934][   T41] CPU: 3 PID: 41 Comm: khungtaskd Not tainted 
5.19.0-rc5-next-20220708-dirty #106
[ 1232.442684][   T41] Call Trace:
[ 1232.443343][   T41] [c29cbbb0] [c06df360] 
dump_stack_lvl+0x74/0xa8 (unreliable)
[ 1232.445237][   T41] [c29cbbf0] [c0d04f30] 
check_preemption_disabled+0x150/0x160
[ 1232.446926][   T41] [c29cbc80] [c0035584] 
arch_touch_nmi_watchdog+0x34/0xd0
[ 1232.448532][   T41] [c29cbcb0] [c02068ac] 
watchdog+0x40c/0x5b0
[ 1232.451449][   T41] [c29cbdc0] [c0139df4] kthread+0x144/0x170
[ 1232.452896][   T41] [c29cbe10] [c000cd54] 
ret_from_kernel_thread+0x5c/0x64

After this fix, "BUG: using smp_processor_id() in preemptible [] code: 
khungtaskd/41" no longer appears.

I also examined the other places in watchdog.c where smp_processor_id() is
used; they all run with preemption disabled.

Kind Regards
Zhouyi
--
 arch/powerpc/kernel/watchdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..ab6b84e00311 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -450,7 +450,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
 void arch_touch_nmi_watchdog(void)
 {
unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
-   int cpu = smp_processor_id();
+   int cpu = raw_smp_processor_id();
u64 tb;
 
if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
-- 
2.25.1



Re: [PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH

2022-07-13 Thread Xin Hao

Hi Barry,

I ran some tests on a Kunpeng arm64 machine using UnixBench.

The test results are below.

On one core, we see a performance improvement of close to +30%.
./Run -c 1 -i 1 shell1
w/o
System Benchmarks Partial Index  BASELINE RESULT INDEX
Shell Scripts (1 concurrent) 42.4 5481.0 1292.7

System Benchmarks Index Score (Partial Only) 1292.7

w/
System Benchmarks Partial Index  BASELINE RESULT INDEX
Shell Scripts (1 concurrent) 42.4 6974.6 1645.0

System Benchmarks Index Score (Partial Only) 1645.0


But with all cores, there is a small performance degradation of a bit more than 5%.

./Run -c 96 -i 1 shell1
w/o
Shell Scripts (1 concurrent)  80765.5 lpm   (60.0 s, 1 samples)

System Benchmarks Partial Index  BASELINE RESULT INDEX
Shell Scripts (1 concurrent) 42.4 80765.5 19048.5

System Benchmarks Index Score (Partial Only) 19048.5

w/
Shell Scripts (1 concurrent)  76333.6 lpm   (60.0 s, 1 samples)

System Benchmarks Partial Index  BASELINE RESULT INDEX
Shell Scripts (1 concurrent) 42.4 76333.6 18003.2

System Benchmarks Index Score (Partial Only) 18003.2

-- 



After discussing with you, I made the following change to the patch.

index a52381a680db..1ecba81f1277 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -727,7 +727,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;

if (pending != flushed) {
+#ifdef CONFIG_ARCH_HAS_MM_CPUMASK
flush_tlb_mm(mm);
+#else
+   dsb(ish);
+#endif
/*
 * If the new TLB flushing is pending during flushing, leave
 * mm->tlb_flush_batched as is, to avoid losing flushing.

With this change there is a performance improvement with all cores, above +30%.

./Run -c 96 -i 1 shell1
96 CPUs in system; running 96 parallel copies of tests

Shell Scripts (1 concurrent) 109229.0 lpm   (60.0 s, 1 samples)
System Benchmarks Partial Index  BASELINE   RESULT    INDEX
Shell Scripts (1 concurrent) 42.4 109229.0  25761.6
   
System Benchmarks Index Score (Partial Only)    25761.6


Tested-by: Xin Hao

Looking forward to the next version of your patch.

On 7/11/22 11:46 AM, Barry Song wrote:

Though ARM64 has the hardware to do tlb shootdown, the hardware
broadcasting is not free.
A simplest micro benchmark shows even on snapdragon 888 with only
8 cores, the overhead for ptep_clear_flush is huge even for paging
out one page mapped by only one process:
5.36%  a.out[kernel.kallsyms]  [k] ptep_clear_flush

While pages are mapped by multiple processes or HW has more CPUs,
the cost should become even higher due to the bad scalability of
tlb shootdown.

The same benchmark can result in 16.99% CPU consumption on ARM64
server with around 100 cores according to Yicong's test on patch
4/4.

This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by
1. only send tlbi instructions in the first stage -
arch_tlbbatch_add_mm()
2. wait for the completion of tlbi by dsb while doing tlbbatch
sync in arch_tlbbatch_flush()
My testing on snapdragon shows the overhead of ptep_clear_flush
is removed by the patchset. The micro benchmark becomes 5% faster
even for one page mapped by single process on snapdragon 888.


-v2:
1. Collected Yicong's test result on kunpeng920 ARM64 server;
2. Removed the redundant vma parameter in arch_tlbbatch_add_mm()
according to the comments of Peter Zijlstra and Dave Hansen
3. Added ARCH_HAS_MM_CPUMASK rather than checking if mm_cpumask
is empty according to the comments of Nadav Amit

Thanks, Yicong, Peter, Dave and Nadav for your testing or reviewing
, and comments.

-v1:
https://lore.kernel.org/lkml/20220707125242.425242-1-21cn...@gmail.com/

Barry Song (4):
   Revert "Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't
 apply to ARM64"
   mm: rmap: Allow platforms without mm_cpumask to defer TLB flush
   mm: rmap: Extend tlbbatch APIs to fit new platforms
   arm64: support batched/deferred tlb shootdown during page reclamation

  Documentation/features/arch-support.txt   |  1 -
  .../features/vm/TLB/arch-support.txt  |  2 +-
  arch/arm/Kconfig  |  1 +
  arch/arm64/Kconfig|  1 +
  arch/arm64/include/asm/tlbbatch.h | 12 ++
  arch/arm64/include/asm/tlbflush.h | 23 +--
  arch/loongarch/Kconfig|  1 +
  arch/mips/Kconfig |  1 +
  arch/openrisc/Kconfig

Re: [PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH

2022-07-13 Thread Barry Song
On Thu, Jul 14, 2022 at 3:29 PM Xin Hao  wrote:
>
> Hi barry.
>
> I do some test on Kunpeng arm64 machine use Unixbench.
>
> The test  result as below.
>
> One core, we can see the performance improvement above +30%.

I am really pleased to see the 30%+ improvement on unixbench on single core.

> ./Run -c 1 -i 1 shell1
> w/o
> System Benchmarks Partial Index  BASELINE RESULT INDEX
> Shell Scripts (1 concurrent) 42.4 5481.0 1292.7
> 
> System Benchmarks Index Score (Partial Only) 1292.7
>
> w/
> System Benchmarks Partial Index  BASELINE RESULT INDEX
> Shell Scripts (1 concurrent) 42.4 6974.6 1645.0
> 
> System Benchmarks Index Score (Partial Only) 1645.0
>
>
> But with whole cores, there have little performance degradation above -5%

That is sad as we might get more concurrency between mprotect(), madvise(),
mremap(), zap_pte_range() and the deferred tlbi.

>
> ./Run -c 96 -i 1 shell1
> w/o
> Shell Scripts (1 concurrent)  80765.5 lpm   (60.0 s, 1
> samples)
> System Benchmarks Partial Index  BASELINE RESULT INDEX
> Shell Scripts (1 concurrent) 42.4 80765.5 19048.5
> 
> System Benchmarks Index Score (Partial Only)19048.5
>
> w
> Shell Scripts (1 concurrent)  76333.6 lpm   (60.0 s, 1
> samples)
> System Benchmarks Partial Index  BASELINE RESULT INDEX
> Shell Scripts (1 concurrent) 42.4 76333.6 18003.2
> 
> System Benchmarks Index Score (Partial Only)18003.2
>
> --
>
>
> After discuss with you, and do some changes in the patch.
>
> ndex a52381a680db..1ecba81f1277 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -727,7 +727,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
>  int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT;
>
>  if (pending != flushed) {
> +#ifdef CONFIG_ARCH_HAS_MM_CPUMASK
>  flush_tlb_mm(mm);
> +#else
> +   dsb(ish);
> +#endif
>

I was guessing the problem might be flush_tlb_batched_pending(),
so I asked you to make this change to verify my guess.

>  /*
>   * If the new TLB flushing is pending during flushing, leave
>   * mm->tlb_flush_batched as is, to avoid losing flushing.
>
> there have a performance improvement with whole cores, above +30%

But I don't think it is a proper patch. There is no guarantee that the CPU
calling flush_tlb_batched_pending() is the same CPU that sent the deferred
tlbi, so the solution is unsafe. But since this temporary code brings the
30%+ performance improvement back for high concurrency, there is real
potential to get there with a proper fix.

Unfortunately I don't have an arm64 server to debug this on. I only have
8 cores, which are unlikely to reproduce a regression that happens under
high concurrency with 96 parallel tasks.

So I'd ask if @yicong or someone else working on kunpeng or other
arm64 servers is able to actually debug this and figure out a proper
patch, then add that patch as 5/5 to this series?

>
> ./Run -c 96 -i 1 shell1
> 96 CPUs in system; running 96 parallel copies of tests
>
> Shell Scripts (1 concurrent) 109229.0 lpm   (60.0 s, 1 
> samples)
> System Benchmarks Partial Index  BASELINE   RESULTINDEX
> Shell Scripts (1 concurrent) 42.4 109229.0  25761.6
> 
> System Benchmarks Index Score (Partial Only)25761.6
>
>
> Tested-by: Xin Hao

Thanks for your testing!

>
> Looking forward to your next version patch.
>
> On 7/11/22 11:46 AM, Barry Song wrote:
> > Though ARM64 has the hardware to do tlb shootdown, the hardware
> > broadcasting is not free.
> > A simplest micro benchmark shows even on snapdragon 888 with only
> > 8 cores, the overhead for ptep_clear_flush is huge even for paging
> > out one page mapped by only one process:
> > 5.36%  a.out[kernel.kallsyms]  [k] ptep_clear_flush
> >
> > While pages are mapped by multiple processes or HW has more CPUs,
> > the cost should become even higher due to the bad scalability of
> > tlb shootdown.
> >
> > The same benchmark can result in 16.99% CPU consumption on ARM64
> > server with around 100 cores according to Yicong's test on patch
> > 4/4.
> >
> > This patchset leverages the existing BATCHED_UNMAP_TLB_FLUSH by
> > 1. only send tlbi instructions in the first stage -
> >   arch_tlbbatch_add_mm()
> > 2. wait for the completion of tlbi by dsb while doing tlbbatch
> >   sync in arch_tlbbatch_flush()
> > My testing on snapdragon shows the overhead of ptep_clear_flush
> > is removed by the patchset. The micro benchmark becomes 5% faster
> > even for one page mapped by single process on