Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christoph Hellwig
On Mon, Apr 27, 2020 at 03:40:50PM -0700, Andrew Morton wrote:
> > https://www.spinics.net/lists/kernel/msg3473847.html
> > https://www.spinics.net/lists/kernel/msg3473840.html
> > https://www.spinics.net/lists/kernel/msg3473843.html
> 
> OK, but that doesn't necessitate the above monstrosity?  How about
> 
> static int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
>const struct kernel_siginfo *from, bool x32_ABI)
> {
>   struct compat_siginfo new;
>   copy_siginfo_to_external32(&new, from);
>   ...
> }
> 
> int copy_siginfo_to_user32(struct compat_siginfo __user *to,
>  const struct kernel_siginfo *from)
> {
> #if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION)
>   return __copy_siginfo_to_user32(to, from, in_x32_syscall());
> #else
>   return __copy_siginfo_to_user32(to, from, 0);
> #endif
> }
> 
> Or something like that - I didn't try very hard.  We know how to do
> this stuff, and surely this thing isn't how!

I guess that might be a worthwhile middle ground.  Still not a fan of
all these ifdefs..


Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christophe Leroy




On 28/04/2020 at 09:09, Christoph Hellwig wrote:

On Mon, Apr 27, 2020 at 03:40:50PM -0700, Andrew Morton wrote:

https://www.spinics.net/lists/kernel/msg3473847.html
https://www.spinics.net/lists/kernel/msg3473840.html
https://www.spinics.net/lists/kernel/msg3473843.html


OK, but that doesn't necessitate the above monstrosity?  How about

static int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
 const struct kernel_siginfo *from, bool x32_ABI)
{
struct compat_siginfo new;
copy_siginfo_to_external32(&new, from);
...
}

int copy_siginfo_to_user32(struct compat_siginfo __user *to,
   const struct kernel_siginfo *from)
{
#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION)
return __copy_siginfo_to_user32(to, from, in_x32_syscall());
#else
return __copy_siginfo_to_user32(to, from, 0);
#endif
}

Or something like that - I didn't try very hard.  We know how to do
this stuff, and surely this thing isn't how!


I guess that might be a worthwhile middle ground.  Still not a fan of
all these ifdefs..



Can't we move the small X32-specific part out of 
__copy_siginfo_to_user32(), into an arch-specific helper that is a no-op 
for other architectures?


Something like:

if (!arch_special_something(&new, from)) {
new.si_utime = from->si_utime;
new.si_stime = from->si_stime;
}

Then arch_special_something() does what it wants on x86 and returns 1, 
and for architectures not implementing it, a generic version always 
returns 0.
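
For illustration, the suggested shape could be something like this (a rough
sketch only; arch_special_something() is just the placeholder name used above,
and the x32 field accesses are only indicative):

	/* Generic fallback: nothing special, caller copies the times itself. */
	#ifndef arch_special_something
	static inline int arch_special_something(struct compat_siginfo *to,
						 const struct kernel_siginfo *from)
	{
		return 0;
	}
	#endif

	/* x86 override (in an arch header): handle the x32 layout. */
	static inline int arch_special_something(struct compat_siginfo *to,
						 const struct kernel_siginfo *from)
	{
	#ifdef CONFIG_X86_X32_ABI
		if (in_x32_syscall()) {
			to->_sifields._sigchld_x32._utime = from->si_utime;
			to->_sifields._sigchld_x32._stime = from->si_stime;
			return 1;
		}
	#endif
		return 0;
	}
	#define arch_special_something arch_special_something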


Christophe


Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 09:45:46AM +0200, Christophe Leroy wrote:
>> I guess that might be a worthwhile middle ground.  Still not a fan of
>> all these ifdefs..
>>
>
> Can't we move the small X32-specific part out of 
> __copy_siginfo_to_user32(), into an arch-specific helper that is a no-op for 
> other architectures?
>
> Something like:
>
>   if (!arch_special_something(&new, from)) {
>   new.si_utime = from->si_utime;
>   new.si_stime = from->si_stime;
>   }
>
> Then arch_special_something() does what it wants on x86 and returns 1, 
> and for architectures not implementing it, a generic version always returns 
> 0.

The main issue is that we need an explicit parameter to select x32,
as it can't just be discovered from the calling context otherwise.
The rest is just sugarcoating.



[PATCH v2 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-04-28 Thread Srikar Dronamraju
A Powerpc system with multiple possible nodes and with CONFIG_NUMA
enabled has always had a node 0, even if node 0 has no cpus or memory
attached to it. As per PAPR, the node affinity of a cpu is only
available once it is present / online. For all cpus that are possible
but not present, cpu_to_node() would point to node 0.

To ensure a cpuless, memoryless dummy node is not marked online, powerpc
needs to make sure the cpu_to_node() of all possible but not present
cpus is set to a proper node.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
- Rebased to v5.7-rc3

 arch/powerpc/mm/numa.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..b3615b7fdbdf 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -931,8 +931,20 @@ void __init mem_topology_setup(void)
 
reset_numa_cpu_lookup_table();
 
-   for_each_present_cpu(cpu)
-   numa_setup_cpu(cpu);
+   for_each_possible_cpu(cpu) {
+   /*
+* Powerpc with CONFIG_NUMA always used to have a node 0,
+* even if it was memoryless or cpuless. For all cpus that
+* are possible but not present, cpu_to_node() would point
+* to node 0. To remove a cpuless, memoryless dummy node,
+* powerpc need to make sure all possible but not present
+* cpu_to_node are set to a proper node.
+*/
+   if (cpu_present(cpu))
+   numa_setup_cpu(cpu);
+   else
+   set_cpu_numa_node(cpu, first_online_node);
+   }
 }
 
 void __init initmem_init(void)
-- 
2.20.1



[PATCH v2 2/3] powerpc/numa: Prefer node id queried from vphn

2020-04-28 Thread Srikar Dronamraju
The node id queried from the static device tree may not be correct.
For example, it may always show 0 on a shared processor. Hence prefer the
node id queried from vphn and fall back on the device tree based node id
if the vphn query fails.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
- Rebased to v5.7-rc3

 arch/powerpc/mm/numa.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b3615b7fdbdf..281531340230 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -719,20 +719,20 @@ static int __init parse_numa_properties(void)
 */
for_each_present_cpu(i) {
struct device_node *cpu;
-   int nid;
-
-   cpu = of_get_cpu_node(i, NULL);
-   BUG_ON(!cpu);
-   nid = of_node_to_nid_single(cpu);
-   of_node_put(cpu);
+   int nid = vphn_get_nid(i);
 
/*
 * Don't fall back to default_nid yet -- we will plug
 * cpus into nodes once the memory scan has discovered
 * the topology.
 */
-   if (nid < 0)
-   continue;
+   if (nid == NUMA_NO_NODE) {
+   cpu = of_get_cpu_node(i, NULL);
+   if (cpu) {
+   nid = of_node_to_nid_single(cpu);
+   of_node_put(cpu);
+   }
+   }
node_set_online(nid);
}
 
-- 
2.20.1



[PATCH v2 0/3] Offline memoryless cpuless node 0

2020-04-28 Thread Srikar Dronamraju
Changelog v1 -> v2:
- Rebased to v5.7-rc3
- Updated the changelog.

A Linux kernel configured with CONFIG_NUMA on a system with multiple
possible nodes marks node 0 as online at boot. However, in practice
there are systems which have node 0 as memoryless and cpuless.

This can cause:
1. numa_balancing to be enabled on systems with only one online node.
2. The existence of a dummy (cpuless and memoryless) node, which can confuse
users/scripts looking at the output of lscpu / numactl.

This patchset corrects this anomaly.

This should only affect systems that have CONFIG_MEMORYLESS_NODES.
Currently only two architectures, ia64 and powerpc, have this config.

Note: Patch 3 in this patch series depends on patches 1 and 2.
Without patches 1 and 2, patch 3 might crash powerpc.

v5.7-rc3
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
--
 /sys/devices/system/node/online:0,2
 /proc/sys/kernel/numa_balancing:1
 /sys/devices/system/node/has_cpu:   2
 /sys/devices/system/node/has_memory:2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:  0-31

v5.7-rc3 + patches
--
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
--
/sys/devices/system/node/online:2
/proc/sys/kernel/numa_balancing:0
/sys/devices/system/node/has_cpu:   2
/sys/devices/system/node/has_memory:2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:  0-31

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 

Srikar Dronamraju (3):
  powerpc/numa: Set numa_node for all possible cpus
  powerpc/numa: Prefer node id queried from vphn
  mm/page_alloc: Keep memoryless cpuless node 0 offline

 arch/powerpc/mm/numa.c | 32 ++--
 mm/page_alloc.c|  4 +++-
 2 files changed, 25 insertions(+), 11 deletions(-)

-- 
2.20.1



[PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-04-28 Thread Srikar Dronamraju
Currently, a Linux kernel with CONFIG_NUMA on a system with multiple
possible nodes marks node 0 as online at boot.  However, in practice
there are systems which have node 0 as memoryless and cpuless.

This can cause numa_balancing to be enabled on systems with only one node
with memory and CPUs. The existence of this cpuless, memoryless dummy
node can confuse users/scripts looking at the output of lscpu /
numactl.

By marking N_ONLINE as NODE_MASK_NONE, stop assuming that node 0 is
always online.

v5.7-rc3
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
--
 /sys/devices/system/node/online:0,2
 /proc/sys/kernel/numa_balancing:1
 /sys/devices/system/node/has_cpu:   2
 /sys/devices/system/node/has_memory:2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:  0-31

v5.7-rc3 + patch
--
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
--
/sys/devices/system/node/online:2
/proc/sys/kernel/numa_balancing:0
/sys/devices/system/node/has_cpu:   2
/sys/devices/system/node/has_memory:2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:  0-31

Note: On Powerpc, cpu_to_node() of possible but not present cpus would
previously return 0. Hence this commit depends on commit ("powerpc/numa: Set
numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id
queried from vphn"). Without those two commits, a Powerpc system might crash.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1 -> v2:
- Rebased to v5.7-rc3
- Updated the changelog.

 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 69827d4fa052..03b89592af04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
  */
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
[N_POSSIBLE] = NODE_MASK_ALL,
+#ifdef CONFIG_NUMA
+   [N_ONLINE] = NODE_MASK_NONE,
+#else
[N_ONLINE] = { { [0] = 1UL } },
-#ifndef CONFIG_NUMA
[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = { { [0] = 1UL } },
-- 
2.20.1



Re: [PATCH v6 00/28] Initial Prefixed Instruction support

2020-04-28 Thread Balamuruhan S
On Tue, 2020-04-28 at 11:57 +1000, Jordan Niethe wrote:
> A future revision of the ISA will introduce prefixed instructions. A
> prefixed instruction is composed of a 4-byte prefix followed by a
> 4-byte suffix.
> 
> All prefixes have the major opcode 1. A prefix will never be a valid
> word instruction. A suffix may be an existing word instruction or a
> new instruction.
> 
> This series enables prefixed instructions and extends the instruction
> emulation to support them. Then the places where prefixed instructions
> might need to be emulated are updated.
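
(Aside, for context: since all prefixes carry the major opcode 1, telling a
prefix apart from a word instruction comes down to a single primary-opcode
check. A minimal illustrative sketch, not code from this series, with a
made-up helper name:)

	/* The primary opcode lives in the top 6 bits of the (big-endian)
	 * instruction word; a value of 1 marks the word as a prefix and
	 * never a valid stand-alone instruction. */
	static inline bool word_is_prefix(u32 word)
	{
		return (word >> 26) == 1;
	}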

Hi Jordan,

I tried to test kprobes with prefixed instructions on this patchset and
observed that the kprobe/uprobe enablement patches are missing since v4.
Up to v3 they were available:

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-11-jniet...@gmail.com/

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-12-jniet...@gmail.com/

Were they dropped for any dependency/reason, or do you plan to include them
in the next version?

Please let me know if you need help with it.

-- Bala

> 
> v6 is based on feedback from Balamuruhan Suriyakumar, Alistair Popple,
> Christophe Leroy and Segher Boessenkool.
> The major changes:
> - Use the instruction type in more places that had been missed before
> - Fix issues with ppc32
> - Introduce new self tests for code patching and feature fixups
> 
> v5 is based on feedback from Nick Piggins, Michael Ellerman, Balamuruhan
> Suriyakumar and Alistair Popple.
> The major changes:
> - The ppc instruction type is now a struct
> - Series now just based on next
> - ppc_inst_masked() dropped
> - Space for xmon breakpoints allocated in an assembly file
> - "Add prefixed instructions to instruction data type" patch seperated in
>   to smaller patches
> - Calling convention for create_branch() is changed
> - Some places which had not been updated to use the data type are now
> updated
> 
> v4 is based on feedback from Nick Piggins, Christophe Leroy and Daniel
> Axtens.
> The major changes:
> - Move xmon breakpoints from data section to text section
> - Introduce a data type for instructions on powerpc
> 
> v3 is based on feedback from Christophe Leroy. The major changes:
> - Completely replacing store_inst() with patch_instruction() in
>   xmon
> - Improve implementation of mread_instr() to not use mread().
> - Base the series on top of
>   https://patchwork.ozlabs.org/patch/1232619/ as this will effect
>   kprobes.
> - Some renaming and simplification of conditionals.
> 
> v2 incorporates feedback from Daniel Axtens and Balamuruhan
> S. The major changes are:
> - Squashing together all commits about SRR1 bits
> - Squashing all commits for supporting prefixed load stores
> - Changing abbreviated references to sufx/prfx -> suffix/prefix
> - Introducing macros for returning the length of an instruction
> - Removing sign extension flag from pstd/pld in sstep.c
> - Dropping patch  "powerpc/fault: Use analyse_instr() to check for
>   store with updates to sp" from the series, it did not really fit
>   with prefixed enablement in the first place and as reported by Greg
>   Kurz did not work correctly.
> 
> Alistair Popple (1):
>   powerpc: Enable Prefixed Instructions
> 
> Jordan Niethe (27):
>   powerpc/xmon: Remove store_inst() for patch_instruction()
>   powerpc/xmon: Move breakpoint instructions to own array
>   powerpc/xmon: Move breakpoints to text section
>   powerpc/xmon: Use bitwise calculations in_breakpoint_table()
>   powerpc: Change calling convention for create_branch() et. al.
>   powerpc: Use a macro for creating instructions from u32s
>   powerpc: Use an accessor for instructions
>   powerpc: Use a function for getting the instruction op code
>   powerpc: Use a function for byte swapping instructions
>   powerpc: Introduce functions for instruction equality
>   powerpc: Use a datatype for instructions
>   powerpc: Use a function for reading instructions
>   powerpc: Add a probe_user_read_inst() function
>   powerpc: Add a probe_kernel_read_inst() function
>   powerpc/kprobes: Use patch_instruction()
>   powerpc: Define and use __get_user_instr{,inatomic}()
>   powerpc: Introduce a function for reporting instruction length
>   powerpc/xmon: Use a function for reading instructions
>   powerpc/xmon: Move insertion of breakpoint for xol'ing
>   powerpc: Make test_translate_branch() independent of instruction
> length
>   powerpc: Define new SRR1 bits for a future ISA version
>   powerpc: Add prefixed instructions to instruction data type
>   powerpc: Test prefixed code patching
>   powerpc: Test prefixed instructions in feature fixups
>   powerpc: Support prefixed instructions in alignment handler
>   powerpc sstep: Add support for prefixed load/stores
>   powerpc sstep: Add support for prefixed fixed-point arithmetic
> 
>  arch/powerpc/include/asm/code-patching.

Re: [PATCH v2 1/2] cpufreq: qoriq: convert to a platform driver

2020-04-28 Thread Viresh Kumar
On 21-04-20, 10:29, Mian Yousaf Kaukab wrote:
> The driver has to be manually loaded if it is built as a module. It
> is neither exporting MODULE_DEVICE_TABLE nor MODULE_ALIAS. Moreover,
> no platform-device is created (and thus no uevent is sent) for the
> clockgen nodes it depends on.
> 
> Convert the module to a platform driver with its own alias. Moreover,
> drop whitelisted SOCs. Platform device will be created only for the
> compatible platforms.
> 
> Reviewed-by: Yuantian Tang 
> Acked-by: Viresh Kumar 
> Signed-off-by: Mian Yousaf Kaukab 
> ---
> v2:
>  +Rafael, Stephen, linux-clk
>  Add Reviewed-by and Acked-by tags
> 
>  drivers/cpufreq/qoriq-cpufreq.c | 76 
> -
>  1 file changed, 29 insertions(+), 47 deletions(-)
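
(For context on the pattern described above: converting to a platform driver
typically ends with registration boilerplate plus a module alias so that
udev/modprobe can auto-load the module when the platform device is created.
A rough sketch only -- the probe/remove function names are illustrative, not
necessarily those used in the actual patch, whose body is not quoted here:)

	static struct platform_driver qoriq_cpufreq_platform_driver = {
		.driver = {
			.name = "qoriq-cpufreq",
		},
		.probe	= qoriq_cpufreq_probe,
		.remove	= qoriq_cpufreq_remove,
	};
	module_platform_driver(qoriq_cpufreq_platform_driver);

	/* lets udev/modprobe auto-load the module for matching devices */
	MODULE_ALIAS("platform:qoriq-cpufreq");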

@Rafael,

Though this looks to be PPC stuff, it is used on both ARM and PPC. Do you
want to pick them up, or should I do that?

-- 
viresh


Re: [PATCH] powerpc: Add interrupt mode information in /proc/cpuinfo

2020-04-28 Thread Michael Ellerman
Cédric Le Goater  writes:
> PowerNV and pSeries machines can run using the XIVE or XICS interrupt
> mode. Report this information in /proc/cpuinfo :
>
> timebase: 51200
> platform: PowerNV
> model   : 9006-22C
> machine : PowerNV 9006-22C
> firmware: OPAL
> MMU : Radix
> IRQ : XIVE

Hmm, I dunno. At what point do we stop putting random non CPU-related
things in cpuinfo? :)

The IRQ mode is (reasonably) easily discovered in sys, eg:

  $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XIVE
  XIVE-IRQ

vs:

  $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XICS
  XICS


cheers

> diff --git a/arch/powerpc/platforms/powernv/setup.c 
> b/arch/powerpc/platforms/powernv/setup.c
> index 3bc188da82ba..39ef3394038d 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -196,14 +196,18 @@ static void pnv_show_cpuinfo(struct seq_file *m)
>   model = of_get_property(root, "model", NULL);
>   seq_printf(m, "machine\t\t: PowerNV %s\n", model);
>   if (firmware_has_feature(FW_FEATURE_OPAL))
> - seq_printf(m, "firmware\t: OPAL\n");
> + seq_puts(m, "firmware\t: OPAL\n");
>   else
> - seq_printf(m, "firmware\t: BML\n");
> + seq_puts(m, "firmware\t: BML\n");
>   of_node_put(root);
>   if (radix_enabled())
> - seq_printf(m, "MMU\t\t: Radix\n");
> + seq_puts(m, "MMU\t\t: Radix\n");
>   else
> - seq_printf(m, "MMU\t\t: Hash\n");
> + seq_puts(m, "MMU\t\t: Hash\n");
> + if (xive_enabled())
> + seq_puts(m, "IRQ\t\t: XIVE\n");
> + else
> + seq_puts(m, "IRQ\t\t: XICS\n");
>  }
>  
>  static void pnv_prepare_going_down(void)
> diff --git a/arch/powerpc/platforms/pseries/setup.c 
> b/arch/powerpc/platforms/pseries/setup.c
> index 0c8421dd01ab..d248fca67797 100644
> --- a/arch/powerpc/platforms/pseries/setup.c
> +++ b/arch/powerpc/platforms/pseries/setup.c
> @@ -95,9 +95,13 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
>   seq_printf(m, "machine\t\t: CHRP %s\n", model);
>   of_node_put(root);
>   if (radix_enabled())
> - seq_printf(m, "MMU\t\t: Radix\n");
> + seq_puts(m, "MMU\t\t: Radix\n");
>   else
> - seq_printf(m, "MMU\t\t: Hash\n");
> + seq_puts(m, "MMU\t\t: Hash\n");
> + if (xive_enabled())
> + seq_puts(m, "IRQ\t\t: XIVE\n");
> + else
> + seq_puts(m, "IRQ\t\t: XICS\n");
>  }
>  
>  /* Initialize firmware assisted non-maskable interrupts if
> -- 
> 2.25.3


[PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels

2020-04-28 Thread Nicholas Piggin
Provide an option to use ELFv2 ABI for big endian builds. This works on
GCC and clang (since 2014). It is less well tested and supported by the
GNU toolchain, but it can give some useful advantages of the ELFv2 ABI
for BE (e.g., less stack usage). Some distros even build BE ELFv2
userspace.

Reviewed-by: Segher Boessenkool 
Signed-off-by: Nicholas Piggin 
---
Since v1:
- Improved the override flavour name suggested by Segher.
- Improved changelog wording.


 arch/powerpc/Kconfig| 19 +++
 arch/powerpc/Makefile   | 15 ++-
 arch/powerpc/boot/Makefile  |  4 
 drivers/crypto/vmx/Makefile |  8 ++--
 drivers/crypto/vmx/ppc-xlate.pl | 10 ++
 5 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..d9d2abc06c2c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -147,6 +147,7 @@ config PPC
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_TABLE_SORT
+   select BUILD_ELF_V2 if PPC64 && CPU_LITTLE_ENDIAN
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE   if FUNCTION_TRACER
@@ -541,6 +542,24 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config BUILD_ELF_V2
+   bool
+
+config BUILD_BIG_ENDIAN_ELF_V2
+   bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)"
+   depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
+   default n
+   select BUILD_ELF_V2
+   help
+ This builds the kernel image using the ELFv2 ABI, which has a
+ reduced stack overhead and faster function calls. This does not
+ affect the userspace ABIs.
+
+ ELFv2 is the standard ABI for little-endian, but for big-endian
+ this is an experimental option that is less tested (kernel and
+ toolchain). This requires gcc 4.9 or newer and binutils 2.24 or
+ newer.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f310c32e88a4..e306b39d847e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -92,10 +92,14 @@ endif
 
 ifdef CONFIG_PPC64
 ifndef CONFIG_CC_IS_CLANG
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call 
cc-option,-mcall-aixdesc)
-aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2
+ifdef CONFIG_BUILD_ELF_V2
+cflags-y   += $(call cc-option,-mabi=elfv2,$(call 
cc-option,-mcall-aixdesc))
+aflags-y   += $(call cc-option,-mabi=elfv2)
+else
+cflags-y   += $(call cc-option,-mabi=elfv1)
+cflags-y   += $(call cc-option,-mcall-aixdesc)
+aflags-y   += $(call cc-option,-mabi=elfv1)
+endif
 endif
 endif
 
@@ -144,7 +148,7 @@ endif
 
 CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
 ifndef CONFIG_CC_IS_CLANG
-ifdef CONFIG_CPU_LITTLE_ENDIAN
+ifdef CONFIG_BUILD_ELF_V2
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call 
cc-option,-mcall-aixdesc))
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 else
@@ -153,6 +157,7 @@ CFLAGS-$(CONFIG_PPC64)  += $(call 
cc-option,-mcall-aixdesc)
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 endif
 endif
+
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
cc-option,-mminimal-toc))
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
 
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index c53a1b8bba8b..03942d08695d 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -41,6 +41,10 @@ endif
 
 BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
 
+ifdef CONFIG_BUILD_ELF_V2
+BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
+endif
+
 ifdef CONFIG_CPU_BIG_ENDIAN
 BOOTCFLAGS += -mbig-endian
 else
diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile
index 709670d2b553..9aea34602beb 100644
--- a/drivers/crypto/vmx/Makefile
+++ b/drivers/crypto/vmx/Makefile
@@ -5,18 +5,22 @@ vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o 
aes_cbc.o aes_ctr.o aes
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 override flavour := linux-ppc64le
 else
+ifdef CONFIG_BUILD_ELF_V2
+override flavour := linux-ppc64-elfv2
+else
 override flavour := linux-ppc64
 endif
+endif
 
 quiet_cmd_perl = PERL $@
   cmd_perl = $(PERL) $(<) $(flavour) > $(@)
 
 targets += aesp8-ppc.S ghashp8-ppc.S
 
-$(obj)/aesp8-ppc.S: $(src)/aesp8-ppc.pl FORCE
+$(obj)/aesp8-ppc.S: $(src)/aesp8-ppc.pl $(src)/ppc-xlate.pl FORCE
  

Re: [PATCH v4] pci: Make return value of pcie_capability_read*() consistent

2020-04-28 Thread Bjorn Helgaas
On Tue, Apr 28, 2020 at 10:19:08AM +0800, Yicong Yang wrote:
> On 2020/4/28 2:13, Bjorn Helgaas wrote:
> >
> > I'm starting to think we're approaching this backwards.  I searched
> > for PCIBIOS_FUNC_NOT_SUPPORTED, PCIBIOS_BAD_VENDOR_ID, and the other
> > error values.  Almost every use is a *return* in a config accessor.
> > There are very, very few *tests* for these values.
> 
> If we have certain reasons to reserve the PCIBIOS_* errors to identify
> PCI errors in PCI drivers, maybe redefining the PCIBIOS_* values as generic
> error codes can solve the issue, with no need to call
> pcibios_err_to_errno() to do the conversion.  Few changes would be
> needed to the current code. One possible patch may look like below.
> Otherwise, maybe converting all PCIBIOS_* errors to generic error codes
> is a better idea.
> 
> Not sure whether it's the best way or not. Just FYI.

That's a brilliant idea!  We should still look carefully at all the
callers of the config accessors, but this would avoid changing all the
arch accessors, so the patch would be dramatically smaller.

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 83ce1cd..843987c 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -675,14 +675,18 @@ static inline bool pci_dev_msi_enabled(struct pci_dev 
> *pci_dev) { return false;
>  
>  /* Error values that may be returned by PCI functions */
>  #define PCIBIOS_SUCCESSFUL   0x00
> -#define PCIBIOS_FUNC_NOT_SUPPORTED   0x81
> -#define PCIBIOS_BAD_VENDOR_ID0x83
> -#define PCIBIOS_DEVICE_NOT_FOUND 0x86
> -#define PCIBIOS_BAD_REGISTER_NUMBER  0x87
> -#define PCIBIOS_SET_FAILED   0x88
> -#define PCIBIOS_BUFFER_TOO_SMALL 0x89
> -
> -/* Translate above to generic errno for passing back through non-PCI code */
> +#define PCIBIOS_FUNC_NOT_SUPPORTED   -ENOENT
> +#define PCIBIOS_BAD_VENDOR_ID-ENOTTY
> +#define PCIBIOS_DEVICE_NOT_FOUND -ENODEV
> +#define PCIBIOS_BAD_REGISTER_NUMBER  -EFAULT
> +#define PCIBIOS_SET_FAILED   -EIO
> +#define PCIBIOS_BUFFER_TOO_SMALL -ENOSPC
> +
> +/**
> + * Translate above to generic errno for passing back through non-PCI code
> + *
> + * Deprecated. Use the PCIBIOS_* directly without a translation.
> + */
>  static inline int pcibios_err_to_errno(int err)
>  {
>   if (err <= PCIBIOS_SUCCESSFUL)
> @@ -690,17 +694,12 @@ static inline int pcibios_err_to_errno(int err)
>  
>   switch (err) {
>   case PCIBIOS_FUNC_NOT_SUPPORTED:
> - return -ENOENT;
>   case PCIBIOS_BAD_VENDOR_ID:
> - return -ENOTTY;
>   case PCIBIOS_DEVICE_NOT_FOUND:
> - return -ENODEV;
>   case PCIBIOS_BAD_REGISTER_NUMBER:
> - return -EFAULT;
>   case PCIBIOS_SET_FAILED:
> - return -EIO;
>   case PCIBIOS_BUFFER_TOO_SMALL:
> - return -ENOSPC;
> + return err;
>   }
>  
>   return -ERANGE;
> 
> > For example, the only tests for PCIBIOS_FUNC_NOT_SUPPORTED are in
> > xen_pcibios_err_to_errno() and pcibios_err_to_errno(), i.e., we're
> > just converting that value to -ENOENT or the Xen-specific thing.
> >
> > So I think the best approach might be to remove the PCIBIOS_* error
> > values completely and replace them with the corresponding values from
> > pcibios_err_to_errno().  For example, a part of the patch would look
> > like this:
> >
> > diff --git a/arch/mips/pci/ops-emma2rh.c b/arch/mips/pci/ops-emma2rh.c
> > index 65f47344536c..d4d9c902c147 100644
> > --- a/arch/mips/pci/ops-emma2rh.c
> > +++ b/arch/mips/pci/ops-emma2rh.c
> > @@ -100,7 +100,7 @@ static int pci_config_read(struct pci_bus *bus, 
> > unsigned int devfn, int where,
> > break;
> > default:
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > -   return PCIBIOS_FUNC_NOT_SUPPORTED;
> > +   return -ENOENT;
> > }
> >  
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > @@ -149,7 +149,7 @@ static int pci_config_write(struct pci_bus *bus, 
> > unsigned int devfn, int where,
> > break;
> > default:
> > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0);
> > -   return PCIBIOS_FUNC_NOT_SUPPORTED;
> > +   return -ENOENT;
> > }
> > *(volatile u32 *)(base + (PCI_FUNC(devfn) << 8) +
> >   (where & 0xfffc)) = data;
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 83ce1cdf5676..f95637a8d391 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -675,7 +675,6 @@ static inline bool pci_dev_msi_enabled(struct pci_dev 
> > *pci_dev) { return false;
> >  
> >  /* Error values that may be returned by PCI functions */
> >  #define PCIBIOS_SUCCESSFUL 0x00
> > -#define PCIBIOS_FUNC_NOT_SUPPORTED 0x81
> >  #define PCIBIOS_BAD_VENDOR_ID  0x83
> >  #define PCIBIOS_DEVICE_NOT_FOUND   0x86
> >  #define PCIBIOS_BAD_REGISTER_NUMBER0x87
> > @@ -689,8 +688,6 @@ static inline int pcibios_err_to_er

[PATCH] powerpc/spufs: Add rcu_read_lock() around fcheck()

2020-04-28 Thread Michael Ellerman
Currently the spu coredump code triggers an RCU warning:

  =
  WARNING: suspicious RCU usage
  5.7.0-rc3-01755-g7cd49f0b7ec7 #1 Not tainted
  -
  include/linux/fdtable.h:95 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by spu-coredump/1343:
   #0: c007fa22f430 (sb_writers#2){.+.+}-{0:0}, at: 
.do_coredump+0x1010/0x13c8

  stack backtrace:
  CPU: 0 PID: 1343 Comm: spu-coredump Not tainted 5.7.0-rc3-01755-g7cd49f0b7ec7 
#1
  Call Trace:
.dump_stack+0xec/0x15c (unreliable)
.lockdep_rcu_suspicious+0x120/0x144
.coredump_next_context+0x148/0x158
.spufs_coredump_extra_notes_size+0x54/0x190
.elf_coredump_extra_notes_size+0x34/0x50
.elf_core_dump+0xe48/0x19d0
.do_coredump+0xe50/0x13c8
.get_signal+0x864/0xd88
.do_notify_resume+0x158/0x3c8
.interrupt_exit_user_prepare+0x19c/0x208
interrupt_return+0x14/0x1c0

This comes from fcheck_files() via fcheck().

It's pretty clearly documented that fcheck() must be wrapped with
rcu_read_lock(), so fix it.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/cell/spufs/coredump.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c 
b/arch/powerpc/platforms/cell/spufs/coredump.c
index 8b3296b62f65..0fc52cbaa552 100644
--- a/arch/powerpc/platforms/cell/spufs/coredump.c
+++ b/arch/powerpc/platforms/cell/spufs/coredump.c
@@ -82,13 +82,19 @@ static int match_context(const void *v, struct file *file, 
unsigned fd)
  */
 static struct spu_context *coredump_next_context(int *fd)
 {
+   struct spu_context *ctx;
struct file *file;
int n = iterate_fd(current->files, *fd, match_context, NULL);
if (!n)
return NULL;
*fd = n - 1;
+
+   rcu_read_lock();
file = fcheck(*fd);
-   return SPUFS_I(file_inode(file))->i_ctx;
+   ctx = SPUFS_I(file_inode(file))->i_ctx;
+   rcu_read_unlock();
+
+   return ctx;
 }
 
 int spufs_coredump_extra_notes_size(void)
-- 
2.25.1



Re: [PATCH] powerpc: Add interrupt mode information in /proc/cpuinfo

2020-04-28 Thread Cédric Le Goater
On 4/28/20 1:03 PM, Michael Ellerman wrote:
> Cédric Le Goater  writes:
>> PowerNV and pSeries machines can run using the XIVE or XICS interrupt
>> mode. Report this information in /proc/cpuinfo :
>>
>> timebase: 51200
>> platform: PowerNV
>> model   : 9006-22C
>> machine : PowerNV 9006-22C
>> firmware: OPAL
>> MMU : Radix
>> IRQ : XIVE
> 
> Hmm, I dunno. At what point do we stop putting random non CPU-related
> things in cpuinfo? :)

True. 

> The IRQ mode is (reasonably) easily discovered in sys, eg:
> 
>   $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XIVE
>   XIVE-IRQ
> 
> vs:
> 
>   $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XICS
>   XICS


That's good enough for error reporting.

Thanks.

C.

> 
> cheers
> 
>> diff --git a/arch/powerpc/platforms/powernv/setup.c 
>> b/arch/powerpc/platforms/powernv/setup.c
>> index 3bc188da82ba..39ef3394038d 100644
>> --- a/arch/powerpc/platforms/powernv/setup.c
>> +++ b/arch/powerpc/platforms/powernv/setup.c
>> @@ -196,14 +196,18 @@ static void pnv_show_cpuinfo(struct seq_file *m)
>>  model = of_get_property(root, "model", NULL);
>>  seq_printf(m, "machine\t\t: PowerNV %s\n", model);
>>  if (firmware_has_feature(FW_FEATURE_OPAL))
>> -seq_printf(m, "firmware\t: OPAL\n");
>> +seq_puts(m, "firmware\t: OPAL\n");
>>  else
>> -seq_printf(m, "firmware\t: BML\n");
>> +seq_puts(m, "firmware\t: BML\n");
>>  of_node_put(root);
>>  if (radix_enabled())
>> -seq_printf(m, "MMU\t\t: Radix\n");
>> +seq_puts(m, "MMU\t\t: Radix\n");
>>  else
>> -seq_printf(m, "MMU\t\t: Hash\n");
>> +seq_puts(m, "MMU\t\t: Hash\n");
>> +if (xive_enabled())
>> +seq_puts(m, "IRQ\t\t: XIVE\n");
>> +else
>> +seq_puts(m, "IRQ\t\t: XICS\n");
>>  }
>>  
>>  static void pnv_prepare_going_down(void)
>> diff --git a/arch/powerpc/platforms/pseries/setup.c 
>> b/arch/powerpc/platforms/pseries/setup.c
>> index 0c8421dd01ab..d248fca67797 100644
>> --- a/arch/powerpc/platforms/pseries/setup.c
>> +++ b/arch/powerpc/platforms/pseries/setup.c
>> @@ -95,9 +95,13 @@ static void pSeries_show_cpuinfo(struct seq_file *m)
>>  seq_printf(m, "machine\t\t: CHRP %s\n", model);
>>  of_node_put(root);
>>  if (radix_enabled())
>> -seq_printf(m, "MMU\t\t: Radix\n");
>> +seq_puts(m, "MMU\t\t: Radix\n");
>>  else
>> -seq_printf(m, "MMU\t\t: Hash\n");
>> +seq_puts(m, "MMU\t\t: Hash\n");
>> +if (xive_enabled())
>> +seq_puts(m, "IRQ\t\t: XIVE\n");
>> +else
>> +seq_puts(m, "IRQ\t\t: XICS\n");
>>  }
>>  
>>  /* Initialize firmware assisted non-maskable interrupts if
>> -- 
>> 2.25.3



[RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Jeremy Kerr
Currently, we may perform a copy_to_user() (through
simple_read_from_buffer()) while holding a context's register_lock,
while accessing the context save area.

This change uses temporary buffers for the context save area data,
which we then pass to simple_read_from_buffer().

Signed-off-by: Jeremy Kerr 
---

Christoph - this fixes the copy_to_user() while atomic, hopefully with
only minimal disruption to your series.

---
 arch/powerpc/platforms/cell/spufs/file.c | 110 +++
 1 file changed, 74 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index c0f950a3f4e1..c62d77ddaf7d 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -1978,8 +1978,9 @@ static ssize_t __spufs_mbox_info_read(struct spu_context 
*ctx,
 static ssize_t spufs_mbox_info_read(struct file *file, char __user *buf,
   size_t len, loff_t *pos)
 {
-   int ret;
struct spu_context *ctx = file->private_data;
+   u32 stat, data;
+   int ret;
 
if (!access_ok(buf, len))
return -EFAULT;
@@ -1988,11 +1989,16 @@ static ssize_t spufs_mbox_info_read(struct file *file, 
char __user *buf,
if (ret)
return ret;
spin_lock(&ctx->csa.register_lock);
-   ret = __spufs_mbox_info_read(ctx, buf, len, pos);
+   stat = ctx->csa.prob.mb_stat_R;
+   data = ctx->csa.prob.pu_mb_R;
spin_unlock(&ctx->csa.register_lock);
spu_release_saved(ctx);
 
-   return ret;
+   /* EOF if there's no entry in the mbox */
+   if (!(stat & 0xff))
+   return 0;
+
+   return simple_read_from_buffer(buf, len, pos, &data, sizeof(data));
 }
 
 static const struct file_operations spufs_mbox_info_fops = {
@@ -2019,6 +2025,7 @@ static ssize_t spufs_ibox_info_read(struct file *file, 
char __user *buf,
   size_t len, loff_t *pos)
 {
struct spu_context *ctx = file->private_data;
+   u32 stat, data;
int ret;
 
if (!access_ok(buf, len))
@@ -2028,11 +2035,16 @@ static ssize_t spufs_ibox_info_read(struct file *file, 
char __user *buf,
if (ret)
return ret;
spin_lock(&ctx->csa.register_lock);
-   ret = __spufs_ibox_info_read(ctx, buf, len, pos);
+   stat = ctx->csa.prob.mb_stat_R;
+   data = ctx->csa.priv2.puint_mb_R;
spin_unlock(&ctx->csa.register_lock);
spu_release_saved(ctx);
 
-   return ret;
+   /* EOF if there's no entry in the ibox */
+   if (!(stat & 0xff))
+   return 0;
+
+   return simple_read_from_buffer(buf, len, pos, &data, sizeof(data));
 }
 
 static const struct file_operations spufs_ibox_info_fops = {
@@ -2041,6 +2053,11 @@ static const struct file_operations spufs_ibox_info_fops 
= {
.llseek  = generic_file_llseek,
 };
 
+static size_t spufs_wbox_info_cnt(struct spu_context *ctx)
+{
+   return (4 - ((ctx->csa.prob.mb_stat_R & 0x00ff00) >> 8)) * sizeof(u32);
+}
+
 static ssize_t __spufs_wbox_info_read(struct spu_context *ctx,
char __user *buf, size_t len, loff_t *pos)
 {
@@ -2049,7 +2066,7 @@ static ssize_t __spufs_wbox_info_read(struct spu_context 
*ctx,
u32 wbox_stat;
 
wbox_stat = ctx->csa.prob.mb_stat_R;
-   cnt = 4 - ((wbox_stat & 0x00ff00) >> 8);
+   cnt = spufs_wbox_info_cnt(ctx);
for (i = 0; i < cnt; i++) {
data[i] = ctx->csa.spu_mailbox_data[i];
}
@@ -2062,7 +2079,8 @@ static ssize_t spufs_wbox_info_read(struct file *file, 
char __user *buf,
   size_t len, loff_t *pos)
 {
struct spu_context *ctx = file->private_data;
-   int ret;
+   u32 data[ARRAY_SIZE(ctx->csa.spu_mailbox_data)];
+   int ret, count;
 
if (!access_ok(buf, len))
return -EFAULT;
@@ -2071,11 +2089,13 @@ static ssize_t spufs_wbox_info_read(struct file *file, 
char __user *buf,
if (ret)
return ret;
spin_lock(&ctx->csa.register_lock);
-   ret = __spufs_wbox_info_read(ctx, buf, len, pos);
+   count = spufs_wbox_info_cnt(ctx);
+   memcpy(&data, &ctx->csa.spu_mailbox_data, sizeof(data));
spin_unlock(&ctx->csa.register_lock);
spu_release_saved(ctx);
 
-   return ret;
+   return simple_read_from_buffer(buf, len, pos, &data,
+   count * sizeof(u32));
 }
 
 static const struct file_operations spufs_wbox_info_fops = {
@@ -2084,20 +2104,19 @@ static const struct file_operations 
spufs_wbox_info_fops = {
.llseek  = generic_file_llseek,
 };
 
-static ssize_t __spufs_dma_info_read(struct spu_context *ctx,
-   char __user *buf, size_t len, loff_t *pos)
+static void ___spufs_dma_info_read(struct spu_context *ctx,
+   struct spu_dma_info *info)
 {
-   struct spu_dma_info i

[PATCH] powerpc/64: Don't initialise init_task->thread.regs

2020-04-28 Thread Michael Ellerman
Aneesh increased the size of struct pt_regs by 16 bytes and started
seeing this WARN_ON:

  smp: Bringing up secondary CPUs ...
  [ cut here ]
  WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 
giveup_all+0xb4/0x110
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318
  NIP:  c001a2b4 LR: c001a29c CTR: c31d
  REGS: c26d3980 TRAP: 0700   Not tainted  
(5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+)
  MSR:  8282b033   CR: 48048224  XER: 

  CFAR: c0019cc8 IRQMASK: 1
  GPR00: c001a264 c26d3c20 c26d7200 8280b033
  GPR04: 0001  0077 30206d7372203164
  GPR08: 2000 02002000 8280b033 3230303030303030
  GPR12: 8800 c31d 00800050 0266
  GPR16: 0309a1a0 0309a4b0 0309a2d8 0309a890
  GPR20: 030d0098 c264da40 fd62 c000ff798080
  GPR24: c264edf0 c001007469f0 fd62 c20e5e90
  GPR28: c264edf0 c264d200 1db6 c264d200
  NIP [c001a2b4] giveup_all+0xb4/0x110
  LR [c001a29c] giveup_all+0x9c/0x110
  Call Trace:
  [c26d3c20] [c001a264] giveup_all+0x64/0x110 (unreliable)
  [c26d3c90] [c001ae34] __switch_to+0x104/0x480
  [c26d3cf0] [c0e0b8a0] __schedule+0x320/0x970
  [c26d3dd0] [c0e0c518] schedule_idle+0x38/0x70
  [c26d3df0] [c019c7c8] do_idle+0x248/0x3f0
  [c26d3e70] [c019cbb8] cpu_startup_entry+0x38/0x40
  [c26d3ea0] [c0011bb0] rest_init+0xe0/0xf8
  [c26d3ed0] [c2004820] start_kernel+0x990/0x9e0
  [c26d3f90] [c000c49c] start_here_common+0x1c/0x400

Which was unexpected. The warning is checking the thread.regs->msr
value of the task we are switching from:

  usermsr = tsk->thread.regs->msr;
  ...
  WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC)));

ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set.

Dumping tsk->thread.regs->msr we see that it's: 0x1db6

Which is not a normal looking MSR, in fact the only valid bit is
MSR_VSX, all the other bits are reserved in the current definition of
the MSR.

We can see from the oops that it was swapper/0 that we were switching
from when we hit the warning, ie. init_task. So its thread.regs points
to the base (high addresses) in init_stack.

Dumping the content of init_task->thread.regs, with the members of
pt_regs annotated (the 16 bytes larger version), we see:

   c2780080gpr[0] gpr[1]
   c2666008gpr[2] gpr[3]
  c26d3ed0 0078gpr[4] gpr[5]
  c0011b68 c2780080gpr[6] gpr[7]
   gpr[8] gpr[9]
  c26d3f90 80002200gpr[10]gpr[11]
  c2004820 c26d7200gpr[12]gpr[13]
  1db6 c10aabe8gpr[14]gpr[15]
  c10aabe8 c10aabe8gpr[16]gpr[17]
  c294d598 gpr[18]gpr[19]
   1ff8gpr[20]gpr[21]
   c206d608gpr[22]gpr[23]
  c278e0cc gpr[24]gpr[25]
  2fff c000gpr[26]gpr[27]
  0200 0028gpr[28]gpr[29]
  1db6 0475gpr[30]gpr[31]
  0200 1db6nipmsr
   orig_r3ctr
  c000c49c link   xer
   ccrsofte
   trap   dar
   dsisr  result
   pprkuap
   pad[2] pad[3]

This looks suspiciously like stack frames, not a pt_regs. If we look
closely we can see return addresses from the stack trace above,
c2004820 (start_kernel) and c000c49c (start_here_common).

init_task->thread.regs is setup at build time in processor.h:

  #define INIT_THREAD  { \
.ksp = INIT_SP, \
.regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \

The early boot code where we setup the initial stack is:

  LOAD_REG_ADDR(r3,init_thread_union)

  /* set up a stack pointer */
  LOAD_REG_IMMEDIATE(r1,THREAD_SIZE)
  add   r1,r3,r1
  lir0,0
  stdu  r0,-STACK_FRAME_OVERHEAD(r1)

Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD).
Which is far too small to contain a pt_regs.

So the result is init_task->thread.regs is pointing at some stack
frames on the init stack, not at a pt_regs.

We have gotten away with this 

[PATCH] powerpc: Drop unneeded cast in task_pt_regs()

2020-04-28 Thread Michael Ellerman
There's no need to cast in task_pt_regs() as tsk->thread.regs should
already be a struct pt_regs. If someone's using task_pt_regs() on
something that's not a task but happens to have a thread.regs then
we'll deal with them later.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index bfa336fbcfeb..8e855c78d780 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -307,7 +307,7 @@ struct thread_struct {
 }
 #endif
 
-#define task_pt_regs(tsk)  ((struct pt_regs *)(tsk)->thread.regs)
+#define task_pt_regs(tsk)  ((tsk)->thread.regs)
 
 unsigned long get_wchan(struct task_struct *p);
 
-- 
2.25.1



Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open

2020-04-28 Thread Markus Elfring
> Hence, serialize hvc_open and check if tty->private_data is NULL before
> proceeding ahead.

What do you think about adding the tag “Fixes” for these data
synchronisation adjustments?
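
(For reference, the tag format documented in submitting-patches, placed with
the other tags in the change description; the hash and subject below are only
placeholders:)

	Fixes: <12-character sha> ("<subject of the commit that introduced the problem>")

	# one way to generate it:
	git log -1 --abbrev=12 --format='Fixes: %h ("%s")' <commit>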


…
> +++ b/drivers/tty/hvc/hvc_console.c
…
@@ -384,6 +394,8 @@ static int hvc_open(struct tty_struct *tty, struct file * 
filp)
…
> +out:
> + mutex_unlock(&hvc_open_mutex);
>   return rc;
>  }

I suggest using the label “unlock” instead.

Regards,
Markus


[PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()

2020-04-28 Thread Christophe Leroy
The VDSO datapage and the text pages are always located immediately
next to each other, so the offset can be hardcoded without an indirection
through __kernel_datapage_offset.

In order to ease things, move the data page in front like other
arches; that way there is no need to know the size of the library
to locate the data page.

Before:
clock-getres-realtime-coarse:vdso: 714 nsec/call
clock-gettime-realtime-coarse:vdso: 792 nsec/call
clock-gettime-realtime:vdso: 1243 nsec/call

After:
clock-getres-realtime-coarse:vdso: 699 nsec/call
clock-gettime-realtime-coarse:vdso: 784 nsec/call
clock-gettime-realtime:vdso: 1231 nsec/call

Signed-off-by: Christophe Leroy 
---
v7:
- Moved the removal of the tmp param of __get_datapage()
in a subsequent patch
- Included the addition of the offset param to __get_datapage()
in the further preparatory patch
---
 arch/powerpc/include/asm/vdso_datapage.h |  8 ++--
 arch/powerpc/kernel/vdso.c   | 53 
 arch/powerpc/kernel/vdso32/datapage.S|  3 --
 arch/powerpc/kernel/vdso32/vdso32.lds.S  |  7 +---
 arch/powerpc/kernel/vdso64/datapage.S|  3 --
 arch/powerpc/kernel/vdso64/vdso64.lds.S  |  7 +---
 6 files changed, 16 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index b9ef6cf50ea5..11886467dfdf 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -118,10 +118,12 @@ extern struct vdso_data *vdso_data;
 
 .macro get_datapage ptr, tmp
bcl 20, 31, .+4
+999:
mflr\ptr
-   addi\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l
-   lwz \tmp, 0(\ptr)
-   add \ptr, \tmp, \ptr
+#if CONFIG_PPC_PAGE_SHIFT > 14
+   addis   \ptr, \ptr, (_vdso_datapage - 999b)@ha
+#endif
+   addi\ptr, \ptr, (_vdso_datapage - 999b)@l
 .endm
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index f38f26e844b6..d33fa22ddbed 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -190,7 +190,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, 
int uses_interp)
 * install_special_mapping or the perf counter mmap tracking code
 * will fail to recognise it as a vDSO (since arch_vma_name fails).
 */
-   current->mm->context.vdso_base = vdso_base;
+   current->mm->context.vdso_base = vdso_base + PAGE_SIZE;
 
/*
 * our vma flags don't have VM_WRITE so by default, the process isn't
@@ -482,42 +482,6 @@ static __init void vdso_setup_trampolines(struct 
lib32_elfinfo *v32,
vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32");
 }
 
-static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32,
-  struct lib64_elfinfo *v64)
-{
-#ifdef CONFIG_VDSO32
-   Elf32_Sym *sym32;
-#endif
-#ifdef CONFIG_PPC64
-   Elf64_Sym *sym64;
-
-   sym64 = find_symbol64(v64, "__kernel_datapage_offset");
-   if (sym64 == NULL) {
-   printk(KERN_ERR "vDSO64: Can't find symbol "
-  "__kernel_datapage_offset !\n");
-   return -1;
-   }
-   *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) =
-   (vdso64_pages << PAGE_SHIFT) -
-   (sym64->st_value - VDSO64_LBASE);
-#endif /* CONFIG_PPC64 */
-
-#ifdef CONFIG_VDSO32
-   sym32 = find_symbol32(v32, "__kernel_datapage_offset");
-   if (sym32 == NULL) {
-   printk(KERN_ERR "vDSO32: Can't find symbol "
-  "__kernel_datapage_offset !\n");
-   return -1;
-   }
-   *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) =
-   (vdso32_pages << PAGE_SHIFT) -
-   (sym32->st_value - VDSO32_LBASE);
-#endif
-
-   return 0;
-}
-
-
 static __init int vdso_fixup_features(struct lib32_elfinfo *v32,
  struct lib64_elfinfo *v64)
 {
@@ -618,9 +582,6 @@ static __init int vdso_setup(void)
if (vdso_do_find_sections(&v32, &v64))
return -1;
 
-   if (vdso_fixup_datapage(&v32, &v64))
-   return -1;
-
if (vdso_fixup_features(&v32, &v64))
return -1;
 
@@ -761,26 +722,26 @@ static int __init vdso_init(void)
vdso32_pagelist = kcalloc(vdso32_pages + 2, sizeof(struct page *),
  GFP_KERNEL);
BUG_ON(vdso32_pagelist == NULL);
+   vdso32_pagelist[0] = virt_to_page(vdso_data);
for (i = 0; i < vdso32_pages; i++) {
struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE);
get_page(pg);
-   vdso32_pagelist[i] = pg;
+   vdso32_pagelist[i + 1] = pg;
}
-   vdso32_pagelist[i++] = virt_to_page(vdso_data);
-   vdso32_pagelist[i] = NULL;
+   vdso32_pagelist[i + 1] = NULL;
 #endif
 
 #ifdef C

[PATCH v8 1/8] powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro

2020-04-28 Thread Christophe Leroy
In the same way as already done on PPC32, drop the __get_datapage()
function and use the get_datapage inline macro instead.

See commit ec0895f08f99 ("powerpc/vdso32: inline __get_datapage()")

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso64/cacheflush.S   |  9 
 arch/powerpc/kernel/vdso64/datapage.S | 28 +++
 arch/powerpc/kernel/vdso64/gettimeofday.S |  8 +++
 3 files changed, 11 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S 
b/arch/powerpc/kernel/vdso64/cacheflush.S
index 526f5ba2593e..cab14324242b 100644
--- a/arch/powerpc/kernel/vdso64/cacheflush.S
+++ b/arch/powerpc/kernel/vdso64/cacheflush.S
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
.text
@@ -24,14 +25,12 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   mr  r11,r3
-   bl  V_LOCAL_FUNC(__get_datapage)
+   get_datapager10, r0
mtlrr12
-   mr  r10,r3
 
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
addir5,r7,-1
-   andcr6,r11,r5   /* round low to line bdy */
+   andcr6,r3,r5/* round low to line bdy */
subfr8,r6,r4/* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
@@ -48,7 +47,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 
lwz r7,CFG_ICACHE_BLOCKSZ(r10)
addir5,r7,-1
-   andcr6,r11,r5   /* round low to line bdy */
+   andcr6,r3,r5/* round low to line bdy */
subfr8,r6,r4/* compute length */
add r8,r8,r5
lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10)
diff --git a/arch/powerpc/kernel/vdso64/datapage.S 
b/arch/powerpc/kernel/vdso64/datapage.S
index dc84f5ae3802..067247d3efb9 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -10,35 +10,13 @@
 #include 
 #include 
 #include 
+#include 
 
.text
 .global__kernel_datapage_offset;
 __kernel_datapage_offset:
.long   0
 
-V_FUNCTION_BEGIN(__get_datapage)
-  .cfi_startproc
-   /* We don't want that exposed or overridable as we want other objects
-* to be able to bl directly to here
-*/
-   .protected __get_datapage
-   .hidden __get_datapage
-
-   mflrr0
-  .cfi_register lr,r0
-
-   bcl 20,31,data_page_branch
-data_page_branch:
-   mflrr3
-   mtlrr0
-   addir3, r3, __kernel_datapage_offset-data_page_branch
-   lwz r0,0(r3)
-  .cfi_restore lr
-   add r3,r0,r3
-   blr
-  .cfi_endproc
-V_FUNCTION_END(__get_datapage)
-
 /*
  * void *__kernel_get_syscall_map(unsigned int *syscall_count) ;
  *
@@ -53,7 +31,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
mflrr12
   .cfi_register lr,r12
mr  r4,r3
-   bl  V_LOCAL_FUNC(__get_datapage)
+   get_datapager3, r0
mtlrr12
addir3,r3,CFG_SYSCALL_MAP64
cmpldi  cr0,r4,0
@@ -75,7 +53,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   bl  V_LOCAL_FUNC(__get_datapage)
+   get_datapager3, r0
ld  r3,CFG_TB_TICKS_PER_SEC(r3)
mtlrr12
crclr   cr0*4+so
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 1c9a04703250..e54c4ce4d356 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -26,7 +26,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
 
mr  r11,r3  /* r11 holds tv */
mr  r10,r4  /* r10 holds tz */
-   bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
+   get_datapager3, r0
cmpldi  r11,0   /* check if tv is NULL */
beq 2f
lis r7,100@ha   /* load up USEC_PER_SEC */
@@ -71,7 +71,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
mflrr12 /* r12 saves lr */
   .cfi_register lr,r12
mr  r11,r4  /* r11 saves tp */
-   bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
+   get_datapager3, r0
lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
ori r7,r7,NSEC_PER_SEC@l
beq cr5,70f
@@ -188,7 +188,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
 
mflrr12
   .cfi_register lr,r12
-   bl  V_LOCAL_FUNC(__get_datapage)
+   get_datapager3, r0
lwz r5, CLOCK_HRTIMER_RES(r3)
mtlrr12
li  r3,0
@@ -221,7 +221,7 @@ V_FUNCTION_BEGIN(__kernel_time)
   .cfi_register lr,r12
 
mr  r11,r3  /* r11 holds t */
-   bl  V_LOCAL_FUNC(__get_datapage)
+   get_datapage  

[PATCH v8 4/8] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h

2020-04-28 Thread Christophe Leroy
cpu_relax() needs to be in asm/vdso/processor.h to be used by
the generic C VDSO library.

Move it there.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/processor.h  | 10 ++
 arch/powerpc/include/asm/vdso/processor.h | 23 +++
 2 files changed, 25 insertions(+), 8 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index bfa336fbcfeb..8390503c44a2 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -6,6 +6,8 @@
  * Copyright (C) 2001 PPC 64 Team, IBM Corp
  */
 
+#include 
+
 #include 
 
 #ifdef CONFIG_VSX
@@ -63,14 +65,6 @@ extern int _chrp_type;
 
 #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
 
-/* Macros for adjusting thread priority (hardware multi-threading) */
-#define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
-#define HMT_low()   asm volatile("or 1,1,1  # low priority")
-#define HMT_medium_low() asm volatile("or 6,6,6  # medium low priority")
-#define HMT_medium()asm volatile("or 2,2,2  # medium priority")
-#define HMT_medium_high() asm volatile("or 5,5,5  # medium high priority")
-#define HMT_high()  asm volatile("or 3,3,3  # high priority")
-
 #ifdef __KERNEL__
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/vdso/processor.h 
b/arch/powerpc/include/asm/vdso/processor.h
new file mode 100644
index ..39b9beace9ca
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/processor.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_VDSO_PROCESSOR_H
+#define __ASM_VDSO_PROCESSOR_H
+
+#ifndef __ASSEMBLY__
+
+/* Macros for adjusting thread priority (hardware multi-threading) */
+#define HMT_very_low() asm volatile("or 31, 31, 31 # very low 
priority")
+#define HMT_low()  asm volatile("or 1, 1, 1# low priority")
+#define HMT_medium_low()   asm volatile("or 6, 6, 6# medium low 
priority")
+#define HMT_medium()   asm volatile("or 2, 2, 2# medium 
priority")
+#define HMT_medium_high()  asm volatile("or 5, 5, 5# medium high 
priority")
+#define HMT_high() asm volatile("or 3, 3, 3# high 
priority")
+
+#ifdef CONFIG_PPC64
+#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0)
+#else
+#define cpu_relax()barrier()
+#endif
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_VDSO_PROCESSOR_H */
-- 
2.25.0



[PATCH v8 3/8] powerpc/vdso: Remove unused \tmp param in __get_datapage()

2020-04-28 Thread Christophe Leroy
The \tmp param is not used anymore, so remove it.

Signed-off-by: Christophe Leroy 
---
v7: New patch, splitted out of preceding patch
---
 arch/powerpc/include/asm/vdso_datapage.h  | 2 +-
 arch/powerpc/kernel/vdso32/cacheflush.S   | 2 +-
 arch/powerpc/kernel/vdso32/datapage.S | 4 ++--
 arch/powerpc/kernel/vdso32/gettimeofday.S | 8 
 arch/powerpc/kernel/vdso64/cacheflush.S   | 2 +-
 arch/powerpc/kernel/vdso64/datapage.S | 4 ++--
 arch/powerpc/kernel/vdso64/gettimeofday.S | 8 
 7 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso_datapage.h 
b/arch/powerpc/include/asm/vdso_datapage.h
index 11886467dfdf..f3d7e4e2a45b 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -116,7 +116,7 @@ extern struct vdso_data *vdso_data;
 
 #else /* __ASSEMBLY__ */
 
-.macro get_datapage ptr, tmp
+.macro get_datapage ptr
bcl 20, 31, .+4
 999:
mflr\ptr
diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S 
b/arch/powerpc/kernel/vdso32/cacheflush.S
index 3440ddf21c8b..017843bf5382 100644
--- a/arch/powerpc/kernel/vdso32/cacheflush.S
+++ b/arch/powerpc/kernel/vdso32/cacheflush.S
@@ -27,7 +27,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 #ifdef CONFIG_PPC64
mflrr12
   .cfi_register lr,r12
-   get_datapager10, r0
+   get_datapager10
mtlrr12
 #endif
 
diff --git a/arch/powerpc/kernel/vdso32/datapage.S 
b/arch/powerpc/kernel/vdso32/datapage.S
index 5513a4f8253e..0513a2eabec8 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
mflrr12
   .cfi_register lr,r12
mr. r4,r3
-   get_datapager3, r0
+   get_datapager3
mtlrr12
addir3,r3,CFG_SYSCALL_MAP32
beqlr
@@ -49,7 +49,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   get_datapager3, r0
+   get_datapager3
lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3)
lwz r3,CFG_TB_TICKS_PER_SEC(r3)
mtlrr12
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index a3951567118a..0bbdce0f2a9c 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -34,7 +34,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
 
mr. r10,r3  /* r10 saves tv */
mr  r11,r4  /* r11 saves tz */
-   get_datapager9, r0
+   get_datapager9
beq 3f
LOAD_REG_IMMEDIATE(r7, 100) /* load up USEC_PER_SEC */
bl  __do_get_tspec@local/* get sec/usec from tb & kernel */
@@ -79,7 +79,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
mflrr12 /* r12 saves lr */
   .cfi_register lr,r12
mr  r11,r4  /* r11 saves tp */
-   get_datapager9, r0
+   get_datapager9
LOAD_REG_IMMEDIATE(r7, NSEC_PER_SEC)/* load up NSEC_PER_SEC */
beq cr5, .Lcoarse_clocks
 .Lprecise_clocks:
@@ -206,7 +206,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres)
 
mflrr12
   .cfi_register lr,r12
-   get_datapager3, r0
+   get_datapager3
lwz r5, CLOCK_HRTIMER_RES(r3)
mtlrr12
 1: li  r3,0
@@ -240,7 +240,7 @@ V_FUNCTION_BEGIN(__kernel_time)
   .cfi_register lr,r12
 
mr  r11,r3  /* r11 holds t */
-   get_datapager9, r0
+   get_datapager9
 
lwz r3,STAMP_XTIME_SEC+LOPART(r9)
 
diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S 
b/arch/powerpc/kernel/vdso64/cacheflush.S
index cab14324242b..61985de5758f 100644
--- a/arch/powerpc/kernel/vdso64/cacheflush.S
+++ b/arch/powerpc/kernel/vdso64/cacheflush.S
@@ -25,7 +25,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   get_datapager10, r0
+   get_datapager10
mtlrr12
 
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
diff --git a/arch/powerpc/kernel/vdso64/datapage.S 
b/arch/powerpc/kernel/vdso64/datapage.S
index 03bb72c440dc..00760dc69d68 100644
--- a/arch/powerpc/kernel/vdso64/datapage.S
+++ b/arch/powerpc/kernel/vdso64/datapage.S
@@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
mflrr12
   .cfi_register lr,r12
mr  r4,r3
-   get_datapager3, r0
+   get_datapager3
mtlrr12
addir3,r3,CFG_SYSCALL_MAP64
cmpldi  cr0,r4,0
@@ -50,7 +50,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
mflrr12
   .cfi_register lr,r12
-   get_datapager3, r0
+   get_datapager3
ld  r3,CFG_TB_TICKS_PER_SEC(r3)
mtlrr12
crclr   cr0*4+so
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/

[PATCH v8 0/8] powerpc: switch VDSO to C implementation

2020-04-28 Thread Christophe Leroy
This is the seventh version of a series to switch powerpc VDSO to
generic C implementation.

Main changes since v7 are:
- Added gettime64 on PPC32

This series applies on today's powerpc/merge branch.

See the last patches for details on changes and performance.

Christophe Leroy (8):
  powerpc/vdso64: Switch from __get_datapage() to get_datapage inline
macro
  powerpc/vdso: Remove __kernel_datapage_offset and simplify
__get_datapage()
  powerpc/vdso: Remove unused \tmp param in __get_datapage()
  powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
  powerpc/vdso: Prepare for switching VDSO to generic C implementation.
  powerpc/vdso: Switch VDSO to generic C implementation.
  lib/vdso: force inlining of __cvdso_clock_gettime_common()
  powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

 arch/powerpc/Kconfig |   2 +
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/processor.h |  10 +-
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
 arch/powerpc/include/asm/vdso/processor.h|  23 ++
 arch/powerpc/include/asm/vdso/vsyscall.h |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h |  50 ++--
 arch/powerpc/kernel/asm-offsets.c|  49 +--
 arch/powerpc/kernel/time.c   |  91 +-
 arch/powerpc/kernel/vdso.c   |  58 +---
 arch/powerpc/kernel/vdso32/Makefile  |  32 +-
 arch/powerpc/kernel/vdso32/cacheflush.S  |   2 +-
 arch/powerpc/kernel/vdso32/config-fake32.h   |  34 +++
 arch/powerpc/kernel/vdso32/datapage.S|   7 +-
 arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +--
 arch/powerpc/kernel/vdso32/vdso32.lds.S  |   8 +-
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  35 +++
 arch/powerpc/kernel/vdso64/Makefile  |  23 +-
 arch/powerpc/kernel/vdso64/cacheflush.S  |   9 +-
 arch/powerpc/kernel/vdso64/datapage.S|  31 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S| 243 +--
 arch/powerpc/kernel/vdso64/vdso64.lds.S  |   7 +-
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 ++
 lib/vdso/gettimeofday.c  |   2 +-
 25 files changed, 460 insertions(+), 799 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/processor.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

-- 
2.25.0



[PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-04-28 Thread Christophe Leroy
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Modify __get_datapage() to take an offset
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls, in that it
requires setting the CR SO bit, which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set up a stack frame and save the link register before calling
the C vdso function, retrieving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18:   35 25 ff e0 addic.  r9,r5,-32
  1c:   41 80 00 10 blt 2c 
  20:   7c 64 4c 30 srw r4,r3,r9
  24:   38 60 00 00 li  r3,0
...
  2c:   54 69 08 3c rlwinm  r9,r3,1,0,30
  30:   21 45 00 1f subfic  r10,r5,31
  34:   7c 84 2c 30 srw r4,r4,r5
  38:   7d 29 50 30 slw r9,r9,r10
  3c:   7c 63 2c 30 srw r3,r3,r5
  40:   7d 24 23 78 or  r4,r9,r4

In our case the shift is always <= 32. In addition, the upper 32 bits
of the result are likely zero. Letting GCC know this also optimises the
following calculations.

With the patch, we get:
   0:   21 25 00 20 subfic  r9,r5,32
   4:   7c 69 48 30 slw r9,r3,r9
   8:   7c 84 2c 30 srw r4,r4,r5
   c:   7d 24 23 78 or  r4,r9,r4
  10:   7c 63 2c 30 srw r3,r3,r5
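
As an illustration of the above, a rough C sketch of such a helper (hedged:
the exact code in the patch may differ, and it assumes the clocksource
shift is always below 32, as stated above):

#include <linux/compiler.h>	/* __always_inline, likely() */
#include <linux/types.h>	/* u32, u64 */

/*
 * Split the 64-bit value into 32-bit halves so GCC can see that the
 * upper word of the result is usually zero, which yields the short
 * subfic/slw/srw/or sequence shown above instead of the generic
 * 64-bit shift.
 */
static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	u32 hi = ns >> 32;
	u32 lo = ns;

	lo >>= shift;
	lo |= hi << (32 - shift);
	hi >>= shift;

	if (likely(hi == 0))
		return lo;

	return ((u64)hi << 32) | lo;
}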

Signed-off-by: Christophe Leroy 
---
v8:
- New, split out of the last patch of the series
---
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
 arch/powerpc/include/asm/vdso_datapage.h |   6 +-
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 +++
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 +++
 6 files changed, 250 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

diff --git a/arch/powerpc/include/asm/clocksource.h 
b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index ..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include <asm/vdso/clocksource.h>
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h 
b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index ..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES   VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..4452897f9bd8
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager5, VDSO_DATA_OFFSET
+   bl  \funct
+   PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   cmpwi   r3, 0
+   mtlrr0
+  .cfi_restore lr
+   addir1, r1, STACK_FRAME_OVERHEAD
+   crclr   so
+   beqlr+
+   crset   so
+   neg r3, r3
+   blr
+  .cfi_endproc
+.endm
+
+.macro cvdso_call_time funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager4, VDSO_DATA_OFFSET
+   bl  \funct
+   PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   crclr   so
+   mtlrr0
+  .cfi_restore lr
+   addir1, r1, STACK_FRAME_OVERHEAD
+   blr
+  .cfi_endproc
+.endm
+
+#else
+
+#include 
+#i

[PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-04-28 Thread Christophe Leroy
Provides __kernel_clock_gettime64() on vdso32. This is the
64-bit version of __kernel_clock_gettime(), which is
y2038 compliant.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 9 +
 arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 +
 arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cvdso_call __c_kernel_clock_gettime
 V_FUNCTION_END(__kernel_clock_gettime)
 
+/*
+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+   cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
 
 /*
  * Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 6cf729612268..05b462143238 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -144,6 +144,7 @@ VERSION
 #ifndef CONFIG_PPC_BOOK3S_601
__kernel_gettimeofday;
__kernel_clock_gettime;
+   __kernel_clock_gettime64;
__kernel_clock_getres;
__kernel_time;
__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c 
b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0b9ab4c22ef2..f7f71fecf4ed 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct 
old_timespec32 *ts,
return __cvdso_clock_gettime32_data(vd, clock, ts);
 }
 
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,
+  const struct vdso_data *vd)
+{
+   return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
 int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone 
*tz,
const struct vdso_data *vd)
 {
-- 
2.25.0



[PATCH v8 6/8] powerpc/vdso: Switch VDSO to generic C implementation.

2020-04-28 Thread Christophe Leroy
For VDSO32 on PPC64, we create a fake 32 bits config, on the same
principle as MIPS architecture, in order to get the correct parts of
the different asm header files.

With the C VDSO, the performance is slightly lower, but it is worth
it as it will ease maintenance and evolution, and also brings clocks
that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 828 nsec/call
clock-getres-realtime-coarse:vdso: 391 nsec/call
clock-gettime-realtime-coarse:vdso: 614 nsec/call
clock-getres-realtime:vdso: 460 nsec/call
clock-gettime-realtime:vdso: 876 nsec/call
clock-getres-monotonic-coarse:vdso: 399 nsec/call
clock-gettime-monotonic-coarse:vdso: 691 nsec/call
clock-getres-monotonic:vdso: 460 nsec/call
clock-gettime-monotonic:vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 955 nsec/call
clock-getres-realtime-coarse:vdso: 545 nsec/call
clock-gettime-realtime-coarse:vdso: 592 nsec/call
clock-getres-realtime:vdso: 545 nsec/call
clock-gettime-realtime:vdso: 941 nsec/call
clock-getres-monotonic-coarse:vdso: 545 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call
clock-getres-monotonic:vdso: 545 nsec/call
clock-gettime-monotonic:vdso: 940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:
clock-gettime-boottime:vdso: 3851 nsec/call
clock-gettime-tai:vdso: 3852 nsec/call
clock-gettime-monotonic-raw:vdso: 3396 nsec/call

Same clocks with C VDSO:
clock-gettime-tai:vdso: 941 nsec/call
clock-gettime-monotonic-raw:vdso: 1001 nsec/call
clock-gettime-monotonic-coarse:vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:
gettimeofday:vdso: 220 nsec/call
clock-getres-realtime-coarse:vdso: 102 nsec/call
clock-gettime-realtime-coarse:vdso: 178 nsec/call
clock-getres-realtime:vdso: 129 nsec/call
clock-gettime-realtime:vdso: 235 nsec/call
clock-getres-monotonic-coarse:vdso: 105 nsec/call
clock-gettime-monotonic-coarse:vdso: 208 nsec/call
clock-getres-monotonic:vdso: 129 nsec/call
clock-gettime-monotonic:vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:
gettimeofday:vdso: 272 nsec/call
clock-getres-realtime-coarse:vdso: 160 nsec/call
clock-gettime-realtime-coarse:vdso: 184 nsec/call
clock-getres-realtime:vdso: 166 nsec/call
clock-gettime-realtime:vdso: 281 nsec/call
clock-getres-monotonic-coarse:vdso: 160 nsec/call
clock-gettime-monotonic-coarse:vdso: 184 nsec/call
clock-getres-monotonic:vdso: 169 nsec/call
clock-gettime-monotonic:vdso: 275 nsec/call

Signed-off-by: Christophe Leroy 
---
v7:
- Split out preparatory changes in a new preceding patch
- Added -fasynchronous-unwind-tables to CC flags.

v6:
- Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions.
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE
- Rebased on powerpc/merge as of 7 Apr 2020
- Fixed build failure with gcc 9
- Added a patch to create asm/vdso/processor.h and more cpu_relax() in it

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |   2 +
 arch/powerpc/include/asm/vdso/vsyscall.h   |  25 ++
 arch/powerpc/include/asm/vdso_datapage.h   |  40 +--
 arch/powerpc/kernel/asm-offsets.c  |  49 +---
 arch/powerpc/kernel/time.c |  91 +--
 arch/powerpc/kernel/vdso.c |   5 +-
 arch/powerpc/kernel/vdso32/Makefile|  32 ++-
 arch/powerpc/kernel/vdso32/config-fake32.h |  34 +++
 arch/powerpc/kernel/vdso32/gettimeofday.S  | 291 +
 arch/powerpc/kernel/vdso64/Makefile|  23 +-
 arch/powerpc/kernel/vdso64/gettimeofday.S  | 243 +
 11 files changed, 144 insertions(+), 691 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..4e51454a3e7b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+   select GENERIC_GETTIMEOFDAY
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
@@ -197,6 +198,7 @@ config PPC
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200   # 
plugin support on gcc <= 5.1 is buggy on PPC
+   select HAVE_GENERIC_VDSO
select HAVE_HW_BREAKPOINT   if PERF_EVENTS && (PPC_BOOK3S 
|| PPC_8xx)
select HAVE_IDE
select HAVE_IOREMAP_PROT
diff --git a/arch/powerpc/include/asm/vdso/vsyscall.h 
b/arch/powerpc/include/asm/vdso/v

[PATCH v8 7/8] lib/vdso: force inlining of __cvdso_clock_gettime_common()

2020-04-28 Thread Christophe Leroy
When adding gettime64() to a 32-bit architecture (namely powerpc/32),
it has been noticed that GCC no longer inlines
__cvdso_clock_gettime_common() because it is called twice
(once by __cvdso_clock_gettime() and once by
__cvdso_clock_gettime32()).

This has the effect of seriously degrading the performance:

Before the implementation of gettime64(), gettime() runs in:

clock-gettime-monotonic-raw:vdso: 1003 nsec/call
clock-gettime-monotonic-coarse:vdso: 592 nsec/call
clock-gettime-monotonic:vdso: 942 nsec/call

When adding a gettime64() entry point, the standard gettime()
performance is degraded by 30% to 50%:

clock-gettime-monotonic-raw:vdso: 1300 nsec/call
clock-gettime-monotonic-coarse:vdso: 900 nsec/call
clock-gettime-monotonic:vdso: 1232 nsec/call

Adding __always_inline() to __cvdso_clock_gettime_common()
regains the original performance.

In terms of code size, the inlining increases the code size
by only 176 bytes. This is in the noise for a kernel image.

Signed-off-by: Christophe Leroy 
---
 lib/vdso/gettimeofday.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index a2909af4b924..7938d3c4901d 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -210,7 +210,7 @@ static __always_inline int do_coarse(const struct vdso_data 
*vd, clockid_t clk,
return 0;
 }
 
-static __maybe_unused int
+static __always_inline int
 __cvdso_clock_gettime_common(const struct vdso_data *vd, clockid_t clock,
 struct __kernel_timespec *ts)
 {
-- 
2.25.0



[PATCH 1/2] powerpc: Keep .rela* sections when CONFIG_RELOCATABLE is defined

2020-04-28 Thread H.J. Lu
arch/powerpc/kernel/vmlinux.lds.S has

 #ifdef CONFIG_RELOCATABLE
 ...
.rela.dyn : AT(ADDR(.rela.dyn) - LOAD_OFFSET)
{
__rela_dyn_start = .;
*(.rela*)
}
 #endif
 ...
DISCARDS
/DISCARD/ : {
*(*.EMB.apuinfo)
*(.glink .iplt .plt .rela* .comment)
*(.gnu.version*)
*(.gnu.attributes)
*(.eh_frame)
}

Since .rela* sections are needed when CONFIG_RELOCATABLE is defined,
don't discard .rela* sections if CONFIG_RELOCATABLE is defined.

Signed-off-by: H.J. Lu 
Acked-by: Michael Ellerman  (powerpc)
---
 arch/powerpc/kernel/vmlinux.lds.S | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 31a0f201fb6f..4ba07734a210 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -366,9 +366,12 @@ SECTIONS
DISCARDS
/DISCARD/ : {
*(*.EMB.apuinfo)
-   *(.glink .iplt .plt .rela* .comment)
+   *(.glink .iplt .plt .comment)
*(.gnu.version*)
*(.gnu.attributes)
*(.eh_frame)
+#ifndef CONFIG_RELOCATABLE
+   *(.rela*)
+#endif
}
 }
-- 
2.25.4



[PATCH 2/2] Discard .note.gnu.property sections in generic NOTES

2020-04-28 Thread H.J. Lu
With the command-line option, -mx86-used-note=yes, the x86 assembler
in binutils 2.32 and above generates a program property note in a note
section, .note.gnu.property, to encode used x86 ISAs and features.  But
the kernel linker script only contains a single NOTE segment:

PHDRS {
 text PT_LOAD FLAGS(5);
 data PT_LOAD FLAGS(6);
 percpu PT_LOAD FLAGS(6);
 init PT_LOAD FLAGS(7);
 note PT_NOTE FLAGS(0);
}
SECTIONS
{
...
 .notes : AT(ADDR(.notes) - 0x8000) { __start_notes = .; KEEP(*(.note.*)) __stop_notes = .; } :text :note
...
}

The NOTE segment generated by the kernel linker script is aligned to 4 bytes.
But the .note.gnu.property section must be aligned to 8 bytes on x86-64, and
we get

[hjl@gnu-skx-1 linux]$ readelf -n vmlinux

Displaying notes found in: .notes
  OwnerData size Description
  Xen  0x0006 Unknown note type: (0x0006)
   description data: 6c 69 6e 75 78 00
  Xen  0x0004 Unknown note type: (0x0007)
   description data: 32 2e 36 00
  xen-3.0  0x0005 Unknown note type: (0x006e6558)
   description data: 08 00 00 00 03
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x50
readelf: Warning:  type: 0x, namesize: 0x006e6558, descsize:
0x8000, alignment: 8
[hjl@gnu-skx-1 linux]$

Since the .note.gnu.property section in the kernel image is never used, this patch
discards .note.gnu.property sections in the kernel linker script by adding

/DISCARD/ : {
  *(.note.gnu.property)
}

before kernel NOTE segment in generic NOTES.

Signed-off-by: H.J. Lu 
Reviewed-by: Kees Cook 
---
 include/asm-generic/vmlinux.lds.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 71e387a5fe90..95cd678428f4 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -833,7 +833,14 @@
 #define TRACEDATA
 #endif
 
+/*
+ * Discard .note.gnu.property sections which are unused and have
+ * different alignment requirement from kernel note sections.
+ */
 #define NOTES  \
+   /DISCARD/ : {   \
+   *(.note.gnu.property)   \
+   }   \
.notes : AT(ADDR(.notes) - LOAD_OFFSET) {   \
__start_notes = .;  \
KEEP(*(.note.*))\
-- 
2.25.4



Re: [PATCH] powerpc/spufs: Add rcu_read_lock() around fcheck()

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 09:48:11PM +1000, Michael Ellerman wrote:
> 
> This comes from fcheck_files() via fcheck().
> 
> It's pretty clearly documented that fcheck() must be wrapped with
> rcu_read_lock(), so fix it.

But for this to actually be useful you'd need the rcu read lock until
you are done with the file (or got a reference).
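
In other words, the usual shape of such a lookup is something like this
(a hedged sketch with an illustrative helper name, not the actual spufs
code):

#include <linux/fdtable.h>	/* fcheck() */
#include <linux/fs.h>		/* get_file_rcu(), fput() */
#include <linux/rcupdate.h>

/* Look up an fd and take a reference while still inside the RCU section. */
static struct file *lookup_fd_get(unsigned int fd)
{
	struct file *file;

	rcu_read_lock();
	file = fcheck(fd);
	if (file && !get_file_rcu(file))
		file = NULL;	/* raced with the final fput() */
	rcu_read_unlock();

	return file;		/* caller must fput() when done */
}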


Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 08:02:07PM +0800, Jeremy Kerr wrote:
> Currently, we may perform a copy_to_user (through
> simple_read_from_buffer()) while holding a context's register_lock,
> while accessing the context save area.
> 
> This change uses temporary buffers for the context save area data,
> which we then pass to simple_read_from_buffer.
> 
> Signed-off-by: Jeremy Kerr 
> ---
> 
> Christoph - this fixes the copy_to_user while atomic, hopefully with
> only minimal disruption to your series.

I recognize plenty of snippets from that :)  Your patch looks good
to me:

Reviewed-by: Christoph Hellwig 


[PATCH v2 1/2] powerpc/ftrace: Simplify error checking when patching instructions

2020-04-28 Thread Naveen N. Rao
Introduce a macro PATCH_INSN_OR_GOTO() to simplify instruction patching,
and to make the error messages more uniform and useful:
- print an error message that includes the original return value
- print the function name and line numbers, so that the offending
  location is clear
- goto a label which always returns -EPERM, which ftrace_bug() expects
  for proper error handling

Also eliminate use of patch_branch() since most such uses already call
create_branch() for error checking before patching. Instead, use the
return value from create_branch() with PATCH_INSN_OR_GOTO().

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/trace/ftrace.c | 96 +++---
 1 file changed, 60 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c 
b/arch/powerpc/kernel/trace/ftrace.c
index 7ea0ca044b65..63edbd48af42 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -31,6 +31,17 @@
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
+#define PATCH_INSN_OR_GOTO(addr, instr, label)  \
+do {\
+   int rc = patch_instruction((unsigned int *)(addr), instr);   \
+   if (rc) {\
+   pr_err("%s:%d Error patching instruction at 0x%pK (%pS): %d\n", 
\
+   __func__, __LINE__,  \
+   (void *)(addr), (void *)(addr), rc); \
+   goto label;  \
+   }\
+} while (0)
+
 /*
  * We generally only have a single long_branch tramp and at most 2 or 3 plt
  * tramps generated. But, we don't use the plt tramps currently. We also allot
@@ -78,10 +89,12 @@ ftrace_modify_code(unsigned long ip, unsigned int old, 
unsigned int new)
}
 
/* replace the text with the new text */
-   if (patch_instruction((unsigned int *)ip, new))
-   return -EPERM;
+   PATCH_INSN_OR_GOTO(ip, new, patch_err);
 
return 0;
+
+patch_err:
+   return -EPERM;
 }
 
 /*
@@ -204,12 +217,12 @@ __ftrace_make_nop(struct module *mod,
}
 #endif /* CONFIG_MPROFILE_KERNEL */
 
-   if (patch_instruction((unsigned int *)ip, pop)) {
-   pr_err("Patching NOP failed.\n");
-   return -EPERM;
-   }
+   PATCH_INSN_OR_GOTO(ip, pop, patch_err);
 
return 0;
+
+patch_err:
+   return -EPERM;
 }
 
 #else /* !PPC64 */
@@ -276,10 +289,12 @@ __ftrace_make_nop(struct module *mod,
 
op = PPC_INST_NOP;
 
-   if (patch_instruction((unsigned int *)ip, op))
-   return -EPERM;
+   PATCH_INSN_OR_GOTO(ip, op, patch_err);
 
return 0;
+
+patch_err:
+   return -EPERM;
 }
 #endif /* PPC64 */
 #endif /* CONFIG_MODULES */
@@ -322,7 +337,7 @@ static int add_ftrace_tramp(unsigned long tramp)
  */
 static int setup_mcount_compiler_tramp(unsigned long tramp)
 {
-   int i, op;
+   unsigned int i, op;
unsigned long ptr;
static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS];
 
@@ -366,16 +381,14 @@ static int setup_mcount_compiler_tramp(unsigned long 
tramp)
 #else
ptr = ppc_global_function_entry((void *)ftrace_caller);
 #endif
-   if (!create_branch((void *)tramp, ptr, 0)) {
+   op = create_branch((void *)tramp, ptr, 0);
+   if (!op) {
pr_debug("%ps is not reachable from existing mcount tramp\n",
(void *)ptr);
return -1;
}
 
-   if (patch_branch((unsigned int *)tramp, ptr, 0)) {
-   pr_debug("REL24 out of range!\n");
-   return -1;
-   }
+   PATCH_INSN_OR_GOTO(tramp, op, patch_err);
 
if (add_ftrace_tramp(tramp)) {
pr_debug("No tramp locations left\n");
@@ -383,6 +396,9 @@ static int setup_mcount_compiler_tramp(unsigned long tramp)
}
 
return 0;
+
+patch_err:
+   return -EPERM;
 }
 
 static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
@@ -416,12 +432,12 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace 
*rec, unsigned long addr)
}
}
 
-   if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
-   pr_err("Patching NOP failed.\n");
-   return -EPERM;
-   }
+   PATCH_INSN_OR_GOTO(ip, PPC_INST_NOP, patch_err);
 
return 0;
+
+patch_err:
+   return -EPERM;
 }
 
 int ftrace_make_nop(struct module *mod,
@@ -557,17 +573,18 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long 
addr)
}
 
/* Ensure branch is within 24 bits */
-   if (!create_branch(ip, tramp, BRANCH_SET_LINK)) {
+   op[0] = create_branch(ip, tramp, BRANCH_SET_LINK);
+   if (!op[0]) {
pr_err("Branch out of rang

[PATCH v2 0/2] powerpc: Enhance error handling with patch_instruction()

2020-04-28 Thread Naveen N. Rao
Changes in v2:
- Change macro to use 'goto' instead of 'return'
- Rename macro to indicate use of 'goto'
- Convert more patch_instruction() uses in optprobes to use the new 
  macro.
- Drop 1st patch, which added error checking in do_patch_instruction() 
  since that is being covered in a separate patchset.

v1:
http://lkml.kernel.org/r/cover.1587654213.git.naveen.n@linux.vnet.ibm.com


- Naveen

Naveen N. Rao (2):
  powerpc/ftrace: Simplify error checking when patching instructions
  powerpc/kprobes: Check return value of patch_instruction()

 arch/powerpc/kernel/kprobes.c  |  13 +++-
 arch/powerpc/kernel/optprobes.c| 109 +
 arch/powerpc/kernel/trace/ftrace.c |  96 +++--
 3 files changed, 149 insertions(+), 69 deletions(-)


base-commit: 8299da600ad05b8aa0f15ec0f5f03bd40e37d6f0
-- 
2.25.1



[PATCH v2 2/2] powerpc/kprobes: Check return value of patch_instruction()

2020-04-28 Thread Naveen N. Rao
patch_instruction() can fail in some scenarios. Add appropriate error
checking so that such failures are caught and logged, and suitable error
code is returned.

Fixes: d07df82c43be8 ("powerpc/kprobes: Move kprobes over to 
patch_instruction()")
Fixes: f3eca95638931 ("powerpc/kprobes/optprobes: Use patch_instruction()")
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c   |  13 +++-
 arch/powerpc/kernel/optprobes.c | 109 +++-
 2 files changed, 89 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 81efb605113e..9eb1cc05ddd5 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -138,13 +139,21 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
 
 void arch_arm_kprobe(struct kprobe *p)
 {
-   patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
+   int rc = patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
+
+   if (WARN_ON(rc))
+   pr_err("Failed to patch trap at 0x%pK: %d\n",
+   (void *)p->addr, rc);
 }
 NOKPROBE_SYMBOL(arch_arm_kprobe);
 
 void arch_disarm_kprobe(struct kprobe *p)
 {
-   patch_instruction(p->addr, p->opcode);
+   int rc = patch_instruction(p->addr, p->opcode);
+
+   if (WARN_ON(rc))
+   pr_err("Failed to remove trap at 0x%pK: %d\n",
+   (void *)p->addr, rc);
 }
 NOKPROBE_SYMBOL(arch_disarm_kprobe);
 
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 024f7aad1952..20897600db2e 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -139,52 +139,80 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe 
*op)
}
 }
 
+#define PATCH_INSN_OR_GOTO(addr, instr, label)  \
+do {\
+   int rc = patch_instruction((unsigned int *)(addr), instr);   \
+   if (rc) {\
+   pr_err("%s:%d Error patching instruction at 0x%pK (%pS): %d\n", 
\
+   __func__, __LINE__,  \
+   (void *)(addr), (void *)(addr), rc); \
+   goto label;  \
+   }\
+} while (0)
+
 /*
  * emulate_step() requires insn to be emulated as
  * second parameter. Load register 'r4' with the
  * instruction.
  */
-void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
+static int patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
 {
/* addis r4,0,(insn)@h */
-   patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(4) |
- ((val >> 16) & 0xffff));
+   PATCH_INSN_OR_GOTO(addr, PPC_INST_ADDIS | ___PPC_RT(4) |
+ ((val >> 16) & 0xffff),
+ patch_err);
addr++;
 
/* ori r4,r4,(insn)@l */
-   patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(4) |
- ___PPC_RS(4) | (val & 0xffff));
+   PATCH_INSN_OR_GOTO(addr, PPC_INST_ORI | ___PPC_RA(4) |
+ ___PPC_RS(4) | (val & 0xffff),
+ patch_err);
+
+   return 0;
+
+patch_err:
+   return -EFAULT;
 }
 
 /*
  * Generate instructions to load provided immediate 64-bit value
  * to register 'r3' and patch these instructions at 'addr'.
  */
-void patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr)
+static int patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr)
 {
/* lis r3,(op)@highest */
-   patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(3) |
- ((val >> 48) & 0xffff));
+   PATCH_INSN_OR_GOTO(addr, PPC_INST_ADDIS | ___PPC_RT(3) |
+ ((val >> 48) & 0xffff),
+ patch_err);
addr++;
 
/* ori r3,r3,(op)@higher */
-   patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(3) |
- ___PPC_RS(3) | ((val >> 32) & 0xffff));
+   PATCH_INSN_OR_GOTO(addr, PPC_INST_ORI | ___PPC_RA(3) |
+ ___PPC_RS(3) | ((val >> 32) & 0xffff),
+ patch_err);
addr++;
 
/* rldicr r3,r3,32,31 */
-   patch_instruction(addr, PPC_INST_RLDICR | ___PPC_RA(3) |
- ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31));
+   PATCH_INSN_OR_GOTO(addr, PPC_INST_RLDICR | ___PPC_RA(3) |
+ ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31),
+ patch_err);
addr++;
 
/* oris r3,r3,(op)@h */
-   patch_instruction(addr, PPC_INST_ORIS | ___PPC_RA(3) |
-   

Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-04-28 Thread Christophe Leroy

Hi,


On 04/28/2020 01:16 PM, Christophe Leroy wrote:

Provides __kernel_clock_gettime64() on vdso32. This is the
64-bit version of __kernel_clock_gettime(), which is
y2038 compliant.

Signed-off-by: Christophe Leroy 


Why does snowpatch still report upstream failure ? This is fixed in 
latest powerpc/merge which is


commit d9e4eabb775e1ba56d83bd4047de2413a21f87ce
Merge: 54dc28ff5e0b 51184ae37e05
Author: Michael Ellerman 
Date:   Tue Apr 28 09:01:47 2020 +1000

    Automatic merge of branches 'master', 'next' and 'fixes' into merge

Looks like snowpatch is still using

commit 54dc28ff5e0b3585224d49a31b53e030342ca5c3
Merge: 5d23fec1ac1d b2768df24ec4
Author: Michael Ellerman 
Date:   Sun Apr 26 09:44:18 2020 +1000

    Automatic merge of branches 'master', 'next' and 'fixes' into merge

Why ?

Christophe




---
  arch/powerpc/kernel/vdso32/gettimeofday.S  | 9 +
  arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 +
  arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++
  3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S 
b/arch/powerpc/kernel/vdso32/gettimeofday.S
index fd7b01c51281..a6e29f880e0e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cvdso_call __c_kernel_clock_gettime
  V_FUNCTION_END(__kernel_clock_gettime)
  
+/*

+ * Exact prototype of clock_gettime64()
+ *
+ * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts);
+ *
+ */
+V_FUNCTION_BEGIN(__kernel_clock_gettime64)
+   cvdso_call __c_kernel_clock_gettime64
+V_FUNCTION_END(__kernel_clock_gettime64)
  
  /*

   * Exact prototype of clock_getres()
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 6cf729612268..05b462143238 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -144,6 +144,7 @@ VERSION
  #ifndef CONFIG_PPC_BOOK3S_601
__kernel_gettimeofday;
__kernel_clock_gettime;
+   __kernel_clock_gettime64;
__kernel_clock_getres;
__kernel_time;
__kernel_get_tbfreq;
diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c 
b/arch/powerpc/kernel/vdso32/vgettimeofday.c
index 0b9ab4c22ef2..f7f71fecf4ed 100644
--- a/arch/powerpc/kernel/vdso32/vgettimeofday.c
+++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c
@@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct 
old_timespec32 *ts,
return __cvdso_clock_gettime32_data(vd, clock, ts);
  }
  
+int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts,

+  const struct vdso_data *vd)
+{
+   return __cvdso_clock_gettime_data(vd, clock, ts);
+}
+
  int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone 
*tz,
const struct vdso_data *vd)
  {




Re: [PATCH net] ibmvnic: Fall back to 16 H_SEND_SUB_CRQ_INDIRECT entries with old FW

2020-04-28 Thread Thomas Falcon

On 4/27/20 12:33 PM, Juliet Kim wrote:

The maximum entries for H_SEND_SUB_CRQ_INDIRECT has increased on
some platforms from 16 to 128. If Live Partition Mobility is used
to migrate a running OS image from a newer source platform to an
older target platform, then H_SEND_SUB_CRQ_INDIRECT will fail with
H_PARAMETER if 128 entries are queued.

Fix this by falling back to 16 entries if H_PARAMETER is returned
from the hcall().


Thanks for the submission, but I am having a hard time believing that 
this is what is happening since the driver does not support sending 
multiple frames per hypervisor call at this time. Even if it were the 
case, this approach would omit frame data needed by the VF, so the 
second attempt may still fail. Are there system logs available that show 
the driver is attempting to send transmissions with greater than 16 
descriptors?


Thanks,

Tom




Signed-off-by: Juliet Kim 
---
  drivers/net/ethernet/ibm/ibmvnic.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 4bd33245bad6..b66c2f26a427 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1656,6 +1656,17 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, 
struct net_device *netdev)
lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
   (u64)tx_buff->indir_dma,
   (u64)num_entries);
+
+   /* Old firmware accepts max 16 num_entries */
+   if (lpar_rc == H_PARAMETER && num_entries > 16) {
+   tx_crq.v1.n_crq_elem = 16;
+   tx_buff->num_entries = 16;
+   lpar_rc = send_subcrq_indirect(adapter,
+  handle_array[queue_num],
+  (u64)tx_buff->indir_dma,
+  16);
+   }
+
dma_unmap_single(dev, tx_buff->indir_dma,
 sizeof(tx_buff->indir_arr), DMA_TO_DEVICE);
} else {


Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Arnd Bergmann
On Tue, Apr 28, 2020 at 2:05 PM Jeremy Kerr  wrote:
>
> Currently, we may perform a copy_to_user (through
> simple_read_from_buffer()) while holding a context's register_lock,
> while accessing the context save area.
>
> This change uses temporary buffers for the context save area data,
> which we then pass to simple_read_from_buffer.
>
> Signed-off-by: Jeremy Kerr 
> ---

Thanks for fixing this!

I wonder how far it should be backported, given that this has been broken for
14 years now.

Fixes: bf1ab978be23 ("[POWERPC] coredump: Add SPU elf notes to coredump.")
Reviewed-by: Arnd Bergmann 
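
For readers not familiar with the driver, the shape of the fix is roughly
this (an illustrative sketch only; it omits the spu_acquire_saved() /
spu_release_saved() handling and uses a made-up function name):

/* In the context of arch/powerpc/platforms/cell/spufs/file.c */
static ssize_t example_dma_info_read(struct spu_context *ctx,
				     char __user *buf, size_t len, loff_t *pos)
{
	struct spu_dma_info info;

	/* Fill a kernel buffer while holding the register_lock ... */
	spin_lock(&ctx->csa.register_lock);
	___spufs_dma_info_read(ctx, &info);
	spin_unlock(&ctx->csa.register_lock);

	/* ... and only copy to userspace (which may fault and sleep) afterwards. */
	return simple_read_from_buffer(buf, len, pos, &info, sizeof(info));
}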


Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32

2020-04-28 Thread Arnd Bergmann
On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy
 wrote:
>
> Provides __kernel_clock_gettime64() on vdso32. This is the
> 64-bit version of __kernel_clock_gettime(), which is
> y2038 compliant.
>
> Signed-off-by: Christophe Leroy 

Looks good to me

Reviewed-by: Arnd Bergmann 

There was a bug on ARM for the corresponding function; so far it is unclear
if this was a problem related to particular hardware, the 32-bit kernel code,
or the common implementation of clock_gettime64 in the vdso library,
see https://github.com/richfelker/musl-cross-make/issues/96

Just to be sure that powerpc is not affected by the same issue, can you
confirm that repeatedly calling clock_gettime64 on powerpc32, alternating
between vdso and syscall, results in monotonically increasing times?
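
A rough userspace check along those lines could look like this (a sketch
only: it assumes __NR_clock_gettime64 is 403 on powerpc32 and that the
libc's clock_gettime() goes through the new vDSO __kernel_clock_gettime64,
e.g. with a time64 musl or glibc build):

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_clock_gettime64
#define __NR_clock_gettime64 403	/* 32-bit time64 syscall number */
#endif

struct ts64 { long long tv_sec, tv_nsec; };

int main(void)
{
	struct ts64 prev = { 0, 0 }, now;
	struct timespec ts;

	for (long i = 0; i < 1000000; i++) {
		if (i & 1) {
			/* vDSO path (assuming a time64 libc) */
			clock_gettime(CLOCK_MONOTONIC, &ts);
			now.tv_sec = ts.tv_sec;
			now.tv_nsec = ts.tv_nsec;
		} else {
			/* forced syscall path */
			syscall(__NR_clock_gettime64, CLOCK_MONOTONIC, &now);
		}

		if (now.tv_sec < prev.tv_sec ||
		    (now.tv_sec == prev.tv_sec && now.tv_nsec < prev.tv_nsec)) {
			fprintf(stderr, "time went backwards at iteration %ld\n", i);
			return 1;
		}
		prev = now;
	}
	puts("monotonic");
	return 0;
}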

   Arnd


Re: [RFC PATCH v2 7/7] powerpc/selftest: reuse ppc-opcode macros to avoid redundancy

2020-04-28 Thread Naveen N. Rao

Balamuruhan S wrote:

Avoid redefining macros to encode ppc instructions; instead, reuse them from
ppc-opcode.h. Makefile changes are necessary to compile memcmp_64.S with
__ASSEMBLY__ defined from selftests.

Signed-off-by: Balamuruhan S 
---
 .../selftests/powerpc/stringloops/Makefile| 34 ++
 .../powerpc/stringloops/asm/asm-const.h   |  1 +
 .../powerpc/stringloops/asm/ppc-opcode.h  | 36 +--
 3 files changed, 29 insertions(+), 42 deletions(-)
 create mode 120000 tools/testing/selftests/powerpc/stringloops/asm/asm-const.h
 mode change 100644 => 120000 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h

diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile 
b/tools/testing/selftests/powerpc/stringloops/Makefile
index 7fc0623d85c3..efe76c5a5b94 100644
--- a/tools/testing/selftests/powerpc/stringloops/Makefile
+++ b/tools/testing/selftests/powerpc/stringloops/Makefile
@@ -1,26 +1,44 @@
 # SPDX-License-Identifier: GPL-2.0
 # The loops are all 64-bit code
-CFLAGS += -I$(CURDIR)
+GIT_VERSION = $(shell git describe --always --long --dirty || echo "unknown")
+CFLAGS += -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CURDIR) -I$(CURDIR)/../include


I'm not sure why there are some extra flags added here. Not sure if it 
is required for this change, or if they are separate changes. Trying to 
build this from 'tools/testing/selftests/powerpc/' with 'make 
stringloops' shows slightly different compile flags being used.


- Naveen



Re: [RFC PATCH v2 0/7] consolidate PowerPC instruction encoding macros

2020-04-28 Thread Naveen N. Rao

Balamuruhan S wrote:

ppc-opcode.h has the base instruction encodings wrapped with stringify_in_c()
for compatibility with raw encoding. But there are redundant macros for
base instruction encodings in bpf, the instruction emulation test infrastructure
and the powerpc selftests.

Currently, PPC_INST_* macros are used for encoding instruction opcodes and PPC_*
for raw instruction encoding; this rfc patchset introduces PPC_RAW_* macros for
base instruction encoding and reuses them elsewhere. With this change we can
avoid redundant macro definitions in multiple files and start adding new
instructions in ppc-opcode.h in the future.

Changes in v2:
-
Fix review comments/suggestions from Naveen and Michael Ellerman,

* Rename PPC_ENCODE_* to PPC_RAW_* for base instruction encoding macros.
* Split the patches that does mass renaming and make them simpler that just
  adds new macros.
* Keep the patch to update all the existing names later (patch 6).
* Lot of PPC_INST_* macros are used only in ppc-opcode.h for PPC_*  macros,
  fold PPC_INST_* encoding into PPC_RAW_* to avoid using them accidentally.
* Fixed clipped macros that were due to a typo/copy-paste
* Consolidated all the instruction encoding macros from bpf_jit.h to
  ppc-opcode.h
* squashed patch that removes the duplicate macro PPC_MR() in bpf_jit.h
* merge few changes in bpf_jit files from patch 2 into patch 3
* few fixes in powerpc selftest stringloops Makefile
* build tested for ppc64le_defconfig, ppc64e_defconfig and pmac32_defconfig
* Rebased on next branch of linuxppc tree

Testing:
---
* Tested it by compiling vmlinux and comparing its objdump with and without
  the patchset, and observed that it remains the same:

  # diff vmlinux_objdump vmlinux_rfc_objdump 
  2c2

  < vmlinux: file format elf64-powerpcle
  ---
  > vmlinux_rfc: file format elf64-powerpcle

* Tested building it with these changes for the Fedora30 config, booted a VM
  with powerpc next and powerpc next + patchset to run the powerpc selftests and
  ftrace selftests. There were a couple of failures that were common, and the
  patchset did not introduce any new failures.

  ftrace selftest:
  ---
# # of passed:  96
# # of failed:  1
# # of unresolved:  7
# # of untested:  0
# # of unsupported:  1
# # of xfailed:  1
# # of undefined(test bug):  0
not ok 1 selftests: ftrace: ftracetest # exit=1

  powerpc selftest:
  
not ok 7 selftests: powerpc/dscr: dscr_sysfs_thread_test # exit=1
not ok 20 selftests: powerpc/pmu/ebb: lost_exception_test # TIMEOUT
not ok 2 selftests: powerpc/security: spectre_v2 # exit=1

Thanks to Naveen, Sandipan and Michael on overall suggestions/improvements.

I would request review and suggestions to make it better.

v1: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206494.html

Balamuruhan S (7):
  powerpc/ppc-opcode: introduce PPC_RAW_* macros for base instruction
encoding
  powerpc/ppc-opcode: move ppc instruction encoding from
test_emulate_step
  powerpc/bpf_jit: reuse instruction macros from ppc-opcode.h
  powerpc/ppc-opcode: consolidate powerpc instructions from bpf_jit.h
  powerpc/ppc-opcode: reuse raw instruction macros to stringify
  powerpc/ppc-opcode: fold PPC_INST_* macros into PPC_RAW_* macros
  powerpc/selftest: reuse ppc-opcode macros to avoid redundancy

 arch/powerpc/include/asm/ppc-opcode.h | 706 +++---
 arch/powerpc/lib/test_emulate_step.c  | 155 ++--
 arch/powerpc/net/bpf_jit.h| 184 +
 arch/powerpc/net/bpf_jit32.h  |  34 +-
 arch/powerpc/net/bpf_jit64.h  |  16 +-
 arch/powerpc/net/bpf_jit_comp.c   | 134 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 298 
 .../selftests/powerpc/stringloops/Makefile|  34 +-
 .../powerpc/stringloops/asm/asm-const.h   |   1 +
 .../powerpc/stringloops/asm/ppc-opcode.h  |  36 +-
 10 files changed, 762 insertions(+), 836 deletions(-)
 create mode 120000 tools/testing/selftests/powerpc/stringloops/asm/asm-const.h
 mode change 100644 => 120000 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h


This LGTM. Except the last patch:
Acked-by: Naveen N. Rao 
Tested-by: Naveen N. Rao 


Thanks!
- Naveen



Re: [RFC PATCH v2 4/7] powerpc/ppc-opcode: consolidate powerpc instructions from bpf_jit.h

2020-04-28 Thread Naveen N. Rao

Balamuruhan S wrote:

move macro definitions of powerpc instructions from bpf_jit.h to ppc-opcode.h
and adapt the users of the macros accordingly. `PPC_MR()` is defined twice in
bpf_jit.h; remove the duplicate one.

Signed-off-by: Balamuruhan S 
---
 arch/powerpc/include/asm/ppc-opcode.h | 139 +
 arch/powerpc/net/bpf_jit.h| 166 ++-
 arch/powerpc/net/bpf_jit32.h  |  24 +--
 arch/powerpc/net/bpf_jit64.h  |  12 +-
 arch/powerpc/net/bpf_jit_comp.c   | 132 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 278 +-
 6 files changed, 378 insertions(+), 373 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 2ae0afc5c2bb..6b9a891884be 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -79,6 +79,16 @@
 #define IMM_L(i)   ((uintptr_t)(i) & 0xffff)
 #define IMM_DS(i)  ((uintptr_t)(i) & 0xfffc)

+/*
+ * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
+ * (e.g. LD, ADDI).  If the bottom 16 bits is "-ve", add another bit into the
+ * top half to negate the effect (i.e. 0xffff + 1 = 0x(1)0000).
+ */
+#define IMM_H(i)((uintptr_t)(i)>>16)
+#define IMM_HA(i)   (((uintptr_t)(i)>>16) +   \
+   (((uintptr_t)(i) & 0x8000) >> 15))
+


Not needed for this patch series, but at some point, we should move over 
to using the PPC_LO(), PPC_HI() and PPC_HA() macros that are defined 
later in this file.
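
As a worked example of why HA() adds one when bit 15 of the low half is
set (the address below is made up purely for illustration):

	addr         = 0x12348765
	IMM_L(addr)  = 0x8765	/* sign-extends to 0xffff8765 in addi/ld */
	IMM_H(addr)  = 0x1234
	IMM_HA(addr) = 0x1235	/* high half + 1 cancels the sign extension */

	(0x1235 << 16) + (s16)0x8765 = 0x12350000 - 0x789b = 0x12348765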



- Naveen



Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Christoph Hellwig
FYI, these little hunks reduce the difference to my version, maybe
you can fold them in?

diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index c62d77ddaf7d3..1861436a6091d 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -2107,7 +2107,6 @@ static const struct file_operations spufs_wbox_info_fops 
= {
 static void ___spufs_dma_info_read(struct spu_context *ctx,
struct spu_dma_info *info)
 {
-   struct mfc_cq_sr *qp, *spuqp;
int i;
 
info->dma_info_type = ctx->csa.priv2.spu_tag_status_query_RW;
@@ -2116,8 +2115,8 @@ static void ___spufs_dma_info_read(struct spu_context 
*ctx,
info->dma_info_stall_and_notify = ctx->csa.spu_chnldata_RW[25];
info->dma_info_atomic_command_status = ctx->csa.spu_chnldata_RW[27];
for (i = 0; i < 16; i++) {
-   qp = &info->dma_info_command_data[i];
-   spuqp = &ctx->csa.priv2.spuq[i];
+   struct mfc_cq_sr *qp = &info->dma_info_command_data[i];
+   struct mfc_cq_sr *spuqp = &ctx->csa.priv2.spuq[i];
 
qp->mfc_cq_data0_RW = spuqp->mfc_cq_data0_RW;
qp->mfc_cq_data1_RW = spuqp->mfc_cq_data1_RW;
@@ -2166,7 +2165,7 @@ static const struct file_operations spufs_dma_info_fops = 
{
 };
 
 static void ___spufs_proxydma_info_read(struct spu_context *ctx,
-   struct spu_proxydma_info *info)
+   struct spu_proxydma_info *info)
 {
int i;
 


Re: [PATCH v3 00/29] Convert files to ReST - part 2

2020-04-28 Thread Jonathan Corbet
On Mon, 27 Apr 2020 23:16:52 +0200
Mauro Carvalho Chehab  wrote:

> This is the second part of a series I wrote sometime ago where I manually
> convert lots of files to be properly parsed by Sphinx as ReST files.
> 
> As it touches on lot of stuff, this series is based on today's linux-next, 
> at tag next-20190617.
> 
> The first version of this series had 57 patches. The first part with 28 
> patches
> were already merged. Right now, there are still ~76  patches pending applying
> (including this series), and that's because I opted to do ~1 patch per 
> converted
>  directory.
> 
> That sounds too much to be send on a single round. So, I'm opting to split
> it on 3 parts for the conversion, plus a final patch adding orphaned books
> to existing ones. 
> 
> Those patches should probably be good to be merged either by subsystem
> maintainers or via the docs tree.

So I'm happy to merge this set, but there is one thing that worries me a
bit... 

>  fs/cachefiles/Kconfig |4 +-
>  fs/coda/Kconfig   |2 +-
>  fs/configfs/inode.c   |2 +-
>  fs/configfs/item.c|2 +-
>  fs/fscache/Kconfig|8 +-
>  fs/fscache/cache.c|8 +-
>  fs/fscache/cookie.c   |2 +-
>  fs/fscache/object.c   |4 +-
>  fs/fscache/operation.c|2 +-
>  fs/locks.c|2 +-
>  include/linux/configfs.h  |2 +-
>  include/linux/fs_context.h|2 +-
>  include/linux/fscache-cache.h |4 +-
>  include/linux/fscache.h   |   42 +-
>  include/linux/lsm_hooks.h |2 +-

I'd feel a bit better if I could get an ack or two from filesystem folks
before I venture that far out of my own yard...what say you all?

Thanks,

jon


[PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Arnd Bergmann
I think I found a way to improve the x32 handling:

This is a simplification over Christoph's "[PATCH 2/7] signal: factor
copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the
x32 specifics in the common code to a single #ifdef/#endif check, in
order to keep it more readable for everyone else.

Christoph, if you like it, please fold into your patch.

Cc: Andrew Morton 
Cc: Alexander Viro 
Cc: Jeremy Kerr 
Cc: Arnd Bergmann 
Cc: "Eric W . Biederman" 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Arnd Bergmann 
---
 arch/x86/include/asm/compat.h | 10 ++
 arch/x86/kernel/signal.c  | 23 +++
 kernel/signal.c   | 15 ++-
 3 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index 52e9f3480f69..9341ea3da757 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -214,7 +214,17 @@ static inline bool in_compat_syscall(void)
 #endif
 
 struct compat_siginfo;
+#ifdef CONFIG_X86_X32_ABI
 int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
const kernel_siginfo_t *from, bool x32_ABI);
+#else
+int copy_siginfo_to_user32(struct compat_siginfo __user *to,
+   const kernel_siginfo_t *from);
+static inline int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
+   const kernel_siginfo_t *from, bool x32_ABI)
+{
+   return copy_siginfo_to_user32(to, from);
+}
+#endif
 
 #endif /* _ASM_X86_COMPAT_H */
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 83b74fb38c8f..0e166571d28b 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -511,6 +511,29 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 }
 #endif /* CONFIG_X86_32 */
 
+#ifdef CONFIG_X86_X32_ABI
+int copy_siginfo_to_user32(struct compat_siginfo __user *to,
+  const struct kernel_siginfo *from)
+{
+   return __copy_siginfo_to_user32(to, from, in_x32_syscall());
+}
+
+int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
+const struct kernel_siginfo *from, bool x32_ABI)
+{
+   struct compat_siginfo new;
+
+   copy_siginfo_to_external32(&new, from);
+   if (x32_ABI && from->si_signo == SIGCHLD) {
+   new._sifields._sigchld_x32._utime = from->si_utime;
+   new._sifields._sigchld_x32._stime = from->si_stime;
+   }
+   if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
+   return -EFAULT;
+   return 0;
+}
+#endif
+
 static int x32_setup_rt_frame(struct ksignal *ksig,
  compat_sigset_t *set,
  struct pt_regs *regs)
diff --git a/kernel/signal.c b/kernel/signal.c
index 1a81602050b4..935facca4860 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3318,29 +3318,18 @@ void copy_siginfo_to_external32(struct compat_siginfo 
*to,
}
 }
 
+#ifndef CONFIG_X86_X32_ABI
 int copy_siginfo_to_user32(struct compat_siginfo __user *to,
   const struct kernel_siginfo *from)
-#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION)
-{
-   return __copy_siginfo_to_user32(to, from, in_x32_syscall());
-}
-int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
-const struct kernel_siginfo *from, bool x32_ABI)
-#endif
 {
struct compat_siginfo new;
 
copy_siginfo_to_external32(&new, from);
-#ifdef CONFIG_X86_X32_ABI
-   if (x32_ABI && from->si_signo == SIGCHLD) {
-   new._sifields._sigchld_x32._utime = from->si_utime;
-   new._sifields._sigchld_x32._stime = from->si_stime;
-   }
-#endif
if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
return -EFAULT;
return 0;
 }
+#endif
 
 static int post_copy_siginfo_from_user32(kernel_siginfo_t *to,
 const struct compat_siginfo *from)
-- 
2.26.0



[PATCH v4 1/4] hugetlbfs: add arch_hugetlb_valid_size

2020-04-28 Thread Mike Kravetz
The architecture independent routine hugetlb_default_setup sets up
the default huge pages size.  It has no way to verify if the passed
value is valid, so it accepts it and attempts to validate at a later
time.  This requires undocumented cooperation between the arch specific
and arch independent code.

For architectures that support more than one huge page size, provide
a routine arch_hugetlb_valid_size to validate a huge page size.
hugetlb_default_setup can use this to validate passed values.

arch_hugetlb_valid_size will also be used in a subsequent patch to
move processing of the "hugepagesz=" in arch specific code to a common
routine in arch independent code.

Signed-off-by: Mike Kravetz 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
---
 arch/arm64/mm/hugetlbpage.c   | 17 +
 arch/powerpc/mm/hugetlbpage.c | 20 +---
 arch/riscv/mm/hugetlbpage.c   | 26 +-
 arch/s390/mm/hugetlbpage.c| 16 
 arch/sparc/mm/init_64.c   | 24 
 arch/x86/mm/hugetlbpage.c | 17 +
 include/linux/hugetlb.h   |  1 +
 mm/hugetlb.c  | 21 ++---
 8 files changed, 103 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index bbeb6a5a6ba6..069b96ee2aec 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -462,17 +462,26 @@ static int __init hugetlbpage_init(void)
 }
 arch_initcall(hugetlbpage_init);
 
-static __init int setup_hugepagesz(char *opt)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
-   unsigned long ps = memparse(opt, &opt);
-
-   switch (ps) {
+   switch (size) {
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
case CONT_PMD_SIZE:
case PMD_SIZE:
case CONT_PTE_SIZE:
+   return true;
+   }
+
+   return false;
+}
+
+static __init int setup_hugepagesz(char *opt)
+{
+   unsigned long ps = memparse(opt, &opt);
+
+   if (arch_hugetlb_valid_size(ps)) {
add_huge_page_size(ps);
return 1;
}
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..de54d2a37830 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
return vma_kernel_pagesize(vma);
 }
 
-static int __init add_huge_page_size(unsigned long long size)
+bool __init arch_hugetlb_valid_size(unsigned long size)
 {
int shift = __ffs(size);
int mmu_psize;
@@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long 
size)
/* Check that it is a page size supported by the hardware and
 * that it fits within pagetable and slice limits. */
if (size <= PAGE_SIZE || !is_power_of_2(size))
-   return -EINVAL;
+   return false;
 
mmu_psize = check_and_get_huge_psize(shift);
if (mmu_psize < 0)
-   return -EINVAL;
+   return false;
 
BUG_ON(mmu_psize_defs[mmu_psize].shift != shift);
 
-   /* Return if huge page size has already been setup */
-   if (size_to_hstate(size))
-   return 0;
+   return true;
+}
 
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+static int __init add_huge_page_size(unsigned long long size)
+{
+   int shift = __ffs(size);
+
+   if (!arch_hugetlb_valid_size((unsigned long)size))
+   return -EINVAL;
 
+   if (!size_to_hstate(size))
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index a6189ed36c5f..da1f516bc451 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -12,21 +12,29 @@ int pmd_huge(pmd_t pmd)
return pmd_leaf(pmd);
 }
 
+bool __init arch_hugetlb_valid_size(unsigned long size)
+{
+   if (size == HPAGE_SIZE)
+   return true;
+   else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE)
+   return true;
+   else
+   return false;
+}
+
 static __init int setup_hugepagesz(char *opt)
 {
unsigned long ps = memparse(opt, &opt);
 
-   if (ps == HPAGE_SIZE) {
-   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
-   } else if (IS_ENABLED(CONFIG_64BIT) && ps == PUD_SIZE) {
-   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
+   if (arch_hugetlb_valid_size(ps)) {
+   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
+   return 1;
}
 
-   return 1;
+   hugetlb_bad_size();
+   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
+   return 0;
+
 }
 __setup("hugepagesz

[PATCH v4 0/4] Clean up hugetlb boot command line processing

2020-04-28 Thread Mike Kravetz
v4 -
   Fixed huge page order definitions for arm64 (Qian Cai)
   Removed hugepages_supported() checks in command line processing as
 powerpc does not set hugepages_supported until later in boot (Sandipan)
   Added Acks, Reviews and Tested (Will, Gerald, Anders, Sandipan)

v3 -
   Used weak attribute method of defining arch_hugetlb_valid_size.
 This eliminates changes to arch specific hugetlb.h files (Peter)
   Updated documentation (Peter, Randy)
   Fixed handling of implicitly specified gigantic page preallocation
 in existing code and removed documentation of such.  There is now
 no difference between handling of gigantic and non-gigantic pages.
 (Peter, Nitesh).
 This requires the most review as there is a small change to
 undocumented behavior.  See patch 4 commit message for details.
   Added Acks and Reviews (Mina, Peter)

v2 -
   Fix build errors with patch 1 (Will)
   Change arch_hugetlb_valid_size arg to unsigned long and remove
 irrelevant 'extern' keyword (Christophe)
   Documentation and other misc changes (Randy, Christophe, Mina)
   Do not process command line options if !hugepages_supported()
  (Dave, but it sounds like we may want additional changes to
  hugepages_supported() for x86?  If that is needed I would prefer
  a separate patch.)

Longpeng(Mike) reported a weird message from hugetlb command line processing
and proposed a solution [1].  While the proposed patch does address the
specific issue, there are other related issues in command line processing.
As hugetlbfs evolved, updates to command line processing have been made to
meet immediate needs and not necessarily in a coordinated manner.  The result
is that some processing is done in arch specific code, some is done in arch
independent code and coordination is problematic.  Semantics can vary between
architectures.

The patch series does the following:
- Define arch specific arch_hugetlb_valid_size routine used to validate
  passed huge page sizes.
- Move hugepagesz= command line parsing out of arch specific code and into
  an arch independent routine.
- Clean up command line processing to follow desired semantics and
  document those semantics.

[1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com

Mike Kravetz (4):
  hugetlbfs: add arch_hugetlb_valid_size
  hugetlbfs: move hugepagesz= parsing to arch independent code
  hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
  hugetlbfs: clean up command line processing

 .../admin-guide/kernel-parameters.txt |  40 ++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 arch/arm64/mm/hugetlbpage.c   |  30 +--
 arch/powerpc/mm/hugetlbpage.c |  30 +--
 arch/riscv/mm/hugetlbpage.c   |  24 +--
 arch/s390/mm/hugetlbpage.c|  24 +--
 arch/sparc/mm/init_64.c   |  43 +
 arch/x86/mm/hugetlbpage.c |  23 +--
 include/linux/hugetlb.h   |   2 +-
 mm/hugetlb.c  | 180 ++
 10 files changed, 260 insertions(+), 171 deletions(-)

-- 
2.25.4



[PATCH v4 4/4] hugetlbfs: clean up command line processing

2020-04-28 Thread Mike Kravetz
With all hugetlb command line processing now done in a single file, clean up the code.
- Make code match desired semantics
  - Update documentation with semantics
- Make all warnings and error messages start with 'HugeTLB:'.
- Consistently name command line parsing routines.
- Warn if !hugepages_supported() and command line parameters have
  been specified.
- Add comments to code
  - Describe some of the subtle interactions
  - Describe semantics of command line arguments

This patch also fixes issues with implicitly setting the number of
gigantic huge pages to preallocate.  Previously on X86 command line,
hugepages=2 default_hugepagesz=1G
would result in zero 1G pages being preallocated and,
# grep HugePages_Total /proc/meminfo
HugePages_Total:   0
# sysctl -a | grep nr_hugepages
vm.nr_hugepages = 2
vm.nr_hugepages_mempolicy = 2
# cat /proc/sys/vm/nr_hugepages
2
After this patch 2 gigantic pages will be preallocated and all the
proc, sysfs, sysctl and meminfo files will accurately reflect this.

To address the issue with gigantic pages, a small change in behavior
was made to command line processing.  Previously the command line,
hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256
would result in the allocation of 256 2M huge pages.  The value 128
would be ignored without any warning.  After this patch, 128 2M pages
will be allocated and a warning message will be displayed indicating
the value of 256 is ignored.  This change in behavior is required
because allocation of implicitly specified gigantic pages must be done
at the time default_hugepagesz= is encountered.  Previously the code
waited until later in the boot process (hugetlb_init) to allocate pages
of the default size.  However, the bootmem allocator required
for gigantic allocations is not available at this time.
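
The resulting default_hugepagesz= handling looks roughly like this (a sketch
using the identifiers from mm/hugetlb.c, not a quote of the hunk below):

	static int __init default_hugepagesz_setup(char *s)
	{
		unsigned long size = memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			pr_err("HugeTLB: unsupported default_hugepagesz=%s\n", s);
			return 0;
		}

		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
		default_hstate_idx = hstate_index(size_to_hstate(size));

		/*
		 * A preceding "hugepages=" applies to the default size.  If
		 * that size turns out to be gigantic, the pages must be
		 * allocated right here, while the bootmem allocator is still
		 * usable.
		 */
		if (default_hstate_max_huge_pages) {
			default_hstate.max_huge_pages = default_hstate_max_huge_pages;
			if (hstate_is_gigantic(&default_hstate))
				hugetlb_hstate_alloc_pages(&default_hstate);
			default_hstate_max_huge_pages = 0;
		}

		return 1;
	}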

Signed-off-by: Mike Kravetz 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
Tested-by: Sandipan Das 
---
 .../admin-guide/kernel-parameters.txt |  40 +++--
 Documentation/admin-guide/mm/hugetlbpage.rst  |  35 
 mm/hugetlb.c  | 149 ++
 3 files changed, 179 insertions(+), 45 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..cbe657b86d0e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -834,12 +834,15 @@
See also Documentation/networking/decnet.txt.
 
default_hugepagesz=
-   [same as hugepagesz=] The size of the default
-   HugeTLB page size. This is the size represented by
-   the legacy /proc/ hugepages APIs, used for SHM, and
-   default size when mounting hugetlbfs filesystems.
-   Defaults to the default architecture's huge page size
-   if not specified.
+   [HW] The size of the default HugeTLB page. This is
+   the size represented by the legacy /proc/ hugepages
+   APIs.  In addition, this is the default hugetlb size
+   used for shmget(), mmap() and mounting hugetlbfs
+   filesystems.  If not specified, defaults to the
+   architecture's default huge page size.  Huge page
+   sizes are architecture dependent.  See also
+   Documentation/admin-guide/mm/hugetlbpage.rst.
+   Format: size[KMG]
 
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
@@ -1479,13 +1482,24 @@
hugepages using the cma allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
 
-   hugepages=  [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-   hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-   On x86-64 and powerpc, this option can be specified
-   multiple times interleaved with hugepages= to reserve
-   huge pages of different sizes. Valid pages sizes on
-   x86-64 are 2M (when the CPU supports "pse") and 1G
-   (when the CPU supports the "pdpe1gb" cpuinfo flag).
+   hugepages=  [HW] Number of HugeTLB pages to allocate at boot.
+   If this follows hugepagesz (below), it specifies
+   the number of pages of hugepagesz to be allocated.
+   If this is the first HugeTLB parameter on the command
+   line, it specifies the number of pages to allocate for
+   the default huge page size.  See also
+   Documentation/admin-guide/mm/hugetlb

[PATCH v4 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

2020-04-28 Thread Mike Kravetz
The routine hugetlb_add_hstate prints a warning if the hstate already
exists.  This was originally done as part of kernel command line
parsing.  If 'hugepagesz=' was specified more than once, the warning
pr_warn("hugepagesz= specified twice, ignoring\n");
would be printed.

Some architectures want to enable all huge page sizes.  They would
call hugetlb_add_hstate for all supported sizes.  However, this was
done after command line processing and as a result hstates could have
already been created for some sizes. To make sure no warnings were
printed, there would often be code like:
if (!size_to_hstate(size))
	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);

The only time we want to print the warning is as the result of command
line processing.  So, remove the warning from hugetlb_add_hstate and
add it to the single arch independent routine processing "hugepagesz=".
After this, calls to size_to_hstate() in arch specific code can be
removed and hugetlb_add_hstate can be called without worrying about
warning messages.
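
In the hugepagesz= handler this ends up as roughly (a sketch, not a quote of
the mm/hugetlb.c hunk):

	if (size_to_hstate(size)) {
		pr_warn("HugeTLB: hugepagesz=%s specified twice, ignoring\n", s);
		return 0;
	}

	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);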

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
Tested-by: Anders Roxell 
---
 arch/arm64/mm/hugetlbpage.c   | 16 
 arch/powerpc/mm/hugetlbpage.c |  3 +--
 arch/riscv/mm/hugetlbpage.c   |  2 +-
 arch/sparc/mm/init_64.c   | 19 ---
 arch/x86/mm/hugetlbpage.c |  2 +-
 mm/hugetlb.c  |  9 ++---
 6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f706b821aba6..14bed8f4674a 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
 }
 
-static void __init add_huge_page_size(unsigned long size)
-{
-   if (size_to_hstate(size))
-   return;
-
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-}
-
 static int __init hugetlbpage_init(void)
 {
 #ifdef CONFIG_ARM64_4K_PAGES
-   add_huge_page_size(PUD_SIZE);
+   hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
 #endif
-   add_huge_page_size(CONT_PMD_SIZE);
-   add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(CONT_PTE_SIZE);
+   hugetlb_add_hstate((CONT_PMD_SHIFT + PMD_SHIFT) - PAGE_SHIFT);
+   hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate((CONT_PTE_SHIFT + PAGE_SHIFT) - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 2c3fa0a7787b..4d5ed1093615 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long 
size)
if (!arch_hugetlb_valid_size((unsigned long)size))
return -EINVAL;
 
-   if (!size_to_hstate(size))
-   hugetlb_add_hstate(shift - PAGE_SHIFT);
+   hugetlb_add_hstate(shift - PAGE_SHIFT);
return 0;
 }
 
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 4e5d7e9f0eef..932dadfdca54 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */
-   if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT))
+   if (IS_ENABLED(CONFIG_64BIT))
hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
return 0;
 }
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 4618f96fd30f..ae819a16d07a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, 
unsigned long tsb_inde
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void __init add_huge_page_size(unsigned long size)
-{
-   unsigned int order;
-
-   if (size_to_hstate(size))
-   return;
-
-   order = ilog2(size) - PAGE_SHIFT;
-   hugetlb_add_hstate(order);
-}
-
 static int __init hugetlbpage_init(void)
 {
-   add_huge_page_size(1UL << HPAGE_64K_SHIFT);
-   add_huge_page_size(1UL << HPAGE_SHIFT);
-   add_huge_page_size(1UL << HPAGE_256MB_SHIFT);
-   add_huge_page_size(1UL << HPAGE_2GB_SHIFT);
+   hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT);
+   hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT);
 
return 0;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 937d640a89e3..cf5781142716 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 static __init int gigantic_pages_init(void)
 {
/* With compaction 

[PATCH v4 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code

2020-04-28 Thread Mike Kravetz
Now that architectures provide arch_hugetlb_valid_size(), parsing
of "hugepagesz=" can be done in architecture independent code.
Create a single routine to handle hugepagesz= parsing and remove
all arch specific routines.  We can also remove the interface
hugetlb_bad_size() as this is no longer used outside arch independent
code.

This also provides consistent behavior of hugetlbfs command line
options.  The hugepagesz= option should only be specified once for
a specific size, but some architectures allow multiple instances.
This appears to be more of an oversight when code was added by some
architectures to set up ALL huge page sizes.
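
The single arch independent routine is essentially (a sketch, not a verbatim
quote of the mm/hugetlb.c hunk below):

	static int __init hugepagesz_setup(char *s)
	{
		unsigned long size = memparse(s, NULL);

		if (!arch_hugetlb_valid_size(size)) {
			parsed_valid_hugepagesz = false;
			pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
			return 0;
		}

		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
		return 1;
	}
	__setup("hugepagesz=", hugepagesz_setup);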

Signed-off-by: Mike Kravetz 
Acked-by: Mina Almasry 
Reviewed-by: Peter Xu 
Acked-by: Gerald Schaefer   [s390]
Acked-by: Will Deacon 
---
 arch/arm64/mm/hugetlbpage.c   | 15 ---
 arch/powerpc/mm/hugetlbpage.c | 15 ---
 arch/riscv/mm/hugetlbpage.c   | 16 
 arch/s390/mm/hugetlbpage.c| 18 --
 arch/sparc/mm/init_64.c   | 22 --
 arch/x86/mm/hugetlbpage.c | 16 
 include/linux/hugetlb.h   |  1 -
 mm/hugetlb.c  | 23 +--
 8 files changed, 17 insertions(+), 109 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 069b96ee2aec..f706b821aba6 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return false;
 }
-
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   add_huge_page_size(ps);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-   return 0;
-}
-__setup("hugepagesz=", setup_hugepagesz);
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index de54d2a37830..2c3fa0a7787b 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long 
size)
return 0;
 }
 
-static int __init hugepage_setup_sz(char *str)
-{
-   unsigned long long size;
-
-   size = memparse(str, &str);
-
-   if (add_huge_page_size(size) != 0) {
-   hugetlb_bad_size();
-   pr_err("Invalid huge page size specified(%llu)\n", size);
-   }
-
-   return 1;
-}
-__setup("hugepagesz=", hugepage_setup_sz);
-
 static int __init hugetlbpage_init(void)
 {
bool configured = false;
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index da1f516bc451..4e5d7e9f0eef 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long ps = memparse(opt, &opt);
-
-   if (arch_hugetlb_valid_size(ps)) {
-   hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
-   return 1;
-   }
-
-   hugetlb_bad_size();
-   pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
-   return 0;
-
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index ac25b207624c..242dfc0d462d 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
 }
 
-static __init int setup_hugepagesz(char *opt)
-{
-   unsigned long size;
-   char *string = opt;
-
-   size = memparse(opt, &opt);
-   if (arch_hugetlb_valid_size(size)) {
-   hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-   } else {
-   hugetlb_bad_size();
-   pr_err("hugepagesz= specifies an unsupported page size %s\n",
-   string);
-   return 0;
-   }
-   return 1;
-}
-__setup("hugepagesz=", setup_hugepagesz);
-
 static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 2bfe8e22b706..4618f96fd30f 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -397,28 +397,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
 
return true;
 }
-
-static int __init setup_hugepagesz(char *string)
-{
-   unsigned long long hugepage_size;
-   int rc = 0;
-
-   hugepage_size = memparse(string, &string);
-
-   if (!arch_hugetlb_valid_size((unsigned long)hugepage_size)) {
-   hugetlb_bad_size();
-   pr_err("hugepa

Re: [PATCH net] ibmvnic: Fall back to 16 H_SEND_SUB_CRQ_INDIRECT entries with old FW

2020-04-28 Thread Juliet Kim


On 4/28/20 10:35 AM, Thomas Falcon wrote:
> On 4/27/20 12:33 PM, Juliet Kim wrote:
>> The maximum entries for H_SEND_SUB_CRQ_INDIRECT has increased on
>> some platforms from 16 to 128. If Live Partition Mobility is used
>> to migrate a running OS image from a newer source platform to an
>> older target platform, then H_SEND_SUB_CRQ_INDIRECT will fail with
>> H_PARAMETER if 128 entries are queued.
>>
>> Fix this by falling back to 16 entries if H_PARAMETER is returned
>> from the hcall().
>
> Thanks for the submission, but I am having a hard time believing that this is 
> what is happening since the driver does not support sending multiple frames 
> per hypervisor call at this time. Even if it were the case, this approach 
> would omit frame data needed by the VF, so the second attempt may still fail. 
> Are there system logs available that show the driver is attempting to send 
> transmissions with greater than 16 descriptors?
>
> Thanks,
>
> Tom
>
>
I am trying to confirm.

Juliet

>>
>> Signed-off-by: Juliet Kim 
>> ---
>>   drivers/net/ethernet/ibm/ibmvnic.c | 11 +++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
>> b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 4bd33245bad6..b66c2f26a427 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -1656,6 +1656,17 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, 
>> struct net_device *netdev)
>>   lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
>>  (u64)tx_buff->indir_dma,
>>  (u64)num_entries);
>> +
>> +    /* Old firmware accepts max 16 num_entries */
>> +    if (lpar_rc == H_PARAMETER && num_entries > 16) {
>> +    tx_crq.v1.n_crq_elem = 16;
>> +    tx_buff->num_entries = 16;
>> +    lpar_rc = send_subcrq_indirect(adapter,
>> +   handle_array[queue_num],
>> +   (u64)tx_buff->indir_dma,
>> +   16);
>> +    }
>> +
>>   dma_unmap_single(dev, tx_buff->indir_dma,
>>    sizeof(tx_buff->indir_arr), DMA_TO_DEVICE);
>>   } else {


Re: [PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels

2020-04-28 Thread Segher Boessenkool
Hi!

On Tue, Apr 28, 2020 at 09:25:17PM +1000, Nicholas Piggin wrote:
> +config BUILD_BIG_ENDIAN_ELF_V2
> + bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)"
> + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> + default n
> + select BUILD_ELF_V2
> + help
> +   This builds the kernel image using the ELFv2 ABI, which has a
> +   reduced stack overhead and faster function calls. This does not
> +   affect the userspace ABIs.
> +
> +   ELFv2 is the standard ABI for little-endian, but for big-endian
> +   this is an experimental option that is less tested (kernel and
> +   toolchain). This requires gcc 4.9 or newer and binutils 2.24 or
> +   newer.

Is it clear that this is only for 64-bit?  Maybe this text should fit
that in somewhere?

It's not obvious to people who do not already know that ELFv2 is just
the (nick-)name of a particular ABI, not a new kind of ELF (it is just
version 1 ELF in fact), and that ABI is for 64-bit Power only.


Segher


Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-04-28 Thread Andrew Morton
On Tue, 28 Apr 2020 15:08:36 +0530 Srikar Dronamraju 
 wrote:

> Currently Linux kernel with CONFIG_NUMA on a system with multiple
> possible nodes, marks node 0 as online at boot.  However in practice,
> there are systems which have node 0 as memoryless and cpuless.
> 
> This can cause numa_balancing to be enabled on systems with only one node
> with memory and CPUs. The existence of this dummy node which is cpuless and
> memoryless node can confuse users/scripts looking at output of lscpu /
> numactl.
> 
> By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
> always online.
> 
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
>   */
>  nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
>   [N_POSSIBLE] = NODE_MASK_ALL,
> +#ifdef CONFIG_NUMA
> + [N_ONLINE] = NODE_MASK_NONE,
> +#else
>   [N_ONLINE] = { { [0] = 1UL } },
> -#ifndef CONFIG_NUMA
>   [N_NORMAL_MEMORY] = { { [0] = 1UL } },
>  #ifdef CONFIG_HIGHMEM
>   [N_HIGH_MEMORY] = { { [0] = 1UL } },

So on all other NUMA machines, when does node 0 get marked online?

This change means that for some time during boot, such machines will
now be running with node 0 marked as offline.  What are the
implications of this?  Will something break?


Re: [PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels

2020-04-28 Thread Nicholas Piggin
Excerpts from Segher Boessenkool's message of April 29, 2020 9:40 am:
> Hi!
> 
> On Tue, Apr 28, 2020 at 09:25:17PM +1000, Nicholas Piggin wrote:
>> +config BUILD_BIG_ENDIAN_ELF_V2
>> +bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)"
>> +depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
>> +default n
>> +select BUILD_ELF_V2
>> +help
>> +  This builds the kernel image using the ELFv2 ABI, which has a
>> +  reduced stack overhead and faster function calls. This does not
>> +  affect the userspace ABIs.
>> +
>> +  ELFv2 is the standard ABI for little-endian, but for big-endian
>> +  this is an experimental option that is less tested (kernel and
>> +  toolchain). This requires gcc 4.9 or newer and binutils 2.24 or
>> +  newer.
> 
> Is it clear that this is only for 64-bit?  Maybe this text should fit
> that in somewhere?

Don't know if it's necessary, the option only appears when 64-bit is
selected.

> It's not obvious to people who do not already know that ELFv2 is just
> the (nick-)name of a particular ABI, not a new kind of ELF (it is just
> version 1 ELF in fact), and that ABI is for 64-bit Power only.

I blame toolchain for -mabi=elfv2 ! And also some blame on ABI document 
which is called ELF V2 ABI rather than ELF ABI V2 which would have been 
unambiguous.

Kernel mostly gets it right (in the code I didn't write), and uses these

#if defined(_CALL_ELF) && _CALL_ELF == 2
#define PPC64_ELF_ABI_v2
#else
#define PPC64_ELF_ABI_v1
#endif

I can go through and change all my stuff and config options to ELF_ABI_v2.

Thanks,
Nick



Re: [PATCH v3 00/29] Convert files to ReST - part 2

2020-04-28 Thread Jan Harkes
On Tue, Apr 28, 2020 at 03:09:51PM -0400, Jonathan Corbet wrote:
> So I'm happy to merge this set, but there is one thing that worries me a
> bit... 
> 
> >  fs/coda/Kconfig   |2 +-
> 
> I'd feel a bit better if I could get an ack or two from filesystem folks
> before I venture that far out of my own yard...what say you all?

I acked the Coda parts on the first iteration of this patch. I have no
problem with you merging them.

Jan


[PATCH v3] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2020-04-28 Thread Nicholas Piggin
Provide an option to build big-endian kernels using the ELF V2 ABI. This works
on GCC and clang (since about 2014). It is not officially supported by the
GNU toolchain, but it can give big-endian kernels some useful advantages of
the V2 ABI (e.g., less stack usage).

Reviewed-by: Segher Boessenkool 
Signed-off-by: Nicholas Piggin 
---
Since v1:
- Improved the override flavour name suggested by Segher.
- Improved changelog wording.

Since v2:
- Improved changelog and help text to use the name from the spec (ELF V2 ABI)
  and to clarify things a bit more, as suggested by Segher.
- For option name, match the ELF_ABI_v1/2 which is already in the kernel.
- Prefix options with PPC64_ to avoid arch clashes or confusion.
- "elfv2" is the toolchain name of the ABI, so I kept that in the crypto
  perl scripts.

 arch/powerpc/Kconfig| 21 +
 arch/powerpc/Makefile   | 15 ++-
 arch/powerpc/boot/Makefile  |  4 
 drivers/crypto/vmx/Makefile |  8 ++--
 drivers/crypto/vmx/ppc-xlate.pl | 10 ++
 5 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..444867e07039 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -147,6 +147,7 @@ config PPC
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_TABLE_SORT
+   select PPC64_BUILD_ELF_ABI_V2   if PPC64 && CPU_LITTLE_ENDIAN
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DYNAMIC_FTRACE   if FUNCTION_TRACER
@@ -541,6 +542,26 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config PPC64_BUILD_ELF_ABI_V2
+   bool
+
+config PPC64_BUILD_BIG_ENDIAN_ELF_ABI_V2
+   bool "Build big-endian kernel using ELF V2 ABI (EXPERIMENTAL)"
+   depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
+   default n
+   select PPC64_BUILD_ELF_ABI_V2
+   help
+ This builds the kernel image using the "Power Architecture 64-Bit ELF
+ V2 ABI Specification", which has a reduced stack overhead and faster
+ function calls. This internal kernel ABI option does not affect
+  userspace compatibility.
+
+ The V2 ABI is standard for 64-bit little-endian, but for big-endian
+ it is less well tested by kernel and toolchain. However some distros
+ build userspace this way, and it can produce a functioning kernel.
+
+ This requires gcc 4.9 or newer and binutils 2.24 or newer.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f310c32e88a4..baf477d100b2 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -92,10 +92,14 @@ endif
 
 ifdef CONFIG_PPC64
 ifndef CONFIG_CC_IS_CLANG
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call 
cc-option,-mcall-aixdesc)
-aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2
+ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2
+cflags-y   += $(call cc-option,-mabi=elfv2,$(call 
cc-option,-mcall-aixdesc))
+aflags-y   += $(call cc-option,-mabi=elfv2)
+else
+cflags-y   += $(call cc-option,-mabi=elfv1)
+cflags-y   += $(call cc-option,-mcall-aixdesc)
+aflags-y   += $(call cc-option,-mabi=elfv1)
+endif
 endif
 endif
 
@@ -144,7 +148,7 @@ endif
 
 CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
 ifndef CONFIG_CC_IS_CLANG
-ifdef CONFIG_CPU_LITTLE_ENDIAN
+ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call 
cc-option,-mcall-aixdesc))
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 else
@@ -153,6 +157,7 @@ CFLAGS-$(CONFIG_PPC64)  += $(call 
cc-option,-mcall-aixdesc)
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 endif
 endif
+
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
cc-option,-mminimal-toc))
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
 
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index c53a1b8bba8b..d27e85413c84 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -41,6 +41,10 @@ endif
 
 BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
 
+ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2
+BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
+endif
+
 ifdef CONFIG_CPU_BIG_ENDIAN
 BOOTCFLAGS += -mbig-endian
 else
diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile
index 709670d2b553..751ffca694aa 100644
--- a/drivers/crypto/vmx/Makefile
+++ b/drive

Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Jeremy Kerr
Hi Christoph,

> FYI, these little hunks reduce the difference to my version, maybe
> you can fold them in?

Sure, no problem.

How do you want to coordinate these? I can submit mine through mpe, but
that may make it tricky to synchronise with your changes. Or, you can
include this change in your series if you prefer.

Cheers,


Jeremy




Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-04-28 Thread Srikar Dronamraju
> > 
> > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
> > always online.
> > 
> > ...
> >
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
> >   */
> >  nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
> > [N_POSSIBLE] = NODE_MASK_ALL,
> > +#ifdef CONFIG_NUMA
> > +   [N_ONLINE] = NODE_MASK_NONE,
> > +#else
> > [N_ONLINE] = { { [0] = 1UL } },
> > -#ifndef CONFIG_NUMA
> > [N_NORMAL_MEMORY] = { { [0] = 1UL } },
> >  #ifdef CONFIG_HIGHMEM
> > [N_HIGH_MEMORY] = { { [0] = 1UL } },
> 
> So on all other NUMA machines, when does node 0 get marked online?
> 
> This change means that for some time during boot, such machines will
> now be running with node 0 marked as offline.  What are the
> implications of this?  Will something break?

Until the nodes are detected, marking Node 0 as online tends to be redundant,
because the system doesn't know whether it is a NUMA or a non-NUMA system.
Once we detect the nodes, we online them immediately. Hence I don't see any
side-effects or negative implications of this change.
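
(Purely illustrative, not a quote of any arch code: the per-arch NUMA init
is expected to do something along the lines of

	for_each_node_mask(nid, node_possible_map)
		if (node_found_in_firmware(nid))	/* hypothetical check */
			node_set_online(nid);

for every node it actually discovers, so with this change node 0 simply goes
through the same path instead of being assumed online up front.)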

However if I am missing anything, please do let me know.

For my part, I have tested this on
1. Non-NUMA Single node but CPUs and memory coming from zero node.
2. Non-NUMA Single node but CPUs and memory coming from non-zero node.
3. NUMA Multi node but with CPUs and memory from node 0.
4. NUMA Multi node but with no CPUs and memory from node 0.

-- 
Thanks and Regards
Srikar Dronamraju


Re: [PATCH v6 10/28] powerpc: Introduce functions for instruction equality

2020-04-28 Thread Alistair Popple
There seems to be a minor typo which breaks compilation when 
CONFIG_MPROFILE_KERNEL is not enabled. See the fix below.

---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/
ftrace.c
index a6064e1977ca..0ad2c9d4ab49 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -499,7 +499,7 @@ expected_nop_sequence(void *ip, struct ppc_inst op0, 
struct ppc_inst op1)
 * The load offset is different depending on the ABI. For simplicity
 * just mask it out when doing the compare.
 */
-   if ((!ppc_inst_equal(op0), ppc_inst(0x48000008)) || (ppc_inst_val(op1) & 0xffff0000) != 0xe8410000)
+   if ((!ppc_inst_equal(op0, ppc_inst(0x48000008))) || (ppc_inst_val(op1) & 0xffff0000) != 0xe8410000)
return 0;
return 1;
 }
-- 
2.20.1






Re: [PATCH v6 11/28] powerpc: Use a datatype for instructions

2020-04-28 Thread Alistair Popple
Hi Jordan,

I needed the below fix for building with CONFIG_STRICT_KERNEL_RWX enabled. 
Hopefully it's correct, I have not yet had a chance to test it beyond building 
it.

- Alistair

---
 arch/powerpc/lib/code-patching.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-
patching.c
index ad5754c5f007..a8c8ffdb1ccd 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -166,8 +166,8 @@ static int do_patch_instruction(struct ppc_inst *addr, 
struct ppc_inst instr)
goto out;
}
 
-   patch_addr = (unsigned int *)(text_poke_addr) +
-   ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+   patch_addr = (struct ppc_inst *)(text_poke_addr) +
+   ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
 
__patch_instruction(addr, instr, patch_addr);
 
-- 
2.20.1






[RFC PATCH v2 0/5] Use per-CPU temporary mappings for patching

2020-04-28 Thread Christopher M. Riedl
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create
temporary mappings when patching itself. These mappings temporarily
override the strict RWX text protections to permit a write. Currently,
powerpc allocates a per-CPU VM area for patching. Patching occurs as
follows:

1. Map page of text to be patched to per-CPU VM area w/
   PAGE_KERNEL protection
2. Patch text
3. Remove the temporary mapping

While the VM area is per-CPU, the mapping is actually inserted into the
kernel page tables. Presumably, this could allow another CPU to access
the normally write-protected text - either maliciously or accidentally -
via this same mapping if the address of the VM area is known. Ideally,
the mapping should be kept local to the CPU doing the patching (or any
other sensitive operations requiring temporarily overriding memory
protections) [0].

x86 introduced "temporary mm" structs which allow the creation of
mappings local to a particular CPU [1]. This series intends to bring the
notion of a temporary mm to powerpc and harden powerpc by using such a
mapping for patching a kernel with strict RWX permissions.

The first patch introduces the temporary mm struct and API for powerpc
along with a new function to retrieve a current hw breakpoint.

The second patch uses the `poking_init` init hook added by the x86
patches to initialize a temporary mm and patching address. The patching
address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The
upper limit is necessary due to how the hash MMU operates - by default
the space above DEFAULT_MAP_WINDOW is not available. For now, both hash
and radix randomize inside this range. The number of possible random
addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW.

Bits of entropy with 64K page size on BOOK3S_64:

bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)

PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
bits of entropy = log2(128TB / 64K)
bits of entropy = 31

Randomization occurs only once during initialization at boot.

The third patch replaces the VM area with the temporary mm in the
patching code. The page for patching has to be mapped PAGE_SHARED with
the hash MMU since hash prevents the kernel from accessing userspace
pages with PAGE_PRIVILEGED bit set. On the radix MMU the page is mapped with
PAGE_KERNEL which has the added benefit that we can skip KUAP. 

The fourth and fifth patches implement an LKDTM test "proof-of-concept" which
exploits the previous vulnerability (i.e. the mapping during patching is exposed
in kernel page tables and accessible by other CPUs). The LKDTM test is somewhat
"rough" in that it uses a brute-force approach - I am open to any suggestions
and/or ideas to improve this. Currently, the LKDTM test passes with this series
on POWER8 (hash) and POWER9 (radix, hash) and fails without this series (ie.
the temporary mapping for patching is exposed to CPUs other than the patching
CPU).

The test can be applied to a tree without this new series by first
adding this in /arch/powerpc/lib/code-patching.c:

@@ -41,6 +41,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int 
instr)
 #ifdef CONFIG_STRICT_KERNEL_RWX
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return (unsigned long)(per_cpu(text_poke_area, cpu))->addr;
+}
+#endif
+
 static int text_area_cpu_up(unsigned int cpu)
 {
struct vm_struct *area;

And then applying the last patch of this series which adds the LKDTM test,
(powerpc: Add LKDTM test to hijack a patch mapping).

Tested on QEMU (POWER8, POWER9), POWER8 VM, and a Blackbird (8-core POWER9).

v2: Many fixes and improvements mostly based on extensive feedback and testing
by Christophe Leroy (thanks!).
* Make patching_mm and patching_addr static and move '__ro_after_init'
  to after the variable name (more common in other parts of the kernel)
* Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix
  PPC64e compile
* Add comment explaining why we use BUG_ON() during the init call to
  setup for patching later
* Move ptep into patch_mapping to avoid walking page tables a second
  time when unmapping the temporary mapping
* Use KUAP under non-radix, also manually dirty the PTE for patch
  mapping on non-BOOK3S_64 platforms
* Properly return any error from __patch_instruction
* Do not use 'memcmp' where a simple comparison is appropriate
* Simplify expression for patch address by removing pointer maths
* Add LKDTM test


[0]: https://github.com/linuxppc/issues/issues/224
[1]: 
https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.a...@gmail.com/

Christopher M. Riedl (5):
  powerpc/mm: Introduce temporary mm
  powerpc/lib: Initialize a temporary mm for code patching
  powerpc/lib: 

[RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching

2020-04-28 Thread Christopher M. Riedl
Currently, code patching a STRICT_KERNEL_RWX exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
mapping for patching uses a userspace address (to keep the mapping
local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
(KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 149 ---
 1 file changed, 55 insertions(+), 94 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 259c19480a85..26f06cdb5d7e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
   unsigned int *patch_addr)
@@ -72,101 +73,58 @@ void __init poking_init(void)
pte_unmap_unlock(ptep, ptl);
 }
 
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
-
-static int text_area_cpu_up(unsigned int cpu)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-   if (!area) {
-   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-   cpu);
-   return -1;
-   }
-   this_cpu_write(text_poke_area, area);
-
-   return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-   free_vm_area(this_cpu_read(text_poke_area));
-   return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-   BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-   "powerpc/text_poke:online", text_area_cpu_up,
-   text_area_cpu_down));
-
-   return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+   struct temp_mm temp_mm;
+};
 
 /*
  * This can be called for kernel text or a module.
  */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
 {
-   unsigned long pfn;
-   int err;
+   struct page *page;
+   pte_t pte;
+   pgprot_t pgprot;
 
if (is_vmalloc_addr(addr))
-   pfn = vmalloc_to_pfn(addr);
+   page = vmalloc_to_page(addr);
else
-   pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+   page = virt_to_page(addr);
 
-   err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
+   if (radix_enabled())
+   pgprot = PAGE_KERNEL;
+   else
+   pgprot = PAGE_SHARED;
 
-   pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);
-   if (err)
+   patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+&patch_mapping->ptl);
+   if (unlikely(!patch_mapping->ptep)) {
+   pr_warn("map patch: failed to allocate pte for patching\n");
return -1;
+   }
+
+   pte = mk_pte(page, pgprot);
+   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   pte = pte_mkdirty(pte);
+   set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+   use_temporary_mm(&patch_mapping->temp_mm);
 
return 0;
 }
 
-static inline int unmap_patch_area(unsigned long addr)
+static void unmap_patch(struct patch_mapping *patch_mapping)
 {
-   pte_t *ptep;
-   pmd_t *pmdp;
-   pud_t *pudp;
-   pgd_t *pgdp;
-
-   pgdp = pgd_offset_k(addr);
-   if (unlikely(!pgdp))
-   return -EINVAL;
-
-   pudp = pud_offset(pgdp, addr);
-   if (unlikely(!pudp))
-   return -EINVAL;
-
-   pmdp = pmd_offset(pudp, addr);
-   if (unlikely(!pmdp))
-   return -EINVAL;
-
-   ptep = pte_offset_kernel(pmdp, addr);
-   if (unlikely(!ptep))
-   return -EINVAL;
+   /* In hash, pte_clear flushes the tlb */
+   pte_clear(patching_mm, patching_addr, patch_m

[RFC PATCH v2 5/5] powerpc: Add LKDTM test to hijack a patch mapping

2020-04-28 Thread Christopher M. Riedl
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching
must use a temporary mapping which allows for writing to kernel text.
During the entire window of time when this temporary mapping is in use,
another CPU could write to the same mapping and maliciously alter kernel
text. Implement an LKDTM test that attempts to exploit such an opening when
a CPU is patching under STRICT_KERNEL_RWX. The test is only implemented
on powerpc for now.

The LKDTM "hijack" test works as follows:

1. A CPU executes an infinite loop to patch an instruction.
   This is the "patching" CPU.
2. Another CPU attempts to write to the address of the temporary
   mapping used by the "patching" CPU. This other CPU is the
   "hijacker" CPU. The hijack either fails with a segfault or
   succeeds, in which case some kernel text is now overwritten.

How to run the test:

mount -t debugfs none /sys/kernel/debug
(echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT)

Signed-off-by: Christopher M. Riedl 
---
 drivers/misc/lkdtm/core.c  |  1 +
 drivers/misc/lkdtm/lkdtm.h |  1 +
 drivers/misc/lkdtm/perms.c | 99 ++
 3 files changed, 101 insertions(+)

diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index a5e344df9166..482e72f6a1e1 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
CRASHTYPE(WRITE_KERN),
+   CRASHTYPE(HIJACK_PATCH),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 601a2156a0d4..bfcf3542370d 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void);
 void lkdtm_EXEC_NULL(void);
 void lkdtm_ACCESS_USERSPACE(void);
 void lkdtm_ACCESS_NULL(void);
+void lkdtm_HIJACK_PATCH(void);
 
 /* lkdtm_refcount.c */
 void lkdtm_REFCOUNT_INC_OVERFLOW(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 62f76d506f04..547ce16e03e5 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* Whether or not to fill the target memory area with do_nothing(). */
@@ -213,6 +214,104 @@ void lkdtm_ACCESS_NULL(void)
*ptr = tmp;
 }
 
+#if defined(CONFIG_PPC) && defined(CONFIG_STRICT_KERNEL_RWX)
+#include 
+
+extern unsigned long read_cpu_patching_addr(unsigned int cpu);
+
+static unsigned int * const patch_site = (unsigned int * const)&do_nothing;
+
+static int lkdtm_patching_cpu(void *data)
+{
+   int err = 0;
+
+   pr_info("starting patching_cpu=%d\n", smp_processor_id());
+   do {
+   err = patch_instruction(patch_site, 0xdeadbeef);
+   } while (*READ_ONCE(patch_site) == 0xdeadbeef &&
+   !err && !kthread_should_stop());
+
+   if (err)
+   pr_warn("patch_instruction returned error: %d\n", err);
+
+   set_current_state(TASK_INTERRUPTIBLE);
+   while (!kthread_should_stop()) {
+   schedule();
+   set_current_state(TASK_INTERRUPTIBLE);
+   }
+
+   return err;
+}
+
+void lkdtm_HIJACK_PATCH(void)
+{
+   struct task_struct *patching_kthrd;
+   int patching_cpu, hijacker_cpu, original_insn, attempts;
+   unsigned long addr;
+   bool hijacked;
+
+   if (num_online_cpus() < 2) {
+   pr_warn("need at least two cpus\n");
+   return;
+   }
+
+   original_insn = *READ_ONCE(patch_site);
+
+   hijacker_cpu = smp_processor_id();
+   patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu);
+
+   patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL,
+   cpu_to_node(patching_cpu),
+   "lkdtm_patching_cpu");
+   kthread_bind(patching_kthrd, patching_cpu);
+   wake_up_process(patching_kthrd);
+
+   addr = offset_in_page(patch_site) | 
read_cpu_patching_addr(patching_cpu);
+
+   pr_info("starting hijacker_cpu=%d\n", hijacker_cpu);
+   for (attempts = 0; attempts < 10; ++attempts) {
+   /* Use __put_user to catch faults without an Oops */
+   hijacked = !__put_user(0xbad00bad, (unsigned int *)addr);
+
+   if (hijacked) {
+   if (kthread_stop(patching_kthrd))
+   goto out;
+   break;
+   }
+   }
+   pr_info("hijack attempts: %d\n", attempts);
+
+   if (hijacked) {
+   if (*READ_ONCE(patch_site) == 0xbad00bad)
+   pr_err("overwrote kernel text\n");
+   /*
+* The

[RFC PATCH v2 2/5] powerpc/lib: Initialize a temporary mm for code patching

2020-04-28 Thread Christopher M. Riedl
When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 3345f039a876..259c19480a85 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -39,6 +41,37 @@ int raw_patch_instruction(unsigned int *addr, unsigned int 
instr)
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
+
+static struct mm_struct *patching_mm __ro_after_init;
+static unsigned long patching_addr __ro_after_init;
+
+void __init poking_init(void)
+{
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+
+   /*
+* Some parts of the kernel (static keys for example) depend on
+* successful code patching. Code patching under STRICT_KERNEL_RWX
+* requires this setup - otherwise we cannot patch at all. We use
+* BUG_ON() here and later since an early failure is preferred to
+* buggy behavior and/or strange crashes later.
+*/
+   patching_mm = copy_init_mm();
+   BUG_ON(!patching_mm);
+
+   /*
+* In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+* XXX: Do we want additional bits of entropy for radix?
+*/
+   patching_addr = (get_random_long() & PAGE_MASK) %
+   (DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+   BUG_ON(!ptep);
+   pte_unmap_unlock(ptep, ptl);
+}
+
 static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
 
 static int text_area_cpu_up(unsigned int cpu)
-- 
2.26.1



[RFC PATCH v2 4/5] powerpc/lib: Add LKDTM accessor for patching addr

2020-04-28 Thread Christopher M. Riedl
When live patching a STRICT_RWX kernel, a mapping is installed at a
"patching address" with temporary write permissions. Provide a
LKDTM-only accessor function for this address in preparation for a LKDTM
test which attempts to "hijack" this mapping by writing to it from
another CPU.

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/lib/code-patching.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 26f06cdb5d7e..cfbdef90384e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -46,6 +46,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int 
instr)
 static struct mm_struct *patching_mm __ro_after_init;
 static unsigned long patching_addr __ro_after_init;
 
+#ifdef CONFIG_LKDTM
+unsigned long read_cpu_patching_addr(unsigned int cpu)
+{
+   return patching_addr;
+}
+#endif
+
 void __init poking_init(void)
 {
spinlock_t *ptl; /* for protecting pte table */
-- 
2.26.1



[RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm

2020-04-28 Thread Christopher M. Riedl
x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.
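
Usage sketch (illustrative only, mirroring how the code patching patch later
in this series drives the API; not part of this patch):

	struct temp_mm temp_mm;
	unsigned long flags;

	init_temp_mm(&temp_mm, patching_mm);

	local_irq_save(flags);
	use_temporary_mm(&temp_mm);

	/* touch the mapping that only this CPU can see */

	unuse_temporary_mm(&temp_mm);
	local_irq_restore(flags);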

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 
---
 arch/powerpc/include/asm/debug.h   |  1 +
 arch/powerpc/include/asm/mmu_context.h | 54 ++
 arch/powerpc/kernel/process.c  |  5 +++
 3 files changed, 60 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 7756026b95ca..b945bc16c932 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) 
{ return 0; }
 static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
 #endif
 
+void __get_breakpoint(struct arch_hw_breakpoint *brk);
 void __set_breakpoint(struct arch_hw_breakpoint *brk);
 bool ppc_breakpoint_available(void);
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 360367c579de..57a8695fe63f 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
 #include
 #include 
 #include 
+#include 
 
 /*
  * Most if the context management is out of line
@@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
 }
 
+struct temp_mm {
+   struct mm_struct *temp;
+   struct mm_struct *prev;
+   bool is_kernel_thread;
+   struct arch_hw_breakpoint brk;
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+   temp_mm->temp = mm;
+   temp_mm->prev = NULL;
+   temp_mm->is_kernel_thread = false;
+   memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   temp_mm->is_kernel_thread = current->mm == NULL;
+   if (temp_mm->is_kernel_thread)
+   temp_mm->prev = current->active_mm;
+   else
+   temp_mm->prev = current->mm;
+
+   /*
+* Hash requires a non-NULL current->mm to allocate a userspace address
+* when handling a page fault. Does not appear to hurt in Radix either.
+*/
+   current->mm = temp_mm->temp;
+   switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+   if (ppc_breakpoint_available()) {
+   __get_breakpoint(&temp_mm->brk);
+   if (temp_mm->brk.type != 0)
+   hw_breakpoint_disable();
+   }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   if (temp_mm->is_kernel_thread)
+   current->mm = NULL;
+   else
+   current->mm = temp_mm->prev;
+   switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+   if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
+   __set_breakpoint(&temp_mm->brk);
+}
+
 #endif /* __KERNEL__ */
 #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f8645..ec4cf890d92c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct 
arch_hw_breakpoint *brk)
return 0;
 }
 
+void __get_breakpoint(struct arch_hw_breakpoint *brk)
+{
+   memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
+}
+
 void __set_breakpoint(struct arch_hw_breakpoint *brk)
 {
memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));
-- 
2.26.1



Re: [PATCH v2 1/7] KVM: s390: clean up redundant 'kvm_run' parameters

2020-04-28 Thread Tianjia Zhang




On 2020/4/26 20:59, Thomas Huth wrote:

On 23/04/2020 13.00, Christian Borntraeger wrote:



On 23.04.20 12:58, Tianjia Zhang wrote:



On 2020/4/23 18:39, Cornelia Huck wrote:

On Thu, 23 Apr 2020 11:01:43 +0800
Tianjia Zhang  wrote:


On 2020/4/23 0:04, Cornelia Huck wrote:

On Wed, 22 Apr 2020 17:58:04 +0200
Christian Borntraeger  wrote:
   

On 22.04.20 15:45, Cornelia Huck wrote:

On Wed, 22 Apr 2020 20:58:04 +0800
Tianjia Zhang  wrote:
  

In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. Earlier than historical reasons, many kvm-related function


s/Earlier than/For/ ?
  

parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time.
This patch does a unified cleanup of these remaining redundant parameters.

Signed-off-by: Tianjia Zhang 
---
    arch/s390/kvm/kvm-s390.c | 37 ++---
    1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e335a7e5ead7..d7bb2e7a07ff 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4176,8 +4176,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
    return rc;
    }
    -static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+static void sync_regs_fmt2(struct kvm_vcpu *vcpu)
    {
+    struct kvm_run *kvm_run = vcpu->run;
    struct runtime_instr_cb *riccb;
    struct gs_cb *gscb;
    @@ -4235,7 +4236,7 @@ static void sync_regs_fmt2(struct kvm_vcpu *vcpu, 
struct kvm_run *kvm_run)
    }
    if (vcpu->arch.gs_enabled) {
    current->thread.gs_cb = (struct gs_cb *)
-    &vcpu->run->s.regs.gscb;
+    &kvm_run->s.regs.gscb;


Not sure if these changes (vcpu->run-> => kvm_run->) are really worth
it. (It seems they amount to at least as much as the changes advertised
in the patch description.)

Other opinions?


Agreed. It feels kind of random. Maybe just do the first line (move kvm_run 
from the
function parameter list into the variable declaration)? Not sure if this is 
better.
   


There's more in this patch that I cut... but I think just moving
kvm_run from the parameter list would be much less disruptive.



I think that would leave two kinds of code (`vcpu->run->` and `kvm_run->`)
mixed together, which would be more disruptive, not less.


I just fail to see the benefit; sure, kvm_run-> is convenient, but the
current code is just fine, and any rework should be balanced against
the cost (e.g. cluttering git annotate).



cluttering git annotate? Does it mean a Fixes:  ("comment") tag? Is it possible to 
solve this problem by splitting this patch?


No its about breaking git blame (and bugfix backports) for just a cosmetic 
improvement.


It could be slightly more than a cosmetic improvement (depending on the
smartness of the compiler): vcpu->run-> are two dereferences, while
kvm_run-> is only one dereference. So it could be slightly more compact
and faster code.

  Thomas



If the compiler is smart enough, this place can be optimized 
automatically, but what if it is not? We can't just rely on the compiler. This requires 
a trade-off between code cleanliness/readability and breaking git blame.
In addition, I have removed the changes here and sent a v4 patch. Please 
also help review it.


Thanks and best,
Tianjia


Re: New powerpc vdso calling convention

2020-04-28 Thread Nicholas Piggin
Excerpts from Adhemerval Zanella's message of April 27, 2020 11:09 pm:
> 
> 
> On 26/04/2020 00:41, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 26, 2020 9:11 am:
>>> On Sun, Apr 26, 2020 at 08:58:19AM +1000, Nicholas Piggin wrote:
 Excerpts from Christophe Leroy's message of April 25, 2020 10:20 pm:
>
>
> On 25/04/2020 at 12:56, Nicholas Piggin wrote:
>> Excerpts from Christophe Leroy's message of April 25, 2020 5:47 pm:
>>>
>>>
>>> On 25/04/2020 at 07:22, Nicholas Piggin wrote:
 As noted in the 'scv' thread, powerpc's vdso calling convention does 
 not
 match the C ELF ABI calling convention (or the proposed scv 
 convention).
 I think we could implement a new ABI by basically duplicating function
 entry points with different names.
>>>
>>> I think doing this is a real good idea.
>>>
>>> I've been working at porting powerpc VDSO to the GENERIC C VDSO, and the
>>> main pitfall has been that our vdso calling convention is not compatible
>>> with C calling convention, so we have go through an ASM entry/exit.
>>>
>>> See 
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=171469
>>>
>>> We should kill this error flag return through CR[SO] and get it the
>>> "modern" way like other architectures implementing the C VDSO: return
>>> 0 when successful, return -err when failed.
>>
>> Agreed.
>>
 The ELF v2 ABI convention would suit it well, because the caller 
 already
 requires the function address for ctr, so having it in r12 will
 eliminate the need for address calculation, which suits the vdso data
 page access.

 Is there a need for ELF v1 specific calls as well, or could those just 
 be
 deprecated and remain on existing functions or required to use the ELF
 v2 calls using asm wrappers?
>>>
>>> What's ELF v1 and ELF v2 ? Is ELF v1 what PPC32 uses ? If so, I'd say
>>> yes, it would be good to have it to avoid going through ASM in the 
>>> middle.
>>
>> I'm not sure about PPC32. On PPC64, ELFv2 functions must be called with
>> their address in r12 if called at their global entry point. ELFv1 have a
>> function descriptor with call address and TOC in it, caller has to load
>> the TOC if it's global.
>>
>> The vdso doesn't have TOC, it has one global address (the vdso data
>> page) which it loads by calculating its own address.
>>
>> The kernel doesn't change the vdso based on whether it's called by a v1
>> or v2 userspace (it doesn't really know itself and would have to export
>> different functions). glibc has a hack to create something:
>>
>> # define VDSO_IFUNC_RET(value)   \
>>({ \
>>  static Elf64_FuncDesc vdso_opd = { .fd_toc = ~0x0 }; \
>>  vdso_opd.fd_func = (Elf64_Addr)value;\
>>  &vdso_opd;   \
>>})
>>
>> If we could make something which links more like any other dso with
>> ELFv1, that would be good. Otherwise I think v2 is preferable so it
>> doesn't have to calculate its own address.
>
> I see the following in glibc. So looks like PPC32 is like PPC64 elfv1. 
> By the way, they are talking about something not completely finished in 
> the kernel. Can we finish it ?

 Possibly can. It seems like a good idea to fix all loose ends if we are 
 going to add new versions. Will have to check with the toolchain people 
 to make sure we're doing the right thing.
>>>
>>> "ELFv1" and "ELFv2" are PPC64-specific names for the old and new
>>> version of the ELF psABI for PPC64. They have nothing at all to do
>>> with PPC32 which is a completely different ABI from either.
>> 
>> Right, I'm just talking about those comments -- it seems like the kernel 
>> vdso should contain an .opd section with function descriptors in it for
>> elfv1 calls, rather than the hack it has now of creating one in the 
>> caller's .data section.
>> 
>> But all that function descriptor code is gated by
>> 
>> #if (defined(__PPC64__) || defined(__powerpc64__)) && _CALL_ELF != 2
>> 
>> So it seems PPC32 does not use function descriptors but a direct pointer 
>> to the entry point like PPC64 with ELFv2.
> 
> Yes, this hack is only for ELFv1.  The missing OPD has not been an issue 
> for glibc because it has been using the inline assembly to emulate the 
> function calls since initial vDSO support (INTERNAL_VSYSCALL_CALL_TYPE).
> It just became an issue when I added an ifunc optimization to 
> gettimeofday so it can bypass libc.so and make the PLT branch to the vDSO 
> directly.

I can't understand if it's actually a problem for you or not.

Regardless if you can hack around it, it seems 

Re: [PATCH v6 03/28] powerpc/xmon: Move breakpoints to text section

2020-04-28 Thread Jordan Niethe
On Tue, Apr 28, 2020 at 3:36 PM Christophe Leroy
 wrote:
>
>
>
> > On 28/04/2020 at 07:30, Jordan Niethe wrote:
> > On Tue, Apr 28, 2020 at 3:20 PM Christophe Leroy
> >  wrote:
> >>
> >>
> >>
> >> On 28/04/2020 at 03:57, Jordan Niethe wrote:
> >>> The instructions for xmon's breakpoint are stored bpt_table[] which is in
> >>> the data section. This is problematic as the data section may be marked
> >>> as no execute. Move bpt_table[] to the text section.
> >>>
> >>> Signed-off-by: Jordan Niethe 
> >>> ---
> >>> v6: - New to series. Was part of the previous patch.
> >>>   - Make BPT_SIZE available in assembly
> >>> ---
> >>>arch/powerpc/kernel/asm-offsets.c |  8 
> >>>arch/powerpc/xmon/Makefile|  2 +-
> >>>arch/powerpc/xmon/xmon.c  |  6 +-
> >>>arch/powerpc/xmon/xmon_bpts.S |  9 +
> >>>arch/powerpc/xmon/xmon_bpts.h | 14 ++
> >>>5 files changed, 33 insertions(+), 6 deletions(-)
> >>>create mode 100644 arch/powerpc/xmon/xmon_bpts.S
> >>>create mode 100644 arch/powerpc/xmon/xmon_bpts.h
> >>>
> >>> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> >>> b/arch/powerpc/kernel/asm-offsets.c
> >>> index c25e562f1cd9..2401f415f423 100644
> >>> --- a/arch/powerpc/kernel/asm-offsets.c
> >>> +++ b/arch/powerpc/kernel/asm-offsets.c
> >>> @@ -70,6 +70,10 @@
> >>>#include 
> >>>#endif
> >>>
> >>> +#ifdef CONFIG_XMON
> >>> +#include "../xmon/xmon_bpts.h"
> >>> +#endif
> >>> +
> >>>#define STACK_PT_REGS_OFFSET(sym, val)  \
> >>>DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val))
> >>>
> >>> @@ -783,5 +787,9 @@ int main(void)
> >>>DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE));
> >>>#endif
> >>>
> >>> +#ifdef CONFIG_XMON
> >>> + DEFINE(BPT_SIZE, BPT_SIZE);
> >>> +#endif
> >>> +
> >>>return 0;
> >>>}
> >>> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> >>> index c3842dbeb1b7..515a13ea6f28 100644
> >>> --- a/arch/powerpc/xmon/Makefile
> >>> +++ b/arch/powerpc/xmon/Makefile
> >>> @@ -21,7 +21,7 @@ endif
> >>>
> >>>ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
> >>>
> >>> -obj-y+= xmon.o nonstdio.o spr_access.o
> >>> +obj-y+= xmon.o nonstdio.o spr_access.o 
> >>> xmon_bpts.o
> >>>
> >>>ifdef CONFIG_XMON_DISASSEMBLY
> >>>obj-y   += ppc-dis.o ppc-opc.o
> >>> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> >>> index a064392df1b8..f7ce3ea8694c 100644
> >>> --- a/arch/powerpc/xmon/xmon.c
> >>> +++ b/arch/powerpc/xmon/xmon.c
> >>> @@ -62,6 +62,7 @@
> >>>
> >>>#include "nonstdio.h"
> >>>#include "dis-asm.h"
> >>> +#include "xmon_bpts.h"
> >>>
> >>>#ifdef CONFIG_SMP
> >>>static cpumask_t cpus_in_xmon = CPU_MASK_NONE;
> >>> @@ -108,7 +109,6 @@ struct bpt {
> >>>#define BP_TRAP 2
> >>>#define BP_DABR 4
> >>>
> >>> -#define NBPTS256
> >>>static struct bpt bpts[NBPTS];
> >>>static struct bpt dabr;
> >>>static struct bpt *iabr;
> >>> @@ -116,10 +116,6 @@ static unsigned bpinstr = 0x7fe00008;	/* trap */
> >>>
> >>>#define BP_NUM(bp)  ((bp) - bpts + 1)
> >>>
> >>> -#define BPT_SIZE   (sizeof(unsigned int) * 2)
> >>> -#define BPT_WORDS  (BPT_SIZE / sizeof(unsigned int))
> >>> -static unsigned int bpt_table[NBPTS * BPT_WORDS];
> >>> -
> >>>/* Prototypes */
> >>>static int cmds(struct pt_regs *);
> >>>static int mread(unsigned long, void *, int);
> >>> diff --git a/arch/powerpc/xmon/xmon_bpts.S b/arch/powerpc/xmon/xmon_bpts.S
> >>> new file mode 100644
> >>> index ..f3ad0ab50854
> >>> --- /dev/null
> >>> +++ b/arch/powerpc/xmon/xmon_bpts.S
> >>> @@ -0,0 +1,9 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>> +#include 
> >>> +#include 
> >>> +#include 
> >>> +#include "xmon_bpts.h"
> >>> +
> >>> +.global bpt_table
> >>> +bpt_table:
> >>> + .space NBPTS * BPT_SIZE
> >>
> >> No alignment required ? Standard alignment (probably 4 bytes ?) is
> >> acceptable ?
> > No, I'll add a .balign 4 to be sure.
>
> I think it is aligned to 4 by default. My question was to know whether 4
> is enough.
> I see BPT_SIZE is 8, should the alignment be at least 8 ?
At this point each breakpoint entry is two instructions: the
instruction that has been replaced by a breakpoint and a trap
instruction after that. So I think aligning it to 4 should be fine. In
the patch that introduces prefixed instructions to the data type, the
alignment changes.
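
For illustration only, a minimal standalone sketch (not code from the
series; the struct and field names are invented) of the slot layout
described above:

/*
 * One xmon breakpoint slot while instructions are still plain 4-byte
 * words: two instructions per slot, so natural 4-byte alignment is
 * enough and BPT_SIZE works out to 8.
 */
struct xmon_bpt_slot_sketch {
	unsigned int replaced_instr;	/* instruction displaced by the breakpoint */
	unsigned int trap_instr;	/* trap placed after it */
};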
>
> >>
> >>
> >>> diff --git a/arch/powerpc/xmon/xmon_bpts.h b/arch/powerpc/xmon/xmon_bpts.h
> >>> new file mode 100644
> >>> index ..b7e94375db86
> >>> --- /dev/null
> >>> +++ b/arch/powerpc/xmon/xmon_bpts.h
> >>> @@ -0,0 +1,14 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 */
> >>> +#ifndef XMON_BPTS_H
> >>> +#define XMON_BPTS_H
> >>> +
> >>> +#define NBPTS256
> >>> +#ifndef __ASSEMBLY__
> >>> +#define BPT_S

Re: [PATCH v6 00/28] Initial Prefixed Instruction support

2020-04-28 Thread Jordan Niethe
On Tue, Apr 28, 2020 at 8:07 PM Balamuruhan S  wrote:
>
> On Tue, 2020-04-28 at 11:57 +1000, Jordan Niethe wrote:
> > A future revision of the ISA will introduce prefixed instructions. A
> > prefixed instruction is composed of a 4-byte prefix followed by a
> > 4-byte suffix.
> >
> > All prefixes have the major opcode 1. A prefix will never be a valid
> > word instruction. A suffix may be an existing word instruction or a
> > new instruction.
> >
> > This series enables prefixed instructions and extends the instruction
> > emulation to support them. Then the places where prefixed instructions
> > might need to be emulated are updated.
>
> Hi Jordan,
>
> I tried to test Kprobes with prefixed instruction on this patchset and
> observed that kprobe/uprobe enablement patches are missing from v4.
> Till v3 they were available,
>
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-11-jniet...@gmail.com/
>
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-12-jniet...@gmail.com/
>
> Was it dropped for some dependency or other reason? Or do you plan to include it in 
> the next
> version?
V4 was when I introduced the data type for prefixed instructions; that,
along with the helper functions, covered a lot of what those enablement
patches were doing.
All of the stuff in that uprobes patch now gets done without needing a
specific patch. I'll add back in the checks for prefixed
instructions in arch_prepare_kprobe().
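
A rough sketch (not the actual patch) of the kind of check meant here,
assuming the series' struct ppc_inst together with ppc_inst_read() and
ppc_inst_prefixed() helpers as named below:

/*
 * Sketch only: arch_prepare_kprobe() could refuse to place a probe on
 * the suffix of a prefixed instruction, since a prefix always has major
 * opcode 1 and is never a valid word instruction on its own.
 */
static int kprobe_addr_is_prefix_suffix(kprobe_opcode_t *addr)
{
	struct ppc_inst prev = ppc_inst_read((struct ppc_inst *)(addr - 1));

	return ppc_inst_prefixed(prev);
}

arch_prepare_kprobe() would then return -EINVAL when this helper reports
that the requested address is the second word of a prefixed instruction.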

> please let me know if you need help on it.
>
> -- Bala
>
> >
> > v6 is based on feedback from Balamuruhan Suriyakumar, Alistair Popple,
> > Christophe Leroy and Segher Boessenkool.
> > The major changes:
> > - Use the instruction type in more places that had been missed before
> > - Fix issues with ppc32
> > - Introduce new self tests for code patching and feature fixups
> >
> > v5 is based on feedback from Nick Piggins, Michael Ellerman, Balamuruhan
> > Suriyakumar and Alistair Popple.
> > The major changes:
> > - The ppc instruction type is now a struct
> > - Series now just based on next
> > - ppc_inst_masked() dropped
> > - Space for xmon breakpoints allocated in an assembly file
> > - "Add prefixed instructions to instruction data type" patch separated 
> > in
> >   to smaller patches
> > - Calling convention for create_branch() is changed
> > - Some places which had not been updated to use the data type are now
> > updated
> >
> > v4 is based on feedback from Nick Piggins, Christophe Leroy and Daniel
> > Axtens.
> > The major changes:
> > - Move xmon breakpoints from data section to text section
> > - Introduce a data type for instructions on powerpc
> >
> > v3 is based on feedback from Christophe Leroy. The major changes:
> > - Completely replacing store_inst() with patch_instruction() in
> >   xmon
> > - Improve implementation of mread_instr() to not use mread().
> > - Base the series on top of
> >   https://patchwork.ozlabs.org/patch/1232619/ as this will effect
> >   kprobes.
> > - Some renaming and simplification of conditionals.
> >
> > v2 incorporates feedback from Daniel Axtens and Balamuruhan
> > S. The major changes are:
> > - Squashing together all commits about SRR1 bits
> > - Squashing all commits for supporting prefixed load stores
> > - Changing abbreviated references to sufx/prfx -> suffix/prefix
> > - Introducing macros for returning the length of an instruction
> > - Removing sign extension flag from pstd/pld in sstep.c
> > - Dropping patch  "powerpc/fault: Use analyse_instr() to check for
> >   store with updates to sp" from the series, it did not really fit
> >   with prefixed enablement in the first place and as reported by Greg
> >   Kurz did not work correctly.
> >
> > Alistair Popple (1):
> >   powerpc: Enable Prefixed Instructions
> >
> > Jordan Niethe (27):
> >   powerpc/xmon: Remove store_inst() for patch_instruction()
> >   powerpc/xmon: Move breakpoint instructions to own array
> >   powerpc/xmon: Move breakpoints to text section
> >   powerpc/xmon: Use bitwise calculations in_breakpoint_table()
> >   powerpc: Change calling convention for create_branch() et. al.
> >   powerpc: Use a macro for creating instructions from u32s
> >   powerpc: Use an accessor for instructions
> >   powerpc: Use a function for getting the instruction op code
> >   powerpc: Use a function for byte swapping instructions
> >   powerpc: Introduce functions for instruction equality
> >   powerpc: Use a datatype for instructions
> >   powerpc: Use a function for reading instructions
> >   powerpc: Add a probe_user_read_inst() function
> >   powerpc: Add a probe_kernel_read_inst() function
> >   powerpc/kprobes: Use patch_instruction()
> >   powerpc: Define and use __get_user_instr{,inatomic}()
> >   powerpc: Introduce a function for reporting instruction length
> >   powerpc/xmon: Use a function for reading instructions
> >   powerpc/xmon: Move in

Re: [PATCH v6 10/28] powerpc: Introduce functions for instruction equality

2020-04-28 Thread Jordan Niethe
On Wed, Apr 29, 2020 at 11:59 AM Alistair Popple  wrote:
>
> There seems to be a minor typo which breaks compilation when
> CONFIG_MPROFILE_KERNEL is not enabled. See the fix below.
>
> ---
>  arch/powerpc/kernel/trace/ftrace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/
> ftrace.c
> index a6064e1977ca..0ad2c9d4ab49 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -499,7 +499,7 @@ expected_nop_sequence(void *ip, struct ppc_inst op0,
> struct ppc_inst op1)
>  * The load offset is different depending on the ABI. For simplicity
>  * just mask it out when doing the compare.
>  */
> -   if ((!ppc_inst_equal(op0), ppc_inst(0x4808)) || 
> (ppc_inst_val(op1) &
> 0x) != 0xe841)
> +   if ((!ppc_inst_equal(op0, ppc_inst(0x4808))) || 
> (ppc_inst_val(op1) &
> 0x) != 0xe841)
> return 0;
> return 1;
>  }
Thank you.
> --
> 2.20.1
>
>
>
>


Re: [PATCH v6 11/28] powerpc: Use a datatype for instructions

2020-04-28 Thread Jordan Niethe
On Wed, Apr 29, 2020 at 12:02 PM Alistair Popple  wrote:
>
> Hi Jordan,
>
> I needed the below fix for building with CONFIG_STRICT_KERNEL_RWX enabled.
> Hopefully it's correct, I have not yet had a chance to test it beyond building
> it.
Thanks, I'll get that working.
>
> - Alistair
>
> ---
>  arch/powerpc/lib/code-patching.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-
> patching.c
> index ad5754c5f007..a8c8ffdb1ccd 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -166,8 +166,8 @@ static int do_patch_instruction(struct ppc_inst *addr,
> struct ppc_inst instr)
> goto out;
> }
>
> -   patch_addr = (unsigned int *)(text_poke_addr) +
> -   ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
> +   patch_addr = (struct ppc_inst *)(text_poke_addr) +
> +   ((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
Hmm, I think this might not give the expected result, with struct
ppc_inst being 8 bytes long, compared to unsigned int being 4 bytes long.
So the pointer arithmetic will not give the patch_addr you would be expecting.
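
A standalone illustration (not kernel code, values made up) of why that
cast changes the resulting address: pointer arithmetic scales by the size
of the pointee.

#include <stdio.h>

/* 8 bytes, mirroring the series' 64-bit struct ppc_inst */
struct ppc_inst { unsigned int val; unsigned int suffix; };

int main(void)
{
	unsigned char page[4096];
	unsigned long word_offset = 3;	/* e.g. (kaddr & ~PAGE_MASK) / sizeof(unsigned int) */

	void *as_uint = (unsigned int *)page + word_offset;	/* page + 12 */
	void *as_inst = (struct ppc_inst *)page + word_offset;	/* page + 24 */

	printf("uint*: +%ld bytes, ppc_inst*: +%ld bytes\n",
	       (long)((unsigned char *)as_uint - page),
	       (long)((unsigned char *)as_inst - page));
	return 0;
}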

>
> __patch_instruction(addr, instr, patch_addr);
>
> --
> 2.20.1
>
>
>
>


Re: Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1

2020-04-28 Thread Nicholas Piggin
Excerpts from Christian Zigotzky's message of April 29, 2020 2:53 pm:
> Hi All,
> 
> The issue still exists in the RC3. (kernel config attached)
> 
> Please help me to fix this issue.

Huh, looks like maybe early_init_mmu() got uninlined because the 
compiler decided it was unlikely.

Does this fix it?

Thanks,
Nick
--

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index bb3deb76c951..3ffe5f967483 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -208,7 +208,7 @@ void hash__early_init_devtree(void);
 void radix__early_init_devtree(void);
 extern void hash__early_init_mmu(void);
 extern void radix__early_init_mmu(void);
-static inline void early_init_mmu(void)
+static inline void __init early_init_mmu(void)
 {
if (radix_enabled())
return radix__early_init_mmu();



Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm

2020-04-28 Thread Christophe Leroy




On 29/04/2020 at 04:05, Christopher M. Riedl wrote:

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.

Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 
---
  arch/powerpc/include/asm/debug.h   |  1 +
  arch/powerpc/include/asm/mmu_context.h | 54 ++
  arch/powerpc/kernel/process.c  |  5 +++
  3 files changed, 60 insertions(+)

diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index 7756026b95ca..b945bc16c932 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) 
{ return 0; }
  static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; }
  #endif
  
+void __get_breakpoint(struct arch_hw_breakpoint *brk);

  void __set_breakpoint(struct arch_hw_breakpoint *brk);
  bool ppc_breakpoint_available(void);
  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 360367c579de..57a8695fe63f 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include 
  
  /*

   * Most if the context management is out of line
@@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
return 0;
  }
  
+struct temp_mm {

+   struct mm_struct *temp;
+   struct mm_struct *prev;
+   bool is_kernel_thread;
+   struct arch_hw_breakpoint brk;
+};
+
+static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
+{
+   temp_mm->temp = mm;
+   temp_mm->prev = NULL;
+   temp_mm->is_kernel_thread = false;
+   memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
+}
+
+static inline void use_temporary_mm(struct temp_mm *temp_mm)
+{
+   lockdep_assert_irqs_disabled();
+
+   temp_mm->is_kernel_thread = current->mm == NULL;
+   if (temp_mm->is_kernel_thread)
+   temp_mm->prev = current->active_mm;
+   else
+   temp_mm->prev = current->mm;
+
+   /*
+* Hash requires a non-NULL current->mm to allocate a userspace address
+* when handling a page fault. Does not appear to hurt in Radix either.
+*/
+   current->mm = temp_mm->temp;
+   switch_mm_irqs_off(NULL, temp_mm->temp, current);
+
+   if (ppc_breakpoint_available()) {
+   __get_breakpoint(&temp_mm->brk);
+   if (temp_mm->brk.type != 0)
+   hw_breakpoint_disable();
+   }
+}
+
+static inline void unuse_temporary_mm(struct temp_mm *temp_mm)


Not sure "unuse" is the best naming, although I don't have a better 
suggestion at the moment. If we are not using the temporary_mm anymore, what are 
we using now?



+{
+   lockdep_assert_irqs_disabled();
+
+   if (temp_mm->is_kernel_thread)
+   current->mm = NULL;
+   else
+   current->mm = temp_mm->prev;
+   switch_mm_irqs_off(NULL, temp_mm->prev, current);
+
+   if (ppc_breakpoint_available() && temp_mm->brk.type != 0)
+   __set_breakpoint(&temp_mm->brk);
+}
+
  #endif /* __KERNEL__ */
  #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 9c21288f8645..ec4cf890d92c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct 
arch_hw_breakpoint *brk)
return 0;
  }
  
+void __get_breakpoint(struct arch_hw_breakpoint *brk)

+{
+   memcpy(brk, this_cpu_ptr(&current_brk), sizeof(*brk));
+}
+
  void __set_breakpoint(struct arch_hw_breakpoint *brk)
  {
	memcpy(this_cpu_ptr(&current_brk), brk, sizeof(*brk));



Christophe


Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm

2020-04-28 Thread Christophe Leroy




On 29/04/2020 at 04:05, Christopher M. Riedl wrote:

x86 supports the notion of a temporary mm which restricts access to
temporary PTEs to a single CPU. A temporary mm is useful for situations
where a CPU needs to perform sensitive operations (such as patching a
STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
said mappings to other CPUs. A side benefit is that other CPU TLBs do
not need to be flushed when the temporary mm is torn down.

Mappings in the temporary mm can be set in the userspace portion of the
address-space.

Interrupts must be disabled while the temporary mm is in use. HW
breakpoints, which may have been set by userspace as watchpoints on
addresses now within the temporary mm, are saved and disabled when
loading the temporary mm. The HW breakpoints are restored when unloading
the temporary mm. All HW breakpoints are indiscriminately disabled while
the temporary mm is in use.


Why do we need to use a temporary mm all the time ?

Doesn't each CPU have its own mm already ? Only the upper address space 
is shared between all mm's but each mm has its own lower address space, 
at least when it is running a user process. Why not just use that mm ? 
As we are mapping then unmapping with interrupts disabled, there is no 
risk at all that the user starts running while the patch page is mapped, 
so I'm not sure why switching to a temporary mm is needed.





Based on x86 implementation:

commit cefa929c034e
("x86/mm: Introduce temporary mm structs")

Signed-off-by: Christopher M. Riedl 


Christophe


Re: [RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching

2020-04-28 Thread Christophe Leroy




On 29/04/2020 at 04:05, Christopher M. Riedl wrote:

Currently, code patching a STRICT_KERNEL_RWX kernel exposes the temporary
mappings to other CPUs. These mappings should be kept local to the CPU
doing the patching. Use the pre-initialized temporary mm and patching
address for this purpose. Also add a check after patching to ensure the
patch succeeded.

Use the KUAP functions on non-BOOK3S_64 platforms since the temporary
mapping for patching uses a userspace address (to keep the mapping
local). On BOOK3S_64 platforms hash does not implement KUAP and on radix
the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR
(KUAP) protection is ignored (see PowerISA v3.0b, Fig. 35).

Based on x86 implementation:

commit b3fd8e83ada0
("x86/alternatives: Use temporary mm for text poking")

Signed-off-by: Christopher M. Riedl 
---
  arch/powerpc/lib/code-patching.c | 149 ---
  1 file changed, 55 insertions(+), 94 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 259c19480a85..26f06cdb5d7e 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  #include 
+#include 
  
  static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,

   unsigned int *patch_addr)
@@ -72,101 +73,58 @@ void __init poking_init(void)
pte_unmap_unlock(ptep, ptl);
  }
  
-static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);

-
-static int text_area_cpu_up(unsigned int cpu)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area(PAGE_SIZE, VM_ALLOC);
-   if (!area) {
-   WARN_ONCE(1, "Failed to create text area for cpu %d\n",
-   cpu);
-   return -1;
-   }
-   this_cpu_write(text_poke_area, area);
-
-   return 0;
-}
-
-static int text_area_cpu_down(unsigned int cpu)
-{
-   free_vm_area(this_cpu_read(text_poke_area));
-   return 0;
-}
-
-/*
- * Run as a late init call. This allows all the boot time patching to be done
- * simply by patching the code, and then we're called here prior to
- * mark_rodata_ro(), which happens after all init calls are run. Although
- * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
- * it as being preferable to a kernel that will crash later when someone tries
- * to use patch_instruction().
- */
-static int __init setup_text_poke_area(void)
-{
-   BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
-   "powerpc/text_poke:online", text_area_cpu_up,
-   text_area_cpu_down));
-
-   return 0;
-}
-late_initcall(setup_text_poke_area);
+struct patch_mapping {
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+   struct temp_mm temp_mm;
+};
  
  /*

   * This can be called for kernel text or a module.
   */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
+static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
  {
-   unsigned long pfn;
-   int err;
+   struct page *page;
+   pte_t pte;
+   pgprot_t pgprot;
  
  	if (is_vmalloc_addr(addr))

-   pfn = vmalloc_to_pfn(addr);
+   page = vmalloc_to_page(addr);
else
-   pfn = __pa_symbol(addr) >> PAGE_SHIFT;
+   page = virt_to_page(addr);
  
-	err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);

+   if (radix_enabled())
+   pgprot = PAGE_KERNEL;
+   else
+   pgprot = PAGE_SHARED;
  
-	pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err);

-   if (err)
+   patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr,
+&patch_mapping->ptl);
+   if (unlikely(!patch_mapping->ptep)) {
+   pr_warn("map patch: failed to allocate pte for patching\n");
return -1;
+   }
+
+   pte = mk_pte(page, pgprot);
+   if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+   pte = pte_mkdirty(pte);


Why only when CONFIG_PPC_BOOK3S_64 is not set ?

PAGE_KERNEL should already be dirty, so making it dirty all the time 
shouldn't hurt.



+   set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte);
+
+   init_temp_mm(&patch_mapping->temp_mm, patching_mm);
+   use_temporary_mm(&patch_mapping->temp_mm);
  
  	return 0;

  }
  
-static inline int unmap_patch_area(unsigned long addr)

+static void unmap_patch(struct patch_mapping *patch_mapping)
  {
-   pte_t *ptep;
-   pmd_t *pmdp;
-   pud_t *pudp;
-   pgd_t *pgdp;
-
-   pgdp = pgd_offset_k(addr);
-   if (unlikely(!pgdp))
-   return -EINVAL;
-
-   pudp = pud_offset(pgdp, addr);
-   if (unlikely(!pudp))
-   return -EINVAL;
-
-   pmdp = pmd_offset(pudp, addr);
-   if (unlikely(!pmdp))
-   return -EINVAL;
-
-  

[PATCH 1/2] tools/perf: set no_auxtrace for powerpc

2020-04-28 Thread Anju T Sudhakar
x86/perf_regs.h is included by util/intel-pt.c, which will get compiled
when building perf on powerpc. Since x86/perf_regs.h has
`PERF_EXTENDED_REG_MASK` defined, defining `PERF_EXTENDED_REG_MASK` for
powerpc to add support for perf extended regs will result in a perf build
error on powerpc.

Currently the powerpc architecture does not have support for auxtrace. So as
a workaround for this issue, set NO_AUXTRACE for powerpc.

Signed-off-by: Anju T Sudhakar 
---
 tools/perf/arch/powerpc/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index e58d00d62f02..9ebb5f513605 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -3,6 +3,7 @@ ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
 
+NO_AUXTRACE := 1
 HAVE_KVM_STAT_SUPPORT := 1
 PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
 PERF_HAVE_JITDUMP := 1
-- 
2.20.1



[PATCH 0/2] powerpc/perf: Add support for perf extended regs in powerpc

2020-04-28 Thread Anju T Sudhakar
Patch set to add support for perf extended register capability in
powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to
indicate a PMU which supports extended registers. The generic code
defines the mask of extended registers as 0 for unsupported architectures.

patch 2/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the
values of mmcr0,mmcr1,mmcr2 for POWER9.
   
x86/perf_regs.h is included by util/intel-pt.c, which will get compiled   
when building perf on powerpc. Since x86/perf_regs.h has
`PERF_EXTENDED_REG_MASK` defined, defining `PERF_EXTENDED_REG_MASK` for
powerpc to add support for perf extended regs will result in a perf build
error on powerpc. Currently the powerpc architecture does not have support
for auxtrace. So as a workaround for this issue, patch 1/2 sets
NO_AUXTRACE for powerpc. (Any other solutions are welcome.)

Patch 2/2 also adds the extended regs to sample_reg_mask on the tool side, for use
with the `-I?` option.

Anju T Sudhakar (2):
  tools/perf: set no_auxtrace for powerpc
  powerpc/perf: Add support for outputting extended regs in perf
intr_regs

 arch/powerpc/include/asm/perf_event_server.h  |  5 +++
 arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
 arch/powerpc/perf/core-book3s.c   |  1 +
 arch/powerpc/perf/perf_regs.c | 29 ++--
 arch/powerpc/perf/power9-pmu.c|  1 +
 .../arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
 tools/perf/arch/powerpc/Makefile  |  1 +
 tools/perf/arch/powerpc/include/perf_regs.h   |  6 +++-
 tools/perf/arch/powerpc/util/perf_regs.c  | 33 +++
 9 files changed, 96 insertions(+), 6 deletions(-)

-- 
2.20.1



[PATCH 2/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-04-28 Thread Anju T Sudhakar
The capability flag PERF_PMU_CAP_EXTENDED_REGS is used to indicate a
PMU which supports extended registers. The generic code defines the mask
of extended registers as 0 for unsupported architectures.

Add support for extended registers in the POWER9 architecture. For POWER9,
the extended registers are mmcr0, mmcr1 and mmcr2.

REG_RESERVED mask is redefined to accommodate the extended registers.

With patch:


# perf record -I?
available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14
r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip
msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0
mmcr1 mmcr2

# perf record -I ls
# perf script -D

PERF_RECORD_SAMPLE(IP, 0x1): 9019/9019: 0 period: 1 addr: 0
... intr regs: mask 0x ABI 64-bit
 r0    0xc011b12c
 r1    0xc03f9a98b930
 r2    0xc1a32100
 r3    0xc03f8fe9a800
 r4    0xc03fd181
 r5    0x3e32557150
 r6    0xc03f9a98b908
 r7    0xffc1cdae06ac
 r8    0x818
[...]
 r31   0xc03ffd047230
 nip   0xc011b2c0
 msr   0x90009033
 orig_r3 0xc011b21c
 ctr   0xc0119380
 link  0xc011b12c
 xer   0x0
 ccr   0x2800
 softe 0x1
 trap  0xf00
 dar   0x0
 dsisr 0x800
 sier  0x0
 mmcra 0x800
 mmcr0 0x82008090
 mmcr1 0x1e00
 mmcr2 0x0
 ... thread: perf:9019

Signed-off-by: Anju T Sudhakar 
---
 arch/powerpc/include/asm/perf_event_server.h  |  5 +++
 arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
 arch/powerpc/perf/core-book3s.c   |  1 +
 arch/powerpc/perf/perf_regs.c | 29 ++--
 arch/powerpc/perf/power9-pmu.c|  1 +
 .../arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
 tools/perf/arch/powerpc/include/perf_regs.h   |  6 +++-
 tools/perf/arch/powerpc/util/perf_regs.c  | 33 +++
 8 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 3e9703f44c7c..1d15953bd99e 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -55,6 +55,11 @@ struct power_pmu {
int *blacklist_ev;
/* BHRB entries in the PMU */
int bhrb_nr;
+   /*
+* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
+* the pmu supports extended perf regs capability
+*/
+   int capabilities;
 };
 
 /*
diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
b/arch/powerpc/include/uapi/asm/perf_regs.h
index f599064dd8dc..604b831378fe 100644
--- a/arch/powerpc/include/uapi/asm/perf_regs.h
+++ b/arch/powerpc/include/uapi/asm/perf_regs.h
@@ -48,6 +48,17 @@ enum perf_event_powerpc_regs {
PERF_REG_POWERPC_DSISR,
PERF_REG_POWERPC_SIER,
PERF_REG_POWERPC_MMCRA,
-   PERF_REG_POWERPC_MAX,
+   /* Extended registers */
+   PERF_REG_POWERPC_MMCR0,
+   PERF_REG_POWERPC_MMCR1,
+   PERF_REG_POWERPC_MMCR2,
+   PERF_REG_EXTENDED_MAX,
+   /* Max regs without the extended regs */
+   PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
 };
+
+#define PERF_REG_PMU_MASK  ((1ULL << PERF_REG_POWERPC_MAX) - 1)
+#define PERF_REG_EXTENDED_MASK  (((1ULL << (PERF_REG_EXTENDED_MAX))\
+   - 1) - PERF_REG_PMU_MASK)
+
 #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3dcfecf858f3..f56b77800a7b 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu)
 
power_pmu.attr_groups = ppmu->attr_groups;
 
+   power_pmu.capabilities |= (ppmu->capabilities & 
PERF_PMU_CAP_EXTENDED_REGS);
 #ifdef MSR_HV
/*
 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
index a213a0aa5d25..57aa02568caf 100644
--- a/arch/powerpc/perf/perf_regs.c
+++ b/arch/powerpc/perf/perf_regs.c
@@ -15,7 +15,8 @@
 
 #define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
 
-#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
+#define REG_RESERVED (~(PERF_REG_EXTENDED_MASK) &  \
+   (~((1ULL << PERF_REG_POWERPC_MAX) - 1)))
 
 static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
PT_REGS_OFFSET(PERF_REG_POWERPC_R0,  gpr[0]),
@@ -69,10 +70,22 @@ static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
PT_REGS_OFFSET(PERF_REG_POWERPC_MMCRA, dsisr),
 };
 
+/* Function to return the extended register values */
+static u64 get_ext_regs_value(int idx)
+{
+   switch (idx) {
+   case PERF_REG_POWERPC_MMCR0:
+   return mfspr(SPRN_MMCR0);

Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Christoph Hellwig
On Wed, Apr 29, 2020 at 09:36:30AM +0800, Jeremy Kerr wrote:
> Hi Christoph,
> 
> > FYI, these little hunks reduce the difference to my version, maybe
> > you can fold them in?
> 
> Sure, no problem.
> 
> How do you want to coordinate these? I can submit mine through mpe, but
> that may make it tricky to synchronise with your changes. Or, you can
> include this change in your series if you prefer.

Maybe you can feed your patch directly to Linus through Michael
ASAP, and I'll wait for that before resubmitting this series?


Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Christoph Hellwig
On Wed, Apr 29, 2020 at 08:05:53AM +0200, Christoph Hellwig wrote:
> On Wed, Apr 29, 2020 at 09:36:30AM +0800, Jeremy Kerr wrote:
> > Hi Christoph,
> > 
> > > FYI, these little hunks reduce the difference to my version, maybe
> > > you can fold them in?
> > 
> > Sure, no problem.
> > 
> > How do you want to coordinate these? I can submit mine through mpe, but
> > that may make it tricky to synchronise with your changes. Or, you can
> > include this change in your series if you prefer.
> 
> Maybe you can feed your patch directly to Linus through Michael
> ASAP, and I'll wait for that before resubmitting this series?

Btw, turns out my fold patch didn't actually compile without the
rebased patch on top, sorry.  Here is the proper one:

diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index c62d77ddaf7d3..b4e1ef650b406 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -2107,7 +2107,6 @@ static const struct file_operations spufs_wbox_info_fops 
= {
 static void ___spufs_dma_info_read(struct spu_context *ctx,
struct spu_dma_info *info)
 {
-   struct mfc_cq_sr *qp, *spuqp;
int i;
 
info->dma_info_type = ctx->csa.priv2.spu_tag_status_query_RW;
@@ -2116,8 +2115,8 @@ static void ___spufs_dma_info_read(struct spu_context 
*ctx,
info->dma_info_stall_and_notify = ctx->csa.spu_chnldata_RW[25];
info->dma_info_atomic_command_status = ctx->csa.spu_chnldata_RW[27];
for (i = 0; i < 16; i++) {
-   qp = &info->dma_info_command_data[i];
-   spuqp = &ctx->csa.priv2.spuq[i];
+   struct mfc_cq_sr *qp = &info->dma_info_command_data[i];
+   struct mfc_cq_sr *spuqp = &ctx->csa.priv2.spuq[i];
 
qp->mfc_cq_data0_RW = spuqp->mfc_cq_data0_RW;
qp->mfc_cq_data1_RW = spuqp->mfc_cq_data1_RW;
@@ -2166,7 +2165,7 @@ static const struct file_operations spufs_dma_info_fops = 
{
 };
 
 static void ___spufs_proxydma_info_read(struct spu_context *ctx,
-   struct spu_proxydma_info *info)
+   struct spu_proxydma_info *info)
 {
int i;
 


Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Christoph Hellwig
And another one that should go on top of this one to address Al's other
compaint:

---
>From 1b7ced3de0b3a4addec61f61ac5278c3ff141657 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 22 Apr 2020 09:05:30 +0200
Subject: powerpc/spufs: stop using access_ok

Just use the proper non __-prefixed get/put_user variants where that is
not done yet.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/spufs/file.c | 42 +---
 1 file changed, 8 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/file.c 
b/arch/powerpc/platforms/cell/spufs/file.c
index b4e1ef650b406..cd7d10f27fad1 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -590,17 +590,12 @@ static ssize_t spufs_mbox_read(struct file *file, char 
__user *buf,
size_t len, loff_t *pos)
 {
struct spu_context *ctx = file->private_data;
-   u32 mbox_data, __user *udata;
+   u32 mbox_data, __user *udata = (void __user *)buf;
ssize_t count;
 
if (len < 4)
return -EINVAL;
 
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
-   udata = (void __user *)buf;
-
count = spu_acquire(ctx);
if (count)
return count;
@@ -616,7 +611,7 @@ static ssize_t spufs_mbox_read(struct file *file, char 
__user *buf,
 * but still need to return the data we have
 * read successfully so far.
 */
-   ret = __put_user(mbox_data, udata);
+   ret = put_user(mbox_data, udata);
if (ret) {
if (!count)
count = -EFAULT;
@@ -698,17 +693,12 @@ static ssize_t spufs_ibox_read(struct file *file, char 
__user *buf,
size_t len, loff_t *pos)
 {
struct spu_context *ctx = file->private_data;
-   u32 ibox_data, __user *udata;
+   u32 ibox_data, __user *udata = (void __user *)buf;
ssize_t count;
 
if (len < 4)
return -EINVAL;
 
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
-   udata = (void __user *)buf;
-
count = spu_acquire(ctx);
if (count)
goto out;
@@ -727,7 +717,7 @@ static ssize_t spufs_ibox_read(struct file *file, char 
__user *buf,
}
 
/* if we can't write at all, return -EFAULT */
-   count = __put_user(ibox_data, udata);
+   count = put_user(ibox_data, udata);
if (count)
goto out_unlock;
 
@@ -741,7 +731,7 @@ static ssize_t spufs_ibox_read(struct file *file, char 
__user *buf,
 * but still need to return the data we have
 * read successfully so far.
 */
-   ret = __put_user(ibox_data, udata);
+   ret = put_user(ibox_data, udata);
if (ret)
break;
}
@@ -836,17 +826,13 @@ static ssize_t spufs_wbox_write(struct file *file, const 
char __user *buf,
size_t len, loff_t *pos)
 {
struct spu_context *ctx = file->private_data;
-   u32 wbox_data, __user *udata;
+   u32 wbox_data, __user *udata = (void __user *)buf;
ssize_t count;
 
if (len < 4)
return -EINVAL;
 
-   udata = (void __user *)buf;
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
-   if (__get_user(wbox_data, udata))
+   if (get_user(wbox_data, udata))
return -EFAULT;
 
count = spu_acquire(ctx);
@@ -873,7 +859,7 @@ static ssize_t spufs_wbox_write(struct file *file, const 
char __user *buf,
/* write as much as possible */
for (count = 4, udata++; (count + 4) <= len; count += 4, udata++) {
int ret;
-   ret = __get_user(wbox_data, udata);
+   ret = get_user(wbox_data, udata);
if (ret)
break;
 
@@ -1982,9 +1968,6 @@ static ssize_t spufs_mbox_info_read(struct file *file, 
char __user *buf,
u32 stat, data;
int ret;
 
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
ret = spu_acquire_saved(ctx);
if (ret)
return ret;
@@ -2028,9 +2011,6 @@ static ssize_t spufs_ibox_info_read(struct file *file, 
char __user *buf,
u32 stat, data;
int ret;
 
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
ret = spu_acquire_saved(ctx);
if (ret)
return ret;
@@ -2082,9 +2062,6 @@ static ssize_t spufs_wbox_info_read(struct file *file, 
char __user *buf,
u32 data[ARRAY_SIZE(ctx->csa.spu_mailbox_data)];
int ret, count;
 
-   if (!access_ok(buf, len))
-   return -EFAULT;
-
ret = spu_acquire_saved(ctx);
if (ret)
return ret;
@@ -2143,9 +2120,6 @@ static ssize_t spufs_dma_info_read(struct file *fil

Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christophe Leroy




On 28/04/2020 at 21:56, Arnd Bergmann wrote:

I think I found a way to improve the x32 handling:

This is a simplification over Christoph's "[PATCH 2/7] signal: factor
copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the
x32 specifics in the common code to a single #ifdef/#endif check, in
order to keep it more readable for everyone else.

Christoph, if you like it, please fold into your patch.

Cc: Andrew Morton 
Cc: Alexander Viro 
Cc: Jeremy Kerr 
Cc: Arnd Bergmann 
Cc: "Eric W . Biederman" 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Arnd Bergmann 
---
  arch/x86/include/asm/compat.h | 10 ++
  arch/x86/kernel/signal.c  | 23 +++
  kernel/signal.c   | 15 ++-
  3 files changed, 35 insertions(+), 13 deletions(-)



[...]


diff --git a/kernel/signal.c b/kernel/signal.c
index 1a81602050b4..935facca4860 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3318,29 +3318,18 @@ void copy_siginfo_to_external32(struct compat_siginfo 
*to,
}
  }
  
+#ifndef CONFIG_X86_X32_ABI


Can it be declared __weak instead of enclosing it in an #ifndef ?
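
Just to illustrate that suggestion with an untested sketch (not part of
the patch): the generic definition would become __weak, and x86 would then
provide its own strong copy_siginfo_to_user32(), keeping the x32 handling
entirely in arch code:

/* kernel/signal.c: sketch of the __weak variant */
int __weak copy_siginfo_to_user32(struct compat_siginfo __user *to,
				  const struct kernel_siginfo *from)
{
	struct compat_siginfo new;

	copy_siginfo_to_external32(&new, from);
	if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
		return -EFAULT;
	return 0;
}

/*
 * arch/x86/kernel/signal.c would then carry the strong definition that
 * also fills in the x32 SIGCHLD time fields.
 */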


  int copy_siginfo_to_user32(struct compat_siginfo __user *to,
   const struct kernel_siginfo *from)
-#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION)
-{
-   return __copy_siginfo_to_user32(to, from, in_x32_syscall());
-}
-int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
-const struct kernel_siginfo *from, bool x32_ABI)
-#endif
  {
struct compat_siginfo new;
  
  	copy_siginfo_to_external32(&new, from);

-#ifdef CONFIG_X86_X32_ABI
-   if (x32_ABI && from->si_signo == SIGCHLD) {
-   new._sifields._sigchld_x32._utime = from->si_utime;
-   new._sifields._sigchld_x32._stime = from->si_stime;
-   }
-#endif
if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
return -EFAULT;
return 0;
  }
+#endif
  
  static int post_copy_siginfo_from_user32(kernel_siginfo_t *to,

 const struct compat_siginfo *from)



Christophe


[PATCH] powerpc/64: refactor interrupt exit irq disabling sequence

2020-04-28 Thread Nicholas Piggin
The same complicated sequence for juggling EE, RI, soft mask, and
irq tracing is repeated 3 times, tidy these up into one function.

This differs quite a bit between subarchitectures, so this makes
the ppc32 port cleaner as well.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/syscall_64.c | 58 +++-
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index c74295a7765b..8f7e268f3294 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -101,6 +101,31 @@ notrace long system_call_exception(long r3, long r4, long 
r5,
return f(r3, r4, r5, r6, r7, r8);
 }
 
+/*
+ * local irqs must be disabled. Returns false if the caller must re-enable
+ * them, check for new work, and try again.
+ */
+static notrace inline bool prep_irq_for_enabled_exit(void)
+{
+   /* This must be done with RI=1 because tracing may touch vmaps */
+   trace_hardirqs_on();
+
+   /* This pattern matches prep_irq_for_idle */
+   __hard_EE_RI_disable();
+   if (unlikely(lazy_irq_pending())) {
+   /* Took an interrupt, may have more exit work to do. */
+   __hard_RI_enable();
+   trace_hardirqs_off();
+   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+
+   return false;
+   }
+   local_paca->irq_happened = 0;
+   irq_soft_mask_set(IRQS_ENABLED);
+
+   return true;
+}
+
 /*
  * This should be called after a syscall returns, with r3 the return value
  * from the syscall. If this function returns non-zero, the system call
@@ -184,21 +209,10 @@ notrace unsigned long syscall_exit_prepare(unsigned long 
r3,
}
}
 
-   /* This must be done with RI=1 because tracing may touch vmaps */
-   trace_hardirqs_on();
-
-   /* This pattern matches prep_irq_for_idle */
-   __hard_EE_RI_disable();
-   if (unlikely(lazy_irq_pending())) {
-   __hard_RI_enable();
-   trace_hardirqs_off();
-   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+   if (unlikely(!prep_irq_for_enabled_exit())) {
local_irq_enable();
-   /* Took an interrupt, may have more exit work to do. */
goto again;
}
-   local_paca->irq_happened = 0;
-   irq_soft_mask_set(IRQS_ENABLED);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
local_paca->tm_scratch = regs->msr;
@@ -262,19 +276,11 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
}
}
 
-   trace_hardirqs_on();
-   __hard_EE_RI_disable();
-   if (unlikely(lazy_irq_pending())) {
-   __hard_RI_enable();
-   trace_hardirqs_off();
-   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+   if (unlikely(!prep_irq_for_enabled_exit())) {
local_irq_enable();
local_irq_disable();
-   /* Took an interrupt, may have more exit work to do. */
goto again;
}
-   local_paca->irq_happened = 0;
-   irq_soft_mask_set(IRQS_ENABLED);
 
 #ifdef CONFIG_PPC_BOOK3E
if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) {
@@ -332,13 +338,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
}
}
 
-   trace_hardirqs_on();
-   __hard_EE_RI_disable();
-   if (unlikely(lazy_irq_pending())) {
-   __hard_RI_enable();
-   irq_soft_mask_set(IRQS_ALL_DISABLED);
-   trace_hardirqs_off();
-   local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
+   if (unlikely(!prep_irq_for_enabled_exit())) {
/*
 * Can't local_irq_restore to replay if we were in
 * interrupt context. Must replay directly.
@@ -352,8 +352,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
/* Took an interrupt, may have more exit work to do. */
goto again;
}
-   local_paca->irq_happened = 0;
-   irq_soft_mask_set(IRQS_ENABLED);
} else {
/* Returning to a kernel context with local irqs disabled. */
__hard_EE_RI_disable();
-- 
2.23.0



[PATCH 0/6] assorted kuap fixes

2020-04-28 Thread Nicholas Piggin
Here's a bunch of fixes I collected, and some that Aneesh needs for
his kuap on hash series.

Nicholas Piggin (6):
  powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit
code
  missing isync
  powerpc/64/kuap: interrupt exit kuap restore add missing isync,
conditionally restore AMR
  powerpc/64/kuap: restore AMR in system reset exception
  powerpc/64/kuap: restore AMR in fast_interrupt_return
  powerpc/64/kuap: conditionally restore AMR in kuap_restore_amr asm

 .../powerpc/include/asm/book3s/64/kup-radix.h | 45 ---
 arch/powerpc/kernel/entry_64.S| 10 +++--
 arch/powerpc/kernel/exceptions-64s.S  |  3 +-
 arch/powerpc/kernel/syscall_64.c  | 26 +++
 4 files changed, 63 insertions(+), 21 deletions(-)

-- 
2.23.0



[PATCH 1/6] powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code

2020-04-28 Thread Nicholas Piggin
Any kind of WARN causes a program check that will crash with
unrecoverable exception if it occurs when RI is clear.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/syscall_64.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 8f7e268f3294..a37c7717424f 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -35,6 +35,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
BUG_ON(!FULL_REGS(regs));
BUG_ON(regs->softe != IRQS_ENABLED);
 
+   kuap_check_amr();
+
account_cpu_user_entry();
 
 #ifdef CONFIG_PPC_SPLPAR
@@ -47,8 +49,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
}
 #endif
 
-   kuap_check_amr();
-
/*
 * This is not required for the syscall exit path, but makes the
 * stack frame look nicer. If this was initialised in the first stack
@@ -142,6 +142,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
unsigned long ti_flags;
unsigned long ret = 0;
 
+   kuap_check_amr();
+
regs->result = r3;
 
/* Check whether the syscall is issued inside a restartable sequence */
@@ -218,8 +220,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
local_paca->tm_scratch = regs->msr;
 #endif
 
-   kuap_check_amr();
-
account_cpu_user_exit();
 
return ret;
@@ -242,6 +242,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
BUG_ON(!FULL_REGS(regs));
BUG_ON(regs->softe != IRQS_ENABLED);
 
+   kuap_check_amr();
+
local_irq_save(flags);
 
 again:
@@ -298,8 +300,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
local_paca->tm_scratch = regs->msr;
 #endif
 
-   kuap_check_amr();
-
account_cpu_user_exit();
 
return ret;
@@ -319,6 +319,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
BUG_ON(regs->msr & MSR_PR);
BUG_ON(!FULL_REGS(regs));
 
+   kuap_check_amr();
+
if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
ret = 1;
-- 
2.23.0



[PATCH 2/6] missing isync

2020-04-28 Thread Nicholas Piggin
---
 arch/powerpc/include/asm/book3s/64/kup-radix.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 3bcef989a35d..8dc5f292b806 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -16,7 +16,9 @@
 #ifdef CONFIG_PPC_KUAP
BEGIN_MMU_FTR_SECTION_NESTED(67)
ld  \gpr, STACK_REGS_KUAP(r1)
+   isync
mtspr   SPRN_AMR, \gpr
+   /* No isync required, see kuap_restore_amr() */
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
 #endif
 .endm
@@ -62,8 +64,15 @@
 
 static inline void kuap_restore_amr(struct pt_regs *regs)
 {
-   if (mmu_has_feature(MMU_FTR_RADIX_KUAP))
+   if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
+   isync();
mtspr(SPRN_AMR, regs->kuap);
+   /*
+* No isync required here because we are about to rfi
+* back to previous context before any user accesses
+* would be made, which is a CSI.
+*/
+   }
 }
 
 static inline void kuap_check_amr(void)
-- 
2.23.0



[PATCH 3/6] powerpc/64/kuap: interrupt exit kuap restore add missing isync, conditionally restore AMR

2020-04-28 Thread Nicholas Piggin
This fixes a missing isync before the mtspr(AMR), which ensures previous
memory accesses execute before the mtspr, so they can't slip past the
AMR check and access user memory if we are returning to a context where
kuap is allowed.

The AMR restore is made conditional, and only done if the AMR must change, which
should be the less common case on most workloads (though kernel page
faults on uaccess could be frequent, this doesn't significantly slow
down that case).

Signed-off-by: Nicholas Piggin 
---
 .../powerpc/include/asm/book3s/64/kup-radix.h | 36 ++-
 arch/powerpc/kernel/syscall_64.c  | 14 +---
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index 8dc5f292b806..ec8970958a26 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -62,19 +62,32 @@
 #include 
 #include 
 
-static inline void kuap_restore_amr(struct pt_regs *regs)
+static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
 {
if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
-   isync();
-   mtspr(SPRN_AMR, regs->kuap);
-   /*
-* No isync required here because we are about to rfi
-* back to previous context before any user accesses
-* would be made, which is a CSI.
-*/
+   if (unlikely(regs->kuap != amr)) {
+   isync();
+   mtspr(SPRN_AMR, regs->kuap);
+   /*
+* No isync required here because we are about to rfi
+* back to previous context before any user accesses
+* would be made, which is a CSI.
+*/
+   }
}
 }
 
+static inline unsigned long kuap_get_and_check_amr(void)
+{
+   if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) {
+   unsigned long amr = mfspr(SPRN_AMR);
+   if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */
+   WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED);
+   return amr;
+   }
+   return 0;
+}
+
 static inline void kuap_check_amr(void)
 {
if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && 
mmu_has_feature(MMU_FTR_RADIX_KUAP))
@@ -151,13 +164,18 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long 
address, bool is_write)
"Bug: %s fault blocked by AMR!", is_write ? "Write" : 
"Read");
 }
 #else /* CONFIG_PPC_KUAP */
-static inline void kuap_restore_amr(struct pt_regs *regs)
+static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
 {
 }
 
 static inline void kuap_check_amr(void)
 {
 }
+
+static inline unsigned long kuap_get_and_check_amr(void)
+{
+   return 0;
+}
 #endif /* CONFIG_PPC_KUAP */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index a37c7717424f..edeab10c6888 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -242,6 +242,10 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
BUG_ON(!FULL_REGS(regs));
BUG_ON(regs->softe != IRQS_ENABLED);
 
+   /*
+* We don't need to restore AMR on the way back to userspace for KUAP.
+* AMR can only have been unlocked if we interrupted the kernel.
+*/
kuap_check_amr();
 
local_irq_save(flags);
@@ -313,13 +317,14 @@ notrace unsigned long 
interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
	unsigned long *ti_flagsp = &current_thread_info()->flags;
unsigned long flags;
unsigned long ret = 0;
+   unsigned long amr;
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI)))
unrecoverable_exception(regs);
BUG_ON(regs->msr & MSR_PR);
BUG_ON(!FULL_REGS(regs));
 
-   kuap_check_amr();
+   amr = kuap_get_and_check_amr();
 
if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
@@ -367,10 +372,11 @@ notrace unsigned long 
interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign
 #endif
 
/*
-* We don't need to restore AMR on the way back to userspace for KUAP.
-* The value of AMR only matters while we're in the kernel.
+* Don't want to mfspr(SPRN_AMR) here, because this comes after
+* mtmsr, which would cause RAW stalls. Hence, we take the AMR value
+* from the check above.
 */
-   kuap_restore_amr(regs);
+   kuap_restore_amr(regs, amr);
 
return ret;
 }
-- 
2.23.0



[PATCH 4/6] powerpc/64/kuap: restore AMR in system reset exception

2020-04-28 Thread Nicholas Piggin
The system reset interrupt handler locks AMR and exits with
EXCEPTION_RESTORE_REGS without restoring AMR. Similarly to the soft-NMI
handler, it needs to restore.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/exceptions-64s.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 728ccb0f560c..b0ad930cbae5 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -971,6 +971,7 @@ EXC_COMMON_BEGIN(system_reset_common)
ld  r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
 
+   kuap_restore_amr r10
EXCEPTION_RESTORE_REGS
RFI_TO_USER_OR_KERNEL
 
-- 
2.23.0



[PATCH 5/6] powerpc/64/kuap: restore AMR in fast_interrupt_return

2020-04-28 Thread Nicholas Piggin
Interrupts that use fast_interrupt_return actually do lock AMR, but so
far they have tended to come from userspace (or from kernel bugs) in
radix mode. With kuap on hash, segment interrupts are often taken in
the kernel, which quickly breaks due to the missing restore.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9a1e5d636dea..b3c9f15089b6 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -472,15 +472,17 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 #ifdef CONFIG_PPC_BOOK3S
/*
 * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not
-* touched, AMR not set, no exit work created, then this can be used.
+* touched, no exit work created, then this can be used.
 */
.balign IFETCH_ALIGN_BYTES
.globl fast_interrupt_return
 fast_interrupt_return:
 _ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
+   kuap_check_amr r3, r4
ld  r4,_MSR(r1)
andi.   r0,r4,MSR_PR
bne .Lfast_user_interrupt_return
+   kuap_restore_amr r3
andi.   r0,r4,MSR_RI
li  r3,0 /* 0 return value, no EMULATE_STACK_STORE */
bne+.Lfast_kernel_interrupt_return
-- 
2.23.0



[PATCH 6/6] powerpc/64/kuap: conditionally restore AMR in kuap_restore_amr asm

2020-04-28 Thread Nicholas Piggin
Similar to the C code change, make the AMR restore conditional on
whether the register has changed.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/kup-radix.h | 10 +++---
 arch/powerpc/kernel/entry_64.S |  8 
 arch/powerpc/kernel/exceptions-64s.S   |  4 ++--
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index ec8970958a26..e82df54f5681 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -12,13 +12,17 @@
 
 #ifdef __ASSEMBLY__
 
-.macro kuap_restore_amr gpr
+.macro kuap_restore_amr gpr1, gpr2
 #ifdef CONFIG_PPC_KUAP
BEGIN_MMU_FTR_SECTION_NESTED(67)
-   ld  \gpr, STACK_REGS_KUAP(r1)
+   mfspr   \gpr1, SPRN_AMR
+   ld  \gpr2, STACK_REGS_KUAP(r1)
+   cmpd\gpr1, \gpr2
+   beq 998f
isync
-   mtspr   SPRN_AMR, \gpr
+   mtspr   SPRN_AMR, \gpr2
/* No isync required, see kuap_restore_amr() */
+998:
END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67)
 #endif
 .endm
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index b3c9f15089b6..9d49338e0c85 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -479,11 +479,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 fast_interrupt_return:
 _ASM_NOKPROBE_SYMBOL(fast_interrupt_return)
kuap_check_amr r3, r4
-   ld  r4,_MSR(r1)
-   andi.   r0,r4,MSR_PR
+   ld  r5,_MSR(r1)
+   andi.   r0,r5,MSR_PR
bne .Lfast_user_interrupt_return
-   kuap_restore_amr r3
-   andi.   r0,r4,MSR_RI
+   kuap_restore_amr r3, r4
+   andi.   r0,r5,MSR_RI
li  r3,0 /* 0 return value, no EMULATE_STACK_STORE */
bne+.Lfast_kernel_interrupt_return
addir3,r1,STACK_FRAME_OVERHEAD
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index b0ad930cbae5..ef4a90212664 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -971,7 +971,7 @@ EXC_COMMON_BEGIN(system_reset_common)
ld  r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
 
-   kuap_restore_amr r10
+   kuap_restore_amr r9, r10
EXCEPTION_RESTORE_REGS
RFI_TO_USER_OR_KERNEL
 
@@ -2757,7 +2757,7 @@ EXC_COMMON_BEGIN(soft_nmi_common)
ld  r10,SOFTE(r1)
stb r10,PACAIRQSOFTMASK(r13)
 
-   kuap_restore_amr r10
+   kuap_restore_amr r9, r10
EXCEPTION_RESTORE_REGS hsrr=0
RFI_TO_KERNEL
 
-- 
2.23.0



Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christoph Hellwig
On Wed, Apr 29, 2020 at 08:17:22AM +0200, Christophe Leroy wrote:
>>   +#ifndef CONFIG_X86_X32_ABI
>
> Can it be declared __weak instead of enclosing it in an #ifndef ?

I really hate the __weak ifdefs.  But my plan was to move to a
CONFIG_ARCH_COPY_SIGINFO_TO_USER32 and have x86 select it.


Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic

2020-04-28 Thread Jeremy Kerr
Hi Christoph,

> And another one that should go on top of this one to address Al's other
> complaint:

Yeah, I was pondering that one. The access_ok() is kinda redundant, but
it does avoid forcing a SPU context save on those errors.

However, it's not like we really need to optimise for the case of
invalid addresses from userspace. So, I'll include this change in the
submission to Michael's tree. Arnd - let me know if you have any
objections.

Cheers,


Jeremy




Re: Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1

2020-04-28 Thread Christian Zigotzky

On 29 April 2020 at 07:13 am, Nicholas Piggin wrote:

Excerpts from Christian Zigotzky's message of April 29, 2020 2:53 pm:

Hi All,

The issue still exists in the RC3. (kernel config attached)

Please help me to fix this issue.

Huh, looks like maybe early_init_mmu() got uninlined because the
compiler decided it was unlikely.

Does this fix it?

Thanks,
Nick
--


Hi Nick,

Many thanks for your patch! It solved the issue.

Have a nice day!

Cheers,
Christian



diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index bb3deb76c951..3ffe5f967483 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -208,7 +208,7 @@ void hash__early_init_devtree(void);
  void radix__early_init_devtree(void);
  extern void hash__early_init_mmu(void);
  extern void radix__early_init_mmu(void);
-static inline void early_init_mmu(void)
+static inline void __init early_init_mmu(void)
  {
if (radix_enabled())
return radix__early_init_mmu();
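
For background on the warning itself, a minimal sketch of the pattern modpost
complains about (an editor's illustration with made-up names, not the real
powerpc code): when the compiler declines to inline a header "static inline"
that calls __init code, an out-of-line copy lands in .text and references
.init.text. Marking the wrapper __init keeps that copy in .init.text too,
which is what the hunk above does for early_init_mmu().

#include <linux/init.h>

void __init setup_thing(void);		/* emitted into .init.text */

/*
 * Without __init here, an out-of-lined copy of this helper would be placed
 * in .text and reference .init.text -- the section mismatch modpost warns
 * about. With __init, the copy is discarded along with the rest of init.
 */
static inline void __init init_thing(void)
{
	setup_thing();
}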





Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

2020-04-28 Thread Christoph Hellwig
On Tue, Apr 28, 2020 at 09:56:26PM +0200, Arnd Bergmann wrote:
> I think I found a way to improve the x32 handling:
> 
> This is a simplification over Christoph's "[PATCH 2/7] signal: factor
> copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the
> x32 specifics in the common code to a single #ifdef/#endif check, in
> order to keep it more readable for everyone else.
> 
> Christoph, if you like it, please fold into your patch.

What do you think of this version?  This one always overrides
copy_siginfo_to_user32 for the x86 compat case to keep the churn down,
and improves the copy_siginfo_to_external32 documentation a bit.

---
>From 5ca642c4c744ce6460ebb91fe30ec7a064d28e96 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Wed, 29 Apr 2020 08:38:07 +0200
Subject: signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32

To remove the use of set_fs in the coredump code there needs to be a
way to convert a kernel siginfo to a userspace compat siginfo.

Factor out a copy_siginfo_to_external32 helper from
copy_siginfo_to_user32 that fills out the compat_siginfo, but does so
on a kernel space data structure.  Handling of the x32 SIGCHLD magic
is kept out of copy_siginfo_to_external32, as coredumps are never
caused by SIGCHLD.  To simplify the mess that the current
copy_siginfo_to_user32 is, we allow architectures to override it, and
thus keep the x32 SIGCHLD magic confined to x86-64.

Contains improvements from Eric W. Biederman 
and Arnd Bergmann .

Signed-off-by: Christoph Hellwig 
---
 arch/Kconfig |   3 ++
 arch/x86/Kconfig |   1 +
 arch/x86/kernel/signal.c |  22 
 include/linux/compat.h   |   2 +
 kernel/signal.c  | 108 ---
 5 files changed, 83 insertions(+), 53 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 786a85d4ad40d..d5bc9f1ee1747 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -815,6 +815,9 @@ config OLD_SIGACTION
 config COMPAT_OLD_SIGACTION
bool
 
+config ARCH_COPY_SIGINFO_TO_USER32
+   bool
+
 config COMPAT_32BIT_TIME
bool "Provide system calls for 32-bit time_t"
default !64BIT || COMPAT
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1197b5596d5ad..ad13facbe9ffe 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2906,6 +2906,7 @@ config X86_X32
 config COMPAT_32
def_bool y
depends on IA32_EMULATION || X86_32
+   select ARCH_COPY_SIGINFO_TO_USER32
select HAVE_UID16
select OLD_SIGSUSPEND3
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 83b74fb38c8fc..d2b09866105a2 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -511,6 +511,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
 }
 #endif /* CONFIG_X86_32 */
 
+#ifdef CONFIG_ARCH_COPY_SIGINFO_TO_USER32
+int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
+const struct kernel_siginfo *from, bool x32_ABI)
+{
+   struct compat_siginfo new;
+
+   copy_siginfo_to_external32(&new, from);
+   if (x32_ABI && from->si_signo == SIGCHLD) {
+   new._sifields._sigchld_x32._utime = from->si_utime;
+   new._sifields._sigchld_x32._stime = from->si_stime;
+   }
+   if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
+   return -EFAULT;
+   return 0;
+}
+int copy_siginfo_to_user32(struct compat_siginfo __user *to,
+  const struct kernel_siginfo *from)
+{
+   return __copy_siginfo_to_user32(to, from, in_x32_syscall());
+}
+#endif /* CONFIG_ARCH_COPY_SIGINFO_TO_USER32 */
+
 static int x32_setup_rt_frame(struct ksignal *ksig,
  compat_sigset_t *set,
  struct pt_regs *regs)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 0480ba4db5929..adbfe8f688d92 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -402,6 +402,8 @@ long compat_get_bitmap(unsigned long *mask, const 
compat_ulong_t __user *umask,
   unsigned long bitmap_size);
 long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask,
   unsigned long bitmap_size);
+void copy_siginfo_to_external32(struct compat_siginfo *to,
+   const struct kernel_siginfo *from);
 int copy_siginfo_from_user32(kernel_siginfo_t *to, const struct compat_siginfo 
__user *from);
 int copy_siginfo_to_user32(struct compat_siginfo __user *to, const 
kernel_siginfo_t *from);
 int get_compat_sigevent(struct sigevent *event,
diff --git a/kernel/signal.c b/kernel/signal.c
index 284fc1600063b..710655df1269d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3235,96 +3235,98 @@ int copy_siginfo_from_user(kernel_siginfo_t *to, const 
siginfo_t __user *from)
 }
 
 #ifdef CONFIG_COMPAT
-int copy_siginfo_to_user32(struct compat_siginfo __user *to,
-  const struct kernel_siginfo *fro

Re: [PATCH v2 2/3] powerpc/numa: Prefer node id queried from vphn

2020-04-28 Thread Gautham R Shenoy
Hello Srikar,

On Tue, Apr 28, 2020 at 03:08:35PM +0530, Srikar Dronamraju wrote:
> Node id queried from the static device tree may not
> be correct. For example: it may always show 0 on a shared processor.
> Hence prefer the node id queried from vphn and fallback on the device tree
> based node id if vphn query fails.
> 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux...@kvack.org
> Cc: linux-ker...@vger.kernel.org
> Cc: Michal Hocko 
> Cc: Mel Gorman 
> Cc: Vlastimil Babka 
> Cc: "Kirill A. Shutemov" 
> Cc: Christopher Lameter 
> Cc: Michael Ellerman 
> Cc: Andrew Morton 
> Cc: Linus Torvalds 
> Signed-off-by: Srikar Dronamraju 
> ---
> Changelog v1:->v2:
> - Rebased to v5.7-rc3
> 
>  arch/powerpc/mm/numa.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b3615b7fdbdf..281531340230 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -719,20 +719,20 @@ static int __init parse_numa_properties(void)
>*/
>   for_each_present_cpu(i) {
>   struct device_node *cpu;
> - int nid;
> -
> - cpu = of_get_cpu_node(i, NULL);
> - BUG_ON(!cpu);
> - nid = of_node_to_nid_single(cpu);
> - of_node_put(cpu);
> + int nid = vphn_get_nid(i);
> 
>   /*
>* Don't fall back to default_nid yet -- we will plug
>* cpus into nodes once the memory scan has discovered
>* the topology.
>*/
> - if (nid < 0)
> - continue;


> + if (nid == NUMA_NO_NODE) {
> + cpu = of_get_cpu_node(i, NULL);
> + if (cpu) {

Why are we not retaining the BUG_ON(!cpu) assert here ?

> + nid = of_node_to_nid_single(cpu);
> + of_node_put(cpu);
> + }
> + }

Is it possible at this point that both vphn_get_nid(i) and
of_node_to_nid_single(cpu) return NUMA_NO_NODE ? If so,
should we still call node_set_online() below ?


>   node_set_online(nid);
>   }
> 
> -- 
> 2.20.1
> 
--
Thanks and Regards
gautham.
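
For illustration, one reading of the two questions above as code -- an
editor's sketch of a possible variant, not a tested patch: keep the original
assert, and skip node_set_online() when neither lookup resolves a node.

	for_each_present_cpu(i) {
		struct device_node *cpu;
		int nid = vphn_get_nid(i);

		if (nid == NUMA_NO_NODE) {
			cpu = of_get_cpu_node(i, NULL);
			BUG_ON(!cpu);		/* keep the original assert */
			nid = of_node_to_nid_single(cpu);
			of_node_put(cpu);
		}

		/* Both lookups can fail; don't online NUMA_NO_NODE. */
		if (nid < 0)
			continue;

		node_set_online(nid);
	}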


[PATCH 0/6] assorted kuap fixes (try again)

2020-04-28 Thread Nicholas Piggin
Well the last series was a disaster, I'll try again sending the
patches with proper subject and changelogs written.

Nicholas Piggin (6):
  powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit
code
  powerpc/64s/kuap: kuap_restore missing isync
  powerpc/64/kuap: interrupt exit conditionally restore AMR
  powerpc/64s/kuap: restore AMR in system reset exception
  powerpc/64s/kuap: restore AMR in fast_interrupt_return
  powerpc/64s/kuap: conditionally restore AMR in kuap_restore_amr asm

 .../powerpc/include/asm/book3s/64/kup-radix.h | 45 ---
 arch/powerpc/kernel/entry_64.S| 10 +++--
 arch/powerpc/kernel/exceptions-64s.S  |  3 +-
 arch/powerpc/kernel/syscall_64.c  | 26 +++
 4 files changed, 63 insertions(+), 21 deletions(-)

-- 
2.23.0



[PATCH 1/6] powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code

2020-04-28 Thread Nicholas Piggin
Any kind of WARN causes a program check that will crash with an
unrecoverable exception if it occurs when RI is clear.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/syscall_64.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
index 8f7e268f3294..a37c7717424f 100644
--- a/arch/powerpc/kernel/syscall_64.c
+++ b/arch/powerpc/kernel/syscall_64.c
@@ -35,6 +35,8 @@ notrace long system_call_exception(long r3, long r4, long r5,
BUG_ON(!FULL_REGS(regs));
BUG_ON(regs->softe != IRQS_ENABLED);
 
+   kuap_check_amr();
+
account_cpu_user_entry();
 
 #ifdef CONFIG_PPC_SPLPAR
@@ -47,8 +49,6 @@ notrace long system_call_exception(long r3, long r4, long r5,
}
 #endif
 
-   kuap_check_amr();
-
/*
 * This is not required for the syscall exit path, but makes the
 * stack frame look nicer. If this was initialised in the first stack
@@ -142,6 +142,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
unsigned long ti_flags;
unsigned long ret = 0;
 
+   kuap_check_amr();
+
regs->result = r3;
 
/* Check whether the syscall is issued inside a restartable sequence */
@@ -218,8 +220,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
local_paca->tm_scratch = regs->msr;
 #endif
 
-   kuap_check_amr();
-
account_cpu_user_exit();
 
return ret;
@@ -242,6 +242,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
BUG_ON(!FULL_REGS(regs));
BUG_ON(regs->softe != IRQS_ENABLED);
 
+   kuap_check_amr();
+
local_irq_save(flags);
 
 again:
@@ -298,8 +300,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct 
pt_regs *regs, unsigned
local_paca->tm_scratch = regs->msr;
 #endif
 
-   kuap_check_amr();
-
account_cpu_user_exit();
 
return ret;
@@ -319,6 +319,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct 
pt_regs *regs, unsign
BUG_ON(regs->msr & MSR_PR);
BUG_ON(!FULL_REGS(regs));
 
+   kuap_check_amr();
+
if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) {
clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp);
ret = 1;
-- 
2.23.0