Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
On Mon, Apr 27, 2020 at 03:40:50PM -0700, Andrew Morton wrote: > > https://www.spinics.net/lists/kernel/msg3473847.html > > https://www.spinics.net/lists/kernel/msg3473840.html > > https://www.spinics.net/lists/kernel/msg3473843.html > > OK, but that doesn't necessitate the above monstrosity? How about > > static int __copy_siginfo_to_user32(struct compat_siginfo __user *to, >const struct kernel_siginfo *from, bool x32_ABI) > { > struct compat_siginfo new; > copy_siginfo_to_external32(&new, from); > ... > } > > int copy_siginfo_to_user32(struct compat_siginfo __user *to, > const struct kernel_siginfo *from) > { > #if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION) > return __copy_siginfo_to_user32(to, from, in_x32_syscall()); > #else > return __copy_siginfo_to_user32(to, from, 0); > #endif > } > > Or something like that - I didn't try very hard. We know how to do > this stuff, and surely this thing isn't how! I guess that might be a worthwhile middle ground. Still not a fan of all these ifdefs..
Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
On 28/04/2020 at 09:09, Christoph Hellwig wrote: On Mon, Apr 27, 2020 at 03:40:50PM -0700, Andrew Morton wrote: https://www.spinics.net/lists/kernel/msg3473847.html https://www.spinics.net/lists/kernel/msg3473840.html https://www.spinics.net/lists/kernel/msg3473843.html OK, but that doesn't necessitate the above monstrosity? How about static int __copy_siginfo_to_user32(struct compat_siginfo __user *to, const struct kernel_siginfo *from, bool x32_ABI) { struct compat_siginfo new; copy_siginfo_to_external32(&new, from); ... } int copy_siginfo_to_user32(struct compat_siginfo __user *to, const struct kernel_siginfo *from) { #if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION) return __copy_siginfo_to_user32(to, from, in_x32_syscall()); #else return __copy_siginfo_to_user32(to, from, 0); #endif } Or something like that - I didn't try very hard. We know how to do this stuff, and surely this thing isn't how! I guess that might be a worthwhile middle ground. Still not a fan of all these ifdefs.. Can't we move the small X32-specific part out of __copy_siginfo_to_user32(), into an arch-specific helper that is a no-op for other architectures? Something like: if (!arch_special_something(&new, from)) { new.si_utime = from->si_utime; new.si_stime = from->si_stime; } Then arch_special_something() does what it wants on x86 and returns 1, and for architectures not implementing it, a generic version returns 0 all the time. Christophe
Re: [PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
On Tue, Apr 28, 2020 at 09:45:46AM +0200, Christophe Leroy wrote: >> I guess that might be a worthwhile middle ground. Still not a fan of >> all these ifdefs.. >> > > Can't we move the small X32-specific part out of > __copy_siginfo_to_user32(), into an arch-specific helper that is a no-op for other > architectures? > > Something like: > > if (!arch_special_something(&new, from)) { > new.si_utime = from->si_utime; > new.si_stime = from->si_stime; > } > > Then arch_special_something() does what it wants on x86 and returns 1, > and for architectures not implementing it, a generic version returns 0 all > the time. The main issue is that we need an explicit parameter to select x32, as it can't just be discovered from the calling context otherwise. The rest is just sugarcoating.
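For reference, a minimal sketch of the weak-helper pattern floated above, keeping the placeholder name from the mail (arch_special_something() is hypothetical, not something in the series). The __weak default is the standard kernel way to let an architecture override a generic stub; as noted in the reply, the x86 override would still need the x32 flag passed in explicitly, since it cannot be derived from the calling context.

/* Generic weak fallback (e.g. kernel/signal.c): nothing arch-specific to do. */
int __weak arch_special_something(struct compat_siginfo *to,
				  const struct kernel_siginfo *from)
{
	return 0;	/* not handled, caller does the plain si_utime/si_stime copy */
}

/*
 * x86 would provide a strong definition that handles the x32 layout and
 * returns 1 -- but it still has to be told whether the target is x32,
 * which is the sticking point raised above.
 */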
[PATCH v2 1/3] powerpc/numa: Set numa_node for all possible cpus
A Powerpc system with multiple possible nodes and with CONFIG_NUMA enabled always used to have a node 0, even if node 0 does not have any cpus or memory attached to it. As per PAPR, node affinity of a cpu is only available once it is present/online. For all cpus that are possible but not present, cpu_to_node() would point to node 0. To ensure a cpuless, memoryless dummy node is not online, powerpc needs to make sure all possible but not present cpu_to_node are set to a proper node. Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Srikar Dronamraju --- Changelog v1:->v2: - Rebased to v5.7-rc3 arch/powerpc/mm/numa.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 9fcf2d195830..b3615b7fdbdf 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -931,8 +931,20 @@ void __init mem_topology_setup(void) reset_numa_cpu_lookup_table(); - for_each_present_cpu(cpu) - numa_setup_cpu(cpu); + for_each_possible_cpu(cpu) { + /* +* Powerpc with CONFIG_NUMA always used to have a node 0, +* even if it was memoryless or cpuless. For all cpus that +* are possible but not present, cpu_to_node() would point +* to node 0. To remove a cpuless, memoryless dummy node, +* powerpc need to make sure all possible but not present +* cpu_to_node are set to a proper node. +*/ + if (cpu_present(cpu)) + numa_setup_cpu(cpu); + else + set_cpu_numa_node(cpu, first_online_node); + } } void __init initmem_init(void) -- 2.20.1
[PATCH v2 2/3] powerpc/numa: Prefer node id queried from vphn
Node id queried from the static device tree may not be correct. For example: it may always show 0 on a shared processor. Hence prefer the node id queried from vphn and fallback on the device tree based node id if vphn query fails. Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Srikar Dronamraju --- Changelog v1:->v2: - Rebased to v5.7-rc3 arch/powerpc/mm/numa.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index b3615b7fdbdf..281531340230 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -719,20 +719,20 @@ static int __init parse_numa_properties(void) */ for_each_present_cpu(i) { struct device_node *cpu; - int nid; - - cpu = of_get_cpu_node(i, NULL); - BUG_ON(!cpu); - nid = of_node_to_nid_single(cpu); - of_node_put(cpu); + int nid = vphn_get_nid(i); /* * Don't fall back to default_nid yet -- we will plug * cpus into nodes once the memory scan has discovered * the topology. */ - if (nid < 0) - continue; + if (nid == NUMA_NO_NODE) { + cpu = of_get_cpu_node(i, NULL); + if (cpu) { + nid = of_node_to_nid_single(cpu); + of_node_put(cpu); + } + } node_set_online(nid); } -- 2.20.1
[PATCH v2 0/3] Offline memoryless cpuless node 0
Changelog v1:->v2: - Rebased to v5.7-rc3 - Updated the changelog. Linux kernel configured with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause 1. numa_balancing to be enabled on systems with only one online node. 2. Existence of dummy (cpuless and memoryless) node which can confuse users/scripts looking at output of lscpu / numactl. This patchset wants to correct this anomaly. This should only affect systems that have CONFIG_MEMORYLESS_NODES. Currently there are only 2 architectures ia64 and powerpc that have this config. Note: Patch 3 in this patch series depends on patches 1 and 2. Without patches 1 and 2, patch 3 might crash powerpc. v5.7-rc3 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files -- /sys/devices/system/node/online:0,2 /proc/sys/kernel/numa_balancing:1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.7-rc3 + patches -- available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files -- /sys/devices/system/node/online:2 /proc/sys/kernel/numa_balancing:0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Srikar Dronamraju (3): powerpc/numa: Set numa_node for all possible cpus powerpc/numa: Prefer node id queried from vphn mm/page_alloc: Keep memoryless cpuless node 0 offline arch/powerpc/mm/numa.c | 32 ++-- mm/page_alloc.c| 4 +++- 2 files changed, 25 insertions(+), 11 deletions(-) -- 2.20.1
[PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Currently the Linux kernel with CONFIG_NUMA on a system with multiple possible nodes marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause numa_balancing to be enabled on systems with only one node with memory and CPUs. The existence of this dummy node, which is cpuless and memoryless, can confuse users/scripts looking at output of lscpu / numactl. By marking N_ONLINE as NODE_MASK_NONE, let's stop assuming that node 0 is always online. v5.7-rc3 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files -- /sys/devices/system/node/online:0,2 /proc/sys/kernel/numa_balancing:1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.7-rc3 + patch -- available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files -- /sys/devices/system/node/online:2 /proc/sys/kernel/numa_balancing:0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Note: On Powerpc, cpu_to_node of possible but not present cpus would previously return 0. Hence this commit depends on commit ("powerpc/numa: Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id queried from vphn"). Without those 2 commits, a Powerpc system might crash. Cc: linuxppc-dev@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Signed-off-by: Srikar Dronamraju --- Changelog v1:->v2: - Rebased to v5.7-rc3 - Updated the changelog. mm/page_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 69827d4fa052..03b89592af04 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy); */ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { [N_POSSIBLE] = NODE_MASK_ALL, +#ifdef CONFIG_NUMA + [N_ONLINE] = NODE_MASK_NONE, +#else [N_ONLINE] = { { [0] = 1UL } }, -#ifndef CONFIG_NUMA [N_NORMAL_MEMORY] = { { [0] = 1UL } }, #ifdef CONFIG_HIGHMEM [N_HIGH_MEMORY] = { { [0] = 1UL } }, -- 2.20.1
Re: [PATCH v6 00/28] Initial Prefixed Instruction support
On Tue, 2020-04-28 at 11:57 +1000, Jordan Niethe wrote: > A future revision of the ISA will introduce prefixed instructions. A > prefixed instruction is composed of a 4-byte prefix followed by a > 4-byte suffix. > > All prefixes have the major opcode 1. A prefix will never be a valid > word instruction. A suffix may be an existing word instruction or a > new instruction. > > This series enables prefixed instructions and extends the instruction > emulation to support them. Then the places where prefixed instructions > might need to be emulated are updated. Hi Jordan, I tried to test Kprobes with prefixed instructions on this patchset and observed that the kprobe/uprobe enablement patches are missing from v4. Until v3 they were available: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-11-jniet...@gmail.com/ https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-12-jniet...@gmail.com/ Were they dropped due to any dependencies/reasons, or do you plan to include them in the next version? Please let me know if you need help with this. -- Bala > > v6 is based on feedback from Balamuruhan Suriyakumar, Alistair Popple, > Christophe Leroy and Segher Boessenkool. > The major changes: > - Use the instruction type in more places that had been missed before > - Fix issues with ppc32 > - Introduce new self tests for code patching and feature fixups > > v5 is based on feedback from Nick Piggin, Michael Ellerman, Balamuruhan > Suriyakumar and Alistair Popple. > The major changes: > - The ppc instruction type is now a struct > - Series now just based on next > - ppc_inst_masked() dropped > - Space for xmon breakpoints allocated in an assembly file > - "Add prefixed instructions to instruction data type" patch separated > into smaller patches > - Calling convention for create_branch() is changed > - Some places which had not been updated to use the data type are now > updated > > v4 is based on feedback from Nick Piggin, Christophe Leroy and Daniel > Axtens. > The major changes: > - Move xmon breakpoints from data section to text section > - Introduce a data type for instructions on powerpc > > v3 is based on feedback from Christophe Leroy. The major changes: > - Completely replacing store_inst() with patch_instruction() in > xmon > - Improve implementation of mread_instr() to not use mread(). > - Base the series on top of > https://patchwork.ozlabs.org/patch/1232619/ as this will affect > kprobes. > - Some renaming and simplification of conditionals. > > v2 incorporates feedback from Daniel Axtens and Balamuruhan > S. The major changes are: > - Squashing together all commits about SRR1 bits > - Squashing all commits for supporting prefixed load stores > - Changing abbreviated references to sufx/prfx -> suffix/prefix > - Introducing macros for returning the length of an instruction > - Removing sign extension flag from pstd/pld in sstep.c > - Dropping patch "powerpc/fault: Use analyse_instr() to check for > store with updates to sp" from the series, it did not really fit > with prefixed enablement in the first place and as reported by Greg > Kurz did not work correctly. > > Alistair Popple (1): > powerpc: Enable Prefixed Instructions > > Jordan Niethe (27): > powerpc/xmon: Remove store_inst() for patch_instruction() > powerpc/xmon: Move breakpoint instructions to own array > powerpc/xmon: Move breakpoints to text section > powerpc/xmon: Use bitwise calculations in_breakpoint_table() > powerpc: Change calling convention for create_branch() et. al.
> powerpc: Use a macro for creating instructions from u32s > powerpc: Use an accessor for instructions > powerpc: Use a function for getting the instruction op code > powerpc: Use a function for byte swapping instructions > powerpc: Introduce functions for instruction equality > powerpc: Use a datatype for instructions > powerpc: Use a function for reading instructions > powerpc: Add a probe_user_read_inst() function > powerpc: Add a probe_kernel_read_inst() function > powerpc/kprobes: Use patch_instruction() > powerpc: Define and use __get_user_instr{,inatomic}() > powerpc: Introduce a function for reporting instruction length > powerpc/xmon: Use a function for reading instructions > powerpc/xmon: Move insertion of breakpoint for xol'ing > powerpc: Make test_translate_branch() independent of instruction > length > powerpc: Define new SRR1 bits for a future ISA version > powerpc: Add prefixed instructions to instruction data type > powerpc: Test prefixed code patching > powerpc: Test prefixed instructions in feature fixups > powerpc: Support prefixed instructions in alignment handler > powerpc sstep: Add support for prefixed load/stores > powerpc sstep: Add support for prefixed fixed-point arithmetic > > arch/powerpc/include/asm/code-patching.
Re: [PATCH v2 1/2] cpufreq: qoriq: convert to a platform driver
On 21-04-20, 10:29, Mian Yousaf Kaukab wrote: > The driver has to be manually loaded if it is built as a module. It > is neither exporting MODULE_DEVICE_TABLE nor MODULE_ALIAS. Moreover, > no platform-device is created (and thus no uevent is sent) for the > clockgen nodes it depends on. > > Convert the module to a platform driver with its own alias. Moreover, > drop whitelisted SOCs. A platform device will be created only for the > compatible platforms. > > Reviewed-by: Yuantian Tang > Acked-by: Viresh Kumar > Signed-off-by: Mian Yousaf Kaukab > --- > v2: > +Rafael, Stephen, linux-clk > Add Reviewed-by and Acked-by tags > > drivers/cpufreq/qoriq-cpufreq.c | 76 > - > 1 file changed, 29 insertions(+), 47 deletions(-) @Rafael, Though this looks to be PPC stuff, it is used on both ARM and PPC. Do you want to pick them up or should I do that? -- viresh
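For context, the general shape of such a conversion is sketched below. This is condensed and hypothetical, not the actual patch: the real probe/remove bodies register and unregister the cpufreq driver, and the platform device itself is created by the SoC/clockgen code only on compatible platforms.

#include <linux/module.h>
#include <linux/platform_device.h>

static int qoriq_cpufreq_probe(struct platform_device *pdev)
{
	/* register the cpufreq driver here */
	return 0;
}

static int qoriq_cpufreq_remove(struct platform_device *pdev)
{
	/* unregister the cpufreq driver here */
	return 0;
}

static struct platform_driver qoriq_cpufreq_platform_driver = {
	.driver = {
		.name = "qoriq-cpufreq",
	},
	.probe = qoriq_cpufreq_probe,
	.remove = qoriq_cpufreq_remove,
};
module_platform_driver(qoriq_cpufreq_platform_driver);

/*
 * The alias lets udev autoload the module when a "qoriq-cpufreq"
 * platform device is created, instead of requiring a manual modprobe.
 */
MODULE_ALIAS("platform:qoriq-cpufreq");
MODULE_LICENSE("GPL");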
Re: [PATCH] powerpc: Add interrupt mode information in /proc/cpuinfo
Cédric Le Goater writes: > PowerNV and pSeries machines can run using the XIVE or XICS interrupt > mode. Report this information in /proc/cpuinfo : > > timebase: 51200 > platform: PowerNV > model : 9006-22C > machine : PowerNV 9006-22C > firmware: OPAL > MMU : Radix > IRQ : XIVE H, I dunno. At what point do we stop putting random non CPU-related things in cpuinfo? :) The IRQ mode is (reasonably) easily discovered in sys, eg: $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XIVE XIVE-IRQ vs: $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XICS XICS cheers > diff --git a/arch/powerpc/platforms/powernv/setup.c > b/arch/powerpc/platforms/powernv/setup.c > index 3bc188da82ba..39ef3394038d 100644 > --- a/arch/powerpc/platforms/powernv/setup.c > +++ b/arch/powerpc/platforms/powernv/setup.c > @@ -196,14 +196,18 @@ static void pnv_show_cpuinfo(struct seq_file *m) > model = of_get_property(root, "model", NULL); > seq_printf(m, "machine\t\t: PowerNV %s\n", model); > if (firmware_has_feature(FW_FEATURE_OPAL)) > - seq_printf(m, "firmware\t: OPAL\n"); > + seq_puts(m, "firmware\t: OPAL\n"); > else > - seq_printf(m, "firmware\t: BML\n"); > + seq_puts(m, "firmware\t: BML\n"); > of_node_put(root); > if (radix_enabled()) > - seq_printf(m, "MMU\t\t: Radix\n"); > + seq_puts(m, "MMU\t\t: Radix\n"); > else > - seq_printf(m, "MMU\t\t: Hash\n"); > + seq_puts(m, "MMU\t\t: Hash\n"); > + if (xive_enabled()) > + seq_puts(m, "IRQ\t\t: XIVE\n"); > + else > + seq_puts(m, "IRQ\t\t: XICS\n"); > } > > static void pnv_prepare_going_down(void) > diff --git a/arch/powerpc/platforms/pseries/setup.c > b/arch/powerpc/platforms/pseries/setup.c > index 0c8421dd01ab..d248fca67797 100644 > --- a/arch/powerpc/platforms/pseries/setup.c > +++ b/arch/powerpc/platforms/pseries/setup.c > @@ -95,9 +95,13 @@ static void pSeries_show_cpuinfo(struct seq_file *m) > seq_printf(m, "machine\t\t: CHRP %s\n", model); > of_node_put(root); > if (radix_enabled()) > - seq_printf(m, "MMU\t\t: Radix\n"); > + seq_puts(m, "MMU\t\t: Radix\n"); > else > - seq_printf(m, "MMU\t\t: Hash\n"); > + seq_puts(m, "MMU\t\t: Hash\n"); > + if (xive_enabled()) > + seq_puts(m, "IRQ\t\t: XIVE\n"); > + else > + seq_puts(m, "IRQ\t\t: XICS\n"); > } > > /* Initialize firmware assisted non-maskable interrupts if > -- > 2.25.3
[PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels
Provide an option to use ELFv2 ABI for big endian builds. This works on GCC and clang (since 2014). It is less well tested and supported by the GNU toolchain, but it can give some useful advantages of the ELFv2 ABI for BE (e.g., less stack usage). Some distros even build BE ELFv2 userspace. Reviewed-by: Segher Boessenkool Signed-off-by: Nicholas Piggin --- Since v1: - Improved the override flavour name suggested by Segher. - Improved changelog wording. arch/powerpc/Kconfig| 19 +++ arch/powerpc/Makefile | 15 ++- arch/powerpc/boot/Makefile | 4 drivers/crypto/vmx/Makefile | 8 ++-- drivers/crypto/vmx/ppc-xlate.pl | 10 ++ 5 files changed, 45 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 924c541a9260..d9d2abc06c2c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -147,6 +147,7 @@ config PPC select ARCH_WEAK_RELEASE_ACQUIRE select BINFMT_ELF select BUILDTIME_TABLE_SORT + select BUILD_ELF_V2 if PPC64 && CPU_LITTLE_ENDIAN select CLONE_BACKWARDS select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN select DYNAMIC_FTRACE if FUNCTION_TRACER @@ -541,6 +542,24 @@ config KEXEC_FILE config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE +config BUILD_ELF_V2 + bool + +config BUILD_BIG_ENDIAN_ELF_V2 + bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)" + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT + default n + select BUILD_ELF_V2 + help + This builds the kernel image using the ELFv2 ABI, which has a + reduced stack overhead and faster function calls. This does not + affect the userspace ABIs. + + ELFv2 is the standard ABI for little-endian, but for big-endian + this is an experimental option that is less tested (kernel and + toolchain). This requires gcc 4.9 or newer and binutils 2.24 or + newer. 
+ config RELOCATABLE bool "Build a relocatable kernel" depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE)) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index f310c32e88a4..e306b39d847e 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -92,10 +92,14 @@ endif ifdef CONFIG_PPC64 ifndef CONFIG_CC_IS_CLANG -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mcall-aixdesc) -aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2 +ifdef CONFIG_BUILD_ELF_V2 +cflags-y += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc)) +aflags-y += $(call cc-option,-mabi=elfv2) +else +cflags-y += $(call cc-option,-mabi=elfv1) +cflags-y += $(call cc-option,-mcall-aixdesc) +aflags-y += $(call cc-option,-mabi=elfv1) +endif endif endif @@ -144,7 +148,7 @@ endif CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no) ifndef CONFIG_CC_IS_CLANG -ifdef CONFIG_CPU_LITTLE_ENDIAN +ifdef CONFIG_BUILD_ELF_V2 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc)) AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2) else @@ -153,6 +157,7 @@ CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc) AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1) endif endif + CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc)) CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index c53a1b8bba8b..03942d08695d 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -41,6 +41,10 @@ endif BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include) +ifdef CONFIG_BUILD_ELF_V2 +BOOTCFLAGS += $(call cc-option,-mabi=elfv2) +endif + ifdef CONFIG_CPU_BIG_ENDIAN BOOTCFLAGS += -mbig-endian else diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile index 709670d2b553..9aea34602beb 100644 --- a/drivers/crypto/vmx/Makefile +++ b/drivers/crypto/vmx/Makefile @@ -5,18 +5,22 @@ vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y) override flavour := linux-ppc64le else +ifdef CONFIG_BUILD_ELF_V2 +override flavour := linux-ppc64-elfv2 +else override flavour := linux-ppc64 endif +endif quiet_cmd_perl = PERL $@ cmd_perl = $(PERL) $(<) $(flavour) > $(@) targets += aesp8-ppc.S ghashp8-ppc.S -$(obj)/aesp8-ppc.S: $(src)/aesp8-ppc.pl FORCE +$(obj)/aesp8-ppc.S: $(src)/aesp8-ppc.pl $(src)/ppc-xlate.pl FORCE
Re: [PATCH v4] pci: Make return value of pcie_capability_read*() consistent
On Tue, Apr 28, 2020 at 10:19:08AM +0800, Yicong Yang wrote: > On 2020/4/28 2:13, Bjorn Helgaas wrote: > > > > I'm starting to think we're approaching this backwards. I searched > > for PCIBIOS_FUNC_NOT_SUPPORTED, PCIBIOS_BAD_VENDOR_ID, and the other > > error values. Almost every use is a *return* in a config accessor. > > There are very, very few *tests* for these values. > > If we have certain reasons to reserve PCI_BIOS* error to identify > PCI errors in PCI drivers, maybe redefine the PCI_BIOS* to generic > error codes can solve the issues, and no need to call > pcibios_err_to_errno() to do the conversion. Few changes may be > made to current codes. One possible patch may look like below. > Otherwise, maybe convert all PCI_BIOS* errors to generic error codes > is a better idea. > > Not sure it's the best way or not. Just FYI. That's a brilliant idea! We should still look carefully at all the callers of the config accessors, but this would avoid changing all the arch accessors, so the patch would be dramatically smaller. > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 83ce1cd..843987c 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -675,14 +675,18 @@ static inline bool pci_dev_msi_enabled(struct pci_dev > *pci_dev) { return false; > > /* Error values that may be returned by PCI functions */ > #define PCIBIOS_SUCCESSFUL 0x00 > -#define PCIBIOS_FUNC_NOT_SUPPORTED 0x81 > -#define PCIBIOS_BAD_VENDOR_ID0x83 > -#define PCIBIOS_DEVICE_NOT_FOUND 0x86 > -#define PCIBIOS_BAD_REGISTER_NUMBER 0x87 > -#define PCIBIOS_SET_FAILED 0x88 > -#define PCIBIOS_BUFFER_TOO_SMALL 0x89 > - > -/* Translate above to generic errno for passing back through non-PCI code */ > +#define PCIBIOS_FUNC_NOT_SUPPORTED -ENOENT > +#define PCIBIOS_BAD_VENDOR_ID-ENOTTY > +#define PCIBIOS_DEVICE_NOT_FOUND -ENODEV > +#define PCIBIOS_BAD_REGISTER_NUMBER -EFAULT > +#define PCIBIOS_SET_FAILED -EIO > +#define PCIBIOS_BUFFER_TOO_SMALL -ENOSPC > + > +/** > + * Translate above to generic errno for passing back through non-PCI code > + * > + * Deprecated. Use the PCIBIOS_* directly without a translation. > + */ > static inline int pcibios_err_to_errno(int err) > { > if (err <= PCIBIOS_SUCCESSFUL) > @@ -690,17 +694,12 @@ static inline int pcibios_err_to_errno(int err) > > switch (err) { > case PCIBIOS_FUNC_NOT_SUPPORTED: > - return -ENOENT; > case PCIBIOS_BAD_VENDOR_ID: > - return -ENOTTY; > case PCIBIOS_DEVICE_NOT_FOUND: > - return -ENODEV; > case PCIBIOS_BAD_REGISTER_NUMBER: > - return -EFAULT; > case PCIBIOS_SET_FAILED: > - return -EIO; > case PCIBIOS_BUFFER_TOO_SMALL: > - return -ENOSPC; > + return err; > } > > return -ERANGE; > > > For example, the only tests for PCIBIOS_FUNC_NOT_SUPPORTED are in > > xen_pcibios_err_to_errno() and pcibios_err_to_errno(), i.e., we're > > just converting that value to -ENOENT or the Xen-specific thing. > > > > So I think the best approach might be to remove the PCIBIOS_* error > > values completely and replace them with the corresponding values from > > pcibios_err_to_errno(). 
For example, a part of the patch would look > > like this: > > > > diff --git a/arch/mips/pci/ops-emma2rh.c b/arch/mips/pci/ops-emma2rh.c > > index 65f47344536c..d4d9c902c147 100644 > > --- a/arch/mips/pci/ops-emma2rh.c > > +++ b/arch/mips/pci/ops-emma2rh.c > > @@ -100,7 +100,7 @@ static int pci_config_read(struct pci_bus *bus, > > unsigned int devfn, int where, > > break; > > default: > > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0); > > - return PCIBIOS_FUNC_NOT_SUPPORTED; > > + return -ENOENT; > > } > > > > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0); > > @@ -149,7 +149,7 @@ static int pci_config_write(struct pci_bus *bus, > > unsigned int devfn, int where, > > break; > > default: > > emma2rh_out32(EMMA2RH_PCI_IWIN0_CTR, backup_win0); > > - return PCIBIOS_FUNC_NOT_SUPPORTED; > > + return -ENOENT; > > } > > *(volatile u32 *)(base + (PCI_FUNC(devfn) << 8) + > > (where & 0xfffc)) = data; > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 83ce1cdf5676..f95637a8d391 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -675,7 +675,6 @@ static inline bool pci_dev_msi_enabled(struct pci_dev > > *pci_dev) { return false; > > > > /* Error values that may be returned by PCI functions */ > > #define PCIBIOS_SUCCESSFUL 0x00 > > -#define PCIBIOS_FUNC_NOT_SUPPORTED 0x81 > > #define PCIBIOS_BAD_VENDOR_ID 0x83 > > #define PCIBIOS_DEVICE_NOT_FOUND 0x86 > > #define PCIBIOS_BAD_REGISTER_NUMBER0x87 > > @@ -689,8 +688,6 @@ static inline int pcibios_err_to_er
[PATCH] powerpc/spufs: Add rcu_read_lock() around fcheck()
Currently the spu coredump code triggers an RCU warning: = WARNING: suspicious RCU usage 5.7.0-rc3-01755-g7cd49f0b7ec7 #1 Not tainted - include/linux/fdtable.h:95 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by spu-coredump/1343: #0: c007fa22f430 (sb_writers#2){.+.+}-{0:0}, at: .do_coredump+0x1010/0x13c8 stack backtrace: CPU: 0 PID: 1343 Comm: spu-coredump Not tainted 5.7.0-rc3-01755-g7cd49f0b7ec7 #1 Call Trace: .dump_stack+0xec/0x15c (unreliable) .lockdep_rcu_suspicious+0x120/0x144 .coredump_next_context+0x148/0x158 .spufs_coredump_extra_notes_size+0x54/0x190 .elf_coredump_extra_notes_size+0x34/0x50 .elf_core_dump+0xe48/0x19d0 .do_coredump+0xe50/0x13c8 .get_signal+0x864/0xd88 .do_notify_resume+0x158/0x3c8 .interrupt_exit_user_prepare+0x19c/0x208 interrupt_return+0x14/0x1c0 This comes from fcheck_files() via fcheck(). It's pretty clearly documented that fcheck() must be wrapped with rcu_read_lock(), so fix it. Signed-off-by: Michael Ellerman --- arch/powerpc/platforms/cell/spufs/coredump.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 8b3296b62f65..0fc52cbaa552 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -82,13 +82,19 @@ static int match_context(const void *v, struct file *file, unsigned fd) */ static struct spu_context *coredump_next_context(int *fd) { + struct spu_context *ctx; struct file *file; int n = iterate_fd(current->files, *fd, match_context, NULL); if (!n) return NULL; *fd = n - 1; + + rcu_read_lock(); file = fcheck(*fd); - return SPUFS_I(file_inode(file))->i_ctx; + ctx = SPUFS_I(file_inode(file))->i_ctx; + rcu_read_unlock(); + + return ctx; } int spufs_coredump_extra_notes_size(void) -- 2.25.1
Re: [PATCH] powerpc: Add interrupt mode information in /proc/cpuinfo
On 4/28/20 1:03 PM, Michael Ellerman wrote: > Cédric Le Goater writes: >> PowerNV and pSeries machines can run using the XIVE or XICS interrupt >> mode. Report this information in /proc/cpuinfo : >> >> timebase: 51200 >> platform: PowerNV >> model : 9006-22C >> machine : PowerNV 9006-22C >> firmware: OPAL >> MMU : Radix >> IRQ : XIVE > > H, I dunno. At what point do we stop putting random non CPU-related > things in cpuinfo? :) True. > The IRQ mode is (reasonably) easily discovered in sys, eg: > > $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XIVE > XIVE-IRQ > > vs: > > $ cat /sys/kernel/irq/*/chip_name | grep -m 1 XICS > XICS That's good enough for error reporting Thanks. C. > > cheers > >> diff --git a/arch/powerpc/platforms/powernv/setup.c >> b/arch/powerpc/platforms/powernv/setup.c >> index 3bc188da82ba..39ef3394038d 100644 >> --- a/arch/powerpc/platforms/powernv/setup.c >> +++ b/arch/powerpc/platforms/powernv/setup.c >> @@ -196,14 +196,18 @@ static void pnv_show_cpuinfo(struct seq_file *m) >> model = of_get_property(root, "model", NULL); >> seq_printf(m, "machine\t\t: PowerNV %s\n", model); >> if (firmware_has_feature(FW_FEATURE_OPAL)) >> -seq_printf(m, "firmware\t: OPAL\n"); >> +seq_puts(m, "firmware\t: OPAL\n"); >> else >> -seq_printf(m, "firmware\t: BML\n"); >> +seq_puts(m, "firmware\t: BML\n"); >> of_node_put(root); >> if (radix_enabled()) >> -seq_printf(m, "MMU\t\t: Radix\n"); >> +seq_puts(m, "MMU\t\t: Radix\n"); >> else >> -seq_printf(m, "MMU\t\t: Hash\n"); >> +seq_puts(m, "MMU\t\t: Hash\n"); >> +if (xive_enabled()) >> +seq_puts(m, "IRQ\t\t: XIVE\n"); >> +else >> +seq_puts(m, "IRQ\t\t: XICS\n"); >> } >> >> static void pnv_prepare_going_down(void) >> diff --git a/arch/powerpc/platforms/pseries/setup.c >> b/arch/powerpc/platforms/pseries/setup.c >> index 0c8421dd01ab..d248fca67797 100644 >> --- a/arch/powerpc/platforms/pseries/setup.c >> +++ b/arch/powerpc/platforms/pseries/setup.c >> @@ -95,9 +95,13 @@ static void pSeries_show_cpuinfo(struct seq_file *m) >> seq_printf(m, "machine\t\t: CHRP %s\n", model); >> of_node_put(root); >> if (radix_enabled()) >> -seq_printf(m, "MMU\t\t: Radix\n"); >> +seq_puts(m, "MMU\t\t: Radix\n"); >> else >> -seq_printf(m, "MMU\t\t: Hash\n"); >> +seq_puts(m, "MMU\t\t: Hash\n"); >> +if (xive_enabled()) >> +seq_puts(m, "IRQ\t\t: XIVE\n"); >> +else >> +seq_puts(m, "IRQ\t\t: XICS\n"); >> } >> >> /* Initialize firmware assisted non-maskable interrupts if >> -- >> 2.25.3
[RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
Currently, we may perform a copy_to_user (through simple_read_from_buffer()) while holding a context's register_lock, while accessing the context save area. This change uses a temporary buffers for the context save area data, which we then pass to simple_read_from_buffer. Signed-off-by: Jeremy Kerr --- Christoph - this fixes the copy_to_user while atomic, hopefully with only minimal distruption to your series. --- arch/powerpc/platforms/cell/spufs/file.c | 110 +++ 1 file changed, 74 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c index c0f950a3f4e1..c62d77ddaf7d 100644 --- a/arch/powerpc/platforms/cell/spufs/file.c +++ b/arch/powerpc/platforms/cell/spufs/file.c @@ -1978,8 +1978,9 @@ static ssize_t __spufs_mbox_info_read(struct spu_context *ctx, static ssize_t spufs_mbox_info_read(struct file *file, char __user *buf, size_t len, loff_t *pos) { - int ret; struct spu_context *ctx = file->private_data; + u32 stat, data; + int ret; if (!access_ok(buf, len)) return -EFAULT; @@ -1988,11 +1989,16 @@ static ssize_t spufs_mbox_info_read(struct file *file, char __user *buf, if (ret) return ret; spin_lock(&ctx->csa.register_lock); - ret = __spufs_mbox_info_read(ctx, buf, len, pos); + stat = ctx->csa.prob.mb_stat_R; + data = ctx->csa.prob.pu_mb_R; spin_unlock(&ctx->csa.register_lock); spu_release_saved(ctx); - return ret; + /* EOF if there's no entry in the mbox */ + if (!(stat & 0xff)) + return 0; + + return simple_read_from_buffer(buf, len, pos, &data, sizeof(data)); } static const struct file_operations spufs_mbox_info_fops = { @@ -2019,6 +2025,7 @@ static ssize_t spufs_ibox_info_read(struct file *file, char __user *buf, size_t len, loff_t *pos) { struct spu_context *ctx = file->private_data; + u32 stat, data; int ret; if (!access_ok(buf, len)) @@ -2028,11 +2035,16 @@ static ssize_t spufs_ibox_info_read(struct file *file, char __user *buf, if (ret) return ret; spin_lock(&ctx->csa.register_lock); - ret = __spufs_ibox_info_read(ctx, buf, len, pos); + stat = ctx->csa.prob.mb_stat_R; + data = ctx->csa.priv2.puint_mb_R; spin_unlock(&ctx->csa.register_lock); spu_release_saved(ctx); - return ret; + /* EOF if there's no entry in the ibox */ + if (!(stat & 0xff)) + return 0; + + return simple_read_from_buffer(buf, len, pos, &data, sizeof(data)); } static const struct file_operations spufs_ibox_info_fops = { @@ -2041,6 +2053,11 @@ static const struct file_operations spufs_ibox_info_fops = { .llseek = generic_file_llseek, }; +static size_t spufs_wbox_info_cnt(struct spu_context *ctx) +{ + return (4 - ((ctx->csa.prob.mb_stat_R & 0x00ff00) >> 8)) * sizeof(u32); +} + static ssize_t __spufs_wbox_info_read(struct spu_context *ctx, char __user *buf, size_t len, loff_t *pos) { @@ -2049,7 +2066,7 @@ static ssize_t __spufs_wbox_info_read(struct spu_context *ctx, u32 wbox_stat; wbox_stat = ctx->csa.prob.mb_stat_R; - cnt = 4 - ((wbox_stat & 0x00ff00) >> 8); + cnt = spufs_wbox_info_cnt(ctx); for (i = 0; i < cnt; i++) { data[i] = ctx->csa.spu_mailbox_data[i]; } @@ -2062,7 +2079,8 @@ static ssize_t spufs_wbox_info_read(struct file *file, char __user *buf, size_t len, loff_t *pos) { struct spu_context *ctx = file->private_data; - int ret; + u32 data[ARRAY_SIZE(ctx->csa.spu_mailbox_data)]; + int ret, count; if (!access_ok(buf, len)) return -EFAULT; @@ -2071,11 +2089,13 @@ static ssize_t spufs_wbox_info_read(struct file *file, char __user *buf, if (ret) return ret; spin_lock(&ctx->csa.register_lock); - ret = __spufs_wbox_info_read(ctx, buf, len, 
pos); + count = spufs_wbox_info_cnt(ctx); + memcpy(&data, &ctx->csa.spu_mailbox_data, sizeof(data)); spin_unlock(&ctx->csa.register_lock); spu_release_saved(ctx); - return ret; + return simple_read_from_buffer(buf, len, pos, &data, + count * sizeof(u32)); } static const struct file_operations spufs_wbox_info_fops = { @@ -2084,20 +2104,19 @@ static const struct file_operations spufs_wbox_info_fops = { .llseek = generic_file_llseek, }; -static ssize_t __spufs_dma_info_read(struct spu_context *ctx, - char __user *buf, size_t len, loff_t *pos) +static void ___spufs_dma_info_read(struct spu_context *ctx, + struct spu_dma_info *info) { - struct spu_dma_info i
[PATCH] powerpc/64: Don't initialise init_task->thread.regs
Aneesh increased the size of struct pt_regs by 16 bytes and started seeing this WARN_ON: smp: Bringing up secondary CPUs ... [ cut here ] WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 NIP: c001a2b4 LR: c001a29c CTR: c31d REGS: c26d3980 TRAP: 0700 Not tainted (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) MSR: 8282b033 CR: 48048224 XER: CFAR: c0019cc8 IRQMASK: 1 GPR00: c001a264 c26d3c20 c26d7200 8280b033 GPR04: 0001 0077 30206d7372203164 GPR08: 2000 02002000 8280b033 3230303030303030 GPR12: 8800 c31d 00800050 0266 GPR16: 0309a1a0 0309a4b0 0309a2d8 0309a890 GPR20: 030d0098 c264da40 fd62 c000ff798080 GPR24: c264edf0 c001007469f0 fd62 c20e5e90 GPR28: c264edf0 c264d200 1db6 c264d200 NIP [c001a2b4] giveup_all+0xb4/0x110 LR [c001a29c] giveup_all+0x9c/0x110 Call Trace: [c26d3c20] [c001a264] giveup_all+0x64/0x110 (unreliable) [c26d3c90] [c001ae34] __switch_to+0x104/0x480 [c26d3cf0] [c0e0b8a0] __schedule+0x320/0x970 [c26d3dd0] [c0e0c518] schedule_idle+0x38/0x70 [c26d3df0] [c019c7c8] do_idle+0x248/0x3f0 [c26d3e70] [c019cbb8] cpu_startup_entry+0x38/0x40 [c26d3ea0] [c0011bb0] rest_init+0xe0/0xf8 [c26d3ed0] [c2004820] start_kernel+0x990/0x9e0 [c26d3f90] [c000c49c] start_here_common+0x1c/0x400 Which was unexpected. The warning is checking the thread.regs->msr value of the task we are switching from: usermsr = tsk->thread.regs->msr; ... WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC))); ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set. Dumping tsk->thread.regs->msr we see that it's: 0x1db6 Which is not a normal looking MSR, in fact the only valid bit is MSR_VSX, all the other bits are reserved in the current definition of the MSR. We can see from the oops that it was swapper/0 that we were switching from when we hit the warning, ie. init_task. So its thread.regs points to the base (high addresses) in init_stack. Dumping the content of init_task->thread.regs, with the members of pt_regs annotated (the 16 bytes larger version), we see: c2780080gpr[0] gpr[1] c2666008gpr[2] gpr[3] c26d3ed0 0078gpr[4] gpr[5] c0011b68 c2780080gpr[6] gpr[7] gpr[8] gpr[9] c26d3f90 80002200gpr[10]gpr[11] c2004820 c26d7200gpr[12]gpr[13] 1db6 c10aabe8gpr[14]gpr[15] c10aabe8 c10aabe8gpr[16]gpr[17] c294d598 gpr[18]gpr[19] 1ff8gpr[20]gpr[21] c206d608gpr[22]gpr[23] c278e0cc gpr[24]gpr[25] 2fff c000gpr[26]gpr[27] 0200 0028gpr[28]gpr[29] 1db6 0475gpr[30]gpr[31] 0200 1db6nipmsr orig_r3ctr c000c49c link xer ccrsofte trap dar dsisr result pprkuap pad[2] pad[3] This looks suspiciously like stack frames, not a pt_regs. If we look closely we can see return addresses from the stack trace above, c2004820 (start_kernel) and c000c49c (start_here_common). init_task->thread.regs is setup at build time in processor.h: #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs *)INIT_SP - 1, /* XXX bogus, I think */ \ The early boot code where we setup the initial stack is: LOAD_REG_ADDR(r3,init_thread_union) /* set up a stack pointer */ LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) add r1,r3,r1 lir0,0 stdu r0,-STACK_FRAME_OVERHEAD(r1) Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD). Which is far too small to contain a pt_regs. So the result is init_task->thread.regs is pointing at some stack frames on the init stack, not at a pt_regs. We have gotten away with this
[PATCH] powerpc: Drop unneeded cast in task_pt_regs()
There's no need to cast in task_pt_regs() as tsk->thread.regs should already be a struct pt_regs. If someone's using task_pt_regs() on something that's not a task but happens to have a thread.regs then we'll deal with them later. Signed-off-by: Michael Ellerman --- arch/powerpc/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index bfa336fbcfeb..8e855c78d780 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -307,7 +307,7 @@ struct thread_struct { } #endif -#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.regs) +#define task_pt_regs(tsk) ((tsk)->thread.regs) unsigned long get_wchan(struct task_struct *p); -- 2.25.1
Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
> Hence, serialize hvc_open and check if tty->private_data is NULL before > proceeding ahead. What do you think about adding the tag “Fixes” for these data synchronisation adjustments? … > +++ b/drivers/tty/hvc/hvc_console.c … @@ -384,6 +394,8 @@ static int hvc_open(struct tty_struct *tty, struct file * filp) … > +out: > + mutex_unlock(&hvc_open_mutex); > return rc; > } I suggest using the label “unlock” instead. Regards, Markus
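For illustration, a condensed sketch of the open path being discussed, with the serialization, the early NULL check and the label renamed as suggested. Field names and error handling are simplified and this is not the actual patch under review.

static DEFINE_MUTEX(hvc_open_mutex);

static int hvc_open(struct tty_struct *tty, struct file *filp)
{
	struct hvc_struct *hp;
	int rc = 0;

	mutex_lock(&hvc_open_mutex);	/* serialize concurrent opens */

	hp = tty->driver_data;
	if (!hp) {			/* console was torn down under us */
		rc = -ENODEV;
		goto unlock;
	}

	/* ... existing open logic ... */

unlock:
	mutex_unlock(&hvc_open_mutex);
	return rc;
}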
[PATCH v8 2/8] powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage()
The VDSO datapage and the text pages are always located immediately next to each other, so it can be hardcoded without an indirection through __kernel_datapage_offset In order to ease things, move the data page in front like other arches, that way there is no need to know the size of the library to locate the data page. Before: clock-getres-realtime-coarse:vdso: 714 nsec/call clock-gettime-realtime-coarse:vdso: 792 nsec/call clock-gettime-realtime:vdso: 1243 nsec/call After: clock-getres-realtime-coarse:vdso: 699 nsec/call clock-gettime-realtime-coarse:vdso: 784 nsec/call clock-gettime-realtime:vdso: 1231 nsec/call Signed-off-by: Christophe Leroy --- v7: - Moved the removal of the tmp param of __get_datapage() in a subsequent patch - Included the addition of the offset param to __get_datapage() in the further preparatory patch --- arch/powerpc/include/asm/vdso_datapage.h | 8 ++-- arch/powerpc/kernel/vdso.c | 53 arch/powerpc/kernel/vdso32/datapage.S| 3 -- arch/powerpc/kernel/vdso32/vdso32.lds.S | 7 +--- arch/powerpc/kernel/vdso64/datapage.S| 3 -- arch/powerpc/kernel/vdso64/vdso64.lds.S | 7 +--- 6 files changed, 16 insertions(+), 65 deletions(-) diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h index b9ef6cf50ea5..11886467dfdf 100644 --- a/arch/powerpc/include/asm/vdso_datapage.h +++ b/arch/powerpc/include/asm/vdso_datapage.h @@ -118,10 +118,12 @@ extern struct vdso_data *vdso_data; .macro get_datapage ptr, tmp bcl 20, 31, .+4 +999: mflr\ptr - addi\ptr, \ptr, (__kernel_datapage_offset - (.-4))@l - lwz \tmp, 0(\ptr) - add \ptr, \tmp, \ptr +#if CONFIG_PPC_PAGE_SHIFT > 14 + addis \ptr, \ptr, (_vdso_datapage - 999b)@ha +#endif + addi\ptr, \ptr, (_vdso_datapage - 999b)@l .endm #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c index f38f26e844b6..d33fa22ddbed 100644 --- a/arch/powerpc/kernel/vdso.c +++ b/arch/powerpc/kernel/vdso.c @@ -190,7 +190,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) * install_special_mapping or the perf counter mmap tracking code * will fail to recognise it as a vDSO (since arch_vma_name fails). 
*/ - current->mm->context.vdso_base = vdso_base; + current->mm->context.vdso_base = vdso_base + PAGE_SIZE; /* * our vma flags don't have VM_WRITE so by default, the process isn't @@ -482,42 +482,6 @@ static __init void vdso_setup_trampolines(struct lib32_elfinfo *v32, vdso32_rt_sigtramp = find_function32(v32, "__kernel_sigtramp_rt32"); } -static __init int vdso_fixup_datapage(struct lib32_elfinfo *v32, - struct lib64_elfinfo *v64) -{ -#ifdef CONFIG_VDSO32 - Elf32_Sym *sym32; -#endif -#ifdef CONFIG_PPC64 - Elf64_Sym *sym64; - - sym64 = find_symbol64(v64, "__kernel_datapage_offset"); - if (sym64 == NULL) { - printk(KERN_ERR "vDSO64: Can't find symbol " - "__kernel_datapage_offset !\n"); - return -1; - } - *((int *)(vdso64_kbase + sym64->st_value - VDSO64_LBASE)) = - (vdso64_pages << PAGE_SHIFT) - - (sym64->st_value - VDSO64_LBASE); -#endif /* CONFIG_PPC64 */ - -#ifdef CONFIG_VDSO32 - sym32 = find_symbol32(v32, "__kernel_datapage_offset"); - if (sym32 == NULL) { - printk(KERN_ERR "vDSO32: Can't find symbol " - "__kernel_datapage_offset !\n"); - return -1; - } - *((int *)(vdso32_kbase + (sym32->st_value - VDSO32_LBASE))) = - (vdso32_pages << PAGE_SHIFT) - - (sym32->st_value - VDSO32_LBASE); -#endif - - return 0; -} - - static __init int vdso_fixup_features(struct lib32_elfinfo *v32, struct lib64_elfinfo *v64) { @@ -618,9 +582,6 @@ static __init int vdso_setup(void) if (vdso_do_find_sections(&v32, &v64)) return -1; - if (vdso_fixup_datapage(&v32, &v64)) - return -1; - if (vdso_fixup_features(&v32, &v64)) return -1; @@ -761,26 +722,26 @@ static int __init vdso_init(void) vdso32_pagelist = kcalloc(vdso32_pages + 2, sizeof(struct page *), GFP_KERNEL); BUG_ON(vdso32_pagelist == NULL); + vdso32_pagelist[0] = virt_to_page(vdso_data); for (i = 0; i < vdso32_pages; i++) { struct page *pg = virt_to_page(vdso32_kbase + i*PAGE_SIZE); get_page(pg); - vdso32_pagelist[i] = pg; + vdso32_pagelist[i + 1] = pg; } - vdso32_pagelist[i++] = virt_to_page(vdso_data); - vdso32_pagelist[i] = NULL; + vdso32_pagelist[i + 1] = NULL; #endif #ifdef C
[PATCH v8 1/8] powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro
On the same way as already done on PPC32, drop __get_datapage() function and use get_datapage inline macro instead. See commit ec0895f08f99 ("powerpc/vdso32: inline __get_datapage()") Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/vdso64/cacheflush.S | 9 arch/powerpc/kernel/vdso64/datapage.S | 28 +++ arch/powerpc/kernel/vdso64/gettimeofday.S | 8 +++ 3 files changed, 11 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S b/arch/powerpc/kernel/vdso64/cacheflush.S index 526f5ba2593e..cab14324242b 100644 --- a/arch/powerpc/kernel/vdso64/cacheflush.S +++ b/arch/powerpc/kernel/vdso64/cacheflush.S @@ -8,6 +8,7 @@ #include #include #include +#include #include .text @@ -24,14 +25,12 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache) .cfi_startproc mflrr12 .cfi_register lr,r12 - mr r11,r3 - bl V_LOCAL_FUNC(__get_datapage) + get_datapager10, r0 mtlrr12 - mr r10,r3 lwz r7,CFG_DCACHE_BLOCKSZ(r10) addir5,r7,-1 - andcr6,r11,r5 /* round low to line bdy */ + andcr6,r3,r5/* round low to line bdy */ subfr8,r6,r4/* compute length */ add r8,r8,r5/* ensure we get enough */ lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10) @@ -48,7 +47,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache) lwz r7,CFG_ICACHE_BLOCKSZ(r10) addir5,r7,-1 - andcr6,r11,r5 /* round low to line bdy */ + andcr6,r3,r5/* round low to line bdy */ subfr8,r6,r4/* compute length */ add r8,r8,r5 lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10) diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vdso64/datapage.S index dc84f5ae3802..067247d3efb9 100644 --- a/arch/powerpc/kernel/vdso64/datapage.S +++ b/arch/powerpc/kernel/vdso64/datapage.S @@ -10,35 +10,13 @@ #include #include #include +#include .text .global__kernel_datapage_offset; __kernel_datapage_offset: .long 0 -V_FUNCTION_BEGIN(__get_datapage) - .cfi_startproc - /* We don't want that exposed or overridable as we want other objects -* to be able to bl directly to here -*/ - .protected __get_datapage - .hidden __get_datapage - - mflrr0 - .cfi_register lr,r0 - - bcl 20,31,data_page_branch -data_page_branch: - mflrr3 - mtlrr0 - addir3, r3, __kernel_datapage_offset-data_page_branch - lwz r0,0(r3) - .cfi_restore lr - add r3,r0,r3 - blr - .cfi_endproc -V_FUNCTION_END(__get_datapage) - /* * void *__kernel_get_syscall_map(unsigned int *syscall_count) ; * @@ -53,7 +31,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map) mflrr12 .cfi_register lr,r12 mr r4,r3 - bl V_LOCAL_FUNC(__get_datapage) + get_datapager3, r0 mtlrr12 addir3,r3,CFG_SYSCALL_MAP64 cmpldi cr0,r4,0 @@ -75,7 +53,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc mflrr12 .cfi_register lr,r12 - bl V_LOCAL_FUNC(__get_datapage) + get_datapager3, r0 ld r3,CFG_TB_TICKS_PER_SEC(r3) mtlrr12 crclr cr0*4+so diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S index 1c9a04703250..e54c4ce4d356 100644 --- a/arch/powerpc/kernel/vdso64/gettimeofday.S +++ b/arch/powerpc/kernel/vdso64/gettimeofday.S @@ -26,7 +26,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday) mr r11,r3 /* r11 holds tv */ mr r10,r4 /* r10 holds tz */ - bl V_LOCAL_FUNC(__get_datapage)/* get data page */ + get_datapager3, r0 cmpldi r11,0 /* check if tv is NULL */ beq 2f lis r7,100@ha /* load up USEC_PER_SEC */ @@ -71,7 +71,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) mflrr12 /* r12 saves lr */ .cfi_register lr,r12 mr r11,r4 /* r11 saves tp */ - bl V_LOCAL_FUNC(__get_datapage)/* get data page */ + get_datapager3, r0 lis r7,NSEC_PER_SEC@h /* want nanoseconds */ ori r7,r7,NSEC_PER_SEC@l beq cr5,70f @@ -188,7 +188,7 @@ 
V_FUNCTION_BEGIN(__kernel_clock_getres) mflrr12 .cfi_register lr,r12 - bl V_LOCAL_FUNC(__get_datapage) + get_datapager3, r0 lwz r5, CLOCK_HRTIMER_RES(r3) mtlrr12 li r3,0 @@ -221,7 +221,7 @@ V_FUNCTION_BEGIN(__kernel_time) .cfi_register lr,r12 mr r11,r3 /* r11 holds t */ - bl V_LOCAL_FUNC(__get_datapage) + get_datapage
[PATCH v8 4/8] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h
cpu_relax() need to be in asm/vdso/processor.h to be used by the C VDSO generic library. Move it there. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/processor.h | 10 ++ arch/powerpc/include/asm/vdso/processor.h | 23 +++ 2 files changed, 25 insertions(+), 8 deletions(-) create mode 100644 arch/powerpc/include/asm/vdso/processor.h diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index bfa336fbcfeb..8390503c44a2 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -6,6 +6,8 @@ * Copyright (C) 2001 PPC 64 Team, IBM Corp */ +#include + #include #ifdef CONFIG_VSX @@ -63,14 +65,6 @@ extern int _chrp_type; #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */ -/* Macros for adjusting thread priority (hardware multi-threading) */ -#define HMT_very_low() asm volatile("or 31,31,31 # very low priority") -#define HMT_low() asm volatile("or 1,1,1 # low priority") -#define HMT_medium_low() asm volatile("or 6,6,6 # medium low priority") -#define HMT_medium()asm volatile("or 2,2,2 # medium priority") -#define HMT_medium_high() asm volatile("or 5,5,5 # medium high priority") -#define HMT_high() asm volatile("or 3,3,3 # high priority") - #ifdef __KERNEL__ #ifdef CONFIG_PPC64 diff --git a/arch/powerpc/include/asm/vdso/processor.h b/arch/powerpc/include/asm/vdso/processor.h new file mode 100644 index ..39b9beace9ca --- /dev/null +++ b/arch/powerpc/include/asm/vdso/processor.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_VDSO_PROCESSOR_H +#define __ASM_VDSO_PROCESSOR_H + +#ifndef __ASSEMBLY__ + +/* Macros for adjusting thread priority (hardware multi-threading) */ +#define HMT_very_low() asm volatile("or 31, 31, 31 # very low priority") +#define HMT_low() asm volatile("or 1, 1, 1# low priority") +#define HMT_medium_low() asm volatile("or 6, 6, 6# medium low priority") +#define HMT_medium() asm volatile("or 2, 2, 2# medium priority") +#define HMT_medium_high() asm volatile("or 5, 5, 5# medium high priority") +#define HMT_high() asm volatile("or 3, 3, 3# high priority") + +#ifdef CONFIG_PPC64 +#define cpu_relax()do { HMT_low(); HMT_medium(); barrier(); } while (0) +#else +#define cpu_relax()barrier() +#endif + +#endif /* __ASSEMBLY__ */ + +#endif /* __ASM_VDSO_PROCESSOR_H */ -- 2.25.0
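For context, the generic C VDSO needs cpu_relax() because its seqcount read loop spins on the vdso_data sequence counter, roughly as in the generic helper below (simplified from the common VDSO helpers; not part of this patch):

static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
	u32 seq;

	/* Wait while an update is in progress (odd sequence count). */
	while (unlikely((seq = READ_ONCE(vd->seq)) & 1))
		cpu_relax();

	smp_rmb();	/* pairs with the write side's barrier */
	return seq;
}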
[PATCH v8 3/8] powerpc/vdso: Remove unused \tmp param in __get_datapage()
The \tmp param is not used anymore, remove it. Signed-off-by: Christophe Leroy --- v7: New patch, splitted out of preceding patch --- arch/powerpc/include/asm/vdso_datapage.h | 2 +- arch/powerpc/kernel/vdso32/cacheflush.S | 2 +- arch/powerpc/kernel/vdso32/datapage.S | 4 ++-- arch/powerpc/kernel/vdso32/gettimeofday.S | 8 arch/powerpc/kernel/vdso64/cacheflush.S | 2 +- arch/powerpc/kernel/vdso64/datapage.S | 4 ++-- arch/powerpc/kernel/vdso64/gettimeofday.S | 8 7 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h index 11886467dfdf..f3d7e4e2a45b 100644 --- a/arch/powerpc/include/asm/vdso_datapage.h +++ b/arch/powerpc/include/asm/vdso_datapage.h @@ -116,7 +116,7 @@ extern struct vdso_data *vdso_data; #else /* __ASSEMBLY__ */ -.macro get_datapage ptr, tmp +.macro get_datapage ptr bcl 20, 31, .+4 999: mflr\ptr diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S b/arch/powerpc/kernel/vdso32/cacheflush.S index 3440ddf21c8b..017843bf5382 100644 --- a/arch/powerpc/kernel/vdso32/cacheflush.S +++ b/arch/powerpc/kernel/vdso32/cacheflush.S @@ -27,7 +27,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache) #ifdef CONFIG_PPC64 mflrr12 .cfi_register lr,r12 - get_datapager10, r0 + get_datapager10 mtlrr12 #endif diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S index 5513a4f8253e..0513a2eabec8 100644 --- a/arch/powerpc/kernel/vdso32/datapage.S +++ b/arch/powerpc/kernel/vdso32/datapage.S @@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map) mflrr12 .cfi_register lr,r12 mr. r4,r3 - get_datapager3, r0 + get_datapager3 mtlrr12 addir3,r3,CFG_SYSCALL_MAP32 beqlr @@ -49,7 +49,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc mflrr12 .cfi_register lr,r12 - get_datapager3, r0 + get_datapager3 lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3) lwz r3,CFG_TB_TICKS_PER_SEC(r3) mtlrr12 diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S index a3951567118a..0bbdce0f2a9c 100644 --- a/arch/powerpc/kernel/vdso32/gettimeofday.S +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S @@ -34,7 +34,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday) mr. 
r10,r3 /* r10 saves tv */ mr r11,r4 /* r11 saves tz */ - get_datapager9, r0 + get_datapager9 beq 3f LOAD_REG_IMMEDIATE(r7, 100) /* load up USEC_PER_SEC */ bl __do_get_tspec@local/* get sec/usec from tb & kernel */ @@ -79,7 +79,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) mflrr12 /* r12 saves lr */ .cfi_register lr,r12 mr r11,r4 /* r11 saves tp */ - get_datapager9, r0 + get_datapager9 LOAD_REG_IMMEDIATE(r7, NSEC_PER_SEC)/* load up NSEC_PER_SEC */ beq cr5, .Lcoarse_clocks .Lprecise_clocks: @@ -206,7 +206,7 @@ V_FUNCTION_BEGIN(__kernel_clock_getres) mflrr12 .cfi_register lr,r12 - get_datapager3, r0 + get_datapager3 lwz r5, CLOCK_HRTIMER_RES(r3) mtlrr12 1: li r3,0 @@ -240,7 +240,7 @@ V_FUNCTION_BEGIN(__kernel_time) .cfi_register lr,r12 mr r11,r3 /* r11 holds t */ - get_datapager9, r0 + get_datapager9 lwz r3,STAMP_XTIME_SEC+LOPART(r9) diff --git a/arch/powerpc/kernel/vdso64/cacheflush.S b/arch/powerpc/kernel/vdso64/cacheflush.S index cab14324242b..61985de5758f 100644 --- a/arch/powerpc/kernel/vdso64/cacheflush.S +++ b/arch/powerpc/kernel/vdso64/cacheflush.S @@ -25,7 +25,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache) .cfi_startproc mflrr12 .cfi_register lr,r12 - get_datapager10, r0 + get_datapager10 mtlrr12 lwz r7,CFG_DCACHE_BLOCKSZ(r10) diff --git a/arch/powerpc/kernel/vdso64/datapage.S b/arch/powerpc/kernel/vdso64/datapage.S index 03bb72c440dc..00760dc69d68 100644 --- a/arch/powerpc/kernel/vdso64/datapage.S +++ b/arch/powerpc/kernel/vdso64/datapage.S @@ -28,7 +28,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map) mflrr12 .cfi_register lr,r12 mr r4,r3 - get_datapager3, r0 + get_datapager3 mtlrr12 addir3,r3,CFG_SYSCALL_MAP64 cmpldi cr0,r4,0 @@ -50,7 +50,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq) .cfi_startproc mflrr12 .cfi_register lr,r12 - get_datapager3, r0 + get_datapager3 ld r3,CFG_TB_TICKS_PER_SEC(r3) mtlrr12 crclr cr0*4+so diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/
[PATCH v8 0/8] powerpc: switch VDSO to C implementation
This is the seventh version of a series to switch powerpc VDSO to generic C implementation. Main changes since v7 are: - Added gettime64 on PPC32 This series applies on today's powerpc/merge branch. See the last patches for details on changes and performance. Christophe Leroy (8): powerpc/vdso64: Switch from __get_datapage() to get_datapage inline macro powerpc/vdso: Remove __kernel_datapage_offset and simplify __get_datapage() powerpc/vdso: Remove unused \tmp param in __get_datapage() powerpc/processor: Move cpu_relax() into asm/vdso/processor.h powerpc/vdso: Prepare for switching VDSO to generic C implementation. powerpc/vdso: Switch VDSO to generic C implementation. lib/vdso: force inlining of __cvdso_clock_gettime_common() powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32 arch/powerpc/Kconfig | 2 + arch/powerpc/include/asm/clocksource.h | 7 + arch/powerpc/include/asm/processor.h | 10 +- arch/powerpc/include/asm/vdso/clocksource.h | 7 + arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++ arch/powerpc/include/asm/vdso/processor.h| 23 ++ arch/powerpc/include/asm/vdso/vsyscall.h | 25 ++ arch/powerpc/include/asm/vdso_datapage.h | 50 ++-- arch/powerpc/kernel/asm-offsets.c| 49 +-- arch/powerpc/kernel/time.c | 91 +- arch/powerpc/kernel/vdso.c | 58 +--- arch/powerpc/kernel/vdso32/Makefile | 32 +- arch/powerpc/kernel/vdso32/cacheflush.S | 2 +- arch/powerpc/kernel/vdso32/config-fake32.h | 34 +++ arch/powerpc/kernel/vdso32/datapage.S| 7 +- arch/powerpc/kernel/vdso32/gettimeofday.S| 300 +-- arch/powerpc/kernel/vdso32/vdso32.lds.S | 8 +- arch/powerpc/kernel/vdso32/vgettimeofday.c | 35 +++ arch/powerpc/kernel/vdso64/Makefile | 23 +- arch/powerpc/kernel/vdso64/cacheflush.S | 9 +- arch/powerpc/kernel/vdso64/datapage.S| 31 +- arch/powerpc/kernel/vdso64/gettimeofday.S| 243 +-- arch/powerpc/kernel/vdso64/vdso64.lds.S | 7 +- arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 ++ lib/vdso/gettimeofday.c | 2 +- 25 files changed, 460 insertions(+), 799 deletions(-) create mode 100644 arch/powerpc/include/asm/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h create mode 100644 arch/powerpc/include/asm/vdso/processor.h create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c -- 2.25.0
[PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Modify __get_datapage() to take an offset - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. Implement __arch_vdso_capable() and: - When the timebase is used, make it always return true. - When the RTC clock is used, make it always return false. Provide vdso_shift_ns(), as the generic x >> s gives the following bad result: 18: 35 25 ff e0 addic. r9,r5,-32 1c: 41 80 00 10 blt 2c 20: 7c 64 4c 30 srw r4,r3,r9 24: 38 60 00 00 li r3,0 ... 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 30: 21 45 00 1f subfic r10,r5,31 34: 7c 84 2c 30 srw r4,r4,r5 38: 7d 29 50 30 slw r9,r9,r10 3c: 7c 63 2c 30 srw r3,r3,r5 40: 7d 24 23 78 or r4,r9,r4 In our case the shift is always <= 32. In addition, the upper 32 bits of the result are likely nul. Lets GCC know it, it also optimises the following calculations. With the patch, we get: 0: 21 25 00 20 subfic r9,r5,32 4: 7c 69 48 30 slw r9,r3,r9 8: 7c 84 2c 30 srw r4,r4,r5 c: 7d 24 23 78 or r4,r9,r4 10: 7c 63 2c 30 srw r3,r3,r5 Signed-off-by: Christophe Leroy --- v8: - New, splitted out of last patch of the series --- arch/powerpc/include/asm/clocksource.h | 7 + arch/powerpc/include/asm/vdso/clocksource.h | 7 + arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++ arch/powerpc/include/asm/vdso_datapage.h | 6 +- arch/powerpc/kernel/vdso32/vgettimeofday.c | 29 +++ arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 +++ 6 files changed, 250 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/include/asm/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c diff --git a/arch/powerpc/include/asm/clocksource.h b/arch/powerpc/include/asm/clocksource.h new file mode 100644 index ..482185566b0c --- /dev/null +++ b/arch/powerpc/include/asm/clocksource.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_CLOCKSOURCE_H +#define _ASM_CLOCKSOURCE_H + +#include + +#endif diff --git a/arch/powerpc/include/asm/vdso/clocksource.h b/arch/powerpc/include/asm/vdso/clocksource.h new file mode 100644 index ..ec5d672d2569 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/clocksource.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSOCLOCKSOURCE_H +#define __ASM_VDSOCLOCKSOURCE_H + +#define VDSO_ARCH_CLOCKMODES VDSO_CLOCKMODE_ARCHTIMER + +#endif diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h new file mode 100644 index ..4452897f9bd8 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h @@ -0,0 +1,175 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#include + +#ifdef __ASSEMBLY__ + +.macro 
cvdso_call funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + get_datapager5, VDSO_DATA_OFFSET + bl \funct + PPC_LL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + cmpwi r3, 0 + mtlrr0 + .cfi_restore lr + addir1, r1, STACK_FRAME_OVERHEAD + crclr so + beqlr+ + crset so + neg r3, r3 + blr + .cfi_endproc +.endm + +.macro cvdso_call_time funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + get_datapager4, VDSO_DATA_OFFSET + bl \funct + PPC_LL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + crclr so + mtlrr0 + .cfi_restore lr + addir1, r1, STACK_FRAME_OVERHEAD + blr + .cfi_endproc +.endm + +#else + +#include +#i
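The vdso_shift_ns() helper discussed in the commit message falls outside the truncated hunk above. A minimal sketch of what such a helper can look like, assuming, as the commit message notes, that the shift never exceeds 32 (the sketch additionally assumes it stays below 32 so the C shifts remain well defined); the exact kernel code may differ:

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	u32 hi = ns >> 32;
	u32 lo = ns;

	/* 0 < shift < 32: a pair of 32-bit shifts replaces the generic
	 * variable 64-bit shift shown in the "before" disassembly above. */
	lo >>= shift;
	lo |= hi << (32 - shift);
	hi >>= shift;

	return ((u64)hi << 32) | lo;
}

Telling GCC that the upper half shrinks this way is what lets it optimise the surrounding timekeeping arithmetic as well.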
[PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/vdso32/gettimeofday.S | 9 + arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 + arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++ 3 files changed, 16 insertions(+) diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S index fd7b01c51281..a6e29f880e0e 100644 --- a/arch/powerpc/kernel/vdso32/gettimeofday.S +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S @@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) cvdso_call __c_kernel_clock_gettime V_FUNCTION_END(__kernel_clock_gettime) +/* + * Exact prototype of clock_gettime64() + * + * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts); + * + */ +V_FUNCTION_BEGIN(__kernel_clock_gettime64) + cvdso_call __c_kernel_clock_gettime64 +V_FUNCTION_END(__kernel_clock_gettime64) /* * Exact prototype of clock_getres() diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S index 6cf729612268..05b462143238 100644 --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S @@ -144,6 +144,7 @@ VERSION #ifndef CONFIG_PPC_BOOK3S_601 __kernel_gettimeofday; __kernel_clock_gettime; + __kernel_clock_gettime64; __kernel_clock_getres; __kernel_time; __kernel_get_tbfreq; diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c index 0b9ab4c22ef2..f7f71fecf4ed 100644 --- a/arch/powerpc/kernel/vdso32/vgettimeofday.c +++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c @@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts, return __cvdso_clock_gettime32_data(vd, clock, ts); } +int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts, + const struct vdso_data *vd) +{ + return __cvdso_clock_gettime_data(vd, clock, ts); +} + int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz, const struct vdso_data *vd) { -- 2.25.0
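For context, the two timespec layouts involved differ only in the width of their fields. A rough illustration (field types as commonly defined for these UAPI structures, written from memory rather than quoted from the tree):

/* Legacy 32-bit layout served by __kernel_clock_gettime: tv_sec wraps in 2038 */
struct old_timespec32 {
	s32 tv_sec;
	s32 tv_nsec;
};

/* Layout served by the new __kernel_clock_gettime64 entry point: y2038-safe */
struct __kernel_timespec {
	s64       tv_sec;
	long long tv_nsec;
};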
[PATCH v8 6/8] powerpc/vdso: Switch VDSO to generic C implementation.
For VDSO32 on PPC64, we create a fake 32 bits config, on the same principle as MIPS architecture, in order to get the correct parts of the different asm header files. With the C VDSO, the performance is slightly lower, but it is worth it as it will ease maintenance and evolution, and also brings clocks that are not supported with the ASM VDSO. On an 8xx at 132 MHz, vdsotest with the ASM VDSO: gettimeofday:vdso: 828 nsec/call clock-getres-realtime-coarse:vdso: 391 nsec/call clock-gettime-realtime-coarse:vdso: 614 nsec/call clock-getres-realtime:vdso: 460 nsec/call clock-gettime-realtime:vdso: 876 nsec/call clock-getres-monotonic-coarse:vdso: 399 nsec/call clock-gettime-monotonic-coarse:vdso: 691 nsec/call clock-getres-monotonic:vdso: 460 nsec/call clock-gettime-monotonic:vdso: 1026 nsec/call On an 8xx at 132 MHz, vdsotest with the C VDSO: gettimeofday:vdso: 955 nsec/call clock-getres-realtime-coarse:vdso: 545 nsec/call clock-gettime-realtime-coarse:vdso: 592 nsec/call clock-getres-realtime:vdso: 545 nsec/call clock-gettime-realtime:vdso: 941 nsec/call clock-getres-monotonic-coarse:vdso: 545 nsec/call clock-gettime-monotonic-coarse:vdso: 591 nsec/call clock-getres-monotonic:vdso: 545 nsec/call clock-gettime-monotonic:vdso: 940 nsec/call It is even better for gettime with monotonic clocks. Unsupported clocks with ASM VDSO: clock-gettime-boottime:vdso: 3851 nsec/call clock-gettime-tai:vdso: 3852 nsec/call clock-gettime-monotonic-raw:vdso: 3396 nsec/call Same clocks with C VDSO: clock-gettime-tai:vdso: 941 nsec/call clock-gettime-monotonic-raw:vdso: 1001 nsec/call clock-gettime-monotonic-coarse:vdso: 591 nsec/call On an 8321E at 333 MHz, vdsotest with the ASM VDSO: gettimeofday:vdso: 220 nsec/call clock-getres-realtime-coarse:vdso: 102 nsec/call clock-gettime-realtime-coarse:vdso: 178 nsec/call clock-getres-realtime:vdso: 129 nsec/call clock-gettime-realtime:vdso: 235 nsec/call clock-getres-monotonic-coarse:vdso: 105 nsec/call clock-gettime-monotonic-coarse:vdso: 208 nsec/call clock-getres-monotonic:vdso: 129 nsec/call clock-gettime-monotonic:vdso: 274 nsec/call On an 8321E at 333 MHz, vdsotest with the C VDSO: gettimeofday:vdso: 272 nsec/call clock-getres-realtime-coarse:vdso: 160 nsec/call clock-gettime-realtime-coarse:vdso: 184 nsec/call clock-getres-realtime:vdso: 166 nsec/call clock-gettime-realtime:vdso: 281 nsec/call clock-getres-monotonic-coarse:vdso: 160 nsec/call clock-gettime-monotonic-coarse:vdso: 184 nsec/call clock-getres-monotonic:vdso: 169 nsec/call clock-gettime-monotonic:vdso: 275 nsec/call Signed-off-by: Christophe Leroy --- v7: - Split out preparatory changes in a new preceding patch - Added -fasynchronous-unwind-tables to CC flags. v6: - Added missing prototypes in asm/vdso/gettimeofday.h for __c_kernel_ functions. 
- Using STACK_FRAME_OVERHEAD instead of INT_FRAME_SIZE - Rebased on powerpc/merge as of 7 Apr 2020 - Fixed build failure with gcc 9 - Added a patch to create asm/vdso/processor.h and more cpu_relax() in it Signed-off-by: Christophe Leroy --- arch/powerpc/Kconfig | 2 + arch/powerpc/include/asm/vdso/vsyscall.h | 25 ++ arch/powerpc/include/asm/vdso_datapage.h | 40 +-- arch/powerpc/kernel/asm-offsets.c | 49 +--- arch/powerpc/kernel/time.c | 91 +-- arch/powerpc/kernel/vdso.c | 5 +- arch/powerpc/kernel/vdso32/Makefile| 32 ++- arch/powerpc/kernel/vdso32/config-fake32.h | 34 +++ arch/powerpc/kernel/vdso32/gettimeofday.S | 291 + arch/powerpc/kernel/vdso64/Makefile| 23 +- arch/powerpc/kernel/vdso64/gettimeofday.S | 243 + 11 files changed, 144 insertions(+), 691 deletions(-) create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h create mode 100644 arch/powerpc/kernel/vdso32/config-fake32.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 924c541a9260..4e51454a3e7b 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -166,6 +166,7 @@ config PPC select GENERIC_STRNCPY_FROM_USER select GENERIC_STRNLEN_USER select GENERIC_TIME_VSYSCALL + select GENERIC_GETTIMEOFDAY select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU select HAVE_ARCH_JUMP_LABEL @@ -197,6 +198,7 @@ config PPC select HAVE_FUNCTION_GRAPH_TRACER select HAVE_FUNCTION_TRACER select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC + select HAVE_GENERIC_VDSO select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx) select HAVE_IDE select HAVE_IOREMAP_PROT diff --git a/arch/powerpc/include/asm/vdso/vsyscall.h b/arch/powerpc/include/asm/vdso/v
[PATCH v8 7/8] lib/vdso: force inlining of __cvdso_clock_gettime_common()
When adding gettime64() to a 32-bit architecture (namely powerpc/32), it has been noticed that GCC no longer inlines __cvdso_clock_gettime_common() because it is called twice (once by __cvdso_clock_gettime() and once by __cvdso_clock_gettime32()). This seriously degrades performance: Before the implementation of gettime64(), gettime() runs in:

clock-gettime-monotonic-raw:    vdso: 1003 nsec/call
clock-gettime-monotonic-coarse:    vdso: 592 nsec/call
clock-gettime-monotonic:    vdso: 942 nsec/call

When adding a gettime64() entry point, the standard gettime() performance is degraded by 30% to 50%:

clock-gettime-monotonic-raw:    vdso: 1300 nsec/call
clock-gettime-monotonic-coarse:    vdso: 900 nsec/call
clock-gettime-monotonic:    vdso: 1232 nsec/call

Adding __always_inline to __cvdso_clock_gettime_common() regains the original performance. In terms of code size, the inlining increases the code size by only 176 bytes. This is in the noise for a kernel image.

Signed-off-by: Christophe Leroy
---
 lib/vdso/gettimeofday.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index a2909af4b924..7938d3c4901d 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -210,7 +210,7 @@ static __always_inline int do_coarse(const struct vdso_data *vd, clockid_t clk,
 	return 0;
 }
 
-static __maybe_unused int
+static __always_inline int
 __cvdso_clock_gettime_common(const struct vdso_data *vd, clockid_t clock,
 			     struct __kernel_timespec *ts)
 {
-- 
2.25.0
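The effect is easy to picture with a toy example (a hand-written illustration, not kernel code): as soon as the common helper gains a second caller, GCC stops inlining it unless forced.

static __always_inline int common_gettime(int clock, long long *ns)
{
	/* ...the shared conversion work... */
	*ns = 0;
	return 0;
}

/* Two thin entry points: without __always_inline above, GCC would emit
 * common_gettime() out of line and both entry points would pay a call. */
int entry_gettime64(int clock, long long *ns)
{
	return common_gettime(clock, ns);
}

int entry_gettime32(int clock, int *sec, int *nsec)
{
	long long ns;
	int ret = common_gettime(clock, &ns);

	*sec = ns / 1000000000LL;
	*nsec = ns % 1000000000LL;
	return ret;
}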
[PATCH 1/2] powerpc: Keep .rela* sections when CONFIG_RELOCATABLE is defined
arch/powerpc/kernel/vmlinux.lds.S has #ifdef CONFIG_RELOCATABLE ... .rela.dyn : AT(ADDR(.rela.dyn) - LOAD_OFFSET) { __rela_dyn_start = .; *(.rela*) } #endif ... DISCARDS /DISCARD/ : { *(*.EMB.apuinfo) *(.glink .iplt .plt .rela* .comment) *(.gnu.version*) *(.gnu.attributes) *(.eh_frame) } Since .rela* sections are needed when CONFIG_RELOCATABLE is defined, don't discard .rela* sections if CONFIG_RELOCATABLE is defined. Signed-off-by: H.J. Lu Acked-by: Michael Ellerman (powerpc) --- arch/powerpc/kernel/vmlinux.lds.S | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 31a0f201fb6f..4ba07734a210 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -366,9 +366,12 @@ SECTIONS DISCARDS /DISCARD/ : { *(*.EMB.apuinfo) - *(.glink .iplt .plt .rela* .comment) + *(.glink .iplt .plt .comment) *(.gnu.version*) *(.gnu.attributes) *(.eh_frame) +#ifndef CONFIG_RELOCATABLE + *(.rela*) +#endif } } -- 2.25.4
[PATCH 2/2] Discard .note.gnu.property sections in generic NOTES
With the command-line option, -mx86-used-note=yes, the x86 assembler in binutils 2.32 and above generates a program property note in a note section, .note.gnu.property, to encode used x86 ISAs and features. But kernel linker script only contains a single NOTE segment: PHDRS { text PT_LOAD FLAGS(5); data PT_LOAD FLAGS(6); percpu PT_LOAD FLAGS(6); init PT_LOAD FLAGS(7); note PT_NOTE FLAGS(0); } SECTIONS { ... .notes : AT(ADDR(.notes) - 0x8000) { __start_notes = .; KEEP(*(.not e.*)) __stop_notes = .; } :text :note ... } The NOTE segment generated by kernel linker script is aligned to 4 bytes. But .note.gnu.property section must be aligned to 8 bytes on x86-64 and we get [hjl@gnu-skx-1 linux]$ readelf -n vmlinux Displaying notes found in: .notes OwnerData size Description Xen 0x0006 Unknown note type: (0x0006) description data: 6c 69 6e 75 78 00 Xen 0x0004 Unknown note type: (0x0007) description data: 32 2e 36 00 xen-3.0 0x0005 Unknown note type: (0x006e6558) description data: 08 00 00 00 03 readelf: Warning: note with invalid namesz and/or descsz found at offset 0x50 readelf: Warning: type: 0x, namesize: 0x006e6558, descsize: 0x8000, alignment: 8 [hjl@gnu-skx-1 linux]$ Since note.gnu.property section in kernel image is never used, this patch discards .note.gnu.property sections in kernel linker script by adding /DISCARD/ : { *(.note.gnu.property) } before kernel NOTE segment in generic NOTES. Signed-off-by: H.J. Lu Reviewed-by: Kees Cook --- include/asm-generic/vmlinux.lds.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 71e387a5fe90..95cd678428f4 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -833,7 +833,14 @@ #define TRACEDATA #endif +/* + * Discard .note.gnu.property sections which are unused and have + * different alignment requirement from kernel note sections. + */ #define NOTES \ + /DISCARD/ : { \ + *(.note.gnu.property) \ + } \ .notes : AT(ADDR(.notes) - LOAD_OFFSET) { \ __start_notes = .; \ KEEP(*(.note.*))\ -- 2.25.4
Re: [PATCH] powerpc/spufs: Add rcu_read_lock() around fcheck()
On Tue, Apr 28, 2020 at 09:48:11PM +1000, Michael Ellerman wrote:
> > This comes from fcheck_files() via fcheck().
> >
> > It's pretty clearly documented that fcheck() must be wrapped with
> > rcu_read_lock(), so fix it.

But for this to actually be useful you'd need to hold the rcu read lock
until you are done with the file (or have taken a reference).
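A short sketch of the two acceptable patterns being described (illustrative only, not the spufs code):

	struct file *f;

	rcu_read_lock();
	f = fcheck(fd);
	if (f && !get_file_rcu(f))	/* lost a race with the final fput() */
		f = NULL;
	rcu_read_unlock();

	if (f) {
		/* safe to use f here, we hold our own reference */
		fput(f);
	}

Alternatively, the RCU read section can simply be kept open across every access to the struct file, as long as nothing touches it after rcu_read_unlock().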
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
On Tue, Apr 28, 2020 at 08:02:07PM +0800, Jeremy Kerr wrote:
> Currently, we may perform a copy_to_user (through
> simple_read_from_buffer()) while holding a context's register_lock,
> while accessing the context save area.
>
> This change uses temporary buffers for the context save area data,
> which we then pass to simple_read_from_buffer.
>
> Signed-off-by: Jeremy Kerr
> ---
>
> Christoph - this fixes the copy_to_user while atomic, hopefully with
> only minimal disruption to your series.

I recognize plenty of snippets from that :)

Your patch looks good to me:

Reviewed-by: Christoph Hellwig
[PATCH v2 1/2] powerpc/ftrace: Simplify error checking when patching instructions
Introduce a macro PATCH_INSN_OR_GOTO() to simplify instruction patching, and to make the error messages more uniform and useful: - print an error message that includes the original return value - print the function name and line numbers, so that the offending location is clear - goto a label which always return -EPERM, which ftrace_bug() expects for proper error handling Also eliminate use of patch_branch() since most such uses already call create_branch() for error checking before patching. Instead, use the return value from create_branch() with PATCH_INSN_OR_GOTO(). Signed-off-by: Naveen N. Rao --- arch/powerpc/kernel/trace/ftrace.c | 96 +++--- 1 file changed, 60 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c index 7ea0ca044b65..63edbd48af42 100644 --- a/arch/powerpc/kernel/trace/ftrace.c +++ b/arch/powerpc/kernel/trace/ftrace.c @@ -31,6 +31,17 @@ #ifdef CONFIG_DYNAMIC_FTRACE +#define PATCH_INSN_OR_GOTO(addr, instr, label) \ +do {\ + int rc = patch_instruction((unsigned int *)(addr), instr); \ + if (rc) {\ + pr_err("%s:%d Error patching instruction at 0x%pK (%pS): %d\n", \ + __func__, __LINE__, \ + (void *)(addr), (void *)(addr), rc); \ + goto label; \ + }\ +} while (0) + /* * We generally only have a single long_branch tramp and at most 2 or 3 plt * tramps generated. But, we don't use the plt tramps currently. We also allot @@ -78,10 +89,12 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new) } /* replace the text with the new text */ - if (patch_instruction((unsigned int *)ip, new)) - return -EPERM; + PATCH_INSN_OR_GOTO(ip, new, patch_err); return 0; + +patch_err: + return -EPERM; } /* @@ -204,12 +217,12 @@ __ftrace_make_nop(struct module *mod, } #endif /* CONFIG_MPROFILE_KERNEL */ - if (patch_instruction((unsigned int *)ip, pop)) { - pr_err("Patching NOP failed.\n"); - return -EPERM; - } + PATCH_INSN_OR_GOTO(ip, pop, patch_err); return 0; + +patch_err: + return -EPERM; } #else /* !PPC64 */ @@ -276,10 +289,12 @@ __ftrace_make_nop(struct module *mod, op = PPC_INST_NOP; - if (patch_instruction((unsigned int *)ip, op)) - return -EPERM; + PATCH_INSN_OR_GOTO(ip, op, patch_err); return 0; + +patch_err: + return -EPERM; } #endif /* PPC64 */ #endif /* CONFIG_MODULES */ @@ -322,7 +337,7 @@ static int add_ftrace_tramp(unsigned long tramp) */ static int setup_mcount_compiler_tramp(unsigned long tramp) { - int i, op; + unsigned int i, op; unsigned long ptr; static unsigned long ftrace_plt_tramps[NUM_FTRACE_TRAMPS]; @@ -366,16 +381,14 @@ static int setup_mcount_compiler_tramp(unsigned long tramp) #else ptr = ppc_global_function_entry((void *)ftrace_caller); #endif - if (!create_branch((void *)tramp, ptr, 0)) { + op = create_branch((void *)tramp, ptr, 0); + if (!op) { pr_debug("%ps is not reachable from existing mcount tramp\n", (void *)ptr); return -1; } - if (patch_branch((unsigned int *)tramp, ptr, 0)) { - pr_debug("REL24 out of range!\n"); - return -1; - } + PATCH_INSN_OR_GOTO(tramp, op, patch_err); if (add_ftrace_tramp(tramp)) { pr_debug("No tramp locations left\n"); @@ -383,6 +396,9 @@ static int setup_mcount_compiler_tramp(unsigned long tramp) } return 0; + +patch_err: + return -EPERM; } static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr) @@ -416,12 +432,12 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr) } } - if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) { - pr_err("Patching NOP failed.\n"); - return -EPERM; - } + 
PATCH_INSN_OR_GOTO(ip, PPC_INST_NOP, patch_err); return 0; + +patch_err: + return -EPERM; } int ftrace_make_nop(struct module *mod, @@ -557,17 +573,18 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) } /* Ensure branch is within 24 bits */ - if (!create_branch(ip, tramp, BRANCH_SET_LINK)) { + op[0] = create_branch(ip, tramp, BRANCH_SET_LINK); + if (!op[0]) { pr_err("Branch out of rang
[PATCH v2 0/2] powerpc: Enhance error handling with patch_instruction()
Changes in v2: - Change macro to use 'goto' instead of 'return' - Rename macro to indicate use of 'goto' - Convert more patch_instruction() uses in optprobes to use the new macro. - Drop 1st patch, which added error checking in do_patch_instruction() since that is being covered in a separate patchset. v1: http://lkml.kernel.org/r/cover.1587654213.git.naveen.n@linux.vnet.ibm.com - Naveen Naveen N. Rao (2): powerpc/ftrace: Simplify error checking when patching instructions powerpc/kprobes: Check return value of patch_instruction() arch/powerpc/kernel/kprobes.c | 13 +++- arch/powerpc/kernel/optprobes.c| 109 + arch/powerpc/kernel/trace/ftrace.c | 96 +++-- 3 files changed, 149 insertions(+), 69 deletions(-) base-commit: 8299da600ad05b8aa0f15ec0f5f03bd40e37d6f0 -- 2.25.1
[PATCH v2 2/2] powerpc/kprobes: Check return value of patch_instruction()
patch_instruction() can fail in some scenarios. Add appropriate error checking so that such failures are caught and logged, and suitable error code is returned. Fixes: d07df82c43be8 ("powerpc/kprobes: Move kprobes over to patch_instruction()") Fixes: f3eca95638931 ("powerpc/kprobes/optprobes: Use patch_instruction()") Signed-off-by: Naveen N. Rao --- arch/powerpc/kernel/kprobes.c | 13 +++- arch/powerpc/kernel/optprobes.c | 109 +++- 2 files changed, 89 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index 81efb605113e..9eb1cc05ddd5 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -138,13 +139,21 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe); void arch_arm_kprobe(struct kprobe *p) { - patch_instruction(p->addr, BREAKPOINT_INSTRUCTION); + int rc = patch_instruction(p->addr, BREAKPOINT_INSTRUCTION); + + if (WARN_ON(rc)) + pr_err("Failed to patch trap at 0x%pK: %d\n", + (void *)p->addr, rc); } NOKPROBE_SYMBOL(arch_arm_kprobe); void arch_disarm_kprobe(struct kprobe *p) { - patch_instruction(p->addr, p->opcode); + int rc = patch_instruction(p->addr, p->opcode); + + if (WARN_ON(rc)) + pr_err("Failed to remove trap at 0x%pK: %d\n", + (void *)p->addr, rc); } NOKPROBE_SYMBOL(arch_disarm_kprobe); diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 024f7aad1952..20897600db2e 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -139,52 +139,80 @@ void arch_remove_optimized_kprobe(struct optimized_kprobe *op) } } +#define PATCH_INSN_OR_GOTO(addr, instr, label) \ +do {\ + int rc = patch_instruction((unsigned int *)(addr), instr); \ + if (rc) {\ + pr_err("%s:%d Error patching instruction at 0x%pK (%pS): %d\n", \ + __func__, __LINE__, \ + (void *)(addr), (void *)(addr), rc); \ + goto label; \ + }\ +} while (0) + /* * emulate_step() requires insn to be emulated as * second parameter. Load register 'r4' with the * instruction. */ -void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr) +static int patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr) { /* addis r4,0,(insn)@h */ - patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(4) | - ((val >> 16) & 0x)); + PATCH_INSN_OR_GOTO(addr, PPC_INST_ADDIS | ___PPC_RT(4) | + ((val >> 16) & 0x), + patch_err); addr++; /* ori r4,r4,(insn)@l */ - patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(4) | - ___PPC_RS(4) | (val & 0x)); + PATCH_INSN_OR_GOTO(addr, PPC_INST_ORI | ___PPC_RA(4) | + ___PPC_RS(4) | (val & 0x), + patch_err); + + return 0; + +patch_err: + return -EFAULT; } /* * Generate instructions to load provided immediate 64-bit value * to register 'r3' and patch these instructions at 'addr'. 
*/ -void patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr) +static int patch_imm64_load_insns(unsigned long val, kprobe_opcode_t *addr) { /* lis r3,(op)@highest */ - patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(3) | - ((val >> 48) & 0x)); + PATCH_INSN_OR_GOTO(addr, PPC_INST_ADDIS | ___PPC_RT(3) | + ((val >> 48) & 0x), + patch_err); addr++; /* ori r3,r3,(op)@higher */ - patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(3) | - ___PPC_RS(3) | ((val >> 32) & 0x)); + PATCH_INSN_OR_GOTO(addr, PPC_INST_ORI | ___PPC_RA(3) | + ___PPC_RS(3) | ((val >> 32) & 0x), + patch_err); addr++; /* rldicr r3,r3,32,31 */ - patch_instruction(addr, PPC_INST_RLDICR | ___PPC_RA(3) | - ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31)); + PATCH_INSN_OR_GOTO(addr, PPC_INST_RLDICR | ___PPC_RA(3) | + ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31), + patch_err); addr++; /* oris r3,r3,(op)@h */ - patch_instruction(addr, PPC_INST_ORIS | ___PPC_RA(3) | -
Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
Hi, On 04/28/2020 01:16 PM, Christophe Leroy wrote: Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy Why does snowpatch still report upstream failure ? This is fixed in latest powerpc/merge which is commit d9e4eabb775e1ba56d83bd4047de2413a21f87ce Merge: 54dc28ff5e0b 51184ae37e05 Author: Michael Ellerman Date: Tue Apr 28 09:01:47 2020 +1000 Automatic merge of branches 'master', 'next' and 'fixes' into merge Looks like snowpatch is still using commit 54dc28ff5e0b3585224d49a31b53e030342ca5c3 Merge: 5d23fec1ac1d b2768df24ec4 Author: Michael Ellerman Date: Sun Apr 26 09:44:18 2020 +1000 Automatic merge of branches 'master', 'next' and 'fixes' into merge Why ? Christophe --- arch/powerpc/kernel/vdso32/gettimeofday.S | 9 + arch/powerpc/kernel/vdso32/vdso32.lds.S| 1 + arch/powerpc/kernel/vdso32/vgettimeofday.c | 6 ++ 3 files changed, 16 insertions(+) diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S index fd7b01c51281..a6e29f880e0e 100644 --- a/arch/powerpc/kernel/vdso32/gettimeofday.S +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S @@ -35,6 +35,15 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime) cvdso_call __c_kernel_clock_gettime V_FUNCTION_END(__kernel_clock_gettime) +/* + * Exact prototype of clock_gettime64() + * + * int __kernel_clock_gettime64(clockid_t clock_id, struct __timespec64 *ts); + * + */ +V_FUNCTION_BEGIN(__kernel_clock_gettime64) + cvdso_call __c_kernel_clock_gettime64 +V_FUNCTION_END(__kernel_clock_gettime64) /* * Exact prototype of clock_getres() diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S b/arch/powerpc/kernel/vdso32/vdso32.lds.S index 6cf729612268..05b462143238 100644 --- a/arch/powerpc/kernel/vdso32/vdso32.lds.S +++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S @@ -144,6 +144,7 @@ VERSION #ifndef CONFIG_PPC_BOOK3S_601 __kernel_gettimeofday; __kernel_clock_gettime; + __kernel_clock_gettime64; __kernel_clock_getres; __kernel_time; __kernel_get_tbfreq; diff --git a/arch/powerpc/kernel/vdso32/vgettimeofday.c b/arch/powerpc/kernel/vdso32/vgettimeofday.c index 0b9ab4c22ef2..f7f71fecf4ed 100644 --- a/arch/powerpc/kernel/vdso32/vgettimeofday.c +++ b/arch/powerpc/kernel/vdso32/vgettimeofday.c @@ -11,6 +11,12 @@ int __c_kernel_clock_gettime(clockid_t clock, struct old_timespec32 *ts, return __cvdso_clock_gettime32_data(vd, clock, ts); } +int __c_kernel_clock_gettime64(clockid_t clock, struct __kernel_timespec *ts, + const struct vdso_data *vd) +{ + return __cvdso_clock_gettime_data(vd, clock, ts); +} + int __c_kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz, const struct vdso_data *vd) {
Re: [PATCH net] ibmvnic: Fall back to 16 H_SEND_SUB_CRQ_INDIRECT entries with old FW
On 4/27/20 12:33 PM, Juliet Kim wrote: The maximum entries for H_SEND_SUB_CRQ_INDIRECT has increased on some platforms from 16 to 128. If Live Partition Mobility is used to migrate a running OS image from a newer source platform to an older target platform, then H_SEND_SUB_CRQ_INDIRECT will fail with H_PARAMETER if 128 entries are queued. Fix this by falling back to 16 entries if H_PARAMETER is returned from the hcall(). Thanks for the submission, but I am having a hard time believing that this is what is happening since the driver does not support sending multiple frames per hypervisor call at this time. Even if it were the case, this approach would omit frame data needed by the VF, so the second attempt may still fail. Are there system logs available that show the driver is attempting to send transmissions with greater than 16 descriptors? Thanks, Tom Signed-off-by: Juliet Kim --- drivers/net/ethernet/ibm/ibmvnic.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 4bd33245bad6..b66c2f26a427 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1656,6 +1656,17 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev) lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num], (u64)tx_buff->indir_dma, (u64)num_entries); + + /* Old firmware accepts max 16 num_entries */ + if (lpar_rc == H_PARAMETER && num_entries > 16) { + tx_crq.v1.n_crq_elem = 16; + tx_buff->num_entries = 16; + lpar_rc = send_subcrq_indirect(adapter, + handle_array[queue_num], + (u64)tx_buff->indir_dma, + 16); + } + dma_unmap_single(dev, tx_buff->indir_dma, sizeof(tx_buff->indir_arr), DMA_TO_DEVICE); } else {
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
On Tue, Apr 28, 2020 at 2:05 PM Jeremy Kerr wrote:
>
> Currently, we may perform a copy_to_user (through
> simple_read_from_buffer()) while holding a context's register_lock,
> while accessing the context save area.
>
> This change uses temporary buffers for the context save area data,
> which we then pass to simple_read_from_buffer.
>
> Signed-off-by: Jeremy Kerr
> ---

Thanks for fixing this! I wonder how far it should be backported, given
that this has been broken for 14 years now.

Fixes: bf1ab978be23 ("[POWERPC] coredump: Add SPU elf notes to coredump.")

Reviewed-by: Arnd Bergmann
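The shape of the fix under discussion is roughly the following (an illustrative sketch; the function name is hypothetical, only the helpers and fields come from the patch): snapshot the save-area data into a temporary buffer while holding the register_lock, drop the lock, and only then let simple_read_from_buffer() touch userspace.

static ssize_t spufs_dma_info_dump(struct spu_context *ctx, char __user *buf,
				   size_t len, loff_t *pos)
{
	struct spu_dma_info info;

	spin_lock(&ctx->csa.register_lock);
	___spufs_dma_info_read(ctx, &info);	/* snapshot into the temp buffer */
	spin_unlock(&ctx->csa.register_lock);

	/* may fault and sleep, so it must run without the spinlock held */
	return simple_read_from_buffer(buf, len, pos, &info, sizeof(info));
}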
Re: [PATCH v8 8/8] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32
On Tue, Apr 28, 2020 at 3:16 PM Christophe Leroy wrote: > > Provides __kernel_clock_gettime64() on vdso32. This is the > 64 bits version of __kernel_clock_gettime() which is > y2038 compliant. > > Signed-off-by: Christophe Leroy Looks good to me Reviewed-by: Arnd Bergmann There was a bug on ARM for the corresponding function, so far it is unclear if this was a problem related to particular hardware, the 32-bit kernel code, or the common implementation of clock_gettime64 in the vdso library, see https://github.com/richfelker/musl-cross-make/issues/96 Just to be sure that powerpc is not affected by the same issue, can you confirm that repeatedly calling clock_gettime64 on powerpc32, alternating between vdso and syscall, results in monotically increasing times? Arnd
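A rough sketch of the kind of test being asked for, alternating the (normally vDSO-backed) libc path with a forced system call. It assumes a libc that routes clock_gettime() through the vDSO and kernel headers that provide __NR_clock_gettime64 on the 32-bit target; the struct layout mirrors __kernel_timespec.

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Assumed to match the kernel's __kernel_timespec layout */
struct kts64 {
	int64_t   tv_sec;
	long long tv_nsec;
};

static int64_t now_vdso(void)
{
	struct timespec ts;

	/* With a vDSO-aware libc this should stay in userspace */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static int64_t now_syscall(void)
{
	struct kts64 ts;

	/* Force the real system call, bypassing the vDSO */
	syscall(__NR_clock_gettime64, CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	int64_t prev = 0;

	for (int i = 0; i < 1000000; i++) {
		int64_t t = (i & 1) ? now_syscall() : now_vdso();

		if (t < prev) {
			printf("time went backwards at iteration %d\n", i);
			return 1;
		}
		prev = t;
	}
	puts("monotonic");
	return 0;
}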
Re: [RFC PATCH v2 7/7] powerpc/selftest: reuse ppc-opcode macros to avoid redundancy
Balamuruhan S wrote: Avoid redefining macros to encode ppc instructions instead reuse it from ppc-opcode.h, Makefile changes are necessary to compile memcmp_64.S with __ASSEMBLY__ defined from selftests. Signed-off-by: Balamuruhan S --- .../selftests/powerpc/stringloops/Makefile| 34 ++ .../powerpc/stringloops/asm/asm-const.h | 1 + .../powerpc/stringloops/asm/ppc-opcode.h | 36 +-- 3 files changed, 29 insertions(+), 42 deletions(-) create mode 12 tools/testing/selftests/powerpc/stringloops/asm/asm-const.h mode change 100644 => 12 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h diff --git a/tools/testing/selftests/powerpc/stringloops/Makefile b/tools/testing/selftests/powerpc/stringloops/Makefile index 7fc0623d85c3..efe76c5a5b94 100644 --- a/tools/testing/selftests/powerpc/stringloops/Makefile +++ b/tools/testing/selftests/powerpc/stringloops/Makefile @@ -1,26 +1,44 @@ # SPDX-License-Identifier: GPL-2.0 # The loops are all 64-bit code -CFLAGS += -I$(CURDIR) +GIT_VERSION = $(shell git describe --always --long --dirty || echo "unknown") +CFLAGS += -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CURDIR) -I$(CURDIR)/../include I'm not sure why there are some extra flags added here. Not sure if it is required for this change, or if they are separate changes. Trying to build this from 'tools/testing/selftests/powerpc/' with 'make stringloops' shows slightly different compile flags being used. - Naveen
Re: [RFC PATCH v2 0/7] consolidate PowerPC instruction encoding macros
Balamuruhan S wrote: ppc-opcode.h have base instruction encoding wrapped with stringify_in_c() for raw encoding to have compatibility. But there are redundant macros for base instruction encodings in bpf, instruction emulation test infrastructure and powerpc selftests. Currently PPC_INST_* macros are used for encoding instruction opcode and PPC_* for raw instuction encoding, this rfc patchset introduces PPC_RAW_* macros for base instruction encoding and reuse it from elsewhere. With this change we can avoid redundant macro definitions in multiple files and start adding new instructions in ppc-opcode.h in future. Changes in v2: - Fix review comments/suggestions from Naveen and Michael Ellerman, * Rename PPC_ENCODE_* to PPC_RAW_* for base instruction encoding macros. * Split the patches that does mass renaming and make them simpler that just adds new macros. * Keep the patch to update all the existing names later (patch 6). * Lot of PPC_INST_* macros are used only in ppc-opcode.h for PPC_* macros, fold PPC_INST_* encoding into PPC_RAW_* to avoid using them accidentally. * Fixed clipped macros that was due to a typo/copy-paste * Consolidated all the instruction encoding macros from bpf_jit.h to ppc-opcode.h * squashed patch that removes the duplicate macro PPC_MR() in bpf_jit.h * merge few changes in bpf_jit files from patch 2 into patch 3 * few fixes in powerpc selftest stringloops Makefile * build tested for ppc64le_defconfig, ppc64e_defconfig and pmac32_defconfig * Rebased on next branch of linuxppc tree Testing: --- * Tested it by compiling vmlinux and comparing objdump of it with and without the patchset and observed that it remains same, # diff vmlinux_objdump vmlinux_rfc_objdump 2c2 < vmlinux: file format elf64-powerpcle --- > vmlinux_rfc: file format elf64-powerpcle * Tested building it with this changes for Fedora30 config, booted VM with powerpc next and powerpc next + patchset to run powerpc selftest and ftrace selftest. There were couple of failures that were common and patchset did not introduce any new failures. ftrace selftest: --- # # of passed: 96 # # of failed: 1 # # of unresolved: 7 # # of untested: 0 # # of unsupported: 1 # # of xfailed: 1 # # of undefined(test bug): 0 not ok 1 selftests: ftrace: ftracetest # exit=1 powerpc selftest: not ok 7 selftests: powerpc/dscr: dscr_sysfs_thread_test # exit=1 not ok 20 selftests: powerpc/pmu/ebb: lost_exception_test # TIMEOUT not ok 2 selftests: powerpc/security: spectre_v2 # exit=1 Thanks to Naveen, Sandipan and Michael on overall suggestions/improvements. I would request for review and suggestions to make it better. 
v1: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206494.html Balamuruhan S (7): powerpc/ppc-opcode: introduce PPC_RAW_* macros for base instruction encoding powerpc/ppc-opcode: move ppc instruction encoding from test_emulate_step powerpc/bpf_jit: reuse instruction macros from ppc-opcode.h powerpc/ppc-opcode: consolidate powerpc instructions from bpf_jit.h powerpc/ppc-opcode: reuse raw instruction macros to stringify powerpc/ppc-opcode: fold PPC_INST_* macros into PPC_RAW_* macros powerpc/selftest: reuse ppc-opcode macros to avoid redundancy arch/powerpc/include/asm/ppc-opcode.h | 706 +++--- arch/powerpc/lib/test_emulate_step.c | 155 ++-- arch/powerpc/net/bpf_jit.h| 184 + arch/powerpc/net/bpf_jit32.h | 34 +- arch/powerpc/net/bpf_jit64.h | 16 +- arch/powerpc/net/bpf_jit_comp.c | 134 ++-- arch/powerpc/net/bpf_jit_comp64.c | 298 .../selftests/powerpc/stringloops/Makefile| 34 +- .../powerpc/stringloops/asm/asm-const.h | 1 + .../powerpc/stringloops/asm/ppc-opcode.h | 36 +- 10 files changed, 762 insertions(+), 836 deletions(-) create mode 12 tools/testing/selftests/powerpc/stringloops/asm/asm-const.h mode change 100644 => 12 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h This LGTM. Except the last patch: Acked-by: Naveen N. Rao Tested-by: Naveen N. Rao Thanks! - Naveen
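As a concrete illustration of the naming split the cover letter describes (simplified here; the real macro bodies in ppc-opcode.h carry more detail), the raw numeric encoding becomes usable from C while the stringified wrapper stays available for asm:

/* Raw 32-bit encoding, usable from C (code patching, BPF JIT, emulation tests) */
#define PPC_RAW_ADDI(d, a, i)	(0x38000000 | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))

/* Stringified form for inline assembly and .S files */
#define PPC_ADDI(d, a, i)	stringify_in_c(.long PPC_RAW_ADDI(d, a, i))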
Re: [RFC PATCH v2 4/7] powerpc/ppc-opcode: consolidate powerpc instructions from bpf_jit.h
Balamuruhan S wrote: move macro definitions of powerpc instructions from bpf_jit.h to ppc-opcode.h and adopt the users of the macros accordingly. `PPC_MR()` is defined twice in bpf_jit.h, remove the duplicate one. Signed-off-by: Balamuruhan S --- arch/powerpc/include/asm/ppc-opcode.h | 139 + arch/powerpc/net/bpf_jit.h| 166 ++- arch/powerpc/net/bpf_jit32.h | 24 +-- arch/powerpc/net/bpf_jit64.h | 12 +- arch/powerpc/net/bpf_jit_comp.c | 132 ++-- arch/powerpc/net/bpf_jit_comp64.c | 278 +- 6 files changed, 378 insertions(+), 373 deletions(-) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 2ae0afc5c2bb..6b9a891884be 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -79,6 +79,16 @@ #define IMM_L(i) ((uintptr_t)(i) & 0x) #define IMM_DS(i) ((uintptr_t)(i) & 0xfffc) +/* + * 16-bit immediate helper macros: HA() is for use with sign-extending instrs + * (e.g. LD, ADDI). If the bottom 16 bits is "-ve", add another bit into the + * top half to negate the effect (i.e. 0x + 1 = 0x(1)). + */ +#define IMM_H(i)((uintptr_t)(i)>>16) +#define IMM_HA(i) (((uintptr_t)(i)>>16) + \ + (((uintptr_t)(i) & 0x8000) >> 15)) + Not needed for this patch series, but at some point, we should move over to using the PPC_LO(), PPC_HI() and PPC_HA() macros that are defined later in this file. - Naveen
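The +1 compensation in IMM_HA() can be sanity-checked in userspace; a small stand-alone illustration (plain C, not kernel code) of the identity the comment relies on:

#include <stdint.h>
#include <assert.h>

#define IMM_L(i)  ((uintptr_t)(i) & 0xffff)
#define IMM_H(i)  ((uintptr_t)(i) >> 16)
#define IMM_HA(i) (((uintptr_t)(i) >> 16) + (((uintptr_t)(i) & 0x8000) >> 15))

int main(void)
{
	uint32_t addr = 0x12349abc;	/* low half has its sign bit set */

	/* lis rX,IMM_HA(addr); addi rX,rX,IMM_L(addr) reconstructs addr:
	 * addi sign-extends its 16-bit operand, so the extra +1 in HA()
	 * cancels the 0xffff0000 that the sign extension adds. */
	assert((uint32_t)((IMM_HA(addr) << 16) + (int16_t)IMM_L(addr)) == addr);
	assert((uint32_t)((IMM_H(addr) << 16) | IMM_L(addr)) == addr);
	return 0;
}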
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
FYI, these little hunks reduce the difference to my version, maybe you
can fold them in?

diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index c62d77ddaf7d3..1861436a6091d 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -2107,7 +2107,6 @@ static const struct file_operations spufs_wbox_info_fops = {
 static void ___spufs_dma_info_read(struct spu_context *ctx,
 		struct spu_dma_info *info)
 {
-	struct mfc_cq_sr *qp, *spuqp;
 	int i;
 
 	info->dma_info_type = ctx->csa.priv2.spu_tag_status_query_RW;
@@ -2116,8 +2115,8 @@ static void ___spufs_dma_info_read(struct spu_context *ctx,
 	info->dma_info_stall_and_notify = ctx->csa.spu_chnldata_RW[25];
 	info->dma_info_atomic_command_status = ctx->csa.spu_chnldata_RW[27];
 	for (i = 0; i < 16; i++) {
-		qp = &info->dma_info_command_data[i];
-		spuqp = &ctx->csa.priv2.spuq[i];
+		struct mfc_cq_sr *qp = &info->dma_info_command_data[i];
+		struct mfc_cq_sr *spuqp = &ctx->csa.priv2.spuq[i];
 
 		qp->mfc_cq_data0_RW = spuqp->mfc_cq_data0_RW;
 		qp->mfc_cq_data1_RW = spuqp->mfc_cq_data1_RW;
@@ -2166,7 +2165,7 @@ static const struct file_operations spufs_dma_info_fops = {
 };
 
 static void ___spufs_proxydma_info_read(struct spu_context *ctx,
-			struct spu_proxydma_info *info)
+					struct spu_proxydma_info *info)
 {
 	int i;
Re: [PATCH v3 00/29] Convert files to ReST - part 2
On Mon, 27 Apr 2020 23:16:52 +0200 Mauro Carvalho Chehab wrote: > This is the second part of a series I wrote sometime ago where I manually > convert lots of files to be properly parsed by Sphinx as ReST files. > > As it touches on lot of stuff, this series is based on today's linux-next, > at tag next-20190617. > > The first version of this series had 57 patches. The first part with 28 > patches > were already merged. Right now, there are still ~76 patches pending applying > (including this series), and that's because I opted to do ~1 patch per > converted > directory. > > That sounds too much to be send on a single round. So, I'm opting to split > it on 3 parts for the conversion, plus a final patch adding orphaned books > to existing ones. > > Those patches should probably be good to be merged either by subsystem > maintainers or via the docs tree. So I'm happy to merge this set, but there is one thing that worries me a bit... > fs/cachefiles/Kconfig |4 +- > fs/coda/Kconfig |2 +- > fs/configfs/inode.c |2 +- > fs/configfs/item.c|2 +- > fs/fscache/Kconfig|8 +- > fs/fscache/cache.c|8 +- > fs/fscache/cookie.c |2 +- > fs/fscache/object.c |4 +- > fs/fscache/operation.c|2 +- > fs/locks.c|2 +- > include/linux/configfs.h |2 +- > include/linux/fs_context.h|2 +- > include/linux/fscache-cache.h |4 +- > include/linux/fscache.h | 42 +- > include/linux/lsm_hooks.h |2 +- I'd feel a bit better if I could get an ack or two from filesystem folks before I venture that far out of my own yard...what say you all? Thanks, jon
[PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
I think I found a way to improve the x32 handling: This is a simplification over Christoph's "[PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the x32 specifics in the common code to a single #ifdef/#endif check, in order to keep it more readable for everyone else. Christoph, if you like it, please fold into your patch. Cc: Andrew Morton Cc: Alexander Viro Cc: Jeremy Kerr Cc: Arnd Bergmann Cc: "Eric W . Biederman" Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-fsde...@vger.kernel.org Signed-off-by: Arnd Bergmann --- arch/x86/include/asm/compat.h | 10 ++ arch/x86/kernel/signal.c | 23 +++ kernel/signal.c | 15 ++- 3 files changed, 35 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h index 52e9f3480f69..9341ea3da757 100644 --- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -214,7 +214,17 @@ static inline bool in_compat_syscall(void) #endif struct compat_siginfo; +#ifdef CONFIG_X86_X32_ABI int __copy_siginfo_to_user32(struct compat_siginfo __user *to, const kernel_siginfo_t *from, bool x32_ABI); +#else +int copy_siginfo_to_user32(struct compat_siginfo __user *to, + const kernel_siginfo_t *from); +static inline int __copy_siginfo_to_user32(struct compat_siginfo __user *to, + const kernel_siginfo_t *from, bool x32_ABI) +{ + return copy_siginfo_to_user32(to, from); +} +#endif #endif /* _ASM_X86_COMPAT_H */ diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 83b74fb38c8f..0e166571d28b 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -511,6 +511,29 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, } #endif /* CONFIG_X86_32 */ +#ifdef CONFIG_X86_X32_ABI +int copy_siginfo_to_user32(struct compat_siginfo __user *to, + const struct kernel_siginfo *from) +{ + return __copy_siginfo_to_user32(to, from, in_x32_syscall()); +} + +int __copy_siginfo_to_user32(struct compat_siginfo __user *to, +const struct kernel_siginfo *from, bool x32_ABI) +{ + struct compat_siginfo new; + + copy_siginfo_to_external32(&new, from); + if (x32_ABI && from->si_signo == SIGCHLD) { + new._sifields._sigchld_x32._utime = from->si_utime; + new._sifields._sigchld_x32._stime = from->si_stime; + } + if (copy_to_user(to, &new, sizeof(struct compat_siginfo))) + return -EFAULT; + return 0; +} +#endif + static int x32_setup_rt_frame(struct ksignal *ksig, compat_sigset_t *set, struct pt_regs *regs) diff --git a/kernel/signal.c b/kernel/signal.c index 1a81602050b4..935facca4860 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3318,29 +3318,18 @@ void copy_siginfo_to_external32(struct compat_siginfo *to, } } +#ifndef CONFIG_X86_X32_ABI int copy_siginfo_to_user32(struct compat_siginfo __user *to, const struct kernel_siginfo *from) -#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION) -{ - return __copy_siginfo_to_user32(to, from, in_x32_syscall()); -} -int __copy_siginfo_to_user32(struct compat_siginfo __user *to, -const struct kernel_siginfo *from, bool x32_ABI) -#endif { struct compat_siginfo new; copy_siginfo_to_external32(&new, from); -#ifdef CONFIG_X86_X32_ABI - if (x32_ABI && from->si_signo == SIGCHLD) { - new._sifields._sigchld_x32._utime = from->si_utime; - new._sifields._sigchld_x32._stime = from->si_stime; - } -#endif if (copy_to_user(to, &new, sizeof(struct compat_siginfo))) return -EFAULT; return 0; } +#endif static int post_copy_siginfo_from_user32(kernel_siginfo_t *to, const struct compat_siginfo *from) -- 2.26.0
[PATCH v4 1/4] hugetlbfs: add arch_hugetlb_valid_size
The architecture independent routine hugetlb_default_setup sets up the default huge pages size. It has no way to verify if the passed value is valid, so it accepts it and attempts to validate at a later time. This requires undocumented cooperation between the arch specific and arch independent code. For architectures that support more than one huge page size, provide a routine arch_hugetlb_valid_size to validate a huge page size. hugetlb_default_setup can use this to validate passed values. arch_hugetlb_valid_size will also be used in a subsequent patch to move processing of the "hugepagesz=" in arch specific code to a common routine in arch independent code. Signed-off-by: Mike Kravetz Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon --- arch/arm64/mm/hugetlbpage.c | 17 + arch/powerpc/mm/hugetlbpage.c | 20 +--- arch/riscv/mm/hugetlbpage.c | 26 +- arch/s390/mm/hugetlbpage.c| 16 arch/sparc/mm/init_64.c | 24 arch/x86/mm/hugetlbpage.c | 17 + include/linux/hugetlb.h | 1 + mm/hugetlb.c | 21 ++--- 8 files changed, 103 insertions(+), 39 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index bbeb6a5a6ba6..069b96ee2aec 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -462,17 +462,26 @@ static int __init hugetlbpage_init(void) } arch_initcall(hugetlbpage_init); -static __init int setup_hugepagesz(char *opt) +bool __init arch_hugetlb_valid_size(unsigned long size) { - unsigned long ps = memparse(opt, &opt); - - switch (ps) { + switch (size) { #ifdef CONFIG_ARM64_4K_PAGES case PUD_SIZE: #endif case CONT_PMD_SIZE: case PMD_SIZE: case CONT_PTE_SIZE: + return true; + } + + return false; +} + +static __init int setup_hugepagesz(char *opt) +{ + unsigned long ps = memparse(opt, &opt); + + if (arch_hugetlb_valid_size(ps)) { add_huge_page_size(ps); return 1; } diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 33b3461d91e8..de54d2a37830 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -558,7 +558,7 @@ unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) return vma_kernel_pagesize(vma); } -static int __init add_huge_page_size(unsigned long long size) +bool __init arch_hugetlb_valid_size(unsigned long size) { int shift = __ffs(size); int mmu_psize; @@ -566,20 +566,26 @@ static int __init add_huge_page_size(unsigned long long size) /* Check that it is a page size supported by the hardware and * that it fits within pagetable and slice limits. 
*/ if (size <= PAGE_SIZE || !is_power_of_2(size)) - return -EINVAL; + return false; mmu_psize = check_and_get_huge_psize(shift); if (mmu_psize < 0) - return -EINVAL; + return false; BUG_ON(mmu_psize_defs[mmu_psize].shift != shift); - /* Return if huge page size has already been setup */ - if (size_to_hstate(size)) - return 0; + return true; +} - hugetlb_add_hstate(shift - PAGE_SHIFT); +static int __init add_huge_page_size(unsigned long long size) +{ + int shift = __ffs(size); + + if (!arch_hugetlb_valid_size((unsigned long)size)) + return -EINVAL; + if (!size_to_hstate(size)) + hugetlb_add_hstate(shift - PAGE_SHIFT); return 0; } diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index a6189ed36c5f..da1f516bc451 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -12,21 +12,29 @@ int pmd_huge(pmd_t pmd) return pmd_leaf(pmd); } +bool __init arch_hugetlb_valid_size(unsigned long size) +{ + if (size == HPAGE_SIZE) + return true; + else if (IS_ENABLED(CONFIG_64BIT) && size == PUD_SIZE) + return true; + else + return false; +} + static __init int setup_hugepagesz(char *opt) { unsigned long ps = memparse(opt, &opt); - if (ps == HPAGE_SIZE) { - hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT); - } else if (IS_ENABLED(CONFIG_64BIT) && ps == PUD_SIZE) { - hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); - } else { - hugetlb_bad_size(); - pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20); - return 0; + if (arch_hugetlb_valid_size(ps)) { + hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT); + return 1; } - return 1; + hugetlb_bad_size(); + pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20); + return 0; + } __setup("hugepagesz
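The mm/hugetlb.c side of this diff is cut off above. Per the cover letter, the series uses the weak-attribute method for the generic definition, so the arch-independent fallback presumably looks roughly like the sketch below (not a quote of the patch):

/*
 * Generic fallback: architectures with a single supported huge page size
 * get this; others override it with their own arch_hugetlb_valid_size().
 */
bool __init __weak arch_hugetlb_valid_size(unsigned long size)
{
	return size == HPAGE_SIZE;
}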
[PATCH v4 0/4] Clean up hugetlb boot command line processing
v4 - Fixed huge page order definitions for arm64 (Qian Cai) Removed hugepages_supported() checks in command line processing as powerpc does not set hugepages_supported until later in boot (Sandipan) Added Acks, Reviews and Tested (Will, Gerald, Anders, Sandipan) v3 - Used weak attribute method of defining arch_hugetlb_valid_size. This eliminates changes to arch specific hugetlb.h files (Peter) Updated documentation (Peter, Randy) Fixed handling of implicitly specified gigantic page preallocation in existing code and removed documentation of such. There is now no difference between handling of gigantic and non-gigantic pages. (Peter, Nitesh). This requires the most review as there is a small change to undocumented behavior. See patch 4 commit message for details. Added Acks and Reviews (Mina, Peter) v2 - Fix build errors with patch 1 (Will) Change arch_hugetlb_valid_size arg to unsigned long and remove irrelevant 'extern' keyword (Christophe) Documentation and other misc changes (Randy, Christophe, Mina) Do not process command line options if !hugepages_supported() (Dave, but it sounds like we may want to additional changes to hugepages_supported() for x86? If that is needed I would prefer a separate patch.) Longpeng(Mike) reported a weird message from hugetlb command line processing and proposed a solution [1]. While the proposed patch does address the specific issue, there are other related issues in command line processing. As hugetlbfs evolved, updates to command line processing have been made to meet immediate needs and not necessarily in a coordinated manner. The result is that some processing is done in arch specific code, some is done in arch independent code and coordination is problematic. Semantics can vary between architectures. The patch series does the following: - Define arch specific arch_hugetlb_valid_size routine used to validate passed huge page sizes. - Move hugepagesz= command line parsing out of arch specific code and into an arch independent routine. - Clean up command line processing to follow desired semantics and document those semantics. [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpe...@huawei.com Mike Kravetz (4): hugetlbfs: add arch_hugetlb_valid_size hugetlbfs: move hugepagesz= parsing to arch independent code hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate hugetlbfs: clean up command line processing .../admin-guide/kernel-parameters.txt | 40 ++-- Documentation/admin-guide/mm/hugetlbpage.rst | 35 arch/arm64/mm/hugetlbpage.c | 30 +-- arch/powerpc/mm/hugetlbpage.c | 30 +-- arch/riscv/mm/hugetlbpage.c | 24 +-- arch/s390/mm/hugetlbpage.c| 24 +-- arch/sparc/mm/init_64.c | 43 + arch/x86/mm/hugetlbpage.c | 23 +-- include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 180 ++ 10 files changed, 260 insertions(+), 171 deletions(-) -- 2.25.4
[PATCH v4 4/4] hugetlbfs: clean up command line processing
With all hugetlb page processing done in a single file clean up code. - Make code match desired semantics - Update documentation with semantics - Make all warnings and errors messages start with 'HugeTLB:'. - Consistently name command line parsing routines. - Warn if !hugepages_supported() and command line parameters have been specified. - Add comments to code - Describe some of the subtle interactions - Describe semantics of command line arguments This patch also fixes issues with implicitly setting the number of gigantic huge pages to preallocate. Previously on X86 command line, hugepages=2 default_hugepagesz=1G would result in zero 1G pages being preallocated and, # grep HugePages_Total /proc/meminfo HugePages_Total: 0 # sysctl -a | grep nr_hugepages vm.nr_hugepages = 2 vm.nr_hugepages_mempolicy = 2 # cat /proc/sys/vm/nr_hugepages 2 After this patch 2 gigantic pages will be preallocated and all the proc, sysfs, sysctl and meminfo files will accurately reflect this. To address the issue with gigantic pages, a small change in behavior was made to command line processing. Previously the command line, hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256 would result in the allocation of 256 2M huge pages. The value 128 would be ignored without any warning. After this patch, 128 2M pages will be allocated and a warning message will be displayed indicating the value of 256 is ignored. This change in behavior is required because allocation of implicitly specified gigantic pages must be done when the default_hugepagesz= is encountered for gigantic pages. Previously the code waited until later in the boot process (hugetlb_init), to allocate pages of default size. However the bootmem allocator required for gigantic allocations is not available at this time. Signed-off-by: Mike Kravetz Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon Tested-by: Sandipan Das --- .../admin-guide/kernel-parameters.txt | 40 +++-- Documentation/admin-guide/mm/hugetlbpage.rst | 35 mm/hugetlb.c | 149 ++ 3 files changed, 179 insertions(+), 45 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7bc83f3d9bdf..cbe657b86d0e 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -834,12 +834,15 @@ See also Documentation/networking/decnet.txt. default_hugepagesz= - [same as hugepagesz=] The size of the default - HugeTLB page size. This is the size represented by - the legacy /proc/ hugepages APIs, used for SHM, and - default size when mounting hugetlbfs filesystems. - Defaults to the default architecture's huge page size - if not specified. + [HW] The size of the default HugeTLB page. This is + the size represented by the legacy /proc/ hugepages + APIs. In addition, this is the default hugetlb size + used for shmget(), mmap() and mounting hugetlbfs + filesystems. If not specified, defaults to the + architecture's default huge page size. Huge page + sizes are architecture dependent. See also + Documentation/admin-guide/mm/hugetlbpage.rst. + Format: size[KMG] deferred_probe_timeout= [KNL] Debugging option to set a timeout in seconds for @@ -1479,13 +1482,24 @@ hugepages using the cma allocator. If enabled, the boot-time allocation of gigantic hugepages is skipped. - hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot. - hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages. 
- On x86-64 and powerpc, this option can be specified - multiple times interleaved with hugepages= to reserve - huge pages of different sizes. Valid pages sizes on - x86-64 are 2M (when the CPU supports "pse") and 1G - (when the CPU supports the "pdpe1gb" cpuinfo flag). + hugepages= [HW] Number of HugeTLB pages to allocate at boot. + If this follows hugepagesz (below), it specifies + the number of pages of hugepagesz to be allocated. + If this is the first HugeTLB parameter on the command + line, it specifies the number of pages to allocate for + the default huge page size. See also + Documentation/admin-guide/mm/hugetlb
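The mm/hugetlb.c side of this change is not shown above. As a sketch of the described default_hugepagesz= semantics (helper names follow mm/hugetlb.c where they exist, but the body is an approximation rather than the literal hunk):

static int __init default_hugepagesz_setup(char *s)
{
	unsigned long size = (unsigned long)memparse(s, NULL);
	struct hstate *h;

	if (!arch_hugetlb_valid_size(size)) {
		pr_err("HugeTLB: unsupported default_hugepagesz=%s\n", s);
		return 0;
	}

	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
	default_hstate_idx = hstate_index(size_to_hstate(size));

	/*
	 * If "hugepages=N" appeared on the command line before any size was
	 * specified, those N pages were implicitly requested for the default
	 * size.  Gigantic pages must be allocated here, while the bootmem
	 * allocator is still available, rather than later in hugetlb_init().
	 */
	if (default_hstate_max_huge_pages) {
		h = &default_hstate;
		h->max_huge_pages = default_hstate_max_huge_pages;
		if (hstate_is_gigantic(h))
			hugetlb_hstate_alloc_pages(h);
	}

	return 1;
}
__setup("default_hugepagesz=", default_hugepagesz_setup);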
[PATCH v4 3/4] hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate
The routine hugetlb_add_hstate prints a warning if the hstate already exists. This was originally done as part of kernel command line parsing. If 'hugepagesz=' was specified more than once, the warning pr_warn("hugepagesz= specified twice, ignoring\n"); would be printed. Some architectures want to enable all huge page sizes. They would call hugetlb_add_hstate for all supported sizes. However, this was done after command line processing and as a result hstates could have already been created for some sizes. To make sure no warning were printed, there would often be code like: if (!size_to_hstate(size) hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT) The only time we want to print the warning is as the result of command line processing. So, remove the warning from hugetlb_add_hstate and add it to the single arch independent routine processing "hugepagesz=". After this, calls to size_to_hstate() in arch specific code can be removed and hugetlb_add_hstate can be called without worrying about warning messages. Signed-off-by: Mike Kravetz Acked-by: Mina Almasry Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon Tested-by: Anders Roxell --- arch/arm64/mm/hugetlbpage.c | 16 arch/powerpc/mm/hugetlbpage.c | 3 +-- arch/riscv/mm/hugetlbpage.c | 2 +- arch/sparc/mm/init_64.c | 19 --- arch/x86/mm/hugetlbpage.c | 2 +- mm/hugetlb.c | 9 ++--- 6 files changed, 17 insertions(+), 34 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index f706b821aba6..14bed8f4674a 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -441,22 +441,14 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma, clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig); } -static void __init add_huge_page_size(unsigned long size) -{ - if (size_to_hstate(size)) - return; - - hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT); -} - static int __init hugetlbpage_init(void) { #ifdef CONFIG_ARM64_4K_PAGES - add_huge_page_size(PUD_SIZE); + hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); #endif - add_huge_page_size(CONT_PMD_SIZE); - add_huge_page_size(PMD_SIZE); - add_huge_page_size(CONT_PTE_SIZE); + hugetlb_add_hstate((CONT_PMD_SHIFT + PMD_SHIFT) - PAGE_SHIFT); + hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT); + hugetlb_add_hstate((CONT_PTE_SHIFT + PAGE_SHIFT) - PAGE_SHIFT); return 0; } diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 2c3fa0a7787b..4d5ed1093615 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -584,8 +584,7 @@ static int __init add_huge_page_size(unsigned long long size) if (!arch_hugetlb_valid_size((unsigned long)size)) return -EINVAL; - if (!size_to_hstate(size)) - hugetlb_add_hstate(shift - PAGE_SHIFT); + hugetlb_add_hstate(shift - PAGE_SHIFT); return 0; } diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 4e5d7e9f0eef..932dadfdca54 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -26,7 +26,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size) static __init int gigantic_pages_init(void) { /* With CONTIG_ALLOC, we can allocate gigantic pages at runtime */ - if (IS_ENABLED(CONFIG_64BIT) && !size_to_hstate(1UL << PUD_SHIFT)) + if (IS_ENABLED(CONFIG_64BIT)) hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT); return 0; } diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 4618f96fd30f..ae819a16d07a 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -325,23 +325,12 @@ static void __update_mmu_tsb_insert(struct mm_struct *mm, unsigned 
long tsb_inde } #ifdef CONFIG_HUGETLB_PAGE -static void __init add_huge_page_size(unsigned long size) -{ - unsigned int order; - - if (size_to_hstate(size)) - return; - - order = ilog2(size) - PAGE_SHIFT; - hugetlb_add_hstate(order); -} - static int __init hugetlbpage_init(void) { - add_huge_page_size(1UL << HPAGE_64K_SHIFT); - add_huge_page_size(1UL << HPAGE_SHIFT); - add_huge_page_size(1UL << HPAGE_256MB_SHIFT); - add_huge_page_size(1UL << HPAGE_2GB_SHIFT); + hugetlb_add_hstate(HPAGE_64K_SHIFT - PAGE_SHIFT); + hugetlb_add_hstate(HPAGE_SHIFT - PAGE_SHIFT); + hugetlb_add_hstate(HPAGE_256MB_SHIFT - PAGE_SHIFT); + hugetlb_add_hstate(HPAGE_2GB_SHIFT - PAGE_SHIFT); return 0; } diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c index 937d640a89e3..cf5781142716 100644 --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -195,7 +195,7 @@ bool __init arch_hugetlb_valid_size(unsigned long size) static __init int gigantic_pages_init(void) { /* With compaction
[PATCH v4 2/4] hugetlbfs: move hugepagesz= parsing to arch independent code
Now that architectures provide arch_hugetlb_valid_size(), parsing of "hugepagesz=" can be done in architecture independent code. Create a single routine to handle hugepagesz= parsing and remove all arch specific routines. We can also remove the interface hugetlb_bad_size() as this is no longer used outside arch independent code. This also provides consistent behavior of hugetlbfs command line options. The hugepagesz= option should only be specified once for a specific size, but some architectures allow multiple instances. This appears to be more of an oversight when code was added by some architectures to set up ALL huge pages sizes. Signed-off-by: Mike Kravetz Acked-by: Mina Almasry Reviewed-by: Peter Xu Acked-by: Gerald Schaefer [s390] Acked-by: Will Deacon --- arch/arm64/mm/hugetlbpage.c | 15 --- arch/powerpc/mm/hugetlbpage.c | 15 --- arch/riscv/mm/hugetlbpage.c | 16 arch/s390/mm/hugetlbpage.c| 18 -- arch/sparc/mm/init_64.c | 22 -- arch/x86/mm/hugetlbpage.c | 16 include/linux/hugetlb.h | 1 - mm/hugetlb.c | 23 +-- 8 files changed, 17 insertions(+), 109 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 069b96ee2aec..f706b821aba6 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -476,18 +476,3 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return false; } - -static __init int setup_hugepagesz(char *opt) -{ - unsigned long ps = memparse(opt, &opt); - - if (arch_hugetlb_valid_size(ps)) { - add_huge_page_size(ps); - return 1; - } - - hugetlb_bad_size(); - pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10); - return 0; -} -__setup("hugepagesz=", setup_hugepagesz); diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index de54d2a37830..2c3fa0a7787b 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -589,21 +589,6 @@ static int __init add_huge_page_size(unsigned long long size) return 0; } -static int __init hugepage_setup_sz(char *str) -{ - unsigned long long size; - - size = memparse(str, &str); - - if (add_huge_page_size(size) != 0) { - hugetlb_bad_size(); - pr_err("Invalid huge page size specified(%llu)\n", size); - } - - return 1; -} -__setup("hugepagesz=", hugepage_setup_sz); - static int __init hugetlbpage_init(void) { bool configured = false; diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index da1f516bc451..4e5d7e9f0eef 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -22,22 +22,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return false; } -static __init int setup_hugepagesz(char *opt) -{ - unsigned long ps = memparse(opt, &opt); - - if (arch_hugetlb_valid_size(ps)) { - hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT); - return 1; - } - - hugetlb_bad_size(); - pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20); - return 0; - -} -__setup("hugepagesz=", setup_hugepagesz); - #ifdef CONFIG_CONTIG_ALLOC static __init int gigantic_pages_init(void) { diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index ac25b207624c..242dfc0d462d 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -261,24 +261,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return false; } -static __init int setup_hugepagesz(char *opt) -{ - unsigned long size; - char *string = opt; - - size = memparse(opt, &opt); - if (arch_hugetlb_valid_size(size)) { - hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT); - } else { - hugetlb_bad_size(); - pr_err("hugepagesz= 
specifies an unsupported page size %s\n", - string); - return 0; - } - return 1; -} -__setup("hugepagesz=", setup_hugepagesz); - static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 2bfe8e22b706..4618f96fd30f 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -397,28 +397,6 @@ bool __init arch_hugetlb_valid_size(unsigned long size) return true; } - -static int __init setup_hugepagesz(char *string) -{ - unsigned long long hugepage_size; - int rc = 0; - - hugepage_size = memparse(string, &string); - - if (!arch_hugetlb_valid_size((unsigned long)hugepage_size)) { - hugetlb_bad_size(); - pr_err("hugepa
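The single arch-independent parser that replaces the per-arch __setup() handlers removed above lives in mm/hugetlb.c and is not visible in this excerpt; it combines the pattern those handlers shared, roughly (a sketch, not the literal hunk):

static int __init hugepagesz_setup(char *s)
{
	unsigned long size = (unsigned long)memparse(s, NULL);

	if (!arch_hugetlb_valid_size(size)) {
		pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
		return 0;
	}

	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
	return 1;
}
__setup("hugepagesz=", hugepagesz_setup);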
Re: [PATCH net] ibmvnic: Fall back to 16 H_SEND_SUB_CRQ_INDIRECT entries with old FW
On 4/28/20 10:35 AM, Thomas Falcon wrote: > On 4/27/20 12:33 PM, Juliet Kim wrote: >> The maximum entries for H_SEND_SUB_CRQ_INDIRECT has increased on >> some platforms from 16 to 128. If Live Partition Mobility is used >> to migrate a running OS image from a newer source platform to an >> older target platform, then H_SEND_SUB_CRQ_INDIRECT will fail with >> H_PARAMETER if 128 entries are queued. >> >> Fix this by falling back to 16 entries if H_PARAMETER is returned >> from the hcall(). > > Thanks for the submission, but I am having a hard time believing that this is > what is happening since the driver does not support sending multiple frames > per hypervisor call at this time. Even if it were the case, this approach > would omit frame data needed by the VF, so the second attempt may still fail. > Are there system logs available that show the driver is attempting to send > transmissions with greater than 16 descriptors? > > Thanks, > > Tom > > I am trying to confirm. Juliet >> >> Signed-off-by: Juliet Kim >> --- >> drivers/net/ethernet/ibm/ibmvnic.c | 11 +++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c >> b/drivers/net/ethernet/ibm/ibmvnic.c >> index 4bd33245bad6..b66c2f26a427 100644 >> --- a/drivers/net/ethernet/ibm/ibmvnic.c >> +++ b/drivers/net/ethernet/ibm/ibmvnic.c >> @@ -1656,6 +1656,17 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, >> struct net_device *netdev) >> lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num], >> (u64)tx_buff->indir_dma, >> (u64)num_entries); >> + >> + /* Old firmware accepts max 16 num_entries */ >> + if (lpar_rc == H_PARAMETER && num_entries > 16) { >> + tx_crq.v1.n_crq_elem = 16; >> + tx_buff->num_entries = 16; >> + lpar_rc = send_subcrq_indirect(adapter, >> + handle_array[queue_num], >> + (u64)tx_buff->indir_dma, >> + 16); >> + } >> + >> dma_unmap_single(dev, tx_buff->indir_dma, >> sizeof(tx_buff->indir_arr), DMA_TO_DEVICE); >> } else {
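One way to gather the evidence Tom asks for, assuming a debug build is acceptable, would be a temporary trace in ibmvnic_xmit() just before the hcall; this is purely illustrative and not part of the submitted patch:

	/* Temporary debug aid: record how many indirect descriptors are queued. */
	if (num_entries > 16)
		netdev_dbg(netdev, "queuing %d indirect sub-CRQ descriptors\n",
			   num_entries);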
Re: [PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels
Hi! On Tue, Apr 28, 2020 at 09:25:17PM +1000, Nicholas Piggin wrote: > +config BUILD_BIG_ENDIAN_ELF_V2 > + bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)" > + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT > + default n > + select BUILD_ELF_V2 > + help > + This builds the kernel image using the ELFv2 ABI, which has a > + reduced stack overhead and faster function calls. This does not > + affect the userspace ABIs. > + > + ELFv2 is the standard ABI for little-endian, but for big-endian > + this is an experimental option that is less tested (kernel and > + toolchain). This requires gcc 4.9 or newer and binutils 2.24 or > + newer. Is it clear that this is only for 64-bit? Maybe this text should fit that in somewhere? It's not obvious to people who do not already know that ELFv2 is just the (nick-)name of a particular ABI, not a new kind of ELF (it is just version 1 ELF in fact), and that ABI is for 64-bit Power only. Segher
Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
On Tue, 28 Apr 2020 15:08:36 +0530 Srikar Dronamraju wrote: > Currently Linux kernel with CONFIG_NUMA on a system with multiple > possible nodes, marks node 0 as online at boot. However in practice, > there are systems which have node 0 as memoryless and cpuless. > > This can cause numa_balancing to be enabled on systems with only one node > with memory and CPUs. The existence of this dummy node which is cpuless and > memoryless node can confuse users/scripts looking at output of lscpu / > numactl. > > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is > always online. > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy); > */ > nodemask_t node_states[NR_NODE_STATES] __read_mostly = { > [N_POSSIBLE] = NODE_MASK_ALL, > +#ifdef CONFIG_NUMA > + [N_ONLINE] = NODE_MASK_NONE, > +#else > [N_ONLINE] = { { [0] = 1UL } }, > -#ifndef CONFIG_NUMA > [N_NORMAL_MEMORY] = { { [0] = 1UL } }, > #ifdef CONFIG_HIGHMEM > [N_HIGH_MEMORY] = { { [0] = 1UL } }, So on all other NUMA machines, when does node 0 get marked online? This change means that for some time during boot, such machines will now be running with node 0 marked as offline. What are the implications of this? Will something break?
Re: [PATCH v2] powerpc/64: BE option to use ELFv2 ABI for big endian kernels
Excerpts from Segher Boessenkool's message of April 29, 2020 9:40 am: > Hi! > > On Tue, Apr 28, 2020 at 09:25:17PM +1000, Nicholas Piggin wrote: >> +config BUILD_BIG_ENDIAN_ELF_V2 >> +bool "Build big-endian kernel using ELFv2 ABI (EXPERIMENTAL)" >> +depends on PPC64 && CPU_BIG_ENDIAN && EXPERT >> +default n >> +select BUILD_ELF_V2 >> +help >> + This builds the kernel image using the ELFv2 ABI, which has a >> + reduced stack overhead and faster function calls. This does not >> + affect the userspace ABIs. >> + >> + ELFv2 is the standard ABI for little-endian, but for big-endian >> + this is an experimental option that is less tested (kernel and >> + toolchain). This requires gcc 4.9 or newer and binutils 2.24 or >> + newer. > > Is it clear that this is only for 64-bit? Maybe this text should fit > that in somewhere? Don't know if it's necessary, the option only appears when 64-bit is selected. > It's not obvious to people who do not already know that ELFv2 is just > the (nick-)name of a particular ABI, not a new kind of ELF (it is just > version 1 ELF in fact), and that ABI is for 64-bit Power only. I blame toolchain for -mabi=elfv2 ! And also some blame on ABI document which is called ELF V2 ABI rather than ELF ABI V2 which would have been unambiguous. Kernel mostly gets it right (in the code I didn't write), and uses these #if defined(_CALL_ELF) && _CALL_ELF == 2 #define PPC64_ELF_ABI_v2 #else #define PPC64_ELF_ABI_v1 #endif I can go through and change all my stuff and config options to ELF_ABI_v2. Thanks, Nick
Re: [PATCH v3 00/29] Convert files to ReST - part 2
On Tue, Apr 28, 2020 at 03:09:51PM -0400, Jonathan Corbet wrote: > So I'm happy to merge this set, but there is one thing that worries me a > bit... > > > fs/coda/Kconfig |2 +- > > I'd feel a bit better if I could get an ack or two from filesystem folks > before I venture that far out of my own yard...what say you all? I acked the Coda parts on the first iteration of this patch. I have no problem with you merging them. Jan
[PATCH v3] powerpc/64: Option to use ELF V2 ABI for big-endian kernels
Provide an option to build big-endian kernels using the ELF V2 ABI. This works on GCC and clang (since about 2014). it is is not officially supported by the GNU toolchain, but it can give big-endian kernels some useful advantages of the V2 ABI (e.g., less stack usage). Reviewed-by: Segher Boessenkool Signed-off-by: Nicholas Piggin --- Since v1: - Improved the override flavour name suggested by Segher. - Improved changelog wording. Since v2: - Improved changelog, help text, to use the name ELF V2 ABI in the spec, and clarify things a bit more, suggested by Segher. - For option name, match the ELF_ABI_v1/2 which is already in the kernel. - Prefix options with PPC64_ to avoid arch clashes or confusion. - "elfv2" is the toolchain name of the ABI, so I kept that in the crypto perl scripts. arch/powerpc/Kconfig| 21 + arch/powerpc/Makefile | 15 ++- arch/powerpc/boot/Makefile | 4 drivers/crypto/vmx/Makefile | 8 ++-- drivers/crypto/vmx/ppc-xlate.pl | 10 ++ 5 files changed, 47 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 924c541a9260..444867e07039 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -147,6 +147,7 @@ config PPC select ARCH_WEAK_RELEASE_ACQUIRE select BINFMT_ELF select BUILDTIME_TABLE_SORT + select PPC64_BUILD_ELF_ABI_V2 if PPC64 && CPU_LITTLE_ENDIAN select CLONE_BACKWARDS select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN select DYNAMIC_FTRACE if FUNCTION_TRACER @@ -541,6 +542,26 @@ config KEXEC_FILE config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE +config PPC64_BUILD_ELF_ABI_V2 + bool + +config PPC64_BUILD_BIG_ENDIAN_ELF_ABI_V2 + bool "Build big-endian kernel using ELF V2 ABI (EXPERIMENTAL)" + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT + default n + select PPC64_BUILD_ELF_ABI_V2 + help + This builds the kernel image using the "Power Architecture 64-Bit ELF + V2 ABI Specification", which has a reduced stack overhead and faster + function calls. This internal kernel ABI option does not affect + userspace compatibility. + + The V2 ABI is standard for 64-bit little-endian, but for big-endian + it is less well tested by kernel and toolchain. However some distros + build userspace this way, and it can produce a functioning kernel. + + This requires gcc 4.9 or newer and binutils 2.24 or newer. 
+ config RELOCATABLE bool "Build a relocatable kernel" depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE)) diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index f310c32e88a4..baf477d100b2 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -92,10 +92,14 @@ endif ifdef CONFIG_PPC64 ifndef CONFIG_CC_IS_CLANG -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mcall-aixdesc) -aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1) -aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2 +ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2 +cflags-y += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc)) +aflags-y += $(call cc-option,-mabi=elfv2) +else +cflags-y += $(call cc-option,-mabi=elfv1) +cflags-y += $(call cc-option,-mcall-aixdesc) +aflags-y += $(call cc-option,-mabi=elfv1) +endif endif endif @@ -144,7 +148,7 @@ endif CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no) ifndef CONFIG_CC_IS_CLANG -ifdef CONFIG_CPU_LITTLE_ENDIAN +ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc)) AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2) else @@ -153,6 +157,7 @@ CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc) AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1) endif endif + CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc)) CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index c53a1b8bba8b..d27e85413c84 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -41,6 +41,10 @@ endif BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include) +ifdef CONFIG_PPC64_BUILD_ELF_ABI_V2 +BOOTCFLAGS += $(call cc-option,-mabi=elfv2) +endif + ifdef CONFIG_CPU_BIG_ENDIAN BOOTCFLAGS += -mbig-endian else diff --git a/drivers/crypto/vmx/Makefile b/drivers/crypto/vmx/Makefile index 709670d2b553..751ffca694aa 100644 --- a/drivers/crypto/vmx/Makefile +++ b/drive
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
Hi Christoph, > FYI, these little hunks reduce the difference to my version, maybe > you can fold them in? Sure, no problem. How do you want to coordinate these? I can submit mine through mpe, but that may make it tricky to synchronise with your changes. Or, you can include this change in your series if you prefer. Cheers, Jeremy
Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
> > > > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is > > always online. > > > > ... > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy); > > */ > > nodemask_t node_states[NR_NODE_STATES] __read_mostly = { > > [N_POSSIBLE] = NODE_MASK_ALL, > > +#ifdef CONFIG_NUMA > > + [N_ONLINE] = NODE_MASK_NONE, > > +#else > > [N_ONLINE] = { { [0] = 1UL } }, > > -#ifndef CONFIG_NUMA > > [N_NORMAL_MEMORY] = { { [0] = 1UL } }, > > #ifdef CONFIG_HIGHMEM > > [N_HIGH_MEMORY] = { { [0] = 1UL } }, > > So on all other NUMA machines, when does node 0 get marked online? > > This change means that for some time during boot, such machines will > > now be running with node 0 marked as offline. What are the > > implications of this? Will something break? Until the nodes are detected, marking node 0 as online is redundant, because the system does not yet know whether it is a NUMA or a non-NUMA system. Once we detect the nodes, we online them immediately. Hence I don't see any side effects or negative implications of this change. However, if I am missing anything, please do let me know. For my part, I have tested this on: 1. Non-NUMA single node with CPUs and memory coming from node 0. 2. Non-NUMA single node with CPUs and memory coming from a non-zero node. 3. NUMA multi node with CPUs and memory from node 0. 4. NUMA multi node with no CPUs and no memory on node 0. -- Thanks and Regards Srikar Dronamraju
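(To make "we online them immediately" concrete, the detection path uses the standard nodemask helpers; for example, once a node with CPUs or memory is discovered:

	if (!node_online(nid))
		node_set_online(nid);

This is only an illustration of the existing mechanism, not a quote from the patch set.)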
Re: [PATCH v6 10/28] powerpc: Introduce functions for instruction equality
There seems to be a minor typo which breaks compilation when CONFIG_MPROFILE_KERNEL is not enabled. See the fix below.

---
 arch/powerpc/kernel/trace/ftrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index a6064e1977ca..0ad2c9d4ab49 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -499,7 +499,7 @@ expected_nop_sequence(void *ip, struct ppc_inst op0, struct ppc_inst op1)
 	 * The load offset is different depending on the ABI. For simplicity
 	 * just mask it out when doing the compare.
 	 */
-	if ((!ppc_inst_equal(op0), ppc_inst(0x48000008)) || (ppc_inst_val(op1) & 0xffff0000) != 0xe8410000)
+	if ((!ppc_inst_equal(op0, ppc_inst(0x48000008))) || (ppc_inst_val(op1) & 0xffff0000) != 0xe8410000)
 		return 0;
 	return 1;
 }
--
2.20.1
Re: [PATCH v6 11/28] powerpc: Use a datatype for instructions
Hi Jordan,

I needed the below fix for building with CONFIG_STRICT_KERNEL_RWX enabled. Hopefully it's correct, I have not yet had a chance to test it beyond building it.

- Alistair

---
 arch/powerpc/lib/code-patching.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ad5754c5f007..a8c8ffdb1ccd 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -166,8 +166,8 @@ static int do_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
 		goto out;
 	}
 
-	patch_addr = (unsigned int *)(text_poke_addr) +
-			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
+	patch_addr = (struct ppc_inst *)(text_poke_addr) +
+			((kaddr & ~PAGE_MASK) / sizeof(unsigned int));
 
 	__patch_instruction(addr, instr, patch_addr);
--
2.20.1
[RFC PATCH v2 0/5] Use per-CPU temporary mappings for patching
When compiled with CONFIG_STRICT_KERNEL_RWX, the kernel must create temporary mappings when patching itself. These mappings temporarily override the strict RWX text protections to permit a write. Currently, powerpc allocates a per-CPU VM area for patching. Patching occurs as follows: 1. Map page of text to be patched to per-CPU VM area w/ PAGE_KERNEL protection 2. Patch text 3. Remove the temporary mapping While the VM area is per-CPU, the mapping is actually inserted into the kernel page tables. Presumably, this could allow another CPU to access the normally write-protected text - either malicously or accidentally - via this same mapping if the address of the VM area is known. Ideally, the mapping should be kept local to the CPU doing the patching (or any other sensitive operations requiring temporarily overriding memory protections) [0]. x86 introduced "temporary mm" structs which allow the creation of mappings local to a particular CPU [1]. This series intends to bring the notion of a temporary mm to powerpc and harden powerpc by using such a mapping for patching a kernel with strict RWX permissions. The first patch introduces the temporary mm struct and API for powerpc along with a new function to retrieve a current hw breakpoint. The second patch uses the `poking_init` init hook added by the x86 patches to initialize a temporary mm and patching address. The patching address is randomized between 0 and DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how the hash MMU operates - by default the space above DEFAULT_MAP_WINDOW is not available. For now, both hash and radix randomize inside this range. The number of possible random addresses is dependent on PAGE_SIZE and limited by DEFAULT_MAP_WINDOW. Bits of entropy with 64K page size on BOOK3S_64: bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE) PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB bits of entropy = log2(128TB / 64K) bits of entropy = 31 Randomization occurs only once during initialization at boot. The third patch replaces the VM area with the temporary mm in the patching code. The page for patching has to be mapped PAGE_SHARED with the hash MMU since hash prevents the kernel from accessing userspace pages with PAGE_PRIVILEGED bit set. On the radix MMU the page is mapped with PAGE_KERNEL which has the added benefit that we can skip KUAP. The fourth and fifth patches implement an LKDTM test "proof-of-concept" which exploits the previous vulnerability (ie. the mapping during patching is exposed in kernel page tables and accessible by other CPUS). The LKDTM test is somewhat "rough" in that it uses a brute-force approach - I am open to any suggestions and/or ideas to improve this. Currently, the LKDTM test passes with this series on POWER8 (hash) and POWER9 (radix, hash) and fails without this series (ie. the temporary mapping for patching is exposed to CPUs other than the patching CPU). 
The test can be applied to a tree without this new series by first adding this in /arch/powerpc/lib/code-patching.c: @@ -41,6 +41,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr) #ifdef CONFIG_STRICT_KERNEL_RWX static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); +#ifdef CONFIG_LKDTM +unsigned long read_cpu_patching_addr(unsigned int cpu) +{ + return (unsigned long)(per_cpu(text_poke_area, cpu))->addr; +} +#endif + static int text_area_cpu_up(unsigned int cpu) { struct vm_struct *area; And then applying the last patch of this series which adds the LKDTM test, (powerpc: Add LKDTM test to hijack a patch mapping). Tested on QEMU (POWER8, POWER9), POWER8 VM, and a Blackbird (8-core POWER9). v2: Many fixes and improvements mostly based on extensive feedback and testing by Christophe Leroy (thanks!). * Make patching_mm and patching_addr static and mode '__ro_after_init' to after the variable name (more common in other parts of the kernel) * Use 'asm/debug.h' header instead of 'asm/hw_breakpoint.h' to fix PPC64e compile * Add comment explaining why we use BUG_ON() during the init call to setup for patching later * Move ptep into patch_mapping to avoid walking page tables a second time when unmapping the temporary mapping * Use KUAP under non-radix, also manually dirty the PTE for patch mapping on non-BOOK3S_64 platforms * Properly return any error from __patch_instruction * Do not use 'memcmp' where a simple comparison is appropriate * Simplify expression for patch address by removing pointer maths * Add LKDTM test [0]: https://github.com/linuxppc/issues/issues/224 [1]: https://lore.kernel.org/kernel-hardening/20190426232303.28381-1-nadav.a...@gmail.com/ Christopher M. Riedl (5): powerpc/mm: Introduce temporary mm powerpc/lib: Initialize a temporary mm for code patching powerpc/lib:
[RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
Currently, code patching a STRICT_KERNEL_RWX exposes the temporary mappings to other CPUs. These mappings should be kept local to the CPU doing the patching. Use the pre-initialized temporary mm and patching address for this purpose. Also add a check after patching to ensure the patch succeeded. Use the KUAP functions on non-BOOKS3_64 platforms since the temporary mapping for patching uses a userspace address (to keep the mapping local). On BOOKS3_64 platforms hash does not implement KUAP and on radix the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR (KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35). Based on x86 implementation: commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 149 --- 1 file changed, 55 insertions(+), 94 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 259c19480a85..26f06cdb5d7e 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -19,6 +19,7 @@ #include #include #include +#include static int __patch_instruction(unsigned int *exec_addr, unsigned int instr, unsigned int *patch_addr) @@ -72,101 +73,58 @@ void __init poking_init(void) pte_unmap_unlock(ptep, ptl); } -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); - -static int text_area_cpu_up(unsigned int cpu) -{ - struct vm_struct *area; - - area = get_vm_area(PAGE_SIZE, VM_ALLOC); - if (!area) { - WARN_ONCE(1, "Failed to create text area for cpu %d\n", - cpu); - return -1; - } - this_cpu_write(text_poke_area, area); - - return 0; -} - -static int text_area_cpu_down(unsigned int cpu) -{ - free_vm_area(this_cpu_read(text_poke_area)); - return 0; -} - -/* - * Run as a late init call. This allows all the boot time patching to be done - * simply by patching the code, and then we're called here prior to - * mark_rodata_ro(), which happens after all init calls are run. Although - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge - * it as being preferable to a kernel that will crash later when someone tries - * to use patch_instruction(). - */ -static int __init setup_text_poke_area(void) -{ - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, - "powerpc/text_poke:online", text_area_cpu_up, - text_area_cpu_down)); - - return 0; -} -late_initcall(setup_text_poke_area); +struct patch_mapping { + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + struct temp_mm temp_mm; +}; /* * This can be called for kernel text or a module. 
*/ -static int map_patch_area(void *addr, unsigned long text_poke_addr) +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) { - unsigned long pfn; - int err; + struct page *page; + pte_t pte; + pgprot_t pgprot; if (is_vmalloc_addr(addr)) - pfn = vmalloc_to_pfn(addr); + page = vmalloc_to_page(addr); else - pfn = __pa_symbol(addr) >> PAGE_SHIFT; + page = virt_to_page(addr); - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); + if (radix_enabled()) + pgprot = PAGE_KERNEL; + else + pgprot = PAGE_SHARED; - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); - if (err) + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr, +&patch_mapping->ptl); + if (unlikely(!patch_mapping->ptep)) { + pr_warn("map patch: failed to allocate pte for patching\n"); return -1; + } + + pte = mk_pte(page, pgprot); + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + pte = pte_mkdirty(pte); + set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte); + + init_temp_mm(&patch_mapping->temp_mm, patching_mm); + use_temporary_mm(&patch_mapping->temp_mm); return 0; } -static inline int unmap_patch_area(unsigned long addr) +static void unmap_patch(struct patch_mapping *patch_mapping) { - pte_t *ptep; - pmd_t *pmdp; - pud_t *pudp; - pgd_t *pgdp; - - pgdp = pgd_offset_k(addr); - if (unlikely(!pgdp)) - return -EINVAL; - - pudp = pud_offset(pgdp, addr); - if (unlikely(!pudp)) - return -EINVAL; - - pmdp = pmd_offset(pudp, addr); - if (unlikely(!pmdp)) - return -EINVAL; - - ptep = pte_offset_kernel(pmdp, addr); - if (unlikely(!ptep)) - return -EINVAL; + /* In hash, pte_clear flushes the tlb */ + pte_clear(patching_mm, patching_addr, patch_m
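The rework of do_patch_instruction() itself falls outside this excerpt; roughly, it ties map_patch() and unmap_patch() together as below. This is a sketch under the assumptions named in the commit message, not the literal hunk:

static int do_patch_instruction(unsigned int *addr, unsigned int instr)
{
	int err;
	unsigned long flags;
	struct patch_mapping patch_mapping;

	local_irq_save(flags);			/* the temporary mm needs irqs off */

	err = map_patch(addr, &patch_mapping);	/* install pte in patching_mm */
	if (err)
		goto out;

	/* Write through the CPU-local mapping at the randomized address. */
	err = __patch_instruction(addr, instr,
				  (unsigned int *)(patching_addr |
						   offset_in_page(addr)));

	unmap_patch(&patch_mapping);		/* clear pte, switch back, flush */
out:
	local_irq_restore(flags);
	return err;
}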
[RFC PATCH v2 5/5] powerpc: Add LKDTM test to hijack a patch mapping
When live patching with STRICT_KERNEL_RWX, the CPU doing the patching must use a temporary mapping which allows for writing to kernel text. During the entire window of time when this temporary mapping is in use, another CPU could write to the same mapping and maliciously alter kernel text. Implement a LKDTM test to attempt to exploit such a openings when a CPU is patching under STRICT_KERNEL_RWX. The test is only implemented on powerpc for now. The LKDTM "hijack" test works as follows: 1. A CPU executes an infinite loop to patch an instruction. This is the "patching" CPU. 2. Another CPU attempts to write to the address of the temporary mapping used by the "patching" CPU. This other CPU is the "hijacker" CPU. The hijack either fails with a segfault or succeeds, in which case some kernel text is now overwritten. How to run the test: mount -t debugfs none /sys/kernel/debug (echo HIJACK_PATCH > /sys/kernel/debug/provoke-crash/DIRECT) Signed-off-by: Christopher M. Riedl --- drivers/misc/lkdtm/core.c | 1 + drivers/misc/lkdtm/lkdtm.h | 1 + drivers/misc/lkdtm/perms.c | 99 ++ 3 files changed, 101 insertions(+) diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c index a5e344df9166..482e72f6a1e1 100644 --- a/drivers/misc/lkdtm/core.c +++ b/drivers/misc/lkdtm/core.c @@ -145,6 +145,7 @@ static const struct crashtype crashtypes[] = { CRASHTYPE(WRITE_RO), CRASHTYPE(WRITE_RO_AFTER_INIT), CRASHTYPE(WRITE_KERN), + CRASHTYPE(HIJACK_PATCH), CRASHTYPE(REFCOUNT_INC_OVERFLOW), CRASHTYPE(REFCOUNT_ADD_OVERFLOW), CRASHTYPE(REFCOUNT_INC_NOT_ZERO_OVERFLOW), diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h index 601a2156a0d4..bfcf3542370d 100644 --- a/drivers/misc/lkdtm/lkdtm.h +++ b/drivers/misc/lkdtm/lkdtm.h @@ -62,6 +62,7 @@ void lkdtm_EXEC_USERSPACE(void); void lkdtm_EXEC_NULL(void); void lkdtm_ACCESS_USERSPACE(void); void lkdtm_ACCESS_NULL(void); +void lkdtm_HIJACK_PATCH(void); /* lkdtm_refcount.c */ void lkdtm_REFCOUNT_INC_OVERFLOW(void); diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c index 62f76d506f04..547ce16e03e5 100644 --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include #include #include +#include #include /* Whether or not to fill the target memory area with do_nothing(). 
*/ @@ -213,6 +214,104 @@ void lkdtm_ACCESS_NULL(void) *ptr = tmp; } +#if defined(CONFIG_PPC) && defined(CONFIG_STRICT_KERNEL_RWX) +#include + +extern unsigned long read_cpu_patching_addr(unsigned int cpu); + +static unsigned int * const patch_site = (unsigned int * const)&do_nothing; + +static int lkdtm_patching_cpu(void *data) +{ + int err = 0; + + pr_info("starting patching_cpu=%d\n", smp_processor_id()); + do { + err = patch_instruction(patch_site, 0xdeadbeef); + } while (*READ_ONCE(patch_site) == 0xdeadbeef && + !err && !kthread_should_stop()); + + if (err) + pr_warn("patch_instruction returned error: %d\n", err); + + set_current_state(TASK_INTERRUPTIBLE); + while (!kthread_should_stop()) { + schedule(); + set_current_state(TASK_INTERRUPTIBLE); + } + + return err; +} + +void lkdtm_HIJACK_PATCH(void) +{ + struct task_struct *patching_kthrd; + int patching_cpu, hijacker_cpu, original_insn, attempts; + unsigned long addr; + bool hijacked; + + if (num_online_cpus() < 2) { + pr_warn("need at least two cpus\n"); + return; + } + + original_insn = *READ_ONCE(patch_site); + + hijacker_cpu = smp_processor_id(); + patching_cpu = cpumask_any_but(cpu_online_mask, hijacker_cpu); + + patching_kthrd = kthread_create_on_node(&lkdtm_patching_cpu, NULL, + cpu_to_node(patching_cpu), + "lkdtm_patching_cpu"); + kthread_bind(patching_kthrd, patching_cpu); + wake_up_process(patching_kthrd); + + addr = offset_in_page(patch_site) | read_cpu_patching_addr(patching_cpu); + + pr_info("starting hijacker_cpu=%d\n", hijacker_cpu); + for (attempts = 0; attempts < 10; ++attempts) { + /* Use __put_user to catch faults without an Oops */ + hijacked = !__put_user(0xbad00bad, (unsigned int *)addr); + + if (hijacked) { + if (kthread_stop(patching_kthrd)) + goto out; + break; + } + } + pr_info("hijack attempts: %d\n", attempts); + + if (hijacked) { + if (*READ_ONCE(patch_site) == 0xbad00bad) + pr_err("overwrote kernel text\n"); + /* +* The
[RFC PATCH v2 2/5] powerpc/lib: Initialize a temporary mm for code patching
When code patching a STRICT_KERNEL_RWX kernel the page containing the address to be patched is temporarily mapped with permissive memory protections. Currently, a per-cpu vmalloc patch area is used for this purpose. While the patch area is per-cpu, the temporary page mapping is inserted into the kernel page tables for the duration of the patching. The mapping is exposed to CPUs other than the patching CPU - this is undesirable from a hardening perspective. Use the `poking_init` init hook to prepare a temporary mm and patching address. Initialize the temporary mm by copying the init mm. Choose a randomized patching address inside the temporary mm userspace address portion. The next patch uses the temporary mm and patching address for code patching. Based on x86 implementation: commit 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 33 1 file changed, 33 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 3345f039a876..259c19480a85 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include #include #include @@ -39,6 +41,37 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr) } #ifdef CONFIG_STRICT_KERNEL_RWX + +static struct mm_struct *patching_mm __ro_after_init; +static unsigned long patching_addr __ro_after_init; + +void __init poking_init(void) +{ + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + + /* +* Some parts of the kernel (static keys for example) depend on +* successful code patching. Code patching under STRICT_KERNEL_RWX +* requires this setup - otherwise we cannot patch at all. We use +* BUG_ON() here and later since an early failure is preferred to +* buggy behavior and/or strange crashes later. +*/ + patching_mm = copy_init_mm(); + BUG_ON(!patching_mm); + + /* +* In hash we cannot go above DEFAULT_MAP_WINDOW easily. +* XXX: Do we want additional bits of entropy for radix? +*/ + patching_addr = (get_random_long() & PAGE_MASK) % + (DEFAULT_MAP_WINDOW - PAGE_SIZE); + + ptep = get_locked_pte(patching_mm, patching_addr, &ptl); + BUG_ON(!ptep); + pte_unmap_unlock(ptep, ptl); +} + static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); static int text_area_cpu_up(unsigned int cpu) -- 2.26.1
[RFC PATCH v2 4/5] powerpc/lib: Add LKDTM accessor for patching addr
When live patching a STRICT_RWX kernel, a mapping is installed at a "patching address" with temporary write permissions. Provide a LKDTM-only accessor function for this address in preparation for a LKDTM test which attempts to "hijack" this mapping by writing to it from another CPU. Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 26f06cdb5d7e..cfbdef90384e 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -46,6 +46,13 @@ int raw_patch_instruction(unsigned int *addr, unsigned int instr) static struct mm_struct *patching_mm __ro_after_init; static unsigned long patching_addr __ro_after_init; +#ifdef CONFIG_LKDTM +unsigned long read_cpu_patching_addr(unsigned int cpu) +{ + return patching_addr; +} +#endif + void __init poking_init(void) { spinlock_t *ptl; /* for protecting pte table */ -- 2.26.1
[RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
x86 supports the notion of a temporary mm which restricts access to temporary PTEs to a single CPU. A temporary mm is useful for situations where a CPU needs to perform sensitive operations (such as patching a STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing said mappings to other CPUs. A side benefit is that other CPU TLBs do not need to be flushed when the temporary mm is torn down. Mappings in the temporary mm can be set in the userspace portion of the address-space. Interrupts must be disabled while the temporary mm is in use. HW breakpoints, which may have been set by userspace as watchpoints on addresses now within the temporary mm, are saved and disabled when loading the temporary mm. The HW breakpoints are restored when unloading the temporary mm. All HW breakpoints are indiscriminately disabled while the temporary mm is in use. Based on x86 implementation: commit cefa929c034e ("x86/mm: Introduce temporary mm structs") Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/debug.h | 1 + arch/powerpc/include/asm/mmu_context.h | 54 ++ arch/powerpc/kernel/process.c | 5 +++ 3 files changed, 60 insertions(+) diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h index 7756026b95ca..b945bc16c932 100644 --- a/arch/powerpc/include/asm/debug.h +++ b/arch/powerpc/include/asm/debug.h @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; } static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; } #endif +void __get_breakpoint(struct arch_hw_breakpoint *brk); void __set_breakpoint(struct arch_hw_breakpoint *brk); bool ppc_breakpoint_available(void); #ifdef CONFIG_PPC_ADV_DEBUG_REGS diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 360367c579de..57a8695fe63f 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include #include #include +#include /* * Most if the context management is out of line @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, return 0; } +struct temp_mm { + struct mm_struct *temp; + struct mm_struct *prev; + bool is_kernel_thread; + struct arch_hw_breakpoint brk; +}; + +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm) +{ + temp_mm->temp = mm; + temp_mm->prev = NULL; + temp_mm->is_kernel_thread = false; + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); +} + +static inline void use_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + temp_mm->is_kernel_thread = current->mm == NULL; + if (temp_mm->is_kernel_thread) + temp_mm->prev = current->active_mm; + else + temp_mm->prev = current->mm; + + /* +* Hash requires a non-NULL current->mm to allocate a userspace address +* when handling a page fault. Does not appear to hurt in Radix either. 
+*/ + current->mm = temp_mm->temp; + switch_mm_irqs_off(NULL, temp_mm->temp, current); + + if (ppc_breakpoint_available()) { + __get_breakpoint(&temp_mm->brk); + if (temp_mm->brk.type != 0) + hw_breakpoint_disable(); + } +} + +static inline void unuse_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + if (temp_mm->is_kernel_thread) + current->mm = NULL; + else + current->mm = temp_mm->prev; + switch_mm_irqs_off(NULL, temp_mm->prev, current); + + if (ppc_breakpoint_available() && temp_mm->brk.type != 0) + __set_breakpoint(&temp_mm->brk); +} + #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 9c21288f8645..ec4cf890d92c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk) return 0; } +void __get_breakpoint(struct arch_hw_breakpoint *brk) +{ + memcpy(brk, this_cpu_ptr(¤t_brk), sizeof(*brk)); +} + void __set_breakpoint(struct arch_hw_breakpoint *brk) { memcpy(this_cpu_ptr(¤t_brk), brk, sizeof(*brk)); -- 2.26.1
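As a usage illustration of the API above (the caller shape is assumed, not quoted from a later patch): the expected pattern is to initialize the struct once and wrap the sensitive access in a use/unuse pair with interrupts disabled, as the lockdep assertions require. For example, with the patching_mm set up earlier in the series:

	struct temp_mm temp_mm;
	unsigned long flags;

	init_temp_mm(&temp_mm, patching_mm);

	local_irq_save(flags);
	use_temporary_mm(&temp_mm);

	/* ... touch the mapping that exists only in patching_mm ... */

	unuse_temporary_mm(&temp_mm);
	local_irq_restore(flags);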
Re: [PATCH v2 1/7] KVM: s390: clean up redundant 'kvm_run' parameters
On 2020/4/26 20:59, Thomas Huth wrote: On 23/04/2020 13.00, Christian Borntraeger wrote: On 23.04.20 12:58, Tianjia Zhang wrote: On 2020/4/23 18:39, Cornelia Huck wrote: On Thu, 23 Apr 2020 11:01:43 +0800 Tianjia Zhang wrote: On 2020/4/23 0:04, Cornelia Huck wrote: On Wed, 22 Apr 2020 17:58:04 +0200 Christian Borntraeger wrote: On 22.04.20 15:45, Cornelia Huck wrote: On Wed, 22 Apr 2020 20:58:04 +0800 Tianjia Zhang wrote: In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. Earlier than historical reasons, many kvm-related function s/Earlier than/For/ ? parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang --- arch/s390/kvm/kvm-s390.c | 37 ++--- 1 file changed, 22 insertions(+), 15 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index e335a7e5ead7..d7bb2e7a07ff 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4176,8 +4176,9 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) return rc; } -static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +static void sync_regs_fmt2(struct kvm_vcpu *vcpu) { + struct kvm_run *kvm_run = vcpu->run; struct runtime_instr_cb *riccb; struct gs_cb *gscb; @@ -4235,7 +4236,7 @@ static void sync_regs_fmt2(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) } if (vcpu->arch.gs_enabled) { current->thread.gs_cb = (struct gs_cb *) - &vcpu->run->s.regs.gscb; + &kvm_run->s.regs.gscb; Not sure if these changes (vcpu->run-> => kvm_run->) are really worth it. (It seems they amount to at least as much as the changes advertised in the patch description.) Other opinions? Agreed. It feels kind of random. Maybe just do the first line (move kvm_run from the function parameter list into the variable declaration)? Not sure if this is better. There's more in this patch that I cut... but I think just moving kvm_run from the parameter list would be much less disruptive. I think there are two kinds of code(`vcpu->run->` and `kvm_run->`), but there will be more disruptive, not less. I just fail to see the benefit; sure, kvm_run-> is convenient, but the current code is just fine, and any rework should be balanced against the cost (e.g. cluttering git annotate). cluttering git annotate ? Does it mean Fix ("comment"). Is it possible to solve this problem by splitting this patch? No its about breaking git blame (and bugfix backports) for just a cosmetic improvement. It could be slightly more than a cosmetic improvement (depending on the smartness of the compiler): vcpu->run-> are two dereferences, while kvm_run-> is only one dereference. So it could be slightly more compact and faster code. Thomas If the compiler is smart enough, this place can be automatically optimized, but we can't just rely on the compiler, if not? This requires a trade-off between code cleanliness readability and breaking git blame. In addition, I have removed the changes here and sent a v4 patch. Please also help review it. Thanks and best, Tianjia
Re: New powerpc vdso calling convention
Excerpts from Adhemerval Zanella's message of April 27, 2020 11:09 pm: > > > On 26/04/2020 00:41, Nicholas Piggin wrote: >> Excerpts from Rich Felker's message of April 26, 2020 9:11 am: >>> On Sun, Apr 26, 2020 at 08:58:19AM +1000, Nicholas Piggin wrote: Excerpts from Christophe Leroy's message of April 25, 2020 10:20 pm: > > > Le 25/04/2020 à 12:56, Nicholas Piggin a écrit : >> Excerpts from Christophe Leroy's message of April 25, 2020 5:47 pm: >>> >>> >>> Le 25/04/2020 à 07:22, Nicholas Piggin a écrit : As noted in the 'scv' thread, powerpc's vdso calling convention does not match the C ELF ABI calling convention (or the proposed scv convention). I think we could implement a new ABI by basically duplicating function entry points with different names. >>> >>> I think doing this is a real good idea. >>> >>> I've been working at porting powerpc VDSO to the GENERIC C VDSO, and the >>> main pitfall has been that our vdso calling convention is not compatible >>> with C calling convention, so we have go through an ASM entry/exit. >>> >>> See >>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=171469 >>> >>> We should kill this error flag return through CR[SO] and get it the >>> "modern" way like other architectectures implementing the C VDSO: return >>> 0 when successfull, return -err when failed. >> >> Agreed. >> The ELF v2 ABI convention would suit it well, because the caller already requires the function address for ctr, so having it in r12 will eliminate the need for address calculation, which suits the vdso data page access. Is there a need for ELF v1 specific calls as well, or could those just be deprecated and remain on existing functions or required to use the ELF v2 calls using asm wrappers? >>> >>> What's ELF v1 and ELF v2 ? Is ELF v1 what PPC32 uses ? If so, I'd say >>> yes, it would be good to have it to avoid going through ASM in the >>> middle. >> >> I'm not sure about PPC32. On PPC64, ELFv2 functions must be called with >> their address in r12 if called at their global entry point. ELFv1 have a >> function descriptor with call address and TOC in it, caller has to load >> the TOC if it's global. >> >> The vdso doesn't have TOC, it has one global address (the vdso data >> page) which it loads by calculating its own address. >> >> The kernel doesn't change the vdso based on whether it's called by a v1 >> or v2 userspace (it doesn't really know itself and would have to export >> different functions). glibc has a hack to create something: >> >> # define VDSO_IFUNC_RET(value) \ >>({ \ >> static Elf64_FuncDesc vdso_opd = { .fd_toc = ~0x0 }; \ >> vdso_opd.fd_func = (Elf64_Addr)value;\ >> &vdso_opd; \ >>}) >> >> If we could make something which links more like any other dso with >> ELFv1, that would be good. Otherwise I think v2 is preferable so it >> doesn't have to calculate its own address. > > I see the following in glibc. So looks like PPC32 is like PPC64 elfv1. > By the way, they are talking about something not completely finished in > the kernel. Can we finish it ? Possibly can. It seems like a good idea to fix all loose ends if we are going to add new versions. Will have to check with the toolchain people to make sure we're doing the right thing. >>> >>> "ELFv1" and "ELFv2" are PPC64-specific names for the old and new >>> version of the ELF psABI for PPC64. They have nothing at all to do >>> with PPC32 which is a completely different ABI from either. 
>> >> Right, I'm just talking about those comments -- it seems like the kernel >> vdso should contain an .opd section with function descriptors in it for >> elfv1 calls, rather than the hack it has now of creating one in the >> caller's .data section. >> >> But all that function descriptor code is gated by >> >> #if (defined(__PPC64__) || defined(__powerpc64__)) && _CALL_ELF != 2 >> >> So it seems PPC32 does not use function descriptors but a direct pointer >> to the entry point like PPC64 with ELFv2. > > Yes, this hack is only for ELFv1. The missing ODP has not been an issue > or glibc because it has been using the inline assembly to emulate the > functions call since initial vDSO support (INTERNAL_VSYSCALL_CALL_TYPE). > It just has become an issue when I added a ifunc optimization to > gettimeofday so it can bypass the libc.so and make plt branch to vDSO > directly. I can't understand if it's actually a problem for you or not. Regardless if you can hack around it, it seems
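For reference, the ELFv1 function descriptor behind the .opd discussion is three doublewords; the glibc Elf64_FuncDesc used in the quoted hack has this shape (field names follow glibc, the comments are illustrative):

typedef struct {
	Elf64_Addr fd_func;	/* entry point of the function */
	Elf64_Addr fd_toc;	/* TOC base to load into r2 for the callee */
	Elf64_Addr fd_aux;	/* environment pointer, unused by C */
} Elf64_FuncDesc;

An .opd section in the vdso would simply hold one such descriptor per exported function, so an ELFv1 caller could branch through it like any other DSO.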
Re: [PATCH v6 03/28] powerpc/xmon: Move breakpoints to text section
On Tue, Apr 28, 2020 at 3:36 PM Christophe Leroy wrote: > > > > Le 28/04/2020 à 07:30, Jordan Niethe a écrit : > > On Tue, Apr 28, 2020 at 3:20 PM Christophe Leroy > > wrote: > >> > >> > >> > >> Le 28/04/2020 à 03:57, Jordan Niethe a écrit : > >>> The instructions for xmon's breakpoint are stored bpt_table[] which is in > >>> the data section. This is problematic as the data section may be marked > >>> as no execute. Move bpt_table[] to the text section. > >>> > >>> Signed-off-by: Jordan Niethe > >>> --- > >>> v6: - New to series. Was part of the previous patch. > >>> - Make BPT_SIZE available in assembly > >>> --- > >>>arch/powerpc/kernel/asm-offsets.c | 8 > >>>arch/powerpc/xmon/Makefile| 2 +- > >>>arch/powerpc/xmon/xmon.c | 6 +- > >>>arch/powerpc/xmon/xmon_bpts.S | 9 + > >>>arch/powerpc/xmon/xmon_bpts.h | 14 ++ > >>>5 files changed, 33 insertions(+), 6 deletions(-) > >>>create mode 100644 arch/powerpc/xmon/xmon_bpts.S > >>>create mode 100644 arch/powerpc/xmon/xmon_bpts.h > >>> > >>> diff --git a/arch/powerpc/kernel/asm-offsets.c > >>> b/arch/powerpc/kernel/asm-offsets.c > >>> index c25e562f1cd9..2401f415f423 100644 > >>> --- a/arch/powerpc/kernel/asm-offsets.c > >>> +++ b/arch/powerpc/kernel/asm-offsets.c > >>> @@ -70,6 +70,10 @@ > >>>#include > >>>#endif > >>> > >>> +#ifdef CONFIG_XMON > >>> +#include "../xmon/xmon_bpts.h" > >>> +#endif > >>> + > >>>#define STACK_PT_REGS_OFFSET(sym, val) \ > >>>DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val)) > >>> > >>> @@ -783,5 +787,9 @@ int main(void) > >>>DEFINE(VIRT_IMMR_BASE, (u64)__fix_to_virt(FIX_IMMR_BASE)); > >>>#endif > >>> > >>> +#ifdef CONFIG_XMON > >>> + DEFINE(BPT_SIZE, BPT_SIZE); > >>> +#endif > >>> + > >>>return 0; > >>>} > >>> diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile > >>> index c3842dbeb1b7..515a13ea6f28 100644 > >>> --- a/arch/powerpc/xmon/Makefile > >>> +++ b/arch/powerpc/xmon/Makefile > >>> @@ -21,7 +21,7 @@ endif > >>> > >>>ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC) > >>> > >>> -obj-y+= xmon.o nonstdio.o spr_access.o > >>> +obj-y+= xmon.o nonstdio.o spr_access.o > >>> xmon_bpts.o > >>> > >>>ifdef CONFIG_XMON_DISASSEMBLY > >>>obj-y += ppc-dis.o ppc-opc.o > >>> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c > >>> index a064392df1b8..f7ce3ea8694c 100644 > >>> --- a/arch/powerpc/xmon/xmon.c > >>> +++ b/arch/powerpc/xmon/xmon.c > >>> @@ -62,6 +62,7 @@ > >>> > >>>#include "nonstdio.h" > >>>#include "dis-asm.h" > >>> +#include "xmon_bpts.h" > >>> > >>>#ifdef CONFIG_SMP > >>>static cpumask_t cpus_in_xmon = CPU_MASK_NONE; > >>> @@ -108,7 +109,6 @@ struct bpt { > >>>#define BP_TRAP 2 > >>>#define BP_DABR 4 > >>> > >>> -#define NBPTS256 > >>>static struct bpt bpts[NBPTS]; > >>>static struct bpt dabr; > >>>static struct bpt *iabr; > >>> @@ -116,10 +116,6 @@ static unsigned bpinstr = 0x7fe8;/* trap */ > >>> > >>>#define BP_NUM(bp) ((bp) - bpts + 1) > >>> > >>> -#define BPT_SIZE (sizeof(unsigned int) * 2) > >>> -#define BPT_WORDS (BPT_SIZE / sizeof(unsigned int)) > >>> -static unsigned int bpt_table[NBPTS * BPT_WORDS]; > >>> - > >>>/* Prototypes */ > >>>static int cmds(struct pt_regs *); > >>>static int mread(unsigned long, void *, int); > >>> diff --git a/arch/powerpc/xmon/xmon_bpts.S b/arch/powerpc/xmon/xmon_bpts.S > >>> new file mode 100644 > >>> index ..f3ad0ab50854 > >>> --- /dev/null > >>> +++ b/arch/powerpc/xmon/xmon_bpts.S > >>> @@ -0,0 +1,9 @@ > >>> +/* SPDX-License-Identifier: GPL-2.0 */ > >>> +#include > >>> +#include > >>> +#include > >>> +#include "xmon_bpts.h" > >>> 
+ > >>> +.global bpt_table > >>> +bpt_table: > >>> + .space NBPTS * BPT_SIZE > >> > >> No alignment required ? Standard alignment (probably 4 bytes ?) is > >> acceptable ? > > No, I'll add a .balign4 to be sure. > > I think it is aligned to 4 by default. My question was to know whether 4 > is enough. > I see BPT_SIZE is 8, should the alignment be at least 8 ? At this point each breakpoint entry is two instructions: the instruction that has been replaced by a breakpoint and a trap instruction after that. So I think it should be fine aligned to 4. In the patch that introduces prefixed instructions to the data type, the alignment changes. > > >> > >> > >>> diff --git a/arch/powerpc/xmon/xmon_bpts.h b/arch/powerpc/xmon/xmon_bpts.h > >>> new file mode 100644 > >>> index ..b7e94375db86 > >>> --- /dev/null > >>> +++ b/arch/powerpc/xmon/xmon_bpts.h > >>> @@ -0,0 +1,14 @@ > >>> +/* SPDX-License-Identifier: GPL-2.0 */ > >>> +#ifndef XMON_BPTS_H > >>> +#define XMON_BPTS_H > >>> + > >>> +#define NBPTS256 > >>> +#ifndef __ASSEMBLY__ > >>> +#define BPT_S
Re: [PATCH v6 00/28] Initial Prefixed Instruction support
On Tue, Apr 28, 2020 at 8:07 PM Balamuruhan S wrote: > > On Tue, 2020-04-28 at 11:57 +1000, Jordan Niethe wrote: > > A future revision of the ISA will introduce prefixed instructions. A > > prefixed instruction is composed of a 4-byte prefix followed by a > > 4-byte suffix. > > > > All prefixes have the major opcode 1. A prefix will never be a valid > > word instruction. A suffix may be an existing word instruction or a > > new instruction. > > > > This series enables prefixed instructions and extends the instruction > > emulation to support them. Then the places where prefixed instructions > > might need to be emulated are updated. > > Hi Jordan, > > I tried to test Kprobes with prefixed instruction on this patchset and > observed that kprobe/uprobe enablement patches are missing from v4. > Till v3 it were available, > > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-11-jniet...@gmail.com/ > > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200211053355.21574-12-jniet...@gmail.com/ > > was it missed for any dependencies/reason ? or will you plan it include in > next > version ? V4 was when I introduced the data type of prefixed instructions, that along with helper functions covered a lot of what those enablement patches were doing. All of the stuff in that uprobes patch gets done without needing a specific patch now. I'll add back in the checks for prefix instructions in arch_prepare_kprobe(). > please let me know if you need help on it. > > -- Bala > > > > > v6 is based on feedback from Balamuruhan Suriyakumar, Alistair Popple, > > Christophe Leroy and Segher Boessenkool. > > The major changes: > > - Use the instruction type in more places that had been missed before > > - Fix issues with ppc32 > > - Introduce new self tests for code patching and feature fixups > > > > v5 is based on feedback from Nick Piggins, Michael Ellerman, Balamuruhan > > Suriyakumar and Alistair Popple. > > The major changes: > > - The ppc instruction type is now a struct > > - Series now just based on next > > - ppc_inst_masked() dropped > > - Space for xmon breakpoints allocated in an assembly file > > - "Add prefixed instructions to instruction data type" patch seperated > > in > > to smaller patches > > - Calling convention for create_branch() is changed > > - Some places which had not been updated to use the data type are now > > updated > > > > v4 is based on feedback from Nick Piggins, Christophe Leroy and Daniel > > Axtens. > > The major changes: > > - Move xmon breakpoints from data section to text section > > - Introduce a data type for instructions on powerpc > > > > v3 is based on feedback from Christophe Leroy. The major changes: > > - Completely replacing store_inst() with patch_instruction() in > > xmon > > - Improve implementation of mread_instr() to not use mread(). > > - Base the series on top of > > https://patchwork.ozlabs.org/patch/1232619/ as this will effect > > kprobes. > > - Some renaming and simplification of conditionals. > > > > v2 incorporates feedback from Daniel Axtens and and Balamuruhan > > S. 
The major changes are: > > - Squashing together all commits about SRR1 bits > > - Squashing all commits for supporting prefixed load stores > > - Changing abbreviated references to sufx/prfx -> suffix/prefix > > - Introducing macros for returning the length of an instruction > > - Removing sign extension flag from pstd/pld in sstep.c > > - Dropping patch "powerpc/fault: Use analyse_instr() to check for > > store with updates to sp" from the series, it did not really fit > > with prefixed enablement in the first place and as reported by Greg > > Kurz did not work correctly. > > > > Alistair Popple (1): > > powerpc: Enable Prefixed Instructions > > > > Jordan Niethe (27): > > powerpc/xmon: Remove store_inst() for patch_instruction() > > powerpc/xmon: Move breakpoint instructions to own array > > powerpc/xmon: Move breakpoints to text section > > powerpc/xmon: Use bitwise calculations in_breakpoint_table() > > powerpc: Change calling convention for create_branch() et. al. > > powerpc: Use a macro for creating instructions from u32s > > powerpc: Use an accessor for instructions > > powerpc: Use a function for getting the instruction op code > > powerpc: Use a function for byte swapping instructions > > powerpc: Introduce functions for instruction equality > > powerpc: Use a datatype for instructions > > powerpc: Use a function for reading instructions > > powerpc: Add a probe_user_read_inst() function > > powerpc: Add a probe_kernel_read_inst() function > > powerpc/kprobes: Use patch_instruction() > > powerpc: Define and use __get_user_instr{,inatomic}() > > powerpc: Introduce a function for reporting instruction length > > powerpc/xmon: Use a function for reading instructions > > powerpc/xmon: Move in
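As a side note on the prefix encoding described in the cover letter above, here is a minimal standalone sketch of how a prefix word can be recognised; the helper name is made up, and only the "all prefixes have major opcode 1" rule is taken from the text.

#include <stdbool.h>
#include <stdint.h>

/* The major opcode is the top 6 bits of the instruction word; per the
 * cover letter, prefixes always use major opcode 1 and are never valid
 * word instructions on their own. */
static bool is_prefix_word(uint32_t word)
{
	return (word >> 26) == 1;
}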
Re: [PATCH v6 10/28] powerpc: Introduce functions for instruction equality
On Wed, Apr 29, 2020 at 11:59 AM Alistair Popple wrote: > > There seems to be a minor typo which breaks compilation when > CONFIG_MPROFILE_KERNEL is not enabled. See the fix below. > > --- > arch/powerpc/kernel/trace/ftrace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ > ftrace.c > index a6064e1977ca..0ad2c9d4ab49 100644 > --- a/arch/powerpc/kernel/trace/ftrace.c > +++ b/arch/powerpc/kernel/trace/ftrace.c > @@ -499,7 +499,7 @@ expected_nop_sequence(void *ip, struct ppc_inst op0, > struct ppc_inst op1) > * The load offset is different depending on the ABI. For simplicity > * just mask it out when doing the compare. > */ > - if ((!ppc_inst_equal(op0), ppc_inst(0x4808)) || > (ppc_inst_val(op1) & > 0x) != 0xe841) > + if ((!ppc_inst_equal(op0, ppc_inst(0x4808))) || > (ppc_inst_val(op1) & > 0x) != 0xe841) > return 0; > return 1; > } Thank you. > -- > 2.20.1 > > > >
Re: [PATCH v6 11/28] powerpc: Use a datatype for instructions
On Wed, Apr 29, 2020 at 12:02 PM Alistair Popple wrote: > > Hi Jordan, > > I needed the below fix for building with CONFIG_STRICT_KERNEL_RWX enabled. > Hopefully it's correct, I have not yet had a chance to test it beyond building > it. Thanks, I'll get that working. > > - Alistair > > --- > arch/powerpc/lib/code-patching.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code- > patching.c > index ad5754c5f007..a8c8ffdb1ccd 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -166,8 +166,8 @@ static int do_patch_instruction(struct ppc_inst *addr, > struct ppc_inst instr) > goto out; > } > > - patch_addr = (unsigned int *)(text_poke_addr) + > - ((kaddr & ~PAGE_MASK) / sizeof(unsigned int)); > + patch_addr = (struct ppc_inst *)(text_poke_addr) + > + ((kaddr & ~PAGE_MASK) / sizeof(unsigned int)); Hmm, I think this might not give the expected result, with struct ppc_inst being 8 bytes long, compared to unsigned int being 4 bytes long. So the pointer arithmetic will not give the patch_addr you would be expecting. > > __patch_instruction(addr, instr, patch_addr); > > -- > 2.20.1 > > > >
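To make the scaling concern concrete, a standalone sketch (sizes assumed from the discussion: a 4-byte unsigned int versus an 8-byte struct ppc_inst once prefixed instructions are included; the struct definition here is a stand-in, not the kernel's):

#include <stdio.h>

struct ppc_inst { unsigned long long val; };	/* assumed 8 bytes, per the thread */

int main(void)
{
	char page[64];
	unsigned long off = 3;		/* word offset within the page */

	unsigned int *as_word = (unsigned int *)page + off;		/* page + 12 */
	struct ppc_inst *as_inst = (struct ppc_inst *)page + off;	/* page + 24 */

	printf("word-scaled offset: %td, inst-scaled offset: %td\n",
	       (char *)as_word - page, (char *)as_inst - page);
	return 0;
}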
Re: Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1
Excerpts from Christian Zigotzky's message of April 29, 2020 2:53 pm: > Hi All, > > The issue still exists in the RC3. (kernel config attached) > > Please help me to fix this issue. Huh, looks like maybe early_init_mmu() got uninlined because the compiler decided it was unlikely. Does this fix it? Thanks, Nick -- diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index bb3deb76c951..3ffe5f967483 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -208,7 +208,7 @@ void hash__early_init_devtree(void); void radix__early_init_devtree(void); extern void hash__early_init_mmu(void); extern void radix__early_init_mmu(void); -static inline void early_init_mmu(void) +static inline void __init early_init_mmu(void) { if (radix_enabled()) return radix__early_init_mmu();
Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : x86 supports the notion of a temporary mm which restricts access to temporary PTEs to a single CPU. A temporary mm is useful for situations where a CPU needs to perform sensitive operations (such as patching a STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing said mappings to other CPUs. A side benefit is that other CPU TLBs do not need to be flushed when the temporary mm is torn down. Mappings in the temporary mm can be set in the userspace portion of the address-space. Interrupts must be disabled while the temporary mm is in use. HW breakpoints, which may have been set by userspace as watchpoints on addresses now within the temporary mm, are saved and disabled when loading the temporary mm. The HW breakpoints are restored when unloading the temporary mm. All HW breakpoints are indiscriminately disabled while the temporary mm is in use. Based on x86 implementation: commit cefa929c034e ("x86/mm: Introduce temporary mm structs") Signed-off-by: Christopher M. Riedl --- arch/powerpc/include/asm/debug.h | 1 + arch/powerpc/include/asm/mmu_context.h | 54 ++ arch/powerpc/kernel/process.c | 5 +++ 3 files changed, 60 insertions(+) diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h index 7756026b95ca..b945bc16c932 100644 --- a/arch/powerpc/include/asm/debug.h +++ b/arch/powerpc/include/asm/debug.h @@ -45,6 +45,7 @@ static inline int debugger_break_match(struct pt_regs *regs) { return 0; } static inline int debugger_fault_handler(struct pt_regs *regs) { return 0; } #endif +void __get_breakpoint(struct arch_hw_breakpoint *brk); void __set_breakpoint(struct arch_hw_breakpoint *brk); bool ppc_breakpoint_available(void); #ifdef CONFIG_PPC_ADV_DEBUG_REGS diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 360367c579de..57a8695fe63f 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -10,6 +10,7 @@ #include #include #include +#include /* * Most if the context management is out of line @@ -270,5 +271,58 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, return 0; } +struct temp_mm { + struct mm_struct *temp; + struct mm_struct *prev; + bool is_kernel_thread; + struct arch_hw_breakpoint brk; +}; + +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm) +{ + temp_mm->temp = mm; + temp_mm->prev = NULL; + temp_mm->is_kernel_thread = false; + memset(&temp_mm->brk, 0, sizeof(temp_mm->brk)); +} + +static inline void use_temporary_mm(struct temp_mm *temp_mm) +{ + lockdep_assert_irqs_disabled(); + + temp_mm->is_kernel_thread = current->mm == NULL; + if (temp_mm->is_kernel_thread) + temp_mm->prev = current->active_mm; + else + temp_mm->prev = current->mm; + + /* +* Hash requires a non-NULL current->mm to allocate a userspace address +* when handling a page fault. Does not appear to hurt in Radix either. +*/ + current->mm = temp_mm->temp; + switch_mm_irqs_off(NULL, temp_mm->temp, current); + + if (ppc_breakpoint_available()) { + __get_breakpoint(&temp_mm->brk); + if (temp_mm->brk.type != 0) + hw_breakpoint_disable(); + } +} + +static inline void unuse_temporary_mm(struct temp_mm *temp_mm) Not sure "unuse" is a best naming, allthought I don't have a better suggestion a the moment. If not using temporary_mm anymore, what are we using now ? 
+{ + lockdep_assert_irqs_disabled(); + + if (temp_mm->is_kernel_thread) + current->mm = NULL; + else + current->mm = temp_mm->prev; + switch_mm_irqs_off(NULL, temp_mm->prev, current); + + if (ppc_breakpoint_available() && temp_mm->brk.type != 0) + __set_breakpoint(&temp_mm->brk); +} + #endif /* __KERNEL__ */ #endif /* __ASM_POWERPC_MMU_CONTEXT_H */ diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 9c21288f8645..ec4cf890d92c 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -800,6 +800,11 @@ static inline int set_breakpoint_8xx(struct arch_hw_breakpoint *brk) return 0; } +void __get_breakpoint(struct arch_hw_breakpoint *brk) +{ + memcpy(brk, this_cpu_ptr(¤t_brk), sizeof(*brk)); +} + void __set_breakpoint(struct arch_hw_breakpoint *brk) { memcpy(this_cpu_ptr(¤t_brk), brk, sizeof(*brk)); Christophe
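For readers following the thread, a rough sketch of how the helpers above are meant to be paired, based on the later code-patching patch in this series; patching_mm, patching_addr and the pte setup are elided, and error handling is omitted.

	struct temp_mm temp_mm;
	unsigned long flags;

	init_temp_mm(&temp_mm, patching_mm);	/* patching_mm allocated at boot */

	local_irq_save(flags);
	use_temporary_mm(&temp_mm);		/* switch mm, park HW breakpoints */
	/* ... write the instruction through the temporary, CPU-local mapping ... */
	unuse_temporary_mm(&temp_mm);		/* back to the previous mm, breakpoints restored */
	local_irq_restore(flags);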
Re: [RFC PATCH v2 1/5] powerpc/mm: Introduce temporary mm
Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : x86 supports the notion of a temporary mm which restricts access to temporary PTEs to a single CPU. A temporary mm is useful for situations where a CPU needs to perform sensitive operations (such as patching a STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing said mappings to other CPUs. A side benefit is that other CPU TLBs do not need to be flushed when the temporary mm is torn down. Mappings in the temporary mm can be set in the userspace portion of the address-space. Interrupts must be disabled while the temporary mm is in use. HW breakpoints, which may have been set by userspace as watchpoints on addresses now within the temporary mm, are saved and disabled when loading the temporary mm. The HW breakpoints are restored when unloading the temporary mm. All HW breakpoints are indiscriminately disabled while the temporary mm is in use. Why do we need to use a temporary mm all the time ? Doesn't each CPU have its own mm already ? Only the upper address space is shared between all mm's but each mm has its own lower address space, at least when it is running a user process. Why not just use that mm ? As we are mapping then unmapping with interrupts disabled, there is no risk at all that the user starts running while the patch page is mapped, so I'm not sure why switching to a temporary mm is needed. Based on x86 implementation: commit cefa929c034e ("x86/mm: Introduce temporary mm structs") Signed-off-by: Christopher M. Riedl Christophe
Re: [RFC PATCH v2 3/5] powerpc/lib: Use a temporary mm for code patching
Le 29/04/2020 à 04:05, Christopher M. Riedl a écrit : Currently, code patching a STRICT_KERNEL_RWX exposes the temporary mappings to other CPUs. These mappings should be kept local to the CPU doing the patching. Use the pre-initialized temporary mm and patching address for this purpose. Also add a check after patching to ensure the patch succeeded. Use the KUAP functions on non-BOOKS3_64 platforms since the temporary mapping for patching uses a userspace address (to keep the mapping local). On BOOKS3_64 platforms hash does not implement KUAP and on radix the use of PAGE_KERNEL sets EAA[0] for the PTE which means the AMR (KUAP) protection is ignored (see PowerISA v3.0b, Fig, 35). Based on x86 implementation: commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") Signed-off-by: Christopher M. Riedl --- arch/powerpc/lib/code-patching.c | 149 --- 1 file changed, 55 insertions(+), 94 deletions(-) diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index 259c19480a85..26f06cdb5d7e 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -19,6 +19,7 @@ #include #include #include +#include static int __patch_instruction(unsigned int *exec_addr, unsigned int instr, unsigned int *patch_addr) @@ -72,101 +73,58 @@ void __init poking_init(void) pte_unmap_unlock(ptep, ptl); } -static DEFINE_PER_CPU(struct vm_struct *, text_poke_area); - -static int text_area_cpu_up(unsigned int cpu) -{ - struct vm_struct *area; - - area = get_vm_area(PAGE_SIZE, VM_ALLOC); - if (!area) { - WARN_ONCE(1, "Failed to create text area for cpu %d\n", - cpu); - return -1; - } - this_cpu_write(text_poke_area, area); - - return 0; -} - -static int text_area_cpu_down(unsigned int cpu) -{ - free_vm_area(this_cpu_read(text_poke_area)); - return 0; -} - -/* - * Run as a late init call. This allows all the boot time patching to be done - * simply by patching the code, and then we're called here prior to - * mark_rodata_ro(), which happens after all init calls are run. Although - * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge - * it as being preferable to a kernel that will crash later when someone tries - * to use patch_instruction(). - */ -static int __init setup_text_poke_area(void) -{ - BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, - "powerpc/text_poke:online", text_area_cpu_up, - text_area_cpu_down)); - - return 0; -} -late_initcall(setup_text_poke_area); +struct patch_mapping { + spinlock_t *ptl; /* for protecting pte table */ + pte_t *ptep; + struct temp_mm temp_mm; +}; /* * This can be called for kernel text or a module. 
*/ -static int map_patch_area(void *addr, unsigned long text_poke_addr) +static int map_patch(const void *addr, struct patch_mapping *patch_mapping) { - unsigned long pfn; - int err; + struct page *page; + pte_t pte; + pgprot_t pgprot; if (is_vmalloc_addr(addr)) - pfn = vmalloc_to_pfn(addr); + page = vmalloc_to_page(addr); else - pfn = __pa_symbol(addr) >> PAGE_SHIFT; + page = virt_to_page(addr); - err = map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL); + if (radix_enabled()) + pgprot = PAGE_KERNEL; + else + pgprot = PAGE_SHARED; - pr_devel("Mapped addr %lx with pfn %lx:%d\n", text_poke_addr, pfn, err); - if (err) + patch_mapping->ptep = get_locked_pte(patching_mm, patching_addr, +&patch_mapping->ptl); + if (unlikely(!patch_mapping->ptep)) { + pr_warn("map patch: failed to allocate pte for patching\n"); return -1; + } + + pte = mk_pte(page, pgprot); + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + pte = pte_mkdirty(pte); Why only when CONFIG_PPC_BOOK3S_64 is not set ? PAGE_KERNEL should already be dirty, so making it dirty all the time shouldn't hurt. + set_pte_at(patching_mm, patching_addr, patch_mapping->ptep, pte); + + init_temp_mm(&patch_mapping->temp_mm, patching_mm); + use_temporary_mm(&patch_mapping->temp_mm); return 0; } -static inline int unmap_patch_area(unsigned long addr) +static void unmap_patch(struct patch_mapping *patch_mapping) { - pte_t *ptep; - pmd_t *pmdp; - pud_t *pudp; - pgd_t *pgdp; - - pgdp = pgd_offset_k(addr); - if (unlikely(!pgdp)) - return -EINVAL; - - pudp = pud_offset(pgdp, addr); - if (unlikely(!pudp)) - return -EINVAL; - - pmdp = pmd_offset(pudp, addr); - if (unlikely(!pmdp)) - return -EINVAL; - -
[PATCH 1/2] tools/perf: set no_auxtrace for powerpc
x86/perf_regs.h is included by util/intel-pt.c, which will get compiled when building perf on powerpc. Since x86/perf_regs.h has `PERF_EXTENDED_REG_MASK` defined, defining `PERF_EXTENDED_REG_MASK` for powerpc to add support for perf extended regs will result in a perf build error on powerpc. Currently the powerpc architecture does not support auxtrace. So as a workaround for this issue, set NO_AUXTRACE for powerpc. Signed-off-by: Anju T Sudhakar --- tools/perf/arch/powerpc/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile index e58d00d62f02..9ebb5f513605 100644 --- a/tools/perf/arch/powerpc/Makefile +++ b/tools/perf/arch/powerpc/Makefile @@ -3,6 +3,7 @@ ifndef NO_DWARF PERF_HAVE_DWARF_REGS := 1 endif +NO_AUXTRACE := 1 HAVE_KVM_STAT_SUPPORT := 1 PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1 PERF_HAVE_JITDUMP := 1 -- 2.20.1
[PATCH 0/2] powerpc/perf: Add support for perf extended regs in powerpc
Patch set to add support for perf extended register capability in powerpc. The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. patch 2/2 defines this PERF_PMU_CAP_EXTENDED_REGS mask to output the values of mmcr0,mmcr1,mmcr2 for POWER9. x86/perf_regs.h is included by util/intel-pt.c, which will get compiled when buiding perf on powerpc. Since x86/perf_regs.h has `PERF_EXTENDED_REG_MASK` defined, defining `PERF_EXTENDED_REG_MASK` for powerpc to add support for perf extended regs will result in perf build error on powerpc. Currently powerpc architecture is not having support for auxtrace. So as a workaround for this issue, patch 1/2 set NO_AUXTRACE for powerpc. (Any other solutions are welcome.) Patch 2/2 also add extended regs to sample_reg_mask in the tool side to use with `-I?` option. Anju T Sudhakar (2): tools/perf: set no_auxtrace for powerpc powerpc/perf: Add support for outputting extended regs in perf intr_regs arch/powerpc/include/asm/perf_event_server.h | 5 +++ arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++- arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/perf_regs.c | 29 ++-- arch/powerpc/perf/power9-pmu.c| 1 + .../arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++- tools/perf/arch/powerpc/Makefile | 1 + tools/perf/arch/powerpc/include/perf_regs.h | 6 +++- tools/perf/arch/powerpc/util/perf_regs.c | 33 +++ 9 files changed, 96 insertions(+), 6 deletions(-) -- 2.20.1
[PATCH 2/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs
The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the PMU which support extended registers. The generic code define the mask of extended registers as 0 for non supported architectures. Add support for extended registers in POWER9 architecture. For POWER9, the extended registers are mmcr0, mmc1 and mmcr2. REG_RESERVED mask is redefined to accommodate the extended registers. With patch: # perf record -I? available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0 mmcr1 mmcr2 # perf record -I ls # perf script -D PERF_RECORD_SAMPLE(IP, 0x1): 9019/9019: 0 period: 1 addr: 0 ... intr regs: mask 0x ABI 64-bit r00xc011b12c r10xc03f9a98b930 r20xc1a32100 r30xc03f8fe9a800 r40xc03fd181 r50x3e32557150 r60xc03f9a98b908 r70xffc1cdae06ac r80x818 [.] r31 0xc03ffd047230 nip 0xc011b2c0 msr 0x90009033 orig_r3 0xc011b21c ctr 0xc0119380 link 0xc011b12c xer 0x0 ccr 0x2800 softe 0x1 trap 0xf00 dar 0x0 dsisr 0x800 sier 0x0 mmcra 0x800 mmcr0 0x82008090 mmcr1 0x1e00 mmcr2 0x0 ... thread: perf:9019 Signed-off-by: Anju T Sudhakar --- arch/powerpc/include/asm/perf_event_server.h | 5 +++ arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++- arch/powerpc/perf/core-book3s.c | 1 + arch/powerpc/perf/perf_regs.c | 29 ++-- arch/powerpc/perf/power9-pmu.c| 1 + .../arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++- tools/perf/arch/powerpc/include/perf_regs.h | 6 +++- tools/perf/arch/powerpc/util/perf_regs.c | 33 +++ 8 files changed, 95 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index 3e9703f44c7c..1d15953bd99e 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -55,6 +55,11 @@ struct power_pmu { int *blacklist_ev; /* BHRB entries in the PMU */ int bhrb_nr; + /* +* set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if +* the pmu supports extended perf regs capability +*/ + int capabilities; }; /* diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h b/arch/powerpc/include/uapi/asm/perf_regs.h index f599064dd8dc..604b831378fe 100644 --- a/arch/powerpc/include/uapi/asm/perf_regs.h +++ b/arch/powerpc/include/uapi/asm/perf_regs.h @@ -48,6 +48,17 @@ enum perf_event_powerpc_regs { PERF_REG_POWERPC_DSISR, PERF_REG_POWERPC_SIER, PERF_REG_POWERPC_MMCRA, - PERF_REG_POWERPC_MAX, + /* Extended registers */ + PERF_REG_POWERPC_MMCR0, + PERF_REG_POWERPC_MMCR1, + PERF_REG_POWERPC_MMCR2, + PERF_REG_EXTENDED_MAX, + /* Max regs without the extended regs */ + PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1, }; + +#define PERF_REG_PMU_MASK ((1ULL << PERF_REG_POWERPC_MAX) - 1) +#define PERF_REG_EXTENDED_MASK (((1ULL << (PERF_REG_EXTENDED_MAX))\ + - 1) - PERF_REG_PMU_MASK) + #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */ diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 3dcfecf858f3..f56b77800a7b 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -2276,6 +2276,7 @@ int register_power_pmu(struct power_pmu *pmu) power_pmu.attr_groups = ppmu->attr_groups; + power_pmu.capabilities |= (ppmu->capabilities & PERF_PMU_CAP_EXTENDED_REGS); #ifdef MSR_HV /* * Use FCHV to ignore kernel events if MSR.HV is set. 
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c index a213a0aa5d25..57aa02568caf 100644 --- a/arch/powerpc/perf/perf_regs.c +++ b/arch/powerpc/perf/perf_regs.c @@ -15,7 +15,8 @@ #define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r) -#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1)) +#define REG_RESERVED (~(PERF_REG_EXTENDED_MASK) & \ + (~((1ULL << PERF_REG_POWERPC_MAX) - 1))) static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = { PT_REGS_OFFSET(PERF_REG_POWERPC_R0, gpr[0]), @@ -69,10 +70,22 @@ static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = { PT_REGS_OFFSET(PERF_REG_POWERPC_MMCRA, dsisr), }; +/* Function to return the extended register values */ +static u64 get_ext_regs_value(int idx) +{ + switch (idx) { + case PERF_REG_POWERPC_MMCR0: + return mfspr(SPRN_MMCR0);
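For anyone checking the mask arithmetic in the uapi header above, a standalone sketch; the numeric enum positions are inferred by counting the enum in the patch (PERF_REG_POWERPC_MAX staying at MMCRA + 1, the extended max three bits higher), so treat them as assumptions rather than authoritative values.

#include <stdio.h>

#define PERF_REG_POWERPC_MAX	45ULL	/* MMCRA + 1, counted from the enum */
#define PERF_REG_EXTENDED_MAX	48ULL	/* after adding MMCR0/MMCR1/MMCR2 */
#define PERF_REG_PMU_MASK	((1ULL << PERF_REG_POWERPC_MAX) - 1)
#define PERF_REG_EXTENDED_MASK	(((1ULL << PERF_REG_EXTENDED_MAX) - 1) - PERF_REG_PMU_MASK)

int main(void)
{
	printf("pmu mask:      0x%016llx\n", PERF_REG_PMU_MASK);	/* bits 0..44 */
	printf("extended mask: 0x%016llx\n", PERF_REG_EXTENDED_MASK);	/* bits 45..47 only */
	return 0;
}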
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
On Wed, Apr 29, 2020 at 09:36:30AM +0800, Jeremy Kerr wrote: > Hi Christoph, > > > FYI, these little hunks reduce the difference to my version, maybe > > you can fold them in? > > Sure, no problem. > > How do you want to coordinate these? I can submit mine through mpe, but > that may make it tricky to synchronise with your changes. Or, you can > include this change in your series if you prefer. Maybe you can feed your patch directly to Linus through Michael ASAP, and I'll wait for that before resubmitting this series?
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
On Wed, Apr 29, 2020 at 08:05:53AM +0200, Christoph Hellwig wrote: > On Wed, Apr 29, 2020 at 09:36:30AM +0800, Jeremy Kerr wrote: > > Hi Christoph, > > > > > FYI, these little hunks reduce the difference to my version, maybe > > > you can fold them in? > > > > Sure, no problem. > > > > How do you want to coordinate these? I can submit mine through mpe, but > > that may make it tricky to synchronise with your changes. Or, you can > > include this change in your series if you prefer. > > Maybe you can feed your patch directly to Linus through Michael > ASAP, and I'll wait for that before resubmitting this series? Btw, turns out my fold patch didn't actually compile without the rebased patch on top, sorry. Here is the proper one: diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c index c62d77ddaf7d3..b4e1ef650b406 100644 --- a/arch/powerpc/platforms/cell/spufs/file.c +++ b/arch/powerpc/platforms/cell/spufs/file.c @@ -2107,7 +2107,6 @@ static const struct file_operations spufs_wbox_info_fops = { static void ___spufs_dma_info_read(struct spu_context *ctx, struct spu_dma_info *info) { - struct mfc_cq_sr *qp, *spuqp; int i; info->dma_info_type = ctx->csa.priv2.spu_tag_status_query_RW; @@ -2116,8 +2115,8 @@ static void ___spufs_dma_info_read(struct spu_context *ctx, info->dma_info_stall_and_notify = ctx->csa.spu_chnldata_RW[25]; info->dma_info_atomic_command_status = ctx->csa.spu_chnldata_RW[27]; for (i = 0; i < 16; i++) { - qp = &info->dma_info_command_data[i]; - spuqp = &ctx->csa.priv2.spuq[i]; + struct mfc_cq_sr *qp = &info->dma_info_command_data[i]; + struct mfc_cq_sr *spuqp = &ctx->csa.priv2.spuq[i]; qp->mfc_cq_data0_RW = spuqp->mfc_cq_data0_RW; qp->mfc_cq_data1_RW = spuqp->mfc_cq_data1_RW; @@ -2166,7 +2165,7 @@ static const struct file_operations spufs_dma_info_fops = { }; static void ___spufs_proxydma_info_read(struct spu_context *ctx, - struct spu_proxydma_info *info) + struct spu_proxydma_info *info) { int i;
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
And another one that should go on top of this one to address Al's other compaint: --- >From 1b7ced3de0b3a4addec61f61ac5278c3ff141657 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 22 Apr 2020 09:05:30 +0200 Subject: powerpc/spufs: stop using access_ok Just use the proper non __-prefixed get/put_user variants where that is not done yet. Signed-off-by: Christoph Hellwig --- arch/powerpc/platforms/cell/spufs/file.c | 42 +--- 1 file changed, 8 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c index b4e1ef650b406..cd7d10f27fad1 100644 --- a/arch/powerpc/platforms/cell/spufs/file.c +++ b/arch/powerpc/platforms/cell/spufs/file.c @@ -590,17 +590,12 @@ static ssize_t spufs_mbox_read(struct file *file, char __user *buf, size_t len, loff_t *pos) { struct spu_context *ctx = file->private_data; - u32 mbox_data, __user *udata; + u32 mbox_data, __user *udata = (void __user *)buf; ssize_t count; if (len < 4) return -EINVAL; - if (!access_ok(buf, len)) - return -EFAULT; - - udata = (void __user *)buf; - count = spu_acquire(ctx); if (count) return count; @@ -616,7 +611,7 @@ static ssize_t spufs_mbox_read(struct file *file, char __user *buf, * but still need to return the data we have * read successfully so far. */ - ret = __put_user(mbox_data, udata); + ret = put_user(mbox_data, udata); if (ret) { if (!count) count = -EFAULT; @@ -698,17 +693,12 @@ static ssize_t spufs_ibox_read(struct file *file, char __user *buf, size_t len, loff_t *pos) { struct spu_context *ctx = file->private_data; - u32 ibox_data, __user *udata; + u32 ibox_data, __user *udata = (void __user *)buf; ssize_t count; if (len < 4) return -EINVAL; - if (!access_ok(buf, len)) - return -EFAULT; - - udata = (void __user *)buf; - count = spu_acquire(ctx); if (count) goto out; @@ -727,7 +717,7 @@ static ssize_t spufs_ibox_read(struct file *file, char __user *buf, } /* if we can't write at all, return -EFAULT */ - count = __put_user(ibox_data, udata); + count = put_user(ibox_data, udata); if (count) goto out_unlock; @@ -741,7 +731,7 @@ static ssize_t spufs_ibox_read(struct file *file, char __user *buf, * but still need to return the data we have * read successfully so far. 
*/ - ret = __put_user(ibox_data, udata); + ret = put_user(ibox_data, udata); if (ret) break; } @@ -836,17 +826,13 @@ static ssize_t spufs_wbox_write(struct file *file, const char __user *buf, size_t len, loff_t *pos) { struct spu_context *ctx = file->private_data; - u32 wbox_data, __user *udata; + u32 wbox_data, __user *udata = (void __user *)buf; ssize_t count; if (len < 4) return -EINVAL; - udata = (void __user *)buf; - if (!access_ok(buf, len)) - return -EFAULT; - - if (__get_user(wbox_data, udata)) + if (get_user(wbox_data, udata)) return -EFAULT; count = spu_acquire(ctx); @@ -873,7 +859,7 @@ static ssize_t spufs_wbox_write(struct file *file, const char __user *buf, /* write as much as possible */ for (count = 4, udata++; (count + 4) <= len; count += 4, udata++) { int ret; - ret = __get_user(wbox_data, udata); + ret = get_user(wbox_data, udata); if (ret) break; @@ -1982,9 +1968,6 @@ static ssize_t spufs_mbox_info_read(struct file *file, char __user *buf, u32 stat, data; int ret; - if (!access_ok(buf, len)) - return -EFAULT; - ret = spu_acquire_saved(ctx); if (ret) return ret; @@ -2028,9 +2011,6 @@ static ssize_t spufs_ibox_info_read(struct file *file, char __user *buf, u32 stat, data; int ret; - if (!access_ok(buf, len)) - return -EFAULT; - ret = spu_acquire_saved(ctx); if (ret) return ret; @@ -2082,9 +2062,6 @@ static ssize_t spufs_wbox_info_read(struct file *file, char __user *buf, u32 data[ARRAY_SIZE(ctx->csa.spu_mailbox_data)]; int ret, count; - if (!access_ok(buf, len)) - return -EFAULT; - ret = spu_acquire_saved(ctx); if (ret) return ret; @@ -2143,9 +2120,6 @@ static ssize_t spufs_dma_info_read(struct file *fil
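Background for why the up-front access_ok() calls can be dropped in the conversion above: the non-underscore uaccess helpers perform their own range check. A simplified model of the relationship, for illustration only (the real helpers are arch-specific and not literally defined this way):

/* Simplified model: put_user() behaves as if it were */
#define put_user_modeled(x, ptr) \
	(access_ok(ptr, sizeof(*(ptr))) ? __put_user(x, ptr) : -EFAULT)

/* So once every __put_user()/__get_user() in the read/write loops is
 * switched to put_user()/get_user(), as done above, the checking is kept
 * while the separate whole-buffer access_ok() becomes unnecessary. */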
Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
Le 28/04/2020 à 21:56, Arnd Bergmann a écrit : I think I found a way to improve the x32 handling: This is a simplification over Christoph's "[PATCH 2/7] signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the x32 specifics in the common code to a single #ifdef/#endif check, in order to keep it more readable for everyone else. Christoph, if you like it, please fold into your patch. Cc: Andrew Morton Cc: Alexander Viro Cc: Jeremy Kerr Cc: Arnd Bergmann Cc: "Eric W . Biederman" Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-fsde...@vger.kernel.org Signed-off-by: Arnd Bergmann --- arch/x86/include/asm/compat.h | 10 ++ arch/x86/kernel/signal.c | 23 +++ kernel/signal.c | 15 ++- 3 files changed, 35 insertions(+), 13 deletions(-) [...] diff --git a/kernel/signal.c b/kernel/signal.c index 1a81602050b4..935facca4860 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3318,29 +3318,18 @@ void copy_siginfo_to_external32(struct compat_siginfo *to, } } +#ifndef CONFIG_X86_X32_ABI Can it be declared __weak instead of enclosing it in an #ifndef ? int copy_siginfo_to_user32(struct compat_siginfo __user *to, const struct kernel_siginfo *from) -#if defined(CONFIG_X86_X32_ABI) || defined(CONFIG_IA32_EMULATION) -{ - return __copy_siginfo_to_user32(to, from, in_x32_syscall()); -} -int __copy_siginfo_to_user32(struct compat_siginfo __user *to, -const struct kernel_siginfo *from, bool x32_ABI) -#endif { struct compat_siginfo new; copy_siginfo_to_external32(&new, from); -#ifdef CONFIG_X86_X32_ABI - if (x32_ABI && from->si_signo == SIGCHLD) { - new._sifields._sigchld_x32._utime = from->si_utime; - new._sifields._sigchld_x32._stime = from->si_stime; - } -#endif if (copy_to_user(to, &new, sizeof(struct compat_siginfo))) return -EFAULT; return 0; } +#endif static int post_copy_siginfo_from_user32(kernel_siginfo_t *to, const struct compat_siginfo *from) Christophe
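For concreteness, a minimal sketch of the __weak arrangement suggested above; the body is taken from the generic branch of the quoted patch, and an architecture needing different behaviour would simply provide its own non-weak definition that overrides this one at link time.

int __weak copy_siginfo_to_user32(struct compat_siginfo __user *to,
				  const struct kernel_siginfo *from)
{
	struct compat_siginfo new;

	copy_siginfo_to_external32(&new, from);
	if (copy_to_user(to, &new, sizeof(struct compat_siginfo)))
		return -EFAULT;
	return 0;
}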
[PATCH] powerpc/64: refactor interrupt exit irq disabling sequence
The same complicated sequence for juggling EE, RI, soft mask, and irq tracing is repeated 3 times, tidy these up into one function. This differs qiute a bit between sub architectures, so this makes the ppc32 port cleaner as well. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/syscall_64.c | 58 +++- 1 file changed, 28 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index c74295a7765b..8f7e268f3294 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -101,6 +101,31 @@ notrace long system_call_exception(long r3, long r4, long r5, return f(r3, r4, r5, r6, r7, r8); } +/* + * local irqs must be disabled. Returns false if the caller must re-enable + * them, check for new work, and try again. + */ +static notrace inline bool prep_irq_for_enabled_exit(void) +{ + /* This must be done with RI=1 because tracing may touch vmaps */ + trace_hardirqs_on(); + + /* This pattern matches prep_irq_for_idle */ + __hard_EE_RI_disable(); + if (unlikely(lazy_irq_pending())) { + /* Took an interrupt, may have more exit work to do. */ + __hard_RI_enable(); + trace_hardirqs_off(); + local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + + return false; + } + local_paca->irq_happened = 0; + irq_soft_mask_set(IRQS_ENABLED); + + return true; +} + /* * This should be called after a syscall returns, with r3 the return value * from the syscall. If this function returns non-zero, the system call @@ -184,21 +209,10 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, } } - /* This must be done with RI=1 because tracing may touch vmaps */ - trace_hardirqs_on(); - - /* This pattern matches prep_irq_for_idle */ - __hard_EE_RI_disable(); - if (unlikely(lazy_irq_pending())) { - __hard_RI_enable(); - trace_hardirqs_off(); - local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + if (unlikely(!prep_irq_for_enabled_exit())) { local_irq_enable(); - /* Took an interrupt, may have more exit work to do. */ goto again; } - local_paca->irq_happened = 0; - irq_soft_mask_set(IRQS_ENABLED); #ifdef CONFIG_PPC_TRANSACTIONAL_MEM local_paca->tm_scratch = regs->msr; @@ -262,19 +276,11 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned } } - trace_hardirqs_on(); - __hard_EE_RI_disable(); - if (unlikely(lazy_irq_pending())) { - __hard_RI_enable(); - trace_hardirqs_off(); - local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + if (unlikely(!prep_irq_for_enabled_exit())) { local_irq_enable(); local_irq_disable(); - /* Took an interrupt, may have more exit work to do. */ goto again; } - local_paca->irq_happened = 0; - irq_soft_mask_set(IRQS_ENABLED); #ifdef CONFIG_PPC_BOOK3E if (unlikely(ts->debug.dbcr0 & DBCR0_IDM)) { @@ -332,13 +338,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign } } - trace_hardirqs_on(); - __hard_EE_RI_disable(); - if (unlikely(lazy_irq_pending())) { - __hard_RI_enable(); - irq_soft_mask_set(IRQS_ALL_DISABLED); - trace_hardirqs_off(); - local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + if (unlikely(!prep_irq_for_enabled_exit())) { /* * Can't local_irq_restore to replay if we were in * interrupt context. Must replay directly. @@ -352,8 +352,6 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign /* Took an interrupt, may have more exit work to do. */ goto again; } - local_paca->irq_happened = 0; - irq_soft_mask_set(IRQS_ENABLED); } else { /* Returning to a kernel context with local irqs disabled. */ __hard_EE_RI_disable(); -- 2.23.0
[PATCH 0/6] assorted kuap fixes
Here's a bunch of fixes I collected, and some that Aneesh needs for his kuap on hash series. Nicholas Piggin (6): powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code missing isync powerpc/64/kuap: interrupt exit kuap restore add missing isync, conditionally restore AMR rpc/64/kuap: restore AMR in system reset exception powerpc/64/kuap: restore AMR in fast_interrupt_return powerpc/64/kuap: conditionally restore AMR in kuap_restore_amr asm .../powerpc/include/asm/book3s/64/kup-radix.h | 45 --- arch/powerpc/kernel/entry_64.S| 10 +++-- arch/powerpc/kernel/exceptions-64s.S | 3 +- arch/powerpc/kernel/syscall_64.c | 26 +++ 4 files changed, 63 insertions(+), 21 deletions(-) -- 2.23.0
[PATCH 1/6] powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code
Any kind of WARN causes a program check that will crash with unrecoverable exception if it occurs when RI is clear. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/syscall_64.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 8f7e268f3294..a37c7717424f 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -35,6 +35,8 @@ notrace long system_call_exception(long r3, long r4, long r5, BUG_ON(!FULL_REGS(regs)); BUG_ON(regs->softe != IRQS_ENABLED); + kuap_check_amr(); + account_cpu_user_entry(); #ifdef CONFIG_PPC_SPLPAR @@ -47,8 +49,6 @@ notrace long system_call_exception(long r3, long r4, long r5, } #endif - kuap_check_amr(); - /* * This is not required for the syscall exit path, but makes the * stack frame look nicer. If this was initialised in the first stack @@ -142,6 +142,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, unsigned long ti_flags; unsigned long ret = 0; + kuap_check_amr(); + regs->result = r3; /* Check whether the syscall is issued inside a restartable sequence */ @@ -218,8 +220,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, local_paca->tm_scratch = regs->msr; #endif - kuap_check_amr(); - account_cpu_user_exit(); return ret; @@ -242,6 +242,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned BUG_ON(!FULL_REGS(regs)); BUG_ON(regs->softe != IRQS_ENABLED); + kuap_check_amr(); + local_irq_save(flags); again: @@ -298,8 +300,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned local_paca->tm_scratch = regs->msr; #endif - kuap_check_amr(); - account_cpu_user_exit(); return ret; @@ -319,6 +319,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign BUG_ON(regs->msr & MSR_PR); BUG_ON(!FULL_REGS(regs)); + kuap_check_amr(); + if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) { clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp); ret = 1; -- 2.23.0
[PATCH 2/6] missing isync
--- arch/powerpc/include/asm/book3s/64/kup-radix.h | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 3bcef989a35d..8dc5f292b806 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -16,7 +16,9 @@ #ifdef CONFIG_PPC_KUAP BEGIN_MMU_FTR_SECTION_NESTED(67) ld \gpr, STACK_REGS_KUAP(r1) + isync mtspr SPRN_AMR, \gpr + /* No isync required, see kuap_restore_amr() */ END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) #endif .endm @@ -62,8 +64,15 @@ static inline void kuap_restore_amr(struct pt_regs *regs) { - if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) + if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + isync(); mtspr(SPRN_AMR, regs->kuap); + /* +* No isync required here because we are about to rfi +* back to previous context before any user accesses +* would be made, which is a CSI. +*/ + } } static inline void kuap_check_amr(void) -- 2.23.0
[PATCH 3/6] powerpc/64/kuap: interrupt exit kuap restore add missing isync, conditionally restore AMR
This fixes a missing isync before the mtspr(AMR), which ensures previous memory accesses execute before the mtspr, so they can't slip past the AMR check and access user memory if we are returning to a context where kuap is allowed. The AMR is made conditional, and only done if the AMR must change, which should be the less common case on most workloads (though kernel page faults on uaccess could be frequent, this doesn't significantly slow down that case). Signed-off-by: Nicholas Piggin --- .../powerpc/include/asm/book3s/64/kup-radix.h | 36 ++- arch/powerpc/kernel/syscall_64.c | 14 +--- 2 files changed, 37 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index 8dc5f292b806..ec8970958a26 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -62,19 +62,32 @@ #include #include -static inline void kuap_restore_amr(struct pt_regs *regs) +static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { - isync(); - mtspr(SPRN_AMR, regs->kuap); - /* -* No isync required here because we are about to rfi -* back to previous context before any user accesses -* would be made, which is a CSI. -*/ + if (unlikely(regs->kuap != amr)) { + isync(); + mtspr(SPRN_AMR, regs->kuap); + /* +* No isync required here because we are about to rfi +* back to previous context before any user accesses +* would be made, which is a CSI. +*/ + } } } +static inline unsigned long kuap_get_and_check_amr(void) +{ + if (mmu_has_feature(MMU_FTR_RADIX_KUAP)) { + unsigned long amr = mfspr(SPRN_AMR); + if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG)) /* kuap_check_amr() */ + WARN_ON_ONCE(amr != AMR_KUAP_BLOCKED); + return amr; + } + return 0; +} + static inline void kuap_check_amr(void) { if (IS_ENABLED(CONFIG_PPC_KUAP_DEBUG) && mmu_has_feature(MMU_FTR_RADIX_KUAP)) @@ -151,13 +164,18 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write) "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read"); } #else /* CONFIG_PPC_KUAP */ -static inline void kuap_restore_amr(struct pt_regs *regs) +static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr) { } static inline void kuap_check_amr(void) { } + +static inline unsigned long kuap_get_and_check_amr(void) +{ + return 0; +} #endif /* CONFIG_PPC_KUAP */ #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index a37c7717424f..edeab10c6888 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -242,6 +242,10 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned BUG_ON(!FULL_REGS(regs)); BUG_ON(regs->softe != IRQS_ENABLED); + /* +* We don't need to restore AMR on the way back to userspace for KUAP. +* AMR can only have been unlocked if we interrupted the kernel. 
+*/ kuap_check_amr(); local_irq_save(flags); @@ -313,13 +317,14 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign unsigned long *ti_flagsp = ¤t_thread_info()->flags; unsigned long flags; unsigned long ret = 0; + unsigned long amr; if (IS_ENABLED(CONFIG_PPC_BOOK3S) && unlikely(!(regs->msr & MSR_RI))) unrecoverable_exception(regs); BUG_ON(regs->msr & MSR_PR); BUG_ON(!FULL_REGS(regs)); - kuap_check_amr(); + amr = kuap_get_and_check_amr(); if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) { clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp); @@ -367,10 +372,11 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign #endif /* -* We don't need to restore AMR on the way back to userspace for KUAP. -* The value of AMR only matters while we're in the kernel. +* Don't want to mfspr(SPRN_AMR) here, because this comes after +* mtmsr, which would cause RAW stalls. Hence, we take the AMR value +* from the check above. */ - kuap_restore_amr(regs); + kuap_restore_amr(regs, amr); return ret; } -- 2.23.0
[PATCH 4/6] rpc/64/kuap: restore AMR in system reset exception
system reset interrupt handler locks AMR and exits with PTION_RESTORE_REGS without restoring AMR. Similarly to the soft-NMI ler, it needs to restore. ed-off-by: Nicholas Piggin --- arch/powerpc/kernel/exceptions-64s.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 728ccb0f560c..b0ad930cbae5 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -971,6 +971,7 @@ EXC_COMMON_BEGIN(system_reset_common) ld r10,SOFTE(r1) stb r10,PACAIRQSOFTMASK(r13) + kuap_restore_amr r10 EXCEPTION_RESTORE_REGS RFI_TO_USER_OR_KERNEL -- 2.23.0
[PATCH 5/6] powerpc/64/kuap: restore AMR in fast_interrupt_return
Interrupts that use fast_interrupt_return actually do lock AMR, but they have been ones which tend to come from userspace (or kernel bugs) in radix mode. With kuap on hash, segment interrupts are taken in kernel often, which quickly breaks due to the missing restore. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/entry_64.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 9a1e5d636dea..b3c9f15089b6 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -472,15 +472,17 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) #ifdef CONFIG_PPC_BOOK3S /* * If MSR EE/RI was never enabled, IRQs not reconciled, NVGPRs not -* touched, AMR not set, no exit work created, then this can be used. +* touched, no exit work created, then this can be used. */ .balign IFETCH_ALIGN_BYTES .globl fast_interrupt_return fast_interrupt_return: _ASM_NOKPROBE_SYMBOL(fast_interrupt_return) + kuap_check_amr r3, r4 ld r4,_MSR(r1) andi. r0,r4,MSR_PR bne .Lfast_user_interrupt_return + kuap_restore_amr r3 andi. r0,r4,MSR_RI li r3,0 /* 0 return value, no EMULATE_STACK_STORE */ bne+.Lfast_kernel_interrupt_return -- 2.23.0
[PATCH 6/6] powerpc/64/kuap: conditionally restore AMR in kuap_restore_amr asm
Similar to the C code change, make the AMR restore conditional on whether the register has changed. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/book3s/64/kup-radix.h | 10 +++--- arch/powerpc/kernel/entry_64.S | 8 arch/powerpc/kernel/exceptions-64s.S | 4 ++-- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h index ec8970958a26..e82df54f5681 100644 --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h @@ -12,13 +12,17 @@ #ifdef __ASSEMBLY__ -.macro kuap_restore_amrgpr +.macro kuap_restore_amrgpr1, gpr2 #ifdef CONFIG_PPC_KUAP BEGIN_MMU_FTR_SECTION_NESTED(67) - ld \gpr, STACK_REGS_KUAP(r1) + mfspr \gpr1, SPRN_AMR + ld \gpr2, STACK_REGS_KUAP(r1) + cmpd\gpr1, \gpr2 + beq 998f isync - mtspr SPRN_AMR, \gpr + mtspr SPRN_AMR, \gpr2 /* No isync required, see kuap_restore_amr() */ +998: END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_RADIX_KUAP, 67) #endif .endm diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index b3c9f15089b6..9d49338e0c85 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -479,11 +479,11 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S) fast_interrupt_return: _ASM_NOKPROBE_SYMBOL(fast_interrupt_return) kuap_check_amr r3, r4 - ld r4,_MSR(r1) - andi. r0,r4,MSR_PR + ld r5,_MSR(r1) + andi. r0,r5,MSR_PR bne .Lfast_user_interrupt_return - kuap_restore_amr r3 - andi. r0,r4,MSR_RI + kuap_restore_amr r3, r4 + andi. r0,r5,MSR_RI li r3,0 /* 0 return value, no EMULATE_STACK_STORE */ bne+.Lfast_kernel_interrupt_return addir3,r1,STACK_FRAME_OVERHEAD diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index b0ad930cbae5..ef4a90212664 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -971,7 +971,7 @@ EXC_COMMON_BEGIN(system_reset_common) ld r10,SOFTE(r1) stb r10,PACAIRQSOFTMASK(r13) - kuap_restore_amr r10 + kuap_restore_amr r9, r10 EXCEPTION_RESTORE_REGS RFI_TO_USER_OR_KERNEL @@ -2757,7 +2757,7 @@ EXC_COMMON_BEGIN(soft_nmi_common) ld r10,SOFTE(r1) stb r10,PACAIRQSOFTMASK(r13) - kuap_restore_amr r10 + kuap_restore_amr r9, r10 EXCEPTION_RESTORE_REGS hsrr=0 RFI_TO_KERNEL -- 2.23.0
Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
On Wed, Apr 29, 2020 at 08:17:22AM +0200, Christophe Leroy wrote: >> +#ifndef CONFIG_X86_X32_ABI > > Can it be declared __weak instead of enclosing it in an #ifndef ? I really hate the __weak ifdefs. But my plan was to move to a CONFIG_ARCH_COPY_SIGINFO_TO_USER32 and have x86 select it.
Re: [RFC PATCH] powerpc/spufs: fix copy_to_user while atomic
Hi Christoph, > And another one that should go on top of this one to address Al's other > compaint: Yeah, I was pondering that one. The access_ok() is kinda redundant, but it does avoid forcing a SPU context save on those errors. However, it's not like we really need to optimise for the case of invalid addresses from userspace. So, I'll include this change in the submission to Michael's tree. Arnd - let me know if you have any objections. Cheers, Jeremy
Re: Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1
On 29 April 2020 at 07:13 am, Nicholas Piggin wrote: Excerpts from Christian Zigotzky's message of April 29, 2020 2:53 pm: Hi All, The issue still exists in the RC3. (kernel config attached) Please help me to fix this issue. Huh, looks like maybe early_init_mmu() got uninlined because the compiler decided it was unlikely. Does this fix it? Thanks, Nick -- Hi Nick, Many thanks for your patch! It solved the issue. Have a nice day! Cheers, Christian diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index bb3deb76c951..3ffe5f967483 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -208,7 +208,7 @@ void hash__early_init_devtree(void); void radix__early_init_devtree(void); extern void hash__early_init_mmu(void); extern void radix__early_init_mmu(void); -static inline void early_init_mmu(void) +static inline void __init early_init_mmu(void) { if (radix_enabled()) return radix__early_init_mmu();
Re: [PATCH] fixup! signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32
On Tue, Apr 28, 2020 at 09:56:26PM +0200, Arnd Bergmann wrote: > I think I found a way to improve the x32 handling: > > This is a simplification over Christoph's "[PATCH 2/7] signal: factor > copy_siginfo_to_external32 from copy_siginfo_to_user32", reducing the > x32 specifics in the common code to a single #ifdef/#endif check, in > order to keep it more readable for everyone else. > > Christoph, if you like it, please fold into your patch. What do you think of this version? This one always overrides copy_siginfo_to_user32 for the x86 compat case to keep the churn down, and improves the copy_siginfo_to_external32 documentation a bit. --- >From 5ca642c4c744ce6460ebb91fe30ec7a064d28e96 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 29 Apr 2020 08:38:07 +0200 Subject: signal: factor copy_siginfo_to_external32 from copy_siginfo_to_user32 To remove the use of set_fs in the coredump code there needs to be a way to convert a kernel siginfo to a userspace compat siginfo. Factour out a copy_siginfo_to_external32 helper from copy_siginfo_to_user32 that fills out the compat_siginfo, but does so on a kernel space data structure. Handling of the x32 SIGCHLD magic is kept out of copy_siginfo_to_external32, as coredumps are never caused by SIGCHLD. To simplify the mess that the current copy_siginfo_to_user32 is, we allow architectures to override it, and keep thus keep the x32 SIGCHLD magic confined to x86-64. Contains improvements from Eric W. Biederman and Arnd Bergmann . Signed-off-by: Christoph Hellwig --- arch/Kconfig | 3 ++ arch/x86/Kconfig | 1 + arch/x86/kernel/signal.c | 22 include/linux/compat.h | 2 + kernel/signal.c | 108 --- 5 files changed, 83 insertions(+), 53 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 786a85d4ad40d..d5bc9f1ee1747 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -815,6 +815,9 @@ config OLD_SIGACTION config COMPAT_OLD_SIGACTION bool +config ARCH_COPY_SIGINFO_TO_USER32 + bool + config COMPAT_32BIT_TIME bool "Provide system calls for 32-bit time_t" default !64BIT || COMPAT diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1197b5596d5ad..ad13facbe9ffe 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2906,6 +2906,7 @@ config X86_X32 config COMPAT_32 def_bool y depends on IA32_EMULATION || X86_32 + select ARCH_COPY_SIGINFO_TO_USER32 select HAVE_UID16 select OLD_SIGSUSPEND3 diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 83b74fb38c8fc..d2b09866105a2 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -511,6 +511,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig, } #endif /* CONFIG_X86_32 */ +#ifdef CONFIG_ARCH_COPY_SIGINFO_TO_USER32 +int __copy_siginfo_to_user32(struct compat_siginfo __user *to, +const struct kernel_siginfo *from, bool x32_ABI) +{ + struct compat_siginfo new; + + copy_siginfo_to_external32(&new, from); + if (x32_ABI && from->si_signo == SIGCHLD) { + new._sifields._sigchld_x32._utime = from->si_utime; + new._sifields._sigchld_x32._stime = from->si_stime; + } + if (copy_to_user(to, &new, sizeof(struct compat_siginfo))) + return -EFAULT; + return 0; +} +int copy_siginfo_to_user32(struct compat_siginfo __user *to, + const struct kernel_siginfo *from) +{ + return __copy_siginfo_to_user32(to, from, in_x32_syscall()); +} +#endif /* CONFIG_ARCH_COPY_SIGINFO_TO_USER32 */ + static int x32_setup_rt_frame(struct ksignal *ksig, compat_sigset_t *set, struct pt_regs *regs) diff --git a/include/linux/compat.h b/include/linux/compat.h index 0480ba4db5929..adbfe8f688d92 100644 --- 
a/include/linux/compat.h +++ b/include/linux/compat.h @@ -402,6 +402,8 @@ long compat_get_bitmap(unsigned long *mask, const compat_ulong_t __user *umask, unsigned long bitmap_size); long compat_put_bitmap(compat_ulong_t __user *umask, unsigned long *mask, unsigned long bitmap_size); +void copy_siginfo_to_external32(struct compat_siginfo *to, + const struct kernel_siginfo *from); int copy_siginfo_from_user32(kernel_siginfo_t *to, const struct compat_siginfo __user *from); int copy_siginfo_to_user32(struct compat_siginfo __user *to, const kernel_siginfo_t *from); int get_compat_sigevent(struct sigevent *event, diff --git a/kernel/signal.c b/kernel/signal.c index 284fc1600063b..710655df1269d 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3235,96 +3235,98 @@ int copy_siginfo_from_user(kernel_siginfo_t *to, const siginfo_t __user *from) } #ifdef CONFIG_COMPAT -int copy_siginfo_to_user32(struct compat_siginfo __user *to, - const struct kernel_siginfo *fro
Re: [PATCH v2 2/3] powerpc/numa: Prefer node id queried from vphn
Hello Srikar, On Tue, Apr 28, 2020 at 03:08:35PM +0530, Srikar Dronamraju wrote: > Node id queried from the static device tree may not > be correct. For example: it may always show 0 on a shared processor. > Hence prefer the node id queried from vphn and fallback on the device tree > based node id if vphn query fails. > > Cc: linuxppc-dev@lists.ozlabs.org > Cc: linux...@kvack.org > Cc: linux-ker...@vger.kernel.org > Cc: Michal Hocko > Cc: Mel Gorman > Cc: Vlastimil Babka > Cc: "Kirill A. Shutemov" > Cc: Christopher Lameter > Cc: Michael Ellerman > Cc: Andrew Morton > Cc: Linus Torvalds > Signed-off-by: Srikar Dronamraju > --- > Changelog v1:->v2: > - Rebased to v5.7-rc3 > > arch/powerpc/mm/numa.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index b3615b7fdbdf..281531340230 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -719,20 +719,20 @@ static int __init parse_numa_properties(void) >*/ > for_each_present_cpu(i) { > struct device_node *cpu; > - int nid; > - > - cpu = of_get_cpu_node(i, NULL); > - BUG_ON(!cpu); > - nid = of_node_to_nid_single(cpu); > - of_node_put(cpu); > + int nid = vphn_get_nid(i); > > /* >* Don't fall back to default_nid yet -- we will plug >* cpus into nodes once the memory scan has discovered >* the topology. >*/ > - if (nid < 0) > - continue; > + if (nid == NUMA_NO_NODE) { > + cpu = of_get_cpu_node(i, NULL); > + if (cpu) { Why are we not retaining the BUG_ON(!cpu) assert here ? > + nid = of_node_to_nid_single(cpu); > + of_node_put(cpu); > + } > + } Is it possible at this point that both vphn_get_nid(i) and of_node_to_nid_single(cpu) returns NUMA_NO_NODE ? If so, should we still call node_set_online() below ? > node_set_online(nid); > } > > -- > 2.20.1 > -- Thanks and Regards gautham.
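A rough sketch of what the two review questions above seem to point at: keep the BUG_ON() for a missing device-tree node and skip node_set_online() when neither VPHN nor the device tree yields a node id. Illustrative only, not a tested patch.

	for_each_present_cpu(i) {
		struct device_node *cpu;
		int nid = vphn_get_nid(i);

		if (nid == NUMA_NO_NODE) {
			cpu = of_get_cpu_node(i, NULL);
			BUG_ON(!cpu);
			nid = of_node_to_nid_single(cpu);
			of_node_put(cpu);
		}

		/* Don't fall back to default_nid yet; also don't online an
		 * unknown node if both lookups failed. */
		if (nid < 0)
			continue;

		node_set_online(nid);
	}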
[PATCH 0/6] assorted kuap fixes (try again)
Well the last series was a disaster, I'll try again sending the patches with proper subject and changelogs written. Nicholas Piggin (6): powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code powerpc/64s/kuap: kuap_restore missing isync powerpc/64/kuap: interrupt exit conditionally restore AMR powerpc/64s/kuap: restore AMR in system reset exception powerpc/64s/kuap: restore AMR in fast_interrupt_return powerpc/64s/kuap: conditionally restore AMR in kuap_restore_amr asm .../powerpc/include/asm/book3s/64/kup-radix.h | 45 --- arch/powerpc/kernel/entry_64.S| 10 +++-- arch/powerpc/kernel/exceptions-64s.S | 3 +- arch/powerpc/kernel/syscall_64.c | 26 +++ 4 files changed, 63 insertions(+), 21 deletions(-) -- 2.23.0
[PATCH 1/6] powerpc/64/kuap: move kuap checks out of MSR[RI]=0 regions of exit code
Any kind of WARN causes a program check that will crash with unrecoverable exception if it occurs when RI is clear. Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/syscall_64.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 8f7e268f3294..a37c7717424f 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -35,6 +35,8 @@ notrace long system_call_exception(long r3, long r4, long r5, BUG_ON(!FULL_REGS(regs)); BUG_ON(regs->softe != IRQS_ENABLED); + kuap_check_amr(); + account_cpu_user_entry(); #ifdef CONFIG_PPC_SPLPAR @@ -47,8 +49,6 @@ notrace long system_call_exception(long r3, long r4, long r5, } #endif - kuap_check_amr(); - /* * This is not required for the syscall exit path, but makes the * stack frame look nicer. If this was initialised in the first stack @@ -142,6 +142,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, unsigned long ti_flags; unsigned long ret = 0; + kuap_check_amr(); + regs->result = r3; /* Check whether the syscall is issued inside a restartable sequence */ @@ -218,8 +220,6 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, local_paca->tm_scratch = regs->msr; #endif - kuap_check_amr(); - account_cpu_user_exit(); return ret; @@ -242,6 +242,8 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned BUG_ON(!FULL_REGS(regs)); BUG_ON(regs->softe != IRQS_ENABLED); + kuap_check_amr(); + local_irq_save(flags); again: @@ -298,8 +300,6 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned local_paca->tm_scratch = regs->msr; #endif - kuap_check_amr(); - account_cpu_user_exit(); return ret; @@ -319,6 +319,8 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign BUG_ON(regs->msr & MSR_PR); BUG_ON(!FULL_REGS(regs)); + kuap_check_amr(); + if (unlikely(*ti_flagsp & _TIF_EMULATE_STACK_STORE)) { clear_bits(_TIF_EMULATE_STACK_STORE, ti_flagsp); ret = 1; -- 2.23.0