[Bug 216183] Kernel 5.19-rc4 boots ok with CONFIG_PPC_RADIX_MMU=y but fails to boot with CONFIG_PPC_HASH_MMU_NATIVE=y

2022-07-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216183 --- Comment #4 from Erhard F. (erhar...@mailbox.org) --- Tried https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=925b6e59138cefa47275c67891c65d48d3266d57 suggested in https://gitlab.freedesktop.org/drm/amd/-/issues/2050#note_14

[Bug 216183] Kernel 5.19-rc4 boots ok with CONFIG_PPC_RADIX_MMU=y but fails to boot with CONFIG_PPC_HASH_MMU_NATIVE=y

2022-07-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=216183 --- Comment #5 from Erhard F. (erhar...@mailbox.org) --- Damn, posted that to the wrong bug... Sorry! Please ignore comment #4. -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee

Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

2022-07-10 Thread Alexey Kardashevskiy
On 10/07/2022 16:29, Jason Gunthorpe wrote: On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote: driver->ops->attach_group on POWER attaches a group so VFIO claims ownership over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does not need to keep track
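For readers following the iommu_ops discussion, a minimal sketch of a capability callback wired into an iommu_ops, assuming the shape of the generic IOMMU API of the 5.19 era; the pnv_* names are placeholders, not the patch's actual code:

    #include <linux/iommu.h>

    /* Hypothetical capability callback: report what the POWER IOMMU
     * can do so VFIO and friends can query it generically. */
    static bool pnv_iommu_capable(enum iommu_cap cap)
    {
            switch (cap) {
            case IOMMU_CAP_CACHE_COHERENCY:
                    return true;    /* DMA is cache coherent here */
            default:
                    return false;
            }
    }

    static const struct iommu_ops pnv_iommu_ops = {
            .capable = pnv_iommu_capable,
            /* domain_alloc etc. for the blocking domain omitted */
    };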

[PATCH] powerpc/32: Don't always pass -mcpu=powerpc to the compiler

2022-07-10 Thread Christophe Leroy
Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic machine type for 32-bit compile"), when building a 32-bit kernel with a bi-arch version of GCC, or when building a book3s/32 kernel, the option -mcpu=powerpc is passed to GCC at all times, relying on it being eventually overridden by a

Re: [PATCH] powerpc: e500: Fix compilation with gcc e500 compiler

2022-07-10 Thread Christophe Leroy
On 09/07/2022 at 12:23, Pali Rohár wrote: >>> >>> -ifdef CONFIG_PPC_BOOK3S_64 >>>ifdef CONFIG_CPU_LITTLE_ENDIAN >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8 >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += $(call >>> cc-option,-mtune=power9,-mtune=power8) >>> +CFLAGS-$(CONFIG_PPC_BOOK3S_64) +

Re: [PATCH] powerpc: e500: Fix compilation with gcc e500 compiler

2022-07-10 Thread Pali Rohár
On Sunday 10 July 2022 17:38:33 Christophe Leroy wrote: > On 09/07/2022 at 12:23, Pali Rohár wrote: > >>> > >>> -ifdef CONFIG_PPC_BOOK3S_64 > >>>ifdef CONFIG_CPU_LITTLE_ENDIAN > >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8 > >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += $(call > >>> cc-optio

Re: [PATCH] powerpc/32: Don't always pass -mcpu=powerpc to the compiler

2022-07-10 Thread Pali Rohár
On Sunday 10 July 2022 19:36:58 Christophe Leroy wrote: > Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic > machine type for 32-bit compile"), when building a 32-bit kernel > with a bi-arch version of GCC, or when building a book3s/32 kernel, > the option -mcpu=powerpc is passed to

Re: [PATCH] powerpc/32: Don't always pass -mcpu=powerpc to the compiler

2022-07-10 Thread Arnd Bergmann
On Sun, Jul 10, 2022 at 7:36 PM Christophe Leroy wrote: > > Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic > machine type for 32-bit compile"), when building a 32-bit kernel > with a bi-arch version of GCC, or when building a book3s/32 kernel, > the option -mcpu=powerpc is passed

Re: [PATCH v4 3/5] tpm: of: Make of-tree specific function commonly available

2022-07-10 Thread Jarkko Sakkinen
On Thu, Jun 30, 2022 at 10:26:01PM -0400, Stefan Berger wrote: > Simplify tpm_read_log_of() by moving reusable parts of the code into > an inline function that makes it commonly available so it can also be > used for kexec support. Call the new of_tpm_get_sml_parameters() > function from the TPM Op

Re: [PATCH v5 4/6] tpm: of: Make of-tree specific function commonly available

2022-07-10 Thread Jarkko Sakkinen
On Wed, Jul 06, 2022 at 11:23:27AM -0400, Stefan Berger wrote: > Simplify tpm_read_log_of() by moving reusable parts of the code into > an inline function that makes it commonly available so it can also be > used for kexec support. Call the new of_tpm_get_sml_parameters() > function from the TPM Op
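As a rough illustration of the refactoring described in both versions of this patch, an of_tpm_get_sml_parameters()-style helper might read the event-log properties like this (a sketch based on the properties tpm_read_log_of() is known to parse, not the patch's exact code; endianness quirks and error handling simplified):

    #include <linux/of.h>

    static inline int of_tpm_get_sml_parameters(struct device_node *np,
                                                u64 *base, u32 *size)
    {
            const __be32 *sizep = of_get_property(np, "linux,sml-size", NULL);
            const __be64 *basep = of_get_property(np, "linux,sml-base", NULL);

            if (!sizep || !basep)
                    return -ENODEV; /* no event log in the device tree */

            /* IBM vs chrp vTPM endianness handling elided in this sketch */
            *base = be64_to_cpup(basep);
            *size = be32_to_cpup(sizep);
            return 0;
    }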

[RFC PATCH 00/14] add our own qspinlock implementation

2022-07-10 Thread Nicholas Piggin
The qspinlock conversion resulted in some latency regressions, particularly in the paravirt (SPLPAR) case. I haven't been able to improve them much, so for now I have rewritten it with a different paravirt algorithm (as s390 does). This isn't the same as s390, but they have some of the same concerns by the look

[RFC PATCH 01/14] powerpc/qspinlock: powerpc qspinlock implementation

2022-07-10 Thread Nicholas Piggin
Add a powerpc-specific implementation of queued spinlocks. This is the build framework with a very simple (non-queued) spinlock implementation to begin with. Later changes add queueing and other features and optimisations one at a time. It is done this way to more easily see how the queued spinloc
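As a sketch of the starting point this patch describes, a minimal non-queued lock built on the kernel's atomics (illustrative, not the series' exact code):

    typedef struct qspinlock {
            atomic_t val;   /* 0 = unlocked, non-zero = locked */
    } qspinlock_t;

    static inline int queued_spin_trylock(qspinlock_t *lock)
    {
            /* acquire semantics on success */
            return atomic_cmpxchg_acquire(&lock->val, 0, 1) == 0;
    }

    static inline void queued_spin_lock(qspinlock_t *lock)
    {
            while (!queued_spin_trylock(lock))
                    cpu_relax();    /* simple spin, no queueing yet */
    }

    static inline void queued_spin_unlock(qspinlock_t *lock)
    {
            atomic_set_release(&lock->val, 0);
    }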

[RFC PATCH 02/14] powerpc/qspinlock: add mcs queueing for contended waiters

2022-07-10 Thread Nicholas Piggin
This forms the basis of the qspinlock slow path. Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock owner does not participate in the queue, only waiters. The first waiter spins on the lock word, then when the lock is released it takes ownership and unqueues the next waiter. Th
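The queueing structure this describes looks roughly like the generic kernel qspinlock's MCS nodes; a sketch with illustrative field names:

    struct qnode {
            struct qnode *next;     /* next waiter behind us */
            int locked;             /* set when we become queue head */
    };

    /* one node per context a CPU could be spinning in */
    #define MAX_NODES 4             /* task, softirq, hardirq, nmi */

    struct qnodes {
            int count;
            struct qnode nodes[MAX_NODES];
    };

    static DEFINE_PER_CPU_ALIGNED(struct qnodes, qnodes);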

[RFC PATCH 03/14] powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx.

2022-07-10 Thread Nicholas Piggin
The first 16 bits of the lock are only modified by the owner, and other modifications always use atomic operations on the entire 32 bits, so unlocks can use plain stores on the 16 bits. This is the same kind of optimisation done by core qspinlock code. --- arch/powerpc/include/asm/qspinlock.h
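A sketch of the optimisation, assuming the owner-only half of the 32-bit lock word is exposed as a u16 (endianness handling elided):

    struct qspinlock {
            union {
                    atomic_t val;
                    u16 locked;     /* owner-only half-word */
            };
    };

    static inline void queued_spin_unlock(struct qspinlock *lock)
    {
            /* plain release store; no larx/stcx. needed because only
             * the owner ever writes these 16 bits */
            smp_store_release(&lock->locked, 0);
    }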

[RFC PATCH 04/14] powerpc/qspinlock: convert atomic operations to assembly

2022-07-10 Thread Nicholas Piggin
This uses more efficient ll/sc-style access patterns (rather than cmpxchg), and also sets the EH=1 lock hint on those operations which acquire ownership of the lock. --- arch/powerpc/include/asm/qspinlock.h | 25 +-- arch/powerpc/include/asm/qspinlock_types.h | 6 +- arch/powerpc/lib/qspi
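For flavour, a hedged sketch of what an ll/sc trylock with the EH=1 hint can look like in powerpc inline asm (simplified; whether the fourth lwarx operand assembles depends on the toolchain):

    static inline int trylock_with_eh(u32 *lock)
    {
            u32 prev;

            asm volatile(
            "1:     lwarx   %0,0,%1,1       \n" /* load-reserve, EH=1 hint */
            "       cmpwi   0,%0,0          \n"
            "       bne-    2f              \n" /* already owned: fail */
            "       stwcx.  %2,0,%1         \n" /* try to claim it */
            "       bne-    1b              \n" /* reservation lost: retry */
            "       isync                   \n" /* acquire barrier */
            "2:                             \n"
            : "=&r" (prev)
            : "r" (lock), "r" (1)
            : "cr0", "memory");

            return prev == 0;
    }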

[RFC PATCH 05/14] powerpc/qspinlock: allow new waiters to steal the lock before queueing

2022-07-10 Thread Nicholas Piggin
Allow new waiters a number of spins on the lock word before queueing, which particularly helps paravirt performance when physical CPUs are oversubscribed. --- arch/powerpc/lib/qspinlock.c | 143 --- 1 file changed, 132 insertions(+), 11 deletions(-) diff --git a/ar
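A sketch of the bounded steal loop this describes (steal_spins and trylock() are illustrative names, not necessarily the series' own):

    static bool try_to_steal_lock(struct qspinlock *lock)
    {
            int iters;

            for (iters = 0; iters < steal_spins; iters++) {
                    u32 val = atomic_read(&lock->val);

                    /* only attempt the acquire when the lock looks free */
                    if (!(val & _Q_LOCKED_VAL) && trylock(lock))
                            return true;

                    cpu_relax();
            }
            return false;   /* steal window over: join the MCS queue */
    }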

[RFC PATCH 06/14] powerpc/qspinlock: theft prevention to control latency

2022-07-10 Thread Nicholas Piggin
Give the queue head the ability to stop stealers. After a number of spins without successfully acquiring the lock, the queue head employs this, which ensures it will be the next owner. --- arch/powerpc/include/asm/qspinlock_types.h | 10 - arch/powerpc/lib/qspinlock.c | 45
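The mechanism might look like the following sketch, with a hypothetical bit position (the real layout lives in qspinlock_types.h):

    #define _Q_MUST_Q_VAL   (1U << 17)      /* illustrative bit */

    /* queue head: forbid further stealing once it has waited too long */
    static void set_mustq(struct qspinlock *lock)
    {
            atomic_or(_Q_MUST_Q_VAL, &lock->val);
    }

    /* ...and the steal path backs off when it sees the bit:
     *      if (val & _Q_MUST_Q_VAL)
     *              break;          // head demands fairness
     */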

[RFC PATCH 07/14] powerpc/qspinlock: store owner CPU in lock word

2022-07-10 Thread Nicholas Piggin
Store the owner CPU number in the lock word so it may be yielded to, as powerpc's paravirtualised simple spinlocks do. --- arch/powerpc/include/asm/qspinlock.h | 8 +++- arch/powerpc/include/asm/qspinlock_types.h | 10 ++ arch/powerpc/lib/qspinlock.c | 6 +++--- 3
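A sketch of how the owner CPU can be packed next to the locked bit (field widths and offsets here are illustrative):

    #define _Q_LOCKED_VAL           1U
    #define _Q_OWNER_CPU_OFFSET     1
    #define _Q_OWNER_CPU_MASK       (0x7fffU << _Q_OWNER_CPU_OFFSET)

    static inline u32 queued_spin_encode_locked_val(void)
    {
            /* the acquiring cmpxchg stores the lock bit plus our CPU
             * number, so spinners know whom to yield to */
            return _Q_LOCKED_VAL |
                   (smp_processor_id() << _Q_OWNER_CPU_OFFSET);
    }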

[RFC PATCH 08/14] powerpc/qspinlock: paravirt yield to lock owner

2022-07-10 Thread Nicholas Piggin
Waiters spinning on the lock word should yield to the lock owner if the owner's vCPU is preempted. This improves performance when the hypervisor has oversubscribed physical CPUs. --- arch/powerpc/lib/qspinlock.c | 93 +++- 1 file changed, 82 insertions(+), 11 deletions(-)
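powerpc already has paravirt helpers for this (asm/paravirt.h); a simplified sketch of the yield path, where decode_owner_cpu() is an illustrative stand-in for reading the owner field added earlier in the series:

    static void yield_to_locked_owner(struct qspinlock *lock, u32 val)
    {
            int owner = decode_owner_cpu(val);
            u32 yield_count = yield_count_of(owner);

            /* an even yield count means the owner vCPU is running */
            if (!(yield_count & 1))
                    return;

            /* re-check the lock word so we don't yield to a stale owner */
            if ((u32)atomic_read(&lock->val) != val)
                    return;

            yield_to_preempted(owner, yield_count);
    }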

[RFC PATCH 09/14] powerpc/qspinlock: implement option to yield to previous node

2022-07-10 Thread Nicholas Piggin
Queued waiters which are not at the head of the queue don't spin on the lock word but their qnode lock word, waiting for the previous queued CPU to release them. Add an option which allows these waiters to yield to the previous CPU if its vCPU is preempted. Disable this option by default for now,

[RFC PATCH 10/14] powerpc/qspinlock: allow stealing when head of queue yields

2022-07-10 Thread Nicholas Piggin
If the head of queue is preventing stealing but it finds the owner vCPU is preempted, it will yield its cycles to the owner which could cause it to become preempted. Add an option to re-allow stealers before yielding, and disallow them again after returning from the yield. Disable this option by d

[RFC PATCH 11/14] powerpc/qspinlock: allow propagation of yield CPU down the queue

2022-07-10 Thread Nicholas Piggin
Having all CPUs poll the lock word for the owner CPU that should be yielded to defeats most of the purpose of using MCS queueing for scalability. Yet it may be desirable for queued waiters to yield to a preempted owner. s390 addresses this problem by having queued waiters sample the lock word to

[RFC PATCH 12/14] powerpc/qspinlock: add ability to prod new queue head CPU

2022-07-10 Thread Nicholas Piggin
After the head of the queue acquires the lock, it releases the next waiter in the queue to become the new head. Add an option to prod the new head if its vCPU was preempted. This may only have an effect if queue waiters are yielding. Disable this option by default for now, i.e., no logical change.

[RFC PATCH 13/14] powerpc/qspinlock: trylock and initial lock attempt may steal

2022-07-10 Thread Nicholas Piggin
This gives trylock slightly more strength, and it also gives most of the benefit of passing 'val' back through the slowpath without the complexity. --- arch/powerpc/include/asm/qspinlock.h | 39 +++- arch/powerpc/lib/qspinlock.c | 9 +++ 2 files changed, 47 ins
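A sketch of the stronger trylock: attempt the acquire even when the tail field shows queued waiters, which is exactly what makes it a steal (names illustrative, owner-CPU encoding from earlier patches elided):

    static inline int queued_spin_trylock(struct qspinlock *lock)
    {
            u32 val = atomic_read(&lock->val);

            if (val & _Q_LOCKED_VAL)
                    return 0;       /* held: fail fast */

            /* non-zero tail bits (queued waiters) do not stop us */
            return atomic_cmpxchg_acquire(&lock->val, val,
                                          val | _Q_LOCKED_VAL) == val;
    }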

[RFC PATCH 14/14] powerpc/qspinlock: use spin_begin/end API

2022-07-10 Thread Nicholas Piggin
Use the spin_begin/spin_cpu_relax/spin_end APIs in qspinlock, which helps to prevent threads from issuing a lot of expensive priority nops that may have little effect because they immediately execute low then medium priority. --- arch/powerpc/lib/qspinlock.c | 21 + 1 file changed, 1
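The API referred to is the existing powerpc spin_begin()/spin_cpu_relax()/spin_end() set from asm/processor.h; the pattern, sketched around an illustrative polling loop:

    static void poll_for_lock(struct qspinlock *lock)
    {
            spin_begin();                   /* drop to low SMT priority once */
            while (!queued_spin_trylock(lock))
                    spin_cpu_relax();       /* stay at low priority while polling */
            spin_end();                     /* restore medium priority once */
    }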

[PATCH 1/2] powerpc/mce: mce_init use early_cpu_to_node

2022-07-10 Thread Nicholas Piggin
cpu_to_node is not available (setup_arch() is called before setup_per_cpu_areas() by start_kernel()). Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/mce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index 18173
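A sketch of the pattern this fixes, with an illustrative allocation call; the point is that early_cpu_to_node() consults the early NUMA map and therefore works before setup_per_cpu_areas() has run, while cpu_to_node() does not:

    static void __init mce_alloc_buffers(void)
    {
            unsigned int cpu;

            for_each_possible_cpu(cpu) {
                    /* node-local buffer, safe during setup_arch() */
                    void *buf = memblock_alloc_node(SZ_4K, SZ_4K,
                                            early_cpu_to_node(cpu));
                    /* ... hand buf to the MCE machinery ... */
            }
    }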

[PATCH 2/2] powerpc/64: poison __per_cpu_offset to catch use-before-init

2022-07-10 Thread Nicholas Piggin
If the boot CPU tries to access per-cpu data of other CPUs before per-cpu areas are set up, it will unexpectedly use offset 0. Try to catch such accesses by poisoning the __per_cpu_offset array. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/percpu.h | 1 + arch/powerpc/kernel/paca
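A sketch of the poisoning idea (the poison constant is illustrative): fill __per_cpu_offset[] with a value that makes any premature per-cpu dereference fault loudly, until setup_per_cpu_areas() writes the real offsets:

    /* illustrative poison; any value that faults on dereference works */
    #define PER_CPU_OFFSET_POISON   0xfeeeeeeeeeeeeeeeUL

    unsigned long __per_cpu_offset[NR_CPUS] = {
            [0 ... NR_CPUS - 1] = PER_CPU_OFFSET_POISON
    };
    /* the boot CPU's own entry must still be made usable early on;
     * that fixup is elided from this sketch */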

[PATCH 1/2] powerpc: add BookS wait opcode macro

2022-07-10 Thread Nicholas Piggin
The wait instruction has a different encoding between BookE and BookS. Add the BookS variant. Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/ppc-opcode.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opco
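In the style of asm/ppc-opcode.h, the addition might look like the sketch below; the opcode value and field placement follow my reading of the BookS 'wait' encoding (primary opcode 31, extended opcode 30) and should be treated as illustrative:

    /* BookS wait: WC (wait condition) and PL (priority level) fields */
    #define PPC_RAW_WAIT(w, p)      (0x7c00003c | (((w) & 0x3) << 21) | \
                                     (((p) & 0x3) << 16))
    #define PPC_WAIT(w, p)          stringify_in_c(.long PPC_RAW_WAIT(w, p))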

[PATCH 2/2] powerpc/64s: Make POWER10 and later use pause_short in cpu_relax loops

2022-07-10 Thread Nicholas Piggin
We want to move away from using SMT priority updates for cpu_relax, and use a 'wait' instruction, which is similar to x86. As well as being a much better fit for what everybody else uses and tests with, priority nops are stateful, which is nasty (interrupts have to consider they might be taken at a d
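A heavily hedged sketch of the direction (CPU_FTR_ARCH_31 is the existing POWER10 feature bit; the mnemonic form and wait-condition value for a short pause are assumptions here, not the patch's code):

    static inline void cpu_relax(void)
    {
            if (cpu_has_feature(CPU_FTR_ARCH_31)) {
                    /* stateless short pause instead of HMT low/medium */
                    asm volatile("wait 2,0" ::: "memory");
            } else {
                    barrier();      /* HMT priority dance elided */
            }
    }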

[PATCH v2] powerpc/papr_scm: Fix nvdimm event mappings

2022-07-10 Thread Kajol Jain
Commit 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") added performance monitoring support for papr-scm nvdimm devices via the perf interface. The commit also added an array in the papr_scm_priv structure called "nvdimm_events_map", which is filled based on the result of H_SCM_PERFORMANCE_STATS

[PATCH v2 0/4] mm: arm64: bring up BATCHED_UNMAP_TLB_FLUSH

2022-07-10 Thread Barry Song
Though ARM64 has the hardware to do tlb shootdown, the hardware broadcasting is not free. A simple micro-benchmark shows that even on Snapdragon 888 with only 8 cores, the overhead of ptep_clear_flush is huge, even for paging out one page mapped by only one process: 5.36% a.out[kernel.kallsyms]

[PATCH v2 1/4] Revert "Documentation/features: mark BATCHED_UNMAP_TLB_FLUSH doesn't apply to ARM64"

2022-07-10 Thread Barry Song
From: Barry Song This reverts commit 6bfef171d0d74cb050112e0e49feb20bfddf7f42. I was wrong. Though ARM64 has hardware TLB flush, it is not free and is still expensive. We still have a good chance to enable batched and deferred TLB flush on ARM64 for memory reclamation. A possible way is t

[PATCH v2 2/4] mm: rmap: Allow platforms without mm_cpumask to defer TLB flush

2022-07-10 Thread Barry Song
From: Barry Song Platforms like ARM64 have hardware TLB shootdown broadcast. They don't maintain mm_cpumask but just send tlbi and related sync instructions for TLB flush. A task's mm_cpumask is normally empty in this case. We also allow deferred TLB flush on this kind of platform. Signed-off-by:
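The shape of the check being relaxed, sketched (helper name illustrative; the non-ARM64 branch mirrors the existing x86 "would an IPI be needed?" logic):

    static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
    {
    #ifdef CONFIG_ARM64
            /* hardware broadcast: batching is worthwhile regardless of
             * mm_cpumask, which arm64 leaves empty */
            return true;
    #else
            /* x86-style: defer only if another CPU would need an IPI */
            return cpumask_any_but(mm_cpumask(mm),
                                   smp_processor_id()) < nr_cpu_ids;
    #endif
    }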

[PATCH v2 3/4] mm: rmap: Extend tlbbatch APIs to fit new platforms

2022-07-10 Thread Barry Song
From: Barry Song Add uaddr to tlbbatch APIs so that platforms like ARM64 are able to apply this on their specific hardware features. For ARM64, this could be sending tlbi into hardware queues for the page with this particular uaddr. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc:
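The shape of the extension, sketched; __flush_tlb_page_nosync() here stands in for whatever per-page queueing primitive arm64 ends up using, so treat both the names and the exact signature as assumptions:

    static inline void arch_tlbbatch_add_pending(
                    struct arch_tlbflush_unmap_batch *batch,
                    struct mm_struct *mm, unsigned long uaddr)
    {
            /* with a hardware queue, the tlbi for this page can be
             * issued now and only synchronised at flush time */
            __flush_tlb_page_nosync(mm, uaddr);
    }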

[PATCH v2 4/4] arm64: support batched/deferred tlb shootdown during page reclamation

2022-07-10 Thread Barry Song
From: Barry Song On x86, batched and deferred tlb shootdown has led to a 90% performance increase in tlb shootdown. On arm64, HW can do tlb shootdown without a software IPI. But sync tlbi is still quite expensive. Even running the simplest program which requires swapout can prove this is true, #incl

[PATCH 1/2] powerpc/kvm: Move pmu code in kvm folder to separate file for power9 and later platforms

2022-07-10 Thread Kajol Jain
File book3s_hv_p9_entry.c in the powerpc/kvm folder consists of functions like freeze_pmu, switch_pmu_to_guest and switch_pmu_to_host, which are specific to the Performance Monitoring Unit (PMU) for power9 and later platforms. For better maintenance, moving PMU-related code from book3s_hv_p9_entry.c to a ne

[PATCH 2/2] powerpc/kvm: Remove comment related to moving PMU code to perf subsystem

2022-07-10 Thread Kajol Jain
Commit aabcaf6ae2a0 ("KVM: PPC: Book3S HV P9: Move host OS save/restore functions to built-in") added a comment in switch_pmu_to_guest function, indicating possibility of moving PMU handling code to perf subsystem. But perf subsystem code compilation depends upon the enablement of CONFIG_PERF_EVEN