https://bugzilla.kernel.org/show_bug.cgi?id=216183
--- Comment #4 from Erhard F. (erhar...@mailbox.org) ---
Tried
https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-fixes&id=925b6e59138cefa47275c67891c65d48d3266d57
suggested in https://gitlab.freedesktop.org/drm/amd/-/issues/2050#note_14
https://bugzilla.kernel.org/show_bug.cgi?id=216183
--- Comment #5 from Erhard F. (erhar...@mailbox.org) ---
Damn, posted that to the wrong bug... Sorry! Please ignore comment #4.
On 10/07/2022 16:29, Jason Gunthorpe wrote:
On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
driver->ops->attach_group on POWER attaches a group, so VFIO claims ownership
of a group, not of individual devices. The underlying API
(pnv_ioda2_take_ownership()) does not need to keep track
Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic
machine type for 32-bit compile"), when building a 32-bit kernel
with a bi-arch version of GCC, or when building a book3s/32 kernel,
the option -mcpu=powerpc is passed to GCC at all times, relying on it
being eventually overridden by a
On 09/07/2022 12:23, Pali Rohár wrote:
>>>
>>> -ifdef CONFIG_PPC_BOOK3S_64
>>> ifdef CONFIG_CPU_LITTLE_ENDIAN
>>> -CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
>>> -CFLAGS-$(CONFIG_GENERIC_CPU) += $(call
>>> cc-option,-mtune=power9,-mtune=power8)
>>> +CFLAGS-$(CONFIG_PPC_BOOK3S_64) +
On Sunday 10 July 2022 17:38:33 Christophe Leroy wrote:
> On 09/07/2022 12:23, Pali Rohár wrote:
> >>>
> >>> -ifdef CONFIG_PPC_BOOK3S_64
> >>> ifdef CONFIG_CPU_LITTLE_ENDIAN
> >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=power8
> >>> -CFLAGS-$(CONFIG_GENERIC_CPU) += $(call
> >>> cc-optio
On Sunday 10 July 2022 19:36:58 Christophe Leroy wrote:
> Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic
> machine type for 32-bit compile"), when building a 32-bit kernel
> with a bi-arch version of GCC, or when building a book3s/32 kernel,
> the option -mcpu=powerpc is passed to
On Sun, Jul 10, 2022 at 7:36 PM Christophe Leroy
wrote:
>
> Since commit 4bf4f42a2feb ("powerpc/kbuild: Set default generic
> machine type for 32-bit compile"), when building a 32-bit kernel
> with a bi-arch version of GCC, or when building a book3s/32 kernel,
> the option -mcpu=powerpc is passed
On Thu, Jun 30, 2022 at 10:26:01PM -0400, Stefan Berger wrote:
> Simplify tpm_read_log_of() by moving reusable parts of the code into
> an inline function that makes it commonly available so it can be
> used also for kexec support. Call the new of_tpm_get_sml_parameters()
> function from the TPM Op
On Wed, Jul 06, 2022 at 11:23:27AM -0400, Stefan Berger wrote:
> Simplify tpm_read_log_of() by moving reusable parts of the code into
> an inline function that makes it commonly available so it can be
> used also for kexec support. Call the new of_tpm_get_sml_parameters()
> function from the TPM Op
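For readers following along, the refactoring described might look roughly
like the sketch below. The signature, property names and error handling are
assumptions based on the description, not the actual patch.

/* Hypothetical sketch of the shared helper described above; the real
 * patch may differ in signature and endianness handling. */
#include <linux/errno.h>
#include <linux/of.h>

static inline int of_tpm_get_sml_parameters(struct device_node *np,
					    u64 *base, u32 *size)
{
	const u32 *sizep;
	const u64 *basep;

	sizep = of_get_property(np, "linux,sml-size", NULL);
	basep = of_get_property(np, "linux,sml-base", NULL);
	if (!sizep || !basep)
		return -ENOENT;

	*size = be32_to_cpup((const __be32 *)sizep);
	*base = be64_to_cpup((const __be64 *)basep);
	return 0;
}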
The qspinlock conversion resulted in some latency regressions,
particularly in the paravirt (SPLPAR) case. I haven't been able to
improve them much, so for now I rewrite with a different paravirt
algorithm (as s390 does). This isn't the same as s390, but they have
some of the same concerns by the look
Add a powerpc-specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinloc
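As a rough illustration of what a "very simple (non-queued) spinlock
implementation" can look like, here is a minimal test-and-set sketch (names
and layout assumed for illustration, not the patch itself):

struct qspinlock {
	atomic_t val;		/* 0 = unlocked, 1 = locked */
};

static __always_inline int queued_spin_trylock(struct qspinlock *lock)
{
	return atomic_cmpxchg_acquire(&lock->val, 0, 1) == 0;
}

static __always_inline void queued_spin_lock(struct qspinlock *lock)
{
	while (!queued_spin_trylock(lock))
		cpu_relax();
}

static __always_inline void queued_spin_unlock(struct qspinlock *lock)
{
	atomic_set_release(&lock->val, 0);
}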
This forms the basis of the qspinlock slow path.
Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
owner does not participate in the queue, only waiters. The first waiter
spins on the lock word, then when the lock is released it takes
ownership and unqueues the next waiter. Th
The first 16 bits of the lock are only modified by the owner, and other
modifications always use atomic operations on the entire 32 bits, so
unlocks can use plain stores on the 16 bits. This is the same kind of
optimisation done by core qspinlock code.
---
arch/powerpc/include/asm/qspinlock.h
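A sketch of the split lock word being described, with the field layout
assumed for illustration (the real definitions are in the header listed
above):

struct qspinlock {
	union {
		atomic_t val;		/* waiters: atomics on all 32 bits */
		struct {
#ifdef __BIG_ENDIAN
			u16 tail;	/* queue tail, atomic updates only */
			u16 locked;	/* written only by the lock owner  */
#else
			u16 locked;
			u16 tail;
#endif
		};
	};
};

static inline void queued_spin_unlock(struct qspinlock *lock)
{
	/* Owner-only halfword: a plain release store is sufficient. */
	smp_store_release(&lock->locked, 0);
}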
This uses more optimal ll/sc style access patterns (rather than
cmpxchg), and also sets the EH=1 lock hint on those operations
which acquire ownership of the lock.
---
arch/powerpc/include/asm/qspinlock.h | 25 +--
arch/powerpc/include/asm/qspinlock_types.h | 6 +-
arch/powerpc/lib/qspi
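For context, an ll/sc acquire loop with the EH hint has roughly the
following shape on powerpc (a hand-written sketch assuming a 64-bit server
build, not the series' exact code):

static __always_inline int trylock_clean(struct qspinlock *lock, u32 new)
{
	u32 prev;

	/* Load-reserve/store-conditional with EH=1: hints that the
	 * reservation is being taken to acquire a lock. Sketch only. */
	asm volatile(
"1:	lwarx	%0,0,%1,1	# trylock, EH=1 hint\n"
"	cmpwi	0,%0,0\n"
"	bne-	2f\n"
"	stwcx.	%2,0,%1\n"
"	bne-	1b\n"
"\t" PPC_ACQUIRE_BARRIER "\n"
"2:\n"
	: "=&r" (prev)
	: "r" (&lock->val), "r" (new)
	: "cr0", "memory");

	return prev == 0;
}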
Allow new waiters a number of spins on the lock word before queueing,
which particularly helps paravirt performance when physical CPUs are
oversubscribed.
---
arch/powerpc/lib/qspinlock.c | 143 ++++++++++++++++++++++++++++++++++---
1 file changed, 132 insertions(+), 11 deletions(-)
diff --git a/ar
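The policy described might be structured along these lines (a sketch;
STEAL_SPINS, _Q_LOCKED_VAL and the cmpxchg-based steal are assumptions for
illustration):

#define _Q_LOCKED_VAL	1U
#define STEAL_SPINS	128	/* tunable; value assumed here */

static bool try_to_steal_lock(struct qspinlock *lock)
{
	int iters;

	for (iters = 0; iters < STEAL_SPINS; iters++) {
		u32 val = atomic_read(&lock->val);

		/* Attempt the acquire only while the lock looks free;
		 * preserve the queue tail bits when taking it. */
		if (!(val & _Q_LOCKED_VAL) &&
		    atomic_cmpxchg_acquire(&lock->val, val,
					   val | _Q_LOCKED_VAL) == val)
			return true;

		cpu_relax();
	}
	return false;	/* stealing failed: fall back to queueing */
}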
Give the queue head the ability to stop stealers. After a number of
spins without successfully acquiring the lock, the queue head employs
this, which will ensure it is the next owner.
---
arch/powerpc/include/asm/qspinlock_types.h | 10 -
arch/powerpc/lib/qspinlock.c | 45
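A sketch of the mechanism described, with the bit name, bit position and
spin threshold assumed: once the bit is set, would-be stealers back off and
queue, so the head is guaranteed the next acquisition.

#define _Q_LOCKED_VAL	1U
#define _Q_MUST_Q_VAL	(1U << 16)	/* bit position assumed */
#define HEAD_SPINS	256		/* threshold assumed */

static void wait_as_queue_head(struct qspinlock *lock)
{
	int iters = 0;

	for (;;) {
		u32 val = atomic_read(&lock->val);

		if (!(val & _Q_LOCKED_VAL))
			return;		/* free: head can now take it */

		/* After enough unsuccessful spins, forbid stealing so
		 * the head must become the next owner. Stealers check
		 * this bit and queue instead of racing. */
		if (iters++ == HEAD_SPINS)
			atomic_or(_Q_MUST_Q_VAL, &lock->val);

		cpu_relax();
	}
}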
Store the owner CPU number in the lock word so it may be yielded to,
as powerpc's paravirtualised simple spinlocks do.
---
arch/powerpc/include/asm/qspinlock.h | 8 +++-
arch/powerpc/include/asm/qspinlock_types.h | 10 ++
arch/powerpc/lib/qspinlock.c | 6 +++---
3
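The encoding might look like the following sketch (bit positions assumed;
the point is only that the owner CPU travels in the owner-modified half of
the lock word):

#define _Q_LOCKED_VAL		1U
#define _Q_OWNER_CPU_OFFSET	1
#define _Q_OWNER_CPU_BITS	14
#define _Q_OWNER_CPU_MASK \
	(((1U << _Q_OWNER_CPU_BITS) - 1) << _Q_OWNER_CPU_OFFSET)

/* Value stored by the acquirer: locked bit plus its own CPU number. */
static inline u32 queued_spin_encode_locked_val(void)
{
	return _Q_LOCKED_VAL | (smp_processor_id() << _Q_OWNER_CPU_OFFSET);
}

/* Waiters read this back to know which vCPU to yield to. */
static inline int get_owner_cpu(u32 val)
{
	return (val & _Q_OWNER_CPU_MASK) >> _Q_OWNER_CPU_OFFSET;
}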
Waiters spinning on the lock word should yield to the lock owner if the
vCPU is preempted. This improves performance when the hypervisor has
oversubscribed physical CPUs.
---
arch/powerpc/lib/qspinlock.c | 93 +++-
1 file changed, 82 insertions(+), 11 deletions(-)
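The yield step presumably builds on the existing powerpc paravirt helpers
yield_count_of() and yield_to_preempted(); a sketch of the shape, with the
structure assumed and get_owner_cpu() as in the previous sketch:

static void yield_to_locked_owner(struct qspinlock *lock, u32 val)
{
	int owner = get_owner_cpu(val);
	u32 yield_count = yield_count_of(owner);

	/* An even yield count means the owner vCPU is running; there
	 * is nothing useful to yield to, so just keep spinning. */
	if ((yield_count & 1) == 0) {
		cpu_relax();
		return;
	}

	/* Re-sample the lock word after reading the yield count; only
	 * donate cycles if the same owner still holds the lock. */
	smp_rmb();
	if (atomic_read(&lock->val) == val)
		yield_to_preempted(owner, yield_count);
}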
Queued waiters which are not at the head of the queue don't spin on
the lock word but on their own qnode lock word, waiting for the previous
queued CPU to release them. Add an option which allows these waiters to
yield to the previous CPU if its vCPU is preempted.
Disable this option by default for now,
If the head of the queue is preventing stealing but finds the owner vCPU
is preempted, it will yield its cycles to the owner, which could cause it
to become preempted. Add an option to re-allow stealers before yielding,
and disallow them again after returning from the yield.
Disable this option by d
Having all CPUs poll the lock word for the owner CPU that should be
yielded to defeats most of the purpose of using MCS queueing for
scalability. Yet it may be desirable for queued waiters to yield
to a preempted owner.
s390 addresses this problem by having queued waiters sample the lock
word to
After the head of the queue acquires the lock, it releases the
next waiter in the queue to become the new head. Add an option
to prod the new head if its vCPU was preempted. This may only
have an effect if queue waiters are yielding.
Disable this option by default for now, i.e., no logical change.
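A sketch of the prodding step described, assuming a qnode with the fields
shown; prod_cpu() and vcpu_is_preempted() are existing powerpc paravirt
helpers:

struct qnode {
	int	cpu;		/* CPU number of this waiter */
	u8	locked;		/* set when this node becomes queue head */
};

static void release_next_waiter(struct qnode *next)
{
	/* Pass queue-head status to the next waiter. */
	WRITE_ONCE(next->locked, 1);

	/* If its vCPU was preempted it may not see the release for a
	 * while; prod it (H_PROD on pseries) so it resumes promptly. */
	if (vcpu_is_preempted(next->cpu))
		prod_cpu(next->cpu);
}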
This gives trylock slightly more strength, and it also gives most
of the benefit of passing 'val' back through the slowpath without
the complexity.
---
arch/powerpc/include/asm/qspinlock.h | 39 +++-
arch/powerpc/lib/qspinlock.c | 9 +++
2 files changed, 47 ins
Use the spin_begin/spin_cpu_relax/spin_end APIs in qspinlock, which helps
to prevent threads from issuing a lot of expensive priority nops which may
not have much effect due to immediately executing low then medium priority.
---
arch/powerpc/lib/qspinlock.c | 21 +
1 file changed, 1
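The difference from cpu_relax() is that the priority drop brackets the
whole polling loop rather than toggling on every iteration; typical usage
of these APIs looks like:

/* Polling with the spin API: one low-priority entry for the whole
 * loop instead of a low/medium toggle per iteration. */
static void wait_for_flag(int *flag)
{
	spin_begin();			/* drop to low SMT priority */
	while (!READ_ONCE(*flag))
		spin_cpu_relax();	/* stay low while polling   */
	spin_end();			/* back to medium priority  */
}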
cpu_to_node is not available (setup_arch() is called before
setup_per_cpu_areas() by start_kernel()).
Signed-off-by: Nicholas Piggin
---
arch/powerpc/kernel/mce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 18173
If the boot CPU tries to access per-cpu data of other CPUs before
per cpu areas are set up, it will unexpectedly use offset 0.
Try to catch such accesses by poisoning the __per_cpu_offset array.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/include/asm/percpu.h | 1 +
arch/powerpc/kernel/paca
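A sketch of what such poisoning could look like (the poison value and the
function are assumptions for illustration; the idea is that a stray early
access faults instead of silently using offset 0):

#define PER_CPU_OFFSET_POISON	0xfeeeeeeeeeeeeeeeUL	/* value assumed */

static void __init poison_per_cpu_offsets(void)
{
	int cpu;

	/* Point every offset at an unmapped address until
	 * setup_per_cpu_areas() installs the real offsets. */
	for_each_possible_cpu(cpu)
		__per_cpu_offset[cpu] = PER_CPU_OFFSET_POISON;
}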
The wait instruction has a different encoding between BookE and BookS.
Add the BookS variant.
Signed-off-by: Nicholas Piggin
---
arch/powerpc/include/asm/ppc-opcode.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h
b/arch/powerpc/include/asm/ppc-opco
We want to move away from using SMT priority updates for cpu_relax, and
use a 'wait' instruction which is similar to x86. As well as being a
much better fit for what everybody else uses and tests with, priority
nops are stateful, which is nasty (interrupts have to consider they might
be taken at a d
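For context, the statefulness being complained about comes from cpu_relax()
being a low/medium priority pair (current powerpc definitions, roughly):

/* Current powerpc cpu_relax(), approximately: SMT priority nops.
 * Anything executing between the two nops (e.g. an interrupt taken
 * there) runs at the wrong priority: the "stateful" problem above. */
#define HMT_low()	asm volatile("or 1,1,1	# low priority")
#define HMT_medium()	asm volatile("or 2,2,2	# medium priority")

#define cpu_relax()	do { HMT_low(); HMT_medium(); barrier(); } while (0)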
Commit 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
added performance monitoring support for papr-scm nvdimm devices via
the perf interface. The commit also added an array in the papr_scm_priv
structure called "nvdimm_events_map", which got filled based on the
result of H_SCM_PERFORMANCE_STATS
Though ARM64 has the hardware to do TLB shootdown, the hardware
broadcasting is not free.
A simple micro-benchmark shows that even on a Snapdragon 888 with only
8 cores, the overhead of ptep_clear_flush is huge, even for paging
out one page mapped by only one process:
5.36%  a.out  [kernel.kallsyms]
From: Barry Song
This reverts commit 6bfef171d0d74cb050112e0e49feb20bfddf7f42.
I was wrong. Though ARM64 has hardware TLB flush, it is not free
and it is still expensive.
We still have a good chance to enable batched and deferred TLB flush
on ARM64 for memory reclamation. A possible way is t
From: Barry Song
Platforms like ARM64 have hardware TLB shootdown broadcast. They
don't maintain mm_cpumask but just send tlbi and related sync
instructions for TLB flush. A task's mm_cpumask is normally empty
in this case. We also allow deferred TLB flush on this kind of
platform.
Signed-off-by:
From: Barry Song
Add uaddr to tlbbatch APIs so that platforms like ARM64 are
able to apply this on their specific hardware features. For
ARM64, this could be sending tlbi into hardware queues for
the page with this particular uaddr.
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc:
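The API change being described presumably has this shape (a sketch; the
hook name and the arm64-flavoured body illustrate the idea, not the final
interface):

/* Before: the batch-add hook only knew the mm. After: it also gets
 * the user address, so arm64 can queue a per-address tlbi. Sketch. */
static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
					struct mm_struct *mm,
					unsigned long uaddr)
{
	/* arm64 flavour: issue a not-yet-synced invalidation for the
	 * page at uaddr; the sync (dsb) is deferred to the flush. */
	__flush_tlb_page_nosync(mm, uaddr);
}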
From: Barry Song
On x86, batched and deferred TLB shootdown has led to a 90%
performance increase in TLB shootdown. On arm64, HW can do
TLB shootdown without a software IPI. But the sync tlbi is still
quite expensive.
Even running the simplest program which requires swapout can
prove this is true,
#incl
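The test program quoted in the original mail is truncated above. A minimal
program of the kind described (an assumed reconstruction, not the author's
exact code) is:

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define SIZE	(1UL << 28)	/* 256 MB: enough to trigger reclaim */

int main(void)
{
	char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	for (int i = 0; i < 16; i++) {
		memset(p, 1, SIZE);		/* fault everything in */
		madvise(p, SIZE, MADV_PAGEOUT);	/* force swapout, which
						   triggers the TLB
						   shootdown under test */
	}
	return 0;
}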
File book3s_hv_p9_entry.c in the powerpc/kvm folder consists of functions
like freeze_pmu, switch_pmu_to_guest and switch_pmu_to_host, which are
specific to the Performance Monitoring Unit (PMU) for power9 and later
platforms.
For better maintenance, move PMU-related code from
book3s_hv_p9_entry.c to a ne
Commit aabcaf6ae2a0 ("KVM: PPC: Book3S HV P9: Move host OS save/restore
functions to built-in") added a comment in switch_pmu_to_guest
function, indicating possibility of moving PMU handling code
to perf subsystem. But perf subsystem code compilation depends upon
the enablement of CONFIG_PERF_EVEN