On 11/11/21 11:41, Michael Ellerman wrote:
> Cédric Le Goater <[email protected]> writes:
>> On processors with a XIVE interrupt controller (POWER9 and above), the
>> kernel can use either doorbells or XIVE to generate CPU IPIs. Sending
>> a doorbell is generally preferred to using the XIVE IC because it is
>> faster. There are cases where we want to avoid doorbells and use XIVE
>> only, for debug or performance. Only useful on POWER9 and above.
>
> How much do we want this?
Yes. Thanks for asking. It is a recent need.
Here is some background I should have added in the first place. Maybe
for a v2.
We have different ways of doing IPIs on POWER9 and above processors,
depending on the platform and the underlying hypervisor.
- PowerNV uses global doorbells
- pSeries/KVM uses XIVE only because local doorbells are not
efficient, as they are emulated in the KVM hypervisor
- pSeries/PowerVM uses XIVE for remote cores and local doorbells for
threads on the same core (SMT4 or 8)
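The list above can be read as a small decision table. Here is a minimal userspace sketch of it, not kernel code: the enum names and pick_ipi_method() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not kernel identifiers. */
enum platform { PLAT_POWERNV, PLAT_PSERIES_KVM, PLAT_PSERIES_POWERVM };
enum ipi_method { IPI_DOORBELL, IPI_XIVE };

/*
 * Pick the IPI mechanism for a platform and target, following the list
 * above. same_core is true when the target thread shares a core with
 * the sender.
 */
static enum ipi_method pick_ipi_method(enum platform plat, bool same_core)
{
	switch (plat) {
	case PLAT_POWERNV:
		return IPI_DOORBELL;	/* global doorbells reach any thread */
	case PLAT_PSERIES_KVM:
		return IPI_XIVE;	/* local doorbells are emulated by KVM */
	case PLAT_PSERIES_POWERVM:
		return same_core ? IPI_DOORBELL : IPI_XIVE;
	}
	return IPI_XIVE;		/* conservative default for unknown input */
}
```

Only the PowerVM case depends on where the target thread sits, which is why turning doorbells off matters mostly there.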
This recent commit 5b06d1679f2f ("powerpc/pseries: Use doorbells even
if XIVE is available") introduced the optimization for PowerVM and
commit 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell
restrictions") restricted the optimization.
We would like a way to turn off the optimization.
> Kernel command line args are a bit of a pain, they tend to be poorly
> tested, because someone has to explicitly enable them at boot time,
> and then reboot to test the other case.
True. The "xive=off" parameter was poorly tested initially.
> When would we want to enable this?
For bring-up, for debug, for tests. I have been using a similar switch
to compare the XIVE interrupt controller performance with doorbells on
POWER9 and POWER10.
A new need arises with PowerVM: some configurations will behave like KVM
(local doorbells are unsupported), and the doorbell=off parameter is a
simple way to handle this case today.
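To illustrate the intended semantics of doorbell=off, here is a userspace model of the check. It is only a sketch: doorbells_enabled() is a hypothetical helper, it assumes a non-NULL, space-separated command line, and it is not how the parameter is wired up in the kernel, where such options are normally registered with early_param() or __setup().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model: doorbells stay enabled unless the command line
 * contains the exact token "doorbell=off". cmdline must not be NULL.
 */
static bool doorbells_enabled(const char *cmdline)
{
	static const char key[] = "doorbell=off";
	const size_t klen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the match only if it is a whole token. */
		bool starts_token = (p == cmdline) || (p[-1] == ' ');
		bool ends_token = (p[klen] == '\0') || (p[klen] == ' ');

		if (starts_token && ends_token)
			return false;
		p += klen;	/* keep scanning after a partial match */
	}
	return true;
}
```

The point of deciding this from the command line is timing: the choice is known before the secondary CPUs are started.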
> Can we make the kernel smarter about when to use doorbells and make
> it automated?
I don't think we want to probe all IPI methods to detect how well
local doorbells are supported on the platform. Do we?
A machine property/feature would be cleaner. It is a global CPU
property but I don't know where to put it. Ideas?
> Could we make it a runtime switch?
We can. See the patch below. It covers the need for test/performance
but it won't work on a PowerVM system not supporting local doorbells
since boot will fail as soon as secondaries are started. We need a way
to decide early on which method to activate.
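The behaviour of such a runtime switch can be modelled in userspace. This is a sketch under assumptions, not the kernel code: model_smp_ops, model_doorbell_ipi(), model_xive_ipi() and the ipi_toggle_*() helpers are hypothetical stand-ins for smp_ops->cause_ipi and the real IPI senders. The sketch saves the previous method only when XIVE is not already selected, so writing 1 twice does not lose it.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's smp_ops and IPI senders. */
typedef void (*ipi_fn)(int cpu);

static void model_doorbell_ipi(int cpu) { (void)cpu; }
static void model_xive_ipi(int cpu) { (void)cpu; }

static struct { ipi_fn cause_ipi; } model_smp_ops = { model_doorbell_ipi };

/* Method that was active before XIVE was forced; NULL until then. */
static ipi_fn saved_ipi;

static void ipi_toggle_set(int use_xive)
{
	if (use_xive) {
		/* Save only once, so a repeated enable keeps the original. */
		if (model_smp_ops.cause_ipi != model_xive_ipi)
			saved_ipi = model_smp_ops.cause_ipi;
		model_smp_ops.cause_ipi = model_xive_ipi;
	} else if (saved_ipi) {
		/* Restore whatever was in use before the switch. */
		model_smp_ops.cause_ipi = saved_ipi;
	}
}

static int ipi_toggle_get(void)
{
	return model_smp_ops.cause_ipi == model_xive_ipi;
}
```

Writing 0 before any write of 1 is a no-op in this model, since nothing has been saved yet.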
Thanks
C.
From dcac8528c89b689217515032f3329ba5ea10085d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <[email protected]>
Date: Fri, 5 Nov 2021 12:23:48 +0100
Subject: [PATCH] powerpc/xive: Add a debugfs toggle to select xive for IPIs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
For performance tests only.
Signed-off-by: Cédric Le Goater <[email protected]>
---
arch/powerpc/sysdev/xive/common.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 39142df828a018..9ee36b95f9c545 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1826,6 +1826,30 @@ static int xive_eq_debug_show(struct seq_file *m, void *private)
}
DEFINE_SHOW_ATTRIBUTE(xive_eq_debug);
+static int xive_ipi_cause_debug_set(void *data, u64 val)
+{
+	static void (*do_ipi)(int cpu);
+
+	if (val) {
+		if (smp_ops->cause_ipi != xive_cause_ipi)
+			do_ipi = smp_ops->cause_ipi;
+		smp_ops->cause_ipi = xive_cause_ipi;
+	} else if (do_ipi) {
+		smp_ops->cause_ipi = do_ipi;
+	}
+
+	return 0;
+}
+
+static int xive_ipi_cause_debug_get(void *data, u64 *val)
+{
+	*val = xive_cause_ipi == smp_ops->cause_ipi;
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(xive_ipi_cause_debug_fops, xive_ipi_cause_debug_get,
+			 xive_ipi_cause_debug_set, "%llu\n");
+
static void xive_core_debugfs_create(void)
{
struct dentry *xive_dir;
@@ -1849,6 +1873,8 @@ static void xive_core_debugfs_create(void)
}
debugfs_create_bool("store-eoi", 0600, xive_dir, &xive_store_eoi);
debugfs_create_bool("save-restore", 0600, xive_dir,
&xive_has_save_restore);
+ debugfs_create_file("ipi-cause", 0600, xive_dir,
+ NULL, &xive_ipi_cause_debug_fops);
}
#endif /* CONFIG_DEBUG_FS */