[PATCH] selftests/powerpc: Remove -flto from common CFLAGS
Link Time Optimisation (LTO) can cause GCC to inline some functions which
have attributes set. The act of inlining the functions can lead to GCC
forgetting about the attributes, which leads to incorrect tests.
Notable example being: __attribute__((__target__("no-vsx")))

LTO can also interact strangely with custom assembly functions and cause
tests to fail intermittently. Both these cases are hard to detect and
require manual inspection of binaries, which is unlikely to happen for all
tests. Furthermore, LTO optimisations are not necessary for selftests;
correctness is paramount, so it is best to disable LTO.

LTO can still be enabled on a per test basis.

A pseries_le_defconfig kernel on a POWER8 was used to determine that the
same subset of selftests pass and fail with and without -flto in the
common Makefile.

These tests always fail:
selftests: per_event_excludes [FAIL]
selftests: event_attributes_test [FAIL]
selftests: ebb_vs_cpu_event_test [FAIL]
selftests: cpu_event_vs_ebb_test [FAIL]
selftests: cpu_event_pinned_vs_ebb_test [FAIL]
selftests: ipc_unmuxed [FAIL]

And the remaining tests PASS.

Signed-off-by: Suraj Jitindar Singh
---
 tools/testing/selftests/powerpc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..7925a96 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -8,7 +8,7 @@ ifeq ($(ARCH),powerpc)

 GIT_VERSION = $(shell git describe --always --long --dirty || echo "unknown")

-CFLAGS := -Wall -O2 -flto -Wall -Werror -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CURDIR) $(CFLAGS)
+CFLAGS := -Wall -O2 -Wall -Werror -DGIT_VERSION='"$(GIT_VERSION)"' -I$(CURDIR) $(CFLAGS)

 export CFLAGS
--
2.5.0
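To illustrate the failure mode, consider a hypothetical selftest helper
(not code from the tree) built with the attribute mentioned above:

	/* Must stay free of VSX instructions for the test to be meaningful. */
	__attribute__((__target__("no-vsx")))
	static int do_test_no_vsx(unsigned long *src, unsigned long *dst)
	{
		/*
		 * If -flto lets GCC inline this into a VSX-enabled caller,
		 * the target attribute can be lost and VSX instructions
		 * emitted anyway, silently invalidating the test.
		 */
		*dst = *src;
		return (*dst == *src) ? 0 : 1;
	}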
Re: selftests/powerpc: Remove -flto from common CFLAGS
On 01/03/16 14:11, Cyril Bur wrote:
> On Mon, 29 Feb 2016 22:10:13 +1100 (AEDT)
> Michael Ellerman wrote:
>
>> Hi Suraj,
>>
>> On Mon, 2016-29-02 at 06:29:55 UTC, Suraj Jitindar Singh wrote:
>>> LTO can cause GCC to inline some functions which have attributes set. The
>>
>> You should define what LTO is the first time you use it.
>>
>>> act of inlining the functions can lead to GCC forgetting about the
>>> attributes which leads to incorrect tests.
>>> Notable example being: __attribute__((__target__("no-vsx")))
>>
>> That is probably a GCC bug, but we still need to work around it for now.
>>
>>> LTO can also interact strangely with custom assembly functions and cause
>>> tests to intermittently fail.
>>
>> That's probably Cyril writing bad asm :)
>>
>>> Both these cases are hard to detect and require manual inspection of
>>> binaries which is unlikely to happen for all tests. Furthermore, LTO
>>> optimisations are not necessary for selftests and correctness is paramount
>>> and as such it is best to disable LTO.
>>>
>>> LTO can be enabled on a per test basis.
>>>
>>> A pseries_le_defconfig kernel on a POWER8 was used to determine that the
>>> same subset of selftests pass and fail with and without -flto in the
>>> common Makefile.
>>>
>>> These tests always fail:
>>> selftests: per_event_excludes [FAIL]
>>> selftests: event_attributes_test [FAIL]
>>> selftests: ebb_vs_cpu_event_test [FAIL]
>>> selftests: cpu_event_vs_ebb_test [FAIL]
>>> selftests: cpu_event_pinned_vs_ebb_test [FAIL]
>>
>> They shouldn't :)
>>
>> Are you running as root? Bare metal or guest?
>
> The answer here is that /proc/sys/kernel/perf_event_paranoid defaults to 1 but
> 0 is needed for these tests to not SKIP. The standard harness does not discern
> between skips and fails; running each test in powerpc/ shows SKIP/FAIL.
>
> I have run four kernels under QEMU/KVM on a fairly busy VM box and I have the
> same result for all 4. All had /proc/sys/kernel/perf_event_paranoid set to 0
> before running the tests. Each test was run by itself, not with the
> run_kselftest.sh script which hides SKIPs.
>
> pseries_defconfig (BE) qemu/KVM PATCHED
> # grep FAIL *.out
> cpu_event_pinned_vs_ebb_test.out:[FAIL] Test FAILED on line 87
> ipc_unmuxed.out:[FAIL] Test FAILED on line 38
> per_event_excludes.out:[FAIL] Test FAILED on line 95
>
> pseries_defconfig (BE) qemu/KVM unpatched
> # grep FAIL *.out
> cpu_event_pinned_vs_ebb_test.out:[FAIL] Test FAILED on line 87
> ipc_unmuxed.out:[FAIL] Test FAILED on line 38
> per_event_excludes.out:[FAIL] Test FAILED on line 95
>
> pseries_le_defconfig (LE) qemu/KVM PATCHED
> # grep FAIL *.out
> cpu_event_pinned_vs_ebb_test.out:[FAIL] Test FAILED on line 87
> ipc_unmuxed.out:[FAIL] Test FAILED on line 38
> per_event_excludes.out:[FAIL] Test FAILED on line 95
>
> pseries_le_defconfig (LE) qemu/KVM unpatched
> # grep FAIL *.out
> cpu_event_pinned_vs_ebb_test.out:[FAIL] Test FAILED on line 87
> ipc_unmuxed.out:[FAIL] Test FAILED on line 38
> per_event_excludes.out:[FAIL] Test FAILED on line 95
>
> There were no matches for grep SKIP *.out for any of the four.
>
> This patch appears to not have affected any of the tests.
>
> Reviewed-by: Cyril Bur

The same tests were run with /proc/sys/kernel/perf_event_paranoid set to 0
on a bare metal POWER8 system with the same results.

pseries_le_defconfig (LE) PATCHED
# grep FAIL results_patched
[FAIL] Test FAILED on line 95
selftests: per_event_excludes [FAIL]
[FAIL] Test FAILED on line 87
selftests: cpu_event_pinned_vs_ebb_test [FAIL]
selftests: ipc_unmuxed [FAIL]

pseries_le_defconfig (LE) unpatched
# grep FAIL results_unpatched
[FAIL] Test FAILED on line 95
selftests: per_event_excludes [FAIL]
[FAIL] Test FAILED on line 87
selftests: cpu_event_pinned_vs_ebb_test [FAIL]
selftests: ipc_unmuxed [FAIL]

Thus there was no change in the test results between the patched and
unpatched systems.

>>> selftests: ipc_unmuxed [FAIL]
>>
>> That one is expected.
>>
>> cheers
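For anyone reproducing these results, the invocation boils down to
something like the following (test paths assumed from the selftests tree
layout):

	# Allow unprivileged perf events so the perf tests don't SKIP.
	echo 0 > /proc/sys/kernel/perf_event_paranoid
	# Run one test directly; run_kselftest.sh would hide SKIPs.
	cd tools/testing/selftests/powerpc/pmu/ebb
	./cpu_event_pinned_vs_ebb_test; echo "exit status: $?"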
[PATCH 1/2] devicetree/bindings: Add binding for operator panel on FSP machines
Add a binding to Documentation/devicetree/bindings/powerpc/opal
(oppanel-opal.txt) for the operator panel which is present on IBM
pseries machines with FSPs.

Signed-off-by: Suraj Jitindar Singh
---
 .../devicetree/bindings/powerpc/opal/oppanel-opal.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt

diff --git a/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
new file mode 100644
index 000..dffb791
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
@@ -0,0 +1,14 @@
+IBM OPAL Operator Panel Binding
+---
+
+Required properties:
+- compatible : Should be "ibm,opal-oppanel".
+- #lines     : Number of lines on the operator panel e.g. <0x2>.
+- #length    : Number of characters per line of the operator panel e.g. <0x10>.
+
+Example:
+	oppanel {
+		compatible = "ibm,opal-oppanel";
+		#lines = <0x2>;
+		#length = <0x10>;
+	};
--
2.5.0
[PATCH 2/2] powerpc/drivers: Add driver for operator panel on FSP machines
Implement a new character device driver to allow access from user space
to the 2x16 character operator panel display present on powernv machines.
This will allow status information to be presented on the display, which
is visible to a user.

The driver implements a 32 character buffer which a user can read/write
by accessing the device (/dev/oppanel). This buffer is then displayed on
the operator panel display. Any attempt to write past the 32nd position
will have no effect and attempts to write more than 32 characters will be
truncated. Valid characters are ascii: '.', '/', ':', '0-9', 'a-z', 'A-Z'.
All other characters are considered invalid and will be replaced with '.'.

A write call past the 32nd character will return zero characters written.
A write call will not clear the display and it is up to the user to put
spaces (' ') where blank space is required. The device may only be
accessed by a single process at a time.

Signed-off-by: Suraj Jitindar Singh
---
 MAINTAINERS                                    |   6 +
 arch/powerpc/configs/powernv_defconfig         |   1 +
 arch/powerpc/include/asm/opal.h                |   2 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c          |   5 +
 drivers/char/Kconfig                           |  14 ++
 drivers/char/Makefile                          |   1 +
 drivers/char/op-panel-powernv.c                | 246 +
 8 files changed, 276 insertions(+)
 create mode 100644 drivers/char/op-panel-powernv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 40eb1db..dbacb12 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8817,6 +8817,12 @@ F: drivers/firmware/psci.c
 F: include/linux/psci.h
 F: include/uapi/linux/psci.h

+POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
+M: Suraj Jitindar Singh
+L: linuxppc-dev@lists.ozlabs.org
+S: Maintained
+F: drivers/char/op-panel-powernv.c
+
 PNP SUPPORT
 M: "Rafael J. Wysocki"
 S: Maintained
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 0450310..8f9f4ce 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -181,6 +181,7 @@ CONFIG_SERIAL_8250=y
 CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_SERIAL_JSM=m
 CONFIG_VIRTIO_CONSOLE=m
+CONFIG_IBM_OP_PANEL=m
 CONFIG_IPMI_HANDLER=y
 CONFIG_IPMI_DEVICE_INTERFACE=y
 CONFIG_IPMI_POWERNV=y
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9d86c66..b33e349 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -178,6 +178,8 @@ int64_t opal_dump_ack(uint32_t dump_id);
 int64_t opal_dump_resend_notification(void);
 int64_t opal_get_msg(uint64_t buffer, uint64_t size);
+int64_t opal_write_oppanel_async(uint64_t token, oppanel_line_t *lines,
+				 uint64_t num_lines);
 int64_t opal_check_completion(uint64_t buffer, uint64_t size, uint64_t token);
 int64_t opal_sync_host_reboot(void);
 int64_t opal_get_param(uint64_t token, uint32_t param_id, uint64_t buffer,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..ddba8bf 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -278,6 +278,7 @@ OPAL_CALL(opal_dump_info2, OPAL_DUMP_INFO2);
 OPAL_CALL(opal_dump_read, OPAL_DUMP_READ);
 OPAL_CALL(opal_dump_ack, OPAL_DUMP_ACK);
 OPAL_CALL(opal_get_msg, OPAL_GET_MSG);
+OPAL_CALL(opal_write_oppanel_async, OPAL_WRITE_OPPANEL_ASYNC);
 OPAL_CALL(opal_check_completion, OPAL_CHECK_ASYNC_COMPLETION);
 OPAL_CALL(opal_dump_resend_notification, OPAL_DUMP_RESEND);
 OPAL_CALL(opal_sync_host_reboot, OPAL_SYNC_HOST_REBOOT);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 0256d07..228751a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -751,6 +751,9 @@ static int __init opal_init(void)
 	opal_pdev_init(opal_node, "ibm,opal-flash");
 	opal_pdev_init(opal_node, "ibm,opal-prd");

+	/* Initialise platform device: oppanel interface */
+	opal_pdev_init(opal_node, "ibm,opal-oppanel");
+
 	/* Initialise OPAL kmsg dumper for flushing console on panic */
 	opal_kmsg_init();

@@ -885,3 +888,5 @@ EXPORT_SYMBOL_GPL(opal_i2c_request);
 /* Export these symbols for PowerNV LED class driver */
 EXPORT_SYMBOL_GPL(opal_leds_get_ind);
 EXPORT_SYMBOL_GPL(opal_leds_set_ind);
+/* Export this symbol for PowerNV Operator Panel class driver */
+EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
diff --git a/drivers/char/Kconfig
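A minimal usage sketch of the interface described above (commands
illustrative, based on the documented 2x16 layout and write semantics):

	# Show a two-line status message; pad each 16-character line with
	# spaces since writes don't clear the previous display contents.
	printf '%-16s%-16s' "System: OK" "Temp: 35C" > /dev/oppanel
	# Read back the current 32-character buffer.
	cat /dev/oppanel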
Re: [PATCH 2/2] powerpc/drivers: Add driver for operator panel on FSP machines
On 11/04/16 15:27, Andrew Donnellan wrote:
> On 11/04/16 11:41, Suraj Jitindar Singh wrote:
>> Implement new character device driver to allow access from user space
>> to the 2x16 character operator panel display present on powernv machines.
>
> Specifically, on IBM Power Systems machines with FSPs (see comments below).
>
>> This will allow status information to be presented on the display which
>> is visible to a user.
>>
>> The driver implements a 32 character buffer which a user can read/write
>> by accessing the device (/dev/oppanel). This buffer is then displayed on
>> the operator panel display. Any attempt to write past the 32nd position
>> will have no effect and attempts to write more than 32 characters will be
>> truncated. Valid characters are ascii: '.', '/', ':', '0-9', 'a-z',
>> 'A-Z'. All other characters are considered invalid and will be replaced
>> with '.'.
>
> For reference, the ASCII character whitelist is enforced by skiboot, not by
> the driver (see
> https://github.com/open-power/skiboot/blob/master/hw/fsp/fsp-op-panel.c#L217).
> It's been included ever since the first public release of skiboot, so this
> statement is true for all machines at present, though theoretically might not
> be true in future skiboots or alternative OPAL implementations (should
> someone be crazy enough to write one).
>
>> A write call past the 32nd character will return zero characters
>> written. A write call will not clear the display and it is up to the
>> user to put spaces (' ') where blank space is required. The device may
>> only be accessed by a single process at a time.
>>
>> Signed-off-by: Suraj Jitindar Singh
>
> I reviewed an earlier version of this patch internally and Suraj has fixed a
> bunch of issues which I raised. I'm not hugely experienced with this, but all
> the obvious things I noticed have gone, so...
>
> Reviewed-by: Andrew Donnellan
>
> A couple of minor nitpicks below.

Thanks Andrew, will fix up the wording to align with your requests and
improve clarity.

>
>> diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
>> index 3ec0766..8c91edf 100644
>> --- a/drivers/char/Kconfig
>> +++ b/drivers/char/Kconfig
>> @@ -178,6 +178,20 @@ config IBM_BSR
>>	  of threads across a large system which avoids bouncing a cacheline
>>	  between several cores on a system
>>
>> +config IBM_OP_PANEL
>> +	tristate "IBM POWER Operator Panel Display support"
>> +	depends on PPC_POWERNV
>> +	default m
>> +	help
>> +	  If you say Y here, a special character device node /dev/oppanel will
>
> Add commas: "node, /dev/oppanel, will"
>
>> diff --git a/drivers/char/op-panel-powernv.c b/drivers/char/op-panel-powernv.c
>> new file mode 100644
>> index 000..cc72c5d
>> --- /dev/null
>> +++ b/drivers/char/op-panel-powernv.c
> [...]
>> +/*
>> + * This driver creates a character device (/dev/oppanel) which exposes the
>> + * operator panel display (2x16 character display) on IBM pSeries machines.
>
> I'd prefer "IBM Power Systems machines with FSPs" so as to avoid confusion
> with the Linux pseries platform, to be in line with current IBM branding, and
> to emphasise that it's only FSP machines (the Power Systems LC models are
> not).
>
> Hmm, perhaps also mention that in the Kconfig description too?
[PATCH V2 1/2] devicetree/bindings: Add binding for operator panel on FSP machines
Add a binding to Documentation/devicetree/bindings/powerpc/opal
(oppanel-opal.txt) for the operator panel which is present on IBM
pseries machines with FSPs.

Signed-off-by: Suraj Jitindar Singh
---
 .../devicetree/bindings/powerpc/opal/oppanel-opal.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt

diff --git a/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
new file mode 100644
index 000..dffb791
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
@@ -0,0 +1,14 @@
+IBM OPAL Operator Panel Binding
+---
+
+Required properties:
+- compatible : Should be "ibm,opal-oppanel".
+- #lines     : Number of lines on the operator panel e.g. <0x2>.
+- #length    : Number of characters per line of the operator panel e.g. <0x10>.
+
+Example:
+	oppanel {
+		compatible = "ibm,opal-oppanel";
+		#lines = <0x2>;
+		#length = <0x10>;
+	};
--
2.5.0
[PATCH V2 2/2] powerpc/drivers: Add driver for operator panel on FSP machines
Implement a new character device driver to allow access from user space
to the 2x16 character operator panel display present on IBM Power Systems
machines with FSPs. This will allow status information to be presented on
the display, which is visible to a user.

The driver implements a 32 character buffer which a user can read/write
by accessing the device (/dev/oppanel). This buffer is then displayed on
the operator panel display. Any attempt to write past the 32nd position
will have no effect and attempts to write more than 32 characters will be
truncated. The device may only be accessed by a single process at a time.

Signed-off-by: Suraj Jitindar Singh
Reviewed-by: Andrew Donnellan
---
Change Log:
V1 -> V2:
	- Replace "IBM pSeries machines" with "IBM Power Systems machines
	  with FSPs" for improved clarity
	- Basic wording/grammar fixes
---
 MAINTAINERS                                    |   6 +
 arch/powerpc/configs/powernv_defconfig         |   1 +
 arch/powerpc/include/asm/opal.h                |   2 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c          |   5 +
 drivers/char/Kconfig                           |  14 ++
 drivers/char/Makefile                          |   1 +
 drivers/char/op-panel-powernv.c                | 247 +
 8 files changed, 277 insertions(+)
 create mode 100644 drivers/char/op-panel-powernv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 40eb1db..dbacb12 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8817,6 +8817,12 @@ F: drivers/firmware/psci.c
 F: include/linux/psci.h
 F: include/uapi/linux/psci.h

+POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
+M: Suraj Jitindar Singh
+L: linuxppc-dev@lists.ozlabs.org
+S: Maintained
+F: drivers/char/op-panel-powernv.c
+
 PNP SUPPORT
 M: "Rafael J. Wysocki"
 S: Maintained
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 0450310..8f9f4ce 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -181,6 +181,7 @@ CONFIG_SERIAL_8250=y
 CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_SERIAL_JSM=m
 CONFIG_VIRTIO_CONSOLE=m
+CONFIG_IBM_OP_PANEL=m
 CONFIG_IPMI_HANDLER=y
 CONFIG_IPMI_DEVICE_INTERFACE=y
 CONFIG_IPMI_POWERNV=y
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9d86c66..b33e349 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -178,6 +178,8 @@ int64_t opal_dump_ack(uint32_t dump_id);
 int64_t opal_dump_resend_notification(void);
 int64_t opal_get_msg(uint64_t buffer, uint64_t size);
+int64_t opal_write_oppanel_async(uint64_t token, oppanel_line_t *lines,
+				 uint64_t num_lines);
 int64_t opal_check_completion(uint64_t buffer, uint64_t size, uint64_t token);
 int64_t opal_sync_host_reboot(void);
 int64_t opal_get_param(uint64_t token, uint32_t param_id, uint64_t buffer,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..ddba8bf 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -278,6 +278,7 @@ OPAL_CALL(opal_dump_info2, OPAL_DUMP_INFO2);
 OPAL_CALL(opal_dump_read, OPAL_DUMP_READ);
 OPAL_CALL(opal_dump_ack, OPAL_DUMP_ACK);
 OPAL_CALL(opal_get_msg, OPAL_GET_MSG);
+OPAL_CALL(opal_write_oppanel_async, OPAL_WRITE_OPPANEL_ASYNC);
 OPAL_CALL(opal_check_completion, OPAL_CHECK_ASYNC_COMPLETION);
 OPAL_CALL(opal_dump_resend_notification, OPAL_DUMP_RESEND);
 OPAL_CALL(opal_sync_host_reboot, OPAL_SYNC_HOST_REBOOT);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 0256d07..228751a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -751,6 +751,9 @@ static int __init opal_init(void)
 	opal_pdev_init(opal_node, "ibm,opal-flash");
 	opal_pdev_init(opal_node, "ibm,opal-prd");

+	/* Initialise platform device: oppanel interface */
+	opal_pdev_init(opal_node, "ibm,opal-oppanel");
+
 	/* Initialise OPAL kmsg dumper for flushing console on panic */
 	opal_kmsg_init();

@@ -885,3 +888,5 @@ EXPORT_SYMBOL_GPL(opal_i2c_request);
 /* Export these symbols for PowerNV LED class driver */
 EXPORT_SYMBOL_GPL(opal_leds_get_ind);
 EXPORT_SYMBOL_GPL(opal_leds_set_ind);
+/* Export this symbol for PowerNV Operator Panel class driver */
+EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3ec0766..c1d354d 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -178,6 +178,20 @@ config IBM_BSR
 	  of threads across a l
Re: [PATCH V2 1/2] devicetree/bindings: Add binding for operator panel on FSP machines
On 27/04/16 15:03, Stewart Smith wrote:
> Suraj Jitindar Singh writes:
>> Add a binding to Documentation/devicetree/bindings/powerpc/opal
>> (oppanel-opal.txt) for the operator panel which is present on IBM
>> pseries machines with FSPs.
>
> It's not pseries (as that implies PowerVM / PAPR) - while here we're all
> about OPAL.

Thanks, will fix that up.

> With a slight change to the commit message,
> Acked-by: Stewart Smith
[PATCH 1/2] drivers/of: Add check for null property in of_remove_property()
The validity of the property input argument to of_remove_property() is
never checked within the function and thus it is possible to pass a null
value. It happens that this will be picked up in __of_remove_property(),
as no matching property of the device node will be found and thus an
error will be returned; however, once again, there is no explicit check
for a null value. By the time this is detected 2 locks have already been
acquired, which is completely unnecessary if the property to remove is
null.

Add an explicit check in the function of_remove_property() for a null
property value and return -ENODEV in this case; this is consistent with
what the previous return value would have been when the null value was
not detected and passed to __of_remove_property().

By moving an explicit check for the property parameter into the
of_remove_property() function, this will remove the need to perform this
check in calling code before invocation of the of_remove_property()
function.

Signed-off-by: Suraj Jitindar Singh
---
 drivers/of/base.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index b299de2..64018eb 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1777,6 +1777,9 @@ int of_remove_property(struct device_node *np, struct property *prop)
 	unsigned long flags;
 	int rc;

+	if (!prop)
+		return -ENODEV;
+
 	mutex_lock(&of_mutex);

 	raw_spin_lock_irqsave(&devtree_lock, flags);
--
2.5.0
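The simplified calling pattern this enables looks like the following
sketch (property name illustrative):

	/*
	 * Callers no longer need to NULL-check the lookup themselves;
	 * of_remove_property() now returns -ENODEV for a NULL property.
	 */
	of_remove_property(node, of_find_property(node, "example-prop", NULL));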
[PATCH 2/2] powerpc: Update of_remove_property() call sites to remove null checking
After obtaining a property from of_find_property() and before calling
of_remove_property(), most code checks to ensure that the property
returned from of_find_property() is not null. The previous patch moved
this check to the start of the function of_remove_property() in order to
avoid the case where this check isn't done and a null value is passed.
This ensures the check is always conducted before taking locks and
attempting to remove the property. Thus it is no longer necessary to
perform a check for null values before invoking of_remove_property().

Update of_remove_property() call sites in order to remove the redundant
checking for a null property value, as the check is now performed within
the of_remove_property() function.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kernel/machine_kexec.c       | 19 ++-
 arch/powerpc/kernel/machine_kexec_64.c    | 11 ---
 arch/powerpc/platforms/pseries/mobility.c |  4 ++--
 arch/powerpc/platforms/pseries/reconfig.c |  5 +
 4 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index 015ae55..55744a8 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -228,17 +228,12 @@ static struct property memory_limit_prop = {
 static void __init export_crashk_values(struct device_node *node)
 {
-	struct property *prop;
-
 	/* There might be existing crash kernel properties, but we can't
	 * be sure what's in them, so remove them. */
-	prop = of_find_property(node, "linux,crashkernel-base", NULL);
-	if (prop)
-		of_remove_property(node, prop);
-
-	prop = of_find_property(node, "linux,crashkernel-size", NULL);
-	if (prop)
-		of_remove_property(node, prop);
+	of_remove_property(node, of_find_property(node,
+			"linux,crashkernel-base", NULL));
+	of_remove_property(node, of_find_property(node,
+			"linux,crashkernel-size", NULL));

 	if (crashk_res.start != 0) {
 		crashk_base = cpu_to_be_ulong(crashk_res.start),
@@ -258,16 +253,14 @@ static void __init export_crashk_values(struct device_node *node)
 static int __init kexec_setup(void)
 {
 	struct device_node *node;
-	struct property *prop;

 	node = of_find_node_by_path("/chosen");
 	if (!node)
 		return -ENOENT;

 	/* remove any stale properties so ours can be found */
-	prop = of_find_property(node, kernel_end_prop.name, NULL);
-	if (prop)
-		of_remove_property(node, prop);
+	of_remove_property(node, of_find_property(node, kernel_end_prop.name,
+			NULL));

 	/* information needed by userspace when using default_machine_kexec */
 	kernel_end = cpu_to_be_ulong(__pa(_end));
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 0fbd75d..2608192 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -401,7 +401,6 @@ static struct property htab_size_prop = {
 static int __init export_htab_values(void)
 {
 	struct device_node *node;
-	struct property *prop;

 	/* On machines with no htab htab_address is NULL */
 	if (!htab_address)
@@ -412,12 +411,10 @@ static int __init export_htab_values(void)
 		return -ENODEV;

 	/* remove any stale propertys so ours can be found */
-	prop = of_find_property(node, htab_base_prop.name, NULL);
-	if (prop)
-		of_remove_property(node, prop);
-	prop = of_find_property(node, htab_size_prop.name, NULL);
-	if (prop)
-		of_remove_property(node, prop);
+	of_remove_property(node, of_find_property(node, htab_base_prop.name,
+			NULL));
+	of_remove_property(node, of_find_property(node, htab_size_prop.name,
+			NULL));

 	htab_base = cpu_to_be64(__pa(htab_address));
 	of_add_property(node, &htab_base_prop);
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index ceb18d3..a560a98 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -191,8 +191,8 @@ static int update_dt_node(__be32 phandle, s32 scope)
 			break;

 		case 0x8000:
-			prop = of_find_property(dn, prop_name, NULL);
-			of_remove_property(dn, prop);
+			of_remove_property(dn, of_find_property(dn,
+					prop_name, NULL));
 			prop = NULL;
 			break;
 d
[PATCH] powerpc/pseries: Add null property check to pseries_discover_pic()
The return value of of_get_property() isn't checked before it is passed
to the strstr() function. If it happens that the return value is null,
then this will result in a null pointer being dereferenced.

Add a check to see if the return value of of_get_property() is null and,
if it is, continue straight on to the next node.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/platforms/pseries/setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 6e944fc..fa73494 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -235,6 +235,8 @@ static void __init pseries_discover_pic(void)
 	for_each_node_by_name(np, "interrupt-controller") {
 		typep = of_get_property(np, "compatible", NULL);
+		if (!typep)
+			continue;
 		if (strstr(typep, "open-pic")) {
 			pSeries_mpic_node = of_node_get(np);
 			ppc_md.init_IRQ = pseries_mpic_init_IRQ;
--
2.5.0
[PATCH] powerpc: Add out of bounds check to crash_shutdown_unregister()
When unregistering a crash_shutdown_handle in the function
crash_shutdown_unregister(), the other handles are shifted down in the
array to replace the unregistered handle. The for loop assumes that the
last element in the array is null and uses this as the stop condition;
however, in the case that the last element is not null there is no check
to ensure that an out of bounds access is not performed.

Add a check to terminate the shift operation when CRASH_HANDLER_MAX is
reached in order to protect against out of bounds accesses.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kernel/crash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 2bb252c..6b267af 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -288,7 +288,7 @@ int crash_shutdown_unregister(crash_shutdown_t handler)
 		rc = 1;
 	} else {
 		/* Shift handles down */
-		for (; crash_shutdown_handles[i]; i++)
+		for (; crash_shutdown_handles[i] && i < CRASH_HANDLER_MAX; i++)
 			crash_shutdown_handles[i] =
 				crash_shutdown_handles[i+1];
 		rc = 0;
--
2.5.0
[PATCH V3 1/2] devicetree/bindings: Add binding for operator panel on FSP machines
Add a binding to Documentation/devicetree/bindings/powerpc/opal
(oppanel-opal.txt) for the operator panel which is present on IBM Power
Systems machines with FSPs.

Signed-off-by: Suraj Jitindar Singh
Acked-by: Rob Herring
Acked-by: Stewart Smith
---
Change Log:
V1 -> V2:
	- Nothing
V2 -> V3:
	- Change "IBM pseries machines" to "IBM Power Systems machines" in
	  the commit message for improved clarity.
---
 .../devicetree/bindings/powerpc/opal/oppanel-opal.txt | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt

diff --git a/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
new file mode 100644
index 000..dffb791
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/opal/oppanel-opal.txt
@@ -0,0 +1,14 @@
+IBM OPAL Operator Panel Binding
+---
+
+Required properties:
+- compatible : Should be "ibm,opal-oppanel".
+- #lines     : Number of lines on the operator panel e.g. <0x2>.
+- #length    : Number of characters per line of the operator panel e.g. <0x10>.
+
+Example:
+	oppanel {
+		compatible = "ibm,opal-oppanel";
+		#lines = <0x2>;
+		#length = <0x10>;
+	};
--
2.5.0
[PATCH V3 2/2] powerpc/drivers: Add driver for operator panel on FSP machines
Implement a new character device driver to allow access from user space
to the 2x16 character operator panel display present on IBM Power Systems
machines with FSPs. This will allow status information to be presented on
the display, which is visible to a user.

The driver implements a 32 character buffer which a user can read/write
by accessing the device (/dev/oppanel). This buffer is then displayed on
the operator panel display. Any attempt to write past the 32nd position
will have no effect and attempts to write more than 32 characters will be
truncated. The device may only be accessed by a single process at a time.

Signed-off-by: Suraj Jitindar Singh
Reviewed-by: Andrew Donnellan
---
Change Log:
V1 -> V2:
	- Replace "IBM pSeries machines" with "IBM Power Systems machines
	  with FSPs" for improved clarity
	- Basic wording/grammar fixes
V2 -> V3:
	- Nothing
---
 MAINTAINERS                                    |   6 +
 arch/powerpc/configs/powernv_defconfig         |   1 +
 arch/powerpc/include/asm/opal.h                |   2 +
 arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
 arch/powerpc/platforms/powernv/opal.c          |   5 +
 drivers/char/Kconfig                           |  14 ++
 drivers/char/Makefile                          |   1 +
 drivers/char/op-panel-powernv.c                | 247 +
 8 files changed, 277 insertions(+)
 create mode 100644 drivers/char/op-panel-powernv.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 40eb1db..dbacb12 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8817,6 +8817,12 @@ F: drivers/firmware/psci.c
 F: include/linux/psci.h
 F: include/uapi/linux/psci.h

+POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
+M: Suraj Jitindar Singh
+L: linuxppc-dev@lists.ozlabs.org
+S: Maintained
+F: drivers/char/op-panel-powernv.c
+
 PNP SUPPORT
 M: "Rafael J. Wysocki"
 S: Maintained
diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 0450310..8f9f4ce 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -181,6 +181,7 @@ CONFIG_SERIAL_8250=y
 CONFIG_SERIAL_8250_CONSOLE=y
 CONFIG_SERIAL_JSM=m
 CONFIG_VIRTIO_CONSOLE=m
+CONFIG_IBM_OP_PANEL=m
 CONFIG_IPMI_HANDLER=y
 CONFIG_IPMI_DEVICE_INTERFACE=y
 CONFIG_IPMI_POWERNV=y
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9d86c66..b33e349 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -178,6 +178,8 @@ int64_t opal_dump_ack(uint32_t dump_id);
 int64_t opal_dump_resend_notification(void);
 int64_t opal_get_msg(uint64_t buffer, uint64_t size);
+int64_t opal_write_oppanel_async(uint64_t token, oppanel_line_t *lines,
+				 uint64_t num_lines);
 int64_t opal_check_completion(uint64_t buffer, uint64_t size, uint64_t token);
 int64_t opal_sync_host_reboot(void);
 int64_t opal_get_param(uint64_t token, uint32_t param_id, uint64_t buffer,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..ddba8bf 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -278,6 +278,7 @@ OPAL_CALL(opal_dump_info2, OPAL_DUMP_INFO2);
 OPAL_CALL(opal_dump_read, OPAL_DUMP_READ);
 OPAL_CALL(opal_dump_ack, OPAL_DUMP_ACK);
 OPAL_CALL(opal_get_msg, OPAL_GET_MSG);
+OPAL_CALL(opal_write_oppanel_async, OPAL_WRITE_OPPANEL_ASYNC);
 OPAL_CALL(opal_check_completion, OPAL_CHECK_ASYNC_COMPLETION);
 OPAL_CALL(opal_dump_resend_notification, OPAL_DUMP_RESEND);
 OPAL_CALL(opal_sync_host_reboot, OPAL_SYNC_HOST_REBOOT);
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 0256d07..228751a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -751,6 +751,9 @@ static int __init opal_init(void)
 	opal_pdev_init(opal_node, "ibm,opal-flash");
 	opal_pdev_init(opal_node, "ibm,opal-prd");

+	/* Initialise platform device: oppanel interface */
+	opal_pdev_init(opal_node, "ibm,opal-oppanel");
+
 	/* Initialise OPAL kmsg dumper for flushing console on panic */
 	opal_kmsg_init();

@@ -885,3 +888,5 @@ EXPORT_SYMBOL_GPL(opal_i2c_request);
 /* Export these symbols for PowerNV LED class driver */
 EXPORT_SYMBOL_GPL(opal_leds_get_ind);
 EXPORT_SYMBOL_GPL(opal_leds_set_ind);
+/* Export this symbol for PowerNV Operator Panel class driver */
+EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 3ec0766..c1d354d 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -178,6 +178,20 @@ config IBM_BSR
Re: [PATCH] powerpc: Add out of bounds check to crash_shutdown_unregister()
On Thu, 28 Apr 2016 16:55:13 +1000
Balbir Singh wrote:

> On 28/04/16 16:17, Suraj Jitindar Singh wrote:
> > When unregistering a crash_shutdown_handle in the function
> > crash_shutdown_unregister() the other handles are shifted down in
> > the array to replace the unregistered handle. The for loop assumes
> > that the last element in the array is null and uses this as the
> > stop condition, however in the case that the last element is not
> > null there is no check to ensure that an out of bounds access is
> > not performed.
> >
> > Add a check to terminate the shift operation when CRASH_HANDLER_MAX
> > is reached in order to protect against out of bounds accesses.
> >
> > Signed-off-by: Suraj Jitindar Singh
> > ---
> >  arch/powerpc/kernel/crash.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
> > index 2bb252c..6b267af 100644
> > --- a/arch/powerpc/kernel/crash.c
> > +++ b/arch/powerpc/kernel/crash.c
> > @@ -288,7 +288,7 @@ int crash_shutdown_unregister(crash_shutdown_t handler)
> > 		rc = 1;
> > 	} else {
> > 		/* Shift handles down */
> > -		for (; crash_shutdown_handles[i]; i++)
> > +		for (; crash_shutdown_handles[i] && i < CRASH_HANDLER_MAX; i++)
> > 			crash_shutdown_handles[i] =
> > 				crash_shutdown_handles[i+1];
> > 		rc = 0;
>
> With i = CRASH_HANDLER_MAX - 1 we could end up with
> crash_shutdown_handles[i+1] already out of bounds. I think you need to
> check that i+1 does not overflow.
>
> Balbir

Thanks for taking a look Balbir, the size of crash_shutdown_handles is
actually CRASH_HANDLER_MAX + 1.
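For reference, a sketch of the declaration implied by that reply
(reconstructed from the description above, not quoted from crash.c; the
value of CRASH_HANDLER_MAX is an assumption):

	#define CRASH_HANDLER_MAX 3
	/*
	 * NULL terminated list of handles; the extra slot means
	 * crash_shutdown_handles[i + 1] is a valid read for any i
	 * accepted by the loop bound i < CRASH_HANDLER_MAX.
	 */
	static crash_shutdown_t crash_shutdown_handles[CRASH_HANDLER_MAX + 1];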
[PATCH] KVM: PPC: powerpc: Add count cache flush parameters to kvmppc_get_cpu_char()
Add KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST &
KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE to the characteristics returned from
the H_GET_CPU_CHARACTERISTICS H-CALL, as queried from either the
hypervisor or the device tree.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 arch/powerpc/kvm/powerpc.c          | 18 ++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 8c876c166ef2..26ca425f4c2c 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -463,10 +463,12 @@ struct kvm_ppc_cpu_char {
 #define KVM_PPC_CPU_CHAR_BR_HINT_HONOURED	(1ULL << 58)
 #define KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF	(1ULL << 57)
 #define KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS	(1ULL << 56)
+#define KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST	(1ull << 54)

 #define KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY	(1ULL << 63)
 #define KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR		(1ULL << 62)
 #define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR	(1ULL << 61)
+#define KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE	(1ull << 58)

 /* Per-vcpu XICS interrupt controller state */
 #define KVM_REG_PPC_ICP_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b90a7d154180..a99dcac91e50 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2189,10 +2189,12 @@ static int pseries_get_cpu_char(struct kvm_ppc_cpu_char *cp)
 			KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
 			KVM_PPC_CPU_CHAR_BR_HINT_HONOURED |
 			KVM_PPC_CPU_CHAR_MTTRIG_THR_RECONF |
-			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
+			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
+			KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
 		cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
 			KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
-			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
+			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR |
+			KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE;
 	}
 	return 0;
 }
@@ -2251,12 +2253,16 @@ static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char *cp)
 		if (have_fw_feat(fw_features, "enabled",
 				 "fw-count-cache-disabled"))
 			cp->character |= KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
+		if (have_fw_feat(fw_features, "enabled",
+				 "fw-count-cache-flush-bcctr2,0,0"))
+			cp->character |= KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;
 		cp->character_mask = KVM_PPC_CPU_CHAR_SPEC_BAR_ORI31 |
 			KVM_PPC_CPU_CHAR_BCCTRL_SERIALISED |
 			KVM_PPC_CPU_CHAR_L1D_FLUSH_ORI30 |
 			KVM_PPC_CPU_CHAR_L1D_FLUSH_TRIG2 |
 			KVM_PPC_CPU_CHAR_L1D_THREAD_PRIV |
-			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS;
+			KVM_PPC_CPU_CHAR_COUNT_CACHE_DIS |
+			KVM_PPC_CPU_CHAR_BCCTR_FLUSH_ASSIST;

 		if (have_fw_feat(fw_features, "enabled",
 				 "speculation-policy-favor-security"))
@@ -2267,9 +2273,13 @@ static int kvmppc_get_cpu_char(struct kvm_ppc_cpu_char *cp)
 		if (!have_fw_feat(fw_features, "disabled",
 				  "needs-spec-barrier-for-bound-checks"))
 			cp->behaviour |= KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
+		if (have_fw_feat(fw_features, "enabled",
+				 "needs-count-cache-flush-on-context-switch"))
+			cp->behaviour |= KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE;
 		cp->behaviour_mask = KVM_PPC_CPU_BEHAV_FAVOUR_SECURITY |
 			KVM_PPC_CPU_BEHAV_L1D_FLUSH_PR |
-			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR;
+			KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR |
+			KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE;

 		of_node_put(fw_features);
 	}
--
2.13.6
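To see which of these firmware feature nodes a host actually advertises,
one can inspect the device tree; the path below is an assumption for a
powernv host (the code simply searches for a node named "fw-features"):

	ls /proc/device-tree/ibm,opal/fw-features/
	# A feature is "enabled" if that subnode exists, e.g.:
	ls "/proc/device-tree/ibm,opal/fw-features/fw-count-cache-flush-bcctr2,0,0/"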
[PATCH] powerpc: Add barrier_nospec to raw_copy_in_user()
Commit ddf35cf3764b ("powerpc: Use barrier_nospec in copy_from_user()")
added barrier_nospec before loading from user-controlled pointers. The
intention was to order the load from the potentially user-controlled
pointer vs a previous branch based on an access_ok() check or similar.

In order to achieve the same result, add a barrier_nospec to the
raw_copy_in_user() function before loading from such a user-controlled
pointer.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/include/asm/uaccess.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index e3a731793ea2..bb615592d5bb 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -306,6 +306,7 @@ extern unsigned long __copy_tofrom_user(void __user *to,
 static inline unsigned long
 raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
 {
+	barrier_nospec();
 	return __copy_tofrom_user(to, from, n);
 }
 #endif /* __powerpc64__ */
--
2.13.6
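The pattern being applied is the usual Spectre-v1 style mitigation; as a
sketch (simplified access_ok() signature, not the actual uaccess code):

	if (access_ok(from, n)) {
		/*
		 * The CPU may speculate past the check above;
		 * barrier_nospec() keeps the dependent load from the
		 * user-controlled 'from' pointer from executing until
		 * the branch has actually resolved.
		 */
		barrier_nospec();
		ret = __copy_tofrom_user(to, from, n);
	}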
Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()
On Wed, 2019-03-27 at 17:51 +0100, Cédric Le Goater wrote:
> On 3/27/19 5:37 PM, Cédric Le Goater wrote:
> > On 3/27/19 1:36 PM, Sebastian Andrzej Siewior wrote:
> > > With qemu-system-ppc64le -machine pseries -smp 4 I get:
> > >
> > > > # chrt 1 hackbench
> > > > Running in process mode with 10 groups using 40 file
> > > > descriptors each (== 400 tasks)
> > > > Each sender will pass 100 messages of 100 bytes
> > > > Oops: Exception in kernel mode, sig: 4 [#1]
> > > > LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA pSeries
> > > > Modules linked in:
> > > > CPU: 0 PID: 629 Comm: hackbench Not tainted 5.1.0-rc2 #71
> > > > NIP: c0046978 LR: c0046a38 CTR: c00b0150
> > > > REGS: c001fffeb8e0 TRAP: 0700 Not tainted (5.1.0-rc2)
> > > > MSR: 80089033 CR: 42000874 XER:
> > > > CFAR: c0046a34 IRQMASK: 1
> > > > GPR00: c00b0170 c001fffebb70 c0a6ba00 2800
> > >
> > > …
> > >
> > > > NIP [c0046978] doorbell_core_ipi+0x28/0x30
> > > > LR [c0046a38] doorbell_try_core_ipi+0xb8/0xf0
> > > > Call Trace:
> > > > [c001fffebb70] [c001fffebba0] 0xc001fffebba0 (unreliable)
> > > > [c001fffebba0] [c00b0170] smp_pseries_cause_ipi+0x20/0x70
> > > > [c001fffebbd0] [c004b02c] arch_send_call_function_single_ipi+0x8c/0xa0
> > > > [c001fffebbf0] [c01de600] irq_work_queue_on+0xe0/0x130
> > > > [c001fffebc30] [c01340c8] rto_push_irq_work_func+0xc8/0x120
> > >
> > > …
> > >
> > > > Instruction dump:
> > > > 6000 6000 3c4c00a2 384250b0 3d220009 392949c8 8129 3929
> > > > 7d231838 7c0004ac 5463017e 64632800 <7c00191c> 4e800020 3c4c00a2 38425080
> > > > ---[ end trace eb842b544538cbdf ]---
> > >
> > > and I was wondering whether this is a qemu bug or the kernel is using an
> > > opcode it should rather not. If I skip doorbell_try_core_ipi() in
> > > smp_pseries_cause_ipi() then there is no crash. The comment says "POWER9
> > > should not use this handler" so…
> >
> > I would say Linux is using a msgsndp instruction which is not implemented
> > in QEMU TCG. But why have we started using dbells in Linux ?

Yeah the kernel must have used msgsndp which isn't implemented for TCG
yet. We use doorbells in Linux but only for threads which are on the same
core. And when I try to construct a situation with more than 1 thread per
core (e.g. -smp 4,threads=4), I get "TCG cannot support more than 1
thread/core on a pseries machine". So I wonder why the guest thinks it
can use msgsndp...

> ah. It seems arch_local_irq_restore() / replay_interrupt() generated
> some interrupt.
>
> C.
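For reference, the call path under discussion looks roughly like this
(simplified sketch of the pseries SMP code, not a verbatim quote):

	static void smp_pseries_cause_ipi(int cpu)
	{
		/* msgsndp doorbells can only reach threads on the same core */
		if (doorbell_try_core_ipi(cpu))
			return;

		/* otherwise fall back to the interrupt controller IPI */
		icp_ops->cause_ipi(cpu);
	}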
[PATCH 0/2] Remove Variable Length Arrays from powerpc code
This patch series removes two Variable Length Arrays (VLAs) from the
powerpc code.

Series based on v4.19-rc2

Suraj Jitindar Singh (2):
  powerpc/prom: Remove VLA in prom_check_platform_support()
  powerpc/pseries: Remove VLA from lparcfg_write()

 arch/powerpc/kernel/prom_init.c          | 7 +--
 arch/powerpc/platforms/pseries/lparcfg.c | 5 ++---
 2 files changed, 7 insertions(+), 5 deletions(-)

--
2.13.6
[PATCH 2/2] powerpc/pseries: Remove VLA from lparcfg_write()
In lparcfg_write() we hard code kbuf_sz and then use this as the variable
length of kbuf, creating a variable length array. Since we're hard coding
the length anyway, just define the array using this as the length and
remove the need for kbuf_sz, thus removing the variable length array.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/platforms/pseries/lparcfg.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
index 7c872dc01bdb..8bd590af488a 100644
--- a/arch/powerpc/platforms/pseries/lparcfg.c
+++ b/arch/powerpc/platforms/pseries/lparcfg.c
@@ -585,8 +585,7 @@ static ssize_t update_mpp(u64 *entitlement, u8 *weight)
 static ssize_t lparcfg_write(struct file *file, const char __user * buf,
 			     size_t count, loff_t * off)
 {
-	int kbuf_sz = 64;
-	char kbuf[kbuf_sz];
+	char kbuf[64];
 	char *tmp;
 	u64 new_entitled, *new_entitled_ptr = &new_entitled;
 	u8 new_weight, *new_weight_ptr = &new_weight;
@@ -595,7 +594,7 @@ static ssize_t lparcfg_write(struct file *file, const char __user * buf,
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return -EINVAL;

-	if (count > kbuf_sz)
+	if (count > sizeof(kbuf))
 		return -EINVAL;

 	if (copy_from_user(kbuf, buf, count))
--
2.13.6
[PATCH 1/2] powerpc/prom: Remove VLA in prom_check_platform_support()
In prom_check_platform_support() we retrieve and parse the
"ibm,arch-vec-5-platform-support" property of the chosen node. Currently
we use a variable length array for this; to avoid the VLA, use an array
of constant length 8 instead.

This property is used to indicate the supported options of vector 5
bytes 23-26 of the ibm,architecture.vec node. Each of these options is a
pair of bytes, thus for 4 options we have a max length of 8 bytes.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kernel/prom_init.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 9b38a2e5dd35..ce5fc03dc69f 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1131,12 +1131,15 @@ static void __init prom_check_platform_support(void)
 					   "ibm,arch-vec-5-platform-support");
 	if (prop_len > 1) {
 		int i;
-		u8 vec[prop_len];
+		u8 vec[8];
 		prom_debug("Found ibm,arch-vec-5-platform-support, len: %d\n",
 			   prop_len);
+		if (prop_len > sizeof(vec))
+			prom_printf("WARNING: ibm,arch-vec-5-platform-support longer "\
+				    " than expected (len: %d)\n", prop_len);
 		prom_getprop(prom.chosen, "ibm,arch-vec-5-platform-support",
 			     &vec, sizeof(vec));
-		for (i = 0; i < prop_len; i += 2) {
+		for (i = 0; i < sizeof(vec); i += 2) {
 			prom_debug("%d: index = 0x%x val = 0x%x\n", i / 2,
 				   vec[i], vec[i + 1]);
--
2.13.6
Re: [PATCH] KVM: PPC: Book3S HV: fix handling for interrupted H_ENTER_NESTED
> [... truncated quoted register dump from the patch description: PVR,
> VRSAVE, SPRG0-7, HSRR0/1, CFAR, LPCR, PTCR, DAR, DSISR ...]
>
> Fix this by setting vcpu->arch.hcall_needed = 0 to indicate completion
> of H_ENTER_NESTED before we exit to L0 userspace.

Nice Catch :)

Reviewed-by: Suraj Jitindar Singh

> Cc: linuxppc-...@ozlabs.org
> Cc: David Gibson
> Cc: Paul Mackerras
> Cc: Suraj Jitindar Singh
> Signed-off-by: Michael Roth
> ---
>  arch/powerpc/kvm/book3s_hv.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index d65b961661fb..a56f8413758a 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -983,6 +983,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
> 		ret = kvmhv_enter_nested_guest(vcpu);
> 		if (ret == H_INTERRUPT) {
> 			kvmppc_set_gpr(vcpu, 3, 0);
> +			vcpu->arch.hcall_needed = 0;
> 			return -EINTR;
> 		}
> 		break;
Re: [PATCH] KVM: PPC: Book3S HV: NULL check before some freeing functions is not needed.
On Sun, 2018-12-02 at 21:52 +0100, Thomas Meyer wrote:
> NULL check before some freeing functions is not needed.

Technically true, however I think a comment should be added then to make
it clearer to someone reading the code why this is ok. See below.

Suraj.

>
> Signed-off-by: Thomas Meyer
> ---
>
> diff -u -p a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -1252,8 +1252,7 @@ static long int __kvmhv_nested_page_faul
> 	rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
> 	ret = kvmppc_create_pte(kvm, gp->shadow_pgtable, pte, n_gpa, level,
> 				mmu_seq, gp->shadow_lpid, rmapp, &n_rmap);
> -	if (n_rmap)
> -		kfree(n_rmap);
> +	kfree(n_rmap);

e.g.
	/* n_rmap set to NULL in kvmppc_create_pte if reference preserved */

> 	if (ret == -EAGAIN)
> 		ret = RESUME_GUEST;	/* Let the guest try again */
[PATCH 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests
This patch series allows for emulated devices to be passed through to
nested guests, irrespective of at which level the device is being
emulated. Note that the emulated device must be using dma, not virtio.

For example, passing through an emulated e1000:

1. Emulate the device at L(n) for L(n+1)

   qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0

2. Assign the VFIO-PCI driver at L(n+1)

   echo :00:00.0 > /sys/bus/pci/drivers/e1000/unbind
   echo :00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
   chmod 666 /dev/vfio/0

3. Pass the device through from L(n+1) to L(n+2)

   qemu-system-ppc64 -device vfio-pci,host=:00:00.0

4. L(n+2) can now access the device which will be emulated at L(n)

Suraj Jitindar Singh (8):
  KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
  KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
  KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
  KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
  KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
  KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest

 arch/powerpc/include/asm/hvcall.h        |   1 +
 arch/powerpc/include/asm/kvm_book3s.h    |  10 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |  13 +
 arch/powerpc/include/asm/kvm_host.h      |   3 +
 arch/powerpc/include/asm/kvm_ppc.h       |   4 ++
 arch/powerpc/kernel/exceptions-64s.S     |   9 +++
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  97 ++
 arch/powerpc/kvm/book3s_hv.c             |  58 ++--
 arch/powerpc/kvm/book3s_hv_nested.c      | 114 +--
 arch/powerpc/kvm/powerpc.c               |  28 +++-
 arch/powerpc/mm/fault.c                  |   1 +
 11 files changed, 323 insertions(+), 15 deletions(-)

--
2.13.6
[PATCH] KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segments
When booting a kvm-pr guest on a POWER9 machine the following message is
observed: "qemu-system-ppc64: KVM does not support 1TiB segments which
guest expects".

This is because the guest is expecting to be able to use 1T segments,
however we don't indicate support for it. This is because we don't set
the BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr()
on POWER9. POWER9 does indeed have support for 1T segments, so add a
case for POWER9 to the switch statement to ensure it is set.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_pr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 4efd65d9e828..82840160c606 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -587,6 +587,7 @@ void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
 	case PVR_POWER8:
 	case PVR_POWER8E:
 	case PVR_POWER8NVL:
+	case PVR_POWER9:
 		vcpu->arch.hflags |= BOOK3S_HFLAG_MULTI_PGSIZE |
 			BOOK3S_HFLAG_NEW_TLBIE;
 		break;
--
2.13.6
[PATCH 1/8] KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the
availability of in-kernel TCE acceleration for VFIO. However it is
currently the case that this is only available on a powernv machine, not
for a pseries machine. Thus make this capability dependent on having the
cpu feature CPU_FTR_HVMODE.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/powerpc.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2869a299c4ed..95859c53a5cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -496,6 +496,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	int r;
 	/* Assume we're using HV mode when the HV module is loaded */
 	int hv_enabled = kvmppc_hv_ops ? 1 : 0;
+	int kvm_on_pseries = !cpu_has_feature(CPU_FTR_HVMODE);

 	if (kvm) {
 		/*
@@ -543,8 +544,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_SPAPR_TCE:
 	case KVM_CAP_SPAPR_TCE_64:
-		/* fallthrough */
+		r = 1;
+		break;
 	case KVM_CAP_SPAPR_TCE_VFIO:
+		r = !kvm_on_pseries;
+		break;
 	case KVM_CAP_PPC_RTAS:
 	case KVM_CAP_PPC_FIXUP_HCALL:
 	case KVM_CAP_PPC_ENABLE_HCALL:
--
2.13.6
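Userspace can probe the capability in the usual way; a minimal sketch
(error handling omitted):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	int main(void)
	{
		int kvm = open("/dev/kvm", O_RDONLY);
		/* with this patch: 0 on pseries hosts, 1 on powernv */
		int r = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_TCE_VFIO);

		printf("KVM_CAP_SPAPR_TCE_VFIO: %d\n", r);
		return 0;
	}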
[PATCH 2/8] KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
There exists a function kvm_is_radix() which is used to determine if a
kvm instance is using the radix mmu. However this only applies to the
first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be
used to determine if the current execution context of the vcpu is radix,
accounting for if the vcpu is running a nested guest. Currently all
nested guests must be radix but this may change in the future.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 13 +
 arch/powerpc/kvm/book3s_hv_nested.c      |  1 +
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 6d298145d564..7a9e472f2872 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -55,6 +55,7 @@ struct kvm_nested_guest {
 	cpumask_t need_tlb_flush;
 	cpumask_t cpu_in_guest;
 	short prev_cpu[NR_CPUS];
+	u8 radix;			/* is this nested guest radix */
 };

 /*
@@ -150,6 +151,18 @@ static inline bool kvm_is_radix(struct kvm *kvm)
 	return kvm->arch.radix;
 }

+static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
+{
+	bool radix;
+
+	if (vcpu->arch.nested)
+		radix = vcpu->arch.nested->radix;
+	else
+		radix = kvm_is_radix(vcpu->kvm);
+
+	return radix;
+}
+
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 401d2ecbebc5..4fca462e54c4 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -480,6 +480,7 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid)
 	if (shadow_lpid < 0)
 		goto out_free2;
 	gp->shadow_lpid = shadow_lpid;
+	gp->radix = 1;

 	memset(gp->prev_cpu, -1, sizeof(gp->prev_cpu));

--
2.13.6
[PATCH 3/8] KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This then gives 4 possible quadrants: 0, 1, 2, and 3. When accessing these quadrants the fully qualified address is obtained as follows:

Quadrant | Hypervisor       | Guest
---------+------------------+-----------------
         | EA[0:1] = 0b00   | EA[0:1] = 0b00
    0    | effLPID = 0      | effLPID = LPIDR
         | effPID  = PIDR   | effPID  = PIDR
---------+------------------+-----------------
         | EA[0:1] = 0b01   |
    1    | effLPID = LPIDR  | Invalid Access
         | effPID  = PIDR   |
---------+------------------+-----------------
         | EA[0:1] = 0b10   |
    2    | effLPID = LPIDR  | Invalid Access
         | effPID  = 0      |
---------+------------------+-----------------
         | EA[0:1] = 0b11   | EA[0:1] = 0b11
    3    | effLPID = 0      | effLPID = LPIDR
         | effPID  = 0      | effPID  = 0

In the guest, quadrant 3 is normally used to address the operating system, since this uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space, since the effLPID and effPID are taken from the corresponding registers. In the host, quadrants 0 and 3 are used as above, except the effLPID is always 0, addressing the host itself. Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can then perform accesses to a guest effective address. This means quadrant 1 can be used to address guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address. Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition-scoped translation. Previously this could only be generated from a guest, and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case, so we modify the handler to call bad_page_fault() to check if we were expecting this fault, allowing us to handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception, since quadrants aren't defined for the hash mmu.
Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/mm/fault.c| 1 + 4 files changed, 111 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 09f8e9ba69bc..5883fcce7009 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,10 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *to, unsigned long n); +extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *from, unsigned long n); extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, u64 root, u64 *pte_ret_p); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 89d32bb79d5e..db2691ff4c0b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -995,7 +995,16 @@ EXC_COMMON_BEGIN(h_data_storage_common) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addir3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + ld r4,PACA_EXGEN+EX_DAR(r13) + lwz r5,PACA_EXGEN+EX_DSISR(r13) + std r4,_DAR(r1) + std r5,_DSISR(r1) + li r5,SIGSEGV + bl bad_page_fault +MMU_FTR_SECTION_ELSE bl unknown
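To make the quadrant mechanics concrete, here is a simplified sketch of how an access routine along the lines of the ones added above would form a quadrant 1/2 effective address; this illustrates the bit layout from the table in the commit message, not the patch's exact code.

	/* Simplified sketch: form a quadrant 1/2 EA from a guest EA. With
	 * LPIDR/PIDR loaded with the guest's values, a load/store through
	 * this EA translates via the guest's own page tables. */
	static unsigned long quadrant_ea(unsigned long guest_ea, int pid)
	{
		/* quadrant 2 (effPID = 0) for guest kernel addresses
		 * (pid == 0), quadrant 1 (effPID = PIDR) for user space */
		unsigned long quadrant = pid ? 1UL : 2UL;

		/* replace the top two EA bits with the quadrant number */
		return (guest_ea & ~(3UL << 62)) | (quadrant << 62);
	}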
[PATCH 4/8] KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
The kvmppc_ops struct is used to store function pointers to kvm implementation-specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrants 1 and 2. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_ppc.h | 4 arch/powerpc/kvm/book3s_hv.c | 40 ++ 2 files changed, 44 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 9b89b1918dfc..159dd76700cb 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -326,6 +326,10 @@ struct kvmppc_ops { unsigned long flags); void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); int (*enable_nested)(struct kvm *kvm); + int (*load_from_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); + int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index d65b961661fb..6c8b4f632168 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5213,6 +5213,44 @@ static int kvmhv_enable_nested(struct kvm *kvm) return 0; } +static int kvmhv_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, +int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_from_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + +static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_to_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5253,6 +5291,8 @@ static struct kvmppc_ops kvm_ops_hv = { .get_rmmu_info = kvmhv_get_rmmu_info, .set_smt_mode = kvmhv_set_smt_mode, .enable_nested = kvmhv_enable_nested, + .load_from_eaddr = kvmhv_load_from_eaddr, + .store_to_eaddr = kvmhv_store_to_eaddr, }; static int kvm_init_subcore_bitmap(void) -- 2.13.6
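The error convention these ops establish (0 = done, -EAGAIN = retry once the translation is faulted in, anything else = fall back to another access method) is what the next patch builds on. An illustrative caller is sketched below; read_guest_ea() is a made-up name, not a function from this series.

	/* Illustrative caller of the new op; not code from this series. */
	static int read_guest_ea(struct kvm_vcpu *vcpu, ulong ea,
				 void *buf, int len)
	{
		struct kvmppc_ops *ops = vcpu->kvm->arch.kvm_ops;

		if (!ops || !ops->load_from_eaddr)
			return -EOPNOTSUPP;	/* op not provided, e.g. kvm-pr */
		return ops->load_from_eaddr(vcpu, &ea, buf, len);
	}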
[PATCH 5/8] KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. However, this method of access only works for L1 guests and will give incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guest (and an L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fallback method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/powerpc.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 95859c53a5cd..cb029fcab404 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -331,10 +331,17 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, { ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; - int r; + int r = -EINVAL; vcpu->stat.st++; + if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr) + r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr, + size); + + if ((!r) || (r == -EAGAIN)) + return r; + r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, XLATE_WRITE, &pte); if (r < 0) @@ -367,10 +374,17 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, { ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; - int rc; + int rc = -EINVAL; vcpu->stat.ld++; + if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr) + rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr, + size); + + if ((!rc) || (rc == -EAGAIN)) + return rc; + rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, XLATE_READ, &pte); if (rc) -- 2.13.6
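Taken together with the previous patch, the -EAGAIN convention gives emulation code a simple retry idiom. A hedged example of what a caller might look like; the surrounding handler and the source of the EA are illustrative, only the kvmppc_ld() signature is taken from the code above.

	/* Illustrative: fetch 4 bytes of guest data at 'ea' during emulation.
	 * After this patch the quadrant fast path is tried first inside
	 * kvmppc_ld(). */
	u32 val;
	ulong ea = kvmppc_get_gpr(vcpu, ra);	/* 'ra' is hypothetical */
	int rc = kvmppc_ld(vcpu, &ea, sizeof(val), &val, true);

	if (rc == -EAGAIN)
		return RESUME_GUEST;	/* nested: let the translation fault in */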
[PATCH 6/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
Allow a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This is a problem because, at this point, the L2 gpr state has already been written back into L1 memory. To work around this, we store the address in L1 memory of the L2 gpr where the result of the load is to be stored, and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load whose completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_hv_nested.c | 43 ++- arch/powerpc/kvm/powerpc.c| 4 5 files changed, 53 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5883fcce7009..ea94110bfde4 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -311,7 +311,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); -long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu); +long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu); void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fac6f631ed29..7a2483a139cf 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -793,6 +793,7 @@ struct kvm_vcpu_arch { /* For support of nested guests */ struct kvm_nested_guest *nested; u32 nested_vcpu_id; + gpa_t nested_io_gpr; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING @@ -827,6 +828,8 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_FQPR 0x00c0 #define KVM_MMIO_REG_VSX 0x0100 #define KVM_MMIO_REG_VMX 0x0180 +#define KVM_MMIO_REG_NESTED_GPR0xffc0 + #define __KVM_HAVE_ARCH_WQP #define __KVM_HAVE_CREATE_DEVICE diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 6c8b4f632168..e7233499e063 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -984,6 +984,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (ret == H_INTERRUPT) { kvmppc_set_gpr(vcpu, 3, 0); return -EINTR; + } else if (ret == H_TOO_HARD) { + kvmppc_set_gpr(vcpu, 3, 0); + vcpu->arch.hcall_needed = 0; + return RESUME_HOST; } break; case H_TLB_INVALIDATE: @@ -1335,7 +1339,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, return r; } -static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) +static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) { int r; int srcu_idx; @@ -1393,7 +1397,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) */ case BOOK3S_INTERRUPT_H_DATA_STORAGE: srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; case BOOK3S_INTERRUPT_H_INST_STORAGE: @@ -1403,7 +1407,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE) vcpu->arch.fault_dsisr |= DSISR_ISSTORE; srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; @@ -4058,7 +4062,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, if (!nested) r = kvmppc_handle_exit_hv(kvm_run, vcpu, current); else - r = kvmppc_handle_nested_exit(vcpu); +
[PATCH 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 7 ++-- arch/powerpc/kvm/book3s_hv.c | 6 ++- arch/powerpc/kvm/book3s_hv_nested.c| 75 ++ 5 files changed, 89 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33a4fc891947..463c63a9fcf1 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -335,6 +335,7 @@ #define H_SET_PARTITION_TABLE 0xF800 #define H_ENTER_NESTED 0xF804 #define H_TLB_INVALIDATE 0xF808 +#define H_COPY_TOFROM_GUEST0xF80C /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index ea94110bfde4..720483733bb2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n); extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, unsigned long n); extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, @@ -302,6 +305,7 @@ long kvmhv_nested_init(void); void kvmhv_nested_exit(void); void kvmhv_vm_nested_init(struct kvm *kvm); long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index e1e3ef710bd0..da89d10e5886 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -29,9 +29,9 @@ */ static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 }; -static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, - gva_t eaddr, void *to, void *from, - unsigned long n) +unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n) { unsigned long quadrant, ret = n; int old_pid, old_lpid; @@ -82,6 +82,7 @@ static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, return ret; } +EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix); static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, void *from, unsigned long n) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index e7233499e063..e2e15722584a 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -995,7 +995,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (nesting_enabled(vcpu->kvm)) ret = kvmhv_do_nested_tlbie(vcpu); break; - + case H_COPY_TOFROM_GUEST: + ret = H_FUNCTION; + if (nesting_enabled(vcpu->kvm)) + ret = 
kvmhv_copy_tofrom_guest_nested(vcpu); + break; default: return RESUME_HOST; } diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 991f40ce4eea..f54301fcfbe4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -462,6 +462,81 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu) } /* + * Handle the H_COPY_TOFROM_GUEST hcall. + * r4 = L1 lpid of nested guest + * r5 = pid + * r6 = eaddr to access + * r7 = to buffer (L1 gpa) + * r8 = from buffer (L1 gpa) + * r9 = n bytes to copy + */ +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu) +{ + struct kvm_nested_guest *gp; + int l1_lpid = kvmppc_get_gpr(vcpu, 4); + int pid = kvmppc_get_gpr(vcpu, 5); + gva_t
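The register convention in the handler's comment maps directly onto a guest-side wrapper. A minimal sketch of how an L1 hypervisor might issue the hcall, mirroring the call that patch 8/8 adds and assuming the usual plpar_hcall_norets() interface; whether the buffer arguments are EAs or L1 gpas is exactly the question raised in the V2 review further down.

	/* Sketch: L1-side invocation of H_COPY_TOFROM_GUEST. Exactly one of
	 * 'to'/'from' should be non-zero, selecting the copy direction. */
	static long h_copy_tofrom_guest(unsigned long l1_lpid, unsigned long pid,
					unsigned long eaddr, unsigned long to,
					unsigned long from, unsigned long n)
	{
		return plpar_hcall_norets(H_COPY_TOFROM_GUEST, l1_lpid, pid,
					  eaddr, to, from, n);
	}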
[PATCH 8/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 - arch/powerpc/kvm/book3s_hv_nested.c| 5 - 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index da89d10e5886..cf16e9d207a5 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -37,11 +37,10 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, int old_pid, old_lpid; bool is_load = !!to; - /* Can't access quadrants 1 or 2 in non-HV mode */ - if (kvmhv_on_pseries()) { - /* TODO h-call */ - return -EPERM; - } + /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */ + if (kvmhv_on_pseries()) + return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr, + to, from, n); quadrant = 1; if (!pid) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index f54301fcfbe4..acde90eb56f7 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1284,11 +1284,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run, } /* passthrough of emulated MMIO case */ - if (kvmhv_on_pseries()) { - pr_err("emulated MMIO passthrough?\n"); - return -EINVAL; - } - return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, writing); } if (memslot->flags & KVM_MEM_READONLY) { -- 2.13.6
Re: [PATCH 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests
On Fri, 2018-12-07 at 14:43 +1100, Suraj Jitindar Singh wrote: > This patch series allows for emulated devices to be passed through to > nested > guests, irrespective of at which level the device is being emulated. > > Note that the emulated device must be using dma, not virtio. > > For example, passing through an emulated e1000: > > 1. Emulate the device at L(n) for L(n+1) > > qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0 > > 2. Assign the VFIO-PCI driver at L(n+1) > > echo 0000:00:00.0 > /sys/bus/pci/drivers/e1000/unbind > echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind > chmod 666 /dev/vfio/0 > > 3. Pass the device through from L(n+1) to L(n+2) > > qemu-system-ppc64 -device vfio-pci,host=0000:00:00.0 > > 4. L(n+2) can now access the device which will be emulated at L(n) Note, [PATCH] KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segments is not supposed to be part of this series > > Suraj Jitindar Singh (8): > KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines > KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix() > KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2 > KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops > struct > KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants > KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an > L2 > guest > KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access > quadrants > 1 & 2 > KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an > L3 > guest > > arch/powerpc/include/asm/hvcall.h| 1 + > arch/powerpc/include/asm/kvm_book3s.h| 10 ++- > arch/powerpc/include/asm/kvm_book3s_64.h | 13 > arch/powerpc/include/asm/kvm_host.h | 3 + > arch/powerpc/include/asm/kvm_ppc.h | 4 ++ > arch/powerpc/kernel/exceptions-64s.S | 9 +++ > arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 > ++ > arch/powerpc/kvm/book3s_hv.c | 58 ++-- > arch/powerpc/kvm/book3s_hv_nested.c | 114 > +-- > arch/powerpc/kvm/powerpc.c | 28 +++- > arch/powerpc/mm/fault.c | 1 + > 11 files changed, 323 insertions(+), 15 deletions(-) >
[PATCH V2 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests
This patch series allows for emulated devices to be passed through to nested guests, irrespective of at which level the device is being emulated. Note that the emulated device must be using dma, not virtio. For example, passing through an emulated e1000: 1. Emulate the device at L(n) for L(n+1) qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0 2. Assign the VFIO-PCI driver at L(n+1) echo vfio-pci > /sys/bus/pci/devices/0000:00:00.0/driver_override echo 0000:00:00.0 > /sys/bus/pci/drivers/e1000/unbind echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind chmod 666 /dev/vfio/0 3. Pass the device through from L(n+1) to L(n+2) qemu-system-ppc64 -device vfio-pci,host=0000:00:00.0 4. L(n+2) can now access the device which will be emulated at L(n) V1 -> V2: 1/8: None 2/8: None 3/8: None 4/8: None 5/8: None 6/8: Account for L1 differing in endianness in kvmppc_complete_mmio_load() 7/8: None 8/8: None Suraj Jitindar Singh (8): KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix() KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2 KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2 KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest arch/powerpc/include/asm/hvcall.h| 1 + arch/powerpc/include/asm/kvm_book3s.h| 10 ++- arch/powerpc/include/asm/kvm_book3s_64.h | 13 arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/include/asm/kvm_ppc.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 +++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/kvm/book3s_hv.c | 58 ++-- arch/powerpc/kvm/book3s_hv_nested.c | 114 +-- arch/powerpc/kvm/powerpc.c | 30 +++- arch/powerpc/mm/fault.c | 1 + 11 files changed, 325 insertions(+), 15 deletions(-) -- 2.13.6
[PATCH V2 1/8] KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in-kernel TCE acceleration for VFIO. However, this is currently only available on a powernv machine, not on a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/powerpc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2869a299c4ed..95859c53a5cd 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -496,6 +496,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) int r; /* Assume we're using HV mode when the HV module is loaded */ int hv_enabled = kvmppc_hv_ops ? 1 : 0; + int kvm_on_pseries = !cpu_has_feature(CPU_FTR_HVMODE); if (kvm) { /* @@ -543,8 +544,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #ifdef CONFIG_PPC_BOOK3S_64 case KVM_CAP_SPAPR_TCE: case KVM_CAP_SPAPR_TCE_64: - /* fallthrough */ + r = 1; + break; case KVM_CAP_SPAPR_TCE_VFIO: + r = !kvm_on_pseries; + break; case KVM_CAP_PPC_RTAS: case KVM_CAP_PPC_FIXUP_HCALL: case KVM_CAP_PPC_ENABLE_HCALL: -- 2.13.6
[PATCH V2 2/8] KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However, this only applies to the first-level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine whether the current execution context of the vcpu is radix, accounting for whether the vcpu is running a nested guest. Currently all nested guests must be radix, but this may change in the future. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s_64.h | 13 + arch/powerpc/kvm/book3s_hv_nested.c | 1 + 2 files changed, 14 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 6d298145d564..7a9e472f2872 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -55,6 +55,7 @@ struct kvm_nested_guest { cpumask_t need_tlb_flush; cpumask_t cpu_in_guest; short prev_cpu[NR_CPUS]; + u8 radix; /* is this nested guest radix */ }; /* @@ -150,6 +151,18 @@ static inline bool kvm_is_radix(struct kvm *kvm) return kvm->arch.radix; } +static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu) +{ + bool radix; + + if (vcpu->arch.nested) + radix = vcpu->arch.nested->radix; + else + radix = kvm_is_radix(vcpu->kvm); + + return radix; +} + #define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */ #endif diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 401d2ecbebc5..4fca462e54c4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -480,6 +480,7 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid) if (shadow_lpid < 0) goto out_free2; gp->shadow_lpid = shadow_lpid; + gp->radix = 1; memset(gp->prev_cpu, -1, sizeof(gp->prev_cpu)); -- 2.13.6
[PATCH V2 3/8] KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This then gives 4 possible quadrants: 0, 1, 2, and 3. When accessing these quadrants the fully qualified address is obtained as follows:

Quadrant | Hypervisor       | Guest
---------+------------------+-----------------
         | EA[0:1] = 0b00   | EA[0:1] = 0b00
    0    | effLPID = 0      | effLPID = LPIDR
         | effPID  = PIDR   | effPID  = PIDR
---------+------------------+-----------------
         | EA[0:1] = 0b01   |
    1    | effLPID = LPIDR  | Invalid Access
         | effPID  = PIDR   |
---------+------------------+-----------------
         | EA[0:1] = 0b10   |
    2    | effLPID = LPIDR  | Invalid Access
         | effPID  = 0      |
---------+------------------+-----------------
         | EA[0:1] = 0b11   | EA[0:1] = 0b11
    3    | effLPID = 0      | effLPID = LPIDR
         | effPID  = 0      | effPID  = 0

In the guest, quadrant 3 is normally used to address the operating system, since this uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space, since the effLPID and effPID are taken from the corresponding registers. In the host, quadrants 0 and 3 are used as above, except the effLPID is always 0, addressing the host itself. Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can then perform accesses to a guest effective address. This means quadrant 1 can be used to address guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address. Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition-scoped translation. Previously this could only be generated from a guest, and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case, so we modify the handler to call bad_page_fault() to check if we were expecting this fault, allowing us to handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception, since quadrants aren't defined for the hash mmu.
Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/mm/fault.c| 1 + 4 files changed, 111 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 09f8e9ba69bc..5883fcce7009 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,10 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *to, unsigned long n); +extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *from, unsigned long n); extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, u64 root, u64 *pte_ret_p); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 89d32bb79d5e..db2691ff4c0b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -995,7 +995,16 @@ EXC_COMMON_BEGIN(h_data_storage_common) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addir3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + ld r4,PACA_EXGEN+EX_DAR(r13) + lwz r5,PACA_EXGEN+EX_DSISR(r13) + std r4,_DAR(r1) + std r5,_DSISR(r1) + li r5,SIGSEGV + bl bad_page_fault +MMU_FTR_SECTION_ELSE bl unknown
[PATCH V2 4/8] KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
The kvmppc_ops struct is used to store function pointers to kvm implementation-specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrants 1 and 2. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_ppc.h | 4 arch/powerpc/kvm/book3s_hv.c | 40 ++ 2 files changed, 44 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 9b89b1918dfc..159dd76700cb 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -326,6 +326,10 @@ struct kvmppc_ops { unsigned long flags); void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); int (*enable_nested)(struct kvm *kvm); + int (*load_from_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); + int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index a56f8413758a..8a0921176a60 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5214,6 +5214,44 @@ static int kvmhv_enable_nested(struct kvm *kvm) return 0; } +static int kvmhv_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, +int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_from_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + +static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_to_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5254,6 +5292,8 @@ static struct kvmppc_ops kvm_ops_hv = { .get_rmmu_info = kvmhv_get_rmmu_info, .set_smt_mode = kvmhv_set_smt_mode, .enable_nested = kvmhv_enable_nested, + .load_from_eaddr = kvmhv_load_from_eaddr, + .store_to_eaddr = kvmhv_store_to_eaddr, }; static int kvm_init_subcore_bitmap(void) -- 2.13.6
[PATCH V2 5/8] KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address. However, this method of access only works for L1 guests and will give incorrect results for a nested guest. We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guest (and an L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest. At this stage there is no fallback method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/powerpc.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 95859c53a5cd..cb029fcab404 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -331,10 +331,17 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, { ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; - int r; + int r = -EINVAL; vcpu->stat.st++; + if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr) + r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr, + size); + + if ((!r) || (r == -EAGAIN)) + return r; + r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, XLATE_WRITE, &pte); if (r < 0) @@ -367,10 +374,17 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, { ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK; struct kvmppc_pte pte; - int rc; + int rc = -EINVAL; vcpu->stat.ld++; + if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr) + rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr, + size); + + if ((!rc) || (rc == -EAGAIN)) + return rc; + rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST, XLATE_READ, &pte); if (rc) -- 2.13.6
[PATCH V2 6/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
Allow a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This is a problem because, at this point, the L2 gpr state has already been written back into L1 memory. To work around this, we store the address in L1 memory of the L2 gpr where the result of the load is to be stored, and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load whose completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_hv_nested.c | 43 ++- arch/powerpc/kvm/powerpc.c| 6 + 5 files changed, 55 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5883fcce7009..ea94110bfde4 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -311,7 +311,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); -long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu); +long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu); void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fac6f631ed29..7a2483a139cf 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -793,6 +793,7 @@ struct kvm_vcpu_arch { /* For support of nested guests */ struct kvm_nested_guest *nested; u32 nested_vcpu_id; + gpa_t nested_io_gpr; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING @@ -827,6 +828,8 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_FQPR 0x00c0 #define KVM_MMIO_REG_VSX 0x0100 #define KVM_MMIO_REG_VMX 0x0180 +#define KVM_MMIO_REG_NESTED_GPR0xffc0 + #define __KVM_HAVE_ARCH_WQP #define __KVM_HAVE_CREATE_DEVICE diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8a0921176a60..2280bc4778f5 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -985,6 +985,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, 3, 0); vcpu->arch.hcall_needed = 0; return -EINTR; + } else if (ret == H_TOO_HARD) { + kvmppc_set_gpr(vcpu, 3, 0); + vcpu->arch.hcall_needed = 0; + return RESUME_HOST; } break; case H_TLB_INVALIDATE: @@ -1336,7 +1340,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, return r; } -static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) +static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) { int r; int srcu_idx; @@ -1394,7 +1398,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) */ case BOOK3S_INTERRUPT_H_DATA_STORAGE: srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu);
+ r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; case BOOK3S_INTERRUPT_H_INST_STORAGE: @@ -1404,7 +1408,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE) vcpu->arch.fault_dsisr |= DSISR_ISSTORE; srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; @@ -4059,7 +4063,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, if (!nested) r = kvmppc_handle_exit_hv(kvm_run, vcpu, current); else - r = kvmppc_handle_nested_exit(vcpu); +
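The V2 change noted in the cover letter (handling an L1 whose endianness differs from the host) lives in kvmppc_complete_mmio_load(), which is not fully shown above. Below is a hedged sketch of the completion step this patch describes, using the names from the diff; the exact guard and the byte-swap details are assumptions about code not quoted here.

	/* Sketch: completing a nested MMIO load. Instead of writing an L1
	 * gpr, write the result into the L2 gpr image saved in L1 memory,
	 * swapped if L1 runs with the opposite endianness. */
	if (vcpu->arch.io_gpr == KVM_MMIO_REG_NESTED_GPR) {
		if (kvmppc_need_byteswap(vcpu))	/* assumed existing helper */
			gpr = swab64(gpr);
		kvm_vcpu_write_guest(vcpu, vcpu->arch.nested_io_gpr, &gpr,
				     sizeof(gpr));
	}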
[PATCH V2 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 7 ++-- arch/powerpc/kvm/book3s_hv.c | 6 ++- arch/powerpc/kvm/book3s_hv_nested.c| 75 ++ 5 files changed, 89 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33a4fc891947..463c63a9fcf1 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -335,6 +335,7 @@ #define H_SET_PARTITION_TABLE 0xF800 #define H_ENTER_NESTED 0xF804 #define H_TLB_INVALIDATE 0xF808 +#define H_COPY_TOFROM_GUEST0xF80C /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index ea94110bfde4..720483733bb2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n); extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, unsigned long n); extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, @@ -302,6 +305,7 @@ long kvmhv_nested_init(void); void kvmhv_nested_exit(void); void kvmhv_vm_nested_init(struct kvm *kvm); long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index e1e3ef710bd0..da89d10e5886 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -29,9 +29,9 @@ */ static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 }; -static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, - gva_t eaddr, void *to, void *from, - unsigned long n) +unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n) { unsigned long quadrant, ret = n; int old_pid, old_lpid; @@ -82,6 +82,7 @@ static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, return ret; } +EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix); static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, void *from, unsigned long n) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2280bc4778f5..bd07f9b7c5e8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -996,7 +996,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (nesting_enabled(vcpu->kvm)) ret = kvmhv_do_nested_tlbie(vcpu); break; - + case H_COPY_TOFROM_GUEST: + ret = H_FUNCTION; + if (nesting_enabled(vcpu->kvm)) + ret = 
kvmhv_copy_tofrom_guest_nested(vcpu); + break; default: return RESUME_HOST; } diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 991f40ce4eea..f54301fcfbe4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -462,6 +462,81 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu) } /* + * Handle the H_COPY_TOFROM_GUEST hcall. + * r4 = L1 lpid of nested guest + * r5 = pid + * r6 = eaddr to access + * r7 = to buffer (L1 gpa) + * r8 = from buffer (L1 gpa) + * r9 = n bytes to copy + */ +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu) +{ + struct kvm_nested_guest *gp; + int l1_lpid = kvmppc_get_gpr(vcpu, 4); + int pid = kvmppc_get_gpr(vcpu, 5); + gva_t
[PATCH V2 8/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 - arch/powerpc/kvm/book3s_hv_nested.c| 5 - 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index da89d10e5886..cf16e9d207a5 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -37,11 +37,10 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, int old_pid, old_lpid; bool is_load = !!to; - /* Can't access quadrants 1 or 2 in non-HV mode */ - if (kvmhv_on_pseries()) { - /* TODO h-call */ - return -EPERM; - } + /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */ + if (kvmhv_on_pseries()) + return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr, + to, from, n); quadrant = 1; if (!pid) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index f54301fcfbe4..acde90eb56f7 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1284,11 +1284,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run, } /* passthrough of emulated MMIO case */ - if (kvmhv_on_pseries()) { - pr_err("emulated MMIO passthrough?\n"); - return -EINVAL; - } - return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, writing); } if (memslot->flags & KVM_MEM_READONLY) { -- 2.13.6
Re: [PATCH V2 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
On Thu, 2018-12-13 at 16:24 +1100, Paul Mackerras wrote: > On Mon, Dec 10, 2018 at 02:58:24PM +1100, Suraj Jitindar Singh wrote: > > A guest cannot access quadrants 1 or 2 as this would result in an > > exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used > > by a > > guest when it wants to perform an access to quadrants 1 or 2, for > > example when it wants to access memory for one of its nested > > guests. > > > > Also provide an implementation for the kvm-hv module. > > > > Signed-off-by: Suraj Jitindar Singh > > [snip] > > > /* > > + * Handle the H_COPY_TOFROM_GUEST hcall. > > + * r4 = L1 lpid of nested guest > > + * r5 = pid > > + * r6 = eaddr to access > > + * r7 = to buffer (L1 gpa) > > + * r8 = from buffer (L1 gpa) > > Comment says these are GPAs... > > > + * r9 = n bytes to copy > > + */ > > +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu) > > +{ > > + struct kvm_nested_guest *gp; > > + int l1_lpid = kvmppc_get_gpr(vcpu, 4); > > + int pid = kvmppc_get_gpr(vcpu, 5); > > + gva_t eaddr = kvmppc_get_gpr(vcpu, 6); > > + void *gp_to = (void *) kvmppc_get_gpr(vcpu, 7); > > + void *gp_from = (void *) kvmppc_get_gpr(vcpu, 8); > > + void *buf; > > + unsigned long n = kvmppc_get_gpr(vcpu, 9); > > + bool is_load = !!gp_to; > > + long rc; > > + > > + if (gp_to && gp_from) /* One must be NULL to determine the > > direction */ > > + return H_PARAMETER; > > + > > + if (eaddr & (0xFFFUL << 52)) > > + return H_PARAMETER; > > + > > + buf = kzalloc(n, GFP_KERNEL); > > + if (!buf) > > + return H_NO_MEM; > > + > > + gp = kvmhv_get_nested(vcpu->kvm, l1_lpid, false); > > + if (!gp) { > > + rc = H_PARAMETER; > > + goto out_free; > > + } > > + > > + mutex_lock(&gp->tlb_lock); > > + > > + if (is_load) { > > + /* Load from the nested guest into our buffer */ > > + rc = __kvmhv_copy_tofrom_guest_radix(gp- > > >shadow_lpid, pid, > > +eaddr, buf, > > NULL, n); > > + if (rc) > > + goto not_found; > > + > > + /* Write what was loaded into our buffer back to > > the L1 guest */ > > + rc = kvmppc_st(vcpu, (ulong *) &gp_to, n, buf, > > true); > > but using kvmppc_st implies that it is an EA (and in fact when you > call it in the next patch you pass an EA). > > It would be more like other hcalls to pass a GPA, meaning that you > would use kvm_write_guest() here. On the other hand, with the > quadrant access, kvmppc_st() might well be faster than > kvm_write_guest. > > So you need to decide which it is and either fix the comment or > change > the code. Let's stick with gpa for now then for consistency, with room for optimisation. > > Paul.
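The resolution ("stick with gpa") means the handler keeps treating r7/r8 as L1 guest physical addresses, so the copy in/out of L1 memory would go through the gpa-based accessors rather than kvmppc_st()/kvmppc_ld(). A rough sketch of the load direction under that assumption, reusing the variable names from the quoted code:

	/* Sketch: with gp_to holding an L1 gpa, write the bytes loaded from
	 * the nested guest back into L1 memory via kvm_write_guest(). */
	if (is_load) {
		rc = __kvmhv_copy_tofrom_guest_radix(gp->shadow_lpid, pid,
						     eaddr, buf, NULL, n);
		if (!rc)
			rc = kvm_write_guest(vcpu->kvm,
					     (gpa_t)(unsigned long)gp_to,
					     buf, n);
	}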
[PATCH V3 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests
This patch series allows for emulated devices to be passed through to nested guests, irrespective of at which level the device is being emulated. Note that the emulated device must be using dma, not virtio. For example, passing through an emulated e1000: 1. Emulate the device at L(n) for L(n+1) qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0 2. Assign the VFIO-PCI driver at L(n+1) echo vfio-pci > /sys/bus/pci/devices/0000:00:00.0/driver_override echo 0000:00:00.0 > /sys/bus/pci/drivers/e1000/unbind echo 0000:00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind chmod 666 /dev/vfio/0 3. Pass the device through from L(n+1) to L(n+2) qemu-system-ppc64 -device vfio-pci,host=0000:00:00.0 4. L(n+2) can now access the device which will be emulated at L(n) V2 -> V3: 1/8: None 2/8: None 3/8: None 4/8: None 5/8: None 6/8: None 7/8: Use guest physical address for the args in H_COPY_TOFROM_GUEST to match the comment. 8/8: None Suraj Jitindar Singh (8): KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix() KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2 KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2 KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest arch/powerpc/include/asm/hvcall.h| 1 + arch/powerpc/include/asm/kvm_book3s.h| 10 ++- arch/powerpc/include/asm/kvm_book3s_64.h | 13 arch/powerpc/include/asm/kvm_host.h | 3 + arch/powerpc/include/asm/kvm_ppc.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 +++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/kvm/book3s_hv.c | 58 ++-- arch/powerpc/kvm/book3s_hv_nested.c | 114 +-- arch/powerpc/kvm/powerpc.c | 30 +++- arch/powerpc/mm/fault.c | 1 + 11 files changed, 325 insertions(+), 15 deletions(-) -- 2.13.6
[PATCH V3 1/8] KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in-kernel TCE acceleration for VFIO. However, this is currently only available on a powernv machine, not on a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/powerpc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2869a299c4ed..95859c53a5cd 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -496,6 +496,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) int r; /* Assume we're using HV mode when the HV module is loaded */ int hv_enabled = kvmppc_hv_ops ? 1 : 0; + int kvm_on_pseries = !cpu_has_feature(CPU_FTR_HVMODE); if (kvm) { /* @@ -543,8 +544,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #ifdef CONFIG_PPC_BOOK3S_64 case KVM_CAP_SPAPR_TCE: case KVM_CAP_SPAPR_TCE_64: - /* fallthrough */ + r = 1; + break; case KVM_CAP_SPAPR_TCE_VFIO: + r = !kvm_on_pseries; + break; case KVM_CAP_PPC_RTAS: case KVM_CAP_PPC_FIXUP_HCALL: case KVM_CAP_PPC_ENABLE_HCALL: -- 2.13.6
[PATCH V3 2/8] KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However, this only applies to the first-level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine whether the current execution context of the vcpu is radix, accounting for whether the vcpu is running a nested guest. Currently all nested guests must be radix, but this may change in the future. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s_64.h | 13 + arch/powerpc/kvm/book3s_hv_nested.c | 1 + 2 files changed, 14 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 6d298145d564..7a9e472f2872 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -55,6 +55,7 @@ struct kvm_nested_guest { cpumask_t need_tlb_flush; cpumask_t cpu_in_guest; short prev_cpu[NR_CPUS]; + u8 radix; /* is this nested guest radix */ }; /* @@ -150,6 +151,18 @@ static inline bool kvm_is_radix(struct kvm *kvm) return kvm->arch.radix; } +static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu) +{ + bool radix; + + if (vcpu->arch.nested) + radix = vcpu->arch.nested->radix; + else + radix = kvm_is_radix(vcpu->kvm); + + return radix; +} + #define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */ #endif diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 401d2ecbebc5..4fca462e54c4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -480,6 +480,7 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid) if (shadow_lpid < 0) goto out_free2; gp->shadow_lpid = shadow_lpid; + gp->radix = 1; memset(gp->prev_cpu, -1, sizeof(gp->prev_cpu)); -- 2.13.6
[PATCH V3 3/8] KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This then gives 4 possible quadrants: 0, 1, 2, and 3. When accessing these quadrants the fully qualified address is obtained as follows:

Quadrant | Hypervisor       | Guest
---------+------------------+-----------------
         | EA[0:1] = 0b00   | EA[0:1] = 0b00
    0    | effLPID = 0      | effLPID = LPIDR
         | effPID  = PIDR   | effPID  = PIDR
---------+------------------+-----------------
         | EA[0:1] = 0b01   |
    1    | effLPID = LPIDR  | Invalid Access
         | effPID  = PIDR   |
---------+------------------+-----------------
         | EA[0:1] = 0b10   |
    2    | effLPID = LPIDR  | Invalid Access
         | effPID  = 0      |
---------+------------------+-----------------
         | EA[0:1] = 0b11   | EA[0:1] = 0b11
    3    | effLPID = 0      | effLPID = LPIDR
         | effPID  = 0      | effPID  = 0

In the guest, quadrant 3 is normally used to address the operating system, since this uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space, since the effLPID and effPID are taken from the corresponding registers. In the host, quadrants 0 and 3 are used as above, except the effLPID is always 0, addressing the host itself. Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can then perform accesses to a guest effective address. This means quadrant 1 can be used to address guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address. Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition-scoped translation. Previously this could only be generated from a guest, and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case, so we modify the handler to call bad_page_fault() to check if we were expecting this fault, allowing us to handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception, since quadrants aren't defined for the hash mmu.
Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/mm/fault.c| 1 + 4 files changed, 111 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 09f8e9ba69bc..5883fcce7009 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,10 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *to, unsigned long n); +extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *from, unsigned long n); extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, u64 root, u64 *pte_ret_p); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 89d32bb79d5e..db2691ff4c0b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -995,7 +995,16 @@ EXC_COMMON_BEGIN(h_data_storage_common) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addir3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + ld r4,PACA_EXGEN+EX_DAR(r13) + lwz r5,PACA_EXGEN+EX_DSISR(r13) + std r4,_DAR(r1) + std r5,_DSISR(r1) + li r5,SIGSEGV + bl bad_page_fault +MMU_FTR_SECTION_ELSE bl unknown
[PATCH V3 4/8] KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
The kvmppc_ops struct is used to store function pointers to kvm implementation specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrant 1 and 2. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_ppc.h | 4 arch/powerpc/kvm/book3s_hv.c | 40 ++ 2 files changed, 44 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 9b89b1918dfc..159dd76700cb 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -326,6 +326,10 @@ struct kvmppc_ops { unsigned long flags); void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); int (*enable_nested)(struct kvm *kvm); + int (*load_from_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); + int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index a56f8413758a..8a0921176a60 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5214,6 +5214,44 @@ static int kvmhv_enable_nested(struct kvm *kvm) return 0; } +static int kvmhv_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, +int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_from_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + +static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_to_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5254,6 +5292,8 @@ static struct kvmppc_ops kvm_ops_hv = { .get_rmmu_info = kvmhv_get_rmmu_info, .set_smt_mode = kvmhv_set_smt_mode, .enable_nested = kvmhv_enable_nested, + .load_from_eaddr = kvmhv_load_from_eaddr, + .store_to_eaddr = kvmhv_store_to_eaddr, }; static int kvm_init_subcore_bitmap(void) -- 2.13.6
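For context, a caller reaches these new ops through the per-VM kvm_ops pointer, guarding against implementations that don't provide them (an assumption here is that e.g. kvm-pr leaves them NULL). A minimal sketch of such a call site, mirroring what the next patch does in kvmppc_ld():

/* Sketch only: try the op if this kvm implementation provides it. */
static int try_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr,
                               void *ptr, int size)
{
        struct kvmppc_ops *ops = vcpu->kvm->arch.kvm_ops;

        if (ops && ops->load_from_eaddr)
                return ops->load_from_eaddr(vcpu, eaddr, ptr, size);
        return -EINVAL; /* not implemented; caller falls back */
}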
[PATCH V3 5/8] KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address.

This method of access however only works for L1 guests and will give incorrect results for a nested guest.

We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guest (and an L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest.

At this stage there is no fallback method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/powerpc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 95859c53a5cd..cb029fcab404 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -331,10 +331,17 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 {
 	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
-	int r;
+	int r = -EINVAL;
 
 	vcpu->stat.st++;
 
+	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr)
+		r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr,
+							    size);
+
+	if ((!r) || (r == -EAGAIN))
+		return r;
+
 	r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
 			 XLATE_WRITE, &pte);
 	if (r < 0)
@@ -367,10 +374,17 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 {
 	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
-	int rc;
+	int rc = -EINVAL;
 
 	vcpu->stat.ld++;
 
+	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr)
+		rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr,
+							      size);
+
+	if ((!rc) || (rc == -EAGAIN))
+		return rc;
+
 	rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
 			 XLATE_READ, &pte);
 	if (rc)
--
2.13.6
[PATCH V3 6/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_hv_nested.c | 43 ++- arch/powerpc/kvm/powerpc.c| 6 + 5 files changed, 55 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5883fcce7009..ea94110bfde4 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -311,7 +311,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); -long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu); +long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu); void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fac6f631ed29..7a2483a139cf 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -793,6 +793,7 @@ struct kvm_vcpu_arch { /* For support of nested guests */ struct kvm_nested_guest *nested; u32 nested_vcpu_id; + gpa_t nested_io_gpr; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING @@ -827,6 +828,8 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_FQPR 0x00c0 #define KVM_MMIO_REG_VSX 0x0100 #define KVM_MMIO_REG_VMX 0x0180 +#define KVM_MMIO_REG_NESTED_GPR0xffc0 + #define __KVM_HAVE_ARCH_WQP #define __KVM_HAVE_CREATE_DEVICE diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8a0921176a60..2280bc4778f5 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -985,6 +985,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, 3, 0); vcpu->arch.hcall_needed = 0; return -EINTR; + } else if (ret == H_TOO_HARD) { + kvmppc_set_gpr(vcpu, 3, 0); + vcpu->arch.hcall_needed = 0; + return RESUME_HOST; } break; case H_TLB_INVALIDATE: @@ -1336,7 +1340,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, return r; } -static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) +static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) { int r; int srcu_idx; @@ -1394,7 +1398,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) */ case BOOK3S_INTERRUPT_H_DATA_STORAGE: srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); 
+ r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; case BOOK3S_INTERRUPT_H_INST_STORAGE: @@ -1404,7 +1408,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE) vcpu->arch.fault_dsisr |= DSISR_ISSTORE; srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; @@ -4059,7 +4063,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, if (!nested) r = kvmppc_handle_exit_hv(kvm_run, vcpu, current); else - r = kvmppc_handle_nested_exit(vcpu); +
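The completion side described in the commit message is not shown in full above; it can be sketched as follows. The field and constant names come from the hunks shown, but the exact control flow here is an assumption:

/*
 * Sketch of the nested-load completion in kvmppc_complete_mmio_load():
 * vcpu->arch.nested_io_gpr holds the L1 gpa of the saved L2 gpr, so the
 * loaded value is written back into L1 memory rather than into an L1
 * register. (The comparison used to recognise the register class is an
 * assumption.)
 */
if (vcpu->arch.io_gpr >= KVM_MMIO_REG_NESTED_GPR) {
        if (kvm_vcpu_write_guest(vcpu, vcpu->arch.nested_io_gpr, &gpr,
                                 sizeof(gpr)))
                return; /* write-back failed; the access will be retried */
}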
[PATCH V3 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 7 ++-- arch/powerpc/kvm/book3s_hv.c | 6 ++- arch/powerpc/kvm/book3s_hv_nested.c| 75 ++ 5 files changed, 89 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33a4fc891947..463c63a9fcf1 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -335,6 +335,7 @@ #define H_SET_PARTITION_TABLE 0xF800 #define H_ENTER_NESTED 0xF804 #define H_TLB_INVALIDATE 0xF808 +#define H_COPY_TOFROM_GUEST0xF80C /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index ea94110bfde4..720483733bb2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n); extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, unsigned long n); extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, @@ -302,6 +305,7 @@ long kvmhv_nested_init(void); void kvmhv_nested_exit(void); void kvmhv_vm_nested_init(struct kvm *kvm); long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index e1e3ef710bd0..da89d10e5886 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -29,9 +29,9 @@ */ static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 }; -static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, - gva_t eaddr, void *to, void *from, - unsigned long n) +unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n) { unsigned long quadrant, ret = n; int old_pid, old_lpid; @@ -82,6 +82,7 @@ static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, return ret; } +EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix); static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, void *from, unsigned long n) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2280bc4778f5..bd07f9b7c5e8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -996,7 +996,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (nesting_enabled(vcpu->kvm)) ret = kvmhv_do_nested_tlbie(vcpu); break; - + case H_COPY_TOFROM_GUEST: + ret = H_FUNCTION; + if (nesting_enabled(vcpu->kvm)) + ret = 
kvmhv_copy_tofrom_guest_nested(vcpu); + break; default: return RESUME_HOST; } diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 991f40ce4eea..5903175751b4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -462,6 +462,81 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu) } /* + * Handle the H_COPY_TOFROM_GUEST hcall. + * r4 = L1 lpid of nested guest + * r5 = pid + * r6 = eaddr to access + * r7 = to buffer (L1 gpa) + * r8 = from buffer (L1 gpa) + * r9 = n bytes to copy + */ +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu) +{ + struct kvm_nested_guest *gp; + int l1_lpid = kvmppc_get_gpr(vcpu, 4); + int pid = kvmppc_get_gpr(vcpu, 5); + gva_t
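The remainder of the handler is not shown above; based on the register interface documented in the comment block, the remaining flow is roughly as follows (an outline, not the actual code):

/*
 * Outline of the rest of kvmhv_copy_tofrom_guest_nested():
 *
 *   gva_t eaddr     = kvmppc_get_gpr(vcpu, 6);
 *   gpa_t gp_to     = kvmppc_get_gpr(vcpu, 7);  // L1 gpa, or 0
 *   gpa_t gp_from   = kvmppc_get_gpr(vcpu, 8);  // L1 gpa, or 0
 *   unsigned long n = kvmppc_get_gpr(vcpu, 9);
 *
 * 1. Look up the nested guest from l1_lpid and take a reference.
 * 2. Allocate a temporary host buffer of n bytes.
 * 3. Read of L2 (gp_to set): copy from the L2 eaddr via the radix
 *    helper into the buffer, then kvm_vcpu_write_guest() to gp_to.
 * 4. Write to L2 (gp_from set): kvm_vcpu_read_guest() from gp_from,
 *    then copy the buffer to the L2 eaddr via the radix helper.
 * 5. Free the buffer, release the nested guest, return H_SUCCESS or
 *    an error such as H_PARAMETER.
 */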
[PATCH V3 8/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 - arch/powerpc/kvm/book3s_hv_nested.c| 5 - 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index da89d10e5886..8522b034a4b2 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -37,11 +37,10 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, int old_pid, old_lpid; bool is_load = !!to; - /* Can't access quadrants 1 or 2 in non-HV mode */ - if (kvmhv_on_pseries()) { - /* TODO h-call */ - return -EPERM; - } + /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */ + if (kvmhv_on_pseries()) + return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr, + __pa(to), __pa(from), n); quadrant = 1; if (!pid) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 5903175751b4..a9db12cbc0fa 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1284,11 +1284,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run, } /* passthrough of emulated MMIO case */ - if (kvmhv_on_pseries()) { - pr_err("emulated MMIO passthrough?\n"); - return -EINVAL; - } - return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, writing); } if (memslot->flags & KVM_MEM_READONLY) { -- 2.13.6
[PATCH V4 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests
This patch series allows for emulated devices to be passed through to nested guests, irrespective of at which level the device is being emulated.

Note that the emulated device must be using dma, not virtio.

For example, passing through an emulated e1000:

1. Emulate the device at L(n) for L(n+1)

   qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0

2. Assign the VFIO-PCI driver at L(n+1)

   echo vfio-pci > /sys/bus/pci/devices/:00:00.0/driver_override
   echo :00:00.0 > /sys/bus/pci/drivers/e1000/unbind
   echo :00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
   chmod 666 /dev/vfio/0

3. Pass the device through from L(n+1) to L(n+2)

   qemu-system-ppc64 -device vfio-pci,host=:00:00.0

4. L(n+2) can now access the device which will be emulated at L(n)

V2 -> V3:
1/8: None
2/8: None
3/8: None
4/8: None
5/8: None
6/8: Add ifdef to fix compilation for some platforms
7/8: None
8/8: None

Suraj Jitindar Singh (8):
  KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
  KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
  KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
  KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops
    struct
  KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2
    guest
  KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants
    1 & 2
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3
    guest

 arch/powerpc/include/asm/hvcall.h        |   1 +
 arch/powerpc/include/asm/kvm_book3s.h    |  10 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |  13
 arch/powerpc/include/asm/kvm_host.h      |   3 +
 arch/powerpc/include/asm/kvm_ppc.h       |   4 ++
 arch/powerpc/kernel/exceptions-64s.S     |   9 +++
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  97 ++
 arch/powerpc/kvm/book3s_hv.c             |  58 ++--
 arch/powerpc/kvm/book3s_hv_nested.c      | 114 +--
 arch/powerpc/kvm/powerpc.c               |  32 -
 arch/powerpc/mm/fault.c                  |   1 +
 11 files changed, 327 insertions(+), 15 deletions(-)

--
2.13.6
[PATCH V4 1/8] KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the availability of in kernel tce acceleration for vfio. However it is currently the case that this is only available on a powernv machine, not for a pseries machine. Thus make this capability dependent on having the cpu feature CPU_FTR_HVMODE. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/powerpc.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 2869a299c4ed..95859c53a5cd 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -496,6 +496,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) int r; /* Assume we're using HV mode when the HV module is loaded */ int hv_enabled = kvmppc_hv_ops ? 1 : 0; + int kvm_on_pseries = !cpu_has_feature(CPU_FTR_HVMODE); if (kvm) { /* @@ -543,8 +544,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #ifdef CONFIG_PPC_BOOK3S_64 case KVM_CAP_SPAPR_TCE: case KVM_CAP_SPAPR_TCE_64: - /* fallthrough */ + r = 1; + break; case KVM_CAP_SPAPR_TCE_VFIO: + r = !kvm_on_pseries; + break; case KVM_CAP_PPC_RTAS: case KVM_CAP_PPC_FIXUP_HCALL: case KVM_CAP_PPC_ENABLE_HCALL: -- 2.13.6
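The effect of this change is visible from userspace through KVM_CHECK_EXTENSION. A small stand-alone sketch to query the capability (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0) {
                perror("open /dev/kvm");
                return 1;
        }
        /* After this patch: non-zero on powernv hosts, 0 on pseries */
        int r = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_TCE_VFIO);
        printf("KVM_CAP_SPAPR_TCE_VFIO: %d\n", r);
        return 0;
}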
[PATCH V4 2/8] KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
There exists a function kvm_is_radix() which is used to determine if a kvm instance is using the radix mmu. However this only applies to the first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can be used to determine if the current execution context of the vcpu is radix, accounting for if the vcpu is running a nested guest. Currently all nested guests must be radix but this may change in the future. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s_64.h | 13 + arch/powerpc/kvm/book3s_hv_nested.c | 1 + 2 files changed, 14 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 6d298145d564..7a9e472f2872 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -55,6 +55,7 @@ struct kvm_nested_guest { cpumask_t need_tlb_flush; cpumask_t cpu_in_guest; short prev_cpu[NR_CPUS]; + u8 radix; /* is this nested guest radix */ }; /* @@ -150,6 +151,18 @@ static inline bool kvm_is_radix(struct kvm *kvm) return kvm->arch.radix; } +static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu) +{ + bool radix; + + if (vcpu->arch.nested) + radix = vcpu->arch.nested->radix; + else + radix = kvm_is_radix(vcpu->kvm); + + return radix; +} + #define KVM_DEFAULT_HPT_ORDER 24 /* 16MB HPT by default */ #endif diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 401d2ecbebc5..4fca462e54c4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -480,6 +480,7 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm *kvm, unsigned int lpid) if (shadow_lpid < 0) goto out_free2; gp->shadow_lpid = shadow_lpid; + gp->radix = 1; memset(gp->prev_cpu, -1, sizeof(gp->prev_cpu)); -- 2.13.6
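A typical call site, as the later patches in this series use it, simply branches on the result; a fragment sketch:

/* Sketch: take the radix quadrant-based access path only when the
 * vcpu's current execution context (which may be a nested guest) is
 * using the radix mmu. */
if (kvmhv_vcpu_is_radix(vcpu))
        rc = kvmhv_copy_from_guest_radix(vcpu, eaddr, buf, len);
else
        rc = -EINVAL;   /* hash guests: quadrants are not defined */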
[PATCH V4 3/8] KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
The POWER9 radix mmu has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This then gives 4 possible quadrants: 0, 1, 2, and 3.

When accessing these quadrants the fully qualified address is obtained as follows:

Quadrant | Hypervisor        | Guest
------------------------------------------------
         | EA[0:1] = 0b00    | EA[0:1] = 0b00
    0    | effLPID = 0       | effLPID = LPIDR
         | effPID  = PIDR    | effPID  = PIDR
------------------------------------------------
         | EA[0:1] = 0b01    |
    1    | effLPID = LPIDR   | Invalid Access
         | effPID  = PIDR    |
------------------------------------------------
         | EA[0:1] = 0b10    |
    2    | effLPID = LPIDR   | Invalid Access
         | effPID  = 0       |
------------------------------------------------
         | EA[0:1] = 0b11    | EA[0:1] = 0b11
    3    | effLPID = 0       | effLPID = LPIDR
         | effPID  = 0       | effPID  = 0
------------------------------------------------

In the guest, quadrant 3 is normally used to address the operating system since this uses effPID = 0 and effLPID = LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space since the effLPID and effPID are taken from the corresponding registers.

In the host, quadrants 0 and 3 are used as above, however the effLPID is always 0 to address the host.

Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can perform accesses to a guest effective address. This means quadrant 1 can be used to address the guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address.

Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition scoped translation. Previously this could only be generated from a guest and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case, so we modify the handler to call bad_page_fault() to check if we were expecting this fault so we can handle it gracefully and just return with an error code. In the hash mmu case we still raise an unknown exception since quadrants aren't defined for the hash mmu.
Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kernel/exceptions-64s.S | 9 arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++ arch/powerpc/mm/fault.c| 1 + 4 files changed, 111 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 09f8e9ba69bc..5883fcce7009 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,10 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *to, unsigned long n); +extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, + void *from, unsigned long n); extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *gpte, u64 root, u64 *pte_ret_p); diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 89d32bb79d5e..db2691ff4c0b 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -995,7 +995,16 @@ EXC_COMMON_BEGIN(h_data_storage_common) bl save_nvgprs RECONCILE_IRQ_STATE(r10, r11) addir3,r1,STACK_FRAME_OVERHEAD +BEGIN_MMU_FTR_SECTION + ld r4,PACA_EXGEN+EX_DAR(r13) + lwz r5,PACA_EXGEN+EX_DSISR(r13) + std r4,_DAR(r1) + std r5,_DSISR(r1) + li r5,SIGSEGV + bl bad_page_fault +MMU_FTR_SECTION_ELSE bl unknown
[PATCH V4 4/8] KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
The kvmppc_ops struct is used to store function pointers to kvm implementation specific functions. Introduce two new functions load_from_eaddr and store_to_eaddr to be used to load from and store to a guest effective address respectively. Also implement these for the kvm-hv module. If we are using the radix mmu then we can call the functions to access quadrant 1 and 2. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_ppc.h | 4 arch/powerpc/kvm/book3s_hv.c | 40 ++ 2 files changed, 44 insertions(+) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 9b89b1918dfc..159dd76700cb 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -326,6 +326,10 @@ struct kvmppc_ops { unsigned long flags); void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr); int (*enable_nested)(struct kvm *kvm); + int (*load_from_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); + int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size); }; extern struct kvmppc_ops *kvmppc_hv_ops; diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index a56f8413758a..8a0921176a60 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -5214,6 +5214,44 @@ static int kvmhv_enable_nested(struct kvm *kvm) return 0; } +static int kvmhv_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, +int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_from_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + +static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr, + int size) +{ + int rc = -EINVAL; + + if (kvmhv_vcpu_is_radix(vcpu)) { + rc = kvmhv_copy_to_guest_radix(vcpu, *eaddr, ptr, size); + + if (rc > 0) + rc = -EINVAL; + } + + /* For now quadrants are the only way to access nested guest memory */ + if (rc && vcpu->arch.nested) + rc = -EAGAIN; + + return rc; +} + static struct kvmppc_ops kvm_ops_hv = { .get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv, .set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv, @@ -5254,6 +5292,8 @@ static struct kvmppc_ops kvm_ops_hv = { .get_rmmu_info = kvmhv_get_rmmu_info, .set_smt_mode = kvmhv_set_smt_mode, .enable_nested = kvmhv_enable_nested, + .load_from_eaddr = kvmhv_load_from_eaddr, + .store_to_eaddr = kvmhv_store_to_eaddr, }; static int kvm_init_subcore_bitmap(void) -- 2.13.6
[PATCH V4 5/8] KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
The functions kvmppc_st and kvmppc_ld are used to access guest memory from the host using a guest effective address. They do so by translating through the process table to obtain a guest real address and then using kvm_read_guest or kvm_write_guest to make the access with the guest real address.

This method of access however only works for L1 guests and will give incorrect results for a nested guest.

We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to perform the access for a nested guest (and an L1 guest). So attempt this method first and fall back to the old method if this fails and we aren't running a nested guest.

At this stage there is no fallback method to perform the access for a nested guest and this is left as a future improvement. For now we will return to the nested guest and rely on the fact that a translation should be faulted in before retrying the access.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/powerpc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 95859c53a5cd..cb029fcab404 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -331,10 +331,17 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 {
 	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
-	int r;
+	int r = -EINVAL;
 
 	vcpu->stat.st++;
 
+	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr)
+		r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr,
+							    size);
+
+	if ((!r) || (r == -EAGAIN))
+		return r;
+
 	r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
 			 XLATE_WRITE, &pte);
 	if (r < 0)
@@ -367,10 +374,17 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 {
 	ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
 	struct kvmppc_pte pte;
-	int rc;
+	int rc = -EINVAL;
 
 	vcpu->stat.ld++;
 
+	if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr)
+		rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr,
+							      size);
+
+	if ((!rc) || (rc == -EAGAIN))
+		return rc;
+
 	rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
 			 XLATE_READ, &pte);
 	if (rc)
--
2.13.6
[PATCH V4 6/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
Allow for a device which is being emulated at L0 (the host) for an L1 guest to be passed through to a nested (L2) guest. The existing kvmppc_hv_emulate_mmio function can be used here. The main challenge is that for a load the result must be stored into the L2 gpr, not an L1 gpr as would normally be the case after going out to qemu to complete the operation. This presents a challenge as at this point the L2 gpr state has been written back into L1 memory. To work around this we store the address in L1 memory of the L2 gpr where the result of the load is to be stored and use the new io_gpr value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for which completion must be done when returning back into the kernel. Then in kvmppc_complete_mmio_load() the resultant value is written into L1 memory at the location of the indicated L2 gpr. Note that we don't currently let an L1 guest emulate a device for an L2 guest which is then passed through to an L3 guest. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h | 3 +++ arch/powerpc/kvm/book3s_hv.c | 12 ++ arch/powerpc/kvm/book3s_hv_nested.c | 43 ++- arch/powerpc/kvm/powerpc.c| 8 +++ 5 files changed, 57 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 5883fcce7009..ea94110bfde4 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -311,7 +311,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu, void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu, struct hv_guest_state *hr); -long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu); +long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu); void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index fac6f631ed29..7a2483a139cf 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -793,6 +793,7 @@ struct kvm_vcpu_arch { /* For support of nested guests */ struct kvm_nested_guest *nested; u32 nested_vcpu_id; + gpa_t nested_io_gpr; #endif #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING @@ -827,6 +828,8 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_FQPR 0x00c0 #define KVM_MMIO_REG_VSX 0x0100 #define KVM_MMIO_REG_VMX 0x0180 +#define KVM_MMIO_REG_NESTED_GPR0xffc0 + #define __KVM_HAVE_ARCH_WQP #define __KVM_HAVE_CREATE_DEVICE diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 8a0921176a60..2280bc4778f5 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -985,6 +985,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, 3, 0); vcpu->arch.hcall_needed = 0; return -EINTR; + } else if (ret == H_TOO_HARD) { + kvmppc_set_gpr(vcpu, 3, 0); + vcpu->arch.hcall_needed = 0; + return RESUME_HOST; } break; case H_TLB_INVALIDATE: @@ -1336,7 +1340,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, return r; } -static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) +static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu *vcpu) { int r; int srcu_idx; @@ -1394,7 +1398,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) */ case BOOK3S_INTERRUPT_H_DATA_STORAGE: srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = 
kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; case BOOK3S_INTERRUPT_H_INST_STORAGE: @@ -1404,7 +1408,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE) vcpu->arch.fault_dsisr |= DSISR_ISSTORE; srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); - r = kvmhv_nested_page_fault(vcpu); + r = kvmhv_nested_page_fault(run, vcpu); srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); break; @@ -4059,7 +4063,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, if (!nested) r = kvmppc_handle_exit_hv(kvm_run, vcpu, current); else - r = kvmppc_handle_nested_exit(vcpu);
[PATCH V4 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
A guest cannot access quadrants 1 or 2 as this would result in an exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a guest when it wants to perform an access to quadrants 1 or 2, for example when it wants to access memory for one of its nested guests. Also provide an implementation for the kvm-hv module. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/hvcall.h | 1 + arch/powerpc/include/asm/kvm_book3s.h | 4 ++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 7 ++-- arch/powerpc/kvm/book3s_hv.c | 6 ++- arch/powerpc/kvm/book3s_hv_nested.c| 75 ++ 5 files changed, 89 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h index 33a4fc891947..463c63a9fcf1 100644 --- a/arch/powerpc/include/asm/hvcall.h +++ b/arch/powerpc/include/asm/hvcall.h @@ -335,6 +335,7 @@ #define H_SET_PARTITION_TABLE 0xF800 #define H_ENTER_NESTED 0xF804 #define H_TLB_INVALIDATE 0xF808 +#define H_COPY_TOFROM_GUEST0xF80C /* Values for 2nd argument to H_SET_MODE */ #define H_SET_MODE_RESOURCE_SET_CIABR 1 diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index ea94110bfde4..720483733bb2 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned long hc); extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long ea, unsigned long dsisr); +extern unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n); extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, unsigned long n); extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, @@ -302,6 +305,7 @@ long kvmhv_nested_init(void); void kvmhv_nested_exit(void); void kvmhv_vm_nested_init(struct kvm *kvm); long kvmhv_set_partition_table(struct kvm_vcpu *vcpu); +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu); void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1); void kvmhv_release_all_nested(struct kvm *kvm); long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu); diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index e1e3ef710bd0..da89d10e5886 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -29,9 +29,9 @@ */ static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 }; -static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, - gva_t eaddr, void *to, void *from, - unsigned long n) +unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, + gva_t eaddr, void *to, void *from, + unsigned long n) { unsigned long quadrant, ret = n; int old_pid, old_lpid; @@ -82,6 +82,7 @@ static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, return ret; } +EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix); static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to, void *from, unsigned long n) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 2280bc4778f5..bd07f9b7c5e8 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -996,7 +996,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (nesting_enabled(vcpu->kvm)) ret = kvmhv_do_nested_tlbie(vcpu); break; - + case H_COPY_TOFROM_GUEST: + ret = H_FUNCTION; + if (nesting_enabled(vcpu->kvm)) + ret = 
kvmhv_copy_tofrom_guest_nested(vcpu); + break; default: return RESUME_HOST; } diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 991f40ce4eea..5903175751b4 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -462,6 +462,81 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu) } /* + * Handle the H_COPY_TOFROM_GUEST hcall. + * r4 = L1 lpid of nested guest + * r5 = pid + * r6 = eaddr to access + * r7 = to buffer (L1 gpa) + * r8 = from buffer (L1 gpa) + * r9 = n bytes to copy + */ +long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu) +{ + struct kvm_nested_guest *gp; + int l1_lpid = kvmppc_get_gpr(vcpu, 4); + int pid = kvmppc_get_gpr(vcpu, 5); + gva_t
[PATCH V4 8/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest
Previously when a device was being emulated by an L1 guest for an L2 guest, that device couldn't then be passed through to an L3 guest. This was because the L1 guest had no method for accessing L3 memory. The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for passthrough can now be allowed. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 - arch/powerpc/kvm/book3s_hv_nested.c| 5 - 2 files changed, 4 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index da89d10e5886..8522b034a4b2 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -37,11 +37,10 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid, int old_pid, old_lpid; bool is_load = !!to; - /* Can't access quadrants 1 or 2 in non-HV mode */ - if (kvmhv_on_pseries()) { - /* TODO h-call */ - return -EPERM; - } + /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */ + if (kvmhv_on_pseries()) + return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr, + __pa(to), __pa(from), n); quadrant = 1; if (!pid) diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index 5903175751b4..a9db12cbc0fa 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1284,11 +1284,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run, } /* passthrough of emulated MMIO case */ - if (kvmhv_on_pseries()) { - pr_err("emulated MMIO passthrough?\n"); - return -EINVAL; - } - return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, writing); } if (memslot->flags & KVM_MEM_READONLY) { -- 2.13.6
[PATCH 0/2] Fix handling of h_set_dawr
Series contains 2 patches to fix the host in-kernel handling of the hcall h_set_dawr.

The first patch, from Michael Neuling, is just a resend added here for clarity.

Michael Neuling (1):
  KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()

Suraj Jitindar Singh (1):
  KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in
    real mode

 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

--
2.13.6
[PATCH 1/2] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
From: Michael Neuling Commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") screwed up some assembler and corrupted a pointer in r3. This resulted in crashes like the below: [ 44.374746] BUG: Kernel NULL pointer dereference at 0x13bf [ 44.374848] Faulting instruction address: 0xc010b044 [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 44.375018] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi failover [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3 [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: c010aff4 [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) [ 44.375691] MSR: 8280b033 CR: 42244842 XER: [ 44.375815] CFAR: c010aff8 DAR: 13bf DSISR: 4200 IRQMASK: 0 [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 c00808a04300 [ 44.375815] GPR04: 0003 2444b05d c017f11c45d0 [ 44.375815] GPR08: 07803e018dfe 0028 0001 0075 [ 44.375815] GPR12: c010aff4 c7ff6300 [ 44.375815] GPR16: c017f11d c017f11ca7a8 [ 44.375815] GPR20: c017f11c42ec 000a [ 44.375815] GPR24: fffc c017f11c c1a77ed8 [ 44.375815] GPR28: c0179af7 fffc c008089ff170 c0179ae88540 [ 44.376673] NIP [c010b044] kvmppc_h_set_dabr+0x50/0x68 [ 44.376754] LR [c008089dacf4] kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] [ 44.376849] Call Trace: [ 44.376886] [c0179b3979a0] [c017f11c] 0xc017f11c (unreliable) [ 44.376982] [c0179b397a10] [c008089dd6bc] kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] [ 44.377084] [c0179b397ae0] [c008093f8bcc] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 44.377185] [c0179b397b00] [c008093f522c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [ 44.377286] [c0179b397b90] [c008093e3618] kvm_vcpu_ioctl+0x460/0x850 [kvm] [ 44.377384] [c0179b397d00] [c04ba6c4] do_vfs_ioctl+0xe4/0xb40 [ 44.377464] [c0179b397db0] [c04bb1e4] ksys_ioctl+0xc4/0x110 [ 44.377547] [c0179b397e00] [c04bb258] sys_ioctl+0x28/0x80 [ 44.377628] [c0179b397e20] [c000b888] system_call+0x5c/0x70 [ 44.377712] Instruction dump: [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 896b 2c2b 3860 [ 44.377862] 4d820020 50852e74 508516f6 78840724 f8a313c8 7c942ba6 7cbc2ba6 Fix the bug by only changing r3 when we are returning immediately. 
Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling
Reported-by: Cédric Le Goater
--
mpe: This is for 5.2 fixes

v2: Review from Christophe Leroy
    - De-Mikey/Cedric-ify commit message
    - Add "Fixes:"
    - Other trivial commit messages changes
    - No code change
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d885a5831daa..703cd6cd994d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2500,8 +2500,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	LOAD_REG_ADDR(r11, dawr_force_enable)
 	lbz	r11, 0(r11)
 	cmpdi	r11, 0
+	bne	3f
 	li	r3, H_HARDWARE
-	beqlr
+	blr
+3:
 	/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 	rlwimi	r5, r4, 5, DAWRX_DR | DAWRX_DW
 	rlwimi	r5, r4, 2, DAWRX_WT
--
2.13.6
[PATCH 2/2] KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
The hcall H_SET_DAWR is used by a guest to set the data address watchpoint register (DAWR). This hcall is handled in the host in kvmppc_h_set_dawr(), which can be called in either real mode on the guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S, or in virtual mode when called from kvmppc_pseries_do_hcall() in book3s_hv.c.

The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in the vcpu struct accordingly and then also writes the respective values into the DAWR and DAWRX registers directly. It is necessary to write the registers directly here when calling the function in real mode since the path to re-enter the guest won't do this. However when in virtual mode the host DAWR and DAWRX values have already been restored, and so writing the registers would overwrite these. Additionally there is no reason to write the guest values here as these will be read from the vcpu struct and written to the registers appropriately the next time the vcpu is run. This also avoids the case when handling h_set_dawr for a nested guest, where the guest hypervisor isn't able to write the DAWR and DAWRX registers directly and must rely on the real hypervisor to do this for it when it calls H_ENTER_NESTED.

Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 703cd6cd994d..337e64468d78 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2510,9 +2510,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	clrrdi	r4, r4, 3
 	std	r4, VCPU_DAWR(r3)
 	std	r5, VCPU_DAWRX(r3)
+	/*
+	 * If we came in through the real mode hcall handler then it is
+	 * necessary to write the registers since the return path won't.
+	 * Otherwise it is sufficient to store them in the vcpu struct as
+	 * they will be loaded next time the vcpu is run.
+	 */
+	mfmsr	r6
+	andi.	r6, r6, MSR_DR		/* in real mode? */
+	bne	4f
 	mtspr	SPRN_DAWR, r4
 	mtspr	SPRN_DAWRX, r5
-	li	r3, 0
+4:	li	r3, 0
 	blr
 
 _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
--
2.13.6
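For readers more comfortable with C, the guard the patch adds corresponds roughly to the following (illustrative only; the authoritative code is the assembly above, and the function name here is hypothetical):

/* Sketch: always record the guest values in the vcpu struct, but only
 * write the SPRs when in real mode (MSR[DR] clear). In virtual mode
 * the host DAWR/DAWRX are live, and the vcpu copies will be loaded on
 * the next guest entry anyway. */
static long h_set_dawr_sketch(struct kvm_vcpu *vcpu, unsigned long dawr,
                              unsigned long dawrx)
{
        vcpu->arch.dawr = dawr;
        vcpu->arch.dawrx = dawrx;
        if (!(mfmsr() & MSR_DR)) {
                mtspr(SPRN_DAWR, dawr);
                mtspr(SPRN_DAWRX, dawrx);
        }
        return 0;       /* H_SUCCESS */
}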
Re: [PATCH 0/2] Fix handling of h_set_dawr
On Mon, 2019-06-17 at 11:06 +0200, Cédric Le Goater wrote:
> On 17/06/2019 09:16, Suraj Jitindar Singh wrote:
> > Series contains 2 patches to fix the host in kernel handling of the
> > hcall h_set_dawr.
> >
> > First patch from Michael Neuling is just a resend added here for
> > clarity.
> >
> > Michael Neuling (1):
> >   KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
> >
> > Suraj Jitindar Singh (1):
> >   KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr
> >     in real mode
>
> Reviewed-by: Cédric Le Goater
>
> and
>
> Tested-by: Cédric Le Goater
>
> but I see slowdowns in nested as if the IPIs were not delivered. Have
> we touch this part in 5.2 ?

Hi,

I've seen the same and tracked it down to decrementer exceptions not
being delivered when the guest is using large decrementer.

I've got a patch I'm about to send so I'll CC you.

Another option is to disable the large decrementer with:
-machine pseries,cap-large-decr=false

Thanks,
Suraj

> Thanks,
>
> C.
[PATCH 1/3] KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
When a guest vcpu moves from one physical thread to another it is necessary for the host to perform a tlb flush on the previous core if another vcpu from the same guest is going to run there. This is because the guest may use the local form of the tlb invalidation instruction, meaning stale tlb entries would persist where it previously ran.

This is handled on guest entry in kvmppc_check_need_tlb_flush(), which calls flush_guest_tlb() to perform the tlb flush. Previously the generic radix__local_flush_tlb_lpid_guest() function was used, however the functionality was reimplemented in flush_guest_tlb() to avoid the trace_tlbie() call as the flushing may be done in real mode.

The reimplementation in flush_guest_tlb() was missing an ERAT invalidation after flushing the tlb. This led to observable memory corruption in the guest due to the caching of stale translations. Fix this by adding the ERAT invalidation.

Fixes: 70ea13f6e609 "KVM: PPC: Book3S HV: Flush TLB on secondary radix threads"
Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_hv_builtin.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 6035d24f1d1d..a46286f73eec 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -833,6 +833,7 @@ static void flush_guest_tlb(struct kvm *kvm)
 		}
 	}
 	asm volatile("ptesync": : :"memory");
+	asm volatile(PPC_INVALIDATE_ERAT : : :"memory");
 }
 
 void kvmppc_check_need_tlb_flush(struct kvm *kvm, int pcpu,
--
2.13.6
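In outline, the corrected flush sequence is the following (a simplified sketch of flush_guest_tlb(); the tlbiel encoding is elided and the helper name is hypothetical):

/* Sketch: invalidate every TLB set used for the guest, order the
 * invalidations with ptesync, then discard the ERAT so that cached
 * effective-to-real translations cannot outlive the TLB entries. */
static void flush_guest_tlb_sketch(int num_sets)
{
        int set;

        for (set = 0; set < num_sets; ++set)
                tlbiel_flush_set(set);  /* hypothetical helper */
        asm volatile("ptesync" : : : "memory");
        asm volatile(PPC_INVALIDATE_ERAT : : : "memory");       /* the fix */
}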
[PATCH 3/3] KVM: PPC: Book3S HV: Clear pending decr exceptions on nested guest entry
If we enter an L1 guest with a pending decrementer exception then this is cleared on guest exit if the guest has written a positive value into the decrementer (indicating that it handled the decrementer exception), since there is no other way to detect that the guest has handled the pending exception and that it should be dequeued.

In the event that the L1 guest tries to run a nested (L2) guest immediately after this and the L2 guest decrementer is negative (which is loaded by L1 before making the H_ENTER_NESTED hcall), then the pending decrementer exception isn't cleared and the L2 entry is blocked since L1 has a pending exception, even though L1 may have already handled the exception and written a positive value for its decrementer. This results in a loop of L1 trying to enter the L2 guest and L0 blocking the entry, with the outcome that L2 never gets to run and hangs.

Fix this by clearing any pending decrementer exceptions when L1 makes the H_ENTER_NESTED hcall, since it won't do this if its decrementer has gone negative. Anyway, its decrementer has been communicated to L0 in the hdec_expires field, and L0 will return control to L1 when this goes negative by delivering an H_DECREMENTER exception.

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"
Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_hv.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 719fd2529eec..4a5eb29b952f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4128,8 +4128,15 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
 
 	preempt_enable();
 
-	/* cancel pending decrementer exception if DEC is now positive */
-	if (get_tb() < vcpu->arch.dec_expires && kvmppc_core_pending_dec(vcpu))
+	/*
+	 * cancel pending decrementer exception if DEC is now positive, or if
+	 * entering a nested guest in which case the decrementer is now owned
+	 * by L2 and the L1 decrementer is provided in hdec_expires
+	 */
+	if (kvmppc_core_pending_dec(vcpu) &&
+	    ((get_tb() < vcpu->arch.dec_expires) ||
+	     (trap == BOOK3S_INTERRUPT_SYSCALL &&
+	      kvmppc_get_gpr(vcpu, 3) == H_ENTER_NESTED)))
 		kvmppc_core_dequeue_dec(vcpu);
 
 	trace_kvm_guest_exit(vcpu);
--
2.13.6
[PATCH 2/3] KVM: PPC: Book3S HV: Sign extend decrementer value if not using large decr
On POWER9 the decrementer can operate in large decrementer mode, where the decrementer is 56 bits and sign-extended to 64 bits. When not operating in this mode the decrementer behaves as a 32 bit decrementer which is NOT sign-extended (as on POWER8).

Currently when reading a guest decrementer value we don't take into account whether the large decrementer is enabled or not, and this means the value will be incorrect when the guest is not using the large decrementer. Fix this by sign extending the value read when the guest isn't using the large decrementer.

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"
Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d3684509da35..719fd2529eec 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3607,6 +3607,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 	vcpu->arch.slb_max = 0;
 	dec = mfspr(SPRN_DEC);
+	if (!(lpcr & LPCR_LD)) /* Sign extend if not using large decrementer */
+		dec = (s32) dec;
 	tb = mftb();
 	vcpu->arch.dec_expires = dec + tb;
 	vcpu->cpu = -1;
--
2.13.6
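A worked example of the failure mode the cast fixes (a sketch; LPCR_LD and the s32/u64 types come from kernel headers):

/* With LPCR[LD] clear the DEC is 32 bits, so a decrementer that has
 * just gone negative reads as 0xffffffff. Interpreted as an unsigned
 * 64-bit value that is ~4 billion timebase ticks in the future;
 * sign-extended it is -1, which is what the hardware means. */
static u64 read_dec_sign_extended(u64 lpcr)
{
        u64 dec = 0xffffffffUL;         /* e.g. value read from SPRN_DEC */

        if (!(lpcr & LPCR_LD))          /* 32-bit decrementer mode */
                dec = (u64)(s32) dec;   /* now -1, not ~2^32 */
        return dec;
}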
[PATCH 1/3] KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
The performance monitoring unit (PMU) registers are saved on guest exit when the guest has set the pmcregs_in_use flag in its lppaca, if it exists, or unconditionally if it doesn't. If a nested guest is being run then the hypervisor doesn't, and in most cases can't, know if the pmu registers are in use, since it doesn't know the location of the lppaca for the nested guest, although it may have one for its immediate guest. This results in the values of these registers being lost across nested guest entry and exit in the case where the nested guest was making use of the performance monitoring facility while its nested guest hypervisor wasn't.

Furthermore, the hypervisor could interrupt a guest hypervisor between when it has loaded up the pmu registers and when it calls H_ENTER_NESTED, or between returning from the nested guest to the guest hypervisor and the guest hypervisor reading the pmu registers, in kvmhv_p9_guest_entry(). This means that it isn't sufficient to just save the pmu registers when entering or exiting a nested guest, but that it is necessary to always save the pmu registers whenever a guest is capable of running nested guests, to ensure the register values aren't lost in the context switch.

Ensure the pmu register values are preserved by always saving their value into the vcpu struct when a guest is capable of running nested guests.

This should have minimal performance impact; however, any impact can be avoided by booting a guest with "-machine pseries,cap-nested-hv=false" on the qemu command line.

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"
Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/kvm/book3s_hv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ec1804f822af..b682a429f3ef 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3654,6 +3654,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 		vcpu->arch.vpa.dirty = 1;
 		save_pmu = lp->pmcregs_in_use;
 	}
+	/* Must save pmu if this guest is capable of running nested guests */
+	save_pmu |= nesting_enabled(vcpu->kvm);
 
 	kvmhv_save_guest_pmu(vcpu, save_pmu);
--
2.13.6
[PATCH 2/3] PPC: PMC: Set pmcregs_in_use in paca when running as LPAR
The ability to run nested guests under KVM means that a guest can also act as a hypervisor for its own nested guest. Currently ppc_set_pmu_inuse() assumes that FW_FEATURE_LPAR is either set, indicating a guest environment, in which case it sets the pmcregs_in_use flag in the lppaca, or not set, indicating a hypervisor environment, in which case it sets the pmcregs_in_use flag in the paca.

The pmcregs_in_use flag in the lppaca is used to communicate this information to a hypervisor and so must be set in a guest environment. The pmcregs_in_use flag in the paca is used by KVM code to determine whether the host state of the performance monitoring unit (PMU) must be saved and restored when running a guest. Thus when a guest also acts as a hypervisor it must set this bit in both places, since it needs to ensure both that the real hypervisor saves its pmu registers when it runs (requires pmcregs_in_use flag in lppaca), and that it saves its own pmu registers when running a nested guest (requires pmcregs_in_use flag in paca).

Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in both the lppaca and the paca when a guest (LPAR) is running with the capability of running its own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE).

Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests"
Signed-off-by: Suraj Jitindar Singh
---
 arch/powerpc/include/asm/pmc.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index dc9a1ca70edf..c6bbe9778d3c 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -27,11 +27,10 @@ static inline void ppc_set_pmu_inuse(int inuse)
 #ifdef CONFIG_PPC_PSERIES
 		get_lppaca()->pmcregs_in_use = inuse;
 #endif
-	} else {
+	}
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-		get_paca()->pmcregs_in_use = inuse;
+	get_paca()->pmcregs_in_use = inuse;
 #endif
-	}
 #endif
 }
--
2.13.6
[PATCH 3/3] KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries
The performance stop status and control register (PSSCR) is used to control the power saving facilities of the processor. This register has various fields, some of which can be modified only in hypervisor state, and others which can be modified in both hypervisor and privileged non-hypervisor state. The bits which can be modified in privileged non-hypervisor state are referred to as guest visible. Currently the L0 hypervisor saves and restores both its own host value and the guest value of the psscr when context switching between the hypervisor and guest. However a nested hypervisor running its own nested guests (as indicated by kvmhv_on_pseries()) doesn't context switch the psscr register. This means that if a nested (L2) guest modifies the psscr, the L1 guest hypervisor will run with that value, and if the L1 guest hypervisor modifies the value and then goes to run the nested (L2) guest again, the L2 psscr value will be lost. Fix this by having the (L1) nested hypervisor save and restore both its host and the guest psscr value when entering and exiting a nested (L2) guest. Note that only the guest visible parts of the psscr are context switched, since this is all the L1 nested hypervisor can access. This is fine, however, as these are the only fields the L0 hypervisor provides guest control of anyway; all other fields are ignored. This could also have been implemented by adding the psscr register to the hv_regs passed to the L0 hypervisor as input to the H_ENTER_NESTED hcall, however this would have meant updating the structure layout and thus required modifications to both the L0 and L1 kernels, whereas the approach used here achieves the same result without any L0 kernel modifications. Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_hv.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b682a429f3ef..cde3f5a4b3e4 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -3569,9 +3569,18 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit, mtspr(SPRN_DEC, vcpu->arch.dec_expires - mftb()); if (kvmhv_on_pseries()) { + /* + * We need to save and restore the guest visible part of the + * psscr (i.e. using SPRN_PSSCR_PR) since the hypervisor + * doesn't do this for us. Note only required if pseries since + * this is done in kvmhv_load_hv_regs_and_go() below otherwise. + */ + unsigned long host_psscr; /* call our hypervisor to load up HV regs and go */ struct hv_guest_state hvregs; + host_psscr = mfspr(SPRN_PSSCR_PR); + mtspr(SPRN_PSSCR_PR, vcpu->arch.psscr); kvmhv_save_hv_regs(vcpu, &hvregs); hvregs.lpcr = lpcr; vcpu->arch.regs.msr = vcpu->arch.shregs.msr; @@ -3590,6 +3599,8 @@ vcpu->arch.shregs.msr = vcpu->arch.regs.msr; vcpu->arch.shregs.dar = mfspr(SPRN_DAR); vcpu->arch.shregs.dsisr = mfspr(SPRN_DSISR); + vcpu->arch.psscr = mfspr(SPRN_PSSCR_PR); + mtspr(SPRN_PSSCR_PR, host_psscr); /* H_CEDE has to be handled now, not later */ if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested && -- 2.13.6
[PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
The virtual real mode addressing (VRMA) mechanism is used when a partition is using HPT (Hash Page Table) translation and performs real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode effective address bits 0:23 are treated as zero (i.e. the access is aliased to 0) and the access is performed using an implicit 1TB SLB entry. The size of the RMA (Real Memory Area) is communicated to the guest as the size of the first memory region in the device tree. Because of the mechanism described above, it can be expected not to exceed 1TB. In the event that the host erroneously represents the RMA as being larger than 1TB, guest accesses in real mode to memory addresses above 1TB will be aliased down to below 1TB. This means that a memory access performed in real mode may differ from one performed in virtual mode for the same memory address, which would likely have unintended consequences. To avoid this outcome have the guest explicitly limit the size of the RMA to the current maximum, which is 1TB. This means that even if the first memory block is larger than 1TB, only the first 1TB should be accessed in real mode. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index 28ced26f2a00..4d0e2cce9cd5 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, * * For guests on platforms before POWER9, we clamp the it limit to 1G * to avoid some funky things such as RTAS bugs etc... + * On POWER9 we limit to 1TB in case the host erroneously told us that + * the RMA was >1TB. Effective address bits 0:23 are treated as zero + * (meaning the access is aliased to zero i.e. addr = addr % 1TB) + * for virtual real mode addressing and so it doesn't make sense to + * have an area larger than 1TB as it can't be addressed. */ if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { ppc64_rma_size = first_memblock_size; if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000); + else + ppc64_rma_size = min_t(u64, ppc64_rma_size, + 1UL << SID_SHIFT_1T); /* Finally limit subsequent allocations */ memblock_set_current_limit(ppc64_rma_size); -- 2.13.6
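To make the aliasing arithmetic concrete: with the top 24 effective address bits forced to zero, a real mode access is taken modulo 1TB, so any part of an advertised RMA above 1TB is unreachable in real mode. A standalone sketch of the alias and the clamp, using SID_SHIFT_1T = 40 as in the kernel headers:

#include <stdint.h>
#include <stdio.h>

#define SID_SHIFT_1T    40                       /* 1TB segments */
#define RMA_LIMIT       (1ULL << SID_SHIFT_1T)   /* 1TB */

/* EA bits 0:23 (the top 24 bits) are treated as zero, so a real mode
 * access effectively wraps modulo 1TB. */
static uint64_t vrma_alias(uint64_t ea)
{
        return ea & (RMA_LIMIT - 1);
}

int main(void)
{
        uint64_t first_memblock_size = 2 * RMA_LIMIT;   /* host claims 2TB */
        uint64_t rma_size = first_memblock_size < RMA_LIMIT ?
                            first_memblock_size : RMA_LIMIT;
        uint64_t ea = RMA_LIMIT + 4096;

        /* a real mode access at 1TB + 4096 really lands at offset 4096 */
        printf("alias(0x%llx) = 0x%llx\n",
               (unsigned long long)ea, (unsigned long long)vrma_alias(ea));
        printf("clamped rma_size = 0x%llx\n", (unsigned long long)rma_size);
        return 0;
}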
Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
On Fri, 2019-07-12 at 23:09 +1000, Michael Ellerman wrote: > Suraj Jitindar Singh writes: > > The virtual real mode addressing (VRMA) mechanism is used when a > > partition is using HPT (Hash Page Table) translation and performs > > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this > > mode effective address bits 0:23 are treated as zero (i.e. the > > access > > is aliased to 0) and the access is performed using an implicit 1TB > > SLB > > entry. > > > > The size of the RMA (Real Memory Area) is communicated to the guest > > as > > the size of the first memory region in the device tree. And because > > of > > the mechanism described above can be expected to not exceed 1TB. In > > the > > event that the host erroneously represents the RMA as being larger > > than > > 1TB, guest accesses in real mode to memory addresses above 1TB will > > be > > aliased down to below 1TB. This means that a memory access > > performed in > > real mode may differ to one performed in virtual mode for the same > > memory > > address, which would likely have unintended consequences. > > > > To avoid this outcome have the guest explicitly limit the size of > > the > > RMA to the current maximum, which is 1TB. This means that even if > > the > > first memory block is larger than 1TB, only the first 1TB should be > > accessed in real mode. > > > > Signed-off-by: Suraj Jitindar Singh > > I added: > > Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry") > Cc: sta...@vger.kernel.org # v4.6+ > > > Which is not exactly correct, but probably good enough? I think we actually want: Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash") Which is what actually caused it to break and for the issue to present itself. > > cheers > > > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c > > b/arch/powerpc/mm/book3s64/hash_utils.c > > index 28ced26f2a00..4d0e2cce9cd5 100644 > > --- a/arch/powerpc/mm/book3s64/hash_utils.c > > +++ b/arch/powerpc/mm/book3s64/hash_utils.c > > @@ -1901,11 +1901,19 @@ void > > hash__setup_initial_memory_limit(phys_addr_t first_memblock_base, > > * > > * For guests on platforms before POWER9, we clamp the it > > limit to 1G > > * to avoid some funky things such as RTAS bugs etc... > > +* On POWER9 we limit to 1TB in case the host erroneously > > told us that > > +* the RMA was >1TB. Effective address bits 0:23 are > > treated as zero > > +* (meaning the access is aliased to zero i.e. addr = addr > > % 1TB) > > +* for virtual real mode addressing and so it doesn't make > > sense to > > +* have an area larger than 1TB as it can't be addressed. > > */ > > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) { > > ppc64_rma_size = first_memblock_size; > > if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) > > ppc64_rma_size = min_t(u64, > > ppc64_rma_size, 0x4000); > > + else > > + ppc64_rma_size = min_t(u64, > > ppc64_rma_size, > > + 1UL << > > SID_SHIFT_1T); > > > > /* Finally limit subsequent allocations */ > > memblock_set_current_limit(ppc64_rma_size); > > -- > > 2.13.6
Re: [PATCH 1/3] KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting
On Sat, 2019-07-13 at 13:47 +1000, Michael Ellerman wrote: > Suraj Jitindar Singh writes: > > The performance monitoring unit (PMU) registers are saved on guest > > exit > > when the guest has set the pmcregs_in_use flag in its lppaca, if it > > exists, or unconditionally if it doesn't. If a nested guest is > > being > > run then the hypervisor doesn't, and in most cases can't, know if > > the > > pmu registers are in use since it doesn't know the location of the > > lppaca > > for the nested guest, although it may have one for its immediate > > guest. > > This results in the values of these registers being lost across > > nested > > guest entry and exit in the case where the nested guest was making > > use > > of the performance monitoring facility while it's nested guest > > hypervisor > > wasn't. > > > > Further more the hypervisor could interrupt a guest hypervisor > > between > > when it has loaded up the pmu registers and it calling > > H_ENTER_NESTED or > > between returning from the nested guest to the guest hypervisor and > > the > > guest hypervisor reading the pmu registers, in > > kvmhv_p9_guest_entry(). > > This means that it isn't sufficient to just save the pmu registers > > when > > entering or exiting a nested guest, but that it is necessary to > > always > > save the pmu registers whenever a guest is capable of running > > nested guests > > to ensure the register values aren't lost in the context switch. > > > > Ensure the pmu register values are preserved by always saving their > > value into the vcpu struct when a guest is capable of running > > nested > > guests. > > > > This should have minimal performance impact however any impact can > > be > > avoided by booting a guest with "-machine pseries,cap-nested- > > hv=false" > > on the qemu commandline. > > > > Fixes: 95a6432ce903 "KVM: PPC: Book3S HV: Streamlined guest > > entry/exit path on P9 for radix guests" > > I'm not clear why this and the next commit are marked as fixing the > above commit. Wasn't it broken prior to that commit as well? That was the commit which introduced the entry path which we use for a nested guest, the path on which we need to be saving and restoring the pmu registers and so where the new code was introduced. It wasn't technically broken prior to that commit since you couldn't run nested prior to that commit, and in fact it's a few commits after that one where we actually enabled the ability to run nested guests. However since that's the code which introduced the nested entry path it seemed like the best fit for the fixes tag for people who will be looking for fixes in that area. Also all the other nested entry path fixes used that fixes tag so it ties them together nicely. Thanks, Suraj > > cheers > > > Signed-off-by: Suraj Jitindar Singh > > --- > > arch/powerpc/kvm/book3s_hv.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/powerpc/kvm/book3s_hv.c > > b/arch/powerpc/kvm/book3s_hv.c > > index ec1804f822af..b682a429f3ef 100644 > > --- a/arch/powerpc/kvm/book3s_hv.c > > +++ b/arch/powerpc/kvm/book3s_hv.c > > @@ -3654,6 +3654,8 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu > > *vcpu, u64 time_limit, > > vcpu->arch.vpa.dirty = 1; > > save_pmu = lp->pmcregs_in_use; > > } > > + /* Must save pmu if this guest is capable of running > > nested guests */ > > + save_pmu |= nesting_enabled(vcpu->kvm); > > > > kvmhv_save_guest_pmu(vcpu, save_pmu); > > > > -- > > 2.13.6
[PATCH] KVM: PPC: Book3S HV: Optimise mmio emulation for devices on FAST_MMIO_BUS
Devices on the KVM_FAST_MMIO_BUS by definition have length zero and are thus used for notification purposes rather than data transfer. For example, eventfd for virtio devices. This means that when emulating mmio instructions which target devices on this bus we can immediately handle them and return without needing to load the instruction from guest memory. For now we restrict this to stores as this is the only use case at present. For a normal guest the effect is negligible, however for a nested guest we save on the order of 5us per access. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index bd2dcfbf00cd..be7bc070eae5 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -442,6 +442,24 @@ int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu, u32 last_inst; /* + * Fast path - check if the guest physical address corresponds to a + * device on the FAST_MMIO_BUS; if so we can avoid loading the + * instruction altogether and just handle it and return. + */ + if (is_store) { + int idx, ret; + + idx = srcu_read_lock(&vcpu->kvm->srcu); + ret = kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, (gpa_t) gpa, 0, + NULL); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + if (!ret) { + kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4); + return RESUME_GUEST; + } + } + + /* * If we fail, we just return to the guest and try executing it again. */ if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) != -- 2.13.6
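The trick here is that a zero-length write can only be claimed by a zero-length (notification-only) device, so a successful kvm_io_bus_write() of length 0 proves the access was a doorbell rather than a data transfer. A simplified standalone sketch of that matching idea; the real logic lives in KVM's io bus code and is more involved:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct io_dev { uint64_t addr; uint64_t len; };

/* A len == 0 device matches exactly a zero-length access at its
 * address, so probing with len 0 can only ever hit such a device,
 * never a real data-transfer region. */
static bool dev_matches(const struct io_dev *dev, uint64_t addr, uint64_t len)
{
        if (dev->len == 0)
                return addr == dev->addr && len == 0;
        return addr >= dev->addr && addr + len <= dev->addr + dev->len;
}

int main(void)
{
        struct io_dev fast = { .addr = 0x10000, .len = 0 };

        printf("%d\n", dev_matches(&fast, 0x10000, 0)); /* 1: fast path hit */
        printf("%d\n", dev_matches(&fast, 0x10000, 4)); /* 0: load the insn */
        return 0;
}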
[PATCH] KVM: PPC: Book3S: Add KVM stat num_[2M/1G]_pages
This adds an entry to the kvm_stats_debugfs directory which provides the number of large (2M or 1G) pages which have been used to set up the guest mappings. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kvm/book3s.c | 3 +++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 15 ++++++++++++++- 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0f98f00da2ea..cbb090010312 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -99,6 +99,8 @@ struct kvm_nested_guest; struct kvm_vm_stat { ulong remote_tlb_flush; + ulong num_2M_pages; + ulong num_1G_pages; }; struct kvm_vcpu_stat { diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index bd1a677dd9e4..3cc5215bdb2e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -39,6 +39,7 @@ #include "book3s.h" #include "trace.h" +#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU /* #define EXIT_DEBUG */ @@ -71,6 +72,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { "pthru_all", VCPU_STAT(pthru_all) }, { "pthru_host", VCPU_STAT(pthru_host) }, { "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) }, + { "num_2M_pages", VM_STAT(num_2M_pages) }, + { "num_1G_pages", VM_STAT(num_1G_pages) }, { NULL } }; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 1b821c6efdef..f55ef071883f 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -403,8 +403,13 @@ void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa, if (!memslot) return; } - if (shift) + if (shift) { /* 1GB or 2MB page */ page_size = 1ul << shift; + if (shift == PMD_SHIFT) + kvm->stat.num_2M_pages--; + else if (shift == PUD_SHIFT) + kvm->stat.num_1G_pages--; + } gpa &= ~(page_size - 1); hpa = old & PTE_RPN_MASK; @@ -878,6 +883,14 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, put_page(page); } + /* Increment number of large pages if we (successfully) inserted one */ + if (!ret) { + if (level == 1) + kvm->stat.num_2M_pages++; + else if (level == 2) + kvm->stat.num_1G_pages++; + } + return ret; } -- 2.13.6
[PATCH v2] KVM: PPC: Book3S: Add KVM stat largepages_[2M/1G]
This adds an entry to the kvm_stats_debugfs directory which provides the number of large (2M or 1G) pages which have been used to set up the guest mappings. Signed-off-by: Suraj Jitindar Singh --- V1 -> V2: - Rename debugfs files from num_[2M/1G]_pages to largepages_[2M/1G] to match x86 arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/kvm/book3s.c | 3 +++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 15 ++++++++++++++- 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0f98f00da2ea..cbb090010312 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -99,6 +99,8 @@ struct kvm_nested_guest; struct kvm_vm_stat { ulong remote_tlb_flush; + ulong num_2M_pages; + ulong num_1G_pages; }; struct kvm_vcpu_stat { diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index bd1a677dd9e4..72fd7d44379b 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -39,6 +39,7 @@ #include "book3s.h" #include "trace.h" +#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM #define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU /* #define EXIT_DEBUG */ @@ -71,6 +72,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { "pthru_all", VCPU_STAT(pthru_all) }, { "pthru_host", VCPU_STAT(pthru_host) }, { "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) }, + { "largepages_2M", VM_STAT(num_2M_pages) }, + { "largepages_1G", VM_STAT(num_1G_pages) }, { NULL } }; diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 1b821c6efdef..f55ef071883f 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -403,8 +403,13 @@ void kvmppc_unmap_pte(struct kvm *kvm, pte_t *pte, unsigned long gpa, if (!memslot) return; } - if (shift) + if (shift) { /* 1GB or 2MB page */ page_size = 1ul << shift; + if (shift == PMD_SHIFT) + kvm->stat.num_2M_pages--; + else if (shift == PUD_SHIFT) + kvm->stat.num_1G_pages--; + } gpa &= ~(page_size - 1); hpa = old & PTE_RPN_MASK; @@ -878,6 +883,14 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu, put_page(page); } + /* Increment number of large pages if we (successfully) inserted one */ + if (!ret) { + if (level == 1) + kvm->stat.num_2M_pages++; + else if (level == 2) + kvm->stat.num_1G_pages++; + } + return ret; } -- 2.13.6
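The VM_STAT()/VCPU_STAT() macros work by recording each counter's byte offset inside its containing structure; the generic debugfs code later reads the counter back through pointer arithmetic. A simplified standalone model of the trick (the real macro also records a KVM_STAT_VM/KVM_STAT_VCPU tag, omitted here):

#include <stddef.h>
#include <stdio.h>

struct kvm_vm_stat { unsigned long remote_tlb_flush, num_2M_pages, num_1G_pages; };
struct kvm { int id; struct kvm_vm_stat stat; };

struct debugfs_item { const char *name; size_t offset; };

/* Record the counter's name and its offset within struct kvm. */
#define VM_STAT(x) { #x, offsetof(struct kvm, stat.x) }

static const struct debugfs_item entries[] = {
        VM_STAT(num_2M_pages),
        VM_STAT(num_1G_pages),
};

int main(void)
{
        struct kvm vm = { .stat = { .num_2M_pages = 7, .num_1G_pages = 1 } };

        for (size_t i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
                /* Recover the counter from the base pointer plus offset */
                unsigned long *p = (unsigned long *)((char *)&vm + entries[i].offset);
                printf("%s = %lu\n", entries[i].name, *p);
        }
        return 0;
}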
Re: [PATCH v6 1/7] kvmppc: Driver to manage pages of secure guest
On Fri, 2019-08-09 at 14:11 +0530, Bharata B Rao wrote: > KVMPPC driver to manage page transitions of secure guest > via H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls. > > H_SVM_PAGE_IN: Move the content of a normal page to secure page > H_SVM_PAGE_OUT: Move the content of a secure page to normal page > > Private ZONE_DEVICE memory equal to the amount of secure memory > available in the platform for running secure guests is created > via a char device. Whenever a page belonging to the guest becomes > secure, a page from this private device memory is used to > represent and track that secure page on the HV side. The movement > of pages between normal and secure memory is done via > migrate_vma_pages() using UV_PAGE_IN and UV_PAGE_OUT ucalls. Hi Bharata, please see my patch where I define the bits which encode the type of the rmap entry: https://patchwork.ozlabs.org/patch/1149791/ Please add an entry for the devm pfn type like: #define KVMPPC_RMAP_PFN_DEVM 0x0200000000000000 /* secure guest devm pfn */ And the following in the appropriate header file: static inline bool kvmppc_rmap_is_pfn_devm(unsigned long *rmapp) { return !!((*rmapp & KVMPPC_RMAP_TYPE_MASK) == KVMPPC_RMAP_PFN_DEVM); } Also see comment below. Thanks, Suraj > > Signed-off-by: Bharata B Rao > --- > arch/powerpc/include/asm/hvcall.h | 4 + > arch/powerpc/include/asm/kvm_book3s_devm.h | 29 ++ > arch/powerpc/include/asm/kvm_host.h | 12 + > arch/powerpc/include/asm/ultravisor-api.h | 2 + > arch/powerpc/include/asm/ultravisor.h | 14 + > arch/powerpc/kvm/Makefile | 3 + > arch/powerpc/kvm/book3s_hv.c | 19 + > arch/powerpc/kvm/book3s_hv_devm.c | 492 > + > 8 files changed, 575 insertions(+) > create mode 100644 arch/powerpc/include/asm/kvm_book3s_devm.h > create mode 100644 arch/powerpc/kvm/book3s_hv_devm.c > [snip] > + > +struct kvmppc_devm_page_pvt { > + unsigned long *rmap; > + unsigned int lpid; > + unsigned long gpa; > +}; > + > +struct kvmppc_devm_copy_args { > + unsigned long *rmap; > + unsigned int lpid; > + unsigned long gpa; > + unsigned long page_shift; > +}; > + > +/* > + * Bits 60:56 in the rmap entry will be used to identify the > + * different uses/functions of rmap. This definition will move > + * to a proper header when all other functions are defined. > + */ > +#define KVMPPC_PFN_DEVM (0x2ULL << 56) > + > +static inline bool kvmppc_is_devm_pfn(unsigned long pfn) > +{ > + return !!(pfn & KVMPPC_PFN_DEVM); > +} > + > +/* > + * Get a free device PFN from the pool > + * > + * Called when a normal page is moved to secure memory (UV_PAGE_IN). > Device > + * PFN will be used to keep track of the secure page on HV side. > + * > + * @rmap here is the slot in the rmap array that corresponds to > @gpa. > + * Thus a non-zero rmap entry indicates that the corresponding guest > + * page has become secure, and is not mapped on the HV side. > + * > + * NOTE: In this and subsequent functions, we pass around and access > + * individual elements of kvm_memory_slot->arch.rmap[] without any > + * protection. Should we use lock_rmap() here?
> + */ > +static struct page *kvmppc_devm_get_page(unsigned long *rmap, > + unsigned long gpa, unsigned > int lpid) > +{ > + struct page *dpage = NULL; > + unsigned long bit, devm_pfn; > + unsigned long nr_pfns = kvmppc_devm.pfn_last - > + kvmppc_devm.pfn_first; > + unsigned long flags; > + struct kvmppc_devm_page_pvt *pvt; > + > + if (kvmppc_is_devm_pfn(*rmap)) > + return NULL; > + > + spin_lock_irqsave(&kvmppc_devm_lock, flags); > + bit = find_first_zero_bit(kvmppc_devm.pfn_bitmap, nr_pfns); > + if (bit >= nr_pfns) > + goto out; > + > + bitmap_set(kvmppc_devm.pfn_bitmap, bit, 1); > + devm_pfn = bit + kvmppc_devm.pfn_first; > + dpage = pfn_to_page(devm_pfn); > + > + if (!trylock_page(dpage)) > + goto out_clear; > + > + *rmap = devm_pfn | KVMPPC_PFN_DEVM; > + pvt = kzalloc(sizeof(*pvt), GFP_ATOMIC); > + if (!pvt) > + goto out_unlock; > + pvt->rmap = rmap; Am I missing something, why does the rmap need to be stored in pvt? Given the gpa is already stored and this is enough to get back to the rmap entry, right? > + pvt->gpa = gpa; > + pvt->lpid = lpid; > + dpage->zone_device_data = pvt; > + spin_unlock_irqrestore(&kvmppc_devm_lock, flags); > + > + get_page(dpage); > + return dpage; > + > +out_unlock: > + unlock_page(dpage); > +out_clear: > + bitmap_clear(kvmppc_devm.pfn_bitmap, > + devm_pfn - kvmppc_devm.pfn_first, 1); > +out: > + spin_unlock_irqrestore(&kvmppc_devm_lock, flags); > + return NULL; > +} > + > [snip]
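The encoding being requested packs a type code into bits 60:56 of each 64-bit rmap entry, leaving the low bits for the PFN itself. A standalone sketch of the tag-and-test pattern; the KVMPPC_RMAP_TYPE_MASK value below (all of bits 60:56) is an assumption based on the quoted comment, with the authoritative definition in the patch linked above:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define KVMPPC_RMAP_TYPE_MASK   (0x1fULL << 56)        /* bits 60:56 */
#define KVMPPC_RMAP_PFN_DEVM    (0x2ULL << 56)         /* 0x0200000000000000 */

static bool kvmppc_rmap_is_pfn_devm(uint64_t rmap)
{
        return (rmap & KVMPPC_RMAP_TYPE_MASK) == KVMPPC_RMAP_PFN_DEVM;
}

int main(void)
{
        uint64_t devm_pfn = 0x12345;
        uint64_t rmap = devm_pfn | KVMPPC_RMAP_PFN_DEVM; /* tag the entry */

        printf("rmap = 0x%016llx, is_devm = %d, pfn = 0x%llx\n",
               (unsigned long long)rmap, kvmppc_rmap_is_pfn_devm(rmap),
               (unsigned long long)(rmap & ~KVMPPC_RMAP_TYPE_MASK));
        return 0;
}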
Re: ppc64le STRICT_MODULE_RWX and livepatch apply_relocate_add() crashes
Hi Russell, On Mon, 2021-11-01 at 19:20 +1000, Russell Currey wrote: > On Sun, 2021-10-31 at 22:43 -0400, Joe Lawrence wrote: > > Starting with 5.14 kernels, I can reliably reproduce a crash [1] on > > ppc64le when loading livepatches containing late klp-relocations > > [2]. > > These are relocations, specific to livepatching, that are resolved > > not > > when a livepatch module is loaded, but only when a livepatch-target > > module is loaded. > > Hey Joe, thanks for the report. > > > I haven't started looking at a fix yet, but in the case of the x86 > > code > > update, its apply_relocate_add() implementation was modified to use > > a > > common text_poke() function to allow us to drop > > module_{en,dis}ble_ro() games by the livepatching code. > > It should be a similar fix for Power, our patch_instruction() uses a > text poke area but apply_relocate_add() doesn't use it and does its > own > raw patching instead. > > > I can take a closer look this week, but thought I'd send out a > > report > > in case this may be a known todo for STRICT_MODULE_RWX on Power. > > I'm looking into this now, will update when there's progress. I > personally wasn't aware but Jordan flagged this as an issue back in > August [0]. Are the selftests in the klp-convert tree sufficient for > testing? I'm not especially familiar with livepatching & haven't > used > the userspace tools. > You can test this by livepatching any module: the problem only occurs when writing relocations for modules, since the vmlinux relocations are written earlier, before the module text is mapped read-only. - Suraj > - Russell > > [0] https://github.com/linuxppc/issues/issues/375 > > > > > -- Joe > >
Re: [PATCH] KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
On Thu, 2019-06-13 at 10:16 +1000, Michael Neuling wrote: > On Wed, 2019-06-12 at 09:43 +0200, Cédric Le Goater wrote: > > On 12/06/2019 09:22, Michael Neuling wrote: > > > In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 > > > option") I screwed up some assembler and corrupted a pointer in > > > r3. This resulted in crashes like the below from Cédric: > > > > > > [ 44.374746] BUG: Kernel NULL pointer dereference at > > > 0x13bf > > > [ 44.374848] Faulting instruction address: 0xc010b044 > > > [ 44.374906] Oops: Kernel access of bad area, sig: 11 [#1] > > > [ 44.374951] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP > > > NR_CPUS=2048 NUMA pSeries > > > [ 44.375018] Modules linked in: vhost_net vhost tap > > > xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat > > > xt_conntrack nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 > > > ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter > > > ebtables ip6table_filter ip6_tables iptable_filter bpfilter > > > vmx_crypto crct10dif_vpmsum crc32c_vpmsum kvm_hv kvm sch_fq_codel > > > ip_tables x_tables autofs4 virtio_net net_failover virtio_scsi > > > failover > > > [ 44.375401] CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: > > > loaded Not tainted 5.2.0-rc4+ #3 > > > [ 44.375500] NIP: c010b044 LR: c008089dacf4 CTR: > > > c010aff4 > > > [ 44.375604] REGS: c0179b397710 TRAP: 0300 Not > > > tainted (5.2.0-rc4+) > > > [ 44.375691] MSR: 8280b033 > > > CR: 42244842 XER: > > > [ 44.375815] CFAR: c010aff8 DAR: 13bf > > > DSISR: 4200 IRQMASK: 0 > > > [ 44.375815] GPR00: c008089dd6bc c0179b3979a0 > > > c00808a04300 > > > [ 44.375815] GPR04: 0003 > > > 2444b05d c017f11c45d0 > > > [ 44.375815] GPR08: 07803e018dfe 0028 > > > 0001 0075 > > > [ 44.375815] GPR12: c010aff4 c7ff6300 > > > > > > [ 44.375815] GPR16: c017f11d > > > c017f11ca7a8 > > > [ 44.375815] GPR20: c017f11c42ec > > > 000a > > > [ 44.375815] GPR24: fffc > > > c017f11c c1a77ed8 > > > [ 44.375815] GPR28: c0179af7 fffc > > > c008089ff170 c0179ae88540 > > > [ 44.376673] NIP [c010b044] > > > kvmppc_h_set_dabr+0x50/0x68 > > > [ 44.376754] LR [c008089dacf4] > > > kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] > > > [ 44.376849] Call Trace: > > > [ 44.376886] [c0179b3979a0] [c017f11c] > > > 0xc017f11c (unreliable) > > > [ 44.376982] [c0179b397a10] [c008089dd6bc] > > > kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] > > > [ 44.377084] [c0179b397ae0] [c008093f8bcc] > > > kvmppc_vcpu_run+0x34/0x48 [kvm] > > > [ 44.377185] [c0179b397b00] [c008093f522c] > > > kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] > > > [ 44.377286] [c0179b397b90] [c008093e3618] > > > kvm_vcpu_ioctl+0x460/0x850 [kvm] > > > [ 44.377384] [c0179b397d00] [c04ba6c4] > > > do_vfs_ioctl+0xe4/0xb40 > > > [ 44.377464] [c0179b397db0] [c04bb1e4] > > > ksys_ioctl+0xc4/0x110 > > > [ 44.377547] [c0179b397e00] [c04bb258] > > > sys_ioctl+0x28/0x80 > > > [ 44.377628] [c0179b397e20] [c000b888] > > > system_call+0x5c/0x70 > > > [ 44.377712] Instruction dump: > > > [ 44.377765] 4082fff4 4c00012c 3860 4e800020 e96280c0 > > > 896b 2c2b 3860 > > > [ 44.377862] 4d820020 50852e74 508516f6 78840724 > > > f8a313c8 7c942ba6 7cbc2ba6 > > > > > > This fixes the problem by only changing r3 when we are returning > > > immediately. 
> > > > > > Signed-off-by: Michael Neuling > > > Reported-by: Cédric Le Goater > > > > On nested, I still see : > > > > [ 94.609274] Oops: Exception in kernel mode, sig: 4 [#1] > > [ 94.609432] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 > > NUMA pSeries > > [ 94.609596] Modules linked in: vhost_net vhost tap xt_CHECKSUM > > iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack > > nf_conntrack nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 ipt_REJECT > > nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables > > ip6table_filter ip6_tables iptable_filter bpfilter vmx_crypto > > kvm_hv crct10dif_vpmsum crc32c_vpmsum kvm sch_fq_codel ip_tables > > x_tables autofs4 virtio_net virtio_scsi net_failover failover > > [ 94.610179] CPU: 12 PID: 2026 Comm: qemu-system-ppc Kdump: > > loaded Not tainted 5.2.0-rc4+ #6 > > [ 94.610290] NIP: c010b050 LR: c00808bbacf4 CTR: > > c010aff4 > > [ 94.610400] REGS: c017913d7710 TRAP: 0700 Not > > tainted (5.2.0-rc4+) > > [ 94.610493] MSR: 8284b033 > > CR: 42224842 XER:
Re: [PATCH V3 2/2] KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9
On Mon, 2017-05-29 at 20:12 +1000, Paul Mackerras wrote: > This allows userspace (e.g. QEMU) to enable large decrementer mode > for > the guest when running on a POWER9 host, by setting the LPCR_LD bit > in > the guest LPCR value. With this, the guest exit code saves 64 bits > of > the guest DEC value on exit. Other places that use the guest DEC > value check the LPCR_LD bit in the guest LPCR value, and if it is > set, > omit the 32-bit sign extension that would otherwise be done. > > This doesn't change the DEC emulation used by PR KVM because PR KVM > is not supported on POWER9 yet. > > This is partly based on an earlier patch by Oliver O'Halloran. > > Signed-off-by: Paul Mackerras Tested with a hacked up qemu and upstream guest/host (with these patches). Tested-by: Suraj Jitindar Singh > --- > arch/powerpc/include/asm/kvm_host.h | 2 +- > arch/powerpc/kvm/book3s_hv.c | 6 ++ > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 > - > arch/powerpc/kvm/emulate.c | 4 ++-- > 4 files changed, 33 insertions(+), 8 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index 9c51ac4..3f879c8 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -579,7 +579,7 @@ struct kvm_vcpu_arch { > ulong mcsrr0; > ulong mcsrr1; > ulong mcsr; > - u32 dec; > + ulong dec; > #ifdef CONFIG_BOOKE > u32 decar; > #endif > diff --git a/arch/powerpc/kvm/book3s_hv.c > b/arch/powerpc/kvm/book3s_hv.c > index 42b7a4f..9b2eb66 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1143,6 +1143,12 @@ static void kvmppc_set_lpcr(struct kvm_vcpu > *vcpu, u64 new_lpcr, > mask = LPCR_DPFD | LPCR_ILE | LPCR_TC; > if (cpu_has_feature(CPU_FTR_ARCH_207S)) > mask |= LPCR_AIL; > + /* > + * On POWER9, allow userspace to enable large decrementer > for the > + * guest, whether or not the host has it enabled. > + */ > + if (cpu_has_feature(CPU_FTR_ARCH_300)) > + mask |= LPCR_LD; > > /* Broken 32-bit version of LPCR must not clear top bits */ > if (preserve_top32) > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > index e390b38..3c901b5 100644 > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S > @@ -920,7 +920,7 @@ ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300) > mftb r7 > subf r3,r7,r8 > mtspr SPRN_DEC,r3 > - stw r3,VCPU_DEC(r4) > + std r3,VCPU_DEC(r4) > > ld r5, VCPU_SPRG0(r4) > ld r6, VCPU_SPRG1(r4) > @@ -1032,7 +1032,13 @@ kvmppc_cede_reentry: /* r4 = > vcpu, r13 = paca */ > li r0, BOOK3S_INTERRUPT_EXTERNAL > bne cr1, 12f > mfspr r0, SPRN_DEC > - cmpwi r0, 0 > +BEGIN_FTR_SECTION > + /* On POWER9 check whether the guest has large decrementer > enabled */ > + andis. r8, r8, LPCR_LD@h > + bne 15f > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > + extsw r0, r0 > +15: cmpdi r0, 0 > li r0, BOOK3S_INTERRUPT_DECREMENTER > bge 5f > > @@ -1459,12 +1465,18 @@ mc_cont: > mtspr SPRN_SPURR,r4 > > /* Save DEC */ > + ld r3, HSTATE_KVM_VCORE(r13) > mfspr r5,SPRN_DEC > mftb r6 > + /* On P9, if the guest has large decr enabled, don't sign > extend */ > +BEGIN_FTR_SECTION > + ld r4, VCORE_LPCR(r3) > + andis. r4, r4, LPCR_LD@h > + bne 16f > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > extsw r5,r5 > - add r5,r5,r6 > +16: add r5,r5,r6 > /* r5 is a guest timebase value here, convert to host TB */ > - ld r3,HSTATE_KVM_VCORE(r13) > ld r4,VCORE_TB_OFFSET(r3) > subf r5,r4,r5 > std r5,VCPU_DEC_EXPIRES(r9) > @@ -2376,8 +2388,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM) > mfspr r3, SPRN_DEC > mfspr r4, SPRN_HDEC > mftb r5 > +BEGIN_FTR_SECTION > + /* On P9 check whether the guest has large decrementer mode > enabled */ > + ld r6, HSTATE_KVM_VCORE(r13) > + ld r6, VCORE_LPCR(r6) > + andis. r6, r6, LPCR_LD@h > + bne 68f > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) > extsw r3, r3 > - EXTEND_HDEC(r4) > +68: EXTEND_HDEC(r4) > cmpd r3, r4 > ble 67f > mtspr SPRN_DEC, r4 > diff --git a/arch/powerpc/kvm/emulate
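The repeated extsw/LPCR_LD pattern in the assembly above is a sign-extension question in disguise: the legacy decrementer is a signed 32-bit quantity, while large decrementer mode makes more of the register architecturally significant (56 bits on POWER9). A standalone illustration of why the same raw value reads back differently in the two modes:

#include <stdint.h>
#include <stdio.h>

/* Legacy mode: DEC is a signed 32-bit value; extsw-style extension. */
static int64_t dec_legacy(uint64_t raw)
{
        return (int32_t)raw;
}

/* Large decrementer mode (LPCR_LD set): the full value is significant. */
static int64_t dec_large(uint64_t raw)
{
        return (int64_t)raw;
}

int main(void)
{
        uint64_t raw = 0x80000000ULL;   /* bit 31 set */

        printf("legacy: %lld\n", (long long)dec_legacy(raw)); /* -2147483648 */
        printf("large:  %lld\n", (long long)dec_large(raw));  /*  2147483648 */
        return 0;
}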
Re: [PATCH] powerpc/64: Don't try to use radix MMU under a hypervisor
On Tue, 2016-12-20 at 22:40 +1100, Paul Mackerras wrote: > Currently, if the kernel is running on a POWER9 processor under a > hypervisor, it will try to use the radix MMU even though it doesn't > have the necessary code to use radix under a hypervisor (it doesn't > negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL > hcall). The result is that the guest kernel will crash when it tries > to turn on the MMU, because it will still actually be using the HPT > MMU, but it won't have set up any SLB or HPT entries. It does this > because the only thing that the kernel looks at in deciding to use > radix, on any platform, is the ibm,pa-features property on the cpu > device nodes. > > This fixes it by looking for the /chosen/ibm,architecture-vec-5 > property, and if it exists, clearing the radix MMU feature bit. > We do this before we decide whether to initialize for radix or HPT. > This property is created by the hypervisor as a result of the guest > calling the ibm,client-architecture-support method to indicate > its capabilities, so it only exists on systems with a hypervisor. > The reason for using this property is that in future, when we > have support for using radix under a hypervisor, we will need > to check this property to see whether the hypervisor agreed to > us using radix. > > Fixes: 17a3dd2f5fc7 ("powerpc/mm/radix: Use firmware feature to > enable Radix MMU") > Cc: sta...@vger.kernel.org # v4.7+ > Signed-off-by: Paul Mackerras > --- > arch/powerpc/mm/init_64.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c > index a000c35..098531d 100644 > --- a/arch/powerpc/mm/init_64.c > +++ b/arch/powerpc/mm/init_64.c > @@ -42,6 +42,8 @@ > #include > #include > #include > +#include > +#include > > #include > #include > @@ -344,6 +346,28 @@ static int __init parse_disable_radix(char *p) > } > early_param("disable_radix", parse_disable_radix); > > +/* > + * If we're running under a hypervisor, we currently can't do radix > + * since we don't have the code to do the H_REGISTER_PROC_TBL hcall. > + * We tell that we're running under a hypervisor by looking for the > + * /chosen/ibm,architecture-vec-5 property. > + */ > +static void early_check_vec5(void) > +{ > + unsigned long root, chosen; > + int size; > + const u8 *vec5; > + > + root = of_get_flat_dt_root(); > + chosen = of_get_flat_dt_subnode_by_name(root, "chosen"); > + if (chosen == -FDT_ERR_NOTFOUND) > + return; > + vec5 = of_get_flat_dt_prop(chosen, "ibm,architecture-vec-5", > &size); > + if (!vec5) > + return; > + cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX; > +} > + Given that currently radix guest support doesn't exist upstream, it's sufficient to check for the existence of the vec5 node to determine that we are a guest and thus can't run radix. Is it worth checking the specific radix feature bit of the vec5 node so that this code is still correct for determining the lack of radix support by the host platform once guest radix kernels are (in the future) supported? > void __init mmu_early_init_devtree(void) > { > /* Disable radix mode based on kernel command line. */ > @@ -351,6 +375,9 @@ void __init mmu_early_init_devtree(void) > cur_cpu_spec->mmu_features &= ~MMU_FTR_TYPE_RADIX; > > if (early_radix_enabled()) > + early_check_vec5(); > + > + if (early_radix_enabled()) > radix__early_init_devtree(); > else > hash__early_init_devtree();
Re: [PATCH 10/18] KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote: > This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl > for HPT guests on POWER9. With this, we can return 1 for the > KVM_CAP_PPC_MMU_HASH_V3 capability. > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/include/asm/kvm_host.h | 1 + > arch/powerpc/kvm/book3s_hv.c| 35 > +++ > arch/powerpc/kvm/powerpc.c | 2 +- > 3 files changed, 33 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_host.h > b/arch/powerpc/include/asm/kvm_host.h > index e59b172..944532d 100644 > --- a/arch/powerpc/include/asm/kvm_host.h > +++ b/arch/powerpc/include/asm/kvm_host.h > @@ -264,6 +264,7 @@ struct kvm_arch { > atomic_t hpte_mod_interest; > cpumask_t need_tlb_flush; > int hpt_cma_alloc; > + u64 process_table; > struct dentry *debugfs_dir; > struct dentry *htab_dentry; > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > diff --git a/arch/powerpc/kvm/book3s_hv.c > b/arch/powerpc/kvm/book3s_hv.c > index 1736f87..6bd0f4a 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -3092,8 +3092,8 @@ static void kvmppc_setup_partition_table(struct > kvm *kvm) > /* HTABSIZE and HTABORG fields */ > dw0 |= kvm->arch.sdr1; > > - /* Second dword has GR=0; other fields are unused since > UPRT=0 */ > - dw1 = 0; > + /* Second dword as set by userspace */ > + dw1 = kvm->arch.process_table; > > mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1); > } > @@ -3658,10 +3658,37 @@ static void init_default_hcalls(void) > } > } > > -/* dummy implementations for now */ > static int kvmhv_configure_mmu(struct kvm *kvm, struct > kvm_ppc_mmuv3_cfg *cfg) > { > - return -EINVAL; > + unsigned long lpcr; > + > + /* If not on a POWER9, reject it */ > + if (!cpu_has_feature(CPU_FTR_ARCH_300)) > + return -ENODEV; > + > + /* If any unknown flags set, reject it */ > + if (cfg->flags & ~(KVM_PPC_MMUV3_RADIX | > KVM_PPC_MMUV3_GTSE)) > + return -EINVAL; > + > + /* We can't do radix yet */ > + if (cfg->flags & KVM_PPC_MMUV3_RADIX) > + return -EINVAL; > + > + /* GR (guest radix) bit in process_table field must match */ > + if (cfg->process_table & PATB_GR) > + return -EINVAL; > + > + /* Process table size field must be reasonable, i.e. <= 24 > */ > + if ((cfg->process_table & PRTS_MASK) > 24) > + return -EINVAL; > + > + kvm->arch.process_table = cfg->process_table; > + kvmppc_setup_partition_table(kvm); > + > + lpcr = (cfg->flags & KVM_PPC_MMUV3_GTSE) ? LPCR_GTSE : 0; > + kvmppc_update_lpcr(kvm, lpcr, LPCR_GTSE); > + > + return 0; > } > > static int kvmhv_get_rmmu_info(struct kvm *kvm, struct > kvm_ppc_rmmu_info *info) > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 38c0d15..1476a48 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -569,7 +569,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, > long ext) > r = !!(0 && hv_enabled && radix_enabled()); > break; > case KVM_CAP_PPC_MMU_HASH_V3: > - r = !!(0 && hv_enabled && !radix_enabled() && > + r = !!(hv_enabled && !radix_enabled() && Just because we have radix enabled, is it correct to preclude a hash guest from running? Isn't it the case that we may have support for radix but a guest choose to run in hash mode (for what ever reason)? > cpu_has_feature(CPU_FTR_ARCH_300)); > break; > #endif
Re: [PATCH 13/18] KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote: > This adds the code to construct the second-level ("partition-scoped" > in > architecturese) page tables for guests using the radix MMU. Apart > from > the PGD level, which is allocated when the guest is created, the rest > of the tree is all constructed in response to hypervisor page faults. > > As well as hypervisor page faults for missing pages, we also get > faults > for reference/change (RC) bits needing to be set, as well as various > other error conditions. For now, we only set the R or C bit in the > guest page table if the same bit is set in the host PTE for the > backing page. > > This code can take advantage of the guest being backed with either > transparent or ordinary 2MB huge pages, and insert 2MB page entries > into the guest page tables. There is no support for 1GB huge pages > yet. > --- > arch/powerpc/include/asm/kvm_book3s.h | 8 + > arch/powerpc/kvm/book3s.c | 1 + > arch/powerpc/kvm/book3s_64_mmu_hv.c| 7 +- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 385 > + > arch/powerpc/kvm/book3s_hv.c | 17 +- > 5 files changed, 415 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index 7adfcc0..ff5cd5c 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -170,6 +170,8 @@ extern int kvmppc_book3s_hv_page_fault(struct > kvm_run *run, > unsigned long status); > extern long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, > unsigned long slb_v, unsigned long valid); > +extern int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct > kvm_vcpu *vcpu, > + unsigned long gpa, gva_t ea, int is_store); > > extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct > hpte_cache *pte); > extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu > *vcpu); > @@ -182,8 +184,14 @@ extern void kvmppc_mmu_hpte_sysexit(void); > extern int kvmppc_mmu_hv_init(void); > extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, unsigned > long hc); > > +extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run, > + struct kvm_vcpu *vcpu, > + unsigned long ea, unsigned long dsisr); > extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t > eaddr, > struct kvmppc_pte *gpte, bool data, bool > iswrite); > +extern void kvmppc_free_radix(struct kvm *kvm); > +extern int kvmppc_radix_init(void); > +extern void kvmppc_radix_exit(void); > > /* XXX remove this export when load_last_inst() is generic */ > extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, > void *ptr, bool data); > diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c > index 019f008..b6b5c18 100644 > --- a/arch/powerpc/kvm/book3s.c > +++ b/arch/powerpc/kvm/book3s.c > @@ -239,6 +239,7 @@ void kvmppc_core_queue_data_storage(struct > kvm_vcpu *vcpu, ulong dar, > kvmppc_set_dsisr(vcpu, flags); > kvmppc_book3s_queue_irqprio(vcpu, > BOOK3S_INTERRUPT_DATA_STORAGE); > } > +EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage); /* used by > kvm_hv */ > > void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong > flags) > { > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index c208bf3..57690c2 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -395,8 +395,8 @@ static int instruction_is_store(unsigned int > instr) > return (instr & mask) != 0; > } > > -static int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct 
> kvm_vcpu *vcpu, > - unsigned long gpa, gva_t ea, int > is_store) > +int kvmppc_hv_emulate_mmio(struct kvm_run *run, struct kvm_vcpu > *vcpu, > + unsigned long gpa, gva_t ea, int > is_store) > { > u32 last_inst; > > @@ -461,6 +461,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run > *run, struct kvm_vcpu *vcpu, > unsigned long rcbits; > long mmio_update; > > + if (kvm_is_radix(kvm)) > + return kvmppc_book3s_radix_page_fault(run, vcpu, ea, > dsisr); > + > /* > * Real-mode code has already searched the HPT and found the > * entry we're interested in. Lock the entry and check that > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c > b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index 9091407..865ea9b 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -137,3 +137,388 @@ int kvmppc_mmu_radix_xlate(struct kvm_vcpu > *vcpu, gva_t eaddr, > return 0; > } > > +#ifdef CONFIG_PPC_64K_PAGES > +#define MMU_BASE_PSIZE MMU_PAGE_64K > +#else > +#define MMU_BASE_PSIZE MMU_PAGE_4K > +#endif > + > +static void k
Re: [PATCH 14/18] KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote: > This adapts our implementations of the MMU notifier callbacks > (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva) > to call radix functions when the guest is using radix. These > implementations are much simpler than for HPT guests because we > have only one PTE to deal with, so we don't need to traverse > rmap chains. > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/include/asm/kvm_book3s.h | 6 > arch/powerpc/kvm/book3s_64_mmu_hv.c| 64 +++- > -- > arch/powerpc/kvm/book3s_64_mmu_radix.c | 54 > > 3 files changed, 103 insertions(+), 21 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index ff5cd5c..952cc4b 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -192,6 +192,12 @@ extern int kvmppc_mmu_radix_xlate(struct > kvm_vcpu *vcpu, gva_t eaddr, > extern void kvmppc_free_radix(struct kvm *kvm); > extern int kvmppc_radix_init(void); > extern void kvmppc_radix_exit(void); > +extern int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot > *memslot, > + unsigned long gfn); > +extern int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot > *memslot, > + unsigned long gfn); > +extern int kvm_test_age_radix(struct kvm *kvm, struct > kvm_memory_slot *memslot, > + unsigned long gfn); > > /* XXX remove this export when load_last_inst() is generic */ > extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, > void *ptr, bool data); > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 57690c2..fbb3de4 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -701,12 +701,13 @@ static void kvmppc_rmap_reset(struct kvm *kvm) > srcu_read_unlock(&kvm->srcu, srcu_idx); > } > > +typedef int (*hva_handler_fn)(struct kvm *kvm, struct > kvm_memory_slot *memslot, > + unsigned long gfn); > + > static int kvm_handle_hva_range(struct kvm *kvm, > unsigned long start, > unsigned long end, > - int (*handler)(struct kvm *kvm, > - unsigned long *rmapp, > - unsigned long gfn)) > + hva_handler_fn handler) > { > int ret; > int retval = 0; > @@ -731,9 +732,7 @@ static int kvm_handle_hva_range(struct kvm *kvm, > gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - > 1, memslot); > > for (; gfn < gfn_end; ++gfn) { > - gfn_t gfn_offset = gfn - memslot->base_gfn; > - > - ret = handler(kvm, &memslot- > >arch.rmap[gfn_offset], gfn); > + ret = handler(kvm, memslot, gfn); > retval |= ret; > } > } > @@ -742,20 +741,21 @@ static int kvm_handle_hva_range(struct kvm > *kvm, > } > > static int kvm_handle_hva(struct kvm *kvm, unsigned long hva, > - int (*handler)(struct kvm *kvm, unsigned > long *rmapp, > - unsigned long gfn)) > + hva_handler_fn handler) > { > return kvm_handle_hva_range(kvm, hva, hva + 1, handler); > } > > -static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, > +static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_memory_slot > *memslot, > unsigned long gfn) > { > struct revmap_entry *rev = kvm->arch.revmap; > unsigned long h, i, j; > __be64 *hptep; > unsigned long ptel, psize, rcbits; > + unsigned long *rmapp; > > + rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn]; > for (;;) { > lock_rmap(rmapp); > if (!(*rmapp & KVMPPC_RMAP_PRESENT)) { > @@ -816,26 +816,36 @@ static int kvm_unmap_rmapp(struct kvm *kvm, > unsigned long *rmapp, > > int kvm_unmap_hva_hv(struct kvm *kvm, unsigned long hva) > { > - 
kvm_handle_hva(kvm, hva, kvm_unmap_rmapp); > + hva_handler_fn handler; > + > + handler = kvm->arch.radix ? kvm_unmap_radix : kvm_is_radix() for consistency? > kvm_unmap_rmapp; > + kvm_handle_hva(kvm, hva, handler); > return 0; > } > > int kvm_unmap_hva_range_hv(struct kvm *kvm, unsigned long start, > unsigned long end) > { > - kvm_handle_hva_range(kvm, start, end, kvm_unmap_rmapp); > + hva_handler_fn handler; > + > + handler = kvm->arch.radix ? kvm_unmap_radix : ditto > kvm_unmap_rmapp; > + kvm_handle_hva_range(kvm, start, end, handler); > return 0; > } > > void kvmppc_core_flush_memslot_hv(struct kvm *kvm, > struct kvm_memory_slot *memslot
Re: [PATCH 17/18] KVM: PPC: Book3S HV: Enable radix guest support
On Thu, 2017-01-12 at 20:07 +1100, Paul Mackerras wrote: > This adds a few last pieces of the support for radix guests: > > * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and > KVM_PPC_GET_RMMU_INFO ioctls for radix guests > > * On POWER9, allow secondary threads to be on/off-lined while guests > are running. > > * Set up LPCR and the partition table entry for radix guests. > > * Don't allocate the rmap array in the kvm_memory_slot structure > on radix. > > * Prevent the AIL field in the LPCR being set for radix guests, > since we can't yet handle getting interrupts from the guest with > the MMU on. > > * Don't try to initialize the HPT for radix guests, since they don't > have an HPT. > > * Take out the code that prevents the HV KVM module from > initializing on radix hosts. > > At this stage, we only support radix guests if the host is running > in radix mode, and only support HPT guests if the host is running in > HPT mode. Thus a guest cannot switch from one mode to the other, > which enables some simplifications. > > Signed-off-by: Paul Mackerras > --- > arch/powerpc/include/asm/kvm_book3s.h | 2 + > arch/powerpc/kvm/book3s_64_mmu_hv.c| 1 - > arch/powerpc/kvm/book3s_64_mmu_radix.c | 45 > arch/powerpc/kvm/book3s_hv.c | 93 > -- > arch/powerpc/kvm/powerpc.c | 2 +- > 5 files changed, 115 insertions(+), 28 deletions(-) > > diff --git a/arch/powerpc/include/asm/kvm_book3s.h > b/arch/powerpc/include/asm/kvm_book3s.h > index 57dc407..2bf3501 100644 > --- a/arch/powerpc/include/asm/kvm_book3s.h > +++ b/arch/powerpc/include/asm/kvm_book3s.h > @@ -189,6 +189,7 @@ extern int kvmppc_book3s_radix_page_fault(struct > kvm_run *run, > unsigned long ea, unsigned long dsisr); > extern int kvmppc_mmu_radix_xlate(struct kvm_vcpu *vcpu, gva_t > eaddr, > struct kvmppc_pte *gpte, bool data, bool > iswrite); > +extern int kvmppc_init_vm_radix(struct kvm *kvm); > extern void kvmppc_free_radix(struct kvm *kvm); > extern int kvmppc_radix_init(void); > extern void kvmppc_radix_exit(void); > @@ -200,6 +201,7 @@ extern int kvm_test_age_radix(struct kvm *kvm, > struct kvm_memory_slot *memslot, > unsigned long gfn); > extern long kvmppc_hv_get_dirty_log_radix(struct kvm *kvm, > struct kvm_memory_slot *memslot, unsigned > long *map); > +extern int kvmhv_get_rmmu_info(struct kvm *kvm, struct > kvm_ppc_rmmu_info *info); > > /* XXX remove this export when load_last_inst() is generic */ > extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, > void *ptr, bool data); > diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c > b/arch/powerpc/kvm/book3s_64_mmu_hv.c > index 7a9afbe..db8de17 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c > @@ -155,7 +155,6 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 > *htab_orderp) > > void kvmppc_free_hpt(struct kvm *kvm) > { > - kvmppc_free_lpid(kvm->arch.lpid); > vfree(kvm->arch.revmap); > if (kvm->arch.hpt_cma_alloc) > kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt), > diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c > b/arch/powerpc/kvm/book3s_64_mmu_radix.c > index 125cc7c..4344651 100644 > --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c > +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c > @@ -610,6 +610,51 @@ long kvmppc_hv_get_dirty_log_radix(struct kvm > *kvm, > return 0; > } > > +static void add_rmmu_ap_encoding(struct kvm_ppc_rmmu_info *info, > + int psize, int *indexp) > +{ > + if (!mmu_psize_defs[psize].shift) > + return; > + info->ap_encodings[*indexp] = mmu_psize_defs[psize].shift | > + 
(mmu_psize_defs[psize].ap << 29); > + ++(*indexp); > +} > + > +int kvmhv_get_rmmu_info(struct kvm *kvm, struct kvm_ppc_rmmu_info > *info) > +{ > + int i; > + > + if (!radix_enabled()) > + return -EINVAL; > + memset(info, 0, sizeof(*info)); > + > + /* 4k page size */ > + info->geometries[0].page_shift = 12; > + info->geometries[0].level_bits[0] = 9; > + for (i = 1; i < 4; ++i) > + info->geometries[0].level_bits[i] = > p9_supported_radix_bits[i]; > + /* 64k page size */ > + info->geometries[1].page_shift = 16; > + for (i = 0; i < 4; ++i) > + info->geometries[1].level_bits[i] = > p9_supported_radix_bits[i]; > + > + i = 0; > + add_rmmu_ap_encoding(info, MMU_PAGE_4K, &i); > + add_rmmu_ap_encoding(info, MMU_PAGE_64K, &i); > + add_rmmu_ap_encoding(info, MMU_PAGE_2M, &i); > + add_rmmu_ap_encoding(info, MMU_PAGE_1G, &i); > + > + return 0; > +} > + > +int kvmppc_init_vm_radix(struct kvm *kvm) > +{ > + kvm->arch.pgtable = pgd_alloc(kvm->mm); > + if (!kvm->arch.pgtable) > + return -ENOMEM; > + return 0; > +} > + > void kvmp
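The shift | (ap << 29) expression in add_rmmu_ap_encoding() packs the page-size shift and the radix "actual page size" (AP) field into a single 32-bit encoding for userspace. A standalone sketch of packing and unpacking; the AP value of 5 for 64k pages is an assumption for illustration, the real values coming from mmu_psize_defs[]:

#include <stdint.h>
#include <stdio.h>

/* Low bits hold the page-size shift, bits 31:29 hold the AP encoding. */
static uint32_t ap_encoding(unsigned int shift, unsigned int ap)
{
        return shift | (ap << 29);
}

int main(void)
{
        /* 64k pages: shift 16; AP value 5 assumed for illustration */
        uint32_t enc = ap_encoding(16, 5);

        printf("enc = 0x%08x, shift = %u, ap = %u\n",
               enc, enc & ((1u << 29) - 1), enc >> 29);
        return 0;
}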
Re: [PATCH] powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9
On Thu, 2017-02-16 at 16:03 +1100, Paul Mackerras wrote: > On POWER9, since commit cc3d2940133d ("powerpc/64: Enable use of > radix > MMU under hypervisor on POWER9", 2017-01-30), we set both the radix > and > HPT bits in the client-architecture-support (CAS) vector, which tells > the hypervisor that we can do either radix or HPT. According to > PAPR, > if we use this combination we are promising to do a > H_REGISTER_PROC_TBL > hcall later on to let the hypervisor know whether we are doing radix > or HPT. We currently do this call if we are doing radix but not if > we are doing HPT. If the hypervisor is able to support both radix > and HPT guests, it would be entitled to defer allocation of the HPT > until the H_REGISTER_PROC_TBL call, and to fail any attempts to > create > HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a > H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may > crash at boot time. > > This adds the code to call H_REGISTER_PROC_TBL in this case, before > we attempt to create any HPT entries using H_ENTER. > > Fixes: cc3d2940133d ("powerpc/64: Enable use of radix MMU under > hypervisor on POWER9") > Signed-off-by: Paul Mackerras > --- > This needs to go in after the topic/ppc-kvm branch. > > arch/powerpc/mm/hash_utils_64.c | 6 ++ > arch/powerpc/platforms/pseries/lpar.c | 8 ++-- > 2 files changed, 12 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/mm/hash_utils_64.c > b/arch/powerpc/mm/hash_utils_64.c > index 8033493..b0ed96e 100644 > --- a/arch/powerpc/mm/hash_utils_64.c > +++ b/arch/powerpc/mm/hash_utils_64.c > @@ -839,6 +839,12 @@ static void __init htab_initialize(void) > /* Using a hypervisor which owns the htab */ > htab_address = NULL; > _SDR1 = 0; > + /* > + * On POWER9, we need to do a H_REGISTER_PROC_TBL > hcall > + * to inform the hypervisor that we wish to use the > HPT. > + */ > + if (cpu_has_feature(CPU_FTR_ARCH_300)) > + register_process_table(0, 0, 0); > #ifdef CONFIG_FA_DUMP > /* > * If firmware assisted dump is active firmware > preserves > diff --git a/arch/powerpc/platforms/pseries/lpar.c > b/arch/powerpc/platforms/pseries/lpar.c > index 0587655..5b47026 100644 > --- a/arch/powerpc/platforms/pseries/lpar.c > +++ b/arch/powerpc/platforms/pseries/lpar.c > @@ -609,15 +609,18 @@ static int __init disable_bulk_remove(char > *str) > > __setup("bulk_remove=", disable_bulk_remove); > > -/* Actually only used for radix, so far */ > static int pseries_lpar_register_process_table(unsigned long base, > unsigned long page_size, unsigned long > table_size) > { > long rc; > - unsigned long flags = PROC_TABLE_NEW; > + unsigned long flags = 0; > > + if (table_size) > + flags |= PROC_TABLE_NEW; > if (radix_enabled()) > flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE; > + else > + flags |= PROC_TABLE_HPT_SLB; > for (;;) { > rc = plpar_hcall_norets(H_REGISTER_PROC_TBL, flags, > base, > page_size, table_size); > @@ -643,6 +646,7 @@ void __init hpte_init_pseries(void) > mmu_hash_ops.flush_hash_range = > pSeries_lpar_flush_hash_range; > mmu_hash_ops.hpte_clear_all = pseries_hpte_clear_all; > mmu_hash_ops.hugepage_invalidate = > pSeries_lpar_hugepage_invalidate; > + register_process_table = > pseries_lpar_register_process_table; > } > > void radix_init_pseries(void) FWIW: Reviewed-by: Suraj Jitindar Singh
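The flag selection in pseries_lpar_register_process_table() is worth spelling out, since PROC_TABLE_NEW is now conditional on a table actually being supplied. A standalone model of the selection; the numeric flag values here are placeholders, the real definitions living in asm/hvcall.h:

#include <stdio.h>

/* Placeholder values for illustration only. */
#define PROC_TABLE_NEW          0x01
#define PROC_TABLE_RADIX        0x02
#define PROC_TABLE_GTSE         0x04
#define PROC_TABLE_HPT_SLB      0x08

static unsigned long proc_tbl_flags(int radix, unsigned long table_size)
{
        unsigned long flags = 0;

        /* Only registering a new table if one was actually supplied */
        if (table_size)
                flags |= PROC_TABLE_NEW;
        if (radix)
                flags |= PROC_TABLE_RADIX | PROC_TABLE_GTSE;
        else
                flags |= PROC_TABLE_HPT_SLB;
        return flags;
}

int main(void)
{
        printf("HPT, no table: 0x%lx\n", proc_tbl_flags(0, 0));  /* HPT_SLB only */
        printf("radix, table:  0x%lx\n", proc_tbl_flags(1, 24)); /* NEW|RADIX|GTSE */
        return 0;
}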
Re: [PATCH] powerpc: Detect POWER9 architected mode
On Fri, 2017-02-17 at 10:59 +1100, Russell Currey wrote: > Signed-off-by: Russell Currey Tested-in-QEMU-by: Suraj Jitindar Singh > --- > arch/powerpc/kernel/cputable.c | 19 +++ > 1 file changed, 19 insertions(+) > > diff --git a/arch/powerpc/kernel/cputable.c > b/arch/powerpc/kernel/cputable.c > index 6a82ef039c50..d23a54b09436 100644 > --- a/arch/powerpc/kernel/cputable.c > +++ b/arch/powerpc/kernel/cputable.c > @@ -386,6 +386,25 @@ static struct cpu_spec __initdata cpu_specs[] = > { > .machine_check_early = > __machine_check_early_realmode_p8, > .platform = "power8", > }, > + { /* 3.00-compliant processor, i.e. Power9 > "architected" mode */ > + .pvr_mask = 0xffffffff, > + .pvr_value = 0x0f000005, > + .cpu_name = "POWER9 (architected)", > + .cpu_features = CPU_FTRS_POWER9, > + .cpu_user_features = COMMON_USER_POWER9, > + .cpu_user_features2 = COMMON_USER2_POWER9, > + .mmu_features = MMU_FTRS_POWER9, > + .icache_bsize = 128, > + .dcache_bsize = 128, > + .num_pmcs = 6, > + .pmc_type = PPC_PMC_IBM, > + .oprofile_cpu_type = "ppc64/ibm-compat-v1", > + .oprofile_type = > PPC_OPROFILE_INVALID, > + .cpu_setup = __setup_cpu_power9, > + .cpu_restore = __restore_cpu_power9, > + .flush_tlb = __flush_tlb_power9, > + .platform = "power9", > + }, > { /* Power7 */ > .pvr_mask = 0xffff0000, > .pvr_value = 0x003f0000,
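For context on how the new entry is selected: identify_cpu() walks cpu_specs[] and, in effect, takes the first entry whose (pvr & pvr_mask) == pvr_value; in architected (compatibility) mode the hypervisor presents the logical PVR 0x0f000005, which only the new entry matches. A standalone toy of that match rule:

#include <stdio.h>

/* Toy model of the cpu_specs[] match: the first entry where
 * (pvr & mask) == value wins. 0x0f000005 is the logical PVR for
 * POWER9 architected mode, as in the entry added above. */
struct spec { unsigned int mask, value; const char *name; };

static const struct spec specs[] = {
	{ 0xffffffff, 0x0f000005, "POWER9 (architected)" },
	{ 0xffff0000, 0x003f0000, "POWER7 (raw)" },
};

int main(void)
{
	unsigned int pvr = 0x0f000005;	/* as read from SPRN_PVR */

	for (unsigned int i = 0; i < sizeof(specs) / sizeof(specs[0]); i++) {
		if ((pvr & specs[i].mask) == specs[i].value) {
			printf("matched: %s\n", specs[i].name);
			break;
		}
	}
	return 0;
}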
[RFC NO-MERGE 1/2] arch/powerpc/prom_init: Parse the command line before calling CAS
CAS now requires the guest to tell the host whether it would like to use a hash or radix mmu. It is possible to disable radix by passing "disable_radix" on the command line. The next patch will add support for the new CAS format, thus we need to parse the command line before calling CAS so we can correctly represent which mmu we would like to use. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/kernel/prom_init.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index d3db1bc..37b5a29 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -2993,6 +2993,11 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, */ prom_check_initrd(r3, r4); + /* +* Do early parsing of command line +*/ + early_cmdline_parse(); + #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) /* * On pSeries, inform the firmware about our capabilities @@ -3009,11 +3014,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, copy_and_flush(0, kbase, 0x100, 0); /* -* Do early parsing of command line -*/ - early_cmdline_parse(); - - /* * Initialize memory management within prom_init */ prom_init_mem(); -- 2.5.5
[RFC NO-MERGE 2/2] arch/powerpc/CAS: Update to new option-vector-5 format for CAS
The CAS process has been updated to change how the host to guest negotiation is done for the new hash/radix mmu as well as the nest mmu, process tables and guest translation shootdown (GTSE). The host tells the guest which options it supports in ibm,arch-vec-5-platform-support. The guest then chooses a subset of these to request in the CAS call and these are agreed to in the ibm,architecture-vec-5 property of the chosen node. Thus we read ibm,arch-vec-5-platform-support and make our selection before calling CAS. We then parse the ibm,architecture-vec-5 property of the chosen node to check whether we should run as hash or radix. Signed-off-by: Suraj Jitindar Singh --- arch/powerpc/include/asm/prom.h | 16 --- arch/powerpc/kernel/prom_init.c | 99 +++-- arch/powerpc/mm/init_64.c | 31 ++--- 3 files changed, 130 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 8af2546..19d2e84 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -158,12 +158,16 @@ struct of_drconf_cell { #define OV5_PFO_HW_ENCR0x1120 /* PFO Encryption Accelerator */ #define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub-Processors supported */ #define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation supported */ -#define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix MMU supported */ -#define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash MMU supported */ -#define OV5_MMU_SEGM_RADIX 0x1820 /* radix mode (no segmentation) */ -#define OV5_MMU_PROC_TBL 0x1810 /* hcall selects SLB or proc table */ -#define OV5_MMU_SLB0x1800 /* always use SLB */ -#define OV5_MMU_GTSE 0x1808 /* Guest translation shootdown */ +/* MMU Base Architecture */ +#define OV5_MMU_HASH_300 0x1800 /* ISA v3.00 Hash MMU Only */ +#define OV5_MMU_RADIX_300 0x1840 /* ISA v3.00 Radix MMU Only */ +#define OV5_MMU_EITHER_300 0x1880 /* ISA v3.00 Hash or Radix Supported */ +#define OV5_NMMU 0x1820 /* Nest MMU Available */ +/* Hash Table Extensions */ +#define OV5_HASH_SEG_TBL 0x1980 /* In Memory Segment Tables Available */ +#define OV5_HASH_GTSE 0x1940 /* Guest Translation Shoot Down Avail */ +/* Radix Table Extensions */ +#define OV5_RADIX_GTSE 0x1A40 /* Guest Translation Shoot Down Avail */ /* Option Vector 6: IBM PAPR hints */ #define OV6_LINUX 0x02/* Linux is our OS */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 37b5a29..8272104 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -168,6 +168,8 @@ static unsigned long __initdata prom_tce_alloc_start; static unsigned long __initdata prom_tce_alloc_end; #endif +static bool __initdata prom_radix_disable; + /* Platforms codes are now obsolete in the kernel. Now only used within this * file and ultimately gone too. 
Feel free to change them if you need, they * are not shared with anything outside of this file anymore @@ -626,6 +628,12 @@ static void __init early_cmdline_parse(void) prom_memory_limit = ALIGN(prom_memory_limit, 0x100); #endif } + + opt = strstr(prom_cmd_line, "disable_radix"); + if (opt) { + prom_debug("Radix disabled from cmdline\n"); + prom_radix_disable = true; + } } #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) @@ -693,8 +701,10 @@ struct option_vector5 { __be16 reserved3; u8 subprocessors; u8 byte22; - u8 intarch; + u8 xive; u8 mmu; + u8 hash_ext; + u8 radix_ext; } __packed; struct option_vector6 { @@ -849,9 +859,10 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { .reserved2 = 0, .reserved3 = 0, .subprocessors = 1, - .intarch = 0, - .mmu = OV5_FEAT(OV5_MMU_RADIX_300) | OV5_FEAT(OV5_MMU_HASH_300) | - OV5_FEAT(OV5_MMU_PROC_TBL) | OV5_FEAT(OV5_MMU_GTSE), + .xive = 0, + .mmu = 0, + .hash_ext = 0, + .radix_ext = 0, }, /* option vector 6: IBM PAPR hints */ @@ -990,6 +1001,83 @@ static int __init prom_count_smt_threads(void) } +static void __init prom_check_platform_support(void) +{ + int prop_len, i; + bool radix_gtse = false, radix_mmu = false, hash_mmu = false; + + prop_len = prom_getproplen(prom.chosen, + "ibm,arch-vec-5-platform-support"); + if (prop_len > 1) { + u8 val[prop_len]; + prom_debug("Found ibm,arch-vec-5-platform-support, len: %d\n", + prop_len); + prom_getprop(prom.chosen, "ibm,arch-vec-5-platform-support
Re: [RFC NO-MERGE 2/2] arch/powerpc/CAS: Update to new option-vector-5 format for CAS
On Thu, 2017-02-23 at 15:44 +1100, Paul Mackerras wrote: > On Tue, Feb 21, 2017 at 05:06:11PM +1100, Suraj Jitindar Singh wrote: > > > > The CAS process has been updated to change how the host to guest > Once again, explain CAS; perhaps "The ibm,client-architecture-support > (CAS) negotiation process has been updated for POWER9 to ..." > > > > > negotiation is done for the new hash/radix mmu as well as the nest > > mmu, > > process tables and guest translation shootdown (GTSE). > > > > The host tells the guest which options it supports in > > ibm,arch-vec-5-platform-support. The guest then chooses a subset of > > these > > to request in the CAS call and these are agreed to in the > > ibm,architecture-vec-5 property of the chosen node. > > > > Thus we read ibm,arch-vec-5-platform-support and make our selection > > before > > calling CAS. We then parse the ibm,architecture-vec-5 property of > > the > > chosen node to check whether we should run as hash or radix. > > > > Signed-off-by: Suraj Jitindar Singh > > --- > > arch/powerpc/include/asm/prom.h | 16 --- > > arch/powerpc/kernel/prom_init.c | 99 > > +++-- > > arch/powerpc/mm/init_64.c | 31 ++--- > > 3 files changed, 130 insertions(+), 16 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/prom.h > > b/arch/powerpc/include/asm/prom.h > > index 8af2546..19d2e84 100644 > > --- a/arch/powerpc/include/asm/prom.h > > +++ b/arch/powerpc/include/asm/prom.h > > @@ -158,12 +158,16 @@ struct of_drconf_cell { > > #define OV5_PFO_HW_ENCR0x1120 /* PFO > > Encryption Accelerator */ > > #define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub- > > Processors supported */ > > #define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation > > supported */ > > -#define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix > > MMU supported */ > > -#define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash > > MMU supported */ > > -#define OV5_MMU_SEGM_RADIX 0x1820 /* radix mode (no > > segmentation) */ > > -#define OV5_MMU_PROC_TBL 0x1810 /* hcall selects SLB > > or proc table */ > > -#define OV5_MMU_SLB0x1800 /* always use SLB > > */ > > -#define OV5_MMU_GTSE 0x1808 /* Guest > > translation shootdown */ > > +/* MMU Base Architecture */ > > +#define OV5_MMU_HASH_300 0x1800 /* ISA v3.00 Hash > > MMU Only */ > This is actually legacy HPT as well as ISA v3.00 HPT. True > > > > > +#define OV5_MMU_RADIX_300 0x1840 /* ISA v3.00 Radix > > MMU Only */ > > +#define OV5_MMU_EITHER_300 0x1880 /* ISA v3.00 Hash > > or Radix Supported */ > I wonder if it would work better to have a define for the 2-bit field > with subsidiary definitions for the field values. 
Something like > > #define OV5_MMU_SELECTION 0x18c0 > #define OV5_MMU_HPT 0x00 > #define OV5_MMU_RADIX0x40 > #define OV5_MMU_EITHER 0x80 Yep that's clearer > > > > > +#define OV5_NMMU 0x1820 /* Nest MMU > > Available */ > > +/* Hash Table Extensions */ > > +#define OV5_HASH_SEG_TBL 0x1980 /* In Memory Segment > > Tables Available */ > > +#define OV5_HASH_GTSE 0x1940 /* Guest > > Translation Shoot Down Avail */ > > +/* Radix Table Extensions */ > > +#define OV5_RADIX_GTSE 0x1A40 /* Guest > > Translation Shoot Down Avail */ > > > > /* Option Vector 6: IBM PAPR hints */ > > #define OV6_LINUX 0x02/* Linux is our OS */ > > diff --git a/arch/powerpc/kernel/prom_init.c > > b/arch/powerpc/kernel/prom_init.c > > index 37b5a29..8272104 100644 > > --- a/arch/powerpc/kernel/prom_init.c > > +++ b/arch/powerpc/kernel/prom_init.c > > @@ -168,6 +168,8 @@ static unsigned long __initdata > > prom_tce_alloc_start; > > static unsigned long __initdata prom_tce_alloc_end; > > #endif > > > > +static bool __initdata prom_radix_disable; > > + > > /* Platforms codes are now obsolete in the kernel. Now only used > > within this > > * file and ultimately gone too. Feel free to change them if you > > need, they > > * are not shared with anything outside of this file anymore > > @@ -626,6 +628,12 @@ static void __init early_cmdline_parse(void) > > prom_memory_limit = ALIGN(prom_memory_limit, > > 0x100); > > #endif > > } > > + > > + opt = strstr(
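With the field/value encoding suggested above, the guest-side test over byte 0x18 (24) of the returned option vector 5 becomes a masked compare; a minimal sketch (the constants mirror the suggested defines, which is an assumption until the respun patch lands):

#include <stdio.h>

#define OV5_MMU_SELECTION_MASK	0xc0	/* low byte of the 0x18c0 define */
#define OV5_MMU_HPT		0x00
#define OV5_MMU_RADIX		0x40
#define OV5_MMU_EITHER		0x80

/* ov5_mmu_byte is byte 0x18 of ibm,architecture-vec-5. */
static int mmu_selection(unsigned char ov5_mmu_byte)
{
	return ov5_mmu_byte & OV5_MMU_SELECTION_MASK;
}

int main(void)
{
	unsigned char byte = 0x40;
	printf("radix? %d\n", mmu_selection(byte) == OV5_MMU_RADIX);
	return 0;
}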
[PATCH V2 1/2] arch/powerpc/prom_init: Parse the command line before calling CAS
On POWER9 the hypervisor requires the guest to decide whether it would like to use a hash or radix mmu model at the time it calls ibm,client-architecture-support (CAS) based on what the hypervisor has said it's allowed to do. It is possible to disable radix by passing "disable_radix" on the command line. The next patch will add support for the new CAS format, thus we need to parse the command line before calling CAS so we can correctly select which mmu we would like to use. Signed-off-by: Suraj Jitindar Singh Reviewed-by: Paul Mackerras --- V1 -> V2: - Reword commit message for clarity. No functional change --- arch/powerpc/kernel/prom_init.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index d3db1bc..37b5a29 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -2993,6 +2993,11 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, */ prom_check_initrd(r3, r4); + /* +* Do early parsing of command line +*/ + early_cmdline_parse(); + #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) /* * On pSeries, inform the firmware about our capabilities @@ -3009,11 +3014,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, copy_and_flush(0, kbase, 0x100, 0); /* -* Do early parsing of command line -*/ - early_cmdline_parse(); - - /* * Initialize memory management within prom_init */ prom_init_mem(); -- 2.5.5
[PATCH V2 2/2] arch/powerpc/CAS: Update to new option-vector-5 format for CAS
On POWER9 the ibm,client-architecture-support (CAS) negotiation process has been updated to change how the host to guest negotiation is done for the new hash/radix mmu as well as the nest mmu, process tables and guest translation shootdown (GTSE). The host tells the guest which options it supports in ibm,arch-vec-5-platform-support. The guest then chooses a subset of these to request in the CAS call and these are agreed to in the ibm,architecture-vec-5 property of the chosen node. Thus we read ibm,arch-vec-5-platform-support and make our selection before calling CAS. We then parse the ibm,architecture-vec-5 property of the chosen node to check whether we should run as hash or radix. ibm,arch-vec-5-platform-support format: index value pairs: ... index: Option vector 5 byte number val: Some representation of supported values Signed-off-by: Suraj Jitindar Singh --- V1 -> V2: - Fix error where the whole byte was compared for mmu support instead of only the first two bits - Break platform support parsing into multiple functions for clarity - Instead of printing WARNING: messages on old hypervisors change to a debug message --- arch/powerpc/include/asm/prom.h | 17 -- arch/powerpc/kernel/prom_init.c | 120 ++-- arch/powerpc/mm/init_64.c | 36 ++-- 3 files changed, 157 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 8af2546..d838b9d 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -158,12 +158,17 @@ struct of_drconf_cell { #define OV5_PFO_HW_ENCR 0x1120 /* PFO Encryption Accelerator */ #define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub-Processors supported */ #define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation supported */ -#define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix MMU supported */ -#define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash MMU supported */ -#define OV5_MMU_SEGM_RADIX 0x1820 /* radix mode (no segmentation) */ -#define OV5_MMU_PROC_TBL 0x1810 /* hcall selects SLB or proc table */ -#define OV5_MMU_SLB 0x1800 /* always use SLB */ -#define OV5_MMU_GTSE 0x1808 /* Guest translation shootdown */ +/* MMU Base Architecture */ +#define OV5_MMU_SUPPORT 0x18C0 /* MMU Mode Support Mask */ +#define OV5_MMU_HASH 0x00 /* Hash MMU Only */ +#define OV5_MMU_RADIX 0x40 /* Radix MMU Only */ +#define OV5_MMU_EITHER 0x80 /* Hash or Radix Supported */ +#define OV5_NMMU 0x1820 /* Nest MMU Available */ +/* Hash Table Extensions */ +#define OV5_HASH_SEG_TBL 0x1980 /* In Memory Segment Tables Available */ +#define OV5_HASH_GTSE 0x1940 /* Guest Translation Shoot Down Avail */ +/* Radix Table Extensions */ +#define OV5_RADIX_GTSE 0x1A40 /* Guest Translation Shoot Down Avail */ /* Option Vector 6: IBM PAPR hints */ #define OV6_LINUX 0x02 /* Linux is our OS */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 37b5a29..08cd1b8 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -168,6 +168,14 @@ static unsigned long __initdata prom_tce_alloc_start; static unsigned long __initdata prom_tce_alloc_end; #endif +static bool __initdata prom_radix_disable; + +struct platform_support { + bool hash_mmu; + bool radix_mmu; + bool radix_gtse; +}; + /* Platforms codes are now obsolete in the kernel. Now only used within this * file and ultimately gone too. 
Feel free to change them if you need, they * are not shared with anything outside of this file anymore @@ -626,6 +634,12 @@ static void __init early_cmdline_parse(void) prom_memory_limit = ALIGN(prom_memory_limit, 0x100); #endif } + + opt = strstr(prom_cmd_line, "disable_radix"); + if (opt) { + prom_debug("Radix disabled from cmdline\n"); + prom_radix_disable = true; + } } #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) @@ -693,8 +707,10 @@ struct option_vector5 { __be16 reserved3; u8 subprocessors; u8 byte22; - u8 intarch; + u8 xive; u8 mmu; + u8 hash_ext; + u8 radix_ext; } __packed; struct option_vector6 { @@ -849,9 +865,10 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { .reserved2 = 0, .reserved3 = 0, .subprocessors = 1, - .intarch = 0, - .mmu = OV5_FEAT(OV5_MMU_RADIX_300) | OV5_FEAT(OV5_MMU_HASH_300) | - OV5_FEAT(OV5_MMU_PROC_TBL) | OV5_FEAT(OV5_MMU_GTSE), + .xive = 0, + .mmu = 0, + .hash_ext = 0, + .radix_ext = 0, }, /* option vector 6: IBM PAPR hint
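The "index value pairs" layout of ibm,arch-vec-5-platform-support described in the commit message can be walked with a two-byte stride; a standalone toy parser, assuming each pair is {option vector 5 byte index, supported-values byte} as the message states:

#include <stdio.h>

/* Toy parser for the byte-pair format of
 * ibm,arch-vec-5-platform-support: {OV5 byte index, value}. The
 * indexes handled match the defines above (0x18 = MMU support,
 * 0x1A = radix extensions / GTSE). */
static void parse_platform_support(const unsigned char *vec, int len)
{
	for (int i = 0; i + 1 < len; i += 2) {
		switch (vec[i]) {
		case 0x18:
			printf("mmu support byte: 0x%02x\n", vec[i + 1]);
			break;
		case 0x1a:
			printf("radix ext byte: 0x%02x\n", vec[i + 1]);
			break;
		}
	}
}

int main(void)
{
	/* a host advertising "either" MMU plus radix GTSE */
	const unsigned char prop[] = { 0x18, 0x80, 0x1a, 0x40 };

	parse_platform_support(prop, sizeof(prop));
	return 0;
}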
Re: [PATCH v2] powerpc/powernv: add hdat attribute to sysfs
On Fri, 2017-02-24 at 15:28 +1100, Matt Brown wrote: > The HDAT data area is consumed by skiboot and turned into a device- > tree. > In some cases we would like to look directly at the HDAT, so this > patch > adds a sysfs node to allow it to be viewed. This is not possible > through > /dev/mem as it is reserved memory which is stopped by the /dev/mem > filter. > > Signed-off-by: Matt Brown Your first patch, nice work! :) See below. > --- > > Between v1 and v2 of the patch the following changes were made. > Changelog: > - moved hdat code into opal-hdat.c > - added opal-hdat to the makefile > - changed struct and variable names from camelcase > --- > arch/powerpc/include/asm/opal.h| 1 + > arch/powerpc/platforms/powernv/Makefile| 1 + > arch/powerpc/platforms/powernv/opal-hdat.c | 63 > ++ > arch/powerpc/platforms/powernv/opal.c | 2 + > 4 files changed, 67 insertions(+) > create mode 100644 arch/powerpc/platforms/powernv/opal-hdat.c > > diff --git a/arch/powerpc/include/asm/opal.h > b/arch/powerpc/include/asm/opal.h > index 5c7db0f..b26944e 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -277,6 +277,7 @@ extern int opal_async_comp_init(void); > extern int opal_sensor_init(void); > extern int opal_hmi_handler_init(void); > extern int opal_event_init(void); > +extern void opal_hdat_sysfs_init(void); > > extern int opal_machine_check(struct pt_regs *regs); > extern bool opal_mce_check_early_recovery(struct pt_regs *regs); > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index b5d98cb..9a0c9d6 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -3,6 +3,7 @@ obj-y += opal-rtc.o opal- > nvram.o opal-lpc.o opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o opal- > sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal- > power.o opal-irqchip.o > obj-y+= opal-kmsg.o > +obj-y+= opal-hdat.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-hdat.c > b/arch/powerpc/platforms/powernv/opal-hdat.c > new file mode 100644 > index 000..bd305e0 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-hdat.c > @@ -0,0 +1,63 @@ > +/* > + * PowerNV OPAL in-memory console interface > + * > + * Copyright 2014 IBM Corp. 2014? > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. Check with someone maybe, but I thought we had to use V2. > + */ > + > +#include > +#include > +#include > +#include > + > +struct hdat_info { > + char *base; > + u64 size; > +}; > + > +static struct hdat_info hdat_inf; > + > +/* Read function for HDAT attribute in sysfs */ > +static ssize_t hdat_read(struct file *file, struct kobject *kobj, I assume this is just misaligned in my mail client... 
> + struct bin_attribute *bin_attr, char *to, > + loff_t pos, size_t count) > +{ > + if (!hdat_inf.base) > + return -ENODEV; > + > + return memory_read_from_buffer(to, count, &pos, > hdat_inf.base, > + hdat_inf.size); > +} > + > + > +/* HDAT attribute for sysfs */ > +static struct bin_attribute hdat_attr = { > + .attr = {.name = "hdat", .mode = 0444}, > + .read = hdat_read > +}; > + > +void __init opal_hdat_sysfs_init(void) > +{ > + u64 hdat_addr[2]; > + > + /* Check for the hdat-map prop in device-tree */ > + if (of_property_read_u64_array(opal_node, "hdat-map", > hdat_addr, 2)) { > + pr_debug("OPAL: Property hdat-map not found.\n"); > + return; > + } > + > + /* Print out hdat-map values. [0]: base, [1]: size */ > + pr_debug("OPAL: HDAT Base address: %#llx\n", hdat_addr[0]); > + pr_debug("OPAL: HDAT Size: %#llx\n", hdat_addr[1]); > + > + hdat_inf.base = phys_to_virt(hdat_addr[0]); > + hdat_inf.size = hdat_addr[1]; > + > + if (sysfs_create_bin_file(opal_kobj, &hdat_attr) != 0) The "!= 0" is not required; this can be replaced with: "if (sysfs_create_bin_file(opal_kobj, &hdat_attr))" > + pr_debug("OPAL: sysfs file creation for HDAT > failed"); > + > +} > diff --git a/arch/powerpc/platforms/powernv/opal.c > b/arch/powerpc/platforms/powernv/opal.c > index 2822935..cae3745 100644 > --- a/arch/powerpc/platforms/powernv/opal.c > +++ b/arch/powerpc/platforms/powernv/opal.c > @@ -740,6 +740,8 @@
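Once the patch is in, consuming the attribute from userspace is a plain file read; a sketch, assuming opal_kobj places the file at /sys/firmware/opal/hdat:

#include <stdio.h>

/* Dump the HDAT blob exposed by the patch above. The path is an
 * assumption based on opal_kobj living under /sys/firmware/opal. */
int main(void)
{
	FILE *f = fopen("/sys/firmware/opal/hdat", "rb");
	char buf[4096];
	size_t n;

	if (!f) {
		perror("hdat");
		return 1;
	}
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}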
[PATCH V3 1/2] powerpc: Parse the command line before calling CAS
On POWER9 the hypervisor requires the guest to decide whether it would like to use a hash or radix mmu model at the time it calls ibm,client-architecture-support (CAS) based on what the hypervisor has said it's allowed to do. It is possible to disable radix by passing "disable_radix" on the command line. The next patch will add support for the new CAS format, thus we need to parse the command line before calling CAS so we can correctly select which mmu we would like to use. Signed-off-by: Suraj Jitindar Singh Reviewed-by: Paul Mackerras --- V1 -> V3: - Reword commit message for clarity. No functional change --- arch/powerpc/kernel/prom_init.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index d3db1bc..37b5a29 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -2993,6 +2993,11 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, */ prom_check_initrd(r3, r4); + /* +* Do early parsing of command line +*/ + early_cmdline_parse(); + #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) /* * On pSeries, inform the firmware about our capabilities @@ -3009,11 +3014,6 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4, copy_and_flush(0, kbase, 0x100, 0); /* -* Do early parsing of command line -*/ - early_cmdline_parse(); - - /* * Initialize memory management within prom_init */ prom_init_mem(); -- 2.5.5
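The command-line check that motivates this reordering (added in patch 2/2) is a plain substring match on the flattened command line; as a standalone illustration:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the strstr()-based "disable_radix" check that patch 2/2
 * adds to early_cmdline_parse(); standalone here for illustration. */
static bool radix_disabled(const char *cmdline)
{
	return strstr(cmdline, "disable_radix") != NULL;
}

int main(void)
{
	printf("%d\n", radix_disabled("root=/dev/sda disable_radix"));	/* 1 */
	printf("%d\n", radix_disabled("root=/dev/sda"));		/* 0 */
	return 0;
}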
[PATCH V3 2/2] powerpc: Update to new option-vector-5 format for CAS
On POWER9 the ibm,client-architecture-support (CAS) negotiation process has been updated to change how the host to guest negotiation is done for the new hash/radix mmu as well as the nest mmu, process tables and guest translation shootdown (GTSE). The host tells the guest which options it supports in ibm,arch-vec-5-platform-support. The guest then chooses a subset of these to request in the CAS call and these are agreed to in the ibm,architecture-vec-5 property of the chosen node. Thus we read ibm,arch-vec-5-platform-support and make our selection before calling CAS. We then parse the ibm,architecture-vec-5 property of the chosen node to check whether we should run as hash or radix. ibm,arch-vec-5-platform-support format: index value pairs: ... index: Option vector 5 byte number val: Some representation of supported values Signed-off-by: Suraj Jitindar Singh --- V2 -> V3: - Check for the new "either, with dynamic switching" option in ibm,arch-vec-5-platform-support, indicated by 0xC0, which tells the guest it can choose either HASH or RADIX and is allowed to dynamically switch later via H_REGISTER_PROCESS_TABLE V1 -> V2: - Fix error where the whole byte was compared for mmu support instead of only the first two bits - Break platform support parsing into multiple functions for clarity - Instead of printing WARNING: messages on old hypervisors change to a debug message --- arch/powerpc/include/asm/prom.h | 18 -- arch/powerpc/kernel/prom_init.c | 121 ++-- arch/powerpc/mm/init_64.c | 36 ++-- 3 files changed, 159 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 8af2546..d1b240b 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -158,12 +158,18 @@ struct of_drconf_cell { #define OV5_PFO_HW_ENCR 0x1120 /* PFO Encryption Accelerator */ #define OV5_SUB_PROCESSORS 0x1501 /* 1,2,or 4 Sub-Processors supported */ #define OV5_XIVE_EXPLOIT 0x1701 /* XIVE exploitation supported */ -#define OV5_MMU_RADIX_300 0x1880 /* ISA v3.00 radix MMU supported */ -#define OV5_MMU_HASH_300 0x1840 /* ISA v3.00 hash MMU supported */ -#define OV5_MMU_SEGM_RADIX 0x1820 /* radix mode (no segmentation) */ -#define OV5_MMU_PROC_TBL 0x1810 /* hcall selects SLB or proc table */ -#define OV5_MMU_SLB 0x1800 /* always use SLB */ -#define OV5_MMU_GTSE 0x1808 /* Guest translation shootdown */ +/* MMU Base Architecture */ +#define OV5_MMU_SUPPORT 0x18C0 /* MMU Mode Support Mask */ +#define OV5_MMU_HASH 0x00 /* Hash MMU Only */ +#define OV5_MMU_RADIX 0x40 /* Radix MMU Only */ +#define OV5_MMU_EITHER 0x80 /* Hash or Radix Supported */ +#define OV5_MMU_DYNAMIC 0xC0 /* Hash or Radix Can Switch Later */ +#define OV5_NMMU 0x1820 /* Nest MMU Available */ +/* Hash Table Extensions */ +#define OV5_HASH_SEG_TBL 0x1980 /* In Memory Segment Tables Available */ +#define OV5_HASH_GTSE 0x1940 /* Guest Translation Shoot Down Avail */ +/* Radix Table Extensions */ +#define OV5_RADIX_GTSE 0x1A40 /* Guest Translation Shoot Down Avail */ /* Option Vector 6: IBM PAPR hints */ #define OV6_LINUX 0x02 /* Linux is our OS */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 37b5a29..4110350 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -168,6 +168,14 @@ static unsigned long __initdata prom_tce_alloc_start; static unsigned long __initdata prom_tce_alloc_end; #endif +static bool __initdata prom_radix_disable; + +struct platform_support { + bool hash_mmu; + bool radix_mmu; + bool radix_gtse; 
+}; + /* Platforms codes are now obsolete in the kernel. Now only used within this * file and ultimately gone too. Feel free to change them if you need, they * are not shared with anything outside of this file anymore @@ -626,6 +634,12 @@ static void __init early_cmdline_parse(void) prom_memory_limit = ALIGN(prom_memory_limit, 0x100); #endif } + + opt = strstr(prom_cmd_line, "disable_radix"); + if (opt) { + prom_debug("Radix disabled from cmdline\n"); + prom_radix_disable = true; + } } #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV) @@ -693,8 +707,10 @@ struct option_vector5 { __be16 reserved3; u8 subprocessors; u8 byte22; - u8 intarch; + u8 xive; u8 mmu; + u8 hash_ext; + u8 radix_ext; } __packed; struct option_vector6 { @@ -849,9 +865,10 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = { .reserved2 = 0, .reserved3 = 0, .subprocessors = 1, -
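Taken together, the guest-side decision the change log describes reduces to a small rule: request radix if the platform advertised it (including the "either" and new "dynamic" values) and "disable_radix" was not given, otherwise fall back to hash. A sketch, ignoring the GTSE extension bytes the real patch also requests:

/* Illustrative only: choose the CAS mmu byte from the masked
 * platform-support field (0x00 hash, 0x40 radix, 0x80 either,
 * 0xC0 either-with-dynamic-switching) and the cmdline override. */
static unsigned char choose_mmu(unsigned char mmu_support,
				int radix_disabled)
{
	if (mmu_support != 0x00 && !radix_disabled)
		return 0x40;	/* request radix (OV5_MMU_RADIX) */
	return 0x00;		/* request hash (OV5_MMU_HASH) */
}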
Re: [RFC PATCH 2/2] KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
On Fri, 2017-12-08 at 17:11 +1100, Paul Mackerras wrote: > POWER9 has hardware bugs relating to transactional memory and thread > reconfiguration (changes to hardware SMT mode). Specifically, the > core > does not have enough storage to store a complete checkpoint of all > the > architected state for all four threads. The DD2.2 version of POWER9 > includes hardware modifications designed to allow hypervisor software > to implement workarounds for these problems. This patch implements > those workarounds in KVM code so that KVM guests see a full, working > transactional memory implementation. > > The problems center around the use of TM suspended state, where the > CPU has a checkpointed state but execution is not transactional. The > workaround is to implement a "fake suspend" state, which looks to the > guest like suspended state but the CPU does not store a checkpoint. > In this state, any instruction that would cause a transition to > transactional state (rfid, rfebb, mtmsrd, tresume) or would use the > checkpointed state (treclaim) causes a "soft patch" interrupt (vector > 0x1500) to the hypervisor so that it can be emulated. The trechkpt > instruction also causes a soft patch interrupt. > > On POWER9 DD2.2, we avoid returning to the guest in any state which > would require a checkpoint to be present. The trechkpt in the guest > entry path which would normally create that checkpoint is replaced by > either a transition to fake suspend state, if the guest is in suspend > state, or a rollback to the pre-transactional state if the guest is > in > transactional state. Fake suspend state is indicated by a flag in > the > PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only > and > reads back as 0. > > On exit from the guest, if the guest is in fake suspend state, we > still > do the treclaim instruction as we would in real suspend state, in > order > to get into non-transactional state, but we do not save the resulting > register state since there was no checkpoint. > > Emulation of the instructions that cause a softpatch interrupt is > handled > in two paths. If the guest is in real suspend mode, we call > kvmhv_p9_tm_emulation_early() to handle the cases where the guest is > transitioning to transactional state. This is called before we do > the treclaim in the guest exit path; because we haven't done > treclaim, > we can get back to the guest with the transaction still active. > If the instruction is a case that kvmhv_p9_tm_emulation_early() > doesn't > handle, or if the guest is in fake suspend state, then we proceed to > do the complete guest exit path and subsequently call > kvmhv_p9_tm_emulation() in host context with the MMU on. This > handles all the cases including the cases that generate program > interrupts (illegal instruction or TM Bad Thing) and facility > unavailable interrupts. > > The emulation is reasonably straightforward and is mostly concerned > with checking for exception conditions and updating the state of > registers such as MSR and CR0. The treclaim emulation takes care to > ensure that the TEXASR register gets updated as if it were the guest > treclaim instruction that had done failure recording, not the > treclaim > done in hypervisor state in the guest exit path. > > Signed-off-by: Paul Mackerras > With the following patch applied on top of the TM emulation code I was able to get at least a basic test to run on the guest on real hardware. 
[snip] diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index c7fe377ff6bc..adf2da6b2211 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -3049,6 +3049,7 @@ BEGIN_FTR_SECTION li r0, PSSCR_FAKE_SUSPEND andc r3, r3, r0 mtspr SPRN_PSSCR, r3 + ld r9, HSTATE_KVM_VCPU(r13) b 1f 2: END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL) @@ -3273,8 +3274,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_P9_TM_EMUL) b 9b /* and return */ 10: stdu r1, -PPC_MIN_STKFRM(r1) /* guest is in transactional state, so simulate rollback */ + mr r3, r4 bl kvmhv_emulate_tm_rollback nop + ld r4, HSTATE_KVM_VCPU(r13) /* our vcpu pointer has been trashed */ addi r1, r1, PPC_MIN_STKFRM b 9b #endif
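The two emulation paths in the description can be condensed into a single dispatch rule; a toy model of it (the names here are illustrative, not the actual KVM symbols):

#include <stdbool.h>

enum tm_action { RETURN_TO_GUEST, FULL_EXIT_EMULATION };

/* Toy dispatch for the 0x1500 softpatch interrupt: in real suspend
 * mode the early handler may resolve the instruction before
 * treclaim, so we can return with the transaction still live; in
 * fake suspend, or for cases the early path does not cover, take
 * the full exit and emulate with the MMU on. */
static enum tm_action softpatch_dispatch(bool fake_suspend,
					 bool early_handled)
{
	if (!fake_suspend && early_handled)
		return RETURN_TO_GUEST;
	return FULL_EXIT_EMULATION;
}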
Re: [PATCH] KVM: PPC: Book3S: Add capabilities for Meltdown/Spectre workarounds
On Tue, 2018-01-09 at 15:48 +1100, Paul Mackerras wrote: > This adds three new capabilities that give userspace information > about > the underlying machine's level of vulnerability to the Meltdown and > Spectre attacks, and what instructions the hardware implements to > assist software to work around the vulnerabilities. > > Each capability is a tri-state, where 0 indicates that the machine is > vulnerable and no workarounds are implemented, 1 indicates that the > machine is vulnerable but workaround assist instructions are > available, and 2 indicates that the machine is not vulnerable. > > The capabilities are: > > KVM_CAP_PPC_SAFE_CACHE reports the vulnerability of the machine to > attacks based on using speculative loads to data in L1 cache which > should not be addressable. The workaround provided by hardware is an > instruction to invalidate the entire L1 data cache. > > KVM_CAP_PPC_SAFE_BOUNDS_CHECK reports the vulnerability of the > machine > to attacks based on using speculative loads behind mispredicted > bounds > checks. The workaround provided by hardware is an instruction that > acts as a speculation barrier. > > KVM_CAP_PPC_SAFE_INDIRECT_BRANCH reports the vulnerability of the > machine to attacks based on poisoning the indirect branch predictor. > No workaround that requires software changes is provided; the current > hardware fix is to prevent speculation past indirect branches. > > Signed-off-by: Paul Mackerras > --- > Note: This patch depends on the patch "powerpc/pseries: Add > H_GET_CPU_CHARACTERISTICS flags & wrapper" by Michael Ellerman, > available at http://patchwork.ozlabs.org/patch/856914/ . > > Documentation/virtual/kvm/api.txt | 36 +++ > arch/powerpc/kvm/powerpc.c | 202 > ++ > include/uapi/linux/kvm.h | 3 + > 3 files changed, 241 insertions(+) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 57d3ee9..8d76260 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4369,3 +4369,39 @@ Parameters: none > This capability indicates if the flic device will be able to get/set > the > AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute and > allows > to discover this without having to create a flic device. > + > +8.14 KVM_CAP_PPC_SAFE_CACHE > + > +Architectures: ppc > + > +This capability gives information about the underlying machine's > +vulnerability or otherwise to the Meltdown attack. Its value is a > +tristate, where 0 indicates the machine is vulnerable, 1 indicates > the > +hardware is vulnerable but provides assistance to work around the > +vulnerability (specifically by providing a fast L1 data cache flush > +facility), and 2 indicates that the machine is not vulnerable. > + > +8.15 KVM_CAP_PPC_SAFE_BOUNDS_CHECK > + > +Architectures: ppc > + > +This capability gives information about the underlying machine's > +vulnerability or otherwise to the bounds-check variant of the > Spectre > +attack. Its value is a tristate, where 0 indicates the machine is > +vulnerable, 1 indicates the hardware is vulnerable but provides > +assistance to work around the vulnerability (specifically by > providing > +an instruction that acts as a speculation barrier), and 2 indicates > +that the machine is not vulnerable. > + > +8.16 KVM_CAP_PPC_SAFE_INDIRECT_BRANCH > + > +Architectures: ppc > + > +This capability gives information about the underlying machine's > +vulnerability or otherwise to the indirect branch variant of the > Spectre > +attack. 
Its value is a tristate, where 0 indicates the machine is > +vulnerable and 2 indicates that the machine is not vulnerable. > +(1 would indicate the availability of a workaround that software > +needs to implement, but there is currently no workaround that needs > +software changes.) > + > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c > index 1915e86..58e863b 100644 > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -39,6 +39,10 @@ > #include > #include > #include > +#ifdef CONFIG_PPC_PSERIES > +#include > +#include > +#endif > > #include "timing.h" > #include "irq.h" > @@ -488,6 +492,193 @@ void kvm_arch_destroy_vm(struct kvm *kvm) > module_put(kvm->arch.kvm_ops->owner); > } > > +#ifdef CONFIG_PPC_BOOK3S_64 > +/* > + * These functions check whether the underlying hardware is safe > + * against the Meltdown/Spectre attacks and whether it supplies > + * instructions for use in workarounds. The information comes from > + * firmware, either via the device tree on powernv platforms or > + * from an hcall on pseries platforms. > + * > + * For check_safe_cache() and check_safe_bounds_check(), a return > + * value of 0 means vulnerable, 1 means vulnerable but workaround > + * instructions are provided, and 2 means not vulnerable (no > workaround > + * is needed). > + * For check_safe_indirect_branch(), 0 means vulnerab
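From userspace these would be queried like any other capability, via KVM_CHECK_EXTENSION on the /dev/kvm fd; a sketch (the KVM_CAP_PPC_SAFE_* constants exist only with this patch applied):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Query the tri-state capabilities added by this patch:
 * 0 = vulnerable, 1 = workaround available, 2 = not vulnerable. */
int main(void)
{
	int kvm = open("/dev/kvm", O_RDONLY);

	if (kvm < 0) {
		perror("/dev/kvm");
		return 1;
	}
	printf("safe cache: %d\n",
	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SAFE_CACHE));
	printf("safe bounds check: %d\n",
	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SAFE_BOUNDS_CHECK));
	printf("safe indirect branch: %d\n",
	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SAFE_INDIRECT_BRANCH));
	return 0;
}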