Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: Some drivers are using pat_enabled() in order to test availability of special caching modes (WC and UC-). This will lead to false negatives in case the system was booted e.g. with the "nopat" variant and the BIOS did setup the PAT MSR supporting the queried mode, or if the system is running as a Xen PV guest. ... Add test functions for those caching modes instead and use them at the appropriate places. Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") Fixes: ae749c7ab475 ("PCI: Add arch_can_pci_mmap_wc() macro") Signed-off-by: Juergen Gross ... --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -94,7 +94,7 @@ int pcibios_set_irq_routing(struct pci_dev *dev, int pin, int irq); #define HAVE_PCI_MMAP -#define arch_can_pci_mmap_wc() pat_enabled() +#define arch_can_pci_mmap_wc() x86_has_pat_wc() Besides this and ... --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -76,7 +76,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data, if (args->flags & ~(I915_MMAP_WC)) return -EINVAL; - if (args->flags & I915_MMAP_WC && !pat_enabled()) + if (args->flags & I915_MMAP_WC && !x86_has_pat_wc()) return -ENODEV; obj = i915_gem_object_lookup(file, args->handle); @@ -757,7 +757,7 @@ i915_gem_dumb_mmap_offset(struct drm_file *file, if (HAS_LMEM(to_i915(dev))) mmap_type = I915_MMAP_TYPE_FIXED; - else if (pat_enabled()) + else if (x86_has_pat_wc()) mmap_type = I915_MMAP_TYPE_WC; else if (!i915_ggtt_has_aperture(to_gt(i915)->ggtt)) return -ENODEV; @@ -813,7 +813,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device *dev, void *data, break; case I915_MMAP_OFFSET_WC: - if (!pat_enabled()) + if (!x86_has_pat_wc()) return -ENODEV; type = I915_MMAP_TYPE_WC; break; @@ -823,7 +823,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device *dev, void *data, break; case I915_MMAP_OFFSET_UC: - if (!pat_enabled()) + if (!x86_has_pat_uc_minus()) return -ENODEV; type = I915_MMAP_TYPE_UC; break; ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Jan but because of the confusion, As I just wrote earlier, the confusion is whether or not "nopat" means the kernel drivers will not use pat even if the firmware and hypervisor provides it. I think you are correct to point out that is the way the i915 driver behaved with the nopat option before bdd8b6c98239 was applied, with the same bad effects on bare metal as with the hypervisor. I think perhaps dealing with the nopat option to fix bdd8b6c98239 is a solution in search of a problem, at least as regards the i915 driver. The only problem we have, as I see it, is with a false negative when the nopat option is *not* enabled. But the forced disabling of pat in Jan's patch when the nopat option is enabled is probably needed
[ovmf test] 170593: regressions - FAIL
flight 170593 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170593/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 80 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1113 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days1 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Prior to bdd8b6c98239, the i915 driver used static_cpu_has(X86_FEATURE_PAT) to test for the PAT feature, and apparently this returns true even if nopat is set, but the new test, pat_enabled(), returns false on the Xen hypervisor even if nopat is not set. That is the only problem I see. The question of nopat should be irrelevant to the i915 driver. It was unfortunate that the author of bdd8b6c98239 mentioned nopat in the commit message when in fact nopat was never intended to be used to break the i915 driver. The i915 driver should ignore the nopat option and decide what to do based solely on the capability of the cpu, firmware, and the compiled options of the Linux kernel. That is how it behaved before bdd8b6c98239, and that behavior is what needs to be restored with a patch. Chuck
[PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff
Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared by the PCI Local Bus Specification Revision 3.0 (from 2004) as "unknown"/"no connection". Fallback to poll mode in this case. The 0xff handling is x86-specific, the surrounding code is guarded with CONFIG_X86 anyway. Signed-off-by: Marek Marczykowski-Górecki Reviewed-by: Roger Pau Monné --- Changes in v6: - wrap the check in additional CONFIG_X86, with appropriate comment Changes in v5: - drop IRQ 0 from the log message Changes in v4: - adjust log message, change it from WARNING to INFO - re-add x86 reference in the commit message Changes in v3: - change back to checking 0xff explicitly - adjust commit message, include spec reference - change warning to match the above Changes in v2: - add log message - extend commit message - code style fix --- xen/drivers/char/ns16550.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c index fb75cee4a13a..b37f67dc7430 100644 --- a/xen/drivers/char/ns16550.c +++ b/xen/drivers/char/ns16550.c @@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t skip_amt, unsigned int idx) pci_conf_read8(PCI_SBDF(0, b, d, f), PCI_INTERRUPT_LINE) : 0; +#ifdef CONFIG_X86 +/* PCI Local Bus Specification Revision 3.0 defines 0xff value + * as special only for X86 */ +if ( uart->irq == 0xff ) +uart->irq = 0; +#endif +if ( !uart->irq ) +printk(XENLOG_INFO + "ns16550: %pp no legacy IRQ, using poll mode\n", + &PCI_SBDF(0, b, d, f)); + return 0; } } -- 2.35.1
[PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART
This is purely based on the spec: - Intel 500 Series PCH: 635218-006 - Intel 600 Series PCH: 691222-001, 648364-003 This is tested only on TGL-LP added initially, but according to the spec, they should behave the same. Signed-off-by: Marek Marczykowski-Górecki Acked-by: Andrew Cooper --- Changes in v2: - new patch, adding more IDs to the patch that went in already --- xen/drivers/char/ns16550.c | 80 +- 1 file changed, 79 insertions(+), 1 deletion(-) diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c index b37f67dc7430..b7da5646fc28 100644 --- a/xen/drivers/char/ns16550.c +++ b/xen/drivers/char/ns16550.c @@ -1077,12 +1077,90 @@ static const struct ns16550_config __initconst uart_config[] = .dev_id = 0x0358, .param = param_exar_xr17v358 }, -/* Intel Corp. TGL-LP LPSS PCI */ +/* Intel Corp. TGL-LP LPSS PCI UART #0 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0xa0a8, +.param = param_intel_lpss +}, +/* Intel Corp. TGL-LP LPSS PCI UART #1 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0xa0a9, +.param = param_intel_lpss +}, +/* Intel Corp. TGL-LP LPSS PCI UART #2 */ { .vendor_id = PCI_VENDOR_ID_INTEL, .dev_id = 0xa0c7, .param = param_intel_lpss }, +/* Intel Corp. TGL-H LPSS PCI UART #0 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x43a8, +.param = param_intel_lpss +}, +/* Intel Corp. TGL-H LPSS PCI UART #1 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x43a9, +.param = param_intel_lpss +}, +/* Intel Corp. TGL-H LPSS PCI UART #2 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x43a7, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-P LPSS PCI UART #0 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x51a8, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-P LPSS PCI UART #1 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x51a9, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-P LPSS PCI UART #2 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x51c7, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-P LPSS PCI UART #3 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x51da, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-S LPSS PCI UART #0 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x7aa8, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-S LPSS PCI UART #1 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x7aa9, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-S LPSS PCI UART #2 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x7afe, +.param = param_intel_lpss +}, +/* Intel Corp. ADL-S LPSS PCI UART #3 */ +{ +.vendor_id = PCI_VENDOR_ID_INTEL, +.dev_id = 0x7adc, +.param = param_intel_lpss +}, }; static int __init -- 2.35.1
[ovmf test] 170594: regressions - FAIL
flight 170594 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170594/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 80 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1114 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days2 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
Hi, > On 19 May 2022, at 03:36, Wei Chen wrote: > > Hi Jan, > >> -Original Message- >> From: Jan Beulich >> Sent: 2022年5月18日 21:05 >> To: Wei Chen >> Cc: nd ; Stefano Stabellini ; Julien >> Grall ; Bertrand Marquis ; >> Volodymyr Babchuk ; Andrew Cooper >> ; Roger Pau Monné ; Wei >> Liu ; Jiamei Xie ; xen- >> de...@lists.xenproject.org >> Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm >> >> On 11.05.2022 03:46, Wei Chen wrote: >>> x86 is using compiler feature testing to decide EFI build >>> enable or not. When EFI build is disabled, x86 will use an >>> efi/stub.c file to replace efi/runtime.c for build objects. >>> Following this idea, we introduce a stub file for Arm, but >>> use CONFIG_ARM_EFI to decide EFI build enable or not. >>> >>> And the most functions in x86 EFI stub.c can be reused for >>> other architectures, like Arm. So we move them to common >>> and keep the x86 specific function in x86/efi/stub.c. >>> >>> To avoid the symbol link conflict error when linking common >>> stub files to x86/efi. We add a regular file check in efi >>> stub files' link script. Depends on this check we can bypass >>> the link behaviors for existed stub files in x86/efi. >>> >>> As there is no Arm specific EFI stub function for Arm in >>> current stage, Arm still can use the existed symbol link >>> method for EFI stub files. >> >> Wouldn't it be better to mandate that every arch has its stub.c, >> and in the Arm one all you'd do (for now) is #include the common >> one? (But see also below.) >> > > Personally, I don't like to include a C file into another C file. > But I am OK as long as the Arm maintainers agree. > @Stefano Stabellini @Bertrand Marquis @Julien Grall I agree with Wei here and if we are realistic the current way the EFI code works needs a redesign anyway and asking him to change this in this serie is not right. So I am OK with Wei solution. Cheers Bertrand
Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 20.05.2022 10:30, Chuck Zmudzinski wrote: > On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: >> On 5/20/2022 2:05 AM, Jan Beulich wrote: >>> On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: > On 04.05.22 10:31, Jan Beulich wrote: >> On 03.05.2022 15:22, Juergen Gross wrote: >> >> ... these uses there are several more. You say nothing on why >> those want >> leaving unaltered. When preparing my earlier patch I did inspect them >> and came to the conclusion that these all would also better >> observe the >> adjusted behavior (or else I couldn't have left pat_enabled() as the >> only predicate). In fact, as said in the description of my earlier >> patch, in >> my debugging I did find the use in i915_gem_object_pin_map() to be >> the >> problematic one, which you leave alone. > Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c > I wanted to be rather defensive in my changes, but I agree at least > the > case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, >>> Hmm, why would the kernel not be allowed to override that? Such >>> an override would affect only the single domain where the >>> kernel runs; other domains could take their own decisions. >>> >>> Also, for the sake of completeness: "nopat" used when running on >>> bare metal has the same bad effect on system boot, so there >>> pretty clearly is an error cleanup issue in the i915 driver. But >>> that's orthogonal, and I expect the maintainers may not even care >>> (but tell us "don't do that then"). > > Actually I just did a test with the last official Debian kernel > build of Linux 5.16, that is, a kernel before bdd8b6c98239 was > applied. In fact, the nopat option does *not* break the i915 driver > in 5.16. That is, with the nopat option, the i915 driver loads > normally on both the bare metal and on the Xen hypervisor. > That means your presumption (and the presumption of > the author of bdd8b6c98239) that the "nopat" option was > being observed by the i915 driver is incorrect. Setting "nopat" > had no effect on my system with Linux 5.16. So after doing these > tests, I am against the aggressive approach of breaking the i915 > driver with the "nopat" option because prior to bdd8b6c98239, > nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). Jan
Re: [PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART
On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote: > This is purely based on the spec: > - Intel 500 Series PCH: 635218-006 > - Intel 600 Series PCH: 691222-001, 648364-003 > > This is tested only on TGL-LP added initially, but according to the > spec, they should behave the same. > > Signed-off-by: Marek Marczykowski-Górecki > Acked-by: Andrew Cooper Oops? Commit 1f0b1f5cce9d. Jan
Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
Hi Wei, On 19/05/2022 03:36, Wei Chen wrote: -Original Message- From: Jan Beulich Sent: 2022年5月18日 21:05 To: Wei Chen Cc: nd ; Stefano Stabellini ; Julien Grall ; Bertrand Marquis ; Volodymyr Babchuk ; Andrew Cooper ; Roger Pau Monné ; Wei Liu ; Jiamei Xie ; xen- de...@lists.xenproject.org Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm On 11.05.2022 03:46, Wei Chen wrote: x86 is using compiler feature testing to decide EFI build enable or not. When EFI build is disabled, x86 will use an efi/stub.c file to replace efi/runtime.c for build objects. Following this idea, we introduce a stub file for Arm, but use CONFIG_ARM_EFI to decide EFI build enable or not. And the most functions in x86 EFI stub.c can be reused for other architectures, like Arm. So we move them to common and keep the x86 specific function in x86/efi/stub.c. To avoid the symbol link conflict error when linking common stub files to x86/efi. We add a regular file check in efi stub files' link script. Depends on this check we can bypass the link behaviors for existed stub files in x86/efi. As there is no Arm specific EFI stub function for Arm in current stage, Arm still can use the existed symbol link method for EFI stub files. Wouldn't it be better to mandate that every arch has its stub.c, and in the Arm one all you'd do (for now) is #include the common one? (But see also below.) Personally, I don't like to include a C file into another C file. Same here. I know we already use it in EFI but I am not in favor to add more of them. Cheers, -- Julien Grall
Re: [PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff
On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote: > Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared > by the PCI Local Bus Specification Revision 3.0 (from 2004) as > "unknown"/"no connection". Fallback to poll mode in this case. > The 0xff handling is x86-specific, the surrounding code is guarded with > CONFIG_X86 anyway. > > Signed-off-by: Marek Marczykowski-Górecki > Reviewed-by: Roger Pau Monné Reviewed-by: Jan Beulich > Changes in v6: > - wrap the check in additional CONFIG_X86, with appropriate comment Thanks, albeit ... > --- a/xen/drivers/char/ns16550.c > +++ b/xen/drivers/char/ns16550.c > @@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t skip_amt, > unsigned int idx) > pci_conf_read8(PCI_SBDF(0, b, d, f), > PCI_INTERRUPT_LINE) : 0; > > +#ifdef CONFIG_X86 > +/* PCI Local Bus Specification Revision 3.0 defines 0xff > value > + * as special only for X86 */ ... I'll have to remember to correct the style of the comment while committing ... > +if ( uart->irq == 0xff ) > +uart->irq = 0; > +#endif > +if ( !uart->irq ) > +printk(XENLOG_INFO > + "ns16550: %pp no legacy IRQ, using poll mode\n", ... and perhaps insert another colon after %pp here. Jan > + &PCI_SBDF(0, b, d, f)); > + > return 0; > } > }
[ovmf test] 170595: regressions - FAIL
flight 170595 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170595/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 80 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1115 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days3 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
[libvirt test] 170589: regressions - FAIL
flight 170589 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/170589/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-raw 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-raw 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-qcow2 1 build-check(1) blocked n/a version targeted for testing: libvirt 1bd24e79bee1800e7191863cc4617b12fe614dbe baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 679 days Failing since151818 2020-07-11 04:18:52 Z 678 days 660 attempts Testing same since 170589 2022-05-20 04:20:23 Z0 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Aleksandr Alekseev Aleksei Zakharov Amneesh Singh Andika Triwidada Andrea Bolognani Andrew Melnychenko Ani Sinha Balázs Meskó Barrett Schonefeld Bastian Germann Bastien Orivel BiaoXiang Ye Bihong Yu Binfeng Wu Bjoern Walk Boris Fiuczynski Brad Laue Brian Turek Bruno Haible Chris Mayo Christian Borntraeger Christian Ehrhardt Christian Kirbach Christian Schoenebeck Christophe Fergeau Claudio Fontana Cole Robinson Collin Walling Cornelia Huck Cédric Bosdonnat Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Didik Supriadi dinglimin Divya Garg Dmitrii Shcherbakov Dmytro Linkin Eiichi Tsukata Emilio Herrera Eric Farman Erik Skultety Fabian Affolter Fabian Freyer Fabiano Fidêncio Fangge Jin Farhan Ali Fedora Weblate Translation Franck Ridel Gavi Teitz gongwei Guoyi Tu Göran Uddeborg Halil Pasic Han Han Hao Wang Haonan Wang Hela Basa Helmut Grohne Hiroki Narukawa Hyman Huang(黄勇) Ian Wienand Ioanna Alifieraki Ivan Teterevkov Jakob Meng Jamie Strandboge Jamie Strandboge Jan Kuparinen jason lee Jean-Baptiste Holcroft Jia Zhou Jianan Gao Jim Fehlig Jin Yan Jing Qi Jinsheng Zhang Jiri Denemark Joachim Falk John Ferlan John Levon John Levon Jonathan Watt Jonathon Jongsma Julio Faracco Justin Gatzen Ján Tomko Kashyap Chamarthy Kevin Locke Kim InSoo Koichi Murase Kristina Hanicova Laine Stump Laszlo Ersek Lee Yarwood Lei Yang Lena Voytek Liang Yan Liang Yan Liao Pingfang Lin Ma Lin Ma Lin Ma Liu Yiding Lubomir Rintel Luke Yue Luyao Zhong luzhipeng Marc Hartmayer Marc-André Lureau Marek Marczykowski-Górecki Markus Schade Martin Kletzander Martin Pitt Masayoshi Mizuma Matej Cepl Matt Coleman Matt Coleman Mauro Matteo Cascella Maxim Nestratov Meina Li Michal Privoznik Michał Smyk Milo Casagrande Moshe Levi Moteen Shah Moteen Shah Muha Aliss Nathan Neal Gompa Nick Chevsky Nick Shyrokovskiy Nickys Music Group Nico Pache Nicolas Lécureuil Nicolas Lécureuil Nikolay Shirokovskiy Nikolay Shirokovskiy Nikolay Shirokovskiy Niteesh Dubey Olaf Hering Olesya Gerasimenko Or Ozeri Orion Poplawski Pany Paolo Bonzini Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peng Liang Peter Krempa Pino Toscano Pino Toscano
Re: [PATCH v3] xen/build: Add cppcheck and cppcheck-html make rules
Hi Bertrand, Since this patch has been committed, I get the following message for on every build (make -C xen): which: no cppcheck in ([...]) /bin/sh: cppcheck: command not found I wasn't expecting the build system to check every time. I think... On 26/04/2022 13:38, Bertrand Marquis wrote: +cppcheck-version: +ifeq ($(shell which $(CPPCHECK)),) ... this is because ifeq is similar to pre-preprocessing. So it always get executed. So far, the message is harmlesss (I can continue to build) but I think the check should really only happen when the target cppcheck-version is executed. So can you have a look to rework the rule? Cheers, -- Julien Grall
Re: [PATCH v3] xen/build: Add cppcheck and cppcheck-html make rules
Hi Julien, > On 20 May 2022, at 10:56, Julien Grall wrote: > > Hi Bertrand, > > Since this patch has been committed, I get the following message for on every > build (make -C xen): > > which: no cppcheck in ([...]) > /bin/sh: cppcheck: command not found > > I wasn't expecting the build system to check every time. I think... > > On 26/04/2022 13:38, Bertrand Marquis wrote: >> +cppcheck-version: >> +ifeq ($(shell which $(CPPCHECK)),) > > ... this is because ifeq is similar to pre-preprocessing. So it always get > executed. > > So far, the message is harmlesss (I can continue to build) but I think the > check should really only happen when the target cppcheck-version is executed. > So can you have a look to rework the rule? Sure I will push something to fix that. Thanks for the notice Cheers Bertrand > > Cheers, > > -- > Julien Grall >
Re: [PATCH 2/2] x86/vmx: implement Notify VM Exit
On Thu, May 19, 2022 at 04:45:20PM +0200, Roger Pau Monné wrote: > On Thu, May 19, 2022 at 12:10:24AM +, Andrew Cooper wrote: > > On 17/05/2022 14:21, Roger Pau Monne wrote: > > > @@ -1333,6 +1338,19 @@ static int construct_vmcs(struct vcpu *v) > > > rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D, > > > VMX_MSR_GUEST_LOADONLY); > > > > > > +if ( cpu_has_vmx_notify_vm_exiting && vm_notify_window >= 0 ) > > > +{ > > > +__vmwrite(NOTIFY_WINDOW, vm_notify_window); > > > +/* > > > + * Disable #AC and #DB interception: by using VM Notify Xen is > > > + * guaranteed to get a VM exit even if the guest manages to lock > > > the > > > + * CPU. > > > + */ > > > +v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) | > > > + (1U << > > > TRAP_alignment_check)); > > > +vmx_update_exception_bitmap(v); > > > > IIRC, it's not quite this easy. There are conditions, e.g. attaching > > gdbsx, where #DB interception wants turning on/off dynamically, and the > > logic got simplified to nothing following XSA-156, so will need > > reintroducing. > > > > AMD Milan (Zen3) actually has NoNestedDataBp in CPUID.8021.eax[0] > > which allows us to not intercept #DB, so perhaps that might offer an > > easier way of adjusting the interception logic. (Or maybe not. I can't > > remember). > > OK, will look into it. So after taking a look, I think we need to modify vmx_update_debug_state() so it's: void vmx_update_debug_state(struct vcpu *v) { unsigned int mask = 1u << TRAP_int3; if ( v->arch.hvm.vmx.secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING ) /* * Only allow toggling TRAP_debug if notify VM exit is enabled, as * unconditionally setting TRAP_debug is part of the XSA-156 fix. */ mask |= 1u << TRAP_debug; if ( v->arch.hvm.debug_state_latch ) v->arch.hvm.vmx.exception_bitmap |= mask; else v->arch.hvm.vmx.exception_bitmap &= ~mask; [...] I'm however confused by the usage of cpu_has_monitor_trap_flag previous to XSA-156, which was: void vmx_update_debug_state(struct vcpu *v) { unsigned long mask; mask = 1u << TRAP_int3; if ( !cpu_has_monitor_trap_flag ) mask |= 1u << TRAP_debug; Was it fine to not set TRAP_debug only if cpu_has_monitor_trap_flag is supported by the CPU? (even if not currently set on secondary_exec_control)? Thanks, Roger.
[xen-unstable test] 170586: tolerable FAIL
flight 170586 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/170586/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-saverestore.2 fail in 170563 pass in 170586 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-install fail pass in 170563 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170563 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 170563 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170563 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170563 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 170563 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 170563 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 170563 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 170563 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170563 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 170563 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 170563 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170563 test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pas
Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables
On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote: > On 06.05.2022 15:25, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote: > >> --- /dev/null > >> +++ b/xen/arch/x86/include/asm/pt-contig-markers.h > >> @@ -0,0 +1,105 @@ > >> +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H > >> +#define __ASM_X86_PT_CONTIG_MARKERS_H > >> + > >> +/* > >> + * Short of having function templates in C, the function defined below is > >> + * intended to be used by multiple parties interested in recording the > >> + * degree of contiguity in mappings by a single page table. > >> + * > >> + * Scheme: Every entry records the order of contiguous successive entries, > >> + * up to the maximum order covered by that entry (which is the number of > >> + * clear low bits in its index, with entry 0 being the exception using > >> + * the base-2 logarithm of the number of entries in a single page table). > >> + * While a few entries need touching upon update, knowing whether the > >> + * table is fully contiguous (and can hence be replaced by a higher level > >> + * leaf entry) is then possible by simply looking at entry 0's marker. > >> + * > >> + * Prereqs: > >> + * - CONTIG_MASK needs to be #define-d, to a value having at least 4 > >> + * contiguous bits (ignored by hardware), before including this file, > >> + * - page tables to be passed here need to be initialized with correct > >> + * markers. > > > > Not sure it's very relevant, but might we worth adding that: > > > > - Null entries must have the PTE zeroed except for the CONTIG_MASK > > region in order to be considered as inactive. > > NP, I've added an item along these lines. > > >> +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx, > >> + unsigned int level, enum PTE_kind > >> kind) > >> +{ > >> +unsigned int b, i = idx; > >> +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT; > >> + > >> +ASSERT(idx < CONTIG_NR); > >> +ASSERT(!(pt[idx] & CONTIG_MASK)); > >> + > >> +/* Step 1: Reduce markers in lower numbered entries. */ > >> +while ( i ) > >> +{ > >> +b = find_first_set_bit(i); > >> +i &= ~(1U << b); > >> +if ( GET_MARKER(pt[i]) > b ) > >> +SET_MARKER(pt[i], b); > > > > Can't you exit early when you find an entry that already has the > > to-be-set contiguous marker <= b, as lower numbered entries will then > > also be <= b'? > > > > Ie: > > > > if ( GET_MARKER(pt[i]) <= b ) > > break; > > else > > SET_MARKER(pt[i], b); > > Almost - I think it would need to be > > if ( GET_MARKER(pt[i]) < b ) > break; > if ( GET_MARKER(pt[i]) > b ) > SET_MARKER(pt[i], b); I guess I'm slightly confused, but if marker at i is <= b, then all following markers will also be <=, and hence could be skipped? Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b. FWIW, you could even do: if ( GET_MARKER(pt[i]) <= b ) break; SET_MARKER(pt[i], b); Which would keep the conditionals to 1 like it currently is. > > or, accepting redundant updates, > > if ( GET_MARKER(pt[i]) < b ) > break; > SET_MARKER(pt[i], b); > > . Neither the redundant updates nor the extra (easily mis-predicted) > conditional looked very appealing to me, but I guess I could change > this if you are convinced that's better than continuing a loop with > at most 9 (typically less) iterations. Well, I think I at least partly understood the logic. Not sure whether it's worth adding the conditional or just assuming that continuing the loop is going to be cheaper. Might be worth adding a comment that we choose to explicitly not add an extra conditional to check for early exit, because we assume that to be more expensive than just continuing. Thanks, Roger.
[ovmf test] 170597: regressions - FAIL
flight 170597 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170597/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 80 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1116 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days4 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART
On Fri, May 20, 2022 at 11:42:37AM +0200, Jan Beulich wrote: > On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote: > > This is purely based on the spec: > > - Intel 500 Series PCH: 635218-006 > > - Intel 600 Series PCH: 691222-001, 648364-003 > > > > This is tested only on TGL-LP added initially, but according to the > > spec, they should behave the same. > > > > Signed-off-by: Marek Marczykowski-Górecki > > Acked-by: Andrew Cooper > > Oops? Commit 1f0b1f5cce9d. Right, I haven't fetch new master. No changes here for several iterations, so everything is fine. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff
On Fri, May 20, 2022 at 11:47:02AM +0200, Jan Beulich wrote: > On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote: > > Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared > > by the PCI Local Bus Specification Revision 3.0 (from 2004) as > > "unknown"/"no connection". Fallback to poll mode in this case. > > The 0xff handling is x86-specific, the surrounding code is guarded with > > CONFIG_X86 anyway. > > > > Signed-off-by: Marek Marczykowski-Górecki > > Reviewed-by: Roger Pau Monné > > Reviewed-by: Jan Beulich Thanks. > > Changes in v6: > > - wrap the check in additional CONFIG_X86, with appropriate comment > > Thanks, albeit ... > > > --- a/xen/drivers/char/ns16550.c > > +++ b/xen/drivers/char/ns16550.c > > @@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t > > skip_amt, unsigned int idx) > > pci_conf_read8(PCI_SBDF(0, b, d, f), > > PCI_INTERRUPT_LINE) : 0; > > > > +#ifdef CONFIG_X86 > > +/* PCI Local Bus Specification Revision 3.0 defines 0xff > > value > > + * as special only for X86 */ > > ... I'll have to remember to correct the style of the comment while > committing ... > > > +if ( uart->irq == 0xff ) > > +uart->irq = 0; > > +#endif > > +if ( !uart->irq ) > > +printk(XENLOG_INFO > > + "ns16550: %pp no legacy IRQ, using poll mode\n", > > ... and perhaps insert another colon after %pp here. Both fine with me, thanks! > Jan > > > + &PCI_SBDF(0, b, d, f)); > > + > > return 0; > > } > > } > -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH v4 17/21] AMD/IOMMU: replace all-contiguous page tables by superpage mappings
On Wed, May 18, 2022 at 12:40:59PM +0200, Jan Beulich wrote: > On 10.05.2022 17:31, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:43:16AM +0200, Jan Beulich wrote: > >> @@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte > >> old.iw != iw || old.ir != ir ) > >> { > >> set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > >> -pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level), > >> - level, PTE_kind_leaf); > >> +*contig = pt_update_contig_markers(&table->raw, > >> + pfn_to_pde_idx(dfn, level), > >> + level, PTE_kind_leaf); > >> } > >> else > >> +{ > >> old.pr = false; /* signal "no change" to the caller */ > >> +*contig = false; > > > > So we assume that any caller getting contig == true must have acted > > and coalesced the page table? > > Yes, except that I wouldn't use "must", but "would". It's not a > requirement after all, functionality-wise all will be fine without > re-coalescing. > > > Might be worth a comment, to note that the function assumes that a > > previous return of contig == true will have coalesced the page table > > and hence a "no change" PTE write is not expected to happen on a > > contig page table. > > I'm not convinced, as there's effectively only one caller, > amd_iommu_map_page(). I also don't see why "no change" would be a > problem. "No change" can't result in a fully contiguous page table > if the page table wasn't fully contiguous already before (at which > point it would have been replaced by a superpage). Right, I agree, it's just that I would have preferred the result from set_iommu_ptes_present() to be consistent, ie: repeated calls to it using the same PTE should set contig to the same value. Anyway, that's not relevant to any current callers, so: Reviewed-by: Roger Pau Monné Thanks, Roger.
Re: [PATCH v2 1/3] tools/xl: Sort create command options
On Tue, Apr 19, 2022 at 06:56:03PM -0700, Elliott Mitchell wrote: > Hopefully simplify future changes by sorting options lists for > `xl create`. While at it, declare the options list constant. > > Signed-off-by: Elliott Mitchell Reviewed-by: Anthony PERARD Thanks, -- Anthony PERARD
Re: [PATCH v4 18/21] VT-d: replace all-contiguous page tables by superpage mappings
On Wed, May 18, 2022 at 12:44:06PM +0200, Jan Beulich wrote: > On 11.05.2022 13:08, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:43:45AM +0200, Jan Beulich wrote: > >> When a page table ends up with all contiguous entries (including all > >> identical attributes), it can be replaced by a superpage entry at the > >> next higher level. The page table itself can then be scheduled for > >> freeing. > >> > >> The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap > >> for whenever we (and obviously hardware) start supporting 512G mappings. > >> > >> Signed-off-by: Jan Beulich > >> Reviewed-by: Kevin Tian > > > > Like on the AMD side, I wonder whether you can get away with only > > FTAOD I take it you mean "like on the all-empty side", as on AMD we > don't need to do any cache flushing? Heh, yes, sorry. > > doing a cache flush for the last (highest level) PTE, as the lower > > ones won't be reachable anyway, as the page-table is freed. > > But that freeing will happen only later, with a TLB flush in between. > Until then we would better make sure the IOMMU sees what was written, > even if it reading a stale value _should_ be benign. Hm, but when doing the TLB flush the paging structures will already be fully updated, and the top level visible entry will have it's cache flushed, so the lower ones would never be reachable AFAICT. Thanks, Roger.
Re: [PATCH v4 19/21] IOMMU/x86: add perf counters for page table splitting / coalescing
On Wed, May 18, 2022 at 01:39:02PM +0200, Jan Beulich wrote: > On 11.05.2022 15:48, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:44:11AM +0200, Jan Beulich wrote: > >> Signed-off-by: Jan Beulich > >> Reviewed-by: Kevin tian > > > > Reviewed-by: Roger Pau Monné > > Thanks. > > > Would be helpful to also have those per-guest I think. > > Perhaps, but we don't have per-guest counter infrastructure, do we? No, I don't think so? Would be nice, but I don't see us doing it any time soon. Thanks, Roger.
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: > On 06.05.2022 13:16, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: > >> --- > >> An alternative to the ASSERT()s added to set_iommu_ptes_present() would > >> be to make the function less general-purpose; it's used in a single > >> place only after all (i.e. it might as well be folded into its only > >> caller). > > > > I would think adding a comment that the function requires the PDE to > > be empty would be good. > > But that's not the case - what the function expects to be clear is > what is being ASSERT()ed. > > > Also given the current usage we could drop > > the nr_ptes parameter and just name the function fill_pde() or > > similar. > > Right, but that would want to be a separate change. > > >> --- a/xen/drivers/passthrough/amd/iommu_map.c > >> +++ b/xen/drivers/passthrough/amd/iommu_map.c > >> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig > >> > >> while ( nr_ptes-- ) > >> { > >> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > >> +ASSERT(!pde->next_level); > >> +ASSERT(!pde->u); > >> + > >> +if ( pde > table ) > >> +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); > >> +else > >> +ASSERT(pde->ign0 == PAGE_SHIFT - 3); > > > > I think PAGETABLE_ORDER would be clearer here. > > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere > in IOMMU code afaics. Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not sure what's the rule for declaring that PAGE_SHIFT is fine to use in IOMMU code but not PAGETABLE_ORDER. Thanks, Roger.
[PATCH] build: Fix make warning if there is no cppcheck
If cppcheck is not present, the following warning appears during build: which: no cppcheck in ([...]) /bin/sh: cppcheck: command not found Fix this by hiding the error output from which and only try to execute cppcheck --version if we have a cppcheck. Reported-by: Julien Grall Signed-off-by: Bertrand Marquis --- xen/Makefile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xen/Makefile b/xen/Makefile index 15388703bc..f42be3d0ab 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c $(objtree)/include/generated/autoconf.h $(call if_changed,cppcheck_xml) cppcheck-version: -ifeq ($(shell which $(CPPCHECK)),) +ifeq ($(shell which $(CPPCHECK) 2> /dev/null),) $(error Cannot find cppcheck executable: $(CPPCHECK)) -endif +else ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) $(error Please upgrade your cppcheck to version 2.7 or greater) endif +endif # Put this in generated headers this way it is cleaned by include/Makefile $(objtree)/include/generated/compiler-def.h: -- 2.25.1
Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables
On 20.05.2022 12:22, Roger Pau Monné wrote: > On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote: >> On 06.05.2022 15:25, Roger Pau Monné wrote: >>> On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote: --- /dev/null +++ b/xen/arch/x86/include/asm/pt-contig-markers.h @@ -0,0 +1,105 @@ +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H +#define __ASM_X86_PT_CONTIG_MARKERS_H + +/* + * Short of having function templates in C, the function defined below is + * intended to be used by multiple parties interested in recording the + * degree of contiguity in mappings by a single page table. + * + * Scheme: Every entry records the order of contiguous successive entries, + * up to the maximum order covered by that entry (which is the number of + * clear low bits in its index, with entry 0 being the exception using + * the base-2 logarithm of the number of entries in a single page table). + * While a few entries need touching upon update, knowing whether the + * table is fully contiguous (and can hence be replaced by a higher level + * leaf entry) is then possible by simply looking at entry 0's marker. + * + * Prereqs: + * - CONTIG_MASK needs to be #define-d, to a value having at least 4 + * contiguous bits (ignored by hardware), before including this file, + * - page tables to be passed here need to be initialized with correct + * markers. >>> >>> Not sure it's very relevant, but might we worth adding that: >>> >>> - Null entries must have the PTE zeroed except for the CONTIG_MASK >>> region in order to be considered as inactive. >> >> NP, I've added an item along these lines. >> +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx, + unsigned int level, enum PTE_kind kind) +{ +unsigned int b, i = idx; +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT; + +ASSERT(idx < CONTIG_NR); +ASSERT(!(pt[idx] & CONTIG_MASK)); + +/* Step 1: Reduce markers in lower numbered entries. */ +while ( i ) +{ +b = find_first_set_bit(i); +i &= ~(1U << b); +if ( GET_MARKER(pt[i]) > b ) +SET_MARKER(pt[i], b); >>> >>> Can't you exit early when you find an entry that already has the >>> to-be-set contiguous marker <= b, as lower numbered entries will then >>> also be <= b'? >>> >>> Ie: >>> >>> if ( GET_MARKER(pt[i]) <= b ) >>> break; >>> else >>> SET_MARKER(pt[i], b); >> >> Almost - I think it would need to be >> >> if ( GET_MARKER(pt[i]) < b ) >> break; >> if ( GET_MARKER(pt[i]) > b ) >> SET_MARKER(pt[i], b); > > I guess I'm slightly confused, but if marker at i is <= b, then all > following markers will also be <=, and hence could be skipped? Your use of "following" is ambiguous here, because the iteration moves downwards as far as PTEs inspected are concerned (and it's b which grows from one iteration to the next). But yes, I think I agree now that ... > Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b. ... this isn't needed. At which point ... > FWIW, you could even do: > > if ( GET_MARKER(pt[i]) <= b ) > break; > SET_MARKER(pt[i], b); > > Which would keep the conditionals to 1 like it currently is. > >> >> or, accepting redundant updates, >> >> if ( GET_MARKER(pt[i]) < b ) >> break; >> SET_MARKER(pt[i], b); >> >> . Neither the redundant updates nor the extra (easily mis-predicted) >> conditional looked very appealing to me, but I guess I could change >> this if you are convinced that's better than continuing a loop with >> at most 9 (typically less) iterations. > > Well, I think I at least partly understood the logic. Not sure > whether it's worth adding the conditional or just assuming that > continuing the loop is going to be cheaper. Might be worth adding a > comment that we choose to explicitly not add an extra conditional to > check for early exit, because we assume that to be more expensive than > just continuing. ... this resolves without further action. Jan
Re: [PATCH] build: Fix make warning if there is no cppcheck
On 20.05.2022 12:49, Bertrand Marquis wrote: > --- a/xen/Makefile > +++ b/xen/Makefile > @@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c > $(objtree)/include/generated/autoconf.h > $(call if_changed,cppcheck_xml) > > cppcheck-version: > -ifeq ($(shell which $(CPPCHECK)),) > +ifeq ($(shell which $(CPPCHECK) 2> /dev/null),) > $(error Cannot find cppcheck executable: $(CPPCHECK)) > -endif > +else > ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) > $(error Please upgrade your cppcheck to version 2.7 or greater) > endif > +endif While I agree this will silence things, I still would prefer if you switched to $(if ...) inside the rule - there's no need to invoke the shell while parsing the makefile. Anything like this only needlessly slows down the build. Not by much, but it sums up. Jan
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On 20.05.2022 12:47, Roger Pau Monné wrote: > On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: >> On 06.05.2022 13:16, Roger Pau Monné wrote: >>> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig while ( nr_ptes-- ) { -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); +ASSERT(!pde->next_level); +ASSERT(!pde->u); + +if ( pde > table ) +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); +else +ASSERT(pde->ign0 == PAGE_SHIFT - 3); >>> >>> I think PAGETABLE_ORDER would be clearer here. >> >> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere >> in IOMMU code afaics. > > Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not > sure what's the rule for declaring that PAGE_SHIFT is fine to use in > IOMMU code but not PAGETABLE_ORDER. Hmm, yes and no. But for consistency with other IOMMU code I may want to switch to PAGE_SHIFT_4K. Jan
Re: [PATCH v4 16/21] VT-d: free all-empty page tables
On Wed, May 18, 2022 at 12:26:03PM +0200, Jan Beulich wrote: > On 10.05.2022 16:30, Roger Pau Monné wrote: > > On Mon, Apr 25, 2022 at 10:42:50AM +0200, Jan Beulich wrote: > >> @@ -837,9 +843,31 @@ static int dma_pte_clear_one(struct doma > >> > >> old = *pte; > >> dma_clear_pte(*pte); > >> +iommu_sync_cache(pte, sizeof(*pte)); > >> + > >> +while ( pt_update_contig_markers(&page->val, > >> + address_level_offset(addr, level), > >> + level, PTE_kind_null) && > >> +++level < min_pt_levels ) > >> +{ > >> +struct page_info *pg = maddr_to_page(pg_maddr); > >> + > >> +unmap_vtd_domain_page(page); > >> + > >> +pg_maddr = addr_to_dma_page_maddr(domain, addr, level, > >> flush_flags, > >> + false); > >> +BUG_ON(pg_maddr < PAGE_SIZE); > >> + > >> +page = map_vtd_domain_page(pg_maddr); > >> +pte = &page[address_level_offset(addr, level)]; > >> +dma_clear_pte(*pte); > >> +iommu_sync_cache(pte, sizeof(*pte)); > >> + > >> +*flush_flags |= IOMMU_FLUSHF_all; > >> +iommu_queue_free_pgtable(hd, pg); > >> +} > > > > I think I'm setting myself for trouble, but do we need to sync cache > > the lower lever entries if higher level ones are to be changed. > > > > IOW, would it be fine to just flush the highest level modified PTE? > > As the lower lever ones won't be reachable anyway. > > I definitely want to err on the safe side here. If later we can > prove that some cache flush is unneeded, I'd be happy to see it > go away. Hm, so it's not only about adding more cache flushes, but moving them inside of the locked region: previously the only cache flush was done outside of the locked region. I guess I can't convince myself why we would need to flush cache of entries that are to be removed, and that also point to pages scheduled to be freed. > >> @@ -2182,8 +2210,21 @@ static int __must_check cf_check intel_i > >> } > >> > >> *pte = new; > >> - > >> iommu_sync_cache(pte, sizeof(struct dma_pte)); > >> + > >> +/* > >> + * While the (ab)use of PTE_kind_table here allows to save some work > >> in > >> + * the function, the main motivation for it is that it avoids a so far > >> + * unexplained hang during boot (while preparing Dom0) on a Westmere > >> + * based laptop. > >> + */ > >> +pt_update_contig_markers(&page->val, > >> + address_level_offset(dfn_to_daddr(dfn), > >> level), > >> + level, > >> + (hd->platform_ops->page_sizes & > >> + (1UL << level_to_offset_bits(level + 1)) > >> + ? PTE_kind_leaf : PTE_kind_table)); > > > > So this works because on what we believe to be affected models the > > only supported page sizes are 4K? > > Yes. > > > Do we want to do the same with AMD if we don't allow 512G super pages? > > Why? They don't have a similar flaw. So the question was mostly whether we should also avoid the pt_update_contig_markers for 1G entries, because we won't coalesce them into a 512G anyway. IOW avoid the overhead of updating the contig markers if we know that the resulting super-page is not supported by ->page_sizes. Thanks, Roger.
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On 20.05.2022 13:11, Jan Beulich wrote: > On 20.05.2022 12:47, Roger Pau Monné wrote: >> On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: >>> On 06.05.2022 13:16, Roger Pau Monné wrote: On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: > --- a/xen/drivers/passthrough/amd/iommu_map.c > +++ b/xen/drivers/passthrough/amd/iommu_map.c > @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig > > while ( nr_ptes-- ) > { > -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > +ASSERT(!pde->next_level); > +ASSERT(!pde->u); > + > +if ( pde > table ) > +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); > +else > +ASSERT(pde->ign0 == PAGE_SHIFT - 3); I think PAGETABLE_ORDER would be clearer here. >>> >>> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere >>> in IOMMU code afaics. >> >> Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not >> sure what's the rule for declaring that PAGE_SHIFT is fine to use in >> IOMMU code but not PAGETABLE_ORDER. > > Hmm, yes and no. But for consistency with other IOMMU code I may want > to switch to PAGE_SHIFT_4K. Except that, with the plan to re-use pt_update_contig_markers() for CPU- side re-coalescing, there I'd prefer to stick to PAGE_SHIFT. Jan
Re: [PATCH 24/30] panic: Refactor the panic path
On 19/05/2022 20:45, Baoquan He wrote: > [...] >> I really appreciate the summary skill you have, to convert complex >> problems in very clear and concise ideas. Thanks for that, very useful! >> I agree with what was summarized above. > > I want to say the similar words to Petr's reviewing comment when I went > through the patches and traced each reviewing sub-thread to try to > catch up. Petr has reivewed this series so carefully and given many > comments I want to ack immediately. > > I agree with most of the suggestions from Petr to this patch, except of > one tiny concern, please see below inline comment. Hi Baoquan, thanks! I'm glad you're also reviewing that =) > [...] > > I like the proposed skeleton of panic() and code style suggested by > Petr very much. About panic_prefer_crash_dump which might need be added, > I hope it has a default value true. This makes crash_dump execute at > first by default just as before, unless people specify > panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add > panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump, > this is inconsistent with the old behaviour. I'd like to understand better why the crash_kexec() must always be the first thing in your use case. If we keep that behavior, we'll see all sorts of workarounds - see the last patches of this series, Hyper-V and PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force execution of their relevant notifiers (like the vmbus disconnect, specially in arm64 that has no custom machine_crash_shutdown, or the fadump case in ppc). This led to more risk in kdump. The thing is: with the notifiers' split, we tried to keep only the most relevant/necessary stuff in this first list, things that ultimately should improve kdump reliability or if not, at least not break it. My feeling is that, with this series, we should change the idea/concept that kdump must run first nevertheless, not matter what. We're here trying to accommodate the antagonistic goals of hypervisors that need some clean-up (even for kdump to work) VS. kdump users, that wish a "pristine" system reboot ASAP after the crash. Cheers, Guilherme
Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables
On Fri, May 20, 2022 at 12:59:55PM +0200, Jan Beulich wrote: > On 20.05.2022 12:22, Roger Pau Monné wrote: > > On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote: > >> On 06.05.2022 15:25, Roger Pau Monné wrote: > >>> On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote: > --- /dev/null > +++ b/xen/arch/x86/include/asm/pt-contig-markers.h > @@ -0,0 +1,105 @@ > +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H > +#define __ASM_X86_PT_CONTIG_MARKERS_H > + > +/* > + * Short of having function templates in C, the function defined below > is > + * intended to be used by multiple parties interested in recording the > + * degree of contiguity in mappings by a single page table. > + * > + * Scheme: Every entry records the order of contiguous successive > entries, > + * up to the maximum order covered by that entry (which is the number of > + * clear low bits in its index, with entry 0 being the exception using > + * the base-2 logarithm of the number of entries in a single page > table). > + * While a few entries need touching upon update, knowing whether the > + * table is fully contiguous (and can hence be replaced by a higher > level > + * leaf entry) is then possible by simply looking at entry 0's marker. > + * > + * Prereqs: > + * - CONTIG_MASK needs to be #define-d, to a value having at least 4 > + * contiguous bits (ignored by hardware), before including this file, > + * - page tables to be passed here need to be initialized with correct > + * markers. > >>> > >>> Not sure it's very relevant, but might we worth adding that: > >>> > >>> - Null entries must have the PTE zeroed except for the CONTIG_MASK > >>> region in order to be considered as inactive. > >> > >> NP, I've added an item along these lines. > >> > +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx, > + unsigned int level, enum PTE_kind > kind) > +{ > +unsigned int b, i = idx; > +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT; > + > +ASSERT(idx < CONTIG_NR); > +ASSERT(!(pt[idx] & CONTIG_MASK)); > + > +/* Step 1: Reduce markers in lower numbered entries. */ > +while ( i ) > +{ > +b = find_first_set_bit(i); > +i &= ~(1U << b); > +if ( GET_MARKER(pt[i]) > b ) > +SET_MARKER(pt[i], b); > >>> > >>> Can't you exit early when you find an entry that already has the > >>> to-be-set contiguous marker <= b, as lower numbered entries will then > >>> also be <= b'? > >>> > >>> Ie: > >>> > >>> if ( GET_MARKER(pt[i]) <= b ) > >>> break; > >>> else > >>> SET_MARKER(pt[i], b); > >> > >> Almost - I think it would need to be > >> > >> if ( GET_MARKER(pt[i]) < b ) > >> break; > >> if ( GET_MARKER(pt[i]) > b ) > >> SET_MARKER(pt[i], b); > > > > I guess I'm slightly confused, but if marker at i is <= b, then all > > following markers will also be <=, and hence could be skipped? > > Your use of "following" is ambiguous here, because the iteration > moves downwards as far as PTEs inspected are concerned (and it's > b which grows from one iteration to the next). But yes, I think I > agree now that ... Right, 'following' here would be the next item processed by the loop. > > Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b. > > ... this isn't needed. At which point ... > > > FWIW, you could even do: > > > > if ( GET_MARKER(pt[i]) <= b ) > > break; > > SET_MARKER(pt[i], b); > > > > Which would keep the conditionals to 1 like it currently is. > > > >> > >> or, accepting redundant updates, > >> > >> if ( GET_MARKER(pt[i]) < b ) > >> break; > >> SET_MARKER(pt[i], b); > >> > >> . Neither the redundant updates nor the extra (easily mis-predicted) > >> conditional looked very appealing to me, but I guess I could change > >> this if you are convinced that's better than continuing a loop with > >> at most 9 (typically less) iterations. > > > > Well, I think I at least partly understood the logic. Not sure > > whether it's worth adding the conditional or just assuming that > > continuing the loop is going to be cheaper. Might be worth adding a > > comment that we choose to explicitly not add an extra conditional to > > check for early exit, because we assume that to be more expensive than > > just continuing. > > ... this resolves without further action. OK, since we agree, and that was the only comment I had, you can add: Reviewed-by: Roger Pau Monné Thanks, Roger.
[ovmf test] 170598: regressions - FAIL
flight 170598 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170598/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1117 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days5 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
[ovmf test] 170600: regressions - FAIL
flight 170600 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170600/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1118 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days6 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH] build: Fix make warning if there is no cppcheck
Hi Bertrand, On 20/05/2022 11:49, Bertrand Marquis wrote: If cppcheck is not present, the following warning appears during build: which: no cppcheck in ([...]) /bin/sh: cppcheck: command not found Fix this by hiding the error output from which and only try to execute cppcheck --version if we have a cppcheck. Reported-by: Julien Grall Signed-off-by: Bertrand Marquis This solves the warning so: Acked-by: Julien Grall In the long term, I think using what Jan suggested would be better. Cheers, -- Julien Grall
[PATCH 00/16] xen/arm: mm: Remove open-coding mappings
From: Julien Grall Hi all, This series was originally sent as "xen/arm: mm: Add limited support for superpages" [1] and finally has grown enough to remove most of the open-coding mappings in the boot code. This will help to: 1) Get better compliance with the Arm memory model 2) Pave the way to support other page size (64KB, 16KB) Cheers, [1] <20201119190751.22345-1-jul...@xen.org> Julien Grall (15): xen/arm: mm: Allow other mapping size in xen_pt_update_entry() xen/arm: mm: Add support for the contiguous bit xen/arm: mm: Avoid flushing the TLBs when mapping are inserted xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings() xen/arm: mm: Re-implement early_fdt_map() using map_pages_to_xen() xen/arm32: mm: Re-implement setup_xenheap_mappings() using map_pages_to_xen() xen/arm: mm: Allocate xen page tables in domheap rather than xenheap xen/arm: mm: Allow page-table allocation from the boot allocator xen/arm: Move fixmap definitions in a separate header xen/arm: mm: Clean-up the includes and order them xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table() xen/arm32: setup: Move out the code to populate the boot allocator xen/arm64: mm: Add memory to the boot allocator first xen/arm: mm: Rework setup_xenheap_mappings() xen/arm: mm: Re-implement setup_frame_table_mappings() with map_pages_to_xen() Wei Liu (1): xen/arm: add Persistent Map (PMAP) infrastructure xen/arch/arm/Kconfig| 1 + xen/arch/arm/acpi/lib.c | 2 + xen/arch/arm/include/asm/config.h | 6 - xen/arch/arm/include/asm/early_printk.h | 1 + xen/arch/arm/include/asm/fixmap.h | 48 +++ xen/arch/arm/include/asm/lpae.h | 8 + xen/arch/arm/include/asm/mm.h | 4 - xen/arch/arm/include/asm/page.h | 8 + xen/arch/arm/include/asm/pmap.h | 32 ++ xen/arch/arm/kernel.c | 1 + xen/arch/arm/mm.c | 494 +--- xen/arch/arm/setup.c| 141 +++ xen/common/Kconfig | 3 + xen/common/Makefile | 1 + xen/common/pmap.c | 72 xen/include/xen/acpi.h | 18 +- xen/include/xen/pmap.h | 16 + 17 files changed, 549 insertions(+), 307 deletions(-) create mode 100644 xen/arch/arm/include/asm/fixmap.h create mode 100644 xen/arch/arm/include/asm/pmap.h create mode 100644 xen/common/pmap.c create mode 100644 xen/include/xen/pmap.h -- 2.32.0
[PATCH 06/16] xen/arm32: mm: Re-implement setup_xenheap_mappings() using map_pages_to_xen()
From: Julien Grall Now that map_pages_to_xen() has been extended to support 2MB mappings, we can replace the create_mappings() call by map_pages_to_xen() call. Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini --- Changes in v4: - Add Stefano's reviewed-by Changes in v3: - Fix build when CONFIG_DEBUG=y Changes in v2: - New patch --- xen/arch/arm/mm.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 03f970e4d10b..47c2111c36a4 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -825,7 +825,12 @@ void mmu_init_secondary_cpu(void) void __init setup_xenheap_mappings(unsigned long base_mfn, unsigned long nr_mfns) { -create_mappings(xen_second, XENHEAP_VIRT_START, base_mfn, nr_mfns, MB(32)); +int rc; + +rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns, + PAGE_HYPERVISOR_RW | _PAGE_BLOCK); +if ( rc ) +panic("Unable to setup the xenheap mappings.\n"); /* Record where the xenheap is, for translation routines. */ xenheap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE; -- 2.32.0
[PATCH 01/16] xen/arm: mm: Allow other mapping size in xen_pt_update_entry()
From: Julien Grall At the moment, xen_pt_update_entry() only supports mapping at level 3 (i.e 4KB mapping). While this is fine for most of the runtime helper, the boot code will require to use superpage mapping. We don't want to allow superpage mapping by default as some of the callers may expect small mappings (i.e populate_pt_range()) or even expect to unmap only a part of a superpage. To keep the code simple, a new flag _PAGE_BLOCK is introduced to allow the caller to enable superpage mapping. As the code doesn't support all the combinations, xen_pt_check_entry() is extended to take into account the cases we don't support when using block mapping: - Replacing a table with a mapping. This may happen if region was first mapped with 4KB mapping and then later on replaced with a 2MB (or 1GB mapping). - Removing/modifying a table. This may happen if a caller try to remove a region with _PAGE_BLOCK set when it was created without it. Note that the current restriction means that the caller must ensure that _PAGE_BLOCK is consistently set/cleared across all the updates on a given virtual region. This ought to be fine with the expected use-cases. More rework will be necessary if we wanted to remove the restrictions. Note that nr_mfns is now marked const as it is used for flushing the TLBs and we don't want it to be modified. Signed-off-by: Julien Grall Signed-off-by: Julien Grall Reviewed-by: Hongda Deng --- Changes in v4: - Add Hongda's reviewed-by - Add a comment why nr_mfns is const - Open-code pfn_to_paddr() Changes in v3: - Fix clash after prefixing the PT macros with XEN_PT_ - Fix typoes in the commit message - Support superpage mappings even if nr is not suitably aligned - Move the logic to find the level in a separate function Changes in v2: - Pass the target level rather than the order to xen_pt_update_entry() - Update some comments - Open-code paddr_to_pfn() - Add my AWS signed-off-by --- xen/arch/arm/include/asm/page.h | 4 ++ xen/arch/arm/mm.c | 109 ++-- 2 files changed, 95 insertions(+), 18 deletions(-) diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h index c6f9fb0d4e0c..07998df47bac 100644 --- a/xen/arch/arm/include/asm/page.h +++ b/xen/arch/arm/include/asm/page.h @@ -69,6 +69,7 @@ * [3:4] Permission flags * [5] Page present * [6] Only populate page tables + * [7] Superpage mappings is allowed */ #define PAGE_AI_MASK(x) ((x) & 0x7U) @@ -82,6 +83,9 @@ #define _PAGE_PRESENT(1U << 5) #define _PAGE_POPULATE (1U << 6) +#define _PAGE_BLOCK_BIT 7 +#define _PAGE_BLOCK (1U << _PAGE_BLOCK_BIT) + /* * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not * meant to be used outside of this header. diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 7b1f2f49060d..be2ac302d731 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1078,9 +1078,10 @@ static int xen_pt_next_level(bool read_only, unsigned int level, } /* Sanity check of the entry */ -static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags) +static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level, + unsigned int flags) { -/* Sanity check when modifying a page. */ +/* Sanity check when modifying an entry. */ if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) ) { /* We don't allow modifying an invalid entry. */ @@ -1090,6 +1091,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags) return false; } +/* We don't allow modifying a table entry */ +if ( !lpae_is_mapping(entry, level) ) +{ +mm_printk("Modifying a table entry is not allowed.\n"); +return false; +} + /* We don't allow changing memory attributes. */ if ( entry.pt.ai != PAGE_AI_MASK(flags) ) { @@ -1105,7 +1113,7 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags) return false; } } -/* Sanity check when inserting a page */ +/* Sanity check when inserting a mapping */ else if ( flags & _PAGE_PRESENT ) { /* We should be here with a valid MFN. */ @@ -1114,18 +1122,28 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags) /* We don't allow replacing any valid entry. */ if ( lpae_is_valid(entry) ) { -mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n", - mfn_x(lpae_get_mfn(entry)), mfn_x(mfn)); +if ( lpae_is_mapping(entry, level) ) +mm_printk("Changing MFN for a valid entry is not allowed (%#"PRI_mfn" -> %#"PRI_mfn").\n", + mf
[PATCH 02/16] xen/arm: mm: Add support for the contiguous bit
From: Julien Grall In follow-up patches, we will use xen_pt_update() (or its callers) to handle large mappings (e.g. frametable, xenheap). They are also not going to be modified once created. The page-table entries have an hint to indicate that whether an entry is contiguous to another 16 entries (assuming 4KB). When the processor support the hint, one TLB entry will be created per contiguous region. For now this is tied to _PAGE_BLOCK. We can untie it in the future if there are use-cases where we may want to use _PAGE_BLOCK without setting the contiguous (couldn't think of any yet). Note that to avoid extra complexity, mappings with the contiguous bit set cannot be removed. Given the expected use, this restriction ought to be fine. Signed-off-by: Julien Grall Reviewed-by: Hongda Deng Reviewed-by: Stefano Stabellini --- Changes in v4: - Add Hongda and Stefano's reviewed-by - Exit the outer loop as soon as there is an error. Changes in v3: - New patch --- xen/arch/arm/include/asm/page.h | 4 ++ xen/arch/arm/mm.c | 83 + 2 files changed, 79 insertions(+), 8 deletions(-) diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h index 07998df47bac..e7cd62190c7f 100644 --- a/xen/arch/arm/include/asm/page.h +++ b/xen/arch/arm/include/asm/page.h @@ -70,6 +70,7 @@ * [5] Page present * [6] Only populate page tables * [7] Superpage mappings is allowed + * [8] Set contiguous bit (internal flag) */ #define PAGE_AI_MASK(x) ((x) & 0x7U) @@ -86,6 +87,9 @@ #define _PAGE_BLOCK_BIT 7 #define _PAGE_BLOCK (1U << _PAGE_BLOCK_BIT) +#define _PAGE_CONTIG_BIT8 +#define _PAGE_CONTIG(1U << _PAGE_CONTIG_BIT) + /* * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not * meant to be used outside of this header. diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index be2ac302d731..c4487dd7fc46 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1252,6 +1252,8 @@ static int xen_pt_update_entry(mfn_t root, unsigned long virt, /* Set permission */ pte.pt.ro = PAGE_RO_MASK(flags); pte.pt.xn = PAGE_XN_MASK(flags); +/* Set contiguous bit */ +pte.pt.contig = !!(flags & _PAGE_CONTIG); } write_pte(entry, pte); @@ -1304,6 +1306,51 @@ static int xen_pt_mapping_level(unsigned long vfn, mfn_t mfn, unsigned long nr, return level; } +#define XEN_PT_4K_NR_CONTIG 16 + +/* + * Check whether the contiguous bit can be set. Return the number of + * contiguous entry allowed. If not allowed, return 1. + */ +static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn, +unsigned int level, unsigned long left, +unsigned int flags) +{ +unsigned long nr_contig; + +/* + * Allow the contiguous bit to set when the caller requests block + * mapping. + */ +if ( !(flags & _PAGE_BLOCK) ) +return 1; + +/* + * We don't allow to remove mapping with the contiguous bit set. + * So shortcut the logic and directly return 1. + */ +if ( mfn_eq(mfn, INVALID_MFN) ) +return 1; + +/* + * The number of contiguous entries varies depending on the page + * granularity used. The logic below assumes 4KB. + */ +BUILD_BUG_ON(PAGE_SIZE != SZ_4K); + +/* + * In order to enable the contiguous bit, we should have enough entries + * to map left and both the virtual and physical address should be + * aligned to the size of 16 translation tables entries. + */ +nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG; + +if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) ) +return 1; + +return XEN_PT_4K_NR_CONTIG; +} + static DEFINE_SPINLOCK(xen_pt_lock); static int xen_pt_update(unsigned long virt, @@ -1338,6 +1385,12 @@ static int xen_pt_update(unsigned long virt, return -EINVAL; } +if ( flags & _PAGE_CONTIG ) +{ +mm_printk("_PAGE_CONTIG is an internal only flag.\n"); +return -EINVAL; +} + if ( !IS_ALIGNED(virt, PAGE_SIZE) ) { mm_printk("The virtual address is not aligned to the page-size.\n"); @@ -1348,22 +1401,36 @@ static int xen_pt_update(unsigned long virt, while ( left ) { -unsigned int order, level; +unsigned int order, level, nr_contig, new_flags; level = xen_pt_mapping_level(vfn, mfn, left, flags); order = XEN_PT_LEVEL_ORDER(level); ASSERT(left >= BIT(order, UL)); -rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags); -if ( rc ) -break; +/* + * Check if we can set the contiguous mapping and update the + * flags accordingly. + */ +nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags); +
[PATCH 03/16] xen/arm: mm: Avoid flushing the TLBs when mapping are inserted
From: Julien Grall Currently, the function xen_pt_update() will flush the TLBs even when the mappings are inserted. This is a bit wasteful because we don't allow mapping replacement. Even if we were, the flush would need to happen earlier because mapping replacement should use Break-Before-Make when updating the entry. A single call to xen_pt_update() can perform a single action. IOW, it is not possible to, for instance, mix inserting and removing mappings. Therefore, we can use `flags` to determine what action is performed. This change will be particularly help to limit the impact of switching boot time mapping to use xen_pt_update(). Signed-off-by: Julien Grall --- Changes in v4: - Switch the check to a different expression that will still result to the same truth table. Changes in v2: - New patch --- xen/arch/arm/mm.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index c4487dd7fc46..747083d820dd 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1119,7 +1119,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level, /* We should be here with a valid MFN. */ ASSERT(!mfn_eq(mfn, INVALID_MFN)); -/* We don't allow replacing any valid entry. */ +/* + * We don't allow replacing any valid entry. + * + * Note that the function xen_pt_update() relies on this + * assumption and will skip the TLB flush. The function will need + * to be updated if the check is relaxed. + */ if ( lpae_is_valid(entry) ) { if ( lpae_is_mapping(entry, level) ) @@ -1434,11 +1440,16 @@ static int xen_pt_update(unsigned long virt, } /* - * Flush the TLBs even in case of failure because we may have + * The TLBs flush can be safely skipped when a mapping is inserted + * as we don't allow mapping replacement (see xen_pt_check_entry()). + * + * For all the other cases, the TLBs will be flushed unconditionally + * even if the mapping has failed. This is because we may have * partially modified the PT. This will prevent any unexpected * behavior afterwards. */ -flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns); +if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) ) +flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns); spin_unlock(&xen_pt_lock); -- 2.32.0
[PATCH 04/16] xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings()
From: Julien Grall Now that xen_pt_update_entry() is able to deal with different mapping size, we can replace the open-coding of the page-tables update by a call to modify_xen_mappings(). As the function is not meant to fail, a BUG_ON() is added to check the return. Note that we don't use destroy_xen_mappings() because the helper doesn't allow us to pass a flags. In theory we could add an extra parameter to the function, however there are no other expected users. Hence why modify_xen_mappings() is used. Signed-off-by: Julien Grall Signed-off-by: Julien Grall Reviewed-by: Hongda Deng --- Changes in v4: - Add Hongda's reviewed-by - Add a comment to explain what modify_xen_mappings() does. - Clarify in the commit message hwy modify_xen_mappings() is used rather than destroy_xen_mappings(). Changes in v2: - Stay consistent with how function name are used in the commit message - Add my AWS signed-off-by --- xen/arch/arm/mm.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 747083d820dd..64a79d45b38c 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -614,11 +614,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr) void __init remove_early_mappings(void) { -lpae_t pte = {0}; -write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START), pte); -write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START + SZ_2M), - pte); -flush_xen_tlb_range_va(BOOT_FDT_VIRT_START, BOOT_FDT_SLOT_SIZE); +int rc; + +/* destroy the _PAGE_BLOCK mapping */ +rc = modify_xen_mappings(BOOT_FDT_VIRT_START, BOOT_FDT_VIRT_END, + _PAGE_BLOCK); +BUG_ON(rc); } /* -- 2.32.0
[PATCH 05/16] xen/arm: mm: Re-implement early_fdt_map() using map_pages_to_xen()
From: Julien Grall Now that map_pages_to_xen() has been extended to support 2MB mappings, we can replace the create_mappings() calls by map_pages_to_xen() calls. The mapping can also be marked read-only as Xen should not modify the host Device Tree during boot. Signed-off-by: Julien Grall Signed-off-by: Julien Grall Reviewed-by: Hongda Deng --- Changes in v4: - Fix typo in the commit message - Add Hongda's reviewed-by Changes in v2: - Add my AWS signed-off-by - Fix typo in the commit message --- xen/arch/arm/mm.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 64a79d45b38c..03f970e4d10b 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -574,6 +574,7 @@ void * __init early_fdt_map(paddr_t fdt_paddr) paddr_t offset; void *fdt_virt; uint32_t size; +int rc; /* * Check whether the physical FDT address is set and meets the minimum @@ -589,8 +590,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr) /* The FDT is mapped using 2MB superpage */ BUILD_BUG_ON(BOOT_FDT_VIRT_START % SZ_2M); -create_mappings(xen_second, BOOT_FDT_VIRT_START, paddr_to_pfn(base_paddr), -SZ_2M >> PAGE_SHIFT, SZ_2M); +rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr), + SZ_2M >> PAGE_SHIFT, + PAGE_HYPERVISOR_RO | _PAGE_BLOCK); +if ( rc ) +panic("Unable to map the device-tree.\n"); + offset = fdt_paddr % SECOND_SIZE; fdt_virt = (void *)BOOT_FDT_VIRT_START + offset; @@ -604,9 +609,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr) if ( (offset + size) > SZ_2M ) { -create_mappings(xen_second, BOOT_FDT_VIRT_START + SZ_2M, -paddr_to_pfn(base_paddr + SZ_2M), -SZ_2M >> PAGE_SHIFT, SZ_2M); +rc = map_pages_to_xen(BOOT_FDT_VIRT_START + SZ_2M, + maddr_to_mfn(base_paddr + SZ_2M), + SZ_2M >> PAGE_SHIFT, + PAGE_HYPERVISOR_RO | _PAGE_BLOCK); +if ( rc ) +panic("Unable to map the device-tree\n"); } return fdt_virt; -- 2.32.0
[PATCH 09/16] xen/arm: Move fixmap definitions in a separate header
From: Julien Grall To use properly the fixmap definitions, their user would need also new to include . This is not very great when the user itself is not meant to directly use ACPI definitions. Including in is not option because the latter header is included by everyone. So move out the fixmap entries definition in a new header. Take the opportunity to also move {set, clear}_fixmap() prototypes in the new header. Note that most of the definitions in now need to be surrounded with #ifndef __ASSEMBLY__ because will be used in assembly (see EARLY_UART_VIRTUAL_ADDRESS). The split will become more helpful in a follow-up patch where new fixmap entries will be defined. Signed-off-by: Julien Grall Acked-by: Jan Beulich --- There was some disagreement with Stefano on whether fixmap.h should include acpi.h or this should be the other way around. I chose the former because each components should decide how much entries in the fixmap they need and also because this is the current behavior on x86. We should stay consitent between arch to avoid any headers mess. Jan acked this patch, so I am assuming he is happy with this approach. I would be OK to rework it if others agree with Stefano's view. Changes in v4: - Add Jan's acked-by - Record Stefano's disagreement on the approach Changes in v3: - Patch added --- xen/arch/arm/acpi/lib.c | 2 ++ xen/arch/arm/include/asm/config.h | 6 -- xen/arch/arm/include/asm/early_printk.h | 1 + xen/arch/arm/include/asm/fixmap.h | 24 xen/arch/arm/include/asm/mm.h | 4 xen/arch/arm/kernel.c | 1 + xen/arch/arm/mm.c | 1 + xen/include/xen/acpi.h | 18 +++--- 8 files changed, 40 insertions(+), 17 deletions(-) create mode 100644 xen/arch/arm/include/asm/fixmap.h diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c index a59cc4074cfb..41d521f720ac 100644 --- a/xen/arch/arm/acpi/lib.c +++ b/xen/arch/arm/acpi/lib.c @@ -25,6 +25,8 @@ #include #include +#include + static bool fixmap_inuse; char *__acpi_map_table(paddr_t phys, unsigned long size) diff --git a/xen/arch/arm/include/asm/config.h b/xen/arch/arm/include/asm/config.h index b25c9d39bb32..3e2a55a91058 100644 --- a/xen/arch/arm/include/asm/config.h +++ b/xen/arch/arm/include/asm/config.h @@ -169,12 +169,6 @@ #endif -/* Fixmap slots */ -#define FIXMAP_CONSOLE 0 /* The primary UART */ -#define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ -#define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI tables */ -#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) /* End mappings of ACPI tables */ - #define NR_hypercalls 64 #define STACK_ORDER 3 diff --git a/xen/arch/arm/include/asm/early_printk.h b/xen/arch/arm/include/asm/early_printk.h index 8dc911cf48a3..c5149b2976da 100644 --- a/xen/arch/arm/include/asm/early_printk.h +++ b/xen/arch/arm/include/asm/early_printk.h @@ -11,6 +11,7 @@ #define __ARM_EARLY_PRINTK_H__ #include +#include #ifdef CONFIG_EARLY_PRINTK diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h new file mode 100644 index ..1cee51e52ab9 --- /dev/null +++ b/xen/arch/arm/include/asm/fixmap.h @@ -0,0 +1,24 @@ +/* + * fixmap.h: compile-time virtual memory allocation + */ +#ifndef __ASM_FIXMAP_H +#define __ASM_FIXMAP_H + +#include + +/* Fixmap slots */ +#define FIXMAP_CONSOLE 0 /* The primary UART */ +#define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ +#define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI tables */ +#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) /* End mappings of ACPI tables */ + +#ifndef __ASSEMBLY__ + +/* Map a page in a fixmap entry */ +extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); +/* Remove a mapping from a fixmap entry */ +extern void clear_fixmap(unsigned map); + +#endif /* __ASSEMBLY__ */ + +#endif /* __ASM_FIXMAP_H */ diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h index 424aaf28230b..045a8ba4bb63 100644 --- a/xen/arch/arm/include/asm/mm.h +++ b/xen/arch/arm/include/asm/mm.h @@ -191,10 +191,6 @@ extern void mmu_init_secondary_cpu(void); extern void setup_xenheap_mappings(unsigned long base_mfn, unsigned long nr_mfns); /* Map a frame table to cover physical addresses ps through pe */ extern void setup_frametable_mappings(paddr_t ps, paddr_t pe); -/* Map a 4k page in a fixmap entry */ -extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); -/* Remove a mapping from a fixmap entry */ -extern void clear_fixmap(unsigned map); /* map a physical range in virtual memory */ void __iomem *ioremap_attr(paddr_t start, size_t len, unsigned attributes); diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c index 8f43caa1866d..25ded1c056d9 100644
[PATCH 08/16] xen/arm: mm: Allow page-table allocation from the boot allocator
From: Julien Grall At the moment, page-table can only be allocated from domheap. This means it is not possible to create mapping in the page-tables via map_pages_to_xen() if page-table needs to be allocated. In order to avoid open-coding page-tables update in early boot, we need to be able to allocate page-tables much earlier. Thankfully, we have the boot allocator for those cases. create_xen_table() is updated to cater early boot allocation by using alloc_boot_pages(). Note, this is not sufficient to bootstrap the page-tables (i.e mapping before any memory is actually mapped). This will be addressed separately. Signed-off-by: Julien Grall Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini --- Changes in v4: - Add Stefano's reviewed-by Changes in v2: - New patch --- xen/arch/arm/mm.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 252114d67df5..6b7b72de27fe 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1030,19 +1030,27 @@ static void xen_unmap_table(const lpae_t *table) static int create_xen_table(lpae_t *entry) { -struct page_info *pg; +mfn_t mfn; void *p; lpae_t pte; -pg = alloc_domheap_page(NULL, 0); -if ( pg == NULL ) -return -ENOMEM; +if ( system_state != SYS_STATE_early_boot ) +{ +struct page_info *pg = alloc_domheap_page(NULL, 0); + +if ( pg == NULL ) +return -ENOMEM; + +mfn = page_to_mfn(pg); +} +else +mfn = alloc_boot_pages(1, 1); -p = xen_map_table(page_to_mfn(pg)); +p = xen_map_table(mfn); clear_page(p); xen_unmap_table(p); -pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL); +pte = mfn_to_xen_entry(mfn, MT_NORMAL); pte.pt.table = 1; write_pte(entry, pte); -- 2.32.0
[PATCH 07/16] xen/arm: mm: Allocate xen page tables in domheap rather than xenheap
From: Julien Grall xen_{un,}map_table() already uses the helper to map/unmap pages on-demand (note this is currently a NOP on arm64). So switching to domheap don't have any disavantage. But this as the benefit: - to keep the page tables unmapped if an arch decided to do so - reduce xenheap use on arm32 which can be pretty small Signed-off-by: Julien Grall Acked-by: Stefano Stabellini --- Changes in v3: - Add Stefano's acked-by Changes in v2: - New patch --- xen/arch/arm/mm.c | 36 +--- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 47c2111c36a4..252114d67df5 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -988,21 +988,6 @@ void *ioremap(paddr_t pa, size_t len) return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE); } -static int create_xen_table(lpae_t *entry) -{ -void *p; -lpae_t pte; - -p = alloc_xenheap_page(); -if ( p == NULL ) -return -ENOMEM; -clear_page(p); -pte = mfn_to_xen_entry(virt_to_mfn(p), MT_NORMAL); -pte.pt.table = 1; -write_pte(entry, pte); -return 0; -} - static lpae_t *xen_map_table(mfn_t mfn) { /* @@ -1043,6 +1028,27 @@ static void xen_unmap_table(const lpae_t *table) unmap_domain_page(table); } +static int create_xen_table(lpae_t *entry) +{ +struct page_info *pg; +void *p; +lpae_t pte; + +pg = alloc_domheap_page(NULL, 0); +if ( pg == NULL ) +return -ENOMEM; + +p = xen_map_table(page_to_mfn(pg)); +clear_page(p); +xen_unmap_table(p); + +pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL); +pte.pt.table = 1; +write_pte(entry, pte); + +return 0; +} + #define XEN_TABLE_MAP_FAILED 0 #define XEN_TABLE_SUPER_PAGE 1 #define XEN_TABLE_NORMAL_PAGE 2 -- 2.32.0
Re: [PATCH] build: Fix make warning if there is no cppcheck
Hi Jan, > On 20 May 2022, at 12:06, Jan Beulich wrote: > > On 20.05.2022 12:49, Bertrand Marquis wrote: >> --- a/xen/Makefile >> +++ b/xen/Makefile >> @@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c >> $(objtree)/include/generated/autoconf.h >> $(call if_changed,cppcheck_xml) >> >> cppcheck-version: >> -ifeq ($(shell which $(CPPCHECK)),) >> +ifeq ($(shell which $(CPPCHECK) 2> /dev/null),) >> $(error Cannot find cppcheck executable: $(CPPCHECK)) >> -endif >> +else >> ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) >> $(error Please upgrade your cppcheck to version 2.7 or greater) >> endif >> +endif > > While I agree this will silence things, I still would prefer if you > switched to $(if ...) inside the rule - there's no need to invoke the > shell while parsing the makefile. Anything like this only needlessly > slows down the build. Not by much, but it sums up. I will submit a v2 to solve this properly. Cheers Bertrand > > Jan >
[PATCH 14/16] xen/arm64: mm: Add memory to the boot allocator first
From: Julien Grall Currently, memory is added to the boot allocator after the xenheap mappings are done. This will break if the first mapping is more than 512GB of RAM. In addition to that, a follow-up patch will rework setup_xenheap_mappings() to use smaller mappings (e.g. 2MB, 4KB). So it will be necessary to have memory in the boot allocator earlier. Only free memory (e.g. not reserved or modules) can be added to the boot allocator. It might be possible that some regions (including the first one) will have no free memory. So we need to add all the free memory to the boot allocator first and then add do the mappings. Populating the boot allocator is nearly the same between arm32 and arm64. The only difference is on the former we need to exclude the xenheap for the boot allocator. Gate the difference with CONFIG_ARM_32 so the code be re-used on arm64. Signed-off-by: Julien Grall --- Changes in v4: - The implementation of populate_boot_allocator() has been moved in a separate patch. - Fix typo Changes in v3: - Patch added --- xen/arch/arm/setup.c | 55 +++- 1 file changed, 24 insertions(+), 31 deletions(-) diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index 3d5a2283d4ef..db1768c03f03 100644 --- a/xen/arch/arm/setup.c +++ b/xen/arch/arm/setup.c @@ -636,13 +636,12 @@ static void __init init_staticmem_pages(void) #endif } -#ifdef CONFIG_ARM_32 /* * Populate the boot allocator. All the RAM but the following regions * will be added: * - Modules (e.g., Xen, Kernel) * - Reserved regions - * - Xenheap + * - Xenheap (arm32 only) */ static void __init populate_boot_allocator(void) { @@ -672,6 +671,7 @@ static void __init populate_boot_allocator(void) if ( e > bank_end ) e = bank_end; +#ifdef CONFIG_ARM_32 /* Avoid the xenheap */ if ( s < mfn_to_maddr(xenheap_mfn_end) && mfn_to_maddr(xenheap_mfn_start) < e ) @@ -679,6 +679,7 @@ static void __init populate_boot_allocator(void) e = mfn_to_maddr(xenheap_mfn_start); n = mfn_to_maddr(xenheap_mfn_end); } +#endif fw_unreserved_regions(s, e, init_boot_pages, 0); s = n; @@ -686,6 +687,7 @@ static void __init populate_boot_allocator(void) } } +#ifdef CONFIG_ARM_32 static void __init setup_mm(void) { paddr_t ram_start, ram_end, ram_size, e; @@ -781,45 +783,36 @@ static void __init setup_mm(void) #else /* CONFIG_ARM_64 */ static void __init setup_mm(void) { +const struct meminfo *banks = &bootinfo.mem; paddr_t ram_start = ~0; paddr_t ram_end = 0; paddr_t ram_size = 0; -int bank; +unsigned int i; init_pdx(); -total_pages = 0; -for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ ) -{ -paddr_t bank_start = bootinfo.mem.bank[bank].start; -paddr_t bank_size = bootinfo.mem.bank[bank].size; -paddr_t bank_end = bank_start + bank_size; -paddr_t s, e; - -ram_size = ram_size + bank_size; -ram_start = min(ram_start,bank_start); -ram_end = max(ram_end,bank_end); - -setup_xenheap_mappings(bank_start>>PAGE_SHIFT, bank_size>>PAGE_SHIFT); - -s = bank_start; -while ( s < bank_end ) -{ -paddr_t n = bank_end; +/* + * We need some memory to allocate the page-tables used for the xenheap + * mappings. But some regions may contain memory already allocated + * for other uses (e.g. modules, reserved-memory...). + * + * For simplicity, add all the free regions in the boot allocator. + */ +populate_boot_allocator(); -e = next_module(s, &n); +total_pages = 0; -if ( e == ~(paddr_t)0 ) -{ -e = n = bank_end; -} +for ( i = 0; i < banks->nr_banks; i++ ) +{ +const struct membank *bank = &banks->bank[i]; +paddr_t bank_end = bank->start + bank->size; -if ( e > bank_end ) -e = bank_end; +ram_size = ram_size + bank->size; +ram_start = min(ram_start, bank->start); +ram_end = max(ram_end, bank_end); -fw_unreserved_regions(s, e, init_boot_pages, 0); -s = n; -} +setup_xenheap_mappings(PFN_DOWN(bank->start), + PFN_DOWN(bank->size)); } total_pages += ram_size >> PAGE_SHIFT; -- 2.32.0
[PATCH 11/16] xen/arm: mm: Clean-up the includes and order them
From: Julien Grall The numbers of includes in mm.c has been growing quite a lot. However some of them (e.g. xen/device_tree.h, xen/softirq.h) doesn't look to be directly used by the file or other will be included by larger headers (e.g asm/flushtlb.h will be included by xen/mm.h). So trim down the number of includes. Take the opportunity to order them with the xen headers first, then asm headers and last public headers. Signed-off-by: Julien Grall Acked-by: Stefano Stabellini --- Changes in v4: - Add Stefano's acked-by Changes in v3: - Patch added --- xen/arch/arm/mm.c | 27 ++- 1 file changed, 10 insertions(+), 17 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index bd1348a99716..d40dd4e6c9e6 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -17,33 +17,26 @@ * GNU General Public License for more details. */ -#include -#include -#include -#include -#include -#include +#include #include #include -#include -#include #include -#include -#include -#include -#include -#include -#include +#include +#include +#include +#include #include +#include +#include #include + #include -#include -#include -#include #include #include +#include + /* Override macros from asm/page.h to make them work with mfn_t */ #undef virt_to_mfn #define virt_to_mfn(va) _mfn(__virt_to_mfn(va)) -- 2.32.0
[PATCH 10/16] xen/arm: add Persistent Map (PMAP) infrastructure
From: Wei Liu The basic idea is like Persistent Kernel Map (PKMAP) in Linux. We pre-populate all the relevant page tables before the system is fully set up. We will need it on Arm in order to rework the arm64 version of xenheap_setup_mappings() as we may need to use pages allocated from the boot allocator before they are effectively mapped. This infrastructure is not lock-protected therefore can only be used before smpboot. After smpboot, map_domain_page() has to be used. This is based on the x86 version [1] that was originally implemented by Wei Liu. The PMAP infrastructure is implemented in common code with some arch helpers to set/clear the page-table entries and convertion between a fixmap slot to a virtual address... As mfn_to_xen_entry() now needs to be exported, take the opportunity to swich the parameter attr from unsigned to unsigned int. [1] Signed-off-by: Wei Liu Signed-off-by: Hongyan Xia [julien: Adapted for Arm] Signed-off-by: Julien Grall --- Changes in v4: - Move xen_fixmap in fixmap.h and add a comment about its usage. - Update comments - Use DECLARE_BITMAP() - Replace local_irq_{enable, disable} with an ASSERT() as there should be no user of pmap() in interrupt context. Changes in v3: - s/BITS_PER_LONG/BITS_PER_BYTE/ - Move pmap to common code Changes in v2: - New patch Cc: Jan Beulich Cc: Wei Liu Cc: Andrew Cooper Cc: Roger Pau Monné --- xen/arch/arm/Kconfig | 1 + xen/arch/arm/include/asm/fixmap.h | 24 +++ xen/arch/arm/include/asm/lpae.h | 8 xen/arch/arm/include/asm/pmap.h | 32 ++ xen/arch/arm/mm.c | 7 +-- xen/common/Kconfig| 3 ++ xen/common/Makefile | 1 + xen/common/pmap.c | 72 +++ xen/include/xen/pmap.h| 16 +++ 9 files changed, 158 insertions(+), 6 deletions(-) create mode 100644 xen/arch/arm/include/asm/pmap.h create mode 100644 xen/common/pmap.c create mode 100644 xen/include/xen/pmap.h diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index ecfa6822e4d3..a89a67802aa9 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -14,6 +14,7 @@ config ARM select HAS_DEVICE_TREE select HAS_PASSTHROUGH select HAS_PDX + select HAS_PMAP select IOMMU_FORCE_PT_SHARE config ARCH_DEFCONFIG diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h index 1cee51e52ab9..365a2385a087 100644 --- a/xen/arch/arm/include/asm/fixmap.h +++ b/xen/arch/arm/include/asm/fixmap.h @@ -5,20 +5,44 @@ #define __ASM_FIXMAP_H #include +#include /* Fixmap slots */ #define FIXMAP_CONSOLE 0 /* The primary UART */ #define FIXMAP_MISC 1 /* Ephemeral mappings of hardware */ #define FIXMAP_ACPI_BEGIN 2 /* Start mappings of ACPI tables */ #define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1) /* End mappings of ACPI tables */ +#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */ +#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */ + +#define FIXMAP_LAST FIXMAP_PMAP_END + +#define FIXADDR_START FIXMAP_ADDR(0) +#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST) #ifndef __ASSEMBLY__ +/* + * Direct access to xen_fixmap[] should only happen when {set, + * clear}_fixmap() is unusable (e.g. where we would end up to + * recursively call the helpers). + */ +extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES]; + /* Map a page in a fixmap entry */ extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes); /* Remove a mapping from a fixmap entry */ extern void clear_fixmap(unsigned map); +#define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot)) + +static inline unsigned int virt_to_fix(vaddr_t vaddr) +{ +BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START); + +return ((vaddr - FIXADDR_START) >> PAGE_SHIFT); +} + #endif /* __ASSEMBLY__ */ #endif /* __ASM_FIXMAP_H */ diff --git a/xen/arch/arm/include/asm/lpae.h b/xen/arch/arm/include/asm/lpae.h index aecb320dec45..fc19cbd84772 100644 --- a/xen/arch/arm/include/asm/lpae.h +++ b/xen/arch/arm/include/asm/lpae.h @@ -4,6 +4,7 @@ #ifndef __ASSEMBLY__ #include +#include /* * WARNING! Unlike the x86 pagetable code, where l1 is the lowest level and @@ -168,6 +169,13 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned int level) third_table_offset(addr)\ } +/* + * Standard entry type that we'll use to build Xen's own pagetables. + * We put the same permissions at every level, because they're ignored + * by the walker in non-leaf entries. + */ +lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr); + #endif /* __ASSEMBLY__ */ /* diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h new file mode 100644 index ..74398b4c4fe6 --- /dev/null +++ b/xen/arch/arm/include/
[PATCH 16/16] xen/arm: mm: Re-implement setup_frame_table_mappings() with map_pages_to_xen()
From: Julien Grall Now that map_pages_to_xen() has been extended to support 2MB mappings, we can replace the create_mappings() call by map_pages_to_xen() call. This has the advantage to remove the differences between 32-bit and 64-bit code. Lastly remove create_mappings() as there is no more callers. Signed-off-by: Julien Grall Signed-off-by: Julien Grall --- Changes in v4: - Add missing _PAGE_BLOCK Changes in v3: - Fix typo in the commit message - Remove the TODO regarding contiguous bit Changes in v2: - New patch --- xen/arch/arm/mm.c | 64 +-- 1 file changed, 6 insertions(+), 58 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 65af44f42232..be37176a4725 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -369,40 +369,6 @@ void clear_fixmap(unsigned map) BUG_ON(res != 0); } -/* Create Xen's mappings of memory. - * Mapping_size must be either 2MB or 32MB. - * Base and virt must be mapping_size aligned. - * Size must be a multiple of mapping_size. - * second must be a contiguous set of second level page tables - * covering the region starting at virt_offset. */ -static void __init create_mappings(lpae_t *second, - unsigned long virt_offset, - unsigned long base_mfn, - unsigned long nr_mfns, - unsigned int mapping_size) -{ -unsigned long i, count; -const unsigned long granularity = mapping_size >> PAGE_SHIFT; -lpae_t pte, *p; - -ASSERT((mapping_size == MB(2)) || (mapping_size == MB(32))); -ASSERT(!((virt_offset >> PAGE_SHIFT) % granularity)); -ASSERT(!(base_mfn % granularity)); -ASSERT(!(nr_mfns % granularity)); - -count = nr_mfns / XEN_PT_LPAE_ENTRIES; -p = second + second_linear_offset(virt_offset); -pte = mfn_to_xen_entry(_mfn(base_mfn), MT_NORMAL); -if ( granularity == 16 * XEN_PT_LPAE_ENTRIES ) -pte.pt.contig = 1; /* These maps are in 16-entry contiguous chunks. */ -for ( i = 0; i < count; i++ ) -{ -write_pte(p + i, pte); -pte.pt.base += 1 << XEN_PT_LPAE_SHIFT; -} -flush_xen_tlb_local(); -} - #ifdef CONFIG_DOMAIN_PAGE void *map_domain_page_global(mfn_t mfn) { @@ -862,36 +828,18 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t pe) unsigned long frametable_size = nr_pdxs * sizeof(struct page_info); mfn_t base_mfn; const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : MB(32); -#ifdef CONFIG_ARM_64 -lpae_t *second, pte; -unsigned long nr_second; -mfn_t second_base; -int i; -#endif +int rc; frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps)); /* Round up to 2M or 32M boundary, as appropriate. */ frametable_size = ROUNDUP(frametable_size, mapping_size); base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12)); -#ifdef CONFIG_ARM_64 -/* Compute the number of second level pages. */ -nr_second = ROUNDUP(frametable_size, FIRST_SIZE) >> FIRST_SHIFT; -second_base = alloc_boot_pages(nr_second, 1); -second = mfn_to_virt(second_base); -for ( i = 0; i < nr_second; i++ ) -{ -clear_page(mfn_to_virt(mfn_add(second_base, i))); -pte = mfn_to_xen_entry(mfn_add(second_base, i), MT_NORMAL); -pte.pt.table = 1; -write_pte(&xen_first[first_table_offset(FRAMETABLE_VIRT_START)+i], pte); -} -create_mappings(second, 0, mfn_x(base_mfn), frametable_size >> PAGE_SHIFT, -mapping_size); -#else -create_mappings(xen_second, FRAMETABLE_VIRT_START, mfn_x(base_mfn), -frametable_size >> PAGE_SHIFT, mapping_size); -#endif +rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn, + frametable_size >> PAGE_SHIFT, + PAGE_HYPERVISOR_RW | _PAGE_BLOCK); +if ( rc ) +panic("Unable to setup the frametable mappings.\n"); memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info)); memset(&frame_table[nr_pdxs], -1, -- 2.32.0
[PATCH 15/16] xen/arm: mm: Rework setup_xenheap_mappings()
From: Julien Grall The current implementation of setup_xenheap_mappings() is using 1GB mappings. This can lead to unexpected result because the mapping may alias a non-cachable region (such as device or reserved regions). For more details see B2.8 in ARM DDI 0487H.a. map_pages_to_xen() was recently reworked to allow superpage mappings, support contiguous mapping and deal with the use of page-tables before they are mapped. Most of the code in setup_xenheap_mappings() is now replaced with a single call to map_pages_to_xen(). Signed-off-by: Julien Grall Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini --- Changes in v4: - Fix typo - Add Stefano's reviewed-by Changes in v3: - Don't use 1GB mapping - Re-order code in setup_mm() in a separate patch Changes in v2: - New patch --- xen/arch/arm/mm.c | 87 ++- 1 file changed, 18 insertions(+), 69 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index b019e4b35b55..65af44f42232 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -138,17 +138,6 @@ static DEFINE_PAGE_TABLE(cpu0_pgtable); static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES); #endif -#ifdef CONFIG_ARM_64 -/* The first page of the first level mapping of the xenheap. The - * subsequent xenheap first level pages are dynamically allocated, but - * we need this one to bootstrap ourselves. */ -static DEFINE_PAGE_TABLE(xenheap_first_first); -/* The zeroeth level slot which uses xenheap_first_first. Used because - * setup_xenheap_mappings otherwise relies on mfn_to_virt which isn't - * valid for a non-xenheap mapping. */ -static __initdata int xenheap_first_first_slot = -1; -#endif - /* Common pagetable leaves */ /* Second level page tables. * @@ -831,77 +820,37 @@ void __init setup_xenheap_mappings(unsigned long base_mfn, void __init setup_xenheap_mappings(unsigned long base_mfn, unsigned long nr_mfns) { -lpae_t *first, pte; -unsigned long mfn, end_mfn; -vaddr_t vaddr; - -/* Align to previous 1GB boundary */ -mfn = base_mfn & ~((FIRST_SIZE>>PAGE_SHIFT)-1); +int rc; /* First call sets the xenheap physical and virtual offset. */ if ( mfn_eq(xenheap_mfn_start, INVALID_MFN) ) { +unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1); + xenheap_mfn_start = _mfn(base_mfn); xenheap_base_pdx = mfn_to_pdx(_mfn(base_mfn)); +/* + * The base address may not be aligned to the first level + * size (e.g. 1GB when using 4KB pages). This would prevent + * superpage mappings for all the regions because the virtual + * address and machine address should both be suitably aligned. + * + * Prevent that by offsetting the start of the xenheap virtual + * address. + */ xenheap_virt_start = DIRECTMAP_VIRT_START + -(base_mfn - mfn) * PAGE_SIZE; +(base_mfn - mfn_gb) * PAGE_SIZE; } if ( base_mfn < mfn_x(xenheap_mfn_start) ) panic("cannot add xenheap mapping at %lx below heap start %lx\n", base_mfn, mfn_x(xenheap_mfn_start)); -end_mfn = base_mfn + nr_mfns; - -/* - * Virtual address aligned to previous 1GB to match physical - * address alignment done above. - */ -vaddr = (vaddr_t)__mfn_to_virt(base_mfn) & FIRST_MASK; - -while ( mfn < end_mfn ) -{ -int slot = zeroeth_table_offset(vaddr); -lpae_t *p = &xen_pgtable[slot]; - -if ( p->pt.valid ) -{ -/* mfn_to_virt is not valid on the 1st 1st mfn, since it - * is not within the xenheap. */ -first = slot == xenheap_first_first_slot ? -xenheap_first_first : mfn_to_virt(lpae_get_mfn(*p)); -} -else if ( xenheap_first_first_slot == -1) -{ -/* Use xenheap_first_first to bootstrap the mappings */ -first = xenheap_first_first; - -pte = pte_of_xenaddr((vaddr_t)xenheap_first_first); -pte.pt.table = 1; -write_pte(p, pte); - -xenheap_first_first_slot = slot; -} -else -{ -mfn_t first_mfn = alloc_boot_pages(1, 1); - -clear_page(mfn_to_virt(first_mfn)); -pte = mfn_to_xen_entry(first_mfn, MT_NORMAL); -pte.pt.table = 1; -write_pte(p, pte); -first = mfn_to_virt(first_mfn); -} - -pte = mfn_to_xen_entry(_mfn(mfn), MT_NORMAL); -/* TODO: Set pte.pt.contig when appropriate. */ -write_pte(&first[first_table_offset(vaddr)], pte); - -mfn += FIRST_SIZE>>PAGE_SHIFT; -vaddr += FIRST_SIZE; -} - -flush_xen_tlb_local(); +rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn), + _mfn(base_mfn), nr_mfns, + PAGE
[PATCH 13/16] xen/arm32: setup: Move out the code to populate the boot allocator
From: Julien Grall In a follow-up patch, we will want to populate the boot allocator separately for arm64. The code will end up to be very similar to the one on arm32. So move out the code in a new helper populate_boot_allocator(). For now the code is still protected by CONFIG_ARM_32 to avoid any build failure on arm64. Take the opportunity to replace mfn_add(xen_mfn_start, xenheap_pages) with xenheap_mfn_end as they are equivalent. Signed-off-by: Julien Grall --- Changes in v4: - Patch added --- xen/arch/arm/setup.c | 90 +--- 1 file changed, 51 insertions(+), 39 deletions(-) diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index d5d0792ed48a..3d5a2283d4ef 100644 --- a/xen/arch/arm/setup.c +++ b/xen/arch/arm/setup.c @@ -637,10 +637,58 @@ static void __init init_staticmem_pages(void) } #ifdef CONFIG_ARM_32 +/* + * Populate the boot allocator. All the RAM but the following regions + * will be added: + * - Modules (e.g., Xen, Kernel) + * - Reserved regions + * - Xenheap + */ +static void __init populate_boot_allocator(void) +{ +unsigned int i; +const struct meminfo *banks = &bootinfo.mem; + +for ( i = 0; i < banks->nr_banks; i++ ) +{ +const struct membank *bank = &banks->bank[i]; +paddr_t bank_end = bank->start + bank->size; +paddr_t s, e; + +s = bank->start; +while ( s < bank_end ) +{ +paddr_t n = bank_end; + +e = next_module(s, &n); + +if ( e == ~(paddr_t)0 ) +e = n = bank_end; + +/* + * Module in a RAM bank other than the one which we are + * not dealing with here. + */ +if ( e > bank_end ) +e = bank_end; + +/* Avoid the xenheap */ +if ( s < mfn_to_maddr(xenheap_mfn_end) && + mfn_to_maddr(xenheap_mfn_start) < e ) +{ +e = mfn_to_maddr(xenheap_mfn_start); +n = mfn_to_maddr(xenheap_mfn_end); +} + +fw_unreserved_regions(s, e, init_boot_pages, 0); +s = n; +} +} +} + static void __init setup_mm(void) { -paddr_t ram_start, ram_end, ram_size; -paddr_t s, e; +paddr_t ram_start, ram_end, ram_size, e; unsigned long ram_pages; unsigned long heap_pages, xenheap_pages, domheap_pages; int i; @@ -718,43 +766,7 @@ static void __init setup_mm(void) setup_xenheap_mappings((e >> PAGE_SHIFT) - xenheap_pages, xenheap_pages); /* Add non-xenheap memory */ -for ( i = 0; i < bootinfo.mem.nr_banks; i++ ) -{ -paddr_t bank_start = bootinfo.mem.bank[i].start; -paddr_t bank_end = bank_start + bootinfo.mem.bank[i].size; - -s = bank_start; -while ( s < bank_end ) -{ -paddr_t n = bank_end; - -e = next_module(s, &n); - -if ( e == ~(paddr_t)0 ) -{ -e = n = ram_end; -} - -/* - * Module in a RAM bank other than the one which we are - * not dealing with here. - */ -if ( e > bank_end ) -e = bank_end; - -/* Avoid the xenheap */ -if ( s < mfn_to_maddr(mfn_add(xenheap_mfn_start, xenheap_pages)) - && mfn_to_maddr(xenheap_mfn_start) < e ) -{ -e = mfn_to_maddr(xenheap_mfn_start); -n = mfn_to_maddr(mfn_add(xenheap_mfn_start, xenheap_pages)); -} - -fw_unreserved_regions(s, e, init_boot_pages, 0); - -s = n; -} -} +populate_boot_allocator(); /* Frame table covers all of RAM region, including holes */ setup_frametable_mappings(ram_start, ram_end); -- 2.32.0
[PATCH v2] build: Fix make warning if there is no cppcheck
If cppcheck is not present, the following warning appears during build: which: no cppcheck in ([...]) /bin/sh: cppcheck: command not found Fix the problem by using shell code inside the cppcheck-version rule to also prevent unneeded call of which when something else than cppcheck is built. Reported-by: Julien Grall Signed-off-by: Bertrand Marquis --- xen/Makefile | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/xen/Makefile b/xen/Makefile index 15388703bc..e8d8ed71bc 100644 --- a/xen/Makefile +++ b/xen/Makefile @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c $(objtree)/include/generated/autoconf.h $(call if_changed,cppcheck_xml) cppcheck-version: -ifeq ($(shell which $(CPPCHECK)),) - $(error Cannot find cppcheck executable: $(CPPCHECK)) -endif -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) - $(error Please upgrade your cppcheck to version 2.7 or greater) -endif + @if ! which $(CPPCHECK) > /dev/null 2>&1; then \ + echo "Cannot find cppcheck executable: $(CPPCHECK)"; \ + exit 1; \ + fi + @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; then \ + echo "Please upgrade your cppcheck to version 2.7 or greater"; \ + exit 1; \ + fi # Put this in generated headers this way it is cleaned by include/Makefile $(objtree)/include/generated/compiler-def.h: -- 2.25.1
[PATCH 12/16] xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table()
From: Julien Grall During early boot, it is not possible to use xen_{,un}map_table() if the page tables are not residing the Xen binary. This is a blocker to switch some of the helpers to use xen_pt_update() as we may need to allocate extra page tables and access them before the domheap has been initialized (see setup_xenheap_mappings()). xen_{,un}map_table() are now updated to use the PMAP helpers for early boot map/unmap. Note that the special case for page-tables residing in Xen binary has been dropped because it is "complex" and was only added as a workaround in 8d4f1b8878e0 ("xen/arm: mm: Allow generic xen page-tables helpers to be called early"). Signed-off-by: Julien Grall Reviewed-by: Stefano Stabellini --- Changes in v4: - Add Stefano's reviewed-by Changes in v2: - New patch --- xen/arch/arm/mm.c | 33 + 1 file changed, 9 insertions(+), 24 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d40dd4e6c9e6..b019e4b35b55 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include #include @@ -980,27 +981,11 @@ void *ioremap(paddr_t pa, size_t len) static lpae_t *xen_map_table(mfn_t mfn) { /* - * We may require to map the page table before map_domain_page() is - * useable. The requirements here is it must be useable as soon as - * page-tables are allocated dynamically via alloc_boot_pages(). - * - * We need to do the check on physical address rather than virtual - * address to avoid truncation on Arm32. Therefore is_kernel() cannot - * be used. + * During early boot, map_domain_page() may be unusable. Use the + * PMAP to map temporarily a page-table. */ if ( system_state == SYS_STATE_early_boot ) -{ -if ( is_xen_fixed_mfn(mfn) ) -{ -/* - * It is fine to demote the type because the size of Xen - * will always fit in vaddr_t. - */ -vaddr_t offset = mfn_to_maddr(mfn) - virt_to_maddr(&_start); - -return (lpae_t *)(XEN_VIRT_START + offset); -} -} +return pmap_map(mfn); return map_domain_page(mfn); } @@ -1009,12 +994,12 @@ static void xen_unmap_table(const lpae_t *table) { /* * During early boot, xen_map_table() will not use map_domain_page() - * for page-tables residing in Xen binary. So skip the unmap part. + * but the PMAP. */ -if ( system_state == SYS_STATE_early_boot && is_kernel(table) ) -return; - -unmap_domain_page(table); +if ( system_state == SYS_STATE_early_boot ) +pmap_unmap(table); +else +unmap_domain_page(table); } static int create_xen_table(lpae_t *entry) -- 2.32.0
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote: > On 20.05.2022 13:11, Jan Beulich wrote: > > On 20.05.2022 12:47, Roger Pau Monné wrote: > >> On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: > >>> On 06.05.2022 13:16, Roger Pau Monné wrote: > On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: > > --- a/xen/drivers/passthrough/amd/iommu_map.c > > +++ b/xen/drivers/passthrough/amd/iommu_map.c > > @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig > > > > while ( nr_ptes-- ) > > { > > -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > > +ASSERT(!pde->next_level); > > +ASSERT(!pde->u); > > + > > +if ( pde > table ) > > +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); > > +else > > +ASSERT(pde->ign0 == PAGE_SHIFT - 3); > > I think PAGETABLE_ORDER would be clearer here. > >>> > >>> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere > >>> in IOMMU code afaics. > >> > >> Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not > >> sure what's the rule for declaring that PAGE_SHIFT is fine to use in > >> IOMMU code but not PAGETABLE_ORDER. > > > > Hmm, yes and no. But for consistency with other IOMMU code I may want > > to switch to PAGE_SHIFT_4K. > > Except that, with the plan to re-use pt_update_contig_markers() for CPU- > side re-coalescing, there I'd prefer to stick to PAGE_SHIFT. Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3? IMO it makes the code quite easier to understand. Thanks, Roger.
[ovmf test] 170601: regressions - FAIL
flight 170601 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170601/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1119 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days7 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On 20.05.2022 14:22, Roger Pau Monné wrote: > On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote: >> On 20.05.2022 13:11, Jan Beulich wrote: >>> On 20.05.2022 12:47, Roger Pau Monné wrote: On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: > On 06.05.2022 13:16, Roger Pau Monné wrote: >> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: >>> --- a/xen/drivers/passthrough/amd/iommu_map.c >>> +++ b/xen/drivers/passthrough/amd/iommu_map.c >>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig >>> >>> while ( nr_ptes-- ) >>> { >>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); >>> +ASSERT(!pde->next_level); >>> +ASSERT(!pde->u); >>> + >>> +if ( pde > table ) >>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); >>> +else >>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3); >> >> I think PAGETABLE_ORDER would be clearer here. > > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere > in IOMMU code afaics. Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not sure what's the rule for declaring that PAGE_SHIFT is fine to use in IOMMU code but not PAGETABLE_ORDER. >>> >>> Hmm, yes and no. But for consistency with other IOMMU code I may want >>> to switch to PAGE_SHIFT_4K. >> >> Except that, with the plan to re-use pt_update_contig_markers() for CPU- >> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT. > > Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3? pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could at the same time start using PAGETABLE_ORDER there. What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/ consistent, ... > IMO it makes the code quite easier to understand. ... or in fact helping readability. Jan
Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration
On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote: > Hello, > > Following series attempts to solve the issue with IO-APIC edge triggered > interrupts seeing an inconsistent RTE or IRTE when injected while being > migrated. > > It's currently RFC because some patches have post commit message notes, > and because I'm not sure if patch 1 is really needed. I originally had > the idea of suggesting to only backport patch 1 in order to fix the > issue in older releases, while leaving the more complex (and thus > error prone) IOMMU changes in unstable. Note however that patch 1 is > just a workaround to prevent interrupts seeing inconsistent entries > while being updated, masking the entry just makes the warning go away, > but the interrupt will be lost. Ping? Thanks, Roger.
Re: Grant operation batching
On Fri, May 20, 2022 at 08:24:48AM +0200, Jan Beulich wrote: > On 20.05.2022 01:22, Demi Marie Obenour wrote: > > It is well known that mapping and unmapping grants is expensive, which > > is why blkback has persistent grants. Could this cost be mitigated by > > batching, and if it was, would it affect the tradeoff of memcpy() vs > > grant table operations? > > Which backend driver are you thinking about? The in-kernel Linux > xen-blkback already batches grant operations, afaics. Such > batching is helpful, but the main cost is assumed (known?) to be > with the (installing and) tearing down of the actual mappings of > the guest pages (into/)from backend address space. My thought was that the expensive part of this is TLB flushes, which are only needed once per batch. Also, what do you think about the “unsafe” mode? It would only be unsafe if the backend is untrusted, but it is quite common for the backend to be trusted. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab signature.asc Description: PGP signature
Re: Grant operation batching
On 20.05.2022 14:37, Demi Marie Obenour wrote: > On Fri, May 20, 2022 at 08:24:48AM +0200, Jan Beulich wrote: >> On 20.05.2022 01:22, Demi Marie Obenour wrote: >>> It is well known that mapping and unmapping grants is expensive, which >>> is why blkback has persistent grants. Could this cost be mitigated by >>> batching, and if it was, would it affect the tradeoff of memcpy() vs >>> grant table operations? >> >> Which backend driver are you thinking about? The in-kernel Linux >> xen-blkback already batches grant operations, afaics. Such >> batching is helpful, but the main cost is assumed (known?) to be >> with the (installing and) tearing down of the actual mappings of >> the guest pages (into/)from backend address space. > > My thought was that the expensive part of this is TLB flushes, which are > only needed once per batch. Correct, but as said - such batching is already being made use of by the in-kernel Linux backend. Of course "once per batch" is not entirely precise - very large batches would be split internally into smaller chunks, each of which would be followed by a TLB flush. > Also, what do you think about the “unsafe” > mode? It would only be unsafe if the backend is untrusted, but it is > quite common for the backend to be trusted. Well, I didn't put much thought into that (and hence intentionally didn't comment on it), so for now it's only "Why not?" Jan
Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration
On 20.05.2022 14:36, Roger Pau Monné wrote: > On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote: >> Hello, >> >> Following series attempts to solve the issue with IO-APIC edge triggered >> interrupts seeing an inconsistent RTE or IRTE when injected while being >> migrated. >> >> It's currently RFC because some patches have post commit message notes, >> and because I'm not sure if patch 1 is really needed. I originally had >> the idea of suggesting to only backport patch 1 in order to fix the >> issue in older releases, while leaving the more complex (and thus >> error prone) IOMMU changes in unstable. Note however that patch 1 is >> just a workaround to prevent interrupts seeing inconsistent entries >> while being updated, masking the entry just makes the warning go away, >> but the interrupt will be lost. > > Ping? Sorry, the usual thing with RFCs: They take lower priority than other work items. This series is certainly the first of the several pending RFC series which I mean to get to, but it's hard to predict when this would be. Jan
Re: [PATCH v2] build: Fix make warning if there is no cppcheck
On 20.05.2022 14:14, Bertrand Marquis wrote: > --- a/xen/Makefile > +++ b/xen/Makefile > @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c > $(objtree)/include/generated/autoconf.h > $(call if_changed,cppcheck_xml) > > cppcheck-version: > -ifeq ($(shell which $(CPPCHECK)),) > - $(error Cannot find cppcheck executable: $(CPPCHECK)) > -endif > -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) > - $(error Please upgrade your cppcheck to version 2.7 or greater) > -endif > + @if ! which $(CPPCHECK) > /dev/null 2>&1; then \ > + echo "Cannot find cppcheck executable: $(CPPCHECK)"; \ > + exit 1; \ > + fi > + @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; > then \ > + echo "Please upgrade your cppcheck to version 2.7 or greater"; \ > + exit 1; \ > + fi > > # Put this in generated headers this way it is cleaned by include/Makefile > $(objtree)/include/generated/compiler-def.h: Fine with me, even if - as said on v1 - I would have preferred $(if ...). One question though: Wouldn't it better be $(Q) instead of the two plain @? Preferably with that adjustment (which I guess can be made while committing): Acked-by: Jan Beulich Jan
Re: [PATCH v2] build: Fix make warning if there is no cppcheck
Hi, > On 20 May 2022, at 13:51, Jan Beulich wrote: > > On 20.05.2022 14:14, Bertrand Marquis wrote: >> --- a/xen/Makefile >> +++ b/xen/Makefile >> @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c >> $(objtree)/include/generated/autoconf.h >> $(call if_changed,cppcheck_xml) >> >> cppcheck-version: >> -ifeq ($(shell which $(CPPCHECK)),) >> -$(error Cannot find cppcheck executable: $(CPPCHECK)) >> -endif >> -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) >> -$(error Please upgrade your cppcheck to version 2.7 or greater) >> -endif >> +@if ! which $(CPPCHECK) > /dev/null 2>&1; then \ >> +echo "Cannot find cppcheck executable: $(CPPCHECK)"; \ >> +exit 1; \ >> +fi >> +@if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; >> then \ >> +echo "Please upgrade your cppcheck to version 2.7 or greater"; \ >> +exit 1; \ >> +fi >> >> # Put this in generated headers this way it is cleaned by include/Makefile >> $(objtree)/include/generated/compiler-def.h: > > Fine with me, even if - as said on v1 - I would have preferred $(if ...). Could you explain why and what you mean exactly ? I thought the code would be more complex and less clear using if and I do not see how it would solve the issue with which being called. > One question though: Wouldn't it better be $(Q) instead of the two plain > @? Preferably with that adjustment (which I guess can be made while > committing): I thought of it but who would be interested in actually seeing those commands which are not “building” anything. > Acked-by: Jan Beulich > Thanks Bertrand
[ovmf test] 170602: regressions - FAIL
flight 170602 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170602/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1120 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days8 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 5:41 AM, Jan Beulich wrote: On 20.05.2022 10:30, Chuck Zmudzinski wrote: On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). I wouldn't call it a driver bug, but an incorrect configuration of the kernel by the user. I presume X86_FEATURE_PAT is required by the i915 driver and therefore the driver should refuse to disable it if the user requests to disable it and instead warn the user that the driver did not disable the feature, contrary to what the user requested with the nopat option. In any case, my test did not verify that when nopat is set in Linux 5.16, the thread takes the same code path as when nopat is not set, so I am not totally sure that the reason nopat does not break the i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT) returns true even when nopat is set. I could test it with a custom log message in 5.16 if that is necessary. Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT) to return true in 5.16 when the user requests nopat? I think that is just permitting a bad configuration to break the driver that a well-written operating system should not allow. The i915 driver was, in my opinion, correctly ignoring the nopat option in 5.16 because that option is not compatible with the hardware the i915 driver is trying to initialize and setup at boot time. At least that is my understanding now, but I will need to test it on 5.16 to be sure I understand it correctly. Also, AFAICT, your patch would break the driver when the nopat option is set and only fix the regression introduced by bdd8b6c98239 when nopat is not set on my box, so your patch would introduce a regression relative to Linux 5.16 and earlier for the case when nopat is set on my box. I think your point would be that it is not a regression if it is an incorrect user configuration. I respond by saying a well-written driver should refuse to honor the incorrect configuration requested by the user and instead warn the user that it did not honor the incorrect kernel option. I am only presuming what your patch would do on my box based on what I
[PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES
Allow exposing the PDCM bit in CPUID for HVM guests if present on the platform, which in turn allows exposing PERF_CAPABILITIES. Limit the information exposed in PERF_CAPABILITIES to the LBR format only. This is helpful as hardware without model-specific LBRs set format to 0x3f in order to notify the feature is not present. Signed-off-by: Roger Pau Monné --- Seeing as we have never exposed PDCM in CPUID I wonder whether there's something that I'm missing that makes exposing PERF_CAPABILITIES LBR format not as trivial as it looks. --- xen/arch/x86/msr.c | 9 + xen/include/public/arch-x86/cpufeatureset.h | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 01a15857b7..423a795d1d 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) *val = 0; break; +case MSR_IA32_PERF_CAPABILITIES: +if ( !cp->basic.pdcm ) +goto gp_fault; + +/* Only report LBR format. */ +rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val); +*val &= MSR_IA32_PERF_CAP_LBR_FORMAT; +break; + case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST: if ( !is_hvm_domain(d) || v != curr ) goto gp_fault; diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index cd6409f9f3..5fdaec43c5 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A Supplemental Streaming SIMD Extensio XEN_CPUFEATURE(FMA, 1*32+12) /*A Fused Multiply Add */ XEN_CPUFEATURE(CX16, 1*32+13) /*A CMPXCHG16B */ XEN_CPUFEATURE(XTPR, 1*32+14) /* Send Task Priority Messages */ -XEN_CPUFEATURE(PDCM, 1*32+15) /* Perf/Debug Capability MSR */ +XEN_CPUFEATURE(PDCM, 1*32+15) /*S Perf/Debug Capability MSR */ XEN_CPUFEATURE(PCID, 1*32+17) /*H Process Context ID */ XEN_CPUFEATURE(DCA, 1*32+18) /* Direct Cache Access */ XEN_CPUFEATURE(SSE4_1,1*32+19) /*A Streaming SIMD Extensions 4.1 */ -- 2.36.0
[PATCH 0/5] x86/lbr: handle lack of model-specific LBRs
Hello, Intel Sapphire Rapids CPUs doesn't have model-specific MSRs, and hence only architectural LBRs are available. Firstly implement some changes so Xen knows how to enable arch LBRs so that the ler option can also work in such scenario (first two patches). The lack of model-specific LBRs also affects guests, as setting DEBUGCTLMSR.LBR is now ignored (value hardwired to 0, writes ignored) by the hardware due to the lack of model-specific LBRs. The LBR format reported in PERF_CAPABILITIES also need to be exposed, as that's a way for guests to detect lack of model-specific LBRs presence (patches 3 and 4). Patch 5 is an indentation fix that can be merged into patch 4: done separately to help readability of patch 4. Thanks, Roger. Roger Pau Monne (5): x86/ler: use feature flag to check if option is enabled x86/lbr: enable hypervisor LER with arch LBR x86/perf: expose LBR format in PERF_CAPABILITIES x86/vmx: handle no model-specific LBR presence x86/vmx: fix indentation of LBR xen/arch/x86/hvm/vmx/vmx.c | 59 ++--- xen/arch/x86/include/asm/msr-index.h| 18 +++ xen/arch/x86/msr.c | 9 xen/arch/x86/traps.c| 29 -- xen/arch/x86/x86_64/traps.c | 2 +- xen/include/public/arch-x86/cpufeatureset.h | 3 +- 6 files changed, 97 insertions(+), 23 deletions(-) -- 2.36.0
[PATCH 5/5] x86/vmx: fix indentation of LBR
Properly indent the handling of LBR enable in MSR_IA32_DEBUGCTLMSR vmx_msr_write_intercept(). No functional change. Signed-off-by: Roger Pau Monné --- Feel free to squash onto the previous patch, did separately to aid the readability of the previous change. --- xen/arch/x86/hvm/vmx/vmx.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 3f45ac05c6..ff10b293a4 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3540,31 +3540,31 @@ static int cf_check vmx_msr_write_intercept( if ( lbr->count ) { -for ( ; lbr->count; lbr++ ) -{ -unsigned int i; - -for ( i = 0; i < lbr->count; i++ ) +for ( ; lbr->count; lbr++ ) { -int rc = vmx_add_guest_msr(v, lbr->base + i, 0); +unsigned int i; -if ( unlikely(rc) ) +for ( i = 0; i < lbr->count; i++ ) { -gprintk(XENLOG_ERR, -"Guest load/save list error %d\n", rc); -domain_crash(v->domain); -return X86EMUL_OKAY; -} +int rc = vmx_add_guest_msr(v, lbr->base + i, 0); -vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW); +if ( unlikely(rc) ) +{ +gprintk(XENLOG_ERR, +"Guest load/save list error %d\n", rc); +domain_crash(v->domain); +return X86EMUL_OKAY; +} + +vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW); +} } -} -v->arch.hvm.vmx.lbr_flags |= LBR_MSRS_INSERTED; -if ( lbr_tsx_fixup_needed ) -v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX; -if ( ler_to_fixup_needed ) -v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO; +v->arch.hvm.vmx.lbr_flags |= LBR_MSRS_INSERTED; +if ( lbr_tsx_fixup_needed ) +v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX; +if ( ler_to_fixup_needed ) +v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO; } else /* No model specific LBRs, ignore DEBUGCTLMSR.LBR. */ -- 2.36.0
[PATCH 1/5] x86/ler: use feature flag to check if option is enabled
It's more consistent with the rest of the usages of cpu_has_xen_lbr. No functional change intended. Signed-off-by: Roger Pau Monné --- xen/arch/x86/x86_64/traps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index 9d7f1f818b..24c5067ca2 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -156,7 +156,7 @@ void show_registers(const struct cpu_user_regs *regs) printk("CPU:%d\n", smp_processor_id()); _show_registers(&fault_regs, fault_crs, context, v); -if ( ler_msr && !guest_mode(regs) ) +if ( cpu_has_xen_lbr && !guest_mode(regs) ) { u64 from, to; -- 2.36.0
[PATCH 2/5] x86/lbr: enable hypervisor LER with arch LBR
CPUs having no model-specific LBRs don't implement DEBUGCTLMSR.LBR and LBRs can only be enabled if the processor supports architectural LBRs. Split the logic to enable LBRs into a separate function and expand the logic to also implement support for arch LBRs if model-specific LBRs are not supported. Signed-off-by: Roger Pau Monné --- xen/arch/x86/include/asm/msr-index.h| 18 + xen/arch/x86/traps.c| 29 ++--- xen/include/public/arch-x86/cpufeatureset.h | 1 + 3 files changed, 45 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h index 3e038db618..7b08e1804b 100644 --- a/xen/arch/x86/include/asm/msr-index.h +++ b/xen/arch/x86/include/asm/msr-index.h @@ -139,6 +139,24 @@ #define PASID_PASID_MASK 0x000f #define PASID_VALID(_AC(1, ULL) << 31) +#define MSR_ARCH_LBR_CTL0x14ce +#define ARCH_LBR_CTL_LBREN (_AC(1, ULL) << 0) +#define ARCH_LBR_CTL_OS(_AC(1, ULL) << 1) +#define ARCH_LBR_CTL_COND (_AC(1, ULL) << 16) +#define ARCH_LBR_CTL_NEAR_REL_JMP (_AC(1, ULL) << 17) +#define ARCH_LBR_CTL_NEAR_IND_JMP (_AC(1, ULL) << 18) +#define ARCH_LBR_CTL_NEAR_REL_CALL (_AC(1, ULL) << 19) +#define ARCH_LBR_CTL_NEAR_IND_CALL (_AC(1, ULL) << 20) +#define ARCH_LBR_CTL_NEAR_RET (_AC(1, ULL) << 21) +#define ARCH_LBR_CTL_OTHER_BRANCH (_AC(1, ULL) << 22) +#define ARCH_LBR_CTL_RECORD_ALL(ARCH_LBR_CTL_COND | \ + ARCH_LBR_CTL_NEAR_REL_JMP | \ + ARCH_LBR_CTL_NEAR_IND_JMP | \ + ARCH_LBR_CTL_NEAR_REL_CALL | \ + ARCH_LBR_CTL_NEAR_IND_CALL | \ + ARCH_LBR_CTL_NEAR_RET | \ + ARCH_LBR_CTL_OTHER_BRANCH) + #define MSR_EFER0xc080 /* Extended Feature Enable Register */ #define EFER_SCE (_AC(1, ULL) << 0) /* SYSCALL Enable */ #define EFER_LME (_AC(1, ULL) << 8) /* Long Mode Enable */ diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 4c38f6c015..133348d9f9 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1963,6 +1963,29 @@ void do_device_not_available(struct cpu_user_regs *regs) #endif } +static bool enable_lbr(void) +{ +uint64_t debugctl; + +wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR); +rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); +if ( !(debugctl & IA32_DEBUGCTLMSR_LBR) ) +{ +/* + * CPUs with no model-specific LBRs always return DEBUGCTLMSR.LBR + * == 0, attempt to set arch LBR if available. + */ +if ( !boot_cpu_has(X86_FEATURE_ARCH_LBR) ) +return false; + +/* Note that LASTINT{FROMIP,TOIP} matches LER_{FROM_IP,TO_IP} */ +wrmsrl(MSR_ARCH_LBR_CTL, ARCH_LBR_CTL_LBREN | ARCH_LBR_CTL_OS | + ARCH_LBR_CTL_RECORD_ALL); +} + +return true; +} + void do_debug(struct cpu_user_regs *regs) { unsigned long dr6; @@ -1997,7 +2020,7 @@ void do_debug(struct cpu_user_regs *regs) /* #DB automatically disabled LBR. Reinstate it if debugging Xen. */ if ( cpu_has_xen_lbr ) -wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR); +enable_lbr(); if ( !guest_mode(regs) ) { @@ -2179,8 +2202,8 @@ void percpu_traps_init(void) if ( !ler_msr && (ler_msr = calc_ler_msr()) ) setup_force_cpu_cap(X86_FEATURE_XEN_LBR); -if ( cpu_has_xen_lbr ) -wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR); +if ( cpu_has_xen_lbr && !enable_lbr() ) +printk(XENLOG_ERR "CPU#%u: failed to enable LBR\n", smp_processor_id()); } void __init init_idt_traps(void) diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h index 9cee4b439e..cd6409f9f3 100644 --- a/xen/include/public/arch-x86/cpufeatureset.h +++ b/xen/include/public/arch-x86/cpufeatureset.h @@ -280,6 +280,7 @@ XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX defeaturing in micro XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */ XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */ XEN_CPUFEATURE(TSXLDTRK, 9*32+16) /*a TSX load tracking suspend/resume insns */ +XEN_CPUFEATURE(ARCH_LBR, 9*32+19) /* Intel ARCH LBR */ XEN_CPUFEATURE(CET_IBT, 9*32+20) /* CET - Indirect Branch Tracking */ XEN_CPUFEATURE(IBRSB, 9*32+26) /*A IBRS and IBPB support (used by Intel) */ XEN_CPUFEATURE(STIBP, 9*32+27) /*A STIBP */ -- 2.36.0
[PATCH 4/5] x86/vmx: handle no model-specific LBR presence
Sapphire Rapids have no model-specific LBRs, and instead only expose architectural LBRs. As documented in the Architectural Last Branch Records specification, processors not supporting model-specific LBRs MSR_IA32_DEBUGCTLMSR.LBR has no meaning, and can be written to 0 or 1, but reads will always return 0. Implement support in vmx_msr_write_intercept() by adding generic detection of lack of model-specific LBRs by checking if the LBR format reported in PERF_CAPABILITIES matches 0x3f, which is explicitly listed in the manual as a way to signal lack of model-specific LBRs presence. Signed-off-by: Roger Pau Monné --- Note the indentation change in vmx_msr_write_intercept() as a result of the addition of a new condition is left for a following patch in order to aid readability of the change. --- xen/arch/x86/hvm/vmx/vmx.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index cf428a4849..3f45ac05c6 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3007,6 +3007,8 @@ static const struct lbr_info { { MSR_GM_LASTBRANCH_0_FROM_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO }, { MSR_GM_LASTBRANCH_0_TO_IP,NUM_MSR_GM_LASTBRANCH_FROM_TO }, { 0, 0 } +}, no_lbr[] = { +{0, 0} }; static const struct lbr_info *last_branch_msr_get(void) @@ -3070,6 +3072,21 @@ static const struct lbr_info *last_branch_msr_get(void) /* Goldmont */ case 0x5c: case 0x5f: return gm_lbr; + +default: +if ( cpu_has_pdcm ) +{ +uint64_t cap; + +rdmsrl(MSR_IA32_PERF_CAPABILITIES, cap); +if ( (cap & MSR_IA32_PERF_CAP_LBR_FORMAT) == 0x3f ) +/* + * On processors that do not support model-specific LBRs, + * PERF_CAPABILITIES.LBR_FMT will have the value 0x3f. + */ +return no_lbr; +} +break; } break; @@ -3521,6 +3538,8 @@ static int cf_check vmx_msr_write_intercept( return X86EMUL_OKAY; } +if ( lbr->count ) +{ for ( ; lbr->count; lbr++ ) { unsigned int i; @@ -3546,6 +3565,10 @@ static int cf_check vmx_msr_write_intercept( v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX; if ( ler_to_fixup_needed ) v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO; +} +else +/* No model specific LBRs, ignore DEBUGCTLMSR.LBR. */ +msr_content &= ~IA32_DEBUGCTLMSR_LBR; } __vmwrite(GUEST_IA32_DEBUGCTL, msr_content); -- 2.36.0
Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration
On Fri, May 20, 2022 at 02:46:39PM +0200, Jan Beulich wrote: > On 20.05.2022 14:36, Roger Pau Monné wrote: > > On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote: > >> Hello, > >> > >> Following series attempts to solve the issue with IO-APIC edge triggered > >> interrupts seeing an inconsistent RTE or IRTE when injected while being > >> migrated. > >> > >> It's currently RFC because some patches have post commit message notes, > >> and because I'm not sure if patch 1 is really needed. I originally had > >> the idea of suggesting to only backport patch 1 in order to fix the > >> issue in older releases, while leaving the more complex (and thus > >> error prone) IOMMU changes in unstable. Note however that patch 1 is > >> just a workaround to prevent interrupts seeing inconsistent entries > >> while being updated, masking the entry just makes the warning go away, > >> but the interrupt will be lost. > > > > Ping? > > Sorry, the usual thing with RFCs: They take lower priority than other > work items. This series is certainly the first of the several pending > RFC series which I mean to get to, but it's hard to predict when this > would be. No problem, I don't think it's a super urgent issue: we have always handled interrupts this way and so far got no noticeable issues (apart from the log messages reported on the console). There's a non-trivial amount of IOMMU code changes, so maybe the IOMMU maintainers could take a stab at those as a start? Thanks, Roger.
Re: [PATCH v2 2/3] tools/xl: Use sparse init for dom_info, remove duplicate vars
On Fri, Apr 29, 2022 at 03:45:25PM -0700, Elliott Mitchell wrote: > Rather than having shadow variables for every element of dom_info, it is > better to properly initialize dom_info at the start. This also removes > the misleading memset() in the middle of main_create(). > > Remove the dryrun element of domain_create as that has been displaced > by the global "dryrun_only" variable. > > Signed-off-by: Elliott Mitchell Reviewed-by: Anthony PERARD Thanks for the clean up. -- Anthony PERARD
Re: [PATCH v2] build: Fix make warning if there is no cppcheck
On 20.05.2022 15:23, Bertrand Marquis wrote: >> On 20 May 2022, at 13:51, Jan Beulich wrote: >> On 20.05.2022 14:14, Bertrand Marquis wrote: >>> --- a/xen/Makefile >>> +++ b/xen/Makefile >>> @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c >>> $(objtree)/include/generated/autoconf.h >>> $(call if_changed,cppcheck_xml) >>> >>> cppcheck-version: >>> -ifeq ($(shell which $(CPPCHECK)),) >>> - $(error Cannot find cppcheck executable: $(CPPCHECK)) >>> -endif >>> -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) >>> - $(error Please upgrade your cppcheck to version 2.7 or greater) >>> -endif >>> + @if ! which $(CPPCHECK) > /dev/null 2>&1; then \ >>> + echo "Cannot find cppcheck executable: $(CPPCHECK)"; \ >>> + exit 1; \ >>> + fi >>> + @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; >>> then \ >>> + echo "Please upgrade your cppcheck to version 2.7 or greater"; \ >>> + exit 1; \ >>> + fi >>> >>> # Put this in generated headers this way it is cleaned by include/Makefile >>> $(objtree)/include/generated/compiler-def.h: >> >> Fine with me, even if - as said on v1 - I would have preferred $(if ...). > > Could you explain why and what you mean exactly ? I generally think that make scripts should resort to shell language only if things cannot reasonably be expressed in make language. > I thought the code would be more complex and less clear using if and I > do not see how it would solve the issue with which being called. The problem to deal with was to move the shell invocation from makefile parsing time to rule execution time. Hence I don't see why cppcheck-version: $(if $(shell which ...),,$(error ...)) wouldn't deal with the problem equally well. But I guess I may not be understanding your question / concern. >> One question though: Wouldn't it better be $(Q) instead of the two plain >> @? Preferably with that adjustment (which I guess can be made while >> committing): > > I thought of it but who would be interested in actually seeing those > commands which are not “building” anything. You never know what's relevant to see when hunting down some obscure build system issue. Jan
[ovmf test] 170603: regressions - FAIL
flight 170603 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170603/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1121 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days9 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
RE: [PATCH 3/3] x86/monitor: Add new monitor event to catch all vmexits
> -Original Message- > From: Tian, Kevin > Sent: Thursday, May 19, 2022 8:35 PM > To: Tamas K Lengyel ; xen- > de...@lists.xenproject.org > Cc: Lengyel, Tamas ; Wei Liu ; > Anthony PERARD ; Gross, Jurgen > ; Cooper, Andrew ; > George Dunlap ; Beulich, Jan > ; Julien Grall ; Stefano Stabellini > ; Alexandru Isaila ; Petre > Pircalabu ; Pau Monné, Roger > ; Nakajima, Jun > Subject: RE: [PATCH 3/3] x86/monitor: Add new monitor event to catch all > vmexits > > > From: Tamas K Lengyel > > Sent: Wednesday, May 18, 2022 11:02 PM > > > > On Thu, May 12, 2022 at 9:47 AM Tamas K Lengyel > > wrote: > > > > > > On Wed, May 4, 2022 at 9:12 AM Tamas K Lengyel > > > wrote: > > > > > > > > On Wed, Apr 27, 2022 at 11:51 AM Tamas K Lengyel > > > > wrote: > > > > > > > > > > Add monitor event that hooks the vmexit handler allowing for > > > > > both sync > > and > > > > > async monitoring of events. With async monitoring an event is > > > > > placed > > on the > > > > > monitor ring for each exit and the rest of the vmexit handler > > > > > resumes > > normally. > > > > > If there are additional monitor events configured those will > > > > > also place > > their > > > > > respective events on the monitor ring. > > > > > > > > > > With the sync version an event is placed on the monitor ring but > > > > > the > > handler > > > > > does not get resumed, thus the sync version is only useful when > > > > > the VM > > is not > > > > > expected to resume normally after the vmexit. Our use-case is > > > > > primarily > > with > > > > > the sync version with VM forks where the fork gets reset after > > > > > sync > > vmexit > > > > > event, thus the rest of the vmexit handler can be safely > > > > > skipped. This is very useful when we want to avoid Xen crashing > > > > > the VM under any > > circumstance, > > > > > for example during fuzzing. Collecting all vmexit information > > > > > regardless > > of > > > > > the root cause makes it easier to reason about the state of the > > > > > VM on > > the > > > > > monitor side, hence we opt to receive all events, even for > > > > > external > > interrupt > > > > > and NMI exits and let the monitor agent decide how to proceed. > > > > > > > > > > Signed-off-by: Tamas K Lengyel > > > > > --- > > > > > v5: wrap vmexit fields in arch.vmx structures in the public > > > > > vm_event ABI > > > > > > > > Patch ping. Could a toolstack maintainer please take a look at this? > > > > The hypervisor side already has a Reviewed-by. > > > > > > Patch ping. > > > > Patch ping. > > > > I guess what you really missed is an ack from toostack maintainer, but > anyway: > > Reviewed-by: Kevin Tian Thanks, the review is still appreciated! Tamas
Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 20.05.2022 15:33, Chuck Zmudzinski wrote: > On 5/20/2022 5:41 AM, Jan Beulich wrote: >> On 20.05.2022 10:30, Chuck Zmudzinski wrote: >>> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: > On 20.05.2022 06:43, Chuck Zmudzinski wrote: >> On 5/4/22 5:14 AM, Juergen Gross wrote: >>> On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. >>> Oh, I missed that one, sorry. >> That is why your patch would not fix my Haswell unless >> it also touches i915_gem_object_pin_map() in >> drivers/gpu/drm/i915/gem/i915_gem_pages.c >> >>> I wanted to be rather defensive in my changes, but I agree at least >>> the >>> case in arch_phys_wc_add() might want to be changed, too. >> I think your approach needs to be more aggressive so it will fix >> all the known false negatives introduced by bdd8b6c98239 >> such as the one in i915_gem_object_pin_map(). >> >> I looked at Jan's approach and I think it would fix the issue >> with my Haswell as long as I don't use the nopat option. I >> really don't have a strong opinion on that question, but I >> think the nopat option as a Linux kernel option, as opposed >> to a hypervisor option, should only affect the kernel, and >> if the hypervisor provides the pat feature, then the kernel >> should not override that, > Hmm, why would the kernel not be allowed to override that? Such > an override would affect only the single domain where the > kernel runs; other domains could take their own decisions. > > Also, for the sake of completeness: "nopat" used when running on > bare metal has the same bad effect on system boot, so there > pretty clearly is an error cleanup issue in the i915 driver. But > that's orthogonal, and I expect the maintainers may not even care > (but tell us "don't do that then"). >>> Actually I just did a test with the last official Debian kernel >>> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was >>> applied. In fact, the nopat option does *not* break the i915 driver >>> in 5.16. That is, with the nopat option, the i915 driver loads >>> normally on both the bare metal and on the Xen hypervisor. >>> That means your presumption (and the presumption of >>> the author of bdd8b6c98239) that the "nopat" option was >>> being observed by the i915 driver is incorrect. Setting "nopat" >>> had no effect on my system with Linux 5.16. So after doing these >>> tests, I am against the aggressive approach of breaking the i915 >>> driver with the "nopat" option because prior to bdd8b6c98239, >>> nopat did not break the i915 driver. Why break it now? >> Because that's, in my understanding, is the purpose of "nopat" >> (not breaking the driver of course - that's a driver bug -, but >> having an effect on the driver). > > I wouldn't call it a driver bug, but an incorrect configuration of the > kernel by the user. I presume X86_FEATURE_PAT is required by the > i915 driver The driver ought to work fine without PAT (and hence without being able to make WC mappings). It would use UC instead and be slow, but it ought to work. > and therefore the driver should refuse to disable > it if the user requests to disable it and instead warn the user that > the driver did not disable the feature, contrary to what the user > requested with the nopat option. > > In any case, my test did not verify that when nopat is set in Linux 5.16, > the thread takes the same code path as when nopat is not set, > so I am not totally sure that the reason nopat does not break the > i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT) > returns true even when nopat is set. I could test it with a custom > log message in 5.16 if that is necessary. > > Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT) > to return true in 5.16 when the user requests nopat? No, I'm not saying that. It was wrong for this construct to be used in the driver, which was fixed for 5.17 (and which had caused the regression I did observe, leading to the patch as a hopefully least bad option). > I think that is > just permitting a bad configuration to break the driver that a > well-written operating system should not allow. The i915 driver > was, in my opinion, correctly ignoring the nopat option in 5.16 > because that option is not compatible
Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES
On 20/05/2022 14:37, Roger Pau Monne wrote: > Allow exposing the PDCM bit in CPUID for HVM guests if present on the > platform, which in turn allows exposing PERF_CAPABILITIES. Limit the > information exposed in PERF_CAPABILITIES to the LBR format only. > > This is helpful as hardware without model-specific LBRs set format to > 0x3f in order to notify the feature is not present. > > Signed-off-by: Roger Pau Monné > --- > Seeing as we have never exposed PDCM in CPUID I wonder whether there's > something that I'm missing that makes exposing PERF_CAPABILITIES LBR > format not as trivial as it looks. > --- > xen/arch/x86/msr.c | 9 + > xen/include/public/arch-x86/cpufeatureset.h | 2 +- > 2 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > index 01a15857b7..423a795d1d 100644 > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t > *val) > *val = 0; > break; > > +case MSR_IA32_PERF_CAPABILITIES: > +if ( !cp->basic.pdcm ) > +goto gp_fault; > + > +/* Only report LBR format. */ > +rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val); > +*val &= MSR_IA32_PERF_CAP_LBR_FORMAT; Urgh. We should have this info cached from boot. Except caching on boot is broken on hybrid cpus. The P and E cores of an AlderLake report a different format iirc (they differ between linear, and effective addr). Given the other pain points with hybrid cpus, I think we can ignore it in the short term. > +break; > + > case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST: > if ( !is_hvm_domain(d) || v != curr ) > goto gp_fault; > diff --git a/xen/include/public/arch-x86/cpufeatureset.h > b/xen/include/public/arch-x86/cpufeatureset.h > index cd6409f9f3..5fdaec43c5 100644 > --- a/xen/include/public/arch-x86/cpufeatureset.h > +++ b/xen/include/public/arch-x86/cpufeatureset.h > @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A Supplemental > Streaming SIMD Extensio > XEN_CPUFEATURE(FMA, 1*32+12) /*A Fused Multiply Add */ > XEN_CPUFEATURE(CX16, 1*32+13) /*A CMPXCHG16B */ > XEN_CPUFEATURE(XTPR, 1*32+14) /* Send Task Priority Messages */ > -XEN_CPUFEATURE(PDCM, 1*32+15) /* Perf/Debug Capability MSR */ > +XEN_CPUFEATURE(PDCM, 1*32+15) /*S Perf/Debug Capability MSR */ This is the bit which requires more toolstack logic to safely enable. Using 's' for off-by-default is fine if we want to get the series in now. But before we expose the MSR generally, we need to: 1) Put the configuration in msr_policy so the toolstack can reason about it 2) Reject migration attempts to destinations where the LBR format changes 3) Actually put the lBR registers in the migration stream ~Andrew
Re: [PATCH v2 3/3] tools/xl: Allow specifying JSON for domain configuration file format
On Tue, Apr 19, 2022 at 06:23:41PM -0700, Elliott Mitchell wrote: > JSON is currently used when saving domains to mass storage. Being able > to use JSON as the normal input to `xl create` has potential to be > valuable. Add the functionality. > > Move the memset() earlier so to allow use of the structure sooner. > > Signed-off-by: Elliott Mitchell So, I gave this a try and creating a guest from a json config, and that fails very early with "Unknown guest type". Have you actually tried to create a guest from config file written in json? Also, this would need documentation about the new option, about the format. The man page need to be edited. An example of a config file written in json would be nice as well. Thanks, -- Anthony PERARD
Re: [PATCH v2] build: Fix make warning if there is no cppcheck
Hi Jan, > On 20 May 2022, at 14:56, Jan Beulich wrote: > > On 20.05.2022 15:23, Bertrand Marquis wrote: >>> On 20 May 2022, at 13:51, Jan Beulich wrote: >>> On 20.05.2022 14:14, Bertrand Marquis wrote: --- a/xen/Makefile +++ b/xen/Makefile @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c $(objtree)/include/generated/autoconf.h $(call if_changed,cppcheck_xml) cppcheck-version: -ifeq ($(shell which $(CPPCHECK)),) - $(error Cannot find cppcheck executable: $(CPPCHECK)) -endif -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1) - $(error Please upgrade your cppcheck to version 2.7 or greater) -endif + @if ! which $(CPPCHECK) > /dev/null 2>&1; then \ + echo "Cannot find cppcheck executable: $(CPPCHECK)"; \ + exit 1; \ + fi + @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; then \ + echo "Please upgrade your cppcheck to version 2.7 or greater"; \ + exit 1; \ + fi # Put this in generated headers this way it is cleaned by include/Makefile $(objtree)/include/generated/compiler-def.h: >>> >>> Fine with me, even if - as said on v1 - I would have preferred $(if ...). >> >> Could you explain why and what you mean exactly ? > > I generally think that make scripts should resort to shell language > only if things cannot reasonably be expressed in make language. Agree hence my first implementation. > >> I thought the code would be more complex and less clear using if and I >> do not see how it would solve the issue with which being called. > > The problem to deal with was to move the shell invocation from > makefile parsing time to rule execution time. Hence I don't see > why > > cppcheck-version: > $(if $(shell which ...),,$(error ...)) > > wouldn't deal with the problem equally well. But I guess I may > not be understanding your question / concern. There are always thousands of ways to achieve the same and here this is only a matter of taste. I must admit that I did not think of using that solution this way. If you prefer this I have nothing against it and I will ack a patch changing to this. > >>> One question though: Wouldn't it better be $(Q) instead of the two plain >>> @? Preferably with that adjustment (which I guess can be made while >>> committing): >> >> I thought of it but who would be interested in actually seeing those >> commands which are not “building” anything. > > You never know what's relevant to see when hunting down some > obscure build system issue. > Feel free to replace @ by $(Q) in my patch on commit. Cheers Bertrand > Jan >
Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES
On 20.05.2022 16:10, Andrew Cooper wrote: > On 20/05/2022 14:37, Roger Pau Monne wrote: >> --- a/xen/include/public/arch-x86/cpufeatureset.h >> +++ b/xen/include/public/arch-x86/cpufeatureset.h >> @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A Supplemental >> Streaming SIMD Extensio >> XEN_CPUFEATURE(FMA, 1*32+12) /*A Fused Multiply Add */ >> XEN_CPUFEATURE(CX16, 1*32+13) /*A CMPXCHG16B */ >> XEN_CPUFEATURE(XTPR, 1*32+14) /* Send Task Priority Messages */ >> -XEN_CPUFEATURE(PDCM, 1*32+15) /* Perf/Debug Capability MSR */ >> +XEN_CPUFEATURE(PDCM, 1*32+15) /*S Perf/Debug Capability MSR */ > > This is the bit which requires more toolstack logic to safely enable. > Using 's' for off-by-default is fine if we want to get the series in now. > > But before we expose the MSR generally, we need to: > > 1) Put the configuration in msr_policy so the toolstack can reason about it > 2) Reject migration attempts to destinations where the LBR format changes Since this could be quite restrictive, and since people needing to know they need to hide this feature for migration to work, I guess this would further want qualifying by "did the guest actually use LBRs so far"? Jan > 3) Actually put the lBR registers in the migration stream > > ~Andrew
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On Fri, May 20, 2022 at 02:36:02PM +0200, Jan Beulich wrote: > On 20.05.2022 14:22, Roger Pau Monné wrote: > > On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote: > >> On 20.05.2022 13:11, Jan Beulich wrote: > >>> On 20.05.2022 12:47, Roger Pau Monné wrote: > On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: > > On 06.05.2022 13:16, Roger Pau Monné wrote: > >> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: > >>> --- a/xen/drivers/passthrough/amd/iommu_map.c > >>> +++ b/xen/drivers/passthrough/amd/iommu_map.c > >>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig > >>> > >>> while ( nr_ptes-- ) > >>> { > >>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > >>> +ASSERT(!pde->next_level); > >>> +ASSERT(!pde->u); > >>> + > >>> +if ( pde > table ) > >>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); > >>> +else > >>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3); > >> > >> I think PAGETABLE_ORDER would be clearer here. > > > > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used > > anywhere > > in IOMMU code afaics. > > Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not > sure what's the rule for declaring that PAGE_SHIFT is fine to use in > IOMMU code but not PAGETABLE_ORDER. > >>> > >>> Hmm, yes and no. But for consistency with other IOMMU code I may want > >>> to switch to PAGE_SHIFT_4K. > >> > >> Except that, with the plan to re-use pt_update_contig_markers() for CPU- > >> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT. > > > > Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3? > > pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch > to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could > at the same time start using PAGETABLE_ORDER there. I've got confused by the double reply and read it as if you where going to stick to using PAGE_SHIFT everywhere as proposed originally. > What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and > LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/ > consistent, ... > > > IMO it makes the code quite easier to understand. > > ... or in fact helping readability. Looking at pt_update_contig_markers() we hardcode CONTIG_LEVEL_SHIFT to 9 there, which means all users must have a page table order of 9. It seems to me we are just making things more complicated than necessary by trying to avoid dependencies between CPU and IOMMU page-table sizes and definitions, when the underlying mechanism to set ->ign0 has those assumptions baked in. Would it help if you introduced a PAGE_TABLE_ORDER in page-defs.h? Thanks, Roger.
[ovmf test] 170604: regressions - FAIL
flight 170604 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170604/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1122 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days 10 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On Fri, May 20, 2022 at 04:28:14PM +0200, Roger Pau Monné wrote: > On Fri, May 20, 2022 at 02:36:02PM +0200, Jan Beulich wrote: > > On 20.05.2022 14:22, Roger Pau Monné wrote: > > > On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote: > > >> On 20.05.2022 13:11, Jan Beulich wrote: > > >>> On 20.05.2022 12:47, Roger Pau Monné wrote: > > On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote: > > > On 06.05.2022 13:16, Roger Pau Monné wrote: > > >> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote: > > >>> --- a/xen/drivers/passthrough/amd/iommu_map.c > > >>> +++ b/xen/drivers/passthrough/amd/iommu_map.c > > >>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig > > >>> > > >>> while ( nr_ptes-- ) > > >>> { > > >>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir); > > >>> +ASSERT(!pde->next_level); > > >>> +ASSERT(!pde->u); > > >>> + > > >>> +if ( pde > table ) > > >>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table)); > > >>> +else > > >>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3); > > >> > > >> I think PAGETABLE_ORDER would be clearer here. > > > > > > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used > > > anywhere > > > in IOMMU code afaics. > > > > Isn't PAGE_SHIFT also a CPU-side concept in the same way? I'm not > > sure what's the rule for declaring that PAGE_SHIFT is fine to use in > > IOMMU code but not PAGETABLE_ORDER. > > >>> > > >>> Hmm, yes and no. But for consistency with other IOMMU code I may want > > >>> to switch to PAGE_SHIFT_4K. > > >> > > >> Except that, with the plan to re-use pt_update_contig_markers() for CPU- > > >> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT. > > > > > > Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3? > > > > pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch > > to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could > > at the same time start using PAGETABLE_ORDER there. > > I've got confused by the double reply and read it as if you where > going to stick to using PAGE_SHIFT everywhere as proposed originally. > > > What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and > > LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/ > > consistent, ... > > > > > IMO it makes the code quite easier to understand. > > > > ... or in fact helping readability. > > Looking at pt_update_contig_markers() we hardcode CONTIG_LEVEL_SHIFT > to 9 there, which means all users must have a page table order of 9. > > It seems to me we are just making things more complicated than > necessary by trying to avoid dependencies between CPU and IOMMU > page-table sizes and definitions, when the underlying mechanism to set > ->ign0 has those assumptions baked in. > > Would it help if you introduced a PAGE_TABLE_ORDER in page-defs.h? Sorry, should be PAGE_TABLE_ORDER_4K. Roger.
[xen-unstable-smoke test] 170599: tolerable all pass - PUSHED
flight 170599 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/170599/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen ec0cce125b8b9fccde3fa825b8ee963083b5de3b baseline version: xen 43aa3f6e72d340a85d3943b86350f6196a87289c Last test of basis 170553 2022-05-19 01:03:09 Z1 days Testing same since 170599 2022-05-20 11:00:25 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Jan Beulich Juergen Gross Julien Grall # Arm Marek Marczykowski-Górecki jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 43aa3f6e72..ec0cce125b ec0cce125b8b9fccde3fa825b8ee963083b5de3b -> smoke
Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 10:06 AM, Jan Beulich wrote: On 20.05.2022 15:33, Chuck Zmudzinski wrote: On 5/20/2022 5:41 AM, Jan Beulich wrote: On 20.05.2022 10:30, Chuck Zmudzinski wrote: On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). I wouldn't call it a driver bug, but an incorrect configuration of the kernel by the user. I presume X86_FEATURE_PAT is required by the i915 driver The driver ought to work fine without PAT (and hence without being able to make WC mappings). It would use UC instead and be slow, but it ought to work. and therefore the driver should refuse to disable it if the user requests to disable it and instead warn the user that the driver did not disable the feature, contrary to what the user requested with the nopat option. In any case, my test did not verify that when nopat is set in Linux 5.16, the thread takes the same code path as when nopat is not set, so I am not totally sure that the reason nopat does not break the i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT) returns true even when nopat is set. I could test it with a custom log message in 5.16 if that is necessary. Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT) to return true in 5.16 when the user requests nopat? No, I'm not saying that. It was wrong for this construct to be used in the driver, which was fixed for 5.17 (and which had caused the regression I did observe, leading to the patch as a hopefully least bad option). I think that is just permitting a bad configuration to break the driver that a well-written operating system should not allow. The i915 driver was, in my opinion, correctly ignoring the nopat option in 5.16 because that option is not compatible with the hardware the i915 driver is trying to initialize and setup at boot time. At least that is my understanding now, but I will need to test it on 5.16 to be sure I understand it correctly. Also, AFAICT, your patch would break the driver when the nopat option is set and only fix the regression introduced by bdd8b6c98239 when nopat is not set on my box, so your patch would
Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES
On Fri, May 20, 2022 at 02:10:31PM +, Andrew Cooper wrote: > On 20/05/2022 14:37, Roger Pau Monne wrote: > > Allow exposing the PDCM bit in CPUID for HVM guests if present on the > > platform, which in turn allows exposing PERF_CAPABILITIES. Limit the > > information exposed in PERF_CAPABILITIES to the LBR format only. > > > > This is helpful as hardware without model-specific LBRs set format to > > 0x3f in order to notify the feature is not present. > > > > Signed-off-by: Roger Pau Monné > > --- > > Seeing as we have never exposed PDCM in CPUID I wonder whether there's > > something that I'm missing that makes exposing PERF_CAPABILITIES LBR > > format not as trivial as it looks. > > --- > > xen/arch/x86/msr.c | 9 + > > xen/include/public/arch-x86/cpufeatureset.h | 2 +- > > 2 files changed, 10 insertions(+), 1 deletion(-) > > > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > > index 01a15857b7..423a795d1d 100644 > > --- a/xen/arch/x86/msr.c > > +++ b/xen/arch/x86/msr.c > > @@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t > > *val) > > *val = 0; > > break; > > > > +case MSR_IA32_PERF_CAPABILITIES: > > +if ( !cp->basic.pdcm ) > > +goto gp_fault; > > + > > +/* Only report LBR format. */ > > +rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val); > > +*val &= MSR_IA32_PERF_CAP_LBR_FORMAT; > > Urgh. We should have this info cached from boot. Except caching on > boot is broken on hybrid cpus. The P and E cores of an AlderLake report > a different format iirc (they differ between linear, and effective addr). > > Given the other pain points with hybrid cpus, I think we can ignore it > in the short term. > > > +break; > > + > > case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST: > > if ( !is_hvm_domain(d) || v != curr ) > > goto gp_fault; > > diff --git a/xen/include/public/arch-x86/cpufeatureset.h > > b/xen/include/public/arch-x86/cpufeatureset.h > > index cd6409f9f3..5fdaec43c5 100644 > > --- a/xen/include/public/arch-x86/cpufeatureset.h > > +++ b/xen/include/public/arch-x86/cpufeatureset.h > > @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A > > Supplemental Streaming SIMD Extensio > > XEN_CPUFEATURE(FMA, 1*32+12) /*A Fused Multiply Add */ > > XEN_CPUFEATURE(CX16, 1*32+13) /*A CMPXCHG16B */ > > XEN_CPUFEATURE(XTPR, 1*32+14) /* Send Task Priority Messages */ > > -XEN_CPUFEATURE(PDCM, 1*32+15) /* Perf/Debug Capability MSR */ > > +XEN_CPUFEATURE(PDCM, 1*32+15) /*S Perf/Debug Capability MSR */ > > This is the bit which requires more toolstack logic to safely enable. > Using 's' for off-by-default is fine if we want to get the series in now. > > But before we expose the MSR generally, we need to: > > 1) Put the configuration in msr_policy so the toolstack can reason about it > 2) Reject migration attempts to destinations where the LBR format changes > 3) Actually put the lBR registers in the migration stream So far we have allowed guests to enable LBRs (DEBUGCTLMSR.LBR) freely without any restrictions, and migration of guests using LBRs certainly won't work currently, hence I wonder why exposing the LBR format makes it worse as to require all this extra handling. I'm not saying it's not worth having, but IMO we should better spend the time in getting architectural LBRs available to guests and migration safe, and for architectural LBRs we don't really care about PERF_CAPABILITIES.LBR_FORMAT other than hardcoding it to 0x3f. Thanks, Roger.
Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES
On 20/05/2022 15:19, Jan Beulich wrote: > On 20.05.2022 16:10, Andrew Cooper wrote: >> On 20/05/2022 14:37, Roger Pau Monne wrote: >>> --- a/xen/include/public/arch-x86/cpufeatureset.h >>> +++ b/xen/include/public/arch-x86/cpufeatureset.h >>> @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A >>> Supplemental Streaming SIMD Extensio >>> XEN_CPUFEATURE(FMA, 1*32+12) /*A Fused Multiply Add */ >>> XEN_CPUFEATURE(CX16, 1*32+13) /*A CMPXCHG16B */ >>> XEN_CPUFEATURE(XTPR, 1*32+14) /* Send Task Priority Messages */ >>> -XEN_CPUFEATURE(PDCM, 1*32+15) /* Perf/Debug Capability MSR */ >>> +XEN_CPUFEATURE(PDCM, 1*32+15) /*S Perf/Debug Capability MSR */ >> This is the bit which requires more toolstack logic to safely enable. >> Using 's' for off-by-default is fine if we want to get the series in now. >> >> But before we expose the MSR generally, we need to: >> >> 1) Put the configuration in msr_policy so the toolstack can reason about it >> 2) Reject migration attempts to destinations where the LBR format changes > Since this could be quite restrictive, and since people needing to know > they need to hide this feature for migration to work, I guess this would > further want qualifying by "did the guest actually use LBRs so far"? In practice, it's every major generation ("tock" on Intel's old model), so isn't actually limiting the kinds of heterogeneous setups used in production. (Migration gets steadily less stable the further apart the two CPUs are.) As to dynamic, no - that would be a security bug in a cloud scenario, because there must not be anything the guest can do to interfere with the manageability. Use of LBR is rare, as demonstrated by the fact that noone has complained about the fact that migrating such a VM will malfunction. As we now have a way of reporting "no model-specific LBR", I'm tempted to suggest that VMs get no LBR by default, and someone wanting LBR has to opt in, which is also an explicit agreement to the migration limitation. ~Andrew
Re: [PATCH V8 2/2] libxl: Introduce basic virtio-mmio support on Arm
On Thu, May 19, 2022 at 08:16:16PM +0300, Oleksandr wrote: > On 18.05.22 14:05, Anthony PERARD wrote: > > On Tue, May 03, 2022 at 08:26:03PM +0300, Oleksandr Tyshchenko wrote: > > > +for (i = 0; i < d_config->num_disks; i++) { > > > +libxl_device_disk *disk = &d_config->disks[i]; > > > + > > > +if (disk->specification == LIBXL_DISK_SPECIFICATION_VIRTIO) { > > > +disk->base = alloc_virtio_mmio_base(gc, &virtio_mmio_base); > > > +if (!disk->base) > > > +return ERROR_FAIL; > > > + > > > +disk->irq = alloc_virtio_mmio_irq(gc, &virtio_mmio_irq); > > > +if (!disk->irq) > > > +return ERROR_FAIL; > > > + > > > +if (virtio_irq < disk->irq) > > > +virtio_irq = disk->irq; > > > +virtio_enabled = true; > > > + > > > +LOG(DEBUG, "Allocate Virtio MMIO params for Vdev %s: IRQ %u > > > BASE 0x%"PRIx64, > > > +disk->vdev, disk->irq, disk->base); > > > +} > > > +} > > > + > > > +if (virtio_enabled) > > > +nr_spis += (virtio_irq - 32) + 1; > > Is it possible to update "nr_spis" inside the loop? > > yes, but ... > > > > The added value > > seems to be "number of virtio device + 1", > > ... not really ... > > > > so updating "nr_spis" and > > adding +1 after the loop could work, right? > > ... from my understanding, we cannot just increment nr_spis by "one" > inside a loop, we need to calculate it. > > > Something like that (not tested): > > uint32_t spi; > > ... > > spi = irq - 32; > if (nr_spis <= spi) > nr_spis = spi + 1; > > > Shall I update "nr_spis" inside the loop? > > Are you asking because of eliminating "virtio_enabled" and/or "virtio_irq" > locals? They are used down the code. I'm asking because the calculation doesn't really make sense to me yet. At the moment "virtio_irq-32+1" happen to be the "number of disk + 1" and we have "nr_spis += " which I don't think makes sense with the "+1". Doesn't "nr_spis" only need to be the highest irq value for the devices we're adding? (Maybe with +1) (also -32 because I think I understand what 32 stand for now) (also, the "num_irqs" loop just after this loop seems to do exactly that) But I think what this line of code needs the most is a comment. > > Also, what is "32"? Is it "GUEST_VIRTIO_MMIO_SPI_FIRST - 1" ? > > Although currently "GUEST_VIRTIO_MMIO_SPI_FIRST - 1" = "32", we cannot rely > on this (I mean to use "GUEST_VIRTIO_MMIO_SPI_FIRST - 1" > > instead of "32"), because "32" has yet another meaning. This is an offset > for SPI, the values before 32 are for private IRQs (PPI, SGI). Do you think it could be possible to name that value? I've seen other use of 32 in the same function that have probably the same meaning. But if you don't have a good name, I guess we can also live a bit longer with a plain "32". Cheers, -- Anthony PERARD
[ovmf test] 170605: regressions - FAIL
flight 170605 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170605/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1123 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days 11 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
[REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 10:06 AM, Jan Beulich wrote: On 20.05.2022 15:33, Chuck Zmudzinski wrote: On 5/20/2022 5:41 AM, Jan Beulich wrote: On 20.05.2022 10:30, Chuck Zmudzinski wrote: On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). I wouldn't call it a driver bug, but an incorrect configuration of the kernel by the user. I presume X86_FEATURE_PAT is required by the i915 driver The driver ought to work fine without PAT (and hence without being able to make WC mappings). It would use UC instead and be slow, but it ought to work. I am not an expert, but I think the reason it failed on my box was because of the requirements of CI. Maybe the driver would fall back to UC if the add_taint_for_CI function did not halt the entire system in response to the failed test for PAT when trying to use WC mappings. and therefore the driver should refuse to disable it if the user requests to disable it and instead warn the user that the driver did not disable the feature, contrary to what the user requested with the nopat option. In any case, my test did not verify that when nopat is set in Linux 5.16, the thread takes the same code path as when nopat is not set, so I am not totally sure that the reason nopat does not break the i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT) returns true even when nopat is set. I could test it with a custom log message in 5.16 if that is necessary. Are you saying it was wrong for to return true in 5.16 when the user requests nopat? No, I'm not saying that. It was wrong for this construct to be used in the driver, which was fixed for 5.17 (and which had caused the regression I did observe, leading to the patch as a hopefully least bad option). Hmm, the patch I used to fix my box with 5.17.6 used static_cpu_has(X86_FEATURE_PAT) so the driver could continue to configure the hardware using WC. This is the relevant part of the patch I used to fix my box, which includes extra error logs, (against Debian's official build of 5.17.6): --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 2022-05-09 03:16:33.0 -0400 +
[linux-linus test] 170592: tolerable FAIL - PUSHED
flight 170592 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/170592/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170581 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 170581 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170581 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170581 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170581 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170581 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 170581 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 170581 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: linux3d7285a335edaf23b699e87c528cf0b0070e3293 baseline version: linuxb015dcd62b86d298829990f8261d5d154b8d7af5 Last test of basis 170581 2022-05-19 21:41:24 Z0 days Testing same since 170592 2022-05-20 06:42:30 Z0 days1 attempts People who touched revisions under test: Herbert Xu Linus Torvalds Ondrej Mosnacek jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64
[ovmf test] 170607: regressions - FAIL
flight 170607 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170607/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1124 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days 12 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)
Re: [OSSTEST PATCH v2] ts-xen-build-prep: Install newer NASM version, to build OVMF
Anthony PERARD writes ("[OSSTEST PATCH v2] ts-xen-build-prep: Install newer NASM version, to build OVMF"): > Recent versions of OVMF now need a version of NASM that is newer > than the one available on Debian oldstable/buster. They want to use > NASM 2.15.05 [1], which is available in Debian stable/bullseye. The > need to use a newer version started with d3febfd9ade3 ("MdePkg: > Replace Opcode with the corresponding instructions."). > > There is no backport package available but the nasm package from > Debian Bullseye can easily be installed on Buster as it has few > dependencies and are already satisfied. > > [1] > https://github.com/tianocore/edk2/commit/6a890db161cd6d378bec3499a1e774db3f5a27a7 > ("BaseTools: Upgrade the version of NASM tool") LGTM assuming Roger is also happy... Acked-by: Ian Jackson -- Ian JacksonThese opinions are my own. Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Re: [REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
I think this summary of the regression is appropriate for a top-post. Details follow below. commit bdd8b6c98239: introduced what I call a real regression which persists in 5.17.x Jan's proposed patch: https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f8...@suse.com/ Jan's patch would fix the real regression introduced by bdd8b6c98239 when the nopat option is not enabled, but when the nopat option is enabled, this patch would introduce what Jan calls a "perceived regression" that is really caused by the failure of the i915 driver to handle the case of the nopat option being provided on the command line properly. What I request: commit Jan's proposed patch, and backport it to 5.17. That would fix the real regression and only cause a perceived regression for the case when nopat is enabled. In that case, patches to the i915 driver would be helpful but necessary to fix a regression. Regard, Chuck Zmudzinski On 5/20/2022 11:46 AM, Chuck Zmudzinski wrote: On 5/20/2022 10:06 AM, Jan Beulich wrote: On 20.05.2022 15:33, Chuck Zmudzinski wrote: On 5/20/2022 5:41 AM, Jan Beulich wrote: On 20.05.2022 10:30, Chuck Zmudzinski wrote: On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). I wouldn't call it a driver bug, but an incorrect configuration of the kernel by the user. I presume X86_FEATURE_PAT is required by the i915 driver The driver ought to work fine without PAT (and hence without being able to make WC mappings). It would use UC instead and be slow, but it ought to work. I am not an expert, but I think the reason it failed on my box was because of the requirements of CI. Maybe the driver would fall back to UC if the add_taint_for_CI function did not halt the entire system in response to the failed test for PAT when trying to use WC mappings. and therefore the driver should refuse to disable it if the user requests to disable it and instead warn the user that the driver did not disable the feature, contrary to what the user requested with the nopat option. In any case, my test did not verify that when nopat is set
Re: [REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability
On 5/20/2022 1:13 PM, Chuck Zmudzinski wrote: I think this summary of the regression is appropriate for a top-post. Details follow below. commit bdd8b6c98239: introduced what I call a real regression which persists in 5.17.x Jan's proposed patch: https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f8...@suse.com/ Jan's patch would fix the real regression introduced by bdd8b6c98239 when the nopat option is not enabled, but when the nopat option is enabled, this patch would introduce what Jan calls a "perceived regression" that is really caused by the failure of the i915 driver to handle the case of the nopat option being provided on the command line properly. What I request: commit Jan's proposed patch, and backport it to 5.17. That would fix the real regression and only cause a perceived regression for the case when nopat is enabled. In that case, patches to the i915 driver would be helpful but necessary to fix a regression. Sorry again, I mean patches to i915 would be helpful but *not* necessary to fix a regression. Regards, Chuck Zmudzinski On 5/20/2022 11:46 AM, Chuck Zmudzinski wrote: On 5/20/2022 10:06 AM, Jan Beulich wrote: On 20.05.2022 15:33, Chuck Zmudzinski wrote: On 5/20/2022 5:41 AM, Jan Beulich wrote: On 20.05.2022 10:30, Chuck Zmudzinski wrote: On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote: On 5/20/2022 2:05 AM, Jan Beulich wrote: On 20.05.2022 06:43, Chuck Zmudzinski wrote: On 5/4/22 5:14 AM, Juergen Gross wrote: On 04.05.22 10:31, Jan Beulich wrote: On 03.05.2022 15:22, Juergen Gross wrote: ... these uses there are several more. You say nothing on why those want leaving unaltered. When preparing my earlier patch I did inspect them and came to the conclusion that these all would also better observe the adjusted behavior (or else I couldn't have left pat_enabled() as the only predicate). In fact, as said in the description of my earlier patch, in my debugging I did find the use in i915_gem_object_pin_map() to be the problematic one, which you leave alone. Oh, I missed that one, sorry. That is why your patch would not fix my Haswell unless it also touches i915_gem_object_pin_map() in drivers/gpu/drm/i915/gem/i915_gem_pages.c I wanted to be rather defensive in my changes, but I agree at least the case in arch_phys_wc_add() might want to be changed, too. I think your approach needs to be more aggressive so it will fix all the known false negatives introduced by bdd8b6c98239 such as the one in i915_gem_object_pin_map(). I looked at Jan's approach and I think it would fix the issue with my Haswell as long as I don't use the nopat option. I really don't have a strong opinion on that question, but I think the nopat option as a Linux kernel option, as opposed to a hypervisor option, should only affect the kernel, and if the hypervisor provides the pat feature, then the kernel should not override that, Hmm, why would the kernel not be allowed to override that? Such an override would affect only the single domain where the kernel runs; other domains could take their own decisions. Also, for the sake of completeness: "nopat" used when running on bare metal has the same bad effect on system boot, so there pretty clearly is an error cleanup issue in the i915 driver. But that's orthogonal, and I expect the maintainers may not even care (but tell us "don't do that then"). Actually I just did a test with the last official Debian kernel build of Linux 5.16, that is, a kernel before bdd8b6c98239 was applied. In fact, the nopat option does *not* break the i915 driver in 5.16. That is, with the nopat option, the i915 driver loads normally on both the bare metal and on the Xen hypervisor. That means your presumption (and the presumption of the author of bdd8b6c98239) that the "nopat" option was being observed by the i915 driver is incorrect. Setting "nopat" had no effect on my system with Linux 5.16. So after doing these tests, I am against the aggressive approach of breaking the i915 driver with the "nopat" option because prior to bdd8b6c98239, nopat did not break the i915 driver. Why break it now? Because that's, in my understanding, is the purpose of "nopat" (not breaking the driver of course - that's a driver bug -, but having an effect on the driver). I wouldn't call it a driver bug, but an incorrect configuration of the kernel by the user. I presume X86_FEATURE_PAT is required by the i915 driver The driver ought to work fine without PAT (and hence without being able to make WC mappings). It would use UC instead and be slow, but it ought to work. I am not an expert, but I think the reason it failed on my box was because of the requirements of CI. Maybe the driver would fall back to UC if the add_taint_for_CI function did not halt the entire system in response to the failed test for PAT when trying to use WC mappings. and therefore the driver should refuse to disable it if the user requests to disable it and instead warn the user that the dri
Re: [PATCH V8 1/2] libxl: Add support for Virtio disk configuration
On 18.05.22 13:52, Anthony PERARD wrote: Hello Anthony On Tue, May 03, 2022 at 08:26:02PM +0300, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko This patch adds basic support for configuring and assisting virtio-mmio based virtio-disk backend (emulator) which is intended to run out of Qemu and could be run in any domain. Although the Virtio block device is quite different from traditional Xen PV block device (vbd) from the toolstack's point of view: - as the frontend is virtio-blk which is not a Xenbus driver, nothing written to Xenstore are fetched by the frontend currently ("vdev" is not passed to the frontend). But this might need to be revised in future, so frontend data might be written to Xenstore in order to support hotplugging virtio devices or passing the backend domain id on arch where the device-tree is not available. - the ring-ref/event-channel are not used for the backend<->frontend communication, the proposed IPC for Virtio is IOREQ/DM it is still a "block device" and ought to be integrated in existing "disk" handling. So, re-use (and adapt) "disk" parsing/configuration logic to deal with Virtio devices as well. For the immediate purpose and an ability to extend that support for other use-cases in future (Qemu, virtio-pci, etc) perform the following actions: - Add new disk backend type (LIBXL_DISK_BACKEND_OTHER) and reflect that in the configuration - Introduce new disk "specification" and "transport" fields to struct libxl_device_disk. Both are written to the Xenstore. The transport field is only used for the specification "virtio" and it assumes only "mmio" value for now. - Introduce new "specification" option with "xen" communication protocol being default value. - Add new device kind (LIBXL__DEVICE_KIND_VIRTIO_DISK) as current one (LIBXL__DEVICE_KIND_VBD) doesn't fit into Virtio disk model Is this still an issue? Yes, it is. Both LIBXL__DEVICE_KIND_VBD and LIBXL__DEVICE_KIND_VIRTIO_DISK are for disk devices, but they are quite different from toolstack PoV. They use different specifications/transports. The latter is for virtio-blk driver which is not xenbus driver. The way how both Virtio backend and frontend get configuration and communicate with each are completely different from Xen PV drivers. Since v5, the "disk/vbd" kind is used. Also see my comment about libxl_device_disk_get_path() regarding this. ok An example of domain configuration for Virtio disk: disk = [ 'phy:/dev/mmcblk0p3, xvda1, backendtype=other, specification=virtio'] Nothing has changed for default Xen disk configuration. Please note, this patch is not enough for virtio-disk to work on Xen (Arm), as for every Virtio device (including disk) we need to allocate Virtio MMIO params (IRQ and memory region) and pass them to the backend, also update Guest device-tree. The subsequent patch will add these missing bits. For the current patch, the default "irq" and "base" are just written to the Xenstore. This is not an ideal splitting, but this way we avoid breaking the bisectability. Signed-off-by: Oleksandr Tyshchenko --- diff --git a/tools/libs/light/libxl_disk.c b/tools/libs/light/libxl_disk.c index a5ca778..7fd98ce 100644 --- a/tools/libs/light/libxl_disk.c +++ b/tools/libs/light/libxl_disk.c @@ -163,6 +163,19 @@ static int libxl__device_disk_setdefault(libxl__gc *gc, uint32_t domid, rc = libxl__resolve_domid(gc, disk->backend_domname, &disk->backend_domid); if (rc < 0) return rc; +if (disk->specification == LIBXL_DISK_SPECIFICATION_UNKNOWN) +disk->specification = LIBXL_DISK_SPECIFICATION_XEN; + +/* + * The transport field is only used for the specification "virtio" and + * it assumes only "mmio" value for now. When there will be a need to add + * "pci" support, we will need to remove the enforcement here and + * respective assert(s) down the code and let the toolstack to decide + * the transport to use. + */ +if (disk->specification == LIBXL_DISK_SPECIFICATION_VIRTIO) +disk->transport = LIBXL_DISK_TRANSPORT_MMIO; Could you check that `disk->transport` is unset when `specification==xen` ? And probably return ERROR_INVAL in this case. yes, will do Also, I don't think you should overwrite the value set by an application in _setdefault(). If `specification==virtio`, check first that `transport` as a supported value (unknown or mmio) then you can then you can set the `transport` value expected by virtio if it wasn't set by the application. ( An example of this is done the function already when enforcing qdisk for cdroms. ) I got it, will do I transform to something like that: if (disk->specification == LIBXL_DISK_SPECIFICATION_XEN && disk->transport != LIBXL_DISK_TRANSPORT_UNKNOWN) { LOGD(ERROR, domid, "Transport is only supported for specification virtio"); return ERROR_FAIL; } /* Force transport mmio for specifica
[ovmf test] 170608: regressions - FAIL
flight 170608 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/170608/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-xsm 6 xen-buildfail REGR. vs. 168254 build-amd64 6 xen-buildfail REGR. vs. 168254 build-i3866 xen-buildfail REGR. vs. 168254 build-i386-xsm6 xen-buildfail REGR. vs. 168254 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a version targeted for testing: ovmf a21a3438f795deecb24e1843c1636f95c485017c baseline version: ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70 Last test of basis 168254 2022-02-28 10:41:46 Z 81 days Failing since168258 2022-03-01 01:55:31 Z 80 days 1125 attempts Testing same since 170593 2022-05-20 06:42:41 Z0 days 13 attempts People who touched revisions under test: Abdul Lateef Attar Abdul Lateef Attar via groups.io Abner Chang Akihiko Odaki Anthony PERARD Bandaru, Purna Chandra Rao Bo Chang Ke Bob Feng Chao Li Chao, Zhuoran Chen Lin Z Chen, Christine Chen, Lin Z Corvin Köhne Dandan Bi dann frazier Dun Tan duntan Feng, Bob C Gerd Hoffmann Gua Guo Guo Dong Guomin Jiang Hao A Wu Heng Luo Hua Ma Huang, Li-Xia Jagadeesh Ujja Jake Garver Jake Garver via groups.io Jason Jason Lou Jiewen Yao Ke, Bo-ChangX Ken Lautner Kenneth Lautner Kun Qin Kuo, Ted Laszlo Ersek Lean Sheng Tan Leif Lindholm Li, Yi1 Li, Zhihao Liming Gao Liu Liu Yun Liu Yun Y Liu, Zhiguang Lixia Huang Lou, Yun Ma, Hua Mara Sophie Grosch Mara Sophie Grosch via groups.io Matt DeVillier Michael D Kinney Michael Kubacki Michael Kubacki Min M Xu Min Xu Oliver Steffen Patrick Rudolph Peter Grehan Purna Chandra Rao Bandaru Ray Ni Rebecca Cran Rebecca Cran Sami Mujawar Sean Rhodes Sean Rhodes sean@starlabs.systems Sebastien Boeuf Sunny Wang Tan, Dun Ted Kuo Tom Lendacky Wenyi Xie wenyi,xie via groups.io Xiaolu.Jiang Xie, Yuanhao Yi Li yi1 li Yu Pu Yuanhao Xie Yuwei Chen Zhiguang Liu Zhihao Li Zhuoran Chao jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 6968 lines long.)