date:20220520

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

Some drivers are using pat_enabled() in order to test availability of
special caching modes (WC and UC-). This will lead to false negatives
in case the system was booted e.g. with the "nopat" variant and the
BIOS did setup the PAT MSR supporting the queried mode, or if the
system is running as a Xen PV guest.

...

Add test functions for those caching modes instead and use them at the
appropriate places.

Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with
pat_enabled()")
Fixes: ae749c7ab475 ("PCI: Add arch_can_pci_mmap_wc() macro")
Signed-off-by: Juergen Gross 

...


--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -94,7 +94,7 @@ int pcibios_set_irq_routing(struct pci_dev *dev,
int pin, int irq);
       #define HAVE_PCI_MMAP
-#define arch_can_pci_mmap_wc()    pat_enabled()
+#define arch_can_pci_mmap_wc()    x86_has_pat_wc()

Besides this and ...


--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -76,7 +76,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void
*data,
   if (args->flags & ~(I915_MMAP_WC))
   return -EINVAL;
   -    if (args->flags & I915_MMAP_WC && !pat_enabled())
+    if (args->flags & I915_MMAP_WC && !x86_has_pat_wc())
   return -ENODEV;
     obj = i915_gem_object_lookup(file, args->handle);
@@ -757,7 +757,7 @@ i915_gem_dumb_mmap_offset(struct drm_file *file,
     if (HAS_LMEM(to_i915(dev)))
   mmap_type = I915_MMAP_TYPE_FIXED;
-    else if (pat_enabled())
+    else if (x86_has_pat_wc())
   mmap_type = I915_MMAP_TYPE_WC;
   else if (!i915_ggtt_has_aperture(to_gt(i915)->ggtt))
   return -ENODEV;
@@ -813,7 +813,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device
*dev, void *data,
   break;
     case I915_MMAP_OFFSET_WC:
-    if (!pat_enabled())
+    if (!x86_has_pat_wc())
   return -ENODEV;
   type = I915_MMAP_TYPE_WC;
   break;
@@ -823,7 +823,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device
*dev, void *data,
   break;
     case I915_MMAP_OFFSET_UC:
-    if (!pat_enabled())
+    if (!x86_has_pat_uc_minus())
   return -ENODEV;
   type = I915_MMAP_TYPE_UC;
   break;

... these uses there are several more. You say nothing on why those want
leaving unaltered. When preparing my earlier patch I did inspect them
and came to the conclusion that these all would also better observe the
adjusted behavior (or else I couldn't have left pat_enabled() as the
only
predicate). In fact, as said in the description of my earlier patch, in
my debugging I did find the use in i915_gem_object_pin_map() to be the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c


I wanted to be rather defensive in my changes, but I agree at least the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Jan


but because of the confusion,


As I just wrote earlier, the confusion is whether or not "nopat"
means the kernel drivers will not use pat even if the firmware
and hypervisor provides it. I think you are correct to
point out that is the way the i915 driver behaved with the nopat
option before bdd8b6c98239 was applied, with the same
bad effects on bare metal as with the hypervisor. I think perhaps
dealing with the nopat option to fix bdd8b6c98239 is a solution in
search of a problem, at least as regards the i915 driver.

The only problem we have, as I see it, is with a false negative
when the nopat option is *not* enabled. But the forced disabling
of pat in Jan's patch when the nopat option is enabled is probably
needed

[ovmf test] 170593: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170593 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170593/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   80 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1113 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days1 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why 
those want

leaving unaltered. When preparing my earlier patch I did inspect them
and came to the conclusion that these all would also better 
observe the

adjusted behavior (or else I couldn't have left pat_enabled() as the
only predicate). In fact, as said in the description of my earlier 
patch, in
my debugging I did find the use in i915_gem_object_pin_map() to be 
the

problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c

I wanted to be rather defensive in my changes, but I agree at least 
the

case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").


Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Prior to bdd8b6c98239, the i915 driver used
static_cpu_has(X86_FEATURE_PAT) to test for the PAT
feature, and apparently this returns true even if nopat
is set, but the new test, pat_enabled(), returns false on
the Xen hypervisor even if nopat is not set. That is
the only problem I see. The question of nopat should
be irrelevant to the i915 driver.

It was unfortunate that the author of bdd8b6c98239
mentioned nopat in the commit message when in fact
nopat was never intended to be used to break the
i915 driver. The i915 driver should ignore the nopat
option and decide what to do based solely on the
capability of the cpu, firmware, and the compiled
options of the Linux kernel. That is how it behaved
before bdd8b6c98239, and that behavior is what needs
to be restored with a patch.

Chuck

[PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff

2022-05-20 Thread Marek Marczykowski-Górecki

Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared
by the PCI Local Bus Specification Revision 3.0 (from 2004) as
"unknown"/"no connection". Fallback to poll mode in this case.
The 0xff handling is x86-specific, the surrounding code is guarded with
CONFIG_X86 anyway.

Signed-off-by: Marek Marczykowski-Górecki 
Reviewed-by: Roger Pau Monné 
---
Changes in v6:
 - wrap the check in additional CONFIG_X86, with appropriate comment
Changes in v5:
 - drop IRQ 0 from the log message
Changes in v4:
 - adjust log message, change it from WARNING to INFO
 - re-add x86 reference in the commit message
Changes in v3:
 - change back to checking 0xff explicitly
 - adjust commit message, include spec reference
 - change warning to match the above
Changes in v2:
 - add log message
 - extend commit message
 - code style fix
---
 xen/drivers/char/ns16550.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index fb75cee4a13a..b37f67dc7430 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t skip_amt, 
unsigned int idx)
 pci_conf_read8(PCI_SBDF(0, b, d, f),
PCI_INTERRUPT_LINE) : 0;
 
+#ifdef CONFIG_X86
+/* PCI Local Bus Specification Revision 3.0 defines 0xff value
+ * as special only for X86 */
+if ( uart->irq == 0xff )
+uart->irq = 0;
+#endif
+if ( !uart->irq )
+printk(XENLOG_INFO
+   "ns16550: %pp no legacy IRQ, using poll mode\n",
+   &PCI_SBDF(0, b, d, f));
+
 return 0;
 }
 }
-- 
2.35.1

[PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART

2022-05-20 Thread Marek Marczykowski-Górecki

This is purely based on the spec:
- Intel 500 Series PCH: 635218-006
- Intel 600 Series PCH: 691222-001, 648364-003

This is tested only on TGL-LP added initially, but according to the
spec, they should behave the same.

Signed-off-by: Marek Marczykowski-Górecki 
Acked-by: Andrew Cooper 
---
Changes in v2:
 - new patch, adding more IDs to the patch that went in already
---
 xen/drivers/char/ns16550.c | 80 +-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
index b37f67dc7430..b7da5646fc28 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -1077,12 +1077,90 @@ static const struct ns16550_config __initconst 
uart_config[] =
 .dev_id = 0x0358,
 .param = param_exar_xr17v358
 },
-/* Intel Corp. TGL-LP LPSS PCI */
+/* Intel Corp. TGL-LP LPSS PCI UART #0 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0xa0a8,
+.param = param_intel_lpss
+},
+/* Intel Corp. TGL-LP LPSS PCI UART #1 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0xa0a9,
+.param = param_intel_lpss
+},
+/* Intel Corp. TGL-LP LPSS PCI UART #2 */
 {
 .vendor_id = PCI_VENDOR_ID_INTEL,
 .dev_id = 0xa0c7,
 .param = param_intel_lpss
 },
+/* Intel Corp. TGL-H LPSS PCI UART #0 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x43a8,
+.param = param_intel_lpss
+},
+/* Intel Corp. TGL-H LPSS PCI UART #1 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x43a9,
+.param = param_intel_lpss
+},
+/* Intel Corp. TGL-H LPSS PCI UART #2 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x43a7,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-P LPSS PCI UART #0 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x51a8,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-P LPSS PCI UART #1 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x51a9,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-P LPSS PCI UART #2 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x51c7,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-P LPSS PCI UART #3 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x51da,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-S LPSS PCI UART #0 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x7aa8,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-S LPSS PCI UART #1 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x7aa9,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-S LPSS PCI UART #2 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x7afe,
+.param = param_intel_lpss
+},
+/* Intel Corp. ADL-S LPSS PCI UART #3 */
+{
+.vendor_id = PCI_VENDOR_ID_INTEL,
+.dev_id = 0x7adc,
+.param = param_intel_lpss
+},
 };
 
 static int __init
-- 
2.35.1

[ovmf test] 170594: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170594 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170594/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   80 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1114 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days2 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm

2022-05-20 Thread Bertrand Marquis

Hi,

> On 19 May 2022, at 03:36, Wei Chen  wrote:
> 
> Hi Jan,
> 
>> -Original Message-
>> From: Jan Beulich 
>> Sent: 2022年5月18日 21:05
>> To: Wei Chen 
>> Cc: nd ; Stefano Stabellini ; Julien
>> Grall ; Bertrand Marquis ;
>> Volodymyr Babchuk ; Andrew Cooper
>> ; Roger Pau Monné ; Wei
>> Liu ; Jiamei Xie ; xen-
>> de...@lists.xenproject.org
>> Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
>> 
>> On 11.05.2022 03:46, Wei Chen wrote:
>>> x86 is using compiler feature testing to decide EFI build
>>> enable or not. When EFI build is disabled, x86 will use an
>>> efi/stub.c file to replace efi/runtime.c for build objects.
>>> Following this idea, we introduce a stub file for Arm, but
>>> use CONFIG_ARM_EFI to decide EFI build enable or not.
>>> 
>>> And the most functions in x86 EFI stub.c can be reused for
>>> other architectures, like Arm. So we move them to common
>>> and keep the x86 specific function in x86/efi/stub.c.
>>> 
>>> To avoid the symbol link conflict error when linking common
>>> stub files to x86/efi. We add a regular file check in efi
>>> stub files' link script. Depends on this check we can bypass
>>> the link behaviors for existed stub files in x86/efi.
>>> 
>>> As there is no Arm specific EFI stub function for Arm in
>>> current stage, Arm still can use the existed symbol link
>>> method for EFI stub files.
>> 
>> Wouldn't it be better to mandate that every arch has its stub.c,
>> and in the Arm one all you'd do (for now) is #include the common
>> one? (But see also below.)
>> 
> 
> Personally, I don't like to include a C file into another C file.
> But I am OK as long as the Arm maintainers agree.
> @Stefano Stabellini @Bertrand Marquis @Julien Grall

I agree with Wei here and if we are realistic the current way the EFI code 
works needs a redesign anyway and asking him to change this in this serie is 
not right.
So I am OK with Wei solution.

Cheers
Bertrand

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Jan Beulich

On 20.05.2022 10:30, Chuck Zmudzinski wrote:
> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:
>> On 5/20/2022 2:05 AM, Jan Beulich wrote:
>>> On 20.05.2022 06:43, Chuck Zmudzinski wrote:
 On 5/4/22 5:14 AM, Juergen Gross wrote:
> On 04.05.22 10:31, Jan Beulich wrote:
>> On 03.05.2022 15:22, Juergen Gross wrote:
>>
>> ... these uses there are several more. You say nothing on why 
>> those want
>> leaving unaltered. When preparing my earlier patch I did inspect them
>> and came to the conclusion that these all would also better 
>> observe the
>> adjusted behavior (or else I couldn't have left pat_enabled() as the
>> only predicate). In fact, as said in the description of my earlier 
>> patch, in
>> my debugging I did find the use in i915_gem_object_pin_map() to be 
>> the
>> problematic one, which you leave alone.
> Oh, I missed that one, sorry.
 That is why your patch would not fix my Haswell unless
 it also touches i915_gem_object_pin_map() in
 drivers/gpu/drm/i915/gem/i915_gem_pages.c

> I wanted to be rather defensive in my changes, but I agree at least 
> the
> case in arch_phys_wc_add() might want to be changed, too.
 I think your approach needs to be more aggressive so it will fix
 all the known false negatives introduced by bdd8b6c98239
 such as the one in i915_gem_object_pin_map().

 I looked at Jan's approach and I think it would fix the issue
 with my Haswell as long as I don't use the nopat option. I
 really don't have a strong opinion on that question, but I
 think the nopat option as a Linux kernel option, as opposed
 to a hypervisor option, should only affect the kernel, and
 if the hypervisor provides the pat feature, then the kernel
 should not override that,
>>> Hmm, why would the kernel not be allowed to override that? Such
>>> an override would affect only the single domain where the
>>> kernel runs; other domains could take their own decisions.
>>>
>>> Also, for the sake of completeness: "nopat" used when running on
>>> bare metal has the same bad effect on system boot, so there
>>> pretty clearly is an error cleanup issue in the i915 driver. But
>>> that's orthogonal, and I expect the maintainers may not even care
>>> (but tell us "don't do that then").
> 
> Actually I just did a test with the last official Debian kernel
> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
> applied. In fact, the nopat option does *not* break the i915 driver
> in 5.16. That is, with the nopat option, the i915 driver loads
> normally on both the bare metal and on the Xen hypervisor.
> That means your presumption (and the presumption of
> the author of bdd8b6c98239) that the "nopat" option was
> being observed by the i915 driver is incorrect. Setting "nopat"
> had no effect on my system with Linux 5.16. So after doing these
> tests, I am against the aggressive approach of breaking the i915
> driver with the "nopat" option because prior to bdd8b6c98239,
> nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).

Jan

Re: [PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART

2022-05-20 Thread Jan Beulich

On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote:
> This is purely based on the spec:
> - Intel 500 Series PCH: 635218-006
> - Intel 600 Series PCH: 691222-001, 648364-003
> 
> This is tested only on TGL-LP added initially, but according to the
> spec, they should behave the same.
> 
> Signed-off-by: Marek Marczykowski-Górecki 
> Acked-by: Andrew Cooper 

Oops? Commit 1f0b1f5cce9d.

Jan

Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm

2022-05-20 Thread Julien Grall

Hi Wei,

On 19/05/2022 03:36, Wei Chen wrote:

-Original Message-
From: Jan Beulich 
Sent: 2022年5月18日 21:05
To: Wei Chen 
Cc: nd ; Stefano Stabellini ; Julien
Grall ; Bertrand Marquis ;
Volodymyr Babchuk ; Andrew Cooper
; Roger Pau Monné ; Wei
Liu ; Jiamei Xie ; xen-
de...@lists.xenproject.org
Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm

On 11.05.2022 03:46, Wei Chen wrote:

x86 is using compiler feature testing to decide EFI build
enable or not. When EFI build is disabled, x86 will use an
efi/stub.c file to replace efi/runtime.c for build objects.
Following this idea, we introduce a stub file for Arm, but
use CONFIG_ARM_EFI to decide EFI build enable or not.

And the most functions in x86 EFI stub.c can be reused for
other architectures, like Arm. So we move them to common
and keep the x86 specific function in x86/efi/stub.c.

To avoid the symbol link conflict error when linking common
stub files to x86/efi. We add a regular file check in efi
stub files' link script. Depends on this check we can bypass
the link behaviors for existed stub files in x86/efi.

As there is no Arm specific EFI stub function for Arm in
current stage, Arm still can use the existed symbol link
method for EFI stub files.

Wouldn't it be better to mandate that every arch has its stub.c,
and in the Arm one all you'd do (for now) is #include the common
one? (But see also below.)

Personally, I don't like to include a C file into another C file.

Same here. I know we already use it in EFI but I am not in favor to add 
more of them.

Cheers,

--
Julien Grall

Re: [PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff

2022-05-20 Thread Jan Beulich

On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote:
> Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared
> by the PCI Local Bus Specification Revision 3.0 (from 2004) as
> "unknown"/"no connection". Fallback to poll mode in this case.
> The 0xff handling is x86-specific, the surrounding code is guarded with
> CONFIG_X86 anyway.
> 
> Signed-off-by: Marek Marczykowski-Górecki 
> Reviewed-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 

> Changes in v6:
>  - wrap the check in additional CONFIG_X86, with appropriate comment

Thanks, albeit ...

> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t skip_amt, 
> unsigned int idx)
>  pci_conf_read8(PCI_SBDF(0, b, d, f),
> PCI_INTERRUPT_LINE) : 0;
>  
> +#ifdef CONFIG_X86
> +/* PCI Local Bus Specification Revision 3.0 defines 0xff 
> value
> + * as special only for X86 */

... I'll have to remember to correct the style of the comment while
committing ...

> +if ( uart->irq == 0xff )
> +uart->irq = 0;
> +#endif
> +if ( !uart->irq )
> +printk(XENLOG_INFO
> +   "ns16550: %pp no legacy IRQ, using poll mode\n",

... and perhaps insert another colon after %pp here.

Jan

> +   &PCI_SBDF(0, b, d, f));
> +
>  return 0;
>  }
>  }

[ovmf test] 170595: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170595 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170595/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   80 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1115 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days3 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

[libvirt test] 170589: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170589 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170589/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  1bd24e79bee1800e7191863cc4617b12fe614dbe
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  679 days
Failing since151818  2020-07-11 04:18:52 Z  678 days  660 attempts
Testing same since   170589  2022-05-20 04:20:23 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Amneesh Singh 
  Andika Triwidada 
  Andrea Bolognani 
  Andrew Melnychenko 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Claudio Fontana 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Haonan Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  John Levon 
  John Levon 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Kim InSoo 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Lena Voytek 
  Liang Yan 
  Liang Yan 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  luzhipeng 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Martin Pitt 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Maxim Nestratov 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Moteen Shah 
  Moteen Shah 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Nikolay Shirokovskiy 
  Nikolay Shirokovskiy 
  Niteesh Dubey 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Paolo Bonzini 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano

Re: [PATCH v3] xen/build: Add cppcheck and cppcheck-html make rules

2022-05-20 Thread Julien Grall


Hi Bertrand,

Since this patch has been committed, I get the following message for on 
every build (make -C xen):


which: no cppcheck in ([...])
/bin/sh: cppcheck: command not found

I wasn't expecting the build system to check every time. I think...

On 26/04/2022 13:38, Bertrand Marquis wrote:

+cppcheck-version:
+ifeq ($(shell which $(CPPCHECK)),)


... this is because ifeq is similar to pre-preprocessing. So it always 
get executed.


So far, the message is harmlesss (I can continue to build) but I think 
the check should really only happen when the target cppcheck-version is 
executed. So can you have a look to rework the rule?


Cheers,

--
Julien Grall

Re: [PATCH v3] xen/build: Add cppcheck and cppcheck-html make rules

2022-05-20 Thread Bertrand Marquis

Hi Julien,

> On 20 May 2022, at 10:56, Julien Grall  wrote:
> 
> Hi Bertrand,
> 
> Since this patch has been committed, I get the following message for on every 
> build (make -C xen):
> 
> which: no cppcheck in ([...])
> /bin/sh: cppcheck: command not found
> 
> I wasn't expecting the build system to check every time. I think...
> 
> On 26/04/2022 13:38, Bertrand Marquis wrote:
>> +cppcheck-version:
>> +ifeq ($(shell which $(CPPCHECK)),)
> 
> ... this is because ifeq is similar to pre-preprocessing. So it always get 
> executed.
> 
> So far, the message is harmlesss (I can continue to build) but I think the 
> check should really only happen when the target cppcheck-version is executed. 
> So can you have a look to rework the rule?

Sure I will push something to fix that.

Thanks for the notice

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall
>

Re: [PATCH 2/2] x86/vmx: implement Notify VM Exit

2022-05-20 Thread Roger Pau Monné

On Thu, May 19, 2022 at 04:45:20PM +0200, Roger Pau Monné wrote:
> On Thu, May 19, 2022 at 12:10:24AM +, Andrew Cooper wrote:
> > On 17/05/2022 14:21, Roger Pau Monne wrote:
> > > @@ -1333,6 +1338,19 @@ static int construct_vmcs(struct vcpu *v)
> > >  rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
> > >   VMX_MSR_GUEST_LOADONLY);
> > >  
> > > +if ( cpu_has_vmx_notify_vm_exiting && vm_notify_window >= 0 )
> > > +{
> > > +__vmwrite(NOTIFY_WINDOW, vm_notify_window);
> > > +/*
> > > + * Disable #AC and #DB interception: by using VM Notify Xen is
> > > + * guaranteed to get a VM exit even if the guest manages to lock 
> > > the
> > > + * CPU.
> > > + */
> > > +v->arch.hvm.vmx.exception_bitmap &= ~((1U << TRAP_debug) |
> > > +  (1U << 
> > > TRAP_alignment_check));
> > > +vmx_update_exception_bitmap(v);
> > 
> > IIRC, it's not quite this easy.  There are conditions, e.g. attaching
> > gdbsx, where #DB interception wants turning on/off dynamically, and the
> > logic got simplified to nothing following XSA-156, so will need
> > reintroducing.
> > 
> > AMD Milan (Zen3) actually has NoNestedDataBp in CPUID.8021.eax[0]
> > which allows us to not intercept #DB, so perhaps that might offer an
> > easier way of adjusting the interception logic.  (Or maybe not.  I can't
> > remember).
> 
> OK, will look into it.

So after taking a look, I think we need to modify vmx_update_debug_state() so 
it's:

void vmx_update_debug_state(struct vcpu *v)
{
unsigned int mask = 1u << TRAP_int3;

if ( v->arch.hvm.vmx.secondary_exec_control &
 SECONDARY_EXEC_NOTIFY_VM_EXITING )
/*
 * Only allow toggling TRAP_debug if notify VM exit is enabled, as
 * unconditionally setting TRAP_debug is part of the XSA-156 fix.
 */
mask |= 1u << TRAP_debug;

if ( v->arch.hvm.debug_state_latch )
v->arch.hvm.vmx.exception_bitmap |= mask;
else
v->arch.hvm.vmx.exception_bitmap &= ~mask;

[...]

I'm however confused by the usage of cpu_has_monitor_trap_flag
previous to XSA-156, which was:

void vmx_update_debug_state(struct vcpu *v)
{
unsigned long mask;

mask = 1u << TRAP_int3;
if ( !cpu_has_monitor_trap_flag )
mask |= 1u << TRAP_debug;

Was it fine to not set TRAP_debug only if cpu_has_monitor_trap_flag
is supported by the CPU? (even if not currently set on
secondary_exec_control)?

Thanks, Roger.

[xen-unstable test] 170586: tolerable FAIL

2022-05-20 Thread osstest service owner

flight 170586 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170586/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-amd64 17 guest-saverestore.2 fail in 170563 
pass in 170586
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-install  fail pass in 170563

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170563
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 170563
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170563
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170563
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 170563
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 170563
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 170563
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 170563
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170563
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 170563
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 170563
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170563
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pas

Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables

2022-05-20 Thread Roger Pau Monné

On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote:
> On 06.05.2022 15:25, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote:
> >> --- /dev/null
> >> +++ b/xen/arch/x86/include/asm/pt-contig-markers.h
> >> @@ -0,0 +1,105 @@
> >> +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H
> >> +#define __ASM_X86_PT_CONTIG_MARKERS_H
> >> +
> >> +/*
> >> + * Short of having function templates in C, the function defined below is
> >> + * intended to be used by multiple parties interested in recording the
> >> + * degree of contiguity in mappings by a single page table.
> >> + *
> >> + * Scheme: Every entry records the order of contiguous successive entries,
> >> + * up to the maximum order covered by that entry (which is the number of
> >> + * clear low bits in its index, with entry 0 being the exception using
> >> + * the base-2 logarithm of the number of entries in a single page table).
> >> + * While a few entries need touching upon update, knowing whether the
> >> + * table is fully contiguous (and can hence be replaced by a higher level
> >> + * leaf entry) is then possible by simply looking at entry 0's marker.
> >> + *
> >> + * Prereqs:
> >> + * - CONTIG_MASK needs to be #define-d, to a value having at least 4
> >> + *   contiguous bits (ignored by hardware), before including this file,
> >> + * - page tables to be passed here need to be initialized with correct
> >> + *   markers.
> > 
> > Not sure it's very relevant, but might we worth adding that:
> > 
> > - Null entries must have the PTE zeroed except for the CONTIG_MASK
> >   region in order to be considered as inactive.
> 
> NP, I've added an item along these lines.
> 
> >> +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx,
> >> + unsigned int level, enum PTE_kind 
> >> kind)
> >> +{
> >> +unsigned int b, i = idx;
> >> +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT;
> >> +
> >> +ASSERT(idx < CONTIG_NR);
> >> +ASSERT(!(pt[idx] & CONTIG_MASK));
> >> +
> >> +/* Step 1: Reduce markers in lower numbered entries. */
> >> +while ( i )
> >> +{
> >> +b = find_first_set_bit(i);
> >> +i &= ~(1U << b);
> >> +if ( GET_MARKER(pt[i]) > b )
> >> +SET_MARKER(pt[i], b);
> > 
> > Can't you exit early when you find an entry that already has the
> > to-be-set contiguous marker <= b, as lower numbered entries will then
> > also be <= b'?
> > 
> > Ie:
> > 
> > if ( GET_MARKER(pt[i]) <= b )
> > break;
> > else
> > SET_MARKER(pt[i], b);
> 
> Almost - I think it would need to be 
> 
> if ( GET_MARKER(pt[i]) < b )
> break;
> if ( GET_MARKER(pt[i]) > b )
> SET_MARKER(pt[i], b);

I guess I'm slightly confused, but if marker at i is <= b, then all
following markers will also be <=, and hence could be skipped?

Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b.

FWIW, you could even do:

if ( GET_MARKER(pt[i]) <= b )
break;
SET_MARKER(pt[i], b);

Which would keep the conditionals to 1 like it currently is.

> 
> or, accepting redundant updates, 
> 
> if ( GET_MARKER(pt[i]) < b )
> break;
> SET_MARKER(pt[i], b);
> 
> . Neither the redundant updates nor the extra (easily mis-predicted)
> conditional looked very appealing to me, but I guess I could change
> this if you are convinced that's better than continuing a loop with
> at most 9 (typically less) iterations.

Well, I think I at least partly understood the logic.  Not sure
whether it's worth adding the conditional or just assuming that
continuing the loop is going to be cheaper.  Might be worth adding a
comment that we choose to explicitly not add an extra conditional to
check for early exit, because we assume that to be more expensive than
just continuing.

Thanks, Roger.

[ovmf test] 170597: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170597 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170597/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   80 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1116 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days4 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH v6 2/2] ns16550: Add more device IDs for Intel LPSS UART

2022-05-20 Thread Marek Marczykowski-Górecki

On Fri, May 20, 2022 at 11:42:37AM +0200, Jan Beulich wrote:
> On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote:
> > This is purely based on the spec:
> > - Intel 500 Series PCH: 635218-006
> > - Intel 600 Series PCH: 691222-001, 648364-003
> > 
> > This is tested only on TGL-LP added initially, but according to the
> > spec, they should behave the same.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > Acked-by: Andrew Cooper 
> 
> Oops? Commit 1f0b1f5cce9d.

Right, I haven't fetch new master. No changes here for several
iterations, so everything is fine.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

Re: [PATCH v6 1/2] ns16550: use poll mode if INTERRUPT_LINE is 0xff

2022-05-20 Thread Marek Marczykowski-Górecki

On Fri, May 20, 2022 at 11:47:02AM +0200, Jan Beulich wrote:
> On 20.05.2022 10:53, Marek Marczykowski-Górecki wrote:
> > Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared
> > by the PCI Local Bus Specification Revision 3.0 (from 2004) as
> > "unknown"/"no connection". Fallback to poll mode in this case.
> > The 0xff handling is x86-specific, the surrounding code is guarded with
> > CONFIG_X86 anyway.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> > Reviewed-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 

Thanks.

> > Changes in v6:
> >  - wrap the check in additional CONFIG_X86, with appropriate comment
> 
> Thanks, albeit ...
> 
> > --- a/xen/drivers/char/ns16550.c
> > +++ b/xen/drivers/char/ns16550.c
> > @@ -1238,6 +1238,17 @@ pci_uart_config(struct ns16550 *uart, bool_t 
> > skip_amt, unsigned int idx)
> >  pci_conf_read8(PCI_SBDF(0, b, d, f),
> > PCI_INTERRUPT_LINE) : 0;
> >  
> > +#ifdef CONFIG_X86
> > +/* PCI Local Bus Specification Revision 3.0 defines 0xff 
> > value
> > + * as special only for X86 */
> 
> ... I'll have to remember to correct the style of the comment while
> committing ...
> 
> > +if ( uart->irq == 0xff )
> > +uart->irq = 0;
> > +#endif
> > +if ( !uart->irq )
> > +printk(XENLOG_INFO
> > +   "ns16550: %pp no legacy IRQ, using poll mode\n",
> 
> ... and perhaps insert another colon after %pp here.

Both fine with me, thanks!

> Jan
> 
> > +   &PCI_SBDF(0, b, d, f));
> > +
> >  return 0;
> >  }
> >  }
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature

Re: [PATCH v4 17/21] AMD/IOMMU: replace all-contiguous page tables by superpage mappings

2022-05-20 Thread Roger Pau Monné

On Wed, May 18, 2022 at 12:40:59PM +0200, Jan Beulich wrote:
> On 10.05.2022 17:31, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:43:16AM +0200, Jan Beulich wrote:
> >> @@ -94,11 +95,15 @@ static union amd_iommu_pte set_iommu_pte
> >>   old.iw != iw || old.ir != ir )
> >>  {
> >>  set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> >> -pt_update_contig_markers(&table->raw, pfn_to_pde_idx(dfn, level),
> >> - level, PTE_kind_leaf);
> >> +*contig = pt_update_contig_markers(&table->raw,
> >> +   pfn_to_pde_idx(dfn, level),
> >> +   level, PTE_kind_leaf);
> >>  }
> >>  else
> >> +{
> >>  old.pr = false; /* signal "no change" to the caller */
> >> +*contig = false;
> > 
> > So we assume that any caller getting contig == true must have acted
> > and coalesced the page table?
> 
> Yes, except that I wouldn't use "must", but "would". It's not a
> requirement after all, functionality-wise all will be fine without
> re-coalescing.
> 
> > Might be worth a comment, to note that the function assumes that a
> > previous return of contig == true will have coalesced the page table
> > and hence a "no change" PTE write is not expected to happen on a
> > contig page table.
> 
> I'm not convinced, as there's effectively only one caller,
> amd_iommu_map_page(). I also don't see why "no change" would be a
> problem. "No change" can't result in a fully contiguous page table
> if the page table wasn't fully contiguous already before (at which
> point it would have been replaced by a superpage).

Right, I agree, it's just that I would have preferred the result from
set_iommu_ptes_present() to be consistent, ie: repeated calls to it
using the same PTE should set contig to the same value.  Anyway,
that's not relevant to any current callers, so:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

Re: [PATCH v2 1/3] tools/xl: Sort create command options

2022-05-20 Thread Anthony PERARD

On Tue, Apr 19, 2022 at 06:56:03PM -0700, Elliott Mitchell wrote:
> Hopefully simplify future changes by sorting options lists for
> `xl create`.  While at it, declare the options list constant.
> 
> Signed-off-by: Elliott Mitchell 

Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD

Re: [PATCH v4 18/21] VT-d: replace all-contiguous page tables by superpage mappings

2022-05-20 Thread Roger Pau Monné

On Wed, May 18, 2022 at 12:44:06PM +0200, Jan Beulich wrote:
> On 11.05.2022 13:08, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:43:45AM +0200, Jan Beulich wrote:
> >> When a page table ends up with all contiguous entries (including all
> >> identical attributes), it can be replaced by a superpage entry at the
> >> next higher level. The page table itself can then be scheduled for
> >> freeing.
> >>
> >> The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
> >> for whenever we (and obviously hardware) start supporting 512G mappings.
> >>
> >> Signed-off-by: Jan Beulich 
> >> Reviewed-by: Kevin Tian 
> > 
> > Like on the AMD side, I wonder whether you can get away with only
> 
> FTAOD I take it you mean "like on the all-empty side", as on AMD we
> don't need to do any cache flushing?

Heh, yes, sorry.

> > doing a cache flush for the last (highest level) PTE, as the lower
> > ones won't be reachable anyway, as the page-table is freed.
> 
> But that freeing will happen only later, with a TLB flush in between.
> Until then we would better make sure the IOMMU sees what was written,
> even if it reading a stale value _should_ be benign.

Hm, but when doing the TLB flush the paging structures will already be
fully updated, and the top level visible entry will have it's cache
flushed, so the lower ones would never be reachable AFAICT.

Thanks, Roger.

Re: [PATCH v4 19/21] IOMMU/x86: add perf counters for page table splitting / coalescing

2022-05-20 Thread Roger Pau Monné

On Wed, May 18, 2022 at 01:39:02PM +0200, Jan Beulich wrote:
> On 11.05.2022 15:48, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:44:11AM +0200, Jan Beulich wrote:
> >> Signed-off-by: Jan Beulich 
> >> Reviewed-by: Kevin tian 
> > 
> > Reviewed-by: Roger Pau Monné 
> 
> Thanks.
> 
> > Would be helpful to also have those per-guest I think.
> 
> Perhaps, but we don't have per-guest counter infrastructure, do we?

No, I don't think so?  Would be nice, but I don't see us doing it any
time soon.

Thanks, Roger.

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Roger Pau Monné

On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
> On 06.05.2022 13:16, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> >> ---
> >> An alternative to the ASSERT()s added to set_iommu_ptes_present() would
> >> be to make the function less general-purpose; it's used in a single
> >> place only after all (i.e. it might as well be folded into its only
> >> caller).
> > 
> > I would think adding a comment that the function requires the PDE to
> > be empty would be good.
> 
> But that's not the case - what the function expects to be clear is
> what is being ASSERT()ed.
> 
> >  Also given the current usage we could drop
> > the nr_ptes parameter and just name the function fill_pde() or
> > similar.
> 
> Right, but that would want to be a separate change.
> 
> >> --- a/xen/drivers/passthrough/amd/iommu_map.c
> >> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> >> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
> >>  
> >>  while ( nr_ptes-- )
> >>  {
> >> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> >> +ASSERT(!pde->next_level);
> >> +ASSERT(!pde->u);
> >> +
> >> +if ( pde > table )
> >> +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> >> +else
> >> +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
> > 
> > I think PAGETABLE_ORDER would be clearer here.
> 
> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere
> in IOMMU code afaics.

Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
sure what's the rule for declaring that PAGE_SHIFT is fine to use in
IOMMU code  but not PAGETABLE_ORDER.

Thanks, Roger.

[PATCH] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Bertrand Marquis

If cppcheck is not present, the following warning appears during build:
which: no cppcheck in ([...])
/bin/sh: cppcheck: command not found

Fix this by hiding the error output from which and only try to execute
cppcheck --version if we have a cppcheck.

Reported-by: Julien Grall 
Signed-off-by: Bertrand Marquis 
---
 xen/Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/Makefile b/xen/Makefile
index 15388703bc..f42be3d0ab 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
$(objtree)/include/generated/autoconf.h
$(call if_changed,cppcheck_xml)
 
 cppcheck-version:
-ifeq ($(shell which $(CPPCHECK)),)
+ifeq ($(shell which $(CPPCHECK) 2> /dev/null),)
$(error Cannot find cppcheck executable: $(CPPCHECK))
-endif
+else
 ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
$(error Please upgrade your cppcheck to version 2.7 or greater)
 endif
+endif
 
 # Put this in generated headers this way it is cleaned by include/Makefile
 $(objtree)/include/generated/compiler-def.h:
-- 
2.25.1

Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables

2022-05-20 Thread Jan Beulich

On 20.05.2022 12:22, Roger Pau Monné wrote:
> On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote:
>> On 06.05.2022 15:25, Roger Pau Monné wrote:
>>> On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote:
 --- /dev/null
 +++ b/xen/arch/x86/include/asm/pt-contig-markers.h
 @@ -0,0 +1,105 @@
 +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H
 +#define __ASM_X86_PT_CONTIG_MARKERS_H
 +
 +/*
 + * Short of having function templates in C, the function defined below is
 + * intended to be used by multiple parties interested in recording the
 + * degree of contiguity in mappings by a single page table.
 + *
 + * Scheme: Every entry records the order of contiguous successive entries,
 + * up to the maximum order covered by that entry (which is the number of
 + * clear low bits in its index, with entry 0 being the exception using
 + * the base-2 logarithm of the number of entries in a single page table).
 + * While a few entries need touching upon update, knowing whether the
 + * table is fully contiguous (and can hence be replaced by a higher level
 + * leaf entry) is then possible by simply looking at entry 0's marker.
 + *
 + * Prereqs:
 + * - CONTIG_MASK needs to be #define-d, to a value having at least 4
 + *   contiguous bits (ignored by hardware), before including this file,
 + * - page tables to be passed here need to be initialized with correct
 + *   markers.
>>>
>>> Not sure it's very relevant, but might we worth adding that:
>>>
>>> - Null entries must have the PTE zeroed except for the CONTIG_MASK
>>>   region in order to be considered as inactive.
>>
>> NP, I've added an item along these lines.
>>
 +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx,
 + unsigned int level, enum PTE_kind 
 kind)
 +{
 +unsigned int b, i = idx;
 +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT;
 +
 +ASSERT(idx < CONTIG_NR);
 +ASSERT(!(pt[idx] & CONTIG_MASK));
 +
 +/* Step 1: Reduce markers in lower numbered entries. */
 +while ( i )
 +{
 +b = find_first_set_bit(i);
 +i &= ~(1U << b);
 +if ( GET_MARKER(pt[i]) > b )
 +SET_MARKER(pt[i], b);
>>>
>>> Can't you exit early when you find an entry that already has the
>>> to-be-set contiguous marker <= b, as lower numbered entries will then
>>> also be <= b'?
>>>
>>> Ie:
>>>
>>> if ( GET_MARKER(pt[i]) <= b )
>>> break;
>>> else
>>> SET_MARKER(pt[i], b);
>>
>> Almost - I think it would need to be 
>>
>> if ( GET_MARKER(pt[i]) < b )
>> break;
>> if ( GET_MARKER(pt[i]) > b )
>> SET_MARKER(pt[i], b);
> 
> I guess I'm slightly confused, but if marker at i is <= b, then all
> following markers will also be <=, and hence could be skipped?

Your use of "following" is ambiguous here, because the iteration
moves downwards as far as PTEs inspected are concerned (and it's
b which grows from one iteration to the next). But yes, I think I
agree now that ...

> Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b.

... this isn't needed. At which point ...

> FWIW, you could even do:
> 
> if ( GET_MARKER(pt[i]) <= b )
> break;
> SET_MARKER(pt[i], b);
> 
> Which would keep the conditionals to 1 like it currently is.
> 
>>
>> or, accepting redundant updates, 
>>
>> if ( GET_MARKER(pt[i]) < b )
>> break;
>> SET_MARKER(pt[i], b);
>>
>> . Neither the redundant updates nor the extra (easily mis-predicted)
>> conditional looked very appealing to me, but I guess I could change
>> this if you are convinced that's better than continuing a loop with
>> at most 9 (typically less) iterations.
> 
> Well, I think I at least partly understood the logic.  Not sure
> whether it's worth adding the conditional or just assuming that
> continuing the loop is going to be cheaper.  Might be worth adding a
> comment that we choose to explicitly not add an extra conditional to
> check for early exit, because we assume that to be more expensive than
> just continuing.

... this resolves without further action.

Jan

Re: [PATCH] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Jan Beulich

On 20.05.2022 12:49, Bertrand Marquis wrote:
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
> $(objtree)/include/generated/autoconf.h
>   $(call if_changed,cppcheck_xml)
>  
>  cppcheck-version:
> -ifeq ($(shell which $(CPPCHECK)),)
> +ifeq ($(shell which $(CPPCHECK) 2> /dev/null),)
>   $(error Cannot find cppcheck executable: $(CPPCHECK))
> -endif
> +else
>  ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
>   $(error Please upgrade your cppcheck to version 2.7 or greater)
>  endif
> +endif

While I agree this will silence things, I still would prefer if you
switched to $(if ...) inside the rule - there's no need to invoke the
shell while parsing the makefile. Anything like this only needlessly
slows down the build. Not by much, but it sums up.

Jan

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Jan Beulich

On 20.05.2022 12:47, Roger Pau Monné wrote:
> On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
>> On 06.05.2022 13:16, Roger Pau Monné wrote:
>>> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
 --- a/xen/drivers/passthrough/amd/iommu_map.c
 +++ b/xen/drivers/passthrough/amd/iommu_map.c
 @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
  
  while ( nr_ptes-- )
  {
 -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
 +ASSERT(!pde->next_level);
 +ASSERT(!pde->u);
 +
 +if ( pde > table )
 +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
 +else
 +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
>>>
>>> I think PAGETABLE_ORDER would be clearer here.
>>
>> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere
>> in IOMMU code afaics.
> 
> Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
> sure what's the rule for declaring that PAGE_SHIFT is fine to use in
> IOMMU code  but not PAGETABLE_ORDER.

Hmm, yes and no. But for consistency with other IOMMU code I may want
to switch to PAGE_SHIFT_4K.

Jan

Re: [PATCH v4 16/21] VT-d: free all-empty page tables

2022-05-20 Thread Roger Pau Monné

On Wed, May 18, 2022 at 12:26:03PM +0200, Jan Beulich wrote:
> On 10.05.2022 16:30, Roger Pau Monné wrote:
> > On Mon, Apr 25, 2022 at 10:42:50AM +0200, Jan Beulich wrote:
> >> @@ -837,9 +843,31 @@ static int dma_pte_clear_one(struct doma
> >>  
> >>  old = *pte;
> >>  dma_clear_pte(*pte);
> >> +iommu_sync_cache(pte, sizeof(*pte));
> >> +
> >> +while ( pt_update_contig_markers(&page->val,
> >> + address_level_offset(addr, level),
> >> + level, PTE_kind_null) &&
> >> +++level < min_pt_levels )
> >> +{
> >> +struct page_info *pg = maddr_to_page(pg_maddr);
> >> +
> >> +unmap_vtd_domain_page(page);
> >> +
> >> +pg_maddr = addr_to_dma_page_maddr(domain, addr, level, 
> >> flush_flags,
> >> +  false);
> >> +BUG_ON(pg_maddr < PAGE_SIZE);
> >> +
> >> +page = map_vtd_domain_page(pg_maddr);
> >> +pte = &page[address_level_offset(addr, level)];
> >> +dma_clear_pte(*pte);
> >> +iommu_sync_cache(pte, sizeof(*pte));
> >> +
> >> +*flush_flags |= IOMMU_FLUSHF_all;
> >> +iommu_queue_free_pgtable(hd, pg);
> >> +}
> > 
> > I think I'm setting myself for trouble, but do we need to sync cache
> > the lower lever entries if higher level ones are to be changed.
> > 
> > IOW, would it be fine to just flush the highest level modified PTE?
> > As the lower lever ones won't be reachable anyway.
> 
> I definitely want to err on the safe side here. If later we can
> prove that some cache flush is unneeded, I'd be happy to see it
> go away.

Hm, so it's not only about adding more cache flushes, but moving them
inside of the locked region: previously the only cache flush was done
outside of the locked region.

I guess I can't convince myself why we would need to flush cache of
entries that are to be removed, and that also point to pages scheduled
to be freed.

> >> @@ -2182,8 +2210,21 @@ static int __must_check cf_check intel_i
> >>  }
> >>  
> >>  *pte = new;
> >> -
> >>  iommu_sync_cache(pte, sizeof(struct dma_pte));
> >> +
> >> +/*
> >> + * While the (ab)use of PTE_kind_table here allows to save some work 
> >> in
> >> + * the function, the main motivation for it is that it avoids a so far
> >> + * unexplained hang during boot (while preparing Dom0) on a Westmere
> >> + * based laptop.
> >> + */
> >> +pt_update_contig_markers(&page->val,
> >> + address_level_offset(dfn_to_daddr(dfn), 
> >> level),
> >> + level,
> >> + (hd->platform_ops->page_sizes &
> >> +  (1UL << level_to_offset_bits(level + 1))
> >> +  ? PTE_kind_leaf : PTE_kind_table));
> > 
> > So this works because on what we believe to be affected models the
> > only supported page sizes are 4K?
> 
> Yes.
> 
> > Do we want to do the same with AMD if we don't allow 512G super pages?
> 
> Why? They don't have a similar flaw.

So the question was mostly whether we should also avoid the
pt_update_contig_markers for 1G entries, because we won't coalesce
them into a 512G anyway.  IOW avoid the overhead of updating the
contig markers if we know that the resulting super-page is not
supported by ->page_sizes.

Thanks, Roger.

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Jan Beulich

On 20.05.2022 13:11, Jan Beulich wrote:
> On 20.05.2022 12:47, Roger Pau Monné wrote:
>> On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
>>> On 06.05.2022 13:16, Roger Pau Monné wrote:
 On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
>  
>  while ( nr_ptes-- )
>  {
> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> +ASSERT(!pde->next_level);
> +ASSERT(!pde->u);
> +
> +if ( pde > table )
> +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> +else
> +ASSERT(pde->ign0 == PAGE_SHIFT - 3);

 I think PAGETABLE_ORDER would be clearer here.
>>>
>>> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere
>>> in IOMMU code afaics.
>>
>> Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
>> sure what's the rule for declaring that PAGE_SHIFT is fine to use in
>> IOMMU code  but not PAGETABLE_ORDER.
> 
> Hmm, yes and no. But for consistency with other IOMMU code I may want
> to switch to PAGE_SHIFT_4K.

Except that, with the plan to re-use pt_update_contig_markers() for CPU-
side re-coalescing, there I'd prefer to stick to PAGE_SHIFT.

Jan

Re: [PATCH 24/30] panic: Refactor the panic path

2022-05-20 Thread Guilherme G. Piccoli

On 19/05/2022 20:45, Baoquan He wrote:
> [...]
>> I really appreciate the summary skill you have, to convert complex
>> problems in very clear and concise ideas. Thanks for that, very useful!
>> I agree with what was summarized above.
> 
> I want to say the similar words to Petr's reviewing comment when I went
> through the patches and traced each reviewing sub-thread to try to
> catch up. Petr has reivewed this series so carefully and given many
> comments I want to ack immediately.
> 
> I agree with most of the suggestions from Petr to this patch, except of
> one tiny concern, please see below inline comment.

Hi Baoquan, thanks! I'm glad you're also reviewing that =)

> [...]
> 
> I like the proposed skeleton of panic() and code style suggested by
> Petr very much. About panic_prefer_crash_dump which might need be added,
> I hope it has a default value true. This makes crash_dump execute at
> first by default just as before, unless people specify
> panic_prefer_crash_dump=0|n|off to disable it. Otherwise we need add
> panic_prefer_crash_dump=1 in kernel and in our distros to enable kdump,
> this is inconsistent with the old behaviour.

I'd like to understand better why the crash_kexec() must always be the
first thing in your use case. If we keep that behavior, we'll see all
sorts of workarounds - see the last patches of this series, Hyper-V and
PowerPC folks hardcoded "crash_kexec_post_notifiers" in order to force
execution of their relevant notifiers (like the vmbus disconnect,
specially in arm64 that has no custom machine_crash_shutdown, or the
fadump case in ppc). This led to more risk in kdump.

The thing is: with the notifiers' split, we tried to keep only the most
relevant/necessary stuff in this first list, things that ultimately
should improve kdump reliability or if not, at least not break it. My
feeling is that, with this series, we should change the idea/concept
that kdump must run first nevertheless, not matter what. We're here
trying to accommodate the antagonistic goals of hypervisors that need
some clean-up (even for kdump to work) VS. kdump users, that wish a
"pristine" system reboot ASAP after the crash.

Cheers,

Guilherme

Re: [PATCH v4 14/21] x86: introduce helper for recording degree of contiguity in page tables

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 12:59:55PM +0200, Jan Beulich wrote:
> On 20.05.2022 12:22, Roger Pau Monné wrote:
> > On Wed, May 18, 2022 at 12:06:29PM +0200, Jan Beulich wrote:
> >> On 06.05.2022 15:25, Roger Pau Monné wrote:
> >>> On Mon, Apr 25, 2022 at 10:41:23AM +0200, Jan Beulich wrote:
>  --- /dev/null
>  +++ b/xen/arch/x86/include/asm/pt-contig-markers.h
>  @@ -0,0 +1,105 @@
>  +#ifndef __ASM_X86_PT_CONTIG_MARKERS_H
>  +#define __ASM_X86_PT_CONTIG_MARKERS_H
>  +
>  +/*
>  + * Short of having function templates in C, the function defined below 
>  is
>  + * intended to be used by multiple parties interested in recording the
>  + * degree of contiguity in mappings by a single page table.
>  + *
>  + * Scheme: Every entry records the order of contiguous successive 
>  entries,
>  + * up to the maximum order covered by that entry (which is the number of
>  + * clear low bits in its index, with entry 0 being the exception using
>  + * the base-2 logarithm of the number of entries in a single page 
>  table).
>  + * While a few entries need touching upon update, knowing whether the
>  + * table is fully contiguous (and can hence be replaced by a higher 
>  level
>  + * leaf entry) is then possible by simply looking at entry 0's marker.
>  + *
>  + * Prereqs:
>  + * - CONTIG_MASK needs to be #define-d, to a value having at least 4
>  + *   contiguous bits (ignored by hardware), before including this file,
>  + * - page tables to be passed here need to be initialized with correct
>  + *   markers.
> >>>
> >>> Not sure it's very relevant, but might we worth adding that:
> >>>
> >>> - Null entries must have the PTE zeroed except for the CONTIG_MASK
> >>>   region in order to be considered as inactive.
> >>
> >> NP, I've added an item along these lines.
> >>
>  +static bool pt_update_contig_markers(uint64_t *pt, unsigned int idx,
>  + unsigned int level, enum PTE_kind 
>  kind)
>  +{
>  +unsigned int b, i = idx;
>  +unsigned int shift = (level - 1) * CONTIG_LEVEL_SHIFT + PAGE_SHIFT;
>  +
>  +ASSERT(idx < CONTIG_NR);
>  +ASSERT(!(pt[idx] & CONTIG_MASK));
>  +
>  +/* Step 1: Reduce markers in lower numbered entries. */
>  +while ( i )
>  +{
>  +b = find_first_set_bit(i);
>  +i &= ~(1U << b);
>  +if ( GET_MARKER(pt[i]) > b )
>  +SET_MARKER(pt[i], b);
> >>>
> >>> Can't you exit early when you find an entry that already has the
> >>> to-be-set contiguous marker <= b, as lower numbered entries will then
> >>> also be <= b'?
> >>>
> >>> Ie:
> >>>
> >>> if ( GET_MARKER(pt[i]) <= b )
> >>> break;
> >>> else
> >>> SET_MARKER(pt[i], b);
> >>
> >> Almost - I think it would need to be 
> >>
> >> if ( GET_MARKER(pt[i]) < b )
> >> break;
> >> if ( GET_MARKER(pt[i]) > b )
> >> SET_MARKER(pt[i], b);
> > 
> > I guess I'm slightly confused, but if marker at i is <= b, then all
> > following markers will also be <=, and hence could be skipped?
> 
> Your use of "following" is ambiguous here, because the iteration
> moves downwards as far as PTEs inspected are concerned (and it's
> b which grows from one iteration to the next). But yes, I think I
> agree now that ...

Right, 'following' here would be the next item processed by the loop.

> > Not sure why we need to keep iterating if GET_MARKER(pt[i]) == b.
> 
> ... this isn't needed. At which point ...
> 
> > FWIW, you could even do:
> > 
> > if ( GET_MARKER(pt[i]) <= b )
> > break;
> > SET_MARKER(pt[i], b);
> > 
> > Which would keep the conditionals to 1 like it currently is.
> > 
> >>
> >> or, accepting redundant updates, 
> >>
> >> if ( GET_MARKER(pt[i]) < b )
> >> break;
> >> SET_MARKER(pt[i], b);
> >>
> >> . Neither the redundant updates nor the extra (easily mis-predicted)
> >> conditional looked very appealing to me, but I guess I could change
> >> this if you are convinced that's better than continuing a loop with
> >> at most 9 (typically less) iterations.
> > 
> > Well, I think I at least partly understood the logic.  Not sure
> > whether it's worth adding the conditional or just assuming that
> > continuing the loop is going to be cheaper.  Might be worth adding a
> > comment that we choose to explicitly not add an extra conditional to
> > check for early exit, because we assume that to be more expensive than
> > just continuing.
> 
> ... this resolves without further action.

OK, since we agree, and that was the only comment I had, you can add:

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

[ovmf test] 170598: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170598 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170598/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1117 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days5 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

[ovmf test] 170600: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170600 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170600/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1118 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days6 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Julien Grall


Hi Bertrand,

On 20/05/2022 11:49, Bertrand Marquis wrote:

If cppcheck is not present, the following warning appears during build:
which: no cppcheck in ([...])
/bin/sh: cppcheck: command not found

Fix this by hiding the error output from which and only try to execute
cppcheck --version if we have a cppcheck.

Reported-by: Julien Grall 
Signed-off-by: Bertrand Marquis 


This solves the warning so:

Acked-by: Julien Grall 

In the long term, I think using what Jan suggested would be better.

Cheers,

--
Julien Grall

[PATCH 00/16] xen/arm: mm: Remove open-coding mappings

2022-05-20 Thread Julien Grall

From: Julien Grall 

Hi all,

This series was originally sent as "xen/arm: mm: Add limited support
for superpages" [1] and finally has grown enough to remove most of
the open-coding mappings in the boot code.

This will help to:
1) Get better compliance with the Arm memory model
2) Pave the way to support other page size (64KB, 16KB)

Cheers,

[1] <20201119190751.22345-1-jul...@xen.org>

Julien Grall (15):
  xen/arm: mm: Allow other mapping size in xen_pt_update_entry()
  xen/arm: mm: Add support for the contiguous bit
  xen/arm: mm: Avoid flushing the TLBs when mapping are inserted
  xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings()
  xen/arm: mm: Re-implement early_fdt_map() using map_pages_to_xen()
  xen/arm32: mm: Re-implement setup_xenheap_mappings() using
map_pages_to_xen()
  xen/arm: mm: Allocate xen page tables in domheap rather than xenheap
  xen/arm: mm: Allow page-table allocation from the boot allocator
  xen/arm: Move fixmap definitions in a separate header
  xen/arm: mm: Clean-up the includes and order them
  xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table()
  xen/arm32: setup: Move out the code to populate the boot allocator
  xen/arm64: mm: Add memory to the boot allocator first
  xen/arm: mm: Rework setup_xenheap_mappings()
  xen/arm: mm: Re-implement setup_frame_table_mappings() with
map_pages_to_xen()

Wei Liu (1):
  xen/arm: add Persistent Map (PMAP) infrastructure

 xen/arch/arm/Kconfig|   1 +
 xen/arch/arm/acpi/lib.c |   2 +
 xen/arch/arm/include/asm/config.h   |   6 -
 xen/arch/arm/include/asm/early_printk.h |   1 +
 xen/arch/arm/include/asm/fixmap.h   |  48 +++
 xen/arch/arm/include/asm/lpae.h |   8 +
 xen/arch/arm/include/asm/mm.h   |   4 -
 xen/arch/arm/include/asm/page.h |   8 +
 xen/arch/arm/include/asm/pmap.h |  32 ++
 xen/arch/arm/kernel.c   |   1 +
 xen/arch/arm/mm.c   | 494 +---
 xen/arch/arm/setup.c| 141 +++
 xen/common/Kconfig  |   3 +
 xen/common/Makefile |   1 +
 xen/common/pmap.c   |  72 
 xen/include/xen/acpi.h  |  18 +-
 xen/include/xen/pmap.h  |  16 +
 17 files changed, 549 insertions(+), 307 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/fixmap.h
 create mode 100644 xen/arch/arm/include/asm/pmap.h
 create mode 100644 xen/common/pmap.c
 create mode 100644 xen/include/xen/pmap.h

-- 
2.32.0

[PATCH 06/16] xen/arm32: mm: Re-implement setup_xenheap_mappings() using map_pages_to_xen()

2022-05-20 Thread Julien Grall

From: Julien Grall 

Now that map_pages_to_xen() has been extended to support 2MB mappings,
we can replace the create_mappings() call by map_pages_to_xen() call.

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Changes in v4:
- Add Stefano's reviewed-by

Changes in v3:
- Fix build when CONFIG_DEBUG=y

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 03f970e4d10b..47c2111c36a4 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -825,7 +825,12 @@ void mmu_init_secondary_cpu(void)
 void __init setup_xenheap_mappings(unsigned long base_mfn,
unsigned long nr_mfns)
 {
-create_mappings(xen_second, XENHEAP_VIRT_START, base_mfn, nr_mfns, MB(32));
+int rc;
+
+rc = map_pages_to_xen(XENHEAP_VIRT_START, _mfn(base_mfn), nr_mfns,
+  PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+if ( rc )
+panic("Unable to setup the xenheap mappings.\n");
 
 /* Record where the xenheap is, for translation routines. */
 xenheap_virt_end = XENHEAP_VIRT_START + nr_mfns * PAGE_SIZE;
-- 
2.32.0

[PATCH 01/16] xen/arm: mm: Allow other mapping size in xen_pt_update_entry()

2022-05-20 Thread Julien Grall

From: Julien Grall 

At the moment, xen_pt_update_entry() only supports mapping at level 3
(i.e 4KB mapping). While this is fine for most of the runtime helper,
the boot code will require to use superpage mapping.

We don't want to allow superpage mapping by default as some of the
callers may expect small mappings (i.e populate_pt_range()) or even
expect to unmap only a part of a superpage.

To keep the code simple, a new flag _PAGE_BLOCK is introduced to
allow the caller to enable superpage mapping.

As the code doesn't support all the combinations, xen_pt_check_entry()
is extended to take into account the cases we don't support when
using block mapping:
- Replacing a table with a mapping. This may happen if region was
first mapped with 4KB mapping and then later on replaced with a 2MB
(or 1GB mapping).
- Removing/modifying a table. This may happen if a caller try to
remove a region with _PAGE_BLOCK set when it was created without it.

Note that the current restriction means that the caller must ensure that
_PAGE_BLOCK is consistently set/cleared across all the updates on a
given virtual region. This ought to be fine with the expected use-cases.

More rework will be necessary if we wanted to remove the restrictions.

Note that nr_mfns is now marked const as it is used for flushing the
TLBs and we don't want it to be modified.

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 
Reviewed-by: Hongda Deng 

---
Changes in v4:
- Add Hongda's reviewed-by
- Add a comment why nr_mfns is const
- Open-code pfn_to_paddr()

Changes in v3:
- Fix clash after prefixing the PT macros with XEN_PT_
- Fix typoes in the commit message
- Support superpage mappings even if nr is not suitably aligned
- Move the logic to find the level in a separate function

Changes in v2:
- Pass the target level rather than the order to
xen_pt_update_entry()
- Update some comments
- Open-code paddr_to_pfn()
- Add my AWS signed-off-by
---
 xen/arch/arm/include/asm/page.h |   4 ++
 xen/arch/arm/mm.c   | 109 ++--
 2 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
index c6f9fb0d4e0c..07998df47bac 100644
--- a/xen/arch/arm/include/asm/page.h
+++ b/xen/arch/arm/include/asm/page.h
@@ -69,6 +69,7 @@
  * [3:4] Permission flags
  * [5]   Page present
  * [6]   Only populate page tables
+ * [7]   Superpage mappings is allowed
  */
 #define PAGE_AI_MASK(x) ((x) & 0x7U)
 
@@ -82,6 +83,9 @@
 #define _PAGE_PRESENT(1U << 5)
 #define _PAGE_POPULATE   (1U << 6)
 
+#define _PAGE_BLOCK_BIT 7
+#define _PAGE_BLOCK (1U << _PAGE_BLOCK_BIT)
+
 /*
  * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not
  * meant to be used outside of this header.
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7b1f2f49060d..be2ac302d731 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1078,9 +1078,10 @@ static int xen_pt_next_level(bool read_only, unsigned 
int level,
 }
 
 /* Sanity check of the entry */
-static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
+static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int level,
+   unsigned int flags)
 {
-/* Sanity check when modifying a page. */
+/* Sanity check when modifying an entry. */
 if ( (flags & _PAGE_PRESENT) && mfn_eq(mfn, INVALID_MFN) )
 {
 /* We don't allow modifying an invalid entry. */
@@ -1090,6 +1091,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, 
unsigned int flags)
 return false;
 }
 
+/* We don't allow modifying a table entry */
+if ( !lpae_is_mapping(entry, level) )
+{
+mm_printk("Modifying a table entry is not allowed.\n");
+return false;
+}
+
 /* We don't allow changing memory attributes. */
 if ( entry.pt.ai != PAGE_AI_MASK(flags) )
 {
@@ -1105,7 +1113,7 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, 
unsigned int flags)
 return false;
 }
 }
-/* Sanity check when inserting a page */
+/* Sanity check when inserting a mapping */
 else if ( flags & _PAGE_PRESENT )
 {
 /* We should be here with a valid MFN. */
@@ -1114,18 +1122,28 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, 
unsigned int flags)
 /* We don't allow replacing any valid entry. */
 if ( lpae_is_valid(entry) )
 {
-mm_printk("Changing MFN for a valid entry is not allowed 
(%#"PRI_mfn" -> %#"PRI_mfn").\n",
-  mfn_x(lpae_get_mfn(entry)), mfn_x(mfn));
+if ( lpae_is_mapping(entry, level) )
+mm_printk("Changing MFN for a valid entry is not allowed 
(%#"PRI_mfn" -> %#"PRI_mfn").\n",
+  mf

[PATCH 02/16] xen/arm: mm: Add support for the contiguous bit

2022-05-20 Thread Julien Grall

From: Julien Grall 

In follow-up patches, we will use xen_pt_update() (or its callers)
to handle large mappings (e.g. frametable, xenheap). They are also
not going to be modified once created.

The page-table entries have an hint to indicate that whether an
entry is contiguous to another 16 entries (assuming 4KB). When the
processor support the hint, one TLB entry will be created per
contiguous region.

For now this is tied to _PAGE_BLOCK. We can untie it in the future
if there are use-cases where we may want to use _PAGE_BLOCK without
setting the contiguous (couldn't think of any yet).

Note that to avoid extra complexity, mappings with the contiguous
bit set cannot be removed. Given the expected use, this restriction
ought to be fine.

Signed-off-by: Julien Grall 
Reviewed-by: Hongda Deng 
Reviewed-by: Stefano Stabellini 

---
Changes in v4:
- Add Hongda and Stefano's reviewed-by
- Exit the outer loop as soon as there is an error.

Changes in v3:
- New patch
---
 xen/arch/arm/include/asm/page.h |  4 ++
 xen/arch/arm/mm.c   | 83 +
 2 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
index 07998df47bac..e7cd62190c7f 100644
--- a/xen/arch/arm/include/asm/page.h
+++ b/xen/arch/arm/include/asm/page.h
@@ -70,6 +70,7 @@
  * [5]   Page present
  * [6]   Only populate page tables
  * [7]   Superpage mappings is allowed
+ * [8]   Set contiguous bit (internal flag)
  */
 #define PAGE_AI_MASK(x) ((x) & 0x7U)
 
@@ -86,6 +87,9 @@
 #define _PAGE_BLOCK_BIT 7
 #define _PAGE_BLOCK (1U << _PAGE_BLOCK_BIT)
 
+#define _PAGE_CONTIG_BIT8
+#define _PAGE_CONTIG(1U << _PAGE_CONTIG_BIT)
+
 /*
  * _PAGE_DEVICE and _PAGE_NORMAL are convenience defines. They are not
  * meant to be used outside of this header.
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index be2ac302d731..c4487dd7fc46 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1252,6 +1252,8 @@ static int xen_pt_update_entry(mfn_t root, unsigned long 
virt,
 /* Set permission */
 pte.pt.ro = PAGE_RO_MASK(flags);
 pte.pt.xn = PAGE_XN_MASK(flags);
+/* Set contiguous bit */
+pte.pt.contig = !!(flags & _PAGE_CONTIG);
 }
 
 write_pte(entry, pte);
@@ -1304,6 +1306,51 @@ static int xen_pt_mapping_level(unsigned long vfn, mfn_t 
mfn, unsigned long nr,
  return level;
 }
 
+#define XEN_PT_4K_NR_CONTIG 16
+
+/*
+ * Check whether the contiguous bit can be set. Return the number of
+ * contiguous entry allowed. If not allowed, return 1.
+ */
+static unsigned int xen_pt_check_contig(unsigned long vfn, mfn_t mfn,
+unsigned int level, unsigned long left,
+unsigned int flags)
+{
+unsigned long nr_contig;
+
+/*
+ * Allow the contiguous bit to set when the caller requests block
+ * mapping.
+ */
+if ( !(flags & _PAGE_BLOCK) )
+return 1;
+
+/*
+ * We don't allow to remove mapping with the contiguous bit set.
+ * So shortcut the logic and directly return 1.
+ */
+if ( mfn_eq(mfn, INVALID_MFN) )
+return 1;
+
+/*
+ * The number of contiguous entries varies depending on the page
+ * granularity used. The logic below assumes 4KB.
+ */
+BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
+
+/*
+ * In order to enable the contiguous bit, we should have enough entries
+ * to map left and both the virtual and physical address should be
+ * aligned to the size of 16 translation tables entries.
+ */
+nr_contig = BIT(XEN_PT_LEVEL_ORDER(level), UL) * XEN_PT_4K_NR_CONTIG;
+
+if ( (left < nr_contig) || ((mfn_x(mfn) | vfn) & (nr_contig - 1)) )
+return 1;
+
+return XEN_PT_4K_NR_CONTIG;
+}
+
 static DEFINE_SPINLOCK(xen_pt_lock);
 
 static int xen_pt_update(unsigned long virt,
@@ -1338,6 +1385,12 @@ static int xen_pt_update(unsigned long virt,
 return -EINVAL;
 }
 
+if ( flags & _PAGE_CONTIG )
+{
+mm_printk("_PAGE_CONTIG is an internal only flag.\n");
+return -EINVAL;
+}
+
 if ( !IS_ALIGNED(virt, PAGE_SIZE) )
 {
 mm_printk("The virtual address is not aligned to the page-size.\n");
@@ -1348,22 +1401,36 @@ static int xen_pt_update(unsigned long virt,
 
 while ( left )
 {
-unsigned int order, level;
+unsigned int order, level, nr_contig, new_flags;
 
 level = xen_pt_mapping_level(vfn, mfn, left, flags);
 order = XEN_PT_LEVEL_ORDER(level);
 
 ASSERT(left >= BIT(order, UL));
 
-rc = xen_pt_update_entry(root, vfn << PAGE_SHIFT, mfn, level, flags);
-if ( rc )
-break;
+/*
+ * Check if we can set the contiguous mapping and update the
+ * flags accordingly.
+ */
+nr_contig = xen_pt_check_contig(vfn, mfn, level, left, flags);
+

[PATCH 03/16] xen/arm: mm: Avoid flushing the TLBs when mapping are inserted

2022-05-20 Thread Julien Grall

From: Julien Grall 

Currently, the function xen_pt_update() will flush the TLBs even when
the mappings are inserted. This is a bit wasteful because we don't
allow mapping replacement. Even if we were, the flush would need to
happen earlier because mapping replacement should use Break-Before-Make
when updating the entry.

A single call to xen_pt_update() can perform a single action. IOW, it
is not possible to, for instance, mix inserting and removing mappings.
Therefore, we can use `flags` to determine what action is performed.

This change will be particularly help to limit the impact of switching
boot time mapping to use xen_pt_update().

Signed-off-by: Julien Grall 

---
Changes in v4:
- Switch the check to a different expression that will still
result to the same truth table.

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index c4487dd7fc46..747083d820dd 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1119,7 +1119,13 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, 
unsigned int level,
 /* We should be here with a valid MFN. */
 ASSERT(!mfn_eq(mfn, INVALID_MFN));
 
-/* We don't allow replacing any valid entry. */
+/*
+ * We don't allow replacing any valid entry.
+ *
+ * Note that the function xen_pt_update() relies on this
+ * assumption and will skip the TLB flush. The function will need
+ * to be updated if the check is relaxed.
+ */
 if ( lpae_is_valid(entry) )
 {
 if ( lpae_is_mapping(entry, level) )
@@ -1434,11 +1440,16 @@ static int xen_pt_update(unsigned long virt,
 }
 
 /*
- * Flush the TLBs even in case of failure because we may have
+ * The TLBs flush can be safely skipped when a mapping is inserted
+ * as we don't allow mapping replacement (see xen_pt_check_entry()).
+ *
+ * For all the other cases, the TLBs will be flushed unconditionally
+ * even if the mapping has failed. This is because we may have
  * partially modified the PT. This will prevent any unexpected
  * behavior afterwards.
  */
-flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
+if ( !((flags & _PAGE_PRESENT) && !mfn_eq(mfn, INVALID_MFN)) )
+flush_xen_tlb_range_va(virt, PAGE_SIZE * nr_mfns);
 
 spin_unlock(&xen_pt_lock);
 
-- 
2.32.0

[PATCH 04/16] xen/arm: mm: Don't open-code Xen PT update in remove_early_mappings()

2022-05-20 Thread Julien Grall

From: Julien Grall 

Now that xen_pt_update_entry() is able to deal with different mapping
size, we can replace the open-coding of the page-tables update by a call
to modify_xen_mappings().

As the function is not meant to fail, a BUG_ON() is added to check the
return.

Note that we don't use destroy_xen_mappings() because the helper doesn't
allow us to pass a flags. In theory we could add an extra parameter to
the function, however there are no other expected users. Hence why
modify_xen_mappings() is used.

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 
Reviewed-by: Hongda Deng 

---
Changes in v4:
- Add Hongda's reviewed-by
- Add a comment to explain what modify_xen_mappings() does.
- Clarify in the commit message hwy modify_xen_mappings() is
  used rather than destroy_xen_mappings().

Changes in v2:
- Stay consistent with how function name are used in the commit
message
- Add my AWS signed-off-by
---
 xen/arch/arm/mm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 747083d820dd..64a79d45b38c 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -614,11 +614,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 
 void __init remove_early_mappings(void)
 {
-lpae_t pte = {0};
-write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START), pte);
-write_pte(xen_second + second_table_offset(BOOT_FDT_VIRT_START + SZ_2M),
-  pte);
-flush_xen_tlb_range_va(BOOT_FDT_VIRT_START, BOOT_FDT_SLOT_SIZE);
+int rc;
+
+/* destroy the _PAGE_BLOCK mapping */
+rc = modify_xen_mappings(BOOT_FDT_VIRT_START, BOOT_FDT_VIRT_END,
+ _PAGE_BLOCK);
+BUG_ON(rc);
 }
 
 /*
-- 
2.32.0

[PATCH 05/16] xen/arm: mm: Re-implement early_fdt_map() using map_pages_to_xen()

2022-05-20 Thread Julien Grall

From: Julien Grall 

Now that map_pages_to_xen() has been extended to support 2MB mappings,
we can replace the create_mappings() calls by map_pages_to_xen() calls.

The mapping can also be marked read-only as Xen should not modify
the host Device Tree during boot.

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 
Reviewed-by: Hongda Deng 

---
Changes in v4:
- Fix typo in the commit message
- Add Hongda's reviewed-by

Changes in v2:
- Add my AWS signed-off-by
- Fix typo in the commit message
---
 xen/arch/arm/mm.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 64a79d45b38c..03f970e4d10b 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -574,6 +574,7 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 paddr_t offset;
 void *fdt_virt;
 uint32_t size;
+int rc;
 
 /*
  * Check whether the physical FDT address is set and meets the minimum
@@ -589,8 +590,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 /* The FDT is mapped using 2MB superpage */
 BUILD_BUG_ON(BOOT_FDT_VIRT_START % SZ_2M);
 
-create_mappings(xen_second, BOOT_FDT_VIRT_START, paddr_to_pfn(base_paddr),
-SZ_2M >> PAGE_SHIFT, SZ_2M);
+rc = map_pages_to_xen(BOOT_FDT_VIRT_START, maddr_to_mfn(base_paddr),
+  SZ_2M >> PAGE_SHIFT,
+  PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
+if ( rc )
+panic("Unable to map the device-tree.\n");
+
 
 offset = fdt_paddr % SECOND_SIZE;
 fdt_virt = (void *)BOOT_FDT_VIRT_START + offset;
@@ -604,9 +609,12 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 
 if ( (offset + size) > SZ_2M )
 {
-create_mappings(xen_second, BOOT_FDT_VIRT_START + SZ_2M,
-paddr_to_pfn(base_paddr + SZ_2M),
-SZ_2M >> PAGE_SHIFT, SZ_2M);
+rc = map_pages_to_xen(BOOT_FDT_VIRT_START + SZ_2M,
+  maddr_to_mfn(base_paddr + SZ_2M),
+  SZ_2M >> PAGE_SHIFT,
+  PAGE_HYPERVISOR_RO | _PAGE_BLOCK);
+if ( rc )
+panic("Unable to map the device-tree\n");
 }
 
 return fdt_virt;
-- 
2.32.0

[PATCH 09/16] xen/arm: Move fixmap definitions in a separate header

2022-05-20 Thread Julien Grall

From: Julien Grall 

To use properly the fixmap definitions, their user would need
also new to include . This is not very great when
the user itself is not meant to directly use ACPI definitions.

Including  in  is not option because
the latter header is included by everyone. So move out the fixmap
entries definition in a new header.

Take the opportunity to also move {set, clear}_fixmap() prototypes
in the new header.

Note that most of the definitions in  now need to be
surrounded with #ifndef __ASSEMBLY__ because  will
be used in assembly (see EARLY_UART_VIRTUAL_ADDRESS).

The split will become more helpful in a follow-up patch where new
fixmap entries will be defined.

Signed-off-by: Julien Grall 
Acked-by: Jan Beulich 

---
There was some disagreement with Stefano on whether fixmap.h
should include acpi.h or this should be the other way around.

I chose the former because each components should decide how
much entries in the fixmap they need and also because this is
the current behavior on x86. We should stay consitent
between arch to avoid any headers mess.

Jan acked this patch, so I am assuming he is happy with this
approach. I would be OK to rework it if others agree with
Stefano's view.

Changes in v4:
- Add Jan's acked-by
- Record Stefano's disagreement on the approach

Changes in v3:
- Patch added
---
 xen/arch/arm/acpi/lib.c |  2 ++
 xen/arch/arm/include/asm/config.h   |  6 --
 xen/arch/arm/include/asm/early_printk.h |  1 +
 xen/arch/arm/include/asm/fixmap.h   | 24 
 xen/arch/arm/include/asm/mm.h   |  4 
 xen/arch/arm/kernel.c   |  1 +
 xen/arch/arm/mm.c   |  1 +
 xen/include/xen/acpi.h  | 18 +++---
 8 files changed, 40 insertions(+), 17 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/fixmap.h

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index a59cc4074cfb..41d521f720ac 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 static bool fixmap_inuse;
 
 char *__acpi_map_table(paddr_t phys, unsigned long size)
diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index b25c9d39bb32..3e2a55a91058 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -169,12 +169,6 @@
 
 #endif
 
-/* Fixmap slots */
-#define FIXMAP_CONSOLE  0  /* The primary UART */
-#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
-#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
-#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
-
 #define NR_hypercalls 64
 
 #define STACK_ORDER 3
diff --git a/xen/arch/arm/include/asm/early_printk.h 
b/xen/arch/arm/include/asm/early_printk.h
index 8dc911cf48a3..c5149b2976da 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -11,6 +11,7 @@
 #define __ARM_EARLY_PRINTK_H__
 
 #include 
+#include 
 
 #ifdef CONFIG_EARLY_PRINTK
 
diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
new file mode 100644
index ..1cee51e52ab9
--- /dev/null
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -0,0 +1,24 @@
+/*
+ * fixmap.h: compile-time virtual memory allocation
+ */
+#ifndef __ASM_FIXMAP_H
+#define __ASM_FIXMAP_H
+
+#include 
+
+/* Fixmap slots */
+#define FIXMAP_CONSOLE  0  /* The primary UART */
+#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
+#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
+#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
+
+#ifndef __ASSEMBLY__
+
+/* Map a page in a fixmap entry */
+extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
+/* Remove a mapping from a fixmap entry */
+extern void clear_fixmap(unsigned map);
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_FIXMAP_H */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 424aaf28230b..045a8ba4bb63 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -191,10 +191,6 @@ extern void mmu_init_secondary_cpu(void);
 extern void setup_xenheap_mappings(unsigned long base_mfn, unsigned long 
nr_mfns);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
-/* Map a 4k page in a fixmap entry */
-extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
-/* Remove a mapping from a fixmap entry */
-extern void clear_fixmap(unsigned map);
 /* map a physical range in virtual memory */
 void __iomem *ioremap_attr(paddr_t start, size_t len, unsigned attributes);
 
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 8f43caa1866d..25ded1c056d9 100644

[PATCH 08/16] xen/arm: mm: Allow page-table allocation from the boot allocator

2022-05-20 Thread Julien Grall

From: Julien Grall 

At the moment, page-table can only be allocated from domheap. This means
it is not possible to create mapping in the page-tables via
map_pages_to_xen() if page-table needs to be allocated.

In order to avoid open-coding page-tables update in early boot, we need
to be able to allocate page-tables much earlier. Thankfully, we have the
boot allocator for those cases.

create_xen_table() is updated to cater early boot allocation by using
alloc_boot_pages().

Note, this is not sufficient to bootstrap the page-tables (i.e mapping
before any memory is actually mapped). This will be addressed
separately.

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Changes in v4:
- Add Stefano's reviewed-by

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 252114d67df5..6b7b72de27fe 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1030,19 +1030,27 @@ static void xen_unmap_table(const lpae_t *table)
 
 static int create_xen_table(lpae_t *entry)
 {
-struct page_info *pg;
+mfn_t mfn;
 void *p;
 lpae_t pte;
 
-pg = alloc_domheap_page(NULL, 0);
-if ( pg == NULL )
-return -ENOMEM;
+if ( system_state != SYS_STATE_early_boot )
+{
+struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+if ( pg == NULL )
+return -ENOMEM;
+
+mfn = page_to_mfn(pg);
+}
+else
+mfn = alloc_boot_pages(1, 1);
 
-p = xen_map_table(page_to_mfn(pg));
+p = xen_map_table(mfn);
 clear_page(p);
 xen_unmap_table(p);
 
-pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL);
+pte = mfn_to_xen_entry(mfn, MT_NORMAL);
 pte.pt.table = 1;
 write_pte(entry, pte);
 
-- 
2.32.0

[PATCH 07/16] xen/arm: mm: Allocate xen page tables in domheap rather than xenheap

2022-05-20 Thread Julien Grall

From: Julien Grall 

xen_{un,}map_table() already uses the helper to map/unmap pages
on-demand (note this is currently a NOP on arm64). So switching to
domheap don't have any disavantage.

But this as the benefit:
- to keep the page tables unmapped if an arch decided to do so
- reduce xenheap use on arm32 which can be pretty small

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---
Changes in v3:
- Add Stefano's acked-by

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 47c2111c36a4..252114d67df5 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -988,21 +988,6 @@ void *ioremap(paddr_t pa, size_t len)
 return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
 }
 
-static int create_xen_table(lpae_t *entry)
-{
-void *p;
-lpae_t pte;
-
-p = alloc_xenheap_page();
-if ( p == NULL )
-return -ENOMEM;
-clear_page(p);
-pte = mfn_to_xen_entry(virt_to_mfn(p), MT_NORMAL);
-pte.pt.table = 1;
-write_pte(entry, pte);
-return 0;
-}
-
 static lpae_t *xen_map_table(mfn_t mfn)
 {
 /*
@@ -1043,6 +1028,27 @@ static void xen_unmap_table(const lpae_t *table)
 unmap_domain_page(table);
 }
 
+static int create_xen_table(lpae_t *entry)
+{
+struct page_info *pg;
+void *p;
+lpae_t pte;
+
+pg = alloc_domheap_page(NULL, 0);
+if ( pg == NULL )
+return -ENOMEM;
+
+p = xen_map_table(page_to_mfn(pg));
+clear_page(p);
+xen_unmap_table(p);
+
+pte = mfn_to_xen_entry(page_to_mfn(pg), MT_NORMAL);
+pte.pt.table = 1;
+write_pte(entry, pte);
+
+return 0;
+}
+
 #define XEN_TABLE_MAP_FAILED 0
 #define XEN_TABLE_SUPER_PAGE 1
 #define XEN_TABLE_NORMAL_PAGE 2
-- 
2.32.0

Re: [PATCH] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Bertrand Marquis

Hi Jan,

> On 20 May 2022, at 12:06, Jan Beulich  wrote:
> 
> On 20.05.2022 12:49, Bertrand Marquis wrote:
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -694,12 +694,13 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
>> $(objtree)/include/generated/autoconf.h
>>  $(call if_changed,cppcheck_xml)
>> 
>> cppcheck-version:
>> -ifeq ($(shell which $(CPPCHECK)),)
>> +ifeq ($(shell which $(CPPCHECK) 2> /dev/null),)
>>  $(error Cannot find cppcheck executable: $(CPPCHECK))
>> -endif
>> +else
>> ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
>>  $(error Please upgrade your cppcheck to version 2.7 or greater)
>> endif
>> +endif
> 
> While I agree this will silence things, I still would prefer if you
> switched to $(if ...) inside the rule - there's no need to invoke the
> shell while parsing the makefile. Anything like this only needlessly
> slows down the build. Not by much, but it sums up.

I will submit a v2 to solve this properly.

Cheers
Bertrand

> 
> Jan
>

[PATCH 14/16] xen/arm64: mm: Add memory to the boot allocator first

2022-05-20 Thread Julien Grall

From: Julien Grall 

Currently, memory is added to the boot allocator after the xenheap
mappings are done. This will break if the first mapping is more than
512GB of RAM.

In addition to that, a follow-up patch will rework setup_xenheap_mappings()
to use smaller mappings (e.g. 2MB, 4KB). So it will be necessary to have
memory in the boot allocator earlier.

Only free memory (e.g. not reserved or modules) can be added to the boot
allocator. It might be possible that some regions (including the first
one) will have no free memory.

So we need to add all the free memory to the boot allocator first
and then add do the mappings.

Populating the boot allocator is nearly the same between arm32 and
arm64. The only difference is on the former we need to exclude the
xenheap for the boot allocator. Gate the difference with CONFIG_ARM_32
so the code be re-used on arm64.

Signed-off-by: Julien Grall 

---
Changes in v4:
- The implementation of populate_boot_allocator() has been
  moved in a separate patch.
- Fix typo

Changes in v3:
- Patch added
---
 xen/arch/arm/setup.c | 55 +++-
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3d5a2283d4ef..db1768c03f03 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -636,13 +636,12 @@ static void __init init_staticmem_pages(void)
 #endif
 }
 
-#ifdef CONFIG_ARM_32
 /*
  * Populate the boot allocator. All the RAM but the following regions
  * will be added:
  *  - Modules (e.g., Xen, Kernel)
  *  - Reserved regions
- *  - Xenheap
+ *  - Xenheap (arm32 only)
  */
 static void __init populate_boot_allocator(void)
 {
@@ -672,6 +671,7 @@ static void __init populate_boot_allocator(void)
 if ( e > bank_end )
 e = bank_end;
 
+#ifdef CONFIG_ARM_32
 /* Avoid the xenheap */
 if ( s < mfn_to_maddr(xenheap_mfn_end) &&
  mfn_to_maddr(xenheap_mfn_start) < e )
@@ -679,6 +679,7 @@ static void __init populate_boot_allocator(void)
 e = mfn_to_maddr(xenheap_mfn_start);
 n = mfn_to_maddr(xenheap_mfn_end);
 }
+#endif
 
 fw_unreserved_regions(s, e, init_boot_pages, 0);
 s = n;
@@ -686,6 +687,7 @@ static void __init populate_boot_allocator(void)
 }
 }
 
+#ifdef CONFIG_ARM_32
 static void __init setup_mm(void)
 {
 paddr_t ram_start, ram_end, ram_size, e;
@@ -781,45 +783,36 @@ static void __init setup_mm(void)
 #else /* CONFIG_ARM_64 */
 static void __init setup_mm(void)
 {
+const struct meminfo *banks = &bootinfo.mem;
 paddr_t ram_start = ~0;
 paddr_t ram_end = 0;
 paddr_t ram_size = 0;
-int bank;
+unsigned int i;
 
 init_pdx();
 
-total_pages = 0;
-for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ )
-{
-paddr_t bank_start = bootinfo.mem.bank[bank].start;
-paddr_t bank_size = bootinfo.mem.bank[bank].size;
-paddr_t bank_end = bank_start + bank_size;
-paddr_t s, e;
-
-ram_size = ram_size + bank_size;
-ram_start = min(ram_start,bank_start);
-ram_end = max(ram_end,bank_end);
-
-setup_xenheap_mappings(bank_start>>PAGE_SHIFT, bank_size>>PAGE_SHIFT);
-
-s = bank_start;
-while ( s < bank_end )
-{
-paddr_t n = bank_end;
+/*
+ * We need some memory to allocate the page-tables used for the xenheap
+ * mappings. But some regions may contain memory already allocated
+ * for other uses (e.g. modules, reserved-memory...).
+ *
+ * For simplicity, add all the free regions in the boot allocator.
+ */
+populate_boot_allocator();
 
-e = next_module(s, &n);
+total_pages = 0;
 
-if ( e == ~(paddr_t)0 )
-{
-e = n = bank_end;
-}
+for ( i = 0; i < banks->nr_banks; i++ )
+{
+const struct membank *bank = &banks->bank[i];
+paddr_t bank_end = bank->start + bank->size;
 
-if ( e > bank_end )
-e = bank_end;
+ram_size = ram_size + bank->size;
+ram_start = min(ram_start, bank->start);
+ram_end = max(ram_end, bank_end);
 
-fw_unreserved_regions(s, e, init_boot_pages, 0);
-s = n;
-}
+setup_xenheap_mappings(PFN_DOWN(bank->start),
+   PFN_DOWN(bank->size));
 }
 
 total_pages += ram_size >> PAGE_SHIFT;
-- 
2.32.0

[PATCH 11/16] xen/arm: mm: Clean-up the includes and order them

2022-05-20 Thread Julien Grall

From: Julien Grall 

The numbers of includes in mm.c has been growing quite a lot. However
some of them (e.g. xen/device_tree.h, xen/softirq.h) doesn't look
to be directly used by the file or other will be included by
larger headers (e.g asm/flushtlb.h will be included by xen/mm.h).

So trim down the number of includes. Take the opportunity to order
them with the xen headers first, then asm headers and last public
headers.

Signed-off-by: Julien Grall 
Acked-by: Stefano Stabellini 

---

Changes in v4:
- Add Stefano's acked-by

Changes in v3:
- Patch added
---
 xen/arch/arm/mm.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index bd1348a99716..d40dd4e6c9e6 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -17,33 +17,26 @@
  * GNU General Public License for more details.
  */
 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include 
 #include 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include 
+#include 
+#include 
+#include 
 #include 
+#include 
+#include 
 #include 
+
 #include 
-#include 
-#include 
-#include 
 
 #include 
 #include 
 
+#include 
+
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
-- 
2.32.0

[PATCH 10/16] xen/arm: add Persistent Map (PMAP) infrastructure

2022-05-20 Thread Julien Grall

From: Wei Liu 

The basic idea is like Persistent Kernel Map (PKMAP) in Linux. We
pre-populate all the relevant page tables before the system is fully
set up.

We will need it on Arm in order to rework the arm64 version of
xenheap_setup_mappings() as we may need to use pages allocated from
the boot allocator before they are effectively mapped.

This infrastructure is not lock-protected therefore can only be used
before smpboot. After smpboot, map_domain_page() has to be used.

This is based on the x86 version [1] that was originally implemented
by Wei Liu.

The PMAP infrastructure is implemented in common code with some
arch helpers to set/clear the page-table entries and convertion
between a fixmap slot to a virtual address...

As mfn_to_xen_entry() now needs to be exported, take the opportunity
to swich the parameter attr from unsigned to unsigned int.

[1] 


Signed-off-by: Wei Liu 
Signed-off-by: Hongyan Xia 
[julien: Adapted for Arm]
Signed-off-by: Julien Grall 

---
Changes in v4:
- Move xen_fixmap in fixmap.h and add a comment about its usage.
- Update comments
- Use DECLARE_BITMAP()
- Replace local_irq_{enable, disable} with an ASSERT() as there
  should be no user of pmap() in interrupt context.

Changes in v3:
- s/BITS_PER_LONG/BITS_PER_BYTE/
- Move pmap to common code

Changes in v2:
- New patch

Cc: Jan Beulich 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: Roger Pau Monné 
---
 xen/arch/arm/Kconfig  |  1 +
 xen/arch/arm/include/asm/fixmap.h | 24 +++
 xen/arch/arm/include/asm/lpae.h   |  8 
 xen/arch/arm/include/asm/pmap.h   | 32 ++
 xen/arch/arm/mm.c |  7 +--
 xen/common/Kconfig|  3 ++
 xen/common/Makefile   |  1 +
 xen/common/pmap.c | 72 +++
 xen/include/xen/pmap.h| 16 +++
 9 files changed, 158 insertions(+), 6 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/pmap.h
 create mode 100644 xen/common/pmap.c
 create mode 100644 xen/include/xen/pmap.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ecfa6822e4d3..a89a67802aa9 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -14,6 +14,7 @@ config ARM
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
select HAS_PDX
+   select HAS_PMAP
select IOMMU_FORCE_PT_SHARE
 
 config ARCH_DEFCONFIG
diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
index 1cee51e52ab9..365a2385a087 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -5,20 +5,44 @@
 #define __ASM_FIXMAP_H
 
 #include 
+#include 
 
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
 #define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
 #define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
+#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
+#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP 
*/
+
+#define FIXMAP_LAST FIXMAP_PMAP_END
+
+#define FIXADDR_START FIXMAP_ADDR(0)
+#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
 
 #ifndef __ASSEMBLY__
 
+/*
+ * Direct access to xen_fixmap[] should only happen when {set,
+ * clear}_fixmap() is unusable (e.g. where we would end up to
+ * recursively call the helpers).
+ */
+extern lpae_t xen_fixmap[XEN_PT_LPAE_ENTRIES];
+
 /* Map a page in a fixmap entry */
 extern void set_fixmap(unsigned map, mfn_t mfn, unsigned attributes);
 /* Remove a mapping from a fixmap entry */
 extern void clear_fixmap(unsigned map);
 
+#define fix_to_virt(slot) ((void *)FIXMAP_ADDR(slot))
+
+static inline unsigned int virt_to_fix(vaddr_t vaddr)
+{
+BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
+
+return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
+}
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_FIXMAP_H */
diff --git a/xen/arch/arm/include/asm/lpae.h b/xen/arch/arm/include/asm/lpae.h
index aecb320dec45..fc19cbd84772 100644
--- a/xen/arch/arm/include/asm/lpae.h
+++ b/xen/arch/arm/include/asm/lpae.h
@@ -4,6 +4,7 @@
 #ifndef __ASSEMBLY__
 
 #include 
+#include 
 
 /*
  * WARNING!  Unlike the x86 pagetable code, where l1 is the lowest level and
@@ -168,6 +169,13 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned 
int level)
 third_table_offset(addr)\
 }
 
+/*
+ * Standard entry type that we'll use to build Xen's own pagetables.
+ * We put the same permissions at every level, because they're ignored
+ * by the walker in non-leaf entries.
+ */
+lpae_t mfn_to_xen_entry(mfn_t mfn, unsigned int attr);
+
 #endif /* __ASSEMBLY__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/pmap.h b/xen/arch/arm/include/asm/pmap.h
new file mode 100644
index ..74398b4c4fe6
--- /dev/null
+++ b/xen/arch/arm/include/

[PATCH 16/16] xen/arm: mm: Re-implement setup_frame_table_mappings() with map_pages_to_xen()

2022-05-20 Thread Julien Grall

From: Julien Grall 

Now that map_pages_to_xen() has been extended to support 2MB mappings,
we can replace the create_mappings() call by map_pages_to_xen() call.

This has the advantage to remove the differences between 32-bit and
64-bit code.

Lastly remove create_mappings() as there is no more callers.

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 

---
Changes in v4:
- Add missing _PAGE_BLOCK

Changes in v3:
- Fix typo in the commit message
- Remove the TODO regarding contiguous bit

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 64 +--
 1 file changed, 6 insertions(+), 58 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 65af44f42232..be37176a4725 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -369,40 +369,6 @@ void clear_fixmap(unsigned map)
 BUG_ON(res != 0);
 }
 
-/* Create Xen's mappings of memory.
- * Mapping_size must be either 2MB or 32MB.
- * Base and virt must be mapping_size aligned.
- * Size must be a multiple of mapping_size.
- * second must be a contiguous set of second level page tables
- * covering the region starting at virt_offset. */
-static void __init create_mappings(lpae_t *second,
-   unsigned long virt_offset,
-   unsigned long base_mfn,
-   unsigned long nr_mfns,
-   unsigned int mapping_size)
-{
-unsigned long i, count;
-const unsigned long granularity = mapping_size >> PAGE_SHIFT;
-lpae_t pte, *p;
-
-ASSERT((mapping_size == MB(2)) || (mapping_size == MB(32)));
-ASSERT(!((virt_offset >> PAGE_SHIFT) % granularity));
-ASSERT(!(base_mfn % granularity));
-ASSERT(!(nr_mfns % granularity));
-
-count = nr_mfns / XEN_PT_LPAE_ENTRIES;
-p = second + second_linear_offset(virt_offset);
-pte = mfn_to_xen_entry(_mfn(base_mfn), MT_NORMAL);
-if ( granularity == 16 * XEN_PT_LPAE_ENTRIES )
-pte.pt.contig = 1;  /* These maps are in 16-entry contiguous chunks. */
-for ( i = 0; i < count; i++ )
-{
-write_pte(p + i, pte);
-pte.pt.base += 1 << XEN_PT_LPAE_SHIFT;
-}
-flush_xen_tlb_local();
-}
-
 #ifdef CONFIG_DOMAIN_PAGE
 void *map_domain_page_global(mfn_t mfn)
 {
@@ -862,36 +828,18 @@ void __init setup_frametable_mappings(paddr_t ps, paddr_t 
pe)
 unsigned long frametable_size = nr_pdxs * sizeof(struct page_info);
 mfn_t base_mfn;
 const unsigned long mapping_size = frametable_size < MB(32) ? MB(2) : 
MB(32);
-#ifdef CONFIG_ARM_64
-lpae_t *second, pte;
-unsigned long nr_second;
-mfn_t second_base;
-int i;
-#endif
+int rc;
 
 frametable_base_pdx = mfn_to_pdx(maddr_to_mfn(ps));
 /* Round up to 2M or 32M boundary, as appropriate. */
 frametable_size = ROUNDUP(frametable_size, mapping_size);
 base_mfn = alloc_boot_pages(frametable_size >> PAGE_SHIFT, 32<<(20-12));
 
-#ifdef CONFIG_ARM_64
-/* Compute the number of second level pages. */
-nr_second = ROUNDUP(frametable_size, FIRST_SIZE) >> FIRST_SHIFT;
-second_base = alloc_boot_pages(nr_second, 1);
-second = mfn_to_virt(second_base);
-for ( i = 0; i < nr_second; i++ )
-{
-clear_page(mfn_to_virt(mfn_add(second_base, i)));
-pte = mfn_to_xen_entry(mfn_add(second_base, i), MT_NORMAL);
-pte.pt.table = 1;
-write_pte(&xen_first[first_table_offset(FRAMETABLE_VIRT_START)+i], 
pte);
-}
-create_mappings(second, 0, mfn_x(base_mfn), frametable_size >> PAGE_SHIFT,
-mapping_size);
-#else
-create_mappings(xen_second, FRAMETABLE_VIRT_START, mfn_x(base_mfn),
-frametable_size >> PAGE_SHIFT, mapping_size);
-#endif
+rc = map_pages_to_xen(FRAMETABLE_VIRT_START, base_mfn,
+  frametable_size >> PAGE_SHIFT,
+  PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
+if ( rc )
+panic("Unable to setup the frametable mappings.\n");
 
 memset(&frame_table[0], 0, nr_pdxs * sizeof(struct page_info));
 memset(&frame_table[nr_pdxs], -1,
-- 
2.32.0

[PATCH 15/16] xen/arm: mm: Rework setup_xenheap_mappings()

2022-05-20 Thread Julien Grall

From: Julien Grall 

The current implementation of setup_xenheap_mappings() is using 1GB
mappings. This can lead to unexpected result because the mapping
may alias a non-cachable region (such as device or reserved regions).
For more details see B2.8 in ARM DDI 0487H.a.

map_pages_to_xen() was recently reworked to allow superpage mappings,
support contiguous mapping and deal with the use of page-tables before
they are mapped.

Most of the code in setup_xenheap_mappings() is now replaced with a
single call to map_pages_to_xen().

Signed-off-by: Julien Grall 
Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Changes in v4:
- Fix typo
- Add Stefano's reviewed-by

Changes in v3:
- Don't use 1GB mapping
- Re-order code in setup_mm() in a separate patch

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 87 ++-
 1 file changed, 18 insertions(+), 69 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b019e4b35b55..65af44f42232 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -138,17 +138,6 @@ static DEFINE_PAGE_TABLE(cpu0_pgtable);
 static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
 #endif
 
-#ifdef CONFIG_ARM_64
-/* The first page of the first level mapping of the xenheap. The
- * subsequent xenheap first level pages are dynamically allocated, but
- * we need this one to bootstrap ourselves. */
-static DEFINE_PAGE_TABLE(xenheap_first_first);
-/* The zeroeth level slot which uses xenheap_first_first. Used because
- * setup_xenheap_mappings otherwise relies on mfn_to_virt which isn't
- * valid for a non-xenheap mapping. */
-static __initdata int xenheap_first_first_slot = -1;
-#endif
-
 /* Common pagetable leaves */
 /* Second level page tables.
  *
@@ -831,77 +820,37 @@ void __init setup_xenheap_mappings(unsigned long base_mfn,
 void __init setup_xenheap_mappings(unsigned long base_mfn,
unsigned long nr_mfns)
 {
-lpae_t *first, pte;
-unsigned long mfn, end_mfn;
-vaddr_t vaddr;
-
-/* Align to previous 1GB boundary */
-mfn = base_mfn & ~((FIRST_SIZE>>PAGE_SHIFT)-1);
+int rc;
 
 /* First call sets the xenheap physical and virtual offset. */
 if ( mfn_eq(xenheap_mfn_start, INVALID_MFN) )
 {
+unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
+
 xenheap_mfn_start = _mfn(base_mfn);
 xenheap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
+/*
+ * The base address may not be aligned to the first level
+ * size (e.g. 1GB when using 4KB pages). This would prevent
+ * superpage mappings for all the regions because the virtual
+ * address and machine address should both be suitably aligned.
+ *
+ * Prevent that by offsetting the start of the xenheap virtual
+ * address.
+ */
 xenheap_virt_start = DIRECTMAP_VIRT_START +
-(base_mfn - mfn) * PAGE_SIZE;
+(base_mfn - mfn_gb) * PAGE_SIZE;
 }
 
 if ( base_mfn < mfn_x(xenheap_mfn_start) )
 panic("cannot add xenheap mapping at %lx below heap start %lx\n",
   base_mfn, mfn_x(xenheap_mfn_start));
 
-end_mfn = base_mfn + nr_mfns;
-
-/*
- * Virtual address aligned to previous 1GB to match physical
- * address alignment done above.
- */
-vaddr = (vaddr_t)__mfn_to_virt(base_mfn) & FIRST_MASK;
-
-while ( mfn < end_mfn )
-{
-int slot = zeroeth_table_offset(vaddr);
-lpae_t *p = &xen_pgtable[slot];
-
-if ( p->pt.valid )
-{
-/* mfn_to_virt is not valid on the 1st 1st mfn, since it
- * is not within the xenheap. */
-first = slot == xenheap_first_first_slot ?
-xenheap_first_first : mfn_to_virt(lpae_get_mfn(*p));
-}
-else if ( xenheap_first_first_slot == -1)
-{
-/* Use xenheap_first_first to bootstrap the mappings */
-first = xenheap_first_first;
-
-pte = pte_of_xenaddr((vaddr_t)xenheap_first_first);
-pte.pt.table = 1;
-write_pte(p, pte);
-
-xenheap_first_first_slot = slot;
-}
-else
-{
-mfn_t first_mfn = alloc_boot_pages(1, 1);
-
-clear_page(mfn_to_virt(first_mfn));
-pte = mfn_to_xen_entry(first_mfn, MT_NORMAL);
-pte.pt.table = 1;
-write_pte(p, pte);
-first = mfn_to_virt(first_mfn);
-}
-
-pte = mfn_to_xen_entry(_mfn(mfn), MT_NORMAL);
-/* TODO: Set pte.pt.contig when appropriate. */
-write_pte(&first[first_table_offset(vaddr)], pte);
-
-mfn += FIRST_SIZE>>PAGE_SHIFT;
-vaddr += FIRST_SIZE;
-}
-
-flush_xen_tlb_local();
+rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
+  _mfn(base_mfn), nr_mfns,
+  PAGE

[PATCH 13/16] xen/arm32: setup: Move out the code to populate the boot allocator

2022-05-20 Thread Julien Grall

From: Julien Grall 

In a follow-up patch, we will want to populate the boot allocator
separately for arm64. The code will end up to be very similar to the one
on arm32. So move out the code in a new helper populate_boot_allocator().

For now the code is still protected by CONFIG_ARM_32 to avoid any build
failure on arm64.

Take the opportunity to replace mfn_add(xen_mfn_start, xenheap_pages) with
xenheap_mfn_end as they are equivalent.

Signed-off-by: Julien Grall 

---

Changes in v4:
- Patch added
---
 xen/arch/arm/setup.c | 90 +---
 1 file changed, 51 insertions(+), 39 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index d5d0792ed48a..3d5a2283d4ef 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -637,10 +637,58 @@ static void __init init_staticmem_pages(void)
 }
 
 #ifdef CONFIG_ARM_32
+/*
+ * Populate the boot allocator. All the RAM but the following regions
+ * will be added:
+ *  - Modules (e.g., Xen, Kernel)
+ *  - Reserved regions
+ *  - Xenheap
+ */
+static void __init populate_boot_allocator(void)
+{
+unsigned int i;
+const struct meminfo *banks = &bootinfo.mem;
+
+for ( i = 0; i < banks->nr_banks; i++ )
+{
+const struct membank *bank = &banks->bank[i];
+paddr_t bank_end = bank->start + bank->size;
+paddr_t s, e;
+
+s = bank->start;
+while ( s < bank_end )
+{
+paddr_t n = bank_end;
+
+e = next_module(s, &n);
+
+if ( e == ~(paddr_t)0 )
+e = n = bank_end;
+
+/*
+ * Module in a RAM bank other than the one which we are
+ * not dealing with here.
+ */
+if ( e > bank_end )
+e = bank_end;
+
+/* Avoid the xenheap */
+if ( s < mfn_to_maddr(xenheap_mfn_end) &&
+ mfn_to_maddr(xenheap_mfn_start) < e )
+{
+e = mfn_to_maddr(xenheap_mfn_start);
+n = mfn_to_maddr(xenheap_mfn_end);
+}
+
+fw_unreserved_regions(s, e, init_boot_pages, 0);
+s = n;
+}
+}
+}
+
 static void __init setup_mm(void)
 {
-paddr_t ram_start, ram_end, ram_size;
-paddr_t s, e;
+paddr_t ram_start, ram_end, ram_size, e;
 unsigned long ram_pages;
 unsigned long heap_pages, xenheap_pages, domheap_pages;
 int i;
@@ -718,43 +766,7 @@ static void __init setup_mm(void)
 setup_xenheap_mappings((e >> PAGE_SHIFT) - xenheap_pages, xenheap_pages);
 
 /* Add non-xenheap memory */
-for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
-{
-paddr_t bank_start = bootinfo.mem.bank[i].start;
-paddr_t bank_end = bank_start + bootinfo.mem.bank[i].size;
-
-s = bank_start;
-while ( s < bank_end )
-{
-paddr_t n = bank_end;
-
-e = next_module(s, &n);
-
-if ( e == ~(paddr_t)0 )
-{
-e = n = ram_end;
-}
-
-/*
- * Module in a RAM bank other than the one which we are
- * not dealing with here.
- */
-if ( e > bank_end )
-e = bank_end;
-
-/* Avoid the xenheap */
-if ( s < mfn_to_maddr(mfn_add(xenheap_mfn_start, xenheap_pages))
- && mfn_to_maddr(xenheap_mfn_start) < e )
-{
-e = mfn_to_maddr(xenheap_mfn_start);
-n = mfn_to_maddr(mfn_add(xenheap_mfn_start, xenheap_pages));
-}
-
-fw_unreserved_regions(s, e, init_boot_pages, 0);
-
-s = n;
-}
-}
+populate_boot_allocator();
 
 /* Frame table covers all of RAM region, including holes */
 setup_frametable_mappings(ram_start, ram_end);
-- 
2.32.0

[PATCH v2] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Bertrand Marquis

If cppcheck is not present, the following warning appears during build:
which: no cppcheck in ([...])
/bin/sh: cppcheck: command not found

Fix the problem by using shell code inside the cppcheck-version rule to
also prevent unneeded call of which when something else than cppcheck is
built.

Reported-by: Julien Grall 
Signed-off-by: Bertrand Marquis 
---
 xen/Makefile | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/Makefile b/xen/Makefile
index 15388703bc..e8d8ed71bc 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
$(objtree)/include/generated/autoconf.h
$(call if_changed,cppcheck_xml)
 
 cppcheck-version:
-ifeq ($(shell which $(CPPCHECK)),)
-   $(error Cannot find cppcheck executable: $(CPPCHECK))
-endif
-ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
-   $(error Please upgrade your cppcheck to version 2.7 or greater)
-endif
+   @if ! which $(CPPCHECK) > /dev/null 2>&1; then \
+   echo "Cannot find cppcheck executable: $(CPPCHECK)"; \
+   exit 1; \
+   fi
+   @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; 
then \
+   echo "Please upgrade your cppcheck to version 2.7 or greater"; \
+   exit 1; \
+   fi
 
 # Put this in generated headers this way it is cleaned by include/Makefile
 $(objtree)/include/generated/compiler-def.h:
-- 
2.25.1

[PATCH 12/16] xen/arm: mm: Use the PMAP helpers in xen_{,un}map_table()

2022-05-20 Thread Julien Grall

From: Julien Grall 

During early boot, it is not possible to use xen_{,un}map_table()
if the page tables are not residing the Xen binary.

This is a blocker to switch some of the helpers to use xen_pt_update()
as we may need to allocate extra page tables and access them before
the domheap has been initialized (see setup_xenheap_mappings()).

xen_{,un}map_table() are now updated to use the PMAP helpers for early
boot map/unmap. Note that the special case for page-tables residing
in Xen binary has been dropped because it is "complex" and was
only added as a workaround in 8d4f1b8878e0 ("xen/arm: mm: Allow
generic xen page-tables helpers to be called early").

Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Changes in v4:
- Add Stefano's reviewed-by

Changes in v2:
- New patch
---
 xen/arch/arm/mm.c | 33 +
 1 file changed, 9 insertions(+), 24 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index d40dd4e6c9e6..b019e4b35b55 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -980,27 +981,11 @@ void *ioremap(paddr_t pa, size_t len)
 static lpae_t *xen_map_table(mfn_t mfn)
 {
 /*
- * We may require to map the page table before map_domain_page() is
- * useable. The requirements here is it must be useable as soon as
- * page-tables are allocated dynamically via alloc_boot_pages().
- *
- * We need to do the check on physical address rather than virtual
- * address to avoid truncation on Arm32. Therefore is_kernel() cannot
- * be used.
+ * During early boot, map_domain_page() may be unusable. Use the
+ * PMAP to map temporarily a page-table.
  */
 if ( system_state == SYS_STATE_early_boot )
-{
-if ( is_xen_fixed_mfn(mfn) )
-{
-/*
- * It is fine to demote the type because the size of Xen
- * will always fit in vaddr_t.
- */
-vaddr_t offset = mfn_to_maddr(mfn) - virt_to_maddr(&_start);
-
-return (lpae_t *)(XEN_VIRT_START + offset);
-}
-}
+return pmap_map(mfn);
 
 return map_domain_page(mfn);
 }
@@ -1009,12 +994,12 @@ static void xen_unmap_table(const lpae_t *table)
 {
 /*
  * During early boot, xen_map_table() will not use map_domain_page()
- * for page-tables residing in Xen binary. So skip the unmap part.
+ * but the PMAP.
  */
-if ( system_state == SYS_STATE_early_boot && is_kernel(table) )
-return;
-
-unmap_domain_page(table);
+if ( system_state == SYS_STATE_early_boot )
+pmap_unmap(table);
+else
+unmap_domain_page(table);
 }
 
 static int create_xen_table(lpae_t *entry)
-- 
2.32.0

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote:
> On 20.05.2022 13:11, Jan Beulich wrote:
> > On 20.05.2022 12:47, Roger Pau Monné wrote:
> >> On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
> >>> On 06.05.2022 13:16, Roger Pau Monné wrote:
>  On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
> >  
> >  while ( nr_ptes-- )
> >  {
> > -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> > +ASSERT(!pde->next_level);
> > +ASSERT(!pde->u);
> > +
> > +if ( pde > table )
> > +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> > +else
> > +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
> 
>  I think PAGETABLE_ORDER would be clearer here.
> >>>
> >>> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere
> >>> in IOMMU code afaics.
> >>
> >> Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
> >> sure what's the rule for declaring that PAGE_SHIFT is fine to use in
> >> IOMMU code  but not PAGETABLE_ORDER.
> > 
> > Hmm, yes and no. But for consistency with other IOMMU code I may want
> > to switch to PAGE_SHIFT_4K.
> 
> Except that, with the plan to re-use pt_update_contig_markers() for CPU-
> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT.

Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3?

IMO it makes the code quite easier to understand.

Thanks, Roger.

[ovmf test] 170601: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170601 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170601/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1119 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days7 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Jan Beulich

On 20.05.2022 14:22, Roger Pau Monné wrote:
> On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote:
>> On 20.05.2022 13:11, Jan Beulich wrote:
>>> On 20.05.2022 12:47, Roger Pau Monné wrote:
 On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
> On 06.05.2022 13:16, Roger Pau Monné wrote:
>> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
>>> --- a/xen/drivers/passthrough/amd/iommu_map.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
>>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
>>>  
>>>  while ( nr_ptes-- )
>>>  {
>>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
>>> +ASSERT(!pde->next_level);
>>> +ASSERT(!pde->u);
>>> +
>>> +if ( pde > table )
>>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
>>> +else
>>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
>>
>> I think PAGETABLE_ORDER would be clearer here.
>
> I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used anywhere
> in IOMMU code afaics.

 Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
 sure what's the rule for declaring that PAGE_SHIFT is fine to use in
 IOMMU code  but not PAGETABLE_ORDER.
>>>
>>> Hmm, yes and no. But for consistency with other IOMMU code I may want
>>> to switch to PAGE_SHIFT_4K.
>>
>> Except that, with the plan to re-use pt_update_contig_markers() for CPU-
>> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT.
> 
> Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3?

pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch
to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could
at the same time start using PAGETABLE_ORDER there.

What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and
LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/
consistent, ...

> IMO it makes the code quite easier to understand.

... or in fact helping readability.

Jan

Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration

2022-05-20 Thread Roger Pau Monné

On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote:
> Hello,
> 
> Following series attempts to solve the issue with IO-APIC edge triggered
> interrupts seeing an inconsistent RTE or IRTE when injected while being
> migrated.
> 
> It's currently RFC because some patches have post commit message notes,
> and because I'm not sure if patch 1 is really needed.  I originally had
> the idea of suggesting to only backport patch 1 in order to fix the
> issue in older releases, while leaving the more complex (and thus
> error prone) IOMMU changes in unstable.  Note however that patch 1 is
> just a workaround to prevent interrupts seeing inconsistent entries
> while being updated, masking the entry just makes the warning go away,
> but the interrupt will be lost.

Ping?

Thanks, Roger.

Re: Grant operation batching

2022-05-20 Thread Demi Marie Obenour

On Fri, May 20, 2022 at 08:24:48AM +0200, Jan Beulich wrote:
> On 20.05.2022 01:22, Demi Marie Obenour wrote:
> > It is well known that mapping and unmapping grants is expensive, which
> > is why blkback has persistent grants.  Could this cost be mitigated by
> > batching, and if it was, would it affect the tradeoff of memcpy() vs
> > grant table operations?
> 
> Which backend driver are you thinking about? The in-kernel Linux
> xen-blkback already batches grant operations, afaics. Such
> batching is helpful, but the main cost is assumed (known?) to be
> with the (installing and) tearing down of the actual mappings of
> the guest pages (into/)from backend address space.

My thought was that the expensive part of this is TLB flushes, which are
only needed once per batch.  Also, what do you think about the “unsafe”
mode?  It would only be unsafe if the backend is untrusted, but it is
quite common for the backend to be trusted.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature

Re: Grant operation batching

2022-05-20 Thread Jan Beulich

On 20.05.2022 14:37, Demi Marie Obenour wrote:
> On Fri, May 20, 2022 at 08:24:48AM +0200, Jan Beulich wrote:
>> On 20.05.2022 01:22, Demi Marie Obenour wrote:
>>> It is well known that mapping and unmapping grants is expensive, which
>>> is why blkback has persistent grants.  Could this cost be mitigated by
>>> batching, and if it was, would it affect the tradeoff of memcpy() vs
>>> grant table operations?
>>
>> Which backend driver are you thinking about? The in-kernel Linux
>> xen-blkback already batches grant operations, afaics. Such
>> batching is helpful, but the main cost is assumed (known?) to be
>> with the (installing and) tearing down of the actual mappings of
>> the guest pages (into/)from backend address space.
> 
> My thought was that the expensive part of this is TLB flushes, which are
> only needed once per batch.

Correct, but as said - such batching is already being made use of by
the in-kernel Linux backend. Of course "once per batch" is not entirely
precise - very large batches would be split internally into smaller
chunks, each of which would be followed by a TLB flush.

>  Also, what do you think about the “unsafe”
> mode?  It would only be unsafe if the backend is untrusted, but it is
> quite common for the backend to be trusted.

Well, I didn't put much thought into that (and hence intentionally
didn't comment on it), so for now it's only "Why not?"

Jan

Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration

2022-05-20 Thread Jan Beulich

On 20.05.2022 14:36, Roger Pau Monné wrote:
> On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote:
>> Hello,
>>
>> Following series attempts to solve the issue with IO-APIC edge triggered
>> interrupts seeing an inconsistent RTE or IRTE when injected while being
>> migrated.
>>
>> It's currently RFC because some patches have post commit message notes,
>> and because I'm not sure if patch 1 is really needed.  I originally had
>> the idea of suggesting to only backport patch 1 in order to fix the
>> issue in older releases, while leaving the more complex (and thus
>> error prone) IOMMU changes in unstable.  Note however that patch 1 is
>> just a workaround to prevent interrupts seeing inconsistent entries
>> while being updated, masking the entry just makes the warning go away,
>> but the interrupt will be lost.
> 
> Ping?

Sorry, the usual thing with RFCs: They take lower priority than other
work items. This series is certainly the first of the several pending
RFC series which I mean to get to, but it's hard to predict when this
would be.

Jan

Re: [PATCH v2] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Jan Beulich

On 20.05.2022 14:14, Bertrand Marquis wrote:
> --- a/xen/Makefile
> +++ b/xen/Makefile
> @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
> $(objtree)/include/generated/autoconf.h
>   $(call if_changed,cppcheck_xml)
>  
>  cppcheck-version:
> -ifeq ($(shell which $(CPPCHECK)),)
> - $(error Cannot find cppcheck executable: $(CPPCHECK))
> -endif
> -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
> - $(error Please upgrade your cppcheck to version 2.7 or greater)
> -endif
> + @if ! which $(CPPCHECK) > /dev/null 2>&1; then \
> + echo "Cannot find cppcheck executable: $(CPPCHECK)"; \
> + exit 1; \
> + fi
> + @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; 
> then \
> + echo "Please upgrade your cppcheck to version 2.7 or greater"; \
> + exit 1; \
> + fi
>  
>  # Put this in generated headers this way it is cleaned by include/Makefile
>  $(objtree)/include/generated/compiler-def.h:

Fine with me, even if - as said on v1 - I would have preferred $(if ...).
One question though: Wouldn't it better be $(Q) instead of the two plain
@? Preferably with that adjustment (which I guess can be made while
committing):
Acked-by: Jan Beulich 

Jan

Re: [PATCH v2] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Bertrand Marquis

Hi,

> On 20 May 2022, at 13:51, Jan Beulich  wrote:
> 
> On 20.05.2022 14:14, Bertrand Marquis wrote:
>> --- a/xen/Makefile
>> +++ b/xen/Makefile
>> @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
>> $(objtree)/include/generated/autoconf.h
>>  $(call if_changed,cppcheck_xml)
>> 
>> cppcheck-version:
>> -ifeq ($(shell which $(CPPCHECK)),)
>> -$(error Cannot find cppcheck executable: $(CPPCHECK))
>> -endif
>> -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
>> -$(error Please upgrade your cppcheck to version 2.7 or greater)
>> -endif
>> +@if ! which $(CPPCHECK) > /dev/null 2>&1; then \
>> +echo "Cannot find cppcheck executable: $(CPPCHECK)"; \
>> +exit 1; \
>> +fi
>> +@if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; 
>> then \
>> +echo "Please upgrade your cppcheck to version 2.7 or greater"; \
>> +exit 1; \
>> +fi
>> 
>> # Put this in generated headers this way it is cleaned by include/Makefile
>> $(objtree)/include/generated/compiler-def.h:
> 
> Fine with me, even if - as said on v1 - I would have preferred $(if ...).

Could you explain why and what you mean exactly ?
I thought the code would be more complex and less clear using if and I
do not see how it would solve the issue with which being called.

> One question though: Wouldn't it better be $(Q) instead of the two plain
> @? Preferably with that adjustment (which I guess can be made while
> committing):

I thought of it but who would be interested in actually seeing those
commands which are not “building” anything.

> Acked-by: Jan Beulich 
> 
Thanks

Bertrand

[ovmf test] 170602: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170602 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170602/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1120 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days8 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 5:41 AM, Jan Beulich wrote:

On 20.05.2022 10:30, Chuck Zmudzinski wrote:

On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why
those want
leaving unaltered. When preparing my earlier patch I did inspect them
and came to the conclusion that these all would also better
observe the
adjusted behavior (or else I couldn't have left pat_enabled() as the
only predicate). In fact, as said in the description of my earlier
patch, in
my debugging I did find the use in i915_gem_object_pin_map() to be
the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c


I wanted to be rather defensive in my changes, but I agree at least
the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).


I wouldn't call it a driver bug, but an incorrect configuration of the
kernel by the user.  I presume X86_FEATURE_PAT is required by the
i915 driver and therefore the driver should refuse to disable
it if the user requests to disable it and instead warn the user that
the driver did not disable the feature, contrary to what the user
requested with the nopat option.

In any case, my test did not verify that when nopat is set in Linux 5.16,
the thread takes the same code path as when nopat is not set,
so I am not totally sure that the reason nopat does not break the
i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT)
returns true even when nopat is set. I could test it with a custom
log message in 5.16 if that is necessary.

Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT)
to return true in 5.16 when the user requests nopat? I think that is
just permitting a bad configuration to break the driver that a
well-written operating system should not allow. The i915 driver
was, in my opinion, correctly ignoring the nopat option in 5.16
because that option is not compatible with the hardware the
i915 driver is trying to initialize and setup at boot time. At least
that is my understanding now, but I will need to test it on 5.16
to be sure I understand it correctly.

Also, AFAICT, your patch would break the driver when the nopat
option is set and only fix the regression introduced by bdd8b6c98239
when nopat is not set on my box, so your patch would
introduce a regression relative to Linux 5.16 and earlier for the
case when nopat is set on my box. I think your point would
be that it is not a regression if it is an incorrect user configuration.
I respond by saying a well-written driver should refuse to honor
the incorrect configuration requested by the user and instead
warn the user that it did not honor the incorrect kernel option.

I am only presuming what your patch would do on my box based
on what I

[PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES

2022-05-20 Thread Roger Pau Monne

Allow exposing the PDCM bit in CPUID for HVM guests if present on the
platform, which in turn allows exposing PERF_CAPABILITIES.  Limit the
information exposed in PERF_CAPABILITIES to the LBR format only.

This is helpful as hardware without model-specific LBRs set format to
0x3f in order to notify the feature is not present.

Signed-off-by: Roger Pau Monné 
---
Seeing as we have never exposed PDCM in CPUID I wonder whether there's
something that I'm missing that makes exposing PERF_CAPABILITIES LBR
format not as trivial as it looks.
---
 xen/arch/x86/msr.c  | 9 +
 xen/include/public/arch-x86/cpufeatureset.h | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 01a15857b7..423a795d1d 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
*val)
 *val = 0;
 break;
 
+case MSR_IA32_PERF_CAPABILITIES:
+if ( !cp->basic.pdcm )
+goto gp_fault;
+
+/* Only report LBR format. */
+rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val);
+*val &= MSR_IA32_PERF_CAP_LBR_FORMAT;
+break;
+
 case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST:
 if ( !is_hvm_domain(d) || v != curr )
 goto gp_fault;
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index cd6409f9f3..5fdaec43c5 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A  Supplemental 
Streaming SIMD Extensio
 XEN_CPUFEATURE(FMA,   1*32+12) /*A  Fused Multiply Add */
 XEN_CPUFEATURE(CX16,  1*32+13) /*A  CMPXCHG16B */
 XEN_CPUFEATURE(XTPR,  1*32+14) /*   Send Task Priority Messages */
-XEN_CPUFEATURE(PDCM,  1*32+15) /*   Perf/Debug Capability MSR */
+XEN_CPUFEATURE(PDCM,  1*32+15) /*S  Perf/Debug Capability MSR */
 XEN_CPUFEATURE(PCID,  1*32+17) /*H  Process Context ID */
 XEN_CPUFEATURE(DCA,   1*32+18) /*   Direct Cache Access */
 XEN_CPUFEATURE(SSE4_1,1*32+19) /*A  Streaming SIMD Extensions 4.1 */
-- 
2.36.0

[PATCH 0/5] x86/lbr: handle lack of model-specific LBRs

2022-05-20 Thread Roger Pau Monne

Hello,

Intel Sapphire Rapids CPUs doesn't have model-specific MSRs, and hence
only architectural LBRs are available.

Firstly implement some changes so Xen knows how to enable arch LBRs so
that the ler option can also work in such scenario (first two patches).

The lack of model-specific LBRs also affects guests, as setting
DEBUGCTLMSR.LBR is now ignored (value hardwired to 0, writes ignored) by
the hardware due to the lack of model-specific LBRs.  The LBR format
reported in PERF_CAPABILITIES also need to be exposed, as that's a way
for guests to detect lack of model-specific LBRs presence (patches 3
and 4).

Patch 5 is an indentation fix that can be merged into patch 4: done
separately to help readability of patch 4.

Thanks, Roger.

Roger Pau Monne (5):
  x86/ler: use feature flag to check if option is enabled
  x86/lbr: enable hypervisor LER with arch LBR
  x86/perf: expose LBR format in PERF_CAPABILITIES
  x86/vmx: handle no model-specific LBR presence
  x86/vmx: fix indentation of LBR

 xen/arch/x86/hvm/vmx/vmx.c  | 59 ++---
 xen/arch/x86/include/asm/msr-index.h| 18 +++
 xen/arch/x86/msr.c  |  9 
 xen/arch/x86/traps.c| 29 --
 xen/arch/x86/x86_64/traps.c |  2 +-
 xen/include/public/arch-x86/cpufeatureset.h |  3 +-
 6 files changed, 97 insertions(+), 23 deletions(-)

-- 
2.36.0

[PATCH 5/5] x86/vmx: fix indentation of LBR

2022-05-20 Thread Roger Pau Monne

Properly indent the handling of LBR enable in MSR_IA32_DEBUGCTLMSR
vmx_msr_write_intercept().

No functional change.

Signed-off-by: Roger Pau Monné 
---
Feel free to squash onto the previous patch, did separately to aid the
readability of the previous change.
---
 xen/arch/x86/hvm/vmx/vmx.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 3f45ac05c6..ff10b293a4 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3540,31 +3540,31 @@ static int cf_check vmx_msr_write_intercept(
 
 if ( lbr->count )
 {
-for ( ; lbr->count; lbr++ )
-{
-unsigned int i;
-
-for ( i = 0; i < lbr->count; i++ )
+for ( ; lbr->count; lbr++ )
 {
-int rc = vmx_add_guest_msr(v, lbr->base + i, 0);
+unsigned int i;
 
-if ( unlikely(rc) )
+for ( i = 0; i < lbr->count; i++ )
 {
-gprintk(XENLOG_ERR,
-"Guest load/save list error %d\n", rc);
-domain_crash(v->domain);
-return X86EMUL_OKAY;
-}
+int rc = vmx_add_guest_msr(v, lbr->base + i, 0);
 
-vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
+if ( unlikely(rc) )
+{
+gprintk(XENLOG_ERR,
+"Guest load/save list error %d\n", rc);
+domain_crash(v->domain);
+return X86EMUL_OKAY;
+}
+
+vmx_clear_msr_intercept(v, lbr->base + i, VMX_MSR_RW);
+}
 }
-}
 
-v->arch.hvm.vmx.lbr_flags |= LBR_MSRS_INSERTED;
-if ( lbr_tsx_fixup_needed )
-v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX;
-if ( ler_to_fixup_needed )
-v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO;
+v->arch.hvm.vmx.lbr_flags |= LBR_MSRS_INSERTED;
+if ( lbr_tsx_fixup_needed )
+v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX;
+if ( ler_to_fixup_needed )
+v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO;
 }
 else
 /* No model specific LBRs, ignore DEBUGCTLMSR.LBR. */
-- 
2.36.0

[PATCH 1/5] x86/ler: use feature flag to check if option is enabled

2022-05-20 Thread Roger Pau Monne

It's more consistent with the rest of the usages of cpu_has_xen_lbr.

No functional change intended.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/x86_64/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 9d7f1f818b..24c5067ca2 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -156,7 +156,7 @@ void show_registers(const struct cpu_user_regs *regs)
 printk("CPU:%d\n", smp_processor_id());
 _show_registers(&fault_regs, fault_crs, context, v);
 
-if ( ler_msr && !guest_mode(regs) )
+if ( cpu_has_xen_lbr && !guest_mode(regs) )
 {
 u64 from, to;
 
-- 
2.36.0

[PATCH 2/5] x86/lbr: enable hypervisor LER with arch LBR

2022-05-20 Thread Roger Pau Monne

CPUs having no model-specific LBRs don't implement DEBUGCTLMSR.LBR
and LBRs can only be enabled if the processor supports architectural
LBRs.

Split the logic to enable LBRs into a separate function and expand the
logic to also implement support for arch LBRs if model-specific LBRs
are not supported.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/include/asm/msr-index.h| 18 +
 xen/arch/x86/traps.c| 29 ++---
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/msr-index.h 
b/xen/arch/x86/include/asm/msr-index.h
index 3e038db618..7b08e1804b 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -139,6 +139,24 @@
 #define  PASID_PASID_MASK   0x000f
 #define  PASID_VALID(_AC(1, ULL) << 31)
 
+#define MSR_ARCH_LBR_CTL0x14ce
+#define  ARCH_LBR_CTL_LBREN (_AC(1, ULL) <<  0)
+#define  ARCH_LBR_CTL_OS(_AC(1, ULL) <<  1)
+#define  ARCH_LBR_CTL_COND  (_AC(1, ULL) << 16)
+#define  ARCH_LBR_CTL_NEAR_REL_JMP  (_AC(1, ULL) << 17)
+#define  ARCH_LBR_CTL_NEAR_IND_JMP  (_AC(1, ULL) << 18)
+#define  ARCH_LBR_CTL_NEAR_REL_CALL (_AC(1, ULL) << 19)
+#define  ARCH_LBR_CTL_NEAR_IND_CALL (_AC(1, ULL) << 20)
+#define  ARCH_LBR_CTL_NEAR_RET  (_AC(1, ULL) << 21)
+#define  ARCH_LBR_CTL_OTHER_BRANCH  (_AC(1, ULL) << 22)
+#define  ARCH_LBR_CTL_RECORD_ALL(ARCH_LBR_CTL_COND | \
+ ARCH_LBR_CTL_NEAR_REL_JMP | \
+ ARCH_LBR_CTL_NEAR_IND_JMP | \
+ ARCH_LBR_CTL_NEAR_REL_CALL | \
+ ARCH_LBR_CTL_NEAR_IND_CALL | \
+ ARCH_LBR_CTL_NEAR_RET | \
+ ARCH_LBR_CTL_OTHER_BRANCH)
+
 #define MSR_EFER0xc080 /* Extended Feature 
Enable Register */
 #define  EFER_SCE   (_AC(1, ULL) <<  0) /* SYSCALL 
Enable */
 #define  EFER_LME   (_AC(1, ULL) <<  8) /* Long Mode 
Enable */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 4c38f6c015..133348d9f9 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1963,6 +1963,29 @@ void do_device_not_available(struct cpu_user_regs *regs)
 #endif
 }
 
+static bool enable_lbr(void)
+{
+uint64_t debugctl;
+
+wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR);
+rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+if ( !(debugctl & IA32_DEBUGCTLMSR_LBR) )
+{
+/*
+ * CPUs with no model-specific LBRs always return DEBUGCTLMSR.LBR
+ * == 0, attempt to set arch LBR if available.
+ */
+if ( !boot_cpu_has(X86_FEATURE_ARCH_LBR) )
+return false;
+
+/* Note that LASTINT{FROMIP,TOIP} matches LER_{FROM_IP,TO_IP} */
+wrmsrl(MSR_ARCH_LBR_CTL, ARCH_LBR_CTL_LBREN | ARCH_LBR_CTL_OS |
+ ARCH_LBR_CTL_RECORD_ALL);
+}
+
+return true;
+}
+
 void do_debug(struct cpu_user_regs *regs)
 {
 unsigned long dr6;
@@ -1997,7 +2020,7 @@ void do_debug(struct cpu_user_regs *regs)
 
 /* #DB automatically disabled LBR.  Reinstate it if debugging Xen. */
 if ( cpu_has_xen_lbr )
-wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR);
+enable_lbr();
 
 if ( !guest_mode(regs) )
 {
@@ -2179,8 +2202,8 @@ void percpu_traps_init(void)
 if ( !ler_msr && (ler_msr = calc_ler_msr()) )
 setup_force_cpu_cap(X86_FEATURE_XEN_LBR);
 
-if ( cpu_has_xen_lbr )
-wrmsrl(MSR_IA32_DEBUGCTLMSR, IA32_DEBUGCTLMSR_LBR);
+if ( cpu_has_xen_lbr && !enable_lbr() )
+printk(XENLOG_ERR "CPU#%u: failed to enable LBR\n", 
smp_processor_id());
 }
 
 void __init init_idt_traps(void)
diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
b/xen/include/public/arch-x86/cpufeatureset.h
index 9cee4b439e..cd6409f9f3 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -280,6 +280,7 @@ XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX 
defeaturing in micro
 XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
 XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A  SERIALIZE insn */
 XEN_CPUFEATURE(TSXLDTRK,  9*32+16) /*a  TSX load tracking suspend/resume 
insns */
+XEN_CPUFEATURE(ARCH_LBR,  9*32+19) /*   Intel ARCH LBR */
 XEN_CPUFEATURE(CET_IBT,   9*32+20) /*   CET - Indirect Branch Tracking */
 XEN_CPUFEATURE(IBRSB, 9*32+26) /*A  IBRS and IBPB support (used by 
Intel) */
 XEN_CPUFEATURE(STIBP, 9*32+27) /*A  STIBP */
-- 
2.36.0

[PATCH 4/5] x86/vmx: handle no model-specific LBR presence

2022-05-20 Thread Roger Pau Monne

Sapphire Rapids have no model-specific LBRs, and instead only expose
architectural LBRs.  As documented in the Architectural Last Branch
Records specification, processors not supporting model-specific LBRs
MSR_IA32_DEBUGCTLMSR.LBR has no meaning, and can be written to 0 or 1,
but reads will always return 0.

Implement support in vmx_msr_write_intercept() by adding generic
detection of lack of model-specific LBRs by checking if the LBR format
reported in PERF_CAPABILITIES matches 0x3f, which is explicitly listed
in the manual as a way to signal lack of model-specific LBRs
presence.

Signed-off-by: Roger Pau Monné 
---
Note the indentation change in vmx_msr_write_intercept() as a result
of the addition of a new condition is left for a following patch in
order to aid readability of the change.
---
 xen/arch/x86/hvm/vmx/vmx.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index cf428a4849..3f45ac05c6 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3007,6 +3007,8 @@ static const struct lbr_info {
 { MSR_GM_LASTBRANCH_0_FROM_IP,  NUM_MSR_GM_LASTBRANCH_FROM_TO },
 { MSR_GM_LASTBRANCH_0_TO_IP,NUM_MSR_GM_LASTBRANCH_FROM_TO },
 { 0, 0 }
+}, no_lbr[] = {
+{0, 0}
 };
 
 static const struct lbr_info *last_branch_msr_get(void)
@@ -3070,6 +3072,21 @@ static const struct lbr_info *last_branch_msr_get(void)
 /* Goldmont */
 case 0x5c: case 0x5f:
 return gm_lbr;
+
+default:
+if ( cpu_has_pdcm )
+{
+uint64_t cap;
+
+rdmsrl(MSR_IA32_PERF_CAPABILITIES, cap);
+if ( (cap & MSR_IA32_PERF_CAP_LBR_FORMAT) == 0x3f )
+/*
+ * On processors that do not support model-specific LBRs,
+ * PERF_CAPABILITIES.LBR_FMT will have the value 0x3f.
+ */
+return no_lbr;
+}
+break;
 }
 break;
 
@@ -3521,6 +3538,8 @@ static int cf_check vmx_msr_write_intercept(
 return X86EMUL_OKAY;
 }
 
+if ( lbr->count )
+{
 for ( ; lbr->count; lbr++ )
 {
 unsigned int i;
@@ -3546,6 +3565,10 @@ static int cf_check vmx_msr_write_intercept(
 v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_TSX;
 if ( ler_to_fixup_needed )
 v->arch.hvm.vmx.lbr_flags |= LBR_FIXUP_LER_TO;
+}
+else
+/* No model specific LBRs, ignore DEBUGCTLMSR.LBR. */
+msr_content &= ~IA32_DEBUGCTLMSR_LBR;
 }
 
 __vmwrite(GUEST_IA32_DEBUGCTL, msr_content);
-- 
2.36.0

Re: [PATCH RFC 0/6] x86/ioapic: fix edge triggered interrupt migration

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 02:46:39PM +0200, Jan Beulich wrote:
> On 20.05.2022 14:36, Roger Pau Monné wrote:
> > On Thu, Apr 21, 2022 at 03:21:08PM +0200, Roger Pau Monne wrote:
> >> Hello,
> >>
> >> Following series attempts to solve the issue with IO-APIC edge triggered
> >> interrupts seeing an inconsistent RTE or IRTE when injected while being
> >> migrated.
> >>
> >> It's currently RFC because some patches have post commit message notes,
> >> and because I'm not sure if patch 1 is really needed.  I originally had
> >> the idea of suggesting to only backport patch 1 in order to fix the
> >> issue in older releases, while leaving the more complex (and thus
> >> error prone) IOMMU changes in unstable.  Note however that patch 1 is
> >> just a workaround to prevent interrupts seeing inconsistent entries
> >> while being updated, masking the entry just makes the warning go away,
> >> but the interrupt will be lost.
> > 
> > Ping?
> 
> Sorry, the usual thing with RFCs: They take lower priority than other
> work items. This series is certainly the first of the several pending
> RFC series which I mean to get to, but it's hard to predict when this
> would be.

No problem, I don't think it's a super urgent issue: we have always
handled interrupts this way and so far got no noticeable issues (apart
from the log messages reported on the console).

There's a non-trivial amount of IOMMU code changes, so maybe the IOMMU
maintainers could take a stab at those as a start?

Thanks, Roger.

Re: [PATCH v2 2/3] tools/xl: Use sparse init for dom_info, remove duplicate vars

2022-05-20 Thread Anthony PERARD

On Fri, Apr 29, 2022 at 03:45:25PM -0700, Elliott Mitchell wrote:
> Rather than having shadow variables for every element of dom_info, it is
> better to properly initialize dom_info at the start.  This also removes
> the misleading memset() in the middle of main_create().
> 
> Remove the dryrun element of domain_create as that has been displaced
> by the global "dryrun_only" variable.
> 
> Signed-off-by: Elliott Mitchell 

Reviewed-by: Anthony PERARD 

Thanks for the clean up.

-- 
Anthony PERARD

Re: [PATCH v2] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Jan Beulich

On 20.05.2022 15:23, Bertrand Marquis wrote:
>> On 20 May 2022, at 13:51, Jan Beulich  wrote:
>> On 20.05.2022 14:14, Bertrand Marquis wrote:
>>> --- a/xen/Makefile
>>> +++ b/xen/Makefile
>>> @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
>>> $(objtree)/include/generated/autoconf.h
>>> $(call if_changed,cppcheck_xml)
>>>
>>> cppcheck-version:
>>> -ifeq ($(shell which $(CPPCHECK)),)
>>> -   $(error Cannot find cppcheck executable: $(CPPCHECK))
>>> -endif
>>> -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
>>> -   $(error Please upgrade your cppcheck to version 2.7 or greater)
>>> -endif
>>> +   @if ! which $(CPPCHECK) > /dev/null 2>&1; then \
>>> +   echo "Cannot find cppcheck executable: $(CPPCHECK)"; \
>>> +   exit 1; \
>>> +   fi
>>> +   @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; 
>>> then \
>>> +   echo "Please upgrade your cppcheck to version 2.7 or greater"; \
>>> +   exit 1; \
>>> +   fi
>>>
>>> # Put this in generated headers this way it is cleaned by include/Makefile
>>> $(objtree)/include/generated/compiler-def.h:
>>
>> Fine with me, even if - as said on v1 - I would have preferred $(if ...).
> 
> Could you explain why and what you mean exactly ?

I generally think that make scripts should resort to shell language
only if things cannot reasonably be expressed in make language.

> I thought the code would be more complex and less clear using if and I
> do not see how it would solve the issue with which being called.

The problem to deal with was to move the shell invocation from
makefile parsing time to rule execution time. Hence I don't see
why

cppcheck-version:
$(if $(shell which ...),,$(error ...))

wouldn't deal with the problem equally well. But I guess I may
not be understanding your question / concern.

>> One question though: Wouldn't it better be $(Q) instead of the two plain
>> @? Preferably with that adjustment (which I guess can be made while
>> committing):
> 
> I thought of it but who would be interested in actually seeing those
> commands which are not “building” anything.

You never know what's relevant to see when hunting down some
obscure build system issue.

Jan

[ovmf test] 170603: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170603 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170603/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1121 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days9 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

RE: [PATCH 3/3] x86/monitor: Add new monitor event to catch all vmexits

2022-05-20 Thread Lengyel, Tamas



> -Original Message-
> From: Tian, Kevin 
> Sent: Thursday, May 19, 2022 8:35 PM
> To: Tamas K Lengyel ; xen-
> de...@lists.xenproject.org
> Cc: Lengyel, Tamas ; Wei Liu ;
> Anthony PERARD ; Gross, Jurgen
> ; Cooper, Andrew ;
> George Dunlap ; Beulich, Jan
> ; Julien Grall ; Stefano Stabellini
> ; Alexandru Isaila ; Petre
> Pircalabu ; Pau Monné, Roger
> ; Nakajima, Jun 
> Subject: RE: [PATCH 3/3] x86/monitor: Add new monitor event to catch all
> vmexits
> 
> > From: Tamas K Lengyel 
> > Sent: Wednesday, May 18, 2022 11:02 PM
> >
> > On Thu, May 12, 2022 at 9:47 AM Tamas K Lengyel 
> > wrote:
> > >
> > > On Wed, May 4, 2022 at 9:12 AM Tamas K Lengyel
> 
> > wrote:
> > > >
> > > > On Wed, Apr 27, 2022 at 11:51 AM Tamas K Lengyel
> > > >  wrote:
> > > > >
> > > > > Add monitor event that hooks the vmexit handler allowing for
> > > > > both sync
> > and
> > > > > async monitoring of events. With async monitoring an event is
> > > > > placed
> > on the
> > > > > monitor ring for each exit and the rest of the vmexit handler
> > > > > resumes
> > normally.
> > > > > If there are additional monitor events configured those will
> > > > > also place
> > their
> > > > > respective events on the monitor ring.
> > > > >
> > > > > With the sync version an event is placed on the monitor ring but
> > > > > the
> > handler
> > > > > does not get resumed, thus the sync version is only useful when
> > > > > the VM
> > is not
> > > > > expected to resume normally after the vmexit. Our use-case is
> > > > > primarily
> > with
> > > > > the sync version with VM forks where the fork gets reset after
> > > > > sync
> > vmexit
> > > > > event, thus the rest of the vmexit handler can be safely
> > > > > skipped. This is very useful when we want to avoid Xen crashing
> > > > > the VM under any
> > circumstance,
> > > > > for example during fuzzing. Collecting all vmexit information
> > > > > regardless
> > of
> > > > > the root cause makes it easier to reason about the state of the
> > > > > VM on
> > the
> > > > > monitor side, hence we opt to receive all events, even for
> > > > > external
> > interrupt
> > > > > and NMI exits and let the monitor agent decide how to proceed.
> > > > >
> > > > > Signed-off-by: Tamas K Lengyel 
> > > > > ---
> > > > > v5: wrap vmexit fields in arch.vmx structures in the public
> > > > > vm_event ABI
> > > >
> > > > Patch ping. Could a toolstack maintainer please take a look at this?
> > > > The hypervisor side already has a Reviewed-by.
> > >
> > > Patch ping.
> >
> > Patch ping.
> >
> 
> I guess what you really missed is an ack from toostack maintainer, but
> anyway:
> 
> Reviewed-by: Kevin Tian 

Thanks, the review is still appreciated!
Tamas

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Jan Beulich

On 20.05.2022 15:33, Chuck Zmudzinski wrote:
> On 5/20/2022 5:41 AM, Jan Beulich wrote:
>> On 20.05.2022 10:30, Chuck Zmudzinski wrote:
>>> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:
 On 5/20/2022 2:05 AM, Jan Beulich wrote:
> On 20.05.2022 06:43, Chuck Zmudzinski wrote:
>> On 5/4/22 5:14 AM, Juergen Gross wrote:
>>> On 04.05.22 10:31, Jan Beulich wrote:
 On 03.05.2022 15:22, Juergen Gross wrote:

 ... these uses there are several more. You say nothing on why
 those want
 leaving unaltered. When preparing my earlier patch I did inspect them
 and came to the conclusion that these all would also better
 observe the
 adjusted behavior (or else I couldn't have left pat_enabled() as the
 only predicate). In fact, as said in the description of my earlier
 patch, in
 my debugging I did find the use in i915_gem_object_pin_map() to be
 the
 problematic one, which you leave alone.
>>> Oh, I missed that one, sorry.
>> That is why your patch would not fix my Haswell unless
>> it also touches i915_gem_object_pin_map() in
>> drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>
>>> I wanted to be rather defensive in my changes, but I agree at least
>>> the
>>> case in arch_phys_wc_add() might want to be changed, too.
>> I think your approach needs to be more aggressive so it will fix
>> all the known false negatives introduced by bdd8b6c98239
>> such as the one in i915_gem_object_pin_map().
>>
>> I looked at Jan's approach and I think it would fix the issue
>> with my Haswell as long as I don't use the nopat option. I
>> really don't have a strong opinion on that question, but I
>> think the nopat option as a Linux kernel option, as opposed
>> to a hypervisor option, should only affect the kernel, and
>> if the hypervisor provides the pat feature, then the kernel
>> should not override that,
> Hmm, why would the kernel not be allowed to override that? Such
> an override would affect only the single domain where the
> kernel runs; other domains could take their own decisions.
>
> Also, for the sake of completeness: "nopat" used when running on
> bare metal has the same bad effect on system boot, so there
> pretty clearly is an error cleanup issue in the i915 driver. But
> that's orthogonal, and I expect the maintainers may not even care
> (but tell us "don't do that then").
>>> Actually I just did a test with the last official Debian kernel
>>> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
>>> applied. In fact, the nopat option does *not* break the i915 driver
>>> in 5.16. That is, with the nopat option, the i915 driver loads
>>> normally on both the bare metal and on the Xen hypervisor.
>>> That means your presumption (and the presumption of
>>> the author of bdd8b6c98239) that the "nopat" option was
>>> being observed by the i915 driver is incorrect. Setting "nopat"
>>> had no effect on my system with Linux 5.16. So after doing these
>>> tests, I am against the aggressive approach of breaking the i915
>>> driver with the "nopat" option because prior to bdd8b6c98239,
>>> nopat did not break the i915 driver. Why break it now?
>> Because that's, in my understanding, is the purpose of "nopat"
>> (not breaking the driver of course - that's a driver bug -, but
>> having an effect on the driver).
> 
> I wouldn't call it a driver bug, but an incorrect configuration of the
> kernel by the user.  I presume X86_FEATURE_PAT is required by the
> i915 driver

The driver ought to work fine without PAT (and hence without being
able to make WC mappings). It would use UC instead and be slow, but
it ought to work.

> and therefore the driver should refuse to disable
> it if the user requests to disable it and instead warn the user that
> the driver did not disable the feature, contrary to what the user
> requested with the nopat option.
> 
> In any case, my test did not verify that when nopat is set in Linux 5.16,
> the thread takes the same code path as when nopat is not set,
> so I am not totally sure that the reason nopat does not break the
> i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT)
> returns true even when nopat is set. I could test it with a custom
> log message in 5.16 if that is necessary.
> 
> Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT)
> to return true in 5.16 when the user requests nopat?

No, I'm not saying that. It was wrong for this construct to be used
in the driver, which was fixed for 5.17 (and which had caused the
regression I did observe, leading to the patch as a hopefully least
bad option).

> I think that is
> just permitting a bad configuration to break the driver that a
> well-written operating system should not allow. The i915 driver
> was, in my opinion, correctly ignoring the nopat option in 5.16
> because that option is not compatible

Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES

2022-05-20 Thread Andrew Cooper

On 20/05/2022 14:37, Roger Pau Monne wrote:
> Allow exposing the PDCM bit in CPUID for HVM guests if present on the
> platform, which in turn allows exposing PERF_CAPABILITIES.  Limit the
> information exposed in PERF_CAPABILITIES to the LBR format only.
>
> This is helpful as hardware without model-specific LBRs set format to
> 0x3f in order to notify the feature is not present.
>
> Signed-off-by: Roger Pau Monné 
> ---
> Seeing as we have never exposed PDCM in CPUID I wonder whether there's
> something that I'm missing that makes exposing PERF_CAPABILITIES LBR
> format not as trivial as it looks.
> ---
>  xen/arch/x86/msr.c  | 9 +
>  xen/include/public/arch-x86/cpufeatureset.h | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 01a15857b7..423a795d1d 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
> *val)
>  *val = 0;
>  break;
>  
> +case MSR_IA32_PERF_CAPABILITIES:
> +if ( !cp->basic.pdcm )
> +goto gp_fault;
> +
> +/* Only report LBR format. */
> +rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val);
> +*val &= MSR_IA32_PERF_CAP_LBR_FORMAT;

Urgh.  We should have this info cached from boot.  Except caching on
boot is broken on hybrid cpus.  The P and E cores of an AlderLake report
a different format iirc (they differ between linear, and effective addr).

Given the other pain points with hybrid cpus, I think we can ignore it
in the short term.

> +break;
> +
>  case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST:
>  if ( !is_hvm_domain(d) || v != curr )
>  goto gp_fault;
> diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
> b/xen/include/public/arch-x86/cpufeatureset.h
> index cd6409f9f3..5fdaec43c5 100644
> --- a/xen/include/public/arch-x86/cpufeatureset.h
> +++ b/xen/include/public/arch-x86/cpufeatureset.h
> @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A  Supplemental 
> Streaming SIMD Extensio
>  XEN_CPUFEATURE(FMA,   1*32+12) /*A  Fused Multiply Add */
>  XEN_CPUFEATURE(CX16,  1*32+13) /*A  CMPXCHG16B */
>  XEN_CPUFEATURE(XTPR,  1*32+14) /*   Send Task Priority Messages */
> -XEN_CPUFEATURE(PDCM,  1*32+15) /*   Perf/Debug Capability MSR */
> +XEN_CPUFEATURE(PDCM,  1*32+15) /*S  Perf/Debug Capability MSR */

This is the bit which requires more toolstack logic to safely enable. 
Using 's' for off-by-default is fine if we want to get the series in now.

But before we expose the MSR generally, we need to:

1) Put the configuration in msr_policy so the toolstack can reason about it
2) Reject migration attempts to destinations where the LBR format changes
3) Actually put the lBR registers in the migration stream

~Andrew

Re: [PATCH v2 3/3] tools/xl: Allow specifying JSON for domain configuration file format

2022-05-20 Thread Anthony PERARD

On Tue, Apr 19, 2022 at 06:23:41PM -0700, Elliott Mitchell wrote:
> JSON is currently used when saving domains to mass storage.  Being able
> to use JSON as the normal input to `xl create` has potential to be
> valuable.  Add the functionality.
> 
> Move the memset() earlier so to allow use of the structure sooner.
> 
> Signed-off-by: Elliott Mitchell 

So, I gave this a try and creating a guest from a json config, and that
fails very early with "Unknown guest type".

Have you actually tried to create a guest from config file written in
json?

Also, this would need documentation about the new option, about the
format. The man page need to be edited.

An example of a config file written in json would be nice as well.

Thanks,

-- 
Anthony PERARD

Re: [PATCH v2] build: Fix make warning if there is no cppcheck

2022-05-20 Thread Bertrand Marquis

Hi Jan,

> On 20 May 2022, at 14:56, Jan Beulich  wrote:
> 
> On 20.05.2022 15:23, Bertrand Marquis wrote:
>>> On 20 May 2022, at 13:51, Jan Beulich  wrote:
>>> On 20.05.2022 14:14, Bertrand Marquis wrote:
 --- a/xen/Makefile
 +++ b/xen/Makefile
 @@ -694,12 +694,14 @@ $(objtree)/%.c.cppcheck: $(srctree)/%.c 
 $(objtree)/include/generated/autoconf.h
$(call if_changed,cppcheck_xml)
 
 cppcheck-version:
 -ifeq ($(shell which $(CPPCHECK)),)
 -  $(error Cannot find cppcheck executable: $(CPPCHECK))
 -endif
 -ifeq ($(shell $(CPPCHECK) --version | awk '{print ($$2 < 2.7)}'),1)
 -  $(error Please upgrade your cppcheck to version 2.7 or greater)
 -endif
 +  @if ! which $(CPPCHECK) > /dev/null 2>&1; then \
 +  echo "Cannot find cppcheck executable: $(CPPCHECK)"; \
 +  exit 1; \
 +  fi
 +  @if [ "$$($(CPPCHECK) --version | awk '{print ($$2 < 2.7)}')" -eq 1 ]; 
 then \
 +  echo "Please upgrade your cppcheck to version 2.7 or greater"; \
 +  exit 1; \
 +  fi
 
 # Put this in generated headers this way it is cleaned by include/Makefile
 $(objtree)/include/generated/compiler-def.h:
>>> 
>>> Fine with me, even if - as said on v1 - I would have preferred $(if ...).
>> 
>> Could you explain why and what you mean exactly ?
> 
> I generally think that make scripts should resort to shell language
> only if things cannot reasonably be expressed in make language.

Agree hence my first implementation.

> 
>> I thought the code would be more complex and less clear using if and I
>> do not see how it would solve the issue with which being called.
> 
> The problem to deal with was to move the shell invocation from
> makefile parsing time to rule execution time. Hence I don't see
> why
> 
> cppcheck-version:
>   $(if $(shell which ...),,$(error ...))
> 
> wouldn't deal with the problem equally well. But I guess I may
> not be understanding your question / concern.

There are always thousands of ways to achieve the same and here this is only a 
matter of taste.
I must admit that I did not think of using that solution this way.

If you prefer this I have nothing against it and I will ack a patch changing to 
this.

> 
>>> One question though: Wouldn't it better be $(Q) instead of the two plain
>>> @? Preferably with that adjustment (which I guess can be made while
>>> committing):
>> 
>> I thought of it but who would be interested in actually seeing those
>> commands which are not “building” anything.
> 
> You never know what's relevant to see when hunting down some
> obscure build system issue.
> 

Feel free to replace @ by $(Q) in my patch on commit.

Cheers
Bertrand


> Jan
>

Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES

2022-05-20 Thread Jan Beulich

On 20.05.2022 16:10, Andrew Cooper wrote:
> On 20/05/2022 14:37, Roger Pau Monne wrote:
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A  Supplemental 
>> Streaming SIMD Extensio
>>  XEN_CPUFEATURE(FMA,   1*32+12) /*A  Fused Multiply Add */
>>  XEN_CPUFEATURE(CX16,  1*32+13) /*A  CMPXCHG16B */
>>  XEN_CPUFEATURE(XTPR,  1*32+14) /*   Send Task Priority Messages */
>> -XEN_CPUFEATURE(PDCM,  1*32+15) /*   Perf/Debug Capability MSR */
>> +XEN_CPUFEATURE(PDCM,  1*32+15) /*S  Perf/Debug Capability MSR */
> 
> This is the bit which requires more toolstack logic to safely enable. 
> Using 's' for off-by-default is fine if we want to get the series in now.
> 
> But before we expose the MSR generally, we need to:
> 
> 1) Put the configuration in msr_policy so the toolstack can reason about it
> 2) Reject migration attempts to destinations where the LBR format changes

Since this could be quite restrictive, and since people needing to know
they need to hide this feature for migration to work, I guess this would
further want qualifying by "did the guest actually use LBRs so far"?

Jan

> 3) Actually put the lBR registers in the migration stream
> 
> ~Andrew

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 02:36:02PM +0200, Jan Beulich wrote:
> On 20.05.2022 14:22, Roger Pau Monné wrote:
> > On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote:
> >> On 20.05.2022 13:11, Jan Beulich wrote:
> >>> On 20.05.2022 12:47, Roger Pau Monné wrote:
>  On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
> > On 06.05.2022 13:16, Roger Pau Monné wrote:
> >> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> >>> --- a/xen/drivers/passthrough/amd/iommu_map.c
> >>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> >>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
> >>>  
> >>>  while ( nr_ptes-- )
> >>>  {
> >>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> >>> +ASSERT(!pde->next_level);
> >>> +ASSERT(!pde->u);
> >>> +
> >>> +if ( pde > table )
> >>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> >>> +else
> >>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
> >>
> >> I think PAGETABLE_ORDER would be clearer here.
> >
> > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used 
> > anywhere
> > in IOMMU code afaics.
> 
>  Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
>  sure what's the rule for declaring that PAGE_SHIFT is fine to use in
>  IOMMU code  but not PAGETABLE_ORDER.
> >>>
> >>> Hmm, yes and no. But for consistency with other IOMMU code I may want
> >>> to switch to PAGE_SHIFT_4K.
> >>
> >> Except that, with the plan to re-use pt_update_contig_markers() for CPU-
> >> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT.
> > 
> > Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3?
> 
> pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch
> to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could
> at the same time start using PAGETABLE_ORDER there.

I've got confused by the double reply and read it as if you where
going to stick to using PAGE_SHIFT everywhere as proposed originally.

> What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and
> LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/
> consistent, ...
> 
> > IMO it makes the code quite easier to understand.
> 
> ... or in fact helping readability.

Looking at pt_update_contig_markers() we hardcode CONTIG_LEVEL_SHIFT
to 9 there, which means all users must have a page table order of 9.

It seems to me we are just making things more complicated than
necessary by trying to avoid dependencies between CPU and IOMMU
page-table sizes and definitions, when the underlying mechanism to set
->ign0 has those assumptions baked in.

Would it help if you introduced a PAGE_TABLE_ORDER in page-defs.h?

Thanks, Roger.

[ovmf test] 170604: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170604 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170604/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1122 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days   10 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 04:28:14PM +0200, Roger Pau Monné wrote:
> On Fri, May 20, 2022 at 02:36:02PM +0200, Jan Beulich wrote:
> > On 20.05.2022 14:22, Roger Pau Monné wrote:
> > > On Fri, May 20, 2022 at 01:13:28PM +0200, Jan Beulich wrote:
> > >> On 20.05.2022 13:11, Jan Beulich wrote:
> > >>> On 20.05.2022 12:47, Roger Pau Monné wrote:
> >  On Thu, May 19, 2022 at 02:12:04PM +0200, Jan Beulich wrote:
> > > On 06.05.2022 13:16, Roger Pau Monné wrote:
> > >> On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> > >>> --- a/xen/drivers/passthrough/amd/iommu_map.c
> > >>> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > >>> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
> > >>>  
> > >>>  while ( nr_ptes-- )
> > >>>  {
> > >>> -set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> > >>> +ASSERT(!pde->next_level);
> > >>> +ASSERT(!pde->u);
> > >>> +
> > >>> +if ( pde > table )
> > >>> +ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> > >>> +else
> > >>> +ASSERT(pde->ign0 == PAGE_SHIFT - 3);
> > >>
> > >> I think PAGETABLE_ORDER would be clearer here.
> > >
> > > I disagree - PAGETABLE_ORDER is a CPU-side concept. It's not used 
> > > anywhere
> > > in IOMMU code afaics.
> > 
> >  Isn't PAGE_SHIFT also a CPU-side concept in the same way?  I'm not
> >  sure what's the rule for declaring that PAGE_SHIFT is fine to use in
> >  IOMMU code  but not PAGETABLE_ORDER.
> > >>>
> > >>> Hmm, yes and no. But for consistency with other IOMMU code I may want
> > >>> to switch to PAGE_SHIFT_4K.
> > >>
> > >> Except that, with the plan to re-use pt_update_contig_markers() for CPU-
> > >> side re-coalescing, there I'd prefer to stick to PAGE_SHIFT.
> > > 
> > > Then can PAGETABLE_ORDER be used instead of PAGE_SHIFT - 3?
> > 
> > pt_update_contig_markers() isn't IOMMU code; since I've said I'd switch
> > to PAGE_SHIFT_4K in IOMMU code I'm having a hard time seeing how I could
> > at the same time start using PAGETABLE_ORDER there.
> 
> I've got confused by the double reply and read it as if you where
> going to stick to using PAGE_SHIFT everywhere as proposed originally.
> 
> > What I maybe could do is use PTE_PER_TABLE_SHIFT in AMD code and
> > LEVEL_STRIDE in VT-d one. Yet I'm not sure that would be fully correct/
> > consistent, ...
> > 
> > > IMO it makes the code quite easier to understand.
> > 
> > ... or in fact helping readability.
> 
> Looking at pt_update_contig_markers() we hardcode CONTIG_LEVEL_SHIFT
> to 9 there, which means all users must have a page table order of 9.
> 
> It seems to me we are just making things more complicated than
> necessary by trying to avoid dependencies between CPU and IOMMU
> page-table sizes and definitions, when the underlying mechanism to set
> ->ign0 has those assumptions baked in.
> 
> Would it help if you introduced a PAGE_TABLE_ORDER in page-defs.h?

Sorry, should be PAGE_TABLE_ORDER_4K.

Roger.

[xen-unstable-smoke test] 170599: tolerable all pass - PUSHED

2022-05-20 Thread osstest service owner

flight 170599 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170599/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  ec0cce125b8b9fccde3fa825b8ee963083b5de3b
baseline version:
 xen  43aa3f6e72d340a85d3943b86350f6196a87289c

Last test of basis   170553  2022-05-19 01:03:09 Z1 days
Testing same since   170599  2022-05-20 11:00:25 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Juergen Gross 
  Julien Grall  # Arm
  Marek Marczykowski-Górecki 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   43aa3f6e72..ec0cce125b  ec0cce125b8b9fccde3fa825b8ee963083b5de3b -> smoke

Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 10:06 AM, Jan Beulich wrote:

On 20.05.2022 15:33, Chuck Zmudzinski wrote:

On 5/20/2022 5:41 AM, Jan Beulich wrote:

On 20.05.2022 10:30, Chuck Zmudzinski wrote:

On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why
those want
leaving unaltered. When preparing my earlier patch I did inspect them
and came to the conclusion that these all would also better
observe the
adjusted behavior (or else I couldn't have left pat_enabled() as the
only predicate). In fact, as said in the description of my earlier
patch, in
my debugging I did find the use in i915_gem_object_pin_map() to be
the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c


I wanted to be rather defensive in my changes, but I agree at least
the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).

I wouldn't call it a driver bug, but an incorrect configuration of the
kernel by the user.  I presume X86_FEATURE_PAT is required by the
i915 driver

The driver ought to work fine without PAT (and hence without being
able to make WC mappings). It would use UC instead and be slow, but
it ought to work.


and therefore the driver should refuse to disable
it if the user requests to disable it and instead warn the user that
the driver did not disable the feature, contrary to what the user
requested with the nopat option.

In any case, my test did not verify that when nopat is set in Linux 5.16,
the thread takes the same code path as when nopat is not set,
so I am not totally sure that the reason nopat does not break the
i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT)
returns true even when nopat is set. I could test it with a custom
log message in 5.16 if that is necessary.

Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT)
to return true in 5.16 when the user requests nopat?

No, I'm not saying that. It was wrong for this construct to be used
in the driver, which was fixed for 5.17 (and which had caused the
regression I did observe, leading to the patch as a hopefully least
bad option).


I think that is
just permitting a bad configuration to break the driver that a
well-written operating system should not allow. The i915 driver
was, in my opinion, correctly ignoring the nopat option in 5.16
because that option is not compatible with the hardware the
i915 driver is trying to initialize and setup at boot time. At least
that is my understanding now, but I will need to test it on 5.16
to be sure I understand it correctly.

Also, AFAICT, your patch would break the driver when the nopat
option is set and only fix the regression introduced by bdd8b6c98239
when nopat is not set on my box, so your patch would

Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES

2022-05-20 Thread Roger Pau Monné

On Fri, May 20, 2022 at 02:10:31PM +, Andrew Cooper wrote:
> On 20/05/2022 14:37, Roger Pau Monne wrote:
> > Allow exposing the PDCM bit in CPUID for HVM guests if present on the
> > platform, which in turn allows exposing PERF_CAPABILITIES.  Limit the
> > information exposed in PERF_CAPABILITIES to the LBR format only.
> >
> > This is helpful as hardware without model-specific LBRs set format to
> > 0x3f in order to notify the feature is not present.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Seeing as we have never exposed PDCM in CPUID I wonder whether there's
> > something that I'm missing that makes exposing PERF_CAPABILITIES LBR
> > format not as trivial as it looks.
> > ---
> >  xen/arch/x86/msr.c  | 9 +
> >  xen/include/public/arch-x86/cpufeatureset.h | 2 +-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > index 01a15857b7..423a795d1d 100644
> > --- a/xen/arch/x86/msr.c
> > +++ b/xen/arch/x86/msr.c
> > @@ -316,6 +316,15 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t 
> > *val)
> >  *val = 0;
> >  break;
> >  
> > +case MSR_IA32_PERF_CAPABILITIES:
> > +if ( !cp->basic.pdcm )
> > +goto gp_fault;
> > +
> > +/* Only report LBR format. */
> > +rdmsrl(MSR_IA32_PERF_CAPABILITIES, *val);
> > +*val &= MSR_IA32_PERF_CAP_LBR_FORMAT;
> 
> Urgh.  We should have this info cached from boot.  Except caching on
> boot is broken on hybrid cpus.  The P and E cores of an AlderLake report
> a different format iirc (they differ between linear, and effective addr).
> 
> Given the other pain points with hybrid cpus, I think we can ignore it
> in the short term.
> 
> > +break;
> > +
> >  case MSR_X2APIC_FIRST ... MSR_X2APIC_LAST:
> >  if ( !is_hvm_domain(d) || v != curr )
> >  goto gp_fault;
> > diff --git a/xen/include/public/arch-x86/cpufeatureset.h 
> > b/xen/include/public/arch-x86/cpufeatureset.h
> > index cd6409f9f3..5fdaec43c5 100644
> > --- a/xen/include/public/arch-x86/cpufeatureset.h
> > +++ b/xen/include/public/arch-x86/cpufeatureset.h
> > @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A  
> > Supplemental Streaming SIMD Extensio
> >  XEN_CPUFEATURE(FMA,   1*32+12) /*A  Fused Multiply Add */
> >  XEN_CPUFEATURE(CX16,  1*32+13) /*A  CMPXCHG16B */
> >  XEN_CPUFEATURE(XTPR,  1*32+14) /*   Send Task Priority Messages */
> > -XEN_CPUFEATURE(PDCM,  1*32+15) /*   Perf/Debug Capability MSR */
> > +XEN_CPUFEATURE(PDCM,  1*32+15) /*S  Perf/Debug Capability MSR */
> 
> This is the bit which requires more toolstack logic to safely enable. 
> Using 's' for off-by-default is fine if we want to get the series in now.
> 
> But before we expose the MSR generally, we need to:
> 
> 1) Put the configuration in msr_policy so the toolstack can reason about it
> 2) Reject migration attempts to destinations where the LBR format changes
> 3) Actually put the lBR registers in the migration stream

So far we have allowed guests to enable LBRs (DEBUGCTLMSR.LBR) freely
without any restrictions, and migration of guests using LBRs certainly
won't work currently, hence I wonder why exposing the LBR format makes
it worse as to require all this extra handling.

I'm not saying it's not worth having, but IMO we should better spend
the time in getting architectural LBRs available to guests and
migration safe, and for architectural LBRs we don't really care about
PERF_CAPABILITIES.LBR_FORMAT other than hardcoding it to 0x3f.

Thanks, Roger.

Re: [PATCH 3/5] x86/perf: expose LBR format in PERF_CAPABILITIES

2022-05-20 Thread Andrew Cooper

On 20/05/2022 15:19, Jan Beulich wrote:
> On 20.05.2022 16:10, Andrew Cooper wrote:
>> On 20/05/2022 14:37, Roger Pau Monne wrote:
>>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>>> @@ -135,7 +135,7 @@ XEN_CPUFEATURE(SSSE3, 1*32+ 9) /*A  
>>> Supplemental Streaming SIMD Extensio
>>>  XEN_CPUFEATURE(FMA,   1*32+12) /*A  Fused Multiply Add */
>>>  XEN_CPUFEATURE(CX16,  1*32+13) /*A  CMPXCHG16B */
>>>  XEN_CPUFEATURE(XTPR,  1*32+14) /*   Send Task Priority Messages */
>>> -XEN_CPUFEATURE(PDCM,  1*32+15) /*   Perf/Debug Capability MSR */
>>> +XEN_CPUFEATURE(PDCM,  1*32+15) /*S  Perf/Debug Capability MSR */
>> This is the bit which requires more toolstack logic to safely enable. 
>> Using 's' for off-by-default is fine if we want to get the series in now.
>>
>> But before we expose the MSR generally, we need to:
>>
>> 1) Put the configuration in msr_policy so the toolstack can reason about it
>> 2) Reject migration attempts to destinations where the LBR format changes
> Since this could be quite restrictive, and since people needing to know
> they need to hide this feature for migration to work, I guess this would
> further want qualifying by "did the guest actually use LBRs so far"?

In practice, it's every major generation ("tock" on Intel's old model),
so isn't actually limiting the kinds of heterogeneous setups used in
production.  (Migration gets steadily less stable the further apart the
two CPUs are.)

As to dynamic, no - that would be a security bug in a cloud scenario,
because there must not be anything the guest can do to interfere with
the manageability.

Use of LBR is rare, as demonstrated by the fact that noone has
complained about the fact that migrating such a VM will malfunction.

As we now have a way of reporting "no model-specific LBR", I'm tempted
to suggest that VMs get no LBR by default, and someone wanting LBR has
to opt in, which is also an explicit agreement to the migration limitation.

~Andrew

Re: [PATCH V8 2/2] libxl: Introduce basic virtio-mmio support on Arm

2022-05-20 Thread Anthony PERARD

On Thu, May 19, 2022 at 08:16:16PM +0300, Oleksandr wrote:
> On 18.05.22 14:05, Anthony PERARD wrote:
> > On Tue, May 03, 2022 at 08:26:03PM +0300, Oleksandr Tyshchenko wrote:
> > > +for (i = 0; i < d_config->num_disks; i++) {
> > > +libxl_device_disk *disk = &d_config->disks[i];
> > > +
> > > +if (disk->specification == LIBXL_DISK_SPECIFICATION_VIRTIO) {
> > > +disk->base = alloc_virtio_mmio_base(gc, &virtio_mmio_base);
> > > +if (!disk->base)
> > > +return ERROR_FAIL;
> > > +
> > > +disk->irq = alloc_virtio_mmio_irq(gc, &virtio_mmio_irq);
> > > +if (!disk->irq)
> > > +return ERROR_FAIL;
> > > +
> > > +if (virtio_irq < disk->irq)
> > > +virtio_irq = disk->irq;
> > > +virtio_enabled = true;
> > > +
> > > +LOG(DEBUG, "Allocate Virtio MMIO params for Vdev %s: IRQ %u 
> > > BASE 0x%"PRIx64,
> > > +disk->vdev, disk->irq, disk->base);
> > > +}
> > > +}
> > > +
> > > +if (virtio_enabled)
> > > +nr_spis += (virtio_irq - 32) + 1;
> > Is it possible to update "nr_spis" inside the loop?
> 
> yes, but ...
> 
> 
> >   The added value
> > seems to be "number of virtio device + 1",
> 
>    ... not really ...
> 
> 
> >   so updating "nr_spis" and
> > adding +1 after the loop could work, right?
> 
>    ... from my understanding, we cannot just increment nr_spis by "one"
> inside a loop, we need to calculate it.
> 
> 
> Something like that (not tested):
> 
>    uint32_t spi;
> 
>    ...
> 
>    spi = irq - 32;
>    if (nr_spis <= spi)
>    nr_spis = spi + 1;
> 
> 
> Shall I update "nr_spis" inside the loop?
> 
> Are you asking because of eliminating "virtio_enabled" and/or "virtio_irq"
> locals? They are used down the code.

I'm asking because the calculation doesn't really make sense to me yet. At the
moment "virtio_irq-32+1" happen to be the "number of disk + 1" and we
have "nr_spis += " which I don't think makes sense with the "+1".

Doesn't "nr_spis" only need to be the highest irq value for the devices
we're adding? (Maybe with +1) (also -32 because I think I understand
what 32 stand for now) (also, the "num_irqs" loop just after this loop
seems to do exactly that)

But I think what this line of code needs the most is a comment.

> > Also, what is "32"? Is it "GUEST_VIRTIO_MMIO_SPI_FIRST - 1" ?
> 
> Although currently "GUEST_VIRTIO_MMIO_SPI_FIRST - 1" = "32", we cannot rely
> on this (I mean to use "GUEST_VIRTIO_MMIO_SPI_FIRST - 1"
> 
> instead of "32"),  because "32" has yet another meaning. This is an offset
> for SPI, the values before 32 are for private IRQs (PPI, SGI).

Do you think it could be possible to name that value? I've seen other
use of 32 in the same function that have probably the same meaning. But
if you don't have a good name, I guess we can also live a bit longer
with a plain "32".

Cheers,

-- 
Anthony PERARD

[ovmf test] 170605: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170605 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170605/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1123 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days   11 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

[REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 10:06 AM, Jan Beulich wrote:

On 20.05.2022 15:33, Chuck Zmudzinski wrote:

On 5/20/2022 5:41 AM, Jan Beulich wrote:

On 20.05.2022 10:30, Chuck Zmudzinski wrote:

On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why
those want
leaving unaltered. When preparing my earlier patch I did inspect them
and came to the conclusion that these all would also better
observe the
adjusted behavior (or else I couldn't have left pat_enabled() as the
only predicate). In fact, as said in the description of my earlier
patch, in
my debugging I did find the use in i915_gem_object_pin_map() to be
the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c


I wanted to be rather defensive in my changes, but I agree at least
the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).

I wouldn't call it a driver bug, but an incorrect configuration of the
kernel by the user.  I presume X86_FEATURE_PAT is required by the
i915 driver

The driver ought to work fine without PAT (and hence without being
able to make WC mappings). It would use UC instead and be slow, but
it ought to work.


I am not an expert, but I think the reason it failed on my box was
because of the requirements of CI. Maybe the driver would fall back
to UC if the add_taint_for_CI function did not halt the entire system
in response to the failed test for PAT when trying to use WC mappings.


and therefore the driver should refuse to disable
it if the user requests to disable it and instead warn the user that
the driver did not disable the feature, contrary to what the user
requested with the nopat option.

In any case, my test did not verify that when nopat is set in Linux 5.16,
the thread takes the same code path as when nopat is not set,
so I am not totally sure that the reason nopat does not break the
i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT)
returns true even when nopat is set. I could test it with a custom
log message in 5.16 if that is necessary.

Are you saying it was wrong for
to return true in 5.16 when the user requests nopat?

No, I'm not saying that. It was wrong for this construct to be used
in the driver, which was fixed for 5.17 (and which had caused the
regression I did observe, leading to the patch as a hopefully least
bad option).


Hmm, the patch I used to fix my box with 5.17.6 used
static_cpu_has(X86_FEATURE_PAT) so the driver could
continue to configure the hardware using WC. This is the
relevant part of the patch I used to fix my box, which includes
extra error logs, (against Debian's official build of 5.17.6):

--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c    2022-05-09 
03:16:33.0 -0400
+

[linux-linus test] 170592: tolerable FAIL - PUSHED

2022-05-20 Thread osstest service owner

flight 170592 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170592/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 170581
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 170581
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 170581
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 170581
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 170581
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 170581
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 170581
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 170581
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linux3d7285a335edaf23b699e87c528cf0b0070e3293
baseline version:
 linuxb015dcd62b86d298829990f8261d5d154b8d7af5

Last test of basis   170581  2022-05-19 21:41:24 Z0 days
Testing same since   170592  2022-05-20 06:42:30 Z0 days1 attempts


People who touched revisions under test:
  Herbert Xu 
  Linus Torvalds 
  Ondrej Mosnacek 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64

[ovmf test] 170607: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170607 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170607/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1124 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days   12 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

Re: [OSSTEST PATCH v2] ts-xen-build-prep: Install newer NASM version, to build OVMF

2022-05-20 Thread Ian Jackson

Anthony PERARD writes ("[OSSTEST PATCH v2] ts-xen-build-prep: Install newer 
NASM version, to build OVMF"):
> Recent versions of OVMF now need a version of NASM that is newer
> than the one available on Debian oldstable/buster. They want to use
> NASM 2.15.05 [1], which is available in Debian stable/bullseye. The
> need to use a newer version started with d3febfd9ade3 ("MdePkg:
> Replace Opcode with the corresponding instructions.").
> 
> There is no backport package available but the nasm package from
> Debian Bullseye can easily be installed on Buster as it has few
> dependencies and are already satisfied.
> 
> [1] 
> https://github.com/tianocore/edk2/commit/6a890db161cd6d378bec3499a1e774db3f5a27a7
> ("BaseTools: Upgrade the version of NASM tool")

LGTM assuming Roger is also happy...

Acked-by: Ian Jackson 

-- 
Ian JacksonThese opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.

Re: [REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski

I think this summary of the regression is appropriate for a top-post. 
Details follow below.


commit bdd8b6c98239: introduced what I call a real regression which 
persists in 5.17.x


Jan's proposed patch: 
https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f8...@suse.com/


Jan's patch would fix the real regression introduced by bdd8b6c98239 when
the nopat option is not enabled, but when the nopat option is enabled, this
patch would introduce what Jan calls a "perceived regression" that is really
caused by the failure of the i915 driver to handle the case of the nopat 
option

being provided on the command line properly.

What I request: commit Jan's proposed patch, and backport it to 5.17. That
would fix the real regression and only cause a perceived regression for the
case when nopat is enabled. In that case, patches to the i915 driver
would be helpful but necessary to fix a regression.

Regard,

Chuck Zmudzinski

On 5/20/2022 11:46 AM, Chuck Zmudzinski wrote:

On 5/20/2022 10:06 AM, Jan Beulich wrote:

On 20.05.2022 15:33, Chuck Zmudzinski wrote:

On 5/20/2022 5:41 AM, Jan Beulich wrote:

On 20.05.2022 10:30, Chuck Zmudzinski wrote:

On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why
those want
leaving unaltered. When preparing my earlier patch I did 
inspect them

and came to the conclusion that these all would also better
observe the
adjusted behavior (or else I couldn't have left pat_enabled() 
as the
only predicate). In fact, as said in the description of my 
earlier

patch, in
my debugging I did find the use in i915_gem_object_pin_map() 
to be

the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c

I wanted to be rather defensive in my changes, but I agree at 
least

the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).

I wouldn't call it a driver bug, but an incorrect configuration of the
kernel by the user.  I presume X86_FEATURE_PAT is required by the
i915 driver

The driver ought to work fine without PAT (and hence without being
able to make WC mappings). It would use UC instead and be slow, but
it ought to work.


I am not an expert, but I think the reason it failed on my box was
because of the requirements of CI. Maybe the driver would fall back
to UC if the add_taint_for_CI function did not halt the entire system
in response to the failed test for PAT when trying to use WC mappings.


and therefore the driver should refuse to disable
it if the user requests to disable it and instead warn the user that
the driver did not disable the feature, contrary to what the user
requested with the nopat option.

In any case, my test did not verify that when nopat is set

Re: [REGRESSION} Re: [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-20 Thread Chuck Zmudzinski


On 5/20/2022 1:13 PM, Chuck Zmudzinski wrote:
I think this summary of the regression is appropriate for a top-post. 
Details follow below.


commit bdd8b6c98239: introduced what I call a real regression which 
persists in 5.17.x


Jan's proposed patch: 
https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f8...@suse.com/


Jan's patch would fix the real regression introduced by bdd8b6c98239 when
the nopat option is not enabled, but when the nopat option is enabled, 
this
patch would introduce what Jan calls a "perceived regression" that is 
really
caused by the failure of the i915 driver to handle the case of the 
nopat option

being provided on the command line properly.

What I request: commit Jan's proposed patch, and backport it to 5.17. 
That
would fix the real regression and only cause a perceived regression 
for the

case when nopat is enabled. In that case, patches to the i915 driver
would be helpful but necessary to fix a regression.


Sorry again, I mean patches to i915 would be helpful but *not* necessary
to fix a regression.

Regards,

Chuck Zmudzinski



On 5/20/2022 11:46 AM, Chuck Zmudzinski wrote:

On 5/20/2022 10:06 AM, Jan Beulich wrote:

On 20.05.2022 15:33, Chuck Zmudzinski wrote:

On 5/20/2022 5:41 AM, Jan Beulich wrote:

On 20.05.2022 10:30, Chuck Zmudzinski wrote:

On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:

On 5/20/2022 2:05 AM, Jan Beulich wrote:

On 20.05.2022 06:43, Chuck Zmudzinski wrote:

On 5/4/22 5:14 AM, Juergen Gross wrote:

On 04.05.22 10:31, Jan Beulich wrote:

On 03.05.2022 15:22, Juergen Gross wrote:

... these uses there are several more. You say nothing on why
those want
leaving unaltered. When preparing my earlier patch I did 
inspect them

and came to the conclusion that these all would also better
observe the
adjusted behavior (or else I couldn't have left 
pat_enabled() as the
only predicate). In fact, as said in the description of my 
earlier

patch, in
my debugging I did find the use in i915_gem_object_pin_map() 
to be

the
problematic one, which you leave alone.

Oh, I missed that one, sorry.

That is why your patch would not fix my Haswell unless
it also touches i915_gem_object_pin_map() in
drivers/gpu/drm/i915/gem/i915_gem_pages.c

I wanted to be rather defensive in my changes, but I agree at 
least

the
case in arch_phys_wc_add() might want to be changed, too.

I think your approach needs to be more aggressive so it will fix
all the known false negatives introduced by bdd8b6c98239
such as the one in i915_gem_object_pin_map().

I looked at Jan's approach and I think it would fix the issue
with my Haswell as long as I don't use the nopat option. I
really don't have a strong opinion on that question, but I
think the nopat option as a Linux kernel option, as opposed
to a hypervisor option, should only affect the kernel, and
if the hypervisor provides the pat feature, then the kernel
should not override that,

Hmm, why would the kernel not be allowed to override that? Such
an override would affect only the single domain where the
kernel runs; other domains could take their own decisions.

Also, for the sake of completeness: "nopat" used when running on
bare metal has the same bad effect on system boot, so there
pretty clearly is an error cleanup issue in the i915 driver. But
that's orthogonal, and I expect the maintainers may not even care
(but tell us "don't do that then").

Actually I just did a test with the last official Debian kernel
build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
applied. In fact, the nopat option does *not* break the i915 driver
in 5.16. That is, with the nopat option, the i915 driver loads
normally on both the bare metal and on the Xen hypervisor.
That means your presumption (and the presumption of
the author of bdd8b6c98239) that the "nopat" option was
being observed by the i915 driver is incorrect. Setting "nopat"
had no effect on my system with Linux 5.16. So after doing these
tests, I am against the aggressive approach of breaking the i915
driver with the "nopat" option because prior to bdd8b6c98239,
nopat did not break the i915 driver. Why break it now?

Because that's, in my understanding, is the purpose of "nopat"
(not breaking the driver of course - that's a driver bug -, but
having an effect on the driver).

I wouldn't call it a driver bug, but an incorrect configuration of the
kernel by the user.  I presume X86_FEATURE_PAT is required by the
i915 driver

The driver ought to work fine without PAT (and hence without being
able to make WC mappings). It would use UC instead and be slow, but
it ought to work.


I am not an expert, but I think the reason it failed on my box was
because of the requirements of CI. Maybe the driver would fall back
to UC if the add_taint_for_CI function did not halt the entire system
in response to the failed test for PAT when trying to use WC mappings.


and therefore the driver should refuse to disable
it if the user requests to disable it and instead warn the user that
the dri

Re: [PATCH V8 1/2] libxl: Add support for Virtio disk configuration

2022-05-20 Thread Oleksandr




On 18.05.22 13:52, Anthony PERARD wrote:


Hello Anthony



On Tue, May 03, 2022 at 08:26:02PM +0300, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

This patch adds basic support for configuring and assisting virtio-mmio
based virtio-disk backend (emulator) which is intended to run out of
Qemu and could be run in any domain.
Although the Virtio block device is quite different from traditional
Xen PV block device (vbd) from the toolstack's point of view:
  - as the frontend is virtio-blk which is not a Xenbus driver, nothing
written to Xenstore are fetched by the frontend currently ("vdev"
is not passed to the frontend). But this might need to be revised
in future, so frontend data might be written to Xenstore in order to
support hotplugging virtio devices or passing the backend domain id
on arch where the device-tree is not available.
  - the ring-ref/event-channel are not used for the backend<->frontend
communication, the proposed IPC for Virtio is IOREQ/DM
it is still a "block device" and ought to be integrated in existing
"disk" handling. So, re-use (and adapt) "disk" parsing/configuration
logic to deal with Virtio devices as well.

For the immediate purpose and an ability to extend that support for
other use-cases in future (Qemu, virtio-pci, etc) perform the following
actions:
- Add new disk backend type (LIBXL_DISK_BACKEND_OTHER) and reflect
   that in the configuration
- Introduce new disk "specification" and "transport" fields to struct
   libxl_device_disk. Both are written to the Xenstore. The transport
   field is only used for the specification "virtio" and it assumes
   only "mmio" value for now.
- Introduce new "specification" option with "xen" communication
   protocol being default value.
- Add new device kind (LIBXL__DEVICE_KIND_VIRTIO_DISK) as current
   one (LIBXL__DEVICE_KIND_VBD) doesn't fit into Virtio disk model

Is this still an issue?



Yes, it is.

Both LIBXL__DEVICE_KIND_VBD and LIBXL__DEVICE_KIND_VIRTIO_DISK are for 
disk devices, but they are quite different from toolstack PoV.


They use different specifications/transports. The latter is for 
virtio-blk driver which is not xenbus driver. The way how both Virtio 
backend and frontend get configuration and communicate with each


are completely different from Xen PV drivers.



  Since v5, the "disk/vbd" kind is used. Also see
my comment about libxl_device_disk_get_path() regarding this.


ok






An example of domain configuration for Virtio disk:
disk = [ 'phy:/dev/mmcblk0p3, xvda1, backendtype=other, specification=virtio']

Nothing has changed for default Xen disk configuration.

Please note, this patch is not enough for virtio-disk to work
on Xen (Arm), as for every Virtio device (including disk) we need
to allocate Virtio MMIO params (IRQ and memory region) and pass
them to the backend, also update Guest device-tree. The subsequent
patch will add these missing bits. For the current patch,
the default "irq" and "base" are just written to the Xenstore.
This is not an ideal splitting, but this way we avoid breaking
the bisectability.

Signed-off-by: Oleksandr Tyshchenko 
---
diff --git a/tools/libs/light/libxl_disk.c b/tools/libs/light/libxl_disk.c
index a5ca778..7fd98ce 100644
--- a/tools/libs/light/libxl_disk.c
+++ b/tools/libs/light/libxl_disk.c
@@ -163,6 +163,19 @@ static int libxl__device_disk_setdefault(libxl__gc *gc, 
uint32_t domid,
  rc = libxl__resolve_domid(gc, disk->backend_domname, 
&disk->backend_domid);
  if (rc < 0) return rc;
  
+if (disk->specification == LIBXL_DISK_SPECIFICATION_UNKNOWN)

+disk->specification = LIBXL_DISK_SPECIFICATION_XEN;
+
+/*
+ * The transport field is only used for the specification "virtio" and
+ * it assumes only "mmio" value for now. When there will be a need to add
+ * "pci" support, we will need to remove the enforcement here and
+ * respective assert(s) down the code and let the toolstack to decide
+ * the transport to use.
+ */
+if (disk->specification == LIBXL_DISK_SPECIFICATION_VIRTIO)
+disk->transport = LIBXL_DISK_TRANSPORT_MMIO;

Could you check that `disk->transport` is unset when
`specification==xen` ? And probably return ERROR_INVAL in this case.


yes, will do




Also, I don't think you should overwrite the value set by an application
in _setdefault(). If `specification==virtio`, check first that
`transport` as a supported value (unknown or mmio) then you can then you
can set the `transport` value expected by virtio if it wasn't set by the
application. ( An example of this is done the function already when
enforcing qdisk for cdroms. )


I got it, will do


I transform to something like that:


    if (disk->specification == LIBXL_DISK_SPECIFICATION_XEN &&
    disk->transport != LIBXL_DISK_TRANSPORT_UNKNOWN) {
    LOGD(ERROR, domid, "Transport is only supported for 
specification virtio");

    return ERROR_FAIL;
    }

    /* Force transport mmio for specifica

[ovmf test] 170608: regressions - FAIL

2022-05-20 Thread osstest service owner

flight 170608 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/170608/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 168254
 build-amd64   6 xen-buildfail REGR. vs. 168254
 build-i3866 xen-buildfail REGR. vs. 168254
 build-i386-xsm6 xen-buildfail REGR. vs. 168254

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a

version targeted for testing:
 ovmf a21a3438f795deecb24e1843c1636f95c485017c
baseline version:
 ovmf b1b89f9009f2390652e0061bd7b24fc40732bc70

Last test of basis   168254  2022-02-28 10:41:46 Z   81 days
Failing since168258  2022-03-01 01:55:31 Z   80 days 1125 attempts
Testing same since   170593  2022-05-20 06:42:41 Z0 days   13 attempts


People who touched revisions under test:
  Abdul Lateef Attar 
  Abdul Lateef Attar via groups.io 
  Abner Chang 
  Akihiko Odaki 
  Anthony PERARD 
  Bandaru, Purna Chandra Rao 
  Bo Chang Ke 
  Bob Feng 
  Chao Li 
  Chao, Zhuoran 
  Chen Lin Z 
  Chen, Christine 
  Chen, Lin Z 
  Corvin Köhne 
  Dandan Bi 
  dann frazier 
  Dun Tan 
  duntan 
  Feng, Bob C 
  Gerd Hoffmann 
  Gua Guo 
  Guo Dong 
  Guomin Jiang 
  Hao A Wu 
  Heng Luo 
  Hua Ma 
  Huang, Li-Xia 
  Jagadeesh Ujja 
  Jake Garver 
  Jake Garver via groups.io 
  Jason 
  Jason Lou 
  Jiewen Yao 
  Ke, Bo-ChangX 
  Ken Lautner 
  Kenneth Lautner 
  Kun Qin 
  Kuo, Ted 
  Laszlo Ersek 
  Lean Sheng Tan 
  Leif Lindholm 
  Li, Yi1 
  Li, Zhihao 
  Liming Gao 
  Liu 
  Liu Yun 
  Liu Yun Y 
  Liu, Zhiguang 
  Lixia Huang 
  Lou, Yun 
  Ma, Hua 
  Mara Sophie Grosch 
  Mara Sophie Grosch via groups.io 
  Matt DeVillier 
  Michael D Kinney 
  Michael Kubacki 
  Michael Kubacki 
  Min M Xu 
  Min Xu 
  Oliver Steffen 
  Patrick Rudolph 
  Peter Grehan 
  Purna Chandra Rao Bandaru 
  Ray Ni 
  Rebecca Cran 
  Rebecca Cran 
  Sami Mujawar 
  Sean Rhodes 
  Sean Rhodes sean@starlabs.systems
  Sebastien Boeuf 
  Sunny Wang 
  Tan, Dun 
  Ted Kuo 
  Tom Lendacky 
  Wenyi Xie 
  wenyi,xie via groups.io 
  Xiaolu.Jiang 
  Xie, Yuanhao 
  Yi Li 
  yi1 li 
  Yu Pu 
  Yuanhao Xie 
  Yuwei Chen 
  Zhiguang Liu 
  Zhihao Li 
  Zhuoran Chao 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 6968 lines long.)

1 2 >

1 - 100 of 112 matches

Mail list logo