[libvirt test] 149455: regressions - FAIL

2020-04-06 Thread osstest service owner
flight 149455 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149455/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-build    fail REGR. vs. 146182
 build-i386-libvirt    6 libvirt-build    fail REGR. vs. 146182
 build-arm64-libvirt   6 libvirt-build    fail REGR. vs. 146182
 build-armhf-libvirt   6 libvirt-build    fail REGR. vs. 146182

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  bc210e7ab25945895ea112a62b3435bd2b6b0e8d
baseline version:
 libvirt  a1cd25b919509be2645dbe6f952d5263e0d4e4e5

Last test of basis   146182  2020-01-17 06:00:23 Z   80 days
Failing since        146211  2020-01-18 04:18:52 Z   79 days   76 attempts
Testing same since   149407  2020-04-04 04:18:49 Z    2 days    3 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Arnaud Patard 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Collin Walling 
  Daniel Henrique Barboza 
  Daniel P. Berrangé 
  Daniel Veillard 
  Dario Faggioli 
  Erik Skultety 
  Gaurav Agrawal 
  Han Han 
  Jim Fehlig 
  Jiri Denemark 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Laine Stump 
  Lin Ma 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Mauro S. M. Rodrigues 
  Michal Privoznik 
  Nikolay Shirokovskiy 
  Pavel Hrdina 
  Pavel Mores 
  Peter Krempa 
  Pino Toscano 
  Rafael Fonseca 
  Richard W.M. Jones 
  Rikard Falkeborn 
  Ryan Moeller 
  Sahid Orentino Ferdjaoui 
  Sebastian Mitterle 
  Seeteena Thoufeek 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Thomas Huth 
  Wu Qingliang 
  Your Name 
  Zhang Bo 
  zhenwei pi 
  Zhimin Feng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvops    pass
 build-arm64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops     pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw blocked 
 test-amd64-amd64-libvirt-vhd blocked 

RE: [PATCH 5/7] xen: include xen/guest_access.h rather than asm/guest_access.h

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Julien Grall 
> Sent: 04 April 2020 14:10
> To: xen-devel@lists.xenproject.org
> Cc: jul...@xen.org; Julien Grall ; Stefano Stabellini 
> ;
> Volodymyr Babchuk ; Andrew Cooper 
> ; George
> Dunlap ; Ian Jackson ; 
> Jan Beulich
> ; Wei Liu ; Roger Pau Monné 
> ; Paul Durrant
> ; Jun Nakajima ; Kevin Tian 
> 
> Subject: [PATCH 5/7] xen: include xen/guest_access.h rather than 
> asm/guest_access.h
> 
> From: Julien Grall 
> 
> Only a few places are actually including asm/guest_access.h. While this
> is fine today, a follow-up patch will want to move most of the helpers
> from asm/guest_access.h to xen/guest_access.h.
> 
> To prepare the move, everyone should include xen/guest_access.h rather
> than asm/guest_access.h.
> 
> Interestingly, asm-arm/guest_access.h includes xen/guest_access.h. The
> inclusion is now removed as no-one but the latter should include the
> former.
> 
> Signed-off-by: Julien Grall 
> ---
>  xen/arch/arm/decode.c|  2 +-
>  xen/arch/arm/domain.c|  2 +-
>  xen/arch/arm/guest_walk.c|  3 ++-
>  xen/arch/arm/guestcopy.c |  2 +-
>  xen/arch/arm/vgic-v3-its.c   |  2 +-
>  xen/arch/x86/hvm/svm/svm.c   |  2 +-
>  xen/arch/x86/hvm/viridian/viridian.c |  2 +-
>  xen/arch/x86/hvm/vmx/vmx.c   |  2 +-
>  xen/common/libelf/libelf-loader.c|  2 +-
>  xen/include/asm-arm/guest_access.h   |  1 -
>  xen/include/asm-x86/guest_access.h   | 22 --
>  xen/lib/x86/private.h|  2 +-
>  12 files changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/arch/arm/decode.c b/xen/arch/arm/decode.c
> index 144793c8ce..792c2e92a7 100644
> --- a/xen/arch/arm/decode.c
> +++ b/xen/arch/arm/decode.c
> @@ -17,12 +17,12 @@
>   * GNU General Public License for more details.
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> 
>  #include 
> -#include 
> 
>  #include "decode.h"
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 2190d908eb..b062c232b6 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -26,7 +27,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
> index a1cdd7f4af..b4496c4c86 100644
> --- a/xen/arch/arm/guest_walk.c
> +++ b/xen/arch/arm/guest_walk.c
> @@ -16,8 +16,9 @@
>   */
> 
>  #include 
> +#include 
>  #include 
> -#include 
> +
>  #include 
>  #include 
> 
> diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
> index c8023e2bca..32681606d8 100644
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -1,10 +1,10 @@
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> 
>  #include 
> -#include 
> 
>  #define COPY_flush_dcache   (1U << 0)
>  #define COPY_from_guest (0U << 1)
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 6e153c698d..58d939b85f 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -39,7 +40,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 888f504a94..9e14a451eb 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -16,6 +16,7 @@
>   * this program; If not, see .
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -34,7 +35,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
> b/xen/arch/x86/hvm/viridian/viridian.c
> index 977c1bc54f..dc7183a546 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -5,12 +5,12 @@
>   * Hypervisor Top Level Functional Specification for more information.
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 1c398fdb6e..98e9c91ea3 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -15,6 +15,7 @@
>   * this program; If not, see .
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -31,7 +32,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/common/libelf/libelf-loader.c 
> b/xen/common/libelf/libelf-loader.c
> index 0f468727d0..629cc0d3e6 100644
> --- a/xen/common/libelf/libelf-loader.c
> +++ b/xen/common/libelf/libelf-loader.c
> @@ -16,7 +16,7 @@
>   */
> 
>  #ifdef __XEN__
> -#i

RE: Live migration and PV device handling

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Xen-devel  On Behalf Of Dongli 
> Zhang
> Sent: 03 April 2020 23:33
> To: Andrew Cooper ; Anastassios Nanos 
> ; xen-
> de...@lists.xen.org
> Subject: Re: Live migration and PV device handling
> 
> Hi Andrew,
> 
> On 4/3/20 5:42 AM, Andrew Cooper wrote:
> > On 03/04/2020 13:32, Anastassios Nanos wrote:
> >> Hi all,
> >>
> >> I am trying to understand how live-migration happens in xen. I am
> >> looking in the HVM guest case and I have dug into the relevant parts
> >> of the toolstack and the hypervisor regarding memory, vCPU context
> >> etc.
> >>
> >> In particular, I am interested in how PV device migration happens. I
> >> assume that the guest is not aware of any suspend/resume operations
> >> being done
> >
> > Sadly, this assumption is not correct.  HVM guests with PV drivers
> > currently have to be aware in exactly the same way as PV guests.
> >
> > Work is in progress to try and address this.  See
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> > (sorry - for some reason that doc isn't being rendered properly in
> > https://xenbits.xen.org/docs/ )
> >
> 
> I read below from the commit:
> 
> +* The toolstack chooses a randomized domid for initial creation or default
> +migration, but preserves the source domid for non-cooperative migration.
> +Non-Cooperative migration will have to be denied if the domid is
> +unavailable on the target host, but randomization of the domid on creation
> +should hopefully minimize the likelihood of this. Non-Cooperative migration
> +to localhost will clearly not be possible.
> 
> Does that indicate that, while the scope of domid_t is a single server in the
> old design, the scope of domid_t is a cluster of servers in the new design?
> 
> That is, the domid should be unique in the cluster of all servers if we expect
> non-cooperative migration to always succeed?
> 
> 

That would be necessary to guarantee success (or rather, guarantee no failure
due to a domid clash), but the scope of xl/libxl is a single server, hence
randomization is the best we have to reduce clashes to a minimum.
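The randomization Paul describes can be sketched as follows. The helper name and the retry bound are hypothetical, not the actual libxl implementation; only DOMID_FIRST_RESERVED mirrors Xen's real reserved range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t domid_t;

/* Xen reserves domids from 0x7FF0 upwards (DOMID_FIRST_RESERVED). */
#define DOMID_FIRST_RESERVED 0x7FF0U

/* Pick a random free domid at domain creation, so that a later
 * non-cooperative migration (which must keep the domid) is unlikely
 * to clash on the target host. Hypothetical helper, not libxl code. */
static int pick_random_domid(const bool used[DOMID_FIRST_RESERVED],
                             domid_t *out)
{
    for (int tries = 0; tries < 1024; tries++) {
        domid_t d = (domid_t)(rand() % DOMID_FIRST_RESERVED);

        if (d == 0 || used[d])
            continue; /* domid 0 is the hardware domain */

        *out = d;
        return 0;
    }

    return -1; /* caller would fall back to a sequential scan */
}
```

As noted above, this only minimizes clashes: two independently managed hosts can still collide, so Non-Cooperative migration has to be denied when the incoming domid is already in use on the target.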

  Paul




[qemu-mainline test] 149447: regressions - FAIL

2020-04-06 Thread osstest service owner
flight 149447 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149447/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install   fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install   fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install  fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-libvirt-pair 21 guest-start/debian   fail REGR. vs. 144861
 test-amd64-amd64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-libvirt  12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-pair 21 guest-start/debian  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-xsm  12 guest-start  fail REGR. vs. 144861
 test-arm64-arm64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-armhf-armhf-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-amd64-libvirt-vhd 10 debian-di-install    fail REGR. vs. 144861
 test-amd64-amd64-xl-qcow2    10 debian-di-install    fail REGR. vs. 144861
 test-armhf-armhf-xl-vhd  10 debian-di-install    fail REGR. vs. 144861
 test-armhf-armhf-libvirt-raw 10 debian-di-install    fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install   fail REGR. vs. 144861

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-rtds   16 guest-localmigrate fail in 149409 pass in 149447
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat  fail pass in 149409

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-amd 20 guest-start/debian.repeat fail baseline untested
 test-amd64-amd64-dom0pvh-xl-intel 18 guest-localmigrate/x10 fail in 149409 baseline untested
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 144861
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl          13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          14 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check       fail   never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check   fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          13 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     14 saverestore-support-check    fail   never pass
 test-

[PATCH OSSTEST] linux: enable x2APIC kernel support

2020-04-06 Thread Roger Pau Monne
Without it, Linux is not able to parse the x2APIC ACPI MADT entries
crafted by Xen when booted in PVH mode. The following log is from one of
the dom0pvh jobs:

ACPI: x2apic entry ignored
ACPI: x2apic entry ignored
ACPI: x2apic entry ignored
ACPI: x2apic entry ignored
IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
IOAPIC[1]: apic_id 1, version 17, address 0xfec2, GSI 24-55
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
Using ACPI (MADT) for SMP configuration information
smpboot: Boot CPU (id 0) not listed by BIOS
smpboot: Allowing 1 CPUs, 0 hotplug CPUs

Note that PVH mode only creates x2APIC entries for simplicity, and
because x2APIC mode is always provided to PVH guests. Not adding
x2APIC support forces Linux to boot in UP mode, since x2APIC entries
contain the information of additional processors available on the
system.

Signed-off-by: Roger Pau Monné 
---
 ts-kernel-build | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ts-kernel-build b/ts-kernel-build
index 89cdafcb..6c8f1d6a 100755
--- a/ts-kernel-build
+++ b/ts-kernel-build
@@ -622,6 +622,9 @@ esac
 # Disable components that don't build
 setopt CONFIG_TEGRA_HOST1X n
 
+# Enable x2APIC support for PVH mode
+setopt CONFIG_X86_X2APIC y
+
 exit 0
 END
 }
-- 
2.26.0




[PATCH v2] tools/libxl: make default of max event channels dependent on vcpus

2020-04-06 Thread Juergen Gross
Since Xen 4.4 the maximum number of event channels for a guest has
defaulted to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.

Instead of requiring the allowed number of event channels to be
specified via the "event_channels" domain config option, make the
default depend on the maximum number of vcpus of the guest. This will
require the "event_channels" domain config option in fewer cases than
before.

As different guests will have differing needs, the calculation of the
maximum number of event channels can only be a rough estimate, currently
based on the Linux kernel's requirements. Nevertheless it is much better
than today's static upper limit, as more guests will boot just fine with
the new approach.

In order not to regress current configs use 1023 as the minimum
default setting.
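Worked through as a standalone sketch (the helper name is illustrative; only the max(1023, vcpus * 8 + 255) expression mirrors the one-line change below):

```c
#include <assert.h>

/* Sketch of the new default: keep the historic 1023 floor, but scale
 * with the vcpu count (roughly 8 channels per vcpu for a Linux-like
 * guest, plus 255 of headroom). */
static unsigned int default_event_channels(unsigned int max_vcpus)
{
    unsigned int scaled = max_vcpus * 8 + 255;

    return scaled > 1023 ? scaled : 1023;
}
```

For example, a 4-vcpu guest keeps the historic 1023 default, while a 256-vcpu guest now gets 256 * 8 + 255 = 2303 channels, comfortably above the roughly 7 per vcpu that Linux needs.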

Signed-off-by: Juergen Gross 
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
 tools/libxl/libxl_create.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 b_info->iomem[i].gfn = b_info->iomem[i].start;
 
 if (!b_info->event_channels)
-b_info->event_channels = 1023;
+b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);
 
 libxl__arch_domain_build_info_setdefault(gc, b_info);
 libxl_defbool_setdefault(&b_info->dm_restrict, false);
-- 
2.16.4




RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Julien Grall 
> Sent: 03 April 2020 18:24
> To: p...@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Andrew Cooper' ; 'George Dunlap' 
> ; 'Ian
> Jackson' ; 'Jan Beulich' ; 
> 'Stefano Stabellini'
> ; 'Wei Liu' 
> Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for 
> save/restore of 'domain' context
> 
> Hi Paul,
> 
> On 03/04/2020 16:55, Paul Durrant wrote:
> >> -Original Message-
> > [snip]
> >>> +
> >>> +#include 
> >>> +
> >>> +struct domain_context {
> >>> +bool log;
> >>> +struct domain_save_descriptor desc;
> >>> +domain_copy_entry copy;
> >>
> >> As your new framework is technically an extension of existing one, it
> >> would be good to explain why we diverge in the definitions.
> >>
> >
> > I don't follow. What is diverging? I explain in the commit comment that 
> > this is a parallel
> framework. Do I need to justify why it is not a carbon copy of the HVM one?
> 
> Well, they are both restoring/saving guest state. The only difference is
> the existing one is focusing on HVM state.
> 
> So it would make sense long term to have only one hypercall and tell
> what you want to save. In fact, some of the improvement here would
> definitely make the HVM one nicer to use (at least in the context of LU).
> 

I guess we could move the HVM save records over to the new framework, but it 
works for the moment so I don't want to bring it into scope now.

>  From the commit message, it is not clear to me why a new framework and
> why the infrastructure is at the same time different but not.
> 

An alternative would be to move the HVM save code into common code and then try 
to adapt it. I think that would result in more code churn and ultimately be 
harder to review. The extra infrastructure introduced here is fairly minimal 
and, for the moment, only targeting PV state. As I said above there's nothing 
stopping the HVM records being ported over later once any initial issues have 
been shaken out.

  Paul






Re: [Xen-devel] [PATCH 0/5] use new API for Xen page tables

2020-04-06 Thread Hongyan Xia
Ping.

On Mon, 2020-03-23 at 09:41 +, Hongyan Xia wrote:
> From: Hongyan Xia 
> 
> This small series is basically just rewriting functions using the new
> API to map and unmap PTEs. Each patch is independent.
> 
> Apart from mapping and unmapping page tables, no other functional
> change
> intended.
> 
> Wei Liu (5):
>   x86/shim: map and unmap page tables in replace_va_mapping
>   x86_64/mm: map and unmap page tables in m2p_mapped
>   x86_64/mm: map and unmap page tables in share_hotadd_m2p_table
>   x86_64/mm: map and unmap page tables in destroy_compat_m2p_mapping
>   x86_64/mm: map and unmap page tables in destroy_m2p_mapping
> 
>  xen/arch/x86/pv/shim.c | 10 ---
>  xen/arch/x86/x86_64/mm.c   | 55 +---
> --
>  xen/include/asm-x86/page.h | 18 +
>  3 files changed, 62 insertions(+), 21 deletions(-)
> 




Re: [PATCH v2 2/2] xen: credit2: fix credit reset happening too few times

2020-04-06 Thread Dario Faggioli
On Thu, 2020-04-02 at 18:32 +, George Dunlap wrote:
> > On Mar 19, 2020, at 12:12 AM, Dario Faggioli 
> > wrote:
> > 
> > There is a bug in commit 5e4b4199667b9 ("xen: credit2: only reset
> > credit on reset condition"). In fact, the aim of that commit was to
> > make sure that we do not perform too many credit reset operations
> > (which are not super cheap, and in an hot-path). But the check used
> > to determine whether a reset is necessary was the wrong one.
> > 
> > In fact, knowing just that some vCPUs have been skipped, while
> > traversing the runqueue (in runq_candidate()), is not enough. We
> > need to check explicitly whether the first vCPU in the runqueue
> > has a negative amount of credit.
> 
> Oh, so if the top of the runqueue has negative credit, but it’s not
> chosen, then the one we *do* run has even lower credit.  Still not
> quite sure how that leads to a situation where credit resets don’t
> happen for long periods of time.  But anyway...
> 
Fact is, without this patch, we wouldn't call reset_credit() if we
"skipped" a vcpu/unit.

That means we skipped the reset every time we did not pick the head of
the runqueue, even if the unit we picked also had negative credits (as
the check for 'skipped_units == 0' was the first condition of the '&&').

That's what was making us skip resets. :-)
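The difference can be reduced to two predicates (a simplified, standalone sketch; the real code in xen/common/sched/credit2.c is more involved and uses different names):

```c
#include <assert.h>
#include <stdbool.h>

/* Before the fix: a reset could only happen when the scheduler picked
 * the head of the runqueue (skipped_units == 0), so a pick further
 * down the queue suppressed the reset even with credit exhausted. */
static bool reset_needed_old(unsigned int skipped_units, int picked_credit)
{
    return skipped_units == 0 && picked_credit < 0;
}

/* After the fix: reset whenever the head of the runqueue has gone
 * negative, regardless of whether the head was the unit picked. */
static bool reset_needed_new(int head_credit)
{
    return head_credit < 0;
}
```

With the old predicate, skipping even one unit (skipped_units > 0) made the reset impossible no matter how negative the credits were; the new one keys purely off the head's credit.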

> Acked-by: George Dunlap 
> 
Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/





Re: [PATCH 5/7] xen: include xen/guest_access.h rather than asm/guest_access.h

2020-04-06 Thread Julien Grall

Hi Paul,

On 06/04/2020 08:40, Paul Durrant wrote:

diff --git a/xen/include/asm-x86/guest_access.h 
b/xen/include/asm-x86/guest_access.h
index 9ee275d01f..5c3dfc47b6 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -54,22 +54,24 @@

  /* Cast a XEN_GUEST_HANDLE to XEN_GUEST_HANDLE_PARAM */
  #define guest_handle_to_param(hnd, type) ({  \
+typeof((hnd).p) _x = (hnd).p;\
+XEN_GUEST_HANDLE_PARAM(type) _y = { _x };\
  /* type checking: make sure that the pointers inside \
   * XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM are of\
   * the same type, then return hnd */ \
-(void)((typeof(&(hnd).p)) 0 ==   \
-(typeof(&((XEN_GUEST_HANDLE_PARAM(type)) {}).p)) 0); \
-(hnd);   \
+(void)(&_x == &_y.p);\
+_y;  \
  })

  /* Cast a XEN_GUEST_HANDLE_PARAM to XEN_GUEST_HANDLE */
-#define guest_handle_from_param(hnd, type) ({\
-/* type checking: make sure that the pointers inside \
- * XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM are of\
- * the same type, then return hnd */ \
-(void)((typeof(&(hnd).p)) 0 ==   \
-(typeof(&((XEN_GUEST_HANDLE_PARAM(type)) {}).p)) 0); \
-(hnd);   \
+#define guest_handle_from_param(hnd, type) ({   \
+typeof((hnd).p) _x = (hnd).p;   \
+XEN_GUEST_HANDLE(type) _y = { _x }; \
+/* type checking: make sure that the pointers inside\
+ * XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM are of   \
+ * the same type, then return hnd */\
+(void)(&_x == &_y.p);   \
+_y; \
  })
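The `(void)(&_x == &_y.p)` comparison in these rewritten macros is a standard C idiom for compile-time type checking; a standalone illustration (the macro name here is hypothetical):

```c
#include <assert.h>

/* Comparing two pointers makes the compiler diagnose incompatible
 * pointee types (-Wcompare-distinct-pointer-types, an error under
 * -Werror), while the comparison itself is discarded at run time. */
#define CHECK_SAME_TYPE(a, b) ((void)(&(a) == &(b)))

static int demo(void)
{
    int x = 1;
    int y = 2;

    CHECK_SAME_TYPE(x, y); /* fine: both int */
    /* long z = 0; CHECK_SAME_TYPE(x, z); -- would be diagnosed */

    return x + y;
}
```

Unlike the old `(typeof(&(hnd).p)) 0 == ...` form, taking the address of real local variables also avoids evaluating the handle expression twice.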



The commit comment would have the reader believe that this patch is just some 
changes in header file inclusion. These last two hunks are something else so I 
would suggest they get split out into a separate patch.


These two chunks were meant to be squashed in patch #6, but I messed up 
the rebase. I will fix on the next version.


Sorry for that.

Cheers,



   Paul


  #define guest_handle_for_field(hnd, type, fld)  \
diff --git a/xen/lib/x86/private.h b/xen/lib/x86/private.h
index b793181464..2d53bd3ced 100644
--- a/xen/lib/x86/private.h
+++ b/xen/lib/x86/private.h
@@ -4,12 +4,12 @@
  #ifdef __XEN__

  #include 
+#include 
  #include 
  #include 
  #include 
  #include 

-#include 
  #include 

  #define copy_to_buffer_offset copy_to_guest_offset
--
2.17.1





--
Julien Grall



Re: [PATCH] xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()

2020-04-06 Thread Roger Pau Monné
On Fri, Apr 03, 2020 at 11:00:34AM +0200, Juergen Gross wrote:
> Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for
> large array allocation") didn't fix the issue it was meant to, as the
> flags for allocating the memory are GFP_NOIO, which will lead the
> memory allocation falling back to kmalloc().
> 
> So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation
> in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section.
> 
> Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large 
> array allocation")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Juergen Gross 

Acked-by: Roger Pau Monné 

Thanks!



RE: [PATCH 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Julien Grall 
> Sent: 01 April 2020 14:42
> To: Paul Durrant ; xen-devel@lists.xenproject.org
> Cc: Daniel De Graaf ; Ian Jackson 
> ; Wei Liu
> ; Andrew Cooper ; George Dunlap 
> ; Jan
> Beulich ; Stefano Stabellini 
> Subject: Re: [PATCH 2/5] xen/common/domctl: introduce 
> XEN_DOMCTL_get/setdomaincontext
> 
> Hi Paul,
> 
> On 27/03/2020 18:50, Paul Durrant wrote:
> > These domctls provide a mechanism to get and set domain context from
> > the toolstack.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Daniel De Graaf 
> > Cc: Ian Jackson 
> > Cc: Wei Liu 
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Jan Beulich 
> > Cc: Julien Grall 
> > Cc: Stefano Stabellini 
> > ---
> >   tools/flask/policy/modules/xen.if   |   4 +-
> >   tools/libxc/include/xenctrl.h   |  11 +++
> >   tools/libxc/xc_domain.c |  52 +
> >   xen/common/domctl.c | 115 
> >   xen/include/public/domctl.h |  41 +-
> >   xen/xsm/flask/hooks.c   |   6 ++
> >   xen/xsm/flask/policy/access_vectors |   4 +
> >   7 files changed, 230 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/flask/policy/modules/xen.if 
> > b/tools/flask/policy/modules/xen.if
> > index 8eb2293a52..2bc9db4f64 100644
> > --- a/tools/flask/policy/modules/xen.if
> > +++ b/tools/flask/policy/modules/xen.if
> > @@ -53,7 +53,7 @@ define(`create_domain_common', `
> > allow $1 $2:domain2 { set_cpu_policy settsc setscheduler setclaim
> > set_vnumainfo get_vnumainfo cacheflush
> > psr_cmt_op psr_alloc soft_reset
> > -   resource_map get_cpu_policy };
> > +   resource_map get_cpu_policy setcontext };
> > allow $1 $2:security check_context;
> > allow $1 $2:shadow enable;
> > allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
> > mmuext_op updatemp };
> > @@ -97,7 +97,7 @@ define(`migrate_domain_out', `
> > allow $1 $2:hvm { gethvmc getparam };
> > allow $1 $2:mmu { stat pageinfo map_read };
> > allow $1 $2:domain { getaddrsize getvcpucontext pause destroy };
> > -   allow $1 $2:domain2 gettsc;
> > +   allow $1 $2:domain2 { gettsc getcontext };
> > allow $1 $2:shadow { enable disable logdirty };
> >   ')
> >
> > diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> > index fc6e57a1a0..5c0d0d27a4 100644
> > --- a/tools/libxc/include/xenctrl.h
> > +++ b/tools/libxc/include/xenctrl.h
> > @@ -867,6 +867,17 @@ int xc_domain_hvm_setcontext(xc_interface *xch,
> >uint8_t *hvm_ctxt,
> >uint32_t size);
> >
> > +int xc_domain_getcontext(xc_interface *xch,
> > + uint32_t domid,
> > + uint32_t mask,
> > + uint8_t *ctxt_buf,
> 
> Why do you use uint8_t rather than void here?
> 
> > + uint32_t size);
> > +int xc_domain_setcontext(xc_interface *xch,
> > + uint32_t domid,
> > + uint32_t mask,
> > + uint8_t *ctxt_buf,
> > + uint32_t size);
> > +
> >   /**
> >* This function will return guest IO ABI protocol
> >*
> > diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> > index 71829c2bce..15bcf671fc 100644
> > --- a/tools/libxc/xc_domain.c
> > +++ b/tools/libxc/xc_domain.c
> > @@ -537,6 +537,58 @@ int xc_domain_hvm_setcontext(xc_interface *xch,
> >   return ret;
> >   }
> >
> > +int xc_domain_getcontext(xc_interface *xch,
> > + uint32_t domid,
> > + uint32_t mask,
> > + uint8_t *ctxt_buf,
> > + uint32_t size)
> > +{
> > +int ret;
> > +DECLARE_DOMCTL;
> > +DECLARE_HYPERCALL_BOUNCE(ctxt_buf, size, 
> > XC_HYPERCALL_BUFFER_BOUNCE_OUT);
> > +
> > +if ( xc_hypercall_bounce_pre(xch, ctxt_buf) )
> > +return -1;
> > +
> > +domctl.cmd = XEN_DOMCTL_getdomaincontext;
> > +domctl.domain = domid;
> > +domctl.u.domaincontext.mask = mask;
> > +domctl.u.domaincontext.size = size;
> > +set_xen_guest_handle(domctl.u.domaincontext.buffer, ctxt_buf);
> > +
> > +ret = do_domctl(xch, &domctl);
> > +
> > +xc_hypercall_bounce_post(xch, ctxt_buf);
> > +
> > +return !ret ? domctl.u.domaincontext.size : -1;
> 
> Looking at the domctl interface, size would be a 32-bit unsigned value.
> However the return is a 32-bit signed value. This is a call for trouble
> if the size is too big.
> 
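The hazard Julien points out can be shown in isolation (a standalone sketch mirroring the `!ret ? size : -1` pattern above, not the actual libxc code):

```c
#include <assert.h>
#include <stdint.h>

/* Returning a uint32_t size through an int: on common two's-complement
 * implementations any size above INT_MAX comes back negative, and is
 * then indistinguishable from the -1 error path. */
static int size_or_error(int ret, uint32_t size)
{
    return !ret ? size : -1; /* implicit uint32_t -> int conversion */
}
```

A caller checking `< 0` for errors would misclassify any context larger than 2 GiB, which is why the size and the error indication should travel through separate, correctly typed channels.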
> > +}
> > +
> > +int xc_domain_setcontext(xc_interface *xch,
> > + uint32_t domid,
> > + uint32_t mask,
> > + uint8_t *ctxt_buf,
> 
> The context is not meant to be modified. So this should really be const.
> 

All of the above were basically inherited via 

Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Julien Grall

Hi Paul,

On 06/04/2020 09:27, Paul Durrant wrote:

-Original Message-
From: Julien Grall 
Sent: 03 April 2020 18:24
To: p...@xen.org; xen-devel@lists.xenproject.org
Cc: 'Andrew Cooper' ; 'George Dunlap' 
; 'Ian
Jackson' ; 'Jan Beulich' ; 
'Stefano Stabellini'
; 'Wei Liu' 
Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore 
of 'domain' context

Hi Paul,

On 03/04/2020 16:55, Paul Durrant wrote:

-Original Message-

[snip]

+
+#include 
+
+struct domain_context {
+bool log;
+struct domain_save_descriptor desc;
+domain_copy_entry copy;


As your new framework is technically an extension of existing one, it
would be good to explain why we diverge in the definitions.



I don't follow. What is diverging? I explain in the commit comment that this is 
a parallel

framework. Do I need to justify why it is not a carbon copy of the HVM one?

Well, they are both restoring/saving guest state. The only difference is
the existing one is focusing on HVM state.

So it would make sense long term to have only one hypercall and tell
what you want to save. In fact, some of the improvement here would
definitely make the HVM one nicer to use (at least in the context of LU).



I guess we could move the HVM save records over to the new framework, but it 
works for the moment so I don't want to bring it into scope now.


And I agree, hence why I say "long term" :).




  From the commit message, it is not clear to me why a new framework and
why the infrastructure is at the same time different but not.



An alternative would be to move the HVM save code into common code and then try 
to adapt it. I think that would result in more code churn and ultimately be 
harder to review. The extra infrastructure introduced here is fairly minimal 
and, for the moment, only targeting PV state. As I said above there's nothing 
stopping the HVM records being ported over later once any initial issues have 
been shaken out.


Code churn is always going to be necessary one day or another if we want to
consolidate the two.


Having two frameworks is not without risks. There are a few unknowns to
be answered:

  * Is there any dependency between the two? If yes, what is the ordering?
  * The name of the hypercall does not say anything about "PV". So a 
contributor could think it would be fine to save/restore new HVM state 
in the new generic hypercall. Is it going to be an issue? If so, how do 
we prevent it?

  * Are we going to deprecate the existing framework?

I am not suggesting we should not go with two frameworks, but the 
reasons and implications are not clear to me. Hence, why I think the 
commit message should be expanded with some rationale.


Cheers,

--
Julien Grall



RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Julien Grall 
> Sent: 06 April 2020 10:08
> To: p...@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Andrew Cooper' ; 'George Dunlap' 
> ; 'Ian
> Jackson' ; 'Jan Beulich' ; 
> 'Stefano Stabellini'
> ; 'Wei Liu' 
> Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for 
> save/restore of 'domain' context
> 
> Hi Paul,
> 
> On 06/04/2020 09:27, Paul Durrant wrote:
> >> -Original Message-
> >> From: Julien Grall 
> >> Sent: 03 April 2020 18:24
> >> To: p...@xen.org; xen-devel@lists.xenproject.org
> >> Cc: 'Andrew Cooper' ; 'George Dunlap' 
> >> ; 'Ian
> >> Jackson' ; 'Jan Beulich' ; 
> >> 'Stefano Stabellini'
> >> ; 'Wei Liu' 
> >> Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for 
> >> save/restore of 'domain' context
> >>
> >> Hi Paul,
> >>
> >> On 03/04/2020 16:55, Paul Durrant wrote:
>  -Original Message-
> >>> [snip]
> > +
> > +#include 
> > +
> > +struct domain_context {
> > +bool log;
> > +struct domain_save_descriptor desc;
> > +domain_copy_entry copy;
> 
>  As your new framework is technically an extension of existing one, it
>  would be good to explain why we diverge in the definitions.
> 
> >>>
> >>> I don't follow. What is diverging? I explain in the commit comment that 
> >>> this is a parallel
> >> framework. Do I need to justify why it is not a carbon copy of the HVM one?
> >>
> >> Well, they are both restoring/saving guest state. The only difference is
> >> the existing one is focusing on HVM state.
> >>
> >> So it would make sense long term to have only one hypercall and tell
> >> what you want to save. In fact, some of the improvement here would
> >> definitely make the HVM one nicer to use (at least in the context of LU).
> >>
> >
> > I guess we could move the HVM save records over to the new framework, but 
> > it works for the moment so
> I don't want to bring it into scope now.
> 
> And I agree, hence why I say "long term" :).
> 
> >
> >>   From the commit message, it is not clear to me why a new framework and
> >> why the infrastructure is at the same time different but not.
> >>
> >
> > An alternative would be to move the HVM save code into common code and then 
> > try to adapt it. I think
> that would result in more code churn and ultimately be harder to review. The 
> extra infrastructure
> introduced here is fairly minimal and, for the moment, only targeting PV 
> state. As I said above
> there's nothing stopping the HVM records being ported over later once any 
> initial issues have been
> shaken out.
> 
> Code churn is always going to be necessary one day or another if we want to
> consolidate the two.
> 
> Having two frameworks is not without risks. There are a few unknowns to
> be answered:
>* Is there any dependency between the two? If yes, what is the ordering?

There isn't any dependency at the moment so need I say anything? If an ordering 
dependency is introduced by a future patch then surely that would be time to 
call it out.

>* The name of the hypercall does not say anything about "PV". So a
> contributor could think it would be fine to save/restore new HVM state
> in the new generic hypercall. Is it going to be an issue? If so, how do
> we prevent it?

The commit message says:

"Domain context is state held in the hypervisor that does not come under
the category of 'HVM state' but is instead 'PV state' that is common
between PV guests and enlightened HVM guests (i.e. those that have PV
drivers) such as event channel state, grant entry state, etc."

Do you think this should also appear in a comment? Do we really care? Nothing 
fundamentally prevents the mechanism being used for HVM state, but that may 
introduce an ordering dependency.

>* Are we going to deprecate the existing framework?
> 

There is no intention as yet.

> I am not suggesting we should not go with two frameworks, but the
> reasons and implications are not clear to me. Hence, why I think the
> commit message should be expanded with some rationale.
> 

Ok, I can add a paragraph to try to explain a little more.

  Paul




Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Julien Grall

Hi Jurgen,

On 06/04/2020 09:27, Juergen Gross wrote:

Since Xen 4.4 the maximum number of event channels for a guest is
defaulting to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.


Large guests on which arch? Which type of guests?


Instead of requiring to specify the allowed number of event channels
via the "event_channels" domain config option, make the default
depend on the maximum number of vcpus of the guest. This will require
to use the "event_channels" domain config option in fewer cases as
before.

As different guests will have differing needs the calculation of the
maximum number of event channels can be a rough estimate only,
currently based on the Linux kernel requirements.


I am not overly happy to extend the default numbers of event channels 
for everyone based on Linux behavior on a given setup. Yes you have more 
guests that would be able to run, but at the expense of allowing a guest 
to use more xen memory.


For instance, I don't think this limit increase is necessary on Arm.


Nevertheless it is
much better than the static upper limit of today as more guests will
boot just fine with the new approach.

In order not to regress current configs use 1023 as the minimum
default setting.

Signed-off-by: Juergen Gross 
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
  tools/libxl/libxl_create.c | 2 +-


The documentation should be updated.


  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
  b_info->iomem[i].gfn = b_info->iomem[i].start;
  
  if (!b_info->event_channels)

-b_info->event_channels = 1023;
+b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);


What is the 255 for?

Cheers,

--
Julien Grall



Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Julien Grall

Hi Paul,

On 06/04/2020 10:18, Paul Durrant wrote:

-Original Message-
From: Julien Grall 
Sent: 06 April 2020 10:08
To: p...@xen.org; xen-devel@lists.xenproject.org
Cc: 'Andrew Cooper' ; 'George Dunlap' 
; 'Ian
Jackson' ; 'Jan Beulich' ; 
'Stefano Stabellini'
; 'Wei Liu' 
Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore 
of 'domain' context

Hi Paul,

On 06/04/2020 09:27, Paul Durrant wrote:

-Original Message-
From: Julien Grall 
Sent: 03 April 2020 18:24
To: p...@xen.org; xen-devel@lists.xenproject.org
Cc: 'Andrew Cooper' ; 'George Dunlap' 
; 'Ian
Jackson' ; 'Jan Beulich' ; 
'Stefano Stabellini'
; 'Wei Liu' 
Subject: Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore 
of 'domain' context

Hi Paul,

On 03/04/2020 16:55, Paul Durrant wrote:

-Original Message-

[snip]

+
+#include 
+
+struct domain_context {
+bool log;
+struct domain_save_descriptor desc;
+domain_copy_entry copy;


As your new framework is technically an extension of existing one, it
would be good to explain why we diverge in the definitions.



I don't follow. What is diverging? I explain in the commit comment that this is
a parallel framework. Do I need to justify why it is not a carbon copy of the
HVM one?

Well, they are both restoring/saving guest state. The only difference is
the existing one is focusing on HVM state.

So it would make sense long term to have only one hypercall and tell
what you want to save. In fact, some of the improvement here would
definitely make the HVM one nicer to use (at least in the context of LU).



I guess we could move the HVM save records over to the new framework, but it
works for the moment so I don't want to bring it into scope now.

And I agree, hence why I say "long term" :).




   From the commit message, it is not clear to me why a new framework and
why the infrastructure is at the same time different but not.



An alternative would be to move the HVM save code into common code and then try
to adapt it. I think that would result in more code churn and ultimately be
harder to review. The extra infrastructure introduced here is fairly minimal
and, for the moment, only targeting PV state. As I said above there's nothing
stopping the HVM records being ported over later once any initial issues have
been shaken out.

Code churn is always going to be necessary one day or another if we want to
consolidate the two.

Having two frameworks is not without risks. There are a few unknowns to
be answered:
* Is there any dependency between the two? If yes, what is the ordering?


There isn't any dependency at the moment so need I say anything? If an ordering 
dependency is introduced by a future patch then surely that would be time to 
call it out.


If we don't spell out a dependency now, then the risk is we have to 
modify the toolstack at the same time as spelling out the dependency.


I am not sure whether this matters, though. This would only affect the 
save part as the restore part should be just reading records one by one.





* The name of the hypercall does not say anything about "PV". So a
contributor could think it would be fine to save/restore new HVM state
in the new generic hypercall. Is it going to be an issue? If so, how do
we prevent it?


The commit message says:

"Domain context is state held in the hypervisor that does not come under
the category of 'HVM state' but is instead 'PV state' that is common
between PV guests and enlightened HVM guests (i.e. those that have PV
drivers) such as event channel state, grant entry state, etc."


This does not seem to cover all the cases. For instance, where would you 
save IOREQ servers information?




Do you think this should also appear in a comment? Do we really care? Nothing 
fundamentally prevents the mechanism being used for HVM state, but that may 
introduce an ordering dependency.


I don't particularly like the idea of letting the contributor decide whether 
new state belongs in the "HVM context" or in the "Domain context".


This is going to result in unwanted dependencies and possibly 
bikeshedding on where they should be saved.


My preference would be to mark the existing framework as deprecated and 
force all the new states to be saved as part of the new "Domain Context".





* Are we going to deprecate the existing framework?



There is no intention as yet.


I am not suggesting we should not go with two frameworks, but the
reasons and implications are not clear to me. Hence, why I think the
commit message should be expanded with some rationale.



Ok, I can add a paragraph to try to explain a little more.


That would be appreciated. Thank you!

Cheers,

--
Julien Grall



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Jürgen Groß

On 06.04.20 11:24, Julien Grall wrote:

Hi Jurgen,

On 06/04/2020 09:27, Juergen Gross wrote:

Since Xen 4.4 the maximum number of event channels for a guest is
defaulting to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.


Large guests on which arch? Which type of guests?


I'm pretty sure this applies to x86 only. I'm not aware of event
channels being used on ARM for IPIs.




Instead of requiring to specify the allowed number of event channels
via the "event_channels" domain config option, make the default
depend on the maximum number of vcpus of the guest. This will require
the "event_channels" domain config option to be used in fewer cases than
before.

As different guests will have differing needs the calculation of the
maximum number of event channels can be a rough estimate only,
currently based on the Linux kernel requirements.


I am not overly happy to extend the default numbers of event channels 
for everyone based on Linux behavior on a given setup. Yes you have more 
guests that would be able to run, but at the expense of allowing a guest 
to use more xen memory.


The resulting number would be larger than today only for guests with
more than 96 vcpus. So I don't think the additional amount of memory
is really that problematic.



For instance, I don't think this limit increase is necessary on Arm.


Nevertheless it is
much better than the static upper limit of today as more guests will
boot just fine with the new approach.

In order not to regress current configs use 1023 as the minimum
default setting.

Signed-off-by: Juergen Gross 
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
  tools/libxl/libxl_create.c | 2 +-


The documentation should be updated.


Oh, indeed.




  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc 
*gc,

  b_info->iomem[i].gfn = b_info->iomem[i].start;
  if (!b_info->event_channels)
-    b_info->event_channels = 1023;
+    b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 255);


What is the 255 for?


Just some headroom for e.g. pv devices.


Juergen



RE: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Paul Durrant
> -Original Message-
[snip]
> >> * The name of the hypercall does not say anything about "PV". So a
> >> contributor could think it would be fine to save/restore new HVM state
> >> in the new generic hypercall. Is it going to be an issue? If so, how do
> >> we prevent it?
> >
> > The commit message says:
> >
> > "Domain context is state held in the hypervisor that does not come under
> > the category of 'HVM state' but is instead 'PV state' that is common
> > between PV guests and enlightened HVM guests (i.e. those that have PV
> > drivers) such as event channel state, grant entry state, etc."
> 
> This does not seem to cover all the cases. For instance, where would you
> save IOREQ servers information?
> 

Ok, I agree that is ambiguous. I'd still call it PV state but of course it does 
only relate to HVM guests.

> >
> > Do you think this should also appear in a comment? Do we really care? 
> > Nothing fundamentally prevents
> the mechanism being used for HVM state, but that may introduce an ordering 
> dependency.
> 
> I don't particularly like the idea of letting the contributor decide whether
> new state belongs in the "HVM context" or in the "Domain context".
> 
> This is going to result in unwanted dependencies and possibly
> bikeshedding on where they should be saved.
> 
> My preference would be to mark the existing framework as deprecated and
> force all the new states to be saved as part of the new "Domain Context".
> 

I'm ok with that. Any others have any opinion to the contrary?

> >
> >> * Are we going to deprecate the existing framework?
> >>
> >
> > There is no intention as yet.
> >
> >> I am not suggesting we should not go with two frameworks, but the
> >> reasons and implications are not clear to me. Hence, why I think the
> >> commit message should be expanded with some rationale.
> >>
> >
> > Ok, I can add a paragraph to try to explain a little more.
> 
> That would be appreciated. Thank you!
> 

I'll mention the deprecation of the HVM context interface there too.

  Paul




Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Julien Grall

Hi Juergen,

On 06/04/2020 11:17, Jürgen Groß wrote:

On 06.04.20 11:24, Julien Grall wrote:

Hi Jurgen,

On 06/04/2020 09:27, Juergen Gross wrote:

Since Xen 4.4 the maximum number of event channels for a guest is
defaulting to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.


Large guests on which arch? Which type of guests?


I'm pretty sure this applies to x86 only. I'm not aware of event
channels being used on ARM for IPIs.


How about the guest types?






Instead of requiring to specify the allowed number of event channels
via the "event_channels" domain config option, make the default
depend on the maximum number of vcpus of the guest. This will require
the "event_channels" domain config option to be used in fewer cases than
before.

As different guests will have differing needs the calculation of the
maximum number of event channels can be a rough estimate only,
currently based on the Linux kernel requirements.


I am not overly happy to extend the default numbers of event channels 
for everyone based on Linux behavior on a given setup. Yes you have 
more guests that would be able to run, but at the expense of allowing 
a guest to use more xen memory.


The resulting number would be larger than today only for guests with
more than 96 vcpus. So I don't think the additional amount of memory
is really that problematic.
This is not a very forward looking argument. For Arm, we limit to 128 
vCPUs at the moment but it would be possible to support many more (I 
think our vGIC implementation support up to 4096 vCPUs).


So even if your change impacts a small subset, each architecture should 
be able to decide on the limit rather than having it imposed by x86 Linux PV.






For instance, I don't think this limit increase is necessary on Arm.


Nevertheless it is
much better than the static upper limit of today as more guests will
boot just fine with the new approach.

In order not to regress current configs use 1023 as the minimum
default setting.

Signed-off-by: Juergen Gross 
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
  tools/libxl/libxl_create.c | 2 +-


The documentation should be updated.


Oh, indeed.




  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@ int libxl__domain_build_info_setdefault(libxl__gc 
*gc,

  b_info->iomem[i].gfn = b_info->iomem[i].start;
  if (!b_info->event_channels)
-    b_info->event_channels = 1023;
+    b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 
255);


What is the 255 for?


Just some headroom for e.g. pv devices.


That should really be explained in the commit message and a comment.

Cheers,

--
Julien Grall



Re: [PATCH OSSTEST] linux: enable x2APIC kernel support

2020-04-06 Thread Ian Jackson
Roger Pau Monne writes ("[PATCH OSSTEST] linux: enable x2APIC kernel support"):
> Without it Linux is not able to parse the x2APIC ACPI MADT entries
> crafted by Xen when booted in PVH mode, following log is from one of
> the dom0pvh jobs:

Acked-by: Ian Jackson 

And pushed to pretest.

Thanks,
Ian.



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Ian Jackson
Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event 
channels dependant on vcpus"):
> On 06.04.20 11:24, Julien Grall wrote:
> > Large guests on which arch? Which type of guests?
> 
> I'm pretty sure this applies to x86 only. I'm not aware of event
> channels being used on ARM for IPIs.

Should this be arch-dependent then ?  It seems like the figure is just
a heuristic anyway, and ...

> The resulting number would be larger than today only for guests with
> more than 96 vcpus. So I don't think the additional amount of memory
> is really that problematic.

Julien, are there likely to be any ARM guests now which have anywhere
near that number of vcpus ?  If not do we know now what such guests
are likely to be like ?

If this is all hypothetical on ARM it would seem silly to make this
arch-specific for the benefit of ARM given that the ARM implementation
would be entirely guesswork.  Maybe we should postpone that
specialisation until we know better what the ARM function should be
like for these large numbers of vcpus.

If ARM folks want to have a different formula for the default then
that is of course fine but I wonder whether this might do ARM more
harm than good in this case.

Ian.



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Jürgen Groß

On 06.04.20 12:37, Julien Grall wrote:

Hi Juergen,

On 06/04/2020 11:17, Jürgen Groß wrote:

On 06.04.20 11:24, Julien Grall wrote:

Hi Jurgen,

On 06/04/2020 09:27, Juergen Gross wrote:

Since Xen 4.4 the maximum number of event channels for a guest is
defaulting to 1023. For large guests with lots of vcpus this is not
enough, as e.g. the Linux kernel uses 7 event channels per vcpu,
limiting the guest to about 140 vcpus.


Large guests on which arch? Which type of guests?


I'm pretty sure this applies to x86 only. I'm not aware of event
channels being used on ARM for IPIs.


How about the guest types?


PV and HVM with PV enhancements.








Instead of requiring to specify the allowed number of event channels
via the "event_channels" domain config option, make the default
depend on the maximum number of vcpus of the guest. This will require
the "event_channels" domain config option to be used in fewer cases than
before.

As different guests will have differing needs the calculation of the
maximum number of event channels can be a rough estimate only,
currently based on the Linux kernel requirements.


I am not overly happy to extend the default numbers of event channels 
for everyone based on Linux behavior on a given setup. Yes you have 
more guests that would be able to run, but at the expense of allowing 
a guest to use more xen memory.


The resulting number would be larger than today only for guests with
more than 96 vcpus. So I don't think the additional amount of memory
is really that problematic.

This is not a very forward-looking argument. For Arm, we limit to 128 
vCPUs at the moment but it would be possible to support many more (I 
think our vGIC implementation support up to 4096 vCPUs).


So even if your change impacts a small subset, each architecture should 
be able to decide on the limit rather than having it imposed by x86 Linux PV.


Okay, what about moving the default setting of b_info->event_channels
into libxl__arch_domain_build_info_setdefault() then?







For instance, I don't think this limit increase is necessary on Arm.


Nevertheless it is
much better than the static upper limit of today as more guests will
boot just fine with the new approach.

In order not to regress current configs use 1023 as the minimum
default setting.

Signed-off-by: Juergen Gross 
---
V2:
- use max() instead of min()
- clarify commit message a little bit
---
  tools/libxl/libxl_create.c | 2 +-


The documentation should be updated.


Oh, indeed.




  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e7cb2dbc2b..c025b21894 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -226,7 +226,7 @@ int 
libxl__domain_build_info_setdefault(libxl__gc *gc,

  b_info->iomem[i].gfn = b_info->iomem[i].start;
  if (!b_info->event_channels)
-    b_info->event_channels = 1023;
+    b_info->event_channels = max(1023, b_info->max_vcpus * 8 + 
255);


What is the 255 for?


Just some headroom for e.g. pv devices.


That should really be explained in the commit message and a comment.


Okay.


Juergen



Re: [PATCH v13 1/3] xen/mem_sharing: VM forking

2020-04-06 Thread Roger Pau Monné
On Mon, Mar 30, 2020 at 08:02:08AM -0700, Tamas K Lengyel wrote:
> VM forking is the process of creating a domain with an empty memory space and 
> a
> parent domain specified from which to populate the memory when necessary. For
> the new domain to be functional the VM state is copied over as part of the 
> fork
> operation (HVM params, hap allocation, etc).
> 
> Signed-off-by: Tamas K Lengyel 
> Acked-by: Jan Beulich 
> +static int bring_up_vcpus(struct domain *cd, struct domain *d)
> +{
> +unsigned int i;
> +int ret = -EINVAL;
> +
> +if ( d->max_vcpus != cd->max_vcpus ||
> +(ret = cpupool_move_domain(cd, d->cpupool)) )
> +return ret;
> +
> +for ( i = 0; i < cd->max_vcpus; i++ )
> +{
> +if ( !d->vcpu[i] || cd->vcpu[i] )
> +continue;
> +
> +if ( !vcpu_create(cd, i) )
> +return -EINVAL;
> +}
> +
> +domain_update_node_affinity(cd);
> +return 0;
> +}
> +
> +static int copy_vcpu_settings(struct domain *cd, struct domain *d)

Nit: AFAICT *d can be constified.

> +{
> +unsigned int i;
> +struct p2m_domain *p2m = p2m_get_hostp2m(cd);
> +int ret = -EINVAL;
> +
> +for ( i = 0; i < cd->max_vcpus; i++ )
> +{
> +const struct vcpu *d_vcpu = d->vcpu[i];
> +struct vcpu *cd_vcpu = cd->vcpu[i];
> +struct vcpu_runstate_info runstate;
> +mfn_t vcpu_info_mfn;
> +
> +if ( !d_vcpu || !cd_vcpu )
> +continue;
> +
> +/* Copy & map in the vcpu_info page if the guest uses one */
> +vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
> +if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
> +{
> +mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
> +
> +/* Allocate & map the page for it if it hasn't been already */
> +if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) )
> +{
> +gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
> +unsigned long gfn_l = gfn_x(gfn);
> +struct page_info *page;
> +
> +if ( !(page = alloc_domheap_page(cd, 0)) )
> +return -ENOMEM;
> +
> +new_vcpu_info_mfn = page_to_mfn(page);
> +set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l);
> +
> +ret = p2m->set_entry(p2m, gfn, new_vcpu_info_mfn,
> + PAGE_ORDER_4K, p2m_ram_rw,
> + p2m->default_access, -1);
> +if ( ret )
> +return ret;
> +
> +ret = map_vcpu_info(cd_vcpu, gfn_l,
> +PAGE_OFFSET(d_vcpu->vcpu_info));
> +if ( ret )
> +return ret;
> +}
> +
> +copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
> +}
> +
> +/* Setup the vCPU runstate area */
> +if ( !guest_handle_is_null(runstate_guest(d_vcpu)) )
> +{
> +runstate_guest(cd_vcpu) = runstate_guest(d_vcpu);
> +vcpu_runstate_get(cd_vcpu, &runstate);
> +__copy_to_guest(runstate_guest(cd_vcpu), &runstate, 1);

I just realized there's no need to copy the runstate area contents
here, since they will get copied anyway in schedule_tail before
resuming execution of cd_vcpu as long as runstate_guest is set.

Note that the vcpu_info needs to be copied since it contains event
channel info which is not unconditionally updated on context switch
IIRC.

> +}
> +
> +/*
> + * TODO: to support VMs with PV interfaces copy additional
> + * settings here, such as PV timers.
> + */
> +}
> +
> +return 0;
> +}
> +
> +static int fork_hap_allocation(struct domain *cd, struct domain *d)
> +{
> +int rc;
> +bool preempted;
> +unsigned long mb = hap_get_allocation(d);
> +
> +if ( mb == hap_get_allocation(cd) )
> +return 0;
> +
> +paging_lock(cd);
> +rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
> +paging_unlock(cd);
> +
> +return preempted ? -ERESTART : rc;
> +}
> +
> +static void copy_tsc(struct domain *cd, struct domain *d)
> +{
> +uint32_t tsc_mode;
> +uint32_t gtsc_khz;
> +uint32_t incarnation;
> +uint64_t elapsed_nsec;
> +
> > +tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> +/* Don't bump incarnation on set */
> +tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
> +}
> +
> +static int copy_special_pages(struct domain *cd, struct domain *d)
> +{
> +mfn_t new_mfn, old_mfn;
> +struct p2m_domain *p2m = p2m_get_hostp2m(cd);
> +static const unsigned int params[] =
> +{
> +HVM_PARAM_STORE_PFN,
> +HVM_PARAM_IOREQ_PFN,
> +HVM_PARAM_BUFIOREQ_PFN,
> +HVM_PARAM_CONSOLE_PFN
> +};
> +unsigned int i;
> +int rc;
> +
> +for ( i = 0; i < ARRAY_SIZE(params); i++ )
> +{
> +p2m_type_t t;
> +uint6

Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Ian Jackson
Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event 
channels dependant on vcpus"):
> Okay, what about moving the default setting of b_info->event_channels
> into libxl__arch_domain_build_info_setdefault() then?

FTAOD, if this satisfies ARM maintainers then I am content with it,
even though I doubt the utility.

I guess you should make two patches 1. duplicate the existing formula
(no functional change) 2. change the x86 formula.

I would ack the first and be guided by x86 folks for the 2nd.

Ian.



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus

2020-04-06 Thread Julien Grall

Hi Ian,

On 06/04/2020 11:47, Ian Jackson wrote:

Jürgen Groß writes ("Re: [PATCH v2] tools/libxl: make default of max event channels 
dependant on vcpus"):

On 06.04.20 11:24, Julien Grall wrote:

Large guests on which arch? Which type of guests?


I'm pretty sure this applies to x86 only. I'm not aware of event
channels being used on ARM for IPIs.


Should this be arch-dependent then ?  It seems like the figure is just
a heuristic anyway, and ...


The resulting number would be larger than today only for guests with
more than 96 vcpus. So I don't think the additional amount of memory
is really that problematic.


Julien, are there likely to be any ARM guests now which have anywhere
near that number of vcpus ?  If not do we know now what such guests
are likely to be like ?


We are meant to support up to 128 vCPUs. But our implementation can 
support up to 4096 vCPUs on vGICv3.




If this is all hypothetical on ARM it would seem silly to make this
arch-specific for the benefit of ARM given that the ARM implementation
would be entirely guesswork.  Maybe we should postpone that
specialisation until we know better what the ARM function should be
like for these large numbers of vcpus.


There is no correlation between event channels and vCPUs. The number of 
event channels only depends on the number of frontends you have in your 
guest. So...



If ARM folks want to have a different formula for the default then
that is of course fine but I wonder whether this might do ARM more
harm than good in this case.


... 1023 event channels is going to be plenty enough for most of the use 
cases.


Cheers,

--
Julien Grall



[PATCH v9 0/3] x86/guest: use assisted TLB flush in guest mode

2020-04-06 Thread Roger Pau Monne
Hello,

This is the remainder of the assisted TLB flush series. This last set of
patches enables the usage of the Xen assisted flush when running nested
on Xen.

Thanks, Roger.

Roger Pau Monne (3):
  x86/tlb: introduce a flush HVM ASIDs flag
  x86/tlb: allow disabling the TLB clock
  x86/tlb: use Xen L0 assisted TLB flush when available

 xen/arch/x86/flushtlb.c| 25 +
 xen/arch/x86/guest/hypervisor.c| 14 ++
 xen/arch/x86/guest/xen/xen.c   |  6 ++
 xen/arch/x86/mm/hap/hap.c  |  8 
 xen/arch/x86/mm/hap/nested_hap.c   |  2 +-
 xen/arch/x86/mm/hap/private.h  |  5 +
 xen/arch/x86/mm/p2m-pt.c   |  2 +-
 xen/arch/x86/mm/paging.c   |  3 ++-
 xen/arch/x86/mm/shadow/common.c| 18 +-
 xen/arch/x86/mm/shadow/hvm.c   |  2 +-
 xen/arch/x86/mm/shadow/multi.c | 17 +
 xen/arch/x86/mm/shadow/private.h   |  6 ++
 xen/arch/x86/smp.c |  7 +++
 xen/include/asm-x86/flushtlb.h | 23 ++-
 xen/include/asm-x86/guest/hypervisor.h | 17 +
 15 files changed, 121 insertions(+), 34 deletions(-)

-- 
2.26.0




[PATCH v9 1/3] x86/tlb: introduce a flush HVM ASIDs flag

2020-04-06 Thread Roger Pau Monne
Introduce a specific flag to request a HVM guest linear TLB flush,
which is an ASID/VPID tickle that forces a guest linear to guest
physical TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP). Note that shadow paging code already takes care
of issuing the necessary flushes when the shadow page tables are
modified.

In order to keep the previous behavior modify all shadow code TLB
flushes to also flush the guest linear to physical TLB if the guest is
HVM. I haven't looked at each specific shadow code TLB flush in order
to figure out whether it actually requires a guest TLB flush or not,
so there might be room for improvement in that regard.

Also perform ASID/VPID flushes when modifying the p2m tables as it's a
requirement for AMD hardware. Finally keep the flush in
switch_cr3_cr4, as it's not clear whether code could rely on
switch_cr3_cr4 also performing a guest linear TLB flush. A following
patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
not be necessary.

Signed-off-by: Roger Pau Monné 
---
Changes since v8:
 - Don't flush host TLB on HAP changes.
 - Introduce a helper for shadow changes that only flushes ASIDs/VPIDs
   when the guest is HVM.
 - Introduce a helper for HAP that only flushes ASIDs/VPIDs.

Changes since v7:
 - Do not perform an ASID flush in filtered_flush_tlb_mask: the
   requested flush is related to the page need_tlbflush field and not
   to p2m changes (applies to both callers).

Changes since v6:
 - Add ASID/VPID flushes when modifying the p2m.
 - Keep the ASID/VPID flush in switch_cr3_cr4.

Changes since v5:
 - Rename FLUSH_GUESTS_TLB to FLUSH_HVM_ASID_CORE.
 - Clarify commit message.
 - Define FLUSH_HVM_ASID_CORE to 0 when !CONFIG_HVM.
---
 xen/arch/x86/flushtlb.c  |  6 --
 xen/arch/x86/mm/hap/hap.c|  8 
 xen/arch/x86/mm/hap/nested_hap.c |  2 +-
 xen/arch/x86/mm/hap/private.h|  5 +
 xen/arch/x86/mm/p2m-pt.c |  2 +-
 xen/arch/x86/mm/paging.c |  3 ++-
 xen/arch/x86/mm/shadow/common.c  | 18 +-
 xen/arch/x86/mm/shadow/hvm.c |  2 +-
 xen/arch/x86/mm/shadow/multi.c   | 17 +
 xen/arch/x86/mm/shadow/private.h |  6 ++
 xen/include/asm-x86/flushtlb.h   |  6 ++
 11 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..c81e53c0ae 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -59,8 +59,6 @@ static u32 pre_flush(void)
 raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-hvm_flush_guest_tlbs();
-
 return t2;
 }
 
@@ -118,6 +116,7 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 local_irq_save(flags);
 
 t = pre_flush();
+hvm_flush_guest_tlbs();
 
 old_cr4 = read_cr4();
 ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
@@ -221,6 +220,9 @@ unsigned int flush_area_local(const void *va, unsigned int 
flags)
 do_tlb_flush();
 }
 
+if ( flags & FLUSH_HVM_ASID_CORE )
+hvm_flush_guest_tlbs();
+
 if ( flags & FLUSH_CACHE )
 {
 const struct cpuinfo_x86 *c = &current_cpu_data;
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a6d5e39b02..12856245be 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -118,7 +118,7 @@ int hap_track_dirty_vram(struct domain *d,
 p2m_change_type_range(d, begin_pfn, begin_pfn + nr,
   p2m_ram_rw, p2m_ram_logdirty);
 
-flush_tlb_mask(d->dirty_cpumask);
+hap_flush_tlb_mask(d->dirty_cpumask);
 
 memset(dirty_bitmap, 0xff, size); /* consider all pages dirty */
 }
@@ -205,7 +205,7 @@ static int hap_enable_log_dirty(struct domain *d, bool_t 
log_global)
  * to be read-only, or via hardware-assisted log-dirty.
  */
 p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-flush_tlb_mask(d->dirty_cpumask);
+hap_flush_tlb_mask(d->dirty_cpumask);
 }
 return 0;
 }
@@ -234,7 +234,7 @@ static void hap_clean_dirty_bitmap(struct domain *d)
  * be read-only, or via hardware-assisted log-dirty.
  */
 p2m_change_entry_type_global(d, p2m_ram_rw, p2m_ram_logdirty);
-flush_tlb_mask(d->dirty_cpumask);
+hap_flush_tlb_mask(d->dirty_cpumask);
 }
 
 //
@@ -798,7 +798,7 @@ hap_write_p2m_entry(struct p2m_domain *p2m, unsigned long 
gfn, l1_pgentry_t *p,
 
 safe_write_pte(p, new);
 if ( old_flags & _PAGE_PRESENT )
-flush_tlb_mask(d->dirty_cpumask);
+hap_flush_tlb_mask(d->dirty_cpumask);
 
 paging_unlock(d);
 
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
in

[PATCH v9 2/3] x86/tlb: allow disabling the TLB clock

2020-04-06 Thread Roger Pau Monne
The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case, however, when Xen is running virtualized and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
Acked-by: Jan Beulich 
---
 xen/arch/x86/flushtlb.c| 19 +--
 xen/include/asm-x86/flushtlb.h | 17 -
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index c81e53c0ae..22b2e84329 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -32,6 +32,9 @@
 u32 tlbflush_clock = 1U;
 DEFINE_PER_CPU(u32, tlbflush_time);
 
+/* Signals whether the TLB flush clock is in use. */
+bool __read_mostly tlb_clk_enabled = true;
+
 /*
  * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
  * 
@@ -82,12 +85,13 @@ static void post_flush(u32 t)
 static void do_tlb_flush(void)
 {
 unsigned long flags, cr4;
-u32 t;
+u32 t = 0;
 
 /* This non-reentrant function is sometimes called in interrupt context. */
 local_irq_save(flags);
 
-t = pre_flush();
+if ( tlb_clk_enabled )
+t = pre_flush();
 
 if ( use_invpcid )
 invpcid_flush_all();
@@ -99,7 +103,8 @@ static void do_tlb_flush(void)
 else
 write_cr3(read_cr3());
 
-post_flush(t);
+if ( tlb_clk_enabled )
+post_flush(t);
 
 local_irq_restore(flags);
 }
@@ -107,7 +112,7 @@ static void do_tlb_flush(void)
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
 unsigned long flags, old_cr4;
-u32 t;
+u32 t = 0;
 
 /* Throughout this function we make this assumption: */
 ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
@@ -115,7 +120,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 /* This non-reentrant function is sometimes called in interrupt context. */
 local_irq_save(flags);
 
-t = pre_flush();
+if ( tlb_clk_enabled )
+t = pre_flush();
 hvm_flush_guest_tlbs();
 
 old_cr4 = read_cr4();
@@ -168,7 +174,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 if ( cr4 & X86_CR4_PCIDE )
 invpcid_flush_all_nonglobals();
 
-post_flush(t);
+if ( tlb_clk_enabled )
+post_flush(t);
 
 local_irq_restore(flags);
 }
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 579dc56803..724455ae0c 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -21,10 +21,21 @@ extern u32 tlbflush_clock;
 /* Time at which each CPU's TLB was last flushed. */
 DECLARE_PER_CPU(u32, tlbflush_time);
 
-#define tlbflush_current_time() tlbflush_clock
+/* TLB clock is in use. */
+extern bool tlb_clk_enabled;
+
+static inline uint32_t tlbflush_current_time(void)
+{
+/* Returning 0 from tlbflush_current_time will always force a flush. */
+return tlb_clk_enabled ? tlbflush_clock : 0;
+}
 
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
+/* Avoid the write if the TLB clock is disabled. */
+if ( !tlb_clk_enabled )
+return;
+
 /*
  * Prevent storing a stale time stamp, which could happen if an update
  * to tlbflush_clock plus a subsequent flush IPI happen between the
@@ -67,6 +78,10 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t 
page_timestamp)
 {
 unsigned int cpu;
 
+/* Short-circuit: there's no need to iterate if the clock is disabled. */
+if ( !tlb_clk_enabled )
+return;
+
 for_each_cpu ( cpu, mask )
 if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )
 __cpumask_clear_cpu(cpu, mask);
-- 
2.26.0




[PATCH v9 3/3] x86/tlb: use Xen L0 assisted TLB flush when available

2020-04-06 Thread Roger Pau Monne
Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high number of vCPUs as a Xen guest, and is especially important
when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real4m35.973s
user4m35.110s
sys 36m24.117s

Using L0 assisted flush:
real1m2.596s
user4m34.818s
sys 5m16.374s

The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such assisted flush just by filling
the hook.

Note that the Xen implementation completely ignores the dirty CPU mask
and the linear address passed in, and always performs a global TLB
flush on all vCPUs. This is a limitation of the hypercall provided by
Xen. Also note that local TLB flushes are not performed using the
assisted TLB flush, only remote ones.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Wei Liu 
Reviewed-by: Jan Beulich 
---
Changes since v5:
 - Clarify commit message.
 - Test for assisted flush at setup, do this for all hypervisors.
 - Return EOPNOTSUPP if assisted flush is not available.

Changes since v4:
 - Adjust order calculation.

Changes since v3:
 - Use an alternative call for the flush hook.

Changes since v1:
 - Add a L0 assisted hook to hypervisor ops.
---
 xen/arch/x86/guest/hypervisor.c| 14 ++
 xen/arch/x86/guest/xen/xen.c   |  6 ++
 xen/arch/x86/smp.c |  7 +++
 xen/include/asm-x86/guest/hypervisor.h | 17 +
 4 files changed, 44 insertions(+)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 647cdb1367..e46de42ded 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -18,6 +18,7 @@
  *
  * Copyright (c) 2019 Microsoft.
  */
+#include 
 #include 
 #include 
 
@@ -51,6 +52,10 @@ void __init hypervisor_setup(void)
 {
 if ( ops.setup )
 ops.setup();
+
+/* Check if assisted flush is available and disable the TLB clock if so. */
+if ( !hypervisor_flush_tlb(cpumask_of(smp_processor_id()), NULL, 0) )
+tlb_clk_enabled = false;
 }
 
 int hypervisor_ap_setup(void)
@@ -73,6 +78,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
 ops.e820_fixup(e820);
 }
 
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+ unsigned int order)
+{
+if ( ops.flush_tlb )
+return alternative_call(ops.flush_tlb, mask, va, order);
+
+return -EOPNOTSUPP;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index e74fd1e995..3bc01c8723 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,12 +324,18 @@ static void __init e820_fixup(struct e820map *e820)
 pv_shim_fixup_e820(e820);
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+{
+return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
+}
+
 static const struct hypervisor_ops __initconstrel ops = {
 .name = "Xen",
 .setup = setup,
 .ap_setup = ap_setup,
 .resume = resume,
 .e820_fixup = e820_fixup,
+.flush_tlb = flush_tlb,
 };
 
 const struct hypervisor_ops *__init xg_probe(void)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index bcead5d01b..1d9fec65de 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -268,6 +269,12 @@ void flush_area_mask(const cpumask_t *mask, const void 
*va, unsigned int flags)
 if ( (flags & ~FLUSH_ORDER_MASK) &&
  !cpumask_subset(mask, cpumask_of(cpu)) )
 {
+if ( cpu_has_hypervisor &&
+ !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
+ FLUSH_ORDER_MASK)) &&
+ !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
+return;
+
 spin_lock(&flush_lock);
 cpumask_and(&flush_cpumask, mask, &cpu_online_map);
 cpumask_clear_cpu(cpu, &flush_cpumask);
diff --git a/xen/include/asm-x86/guest/hypervisor.h 
b/xen/include/asm-x86/guest/hypervisor.h
index ade10e74ea..77a1d21824 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -19,6 +19,8 @@
 #ifndef __X86_HYPERVISOR_H__
 #define __X86_HYPERVISOR_H__
 
+#include 
+
 #include 
 
 struct hypervisor_ops {
@@ -32,6 +34,8 @@ struct hypervisor_ops {
 void (*resume)(void);
 /* Fix up e820 map */
 void (*e820_fixup)(struct e820map *e820);
+/* L0 assisted TLB flush */
+int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int 
order);
 };
 
 #ifdef CONFIG_GUEST
@@ -41,6 +45,14 @@ void hypervisor_setup(void);
 int hypervisor_ap_setup(void);
 void hypervisor_resume(void);
 void hypervisor_e820_fixup(struct e820m

Re: [PATCH for-5.0] xen-block: Fix double qlist remove

2020-04-06 Thread Anthony PERARD
On Thu, Apr 02, 2020 at 03:27:22PM +0100, Paul Durrant wrote:
> > -Original Message-
> > From: Anthony PERARD 
> > Sent: 02 April 2020 14:08
> > To: qemu-de...@nongnu.org
> > Cc: qemu-sta...@nongnu.org; Anthony PERARD ; 
> > Stefano Stabellini
> > ; Paul Durrant ; Stefan Hajnoczi 
> > ; Kevin
> > Wolf ; Max Reitz ; 
> > xen-devel@lists.xenproject.org; qemu-
> > bl...@nongnu.org
> > Subject: [PATCH for-5.0] xen-block: Fix double qlist remove
> > 
> > Commit a31ca6801c02 ("qemu/queue.h: clear linked list pointers on
> > remove") revealed that a request was removed twice from a list, once
> > in xen_block_finish_request() and a second time in
> > xen_block_release_request() when both function are called from
> > xen_block_complete_aio(). But also, the `requests_inflight' counter is
> > decreased twice, and thus became negative.
> > 
> > This is a bug that was introduced in bfd0d6366043, where a `finished'
> > list was removed.
> > 
> > This patch simply re-add the `finish' parameter of
> > xen_block_release_request() so that we can distinguish when we need to
> > remove a request from the inflight list and when not.
> > 
> > Fixes: bfd0d6366043 ("xen-block: improve response latency")
> > Signed-off-by: Anthony PERARD 
> 
> It looks to me like it would just be more straightforward to simply drop the 
> QLIST_REMOVE and requests_inflight-- from
> xen_block_release_request() and simply insist that xen_block_finish_request() 
> is called in all cases (which I think means adding one
> extra call to it in xen_block_handle_requests()).

I'm thinking of going further than that. I've noticed another bug: in
case of error in xen_block_do_aio(), xen_block_finish_request() is
called without ever calling send_response() or release_request(). I
think that means a request is leaked.

So, I'm thinking of creating a function that would do finish_request(),
send_response(), release_request(), as I believe those operations need
to be done together anyway.

I'll rework the patch.

-- 
Anthony PERARD



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]

2020-04-06 Thread Ian Jackson
Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event 
channels dependant on vcpus"):
> There is no correlation between event channels and vCPUs. The number of
> event channels only depends on the number of frontends you have in your
> guest. So...
> 
> Hi Ian,
> 
> On 06/04/2020 11:47, Ian Jackson wrote:
> > If ARM folks want to have a different formula for the default then
> > that is of course fine but I wonder whether this might do ARM more
> > harm than good in this case.
> 
> ... 1023 event channels is going to be plenty enough for most of the use 
> cases.

OK, thanks for the quick reply.

So, Jürgen, I think everyone will be happy with this:

Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event 
channels dependant on vcpus"):
> I guess you should make two patches 1. duplicate the existing formula
> (no functional change) 2. change the x86 formula.
> 
> I would ack the first and be guided by x86 folks for the 2nd.

Ian.



Re: [PATCH v2] xen/guest_access: Harden *copy_to_guest_offset() to prevent const dest operand

2020-04-06 Thread Jan Beulich
On 04.04.2020 15:06, Julien Grall wrote:
> From: Julien Grall 
> 
> At the moment, *copy_to_guest_offset() will allow the hypervisor to copy
> data to guest handle marked const.
> 
> Thankfully, no users of the helper will do that. Rather than hoping this
> can be caught during review, harden copy_to_guest_offset() so the build
> will fail if such users are introduced.
> 
> There is no easy way to check whether a const is NULL in C99. The
> approach used is to introduce an unused variable that is non-const and
> assign the handle. If the handle were const, this would fail at build
> because without an explicit cast, it is not possible to assign a const
> variable to a non-const variable.
> 
> Suggested-by: Jan Beulich 
> Signed-off-by: Julien Grall 

I'm not convinced it is a good idea to add (recurring) comments
like you do - there are more aspects of these macros that would
be worth commenting on, and commenting on some but not all may
give the wrong impression of all subtleties being covered (also
for others).

In any event I'd like to ask that each header gain such a
comment only once, with the other being a tiny reference to the
one "complete" instance.

Jan



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]

2020-04-06 Thread Jürgen Groß

On 06.04.20 13:00, Ian Jackson wrote:

Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels 
dependant on vcpus"):

There is no correlation between event channels and vCPUs. The number of
event channels only depends on the number of frontends you have in your
guest. So...

Hi Ian,

On 06/04/2020 11:47, Ian Jackson wrote:

If ARM folks want to have a different formula for the default then
that is of course fine but I wonder whether this might do ARM more
harm than good in this case.


... 1023 event channels is going to be plenty enough for most of the use
cases.


OK, thanks for the quick reply.

So, Jürgen, I think everyone will be happy with this:

Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event channels 
dependant on vcpus"):

I guess you should make two patches 1. duplicate the existing formula
(no functional change) 2. change the x86 formula.

I would ack the first and be guided by x86 folks for the 2nd.


Okay, thanks. Will do that.


Juergen



Re: [PATCH 0/5] use new API for Xen page tables

2020-04-06 Thread Jan Beulich
On 06.04.2020 10:27, Hongyan Xia wrote:
> Ping.

Does this somehow imply you didn't get my replies sent on the 1st?

Jan

> On Mon, 2020-03-23 at 09:41 +, Hongyan Xia wrote:
>> From: Hongyan Xia 
>>
>> This small series is basically just rewriting functions using the new
>> API to map and unmap PTEs. Each patch is independent.
>>
>> Apart from mapping and unmapping page tables, no other functional
>> change
>> intended.
>>
>> Wei Liu (5):
>>   x86/shim: map and unmap page tables in replace_va_mapping
>>   x86_64/mm: map and unmap page tables in m2p_mapped
>>   x86_64/mm: map and unmap page tables in share_hotadd_m2p_table
>>   x86_64/mm: map and unmap page tables in destroy_compat_m2p_mapping
>>   x86_64/mm: map and unmap page tables in destroy_m2p_mapping
>>
>>  xen/arch/x86/pv/shim.c | 10 ---
>>  xen/arch/x86/x86_64/mm.c   | 55 +---
>> --
>>  xen/include/asm-x86/page.h | 18 +
>>  3 files changed, 62 insertions(+), 21 deletions(-)
>>
> 




Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]

2020-04-06 Thread Jan Beulich
On 06.04.2020 13:00, Ian Jackson wrote:
> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event 
> channels dependant on vcpus"):
>> There is no correlation between event channels and vCPUs. The number of
>> event channels only depends on the number of frontends you have in your
>> guest. So...
>>
>> Hi Ian,
>>
>> On 06/04/2020 11:47, Ian Jackson wrote:
>>> If ARM folks want to have a different formula for the default then
>>> that is of course fine but I wonder whether this might do ARM more
>>> harm than good in this case.
>>
>> ... 1023 event channels is going to be plenty enough for most of the use 
>> cases.
> 
> OK, thanks for the quick reply.
> 
> So, Jürgen, I think everyone will be happy with this:

I don't think I will be - my prior comment still holds on there not
being any grounds to use a specific OS kernel's (and to be precise
a specific OS kernel version's) requirements for determining
defaults. If there was to be such a dependency, then OS kernel
[variant] should be part of the inputs to such a (set of) formula(s).

Jan

> Ian Jackson writes ("Re: [PATCH v2] tools/libxl: make default of max event 
> channels dependant on vcpus"):
>> I guess you should make two patches 1. duplicate the existing formula
>> (no functional change) 2. change the x86 formula.
>>
>> I would ack the first and be guided by x86 folks for the 2nd.
> 
> Ian.
> 




[xen-unstable test] 149451: tolerable FAIL

2020-04-06 Thread osstest service owner
flight 149451 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149451/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail  like 149406
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 149431
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 149431
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 149431
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 149431
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 149431
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 149431
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 149431
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 149431
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149431
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass

version targeted for testing:
 xen  990b6e38d93c6e60f9d81e8b71ddfd209fca00bd
baseline version:
 xen  990b6e38d93c6e60f9d81e8b71ddfd209fca00bd

Last test of basis   149451  2020-04-06 01:51:54 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64   

Re: [Xen-devel] [PATCH v2 2/2] xen: credit2: fix credit reset happening too few times

2020-04-06 Thread Jan Beulich
On 19.03.2020 01:12, Dario Faggioli wrote:
> @@ -3328,12 +3325,9 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>  (unsigned char *)&d);
>  }
>  
> -/* Only consider units that are allowed to run on this processor. */
> +/* Only consider vcpus that are allowed to run on this processor. */
>  if ( !cpumask_test_cpu(cpu, svc->unit->cpu_hard_affinity) )

While backporting this to 4.12 I noticed that this is a presumably
unintended comment adjustment.

Jan



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]

2020-04-06 Thread Jürgen Groß

On 06.04.20 13:11, Jan Beulich wrote:

On 06.04.2020 13:00, Ian Jackson wrote:

Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event channels 
dependant on vcpus"):

There is no correlation between event channels and vCPUs. The number of
event channels only depends on the number of frontends you have in your
guest. So...

Hi Ian,

On 06/04/2020 11:47, Ian Jackson wrote:

If ARM folks want to have a different formula for the default then
that is of course fine but I wonder whether this might do ARM more
harm than good in this case.


... 1023 event channels is going to be plenty enough for most of the use
cases.


OK, thanks for the quick reply.

So, Jürgen, I think everyone will be happy with this:


I don't think I will be - my prior comment still holds on there not
being any grounds to use a specific OS kernel's (and to be precise
a specific OS kernel version's) requirements for determining
defaults. If there was to be such a dependency, then OS kernel
[variant] should be part of the inputs to such a (set of) formula(s).


IMO this kind of trying to be perfect will completely block a sane
heuristic for being able to boot large guests at all.

The patch isn't about to find an as stringent as possible upper
boundary for huge guests, but a sane value being able to boot most of
those.

And how should Xen know exactly what the OS kernel needs, after all?

And it is not as if we were talking about megabytes of additional memory. A
guest with 256 vcpus will just be able to use an additional 36 memory
pages. The maximum non-PV domain (probably the only relevant case
of an OS other than Linux being used) with 128 vcpus would "waste"
32 kB, and only in case the guest misbehaves.

The alternative would be to do nothing and having to let the user
experience a somewhat cryptic guest crash. He could google for a
possible solution which would probably end in a rather high static
limit resulting in wasting even more memory.


Juergen



Re: Live migration and PV device handling

2020-04-06 Thread Andrew Cooper
On 06/04/2020 08:50, Paul Durrant wrote:
>> -Original Message-
>> From: Xen-devel  On Behalf Of Dongli 
>> Zhang
>> Sent: 03 April 2020 23:33
>> To: Andrew Cooper ; Anastassios Nanos 
>> ; xen-
>> de...@lists.xen.org
>> Subject: Re: Live migration and PV device handling
>>
>> Hi Andrew,
>>
>> On 4/3/20 5:42 AM, Andrew Cooper wrote:
>>> On 03/04/2020 13:32, Anastassios Nanos wrote:
 Hi all,

 I am trying to understand how live-migration happens in xen. I am
 looking in the HVM guest case and I have dug into the relevant parts
 of the toolstack and the hypervisor regarding memory, vCPU context
 etc.

 In particular, I am interested in how PV device migration happens. I
 assume that the guest is not aware of any suspend/resume operations
 being done
>>> Sadly, this assumption is not correct.  HVM guests with PV drivers
>>> currently have to be aware in exactly the same way as PV guests.
>>>
>>> Work is in progress to try and address this.  See
>>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
>>> (sorry - for some reason that doc isn't being rendered properly in
>>> https://xenbits.xen.org/docs/ )

Document rendering now fixed.

https://xenbits.xen.org/docs/unstable/designs/non-cooperative-migration.html

>> I read below from the commit:
>>
>> +* The toolstack chooses a randomized domid for initial creation or default
>> +migration, but preserves the source domid for non-cooperative migration.
>> +Non-Cooperative migration will have to be denied if the domid is
>> +unavailable on the target host, but randomization of domid on creation
>> +should hopefully minimize the likelihood of this. Non-Cooperative migration
>> +to localhost will clearly not be possible.
>>
>> Does that indicate that, while the scope of domid_t is a single server in
>> the old design, the scope of domid_t is a cluster of servers in the new
>> design?
>>
>> That is, the domid should be unique across the cluster of all servers if we
>> expect non-cooperative migration to always succeed?
>>
> That would be necessary to guarantee success (or rather guarantee no failure
> due to a domid clash), but the scope of xl/libxl is a single server, hence
> randomization is the best we have to reduce clashes to a minimum.

domids are inherently a local concept and will remain so, but a
toolstack managing multiple servers and wanting to use this version of
non-cooperative migration will have to manage domids cluster-wide.
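
To see how far randomization actually gets you, the clash likelihood follows
the birthday problem. A quick sketch (illustrative only; it assumes domids are
drawn uniformly from the roughly 32k values below Xen's DOMID_FIRST_RESERVED,
i.e. 0x7FF0):

```python
# Probability that at least two of n domains, each given an independently
# random domid drawn uniformly from a space of size d, clash (birthday
# problem). d = 0x7FF0 assumes domids 0..0x7FEF are usable.

def clash_probability(n, d=0x7FF0):
    p_no_clash = 1.0
    for i in range(n):
        p_no_clash *= (d - i) / d
    return 1.0 - p_no_clash

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:5d} domains -> clash probability {clash_probability(n):.4f}")
```

With a few hundred domains across a cluster, clashes stop being rare - which
is why a multi-server toolstack would need to manage domids itself rather than
rely on randomization alone.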

~Andrew



[PATCH v2 for-5.0] xen-block: Fix double qlist remove and request leak

2020-04-06 Thread Anthony PERARD
Commit a31ca6801c02 ("qemu/queue.h: clear linked list pointers on
remove") revealed that a request was removed twice from a list, once
in xen_block_finish_request() and a second time in
xen_block_release_request() when both function are called from
xen_block_complete_aio(). But also, the `requests_inflight' counter is
decreased twice, and thus became negative.

This is a bug that was introduced in bfd0d6366043, where a `finished'
list was removed.

That commit also introduced a leak of request in xen_block_do_aio().
That function calls xen_block_finish_request() but the request is
never released after that.

To fix both issue, we do two changes:
- we squash finish_request() and release_request() together as we want
  to remove a request from 'inflight' list to add it to 'freelist'.
- before releasing a request, we need to let now the result to the
  other end, thus we should call xen_block_send_response() before
  releasing a request.

The first change fix the double QLIST_REMOVE() as we remove the extra
call. The second change makes the leak go away because if we want to
call finish_request(), we need to call a function that do all of
finish, send response, and release.

Fixes: bfd0d6366043 ("xen-block: improve response latency")
Signed-off-by: Anthony PERARD 
---
 hw/block/dataplane/xen-block.c | 48 --
 1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 288a87a814ad..5f8f15778ba5 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -64,6 +64,8 @@ struct XenBlockDataPlane {
 AioContext *ctx;
 };
 
+static int xen_block_send_response(XenBlockRequest *request);
+
 static void reset_request(XenBlockRequest *request)
 {
 memset(&request->req, 0, sizeof(request->req));
@@ -115,23 +117,26 @@ static XenBlockRequest 
*xen_block_start_request(XenBlockDataPlane *dataplane)
 return request;
 }
 
-static void xen_block_finish_request(XenBlockRequest *request)
+static void xen_block_complete_request(XenBlockRequest *request)
 {
 XenBlockDataPlane *dataplane = request->dataplane;
 
-QLIST_REMOVE(request, list);
-dataplane->requests_inflight--;
-}
+if (xen_block_send_response(request)) {
+Error *local_err = NULL;
 
-static void xen_block_release_request(XenBlockRequest *request)
-{
-XenBlockDataPlane *dataplane = request->dataplane;
+xen_device_notify_event_channel(dataplane->xendev,
+dataplane->event_channel,
+&local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
 
 QLIST_REMOVE(request, list);
+dataplane->requests_inflight--;
 reset_request(request);
 request->dataplane = dataplane;
 QLIST_INSERT_HEAD(&dataplane->freelist, request, list);
-dataplane->requests_inflight--;
 }
 
 /*
@@ -246,7 +251,6 @@ static int xen_block_copy_request(XenBlockRequest *request)
 }
 
 static int xen_block_do_aio(XenBlockRequest *request);
-static int xen_block_send_response(XenBlockRequest *request);
 
 static void xen_block_complete_aio(void *opaque, int ret)
 {
@@ -286,7 +290,6 @@ static void xen_block_complete_aio(void *opaque, int ret)
 }
 
 request->status = request->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
-xen_block_finish_request(request);
 
 switch (request->req.operation) {
 case BLKIF_OP_WRITE:
@@ -306,17 +309,8 @@ static void xen_block_complete_aio(void *opaque, int ret)
 default:
 break;
 }
-if (xen_block_send_response(request)) {
-Error *local_err = NULL;
 
-xen_device_notify_event_channel(dataplane->xendev,
-dataplane->event_channel,
-&local_err);
-if (local_err) {
-error_report_err(local_err);
-}
-}
-xen_block_release_request(request);
+xen_block_complete_request(request);
 
 if (dataplane->more_work) {
 qemu_bh_schedule(dataplane->bh);
@@ -420,8 +414,8 @@ static int xen_block_do_aio(XenBlockRequest *request)
 return 0;
 
 err:
-xen_block_finish_request(request);
 request->status = BLKIF_RSP_ERROR;
+xen_block_complete_request(request);
 return -1;
 }
 
@@ -575,17 +569,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane 
*dataplane)
 break;
 };
 
-if (xen_block_send_response(request)) {
-Error *local_err = NULL;
-
-xen_device_notify_event_channel(dataplane->xendev,
-dataplane->event_channel,
-&local_err);
-if (local_err) {
-error_report_err(local_err);
-}
-}
-xen_block_release_request(request);
+xen_block_complete_req

Re: [PATCH v13 1/3] xen/mem_sharing: VM forking

2020-04-06 Thread Tamas K Lengyel
On Mon, Apr 6, 2020 at 4:52 AM Roger Pau Monné  wrote:
>
> On Mon, Mar 30, 2020 at 08:02:08AM -0700, Tamas K Lengyel wrote:
> > VM forking is the process of creating a domain with an empty memory space 
> > and a
> > parent domain specified from which to populate the memory when necessary. 
> > For
> > the new domain to be functional the VM state is copied over as part of the 
> > fork
> > operation (HVM params, hap allocation, etc).
> >
> > Signed-off-by: Tamas K Lengyel 
> > Acked-by: Jan Beulich 
> > +static int bring_up_vcpus(struct domain *cd, struct domain *d)
> > +{
> > +unsigned int i;
> > +int ret = -EINVAL;
> > +
> > +if ( d->max_vcpus != cd->max_vcpus ||
> > +(ret = cpupool_move_domain(cd, d->cpupool)) )
> > +return ret;
> > +
> > +for ( i = 0; i < cd->max_vcpus; i++ )
> > +{
> > +if ( !d->vcpu[i] || cd->vcpu[i] )
> > +continue;
> > +
> > +if ( !vcpu_create(cd, i) )
> > +return -EINVAL;
> > +}
> > +
> > +domain_update_node_affinity(cd);
> > +return 0;
> > +}
> > +
> > +static int copy_vcpu_settings(struct domain *cd, struct domain *d)
>
> Nit: AFAICT *d can be constified.

Sure.

>
> > +{
> > +unsigned int i;
> > +struct p2m_domain *p2m = p2m_get_hostp2m(cd);
> > +int ret = -EINVAL;
> > +
> > +for ( i = 0; i < cd->max_vcpus; i++ )
> > +{
> > +const struct vcpu *d_vcpu = d->vcpu[i];
> > +struct vcpu *cd_vcpu = cd->vcpu[i];
> > +struct vcpu_runstate_info runstate;
> > +mfn_t vcpu_info_mfn;
> > +
> > +if ( !d_vcpu || !cd_vcpu )
> > +continue;
> > +
> > +/* Copy & map in the vcpu_info page if the guest uses one */
> > +vcpu_info_mfn = d_vcpu->vcpu_info_mfn;
> > +if ( !mfn_eq(vcpu_info_mfn, INVALID_MFN) )
> > +{
> > +mfn_t new_vcpu_info_mfn = cd_vcpu->vcpu_info_mfn;
> > +
> > +/* Allocate & map the page for it if it hasn't been already */
> > +if ( mfn_eq(new_vcpu_info_mfn, INVALID_MFN) )
> > +{
> > +gfn_t gfn = mfn_to_gfn(d, vcpu_info_mfn);
> > +unsigned long gfn_l = gfn_x(gfn);
> > +struct page_info *page;
> > +
> > +if ( !(page = alloc_domheap_page(cd, 0)) )
> > +return -ENOMEM;
> > +
> > +new_vcpu_info_mfn = page_to_mfn(page);
> > +set_gpfn_from_mfn(mfn_x(new_vcpu_info_mfn), gfn_l);
> > +
> > +ret = p2m->set_entry(p2m, gfn, new_vcpu_info_mfn,
> > + PAGE_ORDER_4K, p2m_ram_rw,
> > + p2m->default_access, -1);
> > +if ( ret )
> > +return ret;
> > +
> > +ret = map_vcpu_info(cd_vcpu, gfn_l,
> > +PAGE_OFFSET(d_vcpu->vcpu_info));
> > +if ( ret )
> > +return ret;
> > +}
> > +
> > +copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
> > +}
> > +
> > +/* Setup the vCPU runstate area */
> > +if ( !guest_handle_is_null(runstate_guest(d_vcpu)) )
> > +{
> > +runstate_guest(cd_vcpu) = runstate_guest(d_vcpu);
> > +vcpu_runstate_get(cd_vcpu, &runstate);
> > +__copy_to_guest(runstate_guest(cd_vcpu), &runstate, 1);
>
> I just realized there's no need to copy the runstate area contents
> here, since they will get copied anyway in schedule_tail before
> resuming execution of cd_vcpu as long as runstate_guest is set.
>
> Note that the vcpu_info needs to be copied since it contains event
> channel info which is not unconditionally updated on context switch
> IIRC.

OK

>
> > +}
> > +
> > +/*
> > + * TODO: to support VMs with PV interfaces copy additional
> > + * settings here, such as PV timers.
> > + */
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static int fork_hap_allocation(struct domain *cd, struct domain *d)
> > +{
> > +int rc;
> > +bool preempted;
> > +unsigned long mb = hap_get_allocation(d);
> > +
> > +if ( mb == hap_get_allocation(cd) )
> > +return 0;
> > +
> > +paging_lock(cd);
> > +rc = hap_set_allocation(cd, mb << (20 - PAGE_SHIFT), &preempted);
> > +paging_unlock(cd);
> > +
> > +return preempted ? -ERESTART : rc;
> > +}
> > +
> > +static void copy_tsc(struct domain *cd, struct domain *d)
> > +{
> > +uint32_t tsc_mode;
> > +uint32_t gtsc_khz;
> > +uint32_t incarnation;
> > +uint64_t elapsed_nsec;
> > +
> > +tsc_get_info(d, &tsc_mode, &elapsed_nsec, &gtsc_khz, &incarnation);
> > +/* Don't bump incarnation on set */
> > +tsc_set_info(cd, tsc_mode, elapsed_nsec, gtsc_khz, incarnation - 1);
> > +}
> > +
> > +static int copy_special_pages(struct domain *cd, struct domain *d)
> > +{
> > +mfn_t new_mfn, old_mfn;
> > +struct p2m_domain *p2m = p2m

RE: [PATCH v2 for-5.0] xen-block: Fix double qlist remove and request leak

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Anthony PERARD 
> Sent: 06 April 2020 15:02
> To: qemu-de...@nongnu.org
> Cc: qemu-sta...@nongnu.org; Anthony PERARD ; 
> Stefano Stabellini
> ; Paul Durrant ; Stefan Hajnoczi 
> ; Kevin
> Wolf ; Max Reitz ; 
> xen-devel@lists.xenproject.org; qemu-
> bl...@nongnu.org
> Subject: [PATCH v2 for-5.0] xen-block: Fix double qlist remove and request 
> leak
> 
> Commit a31ca6801c02 ("qemu/queue.h: clear linked list pointers on
> remove") revealed that a request was removed twice from a list, once
> in xen_block_finish_request() and a second time in
> xen_block_release_request() when both function are called from
> xen_block_complete_aio(). But also, the `requests_inflight' counter is
> decreased twice, and thus became negative.
> 
> This is a bug that was introduced in bfd0d6366043

NIT: I guess you should quote the patch title here as well.

> , where a `finished'
> list was removed.
> 
> That commit also introduced a leak of request in xen_block_do_aio().
> That function calls xen_block_finish_request() but the request is
> never released after that.
> 
> To fix both issue, we do two changes:
> - we squash finish_request() and release_request() together as we want
>   to remove a request from 'inflight' list to add it to 'freelist'.
> - before releasing a request, we need to let now the result to the
>   other end,

"we need to let the other end know the result"

> thus we should call xen_block_send_response() before
>   releasing a request.
> 
> The first change fix the double QLIST_REMOVE() as we remove the extra

s/fix/fixes

> call. The second change makes the leak go away because if we want to
> call finish_request(), we need to call a function that do all of

s/do/does

> finish, send response, and release.
> 
> Fixes: bfd0d6366043 ("xen-block: improve response latency")
> Signed-off-by: Anthony PERARD 

The code looks ok, so with the cosmetic fixes...

Reviewed-by: Paul Durrant 




[PATCH v14 0/3] VM forking

2020-04-06 Thread Tamas K Lengyel
The following series implements VM forking for Intel HVM guests to allow for
the fast creation of identical VMs without the associated high startup costs
of booting or restoring the VM from a savefile.

JIRA issue: https://xenproject.atlassian.net/browse/XEN-89

The fork operation is implemented as part of the "xl fork-vm" command:
xl fork-vm -C  -Q  -m  

By default a fully functional fork is created. The user is however in charge of
creating the appropriate config file for the fork and generating the QEMU save
file before the fork-vm call is made. The config file needs to give the
fork a new name at minimum, but other settings may also require changes. Certain
settings in the config file of both the parent and the fork have to be left at
their defaults. Details are documented.

The interface also allows splitting the forking into two steps:
xl fork-vm --launch-dm no \
   -m  \
   -p 
xl fork-vm --launch-dm late \
   -C  \
   -Q  \
   

The split creation model is useful when the VM needs to be created as fast as
possible. The forked VM can be unpaused without the device model being
launched, to be monitored and accessed via VMI. Note however that without its
device model running (depending on what is executing in the VM) it is bound to
misbehave or even crash when it tries to access devices that would be
emulated by QEMU. We anticipate that for certain use-cases this would be an
acceptable situation, for example when fuzzing code segments that don't access
such devices.

Launching the device model requires the QEMU Xen savefile to be generated
manually from the parent VM. This can be accomplished simply by connecting to
its QMP socket and issuing the "xen-save-devices-state" command. For example
using the standard tool socat these commands can be used to generate the file:
socat - UNIX-CONNECT:/var/run/xen/qmp-libxl-
{ "execute": "qmp_capabilities" }
{ "execute": "xen-save-devices-state", \
"arguments": { "filename": "/path/to/save/qemu_state", \
"live": false} }

At runtime the forked VM starts running with an empty p2m which gets lazily
populated when the VM generates EPT faults, similar to how altp2m views are
populated. If the memory access is a read-only access, the p2m entry is
populated with a memory shared entry with its parent. For write memory accesses
or in case memory sharing wasn't possible (for example in case a reference is
held by a third party), a new page is allocated and the page contents are
copied over from the parent VM. Forks can be further forked if needed, thus
allowing for further memory savings.
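
The populate-on-fault policy described above can be modelled in a few lines.
This is a deliberately simplified sketch, not Xen code: pages are dict
entries, "shared" entries alias an ancestor's page, and a write always ends
with a private copy in the fork's own p2m:

```python
class VM:
    """Toy model of fork memory: p2m maps gfn -> (kind, page_data)."""

    def __init__(self, parent=None, memory=None):
        self.parent = parent
        # A root VM starts fully populated; a fork starts with an empty p2m.
        self.p2m = {g: ("private", d) for g, d in (memory or {}).items()}

    def fork(self):
        return VM(parent=self)

    def _lookup_ancestor(self, gfn):
        vm = self.parent
        while vm is not None:            # forks of forks walk up the chain
            if gfn in vm.p2m:
                return vm.p2m[gfn][1]
            vm = vm.parent
        raise KeyError(f"gfn {gfn} not present in any ancestor")

    def read(self, gfn):
        if gfn not in self.p2m:          # read fault -> shared entry
            self.p2m[gfn] = ("shared", self._lookup_ancestor(gfn))
        return self.p2m[gfn][1]

    def write(self, gfn, data):
        # Write fault (or unsharing) -> allocate a private copy in the fork.
        self.p2m[gfn] = ("private", data)
```

The key property the model captures is that a fork only ever pays for the
pages it actually touches, and writes never disturb the parent.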

A VM fork reset hypercall is also added that allows the fork to be reset to the
state it was just after a fork, also accessible via xl:
xl fork-vm --fork-reset -p 

This is an optimization for cases where the forks are very short-lived and run
without a device model, so resetting saves some time compared to creating a
brand new fork, provided the fork has not acquired a lot of memory. If the fork
has a lot of memory deduplicated, it is likely going to be faster to create a
new fork from scratch and asynchronously destroy the old one.
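
In essence the reset drops every page the fork has populated since forking, so
that subsequent accesses fault and repopulate from the parent. A minimal,
self-contained sketch of that semantic (illustrative only; the real hypercall
also pauses the domain, skips pages it cannot re-share, and re-copies
vCPU/TSC/HVM-param state from the parent):

```python
def fork_reset(fork_p2m):
    """Shed all sharable pages from a fork's p2m, returning how many were
    dropped. Non-sharable (special) pages are kept, mirroring the check in
    the hypervisor-side reset loop."""
    sharable = [gfn for gfn, page in fork_p2m.items() if page["sharable"]]
    for gfn in sharable:
        del fork_p2m[gfn]        # next access re-faults and repopulates
    return len(sharable)
```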

The series has been tested with Windows VMs and functions as expected. Linux
VMs when forked from a running VM will have a frozen VNC screen. Linux VMs at
this time can only be forked with a working device model when the parent VM was
restored from a snapshot using "xl restore -p". This is a known limitation.
Also note that PVHVM/PVH Linux guests have not been tested. Forking most likely
works but PV devices and drivers would require additional wiring to set things
up properly since the guests are unaware of the forking taking place, unlike
the save/restore routine where the guest is made aware of the procedure.

Forking time has been measured to be 0.0007s, device model launch to be around
1s depending largely on the number of devices being emulated. Fork resets have
been measured to be 0.0001s under the optimal circumstances.

New in v14:
minor adjustments

Patch 1 implements the VM fork
Patch 2 implements fork reset operation
Patch 3 adds the toolstack-side code implementing VM forking and reset

Tamas K Lengyel (3):
  xen/mem_sharing: VM forking
  x86/mem_sharing: reset a fork
  xen/tools: VM forking toolstack side

 docs/man/xl.1.pod.in  |  44 
 tools/libxc/include/xenctrl.h |  13 +
 tools/libxc/xc_memshr.c   |  22 ++
 tools/libxl/libxl.h   |  11 +
 tools/libxl/libxl_create.c| 361 ++---
 tools/libxl/libxl_dm.c|   2 +-
 tools/libxl/libxl_dom.c   |  43 ++-
 tools/libxl/libxl_internal.h  |   7 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/libxl/libxl_x86.c   |  41 +++
 tools/xl/Makefile |   2 +-
 tools/xl/xl.h |   5 +
 tools/xl/xl_cmdtable.c|  15 ++
 tools/xl/xl

[PATCH v14 3/3] xen/tools: VM forking toolstack side

2020-04-06 Thread Tamas K Lengyel
Add the necessary bits to implement "xl fork-vm" commands. The command allows
the user to specify how to launch the device model, allowing for a late-launch
model in which the user can execute the fork without the device model and
decide to launch it only later.

Signed-off-by: Tamas K Lengyel 
---
 docs/man/xl.1.pod.in  |  44 +
 tools/libxc/include/xenctrl.h |  13 ++
 tools/libxc/xc_memshr.c   |  22 +++
 tools/libxl/libxl.h   |  11 ++
 tools/libxl/libxl_create.c| 361 +++---
 tools/libxl/libxl_dm.c|   2 +-
 tools/libxl/libxl_dom.c   |  43 +++-
 tools/libxl/libxl_internal.h  |   7 +
 tools/libxl/libxl_types.idl   |   1 +
 tools/libxl/libxl_x86.c   |  41 
 tools/xl/Makefile |   2 +-
 tools/xl/xl.h |   5 +
 tools/xl/xl_cmdtable.c|  15 ++
 tools/xl/xl_forkvm.c  | 144 ++
 tools/xl/xl_vmcontrol.c   |  14 ++
 15 files changed, 559 insertions(+), 166 deletions(-)
 create mode 100644 tools/xl/xl_forkvm.c

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 09339282e6..59c03c6427 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -708,6 +708,50 @@ above).
 
 =back
 
+=item B [I] I
+
+Create a fork of a running VM.  The domain will be paused after the operation
+and remains paused while forks of it exist.  Experimental and x86 only.
+Forks can only be made of domains with HAP enabled and on Intel hardware.  The
+parent domain must be created with the xl toolstack and its configuration must
+not manually define max_grant_frames, max_maptrack_frames or 
max_event_channels.
+
+B
+
+=over 4
+
+=item B<-p>
+
+Leave the fork paused after creating it.
+
+=item B<--launch-dm>
+
+Specify whether the device model (QEMU) should be launched for the fork. Late
+launch allows to start the device model for an already running fork.
+
+=item B<-C>
+
+The config file to use when launching the device model.  Currently required 
when
+launching the device model.  Most config settings MUST match the parent domain
+exactly, only change VM name, disk path and network configurations.
+
+=item B<-Q>
+
+The path to the qemu save file to use when launching the device model.  
Currently
+required when launching the device model.
+
+=item B<--fork-reset>
+
+Perform a reset operation of an already running fork.  Note that resetting may
+be less performant then creating a new fork depending on how much memory the
+fork has deduplicated during its runtime.
+
+=item B<--max-vcpus>
+
+Specify the max-vcpus matching the parent domain when not launching the dm.
+
+=back
+
 =item B [I]
 
 Display the number of shared pages for a specified domain. If no domain is
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 58fa931de1..631750a242 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2225,6 +2225,19 @@ int xc_memshr_range_share(xc_interface *xch,
   uint64_t first_gfn,
   uint64_t last_gfn);
 
+int xc_memshr_fork(xc_interface *xch,
+   uint32_t source_domain,
+   uint32_t client_domain);
+
+/*
+ * Note: this function is only intended to be used on short-lived forks that
+ * haven't yet aquired a lot of memory. In case the fork has a lot of memory
+ * it is likely more performant to create a new fork with xc_memshr_fork.
+ *
+ * With VMs that have a lot of memory this call may block for a long time.
+ */
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t forked_domain);
+
 /* Debug calls: return the number of pages referencing the shared frame backing
  * the input argument. Should be one or greater.
  *
diff --git a/tools/libxc/xc_memshr.c b/tools/libxc/xc_memshr.c
index 97e2e6a8d9..d0e4ee225b 100644
--- a/tools/libxc/xc_memshr.c
+++ b/tools/libxc/xc_memshr.c
@@ -239,6 +239,28 @@ int xc_memshr_debug_gref(xc_interface *xch,
 return xc_memshr_memop(xch, domid, &mso);
 }
 
+int xc_memshr_fork(xc_interface *xch, uint32_t pdomid, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+
+mso.op = XENMEM_sharing_op_fork;
+mso.u.fork.parent_domain = pdomid;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
+int xc_memshr_fork_reset(xc_interface *xch, uint32_t domid)
+{
+xen_mem_sharing_op_t mso;
+
+memset(&mso, 0, sizeof(mso));
+mso.op = XENMEM_sharing_op_fork_reset;
+
+return xc_memshr_memop(xch, domid, &mso);
+}
+
 int xc_memshr_audit(xc_interface *xch)
 {
 xen_mem_sharing_op_t mso;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 71709dc585..088e81c78b 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -2666,6 +2666,17 @@ int libxl_psr_get_hw_info(libxl_ctx *ctx, 
libxl_psr_feat_type type,
   unsigned int lvl, unsigned int *nr,
   libxl_psr_hw_info **info);
 void libxl_psr_hw_info_list_free(libxl_psr_hw_info *list, unsigned in

[PATCH v14 2/3] x86/mem_sharing: reset a fork

2020-04-06 Thread Tamas K Lengyel
Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to be reset into the same state the parent VM is in,
in a faster way than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending on how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel 
Reviewed-by: Roger Pau Monné 
---
v12: remove continuation & add comment back
 address style issues pointed out by Jan
---
 xen/arch/x86/mm/mem_sharing.c | 77 +++
 xen/include/public/memory.h   |  1 +
 2 files changed, 78 insertions(+)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 64cd706e5a..85951b7bea 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1749,6 +1749,61 @@ static int fork(struct domain *cd, struct domain *d)
 return rc;
 }
 
+/*
+ * The fork reset operation is intended to be used on short-lived forks only.
+ * There is no hypercall continuation operation implemented for this reason.
+ * For forks that obtain a larger memory footprint it is likely going to be
+ * more performant to create a new fork instead of resetting an existing one.
+ *
+ * TODO: In case this hypercall would become useful on forks with larger memory
+ * footprints the hypercall continuation should be implemented (or if this
+ * feature needs to be become "stable").
+ */
+static int mem_sharing_fork_reset(struct domain *d, struct domain *pd)
+{
+int rc;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+struct page_info *page, *tmp;
+
+domain_pause(d);
+
+/* need recursive lock because we will free pages */
+spin_lock_recursive(&d->page_alloc_lock);
+page_list_for_each_safe(page, tmp, &d->page_list)
+{
+p2m_type_t p2mt;
+p2m_access_t p2ma;
+mfn_t mfn = page_to_mfn(page);
+gfn_t gfn = mfn_to_gfn(d, mfn);
+
+mfn = __get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma,
+0, NULL, false);
+
+/* only reset pages that are sharable */
+if ( !p2m_is_sharable(p2mt) )
+continue;
+
+/* take an extra reference or just skip if can't for whatever reason */
+if ( !get_page(page, d) )
+continue;
+
+/* forked memory is 4k, not splitting large pages so this must work */
+rc = p2m->set_entry(p2m, gfn, INVALID_MFN, PAGE_ORDER_4K,
+p2m_invalid, p2m_access_rwx, -1);
+ASSERT(!rc);
+
+put_page_alloc_ref(page);
+put_page(page);
+}
+spin_unlock_recursive(&d->page_alloc_lock);
+
+rc = copy_settings(d, pd);
+
+domain_unpause(d);
+
+return rc;
+}
+
 int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
 int rc;
@@ -2039,6 +2094,28 @@ int 
mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 break;
 }
 
+case XENMEM_sharing_op_fork_reset:
+{
+struct domain *pd;
+
+rc = -EINVAL;
+if ( mso.u.fork.pad[0] || mso.u.fork.pad[1] || mso.u.fork.pad[2] )
+goto out;
+
+rc = -ENOSYS;
+if ( !d->parent )
+goto out;
+
+rc = rcu_lock_live_remote_domain_by_id(d->parent->domain_id, &pd);
+if ( rc )
+goto out;
+
+rc = mem_sharing_fork_reset(d, pd);
+
+rcu_unlock_domain(pd);
+break;
+}
+
 default:
 rc = -ENOSYS;
 break;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 5ee4e0da12..d36d64b8dc 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -483,6 +483,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_access_op_t);
 #define XENMEM_sharing_op_audit 7
 #define XENMEM_sharing_op_range_share   8
 #define XENMEM_sharing_op_fork  9
+#define XENMEM_sharing_op_fork_reset10
 
 #define XENMEM_SHARING_OP_S_HANDLE_INVALID  (-10)
 #define XENMEM_SHARING_OP_C_HANDLE_INVALID  (-9)
-- 
2.20.1




[PATCH v14 1/3] xen/mem_sharing: VM forking

2020-04-06 Thread Tamas K Lengyel
VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel 
Reviewed-by: Roger Pau Monné 
Acked-by: Jan Beulich 
Acked-by: Julien Grall 
---
v14: constify some function inputs
 minor fixes pointed out by Roger
---
 xen/arch/x86/domain.c |  13 ++
 xen/arch/x86/hvm/hvm.c|   4 +-
 xen/arch/x86/mm/hap/hap.c |   3 +-
 xen/arch/x86/mm/mem_sharing.c | 342 ++
 xen/arch/x86/mm/p2m.c |   9 +-
 xen/include/asm-arm/page.h|   1 +
 xen/include/asm-x86/hap.h |   1 +
 xen/include/asm-x86/hvm/hvm.h |   2 +
 xen/include/asm-x86/mem_sharing.h |  18 ++
 xen/include/asm-x86/page.h|   1 +
 xen/include/public/memory.h   |   5 +
 xen/include/xen/sched.h   |   1 +
 12 files changed, 395 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 683bc619aa..a008d7df1c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2211,6 +2211,19 @@ int domain_relinquish_resources(struct domain *d)
 ret = relinquish_shared_pages(d);
 if ( ret )
 return ret;
+
+/*
+ * If the domain is forked, decrement the parent's pause count
+ * and release the domain.
+ */
+if ( mem_sharing_is_fork(d) )
+{
+struct domain *parent = d->parent;
+
+d->parent = NULL;
+domain_unpause(parent);
+put_domain(parent);
+}
 }
 #endif
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a3d115b650..304b3d1562 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1917,7 +1917,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 }
 #endif
 
-/* Spurious fault? PoD and log-dirty also take this path. */
+/* Spurious fault? PoD, log-dirty and VM forking also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
 rc = 1;
@@ -4377,7 +4377,7 @@ static int hvm_allow_get_param(struct domain *d,
 return rc;
 }
 
-static int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
+int hvm_get_param(struct domain *d, uint32_t index, uint64_t *value)
 {
 int rc;
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index a6d5e39b02..814d0c3253 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -321,8 +321,7 @@ static void hap_free_p2m_page(struct domain *d, struct 
page_info *pg)
 }
 
 /* Return the size of the pool, rounded up to the nearest MB */
-static unsigned int
-hap_get_allocation(struct domain *d)
+unsigned int hap_get_allocation(struct domain *d)
 {
 unsigned int pg = d->arch.paging.hap.total_pages
 + d->arch.paging.hap.p2m_pages;
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index f49f27a3ef..64cd706e5a 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "mm-locks.h"
@@ -1443,6 +1446,309 @@ static inline int mem_sharing_control(struct domain *d, 
bool enable)
 return 0;
 }
 
+/*
+ * Forking a page only gets called when the VM faults due to no entry being
+ * in the EPT for the access. Depending on the type of access we either
+ * populate the physmap with a shared entry for read-only access or
+ * fork the page if its a write access.
+ *
+ * The client p2m is already locked so we only need to lock
+ * the parent's here.
+ */
+int mem_sharing_fork_page(struct domain *d, gfn_t gfn, bool unsharing)
+{
+int rc = -ENOENT;
+shr_handle_t handle;
+struct domain *parent = d->parent;
+struct p2m_domain *p2m;
+unsigned long gfn_l = gfn_x(gfn);
+mfn_t mfn, new_mfn;
+p2m_type_t p2mt;
+struct page_info *page;
+
+if ( !mem_sharing_is_fork(d) )
+return -ENOENT;
+
+if ( !unsharing )
+{
+/* For read-only accesses we just add a shared entry to the physmap */
+while ( parent )
+{
+if ( !(rc = nominate_page(parent, gfn, 0, &handle)) )
+break;
+
+parent = parent->parent;
+}
+
+if ( !rc )
+{
+/* The client's p2m is already locked */
+p2m = p2m_get_hostp2m(parent);
+
+p2m_lock(p2m);
+rc = add_to_physmap(parent, gfn_l, handle, d, gfn_l, false);
+p2m_unlock(p2m);
+
+if ( !rc )
+return 0;
+}
+}
+
+/*
+ * If it's a write access (ie. unsharing) or if adding a shared entry to
+  

[PATCH for-5.0] xen-block: Fix uninitialized variable

2020-04-06 Thread Anthony PERARD
Since 7f5d9b206d1e ("object-add: don't create return value if
failed"), qmp_object_add() doesn't write any value to 'ret_data', which
thus contains random data. Then qobject_unref() fails and aborts.

Fix by initialising 'ret_data' properly.

Fixes: 5f07c4d60d09 ("qapi: Flatten object-add")
Signed-off-by: Anthony PERARD 
---
 hw/block/xen-block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 07bb32e22b51..99cb4c67cb09 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -860,7 +860,7 @@ static XenBlockIOThread *xen_block_iothread_create(const 
char *id,
 XenBlockIOThread *iothread = g_new(XenBlockIOThread, 1);
 Error *local_err = NULL;
 QDict *opts;
-QObject *ret_data;
+QObject *ret_data = NULL;
 
 iothread->id = g_strdup(id);
 
-- 
Anthony PERARD




Re: [PATCH for-5.0] xen-block: Fix uninitialized variable

2020-04-06 Thread Philippe Mathieu-Daudé

On 4/6/20 6:42 PM, Anthony PERARD wrote:

Since 7f5d9b206d1e ("object-add: don't create return value if
failed"), qmp_object_add() don't write any value in 'ret_data', thus
has random data. Then qobject_unref() fails and abort().

Fix by initialising 'ret_data' properly.


Or move qobject_unref() after the error check?

-- >8 --
diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 07bb32e22b..f3f1cbef65 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -869,7 +869,6 @@ static XenBlockIOThread 
*xen_block_iothread_create(const char *id,

 qdict_put_str(opts, "id", id);
 qmp_object_add(opts, &ret_data, &local_err);
 qobject_unref(opts);
-qobject_unref(ret_data);

 if (local_err) {
 error_propagate(errp, local_err);
@@ -878,6 +877,7 @@ static XenBlockIOThread 
*xen_block_iothread_create(const char *id,

 g_free(iothread);
 return NULL;
 }
+qobject_unref(ret_data);

 return iothread;
 }
---



Fixes: 5f07c4d60d09 ("qapi: Flatten object-add")
Signed-off-by: Anthony PERARD 
---
  hw/block/xen-block.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
index 07bb32e22b51..99cb4c67cb09 100644
--- a/hw/block/xen-block.c
+++ b/hw/block/xen-block.c
@@ -860,7 +860,7 @@ static XenBlockIOThread *xen_block_iothread_create(const 
char *id,
  XenBlockIOThread *iothread = g_new(XenBlockIOThread, 1);
  Error *local_err = NULL;
  QDict *opts;
-QObject *ret_data;
+QObject *ret_data = NULL;
  
  iothread->id = g_strdup(id);
  






Re: Live migration and PV device handling

2020-04-06 Thread Tamas K Lengyel
On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  wrote:
>
> On 03/04/2020 13:32, Anastassios Nanos wrote:
> > Hi all,
> >
> > I am trying to understand how live-migration happens in xen. I am
> > looking in the HVM guest case and I have dug into the relevant parts
> > of the toolstack and the hypervisor regarding memory, vCPU context
> > etc.
> >
> > In particular, I am interested in how PV device migration happens. I
> > assume that the guest is not aware of any suspend/resume operations
> > being done
>
> Sadly, this assumption is not correct.  HVM guests with PV drivers
> currently have to be aware in exactly the same way as PV guests.
>
> Work is in progress to try and address this.  See
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> (sorry - for some reason that doc isn't being rendered properly in
> https://xenbits.xen.org/docs/ )

That proposal is very interesting - first time it came across my radar
- but I dislike the idea that domain IDs need to be preserved for
uncooperative migration to work. Ideally I would be able to take
advantage of the same plumbing to perform forking of VMs with PV
drivers where preserving the domain id is impossible since it's still
in use.

Tamas



Re: [PATCH for-5.0] xen-block: Fix uninitialized variable

2020-04-06 Thread Anthony PERARD
On Mon, Apr 06, 2020 at 06:50:41PM +0200, Philippe Mathieu-Daudé wrote:
> On 4/6/20 6:42 PM, Anthony PERARD wrote:
> > Since 7f5d9b206d1e ("object-add: don't create return value if
> > failed"), qmp_object_add() don't write any value in 'ret_data', thus
> > has random data. Then qobject_unref() fails and abort().
> > 
> > Fix by initialising 'ret_data' properly.
> 
> Or move qobject_unref() after the error check?
> 
> -- >8 --
> diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
> index 07bb32e22b..f3f1cbef65 100644
> --- a/hw/block/xen-block.c
> +++ b/hw/block/xen-block.c
> @@ -869,7 +869,6 @@ static XenBlockIOThread *xen_block_iothread_create(const
> char *id,
>  qdict_put_str(opts, "id", id);
>  qmp_object_add(opts, &ret_data, &local_err);
>  qobject_unref(opts);
> -qobject_unref(ret_data);
> 
>  if (local_err) {
>  error_propagate(errp, local_err);
> @@ -878,6 +877,7 @@ static XenBlockIOThread *xen_block_iothread_create(const
> char *id,
>  g_free(iothread);
>  return NULL;
>  }
> +qobject_unref(ret_data);

That won't help, qmp_object_add() doesn't change the value of ret_data
at all. The other users of qmp_object_add() pass an initialised
'ret_data', so we should do the same I think.

Thanks,

-- 
Anthony PERARD



Re: Live migration and PV device handling

2020-04-06 Thread Andrew Cooper
On 06/04/2020 18:16, Tamas K Lengyel wrote:
> On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  
> wrote:
>> On 03/04/2020 13:32, Anastassios Nanos wrote:
>>> Hi all,
>>>
>>> I am trying to understand how live-migration happens in xen. I am
>>> looking in the HVM guest case and I have dug into the relevant parts
>>> of the toolstack and the hypervisor regarding memory, vCPU context
>>> etc.
>>>
>>> In particular, I am interested in how PV device migration happens. I
>>> assume that the guest is not aware of any suspend/resume operations
>>> being done
>> Sadly, this assumption is not correct.  HVM guests with PV drivers
>> currently have to be aware in exactly the same way as PV guests.
>>
>> Work is in progress to try and address this.  See
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
>> (sorry - for some reason that doc isn't being rendered properly in
>> https://xenbits.xen.org/docs/ )
> That proposal is very interesting - first time it came across my radar
> - but I dislike the idea that domain IDs need to be preserved for
> uncooperative migration to work.

The above restriction is necessary to work with existing guests, which
is an implementation requirement of the folks driving the work.

> Ideally I would be able to take
> advantage of the same plumbing to perform forking of VMs with PV
> drivers where preserving the domain id is impossible since its still
> in use.

We would of course like to make changes to remove the above restriction
in the longterm.  The problem is that it is not a trivial thing to fix. 
Various things were discussed in Chicago, but I don't recall if any of
the plans made their way onto xen-devel.

~Andrew



[linux-linus test] 149452: regressions - trouble: broken/fail/pass

2020-04-06 Thread osstest service owner
flight 149452 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149452/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-freebsd10-amd64 broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64broken
 test-amd64-i386-freebsd10-i386 broken
 test-amd64-i386-xl-pvshimbroken
 test-amd64-i386-xl-raw   broken
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   broken
 test-amd64-i386-xl-qemuu-ws16-amd64 broken
 test-amd64-i386-xl-xsm   broken
 test-amd64-i386-libvirt-pair broken
 test-amd64-i386-pair broken
 test-amd64-i386-libvirt  broken
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow broken
 test-amd64-i386-qemut-rhel6hvm-intel broken
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsmbroken
 test-amd64-amd64-dom0pvh-xl-intel broken
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrictbroken
 test-amd64-i386-qemuu-rhel6hvm-amd broken
 test-amd64-i386-qemuu-rhel6hvm-intel broken
 test-amd64-i386-xl   broken
 test-amd64-i386-qemut-rhel6hvm-amd broken
 test-amd64-i386-xl-qemut-ws16-amd64 broken
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm broken
 test-amd64-i386-xl-qemuu-ovmf-amd64 broken
 test-amd64-i386-xl-qemut-win7-amd64 broken
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm broken
 test-amd64-i386-libvirt-xsm  broken
 test-amd64-i386-xl-qemut-debianhvm-amd64broken
 test-amd64-i386-xl-qemuu-win7-amd64 broken
 test-amd64-i386-xl-shadowbroken
 test-amd64-i386-examine   9 capture-logs   broken REGR. vs. 149238
 test-amd64-amd64-dom0pvh-xl-intel 16 capture-logs(16)  broken REGR. vs. 149238
 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 149238
 test-amd64-i386-freebsd10-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-libvirt-pair 10 xen-boot/src_hostfail REGR. vs. 149238
 test-amd64-i386-libvirt-pair 11 xen-boot/dst_hostfail REGR. vs. 149238
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 149238
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 149238
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-libvirt-xsm   7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 149238

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 149238

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-pvshim 8 capture-logs(8)   broken blocked in 149238
 test-amd64-i386-xl-qemuu-ws16-amd64 8 capture-lo

Re: [PATCH v2 3/3] xen/x86: ioapic: Simplify ioapic_init()

2020-04-06 Thread Jan Beulich
On 04.04.2020 12:26, Julien Grall wrote:
> From: Julien Grall 
> 
> Since commit 9facd54a45 "x86/ioapic: Add register level checks to detect
> bogus io-apic entries", Xen is able to cope with IO APICs not mapped in
> the fixmap.
> 
> Therefore the whole logic to allocate a fake page for some IO APICs is
> unnecessary.
> 
> With the logic removed, the code can be simplified a lot as we don't
> need to go through all the IO APIC if SMP has not been detected or a
> bogus zero IO-APIC address has been detected.
> 
> To avoid another level of tabulation, the simplification is now moved in
> its own function.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Jan Beulich 



Re: [PATCH 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext

2020-04-06 Thread Jan Beulich
On 06.04.2020 11:07, Paul Durrant wrote:
>> -Original Message-
>> From: Julien Grall 
>> Sent: 01 April 2020 14:42
>>
>> On 27/03/2020 18:50, Paul Durrant wrote:
>> Also, in the case of XEN_DOMCTL_setdomaincontext, the buffer is not
>> meant to be modified by the hypervisor. So I would rather introduce two
>> separate structure so we can enforce the const.
> 
> Can handles be meaningfully const?

Yes, see e.g. Julien's recent patch to force honoring const-ness
in some cases where it wasn't enforced so far. Luckily there
look to not have crept in any violators.

Jan



Re: Live migration and PV device handling

2020-04-06 Thread Tamas K Lengyel
On Mon, Apr 6, 2020 at 11:24 AM Andrew Cooper  wrote:
>
> On 06/04/2020 18:16, Tamas K Lengyel wrote:
> > On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  
> > wrote:
> >> On 03/04/2020 13:32, Anastassios Nanos wrote:
> >>> Hi all,
> >>>
> >>> I am trying to understand how live-migration happens in xen. I am
> >>> looking in the HVM guest case and I have dug into the relevant parts
> >>> of the toolstack and the hypervisor regarding memory, vCPU context
> >>> etc.
> >>>
> >>> In particular, I am interested in how PV device migration happens. I
> >>> assume that the guest is not aware of any suspend/resume operations
> >>> being done
> >> Sadly, this assumption is not correct.  HVM guests with PV drivers
> >> currently have to be aware in exactly the same way as PV guests.
> >>
> >> Work is in progress to try and address this.  See
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> >> (sorry - for some reason that doc isn't being rendered properly in
> >> https://xenbits.xen.org/docs/ )
> > That proposal is very interesting - first time it came across my radar
> > - but I dislike the idea that domain IDs need to be preserved for
> > uncooperative migration to work.
>
> The above restriction is necessary to work with existing guests, which
> is an implementation requirement of the folks driving the work.
>
> > Ideally I would be able to take
> > advantage of the same plumbing to perform forking of VMs with PV
> > drivers where preserving the domain id is impossible since its still
> > in use.
>
> We would of course like to make changes to remove the above restriction
> in the longterm.  The problem is that it is not a trivial thing to fix.
> Various things were discussed in Chicago, but I don't recall if any of
> the plans made their way onto xen-devel.

Yea I imagine trying to get this to work with existing PV drivers is
not possible in any other way. But if we can update the PV driver code
such that in the longterm it can work without preserving the domain
ID, that would be worthwhile.

Tamas



Re: [PATCH v7 08/12] xen: add /buildinfo/config entry to hypervisor filesystem

2020-04-06 Thread Jan Beulich
On 03.04.2020 17:45, Jürgen Groß wrote:
> On 03.04.20 17:33, Jan Beulich wrote:
>> On 03.04.2020 17:12, Jürgen Groß wrote:
>>> On 03.04.20 16:31, Jan Beulich wrote:
 On 02.04.2020 17:46, Juergen Gross wrote:
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -353,6 +353,16 @@ config DOM0_MEM
>        Leave empty if you are not sure what to specify.
>    +config HYPFS_CONFIG
> +    bool "Provide hypervisor .config via hypfs entry"
> +    default y

 My initial reaction was to ask for "depends on HYPFS", but then
 I noticed the earlier patch doesn't introduce such. Am I
 mis-remembering that it was agreed to make the whole thing
 possible to disable at least in EXPERT mode?
>>>
>>> No, I don't remember that agreement.
>>>
>>> And TBH I'm not sure this is a good idea, as that would at once make the
>>> plan to replace at least some sysctl and/or domctl interfaces impossible
>>> (like e.g. the last 3 patches of the series are doing already), or at
>>> least would tie the related functionality to CONFIG_HYPFS.
>>
>> I think that would be fine - that's what config setting are for.
>> Someone caring about space may not care about runtime setting of
>> parameters.
> 
> So right now it would start with a plain hypfs available or not.
> 
> The next step would be in patch 12 to tell the user he will lose the
> capability of setting runtime parameters.
> 
> Another planned extension would be to control per-cpupool settings,
> which would the go away (possibly functionality being unconditionally
> available today).
> 
> Next would be the lack of being able to control per-domain mitigations
> like XPTI or L1TF, which I'd like to add.
> 
> Another thing I wanted to add is some debugging stuff (e.g. to switch
> lock profiling using hypfs).
> 
> And the list will go on.

Understood.

> Does it really make sense to make a central control and information
> interface conditional?

None of the above may be of interest to e.g. embedded use cases.

> I'd like at least a second opinion on that topic.

Yes, further opinions would surely help.

Jan



Re: [PATCH v2] tools/libxl: make default of max event channels dependant on vcpus [and 1 more messages]

2020-04-06 Thread Jan Beulich
On 06.04.2020 13:54, Jürgen Groß wrote:
> On 06.04.20 13:11, Jan Beulich wrote:
>> On 06.04.2020 13:00, Ian Jackson wrote:
>>> Julien Grall writes ("Re: [PATCH v2] tools/libxl: make default of max event 
>>> channels dependant on vcpus"):
 There are no correlation between event channels and vCPUs. The number of
 event channels only depends on the number of frontend you have in your
 guest. So...

 Hi Ian,

 On 06/04/2020 11:47, Ian Jackson wrote:
> If ARM folks want to have a different formula for the default then
> that is of course fine but I wonder whether this might do ARMk more
> harm than good in this case.

 ... 1023 event channels is going to be plenty enough for most of the use
 cases.
>>>
>>> OK, thanks for the quick reply.
>>>
>>> So, Jürgen, I think everyone will be happy with this:
>>
>> I don't think I will be - my prior comment still holds on there not
>> being any grounds to use a specific OS kernel's (and to be precise
>> a specific OS kernel version's) requirements for determining
>> defaults. If there was to be such a dependency, then OS kernel
>> [variant] should be part of the inputs to such a (set of) formula(s).
> 
> IMO this kind of trying to be perfect will completely block a sane
> heuristic for being able to boot large guests at all.

This isn't about being perfect - I'm suggesting to leave the
default alone, not to improve the calculation, not the least
because I've been implying ...

> The patch isn't about to find an as stringent as possible upper
> boundary for huge guests, but a sane value being able to boot most of
> those.
> 
> And how should Xen know the OS kernel needs exactly after all?

... the answer of "It can't" to this question.

> And it is not that we talking about megabytes of additional memory. A
> guest with 256 vcpus will just be able to use additional 36 memory
> pages. The maximum non-PV domain (the probably only relevant case
> of another OS than Linux being used) with 128 vcpus would "waste"
> 32 kB. In case the guest misbehaves.

Any extra page counts, or else - where do you draw the line? Any
single page may decide between Xen (not) being out of memory,
and hence also not being able to fulfill certain other requests.

> The alternative would be to do nothing and having to let the user
> experience a somewhat cryptic guest crash. He could google for a
> possible solution which would probably end in a rather high static
> limit resulting in wasting even more memory.

I realize this. Otoh more people running into this will improve
the chances of later ones finding useful suggestions. Of course
there's also nothing wrong with trying to make the error less
cryptic.

Jan



Re: [RFC] hw/usb/xen-usb.c: Pass struct usbback_req* to usbback_packet_complete()

2020-04-06 Thread Anthony PERARD
On Mon, Mar 23, 2020 at 04:43:18PM +, Peter Maydell wrote:
> The function usbback_packet_complete() currently takes a USBPacket*,
> which must be a pointer to the packet field within a struct
> usbback_req; the function uses container_of() to get the struct
> usbback_req* given the USBPacket*.
> 
> This is unnecessarily confusing (and in particular it confuses the
> Coverity Scan analysis, resulting in the false positive CID 1421919
> where it thinks that we write off the end of the structure). Since
> both callsites already have the pointer to the struct usbback_req,
> just pass that in directly.
> 
> Signed-off-by: Peter Maydell 
> ---
> This is an RFC because:
>  * I'm not very familiar with the Xen bits of QEMU
>  * the main rationale here is to change something that's
>confusing Coverity -- the code as it stands isn't wrong
>  * the only testing I've done is "make check"
> Still, the change seems like a good thing to me as a human reader...
> 
> PS: QEMU's MAINTAINERS file stanza for Xen doesn't pick up
> that this file is Xen related, so it could use an extra F: line.

Looks good,
Reviewed-by: Anthony PERARD 

Thanks,

-- 
Anthony PERARD



Re: [PATCH v2] xen/arm: implement GICD_I[S/C]ACTIVER reads

2020-04-06 Thread George Dunlap


> On Apr 3, 2020, at 9:27 PM, Julien Grall  wrote:
> 
> On Fri, 3 Apr 2020 at 20:41, Stefano Stabellini  
> wrote:
>> 
>> On Thu, 2 Apr 2020, Julien Grall wrote:
>>> As we discussed on Tuesday, the cost for other vCPUs is only going to be a
>>> trap to the hypervisor and then back again. The cost is likely smaller than
>>> receiving and forwarding an interrupt.
>>> 
>>> You actually agreed on this analysis. So can you enlighten me as to why
>>> receiving an interrupt is a not problem for latency but this is?
>> 
>> My answer was that the difference is that an operating system can
>> disable interrupts, but it cannot disable receiving this special IPI.
> 
> An OS can *only* disable its own interrupts, yet interrupts will still
> be received by Xen even if the interrupts are masked at the processor
> (e.g local_irq_disable()).
> 
> You would need to disable interrupts one by one as the GIC level (use
> ICENABLER) in order to not receive any interrupts. Yet, Xen may still
> receive interrupts for operational purposes (e.g serial, console,
> maintainance IRQ...). So trap will happen.

I think Stefano’s assertion is that the users he has in mind will be 
configuring the system such that RT workloads get a minimum number of 
hypervisor-related interrupts possible.  On a 4-core system, you could  have 
non-RT workloads running on cores 0-1, and RT workloads running with the NULL 
scheduler on cores 2-3.  In such a system, you’d obviously arrange that serial 
and maintenance IRQs are delivered to cores 0-1.

Julien, I think your argument above about interrupts somewhat undermines your 
point about deadlock.  Your basic argument is, that a guest will be interrupted 
by Xen quite frequently anyway for lots of reasons; adding one more to the list 
shouldn’t measurably increase the jitter.  But if it’s true that a guest will 
be interrupted by Xen frequently, then deadlock shouldn’t be much of an issue; 
and insofar as deadlock is an issue, it’s because there were no interrupts — 
and thus, adding an extra IPI will have a statistically significant effect on 
jitter.

On the other had, Stefano’s argument about deadlock (if I understand him 
correctly) has been that guests shouldn’t really be spinning on that register 
anyway.  But it sounds from Julien’s other email that perhaps spinning on the 
register is exactly what newer versions of Linux do — so Linux would certainly 
hang on such systems if Stefano’s assertion about the low number of Xen-related 
interrupts is true.

If the latter is true, then it sounds like addressing the deadlock issue will 
be necessary?  And so effort needs to be put towards figuring out how to 
minimize jitter due to Xen-related IPIs.

Wei Xu / Peng Fan, any opinions here?

 - George



Re: [PATCH 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-06 Thread Jan Beulich
On 06.04.2020 12:34, Paul Durrant wrote:
>>> Do you think this should also appear in a comment? Do we really care? 
>>> Nothing fundamentally prevents
>> the mechanism being used for HVM state, but that may introduce an ordering 
>> dependency.
>>
>> I don't particularly like the idea to let the contributor decide where
>> "HVM context" or as part of the "Domain context".
>>
>> This is  going to result to unwanted dependency and possibly
>> bikeshedding on where they should be saved.
>>
>> My preference would be to mark the existing framework as deprecated and
>> force all the new states to be saved as part of the new "Domain Context".
> 
> I'm ok with that. Any others have any opinion to the contrary?

Well, not in principle, but I won't have a firm opinion until I've
actually looked at (relevant parts of) the patches.

Jan



Re: [XEN PATCH] hvmloader: Enable MMIO and I/O decode, after all resource allocation

2020-04-06 Thread Andrew Cooper
On 06/04/2020 18:46, Harsha Shamsundara Havanur wrote:
> It was observed that PCI MMIO and/or IO BARs were programmed with
> BUS master, memory and I/O decodes (bits 0,1 and 2 of PCI COMMAND
> register) enabled, during PCI setup phase. This resulted in
> spurious and premature bus transactions as soon as the lower bar of
> the 64 bit bar is programmed. It is highly recommended that BARs be
> programmed whilst decode bits are cleared to avoid spurious bus
> transactions.

What kinds of spurious transactions?

Keeping memory and I/O decoding disabled until the BARs are set up is a
no-brainer, but busmastering is a more complicated subject.  Therefore,
it would be helpful to know exactly what you've seen in the way of
spurious transactions.

>
> This patch address the issue by deferring enablement of memory and
> I/O decode in command register until all the resources, like interrupts
> I/O and/or MMIO BARs for all the PCI device functions are programmed.
> PCI bus memory and I/O space is enabled in command register after
> all the resources like interrupts, I/O and/or MMIO BARs are
> programmed for all valid device functions. PCI BUS MASTER is kept
> disabled in the bootloader as this needs to be enabled by the guest
> OS driver once it initializes and takes control of the device.

Has this been tested with an Intel integrated graphics card?  These have
a habit of hitting a platform reset line if busmaster is ever disabled.

A lot of this will depend on what Qemu does behind the scenes, and
whether disabling busmastering gets reflected in the real device.

>
> Signed-off-by: Harsha Shamsundara Havanur 
> Ack-by: Paul Durrant 

Acked-by

~Andrew



Re: [PATCH v2] xen/arm: implement GICD_I[S/C]ACTIVER reads

2020-04-06 Thread Julien Grall




On 06/04/2020 18:58, George Dunlap wrote:




On Apr 3, 2020, at 9:27 PM, Julien Grall  wrote:

On Fri, 3 Apr 2020 at 20:41, Stefano Stabellini  wrote:


On Thu, 2 Apr 2020, Julien Grall wrote:

As we discussed on Tuesday, the cost for other vCPUs is only going to be a
trap to the hypervisor and then back again. The cost is likely smaller than
receiving and forwarding an interrupt.

You actually agreed on this analysis. So can you enlighten me as to why
receiving an interrupt is a not problem for latency but this is?


My answer was that the difference is that an operating system can
disable interrupts, but it cannot disable receiving this special IPI.


An OS can *only* disable its own interrupts, yet interrupts will still
be received by Xen even if the interrupts are masked at the processor
(e.g local_irq_disable()).

You would need to disable interrupts one by one as the GIC level (use
ICENABLER) in order to not receive any interrupts. Yet, Xen may still
receive interrupts for operational purposes (e.g serial, console,
maintainance IRQ...). So trap will happen.


I think Stefano’s assertion is that the users he has in mind will be 
configuring the system such that RT workloads get a minimum number of 
hypervisor-related interrupts possible.  On a 4-core system, you could  have 
non-RT workloads running on cores 0-1, and RT workloads running with the NULL 
scheduler on cores 2-3.  In such a system, you’d obviously arrange that serial 
and maintenance IRQs are delivered to cores 0-1.

Well maintenance IRQs are per pCPU so you can't route to another one...

But, I think you missed my point that local_irq_disable() from the guest 
will not prevent the hypervisor to receive interrupts *even* the one 
routed to the vCPU itself. They will just not be delivered to the guest 
context until local_irq_enable() is called.




Julien, I think your argument above about interrupts somewhat undermines your 
point about deadlock.  Your basic argument is, that a guest will be interrupted 
by Xen quite frequently anyway for lots of reasons; adding one more to the list 
shouldn’t measurably increase the jitter.  But if it’s true that a guest will 
be interrupted by Xen frequently, then deadlock shouldn’t be much of an issue; 
and insofar as deadlock is an issue, it’s because there were no interrupts — 
and thus, adding an extra IPI will have a statistically significant effect on 
jitter.


I presented two way to disable interrupts because Stefano's e-mail was 
not clear enough which one he was referring to. So I was hoping to save
some round-trips, but it looks like I muddled my point.


If Stefano was referring to critical sections where interrupts are 
masked using local_irq_disable(). Then, you are not going to limit the 
numbers of traps at all (see why above). So the point here was moot.


I don't believe Stefano was referring to the second case as disabling 
interrupts at the GIC level will require to trap in the hypervisor. But 
I thought I would mention it.


Regarding the livelock (Marc pointed out it wasn't a deadlock), there 
are multiple conflicting use cases. In an ideal world, there will be 
limited reasons to interrupts the guest. And yes it will result to a 
livelock.


In a less ideal world, there will some time be interruptions. But you 
don't know when. If you look at determinism then, I am afraid that 
Stefano's implementation is not because you don't know when the vCPU 
will exit.




On the other had, Stefano’s argument about deadlock (if I understand him 
correctly) has been that guests shouldn’t really be spinning on that register 
anyway.  But it sounds from Julien’s other email that perhaps spinning on the 
register is exactly what newer versions of Linux do — so Linux would certainly 
hang on such systems if Stefano’s assertion about the low number of Xen-related 
interrupts is true.


Rather than playing the game "one person's word against the other 
person's word", from Linux 5.4:


do {
unsigned long flags;

/*
 * Wait until we're out of the critical section.  This might
 * give the wrong answer due to the lack of memory barriers.
 */
while (irqd_irq_inprogress(&desc->irq_data))
cpu_relax();

/* Ok, that indicated we're done: double-check carefully. */
raw_spin_lock_irqsave(&desc->lock, flags);
inprogress = irqd_irq_inprogress(&desc->irq_data);

/*
 * If requested and supported, check at the chip whether it
 * is in flight at the hardware level, i.e. already pending
 * in a CPU and waiting for service and acknowledge.
 */
if (!inprogress && sync_chip) {
/*
 * Ignore the return code. inprogress is only updated
 * when the chip supports it.
 */
__irq_get_irqchip_state(irqd, IRQCHIP_STATE_ACTIVE,
&inprogress);
}
raw_spin_unlock_irqrestore(&desc->lock, flags);
/* Oops, that failed? */
} while (inprogress);



If the latter is true, then it sound

[ovmf test] 149462: all pass - PUSHED

2020-04-06 Thread osstest service owner
flight 149462 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149462/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf ee026ea78b0e32a9ffbaf0040afe91de8ae2179c
baseline version:
 ovmf ef5dcba975ee3b4c29b19ad0b23d371a2cd9d60a

Last test of basis   149393  2020-04-03 17:09:33 Z3 days
Testing same since   149462  2020-04-06 12:09:20 Z0 days1 attempts


People who touched revisions under test:
  Leif Lindholm 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   ef5dcba975..ee026ea78b  ee026ea78b0e32a9ffbaf0040afe91de8ae2179c -> xen-tested-master



[qemu-mainline test] 149458: regressions - FAIL

2020-04-06 Thread osstest service owner
flight 149458 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149458/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-win7-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-intel 10 redhat-install   fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ws16-amd64 10 windows-install   fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-amd 10 debian-hvm-install  fail REGR. vs. 144861
 test-amd64-i386-qemuu-rhel6hvm-amd 10 redhat-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-libvirt-pair 21 guest-start/debian   fail REGR. vs. 144861
 test-amd64-amd64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-libvirt  12 guest-start  fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-libvirt-pair 21 guest-start/debian  fail REGR. vs. 144861
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-amd64-qemuu-nested-intel 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 144861
 test-amd64-i386-libvirt-xsm  12 guest-start  fail REGR. vs. 144861
 test-arm64-arm64-libvirt-xsm 12 guest-start  fail REGR. vs. 144861
 test-armhf-armhf-libvirt 12 guest-start  fail REGR. vs. 144861
 test-amd64-amd64-libvirt-vhd 10 debian-di-installfail REGR. vs. 144861
 test-amd64-amd64-xl-qcow210 debian-di-installfail REGR. vs. 144861
 test-armhf-armhf-xl-vhd  10 debian-di-installfail REGR. vs. 144861
 test-armhf-armhf-libvirt-raw 10 debian-di-installfail REGR. vs. 144861
 test-amd64-i386-xl-qemuu-win7-amd64 10 windows-install   fail REGR. vs. 144861

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-rtds   16 guest-localmigrate fail in 149409 pass in 149458
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat  fail pass in 149409

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-amd 18 guest-localmigrate/x10 fail baseline untested
 test-amd64-amd64-dom0pvh-xl-intel 18 guest-localmigrate/x10 fail in 149409 baseline untested
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 144861
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-arm

[linux-linus test] 149469: regressions - trouble: broken/fail/pass

2020-04-06 Thread osstest service owner
flight 149469 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149469/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-amd broken
 test-amd64-i386-libvirt-pair broken
 test-amd64-amd64-dom0pvh-xl-amd  4 host-install(4) broken REGR. vs. 149238
 test-amd64-i386-libvirt-pair 5 host-install/dst_host(5) broken REGR. vs. 149238
 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-examine   8 reboot   fail REGR. vs. 149238
 test-amd64-i386-freebsd10-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-shadow 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 149238
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-libvirt-xsm   7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 149238
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 149238
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 149238
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 149238

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 149238

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-dom0pvh-xl-intel 15 guest-saverestorefail like 149238
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 149238
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 149238
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149238
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 149238
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 149238
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 149238
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 149238
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-suppor

Re: [PATCH for-5.0] xen-block: Fix uninitialized variable

2020-04-06 Thread Markus Armbruster
Anthony PERARD  writes:

> On Mon, Apr 06, 2020 at 06:50:41PM +0200, Philippe Mathieu-Daudé wrote:
>> On 4/6/20 6:42 PM, Anthony PERARD wrote:
>> > Since 7f5d9b206d1e ("object-add: don't create return value if
>> > failed"), qmp_object_add() doesn't write any value into 'ret_data',
>> > which is therefore left holding random data. qobject_unref() then
>> > fails and aborts.
>> > 
>> > Fix by initialising 'ret_data' properly.
>> 
>> Or move qobject_unref() after the error check?
>> 
>> -- >8 --
>> diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c
>> index 07bb32e22b..f3f1cbef65 100644
>> --- a/hw/block/xen-block.c
>> +++ b/hw/block/xen-block.c
>> @@ -869,7 +869,6 @@ static XenBlockIOThread *xen_block_iothread_create(const char *id,
>>  qdict_put_str(opts, "id", id);
>>  qmp_object_add(opts, &ret_data, &local_err);
>>  qobject_unref(opts);
>> -qobject_unref(ret_data);
>> 
>>  if (local_err) {
>>  error_propagate(errp, local_err);
>> @@ -878,6 +877,7 @@ static XenBlockIOThread *xen_block_iothread_create(const char *id,
>>  g_free(iothread);
>>  return NULL;
>>  }
>> +qobject_unref(ret_data);
>
> That won't help: qmp_object_add() doesn't change the value of ret_data
> at all. The other users of qmp_object_add() pass an initialised
> 'ret_data', so I think we should do the same.

Since the QMP core does it, other callers should do it, too.

For QAPI commands that don't return anything, the marshaller should not
use @ret_data, let alone store through it.  qmp_object_add() complies.
Thus, assert(!ret_data) would do.  qobject_unref(ret_data) is also
correct.

Reviewed-by: Markus Armbruster 




[libvirt test] 149482: regressions - FAIL

2020-04-06 Thread osstest service owner
flight 149482 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149482/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 146182
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 146182
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 146182
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 146182

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  88011ed280c4f946a7b8e7ffcea2335eb075de60
baseline version:
 libvirt  a1cd25b919509be2645dbe6f952d5263e0d4e4e5

Last test of basis   146182  2020-01-17 06:00:23 Z   81 days
Failing since146211  2020-01-18 04:18:52 Z   80 days   77 attempts
Testing same since   149482  2020-04-07 04:18:51 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Arnaud Patard 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Collin Walling 
  Daniel Henrique Barboza 
  Daniel P. Berrangé 
  Daniel Veillard 
  Dario Faggioli 
  Erik Skultety 
  Gaurav Agrawal 
  Han Han 
  Jim Fehlig 
  Jiri Denemark 
  Jonathon Jongsma 
  Julio Faracco 
  Ján Tomko 
  Laine Stump 
  Lin Ma 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Mauro S. M. Rodrigues 
  Michal Privoznik 
  Nikolay Shirokovskiy 
  Pavel Hrdina 
  Pavel Mores 
  Peter Krempa 
  Pino Toscano 
  Prathamesh Chavan 
  Rafael Fonseca 
  Richard W.M. Jones 
  Rikard Falkeborn 
  Ryan Moeller 
  Sahid Orentino Ferdjaoui 
  Sebastian Mitterle 
  Seeteena Thoufeek 
  Stefan Berger 
  Stefan Berger 
  Stefan Hajnoczi 
  Thomas Huth 
  Wu Qingliang 
  Your Name 
  Zhang Bo 
  zhenwei pi 
  Zhimin Feng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw blocked 
 test-amd64-amd64-libvirt-vhd