date:20201029

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Paolo Bonzini

On 28/10/20 22:20, Arnd Bergmann wrote:
> Avoid this by renaming the global 'apic' variable to the more descriptive
> 'x86_system_apic'. It was originally called 'genapic', but both that
> and the current 'apic' seem to be a little overly generic for a global
> variable.

The 'apic' affects only the current CPU, so one of 'x86_local_apic',
'x86_lapic' or 'x86_apic' is probably preferrable.

I don't have huge objections to renaming 'apic' variables and arguments
in KVM to 'lapic'.  I do agree with Sean however that it's going to
break again very soon.

Paolo

Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output

2020-10-29 Thread Mauro Carvalho Chehab

Hi Richard,

Em Wed, 28 Oct 2020 10:44:27 -0700
Richard Cochran  escreveu:

> On Wed, Oct 28, 2020 at 03:23:18PM +0100, Mauro Carvalho Chehab wrote:
> 
> > diff --git a/Documentation/ABI/testing/sysfs-uevent 
> > b/Documentation/ABI/testing/sysfs-uevent
> > index aa39f8d7bcdf..d0893dad3f38 100644
> > --- a/Documentation/ABI/testing/sysfs-uevent
> > +++ b/Documentation/ABI/testing/sysfs-uevent
> > @@ -19,7 +19,8 @@ Description:
> >  a transaction identifier so it's possible to use the same 
> > UUID
> >  value for one or more synthetic uevents in which case we
> >  logically group these uevents together for any userspace
> > -listeners. The UUID value appears in uevent as
> > +listeners. The UUID value appears in uevent as:  
> 
> I know almost nothing about Sphinx, but why have one colon here ^^^ and ...

Good point. After re-reading the text, this ":" doesn't belong here.

> 
> > +
> >  "SYNTH_UUID=----" 
> > environment
> >  variable.
> >  
> > @@ -30,18 +31,19 @@ Description:
> >  It's possible to define zero or more pairs - each pair is 
> > then
> >  delimited by a space character ' '. Each pair appears in
> >  synthetic uevent as "SYNTH_ARG_KEY=VALUE". That means the 
> > KEY
> > -name gains "SYNTH_ARG_" prefix to avoid possible collisions
> > +name gains `SYNTH_ARG_` prefix to avoid possible collisions
> >  with existing variables.
> >  
> > -Example of valid sequence written to the uevent file:
> > +Example of valid sequence written to the uevent file::  
> 
> ... two here?

The main issue that this patch wants to solve is here:

This generates synthetic uevent including these variables::

ACTION=add
SYNTH_ARG_A=1
SYNTH_ARG_B=abc
SYNTH_UUID=fe4d7c9d-b8c6-4a70-9ef1-3d8a58d18eed

On Sphinx, consecutive lines with the same indent belongs to the same
paragraph. So, without "::", the above will be displayed on a single line,
which is undesired.

using "::" tells Sphinx to display as-is. It will also place it into a a 
box (colored for html output) and using a monospaced font.

The change at the "uevent file:" line was done just for coherency
purposes.

Yet, after re-reading the text, there are other things that are not
coherent. So, I guess the enclosed patch will work better for sys-uevent.

Thanks,
Mauro

docs: ABI: sysfs-uevent: make it compatible with ReST output

- Replace " by ``, in order to use monospaced fonts;
- mark literal blocks as such.

Signed-off-by: Mauro Carvalho Chehab 

diff --git a/Documentation/ABI/testing/sysfs-uevent 
b/Documentation/ABI/testing/sysfs-uevent
index aa39f8d7bcdf..0b6227706b35 100644
--- a/Documentation/ABI/testing/sysfs-uevent
+++ b/Documentation/ABI/testing/sysfs-uevent
@@ -6,42 +6,46 @@ Description:
 Enable passing additional variables for synthetic uevents that
 are generated by writing /sys/.../uevent file.
 
-Recognized extended format is ACTION [UUID [KEY=VALUE ...].
+Recognized extended format is::
 
-The ACTION is compulsory - it is the name of the uevent action
-("add", "change", "remove"). There is no change compared to
-previous functionality here. The rest of the extended format
-is optional.
+   ACTION [UUID [KEY=VALUE ...]
+
+The ACTION is compulsory - it is the name of the uevent
+action (``add``, ``change``, ``remove``). There is no change
+compared to previous functionality here. The rest of the
+extended format is optional.
 
 You need to pass UUID first before any KEY=VALUE pairs.
-The UUID must be in "----"
+The UUID must be in ``----``
 format where 'x' is a hex digit. The UUID is considered to be
 a transaction identifier so it's possible to use the same UUID
 value for one or more synthetic uevents in which case we
 logically group these uevents together for any userspace
 listeners. The UUID value appears in uevent as
-"SYNTH_UUID=----" environment
+``SYNTH_UUID=----`` environment
 variable.
 
 If UUID is not passed in, the generated synthetic uevent gains
-"SYNTH_UUID=0" environment variable automatically.
+``SYNTH_UUID=0`` environment variable automatically.
 
 The KEY=VALUE pairs can contain alphanumeric char

Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm

2020-10-29 Thread Masami Hiramatsu

Hi Oleksandr,

I would like to try this on my arm64 board.

According to your comments in the patch, I made this config file.
# cat debian.conf
name = "debian"
type = "pvh"
vcpus = 8
memory = 512
kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64"
ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64"
cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw"
disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ]
vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ]
virtio = 1
vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ]

And tried to boot a DomU, but I got below error.

# xl create -c debian.conf
Parsing config from debian.conf
libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain
1:unable to add virtio_disk devices
libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain
1:xc_domain_pause failed
libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain
type for domid=1
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain
1:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain
1:Destruction of domain failed


Could you tell me how can I test it?

Thank you,

2020年10月16日(金) 1:46 Oleksandr Tyshchenko :
>
> From: Oleksandr Tyshchenko 
>
> Hello all.
>
> The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> You can find an initial discussion at [1] and RFC/V1 series at [2]/[3].
> Xen on Arm requires some implementation to forward guest MMIO access to a 
> device
> model in order to implement virtio-mmio backend or even mediator outside of 
> hypervisor.
> As Xen on x86 already contains required support this series tries to make it 
> common
> and introduce Arm specific bits plus some new functionality. Patch series is 
> based on
> Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device 
> emulator".
> Besides splitting existing IOREQ/DM support and introducing Arm side, the 
> series
> also includes virtio-mmio related changes (last 2 patches for toolstack)
> for the reviewers to be able to see how the whole picture could look like.
>
> According to the initial discussion there are a few open questions/concerns
> regarding security, performance in VirtIO solution:
> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require 
> different
>transport...
> 2. virtio backend is able to access all guest memory, some kind of protection
>is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
> 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
>Xenstore in virtio backend if possible.
> 4. a lot of 'foreing mapping' could lead to the memory exhaustion, Julien
>has some idea regarding that.
>
> Looks like all of them are valid and worth considering, but the first thing
> which we need on Arm is a mechanism to forward guest IO to a device emulator,
> so let's focus on it in the first place.
>
> ***
>
> There are a lot of changes since RFC series, almost all TODOs were resolved 
> on Arm,
> Arm code was improved and hardened, common IOREQ/DM code became really 
> arch-agnostic
> (without HVM-ism), but one TODO still remains which is "PIO handling" on Arm.
> The "PIO handling" TODO is expected to left unaddressed for the current 
> series.
> It is not an big issue for now while Xen doesn't have support for vPCI on Arm.
> On Arm64 they are only used for PCI IO Bar and we would probably want to 
> expose
> them to emulator as PIO access to make a DM completely arch-agnostic. So "PIO 
> handling"
> should be implemented when we add support for vPCI.
>
> I left interface untouched in the following patch
> "xen/dm: Introduce xendevicemodel_set_irq_level DM op"
> since there is still an open discussion what interface to use/what
> information to pass to the hypervisor.
>
> Also I decided to drop the following patch:
> "[RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in 
> driver domain"
> as I got an advise to write our own policy using FLASK which would cover our 
> use
> case (with emulator in driver domain) rather than tweak Xen.
>
> There are two patches on review this series depends on (each involved patch 
> in this series
> contains this note as well):
> 1. https://patchwork.kernel.org/patch/11816689
> 2. https://patchwork.kernel.org/patch/11803383
>
> Please note, that IOREQ feature is disabled by default within this series.
>
> ***
>
> Patch series [4] was rebased on recent "staging branch"
> (8a62dee x86/vLAPIC: don't leak regs page from vlapic_init() upon error) and 
> tested on
> Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend 
> (we will
> share it later) running in driver domain and unmodified Linux Guest running 
> on existing
> virtio-blk driver (frontend). No issues were observed. Guest domain 
> 'reboot/destroy'
> use-cases work properly. Patch series was only build-tested on x86.
>
> Please note, build-test passed for the following modes:
> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> 2. x86:

[libvirt test] 156273: regressions - FAIL

2020-10-29 Thread osstest service owner

flight 156273 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156273/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  1f807631f402210d036ec4803e7adfefa222f786
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  111 days
Failing since151818  2020-07-11 04:18:52 Z  110 days  104 attempts
Testing same since   156273  2020-10-28 04:19:15 Z1 days1 attempts


People who touched revisions under test:
  Adolfo Jayme Barrientos 
  Andika Triwidada 
  Andrea Bolognani 
  Balázs Meskó 
  Bastien Orivel 
  Bihong Yu 
  Binfeng Wu 
  Boris Fiuczynski 
  Christian Ehrhardt 
  Christian Schoenebeck 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Erik Skultety 
  Fabian Freyer 
  Fangge Jin 
  Fedora Weblate Translation 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Ian Wienand 
  Jamie Strandboge 
  Jamie Strandboge 
  Jean-Baptiste Holcroft 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jiri Denemark 
  Jonathon Jongsma 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Laine Stump 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Marc Hartmayer 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Neal Gompa 
  Nico Pache 
  Nikolay Shirokovskiy 
  Olesya Gerasimenko 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  Simon Gaiser 
  Stefan Bader 
  Stefan Berger 
  Szymon Scholz 
  Thomas Huth 
  Tim Wiederhake 
  Tomáš Golembiovský 
  Wang Xin 
  Weblate 
  Yang Hang 
  Yanqiu Zhang 
  Yi Li 
  Yi Wang 
  Yuri Chornoivan 
  Zheng Chuan 
  zhenwei pi 
  Zhenyu Zheng 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-arm64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm

[xen-unstable test] 156268: regressions - FAIL

2020-10-29 Thread osstest service owner

flight 156268 xen-unstable real [real]
flight 156289 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/156268/
http://logs.test-lab.xenproject.org/osstest/logs/156289/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-vhd  20 leak-check/check fail REGR. vs. 156254

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 156254

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 156254
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 156254
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156254
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156254
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156254
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156254
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156254
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156254
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156254
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 156254
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156254
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156254
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 xen  16a20963b3209788f2c0d3a3eebb7d92f03f5883
baseline version:
 xen  964781c6f162893677c50a779b7d562a299727ba

Last test of basis   156254  2020-10-27 06:38:30 Z2 days

Re: [PATCH v1] libacpi: use temporary files for generated files

2020-10-29 Thread Jan Beulich

On 28.10.2020 19:13, Anthony PERARD wrote:
> On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote:
>> On 27.10.2020 11:57, Andrew Cooper wrote:
>>> On 27/10/2020 10:37, Jan Beulich wrote:
 On 27.10.2020 11:27, Olaf Hering wrote:
> Am Tue, 27 Oct 2020 11:16:04 +0100
> schrieb Jan Beulich :
>
>> This pattern is used when a rule consists of multiple commands
>> having their output appended to one another's.
> My understanding is: a rule is satisfied as soon as the file exists.
 No - once make has found that a rule's commands need running, it'll
 run the full set and only check again afterwards.
>>>
>>> It stops at the first command which fails.
>>>
>>> Olaf is correct, but the problem here is an incremental build issue, not
>>> a parallel build issue.
>>>
>>> Intermediate files must not use the name of the target, or a failure and
>>> re-build will use the (bogus) intermediate state rather than rebuilding it.
>>
>> But there's no intermediate file here - the file gets created in one
>> go. Furthermore doesn't make delete the target file(s) when a rule
>> fails? (One may not want to rely on this, and hence indeed keep multi-
>> part rules update intermediate files of different names.)
> 
> What if something went badly enough and sed didn't managed to write the
> whole file and make never had a chance to remove the bogus file?

How's this different from an object file getting only partly written
and not deleted? We'd have to use the temporary file approach in
literally every rule if we wanted to cater for this.

Jan

Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks

2020-10-29 Thread Jan Beulich

On 28.10.2020 22:31, Andrew Cooper wrote:
> On 26/10/2020 09:40, Jan Beulich wrote:
>> In the case that no 64-bit SYSCALL callback is registered, the guest
>> will be crashed when 64-bit userspace executes a SYSCALL instruction,
>> which would be a userspace => kernel DoS.  Similarly for 32-bit
>> userspace when no 32-bit SYSCALL callback was registered either.
>>
>> This has been the case ever since the introduction of 64bit PV support,
>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
>> yield #GP/#UD in userspace before the callback is registered, and are
>> therefore safe by default.
>>
>> This change does constitute a change in the PV ABI, for the corner case
>> of a PV guest kernel not registering a 64-bit callback (which has to be
>> considered a defacto requirement of the unwritten PV ABI, considering
>> there is no PV equivalent of EFER.SCE).
>>
>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>> SYSENTER (safe by default, until explicitly enabled).
>>
>> Signed-off-by: Andrew Cooper 
>> Signed-off-by: Jan Beulich 
>> ---
>> v3:
>>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>>callbacks", to allow the uncontroversial part of that change to go
>>in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>>over just the REX prefix too much trickery even for an unlikely to be
>>taken code path?)
> 
> I find this submission confusion seeing as my v3 is already suitably
> acked and ready to commit.  What I haven't had yet enough free time to
> do so.

My objection to the other half stands, and hence, I'm afraid, stands
in the way of your patch getting committed. Aiui Roger's ack doesn't
invalidate my objection, sorry.

>> --- a/xen/arch/x86/x86_64/entry.S
>> +++ b/xen/arch/x86/x86_64/entry.S
>> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel)
>>  cmoveq VCPU_syscall32_addr(%rbx),%rax
>>  testq %rax,%rax
>>  cmovzq VCPU_syscall_addr(%rbx),%rax
>> -movq  %rax,TRAPBOUNCE_eip(%rdx)
>>  /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */
>>  btl   $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx)
>>  setc  %cl
>>  leal  (,%rcx,TBF_INTERRUPT),%ecx
>> +
>> +test  %rax, %rax
>> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
>> +mov   VCPU_trap_ctxt(%rbx), %rdi
>> +movl  $X86_EXC_UD, UREGS_entry_vector(%rsp)
>> +cmpw  $FLAT_USER_CS32, UREGS_cs(%rsp)
>> +je0f
>> +rex64   # subl => subq
>> +0:
>> +subl  $2, UREGS_rip(%rsp)
> 
> There was deliberately not a 32bit sub here (see below).

Funny you should say this when what I've taken as input (v2; I
don't think I had seen a v3, or else I would have called this
one v4) had "subl", not "subq". Roger's comment was regarding
a "mov" with a 32-bit register destination, not this "sub". If
you had also noticed and fixed this one, without you having
posted v3 (or without me having seen the posting) I couldn't
have known.

> As to the construct, I'm having a hard time deciding whether this is an
> excellent idea, or far too clever for its own good.
> 
> Some basic perf testing shows that there is a visible difference in
> execution time of these two paths, which means there is some
> optimisation being missed in the frontend for the 32bit case.  However,
> the difference is tiny in the grand scheme of things (about 0.4%
> difference in aggregate time to execute a loop of this pattern with a
> fixed number of iterations.)
> 
> However, the 32bit case isn't actually interesting here.  A guest can't
> execute a SYSCALL instruction on/across the 4G->0 boundary because the
> M2P is mapped NX up to the 4G boundary, so we can never reach this point
> with %eip < 2.
> 
> Therefore, the 64bit-only form is the appropriate one to use, which
> solves any question of cleverness, or potential decode stalls it causes.

Good point.

Jan

Re: call traces in xl dmesg during boot

2020-10-29 Thread Jan Beulich

On 29.10.2020 09:22, Olaf Hering wrote:
> During boot of xen.git#staging, Xen seems to think something pressed debug 
> keys very early, which shows various call traces in 'xl dmesg'. A reboot may 
> "fix" it, and no traces are printed.

I'm seeing the same every now and then, albeit only with a single
'A' so far, iirc.

> Any idea what may cause this behavior? I do not see it on a slightly smaller 
> box.

I've been assuming this is stuff left in the serial port's input
latch or FIFO. So far I didn't have a good idea how to tell
left over garbage from actual input that was sent very early, so
I've no good idea how to work around it. A command line option
triggering the discarding of initially present input doesn't
look very attractive to me ...

Jan

Re: call traces in xl dmesg during boot

2020-10-29 Thread Andrew Cooper

On 29/10/2020 08:22, Olaf Hering wrote:
> During boot of xen.git#staging, Xen seems to think something pressed debug 
> keys very early, which shows various call traces in 'xl dmesg'. A reboot may 
> "fix" it, and no traces are printed.
>
> Any idea what may cause this behavior? I do not see it on a slightly smaller 
> box.

That means Xen is receiving real keystrokes on the UART.

I've seen it before on hardware with floating wires in place of a
properly connected UART, but I've also seen it on systems with an
incorrectly configured BAUD rate.  Looking at your command line, you
probably want com1=115200,8n1 although this should also be the default.

Totally unrelated to the problem at hand, but an observation.

(XEN) [00d6b68c0ecc] WARNING: NR_CPUS limit of 256 reached -
ignoring further processors

You'll need NR_CPUS configured to 512 to boot properly on this system.

~Andrew

signature.asc
Description: OpenPGP digital signature

Re: call traces in xl dmesg during boot

2020-10-29 Thread Olaf Hering

Am Thu, 29 Oct 2020 09:13:08 +
schrieb Andrew Cooper :

> You'll need NR_CPUS configured to 512 to boot properly on this system.

Ah, thanks. My builds use the built-in defaults. I will adjust my future builds.

Olaf


pgpr5K1K_ZLum.pgp
Description: Digitale Signatur von OpenPGP

Re: call traces in xl dmesg during boot

2020-10-29 Thread Olaf Hering

Am Thu, 29 Oct 2020 10:10:51 +0100
schrieb Jan Beulich :

> I've been assuming this is stuff left in the serial port's input
> latch or FIFO.

Weird, there is no `ipmitool` attached to this host.

Maybe the firmware does funny things...


Olaf


pgpQfM2XzJkIH.pgp
Description: Digitale Signatur von OpenPGP

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Arnd Bergmann

On Thu, Oct 29, 2020 at 8:04 AM Paolo Bonzini  wrote:
>
> On 28/10/20 22:20, Arnd Bergmann wrote:
> > Avoid this by renaming the global 'apic' variable to the more descriptive
> > 'x86_system_apic'. It was originally called 'genapic', but both that
> > and the current 'apic' seem to be a little overly generic for a global
> > variable.
>
> The 'apic' affects only the current CPU, so one of 'x86_local_apic',
> 'x86_lapic' or 'x86_apic' is probably preferrable.

Ok, I'll change it to x86_local_apic then, unless someone else has
a preference between them.

> I don't have huge objections to renaming 'apic' variables and arguments
> in KVM to 'lapic'.  I do agree with Sean however that it's going to
> break again very soon.

I think ideally there would be no global variable, withall accesses
encapsulated in function calls, possibly using static_call() optimizations
if any of them are performance critical.

It doesn't seem hard to do, but I'd rather leave that change to
an x86 person ;-)

  Arnd

Re: [PATCH v2 3/3] xen/arm: Warn user on cpu errata 832075

2020-10-29 Thread Bertrand Marquis

Hi Julien,

> On 28 Oct 2020, at 18:39, Julien Grall  wrote:
> 
> Hi Bertrand,
> 
> On 26/10/2020 16:21, Bertrand Marquis wrote:
>> When a Cortex A57 processor is affected by CPU errata 832075, a guest
>> not implementing the workaround for it could deadlock the system.
>> Add a warning during boot informing the user that only trusted guests
>> should be executed on the system.
>> An equivalent warning is already given to the user by KVM on cores
>> affected by this errata.
>> Also taint the hypervisor as unsecure when this errata applies and
>> mention Cortex A57 r0p0 - r1p2 as not security supported in SUPPORT.md
>> Signed-off-by: Bertrand Marquis 
> 
> Reviewed-by: Julien Grall 

Thanks

> 
> If you don't need to resend the series, then I would be happy to fix the typo 
> pointed out by George on commit.

There is only the condensing from Stefano.
If you can handle that on commit to great but if you need me to send a v3 to 
make your life easier do not hesitate to tell me

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall
>

Re: [PATCH] meson.build: fix building of Xen support for aarch64

2020-10-29 Thread Alex Bennée



Stefano Stabellini  writes:

> On Wed, 28 Oct 2020, Alex Bennée wrote:
>> Xen is supported on aarch64 although weirdly using the i386-softmmu
>> model. Checking based on the host CPU meant we never enabled Xen
>> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to
>> make it not seem weird but that will require further build surgery.
>> 
>> Signed-off-by: Alex Bennée 
>> Cc: Masami Hiramatsu 
>> Cc: Stefano Stabellini 
>> Cc: Anthony Perard 
>> Cc: Paul Durrant 
>> Fixes: 8a19980e3f ("configure: move accelerator logic to meson")
>> ---
>>  meson.build | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/meson.build b/meson.build
>> index 835424999d..f1fcbfed4c 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64']
>>  'CONFIG_HVF': ['x86_64-softmmu'],
>>  'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
>>}
>> +elif cpu in [ 'arm', 'aarch64' ]
>> +  accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] }
>>  endif
>
> This looks very reasonable -- the patch makes sense.
>
>
> However I have two questions, mostly for my own understanding. I tried
> to repro the aarch64 build problem but it works at my end, even without
> this patch.

Building on a x86 host or with cross compiler?

> I wonder why. I suspect it works thanks to these lines in
> meson.build:
>
>   if not get_option('xen').disabled() and 'CONFIG_XEN_BACKEND' in config_host
> accelerators += 'CONFIG_XEN'
> have_xen_pci_passthrough = not 
> get_option('xen_pci_passthrough').disabled() and targetos == 'linux'
>   else
> have_xen_pci_passthrough = false
>   endif
>
> But I am not entirely sure who is adding CONFIG_XEN_BACKEND to
> config_host.

The is part of the top level configure check - which basically checks
for --enable-xen or autodetects the presence of the userspace libraries.
I'm not sure if we have a slight over proliferation of symbols for Xen
support (although I'm about to add more).

> The other question is: does it make sense to print the value of
> CONFIG_XEN as part of the summary? Something like:
>
> diff --git a/meson.build b/meson.build
> index 47e32e1fcb..c6e7832dc9 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -2070,6 +2070,7 @@ summary_info += {'KVM support':   
> config_all.has_key('CONFIG_KVM')}
>  summary_info += {'HAX support':   config_all.has_key('CONFIG_HAX')}
>  summary_info += {'HVF support':   config_all.has_key('CONFIG_HVF')}
>  summary_info += {'WHPX support':  config_all.has_key('CONFIG_WHPX')}
> +summary_info += {'XEN support':  config_all.has_key('CONFIG_XEN')}
>  summary_info += {'TCG support':   config_all.has_key('CONFIG_TCG')}
>  if config_all.has_key('CONFIG_TCG')
>summary_info += {'TCG debug enabled': 
> config_host.has_key('CONFIG_DEBUG_TCG')}
>
>
> But I realize there is already:
>
> summary_info += {'xen support':   
> config_host.has_key('CONFIG_XEN_BACKEND')}
>
> so it would be a bit of a duplicate

Hmm so what we have is:

  CONFIG_XEN_BACKEND
- ensures that appropriate compiler flags are added
- pegs RAM_ADDR_MAX at UINT64_MAX (instead of UINTPTR_MAX)
  CONFIG_XEN
- which controls a bunch of build objects, some of which may be i386 only?
./accel/meson.build:15:specific_ss.add_all(when: ['CONFIG_XEN'], if_true: 
dummy_ss)
./accel/stubs/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', if_false: 
files('xen-stub.c'))
./accel/xen/meson.build:1:specific_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen-all.c'))
./hw/9pfs/meson.build:17:fs_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen-9p-backend.c'))
./hw/block/dataplane/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', 
if_true: files('xen-block.c'))
./hw/block/meson.build:14:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen-block.c'))
./hw/char/meson.build:23:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen_console.c'))
./hw/display/meson.build:18:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
files('xenfb.c'))
./hw/i386/xen/meson.build:1:i386_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen-hvm.c',
   
'xen-mapcache.c',
   
'xen_apic.c',
   
'xen_platform.c',
   
'xen_pvdevice.c')
./hw/net/meson.build:2:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen_nic.c'))
./hw/usb/meson.build:76:softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', 
libusb], if_true: files('xen-usb.c'))
./hw/xen/meson.build:1:softmmu_ss.add(when: ['CONFIG_XEN', xen], if_true: 
files(
./hw/xen/meson.build:20:specific_ss.add_all(when: ['CONFIG_XEN', xen], 
if_true: xen_specific_ss)
./hw/xenpv/meson.build:3:xenpv_ss.add(when: 'CONFIG_XEN', if_true: 
files('xen_machine_p

Re: [PATCH v1] libacpi: use temporary files for generated files

2020-10-29 Thread Anthony PERARD

On Thu, Oct 29, 2020 at 09:47:09AM +0100, Jan Beulich wrote:
> On 28.10.2020 19:13, Anthony PERARD wrote:
> > On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote:
> >> On 27.10.2020 11:57, Andrew Cooper wrote:
> >>> On 27/10/2020 10:37, Jan Beulich wrote:
>  On 27.10.2020 11:27, Olaf Hering wrote:
> > Am Tue, 27 Oct 2020 11:16:04 +0100
> > schrieb Jan Beulich :
> >
> >> This pattern is used when a rule consists of multiple commands
> >> having their output appended to one another's.
> > My understanding is: a rule is satisfied as soon as the file exists.
>  No - once make has found that a rule's commands need running, it'll
>  run the full set and only check again afterwards.
> >>>
> >>> It stops at the first command which fails.
> >>>
> >>> Olaf is correct, but the problem here is an incremental build issue, not
> >>> a parallel build issue.
> >>>
> >>> Intermediate files must not use the name of the target, or a failure and
> >>> re-build will use the (bogus) intermediate state rather than rebuilding 
> >>> it.
> >>
> >> But there's no intermediate file here - the file gets created in one
> >> go. Furthermore doesn't make delete the target file(s) when a rule
> >> fails? (One may not want to rely on this, and hence indeed keep multi-
> >> part rules update intermediate files of different names.)
> > 
> > What if something went badly enough and sed didn't managed to write the
> > whole file and make never had a chance to remove the bogus file?
> 
> How's this different from an object file getting only partly written
> and not deleted? We'd have to use the temporary file approach in
> literally every rule if we wanted to cater for this.

I though that things like `gcc' would write the final object to a
temporary place then rename it to the final destination, but that
doesn't seems to be the case.

I tried to see what happens if the `sed' command fails, and the target is
created, empty, and doesn't gets deleted by make. So an incremental
build uses a broken file without trying to rebuild it.

If we want `make' to delete target when a rule fails, I think we need to
add '.DELETE_ON_ERROR:' somewhere. Or avoid creating files before the
command is likely to succeed.

Cheers,

-- 
Anthony PERARD

Re: [PATCH v1] libacpi: use temporary files for generated files

2020-10-29 Thread Olaf Hering

Am Thu, 29 Oct 2020 10:57:15 +
schrieb Anthony PERARD :

> we need to add '.DELETE_ON_ERROR:'

The last paragraph at 
https://www.gnu.org/software/make/manual/html_node/Errors.html#Errors suggests 
that this is a good thing to have.


Olaf


pgp60XmNfzvTX.pgp
Description: Digitale Signatur von OpenPGP

[ovmf test] 156270: all pass - PUSHED

2020-10-29 Thread osstest service owner

flight 156270 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156270/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 3b87d728742fe58f427f4b775b11250e29d75cc6
baseline version:
 ovmf eb520b93d279e901a593c57e30649fb08f4290c5

Last test of basis   156255  2020-10-27 09:04:36 Z2 days
Testing same since   156270  2020-10-28 03:11:01 Z1 days1 attempts


People who touched revisions under test:
  Abner Chang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   eb520b93d2..3b87d72874  3b87d728742fe58f427f4b775b11250e29d75cc6 -> 
xen-tested-master

Re: [PATCH v1] libacpi: use temporary files for generated files

2020-10-29 Thread Jan Beulich

On 29.10.2020 11:57, Anthony PERARD wrote:
> On Thu, Oct 29, 2020 at 09:47:09AM +0100, Jan Beulich wrote:
>> On 28.10.2020 19:13, Anthony PERARD wrote:
>>> On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote:
 On 27.10.2020 11:57, Andrew Cooper wrote:
> On 27/10/2020 10:37, Jan Beulich wrote:
>> On 27.10.2020 11:27, Olaf Hering wrote:
>>> Am Tue, 27 Oct 2020 11:16:04 +0100
>>> schrieb Jan Beulich :
>>>
 This pattern is used when a rule consists of multiple commands
 having their output appended to one another's.
>>> My understanding is: a rule is satisfied as soon as the file exists.
>> No - once make has found that a rule's commands need running, it'll
>> run the full set and only check again afterwards.
>
> It stops at the first command which fails.
>
> Olaf is correct, but the problem here is an incremental build issue, not
> a parallel build issue.
>
> Intermediate files must not use the name of the target, or a failure and
> re-build will use the (bogus) intermediate state rather than rebuilding 
> it.

 But there's no intermediate file here - the file gets created in one
 go. Furthermore doesn't make delete the target file(s) when a rule
 fails? (One may not want to rely on this, and hence indeed keep multi-
 part rules update intermediate files of different names.)
>>>
>>> What if something went badly enough and sed didn't managed to write the
>>> whole file and make never had a chance to remove the bogus file?
>>
>> How's this different from an object file getting only partly written
>> and not deleted? We'd have to use the temporary file approach in
>> literally every rule if we wanted to cater for this.
> 
> I though that things like `gcc' would write the final object to a
> temporary place then rename it to the final destination, but that
> doesn't seems to be the case.
> 
> I tried to see what happens if the `sed' command fails, and the target is
> created, empty, and doesn't gets deleted by make. So an incremental
> build uses a broken file without trying to rebuild it.

IOW it's rather a courtesy of the compiler / assembler / linker
to delete their output files on error.

> If we want `make' to delete target when a rule fails, I think we need to
> add '.DELETE_ON_ERROR:' somewhere.

Ah, indeed. I thought this was the default nowadays, but the doc
says it isn't. I think this would be preferable over touching
individual rules to go through temporary files.

Jan

Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks

2020-10-29 Thread Jan Beulich

On 29.10.2020 09:53, Jan Beulich wrote:
> On 28.10.2020 22:31, Andrew Cooper wrote:
>> On 26/10/2020 09:40, Jan Beulich wrote:
>>> In the case that no 64-bit SYSCALL callback is registered, the guest
>>> will be crashed when 64-bit userspace executes a SYSCALL instruction,
>>> which would be a userspace => kernel DoS.  Similarly for 32-bit
>>> userspace when no 32-bit SYSCALL callback was registered either.
>>>
>>> This has been the case ever since the introduction of 64bit PV support,
>>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which
>>> yield #GP/#UD in userspace before the callback is registered, and are
>>> therefore safe by default.
>>>
>>> This change does constitute a change in the PV ABI, for the corner case
>>> of a PV guest kernel not registering a 64-bit callback (which has to be
>>> considered a defacto requirement of the unwritten PV ABI, considering
>>> there is no PV equivalent of EFER.SCE).
>>>
>>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
>>> SYSENTER (safe by default, until explicitly enabled).
>>>
>>> Signed-off-by: Andrew Cooper 
>>> Signed-off-by: Jan Beulich 
>>> ---
>>> v3:
>>>  * Split this change off of "x86/pv: Inject #UD for missing SYSCALL
>>>callbacks", to allow the uncontroversial part of that change to go
>>>in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching
>>>over just the REX prefix too much trickery even for an unlikely to be
>>>taken code path?)
>>
>> I find this submission confusion seeing as my v3 is already suitably
>> acked and ready to commit.  What I haven't had yet enough free time to
>> do so.
> 
> My objection to the other half stands, and hence, I'm afraid, stands
> in the way of your patch getting committed. Aiui Roger's ack doesn't
> invalidate my objection, sorry.

Actually I realize now that earlier I said I'm okay with this going in
if Roger acks it. I take it that Roger was aware of the controversy
when providing the ack. Therefore I'd like to take back what I've said
above, and your supposed v3 ought to be okay to get committed.

As to backporting - I'd surely be taking the split off part here, but
I have to admit I'm hesitant to take the full change. IOW splitting
the two changes might still be a helpful thing.

Jan

Re: [PATCH] meson.build: fix building of Xen support for aarch64

2020-10-29 Thread Jason Andryuk

On Thu, Oct 29, 2020 at 6:01 AM Alex Bennée  wrote:
>
>
> Stefano Stabellini  writes:
>
> > On Wed, 28 Oct 2020, Alex Bennée wrote:
> >> Xen is supported on aarch64 although weirdly using the i386-softmmu
> >> model. Checking based on the host CPU meant we never enabled Xen
> >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to
> >> make it not seem weird but that will require further build surgery.
> >>
> >> Signed-off-by: Alex Bennée 
> >> Cc: Masami Hiramatsu 
> >> Cc: Stefano Stabellini 
> >> Cc: Anthony Perard 
> >> Cc: Paul Durrant 
> >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson")
> >> ---
> >>  meson.build | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/meson.build b/meson.build
> >> index 835424999d..f1fcbfed4c 100644
> >> --- a/meson.build
> >> +++ b/meson.build
> >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64']
> >>  'CONFIG_HVF': ['x86_64-softmmu'],
> >>  'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
> >>}
> >> +elif cpu in [ 'arm', 'aarch64' ]
> >> +  accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] }
> >>  endif
> >
> > This looks very reasonable -- the patch makes sense.

A comment would be useful to explain that Arm needs i386-softmmu for
the xenpv machine.

> >
> > However I have two questions, mostly for my own understanding. I tried
> > to repro the aarch64 build problem but it works at my end, even without
> > this patch.
>
> Building on a x86 host or with cross compiler?
>
> > I wonder why. I suspect it works thanks to these lines in
> > meson.build:

I think it's a runtime and not a build problem.  In osstest, Xen
support was detected and configured, but CONFIG_XEN wasn't set for
Arm.  So at runtime xen_available() returns 0, and QEMU doesn't start
with "qemu-system-i386: -xen-domid 1: Option not supported for this
target"

I posted my investigation here:
https://lore.kernel.org/xen-devel/cakf6xpss8kpgovzrkitpz63bhbvbjxrtywdhekzuo2q1kem...@mail.gmail.com/

Regards,
Jason

Ping: [PATCH v3 0/3] x86: shim building adjustments (plus shadow follow-on)

2020-10-29 Thread Jan Beulich

Tim,

On 19.10.2020 10:42, Jan Beulich wrote:
> The change addressing the shadow related build issue noticed by
> Andrew went in already. The build breakage goes beyond this
> specific combination though - PV_SHIM_EXCLUSIVE plus HVM is
> similarly an issue. This is what the 1st patch tries to take care
> of, in a shape already on irc noticed to be controversial. I'm
> submitting the change nevertheless because for the moment there
> looks to be a majority in favor of going this route. One argument
> not voiced there yet: What good does it do to allow a user to
> enable HVM when then on the resulting hypervisor they still can't
> run HVM guests (for the hypervisor still being a dedicated PV
> shim one). On top of this, the alternative approach is likely
> going to get ugly.
> 
> The shadow related adjustments are here merely because the want
> to make them was noticed in the context of the patch which has
> already gone in.
> 
> 1: don't permit HVM and PV_SHIM_EXCLUSIVE at the same time
> 2: refactor shadow_vram_{get,put}_l1e()
> 3: sh_{make,destroy}_monitor_table() are "even more" HVM-only

unless you tell me otherwise I'm intending to commit the latter
two with Roger's acks some time next week. Of course it would
still be nice to know your view on the first of the TBDs in
patch 3 (which may result in further changes, in a follow-up).

Jan

[linux-linus test] 156269: regressions - FAIL

2020-10-29 Thread osstest service owner

flight 156269 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156269/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-xsm7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. 
vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. 
vs. 152332
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 20 guest-stop  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 19 guest-localmigrate/x10   fail REGR. vs. 152332
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-libvirt  8 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 152332
 test-armhf-armhf-xl-cubietruck  8 xen-boot   fail REGR. vs. 152332
 test-armhf-armhf-xl-multivcpu  8 xen-bootfail REGR. vs. 152332
 test-armhf-armhf-libvirt-raw  8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 152332
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-xl   8 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-xl-credit2   8 xen-boot fail REGR. vs. 152332
 test-armhf-armhf-xl-vhd   8 xen-boot fail REGR. vs. 152332

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  8 xen-boot fail REGR. vs. 152332

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl  11 leak-check/basis(11)fail blocked in 152332
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152332
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152332
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm

Ping: [PATCH 2/2] tools/libs: fix uninstall rule for header files

2020-10-29 Thread Jan Beulich

On 19.10.2020 09:21, Jan Beulich wrote:
> This again was working right only as long as $(LIBHEADER) consisted of
> just one entry.
> 
> Signed-off-by: Jan Beulich 

While patch 1 has become irrelevant with Juergen's more complete
change, this one is still applicable afaict.

Jan

> ---
> An alternative would be to use $(addprefix ) without any shell loop.
> 
> --- a/tools/libs/libs.mk
> +++ b/tools/libs/libs.mk
> @@ -107,7 +107,7 @@ install: build
>  .PHONY: uninstall
>  uninstall:
>   rm -f $(DESTDIR)$(PKG_INSTALLDIR)/$(LIB_FILE_NAME).pc
> - for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$(LIBHEADER); 
> done
> + for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$$i; done
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR)
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR)
> 
>

Ping: [PATCH] tools/python: pass more -rpath-link options to ld

2020-10-29 Thread Jan Beulich

On 19.10.2020 10:31, Jan Beulich wrote:
> With the split of libraries, I've observed a number of warnings from
> (old?) ld.
> 
> Signed-off-by: Jan Beulich 

Marek?

> ---
> It's unclear to me whether this is ld version dependent - the pattern
> of where I've seen such warnings doesn't suggest a clear version
> dependency.
> 
> --- a/tools/python/setup.py
> +++ b/tools/python/setup.py
> @@ -7,10 +7,15 @@ XEN_ROOT = "../.."
>  extra_compile_args  = [ "-fno-strict-aliasing", "-Werror" ]
>  
>  PATH_XEN  = XEN_ROOT + "/tools/include"
> +PATH_LIBXENTOOLCORE = XEN_ROOT + "/tools/libs/toolcore"
>  PATH_LIBXENTOOLLOG = XEN_ROOT + "/tools/libs/toollog"
> +PATH_LIBXENCALL = XEN_ROOT + "/tools/libs/call"
>  PATH_LIBXENEVTCHN = XEN_ROOT + "/tools/libs/evtchn"
> +PATH_LIBXENGNTTAB = XEN_ROOT + "/tools/libs/gnttab"
>  PATH_LIBXENCTRL = XEN_ROOT + "/tools/libs/ctrl"
>  PATH_LIBXENGUEST = XEN_ROOT + "/tools/libs/guest"
> +PATH_LIBXENDEVICEMODEL = XEN_ROOT + "/tools/libs/devicemodel"
> +PATH_LIBXENFOREIGNMEMORY = XEN_ROOT + "/tools/libs/foreignmemory"
>  PATH_XENSTORE = XEN_ROOT + "/tools/libs/store"
>  
>  xc = Extension("xc",
> @@ -24,7 +29,13 @@ xc = Extension("xc",
> library_dirs   = [ PATH_LIBXENCTRL, PATH_LIBXENGUEST ],
> libraries  = [ "xenctrl", "xenguest" ],
> depends= [ PATH_LIBXENCTRL + "/libxenctrl.so", 
> PATH_LIBXENGUEST + "/libxenguest.so" ],
> -   extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENTOOLLOG 
> ],
> +   extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENCALL,
> +  
> "-Wl,-rpath-link="+PATH_LIBXENDEVICEMODEL,
> +  "-Wl,-rpath-link="+PATH_LIBXENEVTCHN,
> +  
> "-Wl,-rpath-link="+PATH_LIBXENFOREIGNMEMORY,
> +  "-Wl,-rpath-link="+PATH_LIBXENGNTTAB,
> +  "-Wl,-rpath-link="+PATH_LIBXENTOOLCORE,
> +  "-Wl,-rpath-link="+PATH_LIBXENTOOLLOG 
> ],
> sources= [ "xen/lowlevel/xc/xc.c" ])
>  
>  xs = Extension("xs",
> @@ -33,6 +44,7 @@ xs = Extension("xs",
> library_dirs   = [ PATH_XENSTORE ],
> libraries  = [ "xenstore" ],
> depends= [ PATH_XENSTORE + "/libxenstore.so" ],
> +   extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENTOOLCORE 
> ],
> sources= [ "xen/lowlevel/xs/xs.c" ])
>  
>  plat = os.uname()[0]
>

Ping²: [PATCH] x86/PV: make post-migration page state consistent

2020-10-29 Thread Jan Beulich

On 11.09.2020 12:34, Jan Beulich wrote:
> When a page table page gets de-validated, its type reference count drops
> to zero (and PGT_validated gets cleared), but its type remains intact.
> XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
> such pages. An intermediate write to such a page via e.g.
> MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
> PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
> return. In libxc the decision which pages to normalize / localize
> depends solely on the type returned from the domctl. As a result without
> further precautions the guest won't be able to tell whether such a page
> has had its (apparent) PTE entries transitioned to the new MFNs.
> 
> Add a check of PGT_validated, thus consistently avoiding normalization /
> localization in the tool stack.
> 
> Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the
> change at hand, also change the variable's initializer to use this
> constant, too. Take the opportunity and also adjust its type.
> 
> Signed-off-by: Jan Beulich 

I think I did address all questions here.

Jan

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -215,7 +215,8 @@ long arch_do_domctl(
>  
>  for ( i = 0; i < num; ++i )
>  {
> -unsigned long gfn = 0, type = 0;
> +unsigned long gfn = 0;
> +unsigned int type = XEN_DOMCTL_PFINFO_NOTAB;
>  struct page_info *page;
>  p2m_type_t t;
>  
> @@ -255,6 +256,8 @@ long arch_do_domctl(
>  
>  if ( page->u.inuse.type_info & PGT_pinned )
>  type |= XEN_DOMCTL_PFINFO_LPINTAB;
> +else if ( !(page->u.inuse.type_info & PGT_validated) )
> +type = XEN_DOMCTL_PFINFO_NOTAB;
>  
>  if ( page->count_info & PGC_broken )
>  type = XEN_DOMCTL_PFINFO_BROKEN;
>

[PATCH] x86/pv: Drop stale comment in dom0_construct_pv()

2020-10-29 Thread Andrew Cooper

This comment has been around since c/s 1372bca0615 in 2004.  It is stale, as
it predates the introduction of struct vcpu.

It is not obvious that it was even correct at the time.  Where a vcpu (domain
at the time) has been configured to run is unrelated to construct the domain's
initial pagetables, etc.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

Almost...  I'm not entirely sure NUMA memory allocation is plumbed through
correctly, but even that still has nothing to do with v->processor
---
 xen/arch/x86/pv/dom0_build.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d79503d6a9..f7165309a2 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -616,7 +616,6 @@ int __init dom0_construct_pv(struct domain *d,
 v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
 }
 
-/* WARNING: The new domain must have its 'processor' field filled in! */
 if ( !is_pv_32bit_domain(d) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-- 
2.11.0

Re: [PATCH] xen: add support for automatic debug key actions in case of crash

2020-10-29 Thread Jan Beulich

On 22.10.2020 16:39, Juergen Gross wrote:
> When the host crashes it would sometimes be nice to have additional
> debug data available which could be produced via debug keys, but
> halting the server for manual intervention might be impossible due to
> the need to reboot/kexec rather sooner than later.
> 
> Add support for automatic debug key actions in case of crashes which
> can be activated via boot- or runtime-parameter.

While I can certainly see this possibly being a useful thing in
certain situations, I'm uncertain it's going to be helpful in at
least a fair set of cases. What output to request very much
depends on the nature of the problem one is running into, and
the more keys one adds "just in case", the longer the reboot
latency, and the higher the risk (see also below) of the output
generation actually causing further problems.

IOW I'm neither fully convinced that we want this, nor fully
opposed.

> Depending on the type of crash the desired data might be different, so
> support different settings for the possible types of crashes.
> 
> The parameter is "crash-debug" with the following syntax:
> 
>   crash-debug-=
> 
> with  being one of:
> 
>   panic, hwdom, watchdog, kexeccmd, debugkey
> 
> and  a sequence of debug key characters with '.' having the
> special semantics of a 1 second pause.

1 second is a whole lot of time. To get two successive sets
of data, a much shorter delay (if any) would normally suffice.

Also, while '.' may seem like a good choice right now, with the
shortage of characters we may want to put a real handler behind
it at come point. The one character that clearly won't make much
sense to use in this context is 'h', but that's awful as a (kind
of) separator. Could we perhaps replace 'h' by '?', freeing up
'h' and allowing '?' to be used for this purpose here?

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -574,6 +574,29 @@ reduction of features at Xen's disposal to manage guests.
>  ### cpuinfo (x86)
>  > `= `
>  
> +### crash-debug-debugkey
> +### crash-debug-hwdom
> +### crash-debug-kexeccmd
> +### crash-debug-panic
> +### crash-debug-watchdog
> +> `= `
> +
> +> Can be modified at runtime
> +
> +Specify debug-key actions in cases of crashes. Each of the parameters applies
> +to a different crash reason. The `` is a sequence of debug key
> +characters, with `.` having the special meaning of a 1 second pause.
> +
> +So e.g. `crash-debug-watchdog=0.0r` would dump dom0 state twice with a
> +second between the two state dumps, followed by the run queues of the
> +hypervisor, if the system crashes due to a watchdog timeout.
> +
> +These parameters should be used carefully, as e.g. specifying
> +`crash-debug-debugkey=C` would result in an endless loop. Depending on the
> +reason of the system crash it might happen that triggering some debug key
> +action will result in a hang instead of dumping data and then doing a
> +reboot or crash dump.

I think it would be useful if the flavors were (briefly)
explained: At the very least "debugkey" doesn't directly fit "in
cases of crashes", as there's no crash involved. kexec_crash()
instead gets invoked without there having been any crash.

You may also want to point out that this is a best effort thing
only - system state at the point of a crash may be such that the
attempt of handling one or the debug keys would have further bad
effects on the system, including that the actual kexec may then
never occur.

> @@ -507,6 +509,41 @@ void __init initialize_keytable(void)
>  }
>  }
>  
> +#define CRASHACTION_SIZE  32
> +static char crash_debug_panic[CRASHACTION_SIZE];
> +static char crash_debug_hwdom[CRASHACTION_SIZE];
> +static char crash_debug_watchdog[CRASHACTION_SIZE];
> +static char crash_debug_kexeccmd[CRASHACTION_SIZE];
> +static char crash_debug_debugkey[CRASHACTION_SIZE];
> +
> +static char *crash_action[CRASHREASON_N] = {
> +[CRASHREASON_PANIC] = crash_debug_panic,
> +[CRASHREASON_HWDOM] = crash_debug_hwdom,
> +[CRASHREASON_WATCHDOG] = crash_debug_watchdog,
> +[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd,
> +[CRASHREASON_DEBUGKEY] = crash_debug_debugkey,
> +};
> +
> +string_runtime_param("crash-debug-panic", crash_debug_panic);
> +string_runtime_param("crash-debug-hwdom", crash_debug_hwdom);
> +string_runtime_param("crash-debug-watchdog", crash_debug_watchdog);
> +string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd);
> +string_runtime_param("crash-debug-debugkey", crash_debug_debugkey);

In general I'm not in favor of this (or similar) needing
five new command line options, instead of just one. The problem
with e.g.

crash-debug=panic:rq,watchdog:0d

is that ',' (or any other separator chosen) could in principle
also be a debug key. It would still be possible to use

crash-debug=panic:rq crash-debug=watchdog:0d

though. Thoughts?

> +void keyhandler_crash_action(enum crash_reason reason)
> +{
> +const char *action = crash_action[reason];
> +

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-29 Thread Bertrand Marquis

Hi Julien,

> On 28 Oct 2020, at 19:12, Julien Grall  wrote:
> 
> 
> 
> On 26/10/2020 11:03, Rahul Singh wrote:
>> Hello Julien,
> 
> Hi Rahul,
> 
>>> On 23 Oct 2020, at 4:19 pm, Julien Grall  wrote:
>>> 
>>> 
>>> 
>>> On 23/10/2020 15:27, Rahul Singh wrote:
 Hello Julien,
> On 23 Oct 2020, at 2:00 pm, Julien Grall  wrote:
> 
> 
> 
> On 23/10/2020 12:35, Rahul Singh wrote:
>> Hello,
>>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini  
>>> wrote:
>>> 
>>> On Thu, 22 Oct 2020, Julien Grall wrote:
>> On 20/10/2020 16:25, Rahul Singh wrote:
>>> Add support for ARM architected SMMUv3 implementations. It is based 
>>> on
>>> the Linux SMMUv3 driver.
>>> Major differences between the Linux driver are as follows:
>>> 1. Only Stage-2 translation is supported as compared to the Linux 
>>> driver
>>>that supports both Stage-1 and Stage-2 translations.
>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>capability to share the page tables with the CPU.
>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event 
>>> queue
>>>and priority queue IRQ handling.
>> 
>> Tasklets are not a replacement for threaded IRQ. In particular, they 
>> will
>> have priority over anything else (IOW nothing will run on the pCPU 
>> until
>> they are done).
>> 
>> Do you know why Linux is using thread. Is it because of long running
>> operations?
> 
> Yes you are right because of long running operations Linux is using 
> the
> threaded IRQs.
> 
> SMMUv3 reports fault/events bases on memory-based circular buffer 
> queues not
> based on the register. As per my understanding, it is time-consuming 
> to
> process the memory based queues in interrupt context because of that 
> Linux
> is using threaded IRQ to process the faults/events from SMMU.
> 
> I didn’t find any other solution in XEN in place of tasklet to defer 
> the
> work, that’s why I used tasklet in XEN in replacement of threaded 
> IRQs. If
> we do all work in interrupt context we will make XEN less responsive.
 
 So we need to make sure that Xen continue to receives interrupts, but 
 we also
 need to make sure that a vCPU bound to the pCPU is also responsive.
 
> 
> If you know another solution in XEN that will be used to defer the 
> work in
> the interrupt please let me know I will try to use that.
 
 One of my work colleague encountered a similar problem recently. He 
 had a long
 running tasklet and wanted to be broken down in smaller chunk.
 
 We decided to use a timer to reschedule the taslket in the future. 
 This allows
 the scheduler to run other loads (e.g. vCPU) for some time.
 
 This is pretty hackish but I couldn't find a better solution as 
 tasklet have
 high priority.
 
 Maybe the other will have a better idea.
>>> 
>>> Julien's suggestion is a good one.
>>> 
>>> But I think tasklets can be configured to be called from the idle_loop,
>>> in which case they are not run in interrupt context?
>>> 
>>  Yes you are right tasklet will be scheduled from the idle_loop that is 
>> not interrupt conext.
> 
> This depends on your tasklet. Some will run from the softirq context 
> which is usually (for Arm) on the return of an exception.
> 
 Thanks for the info. I will check and will get better understanding of the 
 tasklet how it will run in XEN.
>>> 
>>> 4. Latest version of the Linux SMMUv3 code implements the commands 
>>> queue
>>>access functions based on atomic operations implemented in Linux.
>> 
>> Can you provide more details?
> 
> I tried to port the latest version of the SMMUv3 code than I observed 
> that
> in order to port that code I have to also port atomic operation 
> implemented
> in Linux to XEN. As latest Linux code uses atomic operation to 
> process the
> command queues 
> (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
> atomic_fetch_andnot_relaxed()) .
 
 Thank you for the explanation. I think it would be best to import the 
 atomic
 helpers and use the latest code.
 
 This will ensure that we don't re-introduce bugs and also buy us some 
 time
 before the Linux and Xen driver diverge again too much.
 
 Stefano, what do you think?
>>> 
>>> I think you are right.
>> Yes, I agree with yo

Re: [PATCH] x86/pv: Drop stale comment in dom0_construct_pv()

2020-10-29 Thread Jan Beulich

On 29.10.2020 15:00, Andrew Cooper wrote:
> This comment has been around since c/s 1372bca0615 in 2004.  It is stale, as
> it predates the introduction of struct vcpu.

That commit only moved it around; it's 22a857bde9b8 afaics from
early 2003 where it first appeared, where it had a reason:

/*
 * WARNING: The new domain must have its 'processor' field
 * filled in by now !!
 */
phys_l2tab = ALLOC_FRAME_FROM_DOMAIN();
l2tab = map_domain_mem(phys_l2tab);
memcpy(l2tab, idle_pg_table[p->processor], PAGE_SIZE);

But yes, the comment has been stale for a long time, and I've
been wondering a number of times what it was supposed to tell
me. (I think it was already stale at the point the comment
first got altered, in 3072fef54df8.)

> It is not obvious that it was even correct at the time.  Where a vcpu (domain
> at the time) has been configured to run is unrelated to construct the domain's
> initial pagetables, etc.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 

Jan

Re: [PATCH] xen: add support for automatic debug key actions in case of crash

2020-10-29 Thread Jürgen Groß


On 29.10.20 15:22, Jan Beulich wrote:

On 22.10.2020 16:39, Juergen Gross wrote:

When the host crashes it would sometimes be nice to have additional
debug data available which could be produced via debug keys, but
halting the server for manual intervention might be impossible due to
the need to reboot/kexec rather sooner than later.

Add support for automatic debug key actions in case of crashes which
can be activated via boot- or runtime-parameter.


While I can certainly see this possibly being a useful thing in
certain situations, I'm uncertain it's going to be helpful in at
least a fair set of cases. What output to request very much
depends on the nature of the problem one is running into, and
the more keys one adds "just in case", the longer the reboot
latency, and the higher the risk (see also below) of the output
generation actually causing further problems.


The obvious case is a watchdog induced crash: at least 2 sets of dom0
state will help in many cases.


IOW I'm neither fully convinced that we want this, nor fully
opposed.


Depending on the type of crash the desired data might be different, so
support different settings for the possible types of crashes.

The parameter is "crash-debug" with the following syntax:

   crash-debug-=

with  being one of:

   panic, hwdom, watchdog, kexeccmd, debugkey

and  a sequence of debug key characters with '.' having the
special semantics of a 1 second pause.


1 second is a whole lot of time. To get two successive sets
of data, a much shorter delay (if any) would normally suffice.


Yes, I'd be fine to trade that for a shorter period of time.


Also, while '.' may seem like a good choice right now, with the
shortage of characters we may want to put a real handler behind
it at come point. The one character that clearly won't make much
sense to use in this context is 'h', but that's awful as a (kind
of) separator. Could we perhaps replace 'h' by '?', freeing up
'h' and allowing '?' to be used for this purpose here?


Fine with me. Another possibility would be to add '\' as an escape
character with '\.' meaning "debug-key .".


--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -574,6 +574,29 @@ reduction of features at Xen's disposal to manage guests.
  ### cpuinfo (x86)
  > `= `
  
+### crash-debug-debugkey

+### crash-debug-hwdom
+### crash-debug-kexeccmd
+### crash-debug-panic
+### crash-debug-watchdog
+> `= `
+
+> Can be modified at runtime
+
+Specify debug-key actions in cases of crashes. Each of the parameters applies
+to a different crash reason. The `` is a sequence of debug key
+characters, with `.` having the special meaning of a 1 second pause.
+
+So e.g. `crash-debug-watchdog=0.0r` would dump dom0 state twice with a
+second between the two state dumps, followed by the run queues of the
+hypervisor, if the system crashes due to a watchdog timeout.
+
+These parameters should be used carefully, as e.g. specifying
+`crash-debug-debugkey=C` would result in an endless loop. Depending on the
+reason of the system crash it might happen that triggering some debug key
+action will result in a hang instead of dumping data and then doing a
+reboot or crash dump.


I think it would be useful if the flavors were (briefly)
explained: At the very least "debugkey" doesn't directly fit "in
cases of crashes", as there's no crash involved. kexec_crash()
instead gets invoked without there having been any crash.


Yes, and having some additional state generate for this case might
help diagnosis.



You may also want to point out that this is a best effort thing
only - system state at the point of a crash may be such that the
attempt of handling one or the debug keys would have further bad
effects on the system, including that the actual kexec may then
never occur.


True.




@@ -507,6 +509,41 @@ void __init initialize_keytable(void)
  }
  }
  
+#define CRASHACTION_SIZE  32

+static char crash_debug_panic[CRASHACTION_SIZE];
+static char crash_debug_hwdom[CRASHACTION_SIZE];
+static char crash_debug_watchdog[CRASHACTION_SIZE];
+static char crash_debug_kexeccmd[CRASHACTION_SIZE];
+static char crash_debug_debugkey[CRASHACTION_SIZE];
+
+static char *crash_action[CRASHREASON_N] = {
+[CRASHREASON_PANIC] = crash_debug_panic,
+[CRASHREASON_HWDOM] = crash_debug_hwdom,
+[CRASHREASON_WATCHDOG] = crash_debug_watchdog,
+[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd,
+[CRASHREASON_DEBUGKEY] = crash_debug_debugkey,
+};
+
+string_runtime_param("crash-debug-panic", crash_debug_panic);
+string_runtime_param("crash-debug-hwdom", crash_debug_hwdom);
+string_runtime_param("crash-debug-watchdog", crash_debug_watchdog);
+string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd);
+string_runtime_param("crash-debug-debugkey", crash_debug_debugkey);


In general I'm not in favor of this (or similar) needing
five new command line options, instead of just one. The problem
with e.g.

crash-debug=panic:rq,watchdog:0d

is that ',' (

Re: [PATCH] xen: add support for automatic debug key actions in case of crash

2020-10-29 Thread Jan Beulich

On 29.10.2020 15:40, Jürgen Groß wrote:
> On 29.10.20 15:22, Jan Beulich wrote:
>> On 22.10.2020 16:39, Juergen Gross wrote:
>>> @@ -507,6 +509,41 @@ void __init initialize_keytable(void)
>>>   }
>>>   }
>>>   
>>> +#define CRASHACTION_SIZE  32
>>> +static char crash_debug_panic[CRASHACTION_SIZE];
>>> +static char crash_debug_hwdom[CRASHACTION_SIZE];
>>> +static char crash_debug_watchdog[CRASHACTION_SIZE];
>>> +static char crash_debug_kexeccmd[CRASHACTION_SIZE];
>>> +static char crash_debug_debugkey[CRASHACTION_SIZE];
>>> +
>>> +static char *crash_action[CRASHREASON_N] = {
>>> +[CRASHREASON_PANIC] = crash_debug_panic,
>>> +[CRASHREASON_HWDOM] = crash_debug_hwdom,
>>> +[CRASHREASON_WATCHDOG] = crash_debug_watchdog,
>>> +[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd,
>>> +[CRASHREASON_DEBUGKEY] = crash_debug_debugkey,
>>> +};
>>> +
>>> +string_runtime_param("crash-debug-panic", crash_debug_panic);
>>> +string_runtime_param("crash-debug-hwdom", crash_debug_hwdom);
>>> +string_runtime_param("crash-debug-watchdog", crash_debug_watchdog);
>>> +string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd);
>>> +string_runtime_param("crash-debug-debugkey", crash_debug_debugkey);
>>
>> In general I'm not in favor of this (or similar) needing
>> five new command line options, instead of just one. The problem
>> with e.g.
>>
>> crash-debug=panic:rq,watchdog:0d
>>
>> is that ',' (or any other separator chosen) could in principle
>> also be a debug key. It would still be possible to use
>>
>> crash-debug=panic:rq crash-debug=watchdog:0d
>>
>> though. Thoughts?
> 
> OTOH the runtime parameters are much easier addressable that way.

Ah yes, I can see this as a reason. Would make we wonder whether
command line and runtime handling may want disconnecting in this
specific case then. (But I can also see the argument of this
being too much overhead then.)

>>> +void keyhandler_crash_action(enum crash_reason reason)
>>> +{
>>> +const char *action = crash_action[reason];
>>> +struct cpu_user_regs *regs = get_irq_regs() ? : guest_cpu_user_regs();
>>> +
>>> +while ( *action ) {
>>> +if ( *action == '.' )
>>> +mdelay(1000);
>>> +else
>>> +handle_keypress(*action, regs);
>>> +action++;
>>> +}
>>> +}
>>
>> I think only diagnostic keys should be permitted here.
> 
> While I understand that other keys could produce nonsense or do harm,
> I'm not sure we should really prohibit them. Allowing them would e.g.
> allow to do just a reboot without kdump for one type of crashes.

Ah yes, that's a fair point.

>>> --- a/xen/include/xen/kexec.h
>>> +++ b/xen/include/xen/kexec.h
>>> @@ -1,6 +1,8 @@
>>>   #ifndef __XEN_KEXEC_H__
>>>   #define __XEN_KEXEC_H__
>>>   
>>> +#include 
>>
>> Could we go without this, utilizing the gcc extension of forward
>> declared enums? Otoh ...
>>
>>> @@ -82,7 +84,11 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>>>   #define kexecing 0
>>>   
>>>   static inline void kexec_early_calculations(void) {}
>>> -static inline void kexec_crash(void) {}
>>> +static inline void kexec_crash(enum crash_reason reason)
>>> +{
>>> +keyhandler_crash_action(reason);
>>> +}
>>
>> ... if this is to be an inline function and not just a #define,
>> it'll need the declaration of the function to have been seen.
> 
> And even being a #define all users of kexec_crash() would need to
> #include keyhandler.h (and I'm not sure there are any source files
> including kexec.h which don't use kexec_crash()).

About as many which do as ones which don't. But there's no
generally accessible header which includes xen/kexec.h, so perhaps
the extra dependency indeed isn't all this problematic.

Jan

Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output

2020-10-29 Thread Jonathan Cameron

On Wed, 28 Oct 2020 15:23:18 +0100
Mauro Carvalho Chehab  wrote:

> From: Mauro Carvalho Chehab 
> 
> Some files over there won't parse well by Sphinx.
> 
> Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab 
> Signed-off-by: Mauro Carvalho Chehab 

Query below...  I'm going to guess a rebase issue?

Other than that
Acked-by: Jonathan Cameron  # for IIO


> diff --git a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 
> b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
> index b7259234ad70..a10a4de3e5fe 100644
> --- a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
> +++ b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
> @@ -3,67 +3,85 @@ KernelVersion:  4.11
>  Contact: benjamin.gaign...@st.com
>  Description:
>   Reading returns the list possible master modes which are:
> - - "reset" : The UG bit from the TIMx_EGR register is
> +
> +
> + - "reset"
> + The UG bit from the TIMx_EGR register is
>   used as trigger output (TRGO).
> - - "enable": The Counter Enable signal CNT_EN is used
> + - "enable"
> + The Counter Enable signal CNT_EN is used
>   as trigger output.
> - - "update": The update event is selected as trigger output.
> + - "update"
> + The update event is selected as trigger output.
>   For instance a master timer can then be used
>   as a prescaler for a slave timer.
> - - "compare_pulse" : The trigger output send a positive pulse
> - when the CC1IF flag is to be set.
> - - "OC1REF": OC1REF signal is used as trigger output.
> - - "OC2REF": OC2REF signal is used as trigger output.
> - - "OC3REF": OC3REF signal is used as trigger output.
> - - "OC4REF": OC4REF signal is used as trigger output.
> + - "compare_pulse"
> + The trigger output send a positive pulse
> + when the CC1IF flag is to be set.
> + - "OC1REF"
> + OC1REF signal is used as trigger output.
> + - "OC2REF"
> + OC2REF signal is used as trigger output.
> + - "OC3REF"
> + OC3REF signal is used as trigger output.
> + - "OC4REF"
> + OC4REF signal is used as trigger output.
> +
>   Additional modes (on TRGO2 only):
> - - "OC5REF": OC5REF signal is used as trigger output.
> - - "OC6REF": OC6REF signal is used as trigger output.
> +
> + - "OC5REF"
> + OC5REF signal is used as trigger output.
> + - "OC6REF"
> + OC6REF signal is used as trigger output.
>   - "compare_pulse_OC4REF":
> -   OC4REF rising or falling edges generate pulses.
> + OC4REF rising or falling edges generate pulses.
>   - "compare_pulse_OC6REF":
> -   OC6REF rising or falling edges generate pulses.
> + OC6REF rising or falling edges generate pulses.
>   - "compare_pulse_OC4REF_r_or_OC6REF_r":
> -   OC4REF or OC6REF rising edges generate pulses.
> + OC4REF or OC6REF rising edges generate pulses.
>   - "compare_pulse_OC4REF_r_or_OC6REF_f":
> -   OC4REF rising or OC6REF falling edges generate pulses.
> + OC4REF rising or OC6REF falling edges generate
> + pulses.
>   - "compare_pulse_OC5REF_r_or_OC6REF_r":
> -   OC5REF or OC6REF rising edges generate pulses.
> + OC5REF or OC6REF rising edges generate pulses.
>   - "compare_pulse_OC5REF_r_or_OC6REF_f":
> -   OC5REF rising or OC6REF falling edges generate pulses.
> + OC5REF rising or OC6REF falling edges generate
> + pulses.
>  
> - +---+   +-++-+
> - | Prescaler +-> | Counter |+-> | Master  | TRGO(2)
> - +---+   +--++-+|-> | Control +-->
> -||  ||  +-+
> - +--v+-+ OCxREF ||  +-+
> - | Chx compare +--> | Output  | ChX
> - +---+-+ |  | Control +-->
> -   . |   |  +-+
> -   . |   |.
> - +---

Re: [PATCH 12/12] xen/cpupool: make per-cpupool sched-gran hypfs node writable

2020-10-29 Thread Jan Beulich

On 26.10.2020 10:13, Juergen Gross wrote:
> @@ -1088,13 +1098,58 @@ static int cpupool_gran_read(const struct hypfs_entry 
> *entry,
>  return copy_to_guest(uaddr, name, strlen(name) + 1) ? -EFAULT : 0;
>  }
>  
> +static int cpupool_gran_write(struct hypfs_entry_leaf *leaf,
> +  XEN_GUEST_HANDLE_PARAM(void) uaddr,
> +  unsigned int ulen)
> +{
> +const struct hypfs_dyndir_id *data;
> +struct cpupool *cpupool;
> +enum sched_gran gran;
> +unsigned int sched_gran;
> +char name[SCHED_GRAN_NAME_LEN];
> +int ret = 0;
> +
> +if ( ulen > SCHED_GRAN_NAME_LEN )
> +return -ENOSPC;
> +
> +if ( copy_from_guest(name, uaddr, ulen) )
> +return -EFAULT;
> +
> +sched_gran = sched_gran_get(name, &gran) ? 0
> + : 
> cpupool_check_granularity(gran);
> +if ( memchr(name, 0, ulen) != (name + ulen - 1) || sched_gran == 0 )
> +return -EINVAL;

I guess the memchr() check wants to happen before the call to
sched_gran_get()?

Jan

Re: [PATCH 12/12] xen/cpupool: make per-cpupool sched-gran hypfs node writable

2020-10-29 Thread Jürgen Groß


On 29.10.20 15:58, Jan Beulich wrote:

On 26.10.2020 10:13, Juergen Gross wrote:

@@ -1088,13 +1098,58 @@ static int cpupool_gran_read(const struct hypfs_entry 
*entry,
  return copy_to_guest(uaddr, name, strlen(name) + 1) ? -EFAULT : 0;
  }
  
+static int cpupool_gran_write(struct hypfs_entry_leaf *leaf,

+  XEN_GUEST_HANDLE_PARAM(void) uaddr,
+  unsigned int ulen)
+{
+const struct hypfs_dyndir_id *data;
+struct cpupool *cpupool;
+enum sched_gran gran;
+unsigned int sched_gran;
+char name[SCHED_GRAN_NAME_LEN];
+int ret = 0;
+
+if ( ulen > SCHED_GRAN_NAME_LEN )
+return -ENOSPC;
+
+if ( copy_from_guest(name, uaddr, ulen) )
+return -EFAULT;
+
+sched_gran = sched_gran_get(name, &gran) ? 0
+ : cpupool_check_granularity(gran);
+if ( memchr(name, 0, ulen) != (name + ulen - 1) || sched_gran == 0 )
+return -EINVAL;


I guess the memchr() check wants to happen before the call to
sched_gran_get()?


Yes.


Juergen

[PATCH] x86emul: support AVX-VNNI

2020-10-29 Thread Jan Beulich

These are VEX-encoded equivalents of the EVEX-encoded AVX512-VNNI ISA
extension.

Signed-off-by: Jan Beulich 
---
SDE: -spr

--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -226,6 +226,7 @@ int libxl_cpuid_parse_config(libxl_cpuid
 {"core-caps",0x0007,  0, CPUID_REG_EDX, 30,  1},
 {"ssbd", 0x0007,  0, CPUID_REG_EDX, 31,  1},
 
+{"avx-vnni", 0x0007,  1, CPUID_REG_EAX,  4,  1},
 {"avx512-bf16",  0x0007,  1, CPUID_REG_EAX,  5,  1},
 
 {"lahfsahf", 0x8001, NA, CPUID_REG_ECX,  0,  1},
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -175,7 +175,7 @@ static const char *const str_7d0[32] =
 
 static const char *const str_7a1[32] =
 {
-/* 4 */ [ 5] = "avx512-bf16",
+[ 4] = "avx-vnni",  [ 5] = "avx512-bf16",
 };
 
 static const struct {
--- a/tools/tests/x86_emulator/predicates.c
+++ b/tools/tests/x86_emulator/predicates.c
@@ -1335,6 +1335,10 @@ static const struct vex {
 { { 0x45 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsrlv{d,q} */
 { { 0x46 }, 2, T, R, pfx_66, W0, Ln }, /* vpsravd */
 { { 0x47 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsllv{d,q} */
+{ { 0x50 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusd */
+{ { 0x51 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusds */
+{ { 0x52 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssd */
+{ { 0x53 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssds */
 { { 0x58 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastd */
 { { 0x59 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastq */
 { { 0x5a }, 2, F, R, pfx_66, W0, L1 }, /* vbroadcasti128 */
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -5028,6 +5028,61 @@ int main(int argc, char **argv)
 printf("okay\n");
 }
 
+printf("%-40s", "Testing vpdpwssd (%ecx),%{y,z}mmA,%{y,z}mmB...");
+if ( stack_exec && cpu_has_avx512_vnni && cpu_has_avx_vnni )
+{
+/* Do the same operation two ways and compare the results. */
+decl_insn(vpdpwssd_vex1);
+decl_insn(vpdpwssd_vex2);
+decl_insn(vpdpwssd_evex);
+
+for ( i = 0; i < 24; ++i )
+res[i] = i | (~i << 16);
+
+asm volatile ( "vmovdqu32 32(%0), %%zmm1\n\t"
+   "vextracti64x4 $1, %%zmm1, %%ymm2\n\t"
+   "vpxor %%xmm0, %%xmm0, %%xmm3\n\t"
+   "vpxor %%xmm0, %%xmm0, %%xmm4\n\t"
+   "vpxor %%xmm0, %%xmm0, %%xmm5\n"
+   put_insn(vpdpwssd_vex1,
+/* %{vex%} vpdpwssd (%1), %%ymm1, %%ymm3" */
+".byte 0xc4, 0xe2, 0x75, 0x52, 0x19") "\n"
+   put_insn(vpdpwssd_vex2,
+/* "%{vex%} vpdpwssd 32(%1), %%ymm2, %%ymm4" */
+".byte 0xc4, 0xe2, 0x6d, 0x52, 0x61, 0x20") 
"\n"
+   put_insn(vpdpwssd_evex,
+/* "vpdpwssd (%1), %%zmm1, %%zmm5" */
+".byte 0x62, 0xf2, 0x75, 0x48, 0x52, 0x29")
+   :: "r" (res), "c" (NULL) );
+
+set_insn(vpdpwssd_vex1);
+regs.ecx = (unsigned long)res;
+rc = x86_emulate(&ctxt, &emulops);
+if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_vex1) )
+goto fail;
+
+set_insn(vpdpwssd_vex2);
+regs.ecx = (unsigned long)res;
+rc = x86_emulate(&ctxt, &emulops);
+if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_vex2) )
+goto fail;
+
+set_insn(vpdpwssd_evex);
+regs.ecx = (unsigned long)res;
+rc = x86_emulate(&ctxt, &emulops);
+if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_evex) )
+goto fail;
+
+asm ( "vinserti64x4 $1, %%ymm4, %%zmm3, %%zmm0\n\t"
+  "vpcmpeqd %%zmm0, %%zmm5, %%k0\n\t"
+  "kmovw %%k0, %0" : "=g" (rc) );
+if ( rc != 0x )
+goto fail;
+printf("okay\n");
+}
+else
+printf("skipped\n");
+
 printf("%-40s", "Testing invpcid 16(%ecx),%%edx...");
 if ( stack_exec )
 {
--- a/tools/tests/x86_emulator/x86-emulate.h
+++ b/tools/tests/x86_emulator/x86-emulate.h
@@ -170,6 +170,7 @@ static inline bool xcr0_mask(uint64_t ma
 #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6))
 #define cpu_has_avx512_vp2intersect (cp.feat.avx512_vp2intersect && 
xcr0_mask(0xe6))
 #define cpu_has_serialize  cp.feat.serialize
+#define cpu_has_avx_vnni   (cp.feat.avx_vnni && xcr0_mask(6))
 #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6))
 
 #define cpu_has_xgetbv1   (cpu_has_xsave && cp.xstate.xgetbv1)
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -2008,6 +2008,7 @@ amd_like(const struct x86_emulate_ctxt *
 #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps)
 #define vcpu_has_avx512_vp2intersec

RE: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread David Laight

From: Arnd Bergmann
> Sent: 28 October 2020 21:21
> 
> From: Arnd Bergmann 
> 
> There are hundreds of warnings in a W=2 build about a local
> variable shadowing the global 'apic' definition:
> 
> arch/x86/kvm/lapic.h:149:65: warning: declaration of 'apic' shadows a global 
> declaration [-Wshadow]
> 
> Avoid this by renaming the global 'apic' variable to the more descriptive
> 'x86_system_apic'. It was originally called 'genapic', but both that
> and the current 'apic' seem to be a little overly generic for a global
> variable.
> 
> Fixes: c48f14966cc4 ("KVM: inline kvm_apic_present() and kvm_lapic_enabled()")
> Fixes: c8d46cf06dc2 ("x86: rename 'genapic' to 'apic'")
> Signed-off-by: Arnd Bergmann 
> ---
> v2: rename the global instead of the local variable in the header
...
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index 284e73661a18..33e2dc78ca11 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -259,14 +259,14 @@ void __init hv_apic_init(void)
>   /*
>* Set the IPI entry points.
>*/
> - orig_apic = *apic;
> -
> - apic->send_IPI = hv_send_ipi;
> - apic->send_IPI_mask = hv_send_ipi_mask;
> - apic->send_IPI_mask_allbutself = hv_send_ipi_mask_allbutself;
> - apic->send_IPI_allbutself = hv_send_ipi_allbutself;
> - apic->send_IPI_all = hv_send_ipi_all;
> - apic->send_IPI_self = hv_send_ipi_self;
> + orig_apic = *x86_system_apic;
> +
> + x86_system_apic->send_IPI = hv_send_ipi;
> + x86_system_apic->send_IPI_mask = hv_send_ipi_mask;
> + x86_system_apic->send_IPI_mask_allbutself = 
> hv_send_ipi_mask_allbutself;
> + x86_system_apic->send_IPI_allbutself = hv_send_ipi_allbutself;
> + x86_system_apic->send_IPI_all = hv_send_ipi_all;
> + x86_system_apic->send_IPI_self = hv_send_ipi_self;
>   }
> 
>   if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
> @@ -285,10 +285,10 @@ void __init hv_apic_init(void)
>*/
>   apic_set_eoi_write(hv_apic_eoi_write);
>   if (!x2apic_enabled()) {
> - apic->read  = hv_apic_read;
> - apic->write = hv_apic_write;
> - apic->icr_write = hv_apic_icr_write;
> - apic->icr_read  = hv_apic_icr_read;
> + x86_system_apic->read  = hv_apic_read;
> + x86_system_apic->write = hv_apic_write;
> + x86_system_apic->icr_write = hv_apic_icr_write;
> + x86_system_apic->icr_read  = hv_apic_icr_read;
>   }

For those two just add:
struct apic *apic = x86_system_apic;
before all the assignments.
Less churn and much better code.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

[PATCH] x86/HVM: send mapcache invalidation request to qemu regardless of preemption

2020-10-29 Thread Jan Beulich

Even if only part of a hypercall completed before getting preempted,
invalidation ought to occur. Therefore fold the two return statements.

Signed-off-by: Jan Beulich 
---
Split off from "x86/HVM: refine when to send mapcache invalidation
request to qemu".

--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -326,14 +326,11 @@ int hvm_hypercall(struct cpu_user_regs *
 
 HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu -> %lx", eax, regs->rax);
 
-if ( curr->hcall_preempted )
-return HVM_HCALL_preempted;
-
 if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) &&
  test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) )
 send_invalidate_req();
 
-return HVM_HCALL_completed;
+return curr->hcall_preempted ? HVM_HCALL_preempted : HVM_HCALL_completed;
 }
 
 /*

[PATCH] x86/hvm: process softirq while saving/loading entries

2020-10-29 Thread Roger Pau Monne

On slow systems with sync_console saving or loading the context of big
guests can cause the watchdog to trigger. Fix this by adding a couple
of process_pending_softirqs.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hvm/save.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c
index a2c56fbc1e..584620985b 100644
--- a/xen/arch/x86/hvm/save.c
+++ b/xen/arch/x86/hvm/save.c
@@ -21,6 +21,7 @@
  */
 
 #include 
+#include 
 #include 
 
 #include 
@@ -255,6 +256,7 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h)
v, i);
 return -ENODATA;
 }
+process_pending_softirqs();
 }
 }
 else
@@ -268,6 +270,7 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h)
d->domain_id, i);
 return -ENODATA;
 }
+process_pending_softirqs();
 }
 }
 
@@ -341,6 +344,7 @@ int hvm_load(struct domain *d, hvm_domain_context_t *h)
d->domain_id, desc->typecode, desc->instance);
 return -1;
 }
+process_pending_softirqs();
 }
 
 /* Not reached */
-- 
2.29.0

Re: [PATCH 2/2] tools/libs: fix uninstall rule for header files

2020-10-29 Thread Bertrand Marquis




> On 19 Oct 2020, at 08:21, Jan Beulich  wrote:
> 
> This again was working right only as long as $(LIBHEADER) consisted of
> just one entry.
> 
> Signed-off-by: Jan Beulich 
Reviewed-by: Bertrand Marquis 

The change is obviously fixing a bug :-) and the double $ is required to 
protect from make.

Cheers
Bertrand


> ---
> An alternative would be to use $(addprefix ) without any shell loop.
> 
> --- a/tools/libs/libs.mk
> +++ b/tools/libs/libs.mk
> @@ -107,7 +107,7 @@ install: build
> .PHONY: uninstall
> uninstall:
>   rm -f $(DESTDIR)$(PKG_INSTALLDIR)/$(LIB_FILE_NAME).pc
> - for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$(LIBHEADER); 
> done
> + for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$$i; done
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR)
>   rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR)
> 
>

[PATCH v1] xl: fix description of migrate --debug

2020-10-29 Thread Olaf Hering

xl migrate --debug used to track every pfn in every batch of pages.
But these times are gone. Adjust the help text to tell what --debug
is supposed to do today.

Signed-off-by: Olaf Hering 
---
 docs/man/xl.1.pod.in   | 4 +++-
 tools/xl/xl_cmdtable.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 4bde0672fa..d0f50f0b4a 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -488,7 +488,9 @@ Include timestamps in output.
 
 =item B<--debug>
 
-Display huge (!) amount of debug information during the migration process.
+Verify transferred domU page data. All memory will be transferred one more
+time to the destination host while the domU is paused, and compare with
+the result of the inital transfer while the domU was still running.
 
 =item B<-p>
 
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index fd2dc0aef2..af160dde42 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -168,7 +168,7 @@ struct cmd_spec cmd_table[] = {
   "-e  Do not wait in the background (on ) for the 
death\n"
   "of the domain.\n"
   "-T  Show timestamps during the migration process.\n"
-  "--debug Print huge (!) amount of debug during the migration 
process.\n"
+  "--debug Verify transferred domU page data.\n"
   "-p  Do not unpause domain after migrating it.\n"
   "-D  Preserve the domain id"
 },

RE: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread David Laight

From: Arnd Bergmann
> Sent: 29 October 2020 09:51
...
> I think ideally there would be no global variable, withall accesses
> encapsulated in function calls, possibly using static_call() optimizations
> if any of them are performance critical.

There isn't really a massive difference between global variables
and global access functions.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)

Re: [PATCH] x86/hvm: process softirq while saving/loading entries

2020-10-29 Thread Jan Beulich

On 29.10.2020 16:20, Roger Pau Monne wrote:
> On slow systems with sync_console saving or loading the context of big
> guests can cause the watchdog to trigger. Fix this by adding a couple
> of process_pending_softirqs.

Which raises the question in how far this is then also a problem
for the caller of the underlying hypercall. IOW I wonder whether
instead we need to make use of continuations here. Nevertheless
...

> Signed-off-by: Roger Pau Monné 

Acked-by: Jan Beulich 

Jan

[PATCH] tools: add noidentpt domain config option

2020-10-29 Thread Tamas K Lengyel

The Identity Pagetable is currently being created for all HVM VMs during setup.
This was only necessary for running HVM guests on Intel CPUs where EPT was
available but unrestricted guest mode was not.

Add option to skip the creation of the Identity Pagetable via the "noidentpt"
xl config option. This allows a system administrator to skip this step when
the hardware is known to have the required features.

Signed-off-by: Tamas K Lengyel 
---
Further context: while running VM forks a significant bottle-neck was
identified when HVM_PARAM_IDENT_PT is getting copied from the parent VM. This
is due to the fact that setting this parameter requires obtaining a Xen-wide
lock (domctl_lock_acquire). When running several VM forks in parallel during
fuzzing the fork reset hypercall can fail due to the lock being taken by
another fork that's being reset at the same time. This whole situation can be
avoided if there is no Identity-map pagetable to begin with as on modern CPUs
this identity-map pagetable is not actually required.
---
 docs/man/xl.cfg.5.pod.in |  5 +
 tools/include/xenguest.h |  1 +
 tools/libs/guest/xg_dom_x86.c| 31 +--
 tools/libs/light/libxl_create.c  |  2 ++
 tools/libs/light/libxl_dom.c |  2 ++
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  |  2 ++
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..4d992fe346 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -587,6 +587,11 @@ which are incompatible with migration. Currently this is 
limited to
 enabling the invariant TSC feature flag in CPUID results when TSC is
 not emulated.
 
+=item B
+
+Disable the creation of the Identity-map Pagetable that was required to run HVM
+guests on Intel CPUs with EPT where no unrestricted guest mode was available.
+
 =item B
 
 Specify that this domain is a driver domain. This enables certain
diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index a9984dbea5..998a8c57ba 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -26,6 +26,7 @@
 
 #define XCFLAGS_LIVE  (1 << 0)
 #define XCFLAGS_DEBUG (1 << 1)
+#define XCFLAGS_NOIDENTPT (1 << 2)
 
 #define X86_64_B_SIZE   64 
 #define X86_32_B_SIZE   32
diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c
index 2953aeb90b..827bea7eb7 100644
--- a/tools/libs/guest/xg_dom_x86.c
+++ b/tools/libs/guest/xg_dom_x86.c
@@ -718,20 +718,23 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom)
 goto out;
 }
 
-/*
- * Identity-map page table is required for running with CR0.PG=0 when
- * using Intel EPT. Create a 32-bit non-PAE page directory of superpages.
- */
-if ( (ident_pt = xc_map_foreign_range(
-  xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
-  special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
-goto error_out;
-for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
-ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
-   _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-munmap(ident_pt, PAGE_SIZE);
-xc_hvm_param_set(xch, domid, HVM_PARAM_IDENT_PT,
- special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
+if ( !(dom->flags & XCFLAGS_NOIDENTPT) )
+{
+/*
+ * Identity-map page table is required for running with CR0.PG=0 when
+ * using Intel EPT. Create a 32-bit non-PAE page directory of 
superpages.
+ */
+if ( (ident_pt = xc_map_foreign_range(
+  xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE,
+  special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
+goto error_out;
+for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
+ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
+   _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+munmap(ident_pt, PAGE_SIZE);
+xc_hvm_param_set(xch, domid, HVM_PARAM_IDENT_PT,
+ special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
+}
 
 dom->console_pfn = special_pfn(SPECIALPAGE_CONSOLE);
 xc_clear_domain_page(dom->xch, dom->guest_domid, dom->console_pfn);
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 321a13e519..62b06b359c 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -256,6 +256,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 
 libxl_defbool_setdefault(&b_info->disable_migrate, false);
 
+libxl_defbool_setdefault(&b_info->disable_identpt, false);
+
 for (i = 0 ; i < b_info->num_iomem; i++)
 if (b_info->iomem[i].gfn == LIBXL_INVALID_GFN)
 b_info->iomem[i].gfn = b_info->iomem[i].start;
diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
index 01d989a976..

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Thomas Gleixner

Arnd,

On Thu, Oct 29 2020 at 10:51, Arnd Bergmann wrote:
> On Thu, Oct 29, 2020 at 8:04 AM Paolo Bonzini  wrote:
>> On 28/10/20 22:20, Arnd Bergmann wrote:
>> > Avoid this by renaming the global 'apic' variable to the more descriptive
>> > 'x86_system_apic'. It was originally called 'genapic', but both that
>> > and the current 'apic' seem to be a little overly generic for a global
>> > variable.
>>
>> The 'apic' affects only the current CPU, so one of 'x86_local_apic',
>> 'x86_lapic' or 'x86_apic' is probably preferrable.
>
> Ok, I'll change it to x86_local_apic then, unless someone else has
> a preference between them.

x86_local_apic is fine with me.

> I think ideally there would be no global variable, withall accesses
> encapsulated in function calls, possibly using static_call() optimizations
> if any of them are performance critical.

There are a bunch, yes.

> It doesn't seem hard to do, but I'd rather leave that change to
> an x86 person ;-)

It's not hard, but I'm not really sure whether it buys much.

Can you please redo that against tip x86/apic. Much of what you are
touching got a major overhaul.

Thanks,

tglx

[xen-unstable-smoke test] 156297: tolerable all pass - PUSHED

2020-10-29 Thread osstest service owner

flight 156297 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156297/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  1fd1d4bafdf6f9f8fe5ca9b947f016a7aae92a74
baseline version:
 xen  16a20963b3209788f2c0d3a3eebb7d92f03f5883

Last test of basis   156260  2020-10-27 18:01:29 Z1 days
Testing same since   156297  2020-10-29 14:01:23 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   16a20963b3..1fd1d4bafd  1fd1d4bafdf6f9f8fe5ca9b947f016a7aae92a74 -> smoke

Re: [PATCH v1 4/4] xen/pci: solve compilation error when memory paging is not enabled.

2020-10-29 Thread Rahul Singh

Hello Jan,

> On 28 Oct 2020, at 3:13 pm, Rahul Singh  wrote:
> 
> Hello Jan,
> 
>> On 28 Oct 2020, at 11:56 am, Jan Beulich  wrote:
>> 
>> On 26.10.2020 18:17, Rahul Singh wrote:
>>> --- a/xen/drivers/passthrough/pci.c
>>> +++ b/xen/drivers/passthrough/pci.c
>>> @@ -1419,13 +1419,15 @@ static int assign_device(struct domain *d, u16 seg, 
>>> u8 bus, u8 devfn, u32 flag)
>>>if ( !is_iommu_enabled(d) )
>>>return 0;
>>> 
>>> -/* Prevent device assign if mem paging or mem sharing have been 
>>> +#if defined(CONFIG_HAS_MEM_PAGING) || defined(CONFIG_MEM_SHARING)
>>> +/* Prevent device assign if mem paging or mem sharing have been
>>> * enabled for this domain */
>>>if ( d != dom_io &&
>>> unlikely(mem_sharing_enabled(d) ||
>>>  vm_event_check_ring(d->vm_event_paging) ||
>>>  p2m_get_hostp2m(d)->global_logdirty) )
>>>return -EXDEV;
>>> +#endif
>> 
>> Besides this also disabling mem-sharing and log-dirty related
>> logic, I don't think the change is correct: Each item being
>> checked needs individually disabling depending on its associated
>> CONFIG_*. For this, perhaps you want to introduce something
>> like mem_paging_enabled(d), to avoid the need for #ifdef here?
>> 
> 
> Ok I will fix that in next version. 

I just check and found out that mem-sharing , men-paging and log-dirty is x86 
specific and not implemented for ARM.
Is that will be ok if I move above code to x86 specific directory and introduce 
new function arch_pcidev_is_assignable() that will test if pcidev is assignable 
or not ?

> 
>> Jan

Regards,
Rahul

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Paolo Bonzini

On 29/10/20 17:56, Arvind Sankar wrote:
>> For those two just add:
>>  struct apic *apic = x86_system_apic;
>> before all the assignments.
>> Less churn and much better code.
>>
> Why would it be better code?
> 

I think he means the compiler produces better code, because it won't
read the global variable repeatedly.  Not sure if that's true,(*) but I
think I do prefer that version if Arnd wants to do that tweak.

Paolo

(*) if it is, it might also be due to Linux using -fno-strict-aliasing

[PATCH 0/2] x86: PV shim vs grant table

2020-10-29 Thread Jan Beulich

While Andrew reported a randconfig build failure, I started wondering
why we'd ever build in a way different from what had failed to build.
Fix the build and then switch the shim config file accordingly.

1: fix build of PV shim when !GRANT_TABLE
2: PV shim doesn't need GRANT_TABLE

Jan

[PATCH 1/2] x86: fix build of PV shim when !GRANT_TABLE

2020-10-29 Thread Jan Beulich

To do its compat translation, shim code needs to include the compat
header. For this to work, this header first of all needs to be
generated.

Reported-by: Andrew Cooper 
Signed-off-by: Jan Beulich 

--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -24,6 +24,7 @@ headers-$(CONFIG_X86) += compat/arch
 headers-$(CONFIG_ARGO)+= compat/argo.h
 headers-$(CONFIG_PV)  += compat/callback.h
 headers-$(CONFIG_GRANT_TABLE) += compat/grant_table.h
+headers-$(CONFIG_PV_SHIM) += compat/grant_table.h
 headers-$(CONFIG_HVM) += compat/hvm/dm_op.h
 headers-$(CONFIG_HVM) += compat/hvm/hvm_op.h
 headers-$(CONFIG_HVM) += compat/hvm/hvm_vcpu.h

[PATCH 2/2] x86: PV shim doesn't need GRANT_TABLE

2020-10-29 Thread Jan Beulich

The only reference into the code controlled by this option is from the
hypercall table, and that hypercall table entry gets overwritten.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/configs/pvshim_defconfig
+++ b/xen/arch/x86/configs/pvshim_defconfig
@@ -9,6 +9,7 @@ CONFIG_EXPERT=y
 CONFIG_SCHED_NULL=y
 # Disable features not used by the PV shim
 # CONFIG_XEN_SHSTK is not set
+# CONFIG_GRANT_TABLE is not set
 # CONFIG_HYPFS is not set
 # CONFIG_BIGMEM is not set
 # CONFIG_KEXEC is not set

Re: [PATCH v1 4/4] xen/pci: solve compilation error when memory paging is not enabled.

2020-10-29 Thread Jan Beulich

On 29.10.2020 17:58, Rahul Singh wrote:
>> On 28 Oct 2020, at 3:13 pm, Rahul Singh  wrote:
>>> On 28 Oct 2020, at 11:56 am, Jan Beulich  wrote:
>>> On 26.10.2020 18:17, Rahul Singh wrote:
 --- a/xen/drivers/passthrough/pci.c
 +++ b/xen/drivers/passthrough/pci.c
 @@ -1419,13 +1419,15 @@ static int assign_device(struct domain *d, u16 
 seg, u8 bus, u8 devfn, u32 flag)
if ( !is_iommu_enabled(d) )
return 0;

 -/* Prevent device assign if mem paging or mem sharing have been 
 +#if defined(CONFIG_HAS_MEM_PAGING) || defined(CONFIG_MEM_SHARING)
 +/* Prevent device assign if mem paging or mem sharing have been
 * enabled for this domain */
if ( d != dom_io &&
 unlikely(mem_sharing_enabled(d) ||
  vm_event_check_ring(d->vm_event_paging) ||
  p2m_get_hostp2m(d)->global_logdirty) )
return -EXDEV;
 +#endif
>>>
>>> Besides this also disabling mem-sharing and log-dirty related
>>> logic, I don't think the change is correct: Each item being
>>> checked needs individually disabling depending on its associated
>>> CONFIG_*. For this, perhaps you want to introduce something
>>> like mem_paging_enabled(d), to avoid the need for #ifdef here?
>>>
>>
>> Ok I will fix that in next version. 
> 
> I just check and found out that mem-sharing , men-paging and log-dirty is x86 
> specific and not implemented for ARM.
> Is that will be ok if I move above code to x86 specific directory and 
> introduce new function arch_pcidev_is_assignable() that will test if pcidev 
> is assignable or not ?

As an immediate workaround - perhaps (long term the individual
pieces should still be individually checked here, as they're
not inherently x86-specific). Since there's no device property
involved here, the suggested name looks misleading. Perhaps
arch_iommu_usable(d)?

Jan

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Arvind Sankar

On Thu, Oct 29, 2020 at 03:05:31PM +, David Laight wrote:
> From: Arnd Bergmann
> > Sent: 28 October 2020 21:21
> > 
> > From: Arnd Bergmann 
> > 
> > There are hundreds of warnings in a W=2 build about a local
> > variable shadowing the global 'apic' definition:
> > 
> > arch/x86/kvm/lapic.h:149:65: warning: declaration of 'apic' shadows a 
> > global declaration [-Wshadow]
> > 
> > Avoid this by renaming the global 'apic' variable to the more descriptive
> > 'x86_system_apic'. It was originally called 'genapic', but both that
> > and the current 'apic' seem to be a little overly generic for a global
> > variable.
> > 
> > Fixes: c48f14966cc4 ("KVM: inline kvm_apic_present() and 
> > kvm_lapic_enabled()")
> > Fixes: c8d46cf06dc2 ("x86: rename 'genapic' to 'apic'")
> > Signed-off-by: Arnd Bergmann 
> > ---
> > v2: rename the global instead of the local variable in the header
> ...
> > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> > index 284e73661a18..33e2dc78ca11 100644
> > --- a/arch/x86/hyperv/hv_apic.c
> > +++ b/arch/x86/hyperv/hv_apic.c
> > @@ -259,14 +259,14 @@ void __init hv_apic_init(void)
> > /*
> >  * Set the IPI entry points.
> >  */
> > -   orig_apic = *apic;
> > -
> > -   apic->send_IPI = hv_send_ipi;
> > -   apic->send_IPI_mask = hv_send_ipi_mask;
> > -   apic->send_IPI_mask_allbutself = hv_send_ipi_mask_allbutself;
> > -   apic->send_IPI_allbutself = hv_send_ipi_allbutself;
> > -   apic->send_IPI_all = hv_send_ipi_all;
> > -   apic->send_IPI_self = hv_send_ipi_self;
> > +   orig_apic = *x86_system_apic;
> > +
> > +   x86_system_apic->send_IPI = hv_send_ipi;
> > +   x86_system_apic->send_IPI_mask = hv_send_ipi_mask;
> > +   x86_system_apic->send_IPI_mask_allbutself = 
> > hv_send_ipi_mask_allbutself;
> > +   x86_system_apic->send_IPI_allbutself = hv_send_ipi_allbutself;
> > +   x86_system_apic->send_IPI_all = hv_send_ipi_all;
> > +   x86_system_apic->send_IPI_self = hv_send_ipi_self;
> > }
> > 
> > if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
> > @@ -285,10 +285,10 @@ void __init hv_apic_init(void)
> >  */
> > apic_set_eoi_write(hv_apic_eoi_write);
> > if (!x2apic_enabled()) {
> > -   apic->read  = hv_apic_read;
> > -   apic->write = hv_apic_write;
> > -   apic->icr_write = hv_apic_icr_write;
> > -   apic->icr_read  = hv_apic_icr_read;
> > +   x86_system_apic->read  = hv_apic_read;
> > +   x86_system_apic->write = hv_apic_write;
> > +   x86_system_apic->icr_write = hv_apic_icr_write;
> > +   x86_system_apic->icr_read  = hv_apic_icr_read;
> > }
> 
> For those two just add:
>   struct apic *apic = x86_system_apic;
> before all the assignments.
> Less churn and much better code.
> 

Why would it be better code?

[PATCH v1 02/23] tools: add xc_is_known_page_type to libxenctrl

2020-10-29 Thread Olaf Hering

Users of xc_get_pfn_type_batch may want to sanity check the data
returned by Xen. Add a simple helper for this purpose.

Signed-off-by: Olaf Hering 
---
 tools/libs/ctrl/xc_private.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index 5d2c7274fb..afb08aafe1 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -421,6 +421,39 @@ void *xc_map_foreign_ranges(xc_interface *xch, uint32_t 
dom,
 int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom,
   unsigned int num, xen_pfn_t *);
 
+/* Sanitiy check for types returned by Xen */
+static inline bool xc_is_known_page_type(xen_pfn_t type)
+{
+bool ret;
+
+switch (type)
+{
+case XEN_DOMCTL_PFINFO_NOTAB:
+
+case XEN_DOMCTL_PFINFO_L1TAB:
+case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L2TAB:
+case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L3TAB:
+case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_L4TAB:
+case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
+
+case XEN_DOMCTL_PFINFO_XTAB:
+case XEN_DOMCTL_PFINFO_XALLOC:
+case XEN_DOMCTL_PFINFO_BROKEN:
+ret = true;
+break;
+default:
+ret = false;
+break;
+}
+return ret;
+}
+
 void bitmap_64_to_byte(uint8_t *bp, const uint64_t *lp, int nbits);
 void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);

[PATCH v1 05/23] tools: show migration transfer rate in send_dirty_pages

2020-10-29 Thread Olaf Hering

Show how fast domU pages are transferred in each iteration.

The relevant data is how fast the pfns travel, not so much how much
protocol overhead exists. So the reported MiB/sec is just for pfns.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 47 +
 2 files changed, 49 insertions(+)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 70e328e951..f3a7a29298 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -238,6 +238,8 @@ struct xc_sr_context
 bool debug;
 
 unsigned long p2m_size;
+size_t pages_sent;
+size_t overhead_sent;
 
 struct precopy_stats stats;
 
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 0546d3d9e6..f58a35ddde 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 #include "xg_sr_common.h"
 
@@ -238,6 +239,8 @@ static int write_batch(struct xc_sr_context *ctx)
 iov[3].iov_len = nr_pfns * sizeof(*rec_pfns);
 
 iovcnt = 4;
+ctx->save.pages_sent += nr_pages;
+ctx->save.overhead_sent += sizeof(rec) + sizeof(hdr) + nr_pfns * 
sizeof(*rec_pfns);
 
 if ( nr_pages )
 {
@@ -357,6 +360,43 @@ static int suspend_domain(struct xc_sr_context *ctx)
 return 0;
 }
 
+static void show_transfer_rate(struct xc_sr_context *ctx, struct timespec 
*start)
+{
+xc_interface *xch = ctx->xch;
+struct timespec end = {}, diff = {};
+size_t ms, MiB_sec = ctx->save.pages_sent * PAGE_SIZE;
+
+if (!MiB_sec)
+return;
+
+if ( clock_gettime(CLOCK_MONOTONIC, &end) )
+PERROR("clock_gettime");
+
+if ( (end.tv_nsec - start->tv_nsec) < 0 )
+{
+diff.tv_sec = end.tv_sec - start->tv_sec - 1;
+diff.tv_nsec = end.tv_nsec - start->tv_nsec + (1000U*1000U*1000U);
+}
+else
+{
+diff.tv_sec = end.tv_sec - start->tv_sec;
+diff.tv_nsec = end.tv_nsec - start->tv_nsec;
+}
+
+ms = (diff.tv_nsec / (1000U*1000U));
+if (!ms)
+ms = 1;
+ms += (diff.tv_sec * 1000U);
+
+MiB_sec *= 1000U;
+MiB_sec /= ms;
+MiB_sec /= 1024U*1024U;
+
+errno = 0;
+ERROR("%s: %zu bytes + %zu pages in %ld.%09ld sec, %zu MiB/sec", __func__,
+  ctx->save.overhead_sent, ctx->save.pages_sent, diff.tv_sec, 
diff.tv_nsec, MiB_sec);
+}
+
 /*
  * Send a subset of pages in the guests p2m, according to the dirty bitmap.
  * Used for each subsequent iteration of the live migration loop.
@@ -370,9 +410,15 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
 xen_pfn_t p;
 unsigned long written;
 int rc;
+struct timespec start = {};
 DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap,
 &ctx->save.dirty_bitmap_hbuf);
 
+ctx->save.pages_sent = 0;
+ctx->save.overhead_sent = 0;
+if ( clock_gettime(CLOCK_MONOTONIC, &start) )
+PERROR("clock_gettime");
+
 for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p )
 {
 if ( !test_bit(p, dirty_bitmap) )
@@ -396,6 +442,7 @@ static int send_dirty_pages(struct xc_sr_context *ctx,
 if ( written > entries )
 DPRINTF("Bitmap contained more entries than expected...");
 
+show_transfer_rate(ctx, &start);
 xc_report_progress_step(xch, entries, entries);
 
 return ctx->save.ops.check_vm_state(ctx);

[PATCH v1 15/23] tools/guest: restore: move pfns array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move pfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 6 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 2a020fef5c..516d9b9fb5 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -224,6 +224,8 @@ struct xc_sr_save_arrays {
 };
 
 struct xc_sr_restore_arrays {
+/* handle_page_data */
+xen_pfn_t pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 4a9ece9681..7d1447e86c 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -315,7 +315,7 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 unsigned int i, pages_of_data = 0;
 int rc = -1;
 
-xen_pfn_t *pfns = NULL, pfn;
+xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
 uint32_t *types = NULL, type;
 
 /*
@@ -363,9 +363,8 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 goto err;
 }
 
-pfns = malloc(pages->count * sizeof(*pfns));
 types = malloc(pages->count * sizeof(*types));
-if ( !pfns || !types )
+if ( !types )
 {
 ERROR("Unable to allocate enough memory for %u pfns",
   pages->count);
@@ -412,7 +411,6 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
&pages->pfn[pages->count]);
  err:
 free(types);
-free(pfns);
 
 return rc;
 }

[PATCH v1 14/23] tools/guest: save: move local_pages array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move local_pages array into preallocated space.

Adjust the code to use the src page as is in case of HVM.
In case of PV the page may need to be normalised, use an private memory
area for this purpose.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h   | 22 ++-
 tools/libs/guest/xg_sr_save.c | 25 +++--
 tools/libs/guest/xg_sr_save_x86_hvm.c |  5 +++--
 tools/libs/guest/xg_sr_save_x86_pv.c  | 31 ++-
 4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 33e66678c6..2a020fef5c 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -33,16 +33,12 @@ struct xc_sr_save_ops
  * Optionally transform the contents of a page from being specific to the
  * sending environment, to being generic for the stream.
  *
- * The page of data at the end of 'page' may be a read-only mapping of a
- * running guest; it must not be modified.  If no transformation is
- * required, the callee should leave '*pages' untouched.
+ * The page of data '*src' may be a read-only mapping of a running guest;
+ * it must not be modified. If no transformation is required, the callee
+ * should leave '*src' untouched, and return it via '**ptr'.
  *
- * If a transformation is required, the callee should allocate themselves
- * a local page using malloc() and return it via '*page'.
- *
- * The caller shall free() '*page' in all cases.  In the case that the
- * callee encounters an error, it should *NOT* free() the memory it
- * allocated for '*page'.
+ * If a transformation is required, the callee should provide the
+ * transformed page in a private buffer and return it via '**ptr'.
  *
  * It is valid to fail with EAGAIN if the transformation is not able to be
  * completed at this point.  The page shall be retried later.
@@ -50,7 +46,7 @@ struct xc_sr_save_ops
  * @returns 0 for success, -1 for failure, with errno appropriately set.
  */
 int (*normalise_page)(struct xc_sr_context *ctx, xen_pfn_t type,
-  void **page);
+  void *src, unsigned int idx, void **ptr);
 
 /**
  * Set up local environment to save a domain. (Typically querying
@@ -371,6 +367,12 @@ struct xc_sr_context
 
 union
 {
+struct
+{
+/* Used by write_batch for modified pages. */
+void *normalised_pages;
+} save;
+
 struct
 {
 /* State machine for the order of received records. */
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 658f834ae8..804e4ccb3a 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -91,11 +91,10 @@ static int write_batch(struct xc_sr_context *ctx)
 xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
 void **guest_data = ctx->save.m->guest_data;
-void **local_pages = NULL;
 int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
-void *page, *orig_page;
+void *src;
 uint64_t *rec_pfns = ctx->save.m->rec_pfns;
 struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
@@ -105,16 +104,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Pointers to locally allocated pages.  Need freeing. */
-local_pages = calloc(nr_pfns, sizeof(*local_pages));
-
-if ( !local_pages )
-{
-ERROR("Unable to allocate arrays for a batch of %u pages",
-  nr_pfns);
-goto err;
-}
-
 for ( i = 0; i < nr_pfns; ++i )
 {
 types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
@@ -176,11 +165,8 @@ static int write_batch(struct xc_sr_context *ctx)
 goto err;
 }
 
-orig_page = page = guest_mapping + (p * PAGE_SIZE);
-rc = ctx->save.ops.normalise_page(ctx, types[i], &page);
-
-if ( orig_page != page )
-local_pages[i] = page;
+src = guest_mapping + (p * PAGE_SIZE);
+rc = ctx->save.ops.normalise_page(ctx, types[i], src, i, 
&guest_data[i]);
 
 if ( rc )
 {
@@ -195,8 +181,6 @@ static int write_batch(struct xc_sr_context *ctx)
 else
 goto err;
 }
-else
-guest_data[i] = page;
 
 rc = -1;
 ++p;
@@ -255,9 +239,6 @@ static int write_batch(struct xc_sr_context *ctx)
  err:
 if ( guest_mapping )
 xenforeignmemory_un

[PATCH v1 12/23] tools/guest: save: move rec_pfns array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move rec_pfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 11 +--
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f315b4f526..81158a4f4d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -221,6 +221,8 @@ struct xc_sr_save_arrays {
 int errors[MAX_BATCH_SIZE];
 /* write_batch: iovec[] for writev(). */
 struct iovec iov[MAX_BATCH_SIZE + 4];
+/* write_batch */
+uint64_t rec_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 385a591332..4d7fbe09ed 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -96,7 +96,7 @@ static int write_batch(struct xc_sr_context *ctx)
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
-uint64_t *rec_pfns = NULL;
+uint64_t *rec_pfns = ctx->save.m->rec_pfns;
 struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
 struct xc_sr_record rec = {
@@ -201,14 +201,6 @@ static int write_batch(struct xc_sr_context *ctx)
 }
 }
 
-rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns));
-if ( !rec_pfns )
-{
-ERROR("Unable to allocate %zu bytes of memory for page data pfn list",
-  nr_pfns * sizeof(*rec_pfns));
-goto err;
-}
-
 hdr.count = nr_pfns;
 
 rec.length = sizeof(hdr);
@@ -259,7 +251,6 @@ static int write_batch(struct xc_sr_context *ctx)
 rc = ctx->save.nr_batch_pfns = 0;
 
  err:
-free(rec_pfns);
 if ( guest_mapping )
 xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
 for ( i = 0; local_pages && i < nr_pfns; ++i )

[PATCH v1 22/23] tools/guest: restore: split handle_page_data

2020-10-29 Thread Olaf Hering

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unneccesary memcpy calls for data that can be consumed verbatim.

Split the various steps of record processing:
- move processing to handle_buffered_page_data
- adjust xenforeignmemory_map to set errno in case of failure
- adjust verify mode to set errno in case of failure

This change is preparation for future changes in handle_page_data,
no change in behavior is intended.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |   9 +
 tools/libs/guest/xg_sr_restore.c | 343 ---
 2 files changed, 231 insertions(+), 121 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 66d1b0dfe6..7ec8867b88 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -230,9 +230,14 @@ struct xc_sr_restore_arrays {
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
+void *guest_data[MAX_BATCH_SIZE];
+
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
 xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
+
+/* Must be the last member */
+struct xc_sr_rec_page_data_header pages;
 };
 
 struct xc_sr_context
@@ -323,7 +328,11 @@ struct xc_sr_context
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
+void *verify_buf;
+
 struct xc_sr_restore_arrays *m;
+void *guest_mapping;
+uint32_t nr_mapped_pages;
 } restore;
 };
 
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 93f69b9ba8..060f3d1f4e 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -186,123 +186,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned 
int count,
 return rc;
 }
 
-/*
- * Given a list of pfns, their types, and a block of page data from the
- * stream, populate and record their types, map the relevant subset and copy
- * the data into the guest.
- */
-static int process_page_data(struct xc_sr_context *ctx, unsigned int count,
- xen_pfn_t *pfns, uint32_t *types, void *page_data)
+static int handle_static_data_end_v2(struct xc_sr_context *ctx)
 {
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = ctx->restore.m->mfns;
-int *map_errs = ctx->restore.m->map_errs;
-int rc;
-void *mapping = NULL, *guest_page = NULL;
-unsigned int i, /* i indexes the pfns from the record. */
-j,  /* j indexes the subset of pfns we decide to map. */
-nr_pages = 0;
-
-rc = populate_pfns(ctx, count, pfns, types);
-if ( rc )
-{
-ERROR("Failed to populate pfns for batch of %u pages", count);
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
-
-if ( page_type_has_stream_data(types[i]) == true )
-mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-}
-
-/* Nothing to do? */
-if ( nr_pages == 0 )
-goto done;
-
-mapping = guest_page = xenforeignmemory_map(
-xch->fmem, ctx->domid, PROT_READ | PROT_WRITE,
-nr_pages, mfns, map_errs);
-if ( !mapping )
-{
-rc = -1;
-PERROR("Unable to map %u mfns for %u pages of data",
-   nr_pages, count);
-goto err;
-}
-
-for ( i = 0, j = 0; i < count; ++i )
-{
-if ( page_type_has_stream_data(types[i]) == false )
-continue;
-
-if ( map_errs[j] )
-{
-rc = -1;
-ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") 
failed with %d",
-  pfns[i], mfns[j], types[i], map_errs[j]);
-goto err;
-}
-
-/* Undo page normalisation done by the saver. */
-rc = ctx->restore.ops.localise_page(ctx, types[i], page_data);
-if ( rc )
-{
-ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
-  pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-goto err;
-}
-
-if ( ctx->restore.verify )
-{
-/* Verify mode - compare incoming data to what we already have. */
-if ( memcmp(guest_page, page_data, PAGE_SIZE) )
-ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
-  pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
-}
-else
-{
-/* Regular mode - copy incoming data into place. */
-memcpy(guest_page, page_data, PAGE_SIZE);
-}
-
-++j;
-guest_page += PAGE_SIZE;
-page_data += PAGE_SIZE;
-}
-
- done:
-rc = 0;
-
- err:
-if ( mapping )
-xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
-
-return rc;
-}
+int rc = 0;
 
-/*
- * Validate a PAGE_DATA record from the stream, and p

[PATCH v1 04/23] tools: unify type checking for data pfns in migration stream

2020-10-29 Thread Olaf Hering

Introduce a helper which decides if a given pfn type has data
for the migration stream.

No change in behavior intended.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 17 
 tools/libs/guest/xg_sr_restore.c | 34 +---
 tools/libs/guest/xg_sr_save.c| 14 ++---
 3 files changed, 24 insertions(+), 41 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index cc3ad1c394..70e328e951 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -455,6 +455,23 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 /* Handle a STATIC_DATA_END record. */
 int handle_static_data_end(struct xc_sr_context *ctx);
 
+static inline bool page_type_has_stream_data(uint32_t type)
+{
+bool ret;
+
+switch (type)
+{
+case XEN_DOMCTL_PFINFO_XTAB:
+case XEN_DOMCTL_PFINFO_XALLOC:
+case XEN_DOMCTL_PFINFO_BROKEN:
+ret = false;
+break;
+default:
+ret = true;
+break;
+}
+return ret;
+}
 #endif
 /*
  * Local variables:
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index f1c3169229..0332ae9f32 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -152,9 +152,8 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 
 for ( i = 0; i < count; ++i )
 {
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
+if ( (!types ||
+  (types && page_type_has_stream_data(types[i]) == true)) &&
  !pfn_is_populated(ctx, original_pfns[i]) )
 {
 rc = pfn_set_populated(ctx, original_pfns[i]);
@@ -233,25 +232,8 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 {
 ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]);
 
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_NOTAB:
-
-case XEN_DOMCTL_PFINFO_L1TAB:
-case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L2TAB:
-case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L3TAB:
-case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
-case XEN_DOMCTL_PFINFO_L4TAB:
-case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB:
-
+if ( page_type_has_stream_data(types[i]) == true )
 mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]);
-break;
-}
 }
 
 /* Nothing to do? */
@@ -271,14 +253,8 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 
 for ( i = 0, j = 0; i < count; ++i )
 {
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_XTAB:
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-/* No page data to deal with. */
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 if ( map_errs[j] )
 {
@@ -413,7 +389,7 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 goto err;
 }
 
-if ( type < XEN_DOMCTL_PFINFO_BROKEN )
+if ( page_type_has_stream_data(type) == true )
 /* NOTAB and all L1 through L4 tables (including pinned) should
  * have a page worth of data in the record. */
 pages_of_data++;
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 044d0ae3aa..0546d3d9e6 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -153,13 +153,8 @@ static int write_batch(struct xc_sr_context *ctx)
 goto err;
 }
 
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-case XEN_DOMCTL_PFINFO_XTAB:
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 mfns[nr_pages++] = mfns[i];
 }
@@ -177,13 +172,8 @@ static int write_batch(struct xc_sr_context *ctx)
 
 for ( i = 0, p = 0; i < nr_pfns; ++i )
 {
-switch ( types[i] )
-{
-case XEN_DOMCTL_PFINFO_BROKEN:
-case XEN_DOMCTL_PFINFO_XALLOC:
-case XEN_DOMCTL_PFINFO_XTAB:
+if ( page_type_has_stream_data(types[i]) == false )
 continue;
-}
 
 if ( errors[p] )
 {

[PATCH v1 10/23] tools/guest: save: move errors array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move errors array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 3cbadb607b..71b676c0e0 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -217,6 +217,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 /* write_batch: Types of the batch pfns. */
 xen_pfn_t types[MAX_BATCH_SIZE];
+/* write_batch: Errors from attempting to map the gfns. */
+int errors[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 6678df95b8..cdc27a9cde 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -92,7 +92,7 @@ static int write_batch(struct xc_sr_context *ctx)
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
-int *errors = NULL, rc = -1;
+int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Errors from attempting to map the gfns. */
-errors = malloc(nr_pfns * sizeof(*errors));
 /* Pointers to page data to send.  Mapped gfns or local allocations. */
 guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
@@ -114,7 +112,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !errors || !guest_data || !local_pages || !iov )
+if ( !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -271,7 +269,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(iov);
 free(local_pages);
 free(guest_data);
-free(errors);
 
 return rc;
 }

[PATCH v1 18/23] tools/guest: restore: move map_errs array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move map_errs array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 12 +---
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 5731a5c186..eba3a49877 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -229,6 +229,7 @@ struct xc_sr_restore_arrays {
 uint32_t types[MAX_BATCH_SIZE];
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
+int map_errs[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 3ba089f862..94c329032f 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -206,21 +206,13 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 {
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->restore.m->mfns;
-int *map_errs = malloc(count * sizeof(*map_errs));
+int *map_errs = ctx->restore.m->map_errs;
 int rc;
 void *mapping = NULL, *guest_page = NULL;
 unsigned int i, /* i indexes the pfns from the record. */
 j,  /* j indexes the subset of pfns we decide to map. */
 nr_pages = 0;
 
-if ( !map_errs )
-{
-rc = -1;
-ERROR("Failed to allocate %zu bytes to process page data",
-  count * (sizeof(*mfns) + sizeof(*map_errs)));
-goto err;
-}
-
 rc = populate_pfns(ctx, count, pfns, types);
 if ( rc )
 {
@@ -298,8 +290,6 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 if ( mapping )
 xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
-free(map_errs);
-
 return rc;
 }

[PATCH v1 11/23] tools/guest: save: move iov array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move iov array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 71b676c0e0..f315b4f526 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -219,6 +219,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t types[MAX_BATCH_SIZE];
 /* write_batch: Errors from attempting to map the gfns. */
 int errors[MAX_BATCH_SIZE];
+/* write_batch: iovec[] for writev(). */
+struct iovec iov[MAX_BATCH_SIZE + 4];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index cdc27a9cde..385a591332 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -97,7 +97,7 @@ static int write_batch(struct xc_sr_context *ctx)
 unsigned int nr_pfns = ctx->save.nr_batch_pfns;
 void *page, *orig_page;
 uint64_t *rec_pfns = NULL;
-struct iovec *iov = NULL; int iovcnt = 0;
+struct iovec *iov = ctx->save.m->iov; int iovcnt = 0;
 struct xc_sr_rec_page_data_header hdr = { 0 };
 struct xc_sr_record rec = {
 .type = REC_TYPE_PAGE_DATA,
@@ -109,10 +109,8 @@ static int write_batch(struct xc_sr_context *ctx)
 guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
 local_pages = calloc(nr_pfns, sizeof(*local_pages));
-/* iovec[] for writev(). */
-iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !guest_data || !local_pages || !iov )
+if ( !guest_data || !local_pages )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -266,7 +264,6 @@ static int write_batch(struct xc_sr_context *ctx)
 xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped);
 for ( i = 0; local_pages && i < nr_pfns; ++i )
 free(local_pages[i]);
-free(iov);
 free(local_pages);
 free(guest_data);

[PATCH v1 20/23] tools/guest: restore: move pfns array in populate_pfns

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move populate_pfns' pfns array into 
preallocated space.
Use some prefix to avoid conflict with an array used in handle_page_data.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 11 +--
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 96a77b5969..3fe665b91d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -232,6 +232,7 @@ struct xc_sr_restore_arrays {
 int map_errs[MAX_BATCH_SIZE];
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
+xen_pfn_t pp_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 85a32aaed2..71b39612ee 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -139,17 +139,10 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 {
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
-*pfns = malloc(count * sizeof(*pfns));
+*pfns = ctx->restore.m->pp_pfns;
 unsigned int i, nr_pfns = 0;
 int rc = -1;
 
-if ( !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
 for ( i = 0; i < count; ++i )
 {
 if ( (!types ||
@@ -190,8 +183,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 rc = 0;
 
  err:
-free(pfns);
-
 return rc;
 }

[PATCH v1 01/23] tools: add readv_exact to libxenctrl

2020-10-29 Thread Olaf Hering

Read a batch of iovec's.

In the common case of short reads, finish individual iov's with read_exact.

Signed-off-by: Olaf Hering 
---
 tools/libs/ctrl/xc_private.c | 54 +++-
 tools/libs/ctrl/xc_private.h |  1 +
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c
index d94f846686..a86ed213a5 100644
--- a/tools/libs/ctrl/xc_private.c
+++ b/tools/libs/ctrl/xc_private.c
@@ -659,8 +659,23 @@ int write_exact(int fd, const void *data, size_t size)
 
 #if defined(__MINIOS__)
 /*
- * MiniOS's libc doesn't know about writev(). Implement it as multiple 
write()s.
+ * MiniOS's libc doesn't know about readv/writev().
+ * Implement it as multiple read/write()s.
  */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+int rc, i;
+
+for ( i = 0; i < iovcnt; ++i )
+{
+rc = read_exact(fd, iov[i].iov_base, iov[i].iov_len);
+if ( rc )
+return rc;
+}
+
+return 0;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
 int rc, i;
@@ -675,6 +689,44 @@ int writev_exact(int fd, const struct iovec *iov, int 
iovcnt)
 return 0;
 }
 #else
+int readv_exact(int fd, const struct iovec *iov, int iovcnt)
+{
+int rc = 0, idx = 0;
+ssize_t len;
+
+while ( idx < iovcnt )
+{
+len = readv(fd, &iov[idx], min(iovcnt - idx, IOV_MAX));
+if ( len == -1 && errno == EINTR )
+continue;
+if ( len <= 0 )
+{
+rc = -1;
+goto out;
+}
+while ( len > 0 && idx < iovcnt )
+{
+if ( len >= iov[idx].iov_len )
+{
+len -= iov[idx].iov_len;
+}
+else
+{
+void *p = iov[idx].iov_base + len;
+size_t l = iov[idx].iov_len - len;
+
+rc = read_exact(fd, p, l);
+if ( rc )
+goto out;
+len = 0;
+}
+idx++;
+}
+}
+out:
+return rc;
+}
+
 int writev_exact(int fd, const struct iovec *iov, int iovcnt)
 {
 struct iovec *local_iov = NULL;
diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h
index f0b5f83ac8..5d2c7274fb 100644
--- a/tools/libs/ctrl/xc_private.h
+++ b/tools/libs/ctrl/xc_private.h
@@ -441,6 +441,7 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu 
*mmu);
 
 /* Return 0 on success; -1 on error setting errno. */
 int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */
+int readv_exact(int fd, const struct iovec *iov, int iovcnt);
 int write_exact(int fd, const void *data, size_t size);
 int writev_exact(int fd, const struct iovec *iov, int iovcnt);

[PATCH v1 13/23] tools/guest: save: move guest_data array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move guest_data array into preallocated space.

Because this was allocated with calloc:
Adjust the loop to clear unused entries as needed.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 ++
 tools/libs/guest/xg_sr_save.c   | 11 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 81158a4f4d..33e66678c6 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -223,6 +223,8 @@ struct xc_sr_save_arrays {
 struct iovec iov[MAX_BATCH_SIZE + 4];
 /* write_batch */
 uint64_t rec_pfns[MAX_BATCH_SIZE];
+/* write_batch: Pointers to page data to send. Mapped gfns or local 
allocations. */
+void *guest_data[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 4d7fbe09ed..658f834ae8 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -90,7 +90,7 @@ static int write_batch(struct xc_sr_context *ctx)
 xc_interface *xch = ctx->xch;
 xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
-void **guest_data = NULL;
+void **guest_data = ctx->save.m->guest_data;
 void **local_pages = NULL;
 int *errors = ctx->save.m->errors, rc = -1;
 unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0;
@@ -105,12 +105,10 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Pointers to page data to send.  Mapped gfns or local allocations. */
-guest_data = calloc(nr_pfns, sizeof(*guest_data));
 /* Pointers to locally allocated pages.  Need freeing. */
 local_pages = calloc(nr_pfns, sizeof(*local_pages));
 
-if ( !guest_data || !local_pages )
+if ( !local_pages )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -166,7 +164,10 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0, p = 0; i < nr_pfns; ++i )
 {
 if ( page_type_has_stream_data(types[i]) == false )
+{
+guest_data[i] = NULL;
 continue;
+}
 
 if ( errors[p] )
 {
@@ -183,6 +184,7 @@ static int write_batch(struct xc_sr_context *ctx)
 
 if ( rc )
 {
+guest_data[i] = NULL;
 if ( rc == -1 && errno == EAGAIN )
 {
 set_bit(ctx->save.m->batch_pfns[i], 
ctx->save.deferred_pages);
@@ -256,7 +258,6 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0; local_pages && i < nr_pfns; ++i )
 free(local_pages[i]);
 free(local_pages);
-free(guest_data);
 
 return rc;
 }

[PATCH v1 17/23] tools/guest: restore: move mfns array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 0ceecb289d..5731a5c186 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -227,6 +227,8 @@ struct xc_sr_restore_arrays {
 /* handle_page_data */
 xen_pfn_t pfns[MAX_BATCH_SIZE];
 uint32_t types[MAX_BATCH_SIZE];
+/* process_page_data */
+xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 7729071e41..3ba089f862 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -205,7 +205,7 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
  xen_pfn_t *pfns, uint32_t *types, void *page_data)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns));
+xen_pfn_t *mfns = ctx->restore.m->mfns;
 int *map_errs = malloc(count * sizeof(*map_errs));
 int rc;
 void *mapping = NULL, *guest_page = NULL;
@@ -213,7 +213,7 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 j,  /* j indexes the subset of pfns we decide to map. */
 nr_pages = 0;
 
-if ( !mfns || !map_errs )
+if ( !map_errs )
 {
 rc = -1;
 ERROR("Failed to allocate %zu bytes to process page data",
@@ -299,7 +299,6 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned int count,
 xenforeignmemory_unmap(xch->fmem, mapping, nr_pages);
 
 free(map_errs);
-free(mfns);
 
 return rc;
 }

[PATCH v1 07/23] tools/guest: save: move batch_pfns

2020-10-29 Thread Olaf Hering

The batch_pfns array is already allocated in advance.
Move it into the preallocated area.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h |  2 +-
 tools/libs/guest/xg_sr_save.c   | 25 +++--
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 62bc87b5f4..c78a07b8f8 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -212,6 +212,7 @@ static inline int update_blob(struct xc_sr_blob *blob,
 }
 
 struct xc_sr_save_arrays {
+xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
@@ -249,7 +250,6 @@ struct xc_sr_context
 
 struct precopy_stats stats;
 
-xen_pfn_t *batch_pfns;
 unsigned int nr_batch_pfns;
 unsigned long *deferred_pages;
 unsigned long nr_deferred_pages;
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 1e3c8eff2f..597e638c59 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -77,7 +77,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 
 /*
  * Writes a batch of memory as a PAGE_DATA record into the stream.  The batch
- * is constructed in ctx->save.batch_pfns.
+ * is constructed in ctx->save.m->batch_pfns.
  *
  * This function:
  * - gets the types for each pfn in the batch.
@@ -128,12 +128,12 @@ static int write_batch(struct xc_sr_context *ctx)
 for ( i = 0; i < nr_pfns; ++i )
 {
 types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx,
-  ctx->save.batch_pfns[i]);
+  
ctx->save.m->batch_pfns[i]);
 
 /* Likely a ballooned page. */
 if ( mfns[i] == INVALID_MFN )
 {
-set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages);
 ++ctx->save.nr_deferred_pages;
 }
 }
@@ -179,7 +179,7 @@ static int write_batch(struct xc_sr_context *ctx)
 if ( errors[p] )
 {
 ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d",
-  ctx->save.batch_pfns[i], mfns[p], errors[p]);
+  ctx->save.m->batch_pfns[i], mfns[p], errors[p]);
 goto err;
 }
 
@@ -193,7 +193,7 @@ static int write_batch(struct xc_sr_context *ctx)
 {
 if ( rc == -1 && errno == EAGAIN )
 {
-set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages);
+set_bit(ctx->save.m->batch_pfns[i], 
ctx->save.deferred_pages);
 ++ctx->save.nr_deferred_pages;
 types[i] = XEN_DOMCTL_PFINFO_XTAB;
 --nr_pages;
@@ -224,7 +224,7 @@ static int write_batch(struct xc_sr_context *ctx)
 rec.length += nr_pages * PAGE_SIZE;
 
 for ( i = 0; i < nr_pfns; ++i )
-rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i];
+rec_pfns[i] = ((uint64_t)(types[i]) << 32) | 
ctx->save.m->batch_pfns[i];
 
 iov[0].iov_base = &rec.type;
 iov[0].iov_len = sizeof(rec.type);
@@ -296,9 +296,9 @@ static int flush_batch(struct xc_sr_context *ctx)
 
 if ( !rc )
 {
-VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.batch_pfns,
+VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.m->batch_pfns,
 MAX_BATCH_SIZE *
-sizeof(*ctx->save.batch_pfns));
+sizeof(*ctx->save.m->batch_pfns));
 }
 
 return rc;
@@ -315,7 +315,7 @@ static int add_to_batch(struct xc_sr_context *ctx, 
xen_pfn_t pfn)
 rc = flush_batch(ctx);
 
 if ( rc == 0 )
-ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
+ctx->save.m->batch_pfns[ctx->save.nr_batch_pfns++] = pfn;
 
 return rc;
 }
@@ -850,14 +850,12 @@ static int setup(struct xc_sr_context *ctx)
 
 dirty_bitmap = xc_hypercall_buffer_alloc_pages(
 xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size)));
-ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
-  sizeof(*ctx->save.batch_pfns));
 ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
 ctx->save.m = malloc(sizeof(*ctx->save.m));
 
-if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || 
!ctx->save.deferred_pages )
+if ( !ctx->save.m || !dirty_bitmap || !ctx->save.deferred_pages )
 {
-ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and"
+ERROR("Unable to allocate memory for dirty bitmaps and"
   " deferred pages");
 rc = -1;
 errno = ENOMEM;
@@ -886,7 +884,6 @@ static void cleanup(struct xc_sr_context *ctx)
 xc_hypercall_buffer_free_pages(xch, dirty_bitmap,

[PATCH v1 16/23] tools/guest: restore: move types array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |  1 +
 tools/libs/guest/xg_sr_restore.c | 12 +---
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 516d9b9fb5..0ceecb289d 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -226,6 +226,7 @@ struct xc_sr_save_arrays {
 struct xc_sr_restore_arrays {
 /* handle_page_data */
 xen_pfn_t pfns[MAX_BATCH_SIZE];
+uint32_t types[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 7d1447e86c..7729071e41 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -316,7 +316,7 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 int rc = -1;
 
 xen_pfn_t *pfns = ctx->restore.m->pfns, pfn;
-uint32_t *types = NULL, type;
+uint32_t *types = ctx->restore.m->types, type;
 
 /*
  * v2 compatibility only exists for x86 streams.  This is a bit of a
@@ -363,14 +363,6 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 goto err;
 }
 
-types = malloc(pages->count * sizeof(*types));
-if ( !types )
-{
-ERROR("Unable to allocate enough memory for %u pfns",
-  pages->count);
-goto err;
-}
-
 for ( i = 0; i < pages->count; ++i )
 {
 pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK;
@@ -410,8 +402,6 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 rc = process_page_data(ctx, pages->count, pfns, types,
&pages->pfn[pages->count]);
  err:
-free(types);
-
 return rc;
 }

[PATCH v1 08/23] tools/guest: save: move mfns array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move mfns array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index c78a07b8f8..0c2bef8f78 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -213,6 +213,8 @@ static inline int update_blob(struct xc_sr_blob *blob,
 
 struct xc_sr_save_arrays {
 xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
+/* write_batch: Mfns of the batch pfns. */
+xen_pfn_t mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 597e638c59..cdab288c3f 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = NULL, *types = NULL;
+xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Mfns of the batch pfns. */
-mfns = malloc(nr_pfns * sizeof(*mfns));
 /* Types of the batch pfns. */
 types = malloc(nr_pfns * sizeof(*types));
 /* Errors from attempting to map the gfns. */
@@ -118,7 +116,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov )
+if ( !types || !errors || !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -277,7 +275,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(guest_data);
 free(errors);
 free(types);
-free(mfns);
 
 return rc;
 }

[PATCH v1 03/23] tools: use xc_is_known_page_type

2020-10-29 Thread Olaf Hering

Verify pfn type on sending side, also verify incoming batch of pfns.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_restore.c | 3 +--
 tools/libs/guest/xg_sr_save.c| 6 ++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index b57a787519..f1c3169229 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -406,8 +406,7 @@ static int handle_page_data(struct xc_sr_context *ctx, 
struct xc_sr_record *rec)
 }
 
 type = (pages->pfn[i] & PAGE_DATA_TYPE_MASK) >> 32;
-if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) &&
- ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) )
+if ( xc_is_known_page_type(type) == false )
 {
 ERROR("Invalid type %#"PRIx32" for pfn %#"PRIpfn" (index %u)",
   type, pfn, i);
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index 2ba7c3200c..044d0ae3aa 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -147,6 +147,12 @@ static int write_batch(struct xc_sr_context *ctx)
 
 for ( i = 0; i < nr_pfns; ++i )
 {
+if ( xc_is_known_page_type(types[i]) == false )
+{
+ERROR("Wrong type %#"PRIpfn" for pfn %#"PRIpfn, types[i], mfns[i]);
+goto err;
+}
+
 switch ( types[i] )
 {
 case XEN_DOMCTL_PFINFO_BROKEN:

[PATCH v1 06/23] tools/guest: prepare to allocate arrays once

2020-10-29 Thread Olaf Hering

The hotpath 'send_dirty_pages' is supposed to do just one thing: sending.
The other end 'handle_page_data' is supposed to do just receiving.

But instead both do other costly work like memory allocations and data moving.
Do the allocations once, the array sizes are a compiletime constant.
Avoid unneeded copying of data by receiving data directly into mapped guest 
memory.

This patch is just prepartion, subsequent changes will populate the arrays.

Once all changes are applied, migration of a busy HVM domU changes like that:

Without this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 
xen_testing):
2020-10-29 10:23:10.711+: xc: show_transfer_rate: 23663128 bytes + 2879563 
pages in 55.324905335 sec, 203 MiB/sec: Internal error
2020-10-29 10:23:35.115+: xc: show_transfer_rate: 16829632 bytes + 2097552 
pages in 24.401179720 sec, 335 MiB/sec: Internal error
2020-10-29 10:23:59.436+: xc: show_transfer_rate: 16829032 bytes + 2097478 
pages in 24.319025928 sec, 336 MiB/sec: Internal error
2020-10-29 10:24:23.844+: xc: show_transfer_rate: 16829024 bytes + 2097477 
pages in 24.406992500 sec, 335 MiB/sec: Internal error
2020-10-29 10:24:48.292+: xc: show_transfer_rate: 16828912 bytes + 2097463 
pages in 24.446489027 sec, 335 MiB/sec: Internal error
2020-10-29 10:25:01.816+: xc: show_transfer_rate: 16836080 bytes + 2098356 
pages in 13.447091818 sec, 609 MiB/sec: Internal error

With this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 
xen_unstable):
2020-10-28 21:26:05.074+: xc: show_transfer_rate: 23663128 bytes + 2879563 
pages in 52.564054368 sec, 213 MiB/sec: Internal error
2020-10-28 21:26:23.527+: xc: show_transfer_rate: 16830040 bytes + 2097603 
pages in 18.450592015 sec, 444 MiB/sec: Internal error
2020-10-28 21:26:41.926+: xc: show_transfer_rate: 16830944 bytes + 2097717 
pages in 18.397862306 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:00.339+: xc: show_transfer_rate: 16829176 bytes + 2097498 
pages in 18.411973339 sec, 445 MiB/sec: Internal error
2020-10-28 21:27:18.643+: xc: show_transfer_rate: 16828592 bytes + 2097425 
pages in 18.303326695 sec, 447 MiB/sec: Internal error
2020-10-28 21:27:26.289+: xc: show_transfer_rate: 16835952 bytes + 2098342 
pages in 7.579846749 sec, 1081 MiB/sec: Internal error

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 8 
 tools/libs/guest/xg_sr_restore.c | 8 
 tools/libs/guest/xg_sr_save.c| 4 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index f3a7a29298..62bc87b5f4 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -211,6 +211,12 @@ static inline int update_blob(struct xc_sr_blob *blob,
 return 0;
 }
 
+struct xc_sr_save_arrays {
+};
+
+struct xc_sr_restore_arrays {
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -248,6 +254,7 @@ struct xc_sr_context
 unsigned long *deferred_pages;
 unsigned long nr_deferred_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
+struct xc_sr_save_arrays *m;
 } save;
 
 struct /* Restore data. */
@@ -299,6 +306,7 @@ struct xc_sr_context
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
+struct xc_sr_restore_arrays *m;
 } restore;
 };
 
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 0332ae9f32..4a9ece9681 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -739,6 +739,13 @@ static int setup(struct xc_sr_context *ctx)
 }
 ctx->restore.allocated_rec_num = DEFAULT_BUF_RECORDS;
 
+ctx->restore.m = malloc(sizeof(*ctx->restore.m));
+if ( !ctx->restore.m ) {
+ERROR("Unable to allocate memory for arrays");
+rc = -1;
+goto err;
+}
+
  err:
 return rc;
 }
@@ -757,6 +764,7 @@ static void cleanup(struct xc_sr_context *ctx)
 xc_hypercall_buffer_free_pages(
 xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size)));
 
+free(ctx->restore.m);
 free(ctx->restore.buffered_records);
 free(ctx->restore.populated_pfns);
 
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index f58a35ddde..1e3c8eff2f 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -853,8 +853,9 @@ static int setup(struct xc_sr_context *ctx)
 ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE *
   sizeof(*ctx->save.batch_pfns));
 ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size);
+ctx->save.m = malloc(sizeof(*ctx->save.m));
 
-if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages )
+if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || 
!ctx->save.deferred_pages )
 {
 ERROR("Unable to allocate m

[PATCH v1 09/23] tools/guest: save: move types array

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move types array into preallocated space.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h | 2 ++
 tools/libs/guest/xg_sr_save.c   | 7 ++-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 0c2bef8f78..3cbadb607b 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -215,6 +215,8 @@ struct xc_sr_save_arrays {
 xen_pfn_t batch_pfns[MAX_BATCH_SIZE];
 /* write_batch: Mfns of the batch pfns. */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
+/* write_batch: Types of the batch pfns. */
+xen_pfn_t types[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_restore_arrays {
diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c
index cdab288c3f..6678df95b8 100644
--- a/tools/libs/guest/xg_sr_save.c
+++ b/tools/libs/guest/xg_sr_save.c
@@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx)
 static int write_batch(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL;
+xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types;
 void *guest_mapping = NULL;
 void **guest_data = NULL;
 void **local_pages = NULL;
@@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx)
 
 assert(nr_pfns != 0);
 
-/* Types of the batch pfns. */
-types = malloc(nr_pfns * sizeof(*types));
 /* Errors from attempting to map the gfns. */
 errors = malloc(nr_pfns * sizeof(*errors));
 /* Pointers to page data to send.  Mapped gfns or local allocations. */
@@ -116,7 +114,7 @@ static int write_batch(struct xc_sr_context *ctx)
 /* iovec[] for writev(). */
 iov = malloc((nr_pfns + 4) * sizeof(*iov));
 
-if ( !types || !errors || !guest_data || !local_pages || !iov )
+if ( !errors || !guest_data || !local_pages || !iov )
 {
 ERROR("Unable to allocate arrays for a batch of %u pages",
   nr_pfns);
@@ -274,7 +272,6 @@ static int write_batch(struct xc_sr_context *ctx)
 free(local_pages);
 free(guest_data);
 free(errors);
-free(types);
 
 return rc;
 }

[PATCH v1 23/23] tools/guest: restore: write data directly into guest

2020-10-29 Thread Olaf Hering

Read incoming migration stream directly into the guest memory.
This avoids the memory allocation and copying, and the resulting
performance penalty.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  |   1 +
 tools/libs/guest/xg_sr_restore.c | 132 ++-
 2 files changed, 129 insertions(+), 4 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 7ec8867b88..f76af23bcc 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -231,6 +231,7 @@ struct xc_sr_restore_arrays {
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
 void *guest_data[MAX_BATCH_SIZE];
+struct iovec iov[MAX_BATCH_SIZE];
 
 /* populate_pfns */
 xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 060f3d1f4e..2f575d7dd9 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -392,6 +392,122 @@ err:
 return rc;
 }
 
+/*
+ * Handle PAGE_DATA record from the stream.
+ * Given a list of pfns, their types, and a block of page data from the
+ * stream, populate and record their types, map the relevant subset and copy
+ * the data into the guest.
+ */
+static int handle_incoming_page_data(struct xc_sr_context *ctx,
+ struct xc_sr_rhdr *rhdr)
+{
+xc_interface *xch = ctx->xch;
+struct xc_sr_restore_arrays *m = ctx->restore.m;
+struct xc_sr_rec_page_data_header *pages = &m->pages;
+uint64_t *pfn_nums = m->pages.pfn;
+uint32_t i;
+int rc, iov_idx;
+
+rc = handle_static_data_end_v2(ctx);
+if ( rc )
+goto err;
+
+/* First read and verify the header */
+rc = read_exact(ctx->fd, pages, sizeof(*pages));
+if ( rc )
+{
+PERROR("Could not read rec_pfn header");
+goto err;
+}
+
+if ( verify_rec_page_hdr(ctx, rhdr->length, pages) == false )
+{
+rc = -1;
+goto err;
+}
+
+/* Then read and verify the incoming pfn numbers */
+rc = read_exact(ctx->fd, pfn_nums, sizeof(*pfn_nums) * pages->count);
+if ( rc )
+{
+PERROR("Could not read rec_pfn data");
+goto err;
+}
+
+if ( verify_rec_page_pfns(ctx, rhdr->length, pages) == false )
+{
+rc = -1;
+goto err;
+}
+
+/* Finally read and verify the incoming pfn data */
+rc = map_guest_pages(ctx, pages);
+if ( rc )
+goto err;
+
+/* Prepare read buffers, either guest or throw away memory */
+for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+{
+if ( !m->guest_data[i] )
+continue;
+
+m->iov[iov_idx].iov_len = PAGE_SIZE;
+if ( ctx->restore.verify )
+m->iov[iov_idx].iov_base = ctx->restore.verify_buf + i * PAGE_SIZE;
+else
+m->iov[iov_idx].iov_base = m->guest_data[i];
+iov_idx++;
+}
+
+if ( !iov_idx )
+goto done;
+
+rc = readv_exact(ctx->fd, m->iov, iov_idx);
+if ( rc )
+{
+PERROR("read of %d pages failed", iov_idx);
+goto err;
+}
+
+/* Post-processing of pfn data */
+for ( i = 0, iov_idx = 0; i < pages->count; i++ )
+{
+if ( !m->guest_data[i] )
+continue;
+
+rc = ctx->restore.ops.localise_page(ctx, m->types[i], 
m->iov[iov_idx].iov_base);
+if ( rc )
+{
+ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")",
+  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+goto err;
+
+}
+
+if ( ctx->restore.verify )
+{
+if ( memcmp(m->guest_data[i], m->iov[iov_idx].iov_base, PAGE_SIZE) 
)
+{
+ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")",
+  m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT);
+}
+}
+
+iov_idx++;
+}
+
+done:
+rc = 0;
+
+err:
+if ( ctx->restore.guest_mapping )
+{
+xenforeignmemory_unmap(xch->fmem, ctx->restore.guest_mapping, 
ctx->restore.nr_mapped_pages);
+ctx->restore.guest_mapping = NULL;
+}
+return rc;
+}
+
 /*
  * Handle PAGE_DATA record from an existing buffer
  * Given a list of pfns, their types, and a block of page data from the
@@ -773,11 +889,19 @@ static int process_incoming_record_header(struct 
xc_sr_context *ctx, struct xc_s
 struct xc_sr_record rec;
 int rc;
 
-rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
-if ( rc )
-return rc;
+switch ( rhdr->type )
+{
+case REC_TYPE_PAGE_DATA:
+rc = handle_incoming_page_data(ctx, rhdr);
+break;
+default:
+rc = read_record_data(ctx, ctx->fd, rhdr, &rec);
+if ( rc == 0 )
+rc = process_buffered_record(ctx, &rec);;
+break;
+}
 
-return process_buffered_record(ctx, &rec);
+return rc;
 }

[PATCH v1 19/23] tools/guest: restore: move mfns array in populate_pfns

2020-10-29 Thread Olaf Hering

Remove allocation from hotpath, move populate_pfns mfns array into preallocated 
space.
Use some prefix to avoid conflict with an array used in handle_page_data.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.h  | 2 ++
 tools/libs/guest/xg_sr_restore.c | 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index eba3a49877..96a77b5969 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -230,6 +230,8 @@ struct xc_sr_restore_arrays {
 /* process_page_data */
 xen_pfn_t mfns[MAX_BATCH_SIZE];
 int map_errs[MAX_BATCH_SIZE];
+/* populate_pfns */
+xen_pfn_t pp_mfns[MAX_BATCH_SIZE];
 };
 
 struct xc_sr_context
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 94c329032f..85a32aaed2 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -138,12 +138,12 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
   const xen_pfn_t *original_pfns, const uint32_t *types)
 {
 xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
+xen_pfn_t *mfns = ctx->restore.m->pp_mfns,
 *pfns = malloc(count * sizeof(*pfns));
 unsigned int i, nr_pfns = 0;
 int rc = -1;
 
-if ( !mfns || !pfns )
+if ( !pfns )
 {
 ERROR("Failed to allocate %zu bytes for populating the physmap",
   2 * count * sizeof(*mfns));
@@ -191,7 +191,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int 
count,
 
  err:
 free(pfns);
-free(mfns);
 
 return rc;
 }

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Arvind Sankar

On Thu, Oct 29, 2020 at 05:59:54PM +0100, Paolo Bonzini wrote:
> On 29/10/20 17:56, Arvind Sankar wrote:
> >> For those two just add:
> >>struct apic *apic = x86_system_apic;
> >> before all the assignments.
> >> Less churn and much better code.
> >>
> > Why would it be better code?
> > 
> 
> I think he means the compiler produces better code, because it won't
> read the global variable repeatedly.  Not sure if that's true,(*) but I
> think I do prefer that version if Arnd wants to do that tweak.
> 
> Paolo
> 
> (*) if it is, it might also be due to Linux using -fno-strict-aliasing
> 

Nope, it doesn't read it multiple times. I guess it still assumes that
apic isn't in the middle of what it points to: it would reload the
address if the first element of *apic was modified, but doesn't for
other elements. Interesting.

[PATCH v1 21/23] tools/guest: restore: split record processing

2020-10-29 Thread Olaf Hering

handle_page_data must be able to read directly into mapped guest memory.
This will avoid unneccesary memcpy calls for data which can be consumed 
verbatim.

Rearrange the code to allow decisions based on the incoming record.

This change is preparation for future changes in handle_page_data,
no change in behavior is intended.

Signed-off-by: Olaf Hering 
---
 tools/libs/guest/xg_sr_common.c  | 33 -
 tools/libs/guest/xg_sr_common.h  |  4 ++-
 tools/libs/guest/xg_sr_restore.c | 49 ++--
 tools/libs/guest/xg_sr_save.c|  7 -
 4 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c
index 17567ab133..cabde4ef74 100644
--- a/tools/libs/guest/xg_sr_common.c
+++ b/tools/libs/guest/xg_sr_common.c
@@ -91,26 +91,33 @@ int write_split_record(struct xc_sr_context *ctx, struct 
xc_sr_record *rec,
 return -1;
 }
 
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec)
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr 
*rhdr)
 {
 xc_interface *xch = ctx->xch;
-struct xc_sr_rhdr rhdr;
-size_t datasz;
 
-if ( read_exact(fd, &rhdr, sizeof(rhdr)) )
+if ( read_exact(fd, rhdr, sizeof(*rhdr)) )
 {
 PERROR("Failed to read Record Header from stream");
 return -1;
 }
 
-if ( rhdr.length > REC_LENGTH_MAX )
+if ( rhdr->length > REC_LENGTH_MAX )
 {
-ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type,
-  rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX);
+ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr->type,
+  rec_type_to_str(rhdr->type), rhdr->length, REC_LENGTH_MAX);
 return -1;
 }
 
-datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER);
+return 0;
+}
+
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr 
*rhdr,
+ struct xc_sr_record *rec)
+{
+xc_interface *xch = ctx->xch;
+size_t datasz;
+
+datasz = ROUNDUP(rhdr->length, REC_ALIGN_ORDER);
 
 if ( datasz )
 {
@@ -119,7 +126,7 @@ int read_record(struct xc_sr_context *ctx, int fd, struct 
xc_sr_record *rec)
 if ( !rec->data )
 {
 ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)",
-  datasz, rhdr.type, rec_type_to_str(rhdr.type));
+  datasz, rhdr->type, rec_type_to_str(rhdr->type));
 return -1;
 }
 
@@ -128,18 +135,18 @@ int read_record(struct xc_sr_context *ctx, int fd, struct 
xc_sr_record *rec)
 free(rec->data);
 rec->data = NULL;
 PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)",
-   datasz, rhdr.type, rec_type_to_str(rhdr.type));
+   datasz, rhdr->type, rec_type_to_str(rhdr->type));
 return -1;
 }
 }
 else
 rec->data = NULL;
 
-rec->type   = rhdr.type;
-rec->length = rhdr.length;
+rec->type   = rhdr->type;
+rec->length = rhdr->length;
 
 return 0;
-};
+}
 
 static void __attribute__((unused)) build_assertions(void)
 {
diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h
index 3fe665b91d..66d1b0dfe6 100644
--- a/tools/libs/guest/xg_sr_common.h
+++ b/tools/libs/guest/xg_sr_common.h
@@ -475,7 +475,9 @@ static inline int write_record(struct xc_sr_context *ctx,
  *
  * On failure, the contents of the record structure are undefined.
  */
-int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
+int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr 
*rhdr);
+int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr 
*rhdr,
+ struct xc_sr_record *rec);
 
 /*
  * This would ideally be private in restore.c, but is needed by
diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c
index 71b39612ee..93f69b9ba8 100644
--- a/tools/libs/guest/xg_sr_restore.c
+++ b/tools/libs/guest/xg_sr_restore.c
@@ -471,7 +471,7 @@ static int send_checkpoint_dirty_pfn_list(struct 
xc_sr_context *ctx)
 return rc;
 }
 
-static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec);
+static int process_buffered_record(struct xc_sr_context *ctx, struct 
xc_sr_record *rec);
 static int handle_checkpoint(struct xc_sr_context *ctx)
 {
 xc_interface *xch = ctx->xch;
@@ -510,7 +510,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 
 for ( i = 0; i < ctx->restore.buffered_rec_num; i++ )
 {
-rc = process_record(ctx, &ctx->restore.buffered_records[i]);
+rc = process_buffered_record(ctx, 
&ctx->restore.buffered_records[i]);
 if ( rc )
 goto err;
 }
@@ -571,10 +571,11 @@ static int handle_checkpoint(struct xc_sr_context *ctx)
 return rc;
 }
 
-static int buffer_

Re: [PATCH] x86emul: support AVX-VNNI

2020-10-29 Thread Andrew Cooper

On 29/10/2020 15:01, Jan Beulich wrote:
> These are VEX-encoded equivalents of the EVEX-encoded AVX512-VNNI ISA
> extension.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper

Re: [PATCH 2/2] x86: PV shim doesn't need GRANT_TABLE

2020-10-29 Thread Andrew Cooper

On 29/10/2020 17:10, Jan Beulich wrote:
> The only reference into the code controlled by this option is from the
> hypercall table, and that hypercall table entry gets overwritten.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper

Re: [PATCH 1/2] x86: fix build of PV shim when !GRANT_TABLE

2020-10-29 Thread Andrew Cooper

On 29/10/2020 17:09, Jan Beulich wrote:
> To do its compat translation, shim code needs to include the compat
> header. For this to work, this header first of all needs to be
> generated.
>
> Reported-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper

Re: [PATCH] x86/HVM: send mapcache invalidation request to qemu regardless of preemption

2020-10-29 Thread Andrew Cooper

On 29/10/2020 15:14, Jan Beulich wrote:
> Even if only part of a hypercall completed before getting preempted,
> invalidation ought to occur. Therefore fold the two return statements.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper

Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm

2020-10-29 Thread Oleksandr Tyshchenko

On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu <
masami.hirama...@linaro.org> wrote:

> Hi Oleksandr,
>
Hi Masami

[sorry for the possible format issue]


>
> I would like to try this on my arm64 board.
>
Glad to hear you are interested in this topic.


>
> According to your comments in the patch, I made this config file.
> # cat debian.conf
> name = "debian"
> type = "pvh"
> vcpus = 8
> memory = 512
> kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64"
> ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64"
> cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw"
> disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ]
> vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ]
> virtio = 1
> vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ]
>
> And tried to boot a DomU, but I got below error.
>
> # xl create -c debian.conf
> Parsing config from debian.conf
> libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain
> 1:unable to add virtio_disk devices
> libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain
> 1:xc_domain_pause failed
> libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain
> type for domid=1
> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain
> 1:Unable to destroy guest
> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain
> 1:Destruction of domain failed
>
>
> Could you tell me how can I test it?
>

I assume it is due to the lack of the virtio-disk backend (which I haven't
shared yet as I focused on the IOREQ/DM support on Arm in the first place).
Could you wait a little bit, I am going to share it soon.


-- 
Regards,

Oleksandr Tyshchenko

[PATCH v2] libxl: Add suppress-vmdesc to QEMU machine

2020-10-29 Thread Jason Andryuk

The device model state saved by QMP xen-save-devices-state doesn't
include the vmdesc json.  When restoring an HVM, xen-load-devices-state
always triggers "Expected vmdescription section, but got 0".  This is
not a problem when restore comes from a file.  However, when QEMU runs
in a linux stubdom and comes over a console, EOF is not received.  This
causes a delay restoring - though it does restore.

Setting suppress-vmdesc skips looking for the vmdesc during restore and
avoids the wait.

QEMU 5.2 enables suppress-vmdesc by default for xenfv, but this change
sets it manually for xenfv and xen_platform_pci=0 when -machine pc is
use.

QEMU commit 9850c6047b8b "migration: Allow to suppress vmdesc
submission" added suppress-vmdesc in QEMU 2.3.

Signed-off-by: Jason Andryuk 

---
QEMU 2.3 came out in 2015, so setting suppress-vmdesc unilaterally
should be okay...  Is this okay?
---
 tools/libs/light/libxl_dm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index d1ff35dda3..3da83259c0 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1778,9 +1778,9 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
 /* Switching here to the machine "pc" which does not add
  * the xen-platform device instead of the default "xenfv" machine.
  */
-machinearg = libxl__strdup(gc, "pc,accel=xen");
+machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on");
 } else {
-machinearg = libxl__strdup(gc, "xenfv");
+machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on");
 }
 if (b_info->u.hvm.mmio_hole_memkb) {
 uint64_t max_ram_below_4g = (1ULL << 32) -
-- 
2.25.1

Re: [PATCH] x86/pv: Drop stale comment in dom0_construct_pv()

2020-10-29 Thread Andrew Cooper

On 29/10/2020 14:37, Jan Beulich wrote:
> On 29.10.2020 15:00, Andrew Cooper wrote:
>> This comment has been around since c/s 1372bca0615 in 2004.  It is stale, as
>> it predates the introduction of struct vcpu.
> That commit only moved it around; it's 22a857bde9b8 afaics from
> early 2003 where it first appeared, where it had a reason:
>
> /*
>  * WARNING: The new domain must have its 'processor' field
>  * filled in by now !!
>  */
> phys_l2tab = ALLOC_FRAME_FROM_DOMAIN();
> l2tab = map_domain_mem(phys_l2tab);
> memcpy(l2tab, idle_pg_table[p->processor], PAGE_SIZE);

Oh yes - my simple search didn't spot the reformat.

>
> But yes, the comment has been stale for a long time, and I've
> been wondering a number of times what it was supposed to tell
> me. (I think it was already stale at the point the comment
> first got altered, in 3072fef54df8.)

Looks like it became stale with 99db02d5097 "Remove CPU-dependent
page-directory entries." which drops the per-cpu idle_pg_table.

>
>> It is not obvious that it was even correct at the time.  Where a vcpu (domain
>> at the time) has been configured to run is unrelated to construct the 
>> domain's
>> initial pagetables, etc.
>>
>> Signed-off-by: Andrew Cooper 
> Acked-by: Jan Beulich 

Thanks.  I'll update the commit message.

~Andrew

[xen-4.11-testing test] 156277: tolerable FAIL - PUSHED

2020-10-29 Thread osstest service owner

flight 156277 xen-4.11-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156277/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail like 156034
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install 
fail like 156034
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156034
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 156034
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156034
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156034
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156034
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156034
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156034
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156034
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 156034
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156034
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156034
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 xen  e274c8bdc12eb596e55233040e8b49da27150f31
baseline version:
 xen  63199dfd3a0418f1677c6ccd7fe05b123af4610a

Last test of basis   156034  2020-10-20 13:35:54 Z9 days
Testing same since   156262  2020-10-27 18:36:52 Z2 days2 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-amd64-xsm  pass
 build-

Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Oleksandr Tyshchenko wrote:
> On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu 
>  wrote:
>   Hi Oleksandr,
> 
> Hi Masami
> 
> [sorry for the possible format issue]
>  
> 
>   I would like to try this on my arm64 board.
> 
> Glad to hear you are interested in this topic. 
>  
> 
>   According to your comments in the patch, I made this config file.
>   # cat debian.conf
>   name = "debian"
>   type = "pvh"
>   vcpus = 8
>   memory = 512
>   kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64"
>   ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64"
>   cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw"
>   disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ]
>   vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ]
>   virtio = 1
>   vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ]
> 
>   And tried to boot a DomU, but I got below error.
> 
>   # xl create -c debian.conf
>   Parsing config from debian.conf
>   libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain
>   1:unable to add virtio_disk devices
>   libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain
>   1:xc_domain_pause failed
>   libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain
>   type for domid=1
>   libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain
>   1:Unable to destroy guest
>   libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain
>   1:Destruction of domain failed
> 
> 
>   Could you tell me how can I test it?
> 
> 
> I assume it is due to the lack of the virtio-disk backend (which I haven't 
> shared yet as I focused on the IOREQ/DM support on Arm in the
> first place).
> Could you wait a little bit, I am going to share it soon. 

Do you have a quick-and-dirty hack you can share in the meantime? Even
just on github as a special branch? It would be very useful to be able
to have a test-driver for the new feature.

Re: Xen on RP4

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Jürgen Groß wrote:
> On 29.10.20 01:37, Stefano Stabellini wrote:
> > On Tue, 27 Oct 2020, Elliott Mitchell wrote:
> > > On Mon, Oct 26, 2020 at 06:44:27PM +, Julien Grall wrote:
> > > > On 26/10/2020 16:03, Elliott Mitchell wrote:
> > > > > On Mon, Oct 26, 2020 at 01:31:42PM +, Julien Grall wrote:
> > > > > > On 24/10/2020 06:35, Elliott Mitchell wrote:
> > > > > > > ACPI has a distinct
> > > > > > > means of specifying a limited DMA-width; the above fails, because
> > > > > > > it
> > > > > > > assumes a *device-tree*.
> > > > > > 
> > > > > > Do you know if it would be possible to infer from the ACPI static
> > > > > > table
> > > > > > the DMA-width?
> > > > > 
> > > > > Yes, and it is.  Due to not knowing much about ACPI tables I don't
> > > > > know
> > > > > what the C code would look like though (problem is which documentation
> > > > > should I be looking at first?).
> > > > 
> > > > What you provided below is an excerpt of the DSDT. AFAIK, DSDT content
> > > > is written in AML. So far the shortest implementation I have seen for
> > > > the AML parser is around 5000 lines (see [1]). It might be possible to
> > > > strip some the code, although I think this will still probably too big
> > > > for a single workaround.
> > > > 
> > > > What I meant by "static table" is a table that looks like a structure
> > > > and can be parsed in a few lines. If we can't find on contain the DMA
> > > > window, then the next best solution is to find a way to identity the
> > > > platform.
> > > > 
> > > > I don't know enough ACPI to know if this solution is possible. A good
> > > > starter would probably be the ACPI spec [2].
> > > 
> > > Okay, that is worse than I had thought (okay, ACPI is impressively
> > > complex for something nominally firmware-level).
> > > 
> > > There are strings in the present Tianocore implementation for Raspberry
> > > PI 4B which could be targeted.  Notably included in the output during
> > > boot listing the tables, "RPIFDN", "RPIFDN RPI" and "RPIFDN RPI4" (I'm
> > > unsure how kosher these are as this wsn't implemented nor blessed by the
> > > Raspberry PI Foundation).
> > > 
> > > I strongly dislike this approach as you soon turn the Xen project into a
> > > database of hardware.  This is already occurring with
> > > xen/arch/arm/platforms and I would love to do something about this.  On
> > > that thought, how about utilizing Xen's command-line for this purpose?
> > 
> > I don't think that a command line option is a good idea: basically it is
> > punting to users the task of platform detection. Also, it means that
> > users will be necessarily forced to edit the uboot script or grub
> > configuration file to boot.
> > 
> > Note that even if we introduced a new command line, we wouldn't take
> > away the need for xen/arch/arm/platforms anyway.
> > 
> > It would be far better for Xen to autodetect the platform if we can.
> > 
> > 
> > > Have a procedure of during installation/updates retrieve DMA limitation
> > > information from the running OS and the following boot Xen will follow
> > > the appropriate setup.  By its nature, Domain 0 will have the information
> > > needed, just becomes an issue of how hard that is to retrieve...
> > 
> > Historically that is what we used to do for many things related to ACPI,
> > but unfortunately it leads to a pretty bad architecture where Xen
> > depends on Dom0 for booting rather than the other way around. (Dom0
> > should be the one requiring Xen for booting, given that Xen is higher
> > privilege and boots first.)
> > 
> > 
> > I think the best compromise is still to use an ACPI string to detect the
> > platform. For instance, would it be possible to use the OEMID fields in
> > RSDT, XSDT, FADT?  Possibly even a combination of them?
> > 
> > Another option might be to get the platform name from UEFI somehow.
> 
> What about having a small domain parsing the ACPI booting first and use
> that information for booting dom0?
> 
> That dom would be part of the Xen build and the hypervisor wouldn't need
> to gain all the ACPI AML logic.

That could work, but in practice we don't have such a domain today --
the infrastructure is missing. I wonder whether the bootloader (uboot or
grub) would know about the platform and might be able to pass that
information to Xen somehow.

[PATCH v1 00/23] reduce overhead during live migration

2020-10-29 Thread Olaf Hering

The current live migration code can easily saturate an 1Gb link.
There is still room for improvement with faster network connections.
Even with this series reviewed and applied.
See description of patch #6.

Olaf

Olaf Hering (23):
  tools: add readv_exact to libxenctrl
  tools: add xc_is_known_page_type to libxenctrl
  tools: use xc_is_known_page_type
  tools: unify type checking for data pfns in migration stream
  tools: show migration transfer rate in send_dirty_pages
  tools/guest: prepare to allocate arrays once
  tools/guest: save: move batch_pfns
  tools/guest: save: move mfns array
  tools/guest: save: move types array
  tools/guest: save: move errors array
  tools/guest: save: move iov array
  tools/guest: save: move rec_pfns array
  tools/guest: save: move guest_data array
  tools/guest: save: move local_pages array
  tools/guest: restore: move pfns array
  tools/guest: restore: move types array
  tools/guest: restore: move mfns array
  tools/guest: restore: move map_errs array
  tools/guest: restore: move mfns array in populate_pfns
  tools/guest: restore: move pfns array in populate_pfns
  tools/guest: restore: split record processing
  tools/guest: restore: split handle_page_data
  tools/guest: restore: write data directly into guest

 tools/libs/ctrl/xc_private.c  |  54 ++-
 tools/libs/ctrl/xc_private.h  |  34 ++
 tools/libs/guest/xg_sr_common.c   |  33 +-
 tools/libs/guest/xg_sr_common.h   |  86 +++-
 tools/libs/guest/xg_sr_restore.c  | 562 +-
 tools/libs/guest/xg_sr_save.c | 158 
 tools/libs/guest/xg_sr_save_x86_hvm.c |   5 +-
 tools/libs/guest/xg_sr_save_x86_pv.c  |  31 +-
 8 files changed, 666 insertions(+), 297 deletions(-)

Re: [PATCH] meson.build: fix building of Xen support for aarch64

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Jason Andryuk wrote:
> On Thu, Oct 29, 2020 at 6:01 AM Alex Bennée  wrote:
> >
> >
> > Stefano Stabellini  writes:
> >
> > > On Wed, 28 Oct 2020, Alex Bennée wrote:
> > >> Xen is supported on aarch64 although weirdly using the i386-softmmu
> > >> model. Checking based on the host CPU meant we never enabled Xen
> > >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to
> > >> make it not seem weird but that will require further build surgery.
> > >>
> > >> Signed-off-by: Alex Bennée 
> > >> Cc: Masami Hiramatsu 
> > >> Cc: Stefano Stabellini 
> > >> Cc: Anthony Perard 
> > >> Cc: Paul Durrant 
> > >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson")
> > >> ---
> > >>  meson.build | 2 ++
> > >>  1 file changed, 2 insertions(+)
> > >>
> > >> diff --git a/meson.build b/meson.build
> > >> index 835424999d..f1fcbfed4c 100644
> > >> --- a/meson.build
> > >> +++ b/meson.build
> > >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64']
> > >>  'CONFIG_HVF': ['x86_64-softmmu'],
> > >>  'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
> > >>}
> > >> +elif cpu in [ 'arm', 'aarch64' ]
> > >> +  accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] }
> > >>  endif
> > >
> > > This looks very reasonable -- the patch makes sense.
> 
> A comment would be useful to explain that Arm needs i386-softmmu for
> the xenpv machine.
> 
> > >
> > > However I have two questions, mostly for my own understanding. I tried
> > > to repro the aarch64 build problem but it works at my end, even without
> > > this patch.
> >
> > Building on a x86 host or with cross compiler?
> >
> > > I wonder why. I suspect it works thanks to these lines in
> > > meson.build:
> 
> I think it's a runtime and not a build problem.  In osstest, Xen
> support was detected and configured, but CONFIG_XEN wasn't set for
> Arm.  So at runtime xen_available() returns 0, and QEMU doesn't start
> with "qemu-system-i386: -xen-domid 1: Option not supported for this
> target"
> 
> I posted my investigation here:
> https://lore.kernel.org/xen-devel/cakf6xpss8kpgovzrkitpz63bhbvbjxrtywdhekzuo2q1kem...@mail.gmail.com/

Right, but strangely I cannot reproduce it here. I am natively building
(qemu-user) on aarch64 and for me CONFIG_XEN gets set.

Re: [PATCH] x86/hvm: process softirq while saving/loading entries

2020-10-29 Thread Roger Pau Monné

On Thu, Oct 29, 2020 at 05:38:17PM +0100, Jan Beulich wrote:
> On 29.10.2020 16:20, Roger Pau Monne wrote:
> > On slow systems with sync_console saving or loading the context of big
> > guests can cause the watchdog to trigger. Fix this by adding a couple
> > of process_pending_softirqs.
> 
> Which raises the question in how far this is then also a problem
> for the caller of the underlying hypercall. IOW I wonder whether
> instead we need to make use of continuations here. Nevertheless

FWIW, I've only hit this with debug builds on boxes that have slow
serial with sync_console enabled, due to the verbose printks.

> 
> > Signed-off-by: Roger Pau Monné 
> 
> Acked-by: Jan Beulich 

Thanks.

Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm

2020-10-29 Thread Alex Bennée

Oleksandr Tyshchenko  writes:

> On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu <
> masami.hirama...@linaro.org> wrote:
>
>> Hi Oleksandr,
>>
> Hi Masami
>
> [sorry for the possible format issue]
>
>
>>
>> I would like to try this on my arm64 board.
>>
> Glad to hear you are interested in this topic.
>
>
>>
>> According to your comments in the patch, I made this config file.
>> # cat debian.conf
>> name = "debian"
>> type = "pvh"
>> vcpus = 8
>> memory = 512
>> kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64"
>> ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64"
>> cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw"
>> disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ]
>> vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ]
>> virtio = 1
>> vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ]
>>
>> And tried to boot a DomU, but I got below error.
>>
>> # xl create -c debian.conf
>> Parsing config from debian.conf
>> libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain
>> 1:unable to add virtio_disk devices
>> libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain
>> 1:xc_domain_pause failed
>> libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain
>> type for domid=1
>> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain
>> 1:Unable to destroy guest
>> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain
>> 1:Destruction of domain failed
>>
>>
>> Could you tell me how can I test it?
>>
>
> I assume it is due to the lack of the virtio-disk backend (which I haven't
> shared yet as I focused on the IOREQ/DM support on Arm in the first place).
> Could you wait a little bit, I am going to share it soon.

I assume this is wiring up the required bits of support to handle the
IOREQ requests in QEMU? We are putting together a PoC demo to show
a virtio enabled image (AGL) running on both KVM and Xen hypervisors so
we are keen to see your code as soon as you can share it.

I'm currently preparing a patch series for QEMU which fixes the recent
breakage caused by changes to the build system. As part of that I've
separated CONFIG_XEN and CONFIG_XEN_HVM so it's possible to build
i386-softmmu with unneeded for ARM bits. Hopefully this will allow me to
create a qemu-aarch64-system binary with just the PV related models in
it.

Talking to Stefano it probably makes sense going forward to introduce a
new IOREQ aware machine type for QEMU that doesn't bring in the rest of
the x86 overhead. I was thinking maybe xenvirt?

You've tested with virtio-block but we'd also like to extend this to
other arbitrary virtio devices. I guess we will need some sort of
mechanism to inform the QEMU command line where each device sits in the
virtio-mmio bus so the FDT Xen delivers to the guest matches up with
what QEMU is ready to serve requests for?

-- 
Alex Bennée

Re: [PATCH] meson.build: fix building of Xen support for aarch64

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Alex Bennée wrote:
> > On Wed, 28 Oct 2020, Alex Bennée wrote:
> >> Xen is supported on aarch64 although weirdly using the i386-softmmu
> >> model. Checking based on the host CPU meant we never enabled Xen
> >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to
> >> make it not seem weird but that will require further build surgery.
> >> 
> >> Signed-off-by: Alex Bennée 
> >> Cc: Masami Hiramatsu 
> >> Cc: Stefano Stabellini 
> >> Cc: Anthony Perard 
> >> Cc: Paul Durrant 
> >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson")
> >> ---
> >>  meson.build | 2 ++
> >>  1 file changed, 2 insertions(+)
> >> 
> >> diff --git a/meson.build b/meson.build
> >> index 835424999d..f1fcbfed4c 100644
> >> --- a/meson.build
> >> +++ b/meson.build
> >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64']
> >>  'CONFIG_HVF': ['x86_64-softmmu'],
> >>  'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
> >>}
> >> +elif cpu in [ 'arm', 'aarch64' ]
> >> +  accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] }
> >>  endif
> >
> > This looks very reasonable -- the patch makes sense.
> >
> >
> > However I have two questions, mostly for my own understanding. I tried
> > to repro the aarch64 build problem but it works at my end, even without
> > this patch.
> 
> Building on a x86 host or with cross compiler?

native build (qemu-user)


> > I wonder why. I suspect it works thanks to these lines in
> > meson.build:
> >
> >   if not get_option('xen').disabled() and 'CONFIG_XEN_BACKEND' in 
> > config_host
> > accelerators += 'CONFIG_XEN'
> > have_xen_pci_passthrough = not 
> > get_option('xen_pci_passthrough').disabled() and targetos == 'linux'
> >   else
> > have_xen_pci_passthrough = false
> >   endif
> >
> > But I am not entirely sure who is adding CONFIG_XEN_BACKEND to
> > config_host.
> 
> The is part of the top level configure check - which basically checks
> for --enable-xen or autodetects the presence of the userspace libraries.
> I'm not sure if we have a slight over proliferation of symbols for Xen
> support (although I'm about to add more).
> 
> > The other question is: does it make sense to print the value of
> > CONFIG_XEN as part of the summary? Something like:
> >
> > diff --git a/meson.build b/meson.build
> > index 47e32e1fcb..c6e7832dc9 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -2070,6 +2070,7 @@ summary_info += {'KVM support':   
> > config_all.has_key('CONFIG_KVM')}
> >  summary_info += {'HAX support':   config_all.has_key('CONFIG_HAX')}
> >  summary_info += {'HVF support':   config_all.has_key('CONFIG_HVF')}
> >  summary_info += {'WHPX support':  config_all.has_key('CONFIG_WHPX')}
> > +summary_info += {'XEN support':  config_all.has_key('CONFIG_XEN')}
> >  summary_info += {'TCG support':   config_all.has_key('CONFIG_TCG')}
> >  if config_all.has_key('CONFIG_TCG')
> >summary_info += {'TCG debug enabled': 
> > config_host.has_key('CONFIG_DEBUG_TCG')}
> >
> >
> > But I realize there is already:
> >
> > summary_info += {'xen support':   
> > config_host.has_key('CONFIG_XEN_BACKEND')}
> >
> > so it would be a bit of a duplicate
> 
> Hmm so what we have is:
> 
>   CONFIG_XEN_BACKEND
> - ensures that appropriate compiler flags are added
> - pegs RAM_ADDR_MAX at UINT64_MAX (instead of UINTPTR_MAX)
>   CONFIG_XEN
> - which controls a bunch of build objects, some of which may be i386 only?
> ./accel/meson.build:15:specific_ss.add_all(when: ['CONFIG_XEN'], if_true: 
> dummy_ss)
> ./accel/stubs/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', if_false: 
> files('xen-stub.c'))
> ./accel/xen/meson.build:1:specific_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen-all.c'))
> ./hw/9pfs/meson.build:17:fs_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen-9p-backend.c'))
> ./hw/block/dataplane/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', 
> if_true: files('xen-block.c'))
> ./hw/block/meson.build:14:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen-block.c'))
> ./hw/char/meson.build:23:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen_console.c'))
> ./hw/display/meson.build:18:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xenfb.c'))
> ./hw/i386/xen/meson.build:1:i386_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen-hvm.c',
>   
>  'xen-mapcache.c',
>   
>  'xen_apic.c',
>   
>  'xen_platform.c',
>   
>  'xen_pvdevice.c')
> ./hw/net/meson.build:2:softmmu_ss.add(when: 'CONFIG_XEN', if_true: 
> files('xen_nic.c'))
> ./hw/usb/meson.build:76:softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', 
> libusb], if_true: files('xen-usb.c'))
>

Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Alex Bennée wrote:
> Oleksandr Tyshchenko  writes:
> > On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu <
> > masami.hirama...@linaro.org> wrote:
> >
> >> Hi Oleksandr,
> >>
> > Hi Masami
> >
> > [sorry for the possible format issue]
> >
> >
> >>
> >> I would like to try this on my arm64 board.
> >>
> > Glad to hear you are interested in this topic.
> >
> >
> >>
> >> According to your comments in the patch, I made this config file.
> >> # cat debian.conf
> >> name = "debian"
> >> type = "pvh"
> >> vcpus = 8
> >> memory = 512
> >> kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64"
> >> ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64"
> >> cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw"
> >> disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ]
> >> vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ]
> >> virtio = 1
> >> vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ]
> >>
> >> And tried to boot a DomU, but I got below error.
> >>
> >> # xl create -c debian.conf
> >> Parsing config from debian.conf
> >> libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain
> >> 1:unable to add virtio_disk devices
> >> libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain
> >> 1:xc_domain_pause failed
> >> libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain
> >> type for domid=1
> >> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain
> >> 1:Unable to destroy guest
> >> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain
> >> 1:Destruction of domain failed
> >>
> >>
> >> Could you tell me how can I test it?
> >>
> >
> > I assume it is due to the lack of the virtio-disk backend (which I haven't
> > shared yet as I focused on the IOREQ/DM support on Arm in the first place).
> > Could you wait a little bit, I am going to share it soon.
> 
> I assume this is wiring up the required bits of support to handle the
> IOREQ requests in QEMU? We are putting together a PoC demo to show
> a virtio enabled image (AGL) running on both KVM and Xen hypervisors so
> we are keen to see your code as soon as you can share it.
> 
> I'm currently preparing a patch series for QEMU which fixes the recent
> breakage caused by changes to the build system. As part of that I've
> separated CONFIG_XEN and CONFIG_XEN_HVM so it's possible to build
> i386-softmmu with unneeded for ARM bits. Hopefully this will allow me to
> create a qemu-aarch64-system binary with just the PV related models in
> it.
> 
> Talking to Stefano it probably makes sense going forward to introduce a
> new IOREQ aware machine type for QEMU that doesn't bring in the rest of
> the x86 overhead. I was thinking maybe xenvirt?
> 
> You've tested with virtio-block but we'd also like to extend this to
> other arbitrary virtio devices. I guess we will need some sort of
> mechanism to inform the QEMU command line where each device sits in the
> virtio-mmio bus so the FDT Xen delivers to the guest matches up with
> what QEMU is ready to serve requests for?

That would be great. As a reference, given that the domU memory mapping
layout is fixed, on x86 we just made sure that QEMU's idea of where the
devices are is the same of Xen's. What you are suggesting would make it
much more flexible and clearer.

Re: [PATCH v3 2/3] x86/shadow: refactor shadow_vram_{get,put}_l1e()

2020-10-29 Thread Tim Deegan

At 10:44 +0200 on 19 Oct (1603104271), Jan Beulich wrote:
> By passing the functions an MFN and flags, only a single instance of
> each is needed; they were pretty large for being inline functions
> anyway.
> 
> While moving the code, also adjust coding style and add const where
> sensible / possible.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-29 Thread Stefano Stabellini

On Thu, 29 Oct 2020, Bertrand Marquis wrote:
> > On 28 Oct 2020, at 19:12, Julien Grall  wrote:
> > On 26/10/2020 11:03, Rahul Singh wrote:
> >> Hello Julien,
> >>> On 23 Oct 2020, at 4:19 pm, Julien Grall  wrote:
> >>> On 23/10/2020 15:27, Rahul Singh wrote:
>  Hello Julien,
> > On 23 Oct 2020, at 2:00 pm, Julien Grall  wrote:
> > On 23/10/2020 12:35, Rahul Singh wrote:
> >> Hello,
> >>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini 
> >>>  wrote:
> >>> 
> >>> On Thu, 22 Oct 2020, Julien Grall wrote:
> >> On 20/10/2020 16:25, Rahul Singh wrote:
> >>> Add support for ARM architected SMMUv3 implementations. It is 
> >>> based on
> >>> the Linux SMMUv3 driver.
> >>> Major differences between the Linux driver are as follows:
> >>> 1. Only Stage-2 translation is supported as compared to the Linux 
> >>> driver
> >>>that supports both Stage-1 and Stage-2 translations.
> >>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
> >>>capability to share the page tables with the CPU.
> >>> 3. Tasklets is used in place of threaded IRQ's in Linux for event 
> >>> queue
> >>>and priority queue IRQ handling.
> >> 
> >> Tasklets are not a replacement for threaded IRQ. In particular, 
> >> they will
> >> have priority over anything else (IOW nothing will run on the pCPU 
> >> until
> >> they are done).
> >> 
> >> Do you know why Linux is using thread. Is it because of long 
> >> running
> >> operations?
> > 
> > Yes you are right because of long running operations Linux is using 
> > the
> > threaded IRQs.
> > 
> > SMMUv3 reports fault/events bases on memory-based circular buffer 
> > queues not
> > based on the register. As per my understanding, it is 
> > time-consuming to
> > process the memory based queues in interrupt context because of 
> > that Linux
> > is using threaded IRQ to process the faults/events from SMMU.
> > 
> > I didn’t find any other solution in XEN in place of tasklet to 
> > defer the
> > work, that’s why I used tasklet in XEN in replacement of threaded 
> > IRQs. If
> > we do all work in interrupt context we will make XEN less 
> > responsive.
>  
>  So we need to make sure that Xen continue to receives interrupts, 
>  but we also
>  need to make sure that a vCPU bound to the pCPU is also responsive.
>  
> > 
> > If you know another solution in XEN that will be used to defer the 
> > work in
> > the interrupt please let me know I will try to use that.
>  
>  One of my work colleague encountered a similar problem recently. He 
>  had a long
>  running tasklet and wanted to be broken down in smaller chunk.
>  
>  We decided to use a timer to reschedule the taslket in the future. 
>  This allows
>  the scheduler to run other loads (e.g. vCPU) for some time.
>  
>  This is pretty hackish but I couldn't find a better solution as 
>  tasklet have
>  high priority.
>  
>  Maybe the other will have a better idea.
> >>> 
> >>> Julien's suggestion is a good one.
> >>> 
> >>> But I think tasklets can be configured to be called from the 
> >>> idle_loop,
> >>> in which case they are not run in interrupt context?
> >>> 
> >>  Yes you are right tasklet will be scheduled from the idle_loop that 
> >> is not interrupt conext.
> > 
> > This depends on your tasklet. Some will run from the softirq context 
> > which is usually (for Arm) on the return of an exception.
> > 
>  Thanks for the info. I will check and will get better understanding of 
>  the tasklet how it will run in XEN.
> >>> 
> >>> 4. Latest version of the Linux SMMUv3 code implements the 
> >>> commands queue
> >>>access functions based on atomic operations implemented in 
> >>> Linux.
> >> 
> >> Can you provide more details?
> > 
> > I tried to port the latest version of the SMMUv3 code than I 
> > observed that
> > in order to port that code I have to also port atomic operation 
> > implemented
> > in Linux to XEN. As latest Linux code uses atomic operation to 
> > process the
> > command queues 
> > (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
> > atomic_fetch_andnot_relaxed()) .
>  
>  Thank you for the explanation. I think it would be best to import 
>  the atomic
>  helpers and use the latest code.
>  
>  This will ensure

Re: [PATCH v3 3/3] x86/shadow: sh_{make,destroy}_monitor_table() are "even more" HVM-only

2020-10-29 Thread Tim Deegan

At 10:45 +0200 on 19 Oct (1603104300), Jan Beulich wrote:
> With them depending on just the number of shadow levels, there's no need
> for more than one instance of them, and hence no need for any hook (IOW
> 452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite
> far enough). Move the functions to hvm.c while dropping the dead
> is_pv_32bit_domain() code paths.
>
> While moving the code, replace a stale comment reference to
> sh_install_xen_entries_in_l4(). Doing so made me notice the function
> also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm:
> Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which
> gets done here as well.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

> TBD: In principle both functions could have their first parameter
>  constified. In fact, "destroy" doesn't depend on the vCPU at all
>  and hence could be passed a struct domain *. Not sure whether such
>  an asymmetry would be acceptable.
>  In principle "make" would also not need passing of the number of
>  shadow levels (can be derived from v), which would result in yet
>  another asymmetry.
>  If these asymmetries were acceptable, "make" could then also update
>  v->arch.hvm.monitor_table, instead of doing so at both call sites.

Feel free to add consts, but please don't change the function
parameters any more than that.  I would rather keep them as a matched
pair, and leave the hvm.monitor_table updates in the caller, where
it's easier to see why they're not symmetrical.

Cheers

Tim.

Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks

2020-10-29 Thread Tim Deegan

At 10:22 +0100 on 28 Oct (1603880578), Jan Beulich wrote:
> The struct paging_mode instances get set to the same functions
> regardless of mode by both HAP and shadow code, hence there's no point
> having this hook there. The hook also doesn't need moving elsewhere - we
> can directly use struct p2m_domain's. This merely requires (from a
> strictly formal pov; in practice this may not even be needed) making
> sure we don't end up using safe_write_pte() for nested P2Ms.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan

Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header

2020-10-29 Thread Thomas Gleixner

On Thu, Oct 29 2020 at 17:59, Paolo Bonzini wrote:
> On 29/10/20 17:56, Arvind Sankar wrote:
>>> For those two just add:
>>> struct apic *apic = x86_system_apic;
>>> before all the assignments.
>>> Less churn and much better code.
>>>
>> Why would it be better code?
>> 
>
> I think he means the compiler produces better code, because it won't
> read the global variable repeatedly.  Not sure if that's true,(*) but I
> think I do prefer that version if Arnd wants to do that tweak.

It's not true.

 foo *p = bar;

 p->a = 1;
 p->b = 2;

The compiler is free to reload bar after accessing p->a and with

bar->a = 1;
bar->b = 1;

it can either cache bar in a register or reread it after bar->a

The generated code is the same as long as there is no reason to reload,
e.g. register pressure.

Thanks,

tglx

Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook

2020-10-29 Thread Tim Deegan

At 10:24 +0100 on 28 Oct (1603880693), Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back the the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.
> 
> Signed-off-by: Jan Beulich 

This seems like a good approach to me.  I'm happy with the shadow
parts but am not confident enough on nested p2m any more to have an
opinion on that side. 

Acked-by: Tim Deegan

Re: [PATCH] x86/shadow: correct GFN use by sh_unshadow_for_p2m_change()

2020-10-29 Thread Tim Deegan

At 16:42 +0100 on 28 Oct (1603903365), Jan Beulich wrote:
> Luckily sh_remove_all_mappings()'s use of the parameter is limited to
> generation of log messages. Nevertheless we'd better pass correct GFNs
> around:
> - the incoming GFN, when replacing a large page, may not be large page
>   aligned,
> - incrementing by page-size-scaled values can't be right.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Tim Deegan 

Thanks!

Tim.

Re: Ping: [PATCH v3 0/3] x86: shim building adjustments (plus shadow follow-on)

2020-10-29 Thread Tim Deegan

At 14:40 +0100 on 29 Oct (1603982415), Jan Beulich wrote:
> Tim,
> 
> unless you tell me otherwise I'm intending to commit the latter
> two with Roger's acks some time next week. Of course it would
> still be nice to know your view on the first of the TBDs in
> patch 3 (which may result in further changes, in a follow-up).

Ack, sorry for the dropped patches, and thank you for the ping.  I've
now replied to everything that I think is wating for my review.

Cheers,

Tim.

1 2 >

1 - 100 of 119 matches

Mail list logo