Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On 28/10/20 22:20, Arnd Bergmann wrote: > Avoid this by renaming the global 'apic' variable to the more descriptive > 'x86_system_apic'. It was originally called 'genapic', but both that > and the current 'apic' seem to be a little overly generic for a global > variable. The 'apic' affects only the current CPU, so one of 'x86_local_apic', 'x86_lapic' or 'x86_apic' is probably preferrable. I don't have huge objections to renaming 'apic' variables and arguments in KVM to 'lapic'. I do agree with Sean however that it's going to break again very soon. Paolo
Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output
Hi Richard, Em Wed, 28 Oct 2020 10:44:27 -0700 Richard Cochran escreveu: > On Wed, Oct 28, 2020 at 03:23:18PM +0100, Mauro Carvalho Chehab wrote: > > > diff --git a/Documentation/ABI/testing/sysfs-uevent > > b/Documentation/ABI/testing/sysfs-uevent > > index aa39f8d7bcdf..d0893dad3f38 100644 > > --- a/Documentation/ABI/testing/sysfs-uevent > > +++ b/Documentation/ABI/testing/sysfs-uevent > > @@ -19,7 +19,8 @@ Description: > > a transaction identifier so it's possible to use the same > > UUID > > value for one or more synthetic uevents in which case we > > logically group these uevents together for any userspace > > -listeners. The UUID value appears in uevent as > > +listeners. The UUID value appears in uevent as: > > I know almost nothing about Sphinx, but why have one colon here ^^^ and ... Good point. After re-reading the text, this ":" doesn't belong here. > > > + > > "SYNTH_UUID=----" > > environment > > variable. > > > > @@ -30,18 +31,19 @@ Description: > > It's possible to define zero or more pairs - each pair is > > then > > delimited by a space character ' '. Each pair appears in > > synthetic uevent as "SYNTH_ARG_KEY=VALUE". That means the > > KEY > > -name gains "SYNTH_ARG_" prefix to avoid possible collisions > > +name gains `SYNTH_ARG_` prefix to avoid possible collisions > > with existing variables. > > > > -Example of valid sequence written to the uevent file: > > +Example of valid sequence written to the uevent file:: > > ... two here? The main issue that this patch wants to solve is here: This generates synthetic uevent including these variables:: ACTION=add SYNTH_ARG_A=1 SYNTH_ARG_B=abc SYNTH_UUID=fe4d7c9d-b8c6-4a70-9ef1-3d8a58d18eed On Sphinx, consecutive lines with the same indent belongs to the same paragraph. So, without "::", the above will be displayed on a single line, which is undesired. using "::" tells Sphinx to display as-is. It will also place it into a a box (colored for html output) and using a monospaced font. The change at the "uevent file:" line was done just for coherency purposes. Yet, after re-reading the text, there are other things that are not coherent. So, I guess the enclosed patch will work better for sys-uevent. Thanks, Mauro docs: ABI: sysfs-uevent: make it compatible with ReST output - Replace " by ``, in order to use monospaced fonts; - mark literal blocks as such. Signed-off-by: Mauro Carvalho Chehab diff --git a/Documentation/ABI/testing/sysfs-uevent b/Documentation/ABI/testing/sysfs-uevent index aa39f8d7bcdf..0b6227706b35 100644 --- a/Documentation/ABI/testing/sysfs-uevent +++ b/Documentation/ABI/testing/sysfs-uevent @@ -6,42 +6,46 @@ Description: Enable passing additional variables for synthetic uevents that are generated by writing /sys/.../uevent file. -Recognized extended format is ACTION [UUID [KEY=VALUE ...]. +Recognized extended format is:: -The ACTION is compulsory - it is the name of the uevent action -("add", "change", "remove"). There is no change compared to -previous functionality here. The rest of the extended format -is optional. + ACTION [UUID [KEY=VALUE ...] + +The ACTION is compulsory - it is the name of the uevent +action (``add``, ``change``, ``remove``). There is no change +compared to previous functionality here. The rest of the +extended format is optional. You need to pass UUID first before any KEY=VALUE pairs. -The UUID must be in "----" +The UUID must be in ``----`` format where 'x' is a hex digit. The UUID is considered to be a transaction identifier so it's possible to use the same UUID value for one or more synthetic uevents in which case we logically group these uevents together for any userspace listeners. The UUID value appears in uevent as -"SYNTH_UUID=----" environment +``SYNTH_UUID=----`` environment variable. If UUID is not passed in, the generated synthetic uevent gains -"SYNTH_UUID=0" environment variable automatically. +``SYNTH_UUID=0`` environment variable automatically. The KEY=VALUE pairs can contain alphanumeric char
Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm
Hi Oleksandr, I would like to try this on my arm64 board. According to your comments in the patch, I made this config file. # cat debian.conf name = "debian" type = "pvh" vcpus = 8 memory = 512 kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64" ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64" cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw" disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ] vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ] virtio = 1 vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ] And tried to boot a DomU, but I got below error. # xl create -c debian.conf Parsing config from debian.conf libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain 1:unable to add virtio_disk devices libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain 1:xc_domain_pause failed libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain type for domid=1 libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 1:Unable to destroy guest libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 1:Destruction of domain failed Could you tell me how can I test it? Thank you, 2020年10月16日(金) 1:46 Oleksandr Tyshchenko : > > From: Oleksandr Tyshchenko > > Hello all. > > The purpose of this patch series is to add IOREQ/DM support to Xen on Arm. > You can find an initial discussion at [1] and RFC/V1 series at [2]/[3]. > Xen on Arm requires some implementation to forward guest MMIO access to a > device > model in order to implement virtio-mmio backend or even mediator outside of > hypervisor. > As Xen on x86 already contains required support this series tries to make it > common > and introduce Arm specific bits plus some new functionality. Patch series is > based on > Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device > emulator". > Besides splitting existing IOREQ/DM support and introducing Arm side, the > series > also includes virtio-mmio related changes (last 2 patches for toolstack) > for the reviewers to be able to see how the whole picture could look like. > > According to the initial discussion there are a few open questions/concerns > regarding security, performance in VirtIO solution: > 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require > different >transport... > 2. virtio backend is able to access all guest memory, some kind of protection >is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest' > 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using >Xenstore in virtio backend if possible. > 4. a lot of 'foreing mapping' could lead to the memory exhaustion, Julien >has some idea regarding that. > > Looks like all of them are valid and worth considering, but the first thing > which we need on Arm is a mechanism to forward guest IO to a device emulator, > so let's focus on it in the first place. > > *** > > There are a lot of changes since RFC series, almost all TODOs were resolved > on Arm, > Arm code was improved and hardened, common IOREQ/DM code became really > arch-agnostic > (without HVM-ism), but one TODO still remains which is "PIO handling" on Arm. > The "PIO handling" TODO is expected to left unaddressed for the current > series. > It is not an big issue for now while Xen doesn't have support for vPCI on Arm. > On Arm64 they are only used for PCI IO Bar and we would probably want to > expose > them to emulator as PIO access to make a DM completely arch-agnostic. So "PIO > handling" > should be implemented when we add support for vPCI. > > I left interface untouched in the following patch > "xen/dm: Introduce xendevicemodel_set_irq_level DM op" > since there is still an open discussion what interface to use/what > information to pass to the hypervisor. > > Also I decided to drop the following patch: > "[RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in > driver domain" > as I got an advise to write our own policy using FLASK which would cover our > use > case (with emulator in driver domain) rather than tweak Xen. > > There are two patches on review this series depends on (each involved patch > in this series > contains this note as well): > 1. https://patchwork.kernel.org/patch/11816689 > 2. https://patchwork.kernel.org/patch/11803383 > > Please note, that IOREQ feature is disabled by default within this series. > > *** > > Patch series [4] was rebased on recent "staging branch" > (8a62dee x86/vLAPIC: don't leak regs page from vlapic_init() upon error) and > tested on > Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend > (we will > share it later) running in driver domain and unmodified Linux Guest running > on existing > virtio-blk driver (frontend). No issues were observed. Guest domain > 'reboot/destroy' > use-cases work properly. Patch series was only build-tested on x86. > > Please note, build-test passed for the following modes: > 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default) > 2. x86:
[libvirt test] 156273: regressions - FAIL
flight 156273 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/156273/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777 build-arm64-libvirt 6 libvirt-buildfail REGR. vs. 151777 build-armhf-libvirt 6 libvirt-buildfail REGR. vs. 151777 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-pair 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-vhd 1 build-check(1) blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt 1 build-check(1) blocked n/a test-amd64-i386-libvirt-pair 1 build-check(1) blocked n/a test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-arm64-arm64-libvirt 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-qcow2 1 build-check(1) blocked n/a test-arm64-arm64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-raw 1 build-check(1) blocked n/a version targeted for testing: libvirt 1f807631f402210d036ec4803e7adfefa222f786 baseline version: libvirt 2c846fa6bcc11929c9fb857a22430fb9945654ad Last test of basis 151777 2020-07-10 04:19:19 Z 111 days Failing since151818 2020-07-11 04:18:52 Z 110 days 104 attempts Testing same since 156273 2020-10-28 04:19:15 Z1 days1 attempts People who touched revisions under test: Adolfo Jayme Barrientos Andika Triwidada Andrea Bolognani Balázs Meskó Bastien Orivel Bihong Yu Binfeng Wu Boris Fiuczynski Christian Ehrhardt Christian Schoenebeck Cole Robinson Collin Walling Cornelia Huck Côme Borsoi Daniel Henrique Barboza Daniel Letai Daniel P. Berrange Daniel P. Berrangé Erik Skultety Fabian Freyer Fangge Jin Fedora Weblate Translation Halil Pasic Han Han Hao Wang Ian Wienand Jamie Strandboge Jamie Strandboge Jean-Baptiste Holcroft Jianan Gao Jim Fehlig Jin Yan Jiri Denemark Jonathon Jongsma Ján Tomko Kashyap Chamarthy Kevin Locke Laine Stump Liao Pingfang Lin Ma Lin Ma Marc Hartmayer Marek Marczykowski-Górecki Markus Schade Martin Kletzander Masayoshi Mizuma Matt Coleman Matt Coleman Mauro Matteo Cascella Michal Privoznik Michał Smyk Milo Casagrande Neal Gompa Nico Pache Nikolay Shirokovskiy Olesya Gerasimenko Patrick Magauran Paulo de Rezende Pinatti Pavel Hrdina Peter Krempa Pino Toscano Pino Toscano Piotr Drąg Prathamesh Chavan Roman Bogorodskiy Roman Bolshakov Ryan Schmidt Sam Hartman Scott Shambarger Sebastian Mitterle Simon Gaiser Stefan Bader Stefan Berger Szymon Scholz Thomas Huth Tim Wiederhake Tomáš Golembiovský Wang Xin Weblate Yang Hang Yanqiu Zhang Yi Li Yi Wang Yuri Chornoivan Zheng Chuan zhenwei pi Zhenyu Zheng jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt fail build-arm64-libvirt fail build-armhf-libvirt fail build-i386-libvirt fail build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm blocked test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked test-amd64-amd64-libvirt-xsm blocked test-arm64-arm64-libvirt-xsm
[xen-unstable test] 156268: regressions - FAIL
flight 156268 xen-unstable real [real] flight 156289 xen-unstable real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/156268/ http://logs.test-lab.xenproject.org/osstest/logs/156289/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-armhf-armhf-xl-vhd 20 leak-check/check fail REGR. vs. 156254 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds18 guest-start/debian.repeat fail REGR. vs. 156254 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10 fail like 156254 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 156254 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156254 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156254 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156254 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156254 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156254 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156254 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156254 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 156254 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156254 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156254 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass version targeted for testing: xen 16a20963b3209788f2c0d3a3eebb7d92f03f5883 baseline version: xen 964781c6f162893677c50a779b7d562a299727ba Last test of basis 156254 2020-10-27 06:38:30 Z2 days
Re: [PATCH v1] libacpi: use temporary files for generated files
On 28.10.2020 19:13, Anthony PERARD wrote: > On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote: >> On 27.10.2020 11:57, Andrew Cooper wrote: >>> On 27/10/2020 10:37, Jan Beulich wrote: On 27.10.2020 11:27, Olaf Hering wrote: > Am Tue, 27 Oct 2020 11:16:04 +0100 > schrieb Jan Beulich : > >> This pattern is used when a rule consists of multiple commands >> having their output appended to one another's. > My understanding is: a rule is satisfied as soon as the file exists. No - once make has found that a rule's commands need running, it'll run the full set and only check again afterwards. >>> >>> It stops at the first command which fails. >>> >>> Olaf is correct, but the problem here is an incremental build issue, not >>> a parallel build issue. >>> >>> Intermediate files must not use the name of the target, or a failure and >>> re-build will use the (bogus) intermediate state rather than rebuilding it. >> >> But there's no intermediate file here - the file gets created in one >> go. Furthermore doesn't make delete the target file(s) when a rule >> fails? (One may not want to rely on this, and hence indeed keep multi- >> part rules update intermediate files of different names.) > > What if something went badly enough and sed didn't managed to write the > whole file and make never had a chance to remove the bogus file? How's this different from an object file getting only partly written and not deleted? We'd have to use the temporary file approach in literally every rule if we wanted to cater for this. Jan
Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
On 28.10.2020 22:31, Andrew Cooper wrote: > On 26/10/2020 09:40, Jan Beulich wrote: >> In the case that no 64-bit SYSCALL callback is registered, the guest >> will be crashed when 64-bit userspace executes a SYSCALL instruction, >> which would be a userspace => kernel DoS. Similarly for 32-bit >> userspace when no 32-bit SYSCALL callback was registered either. >> >> This has been the case ever since the introduction of 64bit PV support, >> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which >> yield #GP/#UD in userspace before the callback is registered, and are >> therefore safe by default. >> >> This change does constitute a change in the PV ABI, for the corner case >> of a PV guest kernel not registering a 64-bit callback (which has to be >> considered a defacto requirement of the unwritten PV ABI, considering >> there is no PV equivalent of EFER.SCE). >> >> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64 >> SYSENTER (safe by default, until explicitly enabled). >> >> Signed-off-by: Andrew Cooper >> Signed-off-by: Jan Beulich >> --- >> v3: >> * Split this change off of "x86/pv: Inject #UD for missing SYSCALL >>callbacks", to allow the uncontroversial part of that change to go >>in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching >>over just the REX prefix too much trickery even for an unlikely to be >>taken code path?) > > I find this submission confusion seeing as my v3 is already suitably > acked and ready to commit. What I haven't had yet enough free time to > do so. My objection to the other half stands, and hence, I'm afraid, stands in the way of your patch getting committed. Aiui Roger's ack doesn't invalidate my objection, sorry. >> --- a/xen/arch/x86/x86_64/entry.S >> +++ b/xen/arch/x86/x86_64/entry.S >> @@ -33,11 +33,27 @@ ENTRY(switch_to_kernel) >> cmoveq VCPU_syscall32_addr(%rbx),%rax >> testq %rax,%rax >> cmovzq VCPU_syscall_addr(%rbx),%rax >> -movq %rax,TRAPBOUNCE_eip(%rdx) >> /* TB_flags = VGCF_syscall_disables_events ? TBF_INTERRUPT : 0 */ >> btl $_VGCF_syscall_disables_events,VCPU_guest_context_flags(%rbx) >> setc %cl >> leal (,%rcx,TBF_INTERRUPT),%ecx >> + >> +test %rax, %rax >> +UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */ >> +mov VCPU_trap_ctxt(%rbx), %rdi >> +movl $X86_EXC_UD, UREGS_entry_vector(%rsp) >> +cmpw $FLAT_USER_CS32, UREGS_cs(%rsp) >> +je0f >> +rex64 # subl => subq >> +0: >> +subl $2, UREGS_rip(%rsp) > > There was deliberately not a 32bit sub here (see below). Funny you should say this when what I've taken as input (v2; I don't think I had seen a v3, or else I would have called this one v4) had "subl", not "subq". Roger's comment was regarding a "mov" with a 32-bit register destination, not this "sub". If you had also noticed and fixed this one, without you having posted v3 (or without me having seen the posting) I couldn't have known. > As to the construct, I'm having a hard time deciding whether this is an > excellent idea, or far too clever for its own good. > > Some basic perf testing shows that there is a visible difference in > execution time of these two paths, which means there is some > optimisation being missed in the frontend for the 32bit case. However, > the difference is tiny in the grand scheme of things (about 0.4% > difference in aggregate time to execute a loop of this pattern with a > fixed number of iterations.) > > However, the 32bit case isn't actually interesting here. A guest can't > execute a SYSCALL instruction on/across the 4G->0 boundary because the > M2P is mapped NX up to the 4G boundary, so we can never reach this point > with %eip < 2. > > Therefore, the 64bit-only form is the appropriate one to use, which > solves any question of cleverness, or potential decode stalls it causes. Good point. Jan
Re: call traces in xl dmesg during boot
On 29.10.2020 09:22, Olaf Hering wrote: > During boot of xen.git#staging, Xen seems to think something pressed debug > keys very early, which shows various call traces in 'xl dmesg'. A reboot may > "fix" it, and no traces are printed. I'm seeing the same every now and then, albeit only with a single 'A' so far, iirc. > Any idea what may cause this behavior? I do not see it on a slightly smaller > box. I've been assuming this is stuff left in the serial port's input latch or FIFO. So far I didn't have a good idea how to tell left over garbage from actual input that was sent very early, so I've no good idea how to work around it. A command line option triggering the discarding of initially present input doesn't look very attractive to me ... Jan
Re: call traces in xl dmesg during boot
On 29/10/2020 08:22, Olaf Hering wrote: > During boot of xen.git#staging, Xen seems to think something pressed debug > keys very early, which shows various call traces in 'xl dmesg'. A reboot may > "fix" it, and no traces are printed. > > Any idea what may cause this behavior? I do not see it on a slightly smaller > box. That means Xen is receiving real keystrokes on the UART. I've seen it before on hardware with floating wires in place of a properly connected UART, but I've also seen it on systems with an incorrectly configured BAUD rate. Looking at your command line, you probably want com1=115200,8n1 although this should also be the default. Totally unrelated to the problem at hand, but an observation. (XEN) [00d6b68c0ecc] WARNING: NR_CPUS limit of 256 reached - ignoring further processors You'll need NR_CPUS configured to 512 to boot properly on this system. ~Andrew signature.asc Description: OpenPGP digital signature
Re: call traces in xl dmesg during boot
Am Thu, 29 Oct 2020 09:13:08 + schrieb Andrew Cooper : > You'll need NR_CPUS configured to 512 to boot properly on this system. Ah, thanks. My builds use the built-in defaults. I will adjust my future builds. Olaf pgpr5K1K_ZLum.pgp Description: Digitale Signatur von OpenPGP
Re: call traces in xl dmesg during boot
Am Thu, 29 Oct 2020 10:10:51 +0100 schrieb Jan Beulich : > I've been assuming this is stuff left in the serial port's input > latch or FIFO. Weird, there is no `ipmitool` attached to this host. Maybe the firmware does funny things... Olaf pgpQfM2XzJkIH.pgp Description: Digitale Signatur von OpenPGP
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On Thu, Oct 29, 2020 at 8:04 AM Paolo Bonzini wrote: > > On 28/10/20 22:20, Arnd Bergmann wrote: > > Avoid this by renaming the global 'apic' variable to the more descriptive > > 'x86_system_apic'. It was originally called 'genapic', but both that > > and the current 'apic' seem to be a little overly generic for a global > > variable. > > The 'apic' affects only the current CPU, so one of 'x86_local_apic', > 'x86_lapic' or 'x86_apic' is probably preferrable. Ok, I'll change it to x86_local_apic then, unless someone else has a preference between them. > I don't have huge objections to renaming 'apic' variables and arguments > in KVM to 'lapic'. I do agree with Sean however that it's going to > break again very soon. I think ideally there would be no global variable, withall accesses encapsulated in function calls, possibly using static_call() optimizations if any of them are performance critical. It doesn't seem hard to do, but I'd rather leave that change to an x86 person ;-) Arnd
Re: [PATCH v2 3/3] xen/arm: Warn user on cpu errata 832075
Hi Julien, > On 28 Oct 2020, at 18:39, Julien Grall wrote: > > Hi Bertrand, > > On 26/10/2020 16:21, Bertrand Marquis wrote: >> When a Cortex A57 processor is affected by CPU errata 832075, a guest >> not implementing the workaround for it could deadlock the system. >> Add a warning during boot informing the user that only trusted guests >> should be executed on the system. >> An equivalent warning is already given to the user by KVM on cores >> affected by this errata. >> Also taint the hypervisor as unsecure when this errata applies and >> mention Cortex A57 r0p0 - r1p2 as not security supported in SUPPORT.md >> Signed-off-by: Bertrand Marquis > > Reviewed-by: Julien Grall Thanks > > If you don't need to resend the series, then I would be happy to fix the typo > pointed out by George on commit. There is only the condensing from Stefano. If you can handle that on commit to great but if you need me to send a v3 to make your life easier do not hesitate to tell me Cheers Bertrand > > Cheers, > > -- > Julien Grall >
Re: [PATCH] meson.build: fix building of Xen support for aarch64
Stefano Stabellini writes: > On Wed, 28 Oct 2020, Alex Bennée wrote: >> Xen is supported on aarch64 although weirdly using the i386-softmmu >> model. Checking based on the host CPU meant we never enabled Xen >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to >> make it not seem weird but that will require further build surgery. >> >> Signed-off-by: Alex Bennée >> Cc: Masami Hiramatsu >> Cc: Stefano Stabellini >> Cc: Anthony Perard >> Cc: Paul Durrant >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson") >> --- >> meson.build | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/meson.build b/meson.build >> index 835424999d..f1fcbfed4c 100644 >> --- a/meson.build >> +++ b/meson.build >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64'] >> 'CONFIG_HVF': ['x86_64-softmmu'], >> 'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'], >>} >> +elif cpu in [ 'arm', 'aarch64' ] >> + accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] } >> endif > > This looks very reasonable -- the patch makes sense. > > > However I have two questions, mostly for my own understanding. I tried > to repro the aarch64 build problem but it works at my end, even without > this patch. Building on a x86 host or with cross compiler? > I wonder why. I suspect it works thanks to these lines in > meson.build: > > if not get_option('xen').disabled() and 'CONFIG_XEN_BACKEND' in config_host > accelerators += 'CONFIG_XEN' > have_xen_pci_passthrough = not > get_option('xen_pci_passthrough').disabled() and targetos == 'linux' > else > have_xen_pci_passthrough = false > endif > > But I am not entirely sure who is adding CONFIG_XEN_BACKEND to > config_host. The is part of the top level configure check - which basically checks for --enable-xen or autodetects the presence of the userspace libraries. I'm not sure if we have a slight over proliferation of symbols for Xen support (although I'm about to add more). > The other question is: does it make sense to print the value of > CONFIG_XEN as part of the summary? Something like: > > diff --git a/meson.build b/meson.build > index 47e32e1fcb..c6e7832dc9 100644 > --- a/meson.build > +++ b/meson.build > @@ -2070,6 +2070,7 @@ summary_info += {'KVM support': > config_all.has_key('CONFIG_KVM')} > summary_info += {'HAX support': config_all.has_key('CONFIG_HAX')} > summary_info += {'HVF support': config_all.has_key('CONFIG_HVF')} > summary_info += {'WHPX support': config_all.has_key('CONFIG_WHPX')} > +summary_info += {'XEN support': config_all.has_key('CONFIG_XEN')} > summary_info += {'TCG support': config_all.has_key('CONFIG_TCG')} > if config_all.has_key('CONFIG_TCG') >summary_info += {'TCG debug enabled': > config_host.has_key('CONFIG_DEBUG_TCG')} > > > But I realize there is already: > > summary_info += {'xen support': > config_host.has_key('CONFIG_XEN_BACKEND')} > > so it would be a bit of a duplicate Hmm so what we have is: CONFIG_XEN_BACKEND - ensures that appropriate compiler flags are added - pegs RAM_ADDR_MAX at UINT64_MAX (instead of UINTPTR_MAX) CONFIG_XEN - which controls a bunch of build objects, some of which may be i386 only? ./accel/meson.build:15:specific_ss.add_all(when: ['CONFIG_XEN'], if_true: dummy_ss) ./accel/stubs/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', if_false: files('xen-stub.c')) ./accel/xen/meson.build:1:specific_ss.add(when: 'CONFIG_XEN', if_true: files('xen-all.c')) ./hw/9pfs/meson.build:17:fs_ss.add(when: 'CONFIG_XEN', if_true: files('xen-9p-backend.c')) ./hw/block/dataplane/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) ./hw/block/meson.build:14:softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) ./hw/char/meson.build:23:softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen_console.c')) ./hw/display/meson.build:18:softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xenfb.c')) ./hw/i386/xen/meson.build:1:i386_ss.add(when: 'CONFIG_XEN', if_true: files('xen-hvm.c', 'xen-mapcache.c', 'xen_apic.c', 'xen_platform.c', 'xen_pvdevice.c') ./hw/net/meson.build:2:softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen_nic.c')) ./hw/usb/meson.build:76:softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', libusb], if_true: files('xen-usb.c')) ./hw/xen/meson.build:1:softmmu_ss.add(when: ['CONFIG_XEN', xen], if_true: files( ./hw/xen/meson.build:20:specific_ss.add_all(when: ['CONFIG_XEN', xen], if_true: xen_specific_ss) ./hw/xenpv/meson.build:3:xenpv_ss.add(when: 'CONFIG_XEN', if_true: files('xen_machine_p
Re: [PATCH v1] libacpi: use temporary files for generated files
On Thu, Oct 29, 2020 at 09:47:09AM +0100, Jan Beulich wrote: > On 28.10.2020 19:13, Anthony PERARD wrote: > > On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote: > >> On 27.10.2020 11:57, Andrew Cooper wrote: > >>> On 27/10/2020 10:37, Jan Beulich wrote: > On 27.10.2020 11:27, Olaf Hering wrote: > > Am Tue, 27 Oct 2020 11:16:04 +0100 > > schrieb Jan Beulich : > > > >> This pattern is used when a rule consists of multiple commands > >> having their output appended to one another's. > > My understanding is: a rule is satisfied as soon as the file exists. > No - once make has found that a rule's commands need running, it'll > run the full set and only check again afterwards. > >>> > >>> It stops at the first command which fails. > >>> > >>> Olaf is correct, but the problem here is an incremental build issue, not > >>> a parallel build issue. > >>> > >>> Intermediate files must not use the name of the target, or a failure and > >>> re-build will use the (bogus) intermediate state rather than rebuilding > >>> it. > >> > >> But there's no intermediate file here - the file gets created in one > >> go. Furthermore doesn't make delete the target file(s) when a rule > >> fails? (One may not want to rely on this, and hence indeed keep multi- > >> part rules update intermediate files of different names.) > > > > What if something went badly enough and sed didn't managed to write the > > whole file and make never had a chance to remove the bogus file? > > How's this different from an object file getting only partly written > and not deleted? We'd have to use the temporary file approach in > literally every rule if we wanted to cater for this. I though that things like `gcc' would write the final object to a temporary place then rename it to the final destination, but that doesn't seems to be the case. I tried to see what happens if the `sed' command fails, and the target is created, empty, and doesn't gets deleted by make. So an incremental build uses a broken file without trying to rebuild it. If we want `make' to delete target when a rule fails, I think we need to add '.DELETE_ON_ERROR:' somewhere. Or avoid creating files before the command is likely to succeed. Cheers, -- Anthony PERARD
Re: [PATCH v1] libacpi: use temporary files for generated files
Am Thu, 29 Oct 2020 10:57:15 + schrieb Anthony PERARD : > we need to add '.DELETE_ON_ERROR:' The last paragraph at https://www.gnu.org/software/make/manual/html_node/Errors.html#Errors suggests that this is a good thing to have. Olaf pgp60XmNfzvTX.pgp Description: Digitale Signatur von OpenPGP
[ovmf test] 156270: all pass - PUSHED
flight 156270 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/156270/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 3b87d728742fe58f427f4b775b11250e29d75cc6 baseline version: ovmf eb520b93d279e901a593c57e30649fb08f4290c5 Last test of basis 156255 2020-10-27 09:04:36 Z2 days Testing same since 156270 2020-10-28 03:11:01 Z1 days1 attempts People who touched revisions under test: Abner Chang jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git eb520b93d2..3b87d72874 3b87d728742fe58f427f4b775b11250e29d75cc6 -> xen-tested-master
Re: [PATCH v1] libacpi: use temporary files for generated files
On 29.10.2020 11:57, Anthony PERARD wrote: > On Thu, Oct 29, 2020 at 09:47:09AM +0100, Jan Beulich wrote: >> On 28.10.2020 19:13, Anthony PERARD wrote: >>> On Tue, Oct 27, 2020 at 12:06:56PM +0100, Jan Beulich wrote: On 27.10.2020 11:57, Andrew Cooper wrote: > On 27/10/2020 10:37, Jan Beulich wrote: >> On 27.10.2020 11:27, Olaf Hering wrote: >>> Am Tue, 27 Oct 2020 11:16:04 +0100 >>> schrieb Jan Beulich : >>> This pattern is used when a rule consists of multiple commands having their output appended to one another's. >>> My understanding is: a rule is satisfied as soon as the file exists. >> No - once make has found that a rule's commands need running, it'll >> run the full set and only check again afterwards. > > It stops at the first command which fails. > > Olaf is correct, but the problem here is an incremental build issue, not > a parallel build issue. > > Intermediate files must not use the name of the target, or a failure and > re-build will use the (bogus) intermediate state rather than rebuilding > it. But there's no intermediate file here - the file gets created in one go. Furthermore doesn't make delete the target file(s) when a rule fails? (One may not want to rely on this, and hence indeed keep multi- part rules update intermediate files of different names.) >>> >>> What if something went badly enough and sed didn't managed to write the >>> whole file and make never had a chance to remove the bogus file? >> >> How's this different from an object file getting only partly written >> and not deleted? We'd have to use the temporary file approach in >> literally every rule if we wanted to cater for this. > > I though that things like `gcc' would write the final object to a > temporary place then rename it to the final destination, but that > doesn't seems to be the case. > > I tried to see what happens if the `sed' command fails, and the target is > created, empty, and doesn't gets deleted by make. So an incremental > build uses a broken file without trying to rebuild it. IOW it's rather a courtesy of the compiler / assembler / linker to delete their output files on error. > If we want `make' to delete target when a rule fails, I think we need to > add '.DELETE_ON_ERROR:' somewhere. Ah, indeed. I thought this was the default nowadays, but the doc says it isn't. I think this would be preferable over touching individual rules to go through temporary files. Jan
Re: [PATCH v3] x86/pv: inject #UD for entirely missing SYSCALL callbacks
On 29.10.2020 09:53, Jan Beulich wrote: > On 28.10.2020 22:31, Andrew Cooper wrote: >> On 26/10/2020 09:40, Jan Beulich wrote: >>> In the case that no 64-bit SYSCALL callback is registered, the guest >>> will be crashed when 64-bit userspace executes a SYSCALL instruction, >>> which would be a userspace => kernel DoS. Similarly for 32-bit >>> userspace when no 32-bit SYSCALL callback was registered either. >>> >>> This has been the case ever since the introduction of 64bit PV support, >>> but behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which >>> yield #GP/#UD in userspace before the callback is registered, and are >>> therefore safe by default. >>> >>> This change does constitute a change in the PV ABI, for the corner case >>> of a PV guest kernel not registering a 64-bit callback (which has to be >>> considered a defacto requirement of the unwritten PV ABI, considering >>> there is no PV equivalent of EFER.SCE). >>> >>> It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64 >>> SYSENTER (safe by default, until explicitly enabled). >>> >>> Signed-off-by: Andrew Cooper >>> Signed-off-by: Jan Beulich >>> --- >>> v3: >>> * Split this change off of "x86/pv: Inject #UD for missing SYSCALL >>>callbacks", to allow the uncontroversial part of that change to go >>>in. Add conditional "rex64" for UREGS_rip adjustment. (Is branching >>>over just the REX prefix too much trickery even for an unlikely to be >>>taken code path?) >> >> I find this submission confusion seeing as my v3 is already suitably >> acked and ready to commit. What I haven't had yet enough free time to >> do so. > > My objection to the other half stands, and hence, I'm afraid, stands > in the way of your patch getting committed. Aiui Roger's ack doesn't > invalidate my objection, sorry. Actually I realize now that earlier I said I'm okay with this going in if Roger acks it. I take it that Roger was aware of the controversy when providing the ack. Therefore I'd like to take back what I've said above, and your supposed v3 ought to be okay to get committed. As to backporting - I'd surely be taking the split off part here, but I have to admit I'm hesitant to take the full change. IOW splitting the two changes might still be a helpful thing. Jan
Re: [PATCH] meson.build: fix building of Xen support for aarch64
On Thu, Oct 29, 2020 at 6:01 AM Alex Bennée wrote: > > > Stefano Stabellini writes: > > > On Wed, 28 Oct 2020, Alex Bennée wrote: > >> Xen is supported on aarch64 although weirdly using the i386-softmmu > >> model. Checking based on the host CPU meant we never enabled Xen > >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to > >> make it not seem weird but that will require further build surgery. > >> > >> Signed-off-by: Alex Bennée > >> Cc: Masami Hiramatsu > >> Cc: Stefano Stabellini > >> Cc: Anthony Perard > >> Cc: Paul Durrant > >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson") > >> --- > >> meson.build | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/meson.build b/meson.build > >> index 835424999d..f1fcbfed4c 100644 > >> --- a/meson.build > >> +++ b/meson.build > >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64'] > >> 'CONFIG_HVF': ['x86_64-softmmu'], > >> 'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'], > >>} > >> +elif cpu in [ 'arm', 'aarch64' ] > >> + accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] } > >> endif > > > > This looks very reasonable -- the patch makes sense. A comment would be useful to explain that Arm needs i386-softmmu for the xenpv machine. > > > > However I have two questions, mostly for my own understanding. I tried > > to repro the aarch64 build problem but it works at my end, even without > > this patch. > > Building on a x86 host or with cross compiler? > > > I wonder why. I suspect it works thanks to these lines in > > meson.build: I think it's a runtime and not a build problem. In osstest, Xen support was detected and configured, but CONFIG_XEN wasn't set for Arm. So at runtime xen_available() returns 0, and QEMU doesn't start with "qemu-system-i386: -xen-domid 1: Option not supported for this target" I posted my investigation here: https://lore.kernel.org/xen-devel/cakf6xpss8kpgovzrkitpz63bhbvbjxrtywdhekzuo2q1kem...@mail.gmail.com/ Regards, Jason
Ping: [PATCH v3 0/3] x86: shim building adjustments (plus shadow follow-on)
Tim, On 19.10.2020 10:42, Jan Beulich wrote: > The change addressing the shadow related build issue noticed by > Andrew went in already. The build breakage goes beyond this > specific combination though - PV_SHIM_EXCLUSIVE plus HVM is > similarly an issue. This is what the 1st patch tries to take care > of, in a shape already on irc noticed to be controversial. I'm > submitting the change nevertheless because for the moment there > looks to be a majority in favor of going this route. One argument > not voiced there yet: What good does it do to allow a user to > enable HVM when then on the resulting hypervisor they still can't > run HVM guests (for the hypervisor still being a dedicated PV > shim one). On top of this, the alternative approach is likely > going to get ugly. > > The shadow related adjustments are here merely because the want > to make them was noticed in the context of the patch which has > already gone in. > > 1: don't permit HVM and PV_SHIM_EXCLUSIVE at the same time > 2: refactor shadow_vram_{get,put}_l1e() > 3: sh_{make,destroy}_monitor_table() are "even more" HVM-only unless you tell me otherwise I'm intending to commit the latter two with Roger's acks some time next week. Of course it would still be nice to know your view on the first of the TBDs in patch 3 (which may result in further changes, in a follow-up). Jan
[linux-linus test] 156269: regressions - FAIL
flight 156269 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/156269/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-xsm7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-ws16-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-qemut-rhel6hvm-intel 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemut-debianhvm-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. vs. 152332 test-amd64-i386-qemuu-rhel6hvm-intel 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-coresched-i386-xl 7 xen-install fail REGR. vs. 152332 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-installfail REGR. vs. 152332 test-amd64-i386-xl-qemut-ws16-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-pair 10 xen-install/src_host fail REGR. vs. 152332 test-amd64-i386-pair 11 xen-install/dst_host fail REGR. vs. 152332 test-amd64-i386-libvirt-xsm 7 xen-install fail REGR. vs. 152332 test-amd64-i386-libvirt 7 xen-install fail REGR. vs. 152332 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-installfail REGR. vs. 152332 test-amd64-i386-xl7 xen-install fail REGR. vs. 152332 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332 test-amd64-i386-examine 6 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-raw7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-pvshim 7 xen-install fail REGR. vs. 152332 test-amd64-i386-freebsd10-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-shadow 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332 test-amd64-i386-freebsd10-i386 7 xen-installfail REGR. vs. 152332 test-amd64-i386-xl-qemuu-win7-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemut-win7-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-install fail REGR. vs. 152332 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 152332 test-arm64-arm64-xl-credit1 8 xen-boot fail REGR. vs. 152332 test-arm64-arm64-xl-xsm 8 xen-boot fail REGR. vs. 152332 test-arm64-arm64-libvirt-xsm 8 xen-boot fail REGR. vs. 152332 test-arm64-arm64-examine 8 reboot fail REGR. vs. 152332 test-amd64-amd64-amd64-pvgrub 20 guest-stop fail REGR. vs. 152332 test-amd64-amd64-i386-pvgrub 19 guest-localmigrate/x10 fail REGR. vs. 152332 test-armhf-armhf-xl-credit1 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-libvirt 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-examine 8 reboot fail REGR. vs. 152332 test-armhf-armhf-xl-cubietruck 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-xl-multivcpu 8 xen-bootfail REGR. vs. 152332 test-armhf-armhf-libvirt-raw 8 xen-boot fail REGR. vs. 152332 test-arm64-arm64-xl-seattle 8 xen-boot fail REGR. vs. 152332 test-arm64-arm64-xl-credit2 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-xl 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-xl-credit2 8 xen-boot fail REGR. vs. 152332 test-armhf-armhf-xl-vhd 8 xen-boot fail REGR. vs. 152332 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 8 xen-boot fail REGR. vs. 152332 Tests which did not succeed, but are not blocking: test-arm64-arm64-xl 11 leak-check/basis(11)fail blocked in 152332 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 152332 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152332 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 152332 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152332 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152332 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm
Ping: [PATCH 2/2] tools/libs: fix uninstall rule for header files
On 19.10.2020 09:21, Jan Beulich wrote: > This again was working right only as long as $(LIBHEADER) consisted of > just one entry. > > Signed-off-by: Jan Beulich While patch 1 has become irrelevant with Juergen's more complete change, this one is still applicable afaict. Jan > --- > An alternative would be to use $(addprefix ) without any shell loop. > > --- a/tools/libs/libs.mk > +++ b/tools/libs/libs.mk > @@ -107,7 +107,7 @@ install: build > .PHONY: uninstall > uninstall: > rm -f $(DESTDIR)$(PKG_INSTALLDIR)/$(LIB_FILE_NAME).pc > - for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$(LIBHEADER); > done > + for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$$i; done > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR) > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) > >
Ping: [PATCH] tools/python: pass more -rpath-link options to ld
On 19.10.2020 10:31, Jan Beulich wrote: > With the split of libraries, I've observed a number of warnings from > (old?) ld. > > Signed-off-by: Jan Beulich Marek? > --- > It's unclear to me whether this is ld version dependent - the pattern > of where I've seen such warnings doesn't suggest a clear version > dependency. > > --- a/tools/python/setup.py > +++ b/tools/python/setup.py > @@ -7,10 +7,15 @@ XEN_ROOT = "../.." > extra_compile_args = [ "-fno-strict-aliasing", "-Werror" ] > > PATH_XEN = XEN_ROOT + "/tools/include" > +PATH_LIBXENTOOLCORE = XEN_ROOT + "/tools/libs/toolcore" > PATH_LIBXENTOOLLOG = XEN_ROOT + "/tools/libs/toollog" > +PATH_LIBXENCALL = XEN_ROOT + "/tools/libs/call" > PATH_LIBXENEVTCHN = XEN_ROOT + "/tools/libs/evtchn" > +PATH_LIBXENGNTTAB = XEN_ROOT + "/tools/libs/gnttab" > PATH_LIBXENCTRL = XEN_ROOT + "/tools/libs/ctrl" > PATH_LIBXENGUEST = XEN_ROOT + "/tools/libs/guest" > +PATH_LIBXENDEVICEMODEL = XEN_ROOT + "/tools/libs/devicemodel" > +PATH_LIBXENFOREIGNMEMORY = XEN_ROOT + "/tools/libs/foreignmemory" > PATH_XENSTORE = XEN_ROOT + "/tools/libs/store" > > xc = Extension("xc", > @@ -24,7 +29,13 @@ xc = Extension("xc", > library_dirs = [ PATH_LIBXENCTRL, PATH_LIBXENGUEST ], > libraries = [ "xenctrl", "xenguest" ], > depends= [ PATH_LIBXENCTRL + "/libxenctrl.so", > PATH_LIBXENGUEST + "/libxenguest.so" ], > - extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENTOOLLOG > ], > + extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENCALL, > + > "-Wl,-rpath-link="+PATH_LIBXENDEVICEMODEL, > + "-Wl,-rpath-link="+PATH_LIBXENEVTCHN, > + > "-Wl,-rpath-link="+PATH_LIBXENFOREIGNMEMORY, > + "-Wl,-rpath-link="+PATH_LIBXENGNTTAB, > + "-Wl,-rpath-link="+PATH_LIBXENTOOLCORE, > + "-Wl,-rpath-link="+PATH_LIBXENTOOLLOG > ], > sources= [ "xen/lowlevel/xc/xc.c" ]) > > xs = Extension("xs", > @@ -33,6 +44,7 @@ xs = Extension("xs", > library_dirs = [ PATH_XENSTORE ], > libraries = [ "xenstore" ], > depends= [ PATH_XENSTORE + "/libxenstore.so" ], > + extra_link_args= [ "-Wl,-rpath-link="+PATH_LIBXENTOOLCORE > ], > sources= [ "xen/lowlevel/xs/xs.c" ]) > > plat = os.uname()[0] >
Ping²: [PATCH] x86/PV: make post-migration page state consistent
On 11.09.2020 12:34, Jan Beulich wrote: > When a page table page gets de-validated, its type reference count drops > to zero (and PGT_validated gets cleared), but its type remains intact. > XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for > such pages. An intermediate write to such a page via e.g. > MMU_NORMAL_PT_UPDATE, however, would transition the page's type to > PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would > return. In libxc the decision which pages to normalize / localize > depends solely on the type returned from the domctl. As a result without > further precautions the guest won't be able to tell whether such a page > has had its (apparent) PTE entries transitioned to the new MFNs. > > Add a check of PGT_validated, thus consistently avoiding normalization / > localization in the tool stack. > > Alongside using XEN_DOMCTL_PFINFO_NOTAB instead of plain zero for the > change at hand, also change the variable's initializer to use this > constant, too. Take the opportunity and also adjust its type. > > Signed-off-by: Jan Beulich I think I did address all questions here. Jan > --- a/xen/arch/x86/domctl.c > +++ b/xen/arch/x86/domctl.c > @@ -215,7 +215,8 @@ long arch_do_domctl( > > for ( i = 0; i < num; ++i ) > { > -unsigned long gfn = 0, type = 0; > +unsigned long gfn = 0; > +unsigned int type = XEN_DOMCTL_PFINFO_NOTAB; > struct page_info *page; > p2m_type_t t; > > @@ -255,6 +256,8 @@ long arch_do_domctl( > > if ( page->u.inuse.type_info & PGT_pinned ) > type |= XEN_DOMCTL_PFINFO_LPINTAB; > +else if ( !(page->u.inuse.type_info & PGT_validated) ) > +type = XEN_DOMCTL_PFINFO_NOTAB; > > if ( page->count_info & PGC_broken ) > type = XEN_DOMCTL_PFINFO_BROKEN; >
[PATCH] x86/pv: Drop stale comment in dom0_construct_pv()
This comment has been around since c/s 1372bca0615 in 2004. It is stale, as it predates the introduction of struct vcpu. It is not obvious that it was even correct at the time. Where a vcpu (domain at the time) has been configured to run is unrelated to construct the domain's initial pagetables, etc. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Roger Pau Monné CC: Wei Liu Almost... I'm not entirely sure NUMA memory allocation is plumbed through correctly, but even that still has nothing to do with v->processor --- xen/arch/x86/pv/dom0_build.c | 1 - 1 file changed, 1 deletion(-) diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index d79503d6a9..f7165309a2 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -616,7 +616,6 @@ int __init dom0_construct_pv(struct domain *d, v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS; } -/* WARNING: The new domain must have its 'processor' field filled in! */ if ( !is_pv_32bit_domain(d) ) { maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table; -- 2.11.0
Re: [PATCH] xen: add support for automatic debug key actions in case of crash
On 22.10.2020 16:39, Juergen Gross wrote: > When the host crashes it would sometimes be nice to have additional > debug data available which could be produced via debug keys, but > halting the server for manual intervention might be impossible due to > the need to reboot/kexec rather sooner than later. > > Add support for automatic debug key actions in case of crashes which > can be activated via boot- or runtime-parameter. While I can certainly see this possibly being a useful thing in certain situations, I'm uncertain it's going to be helpful in at least a fair set of cases. What output to request very much depends on the nature of the problem one is running into, and the more keys one adds "just in case", the longer the reboot latency, and the higher the risk (see also below) of the output generation actually causing further problems. IOW I'm neither fully convinced that we want this, nor fully opposed. > Depending on the type of crash the desired data might be different, so > support different settings for the possible types of crashes. > > The parameter is "crash-debug" with the following syntax: > > crash-debug-= > > with being one of: > > panic, hwdom, watchdog, kexeccmd, debugkey > > and a sequence of debug key characters with '.' having the > special semantics of a 1 second pause. 1 second is a whole lot of time. To get two successive sets of data, a much shorter delay (if any) would normally suffice. Also, while '.' may seem like a good choice right now, with the shortage of characters we may want to put a real handler behind it at come point. The one character that clearly won't make much sense to use in this context is 'h', but that's awful as a (kind of) separator. Could we perhaps replace 'h' by '?', freeing up 'h' and allowing '?' to be used for this purpose here? > --- a/docs/misc/xen-command-line.pandoc > +++ b/docs/misc/xen-command-line.pandoc > @@ -574,6 +574,29 @@ reduction of features at Xen's disposal to manage guests. > ### cpuinfo (x86) > > `= ` > > +### crash-debug-debugkey > +### crash-debug-hwdom > +### crash-debug-kexeccmd > +### crash-debug-panic > +### crash-debug-watchdog > +> `= ` > + > +> Can be modified at runtime > + > +Specify debug-key actions in cases of crashes. Each of the parameters applies > +to a different crash reason. The `` is a sequence of debug key > +characters, with `.` having the special meaning of a 1 second pause. > + > +So e.g. `crash-debug-watchdog=0.0r` would dump dom0 state twice with a > +second between the two state dumps, followed by the run queues of the > +hypervisor, if the system crashes due to a watchdog timeout. > + > +These parameters should be used carefully, as e.g. specifying > +`crash-debug-debugkey=C` would result in an endless loop. Depending on the > +reason of the system crash it might happen that triggering some debug key > +action will result in a hang instead of dumping data and then doing a > +reboot or crash dump. I think it would be useful if the flavors were (briefly) explained: At the very least "debugkey" doesn't directly fit "in cases of crashes", as there's no crash involved. kexec_crash() instead gets invoked without there having been any crash. You may also want to point out that this is a best effort thing only - system state at the point of a crash may be such that the attempt of handling one or the debug keys would have further bad effects on the system, including that the actual kexec may then never occur. > @@ -507,6 +509,41 @@ void __init initialize_keytable(void) > } > } > > +#define CRASHACTION_SIZE 32 > +static char crash_debug_panic[CRASHACTION_SIZE]; > +static char crash_debug_hwdom[CRASHACTION_SIZE]; > +static char crash_debug_watchdog[CRASHACTION_SIZE]; > +static char crash_debug_kexeccmd[CRASHACTION_SIZE]; > +static char crash_debug_debugkey[CRASHACTION_SIZE]; > + > +static char *crash_action[CRASHREASON_N] = { > +[CRASHREASON_PANIC] = crash_debug_panic, > +[CRASHREASON_HWDOM] = crash_debug_hwdom, > +[CRASHREASON_WATCHDOG] = crash_debug_watchdog, > +[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd, > +[CRASHREASON_DEBUGKEY] = crash_debug_debugkey, > +}; > + > +string_runtime_param("crash-debug-panic", crash_debug_panic); > +string_runtime_param("crash-debug-hwdom", crash_debug_hwdom); > +string_runtime_param("crash-debug-watchdog", crash_debug_watchdog); > +string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd); > +string_runtime_param("crash-debug-debugkey", crash_debug_debugkey); In general I'm not in favor of this (or similar) needing five new command line options, instead of just one. The problem with e.g. crash-debug=panic:rq,watchdog:0d is that ',' (or any other separator chosen) could in principle also be a debug key. It would still be possible to use crash-debug=panic:rq crash-debug=watchdog:0d though. Thoughts? > +void keyhandler_crash_action(enum crash_reason reason) > +{ > +const char *action = crash_action[reason]; > +
Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver
Hi Julien, > On 28 Oct 2020, at 19:12, Julien Grall wrote: > > > > On 26/10/2020 11:03, Rahul Singh wrote: >> Hello Julien, > > Hi Rahul, > >>> On 23 Oct 2020, at 4:19 pm, Julien Grall wrote: >>> >>> >>> >>> On 23/10/2020 15:27, Rahul Singh wrote: Hello Julien, > On 23 Oct 2020, at 2:00 pm, Julien Grall wrote: > > > > On 23/10/2020 12:35, Rahul Singh wrote: >> Hello, >>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini >>> wrote: >>> >>> On Thu, 22 Oct 2020, Julien Grall wrote: >> On 20/10/2020 16:25, Rahul Singh wrote: >>> Add support for ARM architected SMMUv3 implementations. It is based >>> on >>> the Linux SMMUv3 driver. >>> Major differences between the Linux driver are as follows: >>> 1. Only Stage-2 translation is supported as compared to the Linux >>> driver >>>that supports both Stage-1 and Stage-2 translations. >>> 2. Use P2M page table instead of creating one as SMMUv3 has the >>>capability to share the page tables with the CPU. >>> 3. Tasklets is used in place of threaded IRQ's in Linux for event >>> queue >>>and priority queue IRQ handling. >> >> Tasklets are not a replacement for threaded IRQ. In particular, they >> will >> have priority over anything else (IOW nothing will run on the pCPU >> until >> they are done). >> >> Do you know why Linux is using thread. Is it because of long running >> operations? > > Yes you are right because of long running operations Linux is using > the > threaded IRQs. > > SMMUv3 reports fault/events bases on memory-based circular buffer > queues not > based on the register. As per my understanding, it is time-consuming > to > process the memory based queues in interrupt context because of that > Linux > is using threaded IRQ to process the faults/events from SMMU. > > I didn’t find any other solution in XEN in place of tasklet to defer > the > work, that’s why I used tasklet in XEN in replacement of threaded > IRQs. If > we do all work in interrupt context we will make XEN less responsive. So we need to make sure that Xen continue to receives interrupts, but we also need to make sure that a vCPU bound to the pCPU is also responsive. > > If you know another solution in XEN that will be used to defer the > work in > the interrupt please let me know I will try to use that. One of my work colleague encountered a similar problem recently. He had a long running tasklet and wanted to be broken down in smaller chunk. We decided to use a timer to reschedule the taslket in the future. This allows the scheduler to run other loads (e.g. vCPU) for some time. This is pretty hackish but I couldn't find a better solution as tasklet have high priority. Maybe the other will have a better idea. >>> >>> Julien's suggestion is a good one. >>> >>> But I think tasklets can be configured to be called from the idle_loop, >>> in which case they are not run in interrupt context? >>> >> Yes you are right tasklet will be scheduled from the idle_loop that is >> not interrupt conext. > > This depends on your tasklet. Some will run from the softirq context > which is usually (for Arm) on the return of an exception. > Thanks for the info. I will check and will get better understanding of the tasklet how it will run in XEN. >>> >>> 4. Latest version of the Linux SMMUv3 code implements the commands >>> queue >>>access functions based on atomic operations implemented in Linux. >> >> Can you provide more details? > > I tried to port the latest version of the SMMUv3 code than I observed > that > in order to port that code I have to also port atomic operation > implemented > in Linux to XEN. As latest Linux code uses atomic operation to > process the > command queues > (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() , > atomic_fetch_andnot_relaxed()) . Thank you for the explanation. I think it would be best to import the atomic helpers and use the latest code. This will ensure that we don't re-introduce bugs and also buy us some time before the Linux and Xen driver diverge again too much. Stefano, what do you think? >>> >>> I think you are right. >> Yes, I agree with yo
Re: [PATCH] x86/pv: Drop stale comment in dom0_construct_pv()
On 29.10.2020 15:00, Andrew Cooper wrote: > This comment has been around since c/s 1372bca0615 in 2004. It is stale, as > it predates the introduction of struct vcpu. That commit only moved it around; it's 22a857bde9b8 afaics from early 2003 where it first appeared, where it had a reason: /* * WARNING: The new domain must have its 'processor' field * filled in by now !! */ phys_l2tab = ALLOC_FRAME_FROM_DOMAIN(); l2tab = map_domain_mem(phys_l2tab); memcpy(l2tab, idle_pg_table[p->processor], PAGE_SIZE); But yes, the comment has been stale for a long time, and I've been wondering a number of times what it was supposed to tell me. (I think it was already stale at the point the comment first got altered, in 3072fef54df8.) > It is not obvious that it was even correct at the time. Where a vcpu (domain > at the time) has been configured to run is unrelated to construct the domain's > initial pagetables, etc. > > Signed-off-by: Andrew Cooper Acked-by: Jan Beulich Jan
Re: [PATCH] xen: add support for automatic debug key actions in case of crash
On 29.10.20 15:22, Jan Beulich wrote: On 22.10.2020 16:39, Juergen Gross wrote: When the host crashes it would sometimes be nice to have additional debug data available which could be produced via debug keys, but halting the server for manual intervention might be impossible due to the need to reboot/kexec rather sooner than later. Add support for automatic debug key actions in case of crashes which can be activated via boot- or runtime-parameter. While I can certainly see this possibly being a useful thing in certain situations, I'm uncertain it's going to be helpful in at least a fair set of cases. What output to request very much depends on the nature of the problem one is running into, and the more keys one adds "just in case", the longer the reboot latency, and the higher the risk (see also below) of the output generation actually causing further problems. The obvious case is a watchdog induced crash: at least 2 sets of dom0 state will help in many cases. IOW I'm neither fully convinced that we want this, nor fully opposed. Depending on the type of crash the desired data might be different, so support different settings for the possible types of crashes. The parameter is "crash-debug" with the following syntax: crash-debug-= with being one of: panic, hwdom, watchdog, kexeccmd, debugkey and a sequence of debug key characters with '.' having the special semantics of a 1 second pause. 1 second is a whole lot of time. To get two successive sets of data, a much shorter delay (if any) would normally suffice. Yes, I'd be fine to trade that for a shorter period of time. Also, while '.' may seem like a good choice right now, with the shortage of characters we may want to put a real handler behind it at come point. The one character that clearly won't make much sense to use in this context is 'h', but that's awful as a (kind of) separator. Could we perhaps replace 'h' by '?', freeing up 'h' and allowing '?' to be used for this purpose here? Fine with me. Another possibility would be to add '\' as an escape character with '\.' meaning "debug-key .". --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -574,6 +574,29 @@ reduction of features at Xen's disposal to manage guests. ### cpuinfo (x86) > `= ` +### crash-debug-debugkey +### crash-debug-hwdom +### crash-debug-kexeccmd +### crash-debug-panic +### crash-debug-watchdog +> `= ` + +> Can be modified at runtime + +Specify debug-key actions in cases of crashes. Each of the parameters applies +to a different crash reason. The `` is a sequence of debug key +characters, with `.` having the special meaning of a 1 second pause. + +So e.g. `crash-debug-watchdog=0.0r` would dump dom0 state twice with a +second between the two state dumps, followed by the run queues of the +hypervisor, if the system crashes due to a watchdog timeout. + +These parameters should be used carefully, as e.g. specifying +`crash-debug-debugkey=C` would result in an endless loop. Depending on the +reason of the system crash it might happen that triggering some debug key +action will result in a hang instead of dumping data and then doing a +reboot or crash dump. I think it would be useful if the flavors were (briefly) explained: At the very least "debugkey" doesn't directly fit "in cases of crashes", as there's no crash involved. kexec_crash() instead gets invoked without there having been any crash. Yes, and having some additional state generate for this case might help diagnosis. You may also want to point out that this is a best effort thing only - system state at the point of a crash may be such that the attempt of handling one or the debug keys would have further bad effects on the system, including that the actual kexec may then never occur. True. @@ -507,6 +509,41 @@ void __init initialize_keytable(void) } } +#define CRASHACTION_SIZE 32 +static char crash_debug_panic[CRASHACTION_SIZE]; +static char crash_debug_hwdom[CRASHACTION_SIZE]; +static char crash_debug_watchdog[CRASHACTION_SIZE]; +static char crash_debug_kexeccmd[CRASHACTION_SIZE]; +static char crash_debug_debugkey[CRASHACTION_SIZE]; + +static char *crash_action[CRASHREASON_N] = { +[CRASHREASON_PANIC] = crash_debug_panic, +[CRASHREASON_HWDOM] = crash_debug_hwdom, +[CRASHREASON_WATCHDOG] = crash_debug_watchdog, +[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd, +[CRASHREASON_DEBUGKEY] = crash_debug_debugkey, +}; + +string_runtime_param("crash-debug-panic", crash_debug_panic); +string_runtime_param("crash-debug-hwdom", crash_debug_hwdom); +string_runtime_param("crash-debug-watchdog", crash_debug_watchdog); +string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd); +string_runtime_param("crash-debug-debugkey", crash_debug_debugkey); In general I'm not in favor of this (or similar) needing five new command line options, instead of just one. The problem with e.g. crash-debug=panic:rq,watchdog:0d is that ',' (
Re: [PATCH] xen: add support for automatic debug key actions in case of crash
On 29.10.2020 15:40, Jürgen Groß wrote: > On 29.10.20 15:22, Jan Beulich wrote: >> On 22.10.2020 16:39, Juergen Gross wrote: >>> @@ -507,6 +509,41 @@ void __init initialize_keytable(void) >>> } >>> } >>> >>> +#define CRASHACTION_SIZE 32 >>> +static char crash_debug_panic[CRASHACTION_SIZE]; >>> +static char crash_debug_hwdom[CRASHACTION_SIZE]; >>> +static char crash_debug_watchdog[CRASHACTION_SIZE]; >>> +static char crash_debug_kexeccmd[CRASHACTION_SIZE]; >>> +static char crash_debug_debugkey[CRASHACTION_SIZE]; >>> + >>> +static char *crash_action[CRASHREASON_N] = { >>> +[CRASHREASON_PANIC] = crash_debug_panic, >>> +[CRASHREASON_HWDOM] = crash_debug_hwdom, >>> +[CRASHREASON_WATCHDOG] = crash_debug_watchdog, >>> +[CRASHREASON_KEXECCMD] = crash_debug_kexeccmd, >>> +[CRASHREASON_DEBUGKEY] = crash_debug_debugkey, >>> +}; >>> + >>> +string_runtime_param("crash-debug-panic", crash_debug_panic); >>> +string_runtime_param("crash-debug-hwdom", crash_debug_hwdom); >>> +string_runtime_param("crash-debug-watchdog", crash_debug_watchdog); >>> +string_runtime_param("crash-debug-kexeccmd", crash_debug_kexeccmd); >>> +string_runtime_param("crash-debug-debugkey", crash_debug_debugkey); >> >> In general I'm not in favor of this (or similar) needing >> five new command line options, instead of just one. The problem >> with e.g. >> >> crash-debug=panic:rq,watchdog:0d >> >> is that ',' (or any other separator chosen) could in principle >> also be a debug key. It would still be possible to use >> >> crash-debug=panic:rq crash-debug=watchdog:0d >> >> though. Thoughts? > > OTOH the runtime parameters are much easier addressable that way. Ah yes, I can see this as a reason. Would make we wonder whether command line and runtime handling may want disconnecting in this specific case then. (But I can also see the argument of this being too much overhead then.) >>> +void keyhandler_crash_action(enum crash_reason reason) >>> +{ >>> +const char *action = crash_action[reason]; >>> +struct cpu_user_regs *regs = get_irq_regs() ? : guest_cpu_user_regs(); >>> + >>> +while ( *action ) { >>> +if ( *action == '.' ) >>> +mdelay(1000); >>> +else >>> +handle_keypress(*action, regs); >>> +action++; >>> +} >>> +} >> >> I think only diagnostic keys should be permitted here. > > While I understand that other keys could produce nonsense or do harm, > I'm not sure we should really prohibit them. Allowing them would e.g. > allow to do just a reboot without kdump for one type of crashes. Ah yes, that's a fair point. >>> --- a/xen/include/xen/kexec.h >>> +++ b/xen/include/xen/kexec.h >>> @@ -1,6 +1,8 @@ >>> #ifndef __XEN_KEXEC_H__ >>> #define __XEN_KEXEC_H__ >>> >>> +#include >> >> Could we go without this, utilizing the gcc extension of forward >> declared enums? Otoh ... >> >>> @@ -82,7 +84,11 @@ void vmcoreinfo_append_str(const char *fmt, ...) >>> #define kexecing 0 >>> >>> static inline void kexec_early_calculations(void) {} >>> -static inline void kexec_crash(void) {} >>> +static inline void kexec_crash(enum crash_reason reason) >>> +{ >>> +keyhandler_crash_action(reason); >>> +} >> >> ... if this is to be an inline function and not just a #define, >> it'll need the declaration of the function to have been seen. > > And even being a #define all users of kexec_crash() would need to > #include keyhandler.h (and I'm not sure there are any source files > including kexec.h which don't use kexec_crash()). About as many which do as ones which don't. But there's no generally accessible header which includes xen/kexec.h, so perhaps the extra dependency indeed isn't all this problematic. Jan
Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output
On Wed, 28 Oct 2020 15:23:18 +0100 Mauro Carvalho Chehab wrote: > From: Mauro Carvalho Chehab > > Some files over there won't parse well by Sphinx. > > Fix them. > > Signed-off-by: Mauro Carvalho Chehab > Signed-off-by: Mauro Carvalho Chehab Query below... I'm going to guess a rebase issue? Other than that Acked-by: Jonathan Cameron # for IIO > diff --git a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > index b7259234ad70..a10a4de3e5fe 100644 > --- a/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > +++ b/Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 > @@ -3,67 +3,85 @@ KernelVersion: 4.11 > Contact: benjamin.gaign...@st.com > Description: > Reading returns the list possible master modes which are: > - - "reset" : The UG bit from the TIMx_EGR register is > + > + > + - "reset" > + The UG bit from the TIMx_EGR register is > used as trigger output (TRGO). > - - "enable": The Counter Enable signal CNT_EN is used > + - "enable" > + The Counter Enable signal CNT_EN is used > as trigger output. > - - "update": The update event is selected as trigger output. > + - "update" > + The update event is selected as trigger output. > For instance a master timer can then be used > as a prescaler for a slave timer. > - - "compare_pulse" : The trigger output send a positive pulse > - when the CC1IF flag is to be set. > - - "OC1REF": OC1REF signal is used as trigger output. > - - "OC2REF": OC2REF signal is used as trigger output. > - - "OC3REF": OC3REF signal is used as trigger output. > - - "OC4REF": OC4REF signal is used as trigger output. > + - "compare_pulse" > + The trigger output send a positive pulse > + when the CC1IF flag is to be set. > + - "OC1REF" > + OC1REF signal is used as trigger output. > + - "OC2REF" > + OC2REF signal is used as trigger output. > + - "OC3REF" > + OC3REF signal is used as trigger output. > + - "OC4REF" > + OC4REF signal is used as trigger output. > + > Additional modes (on TRGO2 only): > - - "OC5REF": OC5REF signal is used as trigger output. > - - "OC6REF": OC6REF signal is used as trigger output. > + > + - "OC5REF" > + OC5REF signal is used as trigger output. > + - "OC6REF" > + OC6REF signal is used as trigger output. > - "compare_pulse_OC4REF": > - OC4REF rising or falling edges generate pulses. > + OC4REF rising or falling edges generate pulses. > - "compare_pulse_OC6REF": > - OC6REF rising or falling edges generate pulses. > + OC6REF rising or falling edges generate pulses. > - "compare_pulse_OC4REF_r_or_OC6REF_r": > - OC4REF or OC6REF rising edges generate pulses. > + OC4REF or OC6REF rising edges generate pulses. > - "compare_pulse_OC4REF_r_or_OC6REF_f": > - OC4REF rising or OC6REF falling edges generate pulses. > + OC4REF rising or OC6REF falling edges generate > + pulses. > - "compare_pulse_OC5REF_r_or_OC6REF_r": > - OC5REF or OC6REF rising edges generate pulses. > + OC5REF or OC6REF rising edges generate pulses. > - "compare_pulse_OC5REF_r_or_OC6REF_f": > - OC5REF rising or OC6REF falling edges generate pulses. > + OC5REF rising or OC6REF falling edges generate > + pulses. > > - +---+ +-++-+ > - | Prescaler +-> | Counter |+-> | Master | TRGO(2) > - +---+ +--++-+|-> | Control +--> > -|| || +-+ > - +--v+-+ OCxREF || +-+ > - | Chx compare +--> | Output | ChX > - +---+-+ | | Control +--> > - . | | +-+ > - . | |. > - +---
Re: [PATCH 12/12] xen/cpupool: make per-cpupool sched-gran hypfs node writable
On 26.10.2020 10:13, Juergen Gross wrote: > @@ -1088,13 +1098,58 @@ static int cpupool_gran_read(const struct hypfs_entry > *entry, > return copy_to_guest(uaddr, name, strlen(name) + 1) ? -EFAULT : 0; > } > > +static int cpupool_gran_write(struct hypfs_entry_leaf *leaf, > + XEN_GUEST_HANDLE_PARAM(void) uaddr, > + unsigned int ulen) > +{ > +const struct hypfs_dyndir_id *data; > +struct cpupool *cpupool; > +enum sched_gran gran; > +unsigned int sched_gran; > +char name[SCHED_GRAN_NAME_LEN]; > +int ret = 0; > + > +if ( ulen > SCHED_GRAN_NAME_LEN ) > +return -ENOSPC; > + > +if ( copy_from_guest(name, uaddr, ulen) ) > +return -EFAULT; > + > +sched_gran = sched_gran_get(name, &gran) ? 0 > + : > cpupool_check_granularity(gran); > +if ( memchr(name, 0, ulen) != (name + ulen - 1) || sched_gran == 0 ) > +return -EINVAL; I guess the memchr() check wants to happen before the call to sched_gran_get()? Jan
Re: [PATCH 12/12] xen/cpupool: make per-cpupool sched-gran hypfs node writable
On 29.10.20 15:58, Jan Beulich wrote: On 26.10.2020 10:13, Juergen Gross wrote: @@ -1088,13 +1098,58 @@ static int cpupool_gran_read(const struct hypfs_entry *entry, return copy_to_guest(uaddr, name, strlen(name) + 1) ? -EFAULT : 0; } +static int cpupool_gran_write(struct hypfs_entry_leaf *leaf, + XEN_GUEST_HANDLE_PARAM(void) uaddr, + unsigned int ulen) +{ +const struct hypfs_dyndir_id *data; +struct cpupool *cpupool; +enum sched_gran gran; +unsigned int sched_gran; +char name[SCHED_GRAN_NAME_LEN]; +int ret = 0; + +if ( ulen > SCHED_GRAN_NAME_LEN ) +return -ENOSPC; + +if ( copy_from_guest(name, uaddr, ulen) ) +return -EFAULT; + +sched_gran = sched_gran_get(name, &gran) ? 0 + : cpupool_check_granularity(gran); +if ( memchr(name, 0, ulen) != (name + ulen - 1) || sched_gran == 0 ) +return -EINVAL; I guess the memchr() check wants to happen before the call to sched_gran_get()? Yes. Juergen
[PATCH] x86emul: support AVX-VNNI
These are VEX-encoded equivalents of the EVEX-encoded AVX512-VNNI ISA extension. Signed-off-by: Jan Beulich --- SDE: -spr --- a/tools/libs/light/libxl_cpuid.c +++ b/tools/libs/light/libxl_cpuid.c @@ -226,6 +226,7 @@ int libxl_cpuid_parse_config(libxl_cpuid {"core-caps",0x0007, 0, CPUID_REG_EDX, 30, 1}, {"ssbd", 0x0007, 0, CPUID_REG_EDX, 31, 1}, +{"avx-vnni", 0x0007, 1, CPUID_REG_EAX, 4, 1}, {"avx512-bf16", 0x0007, 1, CPUID_REG_EAX, 5, 1}, {"lahfsahf", 0x8001, NA, CPUID_REG_ECX, 0, 1}, --- a/tools/misc/xen-cpuid.c +++ b/tools/misc/xen-cpuid.c @@ -175,7 +175,7 @@ static const char *const str_7d0[32] = static const char *const str_7a1[32] = { -/* 4 */ [ 5] = "avx512-bf16", +[ 4] = "avx-vnni", [ 5] = "avx512-bf16", }; static const struct { --- a/tools/tests/x86_emulator/predicates.c +++ b/tools/tests/x86_emulator/predicates.c @@ -1335,6 +1335,10 @@ static const struct vex { { { 0x45 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsrlv{d,q} */ { { 0x46 }, 2, T, R, pfx_66, W0, Ln }, /* vpsravd */ { { 0x47 }, 2, T, R, pfx_66, Wn, Ln }, /* vpsllv{d,q} */ +{ { 0x50 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusd */ +{ { 0x51 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpbusds */ +{ { 0x52 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssd */ +{ { 0x53 }, 2, T, R, pfx_66, W0, Ln }, /* vpdpwssds */ { { 0x58 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastd */ { { 0x59 }, 2, T, R, pfx_66, W0, Ln }, /* vpbroadcastq */ { { 0x5a }, 2, F, R, pfx_66, W0, L1 }, /* vbroadcasti128 */ --- a/tools/tests/x86_emulator/test_x86_emulator.c +++ b/tools/tests/x86_emulator/test_x86_emulator.c @@ -5028,6 +5028,61 @@ int main(int argc, char **argv) printf("okay\n"); } +printf("%-40s", "Testing vpdpwssd (%ecx),%{y,z}mmA,%{y,z}mmB..."); +if ( stack_exec && cpu_has_avx512_vnni && cpu_has_avx_vnni ) +{ +/* Do the same operation two ways and compare the results. */ +decl_insn(vpdpwssd_vex1); +decl_insn(vpdpwssd_vex2); +decl_insn(vpdpwssd_evex); + +for ( i = 0; i < 24; ++i ) +res[i] = i | (~i << 16); + +asm volatile ( "vmovdqu32 32(%0), %%zmm1\n\t" + "vextracti64x4 $1, %%zmm1, %%ymm2\n\t" + "vpxor %%xmm0, %%xmm0, %%xmm3\n\t" + "vpxor %%xmm0, %%xmm0, %%xmm4\n\t" + "vpxor %%xmm0, %%xmm0, %%xmm5\n" + put_insn(vpdpwssd_vex1, +/* %{vex%} vpdpwssd (%1), %%ymm1, %%ymm3" */ +".byte 0xc4, 0xe2, 0x75, 0x52, 0x19") "\n" + put_insn(vpdpwssd_vex2, +/* "%{vex%} vpdpwssd 32(%1), %%ymm2, %%ymm4" */ +".byte 0xc4, 0xe2, 0x6d, 0x52, 0x61, 0x20") "\n" + put_insn(vpdpwssd_evex, +/* "vpdpwssd (%1), %%zmm1, %%zmm5" */ +".byte 0x62, 0xf2, 0x75, 0x48, 0x52, 0x29") + :: "r" (res), "c" (NULL) ); + +set_insn(vpdpwssd_vex1); +regs.ecx = (unsigned long)res; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_vex1) ) +goto fail; + +set_insn(vpdpwssd_vex2); +regs.ecx = (unsigned long)res; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_vex2) ) +goto fail; + +set_insn(vpdpwssd_evex); +regs.ecx = (unsigned long)res; +rc = x86_emulate(&ctxt, &emulops); +if ( rc != X86EMUL_OKAY || !check_eip(vpdpwssd_evex) ) +goto fail; + +asm ( "vinserti64x4 $1, %%ymm4, %%zmm3, %%zmm0\n\t" + "vpcmpeqd %%zmm0, %%zmm5, %%k0\n\t" + "kmovw %%k0, %0" : "=g" (rc) ); +if ( rc != 0x ) +goto fail; +printf("okay\n"); +} +else +printf("skipped\n"); + printf("%-40s", "Testing invpcid 16(%ecx),%%edx..."); if ( stack_exec ) { --- a/tools/tests/x86_emulator/x86-emulate.h +++ b/tools/tests/x86_emulator/x86-emulate.h @@ -170,6 +170,7 @@ static inline bool xcr0_mask(uint64_t ma #define cpu_has_avx512_4fmaps (cp.feat.avx512_4fmaps && xcr0_mask(0xe6)) #define cpu_has_avx512_vp2intersect (cp.feat.avx512_vp2intersect && xcr0_mask(0xe6)) #define cpu_has_serialize cp.feat.serialize +#define cpu_has_avx_vnni (cp.feat.avx_vnni && xcr0_mask(6)) #define cpu_has_avx512_bf16 (cp.feat.avx512_bf16 && xcr0_mask(0xe6)) #define cpu_has_xgetbv1 (cpu_has_xsave && cp.xstate.xgetbv1) --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -2008,6 +2008,7 @@ amd_like(const struct x86_emulate_ctxt * #define vcpu_has_avx512_4fmaps() (ctxt->cpuid->feat.avx512_4fmaps) #define vcpu_has_avx512_vp2intersec
RE: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
From: Arnd Bergmann > Sent: 28 October 2020 21:21 > > From: Arnd Bergmann > > There are hundreds of warnings in a W=2 build about a local > variable shadowing the global 'apic' definition: > > arch/x86/kvm/lapic.h:149:65: warning: declaration of 'apic' shadows a global > declaration [-Wshadow] > > Avoid this by renaming the global 'apic' variable to the more descriptive > 'x86_system_apic'. It was originally called 'genapic', but both that > and the current 'apic' seem to be a little overly generic for a global > variable. > > Fixes: c48f14966cc4 ("KVM: inline kvm_apic_present() and kvm_lapic_enabled()") > Fixes: c8d46cf06dc2 ("x86: rename 'genapic' to 'apic'") > Signed-off-by: Arnd Bergmann > --- > v2: rename the global instead of the local variable in the header ... > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > index 284e73661a18..33e2dc78ca11 100644 > --- a/arch/x86/hyperv/hv_apic.c > +++ b/arch/x86/hyperv/hv_apic.c > @@ -259,14 +259,14 @@ void __init hv_apic_init(void) > /* >* Set the IPI entry points. >*/ > - orig_apic = *apic; > - > - apic->send_IPI = hv_send_ipi; > - apic->send_IPI_mask = hv_send_ipi_mask; > - apic->send_IPI_mask_allbutself = hv_send_ipi_mask_allbutself; > - apic->send_IPI_allbutself = hv_send_ipi_allbutself; > - apic->send_IPI_all = hv_send_ipi_all; > - apic->send_IPI_self = hv_send_ipi_self; > + orig_apic = *x86_system_apic; > + > + x86_system_apic->send_IPI = hv_send_ipi; > + x86_system_apic->send_IPI_mask = hv_send_ipi_mask; > + x86_system_apic->send_IPI_mask_allbutself = > hv_send_ipi_mask_allbutself; > + x86_system_apic->send_IPI_allbutself = hv_send_ipi_allbutself; > + x86_system_apic->send_IPI_all = hv_send_ipi_all; > + x86_system_apic->send_IPI_self = hv_send_ipi_self; > } > > if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) { > @@ -285,10 +285,10 @@ void __init hv_apic_init(void) >*/ > apic_set_eoi_write(hv_apic_eoi_write); > if (!x2apic_enabled()) { > - apic->read = hv_apic_read; > - apic->write = hv_apic_write; > - apic->icr_write = hv_apic_icr_write; > - apic->icr_read = hv_apic_icr_read; > + x86_system_apic->read = hv_apic_read; > + x86_system_apic->write = hv_apic_write; > + x86_system_apic->icr_write = hv_apic_icr_write; > + x86_system_apic->icr_read = hv_apic_icr_read; > } For those two just add: struct apic *apic = x86_system_apic; before all the assignments. Less churn and much better code. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
[PATCH] x86/HVM: send mapcache invalidation request to qemu regardless of preemption
Even if only part of a hypercall completed before getting preempted, invalidation ought to occur. Therefore fold the two return statements. Signed-off-by: Jan Beulich --- Split off from "x86/HVM: refine when to send mapcache invalidation request to qemu". --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -326,14 +326,11 @@ int hvm_hypercall(struct cpu_user_regs * HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu -> %lx", eax, regs->rax); -if ( curr->hcall_preempted ) -return HVM_HCALL_preempted; - if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) && test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) ) send_invalidate_req(); -return HVM_HCALL_completed; +return curr->hcall_preempted ? HVM_HCALL_preempted : HVM_HCALL_completed; } /*
[PATCH] x86/hvm: process softirq while saving/loading entries
On slow systems with sync_console saving or loading the context of big guests can cause the watchdog to trigger. Fix this by adding a couple of process_pending_softirqs. Signed-off-by: Roger Pau Monné --- xen/arch/x86/hvm/save.c | 4 1 file changed, 4 insertions(+) diff --git a/xen/arch/x86/hvm/save.c b/xen/arch/x86/hvm/save.c index a2c56fbc1e..584620985b 100644 --- a/xen/arch/x86/hvm/save.c +++ b/xen/arch/x86/hvm/save.c @@ -21,6 +21,7 @@ */ #include +#include #include #include @@ -255,6 +256,7 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h) v, i); return -ENODATA; } +process_pending_softirqs(); } } else @@ -268,6 +270,7 @@ int hvm_save(struct domain *d, hvm_domain_context_t *h) d->domain_id, i); return -ENODATA; } +process_pending_softirqs(); } } @@ -341,6 +344,7 @@ int hvm_load(struct domain *d, hvm_domain_context_t *h) d->domain_id, desc->typecode, desc->instance); return -1; } +process_pending_softirqs(); } /* Not reached */ -- 2.29.0
Re: [PATCH 2/2] tools/libs: fix uninstall rule for header files
> On 19 Oct 2020, at 08:21, Jan Beulich wrote: > > This again was working right only as long as $(LIBHEADER) consisted of > just one entry. > > Signed-off-by: Jan Beulich Reviewed-by: Bertrand Marquis The change is obviously fixing a bug :-) and the double $ is required to protect from make. Cheers Bertrand > --- > An alternative would be to use $(addprefix ) without any shell loop. > > --- a/tools/libs/libs.mk > +++ b/tools/libs/libs.mk > @@ -107,7 +107,7 @@ install: build > .PHONY: uninstall > uninstall: > rm -f $(DESTDIR)$(PKG_INSTALLDIR)/$(LIB_FILE_NAME).pc > - for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$(LIBHEADER); > done > + for i in $(LIBHEADER); do rm -f $(DESTDIR)$(includedir)/$$i; done > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR) > rm -f $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) > >
[PATCH v1] xl: fix description of migrate --debug
xl migrate --debug used to track every pfn in every batch of pages. But these times are gone. Adjust the help text to tell what --debug is supposed to do today. Signed-off-by: Olaf Hering --- docs/man/xl.1.pod.in | 4 +++- tools/xl/xl_cmdtable.c | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 4bde0672fa..d0f50f0b4a 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -488,7 +488,9 @@ Include timestamps in output. =item B<--debug> -Display huge (!) amount of debug information during the migration process. +Verify transferred domU page data. All memory will be transferred one more +time to the destination host while the domU is paused, and compare with +the result of the inital transfer while the domU was still running. =item B<-p> diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c index fd2dc0aef2..af160dde42 100644 --- a/tools/xl/xl_cmdtable.c +++ b/tools/xl/xl_cmdtable.c @@ -168,7 +168,7 @@ struct cmd_spec cmd_table[] = { "-e Do not wait in the background (on ) for the death\n" "of the domain.\n" "-T Show timestamps during the migration process.\n" - "--debug Print huge (!) amount of debug during the migration process.\n" + "--debug Verify transferred domU page data.\n" "-p Do not unpause domain after migrating it.\n" "-D Preserve the domain id" },
RE: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
From: Arnd Bergmann > Sent: 29 October 2020 09:51 ... > I think ideally there would be no global variable, withall accesses > encapsulated in function calls, possibly using static_call() optimizations > if any of them are performance critical. There isn't really a massive difference between global variables and global access functions. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH] x86/hvm: process softirq while saving/loading entries
On 29.10.2020 16:20, Roger Pau Monne wrote: > On slow systems with sync_console saving or loading the context of big > guests can cause the watchdog to trigger. Fix this by adding a couple > of process_pending_softirqs. Which raises the question in how far this is then also a problem for the caller of the underlying hypercall. IOW I wonder whether instead we need to make use of continuations here. Nevertheless ... > Signed-off-by: Roger Pau Monné Acked-by: Jan Beulich Jan
[PATCH] tools: add noidentpt domain config option
The Identity Pagetable is currently being created for all HVM VMs during setup. This was only necessary for running HVM guests on Intel CPUs where EPT was available but unrestricted guest mode was not. Add option to skip the creation of the Identity Pagetable via the "noidentpt" xl config option. This allows a system administrator to skip this step when the hardware is known to have the required features. Signed-off-by: Tamas K Lengyel --- Further context: while running VM forks a significant bottle-neck was identified when HVM_PARAM_IDENT_PT is getting copied from the parent VM. This is due to the fact that setting this parameter requires obtaining a Xen-wide lock (domctl_lock_acquire). When running several VM forks in parallel during fuzzing the fork reset hypercall can fail due to the lock being taken by another fork that's being reset at the same time. This whole situation can be avoided if there is no Identity-map pagetable to begin with as on modern CPUs this identity-map pagetable is not actually required. --- docs/man/xl.cfg.5.pod.in | 5 + tools/include/xenguest.h | 1 + tools/libs/guest/xg_dom_x86.c| 31 +-- tools/libs/light/libxl_create.c | 2 ++ tools/libs/light/libxl_dom.c | 2 ++ tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_parse.c | 2 ++ 7 files changed, 30 insertions(+), 14 deletions(-) diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in index 0532739c1f..4d992fe346 100644 --- a/docs/man/xl.cfg.5.pod.in +++ b/docs/man/xl.cfg.5.pod.in @@ -587,6 +587,11 @@ which are incompatible with migration. Currently this is limited to enabling the invariant TSC feature flag in CPUID results when TSC is not emulated. +=item B + +Disable the creation of the Identity-map Pagetable that was required to run HVM +guests on Intel CPUs with EPT where no unrestricted guest mode was available. + =item B Specify that this domain is a driver domain. This enables certain diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h index a9984dbea5..998a8c57ba 100644 --- a/tools/include/xenguest.h +++ b/tools/include/xenguest.h @@ -26,6 +26,7 @@ #define XCFLAGS_LIVE (1 << 0) #define XCFLAGS_DEBUG (1 << 1) +#define XCFLAGS_NOIDENTPT (1 << 2) #define X86_64_B_SIZE 64 #define X86_32_B_SIZE 32 diff --git a/tools/libs/guest/xg_dom_x86.c b/tools/libs/guest/xg_dom_x86.c index 2953aeb90b..827bea7eb7 100644 --- a/tools/libs/guest/xg_dom_x86.c +++ b/tools/libs/guest/xg_dom_x86.c @@ -718,20 +718,23 @@ static int alloc_magic_pages_hvm(struct xc_dom_image *dom) goto out; } -/* - * Identity-map page table is required for running with CR0.PG=0 when - * using Intel EPT. Create a 32-bit non-PAE page directory of superpages. - */ -if ( (ident_pt = xc_map_foreign_range( - xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE, - special_pfn(SPECIALPAGE_IDENT_PT))) == NULL ) -goto error_out; -for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ ) -ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); -munmap(ident_pt, PAGE_SIZE); -xc_hvm_param_set(xch, domid, HVM_PARAM_IDENT_PT, - special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT); +if ( !(dom->flags & XCFLAGS_NOIDENTPT) ) +{ +/* + * Identity-map page table is required for running with CR0.PG=0 when + * using Intel EPT. Create a 32-bit non-PAE page directory of superpages. + */ +if ( (ident_pt = xc_map_foreign_range( + xch, domid, PAGE_SIZE, PROT_READ | PROT_WRITE, + special_pfn(SPECIALPAGE_IDENT_PT))) == NULL ) +goto error_out; +for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ ) +ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER | + _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE); +munmap(ident_pt, PAGE_SIZE); +xc_hvm_param_set(xch, domid, HVM_PARAM_IDENT_PT, + special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT); +} dom->console_pfn = special_pfn(SPECIALPAGE_CONSOLE); xc_clear_domain_page(dom->xch, dom->guest_domid, dom->console_pfn); diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index 321a13e519..62b06b359c 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -256,6 +256,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_defbool_setdefault(&b_info->disable_migrate, false); +libxl_defbool_setdefault(&b_info->disable_identpt, false); + for (i = 0 ; i < b_info->num_iomem; i++) if (b_info->iomem[i].gfn == LIBXL_INVALID_GFN) b_info->iomem[i].gfn = b_info->iomem[i].start; diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c index 01d989a976..
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
Arnd, On Thu, Oct 29 2020 at 10:51, Arnd Bergmann wrote: > On Thu, Oct 29, 2020 at 8:04 AM Paolo Bonzini wrote: >> On 28/10/20 22:20, Arnd Bergmann wrote: >> > Avoid this by renaming the global 'apic' variable to the more descriptive >> > 'x86_system_apic'. It was originally called 'genapic', but both that >> > and the current 'apic' seem to be a little overly generic for a global >> > variable. >> >> The 'apic' affects only the current CPU, so one of 'x86_local_apic', >> 'x86_lapic' or 'x86_apic' is probably preferrable. > > Ok, I'll change it to x86_local_apic then, unless someone else has > a preference between them. x86_local_apic is fine with me. > I think ideally there would be no global variable, withall accesses > encapsulated in function calls, possibly using static_call() optimizations > if any of them are performance critical. There are a bunch, yes. > It doesn't seem hard to do, but I'd rather leave that change to > an x86 person ;-) It's not hard, but I'm not really sure whether it buys much. Can you please redo that against tip x86/apic. Much of what you are touching got a major overhaul. Thanks, tglx
[xen-unstable-smoke test] 156297: tolerable all pass - PUSHED
flight 156297 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/156297/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 1fd1d4bafdf6f9f8fe5ca9b947f016a7aae92a74 baseline version: xen 16a20963b3209788f2c0d3a3eebb7d92f03f5883 Last test of basis 156260 2020-10-27 18:01:29 Z1 days Testing same since 156297 2020-10-29 14:01:23 Z0 days1 attempts People who touched revisions under test: Jan Beulich jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 16a20963b3..1fd1d4bafd 1fd1d4bafdf6f9f8fe5ca9b947f016a7aae92a74 -> smoke
Re: [PATCH v1 4/4] xen/pci: solve compilation error when memory paging is not enabled.
Hello Jan, > On 28 Oct 2020, at 3:13 pm, Rahul Singh wrote: > > Hello Jan, > >> On 28 Oct 2020, at 11:56 am, Jan Beulich wrote: >> >> On 26.10.2020 18:17, Rahul Singh wrote: >>> --- a/xen/drivers/passthrough/pci.c >>> +++ b/xen/drivers/passthrough/pci.c >>> @@ -1419,13 +1419,15 @@ static int assign_device(struct domain *d, u16 seg, >>> u8 bus, u8 devfn, u32 flag) >>>if ( !is_iommu_enabled(d) ) >>>return 0; >>> >>> -/* Prevent device assign if mem paging or mem sharing have been >>> +#if defined(CONFIG_HAS_MEM_PAGING) || defined(CONFIG_MEM_SHARING) >>> +/* Prevent device assign if mem paging or mem sharing have been >>> * enabled for this domain */ >>>if ( d != dom_io && >>> unlikely(mem_sharing_enabled(d) || >>> vm_event_check_ring(d->vm_event_paging) || >>> p2m_get_hostp2m(d)->global_logdirty) ) >>>return -EXDEV; >>> +#endif >> >> Besides this also disabling mem-sharing and log-dirty related >> logic, I don't think the change is correct: Each item being >> checked needs individually disabling depending on its associated >> CONFIG_*. For this, perhaps you want to introduce something >> like mem_paging_enabled(d), to avoid the need for #ifdef here? >> > > Ok I will fix that in next version. I just check and found out that mem-sharing , men-paging and log-dirty is x86 specific and not implemented for ARM. Is that will be ok if I move above code to x86 specific directory and introduce new function arch_pcidev_is_assignable() that will test if pcidev is assignable or not ? > >> Jan Regards, Rahul
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On 29/10/20 17:56, Arvind Sankar wrote: >> For those two just add: >> struct apic *apic = x86_system_apic; >> before all the assignments. >> Less churn and much better code. >> > Why would it be better code? > I think he means the compiler produces better code, because it won't read the global variable repeatedly. Not sure if that's true,(*) but I think I do prefer that version if Arnd wants to do that tweak. Paolo (*) if it is, it might also be due to Linux using -fno-strict-aliasing
[PATCH 0/2] x86: PV shim vs grant table
While Andrew reported a randconfig build failure, I started wondering why we'd ever build in a way different from what had failed to build. Fix the build and then switch the shim config file accordingly. 1: fix build of PV shim when !GRANT_TABLE 2: PV shim doesn't need GRANT_TABLE Jan
[PATCH 1/2] x86: fix build of PV shim when !GRANT_TABLE
To do its compat translation, shim code needs to include the compat header. For this to work, this header first of all needs to be generated. Reported-by: Andrew Cooper Signed-off-by: Jan Beulich --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -24,6 +24,7 @@ headers-$(CONFIG_X86) += compat/arch headers-$(CONFIG_ARGO)+= compat/argo.h headers-$(CONFIG_PV) += compat/callback.h headers-$(CONFIG_GRANT_TABLE) += compat/grant_table.h +headers-$(CONFIG_PV_SHIM) += compat/grant_table.h headers-$(CONFIG_HVM) += compat/hvm/dm_op.h headers-$(CONFIG_HVM) += compat/hvm/hvm_op.h headers-$(CONFIG_HVM) += compat/hvm/hvm_vcpu.h
[PATCH 2/2] x86: PV shim doesn't need GRANT_TABLE
The only reference into the code controlled by this option is from the hypercall table, and that hypercall table entry gets overwritten. Signed-off-by: Jan Beulich --- a/xen/arch/x86/configs/pvshim_defconfig +++ b/xen/arch/x86/configs/pvshim_defconfig @@ -9,6 +9,7 @@ CONFIG_EXPERT=y CONFIG_SCHED_NULL=y # Disable features not used by the PV shim # CONFIG_XEN_SHSTK is not set +# CONFIG_GRANT_TABLE is not set # CONFIG_HYPFS is not set # CONFIG_BIGMEM is not set # CONFIG_KEXEC is not set
Re: [PATCH v1 4/4] xen/pci: solve compilation error when memory paging is not enabled.
On 29.10.2020 17:58, Rahul Singh wrote: >> On 28 Oct 2020, at 3:13 pm, Rahul Singh wrote: >>> On 28 Oct 2020, at 11:56 am, Jan Beulich wrote: >>> On 26.10.2020 18:17, Rahul Singh wrote: --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1419,13 +1419,15 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag) if ( !is_iommu_enabled(d) ) return 0; -/* Prevent device assign if mem paging or mem sharing have been +#if defined(CONFIG_HAS_MEM_PAGING) || defined(CONFIG_MEM_SHARING) +/* Prevent device assign if mem paging or mem sharing have been * enabled for this domain */ if ( d != dom_io && unlikely(mem_sharing_enabled(d) || vm_event_check_ring(d->vm_event_paging) || p2m_get_hostp2m(d)->global_logdirty) ) return -EXDEV; +#endif >>> >>> Besides this also disabling mem-sharing and log-dirty related >>> logic, I don't think the change is correct: Each item being >>> checked needs individually disabling depending on its associated >>> CONFIG_*. For this, perhaps you want to introduce something >>> like mem_paging_enabled(d), to avoid the need for #ifdef here? >>> >> >> Ok I will fix that in next version. > > I just check and found out that mem-sharing , men-paging and log-dirty is x86 > specific and not implemented for ARM. > Is that will be ok if I move above code to x86 specific directory and > introduce new function arch_pcidev_is_assignable() that will test if pcidev > is assignable or not ? As an immediate workaround - perhaps (long term the individual pieces should still be individually checked here, as they're not inherently x86-specific). Since there's no device property involved here, the suggested name looks misleading. Perhaps arch_iommu_usable(d)? Jan
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On Thu, Oct 29, 2020 at 03:05:31PM +, David Laight wrote: > From: Arnd Bergmann > > Sent: 28 October 2020 21:21 > > > > From: Arnd Bergmann > > > > There are hundreds of warnings in a W=2 build about a local > > variable shadowing the global 'apic' definition: > > > > arch/x86/kvm/lapic.h:149:65: warning: declaration of 'apic' shadows a > > global declaration [-Wshadow] > > > > Avoid this by renaming the global 'apic' variable to the more descriptive > > 'x86_system_apic'. It was originally called 'genapic', but both that > > and the current 'apic' seem to be a little overly generic for a global > > variable. > > > > Fixes: c48f14966cc4 ("KVM: inline kvm_apic_present() and > > kvm_lapic_enabled()") > > Fixes: c8d46cf06dc2 ("x86: rename 'genapic' to 'apic'") > > Signed-off-by: Arnd Bergmann > > --- > > v2: rename the global instead of the local variable in the header > ... > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > > index 284e73661a18..33e2dc78ca11 100644 > > --- a/arch/x86/hyperv/hv_apic.c > > +++ b/arch/x86/hyperv/hv_apic.c > > @@ -259,14 +259,14 @@ void __init hv_apic_init(void) > > /* > > * Set the IPI entry points. > > */ > > - orig_apic = *apic; > > - > > - apic->send_IPI = hv_send_ipi; > > - apic->send_IPI_mask = hv_send_ipi_mask; > > - apic->send_IPI_mask_allbutself = hv_send_ipi_mask_allbutself; > > - apic->send_IPI_allbutself = hv_send_ipi_allbutself; > > - apic->send_IPI_all = hv_send_ipi_all; > > - apic->send_IPI_self = hv_send_ipi_self; > > + orig_apic = *x86_system_apic; > > + > > + x86_system_apic->send_IPI = hv_send_ipi; > > + x86_system_apic->send_IPI_mask = hv_send_ipi_mask; > > + x86_system_apic->send_IPI_mask_allbutself = > > hv_send_ipi_mask_allbutself; > > + x86_system_apic->send_IPI_allbutself = hv_send_ipi_allbutself; > > + x86_system_apic->send_IPI_all = hv_send_ipi_all; > > + x86_system_apic->send_IPI_self = hv_send_ipi_self; > > } > > > > if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) { > > @@ -285,10 +285,10 @@ void __init hv_apic_init(void) > > */ > > apic_set_eoi_write(hv_apic_eoi_write); > > if (!x2apic_enabled()) { > > - apic->read = hv_apic_read; > > - apic->write = hv_apic_write; > > - apic->icr_write = hv_apic_icr_write; > > - apic->icr_read = hv_apic_icr_read; > > + x86_system_apic->read = hv_apic_read; > > + x86_system_apic->write = hv_apic_write; > > + x86_system_apic->icr_write = hv_apic_icr_write; > > + x86_system_apic->icr_read = hv_apic_icr_read; > > } > > For those two just add: > struct apic *apic = x86_system_apic; > before all the assignments. > Less churn and much better code. > Why would it be better code?
[PATCH v1 02/23] tools: add xc_is_known_page_type to libxenctrl
Users of xc_get_pfn_type_batch may want to sanity check the data returned by Xen. Add a simple helper for this purpose. Signed-off-by: Olaf Hering --- tools/libs/ctrl/xc_private.h | 33 + 1 file changed, 33 insertions(+) diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h index 5d2c7274fb..afb08aafe1 100644 --- a/tools/libs/ctrl/xc_private.h +++ b/tools/libs/ctrl/xc_private.h @@ -421,6 +421,39 @@ void *xc_map_foreign_ranges(xc_interface *xch, uint32_t dom, int xc_get_pfn_type_batch(xc_interface *xch, uint32_t dom, unsigned int num, xen_pfn_t *); +/* Sanitiy check for types returned by Xen */ +static inline bool xc_is_known_page_type(xen_pfn_t type) +{ +bool ret; + +switch (type) +{ +case XEN_DOMCTL_PFINFO_NOTAB: + +case XEN_DOMCTL_PFINFO_L1TAB: +case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB: + +case XEN_DOMCTL_PFINFO_L2TAB: +case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB: + +case XEN_DOMCTL_PFINFO_L3TAB: +case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB: + +case XEN_DOMCTL_PFINFO_L4TAB: +case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB: + +case XEN_DOMCTL_PFINFO_XTAB: +case XEN_DOMCTL_PFINFO_XALLOC: +case XEN_DOMCTL_PFINFO_BROKEN: +ret = true; +break; +default: +ret = false; +break; +} +return ret; +} + void bitmap_64_to_byte(uint8_t *bp, const uint64_t *lp, int nbits); void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
[PATCH v1 05/23] tools: show migration transfer rate in send_dirty_pages
Show how fast domU pages are transferred in each iteration. The relevant data is how fast the pfns travel, not so much how much protocol overhead exists. So the reported MiB/sec is just for pfns. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 47 + 2 files changed, 49 insertions(+) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 70e328e951..f3a7a29298 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -238,6 +238,8 @@ struct xc_sr_context bool debug; unsigned long p2m_size; +size_t pages_sent; +size_t overhead_sent; struct precopy_stats stats; diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 0546d3d9e6..f58a35ddde 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -1,5 +1,6 @@ #include #include +#include #include "xg_sr_common.h" @@ -238,6 +239,8 @@ static int write_batch(struct xc_sr_context *ctx) iov[3].iov_len = nr_pfns * sizeof(*rec_pfns); iovcnt = 4; +ctx->save.pages_sent += nr_pages; +ctx->save.overhead_sent += sizeof(rec) + sizeof(hdr) + nr_pfns * sizeof(*rec_pfns); if ( nr_pages ) { @@ -357,6 +360,43 @@ static int suspend_domain(struct xc_sr_context *ctx) return 0; } +static void show_transfer_rate(struct xc_sr_context *ctx, struct timespec *start) +{ +xc_interface *xch = ctx->xch; +struct timespec end = {}, diff = {}; +size_t ms, MiB_sec = ctx->save.pages_sent * PAGE_SIZE; + +if (!MiB_sec) +return; + +if ( clock_gettime(CLOCK_MONOTONIC, &end) ) +PERROR("clock_gettime"); + +if ( (end.tv_nsec - start->tv_nsec) < 0 ) +{ +diff.tv_sec = end.tv_sec - start->tv_sec - 1; +diff.tv_nsec = end.tv_nsec - start->tv_nsec + (1000U*1000U*1000U); +} +else +{ +diff.tv_sec = end.tv_sec - start->tv_sec; +diff.tv_nsec = end.tv_nsec - start->tv_nsec; +} + +ms = (diff.tv_nsec / (1000U*1000U)); +if (!ms) +ms = 1; +ms += (diff.tv_sec * 1000U); + +MiB_sec *= 1000U; +MiB_sec /= ms; +MiB_sec /= 1024U*1024U; + +errno = 0; +ERROR("%s: %zu bytes + %zu pages in %ld.%09ld sec, %zu MiB/sec", __func__, + ctx->save.overhead_sent, ctx->save.pages_sent, diff.tv_sec, diff.tv_nsec, MiB_sec); +} + /* * Send a subset of pages in the guests p2m, according to the dirty bitmap. * Used for each subsequent iteration of the live migration loop. @@ -370,9 +410,15 @@ static int send_dirty_pages(struct xc_sr_context *ctx, xen_pfn_t p; unsigned long written; int rc; +struct timespec start = {}; DECLARE_HYPERCALL_BUFFER_SHADOW(unsigned long, dirty_bitmap, &ctx->save.dirty_bitmap_hbuf); +ctx->save.pages_sent = 0; +ctx->save.overhead_sent = 0; +if ( clock_gettime(CLOCK_MONOTONIC, &start) ) +PERROR("clock_gettime"); + for ( p = 0, written = 0; p < ctx->save.p2m_size; ++p ) { if ( !test_bit(p, dirty_bitmap) ) @@ -396,6 +442,7 @@ static int send_dirty_pages(struct xc_sr_context *ctx, if ( written > entries ) DPRINTF("Bitmap contained more entries than expected..."); +show_transfer_rate(ctx, &start); xc_report_progress_step(xch, entries, entries); return ctx->save.ops.check_vm_state(ctx);
[PATCH v1 15/23] tools/guest: restore: move pfns array
Remove allocation from hotpath, move pfns array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_restore.c | 6 ++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 2a020fef5c..516d9b9fb5 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -224,6 +224,8 @@ struct xc_sr_save_arrays { }; struct xc_sr_restore_arrays { +/* handle_page_data */ +xen_pfn_t pfns[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 4a9ece9681..7d1447e86c 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -315,7 +315,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) unsigned int i, pages_of_data = 0; int rc = -1; -xen_pfn_t *pfns = NULL, pfn; +xen_pfn_t *pfns = ctx->restore.m->pfns, pfn; uint32_t *types = NULL, type; /* @@ -363,9 +363,8 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) goto err; } -pfns = malloc(pages->count * sizeof(*pfns)); types = malloc(pages->count * sizeof(*types)); -if ( !pfns || !types ) +if ( !types ) { ERROR("Unable to allocate enough memory for %u pfns", pages->count); @@ -412,7 +411,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) &pages->pfn[pages->count]); err: free(types); -free(pfns); return rc; }
[PATCH v1 14/23] tools/guest: save: move local_pages array
Remove allocation from hotpath, move local_pages array into preallocated space. Adjust the code to use the src page as is in case of HVM. In case of PV the page may need to be normalised, use an private memory area for this purpose. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 22 ++- tools/libs/guest/xg_sr_save.c | 25 +++-- tools/libs/guest/xg_sr_save_x86_hvm.c | 5 +++-- tools/libs/guest/xg_sr_save_x86_pv.c | 31 ++- 4 files changed, 39 insertions(+), 44 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 33e66678c6..2a020fef5c 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -33,16 +33,12 @@ struct xc_sr_save_ops * Optionally transform the contents of a page from being specific to the * sending environment, to being generic for the stream. * - * The page of data at the end of 'page' may be a read-only mapping of a - * running guest; it must not be modified. If no transformation is - * required, the callee should leave '*pages' untouched. + * The page of data '*src' may be a read-only mapping of a running guest; + * it must not be modified. If no transformation is required, the callee + * should leave '*src' untouched, and return it via '**ptr'. * - * If a transformation is required, the callee should allocate themselves - * a local page using malloc() and return it via '*page'. - * - * The caller shall free() '*page' in all cases. In the case that the - * callee encounters an error, it should *NOT* free() the memory it - * allocated for '*page'. + * If a transformation is required, the callee should provide the + * transformed page in a private buffer and return it via '**ptr'. * * It is valid to fail with EAGAIN if the transformation is not able to be * completed at this point. The page shall be retried later. @@ -50,7 +46,7 @@ struct xc_sr_save_ops * @returns 0 for success, -1 for failure, with errno appropriately set. */ int (*normalise_page)(struct xc_sr_context *ctx, xen_pfn_t type, - void **page); + void *src, unsigned int idx, void **ptr); /** * Set up local environment to save a domain. (Typically querying @@ -371,6 +367,12 @@ struct xc_sr_context union { +struct +{ +/* Used by write_batch for modified pages. */ +void *normalised_pages; +} save; + struct { /* State machine for the order of received records. */ diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 658f834ae8..804e4ccb3a 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -91,11 +91,10 @@ static int write_batch(struct xc_sr_context *ctx) xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types; void *guest_mapping = NULL; void **guest_data = ctx->save.m->guest_data; -void **local_pages = NULL; int *errors = ctx->save.m->errors, rc = -1; unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0; unsigned int nr_pfns = ctx->save.nr_batch_pfns; -void *page, *orig_page; +void *src; uint64_t *rec_pfns = ctx->save.m->rec_pfns; struct iovec *iov = ctx->save.m->iov; int iovcnt = 0; struct xc_sr_rec_page_data_header hdr = { 0 }; @@ -105,16 +104,6 @@ static int write_batch(struct xc_sr_context *ctx) assert(nr_pfns != 0); -/* Pointers to locally allocated pages. Need freeing. */ -local_pages = calloc(nr_pfns, sizeof(*local_pages)); - -if ( !local_pages ) -{ -ERROR("Unable to allocate arrays for a batch of %u pages", - nr_pfns); -goto err; -} - for ( i = 0; i < nr_pfns; ++i ) { types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx, @@ -176,11 +165,8 @@ static int write_batch(struct xc_sr_context *ctx) goto err; } -orig_page = page = guest_mapping + (p * PAGE_SIZE); -rc = ctx->save.ops.normalise_page(ctx, types[i], &page); - -if ( orig_page != page ) -local_pages[i] = page; +src = guest_mapping + (p * PAGE_SIZE); +rc = ctx->save.ops.normalise_page(ctx, types[i], src, i, &guest_data[i]); if ( rc ) { @@ -195,8 +181,6 @@ static int write_batch(struct xc_sr_context *ctx) else goto err; } -else -guest_data[i] = page; rc = -1; ++p; @@ -255,9 +239,6 @@ static int write_batch(struct xc_sr_context *ctx) err: if ( guest_mapping ) xenforeignmemory_un
[PATCH v1 12/23] tools/guest: save: move rec_pfns array
Remove allocation from hotpath, move rec_pfns array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 11 +-- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index f315b4f526..81158a4f4d 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -221,6 +221,8 @@ struct xc_sr_save_arrays { int errors[MAX_BATCH_SIZE]; /* write_batch: iovec[] for writev(). */ struct iovec iov[MAX_BATCH_SIZE + 4]; +/* write_batch */ +uint64_t rec_pfns[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 385a591332..4d7fbe09ed 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -96,7 +96,7 @@ static int write_batch(struct xc_sr_context *ctx) unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0; unsigned int nr_pfns = ctx->save.nr_batch_pfns; void *page, *orig_page; -uint64_t *rec_pfns = NULL; +uint64_t *rec_pfns = ctx->save.m->rec_pfns; struct iovec *iov = ctx->save.m->iov; int iovcnt = 0; struct xc_sr_rec_page_data_header hdr = { 0 }; struct xc_sr_record rec = { @@ -201,14 +201,6 @@ static int write_batch(struct xc_sr_context *ctx) } } -rec_pfns = malloc(nr_pfns * sizeof(*rec_pfns)); -if ( !rec_pfns ) -{ -ERROR("Unable to allocate %zu bytes of memory for page data pfn list", - nr_pfns * sizeof(*rec_pfns)); -goto err; -} - hdr.count = nr_pfns; rec.length = sizeof(hdr); @@ -259,7 +251,6 @@ static int write_batch(struct xc_sr_context *ctx) rc = ctx->save.nr_batch_pfns = 0; err: -free(rec_pfns); if ( guest_mapping ) xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped); for ( i = 0; local_pages && i < nr_pfns; ++i )
[PATCH v1 22/23] tools/guest: restore: split handle_page_data
handle_page_data must be able to read directly into mapped guest memory. This will avoid unneccesary memcpy calls for data that can be consumed verbatim. Split the various steps of record processing: - move processing to handle_buffered_page_data - adjust xenforeignmemory_map to set errno in case of failure - adjust verify mode to set errno in case of failure This change is preparation for future changes in handle_page_data, no change in behavior is intended. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 9 + tools/libs/guest/xg_sr_restore.c | 343 --- 2 files changed, 231 insertions(+), 121 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 66d1b0dfe6..7ec8867b88 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -230,9 +230,14 @@ struct xc_sr_restore_arrays { /* process_page_data */ xen_pfn_t mfns[MAX_BATCH_SIZE]; int map_errs[MAX_BATCH_SIZE]; +void *guest_data[MAX_BATCH_SIZE]; + /* populate_pfns */ xen_pfn_t pp_mfns[MAX_BATCH_SIZE]; xen_pfn_t pp_pfns[MAX_BATCH_SIZE]; + +/* Must be the last member */ +struct xc_sr_rec_page_data_header pages; }; struct xc_sr_context @@ -323,7 +328,11 @@ struct xc_sr_context /* Sender has invoked verify mode on the stream. */ bool verify; +void *verify_buf; + struct xc_sr_restore_arrays *m; +void *guest_mapping; +uint32_t nr_mapped_pages; } restore; }; diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 93f69b9ba8..060f3d1f4e 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -186,123 +186,18 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, return rc; } -/* - * Given a list of pfns, their types, and a block of page data from the - * stream, populate and record their types, map the relevant subset and copy - * the data into the guest. - */ -static int process_page_data(struct xc_sr_context *ctx, unsigned int count, - xen_pfn_t *pfns, uint32_t *types, void *page_data) +static int handle_static_data_end_v2(struct xc_sr_context *ctx) { -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = ctx->restore.m->mfns; -int *map_errs = ctx->restore.m->map_errs; -int rc; -void *mapping = NULL, *guest_page = NULL; -unsigned int i, /* i indexes the pfns from the record. */ -j, /* j indexes the subset of pfns we decide to map. */ -nr_pages = 0; - -rc = populate_pfns(ctx, count, pfns, types); -if ( rc ) -{ -ERROR("Failed to populate pfns for batch of %u pages", count); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]); - -if ( page_type_has_stream_data(types[i]) == true ) -mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]); -} - -/* Nothing to do? */ -if ( nr_pages == 0 ) -goto done; - -mapping = guest_page = xenforeignmemory_map( -xch->fmem, ctx->domid, PROT_READ | PROT_WRITE, -nr_pages, mfns, map_errs); -if ( !mapping ) -{ -rc = -1; -PERROR("Unable to map %u mfns for %u pages of data", - nr_pages, count); -goto err; -} - -for ( i = 0, j = 0; i < count; ++i ) -{ -if ( page_type_has_stream_data(types[i]) == false ) -continue; - -if ( map_errs[j] ) -{ -rc = -1; -ERROR("Mapping pfn %#"PRIpfn" (mfn %#"PRIpfn", type %#"PRIx32") failed with %d", - pfns[i], mfns[j], types[i], map_errs[j]); -goto err; -} - -/* Undo page normalisation done by the saver. */ -rc = ctx->restore.ops.localise_page(ctx, types[i], page_data); -if ( rc ) -{ -ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")", - pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT); -goto err; -} - -if ( ctx->restore.verify ) -{ -/* Verify mode - compare incoming data to what we already have. */ -if ( memcmp(guest_page, page_data, PAGE_SIZE) ) -ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")", - pfns[i], types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT); -} -else -{ -/* Regular mode - copy incoming data into place. */ -memcpy(guest_page, page_data, PAGE_SIZE); -} - -++j; -guest_page += PAGE_SIZE; -page_data += PAGE_SIZE; -} - - done: -rc = 0; - - err: -if ( mapping ) -xenforeignmemory_unmap(xch->fmem, mapping, nr_pages); - -return rc; -} +int rc = 0; -/* - * Validate a PAGE_DATA record from the stream, and p
[PATCH v1 04/23] tools: unify type checking for data pfns in migration stream
Introduce a helper which decides if a given pfn type has data for the migration stream. No change in behavior intended. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 17 tools/libs/guest/xg_sr_restore.c | 34 +--- tools/libs/guest/xg_sr_save.c| 14 ++--- 3 files changed, 24 insertions(+), 41 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index cc3ad1c394..70e328e951 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -455,6 +455,23 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, /* Handle a STATIC_DATA_END record. */ int handle_static_data_end(struct xc_sr_context *ctx); +static inline bool page_type_has_stream_data(uint32_t type) +{ +bool ret; + +switch (type) +{ +case XEN_DOMCTL_PFINFO_XTAB: +case XEN_DOMCTL_PFINFO_XALLOC: +case XEN_DOMCTL_PFINFO_BROKEN: +ret = false; +break; +default: +ret = true; +break; +} +return ret; +} #endif /* * Local variables: diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index f1c3169229..0332ae9f32 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -152,9 +152,8 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, for ( i = 0; i < count; ++i ) { -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && +if ( (!types || + (types && page_type_has_stream_data(types[i]) == true)) && !pfn_is_populated(ctx, original_pfns[i]) ) { rc = pfn_set_populated(ctx, original_pfns[i]); @@ -233,25 +232,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, { ctx->restore.ops.set_page_type(ctx, pfns[i], types[i]); -switch ( types[i] ) -{ -case XEN_DOMCTL_PFINFO_NOTAB: - -case XEN_DOMCTL_PFINFO_L1TAB: -case XEN_DOMCTL_PFINFO_L1TAB | XEN_DOMCTL_PFINFO_LPINTAB: - -case XEN_DOMCTL_PFINFO_L2TAB: -case XEN_DOMCTL_PFINFO_L2TAB | XEN_DOMCTL_PFINFO_LPINTAB: - -case XEN_DOMCTL_PFINFO_L3TAB: -case XEN_DOMCTL_PFINFO_L3TAB | XEN_DOMCTL_PFINFO_LPINTAB: - -case XEN_DOMCTL_PFINFO_L4TAB: -case XEN_DOMCTL_PFINFO_L4TAB | XEN_DOMCTL_PFINFO_LPINTAB: - +if ( page_type_has_stream_data(types[i]) == true ) mfns[nr_pages++] = ctx->restore.ops.pfn_to_gfn(ctx, pfns[i]); -break; -} } /* Nothing to do? */ @@ -271,14 +253,8 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, for ( i = 0, j = 0; i < count; ++i ) { -switch ( types[i] ) -{ -case XEN_DOMCTL_PFINFO_XTAB: -case XEN_DOMCTL_PFINFO_BROKEN: -case XEN_DOMCTL_PFINFO_XALLOC: -/* No page data to deal with. */ +if ( page_type_has_stream_data(types[i]) == false ) continue; -} if ( map_errs[j] ) { @@ -413,7 +389,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) goto err; } -if ( type < XEN_DOMCTL_PFINFO_BROKEN ) +if ( page_type_has_stream_data(type) == true ) /* NOTAB and all L1 through L4 tables (including pinned) should * have a page worth of data in the record. */ pages_of_data++; diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 044d0ae3aa..0546d3d9e6 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -153,13 +153,8 @@ static int write_batch(struct xc_sr_context *ctx) goto err; } -switch ( types[i] ) -{ -case XEN_DOMCTL_PFINFO_BROKEN: -case XEN_DOMCTL_PFINFO_XALLOC: -case XEN_DOMCTL_PFINFO_XTAB: +if ( page_type_has_stream_data(types[i]) == false ) continue; -} mfns[nr_pages++] = mfns[i]; } @@ -177,13 +172,8 @@ static int write_batch(struct xc_sr_context *ctx) for ( i = 0, p = 0; i < nr_pfns; ++i ) { -switch ( types[i] ) -{ -case XEN_DOMCTL_PFINFO_BROKEN: -case XEN_DOMCTL_PFINFO_XALLOC: -case XEN_DOMCTL_PFINFO_XTAB: +if ( page_type_has_stream_data(types[i]) == false ) continue; -} if ( errors[p] ) {
[PATCH v1 10/23] tools/guest: save: move errors array
Remove allocation from hotpath, move errors array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 7 ++- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 3cbadb607b..71b676c0e0 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -217,6 +217,8 @@ struct xc_sr_save_arrays { xen_pfn_t mfns[MAX_BATCH_SIZE]; /* write_batch: Types of the batch pfns. */ xen_pfn_t types[MAX_BATCH_SIZE]; +/* write_batch: Errors from attempting to map the gfns. */ +int errors[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 6678df95b8..cdc27a9cde 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -92,7 +92,7 @@ static int write_batch(struct xc_sr_context *ctx) void *guest_mapping = NULL; void **guest_data = NULL; void **local_pages = NULL; -int *errors = NULL, rc = -1; +int *errors = ctx->save.m->errors, rc = -1; unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0; unsigned int nr_pfns = ctx->save.nr_batch_pfns; void *page, *orig_page; @@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx) assert(nr_pfns != 0); -/* Errors from attempting to map the gfns. */ -errors = malloc(nr_pfns * sizeof(*errors)); /* Pointers to page data to send. Mapped gfns or local allocations. */ guest_data = calloc(nr_pfns, sizeof(*guest_data)); /* Pointers to locally allocated pages. Need freeing. */ @@ -114,7 +112,7 @@ static int write_batch(struct xc_sr_context *ctx) /* iovec[] for writev(). */ iov = malloc((nr_pfns + 4) * sizeof(*iov)); -if ( !errors || !guest_data || !local_pages || !iov ) +if ( !guest_data || !local_pages || !iov ) { ERROR("Unable to allocate arrays for a batch of %u pages", nr_pfns); @@ -271,7 +269,6 @@ static int write_batch(struct xc_sr_context *ctx) free(iov); free(local_pages); free(guest_data); -free(errors); return rc; }
[PATCH v1 18/23] tools/guest: restore: move map_errs array
Remove allocation from hotpath, move map_errs array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 1 + tools/libs/guest/xg_sr_restore.c | 12 +--- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 5731a5c186..eba3a49877 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -229,6 +229,7 @@ struct xc_sr_restore_arrays { uint32_t types[MAX_BATCH_SIZE]; /* process_page_data */ xen_pfn_t mfns[MAX_BATCH_SIZE]; +int map_errs[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 3ba089f862..94c329032f 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -206,21 +206,13 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, { xc_interface *xch = ctx->xch; xen_pfn_t *mfns = ctx->restore.m->mfns; -int *map_errs = malloc(count * sizeof(*map_errs)); +int *map_errs = ctx->restore.m->map_errs; int rc; void *mapping = NULL, *guest_page = NULL; unsigned int i, /* i indexes the pfns from the record. */ j, /* j indexes the subset of pfns we decide to map. */ nr_pages = 0; -if ( !map_errs ) -{ -rc = -1; -ERROR("Failed to allocate %zu bytes to process page data", - count * (sizeof(*mfns) + sizeof(*map_errs))); -goto err; -} - rc = populate_pfns(ctx, count, pfns, types); if ( rc ) { @@ -298,8 +290,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, if ( mapping ) xenforeignmemory_unmap(xch->fmem, mapping, nr_pages); -free(map_errs); - return rc; }
[PATCH v1 11/23] tools/guest: save: move iov array
Remove allocation from hotpath, move iov array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 7 ++- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 71b676c0e0..f315b4f526 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -219,6 +219,8 @@ struct xc_sr_save_arrays { xen_pfn_t types[MAX_BATCH_SIZE]; /* write_batch: Errors from attempting to map the gfns. */ int errors[MAX_BATCH_SIZE]; +/* write_batch: iovec[] for writev(). */ +struct iovec iov[MAX_BATCH_SIZE + 4]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index cdc27a9cde..385a591332 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -97,7 +97,7 @@ static int write_batch(struct xc_sr_context *ctx) unsigned int nr_pfns = ctx->save.nr_batch_pfns; void *page, *orig_page; uint64_t *rec_pfns = NULL; -struct iovec *iov = NULL; int iovcnt = 0; +struct iovec *iov = ctx->save.m->iov; int iovcnt = 0; struct xc_sr_rec_page_data_header hdr = { 0 }; struct xc_sr_record rec = { .type = REC_TYPE_PAGE_DATA, @@ -109,10 +109,8 @@ static int write_batch(struct xc_sr_context *ctx) guest_data = calloc(nr_pfns, sizeof(*guest_data)); /* Pointers to locally allocated pages. Need freeing. */ local_pages = calloc(nr_pfns, sizeof(*local_pages)); -/* iovec[] for writev(). */ -iov = malloc((nr_pfns + 4) * sizeof(*iov)); -if ( !guest_data || !local_pages || !iov ) +if ( !guest_data || !local_pages ) { ERROR("Unable to allocate arrays for a batch of %u pages", nr_pfns); @@ -266,7 +264,6 @@ static int write_batch(struct xc_sr_context *ctx) xenforeignmemory_unmap(xch->fmem, guest_mapping, nr_pages_mapped); for ( i = 0; local_pages && i < nr_pfns; ++i ) free(local_pages[i]); -free(iov); free(local_pages); free(guest_data);
[PATCH v1 20/23] tools/guest: restore: move pfns array in populate_pfns
Remove allocation from hotpath, move populate_pfns' pfns array into preallocated space. Use some prefix to avoid conflict with an array used in handle_page_data. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 1 + tools/libs/guest/xg_sr_restore.c | 11 +-- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 96a77b5969..3fe665b91d 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -232,6 +232,7 @@ struct xc_sr_restore_arrays { int map_errs[MAX_BATCH_SIZE]; /* populate_pfns */ xen_pfn_t pp_mfns[MAX_BATCH_SIZE]; +xen_pfn_t pp_pfns[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 85a32aaed2..71b39612ee 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -139,17 +139,10 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, { xc_interface *xch = ctx->xch; xen_pfn_t *mfns = ctx->restore.m->pp_mfns, -*pfns = malloc(count * sizeof(*pfns)); +*pfns = ctx->restore.m->pp_pfns; unsigned int i, nr_pfns = 0; int rc = -1; -if ( !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - for ( i = 0; i < count; ++i ) { if ( (!types || @@ -190,8 +183,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, rc = 0; err: -free(pfns); - return rc; }
[PATCH v1 01/23] tools: add readv_exact to libxenctrl
Read a batch of iovec's. In the common case of short reads, finish individual iov's with read_exact. Signed-off-by: Olaf Hering --- tools/libs/ctrl/xc_private.c | 54 +++- tools/libs/ctrl/xc_private.h | 1 + 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/tools/libs/ctrl/xc_private.c b/tools/libs/ctrl/xc_private.c index d94f846686..a86ed213a5 100644 --- a/tools/libs/ctrl/xc_private.c +++ b/tools/libs/ctrl/xc_private.c @@ -659,8 +659,23 @@ int write_exact(int fd, const void *data, size_t size) #if defined(__MINIOS__) /* - * MiniOS's libc doesn't know about writev(). Implement it as multiple write()s. + * MiniOS's libc doesn't know about readv/writev(). + * Implement it as multiple read/write()s. */ +int readv_exact(int fd, const struct iovec *iov, int iovcnt) +{ +int rc, i; + +for ( i = 0; i < iovcnt; ++i ) +{ +rc = read_exact(fd, iov[i].iov_base, iov[i].iov_len); +if ( rc ) +return rc; +} + +return 0; +} + int writev_exact(int fd, const struct iovec *iov, int iovcnt) { int rc, i; @@ -675,6 +689,44 @@ int writev_exact(int fd, const struct iovec *iov, int iovcnt) return 0; } #else +int readv_exact(int fd, const struct iovec *iov, int iovcnt) +{ +int rc = 0, idx = 0; +ssize_t len; + +while ( idx < iovcnt ) +{ +len = readv(fd, &iov[idx], min(iovcnt - idx, IOV_MAX)); +if ( len == -1 && errno == EINTR ) +continue; +if ( len <= 0 ) +{ +rc = -1; +goto out; +} +while ( len > 0 && idx < iovcnt ) +{ +if ( len >= iov[idx].iov_len ) +{ +len -= iov[idx].iov_len; +} +else +{ +void *p = iov[idx].iov_base + len; +size_t l = iov[idx].iov_len - len; + +rc = read_exact(fd, p, l); +if ( rc ) +goto out; +len = 0; +} +idx++; +} +} +out: +return rc; +} + int writev_exact(int fd, const struct iovec *iov, int iovcnt) { struct iovec *local_iov = NULL; diff --git a/tools/libs/ctrl/xc_private.h b/tools/libs/ctrl/xc_private.h index f0b5f83ac8..5d2c7274fb 100644 --- a/tools/libs/ctrl/xc_private.h +++ b/tools/libs/ctrl/xc_private.h @@ -441,6 +441,7 @@ int xc_flush_mmu_updates(xc_interface *xch, struct xc_mmu *mmu); /* Return 0 on success; -1 on error setting errno. */ int read_exact(int fd, void *data, size_t size); /* EOF => -1, errno=0 */ +int readv_exact(int fd, const struct iovec *iov, int iovcnt); int write_exact(int fd, const void *data, size_t size); int writev_exact(int fd, const struct iovec *iov, int iovcnt);
[PATCH v1 13/23] tools/guest: save: move guest_data array
Remove allocation from hotpath, move guest_data array into preallocated space. Because this was allocated with calloc: Adjust the loop to clear unused entries as needed. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 11 ++- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 81158a4f4d..33e66678c6 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -223,6 +223,8 @@ struct xc_sr_save_arrays { struct iovec iov[MAX_BATCH_SIZE + 4]; /* write_batch */ uint64_t rec_pfns[MAX_BATCH_SIZE]; +/* write_batch: Pointers to page data to send. Mapped gfns or local allocations. */ +void *guest_data[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 4d7fbe09ed..658f834ae8 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -90,7 +90,7 @@ static int write_batch(struct xc_sr_context *ctx) xc_interface *xch = ctx->xch; xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types; void *guest_mapping = NULL; -void **guest_data = NULL; +void **guest_data = ctx->save.m->guest_data; void **local_pages = NULL; int *errors = ctx->save.m->errors, rc = -1; unsigned int i, p, nr_pages = 0, nr_pages_mapped = 0; @@ -105,12 +105,10 @@ static int write_batch(struct xc_sr_context *ctx) assert(nr_pfns != 0); -/* Pointers to page data to send. Mapped gfns or local allocations. */ -guest_data = calloc(nr_pfns, sizeof(*guest_data)); /* Pointers to locally allocated pages. Need freeing. */ local_pages = calloc(nr_pfns, sizeof(*local_pages)); -if ( !guest_data || !local_pages ) +if ( !local_pages ) { ERROR("Unable to allocate arrays for a batch of %u pages", nr_pfns); @@ -166,7 +164,10 @@ static int write_batch(struct xc_sr_context *ctx) for ( i = 0, p = 0; i < nr_pfns; ++i ) { if ( page_type_has_stream_data(types[i]) == false ) +{ +guest_data[i] = NULL; continue; +} if ( errors[p] ) { @@ -183,6 +184,7 @@ static int write_batch(struct xc_sr_context *ctx) if ( rc ) { +guest_data[i] = NULL; if ( rc == -1 && errno == EAGAIN ) { set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages); @@ -256,7 +258,6 @@ static int write_batch(struct xc_sr_context *ctx) for ( i = 0; local_pages && i < nr_pfns; ++i ) free(local_pages[i]); free(local_pages); -free(guest_data); return rc; }
[PATCH v1 17/23] tools/guest: restore: move mfns array
Remove allocation from hotpath, move mfns array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_restore.c | 5 ++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 0ceecb289d..5731a5c186 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -227,6 +227,8 @@ struct xc_sr_restore_arrays { /* handle_page_data */ xen_pfn_t pfns[MAX_BATCH_SIZE]; uint32_t types[MAX_BATCH_SIZE]; +/* process_page_data */ +xen_pfn_t mfns[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 7729071e41..3ba089f862 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -205,7 +205,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, xen_pfn_t *pfns, uint32_t *types, void *page_data) { xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)); +xen_pfn_t *mfns = ctx->restore.m->mfns; int *map_errs = malloc(count * sizeof(*map_errs)); int rc; void *mapping = NULL, *guest_page = NULL; @@ -213,7 +213,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, j, /* j indexes the subset of pfns we decide to map. */ nr_pages = 0; -if ( !mfns || !map_errs ) +if ( !map_errs ) { rc = -1; ERROR("Failed to allocate %zu bytes to process page data", @@ -299,7 +299,6 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned int count, xenforeignmemory_unmap(xch->fmem, mapping, nr_pages); free(map_errs); -free(mfns); return rc; }
[PATCH v1 07/23] tools/guest: save: move batch_pfns
The batch_pfns array is already allocated in advance. Move it into the preallocated area. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 +- tools/libs/guest/xg_sr_save.c | 25 +++-- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 62bc87b5f4..c78a07b8f8 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -212,6 +212,7 @@ static inline int update_blob(struct xc_sr_blob *blob, } struct xc_sr_save_arrays { +xen_pfn_t batch_pfns[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { @@ -249,7 +250,6 @@ struct xc_sr_context struct precopy_stats stats; -xen_pfn_t *batch_pfns; unsigned int nr_batch_pfns; unsigned long *deferred_pages; unsigned long nr_deferred_pages; diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 1e3c8eff2f..597e638c59 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -77,7 +77,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx) /* * Writes a batch of memory as a PAGE_DATA record into the stream. The batch - * is constructed in ctx->save.batch_pfns. + * is constructed in ctx->save.m->batch_pfns. * * This function: * - gets the types for each pfn in the batch. @@ -128,12 +128,12 @@ static int write_batch(struct xc_sr_context *ctx) for ( i = 0; i < nr_pfns; ++i ) { types[i] = mfns[i] = ctx->save.ops.pfn_to_gfn(ctx, - ctx->save.batch_pfns[i]); + ctx->save.m->batch_pfns[i]); /* Likely a ballooned page. */ if ( mfns[i] == INVALID_MFN ) { -set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages); +set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages); ++ctx->save.nr_deferred_pages; } } @@ -179,7 +179,7 @@ static int write_batch(struct xc_sr_context *ctx) if ( errors[p] ) { ERROR("Mapping of pfn %#"PRIpfn" (mfn %#"PRIpfn") failed %d", - ctx->save.batch_pfns[i], mfns[p], errors[p]); + ctx->save.m->batch_pfns[i], mfns[p], errors[p]); goto err; } @@ -193,7 +193,7 @@ static int write_batch(struct xc_sr_context *ctx) { if ( rc == -1 && errno == EAGAIN ) { -set_bit(ctx->save.batch_pfns[i], ctx->save.deferred_pages); +set_bit(ctx->save.m->batch_pfns[i], ctx->save.deferred_pages); ++ctx->save.nr_deferred_pages; types[i] = XEN_DOMCTL_PFINFO_XTAB; --nr_pages; @@ -224,7 +224,7 @@ static int write_batch(struct xc_sr_context *ctx) rec.length += nr_pages * PAGE_SIZE; for ( i = 0; i < nr_pfns; ++i ) -rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.batch_pfns[i]; +rec_pfns[i] = ((uint64_t)(types[i]) << 32) | ctx->save.m->batch_pfns[i]; iov[0].iov_base = &rec.type; iov[0].iov_len = sizeof(rec.type); @@ -296,9 +296,9 @@ static int flush_batch(struct xc_sr_context *ctx) if ( !rc ) { -VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.batch_pfns, +VALGRIND_MAKE_MEM_UNDEFINED(ctx->save.m->batch_pfns, MAX_BATCH_SIZE * -sizeof(*ctx->save.batch_pfns)); +sizeof(*ctx->save.m->batch_pfns)); } return rc; @@ -315,7 +315,7 @@ static int add_to_batch(struct xc_sr_context *ctx, xen_pfn_t pfn) rc = flush_batch(ctx); if ( rc == 0 ) -ctx->save.batch_pfns[ctx->save.nr_batch_pfns++] = pfn; +ctx->save.m->batch_pfns[ctx->save.nr_batch_pfns++] = pfn; return rc; } @@ -850,14 +850,12 @@ static int setup(struct xc_sr_context *ctx) dirty_bitmap = xc_hypercall_buffer_alloc_pages( xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->save.p2m_size))); -ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE * - sizeof(*ctx->save.batch_pfns)); ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size); ctx->save.m = malloc(sizeof(*ctx->save.m)); -if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages ) +if ( !ctx->save.m || !dirty_bitmap || !ctx->save.deferred_pages ) { -ERROR("Unable to allocate memory for dirty bitmaps, batch pfns and" +ERROR("Unable to allocate memory for dirty bitmaps and" " deferred pages"); rc = -1; errno = ENOMEM; @@ -886,7 +884,6 @@ static void cleanup(struct xc_sr_context *ctx) xc_hypercall_buffer_free_pages(xch, dirty_bitmap,
[PATCH v1 16/23] tools/guest: restore: move types array
Remove allocation from hotpath, move types array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 1 + tools/libs/guest/xg_sr_restore.c | 12 +--- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 516d9b9fb5..0ceecb289d 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -226,6 +226,7 @@ struct xc_sr_save_arrays { struct xc_sr_restore_arrays { /* handle_page_data */ xen_pfn_t pfns[MAX_BATCH_SIZE]; +uint32_t types[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 7d1447e86c..7729071e41 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -316,7 +316,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) int rc = -1; xen_pfn_t *pfns = ctx->restore.m->pfns, pfn; -uint32_t *types = NULL, type; +uint32_t *types = ctx->restore.m->types, type; /* * v2 compatibility only exists for x86 streams. This is a bit of a @@ -363,14 +363,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) goto err; } -types = malloc(pages->count * sizeof(*types)); -if ( !types ) -{ -ERROR("Unable to allocate enough memory for %u pfns", - pages->count); -goto err; -} - for ( i = 0; i < pages->count; ++i ) { pfn = pages->pfn[i] & PAGE_DATA_PFN_MASK; @@ -410,8 +402,6 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) rc = process_page_data(ctx, pages->count, pfns, types, &pages->pfn[pages->count]); err: -free(types); - return rc; }
[PATCH v1 08/23] tools/guest: save: move mfns array
Remove allocation from hotpath, move mfns array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 7 ++- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index c78a07b8f8..0c2bef8f78 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -213,6 +213,8 @@ static inline int update_blob(struct xc_sr_blob *blob, struct xc_sr_save_arrays { xen_pfn_t batch_pfns[MAX_BATCH_SIZE]; +/* write_batch: Mfns of the batch pfns. */ +xen_pfn_t mfns[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 597e638c59..cdab288c3f 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx) static int write_batch(struct xc_sr_context *ctx) { xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = NULL, *types = NULL; +xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL; void *guest_mapping = NULL; void **guest_data = NULL; void **local_pages = NULL; @@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx) assert(nr_pfns != 0); -/* Mfns of the batch pfns. */ -mfns = malloc(nr_pfns * sizeof(*mfns)); /* Types of the batch pfns. */ types = malloc(nr_pfns * sizeof(*types)); /* Errors from attempting to map the gfns. */ @@ -118,7 +116,7 @@ static int write_batch(struct xc_sr_context *ctx) /* iovec[] for writev(). */ iov = malloc((nr_pfns + 4) * sizeof(*iov)); -if ( !mfns || !types || !errors || !guest_data || !local_pages || !iov ) +if ( !types || !errors || !guest_data || !local_pages || !iov ) { ERROR("Unable to allocate arrays for a batch of %u pages", nr_pfns); @@ -277,7 +275,6 @@ static int write_batch(struct xc_sr_context *ctx) free(guest_data); free(errors); free(types); -free(mfns); return rc; }
[PATCH v1 03/23] tools: use xc_is_known_page_type
Verify pfn type on sending side, also verify incoming batch of pfns. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_restore.c | 3 +-- tools/libs/guest/xg_sr_save.c| 6 ++ 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index b57a787519..f1c3169229 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -406,8 +406,7 @@ static int handle_page_data(struct xc_sr_context *ctx, struct xc_sr_record *rec) } type = (pages->pfn[i] & PAGE_DATA_TYPE_MASK) >> 32; -if ( ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) >= 5) && - ((type >> XEN_DOMCTL_PFINFO_LTAB_SHIFT) <= 8) ) +if ( xc_is_known_page_type(type) == false ) { ERROR("Invalid type %#"PRIx32" for pfn %#"PRIpfn" (index %u)", type, pfn, i); diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index 2ba7c3200c..044d0ae3aa 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -147,6 +147,12 @@ static int write_batch(struct xc_sr_context *ctx) for ( i = 0; i < nr_pfns; ++i ) { +if ( xc_is_known_page_type(types[i]) == false ) +{ +ERROR("Wrong type %#"PRIpfn" for pfn %#"PRIpfn, types[i], mfns[i]); +goto err; +} + switch ( types[i] ) { case XEN_DOMCTL_PFINFO_BROKEN:
[PATCH v1 06/23] tools/guest: prepare to allocate arrays once
The hotpath 'send_dirty_pages' is supposed to do just one thing: sending. The other end 'handle_page_data' is supposed to do just receiving. But instead both do other costly work like memory allocations and data moving. Do the allocations once, the array sizes are a compiletime constant. Avoid unneeded copying of data by receiving data directly into mapped guest memory. This patch is just prepartion, subsequent changes will populate the arrays. Once all changes are applied, migration of a busy HVM domU changes like that: Without this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_testing): 2020-10-29 10:23:10.711+: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 55.324905335 sec, 203 MiB/sec: Internal error 2020-10-29 10:23:35.115+: xc: show_transfer_rate: 16829632 bytes + 2097552 pages in 24.401179720 sec, 335 MiB/sec: Internal error 2020-10-29 10:23:59.436+: xc: show_transfer_rate: 16829032 bytes + 2097478 pages in 24.319025928 sec, 336 MiB/sec: Internal error 2020-10-29 10:24:23.844+: xc: show_transfer_rate: 16829024 bytes + 2097477 pages in 24.406992500 sec, 335 MiB/sec: Internal error 2020-10-29 10:24:48.292+: xc: show_transfer_rate: 16828912 bytes + 2097463 pages in 24.446489027 sec, 335 MiB/sec: Internal error 2020-10-29 10:25:01.816+: xc: show_transfer_rate: 16836080 bytes + 2098356 pages in 13.447091818 sec, 609 MiB/sec: Internal error With this series, from sr650 to sr950 (xen-4.15.20201027T173911.16a20963b3 xen_unstable): 2020-10-28 21:26:05.074+: xc: show_transfer_rate: 23663128 bytes + 2879563 pages in 52.564054368 sec, 213 MiB/sec: Internal error 2020-10-28 21:26:23.527+: xc: show_transfer_rate: 16830040 bytes + 2097603 pages in 18.450592015 sec, 444 MiB/sec: Internal error 2020-10-28 21:26:41.926+: xc: show_transfer_rate: 16830944 bytes + 2097717 pages in 18.397862306 sec, 445 MiB/sec: Internal error 2020-10-28 21:27:00.339+: xc: show_transfer_rate: 16829176 bytes + 2097498 pages in 18.411973339 sec, 445 MiB/sec: Internal error 2020-10-28 21:27:18.643+: xc: show_transfer_rate: 16828592 bytes + 2097425 pages in 18.303326695 sec, 447 MiB/sec: Internal error 2020-10-28 21:27:26.289+: xc: show_transfer_rate: 16835952 bytes + 2098342 pages in 7.579846749 sec, 1081 MiB/sec: Internal error Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 8 tools/libs/guest/xg_sr_restore.c | 8 tools/libs/guest/xg_sr_save.c| 4 +++- 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index f3a7a29298..62bc87b5f4 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -211,6 +211,12 @@ static inline int update_blob(struct xc_sr_blob *blob, return 0; } +struct xc_sr_save_arrays { +}; + +struct xc_sr_restore_arrays { +}; + struct xc_sr_context { xc_interface *xch; @@ -248,6 +254,7 @@ struct xc_sr_context unsigned long *deferred_pages; unsigned long nr_deferred_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; +struct xc_sr_save_arrays *m; } save; struct /* Restore data. */ @@ -299,6 +306,7 @@ struct xc_sr_context /* Sender has invoked verify mode on the stream. */ bool verify; +struct xc_sr_restore_arrays *m; } restore; }; diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 0332ae9f32..4a9ece9681 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -739,6 +739,13 @@ static int setup(struct xc_sr_context *ctx) } ctx->restore.allocated_rec_num = DEFAULT_BUF_RECORDS; +ctx->restore.m = malloc(sizeof(*ctx->restore.m)); +if ( !ctx->restore.m ) { +ERROR("Unable to allocate memory for arrays"); +rc = -1; +goto err; +} + err: return rc; } @@ -757,6 +764,7 @@ static void cleanup(struct xc_sr_context *ctx) xc_hypercall_buffer_free_pages( xch, dirty_bitmap, NRPAGES(bitmap_size(ctx->restore.p2m_size))); +free(ctx->restore.m); free(ctx->restore.buffered_records); free(ctx->restore.populated_pfns); diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index f58a35ddde..1e3c8eff2f 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -853,8 +853,9 @@ static int setup(struct xc_sr_context *ctx) ctx->save.batch_pfns = malloc(MAX_BATCH_SIZE * sizeof(*ctx->save.batch_pfns)); ctx->save.deferred_pages = bitmap_alloc(ctx->save.p2m_size); +ctx->save.m = malloc(sizeof(*ctx->save.m)); -if ( !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages ) +if ( !ctx->save.m || !ctx->save.batch_pfns || !dirty_bitmap || !ctx->save.deferred_pages ) { ERROR("Unable to allocate m
[PATCH v1 09/23] tools/guest: save: move types array
Remove allocation from hotpath, move types array into preallocated space. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_save.c | 7 ++- 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 0c2bef8f78..3cbadb607b 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -215,6 +215,8 @@ struct xc_sr_save_arrays { xen_pfn_t batch_pfns[MAX_BATCH_SIZE]; /* write_batch: Mfns of the batch pfns. */ xen_pfn_t mfns[MAX_BATCH_SIZE]; +/* write_batch: Types of the batch pfns. */ +xen_pfn_t types[MAX_BATCH_SIZE]; }; struct xc_sr_restore_arrays { diff --git a/tools/libs/guest/xg_sr_save.c b/tools/libs/guest/xg_sr_save.c index cdab288c3f..6678df95b8 100644 --- a/tools/libs/guest/xg_sr_save.c +++ b/tools/libs/guest/xg_sr_save.c @@ -88,7 +88,7 @@ static int write_checkpoint_record(struct xc_sr_context *ctx) static int write_batch(struct xc_sr_context *ctx) { xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = ctx->save.m->mfns, *types = NULL; +xen_pfn_t *mfns = ctx->save.m->mfns, *types = ctx->save.m->types; void *guest_mapping = NULL; void **guest_data = NULL; void **local_pages = NULL; @@ -105,8 +105,6 @@ static int write_batch(struct xc_sr_context *ctx) assert(nr_pfns != 0); -/* Types of the batch pfns. */ -types = malloc(nr_pfns * sizeof(*types)); /* Errors from attempting to map the gfns. */ errors = malloc(nr_pfns * sizeof(*errors)); /* Pointers to page data to send. Mapped gfns or local allocations. */ @@ -116,7 +114,7 @@ static int write_batch(struct xc_sr_context *ctx) /* iovec[] for writev(). */ iov = malloc((nr_pfns + 4) * sizeof(*iov)); -if ( !types || !errors || !guest_data || !local_pages || !iov ) +if ( !errors || !guest_data || !local_pages || !iov ) { ERROR("Unable to allocate arrays for a batch of %u pages", nr_pfns); @@ -274,7 +272,6 @@ static int write_batch(struct xc_sr_context *ctx) free(local_pages); free(guest_data); free(errors); -free(types); return rc; }
[PATCH v1 23/23] tools/guest: restore: write data directly into guest
Read incoming migration stream directly into the guest memory. This avoids the memory allocation and copying, and the resulting performance penalty. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 1 + tools/libs/guest/xg_sr_restore.c | 132 ++- 2 files changed, 129 insertions(+), 4 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 7ec8867b88..f76af23bcc 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -231,6 +231,7 @@ struct xc_sr_restore_arrays { xen_pfn_t mfns[MAX_BATCH_SIZE]; int map_errs[MAX_BATCH_SIZE]; void *guest_data[MAX_BATCH_SIZE]; +struct iovec iov[MAX_BATCH_SIZE]; /* populate_pfns */ xen_pfn_t pp_mfns[MAX_BATCH_SIZE]; diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 060f3d1f4e..2f575d7dd9 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -392,6 +392,122 @@ err: return rc; } +/* + * Handle PAGE_DATA record from the stream. + * Given a list of pfns, their types, and a block of page data from the + * stream, populate and record their types, map the relevant subset and copy + * the data into the guest. + */ +static int handle_incoming_page_data(struct xc_sr_context *ctx, + struct xc_sr_rhdr *rhdr) +{ +xc_interface *xch = ctx->xch; +struct xc_sr_restore_arrays *m = ctx->restore.m; +struct xc_sr_rec_page_data_header *pages = &m->pages; +uint64_t *pfn_nums = m->pages.pfn; +uint32_t i; +int rc, iov_idx; + +rc = handle_static_data_end_v2(ctx); +if ( rc ) +goto err; + +/* First read and verify the header */ +rc = read_exact(ctx->fd, pages, sizeof(*pages)); +if ( rc ) +{ +PERROR("Could not read rec_pfn header"); +goto err; +} + +if ( verify_rec_page_hdr(ctx, rhdr->length, pages) == false ) +{ +rc = -1; +goto err; +} + +/* Then read and verify the incoming pfn numbers */ +rc = read_exact(ctx->fd, pfn_nums, sizeof(*pfn_nums) * pages->count); +if ( rc ) +{ +PERROR("Could not read rec_pfn data"); +goto err; +} + +if ( verify_rec_page_pfns(ctx, rhdr->length, pages) == false ) +{ +rc = -1; +goto err; +} + +/* Finally read and verify the incoming pfn data */ +rc = map_guest_pages(ctx, pages); +if ( rc ) +goto err; + +/* Prepare read buffers, either guest or throw away memory */ +for ( i = 0, iov_idx = 0; i < pages->count; i++ ) +{ +if ( !m->guest_data[i] ) +continue; + +m->iov[iov_idx].iov_len = PAGE_SIZE; +if ( ctx->restore.verify ) +m->iov[iov_idx].iov_base = ctx->restore.verify_buf + i * PAGE_SIZE; +else +m->iov[iov_idx].iov_base = m->guest_data[i]; +iov_idx++; +} + +if ( !iov_idx ) +goto done; + +rc = readv_exact(ctx->fd, m->iov, iov_idx); +if ( rc ) +{ +PERROR("read of %d pages failed", iov_idx); +goto err; +} + +/* Post-processing of pfn data */ +for ( i = 0, iov_idx = 0; i < pages->count; i++ ) +{ +if ( !m->guest_data[i] ) +continue; + +rc = ctx->restore.ops.localise_page(ctx, m->types[i], m->iov[iov_idx].iov_base); +if ( rc ) +{ +ERROR("Failed to localise pfn %#"PRIpfn" (type %#"PRIx32")", + m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT); +goto err; + +} + +if ( ctx->restore.verify ) +{ +if ( memcmp(m->guest_data[i], m->iov[iov_idx].iov_base, PAGE_SIZE) ) +{ +ERROR("verify pfn %#"PRIpfn" failed (type %#"PRIx32")", + m->pfns[i], m->types[i] >> XEN_DOMCTL_PFINFO_LTAB_SHIFT); +} +} + +iov_idx++; +} + +done: +rc = 0; + +err: +if ( ctx->restore.guest_mapping ) +{ +xenforeignmemory_unmap(xch->fmem, ctx->restore.guest_mapping, ctx->restore.nr_mapped_pages); +ctx->restore.guest_mapping = NULL; +} +return rc; +} + /* * Handle PAGE_DATA record from an existing buffer * Given a list of pfns, their types, and a block of page data from the @@ -773,11 +889,19 @@ static int process_incoming_record_header(struct xc_sr_context *ctx, struct xc_s struct xc_sr_record rec; int rc; -rc = read_record_data(ctx, ctx->fd, rhdr, &rec); -if ( rc ) -return rc; +switch ( rhdr->type ) +{ +case REC_TYPE_PAGE_DATA: +rc = handle_incoming_page_data(ctx, rhdr); +break; +default: +rc = read_record_data(ctx, ctx->fd, rhdr, &rec); +if ( rc == 0 ) +rc = process_buffered_record(ctx, &rec);; +break; +} -return process_buffered_record(ctx, &rec); +return rc; }
[PATCH v1 19/23] tools/guest: restore: move mfns array in populate_pfns
Remove allocation from hotpath, move populate_pfns mfns array into preallocated space. Use some prefix to avoid conflict with an array used in handle_page_data. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.h | 2 ++ tools/libs/guest/xg_sr_restore.c | 5 ++--- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index eba3a49877..96a77b5969 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -230,6 +230,8 @@ struct xc_sr_restore_arrays { /* process_page_data */ xen_pfn_t mfns[MAX_BATCH_SIZE]; int map_errs[MAX_BATCH_SIZE]; +/* populate_pfns */ +xen_pfn_t pp_mfns[MAX_BATCH_SIZE]; }; struct xc_sr_context diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 94c329032f..85a32aaed2 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -138,12 +138,12 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, const xen_pfn_t *original_pfns, const uint32_t *types) { xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), +xen_pfn_t *mfns = ctx->restore.m->pp_mfns, *pfns = malloc(count * sizeof(*pfns)); unsigned int i, nr_pfns = 0; int rc = -1; -if ( !mfns || !pfns ) +if ( !pfns ) { ERROR("Failed to allocate %zu bytes for populating the physmap", 2 * count * sizeof(*mfns)); @@ -191,7 +191,6 @@ int populate_pfns(struct xc_sr_context *ctx, unsigned int count, err: free(pfns); -free(mfns); return rc; }
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On Thu, Oct 29, 2020 at 05:59:54PM +0100, Paolo Bonzini wrote: > On 29/10/20 17:56, Arvind Sankar wrote: > >> For those two just add: > >>struct apic *apic = x86_system_apic; > >> before all the assignments. > >> Less churn and much better code. > >> > > Why would it be better code? > > > > I think he means the compiler produces better code, because it won't > read the global variable repeatedly. Not sure if that's true,(*) but I > think I do prefer that version if Arnd wants to do that tweak. > > Paolo > > (*) if it is, it might also be due to Linux using -fno-strict-aliasing > Nope, it doesn't read it multiple times. I guess it still assumes that apic isn't in the middle of what it points to: it would reload the address if the first element of *apic was modified, but doesn't for other elements. Interesting.
[PATCH v1 21/23] tools/guest: restore: split record processing
handle_page_data must be able to read directly into mapped guest memory. This will avoid unneccesary memcpy calls for data which can be consumed verbatim. Rearrange the code to allow decisions based on the incoming record. This change is preparation for future changes in handle_page_data, no change in behavior is intended. Signed-off-by: Olaf Hering --- tools/libs/guest/xg_sr_common.c | 33 - tools/libs/guest/xg_sr_common.h | 4 ++- tools/libs/guest/xg_sr_restore.c | 49 ++-- tools/libs/guest/xg_sr_save.c| 7 - 4 files changed, 63 insertions(+), 30 deletions(-) diff --git a/tools/libs/guest/xg_sr_common.c b/tools/libs/guest/xg_sr_common.c index 17567ab133..cabde4ef74 100644 --- a/tools/libs/guest/xg_sr_common.c +++ b/tools/libs/guest/xg_sr_common.c @@ -91,26 +91,33 @@ int write_split_record(struct xc_sr_context *ctx, struct xc_sr_record *rec, return -1; } -int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec) +int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr) { xc_interface *xch = ctx->xch; -struct xc_sr_rhdr rhdr; -size_t datasz; -if ( read_exact(fd, &rhdr, sizeof(rhdr)) ) +if ( read_exact(fd, rhdr, sizeof(*rhdr)) ) { PERROR("Failed to read Record Header from stream"); return -1; } -if ( rhdr.length > REC_LENGTH_MAX ) +if ( rhdr->length > REC_LENGTH_MAX ) { -ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr.type, - rec_type_to_str(rhdr.type), rhdr.length, REC_LENGTH_MAX); +ERROR("Record (0x%08x, %s) length %#x exceeds max (%#x)", rhdr->type, + rec_type_to_str(rhdr->type), rhdr->length, REC_LENGTH_MAX); return -1; } -datasz = ROUNDUP(rhdr.length, REC_ALIGN_ORDER); +return 0; +} + +int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr, + struct xc_sr_record *rec) +{ +xc_interface *xch = ctx->xch; +size_t datasz; + +datasz = ROUNDUP(rhdr->length, REC_ALIGN_ORDER); if ( datasz ) { @@ -119,7 +126,7 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec) if ( !rec->data ) { ERROR("Unable to allocate %zu bytes for record data (0x%08x, %s)", - datasz, rhdr.type, rec_type_to_str(rhdr.type)); + datasz, rhdr->type, rec_type_to_str(rhdr->type)); return -1; } @@ -128,18 +135,18 @@ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec) free(rec->data); rec->data = NULL; PERROR("Failed to read %zu bytes of data for record (0x%08x, %s)", - datasz, rhdr.type, rec_type_to_str(rhdr.type)); + datasz, rhdr->type, rec_type_to_str(rhdr->type)); return -1; } } else rec->data = NULL; -rec->type = rhdr.type; -rec->length = rhdr.length; +rec->type = rhdr->type; +rec->length = rhdr->length; return 0; -}; +} static void __attribute__((unused)) build_assertions(void) { diff --git a/tools/libs/guest/xg_sr_common.h b/tools/libs/guest/xg_sr_common.h index 3fe665b91d..66d1b0dfe6 100644 --- a/tools/libs/guest/xg_sr_common.h +++ b/tools/libs/guest/xg_sr_common.h @@ -475,7 +475,9 @@ static inline int write_record(struct xc_sr_context *ctx, * * On failure, the contents of the record structure are undefined. */ -int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); +int read_record_header(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr); +int read_record_data(struct xc_sr_context *ctx, int fd, struct xc_sr_rhdr *rhdr, + struct xc_sr_record *rec); /* * This would ideally be private in restore.c, but is needed by diff --git a/tools/libs/guest/xg_sr_restore.c b/tools/libs/guest/xg_sr_restore.c index 71b39612ee..93f69b9ba8 100644 --- a/tools/libs/guest/xg_sr_restore.c +++ b/tools/libs/guest/xg_sr_restore.c @@ -471,7 +471,7 @@ static int send_checkpoint_dirty_pfn_list(struct xc_sr_context *ctx) return rc; } -static int process_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); +static int process_buffered_record(struct xc_sr_context *ctx, struct xc_sr_record *rec); static int handle_checkpoint(struct xc_sr_context *ctx) { xc_interface *xch = ctx->xch; @@ -510,7 +510,7 @@ static int handle_checkpoint(struct xc_sr_context *ctx) for ( i = 0; i < ctx->restore.buffered_rec_num; i++ ) { -rc = process_record(ctx, &ctx->restore.buffered_records[i]); +rc = process_buffered_record(ctx, &ctx->restore.buffered_records[i]); if ( rc ) goto err; } @@ -571,10 +571,11 @@ static int handle_checkpoint(struct xc_sr_context *ctx) return rc; } -static int buffer_
Re: [PATCH] x86emul: support AVX-VNNI
On 29/10/2020 15:01, Jan Beulich wrote: > These are VEX-encoded equivalents of the EVEX-encoded AVX512-VNNI ISA > extension. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH 2/2] x86: PV shim doesn't need GRANT_TABLE
On 29/10/2020 17:10, Jan Beulich wrote: > The only reference into the code controlled by this option is from the > hypercall table, and that hypercall table entry gets overwritten. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH 1/2] x86: fix build of PV shim when !GRANT_TABLE
On 29/10/2020 17:09, Jan Beulich wrote: > To do its compat translation, shim code needs to include the compat > header. For this to work, this header first of all needs to be > generated. > > Reported-by: Andrew Cooper > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH] x86/HVM: send mapcache invalidation request to qemu regardless of preemption
On 29/10/2020 15:14, Jan Beulich wrote: > Even if only part of a hypercall completed before getting preempted, > invalidation ought to occur. Therefore fold the two return statements. > > Signed-off-by: Jan Beulich Acked-by: Andrew Cooper
Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm
On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu < masami.hirama...@linaro.org> wrote: > Hi Oleksandr, > Hi Masami [sorry for the possible format issue] > > I would like to try this on my arm64 board. > Glad to hear you are interested in this topic. > > According to your comments in the patch, I made this config file. > # cat debian.conf > name = "debian" > type = "pvh" > vcpus = 8 > memory = 512 > kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64" > ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64" > cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw" > disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ] > vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ] > virtio = 1 > vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ] > > And tried to boot a DomU, but I got below error. > > # xl create -c debian.conf > Parsing config from debian.conf > libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain > 1:unable to add virtio_disk devices > libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain > 1:xc_domain_pause failed > libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain > type for domid=1 > libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain > 1:Unable to destroy guest > libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain > 1:Destruction of domain failed > > > Could you tell me how can I test it? > I assume it is due to the lack of the virtio-disk backend (which I haven't shared yet as I focused on the IOREQ/DM support on Arm in the first place). Could you wait a little bit, I am going to share it soon. -- Regards, Oleksandr Tyshchenko
[PATCH v2] libxl: Add suppress-vmdesc to QEMU machine
The device model state saved by QMP xen-save-devices-state doesn't include the vmdesc json. When restoring an HVM, xen-load-devices-state always triggers "Expected vmdescription section, but got 0". This is not a problem when restore comes from a file. However, when QEMU runs in a linux stubdom and comes over a console, EOF is not received. This causes a delay restoring - though it does restore. Setting suppress-vmdesc skips looking for the vmdesc during restore and avoids the wait. QEMU 5.2 enables suppress-vmdesc by default for xenfv, but this change sets it manually for xenfv and xen_platform_pci=0 when -machine pc is use. QEMU commit 9850c6047b8b "migration: Allow to suppress vmdesc submission" added suppress-vmdesc in QEMU 2.3. Signed-off-by: Jason Andryuk --- QEMU 2.3 came out in 2015, so setting suppress-vmdesc unilaterally should be okay... Is this okay? --- tools/libs/light/libxl_dm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c index d1ff35dda3..3da83259c0 100644 --- a/tools/libs/light/libxl_dm.c +++ b/tools/libs/light/libxl_dm.c @@ -1778,9 +1778,9 @@ static int libxl__build_device_model_args_new(libxl__gc *gc, /* Switching here to the machine "pc" which does not add * the xen-platform device instead of the default "xenfv" machine. */ -machinearg = libxl__strdup(gc, "pc,accel=xen"); +machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on"); } else { -machinearg = libxl__strdup(gc, "xenfv"); +machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on"); } if (b_info->u.hvm.mmio_hole_memkb) { uint64_t max_ram_below_4g = (1ULL << 32) - -- 2.25.1
Re: [PATCH] x86/pv: Drop stale comment in dom0_construct_pv()
On 29/10/2020 14:37, Jan Beulich wrote: > On 29.10.2020 15:00, Andrew Cooper wrote: >> This comment has been around since c/s 1372bca0615 in 2004. It is stale, as >> it predates the introduction of struct vcpu. > That commit only moved it around; it's 22a857bde9b8 afaics from > early 2003 where it first appeared, where it had a reason: > > /* > * WARNING: The new domain must have its 'processor' field > * filled in by now !! > */ > phys_l2tab = ALLOC_FRAME_FROM_DOMAIN(); > l2tab = map_domain_mem(phys_l2tab); > memcpy(l2tab, idle_pg_table[p->processor], PAGE_SIZE); Oh yes - my simple search didn't spot the reformat. > > But yes, the comment has been stale for a long time, and I've > been wondering a number of times what it was supposed to tell > me. (I think it was already stale at the point the comment > first got altered, in 3072fef54df8.) Looks like it became stale with 99db02d5097 "Remove CPU-dependent page-directory entries." which drops the per-cpu idle_pg_table. > >> It is not obvious that it was even correct at the time. Where a vcpu (domain >> at the time) has been configured to run is unrelated to construct the >> domain's >> initial pagetables, etc. >> >> Signed-off-by: Andrew Cooper > Acked-by: Jan Beulich Thanks. I'll update the commit message. ~Andrew
[xen-4.11-testing test] 156277: tolerable FAIL - PUSHED
flight 156277 xen-4.11-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/156277/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail like 156034 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 debian-hvm-install fail like 156034 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 156034 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 156034 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 156034 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 156034 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 156034 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 156034 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 156034 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 156034 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 156034 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 156034 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 156034 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass version targeted for testing: xen e274c8bdc12eb596e55233040e8b49da27150f31 baseline version: xen 63199dfd3a0418f1677c6ccd7fe05b123af4610a Last test of basis 156034 2020-10-20 13:35:54 Z9 days Testing same since 156262 2020-10-27 18:36:52 Z2 days2 attempts People who touched revisions under test: Andrew Cooper jobs: build-amd64-xsm pass build-
Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm
On Thu, 29 Oct 2020, Oleksandr Tyshchenko wrote: > On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu > wrote: > Hi Oleksandr, > > Hi Masami > > [sorry for the possible format issue] > > > I would like to try this on my arm64 board. > > Glad to hear you are interested in this topic. > > > According to your comments in the patch, I made this config file. > # cat debian.conf > name = "debian" > type = "pvh" > vcpus = 8 > memory = 512 > kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64" > ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64" > cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw" > disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ] > vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ] > virtio = 1 > vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ] > > And tried to boot a DomU, but I got below error. > > # xl create -c debian.conf > Parsing config from debian.conf > libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain > 1:unable to add virtio_disk devices > libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain > 1:xc_domain_pause failed > libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain > type for domid=1 > libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain > 1:Unable to destroy guest > libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain > 1:Destruction of domain failed > > > Could you tell me how can I test it? > > > I assume it is due to the lack of the virtio-disk backend (which I haven't > shared yet as I focused on the IOREQ/DM support on Arm in the > first place). > Could you wait a little bit, I am going to share it soon. Do you have a quick-and-dirty hack you can share in the meantime? Even just on github as a special branch? It would be very useful to be able to have a test-driver for the new feature.
Re: Xen on RP4
On Thu, 29 Oct 2020, Jürgen Groß wrote: > On 29.10.20 01:37, Stefano Stabellini wrote: > > On Tue, 27 Oct 2020, Elliott Mitchell wrote: > > > On Mon, Oct 26, 2020 at 06:44:27PM +, Julien Grall wrote: > > > > On 26/10/2020 16:03, Elliott Mitchell wrote: > > > > > On Mon, Oct 26, 2020 at 01:31:42PM +, Julien Grall wrote: > > > > > > On 24/10/2020 06:35, Elliott Mitchell wrote: > > > > > > > ACPI has a distinct > > > > > > > means of specifying a limited DMA-width; the above fails, because > > > > > > > it > > > > > > > assumes a *device-tree*. > > > > > > > > > > > > Do you know if it would be possible to infer from the ACPI static > > > > > > table > > > > > > the DMA-width? > > > > > > > > > > Yes, and it is. Due to not knowing much about ACPI tables I don't > > > > > know > > > > > what the C code would look like though (problem is which documentation > > > > > should I be looking at first?). > > > > > > > > What you provided below is an excerpt of the DSDT. AFAIK, DSDT content > > > > is written in AML. So far the shortest implementation I have seen for > > > > the AML parser is around 5000 lines (see [1]). It might be possible to > > > > strip some the code, although I think this will still probably too big > > > > for a single workaround. > > > > > > > > What I meant by "static table" is a table that looks like a structure > > > > and can be parsed in a few lines. If we can't find on contain the DMA > > > > window, then the next best solution is to find a way to identity the > > > > platform. > > > > > > > > I don't know enough ACPI to know if this solution is possible. A good > > > > starter would probably be the ACPI spec [2]. > > > > > > Okay, that is worse than I had thought (okay, ACPI is impressively > > > complex for something nominally firmware-level). > > > > > > There are strings in the present Tianocore implementation for Raspberry > > > PI 4B which could be targeted. Notably included in the output during > > > boot listing the tables, "RPIFDN", "RPIFDN RPI" and "RPIFDN RPI4" (I'm > > > unsure how kosher these are as this wsn't implemented nor blessed by the > > > Raspberry PI Foundation). > > > > > > I strongly dislike this approach as you soon turn the Xen project into a > > > database of hardware. This is already occurring with > > > xen/arch/arm/platforms and I would love to do something about this. On > > > that thought, how about utilizing Xen's command-line for this purpose? > > > > I don't think that a command line option is a good idea: basically it is > > punting to users the task of platform detection. Also, it means that > > users will be necessarily forced to edit the uboot script or grub > > configuration file to boot. > > > > Note that even if we introduced a new command line, we wouldn't take > > away the need for xen/arch/arm/platforms anyway. > > > > It would be far better for Xen to autodetect the platform if we can. > > > > > > > Have a procedure of during installation/updates retrieve DMA limitation > > > information from the running OS and the following boot Xen will follow > > > the appropriate setup. By its nature, Domain 0 will have the information > > > needed, just becomes an issue of how hard that is to retrieve... > > > > Historically that is what we used to do for many things related to ACPI, > > but unfortunately it leads to a pretty bad architecture where Xen > > depends on Dom0 for booting rather than the other way around. (Dom0 > > should be the one requiring Xen for booting, given that Xen is higher > > privilege and boots first.) > > > > > > I think the best compromise is still to use an ACPI string to detect the > > platform. For instance, would it be possible to use the OEMID fields in > > RSDT, XSDT, FADT? Possibly even a combination of them? > > > > Another option might be to get the platform name from UEFI somehow. > > What about having a small domain parsing the ACPI booting first and use > that information for booting dom0? > > That dom would be part of the Xen build and the hypervisor wouldn't need > to gain all the ACPI AML logic. That could work, but in practice we don't have such a domain today -- the infrastructure is missing. I wonder whether the bootloader (uboot or grub) would know about the platform and might be able to pass that information to Xen somehow.
[PATCH v1 00/23] reduce overhead during live migration
The current live migration code can easily saturate an 1Gb link. There is still room for improvement with faster network connections. Even with this series reviewed and applied. See description of patch #6. Olaf Olaf Hering (23): tools: add readv_exact to libxenctrl tools: add xc_is_known_page_type to libxenctrl tools: use xc_is_known_page_type tools: unify type checking for data pfns in migration stream tools: show migration transfer rate in send_dirty_pages tools/guest: prepare to allocate arrays once tools/guest: save: move batch_pfns tools/guest: save: move mfns array tools/guest: save: move types array tools/guest: save: move errors array tools/guest: save: move iov array tools/guest: save: move rec_pfns array tools/guest: save: move guest_data array tools/guest: save: move local_pages array tools/guest: restore: move pfns array tools/guest: restore: move types array tools/guest: restore: move mfns array tools/guest: restore: move map_errs array tools/guest: restore: move mfns array in populate_pfns tools/guest: restore: move pfns array in populate_pfns tools/guest: restore: split record processing tools/guest: restore: split handle_page_data tools/guest: restore: write data directly into guest tools/libs/ctrl/xc_private.c | 54 ++- tools/libs/ctrl/xc_private.h | 34 ++ tools/libs/guest/xg_sr_common.c | 33 +- tools/libs/guest/xg_sr_common.h | 86 +++- tools/libs/guest/xg_sr_restore.c | 562 +- tools/libs/guest/xg_sr_save.c | 158 tools/libs/guest/xg_sr_save_x86_hvm.c | 5 +- tools/libs/guest/xg_sr_save_x86_pv.c | 31 +- 8 files changed, 666 insertions(+), 297 deletions(-)
Re: [PATCH] meson.build: fix building of Xen support for aarch64
On Thu, 29 Oct 2020, Jason Andryuk wrote: > On Thu, Oct 29, 2020 at 6:01 AM Alex Bennée wrote: > > > > > > Stefano Stabellini writes: > > > > > On Wed, 28 Oct 2020, Alex Bennée wrote: > > >> Xen is supported on aarch64 although weirdly using the i386-softmmu > > >> model. Checking based on the host CPU meant we never enabled Xen > > >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to > > >> make it not seem weird but that will require further build surgery. > > >> > > >> Signed-off-by: Alex Bennée > > >> Cc: Masami Hiramatsu > > >> Cc: Stefano Stabellini > > >> Cc: Anthony Perard > > >> Cc: Paul Durrant > > >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson") > > >> --- > > >> meson.build | 2 ++ > > >> 1 file changed, 2 insertions(+) > > >> > > >> diff --git a/meson.build b/meson.build > > >> index 835424999d..f1fcbfed4c 100644 > > >> --- a/meson.build > > >> +++ b/meson.build > > >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64'] > > >> 'CONFIG_HVF': ['x86_64-softmmu'], > > >> 'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'], > > >>} > > >> +elif cpu in [ 'arm', 'aarch64' ] > > >> + accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] } > > >> endif > > > > > > This looks very reasonable -- the patch makes sense. > > A comment would be useful to explain that Arm needs i386-softmmu for > the xenpv machine. > > > > > > > However I have two questions, mostly for my own understanding. I tried > > > to repro the aarch64 build problem but it works at my end, even without > > > this patch. > > > > Building on a x86 host or with cross compiler? > > > > > I wonder why. I suspect it works thanks to these lines in > > > meson.build: > > I think it's a runtime and not a build problem. In osstest, Xen > support was detected and configured, but CONFIG_XEN wasn't set for > Arm. So at runtime xen_available() returns 0, and QEMU doesn't start > with "qemu-system-i386: -xen-domid 1: Option not supported for this > target" > > I posted my investigation here: > https://lore.kernel.org/xen-devel/cakf6xpss8kpgovzrkitpz63bhbvbjxrtywdhekzuo2q1kem...@mail.gmail.com/ Right, but strangely I cannot reproduce it here. I am natively building (qemu-user) on aarch64 and for me CONFIG_XEN gets set.
Re: [PATCH] x86/hvm: process softirq while saving/loading entries
On Thu, Oct 29, 2020 at 05:38:17PM +0100, Jan Beulich wrote: > On 29.10.2020 16:20, Roger Pau Monne wrote: > > On slow systems with sync_console saving or loading the context of big > > guests can cause the watchdog to trigger. Fix this by adding a couple > > of process_pending_softirqs. > > Which raises the question in how far this is then also a problem > for the caller of the underlying hypercall. IOW I wonder whether > instead we need to make use of continuations here. Nevertheless FWIW, I've only hit this with debug builds on boxes that have slow serial with sync_console enabled, due to the verbose printks. > > > Signed-off-by: Roger Pau Monné > > Acked-by: Jan Beulich Thanks.
Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm
Oleksandr Tyshchenko writes: > On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu < > masami.hirama...@linaro.org> wrote: > >> Hi Oleksandr, >> > Hi Masami > > [sorry for the possible format issue] > > >> >> I would like to try this on my arm64 board. >> > Glad to hear you are interested in this topic. > > >> >> According to your comments in the patch, I made this config file. >> # cat debian.conf >> name = "debian" >> type = "pvh" >> vcpus = 8 >> memory = 512 >> kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64" >> ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64" >> cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw" >> disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ] >> vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ] >> virtio = 1 >> vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ] >> >> And tried to boot a DomU, but I got below error. >> >> # xl create -c debian.conf >> Parsing config from debian.conf >> libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain >> 1:unable to add virtio_disk devices >> libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain >> 1:xc_domain_pause failed >> libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain >> type for domid=1 >> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain >> 1:Unable to destroy guest >> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain >> 1:Destruction of domain failed >> >> >> Could you tell me how can I test it? >> > > I assume it is due to the lack of the virtio-disk backend (which I haven't > shared yet as I focused on the IOREQ/DM support on Arm in the first place). > Could you wait a little bit, I am going to share it soon. I assume this is wiring up the required bits of support to handle the IOREQ requests in QEMU? We are putting together a PoC demo to show a virtio enabled image (AGL) running on both KVM and Xen hypervisors so we are keen to see your code as soon as you can share it. I'm currently preparing a patch series for QEMU which fixes the recent breakage caused by changes to the build system. As part of that I've separated CONFIG_XEN and CONFIG_XEN_HVM so it's possible to build i386-softmmu with unneeded for ARM bits. Hopefully this will allow me to create a qemu-aarch64-system binary with just the PV related models in it. Talking to Stefano it probably makes sense going forward to introduce a new IOREQ aware machine type for QEMU that doesn't bring in the rest of the x86 overhead. I was thinking maybe xenvirt? You've tested with virtio-block but we'd also like to extend this to other arbitrary virtio devices. I guess we will need some sort of mechanism to inform the QEMU command line where each device sits in the virtio-mmio bus so the FDT Xen delivers to the guest matches up with what QEMU is ready to serve requests for? -- Alex Bennée
Re: [PATCH] meson.build: fix building of Xen support for aarch64
On Thu, 29 Oct 2020, Alex Bennée wrote: > > On Wed, 28 Oct 2020, Alex Bennée wrote: > >> Xen is supported on aarch64 although weirdly using the i386-softmmu > >> model. Checking based on the host CPU meant we never enabled Xen > >> support. It would be nice to enable CONFIG_XEN for aarch64-softmmu to > >> make it not seem weird but that will require further build surgery. > >> > >> Signed-off-by: Alex Bennée > >> Cc: Masami Hiramatsu > >> Cc: Stefano Stabellini > >> Cc: Anthony Perard > >> Cc: Paul Durrant > >> Fixes: 8a19980e3f ("configure: move accelerator logic to meson") > >> --- > >> meson.build | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/meson.build b/meson.build > >> index 835424999d..f1fcbfed4c 100644 > >> --- a/meson.build > >> +++ b/meson.build > >> @@ -81,6 +81,8 @@ if cpu in ['x86', 'x86_64'] > >> 'CONFIG_HVF': ['x86_64-softmmu'], > >> 'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'], > >>} > >> +elif cpu in [ 'arm', 'aarch64' ] > >> + accelerator_targets += { 'CONFIG_XEN': ['i386-softmmu'] } > >> endif > > > > This looks very reasonable -- the patch makes sense. > > > > > > However I have two questions, mostly for my own understanding. I tried > > to repro the aarch64 build problem but it works at my end, even without > > this patch. > > Building on a x86 host or with cross compiler? native build (qemu-user) > > I wonder why. I suspect it works thanks to these lines in > > meson.build: > > > > if not get_option('xen').disabled() and 'CONFIG_XEN_BACKEND' in > > config_host > > accelerators += 'CONFIG_XEN' > > have_xen_pci_passthrough = not > > get_option('xen_pci_passthrough').disabled() and targetos == 'linux' > > else > > have_xen_pci_passthrough = false > > endif > > > > But I am not entirely sure who is adding CONFIG_XEN_BACKEND to > > config_host. > > The is part of the top level configure check - which basically checks > for --enable-xen or autodetects the presence of the userspace libraries. > I'm not sure if we have a slight over proliferation of symbols for Xen > support (although I'm about to add more). > > > The other question is: does it make sense to print the value of > > CONFIG_XEN as part of the summary? Something like: > > > > diff --git a/meson.build b/meson.build > > index 47e32e1fcb..c6e7832dc9 100644 > > --- a/meson.build > > +++ b/meson.build > > @@ -2070,6 +2070,7 @@ summary_info += {'KVM support': > > config_all.has_key('CONFIG_KVM')} > > summary_info += {'HAX support': config_all.has_key('CONFIG_HAX')} > > summary_info += {'HVF support': config_all.has_key('CONFIG_HVF')} > > summary_info += {'WHPX support': config_all.has_key('CONFIG_WHPX')} > > +summary_info += {'XEN support': config_all.has_key('CONFIG_XEN')} > > summary_info += {'TCG support': config_all.has_key('CONFIG_TCG')} > > if config_all.has_key('CONFIG_TCG') > >summary_info += {'TCG debug enabled': > > config_host.has_key('CONFIG_DEBUG_TCG')} > > > > > > But I realize there is already: > > > > summary_info += {'xen support': > > config_host.has_key('CONFIG_XEN_BACKEND')} > > > > so it would be a bit of a duplicate > > Hmm so what we have is: > > CONFIG_XEN_BACKEND > - ensures that appropriate compiler flags are added > - pegs RAM_ADDR_MAX at UINT64_MAX (instead of UINTPTR_MAX) > CONFIG_XEN > - which controls a bunch of build objects, some of which may be i386 only? > ./accel/meson.build:15:specific_ss.add_all(when: ['CONFIG_XEN'], if_true: > dummy_ss) > ./accel/stubs/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', if_false: > files('xen-stub.c')) > ./accel/xen/meson.build:1:specific_ss.add(when: 'CONFIG_XEN', if_true: > files('xen-all.c')) > ./hw/9pfs/meson.build:17:fs_ss.add(when: 'CONFIG_XEN', if_true: > files('xen-9p-backend.c')) > ./hw/block/dataplane/meson.build:2:specific_ss.add(when: 'CONFIG_XEN', > if_true: files('xen-block.c')) > ./hw/block/meson.build:14:softmmu_ss.add(when: 'CONFIG_XEN', if_true: > files('xen-block.c')) > ./hw/char/meson.build:23:softmmu_ss.add(when: 'CONFIG_XEN', if_true: > files('xen_console.c')) > ./hw/display/meson.build:18:softmmu_ss.add(when: 'CONFIG_XEN', if_true: > files('xenfb.c')) > ./hw/i386/xen/meson.build:1:i386_ss.add(when: 'CONFIG_XEN', if_true: > files('xen-hvm.c', > > 'xen-mapcache.c', > > 'xen_apic.c', > > 'xen_platform.c', > > 'xen_pvdevice.c') > ./hw/net/meson.build:2:softmmu_ss.add(when: 'CONFIG_XEN', if_true: > files('xen_nic.c')) > ./hw/usb/meson.build:76:softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', > libusb], if_true: files('xen-usb.c')) >
Re: [PATCH V2 00/23] IOREQ feature (+ virtio-mmio) on Arm
On Thu, 29 Oct 2020, Alex Bennée wrote: > Oleksandr Tyshchenko writes: > > On Thu, Oct 29, 2020 at 9:42 AM Masami Hiramatsu < > > masami.hirama...@linaro.org> wrote: > > > >> Hi Oleksandr, > >> > > Hi Masami > > > > [sorry for the possible format issue] > > > > > >> > >> I would like to try this on my arm64 board. > >> > > Glad to hear you are interested in this topic. > > > > > >> > >> According to your comments in the patch, I made this config file. > >> # cat debian.conf > >> name = "debian" > >> type = "pvh" > >> vcpus = 8 > >> memory = 512 > >> kernel = "/opt/agl/vmlinuz-5.9.0-1-arm64" > >> ramdisk = "/opt/agl/initrd.img-5.9.0-1-arm64" > >> cmdline= "console=hvc0 earlyprintk=xen root=/dev/xvda1 rw" > >> disk = [ '/opt/agl/debian.qcow2,qcow2,hda' ] > >> vif = [ 'mac=00:16:3E:74:3d:76,bridge=xenbr0' ] > >> virtio = 1 > >> vdisk = [ 'backend=Dom0, disks=ro:/dev/sda1' ] > >> > >> And tried to boot a DomU, but I got below error. > >> > >> # xl create -c debian.conf > >> Parsing config from debian.conf > >> libxl: error: libxl_create.c:1863:domcreate_attach_devices: Domain > >> 1:unable to add virtio_disk devices > >> libxl: error: libxl_domain.c:1218:destroy_domid_pci_done: Domain > >> 1:xc_domain_pause failed > >> libxl: error: libxl_dom.c:39:libxl__domain_type: unable to get domain > >> type for domid=1 > >> libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain > >> 1:Unable to destroy guest > >> libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain > >> 1:Destruction of domain failed > >> > >> > >> Could you tell me how can I test it? > >> > > > > I assume it is due to the lack of the virtio-disk backend (which I haven't > > shared yet as I focused on the IOREQ/DM support on Arm in the first place). > > Could you wait a little bit, I am going to share it soon. > > I assume this is wiring up the required bits of support to handle the > IOREQ requests in QEMU? We are putting together a PoC demo to show > a virtio enabled image (AGL) running on both KVM and Xen hypervisors so > we are keen to see your code as soon as you can share it. > > I'm currently preparing a patch series for QEMU which fixes the recent > breakage caused by changes to the build system. As part of that I've > separated CONFIG_XEN and CONFIG_XEN_HVM so it's possible to build > i386-softmmu with unneeded for ARM bits. Hopefully this will allow me to > create a qemu-aarch64-system binary with just the PV related models in > it. > > Talking to Stefano it probably makes sense going forward to introduce a > new IOREQ aware machine type for QEMU that doesn't bring in the rest of > the x86 overhead. I was thinking maybe xenvirt? > > You've tested with virtio-block but we'd also like to extend this to > other arbitrary virtio devices. I guess we will need some sort of > mechanism to inform the QEMU command line where each device sits in the > virtio-mmio bus so the FDT Xen delivers to the guest matches up with > what QEMU is ready to serve requests for? That would be great. As a reference, given that the domU memory mapping layout is fixed, on x86 we just made sure that QEMU's idea of where the devices are is the same of Xen's. What you are suggesting would make it much more flexible and clearer.
Re: [PATCH v3 2/3] x86/shadow: refactor shadow_vram_{get,put}_l1e()
At 10:44 +0200 on 19 Oct (1603104271), Jan Beulich wrote: > By passing the functions an MFN and flags, only a single instance of > each is needed; they were pretty large for being inline functions > anyway. > > While moving the code, also adjust coding style and add const where > sensible / possible. > > Signed-off-by: Jan Beulich Acked-by: Tim Deegan
Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver
On Thu, 29 Oct 2020, Bertrand Marquis wrote: > > On 28 Oct 2020, at 19:12, Julien Grall wrote: > > On 26/10/2020 11:03, Rahul Singh wrote: > >> Hello Julien, > >>> On 23 Oct 2020, at 4:19 pm, Julien Grall wrote: > >>> On 23/10/2020 15:27, Rahul Singh wrote: > Hello Julien, > > On 23 Oct 2020, at 2:00 pm, Julien Grall wrote: > > On 23/10/2020 12:35, Rahul Singh wrote: > >> Hello, > >>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini > >>> wrote: > >>> > >>> On Thu, 22 Oct 2020, Julien Grall wrote: > >> On 20/10/2020 16:25, Rahul Singh wrote: > >>> Add support for ARM architected SMMUv3 implementations. It is > >>> based on > >>> the Linux SMMUv3 driver. > >>> Major differences between the Linux driver are as follows: > >>> 1. Only Stage-2 translation is supported as compared to the Linux > >>> driver > >>>that supports both Stage-1 and Stage-2 translations. > >>> 2. Use P2M page table instead of creating one as SMMUv3 has the > >>>capability to share the page tables with the CPU. > >>> 3. Tasklets is used in place of threaded IRQ's in Linux for event > >>> queue > >>>and priority queue IRQ handling. > >> > >> Tasklets are not a replacement for threaded IRQ. In particular, > >> they will > >> have priority over anything else (IOW nothing will run on the pCPU > >> until > >> they are done). > >> > >> Do you know why Linux is using thread. Is it because of long > >> running > >> operations? > > > > Yes you are right because of long running operations Linux is using > > the > > threaded IRQs. > > > > SMMUv3 reports fault/events bases on memory-based circular buffer > > queues not > > based on the register. As per my understanding, it is > > time-consuming to > > process the memory based queues in interrupt context because of > > that Linux > > is using threaded IRQ to process the faults/events from SMMU. > > > > I didn’t find any other solution in XEN in place of tasklet to > > defer the > > work, that’s why I used tasklet in XEN in replacement of threaded > > IRQs. If > > we do all work in interrupt context we will make XEN less > > responsive. > > So we need to make sure that Xen continue to receives interrupts, > but we also > need to make sure that a vCPU bound to the pCPU is also responsive. > > > > > If you know another solution in XEN that will be used to defer the > > work in > > the interrupt please let me know I will try to use that. > > One of my work colleague encountered a similar problem recently. He > had a long > running tasklet and wanted to be broken down in smaller chunk. > > We decided to use a timer to reschedule the taslket in the future. > This allows > the scheduler to run other loads (e.g. vCPU) for some time. > > This is pretty hackish but I couldn't find a better solution as > tasklet have > high priority. > > Maybe the other will have a better idea. > >>> > >>> Julien's suggestion is a good one. > >>> > >>> But I think tasklets can be configured to be called from the > >>> idle_loop, > >>> in which case they are not run in interrupt context? > >>> > >> Yes you are right tasklet will be scheduled from the idle_loop that > >> is not interrupt conext. > > > > This depends on your tasklet. Some will run from the softirq context > > which is usually (for Arm) on the return of an exception. > > > Thanks for the info. I will check and will get better understanding of > the tasklet how it will run in XEN. > >>> > >>> 4. Latest version of the Linux SMMUv3 code implements the > >>> commands queue > >>>access functions based on atomic operations implemented in > >>> Linux. > >> > >> Can you provide more details? > > > > I tried to port the latest version of the SMMUv3 code than I > > observed that > > in order to port that code I have to also port atomic operation > > implemented > > in Linux to XEN. As latest Linux code uses atomic operation to > > process the > > command queues > > (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() , > > atomic_fetch_andnot_relaxed()) . > > Thank you for the explanation. I think it would be best to import > the atomic > helpers and use the latest code. > > This will ensure
Re: [PATCH v3 3/3] x86/shadow: sh_{make,destroy}_monitor_table() are "even more" HVM-only
At 10:45 +0200 on 19 Oct (1603104300), Jan Beulich wrote: > With them depending on just the number of shadow levels, there's no need > for more than one instance of them, and hence no need for any hook (IOW > 452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite > far enough). Move the functions to hvm.c while dropping the dead > is_pv_32bit_domain() code paths. > > While moving the code, replace a stale comment reference to > sh_install_xen_entries_in_l4(). Doing so made me notice the function > also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm: > Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which > gets done here as well. > > Signed-off-by: Jan Beulich Acked-by: Tim Deegan > TBD: In principle both functions could have their first parameter > constified. In fact, "destroy" doesn't depend on the vCPU at all > and hence could be passed a struct domain *. Not sure whether such > an asymmetry would be acceptable. > In principle "make" would also not need passing of the number of > shadow levels (can be derived from v), which would result in yet > another asymmetry. > If these asymmetries were acceptable, "make" could then also update > v->arch.hvm.monitor_table, instead of doing so at both call sites. Feel free to add consts, but please don't change the function parameters any more than that. I would rather keep them as a matched pair, and leave the hvm.monitor_table updates in the caller, where it's easier to see why they're not symmetrical. Cheers Tim.
Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks
At 10:22 +0100 on 28 Oct (1603880578), Jan Beulich wrote: > The struct paging_mode instances get set to the same functions > regardless of mode by both HAP and shadow code, hence there's no point > having this hook there. The hook also doesn't need moving elsewhere - we > can directly use struct p2m_domain's. This merely requires (from a > strictly formal pov; in practice this may not even be needed) making > sure we don't end up using safe_write_pte() for nested P2Ms. > > Signed-off-by: Jan Beulich Acked-by: Tim Deegan
Re: [PATCH] [v2] x86: apic: avoid -Wshadow warning in header
On Thu, Oct 29 2020 at 17:59, Paolo Bonzini wrote: > On 29/10/20 17:56, Arvind Sankar wrote: >>> For those two just add: >>> struct apic *apic = x86_system_apic; >>> before all the assignments. >>> Less churn and much better code. >>> >> Why would it be better code? >> > > I think he means the compiler produces better code, because it won't > read the global variable repeatedly. Not sure if that's true,(*) but I > think I do prefer that version if Arnd wants to do that tweak. It's not true. foo *p = bar; p->a = 1; p->b = 2; The compiler is free to reload bar after accessing p->a and with bar->a = 1; bar->b = 1; it can either cache bar in a register or reread it after bar->a The generated code is the same as long as there is no reason to reload, e.g. register pressure. Thanks, tglx
Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook
At 10:24 +0100 on 28 Oct (1603880693), Jan Beulich wrote: > Fair parts of the present handlers are identical; in fact > nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move > common parts right into write_p2m_entry(), splitting the hooks into a > "pre" one (needed just by shadow code) and a "post" one. > > For the common parts moved I think that the p2m_flush_nestedp2m() is, > at least from an abstract perspective, also applicable in the shadow > case. Hence it doesn't get a 3rd hook put in place. > > The initial comment that was in hap_write_p2m_entry() gets dropped: Its > placement was bogus, and looking back the the commit introducing it > (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use > of a p2m it was meant to be associated with. > > Signed-off-by: Jan Beulich This seems like a good approach to me. I'm happy with the shadow parts but am not confident enough on nested p2m any more to have an opinion on that side. Acked-by: Tim Deegan
Re: [PATCH] x86/shadow: correct GFN use by sh_unshadow_for_p2m_change()
At 16:42 +0100 on 28 Oct (1603903365), Jan Beulich wrote: > Luckily sh_remove_all_mappings()'s use of the parameter is limited to > generation of log messages. Nevertheless we'd better pass correct GFNs > around: > - the incoming GFN, when replacing a large page, may not be large page > aligned, > - incrementing by page-size-scaled values can't be right. > > Signed-off-by: Jan Beulich Reviewed-by: Tim Deegan Thanks! Tim.
Re: Ping: [PATCH v3 0/3] x86: shim building adjustments (plus shadow follow-on)
At 14:40 +0100 on 29 Oct (1603982415), Jan Beulich wrote: > Tim, > > unless you tell me otherwise I'm intending to commit the latter > two with Roger's acks some time next week. Of course it would > still be nice to know your view on the first of the TBDs in > patch 3 (which may result in further changes, in a follow-up). Ack, sorry for the dropped patches, and thank you for the ping. I've now replied to everything that I think is wating for my review. Cheers, Tim.