Re: [PATCH] block: make BlockConf.*_size properties 32-bit
On Wed, Feb 12, 2020 at 03:44:19PM -0600, Eric Blake wrote:
> On 2/11/20 5:54 AM, Roman Kagan wrote:
> > Devices (virtio-blk, scsi, etc.) and the block layer are happy to use
> > 32-bit for logical_block_size, physical_block_size, and min_io_size.
> > However, the properties in BlockConf are defined as uint16_t limiting
> > the values to 32768.
> >
> > This appears unnecessarily tight, and we've seen bigger block sizes handy
> > at times.
>
> What larger sizes? I could see 64k or maybe even 1M block sizes,...

We played exactly with these two :)

> > Make them 32 bit instead and lift the limitation.
> >
> > Signed-off-by: Roman Kagan
> > ---
> >  hw/core/qdev-properties.c    | 21 -
> >  include/hw/block/block.h     |  8
> >  include/hw/qdev-properties.h |  2 +-
> >  3 files changed, 17 insertions(+), 14 deletions(-)
> >
> > diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> > index 7f93bfeb88..5f84e4a3b8 100644
> > --- a/hw/core/qdev-properties.c
> > +++ b/hw/core/qdev-properties.c
> > @@ -716,30 +716,32 @@ const PropertyInfo qdev_prop_pci_devfn = {
> >  /* --- blocksize --- */
> > +#define MIN_BLOCK_SIZE 512
> > +#define MAX_BLOCK_SIZE 2147483648
>
> ...but 2G block sizes are going to have tremendous performance problems.
>
> I'm not necessarily opposed to the widening to a 32-bit type, but think you
> need more justification or a smaller number for the max block size,

I thought any smaller value would just be arbitrary and hard to reason about,
so I went ahead with the max value that fit in the type and could be made
visible to the guest. Besides, this is a property that is set explicitly, so
I don't see a problem leaving this up to the user.

> particularly since qcow2 refuses to use cluster sizes larger than 2M and it
> makes no sense to allow a block size larger than a cluster size.

This still doesn't contradict passing a bigger value to the guest, for
experimenting if nothing else.

Thanks,
Roman.
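[Editor's note: for reference, a minimal self-contained sketch of the range check the widened 32-bit property implies. The helper name, error wording and main() are illustrative assumptions; only the 512 / 2147483648 bounds come from the patch quoted above.]

    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MIN_BLOCK_SIZE 512
    #define MAX_BLOCK_SIZE 2147483648ULL  /* 2 GiB, the patch's upper bound */

    /* Hypothetical helper, not the actual qdev property code: checks that a
     * user-supplied block size fits the widened 32-bit range. */
    static bool blocksize_in_range(uint64_t value, const char *prop)
    {
        if (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE) {
            fprintf(stderr, "Property %s: value %" PRIu64 " out of range [%u, %llu]\n",
                    prop, value, (unsigned)MIN_BLOCK_SIZE,
                    (unsigned long long)MAX_BLOCK_SIZE);
            return false;
        }
        return true;
    }

    int main(void)
    {
        /* 64k and 1M -- the sizes mentioned in the discussion -- both pass. */
        return blocksize_in_range(65536, "logical_block_size") &&
               blocksize_in_range(1048576, "physical_block_size") ? 0 : 1;
    }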
Re: [PATCH v2 0/4] arm64: Add the cpufreq device to show cpufreq info to guest
Patchew URL: https://patchew.org/QEMU/20200213073630.2125-1-fangyi...@huawei.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === Using expected file 'tests/data/acpi/virt/DSDT' acpi-test: Warning! DSDT binary file mismatch. Actual [aml:/tmp/aml-SWR9F0], Expected [aml:tests/data/acpi/virt/DSDT]. to see ASL diff between mismatched files install IASL, rebuild QEMU from scratch and re-run tests with V=1 environment variable set** ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:490:test_acpi_asl: assertion failed: (all_tables_match) ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/qtest/bios-tables-test.c:490:test_acpi_asl: assertion failed: (all_tables_match) make: *** [check-qtest-aarch64] Error 1 make: *** Waiting for unfinished jobs Could not access KVM kernel module: No such file or directory qemu-system-x86_64: -accel kvm: failed to initialize kvm: No such file or directory --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=66a8dfa4a91642e38c77cdc7d2e5fc0c', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-bzlhgmky/src/docker-src.2020-02-13-02.56.43.27075:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=66a8dfa4a91642e38c77cdc7d2e5fc0c make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-bzlhgmky/src' make: *** [docker-run-test-quick@centos7] Error 2 real11m15.564s user0m8.311s The full log is available at http://patchew.org/logs/20200213073630.2125-1-fangyi...@huawei.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH v2] virtio: increase virtqueue size for virtio-scsi and virtio-blk
On 12.02.2020 18:43, Stefan Hajnoczi wrote: On Tue, Feb 11, 2020 at 05:14:14PM +0300, Denis Plotnikov wrote: The goal is to reduce the amount of requests issued by a guest on 1M reads/writes. This rises the performance up to 4% on that kind of disk access pattern. The maximum chunk size to be used for the guest disk accessing is limited with seg_max parameter, which represents the max amount of pices in the scatter-geather list in one guest disk request. Since seg_max is virqueue_size dependent, increasing the virtqueue size increases seg_max, which, in turn, increases the maximum size of data to be read/write from a guest disk. More details in the original problem statment: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html Suggested-by: Denis V. Lunev Signed-off-by: Denis Plotnikov --- hw/block/virtio-blk.c | 4 ++-- hw/core/machine.c | 2 ++ hw/scsi/virtio-scsi.c | 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 09f46ed85f..6df3a7a6df 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -914,7 +914,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) memset(&blkcfg, 0, sizeof(blkcfg)); virtio_stq_p(vdev, &blkcfg.capacity, capacity); virtio_stl_p(vdev, &blkcfg.seg_max, - s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2); + s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 256 - 2); This value must not change on older machine types. Yes, that's true, but .. So does this patch need to turn seg-max-adjust *on* in hw_compat_4_2 so that old machine types get 126 instead of 254? If we set seg-max-adjust "on" in older machine types, the setups using them and having queue_sizes set , for example, 1024 will also set seg_max to 1024 - 2 which isn't the expected behavior: older mt didn't change seg_max in that case and stuck with 128 - 2. So, should we, instead, leave the default 128 - 2, for seg_max? Denis virtio_stw_p(vdev, &blkcfg.geometry.cylinders, conf->cyls); virtio_stl_p(vdev, &blkcfg.blk_size, blk_size); virtio_stw_p(vdev, &blkcfg.min_io_size, conf->min_io_size / blk_size); @@ -1272,7 +1272,7 @@ static Property virtio_blk_properties[] = { DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0, true), DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1), -DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128), +DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256), DEFINE_PROP_BOOL("seg-max-adjust", VirtIOBlock, conf.seg_max_adjust, true), DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD, IOThread *), diff --git a/hw/core/machine.c b/hw/core/machine.c index 2501b540ec..3427d6cf4c 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -28,6 +28,8 @@ #include "hw/mem/nvdimm.h" GlobalProperty hw_compat_4_2[] = { +{ "virtio-blk-device", "queue-size", "128"}, +{ "virtio-scsi-device", "virtqueue_size", "128"}, { "virtio-blk-device", "x-enable-wce-if-config-wce", "off" }, { "virtio-blk-device", "seg-max-adjust", "off"}, { "virtio-scsi-device", "seg_max_adjust", "off"}, diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 3b61563609..b38f50a429 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -660,7 +660,7 @@ static void virtio_scsi_get_config(VirtIODevice *vdev, virtio_stl_p(vdev, &scsiconf->num_queues, s->conf.num_queues); virtio_stl_p(vdev, &scsiconf->seg_max, - s->conf.seg_max_adjust ? s->conf.virtqueue_size - 2 : 128 - 2); + s->conf.seg_max_adjust ? 
s->conf.virtqueue_size - 2 : 256 - 2); virtio_stl_p(vdev, &scsiconf->max_sectors, s->conf.max_sectors); virtio_stl_p(vdev, &scsiconf->cmd_per_lun, s->conf.cmd_per_lun); virtio_stl_p(vdev, &scsiconf->event_info_size, sizeof(VirtIOSCSIEvent)); @@ -965,7 +965,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev, Error **errp) static Property virtio_scsi_properties[] = { DEFINE_PROP_UINT32("num_queues", VirtIOSCSI, parent_obj.conf.num_queues, 1), DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSI, - parent_obj.conf.virtqueue_size, 128), + parent_obj.conf.virtqueue_size, 256), DEFINE_PROP_BOOL("seg_max_adjust", VirtIOSCSI, parent_obj.conf.seg_max_adjust, true), DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors, -- 2.17.0
Re: [PATCH v2 0/4] arm64: Add the cpufreq device to show cpufreq info to guest
On Thu, Feb 13, 2020 at 03:36:26PM +0800, Ying Fang wrote: > On ARM64 platform, cpu frequency is retrieved via ACPI CPPC. > A virtual cpufreq device based on ACPI CPPC is created to > present cpu frequency info to the guest. > > The default frequency is set to host cpu nominal frequency, > which is obtained from the host CPPC sysfs. Other performance > data are set to the same value, since we don't support guest > performance scaling here. > > Performance counters are also not emulated and they simply > return 1 if read, and guest should fallback to use desired > performance value as the current performance. > > Guest kernel version above 4.18 is required to make it work. > This is v2 of the series, but I don't see a changelog. Can you please describe the motivation for this? If I understand correctly, all of this is just to inform the guest of the host's CPU0 nominal or max (if nominal isn't present?) frequency. Why do that? What happens if the guest migrates somewhere where the frequency is different? If this is for a special use case, then why not come up with a different channel (guest agent?) to pass this information? Thanks, drew
Re: [PATCH v1 5/5] travis.yml: Test the s390-ccw build, too
On Wed, 12 Feb 2020 21:48:42 +0100 Philippe Mathieu-Daudé wrote: > On 2/7/20 12:39 PM, Alex Bennée wrote: > > From: Thomas Huth > > > > Since we can now use a s390x host on Travis, we can also build and > > test the s390-ccw bios images there. For this we have to make sure > > that roms/SLOF is checked out, too, and then move the generated *.img > > files to the right location before running the tests. > > > > Signed-off-by: Thomas Huth > > Signed-off-by: Alex Bennée > > Message-Id: <20200206202543.7085-1-th...@redhat.com> > > --- > > .travis.yml | 10 ++ > > 1 file changed, 10 insertions(+) > Already reviewed/tested [*] with comment: > > Maybe remove the trailing ", too" in subject... > > Reviewed-by: Philippe Mathieu-Daudé > Tested-by: Philippe Mathieu-Daudé > > [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg677641.html > Hm, I also gave an Acked-by: Cornelia Huck
Re: [PATCH 0/2] Add AVX512F optimization option and buffer_zero_avx512()
Patchew URL: https://patchew.org/QEMU/1581580379-54109-1-git-send-email-robert...@linux.intel.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === qemu-system-aarch64: check_section_footer: Read section footer failed: -5 qemu-system-aarch64: load of migration failed: Invalid argument /tmp/qemu-test/src/tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) ERROR - too few tests run (expected 15, got 2) make: *** [check-qtest-aarch64] Error 1 make: *** Waiting for unfinished jobs TESTcheck-unit: tests/test-bitmap TESTcheck-unit: tests/test-aio --- qemu-system-x86_64: check_section_footer: Read section footer failed: -5 qemu-system-x86_64: load of migration failed: Invalid argument /tmp/qemu-test/src/tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) ERROR - too few tests run (expected 74, got 62) make: *** [check-qtest-x86_64] Error 1 TESTcheck-unit: tests/test-bufferiszero ERROR - too few tests run (expected 1, got 0) make: *** [check-unit] Error 1 TESTiotest-qcow2: 030 [fail] QEMU -- "/tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64" -nodefaults -display none -accel qtest QEMU_IMG -- "/tmp/qemu-test/build/tests/qemu-iotests/../../qemu-img" --- +++ /tmp/qemu-test/build/tests/qemu-iotests/030.out.bad 2020-02-13 08:33:05.905912850 + @@ -1,5 +1,335 @@ -... +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive 
if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/test.img,format=qcow2,cache=writeback,aio=threads +WARNING:qemu.machine:qemu received signal 4: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.90Fwth2KCH/qemu-16248-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.90Fwth2KCH/qemu-16248-qtest.sock -accel qtest -nodefaults -display none -accel qtest -drive if=virtio,id=drive0,file=/tmp/qemu-test/img-8.img,format=qcow2,cache=writeback,aio=threads,backing.backing.backing.backing.backing.backing.backing.backing.node-name=node0,backing.backing.backing.backing.backing.backing.backing.node-name=node1,backing.backing.backing.backing.backing.backing.node-name=node2,backing.backing.backing.backing.backing.node-name=node3,backing.backing.backing.backing.node-name=node4,backing.backing.backing.node-name=node5,backing.backing.node-
Re: [PATCH qemu v6 2/6] ppc/spapr: Move GPRs setup to one place
On Mon, 3 Feb 2020 14:29:39 +1100 Alexey Kardashevskiy wrote: > At the moment "pseries" starts in SLOF which only expects the FDT blob > pointer in r3. As we are going to introduce a OpenFirmware support in > QEMU, we will be booting OF clients directly and these expect a stack > pointer in r1, the OF entry point in r5 and in addition to this, Linux > looks at r3/r4 for the initramdisk location (although vmlinux can find > this from the device tree but zImage from distro kernels cannot). > > This extends spapr_cpu_set_entry_state() to take more registers. This > should cause no behavioral change. > > Signed-off-by: Alexey Kardashevskiy > --- Reviewed-by: Greg Kurz > include/hw/ppc/spapr_cpu_core.h | 4 +++- > hw/ppc/spapr.c | 4 ++-- > hw/ppc/spapr_cpu_core.c | 7 ++- > hw/ppc/spapr_rtas.c | 2 +- > 4 files changed, 12 insertions(+), 5 deletions(-) > > diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h > index 1c4cc6559c52..edd7214fafcf 100644 > --- a/include/hw/ppc/spapr_cpu_core.h > +++ b/include/hw/ppc/spapr_cpu_core.h > @@ -40,7 +40,9 @@ typedef struct SpaprCpuCoreClass { > } SpaprCpuCoreClass; > > const char *spapr_get_cpu_core_type(const char *cpu_type); > -void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, > target_ulong r3); > +void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, > + target_ulong r1, target_ulong r3, > + target_ulong r4, target_ulong r5); > > typedef struct SpaprCpuState { > uint64_t vpa_addr; > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index c9b2e0a5e060..660a4b60e072 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -1674,8 +1674,8 @@ static void spapr_machine_reset(MachineState *machine) > spapr->fdt_blob = fdt; > > /* Set up the entry state */ > -spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr); > -first_ppc_cpu->env.gpr[5] = 0; > +spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, > + 0, fdt_addr, 0, 0); > > spapr->cas_reboot = false; > > diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c > index d09125d9afd4..696b76598dd7 100644 > --- a/hw/ppc/spapr_cpu_core.c > +++ b/hw/ppc/spapr_cpu_core.c > @@ -84,13 +84,18 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu) > spapr_irq_cpu_intc_reset(spapr, cpu); > } > > -void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, > target_ulong r3) > +void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, > + target_ulong r1, target_ulong r3, > + target_ulong r4, target_ulong r5) > { > PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); > CPUPPCState *env = &cpu->env; > > env->nip = nip; > +env->gpr[1] = r1; > env->gpr[3] = r3; > +env->gpr[4] = r4; > +env->gpr[5] = r5; > kvmppc_set_reg_ppc_online(cpu, 1); > CPU(cpu)->halted = 0; > /* Enable Power-saving mode Exit Cause exceptions */ > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c > index 656fdd221665..9e3cbd70bbd9 100644 > --- a/hw/ppc/spapr_rtas.c > +++ b/hw/ppc/spapr_rtas.c > @@ -190,7 +190,7 @@ static void rtas_start_cpu(PowerPCCPU *callcpu, > SpaprMachineState *spapr, > */ > newcpu->env.tb_env->tb_offset = callcpu->env.tb_env->tb_offset; > > -spapr_cpu_set_entry_state(newcpu, start, r3); > +spapr_cpu_set_entry_state(newcpu, start, 0, r3, 0, 0); > > qemu_cpu_kick(CPU(newcpu)); >
[PATCH] qemu-doc: Clarify extent of build platform support
Supporting a build platform beyond its end of life makes no sense. Spell that out just to be clear. Signed-off-by: Markus Armbruster --- qemu-doc.texi | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/qemu-doc.texi b/qemu-doc.texi index a1ef6b6484..33b9597b1d 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -2880,10 +2880,11 @@ lifetime distros will be assumed to ship similar software versions. For distributions with long-lifetime releases, the project will aim to support the most recent major version at all times. Support for the previous major -version will be dropped 2 years after the new major version is released. For -the purposes of identifying supported software versions, the project will look -at RHEL, Debian, Ubuntu LTS, and SLES distros. Other long-lifetime distros will -be assumed to ship similar software versions. +version will be dropped 2 years after the new major version is released, +or when it reaches ``end of life''. For the purposes of identifying +supported software versions, the project will look at RHEL, Debian, +Ubuntu LTS, and SLES distros. Other long-lifetime distros will be +assumed to ship similar software versions. @section Windows -- 2.21.1
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Could HiSilicon respond to Dann & Rafael's comments #36 and #38? Is there an upstream acceptable patch that addresses this issue? ** Changed in: kunpeng920 Status: Confirmed => Incomplete ** Changed in: qemu (Ubuntu Bionic) Status: Confirmed => Incomplete ** Changed in: qemu (Ubuntu Disco) Status: Confirmed => Incomplete ** Changed in: qemu (Ubuntu Eoan) Status: In Progress => Incomplete ** Changed in: qemu (Ubuntu Focal) Status: Confirmed => Incomplete -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1805256 Title: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Status in kunpeng920: Incomplete Status in QEMU: In Progress Status in qemu package in Ubuntu: Incomplete Status in qemu source package in Bionic: Incomplete Status in qemu source package in Disco: Incomplete Status in qemu source package in Eoan: Incomplete Status in qemu source package in Focal: Incomplete Bug description: Command: qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Hangs indefinitely approximately 30% of the runs. Workaround: qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Run "qemu-img convert" with "a single coroutine" to avoid this issue. (gdb) thread 1 ... (gdb) bt #0 0xbf1ad81c in __GI_ppoll #1 0xaabcf73c in ppoll #2 qemu_poll_ns #3 0xaabd0764 in os_host_main_loop_wait #4 main_loop_wait ... (gdb) thread 2 ... (gdb) bt #0 syscall () #1 0xaabd41cc in qemu_futex_wait #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 ) #3 0xaabed05c in call_rcu_thread #4 0xaabd34c8 in qemu_thread_start #5 0xbf25c880 in start_thread #6 0xbf1b6b9c in thread_start () (gdb) thread 3 ... (gdb) bt #0 0xbf11aa20 in __GI___sigtimedwait #1 0xbf2671b4 in __sigwait #2 0xaabd1ddc in sigwait_compat #3 0xaabd34c8 in qemu_thread_start #4 0xbf25c880 in start_thread #5 0xbf1b6b9c in thread_start (gdb) run Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2 [New Thread 0xbec5ad90 (LWP 72839)] [New Thread 0xbe459d90 (LWP 72840)] [New Thread 0xbdb57d90 (LWP 72841)] [New Thread 0xacac9d90 (LWP 72859)] [New Thread 0xa7ffed90 (LWP 72860)] [New Thread 0xa77fdd90 (LWP 72861)] [New Thread 0xa6ffcd90 (LWP 72862)] [New Thread 0xa67fbd90 (LWP 72863)] [New Thread 0xa5ffad90 (LWP 72864)] [Thread 0xa5ffad90 (LWP 72864) exited] [Thread 0xa6ffcd90 (LWP 72862) exited] [Thread 0xa77fdd90 (LWP 72861) exited] [Thread 0xbdb57d90 (LWP 72841) exited] [Thread 0xa67fbd90 (LWP 72863) exited] [Thread 0xacac9d90 (LWP 72859) exited] [Thread 0xa7ffed90 (LWP 72860) exited] """ All the tasks left are blocked in a system call, so no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock thread #1 (doing poll() in a pipe with thread #2). Those 7 threads exit before disk conversion is complete (sometimes in the beginning, sometimes at the end). [ Original Description ] On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command: qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2 Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images. 
Once hung, attaching gdb gives the following backtrace: (gdb) bt #0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77 #2 qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:322 #3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233 #4 main_loop_wait (nonblocking=) at util/main-loop.c:497 #5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980 #6 img_convert (argc=, argv=) at qemu-img.c:2456 #7 0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975 Reproduced w/ latest QEMU git (@ 53744e0a182) To manage notifications about this bug go to: https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions
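[Editor's note: to make the deadlock mechanism described above easier to follow, here is a strongly simplified sketch of the futex wait/wake pairing that qemu_futex_wait()/qemu_futex_wake() (seen in thread #2's backtrace) are built on. This is an illustration of the primitive, not QEMU's implementation; the hang arises when every remaining thread is itself blocked in a system call, so the wake side is never reached.]

    #include <linux/futex.h>
    #include <stdint.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void futex_wait(uint32_t *addr, uint32_t expected)
    {
        /* Sleeps until another thread calls futex_wake() on addr,
         * unless *addr has already changed away from 'expected'. */
        syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }

    static void futex_wake(uint32_t *addr, int nwaiters)
    {
        /* Wakes up to 'nwaiters' threads blocked in futex_wait() on addr. */
        syscall(SYS_futex, addr, FUTEX_WAKE, nwaiters, NULL, NULL, 0);
    }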
[PULL 1/1] qxl: introduce hardware revision 5
The only difference to hardware revision 4 is that the device doesn't switch to VGA mode in case someone happens to touch a VGA register, which should make things more robust in configurations with multiple vga devices. Swtiching back to VGA mode happens on reset, either full machine reset or qxl device reset (QXL_IO_RESET ioport command). Signed-off-by: Gerd Hoffmann Reviewed-by: Maxim Levitsky Message-id: 20200206074358.4274-1-kra...@redhat.com --- hw/display/qxl.h | 2 +- hw/core/machine.c | 2 ++ hw/display/qxl.c | 7 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/display/qxl.h b/hw/display/qxl.h index 80eb0d267269..707631a1f573 100644 --- a/hw/display/qxl.h +++ b/hw/display/qxl.h @@ -144,7 +144,7 @@ typedef struct PCIQXLDevice { } \ } while (0) -#define QXL_DEFAULT_REVISION QXL_REVISION_STABLE_V12 +#define QXL_DEFAULT_REVISION (QXL_REVISION_STABLE_V12 + 1) /* qxl.c */ void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id); diff --git a/hw/core/machine.c b/hw/core/machine.c index d8e30e4895d8..84812a1d1cc1 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -34,6 +34,8 @@ GlobalProperty hw_compat_4_2[] = { { "vhost-blk-device", "seg_max_adjust", "off"}, { "usb-host", "suppress-remote-wake", "off" }, { "usb-redir", "suppress-remote-wake", "off" }, +{ "qxl", "revision", "4" }, +{ "qxl-vga", "revision", "4" }, }; const size_t hw_compat_4_2_len = G_N_ELEMENTS(hw_compat_4_2); diff --git a/hw/display/qxl.c b/hw/display/qxl.c index c33b1915a52c..64884da70857 100644 --- a/hw/display/qxl.c +++ b/hw/display/qxl.c @@ -1309,7 +1309,8 @@ static void qxl_vga_ioport_write(void *opaque, uint32_t addr, uint32_t val) PCIQXLDevice *qxl = container_of(vga, PCIQXLDevice, vga); trace_qxl_io_write_vga(qxl->id, qxl_mode_to_string(qxl->mode), addr, val); -if (qxl->mode != QXL_MODE_VGA) { +if (qxl->mode != QXL_MODE_VGA && +qxl->revision <= QXL_REVISION_STABLE_V12) { qxl_destroy_primary(qxl, QXL_SYNC); qxl_soft_reset(qxl); } @@ -2121,6 +2122,10 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp) pci_device_rev = QXL_REVISION_STABLE_V12; io_size = pow2ceil(QXL_IO_RANGE_SIZE); break; +case 5: /* qxl-5 */ +pci_device_rev = QXL_REVISION_STABLE_V12 + 1; +io_size = pow2ceil(QXL_IO_RANGE_SIZE); +break; default: error_setg(errp, "Invalid revision %d for qxl device (max %d)", qxl->revision, QXL_DEFAULT_REVISION); -- 2.18.2
[PULL 0/1] Vga 20200213 patches
The following changes since commit e18e5501d8ac692d32657a3e1ef545b14e72b730: Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200210' into staging (2020-02-10 18:09:14 +) are available in the Git repository at: git://git.kraxel.org/qemu tags/vga-20200213-pull-request for you to fetch changes up to ed71c09ffd6fbd01c2a487d47291ae57b08671ea: qxl: introduce hardware revision 5 (2020-02-13 08:31:40 +0100) qxl: introduce hardware revision 5 Gerd Hoffmann (1): qxl: introduce hardware revision 5 hw/display/qxl.h | 2 +- hw/core/machine.c | 2 ++ hw/display/qxl.c | 7 ++- 3 files changed, 9 insertions(+), 2 deletions(-) -- 2.18.2
Re: [PATCH v2] virtio: increase virtqueue size for virtio-scsi and virtio-blk
On Thu, Feb 13, 2020 at 11:08:35AM +0300, Denis Plotnikov wrote: > On 12.02.2020 18:43, Stefan Hajnoczi wrote: > > On Tue, Feb 11, 2020 at 05:14:14PM +0300, Denis Plotnikov wrote: > > > The goal is to reduce the amount of requests issued by a guest on > > > 1M reads/writes. This rises the performance up to 4% on that kind of > > > disk access pattern. > > > > > > The maximum chunk size to be used for the guest disk accessing is > > > limited with seg_max parameter, which represents the max amount of > > > pices in the scatter-geather list in one guest disk request. > > > > > > Since seg_max is virqueue_size dependent, increasing the virtqueue > > > size increases seg_max, which, in turn, increases the maximum size > > > of data to be read/write from a guest disk. > > > > > > More details in the original problem statment: > > > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html > > > > > > Suggested-by: Denis V. Lunev > > > Signed-off-by: Denis Plotnikov > > > --- > > > hw/block/virtio-blk.c | 4 ++-- > > > hw/core/machine.c | 2 ++ > > > hw/scsi/virtio-scsi.c | 4 ++-- > > > 3 files changed, 6 insertions(+), 4 deletions(-) > > > > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > > index 09f46ed85f..6df3a7a6df 100644 > > > --- a/hw/block/virtio-blk.c > > > +++ b/hw/block/virtio-blk.c > > > @@ -914,7 +914,7 @@ static void virtio_blk_update_config(VirtIODevice > > > *vdev, uint8_t *config) > > > memset(&blkcfg, 0, sizeof(blkcfg)); > > > virtio_stq_p(vdev, &blkcfg.capacity, capacity); > > > virtio_stl_p(vdev, &blkcfg.seg_max, > > > - s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - > > > 2); > > > + s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 256 - > > > 2); > > This value must not change on older machine types. > Yes, that's true, but .. > > So does this patch > > need to turn seg-max-adjust *on* in hw_compat_4_2 so that old machine > > types get 126 instead of 254? > If we set seg-max-adjust "on" in older machine types, the setups using them > and having queue_sizes set , for example, 1024 will also set seg_max to 1024 > - 2 which isn't the expected behavior: older mt didn't change seg_max in > that case and stuck with 128 - 2. > So, should we, instead, leave the default 128 - 2, for seg_max? Argh! Good point :-). How about a seg_max_default property that is initialized to 254 for modern machines and 126 to old machines? Stefan signature.asc Description: PGP signature
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
> Could HiSilicon respond to Dann & Rafael's comments #36 and #38?
> Is there an upstream acceptable patch that addresses this issue?

No upstream patchset; I only provide a private solution and do not know the root cause.

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions
Re: [PATCH v2] virtio: increase virtqueue size for virtio-scsi and virtio-blk
On 13.02.2020 12:08, Stefan Hajnoczi wrote: On Thu, Feb 13, 2020 at 11:08:35AM +0300, Denis Plotnikov wrote: On 12.02.2020 18:43, Stefan Hajnoczi wrote: On Tue, Feb 11, 2020 at 05:14:14PM +0300, Denis Plotnikov wrote: The goal is to reduce the amount of requests issued by a guest on 1M reads/writes. This rises the performance up to 4% on that kind of disk access pattern. The maximum chunk size to be used for the guest disk accessing is limited with seg_max parameter, which represents the max amount of pices in the scatter-geather list in one guest disk request. Since seg_max is virqueue_size dependent, increasing the virtqueue size increases seg_max, which, in turn, increases the maximum size of data to be read/write from a guest disk. More details in the original problem statment: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html Suggested-by: Denis V. Lunev Signed-off-by: Denis Plotnikov --- hw/block/virtio-blk.c | 4 ++-- hw/core/machine.c | 2 ++ hw/scsi/virtio-scsi.c | 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 09f46ed85f..6df3a7a6df 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -914,7 +914,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) memset(&blkcfg, 0, sizeof(blkcfg)); virtio_stq_p(vdev, &blkcfg.capacity, capacity); virtio_stl_p(vdev, &blkcfg.seg_max, - s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2); + s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 256 - 2); This value must not change on older machine types. Yes, that's true, but .. So does this patch need to turn seg-max-adjust *on* in hw_compat_4_2 so that old machine types get 126 instead of 254? If we set seg-max-adjust "on" in older machine types, the setups using them and having queue_sizes set , for example, 1024 will also set seg_max to 1024 - 2 which isn't the expected behavior: older mt didn't change seg_max in that case and stuck with 128 - 2. So, should we, instead, leave the default 128 - 2, for seg_max? Argh! Good point :-). How about a seg_max_default property that is initialized to 254 for modern machines and 126 to old machines? Hmm, but we'll achieve the same but with more code changes, don't we? 254 is because the queue-size is 256. We gonna leave 128-2 for older machine types just for not breaking anything. All other seg_max adjustment is provided by seg_max_adjust which is "on" by default in modern machine types. to summarize: modern mt defaults: seg_max_adjust = on queue_size = 256 => default seg_max = 254 => changing queue-size will change seg_max = queue_size - 2 old mt defaults: seg_max_adjust = off queue_size = 128 => default seg_max = 126 => changing queue-size won't change seg_max, it's always = 126 like it was before Denis Stefan
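[Editor's note: a minimal sketch of the seg_max selection logic this summary describes. The struct and function names are illustrative assumptions; only the constants and the "queue_size - 2" relationship come from the thread.]

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint16_t queue_size;   /* 256 on modern machine types, 128 on old ones */
        bool seg_max_adjust;   /* on for modern machine types, off for old ones */
    } VirtQueueConf;

    static uint32_t effective_seg_max(const VirtQueueConf *conf)
    {
        /* With seg-max-adjust on, seg_max follows the virtqueue size;
         * otherwise it stays at the legacy fixed value of 128 - 2. */
        return conf->seg_max_adjust ? (uint32_t)(conf->queue_size - 2) : 128 - 2;
    }

    /*
     * Modern defaults  {256, on}          -> 254
     * Old machine type {128, off}         -> 126
     * Old machine type, queue_size = 1024 -> still 126, matching pre-patch behaviour
     */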
Re: [PATCH] qemu-doc: Clarify extent of build platform support
On Thu, Feb 13, 2020 at 09:43:34AM +0100, Markus Armbruster wrote: > Supporting a build platform beyond its end of life makes no sense. > Spell that out just to be clear. Agreed, that matches my intention when first writing this doc. > > Signed-off-by: Markus Armbruster > --- > qemu-doc.texi | 9 + > 1 file changed, 5 insertions(+), 4 deletions(-) Reviewed-by: Daniel P. Berrangé > > diff --git a/qemu-doc.texi b/qemu-doc.texi > index a1ef6b6484..33b9597b1d 100644 > --- a/qemu-doc.texi > +++ b/qemu-doc.texi > @@ -2880,10 +2880,11 @@ lifetime distros will be assumed to ship similar > software versions. > > For distributions with long-lifetime releases, the project will aim to > support > the most recent major version at all times. Support for the previous major > -version will be dropped 2 years after the new major version is released. For > -the purposes of identifying supported software versions, the project will > look > -at RHEL, Debian, Ubuntu LTS, and SLES distros. Other long-lifetime distros > will > -be assumed to ship similar software versions. > +version will be dropped 2 years after the new major version is released, > +or when it reaches ``end of life''. For the purposes of identifying > +supported software versions, the project will look at RHEL, Debian, > +Ubuntu LTS, and SLES distros. Other long-lifetime distros will be > +assumed to ship similar software versions. > > @section Windows > > -- > 2.21.1 > Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[PATCH RFC 07/14] migration/rdma: Export the 'qemu_rdma_registration_handle' and 'qemu_rdma_exchange_send' functions
Signed-off-by: Zhimin Feng --- migration/rdma.c | 25 + migration/rdma.h | 16 2 files changed, 21 insertions(+), 20 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 2f1e69197f..23f7f525f4 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -170,17 +170,6 @@ static void network_to_dest_block(RDMADestBlock *db) db->remote_rkey = ntohl(db->remote_rkey); } -/* - * Main structure for IB Send/Recv control messages. - * This gets prepended at the beginning of every Send/Recv. - */ -typedef struct QEMU_PACKED { -uint32_t len; /* Total length of data portion */ -uint32_t type;/* which control command to perform */ -uint32_t repeat; /* number of commands in data portion of same type */ -uint32_t padding; -} RDMAControlHeader; - static void control_to_network(RDMAControlHeader *control) { control->type = htonl(control->type); @@ -289,10 +278,6 @@ static void network_to_result(RDMARegisterResult *result) }; const char *print_wrid(int wrid); -static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, - uint8_t *data, RDMAControlHeader *resp, - int *resp_idx, - int (*callback)(RDMAContext *rdma)); static inline uint64_t ram_chunk_index(const uint8_t *start, const uint8_t *host) @@ -1590,10 +1575,10 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx, * to perform an *additional* exchange of message just to provide a response by * instead piggy-backing on the acknowledgement. */ -static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, - uint8_t *data, RDMAControlHeader *resp, - int *resp_idx, - int (*callback)(RDMAContext *rdma)) +int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, +uint8_t *data, RDMAControlHeader *resp, +int *resp_idx, +int (*callback)(RDMAContext *rdma)) { int ret = 0; @@ -3210,7 +3195,7 @@ static int dest_ram_sort_func(const void *a, const void *b) * * Keep doing this until the source tells us to stop. */ -static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque) +int qemu_rdma_registration_handle(QEMUFile *f, void *opaque) { RDMAControlHeader reg_resp = { .len = sizeof(RDMARegisterResult), .type = RDMA_CONTROL_REGISTER_RESULT, diff --git a/migration/rdma.h b/migration/rdma.h index ace6e5be90..8e1a6edf57 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -144,6 +144,17 @@ typedef struct QEMU_PACKED RDMADestBlock { uint32_t padding; } RDMADestBlock; +/* + * Main structure for IB Send/Recv control messages. + * This gets prepended at the beginning of every Send/Recv. + */ +typedef struct QEMU_PACKED { +uint32_t len; /* Total length of data portion */ +uint32_t type;/* which control command to perform */ +uint32_t repeat; /* number of commands in data portion of same type */ +uint32_t padding; +} RDMAControlHeader; + /* * Virtual address of the above structures used for transmitting * the RAMBlock descriptions at connection-time. @@ -264,6 +275,11 @@ struct QIOChannelRDMA { }; int multifd_channel_rdma_connect(void *opaque); +int qemu_rdma_registration_handle(QEMUFile *f, void *opaque); +int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, +uint8_t *data, RDMAControlHeader *resp, +int *resp_idx, +int (*callback)(RDMAContext *rdma)); void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp); -- 2.19.1
[PATCH RFC 03/14] migration/rdma: Create multiFd migration threads
Creation of the multifd send threads for RDMA migration, nothing inside yet. Signed-off-by: Zhimin Feng --- migration/multifd.c | 33 +--- migration/multifd.h | 2 + migration/qemu-file.c | 5 +++ migration/qemu-file.h | 1 + migration/rdma.c | 88 ++- migration/rdma.h | 3 ++ 6 files changed, 125 insertions(+), 7 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index b3e8ae9bcc..63678d7fdd 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -424,7 +424,7 @@ void multifd_send_sync_main(QEMUFile *f) { int i; -if (!migrate_use_multifd()) { +if (!migrate_use_multifd() || migrate_use_rdma()) { return; } if (multifd_send_state->pages->used) { @@ -562,6 +562,20 @@ out: return NULL; } +static void rdma_send_channel_create(MultiFDSendParams *p) +{ +Error *local_err = NULL; + +if (p->quit) { +error_setg(&local_err, "multifd: send id %d already quit", p->id); +return ; +} +p->running = true; + +qemu_thread_create(&p->thread, p->name, multifd_rdma_send_thread, p, + QEMU_THREAD_JOINABLE); +} + static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) { MultiFDSendParams *p = opaque; @@ -621,7 +635,11 @@ int multifd_save_setup(Error **errp) p->packet->magic = cpu_to_be32(MULTIFD_MAGIC); p->packet->version = cpu_to_be32(MULTIFD_VERSION); p->name = g_strdup_printf("multifdsend_%d", i); -socket_send_channel_create(multifd_new_send_channel_async, p); +if (!migrate_use_rdma()) { +socket_send_channel_create(multifd_new_send_channel_async, p); +} else { +rdma_send_channel_create(p); +} } return 0; } @@ -720,7 +738,7 @@ void multifd_recv_sync_main(void) { int i; -if (!migrate_use_multifd()) { +if (!migrate_use_multifd() || migrate_use_rdma()) { return; } for (i = 0; i < migrate_multifd_channels(); i++) { @@ -890,8 +908,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp) p->num_packets = 1; p->running = true; -qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, - QEMU_THREAD_JOINABLE); +if (!migrate_use_rdma()) { +qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, + QEMU_THREAD_JOINABLE); +} else { +qemu_thread_create(&p->thread, p->name, multifd_rdma_recv_thread, p, + QEMU_THREAD_JOINABLE); +} atomic_inc(&multifd_recv_state->count); return atomic_read(&multifd_recv_state->count) == migrate_multifd_channels(); diff --git a/migration/multifd.h b/migration/multifd.h index d8b0205977..c9c11ad140 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -13,6 +13,8 @@ #ifndef QEMU_MIGRATION_MULTIFD_H #define QEMU_MIGRATION_MULTIFD_H +#include "migration/rdma.h" + int multifd_save_setup(Error **errp); void multifd_save_cleanup(void); int multifd_load_setup(Error **errp); diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 1c3a358a14..f0ed8f1381 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -248,6 +248,11 @@ void qemu_fflush(QEMUFile *f) f->iovcnt = 0; } +void *getQIOChannel(QEMUFile *f) +{ +return f->opaque; +} + void ram_control_before_iterate(QEMUFile *f, uint64_t flags) { int ret = 0; diff --git a/migration/qemu-file.h b/migration/qemu-file.h index a9b6d6ccb7..fc656a3b72 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -161,6 +161,7 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +void *getQIOChannel(QEMUFile *f); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); diff --git 
a/migration/rdma.c b/migration/rdma.c index 2379b8345b..f086ab5a82 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -34,6 +34,7 @@ #include #include #include "trace.h" +#include "multifd.h" /* * Print and error on both the Monitor and the Log file. @@ -3975,6 +3976,34 @@ static QEMUFile *qemu_fopen_rdma(RDMAContext *rdma, const char *mode) return rioc->file; } +static void migration_rdma_process_incoming(QEMUFile *f, Error **errp) +{ +MigrationIncomingState *mis = migration_incoming_get_current(); +Error *local_err = NULL; +QIOChannel *ioc = NULL; +bool start_migration; + +if (!mis->from_src_file) { +mis->from_src_file = f; +qemu_file_set_blocking(f, false); + +start_migration = migrate_use_multifd(); +} else { +ioc = QIO_CHANNEL(getQIOChannel(f)); +/* Multiple connections */ +assert(migrate_use_multifd()); +start_migration = multifd_recv_new_chan
[PATCH RFC 10/14] migration/rdma: Wait for all multifd to complete registration
Signed-off-by: Zhimin Feng --- migration/multifd.c | 6 ++ migration/rdma.c| 17 + 2 files changed, 23 insertions(+) diff --git a/migration/multifd.c b/migration/multifd.c index 4ae25fc88f..c986d4c247 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -389,6 +389,7 @@ static void multifd_send_terminate_threads(Error *err) qemu_mutex_lock(&p->mutex); p->quit = true; if (migrate_use_rdma()) { +qemu_sem_post(&p->sem); qemu_sem_post(&p->sem_sync); } else { qemu_sem_post(&p->sem); @@ -502,6 +503,11 @@ static void *multifd_rdma_send_thread(void *opaque) if (qemu_rdma_registration(p->rdma) < 0) { goto out; } +/* + * Inform the main RDMA thread to run when multifd + * RDMA thread have completed registration. + */ +qemu_sem_post(&p->sem); while (true) { qemu_sem_wait(&p->sem_sync); diff --git a/migration/rdma.c b/migration/rdma.c index 5de3a29712..4c48e9832c 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3733,6 +3733,23 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, rdma->dest_blocks[i].remote_host_addr; local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; } + +/* Wait for all multifd channels to complete registration */ +if (migrate_use_multifd()) { +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +ERROR(errp, "rdma: error" + "getting multifd_send_param(%d)", i); +return ret; +} + +qemu_sem_wait(&multifd_send_param->sem); +} +} } trace_qemu_rdma_registration_stop(flags); -- 2.19.1
[PATCH RFC 01/14] migration: add the 'migrate_use_rdma_pin_all' function
Signed-off-by: Zhimin Feng --- migration/migration.c | 9 + migration/migration.h | 1 + 2 files changed, 10 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 3a21a4686c..10a13e0c79 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2208,6 +2208,15 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma_pin_all(void) +{ +MigrationState *s; + +s = migrate_get_current(); + +return s->enabled_capabilities[MIGRATION_CAPABILITY_RDMA_PIN_ALL]; +} + bool migrate_use_multifd(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index 8473ddfc88..50fc2693c7 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -297,6 +297,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); int migrate_multifd_channels(void); -- 2.19.1
[PATCH RFC 02/14] migration: judge whether or not the RDMA is used for migration
Signed-off-by: Zhimin Feng --- migration/migration.c | 10 ++ migration/migration.h | 1 + 2 files changed, 11 insertions(+) diff --git a/migration/migration.c b/migration/migration.c index 10a13e0c79..819089a7ea 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -107,6 +107,7 @@ static NotifierList migration_state_notifiers = NOTIFIER_LIST_INITIALIZER(migration_state_notifiers); static bool deferred_incoming; +static bool enabled_rdma_migration; /* Messages sent on the return path from destination to source */ enum mig_rp_message_type { @@ -354,6 +355,7 @@ void migrate_add_address(SocketAddress *address) void qemu_start_incoming_migration(const char *uri, Error **errp) { const char *p; +enabled_rdma_migration = false; qapi_event_send_migration(MIGRATION_STATUS_SETUP); if (!strcmp(uri, "defer")) { @@ -362,6 +364,7 @@ void qemu_start_incoming_migration(const char *uri, Error **errp) tcp_start_incoming_migration(p, errp); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +enabled_rdma_migration = true; rdma_start_incoming_migration(p, errp); #endif } else if (strstart(uri, "exec:", &p)) { @@ -1982,6 +1985,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, Error *local_err = NULL; MigrationState *s = migrate_get_current(); const char *p; +enabled_rdma_migration = false; if (!migrate_prepare(s, has_blk && blk, has_inc && inc, has_resume && resume, errp)) { @@ -1993,6 +1997,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, tcp_start_outgoing_migration(s, p, &local_err); #ifdef CONFIG_RDMA } else if (strstart(uri, "rdma:", &p)) { +enabled_rdma_migration = true; rdma_start_outgoing_migration(s, p, &local_err); #endif } else if (strstart(uri, "exec:", &p)) { @@ -2208,6 +2213,11 @@ bool migrate_use_events(void) return s->enabled_capabilities[MIGRATION_CAPABILITY_EVENTS]; } +bool migrate_use_rdma(void) +{ +return enabled_rdma_migration; +} + bool migrate_use_rdma_pin_all(void) { MigrationState *s; diff --git a/migration/migration.h b/migration/migration.h index 50fc2693c7..9b37320d50 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -297,6 +297,7 @@ bool migrate_ignore_shared(void); bool migrate_validate_uuid(void); bool migrate_auto_converge(void); +bool migrate_use_rdma(void); bool migrate_use_rdma_pin_all(void); bool migrate_use_multifd(void); bool migrate_pause_before_switchover(void); -- 2.19.1
[PATCH RFC 08/14] migration/rdma: Add the function for dynamic page registration
Add the 'qemu_rdma_registration' function, multifd send threads call it to register memory. Signed-off-by: Zhimin Feng --- migration/rdma.c | 51 migration/rdma.h | 1 + 2 files changed, 52 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index 23f7f525f4..19a238be30 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3471,6 +3471,57 @@ out: return ret; } +/* + * Dynamic page registrations for multifd RDMA threads. + */ +int qemu_rdma_registration(void *opaque) +{ +RDMAContext *rdma = opaque; +RDMAControlHeader resp = {.type = RDMA_CONTROL_RAM_BLOCKS_RESULT }; +RDMALocalBlocks *local = &rdma->local_ram_blocks; +int reg_result_idx, i, nb_dest_blocks; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; +int ret = 0; + +head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; + +ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp, +®_result_idx, rdma->pin_all ? +qemu_rdma_reg_whole_ram_blocks : NULL); +if (ret < 0) { +goto out; +} + +nb_dest_blocks = resp.len / sizeof(RDMADestBlock); + +if (local->nb_blocks != nb_dest_blocks) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} + +qemu_rdma_move_header(rdma, reg_result_idx, &resp); +memcpy(rdma->dest_blocks, + rdma->wr_data[reg_result_idx].control_curr, resp.len); + +for (i = 0; i < nb_dest_blocks; i++) { +network_to_dest_block(&rdma->dest_blocks[i]); + +/* We require that the blocks are in the same order */ +if (rdma->dest_blocks[i].length != local->block[i].length) { +rdma->error_state = -EINVAL; +ret = -1; +goto out; +} +local->block[i].remote_host_addr = +rdma->dest_blocks[i].remote_host_addr; +local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; +} + +out: +return ret; +} + /* Destination: * Called via a ram_control_load_hook during the initial RAM load section which * lists the RAMBlocks by name. This lets us know the order of the RAMBlocks diff --git a/migration/rdma.h b/migration/rdma.h index 8e1a6edf57..86c89bdd1f 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -280,6 +280,7 @@ int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, uint8_t *data, RDMAControlHeader *resp, int *resp_idx, int (*callback)(RDMAContext *rdma)); +int qemu_rdma_registration(void *opaque); void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp); -- 2.19.1
[PATCH RFC 04/14] migration/rdma: Export the RDMAContext struct
We need to use the RDMAContext in migration/multifd.c so it has to be exported. Signed-off-by: Zhimin Feng --- migration/rdma.c | 243 -- migration/rdma.h | 247 +++ 2 files changed, 247 insertions(+), 243 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index f086ab5a82..a76823986e 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -19,9 +19,7 @@ #include "qemu/cutils.h" #include "rdma.h" #include "migration.h" -#include "qemu-file.h" #include "ram.h" -#include "qemu-file-channel.h" #include "qemu/error-report.h" #include "qemu/main-loop.h" #include "qemu/module.h" @@ -47,34 +45,6 @@ } \ } while (0) -#define RDMA_RESOLVE_TIMEOUT_MS 1 - -/* Do not merge data if larger than this. */ -#define RDMA_MERGE_MAX (2 * 1024 * 1024) -#define RDMA_SIGNALED_SEND_MAX (RDMA_MERGE_MAX / 4096) - -#define RDMA_REG_CHUNK_SHIFT 20 /* 1 MB */ - -/* - * This is only for non-live state being migrated. - * Instead of RDMA_WRITE messages, we use RDMA_SEND - * messages for that state, which requires a different - * delivery design than main memory. - */ -#define RDMA_SEND_INCREMENT 32768 - -/* - * Maximum size infiniband SEND message - */ -#define RDMA_CONTROL_MAX_BUFFER (512 * 1024) -#define RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE 4096 - -#define RDMA_CONTROL_VERSION_CURRENT 1 -/* - * Capabilities for negotiation. - */ -#define RDMA_CAPABILITY_PIN_ALL 0x01 - /* * Add the other flags above to this list of known capabilities * as they are introduced. @@ -117,18 +87,6 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL; #define RDMA_WRID_CHUNK_MASK (~RDMA_WRID_BLOCK_MASK & ~RDMA_WRID_TYPE_MASK) -/* - * RDMA migration protocol: - * 1. RDMA Writes (data messages, i.e. RAM) - * 2. IB Send/Recv (control channel messages) - */ -enum { -RDMA_WRID_NONE = 0, -RDMA_WRID_RDMA_WRITE = 1, -RDMA_WRID_SEND_CONTROL = 2000, -RDMA_WRID_RECV_CONTROL = 4000, -}; - static const char *wrid_desc[] = { [RDMA_WRID_NONE] = "NONE", [RDMA_WRID_RDMA_WRITE] = "WRITE RDMA", @@ -136,50 +94,6 @@ static const char *wrid_desc[] = { [RDMA_WRID_RECV_CONTROL] = "CONTROL RECV", }; -/* - * Work request IDs for IB SEND messages only (not RDMA writes). - * This is used by the migration protocol to transmit - * control messages (such as device state and registration commands) - * - * We could use more WRs, but we have enough for now. - */ -enum { -RDMA_WRID_READY = 0, -RDMA_WRID_DATA, -RDMA_WRID_CONTROL, -RDMA_WRID_MAX, -}; - -/* - * SEND/RECV IB Control Messages. - */ -enum { -RDMA_CONTROL_NONE = 0, -RDMA_CONTROL_ERROR, -RDMA_CONTROL_READY, /* ready to receive */ -RDMA_CONTROL_QEMU_FILE, /* QEMUFile-transmitted bytes */ -RDMA_CONTROL_RAM_BLOCKS_REQUEST, /* RAMBlock synchronization */ -RDMA_CONTROL_RAM_BLOCKS_RESULT, /* RAMBlock synchronization */ -RDMA_CONTROL_COMPRESS,/* page contains repeat values */ -RDMA_CONTROL_REGISTER_REQUEST,/* dynamic page registration */ -RDMA_CONTROL_REGISTER_RESULT, /* key to use after registration */ -RDMA_CONTROL_REGISTER_FINISHED, /* current iteration finished */ -RDMA_CONTROL_UNREGISTER_REQUEST, /* dynamic UN-registration */ -RDMA_CONTROL_UNREGISTER_FINISHED, /* unpinning finished */ -}; - - -/* - * Memory and MR structures used to represent an IB Send/Recv work request. - * This is *not* used for RDMA writes, only IB Send/Recv. 
- */ -typedef struct { -uint8_t control[RDMA_CONTROL_MAX_BUFFER]; /* actual buffer to register */ -struct ibv_mr *control_mr; /* registration metadata */ -size_t control_len; /* length of the message */ -uint8_t *control_curr; /* start of unconsumed bytes */ -} RDMAWorkRequestData; - /* * Negotiate RDMA capabilities during connection-setup time. */ @@ -200,46 +114,6 @@ static void network_to_caps(RDMACapabilities *cap) cap->flags = ntohl(cap->flags); } -/* - * Representation of a RAMBlock from an RDMA perspective. - * This is not transmitted, only local. - * This and subsequent structures cannot be linked lists - * because we're using a single IB message to transmit - * the information. It's small anyway, so a list is overkill. - */ -typedef struct RDMALocalBlock { -char *block_name; -uint8_t *local_host_addr; /* local virtual address */ -uint64_t remote_host_addr; /* remote virtual address */ -uint64_t offset; -uint64_t length; -struct ibv_mr **pmr;/* MRs for chunk-level registration */ -struct ibv_mr *mr; /* MR for non-chunk-level registration */ -uint32_t *remote_keys; /* rkeys for chunk-level registration */ -uint32_t remote_rkey; /* rkeys for non-chunk-level registration */ -intindex;
[PATCH RFC 00/14] *** multifd for RDMA v2 ***
Hi, This is a version against current code. It is based on the multifd work; we can use the multifd parameters for RDMA transport. All data is transported by the multifd RDMA channels, and the main channel is only used to distribute data to the different multifd channels. Zhimin Feng (14): migration: add the 'migrate_use_rdma_pin_all' function migration: judge whether or not the RDMA is used for migration migration/rdma: Create multiFd migration threads migration/rdma: Export the RDMAContext struct migration/rdma: Create the multifd channels for RDMA migration/rdma: Transmit initial packet migration/rdma: Export the 'qemu_rdma_registration_handle' and 'qemu_rdma_exchange_send' functions migration/rdma: Add the function for dynamic page registration migration/rdma: register memory for multifd RDMA channels migration/rdma: Wait for all multifd to complete registration migration/rdma: use multifd to migrate VM for rdma-pin-all mode migration/rdma: use multifd to migrate VM for NOT rdma-pin-all mode migration/rdma: only register the memory for multifd channels migration/rdma: RDMA cleanup for multifd migration migration/migration.c | 19 ++ migration/migration.h | 2 + migration/multifd.c | 192 +- migration/multifd.h | 12 + migration/qemu-file.c | 5 + migration/qemu-file.h | 1 + migration/rdma.c | 579 +++--- migration/rdma.h | 268 +++ 8 files changed, 804 insertions(+), 274 deletions(-) -- 2.19.1
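The distribution model the cover letter describes amounts to a round-robin pick of a multifd RDMA channel for each RAM write. A minimal sketch of that selection, modelled on the helper patch 11 later adds (the names here are illustrative; migrate_multifd_channels() is the existing multifd helper):

/* Sketch only: pick the next multifd RDMA channel in round-robin order. */
static int current_rdma_index;

static int get_multifd_rdma_channel(void)
{
    current_rdma_index = (current_rdma_index + 1) % migrate_multifd_channels();
    return current_rdma_index;
}

Each call from the write path then sends the current chunk over the returned channel instead of the main migration channel.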
Re: [Qemu-devel] [PATCH v3] hw: net: cadence_gem: Fix build errors in DB_PRINT()
Hi Jason, On Mon, Aug 19, 2019 at 1:40 PM Jason Wang wrote: > > > On 2019/8/19 下午1:24, Bin Meng wrote: > > On Sat, Aug 10, 2019 at 9:58 AM Alistair Francis > > wrote: > >> On Fri, Aug 9, 2019 at 12:26 AM Bin Meng wrote: > >>> When CADENCE_GEM_ERR_DEBUG is turned on, there are several > >>> compilation errors in DB_PRINT(). Fix them. > >>> > >>> While we are here, update to use appropriate modifiers in > >>> the same DB_PRINT() call. > >>> > >>> Signed-off-by: Bin Meng > >> Reviewed-by: Alistair Francis > >> > > Ping? > > > > What's the status of this patch? > > > > Regards, > > Bin > > > Applied. I checked latest qemu/master and found this patch isn't applied. Could you please take a look? Regards, Bin
[PATCH RFC 09/14] migration/rdma: register memory for multifd RDMA channels
register memory for multifd RDMA channels and transmit the destination the keys to source to use including the virtual addresses. Signed-off-by: Zhimin Feng --- migration/multifd.c | 34 +--- migration/rdma.c| 48 + 2 files changed, 79 insertions(+), 3 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index a57d7a2eab..4ae25fc88f 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -388,7 +388,11 @@ static void multifd_send_terminate_threads(Error *err) qemu_mutex_lock(&p->mutex); p->quit = true; -qemu_sem_post(&p->sem); +if (migrate_use_rdma()) { +qemu_sem_post(&p->sem_sync); +} else { +qemu_sem_post(&p->sem); +} qemu_mutex_unlock(&p->mutex); } } @@ -484,6 +488,8 @@ static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; Error *local_err = NULL; +int ret = 0; +RDMAControlHeader head = { .len = 0, .repeat = 1 }; trace_multifd_send_thread_start(p->id); @@ -491,14 +497,28 @@ static void *multifd_rdma_send_thread(void *opaque) goto out; } +/* wait for semaphore notification to register memory */ +qemu_sem_wait(&p->sem_sync); +if (qemu_rdma_registration(p->rdma) < 0) { +goto out; +} + while (true) { +qemu_sem_wait(&p->sem_sync); + qemu_mutex_lock(&p->mutex); if (p->quit) { qemu_mutex_unlock(&p->mutex); break; } qemu_mutex_unlock(&p->mutex); -qemu_sem_wait(&p->sem); + +/* Send FINISHED to the destination */ +head.type = RDMA_CONTROL_REGISTER_FINISHED; +ret = qemu_rdma_exchange_send(p->rdma, &head, NULL, NULL, NULL, NULL); +if (ret < 0) { +return NULL; +} } out: @@ -836,15 +856,23 @@ void multifd_recv_sync_main(void) static void *multifd_rdma_recv_thread(void *opaque) { MultiFDRecvParams *p = opaque; +int ret = 0; while (true) { +qemu_sem_wait(&p->sem_sync); + qemu_mutex_lock(&p->mutex); if (p->quit) { qemu_mutex_unlock(&p->mutex); break; } qemu_mutex_unlock(&p->mutex); -qemu_sem_wait(&p->sem_sync); + +ret = qemu_rdma_registration_handle(p->file, p->c); +if (ret < 0) { +qemu_file_set_error(p->file, ret); +break; +} } qemu_mutex_lock(&p->mutex); diff --git a/migration/rdma.c b/migration/rdma.c index 19a238be30..5de3a29712 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3570,6 +3570,19 @@ static int rdma_load_hook(QEMUFile *f, void *opaque, uint64_t flags, void *data) return rdma_block_notification_handle(opaque, data); case RAM_CONTROL_HOOK: +if (migrate_use_multifd()) { +int i; +MultiFDRecvParams *multifd_recv_param = NULL; +int thread_count = migrate_multifd_channels(); +/* Inform dest recv_thread to poll */ +for (i = 0; i < thread_count; i++) { +if (get_multifd_recv_param(i, &multifd_recv_param)) { +return -1; +} +qemu_sem_post(&multifd_recv_param->sem_sync); +} +} + return qemu_rdma_registration_handle(f, opaque); default: @@ -3643,6 +3656,25 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, head.type = RDMA_CONTROL_RAM_BLOCKS_REQUEST; trace_qemu_rdma_registration_stop_ram(); +if (migrate_use_multifd()) { +/* + * Inform the multifd channels to register memory + */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_multifd_send_param(i, &multifd_send_param); +if (ret) { +ERROR(errp, "rdma: error getting" +"multifd_send_param(%d)", i); +return ret; +} + +qemu_sem_post(&multifd_send_param->sem_sync); +} +} + /* * Make sure that we parallelize the pinning on both sides. 
* For very large guests, doing this serially takes a really @@ -3708,6 +3740,22 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, head.type = RDMA_CONTROL_REGISTER_FINISHED; ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL); +if (migrate_use_multifd()) { +/* Inform src send_thread to send FINISHED signal */ +int i; +int thread_count = migrate_multifd_channels(); +MultiFDSendParams *multifd_send_param = NULL; +for (i = 0; i < thread_count; i++) { +ret = get_m
[PATCH RFC 11/14] migration/rdma: use multifd to migrate VM for rdma-pin-all mode
Signed-off-by: Zhimin Feng --- migration/multifd.c | 15 migration/rdma.c| 58 + migration/rdma.h| 2 ++ 3 files changed, 70 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index c986d4c247..ba5e0b11d0 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -519,12 +519,27 @@ static void *multifd_rdma_send_thread(void *opaque) } qemu_mutex_unlock(&p->mutex); +/* To complete polling(CQE) */ +while (p->rdma->nb_sent) { +ret = qemu_rdma_block_for_wrid(p->rdma, RDMA_WRID_RDMA_WRITE, NULL); +if (ret < 0) { +error_report("multifd RDMA migration: " + "complete polling error!"); +return NULL; +} +} + /* Send FINISHED to the destination */ head.type = RDMA_CONTROL_REGISTER_FINISHED; ret = qemu_rdma_exchange_send(p->rdma, &head, NULL, NULL, NULL, NULL); if (ret < 0) { +error_report("multifd RDMA migration: " + "receiving remote info!"); return NULL; } + +/* sync main thread */ +qemu_sem_post(&p->sem); } out: diff --git a/migration/rdma.c b/migration/rdma.c index 4c48e9832c..873c17dc03 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -96,6 +96,23 @@ static const char *wrid_desc[] = { static const char *rdma_host_port; +/* + * index of current RDMA channel for multifd + */ +static int current_RDMA_index; + +/* + * Get the multifd RDMA channel used to send data. + */ +static int get_multifd_RDMA_channel(void) +{ +int thread_count = migrate_multifd_channels(); +current_RDMA_index++; +current_RDMA_index %= thread_count; + +return current_RDMA_index; +} + /* * Negotiate RDMA capabilities during connection-setup time. */ @@ -1328,8 +1345,8 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma) * completions only need to be recorded, but do not actually * need further processing. */ -static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested, -uint32_t *byte_len) +int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested, + uint32_t *byte_len) { int num_cq_events = 0, ret = 0; struct ibv_cq *cq; @@ -1731,6 +1748,20 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma, .repeat = 1, }; +/* use multifd to send data */ +if (migrate_use_multifd() && migrate_use_rdma_pin_all()) { +int channel = get_multifd_RDMA_channel(); +int ret = 0; +MultiFDSendParams *multifd_send_param = NULL; +ret = get_multifd_send_param(channel, &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", channel); +return -EINVAL; +} +rdma = multifd_send_param->rdma; +block = &(rdma->local_ram_blocks.block[current_index]); +} + retry: sge.addr = (uintptr_t)(block->local_host_addr + (current_addr - block->offset)); @@ -1948,8 +1979,21 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma) } if (ret == 0) { -rdma->nb_sent++; -trace_qemu_rdma_write_flush(rdma->nb_sent); +if (migrate_use_multifd() && migrate_use_rdma_pin_all()) { +/* The multifd RDMA threads send data */ +MultiFDSendParams *multifd_send_param = NULL; +ret = get_multifd_send_param(current_RDMA_index, + &multifd_send_param); +if (ret) { +error_report("rdma: error getting multifd_send_param(%d)", + current_RDMA_index); +return ret; +} +multifd_send_param->rdma->nb_sent++; +} else { +rdma->nb_sent++; +trace_qemu_rdma_write_flush(rdma->nb_sent); +} } rdma->current_length = 0; @@ -3758,7 +3802,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL); if (migrate_use_multifd()) { -/* Inform src send_thread to send FINISHED signal */ +/* + * Inform src send_thread to 
send FINISHED signal. + * Wait for multifd RDMA send threads to poll the CQE. + */ int i; int thread_count = migrate_multifd_channels(); MultiFDSendParams *multifd_send_param = NULL; @@ -3770,6 +3817,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, } qemu_sem_post(&multifd_send_param->sem_sync); +qemu_sem_wait(&multifd_send_param->sem); } } diff --git a/migration/rdma.h b/migration/rdm
Re: [Qemu-devel] [PATCH v3] hw: net: cadence_gem: Fix build errors in DB_PRINT()
On 2020/2/13 下午5:39, Bin Meng wrote: Hi Jason, On Mon, Aug 19, 2019 at 1:40 PM Jason Wang wrote: On 2019/8/19 下午1:24, Bin Meng wrote: On Sat, Aug 10, 2019 at 9:58 AM Alistair Francis wrote: On Fri, Aug 9, 2019 at 12:26 AM Bin Meng wrote: When CADENCE_GEM_ERR_DEBUG is turned on, there are several compilation errors in DB_PRINT(). Fix them. While we are here, update to use appropriate modifiers in the same DB_PRINT() call. Signed-off-by: Bin Meng Reviewed-by: Alistair Francis Ping? What's the status of this patch? Regards, Bin Applied. I checked latest qemu/master and found this patch isn't applied. Could you please take a look? Regards, Bin For some unknown reason it was lost, I've applied in my tree and it will be in the next pull request. Sorry.
[PATCH RFC 12/14] migration/rdma: use multifd to migrate VM for NOT rdma-pin-all mode
Signed-off-by: Zhimin Feng --- migration/rdma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 873c17dc03..eb7c2edbe7 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -1749,7 +1749,7 @@ static int qemu_rdma_write_one(QEMUFile *f, RDMAContext *rdma, }; /* use multifd to send data */ -if (migrate_use_multifd() && migrate_use_rdma_pin_all()) { +if (migrate_use_multifd()) { int channel = get_multifd_RDMA_channel(); int ret = 0; MultiFDSendParams *multifd_send_param = NULL; @@ -1979,7 +1979,7 @@ static int qemu_rdma_write_flush(QEMUFile *f, RDMAContext *rdma) } if (ret == 0) { -if (migrate_use_multifd() && migrate_use_rdma_pin_all()) { +if (migrate_use_multifd()) { /* The multifd RDMA threads send data */ MultiFDSendParams *multifd_send_param = NULL; ret = get_multifd_send_param(current_RDMA_index, -- 2.19.1
Re: [PATCH] migration: Maybe VM is paused when migration is cancelled
Zhimin Feng wrote: > If the migration is cancelled when it is in the completion phase, > the migration state is set to MIGRATION_STATUS_CANCELLING. > The VM may wait on the 'pause_sem' semaphore in the migration_maybe_pause > function, so the VM remains paused. > > Reported-by: Euler Robot > Signed-off-by: Zhimin Feng Reviewed-by: Juan Quintela
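For readers unfamiliar with the code path: migration_maybe_pause() blocks on qemu_sem_wait(&s->pause_sem) until something posts the semaphore, so a cancel that arrives during completion can leave the guest stopped. A conceptual sketch of the kind of wake-up the fix needs (illustrative only, not the actual patch):

/* Illustrative sketch: the cancel path wakes any thread blocked in
 * migration_maybe_pause() so the VM is resumed instead of staying paused. */
static void migrate_cancel_wake_pause(MigrationState *s)
{
    if (s->state == MIGRATION_STATUS_CANCELLING) {
        qemu_sem_post(&s->pause_sem);
    }
}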
[PATCH RFC 06/14] migration/rdma: Transmit initial packet
Transmit initial packet through the multifd RDMA channels, so that we can identify the multifd channels. Signed-off-by: Zhimin Feng --- migration/multifd.c | 33 + migration/rdma.c| 2 ++ 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index acdfd3d5b3..a57d7a2eab 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -483,6 +483,13 @@ void multifd_send_sync_main(QEMUFile *f) static void *multifd_rdma_send_thread(void *opaque) { MultiFDSendParams *p = opaque; +Error *local_err = NULL; + +trace_multifd_send_thread_start(p->id); + +if (multifd_send_initial_packet(p, &local_err) < 0) { +goto out; +} while (true) { qemu_mutex_lock(&p->mutex); @@ -494,6 +501,12 @@ static void *multifd_rdma_send_thread(void *opaque) qemu_sem_wait(&p->sem); } +out: +if (local_err) { +trace_multifd_send_error(p->id); +multifd_send_terminate_threads(local_err); +} + qemu_mutex_lock(&p->mutex); p->running = false; qemu_mutex_unlock(&p->mutex); @@ -964,18 +977,14 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp) Error *local_err = NULL; int id; -if (migrate_use_rdma()) { -id = multifd_recv_state->count; -} else { -id = multifd_recv_initial_packet(ioc, &local_err); -if (id < 0) { -multifd_recv_terminate_threads(local_err); -error_propagate_prepend(errp, local_err, -"failed to receive packet" -" via multifd channel %d: ", -atomic_read(&multifd_recv_state->count)); -return false; -} +id = multifd_recv_initial_packet(ioc, &local_err); +if (id < 0) { +multifd_recv_terminate_threads(local_err); +error_propagate_prepend(errp, local_err, +"failed to receive packet" +" via multifd channel %d: ", +atomic_read(&multifd_recv_state->count)); +return false; } trace_multifd_recv_new_channel(id); diff --git a/migration/rdma.c b/migration/rdma.c index 48615fcaad..2f1e69197f 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -4003,6 +4003,8 @@ int multifd_channel_rdma_connect(void *opaque) goto out; } +p->c = QIO_CHANNEL(getQIOChannel(p->file)); + out: if (local_err) { trace_multifd_send_error(p->id); -- 2.19.1
[PATCH RFC 05/14] migration/rdma: Create the multifd channels for RDMA
In both sides. We still don't transmit anything through them, and we only build the RDMA connections. Signed-off-by: Zhimin Feng --- migration/multifd.c | 103 --- migration/multifd.h | 10 migration/rdma.c| 115 migration/rdma.h| 4 +- 4 files changed, 189 insertions(+), 43 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 63678d7fdd..acdfd3d5b3 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -248,6 +248,19 @@ struct { int exiting; } *multifd_send_state; +int get_multifd_send_param(int id, MultiFDSendParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_send_state->params[id]); +} + +return ret; +} + /* * How we use multifd_send_state->pages and channel->pages? * @@ -410,6 +423,9 @@ void multifd_save_cleanup(void) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +if (migrate_use_rdma()) { +g_free(p->rdma); +} } qemu_sem_destroy(&multifd_send_state->channels_ready); g_free(multifd_send_state->params); @@ -464,6 +480,27 @@ void multifd_send_sync_main(QEMUFile *f) trace_multifd_send_sync_main(multifd_send_state->packet_num); } +static void *multifd_rdma_send_thread(void *opaque) +{ +MultiFDSendParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + static void *multifd_send_thread(void *opaque) { MultiFDSendParams *p = opaque; @@ -566,6 +603,12 @@ static void rdma_send_channel_create(MultiFDSendParams *p) { Error *local_err = NULL; +if (multifd_channel_rdma_connect(p)) { +error_setg(&local_err, "multifd: rdma channel %d not established", + p->id); +return ; +} + if (p->quit) { error_setg(&local_err, "multifd: send id %d already quit", p->id); return ; @@ -654,6 +697,19 @@ struct { uint64_t packet_num; } *multifd_recv_state; +int get_multifd_recv_param(int id, MultiFDRecvParams **param) +{ +int ret = 0; + +if (id < 0 || id >= migrate_multifd_channels()) { +ret = -1; +} else { +*param = &(multifd_recv_state->params[id]); +} + +return ret; +} + static void multifd_recv_terminate_threads(Error *err) { int i; @@ -724,6 +780,9 @@ int multifd_load_cleanup(Error **errp) p->packet_len = 0; g_free(p->packet); p->packet = NULL; +if (migrate_use_rdma()) { +g_free(p->rdma); +} } qemu_sem_destroy(&multifd_recv_state->sem_sync); g_free(multifd_recv_state->params); @@ -761,6 +820,27 @@ void multifd_recv_sync_main(void) trace_multifd_recv_sync_main(multifd_recv_state->packet_num); } +static void *multifd_rdma_recv_thread(void *opaque) +{ +MultiFDRecvParams *p = opaque; + +while (true) { +qemu_mutex_lock(&p->mutex); +if (p->quit) { +qemu_mutex_unlock(&p->mutex); +break; +} +qemu_mutex_unlock(&p->mutex); +qemu_sem_wait(&p->sem_sync); +} + +qemu_mutex_lock(&p->mutex); +p->running = false; +qemu_mutex_unlock(&p->mutex); + +return NULL; +} + static void *multifd_recv_thread(void *opaque) { MultiFDRecvParams *p = opaque; @@ -880,18 +960,24 @@ bool multifd_recv_all_channels_created(void) bool multifd_recv_new_channel(QIOChannel *ioc, Error **errp) { MultiFDRecvParams *p; +QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); Error *local_err = NULL; int id; -id = multifd_recv_initial_packet(ioc, &local_err); -if (id < 0) { -multifd_recv_terminate_threads(local_err); -error_propagate_prepend(errp, local_err, -"failed to receive packet" -" via multifd channel %d: ", 
-atomic_read(&multifd_recv_state->count)); -return false; +if (migrate_use_rdma()) { +id = multifd_recv_state->count; +} else { +id = multifd_recv_initial_packet(ioc, &local_err); +if (id < 0) { +multifd_recv_terminate_threads(local_err); +error_propagate_prepend(errp, local_err, +"failed to receive packet" +" via multifd channel %d: ", +atomic_read(&multifd_recv_state->count)); +return false; +} } + trace_multifd_recv_new_channel(id); p = &multifd_recv_state->params[id]; @@ -903,6 +989,7 @@ bool multifd_recv_new_channel(Q
[PATCH RFC 13/14] migration/rdma: only register the memory for multifd channels
All data is sent over the multifd channels, so we only register memory for the multifd channels; the main channel does not register it. Signed-off-by: Zhimin Feng --- migration/rdma.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/migration/rdma.c b/migration/rdma.c index eb7c2edbe7..b7b56c0493 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3717,6 +3717,12 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, qemu_sem_post(&multifd_send_param->sem_sync); } + +/* + * Use multifd to migrate, we only register memory for + * multifd RDMA channel and main channel don't register it. + */ +goto wait_reg_complete; } /* @@ -3778,6 +3784,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque, local->block[i].remote_rkey = rdma->dest_blocks[i].remote_rkey; } +wait_reg_complete: /* Wait for all multifd channels to complete registration */ if (migrate_use_multifd()) { int i; -- 2.19.1
[PATCH RFC 14/14] migration/rdma: RDMA cleanup for multifd migration
Signed-off-by: Zhimin Feng --- migration/multifd.c | 6 ++ migration/rdma.c| 5 ++--- migration/rdma.h| 1 + 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index ba5e0b11d0..886c8e1271 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -429,6 +429,9 @@ void multifd_save_cleanup(void) g_free(p->packet); p->packet = NULL; if (migrate_use_rdma()) { +p->rdma->listen_id = NULL; +p->rdma->channel = NULL; +qemu_rdma_cleanup(p->rdma); g_free(p->rdma); } } @@ -835,6 +838,9 @@ int multifd_load_cleanup(Error **errp) g_free(p->packet); p->packet = NULL; if (migrate_use_rdma()) { +p->rdma->listen_id = NULL; +p->rdma->channel = NULL; +qemu_rdma_cleanup(p->rdma); g_free(p->rdma); } } diff --git a/migration/rdma.c b/migration/rdma.c index b7b56c0493..0a48713d03 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -2096,11 +2096,11 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext *rdma, return 0; } -static void qemu_rdma_cleanup(RDMAContext *rdma) +void qemu_rdma_cleanup(RDMAContext *rdma) { int idx; -if (rdma->cm_id && rdma->connected) { +if (rdma->channel && rdma->cm_id && rdma->connected) { if ((rdma->error_state || migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) && !rdma->received_error) { @@ -2181,7 +2181,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma) rdma->host = NULL; } - static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp) { int ret, idx; diff --git a/migration/rdma.h b/migration/rdma.h index 7dc3895698..b78f79ddc2 100644 --- a/migration/rdma.h +++ b/migration/rdma.h @@ -283,6 +283,7 @@ int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head, int qemu_rdma_registration(void *opaque); int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested, uint32_t *byte_len); +void qemu_rdma_cleanup(RDMAContext *rdma); void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **errp); -- 2.19.1
Re: Question about (and problem with) pflash data access
On 13/02/20 08:40, Alexey Kardashevskiy wrote: >>> >>> memory-region: system >>> - (prio 0, i/o): system >>> -01ff (prio 0, romd): omap_sx1.flash0-1 >>> -01ff (prio 0, rom): omap_sx1.flash0-0 >> Eh two memory regions with same size and same priority... Is this legal? > > I'd say yes if used with memory_region_set_enabled() to make sure only > one is enabled. Having both enabled is weird and we should print a > warning. Yeah, it's undefined which one becomes visible. Paolo
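The pattern Alexey refers to keeps both regions mapped at the same address and priority but only ever enables one of them. A short sketch using the existing memory_region_set_enabled() API (the field names are made up for illustration):

/* Make the ROMD view visible and hide the plain ROM view... */
memory_region_set_enabled(&s->flash_romd, true);
memory_region_set_enabled(&s->flash_rom, false);

/* ...and flip them when the device changes mode. */
memory_region_set_enabled(&s->flash_romd, false);
memory_region_set_enabled(&s->flash_rom, true);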
Re: [PATCH] migration-test: fix some memleaks in migration-test
wrote: > From: Pan Nengyuan > > spotted by asan, 'check-qtest-aarch64' runs fail if sanitizers is enabled. > > Reported-by: Euler Robot > Signed-off-by: Pan Nengyuan Reviewed-by: Juan Quintela
Re: [PATCH] acpi: cpuhp: document CPHP_GET_CPU_ID_CMD command
On Wed, Feb 12, 2020 at 11:27:34PM +0100, Laszlo Ersek wrote: > Michael, > > On 01/29/20 15:06, Igor Mammedov wrote: > > Commit 3a61c8db9d25 introduced CPHP_GET_CPU_ID_CMD command but > > did not sufficiently describe it. Fix it by adding missing command > > documentation. > > > > Fixes: 3a61c8db9d25 ("acpi: cpuhp: add CPHP_GET_CPU_ID_CMD command") > > Signed-off-by: Igor Mammedov > > Reviewed-by: Laszlo Ersek > > --- > > docs/specs/acpi_cpu_hotplug.txt | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/docs/specs/acpi_cpu_hotplug.txt > > b/docs/specs/acpi_cpu_hotplug.txt > > index a8ce5e7..9bb22d1 100644 > > --- a/docs/specs/acpi_cpu_hotplug.txt > > +++ b/docs/specs/acpi_cpu_hotplug.txt > > @@ -94,6 +94,8 @@ write access: > > register in QEMU > > 2: following writes to 'Command data' register set OST status > > register in QEMU > > +3: following reads from 'Command data' and 'Command data 2' > > return > > + architecture specific CPU ID value for currently selected > > CPU. > > other values: reserved > > [0x6-0x7] reserved > > [0x8] Command data: (DWORD access) > > > > can you please merge this? > > It's a docs patch, but 3a61c8db9d25 (noted in "Fixes:") had gone in > through your tree. > > Thank you! > Laszlo Will do, thanks!
Re: [PATCH RFC 01/14] migration: add the 'migrate_use_rdma_pin_all' function
Zhimin Feng wrote: > Signed-off-by: Zhimin Feng Reviewed-by: Juan Quintela
Re: [PATCH RFC 02/14] migration: judge whether or not the RDMA is used for migration
Zhimin Feng wrote: > Signed-off-by: Zhimin Feng > --- > migration/migration.c | 10 ++ > migration/migration.h | 1 + > 2 files changed, 11 insertions(+) > > diff --git a/migration/migration.c b/migration/migration.c > index 10a13e0c79..819089a7ea 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -107,6 +107,7 @@ static NotifierList migration_state_notifiers = > NOTIFIER_LIST_INITIALIZER(migration_state_notifiers); > > static bool deferred_incoming; > +static bool enabled_rdma_migration; Please no. Use a field in migration state. No problem with the rest of the patch. Later, Juan.
Re: [PATCH v2] hw/char/exynos4210_uart: Fix memleaks in exynos4210_uart_init
On 2/13/20 3:56 AM, kuhn.chen...@huawei.com wrote: From: Chen Qun It's easy to reproduce as follow: virsh qemu-monitor-command vm1 --pretty '{"execute": "device-list-properties", "arguments":{"typename":"exynos4210.uart"}}' ASAN shows memory leak stack: #1 0xfffd896d71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) #2 0xaaad270beee3 in timer_new_full /qemu/include/qemu/timer.h:530 #3 0xaaad270beee3 in timer_new /qemu/include/qemu/timer.h:551 #4 0xaaad270beee3 in timer_new_ns /qemu/include/qemu/timer.h:569 #5 0xaaad270beee3 in exynos4210_uart_init /qemu/hw/char/exynos4210_uart.c:677 #6 0xaaad275c8f4f in object_initialize_with_type /qemu/qom/object.c:516 #7 0xaaad275c91bb in object_new_with_type /qemu/qom/object.c:684 #8 0xaaad2755df2f in qmp_device_list_properties /qemu/qom/qom-qmp-cmds.c:152 Reported-by: Euler Robot Signed-off-by: Chen Qun --- Changes V2 to V1: -Keep s->wordtime in exynos4210_uart_init (Base on Eduardo and Philippe's comments). Thanks. Reviewed-by: Philippe Mathieu-Daudé --- hw/char/exynos4210_uart.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/char/exynos4210_uart.c b/hw/char/exynos4210_uart.c index 25d6588e41..96d5180e3e 100644 --- a/hw/char/exynos4210_uart.c +++ b/hw/char/exynos4210_uart.c @@ -674,8 +674,6 @@ static void exynos4210_uart_init(Object *obj) SysBusDevice *dev = SYS_BUS_DEVICE(obj); Exynos4210UartState *s = EXYNOS4210_UART(dev); -s->fifo_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, - exynos4210_uart_timeout_int, s); s->wordtime = NANOSECONDS_PER_SECOND * 10 / 9600; /* memory mapping */ @@ -691,6 +689,9 @@ static void exynos4210_uart_realize(DeviceState *dev, Error **errp) { Exynos4210UartState *s = EXYNOS4210_UART(dev); +s->fifo_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, + exynos4210_uart_timeout_int, s); + qemu_chr_fe_set_handlers(&s->chr, exynos4210_uart_can_receive, exynos4210_uart_receive, exynos4210_uart_event, NULL, s, NULL, true);
Re: [PATCH RFC 03/14] migration/rdma: Create multiFd migration threads
Zhimin Feng wrote: > Creation of the multifd send threads for RDMA migration, > nothing inside yet. > > Signed-off-by: Zhimin Feng > --- > migration/multifd.c | 33 +--- > migration/multifd.h | 2 + > migration/qemu-file.c | 5 +++ > migration/qemu-file.h | 1 + > migration/rdma.c | 88 ++- > migration/rdma.h | 3 ++ > 6 files changed, 125 insertions(+), 7 deletions(-) > > diff --git a/migration/multifd.c b/migration/multifd.c > index b3e8ae9bcc..63678d7fdd 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -424,7 +424,7 @@ void multifd_send_sync_main(QEMUFile *f) > { > int i; > > -if (!migrate_use_multifd()) { > +if (!migrate_use_multifd() || migrate_use_rdma()) { You don't need sync with main channel on rdma? > +static void rdma_send_channel_create(MultiFDSendParams *p) > +{ > +Error *local_err = NULL; > + > +if (p->quit) { > +error_setg(&local_err, "multifd: send id %d already quit", p->id); > +return ; > +} > +p->running = true; > + > +qemu_thread_create(&p->thread, p->name, multifd_rdma_send_thread, p, > + QEMU_THREAD_JOINABLE); > +} > + > static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque) > { > MultiFDSendParams *p = opaque; > @@ -621,7 +635,11 @@ int multifd_save_setup(Error **errp) > p->packet->magic = cpu_to_be32(MULTIFD_MAGIC); > p->packet->version = cpu_to_be32(MULTIFD_VERSION); > p->name = g_strdup_printf("multifdsend_%d", i); > -socket_send_channel_create(multifd_new_send_channel_async, p); > +if (!migrate_use_rdma()) { > +socket_send_channel_create(multifd_new_send_channel_async, p); > +} else { > +rdma_send_channel_create(p); > +} This is what we are trying to avoid. Just create a struct ops, where we have a ops->create_channel(new_channel_async, p) or whatever, and fill it differently for rdma and for tcp. > } > return 0; > } > @@ -720,7 +738,7 @@ void multifd_recv_sync_main(void) > { > int i; > > -if (!migrate_use_multifd()) { > +if (!migrate_use_multifd() || migrate_use_rdma()) { > return; > } Ok. you can just put an empty function for you here. > for (i = 0; i < migrate_multifd_channels(); i++) { > @@ -890,8 +908,13 @@ bool multifd_recv_new_channel(QIOChannel *ioc, Error > **errp) > p->num_packets = 1; > > p->running = true; > -qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, > - QEMU_THREAD_JOINABLE); > +if (!migrate_use_rdma()) { > +qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p, > + QEMU_THREAD_JOINABLE); > +} else { > +qemu_thread_create(&p->thread, p->name, multifd_rdma_recv_thread, p, > + QEMU_THREAD_JOINABLE); > +} new_recv_chanel() member function. > atomic_inc(&multifd_recv_state->count); > return atomic_read(&multifd_recv_state->count) == > migrate_multifd_channels(); > diff --git a/migration/multifd.h b/migration/multifd.h > index d8b0205977..c9c11ad140 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -13,6 +13,8 @@ > #ifndef QEMU_MIGRATION_MULTIFD_H > #define QEMU_MIGRATION_MULTIFD_H > > +#include "migration/rdma.h" > + > int multifd_save_setup(Error **errp); > void multifd_save_cleanup(void); > int multifd_load_setup(Error **errp); You are not exporting anything rdma related from here, are you? > diff --git a/migration/qemu-file.c b/migration/qemu-file.c > index 1c3a358a14..f0ed8f1381 100644 > --- a/migration/qemu-file.c > +++ b/migration/qemu-file.c > @@ -248,6 +248,11 @@ void qemu_fflush(QEMUFile *f) > f->iovcnt = 0; > } > > +void *getQIOChannel(QEMUFile *f) > +{ > +return f->opaque; > +} > + We really want this to return a void? and not a better type? 
> +static void migration_rdma_process_incoming(QEMUFile *f, Error **errp) > +{ > +MigrationIncomingState *mis = migration_incoming_get_current(); > +Error *local_err = NULL; > +QIOChannel *ioc = NULL; > +bool start_migration; > + > +if (!mis->from_src_file) { > +mis->from_src_file = f; > +qemu_file_set_blocking(f, false); > + > +start_migration = migrate_use_multifd(); > +} else { > +ioc = QIO_CHANNEL(getQIOChannel(f)); > +/* Multiple connections */ > +assert(migrate_use_multifd()); I am not sure that you can make this incompatible change. You need to have *both*, old method and new multifd one. I would have been happy to remove old precopy tcp method, but we *assure* backwards compatibility. > @@ -4003,8 +4032,12 @@ static void rdma_accept_incoming_migration(void > *opaque) > return; > } > > -rdma->migration_started_on_destination = 1; > -migration_fd_process_incoming(f, errp); > +i
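To make the ops-table suggestion in the review concrete, something along these lines would let multifd_save_setup() and multifd_recv_new_channel() stay transport-agnostic; every name below is illustrative rather than taken from the patch:

typedef struct MultiFDTransportOps {
    void (*send_channel_create)(MultiFDSendParams *p);
    void *(*send_thread)(void *opaque);
    void *(*recv_thread)(void *opaque);
} MultiFDTransportOps;

static const MultiFDTransportOps multifd_socket_ops = {
    .send_channel_create = socket_send_channel_create_async, /* hypothetical wrapper */
    .send_thread         = multifd_send_thread,
    .recv_thread         = multifd_recv_thread,
};

static const MultiFDTransportOps multifd_rdma_ops = {
    .send_channel_create = rdma_send_channel_create,
    .send_thread         = multifd_rdma_send_thread,
    .recv_thread         = multifd_rdma_recv_thread,
};

/* The setup code then picks the table once instead of scattering
 * migrate_use_rdma() checks through multifd.c. */
static const MultiFDTransportOps *multifd_ops(void)
{
    return migrate_use_rdma() ? &multifd_rdma_ops : &multifd_socket_ops;
}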
Re: [PATCH RFC 00/14] *** multifd for RDMA v2 ***
Patchew URL: https://patchew.org/QEMU/20200213093755.370-1-fengzhim...@huawei.com/ Hi, This series failed the docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #! /bin/bash export ARCH=x86_64 make docker-image-fedora V=1 NETWORK=1 time make docker-test-mingw@fedora J=14 NETWORK=1 === TEST SCRIPT END === /tmp/qemu-test/src/migration/multifd.c:663: undefined reference to `multifd_channel_rdma_connect' ../migration/multifd.o: In function `multifd_load_cleanup': /tmp/qemu-test/src/migration/multifd.c:843: undefined reference to `qemu_rdma_cleanup' collect2: error: ld returned 1 exit status make[1]: *** [Makefile:206: qemu-system-x86_64w.exe] Error 1 make: *** [Makefile:497: x86_64-softmmu/all] Error 2 make: *** Waiting for unfinished jobs ../migration/multifd.o: In function `multifd_rdma_recv_thread': /tmp/qemu-test/src/migration/multifd.c:898: undefined reference to `qemu_rdma_registration_handle' --- /tmp/qemu-test/src/migration/multifd.c:663: undefined reference to `multifd_channel_rdma_connect' ../migration/multifd.o: In function `multifd_load_cleanup': /tmp/qemu-test/src/migration/multifd.c:843: undefined reference to `qemu_rdma_cleanup' collect2: error: ld returned 1 exit status make[1]: *** [Makefile:206: qemu-system-aarch64w.exe] Error 1 make: *** [Makefile:497: aarch64-softmmu/all] Error 2 Traceback (most recent call last): File "./tests/docker/docker.py", line 664, in sys.exit(main()) --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=88473d634d6543ea992045cbe9a806e1', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-yiq7aevf/src/docker-src.2020-02-13-05.11.35.1374:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=88473d634d6543ea992045cbe9a806e1 make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-yiq7aevf/src' make: *** [docker-run-test-mingw@fedora] Error 2 real2m35.791s user0m7.717s The full log is available at http://patchew.org/logs/20200213093755.370-1-fengzhim...@huawei.com/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH] migration-test: fix some memleaks in migration-test
On 11/02/2020 09:45, pannengy...@huawei.com wrote: > From: Pan Nengyuan > > spotted by asan, 'check-qtest-aarch64' runs fail if sanitizers is enabled. > > Reported-by: Euler Robot > Signed-off-by: Pan Nengyuan > --- > tests/qtest/migration-test.c | 14 -- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c > index cf27ebbc9d..2bb214c87f 100644 > --- a/tests/qtest/migration-test.c > +++ b/tests/qtest/migration-test.c > @@ -498,11 +498,13 @@ static int test_migrate_start(QTestState **from, > QTestState **to, > const char *arch = qtest_get_arch(); > const char *machine_opts = NULL; > const char *memory_size; > +int ret = 0; > > if (args->use_shmem) { > if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) { > g_test_skip("/dev/shm is not supported"); > -return -1; > +ret = -1; > +goto out; > } > } > > @@ -611,8 +613,9 @@ static int test_migrate_start(QTestState **from, > QTestState **to, > g_free(shmem_path); > } > > +out: > migrate_start_destroy(args); > -return 0; > +return ret; > } > > static void test_migrate_end(QTestState *from, QTestState *to, bool > test_dest) > @@ -1134,6 +1137,8 @@ static void test_validate_uuid(void) > { > MigrateStart *args = migrate_start_new(); > > +g_free(args->opts_source); > +g_free(args->opts_target); > args->opts_source = g_strdup("-uuid > ----"); > args->opts_target = g_strdup("-uuid > ----"); > do_test_validate_uuid(args, false); > @@ -1143,6 +1148,8 @@ static void test_validate_uuid_error(void) > { > MigrateStart *args = migrate_start_new(); > > +g_free(args->opts_source); > +g_free(args->opts_target); > args->opts_source = g_strdup("-uuid > ----"); > args->opts_target = g_strdup("-uuid > ----"); > args->hide_stderr = true; > @@ -1153,6 +1160,7 @@ static void test_validate_uuid_src_not_set(void) > { > MigrateStart *args = migrate_start_new(); > > +g_free(args->opts_target); > args->opts_target = g_strdup("-uuid > ----"); > args->hide_stderr = true; > do_test_validate_uuid(args, false); > @@ -1162,6 +1170,7 @@ static void test_validate_uuid_dst_not_set(void) > { > MigrateStart *args = migrate_start_new(); > > +g_free(args->opts_source); > args->opts_source = g_strdup("-uuid > ----"); > args->hide_stderr = true; > do_test_validate_uuid(args, false); > @@ -1379,6 +1388,7 @@ static void test_multifd_tcp_cancel(void) > " 'arguments': { 'uri': 'tcp:127.0.0.1:0' }}"); > qobject_unref(rsp); > > +g_free(uri); > uri = migrate_get_socket_address(to2, "socket-address"); > > wait_for_migration_status(from, "cancelled", NULL); > Reviewed-by: Laurent Vivier
Re: [RFC 2/2] pci-expender-bus:Add pcie-root-port to pxb-pcie under arm.
On Thu, Feb 13, 2020 at 03:49:52PM +0800, Yubo Miao wrote: > From: miaoyubo > > Since devices could not directly plugged into pxb-pcie, Hmm is this different from the root port? intergrated devices do exist for that actually. > under arm, how is arm special? > one > pcie-root port is plugged into pxb-pcie. Due to the bus for each pxb-pcie > is defined as 2 in acpi dsdt tables(one for pxb-pcie, one for pcie-root-port), > only one device could be plugged into one pxb-pcie. So why can't we have users specify any number of root ports using -device? then make acpi tables match the # of ports created? > > Signed-off-by: miaoyubo > --- > hw/pci-bridge/pci_expander_bridge.c | 9 + > include/hw/pci/pcie_port.h | 1 + > 2 files changed, 10 insertions(+) > > diff --git a/hw/pci-bridge/pci_expander_bridge.c > b/hw/pci-bridge/pci_expander_bridge.c > index 47aaaf8fd1..3d896dd452 100644 > --- a/hw/pci-bridge/pci_expander_bridge.c > +++ b/hw/pci-bridge/pci_expander_bridge.c > @@ -15,6 +15,7 @@ > #include "hw/pci/pci.h" > #include "hw/pci/pci_bus.h" > #include "hw/pci/pci_host.h" > +#include "hw/pci/pcie_port.h" > #include "hw/qdev-properties.h" > #include "hw/pci/pci_bridge.h" > #include "qemu/range.h" > @@ -233,7 +234,15 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool > pcie, Error **errp) > > ds = qdev_create(NULL, TYPE_PXB_HOST); > if (pcie) { > +#ifdef __aarch64__ > +bus = pci_root_bus_new(ds, "pxb-pcie-internal", > + NULL, NULL, 0, TYPE_PXB_PCIE_BUS); > +bds = qdev_create(BUS(bus), "pcie-root-port"); > +bds->id = dev_name; > +qdev_prop_set_uint8(bds, PCIE_ROOT_PORT_PROP_CHASSIS, pxb->bus_nr); > +#else > bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, > TYPE_PXB_PCIE_BUS); > +#endif What does all this have to do with building on aarch64? > } else { > bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, > TYPE_PXB_BUS); > bds = qdev_create(BUS(bus), "pci-bridge"); > diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h > index 4b3d254b08..b41d473220 100644 > --- a/include/hw/pci/pcie_port.h > +++ b/include/hw/pci/pcie_port.h > @@ -64,6 +64,7 @@ int pcie_chassis_add_slot(struct PCIESlot *slot); > void pcie_chassis_del_slot(PCIESlot *s); > > #define TYPE_PCIE_ROOT_PORT "pcie-root-port-base" > +#define PCIE_ROOT_PORT_PROP_CHASSIS "chassis" If you are going to do this, replace other instances of "chassis" with the macro. > #define PCIE_ROOT_PORT_CLASS(klass) \ > OBJECT_CLASS_CHECK(PCIERootPortClass, (klass), TYPE_PCIE_ROOT_PORT) > #define PCIE_ROOT_PORT_GET_CLASS(obj) \ > -- > 2.19.1 >
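For reference, user-created root ports under a pxb-pcie are already expressible on the command line, which is what the question above is pointing at; the ACPI code could then describe however many ports the user actually created. An illustrative invocation (property values are arbitrary, and it assumes a drive 'disk1' is defined elsewhere):

-device pxb-pcie,id=pxb1,bus_nr=0x80,numa_node=1,bus=pcie.0 \
-device pcie-root-port,id=pxb1-rp0,bus=pxb1,chassis=1,slot=0 \
-device virtio-blk-pci,drive=disk1,bus=pxb1-rp0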
Re: [PATCH] migration/postcopy: not necessary to discard all RAM at the beginning
* Wei Yang (richardw.y...@linux.intel.com) wrote: > ram_discard_range() unmap page for specific range. To be specific, this > clears related page table entries so that userfault would be triggered. > But this step is not necessary at the very beginning. > > ram_postcopy_incoming_init() is called when destination gets ADVISE > command. ADVISE command is sent when migration thread just starts, which > implies destination is not running yet. This means no page fault > happened and memory region's page tables entries are empty. > > This patch removes the discard at the beginning. > > Signed-off-by: Wei Yang > --- > migration/postcopy-ram.c | 46 > migration/postcopy-ram.h | 7 -- > migration/ram.c | 16 -- > migration/ram.h | 1 - > migration/savevm.c | 4 > 5 files changed, 74 deletions(-) > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c > index 5da6de8c8b..459be8e780 100644 > --- a/migration/postcopy-ram.c > +++ b/migration/postcopy-ram.c > @@ -443,32 +443,6 @@ out: > return ret; > } > > -/* > - * Setup an area of RAM so that it *can* be used for postcopy later; this > - * must be done right at the start prior to pre-copy. > - * opaque should be the MIS. > - */ > -static int init_range(RAMBlock *rb, void *opaque) > -{ > -const char *block_name = qemu_ram_get_idstr(rb); > -void *host_addr = qemu_ram_get_host_addr(rb); > -ram_addr_t offset = qemu_ram_get_offset(rb); > -ram_addr_t length = qemu_ram_get_used_length(rb); > -trace_postcopy_init_range(block_name, host_addr, offset, length); > - > -/* > - * We need the whole of RAM to be truly empty for postcopy, so things > - * like ROMs and any data tables built during init must be zero'd > - * - we're going to get the copy from the source anyway. > - * (Precopy will just overwrite this data, so doesn't need the discard) > - */ But this comment explains why we want to do the discard; we want to make sure that any memory that's been populated by the destination during the init process is discarded and replaced by content from the source. Dave > -if (ram_discard_range(block_name, 0, length)) { > -return -1; > -} > - > -return 0; > -} > - > /* > * At the end of migration, undo the effects of init_range > * opaque should be the MIS. > @@ -506,20 +480,6 @@ static int cleanup_range(RAMBlock *rb, void *opaque) > return 0; > } > > -/* > - * Initialise postcopy-ram, setting the RAM to a state where we can go into > - * postcopy later; must be called prior to any precopy. > - * called from arch_init's similarly named ram_postcopy_incoming_init > - */ > -int postcopy_ram_incoming_init(MigrationIncomingState *mis) > -{ > -if (foreach_not_ignored_block(init_range, NULL)) { > -return -1; > -} > - > -return 0; > -} > - > /* > * Manage a single vote to the QEMU balloon inhibitor for all postcopy usage, > * last caller wins. 
> @@ -1282,12 +1242,6 @@ bool > postcopy_ram_supported_by_host(MigrationIncomingState *mis) > return false; > } > > -int postcopy_ram_incoming_init(MigrationIncomingState *mis) > -{ > -error_report("postcopy_ram_incoming_init: No OS support"); > -return -1; > -} > - > int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis) > { > assert(0); > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h > index c0ccf64a96..1c79c6e51f 100644 > --- a/migration/postcopy-ram.h > +++ b/migration/postcopy-ram.h > @@ -22,13 +22,6 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState > *mis); > */ > int postcopy_ram_incoming_setup(MigrationIncomingState *mis); > > -/* > - * Initialise postcopy-ram, setting the RAM to a state where we can go into > - * postcopy later; must be called prior to any precopy. > - * called from ram.c's similarly named ram_postcopy_incoming_init > - */ > -int postcopy_ram_incoming_init(MigrationIncomingState *mis); > - > /* > * At the end of a migration where postcopy_ram_incoming_init was called. > */ > diff --git a/migration/ram.c b/migration/ram.c > index dfc50d57d5..9a853703d8 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -4015,22 +4015,6 @@ static int ram_load_cleanup(void *opaque) > return 0; > } > > -/** > - * ram_postcopy_incoming_init: allocate postcopy data structures > - * > - * Returns 0 for success and negative if there was one error > - * > - * @mis: current migration incoming state > - * > - * Allocate data structures etc needed by incoming migration with > - * postcopy-ram. postcopy-ram's similarly names > - * postcopy_ram_incoming_init does the work. > - */ > -int ram_postcopy_incoming_init(MigrationIncomingState *mis) > -{ > -return postcopy_ram_incoming_init(mis); > -} > - > /** > * ram_load_postcopy: load a page in postcopy case > * > diff --git a/migration/ram.h b/migration/ram.h > index 44fe4753ad..66cbff1d52 100644
Re: VW ELF loader
On 13/02/20 02:43, Alexey Kardashevskiy wrote: > > Ok. So, I have made a small firmware which does OF CI, loads GRUB and > instantiates RTAS: > https://github.com/aik/of1275 > Quite raw but gives the idea. > > It does not contain drivers and still relies on QEMU to hook an OF path > to a backend. Is this a showstopper and without drivers it is no go? Thanks, Yes, it's really the drivers. Something like netboot wouldn't work for example. I don't have a problem with relying on QEMU for opening and closing OF paths, but I really believe that read/write on ihandles should be done within the firmware and not QEMU. Paolo
Re: [RFC 1/2] arm: acpi: pci-expender-bus: Make arm to support PXB-PCIE
On Thu, Feb 13, 2020 at 03:49:51PM +0800, Yubo Miao wrote: > From: miaoyubo > > Currently virt machine is not supported by pxb-pcie, > and only one main host bridge described in ACPI tables. > Under this circumstance, different io numas for differnt devices > is not possible, in order to present io numas to the guest, > especially for host pssthrough devices. PXB-PCIE is supproted > by arm and certain resource is allocated for each pxb-pcie > in acpi table. > > Signed-off-by: miaoyubo > --- > hw/arm/virt-acpi-build.c | 234 +-- > hw/pci-host/gpex.c | 4 + > include/hw/arm/virt.h| 1 + > 3 files changed, 228 insertions(+), 11 deletions(-) > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c > index bd5f771e9b..2e449d0098 100644 > --- a/hw/arm/virt-acpi-build.c > +++ b/hw/arm/virt-acpi-build.c > @@ -49,6 +49,8 @@ > #include "kvm_arm.h" > #include "migration/vmstate.h" > > +#include "hw/arm/virt.h" > +#include "hw/pci/pci_bus.h" > #define ARM_SPI_BASE 32 > > static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) > @@ -152,20 +154,227 @@ static void acpi_dsdt_add_virtio(Aml *scope, > } > > static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, > - uint32_t irq, bool use_highmem, bool > highmem_ecam) > + uint32_t irq, bool use_highmem, bool > highmem_ecam, > + VirtMachineState *vms) > { > int ecam_id = VIRT_ECAM_ID(highmem_ecam); > -Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf; > +Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf, *dev; > int i, bus_no; > +int count = 0; > hwaddr base_mmio = memmap[VIRT_PCIE_MMIO].base; > hwaddr size_mmio = memmap[VIRT_PCIE_MMIO].size; > hwaddr base_pio = memmap[VIRT_PCIE_PIO].base; > hwaddr size_pio = memmap[VIRT_PCIE_PIO].size; > hwaddr base_ecam = memmap[ecam_id].base; > hwaddr size_ecam = memmap[ecam_id].size; > +/* > + * 0x60 would be enough for pxb device > + * if it is too small, there is no enough space > + * for a pcie device plugged in a pcie-root port > + */ > +hwaddr size_addr = 0x60; > +hwaddr size_io = 0x4000; > int nr_pcie_buses = size_ecam / PCIE_MMCFG_SIZE_MIN; > +int root_bus_limit = 0xFF; > +PCIBus *bus = NULL; > +bus = VIRT_MACHINE(vms)->bus; > + > +if (bus) { > +QLIST_FOREACH(bus, &bus->child, sibling) { > +uint8_t bus_num = pci_bus_num(bus); > +uint8_t numa_node = pci_bus_numa_node(bus); > + > +if (!pci_bus_is_root(bus)) { > +continue; > +} > +if (bus_num < root_bus_limit) { > +root_bus_limit = bus_num - 1; > +} > +count++; > +dev = aml_device("PC%.02X", bus_num); > +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08"))); > +aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03"))); > +aml_append(dev, aml_name_decl("_ADR", aml_int(0))); > +aml_append(dev, aml_name_decl("_CCA", aml_int(1))); > +aml_append(dev, aml_name_decl("_SEG", aml_int(0))); > +aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num))); > +aml_append(dev, aml_name_decl("_UID", aml_int(bus_num))); > +aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb > Device"))); > +if (numa_node != NUMA_NODE_UNASSIGNED) { > +method = aml_method("_PXM", 0, AML_NOTSERIALIZED); > +aml_append(method, aml_return(aml_int(numa_node))); > +aml_append(dev, method); > +} > +/* Declare the PCI Routing Table. 
*/ > +Aml *rt_pkg = aml_varpackage(nr_pcie_buses * PCI_NUM_PINS); > +for (bus_no = 0; bus_no < nr_pcie_buses; bus_no++) { > +for (i = 0; i < PCI_NUM_PINS; i++) { > +int gsi = (i + bus_no) % (PCI_NUM_PINS); > +Aml *pkg = aml_package(4); > +aml_append(pkg, aml_int((bus_no << 16) | 0x)); > +aml_append(pkg, aml_int(i)); > +aml_append(pkg, aml_name("GSI%d", gsi)); > +aml_append(pkg, aml_int(0)); > +aml_append(rt_pkg, pkg); > +} > +} > +aml_append(dev, aml_name_decl("_PRT", rt_pkg)); > + > +for (i = 0; i < PCI_NUM_PINS; i++) { > +uint32_t irqs = irq + i; > +Aml *dev_gsi = aml_device("GSI%d", i); > +aml_append(dev_gsi, aml_name_decl("_HID", > + aml_string("PNP0C0F"))); > +aml_append(dev_gsi, aml_name_decl("_UID", aml_int(0))); > +crs = aml_resource_template(); > +aml_append(crs, > + aml_interrupt(AML_
Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
On 13/02/20 08:52, Robert Hoo wrote: > + > +} > +#pragma GCC pop_options > +#endif > + > + > /* Note that for test_buffer_is_zero_next_accel, the most preferred > * ISA must have the least significant bit. > */ > -#define CACHE_AVX21 > -#define CACHE_SSE42 > -#define CACHE_SSE24 > +#define CACHE_AVX512F 1 > +#define CACHE_AVX22 > +#define CACHE_SSE44 > +#define CACHE_SSE26 This should be 8, not 6. Paolo > > /* Make sure that these variables are appropriately initialized when > * SSE2 is enabled on the compiler command-line, but the compiler is > @@ -226,6 +268,11 @@ static void init_accel(unsigned cache) > fn = buffer_zero_avx2; > } > #endif > +#ifdef CONFIG_AVX512F_OPT > +if (cache & CACHE_AVX512F) { > +fn = buffer_zero_avx512; > +} > +#endif > buffer_accel = fn; > } > > @@ -255,6 +302,9 @@ static void __attribute__((constructor)) > init_cpuid_cache(void) > if ((bv & 6) == 6 && (b & bit_AVX2)) { > cache |= CACHE_AVX2; > } > +if ((bv & 6) == 6 && (b & bit_AVX512F)) { > +cache |= CACHE_AVX512F; > +} > }
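Spelling the correction out: each accelerator needs its own bit and the most preferred ISA keeps the least significant one, so with AVX512F added the set would be:

#define CACHE_AVX512F 1
#define CACHE_AVX2    2
#define CACHE_SSE4    4
#define CACHE_SSE2    8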
Re: [RESEND RFC PATCH v2 1/2] target/arm: Allow to inject SError interrupt
On Thu, 13 Feb 2020 at 03:49, Gavin Shan wrote: > On 2/12/20 10:34 PM, Peter Maydell wrote: > > Yeah, this is on my list to look at; Richard Henderson also could > > have a look at it. From a quick scan I suspect you may be missing > > handling for AArch32. > Yes, the functionality is only supported on aarch64 currently by intention > because the next patch enables it on "max" and "host" CPU models and both > of them are running in aarch64 mode. > > https://patchwork.kernel.org/patch/11366119/ > > If you really want to get this supported for aarch32 either, I can do > it. However, it seems there is a long list of aarch32 CPU models, defined > in target/arm/cpu.c::arm_cpus. so which CPU models you prefer to see with > this supported? I think we might choose one or two popular CPU models if > you agree. I don't think you should need to care about the CPU models. We should implement SError (aka "asynchronous external abort" in ARMv7 and earlier) generically for all CPUs. The SError/async abort should be triggered by a qemu_irq line inbound to the CPU (similar to FIQ and IRQ); the board can choose to wire that up to something, or not, as it likes. thanks -- PMM
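A rough sketch of the wiring described above, purely illustrative: an ARM_CPU_SERROR input index does not exist upstream and would have to be added next to ARM_CPU_IRQ/ARM_CPU_FIQ, while qdev_get_gpio_in() and qemu_irq_pulse() are the existing generic helpers a board or device model would use:

/* Hypothetical board-side wiring; ARM_CPU_SERROR is an assumed new input. */
qemu_irq serror_in = qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_SERROR);

/* A model that detects an asynchronous external abort then just raises
 * the line instead of reaching into target/arm internals. */
qemu_irq_pulse(serror_in);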
Re: [PATCH] docs: Fix virtiofsd.1 location
On 12/02/20 18:51, Peter Maydell wrote: > On Wed, 12 Feb 2020 at 16:51, Philippe Mathieu-Daudé > wrote: >> We stopped testing in-tree builds 2 months ago: >> >> commit bc4486fb233573e77b6e9ad6d6379afb5e37ad8c >> Author: Paolo Bonzini >> Date: Wed Dec 11 15:33:49 2019 +0100 >> >> ci: build out-of-tree >> >> Most developers are using out-of-tree builds and it was discussed >> in the past to only allow those. To prepare for the transition, >> use out-of-tree builds in all continuous integration jobs. > > I'd missed that. Paolo, do you have a plan for following > through and actively forbidding in-tree-builds, if that's > the route we're taking ? I can follow up on that, yes. Paolo
Re: The issues about architecture of the COLO checkpoint
* Daniel Cho (daniel...@qnap.com) wrote: > Hi Hailiang, > > 1. > OK, we will try the patch > “0001-COLO-Optimize-memory-back-up-process.patch”, > and thanks for your help. > > 2. > We understand the reason to compare PVM and SVM's packet. However, the > empty of SVM's packet queue might happened on setting COLO feature and SVM > broken. > > On situation 1 ( setting COLO feature ): > We could force do checkpoint after setting COLO feature finish, then it > will protect the state of PVM and SVM . As the Zhang Chen said. > > On situation 2 ( SVM broken ): > COLO will do failover for PVM, so it might not cause any wrong on PVM. > > However, those situations are our views, so there might be a big difference > between reality and our views. > If we have any wrong views and opinions, please let us know, and correct > us. It does need a timeout; the SVM being broken or being in a state where it never sends the corresponding packet (because of a state difference) can happen and COLO needs to timeout when the packet hasn't arrived after a while and trigger the checkpoint. Dave > Thanks. > > Best regards, > Daniel Cho > > Zhang, Chen 於 2020年2月13日 週四 上午10:17寫道: > > > Add cc Jason Wang, he is a network expert. > > > > In case some network things goes wrong. > > > > > > > > Thanks > > > > Zhang Chen > > > > > > > > *From:* Zhang, Chen > > *Sent:* Thursday, February 13, 2020 10:10 AM > > *To:* 'Zhanghailiang' ; Daniel Cho < > > daniel...@qnap.com> > > *Cc:* Dr. David Alan Gilbert ; qemu-devel@nongnu.org > > *Subject:* RE: The issues about architecture of the COLO checkpoint > > > > > > > > For the issue 2: > > > > > > > > COLO need use the network packets to confirm PVM and SVM in the same state, > > > > Generally speaking, we can’t send PVM packets without compared with SVM > > packets. > > > > But to prevent jamming, I think COLO can do force checkpoint and send the > > PVM packets in this case. > > > > > > > > Thanks > > > > Zhang Chen > > > > > > > > *From:* Zhanghailiang > > *Sent:* Thursday, February 13, 2020 9:45 AM > > *To:* Daniel Cho > > *Cc:* Dr. David Alan Gilbert ; qemu-devel@nongnu.org; > > Zhang, Chen > > *Subject:* RE: The issues about architecture of the COLO checkpoint > > > > > > > > Hi, > > > > > > > > 1. After re-walked through the codes, yes, you are right, actually, > > after the first migration, we will keep dirty log on in primary side, > > > > And only send the dirty pages in PVM to SVM. The ram cache in secondary > > side is always a backup of PVM, so we don’t have to > > > > Re-send the none-dirtied pages. > > > > The reason why the first checkpoint takes longer time is we have to backup > > the whole VM’s ram into ram cache, that is colo_init_ram_cache(). > > > > It is time consuming, but I have optimized in the second patch > > “0001-COLO-Optimize-memory-back-up-process.patch” which you can find in my > > previous reply. > > > > > > > > Besides, I found that, In my previous reply “We can only copy the pages > > that dirtied by PVM and SVM in last checkpoint.”, > > > > We have done this optimization in current upstream codes. > > > > > > > > 2.I don’t quite understand this question. For COLO, we always need both > > network packets of PVM’s and SVM’s to compare before send this packets to > > client. > > > > It depends on this to decide whether or not PVM and SVM are in same state. 
> > > > > > > > Thanks, > > > > hailiang > > > > > > > > *From:* Daniel Cho [mailto:daniel...@qnap.com ] > > *Sent:* Wednesday, February 12, 2020 4:37 PM > > *To:* Zhang, Chen > > *Cc:* Zhanghailiang ; Dr. David Alan > > Gilbert ; qemu-devel@nongnu.org > > *Subject:* Re: The issues about architecture of the COLO checkpoint > > > > > > > > Hi Hailiang, > > > > > > > > Thanks for your replaying and explain in detail. > > > > We will try to use the attachments to enhance memory copy. > > > > > > > > However, we have some questions for your replying. > > > > > > > > 1. As you said, "for each checkpoint, we have to send the whole PVM's > > pages To SVM", why the only first checkpoint will takes more pause time? > > > > In our observing, the first checkpoint will take more time for pausing, > > then other checkpoints will takes a few time for pausing. Does it means > > only the first checkpoint will send the whole pages to SVM, and the other > > checkpoints send the dirty pages to SVM for reloading? > > > > > > > > 2. We notice the COLO-COMPARE component will stuck the packet until > > receive packets from PVM and SVM, as this rule, when we add the > > COLO-COMPARE to PVM, its network will stuck until SVM start. So it is an > > other issue to make PVM stuck while setting COLO feature. With this issue, > > could we let colo-compare to pass the PVM's packet when the SVM's packet > > queue is empty? Then, the PVM's network won't stock, and "if PVM runs > > firstly, it still need to wait for The network packets from SVM to > > compare before send it to client side" won't happened either
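To illustrate the timeout Dave mentions (all names below are made up; this is not the actual colo-compare code): each queued primary packet remembers when it was queued, and a periodic check forces a checkpoint when the matching secondary packet never shows up:

/* Conceptual sketch only. */
static void colo_old_packet_check(CompareState *s, int64_t now_ms,
                                  int64_t timeout_ms)
{
    Packet *pkt = g_queue_peek_head(&s->primary_queue);

    if (pkt && now_ms - pkt->creation_ms > timeout_ms) {
        /* The SVM never produced the matching packet: stop holding the
         * PVM traffic and request a checkpoint to resynchronize. */
        colo_notify_checkpoint_request(s);
    }
}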
Re: [RESEND RFC PATCH v2 1/2] target/arm: Allow to inject SError interrupt
On 2/13/20 4:39 PM, Richard Henderson wrote:
> On 2/12/20 7:49 PM, Gavin Shan wrote:
>> On 2/12/20 10:34 PM, Peter Maydell wrote:
>>> Yeah, this is on my list to look at; Richard Henderson also could have a
>>> look at it. From a quick scan I suspect you may be missing handling for
>>> AArch32.
>>
>> [Thanks for copying Richard Henderson]
>>
>> Yes, the functionality is intentionally supported only on aarch64 for now,
>> because the next patch enables it on the "max" and "host" CPU models and
>> both of them run in aarch64 mode.
>
> We shouldn't leave the aarch32 exception entry paths unimplemented though.
> C.f. AArch32.TakePhysicalSErrorException() and
> AArch32.TakeVirtualSErrorException(). It really shouldn't be more than a
> couple of lines, just like arm_cpu_do_interrupt_aarch64. Remember both
> arm_cpu_do_interrupt_aarch32 and arm_cpu_do_interrupt_aarch32_hyp.

Thanks for the details. The SError injection for aarch32 will be included in
v3. However, there is a long list of aarch32 CPU models defined in
target/arm/cpu.c::arm_cpus, so which CPU models would you prefer to see
supported? I think we could start with one or two popular CPU models if you
agree.

> Even qemu-system-aarch64 -cpu max can exercise this path when EL1 is
> running in aarch32 mode. Admittedly it would be easier if we had the rest
> of the plumbing so that -cpu max,aarch64=off worked. FWIW, the rest of the
> patch looks good.

I think "-cpu max,aarch64=off" is only valid when KVM is enabled? If that's
true, the ioctl(cpu, KVM_SET_VCPU_EVENTS, &events) already works for both
aarch32 and aarch64 guests, if I understand correctly. But yes, I need to
test it, because I have never tested this series on an aarch32 guest :)

Thanks,
Gavin
Re: [PATCH v2 5/9] linux-user: mips: Update syscall numbers to kernel 5.5 level
Le 13/02/2020 à 02:46, Aleksandar Markovic a écrit : > From: Aleksandar Markovic > > Update mips syscall numbers based on Linux kernel tag v5.5. > > CC: Aurelien Jarno > CC: Aleksandar Rikalo > Signed-off-by: Aleksandar Markovic > --- > linux-user/mips/syscall_nr.h | 45 +++ > linux-user/mips64/syscall_nr.h | 50 - > linux-user/mips/cpu_loop.c | 83 > +- > 3 files changed, 175 insertions(+), 3 deletions(-) > ... > diff --git a/linux-user/mips/cpu_loop.c b/linux-user/mips/cpu_loop.c > index 39915b3..a2c72fa 100644 > --- a/linux-user/mips/cpu_loop.c > +++ b/linux-user/mips/cpu_loop.c > @@ -25,8 +25,9 @@ > #include "internal.h" > > # ifdef TARGET_ABI_MIPSO32 > +# define MIPS_SYSCALL_NUMBER_UNUSED -1 > # define MIPS_SYS(name, args) args, > -static const uint8_t mips_syscall_args[] = { > +static const int8_t mips_syscall_args[] = { > MIPS_SYS(sys_syscall, 8)/* 4000 */ > MIPS_SYS(sys_exit , 1) > MIPS_SYS(sys_fork , 0) > @@ -390,6 +391,80 @@ static const uint8_t mips_syscall_args[] = { > MIPS_SYS(sys_copy_file_range, 6) /* 360 */ > MIPS_SYS(sys_preadv2, 6) > MIPS_SYS(sys_pwritev2, 6) > +MIPS_SYS(sys_pkey_mprotect, 4) > +MIPS_SYS(sys_pkey_alloc, 2) > +MIPS_SYS(sys_pkey_free, 1) /* 365 */ > +MIPS_SYS(sys_statx, 5) > +MIPS_SYS(sys_rseq, 4) > +MIPS_SYS(sys_io_pgetevents, 6) > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED,/* 370 */ > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED,/* 375 */ > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED,/* 380 */ > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED,/* 385 */ > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED,/* 390 */ > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYSCALL_NUMBER_UNUSED, > +MIPS_SYS(sys_semget, 3) > +MIPS_SYS(sys_semctl, 4) > +MIPS_SYS(sys_shmget, 3)/* 395 */ > +MIPS_SYS(sys_shmctl, 3) > +MIPS_SYS(sys_shmat, 3) > +MIPS_SYS(sys_shmdt, 1) > +MIPS_SYS(sys_msgget, 2) > +MIPS_SYS(sys_msgsnd, 4)/* 400 */ > +MIPS_SYS(sys_msgrcv, 5) > +MIPS_SYS(sys_msgctl, 3) > +MIPS_SYS(sys_clock_gettime64, 2) > +MIPS_SYS(sys_clock_settime64, 4) > +MIPS_SYS(sys_clock_adjtime64, 2) /* 405 */ > +MIPS_SYS(sys_clock_getres_time64, 4) According to https://github.com/strace/strace/blob/master/linux/syscallent-common-32.h: [BASE_NR + 406] = { 2, 0, SEN(clock_getres_time64), "clock_getres_time64" }, once fixed you can add my Reviewed-by: Laurent Vivier
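The hunk that actually consumes MIPS_SYSCALL_NUMBER_UNUSED in cpu_loop.c is not quoted above; purely as a guess at how such a sentinel is typically used (illustration only, not the real patch hunk):

/* Hypothetical consumer of the sentinel -- not the actual code. */
if (syscall_num >= ARRAY_SIZE(mips_syscall_args) ||
    mips_syscall_args[syscall_num] == MIPS_SYSCALL_NUMBER_UNUSED) {
    ret = -TARGET_ENOSYS;                  /* hole in the o32 table */
} else {
    int nb_args = mips_syscall_args[syscall_num];
    /* ... fetch nb_args arguments and dispatch to do_syscall() ... */
}

The switch from uint8_t to int8_t in the quoted hunk is what makes the -1 sentinel representable alongside the small argument counts.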
Re: [RESEND RFC PATCH v2 2/2] target/arm: Support NMI injection
On Wed, 5 Feb 2020 at 11:06, Gavin Shan wrote: > > This supports QMP/HMP "nmi" command by injecting SError interrupt to > guest, which is expected to crash with that. Currently, It's supported > on two CPU models: "host" and "max". > > Signed-off-by: Gavin Shan > --- > hw/arm/virt.c | 18 > target/arm/cpu-qom.h | 1 + > target/arm/cpu.c | 48 ++ > target/arm/cpu64.c | 25 ++ > target/arm/internals.h | 8 +++ > 5 files changed, 96 insertions(+), 4 deletions(-) A few quick general notes: (1) as I mentioned on the cover letter, the mechanism for injecting an SError/async external abort into the CPU should be a qemu_irq line, like FIQ/IRQ, not a special-purpose method on the CPU object. (2) for function naming, there's a dividing line between: * code that implements the (unfortunately x86-centric) monitor command named "nmi"; these functions can have names with 'nmi' in them * code which implements the actual mechanism of 'deliver an SError to the CPU'; these functions should not have 'nmi' in the name or mention nmi, because nmi is not a concept in the Arm architecture (3) Before we expose 'nmi' to users as something that delivers an SError, we need to think about the interactions with RAS, because currently we also use SError to say "there was an error in the host memory you're using", and we might in future want to use SError for proper emulated RAS. We don't want to paint ourselves into a corner by grabbing SError exclusively for 'nmi'. thanks -- PMM
Re: [PATCH v2] virtio: increase virtqueue size for virtio-scsi and virtio-blk
On Thu, Feb 13, 2020 at 12:28:25PM +0300, Denis Plotnikov wrote: > > > On 13.02.2020 12:08, Stefan Hajnoczi wrote: > > On Thu, Feb 13, 2020 at 11:08:35AM +0300, Denis Plotnikov wrote: > > > On 12.02.2020 18:43, Stefan Hajnoczi wrote: > > > > On Tue, Feb 11, 2020 at 05:14:14PM +0300, Denis Plotnikov wrote: > > > > > The goal is to reduce the amount of requests issued by a guest on > > > > > 1M reads/writes. This rises the performance up to 4% on that kind of > > > > > disk access pattern. > > > > > > > > > > The maximum chunk size to be used for the guest disk accessing is > > > > > limited with seg_max parameter, which represents the max amount of > > > > > pices in the scatter-geather list in one guest disk request. > > > > > > > > > > Since seg_max is virqueue_size dependent, increasing the virtqueue > > > > > size increases seg_max, which, in turn, increases the maximum size > > > > > of data to be read/write from a guest disk. > > > > > > > > > > More details in the original problem statment: > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html > > > > > > > > > > Suggested-by: Denis V. Lunev > > > > > Signed-off-by: Denis Plotnikov > > > > > --- > > > > >hw/block/virtio-blk.c | 4 ++-- > > > > >hw/core/machine.c | 2 ++ > > > > >hw/scsi/virtio-scsi.c | 4 ++-- > > > > >3 files changed, 6 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c > > > > > index 09f46ed85f..6df3a7a6df 100644 > > > > > --- a/hw/block/virtio-blk.c > > > > > +++ b/hw/block/virtio-blk.c > > > > > @@ -914,7 +914,7 @@ static void virtio_blk_update_config(VirtIODevice > > > > > *vdev, uint8_t *config) > > > > >memset(&blkcfg, 0, sizeof(blkcfg)); > > > > >virtio_stq_p(vdev, &blkcfg.capacity, capacity); > > > > >virtio_stl_p(vdev, &blkcfg.seg_max, > > > > > - s->conf.seg_max_adjust ? s->conf.queue_size - 2 : > > > > > 128 - 2); > > > > > + s->conf.seg_max_adjust ? s->conf.queue_size - 2 : > > > > > 256 - 2); > > > > This value must not change on older machine types. > > > Yes, that's true, but .. > > > > So does this patch > > > > need to turn seg-max-adjust *on* in hw_compat_4_2 so that old machine > > > > types get 126 instead of 254? > > > If we set seg-max-adjust "on" in older machine types, the setups using > > > them > > > and having queue_sizes set , for example, 1024 will also set seg_max to > > > 1024 > > > - 2 which isn't the expected behavior: older mt didn't change seg_max in > > > that case and stuck with 128 - 2. > > > So, should we, instead, leave the default 128 - 2, for seg_max? > > Argh! Good point :-). > > > > How about a seg_max_default property that is initialized to 254 for > > modern machines and 126 to old machines? > Hmm, but we'll achieve the same but with more code changes, don't we? > 254 is because the queue-size is 256. We gonna leave 128-2 for older machine > types > just for not breaking anything. All other seg_max adjustment is provided by > seg_max_adjust which is "on" by default in modern machine types. > > to summarize: > > modern mt defaults: > seg_max_adjust = on > queue_size = 256 > > => default seg_max = 254 > => changing queue-size will change seg_max = queue_size - 2 > > old mt defaults: > seg_max_adjust = off > queue_size = 128 > > => default seg_max = 126 > => changing queue-size won't change seg_max, it's always = 126 like it was > before You're right! The only strange case is a modern machine type with seg_max_adjust=off, where queue_size will be 256 but seg_max will be 126. 
But no user would want to disable seg_max_adjust, so it's okay. I agree with you that the line of code can remain unchanged: /* * Only old machine types use seg_max_adjust=off and there the default * value of queue_size is 128. */ virtio_stl_p(vdev, &blkcfg.seg_max, s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2); Stefan signature.asc Description: PGP signature
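To make the resulting defaults easy to check, here is a small standalone illustration (not part of the patch) that recomputes what the quoted line produces for the two machine-type generations:

/* Illustrative only: effective seg_max as computed by the quoted code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t effective_seg_max(bool seg_max_adjust, uint32_t queue_size)
{
    return seg_max_adjust ? queue_size - 2 : 128 - 2;
}

int main(void)
{
    /* old machine types: seg_max_adjust=off, queue_size defaults to 128 */
    printf("old mt:    %u\n", effective_seg_max(false, 128));   /* 126 */
    /* modern machine types: seg_max_adjust=on, queue_size defaults to 256 */
    printf("modern mt: %u\n", effective_seg_max(true, 256));    /* 254 */
    return 0;
}

This matches the summary above: old machine types stay at 126 regardless of queue-size, while modern ones track queue_size - 2.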
Re: [PATCH v2 0/2] spapr: Use vIOMMU translation for virtio by default
On Thu, 13 Feb 2020 11:58:35 +1100 David Gibson wrote: > Upcoming Secure VM support for pSeries machines introduces some > complications for virtio, since the transfer buffers need to be > explicitly shared so that the hypervisor can access them. > > While it's not strictly speaking dependent on it, the fact that virtio > devices bypass normal platform IOMMU translation complicates the issue > on the guest side. Since there are some significan downsides to > bypassing the vIOMMU anyway, let's just disable that. > > There's already a flag to do this in virtio, just turn it on by > default for forthcoming pseries machine types. > > Any opinions on whether dropping support for the older guest kernels > is acceptable at this point? > As expected, this breaks compatibility with existing RHEL 6.10 guests. Each patch in this series requires an extra -global option to be specified on the command line in order to boot successfully. Patch 1: -global virtio-pci.disable-legacy=auto Patch 2: -global virtio-pci.iommu_platform=off As seen on the RH site [1], RHEL6 will reach "End of Maintenance Support or Maintenance Support 2 (Product retirement)" on November 30, 2020 and "End of Extended Life-cycle Support" on June 30, 2024. Not sure if it's okay to drop support for RHEL6 this soon. RHEL 7.7 guests seem to be unaffected. [1] https://access.redhat.com/support/policy/updates/errata/#Life_Cycle_Dates > Changes since v1: > * Added information on which guest kernel versions will no longer >work with these changes > * Use Michael Tsirkin's suggested better way of handling the machine >type change > > David Gibson (2): > spapr: Disable legacy virtio devices for pseries-5.0 and later > spapr: Enable virtio iommu_platform=on by default > > hw/ppc/spapr.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) >
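For anyone hitting this with existing guests, the two workarounds combine on one command line; the invocation below is only illustrative (the machine name and the trailing options are placeholders), but the -global properties are the ones named above:

qemu-system-ppc64 -machine pseries-5.0 \
    -global virtio-pci.disable-legacy=auto \
    -global virtio-pci.iommu_platform=off \
    ... usual disk/network/guest options ...

With both properties set, a RHEL 6.10 guest should boot as it did on the older machine types.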
Re: [PATCH v9 01/23] checkpatch: replace vl.c in the top of repo check
On Tue, Feb 11, 2020 at 03:34:48PM -0500, Alexander Bulekov wrote: > 524b4c2c5c moves vl.c into softmmu/ , breaking the checkpatch 524b4c2c5c is a local git sha1. That commit will have a different sha1 when applied to qemu.git/master. Saying "The next patch" instead would be fine. However, this patch leaves the tree in a state where checkpatch.pl will fail because softmmu/ doesn't exist yet! Please squash this patch into the next commit instead. I guess you kept it separate because changing checkpatch.pl can be thought of as a separate change. However, these two need to happen in a single step in order for checkpatch.pl to function correctly at each commit. Therefore it's appropriate to combine them into a single commit. > top-of-kernel-tree check. Replace with checks for softmmu and linux-user > > Signed-off-by: Alexander Bulekov > --- > scripts/checkpatch.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl > index ce43a306f8..2e2273b8a3 100755 > --- a/scripts/checkpatch.pl > +++ b/scripts/checkpatch.pl > @@ -462,7 +462,7 @@ sub top_of_kernel_tree { > my @tree_check = ( > "COPYING", "MAINTAINERS", "Makefile", > "README.rst", "docs", "VERSION", > - "vl.c" > + "softmmu", "linux-user" > ); > > foreach my $check (@tree_check) { > -- > 2.25.0 > signature.asc Description: PGP signature
Re: [PATCH 2/2] util: add util function buffer_zero_avx512()
On Thu, 2020-02-13 at 11:30 +0100, Paolo Bonzini wrote: > On 13/02/20 08:52, Robert Hoo wrote: > > + > > +} > > +#pragma GCC pop_options > > +#endif > > + > > + > > /* Note that for test_buffer_is_zero_next_accel, the most > > preferred > > * ISA must have the least significant bit. > > */ > > -#define CACHE_AVX21 > > -#define CACHE_SSE42 > > -#define CACHE_SSE24 > > +#define CACHE_AVX512F 1 > > +#define CACHE_AVX22 > > +#define CACHE_SSE44 > > +#define CACHE_SSE26 > > This should be 8, not 6. > > Paolo Thanks Paolo, going to fix it in v2. > > > > > /* Make sure that these variables are appropriately initialized > > when > > * SSE2 is enabled on the compiler command-line, but the compiler > > is > > @@ -226,6 +268,11 @@ static void init_accel(unsigned cache) > > fn = buffer_zero_avx2; > > } > > #endif > > +#ifdef CONFIG_AVX512F_OPT > > +if (cache & CACHE_AVX512F) { > > +fn = buffer_zero_avx512; > > +} > > +#endif > > buffer_accel = fn; > > } > > > > @@ -255,6 +302,9 @@ static void __attribute__((constructor)) > > init_cpuid_cache(void) > > if ((bv & 6) == 6 && (b & bit_AVX2)) { > > cache |= CACHE_AVX2; > > } > > +if ((bv & 6) == 6 && (b & bit_AVX512F)) { > > +cache |= CACHE_AVX512F; > > +} > > } > >
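For clarity, applying Paolo's correction to the quoted hunk would presumably leave the ISA bits as distinct powers of two, with the most preferred ISA in the least significant bit (reconstructed here for illustration, not copied from a v2 patch):

#define CACHE_AVX512F 1
#define CACHE_AVX2    2
#define CACHE_SSE4    4
#define CACHE_SSE2    8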
Re: [PATCH] hw/arm: ast2400/ast2500: Wire up EHCI controllers
On Thu, 6 Feb 2020 at 18:34, Guenter Roeck wrote: > > Initialize EHCI controllers on AST2400 and AST2500 using the existing > TYPE_PLATFORM_EHCI. After this change, booting ast2500-evb into Linux > successfully instantiates a USB interface. > > ehci-platform 1e6a3000.usb: EHCI Host Controller > ehci-platform 1e6a3000.usb: new USB bus registered, assigned bus number 1 > ehci-platform 1e6a3000.usb: irq 21, io mem 0x1e6a3000 > ehci-platform 1e6a3000.usb: USB 2.0 started, EHCI 1.00 > usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.05 > usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 > usb usb1: Product: EHCI Host Controller > > Signed-off-by: Guenter Roeck Applied to target-arm.next, thanks. -- PMM
Re: [PATCH v9 02/23] softmmu: move vl.c to softmmu/
On Tue, Feb 11, 2020 at 03:34:49PM -0500, Alexander Bulekov wrote: > Signed-off-by: Alexander Bulekov > --- > Makefile.objs | 2 -- > Makefile.target | 1 + > softmmu/Makefile.objs | 2 ++ > vl.c => softmmu/vl.c | 0 > 4 files changed, 3 insertions(+), 2 deletions(-) Please update the ./MAINTAINERS entry for vl.c here too. There is also a top_of_tree check in scripts/get_maintainer.pl that needs to be updated in this commit. signature.asc Description: PGP signature
Re: [PATCH v2] hw/arm: ast2600: Wire up EHCI controllers
On Fri, 7 Feb 2020 at 17:45, Guenter Roeck wrote: > > Initialize EHCI controllers on AST2600 using the existing > TYPE_PLATFORM_EHCI. After this change, booting ast2600-evb > into Linux successfully instantiates a USB interface after > the necessary changes are made to its devicetree files. > > ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver > ehci-platform: EHCI generic platform driver > ehci-platform 1e6a3000.usb: EHCI Host Controller > ehci-platform 1e6a3000.usb: new USB bus registered, assigned bus number 1 > ehci-platform 1e6a3000.usb: irq 25, io mem 0x1e6a3000 > ehci-platform 1e6a3000.usb: USB 2.0 started, EHCI 1.00 > usb usb1: Manufacturer: Linux 5.5.0-09825-ga0802f2d0ef5-dirty ehci_hcd > usb 1-1: new high-speed USB device number 2 using ehci-platform > > Reviewed-by: Cédric Le Goater > Signed-off-by: Guenter Roeck Applied to target-arm.next, thanks. -- PMM
[PULL 3/5] linux-user: fix TARGET_NSIG and _NSIG uses
Valid signal numbers are between 1 (SIGHUP) and SIGRTMAX. System includes define _NSIG to SIGRTMAX + 1, but QEMU (like kernel) defines TARGET_NSIG to TARGET_SIGRTMAX. Fix all the checks involving the signal range. Signed-off-by: Laurent Vivier Reviewed-by: Peter Maydell Tested-by: Taylor Simpson Message-Id: <20200212125658.644558-4-laur...@vivier.eu> --- linux-user/signal.c | 52 - 1 file changed, 37 insertions(+), 15 deletions(-) diff --git a/linux-user/signal.c b/linux-user/signal.c index 246315571c09..c1e664f97a7c 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -30,6 +30,15 @@ static struct target_sigaction sigact_table[TARGET_NSIG]; static void host_signal_handler(int host_signum, siginfo_t *info, void *puc); + +/* + * System includes define _NSIG as SIGRTMAX + 1, + * but qemu (like the kernel) defines TARGET_NSIG as TARGET_SIGRTMAX + * and the first signal is SIGHUP defined as 1 + * Signal number 0 is reserved for use as kill(pid, 0), to test whether + * a process exists without sending it a signal. + */ +QEMU_BUILD_BUG_ON(__SIGRTMAX + 1 != _NSIG); static uint8_t host_to_target_signal_table[_NSIG] = { [SIGHUP] = TARGET_SIGHUP, [SIGINT] = TARGET_SIGINT, @@ -67,19 +76,24 @@ static uint8_t host_to_target_signal_table[_NSIG] = { [SIGSYS] = TARGET_SIGSYS, /* next signals stay the same */ }; -static uint8_t target_to_host_signal_table[_NSIG]; +static uint8_t target_to_host_signal_table[TARGET_NSIG + 1]; + +/* valid sig is between 1 and _NSIG - 1 */ int host_to_target_signal(int sig) { -if (sig < 0 || sig >= _NSIG) +if (sig < 1 || sig >= _NSIG) { return sig; +} return host_to_target_signal_table[sig]; } +/* valid sig is between 1 and TARGET_NSIG */ int target_to_host_signal(int sig) { -if (sig < 0 || sig >= _NSIG) +if (sig < 1 || sig > TARGET_NSIG) { return sig; +} return target_to_host_signal_table[sig]; } @@ -100,11 +114,15 @@ static inline int target_sigismember(const target_sigset_t *set, int signum) void host_to_target_sigset_internal(target_sigset_t *d, const sigset_t *s) { -int i; +int host_sig, target_sig; target_sigemptyset(d); -for (i = 1; i <= TARGET_NSIG; i++) { -if (sigismember(s, i)) { -target_sigaddset(d, host_to_target_signal(i)); +for (host_sig = 1; host_sig < _NSIG; host_sig++) { +target_sig = host_to_target_signal(host_sig); +if (target_sig < 1 || target_sig > TARGET_NSIG) { +continue; +} +if (sigismember(s, host_sig)) { +target_sigaddset(d, target_sig); } } } @@ -122,11 +140,15 @@ void host_to_target_sigset(target_sigset_t *d, const sigset_t *s) void target_to_host_sigset_internal(sigset_t *d, const target_sigset_t *s) { -int i; +int host_sig, target_sig; sigemptyset(d); -for (i = 1; i <= TARGET_NSIG; i++) { -if (target_sigismember(s, i)) { -sigaddset(d, target_to_host_signal(i)); +for (target_sig = 1; target_sig <= TARGET_NSIG; target_sig++) { +host_sig = target_to_host_signal(target_sig); +if (host_sig < 1 || host_sig >= _NSIG) { +continue; +} +if (target_sigismember(s, target_sig)) { +sigaddset(d, host_sig); } } } @@ -492,10 +514,10 @@ static void signal_table_init(void) if (host_to_target_signal_table[host_sig] == 0) { host_to_target_signal_table[host_sig] = host_sig; } -} -for (host_sig = 1; host_sig < _NSIG; host_sig++) { target_sig = host_to_target_signal_table[host_sig]; -target_to_host_signal_table[target_sig] = host_sig; +if (target_sig <= TARGET_NSIG) { +target_to_host_signal_table[target_sig] = host_sig; +} } } @@ -518,7 +540,7 @@ void signal_init(void) act.sa_sigaction = host_signal_handler; for(i = 1; i <= TARGET_NSIG; i++) { #ifdef TARGET_GPROF 
-if (i == SIGPROF) { +if (i == TARGET_SIGPROF) { continue; } #endif -- 2.24.1
[PULL 0/5] Linux user for 5.0 patches
The following changes since commit e18e5501d8ac692d32657a3e1ef545b14e72b730: Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200210' into staging (2020-02-10 18:09:14 +) are available in the Git repository at: git://github.com/vivier/qemu.git tags/linux-user-for-5.0-pull-request for you to fetch changes up to 6d485a55d0cd8fbb8b4337b298f79ddb0c2a5511: linux-user: implement TARGET_SO_PEERSEC (2020-02-12 18:56:45 +0100) Implement TARGET_SO_PEERSEC Fix rt signals management Laurent Vivier (5): linux-user: add missing TARGET_SIGRTMIN for hppa linux-user: cleanup signal.c linux-user: fix TARGET_NSIG and _NSIG uses linux-user: fix use of SIGRTMIN linux-user: implement TARGET_SO_PEERSEC linux-user/hppa/target_signal.h | 1 + linux-user/signal.c | 134 linux-user/syscall.c| 22 ++ linux-user/trace-events | 3 + 4 files changed, 128 insertions(+), 32 deletions(-) -- 2.24.1
[PULL 1/5] linux-user: add missing TARGET_SIGRTMIN for hppa
This signal is defined for all other targets and we will need it later Signed-off-by: Laurent Vivier [pm: that this was actually an ABI change in the hppa kernel (at kernel version 3.17, kernel commit 1f25df2eff5b25f52c139d). Before that SIGRTMIN was 37... All our other HPPA TARGET_SIG* values are for the updated ABI following that commit, so using 32 for SIGRTMIN is the right thing for us.] Reviewed-by: Peter Maydell Tested-by: Taylor Simpson Message-Id: <20200212125658.644558-2-laur...@vivier.eu> Signed-off-by: Laurent Vivier --- linux-user/hppa/target_signal.h | 1 + 1 file changed, 1 insertion(+) diff --git a/linux-user/hppa/target_signal.h b/linux-user/hppa/target_signal.h index ba159ff8d006..c2a0102ed73d 100644 --- a/linux-user/hppa/target_signal.h +++ b/linux-user/hppa/target_signal.h @@ -34,6 +34,7 @@ #define TARGET_SIGURG 29 #define TARGET_SIGXFSZ 30 #define TARGET_SIGSYS 31 +#define TARGET_SIGRTMIN32 #define TARGET_SIG_BLOCK 0 #define TARGET_SIG_UNBLOCK 1 -- 2.24.1
[PULL 5/5] linux-user: implement TARGET_SO_PEERSEC
"The purpose of this option is to allow an application to obtain the security credentials of a Unix stream socket peer. It is analogous to SO_PEERCRED (which provides authentication using standard Unix credentials of pid, uid and gid), and extends this concept to other security models." -- https://lwn.net/Articles/62370/ Until now it was passed to the kernel with an "int" argument and fails when it was supported by the host because the parameter is like a filename: it is always a \0-terminated string with no embedded \0 characters, but is not guaranteed to be ASCII or UTF-8. I've tested the option with the following program: /* * cc -o getpeercon getpeercon.c */ #include #include #include #include #include int main(void) { int fd; struct sockaddr_in server, addr; int ret; socklen_t len; char buf[256]; fd = socket(PF_INET, SOCK_STREAM, 0); if (fd == -1) { perror("socket"); return 1; } server.sin_family = AF_INET; inet_aton("127.0.0.1", &server.sin_addr); server.sin_port = htons(40390); connect(fd, (struct sockaddr*)&server, sizeof(server)); len = sizeof(buf); ret = getsockopt(fd, SOL_SOCKET, SO_PEERSEC, buf, &len); if (ret == -1) { perror("getsockopt"); return 1; } printf("%d %s\n", len, buf); return 0; } On host: $ ./getpeercon 33 system_u:object_r:unlabeled_t:s0 With qemu-aarch64/bionic without the patch: $ ./getpeercon getsockopt: Numerical result out of range With the patch: $ ./getpeercon 33 system_u:object_r:unlabeled_t:s0 Bug: https://bugs.launchpad.net/qemu/+bug/1823790 Reported-by: Matthias Lüscher Tested-by: Matthias Lüscher Signed-off-by: Laurent Vivier Reviewed-by: Philippe Mathieu-Daudé Tested-by: Philippe Mathieu-Daudé Message-Id: <20200204211901.1731821-1-laur...@vivier.eu> --- linux-user/syscall.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index d60142f0691c..c930577686da 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -2344,6 +2344,28 @@ static abi_long do_getsockopt(int sockfd, int level, int optname, } break; } +case TARGET_SO_PEERSEC: { +char *name; + +if (get_user_u32(len, optlen)) { +return -TARGET_EFAULT; +} +if (len < 0) { +return -TARGET_EINVAL; +} +name = lock_user(VERIFY_WRITE, optval_addr, len, 0); +if (!name) { +return -TARGET_EFAULT; +} +lv = len; +ret = get_errno(getsockopt(sockfd, level, SO_PEERSEC, + name, &lv)); +if (put_user_u32(lv, optlen)) { +ret = -TARGET_EFAULT; +} +unlock_user(name, optval_addr, lv); +break; +} case TARGET_SO_LINGER: { struct linger lg; -- 2.24.1
[PULL 4/5] linux-user: fix use of SIGRTMIN
Some RT signals can be in use by glibc, it's why SIGRTMIN (34) is generally greater than __SIGRTMIN (32). So SIGRTMIN cannot be mapped to TARGET_SIGRTMIN. Instead of swapping only SIGRTMIN and SIGRTMAX, map all the range [TARGET_SIGRTMIN ... TARGET_SIGRTMAX - X] to [__SIGRTMIN + X ... SIGRTMAX ] (SIGRTMIN is __SIGRTMIN + X). Signed-off-by: Laurent Vivier Reviewed-by: Taylor Simson Tested-by: Taylor Simpson Reviewed-by: Peter Maydell Message-Id: <20200212125658.644558-5-laur...@vivier.eu> --- linux-user/signal.c | 50 - linux-user/trace-events | 3 +++ 2 files changed, 48 insertions(+), 5 deletions(-) diff --git a/linux-user/signal.c b/linux-user/signal.c index c1e664f97a7c..046159dd0c5b 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -498,18 +498,30 @@ static int core_dump_signal(int sig) static void signal_table_init(void) { -int host_sig, target_sig; +int host_sig, target_sig, count; /* - * Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with - * host libpthread signals. This assumes no one actually uses SIGRTMAX :-/ + * Signals are supported starting from TARGET_SIGRTMIN and going up + * until we run out of host realtime signals. + * glibc at least uses only the lower 2 rt signals and probably + * nobody's using the upper ones. + * it's why SIGRTMIN (34) is generally greater than __SIGRTMIN (32) * To fix this properly we need to do manual signal delivery multiplexed * over a single host signal. + * Attempts for configure "missing" signals via sigaction will be + * silently ignored. */ -host_to_target_signal_table[__SIGRTMIN] = __SIGRTMAX; -host_to_target_signal_table[__SIGRTMAX] = __SIGRTMIN; +for (host_sig = SIGRTMIN; host_sig <= SIGRTMAX; host_sig++) { +target_sig = host_sig - SIGRTMIN + TARGET_SIGRTMIN; +if (target_sig <= TARGET_NSIG) { +host_to_target_signal_table[host_sig] = target_sig; +} +} /* generate signal conversion tables */ +for (target_sig = 1; target_sig <= TARGET_NSIG; target_sig++) { +target_to_host_signal_table[target_sig] = _NSIG; /* poison */ +} for (host_sig = 1; host_sig < _NSIG; host_sig++) { if (host_to_target_signal_table[host_sig] == 0) { host_to_target_signal_table[host_sig] = host_sig; @@ -519,6 +531,15 @@ static void signal_table_init(void) target_to_host_signal_table[target_sig] = host_sig; } } + +if (trace_event_get_state_backends(TRACE_SIGNAL_TABLE_INIT)) { +for (target_sig = 1, count = 0; target_sig <= TARGET_NSIG; target_sig++) { +if (target_to_host_signal_table[target_sig] == _NSIG) { +count++; +} +} +trace_signal_table_init(count); +} } void signal_init(void) @@ -817,6 +838,8 @@ int do_sigaction(int sig, const struct target_sigaction *act, int host_sig; int ret = 0; +trace_signal_do_sigaction_guest(sig, TARGET_NSIG); + if (sig < 1 || sig > TARGET_NSIG || sig == TARGET_SIGKILL || sig == TARGET_SIGSTOP) { return -TARGET_EINVAL; } @@ -847,6 +870,23 @@ int do_sigaction(int sig, const struct target_sigaction *act, /* we update the host linux signal state */ host_sig = target_to_host_signal(sig); +trace_signal_do_sigaction_host(host_sig, TARGET_NSIG); +if (host_sig > SIGRTMAX) { +/* we don't have enough host signals to map all target signals */ +qemu_log_mask(LOG_UNIMP, "Unsupported target signal #%d, ignored\n", + sig); +/* + * we don't return an error here because some programs try to + * register an handler for all possible rt signals even if they + * don't need it. + * An error here can abort them whereas there can be no problem + * to not have the signal available later. 
+ * This is the case for golang, + * See https://github.com/golang/go/issues/33746 + * So we silently ignore the error. + */ +return 0; +} if (host_sig != SIGSEGV && host_sig != SIGBUS) { sigfillset(&act1.sa_mask); act1.sa_flags = SA_SIGINFO; diff --git a/linux-user/trace-events b/linux-user/trace-events index f6de1b8befc0..0296133daeb6 100644 --- a/linux-user/trace-events +++ b/linux-user/trace-events @@ -1,6 +1,9 @@ # See docs/devel/tracing.txt for syntax documentation. # signal.c +signal_table_init(int i) "number of unavailable signals: %d" +signal_do_sigaction_guest(int sig, int max) "target signal %d (MAX %d)" +signal_do_sigaction_host(int sig, int max) "host signal %d (MAX %d)" # */signal.c user_setup_frame(void *env, uint64_t frame_addr) "env=%p frame_addr=0x%"PRIx64 user_setup_rt_frame(void *env, uint64_t frame_addr) "env=%p frame_ad
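As a concrete illustration of the new mapping (the host values below are assumptions about a typical glibc host, not taken from the patch): with SIGRTMIN == 34, SIGRTMAX == 64, TARGET_SIGRTMIN == 32 and TARGET_NSIG == 64, the loop maps host 34 to target 32, host 35 to target 33, and so on up to host 64 mapping to target 62. Target signals 63 and 64 are left without a host signal, so sigaction() on them is silently ignored with the LOG_UNIMP message and trace_signal_table_init reports 2 unavailable signals.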
[PULL 2/5] linux-user: cleanup signal.c
No functional changes. Prepare the field for future fixes. Remove memset(.., 0, ...) that is useless on a static array Signed-off-by: Laurent Vivier Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Maydell Tested-by: Taylor Simpson Message-Id: <20200212125658.644558-3-laur...@vivier.eu> --- linux-user/signal.c | 48 ++--- 1 file changed, 28 insertions(+), 20 deletions(-) diff --git a/linux-user/signal.c b/linux-user/signal.c index 5ca6d62b15d3..246315571c09 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -66,12 +66,6 @@ static uint8_t host_to_target_signal_table[_NSIG] = { [SIGPWR] = TARGET_SIGPWR, [SIGSYS] = TARGET_SIGSYS, /* next signals stay the same */ -/* Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with - host libpthread signals. This assumes no one actually uses SIGRTMAX :-/ - To fix this properly we need to do manual signal delivery multiplexed - over a single host signal. */ -[__SIGRTMIN] = __SIGRTMAX, -[__SIGRTMAX] = __SIGRTMIN, }; static uint8_t target_to_host_signal_table[_NSIG]; @@ -480,31 +474,45 @@ static int core_dump_signal(int sig) } } +static void signal_table_init(void) +{ +int host_sig, target_sig; + +/* + * Nasty hack: Reverse SIGRTMIN and SIGRTMAX to avoid overlap with + * host libpthread signals. This assumes no one actually uses SIGRTMAX :-/ + * To fix this properly we need to do manual signal delivery multiplexed + * over a single host signal. + */ +host_to_target_signal_table[__SIGRTMIN] = __SIGRTMAX; +host_to_target_signal_table[__SIGRTMAX] = __SIGRTMIN; + +/* generate signal conversion tables */ +for (host_sig = 1; host_sig < _NSIG; host_sig++) { +if (host_to_target_signal_table[host_sig] == 0) { +host_to_target_signal_table[host_sig] = host_sig; +} +} +for (host_sig = 1; host_sig < _NSIG; host_sig++) { +target_sig = host_to_target_signal_table[host_sig]; +target_to_host_signal_table[target_sig] = host_sig; +} +} + void signal_init(void) { TaskState *ts = (TaskState *)thread_cpu->opaque; struct sigaction act; struct sigaction oact; -int i, j; +int i; int host_sig; -/* generate signal conversion tables */ -for(i = 1; i < _NSIG; i++) { -if (host_to_target_signal_table[i] == 0) -host_to_target_signal_table[i] = i; -} -for(i = 1; i < _NSIG; i++) { -j = host_to_target_signal_table[i]; -target_to_host_signal_table[j] = i; -} +/* initialize signal conversion tables */ +signal_table_init(); /* Set the signal mask from the host mask. */ sigprocmask(0, 0, &ts->signal_mask); -/* set all host signal handlers. ALL signals are blocked during - the handlers to serialize them. */ -memset(sigact_table, 0, sizeof(sigact_table)); - sigfillset(&act.sa_mask); act.sa_flags = SA_SIGINFO; act.sa_sigaction = host_signal_handler; -- 2.24.1
Re: [PATCH v2 5/9] linux-user: mips: Update syscall numbers to kernel 5.5 level
> > + MIPS_SYS(sys_clock_gettime64, 2) > > + MIPS_SYS(sys_clock_settime64, 4) > > + MIPS_SYS(sys_clock_adjtime64, 2) /* 405 */ > > + MIPS_SYS(sys_clock_getres_time64, 4) > > According to > https://github.com/strace/strace/blob/master/linux/syscallent-common-32.h: > > [BASE_NR + 406] = { 2, 0, SEN(clock_getres_time64), > "clock_getres_time64" }, > 404 also has the same problem, I am going to fix both. Thanks, Aleksandar > > once fixed you can add my > > Reviewed-by: Laurent Vivier
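Going by the strace table quoted above, the corrected v3 entries for those two would presumably read:

MIPS_SYS(sys_clock_settime64, 2)   /* 404 */
MIPS_SYS(sys_clock_getres_time64, 2)

(clock_settime64 takes a clockid and a timespec, and clock_getres_time64 a clockid and an output timespec, hence 2 arguments each.)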
[PATCH v3 4/9] linux-user: microblaze: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update microblaze syscall numbers based on Linux kernel v5.5. CC: Edgar E. Iglesias Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/microblaze/syscall_nr.h | 45 ++ 1 file changed, 45 insertions(+) diff --git a/linux-user/microblaze/syscall_nr.h b/linux-user/microblaze/syscall_nr.h index aa2eb93..ec1758e 100644 --- a/linux-user/microblaze/syscall_nr.h +++ b/linux-user/microblaze/syscall_nr.h @@ -393,5 +393,50 @@ #define TARGET_NR_memfd_create 386 #define TARGET_NR_bpf 387 #define TARGET_NR_execveat 388 +#define TARGET_NR_userfaultfd 389 +#define TARGET_NR_membarrier390 +#define TARGET_NR_mlock2391 +#define TARGET_NR_copy_file_range 392 +#define TARGET_NR_preadv2 393 +#define TARGET_NR_pwritev2 394 +#define TARGET_NR_pkey_mprotect 395 +#define TARGET_NR_pkey_alloc396 +#define TARGET_NR_pkey_free 397 +#define TARGET_NR_statx 398 +#define TARGET_NR_io_pgetevents 399 +#define TARGET_NR_rseq 400 +/* 401 and 402 are unused */ +#define TARGET_NR_clock_gettime64 403 +#define TARGET_NR_clock_settime64 404 +#define TARGET_NR_clock_adjtime64 405 +#define TARGET_NR_clock_getres_time64 406 +#define TARGET_NR_clock_nanosleep_time64 407 +#define TARGET_NR_timer_gettime64 408 +#define TARGET_NR_timer_settime64 409 +#define TARGET_NR_timerfd_gettime64 410 +#define TARGET_NR_timerfd_settime64 411 +#define TARGET_NR_utimensat_time64 412 +#define TARGET_NR_pselect6_time64 413 +#define TARGET_NR_ppoll_time64 414 +#define TARGET_NR_io_pgetevents_time64 416 +#define TARGET_NR_recvmmsg_time64 417 +#define TARGET_NR_mq_timedsend_time64 418 +#define TARGET_NR_mq_timedreceive_time64 419 +#define TARGET_NR_semtimedop_time64 420 +#define TARGET_NR_rt_sigtimedwait_time64 421 +#define TARGET_NR_futex_time64 422 +#define TARGET_NR_sched_rr_get_interval_time64 423 +#define TARGET_NR_pidfd_send_signal 424 +#define TARGET_NR_io_uring_setup425 +#define TARGET_NR_io_uring_enter426 +#define TARGET_NR_io_uring_register 427 +#define TARGET_NR_open_tree 428 +#define TARGET_NR_move_mount429 +#define TARGET_NR_fsopen430 +#define TARGET_NR_fsconfig 431 +#define TARGET_NR_fsmount 432 +#define TARGET_NR_fspick433 +#define TARGET_NR_pidfd_open434 +#define TARGET_NR_clone3435 #endif -- 2.7.4
[PATCH v3 0/9] linux-user: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic v2->v3: - corrected number of arguments for two mips syscalls v1->v2: - corrected mips parts based on Laurent's review This series is a spin-off of another larger linux-user series that become too large to handle, hence these patches related to syscall numbers are now in this, separate, series. This series covers updating syscall numbers defined in the following files: - linux-user/alpha/syscall_nr.h - linux-user/arm/syscall_nr.h - linux-user/m68k/syscall_nr.h - linux-user/microblaze/syscall_nr.h - linux-user/mips/cpu_loop.c - linux-user/mips/syscall_nr.h - linux-user/mips64/syscall_nr.h - linux-user/sh4/syscall_nr.h - linux-user/x86_64/syscall_nr.h - linux-user/xtensa/syscall_nr.h -- Aleksandar Markovic (9): linux-user: alpha: Update syscall numbers to kernel 5.5 level linux-user: arm: Update syscall numbers to kernel 5.5 level linux-user: m68k: Update syscall numbers to kernel 5.5 level linux-user: microblaze: Update syscall numbers to kernel 5.5 level linux-user: mips: Update syscall numbers to kernel 5.5 level linux-user: sh4: Update syscall numbers to kernel 5.5 level linux-user: x86_64: Update syscall numbers to kernel 5.5 level linux-user: xtensa: Update syscall numbers to kernel 5.5 level linux-user: xtensa: Remove unused constant TARGET_NR_syscall_count linux-user/alpha/syscall_nr.h | 35 linux-user/arm/syscall_nr.h| 44 linux-user/m68k/syscall_nr.h | 50 ++- linux-user/microblaze/syscall_nr.h | 45 + linux-user/mips/syscall_nr.h | 45 + linux-user/mips64/syscall_nr.h | 50 ++- linux-user/sh4/syscall_nr.h| 48 ++ linux-user/x86_64/syscall_nr.h | 24 +++ linux-user/xtensa/syscall_nr.h | 36 - linux-user/mips/cpu_loop.c | 83 +- 10 files changed, 454 insertions(+), 6 deletions(-) -- 2.7.4
[PATCH v3 1/9] linux-user: alpha: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update alpha syscall numbers based on Linux kernel v5.5. CC: Richard Henderson Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/alpha/syscall_nr.h | 35 +++ 1 file changed, 35 insertions(+) diff --git a/linux-user/alpha/syscall_nr.h b/linux-user/alpha/syscall_nr.h index 2e5541b..c29fc17 100644 --- a/linux-user/alpha/syscall_nr.h +++ b/linux-user/alpha/syscall_nr.h @@ -453,5 +453,40 @@ #define TARGET_NR_getrandom 511 #define TARGET_NR_memfd_create 512 #define TARGET_NR_execveat 513 +#define TARGET_NR_seccomp 514 +#define TARGET_NR_bpf 515 +#define TARGET_NR_userfaultfd 516 +#define TARGET_NR_membarrier517 +#define TARGET_NR_mlock2518 +#define TARGET_NR_copy_file_range 519 +#define TARGET_NR_preadv2 520 +#define TARGET_NR_pwritev2 521 +#define TARGET_NR_statx 522 +#define TARGET_NR_io_pgetevents 523 +#define TARGET_NR_pkey_mprotect 524 +#define TARGET_NR_pkey_alloc525 +#define TARGET_NR_pkey_free 526 +#define TARGET_NR_rseq 527 +#define TARGET_NR_statfs64 528 +#define TARGET_NR_fstatfs64 529 +#define TARGET_NR_getegid 530 +#define TARGET_NR_geteuid 531 +#define TARGET_NR_getppid 532 +/* + * all other architectures have common numbers for new syscall, alpha + * is the exception. + */ +#define TARGET_NR_pidfd_send_signal 534 +#define TARGET_NR_io_uring_setup535 +#define TARGET_NR_io_uring_enter536 +#define TARGET_NR_io_uring_register 537 +#define TARGET_NR_open_tree 538 +#define TARGET_NR_move_mount539 +#define TARGET_NR_fsopen540 +#define TARGET_NR_fsconfig 541 +#define TARGET_NR_fsmount 542 +#define TARGET_NR_fspick543 +#define TARGET_NR_pidfd_open544 +/* 545 reserved for clone3 */ #endif -- 2.7.4
[PATCH v3 3/9] linux-user: m68k: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update m68k syscall numbers based on Linux kernel v5.5. CC: Laurent Vivier Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/m68k/syscall_nr.h | 50 +++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/linux-user/m68k/syscall_nr.h b/linux-user/m68k/syscall_nr.h index d33d8e9..01aee34 100644 --- a/linux-user/m68k/syscall_nr.h +++ b/linux-user/m68k/syscall_nr.h @@ -382,5 +382,53 @@ #define TARGET_NR_copy_file_range 376 #define TARGET_NR_preadv2 377 #define TARGET_NR_pwritev2 378 - +#define TARGET_NR_statx 379 +#define TARGET_NR_seccomp 380 +#define TARGET_NR_pkey_mprotect 381 +#define TARGET_NR_pkey_alloc382 +#define TARGET_NR_pkey_free 383 +#define TARGET_NR_rseq 384 +/* room for arch specific calls */ +#define TARGET_NR_semget393 +#define TARGET_NR_semctl394 +#define TARGET_NR_shmget395 +#define TARGET_NR_shmctl396 +#define TARGET_NR_shmat 397 +#define TARGET_NR_shmdt 398 +#define TARGET_NR_msgget399 +#define TARGET_NR_msgsnd400 +#define TARGET_NR_msgrcv401 +#define TARGET_NR_msgctl402 +#define TARGET_NR_clock_gettime64 403 +#define TARGET_NR_clock_settime64 404 +#define TARGET_NR_clock_adjtime64 405 +#define TARGET_NR_clock_getres_time64 406 +#define TARGET_NR_clock_nanosleep_time64 407 +#define TARGET_NR_timer_gettime64 408 +#define TARGET_NR_timer_settime64 409 +#define TARGET_NR_timerfd_gettime64 410 +#define TARGET_NR_timerfd_settime64 411 +#define TARGET_NR_utimensat_time64 412 +#define TARGET_NR_pselect6_time64 413 +#define TARGET_NR_ppoll_time64 414 +#define TARGET_NR_io_pgetevents_time64 416 +#define TARGET_NR_recvmmsg_time64 417 +#define TARGET_NR_mq_timedsend_time64 418 +#define TARGET_NR_mq_timedreceive_time64 419 +#define TARGET_NR_semtimedop_time64 420 +#define TARGET_NR_rt_sigtimedwait_time64 421 +#define TARGET_NR_futex_time64 422 +#define TARGET_NR_sched_rr_get_interval_time64 423 +#define TARGET_NR_pidfd_send_signal 424 +#define TARGET_NR_io_uring_setup425 +#define TARGET_NR_io_uring_enter426 +#define TARGET_NR_io_uring_register 427 +#define TARGET_NR_open_tree 428 +#define TARGET_NR_move_mount429 +#define TARGET_NR_fsopen430 +#define TARGET_NR_fsconfig 431 +#define TARGET_NR_fsmount 432 +#define TARGET_NR_fspick433 +#define TARGET_NR_pidfd_open434 +/* 435 reserved for clone3 */ #endif -- 2.7.4
[PATCH v3 6/9] linux-user: sh4: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update sh4 syscall numbers based on Linux kernel v5.5. CC: Aurelien Jarno Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/sh4/syscall_nr.h | 48 + 1 file changed, 48 insertions(+) diff --git a/linux-user/sh4/syscall_nr.h b/linux-user/sh4/syscall_nr.h index d53a2a0..8c21fcf 100644 --- a/linux-user/sh4/syscall_nr.h +++ b/linux-user/sh4/syscall_nr.h @@ -389,5 +389,53 @@ #define TARGET_NR_copy_file_range 380 #define TARGET_NR_preadv2 381 #define TARGET_NR_pwritev2 382 +#define TARGET_NR_statx 383 +#define TARGET_NR_pkey_mprotect 384 +#define TARGET_NR_pkey_alloc385 +#define TARGET_NR_pkey_free 386 +#define TARGET_NR_rseq 387 +/* room for arch specific syscalls */ +#define TARGET_NR_semget 393 +#define TARGET_NR_semctl 394 +#define TARGET_NR_shmget 395 +#define TARGET_NR_shmctl 396 +#define TARGET_NR_shmat 397 +#define TARGET_NR_shmdt 398 +#define TARGET_NR_msgget 399 +#define TARGET_NR_msgsnd 400 +#define TARGET_NR_msgrcv 401 +#define TARGET_NR_msgctl 402 +#define TARGET_NR_clock_gettime64403 +#define TARGET_NR_clock_settime64404 +#define TARGET_NR_clock_adjtime64405 +#define TARGET_NR_clock_getres_time64406 +#define TARGET_NR_clock_nanosleep_time64 407 +#define TARGET_NR_timer_gettime64408 +#define TARGET_NR_timer_settime64409 +#define TARGET_NR_timerfd_gettime64 410 +#define TARGET_NR_timerfd_settime64 411 +#define TARGET_NR_utimensat_time64 412 +#define TARGET_NR_pselect6_time64413 +#define TARGET_NR_ppoll_time64 414 +#define TARGET_NR_io_pgetevents_time64 416 +#define TARGET_NR_recvmmsg_time64417 +#define TARGET_NR_mq_timedsend_time64418 +#define TARGET_NR_mq_timedreceive_time64 419 +#define TARGET_NR_semtimedop_time64 420 +#define TARGET_NR_rt_sigtimedwait_time64 421 +#define TARGET_NR_futex_time64 422 +#define TARGET_NR_sched_rr_get_interval_time64 423 +#define TARGET_NR_pidfd_send_signal 424 +#define TARGET_NR_io_uring_setup 425 +#define TARGET_NR_io_uring_enter 426 +#define TARGET_NR_io_uring_register 427 +#define TARGET_NR_open_tree 428 +#define TARGET_NR_move_mount 429 +#define TARGET_NR_fsopen 430 +#define TARGET_NR_fsconfig 431 +#define TARGET_NR_fsmount432 +#define TARGET_NR_fspick 433 +#define TARGET_NR_pidfd_open 434 +/* 435 reserved for clone3 */ #endif -- 2.7.4
[PATCH v3 5/9] linux-user: mips: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update mips syscall numbers based on Linux kernel tag v5.5. CC: Aurelien Jarno CC: Aleksandar Rikalo Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/mips/syscall_nr.h | 45 +++ linux-user/mips64/syscall_nr.h | 50 - linux-user/mips/cpu_loop.c | 83 +- 3 files changed, 175 insertions(+), 3 deletions(-) diff --git a/linux-user/mips/syscall_nr.h b/linux-user/mips/syscall_nr.h index 7fa7fa5..0be3af1 100644 --- a/linux-user/mips/syscall_nr.h +++ b/linux-user/mips/syscall_nr.h @@ -376,5 +376,50 @@ #define TARGET_NR_statx (TARGET_NR_Linux + 366) #define TARGET_NR_rseq (TARGET_NR_Linux + 367) #define TARGET_NR_io_pgetevents (TARGET_NR_Linux + 368) +/* room for arch specific calls */ +#define TARGET_NR_semget(TARGET_NR_Linux + 393) +#define TARGET_NR_semctl(TARGET_NR_Linux + 394) +#define TARGET_NR_shmget(TARGET_NR_Linux + 395) +#define TARGET_NR_shmctl(TARGET_NR_Linux + 396) +#define TARGET_NR_shmat (TARGET_NR_Linux + 397) +#define TARGET_NR_shmdt (TARGET_NR_Linux + 398) +#define TARGET_NR_msgget(TARGET_NR_Linux + 399) +#define TARGET_NR_msgsnd(TARGET_NR_Linux + 400) +#define TARGET_NR_msgrcv(TARGET_NR_Linux + 401) +#define TARGET_NR_msgctl(TARGET_NR_Linux + 402) +/* 403-423 common for 32-bit archs */ +#define TARGET_NR_clock_gettime64 (TARGET_NR_Linux + 403) +#define TARGET_NR_clock_settime64 (TARGET_NR_Linux + 404) +#define TARGET_NR_clock_adjtime64 (TARGET_NR_Linux + 405) +#define TARGET_NR_clock_getres_time64 (TARGET_NR_Linux + 406) +#define TARGET_NR_clock_nanosleep_time64 (TARGET_NR_Linux + 407) +#define TARGET_NR_timer_gettime64 (TARGET_NR_Linux + 408) +#define TARGET_NR_timer_settime64 (TARGET_NR_Linux + 409) +#define TARGET_NR_timerfd_gettime64(TARGET_NR_Linux + 410) +#define TARGET_NR_timerfd_settime64(TARGET_NR_Linux + 411) +#define TARGET_NR_utimensat_time64 (TARGET_NR_Linux + 412) +#define TARGET_NR_pselect6_time64 (TARGET_NR_Linux + 413) +#define TARGET_NR_ppoll_time64 (TARGET_NR_Linux + 414) +#define TARGET_NR_io_pgetevents_time64 (TARGET_NR_Linux + 416) +#define TARGET_NR_recvmmsg_time64 (TARGET_NR_Linux + 417) +#define TARGET_NR_mq_timedsend_time64 (TARGET_NR_Linux + 418) +#define TARGET_NR_mq_timedreceive_time64 (TARGET_NR_Linux + 419) +#define TARGET_NR_semtimedop_time64(TARGET_NR_Linux + 420) +#define TARGET_NR_rt_sigtimedwait_time64 (TARGET_NR_Linux + 421) +#define TARGET_NR_futex_time64 (TARGET_NR_Linux + 422) +#define TARGET_NR_sched_rr_get_interval_time64 (TARGET_NR_Linux + 423) +/* 424 onwards common for all archs */ +#define TARGET_NR_pidfd_send_signal(TARGET_NR_Linux + 424) +#define TARGET_NR_io_uring_setup (TARGET_NR_Linux + 425) +#define TARGET_NR_io_uring_enter (TARGET_NR_Linux + 426) +#define TARGET_NR_io_uring_register(TARGET_NR_Linux + 427) +#define TARGET_NR_open_tree(TARGET_NR_Linux + 428) +#define TARGET_NR_move_mount (TARGET_NR_Linux + 429) +#define TARGET_NR_fsopen (TARGET_NR_Linux + 430) +#define TARGET_NR_fsconfig (TARGET_NR_Linux + 431) +#define TARGET_NR_fsmount (TARGET_NR_Linux + 432) +#define TARGET_NR_fspick (TARGET_NR_Linux + 433) +#define TARGET_NR_pidfd_open (TARGET_NR_Linux + 434) +#define TARGET_NR_clone3 (TARGET_NR_Linux + 435) #endif diff --git a/linux-user/mips64/syscall_nr.h b/linux-user/mips64/syscall_nr.h index db40f69..6e23e9f 100644 --- a/linux-user/mips64/syscall_nr.h +++ b/linux-user/mips64/syscall_nr.h @@ -339,6 +339,39 @@ #define TARGET_NR_statx (TARGET_NR_Linux + 330) #define TARGET_NR_rseq (TARGET_NR_Linux + 331) #define TARGET_NR_io_pgetevents (TARGET_NR_Linux + 332) +/* 333 
through 402 are unassigned to sync up with generic numbers */ +#define TARGET_NR_clock_gettime64 (TARGET_NR_Linux + 403) +#define TARGET_NR_clock_settime64 (TARGET_NR_Linux + 404) +#define TARGET_NR_clock_adjtime64 (TARGET_NR_Linux + 405) +#define TARGET_NR_clock_getres_time64 (TARGET_NR_Linux + 406) +#define TARGET_NR_clock_nanosleep_time64 (TARGET_NR_Linux + 407) +#define TARGET_NR_timer_gettime64 (TARGET_NR_Linux + 408) +#define TARGET_NR_timer_settime64 (TARGET_NR_Linux + 409) +#define TARGET_NR_timerfd_gettime64 (TARGET_NR_Linux + 410) +#define TARGET_NR_timerfd_settime64 (TARGET_NR_Linux + 411) +#define TARGET_NR_u
[PATCH v3 2/9] linux-user: arm: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update arm syscall numbers based on Linux kernel v5.5. CC: Peter Maydell Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/arm/syscall_nr.h | 44 1 file changed, 44 insertions(+) diff --git a/linux-user/arm/syscall_nr.h b/linux-user/arm/syscall_nr.h index e7eda0d..6db9235 100644 --- a/linux-user/arm/syscall_nr.h +++ b/linux-user/arm/syscall_nr.h @@ -399,5 +399,49 @@ #define TARGET_NR_userfaultfd (388) #define TARGET_NR_membarrier (389) #define TARGET_NR_mlock2 (390) +#define TARGET_NR_copy_file_range (391) +#define TARGET_NR_preadv2 (392) +#define TARGET_NR_pwritev2 (393) +#define TARGET_NR_pkey_mprotect(394) +#define TARGET_NR_pkey_alloc (395) +#define TARGET_NR_pkey_free(396) +#define TARGET_NR_statx(397) +#define TARGET_NR_rseq (398) +#define TARGET_NR_io_pgetevents(399) +#define TARGET_NR_migrate_pages(400) +#define TARGET_NR_kexec_file_load (401) +/* 402 is unused */ +#define TARGET_NR_clock_gettime64 (403) +#define TARGET_NR_clock_settime64 (404) +#define TARGET_NR_clock_adjtime64 (405) +#define TARGET_NR_clock_getres_time64 (406) +#define TARGET_NR_clock_nanosleep_time64 (407) +#define TARGET_NR_timer_gettime64 (408) +#define TARGET_NR_timer_settime64 (409) +#define TARGET_NR_timerfd_gettime64(410) +#define TARGET_NR_timerfd_settime64(411) +#define TARGET_NR_utimensat_time64 (412) +#define TARGET_NR_pselect6_time64 (413) +#define TARGET_NR_ppoll_time64 (414) +#define TARGET_NR_io_pgetevents_time64 (416) +#define TARGET_NR_recvmmsg_time64 (417) +#define TARGET_NR_mq_timedsend_time64 (418) +#define TARGET_NR_mq_timedreceive_time64 (419) +#define TARGET_NR_semtimedop_time64(420) +#define TARGET_NR_rt_sigtimedwait_time64 (421) +#define TARGET_NR_futex_time64 (422) +#define TARGET_NR_sched_rr_get_interval_time64 (423) +#define TARGET_NR_pidfd_send_signal(424) +#define TARGET_NR_io_uring_setup (425) +#define TARGET_NR_io_uring_enter (426) +#define TARGET_NR_io_uring_register(427) +#define TARGET_NR_open_tree(428) +#define TARGET_NR_move_mount (429) +#define TARGET_NR_fsopen (430) +#define TARGET_NR_fsconfig (431) +#define TARGET_NR_fsmount (432) +#define TARGET_NR_fspick (433) +#define TARGET_NR_pidfd_open (434) +#define TARGET_NR_clone3 (435) #endif -- 2.7.4
[PATCH v3 9/9] linux-user: xtensa: Remove unused constant TARGET_NR_syscall_count
From: Aleksandar Markovic Currently, there is no usage of TARGET_NR_syscall_count for target xtensa, and there is no obvious indication if there is some planned usage in future. CC: Max Filippov Acked-by: Max Filippov Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/xtensa/syscall_nr.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/linux-user/xtensa/syscall_nr.h b/linux-user/xtensa/syscall_nr.h index 3d19d0c..39bff65 100644 --- a/linux-user/xtensa/syscall_nr.h +++ b/linux-user/xtensa/syscall_nr.h @@ -466,6 +466,4 @@ #define TARGET_NR_pidfd_open 434 #define TARGET_NR_clone3 435 -#define TARGET_NR_syscall_count 436 - #endif /* XTENSA_SYSCALL_NR_H */ -- 2.7.4
[PATCH v3 8/9] linux-user: xtensa: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update xtensa syscall numbers based on Linux kernel v5.5. CC: Max Filippov Acked-by: Max Filippov Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/xtensa/syscall_nr.h | 38 -- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/linux-user/xtensa/syscall_nr.h b/linux-user/xtensa/syscall_nr.h index 27645be..3d19d0c 100644 --- a/linux-user/xtensa/syscall_nr.h +++ b/linux-user/xtensa/syscall_nr.h @@ -431,7 +431,41 @@ #define TARGET_NR_pkey_free 350 #define TARGET_NR_statx 351 - -#define TARGET_NR_syscall_count 352 +#define TARGET_NR_rseq 352 +/* 353 through 402 are unassigned to sync up with generic numbers */ +#define TARGET_NR_clock_gettime64403 +#define TARGET_NR_clock_settime64404 +#define TARGET_NR_clock_adjtime64405 +#define TARGET_NR_clock_getres_time64406 +#define TARGET_NR_clock_nanosleep_time64 407 +#define TARGET_NR_timer_gettime64408 +#define TARGET_NR_timer_settime64409 +#define TARGET_NR_timerfd_gettime64 410 +#define TARGET_NR_timerfd_settime64 411 +#define TARGET_NR_utimensat_time64 412 +#define TARGET_NR_pselect6_time64413 +#define TARGET_NR_ppoll_time64 414 +#define TARGET_NR_io_pgetevents_time64 416 +#define TARGET_NR_recvmmsg_time64417 +#define TARGET_NR_mq_timedsend_time64418 +#define TARGET_NR_mq_timedreceive_time64 419 +#define TARGET_NR_semtimedop_time64 420 +#define TARGET_NR_rt_sigtimedwait_time64 421 +#define TARGET_NR_futex_time64 422 +#define TARGET_NR_sched_rr_get_interval_time64 423 +#define TARGET_NR_pidfd_send_signal 424 +#define TARGET_NR_io_uring_setup 425 +#define TARGET_NR_io_uring_enter 426 +#define TARGET_NR_io_uring_register 427 +#define TARGET_NR_open_tree 428 +#define TARGET_NR_move_mount 429 +#define TARGET_NR_fsopen 430 +#define TARGET_NR_fsconfig 431 +#define TARGET_NR_fsmount432 +#define TARGET_NR_fspick 433 +#define TARGET_NR_pidfd_open 434 +#define TARGET_NR_clone3 435 + +#define TARGET_NR_syscall_count 436 #endif /* XTENSA_SYSCALL_NR_H */ -- 2.7.4
[PATCH v3 7/9] linux-user: x86_64: Update syscall numbers to kernel 5.5 level
From: Aleksandar Markovic Update x86_64 syscall numbers based on Linux kernel v5.5. CC: Paolo Bonzini CC: Richard Henderson CC: Eduardo Habkost Signed-off-by: Aleksandar Markovic Reviewed-by: Laurent Vivier --- linux-user/x86_64/syscall_nr.h | 24 1 file changed, 24 insertions(+) diff --git a/linux-user/x86_64/syscall_nr.h b/linux-user/x86_64/syscall_nr.h index 9b6981e..e5d14ec 100644 --- a/linux-user/x86_64/syscall_nr.h +++ b/linux-user/x86_64/syscall_nr.h @@ -328,5 +328,29 @@ #define TARGET_NR_membarrier324 #define TARGET_NR_mlock2325 #define TARGET_NR_copy_file_range 326 +#define TARGET_NR_preadv2 327 +#define TARGET_NR_pwritev2 328 +#define TARGET_NR_pkey_mprotect 329 +#define TARGET_NR_pkey_alloc330 +#define TARGET_NR_pkey_free 331 +#define TARGET_NR_statx 332 +#define TARGET_NR_io_pgetevents 333 +#define TARGET_NR_rseq 334 +/* + * don't use numbers 387 through 423, add new calls after the last + * 'common' entry + */ +#define TARGET_NR_pidfd_send_signal 424 +#define TARGET_NR_io_uring_setup425 +#define TARGET_NR_io_uring_enter426 +#define TARGET_NR_io_uring_register 427 +#define TARGET_NR_open_tree 428 +#define TARGET_NR_move_mount429 +#define TARGET_NR_fsopen430 +#define TARGET_NR_fsconfig 431 +#define TARGET_NR_fsmount 432 +#define TARGET_NR_fspick433 +#define TARGET_NR_pidfd_open434 +#define TARGET_NR_clone3435 #endif -- 2.7.4
Re: [PULL] RISC-V Patches for the 5.0 Soft Freeze, Part 2
Hi Palmer, On Thu, Feb 13, 2020 at 1:30 AM Palmer Dabbelt wrote: > > The following changes since commit 81a23caf47956778c5a5056ad656d1ef92bf9659: > > Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' > into staging (2020-02-10 17:08:51 +) > > are available in the Git repository at: > > g...@github.com:palmer-dabbelt/qemu.git tags/riscv-for-master-5.0-sf2 > > for you to fetch changes up to 9c8fdcece53e05590441785ab22d91a22da36e29: > > MAINTAINERS: Add maintainer entry for Goldfish RTC (2020-02-10 12:01:39 > -0800) > > > RISC-V Patches for the 5.0 Soft Freeze, Part 2 > > This is a fairly light-weight pull request, but I wanted to send it out to > avoid the Goldfish stuff getting buried as the next PR should contain the H > extension implementation. > > As far as this PR goes, it contains: > > * The addition of syscon device tree nodes for reboot and poweroff, which > allows Linux to control QEMU without an additional driver. The existing > device was already compatible with the syscon interface. > * A fix to our GDB stub to avoid confusing XLEN and FLEN, specifically useful > for rv32id-based systems. > * A device emulation for the Goldfish RTC device, a simple memory-mapped RTC. > * The addition of the Goldfish RTC device to the RISC-V virt board. > > This passes "make check" and boots buildroot for me. > This PR is still missing: http://patchwork.ozlabs.org/patch/1199516/ > > > Peter: I'm sending hw/rtc code because it was suggested that the Goldfish > implementation gets handled via the RISC-V tree as our virt board is the only > user. I'm happy to do things differently in the future (maybe send > goldfish-specific PRs?) if that's better for you. Just LMK what makes sense, > I > anticipate that this'll be a pretty low traffic device so I'm fine with pretty > much anything. > Regards, Bin
Re: [PATCH v2] virtio: increase virtqueue size for virtio-scsi and virtio-blk
On 13.02.2020 14:45, Stefan Hajnoczi wrote: On Thu, Feb 13, 2020 at 12:28:25PM +0300, Denis Plotnikov wrote: On 13.02.2020 12:08, Stefan Hajnoczi wrote: On Thu, Feb 13, 2020 at 11:08:35AM +0300, Denis Plotnikov wrote: On 12.02.2020 18:43, Stefan Hajnoczi wrote: On Tue, Feb 11, 2020 at 05:14:14PM +0300, Denis Plotnikov wrote: The goal is to reduce the amount of requests issued by a guest on 1M reads/writes. This rises the performance up to 4% on that kind of disk access pattern. The maximum chunk size to be used for the guest disk accessing is limited with seg_max parameter, which represents the max amount of pices in the scatter-geather list in one guest disk request. Since seg_max is virqueue_size dependent, increasing the virtqueue size increases seg_max, which, in turn, increases the maximum size of data to be read/write from a guest disk. More details in the original problem statment: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03721.html Suggested-by: Denis V. Lunev Signed-off-by: Denis Plotnikov --- hw/block/virtio-blk.c | 4 ++-- hw/core/machine.c | 2 ++ hw/scsi/virtio-scsi.c | 4 ++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 09f46ed85f..6df3a7a6df 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -914,7 +914,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) memset(&blkcfg, 0, sizeof(blkcfg)); virtio_stq_p(vdev, &blkcfg.capacity, capacity); virtio_stl_p(vdev, &blkcfg.seg_max, - s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2); + s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 256 - 2); This value must not change on older machine types. Yes, that's true, but .. So does this patch need to turn seg-max-adjust *on* in hw_compat_4_2 so that old machine types get 126 instead of 254? If we set seg-max-adjust "on" in older machine types, the setups using them and having queue_sizes set , for example, 1024 will also set seg_max to 1024 - 2 which isn't the expected behavior: older mt didn't change seg_max in that case and stuck with 128 - 2. So, should we, instead, leave the default 128 - 2, for seg_max? Argh! Good point :-). How about a seg_max_default property that is initialized to 254 for modern machines and 126 to old machines? Hmm, but we'll achieve the same but with more code changes, don't we? 254 is because the queue-size is 256. We gonna leave 128-2 for older machine types just for not breaking anything. All other seg_max adjustment is provided by seg_max_adjust which is "on" by default in modern machine types. to summarize: modern mt defaults: seg_max_adjust = on queue_size = 256 => default seg_max = 254 => changing queue-size will change seg_max = queue_size - 2 old mt defaults: seg_max_adjust = off queue_size = 128 => default seg_max = 126 => changing queue-size won't change seg_max, it's always = 126 like it was before You're right! The only strange case is a modern machine type with seg_max_adjust=off, where queue_size will be 256 but seg_max will be 126. But no user would want to disable seg_max_adjust, so it's okay. I agree with you that the line of code can remain unchanged: /* * Only old machine types use seg_max_adjust=off and there the default * value of queue_size is 128. */ virtio_stl_p(vdev, &blkcfg.seg_max, s->conf.seg_max_adjust ? s->conf.queue_size - 2 : 128 - 2); Stefan Ok, I'll resend the patch sortly Thanks! Denis
Re: [PATCH v2 fixed 13/16] numa: Teach ram block notifiers about resizable ram blocks
On Wed, 12 Feb 2020 at 14:44, David Hildenbrand wrote: > > We want to actually resize ram blocks (make everything between > used_length and max_length inaccessible) - however, not all ram block > notifiers will support that. Let's teach the notifier that ram blocks > are indeed resizable, but keep using max_size in the existing notifiers. > > Supply the max_size when adding and removing ram blocks. Also, notify on > resizes. Introduce a way to detect if any registered notifier does not > support resizes - ram_block_notifiers_support_resize() - which we can later > use to fallback to legacy handling if a registered notifier (esp., SEV and > HAX) does not support actual resizes. > > Cc: Richard Henderson > Cc: Paolo Bonzini > Cc: "Dr. David Alan Gilbert" > Cc: Eduardo Habkost > Cc: Marcel Apfelbaum > Cc: Stefano Stabellini > Cc: Anthony Perard > Cc: Paul Durrant > Cc: "Michael S. Tsirkin" > Cc: xen-de...@lists.xenproject.org > Cc: Igor Mammedov > Signed-off-by: David Hildenbrand Xen parts... Acked-by: Paul Durrant
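As a rough illustration of the notifier scheme the commit message describes, here is a small self-contained sketch; the struct layout, field names and the hooks_support_resize() helper are simplifications and assumptions, not QEMU's actual RAMBlockNotifier API:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified notifier interface: add/remove callbacks see
 * both the used size and the reserved max size, and resize support is an
 * optional hook. */
typedef struct RamBlockHooks {
    void (*added)(void *host, size_t size, size_t max_size);
    void (*removed)(void *host, size_t size, size_t max_size);
    void (*resized)(void *host, size_t old_size, size_t new_size); /* optional */
} RamBlockHooks;

/* Mirrors the idea of ram_block_notifiers_support_resize(): if any
 * registered notifier lacks a resize hook, the core must fall back to
 * legacy handling (expose max_size and never actually resize). */
static bool hooks_support_resize(const RamBlockHooks *hooks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!hooks[i].resized) {
            return false;
        }
    }
    return true;
}

static void log_resize(void *host, size_t old_size, size_t new_size)
{
    printf("ram block %p resized: %zu -> %zu bytes\n", host, old_size, new_size);
}

int main(void)
{
    RamBlockHooks notifiers[] = {
        { .resized = log_resize },  /* supports resizes */
        { .resized = NULL },        /* SEV/HAX-style notifier without resize support */
    };
    printf("resizes supported: %s\n",
           hooks_support_resize(notifiers, 2) ? "yes" : "no");
    return 0;
}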
Re: [PATCH v3 0/9] linux-user: Update syscall numbers to kernel 5.5 level
Le 13/02/2020 à 13:29, Aleksandar Markovic a écrit : > From: Aleksandar Markovic > > v2->v3: > > - corrected number of arguments for two mips syscalls > > v1->v2: > > - corrected mips parts based on Laurent's review > > This series is a spin-off of another larger linux-user series > that become too large to handle, hence these patches related to > syscall numbers are now in this, separate, series. > > This series covers updating syscall numbers defined in the following > files: > > - linux-user/alpha/syscall_nr.h > - linux-user/arm/syscall_nr.h > - linux-user/m68k/syscall_nr.h > - linux-user/microblaze/syscall_nr.h > - linux-user/mips/cpu_loop.c > - linux-user/mips/syscall_nr.h > - linux-user/mips64/syscall_nr.h > - linux-user/sh4/syscall_nr.h > - linux-user/x86_64/syscall_nr.h > - linux-user/xtensa/syscall_nr.h > > -- > > Aleksandar Markovic (9): > linux-user: alpha: Update syscall numbers to kernel 5.5 level > linux-user: arm: Update syscall numbers to kernel 5.5 level > linux-user: m68k: Update syscall numbers to kernel 5.5 level > linux-user: microblaze: Update syscall numbers to kernel 5.5 level > linux-user: mips: Update syscall numbers to kernel 5.5 level > linux-user: sh4: Update syscall numbers to kernel 5.5 level > linux-user: x86_64: Update syscall numbers to kernel 5.5 level > linux-user: xtensa: Update syscall numbers to kernel 5.5 level > linux-user: xtensa: Remove unused constant TARGET_NR_syscall_count > > linux-user/alpha/syscall_nr.h | 35 > linux-user/arm/syscall_nr.h| 44 > linux-user/m68k/syscall_nr.h | 50 ++- > linux-user/microblaze/syscall_nr.h | 45 + > linux-user/mips/syscall_nr.h | 45 + > linux-user/mips64/syscall_nr.h | 50 ++- > linux-user/sh4/syscall_nr.h| 48 ++ > linux-user/x86_64/syscall_nr.h | 24 +++ > linux-user/xtensa/syscall_nr.h | 36 - > linux-user/mips/cpu_loop.c | 83 > +- > 10 files changed, 454 insertions(+), 6 deletions(-) > I've applied the series to my linux-user for 5.0 branch. Thanks, Laurent
Re: [PATCH] block: make BlockConf.*_size properties 32-bit
On 2/13/20 2:01 AM, Roman Kagan wrote:
On Wed, Feb 12, 2020 at 03:44:19PM -0600, Eric Blake wrote:
On 2/11/20 5:54 AM, Roman Kagan wrote:

Devices (virtio-blk, scsi, etc.) and the block layer are happy to use 32-bit for logical_block_size, physical_block_size, and min_io_size. However, the properties in BlockConf are defined as uint16_t, limiting the values to 32768. This appears unnecessarily tight, and we've seen bigger block sizes handy at times.

What larger sizes? I could see 64k or maybe even 1M block sizes,...

We played exactly with these two :)

Make them 32 bit instead and lift the limitation.

Signed-off-by: Roman Kagan
---
hw/core/qdev-properties.c | 21 -
include/hw/block/block.h | 8
include/hw/qdev-properties.h | 2 +-
3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 7f93bfeb88..5f84e4a3b8 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -716,30 +716,32 @@ const PropertyInfo qdev_prop_pci_devfn = {
/* --- blocksize --- */
+#define MIN_BLOCK_SIZE 512
+#define MAX_BLOCK_SIZE 2147483648

...but 2G block sizes are going to have tremendous performance problems.

I'm not necessarily opposed to the widening to a 32-bit type, but think you need more justification or a smaller number for the max block size,

I thought any smaller value would just be arbitrary and hard to reason about, so I went ahead with the max value that fit in the type and could be made visible to the guest.

You've got bigger problems than what is visible to the guest. block/qcow2.c operates on a cluster at a time; if you are stating that it now requires reading multiple clusters to operate on one, qcow2 will have to do lots of wasteful read-modify-write cycles. You really need a strong reason to support a maximum larger than 2M other than just "so the guest can experiment with it".

Besides this is a property that is set explicitly, so I don't see a problem leaving this up to the user.

particularly since qcow2 refuses to use cluster sizes larger than 2M and it makes no sense to allow a block size larger than a cluster size.

This still doesn't contradict passing a bigger value to the guest, for experimenting if nothing else.

Thanks, Roman.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
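For reference, here is a stand-alone sketch of the kind of validation a block-size property setter has to do (range check plus power-of-two check); the 2 MiB maximum used here reflects the qcow2 cluster-size argument in the thread and is an assumption, not the patch's value:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MIN_BLOCK_SIZE 512u
/* Assumed maximum for this sketch only; the thread debates whether 2 GiB
 * or something closer to qcow2's 2 MiB cluster limit is appropriate. */
#define MAX_BLOCK_SIZE (2u * 1024 * 1024)

static bool blocksize_valid(uint64_t value, const char **why)
{
    if (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE) {
        *why = "out of range";
        return false;
    }
    /* Block sizes must be powers of two. */
    if (value & (value - 1)) {
        *why = "not a power of two";
        return false;
    }
    *why = NULL;
    return true;
}

int main(void)
{
    const uint64_t tests[] = { 512, 4096, 65536, 3000, 4u * 1024 * 1024 };
    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
        const char *why;
        bool ok = blocksize_valid(tests[i], &why);
        printf("%8llu: %s\n", (unsigned long long)tests[i], ok ? "ok" : why);
    }
    return 0;
}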
[Bug 1863096] [NEW] vhost-user multi-queues interrupt failed when Qemu reconnection happens
Public bug reported:

After upgrading qemu to v4.2.0, vhost-user multi-queue interrupts fail in event idx interrupt mode when a reconnection happens.

Test Environment:
DPDK version: DPDK v19.11
Other software versions: qemu 4.2.0.
OS: Linux 4.15.0-20-generic
Compiler: gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Hardware platform: Purley.

The reproduction steps are:

1. Launch the l3fwd-power example app in client mode::
./l3fwd-power -l 1-16 \
-n 4 --socket-mem 1024,1024 --legacy-mem --no-pci \
--log-level=9 \
--vdev 'eth_vhost0,iface=/vhost-net0,queues=16,client=1' \
-- -p 0x1 \
--parse-ptype 1 \
--config "(0,0,1),(0,1,2),(0,2,3),(0,3,4),(0,4,5),(0,5,6),(0,6,7),(0,7,8),(0,8,9),(0,9,10),(0,10,11),(0,11,12),(0,12,13),(0,13,14),(0,14,15),(0,15,16)"

2. Launch VM1 in server mode:

3. Relaunch the l3fwd-power sample for reconnection:
./l3fwd-power -l 1-16 \
-n 4 --socket-mem 1024,1024 --legacy-mem --no-pci \
--log-level=9 \
--vdev 'eth_vhost0,iface=/vhost-net0,queues=16,client=1' \
-- -p 0x1 \
--parse-ptype 1 \
--config "(0,0,1),(0,1,2),(0,2,3),(0,3,4),(0,4,5),(0,5,6),(0,6,7),(0,7,8),(0,8,9),(0,9,10),(0,10,11),(0,11,12),(0,12,13),(0,13,14),(0,14,15),(0,15,16)"

4. Set virtio-net to 16 queues and give virtio-net an IP address:
ethtool -L [ens3] combined 16  # [ens3] is the name of virtio-net
ifconfig [ens3] 1.1.1.1

5. Send packets with different IPs from virtio-net; note that each vcpu must be bound to a different packet-sending process::
taskset -c 0 ping 1.1.1.2
taskset -c 1 ping 1.1.1.3
taskset -c 2 ping 1.1.1.4
taskset -c 3 ping 1.1.1.5
taskset -c 4 ping 1.1.1.6
taskset -c 5 ping 1.1.1.7
taskset -c 6 ping 1.1.1.8
taskset -c 7 ping 1.1.1.9
taskset -c 8 ping 1.1.1.2
taskset -c 9 ping 1.1.1.2
taskset -c 10 ping 1.1.1.2
taskset -c 11 ping 1.1.1.2
taskset -c 12 ping 1.1.1.2
taskset -c 13 ping 1.1.1.2
taskset -c 14 ping 1.1.1.2
taskset -c 15 ping 1.1.1.2

If everything is OK, then we can see results such as the following:
L3FWD_POWER: lcore 0 is waked up from rx interrupt on port 0 queue 0
... ...
L3FWD_POWER: lcore 15 is waked up from rx interrupt on port 0 queue 15

But we can't see the log above because of the bug.

** Affects: qemu
Importance: Undecided
Status: New

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1863096

Title: vhost-user multi-queues interrupt failed when Qemu reconnection happens
Status in QEMU: New
Re: [PATCH] console: make QMP screendump use coroutine
Gerd Hoffmann writes: > Hi, > >> Thanks to the QMP coroutine support, the screendump handler can >> trigger a graphic_hw_update(), yield and let the main loop run until >> update is done. Then the handler is resumed, and the ppm_save() will >> write the screen image to disk in the coroutine context (thus >> non-blocking). >> >> For now, HMP doesn't have coroutine support, so it remains potentially >> outdated or glitched. >> >> Fixes: >> https://bugzilla.redhat.com/show_bug.cgi?id=1230527 >> >> Based-on: <20200109183545.27452-2-kw...@redhat.com> > > What is the status here? Tried to apply (worked) and build (failed), > seems Kevin's patches are not merged yet? I reviewed v3, Kevin worked in improvements promptly, and I failed to review v4 promptly. Sorry about that.
Re: [PATCH v2 2/2] target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid
On Tue, 11 Feb 2020 at 19:42, Richard Henderson wrote: > > For the purpose of rebuild_hflags_a64, we do not need to compute > all of the va parameters, only tbi. Moreover, we can compute them > in a form that is more useful to storing in hflags. > > This eliminates the need for aa64_va_parameter_both, so fold that > in to aa64_va_parameter. The remaining calls to aa64_va_parameter > are in get_phys_addr_lpae and in pauth_helper.c. > > This reduces the total cpu consumption of aa64_va_parameter in a > kernel boot plus a kvm guest kernel boot from 3% to 0.5%. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH v2 1/2] target/arm: Fix select for aa64_va_parameters_both
On Tue, 11 Feb 2020 at 19:42, Richard Henderson wrote: > > Select should always be 0 for a regime with one range. > > Signed-off-by: Richard Henderson This change makes sense, and matches what aa32_va_parameters() does, but I think we need to update some of the callsites. (1) In get_phys_addr_lpae() we have the check: if (-top_bits != param.select || (param.select && !ttbr1_valid)) { where ttbr1_valid is the return value of (effectively) aarch64 ? regime_has_2_ranges() : (el != 2); but I think it's no longer possible to get here with param.select == 1 and !ttbr1_valid, so this becomes a dead check. (Side note, could we pull "ttbr1_valid = regime_has_2_ranges(mmu_idx);" out of the "if (aarch64) {...} else {...}" ? -- I think it works for aarch32 too, right?) (2) in pauth_original_ptr() we do uint64_t extfield = -param.select; but in the pseudocode Auth() function the extfield is unconditionally calculated based on bit 55 of the address, regardless of whether the regime has 1 range or 2. So I think this code can't use param.select any more but should simply pull out and replicate bit 55 of its 'ptr' argument, now that param.select is not simply the value of bit 55. Change 1 would need to be done after this patch and change 2 before it. thanks -- PMM
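A stand-alone sketch of the bit-55 replication described in point (2) above; extract_bits() here is a local helper standing in for QEMU's extract64(), and the pointer values are arbitrary examples:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local helper: extract a bit field; QEMU has a similar extract64(). */
static uint64_t extract_bits(uint64_t value, unsigned start, unsigned len)
{
    return (value >> start) & ((1ull << len) - 1);
}

int main(void)
{
    uint64_t ptrs[] = { 0x0000ffffdeadbeefULL,    /* bit 55 clear */
                        0xffffffffdeadbeefULL };  /* bit 55 set   */

    for (int i = 0; i < 2; i++) {
        uint64_t bit55 = extract_bits(ptrs[i], 55, 1);
        /* Sign-extension mask as used for "extfield" in Auth():
         * all ones when bit 55 is set, all zeroes otherwise. */
        uint64_t extfield = -bit55;
        printf("ptr=%#018" PRIx64 "  bit55=%" PRIu64 "  extfield=%#018" PRIx64 "\n",
               ptrs[i], bit55, extfield);
    }
    return 0;
}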
Re: [PATCH v2 1/2] target/arm: Fix select for aa64_va_parameters_both
On Thu, 13 Feb 2020 at 13:12, Peter Maydell wrote: > > On Tue, 11 Feb 2020 at 19:42, Richard Henderson > wrote: > > > > Select should always be 0 for a regime with one range. > > > > Signed-off-by: Richard Henderson > > This change makes sense, and matches what aa32_va_parameters() does, > but I think we need to update some of the callsites. Assuming those changes are done in separate patches, for this patch itself: Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH v2] hw/char/exynos4210_uart: Fix memleaks in exynos4210_uart_init
On Thu, 13 Feb 2020 at 10:09, Philippe Mathieu-Daudé wrote: > > On 2/13/20 3:56 AM, kuhn.chen...@huawei.com wrote: > > From: Chen Qun > > > > It's easy to reproduce as follow: > > virsh qemu-monitor-command vm1 --pretty '{"execute": > > "device-list-properties", > > "arguments":{"typename":"exynos4210.uart"}}' > > > > ASAN shows memory leak stack: > >#1 0xfffd896d71cb in g_malloc0 (/lib64/libglib-2.0.so.0+0x571cb) > >#2 0xaaad270beee3 in timer_new_full /qemu/include/qemu/timer.h:530 > >#3 0xaaad270beee3 in timer_new /qemu/include/qemu/timer.h:551 > >#4 0xaaad270beee3 in timer_new_ns /qemu/include/qemu/timer.h:569 > >#5 0xaaad270beee3 in exynos4210_uart_init > > /qemu/hw/char/exynos4210_uart.c:677 > >#6 0xaaad275c8f4f in object_initialize_with_type /qemu/qom/object.c:516 > >#7 0xaaad275c91bb in object_new_with_type /qemu/qom/object.c:684 > >#8 0xaaad2755df2f in qmp_device_list_properties > > /qemu/qom/qom-qmp-cmds.c:152 > > > > Reported-by: Euler Robot > > Signed-off-by: Chen Qun > > --- > > Changes V2 to V1: > > -Keep s->wordtime in exynos4210_uart_init (Base on Eduardo and Philippe's > > comments). > > Thanks. > > Reviewed-by: Philippe Mathieu-Daudé Applied to target-arm.next, thanks. -- PMM
[PULL 3/6] migration/rdma: rdma_accept_incoming_migration fix error handling
From: "Dr. David Alan Gilbert" rdma_accept_incoming_migration is called from an fd handler and can't return an Error * anywhere. Currently it's leaking Error's in errp/local_err - there's no point putting them in there unless we can report them. Turn most into fprintf's, and the last into an error_reportf_err where it's coming up from another function. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/rdma.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/migration/rdma.c b/migration/rdma.c index 2379b8345b..f61587891b 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -3980,13 +3980,13 @@ static void rdma_accept_incoming_migration(void *opaque) RDMAContext *rdma = opaque; int ret; QEMUFile *f; -Error *local_err = NULL, **errp = &local_err; +Error *local_err = NULL; trace_qemu_rdma_accept_incoming_migration(); ret = qemu_rdma_accept(rdma); if (ret) { -ERROR(errp, "RDMA Migration initialization failed!"); +fprintf(stderr, "RDMA ERROR: Migration initialization failed\n"); return; } @@ -3998,13 +3998,16 @@ static void rdma_accept_incoming_migration(void *opaque) f = qemu_fopen_rdma(rdma, "rb"); if (f == NULL) { -ERROR(errp, "could not qemu_fopen_rdma!"); +fprintf(stderr, "RDMA ERROR: could not qemu_fopen_rdma\n"); qemu_rdma_cleanup(rdma); return; } rdma->migration_started_on_destination = 1; -migration_fd_process_incoming(f, errp); +migration_fd_process_incoming(f, &local_err); +if (local_err) { +error_reportf_err(local_err, "RDMA ERROR:"); +} } void rdma_start_incoming_migration(const char *host_port, Error **errp) -- 2.24.1
[PULL 2/6] migration: Optimization about wait-unplug migration state
From: Keqian Zhu qemu_savevm_nr_failover_devices() is originally designed to get the number of failover devices, but it actually returns the number of "unplug-pending" failover devices now. Moreover, what drives migration state to wait-unplug should be the number of "unplug-pending" failover devices, not all failover devices. We can also notice that qemu_savevm_state_guest_unplug_pending() and qemu_savevm_nr_failover_devices() is equivalent almost (from the code view). So the latter is incorrect semantically and useless, just delete it. In the qemu_savevm_state_guest_unplug_pending(), once hit a unplug-pending failover device, then it can return true right now to save cpu time. Signed-off-by: Keqian Zhu Reviewed-by: Juan Quintela Tested-by: Jens Freimann Reviewed-by: Jens Freimann Signed-off-by: Juan Quintela --- migration/migration.c | 2 +- migration/savevm.c| 24 +++- migration/savevm.h| 1 - 3 files changed, 4 insertions(+), 23 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 1ca6be2323..8fb68795dc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3341,7 +3341,7 @@ static void *migration_thread(void *opaque) qemu_savevm_state_setup(s->to_dst_file); -if (qemu_savevm_nr_failover_devices()) { +if (qemu_savevm_state_guest_unplug_pending()) { migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_WAIT_UNPLUG); diff --git a/migration/savevm.c b/migration/savevm.c index f19cb9ec7a..1d4220ece8 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1140,36 +1140,18 @@ void qemu_savevm_state_header(QEMUFile *f) } } -int qemu_savevm_nr_failover_devices(void) +bool qemu_savevm_state_guest_unplug_pending(void) { SaveStateEntry *se; -int n = 0; QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if (se->vmsd && se->vmsd->dev_unplug_pending && se->vmsd->dev_unplug_pending(se->opaque)) { -n++; -} -} - -return n; -} - -bool qemu_savevm_state_guest_unplug_pending(void) -{ -SaveStateEntry *se; -int n = 0; - -QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { -if (!se->vmsd || !se->vmsd->dev_unplug_pending) { -continue; -} -if (se->vmsd->dev_unplug_pending(se->opaque)) { -n++; +return true; } } -return n > 0; +return false; } void qemu_savevm_state_setup(QEMUFile *f) diff --git a/migration/savevm.h b/migration/savevm.h index c42b9c80ee..ba64a7e271 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -31,7 +31,6 @@ bool qemu_savevm_state_blocked(Error **errp); void qemu_savevm_state_setup(QEMUFile *f); -int qemu_savevm_nr_failover_devices(void); bool qemu_savevm_state_guest_unplug_pending(void); int qemu_savevm_state_resume_prepare(MigrationState *s); void qemu_savevm_state_header(QEMUFile *f); -- 2.24.1
[PULL 0/6] Pull migration patches
The following changes since commit e18e5501d8ac692d32657a3e1ef545b14e72b730: Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200210' into staging (2020-02-10 18:09:14 +) are available in the Git repository at: https://github.com/juanquintela/qemu.git tags/pull-migration-pull-request for you to fetch changes up to 1a920d2b633e13df8961328b3b3e128989a34570: git: Make submodule check only needed modules (2020-02-13 11:31:58 +0100) Migration pull request - don't pause when migration has been cancelled (Zhimin) - fix memleaks in tests (pan)( - optimize wait-unplug (keqian) - improve rdma error handling/messages (dave) - add some flexibility in autoconverge test (dave) - git-submodule: allow compiliation from same tree with different number of git-submodules (juan) Please, Apply. Dr. David Alan Gilbert (2): migration/rdma: rdma_accept_incoming_migration fix error handling tests/migration: Add some slack to auto converge Juan Quintela (1): git: Make submodule check only needed modules Keqian Zhu (1): migration: Optimization about wait-unplug migration state Pan Nengyuan (1): migration-test: fix some memleaks in migration-test Zhimin Feng (1): migration: Maybe VM is paused when migration is cancelled migration/migration.c| 26 +- migration/rdma.c | 11 +++ migration/savevm.c | 24 +++- migration/savevm.h | 1 - scripts/git-submodule.sh | 12 tests/qtest/migration-test.c | 17 ++--- 6 files changed, 49 insertions(+), 42 deletions(-) -- 2.24.1
[PULL 4/6] tests/migration: Add some slack to auto converge
From: "Dr. David Alan Gilbert" There's an assert in autoconverge that checks that we quit the iteration when we go below the expected threshold. Philippe saw a case where this assert fired with the measured value slightly over the threshold. (about 3k out of a few million). I can think of two reasons: a) Rounding errors b) That after we make the decision to quit iteration we do one more sync and that sees a few more dirty pages. So add 1% slack to the assertion, that should cover a and most cases of b, probably all we'll see for the test. Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Juan Quintela Reviewed-by: Peter Xu Signed-off-by: Juan Quintela --- tests/qtest/migration-test.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index cf27ebbc9d..a78ac0c7da 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1237,7 +1237,8 @@ static void test_migrate_auto_converge(void) g_assert_cmpint(percentage, <=, max_pct); remaining = read_ram_property_int(from, "remaining"); -g_assert_cmpint(remaining, <, expected_threshold); +g_assert_cmpint(remaining, <, +(expected_threshold + expected_threshold / 100)); migrate_continue(from, "pre-switchover"); -- 2.24.1
[PULL 5/6] migration-test: fix some memleaks in migration-test
From: Pan Nengyuan spotted by asan, 'check-qtest-aarch64' runs fail if sanitizers is enabled. Reported-by: Euler Robot Signed-off-by: Pan Nengyuan Reviewed-by: Juan Quintela Reviewed-by: Laurent Vivier Signed-off-by: Juan Quintela --- tests/qtest/migration-test.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index a78ac0c7da..ccf313f288 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -498,11 +498,13 @@ static int test_migrate_start(QTestState **from, QTestState **to, const char *arch = qtest_get_arch(); const char *machine_opts = NULL; const char *memory_size; +int ret = 0; if (args->use_shmem) { if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) { g_test_skip("/dev/shm is not supported"); -return -1; +ret = -1; +goto out; } } @@ -611,8 +613,9 @@ static int test_migrate_start(QTestState **from, QTestState **to, g_free(shmem_path); } +out: migrate_start_destroy(args); -return 0; +return ret; } static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest) @@ -1134,6 +1137,8 @@ static void test_validate_uuid(void) { MigrateStart *args = migrate_start_new(); +g_free(args->opts_source); +g_free(args->opts_target); args->opts_source = g_strdup("-uuid ----"); args->opts_target = g_strdup("-uuid ----"); do_test_validate_uuid(args, false); @@ -1143,6 +1148,8 @@ static void test_validate_uuid_error(void) { MigrateStart *args = migrate_start_new(); +g_free(args->opts_source); +g_free(args->opts_target); args->opts_source = g_strdup("-uuid ----"); args->opts_target = g_strdup("-uuid ----"); args->hide_stderr = true; @@ -1153,6 +1160,7 @@ static void test_validate_uuid_src_not_set(void) { MigrateStart *args = migrate_start_new(); +g_free(args->opts_target); args->opts_target = g_strdup("-uuid ----"); args->hide_stderr = true; do_test_validate_uuid(args, false); @@ -1162,6 +1170,7 @@ static void test_validate_uuid_dst_not_set(void) { MigrateStart *args = migrate_start_new(); +g_free(args->opts_source); args->opts_source = g_strdup("-uuid ----"); args->hide_stderr = true; do_test_validate_uuid(args, false); @@ -1380,6 +1389,7 @@ static void test_multifd_tcp_cancel(void) " 'arguments': { 'uri': 'tcp:127.0.0.1:0' }}"); qobject_unref(rsp); +g_free(uri); uri = migrate_get_socket_address(to2, "socket-address"); wait_for_migration_status(from, "cancelled", NULL); -- 2.24.1
[PULL 1/6] migration: Maybe VM is paused when migration is cancelled
From: Zhimin Feng If the migration is cancelled when it is in the completion phase, the migration state is set to MIGRATION_STATUS_CANCELLING. The VM maybe wait for the 'pause_sem' semaphore in migration_maybe_pause function, so that VM always is paused. Reported-by: Euler Robot Signed-off-by: Zhimin Feng Reviewed-by: Juan Quintela Signed-off-by: Juan Quintela --- migration/migration.c | 24 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 3a21a4686c..1ca6be2323 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2797,14 +2797,22 @@ static int migration_maybe_pause(MigrationState *s, /* This block intentionally left blank */ } -qemu_mutex_unlock_iothread(); -migrate_set_state(&s->state, *current_active_state, - MIGRATION_STATUS_PRE_SWITCHOVER); -qemu_sem_wait(&s->pause_sem); -migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, - new_state); -*current_active_state = new_state; -qemu_mutex_lock_iothread(); +/* + * If the migration is cancelled when it is in the completion phase, + * the migration state is set to MIGRATION_STATUS_CANCELLING. + * So we don't need to wait a semaphore, otherwise we would always + * wait for the 'pause_sem' semaphore. + */ +if (s->state != MIGRATION_STATUS_CANCELLING) { +qemu_mutex_unlock_iothread(); +migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); +qemu_sem_wait(&s->pause_sem); +migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); +*current_active_state = new_state; +qemu_mutex_lock_iothread(); +} return s->state == new_state ? 0 : -EINVAL; } -- 2.24.1
[PULL 6/6] git: Make submodule check only needed modules
If one is compiling more than one tree from the same source, it is possible that they need different submodules. Change the check to see that all modules that we are interested in are updated, discarding the ones that we don't care about. Signed-off-by: Juan Quintela --- v1->v2: patchw insists in not using modules --- scripts/git-submodule.sh | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/scripts/git-submodule.sh b/scripts/git-submodule.sh index 98ca0f2737..65ed877aef 100755 --- a/scripts/git-submodule.sh +++ b/scripts/git-submodule.sh @@ -59,10 +59,14 @@ status) fi test -f "$substat" || exit 1 -CURSTATUS=$($GIT submodule status $modules) -OLDSTATUS=$(cat $substat) -test "$CURSTATUS" = "$OLDSTATUS" -exit $? +for module in $modules; do +CURSTATUS=$($GIT submodule status $module) +OLDSTATUS=$(cat $substat | grep $module) +if test "$CURSTATUS" != "$OLDSTATUS"; then +exit 1 +fi +done +exit 0 ;; update) if test -z "$maybe_modules" -- 2.24.1
Re: [PATCH v5 0/6] small vhost changes and in-band notifications
On Thu, 2020-01-23 at 09:17 +0100, Johannes Berg wrote: > Hi, > > Here's a repost of all the patches I sent back in August, with the > in-band notifications rebased over the reset patch, so IDs have now > changed a bit. Ping? The patches still apply on top of latest qemu. I wanted to send some corresponding kernel patches, but without the protocol nailed down ... johannes
Re: [PATCH v2 1/2] target/arm: Fix select for aa64_va_parameters_both
On Thu, 13 Feb 2020 at 13:12, Peter Maydell wrote: > > On Tue, 11 Feb 2020 at 19:42, Richard Henderson > wrote: > > > > Select should always be 0 for a regime with one range. > > > > Signed-off-by: Richard Henderson > > This change makes sense, and matches what aa32_va_parameters() does, > but I think we need to update some of the callsites. > > (1) In get_phys_addr_lpae() we have the check: > > if (-top_bits != param.select || (param.select && !ttbr1_valid)) { > > where ttbr1_valid is the return value of (effectively) > aarch64 ? regime_has_2_ranges() : (el != 2); > but I think it's no longer possible to get here with param.select == 1 > and !ttbr1_valid, so this becomes a dead check. ...or should the code instead be checking literal pointer bit 55 against ttbr1_valid now ? thanks -- PMM
Re: [PATCH v5 4/8] multifd: Add multifd-zlib-level parameter
Daniel P. Berrangé writes: > On Thu, Jan 30, 2020 at 09:03:00AM +0100, Markus Armbruster wrote: >> Juan Quintela writes: >> >> > It will indicate which level use for compression. >> > >> > Signed-off-by: Juan Quintela >> >> This is slightly confusing (there is no zlib compression), unless you >> peek at the next patch (which adds zlib compression). >> >> Three ways to make it less confusing: >> >> * Squash the two commits >> >> * Swap them: first add zlib compression with level hardcoded to 1, then >> make the level configurable. >> >> * Have the first commit explain itself better. Something like >> >> multifd: Add multifd-zlib-level parameter >> >> This parameter specifies zlib compression level. The next patch >> will put it to use. > > Wouldn't the "normal" best practice for QAPI design be to use a > enum and discriminated union. eg > > { 'enum': 'MigrationCompression', > 'data': ['none', 'zlib'] } > > { 'struct': 'MigrationCompressionParamsZLib', > 'data': { 'compression-level' } } > > { 'union': 'MigrationCompressionParams', > 'base': { 'mode': 'MigrationCompression' }, > 'discriminator': 'mode', > 'data': { > 'zlib': 'MigrationCompressionParamsZLib', > } This expresses the connection between compression mode and level. In general, we prefer that to a bunch of optional members where the comments say things like "member A can be present only when member B has value V", or worse "member A is silently ignored unless member B has value V". However: > Of course this is quite different from how migration parameters are > done today. Maybe it makes sense to stick with the flat list of > migration parameters for consistency & ignore normal QAPI design > practice ? Unsure. It's certainly ugly now. Each parameter is defined in three places: enum MigrationParameter (for HMP), struct MigrateSetParameters (for QMP migrate-set-parameters), and struct MigrationParameters (for QMP query-migrate-parameters). I don't know how to make this better other than by starting over. I don't know whether starting over would result in enough of an improvement to make it worthwhile.
Re: [PATCH v5 0/6] small vhost changes and in-band notifications
On Thu, Feb 13, 2020 at 02:26:10PM +0100, Johannes Berg wrote: > On Thu, 2020-01-23 at 09:17 +0100, Johannes Berg wrote: > > Hi, > > > > Here's a repost of all the patches I sent back in August, with the > > in-band notifications rebased over the reset patch, so IDs have now > > changed a bit. > > Ping? > > The patches still apply on top of latest qemu. > > I wanted to send some corresponding kernel patches, but without the > protocol nailed down ... > > johannes Queued, thanks!
Re: [PATCH v2 fixed 00/16] Ram blocks with resizable anonymous allocations under POSIX
On 12.02.20 19:03, David Hildenbrand wrote: > On 12.02.20 14:42, David Hildenbrand wrote: >> We already allow resizable ram blocks for anonymous memory, however, they >> are not actually resized. All memory is mmaped() R/W, including the memory >> exceeding the used_length, up to the max_length. >> >> When resizing, effectively only the boundary is moved. Implement actually >> resizable anonymous allocations and make use of them in resizable ram >> blocks when possible. Memory exceeding the used_length will be >> inaccessible. Especially ram block notifiers require care. >> >> Having actually resizable anonymous allocations (via mmap-hackery) allows >> to reserve a big region in virtual address space and grow the >> accessible/usable part on demand. Even if "/proc/sys/vm/overcommit_memory" >> is set to "never" under Linux, huge reservations will succeed. If there is >> not enough memory when resizing (to populate parts of the reserved region), >> trying to resize will fail. Only the actually used size is reserved in the >> OS. >> >> E.g., virtio-mem [1] wants to reserve big resizable memory regions and >> grow the usable part on demand. I think this change is worth sending out >> individually. Accompanied by a bunch of minor fixes and cleanups. >> >> Especially, memory notifiers already handle resizing by first removing >> the old region, and then re-adding the resized region. prealloc is >> currently not possible with resizable ram blocks. mlock() should continue >> to work as is. Resizing is currently rare and must only happen on the >> start of an incoming migration, or during resets. No code path (except >> HAX and SEV ram block notifiers) should access memory outside of the usable >> range - and if we ever find one, that one has to be fixed (I did not >> identify any). >> >> v1 -> v2: >> - Add "util: vfio-helpers: Fix qemu_vfio_close()" >> - Add "util: vfio-helpers: Remove Error parameter from >>qemu_vfio_undo_mapping()" >> - Add "util: vfio-helpers: Factor out removal from >>qemu_vfio_undo_mapping()" >> - "util/mmap-alloc: ..." >> -- Minor changes due to review feedback (e.g., assert alignment, return >> bool when resizing) >> - "util: vfio-helpers: Implement ram_block_resized()" >> -- Reserve max_size in the IOVA address space. >> -- On resize, undo old mapping and do new mapping. We can later implement >> a new ioctl to resize the mapping directly. >> - "numa: Teach ram block notifiers about resizable ram blocks" >> -- Pass size/max_size to ram block notifiers, which makes things easier an >> cleaner >> - "exec: Ram blocks with resizable anonymous allocations under POSIX" >> -- Adapt to new ram block notifiers >> -- Shrink after notifying. Always trigger ram block notifiers on resizes >> -- Add a safety net that all ram block notifiers registered at runtime >> support resizes. >> >> [1] https://lore.kernel.org/kvm/20191212171137.13872-1-da...@redhat.com/ >> >> David Hildenbrand (16): >> util: vfio-helpers: Factor out and fix processing of existing ram >> blocks >> util: vfio-helpers: Fix qemu_vfio_close() >> util: vfio-helpers: Remove Error parameter from >> qemu_vfio_undo_mapping() >> util: vfio-helpers: Factor out removal from qemu_vfio_undo_mapping() >> exec: Factor out setting ram settings (madvise ...) 
into >> qemu_ram_apply_settings() >> exec: Reuse qemu_ram_apply_settings() in qemu_ram_remap() >> exec: Drop "shared" parameter from ram_block_add() >> util/mmap-alloc: Factor out calculation of pagesize to mmap_pagesize() >> util/mmap-alloc: Factor out reserving of a memory region to >> mmap_reserve() >> util/mmap-alloc: Factor out populating of memory to mmap_populate() >> util/mmap-alloc: Prepare for resizable mmaps >> util/mmap-alloc: Implement resizable mmaps >> numa: Teach ram block notifiers about resizable ram blocks >> util: vfio-helpers: Implement ram_block_resized() >> util: oslib: Resizable anonymous allocations under POSIX >> exec: Ram blocks with resizable anonymous allocations under POSIX >> >> exec.c | 104 +++ >> hw/core/numa.c | 53 +++- >> hw/i386/xen/xen-mapcache.c | 7 +- >> include/exec/cpu-common.h | 3 + >> include/exec/memory.h | 8 ++ >> include/exec/ramlist.h | 14 +++- >> include/qemu/mmap-alloc.h | 21 +++-- >> include/qemu/osdep.h | 6 +- >> stubs/ram-block.c | 20 - >> target/i386/hax-mem.c | 5 +- >> target/i386/sev.c | 18 ++-- >> util/mmap-alloc.c | 165 +++-- >> util/oslib-posix.c | 37 - >> util/oslib-win32.c | 14 >> util/trace-events | 9 +- >> util/vfio-helpers.c| 145 +--- >> 16 files changed, 450 insertions(+), 179 deletions(-) >> > > 1. I will do resizable -> resizeable > 2. I think migration might indeed need some care regarding >
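As a stand-alone illustration of the reserve-then-populate idea the cover letter builds on (reserve max_length of address space up front, keep only used_length accessible), here is a minimal POSIX sketch; the sizes and flag choices are illustrative and not the series' exact mmap-alloc code:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t max_size  = 1024 * 1024 * 1024;   /* 1 GiB reserved, illustrative */
    size_t used_size = 16 * 1024 * 1024;     /* 16 MiB actually accessible   */

    /* Reserve address space only: PROT_NONE + MAP_NORESERVE means no commit
     * charge, so this succeeds even with overcommit disabled. */
    void *base = mmap(NULL, max_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        perror("reserve");
        return 1;
    }

    /* "Resize": populate the first used_size bytes in place. */
    void *usable = mmap(base, used_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (usable == MAP_FAILED) {
        perror("populate");
        return 1;
    }

    memset(usable, 0, used_size);            /* accessible */
    /* Touching base + used_size would SIGSEGV: that part is still PROT_NONE. */

    printf("reserved %zu bytes at %p, usable %zu bytes\n",
           max_size, base, used_size);
    munmap(base, max_size);
    return 0;
}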