Re: [PATCH v11 7/7] docs/zoned-storage: add zoned device documentation
Hannes Reinecke 于2022年10月10日周一 14:19写道: > > On 10/10/22 04:21, Sam Li wrote: > > Add the documentation about the zoned device support to virtio-blk > > emulation. > > > > Signed-off-by: Sam Li > > Reviewed-by: Stefan Hajnoczi > > --- > > docs/devel/zoned-storage.rst | 40 ++ > > docs/system/qemu-block-drivers.rst.inc | 6 > > 2 files changed, 46 insertions(+) > > create mode 100644 docs/devel/zoned-storage.rst > > > > diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst > > new file mode 100644 > > index 00..deaa4ce99b > > --- /dev/null > > +++ b/docs/devel/zoned-storage.rst > > @@ -0,0 +1,40 @@ > > += > > +zoned-storage > > += > > + > > +Zoned Block Devices (ZBDs) devide the LBA space into block regions called > > zones > > divide > > > +that are larger than the LBA size. They can only allow sequential writes, > > which > > +can reduce write amplification in SSDs, and potentially lead to higher > > +throughput and increased capacity. More details about ZBDs can be found at: > > + > > +https://zonedstorage.io/docs/introduction/zoned-storage > > + > > +1. Block layer APIs for zoned storage > > +- > > +QEMU block layer has three zoned storage model: > > +- BLK_Z_HM: This model only allows sequential writes access. It supports a > > set > > +of ZBD-specific I/O request that used by the host to manage device zones. > > Maybe: > This model only allow for sequential write access to zones. It supports > ZBD-specific I/O requests to manage device zones. > > > +- BLK_Z_HA: It deals with both sequential writes and random writes access. > > Maybe better: > This model allows sequential and random writes to zones. It supports > ZBD-specific I/O requests to manage device zones. > > > +- BLK_Z_NONE: Regular block devices and drive-managed ZBDs are treated as > > +non-zoned devices. > > Maybe: > This is the default model with no zones support; it includes both > regular and drive-managed ZBD devices. ZBD-specific I/O requests are not > supported. Thanks! > > > + > > +The block device information resides inside BlockDriverState. QEMU uses > > +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by > > the > > +block layer while processing I/O requests. A BlockBackend has a root > > pointer to > > +a BlockDriverState graph(for example, raw format on top of file-posix). The > > +zoned storage information can be propagated from the leaf BlockDriverState > > all > > +the way up to the BlockBackend. If the zoned storage model in file-posix is > > +set to BLK_Z_HM, then block drivers will declare support for zoned host > > device. > > + > > +The block layer APIs support commands needed for zoned storage devices, > > +including report zones, four zone operations, and zone append. > > + > > +2. Emulating zoned storage controllers > > +-- > > +When the BlockBackend's BlockLimits model reports a zoned storage device, > > users > > +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block > > layer > > +APIs for zoned storage emulation or testing. 
> > + > > +For example, to test zone_report on a null_blk device using qemu-io is: > > +$ path/to/qemu-io --image-opts -n > > driver=zoned_host_device,filename=/dev/nullb0 > > +-c "zrp offset nr_zones" > > diff --git a/docs/system/qemu-block-drivers.rst.inc > > b/docs/system/qemu-block-drivers.rst.inc > > index dfe5d2293d..0b97227fd9 100644 > > --- a/docs/system/qemu-block-drivers.rst.inc > > +++ b/docs/system/qemu-block-drivers.rst.inc > > @@ -430,6 +430,12 @@ Hard disks > > you may corrupt your host data (use the ``-snapshot`` command > > line option or modify the device permissions accordingly). > > > > +Zoned block devices > > + Zoned block devices can be passed through to the guest if the emulated > > storage > > + controller supports zoned storage. Use ``--blockdev zoned_host_device, > > + node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0`` > > + as ``drive0``. > > + > > Windows > > ^^^ > > > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke    Kernel Storage Architect > h...@suse.de +49 911 74053 688 > SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg > HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew > Myers, Andrew McDonald, Martje Boudien Moerman >
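For completeness, the null_blk device in the example above has to be created on the host first. A hedged sketch, assuming the standard Linux null_blk module parameters (zone_size and size are in MiB):

    # create one zoned null_blk device with 64 MiB zones, 1 GiB total
    $ sudo modprobe null_blk nr_devices=1 zoned=1 zone_size=64 size=1024
    # then the report-zones example from the patch, with concrete values
    $ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 -c "zrp 0 16"

Here "zrp 0 16" asks for a report of 16 zones starting at offset 0, matching the "zrp offset nr_zones" form shown in the documentation.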
Re: FAILED: libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o
On 10/10/2022 08.18, Dennis Clarke wrote: On FreeBSD 14.0 CURRENT amd64 everything seems to go swimmingly until : [5679/6848] Compiling C object libqemu-arm-bsd-user.fa.p/bsd-user_mmap.c.o [5680/6848] Compiling C object libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o FAILED: libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o /usr/bin/cc -m64 -mcx16 -Ilibqemu-arm-bsd-user.fa.p -I. -I.. -Itarget/arm -I../target/arm -I../common-user/host/x86_64 -I../bsd-user/include -Ibsd-user/freebsd -I../bsd-user/freebsd -I../bsd-user/host/x86_64 -Ibsd-user -I../bsd-user -I../bsd-user/arm -Iqapi -Itrace -Iui -Iui/shader -I/usr/local/include -I/usr/local/include/glib-2.0 -I/usr/local/lib/glib-2.0/include -fcolor-diagnostics -Wall -Winvalid-pch -std=gnu11 -O0 -g -iquote . -iquote /opt/bw/build/qemu -iquote /opt/bw/build/qemu/include -iquote /opt/bw/build/qemu/tcg/i386 -pthread -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -Wno-psabi -Wno-gnu-variable-sized-type-not-at-end -fstack-protector-strong -m64 -g -O0 -fno-builtin -fPIE -DNEED_CPU_H '-DCONFIG_TARGET="arm-bsd-user-config-target.h"' '-DCONFIG_DEVICES="arm-bsd-user-config-devices.h"' -MD -MQ libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o -MF libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o.d -o libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o -c ../bsd-user/signal.c In file included from ../bsd-user/signal.c:27: In file included from ../bsd-user/host/x86_64/host-signal.h:14: In file included from /usr/include/vm/pmap.h:92: /usr/include/machine/pmap.h:452:2: error: fields must have a constant size: 'variable length array in structure' extension will never be supported PV_CHUNK_HEADER ^ /usr/include/machine/pmap.h:448:12: note: expanded from macro 'PV_CHUNK_HEADER' uint64_t pc_map[_NPCM]; /* bitmap; 1 = free */ \ ^ /usr/include/machine/pmap.h:456:2: error: fields must have a constant size: 'variable length array in structure' extension will never be supported PV_CHUNK_HEADER ^ /usr/include/machine/pmap.h:448:12: note: expanded from macro 'PV_CHUNK_HEADER' uint64_t pc_map[_NPCM]; /* bitmap; 1 = free */ \ ^ 2 errors generated. ninja: build stopped: subcommand failed. gmake[1]: *** [Makefile:165: run-ninja] Error 1 gmake[1]: Leaving directory '/opt/bw/build/qemu/build' gmake: *** [GNUmakefile:11: all] Error 2 phobos# Is there a trivial patch ? Or perhaps try again using GCC and not LLVM/Clang? I'm not using FreeBSD, so no real clue, but this pretty much sounds like _NPCM is not properly defined by your system headers anymore, so I assume this is a problem on the FreeBSD side ... I'd suggest to report it on the FreeBSD mailing list. Thomas
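For reference, this diagnostic class is easy to reproduce in isolation. A hedged minimal example (not the actual pmap.h contents) that clang rejects with the same error, illustrating the suspicion above that _NPCM stops being a compile-time constant in this userland build:

    /* expected to FAIL to compile with clang: a const-qualified variable is
     * not an integer constant expression in C, so pc_map becomes a variable
     * length array inside a struct, which clang refuses outright */
    static const int _NPCM = 3;
    struct pv_chunk_header {
        unsigned long pc_map[_NPCM]; /* error: fields must have a constant size */
    };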
Re: total fail on FreeBSD 14.0 amd64 regardless of compiler
On 10/10/2022 08.56, Dennis Clarke wrote: re: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html Using GCC 12 is even worse : [2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o [2041/6841] Linking target qemu-system-aarch64 FAILED: qemu-system-aarch64 /usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp /usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: undefined reference to symbol 'inotify_init1' Now that sounds like the detection for inotify_init1 did not work right in the meson.build script... Looking at meson.build: config_host_data.set('CONFIG_INOTIFY1', cc.has_header_symbol('sys/inotify.h', 'inotify_init1')) ... do you have such a "inotify.h" header on your FreeBSD system and does it contain an inotify_init1 function? Thomas
Re: total fail on FreeBSD 14.0 amd64 regardless of compiler
On 10/10/22 07:21, Thomas Huth wrote: On 10/10/2022 08.56, Dennis Clarke wrote: re: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html Using GCC 12 is even worse : [2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o [2041/6841] Linking target qemu-system-aarch64 FAILED: qemu-system-aarch64 /usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp /usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: undefined reference to symbol 'inotify_init1' Now that sounds like the detection for inotify_init1 did not work right in the meson.build script... Looking at meson.build: config_host_data.set('CONFIG_INOTIFY1', cc.has_header_symbol('sys/inotify.h', 'inotify_init1')) ... do you have such a "inotify.h" header on your FreeBSD system and does it contain an inotify_init1 function? Thomas Let's see : phobos# phobos# uname -apKU FreeBSD phobos 14.0-CURRENT FreeBSD 14.0-CURRENT #14 main-n258340-497cdf9673e: Sun Oct 2 09:51:14 GMT 2022 root@phobos:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 1400072 1400072 phobos# ls -lap /usr/local/include/sys/inotify.h -rw-r--r-- 1 root wheel 4540 Oct 4 01:24 /usr/local/include/sys/inotify.h phobos# phobos# grep 'inotify_init1' /usr/local/include/sys/inotify.h /* Flags for the parameter of inotify_init1. */ int inotify_init1 (int flags) __THROW; phobos# Looks to be there. -- Dennis Clarke RISC-V/SPARC/PPC/ARM/CISC UNIX and Linux spoken GreyBeard and suspenders optional
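Worth noting: on FreeBSD, sys/inotify.h under /usr/local/include typically comes from the libinotify port rather than the base system, so a compile-only has_header_symbol() probe can pass while the final link still lacks -linotify. A hedged meson sketch of a stricter check (illustrative only, not QEMU's actual fix):

    # hypothetical: require the library to be found, not just the header symbol
    inotify = cc.find_library('inotify', required: false)
    config_host_data.set('CONFIG_INOTIFY1',
                         cc.has_header_symbol('sys/inotify.h', 'inotify_init1') and
                         (host_machine.system() == 'linux' or inotify.found()))

The link failure itself would additionally need the library wired into the filemonitor objects' dependencies.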
Re: [PATCH v3] disas/riscv.c: rvv: Add disas support for vector instructions
On Wed, Sep 28, 2022 at 3:19 PM Yang Liu wrote: > > Tested with https://github.com/ksco/rvv-decoder-tests > > Expected checkpatch errors for consistency and brevity reasons: > > ERROR: line over 90 characters > ERROR: trailing statements should be on next line > ERROR: braces {} are necessary for all arms of this statement > > Signed-off-by: Yang Liu Thanks! Applied to riscv-to-apply.next Alistair > --- > disas/riscv.c | 1432 - > 1 file changed, 1430 insertions(+), 2 deletions(-) > > diff --git a/disas/riscv.c b/disas/riscv.c > index f107d94c4c..d216b9c39b 100644 > --- a/disas/riscv.c > +++ b/disas/riscv.c > @@ -158,6 +158,11 @@ typedef enum { > rv_codec_css_sqsp, > rv_codec_k_bs, > rv_codec_k_rnum, > +rv_codec_v_r, > +rv_codec_v_ldst, > +rv_codec_v_i, > +rv_codec_vsetvli, > +rv_codec_vsetivli, > } rv_codec; > > typedef enum { > @@ -560,6 +565,376 @@ typedef enum { > rv_op_zip = 396, > rv_op_xperm4 = 397, > rv_op_xperm8 = 398, > +rv_op_vle8_v = 399, > +rv_op_vle16_v = 400, > +rv_op_vle32_v = 401, > +rv_op_vle64_v = 402, > +rv_op_vse8_v = 403, > +rv_op_vse16_v = 404, > +rv_op_vse32_v = 405, > +rv_op_vse64_v = 406, > +rv_op_vlm_v = 407, > +rv_op_vsm_v = 408, > +rv_op_vlse8_v = 409, > +rv_op_vlse16_v = 410, > +rv_op_vlse32_v = 411, > +rv_op_vlse64_v = 412, > +rv_op_vsse8_v = 413, > +rv_op_vsse16_v = 414, > +rv_op_vsse32_v = 415, > +rv_op_vsse64_v = 416, > +rv_op_vluxei8_v = 417, > +rv_op_vluxei16_v = 418, > +rv_op_vluxei32_v = 419, > +rv_op_vluxei64_v = 420, > +rv_op_vloxei8_v = 421, > +rv_op_vloxei16_v = 422, > +rv_op_vloxei32_v = 423, > +rv_op_vloxei64_v = 424, > +rv_op_vsuxei8_v = 425, > +rv_op_vsuxei16_v = 426, > +rv_op_vsuxei32_v = 427, > +rv_op_vsuxei64_v = 428, > +rv_op_vsoxei8_v = 429, > +rv_op_vsoxei16_v = 430, > +rv_op_vsoxei32_v = 431, > +rv_op_vsoxei64_v = 432, > +rv_op_vle8ff_v = 433, > +rv_op_vle16ff_v = 434, > +rv_op_vle32ff_v = 435, > +rv_op_vle64ff_v = 436, > +rv_op_vl1re8_v = 437, > +rv_op_vl1re16_v = 438, > +rv_op_vl1re32_v = 439, > +rv_op_vl1re64_v = 440, > +rv_op_vl2re8_v = 441, > +rv_op_vl2re16_v = 442, > +rv_op_vl2re32_v = 443, > +rv_op_vl2re64_v = 444, > +rv_op_vl4re8_v = 445, > +rv_op_vl4re16_v = 446, > +rv_op_vl4re32_v = 447, > +rv_op_vl4re64_v = 448, > +rv_op_vl8re8_v = 449, > +rv_op_vl8re16_v = 450, > +rv_op_vl8re32_v = 451, > +rv_op_vl8re64_v = 452, > +rv_op_vs1r_v = 453, > +rv_op_vs2r_v = 454, > +rv_op_vs4r_v = 455, > +rv_op_vs8r_v = 456, > +rv_op_vadd_vv = 457, > +rv_op_vadd_vx = 458, > +rv_op_vadd_vi = 459, > +rv_op_vsub_vv = 460, > +rv_op_vsub_vx = 461, > +rv_op_vrsub_vx = 462, > +rv_op_vrsub_vi = 463, > +rv_op_vwaddu_vv = 464, > +rv_op_vwaddu_vx = 465, > +rv_op_vwadd_vv = 466, > +rv_op_vwadd_vx = 467, > +rv_op_vwsubu_vv = 468, > +rv_op_vwsubu_vx = 469, > +rv_op_vwsub_vv = 470, > +rv_op_vwsub_vx = 471, > +rv_op_vwaddu_wv = 472, > +rv_op_vwaddu_wx = 473, > +rv_op_vwadd_wv = 474, > +rv_op_vwadd_wx = 475, > +rv_op_vwsubu_wv = 476, > +rv_op_vwsubu_wx = 477, > +rv_op_vwsub_wv = 478, > +rv_op_vwsub_wx = 479, > +rv_op_vadc_vvm = 480, > +rv_op_vadc_vxm = 481, > +rv_op_vadc_vim = 482, > +rv_op_vmadc_vvm = 483, > +rv_op_vmadc_vxm = 484, > +rv_op_vmadc_vim = 485, > +rv_op_vsbc_vvm = 486, > +rv_op_vsbc_vxm = 487, > +rv_op_vmsbc_vvm = 488, > +rv_op_vmsbc_vxm = 489, > +rv_op_vand_vv = 490, > +rv_op_vand_vx = 491, > +rv_op_vand_vi = 492, > +rv_op_vor_vv = 493, > +rv_op_vor_vx = 494, > +rv_op_vor_vi = 495, > +rv_op_vxor_vv = 496, > +rv_op_vxor_vx = 497, > +rv_op_vxor_vi = 498, > +rv_op_vsll_vv = 499, > +rv_op_vsll_vx = 500, > +rv_op_vsll_vi = 501, > +rv_op_vsrl_vv = 502, > 
+rv_op_vsrl_vx = 503, > +rv_op_vsrl_vi = 504, > +rv_op_vsra_vv = 505, > +rv_op_vsra_vx = 506, > +rv_op_vsra_vi = 507, > +rv_op_vnsrl_wv = 508, > +rv_op_vnsrl_wx = 509, > +rv_op_vnsrl_wi = 510, > +rv_op_vnsra_wv = 511, > +rv_op_vnsra_wx = 512, > +rv_op_vnsra_wi = 513, > +rv_op_vmseq_vv = 514, > +rv_op_vmseq_vx = 515, > +rv_op_vmseq_vi = 516, > +rv_op_vmsne_vv = 517, > +rv_op_vmsne_vx = 518, > +rv_op_vmsne_vi = 519, > +rv_op_vmsltu_vv = 520, > +rv_op_vmsltu_vx = 521, > +rv_op_vmslt_vv = 522, > +rv_op_vmslt_vx = 523, > +rv_op_vmsleu_vv = 524, > +rv_op_vmsleu_vx = 525, > +rv_op_vmsleu_vi = 526, > +rv_op_vmsle_vv = 527, > +rv_op_vmsle_vx = 528, > +rv_op_vmsle_vi = 529, > +rv_op_vmsgtu_vx = 530, > +rv_op_vmsgtu_vi = 531, > +rv_op_vmsgt_vx = 532, > +rv_op_vmsgt_vi = 533, > +rv_op_vminu_vv =
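For anyone wanting to see the effect without the external test repo: QEMU's -d in_asm logging goes through this disassembler for RISC-V, so with the patch applied, vector instructions should render as mnemonics rather than raw opcode words. A hedged usage sketch (the binary name is a placeholder):

    # dump translated guest instructions while running an RVV binary
    $ qemu-riscv64 -cpu rv64,v=true -d in_asm ./some-rvv-binary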
[PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
Please see the README file added in patch 10 for more details. Sample runs are as follows: $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smbios.py, line 92) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smilatency.py, line 47) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (testacpi.py, line 158) 1..1 ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits $ ./tests/venv/bin/avocado run -t acpi tests/avocado ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smbios.py, line 92) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smilatency.py, line 47) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (testacpi.py, line 158) Fetching asset from tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits JOB ID : c06a9feb423bcda5de89bb51353c6c1718719f14 JOB LOG: /home/anisinha/avocado/job-results/job-2022-10-10T13.12-c06a9fe/job.log (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS (34.11 s) RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0 JOB TIME : 37.40 s ./tests/venv/bin/avocado run tests/avocado/acpi-bits.py Fetching asset from tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits JOB ID : a2476dd3fafe289c5e6475f447bc8369f77d57ba JOB LOG: /home/anisinha/avocado/job-results/job-2022-10-10T13.14-a2476dd/job.log (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS (34.07 s) RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0 JOB TIME : 40.20 s Changelog: v3: - converted the existing test to avocado tests as per the popular recommendation. Added appropriate tags. - artifact download URL modified based on gitlab-CI URL. For biosbits repo: - moved to a gitlab repo with me being the maintainer. - added .gitlab-ci.yml file to generate the artifacts. v2: - a new class of python based tests introduced that is separate from avocado tests or qtests. Can be run by using "make check-pytest". - acpi biosbits tests are the first tests to use pytest environment. - bios bits tests now download the bits binary archives from a remote repository if they are not found locally. The test skips if download fails. - A new environment variable is introduced that can be passed by the tester to specify the location of the bits archives locally. The test skips if the bits binaries are not found in that location. - if pip install of python module fails for whatever reason, the test skips. - misc code fixes including spell check of the README doc. README has been updated as well. - addition of SPDX license headers to bits test files. - update MAINTAINERS to reflect the new pytest test class. For biosbits repo: - added Dockerfile and build script. Made bios bits build on gcc 11. https://github.com/ani-sinha/bits/blob/bits-qemu-logging/Dockerfile https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh The build script generates the zip archive and tarball used by the test. v1: initial patchset; uses qtest to implement the bios bits tests. Cc: Qemu Devel Cc: Daniel P. 
Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Cc: Igor M Cc: Michael Tsirkin Ani Sinha (10): acpi/tests/avocado/bits: initial commit of test scripts that are run by biosbits acpi/tests/avocado/bits: add SPDX license identifiers for bios bits tests acpi/tests/avocado/bits: disable acpi PSS tests that are failing in biosbits acpi/tests/avocado/bits: add smilatency test suite from bits in order to disable it acpi/tests/avocado/bits: add SPDX license identifiers for bios bits smilatency tests acpi/tests/avocado/bits: disable smilatency test since it does not pass everytime acpi/tests/avocado/bits: add biosbits config file for running bios tests acpi/tests/avocado/bits: add acpi and smbios avocado tests that uses biosbits MAINTAINERS: add myself as the maintainer for acpi biosbits avocado tests acpi/tests/avocado/bits: add a README file MAINTAINERS |6 + tests/avocado/acpi-bits.py| 334 +++ tests/avocado/acpi-bits/README| 90 + .../acpi-bits/bits-config/bits-cfg.txt| 18 + tests/avocado/acpi-bits/bits-tests/smbios.py | 2434 + .../acpi-bits/bits-tests/smilatency.py| 107 + .../avocado/acpi-bits/bits-tests/testacpi.py | 287 ++ .../avocado/acpi-bits/bits-tests/testcpuid.py | 87 + 8 files changed, 3363 insertions(+) create mode 100644 tests/avocado/acpi-bits.py create mode 100644 tests/avocado/acpi-bits/README create mode 100644 tests/avocado/acpi-bits/bits-config/bits-cfg.txt create mode 100644 tests/avocado/acpi-bits/bit
[PATCH v3 07/10] acpi/tests/avocado/bits: add biosbits config file for running bios tests
This change adds an initial biosbits config file that instructs biosbits to run bios test suites in batch mode. Additionally, acpi and smbios structures are also dumped. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- .../avocado/acpi-bits/bits-config/bits-cfg.txt | 18 ++ 1 file changed, 18 insertions(+) create mode 100644 tests/avocado/acpi-bits/bits-config/bits-cfg.txt diff --git a/tests/avocado/acpi-bits/bits-config/bits-cfg.txt b/tests/avocado/acpi-bits/bits-config/bits-cfg.txt new file mode 100644 index 00..8010804453 --- /dev/null +++ b/tests/avocado/acpi-bits/bits-config/bits-cfg.txt @@ -0,0 +1,18 @@ +# BITS configuration file +[bits] + +# To run BITS in batch mode, set batch to a list of one or more of the +# following keywords; BITS will then run all of the requested operations, then +# save the log file to disk. +# +# test: Run the full BITS testsuite. +# acpi: Dump all ACPI structures. +# smbios: Dump all SMBIOS structures. +# +# Leave batch set to an empty string to disable batch mode. +# batch = + +# Uncomment the following to run all available batch operations +# please take a look at boot/python/init.py in bits zip file +# to see how these options are parsed and used. +batch = test acpi smbios -- 2.34.1
[PATCH v3 02/10] acpi/tests/avocado/bits: add SPDX license identifiers for bios bits tests
Added the SPDX license identifiers for biosbits tests. Also added a comment on each of the test scripts to indicate that they run from within the biosbits environment and hence are not subject to the regular maintenance activities for QEMU and are excluded from the dependency management challenges in the host testing environment. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/bits-tests/smbios.py| 4 tests/avocado/acpi-bits/bits-tests/testacpi.py | 4 tests/avocado/acpi-bits/bits-tests/testcpuid.py | 4 3 files changed, 12 insertions(+) diff --git a/tests/avocado/acpi-bits/bits-tests/smbios.py b/tests/avocado/acpi-bits/bits-tests/smbios.py index 9667d0542c..fc623de072 100644 --- a/tests/avocado/acpi-bits/bits-tests/smbios.py +++ b/tests/avocado/acpi-bits/bits-tests/smbios.py @@ -1,6 +1,8 @@ # Copyright (c) 2015, Intel Corporation # All rights reserved. # +# SPDX-License-Identifier: BSD-3-Clause +# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # @@ -24,6 +26,8 @@ # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# This script runs only from the biosbits VM. + """SMBIOS/DMI module.""" import bits diff --git a/tests/avocado/acpi-bits/bits-tests/testacpi.py b/tests/avocado/acpi-bits/bits-tests/testacpi.py index 9ec452f330..18dc818d62 100644 --- a/tests/avocado/acpi-bits/bits-tests/testacpi.py +++ b/tests/avocado/acpi-bits/bits-tests/testacpi.py @@ -1,6 +1,8 @@ # Copyright (c) 2015, Intel Corporation # All rights reserved. # +# SPDX-License-Identifier: BSD-3-Clause +# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # @@ -24,6 +26,8 @@ # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# This script runs only from the biosbits VM. + """Tests for ACPI""" import acpi diff --git a/tests/avocado/acpi-bits/bits-tests/testcpuid.py b/tests/avocado/acpi-bits/bits-tests/testcpuid.py index ac55d912e1..7adefbe355 100644 --- a/tests/avocado/acpi-bits/bits-tests/testcpuid.py +++ b/tests/avocado/acpi-bits/bits-tests/testcpuid.py @@ -1,6 +1,8 @@ # Copyright (c) 2012, Intel Corporation # All rights reserved. # +# SPDX-License-Identifier: BSD-3-Clause +# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # @@ -24,6 +26,8 @@ # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# This script runs only from the biosbits VM. + """Tests and helpers for CPUID.""" import bits -- 2.34.1
[PATCH v3 03/10] acpi/tests/avocado/bits: disable acpi PSS tests that are failing in biosbits
The PSS tests in the acpi test suite seem to be failing in biosbits because the tests are unable to find PSS support in the QEMU bios. Let us disable them for now so that make check does not fail; we can fix the tests and re-enable them later. Example failure: ACPI _PSS (Pstate) table conformance tests [assert] _PSS must exist FAIL \_SB_.CPUS.C000 No _PSS exists Summary: 1 passed, 1 failed ACPI _PSS (Pstate) runtime tests [assert] _PSS must exist FAIL \_SB_.CPUS.C000 No _PSS exists Summary: 0 passed, 1 failed Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/bits-tests/testacpi.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/avocado/acpi-bits/bits-tests/testacpi.py b/tests/avocado/acpi-bits/bits-tests/testacpi.py index 18dc818d62..f818a9cce6 100644 --- a/tests/avocado/acpi-bits/bits-tests/testacpi.py +++ b/tests/avocado/acpi-bits/bits-tests/testacpi.py @@ -40,8 +40,8 @@ def register_tests(): testsuite.add_test("ACPI _MAT (Multiple APIC Table Entry) under Processor objects", test_mat, submenu="ACPI Tests") -testsuite.add_test("ACPI _PSS (Pstate) table conformance tests", test_pss, submenu="ACPI Tests") -testsuite.add_test("ACPI _PSS (Pstate) runtime tests", test_pstates, submenu="ACPI Tests") +#testsuite.add_test("ACPI _PSS (Pstate) table conformance tests", test_pss, submenu="ACPI Tests") +#testsuite.add_test("ACPI _PSS (Pstate) runtime tests", test_pstates, submenu="ACPI Tests") testsuite.add_test("ACPI DSDT (Differentiated System Description Table)", test_dsdt, submenu="ACPI Tests") testsuite.add_test("ACPI FACP (Fixed ACPI Description Table)", test_facp, submenu="ACPI Tests") testsuite.add_test("ACPI HPET (High Precision Event Timer Table)", test_hpet, submenu="ACPI Tests") -- 2.34.1
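As a quick cross-check before re-enabling, one could list the objects from inside the bits environment. A hypothetical python 2 sketch, assuming the bits acpi module exposes get_objpaths() the way the other testacpi.py tests use it:

    # hypothetical: print every _PSS object the firmware exposes, if any
    import acpi
    paths = acpi.get_objpaths("_PSS")
    print "found {} _PSS object(s): {}".format(len(paths), " ".join(paths))

An empty result would confirm that QEMU's firmware simply publishes no P-state packages for these CPUs, matching the failure output above.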
[PATCH v3 05/10] acpi/tests/avocado/bits: add SPDX license identifiers for bios bits smilatency tests
Added the SPDX license identifier for smilatency tests. Also added a comment indicating that the smilatency test is run from within the biosbits environment/VM and hence is not subject to QEMU build/test environment dependency fulfilments or QEMU maintenance activities. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/bits-tests/smilatency.py | 4 1 file changed, 4 insertions(+) diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py b/tests/avocado/acpi-bits/bits-tests/smilatency.py index fb1b7228e3..d616970b31 100644 --- a/tests/avocado/acpi-bits/bits-tests/smilatency.py +++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py @@ -1,6 +1,8 @@ # Copyright (c) 2015, Intel Corporation # All rights reserved. # +# SPDX-License-Identifier: BSD-3-Clause +# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: # @@ -24,6 +26,8 @@ # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# This script runs only from the biosbits VM. + """SMI latency test.""" import bits -- 2.34.1
[PATCH v3 04/10] acpi/tests/avocado/bits: add smilatency test suite from bits in order to disable it
smilatency tests does not reliably pass every time it is run from QEMU. This change adds the test file unchanged from bits so that the next change can disable the test. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- .../acpi-bits/bits-tests/smilatency.py| 102 ++ 1 file changed, 102 insertions(+) create mode 100644 tests/avocado/acpi-bits/bits-tests/smilatency.py diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py b/tests/avocado/acpi-bits/bits-tests/smilatency.py new file mode 100644 index 00..fb1b7228e3 --- /dev/null +++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py @@ -0,0 +1,102 @@ +# Copyright (c) 2015, Intel Corporation +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# * Neither the name of Intel Corporation nor the names of its contributors +# may be used to endorse or promote products derived from this software +# without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR +# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +"""SMI latency test.""" + +import bits +from collections import namedtuple +import testsuite +import time +import usb + +def register_tests(): +testsuite.add_test("SMI latency test", smi_latency); +testsuite.add_test("SMI latency test with USB disabled via BIOS handoff", test_with_usb_disabled, runall=False); + +def smi_latency(): +MSR_SMI_COUNT = 0x34 + +print "Warning: touching the keyboard can affect the results of this test." + +tsc_per_sec = bits.tsc_per_sec() +tsc_per_usec = tsc_per_sec / (1000 * 1000) +bins = [long(tsc_per_usec * 10**i) for i in range(9)] +bin_descs = [ +"0 < t <= 1us", +"1us < t <= 10us", +"10us < t <= 100us", +"100us < t <= 1ms", +"1ms < t <= 10ms", +"10ms < t <= 100ms", +"100ms < t <= 1s ", +"1s< t <= 10s ", +"10s < t <= 100s ", +"100s < t ", +] + +print "Starting test. Wait here, I will be back in 15 seconds." 
+(max_latency, smi_count_delta, bins) = bits.smi_latency(long(15 * tsc_per_sec), bins) +BinType = namedtuple('BinType', ("max", "total", "count", "times")) +bins = [BinType(*b) for b in bins] + +testsuite.test("SMI latency < 150us to minimize risk of OS timeouts", max_latency / tsc_per_usec <= 150) +if not testsuite.show_detail(): +return + +for bin, desc in zip(bins, bin_descs): +if bin.count == 0: +continue +testsuite.print_detail("{}; average = {}; count = {}".format(desc, bits.format_tsc(bin.total/bin.count), bin.count)) +deltas = (bits.format_tsc(t2 - t1) for t1,t2 in zip(bin.times, bin.times[1:])) +testsuite.print_detail(" Times between first few observations: {}".format(" ".join("{:>6}".format(delta) for delta in deltas))) + +if smi_count_delta is not None: +testsuite.print_detail("{} SMI detected using MSR_SMI_COUNT (MSR {:#x})".format(smi_count_delta, MSR_SMI_COUNT)) + +testsuite.print_detail("Summary of impact: observed maximum latency = {}".format(bits.format_tsc(max_latency))) + +def test_with_usb_disabled(): +if usb.handoff_to_os(): +smi_latency() + +def average_io_smi(port, value, count): +def f(): +tsc_start = bits.rdtsc() +bits.outb(port, value) +return bits.rdtsc() - tsc_start +counts = [f() for i in range(count)] +return sum(counts)/len(counts) + +def time_io_smi(port=0xb2, value=0, count=1000): +count_for_estimate = 10 +start = time.time() +average_io_smi(port, value, count_for_estimate) +avg10 = time.time() - start +estimate = avg10 * count/count_for_estimate
[PATCH v3 06/10] acpi/tests/avocado/bits: disable smilatency test since it does not pass every time
smilatency test is latency sensitive and does not pass deterministically when run in QEMU environment under biosbits. Disable the test suite for now. Example failure: SMI latency test Warning: touching the keyboard can affect the results of this test. Starting test. Wait here, I will be back in 15 seconds. [assert] SMI latency < 150us to minimize risk of OS timeouts FAIL 1us < t <= 10us; average = 1372ns; count = 10912449 Times between first few observations: 176us 1646ns 1441ns 1450ns 1462ns 10us < t <= 100us; average = 16us; count = 1187 Times between first few observations: 15ms 3148us 5856us 49ms 33ms 100us < t <= 1ms; average = 259us; count = 8 Times between first few observations: 111ms 2227ms 1779ms 999ms 219ms 0 SMI detected using MSR_SMI_COUNT (MSR 0x34) Summary of impact: observed maximum latency = 298us Summary: 0 passed, 1 failed Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/bits-tests/smilatency.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py b/tests/avocado/acpi-bits/bits-tests/smilatency.py index d616970b31..e907c55cc2 100644 --- a/tests/avocado/acpi-bits/bits-tests/smilatency.py +++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py @@ -37,8 +37,9 @@ import usb def register_tests(): -testsuite.add_test("SMI latency test", smi_latency); -testsuite.add_test("SMI latency test with USB disabled via BIOS handoff", test_with_usb_disabled, runall=False); +pass +# testsuite.add_test("SMI latency test", smi_latency); +# testsuite.add_test("SMI latency test with USB disabled via BIOS handoff", test_with_usb_disabled, runall=False); def smi_latency(): MSR_SMI_COUNT = 0x34 -- 2.34.1
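As an aside, the MSR_SMI_COUNT reading the test relies on can also be sampled on a Linux host for comparison. A hedged host-side sketch using the standard msr driver (requires the msr module loaded and root privileges):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* MSR_SMI_COUNT is 0x34, as in the test output above */
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        uint64_t smi_count;

        if (fd < 0 || pread(fd, &smi_count, sizeof(smi_count), 0x34) != sizeof(smi_count)) {
            perror("rdmsr MSR_SMI_COUNT");
            return 1;
        }
        printf("SMI count: %llu\n", (unsigned long long)smi_count);
        close(fd);
        return 0;
    }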
[PATCH v3 01/10] acpi/tests/avocado/bits: initial commit of test scripts that are run by biosbits
This is initial commit of cpuid, acpi and smbios python test scripts for biosbits to execute. No change has been made to them from the original code written by the biosbits author Josh Triplett. They are required to be installed into the bits iso file and then run from within the virtual machine booted off with biosbits iso. The original location of these tests are here: https://github.com/biosbits/bits/blob/master/python/testacpi.py https://github.com/biosbits/bits/blob/master/python/smbios.py https://github.com/biosbits/bits/blob/master/python/testcpuid.py Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/bits-tests/smbios.py | 2430 + .../avocado/acpi-bits/bits-tests/testacpi.py | 283 ++ .../avocado/acpi-bits/bits-tests/testcpuid.py | 83 + 3 files changed, 2796 insertions(+) create mode 100644 tests/avocado/acpi-bits/bits-tests/smbios.py create mode 100644 tests/avocado/acpi-bits/bits-tests/testacpi.py create mode 100644 tests/avocado/acpi-bits/bits-tests/testcpuid.py diff --git a/tests/avocado/acpi-bits/bits-tests/smbios.py b/tests/avocado/acpi-bits/bits-tests/smbios.py new file mode 100644 index 00..9667d0542c --- /dev/null +++ b/tests/avocado/acpi-bits/bits-tests/smbios.py @@ -0,0 +1,2430 @@ +# Copyright (c) 2015, Intel Corporation +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions are met: +# +# * Redistributions of source code must retain the above copyright notice, +# this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright notice, +# this list of conditions and the following disclaimer in the documentation +# and/or other materials provided with the distribution. +# * Neither the name of Intel Corporation nor the names of its contributors +# may be used to endorse or promote products derived from this software +# without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR +# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; +# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON +# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +"""SMBIOS/DMI module.""" + +import bits +import bitfields +import ctypes +import redirect +import struct +import uuid +import unpack +import ttypager +import sys + +class SMBIOS(unpack.Struct): +def __new__(cls): +if sys.platform == "BITS-EFI": +import efi +sm_ptr = efi.system_table.ConfigurationTableDict.get(efi.SMBIOS_TABLE_GUID) +else: +address = 0xF +mem = bits.memory(0xF, 0x1) +for offset in range(0, len(mem), 16): +signature = (ctypes.c_char * 4).from_address(address + offset).value +if signature == "_SM_": +entry_point_length = ctypes.c_ubyte.from_address(address + offset + 5).value +csum = sum(map(ord, mem[offset:offset + entry_point_length])) & 0xff +if csum == 0: +sm_ptr = address + offset +break +else: +return None + +if not sm_ptr: +return None + +sm = super(SMBIOS, cls).__new__(cls) +sm._header_memory = bits.memory(sm_ptr, 0x1f) +return sm + +def __init__(self): +super(SMBIOS, self).__init__() +u = unpack.Unpackable(self._header_memory) +self.add_field('header', Header(u)) +self._structure_memory = bits.memory(self.header.structure_table_address, self.header.structure_table_length) +u = unpack.Unpackable(self._structure_memory) +self.add_field('structures', unpack.unpack_all(u, _smbios_structures, self), unpack.format_each("\n\n{!r}")) + +def structure_type(self, num): +'''Dumps structure of given Type if present''' +try: +types_present = [self.structures[x].smbios_structure_type for x in range(len(self.structures))] +matrix = dict() +for index in range(len(types_present)): +if types_present.count(types_present[index]) == 1: +matrix[types_present[i
[PATCH v3 09/10] MAINTAINERS: add myself as the maintainer for acpi biosbits avocado tests
I wrote the biosbits avocado tests for testing QEMU's ACPI/SMBIOS implementation with biosbits, along with all the related changes, so I am making myself the maintainer for the biosbits-related files and test scripts. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- MAINTAINERS | 6 ++ 1 file changed, 6 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index e1530b51a2..46d829efd1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1859,6 +1859,12 @@ S: Supported F: hw/acpi/viot.c F: hw/acpi/viot.h +ACPI/AVOCADO/BIOSBITS +M: Ani Sinha +S: Supported +F: tests/avocado/acpi-bits/* +F: tests/avocado/acpi-bits.py + ACPI/HEST/GHES R: Dongjiu Geng L: qemu-...@nongnu.org -- 2.34.1
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: > > Please see the README file added in patch 10 for more details. > Sample runs are as follows: > > $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (smbios.py, line 92) > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (smilatency.py, line 47) > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (testacpi.py, line 158) None of the above files are avocado tests or avocado related python scripts. They are run from within bits in a python 2.7 environment. I could not find a mechanism to exclude a directory from avocado tests. I also do not think making those scripts python 3 compliant is a good use of my time since upgrading bits to use python 3 would be a major task unrelated to QEMU testing. > 1..1 > ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits > > $ ./tests/venv/bin/avocado run -t acpi tests/avocado > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (smbios.py, line 92) > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (smilatency.py, line 47) > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > (testacpi.py, line 158) > Fetching asset from > tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits > JOB ID : c06a9feb423bcda5de89bb51353c6c1718719f14 > JOB LOG: > /home/anisinha/avocado/job-results/job-2022-10-10T13.12-c06a9fe/job.log > (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS > (34.11 s) > RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | > CANCEL 0 > JOB TIME : 37.40 s > > ./tests/venv/bin/avocado run tests/avocado/acpi-bits.py > Fetching asset from > tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits > JOB ID : a2476dd3fafe289c5e6475f447bc8369f77d57ba > JOB LOG: > /home/anisinha/avocado/job-results/job-2022-10-10T13.14-a2476dd/job.log > (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS > (34.07 s) > RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | > CANCEL 0 > JOB TIME : 40.20 s > > > Changelog: > v3: > - converted the existing test to avocado tests as per the popular > recommendation. Added appropriate tags. > - artifact download URL modified based on gitlab-CI URL. > > For biosbits repo: > - moved to a gitlab repo with me being the maintainer. > - added .gitlab-ci.yml file to generate the artifacts. > v2: > - a new class of python based tests introduced that is separate from avocado >tests or qtests. Can be run by using "make check-pytest". > - acpi biosbits tests are the first tests to use pytest environment. > - bios bits tests now download the bits binary archives from a remote >repository if they are not found locally. The test skips if download >fails. > - A new environment variable is introduced that can be passed by the tester >to specify the location of the bits archives locally. test skips if the >bits binaries are not found in that location. > - if pip install of python module fails for whatever reaoson, the test skips. > - misc code fixes including spell check of the README doc. README has been >updated as well. > - addition of SPDX license headers to bits test files. > - update MAINTAINERS to reflect the new pytest test class. > > For biosbits repo: > - added Dockerfile and build script. Made bios bits build on gcc 11. 
>https://github.com/ani-sinha/bits/blob/bits-qemu-logging/Dockerfile >https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh >The build script generates the zip archive and tarball used by the test. > > v1: initial patchset. uses qtest to implement the bios bits tests. > > Cc: Qemu Devel > Cc: Daniel P. Berrangé > Cc: Paolo Bonzini > Cc: Maydell Peter > Cc: John Snow > Cc: Thomas Huth > Cc: Igor M > Cc: Michael Tsirkin > > Ani Sinha (10): > acpi/tests/avocado/bits: initial commit of test scripts that are run > by biosbits > acpi/tests/avocado/bits: add SPDX license identifiers for bios bits > tests > acpi/tests/avocado/bits: disable acpi PSS tests that are failing in > biosbits > acpi/tests/avocado/bits: add smilatency test suite from bits in order > to disable it > acpi/tests/avocado/bits: add SPDX license identifiers for bios bits > smilatency tests > acpi/tests/avocado/bits: disable smilatency test since it does not > pass everytime > acpi/tests/avocado/bits: add biosbits config file for running bios > tests > acpi/tests/avocado/bits: add acpi and smbios avocado tests that uses > biosbits > MAINTAINERS: add myself as the maintainer for acpi biosbits avocado > tests > acpi/tests/avocado/bits: add a README file > > MAINTAINERS |6 + > tests/avocado/acpi-b
[PATCH v3 10/10] acpi/tests/avocado/bits: add a README file
Add a README file that describes the purpose of the various test files and gives guidance to developers on where and how to make changes. Cc: Daniel P. Berrangé Cc: Paolo Bonzini Cc: Maydell Peter Cc: John Snow Cc: Thomas Huth Signed-off-by: Ani Sinha --- tests/avocado/acpi-bits/README | 90 ++ 1 file changed, 90 insertions(+) create mode 100644 tests/avocado/acpi-bits/README diff --git a/tests/avocado/acpi-bits/README b/tests/avocado/acpi-bits/README new file mode 100644 index 00..91e8fd235f --- /dev/null +++ b/tests/avocado/acpi-bits/README @@ -0,0 +1,90 @@ += +ACPI/SMBIOS PYTESTS USING BIOSBITS += + +Biosbits is software written by Josh Triplett that can be downloaded by +visiting https://biosbits.org/. The github codebase can be found here: +https://github.com/biosbits/bits/tree/master. It is software that executes +the bios components such as acpi and smbios tables directly through the acpica +bios interpreter (a freely available C based library written by Intel, +downloadable from https://acpica.org/ and included with biosbits) without an +operating system getting involved in between. +There are several advantages to directly testing the bios in a real physical +machine or VM as opposed to indirectly discovering bios issues through the +operating system. For one thing, the OSes tend to hide bios problems from the +end user. The other is that we have more control over what we want to test +and how, by directly using the acpica interpreter on top of the bios on a running +system. More details on the inspiration for developing biosbits and its real +life uses can be found in (a) and (b). +This directory contains tests written in python that exercise the QEMU +bios components using biosbits and report test failures. +Under the directory tests/avocado/, acpi-bits.py is a QEMU avocado test that +drives all this. + +A brief description of the various test files follows. + +Under tests/avocado/ as the root we have: + +├── acpi-bits │ ├── bits-config │ │ └── bits-cfg.txt │ ├── bits-tests │ │ ├── smbios.py │ │ ├── smilatency.py │ │ ├── testacpi.py │ │ └── testcpuid.py │ └── README ├── acpi-bits.py + +tests/avocado: + - acpi-bits.py: This is the main python avocado test script that generates a + biosbits iso. It then spawns a QEMU VM with it, collects the logs and reports + test failures. This is the script one would be interested in if they wanted + to add or change some component of the log parsing, add a new command line to + how QEMU is spawned etc. Test writers typically would not need to modify + this script unless they wanted to enhance or change the log parsing for + their tests. The following environment variables are used in this test: + - V=1 : This enables verbose mode for the test. It dumps the entire log + from bios bits and also more details in case a failure happens. It is + useful for debugging the test failures or tests themselves. + +tests/avocado/acpi-bits: + - README: This text file. + +tests/avocado/acpi-bits/bits-config: + This location contains biosbits config files that determine how the software + runs the tests. + - bits-cfg.txt: this is the biosbits config file that determines what tests + or actions are performed by bits. The description of the config options is + provided in the file itself. + +tests/avocado/acpi-bits/bits-tests: + This directory contains biosbits python based tests that are run from within + the biosbits environment in the spawned VM. New additions of test cases can + be made in the appropriate test file. 
For example, new acpi tests can go + into testacpi.py and one would call testsuite.add_test() to register the new + test so that it gets executed as a part of the ACPI tests. + It might be occasionally necessary to disable some subtests or add a new + test that belongs to a test suite not already present in this directory. To + do this, please extract the bits source from the zip file mentioned above. + Copy the test suite/script that needs modification (addition of new tests + or disabling them) from boot/python location of the extracted directory + into this directory. + + step (a): copy unmodified test script to this directory. + step (b): update meson.build and add this file to the list. + Commit (a) and (b) together in the same commit. + + step (c): perform modifications to the test. + Commit (c) in a separate commit. + + The test framework will then use your modified test script to run the test. + No further changes would be needed. Please check the logs to make sure that + appropriate changes have taken effect. + + +Author: Ani Sinha + +References: +(a) https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf +(b) https://www.youtube.com/watch?
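Tying the README to the cover letter, a typical verbose run of the test then looks like this (the command comes from the cover letter's sample runs, combined with the V=1 knob documented above):

    # dump the full bits log so failures in testacpi.py/smbios.py can be traced
    $ V=1 ./tests/venv/bin/avocado run -t acpi tests/avocado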
Re: [PATCH] error handling: Use TFR() macro where applicable
Hi! Thanks for your notes. I'll try to send updated patches by the end of the day. On Fri, Oct 7, 2022 at 6:32 PM Peter Maydell wrote: > I think this patch is doing things in the wrong order. Instead of > converting code to use the old macro that we don't like and then > updating it again in patch 2 to use the new macro, we should > first introduce the new macro, and then after that we can update > code that's currently not using a macro at all to use the new one. > This makes code review easier because we don't have to look at a > change to this code which is then going to be rewritten anyway. Sounds smooth. I'll refactor patches accordingly. > > if (ret < 0) { > > ret = -errno; > > > > @@ -1472,8 +1472,8 @@ static ssize_t > handle_aiocb_rw_vector(RawPosixAIOData *aiocb) > > { > > ssize_t len; > > > > -TFR( > > -len = (aiocb->aio_type & QEMU_AIO_WRITE) ? > > +len = TEMP_FAILURE_RETRY( > > +(aiocb->aio_type & QEMU_AIO_WRITE) ? > > qemu_pwritev(aiocb->aio_fildes, > > aiocb->io.iov, > > aiocb->io.niov, > > I'm not sure why you've put the TEMP_FAILURE_RETRY on the outside here > rather than just on the individual function calls. > The original code contained both branches in one while loop, so I was afraid that value `aiocb->aio_type & QEMU_AIO_WRITE` would change somehow during the loop. If you'll say that this is impossible, I'll adjust the code as you propose. > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h > > index b1c161c035..6e244f15fa 100644 > > --- a/include/qemu/osdep.h > > +++ b/include/qemu/osdep.h > > @@ -243,7 +243,13 @@ void QEMU_ERROR("code path is reachable") > > #define ESHUTDOWN 4099 > > #endif > > > > -#define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR) > > +#define TEMP_FAILURE_RETRY(expr) \ > > We can't call the macro this, because the glibc system headers already > may define a macro of that name, so the compiler will complain if they're > both defined at the same time, and depending on header ordering it might > not be clear which version you're getting. > Sorry, my fault. I will rename it to "RETRY_ON_EINTR" as it was proposed earlier in this thread. -- Best Regards, *Nikita Ivanov* | C developer
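For readers following the thread, a hedged sketch of the shape such a value-returning macro can take as a GCC/clang statement expression (the final name and definition belong to the forthcoming patch):

    #include <errno.h>

    /* illustrative only: retry the expression while it fails with EINTR,
     * then hand the final result back to the caller as a value */
    #define RETRY_ON_EINTR(expr)                                \
        (__extension__                                          \
         ({ typeof(expr) __result;                              \
            do {                                                \
                __result = (expr);                              \
            } while (__result == -1 && errno == EINTR);         \
            __result;                                           \
         }))

    /* usage: len = RETRY_ON_EINTR(read(fd, buf, count)); */

Unlike the old TFR(), this form can sit on the right-hand side of an assignment, which is what the handle_aiocb_rw_vector() hunk discussed above needs.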
Re: [PATCH v8 8/8] KVM: Enable and expose KVM_MEM_PRIVATE
On Tue, Oct 04, 2022 at 05:55:28PM +0300, Jarkko Sakkinen wrote: > On Thu, Sep 15, 2022 at 10:29:13PM +0800, Chao Peng wrote: > > Expose KVM_MEM_PRIVATE and memslot fields private_fd/offset to > > userspace. KVM will register/unregister private memslot to fd-based > > memory backing store and response to invalidation event from > > inaccessible_notifier to zap the existing memory mappings in the > > secondary page table. > > > > Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined > > by architecture code which can turn on it by overriding the default > > kvm_arch_has_private_mem(). > > > > A 'kvm' reference is added in memslot structure since in > > inaccessible_notifier callback we can only obtain a memslot reference > > but 'kvm' is needed to do the zapping. > > > > Co-developed-by: Yu Zhang > > Signed-off-by: Yu Zhang > > Signed-off-by: Chao Peng > > ld: arch/x86/../../virt/kvm/kvm_main.o: in function `kvm_free_memslot': > kvm_main.c:(.text+0x1385): undefined reference to > `inaccessible_unregister_notifier' > ld: arch/x86/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot': > kvm_main.c:(.text+0x1b86): undefined reference to > `inaccessible_register_notifier' > ld: kvm_main.c:(.text+0x1c85): undefined reference to > `inaccessible_unregister_notifier' > ld: arch/x86/kvm/mmu/mmu.o: in function `kvm_faultin_pfn': > mmu.c:(.text+0x1e38): undefined reference to `inaccessible_get_pfn' > ld: arch/x86/kvm/mmu/mmu.o: in function `direct_page_fault': > mmu.c:(.text+0x67ca): undefined reference to `inaccessible_put_pfn' > make: *** [Makefile:1169: vmlinux] Error 1 > > I attached kernel config for reproduction. > > The problem is that CONFIG_MEMFD_CREATE does not get enabled: > > mm/Makefile:obj-$(CONFIG_MEMFD_CREATE) += memfd.o memfd_inaccessible.o Thanks for reporting. Yes, there is a dependency issue that needs to be fixed. Chao
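One plausible shape for that fix, as a hedged Kconfig sketch (symbol names taken from the error output above; the actual fix may differ):

    # hypothetical: have the KVM private-mem option pull in the memfd code
    # that provides the inaccessible_* symbols kvm_main.o references
    config HAVE_KVM_PRIVATE_MEM
            bool
            select MEMFD_CREATE

so that enabling KVM private memory can no longer leave mm/memfd_inaccessible.o out of the build.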
Re: [PATCH v8 8/8] KVM: Enable and expose KVM_MEM_PRIVATE
On Thu, Oct 06, 2022 at 09:55:31AM +0100, Fuad Tabba wrote: > Hi, > > On Thu, Sep 15, 2022 at 3:37 PM Chao Peng wrote: > > > > Expose KVM_MEM_PRIVATE and memslot fields private_fd/offset to > > userspace. KVM will register/unregister private memslot to fd-based > > memory backing store and response to invalidation event from > > inaccessible_notifier to zap the existing memory mappings in the > > secondary page table. > > > > Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined > > by architecture code which can turn on it by overriding the default > > kvm_arch_has_private_mem(). > > > > A 'kvm' reference is added in memslot structure since in > > inaccessible_notifier callback we can only obtain a memslot reference > > but 'kvm' is needed to do the zapping. > > > > Co-developed-by: Yu Zhang > > Signed-off-by: Yu Zhang > > Signed-off-by: Chao Peng > > --- > > include/linux/kvm_host.h | 1 + > > virt/kvm/kvm_main.c | 116 +-- > > 2 files changed, 111 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index b9906cdf468b..cb4eefac709c 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -589,6 +589,7 @@ struct kvm_memory_slot { > > struct file *private_file; > > loff_t private_offset; > > struct inaccessible_notifier notifier; > > + struct kvm *kvm; > > }; > > > > static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot > > *slot) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 97d893f7482c..87e239d35b96 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -983,6 +983,57 @@ static int kvm_vm_ioctl_set_mem_attr(struct kvm *kvm, > > gpa_t gpa, gpa_t size, > > xa_erase(&kvm->mem_attr_array, index); > > return r; > > } > > + > > +static void kvm_private_notifier_invalidate(struct inaccessible_notifier > > *notifier, > > + pgoff_t start, pgoff_t end) > > +{ > > + struct kvm_memory_slot *slot = container_of(notifier, > > + struct kvm_memory_slot, > > + notifier); > > + unsigned long base_pgoff = slot->private_offset >> PAGE_SHIFT; > > + gfn_t start_gfn = slot->base_gfn; > > + gfn_t end_gfn = slot->base_gfn + slot->npages; > > + > > + > > + if (start > base_pgoff) > > + start_gfn = slot->base_gfn + start - base_pgoff; > > + > > + if (end < base_pgoff + slot->npages) > > + end_gfn = slot->base_gfn + end - base_pgoff; > > + > > + if (start_gfn >= end_gfn) > > + return; > > + > > + kvm_zap_gfn_range(slot->kvm, start_gfn, end_gfn); > > +} > > + > > +static struct inaccessible_notifier_ops kvm_private_notifier_ops = { > > + .invalidate = kvm_private_notifier_invalidate, > > +}; > > + > > +static inline void kvm_private_mem_register(struct kvm_memory_slot *slot) > > +{ > > + slot->notifier.ops = &kvm_private_notifier_ops; > > + inaccessible_register_notifier(slot->private_file, &slot->notifier); > > +} > > + > > +static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot) > > +{ > > + inaccessible_unregister_notifier(slot->private_file, > > &slot->notifier); > > +} > > + > > +#else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */ > > + > > +static inline void kvm_private_mem_register(struct kvm_memory_slot *slot) > > +{ > > + WARN_ON_ONCE(1); > > +} > > + > > +static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot) > > +{ > > + WARN_ON_ONCE(1); > > +} > > + > > #endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */ > > > > #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER > > @@ -1029,6 +1080,11 @@ static void kvm_destroy_dirty_bitmap(struct > > 
kvm_memory_slot *memslot) > > /* This does not remove the slot from struct kvm_memslots data structures > > */ > > static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) > > { > > + if (slot->flags & KVM_MEM_PRIVATE) { > > + kvm_private_mem_unregister(slot); > > + fput(slot->private_file); > > + } > > + > > kvm_destroy_dirty_bitmap(slot); > > > > kvm_arch_free_memslot(kvm, slot); > > @@ -1600,10 +1656,16 @@ bool __weak kvm_arch_has_private_mem(struct kvm > > *kvm) > > return false; > > } > > > > -static int check_memory_region_flags(const struct kvm_user_mem_region *mem) > > +static int check_memory_region_flags(struct kvm *kvm, > > +const struct kvm_user_mem_region *mem) > > { > > u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES; > > > > +#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM > > + if (kvm_arch_has_private_mem(kvm)) > > + valid_flags |= KVM_MEM_PRIVATE; > > +#endif > > + > > #ifdef __KVM_HAVE_READONLY_MEM > >
Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory
On Sat, Oct 08, 2022 at 08:35:47PM +0300, Jarkko Sakkinen wrote: > On Sat, Oct 08, 2022 at 07:15:17PM +0300, Jarkko Sakkinen wrote: > > On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote: > > > On Fri, Oct 07, 2022 at 02:58:54PM +, Sean Christopherson wrote: > > > > On Fri, Oct 07, 2022, Jarkko Sakkinen wrote: > > > > > On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote: > > > > > > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote: > > > > > > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote: > > > > > > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote: > > > > > > > > > This new extension, indicated by the new flag > > > > > > > > > KVM_MEM_PRIVATE, adds two > > > > > > > > > additional KVM memslot fields private_fd/private_offset to > > > > > > > > > allow > > > > > > > > > userspace to specify that guest private memory provided from > > > > > > > > > the > > > > > > > > > private_fd and guest_phys_addr mapped at the private_offset > > > > > > > > > of the > > > > > > > > > private_fd, spanning a range of memory_size. > > > > > > > > > > > > > > > > > > The extended memslot can still have the userspace_addr(hva). > > > > > > > > > When use, a > > > > > > > > > single memslot can maintain both private memory through > > > > > > > > > private > > > > > > > > > fd(private_fd/private_offset) and shared memory through > > > > > > > > > hva(userspace_addr). Whether the private or shared part is > > > > > > > > > visible to > > > > > > > > > guest is maintained by other KVM code. > > > > > > > > > > > > > > > > What is anyway the appeal of private_offset field, instead of > > > > > > > > having just > > > > > > > > 1:1 association between regions and files, i.e. one memfd per > > > > > > > > region? > > > > > > > > > > > > Modifying memslots is slow, both in KVM and in QEMU (not sure about > > > > > > Google's VMM). > > > > > > E.g. if a vCPU converts a single page, it will be forced to wait > > > > > > until all other > > > > > > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM > > > > > > is faulting in > > > > > > memory. KVM's memslot updates also hold a mutex for the entire > > > > > > duration of the > > > > > > update, i.e. conversions on different vCPUs would be fully > > > > > > serialized, exacerbating > > > > > > the SRCU problem. > > > > > > > > > > > > KVM also has historical baggage where it "needs" to zap _all_ SPTEs > > > > > > when any > > > > > > memslot is deleted. > > > > > > > > > > > > Taking both a private_fd and a shared userspace address allows > > > > > > userspace to convert > > > > > > between private and shared without having to manipulate memslots. > > > > > > > > > > Right, this was really good explanation, thank you. > > > > > > > > > > Still wondering could this possibly work (or not): > > > > > > > > > > 1. Union userspace_addr and private_fd. > > > > > > > > No, because userspace needs to be able to provide both userspace_addr > > > > (shared > > > > memory) and private_fd (private memory) for a single memslot. > > > > > > Got it, thanks for clearing my misunderstandings on this topic, and it > > > is quite obviously visible in 5/8 and 7/8. I.e. if I got it right, > > > memblock can be partially private, and you dig the shared holes with > > > KVM_MEMORY_ENCRYPT_UNREG_REGION. We have (in Enarx) ATM have memblock > > > per host mmap, I was looking into this dilated by that mindset but makes > > > definitely sense to support that. 
> > > > For me the most useful reference with this feature is kvm_set_phys_mem() > > implementation in the privmem-v8 branch. Took a while to find it because I did > > not have much experience with the QEMU code base. I'd even recommend mentioning > > that function in the cover letter because it is a really good reference on > > how this feature is supposed to be used. That's a good point, I can mention that if people find it useful. > > While learning the QEMU code, I also noticed a bunch of comparisons like this: > > if (slot->flags | KVM_MEM_PRIVATE) > > I guess those could just be replaced with unconditional fills as it does > not do any harm, if KVM_MEM_PRIVATE is not set. Makes sense, thanks. Chao > > BR, Jarkko
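For readers skimming the thread: the comparison Jarkko quotes uses a bitwise OR, which is non-zero for any flags value, so the branch is always taken; the usual membership test would use AND. A minimal illustration:

/* Always true: ORing in a non-zero constant can never yield zero. */
if (slot->flags | KVM_MEM_PRIVATE) { ... }

/* A real flag test would be: */
if (slot->flags & KVM_MEM_PRIVATE) { ... }

which is why replacing the always-true branch with an unconditional fill is harmless.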
Re: [PATCH] block/io_uring: revert "Use io_uring_register_ring_fd() to skip fd operations"
On Sat, Sep 24, 2022 at 10:48:15PM +0800, Sam Li wrote: Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1193 The commit "Use io_uring_register_ring_fd() to skip fd operations" broke when booting a guest with iothread and io_uring. That is because the io_uring_register_ring_fd() call is made from the main thread instead of the IOThread where io_uring_submit() is called. It cannot be guaranteed to register the ring fd in the correct thread, or to unregister the same ring fd if the IOThread is disabled. This optimization is not critical, so we will revert the previous commit. Right, maybe we can call it on the first submit. This reverts commit e2848bc574fe2715c694bf8fe9a1ba7f78a1125a and 77e3f038af1764983087e3551a0fde9951952c4d. Yep, better to revert for now: Reviewed-by: Stefano Garzarella Should we queue this for stable? Thanks, Stefano Signed-off-by: Sam Li --- block/io_uring.c | 13 + meson.build | 1 - 2 files changed, 1 insertion(+), 13 deletions(-) diff --git a/block/io_uring.c b/block/io_uring.c index a1760152e0..973e15d876 100644 --- a/block/io_uring.c +++ b/block/io_uring.c @@ -11,7 +11,6 @@ #include "qemu/osdep.h" #include <liburing.h> #include "block/aio.h" -#include "qemu/error-report.h" #include "qemu/queue.h" #include "block/block.h" #include "block/raw-aio.h" @@ -19,7 +18,6 @@ #include "qapi/error.h" #include "trace.h" - /* io_uring ring size */ #define MAX_ENTRIES 128 @@ -432,17 +430,8 @@ LuringState *luring_init(Error **errp) } ioq_init(&s->io_q); -#ifdef CONFIG_LIBURING_REGISTER_RING_FD -if (io_uring_register_ring_fd(&s->ring) < 0) { -/* - * Only warn about this error: we will fallback to the non-optimized - * io_uring operations. - */ -warn_report("failed to register linux io_uring ring file descriptor"); -} -#endif - return s; + } void luring_cleanup(LuringState *s) diff --git a/meson.build b/meson.build index 3885fc1076..63cfb844cf 100644 --- a/meson.build +++ b/meson.build @@ -1793,7 +1793,6 @@ config_host_data.set('CONFIG_LIBNFS', libnfs.found()) config_host_data.set('CONFIG_LIBSSH', libssh.found()) config_host_data.set('CONFIG_LINUX_AIO', libaio.found()) config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found()) -config_host_data.set('CONFIG_LIBURING_REGISTER_RING_FD', cc.has_function('io_uring_register_ring_fd', prefix: '#include <liburing.h>', dependencies: linux_io_uring)) config_host_data.set('CONFIG_LIBPMEM', libpmem.found()) config_host_data.set('CONFIG_NUMA', numa.found()) config_host_data.set('CONFIG_OPENGL', opengl.found()) -- 2.37.3
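A minimal sketch of the "call it on the first submit" idea Stefano floats, assuming a hypothetical ring_fd_registered field in LuringState (illustrative only, not the actual QEMU code):

static void luring_try_register_ring_fd(LuringState *s)
{
    /* Runs in the IOThread that also calls io_uring_submit(), so the
     * registered ring fd belongs to the right thread. */
    if (!s->ring_fd_registered &&
        io_uring_register_ring_fd(&s->ring) >= 0) {
        s->ring_fd_registered = true;
    }
}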
[PATCH 2/2] tests/qtest/ide-test: Verify READ NATIVE MAX ADDRESS is not limited
Verify that the ATA command READ NATIVE MAX ADDRESS returns the last valid CHS tuple for the native device rather than any limit established by INITIALIZE DEVICE PARAMETERS. Signed-off-by: Lev Kujawski --- tests/qtest/ide-test.c | 47 +- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/tests/qtest/ide-test.c b/tests/qtest/ide-test.c index dbe1563b23..c406e6752a 100644 --- a/tests/qtest/ide-test.c +++ b/tests/qtest/ide-test.c @@ -37,7 +37,8 @@ /* TODO actually test the results and get rid of this */ #define qmp_discard_response(q, ...) qobject_unref(qtest_qmp(q, __VA_ARGS__)) -#define TEST_IMAGE_SIZE 64 * 1024 * 1024 +/* Specified by ATA (physical) CHS geometry for ~64 MiB device. */ +#define TEST_IMAGE_SIZE ((130 * 16 * 63) * 512) #define IDE_PCI_DEV 1 #define IDE_PCI_FUNC1 @@ -91,11 +92,13 @@ enum { enum { CMD_DSM = 0x06, CMD_DIAGNOSE= 0x90, +CMD_INIT_DP = 0x91, /* INITIALIZE DEVICE PARAMETERS */ CMD_READ_DMA= 0xc8, CMD_WRITE_DMA = 0xca, CMD_FLUSH_CACHE = 0xe7, CMD_IDENTIFY= 0xec, CMD_PACKET = 0xa0, +CMD_READ_NATIVE = 0xf8, /* READ NATIVE MAX ADDRESS */ CMDF_ABORT = 0x100, CMDF_NO_BM = 0x200, @@ -562,6 +565,46 @@ static void string_cpu_to_be16(uint16_t *s, size_t bytes) } } +static void test_specify(void) +{ +QTestState *qts; +QPCIDevice *dev; +QPCIBar bmdma_bar, ide_bar; +uint16_t cyls; +uint8_t heads, spt; + +qts = ide_test_start( +"-blockdev driver=file,node-name=hda,filename=%s " +"-device ide-hd,drive=hda,bus=ide.0,unit=0 ", +tmp_path[0]); + +dev = get_pci_device(qts, &bmdma_bar, &ide_bar); + +/* Initialize drive with zero sectors per track and one head. */ +qpci_io_writeb(dev, ide_bar, reg_nsectors, 0); +qpci_io_writeb(dev, ide_bar, reg_device, 0); +qpci_io_writeb(dev, ide_bar, reg_command, CMD_INIT_DP); + +/* READ NATIVE MAX ADDRESS (CHS mode). */ +qpci_io_writeb(dev, ide_bar, reg_device, 0xa0); +qpci_io_writeb(dev, ide_bar, reg_command, CMD_READ_NATIVE); + +heads = qpci_io_readb(dev, ide_bar, reg_device) & 0xf; +++heads; +g_assert_cmpint(heads, ==, 16); + +cyls = qpci_io_readb(dev, ide_bar, reg_lba_high) << 8; +cyls |= qpci_io_readb(dev, ide_bar, reg_lba_middle); +++cyls; +g_assert_cmpint(cyls, ==, 130); + +spt = qpci_io_readb(dev, ide_bar, reg_lba_low); +g_assert_cmpint(spt, ==, 63); + +ide_test_quit(qts); +free_pci_device(dev); +} + static void test_identify(void) { QTestState *qts; @@ -1079,6 +1122,8 @@ int main(int argc, char **argv) /* Run the tests */ g_test_init(&argc, &argv, NULL); +qtest_add_func("/ide/read_native", test_specify); + qtest_add_func("/ide/identify", test_identify); qtest_add_func("/ide/diagnostic", test_diagnostic); -- 2.34.1
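For reference, the geometry behind the new TEST_IMAGE_SIZE definition works out to just under the old 64 MiB:

130 cylinders * 16 heads * 63 sectors/track * 512 bytes = 67,092,480 bytes

(64 MiB is 67,108,864 bytes, so the image is 16 KiB short of the previous size while matching an exact CHS geometry.)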
Re: [PATCH] linux-user: Implement faccessat2
On 10/9/22 08:08, WANG Xuerui wrote: User space has been preferring this syscall for a while, due to its closer match with C semantics, and newer platforms such as LoongArch apparently have libc implementations that don't fallback to faccessat so normal access checks are failing without the emulation in place. Tested by successfully emerging several packages within a Gentoo loong stage3 chroot, emulated on amd64 with help of static qemu-loongarch64. Reported-by: Andreas K. Hüttel Signed-off-by: WANG Xuerui --- linux-user/strace.list | 3 +++ linux-user/syscall.c | 9 + 2 files changed, 12 insertions(+) There were two similar approaches from Richard and me: https://lore.kernel.org/qemu-devel/20220729201414.29869-1-richard.hender...@linaro.org/#t and https://lore.kernel.org/qemu-devel/YzLdcnL6x646T61W@p100/ diff --git a/linux-user/strace.list b/linux-user/strace.list index a87415bf3d..3df2184580 100644 --- a/linux-user/strace.list +++ b/linux-user/strace.list @@ -178,6 +178,9 @@ #ifdef TARGET_NR_faccessat { TARGET_NR_faccessat, "faccessat" , NULL, print_faccessat, NULL }, #endif +#ifdef TARGET_NR_faccessat2 +{ TARGET_NR_faccessat2, "faccessat2" , NULL, print_faccessat, NULL }, +#endif You are missing that part (from my patch): --- a/linux-user/strace.c +++ b/linux-user/strace.c @@ -1931,7 +1931,7 @@ print_execv(CPUArchState *cpu_env, const struct syscallname *name, } #endif -#ifdef TARGET_NR_faccessat +#if defined(TARGET_NR_faccessat) || defined(TARGET_NR_faccessat2) otherwise if TARGET_NR_faccessat isn't defined, you won't have the function print_faccessat() in strace.c defined. #ifdef TARGET_NR_fadvise64 { TARGET_NR_fadvise64, "fadvise64" , NULL, NULL, NULL }, #endif diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 2e954d8dbd..a81f0b65b9 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -9110,6 +9110,15 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1, unlock_user(p, arg2, 0); return ret; #endif +#if defined(TARGET_NR_faccessat2) && defined(__NR_faccessat2) +case TARGET_NR_faccessat2: +if (!(p = lock_user_string(arg2))) { +return -TARGET_EFAULT; +} +ret = get_errno(faccessat(arg1, p, arg3, arg4)); You rely here on the libc faccessat() function to either use faccessat2() or faccessat() syscalls - which is probably the best way around... Helge +unlock_user(p, arg2, 0); +return ret; +#endif #ifdef TARGET_NR_nice /* not on alpha */ case TARGET_NR_nice: return get_errno(nice(arg1));
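For context on Helge's closing remark: faccessat(2) has no flags argument at the syscall level, so a libc faccessat() wrapper either emulates the flags or calls faccessat2(2) underneath. A sketch of invoking the new syscall directly, assuming headers that define __NR_faccessat2 (illustrative only; the patch deliberately goes through libc instead):

#include <unistd.h>
#include <sys/syscall.h>

/* Illustrative: bypass the libc wrapper's fallback logic. */
static int sys_faccessat2(int dirfd, const char *path, int mode, int flags)
{
    return syscall(__NR_faccessat2, dirfd, path, mode, flags);
}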
[PATCH 1/2] hw/ide/core.c (cmd_read_native_max): Avoid limited device parameters
Always use the native CHS device parameters for the ATA commands READ NATIVE MAX ADDRESS and READ NATIVE MAX ADDRESS EXT, not those limited by the ATA command INITIALIZE_DEVICE_PARAMETERS (introduced in patch 176e4961, hw/ide/core.c: Implement ATA INITIALIZE_DEVICE_PARAMETERS command, 2022-07-07.) As stated by the ATA/ATAPI specification, "[t]he native maximum is the highest address accepted by the device in the factory default condition." Therefore this patch substitutes the native values in drive_heads and drive_sectors before calling ide_set_sector(). One consequence of the prior behavior was that setting zero sectors per track could lead to an FPE within ide_set_sector(). Thanks to Alexander Bulekov for reporting this issue. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1243 Signed-off-by: Lev Kujawski --- hw/ide/core.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 39afdc0006..ee836401bc 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1608,11 +1608,24 @@ static bool cmd_read_native_max(IDEState *s, uint8_t cmd) /* Refuse if no sectors are addressable (e.g. medium not inserted) */ if (s->nb_sectors == 0) { ide_abort_command(s); -return true; -} +} else { +/* + * Save the active drive parameters, which may have been + * limited from their native counterparts by, e.g., INITIALIZE + * DEVICE PARAMETERS or SET MAX ADDRESS. + */ +const int aheads = s->heads; +const int asectors = s->sectors; -ide_cmd_lba48_transform(s, lba48); -ide_set_sector(s, s->nb_sectors - 1); +s->heads = s->drive_heads; +s->sectors = s->drive_sectors; + +ide_cmd_lba48_transform(s, lba48); +ide_set_sector(s, s->nb_sectors - 1); + +s->heads = aheads; +s->sectors = asectors; +} return true; } -- 2.34.1
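For reference, the FPE mentioned above comes from the CHS split that ide_set_sector() performs with the active geometry, roughly (simplified sketch, not the exact QEMU code):

cyl  = sector_num / (s->heads * s->sectors);
r    = sector_num % (s->heads * s->sectors);   /* FPE when s->sectors == 0 */
head = r / s->sectors;
sect = (r % s->sectors) + 1;

Substituting the native drive_heads/drive_sectors (which are never zero) around the ide_set_sector() call therefore avoids the trap as well as the spec violation.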
Re: [PATCH v2] tests: Add sndio to the FreeBSD CI containers / VM
On Fri, Oct 07, 2022 at 06:27:29PM -0400, Brad Smith wrote: > On 10/7/2022 4:33 PM, Warner Losh wrote: > > > > > > On Fri, Oct 7, 2022 at 1:21 AM Brad Smith wrote: > > > > tests: Add sndio to the FreeBSD CI containers / VM > > > > --- > > .gitlab-ci.d/cirrus/freebsd-12.vars | 2 +- > > .gitlab-ci.d/cirrus/freebsd-13.vars | 2 +- > > tests/docker/dockerfiles/alpine.docker | 3 +- > > tests/docker/dockerfiles/centos8.docker | 2 +- > > .../dockerfiles/debian-amd64-cross.docker | 235 - > > tests/docker/dockerfiles/debian-amd64.docker | 237 > > +- > > .../dockerfiles/debian-arm64-cross.docker | 233 - > > .../dockerfiles/debian-armel-cross.docker | 231 - > > .../dockerfiles/debian-armhf-cross.docker | 233 - > > .../dockerfiles/debian-mips64el-cross.docker | 227 - > > .../dockerfiles/debian-mipsel-cross.docker | 227 - > > .../dockerfiles/debian-ppc64el-cross.docker | 231 - > > .../dockerfiles/debian-s390x-cross.docker | 229 - > > tests/docker/dockerfiles/fedora.docker | 230 - > > tests/docker/dockerfiles/opensuse-leap.docker | 3 +- > > tests/docker/dockerfiles/ubuntu2004.docker | 235 - > > tests/lcitool/libvirt-ci | 2 +- > > tests/lcitool/projects/qemu.yml | 1 + > > tests/vm/freebsd | 3 + > > 19 files changed, 1291 insertions(+), 1275 deletions(-) > > > > > > This looks good to me. Why did the Linux containers need updating for > > the FreeBSD update? > > > > Otherwise, the changes look good to my eye > > > > Reviewed-by: Warner Losh > > > Because the CI configs are auto-generated. When refreshing them it generates > them all. The intent was > to update the FreeBSD configs, but when adding the dependency to > tests/lcitool/projects/qemu.yml > the FreeBSD configs are updated as well as the rest. Whatever OSes have a > corresponding mapping > in libvirt-ci are updated. The POV of libvirt-ci is that if the dependency exists on any given platform, we add it to the package list, so that we maximise the test coverage across platforms. Surprisingly, sndio was available in several Linux distros. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Re: total fail on FreeBSD 14.0 amd64 regardless of compiler
On Mon, Oct 10, 2022 at 06:56:51AM +0000, Dennis Clarke wrote: > > re: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html > > Using GCC 12 is even worse : > > [2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o > [2041/6841] Linking target qemu-system-aarch64 > FAILED: qemu-system-aarch64 > /usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp > /usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: undefined > reference to symbol 'inotify_init1' > /usr/local/bin/ld: /usr/local/lib/libinotify.so.0: error adding symbols: DSO > missing from command line This is trying to tell us that inotify_init1 is in libinotify.so.0 but we have not put -linotify on the command line for the link step. On Linux these are a standard part of glibc, but on FreeBSD they are separate. I'm thinking that on previous FreeBSD we probably got this linked indirectly, but something in 14.0 has become stricter, wanting it listed explicitly. IOW, a meson.build change is likely needed to add -linotify on FreeBSD. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
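A hedged sketch of what such a meson.build change could look like -- names and placement are illustrative, not the committed fix:

inotify = cc.find_library('inotify', required: false)
# attach the dependency to the source set that builds the file monitor,
# so -linotify is emitted on platforms that ship it as a library
util_ss.add(files('filemonitor-inotify.c'), inotify)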
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On 10/10/2022 10.13, Ani Sinha wrote: On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: Please see the README file added in patch 10 for more details. Sample runs are as follows: $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smbios.py, line 92) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smilatency.py, line 47) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (testacpi.py, line 158) None of the above files are avocado tests or avocado related python scripts. They are run from within bits in a python 2.7 environment. I could not find a mechanism to exclude a directory from avocado tests. I also do not think making those scripts python 3 compliant is a good use of my time since upgrading bits to use python 3 would be a major task unrelated to QEMU testing. Maybe you could at least switch those three lines to use the new print() syntax to silence at least these ugly errors? ... Python 2.7 should cope very well with the new syntax, as far as I know... Otherwise, it might be better to put the non-avocado python files into another directory under tests/ ? Thomas
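For anyone touching those files, the smallest change that parses under both interpreters is the single-argument call form, e.g. (variable name made up):

# Python 2-only statement syntax, rejected by 3.x:
print "SMI latency: %s" % latency

# Parses under 2.7 and 3.x alike (one argument, so no tuple surprise):
print("SMI latency: %s" % latency)

Multi-argument calls like print("a", "b") do change meaning under 2.7 (they print a tuple) unless `from __future__ import print_function` is added at the top of the file.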
[PATCH v2 6/7] hostmem: Allow for specifying a ThreadContext for preallocation
Let's allow for specifying a thread context via the "prealloc-context" property. When set, preallocation threads will be created via the thread context -- inheriting the same CPU affinity as the thread context. Pinning preallocation threads to CPUs can heavily increase performance in NUMA setups, because preallocation from a CPU close to the target NUMA node(s) is faster than preallocation from a more remote CPU, simply because of the memory bandwidth available for initializing memory with zeroes. This is especially relevant for very large VMs backed by huge/gigantic pages, where preallocation is mandatory. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- backends/hostmem.c | 12 +--- include/sysemu/hostmem.h | 2 ++ qapi/qom.json| 4 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 76f0394490..8640294c10 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -232,8 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, void *ptr = memory_region_get_ram_ptr(&backend->mr); uint64_t sz = memory_region_size(&backend->mr); -qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL, - &local_err); +qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, + backend->prealloc_context, &local_err); if (local_err) { error_propagate(errp, local_err); return; @@ -385,7 +385,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) */ if (backend->prealloc) { qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz, - backend->prealloc_threads, NULL, &local_err); + backend->prealloc_threads, + backend->prealloc_context, &local_err); if (local_err) { goto out; } @@ -493,6 +494,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data) NULL, NULL); object_class_property_set_description(oc, "prealloc-threads", "Number of CPU threads to use for prealloc"); +object_class_property_add_link(oc, "prealloc-context", +TYPE_THREAD_CONTEXT, offsetof(HostMemoryBackend, prealloc_context), +object_property_allow_set_link, OBJ_PROP_LINK_STRONG); +object_class_property_set_description(oc, "prealloc-context", +"Context to use for creating CPU threads for preallocation"); object_class_property_add(oc, "size", "int", host_memory_backend_get_size, host_memory_backend_set_size, diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h index 9ff5c16963..39326f1d4f 100644 --- a/include/sysemu/hostmem.h +++ b/include/sysemu/hostmem.h @@ -18,6 +18,7 @@ #include "qom/object.h" #include "exec/memory.h" #include "qemu/bitmap.h" +#include "qemu/thread-context.h" #define TYPE_MEMORY_BACKEND "memory-backend" OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass, @@ -66,6 +67,7 @@ struct HostMemoryBackend { bool merge, dump, use_canonical_path; bool prealloc, is_mapped, share, reserve; uint32_t prealloc_threads; +ThreadContext *prealloc_context; DECLARE_BITMAP(host_nodes, MAX_NODES + 1); HostMemPolicy policy; diff --git a/qapi/qom.json b/qapi/qom.json index cb6f700c0c..efa74f9799 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -578,6 +578,9 @@ # # @prealloc-threads: number of CPU threads to use for prealloc (default: 1) # +# @prealloc-context: thread context to use for creation of preallocation threads +#(default: none) (since 7.2) +# # @share: if false, the memory is private to QEMU; if true, it is shared # (default: false) # @@ -608,6 +611,7 @@ '*policy': 'HostMemPolicy', '*prealloc': 'bool', '*prealloc-threads': 'uint32', +'*prealloc-context': 'str', '*share': 'bool', '*reserve':
'bool', 'size': 'size', -- 2.37.3
Re: [PATCH v4 0/2] Add memmap and fix bugs for LoongArch
On 2022/9/30 17:51, Xiaojuan Yang wrote: This series adds a memmap table and fixes the extioi and ipi device emulation for the LoongArch virt machine. Changes for v4: Add 'Reviewed-by' tag to the ipi fix patch; other changes are the same as v3. 1. Remove the memmap table patch from this series; it will be applied once we have more than one MachineState. 2. Use MemTxAttrs' requester_type and requester_id to get the current cpu index in the LoongArch extioi regs emulation. This patch is based on: 20220927141504.3886314-1-alex.ben...@linaro.org 3. Rewrite the commit message of the ipi fix patch, and add a 'Reviewed-by' tag to the patch. Changes for v3: 1. Remove the memmap table patch from this series; it will be applied once we have more than one MachineState. 2. Use MemTxAttrs' requester_type and requester_id to get the current cpu index in the LoongArch extioi regs emulation. This patch is based on: 20220927141504.3886314-1-alex.ben...@linaro.org 3. Rewrite the commit message of the ipi fix patch; this patch has been reviewed. Changes for v2: 1. Adjust the position of the 'PLATFORM' element in the memmap table Changes for v1: 1. Add memmap table for LoongArch virt machine 2. Fix LoongArch extioi function 3. Fix LoongArch ipi device emulation Xiaojuan Yang (2): hw/intc: Fix LoongArch extioi function hw/intc: Fix LoongArch ipi device emulation hw/intc/loongarch_extioi.c | 51 +++-- hw/intc/loongarch_ipi.c | 1 - hw/intc/trace-events| 2 +- target/loongarch/iocsr_helper.c | 16 +-- 4 files changed, 38 insertions(+), 32 deletions(-) Applied to loongarch-next. Thanks. Song Gao
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On Mon, Oct 10, 2022 at 2:26 PM Thomas Huth wrote: > > On 10/10/2022 10.13, Ani Sinha wrote: > > On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: > >> > >> Please see the README file added in patch 10 for more details. > >> Sample runs are as follows: > >> > >> $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - > >> ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > >> (smbios.py, line 92) > >> ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > >> (smilatency.py, line 47) > >> ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > >> (testacpi.py, line 158) > > > > None of the above files are avocado tests or avocado related python > > scripts. They are run from within bits in a python 2.7 environment. I > > could not find a mechanism to exclude a directory from avocado tests. > > I also do not think making those scripts python 3 compliant is a good > > use of my time since upgrading bits to use python 3 would be a major > > task unrelated to QEMU testing. > > Maybe you could at least switch those three lines to use the new print() > syntax There are lots of print statements in those three files using old syntax. It's only complaining about the first one. to silence at least these ugly errors? ... Python 2.7 should cope > very well with the new syntax, as far as I know... > > Otherwise, it might be better to put the non-avocado python files into > another directory under tests/ ? > > Thomas > >
Re: [PATCH RESEND] linux-user: Fix struct statfs ABI on loongarch64
On 2022/10/6 19:11, Philippe Mathieu-Daudé wrote: On 6/10/22 12:07, WANG Xuerui wrote: Previously the 32-bit version was incorrectly chosen, leading to funny but incorrect output from e.g. df(1). Simply select the version corresponding to the 64-bit asm-generic definition. For reference, this program should produce the same output no matter natively compiled or not, for loongarch64 or not: ```c #include <sys/vfs.h> #include <stdio.h> int main(int argc, const char *argv[]) { struct statfs b; if (statfs(argv[0], &b)) return 1; printf("f_type = 0x%lx\n", b.f_type); printf("f_bsize = %ld\n", b.f_bsize); printf("f_blocks = %ld\n", b.f_blocks); printf("f_bfree = %ld\n", b.f_bfree); printf("f_bavail = %ld\n", b.f_bavail); return 0; } // Example output on my amd64 box, with the test binary residing on a // btrfs partition. // Native and emulated output after the fix: // // f_type = 0x9123683e // f_bsize = 4096 // f_blocks = 268435456 // f_bfree = 168406890 // f_bavail = 168355058 // Output before the fix, note the messed layout: // // f_type = 0x10009123683e // f_bsize = 723302085239504896 // f_blocks = 168355058 // f_bfree = 2250817541779750912 // f_bavail = 1099229433104 ``` Fixes: 1f63019632 ("linux-user: Add LoongArch syscall support") Signed-off-by: WANG Xuerui Cc: Song Gao Cc: Xiaojuan Yang Cc: Andreas K. Hüttel --- Resend with amended commit message to clarify that the example outputs were generated on my box and will differ for everyone else. Sorry for the noise. Reviewed-by: Philippe Mathieu-Daudé Applied to loongarch-next. Thanks. Song Gao
[PATCH v2 1/7] util: Cleanup and rename os_mem_prealloc()
Let's * give the function a "qemu_*" style name * make sure the parameters in the implementation match the prototype * rename smp_cpus to max_threads, which makes the semantics of that parameter clearer ... and add a function documentation. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- backends/hostmem.c | 6 +++--- hw/virtio/virtio-mem.c | 2 +- include/qemu/osdep.h | 17 +++-- softmmu/cpus.c | 2 +- util/oslib-posix.c | 24 util/oslib-win32.c | 8 6 files changed, 36 insertions(+), 23 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 4428e06738..491cb10b97 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -232,7 +232,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, void *ptr = memory_region_get_ram_ptr(&backend->mr); uint64_t sz = memory_region_size(&backend->mr); -os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, &local_err); +qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err); if (local_err) { error_propagate(errp, local_err); return; @@ -383,8 +383,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) * specified NUMA policy in place. */ if (backend->prealloc) { -os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz, -backend->prealloc_threads, &local_err); +qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz, + backend->prealloc_threads, &local_err); if (local_err) { goto out; } diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 30d03e987a..0e9ef4ff19 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, int fd = memory_region_get_fd(&vmem->memdev->mr); Error *local_err = NULL; -os_mem_prealloc(fd, area, size, 1, &local_err); +qemu_prealloc_mem(fd, area, size, 1, &local_err); if (local_err) { static bool warned; diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index b1c161c035..e556e45143 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -568,8 +568,21 @@ unsigned long qemu_getauxval(unsigned long type); void qemu_set_tty_echo(int fd, bool echo); -void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus, - Error **errp); +/** + * qemu_prealloc_mem: + * @fd: the fd mapped into the area, -1 for anonymous memory + * @area: start address of the are to preallocate + * @sz: the size of the area to preallocate + * @max_threads: maximum number of threads to use + * @errp: returns an error if this function fails + * + * Preallocate memory (populate/prefault page tables writable) for the virtual + * memory area starting at @area with the size of @sz. After a successful call, + * each page in the area was faulted in writable at least once, for example, + * after allocating file blocks for mapped files. + */ +void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads, + Error **errp); /** * qemu_get_pid_name: diff --git a/softmmu/cpus.c b/softmmu/cpus.c index 61b27ff59d..01c94fd298 100644 --- a/softmmu/cpus.c +++ b/softmmu/cpus.c @@ -354,7 +354,7 @@ static void qemu_init_sigbus(void) /* * ALERT: when modifying this, take care that SIGBUS forwarding in - * os_mem_prealloc() will continue working as expected. + * qemu_prealloc_mem() will continue working as expected. 
*/ memset(&action, 0, sizeof(action)); action.sa_flags = SA_SIGINFO; diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 827a7aadba..905cbc27cc 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -329,7 +329,7 @@ static void sigbus_handler(int signal) return; } #endif /* CONFIG_LINUX */ -warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored"); +warn_report("qemu_prealloc_mem: unrelated SIGBUS detected and ignored"); } static void *do_touch_pages(void *arg) @@ -399,13 +399,13 @@ static void *do_madv_populate_write_pages(void *arg) } static inline int get_memset_num_threads(size_t hpagesize, size_t numpages, - int smp_cpus) + int max_threads) { long host_procs = sysconf(_SC_NPROCESSORS_ONLN); int ret = 1; if (host_procs > 0) { -ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus); +ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), max_threads); } /* Especially with gigantic pages, don't create more threads than pages. */ @@ -418,11 +418,11 @@ static inline int get_memset_num_threa
[PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
This is a follow-up on "util: NUMA aware memory preallocation" [1] by Michal. Setting the CPU affinity of threads from inside QEMU usually isn't easily possible, because we don't want QEMU -- once started and running guest code -- to be able to mess up the system. QEMU disallows relevant syscalls using seccomp, such that any such invocation will fail. Especially for memory preallocation in memory backends, the CPU affinity can significantly increase guest startup time, for example, when running large VMs backed by huge/gigantic pages, because of NUMA effects. For NUMA-aware preallocation, we have to set the CPU affinity, however: (1) Once preallocation threads are created during preallocation, management tools cannot intercept anymore to change the affinity. These threads are created automatically on demand. (2) QEMU cannot easily set the CPU affinity itself. (3) The CPU affinity derived from the NUMA bindings of the memory backend might not necessarily be exactly the CPUs we actually want to use (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs). There is an easy "workaround". If we have a thread with the right CPU affinity, we can simply create new threads on demand via that prepared context. So, all we have to do is setup and create such a context ahead of time, to then configure preallocation to create new threads via that environment. So, let's introduce a user-creatable "thread-context" object that essentially consists of a context thread used to create new threads. QEMU can either try setting the CPU affinity itself ("cpu-affinity", "node-affinity" property), or upper layers can extract the thread id ("thread-id" property) to configure it externally. Make memory-backends consume a thread-context object (via the "prealloc-context" property) and use it when preallocating to create new threads with the desired CPU affinity. Further, to make it easier to use, allow creation of "thread-context" objects, including setting the CPU affinity directly from QEMU, before enabling the sandbox option. Quick test on a system with 2 NUMA nodes: Without CPU affinity: time qemu-system-x86_64 \ -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \ -nographic -monitor stdio real0m5.383s real0m3.499s real0m5.129s real0m4.232s real0m5.220s real0m4.288s real0m3.582s real0m4.305s real0m5.421s real0m4.502s -> It heavily depends on the scheduler CPU selection With CPU affinity: time qemu-system-x86_64 \ -object thread-context,id=tc1,node-affinity=0 \ -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \ -sandbox enable=on,resourcecontrol=deny \ -nographic -monitor stdio real0m1.959s real0m1.942s real0m1.943s real0m1.941s real0m1.948s real0m1.964s real0m1.949s real0m1.948s real0m1.941s real0m1.937s On reasonably large VMs, the speedup can be quite significant. While this concept is currently only used for short-lived preallocation threads, nothing major speaks against reusing the concept for other threads that are harder to identify/configure -- except that we need additional (idle) context threads that are otherwise left unused. This series does not yet tackle concurrent preallocation of memory backends. Memory backend objects are created and memory is preallocated one memory backend at a time -- and there is currently no way to do preallocation asynchronously. 
[1] https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com v1 -> v2: * Fixed some minor style nits * "util: Introduce ThreadContext user-creatable object" -> Improve documentation and patch description. [Markus] * "util: Add write-only "node-affinity" property for ThreadContext" -> Improve documentation and patch description. [Markus] RFC -> v1: * "vl: Allow ThreadContext objects to be created before the sandbox option" -> Move parsing of the "name" property before object_create_pre_sandbox * Added RB's Cc: Michal Privoznik Cc: Igor Mammedov Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: "Daniel P. Berrangé" Cc: Eduardo Habkost Cc: Dr. David Alan Gilbert Cc: Eric Blake Cc: Markus Armbruster Cc: Richard Henderson Cc: Stefan Weil David Hildenbrand (7): util: Cleanup and rename os_mem_prealloc() util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity() util: Introduce ThreadContext user-creatable object util: Add write-only "node-affinity" property for ThreadContext util: Make qemu_prealloc_mem() optionally consume a ThreadContext hostmem: Allow for specifying a ThreadContext for preallocation vl: Allow ThreadContext objects to be created before the sandbox option
[PATCH v2 3/7] util: Introduce ThreadContext user-creatable object
Setting the CPU affinity of QEMU threads is a bit problematic, because QEMU doesn't always have permissions to set the CPU affinity itself, for example, with seccomp after initialized by QEMU: -sandbox enable=on,resourcecontrol=deny General information about CPU affinities can be found in the man page of taskset: CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. While upper layers are already aware of how to handle CPU affinities for long-lived threads like iothreads or vcpu threads, especially short-lived threads, as used for memory-backend preallocation, are more involved to handle. These threads are created on demand and upper layers are not even able to identify and configure them. Introduce the concept of a ThreadContext, that is essentially a thread used for creating new threads. All threads created via that context thread inherit the configured CPU affinity. Consequently, it's sufficient to create a ThreadContext and configure it once, and have all threads created via that ThreadContext inherit the same CPU affinity. The CPU affinity of a ThreadContext can be configured two ways: (1) Obtaining the thread id via the "thread-id" property and setting the CPU affinity manually. (2) Setting the "cpu-affinity" property and letting QEMU try set the CPU affinity itself. This will fail if QEMU doesn't have permissions to do so anymore after seccomp was initialized. A simple QEMU example to set the CPU affinity to CPU 0,1,6,7 would be: qemu-system-x86_64 -S \ -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 And we can query it via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 0, 1, 6, 7 ] But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. A ThreadContext can be reused, simply by reconfiguring the CPU affinity. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- include/qemu/thread-context.h | 57 +++ qapi/qom.json | 17 +++ util/meson.build | 1 + util/oslib-posix.c| 1 + util/thread-context.c | 278 ++ 5 files changed, 354 insertions(+) create mode 100644 include/qemu/thread-context.h create mode 100644 util/thread-context.c diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h new file mode 100644 index 00..2ebd6b7fe1 --- /dev/null +++ b/include/qemu/thread-context.h @@ -0,0 +1,57 @@ +/* + * QEMU Thread Context + * + * Copyright Red Hat Inc., 2022 + * + * Authors: + * David Hildenbrand + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#ifndef SYSEMU_THREAD_CONTEXT_H +#define SYSEMU_THREAD_CONTEXT_H + +#include "qapi/qapi-types-machine.h" +#include "qemu/thread.h" +#include "qom/object.h" + +#define TYPE_THREAD_CONTEXT "thread-context" +OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass, +THREAD_CONTEXT) + +struct ThreadContextClass { +ObjectClass parent_class; +}; + +struct ThreadContext { +/* private */ +Object parent; + +/* private */ +unsigned int thread_id; +QemuThread thread; + +/* Semaphore to wait for context thread action. */ +QemuSemaphore sem; +/* Semaphore to wait for action in context thread. */ +QemuSemaphore sem_thread; +/* Mutex to synchronize requests. */ +QemuMutex mutex; + +/* Commands for the thread to execute. 
*/ +int thread_cmd; +void *thread_cmd_data; + +/* CPU affinity bitmap used for initialization. */ +unsigned long *init_cpu_bitmap; +int init_cpu_nbits; +}; + +void thread_context_create_thread(ThreadContext *tc, QemuThread *thread, + const char *name, + void *(*start_routine)(void *), void *arg, + int mode); + +#endif /* SYSEMU_THREAD_CONTEXT_H */ diff --git a/qapi/qom.json b/qapi/qom.json index 80dd419b39..67d47f4051 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -830,6 +830,21 @@ 'reduced-phys-bits': 'uint32', '*kernel-hashes': 'bool' } } +## +# @ThreadContextProperties: +# +# Properties for thread context objects. +# +# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads +#created in the thread context (default: QEMU main thread +#affinity) +# +# Since: 7.2 +## +{ 'struct': 'ThreadContextProperties', + 'data': { '*cpu-affinity': ['uint16'] } } + + ## # @ObjectType: # @@ -882,6 +897,7 @@ { 'name': 'secret_keyring', 'if': 'CONFIG_SECRET_KEYRING' }, 'sev-gue
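As a concrete illustration of option (1) from the commit message above -- pinning externally via the thread id -- with taskset; the thread id and CPU list are made up:

(qemu) qom-get tc1 thread-id
123456

$ taskset -cp 0,1,6,7 123456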
[PATCH v2 5/7] util: Make qemu_prealloc_mem() optionally consume a ThreadContext
... and implement it under POSIX. When a ThreadContext is provided, create new threads via the context such that these new threads obtain a porperly configured CPU affinity. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- backends/hostmem.c | 5 +++-- hw/virtio/virtio-mem.c | 2 +- include/qemu/osdep.h | 4 +++- util/oslib-posix.c | 20 ++-- util/oslib-win32.c | 2 +- 5 files changed, 22 insertions(+), 11 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 491cb10b97..76f0394490 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -232,7 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value, void *ptr = memory_region_get_ram_ptr(&backend->mr); uint64_t sz = memory_region_size(&backend->mr); -qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err); +qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL, + &local_err); if (local_err) { error_propagate(errp, local_err); return; @@ -384,7 +385,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp) */ if (backend->prealloc) { qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz, - backend->prealloc_threads, &local_err); + backend->prealloc_threads, NULL, &local_err); if (local_err) { goto out; } diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 0e9ef4ff19..ed170def48 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, int fd = memory_region_get_fd(&vmem->memdev->mr); Error *local_err = NULL; -qemu_prealloc_mem(fd, area, size, 1, &local_err); +qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err); if (local_err) { static bool warned; diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index e556e45143..625298c8bc 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -568,6 +568,8 @@ unsigned long qemu_getauxval(unsigned long type); void qemu_set_tty_echo(int fd, bool echo); +typedef struct ThreadContext ThreadContext; + /** * qemu_prealloc_mem: * @fd: the fd mapped into the area, -1 for anonymous memory @@ -582,7 +584,7 @@ void qemu_set_tty_echo(int fd, bool echo); * after allocating file blocks for mapped files. 
*/ void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads, - Error **errp); + ThreadContext *tc, Error **errp); /** * qemu_get_pid_name: diff --git a/util/oslib-posix.c b/util/oslib-posix.c index 28305cdea3..59a891b6a8 100644 --- a/util/oslib-posix.c +++ b/util/oslib-posix.c @@ -419,7 +419,8 @@ static inline int get_memset_num_threads(size_t hpagesize, size_t numpages, } static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, - int max_threads, bool use_madv_populate_write) + int max_threads, ThreadContext *tc, + bool use_madv_populate_write) { static gsize initialized = 0; MemsetContext context = { @@ -458,9 +459,16 @@ static int touch_all_pages(char *area, size_t hpagesize, size_t numpages, context.threads[i].numpages = numpages_per_thread + (i < leftover); context.threads[i].hpagesize = hpagesize; context.threads[i].context = &context; -qemu_thread_create(&context.threads[i].pgthread, "touch_pages", - touch_fn, &context.threads[i], - QEMU_THREAD_JOINABLE); +if (tc) { +thread_context_create_thread(tc, &context.threads[i].pgthread, + "touch_pages", + touch_fn, &context.threads[i], + QEMU_THREAD_JOINABLE); +} else { +qemu_thread_create(&context.threads[i].pgthread, "touch_pages", + touch_fn, &context.threads[i], + QEMU_THREAD_JOINABLE); +} addr += context.threads[i].numpages * hpagesize; } @@ -496,7 +504,7 @@ static bool madv_populate_write_possible(char *area, size_t pagesize) } void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads, - Error **errp) + ThreadContext *tc, Error **errp) { static gsize initialized; int ret; @@ -537,7 +545,7 @@ void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads, } /* touch pages simultaneously */ -ret = touch_all_pages(area, hpagesize, numpages, max_threads, +ret = touch_all_pages(area, hpage
[PATCH v2 7/7] vl: Allow ThreadContext objects to be created before the sandbox option
Currently, there is no way to configure a CPU affinity inside QEMU when the sandbox option disables it for QEMU as a whole, for example, via: -sandbox enable=on,resourcecontrol=deny While ThreadContext objects can be created on the QEMU commandline and the CPU affinity can be configured externally via the thread-id, this is insufficient if a ThreadContext with a certain CPU affinity is already required during QEMU startup, before we can intercept QEMU and configure the CPU affinity. Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp: add resourcecontrol argument to command line"), "to avoid any bigger of the process". However, we only care about once QEMU is running, not when the instance starting QEMU explicitly requests a certain CPU affinity on the QEMU comandline. Right now, for NUMA-aware preallocation of memory backends used for initial machine RAM, one has to: 1) Start QEMU with the memory-backend with "prealloc=off" 2) Pause QEMU before it starts the guest (-S) 3) Create ThreadContext, configure the CPU affinity using the thread-id 4) Configure the ThreadContext as "prealloc-context" of the memory backend 5) Trigger preallocation by setting "prealloc=on" To simplify this handling especially for initial machine RAM, allow creation of ThreadContext objects before parsing sandbox options, such that the CPU affinity requested on the QEMU commandline alongside the sandbox option can be set. As ThreadContext objects essentially only create a persistant context thread and set the CPU affinity, this is easily possible. With this change, we can create a ThreadContext with a CPU affinity on the QEMU commandline and use it for preallocation of memory backends glued to the machine (simplified example): To make "-name debug-threads=on" keep working as expected for the context threads, perform earlier parsing of "-name". qemu-system-x86_64 -m 1G \ -object thread-context,id=tc1,cpu-affinity=3-4 \ -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \ -machine memory-backend=pc.ram \ -S -monitor stdio -sandbox enable=on,resourcecontrol=deny And while we can query the current CPU affinity: (qemu) qom-get tc1 cpu-affinity [ 3, 4 ] We can no longer change it from QEMU directly: (qemu) qom-set tc1 cpu-affinity 1-2 Error: Setting CPU affinity failed: Operation not permitted Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- softmmu/vl.c | 36 1 file changed, 32 insertions(+), 4 deletions(-) diff --git a/softmmu/vl.c b/softmmu/vl.c index b464da25bc..b5a23420ac 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -1759,6 +1759,27 @@ static void object_option_parse(const char *optarg) visit_free(v); } +/* + * Very early object creation, before the sandbox options have been activated. + */ +static bool object_create_pre_sandbox(const char *type) +{ +/* + * Objects should in general not get initialized "too early" without + * a reason. If you add one, state the reason in a comment! + */ + +/* + * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU + * affinity of threads. + */ +if (g_str_equal(type, "thread-context")) { +return true; +} + +return false; +} + /* * Initial object creation happens before all other * QEMU data types are created. The majority of objects @@ -1773,6 +1794,11 @@ static bool object_create_early(const char *type) * add one, state the reason in a comment! */ +/* Reason: already created. 
*/ +if (object_create_pre_sandbox(type)) { +return false; +} + /* Reason: property "chardev" */ if (g_str_equal(type, "rng-egd") || g_str_equal(type, "qtest")) { @@ -1895,7 +1921,7 @@ static void qemu_create_early_backends(void) */ static bool object_create_late(const char *type) { -return !object_create_early(type); +return !object_create_early(type) && !object_create_pre_sandbox(type); } static void qemu_create_late_backends(void) @@ -2351,6 +2377,11 @@ static int process_runstate_actions(void *opaque, QemuOpts *opts, Error **errp) static void qemu_process_early_options(void) { +qemu_opts_foreach(qemu_find_opts("name"), + parse_name, NULL, &error_fatal); + +object_option_foreach_add(object_create_pre_sandbox); + #ifdef CONFIG_SECCOMP QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL); if (olist) { @@ -2358,9 +2389,6 @@ static void qemu_process_early_options(void) } #endif -qemu_opts_foreach(qemu_find_opts("name"), - parse_name, NULL, &error_fatal); - if (qemu_opts_foreach(qemu_find_opts("action"), process_runstate_actions, NULL, &error_fatal)) { exit(1); -- 2.37.3
[PATCH v2 2/7] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()
Usually, we let upper layers handle CPU pinning, because pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via seccomp when starting QEMU with -sandbox enable=on,resourcecontrol=deny However, we want to configure and observe the CPU affinity of threads from QEMU directly in some cases when the sandbox option is either not enabled or not active yet. So let's add a way to configure CPU pinning via qemu_thread_set_affinity() and obtain CPU affinity via qemu_thread_get_affinity() and implement them under POSIX using pthread_setaffinity_np() + pthread_getaffinity_np(). Implementation under Windows is possible using SetProcessAffinityMask() + GetProcessAffinityMask(), however, that is left as future work. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- include/qemu/thread.h| 4 +++ meson.build | 16 + util/qemu-thread-posix.c | 70 util/qemu-thread-win32.c | 12 +++ 4 files changed, 102 insertions(+) diff --git a/include/qemu/thread.h b/include/qemu/thread.h index af19f2b3fc..79e507c7f0 100644 --- a/include/qemu/thread.h +++ b/include/qemu/thread.h @@ -185,6 +185,10 @@ void qemu_event_destroy(QemuEvent *ev); void qemu_thread_create(QemuThread *thread, const char *name, void *(*start_routine)(void *), void *arg, int mode); +int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus, + unsigned long nbits); +int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus, + unsigned long *nbits); void *qemu_thread_join(QemuThread *thread); void qemu_thread_get_self(QemuThread *thread); bool qemu_thread_is_self(QemuThread *thread); diff --git a/meson.build b/meson.build index b686dfef75..3e0aa4925d 100644 --- a/meson.build +++ b/meson.build @@ -2114,7 +2114,23 @@ config_host_data.set('CONFIG_PTHREAD_CONDATTR_SETCLOCK', cc.links(gnu_source_pre pthread_condattr_setclock(&attr, CLOCK_MONOTONIC); return 0; }''', dependencies: threads)) +config_host_data.set('CONFIG_PTHREAD_AFFINITY_NP', cc.links(gnu_source_prefix + ''' + #include + static void *f(void *p) { return NULL; } + int main(void) + { +int setsize = CPU_ALLOC_SIZE(64); +pthread_t thread; +cpu_set_t *cpuset; +pthread_create(&thread, 0, f, 0); +cpuset = CPU_ALLOC(64); +CPU_ZERO_S(setsize, cpuset); +pthread_setaffinity_np(thread, setsize, cpuset); +pthread_getaffinity_np(thread, setsize, cpuset); +CPU_FREE(cpuset); +return 0; + }''', dependencies: threads)) config_host_data.set('CONFIG_SIGNALFD', cc.links(gnu_source_prefix + ''' #include #include diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c index ac1d56e673..bae938c670 100644 --- a/util/qemu-thread-posix.c +++ b/util/qemu-thread-posix.c @@ -16,6 +16,7 @@ #include "qemu/notify.h" #include "qemu-thread-common.h" #include "qemu/tsan.h" +#include "qemu/bitmap.h" static bool name_threads; @@ -552,6 +553,75 @@ void qemu_thread_create(QemuThread *thread, const char *name, pthread_attr_destroy(&attr); } +int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus, + unsigned long nbits) +{ +#if defined(CONFIG_PTHREAD_AFFINITY_NP) +const size_t setsize = CPU_ALLOC_SIZE(nbits); +unsigned long value; +cpu_set_t *cpuset; +int err; + +cpuset = CPU_ALLOC(nbits); +g_assert(cpuset); + +CPU_ZERO_S(setsize, cpuset); +value = find_first_bit(host_cpus, nbits); +while (value < nbits) { +CPU_SET_S(value, setsize, cpuset); +value = find_next_bit(host_cpus, nbits, value + 1); +} + +err = pthread_setaffinity_np(thread->thread, setsize, cpuset); +CPU_FREE(cpuset); +return err; +#else +return -ENOSYS; +#endif +} + +int 
qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus, + unsigned long *nbits) +{ +#if defined(CONFIG_PTHREAD_AFFINITY_NP) +unsigned long tmpbits; +cpu_set_t *cpuset; +size_t setsize; +int i, err; + +tmpbits = CPU_SETSIZE; +while (true) { +setsize = CPU_ALLOC_SIZE(tmpbits); +cpuset = CPU_ALLOC(tmpbits); +g_assert(cpuset); + +err = pthread_getaffinity_np(thread->thread, setsize, cpuset); +if (err) { +CPU_FREE(cpuset); +if (err != -EINVAL) { +return err; +} +tmpbits *= 2; +} else { +break; +} +} + +/* Convert the result into a proper bitmap. */ +*nbits = tmpbits; +*host_cpus = bitmap_new(tmpbits); +for (i = 0; i < tmpbits; i++) { +if (CPU_ISSET(i, cpuset)) { +set_bit(i, *host_cpus); +} +} +CPU_FREE(cpuset); +return 0; +#else +return -ENOSYS; +#endif +} + void qemu_thread_get_self(QemuThread *thread) {
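A minimal caller sketch for the new helpers, using QEMU's bitmap convention (illustrative only; `thread` is assumed to be an already-created QemuThread):

unsigned long *cpus = bitmap_new(8);    /* covers CPUs 0..7 */
set_bit(2, cpus);
set_bit(3, cpus);
if (qemu_thread_set_affinity(&thread, cpus, 8)) {
    /* e.g. -ENOSYS where pthread_*affinity_np() is unavailable */
}
g_free(cpus);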
[PATCH v2 4/7] util: Add write-only "node-affinity" property for ThreadContext
Let's make it easier to pin threads created via a ThreadContext to all CPUs currently belonging to a given set of NUMA nodes -- which is the common case. "node-affinity" is simply a shortcut for setting "cpu-affinity" manually to the list of CPUs belonging to the set of nodes. This property can only be written. A simple QEMU example to set the CPU affinity to Node 1 on a system with two NUMA nodes, 24 CPUs each: qemu-system-x86_64 -S \ -object thread-context,id=tc1,node-affinity=1 And we can query the cpu-affinity via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47 ] We cannot query the node-affinity: (qemu) qom-get tc1 node-affinity Error: Insufficient permission to perform this operation But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. Note that if the CPUs for a node change due do physical CPU hotplug or hotunplug (e.g., lscpu output changes) after the ThreadContext was started, the CPU affinity will not get updated. Reviewed-by: Michal Privoznik Signed-off-by: David Hildenbrand --- qapi/qom.json | 9 - util/meson.build | 2 +- util/thread-context.c | 84 +++ 3 files changed, 93 insertions(+), 2 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 67d47f4051..cb6f700c0c 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -839,10 +839,17 @@ #created in the thread context (default: QEMU main thread #affinity) # +# @node-affinity: the list of node numbers that will be resolved to a list +# of CPU numbers used as CPU affinity. This is a shortcut for +# specifying the list of CPU numbers belonging to the nodes +# manually by setting @cpu-affinity. 
(default: QEMU main thread +# affinity) +# # Since: 7.2 ## { 'struct': 'ThreadContextProperties', - 'data': { '*cpu-affinity': ['uint16'] } } + 'data': { '*cpu-affinity': ['uint16'], +'*node-affinity': ['uint16'] } } ## diff --git a/util/meson.build b/util/meson.build index e97cd2d779..c0a7bc54d4 100644 --- a/util/meson.build +++ b/util/meson.build @@ -1,5 +1,5 @@ util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c')) -util_ss.add(files('thread-context.c')) +util_ss.add(files('thread-context.c'), numa) if not config_host_data.get('CONFIG_ATOMIC64') util_ss.add(files('atomic64.c')) endif diff --git a/util/thread-context.c b/util/thread-context.c index c921905396..4138245332 100644 --- a/util/thread-context.c +++ b/util/thread-context.c @@ -21,6 +21,10 @@ #include "qemu/module.h" #include "qemu/bitmap.h" +#ifdef CONFIG_NUMA +#include +#endif + enum { TC_CMD_NONE = 0, TC_CMD_STOP, @@ -88,6 +92,11 @@ static void thread_context_set_cpu_affinity(Object *obj, Visitor *v, int nbits = 0, ret; Error *err = NULL; +if (tc->init_cpu_bitmap) { +error_setg(errp, "Mixing CPU and node affinity not supported"); +return; +} + visit_type_uint16List(v, name, &host_cpus, &err); if (err) { error_propagate(errp, err); @@ -159,6 +168,79 @@ static void thread_context_get_cpu_affinity(Object *obj, Visitor *v, qapi_free_uint16List(host_cpus); } +static void thread_context_set_node_affinity(Object *obj, Visitor *v, + const char *name, void *opaque, + Error **errp) +{ +#ifdef CONFIG_NUMA +const int nbits = numa_num_possible_cpus(); +ThreadContext *tc = THREAD_CONTEXT(obj); +uint16List *l, *host_nodes = NULL; +unsigned long *bitmap = NULL; +struct bitmask *tmp_cpus; +Error *err = NULL; +int ret, i; + +if (tc->init_cpu_bitmap) { +error_setg(errp, "Mixing CPU and node affinity not supported"); +return; +} + +visit_type_uint16List(v, name, &host_nodes, &err); +if (err) { +error_propagate(errp, err); +return; +} + +if (!host_nodes) { +error_setg(errp, "Node list is empty"); +goto out; +} + +bitmap = bitmap_new(nbits); +tmp_cpus = numa_allocate_cpumask(); +for (l = host_nodes; l; l = l->next) { +numa_bitmask_clearall(tmp_cpus); +ret = numa_node_to_cpus(l->value, tmp_cpus); +if (ret) { +/* We ignore any errors, such as impossible nodes. */ +continue; +} +for (i = 0; i < nbits; i++) { +if (numa_bitmask_isbitset(tmp_cpus, i)) { +set_bit(i, bitmap); +
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On 10/10/2022 11.13, Ani Sinha wrote: On Mon, Oct 10, 2022 at 2:26 PM Thomas Huth wrote: On 10/10/2022 10.13, Ani Sinha wrote: On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: Please see the README file added in patch 10 for more details. Sample runs are as follows: $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smbios.py, line 92) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (smilatency.py, line 47) ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? (testacpi.py, line 158) None of the above files are avocado tests or avocado related python scripts. They are run from within bits in a python 2.7 environment. I could not find a mechanism to exclude a directory from avocado tests. I also do not think making those scripts python 3 compliant is a good use of my time since upgrading bits to use python 3 would be a major task unrelated to QEMU testing. Maybe you could at least switch those three lines to use the new print() syntax There are lots of print statements in those three files using old syntax. It's only complaining about the first one. to silence at least these ugly errors? ... Python 2.7 should cope Yeah, but everybody who wants to run the QEMU avocado tests will wonder about those ERROR messages! So I don't think that this is acceptable in the current shape. Either fix the print lines, or move it to another directory. Thomas
Re: [PATCH v3 0/3] Fix some loongarch tcg bugs
On 2022/9/30 10:45, Song Gao wrote: Hi, This series fixes some bugs found from RISU tests. V3: -drop patch 'set some instruction result high 32bit 1'. -follow some changes from Richard's suggestions. v2: -remove patch5 div if x/0 set dividend to 0. Song Gao (3): target/loongarch: bstrins.w src register need EXT_NONE target/loongarch: Fix fnm{sub/add}_{s/d} set wrong flags softfloat: logB(0) should raise divideByZero exception fpu/softfloat-parts.c.inc | 1 + target/loongarch/insn_trans/trans_bit.c.inc | 36 +++ .../loongarch/insn_trans/trans_farith.c.inc | 12 +++ 3 files changed, 29 insertions(+), 20 deletions(-) Applied to loongarch-next. Thanks. Song Gao
Re: [PATCH] ui/vnc-clipboard: fix integer underflow in vnc_client_cut_text_ext
On Sun, Sep 25, 2022 at 10:45 PM Mauro Matteo Cascella wrote: > > Extended ClientCutText messages start with a 4-byte header. If len < 4, > an integer underflow occurs in vnc_client_cut_text_ext. The result is > used to decompress data in a while loop in inflate_buffer, leading to > CPU consumption and denial of service. Prevent this by checking dlen in > protocol_client_msg. > > Fixes: CVE-2022-3165 > Fixes: 0bf41cab93e5 ("ui/vnc: clipboard support") > Reported-by: TangPeng > Signed-off-by: Mauro Matteo Cascella > --- > Extended Clipboard Pseudo-Encoding: > https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#extended-clipboard-pseudo-encoding > > ui/vnc.c | 11 --- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/ui/vnc.c b/ui/vnc.c > index 6a05d06147..acb3629cd8 100644 > --- a/ui/vnc.c > +++ b/ui/vnc.c > @@ -2442,8 +2442,8 @@ static int protocol_client_msg(VncState *vs, uint8_t > *data, size_t len) > if (len == 1) { > return 8; > } > +uint32_t dlen = abs(read_s32(data, 4)); > if (len == 8) { > -uint32_t dlen = abs(read_s32(data, 4)); > if (dlen > (1 << 20)) { > error_report("vnc: client_cut_text msg payload has %u bytes" > " which exceeds our limit of 1MB.", dlen); > @@ -2456,8 +2456,13 @@ static int protocol_client_msg(VncState *vs, uint8_t > *data, size_t len) > } > > if (read_s32(data, 4) < 0) { > -vnc_client_cut_text_ext(vs, abs(read_s32(data, 4)), > -read_u32(data, 8), data + 12); > +if (dlen < 4) { > +error_report("vnc: malformed payload (header less than 4 > bytes)" > + " in extended clipboard pseudo-encoding."); > +vnc_client_error(vs); > +break; > +} > +vnc_client_cut_text_ext(vs, dlen, read_u32(data, 8), data + 12); > break; > } > vnc_client_cut_text(vs, read_u32(data, 4), data + 8); > -- > 2.37.3 > Any updates here? Thanks, -- Mauro Matteo Cascella Red Hat Product Security PGP-Key ID: BB3410B0
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On Mon, Oct 10, 2022 at 2:50 PM Daniel P. Berrangé wrote: > > On Mon, Oct 10, 2022 at 01:43:21PM +0530, Ani Sinha wrote: > > On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: > > > > > > Please see the README file added in patch 10 for more details. > > > Sample runs are as follows: > > > > > > $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - > > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > > (smbios.py, line 92) > > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > > (smilatency.py, line 47) > > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > > (testacpi.py, line 158) > > > > None of the above files are avocado tests or avocado related python > > scripts. They are run from within bits in a python 2.7 environment. I > > could not find a mechanism to exclude a directory from avocado tests. > > I also do not think making those scripts python 3 compliant is a good > > use of my time since upgrading bits to use python 3 would be a major > > task unrelated to QEMU testing. > > In one of the later patches copy_test_scripts() copies the files > into the guest image IIUC. > > If you rename them in git, to be .py2 then presumably avocado > wont try to load them, and then in copy_test_scripts you can > give them the normal .py extension for the guest. The .py2 > extension will also make it more obvious to maintainers that > these are different from the rest of the python coded we have. I did a quick test and this approach seems to work. ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - 1..1 ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
[PATCH] migration: Fix the minus value for compressed_size
When update_compress_thread_counts() is called the first time, there is no data stream yet. We see compression_counters.compressed_size become a negative value shortly. Signed-off-by: Zhenzhong Duan --- migration/ram.c | 4 1 file changed, 4 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index dc1de9ddbc68..510db95cdc36 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block, static void update_compress_thread_counts(const CompressParam *param, int bytes_xmit) { +if (bytes_xmit <= 0) { +return; +} + ram_transferred_add(bytes_xmit); if (param->zero_page) { -- 2.25.1
Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits
On Mon, Oct 10, 2022 at 01:43:21PM +0530, Ani Sinha wrote: > On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha wrote: > > > > Please see the README file added in patch 10 for more details. > > Sample runs are as follows: > > > > $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap - > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > (smbios.py, line 92) > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > (smilatency.py, line 47) > > ERROR: Missing parentheses in call to 'print'. Did you mean print(...)? > > (testacpi.py, line 158) > > None of the above files are avocado tests or avocado related python > scripts. They are run from within bits in a python 2.7 environment. I > could not find a mechanism to exclude a directory from avocado tests. > I also do not think making those scripts python 3 compliant is a good > use of my time since upgrading bits to use python 3 would be a major > task unrelated to QEMU testing. In one of the later patches copy_test_scripts() copies the files into the guest image IIUC. If you rename them in git to be .py2, then presumably avocado won't try to load them, and then in copy_test_scripts you can give them the normal .py extension for the guest. The .py2 extension will also make it more obvious to maintainers that these are different from the rest of the python code we have. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX
On Sat, 8 Oct 2022 at 16:38, Richard Henderson wrote: > > Intel has now given guarantees about the atomicity of SSE read > and write instructions on cpus supporting AVX. We can use these > instead of the much slower cmpxchg16b. > > Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 > > Signed-off-by: Richard Henderson > +/* > + * The xgetbv instruction is not available to older versions of > + * the assembler, so we encode the instruction manually. > + */ > +asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0)); This would make the third place in the tree where we hand-code this asm instruction (we already do it in the xgetbv() function in target/i386/hvf/x86_cpuid.c and opencoded in tcg_target_init()): can we abstract this out somehow, please? (There is also a just-written-out "xgetbv" in init_cpuid_cache(), but that one's guarded by ifdefs so presumably OK.) thanks -- PMM
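For reference, a shared helper of the kind suggested above could look roughly like this -- a sketch only, reusing the manual encoding from the patch (the name qemu_xgetbv() is illustrative, not an existing QEMU function):

    static inline uint64_t qemu_xgetbv(uint32_t xcr)
    {
        uint32_t lo, hi;

        /* xgetbv (0f 01 d0): reads the XCR selected by ECX into EDX:EAX;
         * encoded by hand so that older assemblers cope. */
        asm volatile(".byte 0x0f, 0x01, 0xd0"
                     : "=a"(lo), "=d"(hi) : "c"(xcr));
        return ((uint64_t)hi << 32) | lo;
    }

The three existing call sites could then simply pass 0 to read XCR0.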
Re: [PATCH v6 00/13] blkio: add libblkio BlockDriver
On Thu, Oct 6, 2022 at 10:35 PM Stefan Hajnoczi wrote: > v6: > - Add untested nvme-io_uring driver. Please test in your nested NVMe > environment, Alberto. [Alberto] I did some I/O verification using fio [1] and it seems to be working fine. Alberto [1] https://fio.readthedocs.io/en/latest/fio_doc.html#verification
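For the archives, a fio verify job of roughly this shape exercises that path -- the parameters are illustrative, not necessarily the job Alberto actually ran:

    $ fio --name=verify --filename=/dev/vdb --direct=1 --rw=write \
          --bs=4k --size=1G --verify=crc32c --do_verify=1

fio writes the pattern and then reads it back checking the crc32c checksums, which catches corruption anywhere in the I/O path of the new driver.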
Re: [PATCH] error handling: Use TFR() macro where applicable
On Mon, 10 Oct 2022 at 09:34, Nikita Ivanov wrote: > > Hi! Thanks for your notes. I'll try to send updated patches by the end of the > day. > > On Fri, Oct 7, 2022 at 6:32 PM Peter Maydell wrote: >> I'm not sure why you've put the TEMP_FAILURE_RETRY on the outside here >> rather than just on the individual function calls. > > > The original code contained both branches in one while loop, so I was afraid > that > value `aiocb->aio_type & QEMU_AIO_WRITE` would change somehow during the loop. > If you'll say that this is impossible, I'll adjust the code as you propose. Oh, yes, I see. No, nothing should be changing that during the loop, I think it's just written the way it is now because it seemed neater when you write out the retry longhand. -- PMM
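For the archives, the shape Peter is suggesting would be roughly the following -- a sketch with the file-posix context simplified (buf/bytes/offset stand in for the real loop variables):

    ssize_t len;

    if (aiocb->aio_type & QEMU_AIO_WRITE) {
        /* TEMP_FAILURE_RETRY re-issues the call while it fails with EINTR */
        len = TEMP_FAILURE_RETRY(pwrite(aiocb->aio_fildes, buf, bytes, offset));
    } else {
        len = TEMP_FAILURE_RETRY(pread(aiocb->aio_fildes, buf, bytes, offset));
    }

Since aiocb->aio_type cannot change mid-request, hoisting the branch out of the retry loop is safe.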
Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFPD table
On 2022-10-10 08:23, Cédric Le Goater wrote: On 10/7/22 16:44, Francisco Iglesias wrote: --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = { { INFO("mx25l12855e", 0xc22618, 0, 64 << 10, 256, 0) }, { INFO6("mx25l25635e", 0xc22019, 0xc22019, 64 << 10, 512, 0), .sfdp_read = m25p80_sfdp_mx25l25635e }, +{ INFO6("mx25l25635f", 0xc22019, 0xc22019, 64 << 10, 512, 0), I think I'm not seeing the extended id part in the datasheet I've found so might be that you can switch to just INFO and _ext_id 0 above This was added by commit 6bbe036f32dc ("m25p80: Return the JEDEC ID twice for mx25l25635e") to fix a real breakage on HW. From my experience, the ID has a particular length, at least three bytes, and if you read past that length for some (all?) devices the id bytes just get repeated. I.e. the counter in the device will just wrap to offset 0 again. If you want to emulate the hardware correctly, you would have to take that into consideration. But I don't think it's worth it; OTOH there seems to be some broken software which relies on that (undefined?) behavior. -michael
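In emulation terms, the wrap Michael describes would amount to something like this -- a sketch, not the actual m25p80.c read path:

    /* Reads past the end of the JEDEC ID wrap back to byte 0, as the
     * counter in real parts does. */
    static uint8_t read_jedec_id_byte(const uint8_t *id, size_t id_len,
                                      size_t counter)
    {
        return id[counter % id_len];
    }

Whether such extra reads wrap or return garbage is exactly what the broken software in question apparently depends on.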
Re: [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
On Fri, 7 Oct 2022 16:21:51 +0100 Jonathan Cameron wrote: > Whilst I have carried on Huai-Cheng Kuo's series version numbering and > naming, there have been very substantial changes since v6 so I would > suggest fresh review makes sense for anyone who has looked at this before. > In particular, if the Avery design folks could check I haven't broken > anything, that would be great. I forgot to run checkpatch on these and there is some white space that will need cleaning up and one instance of missing brackets. As that doesn't greatly affect review, I'll wait for a few days to see if there is other feedback to incorporate in v8. Sorry for the resulting noise! These are now available at https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-09 along with a bunch of other CXL features: * Compliance DOE protocol * SPDM / CMA over DOE support * ARM64 support in general. * Various small emulation additions. * CPMU support I'll add a few more features to similarly named branches over the next week or so, including initial support for standalone switch CCI mailboxes. Jonathan > > For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0 > https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbr...@avery-design.com/ > > Summary of changes: > 1) Linux headers definitions for DOE are now upstream so drop that patch. > 2) Add CDAT for switch upstream port. > 3) Generate 'plausible' default CDAT tables when a file is not provided. > 4) General refactoring to calculate the correct table sizes and allocate >based on that rather than copying from a local static array. > 5) Changes from earlier reviews such as matching QEMU type naming style. > 6) Moved compliance and SPDM usecases to future patch sets. > > Sign-offs on these are complex because the patches were originally developed > by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan > Cameron who made substantial changes. > > Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to maintain this > code as per the original MAINTAINERS entry. > > What's here? > > This series brings generic PCI Express Data Object Exchange support (DOE). > DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI > config space via a PCIe Extended Capability Structure. > The PCIe spec defines several protocols (including one to discover what > protocols a given DOE instance supports) and other specifications such as > CXL define additional protocols using their own vendor IDs. > > In this series we make use of the DOE to support the CXL spec defined > Table Access Protocol, specifically to provide access to CDAT - a > table specified in a specification that is hosted by the UEFI forum > and is used to provide runtime discoverability of the sort of information > that would otherwise be available in firmware tables (memory types, > latency and bandwidth information etc). > > The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0. > The version merged did not support interrupts (earlier versions did, > so that support in the emulation was tested a while back). > > This series provides CDAT emulation for CXL switch upstream ports > and CXL type 3 memory devices. Note that to exercise the switch support > additional Linux kernel patches are needed. > https://lore.kernel.org/linux-cxl/20220503153449.4088-1-jonathan.came...@huawei.com/ > (I'll post a new version of that support shortly) > > Additional protocols will be supported by follow on patch sets: > * CXL compliance protocol.
> * CMA / SPDM device attestation. > (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will > refresh > that tree next week) > > Huai-Cheng Kuo (3): > hw/pci: PCIe Data Object Exchange emulation > hw/cxl/cdat: CXL CDAT Data Object Exchange implementation > hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange > > Jonathan Cameron (2): > hw/mem/cxl-type3: Add MSIX support > hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE > > MAINTAINERS| 7 + > hw/cxl/cxl-cdat.c | 222 > hw/cxl/meson.build | 1 + > hw/mem/cxl_type3.c | 236 + > hw/pci-bridge/cxl_upstream.c | 182 +++- > hw/pci/meson.build | 1 + > hw/pci/pcie_doe.c | 367 + > include/hw/cxl/cxl_cdat.h | 166 +++ > include/hw/cxl/cxl_component.h | 7 + > include/hw/cxl/cxl_device.h| 3 + > include/hw/cxl/cxl_pci.h | 1 + > include/hw/pci/pci_ids.h | 3 + > include/hw/pci/pcie.h | 1 + > include/hw/pci/pcie_doe.h | 123 +++ > include/hw/pci/pcie_regs.h | 4 + > 15 files changed, 1323 insertions(+), 1 deletion(-) > create mode 100644 hw/cxl/cxl-cdat.c > create mode 100644 hw/pci/pcie_doe.c > create mode 100644 include/hw/cxl/cxl_cdat.h > create mode 100644 include/hw/pci/pcie_doe.h >
Re: [PATCH] migration: Fix the minus value for compressed_size
* Zhenzhong Duan (zhenzhong.d...@intel.com) wrote: > When update_compress_thread_counts() is called first time, there is > no data stream yet. We see compression_counters.compressed_size > becomes minus value shortly. > > Signed-off-by: Zhenzhong Duan > --- > migration/ram.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/migration/ram.c b/migration/ram.c > index dc1de9ddbc68..510db95cdc36 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream > *stream, RAMBlock *block, > static void > update_compress_thread_counts(const CompressParam *param, int bytes_xmit) > { > +if (bytes_xmit <= 0) { > +return; > +} What's the call path where that happens? The only place I see bytes_xmit being less than 0 is in compress_page_with_multi_thread where it's initialised to -1 - but it's always updated before the call to update_compress_thread_counts. I wonder if the real problem is: compression_counters.compressed_size += bytes_xmit - 8; Is bytes_xmit being less than 8 for some reason? Dave > ram_transferred_add(bytes_xmit); > > if (param->zero_page) { > -- > 2.25.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
* David Hildenbrand (da...@redhat.com) wrote: > This is a follow-up on "util: NUMA aware memory preallocation" [1] by > Michal. > > Setting the CPU affinity of threads from inside QEMU usually isn't > easily possible, because we don't want QEMU -- once started and running > guest code -- to be able to mess up the system. QEMU disallows relevant > syscalls using seccomp, such that any such invocation will fail. > > Especially for memory preallocation in memory backends, the CPU affinity > can significantly increase guest startup time, for example, when running > large VMs backed by huge/gigantic pages, because of NUMA effects. For > NUMA-aware preallocation, we have to set the CPU affinity, however: > > (1) Once preallocation threads are created during preallocation, management > tools cannot intercept anymore to change the affinity. These threads > are created automatically on demand. > (2) QEMU cannot easily set the CPU affinity itself. > (3) The CPU affinity derived from the NUMA bindings of the memory backend > might not necessarily be exactly the CPUs we actually want to use > (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs). > > There is an easy "workaround". If we have a thread with the right CPU > affinity, we can simply create new threads on demand via that prepared > context. So, all we have to do is setup and create such a context ahead > of time, to then configure preallocation to create new threads via that > environment. > > So, let's introduce a user-creatable "thread-context" object that > essentially consists of a context thread used to create new threads. > QEMU can either try setting the CPU affinity itself ("cpu-affinity", > "node-affinity" property), or upper layers can extract the thread id > ("thread-id" property) to configure it externally. > > Make memory-backends consume a thread-context object > (via the "prealloc-context" property) and use it when preallocating to > create new threads with the desired CPU affinity. Further, to make it > easier to use, allow creation of "thread-context" objects, including > setting the CPU affinity directly from QEMU, before enabling the > sandbox option. > > > Quick test on a system with 2 NUMA nodes: > > Without CPU affinity: > time qemu-system-x86_64 \ > -object > memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind > \ > -nographic -monitor stdio > > real0m5.383s > real0m3.499s > real0m5.129s > real0m4.232s > real0m5.220s > real0m4.288s > real0m3.582s > real0m4.305s > real0m5.421s > real0m4.502s > > -> It heavily depends on the scheduler CPU selection > > With CPU affinity: > time qemu-system-x86_64 \ > -object thread-context,id=tc1,node-affinity=0 \ > -object > memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 > \ > -sandbox enable=on,resourcecontrol=deny \ > -nographic -monitor stdio > > real0m1.959s > real0m1.942s > real0m1.943s > real0m1.941s > real0m1.948s > real0m1.964s > real0m1.949s > real0m1.948s > real0m1.941s > real0m1.937s > > On reasonably large VMs, the speedup can be quite significant. > > While this concept is currently only used for short-lived preallocation > threads, nothing major speaks against reusing the concept for other > threads that are harder to identify/configure -- except that > we need additional (idle) context threads that are otherwise left unused. 
> > This series does not yet tackle concurrent preallocation of memory backends. Memory backend objects are created and memory is preallocated one memory backend at a time -- and there is currently no way to do preallocation asynchronously. Since you seem to have a full set of r-b's - do you intend to merge this as-is or do the concurrent preallocation first? Dave > [1] > https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com > > v1 -> v2: > * Fixed some minor style nits > * "util: Introduce ThreadContext user-creatable object" > -> Improve documentation and patch description. [Markus] > * "util: Add write-only "node-affinity" property for ThreadContext" > -> Improve documentation and patch description. [Markus] > > RFC -> v1: > * "vl: Allow ThreadContext objects to be created before the sandbox option" > -> Move parsing of the "name" property before object_create_pre_sandbox > * Added RB's > > Cc: Michal Privoznik > Cc: Igor Mammedov > Cc: "Michael S. Tsirkin" > Cc: Paolo Bonzini > Cc: "Daniel P. Berrangé" > Cc: Eduardo Habkost > Cc: Dr. David Alan Gilbert > Cc: Eric Blake > Cc: Markus Armbruster > Cc: Richard Henderson > Cc: Stefan Weil > > David Hildenbrand (7): >
Re: [PATCH v2 05/11] blockjob: implement .change_aio_ctx in child_job
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > child_job_change_aio_ctx() is very similar to > child_job_can_set_aio_ctx(), but it implements a new transaction > so that if all checks pass, the new transaction's .commit() > will take care of changing the BlockJob AioContext. > child_job_set_aio_ctx_commit() is similar to child_job_set_aio_ctx(), > but it doesn't need to invoke the recursion, as this is already > taken care of by child_job_change_aio_ctx(). > > Note: bdrv_child_try_change_aio_context() is not called by > anyone at this point. > > Reviewed-by: Hanna Reitz > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v2 07/11] block-backend: implement .change_aio_ctx in child_root
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > blk_root_change_aio_ctx() is very similar to blk_root_can_set_aio_ctx(), > but implements a new transaction so that if all checks pass, the new > transaction's .commit will take care of changing the BlockBackend > AioContext. blk_root_set_aio_ctx_commit() is the same as > blk_root_set_aio_ctx(). > > Note: bdrv_child_try_change_aio_context() is not called by > anyone at this point. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v2 06/11] block: implement .change_aio_ctx in child_of_bds
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > bdrv_child_cb_change_aio_ctx() is identical to > bdrv_child_cb_can_set_aio_ctx(), as we only need > to recursively go on the parent bs. > > Note: bdrv_child_try_change_aio_context() is not called by > anyone at this point. > > Reviewed-by: Hanna Reitz > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFPD table
Hi Cedric, On [2022 Oct 10] Mon 11:58:40, Michael Walle wrote: > Am 2022-10-10 08:23, schrieb Cédric Le Goater: > > On 10/7/22 16:44, Francisco Iglesias wrote: > > > > > --- a/hw/block/m25p80.c > > > > +++ b/hw/block/m25p80.c > > > > @@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = { > > > > { INFO("mx25l12855e", 0xc22618, 0, 64 << 10, 256, 0) }, > > > > { INFO6("mx25l25635e", 0xc22019, 0xc22019, 64 << 10, > > > > 512, 0), > > > > .sfdp_read = m25p80_sfdp_mx25l25635e }, > > > > +{ INFO6("mx25l25635f", 0xc22019, 0xc22019, 64 << 10, > > > > 512, 0), I think I missed the (ER_4K | ER_32K) flags above (in case we go for a v4 we can add it in). > > > > > > I think I'm not seeing the extended id part in the datasheet I've > > > found so > > > might be that you can switch to just INFO and _ext_id 0 above > > > > This was added by commit 6bbe036f32dc ("m25p80: Return the JEDEC ID > > twice for > > mx25l25635e") to fix a real breakage on HW. > > From my experience, the ID has a particular length, at least three bytes > and if you read past that length for some (all?) devices the id bytes just > get repeated. I.e. the counter in the device will just wrap to offset 0 > again. If you want to emulate the hardware correctly, you would have to > take that into consideration. If we decide to go with Michael's proposal above you can use '0' on the 'extended_id' and enable 's->data_read_loop = true' when reading the ID. Best regards, Francisco > But I don't think it's worth it, OTOH there seems to be some broken > software which rely on that (undefined?) behavior. > > -michael
Re: [PATCH 1/6] device-tree: add re-randomization helper function
On Thu, 6 Oct 2022 at 14:16, Peter Maydell wrote: > > On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld wrote: > > > > When the system reboots, the rng-seed that the FDT has should be > > re-randomized, so that the new boot gets a new seed. Several > > architectures require this functionality, so export a function for > > injecting a new seed into the given FDT. > > > > Cc: Alistair Francis > > Cc: David Gibson > > Signed-off-by: Jason A. Donenfeld > > Hi; I've applied this series to target-arm.next (seems the easiest way > to take it into the tree). Unfortunately it turns out that this breaks the reverse-debugging test that is part of 'make check-avocado'. Running all of 'check-avocado' takes a long time, so here's how to run the specific test: make -C your-build-tree check-venv # Only for the first time your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/boot_linux.py Probably more convenient though is to run the equivalent commands by hand: wget -O /tmp/vmlinuz https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz ./build/x86/qemu-img create -f qcow2 /tmp/disk.qcow2 128M ./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio -cpu cortex-a53 -icount shift=7,rr=record,rrfile=/tmp/qemu.rr,rrsnapshot=init -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz # this will boot the kernel to the no-root-fs panic; hit ctrl-C when it gets there ./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio -cpu cortex-a53 -icount shift=7,rr=replay,rrfile=/tmp/qemu.rr,rrsnapshot=init -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz # same command line, but 'replay' rather than 'record', QEMU will exit with an error: qemu-system-aarch64: Missing random event in the replay log Without these patches the replay step will replay the recorded execution up to the guest panic. The error is essentially the record-and-replay subsystem saying "the replay just asked for a random number at a point when the recording did not ask for one, and so there's no 'this is what the number was' info in the record". I have had a quick look, and I think the reason for this is that load_snapshot() ("reset the VM state to the snapshot state stored in the disk image or migration stream") does a system reset. The replay process involves a lot of "load state from a snapshot and play forwards from there" operations. It doesn't expect that load_snapshot() would result in something reading random data, but now that we are calling qemu_guest_getrandom() in a reset hook, that happens. I'm not sure exactly what the best approach here is, so I've cc'd the migration and replay submaintainers. For the moment I'm dropping this patchset from target-arm.next. thanks -- PMM
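For context, the helper the series adds is roughly the following (a simplified sketch of the patch, not verbatim); the qemu_guest_getrandom_nofail() call at the bottom is the extra random event that trips the replay log when it runs from the reset hook:

    void qemu_fdt_randomize_seeds(void *fdt)
    {
        int noffset, poffset, len;
        const char *name;
        uint8_t *data;

        /* walk every node and refresh any "rng-seed" property in place */
        for (noffset = fdt_next_node(fdt, 0, NULL);
             noffset >= 0;
             noffset = fdt_next_node(fdt, noffset, NULL)) {
            for (poffset = fdt_first_property_offset(fdt, noffset);
                 poffset >= 0;
                 poffset = fdt_next_property_offset(fdt, poffset)) {
                data = (uint8_t *)fdt_getprop_by_offset(fdt, poffset,
                                                        &name, &len);
                if (!data || strcmp(name, "rng-seed") != 0) {
                    continue;
                }
                qemu_guest_getrandom_nofail(data, len);
            }
        }
    }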
Re: [PATCH 1/6] device-tree: add re-randomization helper function
On Mon, 10 Oct 2022 at 11:54, Peter Maydell wrote: > > On Thu, 6 Oct 2022 at 14:16, Peter Maydell wrote: > > > > On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld wrote: > > > > > > When the system reboots, the rng-seed that the FDT has should be > > > re-randomized, so that the new boot gets a new seed. Several > > > architectures require this functionality, so export a function for > > > injecting a new seed into the given FDT. > > > > > > Cc: Alistair Francis > > > Cc: David Gibson > > > Signed-off-by: Jason A. Donenfeld > > > > Hi; I've applied this series to target-arm.next (seems the easiest way > > to take it into the tree). > > Unfortunately it turns out that this breaks the reverse-debugging > test that is part of 'make check-avocado'. > > Running all of 'check-avocado' takes a long time, so here's how > to run the specific test: > > make -C your-build-tree check-venv # Only for the first time > your-build-tree/tests/venv/bin/avocado run > your-build-tree/tests/avocado/boot_linux.py derp, wrong test name, should be your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/reverse_debugging.py -- PMM
RE: [PATCH] migration: Fix the minus value for compressed_size
>-Original Message- >From: Dr. David Alan Gilbert >Sent: Monday, October 10, 2022 6:35 PM >To: Duan, Zhenzhong >Cc: qemu-devel@nongnu.org; quint...@redhat.com >Subject: Re: [PATCH] migration: Fix the minus value for compressed_size > >* Zhenzhong Duan (zhenzhong.d...@intel.com) wrote: >> When update_compress_thread_counts() is called first time, there is no >> data stream yet. We see compression_counters.compressed_size >> becomes minus value shortly. >> >> Signed-off-by: Zhenzhong Duan >> --- >> migration/ram.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/migration/ram.c b/migration/ram.c index >> dc1de9ddbc68..510db95cdc36 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, >> z_stream *stream, RAMBlock *block, static void >> update_compress_thread_counts(const CompressParam *param, int >> bytes_xmit) { >> +if (bytes_xmit <= 0) { >> +return; >> +} > >What's the call path where that happens? The only place I see bytes_xmit >being less than 0 is in compress_page_with_multi_thread where it's initialised >to -1 - but it's always updated before the call to >update_compress_thread_counts. > >I wonder if the real problem is: > >compression_counters.compressed_size += bytes_xmit - 8; > >Is bytes_xmit being less than 8 for some reason? bytes_xmit is 0 when update_compress_thread_counts() is called first time as comp_param[idx].file is empty, which makes compression_counters.compressed_size=-8 (gdb) bt #0 update_compress_thread_counts (param=0x7ffe340011d0, bytes_xmit=0) at ../migration/ram.c:1446 #1 0x55cbf480 in flush_compressed_data (rs=0x7ffe34259d40) at ../migration/ram.c:1486 #2 0x55cc0a53 in save_compress_page (rs=0x7ffe34259d40, block=0x56aab280, offset=0) at ../migration/ram.c:2260 #3 0x55cc0b0e in ram_save_target_page (rs=0x7ffe34259d40, pss=0x7ffece7fb800) at ../migration/ram.c:2290 #4 0x55cc0fd8 in ram_save_host_page (rs=0x7ffe34259d40, pss=0x7ffece7fb800) at ../migration/ram.c:2487 #5 0x55cc11ce in ram_find_and_save_block (rs=0x7ffe34259d40) at ../migration/ram.c:2576 #6 0x55cc2863 in ram_save_iterate (f=0x56a98b30, opaque=0x567c92e0 ) at ../migration/ram.c:3304 #7 0x55ae86bd in qemu_savevm_state_iterate (f=0x56a98b30, postcopy=false) at ../migration/savevm.c:1261 #8 0x55ad7500 in migration_iteration_run (s=0x56ad5390) at ../migration/migration.c:3757 #9 0x55ad7abb in migration_thread (opaque=0x56ad5390) at ../migration/migration.c:3988 #10 0x55f20012 in qemu_thread_start (args=0x56862180) at ../util/qemu-thread-posix.c:504 #11 0x778f5609 in start_thread (arg=) at pthread_create.c:477 #12 0x7781a163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) n (gdb) n (gdb) p compression_counters $6 = {pages = 0, busy = 0, busy_rate = 0, compressed_size = -8, compression_rate = 0} Thanks Zhenzhong
Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext
On 10.10.22 12:40, Dr. David Alan Gilbert wrote: * David Hildenbrand (da...@redhat.com) wrote: This is a follow-up on "util: NUMA aware memory preallocation" [1] by Michal. Setting the CPU affinity of threads from inside QEMU usually isn't easily possible, because we don't want QEMU -- once started and running guest code -- to be able to mess up the system. QEMU disallows relevant syscalls using seccomp, such that any such invocation will fail. Especially for memory preallocation in memory backends, the CPU affinity can significantly increase guest startup time, for example, when running large VMs backed by huge/gigantic pages, because of NUMA effects. For NUMA-aware preallocation, we have to set the CPU affinity, however: (1) Once preallocation threads are created during preallocation, management tools cannot intercept anymore to change the affinity. These threads are created automatically on demand. (2) QEMU cannot easily set the CPU affinity itself. (3) The CPU affinity derived from the NUMA bindings of the memory backend might not necessarily be exactly the CPUs we actually want to use (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs). There is an easy "workaround". If we have a thread with the right CPU affinity, we can simply create new threads on demand via that prepared context. So, all we have to do is setup and create such a context ahead of time, to then configure preallocation to create new threads via that environment. So, let's introduce a user-creatable "thread-context" object that essentially consists of a context thread used to create new threads. QEMU can either try setting the CPU affinity itself ("cpu-affinity", "node-affinity" property), or upper layers can extract the thread id ("thread-id" property) to configure it externally. Make memory-backends consume a thread-context object (via the "prealloc-context" property) and use it when preallocating to create new threads with the desired CPU affinity. Further, to make it easier to use, allow creation of "thread-context" objects, including setting the CPU affinity directly from QEMU, before enabling the sandbox option. Quick test on a system with 2 NUMA nodes: Without CPU affinity: time qemu-system-x86_64 \ -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \ -nographic -monitor stdio real0m5.383s real0m3.499s real0m5.129s real0m4.232s real0m5.220s real0m4.288s real0m3.582s real0m4.305s real0m5.421s real0m4.502s -> It heavily depends on the scheduler CPU selection With CPU affinity: time qemu-system-x86_64 \ -object thread-context,id=tc1,node-affinity=0 \ -object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \ -sandbox enable=on,resourcecontrol=deny \ -nographic -monitor stdio real0m1.959s real0m1.942s real0m1.943s real0m1.941s real0m1.948s real0m1.964s real0m1.949s real0m1.948s real0m1.941s real0m1.937s On reasonably large VMs, the speedup can be quite significant. While this concept is currently only used for short-lived preallocation threads, nothing major speaks against reusing the concept for other threads that are harder to identify/configure -- except that we need additional (idle) context threads that are otherwise left unused. This series does not yet tackle concurrent preallocation of memory backends. 
Memory backend objects are created and memory is preallocated one memory backend at a time -- and there is currently no way to do preallocation asynchronously. Hi Dave, Since you seem to have a full set of r-b's - do you intend to merge this as-is or do the concurrent preallocation first? I intend to merge this as is, as it provides a benefit as it stands and concurrent preallocation might not require user interface changes. I do have some ideas on how to implement concurrent preallocation, but it needs more thought (and more importantly, time). -- Thanks, David / dhildenb
Re: [PATCH v6 08/13] numa: use QLIST_FOREACH_SAFE() for RAM block notifiers
On 06.10.22 23:35, Stefan Hajnoczi wrote: Make list traversal work when a callback removes a notifier mid-traversal. This is a cleanup to prevent bugs in the future. Signed-off-by: Stefan Hajnoczi --- hw/core/numa.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index 31e6fe1caa..ea24a5fa8c 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -857,8 +857,9 @@ void ram_block_notifier_remove(RAMBlockNotifier *n) void ram_block_notify_add(void *host, size_t size, size_t max_size) { RAMBlockNotifier *notifier; +RAMBlockNotifier *next; -QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) { +QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) { if (notifier->ram_block_added) { notifier->ram_block_added(notifier, host, size, max_size); } @@ -868,8 +869,9 @@ void ram_block_notify_add(void *host, size_t size, size_t max_size) void ram_block_notify_remove(void *host, size_t size, size_t max_size) { RAMBlockNotifier *notifier; +RAMBlockNotifier *next; -QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) { +QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) { if (notifier->ram_block_removed) { notifier->ram_block_removed(notifier, host, size, max_size); } @@ -879,8 +881,9 @@ void ram_block_notify_remove(void *host, size_t size, size_t max_size) void ram_block_notify_resize(void *host, size_t old_size, size_t new_size) { RAMBlockNotifier *notifier; +RAMBlockNotifier *next; -QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) { +QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) { if (notifier->ram_block_resized) { notifier->ram_block_resized(notifier, host, old_size, new_size); } Reviewed-by: David Hildenbrand -- Thanks, David / dhildenb
Re: [PATCH] target/arm: Make the final stage1+2 write to secure be unconditional
On Fri, 7 Oct 2022 at 17:53, Richard Henderson wrote: > > On 10/7/22 09:20, Peter Maydell wrote: > >> -/* Check if IPA translates to secure or non-secure PA space. > >> */ > >> -if (is_secure) { > >> -if (ipa_secure) { > >> -result->attrs.secure = > >> -!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)); > >> -} else { > >> -result->attrs.secure = > >> -!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW)) > >> -|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW))); > >> -} > >> -} > > > > If: > > is_secure == true > > ipa_secure == false > > (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW) is non-zero > > (env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW) is zero > > then the old code sets attrs.secure to true... > > No, I think the misalignment of the two lines wrt the !() may have been > confusing: Ah, yes -- I was indeed confused by the misalignment. thanks -- PMM
Re: [PATCH v4] x86: add etc/phys-bits fw_cfg file
Hi, > > > > Given newer processors have more than 40 and for older ones we know > > > > the possible values for the two relevant x86 vendors we could do > > > > something along the lines of: > > > > > > > >phys-bits >= 41 -> valid > > > >phys-bits == 40+ AuthenticAMD -> valid > > > >phys-bits == 36,39 + GenuineIntel -> valid > > > >everything else -> invalid > I dropped the patch for now. You can drop it forever. For the mail archives and anyone interested: The approach outlined above appears to work well, patches just landed in edk2 master branch. Next edk2 stable tag (2022-11) will have it. take care, Gerd
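For the archives, the heuristic above boils down to something like this -- illustrative C, not the actual edk2 code (vendor is assumed to be the 12-byte CPUID vendor string):

    static bool phys_bits_plausible(unsigned phys_bits, const char *vendor)
    {
        if (phys_bits >= 41) {
            return true;
        }
        if (phys_bits == 40 && strcmp(vendor, "AuthenticAMD") == 0) {
            return true;
        }
        if ((phys_bits == 36 || phys_bits == 39) &&
            strcmp(vendor, "GenuineIntel") == 0) {
            return true;
        }
        return false;
    }

Anything rejected would presumably fall back to a conservative default instead of trusting the reported value.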
Re: [PULL 28/54] configure: build ROMs with container-based cross compilers
On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote: > From: Paolo Bonzini > > s390-ccw remains a bit more complex, because the -march=z900 test is done > only for the native cross compiler. Otherwise, all that is needed is > to pass the (now mandatory) target argument to write_target_makefile. > > Signed-off-by: Paolo Bonzini > Signed-off-by: Alex Bennée > Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org> I'm not at all convinced this change was/is a good idea. First of all, it causes 'make' to now download about 1 GB of container images $ ./configure --target-list=x86_64-softmmu $ make ...snip... BUILD debian-powerpc-test-cross Trying to pull registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest... Getting image source signatures Copying blob 2a205c8a1d36 [=>] 12.4MiB / 257.2MiB ... ...snip... Despite downloading this image, it then proceeded to rebuild the image from scratch, requiring another few 100MBs of downloads of dpkgs. This time the download was without progress information until it entirely failed due to a dead Debian mirror server, needing a retry. It then went on to download an s390x image which seems to have two layers, each with 360 MB. BUILD debian-s390x-cross Trying to pull registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest... Getting image source signatures Copying blob fc8d65e34cd5 [>-] 12.0MiB / 360.2MiB Copying blob bd159e379b3b skipped: already exists Copying blob 13224e2971af [>-] 12.2MiB / 366.5MiB So overall it was more than 1 GB of downloads when typing 'make'. I wasn't too amused by seeing this downloaded data, given that I'm usually running off a 4G mobile connection, and it took a very long time. The progress information printed by docker when downloading the images splatters all over the output meson displays when doing a parallel make, making everything unintelligible. Finally, I had requested only building x86_64, so we shouldn't be doing anything related to ppc or s390 at all, but even so, AFAICT, it enables this downloading unconditionally merely by having 'docker'/'podman' binaries installed, if you don't otherwise have cross compilers present. I'd really not want to see any of this stuff downloaded without an explicit opt-in choice at configure time. I'm also a little concerned at what happens if we have to stop publishing the containers at registry.gitlab.com in future. Are we going to break the default 'make' for existing released QEMU tarballs ? Generally we've only relied on the gitlab infra for our CI testing, so we have been free to change infra or alter the way we publish images at any time, without risk of impact on the released tarballs. This isn't a theoretical problem, because GitLab has announced their intention to limit storage usage in gitlab.com, and even having joined the Open Source Program, our quota is only increased from 5 GB to 25 GB.
I'd be concerned we're at risk of exceeding that 25 GB limit, when they start to enforce it, requiring us to move container image host to somewhere else such as quay.io > diff --git a/configure b/configure > index c175650eb9..a54e17aca9 100755 > --- a/configure > +++ b/configure > @@ -2152,7 +2152,7 @@ probe_target_compiler() { > target_ranlib= > target_strip= >fi > - test -n "$target_cc" > + test -n "$target_cc" || test -n "$container_image" > } > > write_target_makefile() { > @@ -2307,7 +2307,7 @@ if test "$targetos" != "darwin" && test "$targetos" != > "sunos" && \ > config_mak=pc-bios/optionrom/config.mak > echo "# Automatically generated by configure - do not modify" > > $config_mak > echo "TOPSRC_DIR=$source_path" >> $config_mak > -write_target_makefile >> $config_mak > +write_target_makefile pc-bios/optionrom/all >> $config_mak > fi > > if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then > @@ -2315,25 +2315,31 @@ if test "$softmmu" = yes && probe_target_compiler > ppc-softmmu; then > config_mak=pc-bios/vof/config.mak > echo "# Automatically generated by configure - do not modify" > > $config_mak > echo "SRC_DIR=$source_path/pc-bios/vof" >> $config_mak > -write_target_makefile >> $config_mak > +write_target_makefile pc-bios/vof/all >> $config_mak > fi > > # Only build s390-ccw bios if the compiler has -march=z900 or -march=z10 > # (which is the lowest architecture level that Clang supports) > if test "$softmmu" = yes && probe_target_compiler s390x-softmmu; then > - write_c_skeleton > - do_compiler "$target_cc" $target_cc_cflags -march=z900 -o $TMPO -c $TMPC > - has_z900=$? > - if [ $has_z900 = 0 ] || do_compiler "$target_cc" $target_cc_cflags > -march=z10 -msoft-float -Werror -o $TMPO -c $TMPC; then > -if [ $has_z900 != 0 ]; then > - echo "WARNING: Your compiler does not support the
Re: [PATCH v2 08/11] block: use the new _change_ API instead of _can_set_ and _set_
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > Replace all direct usage of ->can_set_aio_ctx and ->set_aio_ctx, > and call bdrv_child_try_change_aio_context() in > bdrv_try_set_aio_context(), the main function called through > the whole block layer. > > From this point onwards, ->can_set_aio_ctx and ->set_aio_ctx > won't be used anymore. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v2 09/11] block: remove all unused ->can_set_aio_ctx and ->set_aio_ctx callbacks
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > Together with all _can_set_ and _set_ APIs, as they are not needed > anymore. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v2 10/11] block: rename bdrv_child_try_change_aio_context in bdrv_try_change_aio_context
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > No functional changes intended. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PATCH v2 11/11] block: remove bdrv_try_set_aio_context and replace it with bdrv_try_change_aio_context
On 25.07.2022 at 14:21, Emanuele Giuseppe Esposito wrote: > No functional change intended. > > Signed-off-by: Emanuele Giuseppe Esposito Reviewed-by: Kevin Wolf
Re: [PULL 28/54] configure: build ROMs with container-based cross compilers
On Mon, Oct 10, 2022 at 01:54:16PM +0100, Daniel P. Berrangé wrote: > On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote: > > From: Paolo Bonzini > > > > s390-ccw remains a bit more complex, because the -march=z900 test is done > > only for the native cross compiler. Otherwise, all that is needed is > > to pass the (now mandatory) target argument to write_target_makefile. > > > > Signed-off-by: Paolo Bonzini > > Signed-off-by: Alex Bennée > > Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org> > > I'm not at all convinced this change was/is a good idea. > > First of all, it causes 'make' to now download about 1 GB of > container images > > $ ./configure --target-list=x86_64-softmmu > $ make > ...snip... > BUILD debian-powerpc-test-cross > Trying to pull > registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest... > Getting image source signatures > Copying blob 2a205c8a1d36 [=>] 12.4MiB > / 257.2MiB > > ... > ...snip... > > Despite downloading this image, it then proceeded to rebuild the > image from scratch, requiring another few 100MBs of downloads > of dpkgs. This time the download was without progress information > until it entirely failed due to a dead Debian mirror server, needing > a retry. This failure seems worse than I realized. It also failed on a non-responsive mirror when re-building the s390 image. When I re-ran 'make' it definitely did not retry the re-build process, as 'BUILD debian-s390x-cross' was instantaneous. So it looks like it is using the downloaded cached image, despite the previous 'make' apparently considering that to have been outdated content needing a rebuild. FWIW the failure output was: BUILD debian-s390x-cross Trying to pull registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest... Getting image source signatures Copying blob fc8d65e34cd5 [===>--] 341.6MiB / 360.2MiB Copying blob fc8d65e34cd5 done Copying blob bd159e379b3b skipped: already exists Copying blob 13224e2971af done Copying config 67a127f7cd done Writing manifest to image destination Storing signatures debconf: delaying package configuration, since apt-utils is not installed E: Failed to fetch http://deb.debian.org/debian/pool/main/s/systemd/systemd_247.3-7%2bdeb11u1_amd64.deb 400 Bad Request [IP: 199.232.122.132 80] E: Failed to fetch http://deb.debian.org/debian/pool/main/p/perl/perl_5.32.1-4%2bdeb11u2_amd64.deb Error reading from server - read (104: Connection reset by peer) [IP: 199.232.122.132 80] E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Error: error building at STEP "RUN export DEBIAN_FRONTEND=noninteractive && apt-get update && apt-get install -y eatmydata && eatmydata apt-get dist-upgrade -y && eatmydata apt-get install --no-install-recommends -y bash bc bison bsdextrautils bzip2 ca-certificates ccache dbus debianutils diffutils exuberant-ctags findutils flex gcovr genisoimage gettext git hostname libglib2.0-dev libpcre2-dev libspice-protocol-dev llvm locales make meson ncat ninja-build openssh-client perl-base pkgconf python3 python3-numpy python3-opencv python3-pillow python3-pip python3-sphinx python3-sphinx-rtd-theme python3-venv python3-yaml rpm2cpio sed sparse tar tesseract-ocr tesseract-ocr-eng texinfo && eatmydata apt-get autoremove -y && eatmydata apt-get autoclean -y && sed -Ei 's,^# (en_US\.UTF-8 .*)$,\1,' /etc/locale.gen && dpkg-reconfigure locales": error while running runtime: exit status 100 Traceback (most recent call last): File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 683, in sys.exit(main()) File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 679, in main return args.cmdobj.run(args, argv) File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 493, in run dkr.build_image(tag, docker_dir, dockerfile, File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 343, in build_image self._do_check(build_args, File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 247, in _do_check return subprocess.check_call(self._command + cmd, **kwargs) File "/usr/lib64/python3.10/subprocess.py", line 369, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command
Re: [PATCH v3 00/42] target/arm: Implement FEAT_HAFDBS
On Sat, 1 Oct 2022 at 17:24, Richard Henderson wrote: > > This is a major reorg to arm page table walking. While the result > here is "merely" Hardware-assisted Access Flag and Dirty Bit Setting > (HAFDBS), the ultimate goal is the Realm Management Extension (RME). > RME "recommends" that HAFDBS be implemented (I_CSLWZ). > I've taken patches 1-20 plus the extra patch 1.5 into target-arm.next (last patch taken "Use tlb_set_page_full"). thanks -- PMM
Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3
On Mon, 3 Oct 2022 at 15:57, Jerome Forissier wrote: > > According to the Linux kernel booting.rst [1], CPTR_EL3.ESM and > SCR_EL3.EnTP2 must be initialized to 1 when EL3 is present and FEAT_SME > is advertised. This has to be taken care of when QEMU boots directly > into the kernel (i.e., "-M virt,secure=on -cpu max -kernel Image"). > > Cc: qemu-sta...@nongnu.org > Fixes: 78cb9776662a ("target/arm: Enable SME for -cpu max") > Link: [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.rst?h=v6.0#n321 Ooh, that detailed set of control bit requirements is new since I last read that doc -- we should probably go through and cross-check that we're setting them all correctly. > Signed-off-by: Jerome Forissier > --- > hw/arm/boot.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/hw/arm/boot.c b/hw/arm/boot.c > index ada2717f76..ee3858b673 100644 > --- a/hw/arm/boot.c > +++ b/hw/arm/boot.c > @@ -763,6 +763,10 @@ static void do_cpu_reset(void *opaque) > if (cpu_isar_feature(aa64_sve, cpu)) { > env->cp15.cptr_el[3] |= R_CPTR_EL3_EZ_MASK; > } > +if (cpu_isar_feature(aa64_sme, cpu)) { > +env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK; > +env->cp15.scr_el3 |= SCR_ENTP2; > +} The doc also says that we (as fake EL3) should be setting SMCR_EL3.LEN to the same value for all CPUs. Currently we do do that, but it's always the reset value of 0. Richard: does that have any odd effects (I have a feeling it clamps the VL to the minimum supported value)? Should we be setting SMCR_EL3.LEN to the max supported value here ? thanks -- PMM
Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3
On Mon, 10 Oct 2022 at 14:16, Peter Maydell wrote: > > On Mon, 3 Oct 2022 at 15:57, Jerome Forissier > wrote: > > > > According to the Linux kernel booting.rst [1], CPTR_EL3.ESM and > > SCR_EL3.EnTP2 must be initialized to 1 when EL3 is present and FEAT_SME > > is advertised. This has to be taken care of when QEMU boots directly > > into the kernel (i.e., "-M virt,secure=on -cpu max -kernel Image"). > > > > Cc: qemu-sta...@nongnu.org > > Fixes: 78cb9776662a ("target/arm: Enable SME for -cpu max") > > Link: [1] > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.rst?h=v6.0#n321 > > Ooh, that detailed set of control bit requirements is new > since I last read that doc -- we should probably go through > and cross-check that we're setting them all correctly. > > > Signed-off-by: Jerome Forissier > > --- > > hw/arm/boot.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/hw/arm/boot.c b/hw/arm/boot.c > > index ada2717f76..ee3858b673 100644 > > --- a/hw/arm/boot.c > > +++ b/hw/arm/boot.c > > @@ -763,6 +763,10 @@ static void do_cpu_reset(void *opaque) > > if (cpu_isar_feature(aa64_sve, cpu)) { > > env->cp15.cptr_el[3] |= R_CPTR_EL3_EZ_MASK; > > } > > +if (cpu_isar_feature(aa64_sme, cpu)) { > > +env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK; > > +env->cp15.scr_el3 |= SCR_ENTP2; > > +} > > The doc also says that we (as fake EL3) should be setting > SMCR_EL3.LEN to the same value for all CPUs. Currently we do > do that, but it's always the reset value of 0. Richard: does > that have any odd effects (I have a feeling it clamps the > VL to the minimum supported value)? Should we be setting > SMCR_EL3.LEN to the max supported value here ? In any case, this patch is right as far as it goes and obviously improves the current situation, and it puts SME into the same place as SVE, so I'll take this into target-arm.next. -- PMM
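If we do end up setting SMCR_EL3.LEN here, it would slot in next to the new lines -- a sketch only, assuming SMCR_EL3 is reachable as env->vfp.smcr_el[3] and that LEN occupies the low four bits, so 0xf requests the maximum vector length:

    if (cpu_isar_feature(aa64_sme, cpu)) {
        env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK;
        env->cp15.scr_el3 |= SCR_ENTP2;
        /* acting as fake EL3 firmware: don't clamp the streaming VL */
        env->vfp.smcr_el[3] = 0xf;
    }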
[RFC PATCH v2 1/4] tests/acpi: virt: allow acpi MADT and FADT changes
Step 3 of the bios-tables-test.c documented procedure. Signed-off-by: Miguel Luis --- tests/qtest/bios-tables-test-allowed-diff.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h index dfb8523c8b..8dc50f7a8a 100644 --- a/tests/qtest/bios-tables-test-allowed-diff.h +++ b/tests/qtest/bios-tables-test-allowed-diff.h @@ -1 +1,7 @@ /* List of comma-separated changed AML files to ignore */ +"tests/data/acpi/virt/FACP", +"tests/data/acpi/virt/FACP.numamem", +"tests/data/acpi/virt/FACP.memhp", +"tests/data/acpi/virt/APIC", +"tests/data/acpi/virt/APIC.memhp", +"tests/data/acpi/virt/APIC.numamem", -- 2.37.3
[RFC PATCH v2 3/4] acpi: arm/virt: madt: bump to revision 4 according to ACPI 6.0 Errata A
MADT has been updated with the GIC Structures from ACPI 6.0 Errata A and so MADT revision and GICC Structure must be updated also. Fixes: 37f33084ed2e ("acpi: arm/virt: madt: use build_append_int_noprefix() API to compose MADT table") Signed-off-by: Miguel Luis Reviewed-by: Ani Sinha --- hw/arm/virt-acpi-build.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 72bb6f61a5..2d21e3cec4 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -686,7 +686,7 @@ build_dbg2(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) }; /* - * ACPI spec, Revision 5.1 Errata A + * ACPI spec, Revision 6.0 Errata A * 5.2.12 Multiple APIC Description Table (MADT) */ static void build_append_gicr(GArray *table_data, uint64_t base, uint32_t size) @@ -705,7 +705,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) int i; VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); const MemMapEntry *memmap = vms->memmap; -AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = vms->oem_id, +AcpiTable table = { .sig = "APIC", .rev = 4, .oem_id = vms->oem_id, .oem_table_id = vms->oem_table_id }; acpi_table_begin(&table, table_data); @@ -740,7 +740,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) /* 5.2.12.14 GIC Structure */ build_append_int_noprefix(table_data, 0xB, 1); /* Type */ -build_append_int_noprefix(table_data, 76, 1); /* Length */ +build_append_int_noprefix(table_data, 80, 1); /* Length */ build_append_int_noprefix(table_data, 0, 2);/* Reserved */ build_append_int_noprefix(table_data, i, 4);/* GIC ID */ build_append_int_noprefix(table_data, i, 4);/* ACPI Processor UID */ @@ -760,6 +760,10 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) build_append_int_noprefix(table_data, 0, 8);/* GICR Base Address*/ /* MPIDR */ build_append_int_noprefix(table_data, armcpu->mp_affinity, 8); +/* Processor Power Efficiency Class */ +build_append_int_noprefix(table_data, 0, 1); +/* Reserved */ +build_append_int_noprefix(table_data, 0, 3); } if (vms->gic_version != VIRT_GIC_VERSION_2) { @@ -771,12 +775,6 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } if (its_class_name() && !vmc->no_its) { -/* - * FIXME: Structure is from Revision 6.0 where 'GIC Structure' - * has additional fields on top of implemented 5.1 Errata A, - * to make it consistent with v6.0 we need to bump everything - * to v6.0 - */ /* * ACPI spec, Revision 6.0 Errata A * (original 6.0 definition has invalid Length) -- 2.37.3
[RFC PATCH v2 0/4] ACPI MADT and FADT update according to the ACPI 6.0 spec
The MADT table structure has been updated in commit 37f33084ed2e ("acpi: arm/virt: madt: use build_append_int_noprefix() API to compose MADT table") to include the 5.2.12.18 GIC ITS Structure and so the table's revision also needs to be updated. The MADT and FADT tables from the same spec need to be in sync, and in this case the FADT also needs to be updated. Revision 6.0 of the ACPI FADT table introduces the field "Hypervisor Vendor Identity" which is missing and must be included. Patch 2/4 includes a suggestion for the value of this field. Ref: https://uefi.org/sites/default/files/resources/ACPI_6_0_Errata_A.PDF Changelog: v2: patch 2/4: fix expression that checks for the revision number (Ani Sinha) use "QEMU" as the Hypervisor Vendor ID [1] (Ani Sinha) patch 3/4: add Reviewed-by tag from Ani Sinha v1: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00910.html Open to discussion; your comments, thoughts and suggestions are very welcome. Thanks in advance. Miguel [1]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00989.html Miguel Luis (4): tests/acpi: virt: allow acpi MADT and FADT changes acpi: fadt: support revision 6.0 of the ACPI specification acpi: arm/virt: madt: bump to revision 4 according to ACPI 6.0 Errata A tests/acpi: virt: update ACPI MADT and FADT binaries hw/acpi/aml-build.c | 13 ++--- hw/arm/virt-acpi-build.c | 26 -- tests/data/acpi/virt/APIC | Bin 168 -> 172 bytes tests/data/acpi/virt/APIC.memhp | Bin 168 -> 172 bytes tests/data/acpi/virt/APIC.numamem | Bin 168 -> 172 bytes tests/data/acpi/virt/FACP | Bin 268 -> 276 bytes tests/data/acpi/virt/FACP.memhp | Bin 268 -> 276 bytes tests/data/acpi/virt/FACP.numamem | Bin 268 -> 276 bytes 8 files changed, 22 insertions(+), 17 deletions(-) -- 2.37.3
[RFC PATCH v2 4/4] tests/acpi: virt: update ACPI MADT and FADT binaries
Step 6 & 7 of the bios-tables-test.c documented procedure. Differences between disassembled ASL files for MADT: @@ -11,9 +11,9 @@ */ [000h 4]Signature : "APIC"[Multiple APIC Description Table (MADT)] -[004h 0004 4] Table Length : 00A8 -[008h 0008 1] Revision : 03 -[009h 0009 1] Checksum : 50 +[004h 0004 4] Table Length : 00AC +[008h 0008 1] Revision : 04 +[009h 0009 1] Checksum : 47 [00Ah 0010 6] Oem ID : "BOCHS " [010h 0016 8] Oem Table ID : "BXPC" [018h 0024 4] Oem Revision : 0001 @@ -34,7 +34,7 @@ [041h 0065 3] Reserved : 00 [044h 0068 1]Subtable Type : 0B [Generic Interrupt Controller] -[045h 0069 1] Length : 4C +[045h 0069 1] Length : 50 [046h 0070 2] Reserved : [048h 0072 4] CPU Interface Number : [04Ch 0076 4]Processor UID : @@ -51,28 +51,29 @@ [07Ch 0124 4]Virtual GIC Interrupt : [080h 0128 8] Redistributor Base Address : [088h 0136 8]ARM MPIDR : -/ ACPI subtable terminates early - may be older version (dump table) */ +[090h 0144 1] Efficiency Class : 00 +[091h 0145 3] Reserved : 00 -[090h 0144 1]Subtable Type : 0D [Generic MSI Frame] -[091h 0145 1] Length : 18 -[092h 0146 2] Reserved : -[094h 0148 4] MSI Frame ID : -[098h 0152 8] Base Address : 0802 -[0A0h 0160 4]Flags (decoded below) : 0001 +[094h 0148 1]Subtable Type : 0D [Generic MSI Frame] +[095h 0149 1] Length : 18 +[096h 0150 2] Reserved : +[098h 0152 4] MSI Frame ID : +[09Ch 0156 8] Base Address : 0802 +[0A4h 0164 4]Flags (decoded below) : 0001 Select SPI : 1 -[0A4h 0164 2]SPI Count : 0040 -[0A6h 0166 2] SPI Base : 0050 +[0A8h 0168 2]SPI Count : 0040 +[0AAh 0170 2] SPI Base : 0050 -Raw Table Data: Length 168 (0xA8) +Raw Table Data: Length 172 (0xAC) -: 41 50 49 43 A8 00 00 00 03 50 42 4F 43 48 53 20 // APIC.PBOCHS +: 41 50 49 43 AC 00 00 00 04 47 42 4F 43 48 53 20 // APIC.GBOCHS 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43 // BXPCBXPC 0020: 01 00 00 00 00 00 00 00 00 00 00 00 0C 18 00 00 // 0030: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00 // -0040: 02 00 00 00 0B 4C 00 00 00 00 00 00 00 00 00 00 // .L.. +0040: 02 00 00 00 0B 50 00 00 00 00 00 00 00 00 00 00 // .P.. 0050: 01 00 00 00 00 00 00 00 17 00 00 00 00 00 00 00 // 0060: 00 00 00 00 00 00 01 08 00 00 00 00 00 00 04 08 // 0070: 00 00 00 00 00 00 03 08 00 00 00 00 00 00 00 00 // 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 // -0090: 0D 18 00 00 00 00 00 00 00 00 02 08 00 00 00 00 // -00A0: 01 00 00 00 40 00 50 00 // @.P. +0090: 00 00 00 00 0D 18 00 00 00 00 00 00 00 00 02 08 // +00A0: 00 00 00 00 01 00 00 00 40 00 50 00 // @.P. Differences between disassembled ASL files for FADT: @@ -11,9 +11,9 @@ */ [000h 4]Signature : "FACP"[Fixed ACPI Description Table (FADT)] -[004h 0004 4] Table Length : 010C -[008h 0008 1] Revision : 05 -[009h 0009 1] Checksum : 55 +[004h 0004 4] Table Length : 0114 +[008h 0008 1] Revision : 06 +[009h 0009 1] Checksum : 15 [00Ah 0010 6] Oem ID : "BOCHS " [010h 0016 8] Oem Table ID : "BXPC" [018h 0024 4] Oem Revision : 0001 @@ -99,7 +99,7 @@ PSCI Compliant : 1 Must use HVC for PSCI : 1 -[083h 0131 1] FADT Minor Revision : 01 +[083h 0131 1] FADT Minor Revision : 00 [084h 0132 8] FACS Address : [08Ch 0140 8] DSDT Address : [094h 0148 12] PM1A Event Block : [Generic Address Structure] @@ -173,11 +173,11 @@ [103h 0259 1] Encoded Access Width : 00 [Undefined/Legacy] [104h 0260 8] A
[RFC PATCH v2 2/4] acpi: fadt: support revision 6.0 of the ACPI specification
Update the Fixed ACPI Description Table (FADT) to revision 6.0 of the ACPI specification adding the field "Hypervisor Vendor Identity" that was missing. This field's description states the following: "64-bit identifier of hypervisor vendor. All bytes in this field are considered part of the vendor identity. These identifiers are defined independently by the vendors themselves, usually following the name of the hypervisor product. Version information should NOT be included in this field - this shall simply denote the vendor's name or identifier. Version information can be communicated through a supplemental vendor-specific hypervisor API. Firmware implementers would place zero bytes into this field, denoting that no hypervisor is present in the actual firmware." Hereupon, what should a valid identifier of an Hypervisor Vendor ID be and where should QEMU provide that information? On the v1 [1] of this RFC there's the suggestion of having this information in sync by the current acceleration name. This also seems to imply that QEMU, which generates the FADT table, and the FADT consumer need to be in sync with the values of this field. This version follows Ani Sinha's suggestion [2] of using "QEMU" for the hypervisor vendor ID. [1]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00911.html [2]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00989.html Signed-off-by: Miguel Luis --- hw/acpi/aml-build.c | 13 ++--- hw/arm/virt-acpi-build.c | 10 +- 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index e6bfac95c7..42feb4d4d7 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2070,7 +2070,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms, acpi_table_end(linker, &table); } -/* build rev1/rev3/rev5.1 FADT */ +/* build rev1/rev3/rev5.1/rev6.0 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) { @@ -2193,8 +2193,15 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, /* SLEEP_STATUS_REG */ build_append_gas_from_struct(tbl, &f->sleep_sts); -/* TODO: extra fields need to be added to support revisions above rev5 */ -assert(f->rev == 5); +if (f->rev == 5) { +goto done; +} + +/* Hypervisor Vendor Identity */ +build_append_padded_str(tbl, "QEMU", 8, '\0'); + +/* TODO: extra fields need to be added to support revisions above rev6 */ +assert(f->rev == 6); done: acpi_table_end(linker, &table); diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9b3aee01bf..72bb6f61a5 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -809,13 +809,13 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) } /* FADT */ -static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker, +static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms, unsigned dsdt_tbl_offset) { -/* ACPI v5.1 */ +/* ACPI v6.0 */ AcpiFadtData fadt = { -.rev = 5, -.minor_ver = 1, +.rev = 6, +.minor_ver = 0, .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI, .xdsdt_tbl_offset = &dsdt_tbl_offset, }; @@ -945,7 +945,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) /* FADT MADT PPTT GTDT MCFG SPCR DBG2 pointed to by RSDT */ acpi_add_table(table_offsets, tables_blob); -build_fadt_rev5(tables_blob, tables->linker, vms, dsdt); +build_fadt_rev6(tables_blob, tables->linker, vms, dsdt); acpi_add_table(table_offsets, tables_blob); 
build_madt(tables_blob, tables->linker, vms); -- 2.37.3
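For clarity about the bytes this emits: build_append_padded_str(tbl, "QEMU", 8, '\0') is assumed to produce 'Q' 'E' 'M' 'U' followed by four NUL bytes (51 45 4d 55 00 00 00 00) in the Hypervisor Vendor Identity field. A hypothetical helper with that behavior, shown only to pin down the layout, would be:

    #include <assert.h>
    #include <string.h>
    #include <glib.h>

    static void append_padded_str(GArray *tbl, const char *str,
                                  size_t width, char pad)
    {
        size_t len = strlen(str);

        assert(len <= width);
        g_array_append_vals(tbl, str, len);     /* "QEMU" */
        for (size_t i = len; i < width; i++) {
            g_array_append_vals(tbl, &pad, 1);  /* NUL padding to 8 bytes */
        }
    }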
[PATCH v2 01/11] migration: support file: uri for source migration
Implement support for a "file:" uri so that a migration can be initiated directly to a file from QEMU. Signed-off-by: Nikolay Borisov --- migration/file.c | 23 +++ migration/file.h | 9 + migration/meson.build | 1 + migration/migration.c | 3 +++ 4 files changed, 36 insertions(+) create mode 100644 migration/file.c create mode 100644 migration/file.h diff --git a/migration/file.c b/migration/file.c new file mode 100644 index ..02896a7cab99 --- /dev/null +++ b/migration/file.c @@ -0,0 +1,23 @@ +#include "qemu/osdep.h" +#include "channel.h" +#include "io/channel-file.h" +#include "file.h" +#include "qemu/error-report.h" + + +void file_start_outgoing_migration(MigrationState *s, const char *fname, Error **errp) +{ + QIOChannelFile *ioc; + + ioc = qio_channel_file_new_path(fname, O_CREAT|O_TRUNC|O_WRONLY, 0660, errp); + if (!ioc) { + error_report("Error creating a channel"); + return; + } + + qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-outgoing"); + migration_channel_connect(s, QIO_CHANNEL(ioc), NULL, NULL); + object_unref(OBJECT(ioc)); +} + + diff --git a/migration/file.h b/migration/file.h new file mode 100644 index ..d476eb1157f9 --- /dev/null +++ b/migration/file.h @@ -0,0 +1,9 @@ +#ifndef QEMU_MIGRATION_FILE_H +#define QEMU_MIGRATION_FILE_H + +void file_start_outgoing_migration(MigrationState *s, + const char *filename, + Error **errp); + +#endif + diff --git a/migration/meson.build b/migration/meson.build index 690487cf1a81..30a8392701c3 100644 --- a/migration/meson.build +++ b/migration/meson.build @@ -17,6 +17,7 @@ softmmu_ss.add(files( 'colo.c', 'exec.c', 'fd.c', + 'file.c', 'global_state.c', 'migration.c', 'multifd.c', diff --git a/migration/migration.c b/migration/migration.c index bb8bbddfe467..8813b78b9a6b 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -20,6 +20,7 @@ #include "migration/blocker.h" #include "exec.h" #include "fd.h" +#include "file.h" #include "socket.h" #include "sysemu/runstate.h" #include "sysemu/sysemu.h" @@ -2414,6 +2415,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, exec_start_outgoing_migration(s, p, &local_err); } else if (strstart(uri, "fd:", &p)) { fd_start_outgoing_migration(s, p, &local_err); +} else if (strstart(uri, "file:", &p)) { + file_start_outgoing_migration(s, p, &local_err); } else { if (!(has_resume && resume)) { yank_unregister_instance(MIGRATION_YANK_INSTANCE); -- 2.34.1
[PATCH v2 10/11] migration: Add support for 'fixed-ram' migration restore
Add the necessary code to parse the format changes for 'fixed-ram' capability. One of the more notable changes in behavior is that in the 'fixed-ram' case ram pages are restored in one go rather than constantly looping through the migration stream. Also due to idiosyncrasies of the format I have added the 'ram_migrated' since it was easier to simply return directly from ->load_state rather than introducing more conditionals around the code to prevent ->load_state being called multiple times (from qemu_loadvm_section_start_full/qemu_loadvm_section_part_end i.e. from multiple QEMU_VM_SECTION_(PART|END) flags). Signed-off-by: Nikolay Borisov --- migration/migration.h | 2 + migration/ram.c | 95 ++- 2 files changed, 95 insertions(+), 2 deletions(-) diff --git a/migration/migration.h b/migration/migration.h index 9aab1b16f407..7a832d072415 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -96,6 +96,8 @@ struct MigrationIncomingState { bool have_listen_thread; QemuThread listen_thread; +bool ram_migrated; + /* For the kernel to send us notifications */ int userfault_fd; /* To notify the fault_thread to wake, e.g., when need to quit */ diff --git a/migration/ram.c b/migration/ram.c index 1dd68c221667..15441ed9f745 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4330,6 +4330,90 @@ static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes) return ret; } + +static int parse_ramblocks_fixed_ram(QEMUFile *f) +{ +int ret = 0; + +while (!ret) { +char id[256]; +RAMBlock *block; +ram_addr_t length; +unsigned long clear_bit_idx; +long num_pages, bitmap_size; +int len = qemu_get_byte(f); +g_autofree unsigned long *dirty_bitmap = NULL; + +qemu_get_buffer(f, (uint8_t *)id, len); +id[len] = 0; +length = qemu_get_be64(f); + +block = qemu_ram_block_by_name(id); +if (block) { +ret = parse_ramblock(f, block, length); +if (ret < 0) { +return ret; +} +} else { +error_report("Unknown ramblock \"%s\", cannot accept " + "migration", id); +ret = -EINVAL; +continue; +} + +/* 1. read the bitmap size */ +num_pages = length >> TARGET_PAGE_BITS; +bitmap_size = qemu_get_be32(f); + +assert(bitmap_size == BITS_TO_LONGS(num_pages)*sizeof(unsigned long)); + +block->pages_offset = qemu_get_be64(f); + +/* 2. 
read the actual bitmap */ +dirty_bitmap = g_malloc0(bitmap_size); +if (qemu_get_buffer(f, (uint8_t *)dirty_bitmap, bitmap_size) != bitmap_size) { +error_report("Error parsing dirty bitmap"); +return -EINVAL; +} + +#define BUFSIZE (4*1024*1024) +for (unsigned long set_bit_idx = find_first_bit(dirty_bitmap, num_pages); + set_bit_idx < num_pages; + set_bit_idx = find_next_bit(dirty_bitmap, num_pages, clear_bit_idx + 1)) { + +clear_bit_idx = find_next_zero_bit(dirty_bitmap, num_pages, set_bit_idx + 1); +unsigned long len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx); +ram_addr_t offset = set_bit_idx << TARGET_PAGE_BITS; + +for (size_t written = 0, completed = 0; completed < len; offset += written) { +void *host = host_from_ram_block_offset(block, offset); +size_t read_len = MIN(len, BUFSIZE); + +written = qemu_get_buffer_at(f, host, read_len, + block->pages_offset + offset); +completed += written; +} +} + +/* Skip pages array */ +qemu_set_offset(f, block->pages_offset + length, SEEK_SET); + +/* Check if this is the last ramblock */ +if (qemu_get_be64(f) == RAM_SAVE_FLAG_EOS) { +ret = 1; +} else { +/* + * If not, adjust the internal file index to account for the + * previous 64 bit read + */ +qemu_file_skip(f, -8); +ret = 0; +} +} + +return ret; +} + /** * ram_load_precopy: load pages in precopy case * @@ -4349,7 +4433,7 @@ static int ram_load_precopy(QEMUFile *f) invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } -while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { +while (!ret && !(flags & RAM_SAVE_FLAG_EOS) && !mis->ram_migrated) { ram_addr_t addr; void *host = NULL, *host_bak = NULL; uint8_t ch; @@ -4421,7 +4505,14 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: -ret = parse_ramblocks(f, addr); +if (migrate_fixed_ram()) { +ret = parse_ramblocks_fixed_ram(f);
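To put rough numbers on the restore path above: assuming 4 KiB target pages, a 4 GiB ramblock has num_pages = 2^20, so the dirty bitmap read in step 2 is 2^20 bits = 128 KiB, and with the 4 MiB BUFSIZE chunking a fully dirty block is read back in at most 1024 qemu_get_buffer_at() calls.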
[PATCH v2 07/11] migration: add qemu_get_buffer_at
Restoring a 'fixed-ram' enabled migration stream would require reading from specific offsets in the file so add a helper to QEMUFile that uses the newly introduced qio_channel_file_preadv. Signed-off-by: Nikolay Borisov --- migration/qemu-file.c | 23 +++ migration/qemu-file.h | 1 + 2 files changed, 24 insertions(+) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 07ba1125e83f..73fffbdb9b7e 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -564,6 +564,29 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t po return; } + +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos) +{ +Error *err = NULL; +struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen }; +ssize_t ret; + +if (f->last_error) { +return 0; +} + +ret = qio_channel_file_preadv(f->ioc, &iov, 1, pos, &err); +if (ret == -1) { + goto error; +} + +return (size_t)ret; + + error: +qemu_file_set_error_obj(f, -EIO, err); +return 0; +} + void qemu_set_offset(QEMUFile *f, off_t off, int whence) { Error *err = NULL; diff --git a/migration/qemu-file.h b/migration/qemu-file.h index 33cfc07b81d1..ab10c3ad7e42 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -151,6 +151,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block); void qemu_set_offset(QEMUFile *f, off_t off, int whence); off_t qemu_get_offset(QEMUFile *f); void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos); +size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); -- 2.34.1
[PATCH v2 03/11] migration: Make migration json writer part of MigrationState struct
This is required so that migration stream configuration is written to the migration stream. This would allow analyze-migration to parse enabled capabilities for the migration and adjust its behavior accordingly. This is in preparation for analyze-migration.py to support 'fixed-ram' capability format changes. Signed-off-by: Nikolay Borisov --- migration/migration.c | 5 + migration/migration.h | 3 +++ migration/savevm.c| 38 ++ 3 files changed, 30 insertions(+), 16 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 140b0f1a54bd..d0779bbaf862 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1896,6 +1896,8 @@ static void migrate_fd_cleanup(MigrationState *s) g_free(s->hostname); s->hostname = NULL; +json_writer_free(s->vmdesc); + qemu_savevm_state_cleanup(); if (s->to_dst_file) { @@ -2154,6 +2156,7 @@ void migrate_init(MigrationState *s) error_free(s->error); s->error = NULL; s->hostname = NULL; +s->vmdesc = NULL; migrate_set_state(&s->state, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP); @@ -4269,6 +4272,8 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) return; } +s->vmdesc = json_writer_new(false); + if (multifd_save_setup(&local_err) != 0) { error_report_err(local_err); migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, diff --git a/migration/migration.h b/migration/migration.h index cdad8aceaaab..96f27aba2210 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -17,6 +17,7 @@ #include "exec/cpu-common.h" #include "hw/qdev-core.h" #include "qapi/qapi-types-migration.h" +#include "qapi/qmp/json-writer.h" #include "qemu/thread.h" #include "qemu/coroutine_int.h" #include "io/channel.h" @@ -261,6 +262,8 @@ struct MigrationState { int state; +JSONWriter *vmdesc; + /* State related to return path */ struct { /* Protected by qemu_file_lock */ diff --git a/migration/savevm.c b/migration/savevm.c index 48e85c052c2c..174cdbefc29d 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -1137,13 +1137,18 @@ void qemu_savevm_non_migratable_list(strList **reasons) void qemu_savevm_state_header(QEMUFile *f) { +MigrationState *s = migrate_get_current(); trace_savevm_state_header(); qemu_put_be32(f, QEMU_VM_FILE_MAGIC); qemu_put_be32(f, QEMU_VM_FILE_VERSION); -if (migrate_get_current()->send_configuration) { +if (s->send_configuration) { qemu_put_byte(f, QEMU_VM_CONFIGURATION); -vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0); + json_writer_start_object(s->vmdesc, NULL); + json_writer_start_object(s->vmdesc, "configuration"); +vmstate_save_state(f, &vmstate_configuration, &savevm_state, s->vmdesc); + json_writer_end_object(s->vmdesc); + } } @@ -1364,15 +1369,16 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, bool in_postcopy, bool inactivate_disks) { -g_autoptr(JSONWriter) vmdesc = NULL; +MigrationState *s = migrate_get_current(); int vmdesc_len; SaveStateEntry *se; int ret; -vmdesc = json_writer_new(false); -json_writer_start_object(vmdesc, NULL); -json_writer_int64(vmdesc, "page_size", qemu_target_page_size()); -json_writer_start_array(vmdesc, "devices"); +if (!s->send_configuration) { + json_writer_start_object(s->vmdesc, NULL); +} +json_writer_int64(s->vmdesc, "page_size", qemu_target_page_size()); +json_writer_start_array(s->vmdesc, "devices"); QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { if ((!se->ops || !se->ops->save_state) && !se->vmsd) { @@ -1385,12 +1391,12 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, trace_savevm_section_start(se->idstr, 
se->section_id); -json_writer_start_object(vmdesc, NULL); -json_writer_str(vmdesc, "name", se->idstr); -json_writer_int64(vmdesc, "instance_id", se->instance_id); +json_writer_start_object(s->vmdesc, NULL); +json_writer_str(s->vmdesc, "name", se->idstr); +json_writer_int64(s->vmdesc, "instance_id", se->instance_id); save_section_header(f, se, QEMU_VM_SECTION_FULL); -ret = vmstate_save(f, se, vmdesc); +ret = vmstate_save(f, se, s->vmdesc); if (ret) { qemu_file_set_error(f, ret); return ret; @@ -1398,7 +1404,7 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, trace_savevm_section_end(se->idstr, se->section_id, 0); save_section_footer(f, se); -json_writer_end_object(vmdesc); +json_writer_end_object(s->vmdesc); } if (inactivate_disks) { @@ -1417,14 +1423,14 @@ int qemu_savevm_state_complete_pr
[PATCH v2 08/11] migration/ram: Introduce 'fixed-ram' migration stream capability
Implement 'fixed-ram' feature. The core of the feature is to ensure that each ram page of the migration stream has a specific offset in the resulting migration stream. The reasons why we'd want such behavior are twofold: - When doing a 'fixed-ram' migration the resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a vm with, say, 1g of ram can result in a migration file that's 10s of Gbs, provided that the workload constantly redirties memory. - It paves the way to implement DIO-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. The feature requires changing the format. First, a bitmap is introduced which tracks which pages have been written (i.e. are dirtied) during migration and subsequently it is written to the resulting file, again at a fixed location for every ramblock. Zero pages are ignored as they'd be zero in the destination migration as well. With the changed format, data would look like the following: |name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages| * pc - refers to the page_size/mr->addr members, so newly added members begin from "bitmap_size". This layout is initialized during ram_save_setup so instead of having a sequential stream of pages that follow the ramblock headers, the dirty pages for a ramblock follow its header. Since all pages have a fixed location, RAM_SAVE_FLAG_EOS is no longer generated on every migration iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at the end. Signed-off-by: Nikolay Borisov --- include/exec/ramblock.h | 7 +++ migration/migration.c | 51 +- migration/migration.h | 1 + migration/ram.c | 97 + migration/savevm.c | 1 + qapi/migration.json | 2 +- 6 files changed, 138 insertions(+), 21 deletions(-) diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h index 6cbedf9e0c9a..30216c1a41d3 100644 --- a/include/exec/ramblock.h +++ b/include/exec/ramblock.h @@ -43,6 +43,13 @@ struct RAMBlock { size_t page_size; /* dirty bitmap used during migration */ unsigned long *bmap; +/* shadow dirty bitmap used when migrating to a file */ +unsigned long *shadow_bmap; +/* offset in the file pages belonging to this ramblock are saved, used + * only during migration to a file + */ +off_t bitmap_offset; +uint64_t pages_offset; /* bitmap of already received pages in postcopy */ unsigned long *receivedmap; diff --git a/migration/migration.c b/migration/migration.c index d0779bbaf862..1d08664da5db 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot, MIGRATION_CAPABILITY_XBZRLE, MIGRATION_CAPABILITY_X_COLO, MIGRATION_CAPABILITY_VALIDATE_UUID, -MIGRATION_CAPABILITY_ZERO_COPY_SEND); +MIGRATION_CAPABILITY_ZERO_COPY_SEND, +MIGRATION_CAPABILITY_FIXED_RAM); /* When we add fault tolerance, we could have several migrations at once. 
For now we don't need to add @@ -1325,6 +1326,27 @@ static bool migrate_caps_check(bool *cap_list, } #endif +if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) { + if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) { + error_setg(errp, "Directly mapped memory incompatible with multifd"); + return false; + } + + if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) { + error_setg(errp, "Directly mapped memory incompatible with xbzrle"); + return false; + } + + if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) { + error_setg(errp, "Directly mapped memory incompatible with compression"); + return false; + } + + if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) { + error_setg(errp, "Directly mapped memory incompatible with postcopy ram"); + return false; + } +} /* incoming side only */ if (runstate_check(RUN_STATE_INMIGRATE) && @@ -2629,6 +2651,11 @@ MultiFDCompression migrate_multifd_compression(void) return s->parameters.multifd_compression; } +int migrate_fixed_ram(void) +{ +return migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM]; +} + int migrate_multifd_zlib_level(void) { MigrationState *s; @@ -4189,6 +4216,21 @@ static void *bg_migration_thread(void *opaque) return NULL; } +static int +migrate_check_fixed_ram(MigrationState *s, Error **errp) +{ +if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM]) +return 0; + +if (!qemu_file_is_seekable(s->to_dst_file)) { +error_setg(errp, "Directly mapped memory req
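Put together with the file: URI patches, using the capability would look like this on the source side (illustrative; the capability name matches the qapi/migration.json change above, and the path is an example):

    (qemu) migrate_set_capability fixed-ram on
    (qemu) migrate file:/var/tmp/vm-state.bin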
[PATCH v2 02/11] migration: Add support for 'file:' uri for incoming migration
This is a counterpart to the 'file:' uri support for source migration, now a file can also serve as the source of an incoming migration. Signed-off-by: Nikolay Borisov --- migration/file.c | 15 +++ migration/file.h | 1 + migration/migration.c | 2 ++ 3 files changed, 18 insertions(+) diff --git a/migration/file.c b/migration/file.c index 02896a7cab99..93eb718aa0f4 100644 --- a/migration/file.c +++ b/migration/file.c @@ -21,3 +21,18 @@ void file_start_outgoing_migration(MigrationState *s, const char *fname, Error * } +void file_start_incoming_migration(const char *fname, Error **errp) +{ + QIOChannelFile *ioc; + + ioc = qio_channel_file_new_path(fname, O_RDONLY, 0, errp); + if (!ioc) { + error_report("Error creating a channel"); + return; + } + + qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming"); + migration_channel_process_incoming(QIO_CHANNEL(ioc)); + object_unref(OBJECT(ioc)); +} + diff --git a/migration/file.h b/migration/file.h index d476eb1157f9..cdbd291322d4 100644 --- a/migration/file.h +++ b/migration/file.h @@ -5,5 +5,6 @@ void file_start_outgoing_migration(MigrationState *s, const char *filename, Error **errp); +void file_start_incoming_migration(const char *fname, Error **errp); #endif diff --git a/migration/migration.c b/migration/migration.c index 8813b78b9a6b..140b0f1a54bd 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -506,6 +506,8 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp) exec_start_incoming_migration(p, errp); } else if (strstart(uri, "fd:", &p)) { fd_start_incoming_migration(p, errp); +} else if (strstart(uri, "file:", &p)) { + file_start_incoming_migration(p, errp); } else { error_setg(errp, "unknown migration protocol: %s", uri); } -- 2.34.1
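For reference, the corresponding restore can then be started either on the command line or, with -incoming defer, from the monitor (example path again):

    $ qemu-system-x86_64 ... -incoming file:/var/tmp/vm-state.bin

    (qemu) migrate_incoming file:/var/tmp/vm-state.bin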
[PATCH v2 04/11] io: add pwritev support to QIOChannelFile
The upcoming 'fixed-ram' feature would require qemu to write data at specific offsets of the file. This is currently missing so this patch adds it. I've chosen to implement it as a type-specific function rather than plumb it through the generic channel interface as it relies on being able to do seeks. Signed-off-by: Nikolay Borisov --- include/io/channel-file.h | 5 + io/channel-file.c | 24 2 files changed, 29 insertions(+) diff --git a/include/io/channel-file.h b/include/io/channel-file.h index 50e8eb113868..0a5d54f5e58e 100644 --- a/include/io/channel-file.h +++ b/include/io/channel-file.h @@ -89,4 +89,9 @@ qio_channel_file_new_path(const char *path, mode_t mode, Error **errp); +ssize_t qio_channel_file_pwritev(QIOChannel *ioc, +const struct iovec *iov, +size_t niov, +off_t offset, +Error **errp); #endif /* QIO_CHANNEL_FILE_H */ diff --git a/io/channel-file.c b/io/channel-file.c index b67687c2aa64..da17d0a11ba7 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -136,6 +136,30 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc, return ret; } +ssize_t qio_channel_file_pwritev(QIOChannel *ioc, +const struct iovec *iov, +size_t niov, +off_t offset, +Error **errp) +{ +QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +ssize_t ret; + + retry: +ret = pwritev(fioc->fd, iov, niov, offset); +if (ret <= 0) { +if (errno == EAGAIN) { +return QIO_CHANNEL_ERR_BLOCK; +} +if (errno == EINTR) { +goto retry; +} +error_setg_errno(errp, errno, "Unable to write to file"); +return -1; +} +return ret; +} + static int qio_channel_file_set_blocking(QIOChannel *ioc, bool enabled, Error **errp) -- 2.34.1
[PATCH v2 00/11] Add support for fixed ram offsets during migration
Here's a slightly updated version of the patchset. In its core it remains however I have addressed a couple of issue, namely : * Don't initialize the shadow bitmap as all set - this was causing the full ram to be copied always. * Fixed a memory leak in parse_ramblocks_fixed_ram by making the dirty bitmap variable an g_autofree one * Slightly reworked how ram is being copied during restore and now the code is working in chunks of 4mb (might have to tweak this?) For a more general overview of what this series is all about refer to the initial posting [0]. [0] https://lore.kernel.org/qemu-devel/20221004123733.2745519-1-nbori...@suse.com/ Nikolay Borisov (11): migration: support file: uri for source migration migration: Add support for 'file:' uri for incoming migration migration: Make migration json writer part of MigrationState struct io: add pwritev support to QIOChannelFile io: Add support for seekable channels io: Add preadv support to QIOChannelFile migration: add qemu_get_buffer_at migration/ram: Introduce 'fixed-ram' migration stream capability migration: Refactor precopy ram loading code migration: Add support for 'fixed-ram' migration restore analyze-migration.py: add initial support for fixed ram streams include/exec/ramblock.h | 7 + include/io/channel-file.h | 10 + include/io/channel.h| 1 + include/migration/qemu-file-types.h | 2 + io/channel-file.c | 55 + migration/file.c| 38 migration/file.h| 10 + migration/meson.build | 1 + migration/migration.c | 61 +- migration/migration.h | 6 + migration/qemu-file.c | 82 +++ migration/qemu-file.h | 4 + migration/ram.c | 328 +--- migration/savevm.c | 39 ++-- qapi/migration.json | 2 +- scripts/analyze-migration.py| 49 - 16 files changed, 594 insertions(+), 101 deletions(-) create mode 100644 migration/file.c create mode 100644 migration/file.h -- 2.34.1
[PATCH v2 06/11] io: Add preadv support to QIOChannelFile
preadv is going to be needed when 'fixed-ram'-enabled streams are to be restored. Simply add a wrapper around preadv that's specific to QIOChannelFile. Signed-off-by: Nikolay Borisov --- include/io/channel-file.h | 5 + io/channel-file.c | 26 ++ 2 files changed, 31 insertions(+) diff --git a/include/io/channel-file.h b/include/io/channel-file.h index 0a5d54f5e58e..2187f5affd23 100644 --- a/include/io/channel-file.h +++ b/include/io/channel-file.h @@ -89,6 +89,11 @@ qio_channel_file_new_path(const char *path, mode_t mode, Error **errp); +ssize_t qio_channel_file_preadv(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + off_t offset, + Error **errp); ssize_t qio_channel_file_pwritev(QIOChannel *ioc, const struct iovec *iov, size_t niov, diff --git a/io/channel-file.c b/io/channel-file.c index d84a6737f2f7..edca64ad63a7 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -141,6 +141,32 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc, return ret; } +ssize_t qio_channel_file_preadv(QIOChannel *ioc, + const struct iovec *iov, + size_t niov, + off_t offset, + Error **errp) +{ +QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); +ssize_t ret; + + retry: +ret = preadv(fioc->fd, iov, niov, offset); +if (ret < 0) { +if (errno == EAGAIN) { +return QIO_CHANNEL_ERR_BLOCK; +} +if (errno == EINTR) { +goto retry; +} + +error_setg_errno(errp, errno, "Unable to read from file"); +return -1; +} + +return ret; +} + ssize_t qio_channel_file_pwritev(QIOChannel *ioc, const struct iovec *iov, size_t niov, -- 2.34.1
[PATCH v2 05/11] io: Add support for seekable channels
Add a bunch of auxiliary methods and a feature flag to work with SEEKABLE channels. Currently the only channel considered seekable is QIOChannelFile. Also add a bunch of helper functions to QEMUFile that can make use of this channel feature. All of this is in preparation for supporting the 'fixed-ram' migration stream feature. Signed-off-by: Nikolay Borisov --- include/io/channel.h| 1 + include/migration/qemu-file-types.h | 2 + io/channel-file.c | 5 +++ migration/qemu-file.c | 59 + migration/qemu-file.h | 3 ++ 5 files changed, 70 insertions(+) diff --git a/include/io/channel.h b/include/io/channel.h index c680ee748021..4fc37c78e68c 100644 --- a/include/io/channel.h +++ b/include/io/channel.h @@ -41,6 +41,7 @@ enum QIOChannelFeature { QIO_CHANNEL_FEATURE_SHUTDOWN, QIO_CHANNEL_FEATURE_LISTEN, QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY, +QIO_CHANNEL_FEATURE_SEEKABLE, }; diff --git a/include/migration/qemu-file-types.h b/include/migration/qemu-file-types.h index 2867e3da84ab..eb0325ee8687 100644 --- a/include/migration/qemu-file-types.h +++ b/include/migration/qemu-file-types.h @@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f); unsigned int qemu_get_be32(QEMUFile *f); uint64_t qemu_get_be64(QEMUFile *f); +bool qemu_file_is_seekable(QEMUFile *f); + static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv) { qemu_put_be64(f, *pv); diff --git a/io/channel-file.c b/io/channel-file.c index da17d0a11ba7..d84a6737f2f7 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -35,6 +35,7 @@ qio_channel_file_new_fd(int fd) ioc->fd = fd; +qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE); trace_qio_channel_file_new_fd(ioc, fd); return ioc; @@ -59,6 +60,10 @@ qio_channel_file_new_path(const char *path, return NULL; } +if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) { +qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE); +} + trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd); return ioc; diff --git a/migration/qemu-file.c b/migration/qemu-file.c index 4f400c2e5265..07ba1125e83f 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -30,6 +30,7 @@ #include "qemu-file.h" #include "trace.h" #include "qapi/error.h" +#include "io/channel-file.h" #define IO_BUF_SIZE 32768 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64) @@ -260,6 +261,10 @@ static void qemu_iovec_release_ram(QEMUFile *f) memset(f->may_free, 0, sizeof(f->may_free)); } +bool qemu_file_is_seekable(QEMUFile *f) +{ +return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE); +} /** * Flushes QEMUFile buffer @@ -538,6 +543,60 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size) } } +void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos) +{ +Error *err = NULL; +struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen }; + +if (f->last_error) { +return; +} + +qemu_fflush(f); + +if (qio_channel_file_pwritev(f->ioc, &iov, 1, pos, &err) == (off_t)-1) +goto error; + +return; + + error: +qemu_file_set_error_obj(f, -EIO, err); +return; +} + +void qemu_set_offset(QEMUFile *f, off_t off, int whence) +{ +Error *err = NULL; +off_t ret; + +qemu_fflush(f); + +if (!qemu_file_is_writable(f)) { + f->buf_index = 0; + f->buf_size = 0; +} + +ret = qio_channel_io_seek(f->ioc, off, whence, &err); +if (ret == (off_t)-1) { +qemu_file_set_error_obj(f, -EIO, err); +} +} + +off_t qemu_get_offset(QEMUFile *f) +{ +Error *err = NULL; +off_t ret; + +qemu_fflush(f); + +ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err); +if (ret == (off_t)-1) { 
+qemu_file_set_error_obj(f, -EIO, err); +} +return ret; +} + + void qemu_put_byte(QEMUFile *f, int v) { if (f->last_error) { diff --git a/migration/qemu-file.h b/migration/qemu-file.h index fa13d04d787c..33cfc07b81d1 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -148,6 +148,9 @@ int qemu_file_shutdown(QEMUFile *f); QEMUFile *qemu_file_get_return_path(QEMUFile *f); void qemu_fflush(QEMUFile *f); void qemu_file_set_blocking(QEMUFile *f, bool block); +void qemu_set_offset(QEMUFile *f, off_t off, int whence); +off_t qemu_get_offset(QEMUFile *f); +void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t pos); void ram_control_before_iterate(QEMUFile *f, uint64_t flags); void ram_control_after_iterate(QEMUFile *f, uint64_t flags); -- 2.34.1
[PATCH v2 11/11] analyze-migration.py: add initial support for fixed ram streams
This commit introduces the minimum code necessary to support parsing migration streams with 'fixed-ram' capability set. The only thing really missing is the implementation of write_or_dump_fixed_ram() which deals with '-x'/'-m' options. Signed-off-by: Nikolay Borisov --- scripts/analyze-migration.py | 49 +--- 1 file changed, 46 insertions(+), 3 deletions(-) diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py index b82a1b0c58c4..9785a640fbf8 100755 --- a/scripts/analyze-migration.py +++ b/scripts/analyze-migration.py @@ -23,6 +23,7 @@ import collections import struct import sys +import math def mkdir_p(path): @@ -119,11 +120,16 @@ def __init__(self, file, version_id, ramargs, section_key): self.file = file self.section_key = section_key self.TARGET_PAGE_SIZE = ramargs['page_size'] +self.TARGET_PAGE_BITS = math.log2(self.TARGET_PAGE_SIZE) self.dump_memory = ramargs['dump_memory'] self.write_memory = ramargs['write_memory'] +self.fixed_ram = ramargs['fixed-ram'] self.sizeinfo = collections.OrderedDict() +self.bitmap_offset = collections.OrderedDict() +self.pages_offset = collections.OrderedDict() self.data = collections.OrderedDict() self.data['section sizes'] = self.sizeinfo +self.ram_read = False self.name = '' if self.write_memory: self.files = { } @@ -140,7 +146,13 @@ def __str__(self): def getDict(self): return self.data +def write_or_dump_fixed_ram(self): +pass + def read(self): +if self.fixed_ram and self.ram_read: +return + # Read all RAM sections while True: addr = self.file.read64() @@ -167,7 +179,25 @@ def read(self): f.truncate(0) f.truncate(len) self.files[self.name] = f + +if self.fixed_ram: +bitmap_len = self.file.read32() +# skip the pages_offset which we don't need +offset = self.file.tell() + 8 +self.bitmap_offset[self.name] = offset +offset = ((offset + bitmap_len + self.TARGET_PAGE_SIZE - 1) // self.TARGET_PAGE_SIZE) * self.TARGET_PAGE_SIZE +self.pages_offset[self.name] = offset +self.file.file.seek(offset + len) + flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE +if self.fixed_ram: +self.ram_read = True +# now we should rewind to the ram page offset of the first +# ram section +if self.fixed_ram: +if self.write_memory or self.dump_memory: +self.write_or_dump_fixed_ram() +return if flags & self.RAM_SAVE_FLAG_COMPRESS: if flags & self.RAM_SAVE_FLAG_CONTINUE: @@ -208,7 +238,7 @@ def read(self): # End of RAM section if flags & self.RAM_SAVE_FLAG_EOS: -break + return if flags != 0: raise Exception("Unknown RAM flags: %x" % flags) @@ -521,6 +551,7 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False): ramargs['page_size'] = self.vmsd_desc['page_size'] ramargs['dump_memory'] = dump_memory ramargs['write_memory'] = write_memory +ramargs['fixed-ram'] = False self.section_classes[('ram',0)][1] = ramargs while True: @@ -528,8 +559,20 @@ def read(self, desc_only = False, dump_memory = False, write_memory = False): if section_type == self.QEMU_VM_EOF: break elif section_type == self.QEMU_VM_CONFIGURATION: -section = ConfigurationSection(file) -section.read() +config_desc = self.vmsd_desc.get('configuration') +if config_desc is not None: +config = VMSDSection(file, 1, config_desc, 'configuration') +config.read() +caps = config.data.get("configuration/capabilities") +if caps is not None: +caps = caps.data["capabilities"] +if type(caps) != list: +caps = [caps] +for i in caps: +# chomp out string length +cap = i.data[1:].decode("utf8") +if cap == "fixed-ram": +ramargs['fixed-ram'] = True elif section_type == self.QEMU_VM_SECTION_START or section_type == 
self.QEMU_VM_SECTION_FULL: section_id = file.read32() name = file.readstr() -- 2.34.1
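For reference, exercising the parser would be along the lines of the following, where the option name comes from the script's existing argument handling and the file name is an example; note that '-x'/'-m' on a fixed-ram stream still hit the unimplemented write_or_dump_fixed_ram():

    $ ./scripts/analyze-migration.py -f fixed-ram-vmstate.bin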
[PATCH v2 09/11] migration: Refactor precopy ram loading code
To facilitate easier implementation of the 'fixed-ram' migration restore, factor out the code responsible for parsing the ramblocks headers. This also makes ram_load_precopy easier to comprehend. Signed-off-by: Nikolay Borisov --- migration/ram.c | 142 +++- 1 file changed, 80 insertions(+), 62 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 4f5ddaff356b..1dd68c221667 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4253,6 +4253,83 @@ void colo_flush_ram_cache(void) trace_colo_flush_ram_cache_end(); } +static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length) +{ +int ret = 0; +/* ADVISE is earlier, it shows the source has the postcopy capability on */ +bool postcopy_advised = postcopy_is_advised(); + +assert(block); + +if (!qemu_ram_is_migratable(block)) { +error_report("block %s should not be migrated !", block->idstr); +ret = -EINVAL; +} + +if (length != block->used_length) { +Error *local_err = NULL; + +ret = qemu_ram_resize(block, length, &local_err); +if (local_err) { +error_report_err(local_err); +} +} +/* For postcopy we need to check hugepage sizes match */ +if (postcopy_advised && migrate_postcopy_ram() && +block->page_size != qemu_host_page_size) { +uint64_t remote_page_size = qemu_get_be64(f); +if (remote_page_size != block->page_size) { +error_report("Mismatched RAM page size %s " + "(local) %zd != %" PRId64, block->idstr, + block->page_size, remote_page_size); +ret = -EINVAL; +} +} +if (migrate_ignore_shared()) { +hwaddr addr = qemu_get_be64(f); +if (ramblock_is_ignored(block) && +block->mr->addr != addr) { +error_report("Mismatched GPAs for block %s " + "%" PRId64 "!= %" PRId64, block->idstr, + (uint64_t)addr, + (uint64_t)block->mr->addr); +ret = -EINVAL; +} +} +ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr); + +return ret; +} + +static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes) +{ +int ret = 0; + +/* Synchronize RAM block list */ +while (!ret && total_ram_bytes) { +char id[256]; +RAMBlock *block; +ram_addr_t length; +int len = qemu_get_byte(f); + +qemu_get_buffer(f, (uint8_t *)id, len); +id[len] = 0; +length = qemu_get_be64(f); + +block = qemu_ram_block_by_name(id); +if (block) { +ret = parse_ramblock(f, block, length); +} else { +error_report("Unknown ramblock \"%s\", cannot accept " + "migration", id); +ret = -EINVAL; +} +total_ram_bytes -= length; +} + +return ret; +} + /** * ram_load_precopy: load pages in precopy case * @@ -4267,14 +4344,13 @@ static int ram_load_precopy(QEMUFile *f) { MigrationIncomingState *mis = migration_incoming_get_current(); int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0; -/* ADVISE is earlier, it shows the source has the postcopy capability on */ -bool postcopy_advised = postcopy_is_advised(); + if (!migrate_use_compression()) { invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { -ram_addr_t addr, total_ram_bytes; +ram_addr_t addr; void *host = NULL, *host_bak = NULL; uint8_t ch; @@ -4345,65 +4421,7 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: -/* Synchronize RAM block list */ -total_ram_bytes = addr; -while (!ret && total_ram_bytes) { -RAMBlock *block; -char id[256]; -ram_addr_t length; - -len = qemu_get_byte(f); -qemu_get_buffer(f, (uint8_t *)id, len); -id[len] = 0; -length = qemu_get_be64(f); - -block = qemu_ram_block_by_name(id); -if (block && !qemu_ram_is_migratable(block)) { -error_report("block %s should not be migrated !", id); -ret = -EINVAL; 
-} else if (block) { -if (length != block->used_length) { -Error *local_err = NULL; - -ret = qemu_ram_resize(block, length, - &local_err); -if (local_err) { -error_report_err(local_err); -} -} -/* For postcopy we need to check hugepage sizes match */ -if (postcopy_
Re: [PULL 28/54] configure: build ROMs with container-based cross compilers
Daniel P. Berrangé writes: > On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote: >> From: Paolo Bonzini >> >> s390-ccw remains a bit more complex, because the -march=z900 test is done >> only for the native cross compiler. Otherwise, all that is needed is >> to pass the (now mandatory) target argument to write_target_makefile. >> >> Signed-off-by: Paolo Bonzini >> Signed-off-by: Alex Bennée >> Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org> > > I'm not at all convinced this change was/is a good idea. > > First of all, it causes 'make' to now download about 1 GB of > container images > > $ ./configure --target-list=x86_64-softmmu > $ make > ...snip... > BUILD debian-powerpc-test-cross > Trying to pull > registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest... > Getting image source signatures > Copying blob 2a205c8a1d36 [=>] 12.4MiB / 257.2MiB > > ... > ...snip... > > Despite downloading this image, it then proceeded to rebuild the > image from scratch, requiring another few 100MBs of downloads > of dpkgs. This time the download was without progress information > until it entirely failed due to a dead Debian mirror server, needing > a retry. > > It then went on to download an s390x image which seems to have > two layers, each with 360 MB. > > BUILD debian-s390x-cross > Trying to pull > registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest... > Getting image source signatures > Copying blob fc8d65e34cd5 [>-] 12.0MiB / > 360.2MiB > Copying blob bd159e379b3b skipped: already exists > Copying blob 13224e2971af [>-] 12.2MiB / > 366.5MiB > > So overall it was more than 1 GB of downloads when typing 'make' > > I wasn't too amused by seeing this downloaded data, given that > I'm usually running off a 4G mobile connection, and it took a > very long time. Yikes, sorry I didn't notice that (probably because I always have most of the containers built). I was hoping the next set of patches would reduce the total re-build time to just the mirror operation by dumping docker.py and any caching that breaks. > The progress information printed by docker when downloading > the images splatters all over the output meson displays, when > doing a parallel make making everything unintelligible. > > > Finally, I had requested only building x86_64, so we shouldn't > be doing anything related to ppc or s390 at all, but even if > > AFAICT, it enables this downloading unconditionally merely by > having 'docker'/'podman' binaries installed, if you don't > otherwise have cross compilers present. > > I'd really not want to see any of this stuff downloaded without > an explicit opt-in choice at configure time. > > I'm also a little concerned at what happens if we have to stop > publishing the containers at registry.gitlab.com in future. Are > we going to break the default 'make' for existing released QEMU > tarballs ? We can easily move the registry around. The aim of this work is to eventually stop local re-builds for most people. > > Generally we've only relied on the gitlab infra for our CI > testing, so we have been free to change infra or alter the > way we publish images at any time, without risk of impact on > the released tarballs. > > This isn't a theoretical problem, because GitLab has announced > their intention to limit storage usage in gitlab.com, and even > having joined the Open Source Program, our quota is only increased > from 5 GB to 25 GB. 
I'd be concerned we're at risk of exceeding > that 25 GB limit, when they start to enforce it, requiring us to > move container image host to somewhere else such as quay.io > > >> diff --git a/configure b/configure >> index c175650eb9..a54e17aca9 100755 >> --- a/configure >> +++ b/configure >> @@ -2152,7 +2152,7 @@ probe_target_compiler() { >> target_ranlib= >> target_strip= >>fi >> - test -n "$target_cc" >> + test -n "$target_cc" || test -n "$container_image" >> } >> >> write_target_makefile() { >> @@ -2307,7 +2307,7 @@ if test "$targetos" != "darwin" && test "$targetos" != >> "sunos" && \ >> config_mak=pc-bios/optionrom/config.mak >> echo "# Automatically generated by configure - do not modify" > >> $config_mak >> echo "TOPSRC_DIR=$source_path" >> $config_mak >> -write_target_makefile >> $config_mak >> +write_target_makefile pc-bios/optionrom/all >> $config_mak >> fi >> >> if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then >> @@ -2315,25 +2315,31 @@ if test "$softmmu" = yes && probe_target_compiler >> ppc-softmmu; then >> config_mak=pc-bios/vof/config.mak >> echo "# Automatically generated by configure - do not modify" > >> $config_mak >> echo "SRC_DIR=$source_path/pc-bios/vof" >> $config_mak >> -write_target_makefile >> $config_mak >> +write_target_makefile pc-bios/vof/all >>
Re: [PATCH v6 12/13] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
On Thu, Oct 6, 2022 at 10:35 PM Stefan Hajnoczi wrote: > Avoid bounce buffers when QEMUIOVector elements are within previously > registered bdrv_register_buf() buffers. > > The idea is that emulated storage controllers will register guest RAM > using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O > requests. Therefore no blkio_map_mem_region() calls are necessary in the > performance-critical I/O code path. > > This optimization doesn't apply if the I/O buffer is internally > allocated by QEMU (e.g. qcow2 metadata). There we still take the slow > path because BDRV_REQ_REGISTERED_BUF is not set. > > Signed-off-by: Stefan Hajnoczi > --- > block/blkio.c | 183 +- > 1 file changed, 180 insertions(+), 3 deletions(-) > > diff --git a/block/blkio.c b/block/blkio.c > index 9a79789a39..5ce61d5d94 100644 > --- a/block/blkio.c > +++ b/block/blkio.c > @@ -11,9 +11,13 @@ > #include "qemu/osdep.h" > #include > #include "block/block_int.h" > +#include "exec/memory.h" > +#include "exec/cpu-common.h" /* for qemu_ram_get_fd() */ > #include "qapi/error.h" > +#include "qemu/error-report.h" > #include "qapi/qmp/qdict.h" > #include "qemu/module.h" > +#include "exec/memory.h" /* for ram_block_discard_disable() */ > > /* > * Keep the QEMU BlockDriver names identical to the libblkio driver names. > @@ -73,6 +77,12 @@ typedef struct { > > /* Can we skip adding/deleting blkio_mem_regions? */ > bool needs_mem_regions; > + > +/* Are file descriptors necessary for blkio_mem_regions? */ > +bool needs_mem_region_fd; > + > +/* Are madvise(MADV_DONTNEED)-style operations unavailable? */ > +bool mem_regions_pinned; > } BDRVBlkioState; > > /* Called with s->bounce_lock held */ > @@ -347,7 +357,8 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, > int64_t bytes, > .coroutine = qemu_coroutine_self(), > }; > BDRVBlkioState *s = bs->opaque; > -bool use_bounce_buffer = s->needs_mem_regions; > +bool use_bounce_buffer = > +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF); > BlkioBounceBuf bounce; > struct iovec *iov = qiov->iov; > int iovcnt = qiov->niov; > @@ -390,7 +401,8 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState > *bs, int64_t offset, > .coroutine = qemu_coroutine_self(), > }; > BDRVBlkioState *s = bs->opaque; > -bool use_bounce_buffer = s->needs_mem_regions; > +bool use_bounce_buffer = > +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF); > BlkioBounceBuf bounce; > struct iovec *iov = qiov->iov; > int iovcnt = qiov->niov; > @@ -473,6 +485,130 @@ static void blkio_io_unplug(BlockDriverState *bs) > } > } > > +typedef enum { > +BMRR_OK, > +BMRR_SKIP, > +BMRR_FAIL, > +} BlkioMemRegionResult; > + > +/* > + * Produce a struct blkio_mem_region for a given address and size. > + * > + * This function produces identical results when called multiple times with > the > + * same arguments. This property is necessary because > blkio_unmap_mem_region() > + * must receive the same struct blkio_mem_region field values that were > passed > + * to blkio_map_mem_region(). 
> + */ > +static BlkioMemRegionResult > +blkio_mem_region_from_host(BlockDriverState *bs, > + void *host, size_t size, > + struct blkio_mem_region *region, > + Error **errp) > +{ > +BDRVBlkioState *s = bs->opaque; > +int fd = -1; > +ram_addr_t fd_offset = 0; > + > +if (((uintptr_t)host | size) % s->mem_region_alignment) { > +error_setg(errp, "unaligned buf %p with size %zu", host, size); > +return BMRR_FAIL; > +} > + > +/* Attempt to find the fd for the underlying memory */ > +if (s->needs_mem_region_fd) { > +RAMBlock *ram_block; > +RAMBlock *end_block; > +ram_addr_t offset; > + > +/* > + * bdrv_register_buf() is called with the BQL held so mr lives at > least > + * until this function returns. > + */ > +ram_block = qemu_ram_block_from_host(host, false, &fd_offset); > +if (ram_block) { > +fd = qemu_ram_get_fd(ram_block); > +} > +if (fd == -1) { > +/* > + * Ideally every RAMBlock would have an fd. pc-bios and other > + * things don't. Luckily they are usually not I/O buffers and we > + * can just ignore them. > + */ > +return BMRR_SKIP; > +} > + > +/* Make sure the fd covers the entire range */ > +end_block = qemu_ram_block_from_host(host + size - 1, false, > &offset); > +if (ram_block != end_block) { > +error_setg(errp, "registered buffer at %p with size %zu extends " > + "beyond RAMBlock", host, size); > +return BMRR_FAIL; > +} > +} > + > +*regi
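To sketch the intended usage from a device model's point of view (illustrative pseudocode built on the series' API as described above; the buffer names are hypothetical and the read must run in coroutine context):

    /* One-time setup, e.g. when guest RAM is mapped: make the region
     * known to the block layer so libblkio can map it up front. */
    bdrv_register_buf(bs, guest_ram_host_ptr, guest_ram_size, &error_fatal);

    /* Per-request fast path: the flag promises that qiov lies inside a
     * registered buffer, so no bounce buffer or per-request
     * blkio_map_mem_region() call is needed. */
    ret = bdrv_co_preadv(child, offset, bytes, &qiov, BDRV_REQ_REGISTERED_BUF);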
Re: [PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX
On 10/10/22 02:49, Peter Maydell wrote: On Sat, 8 Oct 2022 at 16:38, Richard Henderson wrote: Intel has now given guarantees about the atomicity of SSE read and write instructions on cpus supporting AVX. We can use these instead of the much slower cmpxchg16b. Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 Signed-off-by: Richard Henderson +/* + * The xgetbv instruction is not available to older versions of + * the assembler, so we encode the instruction manually. + */ +asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0)); This would make the third place in the tree where we hand-code this asm instruction (we already do it in the xgetbv() function in target/i386/hvf/x86_cpuid.c and opencoded in tcg_target_init()): can we abstract this out somehow, please? (There is also a just-written-out "xgetbv" in init_cpuid_cache(), but that one's guarded by ifdefs so presumably OK.) It's also possible that the Xcode revision that didn't know xgetbv is now unsupported. Something to check, I suppose. r~
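One possible shape for that shared helper (a sketch only; where it should live and what to call it is exactly the open question):

    /* Read an extended control register. Hand-encoded because older
     * assemblers do not know the xgetbv mnemonic. */
    static inline uint64_t xgetbv(uint32_t xcr)
    {
        uint32_t lo, hi;

        asm(".byte 0x0f, 0x01, 0xd0" : "=a"(lo), "=d"(hi) : "c"(xcr));
        return lo | ((uint64_t)hi << 32);
    }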
Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3
On 10/10/22 06:16, Peter Maydell wrote: The doc also says that we (as fake EL3) should be setting SMCR_EL3.LEN to the same value for all CPUs. Currently we do do that, but it's always the reset value of 0. Richard: does that have any odd effects (I have a feeling it clamps the VL to the minimum supported value)? Yes, it clamps to minimum. Should we be setting SMCR_EL3.LEN to the max supported value here? We can set it to 15 universally without consequence. It will be clamped to the max supported value elsewhere when interpreting the combined EL[123] fields. r~
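Concretely, the follow-up discussed here could look like the sketch below in do_cpu_reset() (illustrative only; the smcr_el field naming mirrors the existing SVE/ZCR handling and is an assumption, not a posted patch):

    if (cpu_isar_feature(aa64_sme, cpu)) {
        env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK;
        env->cp15.scr_el3 |= SCR_ENTP2;
        /* LEN == 15 asks for the maximum streaming vector length;
         * the effective length is clamped to what the CPU implements. */
        env->vfp.smcr_el[3] = 0xf;
    }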
Re: [RISU PATCH 1/5] risu: Use alternate stack
On 9/17/22 00:43, Song Gao wrote: We can use an alternate stack, so that we can use the sp register as an input/output register. I have tested the aarch64/LoongArch architectures. Signed-off-by: Song Gao --- risu.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) Good idea. Reviewed-by: Richard Henderson r~
[PATCH 1/3] linux-user: i386/signal: move fpstate at the end of the 32-bit frames
Recent versions of Linux moved the 32-bit fpstate towards the end of the frame, so that the variable-sized xsave data does not overwrite the (ABI-defined) extramask[] field. Follow suit in QEMU. Signed-off-by: Paolo Bonzini --- linux-user/i386/signal.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c index 4372621a4d..76317a3d16 100644 --- a/linux-user/i386/signal.c +++ b/linux-user/i386/signal.c @@ -163,9 +163,16 @@ struct sigframe { abi_ulong pretcode; int sig; struct target_sigcontext sc; -struct target_fpstate fpstate; +/* + * The actual fpstate is placed after retcode[] below, to make + * room for the variable-sized xsave data. The older unused fpstate + * has to be kept to avoid changing the offset of extramask[], which + * is part of the ABI. + */ +struct target_fpstate fpstate_unused; abi_ulong extramask[TARGET_NSIG_WORDS-1]; char retcode[8]; +struct target_fpstate fpstate; }; struct rt_sigframe { @@ -175,8 +182,8 @@ struct rt_sigframe { abi_ulong puc; struct target_siginfo info; struct target_ucontext uc; -struct target_fpstate fpstate; char retcode[8]; +struct target_fpstate fpstate; }; #else -- 2.37.3
[PATCH 2/3] linux-user: i386/signal: support FXSAVE fpstate on 32-bit emulation
Linux can use FXSAVE to save/restore XMM registers even on 32-bit
systems. This requires some care in order to keep the FXSAVE area
aligned to 16 bytes; for this reason, get_sigframe is changed to pass
the offset into the FXSAVE area rather than the full frame size.

Signed-off-by: Paolo Bonzini
---
 linux-user/i386/signal.c | 129 +++
 1 file changed, 77 insertions(+), 52 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 76317a3d16..53c1c05581 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -39,29 +39,7 @@ struct target_xmmreg {
     uint32_t element[4];
 };
 
-struct target_fpstate_32 {
-    /* Regular FPU environment */
-    uint32_t cw;
-    uint32_t sw;
-    uint32_t tag;
-    uint32_t ipoff;
-    uint32_t cssel;
-    uint32_t dataoff;
-    uint32_t datasel;
-    struct target_fpreg st[8];
-    uint16_t status;
-    uint16_t magic;          /* 0xffff = regular FPU data only */
-
-    /* FXSR FPU environment */
-    uint32_t _fxsr_env[6];   /* FXSR FPU env is ignored */
-    uint32_t mxcsr;
-    uint32_t reserved;
-    struct target_fpxreg fxsr_st[8]; /* FXSR FPU reg data is ignored */
-    struct target_xmmreg xmm[8];
-    uint32_t padding[56];
-};
-
-struct target_fpstate_64 {
+struct target_fpstate_fxsave {
     /* FXSAVE format */
     uint16_t cw;
     uint16_t sw;
@@ -75,11 +53,36 @@ struct target_fpstate_64 {
     uint32_t xmm_space[64];
     uint32_t reserved[24];
 };
+#define TARGET_FXSAVE_SIZE sizeof(struct target_fpstate_fxsave)
+QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
+
+struct target_fpstate_32 {
+    /* Regular FPU environment */
+    uint32_t cw;
+    uint32_t sw;
+    uint32_t tag;
+    uint32_t ipoff;
+    uint32_t cssel;
+    uint32_t dataoff;
+    uint32_t datasel;
+    struct target_fpreg st[8];
+    uint16_t status;
+    uint16_t magic;          /* 0xffff = regular FPU data only */
+    struct target_fpstate_fxsave fxsave;
+};
+
+/*
+ * For simplicity, setup_frame aligns struct target_fpstate_32 to
+ * 16 bytes, so ensure that the FXSAVE area is also aligned.
+ */
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxsave) & 15);
 
 #ifndef TARGET_X86_64
 # define target_fpstate target_fpstate_32
+# define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxsave)
 #else
-# define target_fpstate target_fpstate_64
+# define target_fpstate target_fpstate_fxsave
+# define TARGET_FPSTATE_FXSAVE_OFFSET 0
 #endif
 
 struct target_sigcontext_32 {
@@ -172,8 +175,16 @@ struct sigframe {
     struct target_fpstate fpstate_unused;
     abi_ulong extramask[TARGET_NSIG_WORDS-1];
     char retcode[8];
-    struct target_fpstate fpstate;
+
+    /*
+     * This field must be 16-byte aligned in memory. Applying QEMU_ALIGNED
+     * to it ensures that the base of the frame has an appropriate alignment
+     * too.
+     */
+    struct target_fpstate fpstate QEMU_ALIGNED(8);
 };
+#define TARGET_SIGFRAME_FXSAVE_OFFSET (\
+    offsetof(struct sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 
 struct rt_sigframe {
     abi_ulong pretcode;
@@ -183,25 +194,35 @@ struct rt_sigframe {
     abi_ulong pinfo;
     abi_ulong puc;
     struct target_siginfo info;
     struct target_ucontext uc;
     char retcode[8];
-    struct target_fpstate fpstate;
+    struct target_fpstate fpstate QEMU_ALIGNED(8);
 };
-
+#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \
+    offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 #else
 
 struct rt_sigframe {
     abi_ulong pretcode;
     struct target_ucontext uc;
     struct target_siginfo info;
-    struct target_fpstate fpstate;
+    struct target_fpstate fpstate QEMU_ALIGNED(16);
 };
-
+#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \
+    offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 #endif
 
 /*
  * Set up a signal frame.
  */
 
-/* XXX: save x87 state */
+static void fxsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+                              abi_ulong fxsave_addr)
+{
+    /* fxsave_addr must be 16 byte aligned for fxsave */
+    assert(!(fxsave_addr & 0xf));
+
+    cpu_x86_fxsave(env, fxsave_addr);
+}
+
 static void setup_sigcontext(struct target_sigcontext *sc,
                              struct target_fpstate *fpstate, CPUX86State *env, abi_ulong mask,
                              abi_ulong fpstate_addr)
@@ -233,13 +254,14 @@ static void setup_sigcontext(struct target_sigcontext *sc,
 
     cpu_x86_fsave(env, fpstate_addr, 1);
     fpstate->status = fpstate->sw;
-    magic = 0xffff;
+    if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
+        magic = 0xffff;
+    } else {
+        fxsave_sigcontext(env, &fpstate->fxsave,
+                          fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+        magic = 0;
+    }
     __put_user(magic, &fpstate->magic);
-    __put_user(fpstate_addr, &sc->fpstate);
-
-    /* non-iBCS2 extensions.. */
-    __put_user(mas
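To see why aligning the FXSAVE area rather than the frame base is sufficient, consider the get_sigframe arithmetic in isolation; a standalone illustration with made-up numbers (the real offsets come from the structs above):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    int main(void)
    {
        uintptr_t esp = 0xbfff1234;    /* hypothetical user stack pointer */
        size_t fxsave_offset = 0x1f0;  /* hypothetical TARGET_SIGFRAME_FXSAVE_OFFSET */
        size_t fxsave_size = 512;      /* TARGET_FXSAVE_SIZE */

        /* Align the FXSAVE area itself down to 16 bytes, then step back to
         * the frame base; frame + fxsave_offset is then 16-byte aligned by
         * construction, whatever the offset happens to be. */
        uintptr_t frame = ((esp - fxsave_size) & -16ul) - fxsave_offset;

        assert(((frame + fxsave_offset) & 0xf) == 0);
        return 0;
    }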
[PATCH 0/3] linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate
These three patches add support for x86 XSAVE/XRSTOR signal frames in
linux-user emulation. This ensures that signals save and restore the
extended save states as well.

For 32-bit emulation not even FXSAVE was used, even though the signal
frame supports it. Therefore, patch 2 extends 32-bit emulation to use
FXSAVE/FXRSTOR if the FXSR bit is set in CPUID. If it is not set, QEMU
will use FSAVE/FRSTOR as usual; note that recent builds of glibc most
likely will check for the FXSR bit even on 32-bit builds, and will
refuse to start with a "CPU ISA level is lower than required" error.

Paolo

Paolo Bonzini (3):
  linux-user: i386/signal: move fpstate at the end of the 32-bit frames
  linux-user: i386/signal: support FXSAVE fpstate on 32-bit emulation
  linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate

 linux-user/i386/signal.c     | 227 ++-
 target/i386/cpu.c            |   2 +-
 target/i386/cpu.h            |   3 +
 target/i386/tcg/fpu_helper.c |  64 ++
 4 files changed, 210 insertions(+), 86 deletions(-)

-- 
2.37.3
[PATCH 3/3] linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate
Add support for saving/restoring extended save states when signals
are delivered. This allows using AVX, MPX or PKRU registers in
signal handlers.

The patch follows the Linux ABI.

Signed-off-by: Paolo Bonzini
---
 linux-user/i386/signal.c     | 115 +--
 target/i386/cpu.c            |   2 +-
 target/i386/cpu.h            |   3 +
 target/i386/tcg/fpu_helper.c |  64 +++
 4 files changed, 138 insertions(+), 46 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 53c1c05581..7dd52eb5dc 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -39,6 +39,15 @@ struct target_xmmreg {
     uint32_t element[4];
 };
 
+struct target_fpx_sw_bytes {
+    uint32_t magic1;
+    uint32_t extended_size;
+    uint64_t xfeatures;
+    uint32_t xstate_size;
+    uint32_t reserved[7];
+};
+QEMU_BUILD_BUG_ON(sizeof(struct target_fpx_sw_bytes) != 12*4);
+
 struct target_fpstate_fxsave {
     /* FXSAVE format */
     uint16_t cw;
@@ -51,10 +60,13 @@ struct target_fpstate_fxsave {
     uint32_t mxcsr_mask;
     uint32_t st_space[32];
     uint32_t xmm_space[64];
-    uint32_t reserved[24];
+    uint32_t hw_reserved[12];
+    struct target_fpx_sw_bytes sw_reserved;
+    uint8_t xfeatures[];
 };
 #define TARGET_FXSAVE_SIZE sizeof(struct target_fpstate_fxsave)
 QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_fxsave, sw_reserved) != 464);
 
 struct target_fpstate_32 {
     /* Regular FPU environment */
@@ -214,13 +226,39 @@ struct rt_sigframe {
  * Set up a signal frame.
  */
 
-static void fxsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
-                              abi_ulong fxsave_addr)
+static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+                             abi_ulong fxsave_addr)
 {
-    /* fxsave_addr must be 16 byte aligned for fxsave */
-    assert(!(fxsave_addr & 0xf));
+    if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
+        /* fxsave_addr must be 16 byte aligned for fxsave */
+        assert(!(fxsave_addr & 0xf));
 
-    cpu_x86_fxsave(env, fxsave_addr);
+        cpu_x86_fxsave(env, fxsave_addr);
+        __put_user(0, &fxsave->sw_reserved.magic1);
+    } else {
+        uint32_t xstate_size = xsave_area_size(env->xcr0, false);
+        uint32_t xfeatures_size = xstate_size - TARGET_FXSAVE_SIZE;
+
+        /*
+         * extended_size is the offset from fpstate_addr to right after the end
+         * of the extended save states. On 32-bit that includes the legacy
+         * FSAVE area.
+         */
+        uint32_t extended_size = TARGET_FPSTATE_FXSAVE_OFFSET
+            + xstate_size + FP_XSTATE_MAGIC2_SIZE;
+
+        /* fxsave_addr must be 64 byte aligned for xsave */
+        assert(!(fxsave_addr & 0x3f));
+
+        /*
+         * Zero the header, XSAVE *adds* features to an existing save state.
+         */
+        memset(fxsave->xfeatures, 0, 64);
+        cpu_x86_xsave(env, fxsave_addr);
+        __put_user(FP_XSTATE_MAGIC1, &fxsave->sw_reserved.magic1);
+        __put_user(extended_size, &fxsave->sw_reserved.extended_size);
+        __put_user(env->xcr0, &fxsave->sw_reserved.xfeatures);
+        __put_user(xstate_size, &fxsave->sw_reserved.xstate_size);
+        __put_user(FP_XSTATE_MAGIC2, (uint32_t *) &fxsave->xfeatures[xfeatures_size]);
+    }
 }
 
 static void setup_sigcontext(struct target_sigcontext *sc,
@@ -257,8 +295,8 @@ static void setup_sigcontext(struct target_sigcontext *sc,
     if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
         magic = 0xffff;
     } else {
-        fxsave_sigcontext(env, &fpstate->fxsave,
-                          fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+        xsave_sigcontext(env, &fpstate->fxsave,
+                         fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
         magic = 0;
     }
     __put_user(magic, &fpstate->magic);
@@ -291,7 +329,7 @@ static void setup_sigcontext(struct target_sigcontext *sc,
     __put_user((uint16_t)0, &sc->fs);
     __put_user(env->segs[R_SS].selector, &sc->ss);
 
-    fxsave_sigcontext(env, fpstate, fpstate_addr);
+    xsave_sigcontext(env, fpstate, fpstate_addr);
 #endif
 
     __put_user(fpstate_addr, &sc->fpstate);
@@ -332,8 +370,12 @@ get_sigframe(struct target_sigaction *ka, CPUX86State *env, size_t fxsave_offset
 
     if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
         return (esp - (fxsave_offset + TARGET_FXSAVE_SIZE)) & -8ul;
-    } else {
+    } else if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
         return ((esp - TARGET_FXSAVE_SIZE) & -16ul) - fxsave_offset;
+    } else {
+        size_t xstate_size =
+            xsave_area_size(env->xcr0, false) + FP_XSTATE_MAGIC2_SIZE;
+        return ((esp - xstate_size) & -64ul) - fxsave_offset;
     }
 }
 
@@ -437,7 +479,11 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
     }
 
     /* Create the ucontext. */
-    __put_user
Re: [PATCH v2 1/3] hw/misc/latching_switch: Add a simple latching_switch device
On Tue, 20 Sept 2022 at 19:46, Jian Zhang wrote:
>
> Implement a simple latching switch device.
>
> The latching switch is a switch that can be turned on and off.
> When the input changes state and matches the trigger edge, the
> switch state will be toggled.
>
> This device provides 2 properties `state(bool)` and `trigger-edge(string)`,
> and 2 gpios `input` and `output`.
>
> The `state` property is the initial state of the switch, and the
> `trigger-edge` property is the edge that will trigger the switch state
> to be toggled; the value can be `rising`, `falling` or `both`.

> +static void toggle_output(LatchingSwitchState *s)
> +{
> +    s->state = !(s->state);
> +    qemu_set_irq(s->output, !!(s->state));

s->state is a bool so this !! is unnecessary.

> +}
> +
> +static void input_handler(void *opaque, int line, int new_state)
> +{
> +    LatchingSwitchState *s = LATCHING_SWITCH(opaque);
> +
> +    assert(line == 0);
> +
> +    if (s->trigger_edge == TRIGGER_EDGE_FALLING && new_state == 0) {
> +        toggle_output(s);

This won't have the correct behaviour, because the device on the other
end of this input line might call qemu_set_irq() on it twice in a row
with new_state == 0 both times. It's only a falling edge if new_state
is 0 and the old input line state was not 0. If you need to detect
edges then you need to record the state of the input line within
LatchingSwitchState.

> +    } else if (s->trigger_edge == TRIGGER_EDGE_RISING && new_state == 1) {
> +        toggle_output(s);
> +    } else if (s->trigger_edge == TRIGGER_EDGE_BOTH) {
> +        toggle_output(s);
> +    }
> +}
> +
> +static void latching_switch_reset(DeviceState *dev)
> +{
> +    LatchingSwitchState *s = LATCHING_SWITCH(dev);
> +    /* reset to off */
> +    s->state = false;
> +    /* reset to falling edge trigger */
> +    s->trigger_edge = TRIGGER_EDGE_FALLING;

trigger_edge isn't guest-visible runtime modifiable state, it's how the
device or board configures the latch when it creates it, right?
Configuration shouldn't be reset, because if the board creates a
rising-edge switch, it should stay a rising edge switch even if the
guest power-cycles itself. (If the QOM property can be changed at
runtime things get more complex, but in this case it can't be.)

> +}
> +
> +static const VMStateDescription vmstate_latching_switch = {
> +    .name = TYPE_LATCHING_SWITCH,
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_BOOL(state, LatchingSwitchState),
> +        VMSTATE_U8(trigger_edge, LatchingSwitchState),

trigger_edge isn't runtime changeable so it doesn't need to be saved
in the vmstate.

> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static void latching_switch_realize(DeviceState *dev, Error **errp)
> +{
> +    LatchingSwitchState *s = LATCHING_SWITCH(dev);
> +
> +    /* init a input io */
> +    qdev_init_gpio_in(dev, input_handler, 1);
> +
> +    /* init a output io */
> +    qdev_init_gpio_out(dev, &(s->output), 1);
> +}
> +
> +static void latching_switch_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->desc = "Latching Switch";
> +    dc->vmsd = &vmstate_latching_switch;
> +    dc->reset = latching_switch_reset;
> +    dc->realize = latching_switch_realize;
> +    set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);

This is definitely not a display device :-)

> +}
> +
> +static void latching_switch_get_state(Object *obj, Visitor *v, const char *name,
> +                                      void *opaque, Error **errp)
> +{
> +    LatchingSwitchState *s = LATCHING_SWITCH(obj);
> +    bool value = s->state;
> +
> +    visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void latching_switch_set_state(Object *obj, Visitor *v, const char *name,
> +                                      void *opaque, Error **errp)

What's the requirement for being able to set the output state via a
QOM property ?

> +{
> +    LatchingSwitchState *s = LATCHING_SWITCH(obj);
> +    bool value;
> +    Error *err = NULL;
> +
> +    visit_type_bool(v, name, &value, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
> +    if (value != s->state) {
> +        toggle_output(s);

This will call qemu_set_irq() on the output, which isn't a valid thing
to do during board creation or in certain stages of reset. If you need
to handle "the board can create a switch which starts off with its
output asserted", that can be done, but it's a little more complicated
(it involves implementing your reset handler as 3-phase-reset). It
looks from patch 3 like your use case in the aspeed board doesn't
require this, so my suggestion would be simply to not implement it:
make the switch always start with the output state being 0.

> +    }
> +}
> +static void latching_switch_get_trigger_edge(Object *obj, Visitor *v,

Missing newline between the two functions.

> +
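Peter's edge-detection comment, in concrete form: the handler has to latch the previous input level and only toggle when the level actually changes. A sketch against the patch's own names, where the last_input field is an assumption (it would need to be added to LatchingSwitchState and to the vmstate, since it is runtime state):

    static void input_handler(void *opaque, int line, int new_state)
    {
        LatchingSwitchState *s = LATCHING_SWITCH(opaque);
        bool old = s->last_input;    /* hypothetical stored input level */
        bool cur = !!new_state;

        assert(line == 0);
        s->last_input = cur;

        if (old == cur) {
            return;                  /* same level twice in a row: not an edge */
        }
        if ((s->trigger_edge == TRIGGER_EDGE_FALLING && !cur) ||
            (s->trigger_edge == TRIGGER_EDGE_RISING && cur) ||
            s->trigger_edge == TRIGGER_EDGE_BOTH) {
            toggle_output(s);
        }
    }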
[PULL 00/28] target-arm queue
Hi; this is the latest target-arm queue; most of this is a refactoring
patchset from RTH for the arm page-table-walk emulation.

thanks
-- PMM

The following changes since commit f1d33f55c47dfdaf8daacd618588ad3ae4c452d1:

  Merge tag 'pull-testing-gdbstub-plugins-gitdm-061022-3' of https://github.com/stsquad/qemu into staging (2022-10-06 07:11:56 -0400)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20221010

for you to fetch changes up to 915f62844cf62e428c7c178149b5ff1cbe129b07:

  docs/system/arm/emulation.rst: Report FEAT_GTG support (2022-10-10 14:52:25 +0100)

----------------------------------------------------------------
target-arm queue:
 * Retry KVM_CREATE_VM call if it fails EINTR
 * allow setting SCR_EL3.EnTP2 when FEAT_SME is implemented
 * docs/nuvoton: Update URL for images
 * refactoring of page table walk code
 * hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3
 * Don't allow guest to use unimplemented granule sizes
 * Report FEAT_GTG support

----------------------------------------------------------------
Jerome Forissier (2):
      target/arm: allow setting SCR_EL3.EnTP2 when FEAT_SME is implemented
      hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3

Joel Stanley (1):
      docs/nuvoton: Update URL for images

Peter Maydell (4):
      target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR
      target/arm: Don't allow guest to use unimplemented granule sizes
      target/arm: Use ARMGranuleSize in ARMVAParameters
      docs/system/arm/emulation.rst: Report FEAT_GTG support

Richard Henderson (21):
      target/arm: Split s2walk_secure from ipa_secure in get_phys_addr
      target/arm: Make the final stage1+2 write to secure be unconditional
      target/arm: Add is_secure parameter to get_phys_addr_lpae
      target/arm: Fix S2 disabled check in S1_ptw_translate
      target/arm: Add is_secure parameter to regime_translation_disabled
      target/arm: Split out get_phys_addr_with_secure
      target/arm: Add is_secure parameter to v7m_read_half_insn
      target/arm: Add TBFLAG_M32.SECURE
      target/arm: Merge regime_is_secure into get_phys_addr
      target/arm: Add is_secure parameter to do_ats_write
      target/arm: Fold secure and non-secure a-profile mmu indexes
      target/arm: Reorg regime_translation_disabled
      target/arm: Drop secure check for HCR.TGE vs SCTLR_EL1.M
      target/arm: Introduce arm_hcr_el2_eff_secstate
      target/arm: Hoist read of *is_secure in S1_ptw_translate
      target/arm: Remove env argument from combined_attrs_fwb
      target/arm: Pass HCR to attribute subroutines.
      target/arm: Fix ATS12NSO* from S PL1
      target/arm: Split out get_phys_addr_disabled
      target/arm: Fix cacheattr in get_phys_addr_disabled
      target/arm: Use tlb_set_page_full

 docs/system/arm/emulation.rst |   1 +
 docs/system/arm/nuvoton.rst   |   4 +-
 target/arm/cpu-param.h        |   2 +-
 target/arm/cpu.h              | 181 --
 target/arm/internals.h        | 150 ++-
 hw/arm/boot.c                 |   4 +
 target/arm/helper.c           | 332 ++--
 target/arm/kvm.c              |   4 +-
 target/arm/m_helper.c         |  29 ++-
 target/arm/ptw.c              | 570 ++
 target/arm/tlb_helper.c       |   9 +-
 target/arm/translate-a64.c    |   8 -
 target/arm/translate.c        |   9 +-
 13 files changed, 717 insertions(+), 586 deletions(-)
[PULL 01/28] target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR
Occasionally the KVM_CREATE_VM ioctl can return EINTR, even though
there is no pending signal to be taken. In commit 94ccff13382055 we
added a retry-on-EINTR loop to the KVM_CREATE_VM call in the generic
KVM code. Adopt the same approach for the use of the ioctl in the
Arm-specific KVM code (where we use it to create a scratch VM for
probing for various things).

For more information, see the mailing list thread:
https://lore.kernel.org/qemu-devel/8735e0s1zw.wl-...@kernel.org/

Reported-by: Vitaly Chikunov
Signed-off-by: Peter Maydell
Reviewed-by: Vitaly Chikunov
Reviewed-by: Eric Auger
Acked-by: Marc Zyngier
Message-id: 20220930113824.1933293-1-peter.mayd...@linaro.org
---
 target/arm/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e5c1bd50d29..1e4de9b42e3 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -79,7 +79,9 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
     if (max_vm_pa_size < 0) {
         max_vm_pa_size = 0;
     }
-    vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    do {
+        vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    } while (vmfd == -1 && errno == EINTR);
     if (vmfd < 0) {
         goto err;
     }
-- 
2.25.1