Re: [PATCH v11 7/7] docs/zoned-storage: add zoned device documentation

2022-10-10 Thread Sam Li
Hannes Reinecke wrote on Monday, 10 October 2022 at 14:19:
>
> On 10/10/22 04:21, Sam Li wrote:
> > Add the documentation about the zoned device support to virtio-blk
> > emulation.
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/zoned-storage.rst   | 40 ++
> >   docs/system/qemu-block-drivers.rst.inc |  6 
> >   2 files changed, 46 insertions(+)
> >   create mode 100644 docs/devel/zoned-storage.rst
> >
> > diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
> > new file mode 100644
> > index 00..deaa4ce99b
> > --- /dev/null
> > +++ b/docs/devel/zoned-storage.rst
> > @@ -0,0 +1,40 @@
> > +=============
> > +zoned-storage
> > +=============
> > +
> > +Zoned Block Devices (ZBDs) devide the LBA space into block regions called 
> > zones
>
> divide
>
> > +that are larger than the LBA size. They can only allow sequential writes, 
> > which
> > +can reduce write amplification in SSDs, and potentially lead to higher
> > +throughput and increased capacity. More details about ZBDs can be found at:
> > +
> > +https://zonedstorage.io/docs/introduction/zoned-storage
> > +
> > +1. Block layer APIs for zoned storage
> > +-------------------------------------
> > +QEMU block layer has three zoned storage models:
> > +- BLK_Z_HM: This model only allows sequential writes access. It supports a 
> > set
> > +of ZBD-specific I/O request that used by the host to manage device zones.
>
> Maybe:
> This model only allow for sequential write access to zones. It supports
> ZBD-specific I/O requests to manage device zones.
>
> > +- BLK_Z_HA: It deals with both sequential writes and random writes access.
>
> Maybe better:
> This model allows sequential and random writes to zones. It supports
> ZBD-specific I/O requests to manage device zones.
>
> > +- BLK_Z_NONE: Regular block devices and drive-managed ZBDs are treated as
> > +non-zoned devices.
>
> Maybe:
> This is the default model with no zones support; it includes both
> regular and drive-managed ZBD devices. ZBD-specific I/O requests are not
> supported.

Thanks!

>
> > +
> > +The block device information resides inside BlockDriverState. QEMU uses
> > +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by 
> > the
> > +block layer while processing I/O requests. A BlockBackend has a root 
> > pointer to
> > +a BlockDriverState graph(for example, raw format on top of file-posix). The
> > +zoned storage information can be propagated from the leaf BlockDriverState 
> > all
> > +the way up to the BlockBackend. If the zoned storage model in file-posix is
> > +set to BLK_Z_HM, then block drivers will declare support for zoned host 
> > device.
> > +
> > +The block layer APIs support commands needed for zoned storage devices,
> > +including report zones, four zone operations, and zone append.
> > +
> > +2. Emulating zoned storage controllers
> > +--------------------------------------
> > +When the BlockBackend's BlockLimits model reports a zoned storage device, 
> > users
> > +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block 
> > layer
> > +APIs for zoned storage emulation or testing.
> > +
> > +For example, to test zone_report on a null_blk device using qemu-io is:
> > +$ path/to/qemu-io --image-opts -n 
> > driver=zoned_host_device,filename=/dev/nullb0
> > +-c "zrp offset nr_zones"
> > diff --git a/docs/system/qemu-block-drivers.rst.inc 
> > b/docs/system/qemu-block-drivers.rst.inc
> > index dfe5d2293d..0b97227fd9 100644
> > --- a/docs/system/qemu-block-drivers.rst.inc
> > +++ b/docs/system/qemu-block-drivers.rst.inc
> > @@ -430,6 +430,12 @@ Hard disks
> > you may corrupt your host data (use the ``-snapshot`` command
> > line option or modify the device permissions accordingly).
> >
> > +Zoned block devices
> > +  Zoned block devices can be passed through to the guest if the emulated 
> > storage
> > +  controller supports zoned storage. Use ``--blockdev zoned_host_device,
> > +  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
> > +  as ``drive0``.
> > +
> >   Windows
> >   ^^^^^^^
> >
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke        Kernel Storage Architect
> h...@suse.de  +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
> Myers, Andrew McDonald, Martje Boudien Moerman
>
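
For readers who want to see what a zone report contains without going through
QEMU, here is a minimal host-side sketch using the Linux BLKREPORTZONE ioctl.
It is illustrative only: /dev/nullb0 is an assumption carried over from the
qemu-io example above, and the sketch shows the kernel interface, not QEMU's
own zoned_host_device implementation.

/*
 * Hedged sketch: report the first zone of a zoned block device via the
 * Linux BLKREPORTZONE ioctl. Error handling is kept to the essentials.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

int main(void)
{
    int fd = open("/dev/nullb0", O_RDONLY);  /* device name is an assumption */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Allocate the report header plus room for one zone descriptor. */
    struct blk_zone_report *rep =
        calloc(1, sizeof(*rep) + sizeof(struct blk_zone));
    if (!rep) {
        return 1;
    }
    rep->sector = 0;     /* start reporting from the first sector */
    rep->nr_zones = 1;   /* ask for a single zone descriptor */

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
        perror("BLKREPORTZONE");
        return 1;
    }

    if (rep->nr_zones > 0) {
        /* start, len and wp (write pointer) are in 512-byte sectors. */
        printf("zone: start=%llu len=%llu wp=%llu\n",
               (unsigned long long)rep->zones[0].start,
               (unsigned long long)rep->zones[0].len,
               (unsigned long long)rep->zones[0].wp);
    }
    return 0;
}

The qemu-io "zrp" command in the patch above surfaces the same per-zone
information (start, length, write pointer) through QEMU's block layer APIs.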



Re: FAILED: libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o

2022-10-10 Thread Thomas Huth

On 10/10/2022 08.18, Dennis Clarke wrote:


On FreeBSD 14.0 CURRENT amd64 everything seems to go swimmingly until :

[5679/6848] Compiling C object libqemu-arm-bsd-user.fa.p/bsd-user_mmap.c.o
[5680/6848] Compiling C object libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o
FAILED: libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o
/usr/bin/cc -m64 -mcx16 -Ilibqemu-arm-bsd-user.fa.p -I. -I.. -Itarget/arm 
-I../target/arm -I../common-user/host/x86_64 -I../bsd-user/include 
-Ibsd-user/freebsd -I../bsd-user/freebsd -I../bsd-user/host/x86_64 
-Ibsd-user -I../bsd-user -I../bsd-user/arm -Iqapi -Itrace -Iui -Iui/shader 
-I/usr/local/include -I/usr/local/include/glib-2.0 
-I/usr/local/lib/glib-2.0/include -fcolor-diagnostics -Wall -Winvalid-pch 
-std=gnu11 -O0 -g -iquote . -iquote /opt/bw/build/qemu -iquote 
/opt/bw/build/qemu/include -iquote /opt/bw/build/qemu/tcg/i386 -pthread 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-definition 
-Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition 
-Wno-tautological-type-limit-compare -Wno-psabi 
-Wno-gnu-variable-sized-type-not-at-end -fstack-protector-strong -m64 -g -O0 
-fno-builtin -fPIE -DNEED_CPU_H 
'-DCONFIG_TARGET="arm-bsd-user-config-target.h"' 
'-DCONFIG_DEVICES="arm-bsd-user-config-devices.h"' -MD -MQ 
libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o -MF 
libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o.d -o 
libqemu-arm-bsd-user.fa.p/bsd-user_signal.c.o -c ../bsd-user/signal.c

In file included from ../bsd-user/signal.c:27:
In file included from ../bsd-user/host/x86_64/host-signal.h:14:
In file included from /usr/include/vm/pmap.h:92:
/usr/include/machine/pmap.h:452:2: error: fields must have a constant size: 
'variable length array in structure' extension will never be supported

     PV_CHUNK_HEADER
     ^
/usr/include/machine/pmap.h:448:12: note: expanded from macro 'PV_CHUNK_HEADER'
     uint64_t    pc_map[_NPCM];  /* bitmap; 1 = free */  \
     ^
/usr/include/machine/pmap.h:456:2: error: fields must have a constant size: 
'variable length array in structure' extension will never be supported

     PV_CHUNK_HEADER
     ^
/usr/include/machine/pmap.h:448:12: note: expanded from macro 'PV_CHUNK_HEADER'
     uint64_t    pc_map[_NPCM];  /* bitmap; 1 = free */  \
     ^
2 errors generated.
ninja: build stopped: subcommand failed.
gmake[1]: *** [Makefile:165: run-ninja] Error 1
gmake[1]: Leaving directory '/opt/bw/build/qemu/build'
gmake: *** [GNUmakefile:11: all] Error 2

phobos#

Is there a trivial patch ?  Or perhaps try again using GCC and not LLVM/Clang?


I'm not using FreeBSD, so no real clue, but this pretty much sounds like 
_NPCM is not properly defined by your system headers anymore, so I assume 
this is a problem on the FreeBSD side ... I'd suggest reporting it on the 
FreeBSD mailing list.
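
For context, the diagnostic belongs to a general C rule: an array field inside
a struct must have a size that is an integer constant expression. A compilable
sketch of the distinction follows; NPCM_CONST and npcm are hypothetical names
standing in for whatever _NPCM expands to in FreeBSD's machine/pmap.h.

/* A fixed-size field compiles fine because the size is a constant. */
#include <stdint.h>

#define NPCM_CONST 3                 /* integer constant expression */

struct pv_chunk_ok {
    uint64_t pc_map[NPCM_CONST];     /* OK: constant-sized array field */
};

/*
 * If the macro instead expands to a value only known at run time, e.g.
 *
 *     extern int npcm;
 *     #define NPCM_VAR npcm
 *     struct pv_chunk_bad { uint64_t pc_map[NPCM_VAR]; };
 *
 * the field becomes a "variable length array in structure", which C
 * forbids -- the exact diagnostic clang emits for pc_map[_NPCM] above.
 */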


 Thomas




Re: total fail on FreeBSD 14.0 amd64 regardless of compiler

2022-10-10 Thread Thomas Huth

On 10/10/2022 08.56, Dennis Clarke wrote:


re: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html

Using GCC 12 is even worse :

[2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o
[2041/6841] Linking target qemu-system-aarch64
FAILED: qemu-system-aarch64
/usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp
/usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: undefined 
reference to symbol 'inotify_init1'


Now that sounds like the detection for inotify_init1 did not work
right in the meson.build script...

Looking at meson.build:

config_host_data.set('CONFIG_INOTIFY1',
 cc.has_header_symbol('sys/inotify.h', 'inotify_init1'))

... do you have such an "inotify.h" header on your FreeBSD system
and does it contain an inotify_init1 function?

 Thomas




Re: total fail on FreeBSD 14.0 amd64 regardless of compiler

2022-10-10 Thread Dennis Clarke

On 10/10/22 07:21, Thomas Huth wrote:

On 10/10/2022 08.56, Dennis Clarke wrote:


re: 
https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html


Using GCC 12 is even worse :

[2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o
[2041/6841] Linking target qemu-system-aarch64
FAILED: qemu-system-aarch64
/usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp
/usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: 
undefined reference to symbol 'inotify_init1'


Now that sounds like the detection for inotify_init1 did not work
right in the meson.build script...

Looking at meson.build:

config_host_data.set('CONFIG_INOTIFY1',
  cc.has_header_symbol('sys/inotify.h', 
'inotify_init1'))


... do you have such an "inotify.h" header on your FreeBSD system
and does it contain an inotify_init1 function?

  Thomas


Let's see :

phobos#
phobos# uname -apKU
FreeBSD phobos 14.0-CURRENT FreeBSD 14.0-CURRENT #14 
main-n258340-497cdf9673e: Sun Oct  2 09:51:14 GMT 2022 
root@phobos:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 amd64 1400072 
1400072

phobos# ls -lap /usr/local/include/sys/inotify.h
-rw-r--r--  1 root  wheel  4540 Oct  4 01:24 
/usr/local/include/sys/inotify.h

phobos#
phobos# grep 'inotify_init1' /usr/local/include/sys/inotify.h
/* Flags for the parameter of inotify_init1. */
int inotify_init1 (int flags) __THROW;
phobos#

Looks to be there.
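
Worth noting: meson's cc.has_header_symbol() only proves that the declaration
exists in the header; it performs no linking. Since the header lives under
/usr/local/include, it presumably comes from a FreeBSD port (libinotify), so
the symbol can still be unresolved at link time unless the library is added to
the link line. A minimal link-time check, where the -linotify library name is
an assumption based on that port:

/*
 * Compiles as long as the header declares inotify_init1; linking is the
 * real test:
 *
 *     cc -I/usr/local/include check.c -L/usr/local/lib -linotify
 *
 * Without the library, the link fails with exactly the kind of
 * "undefined reference to 'inotify_init1'" error shown above.
 */
#include <sys/inotify.h>

int main(void)
{
    return inotify_init1(0) < 0;
}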


--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional



Re: [PATCH v3] disas/riscv.c: rvv: Add disas support for vector instructions

2022-10-10 Thread Alistair Francis
On Wed, Sep 28, 2022 at 3:19 PM Yang Liu  wrote:
>
> Tested with https://github.com/ksco/rvv-decoder-tests
>
> Expected checkpatch errors for consistency and brevity reasons:
>
> ERROR: line over 90 characters
> ERROR: trailing statements should be on next line
> ERROR: braces {} are necessary for all arms of this statement
>
> Signed-off-by: Yang Liu 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  disas/riscv.c | 1432 -
>  1 file changed, 1430 insertions(+), 2 deletions(-)
>
> diff --git a/disas/riscv.c b/disas/riscv.c
> index f107d94c4c..d216b9c39b 100644
> --- a/disas/riscv.c
> +++ b/disas/riscv.c
> @@ -158,6 +158,11 @@ typedef enum {
>  rv_codec_css_sqsp,
>  rv_codec_k_bs,
>  rv_codec_k_rnum,
> +rv_codec_v_r,
> +rv_codec_v_ldst,
> +rv_codec_v_i,
> +rv_codec_vsetvli,
> +rv_codec_vsetivli,
>  } rv_codec;
>
>  typedef enum {
> @@ -560,6 +565,376 @@ typedef enum {
>  rv_op_zip = 396,
>  rv_op_xperm4 = 397,
>  rv_op_xperm8 = 398,
> +rv_op_vle8_v = 399,
> +rv_op_vle16_v = 400,
> +rv_op_vle32_v = 401,
> +rv_op_vle64_v = 402,
> +rv_op_vse8_v = 403,
> +rv_op_vse16_v = 404,
> +rv_op_vse32_v = 405,
> +rv_op_vse64_v = 406,
> +rv_op_vlm_v = 407,
> +rv_op_vsm_v = 408,
> +rv_op_vlse8_v = 409,
> +rv_op_vlse16_v = 410,
> +rv_op_vlse32_v = 411,
> +rv_op_vlse64_v = 412,
> +rv_op_vsse8_v = 413,
> +rv_op_vsse16_v = 414,
> +rv_op_vsse32_v = 415,
> +rv_op_vsse64_v = 416,
> +rv_op_vluxei8_v = 417,
> +rv_op_vluxei16_v = 418,
> +rv_op_vluxei32_v = 419,
> +rv_op_vluxei64_v = 420,
> +rv_op_vloxei8_v = 421,
> +rv_op_vloxei16_v = 422,
> +rv_op_vloxei32_v = 423,
> +rv_op_vloxei64_v = 424,
> +rv_op_vsuxei8_v = 425,
> +rv_op_vsuxei16_v = 426,
> +rv_op_vsuxei32_v = 427,
> +rv_op_vsuxei64_v = 428,
> +rv_op_vsoxei8_v = 429,
> +rv_op_vsoxei16_v = 430,
> +rv_op_vsoxei32_v = 431,
> +rv_op_vsoxei64_v = 432,
> +rv_op_vle8ff_v = 433,
> +rv_op_vle16ff_v = 434,
> +rv_op_vle32ff_v = 435,
> +rv_op_vle64ff_v = 436,
> +rv_op_vl1re8_v = 437,
> +rv_op_vl1re16_v = 438,
> +rv_op_vl1re32_v = 439,
> +rv_op_vl1re64_v = 440,
> +rv_op_vl2re8_v = 441,
> +rv_op_vl2re16_v = 442,
> +rv_op_vl2re32_v = 443,
> +rv_op_vl2re64_v = 444,
> +rv_op_vl4re8_v = 445,
> +rv_op_vl4re16_v = 446,
> +rv_op_vl4re32_v = 447,
> +rv_op_vl4re64_v = 448,
> +rv_op_vl8re8_v = 449,
> +rv_op_vl8re16_v = 450,
> +rv_op_vl8re32_v = 451,
> +rv_op_vl8re64_v = 452,
> +rv_op_vs1r_v = 453,
> +rv_op_vs2r_v = 454,
> +rv_op_vs4r_v = 455,
> +rv_op_vs8r_v = 456,
> +rv_op_vadd_vv = 457,
> +rv_op_vadd_vx = 458,
> +rv_op_vadd_vi = 459,
> +rv_op_vsub_vv = 460,
> +rv_op_vsub_vx = 461,
> +rv_op_vrsub_vx = 462,
> +rv_op_vrsub_vi = 463,
> +rv_op_vwaddu_vv = 464,
> +rv_op_vwaddu_vx = 465,
> +rv_op_vwadd_vv = 466,
> +rv_op_vwadd_vx = 467,
> +rv_op_vwsubu_vv = 468,
> +rv_op_vwsubu_vx = 469,
> +rv_op_vwsub_vv = 470,
> +rv_op_vwsub_vx = 471,
> +rv_op_vwaddu_wv = 472,
> +rv_op_vwaddu_wx = 473,
> +rv_op_vwadd_wv = 474,
> +rv_op_vwadd_wx = 475,
> +rv_op_vwsubu_wv = 476,
> +rv_op_vwsubu_wx = 477,
> +rv_op_vwsub_wv = 478,
> +rv_op_vwsub_wx = 479,
> +rv_op_vadc_vvm = 480,
> +rv_op_vadc_vxm = 481,
> +rv_op_vadc_vim = 482,
> +rv_op_vmadc_vvm = 483,
> +rv_op_vmadc_vxm = 484,
> +rv_op_vmadc_vim = 485,
> +rv_op_vsbc_vvm = 486,
> +rv_op_vsbc_vxm = 487,
> +rv_op_vmsbc_vvm = 488,
> +rv_op_vmsbc_vxm = 489,
> +rv_op_vand_vv = 490,
> +rv_op_vand_vx = 491,
> +rv_op_vand_vi = 492,
> +rv_op_vor_vv = 493,
> +rv_op_vor_vx = 494,
> +rv_op_vor_vi = 495,
> +rv_op_vxor_vv = 496,
> +rv_op_vxor_vx = 497,
> +rv_op_vxor_vi = 498,
> +rv_op_vsll_vv = 499,
> +rv_op_vsll_vx = 500,
> +rv_op_vsll_vi = 501,
> +rv_op_vsrl_vv = 502,
> +rv_op_vsrl_vx = 503,
> +rv_op_vsrl_vi = 504,
> +rv_op_vsra_vv = 505,
> +rv_op_vsra_vx = 506,
> +rv_op_vsra_vi = 507,
> +rv_op_vnsrl_wv = 508,
> +rv_op_vnsrl_wx = 509,
> +rv_op_vnsrl_wi = 510,
> +rv_op_vnsra_wv = 511,
> +rv_op_vnsra_wx = 512,
> +rv_op_vnsra_wi = 513,
> +rv_op_vmseq_vv = 514,
> +rv_op_vmseq_vx = 515,
> +rv_op_vmseq_vi = 516,
> +rv_op_vmsne_vv = 517,
> +rv_op_vmsne_vx = 518,
> +rv_op_vmsne_vi = 519,
> +rv_op_vmsltu_vv = 520,
> +rv_op_vmsltu_vx = 521,
> +rv_op_vmslt_vv = 522,
> +rv_op_vmslt_vx = 523,
> +rv_op_vmsleu_vv = 524,
> +rv_op_vmsleu_vx = 525,
> +rv_op_vmsleu_vi = 526,
> +rv_op_vmsle_vv = 527,
> +rv_op_vmsle_vx = 528,
> +rv_op_vmsle_vi = 529,
> +rv_op_vmsgtu_vx = 530,
> +rv_op_vmsgtu_vi = 531,
> +rv_op_vmsgt_vx = 532,
> +rv_op_vmsgt_vi = 533,
> +rv_op_vminu_vv = 

[PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Ani Sinha
Please see the README file added in patch 10 for more details.
Sample runs are as follows:

$ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smbios.py, line 92)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smilatency.py, line 47)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(testacpi.py, line 158)
1..1
ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits

$ ./tests/venv/bin/avocado run -t acpi tests/avocado 
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smbios.py, line 92)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smilatency.py, line 47)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(testacpi.py, line 158)
Fetching asset from 
tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
JOB ID : c06a9feb423bcda5de89bb51353c6c1718719f14
JOB LOG: 
/home/anisinha/avocado/job-results/job-2022-10-10T13.12-c06a9fe/job.log
 (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS 
(34.11 s)
RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 37.40 s

$ ./tests/venv/bin/avocado run tests/avocado/acpi-bits.py
Fetching asset from 
tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
JOB ID : a2476dd3fafe289c5e6475f447bc8369f77d57ba
JOB LOG: 
/home/anisinha/avocado/job-results/job-2022-10-10T13.14-a2476dd/job.log
 (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS 
(34.07 s)
RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 40.20 s


Changelog:
v3:
  - converted the existing test to avocado tests as per the popular
recommendation. Added appropriate tags.
  - artifact download URL modified based on gitlab-CI URL.

  For biosbits repo:
  - moved to a gitlab repo with me being the maintainer.
  - added .gitlab-ci.yml file to generate the artifacts.
v2:
 - a new class of python based tests introduced that is separate from avocado
   tests or qtests. Can be run by using "make check-pytest".
 - acpi biosbits tests are the first tests to use pytest environment.
 - bios bits tests now download the bits binary archives from a remote
   repository if they are not found locally. The test skips if download
   fails.
 - A new environment variable is introduced that can be passed by the tester
   to specify the location of the bits archives locally. The test skips if the
   bits binaries are not found in that location.
 - if pip install of a python module fails for whatever reason, the test skips.
 - misc code fixes including spell check of the README doc. README has been
   updated as well.
 - addition of SPDX license headers to bits test files.
 - update MAINTAINERS to reflect the new pytest test class.

For biosbits repo:
 - added Dockerfile and build script. Made bios bits build on gcc 11.
   https://github.com/ani-sinha/bits/blob/bits-qemu-logging/Dockerfile
   https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh
   The build script generates the zip archive and tarball used by the test.

v1: initial patchset. uses qtest to implement the bios bits tests.

Cc: Qemu Devel 
Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Cc: Igor M 
Cc: Michael Tsirkin 

Ani Sinha (10):
  acpi/tests/avocado/bits: initial commit of test scripts that are run
by biosbits
  acpi/tests/avocado/bits: add SPDX license identifiers for bios bits
tests
  acpi/tests/avocado/bits: disable acpi PSS tests that are failing in
biosbits
  acpi/tests/avocado/bits: add smilatency test suite from bits in order
to disable it
  acpi/tests/avocado/bits: add SPDX license identifiers for bios bits
smilatency tests
  acpi/tests/avocado/bits: disable smilatency test since it does not
pass every time
  acpi/tests/avocado/bits: add biosbits config file for running bios
tests
  acpi/tests/avocado/bits: add acpi and smbios avocado tests that use
biosbits
  MAINTAINERS: add myself as the maintainer for acpi biosbits avocado
tests
  acpi/tests/avocado/bits: add a README file

 MAINTAINERS   |6 +
 tests/avocado/acpi-bits.py|  334 +++
 tests/avocado/acpi-bits/README|   90 +
 .../acpi-bits/bits-config/bits-cfg.txt|   18 +
 tests/avocado/acpi-bits/bits-tests/smbios.py  | 2434 +
 .../acpi-bits/bits-tests/smilatency.py|  107 +
 .../avocado/acpi-bits/bits-tests/testacpi.py  |  287 ++
 .../avocado/acpi-bits/bits-tests/testcpuid.py |   87 +
 8 files changed, 3363 insertions(+)
 create mode 100644 tests/avocado/acpi-bits.py
 create mode 100644 tests/avocado/acpi-bits/README
 create mode 100644 tests/avocado/acpi-bits/bits-config/bits-cfg.txt
 create mode 100644 tests/avocado/acpi-bits/bit

[PATCH v3 07/10] acpi/tests/avocado/bits: add biosbits config file for running bios tests

2022-10-10 Thread Ani Sinha
This change adds the initial biosbits config file that instructs biosbits to
run the bios test suites in batch mode. Additionally, the acpi and smbios
structures are also dumped.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 .../avocado/acpi-bits/bits-config/bits-cfg.txt | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 tests/avocado/acpi-bits/bits-config/bits-cfg.txt

diff --git a/tests/avocado/acpi-bits/bits-config/bits-cfg.txt 
b/tests/avocado/acpi-bits/bits-config/bits-cfg.txt
new file mode 100644
index 00..8010804453
--- /dev/null
+++ b/tests/avocado/acpi-bits/bits-config/bits-cfg.txt
@@ -0,0 +1,18 @@
+# BITS configuration file
+[bits]
+
+# To run BITS in batch mode, set batch to a list of one or more of the
+# following keywords; BITS will then run all of the requested operations, then
+# save the log file to disk.
+#
+# test: Run the full BITS testsuite.
+# acpi: Dump all ACPI structures.
+# smbios: Dump all SMBIOS structures.
+#
+# Leave batch set to an empty string to disable batch mode.
+# batch =
+
+# Uncomment the following to run all available batch operations
+# please take a look at boot/python/init.py in bits zip file
+# to see how these options are parsed and used.
+batch = test acpi smbios
-- 
2.34.1




[PATCH v3 02/10] acpi/tests/avocado/bits: add SPDX license identifiers for bios bits tests

2022-10-10 Thread Ani Sinha
Added the SPDX license identifiers for the biosbits tests.
Also added a comment on each of the test scripts to indicate that they run
from within the biosbits environment and hence are not subject to the regular
maintenance activities for QEMU and are excluded from the dependency
management challenges in the host testing environment.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/bits-tests/smbios.py| 4 
 tests/avocado/acpi-bits/bits-tests/testacpi.py  | 4 
 tests/avocado/acpi-bits/bits-tests/testcpuid.py | 4 
 3 files changed, 12 insertions(+)

diff --git a/tests/avocado/acpi-bits/bits-tests/smbios.py 
b/tests/avocado/acpi-bits/bits-tests/smbios.py
index 9667d0542c..fc623de072 100644
--- a/tests/avocado/acpi-bits/bits-tests/smbios.py
+++ b/tests/avocado/acpi-bits/bits-tests/smbios.py
@@ -1,6 +1,8 @@
 # Copyright (c) 2015, Intel Corporation
 # All rights reserved.
 #
+# SPDX-License-Identifier: BSD-3-Clause
+#
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:
 #
@@ -24,6 +26,8 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+# This script runs only from the biosbits VM.
+
 """SMBIOS/DMI module."""
 
 import bits
diff --git a/tests/avocado/acpi-bits/bits-tests/testacpi.py 
b/tests/avocado/acpi-bits/bits-tests/testacpi.py
index 9ec452f330..18dc818d62 100644
--- a/tests/avocado/acpi-bits/bits-tests/testacpi.py
+++ b/tests/avocado/acpi-bits/bits-tests/testacpi.py
@@ -1,6 +1,8 @@
 # Copyright (c) 2015, Intel Corporation
 # All rights reserved.
 #
+# SPDX-License-Identifier: BSD-3-Clause
+#
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:
 #
@@ -24,6 +26,8 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+# This script runs only from the biosbits VM.
+
 """Tests for ACPI"""
 
 import acpi
diff --git a/tests/avocado/acpi-bits/bits-tests/testcpuid.py 
b/tests/avocado/acpi-bits/bits-tests/testcpuid.py
index ac55d912e1..7adefbe355 100644
--- a/tests/avocado/acpi-bits/bits-tests/testcpuid.py
+++ b/tests/avocado/acpi-bits/bits-tests/testcpuid.py
@@ -1,6 +1,8 @@
 # Copyright (c) 2012, Intel Corporation
 # All rights reserved.
 #
+# SPDX-License-Identifier: BSD-3-Clause
+#
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:
 #
@@ -24,6 +26,8 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+# This script runs only from the biosbits VM.
+
 """Tests and helpers for CPUID."""
 
 import bits
-- 
2.34.1




[PATCH v3 03/10] acpi/tests/avocado/bits: disable acpi PSS tests that are failing in biosbits

2022-10-10 Thread Ani Sinha
The PSS tests in the acpi test suite seem to be failing in biosbits. This is
because the tests are unable to find PSS support in the QEMU bios. Let us
disable them for now so that make check does not fail. We can fix the tests
and re-enable them later.

Example failure:

 ACPI _PSS (Pstate) table conformance tests 
[assert] _PSS must exist FAIL
  \_SB_.CPUS.C000
  No _PSS exists
Summary: 1 passed, 1 failed
 ACPI _PSS (Pstate) runtime tests 
[assert] _PSS must exist FAIL
  \_SB_.CPUS.C000
  No _PSS exists
Summary: 0 passed, 1 failed

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/bits-tests/testacpi.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/avocado/acpi-bits/bits-tests/testacpi.py 
b/tests/avocado/acpi-bits/bits-tests/testacpi.py
index 18dc818d62..f818a9cce6 100644
--- a/tests/avocado/acpi-bits/bits-tests/testacpi.py
+++ b/tests/avocado/acpi-bits/bits-tests/testacpi.py
@@ -40,8 +40,8 @@
 
 def register_tests():
 testsuite.add_test("ACPI _MAT (Multiple APIC Table Entry) under Processor 
objects", test_mat, submenu="ACPI Tests")
-testsuite.add_test("ACPI _PSS (Pstate) table conformance tests", test_pss, 
submenu="ACPI Tests")
-testsuite.add_test("ACPI _PSS (Pstate) runtime tests", test_pstates, 
submenu="ACPI Tests")
+#testsuite.add_test("ACPI _PSS (Pstate) table conformance tests", 
test_pss, submenu="ACPI Tests")
+#testsuite.add_test("ACPI _PSS (Pstate) runtime tests", test_pstates, 
submenu="ACPI Tests")
 testsuite.add_test("ACPI DSDT (Differentiated System Description Table)", 
test_dsdt, submenu="ACPI Tests")
 testsuite.add_test("ACPI FACP (Fixed ACPI Description Table)", test_facp, 
submenu="ACPI Tests")
 testsuite.add_test("ACPI HPET (High Precision Event Timer Table)", 
test_hpet, submenu="ACPI Tests")
-- 
2.34.1




[PATCH v3 05/10] acpi/tests/avocado/bits: add SPDX license identifiers for bios bits smilatency tests

2022-10-10 Thread Ani Sinha
Added the SPDX license identifier for the smilatency tests.
Also added a comment indicating that the smilatency test is run from within
the biosbits environment/VM and hence is not subject to QEMU build/test
environment dependency fulfilment or QEMU maintenance activities.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/bits-tests/smilatency.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py 
b/tests/avocado/acpi-bits/bits-tests/smilatency.py
index fb1b7228e3..d616970b31 100644
--- a/tests/avocado/acpi-bits/bits-tests/smilatency.py
+++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py
@@ -1,6 +1,8 @@
 # Copyright (c) 2015, Intel Corporation
 # All rights reserved.
 #
+# SPDX-License-Identifier: BSD-3-Clause
+#
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:
 #
@@ -24,6 +26,8 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+# This script runs only from the biosbits VM.
+
 """SMI latency test."""
 
 import bits
-- 
2.34.1




[PATCH v3 04/10] acpi/tests/avocado/bits: add smilatency test suite from bits in order to disable it

2022-10-10 Thread Ani Sinha
The smilatency test does not reliably pass every time it is run from QEMU.
This change adds the test file unchanged from bits so that the next change
can disable the test.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 .../acpi-bits/bits-tests/smilatency.py| 102 ++
 1 file changed, 102 insertions(+)
 create mode 100644 tests/avocado/acpi-bits/bits-tests/smilatency.py

diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py 
b/tests/avocado/acpi-bits/bits-tests/smilatency.py
new file mode 100644
index 00..fb1b7228e3
--- /dev/null
+++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py
@@ -0,0 +1,102 @@
+# Copyright (c) 2015, Intel Corporation
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+#   this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright 
notice,
+#   this list of conditions and the following disclaimer in the 
documentation
+#   and/or other materials provided with the distribution.
+# * Neither the name of Intel Corporation nor the names of its contributors
+#   may be used to endorse or promote products derived from this software
+#   without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE 
FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES
+# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 
ON
+# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""SMI latency test."""
+
+import bits
+from collections import namedtuple
+import testsuite
+import time
+import usb
+
+def register_tests():
+testsuite.add_test("SMI latency test", smi_latency);
+testsuite.add_test("SMI latency test with USB disabled via BIOS handoff", 
test_with_usb_disabled, runall=False);
+
+def smi_latency():
+MSR_SMI_COUNT = 0x34
+
+print "Warning: touching the keyboard can affect the results of this test."
+
+tsc_per_sec = bits.tsc_per_sec()
+tsc_per_usec = tsc_per_sec / (1000 * 1000)
+bins = [long(tsc_per_usec * 10**i) for i in range(9)]
+bin_descs = [
+"0 < t <=   1us",
+"1us   < t <=  10us",
+"10us  < t <= 100us",
+"100us < t <=   1ms",
+"1ms   < t <=  10ms",
+"10ms  < t <= 100ms",
+"100ms < t <=   1s ",
+"1s< t <=  10s ",
+"10s   < t <= 100s ",
+"100s  < t ",
+]
+
+print "Starting test. Wait here, I will be back in 15 seconds."
+(max_latency, smi_count_delta, bins) = bits.smi_latency(long(15 * 
tsc_per_sec), bins)
+BinType = namedtuple('BinType', ("max", "total", "count", "times"))
+bins = [BinType(*b) for b in bins]
+
+testsuite.test("SMI latency < 150us to minimize risk of OS timeouts", 
max_latency / tsc_per_usec <= 150)
+if not testsuite.show_detail():
+return
+
+for bin, desc in zip(bins, bin_descs):
+if bin.count == 0:
+continue
+testsuite.print_detail("{}; average = {}; count = {}".format(desc, 
bits.format_tsc(bin.total/bin.count), bin.count))
+deltas = (bits.format_tsc(t2 - t1) for t1,t2 in zip(bin.times, 
bin.times[1:]))
+testsuite.print_detail(" Times between first few observations: 
{}".format(" ".join("{:>6}".format(delta) for delta in deltas)))
+
+if smi_count_delta is not None:
+testsuite.print_detail("{} SMI detected using MSR_SMI_COUNT (MSR 
{:#x})".format(smi_count_delta, MSR_SMI_COUNT))
+
+testsuite.print_detail("Summary of impact: observed maximum latency = 
{}".format(bits.format_tsc(max_latency)))
+
+def test_with_usb_disabled():
+if usb.handoff_to_os():
+smi_latency()
+
+def average_io_smi(port, value, count):
+def f():
+tsc_start = bits.rdtsc()
+bits.outb(port, value)
+return bits.rdtsc() - tsc_start
+counts = [f() for i in range(count)]
+return sum(counts)/len(counts)
+
+def time_io_smi(port=0xb2, value=0, count=1000):
+count_for_estimate = 10
+start = time.time()
+average_io_smi(port, value, count_for_estimate)
+avg10 = time.time() - start
+estimate = avg10 * count/count_for_estimate

[PATCH v3 06/10] acpi/tests/avocado/bits: disable smilatency test since it does not pass every time

2022-10-10 Thread Ani Sinha
The smilatency test is latency-sensitive and does not pass deterministically
when run in a QEMU environment under biosbits. Disable the test suite for now.

Example failure:

 SMI latency test 
Warning: touching the keyboard can affect the results of this test.
Starting test. Wait here, I will be back in 15 seconds.
[assert] SMI latency < 150us to minimize risk of OS timeouts FAIL
  1us   < t <=  10us; average = 1372ns; count = 10912449
   Times between first few observations:  176us 1646ns 1441ns 1450ns 1462ns
  10us  < t <= 100us; average = 16us; count = 1187
   Times between first few observations:   15ms 3148us 5856us   49ms   33ms
  100us < t <=   1ms; average = 259us; count = 8
   Times between first few observations:  111ms 2227ms 1779ms  999ms  219ms
  0 SMI detected using MSR_SMI_COUNT (MSR 0x34)
  Summary of impact: observed maximum latency = 298us
Summary: 0 passed, 1 failed

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/bits-tests/smilatency.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/avocado/acpi-bits/bits-tests/smilatency.py 
b/tests/avocado/acpi-bits/bits-tests/smilatency.py
index d616970b31..e907c55cc2 100644
--- a/tests/avocado/acpi-bits/bits-tests/smilatency.py
+++ b/tests/avocado/acpi-bits/bits-tests/smilatency.py
@@ -37,8 +37,9 @@
 import usb
 
 def register_tests():
-testsuite.add_test("SMI latency test", smi_latency);
-testsuite.add_test("SMI latency test with USB disabled via BIOS handoff", 
test_with_usb_disabled, runall=False);
+pass
+# testsuite.add_test("SMI latency test", smi_latency);
+# testsuite.add_test("SMI latency test with USB disabled via BIOS 
handoff", test_with_usb_disabled, runall=False);
 
 def smi_latency():
 MSR_SMI_COUNT = 0x34
-- 
2.34.1




[PATCH v3 01/10] acpi/tests/avocado/bits: initial commit of test scripts that are run by biosbits

2022-10-10 Thread Ani Sinha
This is the initial commit of the cpuid, acpi and smbios python test scripts
for biosbits to execute. No change has been made to them from the original
code written by the biosbits author Josh Triplett. They are required to be
installed into the bits iso file and then run from within the virtual machine
booted off the biosbits iso.

The original locations of these tests are:
https://github.com/biosbits/bits/blob/master/python/testacpi.py
https://github.com/biosbits/bits/blob/master/python/smbios.py
https://github.com/biosbits/bits/blob/master/python/testcpuid.py

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/bits-tests/smbios.py  | 2430 +
 .../avocado/acpi-bits/bits-tests/testacpi.py  |  283 ++
 .../avocado/acpi-bits/bits-tests/testcpuid.py |   83 +
 3 files changed, 2796 insertions(+)
 create mode 100644 tests/avocado/acpi-bits/bits-tests/smbios.py
 create mode 100644 tests/avocado/acpi-bits/bits-tests/testacpi.py
 create mode 100644 tests/avocado/acpi-bits/bits-tests/testcpuid.py

diff --git a/tests/avocado/acpi-bits/bits-tests/smbios.py 
b/tests/avocado/acpi-bits/bits-tests/smbios.py
new file mode 100644
index 00..9667d0542c
--- /dev/null
+++ b/tests/avocado/acpi-bits/bits-tests/smbios.py
@@ -0,0 +1,2430 @@
+# Copyright (c) 2015, Intel Corporation
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+#   this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright 
notice,
+#   this list of conditions and the following disclaimer in the 
documentation
+#   and/or other materials provided with the distribution.
+# * Neither the name of Intel Corporation nor the names of its contributors
+#   may be used to endorse or promote products derived from this software
+#   without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 
AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE 
FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES
+# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 
ON
+# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+"""SMBIOS/DMI module."""
+
+import bits
+import bitfields
+import ctypes
+import redirect
+import struct
+import uuid
+import unpack
+import ttypager
+import sys
+
+class SMBIOS(unpack.Struct):
+def __new__(cls):
+if sys.platform == "BITS-EFI":
+import efi
+sm_ptr = 
efi.system_table.ConfigurationTableDict.get(efi.SMBIOS_TABLE_GUID)
+else:
+address = 0xF0000
+mem = bits.memory(0xF0000, 0x10000)
+for offset in range(0, len(mem), 16):
+signature = (ctypes.c_char * 4).from_address(address + 
offset).value
+if signature == "_SM_":
+entry_point_length = ctypes.c_ubyte.from_address(address + 
offset + 5).value
+csum = sum(map(ord, mem[offset:offset + 
entry_point_length])) & 0xff
+if csum == 0:
+sm_ptr = address + offset
+break
+else:
+return None
+
+if not sm_ptr:
+return None
+
+sm = super(SMBIOS, cls).__new__(cls)
+sm._header_memory = bits.memory(sm_ptr, 0x1f)
+return sm
+
+def __init__(self):
+super(SMBIOS, self).__init__()
+u = unpack.Unpackable(self._header_memory)
+self.add_field('header', Header(u))
+self._structure_memory = 
bits.memory(self.header.structure_table_address, 
self.header.structure_table_length)
+u = unpack.Unpackable(self._structure_memory)
+self.add_field('structures', unpack.unpack_all(u, _smbios_structures, 
self), unpack.format_each("\n\n{!r}"))
+
+def structure_type(self, num):
+'''Dumps structure of given Type if present'''
+try:
+types_present = [self.structures[x].smbios_structure_type for x in 
range(len(self.structures))]
+matrix = dict()
+for index in range(len(types_present)):
+if types_present.count(types_present[index]) == 1:
+matrix[types_present[i

[PATCH v3 09/10] MAINTAINERS: add myself as the maintainer for acpi biosbits avocado tests

2022-10-10 Thread Ani Sinha
I wrote the biosbits avocado tests for testing QEMU's ACPI/SMBIOS
implementation with biosbits and all the related changes. Make myself the
maintainer for the biosbits-related files and test scripts.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e1530b51a2..46d829efd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1859,6 +1859,12 @@ S: Supported
 F: hw/acpi/viot.c
 F: hw/acpi/viot.h
 
+ACPI/AVOCADO/BIOSBITS
+M: Ani Sinha 
+S: Supported
+F: tests/avocado/acpi-bits/*
+F: tests/avocado/acpi-bits.py
+
 ACPI/HEST/GHES
 R: Dongjiu Geng 
 L: qemu-...@nongnu.org
-- 
2.34.1




Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Ani Sinha
On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:
>
> Please see the README file added in patch 10 for more details.
> Sample runs are as follows:
>
> $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (smbios.py, line 92)
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (smilatency.py, line 47)
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (testacpi.py, line 158)

None of the above files are avocado tests or avocado related python
scripts. They are run from within bits in a python 2.7 environment. I
could not find a mechanism to exclude a directory from avocado tests.
I also do not think making those scripts python 3 compliant is a good
use of my time since upgrading bits to use python 3 would be a major
task unrelated to QEMU testing.

> 1..1
> ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
>
> $ ./tests/venv/bin/avocado run -t acpi tests/avocado
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (smbios.py, line 92)
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (smilatency.py, line 47)
> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> (testacpi.py, line 158)
> Fetching asset from 
> tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
> JOB ID : c06a9feb423bcda5de89bb51353c6c1718719f14
> JOB LOG: 
> /home/anisinha/avocado/job-results/job-2022-10-10T13.12-c06a9fe/job.log
>  (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS 
> (34.11 s)
> RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
> CANCEL 0
> JOB TIME   : 37.40 s
>
> $ ./tests/venv/bin/avocado run tests/avocado/acpi-bits.py
> Fetching asset from 
> tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
> JOB ID : a2476dd3fafe289c5e6475f447bc8369f77d57ba
> JOB LOG: 
> /home/anisinha/avocado/job-results/job-2022-10-10T13.14-a2476dd/job.log
>  (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS 
> (34.07 s)
> RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
> CANCEL 0
> JOB TIME   : 40.20 s
>
>
> Changelog:
> v3:
>   - converted the existing test to avocado tests as per the popular
> recommendation. Added appropriate tags.
>   - artifact download URL modified based on gitlab-CI URL.
>
>   For biosbits repo:
>   - moved to a gitlab repo with me being the maintainer.
>   - added .gitlab-ci.yml file to generate the artifacts.
> v2:
>  - a new class of python based tests introduced that is separate from avocado
>tests or qtests. Can be run by using "make check-pytest".
>  - acpi biosbits tests are the first tests to use pytest environment.
>  - bios bits tests now download the bits binary archives from a remote
>repository if they are not found locally. The test skips if download
>fails.
>  - A new environment variable is introduced that can be passed by the tester
>    to specify the location of the bits archives locally. The test skips if the
>bits binaries are not found in that location.
>  - if pip install of a python module fails for whatever reason, the test skips.
>  - misc code fixes including spell check of the README doc. README has been
>updated as well.
>  - addition of SPDX license headers to bits test files.
>  - update MAINTAINERS to reflect the new pytest test class.
>
> For biosbits repo:
>  - added Dockerfile and build script. Made bios bits build on gcc 11.
>https://github.com/ani-sinha/bits/blob/bits-qemu-logging/Dockerfile
>https://github.com/ani-sinha/bits/blob/bits-qemu-logging/build-artifacts.sh
>The build script generates the zip archive and tarball used by the test.
>
> v1: initial patchset. uses qtest to implement the bios bits tests.
>
> Cc: Qemu Devel 
> Cc: Daniel P. Berrangé 
> Cc: Paolo Bonzini 
> Cc: Maydell Peter 
> Cc: John Snow 
> Cc: Thomas Huth 
> Cc: Igor M 
> Cc: Michael Tsirkin 
>
> Ani Sinha (10):
>   acpi/tests/avocado/bits: initial commit of test scripts that are run
> by biosbits
>   acpi/tests/avocado/bits: add SPDX license identifiers for bios bits
> tests
>   acpi/tests/avocado/bits: disable acpi PSS tests that are failing in
> biosbits
>   acpi/tests/avocado/bits: add smilatency test suite from bits in order
> to disable it
>   acpi/tests/avocado/bits: add SPDX license identifiers for bios bits
> smilatency tests
>   acpi/tests/avocado/bits: disable smilatency test since it does not
> pass every time
>   acpi/tests/avocado/bits: add biosbits config file for running bios
> tests
>   acpi/tests/avocado/bits: add acpi and smbios avocado tests that use
> biosbits
>   MAINTAINERS: add myself as the maintainer for acpi biosbits avocado
> tests
>   acpi/tests/avocado/bits: add a README file
>
>  MAINTAINERS   |6 +
>  tests/avocado/acpi-b

[PATCH v3 10/10] acpi/tests/avocado/bits: add a README file

2022-10-10 Thread Ani Sinha
Add a README file that describes the purpose of the various test files and gives
guidance to developers on where and how to make changes.

Cc: Daniel P. Berrangé 
Cc: Paolo Bonzini 
Cc: Maydell Peter 
Cc: John Snow 
Cc: Thomas Huth 
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits/README | 90 ++
 1 file changed, 90 insertions(+)
 create mode 100644 tests/avocado/acpi-bits/README

diff --git a/tests/avocado/acpi-bits/README b/tests/avocado/acpi-bits/README
new file mode 100644
index 00..91e8fd235f
--- /dev/null
+++ b/tests/avocado/acpi-bits/README
@@ -0,0 +1,90 @@
+==================================
+ACPI/SMBIOS PYTESTS USING BIOSBITS
+==================================
+
+Biosbits is software written by Josh Triplett that can be downloaded by
+visiting https://biosbits.org/. The github codebase can be found here:
+https://github.com/biosbits/bits/tree/master. It is software that executes
+the bios components such as acpi and smbios tables directly through the
+acpica bios interpreter (a freely available C-based library written by Intel,
+downloadable from https://acpica.org/ and included with biosbits) without an
+operating system getting involved in between.
+There are several advantages to directly testing the bios in a real physical
+machine or VM as opposed to indirectly discovering bios issues through the
+operating system. For one thing, OSes tend to hide bios problems from the
+end user. The other is that we have more control over what we want to test
+and how by directly using the acpica interpreter on top of the bios on a
+running system. More details on the inspiration for developing biosbits and
+its real life uses can be found in (a) and (b).
+This directory contains tests written in python that exercise the QEMU
+bios components using biosbits and report test failures.
+Under the directory tests/avocado/, acpi-bits.py is a QEMU avocado test that
+drives all this.
+
+A brief description of the various test files follows.
+
+Under tests/avocado/ as the root we have:
+
+├── acpi-bits
+│ ├── bits-config
+│ │ └── bits-cfg.txt
+│ ├── bits-tests
+│ │ ├── smbios.py
+│ │ ├── smilatency.py
+│ │ ├── testacpi.py
+│ │ └── testcpuid.py
+│ └── README
+├── acpi-bits.py
+
+tests/avocado:
+ - acpi-bits.py: This is the main python avocado test script that generates a
+   biosbits iso. It then spawns a QEMU VM with it, collects the logs and
+   reports test failures. This is the script one would be interested in if they
+   wanted to add or change some component of the log parsing, add a new command
+   line option to how QEMU is spawned, etc. Test writers typically would not
+   need to modify this script unless they wanted to enhance or change the log
+   parsing for their tests. The following environment variables are used in
+   this test:
+ - V=1 : This enables verbose mode for the test. It dumps the entire log
+   from bios bits and also more details in case failure happens. It is
+   useful for debugging the test failures or tests themselves.
+
+tests/avocado/acpi-bits:
+ - README: This text file.
+
+tests/avocado/acpi-bits/bits-config:
+   This location contains biosbits config files that determine how the software
+   runs the tests.
+ - bits-cfg.txt: this is the biosbits config file that determines what tests
+   or actions are performed by bits. The description of the config options are
+   provided in the file itself.
+
+tests/avocado/acpi-bits/bits-tests:
+   This directory contains biosbits python based tests that are run from within
+   the biosbits environment in the spawned VM. New additions of test cases can
+   be made in the appropriate test file. For example, new acpi tests can go
+   into testacpi.py and one would call testsuite.add_test() to register the new
+   test so that it gets executed as a part of the ACPI tests.
+   It might be occasionally necessary to disable some subtests or add a new
+   test that belongs to a test suite not already present in this directory. To
+   do this, please extract the bits source from the zip file mentioned above.
+   Copy the test suite/script that needs modification (addition of new tests
+   or disabling them) from boot/python location of the extracted directory
+   into this directory.
+
+   step (a): copy unmodified test script to this directory.
+   step (b): update meson.build and add this file to the list.
+   Commit (a) and (b) together in the same commit.
+
+   step (c): perform modifications to the test.
+   Commit (c) in a separate commit.
+
+   The test framework will then use your modified test script to run the test.
+   No further changes would be needed. Please check the logs to make sure that
+   appropriate changes have taken effect.
+
+
+Author: Ani Sinha 
+
+References:
+(a) 
https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf
+(b) https://www.youtube.com/watch?

Re: [PATCH] error handling: Use TFR() macro where applicable

2022-10-10 Thread Nikita Ivanov
Hi! Thanks for your notes. I'll try to send updated patches by the end of
the day.

On Fri, Oct 7, 2022 at 6:32 PM Peter Maydell 
wrote:

> I think this patch is doing things in the wrong order. Instead of
> converting code to use the old macro that we don't like and then
> updating it again in patch 2 to use the new macro, we should
> first introduce the new macro, and then after that we can update
> code that's currently not using a macro at all to use the new one.
> This makes code review easier because we don't have to look at a
> change to this code which is then going to be rewritten anyway.


Sounds smooth. I'll refactor patches accordingly.


> >  if (ret < 0) {
> >  ret = -errno;
>
>
> > @@ -1472,8 +1472,8 @@ static ssize_t
> handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
> >  {
> >  ssize_t len;
> >
> > -TFR(
> > -len = (aiocb->aio_type & QEMU_AIO_WRITE) ?
> > +len = TEMP_FAILURE_RETRY(
> > +(aiocb->aio_type & QEMU_AIO_WRITE) ?
> >  qemu_pwritev(aiocb->aio_fildes,
> > aiocb->io.iov,
> > aiocb->io.niov,
>
> I'm not sure why you've put the TEMP_FAILURE_RETRY on the outside here
> rather than just on the individual function calls.
>

The original code contained both branches in one while loop, so I was
afraid that the value `aiocb->aio_type & QEMU_AIO_WRITE` would somehow
change during the loop. If you say that this is impossible, I'll adjust
the code as you propose.

> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index b1c161c035..6e244f15fa 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -243,7 +243,13 @@ void QEMU_ERROR("code path is reachable")
> >  #define ESHUTDOWN 4099
> >  #endif
> >
> > -#define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
> > +#define TEMP_FAILURE_RETRY(expr) \
>
> We can't call the macro this, because the glibc system headers already
> may define a macro of that name, so the compiler will complain if they're
> both defined at the same time, and depending on header ordering it might
> not be clear which version you're getting.
>

Sorry, my fault. I will rename it to "RETRY_ON_EINTR" as was proposed
earlier in this thread.
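
For reference, here is a minimal sketch of what such an expression-valued
retry macro can look like, using a GCC/clang statement expression so that the
macro yields the call's return value (unlike the statement-style TFR()). This
is illustrative only; the final RETRY_ON_EINTR definition merged into QEMU
may differ.

#include <errno.h>
#include <unistd.h>

#define RETRY_ON_EINTR(expr)                          \
    ({                                                \
        __typeof__(expr) ret_;                        \
        do {                                          \
            ret_ = (expr);                            \
        } while (ret_ == -1 && errno == EINTR);       \
        ret_;                                         \
    })

/* Usage: retry only the syscall itself; any branch such as the
 * aio_type & QEMU_AIO_WRITE check can then stay outside the loop. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    return RETRY_ON_EINTR(read(fd, buf, len));
}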
-- 
Best Regards,
*Nikita Ivanov* | C developer


Re: [PATCH v8 8/8] KVM: Enable and expose KVM_MEM_PRIVATE

2022-10-10 Thread Chao Peng
On Tue, Oct 04, 2022 at 05:55:28PM +0300, Jarkko Sakkinen wrote:
> On Thu, Sep 15, 2022 at 10:29:13PM +0800, Chao Peng wrote:
> > Expose KVM_MEM_PRIVATE and memslot fields private_fd/offset to
> > userspace. KVM will register/unregister private memslot to fd-based
> > memory backing store and respond to invalidation events from
> > inaccessible_notifier to zap the existing memory mappings in the
> > secondary page table.
> > 
> > Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined
> > by architecture code, which can turn it on by overriding the default
> > kvm_arch_has_private_mem().
> > 
> > A 'kvm' reference is added in memslot structure since in
> > inaccessible_notifier callback we can only obtain a memslot reference
> > but 'kvm' is needed to do the zapping.
> > 
> > Co-developed-by: Yu Zhang 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> 
> ld: arch/x86/../../virt/kvm/kvm_main.o: in function `kvm_free_memslot':
> kvm_main.c:(.text+0x1385): undefined reference to 
> `inaccessible_unregister_notifier'
> ld: arch/x86/../../virt/kvm/kvm_main.o: in function `kvm_set_memslot':
> kvm_main.c:(.text+0x1b86): undefined reference to 
> `inaccessible_register_notifier'
> ld: kvm_main.c:(.text+0x1c85): undefined reference to 
> `inaccessible_unregister_notifier'
> ld: arch/x86/kvm/mmu/mmu.o: in function `kvm_faultin_pfn':
> mmu.c:(.text+0x1e38): undefined reference to `inaccessible_get_pfn'
> ld: arch/x86/kvm/mmu/mmu.o: in function `direct_page_fault':
> mmu.c:(.text+0x67ca): undefined reference to `inaccessible_put_pfn'
> make: *** [Makefile:1169: vmlinux] Error 1
> 
> I attached kernel config for reproduction.
> 
> The problem is that CONFIG_MEMFD_CREATE does not get enabled:
> 
> mm/Makefile:obj-$(CONFIG_MEMFD_CREATE) += memfd.o memfd_inaccessible.o

Thanks for reporting. Yes, there is a dependency issue that needs to be fixed.

Chao




Re: [PATCH v8 8/8] KVM: Enable and expose KVM_MEM_PRIVATE

2022-10-10 Thread Chao Peng
On Thu, Oct 06, 2022 at 09:55:31AM +0100, Fuad Tabba wrote:
> Hi,
> 
> On Thu, Sep 15, 2022 at 3:37 PM Chao Peng  wrote:
> >
> > Expose KVM_MEM_PRIVATE and memslot fields private_fd/offset to
> > userspace. KVM will register/unregister private memslot to fd-based
> > memory backing store and respond to invalidation events from
> > inaccessible_notifier to zap the existing memory mappings in the
> > secondary page table.
> >
> > Whether KVM_MEM_PRIVATE is actually exposed to userspace is determined
> > by architecture code, which can turn it on by overriding the default
> > kvm_arch_has_private_mem().
> >
> > A 'kvm' reference is added in memslot structure since in
> > inaccessible_notifier callback we can only obtain a memslot reference
> > but 'kvm' is needed to do the zapping.
> >
> > Co-developed-by: Yu Zhang 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/kvm_host.h |   1 +
> >  virt/kvm/kvm_main.c  | 116 +--
> >  2 files changed, 111 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index b9906cdf468b..cb4eefac709c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -589,6 +589,7 @@ struct kvm_memory_slot {
> > struct file *private_file;
> > loff_t private_offset;
> > struct inaccessible_notifier notifier;
> > +   struct kvm *kvm;
> >  };
> >
> >  static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot 
> > *slot)
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 97d893f7482c..87e239d35b96 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -983,6 +983,57 @@ static int kvm_vm_ioctl_set_mem_attr(struct kvm *kvm, 
> > gpa_t gpa, gpa_t size,
> > xa_erase(&kvm->mem_attr_array, index);
> > return r;
> >  }
> > +
> > +static void kvm_private_notifier_invalidate(struct inaccessible_notifier 
> > *notifier,
> > +   pgoff_t start, pgoff_t end)
> > +{
> > +   struct kvm_memory_slot *slot = container_of(notifier,
> > +   struct kvm_memory_slot,
> > +   notifier);
> > +   unsigned long base_pgoff = slot->private_offset >> PAGE_SHIFT;
> > +   gfn_t start_gfn = slot->base_gfn;
> > +   gfn_t end_gfn = slot->base_gfn + slot->npages;
> > +
> > +
> > +   if (start > base_pgoff)
> > +   start_gfn = slot->base_gfn + start - base_pgoff;
> > +
> > +   if (end < base_pgoff + slot->npages)
> > +   end_gfn = slot->base_gfn + end - base_pgoff;
> > +
> > +   if (start_gfn >= end_gfn)
> > +   return;
> > +
> > +   kvm_zap_gfn_range(slot->kvm, start_gfn, end_gfn);
> > +}
> > +
> > +static struct inaccessible_notifier_ops kvm_private_notifier_ops = {
> > +   .invalidate = kvm_private_notifier_invalidate,
> > +};
> > +
> > +static inline void kvm_private_mem_register(struct kvm_memory_slot *slot)
> > +{
> > +   slot->notifier.ops = &kvm_private_notifier_ops;
> > +   inaccessible_register_notifier(slot->private_file, &slot->notifier);
> > +}
> > +
> > +static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
> > +{
> > +   inaccessible_unregister_notifier(slot->private_file, 
> > &slot->notifier);
> > +}
> > +
> > +#else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */
> > +
> > +static inline void kvm_private_mem_register(struct kvm_memory_slot *slot)
> > +{
> > +   WARN_ON_ONCE(1);
> > +}
> > +
> > +static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
> > +{
> > +   WARN_ON_ONCE(1);
> > +}
> > +
> >  #endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */
> >
> >  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
> > @@ -1029,6 +1080,11 @@ static void kvm_destroy_dirty_bitmap(struct 
> > kvm_memory_slot *memslot)
> >  /* This does not remove the slot from struct kvm_memslots data structures 
> > */
> >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >  {
> > +   if (slot->flags & KVM_MEM_PRIVATE) {
> > +   kvm_private_mem_unregister(slot);
> > +   fput(slot->private_file);
> > +   }
> > +
> > kvm_destroy_dirty_bitmap(slot);
> >
> > kvm_arch_free_memslot(kvm, slot);
> > @@ -1600,10 +1656,16 @@ bool __weak kvm_arch_has_private_mem(struct kvm 
> > *kvm)
> > return false;
> >  }
> >
> > -static int check_memory_region_flags(const struct kvm_user_mem_region *mem)
> > +static int check_memory_region_flags(struct kvm *kvm,
> > +const struct kvm_user_mem_region *mem)
> >  {
> > u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
> >
> > +#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
> > +   if (kvm_arch_has_private_mem(kvm))
> > +   valid_flags |= KVM_MEM_PRIVATE;
> > +#endif
> > +
> >  #ifdef __KVM_HAVE_READONLY_MEM
> > 
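To make the pgoff-to-gfn clamping in the kvm_private_notifier_invalidate()
hunk above concrete, here is a standalone sketch of the same arithmetic with
made-up slot numbers (illustrative only, not kernel code):

```c
#include <stdio.h>

int main(void)
{
    /* made-up slot layout */
    const unsigned long long base_pgoff = 0x40; /* private_offset >> PAGE_SHIFT */
    const unsigned long long base_gfn   = 0x100;
    const unsigned long long npages     = 0x80;
    /* invalidated file range, in pages */
    unsigned long long start = 0x20, end = 0x60;

    unsigned long long start_gfn = base_gfn;
    unsigned long long end_gfn   = base_gfn + npages;

    /* clip the range to the part backed by this slot, as in the hunk above */
    if (start > base_pgoff)
        start_gfn = base_gfn + start - base_pgoff;
    if (end < base_pgoff + npages)
        end_gfn = base_gfn + end - base_pgoff;

    if (start_gfn >= end_gfn) {
        printf("range does not intersect the slot\n");
        return 0;
    }
    /* the kernel would call kvm_zap_gfn_range(slot->kvm, ...) here */
    printf("zap gfns [0x%llx, 0x%llx)\n", start_gfn, end_gfn);
    return 0;
}
```

With these numbers, only [0x100, 0x120) gets zapped: the invalidated pages
below base_pgoff fall outside the slot.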

Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

2022-10-10 Thread Chao Peng
On Sat, Oct 08, 2022 at 08:35:47PM +0300, Jarkko Sakkinen wrote:
> On Sat, Oct 08, 2022 at 07:15:17PM +0300, Jarkko Sakkinen wrote:
> > On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote:
> > > On Fri, Oct 07, 2022 at 02:58:54PM +, Sean Christopherson wrote:
> > > > On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
> > > > > On Thu, Oct 06, 2022 at 03:34:58PM +, Sean Christopherson wrote:
> > > > > > On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
> > > > > > > On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
> > > > > > > > On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
> > > > > > > > > This new extension, indicated by the new flag 
> > > > > > > > > KVM_MEM_PRIVATE, adds two
> > > > > > > > > additional KVM memslot fields private_fd/private_offset to 
> > > > > > > > > allow
> > > > > > > > > userspace to specify that guest private memory provided from 
> > > > > > > > > the
> > > > > > > > > private_fd and guest_phys_addr mapped at the private_offset 
> > > > > > > > > of the
> > > > > > > > > private_fd, spanning a range of memory_size.
> > > > > > > > > 
> > > > > > > > > The extended memslot can still have the userspace_addr(hva). 
> > > > > > > > > When used, a
> > > > > > > > > single memslot can maintain both private memory through 
> > > > > > > > > private
> > > > > > > > > fd(private_fd/private_offset) and shared memory through
> > > > > > > > > hva(userspace_addr). Whether the private or shared part is 
> > > > > > > > > visible to
> > > > > > > > > guest is maintained by other KVM code.
> > > > > > > > 
> > > > > > > > What is anyway the appeal of private_offset field, instead of 
> > > > > > > > having just
> > > > > > > > 1:1 association between regions and files, i.e. one memfd per 
> > > > > > > > region?
> > > > > > 
> > > > > > Modifying memslots is slow, both in KVM and in QEMU (not sure about 
> > > > > > Google's VMM).
> > > > > > E.g. if a vCPU converts a single page, it will be forced to wait 
> > > > > > until all other
> > > > > > vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM 
> > > > > > is faulting in
> > > > > > memory.  KVM's memslot updates also hold a mutex for the entire 
> > > > > > duration of the
> > > > > > update, i.e. conversions on different vCPUs would be fully 
> > > > > > serialized, exacerbating
> > > > > > the SRCU problem.
> > > > > > 
> > > > > > KVM also has historical baggage where it "needs" to zap _all_ SPTEs 
> > > > > > when any
> > > > > > memslot is deleted.
> > > > > > 
> > > > > > Taking both a private_fd and a shared userspace address allows 
> > > > > > userspace to convert
> > > > > > between private and shared without having to manipulate memslots.
> > > > > 
> > > > > Right, this was really good explanation, thank you.
> > > > > 
> > > > > Still wondering could this possibly work (or not):
> > > > > 
> > > > > 1. Union userspace_addr and private_fd.
> > > > 
> > > > No, because userspace needs to be able to provide both userspace_addr 
> > > > (shared
> > > > memory) and private_fd (private memory) for a single memslot.
> > > 
> > > Got it, thanks for clearing my misunderstandings on this topic, and it
> > > is quite obviously visible in 5/8 and 7/8. I.e. if I got it right,
> > > memblock can be partially private, and you dig the shared holes with
> > > KVM_MEMORY_ENCRYPT_UNREG_REGION. We (in Enarx) ATM have a memblock
> > > per host mmap; I was looking at this through that mindset, but it
> > > definitely makes sense to support that.
> > 
> > For me the most useful reference for this feature is the kvm_set_phys_mem()
> > implementation in the privmem-v8 branch. It took a while to find because I
> > did not have much experience with the QEMU code base. I'd even recommend
> > mentioning that function in the cover letter because it is a really good
> > reference on how this feature is supposed to be used.

That's a good point, I can mention that if people find it useful. 

> 
> While learning the QEMU code, I also noticed a bunch of comparisons like
> this:
> 
> if (slot->flags | KVM_MEM_PRIVATE)
> 
> I guess those could just be replaced with unconditional fills, as it does
> not do any harm if KVM_MEM_PRIVATE is not set.
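For illustration, a minimal standalone sketch of why the '|' test always
fires (the flag value here is made up, not the real KVM UAPI bit):

```c
#include <stdio.h>
#include <stdint.h>

#define KVM_MEM_PRIVATE (1u << 2)   /* illustrative value only */

int main(void)
{
    uint32_t flags = 0;             /* flag deliberately not set */

    /* Bitwise OR with a non-zero constant is always non-zero, so this
     * branch is taken no matter what flags contains: */
    if (flags | KVM_MEM_PRIVATE) {
        printf("'|' fires even though the flag is clear\n");
    }

    /* Bitwise AND is the actual membership test: */
    if (flags & KVM_MEM_PRIVATE) {
        printf("'&' fires\n");
    } else {
        printf("'&' correctly sees the flag is clear\n");
    }
    return 0;
}
```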

Makes sense, thanks.

Chao
> 
> BR, Jarkko



Re: [PATCH] block/io_uring: revert "Use io_uring_register_ring_fd() to skip fd operations"

2022-10-10 Thread Stefano Garzarella

On Sat, Sep 24, 2022 at 10:48:15PM +0800, Sam Li wrote:

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1193

The commit "Use io_uring_register_ring_fd() to skip fd operations" broke
when booting a guest with iothread and io_uring. That is because the
io_uring_register_ring_fd() call is made from the main thread instead of
IOThread where io_uring_submit() is called. It can not be guaranteed
to register the ring fd in the correct thread or unregister the same ring
fd if the IOThread is disabled. This optimization is not critical so we
will revert previous commit.


Right, maybe we can call it on the first submit.
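A minimal sketch of that idea (not a patch; it assumes a liburing build
that provides io_uring_register_ring_fd(), i.e. what the meson check being
removed below probed for, and the lazy-registration helper is made up):

```c
#include <liburing.h>
#include <stdbool.h>

static bool ring_fd_registered;

/* Register lazily, from whatever thread actually submits, so that
 * registration and submission are guaranteed to share a thread. */
static int submit(struct io_uring *ring)
{
    if (!ring_fd_registered && io_uring_register_ring_fd(ring) >= 0) {
        ring_fd_registered = true;
    }
    return io_uring_submit(ring);   /* correct with or without the fast path */
}

int main(void)
{
    struct io_uring ring;

    if (io_uring_queue_init(8, &ring, 0) < 0) {
        return 1;
    }
    submit(&ring);                  /* first call registers the ring fd */
    io_uring_queue_exit(&ring);
    return 0;
}
```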



This reverts commit e2848bc574fe2715c694bf8fe9a1ba7f78a1125a
and 77e3f038af1764983087e3551a0fde9951952c4d.


Yep, better to revert for now:

Reviewed-by: Stefano Garzarella 


Should we queue this for stable?

Thanks,
Stefano



Signed-off-by: Sam Li 
---
block/io_uring.c | 13 +
meson.build  |  1 -
2 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index a1760152e0..973e15d876 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -11,7 +11,6 @@
#include "qemu/osdep.h"
#include 
#include "block/aio.h"
-#include "qemu/error-report.h"
#include "qemu/queue.h"
#include "block/block.h"
#include "block/raw-aio.h"
@@ -19,7 +18,6 @@
#include "qapi/error.h"
#include "trace.h"

-
/* io_uring ring size */
#define MAX_ENTRIES 128

@@ -432,17 +430,8 @@ LuringState *luring_init(Error **errp)
}

ioq_init(&s->io_q);
-#ifdef CONFIG_LIBURING_REGISTER_RING_FD
-if (io_uring_register_ring_fd(&s->ring) < 0) {
-/*
- * Only warn about this error: we will fallback to the non-optimized
- * io_uring operations.
- */
-warn_report("failed to register linux io_uring ring file descriptor");
-}
-#endif
-
return s;
+
}

void luring_cleanup(LuringState *s)
diff --git a/meson.build b/meson.build
index 3885fc1076..63cfb844cf 100644
--- a/meson.build
+++ b/meson.build
@@ -1793,7 +1793,6 @@ config_host_data.set('CONFIG_LIBNFS', libnfs.found())
config_host_data.set('CONFIG_LIBSSH', libssh.found())
config_host_data.set('CONFIG_LINUX_AIO', libaio.found())
config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
-config_host_data.set('CONFIG_LIBURING_REGISTER_RING_FD', 
cc.has_function('io_uring_register_ring_fd', prefix: '#include ', 
dependencies:linux_io_uring))
config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
config_host_data.set('CONFIG_NUMA', numa.found())
config_host_data.set('CONFIG_OPENGL', opengl.found())
--
2.37.3






[PATCH 2/2] tests/qtest/ide-test: Verify READ NATIVE MAX ADDRESS is not limited

2022-10-10 Thread Lev Kujawski
Verify that the ATA command READ NATIVE MAX ADDRESS returns the last
valid CHS tuple for the native device rather than any limit
established by INITIALIZE DEVICE PARAMETERS.

Signed-off-by: Lev Kujawski 
---
 tests/qtest/ide-test.c | 47 +-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/ide-test.c b/tests/qtest/ide-test.c
index dbe1563b23..c406e6752a 100644
--- a/tests/qtest/ide-test.c
+++ b/tests/qtest/ide-test.c
@@ -37,7 +37,8 @@
 /* TODO actually test the results and get rid of this */
 #define qmp_discard_response(q, ...) qobject_unref(qtest_qmp(q, __VA_ARGS__))
 
-#define TEST_IMAGE_SIZE 64 * 1024 * 1024
+/* Specified by ATA (physical) CHS geometry for ~64 MiB device.  */
+#define TEST_IMAGE_SIZE ((130 * 16 * 63) * 512)
 
 #define IDE_PCI_DEV 1
 #define IDE_PCI_FUNC 1
@@ -91,11 +92,13 @@ enum {
 enum {
 CMD_DSM = 0x06,
 CMD_DIAGNOSE= 0x90,
+CMD_INIT_DP = 0x91,  /* INITIALIZE DEVICE PARAMETERS */
 CMD_READ_DMA= 0xc8,
 CMD_WRITE_DMA   = 0xca,
 CMD_FLUSH_CACHE = 0xe7,
 CMD_IDENTIFY= 0xec,
 CMD_PACKET  = 0xa0,
+CMD_READ_NATIVE = 0xf8,  /* READ NATIVE MAX ADDRESS */
 
 CMDF_ABORT  = 0x100,
 CMDF_NO_BM  = 0x200,
@@ -562,6 +565,46 @@ static void string_cpu_to_be16(uint16_t *s, size_t bytes)
 }
 }
 
+static void test_specify(void)
+{
+QTestState *qts;
+QPCIDevice *dev;
+QPCIBar bmdma_bar, ide_bar;
+uint16_t cyls;
+uint8_t heads, spt;
+
+qts = ide_test_start(
+"-blockdev driver=file,node-name=hda,filename=%s "
+"-device ide-hd,drive=hda,bus=ide.0,unit=0 ",
+tmp_path[0]);
+
+dev = get_pci_device(qts, &bmdma_bar, &ide_bar);
+
+/* Initialize drive with zero sectors per track and one head.  */
+qpci_io_writeb(dev, ide_bar, reg_nsectors, 0);
+qpci_io_writeb(dev, ide_bar, reg_device, 0);
+qpci_io_writeb(dev, ide_bar, reg_command, CMD_INIT_DP);
+
+/* READ NATIVE MAX ADDRESS (CHS mode).  */
+qpci_io_writeb(dev, ide_bar, reg_device, 0xa0);
+qpci_io_writeb(dev, ide_bar, reg_command, CMD_READ_NATIVE);
+
+heads = qpci_io_readb(dev, ide_bar, reg_device) & 0xf;
+++heads;
+g_assert_cmpint(heads, ==, 16);
+
+cyls = qpci_io_readb(dev, ide_bar, reg_lba_high) << 8;
+cyls |= qpci_io_readb(dev, ide_bar, reg_lba_middle);
+++cyls;
+g_assert_cmpint(cyls, ==, 130);
+
+spt = qpci_io_readb(dev, ide_bar, reg_lba_low);
+g_assert_cmpint(spt, ==, 63);
+
+ide_test_quit(qts);
+free_pci_device(dev);
+}
+
 static void test_identify(void)
 {
 QTestState *qts;
@@ -1079,6 +1122,8 @@ int main(int argc, char **argv)
 /* Run the tests */
 g_test_init(&argc, &argv, NULL);
 
+qtest_add_func("/ide/read_native", test_specify);
+
 qtest_add_func("/ide/identify", test_identify);
 
 qtest_add_func("/ide/diagnostic", test_diagnostic);
-- 
2.34.1
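For reference, a quick sketch of the geometry arithmetic behind
TEST_IMAGE_SIZE and the CHS tuple asserted in test_specify() above (values
taken from the patch):

```c
#include <stdio.h>

int main(void)
{
    const long cyls = 130, heads = 16, spt = 63, sector_size = 512;

    /* 130 * 16 * 63 * 512 = 67092480 bytes, i.e. just under 64 MiB */
    printf("TEST_IMAGE_SIZE = %ld bytes\n", cyls * heads * spt * sector_size);

    /* READ NATIVE MAX ADDRESS reports the last valid CHS tuple.
     * Cylinder and head are 0-based on the wire (hence the ++ in the
     * test before comparing against 130 and 16); sectors are 1-based. */
    printf("native max: cylinder %ld, head %ld, sector %ld\n",
           cyls - 1, heads - 1, spt);
    return 0;
}
```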




Re: [PATCH] linux-user: Implement faccessat2

2022-10-10 Thread Helge Deller

On 10/9/22 08:08, WANG Xuerui wrote:

User space has been preferring this syscall for a while, due to its
closer match with C semantics, and newer platforms such as LoongArch
apparently have libc implementations that don't fall back to faccessat,
so normal access checks fail without the emulation in place.

Tested by successfully emerging several packages within a Gentoo loong
stage3 chroot, emulated on amd64 with help of static qemu-loongarch64.

Reported-by: Andreas K. Hüttel 
Signed-off-by: WANG Xuerui 
---
  linux-user/strace.list | 3 +++
  linux-user/syscall.c   | 9 +
  2 files changed, 12 insertions(+)


There were two similar approaches from Richard and me:
https://lore.kernel.org/qemu-devel/20220729201414.29869-1-richard.hender...@linaro.org/#t
and
https://lore.kernel.org/qemu-devel/YzLdcnL6x646T61W@p100/


diff --git a/linux-user/strace.list b/linux-user/strace.list
index a87415bf3d..3df2184580 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -178,6 +178,9 @@
  #ifdef TARGET_NR_faccessat
  { TARGET_NR_faccessat, "faccessat" , NULL, print_faccessat, NULL },
  #endif
+#ifdef TARGET_NR_faccessat2
+{ TARGET_NR_faccessat2, "faccessat2" , NULL, print_faccessat, NULL },
+#endif


You are missing that part (from my patch):

--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1931,7 +1931,7 @@ print_execv(CPUArchState *cpu_env, const struct 
syscallname *name,
 }
 #endif

-#ifdef TARGET_NR_faccessat
+#if defined(TARGET_NR_faccessat) || defined(TARGET_NR_faccessat2)

otherwise if TARGET_NR_faccessat isn't defined, you won't have
the function print_faccessat() in strace.c defined.



  #ifdef TARGET_NR_fadvise64
  { TARGET_NR_fadvise64, "fadvise64" , NULL, NULL, NULL },
  #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 2e954d8dbd..a81f0b65b9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9110,6 +9110,15 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int 
num, abi_long arg1,
  unlock_user(p, arg2, 0);
  return ret;
  #endif
+#if defined(TARGET_NR_faccessat2) && defined(__NR_faccessat2)
+case TARGET_NR_faccessat2:
+if (!(p = lock_user_string(arg2))) {
+return -TARGET_EFAULT;
+}
+ret = get_errno(faccessat(arg1, p, arg3, arg4));


You rely here on the libc faccessat() function to use either the
faccessat2() or the faccessat() syscall - which is probably the
best way to handle it...

Helge


+unlock_user(p, arg2, 0);
+return ret;
+#endif
  #ifdef TARGET_NR_nice /* not on alpha */
  case TARGET_NR_nice:
  return get_errno(nice(arg1));





[PATCH 1/2] hw/ide/core.c (cmd_read_native_max): Avoid limited device parameters

2022-10-10 Thread Lev Kujawski
Always use the native CHS device parameters for the ATA commands READ
NATIVE MAX ADDRESS and READ NATIVE MAX ADDRESS EXT, not those limited
by the ATA command INITIALIZE_DEVICE_PARAMETERS (introduced in patch
176e4961, hw/ide/core.c: Implement ATA INITIALIZE_DEVICE_PARAMETERS
command, 2022-07-07.)

As stated by the ATA/ATAPI specification, "[t]he native maximum is the
highest address accepted by the device in the factory default
condition."  Therefore this patch substitutes the native values in
drive_heads and drive_sectors before calling ide_set_sector().

One consequence of the prior behavior was that setting zero sectors
per track could lead to an FPE within ide_set_sector().  Thanks to
Alexander Bulekov for reporting this issue.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1243
Signed-off-by: Lev Kujawski 
---
 hw/ide/core.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 39afdc0006..ee836401bc 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1608,11 +1608,24 @@ static bool cmd_read_native_max(IDEState *s, uint8_t 
cmd)
 /* Refuse if no sectors are addressable (e.g. medium not inserted) */
 if (s->nb_sectors == 0) {
 ide_abort_command(s);
-return true;
-}
+} else {
+/*
+ * Save the active drive parameters, which may have been
+ * limited from their native counterparts by, e.g., INITIALIZE
+ * DEVICE PARAMETERS or SET MAX ADDRESS.
+ */
+const int aheads = s->heads;
+const int asectors = s->sectors;
 
-ide_cmd_lba48_transform(s, lba48);
-ide_set_sector(s, s->nb_sectors - 1);
+s->heads = s->drive_heads;
+s->sectors = s->drive_sectors;
+
+ide_cmd_lba48_transform(s, lba48);
+ide_set_sector(s, s->nb_sectors - 1);
+
+s->heads = aheads;
+s->sectors = asectors;
+}
 
 return true;
 }
-- 
2.34.1
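A quick sketch of the failure mode mentioned above (illustrative only; the
helper mimics the classic LBA-to-CHS arithmetic rather than quoting
ide_set_sector() itself):

```c
#include <stdio.h>

/* Classic LBA -> CHS conversion: both divisions involve the
 * sectors-per-track value, so spt == 0 raises SIGFPE. */
static void lba_to_chs(long lba, long heads, long spt)
{
    long cyl  = lba / (heads * spt);
    long head = (lba / spt) % heads;
    long sect = (lba % spt) + 1;

    printf("LBA %ld -> C/H/S %ld/%ld/%ld\n", lba, cyl, head, sect);
}

int main(void)
{
    lba_to_chs(131039, 16, 63);  /* last sector of a 130x16x63 disk */

    /* With the pre-fix behavior, INITIALIZE DEVICE PARAMETERS could
     * leave sectors-per-track at zero; the equivalent call here would
     * divide by zero:
     * lba_to_chs(131039, 1, 0);
     */
    return 0;
}
```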




Re: [PATCH v2] tests: Add sndio to the FreeBSD CI containers / VM

2022-10-10 Thread Daniel P . Berrangé
On Fri, Oct 07, 2022 at 06:27:29PM -0400, Brad Smith wrote:
> On 10/7/2022 4:33 PM, Warner Losh wrote:
> > 
> > 
> > On Fri, Oct 7, 2022 at 1:21 AM Brad Smith  wrote:
> > 
> > tests: Add sndio to the FreeBSD CI containers / VM
> > 
> > ---
> >  .gitlab-ci.d/cirrus/freebsd-12.vars           |   2 +-
> >  .gitlab-ci.d/cirrus/freebsd-13.vars           |   2 +-
> >  tests/docker/dockerfiles/alpine.docker        |   3 +-
> >  tests/docker/dockerfiles/centos8.docker       |   2 +-
> >  .../dockerfiles/debian-amd64-cross.docker     | 235 -
> >  tests/docker/dockerfiles/debian-amd64.docker  | 237
> > +-
> >  .../dockerfiles/debian-arm64-cross.docker     | 233 -
> >  .../dockerfiles/debian-armel-cross.docker     | 231 -
> >  .../dockerfiles/debian-armhf-cross.docker     | 233 -
> >  .../dockerfiles/debian-mips64el-cross.docker  | 227 -
> >  .../dockerfiles/debian-mipsel-cross.docker    | 227 -
> >  .../dockerfiles/debian-ppc64el-cross.docker   | 231 -
> >  .../dockerfiles/debian-s390x-cross.docker     | 229 -
> >  tests/docker/dockerfiles/fedora.docker        | 230 -
> >  tests/docker/dockerfiles/opensuse-leap.docker |   3 +-
> >  tests/docker/dockerfiles/ubuntu2004.docker    | 235 -
> >  tests/lcitool/libvirt-ci                      |   2 +-
> >  tests/lcitool/projects/qemu.yml               |   1 +
> >  tests/vm/freebsd                              |   3 +
> >  19 files changed, 1291 insertions(+), 1275 deletions(-)
> > 
> > 
> > This looks good to me. Why did the Linux containers need updating for
> > the FreeBSD update?
> > 
> > Otherwise, the changes look good to my eye
> > 
> > Reviewed-by:  Warner Losh 
> 
> 
> Because the CI configs are auto-generated: refreshing them regenerates
> them all. The intent was to update the FreeBSD configs, but adding the
> dependency to tests/lcitool/projects/qemu.yml updates the rest of the
> configs as well. Whatever OSes have a corresponding mapping in libvirt-ci
> are updated.

The POV of libvirt-ci is that if the dependency exists on any given
platform, we add it to the package list, so that we maximise the test
coverage across platforms. Surprisingly sndio was available in several
Linux distros.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: total fail on FreeBSD 14.0 amd64 regardless of compiler

2022-10-10 Thread Daniel P . Berrangé
On Mon, Oct 10, 2022 at 06:56:51AM +, Dennis Clarke wrote:
> 
> re: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg01249.html
> 
> Using GCC 12 is even worse :
> 
> [2040/6841] Compiling C object qemu-system-aarch64.p/softmmu_main.c.o
> [2041/6841] Linking target qemu-system-aarch64
> FAILED: qemu-system-aarch64
> /usr/local/bin/g++12 -m64 -mcx16 @qemu-system-aarch64.rsp
> /usr/local/bin/ld: libqemuutil.a.p/util_filemonitor-inotify.c.o: undefined
> reference to symbol 'inotify_init1'
> /usr/local/bin/ld: /usr/local/lib/libinotify.so.0: error adding symbols: DSO
> missing from command line

This is trying to tell us that inotify_init1 is in libinotify.so.0
but we have not put -linotify on the command line for the link
step.  On Linux these are a standard part of glibc, but on FreeBSD
they are separate.

I'm thinking that on previous FreeBSD releases we probably got this linked
indirectly, but something in 14.0 has become stricter, wanting it
listed explicitly.

IOW, a meson.build change is likely needed to add -linotify on
FreeBSD.
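A minimal reproducer of the link failure, assuming the libinotify port
provides <sys/inotify.h> (compare `cc test.c` with `cc test.c -linotify`
on FreeBSD):

```c
#include <sys/inotify.h>
#include <stdio.h>

int main(void)
{
    /* On FreeBSD this symbol lives in libinotify.so, so the link fails
     * unless -linotify is passed explicitly; on Linux it is in glibc. */
    int fd = inotify_init1(IN_NONBLOCK);

    if (fd < 0) {
        perror("inotify_init1");
        return 1;
    }
    printf("inotify fd: %d\n", fd);
    return 0;
}
```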


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Thomas Huth

On 10/10/2022 10.13, Ani Sinha wrote:

On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:


Please see the README file added in patch 10 for more details.
Sample runs are as follows:

$ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smbios.py, line 92)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smilatency.py, line 47)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(testacpi.py, line 158)


None of the above files are avocado tests or avocado-related python
scripts. They are run from within bits in a python 2.7 environment. I
could not find a mechanism to exclude a directory from avocado tests.
I also do not think making those scripts python 3 compliant is a good
use of my time, since upgrading bits to use python 3 would be a major
task unrelated to QEMU testing.


Maybe you could at least switch those three lines to use the new print() 
syntax to silence at least these ugly errors? ... Python 2.7 should cope 
very well with the new syntax, as far as I know...


Otherwise, it might be better to put the non-avocado python files into 
another directory under tests/ ?


 Thomas





[PATCH v2 6/7] hostmem: Allow for specifying a ThreadContext for preallocation

2022-10-10 Thread David Hildenbrand
Let's allow for specifying a thread context via the "prealloc-context"
property. When set, preallocation threads will be created via the
thread context -- inheriting the same CPU affinity as the thread
context.

Pinning preallocation threads to CPUs can heavily increase performance
in NUMA setups, because preallocation from a CPU close to the target
NUMA node(s) is faster than preallocation from a more distant CPU,
simply because of the memory bandwidth available when initializing
memory with zeroes. This is especially relevant for very large VMs
backed by huge/gigantic pages, where preallocation is mandatory.

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c   | 12 +---
 include/sysemu/hostmem.h |  2 ++
 qapi/qom.json|  4 
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 76f0394490..8640294c10 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,8 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
 
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
-  &local_err);
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
+  backend->prealloc_context, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -385,7 +385,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  */
 if (backend->prealloc) {
 qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-  backend->prealloc_threads, NULL, &local_err);
+  backend->prealloc_threads,
+  backend->prealloc_context, &local_err);
 if (local_err) {
 goto out;
 }
@@ -493,6 +494,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 NULL, NULL);
 object_class_property_set_description(oc, "prealloc-threads",
 "Number of CPU threads to use for prealloc");
+object_class_property_add_link(oc, "prealloc-context",
+TYPE_THREAD_CONTEXT, offsetof(HostMemoryBackend, prealloc_context),
+object_property_allow_set_link, OBJ_PROP_LINK_STRONG);
+object_class_property_set_description(oc, "prealloc-context",
+"Context to use for creating CPU threads for preallocation");
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
index 9ff5c16963..39326f1d4f 100644
--- a/include/sysemu/hostmem.h
+++ b/include/sysemu/hostmem.h
@@ -18,6 +18,7 @@
 #include "qom/object.h"
 #include "exec/memory.h"
 #include "qemu/bitmap.h"
+#include "qemu/thread-context.h"
 
 #define TYPE_MEMORY_BACKEND "memory-backend"
 OBJECT_DECLARE_TYPE(HostMemoryBackend, HostMemoryBackendClass,
@@ -66,6 +67,7 @@ struct HostMemoryBackend {
 bool merge, dump, use_canonical_path;
 bool prealloc, is_mapped, share, reserve;
 uint32_t prealloc_threads;
+ThreadContext *prealloc_context;
 DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
 HostMemPolicy policy;
 
diff --git a/qapi/qom.json b/qapi/qom.json
index cb6f700c0c..efa74f9799 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -578,6 +578,9 @@
 #
 # @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
 #
+# @prealloc-context: thread context to use for creation of preallocation 
threads
+#(default: none) (since 7.2)
+#
 # @share: if false, the memory is private to QEMU; if true, it is shared
 # (default: false)
 #
@@ -608,6 +611,7 @@
 '*policy': 'HostMemPolicy',
 '*prealloc': 'bool',
 '*prealloc-threads': 'uint32',
+'*prealloc-context': 'str',
 '*share': 'bool',
 '*reserve': 'bool',
 'size': 'size',
-- 
2.37.3




Re: [PATCH v4 0/2] Add memmap and fix bugs for LoongArch

2022-10-10 Thread gaosong



在 2022/9/30 17:51, Xiaojuan Yang 写道:

This series adds a memmap table and fixes the extioi and ipi device
emulation for the LoongArch virt machine.

Changes for v4:
Add 'reviewed-by' tag in fixing ipi patch, and other changes
are the same as v3.
1. Remove the memmap table patch in this series, it
will apply until we have more than one machinestate.
2. Using MemTxAttrs' requester_type and requester_id
to get current cpu index in loongarch extioi regs
emulation.
This patch based on:
20220927141504.3886314-1-alex.ben...@linaro.org
3. Rewrite the commit message of fixing ipi patch, and
add reviewed by tag in the patch.

Changes for v3:
1. Remove the memmap table patch in this series, it
will apply until we have more than one machinestate.
2. Using MemTxAttrs' requester_type and requester_id
to get current cpu index in loongarch extioi regs
emulation.
This patch based on:
20220927141504.3886314-1-alex.ben...@linaro.org
3. Rewrite the commit message of fixing ipi patch, and
this patch has been reviewed.

Changes for v2:
1. Adjust the position of 'PLATFORM' element in memmap table

Changes for v1:
1. Add memmap table for LoongArch virt machine
2. Fix LoongArch extioi function
3. Fix LoongArch ipi device emulation

Xiaojuan Yang (2):
   hw/intc: Fix LoongArch extioi function
   hw/intc: Fix LoongArch ipi device emulation

  hw/intc/loongarch_extioi.c  | 51 +++--
  hw/intc/loongarch_ipi.c |  1 -
  hw/intc/trace-events|  2 +-
  target/loongarch/iocsr_helper.c | 16 +--
  4 files changed, 38 insertions(+), 32 deletions(-)

Applied to loongarch-next.

Thanks.
Song Gao




Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Ani Sinha
On Mon, Oct 10, 2022 at 2:26 PM Thomas Huth  wrote:
>
> On 10/10/2022 10.13, Ani Sinha wrote:
> > On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:
> >>
> >> Please see the README file added in patch 10 for more details.
> >> Sample runs are as follows:
> >>
> >> $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
> >> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> >> (smbios.py, line 92)
> >> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> >> (smilatency.py, line 47)
> >> ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> >> (testacpi.py, line 158)
> >
> > None of the above files are avocado tests or avocado related python
> > scripts. They are run from within bits in a python 2.7 environment. I
> > could not find a mechanism to exclude a directory from avocado tests.
> > I also do not think making those scripts python 3 compliant is a good
> > use of my time since upgrading bits to use python 3 would be a major
> > task unrelated to QEMU testing.
>
> Maybe you could at least switch those three lines to use the new print()
> syntax

There are lots of print statements in those three files using old
syntax. It's only complaining about the first one.

> to silence at least these ugly errors? ... Python 2.7 should cope
> very well with the new syntax, as far as I know...
>
> Otherwise, it might be better to put the non-avocado python files into
> another directory under tests/ ?
>
>   Thomas
>
>



Re: [PATCH RESEND] linux-user: Fix struct statfs ABI on loongarch64

2022-10-10 Thread gaosong



在 2022/10/6 19:11, Philippe Mathieu-Daudé 写道:

On 6/10/22 12:07, WANG Xuerui wrote:

Previously the 32-bit version was incorrectly chosen, leading to funny
but incorrect output from e.g. df(1). Simply select the version
corresponding to the 64-bit asm-generic definition.

For reference, this program should produce the same output no matter
natively compiled or not, for loongarch64 or not:

```c
#include 
#include 

int main(int argc, const char *argv[])
{
   struct statfs b;
   if (statfs(argv[0], &b))
 return 1;

   printf("f_type = 0x%lx\n", b.f_type);
   printf("f_bsize = %ld\n", b.f_bsize);
   printf("f_blocks = %ld\n", b.f_blocks);
   printf("f_bfree = %ld\n", b.f_bfree);
   printf("f_bavail = %ld\n", b.f_bavail);

   return 0;
}

// Example output on my amd64 box, with the test binary residing on a
// btrfs partition.

// Native and emulated output after the fix:
//
// f_type = 0x9123683e
// f_bsize = 4096
// f_blocks = 268435456
// f_bfree = 168406890
// f_bavail = 168355058

// Output before the fix, note the messed layout:
//
// f_type = 0x10009123683e
// f_bsize = 723302085239504896
// f_blocks = 168355058
// f_bfree = 2250817541779750912
// f_bavail = 1099229433104
```

Fixes: 1f63019632 ("linux-user: Add LoongArch syscall support")
Signed-off-by: WANG Xuerui 
Cc: Song Gao 
Cc: Xiaojuan Yang 
Cc: Andreas K. Hüttel 
---

Resend with amended commit message to 100% clarify the example output
are generated on my box and will differ for everyone else. Sorry for
the noise.


Reviewed-by: Philippe Mathieu-Daudé 

Applied to loongarch-next.

Thanks.
Song Gao





[PATCH v2 1/7] util: Cleanup and rename os_mem_prealloc()

2022-10-10 Thread David Hildenbrand
Let's
* give the function a "qemu_*" style name
* make sure the parameters in the implementation match the prototype
* rename smp_cpus to max_threads, which makes the semantics of that
  parameter clearer

... and add function documentation.

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c |  6 +++---
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   | 17 +++--
 softmmu/cpus.c |  2 +-
 util/oslib-posix.c | 24 
 util/oslib-win32.c |  8 
 6 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4428e06738..491cb10b97 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,7 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
 
-os_mem_prealloc(fd, ptr, sz, backend->prealloc_threads, &local_err);
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -383,8 +383,8 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  * specified NUMA policy in place.
  */
 if (backend->prealloc) {
-os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-backend->prealloc_threads, &local_err);
+qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
+  backend->prealloc_threads, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 30d03e987a..0e9ef4ff19 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, 
uint64_t start_gpa,
 int fd = memory_region_get_fd(&vmem->memdev->mr);
 Error *local_err = NULL;
 
-os_mem_prealloc(fd, area, size, 1, &local_err);
+qemu_prealloc_mem(fd, area, size, 1, &local_err);
 if (local_err) {
 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b1c161c035..e556e45143 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,8 +568,21 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
-void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
- Error **errp);
+/**
+ * qemu_prealloc_mem:
+ * @fd: the fd mapped into the area, -1 for anonymous memory
+ * @area: start address of the area to preallocate
+ * @sz: the size of the area to preallocate
+ * @max_threads: maximum number of threads to use
+ * @errp: returns an error if this function fails
+ *
+ * Preallocate memory (populate/prefault page tables writable) for the virtual
+ * memory area starting at @area with the size of @sz. After a successful call,
+ * each page in the area was faulted in writable at least once, for example,
+ * after allocating file blocks for mapped files.
+ */
+void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
+   Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 61b27ff59d..01c94fd298 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -354,7 +354,7 @@ static void qemu_init_sigbus(void)
 
 /*
  * ALERT: when modifying this, take care that SIGBUS forwarding in
- * os_mem_prealloc() will continue working as expected.
+ * qemu_prealloc_mem() will continue working as expected.
  */
 memset(&action, 0, sizeof(action));
 action.sa_flags = SA_SIGINFO;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 827a7aadba..905cbc27cc 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -329,7 +329,7 @@ static void sigbus_handler(int signal)
 return;
 }
 #endif /* CONFIG_LINUX */
-warn_report("os_mem_prealloc: unrelated SIGBUS detected and ignored");
+warn_report("qemu_prealloc_mem: unrelated SIGBUS detected and ignored");
 }
 
 static void *do_touch_pages(void *arg)
@@ -399,13 +399,13 @@ static void *do_madv_populate_write_pages(void *arg)
 }
 
 static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
- int smp_cpus)
+ int max_threads)
 {
 long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
 int ret = 1;
 
 if (host_procs > 0) {
-ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
+ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), max_threads);
 }
 
 /* Especially with gigantic pages, don't create more threads than pages. */
@@ -418,11 +418,11 @@ static inline int get_memset_num_threa

[PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-10-10 Thread David Hildenbrand
This is a follow-up on "util: NUMA aware memory preallocation" [1] by
Michal.

Setting the CPU affinity of threads from inside QEMU usually isn't
easily possible, because we don't want QEMU -- once started and running
guest code -- to be able to mess up the system. QEMU disallows relevant
syscalls using seccomp, such that any such invocation will fail.

Especially for memory preallocation in memory backends, the CPU affinity
of the preallocation threads can significantly affect guest startup time,
for example, when running large VMs backed by huge/gigantic pages, because
of NUMA effects. For NUMA-aware preallocation, we have to set the CPU
affinity; however:

(1) Once preallocation threads are created during preallocation, management
tools cannot intercept anymore to change the affinity. These threads
are created automatically on demand.
(2) QEMU cannot easily set the CPU affinity itself.
(3) The CPU affinity derived from the NUMA bindings of the memory backend
might not necessarily be exactly the CPUs we actually want to use
(e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).

There is an easy "workaround". If we have a thread with the right CPU
affinity, we can simply create new threads on demand via that prepared
context. So, all we have to do is setup and create such a context ahead
of time, to then configure preallocation to create new threads via that
environment.
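As a minimal POSIX sketch of that idea (not QEMU code; it merely shows
that threads inherit the CPU affinity of the thread that creates them):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    cpu_set_t set;

    /* Inherited from the context thread that created us. */
    pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
    printf("worker allowed on %d CPU(s)\n", CPU_COUNT(&set));
    return NULL;
}

static void *context_thread(void *arg)
{
    pthread_t t;

    /* Threads created here inherit this thread's CPU affinity. */
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t ctx;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);               /* pin the context thread to CPU 0 */

    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    pthread_create(&ctx, &attr, context_thread, NULL);
    pthread_join(ctx, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```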

So, let's introduce a user-creatable "thread-context" object that
essentially consists of a context thread used to create new threads.
QEMU can either try setting the CPU affinity itself ("cpu-affinity",
"node-affinity" property), or upper layers can extract the thread id
("thread-id" property) to configure it externally.

Make memory-backends consume a thread-context object
(via the "prealloc-context" property) and use it when preallocating to
create new threads with the desired CPU affinity. Further, to make it
easier to use, allow creation of "thread-context" objects, including
setting the CPU affinity directly from QEMU, before enabling the
sandbox option.


Quick test on a system with 2 NUMA nodes:

Without CPU affinity:
time qemu-system-x86_64 \
-object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind \
-nographic -monitor stdio

real 0m5.383s
real 0m3.499s
real 0m5.129s
real 0m4.232s
real 0m5.220s
real 0m4.288s
real 0m3.582s
real 0m4.305s
real 0m5.421s
real 0m4.502s

-> It heavily depends on the scheduler CPU selection

With CPU affinity:
time qemu-system-x86_64 \
-object thread-context,id=tc1,node-affinity=0 \
-object memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1 \
-sandbox enable=on,resourcecontrol=deny \
-nographic -monitor stdio

real 0m1.959s
real 0m1.942s
real 0m1.943s
real 0m1.941s
real 0m1.948s
real 0m1.964s
real 0m1.949s
real 0m1.948s
real 0m1.941s
real 0m1.937s

On reasonably large VMs, the speedup can be quite significant.

While this concept is currently only used for short-lived preallocation
threads, nothing major speaks against reusing the concept for other
threads that are harder to identify/configure -- except that
we need additional (idle) context threads that are otherwise left unused.

This series does not yet tackle concurrent preallocation of memory
backends. Memory backend objects are created and memory is preallocated one
memory backend at a time -- and there is currently no way to do
preallocation asynchronously.

[1] https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com

v1 -> v2:
* Fixed some minor style nits
* "util: Introduce ThreadContext user-creatable object"
 -> Improve documentation and patch description. [Markus]
* "util: Add write-only "node-affinity" property for ThreadContext"
 -> Improve documentation and patch description. [Markus]

RFC -> v1:
* "vl: Allow ThreadContext objects to be created before the sandbox option"
 -> Move parsing of the "name" property before object_create_pre_sandbox
* Added RB's

Cc: Michal Privoznik 
Cc: Igor Mammedov 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: "Daniel P. Berrangé" 
Cc: Eduardo Habkost 
Cc: Dr. David Alan Gilbert 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: Richard Henderson 
Cc: Stefan Weil 

David Hildenbrand (7):
  util: Cleanup and rename os_mem_prealloc()
  util: Introduce qemu_thread_set_affinity() and
qemu_thread_get_affinity()
  util: Introduce ThreadContext user-creatable object
  util: Add write-only "node-affinity" property for ThreadContext
  util: Make qemu_prealloc_mem() optionally consume a ThreadContext
  hostmem: Allow for specifying a ThreadContext for preallocation
  vl: Allow ThreadContext objects to be created before the sandbox option

[PATCH v2 3/7] util: Introduce ThreadContext user-creatable object

2022-10-10 Thread David Hildenbrand
Setting the CPU affinity of QEMU threads is a bit problematic, because
QEMU doesn't always have permission to set the CPU affinity itself,
for example, with seccomp after it has been initialized by QEMU:
-sandbox enable=on,resourcecontrol=deny

General information about CPU affinities can be found in the man page of
taskset:
CPU affinity is a scheduler property that "bonds" a process to a given
set of CPUs on the system. The Linux scheduler will honor the given CPU
affinity and the process will not run on any other CPUs.

While upper layers are already aware of how to handle CPU affinities for
long-lived threads like iothreads or vcpu threads, short-lived threads,
as used for memory-backend preallocation, are more involved to
handle. These threads are created on demand and upper layers are not even
able to identify and configure them.

Introduce the concept of a ThreadContext, that is essentially a thread
used for creating new threads. All threads created via that context
thread inherit the configured CPU affinity. Consequently, it's
sufficient to create a ThreadContext and configure it once, and have all
threads created via that ThreadContext inherit the same CPU affinity.

The CPU affinity of a ThreadContext can be configured two ways:

(1) Obtaining the thread id via the "thread-id" property and setting the
CPU affinity manually.

(2) Setting the "cpu-affinity" property and letting QEMU try set the
CPU affinity itself. This will fail if QEMU doesn't have permissions
to do so anymore after seccomp was initialized.

A simple QEMU example to set the CPU affinity to CPU 0,1,6,7 would be:
qemu-system-x86_64 -S \
  -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7

And we can query it via HMP/QMP:
(qemu) qom-get tc1 cpu-affinity
[
0,
1,
6,
7
]

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered.

A ThreadContext can be reused, simply by reconfiguring the CPU affinity.

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 include/qemu/thread-context.h |  57 +++
 qapi/qom.json |  17 +++
 util/meson.build  |   1 +
 util/oslib-posix.c|   1 +
 util/thread-context.c | 278 ++
 5 files changed, 354 insertions(+)
 create mode 100644 include/qemu/thread-context.h
 create mode 100644 util/thread-context.c

diff --git a/include/qemu/thread-context.h b/include/qemu/thread-context.h
new file mode 100644
index 00..2ebd6b7fe1
--- /dev/null
+++ b/include/qemu/thread-context.h
@@ -0,0 +1,57 @@
+/*
+ * QEMU Thread Context
+ *
+ * Copyright Red Hat Inc., 2022
+ *
+ * Authors:
+ *  David Hildenbrand 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef SYSEMU_THREAD_CONTEXT_H
+#define SYSEMU_THREAD_CONTEXT_H
+
+#include "qapi/qapi-types-machine.h"
+#include "qemu/thread.h"
+#include "qom/object.h"
+
+#define TYPE_THREAD_CONTEXT "thread-context"
+OBJECT_DECLARE_TYPE(ThreadContext, ThreadContextClass,
+THREAD_CONTEXT)
+
+struct ThreadContextClass {
+ObjectClass parent_class;
+};
+
+struct ThreadContext {
+/* private */
+Object parent;
+
+/* private */
+unsigned int thread_id;
+QemuThread thread;
+
+/* Semaphore to wait for context thread action. */
+QemuSemaphore sem;
+/* Semaphore to wait for action in context thread. */
+QemuSemaphore sem_thread;
+/* Mutex to synchronize requests. */
+QemuMutex mutex;
+
+/* Commands for the thread to execute. */
+int thread_cmd;
+void *thread_cmd_data;
+
+/* CPU affinity bitmap used for initialization. */
+unsigned long *init_cpu_bitmap;
+int init_cpu_nbits;
+};
+
+void thread_context_create_thread(ThreadContext *tc, QemuThread *thread,
+  const char *name,
+  void *(*start_routine)(void *), void *arg,
+  int mode);
+
+#endif /* SYSEMU_THREAD_CONTEXT_H */
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b39..67d47f4051 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,21 @@
 'reduced-phys-bits': 'uint32',
 '*kernel-hashes': 'bool' } }
 
+##
+# @ThreadContextProperties:
+#
+# Properties for thread context objects.
+#
+# @cpu-affinity: the list of CPU numbers used as CPU affinity for all threads
+#created in the thread context (default: QEMU main thread
+#affinity)
+#
+# Since: 7.2
+##
+{ 'struct': 'ThreadContextProperties',
+  'data': { '*cpu-affinity': ['uint16'] } }
+
+
 ##
 # @ObjectType:
 #
@@ -882,6 +897,7 @@
 { 'name': 'secret_keyring',
   'if': 'CONFIG_SECRET_KEYRING' },
 'sev-gue

[PATCH v2 5/7] util: Make qemu_prealloc_mem() optionally consume a ThreadContext

2022-10-10 Thread David Hildenbrand
... and implement it under POSIX. When a ThreadContext is provided,
create new threads via the context such that these new threads obtain a
properly configured CPU affinity.

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 backends/hostmem.c |  5 +++--
 hw/virtio/virtio-mem.c |  2 +-
 include/qemu/osdep.h   |  4 +++-
 util/oslib-posix.c | 20 ++--
 util/oslib-win32.c |  2 +-
 5 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 491cb10b97..76f0394490 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -232,7 +232,8 @@ static void host_memory_backend_set_prealloc(Object *obj, 
bool value,
 void *ptr = memory_region_get_ram_ptr(&backend->mr);
 uint64_t sz = memory_region_size(&backend->mr);
 
-qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, &local_err);
+qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads, NULL,
+  &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -384,7 +385,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
Error **errp)
  */
 if (backend->prealloc) {
 qemu_prealloc_mem(memory_region_get_fd(&backend->mr), ptr, sz,
-  backend->prealloc_threads, &local_err);
+  backend->prealloc_threads, NULL, &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 0e9ef4ff19..ed170def48 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -467,7 +467,7 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, 
uint64_t start_gpa,
 int fd = memory_region_get_fd(&vmem->memdev->mr);
 Error *local_err = NULL;
 
-qemu_prealloc_mem(fd, area, size, 1, &local_err);
+qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err);
 if (local_err) {
 static bool warned;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index e556e45143..625298c8bc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -568,6 +568,8 @@ unsigned long qemu_getauxval(unsigned long type);
 
 void qemu_set_tty_echo(int fd, bool echo);
 
+typedef struct ThreadContext ThreadContext;
+
 /**
  * qemu_prealloc_mem:
  * @fd: the fd mapped into the area, -1 for anonymous memory
@@ -582,7 +584,7 @@ void qemu_set_tty_echo(int fd, bool echo);
  * after allocating file blocks for mapped files.
  */
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-   Error **errp);
+   ThreadContext *tc, Error **errp);
 
 /**
  * qemu_get_pid_name:
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 28305cdea3..59a891b6a8 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -419,7 +419,8 @@ static inline int get_memset_num_threads(size_t hpagesize, 
size_t numpages,
 }
 
 static int touch_all_pages(char *area, size_t hpagesize, size_t numpages,
-   int max_threads, bool use_madv_populate_write)
+   int max_threads, ThreadContext *tc,
+   bool use_madv_populate_write)
 {
 static gsize initialized = 0;
 MemsetContext context = {
@@ -458,9 +459,16 @@ static int touch_all_pages(char *area, size_t hpagesize, 
size_t numpages,
 context.threads[i].numpages = numpages_per_thread + (i < leftover);
 context.threads[i].hpagesize = hpagesize;
 context.threads[i].context = &context;
-qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
-   touch_fn, &context.threads[i],
-   QEMU_THREAD_JOINABLE);
+if (tc) {
+thread_context_create_thread(tc, &context.threads[i].pgthread,
+ "touch_pages",
+ touch_fn, &context.threads[i],
+ QEMU_THREAD_JOINABLE);
+} else {
+qemu_thread_create(&context.threads[i].pgthread, "touch_pages",
+   touch_fn, &context.threads[i],
+   QEMU_THREAD_JOINABLE);
+}
 addr += context.threads[i].numpages * hpagesize;
 }
 
@@ -496,7 +504,7 @@ static bool madv_populate_write_possible(char *area, size_t 
pagesize)
 }
 
 void qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-   Error **errp)
+   ThreadContext *tc, Error **errp)
 {
 static gsize initialized;
 int ret;
@@ -537,7 +545,7 @@ void qemu_prealloc_mem(int fd, char *area, size_t sz, int 
max_threads,
 }
 
 /* touch pages simultaneously */
-ret = touch_all_pages(area, hpagesize, numpages, max_threads,
+ret = touch_all_pages(area, hpage

[PATCH v2 7/7] vl: Allow ThreadContext objects to be created before the sandbox option

2022-10-10 Thread David Hildenbrand
Currently, there is no way to configure a CPU affinity inside QEMU when
the sandbox option disables it for QEMU as a whole, for example, via:
-sandbox enable=on,resourcecontrol=deny

While ThreadContext objects can be created on the QEMU commandline and
the CPU affinity can be configured externally via the thread-id, this is
insufficient if a ThreadContext with a certain CPU affinity is already
required during QEMU startup, before we can intercept QEMU and
configure the CPU affinity.

Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp:
add resourcecontrol argument to command line"), "to avoid any bigger of the
process". However, we only care about once QEMU is running, not when
the instance starting QEMU explicitly requests a certain CPU affinity
on the QEMU comandline.

Right now, for NUMA-aware preallocation of memory backends used for initial
machine RAM, one has to:

1) Start QEMU with the memory-backend with "prealloc=off"
2) Pause QEMU before it starts the guest (-S)
3) Create ThreadContext, configure the CPU affinity using the thread-id
4) Configure the ThreadContext as "prealloc-context" of the memory
   backend
5) Trigger preallocation by setting "prealloc=on"

To simplify this handling especially for initial machine RAM,
allow creation of ThreadContext objects before parsing sandbox options,
such that the CPU affinity requested on the QEMU commandline alongside the
sandbox option can be set. As ThreadContext objects essentially only create
a persistent context thread and set the CPU affinity, this is easily
possible.

With this change, we can create a ThreadContext with a CPU affinity on
the QEMU commandline and use it for preallocation of memory backends
glued to the machine (simplified example):

To make "-name debug-threads=on" keep working as expected for the
context threads, perform earlier parsing of "-name".

qemu-system-x86_64 -m 1G \
 -object thread-context,id=tc1,cpu-affinity=3-4 \
 -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \
 -machine memory-backend=pc.ram \
 -S -monitor stdio -sandbox enable=on,resourcecontrol=deny

And while we can query the current CPU affinity:
  (qemu) qom-get tc1 cpu-affinity
  [
  3,
  4
  ]

We can no longer change it from QEMU directly:
  (qemu) qom-set tc1 cpu-affinity 1-2
  Error: Setting CPU affinity failed: Operation not permitted

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 softmmu/vl.c | 36 
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index b464da25bc..b5a23420ac 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1759,6 +1759,27 @@ static void object_option_parse(const char *optarg)
 visit_free(v);
 }
 
+/*
+ * Very early object creation, before the sandbox options have been activated.
+ */
+static bool object_create_pre_sandbox(const char *type)
+{
+/*
+ * Objects should in general not get initialized "too early" without
+ * a reason. If you add one, state the reason in a comment!
+ */
+
+/*
+ * Reason: -sandbox on,resourcecontrol=deny disallows setting CPU
+ * affinity of threads.
+ */
+if (g_str_equal(type, "thread-context")) {
+return true;
+}
+
+return false;
+}
+
 /*
  * Initial object creation happens before all other
  * QEMU data types are created. The majority of objects
@@ -1773,6 +1794,11 @@ static bool object_create_early(const char *type)
  * add one, state the reason in a comment!
  */
 
+/* Reason: already created. */
+if (object_create_pre_sandbox(type)) {
+return false;
+}
+
 /* Reason: property "chardev" */
 if (g_str_equal(type, "rng-egd") ||
 g_str_equal(type, "qtest")) {
@@ -1895,7 +1921,7 @@ static void qemu_create_early_backends(void)
  */
 static bool object_create_late(const char *type)
 {
-return !object_create_early(type);
+return !object_create_early(type) && !object_create_pre_sandbox(type);
 }
 
 static void qemu_create_late_backends(void)
@@ -2351,6 +2377,11 @@ static int process_runstate_actions(void *opaque, 
QemuOpts *opts, Error **errp)
 
 static void qemu_process_early_options(void)
 {
+qemu_opts_foreach(qemu_find_opts("name"),
+  parse_name, NULL, &error_fatal);
+
+object_option_foreach_add(object_create_pre_sandbox);
+
 #ifdef CONFIG_SECCOMP
 QemuOptsList *olist = qemu_find_opts_err("sandbox", NULL);
 if (olist) {
@@ -2358,9 +2389,6 @@ static void qemu_process_early_options(void)
 }
 #endif
 
-qemu_opts_foreach(qemu_find_opts("name"),
-  parse_name, NULL, &error_fatal);
-
 if (qemu_opts_foreach(qemu_find_opts("action"),
   process_runstate_actions, NULL, &error_fatal)) {
 exit(1);
-- 
2.37.3




[PATCH v2 2/7] util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()

2022-10-10 Thread David Hildenbrand
Usually, we let upper layers handle CPU pinning, because
pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via
seccomp when starting QEMU with
-sandbox enable=on,resourcecontrol=deny

However, we want to configure and observe the CPU affinity of threads
from QEMU directly in some cases when the sandbox option is either not
enabled or not active yet.

So let's add a way to configure CPU pinning via
qemu_thread_set_affinity() and obtain CPU affinity via
qemu_thread_get_affinity() and implement them under POSIX using
pthread_setaffinity_np() + pthread_getaffinity_np().

Implementation under Windows is possible using SetProcessAffinityMask()
+ GetProcessAffinityMask(), however, that is left as future work.

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 include/qemu/thread.h|  4 +++
 meson.build  | 16 +
 util/qemu-thread-posix.c | 70 
 util/qemu-thread-win32.c | 12 +++
 4 files changed, 102 insertions(+)

diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index af19f2b3fc..79e507c7f0 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -185,6 +185,10 @@ void qemu_event_destroy(QemuEvent *ev);
 void qemu_thread_create(QemuThread *thread, const char *name,
 void *(*start_routine)(void *),
 void *arg, int mode);
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+ unsigned long nbits);
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+ unsigned long *nbits);
 void *qemu_thread_join(QemuThread *thread);
 void qemu_thread_get_self(QemuThread *thread);
 bool qemu_thread_is_self(QemuThread *thread);
diff --git a/meson.build b/meson.build
index b686dfef75..3e0aa4925d 100644
--- a/meson.build
+++ b/meson.build
@@ -2114,7 +2114,23 @@ config_host_data.set('CONFIG_PTHREAD_CONDATTR_SETCLOCK', 
cc.links(gnu_source_pre
 pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
 return 0;
   }''', dependencies: threads))
+config_host_data.set('CONFIG_PTHREAD_AFFINITY_NP', cc.links(gnu_source_prefix 
+ '''
+  #include 
 
+  static void *f(void *p) { return NULL; }
+  int main(void)
+  {
+int setsize = CPU_ALLOC_SIZE(64);
+pthread_t thread;
+cpu_set_t *cpuset;
+pthread_create(&thread, 0, f, 0);
+cpuset = CPU_ALLOC(64);
+CPU_ZERO_S(setsize, cpuset);
+pthread_setaffinity_np(thread, setsize, cpuset);
+pthread_getaffinity_np(thread, setsize, cpuset);
+CPU_FREE(cpuset);
+return 0;
+  }''', dependencies: threads))
 config_host_data.set('CONFIG_SIGNALFD', cc.links(gnu_source_prefix + '''
   #include 
   #include 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index ac1d56e673..bae938c670 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -16,6 +16,7 @@
 #include "qemu/notify.h"
 #include "qemu-thread-common.h"
 #include "qemu/tsan.h"
+#include "qemu/bitmap.h"
 
 static bool name_threads;
 
@@ -552,6 +553,75 @@ void qemu_thread_create(QemuThread *thread, const char 
*name,
 pthread_attr_destroy(&attr);
 }
 
+int qemu_thread_set_affinity(QemuThread *thread, unsigned long *host_cpus,
+ unsigned long nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+const size_t setsize = CPU_ALLOC_SIZE(nbits);
+unsigned long value;
+cpu_set_t *cpuset;
+int err;
+
+cpuset = CPU_ALLOC(nbits);
+g_assert(cpuset);
+
+CPU_ZERO_S(setsize, cpuset);
+value = find_first_bit(host_cpus, nbits);
+while (value < nbits) {
+CPU_SET_S(value, setsize, cpuset);
+value = find_next_bit(host_cpus, nbits, value + 1);
+}
+
+err = pthread_setaffinity_np(thread->thread, setsize, cpuset);
+CPU_FREE(cpuset);
+return err;
+#else
+return -ENOSYS;
+#endif
+}
+
+int qemu_thread_get_affinity(QemuThread *thread, unsigned long **host_cpus,
+                             unsigned long *nbits)
+{
+#if defined(CONFIG_PTHREAD_AFFINITY_NP)
+    unsigned long tmpbits;
+    cpu_set_t *cpuset;
+    size_t setsize;
+    int i, err;
+
+    tmpbits = CPU_SETSIZE;
+    while (true) {
+        setsize = CPU_ALLOC_SIZE(tmpbits);
+        cpuset = CPU_ALLOC(tmpbits);
+        g_assert(cpuset);
+
+        err = pthread_getaffinity_np(thread->thread, setsize, cpuset);
+        if (err) {
+            CPU_FREE(cpuset);
+            if (err != -EINVAL) {
+                return err;
+            }
+            tmpbits *= 2;
+        } else {
+            break;
+        }
+    }
+
+    /* Convert the result into a proper bitmap. */
+    *nbits = tmpbits;
+    *host_cpus = bitmap_new(tmpbits);
+    for (i = 0; i < tmpbits; i++) {
+        if (CPU_ISSET(i, cpuset)) {
+            set_bit(i, *host_cpus);
+        }
+    }
+    CPU_FREE(cpuset);
+    return 0;
+#else
+    return -ENOSYS;
+#endif
+}
+
 void qemu_thread_get_self(QemuThread *thread)
 {
 

[PATCH v2 4/7] util: Add write-only "node-affinity" property for ThreadContext

2022-10-10 Thread David Hildenbrand
Let's make it easier to pin threads created via a ThreadContext to
all CPUs currently belonging to a given set of NUMA nodes -- which is the
common case.

"node-affinity" is simply a shortcut for setting "cpu-affinity" manually
to the list of CPUs belonging to the set of nodes. This property can only
be written.

A simple QEMU example to set the CPU affinity to Node 1 on a system with
two NUMA nodes, 24 CPUs each:
qemu-system-x86_64 -S \
  -object thread-context,id=tc1,node-affinity=1

And we can query the cpu-affinity via HMP/QMP:
(qemu) qom-get tc1 cpu-affinity
[
1,
3,
5,
7,
9,
11,
13,
15,
17,
19,
21,
23,
25,
27,
29,
31,
33,
35,
37,
39,
41,
43,
45,
47
]

We cannot query the node-affinity:
(qemu) qom-get tc1 node-affinity
Error: Insufficient permission to perform this operation

But note that due to dynamic library loading this example will not work
before we actually make use of thread_context_create_thread() in QEMU
code, because the type will otherwise not get registered.

Note that if the CPUs for a node change due to physical CPU hotplug or
hotunplug (e.g., lscpu output changes) after the ThreadContext was started,
the CPU affinity will not get updated.
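
For illustration, a self-contained sketch (not the patch code) of the
node-to-CPU resolution this property performs, using libnuma (compile with
-lnuma):

/* Sketch: mark all CPUs belonging to one NUMA node in a plain bool array. */
#include <numa.h>
#include <stdbool.h>

static void mark_node_cpus(bool *cpu_selected, int max_cpus, int node)
{
    struct bitmask *cpus = numa_allocate_cpumask();
    int i;

    if (numa_node_to_cpus(node, cpus) == 0) {   /* 0 means success */
        for (i = 0; i < max_cpus; i++) {
            if (numa_bitmask_isbitset(cpus, i)) {
                cpu_selected[i] = true;
            }
        }
    }
    numa_bitmask_free(cpus);
}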

Reviewed-by: Michal Privoznik 
Signed-off-by: David Hildenbrand 
---
 qapi/qom.json |  9 -
 util/meson.build  |  2 +-
 util/thread-context.c | 84 +++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 67d47f4051..cb6f700c0c 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -839,10 +839,17 @@
 #             created in the thread context (default: QEMU main thread
 #             affinity)
 #
+# @node-affinity: the list of node numbers that will be resolved to a list
+#                 of CPU numbers used as CPU affinity. This is a shortcut for
+#                 specifying the list of CPU numbers belonging to the nodes
+#                 manually by setting @cpu-affinity. (default: QEMU main thread
+#                 affinity)
+#
 # Since: 7.2
 ##
 { 'struct': 'ThreadContextProperties',
-  'data': { '*cpu-affinity': ['uint16'] } }
+  'data': { '*cpu-affinity': ['uint16'],
+            '*node-affinity': ['uint16'] } }
 
 
 ##
diff --git a/util/meson.build b/util/meson.build
index e97cd2d779..c0a7bc54d4 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -1,5 +1,5 @@
 util_ss.add(files('osdep.c', 'cutils.c', 'unicode.c', 'qemu-timer-common.c'))
-util_ss.add(files('thread-context.c'))
+util_ss.add(files('thread-context.c'), numa)
 if not config_host_data.get('CONFIG_ATOMIC64')
   util_ss.add(files('atomic64.c'))
 endif
diff --git a/util/thread-context.c b/util/thread-context.c
index c921905396..4138245332 100644
--- a/util/thread-context.c
+++ b/util/thread-context.c
@@ -21,6 +21,10 @@
 #include "qemu/module.h"
 #include "qemu/bitmap.h"
 
+#ifdef CONFIG_NUMA
+#include 
+#endif
+
 enum {
 TC_CMD_NONE = 0,
 TC_CMD_STOP,
@@ -88,6 +92,11 @@ static void thread_context_set_cpu_affinity(Object *obj, Visitor *v,
     int nbits = 0, ret;
     Error *err = NULL;
 
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
     visit_type_uint16List(v, name, &host_cpus, &err);
     if (err) {
         error_propagate(errp, err);
@@ -159,6 +168,79 @@ static void thread_context_get_cpu_affinity(Object *obj, Visitor *v,
 qapi_free_uint16List(host_cpus);
 }
 
+static void thread_context_set_node_affinity(Object *obj, Visitor *v,
+                                             const char *name, void *opaque,
+                                             Error **errp)
+{
+#ifdef CONFIG_NUMA
+    const int nbits = numa_num_possible_cpus();
+    ThreadContext *tc = THREAD_CONTEXT(obj);
+    uint16List *l, *host_nodes = NULL;
+    unsigned long *bitmap = NULL;
+    struct bitmask *tmp_cpus;
+    Error *err = NULL;
+    int ret, i;
+
+    if (tc->init_cpu_bitmap) {
+        error_setg(errp, "Mixing CPU and node affinity not supported");
+        return;
+    }
+
+    visit_type_uint16List(v, name, &host_nodes, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (!host_nodes) {
+        error_setg(errp, "Node list is empty");
+        goto out;
+    }
+
+    bitmap = bitmap_new(nbits);
+    tmp_cpus = numa_allocate_cpumask();
+    for (l = host_nodes; l; l = l->next) {
+        numa_bitmask_clearall(tmp_cpus);
+        ret = numa_node_to_cpus(l->value, tmp_cpus);
+        if (ret) {
+            /* We ignore any errors, such as impossible nodes. */
+            continue;
+        }
+        for (i = 0; i < nbits; i++) {
+            if (numa_bitmask_isbitset(tmp_cpus, i)) {
+                set_bit(i, bitmap);
+

Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Thomas Huth

On 10/10/2022 11.13, Ani Sinha wrote:

On Mon, Oct 10, 2022 at 2:26 PM Thomas Huth  wrote:


On 10/10/2022 10.13, Ani Sinha wrote:

On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:


Please see the README file added in patch 10 for more details.
Sample runs are as follows:

$ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smbios.py, line 92)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(smilatency.py, line 47)
ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
(testacpi.py, line 158)


None of the above files are avocado tests or avocado related python
scripts. They are run from within bits in a python 2.7 environment. I
could not find a mechanism to exclude a directory from avocado tests.
I also do not think making those scripts python 3 compliant is a good
use of my time since upgrading bits to use python 3 would be a major
task unrelated to QEMU testing.


Maybe you could at least switch those three lines to use the new print()
syntax


There are lots of print statements in those three files using old
syntax. It's only complaining about the first one.

to silence at least these ugly errors? ... Python 2.7 should cope


Yeah, but everybody who wants to run the QEMU avocado tests will wonder 
about those ERROR messages! So I don't think that this is acceptable in the 
current shape. Either fix the print lines, or move it to another directory.


 Thomas





Re: [PATCH v3 0/3] Fix some loongarch tcg bugs

2022-10-10 Thread gaosong



On 2022/9/30 10:45, Song Gao wrote:

Hi,

This series fixes some bugs found by RISU testing.

V3:
   - drop the patch that set the high 32 bits of some instruction results to 1.
   - follow some changes from Richard's suggestions.

v2:
   - remove patch 5 (div: if x/0, set the dividend to 0).


Song Gao (3):
   target/loongarch: bstrins.w src register need EXT_NONE
   target/loongarch: Fix fnm{sub/add}_{s/d} set wrong flags
   softfloat: logB(0) should raise divideByZero exception

  fpu/softfloat-parts.c.inc |  1 +
  target/loongarch/insn_trans/trans_bit.c.inc   | 36 +++
  .../loongarch/insn_trans/trans_farith.c.inc   | 12 +++
  3 files changed, 29 insertions(+), 20 deletions(-)

Applied to loongarch-next.

Thanks.
Song Gao




Re: [PATCH] ui/vnc-clipboard: fix integer underflow in vnc_client_cut_text_ext

2022-10-10 Thread Mauro Matteo Cascella
On Sun, Sep 25, 2022 at 10:45 PM Mauro Matteo Cascella
 wrote:
>
> Extended ClientCutText messages start with a 4-byte header. If len < 4,
> an integer underflow occurs in vnc_client_cut_text_ext. The result is
> used to decompress data in a while loop in inflate_buffer, leading to
> CPU consumption and denial of service. Prevent this by checking dlen in
> protocol_client_msg.
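
For illustration, the wrap sketched with a hypothetical value -- not the
actual vnc.c code:

uint32_t dlen = 2;                /* attacker-controlled, less than 4 */
uint32_t payload_len = dlen - 4;  /* unsigned wrap: 0xfffffffe */
/* inflate_buffer() is then asked to decompress ~4 GiB, burning CPU */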
>
> Fixes: CVE-2022-3165
> Fixes: 0bf41cab93e5 ("ui/vnc: clipboard support")
> Reported-by: TangPeng 
> Signed-off-by: Mauro Matteo Cascella 
> ---
> Extended Clipboard Pseudo-Encoding:
> https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#extended-clipboard-pseudo-encoding
>
>  ui/vnc.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/ui/vnc.c b/ui/vnc.c
> index 6a05d06147..acb3629cd8 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -2442,8 +2442,8 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
>          if (len == 1) {
>              return 8;
>          }
> +        uint32_t dlen = abs(read_s32(data, 4));
>          if (len == 8) {
> -            uint32_t dlen = abs(read_s32(data, 4));
>              if (dlen > (1 << 20)) {
>                  error_report("vnc: client_cut_text msg payload has %u bytes"
>                               " which exceeds our limit of 1MB.", dlen);
> @@ -2456,8 +2456,13 @@ static int protocol_client_msg(VncState *vs, uint8_t *data, size_t len)
>          }
>
>          if (read_s32(data, 4) < 0) {
> -            vnc_client_cut_text_ext(vs, abs(read_s32(data, 4)),
> -                                    read_u32(data, 8), data + 12);
> +            if (dlen < 4) {
> +                error_report("vnc: malformed payload (header less than 4 bytes)"
> +                             " in extended clipboard pseudo-encoding.");
> +                vnc_client_error(vs);
> +                break;
> +            }
> +            vnc_client_cut_text_ext(vs, dlen, read_u32(data, 8), data + 12);
>              break;
>          }
>          vnc_client_cut_text(vs, read_u32(data, 4), data + 8);
> --
> 2.37.3
>

Any updates here?

Thanks,
-- 
Mauro Matteo Cascella
Red Hat Product Security
PGP-Key ID: BB3410B0




Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Ani Sinha
On Mon, Oct 10, 2022 at 2:50 PM Daniel P. Berrangé  wrote:
>
> On Mon, Oct 10, 2022 at 01:43:21PM +0530, Ani Sinha wrote:
> > On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:
> > >
> > > Please see the README file added in patch 10 for more details.
> > > Sample runs are as follows:
> > >
> > > $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
> > > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > > (smbios.py, line 92)
> > > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > > (smilatency.py, line 47)
> > > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > > (testacpi.py, line 158)
> >
> > None of the above files are avocado tests or avocado related python
> > scripts. They are run from within bits in a python 2.7 environment. I
> > could not find a mechanism to exclude a directory from avocado tests.
> > I also do not think making those scripts python 3 compliant is a good
> > use of my time since upgrading bits to use python 3 would be a major
> > task unrelated to QEMU testing.
>
> In one of the later patches copy_test_scripts()  copies the files
> into the guest image IIUC.
>
> If you rename them in git, to be .py2  then presumably avocado
> wont try to load them, and then in copy_test_scripts you can
> give them the normal .py extension for the guest.  The .py2
> extension will also make it more obvious to maintainers that
> these are different from the rest of the python coded we have.

I did a quick test and this approach seems to work.

 ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
1..1
ok 1 tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits



[PATCH] migration: Fix the minus value for compressed_size

2022-10-10 Thread Zhenzhong Duan
When update_compress_thread_counts() is called for the first time, there is
no data stream yet. We see compression_counters.compressed_size
become a negative value shortly.
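
For illustration, a sketch of that first call against the counter update
quoted later in this thread:

/* sketch: first invocation, before any compressed data exists */
int bytes_xmit = 0;
compression_counters.compressed_size += bytes_xmit - 8;   /* 0 - 8 == -8 */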

Signed-off-by: Zhenzhong Duan 
---
 migration/ram.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc68..510db95cdc36 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
 static void
 update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
 {
+    if (bytes_xmit <= 0) {
+        return;
+    }
+
     ram_transferred_add(bytes_xmit);
 
     if (param->zero_page) {
-- 
2.25.1




Re: [PATCH v3 00/10] Introduce new acpi/smbios avocado tests using biosbits

2022-10-10 Thread Daniel P . Berrangé
On Mon, Oct 10, 2022 at 01:43:21PM +0530, Ani Sinha wrote:
> On Mon, Oct 10, 2022 at 1:24 PM Ani Sinha  wrote:
> >
> > Please see the README file added in patch 10 for more details.
> > Sample runs are as follows:
> >
> > $ ./tests/venv/bin/avocado run -t acpi tests/avocado --tap -
> > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > (smbios.py, line 92)
> > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > (smilatency.py, line 47)
> > ERROR:  Missing parentheses in call to 'print'. Did you mean print(...)? 
> > (testacpi.py, line 158)
> 
> None of the above files are avocado tests or avocado related python
> scripts. They are run from within bits in a python 2.7 environment. I
> could not find a mechanism to exclude a directory from avocado tests.
> I also do not think making those scripts python 3 compliant is a good
> use of my time since upgrading bits to use python 3 would be a major
> task unrelated to QEMU testing.

In one of the later patches copy_test_scripts()  copies the files
into the guest image IIUC.

If you rename them in git, to be .py2  then presumably avocado
wont try to load them, and then in copy_test_scripts you can
give them the normal .py extension for the guest.  The .py2
extension will also make it more obvious to maintainers that
these are different from the rest of the python coded we have.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX

2022-10-10 Thread Peter Maydell
On Sat, 8 Oct 2022 at 16:38, Richard Henderson
 wrote:
>
> Intel has now given guarantees about the atomicity of SSE read
> and write instructions on cpus supporting AVX.  We can use these
> instead of the much slower cmpxchg16b.
>
> Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
>
> Signed-off-by: Richard Henderson 

> +/*
> + * The xgetbv instruction is not available to older versions of
> + * the assembler, so we encode the instruction manually.
> + */
> +asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0));

This would make the third place in the tree where we hand-code
this asm instruction (we already do it in the xgetbv() function
in target/i386/hvf/x86_cpuid.c and opencoded in tcg_target_init()):
can we abstract this out somehow, please?

(There is also a just-written-out "xgetbv" in init_cpuid_cache(),
but that one's guarded by ifdefs so presumably OK.)
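
For illustration, a possible shared helper, sketched -- the name is invented:

/* Hypothetical helper (name invented); same hand-coded xgetbv encoding. */
static inline uint64_t x86_xgetbv(uint32_t xcr)
{
    uint32_t lo, hi;

    /* xgetbv, encoded by hand for assemblers lacking the mnemonic */
    asm volatile(".byte 0x0f, 0x01, 0xd0" : "=a"(lo), "=d"(hi) : "c"(xcr));
    return lo | ((uint64_t)hi << 32);
}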

thanks
-- PMM



Re: [PATCH v6 00/13] blkio: add libblkio BlockDriver

2022-10-10 Thread Alberto Faria
On Thu, Oct 6, 2022 at 10:35 PM Stefan Hajnoczi  wrote:
> v6:
> - Add untested nvme-io_uring driver. Please test in your nested NVMe 
> environment, Alberto. [Alberto]

I did some I/O verification using fio [1] and it seems to be working fine.

Alberto

[1] https://fio.readthedocs.io/en/latest/fio_doc.html#verification




Re: [PATCH] error handling: Use TFR() macro where applicable

2022-10-10 Thread Peter Maydell
On Mon, 10 Oct 2022 at 09:34, Nikita Ivanov  wrote:
>
> Hi! Thanks for your notes. I'll try to send updated patches by the end of the 
> day.
>
> On Fri, Oct 7, 2022 at 6:32 PM Peter Maydell  wrote:
>> I'm not sure why you've put the TEMP_FAILURE_RETRY on the outside here
>> rather than just on the individual function calls.
>
>
> The original code contained both branches in one while loop, so I was afraid 
> that
> value `aiocb->aio_type & QEMU_AIO_WRITE` would change somehow during the loop.
> If you'll say that this is impossible, I'll adjust the code as you propose.

Oh, yes, I see. No, nothing should be changing that during the loop,
I think it's just written the way it is now because it seemed neater
when you write out the retry longhand.

-- PMM



Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFPD table

2022-10-10 Thread Michael Walle

Am 2022-10-10 08:23, schrieb Cédric Le Goater:

On 10/7/22 16:44, Francisco Iglesias wrote:



--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = {
  { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
  { INFO6("mx25l25635e", 0xc22019, 0xc22019,  64 << 10, 512, 
0),

.sfdp_read = m25p80_sfdp_mx25l25635e },
+{ INFO6("mx25l25635f", 0xc22019, 0xc22019,  64 << 10, 512, 
0),


I think I'm not seeing the extended id part in the datasheet I've 
found so

might be that you can switch to just INFO and _ext_id 0 above


This was added by commit 6bbe036f32dc ("m25p80: Return the JEDEC ID 
twice for

mx25l25635e") to fix a real breakage on HW.


From my experience, the ID has a particular length, at least three bytes,
and if you read past that length for some (all?) devices the ID bytes just
get repeated, i.e. the counter in the device will just wrap to offset 0
again. If you want to emulate the hardware correctly, you would have to
take that into consideration.
But I don't think it's worth it; OTOH there seems to be some broken
software which relies on that (undefined?) behavior.
software which rely on that (undefined?) behavior.

-michael



Re: [PATCH v7 0/5] QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0

2022-10-10 Thread Jonathan Cameron via
On Fri, 7 Oct 2022 16:21:51 +0100
Jonathan Cameron  wrote:

> Whilst I have carried on Huai-Cheng Kuo's series version numbering and
> naming, there have been very substantial changes since v6 so I would
> suggest fresh review makes sense for anyone who has looked at this before.
> In particularly if the Avery design folks could check I haven't broken
> anything that would be great.

I forgot to run checkpatch on these and there is some white space that
will need cleaning up and one instance of missing brackets.
As that doesn't greatly affect review, I'll wait for a few days to see
if there is other feedback to incorporate in v8.

Sorry for the resulting noise!

These are now available at
https://gitlab.com/jic23/qemu/-/commits/cxl-2022-10-09
along with a bunch of other CXL features:
* Compliance DOE protocol
* SPDM / CMA over DOE supprot
* ARM64 support in general.
* Various small emulation additions.
* CPMU support

I'll add a few more features to similarly named branches over the next
week or so including initial support for standalone switch CCI mailboxes.

Jonathan
 
> 
> For reference v6: QEMU PCIe DOE for PCIe 4.0/5.0 and CXL 2.0
> https://lore.kernel.org/qemu-devel/1623330943-18290-1-git-send-email-cbr...@avery-design.com/
> 
> Summary of changes:
> 1) Linux headers definitions for DOE are now upstream so drop that patch.
> 2) Add CDAT for switch upstream port.
> 3) Generate 'plausible' default CDAT tables when a file is not provided.
> 4) General refactoring to calculate the correct table sizes and allocate
>based on that rather than copying from a local static array.
> 5) Changes from earlier reviews such as matching QEMU type naming style.
> 6) Moved compliance and SPDM usecases to future patch sets.
> 
> Sign-offs on these are complex because the patches were originally developed
> by Huai-Cheng Kuo, but posted by Chris Browy and then picked up by Jonathan
> Cameron who made substantial changes.
> 
> Huai-Cheng Kuo / Chris Browy, please confirm you are still happy to maintain 
> this
> code as per the original MAINTAINERS entry.
> 
> What's here?
> 
> This series brings generic PCI Express Data Object Exchange support (DOE)
> DOE is defined in the PCIe Base Spec r6.0. It consists of a mailbox in PCI
> config space via a PCIe Extended Capability Structure.
> The PCIe spec defines several protocols (including one to discover what
> protocols a given DOE instance supports) and other specification such as
> CXL define additional protocols using their own vendor IDs.
> 
> In this series we make use of the DOE to support the CXL spec defined
> Table Access Protocol, specifically to provide access to CDAT - a
> table specified in a specification that is hosted by the UEFI forum
> and is used to provide runtime discoverability of the sort of information
> that would otherwise be available in firmware tables (memory types,
> latency and bandwidth information etc).
> 
> The Linux kernel gained support for DOE / CDAT on CXL type 3 EPs in 6.0.
> The version merged did not support interrupts (earlier versions did
> so that support in the emulation was tested a while back).
> 
> This series provides CDAT emulation for CXL switch upstream ports
> and CXL type 3 memory devices. Note that to exercise the switch support
> additional Linux kernel patches are needed.
> https://lore.kernel.org/linux-cxl/20220503153449.4088-1-jonathan.came...@huawei.com/
> (I'll post a new version of that support shortly)
> 
> Additional protocols will be supported by follow on patch sets:
> * CXL compliance protocol.
> * CMA / SPDM device attestation.
> (Old version at https://gitlab.com/jic23/qemu/-/commits/cxl-next - will 
> refresh
> that tree next week)
> 
> Huai-Cheng Kuo (3):
>   hw/pci: PCIe Data Object Exchange emulation
>   hw/cxl/cdat: CXL CDAT Data Object Exchange implementation
>   hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange
> 
> Jonathan Cameron (2):
>   hw/mem/cxl-type3: Add MSIX support
>   hw/pci-bridge/cxl-upstream: Add a CDAT table access DOE
> 
>  MAINTAINERS|   7 +
>  hw/cxl/cxl-cdat.c  | 222 
>  hw/cxl/meson.build |   1 +
>  hw/mem/cxl_type3.c | 236 +
>  hw/pci-bridge/cxl_upstream.c   | 182 +++-
>  hw/pci/meson.build |   1 +
>  hw/pci/pcie_doe.c  | 367 +
>  include/hw/cxl/cxl_cdat.h  | 166 +++
>  include/hw/cxl/cxl_component.h |   7 +
>  include/hw/cxl/cxl_device.h|   3 +
>  include/hw/cxl/cxl_pci.h   |   1 +
>  include/hw/pci/pci_ids.h   |   3 +
>  include/hw/pci/pcie.h  |   1 +
>  include/hw/pci/pcie_doe.h  | 123 +++
>  include/hw/pci/pcie_regs.h |   4 +
>  15 files changed, 1323 insertions(+), 1 deletion(-)
>  create mode 100644 hw/cxl/cxl-cdat.c
>  create mode 100644 hw/pci/pcie_doe.c
>  create mode 100644 include/hw/cxl/cxl_cdat.h
>  create mode 100644 include/hw/pci/pcie_doe.h
> 


Re: [PATCH] migration: Fix the minus value for compressed_size

2022-10-10 Thread Dr. David Alan Gilbert
* Zhenzhong Duan (zhenzhong.d...@intel.com) wrote:
> When update_compress_thread_counts() is called for the first time, there is
> no data stream yet. We see compression_counters.compressed_size
> become a negative value shortly.
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>  migration/ram.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index dc1de9ddbc68..510db95cdc36 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>  static void
>  update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
>  {
> +    if (bytes_xmit <= 0) {
> +        return;
> +    }

What's the call path where that happens? The only place I see
bytes_xmit being less than 0 is in compress_page_with_multi_thread where
it's initialised to -1 - but it's always updated before the call
to update_compress_thread_counts.

I wonder if the real problem is:

compression_counters.compressed_size += bytes_xmit - 8;

Is bytes_xmit being less than 8 for some reason?

Dave

>  ram_transferred_add(bytes_xmit);
>  
>  if (param->zero_page) {
> -- 
> 2.25.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-10-10 Thread Dr. David Alan Gilbert
* David Hildenbrand (da...@redhat.com) wrote:
> This is a follow-up on "util: NUMA aware memory preallocation" [1] by
> Michal.
> 
> Setting the CPU affinity of threads from inside QEMU usually isn't
> easily possible, because we don't want QEMU -- once started and running
> guest code -- to be able to mess up the system. QEMU disallows relevant
> syscalls using seccomp, such that any such invocation will fail.
> 
> Especially for memory preallocation in memory backends, the CPU affinity
> can significantly increase guest startup time, for example, when running
> large VMs backed by huge/gigantic pages, because of NUMA effects. For
> NUMA-aware preallocation, we have to set the CPU affinity, however:
> 
> (1) Once preallocation threads are created during preallocation, management
> tools cannot intercept anymore to change the affinity. These threads
> are created automatically on demand.
> (2) QEMU cannot easily set the CPU affinity itself.
> (3) The CPU affinity derived from the NUMA bindings of the memory backend
> might not necessarily be exactly the CPUs we actually want to use
> (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).
> 
> There is an easy "workaround". If we have a thread with the right CPU
> affinity, we can simply create new threads on demand via that prepared
> context. So, all we have to do is setup and create such a context ahead
> of time, to then configure preallocation to create new threads via that
> environment.
> 
> So, let's introduce a user-creatable "thread-context" object that
> essentially consists of a context thread used to create new threads.
> QEMU can either try setting the CPU affinity itself ("cpu-affinity",
> "node-affinity" property), or upper layers can extract the thread id
> ("thread-id" property) to configure it externally.
> 
> Make memory-backends consume a thread-context object
> (via the "prealloc-context" property) and use it when preallocating to
> create new threads with the desired CPU affinity. Further, to make it
> easier to use, allow creation of "thread-context" objects, including
> setting the CPU affinity directly from QEMU, before enabling the
> sandbox option.
> 
> 
> Quick test on a system with 2 NUMA nodes:
> 
> Without CPU affinity:
> time qemu-system-x86_64 \
> -object 
> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
>  \
> -nographic -monitor stdio
> 
> real0m5.383s
> real0m3.499s
> real0m5.129s
> real0m4.232s
> real0m5.220s
> real0m4.288s
> real0m3.582s
> real0m4.305s
> real0m5.421s
> real0m4.502s
> 
> -> It heavily depends on the scheduler CPU selection
> 
> With CPU affinity:
> time qemu-system-x86_64 \
> -object thread-context,id=tc1,node-affinity=0 \
> -object 
> memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
>  \
> -sandbox enable=on,resourcecontrol=deny \
> -nographic -monitor stdio
> 
> real0m1.959s
> real0m1.942s
> real0m1.943s
> real0m1.941s
> real0m1.948s
> real0m1.964s
> real0m1.949s
> real0m1.948s
> real0m1.941s
> real0m1.937s
> 
> On reasonably large VMs, the speedup can be quite significant.
> 
> While this concept is currently only used for short-lived preallocation
> threads, nothing major speaks against reusing the concept for other
> threads that are harder to identify/configure -- except that
> we need additional (idle) context threads that are otherwise left unused.
> 
> This series does not yet tackle concurrent preallocation of memory
> backends. Memory backend objects are created and memory is preallocated one
> memory backend at a time -- and there is currently no way to do
> preallocation asynchronously.

Since you seem to have a full set of r-b's - do you intend to merge this
as-is or do the concurrent preallocation first?

Dave

> [1] 
> https://lkml.kernel.org/r/ffdcd118d59b379ede2b64745144165a40f6a813.1652165704.git.mpriv...@redhat.com
> 
> v1 -> v2:
> * Fixed some minor style nits
> * "util: Introduce ThreadContext user-creatable object"
>  -> Impove documentation and patch description. [Markus]
> * "util: Add write-only "node-affinity" property for ThreadContext"
>  -> Impove documentation and patch description. [Markus]
> 
> RFC -> v1:
> * "vl: Allow ThreadContext objects to be created before the sandbox option"
>  -> Move parsing of the "name" property before object_create_pre_sandbox
> * Added RB's
> 
> Cc: Michal Privoznik 
> Cc: Igor Mammedov 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: "Daniel P. Berrangé" 
> Cc: Eduardo Habkost 
> Cc: Dr. David Alan Gilbert 
> Cc: Eric Blake 
> Cc: Markus Armbruster 
> Cc: Richard Henderson 
> Cc: Stefan Weil 
> 
> David Hildenbrand (7):
>   

Re: [PATCH v2 05/11] blockjob: implement .change_aio_ctx in child_job

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> child_job_change_aio_ctx() is very similar to
> child_job_can_set_aio_ctx(), but it implements a new transaction
> so that if all checks pass, the new transaction's .commit()
> will take care of changing the BlockJob AioContext.
> child_job_set_aio_ctx_commit() is similar to child_job_set_aio_ctx(),
> but it doesn't need to invoke the recursion, as this is already
> taken care by child_job_change_aio_ctx().
> 
> Note: bdrv_child_try_change_aio_context() is not called by
> anyone at this point.
> 
> Reviewed-by: Hanna Reitz 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 07/11] block-backend: implement .change_aio_ctx in child_root

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> blk_root_change_aio_ctx() is very similar to blk_root_can_set_aio_ctx(),
> but implements a new transaction so that if all checks pass, the new
> transaction's .commit will take care of changing the BlockBackend
> AioContext. blk_root_set_aio_ctx_commit() is the same as
> blk_root_set_aio_ctx().
> 
> Note: bdrv_child_try_change_aio_context() is not called by
> anyone at this point.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 06/11] block: implement .change_aio_ctx in child_of_bds

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> bdrv_child_cb_change_aio_ctx() is identical to
> bdrv_child_cb_can_set_aio_ctx(), as we only need
> to recursively go on the parent bs.
> 
> Note: bdrv_child_try_change_aio_context() is not called by
> anyone at this point.
> 
> Reviewed-by: Hanna Reitz 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFPD table

2022-10-10 Thread Francisco Iglesias
Hi  Cedric,

On [2022 Oct 10] Mon 11:58:40, Michael Walle wrote:
> Am 2022-10-10 08:23, schrieb Cédric Le Goater:
> > On 10/7/22 16:44, Francisco Iglesias wrote:
> 
> > > > --- a/hw/block/m25p80.c
> > > > +++ b/hw/block/m25p80.c
> > > > @@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = {
> > > >   { INFO("mx25l12855e", 0xc22618,  0,  64 << 10, 256, 0) },
> > > >   { INFO6("mx25l25635e", 0xc22019, 0xc22019,  64 << 10,
> > > > 512, 0),
> > > > .sfdp_read = m25p80_sfdp_mx25l25635e },
> > > > +{ INFO6("mx25l25635f", 0xc22019, 0xc22019,  64 << 10,
> > > > 512, 0),

I think I missed the (ER_4K | ER_32K) flags above (in case we go for a v4 we 
can add it in). 

> > > 
> > > I think I'm not seeing the extended id part in the datasheet I've
> > > found so
> > > might be that you can switch to just INFO and _ext_id 0 above
> > 
> > This was added by commit 6bbe036f32dc ("m25p80: Return the JEDEC ID
> > twice for
> > mx25l25635e") to fix a real breakage on HW.
> 
> From my experience, the ID has a particular length, at least three bytes
> and if you read past that length for some (all?) devices the id bytes just
> get repeated. I.e. the counter in the device will just wrap to offset 0
> again. If you want to emulate the hardware correctly, you would have to
> take that into consideration.

If we decide to go with Michael's proposal above you can use '0' on the
'extended_id' and enable 's->data_read_loop = true' when reading the ID.

Best regards,
Francisco

> But I don't think it's worth it, OTOH there seems to be some broken
> software which rely on that (undefined?) behavior.
> 
> -michael




Re: [PATCH 1/6] device-tree: add re-randomization helper function

2022-10-10 Thread Peter Maydell
On Thu, 6 Oct 2022 at 14:16, Peter Maydell  wrote:
>
> On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld  wrote:
> >
> > When the system reboots, the rng-seed that the FDT has should be
> > re-randomized, so that the new boot gets a new seed. Several
> > architectures require this functionality, so export a function for
> > injecting a new seed into the given FDT.
> >
> > Cc: Alistair Francis 
> > Cc: David Gibson 
> > Signed-off-by: Jason A. Donenfeld 
>
> Hi; I've applied this series to target-arm.next (seems the easiest way
> to take it into the tree).

Unfortunately it turns out that this breaks the reverse-debugging
test that is part of 'make check-avocado'.

Running all of 'check-avocado' takes a long time, so here's how
to run the specific test:

  make -C your-build-tree check-venv   # Only for the first time
  your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/boot_linux.py

Probably more convenient though is to run the equivalent commands
by hand:

wget -O /tmp/vmlinuz \
  https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz
./build/x86/qemu-img create -f qcow2 /tmp/disk.qcow2 128M
./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio \
  -cpu cortex-a53 -icount shift=7,rr=record,rrfile=/tmp/qemu.rr,rrsnapshot=init \
  -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz
# this will boot the kernel to the no-root-fs panic; hit ctrl-C when it gets there
./build/x86/qemu-system-aarch64 -display none -machine virt -serial stdio \
  -cpu cortex-a53 -icount shift=7,rr=replay,rrfile=/tmp/qemu.rr,rrsnapshot=init \
  -net none -drive file=/tmp/disk.qcow2 -kernel /tmp/vmlinuz
# same command line, but 'replay' rather than 'record'; QEMU will exit with an error:
qemu-system-aarch64: Missing random event in the replay log

Without these patches the replay step will replay the recorded execution
up to the guest panic.

The error is essentially the record-and-replay subsystem saying "the
replay just asked for a random number at point when the recording
did not ask for one, and so there's no 'this is what the number was'
info in the record".

I have had a quick look, and I think the reason for this is that
load_snapshot() ("reset the VM state to the snapshot state stored in the
disk image or migration stream") does a system reset. The replay
process involves a lot of "load state from a snapshot and play
forwards from there" operations. It doesn't expect that load_snapshot()
would result in something reading random data, but now that we are
calling qemu_guest_getrandom() in a reset hook, that happens.
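
For illustration, the problematic pattern sketched (not the actual patch):
a reset hook along these lines consumes a random event on every reset --
including the resets that load_snapshot() performs during replay:

/* Sketch only: re-seed the FDT rng-seed property on machine reset. */
static void fdt_rerandomize_on_reset(void *opaque)
{
    void *fdt = opaque;
    uint8_t seed[32];

    qemu_guest_getrandom_nofail(seed, sizeof(seed));  /* recorded/replayed */
    qemu_fdt_setprop(fdt, "/chosen", "rng-seed", seed, sizeof(seed));
}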

I'm not sure exactly what the best approach here is, so I've cc'd
the migration and replay submaintainers. For the moment I'm dropping
this patchset from target-arm.next.

thanks
-- PMM



Re: [PATCH 1/6] device-tree: add re-randomization helper function

2022-10-10 Thread Peter Maydell
On Mon, 10 Oct 2022 at 11:54, Peter Maydell  wrote:
>
> On Thu, 6 Oct 2022 at 14:16, Peter Maydell  wrote:
> >
> > On Fri, 30 Sept 2022 at 00:23, Jason A. Donenfeld  wrote:
> > >
> > > When the system reboots, the rng-seed that the FDT has should be
> > > re-randomized, so that the new boot gets a new seed. Several
> > > architectures require this functionality, so export a function for
> > > injecting a new seed into the given FDT.
> > >
> > > Cc: Alistair Francis 
> > > Cc: David Gibson 
> > > Signed-off-by: Jason A. Donenfeld 
> >
> > Hi; I've applied this series to target-arm.next (seems the easiest way
> > to take it into the tree).
>
> Unfortunately it turns out that this breaks the reverse-debugging
> test that is part of 'make check-avocado'.
>
> Running all of 'check-avocado' takes a long time, so here's how
> to run the specific test:
>
>   make -C your-build-tree check-venv   # Only for the first time
>   your-build-tree/tests/venv/bin/avocado run
> your-build-tree/tests/avocado/boot_linux.py

derp, wrong test name, should be

 your-build-tree/tests/venv/bin/avocado run your-build-tree/tests/avocado/reverse_debugging.py

-- PMM



RE: [PATCH] migration: Fix the minus value for compressed_size

2022-10-10 Thread Duan, Zhenzhong



>-Original Message-
>From: Dr. David Alan Gilbert 
>Sent: Monday, October 10, 2022 6:35 PM
>To: Duan, Zhenzhong 
>Cc: qemu-devel@nongnu.org; quint...@redhat.com
>Subject: Re: [PATCH] migration: Fix the minus value for compressed_size
>
>* Zhenzhong Duan (zhenzhong.d...@intel.com) wrote:
>> When update_compress_thread_counts() is called for the first time, there is
>> no data stream yet. We see compression_counters.compressed_size
>> become a negative value shortly.
>>
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  migration/ram.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index dc1de9ddbc68..510db95cdc36 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1443,6 +1443,10 @@ static bool do_compress_ram_page(QEMUFile *f, z_stream *stream, RAMBlock *block,
>>  static void
>>  update_compress_thread_counts(const CompressParam *param, int bytes_xmit)
>>  {
>> +    if (bytes_xmit <= 0) {
>> +        return;
>> +    }
>
>What's the call path where that happens? The only place I see bytes_xmit
>being less than 0 is in compress_page_with_multi_thread where it's initialised
>to -1 - but it's always updated before the call to
>update_compress_thread_counts.
>
>I wonder if the real problem is:
>
>compression_counters.compressed_size += bytes_xmit - 8;
>
>Is bytes_xmit being less than 8 for some reason?

bytes_xmit is 0 when update_compress_thread_counts() is called for the first
time, as comp_param[idx].file is empty, which makes
compression_counters.compressed_size = -8.

(gdb) bt
#0  update_compress_thread_counts (param=0x7ffe340011d0, bytes_xmit=0) at 
../migration/ram.c:1446
#1  0x55cbf480 in flush_compressed_data (rs=0x7ffe34259d40) at 
../migration/ram.c:1486
#2  0x55cc0a53 in save_compress_page (rs=0x7ffe34259d40, 
block=0x56aab280, offset=0) at ../migration/ram.c:2260
#3  0x55cc0b0e in ram_save_target_page (rs=0x7ffe34259d40, 
pss=0x7ffece7fb800) at ../migration/ram.c:2290
#4  0x55cc0fd8 in ram_save_host_page (rs=0x7ffe34259d40, 
pss=0x7ffece7fb800) at ../migration/ram.c:2487
#5  0x55cc11ce in ram_find_and_save_block (rs=0x7ffe34259d40) at 
../migration/ram.c:2576
#6  0x55cc2863 in ram_save_iterate (f=0x56a98b30, 
opaque=0x567c92e0 ) at ../migration/ram.c:3304
#7  0x55ae86bd in qemu_savevm_state_iterate (f=0x56a98b30, 
postcopy=false) at ../migration/savevm.c:1261
#8  0x55ad7500 in migration_iteration_run (s=0x56ad5390) at 
../migration/migration.c:3757
#9  0x55ad7abb in migration_thread (opaque=0x56ad5390) at 
../migration/migration.c:3988
#10 0x55f20012 in qemu_thread_start (args=0x56862180) at 
../util/qemu-thread-posix.c:504
#11 0x778f5609 in start_thread (arg=) at 
pthread_create.c:477
#12 0x7781a163 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) n
(gdb) n
(gdb) p compression_counters
$6 = {pages = 0, busy = 0, busy_rate = 0, compressed_size = -8, 
compression_rate = 0}

Thanks
Zhenzhong



Re: [PATCH v2 0/7] hostmem: NUMA-aware memory preallocation using ThreadContext

2022-10-10 Thread David Hildenbrand

On 10.10.22 12:40, Dr. David Alan Gilbert wrote:

* David Hildenbrand (da...@redhat.com) wrote:

This is a follow-up on "util: NUMA aware memory preallocation" [1] by
Michal.

Setting the CPU affinity of threads from inside QEMU usually isn't
easily possible, because we don't want QEMU -- once started and running
guest code -- to be able to mess up the system. QEMU disallows relevant
syscalls using seccomp, such that any such invocation will fail.

Especially for memory preallocation in memory backends, the CPU affinity
can significantly increase guest startup time, for example, when running
large VMs backed by huge/gigantic pages, because of NUMA effects. For
NUMA-aware preallocation, we have to set the CPU affinity, however:

(1) Once preallocation threads are created during preallocation, management
 tools cannot intercept anymore to change the affinity. These threads
 are created automatically on demand.
(2) QEMU cannot easily set the CPU affinity itself.
(3) The CPU affinity derived from the NUMA bindings of the memory backend
 might not necessarily be exactly the CPUs we actually want to use
 (e.g., CPU-less NUMA nodes, CPUs that are pinned/used for other VMs).

There is an easy "workaround". If we have a thread with the right CPU
affinity, we can simply create new threads on demand via that prepared
context. So, all we have to do is setup and create such a context ahead
of time, to then configure preallocation to create new threads via that
environment.
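
For illustration, a minimal standalone demo (not QEMU code) of the property
this relies on -- a thread created from a pinned thread starts out with the
pinned affinity mask:

/* Demo: children inherit the creator's CPU affinity mask (Linux). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *child(void *arg)
{
    cpu_set_t set;

    pthread_getaffinity_np(pthread_self(), sizeof(set), &set);
    printf("child sees %d allowed CPU(s)\n", CPU_COUNT(&set));
    return NULL;
}

int main(void)
{
    cpu_set_t set;
    pthread_t t;

    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* pin ourselves to CPU 0 ... */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    pthread_create(&t, NULL, child, NULL);   /* ... the child inherits it */
    pthread_join(t, NULL);
    return 0;
}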

So, let's introduce a user-creatable "thread-context" object that
essentially consists of a context thread used to create new threads.
QEMU can either try setting the CPU affinity itself ("cpu-affinity",
"node-affinity" property), or upper layers can extract the thread id
("thread-id" property) to configure it externally.

Make memory-backends consume a thread-context object
(via the "prealloc-context" property) and use it when preallocating to
create new threads with the desired CPU affinity. Further, to make it
easier to use, allow creation of "thread-context" objects, including
setting the CPU affinity directly from QEMU, before enabling the
sandbox option.


Quick test on a system with 2 NUMA nodes:

Without CPU affinity:
 time qemu-system-x86_64 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind
 \
 -nographic -monitor stdio

 real0m5.383s
 real0m3.499s
 real0m5.129s
 real0m4.232s
 real0m5.220s
 real0m4.288s
 real0m3.582s
 real0m4.305s
 real0m5.421s
 real0m4.502s

 -> It heavily depends on the scheduler CPU selection

With CPU affinity:
 time qemu-system-x86_64 \
 -object thread-context,id=tc1,node-affinity=0 \
 -object 
memory-backend-memfd,id=md1,hugetlb=on,hugetlbsize=2M,size=64G,prealloc-threads=12,prealloc=on,host-nodes=0,policy=bind,prealloc-context=tc1
 \
 -sandbox enable=on,resourcecontrol=deny \
 -nographic -monitor stdio

 real0m1.959s
 real0m1.942s
 real0m1.943s
 real0m1.941s
 real0m1.948s
 real0m1.964s
 real0m1.949s
 real0m1.948s
 real0m1.941s
 real0m1.937s

On reasonably large VMs, the speedup can be quite significant.

While this concept is currently only used for short-lived preallocation
threads, nothing major speaks against reusing the concept for other
threads that are harder to identify/configure -- except that
we need additional (idle) context threads that are otherwise left unused.

This series does not yet tackle concurrent preallocation of memory
backends. Memory backend objects are created and memory is preallocated one
memory backend at a time -- and there is currently no way to do
preallocation asynchronously.


Hi Dave,



Since you seem to have a full set of r-b's - do you intend to merge this
as-is or do the concurrent preallocation first?


I intend to merge this as is, as it provides a benefit as it stands, and
concurrent preallocation might not require user interface changes.


I do have some ideas on how to implement concurrent preallocation, but 
it needs more thought (and more importantly, time).


--
Thanks,

David / dhildenb




Re: [PATCH v6 08/13] numa: use QLIST_FOREACH_SAFE() for RAM block notifiers

2022-10-10 Thread David Hildenbrand

On 06.10.22 23:35, Stefan Hajnoczi wrote:

Make list traversal work when a callback removes a notifier
mid-traversal. This is a cleanup to prevent bugs in the future.

Signed-off-by: Stefan Hajnoczi 
---
  hw/core/numa.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 31e6fe1caa..ea24a5fa8c 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -857,8 +857,9 @@ void ram_block_notifier_remove(RAMBlockNotifier *n)
  void ram_block_notify_add(void *host, size_t size, size_t max_size)
  {
     RAMBlockNotifier *notifier;
+    RAMBlockNotifier *next;
 
-    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+    QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) {
         if (notifier->ram_block_added) {
             notifier->ram_block_added(notifier, host, size, max_size);
         }
@@ -868,8 +869,9 @@ void ram_block_notify_add(void *host, size_t size, size_t max_size)
 void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 {
     RAMBlockNotifier *notifier;
+    RAMBlockNotifier *next;
 
-    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+    QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) {
         if (notifier->ram_block_removed) {
             notifier->ram_block_removed(notifier, host, size, max_size);
         }
@@ -879,8 +881,9 @@ void ram_block_notify_remove(void *host, size_t size, size_t max_size)
 void ram_block_notify_resize(void *host, size_t old_size, size_t new_size)
 {
     RAMBlockNotifier *notifier;
+    RAMBlockNotifier *next;
 
-    QLIST_FOREACH(notifier, &ram_list.ramblock_notifiers, next) {
+    QLIST_FOREACH_SAFE(notifier, &ram_list.ramblock_notifiers, next, next) {
         if (notifier->ram_block_resized) {
             notifier->ram_block_resized(notifier, host, old_size, new_size);
         }


Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PATCH] target/arm: Make the final stage1+2 write to secure be unconditional

2022-10-10 Thread Peter Maydell
On Fri, 7 Oct 2022 at 17:53, Richard Henderson
 wrote:
>
> On 10/7/22 09:20, Peter Maydell wrote:
> >> -/* Check if IPA translates to secure or non-secure PA space. 
> >> */
> >> -if (is_secure) {
> >> -if (ipa_secure) {
> >> -result->attrs.secure =
> >> -!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
> >> -} else {
> >> -result->attrs.secure =
> >> -!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))
> >> -|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
> >> -}
> >> -}
> >
> > If:
> >   is_secure == true
> >   ipa_secure == false
> >   (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW) is non-zero
> >   (env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW) is zero
> > then the old code sets attrs.secure to true...
>
> No, I think the misalignment of the two lines wrt the !() may have been 
> confusing:

Ah, yes -- I was indeed confused by the misalignment.

thanks
-- PMM



Re: [PATCH v4] x86: add etc/phys-bits fw_cfg file

2022-10-10 Thread Gerd Hoffmann
  Hi,

> > > > Given newer processors have more than 40 and for older ones we know
> > > > the possible values for the two relevant x86 vendors we could do
> > > > something along the lines of:
> > > >
> > > >phys-bits >= 41   -> valid
> > > >phys-bits == 40+ AuthenticAMD -> valid
> > > >phys-bits == 36,39 + GenuineIntel -> valid
> > > >everything else   -> invalid

> I dropped the patch for now.

You can drop it forever.

For the mail archives and anyone interested:  The approach outlined
above appears to work well, patches just landed in edk2 master branch.
Next edk2 stable tag (2022-11) will have it.
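
For illustration, the heuristic above sketched in C (names invented; not the
actual edk2 code):

/* Sketch: accept a CPUID-reported phys-bits value only when plausible. */
static bool phys_bits_plausible(unsigned bits, bool is_amd, bool is_intel)
{
    if (bits >= 41) {
        return true;
    }
    if (is_amd && bits == 40) {
        return true;
    }
    if (is_intel && (bits == 36 || bits == 39)) {
        return true;
    }
    return false;
}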

take care,
  Gerd




Re: [PULL 28/54] configure: build ROMs with container-based cross compilers

2022-10-10 Thread Daniel P . Berrangé
On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote:
> From: Paolo Bonzini 
> 
> s390-ccw remains a bit more complex, because the -march=z900 test is done
> only for the native cross compiler.  Otherwise, all that is needed is
> to pass the (now mandatory) target argument to write_target_makefile.
> 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Alex Bennée 
> Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org>

I'm not at all convinced this change was/is a good idea.

First of all, it causes 'make' to now download about 1 GB of
container images

  $ ./configure --target-list=x86_64-softmmu
  $ make
  ...snip...
  BUILD   debian-powerpc-test-cross
  Trying to pull 
registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest...
  Getting image source signatures
  Copying blob 2a205c8a1d36 [=>] 12.4MiB / 
257.2MiB

  ...
  ...snip...
  
Despite downloading this image, it then proceeded to rebuild the
image from scratch, requiring another few 100MBs of downloads
of dpkgs. This time the download was without progress information
until it entirely failed due to a dead Debian mirror server, needing
a retry.

It then went on to download an s390x image which seems to have
two layers, each with 360 MB.

  BUILD   debian-s390x-cross
Trying to pull 
registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest...
Getting image source signatures
Copying blob fc8d65e34cd5 [>-] 12.0MiB / 
360.2MiB
Copying blob bd159e379b3b skipped: already exists  
Copying blob 13224e2971af [>-] 12.2MiB / 
366.5MiB

So overall it was more than 1 GB of downloads when typing 'make'

I wasn't too amused by seeing this downloaded data, given that
I'm usually running off a 4G mobile connection, and it took a
very long time.


The progress information printed by docker when downloading
the images splatters all over the output meson displays, when
doing a parallel make making everything unintelligible.


Finally, I had requested only building x86_64, so we shouldn't
be doing anything related to ppc or s390 at all, but even if

AFAICT, it enables this downloading unconditionally merely by
having 'docker'/'podman' binaries installed, if you don't
otherwise have cross compilers present.

I'd really not want to see any of this stuff downloaded without
an explicit opt-in choice at configure time.

I'm also a little concerned at what happens if we have to stop
publishing the containers at registry.gitlab.com in future. Are
we going to break the default 'make' for existing released QEMU
tarballs?

Generally we've only relied on the gitlab infra for our CI
testing, so we have been free to change infra or alter the
way we publish images at any time, without risk of impact on
the released tarballs.

This isn't a theoretical problem, because GitLab has announced
their intention to limit storage usage in gitlab.com, and even
having joined the Open Source Program, our quota is only increased
from 5 GB to 25 GB.  I'd be concerned we're at risk of exceeding
that 25 GB limit, when they start to enforce it, requiring us to
move container image host to somewhere else such as quay.io


> diff --git a/configure b/configure
> index c175650eb9..a54e17aca9 100755
> --- a/configure
> +++ b/configure
> @@ -2152,7 +2152,7 @@ probe_target_compiler() {
>  target_ranlib=
>  target_strip=
>fi
> -  test -n "$target_cc"
> +  test -n "$target_cc" || test -n "$container_image"
>  }
>  
>  write_target_makefile() {
> @@ -2307,7 +2307,7 @@ if test "$targetos" != "darwin" && test "$targetos" != "sunos" && \
>  config_mak=pc-bios/optionrom/config.mak
>  echo "# Automatically generated by configure - do not modify" > 
> $config_mak
>  echo "TOPSRC_DIR=$source_path" >> $config_mak
> -write_target_makefile >> $config_mak
> +write_target_makefile pc-bios/optionrom/all >> $config_mak
>  fi
>  
>  if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then
> @@ -2315,25 +2315,31 @@ if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then
>  config_mak=pc-bios/vof/config.mak
>  echo "# Automatically generated by configure - do not modify" > 
> $config_mak
>  echo "SRC_DIR=$source_path/pc-bios/vof" >> $config_mak
> -write_target_makefile >> $config_mak
> +write_target_makefile pc-bios/vof/all >> $config_mak
>  fi
>  
>  # Only build s390-ccw bios if the compiler has -march=z900 or -march=z10
>  # (which is the lowest architecture level that Clang supports)
>  if test "$softmmu" = yes && probe_target_compiler s390x-softmmu; then
> -  write_c_skeleton
> -  do_compiler "$target_cc" $target_cc_cflags -march=z900 -o $TMPO -c $TMPC
> -  has_z900=$?
> -  if [ $has_z900 = 0 ] || do_compiler "$target_cc" $target_cc_cflags -march=z10 -msoft-float -Werror -o $TMPO -c $TMPC; then
> -if [ $has_z900 != 0 ]; then
> -  echo "WARNING: Your compiler does not support the

Re: [PATCH v2 08/11] block: use the new _change_ API instead of _can_set_ and _set_

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> Replace all direct usage of ->can_set_aio_ctx and ->set_aio_ctx,
> and call bdrv_child_try_change_aio_context() in
> bdrv_try_set_aio_context(), the main function called through
> the whole block layer.
> 
> From this point onwards, ->can_set_aio_ctx and ->set_aio_ctx
> won't be used anymore.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 09/11] block: remove all unused ->can_set_aio_ctx and ->set_aio_ctx callbacks

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> Together with all _can_set_ and _set_ APIs, as they are not needed
> anymore.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 10/11] block: rename bdrv_child_try_change_aio_context in bdrv_try_change_aio_context

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> No functional changes intended.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PATCH v2 11/11] block: remove bdrv_try_set_aio_context and replace it with bdrv_try_change_aio_context

2022-10-10 Thread Kevin Wolf
Am 25.07.2022 um 14:21 hat Emanuele Giuseppe Esposito geschrieben:
> No functional change intended.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Reviewed-by: Kevin Wolf 




Re: [PULL 28/54] configure: build ROMs with container-based cross compilers

2022-10-10 Thread Daniel P . Berrangé
On Mon, Oct 10, 2022 at 01:54:16PM +0100, Daniel P. Berrangé wrote:
> On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote:
> > From: Paolo Bonzini 
> > 
> > s390-ccw remains a bit more complex, because the -march=z900 test is done
> > only for the native cross compiler.  Otherwise, all that is needed is
> > to pass the (now mandatory) target argument to write_target_makefile.
> > 
> > Signed-off-by: Paolo Bonzini 
> > Signed-off-by: Alex Bennée 
> > Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org>
> 
> I'm not at all convinced this change was/is a good idea.
> 
> First of all, it causes 'make' to now download about 1 GB of
> container images
> 
>   $ ./configure --target-list=x86_64-softmmu
>   $ make
>   ...snip...
>   BUILD   debian-powerpc-test-cross
>   Trying to pull 
> registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest...
>   Getting image source signatures
>   Copying blob 2a205c8a1d36 [=>] 12.4MiB 
> / 257.2MiB
> 
>   ...
>   ...snip...
>   
> Despite downloading this image, it then proceeded to rebuild the
> image from scratch, requiring another few 100MBs of downloads
> of dpkgs. This time the download was without progress information
> until it entirely failed due to a dead Debia mirror server, needing
> a retry.

This failure seems worse than I realized.  It also failed on a
non-responsive mirror when re-building the s390 image. When I
re-ran 'make' it definitely did not retry the re-build process,
as 'BUILD debian-s390x-cross' was instantaneous. So it looks
like it is using the downloaded cached image, despite the previous
'make' apparently considering that to have been outdated content
needing a rebuild.

FWIW the  failure output was:

  BUILD   debian-s390x-cross
Trying to pull 
registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest...
Getting image source signatures
Copying blob fc8d65e34cd5 [===>--] 341.6MiB / 
360.2MiB
Copying blob fc8d65e34cd5 done  
Copying blob bd159e379b3b skipped: already exists  
Copying blob 13224e2971af done  
Copying config 67a127f7cd done  
Writing manifest to image destination
Storing signatures
debconf: delaying package configuration, since apt-utils is not installed
E: Failed to fetch 
http://deb.debian.org/debian/pool/main/s/systemd/systemd_247.3-7%2bdeb11u1_amd64.deb
  400  Bad Request [IP: 199.232.122.132 80]
E: Failed to fetch 
http://deb.debian.org/debian/pool/main/p/perl/perl_5.32.1-4%2bdeb11u2_amd64.deb 
 Error reading from server - read (104: Connection reset by peer) [IP: 
199.232.122.132 80]
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?
Error: error building at STEP "RUN export DEBIAN_FRONTEND=noninteractive && 
apt-get update && apt-get install -y eatmydata && eatmydata apt-get 
dist-upgrade -y && eatmydata apt-get install --no-install-recommends -y 
bash bc bison bsdextrautils 
bzip2 ca-certificates ccache dbus   
  debianutils diffutils exuberant-ctags 
findutils flex gcovr genisoimage
 gettext git hostname libglib2.0-dev
 libpcre2-dev libspice-protocol-dev llvm
 locales make meson ncat 
ninja-build openssh-client perl-base 
pkgconf python3 python3-numpy 
python3-opencv python3-pillow python3-pip 
python3-sphinx python3-sphinx-rtd-theme python3-venv
 python3-yaml rpm2cpio sed sparse   
  tar tesseract-ocr tesseract-ocr-eng   
  texinfo && eatmydata apt-get autoremove -y && eatmydata apt-get 
autoclean -y && sed -Ei 's,^# (en_US\.UTF-8 .*)$,\1,' /etc/locale.gen &&
 dpkg-reconfigure locales": error while running runtime: exit status 100
Traceback (most recent call last):
  File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 683, in 

sys.exit(main())
  File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 679, in main
return args.cmdobj.run(args, argv)
  File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 493, in run
dkr.build_image(tag, docker_dir, dockerfile,
  File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 343, in 
build_image
self._do_check(build_args,
  File "/home/berrange/src/virt/qemu/tests/docker/docker.py", line 247, in 
_do_check
return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command

Re: [PATCH v3 00/42] target/arm: Implement FEAT_HAFDBS

2022-10-10 Thread Peter Maydell
On Sat, 1 Oct 2022 at 17:24, Richard Henderson
 wrote:
>
> This is a major reorg to arm page table walking.  While the result
> here is "merely" Hardware-assisted Access Flag and Dirty Bit Setting
> (HAFDBS), the ultimate goal is the Realm Management Extension (RME).
> RME "recommends" that HAFDBS be implemented (I_CSLWZ).
>

I've taken patches 1-20 plus the extra patch 1.5 into
target-arm.next (last patch taken "Use tlb_set_page_full").

thanks
-- PMM



Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3

2022-10-10 Thread Peter Maydell
On Mon, 3 Oct 2022 at 15:57, Jerome Forissier
 wrote:
>
> According to the Linux kernel booting.rst [1], CPTR_EL3.ESM and
> SCR_EL3.EnTP2 must be initialized to 1 when EL3 is present and FEAT_SME
> is advertised. This has to be taken care of when QEMU boots directly
> into the kernel (i.e., "-M virt,secure=on -cpu max -kernel Image").
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 78cb9776662a ("target/arm: Enable SME for -cpu max")
> Link: [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.rst?h=v6.0#n321

Ooh, that detailed set of control bit requirements is new
since I last read that doc -- we should probably go through
and cross-check that we're setting them all correctly.

> Signed-off-by: Jerome Forissier 
> ---
>  hw/arm/boot.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index ada2717f76..ee3858b673 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -763,6 +763,10 @@ static void do_cpu_reset(void *opaque)
>  if (cpu_isar_feature(aa64_sve, cpu)) {
>  env->cp15.cptr_el[3] |= R_CPTR_EL3_EZ_MASK;
>  }
> +if (cpu_isar_feature(aa64_sme, cpu)) {
> +env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK;
> +env->cp15.scr_el3 |= SCR_ENTP2;
> +}

The doc also says that we (as fake EL3) should be setting
SMCR_EL3.LEN to the same value for all CPUs. Currently we do
do that, but it's always the reset value of 0. Richard: does
that have any odd effects (I have a feeling it clamps the
VL to the minimum supported value)? Should we be setting
SMCR_EL3.LEN to the max supported value here ?

thanks
-- PMM



Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3

2022-10-10 Thread Peter Maydell
On Mon, 10 Oct 2022 at 14:16, Peter Maydell  wrote:
>
> On Mon, 3 Oct 2022 at 15:57, Jerome Forissier
>  wrote:
> >
> > According to the Linux kernel booting.rst [1], CPTR_EL3.ESM and
> > SCR_EL3.EnTP2 must be initialized to 1 when EL3 is present and FEAT_SME
> > is advertised. This has to be taken care of when QEMU boots directly
> > into the kernel (i.e., "-M virt,secure=on -cpu max -kernel Image").
> >
> > Cc: qemu-sta...@nongnu.org
> > Fixes: 78cb9776662a ("target/arm: Enable SME for -cpu max")
> > Link: [1] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arm64/booting.rst?h=v6.0#n321
>
> Ooh, that detailed set of control bit requirements is new
> since I last read that doc -- we should probably go through
> and cross-check that we're setting them all correctly.
>
> > Signed-off-by: Jerome Forissier 
> > ---
> >  hw/arm/boot.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> > index ada2717f76..ee3858b673 100644
> > --- a/hw/arm/boot.c
> > +++ b/hw/arm/boot.c
> > @@ -763,6 +763,10 @@ static void do_cpu_reset(void *opaque)
> >  if (cpu_isar_feature(aa64_sve, cpu)) {
> >  env->cp15.cptr_el[3] |= R_CPTR_EL3_EZ_MASK;
> >  }
> > +if (cpu_isar_feature(aa64_sme, cpu)) {
> > +env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK;
> > +env->cp15.scr_el3 |= SCR_ENTP2;
> > +}
>
> The doc also says that we (as fake EL3) should be setting
> SMCR_EL3.LEN to the same value for all CPUs. Currently we do
> do that, but it's always the reset value of 0. Richard: does
> that have any odd effects (I have a feeling it clamps the
> VL to the minimum supported value)? Should we be setting
> SMCR_EL3.LEN to the max supported value here ?

In any case, this patch is right as far as it goes and
obviously improves the current situation, and it puts
SME into the same place as SVE, so I'll take this into
target-arm.next.

-- PMM



[RFC PATCH v2 1/4] tests/acpi: virt: allow acpi MADT and FADT changes

2022-10-10 Thread Miguel Luis
Step 3 of the bios-tables-test.c documented procedure.

Signed-off-by: Miguel Luis 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..8dc50f7a8a 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,7 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/FACP",
+"tests/data/acpi/virt/FACP.numamem",
+"tests/data/acpi/virt/FACP.memhp",
+"tests/data/acpi/virt/APIC",
+"tests/data/acpi/virt/APIC.memhp",
+"tests/data/acpi/virt/APIC.numamem",
-- 
2.37.3




[RFC PATCH v2 3/4] acpi: arm/virt: madt: bump to revision 4 accordingly to ACPI 6.0 Errata A

2022-10-10 Thread Miguel Luis
The MADT has been updated with the GIC Structures from ACPI 6.0 Errata A,
so the MADT revision and the GICC Structure must be updated as well.

Fixes: 37f33084ed2e ("acpi: arm/virt: madt: use build_append_int_noprefix() API 
to compose MADT table")

Signed-off-by: Miguel Luis 
Reviewed-by: Ani Sinha 
---
 hw/arm/virt-acpi-build.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 72bb6f61a5..2d21e3cec4 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -686,7 +686,7 @@ build_dbg2(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 };
 
 /*
- * ACPI spec, Revision 5.1 Errata A
+ * ACPI spec, Revision 6.0 Errata A
  * 5.2.12 Multiple APIC Description Table (MADT)
  */
 static void build_append_gicr(GArray *table_data, uint64_t base, uint32_t size)
@@ -705,7 +705,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 int i;
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 const MemMapEntry *memmap = vms->memmap;
-AcpiTable table = { .sig = "APIC", .rev = 3, .oem_id = vms->oem_id,
+AcpiTable table = { .sig = "APIC", .rev = 4, .oem_id = vms->oem_id,
 .oem_table_id = vms->oem_table_id };
 
 acpi_table_begin(&table, table_data);
@@ -740,7 +740,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 
 /* 5.2.12.14 GIC Structure */
 build_append_int_noprefix(table_data, 0xB, 1);  /* Type */
-build_append_int_noprefix(table_data, 76, 1);   /* Length */
+build_append_int_noprefix(table_data, 80, 1);   /* Length */
 build_append_int_noprefix(table_data, 0, 2);/* Reserved */
 build_append_int_noprefix(table_data, i, 4);/* GIC ID */
 build_append_int_noprefix(table_data, i, 4);/* ACPI Processor UID 
*/
@@ -760,6 +760,10 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 build_append_int_noprefix(table_data, 0, 8);/* GICR Base Address*/
 /* MPIDR */
 build_append_int_noprefix(table_data, armcpu->mp_affinity, 8);
+/* Processor Power Efficiency Class */
+build_append_int_noprefix(table_data, 0, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 3);
 }
 
 if (vms->gic_version != VIRT_GIC_VERSION_2) {
@@ -771,12 +775,6 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 }
 
 if (its_class_name() && !vmc->no_its) {
-/*
- * FIXME: Structure is from Revision 6.0 where 'GIC Structure'
- * has additional fields on top of implemented 5.1 Errata A,
- * to make it consistent with v6.0 we need to bump everything
- * to v6.0
- */
 /*
  * ACPI spec, Revision 6.0 Errata A
  * (original 6.0 definition has invalid Length)
-- 
2.37.3




[RFC PATCH v2 0/4] ACPI MADT and FADT update according to the ACPI 6.0 spec

2022-10-10 Thread Miguel Luis
The MADT table structure has been updated in commit 37f33084ed2e
("acpi: arm/virt: madt: use build_append_int_noprefix() API to compose MADT
table") to include the 5.2.12.18 GIC ITS Structure, so the table's revision
also needs to be updated. The MADT and FADT tables from the same spec need to
be in sync, and in this case the FADT needs to be updated as well.

Revision 6.0 of the ACPI FADT table introduces the "Hypervisor Vendor
Identity" field, which was missing and must be included. Patch 2/4 includes
a suggestion for the value of this field.

Ref: https://uefi.org/sites/default/files/resources/ACPI_6_0_Errata_A.PDF   
 

Changelog:

v2:
patch 2/4:
fix expression that checks for the revision number (Ani Sinha)
use "QEMU" as the Hypervisor Vendor ID [1] (Ani Sinha)

patch 3/4:
add Reviewed-by tag from Ani Sinha 


v1: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00910.html

Open to discussion, your comments, thoughts and suggestions are very welcome.   
 
Thanks in advance.  
 
Miguel

[1]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00989.html

Miguel Luis (4):
  tests/acpi: virt: allow acpi MADT and FADT changes
  acpi: fadt: support revision 6.0 of the ACPI specification
  acpi: arm/virt: madt: bump to revision 4 accordingly to ACPI 6.0
Errata A
  tests/acpi: virt: update ACPI MADT and FADT binaries

 hw/acpi/aml-build.c   |  13 ++---
 hw/arm/virt-acpi-build.c  |  26 --
 tests/data/acpi/virt/APIC | Bin 168 -> 172 bytes
 tests/data/acpi/virt/APIC.memhp   | Bin 168 -> 172 bytes
 tests/data/acpi/virt/APIC.numamem | Bin 168 -> 172 bytes
 tests/data/acpi/virt/FACP | Bin 268 -> 276 bytes
 tests/data/acpi/virt/FACP.memhp   | Bin 268 -> 276 bytes
 tests/data/acpi/virt/FACP.numamem | Bin 268 -> 276 bytes
 8 files changed, 22 insertions(+), 17 deletions(-)

-- 
2.37.3




[RFC PATCH v2 4/4] tests/acpi: virt: update ACPI MADT and FADT binaries

2022-10-10 Thread Miguel Luis
Steps 6 & 7 of the bios-tables-test.c documented procedure.

Differences between disassembled ASL files for MADT:

@@ -11,9 +11,9 @@
  */

 [000h    4]Signature : "APIC"[Multiple APIC 
Description Table (MADT)]
-[004h 0004   4] Table Length : 00A8
-[008h 0008   1] Revision : 03
-[009h 0009   1] Checksum : 50
+[004h 0004   4] Table Length : 00AC
+[008h 0008   1] Revision : 04
+[009h 0009   1] Checksum : 47
 [00Ah 0010   6]   Oem ID : "BOCHS "
 [010h 0016   8] Oem Table ID : "BXPC"
 [018h 0024   4] Oem Revision : 0001
@@ -34,7 +34,7 @@
 [041h 0065   3] Reserved : 00

 [044h 0068   1]Subtable Type : 0B [Generic Interrupt 
Controller]
-[045h 0069   1]   Length : 4C
+[045h 0069   1]   Length : 50
 [046h 0070   2] Reserved : 
 [048h 0072   4] CPU Interface Number : 
 [04Ch 0076   4]Processor UID : 
@@ -51,28 +51,29 @@
 [07Ch 0124   4]Virtual GIC Interrupt : 
 [080h 0128   8]   Redistributor Base Address : 
 [088h 0136   8]ARM MPIDR : 
-/ ACPI subtable terminates early - may be older version (dump table) */
+[090h 0144   1] Efficiency Class : 00
+[091h 0145   3] Reserved : 00

-[090h 0144   1]Subtable Type : 0D [Generic MSI Frame]
-[091h 0145   1]   Length : 18
-[092h 0146   2] Reserved : 
-[094h 0148   4] MSI Frame ID : 
-[098h 0152   8] Base Address : 0802
-[0A0h 0160   4]Flags (decoded below) : 0001
+[094h 0148   1]Subtable Type : 0D [Generic MSI Frame]
+[095h 0149   1]   Length : 18
+[096h 0150   2] Reserved : 
+[098h 0152   4] MSI Frame ID : 
+[09Ch 0156   8] Base Address : 0802
+[0A4h 0164   4]Flags (decoded below) : 0001
   Select SPI : 1
-[0A4h 0164   2]SPI Count : 0040
-[0A6h 0166   2] SPI Base : 0050
+[0A8h 0168   2]SPI Count : 0040
+[0AAh 0170   2] SPI Base : 0050

-Raw Table Data: Length 168 (0xA8)
+Raw Table Data: Length 172 (0xAC)

-: 41 50 49 43 A8 00 00 00 03 50 42 4F 43 48 53 20  // APIC.PBOCHS
+: 41 50 49 43 AC 00 00 00 04 47 42 4F 43 48 53 20  // APIC.GBOCHS
 0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPCBXPC
 0020: 01 00 00 00 00 00 00 00 00 00 00 00 0C 18 00 00  // 
 0030: 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 00  // 
-0040: 02 00 00 00 0B 4C 00 00 00 00 00 00 00 00 00 00  // .L..
+0040: 02 00 00 00 0B 50 00 00 00 00 00 00 00 00 00 00  // .P..
 0050: 01 00 00 00 00 00 00 00 17 00 00 00 00 00 00 00  // 
 0060: 00 00 00 00 00 00 01 08 00 00 00 00 00 00 04 08  // 
 0070: 00 00 00 00 00 00 03 08 00 00 00 00 00 00 00 00  // 
 0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
-0090: 0D 18 00 00 00 00 00 00 00 00 02 08 00 00 00 00  // 
-00A0: 01 00 00 00 40 00 50 00  // @.P.
+0090: 00 00 00 00 0D 18 00 00 00 00 00 00 00 00 02 08  // 
+00A0: 00 00 00 00 01 00 00 00 40 00 50 00  // @.P.

Differences between disassembled ASL files for FADT:

@@ -11,9 +11,9 @@
  */

 [000h    4]Signature : "FACP"[Fixed ACPI 
Description Table (FADT)]
-[004h 0004   4] Table Length : 010C
-[008h 0008   1] Revision : 05
-[009h 0009   1] Checksum : 55
+[004h 0004   4] Table Length : 0114
+[008h 0008   1] Revision : 06
+[009h 0009   1] Checksum : 15
 [00Ah 0010   6]   Oem ID : "BOCHS "
 [010h 0016   8] Oem Table ID : "BXPC"
 [018h 0024   4] Oem Revision : 0001
@@ -99,7 +99,7 @@
   PSCI Compliant : 1
Must use HVC for PSCI : 1

-[083h 0131   1]  FADT Minor Revision : 01
+[083h 0131   1]  FADT Minor Revision : 00
 [084h 0132   8] FACS Address : 
 [08Ch 0140   8] DSDT Address : 
 [094h 0148  12] PM1A Event Block : [Generic Address Structure]
@@ -173,11 +173,11 @@
 [103h 0259   1] Encoded Access Width : 00 [Undefined/Legacy]
 [104h 0260   8]  A

[RFC PATCH v2 2/4] acpi: fadt: support revision 6.0 of the ACPI specification

2022-10-10 Thread Miguel Luis
Update the Fixed ACPI Description Table (FADT) to revision 6.0 of the ACPI
specification, adding the "Hypervisor Vendor Identity" field that was missing.

This field's description states the following: "64-bit identifier of hypervisor
vendor. All bytes in this field are considered part of the vendor identity.
These identifiers are defined independently by the vendors themselves,
usually following the name of the hypervisor product. Version information
should NOT be included in this field - this shall simply denote the vendor's
name or identifier. Version information can be communicated through a
supplemental vendor-specific hypervisor API. Firmware implementers would
place zero bytes into this field, denoting that no hypervisor is present in
the actual firmware."

Hence, what should a valid Hypervisor Vendor ID be, and where should QEMU
provide that information?

In v1 [1] of this RFC there was a suggestion to derive this information
from the current acceleration name. This also seems to imply that QEMU,
which generates the FADT table, and the FADT consumer need to agree on
the values of this field.

This version follows Ani Sinha's suggestion [2] of using "QEMU" for the
hypervisor vendor ID.

[1]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00911.html
[2]: https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg00989.html

Signed-off-by: Miguel Luis 
---
 hw/acpi/aml-build.c  | 13 ++---
 hw/arm/virt-acpi-build.c | 10 +-
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index e6bfac95c7..42feb4d4d7 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2070,7 +2070,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
 acpi_table_end(linker, &table);
 }
 
-/* build rev1/rev3/rev5.1 FADT */
+/* build rev1/rev3/rev5.1/rev6.0 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 const char *oem_id, const char *oem_table_id)
 {
@@ -2193,8 +2193,15 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, const 
AcpiFadtData *f,
 /* SLEEP_STATUS_REG */
 build_append_gas_from_struct(tbl, &f->sleep_sts);
 
-/* TODO: extra fields need to be added to support revisions above rev5 */
-assert(f->rev == 5);
+if (f->rev == 5) {
+goto done;
+}
+
+/* Hypervisor Vendor Identity */
+build_append_padded_str(tbl, "QEMU", 8, '\0');
+
+/* TODO: extra fields need to be added to support revisions above rev6 */
+assert(f->rev == 6);
 
 done:
 acpi_table_end(linker, &table);
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9b3aee01bf..72bb6f61a5 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -809,13 +809,13 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 }
 
 /* FADT */
-static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker,
+static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
 VirtMachineState *vms, unsigned dsdt_tbl_offset)
 {
-/* ACPI v5.1 */
+/* ACPI v6.0 */
 AcpiFadtData fadt = {
-.rev = 5,
-.minor_ver = 1,
+.rev = 6,
+.minor_ver = 0,
 .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
 .xdsdt_tbl_offset = &dsdt_tbl_offset,
 };
@@ -945,7 +945,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 
 /* FADT MADT PPTT GTDT MCFG SPCR DBG2 pointed to by RSDT */
 acpi_add_table(table_offsets, tables_blob);
-build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
+build_fadt_rev6(tables_blob, tables->linker, vms, dsdt);
 
 acpi_add_table(table_offsets, tables_blob);
 build_madt(tables_blob, tables->linker, vms);
-- 
2.37.3




[PATCH v2 01/11] migration: support file: uri for source migration

2022-10-10 Thread Nikolay Borisov
Implement support for a "file:" uri so that a migration can be initiated
directly to a file from QEMU.
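
As an illustration, the saved state could then be produced with (a
hypothetical invocation; the path is an example and everything after
"file:" is passed through as the filename):

  (qemu) migrate file:/tmp/vm-state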

Signed-off-by: Nikolay Borisov 
---
 migration/file.c  | 23 +++
 migration/file.h  |  9 +
 migration/meson.build |  1 +
 migration/migration.c |  3 +++
 4 files changed, 36 insertions(+)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

diff --git a/migration/file.c b/migration/file.c
new file mode 100644
index ..02896a7cab99
--- /dev/null
+++ b/migration/file.c
@@ -0,0 +1,23 @@
+#include "qemu/osdep.h"
+#include "channel.h"
+#include "io/channel-file.h"
+#include "file.h"
+#include "qemu/error-report.h"
+
+
+void file_start_outgoing_migration(MigrationState *s, const char *fname, Error 
**errp)
+{
+   QIOChannelFile *ioc;
+
+   ioc = qio_channel_file_new_path(fname, O_CREAT|O_TRUNC|O_WRONLY, 0660, 
errp);
+   if (!ioc) {
+   error_report("Error creating a channel");
+   return;
+   }
+
+   qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-outgoing");
+   migration_channel_connect(s, QIO_CHANNEL(ioc), NULL, NULL);
+   object_unref(OBJECT(ioc));
+}
+
+
diff --git a/migration/file.h b/migration/file.h
new file mode 100644
index ..d476eb1157f9
--- /dev/null
+++ b/migration/file.h
@@ -0,0 +1,9 @@
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+
+void file_start_outgoing_migration(MigrationState *s,
+   const char *filename,
+   Error **errp);
+
+#endif
+
diff --git a/migration/meson.build b/migration/meson.build
index 690487cf1a81..30a8392701c3 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -17,6 +17,7 @@ softmmu_ss.add(files(
   'colo.c',
   'exec.c',
   'fd.c',
+  'file.c',
   'global_state.c',
   'migration.c',
   'multifd.c',
diff --git a/migration/migration.c b/migration/migration.c
index bb8bbddfe467..8813b78b9a6b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -2414,6 +2415,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 exec_start_outgoing_migration(s, p, &local_err);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
+} else if (strstart(uri, "file:", &p)) {
+   file_start_outgoing_migration(s, p, &local_err);
 } else {
 if (!(has_resume && resume)) {
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
-- 
2.34.1




[PATCH v2 10/11] migration: Add support for 'fixed-ram' migration restore

2022-10-10 Thread Nikolay Borisov
Add the necessary code to parse the format changes for the 'fixed-ram'
capability. One of the more notable changes in behavior is that in the
'fixed-ram' case ram pages are restored in one go rather than by
constantly looping through the migration stream. Also, due to
idiosyncrasies of the format, I have added the 'ram_migrated' flag, since
it was easier to simply return directly from ->load_state than to
introduce more conditionals around the code to prevent ->load_state from
being called multiple times (from
qemu_loadvm_section_start_full/qemu_loadvm_section_part_end, i.e. from
multiple QEMU_VM_SECTION_(PART|END) flags).
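
For reference, the offset arithmetic this relies on looks roughly like the
following (a sketch of the mapping implemented in parse_ramblocks_fixed_ram()
below; the variable names follow the patch):

    /* a run of set bits [set_bit_idx, clear_bit_idx) covers ... */
    unsigned long len = TARGET_PAGE_SIZE * (clear_bit_idx - set_bit_idx);
    /* ... guest memory starting at ... */
    ram_addr_t offset = set_bit_idx << TARGET_PAGE_BITS;
    /* ... which is stored in the file at block->pages_offset + offset */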

Signed-off-by: Nikolay Borisov 
---
 migration/migration.h |  2 +
 migration/ram.c   | 95 ++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 9aab1b16f407..7a832d072415 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -96,6 +96,8 @@ struct MigrationIncomingState {
 bool   have_listen_thread;
 QemuThread listen_thread;
 
+bool ram_migrated;
+
 /* For the kernel to send us notifications */
 int   userfault_fd;
 /* To notify the fault_thread to wake, e.g., when need to quit */
diff --git a/migration/ram.c b/migration/ram.c
index 1dd68c221667..15441ed9f745 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4330,6 +4330,90 @@ static int parse_ramblocks(QEMUFile *f, ram_addr_t 
total_ram_bytes)
 return ret;
 }
 
+
+static int parse_ramblocks_fixed_ram(QEMUFile *f)
+{
+int ret = 0;
+
+while (!ret) {
+char id[256];
+RAMBlock *block;
+ram_addr_t length;
+unsigned long clear_bit_idx;
+long num_pages, bitmap_size;
+int len = qemu_get_byte(f);
+g_autofree unsigned long *dirty_bitmap = NULL;
+
+qemu_get_buffer(f, (uint8_t *)id, len);
+id[len] = 0;
+length = qemu_get_be64(f);
+
+block = qemu_ram_block_by_name(id);
+if (block) {
+ret = parse_ramblock(f, block, length);
+if (ret < 0) {
+return ret;
+}
+} else {
+error_report("Unknown ramblock \"%s\", cannot accept "
+ "migration", id);
+ret = -EINVAL;
+continue;
+}
+
+/* 1. read the bitmap size */
+num_pages = length >> TARGET_PAGE_BITS;
+bitmap_size = qemu_get_be32(f);
+
+assert(bitmap_size == BITS_TO_LONGS(num_pages)*sizeof(unsigned long));
+
+block->pages_offset = qemu_get_be64(f);
+
+/* 2. read the actual bitmap */
+dirty_bitmap = g_malloc0(bitmap_size);
+if (qemu_get_buffer(f, (uint8_t *)dirty_bitmap, bitmap_size) != 
bitmap_size) {
+error_report("Error parsing dirty bitmap");
+return -EINVAL;
+}
+
+#define BUFSIZE (4*1024*1024)
+for (unsigned long set_bit_idx = find_first_bit(dirty_bitmap, 
num_pages);
+ set_bit_idx < num_pages;
+ set_bit_idx = find_next_bit(dirty_bitmap, num_pages, 
clear_bit_idx + 1)) {
+
+clear_bit_idx = find_next_zero_bit(dirty_bitmap, num_pages, 
set_bit_idx + 1);
+unsigned long len = TARGET_PAGE_SIZE * (clear_bit_idx - 
set_bit_idx);
+ram_addr_t offset = set_bit_idx << TARGET_PAGE_BITS;
+
+for (size_t written = 0, completed = 0; completed < len; offset += 
written) {
+void *host = host_from_ram_block_offset(block, offset);
+size_t read_len = MIN(len, BUFSIZE);
+
+written = qemu_get_buffer_at(f, host, read_len,
+ block->pages_offset + offset);
+completed += written;
+}
+}
+
+/* Skip pages array */
+qemu_set_offset(f, block->pages_offset + length, SEEK_SET);
+
+/* Check if this is the last ramblock */
+if (qemu_get_be64(f) == RAM_SAVE_FLAG_EOS) {
+ret = 1;
+} else {
+/*
+ * If not, adjust the internal file index to account for the
+ * previous 64 bit read
+ */
+qemu_file_skip(f, -8);
+ret = 0;
+}
+}
+
+return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -4349,7 +4433,7 @@ static int ram_load_precopy(QEMUFile *f)
 invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
 }
 
-while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
+while (!ret && !(flags & RAM_SAVE_FLAG_EOS) && !mis->ram_migrated) {
 ram_addr_t addr;
 void *host = NULL, *host_bak = NULL;
 uint8_t ch;
@@ -4421,7 +4505,14 @@ static int ram_load_precopy(QEMUFile *f)
 
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
-ret = parse_ramblocks(f, addr);
+if (migrate_fixed_ram()) {
+ret = parse_ramblocks_fixed_ram(f);

[PATCH v2 07/11] migration: add qemu_get_buffer_at

2022-10-10 Thread Nikolay Borisov
Restoring a 'fixed-ram' enabled migration stream requires reading from
specific offsets in the file, so add a helper to QEMUFile that uses the
newly introduced qio_channel_file_preadv.
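
A hypothetical call site on the restore path ("host" being the destination
buffer and "pos" an absolute file offset recorded in the stream):

    size_t n = qemu_get_buffer_at(f, host, TARGET_PAGE_SIZE, pos);
    if (!n) {
        /* the error, if any, has been stored in the QEMUFile */
    }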

Signed-off-by: Nikolay Borisov 
---
 migration/qemu-file.c | 23 +++
 migration/qemu-file.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 07ba1125e83f..73fffbdb9b7e 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -564,6 +564,29 @@ void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, 
size_t buflen, off_t po
 return;
 }
 
+
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, 
off_t pos)
+{
+Error *err = NULL;
+struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+ssize_t ret;
+
+if (f->last_error) {
+return 0;
+}
+
+ret = qio_channel_file_preadv(f->ioc, &iov, 1, pos, &err);
+if (ret == -1) {
+   goto error;
+}
+
+return (size_t)ret;
+
+ error:
+qemu_file_set_error_obj(f, -EIO, err);
+return 0;
+}
+
 void qemu_set_offset(QEMUFile *f, off_t off, int whence)
 {
 Error *err = NULL;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 33cfc07b81d1..ab10c3ad7e42 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -151,6 +151,7 @@ void qemu_file_set_blocking(QEMUFile *f, bool block);
 void qemu_set_offset(QEMUFile *f, off_t off, int whence);
 off_t qemu_get_offset(QEMUFile *f);
 void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t 
pos);
+size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, 
off_t pos);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.34.1




[PATCH v2 03/11] migration: Make migration json writer part of MigrationState struct

2022-10-10 Thread Nikolay Borisov
This is required so that the migration configuration is written into the
migration stream. This allows analyze-migration.py to parse the enabled
capabilities for the migration and adjust its behavior accordingly. It is
in preparation for analyze-migration.py supporting the 'fixed-ram'
capability format changes.
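
With this in place, the JSON description embedded in the stream would carry
something along these lines (an illustrative shape, not an exact dump):

  { "configuration": { ..., "capabilities": [ "fixed-ram" ] },
    "page_size": 4096,
    "devices": [ ... ] }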

Signed-off-by: Nikolay Borisov 
---
 migration/migration.c |  5 +
 migration/migration.h |  3 +++
 migration/savevm.c| 38 ++
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 140b0f1a54bd..d0779bbaf862 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1896,6 +1896,8 @@ static void migrate_fd_cleanup(MigrationState *s)
 g_free(s->hostname);
 s->hostname = NULL;
 
+json_writer_free(s->vmdesc);
+
 qemu_savevm_state_cleanup();
 
 if (s->to_dst_file) {
@@ -2154,6 +2156,7 @@ void migrate_init(MigrationState *s)
 error_free(s->error);
 s->error = NULL;
 s->hostname = NULL;
+s->vmdesc = NULL;
 
 migrate_set_state(&s->state, MIGRATION_STATUS_NONE, 
MIGRATION_STATUS_SETUP);
 
@@ -4269,6 +4272,8 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
+s->vmdesc = json_writer_new(false);
+
 if (multifd_save_setup(&local_err) != 0) {
 error_report_err(local_err);
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/migration.h b/migration/migration.h
index cdad8aceaaab..96f27aba2210 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -17,6 +17,7 @@
 #include "exec/cpu-common.h"
 #include "hw/qdev-core.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qmp/json-writer.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine_int.h"
 #include "io/channel.h"
@@ -261,6 +262,8 @@ struct MigrationState {
 
 int state;
 
+JSONWriter *vmdesc;
+
 /* State related to return path */
 struct {
 /* Protected by qemu_file_lock */
diff --git a/migration/savevm.c b/migration/savevm.c
index 48e85c052c2c..174cdbefc29d 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1137,13 +1137,18 @@ void qemu_savevm_non_migratable_list(strList **reasons)
 
 void qemu_savevm_state_header(QEMUFile *f)
 {
+MigrationState *s = migrate_get_current();
 trace_savevm_state_header();
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 
-if (migrate_get_current()->send_configuration) {
+if (s->send_configuration) {
 qemu_put_byte(f, QEMU_VM_CONFIGURATION);
-vmstate_save_state(f, &vmstate_configuration, &savevm_state, 0);
+   json_writer_start_object(s->vmdesc, NULL);
+   json_writer_start_object(s->vmdesc, "configuration");
+vmstate_save_state(f, &vmstate_configuration, &savevm_state, 
s->vmdesc);
+   json_writer_end_object(s->vmdesc);
+
 }
 }
 
@@ -1364,15 +1369,16 @@ int 
qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool in_postcopy,
 bool inactivate_disks)
 {
-g_autoptr(JSONWriter) vmdesc = NULL;
+MigrationState *s = migrate_get_current();
 int vmdesc_len;
 SaveStateEntry *se;
 int ret;
 
-vmdesc = json_writer_new(false);
-json_writer_start_object(vmdesc, NULL);
-json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
-json_writer_start_array(vmdesc, "devices");
+if (!s->send_configuration) {
+   json_writer_start_object(s->vmdesc, NULL);
+}
+json_writer_int64(s->vmdesc, "page_size", qemu_target_page_size());
+json_writer_start_array(s->vmdesc, "devices");
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 
 if ((!se->ops || !se->ops->save_state) && !se->vmsd) {
@@ -1385,12 +1391,12 @@ int 
qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 
 trace_savevm_section_start(se->idstr, se->section_id);
 
-json_writer_start_object(vmdesc, NULL);
-json_writer_str(vmdesc, "name", se->idstr);
-json_writer_int64(vmdesc, "instance_id", se->instance_id);
+json_writer_start_object(s->vmdesc, NULL);
+json_writer_str(s->vmdesc, "name", se->idstr);
+json_writer_int64(s->vmdesc, "instance_id", se->instance_id);
 
 save_section_header(f, se, QEMU_VM_SECTION_FULL);
-ret = vmstate_save(f, se, vmdesc);
+ret = vmstate_save(f, se, s->vmdesc);
 if (ret) {
 qemu_file_set_error(f, ret);
 return ret;
@@ -1398,7 +1404,7 @@ int 
qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 trace_savevm_section_end(se->idstr, se->section_id, 0);
 save_section_footer(f, se);
 
-json_writer_end_object(vmdesc);
+json_writer_end_object(s->vmdesc);
 }
 
 if (inactivate_disks) {
@@ -1417,14 +1423,14 @@ int 
qemu_savevm_state_complete_pr

[PATCH v2 08/11] migration/ram: Introduce 'fixed-ram' migration stream capability

2022-10-10 Thread Nikolay Borisov
Implement the 'fixed-ram' feature. The core of the feature is to ensure
that each ram page has a specific offset in the resulting migration
stream. The reasons why we'd want such behavior are twofold:

 - When doing a 'fixed-ram' migration the resulting file will have a
   bounded size, since pages which are dirtied multiple times will
   always go to a fixed location in the file, rather than constantly
   being added to a sequential stream. This eliminates cases where a vm
   with, say, 1g of ram can result in a migration file that's 10s of
   Gbs, provided that the workload constantly redirties memory.

 - It paves the way to implement DIO-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

The feature requires changing the stream format. First, a bitmap is
introduced which tracks which pages have been written (i.e. are dirtied)
during migration and is subsequently written to the resulting file, again
at a fixed location for every ramblock. Zero pages are ignored as they'd
be zero on the destination as well. With the changed format the data
looks like the following:

|name len|name|used_len|pc*|bitmap_size|pages_offset|bitmap|pages|

* pc - refers to the page_size/mr->addr members, so newly added members
begin from "bitmap_size".

This layout is initialized during ram_save_setup so instead of having a
sequential stream of pages that follow the ramblock headers the dirty
pages for a ramblock follow its header. Since all pages have a fixed
location RAM_SAVE_FLAG_EOS is no longer generated on every migration
iteration but there is effectively a single RAM_SAVE_FLAG_EOS right at
the end.
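
The capability itself would be enabled like any other migration capability
(an illustrative QMP exchange; the capability name follows the
qapi/migration.json change below):

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "fixed-ram", "state": true } ] } }
  <- { "return": {} }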

Signed-off-by: Nikolay Borisov 
---
 include/exec/ramblock.h |  7 +++
 migration/migration.c   | 51 +-
 migration/migration.h   |  1 +
 migration/ram.c | 97 +
 migration/savevm.c  |  1 +
 qapi/migration.json |  2 +-
 6 files changed, 138 insertions(+), 21 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 6cbedf9e0c9a..30216c1a41d3 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -43,6 +43,13 @@ struct RAMBlock {
 size_t page_size;
 /* dirty bitmap used during migration */
 unsigned long *bmap;
+/* shadow dirty bitmap used when migrating to a file */
+unsigned long *shadow_bmap;
+/* offset in the file pages belonging to this ramblock are saved, used
+ * only during migration to a file
+ */
+off_t bitmap_offset;
+uint64_t pages_offset;
 /* bitmap of already received pages in postcopy */
 unsigned long *receivedmap;
 
diff --git a/migration/migration.c b/migration/migration.c
index d0779bbaf862..1d08664da5db 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -165,7 +165,8 @@ INITIALIZE_MIGRATE_CAPS_SET(check_caps_background_snapshot,
 MIGRATION_CAPABILITY_XBZRLE,
 MIGRATION_CAPABILITY_X_COLO,
 MIGRATION_CAPABILITY_VALIDATE_UUID,
-MIGRATION_CAPABILITY_ZERO_COPY_SEND);
+MIGRATION_CAPABILITY_ZERO_COPY_SEND,
+MIGRATION_CAPABILITY_FIXED_RAM);
 
 /* When we add fault tolerance, we could have several
migrations at once.  For now we don't need to add
@@ -1325,6 +1326,27 @@ static bool migrate_caps_check(bool *cap_list,
 }
 #endif
 
+if (cap_list[MIGRATION_CAPABILITY_FIXED_RAM]) {
+   if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+   error_setg(errp, "Directly mapped memory incompatible with 
multifd");
+   return false;
+   }
+
+   if (cap_list[MIGRATION_CAPABILITY_XBZRLE]) {
+   error_setg(errp, "Directly mapped memory incompatible with 
xbzrle");
+   return false;
+   }
+
+   if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
+   error_setg(errp, "Directly mapped memory incompatible with 
compression");
+   return false;
+   }
+
+   if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
+   error_setg(errp, "Directly mapped memory incompatible with 
postcopy ram");
+   return false;
+   }
+}
 
 /* incoming side only */
 if (runstate_check(RUN_STATE_INMIGRATE) &&
@@ -2629,6 +2651,11 @@ MultiFDCompression migrate_multifd_compression(void)
 return s->parameters.multifd_compression;
 }
 
+int migrate_fixed_ram(void)
+{
+return 
migrate_get_current()->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM];
+}
+
 int migrate_multifd_zlib_level(void)
 {
 MigrationState *s;
@@ -4189,6 +4216,21 @@ static void *bg_migration_thread(void *opaque)
 return NULL;
 }
 
+static int
+migrate_check_fixed_ram(MigrationState *s, Error **errp)
+{
+if (!s->enabled_capabilities[MIGRATION_CAPABILITY_FIXED_RAM])
+return 0;
+
+if (!qemu_file_is_seekable(s->to_dst_file)) {
+error_setg(errp, "Directly mapped memory req

[PATCH v2 02/11] migration: Add support for 'file:' uri for incoming migration

2022-10-10 Thread Nikolay Borisov
This is a counterpart to the 'file:' uri support for source migration;
now a file can also serve as the source of an incoming migration.
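
For example, the destination could be started along these lines (an
illustrative command line; the file holds a previously saved stream):

  $ qemu-system-x86_64 ... -incoming file:/tmp/vm-state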

Signed-off-by: Nikolay Borisov 
---
 migration/file.c  | 15 +++
 migration/file.h  |  1 +
 migration/migration.c |  2 ++
 3 files changed, 18 insertions(+)

diff --git a/migration/file.c b/migration/file.c
index 02896a7cab99..93eb718aa0f4 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -21,3 +21,18 @@ void file_start_outgoing_migration(MigrationState *s, const 
char *fname, Error *
 }
 
 
+void file_start_incoming_migration(const char *fname, Error **errp)
+{
+   QIOChannelFile *ioc;
+
+   ioc = qio_channel_file_new_path(fname, O_RDONLY, 0, errp);
+   if (!ioc) {
+   error_report("Error creating a channel");
+   return;
+   }
+
+   qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+   migration_channel_process_incoming(QIO_CHANNEL(ioc));
+   object_unref(OBJECT(ioc));
+}
+
diff --git a/migration/file.h b/migration/file.h
index d476eb1157f9..cdbd291322d4 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -5,5 +5,6 @@ void file_start_outgoing_migration(MigrationState *s,
const char *filename,
Error **errp);
 
+void file_start_incoming_migration(const char *fname, Error **errp);
 #endif
 
diff --git a/migration/migration.c b/migration/migration.c
index 8813b78b9a6b..140b0f1a54bd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -506,6 +506,8 @@ static void qemu_start_incoming_migration(const char *uri, 
Error **errp)
 exec_start_incoming_migration(p, errp);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_incoming_migration(p, errp);
+} else if (strstart(uri, "file:", &p)) {
+   file_start_incoming_migration(p, errp);
 } else {
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
-- 
2.34.1




[PATCH v2 04/11] io: add pwritev support to QIOChannelFile

2022-10-10 Thread Nikolay Borisov
The upcoming 'fixed-ram' feature requires QEMU to write data at specific
offsets in the file. This is currently missing, so this patch adds it.
I've chosen to implement it as a type-specific function rather than plumb
it through the generic channel interface, as it relies on being able to
do seeks.
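
A hypothetical call site, assuming "ioc" wraps a regular file opened for
writing ("buf", "len" and "offset" are placeholders):

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    Error *err = NULL;
    /* write "len" bytes at absolute file offset "offset" */
    ssize_t nwritten = qio_channel_file_pwritev(QIO_CHANNEL(ioc), &iov, 1,
                                                offset, &err);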

Signed-off-by: Nikolay Borisov 
---
 include/io/channel-file.h |  5 +
 io/channel-file.c | 24 
 2 files changed, 29 insertions(+)

diff --git a/include/io/channel-file.h b/include/io/channel-file.h
index 50e8eb113868..0a5d54f5e58e 100644
--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -89,4 +89,9 @@ qio_channel_file_new_path(const char *path,
   mode_t mode,
   Error **errp);
 
+ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+const struct iovec *iov,
+size_t niov,
+off_t offset,
+Error **errp);
 #endif /* QIO_CHANNEL_FILE_H */
diff --git a/io/channel-file.c b/io/channel-file.c
index b67687c2aa64..da17d0a11ba7 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -136,6 +136,30 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
 return ret;
 }
 
+ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
+const struct iovec *iov,
+size_t niov,
+off_t offset,
+Error **errp)
+{
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+ssize_t ret;
+
+ retry:
+ret = pwritev(fioc->fd, iov, niov, offset);
+if (ret <= 0) {
+if (errno == EAGAIN) {
+return QIO_CHANNEL_ERR_BLOCK;
+}
+if (errno == EINTR) {
+goto retry;
+}
+error_setg_errno(errp, errno, "Unable to write to file");
+return -1;
+}
+return ret;
+}
+
 static int qio_channel_file_set_blocking(QIOChannel *ioc,
  bool enabled,
  Error **errp)
-- 
2.34.1




[PATCH v2 00/11] Add support for fixed ram offsets during migration

2022-10-10 Thread Nikolay Borisov
Here's a slightly updated version of the patchset. At its core it remains
the same, however I have addressed a couple of issues, namely:

* Don't initialize the shadow bitmap as all set - this was causing the full ram
to always be copied.

* Fixed a memory leak in parse_ramblocks_fixed_ram by making the dirty bitmap
variable a g_autofree one

* Slightly reworked how ram is being copied during restore and now the code is
working in chunks of 4mb (might have to tweak this?)

For a more general overview of what this series is all about refer to the
initial posting [0].

[0] 
https://lore.kernel.org/qemu-devel/20221004123733.2745519-1-nbori...@suse.com/

Nikolay Borisov (11):
  migration: support file: uri for source migration
  migration: Add support for 'file:' uri for incoming migration
  migration: Make migration json writer part of MigrationState struct
  io: add pwritev support to QIOChannelFile
  io: Add support for seekable channels
  io: Add preadv support to QIOChannelFile
  migration: add qemu_get_buffer_at
  migration/ram: Introduce 'fixed-ram' migration stream capability
  migration: Refactor precopy ram loading code
  migration: Add support for 'fixed-ram' migration restore
  analyze-migration.py: add initial support for fixed ram streams

 include/exec/ramblock.h |   7 +
 include/io/channel-file.h   |  10 +
 include/io/channel.h|   1 +
 include/migration/qemu-file-types.h |   2 +
 io/channel-file.c   |  55 +
 migration/file.c|  38 
 migration/file.h|  10 +
 migration/meson.build   |   1 +
 migration/migration.c   |  61 +-
 migration/migration.h   |   6 +
 migration/qemu-file.c   |  82 +++
 migration/qemu-file.h   |   4 +
 migration/ram.c | 328 +---
 migration/savevm.c  |  39 ++--
 qapi/migration.json |   2 +-
 scripts/analyze-migration.py|  49 -
 16 files changed, 594 insertions(+), 101 deletions(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

--
2.34.1




[PATCH v2 06/11] io: Add preadv support to QIOChannelFile

2022-10-10 Thread Nikolay Borisov
preadv is going to be needed when 'fixed-ram'-enabled streams are to be
restored. Simply add a wrapper around preadv that's specific to
QIOChannelFile.
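
The read side mirrors the pwritev helper added earlier in the series (an
illustrative call; "host", "len" and "pos" are placeholders):

    struct iovec iov = { .iov_base = host, .iov_len = len };
    ssize_t nread = qio_channel_file_preadv(QIO_CHANNEL(ioc), &iov, 1,
                                            pos, &err);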

Signed-off-by: Nikolay Borisov 
---
 include/io/channel-file.h |  5 +
 io/channel-file.c | 26 ++
 2 files changed, 31 insertions(+)

diff --git a/include/io/channel-file.h b/include/io/channel-file.h
index 0a5d54f5e58e..2187f5affd23 100644
--- a/include/io/channel-file.h
+++ b/include/io/channel-file.h
@@ -89,6 +89,11 @@ qio_channel_file_new_path(const char *path,
   mode_t mode,
   Error **errp);
 
+ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+  const struct iovec *iov,
+  size_t niov,
+  off_t offset,
+  Error **errp);
 ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
 const struct iovec *iov,
 size_t niov,
diff --git a/io/channel-file.c b/io/channel-file.c
index d84a6737f2f7..edca64ad63a7 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -141,6 +141,32 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
 return ret;
 }
 
+ssize_t qio_channel_file_preadv(QIOChannel *ioc,
+  const struct iovec *iov,
+  size_t niov,
+  off_t offset,
+  Error **errp)
+{
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
+ssize_t ret;
+
+ retry:
+ret = preadv(fioc->fd, iov, niov, offset);
+if (ret < 0) {
+if (errno == EAGAIN) {
+return QIO_CHANNEL_ERR_BLOCK;
+}
+if (errno == EINTR) {
+goto retry;
+}
+
+error_setg_errno(errp, errno, "Unable to read from file");
+return -1;
+}
+
+return ret;
+}
+
 ssize_t qio_channel_file_pwritev(QIOChannel *ioc,
 const struct iovec *iov,
 size_t niov,
-- 
2.34.1




[PATCH v2 05/11] io: Add support for seekable channels

2022-10-10 Thread Nikolay Borisov
Add a bunch of auxiliary methods and a feature flag to work with
seekable channels. Currently the only channel considered seekable is
QIOChannelFile. Also add a bunch of helper functions to QEMUFile that
can make use of this channel feature. All of this is in preparation for
supporting the 'fixed-ram' migration stream feature.
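
A minimal sketch of the intended usage, with the helpers introduced below
("pages_offset" and "page" are placeholders):

    if (qemu_file_is_seekable(f)) {
        /* position the stream at a precomputed offset ... */
        qemu_set_offset(f, pages_offset, SEEK_SET);
        /* ... or write a buffer at an absolute position directly */
        qemu_put_buffer_at(f, page, TARGET_PAGE_SIZE, pages_offset);
    }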

Signed-off-by: Nikolay Borisov 
---
 include/io/channel.h|  1 +
 include/migration/qemu-file-types.h |  2 +
 io/channel-file.c   |  5 +++
 migration/qemu-file.c   | 59 +
 migration/qemu-file.h   |  3 ++
 5 files changed, 70 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index c680ee748021..4fc37c78e68c 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -41,6 +41,7 @@ enum QIOChannelFeature {
 QIO_CHANNEL_FEATURE_SHUTDOWN,
 QIO_CHANNEL_FEATURE_LISTEN,
 QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY,
+QIO_CHANNEL_FEATURE_SEEKABLE,
 };
 
 
diff --git a/include/migration/qemu-file-types.h 
b/include/migration/qemu-file-types.h
index 2867e3da84ab..eb0325ee8687 100644
--- a/include/migration/qemu-file-types.h
+++ b/include/migration/qemu-file-types.h
@@ -50,6 +50,8 @@ unsigned int qemu_get_be16(QEMUFile *f);
 unsigned int qemu_get_be32(QEMUFile *f);
 uint64_t qemu_get_be64(QEMUFile *f);
 
+bool qemu_file_is_seekable(QEMUFile *f);
+
 static inline void qemu_put_be64s(QEMUFile *f, const uint64_t *pv)
 {
 qemu_put_be64(f, *pv);
diff --git a/io/channel-file.c b/io/channel-file.c
index da17d0a11ba7..d84a6737f2f7 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -35,6 +35,7 @@ qio_channel_file_new_fd(int fd)
 
 ioc->fd = fd;
 
+qio_channel_set_feature(QIO_CHANNEL(ioc), QIO_CHANNEL_FEATURE_SEEKABLE);
 trace_qio_channel_file_new_fd(ioc, fd);
 
 return ioc;
@@ -59,6 +60,10 @@ qio_channel_file_new_path(const char *path,
 return NULL;
 }
 
+if (lseek(ioc->fd, 0, SEEK_CUR) != (off_t)-1) {
+qio_channel_set_feature(QIO_CHANNEL(ioc), 
QIO_CHANNEL_FEATURE_SEEKABLE);
+}
+
 trace_qio_channel_file_new_path(ioc, path, flags, mode, ioc->fd);
 
 return ioc;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 4f400c2e5265..07ba1125e83f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,7 @@
 #include "qemu-file.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "io/channel-file.h"
 
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -260,6 +261,10 @@ static void qemu_iovec_release_ram(QEMUFile *f)
 memset(f->may_free, 0, sizeof(f->may_free));
 }
 
+bool qemu_file_is_seekable(QEMUFile *f)
+{
+return qio_channel_has_feature(f->ioc, QIO_CHANNEL_FEATURE_SEEKABLE);
+}
 
 /**
  * Flushes QEMUFile buffer
@@ -538,6 +543,60 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, 
size_t size)
 }
 }
 
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t 
pos)
+{
+Error *err = NULL;
+struct iovec iov = { .iov_base = (char *)buf, .iov_len = buflen };
+
+if (f->last_error) {
+return;
+}
+
+qemu_fflush(f);
+
+if (qio_channel_file_pwritev(f->ioc, &iov, 1, pos, &err) == (off_t)-1)
+goto error;
+
+return;
+
+ error:
+qemu_file_set_error_obj(f, -EIO, err);
+return;
+}
+
+void qemu_set_offset(QEMUFile *f, off_t off, int whence)
+{
+Error *err = NULL;
+off_t ret;
+
+qemu_fflush(f);
+
+if (!qemu_file_is_writable(f)) {
+   f->buf_index = 0;
+   f->buf_size = 0;
+}
+
+ret = qio_channel_io_seek(f->ioc, off, whence, &err);
+if (ret == (off_t)-1) {
+qemu_file_set_error_obj(f, -EIO, err);
+}
+}
+
+off_t qemu_get_offset(QEMUFile *f)
+{
+Error *err = NULL;
+off_t ret;
+
+qemu_fflush(f);
+
+ret = qio_channel_io_seek(f->ioc, 0, SEEK_CUR, &err);
+if (ret == (off_t)-1) {
+qemu_file_set_error_obj(f, -EIO, err);
+}
+return ret;
+}
+
+
 void qemu_put_byte(QEMUFile *f, int v)
 {
 if (f->last_error) {
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d787c..33cfc07b81d1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -148,6 +148,9 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+void qemu_set_offset(QEMUFile *f, off_t off, int whence);
+off_t qemu_get_offset(QEMUFile *f);
+void qemu_put_buffer_at(QEMUFile *f, const uint8_t *buf, size_t buflen, off_t 
pos);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
-- 
2.34.1




[PATCH v2 11/11] analyze-migration.py: add initial support for fixed ram streams

2022-10-10 Thread Nikolay Borisov
This commit introduces the minimum code necessary to support parsing
migration streams with the 'fixed-ram' capability set. The only thing
really missing is the implementation of write_or_dump_fixed_ram(), which
deals with the '-x'/'-m' options.
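
An illustrative invocation against a file produced with the capability set
(the path is an example):

  $ scripts/analyze-migration.py -f /tmp/vm-state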

Signed-off-by: Nikolay Borisov 
---
 scripts/analyze-migration.py | 49 +---
 1 file changed, 46 insertions(+), 3 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index b82a1b0c58c4..9785a640fbf8 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -23,6 +23,7 @@
 import collections
 import struct
 import sys
+import math
 
 
 def mkdir_p(path):
@@ -119,11 +120,16 @@ def __init__(self, file, version_id, ramargs, 
section_key):
 self.file = file
 self.section_key = section_key
 self.TARGET_PAGE_SIZE = ramargs['page_size']
+self.TARGET_PAGE_BITS = math.log2(self.TARGET_PAGE_SIZE)
 self.dump_memory = ramargs['dump_memory']
 self.write_memory = ramargs['write_memory']
+self.fixed_ram = ramargs['fixed-ram']
 self.sizeinfo = collections.OrderedDict()
+self.bitmap_offset = collections.OrderedDict()
+self.pages_offset = collections.OrderedDict()
 self.data = collections.OrderedDict()
 self.data['section sizes'] = self.sizeinfo
+self.ram_read = False
 self.name = ''
 if self.write_memory:
 self.files = { }
@@ -140,7 +146,13 @@ def __str__(self):
 def getDict(self):
 return self.data
 
+def write_or_dump_fixed_ram(self):
+pass
+
 def read(self):
+if self.fixed_ram and self.ram_read:
+return
+
 # Read all RAM sections
 while True:
 addr = self.file.read64()
@@ -167,7 +179,25 @@ def read(self):
 f.truncate(0)
 f.truncate(len)
 self.files[self.name] = f
+
+if self.fixed_ram:
+bitmap_len = self.file.read32()
+# skip the pages_offset which we don't need
+offset = self.file.tell() + 8
+self.bitmap_offset[self.name] = offset
+offset = ((offset + bitmap_len + self.TARGET_PAGE_SIZE 
- 1) // self.TARGET_PAGE_SIZE) * self.TARGET_PAGE_SIZE
+self.pages_offset[self.name] = offset
+self.file.file.seek(offset + len)
+
 flags &= ~self.RAM_SAVE_FLAG_MEM_SIZE
+if self.fixed_ram:
+self.ram_read = True
+# now we should rewind to the ram page offset of the first
+# ram section
+if self.fixed_ram:
+if self.write_memory or self.dump_memory:
+self.write_or_dump_fixed_ram()
+return
 
 if flags & self.RAM_SAVE_FLAG_COMPRESS:
 if flags & self.RAM_SAVE_FLAG_CONTINUE:
@@ -208,7 +238,7 @@ def read(self):
 
 # End of RAM section
 if flags & self.RAM_SAVE_FLAG_EOS:
-break
+   return
 
 if flags != 0:
 raise Exception("Unknown RAM flags: %x" % flags)
@@ -521,6 +551,7 @@ def read(self, desc_only = False, dump_memory = False, 
write_memory = False):
 ramargs['page_size'] = self.vmsd_desc['page_size']
 ramargs['dump_memory'] = dump_memory
 ramargs['write_memory'] = write_memory
+ramargs['fixed-ram'] = False
 self.section_classes[('ram',0)][1] = ramargs
 
 while True:
@@ -528,8 +559,20 @@ def read(self, desc_only = False, dump_memory = False, 
write_memory = False):
 if section_type == self.QEMU_VM_EOF:
 break
 elif section_type == self.QEMU_VM_CONFIGURATION:
-section = ConfigurationSection(file)
-section.read()
+config_desc = self.vmsd_desc.get('configuration')
+if config_desc is not None:
+config = VMSDSection(file, 1, config_desc, 'configuration')
+config.read()
+caps = config.data.get("configuration/capabilities")
+if caps is not None:
+caps = caps.data["capabilities"]
+if type(caps) != list:
+caps = [caps]
+for i in caps:
+# chomp out string length
+cap = i.data[1:].decode("utf8")
+if cap == "fixed-ram":
+ramargs['fixed-ram'] = True
 elif section_type == self.QEMU_VM_SECTION_START or section_type == 
self.QEMU_VM_SECTION_FULL:
 section_id = file.read32()
 name = file.readstr()
-- 
2.34.1




[PATCH v2 09/11] migration: Refactor precopy ram loading code

2022-10-10 Thread Nikolay Borisov
To facilitate easier implementation of the 'fixed-ram' migration restore,
factor out the code responsible for parsing the ramblock headers. This
also makes ram_load_precopy easier to comprehend.

Signed-off-by: Nikolay Borisov 
---
 migration/ram.c | 142 +++-
 1 file changed, 80 insertions(+), 62 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 4f5ddaff356b..1dd68c221667 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4253,6 +4253,83 @@ void colo_flush_ram_cache(void)
 trace_colo_flush_ram_cache_end();
 }
 
+static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+int ret = 0;
+/* ADVISE is earlier, it shows the source has the postcopy capability on */
+bool postcopy_advised = postcopy_is_advised();
+
+assert(block);
+
+if (!qemu_ram_is_migratable(block)) {
+error_report("block %s should not be migrated !", block->idstr);
+ret = -EINVAL;
+}
+
+if (length != block->used_length) {
+Error *local_err = NULL;
+
+ret = qemu_ram_resize(block, length, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+/* For postcopy we need to check hugepage sizes match */
+if (postcopy_advised && migrate_postcopy_ram() &&
+block->page_size != qemu_host_page_size) {
+uint64_t remote_page_size = qemu_get_be64(f);
+if (remote_page_size != block->page_size) {
+error_report("Mismatched RAM page size %s "
+ "(local) %zd != %" PRId64, block->idstr,
+ block->page_size, remote_page_size);
+ret = -EINVAL;
+}
+}
+if (migrate_ignore_shared()) {
+hwaddr addr = qemu_get_be64(f);
+if (ramblock_is_ignored(block) &&
+block->mr->addr != addr) {
+error_report("Mismatched GPAs for block %s "
+ "%" PRId64 "!= %" PRId64, block->idstr,
+ (uint64_t)addr,
+ (uint64_t)block->mr->addr);
+ret = -EINVAL;
+}
+}
+ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr);
+
+return ret;
+}
+
+static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+int ret = 0;
+
+/* Synchronize RAM block list */
+while (!ret && total_ram_bytes) {
+char id[256];
+RAMBlock *block;
+ram_addr_t length;
+int len = qemu_get_byte(f);
+
+qemu_get_buffer(f, (uint8_t *)id, len);
+id[len] = 0;
+length = qemu_get_be64(f);
+
+block = qemu_ram_block_by_name(id);
+if (block) {
+ret = parse_ramblock(f, block, length);
+} else {
+error_report("Unknown ramblock \"%s\", cannot accept "
+ "migration", id);
+ret = -EINVAL;
+}
+total_ram_bytes -= length;
+}
+
+return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -4267,14 +4344,13 @@ static int ram_load_precopy(QEMUFile *f)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
-/* ADVISE is earlier, it shows the source has the postcopy capability on */
-bool postcopy_advised = postcopy_is_advised();
+
 if (!migrate_use_compression()) {
 invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
 }
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
-ram_addr_t addr, total_ram_bytes;
+ram_addr_t addr;
 void *host = NULL, *host_bak = NULL;
 uint8_t ch;
 
@@ -4345,65 +4421,7 @@ static int ram_load_precopy(QEMUFile *f)
 
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
-/* Synchronize RAM block list */
-total_ram_bytes = addr;
-while (!ret && total_ram_bytes) {
-RAMBlock *block;
-char id[256];
-ram_addr_t length;
-
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)id, len);
-id[len] = 0;
-length = qemu_get_be64(f);
-
-block = qemu_ram_block_by_name(id);
-if (block && !qemu_ram_is_migratable(block)) {
-error_report("block %s should not be migrated !", id);
-ret = -EINVAL;
-} else if (block) {
-if (length != block->used_length) {
-Error *local_err = NULL;
-
-ret = qemu_ram_resize(block, length,
-  &local_err);
-if (local_err) {
-error_report_err(local_err);
-}
-}
-/* For postcopy we need to check hugepage sizes match */
-if (postcopy_

Re: [PULL 28/54] configure: build ROMs with container-based cross compilers

2022-10-10 Thread Alex Bennée


Daniel P. Berrangé  writes:

> On Tue, Oct 04, 2022 at 02:01:12PM +0100, Alex Bennée wrote:
>> From: Paolo Bonzini 
>> 
>> s390-ccw remains a bit more complex, because the -march=z900 test is done
>> only for the native cross compiler.  Otherwise, all that is needed is
>> to pass the (now mandatory) target argument to write_target_makefile.
>> 
>> Signed-off-by: Paolo Bonzini 
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20220929114231.583801-29-alex.ben...@linaro.org>
>
> I'm not at all convinced this change was/is a good idea.
>
> First of all, it causes 'make' to now download about 1 GB of
> container images
>
>   $ ./configure --target-list=x86_64-softmmu
>   $ make
>   ...snip...
>   BUILD   debian-powerpc-test-cross
>   Trying to pull registry.gitlab.com/qemu-project/qemu/qemu/debian-powerpc-test-cross:latest...
>   Getting image source signatures
>   Copying blob 2a205c8a1d36 [=>] 12.4MiB / 257.2MiB
>
>   ...
>   ...snip...
>   
> Despite downloading this image, it then proceeded to rebuild the
> image from scratch, requiring another few hundred MB of downloads
> of dpkgs. This time the download was without progress information
> until it entirely failed due to a dead Debian mirror server, needing
> a retry.
>
> It then went on to download an s390x image which seems to have
> two layers, each with 360 MB.
>
>   BUILD   debian-s390x-cross
> Trying to pull registry.gitlab.com/qemu-project/qemu/qemu/debian-s390x-cross:latest...
> Getting image source signatures
> Copying blob fc8d65e34cd5 [>-] 12.0MiB / 360.2MiB
> Copying blob bd159e379b3b skipped: already exists
> Copying blob 13224e2971af [>-] 12.2MiB / 366.5MiB
>
> So overall it was more than 1 GB of downloads when typing 'make'
>
> I wasn't too amused by seeing this data downloaded, given that
> I'm usually running off a 4G mobile connection, and it took a
> very long time.

Yikes, sorry I didn't notice that (probably because I always have most
of the containers built).

I was hoping the next set of patches would reduce the total re-build
time to just the mirror operation by dumping docker.py and any caching
that breaks.

> The progress information printed by docker when downloading
> the images splatters all over the output meson displays; when
> doing a parallel make it makes everything unintelligible.
>
>
> Finally, I had requested only building x86_64, so we shouldn't
> be doing anything related to ppc or s390 at all.
>
> AFAICT, it enables this downloading unconditionally merely by
> having 'docker'/'podman' binaries installed, if you don't
> otherwise have cross compilers present.
>
> I'd really not want to see any of this stuff downloaded without
> an explicit opt-in choice at configure time.
>
> I'm also a little concerned at what happens if we have to stop
> publishing the containers at registry.gitlab.com in future. Are
> we going to break the default 'make' for existing released QEMU
> tarballs ?

We can easily move the registry around. The aim of this work is to
eventually stop local re-builds for most people.
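
In the meantime, assuming the existing containers knob still gates this
path, the downloads can already be avoided explicitly at configure time,
e.g.:

  ./configure --target-list=x86_64-softmmu --disable-containers

With containers disabled, probe_target_compiler should only succeed for
cross compilers that are actually installed, so the ROM builds get
skipped instead of pulling images.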

>
> Generally we've only relied on the gitlab infra for our CI
> testing, so we have been free to change infra or alter the
> way we publish images at any time, without risk of impact on
> the released tarballs.
>
> This isn't a theoretical problem, because GitLab has announced
> their intention to limit storage usage in gitlab.com, and even
> having joined the Open Source Program, our quota is only increased
> from 5 GB to 25 GB.  I'd be concerned we're at risk of exceeding
> that 25 GB limit, when they start to enforce it, requiring us to
> move container image host to somewhere else such as quay.io
>
>
>> diff --git a/configure b/configure
>> index c175650eb9..a54e17aca9 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2152,7 +2152,7 @@ probe_target_compiler() {
>>  target_ranlib=
>>  target_strip=
>>fi
>> -  test -n "$target_cc"
>> +  test -n "$target_cc" || test -n "$container_image"
>>  }
>>  
>>  write_target_makefile() {
>> @@ -2307,7 +2307,7 @@ if test "$targetos" != "darwin" && test "$targetos" != "sunos" && \
>>  config_mak=pc-bios/optionrom/config.mak
>>  echo "# Automatically generated by configure - do not modify" > $config_mak
>>  echo "TOPSRC_DIR=$source_path" >> $config_mak
>> -write_target_makefile >> $config_mak
>> +write_target_makefile pc-bios/optionrom/all >> $config_mak
>>  fi
>>  
>>  if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then
>> @@ -2315,25 +2315,31 @@ if test "$softmmu" = yes && probe_target_compiler ppc-softmmu; then
>>  config_mak=pc-bios/vof/config.mak
>>  echo "# Automatically generated by configure - do not modify" > $config_mak
>>  echo "SRC_DIR=$source_path/pc-bios/vof" >> $config_mak
>> -write_target_makefile >> $config_mak
>> +write_target_makefile pc-bios/vof/all >> 

Re: [PATCH v6 12/13] blkio: implement BDRV_REQ_REGISTERED_BUF optimization

2022-10-10 Thread Alberto Faria
On Thu, Oct 6, 2022 at 10:35 PM Stefan Hajnoczi  wrote:
> Avoid bounce buffers when QEMUIOVector elements are within previously
> registered bdrv_register_buf() buffers.
>
> The idea is that emulated storage controllers will register guest RAM
> using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O
> requests. Therefore no blkio_map_mem_region() calls are necessary in the
> performance-critical I/O code path.
>
> This optimization doesn't apply if the I/O buffer is internally
> allocated by QEMU (e.g. qcow2 metadata). There we still take the slow
> path because BDRV_REQ_REGISTERED_BUF is not set.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/blkio.c | 183 +-
>  1 file changed, 180 insertions(+), 3 deletions(-)
>
> diff --git a/block/blkio.c b/block/blkio.c
> index 9a79789a39..5ce61d5d94 100644
> --- a/block/blkio.c
> +++ b/block/blkio.c
> @@ -11,9 +11,13 @@
>  #include "qemu/osdep.h"
>  #include <blkio.h>
>  #include "block/block_int.h"
> +#include "exec/memory.h"
> +#include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
>  #include "qapi/error.h"
> +#include "qemu/error-report.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qemu/module.h"
> +#include "exec/memory.h" /* for ram_block_discard_disable() */
>
>  /*
>   * Keep the QEMU BlockDriver names identical to the libblkio driver names.
> @@ -73,6 +77,12 @@ typedef struct {
>
>  /* Can we skip adding/deleting blkio_mem_regions? */
>  bool needs_mem_regions;
> +
> +/* Are file descriptors necessary for blkio_mem_regions? */
> +bool needs_mem_region_fd;
> +
> +/* Are madvise(MADV_DONTNEED)-style operations unavailable? */
> +bool mem_regions_pinned;
>  } BDRVBlkioState;
>
>  /* Called with s->bounce_lock held */
> @@ -347,7 +357,8 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
>  .coroutine = qemu_coroutine_self(),
>  };
>  BDRVBlkioState *s = bs->opaque;
> -bool use_bounce_buffer = s->needs_mem_regions;
> +bool use_bounce_buffer =
> +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>  BlkioBounceBuf bounce;
>  struct iovec *iov = qiov->iov;
>  int iovcnt = qiov->niov;
> @@ -390,7 +401,8 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
>  .coroutine = qemu_coroutine_self(),
>  };
>  BDRVBlkioState *s = bs->opaque;
> -bool use_bounce_buffer = s->needs_mem_regions;
> +bool use_bounce_buffer =
> +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF);
>  BlkioBounceBuf bounce;
>  struct iovec *iov = qiov->iov;
>  int iovcnt = qiov->niov;
> @@ -473,6 +485,130 @@ static void blkio_io_unplug(BlockDriverState *bs)
>  }
>  }
>
> +typedef enum {
> +BMRR_OK,
> +BMRR_SKIP,
> +BMRR_FAIL,
> +} BlkioMemRegionResult;
> +
> +/*
> + * Produce a struct blkio_mem_region for a given address and size.
> + *
> + * This function produces identical results when called multiple times with the
> + * same arguments. This property is necessary because blkio_unmap_mem_region()
> + * must receive the same struct blkio_mem_region field values that were passed
> + * to blkio_map_mem_region().
> + */
> +static BlkioMemRegionResult
> +blkio_mem_region_from_host(BlockDriverState *bs,
> +   void *host, size_t size,
> +   struct blkio_mem_region *region,
> +   Error **errp)
> +{
> +BDRVBlkioState *s = bs->opaque;
> +int fd = -1;
> +ram_addr_t fd_offset = 0;
> +
> +if (((uintptr_t)host | size) % s->mem_region_alignment) {
> +error_setg(errp, "unaligned buf %p with size %zu", host, size);
> +return BMRR_FAIL;
> +}
> +
> +/* Attempt to find the fd for the underlying memory */
> +if (s->needs_mem_region_fd) {
> +RAMBlock *ram_block;
> +RAMBlock *end_block;
> +ram_addr_t offset;
> +
> +/*
> + * bdrv_register_buf() is called with the BQL held so mr lives at least
> + * until this function returns.
> + */
> +ram_block = qemu_ram_block_from_host(host, false, &fd_offset);
> +if (ram_block) {
> +fd = qemu_ram_get_fd(ram_block);
> +}
> +if (fd == -1) {
> +/*
> + * Ideally every RAMBlock would have an fd. pc-bios and other
> + * things don't. Luckily they are usually not I/O buffers and we
> + * can just ignore them.
> + */
> +return BMRR_SKIP;
> +}
> +
> +/* Make sure the fd covers the entire range */
> +end_block = qemu_ram_block_from_host(host + size - 1, false, &offset);
> +if (ram_block != end_block) {
> +error_setg(errp, "registered buffer at %p with size %zu extends "
> +   "beyond RAMBlock", host, size);
> +return BMRR_FAIL;
> +}
> +}
> +
> +*regi
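
For context, a rough sketch of how a storage controller is expected to
drive this fast path. The names follow the QEMU block layer, but the
exact calls and signatures here are illustrative rather than taken from
the series:

    /* At device setup time, with the BQL held: register guest RAM once. */
    blk_register_buf(blk, guest_ram, guest_ram_size);

    /* In the I/O path: the flag promises that every qiov element lies
     * inside a previously registered buffer, so the driver can skip
     * bounce buffers and per-request blkio_map_mem_region() calls. */
    ret = blk_co_preadv(blk, offset, qiov->size, qiov,
                        BDRV_REQ_REGISTERED_BUF);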

Re: [PATCH] include/qemu/atomic128: Support 16-byte atomic read/write for Intel AVX

2022-10-10 Thread Richard Henderson

On 10/10/22 02:49, Peter Maydell wrote:

On Sat, 8 Oct 2022 at 16:38, Richard Henderson
 wrote:


Intel has now given guarantees about the atomicity of SSE read
and write instructions on cpus supporting AVX.  We can use these
instead of the much slower cmpxchg16b.

Derived from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

Signed-off-by: Richard Henderson 



+/*
+ * The xgetbv instruction is not available to older versions of
+ * the assembler, so we encode the instruction manually.
+ */
+asm(".byte 0x0f, 0x01, 0xd0" : "=a" (xcrl), "=d" (xcrh) : "c" (0));


This would make the third place in the tree where we hand-code
this asm instruction (we already do it in the xgetbv() function
in target/i386/hvf/x86_cpuid.c and opencoded in tcg_target_init()):
can we abstract this out somehow, please?

(There is also a just-written-out "xgetbv" in init_cpuid_cache(),
but that one's guarded by ifdefs so presumably OK.)


It's also possible that the Xcode revision that didn't know xgetbv is now unsupported. 
Something to check, I suppose.



r~
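
A shared helper of the kind being asked for might look like the
following. This is only a sketch; where it would live and what it would
be called are open questions, but the encoding matches the asm quoted
above:

    /* Read an extended control register; hand-encoded because older
     * assemblers do not know the xgetbv mnemonic. */
    static inline uint64_t xgetbv(uint32_t xcr)
    {
        uint32_t lo, hi;
        asm(".byte 0x0f, 0x01, 0xd0" : "=a"(lo), "=d"(hi) : "c"(xcr));
        return ((uint64_t)hi << 32) | lo;
    }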




Re: [PATCH] hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3

2022-10-10 Thread Richard Henderson

On 10/10/22 06:16, Peter Maydell wrote:

The doc also says that we (as fake EL3) should be setting
SMCR_EL3.LEN to the same value for all CPUs. Currently we do
do that, but it's always the reset value of 0. Richard: does
that have any odd effects (I have a feeling it clamps the
VL to the minimum supported value)?


Yes, it clamps to minimum.


Should we be setting SMCR_EL3.LEN to the max supported value here?


We can set it to 15 universally without consequence.
It will be clamped to the max supported value elsewhere
when interpreting the combined EL[123] fields.


r~
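
Concretely, the change being discussed would amount to something like
this in the hw/arm/boot.c EL3 setup, assuming the existing field and
mask names; a sketch, not the final patch:

    if (cpu_isar_feature(aa64_sme, cpu)) {
        env->cp15.cptr_el[3] |= R_CPTR_EL3_ESM_MASK;
        env->cp15.scr_el3 |= SCR_ENTP2;
        /* Per the above, 15 is safe: it is clamped to the maximum
         * supported vector length when the fields are interpreted. */
        env->vfp.smcr_el[3] = 0xf;
    }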



Re: [RISU PATCH 1/5] risu: Use alternate stack

2022-10-10 Thread Richard Henderson

On 9/17/22 00:43, Song Gao wrote:

We can use an alternate stack, so that we can use the sp register as an
input/output register.
I have tested the aarch64 and LoongArch architectures.

Signed-off-by: Song Gao
---
  risu.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)


Good idea.

Reviewed-by: Richard Henderson 


r~
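
For readers without the patch to hand, the mechanism is plain
sigaltstack(2) plus SA_ONSTACK. A minimal sketch, with an arbitrary
stack size and without the patch's exact structure:

    static char alt_stack[64 * 1024];

    static void setup_alt_stack(void)
    {
        stack_t ss = {
            .ss_sp = alt_stack,
            .ss_size = sizeof(alt_stack),
            .ss_flags = 0,
        };
        if (sigaltstack(&ss, NULL) < 0) {
            perror("sigaltstack");
            exit(EXIT_FAILURE);
        }
        /* Handlers registered with SA_ONSTACK then run on this stack,
         * leaving sp free to hold test input/output values. */
    }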



[PATCH 1/3] linux-user: i386/signal: move fpstate at the end of the 32-bit frames

2022-10-10 Thread Paolo Bonzini
Recent versions of Linux moved the 32-bit fpstate towards the end of the
frame, so that the variable-sized xsave data does not overwrite the
(ABI-defined) extramask[] field.  Follow suit in QEMU.

Signed-off-by: Paolo Bonzini 
---
 linux-user/i386/signal.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 4372621a4d..76317a3d16 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -163,9 +163,16 @@ struct sigframe {
     abi_ulong pretcode;
     int sig;
     struct target_sigcontext sc;
-    struct target_fpstate fpstate;
+    /*
+     * The actual fpstate is placed after retcode[] below, to make
+     * room for the variable-sized xsave data.  The older unused fpstate
+     * has to be kept to avoid changing the offset of extramask[], which
+     * is part of the ABI.
+     */
+    struct target_fpstate fpstate_unused;
     abi_ulong extramask[TARGET_NSIG_WORDS-1];
     char retcode[8];
+    struct target_fpstate fpstate;
 };
 
 struct rt_sigframe {
@@ -175,8 +182,8 @@ struct rt_sigframe {
     abi_ulong puc;
     struct target_siginfo info;
     struct target_ucontext uc;
-    struct target_fpstate fpstate;
     char retcode[8];
+    struct target_fpstate fpstate;
 };
 
 #else
-- 
2.37.3
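
The ABI point in the commit message could also be pinned down with a
compile-time check. Illustrative only, not part of the patch, and it
additionally assumes no padding between the two fields:

    /* The dead slot must keep extramask[] at its pre-existing offset. */
    QEMU_BUILD_BUG_ON(offsetof(struct sigframe, extramask) !=
                      offsetof(struct sigframe, fpstate_unused) +
                      sizeof(struct target_fpstate));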




[PATCH 2/3] linux-user: i386/signal: support FXSAVE fpstate on 32-bit emulation

2022-10-10 Thread Paolo Bonzini
Linux can use FXSAVE to save/restore XMM registers even on 32-bit
systems.  This requires some care in order to keep the FXSAVE area
aligned to 16 bytes; for this reason, get_sigframe is changed to
pass the offset into the FXSAVE area rather than the full frame
size.

Signed-off-by: Paolo Bonzini 
---
 linux-user/i386/signal.c | 129 +++
 1 file changed, 77 insertions(+), 52 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 76317a3d16..53c1c05581 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -39,29 +39,7 @@ struct target_xmmreg {
 uint32_t element[4];
 };
 
-struct target_fpstate_32 {
-/* Regular FPU environment */
-uint32_t cw;
-uint32_t sw;
-uint32_t tag;
-uint32_t ipoff;
-uint32_t cssel;
-uint32_t dataoff;
-uint32_t datasel;
-struct target_fpreg st[8];
-uint16_t  status;
-uint16_t  magic;  /* 0xffff = regular FPU data only */
-
-/* FXSR FPU environment */
-uint32_t _fxsr_env[6];   /* FXSR FPU env is ignored */
-uint32_t mxcsr;
-uint32_t reserved;
-struct target_fpxreg fxsr_st[8]; /* FXSR FPU reg data is ignored */
-struct target_xmmreg xmm[8];
-uint32_t padding[56];
-};
-
-struct target_fpstate_64 {
+struct target_fpstate_fxsave {
 /* FXSAVE format */
 uint16_t cw;
 uint16_t sw;
@@ -75,11 +53,36 @@ struct target_fpstate_64 {
 uint32_t xmm_space[64];
 uint32_t reserved[24];
 };
+#define TARGET_FXSAVE_SIZE   sizeof(struct target_fpstate_fxsave)
+QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
+
+struct target_fpstate_32 {
+/* Regular FPU environment */
+uint32_t cw;
+uint32_t sw;
+uint32_t tag;
+uint32_t ipoff;
+uint32_t cssel;
+uint32_t dataoff;
+uint32_t datasel;
+struct target_fpreg st[8];
+uint16_t  status;
+uint16_t  magic;  /* 0xffff = regular FPU data only */
+struct target_fpstate_fxsave fxsave;
+};
+
+/*
+ * For simplicity, setup_frame aligns struct target_fpstate_32 to
+ * 16 bytes, so ensure that the FXSAVE area is also aligned.
+ */
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_32, fxsave) & 15);
 
 #ifndef TARGET_X86_64
 # define target_fpstate target_fpstate_32
+# define TARGET_FPSTATE_FXSAVE_OFFSET offsetof(struct target_fpstate_32, fxsave)
 #else
-# define target_fpstate target_fpstate_64
+# define target_fpstate target_fpstate_fxsave
+# define TARGET_FPSTATE_FXSAVE_OFFSET 0
 #endif
 
 struct target_sigcontext_32 {
@@ -172,8 +175,16 @@ struct sigframe {
 struct target_fpstate fpstate_unused;
 abi_ulong extramask[TARGET_NSIG_WORDS-1];
 char retcode[8];
-struct target_fpstate fpstate;
+
+/*
+ * This field must be 16-byte aligned in memory.  Applying QEMU_ALIGNED
+ * to it ensures that the base of the frame has an appropriate alignment
+ * too.
+ */
+struct target_fpstate fpstate QEMU_ALIGNED(8);
 };
+#define TARGET_SIGFRAME_FXSAVE_OFFSET (\
+offsetof(struct sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 
 struct rt_sigframe {
 abi_ulong pretcode;
@@ -183,25 +194,35 @@ struct rt_sigframe {
 struct target_siginfo info;
 struct target_ucontext uc;
 char retcode[8];
-struct target_fpstate fpstate;
+struct target_fpstate fpstate QEMU_ALIGNED(8);
 };
-
+#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \
+offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 #else
 
 struct rt_sigframe {
 abi_ulong pretcode;
 struct target_ucontext uc;
 struct target_siginfo info;
-struct target_fpstate fpstate;
+struct target_fpstate fpstate QEMU_ALIGNED(16);
 };
-
+#define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \
+offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
 #endif
 
 /*
  * Set up a signal frame.
  */
 
-/* XXX: save x87 state */
+static void fxsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+  abi_ulong fxsave_addr)
+{
+/* fxsave_addr must be 16 byte aligned for fxsave */
+assert(!(fxsave_addr & 0xf));
+
+cpu_x86_fxsave(env, fxsave_addr);
+}
+
 static void setup_sigcontext(struct target_sigcontext *sc,
 struct target_fpstate *fpstate, CPUX86State *env, abi_ulong mask,
 abi_ulong fpstate_addr)
@@ -233,13 +254,14 @@ static void setup_sigcontext(struct target_sigcontext *sc,
 
 cpu_x86_fsave(env, fpstate_addr, 1);
 fpstate->status = fpstate->sw;
-magic = 0xffff;
+if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
+magic = 0xffff;
+} else {
+fxsave_sigcontext(env, &fpstate->fxsave,
+  fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+magic = 0;
+}
 __put_user(magic, &fpstate->magic);
-__put_user(fpstate_addr, &sc->fpstate);
-
-/* non-iBCS2 extensions.. */
-__put_user(mas
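
A short note on the alignment trick in get_sigframe above (illustrative,
not text from the patch): instead of aligning the frame base, the code
aligns the FXSAVE area itself and derives the base from it, so the area
ends up 16-byte aligned no matter where the offset falls:

    /* (esp - TARGET_FXSAVE_SIZE) & -16ul is 16-byte aligned, and
     * frame + fxsave_offset lands exactly back on that address. */
    abi_ulong frame = ((esp - TARGET_FXSAVE_SIZE) & -16ul) - fxsave_offset;
    assert(((frame + fxsave_offset) & 0xf) == 0);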

[PATCH 0/3] linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate

2022-10-10 Thread Paolo Bonzini
These three patches add support for x86 XSAVE/XRSTOR signal frames in
linux-user emulation.  This ensures that signals save and restore the
extended save states as well.

For 32-bit emulation not even FXSAVE was used, even though the
signal frame supports it.  Therefore, patch 2 extends 32-bit emulation
to use FXSAVE/FXRSTOR if the FXSR bit is set in CPUID.  If it is not
set, QEMU will use FSAVE/FRSTOR as usual; note that recent builds of
glibc most likely will check for the FXSR bit even on 32-bit builds, and
will refuse to start with a "CPU ISA level is lower than required"
error.

Paolo

Paolo Bonzini (3):
  linux-user: i386/signal: move fpstate at the end of the 32-bit frames
  linux-user: i386/signal: support FXSAVE fpstate on 32-bit emulation
  linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate

 linux-user/i386/signal.c | 227 ++-
 target/i386/cpu.c|   2 +-
 target/i386/cpu.h|   3 +
 target/i386/tcg/fpu_helper.c |  64 ++
 4 files changed, 210 insertions(+), 86 deletions(-)

-- 
2.37.3




[PATCH 3/3] linux-user: i386/signal: support XSAVE/XRSTOR for signal frame fpstate

2022-10-10 Thread Paolo Bonzini
Add support for saving/restoring extended save states when signals
are delivered.  This allows using AVX, MPX or PKRU registers in
signal handlers.  The patch follows the Linux ABI.

Signed-off-by: Paolo Bonzini 
---
 linux-user/i386/signal.c | 115 +--
 target/i386/cpu.c|   2 +-
 target/i386/cpu.h|   3 +
 target/i386/tcg/fpu_helper.c |  64 +++
 4 files changed, 138 insertions(+), 46 deletions(-)

diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 53c1c05581..7dd52eb5dc 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -39,6 +39,15 @@ struct target_xmmreg {
 uint32_t element[4];
 };
 
+struct target_fpx_sw_bytes {
+uint32_t magic1;
+uint32_t extended_size;
+uint64_t xfeatures;
+uint32_t xstate_size;
+uint32_t reserved[7];
+};
+QEMU_BUILD_BUG_ON(sizeof(struct target_fpx_sw_bytes) != 12*4);
+
 struct target_fpstate_fxsave {
 /* FXSAVE format */
 uint16_t cw;
@@ -51,10 +60,13 @@ struct target_fpstate_fxsave {
 uint32_t mxcsr_mask;
 uint32_t st_space[32];
 uint32_t xmm_space[64];
-uint32_t reserved[24];
+uint32_t hw_reserved[12];
+struct target_fpx_sw_bytes sw_reserved;
+uint8_t xfeatures[];
 };
 #define TARGET_FXSAVE_SIZE   sizeof(struct target_fpstate_fxsave)
 QEMU_BUILD_BUG_ON(TARGET_FXSAVE_SIZE != 512);
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpstate_fxsave, sw_reserved) != 464);
 
 struct target_fpstate_32 {
 /* Regular FPU environment */
@@ -214,13 +226,39 @@ struct rt_sigframe {
  * Set up a signal frame.
  */
 
-static void fxsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
-  abi_ulong fxsave_addr)
+static void xsave_sigcontext(CPUX86State *env, struct target_fpstate_fxsave *fxsave,
+ abi_ulong fxsave_addr)
 {
-/* fxsave_addr must be 16 byte aligned for fxsave */
-assert(!(fxsave_addr & 0xf));
+if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
+/* fxsave_addr must be 16 byte aligned for fxsave */
+assert(!(fxsave_addr & 0xf));
 
-cpu_x86_fxsave(env, fxsave_addr);
+cpu_x86_fxsave(env, fxsave_addr);
+__put_user(0, &fxsave->sw_reserved.magic1);
+} else {
+uint32_t xstate_size = xsave_area_size(env->xcr0, false);
+uint32_t xfeatures_size = xstate_size - TARGET_FXSAVE_SIZE;
+
+/*
+ * extended_size is the offset from fpstate_addr to right after the end
+ * of the extended save states.  On 32-bit that includes the legacy
+ * FSAVE area.
+ */
+uint32_t extended_size = TARGET_FPSTATE_FXSAVE_OFFSET
++ xstate_size + FP_XSTATE_MAGIC2_SIZE;
+
+/* fxsave_addr must be 64 byte aligned for xsave */
+assert(!(fxsave_addr & 0x3f));
+
+/* Zero the header, XSAVE *adds* features to an existing save state.  */
+memset(fxsave->xfeatures, 0, 64);
+cpu_x86_xsave(env, fxsave_addr);
+__put_user(FP_XSTATE_MAGIC1, &fxsave->sw_reserved.magic1);
+__put_user(extended_size, &fxsave->sw_reserved.extended_size);
+__put_user(env->xcr0, &fxsave->sw_reserved.xfeatures);
+__put_user(xstate_size, &fxsave->sw_reserved.xstate_size);
+__put_user(FP_XSTATE_MAGIC2, (uint32_t *) &fxsave->xfeatures[xfeatures_size]);
+}
 }
 
 static void setup_sigcontext(struct target_sigcontext *sc,
@@ -257,8 +295,8 @@ static void setup_sigcontext(struct target_sigcontext *sc,
 if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
 magic = 0xffff;
 } else {
-fxsave_sigcontext(env, &fpstate->fxsave,
-  fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
+xsave_sigcontext(env, &fpstate->fxsave,
+ fpstate_addr + TARGET_FPSTATE_FXSAVE_OFFSET);
 magic = 0;
 }
 __put_user(magic, &fpstate->magic);
@@ -291,7 +329,7 @@ static void setup_sigcontext(struct target_sigcontext *sc,
 __put_user((uint16_t)0, &sc->fs);
 __put_user(env->segs[R_SS].selector, &sc->ss);
 
-fxsave_sigcontext(env, fpstate, fpstate_addr);
+xsave_sigcontext(env, fpstate, fpstate_addr);
 #endif
 
 __put_user(fpstate_addr, &sc->fpstate);
@@ -332,8 +370,12 @@ get_sigframe(struct target_sigaction *ka, CPUX86State *env, size_t fxsave_offset
 
 if (!(env->features[FEAT_1_EDX] & CPUID_FXSR)) {
 return (esp - (fxsave_offset + TARGET_FXSAVE_SIZE)) & -8ul;
-} else {
+} else if (!(env->features[FEAT_1_ECX] & CPUID_EXT_XSAVE)) {
 return ((esp - TARGET_FXSAVE_SIZE) & -16ul) - fxsave_offset;
+} else {
+size_t xstate_size =
+   xsave_area_size(env->xcr0, false) + FP_XSTATE_MAGIC2_SIZE;
+return ((esp - xstate_size) & -64ul) - fxsave_offset;
 }
 }
 
@@ -437,7 +479,11 @@ void setup_rt_frame(int sig, struct target_sigaction *ka,
 }
 
 /* Create the ucontext.  */
-__put_user
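
Putting the three patches together, the resulting 32-bit frame layout is
roughly the following. A sketch as read from the series, with offsets
depending on xcr0:

    sp -> pretcode, sig
          sc                    (sc->fpstate points at fpstate below)
          fpstate_unused        (dead slot keeping extramask[] at its ABI offset)
          extramask[], retcode[8]
          fpstate               (16/64-byte aligned)
            legacy FSAVE area   (magic = 0 when FXSR/XSAVE data follows)
            FXSAVE area         (512 bytes; sw_reserved.magic1 = FP_XSTATE_MAGIC1)
            extended XSAVE data (size from xsave_area_size(env->xcr0, false))
            FP_XSTATE_MAGIC2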

Re: [PATCH v2 1/3] hw/misc/latching_switch: Add a simple latching_switch device

2022-10-10 Thread Peter Maydell
On Tue, 20 Sept 2022 at 19:46, Jian Zhang  wrote:
>
> Implement a simple latching switch device.
>
> The latching switch is a switch that can be turned on and off.
> When the new input state matches the trigger edge, the switch
> state will be toggled.
>
> This device provides 2 properties `state(bool)` and `trigger-edge(string)`,
> and 2 gpios `input` and `output`.
>
> The `state` property is the initial state of the switch, and the
> `trigger-edge` property is the edge that will trigger the switch state
> to be toggled; the value can be `rising`, `falling` or `both`.

> +static void toggle_output(LatchingSwitchState *s)
> +{
> +s->state = !(s->state);
> +qemu_set_irq(s->output, !!(s->state));

s->state is a bool so this !! is unnecessary.

> +}
> +
> +static void input_handler(void *opaque, int line, int new_state)
> +{
> +LatchingSwitchState *s = LATCHING_SWITCH(opaque);
> +
> +assert(line == 0);
> +
> +if (s->trigger_edge == TRIGGER_EDGE_FALLING && new_state == 0) {
> +toggle_output(s);

This won't have the correct behaviour, because the device
on the other end of this input line might call
qemu_set_irq() on it twice in a row with new_state == 0 both times.
It's only a falling edge if new_state is 0 and the old
input line state was not 0.

If you need to detect edges then you need to record the state
of the input line within LatchingSwitchState.
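
i.e. something along these lines (untested sketch; the name of the
recorded-state field is illustrative):

    static void input_handler(void *opaque, int line, int new_state)
    {
        LatchingSwitchState *s = LATCHING_SWITCH(opaque);
        bool old_state = s->input_state;

        assert(line == 0);
        s->input_state = (new_state != 0);

        if (s->input_state == old_state) {
            return; /* no edge at all */
        }
        if (s->trigger_edge == TRIGGER_EDGE_BOTH ||
            (s->trigger_edge == TRIGGER_EDGE_RISING && s->input_state) ||
            (s->trigger_edge == TRIGGER_EDGE_FALLING && !s->input_state)) {
            toggle_output(s);
        }
    }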

> +} else if (s->trigger_edge == TRIGGER_EDGE_RISING && new_state == 1) {
> +toggle_output(s);
> +} else if (s->trigger_edge == TRIGGER_EDGE_BOTH) {
> +toggle_output(s);
> +}
> +}
> +
> +static void latching_switch_reset(DeviceState *dev)
> +{
> +LatchingSwitchState *s = LATCHING_SWITCH(dev);
> +/* reset to off */
> +s->state = false;
> +/* reset to falling edge trigger */
> +s->trigger_edge = TRIGGER_EDGE_FALLING;

trigger_edge isn't guest-visible runtime modifiable state, it's how
the device or board configures the latch when it creates it, right?
Configuration shouldn't be reset, because if the board creates a
rising-edge switch, it should stay a rising edge switch even if the
guest power-cycles itself.

(If the QOM property can be changed at runtime things get more
complex, but in this case it can't be.)

> +}
> +
> +static const VMStateDescription vmstate_latching_switch = {
> +.name = TYPE_LATCHING_SWITCH,
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_BOOL(state, LatchingSwitchState),
> +VMSTATE_U8(trigger_edge, LatchingSwitchState),

trigger_edge isn't runtime changeable so it doesn't need to be
saved in the vmstate.

> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static void latching_switch_realize(DeviceState *dev, Error **errp)
> +{
> +LatchingSwitchState *s = LATCHING_SWITCH(dev);
> +
> +/* init a input io */
> +qdev_init_gpio_in(dev, input_handler, 1);
> +
> +/* init a output io */
> +qdev_init_gpio_out(dev, &(s->output), 1);
> +}
> +
> +static void latching_switch_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +dc->desc = "Latching Switch";
> +dc->vmsd = &vmstate_latching_switch;
> +dc->reset = latching_switch_reset;
> +dc->realize = latching_switch_realize;
> +set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);

This is definitely not a display device :-)

> +}
> +
> +static void latching_switch_get_state(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)
> +{
> +LatchingSwitchState *s = LATCHING_SWITCH(obj);
> +bool value = s->state;
> +
> +visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void latching_switch_set_state(Object *obj, Visitor *v, const char *name,
> +  void *opaque, Error **errp)

What's the requirement for being able to set the output state via a
QOM property ?

> +{
> +LatchingSwitchState *s = LATCHING_SWITCH(obj);
> +bool value;
> +Error *err = NULL;
> +
> +visit_type_bool(v, name, &value, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +if (value != s->state) {
> +toggle_output(s);

This will call qemu_set_irq() on the output, which isn't a valid thing
to do during board creation or in certain stages of reset.

If you need to handle "the board can create a switch which starts off
with its output asserted", that can be done, but it's a little more
complicated (it involves implementing your reset handler as 3-phase-reset).

It looks from patch 3 like your use case in the aspeed board doesn't
require this, so my suggestion would be simply to not implement it:
make the switch always start with the output state being 0.

> +}
> +}
> +static void latching_switch_get_trigger_edge(Object *obj, Visitor *v,

Missing newline between the two functions.

> + 

[PULL 00/28] target-arm queue

2022-10-10 Thread Peter Maydell
Hi; this is the latest target-arm queue; most of this is a refactoring
patchset from RTH for the arm page-table-walk emulation.

thanks
-- PMM

The following changes since commit f1d33f55c47dfdaf8daacd618588ad3ae4c452d1:

  Merge tag 'pull-testing-gdbstub-plugins-gitdm-061022-3' of https://github.com/stsquad/qemu into staging (2022-10-06 07:11:56 -0400)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20221010

for you to fetch changes up to 915f62844cf62e428c7c178149b5ff1cbe129b07:

  docs/system/arm/emulation.rst: Report FEAT_GTG support (2022-10-10 14:52:25 +0100)


target-arm queue:
 * Retry KVM_CREATE_VM call if it fails EINTR
 * allow setting SCR_EL3.EnTP2 when FEAT_SME is implemented
 * docs/nuvoton: Update URL for images
 * refactoring of page table walk code
 * hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3
 * Don't allow guest to use unimplemented granule sizes
 * Report FEAT_GTG support


Jerome Forissier (2):
  target/arm: allow setting SCR_EL3.EnTP2 when FEAT_SME is implemented
  hw/arm/boot: set CPTR_EL3.ESM and SCR_EL3.EnTP2 when booting Linux with EL3

Joel Stanley (1):
  docs/nuvoton: Update URL for images

Peter Maydell (4):
  target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR
  target/arm: Don't allow guest to use unimplemented granule sizes
  target/arm: Use ARMGranuleSize in ARMVAParameters
  docs/system/arm/emulation.rst: Report FEAT_GTG support

Richard Henderson (21):
  target/arm: Split s2walk_secure from ipa_secure in get_phys_addr
  target/arm: Make the final stage1+2 write to secure be unconditional
  target/arm: Add is_secure parameter to get_phys_addr_lpae
  target/arm: Fix S2 disabled check in S1_ptw_translate
  target/arm: Add is_secure parameter to regime_translation_disabled
  target/arm: Split out get_phys_addr_with_secure
  target/arm: Add is_secure parameter to v7m_read_half_insn
  target/arm: Add TBFLAG_M32.SECURE
  target/arm: Merge regime_is_secure into get_phys_addr
  target/arm: Add is_secure parameter to do_ats_write
  target/arm: Fold secure and non-secure a-profile mmu indexes
  target/arm: Reorg regime_translation_disabled
  target/arm: Drop secure check for HCR.TGE vs SCTLR_EL1.M
  target/arm: Introduce arm_hcr_el2_eff_secstate
  target/arm: Hoist read of *is_secure in S1_ptw_translate
  target/arm: Remove env argument from combined_attrs_fwb
  target/arm: Pass HCR to attribute subroutines.
  target/arm: Fix ATS12NSO* from S PL1
  target/arm: Split out get_phys_addr_disabled
  target/arm: Fix cacheattr in get_phys_addr_disabled
  target/arm: Use tlb_set_page_full

 docs/system/arm/emulation.rst |   1 +
 docs/system/arm/nuvoton.rst   |   4 +-
 target/arm/cpu-param.h|   2 +-
 target/arm/cpu.h  | 181 --
 target/arm/internals.h| 150 ++-
 hw/arm/boot.c |   4 +
 target/arm/helper.c   | 332 ++--
 target/arm/kvm.c  |   4 +-
 target/arm/m_helper.c |  29 ++-
 target/arm/ptw.c  | 570 ++
 target/arm/tlb_helper.c   |   9 +-
 target/arm/translate-a64.c|   8 -
 target/arm/translate.c|   9 +-
 13 files changed, 717 insertions(+), 586 deletions(-)



[PULL 01/28] target/arm/kvm: Retry KVM_CREATE_VM call if it fails EINTR

2022-10-10 Thread Peter Maydell
Occasionally the KVM_CREATE_VM ioctl can return EINTR, even though
there is no pending signal to be taken. In commit 94ccff13382055
we added a retry-on-EINTR loop to the KVM_CREATE_VM call in the
generic KVM code. Adopt the same approach for the use of the
ioctl in the Arm-specific KVM code (where we use it to create a
scratch VM for probing for various things).

For more information, see the mailing list thread:
https://lore.kernel.org/qemu-devel/8735e0s1zw.wl-...@kernel.org/

Reported-by: Vitaly Chikunov 
Signed-off-by: Peter Maydell 
Reviewed-by: Vitaly Chikunov 
Reviewed-by: Eric Auger 
Acked-by: Marc Zyngier 
Message-id: 20220930113824.1933293-1-peter.mayd...@linaro.org
---
 target/arm/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e5c1bd50d29..1e4de9b42e3 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -79,7 +79,9 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
     if (max_vm_pa_size < 0) {
         max_vm_pa_size = 0;
     }
-    vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    do {
+        vmfd = ioctl(kvmfd, KVM_CREATE_VM, max_vm_pa_size);
+    } while (vmfd == -1 && errno == EINTR);
     if (vmfd < 0) {
         goto err;
     }
-- 
2.25.1



