Re: [PATCH 12/12] tests: ensure we export job results for some cross builds
On 15/02/2023 20.25, Alex Bennée wrote: We do run tests on some cross builds. Provide a template to ensure we export the testlog to the build artefacts and report the test results via the junit. Signed-off-by: Alex Bennée Reported-by: Peter Maydell --- v2 - properly format extends --- .gitlab-ci.d/crossbuild-template.yml | 11 +++ .gitlab-ci.d/crossbuilds.yml | 12 +--- 2 files changed, 20 insertions(+), 3 deletions(-) Reviewed-by: Thomas Huth
Re: [PULL V2 00/10] Net patches
Hi Jason, On 16/2/23 06:24, Jason Wang wrote: The following changes since commit 6a50f64ca01d0a7b97f14f069762bfd88160f31e: Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into staging (2023-02-14 14:46:10 +) are available in the git repository at: https://github.com/jasowang/qemu.git tags/net-pull-request for you to fetch changes up to 5e53a346d8bd2bd22522e1e7abd8f122673e4adf: vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check (2023-02-16 13:17:57 +0800) Changes since V1: - Fix the wrong guest error detection in xlnx-zynqmp-can Christian Svensson (1): net: Increase L2TPv3 buffer to fit jumboframes Eugenio Pérez (1): vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check Fiona Ebner (1): hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value Joelle van Dyne (1): vmnet: stop recieving events when VM is stopped Laurent Vivier (1): net: stream: add a new option to automatically reconnect Qiang Liu (2): hw/net/lan9118: log [read|write]b when mode_16bit is enabled rather than abort hw/net/can/xlnx-zynqmp-can: fix assertion failures in transfer_fifo() Can you have a look at this comment from v1? https://lore.kernel.org/qemu-devel/572fcb76-b2f7-20ca-0701-e22dd4e4c...@linaro.org/
Re: [PATCH v2 4/4] target/mips: set correct CP0.Config[4,5] values for M14K(c)
On 16/2/23 06:17, Marcin Nowakowski wrote: Signed-off-by: Marcin Nowakowski Suggested-by: Philippe Mathieu-Daudé --- target/mips/cpu-defs.c.inc | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 05/12] gitlab: reduce default verbosity of cirrus run
Thomas Huth writes: > On 15/02/2023 20.25, Alex Bennée wrote: >> We also truncate the echoing of the test log if we fail. Ideally we >> would want the build aretefact to be available to gitlab but so far >> how to do this eludes me. >> Signed-off-by: Alex Bennée >> Cc: Daniel P. Berrangé >> --- >> .gitlab-ci.d/cirrus/build.yml | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> diff --git a/.gitlab-ci.d/cirrus/build.yml >> b/.gitlab-ci.d/cirrus/build.yml >> index 7ef6af8d33..6563ff3c7a 100644 >> --- a/.gitlab-ci.d/cirrus/build.yml >> +++ b/.gitlab-ci.d/cirrus/build.yml >> @@ -32,6 +32,6 @@ build_task: >> - $MAKE -j$(sysctl -n hw.ncpu) >> - for TARGET in $TEST_TARGETS ; >> do >> -$MAKE -j$(sysctl -n hw.ncpu) $TARGET V=1 >> -|| { cat meson-logs/testlog.txt; exit 1; } ; >> +$MAKE -j$(sysctl -n hw.ncpu) $TARGET >> +|| { tail -n 200 meson-logs/testlog.txt; exit 1; } ; >> done > > I think it should be OK to publish the artifacts on cirrus-ci.com > instead - you have to click a little bit more often, but you can still > get the artifacts there, see: > > https://lore.kernel.org/qemu-devel/20230215142503.90660-1-th...@redhat.com/ But dropping the V=1 also helps by reducing those chatty softfloat tests. If we could merge that with yours. Are you sending a PR soon or should I pull your patch into this series? > > Thomas -- Alex Bennée Virtualisation Tech Lead @ Linaro
Re: [PATCH v2 3/4] target/mips: implement CP0.Config7.WII bit support
On 16/2/23 06:17, Marcin Nowakowski wrote: Some pre-release 6 cores use CP0.Config7.WII bit to indicate that a disabled interrupt should wake up a sleeping CPU. Enable this bit by default for M14K(c) and P5600. There are potentially other cores that support this feature, but I do not have a complete list. Signed-off-by: Marcin Nowakowski --- target/mips/cpu-defs.c.inc | 3 +++ target/mips/cpu.c | 6 -- target/mips/cpu.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v10 09/12] vfio/migration: Implement VFIO migration protocol v2
On 15/02/2023 22:53, Alex Williamson wrote: External email: Use caution opening links or attachments On Wed, 15 Feb 2023 20:23:12 +0200 Avihai Horon wrote: On 15/02/2023 15:01, Juan Quintela wrote: External email: Use caution opening links or attachments Avihai Horon wrote: Implement the basic mandatory part of VFIO migration protocol v2. This includes all functionality that is necessary to support VFIO_MIGRATION_STOP_COPY part of the v2 protocol. The two protocols, v1 and v2, will co-exist and in the following patches v1 protocol code will be removed. There are several main differences between v1 and v2 protocols: - VFIO device state is now represented as a finite state machine instead of a bitmap. - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE ioctl and normal read() and write() instead of the migration region. - Pre-copy is made optional in v2 protocol. Support for pre-copy will be added later on. Detailed information about VFIO migration protocol v2 and its difference compared to v1 protocol can be found here [1]. [1] https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/ Signed-off-by: Avihai Horon +/* + * Migration size of VFIO devices can be as little as a few KBs or as big as + * many GBs. This value should be big enough to cover the worst case. + */ +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB) Wow O:-) + +/* + * Only exact function is implemented and not estimate function. The reason is + * that during pre-copy phase of migration the estimate function is called + * repeatedly while pending RAM size is over the threshold, thus migration + * can't converge and querying the VFIO device pending data size is useless. + */ You can do it after this is merge, but I think you can do better than this. Something in the lines of: // I put it in a global variable, but it really needs to be in VFIODevice to be // able to support several devices. You get the idea O:-) static uint64_t cached_size = -1; static void vfio_state_pending_exact(void *opaque, uint64_t *res_precopy_only, uint64_t *res_compatible, uint64_t *res_postcopy_only) { VFIODevice *vbasedev = opaque; uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; /* * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is * reported so downtime limit won't be violated. */ vfio_query_stop_copy_size(vbasedev, &stop_copy_size); *res_precopy_only += stop_copy_size; cached_size = stop_copy_size; trace_vfio_state_pending_exact(vbasedev->name, *res_precopy_only, *res_postcopy_only, *res_compatible, stop_copy_size); } static void vfio_state_pending_estimate(void *opaque, uint64_t *res_precopy_only, uint64_t *res_compatible, uint64_t *res_postcopy_only) { VFIODevice *vbasedev = opaque; uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; if (cached_size == -1) { uint64_t res_precopy; uint64_t res_compatible; uint64_t res_postcopy; vfio_state_pending_exact(opaque, &res_precopy, &res_compatible, &res_postcopy); } *res_precopy_only += cached_size; } In the next series, which will add pre-copy support to VFIO migration (v1 was sent [1] but isn't rebased on your pull reqs yet), I am going to do something similar to what you suggested. It will be like you did here but with pre-copy data size (data which can be transferred during pre-copy phase) instead of the stop_copy_size. Plus, I don't think caching the stop_copy_size and reporting the cached value in the estimate handler fits the best here, because stop_copy_size doesn't decrease by pre-copy iterations as opposed to RAM pre-copy data, for example. So I would rather keep things as they are and add something similar to your suggestion in the pre-copy series. Assuming Juan is on board with this, please re-base your series on Juan's latest pull request. At the time of writing, this is: https://lore.kernel.org/all/20230215200554.1365-1-quint...@redhat.com/ The kernel headers need an update (you might as well pick up v6.2-rc8 at this point to be as close as possible to what lands in v6.2), Juan's pull request includes qemu_file_get_to_fd(), so we can drop it here, and there are some conflicts that need to be ironed out relative to 24beea4efe6e ("migration: Rename res_{postcopy,precopy}_only"). Sure, I will do it and post v11. Thanks.
Re: [PATCH 05/12] gitlab: reduce default verbosity of cirrus run
On 16/02/2023 09.02, Alex Bennée wrote: Thomas Huth writes: On 15/02/2023 20.25, Alex Bennée wrote: We also truncate the echoing of the test log if we fail. Ideally we would want the build aretefact to be available to gitlab but so far how to do this eludes me. Signed-off-by: Alex Bennée Cc: Daniel P. Berrangé --- .gitlab-ci.d/cirrus/build.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.gitlab-ci.d/cirrus/build.yml b/.gitlab-ci.d/cirrus/build.yml index 7ef6af8d33..6563ff3c7a 100644 --- a/.gitlab-ci.d/cirrus/build.yml +++ b/.gitlab-ci.d/cirrus/build.yml @@ -32,6 +32,6 @@ build_task: - $MAKE -j$(sysctl -n hw.ncpu) - for TARGET in $TEST_TARGETS ; do -$MAKE -j$(sysctl -n hw.ncpu) $TARGET V=1 -|| { cat meson-logs/testlog.txt; exit 1; } ; +$MAKE -j$(sysctl -n hw.ncpu) $TARGET +|| { tail -n 200 meson-logs/testlog.txt; exit 1; } ; done I think it should be OK to publish the artifacts on cirrus-ci.com instead - you have to click a little bit more often, but you can still get the artifacts there, see: https://lore.kernel.org/qemu-devel/20230215142503.90660-1-th...@redhat.com/ But dropping the V=1 also helps by reducing those chatty softfloat tests. If we could merge that with yours. Are you sending a PR soon or should I pull your patch into this series? I'm not planning a new pull request in the next few days, so feel free to pick my patch up or simply ignore it. Anyway, the V=1 has just been added a little bit more than 2 years ago to address a different problem: https://gitlab.com/qemu-project/qemu/-/commit/2a5a79d1b57280edd I'm fine with dropping the V=1 again, but it still feels like we're going around in circles here. Thomas
Re: [PATCH qemu 1/2] hw/at24c : modify at24c to support 1 byte address mode
On 10/2/23 07:20, ~ssinprem wrote: From: Sittisak Sinprem --- hw/nvram/eeprom_at24c.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c index 2d4d8b952f..693212b661 100644 --- a/hw/nvram/eeprom_at24c.c +++ b/hw/nvram/eeprom_at24c.c @@ -87,7 +87,7 @@ uint8_t at24c_eeprom_recv(I2CSlave *s) EEPROMState *ee = AT24C_EE(s); uint8_t ret; -if (ee->haveaddr == 1) { +if (ee->rsize > 256 && ee->haveaddr == 1) { return 0xff; } What represents this '256' magic value? Please add a definition.
[PATCH v2] audio/pwaudio.c: Add Pipewire audio backend for QEMU
This commit adds a new audiodev backend to allow QEMU to use Pipewire as both an audio sink and source. Signed-off-by: Dorinda Bassey --- v2: * Shorten commit message * fix copyright ownership and authour * use QEMU standard of 4 space indentation * verbose use of pipewire instead pf pw audio/audio.c | 3 + audio/audio_template.h| 4 + audio/meson.build | 1 + audio/pwaudio.c | 818 ++ meson.build | 7 + meson_options.txt | 4 +- qapi/audio.json | 45 ++ qemu-options.hx | 17 + scripts/meson-buildoptions.sh | 8 +- 9 files changed, 904 insertions(+), 3 deletions(-) create mode 100644 audio/pwaudio.c diff --git a/audio/audio.c b/audio/audio.c index 4290309d18..aa55e41ad8 100644 --- a/audio/audio.c +++ b/audio/audio.c @@ -2069,6 +2069,9 @@ void audio_create_pdos(Audiodev *dev) #ifdef CONFIG_AUDIO_PA CASE(PA, pa, Pa); #endif +#ifdef CONFIG_AUDIO_PIPEWIRE +CASE(PIPEWIRE, pipewire, Pipewire); +#endif #ifdef CONFIG_AUDIO_SDL CASE(SDL, sdl, Sdl); #endif diff --git a/audio/audio_template.h b/audio/audio_template.h index 42b4712acb..0f02afb921 100644 --- a/audio/audio_template.h +++ b/audio/audio_template.h @@ -355,6 +355,10 @@ AudiodevPerDirectionOptions *glue(audio_get_pdo_, TYPE)(Audiodev *dev) case AUDIODEV_DRIVER_PA: return qapi_AudiodevPaPerDirectionOptions_base(dev->u.pa.TYPE); #endif +#ifdef CONFIG_AUDIO_PIPEWIRE +case AUDIODEV_DRIVER_PIPEWIRE: +return qapi_AudiodevPipewirePerDirectionOptions_base(dev->u.pipewire.TYPE); +#endif #ifdef CONFIG_AUDIO_SDL case AUDIODEV_DRIVER_SDL: return qapi_AudiodevSdlPerDirectionOptions_base(dev->u.sdl.TYPE); diff --git a/audio/meson.build b/audio/meson.build index 074ba9..65a49c1a10 100644 --- a/audio/meson.build +++ b/audio/meson.build @@ -19,6 +19,7 @@ foreach m : [ ['sdl', sdl, files('sdlaudio.c')], ['jack', jack, files('jackaudio.c')], ['sndio', sndio, files('sndioaudio.c')], + ['pipewire', pipewire, files('pwaudio.c')], ['spice', spice, files('spiceaudio.c')] ] if m[1].found() diff --git a/audio/pwaudio.c b/audio/pwaudio.c new file mode 100644 index 00..bb25133414 --- /dev/null +++ b/audio/pwaudio.c @@ -0,0 +1,818 @@ +/* + * QEMU Pipewire audio driver + * + * Copyright (c) 2023 Red Hat Inc. + * + * Author: Dorinda Bassey + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include "qemu/module.h" +#include "audio.h" +#include +#include +#include +#include + +#include + +#define AUDIO_CAP "pipewire" +#define RINGBUFFER_SIZE(1u << 22) +#define RINGBUFFER_MASK(RINGBUFFER_SIZE - 1) +#define BUFFER_SAMPLES128 + +#include "audio_int.h" + +enum { +MODE_SINK, +MODE_SOURCE +}; + +typedef struct pwaudio { +Audiodev *dev; +struct pw_thread_loop *thread_loop; +struct pw_context *context; + +struct pw_core *core; +struct spa_hook core_listener; +int seq; +} pwaudio; + +typedef struct PWVoice { +pwaudio *g; +bool enabled; +struct pw_stream *stream; +struct spa_hook stream_listener; +struct spa_audio_info_raw info; +uint32_t frame_size; +struct spa_ringbuffer ring; +uint8_t buffer[RINGBUFFER_SIZE]; + +uint32_t mode; +struct pw_properties *props; +} PWVoice; + +typedef struct PWVoiceOut { +HWVoiceOut hw; +PWVoice v; +} PWVoiceOut; + +typedef struct PWVoiceIn { +HWVoiceIn hw; +PWVoice v; +} PWVoiceIn; + +static void +stream_destroy(void *data) +{ +PWVoice *v = (PWVoice *) data; +spa_hook_remove(&v->stream_listener); +v->stream = NULL; +} + +/* output data processing function to read stuffs from the buffer */ +static void +playback_on_process(void *data) +{ +PWVoice *v = (PWVoice *) data; +void *p; +struct pw_buffer *b; +struct spa_buf
Re: [RFC 08/52] machine: Add helpers to get cpu topology info from MachineState.topo
Hi Zhao, 在 2023/2/13 17:49, Zhao Liu 写道: From: Zhao Liu When MachineState.topo is introduced, the topology related structures become complicated. In the general case (hybrid or smp topology), accessing the topology information needs to determine whether it is currently smp or hybrid topology, and then access the corresponding MachineState.topo.smp or MachineState.topo.hybrid. The best way to do this is to wrap the access to the topology to avoid having to check each time it is accessed. The following helpers are provided here: - General interfaces - no need to worry about whether the underlying topology is smp or hybrid: * machine_topo_get_cpus() * machine_topo_get_max_cpus() * machine_topo_is_smp() * machine_topo_get_sockets() * machine_topo_get_dies() * machine_topo_get_clusters() * machine_topo_get_threads(); * machine_topo_get_cores(); * machine_topo_get_threads_by_idx() * machine_topo_get_cores_by_idx() * machine_topo_get_cores_per_socket() * machine_topo_get_threads_per_socket() - SMP-specific interfaces - provided for the cases that are clearly known to be smp topology: * machine_topo_get_smp_cores() * machine_topo_get_smp_threads() Since for hybrid topology, each core may has different threads, if someone wants "cpus per core", the cpu_index is need to target a specific core (machine_topo_get_threads_by_idx()). But for smp, there is no need to be so troublesome, so for this case, we provide smp-specific interfaces. Signed-off-by: Zhao Liu --- hw/core/machine-topo.c | 142 + include/hw/boards.h| 35 ++ 2 files changed, 177 insertions(+) diff --git a/hw/core/machine-topo.c b/hw/core/machine-topo.c index 7223f73f99b0..b20160479629 100644 --- a/hw/core/machine-topo.c +++ b/hw/core/machine-topo.c @@ -21,6 +21,148 @@ #include "hw/boards.h" #include "qapi/error.h" +unsigned int machine_topo_get_sockets(const MachineState *ms) +{ +return machine_topo_is_smp(ms) ? ms->topo.smp.sockets : + ms->topo.hybrid.sockets; +} + +unsigned int machine_topo_get_dies(const MachineState *ms) +{ +return machine_topo_is_smp(ms) ? ms->topo.smp.dies : + ms->topo.hybrid.dies; +} + +unsigned int machine_topo_get_clusters(const MachineState *ms) +{ +return machine_topo_is_smp(ms) ? ms->topo.smp.clusters : + ms->topo.hybrid.clusters; +} + +unsigned int machine_topo_get_smp_cores(const MachineState *ms) +{ +g_assert(machine_topo_is_smp(ms)); +return ms->topo.smp.cores; +} + +unsigned int machine_topo_get_smp_threads(const MachineState *ms) +{ +g_assert(machine_topo_is_smp(ms)); +return ms->topo.smp.threads; +} + +unsigned int machine_topo_get_threads(const MachineState *ms, + unsigned int cluster_id, + unsigned int core_id) +{ +if (machine_topo_is_smp(ms)) { +return ms->topo.smp.threads; +} else { +return ms->topo.hybrid.cluster_list[cluster_id] + .core_list[core_id].threads; +} + +return 0; +} + +unsigned int machine_topo_get_cores(const MachineState *ms, +unsigned int cluster_id) +{ +if (machine_topo_is_smp(ms)) { +return ms->topo.smp.cores; +} else { +return ms->topo.hybrid.cluster_list[cluster_id].cores; +} +} Is it possible to use variadic function so that those two smp specific helpers can be avoided? It's a bit wired that we have the generic machine_topo_get_threads but also need machine_topo_get_smp_threads at the same time. + +unsigned int machine_topo_get_threads_by_idx(const MachineState *ms, + unsigned int cpu_index) +{ +unsigned cpus_per_die; +unsigned tmp_idx; +HybridCluster *cluster; +HybridCore *core; + +if (machine_topo_is_smp(ms)) { +return ms->topo.smp.threads; +} + +cpus_per_die = ms->topo.max_cpus / (ms->topo.hybrid.sockets * +ms->topo.hybrid.dies); +tmp_idx = cpu_index % cpus_per_die; + +for (int i = 0; i < ms->topo.hybrid.clusters; i++) { +cluster = &ms->topo.hybrid.cluster_list[i]; + +for (int j = 0; j < cluster->cores; j++) { +core = &cluster->core_list[j]; + +if (tmp_idx < core->threads) { +return core->threads; +} else { +tmp_idx -= core->threads; +} +} +} + +return 0; +} + +unsigned int machine_topo_get_cores_by_idx(const MachineState *ms, + unsigned int cpu_index) +{ +unsigned cpus_per_die; +unsigned tmp_idx; +HybridCluster *cluster; +HybridCore *core; + +if (machine_topo_is_smp(ms)) { +return ms->topo.smp.cores; +} + +cpus_per_die = ms->topo.max_cpus / (ms->topo.hybrid.sockets * +
Re: [PATCH v3 2/2] s390x/pv: Add support for asynchronous teardown for reboot
On 14/02/2023 17.30, Claudio Imbrenda wrote: This patch adds support for the asynchronous teardown for reboot for protected VMs. When attempting to tear down a protected VM, try to use the new asynchronous interface first. If that fails, fall back to the classic synchronous one. The asynchronous interface involves invoking the new KVM_PV_ASYNC_DISABLE_PREPARE command for the KVM_S390_PV_COMMAND ioctl. This will prepare the current protected VM for asynchronous teardown. Once the protected VM is prepared for teardown, execution can continue immediately. Once the protected VM has been prepared, a new thread is started to actually perform the teardown. The new thread uses the new KVM_PV_ASYNC_DISABLE command for the KVM_S390_PV_COMMAND ioctl. The previously prepared protected VM is torn down in the new thread. Once KVM_PV_ASYNC_DISABLE is invoked, it is possible to use KVM_PV_ASYNC_DISABLE_PREPARE again. If a protected VM has already been prepared and its cleanup has not started, it will not be possible to prepare a new VM. In that case the classic synchronous teardown has to be performed. The synchronous teardown will now also clean up any prepared VMs whose asynchronous teardown has not been initiated yet. This considerably speeds up the reboot of a protected VM; for large VMs especially, it could take a long time to perform a reboot with the traditional synchronous teardown, while with this patch it is almost immediate. Signed-off-by: Claudio Imbrenda --- hw/s390x/pv.c | 28 hw/s390x/s390-virtio-ccw.c | 5 - include/hw/s390x/pv.h | 2 ++ 3 files changed, 34 insertions(+), 1 deletion(-) Reviewed-by: Thomas Huth
Re: [PATCH 1/8] m25p80: Improve error when the backend file size does not match the device
On 15/2/23 20:52, Peter Delevoryas wrote: On Tue, Feb 14, 2023 at 06:18:23PM +0100, Cédric Le Goater wrote: Currently, when a block backend is attached to a m25p80 device and the associated file size does not match the flash model, QEMU complains with the error message "failed to read the initial flash content". This is confusing for the user. Use blk_check_size_and_read_all() instead of blk_pread() to improve the reported error. Could you reference commit 06f1521795? Reviewed-by: Peter Maydell Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Delevoryas Reviewed-by: Alistair Francis Message-Id: <20221115151000.2080833-1-...@kaod.org> Signed-off-by: Cédric Le Goater --- breakage with commit a4b15a8b9e ("pflash: Only read non-zero parts of backend image") when using -snaphot. I guess it's not obvious to me, what broke? 1. BlockBackend *blk = -drive file=blob.mtd,format=raw,if=mtd,snapshot=on 2. uint8_t *storage = malloc(...) 3. blk_check_size_and_read_all(blk, storage, size) 4. commit a4b15a8b9e for sector in blk: if any(b != 0 for b in sector): memcpy(&storage[...], sector, sizeof(sector)) else: # Skip memcpy of zeroes But this is probably a regression for us because we actually expect the zeroes to be copied? Yes... Commit a4b15a8b9e mostly considered cloud payload optimization. Since EEPROMs are usually "small", this could be kludged as: -- >8 -- --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -1615,6 +1615,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error **errp) trace_m25p80_binding(s); s->storage = blk_blockalign(s->blk, s->size); +memset(s->storage, ERASED_BYTE, s->size); if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) { error_setg(errp, "failed to read the initial flash content"); return; --- On emulation, it is simpler to ask the user to provide a block image with the same size of the emulated device. See commit 06f1521795 ("pflash: Require backend size to match device, improve errors") for more information. diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 802d2eb021..dc5ffbc4ff 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -24,6 +24,7 @@ #include "qemu/osdep.h" #include "qemu/units.h" #include "sysemu/block-backend.h" +#include "hw/block/block.h" #include "hw/qdev-properties.h" #include "hw/qdev-properties-system.h" #include "hw/ssi/ssi.h" @@ -1615,8 +1616,7 @@ static void m25p80_realize(SSIPeripheral *ss, Error **errp) trace_m25p80_binding(s); s->storage = blk_blockalign(s->blk, s->size); -if (blk_pread(s->blk, 0, s->size, s->storage, 0) < 0) { -error_setg(errp, "failed to read the initial flash content"); +if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) { return; } } else { -- 2.39.1
Re: [PATCH qemu 1/2] hw/at24c : modify at24c to support 1 byte address mode
Hi Philippe, the EEPROM size less than or equal to 256 , such as 24c02 use only 1 byte to mention the memory (0 - 255) (0x00 - 0xff) write byte [ Start sAddr/W ] -> [ mem Addr ] -> [ mem data ] -> [ Stop ] read byte [ Start sAddr/W ] -> [ mem Addr ] -> [ reStart sAddr/R] -> [ mem data ]-> [ Stop ] --- Meanwhile, in EEPROM size more 256 , such as 24c64 has size 8192 bytes (0x2000) use 2 bytes for pointing the memory (0x - 0x2000 ) write byte [ Start sAddr/W ] -> [ 1st mem Addr ] -> [ 2nd mem Addr ] -> [ Mem Data ] -> [ Stop ] read byte [ Start sAddr/W ] -> [ 1st mem Addr ] -> [ 2nd mem Addr ] -> [ reStart sAddr/R] -> [ mem data ]-> [ Stop ] On Thu, Feb 16, 2023 at 3:25 PM Philippe Mathieu-Daudé wrote: > On 10/2/23 07:20, ~ssinprem wrote: > > From: Sittisak Sinprem > > > > --- > > hw/nvram/eeprom_at24c.c | 8 +--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c > > index 2d4d8b952f..693212b661 100644 > > --- a/hw/nvram/eeprom_at24c.c > > +++ b/hw/nvram/eeprom_at24c.c > > @@ -87,7 +87,7 @@ uint8_t at24c_eeprom_recv(I2CSlave *s) > > EEPROMState *ee = AT24C_EE(s); > > uint8_t ret; > > > > -if (ee->haveaddr == 1) { > > +if (ee->rsize > 256 && ee->haveaddr == 1) { > > return 0xff; > > } > > What represents this '256' magic value? Please add a definition. >
[RFC PATCH v2 0/2] i386/sev: Support measured direct kernel boot on SNP
This RFC patch series is based on AMD's RFC upmv10-snpv3 tree [1]. In order to enable measured direct kernel boot on SNP, QEMU needs to fill the hashes page when kernel-hashes=on. This relies on several changes to the SNP metadata published by OVMF (See [2] for proposed OVMF patches). Patch 1 pulls the 'kernel-hashes' property from the SEV guest settings to the common settings to make it available for both SEV and SNP. Patch 2 adds the hashes table for SNP guests (or validates the page as a zero page if kernel-hashes=off). This patch series is also available at [3]. [1] https://github.com/mdroth/qemu/commits/upmv10-snpv3 [2] https://edk2.groups.io/g/devel/message/100286 [3] https://github.com/confidential-containers-demo/qemu/tree/snp-kernel-hashes-v2 v2 changes: * Rebase on top of upmv10-snpv3 which includes kernel-hashes. v1: https://lore.kernel.org/qemu-devel/20220329064038.96006-1-dovmurik%40linux.ibm.com/ Cc: Paolo Bonzini Cc: Daniel P. Berrangé Cc: Dr. David Alan Gilbert Cc: Eduardo Habkost Cc: Eric Blake Cc: Markus Armbruster Cc: Marcelo Tosatti Cc: Gerd Hoffmann Cc: James Bottomley Cc: Tom Lendacky Cc: Michael Roth Cc: Ashish Kalra Cc: Mario Smarduch Cc: Tobin Feldman-Fitzthum Dov Murik (2): qapi, i386: Move kernel-hashes to SevCommonProperties i386/sev: Allow measured direct kernel boot on SNP qapi/qom.json | 12 +++--- target/i386/sev.c | 95 +-- 2 files changed, 65 insertions(+), 42 deletions(-) -- 2.25.1
[RFC PATCH v2 2/2] i386/sev: Allow measured direct kernel boot on SNP
In SNP, the hashes page is not included in the ranges to pre-validate that appear in the SNP metadata published by AmdSev OVMF. Therefore, if the user enabled kernel hashes (for measured direct boot), QEMU should fill hashes table and encrypt the page. Note that in SNP (unlike SEV and SEV-ES) the measurements is done in whole 4KB pages. Therefore QEMU zeros the whole page that includes the hashes table, and fills in the kernel hashes area in that page, and then encrypts the whole page. The rest of the page is reserved for SEV launch secrets which are not usable anyway on SNP. If the user disabled kernel hashes, QEMU pre-validates the page as a zero page. Signed-off-by: Dov Murik --- target/i386/sev.c | 51 +-- 1 file changed, 41 insertions(+), 10 deletions(-) diff --git a/target/i386/sev.c b/target/i386/sev.c index 6b8e85888f..c36ba9a541 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -2079,8 +2079,11 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) uint8_t initrd_hash[HASH_SIZE]; uint8_t kernel_hash[HASH_SIZE]; uint8_t *hashp; +hwaddr mapped_gpa, mapped_offset, mapped_len, expected_mapped_len; +uint8_t *mapped_area = NULL; +MemoryRegion *mr = NULL; +void *hva; size_t hash_len = HASH_SIZE; -hwaddr mapped_len = sizeof(*padded_ht); MemTxAttrs attrs = { 0 }; bool ret = true; SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs); @@ -2090,6 +2093,25 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) * stated kernel-hashes=on. */ if (!sev_common->kernel_hashes) { +if (sev_snp_enabled()) { +/* Mark the hashes page (if defined) as a zero page */ +if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) { +return false; +} + +area = (SevHashTableDescriptor *)data; +if (!area->base || area->size < sizeof(PaddedSevHashTable)) { +return false; +} + +mapped_gpa = area->base & TARGET_PAGE_MASK; +hva = gpa2hva(&mr, mapped_gpa, TARGET_PAGE_SIZE, NULL); +if (sev_snp_launch_update(SEV_SNP_GUEST(sev_common), mapped_gpa, hva, + TARGET_PAGE_SIZE, KVM_SEV_SNP_PAGE_TYPE_ZERO)) { +error_setg(errp, "SEV: error marking kernel hashes page as zero"); +} +return false; +} return false; } @@ -2099,10 +2121,6 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) return false; } -if (sev_snp_enabled()) { -return false; -} - area = (SevHashTableDescriptor *)data; if (!area->base || area->size < sizeof(PaddedSevHashTable)) { error_setg(errp, "SEV: guest firmware hashes table area is invalid " @@ -2149,12 +2167,25 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) * Populate the hashes table in the guest's memory at the OVMF-designated * area for the SEV hashes table */ -padded_ht = address_space_map(&address_space_memory, area->base, - &mapped_len, true, attrs); -if (!padded_ht || mapped_len != sizeof(*padded_ht)) { +if (sev_snp_enabled()) { +/* SNP encrypts and measures memory in whole pages */ +mapped_gpa = area->base & TARGET_PAGE_MASK; +mapped_offset = area->base & ~TARGET_PAGE_MASK; +mapped_len = TARGET_PAGE_SIZE; +} else { +mapped_gpa = area->base; +mapped_offset = 0; +mapped_len = sizeof(*padded_ht); +} +expected_mapped_len = mapped_len; +mapped_area = address_space_map(&address_space_memory, mapped_gpa, +&mapped_len, true, attrs); +if (!mapped_area || mapped_len != expected_mapped_len) { error_setg(errp, "SEV: cannot map hashes table guest memory area"); return false; } +memset(mapped_area, 0, mapped_len); +padded_ht = (PaddedSevHashTable *)(mapped_area + mapped_offset); ht = &padded_ht->ht; ht->guid = sev_hash_table_header_guid; @@ -2175,11 +2206,11 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) /* zero the excess data so the measurement can be reliably calculated */ memset(padded_ht->padding, 0, sizeof(padded_ht->padding)); -if (sev_encrypt_flash(area->base, (uint8_t *)padded_ht, sizeof(*padded_ht), errp) < 0) { +if (sev_encrypt_flash(mapped_gpa, mapped_area, mapped_len, errp) < 0) { ret = false; } -address_space_unmap(&address_space_memory, padded_ht, +address_space_unmap(&address_space_memory, mapped_area, mapped_len, true, mapped_len); return ret; -- 2.25.1
[RFC PATCH v2 1/2] qapi, i386: Move kernel-hashes to SevCommonProperties
In order to enable kernel-hashes for SNP, pull it from SevGuestProperties to its parent SevCommonProperties so it will be available for both SEV and SNP. --- qapi/qom.json | 12 ++-- target/i386/sev.c | 44 ++-- 2 files changed, 24 insertions(+), 32 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 33abba0e04..9b2897d54c 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -836,6 +836,10 @@ # @reduced-phys-bits: number of bits in physical addresses that become # unavailable when SEV is enabled # +# @kernel-hashes: if true, add hashes of kernel/initrd/cmdline to a +# designated guest firmware page for measured boot +# with -kernel (default: false) (since 6.2) +# # @upm-mode: configure Unmapped Private Memory mode # # @discard: configure how discarding is handled for memory after @@ -848,6 +852,7 @@ 'data': { '*sev-device': 'str', '*cbitpos': 'uint32', 'reduced-phys-bits': 'uint32', +'*kernel-hashes': 'bool', '*upm-mode': 'bool', '*discard': 'str' } } @@ -864,10 +869,6 @@ # # @handle: SEV firmware handle (default: 0) # -# @kernel-hashes: if true, add hashes of kernel/initrd/cmdline to a -# designated guest firmware page for measured boot -# with -kernel (default: false) (since 6.2) -# # Since: 2.12 ## { 'struct': 'SevGuestProperties', @@ -875,8 +876,7 @@ 'data': { '*dh-cert-file': 'str', '*session-file': 'str', '*policy': 'uint32', -'*handle': 'uint32', -'*kernel-hashes': 'bool' } } +'*handle': 'uint32' } } ## # @SevSnpGuestProperties: diff --git a/target/i386/sev.c b/target/i386/sev.c index 758e8225c2..6b8e85888f 100644 --- a/target/i386/sev.c +++ b/target/i386/sev.c @@ -64,6 +64,7 @@ struct SevCommonState { char *sev_device; uint32_t cbitpos; uint32_t reduced_phys_bits; +bool kernel_hashes; bool upm_mode; char *discard; @@ -88,7 +89,6 @@ struct SevGuestState { uint32_t policy; char *dh_cert_file; char *session_file; -bool kernel_hashes; }; struct SevSnpGuestState { @@ -390,6 +390,16 @@ sev_common_set_sev_device(Object *obj, const char *value, Error **errp) SEV_COMMON(obj)->sev_device = g_strdup(value); } +static bool sev_common_get_kernel_hashes(Object *obj, Error **errp) +{ +return SEV_COMMON(obj)->kernel_hashes; +} + +static void sev_common_set_kernel_hashes(Object *obj, bool value, Error **errp) +{ +SEV_COMMON(obj)->kernel_hashes = value; +} + static bool sev_common_get_upm_mode(Object *obj, Error **errp) { return SEV_COMMON(obj)->upm_mode; @@ -420,6 +430,11 @@ sev_common_class_init(ObjectClass *oc, void *data) sev_common_set_sev_device); object_class_property_set_description(oc, "sev-device", "SEV device to use"); +object_class_property_add_bool(oc, "kernel-hashes", + sev_common_get_kernel_hashes, + sev_common_set_kernel_hashes); +object_class_property_set_description(oc, "kernel-hashes", +"add kernel hashes to guest firmware for measured Linux boot"); object_class_property_add_bool(oc, "upm-mode", sev_common_get_upm_mode, sev_common_set_upm_mode); @@ -484,20 +499,6 @@ sev_guest_set_session_file(Object *obj, const char *value, Error **errp) SEV_GUEST(obj)->session_file = g_strdup(value); } -static bool sev_guest_get_kernel_hashes(Object *obj, Error **errp) -{ -SevGuestState *sev_guest = SEV_GUEST(obj); - -return sev_guest->kernel_hashes; -} - -static void sev_guest_set_kernel_hashes(Object *obj, bool value, Error **errp) -{ -SevGuestState *sev = SEV_GUEST(obj); - -sev->kernel_hashes = value; -} - static void sev_guest_class_init(ObjectClass *oc, void *data) { @@ -511,11 +512,6 @@ sev_guest_class_init(ObjectClass *oc, void *data) sev_guest_set_session_file); object_class_property_set_description(oc, "session-file", "guest owners session parameters (encoded with base64)"); -object_class_property_add_bool(oc, "kernel-hashes", - sev_guest_get_kernel_hashes, - sev_guest_set_kernel_hashes); -object_class_property_set_description(oc, "kernel-hashes", -"add kernel hashes to guest firmware for measured Linux boot"); } static void @@ -2088,16 +2084,12 @@ bool sev_add_kernel_loader_hashes(SevKernelLoaderContext *ctx, Error **errp) MemTxAttrs attrs = { 0 }; bool ret = true; SevCommonState *sev_common = SEV_COMMON(MACHINE(qdev_get_machine())->cgs); -SevGuestState *sev_guest = -(SevGuestState *)object_dynamic_cast(OBJECT(sev_common), -
Re: Lost partition tables on ide-hd + ahci drive
Am 15.02.23 um 22:47 schrieb John Snow: > Hm, I'm not sure I see any pattern that might help. Could be that AHCI > is just bugged during load, but it's tough to know in what way. If we ever get a backtrace where the bad write actually goes through QEMU, I'll let you know. We are considering providing a custom build to affected users (using GDB-hooks leads to too much slowdown in these performance-critical paths) in the hope to catch it if it triggers again. We can't really roll it out to all users, because most writes to sector zero are legitimate after all and most users are not affected. > What versions of QEMU are in use here? Is there a date on which you > noticed an increased frequency of these reports? There were a few reports around the time we rolled out 4.2 and 5.0 (Q2/Q3 of 2020), but the frequency was always very low. AFAICT, there's about 20-40 reports that could be this issue in total. The earliest I know of with lost partitions, but not much more information, are forum threads from 2017/2018. With 4.2, there was a rework with our backup patches so naturally, I suspected that. Before 4.2, we had extended the backup job to allow using a callback to handle the writes instead of the BlockDriverState target. But starting from 4.2, we are not messing with that anymore and using a custom driver as the backup target. That custom driver doesn't even know about the source. The source is handled by the usual backup job mechanisms. If there was some general mix-up there, I'd not expect it to work for >99.99% of backups and only trigger in combination with AHCI, but who knows? Best Regards, Fiona
Re: [RFC PATCH v2 0/2] i386/sev: Support measured direct kernel boot on SNP
On 16/02/2023 10:49, Dov Murik wrote: > This RFC patch series is based on AMD's RFC upmv10-snpv3 tree [1]. > Note that in order to test this you must use '-machine pc-q35-7.1' to circumvent the SETUP_RNG_SEED bug [1] that interferes with the measured kernel. [1] https://lore.kernel.org/qemu-devel/20230208211212.41951-1-...@redhat.com/ -Dov > > In order to enable measured direct kernel boot on SNP, QEMU needs to > fill the hashes page when kernel-hashes=on. This relies on several > changes to the SNP metadata published by OVMF (See [2] for proposed > OVMF patches). > > Patch 1 pulls the 'kernel-hashes' property from the SEV guest settings > to the common settings to make it available for both SEV and SNP. > > Patch 2 adds the hashes table for SNP guests (or validates the page as a > zero page if kernel-hashes=off). > > This patch series is also available at [3]. > > > [1] https://github.com/mdroth/qemu/commits/upmv10-snpv3 > [2] https://edk2.groups.io/g/devel/message/100286 > [3] > https://github.com/confidential-containers-demo/qemu/tree/snp-kernel-hashes-v2 > > v2 changes: > * Rebase on top of upmv10-snpv3 which includes kernel-hashes. > > v1: > https://lore.kernel.org/qemu-devel/20220329064038.96006-1-dovmurik%40linux.ibm.com/ > > Cc: Paolo Bonzini > Cc: Daniel P. Berrangé > Cc: Dr. David Alan Gilbert > Cc: Eduardo Habkost > Cc: Eric Blake > Cc: Markus Armbruster > Cc: Marcelo Tosatti > Cc: Gerd Hoffmann > Cc: James Bottomley > Cc: Tom Lendacky > Cc: Michael Roth > Cc: Ashish Kalra > Cc: Mario Smarduch > Cc: Tobin Feldman-Fitzthum > > Dov Murik (2): > qapi, i386: Move kernel-hashes to SevCommonProperties > i386/sev: Allow measured direct kernel boot on SNP > > qapi/qom.json | 12 +++--- > target/i386/sev.c | 95 +-- > 2 files changed, 65 insertions(+), 42 deletions(-) >
[PATCH] hw/arm: Add missing ZynqMP ZCU102 -> USB_DWC3 Kconfig dependency
Since commit acc0b8b05a when running the ZynqMP ZCU102 board with a QEMU configured using --without-default-devices, we get: $ qemu-system-aarch64 -M xlnx-zcu102 qemu-system-aarch64: missing object type 'usb_dwc3' Abort trap: 6 Fix by adding the missing Kconfig dependency. Fixes: acc0b8b05a ("hw/arm/xlnx-zynqmp: Connect ZynqMP's USB controllers") Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 2d157de9b8..b5aed4aff5 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -389,6 +389,7 @@ config XLNX_ZYNQMP_ARM select XLNX_CSU_DMA select XLNX_ZYNQMP select XLNX_ZDMA +select USB_DWC3 config XLNX_VERSAL bool -- 2.38.1
Re: [RFC PATCH v2 1/2] qapi, i386: Move kernel-hashes to SevCommonProperties
Dov Murik writes: > In order to enable kernel-hashes for SNP, pull it from > SevGuestProperties to its parent SevCommonProperties so > it will be available for both SEV and SNP. Missing Signed-off-by: Dov Murik Patch does not apply for me.
Re: [PATCH] hw/arm: Add missing ZynqMP ZCU102 -> USB_DWC3 Kconfig dependency
Oops I meant this as subject: "hw/arm: Add missing XLNX_ZYNQMP_ARM -> USB_DWC3 Kconfig dependency" On 16/2/23 10:23, Philippe Mathieu-Daudé wrote: Since commit acc0b8b05a when running the ZynqMP ZCU102 board with a QEMU configured using --without-default-devices, we get: $ qemu-system-aarch64 -M xlnx-zcu102 qemu-system-aarch64: missing object type 'usb_dwc3' Abort trap: 6 Fix by adding the missing Kconfig dependency. Fixes: acc0b8b05a ("hw/arm/xlnx-zynqmp: Connect ZynqMP's USB controllers") Signed-off-by: Philippe Mathieu-Daudé --- hw/arm/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 2d157de9b8..b5aed4aff5 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -389,6 +389,7 @@ config XLNX_ZYNQMP_ARM select XLNX_CSU_DMA select XLNX_ZYNQMP select XLNX_ZDMA +select USB_DWC3 config XLNX_VERSAL bool
Re: [PATCH v3 02/10] target/riscv: always allow write_misa() to write MISA
On Wed, Feb 15, 2023 at 03:57:18PM -0300, Daniel Henrique Barboza wrote: > At this moment, and apparently since ever, we have no way of enabling > RISCV_FEATURE_MISA. This means that all the code from write_misa(), all > the nuts and bolts that handles how to properly write this CSR, has > always been a no-op as well because write_misa() will always exit > earlier. > > This seems to be benign in the majority of cases. Booting an Ubuntu > 'virt' guest and logging all the calls to 'write_misa' shows that no > writes to MISA CSR was attempted. Writing MISA, i.e. enabling/disabling > RISC-V extensions after the machine is powered on, seems to be a niche > use. > > Regardless, the spec says that MISA is a WARL read-write CSR, and gating > the writes in the register doesn't make sense. OS and applications > should be wary of the consequences when writing it, but the write itself > must always be allowed. The write is already allowed, i.e. no exception is raised when writing it. The spec only says that the fields may/can be writable. So we can correctly implement the spec with just write_misa() { return RISCV_EXCP_NONE; } as it has effectively been implemented to this point. Based on Weiwei Li's pointing out of known bugs, and the fact that this function has likely never been tested, then maybe we should just implement it as above for now. Once a better solution to extension sanity checking exists and a use (or at least test) case arises, then the function could be expanded with some actually writable bits. (Also, I think that when/if we do the expansion, then something like the misa_w config proposed in the previous version of this series may still be needed in order to allow opting-in/out of the behavior change.) Thanks, drew
Re: [RFC 12/52] hw/acpi: Replace MachineState.smp access with topology helpers
Hi Zhao, 在 2023/2/13 17:49, Zhao Liu 写道: From: Zhao Liu At present, in QEMU only arm needs PPTT table to build cpu topology. Before QEMU's arm supports hybrid architectures, it's enough to limit the cpu topology of PPTT to smp type through the explicit smp interface (machine_topo_get_smp_threads()). Cc: Michael S. Tsirkin Cc: Igor Mammedov Cc: Ani Sinha Signed-off-by: Zhao Liu --- hw/acpi/aml-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index ea331a20d131..693bd8833d10 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms, cluster_offset = socket_offset; } -if (ms->smp.threads == 1) { +if (machine_topo_get_smp_threads(ms) == 1) { build_processor_hierarchy_node(table_data, (1 << 1) | /* ACPI Processor ID valid */ (1 << 3), /* Node is a Leaf */ ACPI PPTT table is designed to also support the hybrid CPU topology case where nodes on the same CPU topology level can have different number of child nodes. So to be general, the diff should be: diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index ea331a20d1..dfded95bbc 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2044,7 +2044,7 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms, cluster_offset = socket_offset; } - if (ms->smp.threads == 1) { + if (machine_topo_get_threads_by_idx(n) == 1) { build_processor_hierarchy_node(table_data, (1 << 1) | /* ACPI Processor ID valid */ (1 << 3), /* Node is a Leaf */ Actually I'm recently working on ARM hmp virtualization which relys on PPTT for topology representation, so we will also need PPTT to be general for hybrid case anyway. Thanks, Yanan
Re: [RFC PATCH v2 1/2] qapi, i386: Move kernel-hashes to SevCommonProperties
Hello Markus, On 16/02/2023 11:24, Markus Armbruster wrote: > Dov Murik writes: > >> In order to enable kernel-hashes for SNP, pull it from >> SevGuestProperties to its parent SevCommonProperties so >> it will be available for both SEV and SNP. > > Missing > > Signed-off-by: Dov Murik > Oops, thanks. I'll fix. > Patch does not apply for me. > This patch series is based on AMD's upmv10-snpv3 tree: https://github.com/mdroth/qemu/tree/upmv10-snpv3 Have you tried to apply it on top of that tree? -Dov
Re: [PATCH v3 02/10] target/riscv: always allow write_misa() to write MISA
On Thu, Feb 16, 2023 at 5:29 PM Andrew Jones wrote: > > On Wed, Feb 15, 2023 at 03:57:18PM -0300, Daniel Henrique Barboza wrote: > > At this moment, and apparently since ever, we have no way of enabling > > RISCV_FEATURE_MISA. This means that all the code from write_misa(), all > > the nuts and bolts that handles how to properly write this CSR, has > > always been a no-op as well because write_misa() will always exit > > earlier. > > > > This seems to be benign in the majority of cases. Booting an Ubuntu > > 'virt' guest and logging all the calls to 'write_misa' shows that no > > writes to MISA CSR was attempted. Writing MISA, i.e. enabling/disabling > > RISC-V extensions after the machine is powered on, seems to be a niche > > use. > > > > Regardless, the spec says that MISA is a WARL read-write CSR, and gating > > the writes in the register doesn't make sense. OS and applications > > should be wary of the consequences when writing it, but the write itself > > must always be allowed. > > The write is already allowed, i.e. no exception is raised when writing it. > The spec only says that the fields may/can be writable. So we can > correctly implement the spec with just > > write_misa() > { >return RISCV_EXCP_NONE; > } Agree. Such change is still spec compliant without worrying about the bugs. > > as it has effectively been implemented to this point. > > Based on Weiwei Li's pointing out of known bugs, and the fact that > this function has likely never been tested, then maybe we should just > implement it as above for now. Once a better solution to extension > sanity checking exists and a use (or at least test) case arises, then > the function could be expanded with some actually writable bits. (Also, > I think that when/if we do the expansion, then something like the misa_w > config proposed in the previous version of this series may still be > needed in order to allow opting-in/out of the behavior change.) In QEMU RISC-V we have some examples of implementing optional spec features without exposing a config parameter. Do we need to add config parameters for those cases too? If yes, I am afraid the parameters will grow a lot. Regards, Bin
Re: Does the page boundary check still necessary?
On Wed, Feb 15, 2023 at 04:52:24PM -1000, Richard Henderson wrote: > Date: Wed, 15 Feb 2023 16:52:24 -1000 > From: Richard Henderson > To: Kenneth Lee > Cc: qemu-devel@nongnu.org > Subject: Re: Does the page boundary check still necessary? > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 > Thunderbird/102.7.1 > > On 2/15/23 16:28, Kenneth Lee wrote: > > On Wed, Feb 15, 2023 at 04:26:18PM -1000, Richard Henderson wrote: > > > Date: Wed, 15 Feb 2023 16:26:18 -1000 > > > From: Richard Henderson > > > To: Kenneth Lee > > > Cc: qemu-devel@nongnu.org > > > Subject: Re: Does the page boundary check still necessary? > > > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 > > > Thunderbird/102.7.1 > > > > > > On 2/15/23 15:45, Kenneth Lee wrote: > > > > > > Now the chained TBs have been link with tb_link_page(), the chain > > > > > > will > > > > > > be rebuilt if it is invalidate on page. So why is this check still > > > > > > there? > > > > > > > > > > Even for a guest which doesn't use paging, and therefore does not > > > > > need to > > > > > worry about memory maps changing, we still enable breakpoints and > > > > > watchpoints on a per-page basis. > > > > > > > > > > > > > Thank you. So is this the only reason? May I write a fine grained > > > > checking to remove this limitation? > > > > > > No. > > > > > Why? > > When breakpoints change, we discard all translations on the affected page, > similarly to how we handle writes from self-modifying code. If you link > from further away, then TBs won't be invalidated properly when breakpoints > change. For most guests, this isn't a limitation because we also have to > care for modifications to page tables, so we can't allow such links anyway. > > I have no idea what you're trying to accomplish that's different from > existing guests. Thank you for the clarification. You are right. It is senseless to the current guests. I just consider a guest with some hint to the main sequence of the code. It jumps out of range constantely. Maybe I can find other way to solve the problem. Thanks again. > > > r~ -- -Kenneth Lee (Hisilicon)
Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM
On 16.02.23 06:13, Mike Rapoport wrote: Hi, On Fri, Dec 02, 2022 at 02:13:38PM +0800, Chao Peng wrote: This patch series implements KVM guest private memory for confidential computing scenarios like Intel TDX[1]. If a TDX host accesses TDX-protected guest memory, machine check can happen which can further crash the running host system, this is terrible for multi-tenant configurations. The host accesses include those from KVM userspace like QEMU. This series addresses KVM userspace induced crash by introducing new mm and KVM interfaces so KVM userspace can still manage guest memory via a fd-based approach, but it can never access the guest memory content. Sorry for jumping late. Unless I'm missing something, hibernation will also cause an machine check when there is TDX-protected memory in the system. When the hibernation creates memory snapshot it essentially walks all physical pages and saves their contents, so for TDX memory this will trigger machine check, right? I recall bringing that up in the past (also memory access due to kdump, /prov/kcore) and was told that the main focus for now is preventing unprivileged users from crashing the system, that is, not mapping such memory into user space (e.g., QEMU). In the long run, we'll want to handle such pages also properly in the other events where the kernel might access them. -- Thanks, David / dhildenb
Re: [PATCH] qemu: make version available in coredump
On Wed, Feb 15, 2023 at 05:05:47PM -0500, Stefan Hajnoczi wrote: > On Tue, 7 Jun 2022 at 16:33, Vladimir Sementsov-Ogievskiy > wrote: > > > > Add a variable with QEMU_FULL_VERSION definition. Then the content of > > the variable is easily searchable: > > > >strings /path/to/core | grep QEMU_FULL_VERSION > > > > 'volatile' keyword is used to avoid removing the variable by compiler as > > unused. > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy > > --- > > > > Hi all! > > > > Probably, I just don't know the correct way to get version from core > > file. If so, please teach me :) > > I've never hit this issue because bug reports usually include the QEMU > distro package version. Keeping the version string in the core file > seems reasonable (unless there is already another way to do this). > > Something I'm curious about: is the coredump guaranteed to contain > static const variables? I wondered if they might be located in the > .rodata ELF section and excluded from the coredump because they are > referenced in the NT_FILE mmap note instead. Maybe volatile prevents > this? In Fedora / RHEL based systems (and some other distros too IIUC) for many years, all binaries have included a "build-id" ELF note which uniquely identifies the package build. Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x3c0: Owner Data size Type GNU 20 GNU_BUILD_ID Build ID: e3143405b7f653a0a65b3295df760fdf2c09ba79 This can be used to query what RPM it came from (assuming the RPM is still in your repos) dnf repoquery --whatprovides debuginfo(build-id) = ...hash... this makes it into the coredump files and is what current distro tooling uses to find the binary (and libraries). There are some downsides/limitations with this though, so in Fedora 36 a new impl was added alongside which provides full package info in json Note section [ 5] '.note.package' of 136 bytes at offset 0x404: Owner Data size Type FDO 120 FDO_PACKAGING_METADATA Packaging Metadata: {"type":"rpm","name":"qemu","version":"7.0.0-13.fc37","architecture":"x86_64","osCpe":"cpe:/o:fedoraproject:fedora:37"} This format is supported by systemd core dump tools https://systemd.io/ELF_PACKAGE_METADATA/ I believe it has been proposed (and possibly implemented?) for Debian too. This is a long winded way of asking, do we really need a QEMU specific solution here ? Especially one that only tells us a QEMU verison, and nothing about the many libraries QEMU links to which affect its operational behaviour. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[RFC PATCH v11bis 23/26] hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore
From: David Woodhouse Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 16 1 file changed, 16 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 74729cbb63..a58c89b555 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -21,6 +21,7 @@ #include "hw/sysbus.h" #include "hw/xen/xen.h" +#include "hw/xen/xen_backend_ops.h" #include "xen_overlay.h" #include "xen_evtchn.h" #include "xen_xenstore.h" @@ -34,6 +35,7 @@ #include "hw/xen/interface/io/xs_wire.h" #include "hw/xen/interface/event_channel.h" +#include "hw/xen/interface/grant_table.h" #define TYPE_XEN_XENSTORE "xen-xenstore" OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE) @@ -66,6 +68,9 @@ struct XenXenstoreState { uint8_t *impl_state; uint32_t impl_state_size; + +struct xengntdev_handle *gt; +void *granted_xs; }; struct XenXenstoreState *xen_xenstore_singleton; @@ -1413,6 +1418,17 @@ int xen_xenstore_reset(void) } s->be_port = err; +/* + * We don't actually access the guest's page through the grant, because + * this isn't real Xen, and we can just use the page we gave it in the + * first place. Map the grant anyway, mostly for cosmetic purposes so + * it *looks* like it's in use in the guest-visible grant table. + */ +s->gt = qemu_xen_gnttab_open(); +uint32_t xs_gntref = GNTTAB_RESERVED_XENSTORE; +s->granted_xs = qemu_xen_gnttab_map_refs(s->gt, 1, xen_domid, + &xs_gntref, PROT_READ|PROT_WRITE); + return 0; } -- 2.39.0
[RFC PATCH v11bis 04/26] hw/xen: Implement XenStore transactions
From: David Woodhouse Given that the whole thing supported copy on write from the beginning, transactions end up being fairly simple. On starting a transaction, just take a ref of the existing root; swap it back in on a successful commit. The main tree has a transaction ID too, and we keep a record of the last transaction ID given out. if the main tree is ever modified when it isn't the latest, it gets a new transaction ID. A commit can only succeed if the main tree hasn't moved on since it was forked. Strictly speaking, the XenStore protocol allows a transaction to succeed as long as nothing *it* read or wrote has changed in the interim, but no implementations do that; *any* change is sufficient to abort a transaction. This does not yet fire watches on the changed nodes on a commit. That bit is more fun and will come in a follow-on commit. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 99 +++-- tests/unit/test-xs-node.c | 97 2 files changed, 192 insertions(+), 4 deletions(-) diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index e7bbe3deee..692622f706 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -44,9 +44,19 @@ typedef struct XsWatch { int rel_prefix; } XsWatch; +typedef struct XsTransaction { +XsNode *root; +unsigned int base_tx; +unsigned int tx_id; +unsigned int dom_id; +} XsTransaction; + struct XenstoreImplState { XsNode *root; GHashTable *watches; +GHashTable *transactions; +unsigned int root_tx; +unsigned int last_tx; }; static inline XsNode *xs_node_new(void) @@ -169,6 +179,7 @@ struct walk_op { bool inplace; bool mutating; bool create_dirs; +bool in_transaction; }; static void fire_watches(struct walk_op *op, bool parents) @@ -176,7 +187,7 @@ static void fire_watches(struct walk_op *op, bool parents) GList *l = NULL; XsWatch *w; -if (!op->mutating) { +if (!op->mutating || op->in_transaction) { return; } @@ -421,6 +432,16 @@ static int xs_node_walk(XsNode **n, struct walk_op *op) op->path[namelen] = '\0'; if (!namelen) { assert(!op->watches); +/* + * If the main tree was changed, bump its tx ID so that outstanding + * transactions correctly fail. But don't bump it every time; only + * if it makes a difference. + */ +if (!err && op->mutating && !op->in_transaction) { +if (op->s->root_tx != op->s->last_tx) { +op->s->root_tx = ++op->s->last_tx; +} +} } return err; } @@ -479,13 +500,20 @@ static int init_walk_op(XenstoreImplState *s, struct walk_op *op, op->inplace = true; op->mutating = false; op->create_dirs = false; +op->in_transaction = false; op->dom_id = dom_id; op->s = s; if (tx_id == XBT_NULL) { *rootp = &s->root; } else { -return ENOENT; +XsTransaction *tx = g_hash_table_lookup(s->transactions, +GINT_TO_POINTER(tx_id)); +if (!tx) { +return ENOENT; +} +*rootp = &tx->root; +op->in_transaction = true; } return 0; @@ -559,13 +587,65 @@ int xs_impl_directory(XenstoreImplState *s, unsigned int dom_id, int xs_impl_transaction_start(XenstoreImplState *s, unsigned int dom_id, xs_transaction_t *tx_id) { -return ENOSYS; +XsTransaction *tx; + +if (*tx_id != XBT_NULL) { +return EINVAL; +} + +tx = g_new0(XsTransaction, 1); + +tx->tx_id = ++s->last_tx; +tx->base_tx = s->root_tx; +tx->root = xs_node_ref(s->root); +tx->dom_id = dom_id; + +g_hash_table_insert(s->transactions, GINT_TO_POINTER(tx->tx_id), tx); +*tx_id = tx->tx_id; +return 0; +} + +static int transaction_commit(XenstoreImplState *s, XsTransaction *tx) +{ +if (s->root_tx != tx->base_tx) { +return EAGAIN; +} +xs_node_unref(s->root); +s->root = tx->root; +tx->root = NULL; +s->root_tx = tx->tx_id; + +/* + * XX: Walk the new root and fire watches on any node which has a + * refcount of one (which is therefore unique to this transaction). + */ +return 0; } int xs_impl_transaction_end(XenstoreImplState *s, unsigned int dom_id, xs_transaction_t tx_id, bool commit) { -return ENOSYS; +int ret = 0; + +if (commit) { +XsTransaction *tx = g_hash_table_lookup(s->transactions, +GINT_TO_POINTER(tx_id)); +if (!tx || tx->dom_id != dom_id) { +return ENOENT; +} + +ret = transaction_commit(s, tx); +/* + * It *is* in the hash table still, so g_hash_table_remove() will + * return true and we'll return ret; + */ +
[RFC PATCH v11bis 24/26] hw/xen: Implement soft reset for emulated gnttab
From: David Woodhouse This is only part of it; we will also need to get the PV back end drivers to tear down their own mappings (or do it for them, but they kind of need to stop using the pointers too). Some more work on the actual PV back ends and xen-bus code is going to be needed to really make soft reset and migration fully functional, and this part is the basis for that. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 26 -- hw/i386/kvm/xen_gnttab.h | 1 + target/i386/kvm/xen-emu.c | 5 + 3 files changed, 30 insertions(+), 2 deletions(-) diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c index 5bd99eaf17..16230f8581 100644 --- a/hw/i386/kvm/xen_gnttab.c +++ b/hw/i386/kvm/xen_gnttab.c @@ -72,13 +72,11 @@ static void xen_gnttab_realize(DeviceState *dev, Error **errp) error_setg(errp, "Xen grant table support is for Xen emulation"); return; } -s->nr_frames = 0; s->max_frames = kvm_xen_get_gnttab_max_frames(); memory_region_init_ram(&s->gnt_frames, OBJECT(dev), "xen:grant_table", XEN_PAGE_SIZE * s->max_frames, &error_abort); memory_region_set_enabled(&s->gnt_frames, true); s->entries.v1 = memory_region_get_ram_ptr(&s->gnt_frames); -memset(s->entries.v1, 0, XEN_PAGE_SIZE * s->max_frames); /* Create individual page-sizes aliases for overlays */ s->gnt_aliases = (void *)g_new0(MemoryRegion, s->max_frames); @@ -90,8 +88,11 @@ static void xen_gnttab_realize(DeviceState *dev, Error **errp) s->gnt_frame_gpas[i] = INVALID_GPA; } +s->nr_frames = 0; +memset(s->entries.v1, 0, XEN_PAGE_SIZE * s->max_frames); s->entries.v1[GNTTAB_RESERVED_XENSTORE].flags = GTF_permit_access; s->entries.v1[GNTTAB_RESERVED_XENSTORE].frame = XEN_SPECIAL_PFN(XENSTORE); + qemu_mutex_init(&s->gnt_lock); xen_gnttab_singleton = s; @@ -518,3 +519,24 @@ static struct gnttab_backend_ops emu_gnttab_backend_ops = { .unmap = xen_be_gnttab_unmap, }; +int xen_gnttab_reset(void) +{ +XenGnttabState *s = xen_gnttab_singleton; + +if (!s) { +return -ENOTSUP; +} + +QEMU_LOCK_GUARD(&s->gnt_lock); + +s->nr_frames = 0; + +memset(s->entries.v1, 0, XEN_PAGE_SIZE * s->max_frames); + +s->entries.v1[GNTTAB_RESERVED_XENSTORE].flags = GTF_permit_access; +s->entries.v1[GNTTAB_RESERVED_XENSTORE].frame = XEN_SPECIAL_PFN(XENSTORE); + +memset(s->map_track, 0, s->max_frames * ENTRIES_PER_FRAME_V1); + +return 0; +} diff --git a/hw/i386/kvm/xen_gnttab.h b/hw/i386/kvm/xen_gnttab.h index 3bdbe96191..ee215239b0 100644 --- a/hw/i386/kvm/xen_gnttab.h +++ b/hw/i386/kvm/xen_gnttab.h @@ -13,6 +13,7 @@ #define QEMU_XEN_GNTTAB_H void xen_gnttab_create(void); +int xen_gnttab_reset(void); int xen_gnttab_map_page(uint64_t idx, uint64_t gfn); struct gnttab_set_version; diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c index ef77f64e75..705de0ebb1 100644 --- a/target/i386/kvm/xen-emu.c +++ b/target/i386/kvm/xen-emu.c @@ -1384,6 +1384,11 @@ int kvm_xen_soft_reset(void) return err; } +err = xen_gnttab_reset(); +if (err) { +return err; +} + err = xen_xenstore_reset(); if (err) { return err; -- 2.39.0
[RFC PATCH v11bis 06/26] xenstore perms WIP
From: Paul Durrant Store perms as a GList of strings, check permissions. No unit tests yet. Signed-off-by: Paul Durrant Signed-off-by: David Woodhoues --- hw/i386/kvm/xen_xenstore.c | 2 +- hw/i386/kvm/xenstore_impl.c | 239 +--- hw/i386/kvm/xenstore_impl.h | 8 +- tests/unit/test-xs-node.c | 24 +++- 4 files changed, 252 insertions(+), 21 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 78c887ab29..90f3bb8946 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -98,7 +98,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true, xen_xenstore_event, NULL, NULL, NULL, s); -s->impl = xs_impl_create(); +s->impl = xs_impl_create(xen_domid); } static bool xen_xenstore_is_needed(void *opaque) diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index 2f9e1ff51b..e6c6a2200e 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -12,6 +12,8 @@ #include "qemu/osdep.h" #include "qom/object.h" +#include "hw/xen/xen.h" + #include "xen_xenstore.h" #include "xenstore_impl.h" @@ -28,6 +30,7 @@ typedef struct XsNode { uint32_t ref; GByteArray *content; +GList *perms; GHashTable *children; uint64_t gencnt; bool deleted_in_tx; @@ -96,6 +99,9 @@ static inline void xs_node_unref(XsNode *n) if (n->content) { g_byte_array_unref(n->content); } +if (n->perms) { +g_list_free_full(n->perms, g_free); +} if (n->children) { g_hash_table_unref(n->children); } @@ -107,8 +113,51 @@ static inline void xs_node_unref(XsNode *n) g_free(n); } +char *xs_perm_as_string(unsigned int perm, unsigned int domid) +{ +char letter; + +switch (perm) { +case XS_PERM_READ | XS_PERM_WRITE: +letter = 'b'; +break; +case XS_PERM_READ: +letter = 'r'; +break; +case XS_PERM_WRITE: +letter = 'w'; +break; +case XS_PERM_NONE: +default: +letter = 'n'; +break; +} + +return g_strdup_printf("%c%u", letter, domid); +} + +static gpointer do_perm_copy(gconstpointer src, gpointer user_data) +{ +return g_strdup(src); +} + +static XsNode *xs_node_create(const char *name, GList *perms) +{ +XsNode *n = xs_node_new(); + +#ifdef XS_NODE_UNIT_TEST +if (name) { +n->name = g_strdup(name); +} +#endif + +n->perms = g_list_copy_deep(perms, do_perm_copy, NULL); + +return n; +} + /* For copying from one hash table to another using g_hash_table_foreach() */ -static void do_insert(gpointer key, gpointer value, gpointer user_data) +static void do_child_insert(gpointer key, gpointer value, gpointer user_data) { g_hash_table_insert(user_data, g_strdup(key), xs_node_ref(value)); } @@ -125,12 +174,16 @@ static XsNode *xs_node_copy(XsNode *old) } #endif +assert(old); if (old->children) { n->children = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, (GDestroyNotify)xs_node_unref); -g_hash_table_foreach(old->children, do_insert, n->children); +g_hash_table_foreach(old->children, do_child_insert, n->children); +} +if (old->perms) { +n->perms = g_list_copy_deep(old->perms, do_perm_copy, NULL); } -if (old && old->content) { +if (old->content) { n->content = g_byte_array_ref(old->content); } return n; @@ -320,6 +373,9 @@ static XsNode *xs_node_copy_deleted(XsNode *old) (GDestroyNotify)xs_node_unref); g_hash_table_foreach(old->children, copy_deleted_recurse, n->children); } +if (old->perms) { +n->perms = g_list_copy_deep(old->perms, do_perm_copy, NULL); +} n->deleted_in_tx = true; /* If it gets resurrected we only fire a watch if it lost its content */ if (old->content) { @@ -352,6 +408,84 @@ static int xs_node_rm(XsNode **n, struct walk_op *op) return 0; } +static int xs_node_get_perms(XsNode **n, struct walk_op *op) +{ +GList **perms = op->op_opaque; + +assert(op->inplace); +assert(*n); + +*perms = g_list_copy_deep((*n)->perms, do_perm_copy, NULL); +return 0; +} + +static int xs_node_set_perms(XsNode **n, struct walk_op *op) +{ +GList *perms = op->op_opaque; + +/* We *are* the node to be written. Either this or a copy. */ +if (!op->inplace) { +XsNode *old = *n; +*n = xs_node_copy(old); +xs_node_unref(old); +} + +if ((*n)->perms) { +g_list_free_full((*n)->perms, g_free); +} +(*n)->perms = g_list_copy_deep(perms, do_perm_copy, NULL); +if (op->tx_id != XBT_NULL) { +(*n)->modified_in_tx = true; +} +return 0; +} + +static void parse_perm(const char *perm, char *
[RFC PATCH v11bis 25/26] hw/xen: Subsume xen_be_register_common() into xen_be_init()
From: David Woodhouse Every caller of xen_be_init() checks and exits on error, then calls xen_be_register_common(). Just make xen_be_init() abort for itself and return void, and register the common devices too. Signed-off-by: David Woodhouse --- hw/xen/xen-hvm-common.c | 18 +- hw/xen/xen-legacy-backend.c | 55 - hw/xenpv/xen_machine_pv.c | 6 +--- include/hw/xen/xen-legacy-backend.h | 3 +- 4 files changed, 25 insertions(+), 57 deletions(-) diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c index 0a3ffbf955..44679116ce 100644 --- a/hw/xen/xen-hvm-common.c +++ b/hw/xen/xen-hvm-common.c @@ -765,22 +765,6 @@ void xen_shutdown_fatal_error(const char *fmt, ...) qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR); } -static void xen_register_backend(XenIOState *state) -{ -/* Initialize backend core & drivers */ -if (xen_be_init() != 0) { -error_report("xen backend core setup failed"); -goto err; -} -xen_be_register_common(); - -return; - -err: -error_report("xen hardware virtual machine backend registration failed"); -exit(1); -} - static void xen_do_ioreq_register(XenIOState *state, unsigned int max_cpus, MemoryListener xen_memory_listener) @@ -886,7 +870,7 @@ void xen_register_ioreq(XenIOState *state, unsigned int max_cpus, xen_bus_init(); -xen_register_backend(state); +xen_be_init(); return; diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c index 96d0941a7c..d6ad190062 100644 --- a/hw/xen/xen-legacy-backend.c +++ b/hw/xen/xen-legacy-backend.c @@ -593,17 +593,26 @@ static int xenstore_scan(const char *type, int dom, struct XenDevOps *ops) /* */ -int xen_be_init(void) +static void xen_set_dynamic_sysbus(void) +{ +Object *machine = qdev_get_machine(); +ObjectClass *oc = object_get_class(machine); +MachineClass *mc = MACHINE_CLASS(oc); + +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_XENSYSDEV); +} + +void xen_be_init(void) { xenstore = qemu_xen_xs_open(); if (!xenstore) { xen_pv_printf(NULL, 0, "can't connect to xenstored\n"); -return -1; +exit(1); } if (xen_evtchn_ops == NULL || xen_gnttab_ops == NULL) { -/* Check if xen_init() have been called */ -goto err; +xen_pv_printf(NULL, 0, "Xen operations not set up\n"); +exit(1); } xen_sysdev = qdev_new(TYPE_XENSYSDEV); @@ -611,22 +620,16 @@ int xen_be_init(void) xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus"); qbus_set_bus_hotplug_handler(xen_sysbus); -return 0; - -err: -qemu_xen_xs_close(xenstore); -xenstore = NULL; - -return -1; -} - -static void xen_set_dynamic_sysbus(void) -{ -Object *machine = qdev_get_machine(); -ObjectClass *oc = object_get_class(machine); -MachineClass *mc = MACHINE_CLASS(oc); +xen_set_dynamic_sysbus(); -machine_class_allow_dynamic_sysbus_dev(mc, TYPE_XENSYSDEV); +xen_be_register("console", &xen_console_ops); +xen_be_register("vkbd", &xen_kbdmouse_ops); +#ifdef CONFIG_VIRTFS +xen_be_register("9pfs", &xen_9pfs_ops); +#endif +#ifdef CONFIG_USB_LIBUSB +xen_be_register("qusb", &xen_usb_ops); +#endif } int xen_be_register(const char *type, struct XenDevOps *ops) @@ -648,20 +651,6 @@ int xen_be_register(const char *type, struct XenDevOps *ops) return xenstore_scan(type, xen_domid, ops); } -void xen_be_register_common(void) -{ -xen_set_dynamic_sysbus(); - -xen_be_register("console", &xen_console_ops); -xen_be_register("vkbd", &xen_kbdmouse_ops); -#ifdef CONFIG_VIRTFS -xen_be_register("9pfs", &xen_9pfs_ops); -#endif -#ifdef CONFIG_USB_LIBUSB -xen_be_register("qusb", &xen_usb_ops); -#endif -} - int xen_be_bind_evtchn(struct XenLegacyDevice *xendev) { if (xendev->local_port != -1) { diff --git a/hw/xenpv/xen_machine_pv.c b/hw/xenpv/xen_machine_pv.c index 20c9611d71..2e759d0619 100644 --- a/hw/xenpv/xen_machine_pv.c +++ b/hw/xenpv/xen_machine_pv.c @@ -36,10 +36,7 @@ static void xen_init_pv(MachineState *machine) int i; /* Initialize backend core & drivers */ -if (xen_be_init() != 0) { -error_report("%s: xen backend core setup failed", __func__); -exit(1); -} +xen_be_init(); switch (xen_mode) { case XEN_ATTACH: @@ -55,7 +52,6 @@ static void xen_init_pv(MachineState *machine) break; } -xen_be_register_common(); xen_be_register("vfb", &xen_framebuffer_ops); xen_be_register("qnic", &xen_netdev_ops); diff --git a/include/hw/xen/xen-legacy-backend.h b/include/hw/xen/xen-legacy-backend.h index 6c2ae709f5..6c307c5f2c 100644 --- a/include/hw/xen/xen-legacy-backend.h +++ b/include/hw/xen/xen-legacy-backe
[RFC PATCH v11bis 20/26] hw/xen: Hook up emulated implementation for event channel operations
From: David Woodhouse We provided the backend-facing evtchn functions very early on as part of the core Xen platform support, since things like timers and xenstore need to use them. By what may or may not be an astonishing coincidence, those functions just *happen* all to have exactly the right function prototypes to slot into the evtchn_backend_ops table and be called by the PV backends. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index 3f60461e5c..4442e3ab06 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -34,6 +34,7 @@ #include "hw/pci/msi.h" #include "hw/pci/msix.h" #include "hw/irq.h" +#include "hw/xen/xen_backend_ops.h" #include "xen_evtchn.h" #include "xen_overlay.h" @@ -278,6 +279,17 @@ static const TypeInfo xen_evtchn_info = { .class_init= xen_evtchn_class_init, }; +static struct evtchn_backend_ops emu_evtchn_backend_ops = { +.open = xen_be_evtchn_open, +.bind_interdomain = xen_be_evtchn_bind_interdomain, +.unbind = xen_be_evtchn_unbind, +.close = xen_be_evtchn_close, +.get_fd = xen_be_evtchn_fd, +.notify = xen_be_evtchn_notify, +.unmask = xen_be_evtchn_unmask, +.pending = xen_be_evtchn_pending, +}; + static void gsi_assert_bh(void *opaque) { struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0); @@ -318,6 +330,9 @@ void xen_evtchn_create(void) s->nr_pirq_inuse_words = DIV_ROUND_UP(s->nr_pirqs, 64); s->pirq_inuse_bitmap = g_new0(uint64_t, s->nr_pirq_inuse_words); s->pirq = g_new0(struct pirq_info, s->nr_pirqs); + +/* Set event channel functions for backend drivers to use */ +xen_evtchn_ops = &emu_evtchn_backend_ops; } void xen_evtchn_connect_gsis(qemu_irq *system_gsis) -- 2.39.0
[RFC PATCH v11bis 08/26] hw/xen: Create initial XenStore nodes
From: Paul Durrant Signed-off-by: Paul Durrant Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 70 ++ 1 file changed, 70 insertions(+) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index d15eea76c4..ff39137846 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -76,9 +76,39 @@ struct XenXenstoreState *xen_xenstore_singleton; static void xen_xenstore_event(void *opaque); static void fire_watch_cb(void *opaque, const char *path, const char *token); +static void G_GNUC_PRINTF (4, 5) relpath_printf(XenXenstoreState *s, +GList *perms, +const char *relpath, +const char *fmt, ...) +{ +gchar *abspath; +gchar *value; +va_list args; +GByteArray *data; +int err; + +abspath = g_strdup_printf("/local/domain/%u/%s", xen_domid, relpath); +va_start(args, fmt); +value = g_strdup_vprintf(fmt, args); +va_end(args); + +data = g_byte_array_new_take((void *)value, strlen(value)); + +err = xs_impl_write(s->impl, DOMID_QEMU, XBT_NULL, abspath, data); +assert(!err); + +g_byte_array_unref(data); + +err = xs_impl_set_perms(s->impl, DOMID_QEMU, XBT_NULL, abspath, perms); +assert(!err); + +g_free(abspath); +} + static void xen_xenstore_realize(DeviceState *dev, Error **errp) { XenXenstoreState *s = XEN_XENSTORE(dev); +GList *perms; if (xen_mode != XEN_EMULATE) { error_setg(errp, "Xen xenstore support is for Xen emulation"); @@ -102,6 +132,46 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) xen_xenstore_event, NULL, NULL, NULL, s); s->impl = xs_impl_create(xen_domid); + +/* Populate the default nodes */ + +/* Nodes owned by 'dom0' but readable by the guest */ +perms = g_list_append(NULL, xs_perm_as_string(XS_PERM_NONE, DOMID_QEMU)); +perms = g_list_append(perms, xs_perm_as_string(XS_PERM_READ, xen_domid)); + +relpath_printf(s, perms, "", "%s", ""); + +relpath_printf(s, perms, "domid", "%u", xen_domid); + +relpath_printf(s, perms, "control/platform-feature-xs_reset_watches", "%u", 1); +relpath_printf(s, perms, "control/platform-feature-multiprocessor-suspend", "%u", 1); + +relpath_printf(s, perms, "platform/acpi", "%u", 1); +relpath_printf(s, perms, "platform/acpi_s3", "%u", 1); +relpath_printf(s, perms, "platform/acpi_s4", "%u", 1); +relpath_printf(s, perms, "platform/acpi_laptop_slate", "%u", 0); + +g_list_free_full(perms, g_free); + +/* Nodes owned by the guest */ +perms = g_list_append(NULL, xs_perm_as_string(XS_PERM_NONE, xen_domid)); + +relpath_printf(s, perms, "attr", "%s", ""); + +relpath_printf(s, perms, "control/shutdown", "%s", ""); +relpath_printf(s, perms, "control/feature-poweroff", "%u", 1); +relpath_printf(s, perms, "control/feature-reboot", "%u", 1); +relpath_printf(s, perms, "control/feature-suspend", "%u", 1); +relpath_printf(s, perms, "control/feature-s3", "%u", 1); +relpath_printf(s, perms, "control/feature-s4", "%u", 1); + +relpath_printf(s, perms, "data", "%s", ""); +relpath_printf(s, perms, "device", "%s", ""); +relpath_printf(s, perms, "drivers", "%s", ""); +relpath_printf(s, perms, "error", "%s", ""); +relpath_printf(s, perms, "feature", "%s", ""); + +g_list_free_full(perms, g_free); } static bool xen_xenstore_is_needed(void *opaque) -- 2.39.0
[RFC PATCH v11bis 02/26] hw/xen: Add basic XenStore tree walk and write/read/directory support
From: David Woodhouse This is a fairly simple implementation of a copy-on-write tree. The node walk function starts off at the root, with 'inplace == true'. If it ever encounters a node with a refcount greater than one (including the root node), then that node is shared with other trees, and cannot be modified in place, so the inplace flag is cleared and we copy on write from there on down. Xenstore write has 'mkdir -p' semantics and will create the intermediate nodes if they don't already exist, so in that case we flip the inplace flag back to true as as populated the newly-created nodes. We put a copy of the absolute path into the buffer in the struct walk_op, with *two* NUL terminators at the end. As xs_node_walk() goes down the tree, it replaces the next '/' separator with a NUL so that it can use the 'child name' in place. The next recursion down then puts the '/' back and repeats the exercise for the next path element... if it doesn't hit that *second* NUL termination which indicates the true end of the path. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 440 +++- tests/unit/meson.build | 1 + tests/unit/test-xs-node.c | 177 +++ 3 files changed, 611 insertions(+), 7 deletions(-) create mode 100644 tests/unit/test-xs-node.c diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index 31dbc98fe0..520883d172 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -10,13 +10,384 @@ */ #include "qemu/osdep.h" +#include "qom/object.h" #include "xen_xenstore.h" #include "xenstore_impl.h" +#include "hw/xen/interface/io/xs_wire.h" + +#define TYPE_XS_NODE "xs-node" + +typedef struct XsNode { +uint32_t ref; +GByteArray *content; +GHashTable *children; +uint64_t gencnt; +#ifdef XS_NODE_UNIT_TEST +gchar *name; /* debug only */ +#endif +} XsNode; + +#define XS_MAX_WATCHES 128 +#define XS_MAX_DOMAIN_NODES 1000 +#define XS_MAX_NODE_SIZE2048 +#define XS_MAX_TRANSACTIONS 10 +#define XS_MAX_PERMS_PER_NODE 5 + struct XenstoreImplState { +XsNode *root; +}; + +static inline XsNode *xs_node_new(void) +{ +XsNode *n = g_new0(XsNode, 1); +n->ref = 1; + +#ifdef XS_NODE_UNIT_TEST +nr_xs_nodes++; +xs_node_list = g_list_prepend(xs_node_list, n); +#endif +return n; +} + +static inline XsNode *xs_node_ref(XsNode *n) +{ +/* With just 10 transactions, it can never get anywhere near this. */ +g_assert(n->ref < INT_MAX); + +g_assert(n->ref); +n->ref++; +return n; +} + +static inline void xs_node_unref(XsNode *n) +{ +if (!n) { +return; +} +g_assert(n->ref); +if (--n->ref) { +return; +} + +if (n->content) { +g_byte_array_unref(n->content); +} +if (n->children) { +g_hash_table_unref(n->children); +} +#ifdef XS_NODE_UNIT_TEST +g_free(n->name); +nr_xs_nodes--; +xs_node_list = g_list_remove(xs_node_list, n); +#endif +g_free(n); +} + +/* For copying from one hash table to another using g_hash_table_foreach() */ +static void do_insert(gpointer key, gpointer value, gpointer user_data) +{ +g_hash_table_insert(user_data, g_strdup(key), xs_node_ref(value)); +} + +static XsNode *xs_node_copy(XsNode *old) +{ +XsNode *n = xs_node_new(); + +n->gencnt = old->gencnt; +if (old->children) { +n->children = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, +(GDestroyNotify)xs_node_unref); +g_hash_table_foreach(old->children, do_insert, n->children); +} +if (old && old->content) { +n->content = g_byte_array_ref(old->content); +} +return n; +} + +/* Returns true if it made a change to the hash table */ +static bool xs_node_add_child(XsNode *n, const char *path_elem, XsNode *child) +{ +assert(!strchr(path_elem, '/')); + +if (!child) { +assert(n->children); +return g_hash_table_remove(n->children, path_elem); +} + +#ifdef XS_NODE_UNIT_TEST +g_free(child->name); +child->name = g_strdup(path_elem); +#endif +if (!n->children) { +n->children = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, +(GDestroyNotify)xs_node_unref); +} + +/* + * The documentation for g_hash_table_insert() says that it "returns a + * boolean value to indicate whether the newly added value was already + * in the hash table or not." + * + * It could perhaps be clearer that returning TRUE means it wasn't, + */ +return g_hash_table_insert(n->children, g_strdup(path_elem), child); +} + +struct walk_op { +struct XenstoreImplState *s; +char path[XENSTORE_ABS_PATH_MAX + 2]; /* Two NUL terminators */ +int (*op_fn)(XsNode **n, struct walk_op *op); +void *op_opaque; +void *op_opaque2; + +unsigned int dom_id; + +/* + * This is maintain
[RFC PATCH v11bis 12/26] hw/xen: Add foreignmem operations to allow redirection to internal emulation
From: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Paul Durrant --- hw/char/xen_console.c| 8 ++-- hw/display/xenfb.c | 20 +- hw/xen/xen-operations.c | 63 include/hw/xen/xen_backend_ops.h | 26 + include/hw/xen/xen_common.h | 13 --- softmmu/globals.c| 1 + 6 files changed, 105 insertions(+), 26 deletions(-) diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index 19ad6c946a..e9cef3e1ef 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -237,9 +237,9 @@ static int con_initialise(struct XenLegacyDevice *xendev) if (!xendev->dev) { xen_pfn_t mfn = con->ring_ref; -con->sring = xenforeignmemory_map(xen_fmem, con->xendev.dom, - PROT_READ | PROT_WRITE, - 1, &mfn, NULL); +con->sring = qemu_xen_foreignmem_map(con->xendev.dom, NULL, + PROT_READ | PROT_WRITE, + 1, &mfn, NULL); } else { con->sring = xen_be_map_grant_ref(xendev, con->ring_ref, PROT_READ | PROT_WRITE); @@ -269,7 +269,7 @@ static void con_disconnect(struct XenLegacyDevice *xendev) if (con->sring) { if (!xendev->dev) { -xenforeignmemory_unmap(xen_fmem, con->sring, 1); +qemu_xen_foreignmem_unmap(con->sring, 1); } else { xen_be_unmap_grant_ref(xendev, con->sring, con->ring_ref); } diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c index 260eb38a76..2c4016fcbd 100644 --- a/hw/display/xenfb.c +++ b/hw/display/xenfb.c @@ -98,8 +98,9 @@ static int common_bind(struct common *c) if (xenstore_read_fe_int(&c->xendev, "event-channel", &c->xendev.remote_port) == -1) return -1; -c->page = xenforeignmemory_map(xen_fmem, c->xendev.dom, - PROT_READ | PROT_WRITE, 1, &mfn, NULL); +c->page = qemu_xen_foreignmem_map(c->xendev.dom, NULL, + PROT_READ | PROT_WRITE, 1, &mfn, + NULL); if (c->page == NULL) return -1; @@ -115,7 +116,7 @@ static void common_unbind(struct common *c) { xen_pv_unbind_evtchn(&c->xendev); if (c->page) { -xenforeignmemory_unmap(xen_fmem, c->page, 1); +qemu_xen_foreignmem_unmap(c->page, 1); c->page = NULL; } } @@ -500,15 +501,16 @@ static int xenfb_map_fb(struct XenFB *xenfb) fbmfns = g_new0(xen_pfn_t, xenfb->fbpages); xenfb_copy_mfns(mode, n_fbdirs, pgmfns, pd); -map = xenforeignmemory_map(xen_fmem, xenfb->c.xendev.dom, - PROT_READ, n_fbdirs, pgmfns, NULL); +map = qemu_xen_foreignmem_map(xenfb->c.xendev.dom, NULL, PROT_READ, + n_fbdirs, pgmfns, NULL); if (map == NULL) goto out; xenfb_copy_mfns(mode, xenfb->fbpages, fbmfns, map); -xenforeignmemory_unmap(xen_fmem, map, n_fbdirs); +qemu_xen_foreignmem_unmap(map, n_fbdirs); -xenfb->pixels = xenforeignmemory_map(xen_fmem, xenfb->c.xendev.dom, -PROT_READ, xenfb->fbpages, fbmfns, NULL); +xenfb->pixels = qemu_xen_foreignmem_map(xenfb->c.xendev.dom, NULL, +PROT_READ, xenfb->fbpages, +fbmfns, NULL); if (xenfb->pixels == NULL) goto out; @@ -927,7 +929,7 @@ static void fb_disconnect(struct XenLegacyDevice *xendev) * Replacing the framebuffer with anonymous shared memory * instead. This releases the guest pages and keeps qemu happy. */ -xenforeignmemory_unmap(xen_fmem, fb->pixels, fb->fbpages); +qemu_xen_foreignmem_unmap(fb->pixels, fb->fbpages); fb->pixels = mmap(fb->pixels, fb->fbpages * XC_PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0); diff --git a/hw/xen/xen-operations.c b/hw/xen/xen-operations.c index 73dabac8e5..4c6b305cc4 100644 --- a/hw/xen/xen-operations.c +++ b/hw/xen/xen-operations.c @@ -22,6 +22,7 @@ */ #undef XC_WANT_COMPAT_EVTCHN_API #undef XC_WANT_COMPAT_GNTTAB_API +#undef XC_WANT_COMPAT_MAP_FOREIGN_API #include @@ -56,10 +57,13 @@ typedef xc_gnttab xengnttab_handle; #define xengnttab_map_domain_grant_refs(h, c, d, r, p) \ xc_gnttab_map_domain_grant_refs(h, c, d, r, p) +typedef xc_interface xenforeignmemory_handle; + #else /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40701 */ #include #include +#include #endif @@ -218,6 +222,64 @@ static struct gnttab_backend_ops libxengnttab_backend_ops = { .unmap = libxengnttab_backend_unmap, }; +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40701 + +static void *libxenforeignmem_backend_map(uint32_t dom, void *addr, int prot, +
[RFC PATCH v11bis 03/26] hw/xen: Implement XenStore watches
From: David Woodhouse Starts out fairly simple: a hash table of watches based on the path. Except there can be multiple watches on the same path, so the watch ends up being a simple linked list, and the head of that list is in the hash table. Which makes removal a bit of a PITA but it's not so bad; we just special-case "I had to remove the head of the list and now I have to replace it in / remove it from the hash table". And if we don't remove the head, it's a simple linked-list operation. We do need to fire watches on *deleted* nodes, so instead of just a simple xs_node_unref() on the topmost victim, we need to recurse down and fire watches on them all. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 294 +--- tests/unit/test-xs-node.c | 80 ++ 2 files changed, 355 insertions(+), 19 deletions(-) diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index 520883d172..e7bbe3deee 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -19,6 +19,12 @@ #define TYPE_XS_NODE "xs-node" +#define XS_MAX_WATCHES 128 +#define XS_MAX_DOMAIN_NODES 1000 +#define XS_MAX_NODE_SIZE2048 +#define XS_MAX_TRANSACTIONS 10 +#define XS_MAX_PERMS_PER_NODE 5 + typedef struct XsNode { uint32_t ref; GByteArray *content; @@ -29,14 +35,18 @@ typedef struct XsNode { #endif } XsNode; -#define XS_MAX_WATCHES 128 -#define XS_MAX_DOMAIN_NODES 1000 -#define XS_MAX_NODE_SIZE2048 -#define XS_MAX_TRANSACTIONS 10 -#define XS_MAX_PERMS_PER_NODE 5 +typedef struct XsWatch { +struct XsWatch *next; +xs_impl_watch_fn *cb; +void *cb_opaque; +char *token; +unsigned int dom_id; +int rel_prefix; +} XsWatch; struct XenstoreImplState { XsNode *root; +GHashTable *watches; }; static inline XsNode *xs_node_new(void) @@ -143,6 +153,7 @@ struct walk_op { void *op_opaque; void *op_opaque2; +GList *watches; unsigned int dom_id; /* @@ -160,6 +171,35 @@ struct walk_op { bool create_dirs; }; +static void fire_watches(struct walk_op *op, bool parents) +{ +GList *l = NULL; +XsWatch *w; + +if (!op->mutating) { +return; +} + +if (parents) { +l = op->watches; +} + +w = g_hash_table_lookup(op->s->watches, op->path); +while (w || l) { +if (!w) { +/* Fire the parent nodes from 'op' if asked to */ +w = l->data; +l = l->next; +continue; +} + +assert(strlen(op->path) > w->rel_prefix); +w->cb(w->cb_opaque, op->path + w->rel_prefix, w->token); + +w = w->next; +} +} + static int xs_node_add_content(XsNode **n, struct walk_op *op) { /* We *are* the node to be written. Either this or a copy. */ @@ -192,10 +232,51 @@ static int xs_node_get_content(XsNode **n, struct walk_op *op) return 0; } +static int node_rm_recurse(gpointer key, gpointer value, gpointer user_data) +{ +struct walk_op *op = user_data; +int path_len = strlen(op->path); +int key_len = strlen(key); +XsNode *n = value; +bool this_inplace = op->inplace; + +if (n->ref != 1) { +op->inplace = 0; +} + +assert(key_len + path_len + 2 <= sizeof(op->path)); +op->path[path_len] = '/'; +memcpy(op->path + path_len + 1, key, key_len + 1); + +if (n->children) { +g_hash_table_foreach_remove(n->children, node_rm_recurse, op); +} +/* + * Fire watches on *this* node but not the parents because they are + * going to be deleted too, so the watch will fire for them anyway. + */ +fire_watches(op, false); +op->path[path_len] = '\0'; + + +/* + * Actually deleting the child here is just an optimisation; if we + * don't then the final unref on the topmost victim will just have + * to cascade down again repeating all the g_hash_table_foreach() + * calls. + */ +return this_inplace; +} + static int xs_node_rm(XsNode **n, struct walk_op *op) { bool this_inplace = op->inplace; +/* Fire watches for any node in the subtree which gets deleted. */ +if ((*n)->children) { +g_hash_table_foreach_remove((*n)->children, node_rm_recurse, op); +} + if (this_inplace) { xs_node_unref(*n); } @@ -221,9 +302,11 @@ static int xs_node_walk(XsNode **n, struct walk_op *op) XsNode *old = *n, *child = NULL; bool stole_child = false; bool this_inplace; +XsWatch *watch; int err; namelen = strlen(op->path); +watch = g_hash_table_lookup(op->s->watches, op->path); /* Is there a child, or do we hit the double-NUL termination? */ if (op->path[namelen + 1]) { @@ -244,6 +327,9 @@ static int xs_node_walk(XsNode **n, struct walk_op *op) if (!child_name) { /* This is the actual node on which the operation shall be performed */ err = op->op_fn(n, op);
[RFC PATCH v11bis 21/26] hw/xen: Add emulated implementation of grant table operations
From: David Woodhouse This is limited to mapping a single grant at a time, because under Xen the pages are mapped *contiguously* into qemu's address space, and that's very hard to do when those pages actually come from anonymous mappings in qemu in the first place. Eventually perhaps we can look at using shared mappings of actual objects for system RAM, and then we can make new mappings of the same backing store (be it deleted files, shmem, whatever). But for now let's stick to a page at a time. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_gnttab.c | 294 ++- 1 file changed, 291 insertions(+), 3 deletions(-) diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c index 1e691ded32..5bd99eaf17 100644 --- a/hw/i386/kvm/xen_gnttab.c +++ b/hw/i386/kvm/xen_gnttab.c @@ -22,6 +22,7 @@ #include "hw/sysbus.h" #include "hw/xen/xen.h" +#include "hw/xen/xen_backend_ops.h" #include "xen_overlay.h" #include "xen_gnttab.h" @@ -34,11 +35,10 @@ #define TYPE_XEN_GNTTAB "xen-gnttab" OBJECT_DECLARE_SIMPLE_TYPE(XenGnttabState, XEN_GNTTAB) -#define XEN_PAGE_SHIFT 12 -#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT) - #define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t)) +static struct gnttab_backend_ops emu_gnttab_backend_ops; + struct XenGnttabState { /*< private >*/ SysBusDevice busdev; @@ -57,6 +57,8 @@ struct XenGnttabState { MemoryRegion gnt_frames; MemoryRegion *gnt_aliases; uint64_t *gnt_frame_gpas; + +uint8_t *map_track; }; struct XenGnttabState *xen_gnttab_singleton; @@ -88,9 +90,15 @@ static void xen_gnttab_realize(DeviceState *dev, Error **errp) s->gnt_frame_gpas[i] = INVALID_GPA; } +s->entries.v1[GNTTAB_RESERVED_XENSTORE].flags = GTF_permit_access; +s->entries.v1[GNTTAB_RESERVED_XENSTORE].frame = XEN_SPECIAL_PFN(XENSTORE); qemu_mutex_init(&s->gnt_lock); xen_gnttab_singleton = s; + +s->map_track = g_new0(uint8_t, s->max_frames * ENTRIES_PER_FRAME_V1); + +xen_gnttab_ops = &emu_gnttab_backend_ops; } static int xen_gnttab_post_load(void *opaque, int version_id) @@ -230,3 +238,283 @@ int xen_gnttab_query_size_op(struct gnttab_query_size *size) size->max_nr_frames = s->max_frames; return 0; } + +/* Track per-open refs, to allow close() to clean up. */ +struct active_ref { +MemoryRegionSection mrs; +void *virtaddr; +uint32_t refcnt; +int prot; +}; + +static void gnt_unref(XenGnttabState *s, grant_ref_t ref, + MemoryRegionSection *mrs, int prot) +{ +if (mrs && mrs->mr) { + if (prot & PROT_WRITE) { +memory_region_set_dirty(mrs->mr, mrs->offset_within_region, +XEN_PAGE_SIZE); +} +memory_region_unref(mrs->mr); +mrs->mr = NULL; +} +assert(s->map_track[ref] != 0); + +if (--s->map_track[ref] == 0) { +grant_entry_v1_t *gnt_p = &s->entries.v1[ref]; +qatomic_and(&gnt_p->flags, ~(GTF_reading | GTF_writing)); +} +} + +static uint64_t gnt_ref(XenGnttabState *s, grant_ref_t ref, int prot) +{ +uint16_t mask = GTF_type_mask | GTF_sub_page; +volatile grant_entry_v1_t *gnt_p; +grant_entry_v1_t gnt; +int retries = 0; + +if (ref >= s->max_frames * ENTRIES_PER_FRAME_V1 || +s->map_track[ref] == UINT8_MAX) { +return INVALID_GPA; +} + +if (prot & PROT_WRITE) { +mask |= GTF_readonly; +} + +gnt_p = &s->entries.v1[ref]; + +/* + * The guest can legitimately be changing the GTF_readonly flag. Allow + * that, but don't let a malicious guest cause a livelock. + */ +for (retries = 0; retries < 5; retries++) { +uint16_t new_flags; +gnt = *gnt_p; + +if ((gnt.flags & mask) != GTF_permit_access || +gnt.domid != DOMID_QEMU) { +return INVALID_GPA; +} + +new_flags = gnt.flags | GTF_reading; +if (prot & PROT_WRITE) { +new_flags |= GTF_writing; +} + +if (qatomic_cmpxchg(&gnt_p->flags, gnt.flags, new_flags) == gnt.flags) { +return (uint64_t)gnt.frame << XEN_PAGE_SHIFT; +} +} + +return INVALID_GPA; +} + +struct xengntdev_handle { +GHashTable *active_maps; +}; + +static int xen_be_gnttab_set_max_grants(struct xengntdev_handle *xgt, +uint32_t nr_grants) +{ +return 0; +} + +static void *xen_be_gnttab_map_refs(struct xengntdev_handle *xgt, uint32_t count, +uint32_t domid, uint32_t *refs, int prot) +{ +XenGnttabState *s = xen_gnttab_singleton; +struct active_ref *act; + +if (!s) { +errno = ENOTSUP; +return NULL; +} + +if (domid != xen_domid) { +errno = EINVAL; +return NULL; +} + +if (!count || count > 4096) { +errno = EINVAL; +return NULL; +} + +/* + * Making a contig
[RFC PATCH v11bis 16/26] hw/xen: Rename xen_common.h to xen_native.h
From: David Woodhouse This header is now only for native Xen code, not PV backends that may be used in Xen emulation. Since the toolstack libraries may depend on the specific version of Xen headers that they pull in (and will set the __XEN_TOOLS__ macro to enable internal definitions that they depend on), the rule is that xen_native.h (and thus the toolstack library headers) must be included *before* any of the headers in include/hw/xen/interface. Signed-off-by: David Woodhouse --- accel/xen/xen-all.c | 1 + hw/9pfs/xen-9p-backend.c | 1 + hw/block/dataplane/xen-block.c| 3 ++- hw/block/xen-block.c | 1 - hw/i386/pc_piix.c | 4 ++-- hw/i386/xen/xen-hvm.c | 8 +++ hw/i386/xen/xen_platform.c| 7 +++--- hw/xen/trace-events | 2 +- hw/xen/xen-mapcache.c | 2 +- hw/xen/xen-operations.c | 2 +- hw/xen/xen_pt.c | 2 +- hw/xen/xen_pt.h | 2 +- hw/xen/xen_pt_config_init.c | 2 +- hw/xen/xen_pt_msi.c | 4 ++-- include/hw/xen/xen-hvm-common.h | 2 +- include/hw/xen/xen.h | 22 --- include/hw/xen/{xen_common.h => xen_native.h} | 10 ++--- include/hw/xen/xen_pvdev.h| 3 ++- 18 files changed, 46 insertions(+), 32 deletions(-) rename include/hw/xen/{xen_common.h => xen_native.h} (98%) diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c index 514bc9eea4..fcdc8364bf 100644 --- a/accel/xen/xen-all.c +++ b/accel/xen/xen-all.c @@ -12,6 +12,7 @@ #include "qemu/error-report.h" #include "qemu/module.h" #include "qapi/error.h" +#include "hw/xen/xen_native.h" #include "hw/xen/xen-legacy-backend.h" #include "hw/xen/xen_pt.h" #include "chardev/char.h" diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c index d8bb0e847c..74f3a05f88 100644 --- a/hw/9pfs/xen-9p-backend.c +++ b/hw/9pfs/xen-9p-backend.c @@ -22,6 +22,7 @@ #include "qemu/config-file.h" #include "qemu/main-loop.h" #include "qemu/option.h" +#include "qemu/iov.h" #include "fsdev/qemu-fsdev.h" #define VERSIONS "1" diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 8322a1de82..734da42ea7 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -23,8 +23,9 @@ #include "qemu/main-loop.h" #include "qemu/memalign.h" #include "qapi/error.h" -#include "hw/xen/xen_common.h" +#include "hw/xen/xen.h" #include "hw/block/xen_blkif.h" +#include "hw/xen/interface/io/ring.h" #include "sysemu/block-backend.h" #include "sysemu/iothread.h" #include "xen-block.h" diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index 345b284d70..87299615e3 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -19,7 +19,6 @@ #include "qapi/qmp/qdict.h" #include "qapi/qmp/qstring.h" #include "qom/object_interfaces.h" -#include "hw/xen/xen_common.h" #include "hw/block/xen_blkif.h" #include "hw/qdev-properties.h" #include "hw/xen/xen-block.h" diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index df64dd8dcc..4a4825e81d 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -46,8 +46,6 @@ #include "hw/kvm/clock.h" #include "hw/sysbus.h" #include "hw/i2c/smbus_eeprom.h" -#include "hw/xen/xen-x86.h" -#include "hw/xen/xen.h" #include "exec/memory.h" #include "hw/acpi/acpi.h" #include "hw/acpi/piix4.h" @@ -59,6 +57,8 @@ #include #include "hw/xen/xen_pt.h" #endif +#include "hw/xen/xen-x86.h" +#include "hw/xen/xen.h" #include "migration/global_state.h" #include "migration/misc.h" #include "sysemu/numa.h" diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c index 4219308caf..98f1836708 100644 --- a/hw/i386/xen/xen-hvm.c +++ b/hw/i386/xen/xen-hvm.c @@ -30,10 +30,10 @@ static bool xen_in_migration; /* Compatibility with older version */ -/* This allows QEMU to build on a system that has Xen 4.5 or earlier - * installed. This here (not in hw/xen/xen_common.h) because xen/hvm/ioreq.h - * needs to be included before this block and hw/xen/xen_common.h needs to - * be included before xen/hvm/ioreq.h +/* This allows QEMU to build on a system that has Xen 4.5 or earlier installed. + * This is here (not in hw/xen/xen_native.h) because xen/hvm/ioreq.h needs to + * be included before this block and hw/xen/xen_native.h needs to be included + * before xen/hvm/ioreq.h */ #ifndef IOREQ_TYPE_VMWARE_PORT #define IOREQ_TYPE_VMWARE_PORT 3 diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c index 319049d80c..dfa1379ec9 100644 --- a/hw/i386/xen/xen_platform.c +++ b/hw/i386/xen/xen_platform.c @@ -29,7 +29,6 @@ #include "hw/ide/pci.h" #include "hw/pci/pci.h" #include "migration/vmstate.h" -#include "hw/xen/xen.h" #include "net/net.h" #includ
[RFC PATCH v11bis 14/26] hw/xen: Move xenstore_store_pv_console_info to xen_console.c
From: David Woodhouse There's no need for this to be in the Xen accel code, and as we want to use the Xen console support with KVM-emulated Xen we'll want to have a platform-agnostic version of it. Make it use GString to build up the path while we're at it. Signed-off-by: David Woodhouse --- accel/xen/xen-all.c | 61 --- hw/char/xen_console.c | 45 +-- include/hw/xen/xen.h | 2 -- 3 files changed, 43 insertions(+), 65 deletions(-) diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c index 6df7b4ff34..514bc9eea4 100644 --- a/accel/xen/xen-all.c +++ b/accel/xen/xen-all.c @@ -39,67 +39,6 @@ xc_interface *xen_xc; xenforeignmemory_handle *xen_fmem; xendevicemodel_handle *xen_dmod; -static int store_dev_info(int domid, Chardev *cs, const char *string) -{ -struct xs_handle *xs = NULL; -char *path = NULL; -char *newpath = NULL; -char *pts = NULL; -int ret = -1; - -/* Only continue if we're talking to a pty. */ -if (!CHARDEV_IS_PTY(cs)) { -return 0; -} -pts = cs->filename + 4; - -/* We now have everything we need to set the xenstore entry. */ -xs = xs_open(0); -if (xs == NULL) { -fprintf(stderr, "Could not contact XenStore\n"); -goto out; -} - -path = xs_get_domain_path(xs, domid); -if (path == NULL) { -fprintf(stderr, "xs_get_domain_path() error\n"); -goto out; -} -newpath = realloc(path, (strlen(path) + strlen(string) + -strlen("/tty") + 1)); -if (newpath == NULL) { -fprintf(stderr, "realloc error\n"); -goto out; -} -path = newpath; - -strcat(path, string); -strcat(path, "/tty"); -if (!xs_write(xs, XBT_NULL, path, pts, strlen(pts))) { -fprintf(stderr, "xs_write for '%s' fail", string); -goto out; -} -ret = 0; - -out: -free(path); -xs_close(xs); - -return ret; -} - -void xenstore_store_pv_console_info(int i, Chardev *chr) -{ -if (i == 0) { -store_dev_info(xen_domid, chr, "/console"); -} else { -char buf[32]; -snprintf(buf, sizeof(buf), "/device/console/%d", i); -store_dev_info(xen_domid, chr, buf); -} -} - - static void xenstore_record_dm_state(const char *state) { struct xs_handle *xs; diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index ad8638a86d..c7a19c0e7c 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -173,6 +173,48 @@ static void xencons_send(struct XenConsole *con) /* */ +static int store_con_info(struct XenConsole *con) +{ +Chardev *cs = qemu_chr_fe_get_driver(&con->chr); +char *pts = NULL; +char *dom_path; +GString *path; +int ret = -1; + +/* Only continue if we're talking to a pty. */ +if (!CHARDEV_IS_PTY(cs)) { +return 0; +} +pts = cs->filename + 4; + +dom_path = qemu_xen_xs_get_domain_path(xenstore, xen_domid); +if (!dom_path) { +return 0; +} + +path = g_string_new(dom_path); +free(dom_path); + +if (con->xendev.dev) { +g_string_append_printf(path, "/device/console/%d", con->xendev.dev); +} else { +g_string_append(path, "/console"); +} +g_string_append(path, "/tty"); + +if (xenstore_write_str(con->console, path->str, pts)) { +fprintf(stderr, "xenstore_write_str for '%s' fail", path->str); +goto out; +} +ret = 0; + +out: +g_string_free(path, true); +free(path); + +return ret; +} + static int con_init(struct XenLegacyDevice *xendev) { struct XenConsole *con = container_of(xendev, struct XenConsole, xendev); @@ -215,8 +257,7 @@ static int con_init(struct XenLegacyDevice *xendev) &error_abort); } -xenstore_store_pv_console_info(con->xendev.dev, - qemu_chr_fe_get_driver(&con->chr)); +store_con_info(con); out: g_free(type); diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index b3873c581b..fcc6c5b522 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -39,8 +39,6 @@ int xen_is_pirq_msi(uint32_t msi_data); qemu_irq *xen_interrupt_controller_init(void); -void xenstore_store_pv_console_info(int i, Chardev *chr); - void xen_register_framebuffer(struct MemoryRegion *mr); #endif /* QEMU_HW_XEN_H */ -- 2.39.0
[RFC PATCH v11bis 26/26] i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation
From: David Woodhouse Now that all the work is done to enable the PV backends to work without actual Xen, instantiate the bus from pc_basic_device_init() for emulated mode. This allows us finally to launch an emulated Xen guest with PV disk. qemu-system-x86_64 -serial mon:stdio -M q35 -cpu host -display none \ -m 1G -smp 2 -accel kvm,xen-version=0x4000a,kernel-irqchip=split \ -kernel bzImage -append "console=ttyS0 root=/dev/xvda1" \ -drive file=/var/lib/libvirt/images/fedora28.qcow2,if=none,id=disk \ -device xen-disk,drive=disk,vdev=xvda If we use -M pc instead of q35, we can even add an IDE disk and boot a guest image normally through grub. But q35 gives us AHCI and that isn't unplugged by the Xen magic, so the guests ends up seeing "both" disks. Signed-off-by: David Woodhouse --- hw/i386/pc.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 5ec3518b9e..d464f171fa 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -102,6 +102,11 @@ #include "trace.h" #include CONFIG_DEVICES +#ifdef CONFIG_XEN_EMU +#include "hw/xen/xen-legacy-backend.h" +#include "hw/xen/xen-bus.h" +#endif + /* * Helper for setting model-id for CPU models that changed model-id * depending on QEMU versions up to QEMU 2.4. @@ -1316,6 +1321,8 @@ void pc_basic_device_init(struct PCMachineState *pcms, if (pcms->bus) { pci_create_simple(pcms->bus, -1, "xen-platform"); } +xen_bus_init(); +xen_be_init(); } #endif -- 2.39.0
[RFC PATCH v11bis 01/26] hw/xen: Add xenstore wire implementation and implementation stubs
From: David Woodhouse This implements the basic wire protocol for the XenStore commands, punting all the actual implementation to xs_impl_* functions which all just return errors for now. Signed-off-by: David Woodhouse --- hw/i386/kvm/meson.build | 1 + hw/i386/kvm/trace-events| 15 + hw/i386/kvm/xen_xenstore.c | 869 +++- hw/i386/kvm/xenstore_impl.c | 117 + hw/i386/kvm/xenstore_impl.h | 58 +++ 5 files changed, 1051 insertions(+), 9 deletions(-) create mode 100644 hw/i386/kvm/xenstore_impl.c create mode 100644 hw/i386/kvm/xenstore_impl.h diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build index 82dd6ae7c6..6621ba5cd7 100644 --- a/hw/i386/kvm/meson.build +++ b/hw/i386/kvm/meson.build @@ -9,6 +9,7 @@ i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files( 'xen_evtchn.c', 'xen_gnttab.c', 'xen_xenstore.c', + 'xenstore_impl.c', )) i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss) diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events index b83c3eb965..7917167e89 100644 --- a/hw/i386/kvm/trace-events +++ b/hw/i386/kvm/trace-events @@ -3,3 +3,18 @@ kvm_xen_unmap_pirq(int pirq, int gsi) "pirq %d gsi %d" kvm_xen_get_free_pirq(int pirq, int type) "pirq %d type %d" kvm_xen_bind_pirq(int pirq, int port) "pirq %d port %d" kvm_xen_unmask_pirq(int pirq, char *dev, int vector) "pirq %d dev %s vector %d" +xenstore_error(unsigned int id, unsigned int tx_id, const char *err) "req %u tx %u err %s" +xenstore_read(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_write(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_mkdir(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_directory(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_directory_part(unsigned int tx_id, const char *path, size_t offset) "tx %u path %s offset %zu" +xenstore_transaction_start(unsigned int new_tx) "new_tx %u" +xenstore_transaction_end(unsigned int tx_id, bool commit) "tx %u commit %d" +xenstore_rm(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_get_perms(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_set_perms(unsigned int tx_id, const char *path) "tx %u path %s" +xenstore_watch(const char *path, const char *token) "path %s token %s" +xenstore_unwatch(const char *path, const char *token) "path %s token %s" +xenstore_reset_watches(void) "" +xenstore_watch_event(const char *path, const char *token) "path %s token %s" diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 2388842d15..78c887ab29 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -28,6 +28,10 @@ #include "sysemu/kvm.h" #include "sysemu/kvm_xen.h" +#include "trace.h" + +#include "xenstore_impl.h" + #include "hw/xen/interface/io/xs_wire.h" #include "hw/xen/interface/event_channel.h" @@ -47,6 +51,9 @@ struct XenXenstoreState { SysBusDevice busdev; /*< public >*/ +XenstoreImplState *impl; +GList *watch_events; + MemoryRegion xenstore_page; struct xenstore_domain_interface *xs; uint8_t req_data[XENSTORE_HEADER_SIZE + XENSTORE_PAYLOAD_MAX]; @@ -64,6 +71,7 @@ struct XenXenstoreState { struct XenXenstoreState *xen_xenstore_singleton; static void xen_xenstore_event(void *opaque); +static void fire_watch_cb(void *opaque, const char *path, const char *token); static void xen_xenstore_realize(DeviceState *dev, Error **errp) { @@ -89,6 +97,8 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) } aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true, xen_xenstore_event, NULL, NULL, NULL, s); + +s->impl = xs_impl_create(); } static bool xen_xenstore_is_needed(void *opaque) @@ -209,20 +219,757 @@ static void reset_rsp(XenXenstoreState *s) s->rsp_offset = 0; } +static void xs_error(XenXenstoreState *s, unsigned int id, + xs_transaction_t tx_id, int errnum) +{ +struct xsd_sockmsg *rsp = (struct xsd_sockmsg *)s->rsp_data; +const char *errstr = NULL; + +for (unsigned int i = 0; i < ARRAY_SIZE(xsd_errors); i++) { +struct xsd_errors *xsd_error = &xsd_errors[i]; + +if (xsd_error->errnum == errnum) { +errstr = xsd_error->errstring; +break; +} +} +assert(errstr); + +trace_xenstore_error(id, tx_id, errstr); + +rsp->type = XS_ERROR; +rsp->req_id = id; +rsp->tx_id = tx_id; +rsp->len = (uint32_t)strlen(errstr) + 1; + +memcpy(&rsp[1], errstr, rsp->len); +} + +static void xs_ok(XenXenstoreState *s, unsigned int type, unsigned int req_id, + xs_transaction_t tx_id) +{ +struct xsd_sockmsg *rsp = (struct xsd_sockmsg *)s->rsp_data; +const char *okstr = "OK"; + +rsp->type = type; +rsp->req_id = req_id; +rsp->tx_id = tx_id; +rsp->len = (uint32_t)strlen(okstr) + 1; + +memcpy(&rsp[1], okstr, rsp->len); +
[RFC PATCH v11bis 19/26] hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it
From: David Woodhouse Whem emulating Xen, multi-page grants are distinctly non-trivial and we have elected not to support them for the time being. Don't advertise them to the guest. Signed-off-by: David Woodhouse --- hw/block/xen-block.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index 87299615e3..f5a744589d 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -83,7 +83,8 @@ static void xen_block_connect(XenDevice *xendev, Error **errp) g_free(ring_ref); return; } -} else if (order <= blockdev->props.max_ring_page_order) { +} else if (qemu_xen_gnttab_can_map_multi() && + order <= blockdev->props.max_ring_page_order) { unsigned int i; nr_ring_ref = 1 << order; @@ -255,8 +256,12 @@ static void xen_block_realize(XenDevice *xendev, Error **errp) } xen_device_backend_printf(xendev, "feature-flush-cache", "%u", 1); -xen_device_backend_printf(xendev, "max-ring-page-order", "%u", - blockdev->props.max_ring_page_order); + +if (qemu_xen_gnttab_can_map_multi()) { +xen_device_backend_printf(xendev, "max-ring-page-order", "%u", + blockdev->props.max_ring_page_order); +} + xen_device_backend_printf(xendev, "info", "%u", blockdev->info); xen_device_frontend_printf(xendev, "virtual-device", "%lu", -- 2.39.0
[RFC PATCH v11bis 00/26] Emulated XenStore and PV backend support
The non-RFC patch submisson¹ is just the basic platform support for Xen on KVM. This RFC series is phase 2, adding an internal XenStore and hooking up the PV back end drivers to that and the emulated grant tables and event channels. With this, we can boot a Xen guest with PV disk, under KVM. Full support for migration isn't there yet because it's actually missing in the PV back end drivers in the first place (perhaps because upstream Xen doesn't yet have guest transparent live migration support anyway). I'm assuming that when the first round is merged and we drop the [RFC] from this set, that won't be a showstopper for now? I'd be particularly interested in opinions on the way I implemented serialization for the XenStore, by creating a GByteArray and then dumping it with VMSTATE_VARRAY_UINT32_ALLOC(). I may eventually attempt to make xs_node_walk() not actually recursive, if I can do so without my brain exploding. Since the first time the XenStore code was showed, I've stopped the XsNode from being an actual Object; that just made it larger for almost no benefit. ¹ https://lore.kernel.org/qemu-devel/20230216062444.2129371-1-dw...@infradead.org/ David Woodhouse (22): hw/xen: Add xenstore wire implementation and implementation stubs hw/xen: Add basic XenStore tree walk and write/read/directory support hw/xen: Implement XenStore watches hw/xen: Implement XenStore transactions hw/xen: Watches on XenStore transactions hw/xen: Implement core serialize/deserialize methods for xenstore_impl hw/xen: Add evtchn operations to allow redirection to internal emulation hw/xen: Add gnttab operations to allow redirection to internal emulation hw/xen: Pass grant ref to gnttab unmap operation hw/xen: Add foreignmem operations to allow redirection to internal emulation hw/xen: Move xenstore_store_pv_console_info to xen_console.c hw/xen: Use XEN_PAGE_SIZE in PV backend drivers hw/xen: Rename xen_common.h to xen_native.h hw/xen: Build PV backend drivers for CONFIG_XEN_BUS hw/xen: Only advertise ring-page-order for xen-block if gnttab supports it hw/xen: Hook up emulated implementation for event channel operations hw/xen: Add emulated implementation of grant table operations hw/xen: Add emulated implementation of XenStore operations hw/xen: Map guest XENSTORE_PFN grant in emulated Xenstore hw/xen: Implement soft reset for emulated gnttab hw/xen: Subsume xen_be_register_common() into xen_be_init() i386/xen: Initialize Xen backends from pc_basic_device_init() for emulation Paul Durrant (4): xenstore perms WIP hw/xen: Create initial XenStore nodes hw/xen: Add xenstore operations to allow redirection to internal emulation hw/xen: Avoid crash when backend watch fires too early accel/xen/xen-all.c | 69 +- hw/9pfs/meson.build |2 +- hw/9pfs/xen-9p-backend.c | 32 +- hw/block/dataplane/meson.build|2 +- hw/block/dataplane/xen-block.c| 12 +- hw/block/meson.build |2 +- hw/block/xen-block.c | 12 +- hw/char/meson.build |2 +- hw/char/xen_console.c | 57 +- hw/display/meson.build|2 +- hw/display/xenfb.c| 32 +- hw/i386/kvm/meson.build |1 + hw/i386/kvm/trace-events | 15 + hw/i386/kvm/xen_evtchn.c | 15 + hw/i386/kvm/xen_gnttab.c | 320 - hw/i386/kvm/xen_gnttab.h |1 + hw/i386/kvm/xen_xenstore.c| 1248 +- hw/i386/kvm/xenstore_impl.c | 1754 + hw/i386/kvm/xenstore_impl.h | 63 + hw/i386/pc.c |7 + hw/i386/pc_piix.c |4 +- hw/i386/xen/xen-hvm.c |8 +- hw/i386/xen/xen_platform.c|7 +- hw/net/xen_nic.c | 25 +- hw/usb/meson.build|2 +- hw/usb/xen-usb.c | 29 +- hw/xen/meson.build|6 +- hw/xen/trace-events |2 +- hw/xen/xen-bus-helper.c | 61 +- hw/xen/xen-bus.c | 408 +- hw/xen/xen-hvm-common.c | 45 +- hw/xen/xen-legacy-backend.c | 307 ++--- hw/xen/xen-mapcache.c |2 +- hw/xen/xen-operations.c | 487 +++ hw/xen/xen_devconfig.c|4 +- hw/xen/xen_pt.c |2 +- hw/xen/xen_pt.h
[RFC PATCH v11bis 07/26] hw/xen: Implement core serialize/deserialize methods for xenstore_impl
From: David Woodhouse In fact I think we want to only serialize the contents of the domain's path in /local/domain/${domid} and leave the rest to be recreated? Will defer to Paul for that. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 25 +- hw/i386/kvm/xenstore_impl.c | 530 hw/i386/kvm/xenstore_impl.h | 5 + tests/unit/test-xs-node.c | 213 ++- 4 files changed, 767 insertions(+), 6 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 90f3bb8946..d15eea76c4 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -66,6 +66,9 @@ struct XenXenstoreState { evtchn_port_t guest_port; evtchn_port_t be_port; struct xenevtchn_handle *eh; + +uint8_t *impl_state; +uint32_t impl_state_size; }; struct XenXenstoreState *xen_xenstore_singleton; @@ -109,16 +112,26 @@ static bool xen_xenstore_is_needed(void *opaque) static int xen_xenstore_pre_save(void *opaque) { XenXenstoreState *s = opaque; +GByteArray *save; if (s->eh) { s->guest_port = xen_be_evtchn_get_guest_port(s->eh); } + +g_free(s->impl_state); +save = xs_impl_serialize(s->impl); +s->impl_state = save->data; +s->impl_state_size = save->len; +g_byte_array_free(save, false); + return 0; } static int xen_xenstore_post_load(void *opaque, int ver) { XenXenstoreState *s = opaque; +GByteArray *save; +int ret; /* * As qemu/dom0, rebind to the guest's port. The Windows drivers may @@ -134,7 +147,13 @@ static int xen_xenstore_post_load(void *opaque, int ver) } s->be_port = be_port; } -return 0; + +save = g_byte_array_new_take(s->impl_state, s->impl_state_size); +s->impl_state = NULL; +s->impl_state_size = 0; + +ret = xs_impl_deserialize(s->impl, save, xen_domid, fire_watch_cb, s); +return ret; } static const VMStateDescription xen_xenstore_vmstate = { @@ -152,6 +171,10 @@ static const VMStateDescription xen_xenstore_vmstate = { VMSTATE_BOOL(rsp_pending, XenXenstoreState), VMSTATE_UINT32(guest_port, XenXenstoreState), VMSTATE_BOOL(fatal_error, XenXenstoreState), +VMSTATE_UINT32(impl_state_size, XenXenstoreState), +VMSTATE_VARRAY_UINT32_ALLOC(impl_state, XenXenstoreState, +impl_state_size, 0, +vmstate_info_uint8, uint8_t), VMSTATE_END_OF_LIST() } }; diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index e6c6a2200e..6744e99e4e 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -35,6 +35,7 @@ typedef struct XsNode { uint64_t gencnt; bool deleted_in_tx; bool modified_in_tx; +unsigned int serialized_tx; #ifdef XS_NODE_UNIT_TEST gchar *name; /* debug only */ #endif @@ -62,6 +63,7 @@ struct XenstoreImplState { GHashTable *transactions; unsigned int root_tx; unsigned int last_tx; +bool serialized; }; static inline XsNode *xs_node_new(void) @@ -1220,5 +1222,533 @@ XenstoreImplState *xs_impl_create(unsigned int dom_id) s->root = xs_node_create("/", perms); g_list_free_full(perms, g_free); +s->root_tx = s->last_tx = 1; return s; } + + +static void clear_serialized_tx(gpointer key, gpointer value, gpointer opaque) +{ +XsNode *n = value; + +n->serialized_tx = XBT_NULL; +if (n->children) { +g_hash_table_foreach(n->children, clear_serialized_tx, NULL); +} +} + +static void clear_tx_serialized_tx(gpointer key, gpointer value, + gpointer opaque) +{ +XsTransaction *t = value; + +clear_serialized_tx(NULL, t->root, NULL); +} + +static void write_be32(GByteArray *save, uint32_t val) +{ +uint32_t be = htonl(val); +g_byte_array_append(save, (void *)&be, sizeof(be)); +} + + +struct save_state { +GByteArray *bytes; +unsigned int tx_id; +}; + +#define MODIFIED_IN_TX (1U << 0) +#define DELETED_IN_TX (1U << 1) +#define NODE_REF(1U << 2) + +static void save_node(gpointer key, gpointer value, gpointer opaque) +{ +struct save_state *ss = opaque; +XsNode *n = value; +char *name = key; +uint8_t flag = 0; + +/* Child nodes (i.e. anything but the root) have a name */ +if (name) { +g_byte_array_append(ss->bytes, key, strlen(key) + 1); +} + +/* + * If we already wrote this node, refer to the previous copy. + * There's no rename/move in XenStore, so all we need to find + * it is the tx_id of the transation in which it exists. Which + * may be the root tx. + */ +if (n->serialized_tx != XBT_NULL) { +flag = NODE_REF; +g_byte_array_append(ss->bytes, &flag, 1); +write_be32(ss->bytes, n->serialized_tx); +} else { +GList *l; +n->serialized_tx = ss->tx_id; + +if (n->modifie
[RFC PATCH v11bis 09/26] hw/xen: Add evtchn operations to allow redirection to internal emulation
From: David Woodhouse The existing implementation calling into the real libxenevtchn moves to a new file hw/xen/xen-operations.c, and is called via a function table which in a subsequent commit will also be able to invoke the emulated event channel support. Signed-off-by: David Woodhouse Signed-off-by: Paul Durrant --- hw/9pfs/xen-9p-backend.c| 24 +++--- hw/xen/meson.build | 1 + hw/xen/xen-bus.c| 22 +++--- hw/xen/xen-hvm-common.c | 27 --- hw/xen/xen-legacy-backend.c | 8 +- hw/xen/xen-operations.c | 71 + hw/xen/xen_pvdev.c | 12 +-- include/hw/xen/xen-bus.h| 1 + include/hw/xen/xen-hvm-common.h | 1 + include/hw/xen/xen-legacy-backend.h | 1 + include/hw/xen/xen_backend_ops.h| 118 include/hw/xen/xen_common.h | 12 --- include/hw/xen/xen_pvdev.h | 1 + softmmu/globals.c | 1 + 14 files changed, 243 insertions(+), 57 deletions(-) create mode 100644 hw/xen/xen-operations.c create mode 100644 include/hw/xen/xen_backend_ops.h diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c index 65c4979c3c..864bdaf952 100644 --- a/hw/9pfs/xen-9p-backend.c +++ b/hw/9pfs/xen-9p-backend.c @@ -241,7 +241,7 @@ static void xen_9pfs_push_and_notify(V9fsPDU *pdu) xen_wmb(); ring->inprogress = false; -xenevtchn_notify(ring->evtchndev, ring->local_port); +qemu_xen_evtchn_notify(ring->evtchndev, ring->local_port); qemu_bh_schedule(ring->bh); } @@ -324,8 +324,8 @@ static void xen_9pfs_evtchn_event(void *opaque) Xen9pfsRing *ring = opaque; evtchn_port_t port; -port = xenevtchn_pending(ring->evtchndev); -xenevtchn_unmask(ring->evtchndev, port); +port = qemu_xen_evtchn_pending(ring->evtchndev); +qemu_xen_evtchn_unmask(ring->evtchndev, port); qemu_bh_schedule(ring->bh); } @@ -337,10 +337,10 @@ static void xen_9pfs_disconnect(struct XenLegacyDevice *xendev) for (i = 0; i < xen_9pdev->num_rings; i++) { if (xen_9pdev->rings[i].evtchndev != NULL) { -qemu_set_fd_handler(xenevtchn_fd(xen_9pdev->rings[i].evtchndev), -NULL, NULL, NULL); -xenevtchn_unbind(xen_9pdev->rings[i].evtchndev, - xen_9pdev->rings[i].local_port); + qemu_set_fd_handler(qemu_xen_evtchn_fd(xen_9pdev->rings[i].evtchndev), +NULL, NULL, NULL); +qemu_xen_evtchn_unbind(xen_9pdev->rings[i].evtchndev, + xen_9pdev->rings[i].local_port); xen_9pdev->rings[i].evtchndev = NULL; } } @@ -447,12 +447,12 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev) xen_9pdev->rings[i].inprogress = false; -xen_9pdev->rings[i].evtchndev = xenevtchn_open(NULL, 0); +xen_9pdev->rings[i].evtchndev = qemu_xen_evtchn_open(); if (xen_9pdev->rings[i].evtchndev == NULL) { goto out; } -qemu_set_cloexec(xenevtchn_fd(xen_9pdev->rings[i].evtchndev)); -xen_9pdev->rings[i].local_port = xenevtchn_bind_interdomain +qemu_set_cloexec(qemu_xen_evtchn_fd(xen_9pdev->rings[i].evtchndev)); +xen_9pdev->rings[i].local_port = qemu_xen_evtchn_bind_interdomain (xen_9pdev->rings[i].evtchndev, xendev->dom, xen_9pdev->rings[i].evtchn); @@ -463,8 +463,8 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev) goto out; } xen_pv_printf(xendev, 2, "bind evtchn port %d\n", xendev->local_port); -qemu_set_fd_handler(xenevtchn_fd(xen_9pdev->rings[i].evtchndev), -xen_9pfs_evtchn_event, NULL, &xen_9pdev->rings[i]); +qemu_set_fd_handler(qemu_xen_evtchn_fd(xen_9pdev->rings[i].evtchndev), +xen_9pfs_evtchn_event, NULL, &xen_9pdev->rings[i]); } xen_9pdev->security_model = xenstore_read_be_str(xendev, "security_model"); diff --git a/hw/xen/meson.build b/hw/xen/meson.build index 008e036d63..02b9118b7f 100644 --- a/hw/xen/meson.build +++ b/hw/xen/meson.build @@ -5,6 +5,7 @@ softmmu_ss.add(when: ['CONFIG_XEN', xen], if_true: files( 'xen-legacy-backend.c', 'xen_devconfig.c', 'xen_pvdev.c', + 'xen-operations.c', )) xen_specific_ss = ss.source_set() diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index df3f6b9ae0..d0b1ae93da 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -1095,12 +1095,12 @@ static bool xen_device_poll(void *opaque) static void xen_device_event(void *opaque) { XenEventChannel *channel = opaque; -unsigned long port = xenevtchn_pending(channel->xeh); +unsigned long port = qemu_xen_evtchn_pending(channel->xeh); if (port == channel->local_port) {
[RFC PATCH v11bis 18/26] hw/xen: Avoid crash when backend watch fires too early
From: Paul Durrant The xen-block code ends up calling aio_poll() through blkconf_geometry(), which means we see watch events during the indirect call to xendev_class->realize() in xen_device_realize(). Unfortunately this call is made before populating the initial frontend and backend device nodes in xenstore and hence xen_block_frontend_changed() (which is called from a watch event) fails to read the frontend's 'state' node, and hence believes the device is being torn down. This in-turn sets the backend state to XenbusStateClosed and causes the device to be deleted before it is fully set up, leading to the crash. By simply moving the call to xendev_class->realize() after the initial xenstore nodes are populated, this sorry state of affairs is avoided. Reported-by: David Woodhouse Signed-off-by: Paul Durrant Signed-off-by: David Woodhouse --- hw/xen/xen-bus.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index e8d9adeb3e..83121c5a43 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -1031,13 +1031,6 @@ static void xen_device_realize(DeviceState *dev, Error **errp) goto unrealize; } -if (xendev_class->realize) { -xendev_class->realize(xendev, errp); -if (*errp) { -goto unrealize; -} -} - xen_device_backend_printf(xendev, "frontend", "%s", xendev->frontend_path); xen_device_backend_printf(xendev, "frontend-id", "%u", @@ -1056,6 +1049,13 @@ static void xen_device_realize(DeviceState *dev, Error **errp) xen_device_frontend_set_state(xendev, XenbusStateInitialising, true); } +if (xendev_class->realize) { +xendev_class->realize(xendev, errp); +if (*errp) { +goto unrealize; +} +} + xendev->exit.notify = xen_device_exit; qemu_add_exit_notifier(&xendev->exit); return; -- 2.39.0
[RFC PATCH v11bis 22/26] hw/xen: Add emulated implementation of XenStore operations
From: David Woodhouse Now that we have an internal implementation of XenStore, we can populate the xenstore_backend_ops to allow PV backends to talk to it. Watches can't be processed with immediate callbacks because that would call back into XenBus code recursively. Defer them to a QEMUBH to be run as appropriate from the main loop. We use a QEMUBH per XS handle, and it walks all the watches (there shouldn't be many per handle) to fire any which have pending events. We *could* have done it differently but this allows us to use the same struct watch_event as we have for the guest side, and keeps things relatively simple. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_xenstore.c | 273 - 1 file changed, 269 insertions(+), 4 deletions(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index 57cd18a7c9..74729cbb63 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -49,7 +49,7 @@ struct XenXenstoreState { /*< public >*/ XenstoreImplState *impl; -GList *watch_events; +GList *watch_events; /* for the guest */ MemoryRegion xenstore_page; struct xenstore_domain_interface *xs; @@ -73,6 +73,8 @@ struct XenXenstoreState *xen_xenstore_singleton; static void xen_xenstore_event(void *opaque); static void fire_watch_cb(void *opaque, const char *path, const char *token); +static struct xenstore_backend_ops emu_xenstore_backend_ops; + static void G_GNUC_PRINTF (4, 5) relpath_printf(XenXenstoreState *s, GList *perms, const char *relpath, @@ -169,6 +171,8 @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) relpath_printf(s, perms, "feature", "%s", ""); g_list_free_full(perms, g_free); + +xen_xenstore_ops = &emu_xenstore_backend_ops; } static bool xen_xenstore_is_needed(void *opaque) @@ -1268,6 +1272,15 @@ struct watch_event { char *token; }; +static void free_watch_event(struct watch_event *ev) +{ +if (ev) { +g_free(ev->path); +g_free(ev->token); +g_free(ev); +} +} + static void queue_watch(XenXenstoreState *s, const char *path, const char *token) { @@ -1314,9 +1327,7 @@ static void process_watch_events(XenXenstoreState *s) deliver_watch(s, ev->path, ev->token); s->watch_events = g_list_remove(s->watch_events, ev); -g_free(ev->path); -g_free(ev->token); -g_free(ev); +free_watch_event(ev); } static void xen_xenstore_event(void *opaque) @@ -1404,3 +1415,257 @@ int xen_xenstore_reset(void) return 0; } + +struct qemu_xs_handle { +XenstoreImplState *impl; +GList *watches; +QEMUBH *watch_bh; +}; + +struct qemu_xs_watch { +struct qemu_xs_handle *h; +char *path; +xs_watch_fn fn; +void *opaque; +GList *events; +}; + +static char *xs_be_get_domain_path(struct qemu_xs_handle *h, unsigned int domid) +{ +return g_strdup_printf("/local/domain/%u", domid); +} + +static char **xs_be_directory(struct qemu_xs_handle *h, xs_transaction_t t, + const char *path, unsigned int *num) +{ +GList *items = NULL, *l; +unsigned int i = 0; +char **items_ret; +int err; + +err = xs_impl_directory(h->impl, DOMID_QEMU, t, path, NULL, &items); +if (err) { +errno = err; +return NULL; +} + +items_ret = g_new0(char *, g_list_length(items) + 1); +*num = 0; +for (l = items; l; l = l->next) { +items_ret[i++] = l->data; +(*num)++; +} +g_list_free(items); +return items_ret; +} + +static void *xs_be_read(struct qemu_xs_handle *h, xs_transaction_t t, +const char *path, unsigned int *len) +{ +GByteArray *data = g_byte_array_new(); +bool free_segment = false; +int err; + +err = xs_impl_read(h->impl, DOMID_QEMU, t, path, data); +if (err) { +free_segment = true; +errno = err; +} else { +if (len) { +*len = data->len; +} +/* The xen-bus-helper code expects to get NUL terminated string! */ +g_byte_array_append(data, (void *)"", 1); +} + +return g_byte_array_free(data, free_segment); +} + +static bool xs_be_write(struct qemu_xs_handle *h, xs_transaction_t t, +const char *path, const void *data, unsigned int len) +{ +GByteArray *gdata = g_byte_array_new(); +int err; + +g_byte_array_append(gdata, data, len); +err = xs_impl_write(h->impl, DOMID_QEMU, t, path, gdata); +g_byte_array_unref(gdata); +if (err) { +errno = err; +return false; +} +return true; +} + +static bool xs_be_create(struct qemu_xs_handle *h, xs_transaction_t t, + unsigned int owner, unsigned int domid, + unsigned int perms, const char *path) +{ +g_autoptr(GByteArray) d
[RFC PATCH v11bis 10/26] hw/xen: Add gnttab operations to allow redirection to internal emulation
From: David Woodhouse Move the existing code using libxengnttab to xen-operations.c and allow the operations to be redirected so that we can add emulation of grant table mapping for backend drivers. In emulation, mapping more than one grant ref to be virtually contiguous would be fairly difficult. The best way to do it might be to make the ram_block mappings actually backed by a file (shmem or a deleted file, perhaps) so that we can have multiple *shared* mappings of it. But that would be fairly intrusive. Making the backend drivers cope with page *lists* instead of expecting the mapping to be contiguous is also non-trivial, since some structures would actually *cross* page boundaries (e.g. the 32-bit blkif responses which are 12 bytes). So for now, we'll support only single-page mappings in emulation. Add a XEN_GNTTAB_OP_FEATURE_MAP_MULTIPLE flag to indicate that the native Xen implementation *does* support multi-page maps, and a helper function to query it. Signed-off-by: David Woodhouse Signed-off-by: Paul Durrant --- hw/xen/xen-bus.c| 112 ++-- hw/xen/xen-legacy-backend.c | 125 ++ hw/xen/xen-operations.c | 157 hw/xen/xen_pvdev.c | 2 +- include/hw/xen/xen-bus.h| 3 +- include/hw/xen/xen-legacy-backend.h | 13 +-- include/hw/xen/xen_backend_ops.h| 99 ++ include/hw/xen/xen_common.h | 39 --- softmmu/globals.c | 1 + 9 files changed, 279 insertions(+), 272 deletions(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index d0b1ae93da..b247e86f28 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -947,7 +947,7 @@ static void xen_device_frontend_destroy(XenDevice *xendev) void xen_device_set_max_grant_refs(XenDevice *xendev, unsigned int nr_refs, Error **errp) { -if (xengnttab_set_max_grants(xendev->xgth, nr_refs)) { +if (qemu_xen_gnttab_set_max_grants(xendev->xgth, nr_refs)) { error_setg_errno(errp, errno, "xengnttab_set_max_grants failed"); } } @@ -956,9 +956,8 @@ void *xen_device_map_grant_refs(XenDevice *xendev, uint32_t *refs, unsigned int nr_refs, int prot, Error **errp) { -void *map = xengnttab_map_domain_grant_refs(xendev->xgth, nr_refs, -xendev->frontend_id, refs, -prot); +void *map = qemu_xen_gnttab_map_refs(xendev->xgth, nr_refs, + xendev->frontend_id, refs, prot); if (!map) { error_setg_errno(errp, errno, @@ -971,109 +970,17 @@ void *xen_device_map_grant_refs(XenDevice *xendev, uint32_t *refs, void xen_device_unmap_grant_refs(XenDevice *xendev, void *map, unsigned int nr_refs, Error **errp) { -if (xengnttab_unmap(xendev->xgth, map, nr_refs)) { +if (qemu_xen_gnttab_unmap(xendev->xgth, map, nr_refs)) { error_setg_errno(errp, errno, "xengnttab_unmap failed"); } } -static void compat_copy_grant_refs(XenDevice *xendev, bool to_domain, - XenDeviceGrantCopySegment segs[], - unsigned int nr_segs, Error **errp) -{ -uint32_t *refs = g_new(uint32_t, nr_segs); -int prot = to_domain ? PROT_WRITE : PROT_READ; -void *map; -unsigned int i; - -for (i = 0; i < nr_segs; i++) { -XenDeviceGrantCopySegment *seg = &segs[i]; - -refs[i] = to_domain ? seg->dest.foreign.ref : -seg->source.foreign.ref; -} - -map = xengnttab_map_domain_grant_refs(xendev->xgth, nr_segs, - xendev->frontend_id, refs, - prot); -if (!map) { -error_setg_errno(errp, errno, - "xengnttab_map_domain_grant_refs failed"); -goto done; -} - -for (i = 0; i < nr_segs; i++) { -XenDeviceGrantCopySegment *seg = &segs[i]; -void *page = map + (i * XC_PAGE_SIZE); - -if (to_domain) { -memcpy(page + seg->dest.foreign.offset, seg->source.virt, - seg->len); -} else { -memcpy(seg->dest.virt, page + seg->source.foreign.offset, - seg->len); -} -} - -if (xengnttab_unmap(xendev->xgth, map, nr_segs)) { -error_setg_errno(errp, errno, "xengnttab_unmap failed"); -} - -done: -g_free(refs); -} - void xen_device_copy_grant_refs(XenDevice *xendev, bool to_domain, XenDeviceGrantCopySegment segs[], unsigned int nr_segs, Error **errp) { -xengnttab_grant_copy_segment_t *xengnttab_segs; -unsigned int i; - -if (!xendev->feature_grant_copy) { -compat_copy_grant_refs(xendev,
[RFC PATCH v11bis 13/26] hw/xen: Add xenstore operations to allow redirection to internal emulation
From: Paul Durrant Signed-off-by: Paul Durrant Signed-off-by: David Woodhouse --- accel/xen/xen-all.c | 11 +- hw/char/xen_console.c | 2 +- hw/i386/kvm/xen_xenstore.c | 3 - hw/i386/kvm/xenstore_impl.h | 8 +- hw/xen/xen-bus-helper.c | 61 +++ hw/xen/xen-bus.c| 258 hw/xen/xen-legacy-backend.c | 121 +++-- hw/xen/xen-operations.c | 189 hw/xen/xen_devconfig.c | 4 +- hw/xen/xen_pt_graphics.c| 1 - hw/xen/xen_pvdev.c | 49 +- include/hw/xen/xen-bus-helper.h | 25 ++- include/hw/xen/xen-bus.h| 17 +- include/hw/xen/xen-legacy-backend.h | 6 +- include/hw/xen/xen_backend_ops.h| 155 + include/hw/xen/xen_common.h | 1 - include/hw/xen/xen_pvdev.h | 2 +- softmmu/globals.c | 1 + 18 files changed, 503 insertions(+), 411 deletions(-) diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c index 2329556595..6df7b4ff34 100644 --- a/accel/xen/xen-all.c +++ b/accel/xen/xen-all.c @@ -100,12 +100,15 @@ void xenstore_store_pv_console_info(int i, Chardev *chr) } -static void xenstore_record_dm_state(struct xs_handle *xs, const char *state) +static void xenstore_record_dm_state(const char *state) { +struct xs_handle *xs; char path[50]; +/* We now have everything we need to set the xenstore entry. */ +xs = xs_open(0); if (xs == NULL) { -error_report("xenstore connection not initialized"); +fprintf(stderr, "Could not contact XenStore\n"); exit(1); } @@ -119,6 +122,8 @@ static void xenstore_record_dm_state(struct xs_handle *xs, const char *state) error_report("error recording dm state"); exit(1); } + +xs_close(xs); } @@ -127,7 +132,7 @@ static void xen_change_state_handler(void *opaque, bool running, { if (running) { /* record state running */ -xenstore_record_dm_state(xenstore, "running"); +xenstore_record_dm_state("running"); } } diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index e9cef3e1ef..ad8638a86d 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -181,7 +181,7 @@ static int con_init(struct XenLegacyDevice *xendev) const char *output; /* setup */ -dom = xs_get_domain_path(xenstore, con->xendev.dom); +dom = qemu_xen_xs_get_domain_path(xenstore, con->xendev.dom); if (!xendev->dev) { snprintf(con->console, sizeof(con->console), "%s/console", dom); } else { diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index ff39137846..57cd18a7c9 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -38,9 +38,6 @@ #define TYPE_XEN_XENSTORE "xen-xenstore" OBJECT_DECLARE_SIMPLE_TYPE(XenXenstoreState, XEN_XENSTORE) -#define XEN_PAGE_SHIFT 12 -#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT) - #define ENTRIES_PER_FRAME_V1 (XEN_PAGE_SIZE / sizeof(grant_entry_v1_t)) #define ENTRIES_PER_FRAME_V2 (XEN_PAGE_SIZE / sizeof(grant_entry_v2_t)) diff --git a/hw/i386/kvm/xenstore_impl.h b/hw/i386/kvm/xenstore_impl.h index bbe2391e2e..0df2a91aae 100644 --- a/hw/i386/kvm/xenstore_impl.h +++ b/hw/i386/kvm/xenstore_impl.h @@ -12,13 +12,7 @@ #ifndef QEMU_XENSTORE_IMPL_H #define QEMU_XENSTORE_IMPL_H -typedef uint32_t xs_transaction_t; - -#define XBT_NULL 0 - -#define XS_PERM_NONE 0x00 -#define XS_PERM_READ 0x01 -#define XS_PERM_WRITE 0x02 +#include "hw/xen/xen_backend_ops.h" typedef struct XenstoreImplState XenstoreImplState; diff --git a/hw/xen/xen-bus-helper.c b/hw/xen/xen-bus-helper.c index 5a1e12b374..db1e64e0fb 100644 --- a/hw/xen/xen-bus-helper.c +++ b/hw/xen/xen-bus-helper.c @@ -10,6 +10,7 @@ #include "hw/xen/xen-bus.h" #include "hw/xen/xen-bus-helper.h" #include "qapi/error.h" +#include "trace.h" #include @@ -46,34 +47,28 @@ const char *xs_strstate(enum xenbus_state state) return "INVALID"; } -void xs_node_create(struct xs_handle *xsh, xs_transaction_t tid, -const char *node, struct xs_permissions perms[], -unsigned int nr_perms, Error **errp) +void xs_node_create(struct qemu_xs_handle *h, xs_transaction_t tid, +const char *node, unsigned int owner, unsigned int domid, +unsigned int perms, Error **errp) { trace_xs_node_create(node); -if (!xs_write(xsh, tid, node, "", 0)) { +if (!qemu_xen_xs_create(h, tid, owner, domid, perms, node)) { error_setg_errno(errp, errno, "failed to create node '%s'", node); -return; -} - -if (!xs_set_permissions(xsh, tid, node, perms, nr_perms)) { -error_setg_errno(errp, errno, "failed to set node '%s' permissions", - node); } } -void xs_node_destroy(struct xs_handle *xsh, xs_transaction_
[RFC PATCH v11bis 05/26] hw/xen: Watches on XenStore transactions
From: David Woodhouse Firing watches on the nodes that still exist is relatively easy; just walk the tree and look at the nodes with refcount of one. Firing watches on *deleted* nodes is more fun. We add 'modified_in_tx' and 'deleted_in_tx' flags to each node. Nodes with those flags cannot be shared, as they will always be unique to the transaction in which they were created. When xs_node_walk would need to *create* a node as scaffolding and it encounters a deleted_in_tx node, it can resurrect it simply by clearing its deleted_in_tx flag. If that node originally had any *data*, they're gone, and the modified_in_tx flag will have been set when it was first deleted. We then attempt to send appropriate watches when the transaction is committed, properly delete the deleted_in_tx nodes, and remove the modified_in_tx flag from the others. Signed-off-by: David Woodhouse --- hw/i386/kvm/xenstore_impl.c | 133 - tests/unit/test-xs-node.c | 142 +++- 2 files changed, 272 insertions(+), 3 deletions(-) diff --git a/hw/i386/kvm/xenstore_impl.c b/hw/i386/kvm/xenstore_impl.c index 692622f706..2f9e1ff51b 100644 --- a/hw/i386/kvm/xenstore_impl.c +++ b/hw/i386/kvm/xenstore_impl.c @@ -30,6 +30,8 @@ typedef struct XsNode { GByteArray *content; GHashTable *children; uint64_t gencnt; +bool deleted_in_tx; +bool modified_in_tx; #ifdef XS_NODE_UNIT_TEST gchar *name; /* debug only */ #endif @@ -116,6 +118,13 @@ static XsNode *xs_node_copy(XsNode *old) XsNode *n = xs_node_new(); n->gencnt = old->gencnt; + +#ifdef XS_NODE_UNIT_TEST +if (n->name) { +n->name = g_strdup(old->name); +} +#endif + if (old->children) { n->children = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, (GDestroyNotify)xs_node_unref); @@ -165,6 +174,7 @@ struct walk_op { GList *watches; unsigned int dom_id; +unsigned int tx_id; /* * This is maintained on the way *down* the walk to indicate @@ -180,6 +190,9 @@ struct walk_op { bool mutating; bool create_dirs; bool in_transaction; + +/* Tracking during recursion so we know which is first. */ +bool deleted_in_tx; }; static void fire_watches(struct walk_op *op, bool parents) @@ -224,6 +237,9 @@ static int xs_node_add_content(XsNode **n, struct walk_op *op) g_byte_array_unref((*n)->content); } (*n)->content = g_byte_array_ref(op->op_opaque); +if (op->tx_id != XBT_NULL) { +(*n)->modified_in_tx = true; +} return 0; } @@ -279,10 +295,51 @@ static int node_rm_recurse(gpointer key, gpointer value, gpointer user_data) return this_inplace; } +static XsNode *xs_node_copy_deleted(XsNode *old); +static void copy_deleted_recurse(gpointer key, gpointer value, + gpointer user_data) +{ +XsNode *n = xs_node_copy_deleted(value); +g_hash_table_insert(user_data, g_strdup(key), n); +} + +static XsNode *xs_node_copy_deleted(XsNode *old) +{ +XsNode *n = xs_node_new(); + +n->gencnt = old->gencnt; + +#ifdef XS_NODE_UNIT_TEST +if (old->name) { +n->name = g_strdup(old->name); +} +#endif + +if (old->children) { +n->children = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, +(GDestroyNotify)xs_node_unref); +g_hash_table_foreach(old->children, copy_deleted_recurse, n->children); +} +n->deleted_in_tx = true; +/* If it gets resurrected we only fire a watch if it lost its content */ +if (old->content) { +n->modified_in_tx = true; +} +return n; +} + static int xs_node_rm(XsNode **n, struct walk_op *op) { bool this_inplace = op->inplace; +if (op->tx_id != XBT_NULL) { +/* It's not trivial to do inplace handling for this one */ +XsNode *old = *n; +*n = xs_node_copy_deleted(old); +xs_node_unref(old); +return 0; +} + /* Fire watches for any node in the subtree which gets deleted. */ if ((*n)->children) { g_hash_table_foreach_remove((*n)->children, node_rm_recurse, op); @@ -353,6 +410,10 @@ static int xs_node_walk(XsNode **n, struct walk_op *op) } if (child) { +if (child->deleted_in_tx) { +assert(child->ref == 1); +child->deleted_in_tx = false; +} xs_node_ref(child); /* * Now we own it too. But if we can modify inplace, that's going to @@ -502,6 +563,7 @@ static int init_walk_op(XenstoreImplState *s, struct walk_op *op, op->create_dirs = false; op->in_transaction = false; op->dom_id = dom_id; +op->tx_id = tx_id; op->s = s; if (tx_id == XBT_NULL) { @@ -605,8 +667,69 @@ int xs_impl_transaction_start(XenstoreImplState *s, unsigned int dom_id, return 0; } +static gboolean tx_commit_walk(gpointer key,
[RFC PATCH v11bis 17/26] hw/xen: Build PV backend drivers for CONFIG_XEN_BUS
From: David Woodhouse Now that we have the redirectable Xen backend operations we can build the PV backends even without the Xen libraries. Signed-off-by: David Woodhouse --- hw/9pfs/meson.build| 2 +- hw/block/dataplane/meson.build | 2 +- hw/block/meson.build | 2 +- hw/char/meson.build| 2 +- hw/display/meson.build | 2 +- hw/usb/meson.build | 2 +- hw/xen/meson.build | 5 - 7 files changed, 10 insertions(+), 7 deletions(-) diff --git a/hw/9pfs/meson.build b/hw/9pfs/meson.build index 12443b6ad5..fd37b7a02d 100644 --- a/hw/9pfs/meson.build +++ b/hw/9pfs/meson.build @@ -15,7 +15,7 @@ fs_ss.add(files( )) fs_ss.add(when: 'CONFIG_LINUX', if_true: files('9p-util-linux.c')) fs_ss.add(when: 'CONFIG_DARWIN', if_true: files('9p-util-darwin.c')) -fs_ss.add(when: 'CONFIG_XEN', if_true: files('xen-9p-backend.c')) +fs_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen-9p-backend.c')) softmmu_ss.add_all(when: 'CONFIG_FSDEV_9P', if_true: fs_ss) specific_ss.add(when: 'CONFIG_VIRTIO_9P', if_true: files('virtio-9p-device.c')) diff --git a/hw/block/dataplane/meson.build b/hw/block/dataplane/meson.build index 12c6a264f1..78d7ac1a11 100644 --- a/hw/block/dataplane/meson.build +++ b/hw/block/dataplane/meson.build @@ -1,2 +1,2 @@ specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c')) -specific_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) +specific_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen-block.c')) diff --git a/hw/block/meson.build b/hw/block/meson.build index b434d5654c..cc2a75cc50 100644 --- a/hw/block/meson.build +++ b/hw/block/meson.build @@ -14,7 +14,7 @@ softmmu_ss.add(when: 'CONFIG_PFLASH_CFI02', if_true: files('pflash_cfi02.c')) softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80.c')) softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: files('m25p80_sfdp.c')) softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c')) -softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c')) +softmmu_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen-block.c')) softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c')) specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c', 'virtio-blk-common.c')) diff --git a/hw/char/meson.build b/hw/char/meson.build index 7b594f51b8..e02c60dd54 100644 --- a/hw/char/meson.build +++ b/hw/char/meson.build @@ -18,7 +18,7 @@ softmmu_ss.add(when: 'CONFIG_SERIAL_PCI', if_true: files('serial-pci.c')) softmmu_ss.add(when: 'CONFIG_SERIAL_PCI_MULTI', if_true: files('serial-pci-multi.c')) softmmu_ss.add(when: 'CONFIG_SHAKTI_UART', if_true: files('shakti_uart.c')) softmmu_ss.add(when: 'CONFIG_VIRTIO_SERIAL', if_true: files('virtio-console.c')) -softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen_console.c')) +softmmu_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xen_console.c')) softmmu_ss.add(when: 'CONFIG_XILINX', if_true: files('xilinx_uartlite.c')) softmmu_ss.add(when: 'CONFIG_AVR_USART', if_true: files('avr_usart.c')) diff --git a/hw/display/meson.build b/hw/display/meson.build index f470179122..4191694380 100644 --- a/hw/display/meson.build +++ b/hw/display/meson.build @@ -14,7 +14,7 @@ softmmu_ss.add(when: 'CONFIG_PL110', if_true: files('pl110.c')) softmmu_ss.add(when: 'CONFIG_SII9022', if_true: files('sii9022.c')) softmmu_ss.add(when: 'CONFIG_SSD0303', if_true: files('ssd0303.c')) softmmu_ss.add(when: 'CONFIG_SSD0323', if_true: files('ssd0323.c')) -softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xenfb.c')) +softmmu_ss.add(when: 'CONFIG_XEN_BUS', if_true: files('xenfb.c')) softmmu_ss.add(when: 'CONFIG_VGA_PCI', if_true: files('vga-pci.c')) softmmu_ss.add(when: 'CONFIG_VGA_ISA', if_true: files('vga-isa.c')) diff --git a/hw/usb/meson.build b/hw/usb/meson.build index bdf34cbd3e..599dc24f0d 100644 --- a/hw/usb/meson.build +++ b/hw/usb/meson.build @@ -84,6 +84,6 @@ if libusb.found() hw_usb_modules += {'host': usbhost_ss} endif -softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN', libusb], if_true: files('xen-usb.c')) +softmmu_ss.add(when: ['CONFIG_USB', 'CONFIG_XEN_BUS', libusb], if_true: files('xen-usb.c')) modules += { 'hw-usb': hw_usb_modules } diff --git a/hw/xen/meson.build b/hw/xen/meson.build index 02b9118b7f..afd20754a1 100644 --- a/hw/xen/meson.build +++ b/hw/xen/meson.build @@ -1,10 +1,13 @@ -softmmu_ss.add(when: ['CONFIG_XEN', xen], if_true: files( +softmmu_ss.add(when: ['CONFIG_XEN_BUS'], if_true: files( 'xen-backend.c', 'xen-bus-helper.c', 'xen-bus.c', 'xen-legacy-backend.c', 'xen_devconfig.c', 'xen_pvdev.c', +)) + +softmmu_ss.add(when: ['CONFIG_XEN', xen], if_true: files( 'xen-operations.c', )) -- 2.39.0
[RFC PATCH v11bis 11/26] hw/xen: Pass grant ref to gnttab unmap operation
From: David Woodhouse The previous commit introduced redirectable gnttab operations fairly much like-for-like, with the exception of the extra arguments to the ->open() call which were always NULL/0 anyway. This *changes* the arguments to the ->unmap() operation to include the original ref# that was mapped. Under real Xen it isn't necessary; all we need to do from QEMU is munmap(), then the kernel will release the grant, and Xen does the tracking/refcounting for the guest. When we have emulated grant tables though, we need to do all that for ourselves. So let's have the back ends keep track of what they mapped and pass it in to the ->unmap() method for us. Signed-off-by: David Woodhouse --- hw/9pfs/xen-9p-backend.c| 7 --- hw/block/dataplane/xen-block.c | 1 + hw/char/xen_console.c | 2 +- hw/net/xen_nic.c| 13 - hw/usb/xen-usb.c| 21 - hw/xen/xen-bus.c| 4 ++-- hw/xen/xen-legacy-backend.c | 4 ++-- hw/xen/xen-operations.c | 9 - include/hw/xen/xen-bus.h| 2 +- include/hw/xen/xen-legacy-backend.h | 6 +++--- include/hw/xen/xen_backend_ops.h| 7 --- 11 files changed, 50 insertions(+), 26 deletions(-) diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c index 864bdaf952..d8bb0e847c 100644 --- a/hw/9pfs/xen-9p-backend.c +++ b/hw/9pfs/xen-9p-backend.c @@ -359,12 +359,13 @@ static int xen_9pfs_free(struct XenLegacyDevice *xendev) if (xen_9pdev->rings[i].data != NULL) { xen_be_unmap_grant_refs(&xen_9pdev->xendev, xen_9pdev->rings[i].data, +xen_9pdev->rings[i].intf->ref, (1 << xen_9pdev->rings[i].ring_order)); } if (xen_9pdev->rings[i].intf != NULL) { -xen_be_unmap_grant_refs(&xen_9pdev->xendev, -xen_9pdev->rings[i].intf, -1); +xen_be_unmap_grant_ref(&xen_9pdev->xendev, + xen_9pdev->rings[i].intf, + xen_9pdev->rings[i].ref); } if (xen_9pdev->rings[i].bh != NULL) { qemu_bh_delete(xen_9pdev->rings[i].bh); diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index 2785b9e849..e55b713002 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -705,6 +705,7 @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) Error *local_err = NULL; xen_device_unmap_grant_refs(xendev, dataplane->sring, +dataplane->ring_ref, dataplane->nr_ring_ref, &local_err); dataplane->sring = NULL; diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c index 63153dfde4..19ad6c946a 100644 --- a/hw/char/xen_console.c +++ b/hw/char/xen_console.c @@ -271,7 +271,7 @@ static void con_disconnect(struct XenLegacyDevice *xendev) if (!xendev->dev) { xenforeignmemory_unmap(xen_fmem, con->sring, 1); } else { -xen_be_unmap_grant_ref(xendev, con->sring); +xen_be_unmap_grant_ref(xendev, con->sring, con->ring_ref); } con->sring = NULL; } diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 7d92c2d022..166d03787d 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -181,7 +181,7 @@ static void net_tx_packets(struct XenNetDev *netdev) qemu_send_packet(qemu_get_queue(netdev->nic), page + txreq.offset, txreq.size); } -xen_be_unmap_grant_ref(&netdev->xendev, page); +xen_be_unmap_grant_ref(&netdev->xendev, page, txreq.gref); net_tx_response(netdev, &txreq, NETIF_RSP_OKAY); } if (!netdev->tx_work) { @@ -261,7 +261,7 @@ static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t size return -1; } memcpy(page + NET_IP_ALIGN, buf, size); -xen_be_unmap_grant_ref(&netdev->xendev, page); +xen_be_unmap_grant_ref(&netdev->xendev, page, rxreq.gref); net_rx_response(netdev, &rxreq, NETIF_RSP_OKAY, NET_IP_ALIGN, size, 0); return size; @@ -343,7 +343,8 @@ static int net_connect(struct XenLegacyDevice *xendev) netdev->rx_ring_ref, PROT_READ | PROT_WRITE); if (!netdev->rxs) { -xen_be_unmap_grant_ref(&netdev->xendev, netdev->txs); +xen_be_unmap_grant_ref(&netdev->xendev, netdev->txs, + netdev->tx_ring_ref); netdev->txs = NULL; return -1; } @@ -368,11 +369,13 @@ static void net_disconnect(struct XenLegacyDevice *xendev) xen_pv_unbind_evtchn(&netdev->xendev); if (ne
[RFC PATCH v11bis 15/26] hw/xen: Use XEN_PAGE_SIZE in PV backend drivers
From: David Woodhouse XC_PAGE_SIZE comes from the actual Xen libraries, while XEN_PAGE_SIZE is provided by QEMU itself in xen_backend_ops.h. For backends which may be built for emulation mode, use the latter. Signed-off-by: David Woodhouse --- hw/block/dataplane/xen-block.c | 8 hw/display/xenfb.c | 12 ++-- hw/net/xen_nic.c | 12 ++-- hw/usb/xen-usb.c | 8 4 files changed, 20 insertions(+), 20 deletions(-) diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index e55b713002..8322a1de82 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -101,9 +101,9 @@ static XenBlockRequest *xen_block_start_request(XenBlockDataPlane *dataplane) * re-use requests, allocate the memory once here. It will be freed * xen_block_dataplane_destroy() when the request list is freed. */ -request->buf = qemu_memalign(XC_PAGE_SIZE, +request->buf = qemu_memalign(XEN_PAGE_SIZE, BLKIF_MAX_SEGMENTS_PER_REQUEST * - XC_PAGE_SIZE); + XEN_PAGE_SIZE); dataplane->requests_total++; qemu_iovec_init(&request->v, 1); } else { @@ -185,7 +185,7 @@ static int xen_block_parse_request(XenBlockRequest *request) goto err; } if (request->req.seg[i].last_sect * dataplane->sector_size >= -XC_PAGE_SIZE) { +XEN_PAGE_SIZE) { error_report("error: page crossing"); goto err; } @@ -740,7 +740,7 @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane, dataplane->protocol = protocol; -ring_size = XC_PAGE_SIZE * dataplane->nr_ring_ref; +ring_size = XEN_PAGE_SIZE * dataplane->nr_ring_ref; switch (dataplane->protocol) { case BLKIF_PROTOCOL_NATIVE: { diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c index 2c4016fcbd..0074a9b6f8 100644 --- a/hw/display/xenfb.c +++ b/hw/display/xenfb.c @@ -489,13 +489,13 @@ static int xenfb_map_fb(struct XenFB *xenfb) } if (xenfb->pixels) { -munmap(xenfb->pixels, xenfb->fbpages * XC_PAGE_SIZE); +munmap(xenfb->pixels, xenfb->fbpages * XEN_PAGE_SIZE); xenfb->pixels = NULL; } -xenfb->fbpages = DIV_ROUND_UP(xenfb->fb_len, XC_PAGE_SIZE); +xenfb->fbpages = DIV_ROUND_UP(xenfb->fb_len, XEN_PAGE_SIZE); n_fbdirs = xenfb->fbpages * mode / 8; -n_fbdirs = DIV_ROUND_UP(n_fbdirs, XC_PAGE_SIZE); +n_fbdirs = DIV_ROUND_UP(n_fbdirs, XEN_PAGE_SIZE); pgmfns = g_new0(xen_pfn_t, n_fbdirs); fbmfns = g_new0(xen_pfn_t, xenfb->fbpages); @@ -528,8 +528,8 @@ static int xenfb_configure_fb(struct XenFB *xenfb, size_t fb_len_lim, { size_t mfn_sz = sizeof_field(struct xenfb_page, pd[0]); size_t pd_len = sizeof_field(struct xenfb_page, pd) / mfn_sz; -size_t fb_pages = pd_len * XC_PAGE_SIZE / mfn_sz; -size_t fb_len_max = fb_pages * XC_PAGE_SIZE; +size_t fb_pages = pd_len * XEN_PAGE_SIZE / mfn_sz; +size_t fb_len_max = fb_pages * XEN_PAGE_SIZE; int max_width, max_height; if (fb_len_lim > fb_len_max) { @@ -930,7 +930,7 @@ static void fb_disconnect(struct XenLegacyDevice *xendev) * instead. This releases the guest pages and keeps qemu happy. */ qemu_xen_foreignmem_unmap(fb->pixels, fb->fbpages); -fb->pixels = mmap(fb->pixels, fb->fbpages * XC_PAGE_SIZE, +fb->pixels = mmap(fb->pixels, fb->fbpages * XEN_PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0); if (fb->pixels == MAP_FAILED) { diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 166d03787d..9bbf6599fc 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -145,7 +145,7 @@ static void net_tx_packets(struct XenNetDev *netdev) continue; } -if ((txreq.offset + txreq.size) > XC_PAGE_SIZE) { +if ((txreq.offset + txreq.size) > XEN_PAGE_SIZE) { xen_pv_printf(&netdev->xendev, 0, "error: page crossing\n"); net_tx_error(netdev, &txreq, rc); continue; @@ -171,7 +171,7 @@ static void net_tx_packets(struct XenNetDev *netdev) if (txreq.flags & NETTXF_csum_blank) { /* have read-only mapping -> can't fill checksum in-place */ if (!tmpbuf) { -tmpbuf = g_malloc(XC_PAGE_SIZE); +tmpbuf = g_malloc(XEN_PAGE_SIZE); } memcpy(tmpbuf, page + txreq.offset, txreq.size); net_checksum_calculate(tmpbuf, txreq.size, CSUM_ALL); @@ -243,9 +243,9 @@ static ssize_t net_rx_packet(NetClientState *nc, const uint8_t *buf, size_t size if (rc == rp || RING_REQUEST_CONS_OVERFLOW(&netdev->rx_ring, rc)) { return 0; } -if (size > XC_PAGE_SIZE - NET_I
Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
> +static struct file *restrictedmem_file_create(struct file *memfd) > +{ > + struct restrictedmem_data *data; > + struct address_space *mapping; > + struct inode *inode; > + struct file *file; > + > + data = kzalloc(sizeof(*data), GFP_KERNEL); > + if (!data) > + return ERR_PTR(-ENOMEM); > + > + data->memfd = memfd; > + mutex_init(&data->lock); > + INIT_LIST_HEAD(&data->notifiers); > + > + inode = alloc_anon_inode(restrictedmem_mnt->mnt_sb); > + if (IS_ERR(inode)) { > + kfree(data); > + return ERR_CAST(inode); > + } alloc_anon_inode() uses new_pseudo_inode() to get the inode. As per the comment, new inode is not added to the superblock s_inodes list. /** * new_inode_pseudo- obtain an inode * @sb: superblock * * Allocates a new inode for given superblock. * Inode wont be chained in superblock s_inodes list * This means : * - fs can't be unmount * - quotas, fsnotify, writeback can't work */ So the restrictedmem_error_page will not find the inode as it was never added to the s_inodes list. We might need to add the inode after allocating. inode_sb_list_add(inode); > +void restrictedmem_error_page(struct page *page, struct address_space > *mapping) > +{ > + struct super_block *sb = restrictedmem_mnt->mnt_sb; > + struct inode *inode, *next; > + > + if (!shmem_mapping(mapping)) > + return; > + > + spin_lock(&sb->s_inode_list_lock); > + list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { > + struct restrictedmem_data *data = > inode->i_mapping->private_data; > + struct file *memfd = data->memfd; > + > + if (memfd->f_mapping == mapping) { > + pgoff_t start, end; > + > + spin_unlock(&sb->s_inode_list_lock); > + > + start = page->index; > + end = start + thp_nr_pages(page); > + restrictedmem_notifier_error(data, start, end); > + return; > + } > + } > + spin_unlock(&sb->s_inode_list_lock); > +} Regards Nikunj
Re: [RFC PATCH 1/2] mm: restrictedmem: Allow userspace to specify mount_path for memfd_restricted
On Thu, Feb 16, 2023 at 12:41:16AM +, Ackerley Tng wrote: > By default, the backing shmem file for a restrictedmem fd is created > on shmem's kernel space mount. > > With this patch, an optional tmpfs mount can be specified, which will > be used as the mountpoint for backing the shmem file associated with a > restrictedmem fd. > > This change is modeled after how sys_open() can create an unnamed > temporary file in a given directory with O_TMPFILE. > > This will help restrictedmem fds inherit the properties of the > provided tmpfs mounts, for example, hugepage allocation hints, NUMA > binding hints, etc. > > Signed-off-by: Ackerley Tng > --- > include/linux/syscalls.h | 2 +- > include/uapi/linux/restrictedmem.h | 8 > mm/restrictedmem.c | 63 +++--- > 3 files changed, 66 insertions(+), 7 deletions(-) > create mode 100644 include/uapi/linux/restrictedmem.h > > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index f9e9e0c820c5..4b8efe9a8680 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -1056,7 +1056,7 @@ asmlinkage long sys_memfd_secret(unsigned int flags); > asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned > long len, > unsigned long home_node, > unsigned long flags); > -asmlinkage long sys_memfd_restricted(unsigned int flags); > +asmlinkage long sys_memfd_restricted(unsigned int flags, const char __user > *mount_path); > > /* > * Architecture-specific system calls I'm not sure what the right practice now: do we provide string that contains mount path or fd that represents the filesystem (returned from fsmount(2) or open_tree(2)). fd seems more flexible: it allows to specify unbind mounts. -- Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH v3 02/10] target/riscv: always allow write_misa() to write MISA
On Thu, Feb 16, 2023 at 05:33:55PM +0800, Bin Meng wrote: > On Thu, Feb 16, 2023 at 5:29 PM Andrew Jones wrote: > > > > On Wed, Feb 15, 2023 at 03:57:18PM -0300, Daniel Henrique Barboza wrote: > > > At this moment, and apparently since ever, we have no way of enabling > > > RISCV_FEATURE_MISA. This means that all the code from write_misa(), all > > > the nuts and bolts that handles how to properly write this CSR, has > > > always been a no-op as well because write_misa() will always exit > > > earlier. > > > > > > This seems to be benign in the majority of cases. Booting an Ubuntu > > > 'virt' guest and logging all the calls to 'write_misa' shows that no > > > writes to MISA CSR was attempted. Writing MISA, i.e. enabling/disabling > > > RISC-V extensions after the machine is powered on, seems to be a niche > > > use. > > > > > > Regardless, the spec says that MISA is a WARL read-write CSR, and gating > > > the writes in the register doesn't make sense. OS and applications > > > should be wary of the consequences when writing it, but the write itself > > > must always be allowed. > > > > The write is already allowed, i.e. no exception is raised when writing it. > > The spec only says that the fields may/can be writable. So we can > > correctly implement the spec with just > > > > write_misa() > > { > >return RISCV_EXCP_NONE; > > } > > Agree. Such change is still spec compliant without worrying about the bugs. > > > > > as it has effectively been implemented to this point. > > > > Based on Weiwei Li's pointing out of known bugs, and the fact that > > this function has likely never been tested, then maybe we should just > > implement it as above for now. Once a better solution to extension > > sanity checking exists and a use (or at least test) case arises, then > > the function could be expanded with some actually writable bits. (Also, > > I think that when/if we do the expansion, then something like the misa_w > > config proposed in the previous version of this series may still be > > needed in order to allow opting-in/out of the behavior change.) > > In QEMU RISC-V we have some examples of implementing optional spec > features without exposing a config parameter. Do we need to add config > parameters for those cases too? If yes, I am afraid the parameters > will grow a lot. > I agree, particularly for RISC-V, the options grow quickly. How about this for a policy? 1) When adding an optional, on-by-default CPU feature, which applies to all currently existing CPU types, then just add the feature without a config. 2) When, later, a CPU type or use case needs to disable an optional CPU feature, which doesn't have a config, then the config is added at that time. While that policy seems reasonable (to me), I have a feeling the "applies to all currently existing CPU types" requirement won't be easily satisfied, so we'll end up mostly always adding configs anyway. We can always change RISCVCPUConfig.ext_* to a bitmap if we feel like the CPU is getting too bloated. Thanks, drew
Re: [PATCH v2 0/7] Python: Drop support for Python 3.6
On Wed, 15 Feb 2023 at 19:05, Markus Armbruster wrote: > > The discussion under PATCH 6 makes me think there's a bit of confusion > about the actual impact of dropping support for Python 3.6. Possibly > because it's spelled out in the commit message of PATCH 7. Let me > summarize it in one sentence: > > *** All supported host systems continue to work *** > > Evidence: CI remains green. > > On some supported host systems, different packages need to be installed. > On CentOS 8, for instance, we need to install Python 3.8.13 or 3.9.16 > instead of 3.6.8. Let me stress again: same repository, different > package. No downsides I can see. Yes; I never had any issues with this part of it. If there was a "Sphinx that also used that Python" in that repo, the answer would be easy. > The *one* exception is Sphinx on CentOS 8. CentOS 8 does not ship a > version of Sphinx that works with Python 3.7 or newer. This series > proposes to simply stop building the docs there, unless the user > provides a suitable version of Sphinx (which is easy enough with pip). > That's PATCH 7. Yes; this brings CentOS 8 down from "fully supported" to "kinda supported but not for everything", which is less than ideal. I think the level of not-idealness of that side of the scales is probably clear enough. The difficulty I think for those who haven't had their arms deep in QEMU's Python code is not having the background info to be able to weigh up how heavy the other side of the tradeoff scales is (since the naive "well, just keep writing Python 3.6 for the moment" take is clearly wrong). thanks -- PMM
Re: [RFC PATCH 12/16] hw/arm/smmuv3: Add VMID to tlb tagging
Hi, On 2/5/23 10:44, Mostafa Saleh wrote: > Allow TLB to be tagged with VMID. > > If stage-1 is only supported, VMID is set to -1 and ignored from STE > and CMD_TLBI_NH* cmds. > > Signed-off-by: Mostafa Saleh > --- > hw/arm/smmu-common.c | 24 +++- > hw/arm/smmu-internal.h | 2 ++ > hw/arm/smmuv3.c | 12 +--- > include/hw/arm/smmu-common.h | 5 +++-- > 4 files changed, 29 insertions(+), 14 deletions(-) > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > index 541c427684..028a60949a 100644 > --- a/hw/arm/smmu-common.c > +++ b/hw/arm/smmu-common.c > @@ -56,10 +56,11 @@ static gboolean smmu_iotlb_key_equal(gconstpointer v1, > gconstpointer v2) > (k1->level == k2->level) && (k1->tg == k2->tg); > } > > -SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint64_t iova, > +SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova, > uint8_t tg, uint8_t level) > { > -SMMUIOTLBKey key = {.asid = asid, .iova = iova, .tg = tg, .level = > level}; > +SMMUIOTLBKey key = {.asid = asid, .vmid = vmid, .iova = iova, > +.tg = tg, .level = level}; > > return key; > } > @@ -78,7 +79,8 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg > *cfg, > uint64_t mask = subpage_size - 1; > SMMUIOTLBKey key; > > -key = smmu_get_iotlb_key(cfg->asid, iova & ~mask, tg, level); > +key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, > + iova & ~mask, tg, level); > entry = g_hash_table_lookup(bs->iotlb, &key); > if (entry) { > break; > @@ -111,7 +113,8 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, > SMMUTLBEntry *new) > smmu_iotlb_inv_all(bs); > } > > -*key = smmu_get_iotlb_key(cfg->asid, new->entry.iova, tg, new->level); > +*key = smmu_get_iotlb_key(cfg->asid, cfg->s2cfg.vmid, new->entry.iova, > + tg, new->level); > trace_smmu_iotlb_insert(cfg->asid, new->entry.iova, tg, new->level); you may update the trace point as well Thanks Erric > g_hash_table_insert(bs->iotlb, key, new); > } > @@ -130,8 +133,7 @@ static gboolean smmu_hash_remove_by_asid(gpointer key, > gpointer value, > > return SMMU_IOTLB_ASID(*iotlb_key) == asid; > } > - > -static gboolean smmu_hash_remove_by_asid_iova(gpointer key, gpointer value, > +static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer > value, >gpointer user_data) > { > SMMUTLBEntry *iter = (SMMUTLBEntry *)value; > @@ -142,18 +144,21 @@ static gboolean smmu_hash_remove_by_asid_iova(gpointer > key, gpointer value, > if (info->asid >= 0 && info->asid != SMMU_IOTLB_ASID(iotlb_key)) { > return false; > } > +if (info->vmid >= 0 && info->vmid != SMMU_IOTLB_VMID(iotlb_key)) { > +return false; > +} > return ((info->iova & ~entry->addr_mask) == entry->iova) || > ((entry->iova & ~info->mask) == info->iova); > } > > -void smmu_iotlb_inv_iova(SMMUState *s, int asid, dma_addr_t iova, > +void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova, > uint8_t tg, uint64_t num_pages, uint8_t ttl) > { > /* if tg is not set we use 4KB range invalidation */ > uint8_t granule = tg ? tg * 2 + 10 : 12; > > if (ttl && (num_pages == 1) && (asid >= 0)) { > -SMMUIOTLBKey key = smmu_get_iotlb_key(asid, iova, tg, ttl); > +SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, iova, tg, ttl); > > if (g_hash_table_remove(s->iotlb, &key)) { > return; > @@ -166,10 +171,11 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, > dma_addr_t iova, > > SMMUIOTLBPageInvInfo info = { > .asid = asid, .iova = iova, > +.vmid = vmid, > .mask = (num_pages * 1 << granule) - 1}; > > g_hash_table_foreach_remove(s->iotlb, > -smmu_hash_remove_by_asid_iova, > +smmu_hash_remove_by_asid_vmid_iova, > &info); > } > > diff --git a/hw/arm/smmu-internal.h b/hw/arm/smmu-internal.h > index 7d3f76ce14..3a14e5dca5 100644 > --- a/hw/arm/smmu-internal.h > +++ b/hw/arm/smmu-internal.h > @@ -136,9 +136,11 @@ static inline int pgd_idx(int start_level, int granule, > dma_addr_t iova) > } > > #define SMMU_IOTLB_ASID(key) ((key).asid) > +#define SMMU_IOTLB_VMID(key) ((key).vmid) > > typedef struct SMMUIOTLBPageInvInfo { > int asid; > +int vmid; > uint64_t iova; > uint64_t mask; > } SMMUIOTLBPageInvInfo; > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c > index 35a0149bbf..8b070f6bb5 100644 > --- a/hw/arm/smmuv3.c > +++ b/hw/arm/smmuv3.c > @@ -986,7 +986,7 @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd) > { >
Re: Detecting qemu from guest on arm/hvf (apple arm)
On Thu, 16 Feb 2023 at 02:05, John-Mark Gurney wrote: > > Hello, > > I was wondering what the best way to detect that FreeBSD is running > under qemu/hvf on Apple ARM? FreeBSD doesn't see the ACPI FADT table, > so I'm wondering if keying off of something like the QEMU0002 device > in ACPI is the best way? Or is there another option? I guess the question is, why do you want to know? Typically the guest OS shouldn't care about whether it's running on bare metal, under a hypervisor, or under emulated QEMU, except to the extent that it wants to use specific features, in which case the question is more "how do I query for the existence of feature X?". thanks -- PMM
Re: Future of icount discussion for next KVM call?
(replying all because qemu-devel rejected my email again) On Thu, 16 Feb 2023 at 10:19, Alex Bennée wrote: > Hi Juan, > > Do we have an agenda for next weeks KVM call yet? If there is space I'd > like to take some time to discuss the future direction of icount. > > Specifically I believe there might be some proposals for how we could > support icount with MTTCG worth discussing. From my point of view icount > provides too things: > > - a sense of time vaguely related to execution rather than wall clock > - determinism > > I would love to divorce the former from icount and punt it to plugins. > The plugin would be free to instrument as heavily or lightly as it sees > fit and provide its best guess as to guest time on demand. I wrote this > idea up as a card in Linaro's JIRA if anyone is interested: > > https://linaro.atlassian.net/browse/QEMU-481 > > Being able to punt cost modelling and sense of time into plugins would > allow the core icount support to concentrate on determinism. Then any > attempt to enable icount for MTTCG would then have to ensure it stays > deterministic. > > Richard and I have discussed the problem a few times and weren't sure it > was solvable but I'm totally open to hearing ideas on how to do it. > Fundamentally I think we would have to ensure any TB's doing IO would > have to execute in an exclusive context. The TCG code already has > mechanisms to ensure all IO is only done at the end of blocks so it > doesn't seem a huge leap to ensure we execute those blocks exclusively. > However there is still the problem of what to do about other pure > computation threads getting ahead or behind of the IO blocks on > subsequent runs. > > Anyway does anyone else have ideas to bring to the discussion? > > -- > Alex Bennée > Virtualisation Tech Lead @ Linaro > -- Alex Bennée Emulation and Virtualisation Tech Lead @ Linaro
Re: [RFC PATCH v2 0/2] i386/sev: Support measured direct kernel boot on SNP
On Thu, Feb 16, 2023 at 08:49:11AM +, Dov Murik wrote: > This RFC patch series is based on AMD's RFC upmv10-snpv3 tree [1]. I've seen postings of the kernel patches for SNP using the kernel UPM support, but I don't recall ever seeing these QEMU pieces posted for review. The code in that QEMU branch looks different from the last posting of SNP to qemu-devel years ago. IMHO it would be very desirable if that QEMU UPM tree was submitted to qemu-devel for review feedback, before requesting review of patches that build on top of it. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] audio/pwaudio.c: Add Pipewire audio backend for QEMU
15.02.2023 14:26, Daniel P. Berrangé пишет: .. - choices: ['alsa', 'coreaudio', 'default', 'dsound', 'jack', 'oss', 'pa', 'sdl', 'sndio'], + choices: ['alsa', 'coreaudio', 'default', 'dsound', 'jack', 'oss', 'pa', 'pw', 'sdl', 'sndio'], I appreciate you probably just followed the example of pulseaudio, abbreviated to 'pa', but I'm not a fan of the existing usage there. So lets be more verbose and say 'pipewire' so users are clear on what this is. I'd vote for "pw" here, it is a quite well-known acronym. It is not because "pa" is used in qemu, it is because "pa" and now "pw" is used everywhere in context of linux sound. But if there's no need to type it manually every time, it can be "pipewire" too. /mjt
Re: [PATCH v2 6/7] CI: Stop building docs on centos8
Daniel P. Berrangé writes: > On Tue, Feb 14, 2023 at 08:40:20AM +0100, Markus Armbruster wrote: >> Daniel P. Berrangé writes: >> >> [...] >> >> > We don't have to drop python 3.6. It is a choice because >> > of a desire to be able to use some shiny new python >> > features without caring about back compat. >> >> I read this on Friday, and decided to let it sit until after the >> weekend. Well, it's now Tuesday, and to be frank, it's still as >> offensively flippant as it was on Friday. It shows either ignorance of >> or cavalier disregard for the sheer amount of work some of us have had >> to put into keeping old versions of Python viable. >> >> The latter would be quite unlike you, so it must be the former. > > I'm sorry, I don't mean it to be offensive. I'm genuinely not seeing > from the descriptions in the series what the functional benefits are > from dropping 3.6. Apology accepted, and point well taken. >> John has sunk *man-months* into keeping old versions of Python viable. >> I've put in a lot less myself, but still enough to be painfully aware of >> it. I figure Cleber and Beraldo are somewhere in between >> >> Insinuating John's proposal is motivated by "a desire to be able to use >> some shiny new python features without caring about back compat" >> disrespects all this work. > > I'm writing my comments based on what is described in the cover letter > as the motivations for the change: > > [quote] > The motivation for this series is that Python 3.6 was EOL at the end of > 2021; upstream tools are beginning to drop support for it, including > setuptools, pylint, mypy, etc. As time goes by, it becomes more > difficult to support and test against the full range of Python versions > that QEMU supports. The closer we get to Python 3.12, the harder it will > be to cover that full spread of versions. > [/quote] > > this is all about new/eol versions of software upstream, and I don't > think that's a justification. QEMU explicitly aims to use distro provided > versions and upstream EOL status is not relevant in that context. Even > if using "pip" to install it is possible to limit yourself to upstream > releases which still support 3.6. > > There is the separate issue of Meson dropping python 3.6 which motivates > Paolo's series. Again though, we don't have to increase our minimum meson > version, because meson is working today. It is our choice to to increase > it to use latest available meson features. At some point we can decide > what we have is good enough and we don't have to keep chasing the latest > features. Maybe we're not there yet, but we should think about when that > would be. > > [quote] > The qemu.qmp library and the avocado testing framework both have > motivations for dropping 3.6 support, but are committed to not doing so > until QEMU drops support. > [/quote] > > I suspect that this is more of a driver for the drop of 3.6, but I > don't see any details. > > IOW overall justification come across as wanting to use new features, > and follow upstream EOL, without any real detail of what we're going > to gain from a functional POV. I think what we gain is there: "As time goes by, it becomes more difficult to support and test against the full range of Python versions that QEMU supports. The closer we get to Python 3.12, the harder it will be to cover that full spread of versions." In other words, the gain is "we make something that's hard and getting harder easier". What isn't in the cover letter, only in commit message 6+7/7, i.e. this patch and the next one, is what we pay for it: basically nothing (next patch's commit message) except for the issue of Sphinx in CentOS 8 (this patch's commit message). Putting at least a summary it in the cover letter would have been better. John did explain this again and in more detail in reply to Peter's "This confuses me. We work fine with Python 3.6 today." Quote: That won't last - Many tools such as mypy, pylint and flake8 which I use to manage our python codebase have been dropping support for 3.6 and I've had to implement an increasing number of workarounds to help keep it possible to test 3.6 via CI while also ensuring our newest platforms work as dev environments. Our testing matrix for Python is novel and thorough enough that it's revealed several bugs in other downstream Python distributions for Debian and Fedora, and dozens of bugs for the linters themselves. I'm concerned that when 3.7 is EOL'd in just a few months that the support and testing gap is going to get even uglier. [...] The argument I'm making is: - CentOS 8 is a supported build platform - All platforms *do* have a Python modern enough to allow us to drop 3.6 - CentOS's repo version of sphinx is hardcoded to use the older 3.6, though - You expressed a preference to me in the past to NOT use a pip installed version of sphinx in preference to the system version in "configure" - It's
Re: [RFC 23/52] arm: Replace MachineState.smp access with topology helpers
在 2023/2/13 17:50, Zhao Liu 写道: From: Zhao Liu When MachineState.topo is introduced, the topology related structures become complicated. So we wrapped the access to topology fields of MachineState.topo into some helpers, and we are using these helpers to replace the use of MachineState.smp. Before arm supports hybrid, here we use smp-specific interface to get "threads per core" and "cores per cluster". For other cases, it's straightforward to replace topology access with wrapped generic interfaces. Sorry. I have not yet understand the necessity of the mixed use of generic helpers and smp specific helpers😉. For a machine, the topo type is either smp or hybrid. So far the ARM virt machine's topo type is always smp, I don't see the difference between machine_topo_get_cores and machine_topo_get_smp_cores. When we want to support hybrid for ARM, change the naming of variables will be enough. Thanks, Yanan Cc: Jean-Christophe Dubois Cc: Andrey Smirnov Cc: Radoslaw Biernacki Cc: Leif Lindholm Cc: Shannon Zhao Cc: Alistair Francis Cc: Edgar E. Iglesias Signed-off-by: Zhao Liu --- hw/arm/fsl-imx6.c| 4 ++-- hw/arm/fsl-imx6ul.c | 4 ++-- hw/arm/fsl-imx7.c| 4 ++-- hw/arm/highbank.c| 2 +- hw/arm/realview.c| 2 +- hw/arm/sbsa-ref.c| 8 +++ hw/arm/vexpress.c| 2 +- hw/arm/virt-acpi-build.c | 4 ++-- hw/arm/virt.c| 50 ++-- hw/arm/xlnx-zynqmp.c | 6 ++--- include/hw/arm/virt.h| 2 +- target/arm/cpu.c | 2 +- target/arm/cpu_tcg.c | 2 +- target/arm/kvm.c | 2 +- 14 files changed, 50 insertions(+), 44 deletions(-) diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c index 00dafe3f62de..e94dec5e6c8d 100644 --- a/hw/arm/fsl-imx6.c +++ b/hw/arm/fsl-imx6.c @@ -41,7 +41,7 @@ static void fsl_imx6_init(Object *obj) char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX6_NUM_CPUS); i++) { +for (i = 0; i < MIN(machine_topo_get_cpus(ms), FSL_IMX6_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], ARM_CPU_TYPE_NAME("cortex-a9")); @@ -108,7 +108,7 @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp) FslIMX6State *s = FSL_IMX6(dev); uint16_t i; Error *err = NULL; -unsigned int smp_cpus = ms->smp.cpus; +unsigned int smp_cpus = machine_topo_get_cpus(ms); if (smp_cpus > FSL_IMX6_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c index d88d6cc1c5f9..1216b7ff1a92 100644 --- a/hw/arm/fsl-imx6ul.c +++ b/hw/arm/fsl-imx6ul.c @@ -160,9 +160,9 @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp) SysBusDevice *sbd; DeviceState *d; -if (ms->smp.cpus > 1) { +if (machine_topo_get_cpus(ms) > 1) { error_setg(errp, "%s: Only a single CPU is supported (%d requested)", - TYPE_FSL_IMX6UL, ms->smp.cpus); + TYPE_FSL_IMX6UL, machine_topo_get_cpus(ms)); return; } diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c index afc74807990f..f3e569a6ec29 100644 --- a/hw/arm/fsl-imx7.c +++ b/hw/arm/fsl-imx7.c @@ -36,7 +36,7 @@ static void fsl_imx7_init(Object *obj) char name[NAME_SIZE]; int i; -for (i = 0; i < MIN(ms->smp.cpus, FSL_IMX7_NUM_CPUS); i++) { +for (i = 0; i < MIN(machine_topo_get_cpus(ms), FSL_IMX7_NUM_CPUS); i++) { snprintf(name, NAME_SIZE, "cpu%d", i); object_initialize_child(obj, name, &s->cpu[i], ARM_CPU_TYPE_NAME("cortex-a7")); @@ -148,7 +148,7 @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp) int i; qemu_irq irq; char name[NAME_SIZE]; -unsigned int smp_cpus = ms->smp.cpus; +unsigned int smp_cpus = machine_topo_get_cpus(ms); if (smp_cpus > FSL_IMX7_NUM_CPUS) { error_setg(errp, "%s: Only %d CPUs are supported (%d requested)", diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c index f12aacea6b86..22d6987eafe1 100644 --- a/hw/arm/highbank.c +++ b/hw/arm/highbank.c @@ -181,7 +181,7 @@ static void calxeda_init(MachineState *machine, enum cxmachines machine_id) SysBusDevice *busdev; qemu_irq pic[128]; int n; -unsigned int smp_cpus = machine->smp.cpus; +unsigned int smp_cpus = machine_topo_get_cpus(machine); qemu_irq cpu_irq[4]; qemu_irq cpu_fiq[4]; qemu_irq cpu_virq[4]; diff --git a/hw/arm/realview.c b/hw/arm/realview.c index a5aa2f046aec..0a2022a34629 100644 --- a/hw/arm/realview.c +++ b/hw/arm/realview.c @@ -87,7 +87,7 @@ static void realview_init(MachineState *machine, DriveInfo *dinfo; I2CBus *i2c; int n; -unsigned int smp_cpus = machine->smp.cpus; +unsigned int smp_cpus = machine_topo_get
Re: [PATCH] audio/pwaudio.c: Add Pipewire audio backend for QEMU
On Thu, Feb 16, 2023 at 01:33:47PM +0300, Michael Tokarev wrote: > 15.02.2023 14:26, Daniel P. Berrangé пишет: > .. > > > - choices: ['alsa', 'coreaudio', 'default', 'dsound', 'jack', > > > 'oss', 'pa', 'sdl', 'sndio'], > > > + choices: ['alsa', 'coreaudio', 'default', 'dsound', 'jack', > > > 'oss', 'pa', 'pw', 'sdl', 'sndio'], > > > > I appreciate you probably just followed the example of pulseaudio, > > abbreviated > > to 'pa', but I'm not a fan of the existing usage there. So lets be more > > verbose > > and say 'pipewire' so users are clear on what this is. > > I'd vote for "pw" here, it is a quite well-known acronym. It is not because > "pa" > is used in qemu, it is because "pa" and now "pw" is used everywhere in context > of linux sound. I don't see 'pw' as well known. The package is 'pipewire', the processes are 'pipewire', the config files are 'pipewire'. The 'pw' abbreviation is too short to be meaningful unless you are already familiar with pipewire. > But if there's no need to type it manually every time, it can > be "pipewire" too. "pipewire" is not exactly a long word to begin with, and saving a couple of letters to abbreviate it isn't going to make a difference in the context of QEMU's command line. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[PATCH] Adding new machine Yosemitev2 in QEMU
Signed-off-by: Karthikeyan Pasupathi --- hw/arm/aspeed.c| 38 ++ hw/arm/aspeed_eeprom.c | 23 +++ hw/arm/aspeed_eeprom.h | 3 +++ 3 files changed, 64 insertions(+) diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 27dda58338..35ff29b752 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -521,6 +521,22 @@ static void ast2600_evb_i2c_init(AspeedMachineState *bmc) TYPE_TMP105, 0x4d); } +static void yosemitev2_bmc_i2c_init(AspeedMachineState *bmc) +{ +AspeedSoCState *soc = &bmc->soc; + +I2CBus *i2c[16]; + +for (int i = 0; i < 16; i++) { +i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i); +} + +at24c_eeprom_init(i2c[4], 0x51, 128 * KiB); + +at24c_eeprom_init_rom(i2c[8], 0x51, 128 * KiB, fbyv2_bmc_fruid, + fbyv2_bmc_fruid_len); +} + static void romulus_bmc_i2c_init(AspeedMachineState *bmc) { AspeedSoCState *soc = &bmc->soc; @@ -1174,6 +1190,24 @@ static void aspeed_machine_ast2500_evb_class_init(ObjectClass *oc, void *data) aspeed_soc_num_cpus(amc->soc_name); }; +static void aspeed_machine_fbyv2_class_init(ObjectClass *oc, void *data) +{ +MachineClass *mc = MACHINE_CLASS(oc); +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); + +mc->desc = "Facebook YosemiteV2 BMC (ARM1176)"; +amc->soc_name = "ast2500-a1"; +amc->hw_strap1 = AST2500_EVB_HW_STRAP1; +amc->hw_strap2 = 0; +amc->fmc_model = "n25q256a"; +amc->spi_model = "mx25l25635e"; +amc->num_cs= 2; +amc->i2c_init = yosemitev2_bmc_i2c_init; +mc->default_ram_size = 512 * MiB; +mc->default_cpus = mc->min_cpus = mc->max_cpus = +aspeed_soc_num_cpus(amc->soc_name); +}; + static void aspeed_machine_romulus_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -1562,6 +1596,10 @@ static const TypeInfo aspeed_machine_types[] = { .name = MACHINE_TYPE_NAME("ast2600-evb"), .parent= TYPE_ASPEED_MACHINE, .class_init= aspeed_machine_ast2600_evb_class_init, +}, { +.name = MACHINE_TYPE_NAME("yosemitev2-bmc"), +.parent= TYPE_ASPEED_MACHINE, +.class_init= aspeed_machine_fbyv2_class_init, }, { .name = MACHINE_TYPE_NAME("tacoma-bmc"), .parent= TYPE_ASPEED_MACHINE, diff --git a/hw/arm/aspeed_eeprom.c b/hw/arm/aspeed_eeprom.c index 04463acc9d..807036d416 100644 --- a/hw/arm/aspeed_eeprom.c +++ b/hw/arm/aspeed_eeprom.c @@ -77,6 +77,29 @@ const uint8_t fby35_bmc_fruid[] = { 0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45, }; +// Yosemite V2 BMC FRU +const uint8_t fbyv2_bmc_fruid[] = { +0x01, 0x00, 0x00, 0x01, 0x0d, 0x00, 0x00, 0xf1, 0x01, 0x0c, 0x00, 0x36, +0xe6, 0xd0, 0xc6, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x42, 0x4d, +0x43, 0x20, 0x53, 0x74, 0x6f, 0x72, 0x61, 0x67, 0x65, 0x20, 0x4d, 0x6f, +0x64, 0x75, 0x6c, 0x65, 0xcd, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e, +0x30, 0xc9, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc1, 0x39, 0x01, 0x0c, 0x00, 0xc6, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x59, 0x6f, 0x73, 0x65, 0x6d, +0x69, 0x74, 0x65, 0x20, 0x56, 0x32, 0x2e, 0x30, 0x20, 0x45, 0x56, 0x54, +0x32, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0xc4, 0x45, 0x56, 0x54, 0x32, 0xcd, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc7, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e, 0x30, 0xc9, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc8, 0x43, 0x6f, +0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45, +}; + const size_t fby35_nic_fruid_len = sizeof(fby35_nic_fruid); const size_t fby35_bb_fruid_len = sizeof(fby35_bb_fruid); const size_t fby35_bmc_fruid_len = sizeof(fby35_bmc_fruid); + +const size_t fbyv2_bmc_fruid_len = sizeof(fbyv2_bmc_fruid); diff --git a/hw/arm/aspeed_eeprom.h b/hw/arm/aspeed_eeprom.h index a0f848fa6e..14d2533a28 100644 --- a/hw/arm/aspeed_eeprom.h +++ b/hw/arm/aspeed_eeprom.h @@ -16,4 +16,7 @@ extern const size_t fby35_nic_fruid_len; extern const size_t fby35_bb_fruid_len; extern const size_t fby35_bmc_fruid_len; +extern const uint8_t fbyv2_bmc_fruid[]; +extern const size_t fbyv2_bmc_fruid_len; + #endif -- 2.25.1
Re: [RFC PATCH v11bis 00/26] Emulated XenStore and PV backend support
David Woodhouse wrote: > The non-RFC patch submisson¹ is just the basic platform support for Xen > on KVM. This RFC series is phase 2, adding an internal XenStore and > hooking up the PV back end drivers to that and the emulated grant tables > and event channels. > > With this, we can boot a Xen guest with PV disk, under KVM. Full support > for migration isn't there yet because it's actually missing in the PV > back end drivers in the first place (perhaps because upstream Xen doesn't > yet have guest transparent live migration support anyway). I'm assuming > that when the first round is merged and we drop the [RFC] from this set, > that won't be a showstopper for now? > > I'd be particularly interested in opinions on the way I implemented > serialization for the XenStore, by creating a GByteArray and then dumping > it with VMSTATE_VARRAY_UINT32_ALLOC(). And I was wondering why I was CC'd in the whole series O:-) How big is the xenstore? I mean typical size and maximun size. If it is suficientely small (i.e. in the single unit megabytes), you can send it as a normal device at the end of migration. If it is bigger, I think that you are going to have to enter Migration iteration stage, and have some kind of dirty bitmap to know what entries are on the target and what not. As examples, we are going to discuss how to migrate Vhost-user-fs in the near future, and as far as I know that is something similar to the Xenstore (from 1 feet view, both are a (key, value) store, no?). We are having starting other discussions about how to migrate vfio and (not yet started) vhost/vpda devices, so you can get some "inspiration" from there if you are going the "opaque" route. Later, Juan.
[PATCH v2] Adding new machine Yosemitev2 in QEMU
This patch support Yosemitev2 in QEMU environment. and introduced EEPROM BMC FRU data support "add fbyv2_bmc_fruid data" along with the machine support. Signed-off-by: Karthikeyan Pasupathi --- hw/arm/aspeed.c| 38 ++ hw/arm/aspeed_eeprom.c | 23 +++ hw/arm/aspeed_eeprom.h | 3 +++ 3 files changed, 64 insertions(+) diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c index 27dda58338..35ff29b752 100644 --- a/hw/arm/aspeed.c +++ b/hw/arm/aspeed.c @@ -521,6 +521,22 @@ static void ast2600_evb_i2c_init(AspeedMachineState *bmc) TYPE_TMP105, 0x4d); } +static void yosemitev2_bmc_i2c_init(AspeedMachineState *bmc) +{ +AspeedSoCState *soc = &bmc->soc; + +I2CBus *i2c[16]; + +for (int i = 0; i < 16; i++) { +i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i); +} + +at24c_eeprom_init(i2c[4], 0x51, 128 * KiB); + +at24c_eeprom_init_rom(i2c[8], 0x51, 128 * KiB, fbyv2_bmc_fruid, + fbyv2_bmc_fruid_len); +} + static void romulus_bmc_i2c_init(AspeedMachineState *bmc) { AspeedSoCState *soc = &bmc->soc; @@ -1174,6 +1190,24 @@ static void aspeed_machine_ast2500_evb_class_init(ObjectClass *oc, void *data) aspeed_soc_num_cpus(amc->soc_name); }; +static void aspeed_machine_fbyv2_class_init(ObjectClass *oc, void *data) +{ +MachineClass *mc = MACHINE_CLASS(oc); +AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); + +mc->desc = "Facebook YosemiteV2 BMC (ARM1176)"; +amc->soc_name = "ast2500-a1"; +amc->hw_strap1 = AST2500_EVB_HW_STRAP1; +amc->hw_strap2 = 0; +amc->fmc_model = "n25q256a"; +amc->spi_model = "mx25l25635e"; +amc->num_cs= 2; +amc->i2c_init = yosemitev2_bmc_i2c_init; +mc->default_ram_size = 512 * MiB; +mc->default_cpus = mc->min_cpus = mc->max_cpus = +aspeed_soc_num_cpus(amc->soc_name); +}; + static void aspeed_machine_romulus_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -1562,6 +1596,10 @@ static const TypeInfo aspeed_machine_types[] = { .name = MACHINE_TYPE_NAME("ast2600-evb"), .parent= TYPE_ASPEED_MACHINE, .class_init= aspeed_machine_ast2600_evb_class_init, +}, { +.name = MACHINE_TYPE_NAME("yosemitev2-bmc"), +.parent= TYPE_ASPEED_MACHINE, +.class_init= aspeed_machine_fbyv2_class_init, }, { .name = MACHINE_TYPE_NAME("tacoma-bmc"), .parent= TYPE_ASPEED_MACHINE, diff --git a/hw/arm/aspeed_eeprom.c b/hw/arm/aspeed_eeprom.c index 04463acc9d..807036d416 100644 --- a/hw/arm/aspeed_eeprom.c +++ b/hw/arm/aspeed_eeprom.c @@ -77,6 +77,29 @@ const uint8_t fby35_bmc_fruid[] = { 0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45, }; +// Yosemite V2 BMC FRU +const uint8_t fbyv2_bmc_fruid[] = { +0x01, 0x00, 0x00, 0x01, 0x0d, 0x00, 0x00, 0xf1, 0x01, 0x0c, 0x00, 0x36, +0xe6, 0xd0, 0xc6, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x42, 0x4d, +0x43, 0x20, 0x53, 0x74, 0x6f, 0x72, 0x61, 0x67, 0x65, 0x20, 0x4d, 0x6f, +0x64, 0x75, 0x6c, 0x65, 0xcd, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e, +0x30, 0xc9, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc1, 0x39, 0x01, 0x0c, 0x00, 0xc6, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xd2, 0x59, 0x6f, 0x73, 0x65, 0x6d, +0x69, 0x74, 0x65, 0x20, 0x56, 0x32, 0x2e, 0x30, 0x20, 0x45, 0x56, 0x54, +0x32, 0xce, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0xc4, 0x45, 0x56, 0x54, 0x32, 0xcd, 0x58, 0x58, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc7, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc3, 0x31, 0x2e, 0x30, 0xc9, +0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0x58, 0xc8, 0x43, 0x6f, +0x6e, 0x66, 0x69, 0x67, 0x20, 0x41, 0xc1, 0x45, +}; + const size_t fby35_nic_fruid_len = sizeof(fby35_nic_fruid); const size_t fby35_bb_fruid_len = sizeof(fby35_bb_fruid); const size_t fby35_bmc_fruid_len = sizeof(fby35_bmc_fruid); + +const size_t fbyv2_bmc_fruid_len = sizeof(fbyv2_bmc_fruid); diff --git a/hw/arm/aspeed_eeprom.h b/hw/arm/aspeed_eeprom.h index a0f848fa6e..14d2533a28 100644 --- a/hw/arm/aspeed_eeprom.h +++ b/hw/arm/aspeed_eeprom.h @@ -16,4 +16,7 @@ extern const size_t fby35_nic_fruid_len; extern const size_t fby35_bb_fruid_len; extern const size_t fby35_bmc_fruid_len; +extern const uint8_t fbyv2_bmc_fruid[]; +extern const size_t fbyv2_bmc_fruid_len; + #endif -- 2.25.1
Re: [PATCH 1/1] block: improve error logging in bdrv_reopen_prepare()
Am 13.02.2023 um 11:31 hat Denis V. Lunev geschrieben: > The error generated when the option could not be changed inside > bdrv_reopen_prepare() does not give a clue about problematic > BlockDriverState as we could get very long tree of devices. > > The patch adds node name to the error report in the same way as done > above. > > Signed-off-by: Denis V. Lunev > CC: Kevin Wolf > CC: Hanna Reitz > CC: Vladimir Sementsov-Ogievskiy Truly fascinating how inconsistent error reporting is in bdrv_reopen_prepare(). Some places use the node name, some places device or node name, some places filename and some places nothing. Your choice is as good as any. The only problem I can see with this patch is that qemu-iotests 245 needs an update, too. Kevin
Re: Future of icount discussion for next KVM call?
Alex Bennée wrote: > (replying all because qemu-devel rejected my email again) > > On Thu, 16 Feb 2023 at 10:19, Alex Bennée wrote: > >> Hi Juan, >> >> Do we have an agenda for next weeks KVM call yet? If there is space I'd >> like to take some time to discuss the future direction of icount. For next week we have: - more single binary qemu (philippe?) - TDX migration from intel. We asked them on the previous call to change their design to transfer stuff through migration channels and not create a new channel. But I haven't heard from intel. (wei?) They agreed to send the slides and post the code before continue discussion. And now I like the title of you topic - Future Direction of icount O:-) So, I will recommend 20 minutes each if Wei shows up, or 30/30 for the rest. What do the rest of the people think. >> Specifically I believe there might be some proposals for how we could >> support icount with MTTCG worth discussing. From my point of view icount >> provides too things: >> >> - a sense of time vaguely related to execution rather than wall clock >> - determinism >> >> I would love to divorce the former from icount and punt it to plugins. >> The plugin would be free to instrument as heavily or lightly as it sees >> fit and provide its best guess as to guest time on demand. I wrote this >> idea up as a card in Linaro's JIRA if anyone is interested: >> >> https://linaro.atlassian.net/browse/QEMU-481 >> >> Being able to punt cost modelling and sense of time into plugins would >> allow the core icount support to concentrate on determinism. Then any >> attempt to enable icount for MTTCG would then have to ensure it stays >> deterministic. >> >> Richard and I have discussed the problem a few times and weren't sure it >> was solvable but I'm totally open to hearing ideas on how to do it. >> Fundamentally I think we would have to ensure any TB's doing IO would >> have to execute in an exclusive context. The TCG code already has >> mechanisms to ensure all IO is only done at the end of blocks so it >> doesn't seem a huge leap to ensure we execute those blocks exclusively. >> However there is still the problem of what to do about other pure >> computation threads getting ahead or behind of the IO blocks on >> subsequent runs. >> >> Anyway does anyone else have ideas to bring to the discussion? Hat on to you O:-) Open discussion with a Jira Epic and a good introduction. I am sorry that I am not an expert (or even newbie) on that part of qemu to give apport anything. Thanks, Juan.
Re: [PATCH v2 0/7] Python: Drop support for Python 3.6
On 15/02/2023 20.05, Markus Armbruster wrote: The discussion under PATCH 6 makes me think there's a bit of confusion about the actual impact of dropping support for Python 3.6. Possibly because it's spelled out in the commit message of PATCH 7. Let me summarize it in one sentence: *** All supported host systems continue to work *** Evidence: CI remains green. The CI remains green since one of the patches disabled the building of the docs on CentOS 8. That's not how I'd describe "continue to work", at least not in the same extend as before. On some supported host systems, different packages need to be installed. On CentOS 8, for instance, we need to install Python 3.8.13 or 3.9.16 instead of 3.6.8. Let me stress again: same repository, different package. No downsides I can see. The *one* exception is Sphinx on CentOS 8. CentOS 8 does not ship a version of Sphinx that works with Python 3.7 or newer. This series proposes to simply stop building the docs there, unless the user provides a suitable version of Sphinx (which is easy enough with pip). I think we've all understood that. The thing that you obviously did not understood: This breaks our support statement. I'm pretty sure that you could also build the whole QEMU suite successfully on an ancient CentOS 7 or Ubuntu 18.04 system if you manually install a newer version of GCC and some of the required libraries first. But that's not how we understand our support statement. Sure, you can argue that you can use "pip install" to get a newer version of Sphinx on RHEL 8 / CentOS 8 to continue building the docs there - but is that really that much different from installing a newer version of GCC and libraries on an ancient distro that we do not officially support anymore? I'd say no. You also have to consider that not every build host has access to the internet, maybe some companies only have an internal mirror of the distro packages in their intranet (I remember some discussion about such a case in the past) - so while you were perfectly fine to build the whole of QEMU on a CentOS 8 there before this change, you could now not build parts of QEMU anymore there due to the missing possibility to run "pip install" without full internet connection. And sure, you can argue that it's "just" the documentation. But IMHO that's still an essential part of the QEMU build, and it used to work before, so it feels wrong to cut that now out. And also, if we start with the documentation now, what's next? If for example scripts/shaderinclude.py stops working with Python 3.6, do we then simply say: "Oh, it's fine, you can still build all the other targets that work without this script, just not the ones anymore that need it"? All the angst about CentOS falling off the end of our "supported build platforms" list is not actually warranted by this series :) Using the term "angst" for the concerns of your fellows here is quite cheeky. It's not about "angst", it's about a discussion that our support policy might need to be adjusted if we do this step. So instead of writing such sentences, I'd rather would like to see you posting a patch for docs/about/build-platforms.rst for constructive further discussion instead. Thanks, Thomas
Re: [PATCH] hbitmap: fix hbitmap_status() return value for first dirty bit case
Am 02.02.2023 um 19:15 hat Andrey Zhadchenko via geschrieben: > The last return statement should return true, as we already evaluated that > start == next_dirty > > Also, fix hbitmap_status() description in header > > Cc: qemu-sta...@nongnu.org > Fixes: a6426475a75 ("block/dirty-bitmap: introduce > bdrv_dirty_bitmap_status()") > Signed-off-by: Andrey Zhadchenko Thanks, applied to the block branch. Kevin
Re: [PATCH v6 3/3] multifd: Only flush once each full round of memory
Peter Xu wrote: > On Wed, Feb 15, 2023 at 07:02:31PM +0100, Juan Quintela wrote: >> We need to add a new flag to mean to flush at that point. >> Notice that we still flush at the end of setup and at the end of >> complete stages. >> >> Signed-off-by: Juan Quintela > > Acked-by: Peter Xu > > One nitpick below. Thanks. >> @@ -4169,7 +4190,9 @@ int ram_load_postcopy(QEMUFile *f, int channel) >> } >> decompress_data_with_multi_threads(f, page_buffer, len); >> break; >> - >> +case RAM_SAVE_FLAG_MULTIFD_FLUSH: >> +multifd_recv_sync_main(); >> +break; >> case RAM_SAVE_FLAG_EOS: >> /* normal exit */ >> if (migrate_multifd_flush_after_each_section()) { > > We could have dropped RAM_SAVE_FLAG_MULTIFD_FLUSH and RAM_SAVE_FLAG_EOS for > now until we support postcopy+multifd. I don't think so. We have this curse of biblic proportions called Backwards compatibility. We need to mark the beggining and end of sections. That is independent of multifd. And for multifd we have to flush all channels at the end of each iteration through RAM. We could do that without involving the main thread, but I don't see the point of doing that. > Here it's not only about enabling them together, but it's about running > them in parallel, which I doubt whether it can really be implemented at all > due to the fundamentally concepts difference between multifd & postcopy.. :( Later, Juan.
Re: [PATCH v2 6/7] CI: Stop building docs on centos8
On Thu, Feb 16, 2023 at 02:08:33AM +0100, Markus Armbruster wrote: > Daniel P. Berrangé writes: > > Our support policy is written from the POV of the C world, and > > merely reducing the length of time we support a distro does not > > address the different world view of Python. > > > > Should we instead try to be more explicit about the different > > needs of the non-C dependencies ? > > > > We could for example say > > > > * For native library/application dependancies we aim to > >support the two most recent distro versions, for 2 years > >overlap > > > > * For non-native library/applications dependancies we aim > >to support only the most recent distro version. Users > >of older distros may need to dynamically fetch newer > >deps. > > Who does the fetching, users manually, or the build process > automatically? I expect both cases need supporting. In the case of distro builds, they will have to fetch the deps ahead of time, because the build environments usually won't have any network access. Some contributors have also previously objected to the build system fetching stuff off the network, but they're a small minority. For friendliness to developers, the build process would be best to fetch automatically, if they haven't been provided upfront. > > The python 3.8 runtime would be considered a native dep, so fall > > under the 2 distro versions rule. This is fine with CentOS 8, > > since it provides newer python runtime versions. > > > > The python libraries, or tools written in python (meson), would > > fall under the second rule, and so only need to target one distro > > version. This would be compatible with CentOS 8, as the users would > > be expected to download extra python components (or we do it on > > their behalf). > > > > For the second rule, rather than saying most recent distro versions, > > possibly we might want to carve out an exclusion for LTS distros too. > > ie, explicitly don't care about versions of non-native bits in RHEL > > at all, beyond availability of the base (python) runtime. > > Interesting idea. Can anyone think of reasons not to do this? It is probably even more compelling when looking at SLES, which is having an even larger gap between releases than RHEL does. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v2 6/7] CI: Stop building docs on centos8
On Thu, Feb 16, 2023 at 02:46:18AM +0100, Markus Armbruster wrote: > Daniel P. Berrangé writes: > > > On Tue, Feb 14, 2023 at 09:52:44PM +0100, Paolo Bonzini wrote: > >> Il mar 14 feb 2023, 18:26 Kevin Wolf ha scritto: > >> > >> > Am 14.02.2023 um 15:03 hat Paolo Bonzini geschrieben: > >> > > In the case of Python the issue is not the interpreter per se, though > >> > > there are a couple new feature in Python 3.7 that are quite nice (for > >> > > example improved data classes[1] or context variables[2]). The main > >> > > problem as far as I understood (and have seen in my experience) is > >> > > linting tools. New versions fix bugs that caused false positives, but > >> > > also become more strict at the same time. The newer versions at the > >> > > same time are very quick at dropping support for old versions of > >> > > Python; while older versions sometimes throw deprecation warnings on > >> > > new versions of Python. This makes it very hard to support a single > >> > > version of, say, mypy that works on all versions from RHEL8 and SLE15 > >> > > to Fedora 38 and Ubuntu 23.04. > >> > > >> > Why do we have to support a single version of mypy? What is wrong with > >> > running an old mypy version with old Python version, and a newer mypy > >> > with newer Python versions? > >> > > >> > Sure, they will complain about different things, but it doesn't feel > >> > that different from supporting multiple C compilers in various versions. > >> > > >> > >> It's more like having to support only C++03 on RHEL 8 and only C++20 in > >> Fedora 37, without even being able to use a preprocessor. > >> > >> For example old versions might not understand some type annotations and > >> will fail mypy altogether, therefore even with newer versions you can't > >> annotate the whole source and have to fall back to non-strict mode. > > > > In terms of back compatibility, is there a distinction to be > > made between mypy compat and the python runtime compat ? > > > > If we add annotations wanted by new mypy, and old mypy doesn't > > understand them, that's not a huge problem. We can simply declare > > that we don't support old mypy, and skip the validation tests if > > old mypy is installed. > > In theory, type hints are transparent at run time. In practice, use of > modern type hints is known to break 3.6 at run time. I don't remember > the details offhand, but John should be able to dig them up if you're > interested. > > So, it should not be a problem, but it is. Ok, this is a pretty compelling motivating factor for dropping 3.6, as it is clear demonstration that we're being held back by the unfortunate lack of runtime compatibility. For most of the other problems we can simply mandate a new enough version of the add on tool, but if using new mypy requires annotations that break at runtime on 3.6 that's a painful blocker. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] Adding new machine Yosemitev2 in QEMU
Hi Pasupathi, On 16/2/23 11:48, Karthikeyan Pasupathi wrote: Signed-off-by: Karthikeyan Pasupathi --- hw/arm/aspeed.c| 38 ++ hw/arm/aspeed_eeprom.c | 23 +++ hw/arm/aspeed_eeprom.h | 3 +++ 3 files changed, 64 insertions(+) +static void yosemitev2_bmc_i2c_init(AspeedMachineState *bmc) +{ +AspeedSoCState *soc = &bmc->soc; + +I2CBus *i2c[16]; + +for (int i = 0; i < 16; i++) { +i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i); +} + +at24c_eeprom_init(i2c[4], 0x51, 128 * KiB); + +at24c_eeprom_init_rom(i2c[8], 0x51, 128 * KiB, fbyv2_bmc_fruid, + fbyv2_bmc_fruid_len); This can be simplified as: at24c_eeprom_init(aspeed_i2c_get_bus(&soc->i2c, 4), 0x51, 128 * KiB); at24c_eeprom_init_rom(aspeed_i2c_get_bus(&soc->i2c, 8), 0x51, 128 * KiB, fbyv2_bmc_fruid, fbyv2_bmc_fruid_len); +}
Re: [RFC PATCH v2 0/2] i386/sev: Support measured direct kernel boot on SNP
* Daniel P. Berrangé (berra...@redhat.com) wrote: > On Thu, Feb 16, 2023 at 08:49:11AM +, Dov Murik wrote: > > This RFC patch series is based on AMD's RFC upmv10-snpv3 tree [1]. > > I've seen postings of the kernel patches for SNP using the kernel > UPM support, but I don't recall ever seeing these QEMU pieces > posted for review. The code in that QEMU branch looks different > from the last posting of SNP to qemu-devel years ago. > > IMHO it would be very desirable if that QEMU UPM tree was submitted > to qemu-devel for review feedback Some of the patches in there look like they're not dependent on SNP or the UPM interface; (eg some CPU model updates). It's probably worth posting those separately so that they can be reviewed and merged and out of the way. > before requesting review of patches > that build on top of it. But at the same time it seems right for Dov to send these patches for review. Dave > > With regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH v2 6/7] CI: Stop building docs on centos8
On Tue, Feb 14, 2023 at 03:03:54PM +0100, Paolo Bonzini wrote: > On Tue, Feb 14, 2023 at 12:49 PM Daniel P. Berrangé > wrote: > > [quote] > > The motivation for this series is that Python 3.6 was EOL at the end of > > 2021; upstream tools are beginning to drop support for it, including > > setuptools, pylint, mypy, etc. As time goes by, it becomes more > > difficult to support and test against the full range of Python versions > > that QEMU supports. The closer we get to Python 3.12, the harder it will > > be to cover that full spread of versions. > > [/quote] > > > > this is all about new/eol versions of software upstream, and I don't > > think that's a justification. QEMU explicitly aims to use distro provided > > versions and upstream EOL status is not relevant in that context. Even > > if using "pip" to install it is possible to limit yourself to upstream > > releases which still support 3.6. > > > > There is the separate issue of Meson dropping python 3.6 which motivates > > Paolo's series. Again though, we don't have to increase our minimum meson > > version, because meson is working today. It is our choice to to increase > > it to use latest available meson features. At some point we can decide > > what we have is good enough and we don't have to keep chasing the latest > > features. Maybe we're not there yet, but we should think about when that > > would be. > > In the case of Meson, the main advantage is moving _all_ of the > emulator configury out of the configure script. This requires > add_global_dependencies which was added in 0.63. So in that case it > is indeed mostly about shiny new features and it's not absolutely > necessary. I forgot to mention in my previous reply, I feel like the ability to finish the configure -> meson conversion is a pretty compelling motivation to adopt the new meson. The hybrid configure/meson state we've been in has worked better than I expected it would at first, but none the less, the more we can get out of configure the better it will be for ongoing maint burden. So yes, its shiny new features, but they're pretty compelling features as they allow us to finish the job we started. We've clear precedent all over QEMU codebase that half-finished conversions harm our ability to maintain the project. With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v6 0/4] memory: prevent dma-reentracy issues
On 13/02/2023 03.11, Alexander Bulekov wrote: ping I think it would be really good to finally get these dma-reentrancy issues fixed! Who's supposed to pick up these patches? Paolo? David? Peter? Thomas On 230204 2307, Alexander Bulekov wrote: These patches aim to solve two types of DMA-reentrancy issues: 1.) mmio -> dma -> mmio case To solve this, we track whether the device is engaged in io by checking/setting a reentrancy-guard within APIs used for MMIO access. 2.) bh -> dma write -> mmio case This case is trickier, since we dont have a generic way to associate a bh with the underlying Device/DeviceState. Thus, this version allows a device to associate a reentrancy-guard with a bh, when creating it. (Instead of calling qemu_bh_new, you call qemu_bh_new_guarded) I replaced most of the qemu_bh_new invocations with the guarded analog, except for the ones where the DeviceState was not trivially accessible. v5 -> v6: - Only apply checkpatch checks to code in paths containing "/hw/" (/hw/ and include/hw/) - Fix a bug in a _guarded call added to hw/block/virtio-blk.c v4-> v5: - Add corresponding checkpatch checks - Save/restore reentrancy-flag when entering/exiting BHs - Improve documentation - Check object_dynamic_cast return value v3 -> v4: Instead of changing all of the DMA APIs, instead add an optional reentrancy guard to the BH API. v2 -> v3: Bite the bullet and modify the DMA APIs, rather than attempting to guess DeviceStates in BHs. Alexander Bulekov (4): memory: prevent dma-reentracy issues async: Add an optional reentrancy guard to the BH API checkpatch: add qemu_bh_new/aio_bh_new checks hw: replace most qemu_bh_new calls with qemu_bh_new_guarded docs/devel/multiple-iothreads.txt | 7 +++ hw/9pfs/xen-9p-backend.c | 4 +++- hw/block/dataplane/virtio-blk.c | 3 ++- hw/block/dataplane/xen-block.c| 5 +++-- hw/char/virtio-serial-bus.c | 3 ++- hw/display/qxl.c | 9 ++--- hw/display/virtio-gpu.c | 6 -- hw/ide/ahci.c | 3 ++- hw/ide/core.c | 3 ++- hw/misc/imx_rngc.c| 6 -- hw/misc/macio/mac_dbdma.c | 2 +- hw/net/virtio-net.c | 3 ++- hw/nvme/ctrl.c| 6 -- hw/scsi/mptsas.c | 3 ++- hw/scsi/scsi-bus.c| 3 ++- hw/scsi/vmw_pvscsi.c | 3 ++- hw/usb/dev-uas.c | 3 ++- hw/usb/hcd-dwc2.c | 3 ++- hw/usb/hcd-ehci.c | 3 ++- hw/usb/hcd-uhci.c | 2 +- hw/usb/host-libusb.c | 6 -- hw/usb/redirect.c | 6 -- hw/usb/xen-usb.c | 3 ++- hw/virtio/virtio-balloon.c| 5 +++-- hw/virtio/virtio-crypto.c | 3 ++- include/block/aio.h | 18 -- include/hw/qdev-core.h| 7 +++ include/qemu/main-loop.h | 7 +-- scripts/checkpatch.pl | 8 softmmu/memory.c | 17 + softmmu/trace-events | 1 + tests/unit/ptimer-test-stubs.c| 3 ++- util/async.c | 18 +- util/main-loop.c | 5 +++-- util/trace-events | 1 + 35 files changed, 147 insertions(+), 41 deletions(-) -- 2.39.0
Re: [PULL 0/5] Misc next patches
On Wed, 15 Feb 2023 at 17:48, Daniel P. Berrangé wrote: > > The following changes since commit 6a50f64ca01d0a7b97f14f069762bfd88160f31e: > > Merge tag 'pull-request-2023-02-14' of https://gitlab.com/thuth/qemu into > staging (2023-02-14 14:46:10 +) > > are available in the Git repository at: > > https://gitlab.com/berrange/qemu tags/misc-next-pull-request > > for you to fetch changes up to 36debafddd788066be10b33c5f11b984a08e5c85: > > ui: remove deprecated 'password' option for SPICE (2023-02-15 11:14:58 > -0500) > > > * Document 'password-secret' option for -iscsi > * Deprecate iSCSI 'password' in favour of 'password-secret' > * Remove deprecated 'password' option for SPICE > * Fix handling of cached read buffers with TLS > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0 for any user-visible changes. -- PMM
Re: [PATCH v3 01/14] hw/char/serial-pci: Replace DO_UPCAST(PCISerialState) by PCI_SERIAL()
On 13/02/2023 19.43, Philippe Mathieu-Daudé wrote: Use the PCI_SERIAL() QOM type-checking macro to avoid DO_UPCAST(). Signed-off-by: Philippe Mathieu-Daudé --- hw/char/serial-pci.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c index 801b769aba..9689645cac 100644 --- a/hw/char/serial-pci.c +++ b/hw/char/serial-pci.c @@ -36,7 +36,10 @@ #include "qom/object.h" struct PCISerialState { +/*< private >*/ PCIDevice dev; +/*< public >*/ + I'm not sure about this part of the patch. It does not seem to be related to the other changes at all, and are you sure about which parts are really "public" and which parts are "private"? If so, I'd like to see a description about this in the commit message, preferably in a separate patch. Also, why an empty line after the "public" comment? SerialState state; uint8_t prog_if; }; @@ -46,7 +49,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCISerialState, PCI_SERIAL) static void serial_pci_realize(PCIDevice *dev, Error **errp) { -PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev); +PCISerialState *pci = PCI_SERIAL(dev); SerialState *s = &pci->state; if (!qdev_realize(DEVICE(s), NULL, errp)) { @@ -63,7 +66,7 @@ static void serial_pci_realize(PCIDevice *dev, Error **errp) static void serial_pci_exit(PCIDevice *dev) { -PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev); +PCISerialState *pci = PCI_SERIAL(dev); SerialState *s = &pci->state; qdev_unrealize(DEVICE(s)); Ack for the DO_UPCAST removal. Thomas
Re: [PATCH] qemu: make version available in coredump
On 16.02.23 12:44, Daniel P. Berrangé wrote: On Wed, Feb 15, 2023 at 05:05:47PM -0500, Stefan Hajnoczi wrote: On Tue, 7 Jun 2022 at 16:33, Vladimir Sementsov-Ogievskiy wrote: Add a variable with QEMU_FULL_VERSION definition. Then the content of the variable is easily searchable: strings /path/to/core | grep QEMU_FULL_VERSION 'volatile' keyword is used to avoid removing the variable by compiler as unused. Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hi all! Probably, I just don't know the correct way to get version from core file. If so, please teach me :) I've never hit this issue because bug reports usually include the QEMU distro package version. Keeping the version string in the core file seems reasonable (unless there is already another way to do this). Something I'm curious about: is the coredump guaranteed to contain static const variables? I wondered if they might be located in the .rodata ELF section and excluded from the coredump because they are referenced in the NT_FILE mmap note instead. Maybe volatile prevents this? In Fedora / RHEL based systems (and some other distros too IIUC) for many years, all binaries have included a "build-id" ELF note which uniquely identifies the package build. Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x3c0: Owner Data size Type GNU 20 GNU_BUILD_ID Build ID: e3143405b7f653a0a65b3295df760fdf2c09ba79 This can be used to query what RPM it came from (assuming the RPM is still in your repos) dnf repoquery --whatprovides debuginfo(build-id) = ...hash... this makes it into the coredump files and is what current distro tooling uses to find the binary (and libraries). There are some downsides/limitations with this though, so in Fedora 36 a new impl was added alongside which provides full package info in json Note section [ 5] '.note.package' of 136 bytes at offset 0x404: Owner Data size Type FDO 120 FDO_PACKAGING_METADATA Packaging Metadata: {"type":"rpm","name":"qemu","version":"7.0.0-13.fc37","architecture":"x86_64","osCpe":"cpe:/o:fedoraproject:fedora:37"} Looks very good This format is supported by systemd core dump tools https://systemd.io/ELF_PACKAGE_METADATA/ I believe it has been proposed (and possibly implemented?) for Debian too. This is a long winded way of asking, do we really need a QEMU specific solution here ? Especially one that only tells us a QEMU verison, and nothing about the many libraries QEMU links to which affect its operational behaviour. Generic solution is of course better. Hmm. I'm on Ubuntu 22.04. readelf -n /usr/bin/qemu-system-x86_64 Displaying notes found in: .note.gnu.property OwnerData sizeDescription GNU 0x0020 NT_GNU_PROPERTY_TYPE_0 Properties: x86 feature: IBT, SHSTK x86 ISA needed: x86-64-baseline Displaying notes found in: .note.gnu.build-id OwnerData sizeDescription GNU 0x0014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 4298cd7c2623c58e1cd71668d9d48508bb7f8d52 Displaying notes found in: .note.ABI-tag OwnerData sizeDescription GNU 0x0010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 3.2.0 OK. But I can't find this build-id in core file.. readelf -n /tmp/cores/core.qemu-system-x86.20351.vsementsov-win.1676544081 | grep -i 'build' gets nothing strings /tmp/cores//core.qemu-system-x86.20351.vsementsov-win.1676544081 | grep 4298cd7c2623c58e nothing as well So the case is to find the package not having the binary, only by core file. Probably right solution is to fix our workflow so that if you have core file you always have corresponding binary as well. Still, having the information exactly inside core file seems good anyway. Maybe there is a generic way to force the system put "Packaging Metadata" into core file on creation of it? -- Best regards, Vladimir
Re: [RFC PATCH 10/16] hw/arm/smmuv3: Make TLB lookup work for stage-2
Hi Mostafa, On 2/5/23 10:44, Mostafa Saleh wrote: > Right now, either stage-1 or stage-2 are supported, this simplifies > how we can deal with TLBs. > This patch makes TLB lookup work if stage-2 is enabled instead of > stage-1. > TLB lookup is done before a PTW, if a valid entry is found we won't > do the PTW. > To be able to do TLB lookup, we need the correct tagging info, as > granularity and input size, so we get this based on the supported > translation stage. The TLB entries are added correctly from each > stage PTW. > > When nested translation is supported, this would need to change, for > example if we go with a combined TLB implementation, we would need to > use the min of the granularities in TLB. > > As stage-2 shouldn't be tagged by ASID, it will be set to -1 if S1P > is not enabled. > > Signed-off-by: Mostafa Saleh > --- > hw/arm/smmuv3.c | 44 +--- > 1 file changed, 33 insertions(+), 11 deletions(-) > > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c > index c18460a4ff..769c735697 100644 > --- a/hw/arm/smmuv3.c > +++ b/hw/arm/smmuv3.c > @@ -653,6 +653,8 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, > SMMUTransCfg *cfg, > return ret; > } > > +/* ASID defaults to -1 if s1 is not supported. */ > +cfg->asid = -1; > if (cfg->aborted || cfg->bypassed || !STAGE1_SUPPORTED(s->features)) { > return 0; > } > @@ -733,6 +735,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion > *mr, hwaddr addr, > SMMUTLBEntry *cached_entry = NULL; > SMMUTransTableInfo *tt; > SMMUTransCfg *cfg = NULL; > +uint8_t granule_sz, tsz; > IOMMUTLBEntry entry = { > .target_as = &address_space_memory, > .iova = addr, > @@ -764,21 +767,40 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion > *mr, hwaddr addr, > goto epilogue; > } > > -tt = select_tt(cfg, addr); maybe we shall adapt select_tt for S2 instead of using temp? I know there is a single range on S2 but well, use tt[0]? > -if (!tt) { > -if (cfg->record_faults) { > -event.type = SMMU_EVT_F_TRANSLATION; > -event.u.f_translation.addr = addr; > -event.u.f_translation.rnw = flag & 0x1; > +if (STAGE1_SUPPORTED(s->features)) { maybe check the enable state instead. > +/* Select stage1 translation table. */ > +tt = select_tt(cfg, addr); > +if (!tt) { > +if (cfg->record_faults) { > +event.type = SMMU_EVT_F_TRANSLATION; > +event.u.f_translation.addr = addr; > +event.u.f_translation.rnw = flag & 0x1; > +} > +status = SMMU_TRANS_ERROR; > +goto epilogue; > } > -status = SMMU_TRANS_ERROR; > -goto epilogue; > -} > +granule_sz = tt->granule_sz; > +tsz = tt->tsz; > > -page_mask = (1ULL << (tt->granule_sz)) - 1; > +} else { > +/* Stage2. */ > +granule_sz = cfg->s2cfg.granule_sz; > +tsz = cfg->s2cfg.tsz; > +} > +/* > + * TLB lookup looks for granule and input size for a translation stage, > + * as only one stage is supported right now, choose the right values > + * from the configuration. > + */ > +page_mask = (1ULL << granule_sz) - 1; > aligned_addr = addr & ~page_mask; > > -cached_entry = smmu_iotlb_lookup(bs, cfg, tt, aligned_addr); > +SMMUTransTableInfo temp = { > +.granule_sz = granule_sz, > +.tsz = tsz, > +}; > + > +cached_entry = smmu_iotlb_lookup(bs, cfg, &temp, aligned_addr); > if (cached_entry) { > if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) { > status = SMMU_TRANS_ERROR; Thanks Eric
Re: [PATCH] hw/arm: Add missing ZynqMP ZCU102 -> USB_DWC3 Kconfig dependency
On Thu, 16 Feb 2023 at 09:27, Philippe Mathieu-Daudé wrote: > > Oops I meant this as subject: > "hw/arm: Add missing XLNX_ZYNQMP_ARM -> USB_DWC3 Kconfig dependency" > > On 16/2/23 10:23, Philippe Mathieu-Daudé wrote: > > Since commit acc0b8b05a when running the ZynqMP ZCU102 board with > > a QEMU configured using --without-default-devices, we get: > > > >$ qemu-system-aarch64 -M xlnx-zcu102 > >qemu-system-aarch64: missing object type 'usb_dwc3' > >Abort trap: 6 > > > > Fix by adding the missing Kconfig dependency. > > > > Fixes: acc0b8b05a ("hw/arm/xlnx-zynqmp: Connect ZynqMP's USB controllers") > > Signed-off-by: Philippe Mathieu-Daudé Applied to target-arm.next with the subject line fixed, thanks. -- PMM
Re: [PATCH v5 0/3] arm: enable MTE for QEMU + kvm
On Fri, 3 Feb 2023 at 13:44, Cornelia Huck wrote: > > Respin of my kvm mte series; tested via check + check-tcg and on FVP. I've taken patch 1 into target-arm.next since it's a simple cleanup. thanks -- PMM
Re: [PATCH v2] audio/pwaudio.c: Add Pipewire audio backend for QEMU
On Thursday, February 16, 2023 9:25:44 AM CET Dorinda Bassey wrote: > This commit adds a new audiodev backend to allow QEMU to use Pipewire as both > an audio sink and source. > Please wrap commit log. > Signed-off-by: Dorinda Bassey > --- > v2: > * Shorten commit message > * fix copyright ownership and authour > * use QEMU standard of 4 space indentation > * verbose use of pipewire instead pf pw > > audio/audio.c | 3 + > audio/audio_template.h| 4 + > audio/meson.build | 1 + > audio/pwaudio.c | 818 ++ > meson.build | 7 + > meson_options.txt | 4 +- > qapi/audio.json | 45 ++ > qemu-options.hx | 17 + > scripts/meson-buildoptions.sh | 8 +- > 9 files changed, 904 insertions(+), 3 deletions(-) > create mode 100644 audio/pwaudio.c > > diff --git a/audio/audio.c b/audio/audio.c > index 4290309d18..aa55e41ad8 100644 > --- a/audio/audio.c > +++ b/audio/audio.c > @@ -2069,6 +2069,9 @@ void audio_create_pdos(Audiodev *dev) > #ifdef CONFIG_AUDIO_PA > CASE(PA, pa, Pa); > #endif > +#ifdef CONFIG_AUDIO_PIPEWIRE > +CASE(PIPEWIRE, pipewire, Pipewire); > +#endif > #ifdef CONFIG_AUDIO_SDL > CASE(SDL, sdl, Sdl); > #endif > diff --git a/audio/audio_template.h b/audio/audio_template.h > index 42b4712acb..0f02afb921 100644 > --- a/audio/audio_template.h > +++ b/audio/audio_template.h > @@ -355,6 +355,10 @@ AudiodevPerDirectionOptions *glue(audio_get_pdo_, > TYPE)(Audiodev *dev) > case AUDIODEV_DRIVER_PA: > return qapi_AudiodevPaPerDirectionOptions_base(dev->u.pa.TYPE); > #endif > +#ifdef CONFIG_AUDIO_PIPEWIRE > +case AUDIODEV_DRIVER_PIPEWIRE: > +return > qapi_AudiodevPipewirePerDirectionOptions_base(dev->u.pipewire.TYPE); > +#endif > #ifdef CONFIG_AUDIO_SDL > case AUDIODEV_DRIVER_SDL: > return qapi_AudiodevSdlPerDirectionOptions_base(dev->u.sdl.TYPE); > diff --git a/audio/meson.build b/audio/meson.build > index 074ba9..65a49c1a10 100644 > --- a/audio/meson.build > +++ b/audio/meson.build > @@ -19,6 +19,7 @@ foreach m : [ >['sdl', sdl, files('sdlaudio.c')], >['jack', jack, files('jackaudio.c')], >['sndio', sndio, files('sndioaudio.c')], > + ['pipewire', pipewire, files('pwaudio.c')], >['spice', spice, files('spiceaudio.c')] > ] >if m[1].found() > diff --git a/audio/pwaudio.c b/audio/pwaudio.c > new file mode 100644 > index 00..bb25133414 > --- /dev/null > +++ b/audio/pwaudio.c > @@ -0,0 +1,818 @@ > +/* > + * QEMU Pipewire audio driver > + * > + * Copyright (c) 2023 Red Hat Inc. > + * > + * Author: Dorinda Bassey > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy > + * of this software and associated documentation files (the "Software"), to > deal > + * in the Software without restriction, including without limitation the > rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + */ > + > +#include "qemu/osdep.h" > +#include "qemu/module.h" > +#include "audio.h" > +#include > +#include > +#include > +#include > + > +#include > + > +#define AUDIO_CAP "pipewire" > +#define RINGBUFFER_SIZE(1u << 22) > +#define RINGBUFFER_MASK(RINGBUFFER_SIZE - 1) > +#define BUFFER_SAMPLES128 BUFFER_SAMPLES is not used anywhere, and in code you are using 512 as literals instead. > + > +#include "audio_int.h" > + > +enum { > +MODE_SINK, > +MODE_SOURCE > +}; > + > +typedef struct pwaudio { > +Audiodev *dev; > +struct pw_thread_loop *thread_loop; > +struct pw_context *context; > + > +struct pw_core *core; > +struct spa_hook core_listener; > +int seq; > +} pwaudio; > + > +typedef struct PWVoice { > +pwaudio *g; > +bool enabled; > +struct pw_stream *stream; > +struct spa_hook stream_listener; > +struct spa_audio_info_raw info; > +uint32_t frame_size; > +struct spa_ringbuffer ring; > +uint8_t buffer[RINGBUFFER_SIZE]; s/buffer/ringbuffer/ maybe? > + > +uint32_t mode; > +struct pw_properties *props; > +} PWVoice; > + > +typedef str
Re: [PATCH] audio/pwaudio.c: Add Pipewire audio backend for QEMU
On Wed, Feb 15, 2023 at 04:09:59PM +0100, Dorinda Bassey wrote: > > > > My point is that pipewire has ALSA plugin (and also pulseaudio compat > > library etc). So why add another back in QEMU? > > > Good question. Well the pulseaudio compatibility layer adds to the runtime > overhead and dependency, dropping the usage of pw-pulse daemon as a runtime > dependency would reduce footprint. Oh, pulseaudio compatibility goes through another daemon. That is IMHO enough reason for a native pipewire backend, to avoid that overhead and specifically reduce latency. (this and other good reasons should be added to the commit message). take care, Gerd
Re: [PATCH v5 0/3] arm: enable MTE for QEMU + kvm
On Thu, Feb 16 2023, Peter Maydell wrote: > On Fri, 3 Feb 2023 at 13:44, Cornelia Huck wrote: >> >> Respin of my kvm mte series; tested via check + check-tcg and on FVP. > > I've taken patch 1 into target-arm.next since it's a simple > cleanup. Thanks! (I plan to send a respin of the remainder once we've agreed on some of the feedback.)
[PATCH v1 0/2] vhost: memslot handling improvements
Following up on my previous work to make virtio-mem consume multiple memslots dynamically [1] that requires precise accounting between used vs. reserved memslots, I realized that vhost makes this extra hard by filtering out some memory region sections (so they don't consume a memslot) in the vhost-user case, which messes up the whole memslot accounting. This series fixes what I found to be broken and prepares for more work on [1]. Further, it cleanes up the merge checks that I consider unnecessary. [1] https://lkml.kernel.org/r/20211027124531.57561-8-da...@redhat.com Cc: "Michael S. Tsirkin" Cc: Stefan Hajnoczi Cc: Dr. David Alan Gilbert David Hildenbrand (2): vhost: Defer filtering memory sections until building the vhost memory structure vhost: Remove vhost_backend_can_merge() callback hw/virtio/vhost-user.c| 14 - hw/virtio/vhost-vdpa.c| 1 - hw/virtio/vhost.c | 85 --- include/hw/virtio/vhost-backend.h | 4 -- 4 files changed, 56 insertions(+), 48 deletions(-) -- 2.39.1
[PATCH v1 2/2] vhost: Remove vhost_backend_can_merge() callback
Checking whether the memory regions are equal is sufficient: if they are equal, then most certainly the contained fd is equal. The whole vhost-user memslot handling is suboptimal and overly complicated. We shouldn't have to lookup a RAM memory regions we got notified about in vhost_user_get_mr_data() using a host pointer. But that requires a bigger rework -- especially an alternative vhost_set_mem_table() backend call that simply consumes MemoryRegionSections. For now, let's just drop vhost_backend_can_merge(). Signed-off-by: David Hildenbrand --- hw/virtio/vhost-user.c| 14 -- hw/virtio/vhost-vdpa.c| 1 - hw/virtio/vhost.c | 6 +- include/hw/virtio/vhost-backend.h | 4 4 files changed, 1 insertion(+), 24 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index e68daa35d4..4bfaf559a7 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2195,19 +2195,6 @@ static int vhost_user_migration_done(struct vhost_dev *dev, char* mac_addr) return -ENOTSUP; } -static bool vhost_user_can_merge(struct vhost_dev *dev, - uint64_t start1, uint64_t size1, - uint64_t start2, uint64_t size2) -{ -ram_addr_t offset; -int mfd, rfd; - -(void)vhost_user_get_mr_data(start1, &offset, &mfd); -(void)vhost_user_get_mr_data(start2, &offset, &rfd); - -return mfd == rfd; -} - static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu) { VhostUserMsg msg; @@ -2704,7 +2691,6 @@ const VhostOps user_ops = { .vhost_set_vring_enable = vhost_user_set_vring_enable, .vhost_requires_shm_log = vhost_user_requires_shm_log, .vhost_migration_done = vhost_user_migration_done, -.vhost_backend_can_merge = vhost_user_can_merge, .vhost_net_set_mtu = vhost_user_net_set_mtu, .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback, .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg, diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 542e003101..9ab7bc8718 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1317,7 +1317,6 @@ const VhostOps vdpa_ops = { .vhost_set_config = vhost_vdpa_set_config, .vhost_requires_shm_log = NULL, .vhost_migration_done = NULL, -.vhost_backend_can_merge = NULL, .vhost_net_set_mtu = NULL, .vhost_set_iotlb_callback = NULL, .vhost_send_device_iotlb_msg = NULL, diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index b7fb960fa9..9d8662aa98 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -733,11 +733,7 @@ static void vhost_region_add_section(struct vhost_dev *dev, size_t offset = mrs_gpa - prev_gpa_start; if (prev_host_start + offset == mrs_host && -section->mr == prev_sec->mr && -(!dev->vhost_ops->vhost_backend_can_merge || - dev->vhost_ops->vhost_backend_can_merge(dev, -mrs_host, mrs_size, -prev_host_start, prev_size))) { +section->mr == prev_sec->mr) { uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size); need_add = false; prev_sec->offset_within_address_space = diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h index c5ab49051e..abf1601ba2 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -86,9 +86,6 @@ typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev, typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev); typedef int (*vhost_migration_done_op)(struct vhost_dev *dev, char *mac_addr); -typedef bool (*vhost_backend_can_merge_op)(struct vhost_dev *dev, - uint64_t start1, uint64_t size1, - uint64_t start2, uint64_t size2); typedef int (*vhost_vsock_set_guest_cid_op)(struct vhost_dev *dev, uint64_t guest_cid); typedef int (*vhost_vsock_set_running_op)(struct vhost_dev *dev, int start); @@ -160,7 +157,6 @@ typedef struct VhostOps { vhost_set_vring_enable_op vhost_set_vring_enable; vhost_requires_shm_log_op vhost_requires_shm_log; vhost_migration_done_op vhost_migration_done; -vhost_backend_can_merge_op vhost_backend_can_merge; vhost_vsock_set_guest_cid_op vhost_vsock_set_guest_cid; vhost_vsock_set_running_op vhost_vsock_set_running; vhost_set_iotlb_callback_op vhost_set_iotlb_callback; -- 2.39.1
[PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
Having multiple devices, some filtering memslots and some not filtering memslots, messes up the "used_memslot" accounting. If we'd have a device the filters out less memory sections after a device that filters out more, we'd be in trouble, because our memslot checks stop working reliably. For example, hotplugging a device that filters out less memslots might end up passing the checks based on max vs. used memslots, but can run out of memslots when getting notified about all memory sections. Further, it will be helpful in memory device context in the near future to know that a RAM memory region section will consume a memslot, and be accounted for in the used vs. free memslots, such that we can implement reservation of memslots for memory devices properly. Whether a device filters this out and would theoretically still have a free memslot is then hidden internally, making overall vhost memslot accounting easier. Let's filter the memslots when creating the vhost memory array, accounting all RAM && !ROM memory regions as "used_memslots" even if vhost_user isn't interested in anonymous RAM regions, because it needs an fd. When a device actually filters out regions (which should happen rarely in practice), we might detect a layout change although only filtered regions changed. We won't bother about optimizing that for now. Note: we cannot simply filter out the region and count them as "filtered" to add them to used, because filtered regions could get merged and result in a smaller effective number of memslots. Further, we won't touch the hmp/qmp virtio introspection output. Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections") Cc: Tiwei Bie Signed-off-by: David Hildenbrand --- hw/virtio/vhost.c | 79 +-- 1 file changed, 55 insertions(+), 24 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index eb8c4c378c..b7fb960fa9 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -219,8 +219,13 @@ static void vhost_log_sync_range(struct vhost_dev *dev, int i; /* FIXME: this is N^2 in number of sections */ for (i = 0; i < dev->n_mem_sections; ++i) { -MemoryRegionSection *section = &dev->mem_sections[i]; -vhost_sync_dirty_bitmap(dev, section, first, last); +MemoryRegionSection *mrs = &dev->mem_sections[i]; + +if (dev->vhost_ops->vhost_backend_mem_section_filter && +!dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) { +continue; +} +vhost_sync_dirty_bitmap(dev, mrs, first, last); } } @@ -503,12 +508,6 @@ static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section) return false; } -if (dev->vhost_ops->vhost_backend_mem_section_filter && -!dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) { -trace_vhost_reject_section(mr->name, 2); -return false; -} - trace_vhost_section(mr->name); return true; } else { @@ -525,6 +524,43 @@ static void vhost_begin(MemoryListener *listener) dev->n_tmp_sections = 0; } +static void vhost_realloc_vhost_memory(struct vhost_dev *dev, + unsigned int nregions) +{ +const size_t size = offsetof(struct vhost_memory, regions) + +nregions * sizeof dev->mem->regions[0]; + +dev->mem = g_realloc(dev->mem, size); +dev->mem->nregions = nregions; +} + +static void vhost_rebuild_vhost_memory(struct vhost_dev *dev) +{ +unsigned int nregions = 0; +int i; + +vhost_realloc_vhost_memory(dev, dev->n_mem_sections); +for (i = 0; i < dev->n_mem_sections; i++) { +struct MemoryRegionSection *mrs = dev->mem_sections + i; +struct vhost_memory_region *cur_vmr; + +if (dev->vhost_ops->vhost_backend_mem_section_filter && +!dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) { +continue; +} +cur_vmr = dev->mem->regions + nregions; +nregions++; + +cur_vmr->guest_phys_addr = mrs->offset_within_address_space; +cur_vmr->memory_size = int128_get64(mrs->size); +cur_vmr->userspace_addr = +(uintptr_t)memory_region_get_ram_ptr(mrs->mr) + +mrs->offset_within_region; +cur_vmr->flags_padding = 0; +} +vhost_realloc_vhost_memory(dev, nregions); +} + static void vhost_commit(MemoryListener *listener) { struct vhost_dev *dev = container_of(listener, struct vhost_dev, @@ -532,7 +568,6 @@ static void vhost_commit(MemoryListener *listener) MemoryRegionSection *old_sections; int n_old_sections; uint64_t log_size; -size_t regions_size; int r; int i; bool changed = false; @@ -564,23 +599,19 @@ static void vhost_commit(MemoryListener *listener) goto out; } -/* Rebuild the regions list from the new sections list */ -regions_s
Re: [PATCH] audio/pwaudio.c: Add Pipewire audio backend for QEMU
On Wednesday, February 15, 2023 4:59:47 PM CET Daniel P. Berrangé wrote: > On Wed, Feb 15, 2023 at 05:18:50PM +0400, Marc-André Lureau wrote: > > Hi > > > > On Wed, Feb 15, 2023 at 12:51 PM Dorinda Bassey wrote: > > > > > > This commit adds a new audiodev backend to allow QEMU to use Pipewire as > > > both an audio sink and source. This backend is available on most systems. > > > > > > > Hmm, I would rather have less audio (and ui) backends in QEMU. (for > > audio, if I could introduce and keep only one, that would be > > GStreamer: to remove the others..) > > Even if we take this patch, and don't have a gstreamer impl, > it feels like we've scope for cutting down the backends. > > The 'oss' driver for example ? On Linux that's long obsolete, > with alsa or one of the higher level APIs available. OSS was > also use on freebsd, but IIUC, sndio is better choice there > now too ? Deprecate (and later remove) 'oss' now ? There is indeed little reason to still use OSS nowadays IMO. At least marking OSS as deprecated would not hurt. > The 'sdl' driver is setup in meson.build as our lowest priority > impl, we'll pick any other driver ahead of sdl. Is there any > compelling reason why we must give users the option of 'sdl' > for audio when we have soo many other choices available ? > Even if using SDL for graphics, it seems like we can use any > other backend for audio. Deprecate (and later remove) 'sdl' > for audio ? > > IIUC, pipewire is positioned to replace pulseaudio. So if we > take a pipewire backend, once pipewire is available in enough > distros we could deprecate the pulseaudio backend and eventually > remove it. Maybe the same applies for 'jack' ? AFAIK Pipewire is Linux only, whereas PulseAudio and JACK are available for Windows and macOS, too. And in it's current version (provided QEMU Pipewire patch), I don't see it as a full replacement for the JACK driver yet. > IOW, could we get to > > - Windows: dsound > - MacOS: coreaudio > - (Open|Net|Free)BSD: sndio > - Linux: alsa/pipewire > > ? > > With regards, > Daniel >
Re: [RFC PATCH 13/16] hw/arm/smmuv3: Add CMDs related to stage 2
Hi Mostafa, On 2/5/23 10:44, Mostafa Saleh wrote: > CMD_TLBI_S2_IPA: As S1+S2 is not enabled, for now this can be the > same as CMD_TLBI_NH_VAA. > > CMD_TLBI_S12_VMALL: Added new function to invalidate TLB by VMID. > > Signed-off-by: Mostafa Saleh > --- > hw/arm/smmu-common.c | 16 > hw/arm/smmuv3.c | 25 +++-- > hw/arm/trace-events | 2 ++ > include/hw/arm/smmu-common.h | 1 + > 4 files changed, 42 insertions(+), 2 deletions(-) > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c > index 028a60949a..28089d94a6 100644 > --- a/hw/arm/smmu-common.c > +++ b/hw/arm/smmu-common.c > @@ -133,6 +133,16 @@ static gboolean smmu_hash_remove_by_asid(gpointer key, > gpointer value, > > return SMMU_IOTLB_ASID(*iotlb_key) == asid; > } > + > +static gboolean smmu_hash_remove_by_vmid(gpointer key, gpointer value, > + gpointer user_data) > +{ > +uint16_t vmid = *(uint16_t *)user_data; > +SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key; > + > +return SMMU_IOTLB_VMID(*iotlb_key) == vmid; > +} > + > static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer > value, >gpointer user_data) > { > @@ -185,6 +195,12 @@ void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid) > g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid); > } > > +inline void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid) > +{ > +trace_smmu_iotlb_inv_vmid(vmid); > +g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid, &vmid); > +} > + > /* VMSAv8-64 Translation */ > > /** > diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c > index 8b070f6bb5..2b563a5b1b 100644 > --- a/hw/arm/smmuv3.c > +++ b/hw/arm/smmuv3.c > @@ -1174,14 +1174,35 @@ static int smmuv3_cmdq_consume(SMMUv3State *s) > case SMMU_CMD_TLBI_NH_VA: > smmuv3_s1_range_inval(bs, &cmd); > break; > +case SMMU_CMD_TLBI_S12_VMALL: > +uint16_t vmid = CMD_VMID(&cmd); > + > +if (!STAGE2_SUPPORTED(s->features)) { if you add such checks for S2, may you should consider adding similar ones for existing S1? > +cmd_error = SMMU_CERROR_ILL; > +break; > +} > + > +trace_smmuv3_cmdq_tlbi_s12_vmid(vmid); > +smmu_inv_notifiers_all(&s->smmu_state); > +smmu_iotlb_inv_vmid(bs, vmid); > +break; > +case SMMU_CMD_TLBI_S2_IPA: > +if (!STAGE2_SUPPORTED(s->features)) { > +cmd_error = SMMU_CERROR_ILL; > +break; > +} > +/* > + * As currently only either s1 or s2 are supported > + * we can reuse same function for s2. > + */ > +smmuv3_s1_range_inval(bs, &cmd); Shouldn't we rename the function then? Eric > +break; > case SMMU_CMD_TLBI_EL3_ALL: > case SMMU_CMD_TLBI_EL3_VA: > case SMMU_CMD_TLBI_EL2_ALL: > case SMMU_CMD_TLBI_EL2_ASID: > case SMMU_CMD_TLBI_EL2_VA: > case SMMU_CMD_TLBI_EL2_VAA: > -case SMMU_CMD_TLBI_S12_VMALL: > -case SMMU_CMD_TLBI_S2_IPA: > case SMMU_CMD_ATC_INV: > case SMMU_CMD_PRI_RESP: > case SMMU_CMD_RESUME: > diff --git a/hw/arm/trace-events b/hw/arm/trace-events > index 2dee296c8f..61e2ffade5 100644 > --- a/hw/arm/trace-events > +++ b/hw/arm/trace-events > @@ -12,6 +12,7 @@ smmu_ptw_block_pte(int stage, int level, uint64_t baseaddr, > uint64_t pteaddr, ui > smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) > "baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64 > smmu_iotlb_inv_all(void) "IOTLB invalidate all" > smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d" > +smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d" > smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d > addr=0x%"PRIx64 > smmu_inv_notifiers_mr(const char *name) "iommu mr=%s" > smmu_iotlb_lookup_hit(uint16_t asid, uint64_t addr, uint32_t hit, uint32_t > miss, uint32_t p) "IOTLB cache HIT asid=%d addr=0x%"PRIx64" hit=%d miss=%d > hit rate=%d" > @@ -48,6 +49,7 @@ smmuv3_config_cache_miss(uint32_t sid, uint32_t hits, > uint32_t misses, uint32_t > smmuv3_s1_range_inval(int vmid, int asid, uint64_t addr, uint8_t tg, > uint64_t num_pages, uint8_t ttl, bool leaf) "vmid=%d asid=%d addr=0x%"PRIx64" > tg=%d num_pages=0x%"PRIx64" ttl=%d leaf=%d" > smmuv3_cmdq_tlbi_nh(void) "" > smmuv3_cmdq_tlbi_nh_asid(uint16_t asid) "asid=%d" > +smmuv3_cmdq_tlbi_s12_vmid(uint16_t vmid) "vmid=%d" > smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x" > smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu > mr=%s" > smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu > mr=%s" > diff
Re: [PATCH] qemu: make version available in coredump
On Thu, Feb 16, 2023 at 02:30:16PM +0300, Vladimir Sementsov-Ogievskiy wrote: > On 16.02.23 12:44, Daniel P. Berrangé wrote: > > On Wed, Feb 15, 2023 at 05:05:47PM -0500, Stefan Hajnoczi wrote: > > > On Tue, 7 Jun 2022 at 16:33, Vladimir Sementsov-Ogievskiy > > > wrote: > > > > > > > > Add a variable with QEMU_FULL_VERSION definition. Then the content of > > > > the variable is easily searchable: > > > > > > > > strings /path/to/core | grep QEMU_FULL_VERSION > > > > > > > > 'volatile' keyword is used to avoid removing the variable by compiler as > > > > unused. > > > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy > > > > --- > > > > > > > > Hi all! > > > > > > > > Probably, I just don't know the correct way to get version from core > > > > file. If so, please teach me :) > > > > > > I've never hit this issue because bug reports usually include the QEMU > > > distro package version. Keeping the version string in the core file > > > seems reasonable (unless there is already another way to do this). > > > > > > Something I'm curious about: is the coredump guaranteed to contain > > > static const variables? I wondered if they might be located in the > > > .rodata ELF section and excluded from the coredump because they are > > > referenced in the NT_FILE mmap note instead. Maybe volatile prevents > > > this? > > > > In Fedora / RHEL based systems (and some other distros too IIUC) for > > many years, all binaries have included a "build-id" ELF note which > > uniquely identifies the package build. > > > > Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x3c0: > >Owner Data size Type > >GNU 20 GNU_BUILD_ID > > Build ID: e3143405b7f653a0a65b3295df760fdf2c09ba79 > > > > This can be used to query what RPM it came from (assuming the RPM > > is still in your repos) > > > > dnf repoquery --whatprovides debuginfo(build-id) = ...hash... > > > > this makes it into the coredump files and is what current distro > > tooling uses to find the binary (and libraries). > > > > There are some downsides/limitations with this though, so in > > Fedora 36 a new impl was added alongside which provides full > > package info in json > > > > Note section [ 5] '.note.package' of 136 bytes at offset 0x404: > >Owner Data size Type > >FDO 120 FDO_PACKAGING_METADATA > > Packaging Metadata: > > {"type":"rpm","name":"qemu","version":"7.0.0-13.fc37","architecture":"x86_64","osCpe":"cpe:/o:fedoraproject:fedora:37"} > > Looks very good > > > > > This format is supported by systemd core dump tools > > > >https://systemd.io/ELF_PACKAGE_METADATA/ > > > > I believe it has been proposed (and possibly implemented?) for > > Debian too. > > > > This is a long winded way of asking, do we really need a QEMU specific > > solution here ? Especially one that only tells us a QEMU verison, and > > nothing about the many libraries QEMU links to which affect its > > operational behaviour. > > > Generic solution is of course better. > > Hmm. I'm on Ubuntu 22.04. > > readelf -n /usr/bin/qemu-system-x86_64 > > Displaying notes found in: .note.gnu.property > OwnerData sizeDescription > GNU 0x0020 NT_GNU_PROPERTY_TYPE_0 > Properties: x86 feature: IBT, SHSTK > x86 ISA needed: x86-64-baseline > > Displaying notes found in: .note.gnu.build-id > OwnerData sizeDescription > GNU 0x0014 NT_GNU_BUILD_ID (unique build ID > bitstring) > Build ID: 4298cd7c2623c58e1cd71668d9d48508bb7f8d52 > > Displaying notes found in: .note.ABI-tag > OwnerData sizeDescription > GNU 0x0010 NT_GNU_ABI_TAG (ABI version tag) > OS: Linux, ABI: 3.2.0 > > > OK. But I can't find this build-id in core file.. > > readelf -n /tmp/cores/core.qemu-system-x86.20351.vsementsov-win.1676544081 | > grep -i 'build' gets nothing > > strings /tmp/cores//core.qemu-system-x86.20351.vsementsov-win.1676544081 | > grep 4298cd7c2623c58e nothing as well I don't understand why that's not visible directly, I guess it must be encoded in some binary format instead, because at least tools like eu-unstrip can extract it. eg " # eu-unstrip -n --core a 0x558ff8145000+0xd3f000 e3143405b7f653a0a65b3295df760fdf2c09ba79@0x558ff81453d0 . - /usr/bin/qemu-system-x86_64 0x7fffdf36e000+0x1000 9ff92e165010e0806172add635849ec55533b287@0x7fffdf36e554 . - linux-vdso.so.1 0x7f00d2e35000+0x6028 e62598a2d2be298ca20184413edea75fc5a3f1d7@0x7f00d2e352f8 /usr/bin/../lib64/qemu/accel-tcg-x86_64.so - accel-tcg-x86_64.so 0x7f00d02bd000+0x432b0 05ba68b0c1f03dd879a78a4a8b75713d7134bdbc@0x7f00d02bd2f8 /usr/lib64/gvfs/libgvfscommon.so - libgvfscommon.so 0x7f00d0301000+0x34300 7c9fd184be4d2c3593d4901feca9fd59c4981d11@0x7f00d03012f8 /usr/lib64/gio/modules/libgvfsdbus.so - libgvfsdbus.so 0x7f00d2e3c000+0xf0e0 0
Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote: > Having multiple devices, some filtering memslots and some not filtering > memslots, messes up the "used_memslot" accounting. If we'd have a device > the filters out less memory sections after a device that filters out more, > we'd be in trouble, because our memslot checks stop working reliably. > For example, hotplugging a device that filters out less memslots might end > up passing the checks based on max vs. used memslots, but can run out of > memslots when getting notified about all memory sections. > > Further, it will be helpful in memory device context in the near future > to know that a RAM memory region section will consume a memslot, and be > accounted for in the used vs. free memslots, such that we can implement > reservation of memslots for memory devices properly. Whether a device > filters this out and would theoretically still have a free memslot is > then hidden internally, making overall vhost memslot accounting easier. > > Let's filter the memslots when creating the vhost memory array, > accounting all RAM && !ROM memory regions as "used_memslots" even if > vhost_user isn't interested in anonymous RAM regions, because it needs > an fd. > > When a device actually filters out regions (which should happen rarely > in practice), we might detect a layout change although only filtered > regions changed. We won't bother about optimizing that for now. That caused trouble in the past when using VGA because it is playing with mappings in weird ways. I think we have to optimize it, sorry. > Note: we cannot simply filter out the region and count them as > "filtered" to add them to used, because filtered regions could get > merged and result in a smaller effective number of memslots. Further, > we won't touch the hmp/qmp virtio introspection output. > > Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections") > Cc: Tiwei Bie > Signed-off-by: David Hildenbrand I didn't review this yet but maybe you can answer: will this create more slots for the backend? Because some backends are limited in # of slots and breaking them is not a good idea. Thanks! > --- > hw/virtio/vhost.c | 79 +-- > 1 file changed, 55 insertions(+), 24 deletions(-) > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index eb8c4c378c..b7fb960fa9 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -219,8 +219,13 @@ static void vhost_log_sync_range(struct vhost_dev *dev, > int i; > /* FIXME: this is N^2 in number of sections */ > for (i = 0; i < dev->n_mem_sections; ++i) { > -MemoryRegionSection *section = &dev->mem_sections[i]; > -vhost_sync_dirty_bitmap(dev, section, first, last); > +MemoryRegionSection *mrs = &dev->mem_sections[i]; > + > +if (dev->vhost_ops->vhost_backend_mem_section_filter && > +!dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) { > +continue; > +} > +vhost_sync_dirty_bitmap(dev, mrs, first, last); > } > } > > @@ -503,12 +508,6 @@ static bool vhost_section(struct vhost_dev *dev, > MemoryRegionSection *section) > return false; > } > > -if (dev->vhost_ops->vhost_backend_mem_section_filter && > -!dev->vhost_ops->vhost_backend_mem_section_filter(dev, section)) > { > -trace_vhost_reject_section(mr->name, 2); > -return false; > -} > - > trace_vhost_section(mr->name); > return true; > } else { > @@ -525,6 +524,43 @@ static void vhost_begin(MemoryListener *listener) > dev->n_tmp_sections = 0; > } > > +static void vhost_realloc_vhost_memory(struct vhost_dev *dev, > + unsigned int nregions) > +{ > +const size_t size = offsetof(struct vhost_memory, regions) + > +nregions * sizeof dev->mem->regions[0]; > + > +dev->mem = g_realloc(dev->mem, size); > +dev->mem->nregions = nregions; > +} > + > +static void vhost_rebuild_vhost_memory(struct vhost_dev *dev) > +{ > +unsigned int nregions = 0; > +int i; > + > +vhost_realloc_vhost_memory(dev, dev->n_mem_sections); > +for (i = 0; i < dev->n_mem_sections; i++) { > +struct MemoryRegionSection *mrs = dev->mem_sections + i; > +struct vhost_memory_region *cur_vmr; > + > +if (dev->vhost_ops->vhost_backend_mem_section_filter && > +!dev->vhost_ops->vhost_backend_mem_section_filter(dev, mrs)) { > +continue; > +} > +cur_vmr = dev->mem->regions + nregions; > +nregions++; > + > +cur_vmr->guest_phys_addr = mrs->offset_within_address_space; > +cur_vmr->memory_size = int128_get64(mrs->size); > +cur_vmr->userspace_addr = > +(uintptr_t)memory_region_get_ram_ptr(mrs->mr) + > +mrs->offset_within_region; > +cur_vmr->flags_p
Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
On 16.02.23 13:04, Michael S. Tsirkin wrote: On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote: Having multiple devices, some filtering memslots and some not filtering memslots, messes up the "used_memslot" accounting. If we'd have a device the filters out less memory sections after a device that filters out more, we'd be in trouble, because our memslot checks stop working reliably. For example, hotplugging a device that filters out less memslots might end up passing the checks based on max vs. used memslots, but can run out of memslots when getting notified about all memory sections. Further, it will be helpful in memory device context in the near future to know that a RAM memory region section will consume a memslot, and be accounted for in the used vs. free memslots, such that we can implement reservation of memslots for memory devices properly. Whether a device filters this out and would theoretically still have a free memslot is then hidden internally, making overall vhost memslot accounting easier. Let's filter the memslots when creating the vhost memory array, accounting all RAM && !ROM memory regions as "used_memslots" even if vhost_user isn't interested in anonymous RAM regions, because it needs an fd. When a device actually filters out regions (which should happen rarely in practice), we might detect a layout change although only filtered regions changed. We won't bother about optimizing that for now. That caused trouble in the past when using VGA because it is playing with mappings in weird ways. I think we have to optimize it, sorry. We still filter them out, just later. Note: we cannot simply filter out the region and count them as "filtered" to add them to used, because filtered regions could get merged and result in a smaller effective number of memslots. Further, we won't touch the hmp/qmp virtio introspection output. Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections") Cc: Tiwei Bie Signed-off-by: David Hildenbrand I didn't review this yet but maybe you can answer: will this create more slots for the backend? Because some backends are limited in # of slots and breaking them is not a good idea. It restores the handling we had before 988a27754bbb. RAM without an fd should be rare for vhost-user setups (where we actually filter) I assume? -- Thanks, David / dhildenb
Re: [PATCH v1 1/2] vhost: Defer filtering memory sections until building the vhost memory structure
On 16.02.23 13:10, David Hildenbrand wrote: On 16.02.23 13:04, Michael S. Tsirkin wrote: On Thu, Feb 16, 2023 at 12:47:51PM +0100, David Hildenbrand wrote: Having multiple devices, some filtering memslots and some not filtering memslots, messes up the "used_memslot" accounting. If we'd have a device the filters out less memory sections after a device that filters out more, we'd be in trouble, because our memslot checks stop working reliably. For example, hotplugging a device that filters out less memslots might end up passing the checks based on max vs. used memslots, but can run out of memslots when getting notified about all memory sections. Further, it will be helpful in memory device context in the near future to know that a RAM memory region section will consume a memslot, and be accounted for in the used vs. free memslots, such that we can implement reservation of memslots for memory devices properly. Whether a device filters this out and would theoretically still have a free memslot is then hidden internally, making overall vhost memslot accounting easier. Let's filter the memslots when creating the vhost memory array, accounting all RAM && !ROM memory regions as "used_memslots" even if vhost_user isn't interested in anonymous RAM regions, because it needs an fd. When a device actually filters out regions (which should happen rarely in practice), we might detect a layout change although only filtered regions changed. We won't bother about optimizing that for now. That caused trouble in the past when using VGA because it is playing with mappings in weird ways. I think we have to optimize it, sorry. We still filter them out, just later. To be precise, we still filter out all DIRTY_MEMORY_VGA as we used to. Only the device-specific filtering (vhost-user) is modified. -- Thanks, David / dhildenb
Re: [RFC PATCH v2 1/2] qapi, i386: Move kernel-hashes to SevCommonProperties
Dov Murik writes: > Hello Markus, > > On 16/02/2023 11:24, Markus Armbruster wrote: >> Dov Murik writes: >> >>> In order to enable kernel-hashes for SNP, pull it from >>> SevGuestProperties to its parent SevCommonProperties so >>> it will be available for both SEV and SNP. >> >> Missing >> >> Signed-off-by: Dov Murik >> > > Oops, thanks. I'll fix. > >> Patch does not apply for me. >> > > This patch series is based on AMD's upmv10-snpv3 tree: > > https://github.com/mdroth/qemu/tree/upmv10-snpv3 > > Have you tried to apply it on top of that tree? Missed that part in the cover letter, oops :) Recommend to also express it like Based-on: so machines like Patchew can apply it correctly. However, that upmv10-snpv3 branch is a bit over 1700 commits behind upstream master. I'm afraid you guys need to rebase :)