Re: [PATCH] acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()
On Thu, 2022-07-07 at 11:21 +0200, Igor Mammedov wrote: > On Mon, 4 Jul 2022 16:58:52 +0800 > Robert Hoo wrote: > > > Signed-off-by: Robert Hoo > > Reviewed-by: Jingqi Liu > > Reviewed-by: Igor Mammedov Thanks for review Igor. BTW, during the unit test, I met some bios-table test error, https://gitlab.com/qemu-project/qemu/-/issues/1098, perhaps related to your patches in June. https://patchwork.kernel.org/project/qemu-devel/cover/20220608135340.3304695-1-imamm...@redhat.com/ > > > ---
Re: [RFC PATCH 01/12] vhost: Get vring base from vq, not svq
On Mon, Jul 18, 2022 at 7:48 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > The SVQ vring used idx usually match with the guest visible one, as long > > as all the guest buffers (GPA) maps to exactly one buffer within qemu's > > VA. However, as we can see in virtqueue_map_desc, a single guest buffer > > could map to many buffers in SVQ vring. > > > > The solution is to stop using the device's used idx and check for the > > last avail idx. Since we cannot report in-flight descriptors with vdpa, > > let's rewind all of them. > > > > Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 8 +--- > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 795ed5a049..18820498b3 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1194,11 +1194,10 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > struct vhost_vring_state *ring) > > { > > struct vhost_vdpa *v = dev->opaque; > > -int vdpa_idx = ring->index - dev->vq_index; > > int ret; > > > > if (v->shadow_vqs_enabled) { > > -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, > > vdpa_idx); > > +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > > > /* > > * Setting base as last used idx, so destination will see as > > available > > @@ -1208,7 +1207,10 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > * TODO: This is ok for networking, but other kinds of devices > > might > > * have problems with these retransmissions. > > */ > > -ring->num = svq->last_used_idx; > > +while (virtqueue_rewind(vq, 1)) { > > +continue; > > +} > > Will this leak mapped VirtQueueElement? > vhost_get_vring_base op is called only on the device's teardown path, so they should be free by vhost_svq_stop. 
However, you have a point that maybe vhost_get_vring_base should not trust that cleanup, even for -stable. So I think it would be better to squash this patch and the next one into a single fix. The squashed patch would be doing two things at once, though: one is using the right state (as vring_base), and the other is no longer reverting in-flight descriptors. Thanks! > Thanks > > > +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, > > ring->index); > > return 0; > > } > > > > -- > > 2.31.1 > > >
Re: [RFC PATCH 02/12] vhost: Move SVQ queue rewind to the destination
On Mon, Jul 18, 2022 at 7:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > Migration with SVQ already migrate the inflight descriptors, > > How is this done? > Migration between vhost-vdpa with x-svq=on maintain the guest's visible state in VirtQueues, and they already have all the protocols to migrate them. We cannot migrate if we cannot restore the state / inflight in the destination, but since we need x-svq=on at this point, we can restore both of them. Extra care will be needed when ASID is introduced. > > so the > > destination can perform the work. > > > > This makes easier to migrate between backends or to recover them in > > vhost devices that support set in flight descriptors. > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 24 +++- > > 1 file changed, 11 insertions(+), 13 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 18820498b3..4458c8d23e 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1178,7 +1178,18 @@ static int vhost_vdpa_set_vring_base(struct > > vhost_dev *dev, > > struct vhost_vring_state *ring) > > { > > struct vhost_vdpa *v = dev->opaque; > > +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > > > +/* > > + * vhost-vdpa devices does not support in-flight requests. Set all of > > them > > + * as available. > > + * > > + * TODO: This is ok for networking, but other kinds of devices might > > + * have problems with these retransmissions. > > + */ > > +while (virtqueue_rewind(vq, 1)) { > > +continue; > > +} > > if (v->shadow_vqs_enabled) { > > /* > > * Device vring base was set at device start. 
SVQ base is handled > > by > > @@ -1197,19 +1208,6 @@ static int vhost_vdpa_get_vring_base(struct > > vhost_dev *dev, > > int ret; > > > > if (v->shadow_vqs_enabled) { > > -VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); > > - > > -/* > > - * Setting base as last used idx, so destination will see as > > available > > - * all the entries that the device did not use, including the > > in-flight > > - * processing ones. > > - * > > - * TODO: This is ok for networking, but other kinds of devices > > might > > - * have problems with these retransmissions. > > - */ > > -while (virtqueue_rewind(vq, 1)) { > > -continue; > > -} > > I think it's not a good idea to partially revert the code that was > just introduced in the previous patch (unless it can be backported to > -stable independently). > > We probably need to squash the changes. > You definitely have a point. Maybe it's better to remove the "Fixes" tag? -stable version cannot enable SVQ anyway (it doesn't have the parameter). I'll send a v2 when we have all the comments sorted out. Maybe a better alternative is to delay the x-svq parameter to the top of both series if both of them go in? Thanks! > Thanks > > > ring->num = virtio_queue_get_last_avail_idx(dev->vdev, > > ring->index); > > return 0; > > } > > -- > > 2.31.1 > > >
Re: [RFC PATCH 03/12] vdpa: Small rename of error labels
On Mon, Jul 18, 2022 at 7:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > So later patches are cleaner > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-vdpa.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 4458c8d23e..906c365036 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1039,7 +1039,7 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev > > *dev) > > int r; > > bool ok = vhost_vdpa_svq_setup(dev, svq, i, &err); > > if (unlikely(!ok)) { > > -goto err; > > +goto err_svq_setup; > > } > > > > vhost_svq_start(svq, dev->vdev, vq); > > @@ -1064,8 +1064,7 @@ err_set_addr: > > err_map: > > vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i)); > > > > -err: > > -error_reportf_err(err, "Cannot setup SVQ %u: ", i); > > This does not look like a simple rename. > That's right. I'll drop his patch from the series, it's not really useful and only adds noise. Thanks! > Thanks > > > +err_svq_setup: > > for (unsigned j = 0; j < i; ++j) { > > VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, j); > > vhost_vdpa_svq_unmap_rings(dev, svq); > > -- > > 2.31.1 > > >
Re: Building tools on unsupported cpu/arch
On 30/04/2022 16.11, Michael Tokarev wrote:

Hello!

Previously, it was possible to build the qemu tools (such as qemu-img or qemu-ga) on an unsupported cpu/architecture, in a hackish way, by specifying --enable-tcg-interpreter on the ./configure line. Today (with 7.0), it does not work anymore, with the following error during configure:

  common-user/meson.build:1:0: ERROR: Include dir host/unknown does not exist.

This is with --disable-system --disable-linux-user --disable-user. And without --enable-tcg-interpreter, it gives:

  meson.build:390:6: ERROR: Problem encountered: Unsupported CPU m68k, try --enable-tcg-interpreter

What's the way to build tools on an unsupported architecture these days?

Hi Michael,

I think all the required patches have now been merged - could you please check whether building the tools works for you again on unsupported CPU architectures?

Thanks,
 Thomas
RE: [PATCH] i386: Disable BTS and PEBS
>-Original Message- >From: Like Xu >Sent: Monday, July 18, 2022 11:57 AM >To: Duan, Zhenzhong >Cc: pbonz...@redhat.com; mtosa...@redhat.com; Christopherson,, Sean >; Ma, XiangfeiX ; qemu- >de...@nongnu.org >Subject: Re: [PATCH] i386: Disable BTS and PEBS > >On 18/7/2022 11:22 am, Zhenzhong Duan wrote: >> Since below KVM commit, KVM hided BTS as it's not supported yet. >> b9181c8ef356 ("KVM: x86/pmu: Avoid exposing Intel BTS feature") >> >> After below KVM commit, it gave control of MSR_IA32_MISC_ENABLES to >userspace. >> 9fc222967a39 ("KVM: x86: Give host userspace full control of >> MSR_IA32_MISC_ENABLES") >> >> So qemu takes the responsibility to hide BTS. >> Without fix, we get below warning in guest kernel: >> >> [] unchecked MSR access error: WRMSR to 0x1d9 (tried to write >> 0x01c0) at rIP: 0xaa070644 >(native_write_msr+0x4/0x20) [] Call Trace: >> [] >> [] intel_pmu_enable_bts+0x5d/0x70 >> [] bts_event_add+0x77/0x90 >> [] event_sched_in.isra.135+0x99/0x1e0 >> >> Tested-by: Xiangfei Ma >> Signed-off-by: Zhenzhong Duan >> --- >> target/i386/cpu.h | 6 -- >> target/i386/kvm/kvm.c | 4 >> 2 files changed, 8 insertions(+), 2 deletions(-) >> >> diff --git a/target/i386/cpu.h b/target/i386/cpu.h index >> 82004b65b944..8a83d0995c66 100644 >> --- a/target/i386/cpu.h >> +++ b/target/i386/cpu.h >> @@ -434,8 +434,10 @@ typedef enum X86Seg { >> >> #define MSR_IA32_MISC_ENABLE0x1a0 >> /* Indicates good rep/movs microcode on some processors: */ >> -#define MSR_IA32_MISC_ENABLE_DEFAULT1 >> -#define MSR_IA32_MISC_ENABLE_MWAIT (1ULL << 18) >> +#define MSR_IA32_MISC_ENABLE_DEFAULT 1 >> +#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL (1ULL << 11) #define >> +MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL (1ULL << 12) >> +#define MSR_IA32_MISC_ENABLE_MWAIT(1ULL << 18) >> >> #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg)) >> #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1) >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index >> f148a6d52fa4..002e0520dd76 100644 >> --- 
a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -2180,6 +2180,10 @@ void kvm_arch_reset_vcpu(X86CPU *cpu) >> { >> CPUX86State *env = &cpu->env; >> >> +/* Disable BTS feature which is unsupported on KVM */ >> +env->msr_ia32_misc_enable |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; >> +env->msr_ia32_misc_enable |= >MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; > >Would it be more readable to group msr_ia32_misc_enable code into this >function: > > static void x86_cpu_reset(DeviceState *dev) OK, will do. Initially I thought BTS/PEBS stuffs will only be implemented in KVM, so thought putting it in kvm_arch_reset_vcpu() may be better. > >and, why disable PEBS (we need it at least for "-cpu host,migratable=no") ? Will fix. I see in below commit in KVM side, PEBS is initialized as unavailable together with BTS. 9fc222967a39 ("KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES"). Then I thought PEBS is unsupported too. > >Also, the behavior of MISC_ENABLE_EMON is also inconsistent with >"pmu=off”. Will init EMON bit based on pmu setting. Thanks! Zhenzhong
Re: [PATCH v4 00/12] Improve reliability of VM tests
On 12/07/2022 20.34, John Snow wrote:
> On Fri, Jul 8, 2022 at 11:35 AM John Snow wrote:
> > Note: patches 10-12 are included for testing simplicity, they shouldn't
> > be merged. They will be included in a forthcoming block PR. Patches 1-9
> > are fully reviewed.
>
> Whose tree should this go in?

I can take them - unless Alex beats me with his next testing pull request ;-)

Queued to my testing-next now:
https://gitlab.com/thuth/qemu/-/commits/testing-next

 Thomas
Re: [PATCH v4 00/12] Improve reliability of VM tests
On 08/07/2022 17.34, John Snow wrote:
> Note: patches 10-12 are included for testing simplicity, they shouldn't
> be merged. They will be included in a forthcoming block PR.
>
> V4:
>  - Addressed concern by Marc-Andre in patch 01.
>  - Squashed Ubuntu patches (rth)
>
> This patch series attempts to improve the reliability of several of the
> VM test targets. In particular, both CentOS 8 tests are non-functional
> because CentOS 8 was EOL at the beginning of this calendar year, with
> repositories and mirrors going offline. I also remove the ubuntu.i386
> test because we no longer support Ubuntu 18.04, nor do we have explicit
> need of an i386 build test.
>
> After this series, I am able to successfully run every VM target on an
> x86_64 host, except:
>
> - ubuntu.aarch64: Hangs often during testing, see below.
> - centos.aarch64: Hangs often during testing, see below.
> - haiku.x86_64: Build failures not addressed by this series, see
>   https://lists.gnu.org/archive/html/qemu-devel/2022-06/msg02103.html
>
> The unit tests that I see fail most often under aarch64 are:
>
> - virtio-net-failover: Seems to like to hang on openbsd
> - migration-test: Tends to hang under aarch64 tcg
>
> Future work (next version? next series?):
>
> - Try to get centos.aarch64 working reliably under TCG
> - Upgrade ubuntu.aarch64 to 20.04 after fixing centos.aarch64
> - Fix the Haiku build test, if possible.
> - Ensure I can reliably run and pass "make vm-build-all".
>   (Remove VMs from this recipe if necessary.)

Not sure whether it's related to your patches, but when testing these I just got an error while running 'vm-build-openbsd':

  VM-BUILD openbsd
fatal: not a valid object name:
failed to archive qemu
Failed to prepare guest environment
Traceback (most recent call last):
  File "/home/thuth/devel/qemu/tests/vm/basevm.py", line 641, in main
    vm.add_source_dir(args.build_qemu)
  File "/home/thuth/devel/qemu/tests/vm/basevm.py", line 277, in add_source_dir
    stdout=self._stdout, stderr=self._stderr)
  File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./scripts/archive-source.sh', '/home/thuth/tmp/qemu-build/vm-test-rm_z92hq.tmp/data-44e42.tar']' returned non-zero exit status 1.

The error did not occur again when running the command again, though.

 Thomas
[PATCH] x86: cpu: Error out if memory exceeds addressable range
Currently it is possible to start a guest with memory that is beyond the
addressable range of the CPU, and QEMU does not even warn about it. The
default phys_bits is 40, which can address 1 TiB, yet QEMU allows
starting a guest with more than 1 TiB of memory. Prevent this by
erroring out in such a scenario.

Reported-by: Shaju Abraham
Signed-off-by: Nikunj A Dadhania
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6a57ef13af..1afbdbac7d 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6376,6 +6376,7 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
+    MachineState *machine = MACHINE(qdev_get_machine());
     CPUState *cs = CPU(dev);
     X86CPU *cpu = X86_CPU(dev);
     X86CPUClass *xcc = X86_CPU_GET_CLASS(dev);
@@ -6541,6 +6542,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
 
+    if (BIT_ULL(cpu->phys_bits) < machine->maxram_size) {
+        error_setg(&local_err, "cannot setup guest memory: "
+                   "%s memory(%lu MiB) exceeds addressable limit(%llu MiB)",
+                   machine->maxram_size == machine->ram_size ? "" : "max",
+                   machine->maxram_size / MiB,
+                   BIT_ULL(cpu->phys_bits) / MiB);
+        goto out;
+    }
+
     /* Cache information initialization */
     if (!cpu->legacy_cache) {
         if (!xcc->model || !xcc->model->cpudef->cache_info) {
--
2.32.0
Re: [PATCH 1/4] Modifying ‘migrate’ qmp command to add multi-FD socket on particular source and destination pair
Het Gala writes: > i) Modified the format of the qemu monitor command : 'migrate' by adding a > list, >each element in the list consists of multi-FD connection parameters: source >and destination uris and of the number of multi-fd channels between each > pair. > > ii) Information of all multi-FD connection parameters’ list, length of the > list > and total number of multi-fd channels for all the connections together is > stored in ‘OutgoingArgs’ struct. > > Suggested-by: Manish Mishra > Signed-off-by: Het Gala > --- > include/qapi/util.h | 9 > migration/migration.c | 47 ++ > migration/socket.c| 53 --- > migration/socket.h| 17 +- > monitor/hmp-cmds.c| 22 -- > qapi/migration.json | 43 +++ > 6 files changed, 170 insertions(+), 21 deletions(-) > > diff --git a/include/qapi/util.h b/include/qapi/util.h > index 81a2b13a33..3041feb3d9 100644 > --- a/include/qapi/util.h > +++ b/include/qapi/util.h > @@ -56,4 +56,13 @@ int parse_qapi_name(const char *name, bool complete); > (tail) = &(*(tail))->next; \ > } while (0) > > +#define QAPI_LIST_LENGTH(list) ({ \ > +int _len = 0; \ > +typeof(list) _elem; \ > +for (_elem = list; _elem != NULL; _elem = _elem->next) { \ > +_len++; \ > +} \ > +_len; \ > +}) > + Unless there is a compelling reason for open-coding this, make it a (non-inline) function. 
> #endif > diff --git a/migration/migration.c b/migration/migration.c > index 31739b2af9..c408175aeb 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -2328,13 +2328,14 @@ static bool migrate_prepare(MigrationState *s, bool > blk, bool blk_inc, > return true; > } > > -void qmp_migrate(const char *uri, bool has_blk, bool blk, > +void qmp_migrate(const char *uri, bool has_multi_fd_uri_list, > + MigrateUriParameterList *cap, bool has_blk, bool blk, > bool has_inc, bool inc, bool has_detach, bool detach, > bool has_resume, bool resume, Error **errp) > { > Error *local_err = NULL; > MigrationState *s = migrate_get_current(); > -const char *p = NULL; > +const char *dst_ptr = NULL; > > if (!migrate_prepare(s, has_blk && blk, has_inc && inc, > has_resume && resume, errp)) { > @@ -2348,20 +2349,46 @@ void qmp_migrate(const char *uri, bool has_blk, bool > blk, > } > } > > +/* > + * In case of Multi-FD migration parameters, if uri is provided, > + * supports only tcp network protocol. > + */ > +if (has_multi_fd_uri_list) { > +int length = QAPI_LIST_LENGTH(cap); > +init_multifd_array(length); > +for (int i = 0; i < length; i++) { > +const char *p1 = NULL, *p2 = NULL; > +const char *multifd_dst_uri = cap->value->destination_uri; > +const char *multifd_src_uri = cap->value->source_uri; > +uint8_t multifd_channels = cap->value->multifd_channels; > +if (!strstart(multifd_dst_uri, "tcp:", &p1) || > +!strstart(multifd_src_uri, "tcp:", &p2)) { > +error_setg(errp, "multi-fd destination and multi-fd source " > +"uri, both should be present and follows tcp protocol only"); > +break; > +} else { > +store_multifd_migration_params(p1 ? p1 : multifd_dst_uri, > +p2 ? 
p2 : multifd_src_uri, > +multifd_channels, i, &local_err); > +} > +cap = cap->next; > +} > +} > + > migrate_protocol_allow_multi_channels(false); > -if (strstart(uri, "tcp:", &p) || > +if (strstart(uri, "tcp:", &dst_ptr) || > strstart(uri, "unix:", NULL) || > strstart(uri, "vsock:", NULL)) { > migrate_protocol_allow_multi_channels(true); > -socket_start_outgoing_migration(s, p ? p : uri, &local_err); > +socket_start_outgoing_migration(s, dst_ptr ? dst_ptr : uri, > &local_err); > #ifdef CONFIG_RDMA > -} else if (strstart(uri, "rdma:", &p)) { > -rdma_start_outgoing_migration(s, p, &local_err); > +} else if (strstart(uri, "rdma:", &dst_ptr)) { > +rdma_start_outgoing_migration(s, dst_ptr, &local_err); > #endif > -} else if (strstart(uri, "exec:", &p)) { > -exec_start_outgoing_migration(s, p, &local_err); > -} else if (strstart(uri, "fd:", &p)) { > -fd_start_outgoing_migration(s, p, &local_err); > +} else if (strstart(uri, "exec:", &dst_ptr)) { > +exec_start_outgoing_migration(s, dst_ptr, &local_err); > +} else if (strstart(uri, "fd:", &dst_ptr)) { > +fd_start_outgoing_migration(s, dst_ptr, &local_err); > } else { > if (!(has_resume && resume)) { > yank_
Re: [RFC PATCH 8/8] block: remove all unused ->can_set_aio_ctx and ->set_aio_ctx callbacks
Am 15/07/2022 um 16:34 schrieb Hanna Reitz: > On 12.07.22 23:19, Emanuele Giuseppe Esposito wrote: >> Together with all _can_set_ and _set_ APIs, as they are not needed >> anymore. >> >> Signed-off-by: Emanuele Giuseppe Esposito >> --- >> block.c | 196 - >> block/block-backend.c | 33 - >> blockjob.c | 35 -- >> include/block/block-global-state.h | 9 -- >> include/block/block_int-common.h | 4 - >> 5 files changed, 277 deletions(-) > > Looks good! I’d just like a follow-up commit that also drops > bdrv_try_set_aio_context(), so it’s all gone (I think that’s the final > remnant?). > It's the same for me, I thought renaming bdrv_try_set_aio_context was a little bit unnecessary. You want to rename it to something else, or directly call bdrv_child_try_change_aio_context? I agree with the rest of comments in this series :) Emanuele
Re: [RFC PATCH 3/8] RFC: block: use transactions as a replacement of ->{can_}set_aio_context()
Am 14/07/2022 um 18:45 schrieb Hanna Reitz:
> On 12.07.22 23:19, Emanuele Giuseppe Esposito wrote:
>> -
>> RFC because I am not sure about the AioContext locks.
>> - Do we need to take the new AioContext lock? what does it protect?
>> - Taking the old AioContext lock is required now, because of
>>   bdrv_drained_begin calling AIO_WAIT_WHILE that unlocks the
>>   aiocontext. If we replace it with AIO_WAIT_WHILE_UNLOCKED,
>>   could we get rid of taking every time the old AioContext?
>>   drain would be enough to protect the graph modification.
>
> It's been a while, but as far as I remember (which may be wrong), the
> reason for how the locks are supposed to be taken was mostly that we
> need some defined state so that we know how to invoke
> bdrv_drained_begin() and bdrv_drained_end() (i.e. call the first one
> as-is, and switch the locks around for the latter one).

Right, so bdrv_drained_begin must always be called under the old AioContext lock, and bdrv_drained_end always under the new one.

If so, then I think I can make the whole thing even cleaner:

1. bdrv_drained_begin is taken in bdrv_change_aio_context, and not in a
   transaction. This was the initial idea, but somehow it didn't work;
   something was failing. However, I must have changed something now,
   because all tests are passing :) The old AioContext lock is assumed
   to be taken by the caller.
2. The old AioContext lock is released.
3. set_aio_context (the actual detach + attach operations) is kept in a
   commit transaction, as it is now. It doesn't need any AioContext
   lock, as far as I understand. bdrv_drained_end is implemented in a
   .clean transaction.
4. The new AioContext lock is taken.
5. Either tran_abort or tran_commit is called, depending on the result
   of the recursion.

Disadvantage: the unlock/lock mess is still there, but at least we now know why we need it.
Advantage: nothing should break (because it is similar to how it was before), no double lock is held, and I can always add an additional patch where I use the _UNLOCKED functions and see what happens.
In addition, no tran_add_tail needed. > > The idea of using _UNLOCKED sounds interesting, almost too obvious. I > can’t see why that wouldn’t work, actually. > >> -- >> >> Simplify the way the aiocontext can be changed in a BDS graph. >> There are currently two problems in bdrv_try_set_aio_context: >> - There is a confusion of AioContext locks taken and released, because >> we assume that old aiocontext is always taken and new one is >> taken inside. > > Yep, and that assumption is just broken in some cases, which is the main > pain point I’m feeling with it right now. > > For example, look at bdrv_attach_child_common(): Here, we attach a child > to a parent, so we need to get them into a common AioContext. So first > we try to put the child into the parent’s context, and if that fails, > we’ll try the other way, putting the parent into the child’s context. > > The problem is clear: The bdrv_try_set_aio_context() call requires us to > hold the child’s current context but not the parent’s, and the > child_class->set_aio_ctx() call requires the exact opposite. But we > never switch the context we have acquired, so this can’t both work. > Better yet, nowhere is it defined what context a caller to > bdrv_attach_child_common() will hold. > > In practice, what happens here most of the time is that something will > be moved from the main context to some other context, and since we’re in > the main context already, that’ll just work. But you can construct > cases where something is attempted to be moved from an I/O thread into a > different thread and then you’ll get a crash. > > I’d be happy if we could do away with the requirement of having to hold > any lock for changing a node’s AioContext. > >> - It doesn't look very safe to call bdrv_drained_begin while some >> nodes have already switched to the new aiocontext and others haven't. 
>> This could be especially dangerous because bdrv_drained_begin >> polls, so >> something else could be executed while graph is in an inconsistent >> state. >> >> Additional minor nitpick: can_set and set_ callbacks both traverse the >> graph, both using the ignored list of visited nodes in a different way. >> >> Therefore, get rid of all of this and introduce a new callback, >> change_aio_context, that uses transactions to efficiently, cleanly >> and most importantly safely change the aiocontext of a graph. >> >> This new callback is a "merge" of the two previous ones: >> - Just like can_set_aio_context, recursively traverses the graph. >> Marks all nodes that are visited using a GList, and checks if >> they *could* change the aio_context. >> - For each node that passes the above check, add a new transaction >> that implements a callback that effectively changes the aiocontext. >> - If a node is a BDS, add two transactions: one taking care of draining >> the node at the beginning of the list (so that will be executed first) >> and one at t
[PATCH v2 0/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
v1 -> v2: Rebase to current master

v1: https://patchew.org/QEMU/20220713133249.2229623-1-kkost...@redhat.com/

Konstantin Kostiuk (1):
  MAINTAINERS: Add myself as Guest Agent co-maintainer

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.25.1
[PATCH v2 1/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
Signed-off-by: Konstantin Kostiuk
Acked-by: Michael Roth
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ead2bed652..6af9cd985c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2881,7 +2881,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next
 QEMU Guest Agent
 M: Michael Roth
-R: Konstantin Kostiuk
+M: Konstantin Kostiuk
 S: Maintained
 F: qga/
 F: docs/interop/qemu-ga.rst
--
2.25.1
Re: [RFC PATCH 07/12] vhost: Add VhostVDPAStartOp operation
On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > It allows to run commands at start of the device, before it have enabled > any queue. > > Signed-off-by: Eugenio Pérez > --- > hw/virtio/vhost-shadow-virtqueue.h | 3 +++ > include/hw/virtio/vhost-vdpa.h | 5 + > hw/virtio/vhost-vdpa.c | 8 > 3 files changed, 16 insertions(+) > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > b/hw/virtio/vhost-shadow-virtqueue.h > index 03eb7ff670..210fe393cd 100644 > --- a/hw/virtio/vhost-shadow-virtqueue.h > +++ b/hw/virtio/vhost-shadow-virtqueue.h > @@ -26,6 +26,8 @@ typedef struct SVQDescState { > } SVQDescState; > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; > +typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq, > +void *opaque); > > /** > * Callback to handle an avail buffer. > @@ -43,6 +45,7 @@ typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue > *svq, >void *vq_callback_opaque); > > typedef struct VhostShadowVirtqueueOps { > +ShadowVirtQueueStart start; What's the difference between this and start_op? 
Thanks > VirtQueueAvailCallback avail_handler; > } VhostShadowVirtqueueOps; > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > index d10a89303e..b7d18b4e30 100644 > --- a/include/hw/virtio/vhost-vdpa.h > +++ b/include/hw/virtio/vhost-vdpa.h > @@ -24,6 +24,10 @@ typedef struct VhostVDPAHostNotifier { > void *addr; > } VhostVDPAHostNotifier; > > +struct vhost_vdpa; > +/* Called after send DRIVER_OK but after enabling vrings */ > +typedef int (*VhostVDPAStartOp)(struct vhost_vdpa *v); > + > typedef struct vhost_vdpa { > int device_fd; > int index; > @@ -39,6 +43,7 @@ typedef struct vhost_vdpa { > GPtrArray *shadow_vqs; > const VhostShadowVirtqueueOps *shadow_vq_ops; > void *shadow_vq_ops_opaque; > +VhostVDPAStartOp start_op; > struct vhost_dev *dev; > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > } VhostVDPA; > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index 1d8829c619..48f031b8c0 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -1136,6 +1136,14 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, > bool started) > if (unlikely(r)) { > return r; > } > + > +if (v->start_op) { > +r = v->start_op(v); > +if (unlikely(r)) { > +return r; > +} > +} > + > vhost_vdpa_set_vring_ready(dev); > } else { > vhost_vdpa_reset_device(dev); > -- > 2.31.1 >
Re: [RFC PATCH 09/12] vdpa: Extract vhost_vdpa_net_svq_add from vhost_vdpa_net_handle_ctrl_avail
在 2022/7/16 19:34, Eugenio Pérez 写道: So we can reuse to inject state messages. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 89 +++- 1 file changed, 51 insertions(+), 38 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 211bd0468b..aaae51a778 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -334,6 +334,54 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s, return true; } +static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, + const struct iovec *dev_buffers) The name should be tweaked since it is used only for cvq. +{ +/* in buffer used for device model */ +virtio_net_ctrl_ack status; +const struct iovec in = { +.iov_base = &status, +.iov_len = sizeof(status), +}; +size_t dev_written; +int r; +void *unused = (void *)1; + +r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, unused); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +return VIRTIO_NET_ERR; +} + +/* + * We can poll here since we've had BQL from the time we sent the + * descriptor. Also, we need to take the answer before SVQ pulls by itself, + * when BQL is released + */ This reminds me that, do we need a upper limit of the time on the polling here. (Avoid taking BQL for too long time). 
Thanks +dev_written = vhost_svq_poll(svq); +if (unlikely(dev_written < sizeof(status))) { +error_report("Insufficient written data (%zu)", dev_written); +return VIRTIO_NET_ERR; +} + +memcpy(&status, dev_buffers[1].iov_base, sizeof(status)); +if (status != VIRTIO_NET_OK) { +return VIRTIO_NET_ERR; +} + +status = VIRTIO_NET_ERR; +virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1); +if (status != VIRTIO_NET_OK) { +error_report("Bad CVQ processing in model"); +return VIRTIO_NET_ERR; +} + +return VIRTIO_NET_OK; +} + static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) { struct vhost_vring_state state = { @@ -392,19 +440,13 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, void *opaque) { VhostVDPAState *s = opaque; -size_t in_len, dev_written; +size_t in_len; virtio_net_ctrl_ack status = VIRTIO_NET_ERR; /* out and in buffers sent to the device */ struct iovec dev_buffers[2] = { { .iov_base = s->cvq_cmd_out_buffer }, { .iov_base = s->cvq_cmd_in_buffer }, }; -/* in buffer used for device model */ -const struct iovec in = { -.iov_base = &status, -.iov_len = sizeof(status), -}; -int r; bool ok; ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers); @@ -417,36 +459,7 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, goto out; } -r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, elem); -if (unlikely(r != 0)) { -if (unlikely(r == -ENOSPC)) { -qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", - __func__); -} -goto out; -} - -/* - * We can poll here since we've had BQL from the time we sent the - * descriptor. 
Also, we need to take the answer before SVQ pulls by itself, - * when BQL is released - */ -dev_written = vhost_svq_poll(svq); -if (unlikely(dev_written < sizeof(status))) { -error_report("Insufficient written data (%zu)", dev_written); -goto out; -} - -memcpy(&status, dev_buffers[1].iov_base, sizeof(status)); -if (status != VIRTIO_NET_OK) { -goto out; -} - -status = VIRTIO_NET_ERR; -virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1); -if (status != VIRTIO_NET_OK) { -error_report("Bad CVQ processing in model"); -} +status = vhost_vdpa_net_svq_add(svq, dev_buffers); out: in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, @@ -462,7 +475,7 @@ out: if (dev_buffers[1].iov_base) { vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[1].iov_base); } -return r; +return status == VIRTIO_NET_OK ? 0 : 1; } static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
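Jason's question about bounding the poll while the BQL is held could be answered with an iteration or deadline cap. Below is a minimal, self-contained sketch of that idea; the `poll_done_fn` callback and all names are illustrative stand-ins (in QEMU this would be a check on the SVQ used ring, and a real implementation would more likely compare a monotonic clock against a deadline than count spins):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "has the device marked the descriptor used?" */
typedef bool (*poll_done_fn)(void *opaque);

/*
 * Poll for CVQ completion, but bail out after max_iters spins so a stuck
 * device cannot pin the BQL forever.  Returns 0 on completion, -1 on
 * timeout (the caller would then fail the control command).
 */
static int poll_with_limit(poll_done_fn done, void *opaque, unsigned max_iters)
{
    for (unsigned i = 0; i < max_iters; i++) {
        if (done(opaque)) {
            return 0;
        }
    }
    return -1;
}

/* Demo device that completes after a fixed number of polls. */
struct demo_dev {
    unsigned polls_until_done;
};

static bool demo_done(void *opaque)
{
    struct demo_dev *d = opaque;

    if (d->polls_until_done == 0) {
        return true;
    }
    d->polls_until_done--;
    return false;
}

static int demo_poll(unsigned polls_until_done, unsigned max_iters)
{
    struct demo_dev d = { polls_until_done };

    return poll_with_limit(demo_done, &d, max_iters);
}
```

The timeout path still leaves the question of what to do with the in-flight descriptor, which is why a deadline-based variant plus a device reset is the more likely real-world shape.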
Re: [PATCH v2 1/1] python/machine: Fix AF_UNIX path too long on macOS
On Sat, Jul 16, 2022 at 10:34:34AM -0700, Peter Delevoryas wrote: > On macOS, private $TMPDIR's are the default. These $TMPDIR's are > generated from a user's unix UID and UUID [1], which can create a > relatively long path: > > /var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/ > > QEMU's avocado tests create a temporary directory prefixed by > "avo_qemu_sock_", and create QMP sockets within _that_ as well. > The QMP socket is unnecessarily long, because a temporary directory > is created for every QEMUMachine object. > > /avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock Looking at this again, I realize my suggestion for dealing with the second part of the path was mistaken. The "qemu-37331-10bacf110-monitor.sock" part combines two pieces. First, the result of f"qemu-{os.getpid()}-{id(self):02x}" is qemu-37331-10bacf110, and the code later appends '-monitor.sock'. So... > > The path limit for unix sockets on macOS is 104: [2] > > /* > * [XSI] Definitions for UNIX IPC domain. > */ > struct sockaddr_un { > unsigned char sun_len; /* sockaddr len including null */ > sa_family_t sun_family; /* [XSI] AF_UNIX */ > char sun_path[104]; /* [XSI] path name (gag) */ > }; > > This results in avocado tests failing on macOS because the QMP unix > socket can't be created, because the path is too long: > > ERROR| Failed to establish connection: OSError: AF_UNIX path too long > > This change reduces the size of both paths, and removes the unique > identification information from the socket name, since it seems to be > unnecessary.
> > This commit produces paths like the following: > > pdel@pdel-mbp:/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T > $ tree qemu* > qemu_oc7h7f3u > ├── qmp-console.sock > └── qmp-monitor.sock > > [1] > https://apple.stackexchange.com/questions/353832/why-is-mac-osx-temp-directory-in-weird-path > [2] > /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/usr/include/sys/un.h > > Signed-off-by: Peter Delevoryas > --- > python/qemu/machine/machine.py | 2 +- > tests/avocado/avocado_qemu/__init__.py | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py > index 37191f433b..b1823966b3 100644 > --- a/python/qemu/machine/machine.py > +++ b/python/qemu/machine/machine.py > @@ -157,7 +157,7 @@ def __init__(self, > self._wrapper = wrapper > self._qmp_timer = qmp_timer > > -self._name = name or f"qemu-{os.getpid()}-{id(self):02x}" > +self._name = name or "qmp" ...my suggestion here was wrong. We don't need the os.getpid() uniqueness because the tmpdir already ensures that it is safe, but keeping 'id(self)' is a good idea if the test case creates multiple machines concurrently. Bearing in mind we later append '-monitor.sock', we don't need 'qmp' in the self._name.
So on reflection I think I should have suggested using: self._name = name or f"{id(self):02x}" And *in addition*, a few lines later change: self._monitor_address = os.path.join( self.sock_dir, f"{self._name}-monitor.sock" ) To self._monitor_address = os.path.join( self.sock_dir, f"{self._name}.qmp" ) > diff --git a/tests/avocado/avocado_qemu/__init__.py > b/tests/avocado/avocado_qemu/__init__.py > index ed4853c805..43b8c8848c 100644 > --- a/tests/avocado/avocado_qemu/__init__.py > +++ b/tests/avocado/avocado_qemu/__init__.py > @@ -296,7 +296,7 @@ def require_accelerator(self, accelerator): > "available" % accelerator) > > def _new_vm(self, name, *args): > -self._sd = tempfile.TemporaryDirectory(prefix="avo_qemu_sock_") > +self._sd = tempfile.TemporaryDirectory(prefix="qemu_") This bit is fine. > vm = QEMUMachine(self.qemu_bin, base_temp_dir=self.workdir, > sock_dir=self._sd.name, log_dir=self.logdir) > self.log.debug('QEMUMachine "%s" created', name) With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
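The arithmetic behind the failure can be made explicit. A hedged C sketch of the length check, using the 104-byte `sun_path` bound from the macOS header quoted above (the helper name and the long socket name in the test are illustrative, not real QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * macOS's struct sockaddr_un has char sun_path[104]; Linux allows 108.
 * Use the smaller bound to stay portable.
 */
#define SUN_PATH_MAX 104

/* Returns true if "dir/name" plus its terminating NUL fits in sun_path. */
static bool af_unix_path_fits(const char *dir, const char *name)
{
    return strlen(dir) + 1 /* '/' */ + strlen(name) + 1 /* NUL */
           <= SUN_PATH_MAX;
}
```

Shortening both the tmpdir prefix and the per-machine socket name, as the patch does, attacks both terms of the sum.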
Re: [RFC PATCH 11/12] vdpa: Add virtio-net mac address via CVQ at start
On 2022/7/16 19:34, Eugenio Pérez wrote: This is needed so the destination vdpa device sees the same state as the guest set in the source. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 53 +++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 0183fce353..2873be2ba4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -383,7 +383,7 @@ static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, return VIRTIO_NET_OK; } -static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) +static int vhost_vdpa_enable_control_svq(struct vhost_vdpa *v) { struct vhost_vring_state state = { .index = v->dev->vq_index, @@ -395,6 +395,57 @@ static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) return r < 0 ? -errno : r; } +static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) +{ + +VirtIONet *n = VIRTIO_NET(v->dev->vdev); +uint64_t features = v->dev->vdev->host_features; +VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 0); +VhostVDPAState *s = container_of(v, VhostVDPAState, vhost_vdpa); +int r; + +r = vhost_vdpa_enable_control_svq(v); +if (unlikely(r < 0)) { +return r; +} + +if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { +const struct virtio_net_ctrl_hdr ctrl = { +.class = VIRTIO_NET_CTRL_MAC, +.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET, +}; +uint8_t mac[6]; +const struct iovec out[] = { +{ +.iov_base = (void *)&ctrl, +.iov_len = sizeof(ctrl), +},{ +.iov_base = mac, +.iov_len = sizeof(mac), +}, +}; +struct iovec dev_buffers[2] = { +{ .iov_base = s->cvq_cmd_out_buffer }, +{ .iov_base = s->cvq_cmd_in_buffer }, +}; +bool ok; +virtio_net_ctrl_ack state; + +ok = vhost_vdpa_net_cvq_map_sg(s, out, ARRAY_SIZE(out), dev_buffers); +if (unlikely(!ok)) { +return -1; +} + +memcpy(mac, n->mac, sizeof(mac)); +state = vhost_vdpa_net_svq_add(svq, dev_buffers); +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[0].iov_base); +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[1].iov_base); Any reason we do per buffer
unmap instead of the sg unmap here? Thanks +return state == VIRTIO_NET_OK ? 0 : 1; +} + +return 0; +} + /** * Do not forward commands not supported by SVQ. Otherwise, the device could * accept it and qemu would not know how to update the device model.
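For readers following the patch, the start-time command being built is simply a two-byte control header followed by the 6-byte MAC, expressed above as a two-element iovec. A standalone sketch of the equivalent flat layout (toy struct; per the virtio spec, VIRTIO_NET_CTRL_MAC and VIRTIO_NET_CTRL_MAC_ADDR_SET are both 1 — treat the literals below as hedged values, not QEMU definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy virtio-net control header. */
struct toy_ctrl_hdr {
    uint8_t class;
    uint8_t cmd;
};

/*
 * Flatten "control header + MAC" into one buffer, mirroring what the
 * mapped cvq_cmd_out_buffer ends up holding.  Returns bytes written,
 * or 0 if the buffer is too small.
 */
static size_t pack_mac_set_cmd(uint8_t *out, size_t n, const uint8_t mac[6])
{
    const struct toy_ctrl_hdr hdr = { .class = 1, .cmd = 1 };

    if (n < sizeof(hdr) + 6) {
        return 0;
    }
    memcpy(out, &hdr, sizeof(hdr));
    memcpy(out + sizeof(hdr), mac, 6);
    return sizeof(hdr) + 6;
}

static int demo_pack(void)
{
    uint8_t buf[8];
    const uint8_t mac[6] = { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 };

    return pack_mac_set_cmd(buf, sizeof(buf), mac) == 8 && buf[2] == 0x52;
}
```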
Re: Internal MAC addresses list (mac_table) usage
On Thu, Jul 14, 2022 at 7:54 PM Ovchinnikov, Vitalii wrote: > > Hi Jason, > > Thanks for pointing out that corner case with "52:54:00:12:34:XX". > > In the NIC model I'm developing qemu_macaddr_default_if_unset is called every > time MAC is updated in the NIC registers. > This way a just assigned "52:54:00:12:34:XX" MAC is at least marked as used > in the mac_table. > > However it doesn't cover the case when "52:54:00:12:34:XX" MAC being assigned > through NIC registers has already been assigned to another NIC by QEMU. This should be fine, and it needs to be addressed in a separate patch. Thanks > So one more improvement the code might need is a way to check whether MAC is > free or used from within NIC model. > Returning bool from qemu_macaddr_default_if_unset may well do the trick. > Moreover it might also help to spot an error when -1 is returned from > qemu_macaddr_get_free (for the time being it's silently interpreted as 0xFF > MAC LSB). > > BR, > Vitalii > > From: Jason Wang > Sent: Thursday, July 14, 2022 9:44 > To: Ovchinnikov, Vitalii > Cc: qemu-devel@nongnu.org > Subject: Re: Internal MAC addresses list (mac_table) usage > > On Tue, Jul 12, 2022 at 4:43 PM Ovchinnikov, Vitalii > wrote: > > > > Hi folks, > > > > While developing an Ethernet NIC model I noticed that QEMU maintains the > > following internal array which marks used/free MAC addresses in net/net.c: > > > > static int mac_table[256] = {0}; > > > > with three private (static) functions accessing it: qemu_macaddr_set_used, > > qemu_macaddr_set_free, qemu_macaddr_get_free. > > Public (non-static) interface to this array includes two functions: > > qemu_macaddr_default_if_unset and qemu_del_nic. > > > > The vast majority of existing NIC models calls > > qemu_macaddr_default_if_unset in their *_realize functions replacing > > zeroed-out MAC address with the free one returned by QEMU, for instance > > (lan9118_realize functions from hw/net/lan9118.c): > > > >... 
> > qemu_macaddr_default_if_unset(&s->conf.macaddr); > > > > s->nic = qemu_new_nic(&net_lan9118_info, &s->conf, > > object_get_typename(OBJECT(dev)), dev->id, s); > > qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a); > >... > > > > qemu_del_nic is being called from net_cleanup function right before QEMU > > finishes execution. > > > > What appears to be a possible SW architecture gap is that NIC models have > > no means to inform QEMU about changing their MAC addresses during execution > > (again from hw/net/lan9118.c, do_mac_write function): > > > > case MAC_ADDRH: > > s->conf.macaddr.a[4] = val & 0xff; > > s->conf.macaddr.a[5] = (val >> 8) & 0xff; > > lan9118_mac_changed(s); > > break; > > case MAC_ADDRL: > > s->conf.macaddr.a[0] = val & 0xff; > > s->conf.macaddr.a[1] = (val >> 8) & 0xff; > > s->conf.macaddr.a[2] = (val >> 16) & 0xff; > > s->conf.macaddr.a[3] = (val >> 24) & 0xff; > > lan9118_mac_changed(s); > > break; > > > > lan9118_mac_changed function here simply changes NIC info string using > > qemu_format_nic_info_str, hence stale MAC address stays marked as used in > > the mac_table whereas it's not actually in use any more. > > > > Am I right in thinking of it as a SW architecture gap/bug that needs to be > > addressed? > > I think so. Note that the code can not deal with the case when > "52:54:00:12:34:XX" was passed from cli. > > Thanks > > > > > BR, > > Vitalii > > > > >
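The gap described here — a NIC model has no way to release its stale address, and allocation failures are silent — can be sketched with a toy version of the table, including the bool-returning variant Vitalii proposes. This is a simplified model (only the last byte of a 52:54:00:12:34:XX address is tracked, loosely following net/net.c), not the actual QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model of net/net.c's mac_table: a use count per last-byte value. */
static int mac_table[256];

static void macaddr_set_used(const uint8_t mac[6]) { mac_table[mac[5]]++; }

static void macaddr_set_free(const uint8_t mac[6])
{
    if (mac_table[mac[5]] > 0) {
        mac_table[mac[5]]--;
    }
}

static int macaddr_get_free(void)
{
    for (int i = 0; i < 256; i++) {
        if (mac_table[i] == 0) {
            return i;
        }
    }
    return -1; /* exhausted; callers must not silently use 0xFF */
}

/*
 * bool-returning variant suggested in the thread: report failure
 * instead of silently assigning a bogus address when exhausted.
 */
static bool macaddr_default_if_unset(uint8_t mac[6])
{
    static const uint8_t zero[6];
    int idx;

    if (memcmp(mac, zero, sizeof(zero)) != 0) {
        macaddr_set_used(mac);
        return true;
    }
    idx = macaddr_get_free();
    if (idx < 0) {
        return false;
    }
    mac[0] = 0x52; mac[1] = 0x54; mac[2] = 0x00;
    mac[3] = 0x12; mac[4] = 0x34; mac[5] = (uint8_t)idx;
    macaddr_set_used(mac);
    return true;
}

/* Demo: two defaults differ, and freeing the stale one (the missing
 * piece in lan9118's do_mac_write) makes it available again. */
static bool demo_assign_and_recycle(void)
{
    uint8_t a[6] = {0}, b[6] = {0};

    if (!macaddr_default_if_unset(a) || !macaddr_default_if_unset(b)) {
        return false;
    }
    if (memcmp(a, b, 6) == 0) {
        return false;
    }
    macaddr_set_free(a); /* NIC changed its MAC at runtime */
    return macaddr_get_free() == a[5];
}
```

The `macaddr_set_free` call in the demo is exactly the hook the thread says NIC models currently lack when their MAC registers are rewritten.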
Re: [PATCH v4 2/2] ui/gtk: a new array param monitor to specify the target displays
Dongwon Kim writes: > On Tue, Jul 12, 2022 at 08:11:08AM +0200, Markus Armbruster wrote: >> Dongwon Kim writes: >> >> > New integer array parameter, 'monitor' is for specifying the target >> > monitors where individual GTK windows are placed upon launching. >> > >> > Monitor numbers in the array are associated with virtual consoles >> > in the order of [VC0, VC1, VC2 ... VCn]. >> > >> > Every GTK window containing each VC will be placed in the region >> > of corresponding monitors. >> > >> > Usage: -display gtk,monitor.=,.. >> >ex)-display gtk,monitor.0=1,monitor.1=0 >> > >> > Cc: Daniel P. Berrangé >> > Cc: Markus Armbruster >> > Cc: Philippe Mathieu-Daudé >> > Cc: Paolo Bonzini >> > Cc: Gerd Hoffmann >> > Cc: Vivek Kasireddy >> > Signed-off-by: Dongwon Kim >> > --- >> > qapi/ui.json| 9 - >> > qemu-options.hx | 3 ++- >> > ui/gtk.c| 30 -- >> > 3 files changed, 38 insertions(+), 4 deletions(-) >> > >> > diff --git a/qapi/ui.json b/qapi/ui.json >> > index 413371d5e8..ee0f9244ef 100644 >> > --- a/qapi/ui.json >> > +++ b/qapi/ui.json >> > @@ -1195,12 +1195,19 @@ >> > # assuming the guest will resize the display to match >> > # the window size then. Otherwise it defaults to "off". >> > # Since 3.1 >> > +# @monitor: Array of numbers, each of which represents physical >> > +# monitor where GTK window containing a given VC will be >> > +# placed. Each monitor number in the array will be >> > +# associated with a virtual console starting from VC0. >> > +# >> > +# since 7.1 >> >> I dislike repeating the type (here: array of numbers) in the >> description. >> >> Suggest something like >> >># @monitor: List of physical monitor numbers where the GTK windows >># containing the virtual consoles VC0, VC1, ... are to be >># placed. (Since 7.1) >> >> Missing: what happens when there are more VCs than list elements. Can >> you tell us? > > # @monitor: List of physical monitor numbers where the GTK windows > # containing the virtual consoles VC0, VC1, ... are to be > # placed. 
If a mapping exists for a VC, then it'd be > # placed on that specific physical monitor; otherwise, > # it'd default to the monitor from where it was launched > # since 7.1 > > How does this look? Pretty good! Nitpicks: replace "it'd" by "it will" or "it is to be", end your second sentence with a period, and format "since" like we do elsewhere. Together: # @monitor: List of physical monitor numbers where the GTK windows # containing the virtual consoles VC0, VC1, ... are to be # placed. If a mapping exists for a VC, then it is to be # placed on that specific physical monitor; otherwise, # it defaults to the monitor from where it was launched. # (Since 7.1) >> >> > # >> > # Since: 2.12 >> > ## >> > { 'struct' : 'DisplayGTK', >> >'data': { '*grab-on-hover' : 'bool', >> > -'*zoom-to-fit' : 'bool' } } >> > +'*zoom-to-fit' : 'bool', >> > +'*monitor' : ['uint16'] } } >> > >> > ## >> > # @DisplayEGLHeadless: >> >> [...] >>
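The agreed documentation wording reduces to a small lookup with a fallback. An illustrative sketch of those semantics (toy function, not the actual ui/gtk.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * VCs with an entry in the @monitor list go to that physical monitor;
 * VCs beyond the end of the list fall back to the monitor the process
 * was launched from.
 */
static int vc_target_monitor(const uint16_t *map, size_t map_len,
                             size_t vc_index, int launch_monitor)
{
    return vc_index < map_len ? map[vc_index] : launch_monitor;
}

/* Demo for "-display gtk,monitor.0=1,monitor.1=0" with three VCs,
 * launched from (hypothetical) monitor 3. */
static bool demo_vc_mapping(void)
{
    const uint16_t map[] = { 1, 0 };

    return vc_target_monitor(map, 2, 0, 3) == 1 &&  /* VC0 -> monitor 1 */
           vc_target_monitor(map, 2, 1, 3) == 0 &&  /* VC1 -> monitor 0 */
           vc_target_monitor(map, 2, 2, 3) == 3;    /* VC2 -> launch monitor */
}
```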
Re: [RFC PATCH 11/12] vdpa: Add virtio-net mac address via CVQ at start
On Mon, Jul 18, 2022 at 10:55 AM Jason Wang wrote: > > > 在 2022/7/16 19:34, Eugenio Pérez 写道: > > This is needed so the destination vdpa device see the same state a the > > guest set in the source. > > > > Signed-off-by: Eugenio Pérez > > --- > > net/vhost-vdpa.c | 53 +++- > > 1 file changed, 52 insertions(+), 1 deletion(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 0183fce353..2873be2ba4 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -383,7 +383,7 @@ static virtio_net_ctrl_ack > > vhost_vdpa_net_svq_add(VhostShadowVirtqueue *svq, > > return VIRTIO_NET_OK; > > } > > > > -static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) > > +static int vhost_vdpa_enable_control_svq(struct vhost_vdpa *v) > > { > > struct vhost_vring_state state = { > > .index = v->dev->vq_index, > > @@ -395,6 +395,57 @@ static int vhost_vdpa_start_control_svq(struct > > vhost_vdpa *v) > > return r < 0 ? -errno : r; > > } > > > > +static int vhost_vdpa_start_control_svq(struct vhost_vdpa *v) > > +{ > > + > > +VirtIONet *n = VIRTIO_NET(v->dev->vdev); > > +uint64_t features = v->dev->vdev->host_features; > > +VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, 0); > > +VhostVDPAState *s = container_of(v, VhostVDPAState, vhost_vdpa); > > +int r; > > + > > +r = vhost_vdpa_enable_control_svq(v); > > +if (unlikely(r < 0)) { > > +return r; > > +} > > + > > +if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) { > > +const struct virtio_net_ctrl_hdr ctrl = { > > +.class = VIRTIO_NET_CTRL_MAC, > > +.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET, > > +}; > > +uint8_t mac[6]; > > +const struct iovec out[] = { > > +{ > > +.iov_base = (void *)&ctrl, > > +.iov_len = sizeof(ctrl), > > +},{ > > +.iov_base = mac, > > +.iov_len = sizeof(mac), > > +}, > > +}; > > +struct iovec dev_buffers[2] = { > > +{ .iov_base = s->cvq_cmd_out_buffer }, > > +{ .iov_base = s->cvq_cmd_in_buffer }, > > +}; > > +bool ok; > > +virtio_net_ctrl_ack state; > > + > > +ok = 
vhost_vdpa_net_cvq_map_sg(s, out, ARRAY_SIZE(out), > > dev_buffers); > > +if (unlikely(!ok)) { > > +return -1; > > +} > > + > > +memcpy(mac, n->mac, sizeof(mac)); > > +state = vhost_vdpa_net_svq_add(svq, dev_buffers); > > +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[0].iov_base); > > +vhost_vdpa_cvq_unmap_buf(v, dev_buffers[1].iov_base); > > > Any reason we do per buffer unmap instead of the sg unmap here? > I don't think I get this comment. vhost_vdpa_net_handle_ctrl_avail also unmaps each buffer individually, and I need a function to unmap one of them at a time. I could create a function to unmap a whole sg, but I'm not sure how much value it adds. Thanks! > Thanks > > > > +return state == VIRTIO_NET_OK ? 0 : 1; > > +} > > + > > +return 0; > > +} > > + > > /** > >* Do not forward commands not supported by SVQ. Otherwise, the device > > could > >* accept it and qemu would not know how to update the device model. >
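For what it's worth, the whole-sg helper Jason alludes to and Eugenio weighs would be a short loop over the iovec array; whether it adds value over two explicit calls is largely taste. A sketch with hypothetical names (not QEMU API — the counting callback is only demo plumbing):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

typedef void (*unmap_buf_fn)(void *vaddr);

/*
 * Unmap every mapped buffer of an iovec array in one call, instead of
 * one call per buffer at each call site.
 */
static void cvq_unmap_sg(unmap_buf_fn unmap, const struct iovec *sg,
                         size_t num)
{
    for (size_t i = 0; i < num; i++) {
        if (sg[i].iov_base) {
            unmap(sg[i].iov_base);
        }
    }
}

/* Demo plumbing: count unmap calls instead of touching an IOMMU. */
static int unmap_calls;

static void counting_unmap(void *vaddr)
{
    (void)vaddr;
    unmap_calls++;
}

static int demo_unmap_two_of_three(void)
{
    char a, b;
    struct iovec sg[3] = { { &a, 1 }, { NULL, 0 }, { &b, 1 } };

    unmap_calls = 0;
    cvq_unmap_sg(counting_unmap, sg, 3);
    return unmap_calls;
}
```

Skipping NULL entries keeps the helper usable on partially mapped arrays, which matches the error paths in the patch.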
Re: [RFC PATCH 3/8] RFC: block: use transactions as a replacement of ->{can_}set_aio_context()
On 14/07/2022 at 18:45, Hanna Reitz wrote: >> + * First, go recursively through all nodes in the graph, and see if they >> + * can change their AioContext. >> + * If so, add for each node a new transaction with a callback to >> change the >> + * AioContext with the new one. >> + * Once recursion is completed, if all nodes are allowed to change their >> + * AioContext then commit the transaction, running all callbacks in >> order. >> + * Otherwise don't do anything. > > Nice explanation, but I’d start with something more simple to describe > the external interface, like “Change bs's and recursively all of its > parents' and children's AioContext to the given new context, returning > an error if that isn't possible. If ignore_child is not NULL, that > child (and its subgraph) will not be touched.” Do you want your suggestion as a replacement for mine, or in addition to it? > >> + * >> + * This function still requires the caller to take the bs current >> + * AioContext lock, otherwise draining could fail since AIO_WAIT_WHILE > > Well, let’s just say “will” instead of “could”. Callers don’t need to > know they can get lucky when they ignore the rules. > >> + * assumes the lock is always held if bs is in another AioContext. >> + */ >> +int bdrv_child_try_change_aio_context(BlockDriverState *bs, >> AioContext *ctx, >> + BdrvChild *ignore_child, Error >> **errp) > > Do the other functions need to be public? Outside of this file, this > one seems sufficient to me, but I may well be overlooking something. > > Also, why is this function called bdrv_child_*, and not just > bdrv_try_change_aio_context()? (Or maybe have a > bdrv_try_change_aio_context() variant without the ignore_child > parameter, just like bdrv_try_set_aio_context()?) > > Actually, wouldn’t just bdrv_change_aio_context() be sufficient? I > mean, it isn’t called bdrv_try_co_pwritev(). Most functions try to do > something and return errors if they can’t. Not sure if we need to make > that clear in the name.
> > (I may be wrong, but I suspect bdrv_try_set_aio_context() is only named > such that the compiler will check that the bdrv_set_aio_context() > interface is completely unused; > 42a65f02f9b380bd8074882d5844d4ea033389cc’s commit message does seem to > confirm that.) Ok, I will change the name in bdrv_change_aio_context. And I now understand your suggestion on the last patch to remove bdrv_try_set_aio_context. Emanuele
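The check-then-commit mechanism being reviewed follows a fixed pattern: every node registers a commit callback while the graph is walked, and only if all nodes agree are the callbacks run, in order (QEMU has real transaction helpers for this; the sketch below is a deliberately tiny toy version, not that API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*commit_fn)(void *opaque);

enum { TRAN_MAX = 16 };

typedef struct Transaction {
    commit_fn cb[TRAN_MAX];
    void *opaque[TRAN_MAX];
    size_t n;
} Transaction;

static bool tran_add(Transaction *t, commit_fn cb, void *opaque)
{
    if (t->n == TRAN_MAX) {
        return false;
    }
    t->cb[t->n] = cb;
    t->opaque[t->n] = opaque;
    t->n++;
    return true;
}

static void tran_commit(Transaction *t)
{
    for (size_t i = 0; i < t->n; i++) {
        t->cb[i](t->opaque[i]);
    }
}

/* Demo: two nodes each register a context switch; commit runs both. */
static void switch_ctx(void *opaque)
{
    (*(int *)opaque)++;
}

static int demo_commit_runs_all(void)
{
    Transaction t = { .n = 0 };
    int switched = 0;

    tran_add(&t, switch_ctx, &switched);
    tran_add(&t, switch_ctx, &switched);
    tran_commit(&t);
    return switched;
}
```

Aborting in this toy model simply means never calling `tran_commit()` — which matches the "otherwise don't do anything" wording in the comment under review.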
Re: [RFC PATCH 07/12] vhost: Add VhostVDPAStartOp operation
On Mon, Jul 18, 2022 at 10:50 AM Jason Wang wrote: > > On Sat, Jul 16, 2022 at 7:34 PM Eugenio Pérez wrote: > > > > It allows running commands at device start, before it has enabled > > any queue. > > > > Signed-off-by: Eugenio Pérez > > --- > > hw/virtio/vhost-shadow-virtqueue.h | 3 +++ > > include/hw/virtio/vhost-vdpa.h | 5 + > > hw/virtio/vhost-vdpa.c | 8 > > 3 files changed, 16 insertions(+) > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h > > b/hw/virtio/vhost-shadow-virtqueue.h > > index 03eb7ff670..210fe393cd 100644 > > --- a/hw/virtio/vhost-shadow-virtqueue.h > > +++ b/hw/virtio/vhost-shadow-virtqueue.h > > @@ -26,6 +26,8 @@ typedef struct SVQDescState { > > } SVQDescState; > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; > > +typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq, > > +void *opaque); > > > > /** > > * Callback to handle an avail buffer. > > @@ -43,6 +45,7 @@ typedef int > > (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq, > >void *vq_callback_opaque); > > > > typedef struct VhostShadowVirtqueueOps { > > +ShadowVirtQueueStart start; > > What's the difference between this and start_op? > This is a leftover; I'll delete it for the next series. Thanks!
> Thanks > > > VirtQueueAvailCallback avail_handler; > > } VhostShadowVirtqueueOps; > > > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > > index d10a89303e..b7d18b4e30 100644 > > --- a/include/hw/virtio/vhost-vdpa.h > > +++ b/include/hw/virtio/vhost-vdpa.h > > @@ -24,6 +24,10 @@ typedef struct VhostVDPAHostNotifier { > > void *addr; > > } VhostVDPAHostNotifier; > > > > +struct vhost_vdpa; > > +/* Called after send DRIVER_OK but after enabling vrings */ > > +typedef int (*VhostVDPAStartOp)(struct vhost_vdpa *v); > > + > > typedef struct vhost_vdpa { > > int device_fd; > > int index; > > @@ -39,6 +43,7 @@ typedef struct vhost_vdpa { > > GPtrArray *shadow_vqs; > > const VhostShadowVirtqueueOps *shadow_vq_ops; > > void *shadow_vq_ops_opaque; > > +VhostVDPAStartOp start_op; > > struct vhost_dev *dev; > > VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; > > } VhostVDPA; > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > index 1d8829c619..48f031b8c0 100644 > > --- a/hw/virtio/vhost-vdpa.c > > +++ b/hw/virtio/vhost-vdpa.c > > @@ -1136,6 +1136,14 @@ static int vhost_vdpa_dev_start(struct vhost_dev > > *dev, bool started) > > if (unlikely(r)) { > > return r; > > } > > + > > +if (v->start_op) { > > +r = v->start_op(v); > > +if (unlikely(r)) { > > +return r; > > +} > > +} > > + > > vhost_vdpa_set_vring_ready(dev); > > } else { > > vhost_vdpa_reset_device(dev); > > -- > > 2.31.1 > > >
Re: [PATCH 1/2] block/parallels: Fix buffer-based write call
On 14.07.2022 15:28, Hanna Reitz wrote: Commit a4072543ccdddbd241d5962d9237b8b41fd006bf has changed the I/O here from working on a local one-element I/O vector to just using the buffer directly (using the bdrv_co_pread()/bdrv_co_pwrite() helper functions introduced shortly before). However, it only changed the bdrv_co_preadv() call to bdrv_co_pread() - the subsequent bdrv_co_pwritev() call stayed this way, and so still expects a QEMUIOVector pointer instead of a plain buffer. We must change that to be a bdrv_co_pwrite() call. Fixes: a4072543ccdddbd241d5962d ("block/parallels: use buffer-based io") Signed-off-by: Hanna Reitz --- block/parallels.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/parallels.c b/block/parallels.c index 8b235b9505..a229c06f25 100644 --- a/block/parallels.c +++ b/block/parallels.c @@ -241,8 +241,8 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num, return ret; } -ret = bdrv_co_pwritev(bs->file, s->data_end * BDRV_SECTOR_SIZE, - nb_cow_bytes, buf, 0); +ret = bdrv_co_pwrite(bs->file, s->data_end * BDRV_SECTOR_SIZE, + nb_cow_bytes, buf, 0); qemu_vfree(buf); if (ret < 0) { return ret; Reviewed-by: Denis V. Lunev
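The bug class fixed here is easy to reproduce in miniature: the buffer-based helper is just a one-element-vector wrapper around the vector-based primitive, so buffer and vector arguments are not interchangeable. A simplified sketch with toy functions (not the QEMU block layer; offsets and flags are omitted for brevity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Toy stand-in for the vector-based primitive: gathers the iovec into
 * dst.  In QEMU, bdrv_co_pwritev() expects a QEMUIOVector, which is why
 * passing a plain buffer there reinterprets arbitrary bytes as iovec
 * metadata.
 */
static size_t toy_pwritev(char *dst, const struct iovec *iov, int niov)
{
    size_t total = 0;

    for (int i = 0; i < niov; i++) {
        memcpy(dst + total, iov[i].iov_base, iov[i].iov_len);
        total += iov[i].iov_len;
    }
    return total;
}

/*
 * Buffer-based wrapper in the style of bdrv_co_pwrite(): wrap the plain
 * buffer in a one-element vector, then delegate.
 */
static size_t toy_pwrite(char *dst, const void *buf, size_t bytes)
{
    struct iovec v = { .iov_base = (void *)buf, .iov_len = bytes };

    return toy_pwritev(dst, &v, 1);
}

static bool demo_roundtrip(void)
{
    char dst[16] = { 0 };

    return toy_pwrite(dst, "abcd", 4) == 4 && memcmp(dst, "abcd", 4) == 0;
}
```

Because the wrapper hides the vector construction, mixing up the two call styles compiles in C whenever the pointer types line up — which is exactly how the original `buf`-as-`QEMUIOVector` slip survived review.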
Re: [PATCH 2/2] iotests/131: Add parallels regression test
On 14.07.2022 15:28, Hanna Reitz wrote: Test an allocating write to a parallels image that has a backing node. Before HEAD^, doing so used to give me a failed assertion (when the backing node contains only `42` bytes; the results varies with the value chosen, for `0` bytes, for example, all I get is EIO). Signed-off-by: Hanna Reitz --- tests/qemu-iotests/131 | 35 ++- tests/qemu-iotests/131.out | 13 + 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131 index d7456cae5b..a847692b4c 100755 --- a/tests/qemu-iotests/131 +++ b/tests/qemu-iotests/131 @@ -43,7 +43,7 @@ _supported_os Linux inuse_offset=$((0x2c)) -size=64M +size=$((64 * 1024 * 1024)) CLUSTER_SIZE=64k IMGFMT=parallels _make_test_img $size @@ -70,6 +70,39 @@ _check_test_img _check_test_img -r all { $QEMU_IO -c "read -P 0x11 64k 64k" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir +echo "== allocate with backing ==" +# Verify that allocating clusters works fine even when there is a backing image. +# Regression test for a bug where we would pass a buffer read from the backing +# node as a QEMUIOVector object, which could cause anything from I/O errors over +# assertion failures to invalid reads from memory. 
+ +# Clear image +_make_test_img $size +# Create base image +TEST_IMG="$TEST_IMG.base" _make_test_img $size + +# Write some data to the base image (which would trigger an assertion failure if +# interpreted as a QEMUIOVector) +$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG.base" | _filter_qemu_io + +# Parallels does not seem to support storing a backing filename in the image +# itself, so we need to build our backing chain on the command line +imgopts="driver=$IMGFMT,file.driver=$IMGPROTO,file.filename=$TEST_IMG" +imgopts+=",backing.driver=$IMGFMT" +imgopts+=",backing.file.driver=$IMGPROTO,backing.file.filename=$TEST_IMG.base" + +# Cause allocation in the top image +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \ +$QEMU_IO --image-opts "$imgopts" -c 'write -P 1 0 64' | _filter_qemu_io + +# Verify +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \ +$QEMU_IO --image-opts "$imgopts" \ +-c 'read -P 1 0 64' \ +-c "read -P 42 64 $((64 * 1024 - 64))" \ +-c "read -P 0 64k $((size - 64 * 1024))" \ +| _filter_qemu_io + # success, all done echo "*** done" rm -f $seq.full diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out index 70da03dee5..de5ef7a8f5 100644 --- a/tests/qemu-iotests/131.out +++ b/tests/qemu-iotests/131.out @@ -37,4 +37,17 @@ Double checking the fixed image now... No errors were found on the image. 
read 65536/65536 bytes at offset 65536 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +== allocate with backing == +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864 +wrote 65536/65536 bytes at offset 0 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 64/64 bytes at offset 0 +64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 64/64 bytes at offset 0 +64 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 65472/65472 bytes at offset 64 +63.938 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 67043328/67043328 bytes at offset 65536 +63.938 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) *** done Reviewed-by: Denis V. Lunev
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On 14/07/2022 17.07, Daniel P. Berrangé wrote: When querying '-cpu help' there is no presentation of the fact that a CPU may be deprecated. The user just has to try it and see if they get a deprecation message at runtime. The QMP command for querying CPUs reports a deprecation bool flag, but not the explanatory reason. The Icelake-Client CPU (removed in 6df39f5e583ca0f67bd934d1327f9ead2e3bd49c) handled this by modifying the '.notes' section to add the word 'deprecated': { .version = 2, .note = "no TSX, deprecated", .alias = "Icelake-Client-noTSX", .props = (PropValue[]) { { "hle", "off" }, { "rtm", "off" }, { /* end of list */ } }, }, This relies on the person deprecating the CPU to remember to do this, and is redundant when this info is already expressed in the '.deprecation_note' field. This short series suggests just modifying the '-cpu help' formatter so that it displays the full deprecation message, e.g. $ qemu-system-x86_64 -cpu help: Available CPUs: x86 486 (alias configured by machine type) (deprecated: use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') I wonder if this is too verbose, and we should just do a concise flag-like approach, similar to QMP: $ qemu-system-x86_64 -cpu help: Available CPUs: x86 486 (alias configured by machine type) (deprecated) leaving the full message to be displayed at runtime? I'm slightly inclined to the simpler, more concise output. I'd prefer to keep it short here and just write "deprecated" without the reason. Otherwise this will overflow the lines and break the readability of the output. And it's also what we're doing for "-machine", e.g.: $ ./qemu-system-ppc64 -M help | grep deprecate taihu taihu (deprecated) $ ./qemu-system-ppc64 -M taihu qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, use 'ref405ep' instead Thomas
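The concise variant both maintainers lean toward amounts to keying a fixed marker off the presence of a deprecation note and leaving the note itself for the runtime warning. An illustrative sketch (not the actual per-target cpu list code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one '-cpu help' line: only a "(deprecated)" marker, keeping the
 * full .deprecation_note text for the runtime warning.
 */
static void format_cpu_help_line(char *out, size_t n, const char *name,
                                 const char *deprecation_note)
{
    snprintf(out, n, "%s%s", name, deprecation_note ? " (deprecated)" : "");
}

static bool demo_format(void)
{
    char line[64];

    format_cpu_help_line(line, sizeof(line), "486",
                         "use at least 'Nehalem' / 'Opteron_G4'");
    return strcmp(line, "486 (deprecated)") == 0;
}
```

This mirrors the `-machine` behavior quoted above: the help output stays one short line per entry, and the reason only appears when the deprecated entity is actually used.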
[PULL 0/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
The following changes since commit 0ebf76aae58324b8f7bf6af798696687f5f4c2a9: Merge tag 'nvme-next-pull-request' of git://git.infradead.org/qemu-nvme into staging (2022-07-15 15:38:13 +0100) are available in the Git repository at: g...@github.com:kostyanf14/qemu.git tags/qga-win32-pull-2022-07-18 for you to fetch changes up to e8cbe5842bad80bc2df692dea70134da0d13c556: MAINTAINERS: Add myself as Guest Agent co-maintainer (2022-07-18 11:56:09 +0300) qga-win32-pull-2022-07-18 Konstantin Kostiuk (1): MAINTAINERS: Add myself as Guest Agent co-maintainer MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.25.1
[PULL 1/1] MAINTAINERS: Add myself as Guest Agent co-maintainer
Signed-off-by: Konstantin Kostiuk Acked-by: Michael Roth --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index ead2bed652..6af9cd985c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2881,7 +2881,7 @@ T: git https://repo.or.cz/qemu/armbru.git qapi-next QEMU Guest Agent M: Michael Roth -R: Konstantin Kostiuk +M: Konstantin Kostiuk S: Maintained F: qga/ F: docs/interop/qemu-ga.rst -- 2.25.1
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18 2022, Thomas Huth wrote: > On 14/07/2022 17.07, Daniel P. Berrangé wrote: >> $ qemu-system-x86_64 -cpu help: >> Available CPUs: >> x86 486 (alias configured by machine type) (deprecated: >> use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') >> >> I wonder if this is too verbose, and we should just do a >> concise flag like approach, similar to QMP: >> >> $ qemu-system-x86_64 -cpu help: >> Available CPUs: >> x86 486 (alias configured by machine type) (deprecated) >> >> leaving the full message to be displayed at runtime ? I'm slightly >> inclined to the simpler more concise output. > > I'd prefer to keep it short here and just write "deprecated" without the > reason. Otherwise this will overflow the lines and break the readability of > the output. And it's also what we're also doing for "-machine", e.g.: > > $ ./qemu-system-ppc64 -M help | grep deprecate > taihutaihu (deprecated) > $ ./qemu-system-ppc64 -M taihu > qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, > use 'ref405ep' instead Ok, following what -machine does is certainly a good point. Is it easy enough to figure out the deprecation note? I think you either have to actually start something with the deprecated entity, or use QMP (which is not that straightforward)?
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18, 2022 at 11:37:35AM +0200, Cornelia Huck wrote: > On Mon, Jul 18 2022, Thomas Huth wrote: > > > On 14/07/2022 17.07, Daniel P. Berrangé wrote: > >> $ qemu-system-x86_64 -cpu help: > >> Available CPUs: > >> x86 486 (alias configured by machine type) (deprecated: > >> use at least 'Nehalem' / 'Opteron_G4', or 'host' / 'max') > >> > >> I wonder if this is too verbose, and we should just do a > >> concise flag like approach, similar to QMP: > >> > >> $ qemu-system-x86_64 -cpu help: > >> Available CPUs: > >> x86 486 (alias configured by machine type) (deprecated) > >> > >> leaving the full message to be displayed at runtime ? I'm slightly > >> inclined to the simpler more concise output. > > > > I'd prefer to keep it short here and just write "deprecated" without the > > reason. Otherwise this will overflow the lines and break the readability of > > the output. And it's also what we're also doing for "-machine", e.g.: > > > > $ ./qemu-system-ppc64 -M help | grep deprecate > > taihutaihu (deprecated) > > $ ./qemu-system-ppc64 -M taihu > > qemu-system-ppc64: warning: Machine type 'taihu' is deprecated: incomplete, > > use 'ref405ep' instead > > Ok, following what -machine does is certainly a good point. Yes, I should have thought to check what -machine does, it makes sense to be consistent. > Is it easy enough the figure out the deprecation note? I think you > either have to actually start something with the deprecated entity, or > use qmp (which is not that straightforward)? QMP doesn't tell you the note, just a boolean deprecation flag. It is only printed on startup only right now. In the context of libvirt what happens is that libvirt can report that something is deprecated (based on the QMP response). If you go ahead and use it anyway, you'll get the deprecation message in the logfile for the VM, and the VM gets marked tainted by libvirt, which serves as a guide to look in the logfile.
With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER
Hao Wu writes: > This type is used to represent block devs that are not suitable to > be represented by other existing types. > > A sample use is to represent an at24c eeprom device defined in > hw/nvram/eeprom_at24c.c. The block device can be used to contain the > content of the said eeprom device. > > Signed-off-by: Hao Wu Let me add a bit more history in the hope of helping other reviewers. v3 of this series[1] used IF_NONE for the EEPROM device. Peter Maydell questioned that[2], and we concluded it's wrong. I wrote [A] board can use any traditional interface type. If none of them matches, and common decency prevents use of a non-matching one, invent a new one. We could do IF_OTHER. Thomas Huth cleaned up the existing abuse of IF_NONE to use IF_PFLASH instead, in commit 6b717a8d44: hw/misc/sifive_u_otp: Use IF_PFLASH for the OTP device instead of IF_NONE Configuring a drive with "if=none" is meant for creation of a backend only, it should not get automatically assigned to a device frontend. Use "if=pflash" for the One-Time-Programmable device instead (like it is e.g. also done for the efuse device in hw/arm/xlnx-zcu102.c). Since the old way of configuring the device has already been published with the previous QEMU versions, we cannot remove this immediately, but have to deprecate it and support it for at least two more releases. Signed-off-by: Thomas Huth Acked-by: Philippe Mathieu-Daudé Reviewed-by: Markus Armbruster Reviewed-by: Alistair Francis Message-id: 2029102549.217755-1-th...@redhat.com Signed-off-by: Alistair Francis An OTP device isn't really a parallel flash, and neither are eFuses. More fast-and-loose use of IF_PFLASH may exist in the tree, and maybe of other interface types, too. This patch introduces IF_OTHER. The patch after next uses it for an EEPROM device. Do we want IF_OTHER? If no, I guess we get to abuse IF_PFLASH some more. If yes, I guess we should use IF_PFLASH only for actual parallel flash memory going forward.
Cleaning up existing abuse of IF_PFLASH may not be worth the trouble, though. Thoughts? > --- > blockdev.c| 4 +++- > include/sysemu/blockdev.h | 1 + > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/blockdev.c b/blockdev.c > index 9230888e34..befd69ac5f 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -82,6 +82,7 @@ static const char *const if_name[IF_COUNT] = { > [IF_MTD] = "mtd", > [IF_SD] = "sd", > [IF_VIRTIO] = "virtio", > +[IF_OTHER] = "other", > [IF_XEN] = "xen", > }; > > @@ -726,7 +727,8 @@ QemuOptsList qemu_legacy_drive_opts = { > },{ > .name = "if", > .type = QEMU_OPT_STRING, > -.help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)", > +.help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio," > +" other)", > },{ > .name = "file", > .type = QEMU_OPT_STRING, > diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h > index 3211b16513..d9dd5af291 100644 > --- a/include/sysemu/blockdev.h > +++ b/include/sysemu/blockdev.h > @@ -21,6 +21,7 @@ typedef enum { > */ > IF_NONE = 0, > IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN, > +IF_OTHER, > IF_COUNT > } BlockInterfaceType; [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00372.html [2] Subject: does drive_get_next(IF_NONE) make sense? Message-ID: https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00710.html
Re: [PATCH v5 7/8] hw/arm: Set drive property for at24c eeprom
Hao Wu writes: > This patch allows the user to attach an external drive as a property > for an onboard at24c eeprom device. What are the contents of the EEPROM before the patch? I guess the patch lets users replace those contents. Why would a user want to do that? > It uses a unit number to > distinguish different devices. > > Signed-off-by: Hao Wu
Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER
On 18/07/2022 11.49, Markus Armbruster wrote: [...] An OTP device isn't really a parallel flash, and neither are eFuses. More fast-and-loose use of IF_PFLASH may exist in the tree, and maybe of other interface types, too. This patch introduces IF_OTHER. The patch after next uses it for an EEPROM device. Do we want IF_OTHER? If no, I guess we get to abuse IF_PFLASH some more. If yes, I guess we should use IF_PFLASH only for actual parallel flash memory going forward. Cleaning up existing abuse of IF_PFLASH may not be worth the trouble, though. Thoughts? Maybe we should simply rename IF_PFLASH to IF_EPROM or IF_FLASH to make it clear that it is not only about parallel flashes anymore? Just my 0.02 €. Thomas
Re: [PATCH v2] gtk: Add show_tabs=on|off command line option.
On 12/07/2022 15.37, Felix xq Queißner wrote: The patch adds "show_tabs" command line option for GTK ui similar to "grab_on_hover". This option allows tabbed view mode to not have to be enabled by hand at each start of the VM. Nit: In case you have to respin again, please replace "show_tabs" with "show-tabs" and "grab_on_hover" with "grab-on-hover". Anyway: Reviewed-by: Thomas Huth
Re: [PATCH 0/3] target: RFC: display deprecation note for '-cpu help'
On Mon, Jul 18 2022, Daniel P. Berrangé wrote: > On Mon, Jul 18, 2022 at 11:37:35AM +0200, Cornelia Huck wrote: >> Is it easy enough to figure out the deprecation note? I think you >> either have to actually start something with the deprecated entity, or >> use qmp (which is not that straightforward)? > > QMP doesn't tell you the note, just a boolean deprecation flag. It is > only printed at startup right now. > > In the context of libvirt what happens is that libvirt can report that > something is deprecated (based on the QMP response). If you go ahead > and use it anyway, you'll get the deprecation message in the logfile > for the VM, and the VM gets marked tainted by libvirt, which serves > as a guide to look in the logfile. Hm... so, a user who notes via -help that 'foo' is deprecated does not really have a good way to figure out what they should use instead, other than actually trying to use 'foo'? Is that a use case worth spending some effort on, or do we consider it more of a niche case?
[PATCH] target/arm: Add MO_128 entry to pred_esz_masks[]
In commit 7390e0e9ab8475, we added support for SME loads and stores. Unlike SVE loads and stores, these include handling of 128-bit elements. The SME load/store functions call down into the existing sve_cont_ldst_elements() function, which uses the element size MO_* value as an index into the pred_esz_masks[] array. Because this code path now has to handle MO_128, we need to add an extra element to the array. This bug was spotted by Coverity because it meant we were reading off the end of the array. Resolves: Coverity CID 1490539, 1490541, 1490543, 1490544, 1490545, 1490546, 1490548, 1490549, 1490550, 1490551, 1490555, 1490557, 1490558, 1490560, 1490561, 1490563 Fixes: 7390e0e9ab8475 ("target/arm: Implement SME LD1, ST1") Signed-off-by: Peter Maydell --- target/arm/cpu.h | 2 +- target/arm/translate-sve.c | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 1e36a839ee4..3123488014d 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -3374,7 +3374,7 @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno) } /* Shared between translate-sve.c and sve_helper.c. */ -extern const uint64_t pred_esz_masks[4]; +extern const uint64_t pred_esz_masks[5]; /* Helper for the macros below, validating the argument type. */ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x) diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c index 41f8b12259e..621a2abb22f 100644 --- a/target/arm/translate-sve.c +++ b/target/arm/translate-sve.c @@ -529,9 +529,10 @@ static void do_predtest(DisasContext *s, int dofs, int gofs, int words) } /* For each element size, the bits within a predicate word that are active. */ -const uint64_t pred_esz_masks[4] = { +const uint64_t pred_esz_masks[5] = { 0xffffffffffffffffull, 0x5555555555555555ull, -0x1111111111111111ull, 0x0101010101010101ull +0x1111111111111111ull, 0x0101010101010101ull, +0x0001000100010001ull, }; static bool trans_INVALID(DisasContext *s, arg_INVALID *a) -- 2.25.1
Re: [PATCH 1/7] pci: designware: add 64-bit viewport limit
On 13/07/2022 17:54, Ben Dooks wrote: Versions 4 and above add support for 64-bit viewport limit. Add support for the DESIGNWARE_PCIE_ATU_UPPER_LIMIT register where supported. Signed-off-by: Ben Dooks Whoops, just noticed this was my old ct address. --- hw/pci-host/designware.c | 22 +- include/hw/pci-host/designware.h | 2 +- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c index bde3a343a2..296f1b9760 100644 --- a/hw/pci-host/designware.c +++ b/hw/pci-host/designware.c @@ -54,6 +54,7 @@ #define DESIGNWARE_PCIE_ATU_BUS(x) (((x) >> 24) & 0xff) #define DESIGNWARE_PCIE_ATU_DEVFN(x) (((x) >> 16) & 0xff) #define DESIGNWARE_PCIE_ATU_UPPER_TARGET 0x91C +#define DESIGNWARE_PCIE_ATU_UPPER_LIMIT0x924 #define DESIGNWARE_PCIE_IRQ_MSI3 @@ -196,6 +197,10 @@ designware_pcie_root_config_read(PCIDevice *d, uint32_t address, int len) val = viewport->target >> 32; break; +case DESIGNWARE_PCIE_ATU_UPPER_LIMIT: +val = viewport->limit >> 32; +break; + case DESIGNWARE_PCIE_ATU_LIMIT: val = viewport->limit; break; @@ -269,7 +274,7 @@ static void designware_pcie_update_viewport(DesignwarePCIERoot *root, { const uint64_t target = viewport->target; const uint64_t base = viewport->base; -const uint64_t size = (uint64_t)viewport->limit - base + 1; +const uint64_t size = viewport->limit - base + 1; const bool enabled= viewport->cr[1] & DESIGNWARE_PCIE_ATU_ENABLE; MemoryRegion *current, *other; @@ -363,14 +368,21 @@ static void designware_pcie_root_config_write(PCIDevice *d, uint32_t address, viewport->target |= val; break; +case DESIGNWARE_PCIE_ATU_UPPER_LIMIT: +viewport->limit &= 0xUL; +viewport->limit |= (uint64_t)val << 32; +break; + case DESIGNWARE_PCIE_ATU_LIMIT: -viewport->limit = val; +viewport->limit = 0xULL; +viewport->limit |= val; break; case DESIGNWARE_PCIE_ATU_CR1: viewport->cr[0] = val; break; case DESIGNWARE_PCIE_ATU_CR2: +//printf("CR2: value %08x\n", val); viewport->cr[1] = val;
designware_pcie_update_viewport(root, viewport); break; @@ -429,7 +441,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp) viewport->inbound = true; viewport->base= 0xULL; viewport->target = 0xULL; -viewport->limit = UINT32_MAX; +viewport->limit = UINT64_MAX-1; viewport->cr[0] = DESIGNWARE_PCIE_ATU_TYPE_MEM; source = &host->pci.address_space_root; @@ -453,7 +465,7 @@ static void designware_pcie_root_realize(PCIDevice *dev, Error **errp) viewport->inbound = false; viewport->base= 0xULL; viewport->target = 0xULL; -viewport->limit = UINT32_MAX; +viewport->limit = UINT64_MAX-1; viewport->cr[0] = DESIGNWARE_PCIE_ATU_TYPE_MEM; destination = &host->pci.memory; @@ -560,7 +572,7 @@ static const VMStateDescription vmstate_designware_pcie_viewport = { .fields = (VMStateField[]) { VMSTATE_UINT64(base, DesignwarePCIEViewport), VMSTATE_UINT64(target, DesignwarePCIEViewport), -VMSTATE_UINT32(limit, DesignwarePCIEViewport), +VMSTATE_UINT64(limit, DesignwarePCIEViewport), VMSTATE_UINT32_ARRAY(cr, DesignwarePCIEViewport, 2), VMSTATE_END_OF_LIST() } diff --git a/include/hw/pci-host/designware.h b/include/hw/pci-host/designware.h index 6d9b51ae67..bd4dd49aec 100644 --- a/include/hw/pci-host/designware.h +++ b/include/hw/pci-host/designware.h @@ -44,7 +44,7 @@ typedef struct DesignwarePCIEViewport { uint64_t base; uint64_t target; -uint32_t limit; +uint64_t limit; uint32_t cr[2]; bool inbound;
Re: [PATCH] gpio: designware gpio driver
On 13/07/2022 18:20, Ben Dooks wrote: A model for the DesignWare GPIO (v1) block. Is there anyone else who should be reviewing these that was missed off the original list? I'd like to get an idea if there is any work to do. I've got a couple more drivers to submit and was waiting on feedback from this before getting these submitted. -- Ben
Re: [PATCH] gpio: designware gpio driver
On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: > > On 13/07/2022 18:20, Ben Dooks wrote: > > A model for the DesignWare GPIO (v1) block. > > Is there anyone else who should be reviewing these that > was missed off the original list? I'd like to get an idea > if there is any work to do. I've got a couple more drivers > to submit and was waiting on feedback from this before > getting these submitted. My overall feedback is: this isn't a pluggable device (PCI, etc), so what's it intended to be used by? Generally we don't take device models except when there's a board model that's using them. thanks -- PMM
Re: [RFC PATCH 09/12] vdpa: Extract vhost_vdpa_net_svq_add from vhost_vdpa_net_handle_ctrl_avail
On Mon, Jul 18, 2022 at 10:53 AM Jason Wang wrote: > > > 在 2022/7/16 19:34, Eugenio Pérez 写道: > > So we can reuse to inject state messages. > > > > Signed-off-by: Eugenio Pérez > > --- > > net/vhost-vdpa.c | 89 +++- > > 1 file changed, 51 insertions(+), 38 deletions(-) > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > > index 211bd0468b..aaae51a778 100644 > > --- a/net/vhost-vdpa.c > > +++ b/net/vhost-vdpa.c > > @@ -334,6 +334,54 @@ static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState > > *s, > > return true; > > } > > > > +static virtio_net_ctrl_ack vhost_vdpa_net_svq_add(VhostShadowVirtqueue > > *svq, > > + const struct iovec > > *dev_buffers) > > > The name should be tweaked since it is used only for cvq. > Right, I'll change. > > > +{ > > +/* in buffer used for device model */ > > +virtio_net_ctrl_ack status; > > +const struct iovec in = { > > +.iov_base = &status, > > +.iov_len = sizeof(status), > > +}; > > +size_t dev_written; > > +int r; > > +void *unused = (void *)1; > > + > > +r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, unused); > > +if (unlikely(r != 0)) { > > +if (unlikely(r == -ENOSPC)) { > > +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device > > queue\n", > > + __func__); > > +} > > +return VIRTIO_NET_ERR; > > +} > > + > > +/* > > + * We can poll here since we've had BQL from the time we sent the > > + * descriptor. Also, we need to take the answer before SVQ pulls by > > itself, > > + * when BQL is released > > + */ > > > This reminds me that, do we need a upper limit of the time on the > polling here. (Avoid taking BQL for too long time). > Sending a new version of rx filters here. But we have other parts where we can have BQL forever because we trust the device, like vring enable syscalls for example. Thanks!
Re: [PATCH] gpio: designware gpio driver
On 18/07/2022 11:15, Peter Maydell wrote: On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: On 13/07/2022 18:20, Ben Dooks wrote: A model for the DesignWare GPIO (v1) block. Is there anyone else who should be reviewing these that was missed off the original list? I'd like to get an idea if there is any work to do. I've got a couple more drivers to submit and was waiting on feedback from this before getting these submitted. My overall feedback is: this isn't a pluggable device (PCI, etc), so what's it intended to be used by? Generally we don't take device models except when there's a board model that's using them. I have a board file, but that's currently under NDA, so we're not allowed to release it at the moment. However we've done a few drivers which we'd like to get out of our development tree which other people might find useful (GPIO, SPI, I2C). -- Ben
[PATCH v4 00/19] vdpa net devices Rx filter change notification with Shadow VQ
The control virtqueue is used by the networking device to accept various commands from the driver. It's a must for supporting advanced configurations. An rx filtering event is issued by qemu when the device's MAC address has changed and the previous one has not been queried by external agents. Shadow VirtQueue (SVQ) already makes it possible to track the state of virtqueues, effectively intercepting them so qemu can track what regions of memory are dirty because of device action and need migration. However, this does not cover the networking device state seen by the driver, because of CVQ messages such as MAC address changes from the driver. This series uses the SVQ infrastructure to intercept the networking control messages used by the device. This way, qemu is able to update the VirtIONet device model and react to them. In particular, this series enables rx filter change notification. This is a prerequisite to achieving a net vdpa device with CVQ live migration. It's a stripped-down version of [1], with error paths checked and no migration enabled. The first nine patches reorder and clean up the code base so it is easier to apply the later ones. No functional change should be noticed from these changes. Patches 11 to 14 enable the SVQ API so that other parts of qemu can interact with it. In particular, they will be used by vhost-vdpa net to handle CVQ messages. Patches 15 to 17 enable the update of the virtio-net device model for each CVQ message acknowledged by the device. The last patches enable the x-svq parameter, forbidding device migration since the state is not restored in the destination's vdpa device yet. This will be added in a later series, using this work. Comments are welcome. v4: - Add a timeout to vhost_svq_poll to not hold BQL forever v3: - Replace SVQElement with SVQDescState v2: - (Comments from series [1]).
- Active poll for CVQ answer instead of relay on async used callback - Do not offer a new buffer to device but reuse qemu's - Use vhost_svq_add instead of not needed vhost_svq_inject - Delete used and detach callbacks, not needed anymore - Embed members of SVQElement in VirtQueueElement - Reuse the same buffers for all CVQ commands [1] https://patchwork.kernel.org/project/qemu-devel/cover/20220706184008.1649478-1-epere...@redhat.com/ Eugenio Pérez (19): vhost: move descriptor translation to vhost_svq_vring_write_descs virtio-net: Expose MAC_TABLE_ENTRIES virtio-net: Expose ctrl virtqueue logic vhost: Reorder vhost_svq_kick vhost: Move vhost_svq_kick call to vhost_svq_add vhost: Check for queue full at vhost_svq_add vhost: Decouple vhost_svq_add from VirtQueueElement vhost: Add SVQDescState vhost: Track number of descs in SVQDescState vhost: add vhost_svq_push_elem vhost: Expose vhost_svq_add vhost: add vhost_svq_poll vhost: Add svq avail_handler callback vdpa: Export vhost_vdpa_dma_map and unmap calls vdpa: manual forward CVQ buffers vdpa: Buffer CVQ support on shadow virtqueue vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs vdpa: Add device migration blocker vdpa: Add x-svq to NetdevVhostVDPAOptions qapi/net.json | 9 +- hw/virtio/vhost-shadow-virtqueue.h | 52 - include/hw/virtio/vhost-vdpa.h | 8 + include/hw/virtio/virtio-net.h | 7 + hw/net/virtio-net.c| 85 --- hw/virtio/vhost-shadow-virtqueue.c | 207 - hw/virtio/vhost-vdpa.c | 25 +- net/vhost-vdpa.c | 357 +++-- 8 files changed, 632 insertions(+), 118 deletions(-) -- 2.31.1
[PATCH v4 01/19] vhost: move descriptor translation to vhost_svq_vring_write_descs
It's done for both in and out descriptors so it's better placed here. Acked-by: Jason Wang Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 38 +- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 56c96ebd13..e2184a4481 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq, return true; } -static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, -const struct iovec *iovec, size_t num, -bool more_descs, bool write) +/** + * Write descriptors to SVQ vring + * + * @svq: The shadow virtqueue + * @sg: Cache for hwaddr + * @iovec: The iovec from the guest + * @num: iovec length + * @more_descs: True if more descriptors come in the chain + * @write: True if they are writeable descriptors + * + * Return true if success, false otherwise and print error. + */ +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, +const struct iovec *iovec, size_t num, +bool more_descs, bool write) { uint16_t i = svq->free_head, last = svq->free_head; unsigned n; uint16_t flags = write ? 
cpu_to_le16(VRING_DESC_F_WRITE) : 0; vring_desc_t *descs = svq->vring.desc; +bool ok; if (num == 0) { -return; +return true; +} + +ok = vhost_svq_translate_addr(svq, sg, iovec, num); +if (unlikely(!ok)) { +return false; } for (n = 0; n < num; n++) { @@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, } svq->free_head = le16_to_cpu(svq->desc_next[last]); +return true; } static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, @@ -169,21 +188,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, return false; } -ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num); +ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, + elem->in_num > 0, false); if (unlikely(!ok)) { return false; } -vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, -elem->in_num > 0, false); - -ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num); +ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, + true); if (unlikely(!ok)) { return false; } -vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true); - /* * Put the entry in the available array (but don't update avail->idx until * they do sync). -- 2.31.1
[PATCH v4 03/19] virtio-net: Expose ctrl virtqueue logic
This allows external vhost-net devices to modify the state of the VirtIO device model once the vhost-vdpa device has acknowledged the control commands. Signed-off-by: Eugenio Pérez --- include/hw/virtio/virtio-net.h | 4 ++ hw/net/virtio-net.c| 84 -- 2 files changed, 53 insertions(+), 35 deletions(-) diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index cce1c554f7..ef234ffe7e 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -221,6 +221,10 @@ struct VirtIONet { struct EBPFRSSContext ebpf_rss; }; +size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev, + const struct iovec *in_sg, unsigned in_num, + const struct iovec *out_sg, + unsigned out_num); void virtio_net_set_netclient_name(VirtIONet *n, const char *name, const char *type); diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index f83e96e4ce..dd0d056fde 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -1433,57 +1433,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd, return VIRTIO_NET_OK; } -static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev, + const struct iovec *in_sg, unsigned in_num, + const struct iovec *out_sg, + unsigned out_num) { VirtIONet *n = VIRTIO_NET(vdev); struct virtio_net_ctrl_hdr ctrl; virtio_net_ctrl_ack status = VIRTIO_NET_ERR; -VirtQueueElement *elem; size_t s; struct iovec *iov, *iov2; -unsigned int iov_cnt; + +if (iov_size(in_sg, in_num) < sizeof(status) || +iov_size(out_sg, out_num) < sizeof(ctrl)) { +virtio_error(vdev, "virtio-net ctrl missing headers"); +return 0; +} + +iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num); +s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl)); +iov_discard_front(&iov, &out_num, sizeof(ctrl)); +if (s != sizeof(ctrl)) { +status = VIRTIO_NET_ERR; +} else if (ctrl.class == VIRTIO_NET_CTRL_RX) { +status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == 
VIRTIO_NET_CTRL_MAC) { +status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { +status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { +status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_MQ) { +status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num); +} else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) { +status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num); +} + +s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status)); +assert(s == sizeof(status)); + +g_free(iov2); +return sizeof(status); +} + +static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +{ +VirtQueueElement *elem; for (;;) { +size_t written; elem = virtqueue_pop(vq, sizeof(VirtQueueElement)); if (!elem) { break; } -if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) || -iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) { -virtio_error(vdev, "virtio-net ctrl missing headers"); + +written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num, + elem->out_sg, elem->out_num); +if (written > 0) { +virtqueue_push(vq, elem, written); +virtio_notify(vdev, vq); +g_free(elem); +} else { virtqueue_detach_element(vq, elem, 0); g_free(elem); break; } - -iov_cnt = elem->out_num; -iov2 = iov = g_memdup2(elem->out_sg, - sizeof(struct iovec) * elem->out_num); -s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl)); -iov_discard_front(&iov, &iov_cnt, sizeof(ctrl)); -if (s != sizeof(ctrl)) { -status = VIRTIO_NET_ERR; -} else if (ctrl.class == VIRTIO_NET_CTRL_RX) { -status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) { -status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) { -status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) { 
-status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt); -} else if (ctrl.class == VIRTIO_NET_
[PATCH v4 11/19] vhost: Expose vhost_svq_add
This allows external parts of SVQ to forward custom buffers to the device. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 3 +++ hw/virtio/vhost-shadow-virtqueue.c | 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index d9fc1f1799..dd78f4bec2 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -86,6 +86,9 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp); void vhost_svq_push_elem(VhostShadowVirtqueue *svq, const VirtQueueElement *elem, uint32_t len); +int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, size_t in_num, + VirtQueueElement *elem); void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index b377e125e7..406a823c81 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -238,9 +238,9 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, - size_t out_num, const struct iovec *in_sg, - size_t in_num, VirtQueueElement *elem) +int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, size_t in_num, + VirtQueueElement *elem) { unsigned qemu_head; unsigned ndescs = in_num + out_num; -- 2.31.1
[PATCH v4 02/19] virtio-net: Expose MAC_TABLE_ENTRIES
vhost-vdpa control virtqueue needs to know the maximum entries supported by the virtio-net device, so we know if it is possible to apply the filter. Signed-off-by: Eugenio Pérez --- include/hw/virtio/virtio-net.h | 3 +++ hw/net/virtio-net.c| 1 - 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h index eb87032627..cce1c554f7 100644 --- a/include/hw/virtio/virtio-net.h +++ b/include/hw/virtio/virtio-net.h @@ -35,6 +35,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET) * and latency. */ #define TX_BURST 256 +/* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */ +#define MAC_TABLE_ENTRIES64 + typedef struct virtio_net_conf { uint32_t txtimer; diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 7ad948ee7c..f83e96e4ce 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -49,7 +49,6 @@ #define VIRTIO_NET_VM_VERSION11 -#define MAC_TABLE_ENTRIES64 #define MAX_VLAN(1 << 12) /* Per 802.1Q definition */ /* previously fixed value */ -- 2.31.1
[PATCH v4 15/19] vdpa: manual forward CVQ buffers
Do a simple forwarding of CVQ buffers, the same work SVQ could do but through callbacks. No functional change intended. Signed-off-by: Eugenio Pérez --- include/hw/virtio/vhost-vdpa.h | 3 ++ hw/virtio/vhost-vdpa.c | 3 +- net/vhost-vdpa.c | 58 ++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index 7214eb47dc..d85643 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -15,6 +15,7 @@ #include #include "hw/virtio/vhost-iova-tree.h" +#include "hw/virtio/vhost-shadow-virtqueue.h" #include "hw/virtio/virtio.h" #include "standard-headers/linux/vhost_types.h" @@ -35,6 +36,8 @@ typedef struct vhost_vdpa { /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; GPtrArray *shadow_vqs; +const VhostShadowVirtqueueOps *shadow_vq_ops; +void *shadow_vq_ops_opaque; struct vhost_dev *dev; VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; } VhostVDPA; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 96997210be..beaaa7049a 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -419,7 +419,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v, for (unsigned n = 0; n < hdev->nvqs; ++n) { g_autoptr(VhostShadowVirtqueue) svq; -svq = vhost_svq_new(v->iova_tree, NULL, NULL); +svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops, +v->shadow_vq_ops_opaque); if (unlikely(!svq)) { error_setg(errp, "Cannot create svq %u", n); return -1; diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index df1e69ee72..2e3b6b10d8 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -11,11 +11,14 @@ #include "qemu/osdep.h" #include "clients.h" +#include "hw/virtio/virtio-net.h" #include "net/vhost_net.h" #include "net/vhost-vdpa.h" #include "hw/virtio/vhost-vdpa.h" #include "qemu/config-file.h" #include "qemu/error-report.h" +#include "qemu/log.h" +#include "qemu/memalign.h" #include "qemu/option.h" #include "qapi/error.h" 
#include @@ -187,6 +190,57 @@ static NetClientInfo net_vhost_vdpa_info = { .check_peer_type = vhost_vdpa_check_peer_type, }; +/** + * Forward buffer for the moment. + */ +static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, +VirtQueueElement *elem, +void *opaque) +{ +unsigned int n = elem->out_num + elem->in_num; +g_autofree struct iovec *dev_buffers = g_new(struct iovec, n); +size_t in_len, dev_written; +virtio_net_ctrl_ack status = VIRTIO_NET_ERR; +int r; + +memcpy(dev_buffers, elem->out_sg, elem->out_num); +memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num); + +r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1], + elem->in_num, elem); +if (unlikely(r != 0)) { +if (unlikely(r == -ENOSPC)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n", + __func__); +} +goto out; +} + +/* + * We can poll here since we've had BQL from the time we sent the + * descriptor. Also, we need to take the answer before SVQ pulls by itself, + * when BQL is released + */ +dev_written = vhost_svq_poll(svq); +if (unlikely(dev_written < sizeof(status))) { +error_report("Insufficient written data (%zu)", dev_written); +} + +out: +in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, + sizeof(status)); +if (unlikely(in_len < sizeof(status))) { +error_report("Bad device CVQ written length"); +} +vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status))); +g_free(elem); +return r; +} + +static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = { +.avail_handler = vhost_vdpa_net_handle_ctrl_avail, +}; + static NetClientState *net_vhost_vdpa_init(NetClientState *peer, const char *device, const char *name, @@ -211,6 +265,10 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.device_fd = vdpa_device_fd; s->vhost_vdpa.index = queue_pair_index; +if (!is_datapath) { +s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops; +s->vhost_vdpa.shadow_vq_ops_opaque = s; +} ret = vhost_vdpa_add(nc, (void 
*)&s->vhost_vdpa, queue_pair_index, nvqs); if (ret) { qemu_del_net_client(nc); -- 2.31.1
[PATCH v4 04/19] vhost: Reorder vhost_svq_kick
Future code needs to call it from vhost_svq_add. No functional change intended. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index e2184a4481..fd1839cec5 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -215,6 +215,20 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, return true; } +static void vhost_svq_kick(VhostShadowVirtqueue *svq) +{ +/* + * We need to expose the available array entries before checking the used + * flags + */ +smp_mb(); +if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) { +return; +} + +event_notifier_set(&svq->hdev_kick); +} + /** * Add an element to a SVQ. * @@ -235,20 +249,6 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) return true; } -static void vhost_svq_kick(VhostShadowVirtqueue *svq) -{ -/* - * We need to expose the available array entries before checking the used - * flags - */ -smp_mb(); -if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) { -return; -} - -event_notifier_set(&svq->hdev_kick); -} - /** * Forward available buffers. * -- 2.31.1
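[The vhost_svq_kick() being reordered above implements the split-ring notification-suppression protocol: expose the avail entries, then a full barrier, then only kick if the device did not set VRING_USED_F_NO_NOTIFY. A standalone sketch of that logic follows; MiniVring and all names here are illustrative stand-ins, not QEMU types.]

```c
#include <assert.h>
#include <stdint.h>

#define MINI_USED_F_NO_NOTIFY 1

typedef struct MiniVring {
    uint16_t used_flags; /* stands in for svq->vring.used->flags */
    int kicks;           /* stands in for event_notifier_set(&svq->hdev_kick) */
} MiniVring;

static void mini_svq_kick(MiniVring *v)
{
    /*
     * We need to expose the available array entries before checking the
     * used flags; smp_mb() in QEMU proper.
     */
    __sync_synchronize();
    if (v->used_flags & MINI_USED_F_NO_NOTIFY) {
        /* Device is polling; skip the expensive eventfd write. */
        return;
    }
    v->kicks++;
}
```

The barrier pairs with the device reading the avail ring after clearing or setting the flag; without it, the flag check could be reordered before the avail-ring store.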
[PATCH v4 06/19] vhost: Check for queue full at vhost_svq_add
The series needs to expose vhost_svq_add with full functionality, including the check for a full queue. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 59 +- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index e5a4a62daa..aee9891a67 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -233,21 +233,29 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * Add an element to a SVQ. * * The caller must check that there is enough slots for the new element. It - * takes ownership of the element: In case of failure, it is free and the SVQ - * is considered broken. + * takes ownership of the element: In case of failure not ENOSPC, it is free. + * + * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) +static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) { unsigned qemu_head; -bool ok = vhost_svq_add_split(svq, elem, &qemu_head); +unsigned ndescs = elem->in_num + elem->out_num; +bool ok; + +if (unlikely(ndescs > vhost_svq_available_slots(svq))) { +return -ENOSPC; +} + +ok = vhost_svq_add_split(svq, elem, &qemu_head); if (unlikely(!ok)) { g_free(elem); -return false; +return -EINVAL; } svq->ring_id_maps[qemu_head] = elem; vhost_svq_kick(svq); -return true; +return 0; } /** @@ -274,7 +282,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) while (true) { VirtQueueElement *elem; -bool ok; +int r; if (svq->next_guest_avail_elem) { elem = g_steal_pointer(&svq->next_guest_avail_elem); @@ -286,25 +294,24 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) { -/* - * This condition is possible since a contiguous buffer in GPA - * does not imply a contiguous buffer in qemu's VA - * scatter-gather segments. 
If that happens, the buffer exposed - * to the device needs to be a chain of descriptors at this - * moment. - * - * SVQ cannot hold more available buffers if we are here: - * queue the current guest descriptor and ignore further kicks - * until some elements are used. - */ -svq->next_guest_avail_elem = elem; -return; -} - -ok = vhost_svq_add(svq, elem); -if (unlikely(!ok)) { -/* VQ is broken, just return and ignore any other kicks */ +r = vhost_svq_add(svq, elem); +if (unlikely(r != 0)) { +if (r == -ENOSPC) { +/* + * This condition is possible since a contiguous buffer in + * GPA does not imply a contiguous buffer in qemu's VA + * scatter-gather segments. If that happens, the buffer + * exposed to the device needs to be a chain of descriptors + * at this moment. + * + * SVQ cannot hold more available buffers if we are here: + * queue the current guest descriptor and ignore kicks + * until some elements are used. + */ +svq->next_guest_avail_elem = elem; +} + +/* VQ is full or broken, just return and ignore kicks */ return; } } -- 2.31.1
[PATCH v4 08/19] vhost: Add SVQDescState
This will allow SVQ to add context to the different queue elements. This patch only stores the actual element; no functional change intended. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 8 ++-- hw/virtio/vhost-shadow-virtqueue.c | 16 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index c132c994e9..d646c35054 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -15,6 +15,10 @@ #include "standard-headers/linux/vhost_types.h" #include "hw/virtio/vhost-iova-tree.h" +typedef struct SVQDescState { +VirtQueueElement *elem; +} SVQDescState; + /* Shadow virtqueue to relay notifications */ typedef struct VhostShadowVirtqueue { /* Shadow vring */ @@ -47,8 +51,8 @@ typedef struct VhostShadowVirtqueue { /* IOVA mapping */ VhostIOVATree *iova_tree; -/* Map for use the guest's descriptors */ -VirtQueueElement **ring_id_maps; +/* SVQ vring descriptors state */ +SVQDescState *desc_state; /* Next VirtQueue element that guest made available */ VirtQueueElement *next_guest_avail_elem; diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index b005a457c6..d12f5afffb 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -256,7 +256,7 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, return -EINVAL; } -svq->ring_id_maps[qemu_head] = elem; +svq->desc_state[qemu_head].elem = elem; vhost_svq_kick(svq); return 0; } @@ -410,21 +410,21 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return NULL; } -if (unlikely(!svq->ring_id_maps[used_elem.id])) { +if (unlikely(!svq->desc_state[used_elem.id].elem)) { qemu_log_mask(LOG_GUEST_ERROR, "Device %s says index %u is used, but it was not available", svq->vdev->name, used_elem.id); return NULL; } -num = svq->ring_id_maps[used_elem.id]->in_num + - 
svq->ring_id_maps[used_elem.id]->out_num; +num = svq->desc_state[used_elem.id].elem->in_num + + svq->desc_state[used_elem.id].elem->out_num; last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id); svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; *len = used_elem.len; -return g_steal_pointer(&svq->ring_id_maps[used_elem.id]); +return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } static void vhost_svq_flush(VhostShadowVirtqueue *svq, @@ -594,7 +594,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, memset(svq->vring.desc, 0, driver_size); svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size); memset(svq->vring.used, 0, device_size); -svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num); +svq->desc_state = g_new0(SVQDescState, svq->vring.num); svq->desc_next = g_new0(uint16_t, svq->vring.num); for (unsigned i = 0; i < svq->vring.num - 1; i++) { svq->desc_next[i] = cpu_to_le16(i + 1); @@ -619,7 +619,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) for (unsigned i = 0; i < svq->vring.num; ++i) { g_autofree VirtQueueElement *elem = NULL; -elem = g_steal_pointer(&svq->ring_id_maps[i]); +elem = g_steal_pointer(&svq->desc_state[i].elem); if (elem) { virtqueue_detach_element(svq->vq, elem, 0); } @@ -631,7 +631,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) } svq->vq = NULL; g_free(svq->desc_next); -g_free(svq->ring_id_maps); +g_free(svq->desc_state); qemu_vfree(svq->vring.desc); qemu_vfree(svq->vring.used); } -- 2.31.1
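[The vhost_svq_get_buf() hunk above relies on the descriptor free list being threaded through desc_next[]: the last descriptor of a used chain is linked to the old free_head, and the chain head becomes the new free_head. A self-contained sketch of that bookkeeping (array size and names are illustrative):]

```c
#include <assert.h>
#include <stdint.h>

#define MINI_VRING_NUM 8

static uint16_t desc_next[MINI_VRING_NUM];
static uint16_t free_head;

/* Walk num - 1 links from id, as vhost_svq_last_desc_of_chain() does. */
static uint16_t mini_last_desc_of_chain(uint16_t num, uint16_t id)
{
    while (--num) {
        id = desc_next[id];
    }
    return id;
}

/* Prepend a used chain of num descriptors starting at id to the free list. */
static void mini_return_chain(uint16_t num, uint16_t id)
{
    uint16_t last = mini_last_desc_of_chain(num, id);
    desc_next[last] = free_head;
    free_head = id;
}
```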
[PATCH v4 12/19] vhost: add vhost_svq_poll
It allows the Shadow Control VirtQueue to wait for the device to use the available buffers. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 1 + hw/virtio/vhost-shadow-virtqueue.c | 27 +++ 2 files changed, 28 insertions(+) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index dd78f4bec2..cf442f7dea 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -89,6 +89,7 @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq, int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, size_t out_num, const struct iovec *in_sg, size_t in_num, VirtQueueElement *elem); +size_t vhost_svq_poll(VhostShadowVirtqueue *svq); void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 406a823c81..3c26781cf7 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -484,6 +484,33 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq, } while (!vhost_svq_enable_notification(svq)); } +/** + * Poll the SVQ for one device used buffer. + * + * This function race with main event loop SVQ polling, so extra + * synchronization is needed. + * + * Return the length written by the device. + */ +size_t vhost_svq_poll(VhostShadowVirtqueue *svq) +{ +int64_t start_us = g_get_monotonic_time(); +do { +uint32_t len; +VirtQueueElement *elem = vhost_svq_get_buf(svq, &len); +if (elem) { +return len; +} + +if (unlikely(g_get_monotonic_time() - start_us > 10e6)) { +return 0; +} + +/* Make sure we read new used_idx */ +smp_rmb(); +} while (true); +} + /** * Forward used buffers. * -- 2.31.1
[PATCH v4 16/19] vdpa: Buffer CVQ support on shadow virtqueue
Introduce the control virtqueue support for vDPA shadow virtqueue. This is needed for advanced networking features like rx filtering. Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid TOCTOU with the guest's or device's memory every time there is a device model change. Otherwise, the guest could change the memory content in the time between qemu and the device read it. To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is implemented. If the virtio-net driver changes MAC the virtio-net device model will be updated with the new one, and a rx filtering change event will be raised. More cvq commands could be added here straightforwardly but they have not been tested. Signed-off-by: Eugenio Pérez --- net/vhost-vdpa.c | 211 +-- 1 file changed, 204 insertions(+), 7 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 2e3b6b10d8..3915b148c4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -33,6 +33,9 @@ typedef struct VhostVDPAState { NetClientState nc; struct vhost_vdpa vhost_vdpa; VHostNetState *vhost_net; + +/* Control commands shadow buffers */ +void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer; bool started; } VhostVDPAState; @@ -131,6 +134,8 @@ static void vhost_vdpa_cleanup(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); +qemu_vfree(s->cvq_cmd_out_buffer); +qemu_vfree(s->cvq_cmd_in_buffer); if (s->vhost_net) { vhost_net_cleanup(s->vhost_net); g_free(s->vhost_net); @@ -190,24 +195,191 @@ static NetClientInfo net_vhost_vdpa_info = { .check_peer_type = vhost_vdpa_check_peer_type, }; +static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr) +{ +VhostIOVATree *tree = v->iova_tree; +DMAMap needle = { +/* + * No need to specify size or to look for more translations since + * this contiguous chunk was allocated by us. 
+ */ +.translated_addr = (hwaddr)(uintptr_t)addr, +}; +const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle); +int r; + +if (unlikely(!map)) { +error_report("Cannot locate expected map"); +return; +} + +r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1); +if (unlikely(r != 0)) { +error_report("Device cannot unmap: %s(%d)", g_strerror(r), r); +} + +vhost_iova_tree_remove(tree, map); +} + +static size_t vhost_vdpa_net_cvq_cmd_len(void) +{ +/* + * MAC_TABLE_SET is the ctrl command that produces the longer out buffer. + * In buffer is always 1 byte, so it should fit here + */ +return sizeof(struct virtio_net_ctrl_hdr) + + 2 * sizeof(struct virtio_net_ctrl_mac) + + MAC_TABLE_ENTRIES * ETH_ALEN; +} + +static size_t vhost_vdpa_net_cvq_cmd_page_len(void) +{ +return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size()); +} + +/** Copy and map a guest buffer. */ +static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v, + const struct iovec *out_data, + size_t out_num, size_t data_len, void *buf, + size_t *written, bool write) +{ +DMAMap map = {}; +int r; + +if (unlikely(!data_len)) { +qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid legnth of %s buffer\n", + __func__, write ? "in" : "out"); +return false; +} + +*written = iov_to_buf(out_data, out_num, 0, buf, data_len); +map.translated_addr = (hwaddr)(uintptr_t)buf; +map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1; +map.perm = write ? IOMMU_RW : IOMMU_RO, +r = vhost_iova_tree_map_alloc(v->iova_tree, &map); +if (unlikely(r != IOVA_OK)) { +error_report("Cannot map injected element"); +return false; +} + +r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf, + !write); +if (unlikely(r < 0)) { +goto dma_map_err; +} + +return true; + +dma_map_err: +vhost_iova_tree_remove(v->iova_tree, &map); +return false; +} + /** - * Forward buffer for the moment. 
+ * Copy the guest element into a dedicated buffer suitable to be sent to NIC + * + * @iov: [0] is the out buffer, [1] is the in one + */ +static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s, +VirtQueueElement *elem, +struct iovec *iov) +{ +size_t in_copied; +bool ok; + +iov[0].iov_base = s->cvq_cmd_out_buffer; +ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num, +vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base, +&iov[0].iov_len, false); +if (unlikely(!ok)) { +return false; +} + +iov[1].iov_base = s->cvq_cmd_in_buffer; +ok = vhost_vdpa_cvq_map_buf(&s->
[PATCH v4 05/19] vhost: Move vhost_svq_kick call to vhost_svq_add
The series needs to expose vhost_svq_add with full functionality, including the kick. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index fd1839cec5..e5a4a62daa 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -246,6 +246,7 @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) } svq->ring_id_maps[qemu_head] = elem; +vhost_svq_kick(svq); return true; } @@ -306,7 +307,6 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) /* VQ is broken, just return and ignore any other kicks */ return; } -vhost_svq_kick(svq); } virtio_queue_set_notification(svq->vq, true); -- 2.31.1
[PATCH v4 10/19] vhost: add vhost_svq_push_elem
This function allows external SVQ users to return guest's available buffers. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 3 +++ hw/virtio/vhost-shadow-virtqueue.c | 16 2 files changed, 19 insertions(+) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index 5c7e7cbab6..d9fc1f1799 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -84,6 +84,9 @@ typedef struct VhostShadowVirtqueue { bool vhost_svq_valid_features(uint64_t features, Error **errp); +void vhost_svq_push_elem(VhostShadowVirtqueue *svq, + const VirtQueueElement *elem, uint32_t len); + void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd); void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd); void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq, diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index ae5bd6efa8..b377e125e7 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -427,6 +427,22 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return g_steal_pointer(&svq->desc_state[used_elem.id].elem); } +/** + * Push an element to SVQ, returning it to the guest. + */ +void vhost_svq_push_elem(VhostShadowVirtqueue *svq, + const VirtQueueElement *elem, uint32_t len) +{ +virtqueue_push(svq->vq, elem, len); +if (svq->next_guest_avail_elem) { +/* + * Avail ring was full when vhost_svq_flush was called, so it's a + * good moment to make more descriptors available if possible. + */ +vhost_handle_guest_kick(svq); +} +} + static void vhost_svq_flush(VhostShadowVirtqueue *svq, bool check_for_avail_queue) { -- 2.31.1
[PATCH v4 18/19] vdpa: Add device migration blocker
Since the vhost-vdpa device is exposing _F_LOG, add a migration blocker if it uses CVQ. Signed-off-by: Eugenio Pérez --- include/hw/virtio/vhost-vdpa.h | 1 + hw/virtio/vhost-vdpa.c | 14 ++ 2 files changed, 15 insertions(+) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index d85643..d10a89303e 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -35,6 +35,7 @@ typedef struct vhost_vdpa { bool shadow_vqs_enabled; /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; +Error *migration_blocker; GPtrArray *shadow_vqs; const VhostShadowVirtqueueOps *shadow_vq_ops; void *shadow_vq_ops_opaque; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index beaaa7049a..795ed5a049 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -20,6 +20,7 @@ #include "hw/virtio/vhost-shadow-virtqueue.h" #include "hw/virtio/vhost-vdpa.h" #include "exec/address-spaces.h" +#include "migration/blocker.h" #include "qemu/cutils.h" #include "qemu/main-loop.h" #include "cpu.h" @@ -1022,6 +1023,13 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev) return true; } +if (v->migration_blocker) { +int r = migrate_add_blocker(v->migration_blocker, &err); +if (unlikely(r < 0)) { +goto err_migration_blocker; +} +} + for (i = 0; i < v->shadow_vqs->len; ++i) { VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i); VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i); @@ -1064,6 +1072,9 @@ err: vhost_svq_stop(svq); } +err_migration_blocker: +error_reportf_err(err, "Cannot setup SVQ %u: ", i); + return false; } @@ -1083,6 +1094,9 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev) } } +if (v->migration_blocker) { +migrate_del_blocker(v->migration_blocker); +} return true; } -- 2.31.1
[PATCH v4 14/19] vdpa: Export vhost_vdpa_dma_map and unmap calls
Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from the guest that could set a different state in qemu device model and vdpa device. To do so, it needs to be able to map these new buffers to the device. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- include/hw/virtio/vhost-vdpa.h | 4 hw/virtio/vhost-vdpa.c | 7 +++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index a29dbb3f53..7214eb47dc 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -39,4 +39,8 @@ typedef struct vhost_vdpa { VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX]; } VhostVDPA; +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, + void *vaddr, bool readonly); +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size); + #endif diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 0b13e98471..96997210be 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -71,8 +71,8 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section, return false; } -static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, - void *vaddr, bool readonly) +int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, + void *vaddr, bool readonly) { struct vhost_msg_v2 msg = {}; int fd = v->device_fd; @@ -97,8 +97,7 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size, return ret; } -static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, -hwaddr size) +int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size) { struct vhost_msg_v2 msg = {}; int fd = v->device_fd; -- 2.31.1
[PATCH v4 13/19] vhost: Add svq avail_handler callback
This allows external handlers to be aware of new buffers that the guest places in the virtqueue. When this callback is defined the ownership of the guest's virtqueue element is transferred to the callback. This means that if the user wants to forward the descriptor it needs to manually inject it. The callback is also free to process the command by itself and use the element with svq_push. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 31 +- hw/virtio/vhost-shadow-virtqueue.c | 14 -- hw/virtio/vhost-vdpa.c | 3 ++- 3 files changed, 44 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index cf442f7dea..d04c34a589 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -25,6 +25,27 @@ typedef struct SVQDescState { unsigned int ndescs; } SVQDescState; +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue; + +/** + * Callback to handle an avail buffer. + * + * @svq: Shadow virtqueue + * @elem: Element placed in the queue by the guest + * @vq_callback_opaque: Opaque + * + * Returns 0 if the vq is running as expected. + * + * Note that ownership of elem is transferred to the callback. 
+ */ +typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq, + VirtQueueElement *elem, + void *vq_callback_opaque); + +typedef struct VhostShadowVirtqueueOps { +VirtQueueAvailCallback avail_handler; +} VhostShadowVirtqueueOps; + /* Shadow virtqueue to relay notifications */ typedef struct VhostShadowVirtqueue { /* Shadow vring */ @@ -69,6 +90,12 @@ typedef struct VhostShadowVirtqueue { */ uint16_t *desc_next; +/* Caller callbacks */ +const VhostShadowVirtqueueOps *ops; + +/* Caller callbacks opaque */ +void *ops_opaque; + /* Next head to expose to the device */ uint16_t shadow_avail_idx; @@ -102,7 +129,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev, VirtQueue *vq); void vhost_svq_stop(VhostShadowVirtqueue *svq); -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree); +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree, +const VhostShadowVirtqueueOps *ops, +void *ops_opaque); void vhost_svq_free(gpointer vq); G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free); diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 3c26781cf7..1646d3cb40 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -306,7 +306,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -r = vhost_svq_add_element(svq, elem); +if (svq->ops) { +r = svq->ops->avail_handler(svq, elem, svq->ops_opaque); +} else { +r = vhost_svq_add_element(svq, elem); +} if (unlikely(r != 0)) { if (r == -ENOSPC) { /* @@ -684,12 +688,16 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) * shadow methods and file descriptors. * * @iova_tree: Tree to perform descriptors translations + * @ops: SVQ owner callbacks + * @ops_opaque: ops opaque pointer * * Returns the new virtqueue or NULL. * * In case of error, reason is reported through error_report. 
*/ -VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree) +VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree, +const VhostShadowVirtqueueOps *ops, +void *ops_opaque) { g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1); int r; @@ -711,6 +719,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree) event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND); event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call); svq->iova_tree = iova_tree; +svq->ops = ops; +svq->ops_opaque = ops_opaque; return g_steal_pointer(&svq); err_init_hdev_call: diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 66f054a12c..0b13e98471 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -418,8 +418,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v, shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free); for (unsigned n = 0; n < hdev->nvqs; ++n) { -g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree); +g_autoptr(VhostShadowVirtqueue) svq; +svq = vhost_svq_new(v->iova_tree, NULL, NULL); if (unlikely(!svq)) { error_setg(errp, "Cannot create svq %u", n); return -1; -- 2.31.1
[PATCH v4 19/19] vdpa: Add x-svq to NetdevVhostVDPAOptions
Finally offering the possibility to enable SVQ from the command line. Signed-off-by: Eugenio Pérez Acked-by: Markus Armbruster --- qapi/net.json| 9 +- net/vhost-vdpa.c | 72 ++-- 2 files changed, 77 insertions(+), 4 deletions(-) diff --git a/qapi/net.json b/qapi/net.json index 9af11e9a3b..75ba2cb989 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -445,12 +445,19 @@ # @queues: number of queues to be created for multiqueue vhost-vdpa # (default: 1) # +# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1) +# (default: false) +# +# Features: +# @unstable: Member @x-svq is experimental. +# # Since: 5.1 ## { 'struct': 'NetdevVhostVDPAOptions', 'data': { '*vhostdev': 'str', -'*queues': 'int' } } +'*queues': 'int', +'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } } ## # @NetdevVmnetHostOptions: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 0afa60bb51..986e6414b4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -75,6 +75,28 @@ const int vdpa_feature_bits[] = { VHOST_INVALID_FEATURE_BIT }; +/** Supported device specific feature bits with SVQ */ +static const uint64_t vdpa_svq_device_features = +BIT_ULL(VIRTIO_NET_F_CSUM) | +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | +BIT_ULL(VIRTIO_NET_F_MTU) | +BIT_ULL(VIRTIO_NET_F_MAC) | +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) | +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) | +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) | +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) | +BIT_ULL(VIRTIO_NET_F_HOST_ECN) | +BIT_ULL(VIRTIO_NET_F_HOST_UFO) | +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | +BIT_ULL(VIRTIO_NET_F_STATUS) | +BIT_ULL(VIRTIO_NET_F_CTRL_VQ) | +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | +BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) | +BIT_ULL(VIRTIO_NET_F_RSC_EXT) | +BIT_ULL(VIRTIO_NET_F_STANDBY); + VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc) { VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); @@ -133,9 +155,13 @@ err_init: static void vhost_vdpa_cleanup(NetClientState *nc) { 
VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); +struct vhost_dev *dev = &s->vhost_net->dev; qemu_vfree(s->cvq_cmd_out_buffer); qemu_vfree(s->cvq_cmd_in_buffer); +if (dev->vq_index + dev->nvqs == dev->vq_index_end) { +g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete); +} if (s->vhost_net) { vhost_net_cleanup(s->vhost_net); g_free(s->vhost_net); @@ -437,7 +463,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, int vdpa_device_fd, int queue_pair_index, int nvqs, - bool is_datapath) + bool is_datapath, + bool svq, + VhostIOVATree *iova_tree) { NetClientState *nc = NULL; VhostVDPAState *s; @@ -455,6 +483,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.device_fd = vdpa_device_fd; s->vhost_vdpa.index = queue_pair_index; +s->vhost_vdpa.shadow_vqs_enabled = svq; +s->vhost_vdpa.iova_tree = iova_tree; if (!is_datapath) { s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(), vhost_vdpa_net_cvq_cmd_page_len()); @@ -465,6 +495,8 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops; s->vhost_vdpa.shadow_vq_ops_opaque = s; +error_setg(&s->vhost_vdpa.migration_blocker, + "Migration disabled: vhost-vdpa uses CVQ."); } ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs); if (ret) { @@ -474,6 +506,14 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, return nc; } +static int vhost_vdpa_get_iova_range(int fd, + struct vhost_vdpa_iova_range *iova_range) +{ +int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range); + +return ret < 0 ? 
-errno : 0; +} + static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp) { int ret = ioctl(fd, VHOST_GET_FEATURES, features); @@ -524,6 +564,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, uint64_t features; int vdpa_device_fd; g_autofree NetClientState **ncs = NULL; +g_autoptr(VhostIOVATree) iova_tree = NULL; NetClientState *nc; int queue_pairs, r, i, has_cvq = 0; @@ -551,22 +592,45 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, return queue_pairs; } +if (
[PATCH v4 07/19] vhost: Decouple vhost_svq_add from VirtQueueElement
VirtQueueElement comes from the guest, but we're heading SVQ to be able to modify the element presented to the device without the guest's knowledge. To do so, make SVQ accept sg buffers directly, instead of using VirtQueueElement. Add vhost_svq_add_element to maintain element convenience. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 33 -- 1 file changed, 22 insertions(+), 11 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index aee9891a67..b005a457c6 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -172,30 +172,31 @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg, } static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, -VirtQueueElement *elem, unsigned *head) +const struct iovec *out_sg, size_t out_num, +const struct iovec *in_sg, size_t in_num, +unsigned *head) { unsigned avail_idx; vring_avail_t *avail = svq->vring.avail; bool ok; -g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num)); +g_autofree hwaddr *sgs = g_new(hwaddr, MAX(out_num, in_num)); *head = svq->free_head; /* We need some descriptors here */ -if (unlikely(!elem->out_num && !elem->in_num)) { +if (unlikely(!out_num && !in_num)) { qemu_log_mask(LOG_GUEST_ERROR, "Guest provided element with no descriptors"); return false; } -ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num, - elem->in_num > 0, false); +ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0, + false); if (unlikely(!ok)) { return false; } -ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, - true); +ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true); if (unlikely(!ok)) { return false; } @@ -237,17 +238,19 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq) * * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full */ -static int 
vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) +static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, + size_t out_num, const struct iovec *in_sg, + size_t in_num, VirtQueueElement *elem) { unsigned qemu_head; -unsigned ndescs = elem->in_num + elem->out_num; +unsigned ndescs = in_num + out_num; bool ok; if (unlikely(ndescs > vhost_svq_available_slots(svq))) { return -ENOSPC; } -ok = vhost_svq_add_split(svq, elem, &qemu_head); +ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head); if (unlikely(!ok)) { g_free(elem); return -EINVAL; @@ -258,6 +261,14 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem) return 0; } +/* Convenience wrapper to add a guest's element to SVQ */ +static int vhost_svq_add_element(VhostShadowVirtqueue *svq, + VirtQueueElement *elem) +{ +return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg, + elem->in_num, elem); +} + /** * Forward available buffers. * @@ -294,7 +305,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq) break; } -r = vhost_svq_add(svq, elem); +r = vhost_svq_add_element(svq, elem); if (unlikely(r != 0)) { if (r == -ENOSPC) { /* -- 2.31.1
[PATCH v4 17/19] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
Knowing the device features is needed for CVQ SVQ, so SVQ knows whether it can handle all commands or not. Extract this from vhost_vdpa_get_max_queue_pairs so we can reuse it. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang --- net/vhost-vdpa.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 3915b148c4..0afa60bb51 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -474,20 +474,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer, return nc; } -static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp) +static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp) +{ +int ret = ioctl(fd, VHOST_GET_FEATURES, features); +if (unlikely(ret < 0)) { +error_setg_errno(errp, errno, + "Fail to query features from vhost-vDPA device"); +} +return ret; +} + +static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features, + int *has_cvq, Error **errp) { unsigned long config_size = offsetof(struct vhost_vdpa_config, buf); g_autofree struct vhost_vdpa_config *config = NULL; __virtio16 *max_queue_pairs; -uint64_t features; int ret; -ret = ioctl(fd, VHOST_GET_FEATURES, &features); -if (ret) { -error_setg(errp, "Fail to query features from vhost-vDPA device"); -return ret; -} - if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) { *has_cvq = 1; } else { @@ -517,10 +521,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp) { const NetdevVhostVDPAOptions *opts; +uint64_t features; int vdpa_device_fd; g_autofree NetClientState **ncs = NULL; NetClientState *nc; -int queue_pairs, i, has_cvq = 0; +int queue_pairs, r, i, has_cvq = 0; assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA); opts = &netdev->u.vhost_vdpa; @@ -534,7 +539,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, return -errno; } -queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
errp); +if (unlikely(r < 0)) { +return r; +} + +queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features, &has_cvq, errp); if (queue_pairs < 0) { qemu_close(vdpa_device_fd); -- 2.31.1
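The shape of the refactor — pull the one-off feature query into its own helper and thread the cached value through to the former caller — looks roughly like this. The fake query function below is only a placeholder for the VHOST_GET_FEATURES ioctl:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for ioctl(fd, VHOST_GET_FEATURES, ...); not the real call. */
static int toy_query_features(int fd, uint64_t *features)
{
    if (fd < 0) {
        return -1;              /* query failed */
    }
    *features = 1ULL << 3;      /* pretend the "has CVQ" bit is set */
    return 0;
}

/* Extracted helper: single place that performs (and reports) the query. */
static int toy_get_features(int fd, uint64_t *features)
{
    return toy_query_features(fd, features);
}

/* Former caller now receives the features instead of querying again. */
static int toy_max_queue_pairs(uint64_t features, int *has_cvq)
{
    *has_cvq = (features & (1ULL << 3)) ? 1 : 0;
    return 1;                   /* pretend: one queue pair */
}
```

The benefit is the one the commit message states: the feature word is queried once, and any other consumer (here, CVQ SVQ setup) reuses the same value.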
Re: [PATCH] gpio: designware gpio driver
On Mon, 18 Jul 2022 at 11:25, Ben Dooks wrote: > > On 18/07/2022 11:15, Peter Maydell wrote: > > On Mon, 18 Jul 2022 at 11:05, Ben Dooks wrote: > >> > >> On 13/07/2022 18:20, Ben Dooks wrote: > >>> A model for the DesignWare GPIO (v1) block. > >> > >> Is there anyone else who should be reviewing these that > >> was missed off the original list? I'd like to get an idea > >> if there is any work to do. I've got a couple more drivers > >> to submit and was waiting on feedback from this before > >> getting these submitted. > > > > > > My overall feedback is: this isn't a pluggable device (PCI, etc), > > so what's it intended to be used by? Generally we don't take > > device models except when there's a board model that's using them. > > I have a board file, but that's currently under NDA, so we're not > allowed to release it at the moment. However we've done a few drivers > which we'd like to get out of our development tree which other people > might find useful (GPIO, SPI, I2C). As I say, we don't really accept those, because we have no way of testing them upstream unless they're used in a board file: they're dead code from our point of view. When you have a board model you're ready to submit you can send them in the same patchseries as the board model. thanks -- PMM
[PATCH v4 09/19] vhost: Track number of descs in SVQDescState
A guest's buffer that is contiguous in GPA may need multiple descriptors in qemu's VA, so SVQ should track its length separately. Signed-off-by: Eugenio Pérez --- hw/virtio/vhost-shadow-virtqueue.h | 6 ++ hw/virtio/vhost-shadow-virtqueue.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h index d646c35054..5c7e7cbab6 100644 --- a/hw/virtio/vhost-shadow-virtqueue.h +++ b/hw/virtio/vhost-shadow-virtqueue.h @@ -17,6 +17,12 @@ typedef struct SVQDescState { VirtQueueElement *elem; + +/* + * Number of descriptors exposed to the device. May or may not match + * guest's + */ +unsigned int ndescs; } SVQDescState; /* Shadow virtqueue to relay notifications */ diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index d12f5afffb..ae5bd6efa8 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -257,6 +257,7 @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg, } svq->desc_state[qemu_head].elem = elem; +svq->desc_state[qemu_head].ndescs = ndescs; vhost_svq_kick(svq); return 0; } @@ -417,8 +418,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, return NULL; } -num = svq->desc_state[used_elem.id].elem->in_num + - svq->desc_state[used_elem.id].elem->out_num; +num = svq->desc_state[used_elem.id].ndescs; last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id); svq->desc_next[last_used_chain] = svq->free_head; svq->free_head = used_elem.id; -- 2.31.1
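The point of the new field is that the number of device-facing descriptors can exceed the element's own sg count, so the free path must use the recorded value rather than recounting. A minimal sketch with toy names (not QEMU's types):

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the patched SVQDescState idea: remember how many descriptors
 * the entry actually occupied on the device-facing ring. */
typedef struct ToyDescState {
    size_t elem_sg_count;  /* entries in the guest's element */
    unsigned int ndescs;   /* descriptors exposed to the device */
} ToyDescState;

static unsigned int toy_descs_to_free(const ToyDescState *s)
{
    /* Recounting the element's sg entries would under-free whenever one
     * guest buffer was split into several descriptors in qemu's VA. */
    return s->ndescs;
}
```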
Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal
Peter Maydell writes: > On Mon, 11 Jul 2022 at 12:09, Daniel P. Berrangé wrote: >> >> On Mon, Jul 11, 2022 at 11:32:35AM +0100, Peter Maydell wrote: >> > I'm pretty sure that nothing needs sparse array elements like >> > that. The major reason for the len-PROP field is an implementation >> > one: because there is currently no way for a QOM object to >> > say "call this method if somebody tries to set a non-existent >> > property", the way array properties work is that the 'set' >> > method for the len-PROP property is the place where we then >> > add the PROP[0], PROP[1], ... properties. >> >> Ahhh, I see what you mean. I totally missed this subtle detail. >> >> IIUC, there's essentially no such thing as array properties >> in QOM. 'prop[0]', 'prop[1]', 'prop[2]', etc are all simply >> scalar properties from QOM's, that just happen to follow a >> common naming scheme, but QOM doesn't care about that. > > Yeah. The exception is the bit Markus pointed out where somebody > has later added special case support for setting "prop[*]" to > mean "whatever the next index is". > >> > If we either had a "call this for any property set/get attempt >> > where there is no specific method set" or else had array >> > properties supported by the core QOM code, we could avoid >> > having to set len-PROP first. >> >> Techically arrays are already supported at the core QOM level, because >> you can use any QAPI type as a property. The authz/list.c object >> has a 'rules' property that is an array of QAuthzListRule objects: >> >> { 'struct': 'AuthZListProperties', >> 'data': { '*policy': 'QAuthZListPolicy', >> '*rules': ['QAuthZListRule'] } } >> >> At the time I wrote that, we couldn't express it on the CLI though, >> without using JSON syntax for -object. I don't think we've ever >> made it possible to use the opts_visitor with non-scalar properties >> though. The opts visitor is flat by design. Avoid it in new code, use qobject_input_visitor_new_keyval() instead. 
Typical use is via a wrapper such as qobject_input_visitor_new_str(). > Mmm, if we had started the array-property implementation starting > from a QAPI API (or even with the idea of setting properties from > command line arguments) and working from there we'd probably have ended > up with something different. The primary use case though has > never involved QAPI or the command line, it's just C code for setting > property values on devices created within QEMU. Trouble with QOM is that things meant for internal use bleed to the external interface so easily. We've been quite cavalier about that. Should we change our attitude?
Re: [RISU PATCH v4 15/29] Rearrange reginfo and memblock buffers
On Fri, 8 Jul 2022 at 17:06, Richard Henderson wrote: > > For send_register_info from master_sigill, do not keep a > reginfo buffer on the stack. At the moment, this struct > is quite large for aarch64. > > Put the two reginfo buffers into an array, for the benefit > of future dumping. For recv_and_compare_register_info, > index this array with constants, so it's a simple rename. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 14/29] Merge reginfo.c into risu.c
On Fri, 8 Jul 2022 at 17:02, Richard Henderson wrote: > > The distinction between the two is artificial. Following > patches will rearrange the functions involved to make it > easier for dumping of the trace file. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 16/29] Split out recv_register_info
On Fri, 8 Jul 2022 at 17:36, Richard Henderson wrote: > > We will want to share this code when dumping. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RFC v4 0/9] Add support for zoned device
Sam Li writes: > Markus Armbruster 于2022年7月12日周二 13:47写道: >> >> Sam Li writes: >> >> > This patch series adds support for zoned device to virtio-blk emulation. >> > Zoned >> > Storage can support sequential writes, which reduces write amplification >> > in SSD, >> > leading to higher write throughput and increased capacity. >> >> Forgive me if this has already been discussed, or is explained deeper in >> the patch series... >> >> The commit message sounds like you're extending virtio-blk to optionally >> emulate zoned storage. Correct? > > Yes! The main purpose is to emulate zoned storage only for the zoned > device files. Right now, QEMU sees those as regular block devices. > >> PATCH 1 adds a new block block device driver 'zoned_host_device', and >> PATCH 9 exposes it in QAPI. This is for passing through a zoned host >> device, correct? > > Yes! It allows the guest os see zoned host device. It is still in > development. Maybe the implementations will change later. Your cover letter only mentions the virtio-blk part, not the pass-through part. Please correct that if you need to respin.
Re: [RISU PATCH v4 17/29] Add magic and size to the trace header
On Fri, 8 Jul 2022 at 17:08, Richard Henderson wrote: > > Sanity check that we're not getting out of sync with > the trace stream. This will be especially bad with > the change in size of the sve save data. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH v2 1/6] target/riscv: add check for supported privilege modes conbinations
On 2022/7/18 5:02 PM, Andrew Jones wrote: On Tue, Jul 12, 2022 at 02:32:31PM +0800, Weiwei Li wrote: - There are 3 suggested privilege modes conbinations listed in the spec: No need for '-' here. s/modes/mode/ s/conbinations/combinations/ (Same typos in $SUBJECT, also please capitalize 'add' in $SUBJECT.) When referencing the spec it's nice to point out the doc/version/section. 1) M, 2) M, U 3) M, S, U Signed-off-by: Weiwei Li Signed-off-by: Junqiang Wang Reviewed-by: Alistair Francis --- target/riscv/cpu.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c index db2b8e4d30..36c1b26fb3 100644 --- a/target/riscv/cpu.c +++ b/target/riscv/cpu.c @@ -726,6 +726,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp) return; } +if (cpu->cfg.ext_s && !cpu->cfg.ext_u) { +error_setg(errp, + "Setting S extension without U extension is illegal"); +return; +} + if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) { error_setg(errp, "F extension requires Zicsr"); return; -- 2.17.1 Besides the commit message issues Reviewed-by: Andrew Jones Thanks for your comments. I'll fix the issues in commit messages in the next version. Regards, Weiwei Li
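The patch follows the usual realize-time pattern of validating feature dependencies before the CPU comes up. A standalone sketch of the same rule (S implies U), with a toy flags struct standing in for RISCVCPUConfig:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant RISCVCPUConfig bits. */
typedef struct ToyCfg {
    bool ext_s;   /* supervisor mode */
    bool ext_u;   /* user mode */
} ToyCfg;

/* Returns true if the privilege-mode combination is one the spec
 * suggests: M only, M+U, or M+S+U -- i.e. S without U is rejected. */
static bool toy_priv_modes_valid(const ToyCfg *cfg)
{
    if (cfg->ext_s && !cfg->ext_u) {
        return false;
    }
    return true;
}
```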
Re: [RISU PATCH v4 19/29] aarch64: Assume system support for SVE
On Fri, 8 Jul 2022 at 17:09, Richard Henderson wrote: > > SVE support is no longer new, assume it's present. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 18/29] Compute reginfo_size based on the reginfo
On Fri, 8 Jul 2022 at 17:35, Richard Henderson wrote: > > This will allow dumping of SVE frames without having > to know the SVE vector length beforehand. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 20/29] aarch64: Reorg sve reginfo to save space
On Fri, 8 Jul 2022 at 17:38, Richard Henderson wrote: > > Mirror the signal frame by storing all of the registers > as a lump. Use the signal macros to pull out the values. > > Signed-off-by: Richard Henderson > --- > risu_reginfo_aarch64.h | 45 ++- > risu_reginfo_aarch64.c | 171 - > 2 files changed, 108 insertions(+), 108 deletions(-) Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 21/29] aarch64: Use arch_init to configure sve
On Fri, 8 Jul 2022 at 17:41, Richard Henderson wrote: > > Adjust some of the aarch64 code to look at the reginfo struct > instead of looking at test_sve, so that we do not need to pass > the --test-sve option in order to dump sve trace files. > > Diagnose EINVAL as either cpu or kernel does not support SVE. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2 7/9] target/arm: Add PMSAv8r registers
From: Tobias Röhmel Signed-off-by: Tobias Röhmel --- target/arm/cpu.h| 10 +++ target/arm/helper.c | 171 2 files changed, 181 insertions(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index 86e06116a9..632d0d13c6 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -726,8 +726,18 @@ typedef struct CPUArchState { */ uint32_t *rbar[M_REG_NUM_BANKS]; uint32_t *rlar[M_REG_NUM_BANKS]; +uint32_t prbarn[255]; +uint32_t prlarn[255]; +uint32_t hprbarn[255]; +uint32_t hprlarn[255]; uint32_t mair0[M_REG_NUM_BANKS]; uint32_t mair1[M_REG_NUM_BANKS]; +uint32_t prbar; +uint32_t prlar; +uint32_t prselr; +uint32_t hprbar; +uint32_t hprlar; +uint32_t hprselr; } pmsav8; /* v8M SAU */ diff --git a/target/arm/helper.c b/target/arm/helper.c index 03bdc3d149..f9ed2bd5c3 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -7422,6 +7422,78 @@ static CPAccessResult access_joscr_jmcr(CPUARMState *env, return CP_ACCESS_OK; } +static void prbar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.prbarn[env->pmsav8.prselr] = value; +} + +static void prlar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.prlarn[env->pmsav8.prselr] = value; +} + +static uint64_t prbar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.prbarn[env->pmsav8.prselr]; +} + +static uint64_t prlar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.prlarn[env->pmsav8.prselr]; +} + +static void hprbar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.hprbarn[env->pmsav8.hprselr] = value; +} + +static void hprlar_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +env->pmsav8.hprlarn[env->pmsav8.hprselr] = value; +} + +static void hprenr_write(CPUARMState *env, const ARMCPRegInfo *ri, + uint64_t value) +{ +uint32_t n; +ARMCPU *cpu = env_archcpu(env); +for (n = 0; n < (int)cpu->pmsav7_dregion; ++n) { +if (value & (1 << n)) { +env->pmsav8.hprlarn[n] |= 0x1; +} 
else { +env->pmsav8.hprlarn[n] &= (~0x1); +} +} +} + +static uint64_t hprbar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.hprbarn[env->pmsav8.hprselr]; +} + +static uint64_t hprlar_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +return env->pmsav8.hprlarn[env->pmsav8.hprselr]; +} + +static uint64_t hprenr_read(CPUARMState *env, const ARMCPRegInfo *ri) +{ +uint32_t n; +uint32_t result = 0x0; +ARMCPU *cpu = env_archcpu(env); + +for (n = 0; n < (int)cpu->pmsav7_dregion; ++n) { +if (env->pmsav8.hprlarn[n] & 0x1) { +result |= (0x1 << n); +} +} +return result; +} + static const ARMCPRegInfo jazelle_regs[] = { { .name = "JIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 7, .opc2 = 0, @@ -8235,6 +8307,46 @@ void register_cp_regs_for_features(ARMCPU *cpu) .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = cpu->pmsav7_dregion << 8 }; +/* PMSAv8-R registers*/ +ARMCPRegInfo id_pmsav8_r_reginfo[] = { +{ .name = "HMPUIR", + .cp = 15, .crn = 0, .crm = 0, .opc1 = 4, .opc2 = 4, + .access = PL2_R, .type = ARM_CP_CONST, + .resetvalue = cpu->pmsav7_dregion}, + /* PMSAv8-R registers */ +{ .name = "PRBAR", + .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 0, + .access = PL1_RW, .resetvalue = 0, + .readfn = prbar_read, .writefn = prbar_write, + .fieldoffset = offsetof(CPUARMState, pmsav8.prbar)}, +{ .name = "PRLAR", + .cp = 15, .opc1 = 0, .crn = 6, .crm = 3, .opc2 = 1, + .access = PL1_RW, .resetvalue = 0, + .readfn = prlar_read, .writefn = prlar_write, + .fieldoffset = offsetof(CPUARMState, pmsav8.prlar)}, +{ .name = "PRSELR", .resetvalue = 0, + .cp = 15, .opc1 = 0, .crn = 6, .crm = 2, .opc2 = 1, + .access = PL1_RW, .accessfn = access_tvm_trvm, + .fieldoffset = offsetof(CPUARMState, pmsav8.prselr)}, +{ .name = "HPRBAR", .resetvalue = 0, + .readfn = hprbar_read, .writefn = hprbar_write, + .cp = 15, .opc1 = 4, .crn = 6, .crm = 3, .opc2 = 0, + .access = PL2_RW, .resetvalue = 0, + .fieldoffset = offsetof(CPUARMState, pmsav8.hprbar)}, +{ .name = "HPRLAR", + .cp = 15, 
.opc1 = 4, .crn = 6, .crm = 3, .opc2 = 1, + .access = PL2_RW, .resetvalue = 0, + .readfn = hprlar_r
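The access pattern these registers implement — a select register (PRSELR/HPRSELR) that steers subsequent PRBAR/PRLAR accesses to one entry of a per-region array, plus an enable-bitmap view over the region limits — can be sketched on its own. The structures below are toy models; the region count and field layout are illustrative, not the Cortex-R52's:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGIONS 16

/* Toy model of the banked MPU region registers. */
typedef struct ToyMpu {
    uint32_t prselr;                  /* region select */
    uint32_t prbarn[TOY_NUM_REGIONS]; /* per-region base */
    uint32_t prlarn[TOY_NUM_REGIONS]; /* per-region limit (bit 0 = enable) */
} ToyMpu;

/* PRBAR accesses are indirected through the currently selected region.
 * The sketch bounds the index; the real registers are defined only for
 * valid region numbers. */
static void toy_prbar_write(ToyMpu *m, uint32_t value)
{
    m->prbarn[m->prselr % TOY_NUM_REGIONS] = value;
}

static uint32_t toy_prbar_read(const ToyMpu *m)
{
    return m->prbarn[m->prselr % TOY_NUM_REGIONS];
}

/* HPRENR-style view: bit n mirrors the enable bit of region n's limit. */
static uint32_t toy_prenr_read(const ToyMpu *m)
{
    uint32_t result = 0;
    for (uint32_t n = 0; n < TOY_NUM_REGIONS; n++) {
        if (m->prlarn[n] & 0x1) {
            result |= 1u << n;
        }
    }
    return result;
}
```

This is the same indirection the patch's prbar_write/prbar_read helpers perform against env->pmsav8.prselr, and the same gather loop as its hprenr_read.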
[PATCH v2 3/9] target/arm: Make RVBAR available for all ARMv8 CPUs
From: Tobias Röhmel Signed-off-by: Tobias Röhmel --- target/arm/cpu.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 1b5d535788..9007768418 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -258,6 +258,10 @@ static void arm_cpu_reset(DeviceState *dev) env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1, CPACR, CP11, 3); #endif +if (arm_feature(env, ARM_FEATURE_V8)) { +env->cp15.rvbar = cpu->rvbar_prop; +env->regs[15] = cpu->rvbar_prop; +} } #if defined(CONFIG_USER_ONLY) @@ -1273,7 +1277,7 @@ void arm_cpu_post_init(Object *obj) qdev_property_add_static(DEVICE(obj), &arm_cpu_reset_hivecs_property); } -if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { +if (arm_feature(&cpu->env, ARM_FEATURE_V8)) { object_property_add_uint64_ptr(obj, "rvbar", &cpu->rvbar_prop, OBJ_PROP_FLAG_READWRITE); -- 2.25.1
[PATCH v2 6/9] target/arm: Enable TTBCR_EAE for ARMv8-R AArch32
From: Tobias Röhmel ARMv8-R AArch32 CPUs behave as if TTBCR.EAE is always 1 even though they don't have the TTBCR register. See ARM Architecture Reference Manual Supplement - ARMv8, for the ARMv8-R AArch32 architecture profile Version:A.c section C1.2. Signed-off-by: Tobias Röhmel --- target/arm/debug_helper.c | 3 ++- target/arm/internals.h| 3 ++- target/arm/tlb_helper.c | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c index b18a6bd3a2..44b1e32974 100644 --- a/target/arm/debug_helper.c +++ b/target/arm/debug_helper.c @@ -434,7 +434,8 @@ static uint32_t arm_debug_exception_fsr(CPUARMState *env) using_lpae = true; } else { if (arm_feature(env, ARM_FEATURE_LPAE) && -(env->cp15.tcr_el[target_el].raw_tcr & TTBCR_EAE)) { +((env->cp15.tcr_el[target_el].raw_tcr & TTBCR_EAE) +|| arm_feature(env, ARM_FEATURE_V8_R))) { using_lpae = true; } } diff --git a/target/arm/internals.h b/target/arm/internals.h index b03049d920..e2a2b03d41 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -254,7 +254,8 @@ static inline bool extended_addresses_enabled(CPUARMState *env) { TCR *tcr = &env->cp15.tcr_el[arm_is_secure(env) ? 3 : 1]; return arm_el_is_aa64(env, 1) || - (arm_feature(env, ARM_FEATURE_LPAE) && (tcr->raw_tcr & TTBCR_EAE)); + (arm_feature(env, ARM_FEATURE_LPAE) && ((tcr->raw_tcr & TTBCR_EAE) + || arm_feature(env, ARM_FEATURE_V8_R))); } /* Update a QEMU watchpoint based on the information the guest has set in the diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c index 7d8a86b3c4..891326edb8 100644 --- a/target/arm/tlb_helper.c +++ b/target/arm/tlb_helper.c @@ -20,7 +20,8 @@ bool regime_using_lpae_format(CPUARMState *env, ARMMMUIdx mmu_idx) return true; } if (arm_feature(env, ARM_FEATURE_LPAE) -&& (regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE)) { +&& ((regime_tcr(env, mmu_idx)->raw_tcr & TTBCR_EAE) +|| arm_feature(env, ARM_FEATURE_V8_R))) { return true; } return false; -- 2.25.1
[PATCH v2 0/9] Add Cortex-R52
v2:

PATCH 1: I have left the flag in for now because there is a lot of use for it in the MPU translation code. The PMSAv8r differs in quite a few ways from the implementation in the Cortex-M. I think using !ARM_FEATURE_M in all of those cases might run into problems down the road when new things are added. But I'll gladly change that if those concerns are not valid.

PATCH 2: Patch was moved and I removed the ATCM... registers.

PATCH 3: The issue here is that the R52 has the MPUIR register which shares the encoding with one of the MIDR alias registers. It's now changed to not add that register for ARM_FEATURE_V8_R.

PATCH 4: Added RVBAR for all v8 CPUs instead of just ARMv8r.

PATCH 7: Instead of setting TTBCR_EAE to 1, change all functions that rely on that value and are relevant for Cortex-R52.

PATCH 10: SPSR_hyp changes removed.

PATCH 11: Removed the r52_machine. The issue with adding the Cortex-R52 to the virt board is that target_page.bits is expected to be 12 but is set to 10 for ARM_FEATURE_PMSA. Simply changing that or using virt-2.7 (which doesn't have that restriction) leads to problems with memory accesses. It might be best to model an existing board.

These patches add the ARM Cortex-R52. The biggest addition is an implementation of the ARMv8-R MPU. All information is taken from:
- ARM Cortex-R52 TRM revision r1p3
- ARM Architecture Reference Manual Supplement - ARMv8 for the ARMv8-R AArch32 architecture profile Version A.c

Functionality that is not implemented:
- Changing between single and double precision floats
- Some hypervisor related functionality (HCR.T(R)VM, HADFSR, ...)
Tobias Röhmel (9): target/arm: Add ARM_FEATURE_V8_R target/arm: Don't add all MIDR aliases for Cortex-R target/arm: Make RVBAR available for all ARMv8 CPUs target/arm: Make stage_2_format for cache attributes optional target/arm: Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup target/arm: Enable TTBCR_EAE for ARMv8-R AArch32 target/arm: Add PMSAv8r registers target/arm: Add PMSAv8r functionality target/arm: Add ARM Cortex-R52 cpu target/arm/cpu.c | 6 +- target/arm/cpu.h | 11 +++ target/arm/cpu_tcg.c | 42 + target/arm/debug_helper.c | 3 +- target/arm/helper.c | 183 +++- target/arm/internals.h| 16 ++-- target/arm/m_helper.c | 3 +- target/arm/ptw.c | 191 -- target/arm/tlb_helper.c | 3 +- 9 files changed, 417 insertions(+), 41 deletions(-) -- 2.25.1
[PATCH v2 2/9] target/arm: Don't add all MIDR aliases for Cortex-R
From: Tobias Röhmel Cortex-R52 has the MPUIR register which has the same encoding as the MIDR alias with opc2=4. So we only add that alias when we are not realizing a Cortex-R. Signed-off-by: Tobias Röhmel --- target/arm/helper.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/target/arm/helper.c b/target/arm/helper.c index 6457e6301c..03bdc3d149 100644 --- a/target/arm/helper.c +++ b/target/arm/helper.c @@ -8189,9 +8189,6 @@ void register_cp_regs_for_features(ARMCPU *cpu) .fieldoffset = offsetof(CPUARMState, cp15.c0_cpuid), .readfn = midr_read }, /* crn = 0 op1 = 0 crm = 0 op2 = 4,7 : AArch32 aliases of MIDR */ -{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, - .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, - .access = PL1_R, .resetvalue = cpu->midr }, { .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 7, .access = PL1_R, .resetvalue = cpu->midr }, @@ -8201,6 +8198,11 @@ void register_cp_regs_for_features(ARMCPU *cpu) .accessfn = access_aa64_tid1, .type = ARM_CP_CONST, .resetvalue = cpu->revidr }, }; +ARMCPRegInfo id_v8_midr_alias_cp_reginfo[] = { +{ .name = "MIDR", .type = ARM_CP_ALIAS | ARM_CP_CONST, + .cp = 15, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 4, + .access = PL1_R, .resetvalue = cpu->midr }, +}; ARMCPRegInfo id_cp_reginfo[] = { /* These are common to v8 and pre-v8 */ { .name = "CTR", @@ -8264,8 +8266,12 @@ void register_cp_regs_for_features(ARMCPU *cpu) id_mpuir_reginfo.access = PL1_RW; id_tlbtr_reginfo.access = PL1_RW; } + if (arm_feature(env, ARM_FEATURE_V8)) { define_arm_cp_regs(cpu, id_v8_midr_cp_reginfo); +if (!arm_feature(env, ARM_FEATURE_V8_R)) { +define_arm_cp_regs(cpu, id_v8_midr_alias_cp_reginfo); +} } else { define_arm_cp_regs(cpu, id_pre_v8_midr_cp_reginfo); } -- 2.25.1
[PATCH v2 4/9] target/arm: Make stage_2_format for cache attributes optional
From: Tobias Röhmel The Cortex-R52 has a 2 stage MPU translation process but doesn't have the FEAT_S2FWB feature. This makes it necessary to allow for the old cache attribute combination. This is facilitated by changing the control path of combine_cacheattrs instead of failing if the second cache attributes struct is not in that format. Signed-off-by: Tobias Röhmel --- target/arm/ptw.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/target/arm/ptw.c b/target/arm/ptw.c index 4d97a24808..8b037c1f55 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -2108,7 +2108,11 @@ static uint8_t combined_attrs_nofwb(CPUARMState *env, { uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs; -s2_mair_attrs = convert_stage2_attrs(env, s2.attrs); +if (s2.is_s2_format) { +s2_mair_attrs = convert_stage2_attrs(env, s2.attrs); +} else { +s2_mair_attrs = s2.attrs; +} s1lo = extract32(s1.attrs, 0, 4); s2lo = extract32(s2_mair_attrs, 0, 4); @@ -2166,6 +2170,8 @@ static uint8_t force_cacheattr_nibble_wb(uint8_t attr) static uint8_t combined_attrs_fwb(CPUARMState *env, ARMCacheAttrs s1, ARMCacheAttrs s2) { +assert(s2.is_s2_format && !s1.is_s2_format); + switch (s2.attrs) { case 7: /* Use stage 1 attributes */ @@ -2215,7 +2221,6 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env, ARMCacheAttrs ret; bool tagged = false; -assert(s2.is_s2_format && !s1.is_s2_format); ret.is_s2_format = false; if (s1.attrs == 0xf0) { -- 2.25.1
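The control-path change amounts to branching on is_s2_format before converting, instead of asserting that the second set is always stage-2 encoded. A reduced sketch of that decision — the attribute encodings here are placeholders, not the real MAIR/stage-2 encodings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyCacheAttrs {
    uint8_t attrs;
    bool is_s2_format;
} ToyCacheAttrs;

/* Placeholder for convert_stage2_attrs(): the real code maps stage-2
 * encodings to MAIR-style ones; here we just tag the conversion. */
static uint8_t toy_convert_stage2(uint8_t attrs)
{
    return attrs | 0x80;
}

static uint8_t toy_combined_attrs_nofwb(ToyCacheAttrs s2)
{
    uint8_t s2_mair_attrs;

    if (s2.is_s2_format) {
        s2_mair_attrs = toy_convert_stage2(s2.attrs); /* old path */
    } else {
        s2_mair_attrs = s2.attrs; /* PMSAv8-R: already MAIR-style */
    }
    return s2_mair_attrs;
}
```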
[PATCH v2 9/9] target/arm: Add ARM Cortex-R52 cpu
From: Tobias Röhmel All constants are taken from the ARM Cortex-R52 Processor TRM Revision: r1p3 Signed-off-by: Tobias Röhmel --- target/arm/cpu_tcg.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c index b751a19c8a..e0f445dc91 100644 --- a/target/arm/cpu_tcg.c +++ b/target/arm/cpu_tcg.c @@ -843,6 +843,47 @@ static void cortex_r5_initfn(Object *obj) define_arm_cp_regs(cpu, cortexr5_cp_reginfo); } +static void cortex_r52_initfn(Object *obj) +{ +ARMCPU *cpu = ARM_CPU(obj); + +set_feature(&cpu->env, ARM_FEATURE_V8); +set_feature(&cpu->env, ARM_FEATURE_V8_R); +set_feature(&cpu->env, ARM_FEATURE_EL2); +set_feature(&cpu->env, ARM_FEATURE_PMSA); +set_feature(&cpu->env, ARM_FEATURE_NEON); +set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); +cpu->midr = 0x411fd133; /* r1p3 */ +cpu->revidr = 0x; +cpu->reset_fpsid = 0x41034023; +cpu->isar.mvfr0 = 0x10110222; +cpu->isar.mvfr1 = 0x1211; +cpu->isar.mvfr2 = 0x0043; +cpu->ctr = 0x8144c004; +cpu->reset_sctlr = 0x30c50838; +cpu->isar.id_pfr0 = 0x0131; +cpu->isar.id_pfr1 = 0x10111001; +cpu->isar.id_dfr0 = 0x03010006; +cpu->id_afr0 = 0x; +cpu->isar.id_mmfr0 = 0x00211040; +cpu->isar.id_mmfr1 = 0x4000; +cpu->isar.id_mmfr2 = 0x0120; +cpu->isar.id_mmfr3 = 0xf0102211; +cpu->isar.id_mmfr4 = 0x0010; +cpu->isar.id_isar0 = 0x02101110; +cpu->isar.id_isar1 = 0x13112111; +cpu->isar.id_isar2 = 0x21232142; +cpu->isar.id_isar3 = 0x01112131; +cpu->isar.id_isar4 = 0x00010142; +cpu->isar.id_isar5 = 0x00010001; +cpu->isar.dbgdidr = 0x77168000; +cpu->clidr = (1 << 27) | (1 << 24) | 0x3; +cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */ +cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */ + +cpu->pmsav7_dregion = 16; +} + static void cortex_r5f_initfn(Object *obj) { ARMCPU *cpu = ARM_CPU(obj); @@ -1149,6 +1190,7 @@ static const ARMCPUInfo arm_tcg_cpus[] = { .class_init = arm_v7m_class_init }, { .name = "cortex-r5", .initfn = cortex_r5_initfn }, { .name = "cortex-r5f", .initfn = cortex_r5f_initfn }, +{ .name 
= "cortex-r52", .initfn = cortex_r52_initfn }, { .name = "ti925t", .initfn = ti925t_initfn }, { .name = "sa1100", .initfn = sa1100_initfn }, { .name = "sa1110", .initfn = sa1110_initfn }, -- 2.25.1
[PATCH v2 1/9] target/arm: Add ARM_FEATURE_V8_R
From: Tobias Röhmel This flag is necessary to add features for the Cortex-R52. Signed-off-by: Tobias Röhmel --- target/arm/cpu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/cpu.h b/target/arm/cpu.h index df677b2d5d..86e06116a9 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -2287,6 +2287,7 @@ enum arm_features { ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ ARM_FEATURE_M_MAIN, /* M profile Main Extension */ ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */ +ARM_FEATURE_V8_R, }; static inline int arm_feature(CPUARMState *env, int feature) -- 2.25.1
Re: [PATCH v7 07/13] multifd: Make flags field thread local
* Juan Quintela (quint...@redhat.com) wrote: > Use of flags with respect to locking was incensistant. For the > sending side: > - it was set to 0 with mutex held on the multifd channel. > - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread. > - Everything else was done without the mutex held on the multifd channel. > > On the reception side, it is not used on the migration thread, only on > the multifd channels threads. > > So we move it to the multifd channels thread only variables, and we > introduce a new bool sync_needed on the send side to pass that information. Does this need rework after Leo's recent changes? Dave > Signed-off-by: Juan Quintela > --- > migration/multifd.h | 10 ++ > migration/multifd.c | 23 +-- > 2 files changed, 19 insertions(+), 14 deletions(-) > > diff --git a/migration/multifd.h b/migration/multifd.h > index 8a45dda58c..af8ce8921d 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -98,12 +98,12 @@ typedef struct { > bool running; > /* should this thread finish */ > bool quit; > -/* multifd flags for each packet */ > -uint32_t flags; > /* global number of generated multifd packets */ > uint64_t packet_num; > /* How many bytes have we sent on the last packet */ > uint64_t sent_bytes; > +/* Do we need to do an iteration sync */ > +bool sync_needed; > /* thread has work to do */ > int pending_job; > /* array of pages to sent. 
> @@ -117,6 +117,8 @@ typedef struct { > > /* pointer to the packet */ > MultiFDPacket_t *packet; > +/* multifd flags for each packet */ > +uint32_t flags; > /* size of the next packet that contains pages */ > uint32_t next_packet_size; > /* packets sent through this channel */ > @@ -163,8 +165,6 @@ typedef struct { > bool running; > /* should this thread finish */ > bool quit; > -/* multifd flags for each packet */ > -uint32_t flags; > /* global number of generated multifd packets */ > uint64_t packet_num; > > @@ -172,6 +172,8 @@ typedef struct { > > /* pointer to the packet */ > MultiFDPacket_t *packet; > +/* multifd flags for each packet */ > +uint32_t flags; > /* size of the next packet that contains pages */ > uint32_t next_packet_size; > /* packets sent through this channel */ > diff --git a/migration/multifd.c b/migration/multifd.c > index eef47c274f..69b9d7cf98 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -602,7 +602,7 @@ int multifd_send_sync_main(QEMUFile *f) > } > > p->packet_num = multifd_send_state->packet_num++; > -p->flags |= MULTIFD_FLAG_SYNC; > +p->sync_needed = true; > p->pending_job++; > qemu_mutex_unlock(&p->mutex); > qemu_sem_post(&p->sem); > @@ -656,7 +656,11 @@ static void *multifd_send_thread(void *opaque) > > if (p->pending_job) { > uint64_t packet_num = p->packet_num; > -uint32_t flags = p->flags; > +p->flags = 0; > +if (p->sync_needed) { > +p->flags |= MULTIFD_FLAG_SYNC; > +p->sync_needed = false; > +} > p->normal_num = 0; > > if (use_zero_copy_send) { > @@ -678,14 +682,13 @@ static void *multifd_send_thread(void *opaque) > } > } > multifd_send_fill_packet(p); > -p->flags = 0; > p->num_packets++; > p->total_normal_pages += p->normal_num; > p->pages->num = 0; > p->pages->block = NULL; > qemu_mutex_unlock(&p->mutex); > > -trace_multifd_send(p->id, packet_num, p->normal_num, flags, > +trace_multifd_send(p->id, packet_num, p->normal_num, p->flags, > p->next_packet_size); > > if (use_zero_copy_send) { > @@ -713,7 +716,7 
@@ static void *multifd_send_thread(void *opaque) > p->pending_job--; > qemu_mutex_unlock(&p->mutex); > > -if (flags & MULTIFD_FLAG_SYNC) { > +if (p->flags & MULTIFD_FLAG_SYNC) { > qemu_sem_post(&p->sem_sync); > } > qemu_sem_post(&multifd_send_state->channels_ready); > @@ -1090,7 +1093,7 @@ static void *multifd_recv_thread(void *opaque) > rcu_register_thread(); > > while (true) { > -uint32_t flags; > +bool sync_needed = false; > > if (p->quit) { > break; > @@ -1112,11 +1115,11 @@ static void *multifd_recv_thread(void *opaque) > break; > } > > -flags = p->flags; > +trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags, > + p->next_packet_size); > +sync_needed = p->flags & MULTIFD_FLAG_SYNC; > /* recv methods don't know how to handle the SYNC flag *
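The change above replaces the per-channel flags snapshot with a per-packet latch: multifd_send_sync_main() only sets sync_needed, and the send thread converts it into the wire flag while holding the channel mutex, clearing the latch so exactly one packet carries the sync. A minimal sketch of that latch (the MULTIFD_FLAG_SYNC value matches migration/multifd.h; the helper name is illustrative, not QEMU's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MULTIFD_FLAG_SYNC (1 << 0)

/* Illustrative helper, not QEMU code: consume the sync request latched
 * by the migration thread and translate it into per-packet wire flags.
 * Exactly one packet picks up the SYNC flag per request. */
static uint32_t multifd_build_flags(bool *sync_needed)
{
    uint32_t flags = 0;

    if (*sync_needed) {
        flags |= MULTIFD_FLAG_SYNC;
        *sync_needed = false;   /* request consumed by this packet */
    }
    return flags;
}
```

The point of the refactor is that the latch is only read and cleared under the mutex, so the flags actually sent always match what the trace point later prints.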
[PATCH v2 5/9] target/arm: Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup
From: Tobias Röhmel Add ARMCacheAttrs to the signature of pmsav8_mpu_lookup to prepare for the Cortex-R52 MPU which uses and combines cache attributes of different translation levels. Signed-off-by: Tobias Röhmel --- target/arm/internals.h | 13 +++-- target/arm/m_helper.c | 3 ++- target/arm/ptw.c | 11 +++ 3 files changed, 16 insertions(+), 11 deletions(-) diff --git a/target/arm/internals.h b/target/arm/internals.h index 6f94f3019d..b03049d920 100644 --- a/target/arm/internals.h +++ b/target/arm/internals.h @@ -1109,12 +1109,6 @@ void v8m_security_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, V8M_SAttributes *sattrs); -bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, - MMUAccessType access_type, ARMMMUIdx mmu_idx, - hwaddr *phys_ptr, MemTxAttrs *txattrs, - int *prot, bool *is_subpage, - ARMMMUFaultInfo *fi, uint32_t *mregion); - /* Cacheability and shareability attributes for a memory access */ typedef struct ARMCacheAttrs { /* @@ -1126,6 +1120,13 @@ typedef struct ARMCacheAttrs { bool is_s2_format:1; } ARMCacheAttrs; +bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, + MMUAccessType access_type, ARMMMUIdx mmu_idx, + hwaddr *phys_ptr, MemTxAttrs *txattrs, + int *prot, bool *is_subpage, + ARMMMUFaultInfo *fi, uint32_t *mregion, + ARMCacheAttrs *cacheattrs); + bool get_phys_addr(CPUARMState *env, target_ulong address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *attrs, int *prot, diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c index a740c3e160..44c80d733a 100644 --- a/target/arm/m_helper.c +++ b/target/arm/m_helper.c @@ -2829,10 +2829,11 @@ uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op) * inspecting the other MPU state. 
*/ if (arm_current_el(env) != 0 || alt) { +ARMCacheAttrs cacheattrs = {0}; /* We can ignore the return value as prot is always set */ pmsav8_mpu_lookup(env, addr, MMU_DATA_LOAD, mmu_idx, &phys_addr, &attrs, &prot, &is_subpage, - &fi, &mregion); + &fi, &mregion, &cacheattrs); if (mregion == -1) { mrvalid = false; mregion = 0; diff --git a/target/arm/ptw.c b/target/arm/ptw.c index 8b037c1f55..c4f5721012 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -1702,7 +1702,8 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot, bool *is_subpage, - ARMMMUFaultInfo *fi, uint32_t *mregion) + ARMMMUFaultInfo *fi, uint32_t *mregion, + ARMCacheAttrs *cacheattrs) { /* * Perform a PMSAv8 MPU lookup (without also doing the SAU check @@ -1968,7 +1969,7 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot, target_ulong *page_size, - ARMMMUFaultInfo *fi) + ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs) { uint32_t secure = regime_is_secure(env, mmu_idx); V8M_SAttributes sattrs = {}; @@ -2036,7 +2037,8 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address, } ret = pmsav8_mpu_lookup(env, address, access_type, mmu_idx, phys_ptr, -txattrs, prot, &mpu_is_subpage, fi, NULL); +txattrs, prot, &mpu_is_subpage, fi, +NULL, cacheattrs); *page_size = sattrs.subpage || mpu_is_subpage ? 1 : TARGET_PAGE_SIZE; return ret; } @@ -2416,7 +2418,8 @@ bool get_phys_addr(CPUARMState *env, target_ulong address, if (arm_feature(env, ARM_FEATURE_V8)) { /* PMSAv8 */ ret = get_phys_addr_pmsav8(env, address, access_type, mmu_idx, - phys_ptr, attrs, prot, page_size, fi); + phys_ptr, attrs, prot, page_size, + fi, cacheattrs); } else if (arm_feature(env, ARM_FEATURE_V7)) { /* PMSAv7 */ ret = get_phys_addr_pmsav7(env, address, access_type, mmu_idx, -- 2.25.1
Re: [RISU PATCH v4 24/29] Add --fulldump and --diffdup options
On Fri, 8 Jul 2022 at 17:51, Richard Henderson wrote: > > These allow the inspection of the trace files. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2 8/9] target/arm: Add PMSAv8r functionality
From: Tobias Röhmel Add PMSAv8r translation that is used by the ARM Cortex-R52. Signed-off-by: Tobias Röhmel --- target/arm/ptw.c | 171 +-- 1 file changed, 150 insertions(+), 21 deletions(-) diff --git a/target/arm/ptw.c b/target/arm/ptw.c index c4f5721012..c7e37c66d0 100644 --- a/target/arm/ptw.c +++ b/target/arm/ptw.c @@ -140,6 +140,9 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx) */ return true; } +} else if (arm_feature(env, ARM_FEATURE_V8_R)) { +return !(regime_sctlr(env, mmu_idx) & SCTLR_M) || +(!(regime_el(env, mmu_idx) == 2) && arm_hcr_el2_eff(env) & HCR_TGE); } hcr_el2 = arm_hcr_el2_eff(env); @@ -1504,6 +1507,8 @@ static bool pmsav7_use_background_region(ARMCPU *cpu, ARMMMUIdx mmu_idx, if (arm_feature(env, ARM_FEATURE_M)) { return env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] & R_V7M_MPU_CTRL_PRIVDEFENA_MASK; +} else if (arm_feature(env, ARM_FEATURE_V8_R)) { +return false; } else { return regime_sctlr(env, mmu_idx) & SCTLR_BR; } @@ -1698,6 +1703,77 @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address, return !(*prot & (1 << access_type)); } +static uint32_t *regime_rbar(CPUARMState *env, ARMMMUIdx mmu_idx, + uint32_t secure) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +return env->pmsav8.hprbarn; +} else { +return env->pmsav8.prbarn; +} +} else { + return env->pmsav8.rbar[secure]; +} +} + +static uint32_t *regime_rlar(CPUARMState *env, ARMMMUIdx mmu_idx, + uint32_t secure) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +return env->pmsav8.hprlarn; +} else { +return env->pmsav8.prlarn; +} +} else { +return env->pmsav8.rlar[secure]; +} +} + +static inline void get_phys_addr_pmsav8_default(CPUARMState *env, +ARMMMUIdx mmu_idx, +uint32_t address, int *prot) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +*prot = PAGE_READ | PAGE_WRITE; +if (address <= 0x7FFF) { +*prot |= PAGE_EXEC; +} +if ((regime_el(env, mmu_idx) == 2) +&& 
(regime_sctlr(env, mmu_idx) & SCTLR_WXN) +&& (regime_sctlr(env, mmu_idx) & SCTLR_M)) { +*prot &= ~PAGE_EXEC; +} +} else { +get_phys_addr_pmsav7_default(env, mmu_idx, address, prot); +} +} + +static bool pmsav8_fault(bool hit, CPUARMState *env, ARMMMUIdx mmu_idx) +{ +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (regime_el(env, mmu_idx) == 2) { +if (!hit && (mmu_idx != ARMMMUIdx_E2)) { +return true; +} else if (!hit && (mmu_idx == ARMMMUIdx_E2) + &&!(regime_sctlr(env, mmu_idx) & SCTLR_BR)) { +return true; +} +} else { +if (!hit && (mmu_idx != ARMMMUIdx_Stage1_E1)) { +return true; +} else if (!hit && (mmu_idx == ARMMMUIdx_Stage1_E1) + &&!(regime_sctlr(env, mmu_idx) & SCTLR_BR)) { +return true; +} +} +return false; +} else { +return !hit; +} +} + bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, MMUAccessType access_type, ARMMMUIdx mmu_idx, hwaddr *phys_ptr, MemTxAttrs *txattrs, @@ -1730,6 +1806,12 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, *mregion = -1; } +if (arm_feature(env, ARM_FEATURE_V8_R)) { +if (mmu_idx == ARMMMUIdx_Stage2) { +fi->stage2 = true; +} +} + /* * Unlike the ARM ARM pseudocode, we don't need to check whether this * was an exception vector read from the vector table (which is always @@ -1746,17 +1828,26 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address, hit = true; } +uint32_t bitmask; +if (arm_feature(env, ARM_FEATURE_V8_R)) { +bitmask = 0x3f; +} else { +bitmask = 0x1f; +} + + for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) { /* region search */ /* - * Note that the base address is bits [31:5] from the register - * with bits [4:0] all zeroes, but the limit address is bits - * [31:5] from the register with bits [4:0] all ones. + * Note that the base address is bits [31:x] from the register + * with bits [x-1:0] all zeroes, but the limit address is bits + * [31:x] from the register with bits [
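The widened mask in the diff reflects the region granule: PMSAv8 bases/limits are 32-byte aligned (bits [4:0] implied), while PMSAv8r uses 64-byte alignment (bits [5:0] implied), hence 0x1f versus 0x3f. A hedged sketch of the base/limit decoding and hit test the comment describes (helper names are illustrative, not QEMU's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative decoding of PRBAR/PRLAR-style registers: the base is the
 * register value with the low alignment bits forced to zero, the limit
 * is the value with them forced to one.  The mask is 0x3f on PMSAv8r
 * (64-byte granule) and 0x1f on PMSAv8 (32-byte granule). */
static uint32_t region_base(uint32_t rbar, uint32_t bitmask)
{
    return rbar & ~bitmask;
}

static uint32_t region_limit(uint32_t rlar, uint32_t bitmask)
{
    return rlar | bitmask;
}

static bool region_hit(uint32_t addr, uint32_t rbar, uint32_t rlar,
                       bool v8r)
{
    uint32_t bitmask = v8r ? 0x3f : 0x1f;

    return addr >= region_base(rbar, bitmask) &&
           addr <= region_limit(rlar, bitmask);
}
```

For example, a PMSAv8r region with rbar = 0x1000 and rlar = 0x1fc0 covers the inclusive range [0x1000, 0x1fff].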
Re: [RISU PATCH v4 27/29] aarch64: Tidy reginfo dumping ahead of ZA state
On Fri, 8 Jul 2022 at 17:14, Richard Henderson wrote: > > A misalignment for sve_vl, plus add a bit more space > on the left for the ZA[n] field name. > > Signed-off-by: Richard Henderson > --- > risu_reginfo_aarch64.c | 29 ++--- > 1 file changed, 18 insertions(+), 11 deletions(-) Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 28/29] aarch64: Add support for ZA storage
On Fri, 8 Jul 2022 at 18:07, Richard Henderson wrote: > > Require NVL == SVL on startup, to make it easier to manage reginfo. > Most of the time PSTATE.SM would be active with PSTATE.ZA anyway, > for any non-trivial SME testing. > > Extend saved storage only when PSTATE.ZA is active. > Use a carefully reserved uint16_t for saving SVCR. > > Signed-off-by: Richard Henderson > --- Reviewed-by: Peter Maydell thanks -- PMM
Re: [RISU PATCH v4 23/29] Standardize reginfo_dump_mismatch printing
On Fri, 8 Jul 2022 at 17:39, Richard Henderson wrote: > > Hoist the "master vs apprentice" label to apprentice(), since > we will want different labels for dumping. Remove all of the > "mismatch" text from reginfo_dump_mismatch -- just print "vs". > > Signed-off-by: Richard Henderson > diff --git a/risu_reginfo_ppc64.c b/risu_reginfo_ppc64.c > index 9899b36..e96dc48 100644 > --- a/risu_reginfo_ppc64.c > +++ b/risu_reginfo_ppc64.c > @@ -27,6 +27,15 @@ > const struct option * const arch_long_opts; > const char * const arch_extra_help; > > +static const char * const greg_names[NGREG] = { > +"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", > +"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", > + "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", > + "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", > + [XER] = "xer", > + [CCR] = "ccr", > +}; > + > void process_arch_opt(int opt, const char *arg) > { > abort(); > @@ -147,35 +156,21 @@ int reginfo_dump(struct reginfo *ri, FILE * f) > return !ferror(f); > } > > -int reginfo_dump_mismatch(struct reginfo *m, struct reginfo *a, FILE *f) > +void reginfo_dump_mismatch(struct reginfo *m, struct reginfo *a, FILE *f) > { > int i; > -for (i = 0; i < 32; i++) { > -if (i == 1 || i == 13) { > -continue; > + > +for (i = 0; i < NGREG; i++) { > +if (greg_names[i] != NULL && m->gregs[i] != a->gregs[i]) { > +fprintf(f, "%6s: %016lx vs %016lx\n", > +greg_names[i], m->gregs[i], a->gregs[i]); > } This used to skip r1 and r13, and now it does not. > - > -if (m->gregs[i] != a->gregs[i]) { > -fprintf(f, "Mismatch: Register r%d\n", i); > -fprintf(f, "master: [%lx] - apprentice: [%lx]\n", > -m->gregs[i], a->gregs[i]); > -} > -} > - > -if (m->gregs[XER] != a->gregs[XER]) { > -fprintf(f, "Mismatch: XER\n"); > -fprintf(f, "m: [%lx] != a: [%lx]\n", m->gregs[XER], a->gregs[XER]); > -} > - > -if (m->gregs[CCR] != a->gregs[CCR]) { > -fprintf(f, "Mismatch: Cond. 
Register\n"); > -fprintf(f, "m: [%lx] != a: [%lx]\n", m->gregs[CCR], a->gregs[CCR]); > } > > for (i = 0; i < 32; i++) { > if (m->fpregs[i] != a->fpregs[i]) { > -fprintf(f, "Mismatch: Register f%d\n", i); > -fprintf(f, "m: [%016lx] != a: [%016lx]\n", > +fprintf(f, "%*s%d: %016lx vs %016lx\n", > +6 - (i < 10 ? 1 : 2), "f", i, > m->fpregs[i], a->fpregs[i]); > } > } thanks -- PMM
Re: [RISU PATCH v4 25/29] Remove return value from reginfo_dump
On Fri, 8 Jul 2022 at 18:05, Richard Henderson wrote: > > No uses actually checked the error indication. Even if we wanted > to check ferror on the stream, we should do that generically rather > than per arch. > > Signed-off-by: Richard Henderson Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH] block/rbd: support driver-specific reopen
On 7/1/22 11:41, Hanna Reitz wrote: On 13.04.22 14:26, Raphael Pour wrote: >> } - return ret; + /* + * Remove all keys from the generic layer which + * can't be converted by rbd + * > Does any other driver do this? Shouldn’t we leave them there so that the generic layer can verify that they aren’t changed? + qdict_del(state->options, "driver"); + qdict_del(state->options, "node-name"); + qdict_del(state->options, "auto-read-only"); + qdict_del(state->options, "discard"); + qdict_del(state->options, "cache"); Because AFAIU this would mean that users could attempt to change e.g. the @cache option, and wouldn’t receive an error back, even though there is no support for changing it. + + /* + * To maintain the compatibility prior the rbd-reopen, + * where the generic layer can be altered without any + * rbd argument given, What does “the generic layer can be altered” mean? As far as I understand, it was only possible to change between read/write and read-only access. + + keypairs = g_strdup(qdict_get_try_str(state->options, "=keyvalue-pairs")); + if (keypairs) { + qdict_del(state->options, "=keyvalue-pairs"); + } + + secretid = g_strdup(qdict_get_try_str(state->options, "password-secret")); + if (secretid) { + qdict_del(state->options, "password-secret"); + } + + r = qemu_rbd_convert_options(state->options, &opts, &local_err); + if (local_err) { + /* + * If keypairs are present, that means some options are present in + * the modern option format. Don't attempt to parse legacy option + * formats, as we won't support mixed usage. + */ + if (keypairs) { + error_propagate(errp, local_err); + goto out; + } + + /* + * If the initial attempt to convert and process the options failed, + * we may be attempting to open an image file that has the rbd options + * specified in the older format consisting of all key/value pairs + * encoded in the filename. Go ahead and attempt to parse the + * filename, and see if we can pull out the required options. 
+ */ + r = qemu_rbd_attempt_legacy_options(state->options, &opts, &keypairs); + if (r < 0) { + /* + * Propagate the original error, not the legacy parsing fallback + * error, as the latter was just a best-effort attempt. + */ + error_propagate(errp, local_err); + goto out; + } + /* + * Take care whenever deciding to actually deprecate; once this ability + * is removed, we will not be able to open any images with legacy-styled + * backing image strings. + */ + warn_report("RBD options encoded in the filename as keyvalue pairs " + "is deprecated"); + } + + /* + * Remove the processed options from the QDict (the visitor processes + * _all_ options in the QDict) + */ + while ((e = qdict_first(state->options))) { + qdict_del(state->options, e->key); + } > > OK, I see why you’d then want to remove all non-RBD options before this > point. Other block drivers seem to solve this by having an > X_runtime_opts QemuOptsList, which contains all driver-handled options, > so they can then use qemu_opts_absorb_qdict() to extract the options > they can handle. (Compare e.g. qcow2_update_options_prepare().) > Looking through all the drivers, rbd seems to be the only one not absorbing its options via runtime_opts but instead using qemu_rbd_convert_options. The converter visits all the options from BlockdevOptionsRbd defined in block-core.json. If there is an unknown option the conversion fails with EINVAL. The runtime_opts in contrast to the already defined json schema with the visitor-code-generation seem to be redundant. Do you have some insights why and when this redundancy is reasonable? I came up with several alternatives: 1. add own runtime_opts: - pro: "the qemu way", it behaves like most of the drivers - con: redundancy to qemu_rbd_convert_options which does the same except skipping the generic block layer options and adds +1k lines - con: I couldn't figure out how to add non-primitive options to the runtime_opts like encrypt or server 2. 
tell visitor to ignore unknown keys (?) 3. parse rbd options manually (opposite of deleting the generic block keys) I agree deleting the generic block opts isn't optimal. What do you think? Your remaining points are also reasonable and I'll submit their fix along with the solution to the issue above.
[PATCH v2] vhost: Get vring base from vq, not svq
The SVQ vring used idx usually matches the guest-visible one, as long as all guest buffers (GPA) map to exactly one buffer within qemu's VA. However, as we can see in virtqueue_map_desc, a single guest buffer could map to many buffers in the SVQ vring. Also, it is a mistake to rewind them at the migration source. Since VirtQueue is able to migrate the in-flight descriptors, it is the destination's responsibility to perform the rewind in case it cannot report the in-flight descriptors to the device. This makes it easier to migrate between backends or to recover them in vhost devices that support setting in-flight descriptors. Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ") Signed-off-by: Eugenio Pérez -- v2: Squash both fixes in one. --- hw/virtio/vhost-vdpa.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 795ed5a049..4458c8d23e 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1178,7 +1178,18 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; +VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index); +/* + * vhost-vdpa devices does not support in-flight requests. Set all of them + * as available. + * + * TODO: This is ok for networking, but other kinds of devices might + * have problems with these retransmissions. + */ +while (virtqueue_rewind(vq, 1)) { +continue; +} if (v->shadow_vqs_enabled) { /* * Device vring base was set at device start. 
SVQ base is handled by @@ -1194,21 +1205,10 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev, struct vhost_vring_state *ring) { struct vhost_vdpa *v = dev->opaque; -int vdpa_idx = ring->index - dev->vq_index; int ret; if (v->shadow_vqs_enabled) { -VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx); - -/* - * Setting base as last used idx, so destination will see as available - * all the entries that the device did not use, including the in-flight - * processing ones. - * - * TODO: This is ok for networking, but other kinds of devices might - * have problems with these retransmissions. - */ -ring->num = svq->last_used_idx; +ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index); return 0; } -- 2.31.1
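virtqueue_rewind() puts the requested number of in-flight (popped but not yet used) elements back on the available ring, failing once fewer than that remain, so the one-at-a-time loop in the patch stops exactly when nothing is left in flight. A toy model of that loop (struct and names are illustrative, not QEMU's VirtQueue):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a virtqueue: only the two counters the rewind loop
 * cares about. */
struct toy_vq {
    unsigned last_avail_idx;   /* next descriptor the driver would pop */
    unsigned inuse;            /* popped but not yet marked used */
};

/* Toy model of virtqueue_rewind(): make up to @n in-flight elements
 * available again; fail if fewer than @n are in flight. */
static bool toy_virtqueue_rewind(struct toy_vq *vq, unsigned n)
{
    if (n > vq->inuse) {
        return false;
    }
    vq->inuse -= n;
    vq->last_avail_idx -= n;
    return true;
}

/* The loop from the patch: rewind everything still in flight. */
static void toy_rewind_all(struct toy_vq *vq)
{
    while (toy_virtqueue_rewind(vq, 1)) {
        continue;
    }
}
```

After the loop, last_avail_idx has moved back over every in-flight element, which is exactly the base the destination should see as available.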
Re: [RISU PATCH v4 00/29] risu cleanups and improvements
On Fri, 8 Jul 2022 at 16:49, Richard Henderson wrote: > > If you can imagine, v3 was back in 2020: > https://lore.kernel.org/qemu-devel/20200522023440.26261-1-richard.hender...@linaro.org/ > > I've refreshed that, not that risu has changed much in that time, > and then also added support for SME to aarch64, i.e. SVCR and ZA > storage are now present in the reginfo, and compared. > > I include a small test case, which allows one to see that ZA > storage is being handled properly. When run with > > ./risu --test-sve=1 --test-za=1 --master -t test_sme_aarch64.{out,bin} > ./risu --fulldump -t test_sme_aarch64.out > > one can see the 16x16 bytes filled with row major then > column major indexes. I've applied patches 1-22 to risu upstream git, and left review comments/reviewed-by tags on the rest. thanks -- PMM
Re: [PATCH 2/2] iotests/131: Add parallels regression test
On 7/14/22 16:28, Hanna Reitz wrote: Test an allocating write to a parallels image that has a backing node. Before HEAD^, doing so used to give me a failed assertion (when the backing node contains only `42` bytes; the results varies with the value chosen, for `0` bytes, for example, all I get is EIO). Signed-off-by: Hanna Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [PATCH 1/2] block/parallels: Fix buffer-based write call
On 7/14/22 16:28, Hanna Reitz wrote: Commit a4072543ccdddbd241d5962d9237b8b41fd006bf has changed the I/O here from working on a local one-element I/O vector to just using the buffer directly (using the bdrv_co_pread()/bdrv_co_pwrite() helper functions introduced shortly before). However, it only changed the bdrv_co_preadv() call to bdrv_co_pread() - the subsequent bdrv_co_pwritev() call stayed this way, and so still expects a QEMUIOVector pointer instead of a plain buffer. We must change that to be a bdrv_co_pwrite() call. Fixes: a4072543ccdddbd241d5962d ("block/parallels: use buffer-based io") Signed-off-by: Hanna Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [RISU PATCH v4 29/29] aarch64: Trivial SME test
On Fri, 8 Jul 2022 at 17:59, Richard Henderson wrote: > > Signed-off-by: Richard Henderson > --- > test_sme_aarch64.s | 55 ++ > 1 file changed, 55 insertions(+) > create mode 100644 test_sme_aarch64.s > > diff --git a/test_sme_aarch64.s b/test_sme_aarch64.s > new file mode 100644 > index 000..a5ef909 > --- /dev/null > +++ b/test_sme_aarch64.s > @@ -0,0 +1,55 @@ > + .arch_extension sme Can we have the usual copyright-and-license comment at the top of the file, please? thanks -- PMM
Re: [PATCH v8 00/12] s390x: CPU Topology
On 7/15/22 20:28, Janis Schoetterl-Glausch wrote: On 7/15/22 15:47, Pierre Morel wrote: On 7/15/22 11:31, Janis Schoetterl-Glausch wrote: On 7/14/22 22:05, Pierre Morel wrote: On 7/14/22 20:43, Janis Schoetterl-Glausch wrote: On 6/20/22 16:03, Pierre Morel wrote: Hi, This new spin is essentially for coherence with the last Linux CPU Topology patch, function testing and coding style modifications. Foreword === The goal of this series is to implement CPU topology for S390; it improves the preceding series with the implementation of books and drawers, of non-uniform CPU topology and with documentation. To use these patches, you will need the Linux series version 10. You find it there: https://lkml.org/lkml/2022/6/20/590 Currently this code is for KVM only, I have no idea if it is interesting to provide a TCG patch. If ever, it will be done in another series. To have a better understanding of the S390x CPU Topology and its implementation in QEMU you can have a look at the documentation in the last patch or follow the introduction here under. A short introduction CPU Topology is described in the S390 POP with essentially the description of two instructions: PTF Perform Topology Function, used to poll for topology change and used to set the polarization, but this part is not part of this item. STSI Store System Information and the SYSIB 15.1.x providing the Topology configuration. S390 Topology is a 6-level hierarchical topology with up to 5 levels of containers. The last topology level specifies the CPU cores. This patch series only uses the two lower levels, sockets and cores. To get the information on the topology, S390 provides the STSI instruction, which stores a structure providing the list of the containers used in the Machine topology: the SYSIB. A selector within the STSI instruction allows choosing how many topology levels will be provided in the SYSIB. 
Using the Topology List Entries (TLE) provided inside the SYSIB, the Linux kernel is able to compute the information about the cache distance between two cores and can use this information to make scheduling decisions. Do the socket, book, ... metaphors and looking at STSI from the existing smp infrastructure even make sense? Sorry, I do not understand. I admit the cover-letter is old and I did not rewrite it really well since the first patch series. What we do is: compute the STSI from the SMP + numa + device QEMU parameters. STSI 15.1.x reports the topology to the guest and for a virtual machine, this topology can be very dynamic. So a CPU can move from one topology container to another, but the socket of a cpu changing while it's running seems a bit strange. And this isn't supported by this patch series as far as I understand, the only topology changes are on hotplug. A CPU changing from one socket to another socket is the only case where the PTF instruction reports a change in the topology, along with the case where a new CPU is plugged in. Can a CPU actually change between sockets right now? To be exact, what I understand is that a shared CPU can be scheduled to another real CPU exactly as a guest vCPU can be scheduled by the host to another host CPU. Ah, ok, this is what I'm forgetting, and what made communication harder, there are two ways by which the topology can change: 1. the host topology changes 2. the vCPU threads are scheduled on another host CPU I've been only thinking about 2. I assumed some outside entity (libvirt?) pins vCPU threads, and so it would be the responsibility of that entity to set the topology which then is reported to the guest. So if you pin vCPUs for the whole lifetime of the vm then you could do that by specifying the topology up front with -devices. If you want to support migration, then the outside entity would need a way to tell qemu the updated topology. Yes The socket-id is computed from the core-id, so it's fixed, is it not? 
the virtual socket-id is computed from the virtual core-id Meaning cpu.env.core_id, correct? (which is the same as cpu.cpu_index which is the same as ms->possible_cpus->cpus[core_id].props.core_id) And a cpu's core id doesn't change during the lifetime of the vm, right? right And so it's socket id doesn't either. Yes It is not expected to appear often but it does appear. The code has been removed from the kernel in spin 10 for 2 reasons: 1) we decided to first support only dedicated and pinned CPU> 2) Christian fears it may happen too often due to Linux host scheduling and could be a performance problem This seems sensible, but now it seems too static. For example after migration, you cannot tell the guest which CPUs are in the same socket, book, ..., unless I'm misunderstanding something. No, to do this we would need to ask the kernel about it. You mean polling /sys/devices/system/cpu/cpu*/topology/*_id
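As the thread states, the virtual socket-id is derived from the virtual core-id and therefore fixed for the VM's lifetime. With a fixed cores-per-socket count that derivation is plain integer division; the helper below only illustrates that statement and is not code from the series:

```c
/* Illustrative only: derive the (virtual) socket a core belongs to from
 * its core-id, given a fixed cores-per-socket topology.  Since core_id
 * never changes for a vCPU, neither does its socket. */
static int topo_socket_id(int core_id, int cores_per_socket)
{
    return core_id / cores_per_socket;
}
```

With 4 cores per socket, cores 0-3 land in socket 0, cores 4-7 in socket 1, and so on.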
Re: [PATCH v7 08/13] multifd: Prepare to send a packet without the mutex held
* Juan Quintela (quint...@redhat.com) wrote: > We do the send_prepare() and the fill of the head packet without the > mutex held. It will help a lot for compression and later in the > series for zero pages. > > Notice that we can use p->pages without holding p->mutex because > p->pending_job == 1. > > Signed-off-by: Juan Quintela > --- > migration/multifd.h | 2 ++ > migration/multifd.c | 11 ++- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/migration/multifd.h b/migration/multifd.h > index af8ce8921d..d48597a1ea 100644 > --- a/migration/multifd.h > +++ b/migration/multifd.h > @@ -109,7 +109,9 @@ typedef struct { > /* array of pages to sent. > * The owner of 'pages' depends of 'pending_job' value: > * pending_job == 0 -> migration_thread can use it. > + * No need for mutex lock. > * pending_job != 0 -> multifd_channel can use it. > + * No need for mutex lock. > */ > MultiFDPages_t *pages; > > diff --git a/migration/multifd.c b/migration/multifd.c > index 69b9d7cf98..056599cbaf 100644 > --- a/migration/multifd.c > +++ b/migration/multifd.c > @@ -661,6 +661,8 @@ static void *multifd_send_thread(void *opaque) > p->flags |= MULTIFD_FLAG_SYNC; > p->sync_needed = false; > } > +qemu_mutex_unlock(&p->mutex); > + > p->normal_num = 0; > > if (use_zero_copy_send) { > @@ -682,11 +684,6 @@ static void *multifd_send_thread(void *opaque) Looking at my source tree, without the rest of your series, I see an unlock just before this pair of }'s; doesn't that need to go? 
Dave > } > } > multifd_send_fill_packet(p); > -p->num_packets++; > -p->total_normal_pages += p->normal_num; > -p->pages->num = 0; > -p->pages->block = NULL; > -qemu_mutex_unlock(&p->mutex); > > trace_multifd_send(p->id, packet_num, p->normal_num, p->flags, > p->next_packet_size); > @@ -711,6 +708,10 @@ static void *multifd_send_thread(void *opaque) > } > > qemu_mutex_lock(&p->mutex); > +p->num_packets++; > +p->total_normal_pages += p->normal_num; > +p->pages->num = 0; > +p->pages->block = NULL; > p->sent_bytes += p->packet_len;; > p->sent_bytes += p->next_packet_size; > p->pending_job--; > -- > 2.35.3 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH] x86: cpu: Error out if memory exceeds addressable range
On Mon, 18 Jul 2022 13:47:34 +0530 Nikunj A Dadhania wrote: > Currently it is possible to start a guest with memory that is beyond > the addressable range of CPU and QEMU does not even warn about it. > The default phys_bits is 40 and can address 1TB. However it allows to > start a guest with greater than 1TB memory. > > Prevent this by erroring out in such a scenario. > > Reported-by: Shaju Abraham > Signed-off-by: Nikunj A Dadhania Following shall care of your issue: https://www.mail-archive.com/qemu-devel@nongnu.org/msg900136.html > --- > target/i386/cpu.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index 6a57ef13af..1afbdbac7d 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -6376,6 +6376,7 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu) > > static void x86_cpu_realizefn(DeviceState *dev, Error **errp) > { > +MachineState *machine = MACHINE(qdev_get_machine()); > CPUState *cs = CPU(dev); > X86CPU *cpu = X86_CPU(dev); > X86CPUClass *xcc = X86_CPU_GET_CLASS(dev); > @@ -6541,6 +6542,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error > **errp) > } > } > > +if (BIT_ULL(cpu->phys_bits) < machine->maxram_size) { > +error_setg(&local_err, "cannot setup guest memory: " > + "%s memory(%lu MiB) exceeds addressable limit(%llu MiB)", > + machine->maxram_size == machine->ram_size ? "" : "max", > + machine->maxram_size / MiB, > + BIT_ULL(cpu->phys_bits) / MiB); > +goto out; > +} > + > /* Cache information initialization */ > if (!cpu->legacy_cache) { > if (!xcc->model || !xcc->model->cpudef->cache_info) {
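The condition the patch adds reduces to comparing the CPU's addressable limit, BIT_ULL(phys_bits), with the configured maximum RAM; with the default phys_bits of 40 that limit is exactly 1 TiB. A hedged sketch of just that check (the helper name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))
#define MiB (1024ULL * 1024)
#define GiB (1024 * MiB)
#define TiB (1024 * GiB)

/* Sketch of the condition from the patch: the guest's (max) RAM must fit
 * within what phys_bits can address, otherwise realize should error out. */
static bool ram_fits_phys_bits(unsigned phys_bits, uint64_t maxram_size)
{
    return BIT_ULL(phys_bits) >= maxram_size;
}
```

Note the patch only rejects memory strictly beyond the limit, so exactly 1 TiB with 40 phys bits still passes.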
Re: [PULL 05/20] meson: Prefix each element of firmware path
On 14/07/2022 11.01, Paolo Bonzini wrote: From: Akihiko Odaki Signed-off-by: Akihiko Odaki Message-Id: <20220624154042.51512-1-akihiko.od...@gmail.com> [Rewrite shell function without using Bash extensions. - Paolo] Signed-off-by: Paolo Bonzini --- configure | 15 +++ meson.build | 11 +-- meson_options.txt | 2 +- scripts/meson-buildoptions.py | 7 +-- scripts/meson-buildoptions.sh | 4 ++-- softmmu/datadir.c | 8 +--- 6 files changed, 37 insertions(+), 10 deletions(-) diff --git a/configure b/configure index e8cc850727..f02635b087 100755 --- a/configure +++ b/configure @@ -676,6 +676,21 @@ fi werror="" +meson_option_build_array() { + printf '[' + (if test "$targetos" == windows; then +IFS=\; + else +IFS=: + fi + for e in $1; do +e=${e/'\'/'\\'} +e=${e/\"/'\"'} +printf '"""%s""",' "$e" + done) + printf ']\n' +} Hi! This seems to break the NetBSD VM builds: $ make vm-build-netbsd J=4 TARGET_LIST=x86_64-softmmu GIT ui/keycodemapdb meson tests/fp/berkeley-testfloat-3 tests/fp/berkeley-softfloat-3 dtc slirp roms/SLOF VM-BUILD netbsd ../src/configure: 687: Syntax error: Bad substitution Could you please have a look? Thanks, Thomas
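The "Bad substitution" comes from NetBSD's /bin/sh rejecting ${e/pat/rep}, which is a bash/ksh extension rather than POSIX parameter expansion. One portable way to do the same backslash/quote escaping, shown here only as an illustrative rewrite (not necessarily the fix that was merged):

```shell
# Escape backslashes and double quotes for embedding in a meson string
# literal, using only POSIX sh constructs (printf + sed) instead of the
# bash-only ${var/pat/rep} substitution.
escape_element() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

escape_element 'C:\Program Files'
```

The same caveat applies to `test "$targetos" == windows` in the quoted hunk: POSIX test only specifies `=`, not `==`.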
Re: [PATCH v8 06/11] i386/pc: factor out cxl range start to helper
On Fri, 15 Jul 2022 18:16:23 +0100 Joao Martins wrote: > Factor out the calculation of the base address of the memory region. > It will be used later on for the cxl range end counterpart calculation > and as well in pc_memory_init() CXL memory region initialization, thus > avoiding duplication. > > Cc: Jonathan Cameron > Signed-off-by: Joao Martins Acked-by: Igor Mammedov PS: see note below in case series respin > --- > hw/i386/pc.c | 26 +- > 1 file changed, 17 insertions(+), 9 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 1f42f194d7b7..3fdcab4bb4f3 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -825,6 +825,22 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms) > return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size; > } > > +static uint64_t pc_get_cxl_range_start(PCMachineState *pcms) > +{ > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms); > +MachineState *machine = MACHINE(pcms); > +hwaddr cxl_base; > + > +if (pcmc->has_reserved_memory && machine->device_memory->base) { > +cxl_base = machine->device_memory->base > ++ memory_region_size(&machine->device_memory->mr); > +} else { > +cxl_base = pc_above_4g_end(pcms); > +} > + > +return cxl_base; > +} > + > static uint64_t pc_get_cxl_range_end(PCMachineState *pcms) > { > uint64_t start = 0; > @@ -946,15 +962,7 @@ void pc_memory_init(PCMachineState *pcms, > MemoryRegion *mr = &pcms->cxl_devices_state.host_mr; > hwaddr cxl_size = MiB; > > -if (pcmc->has_reserved_memory && machine->device_memory->base) { > -cxl_base = machine->device_memory->base > -+ memory_region_size(&machine->device_memory->mr); > -} else if (pcms->sgx_epc.size != 0) { > -cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc); > -} else { shouldn't be this hunk be a part of 4/11? 
(otherwise it looks like it's been dropped by mistake) end result is fine as pc_above_4g_end() already has this hunk (hence Ack) > -cxl_base = pc_above_4g_end(pcms); > -} > - > +cxl_base = pc_get_cxl_range_start(pcms); > e820_add_entry(cxl_base, cxl_size, E820_RESERVED); > memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); > memory_region_add_subregion(system_memory, cxl_base, mr);
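For reference, the factored-out calculation amounts to: start the CXL window right after the device-memory region when one exists, otherwise at the end of above-4G memory. A hedged sketch with plain integers standing in for the MemoryRegion accessors (names are illustrative, not QEMU's):

```c
#include <stdint.h>

/* Sketch of pc_get_cxl_range_start(): the CXL host region begins where
 * device memory ends, falling back to the end of above-4G RAM when no
 * device-memory region is configured. */
static uint64_t cxl_range_start(uint64_t device_memory_base,
                                uint64_t device_memory_size,
                                uint64_t above_4g_end)
{
    if (device_memory_base) {
        return device_memory_base + device_memory_size;
    }
    return above_4g_end;
}
```

Factoring this out lets pc_memory_init() and the range-end calculation share one definition instead of duplicating the if/else.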