Re: [RFC PATCH 04/12] vhost_net: Use ops->vhost_reset_queue in vhost_net_virtqueue_reset
在 2023/7/21 02:14, Eugenio Pérez 写道: Actually use vhost_reset_queue operation at queue reset. Signed-off-by: Eugenio Pérez --- hw/net/vhost_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 416b7d8132..5516b7a5aa 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -571,7 +571,9 @@ void vhost_net_virtqueue_reset(VirtIODevice *vdev, NetClientState *nc, idx = vhost_ops->vhost_get_vq_index(&net->dev, vq_index); -if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) { +if (vhost_ops->vhost_reset_queue) { +vhost_ops->vhost_reset_queue(&net->dev, idx); +} else if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) { file.index = idx; int r = vhost_net_set_backend(&net->dev, &file); assert(r >= 0); Let's move the TAP specific logic to vhost-net specific reset_queue function. Thanks
Re: [RFC PATCH 03/12] vhost_net: Use ops->vhost_restart_queue in vhost_net_virtqueue_restart
在 2023/7/21 02:14, Eugenio Pérez 写道: Actually use vhost_restart_queue operation at restart. Signed-off-by: Eugenio Pérez --- hw/net/vhost_net.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 6b958d6363..416b7d8132 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -608,14 +608,16 @@ int vhost_net_virtqueue_restart(VirtIODevice *vdev, NetClientState *nc, goto err_start; } -if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) { +if (vhost_ops->vhost_restart_queue) { +r = vhost_ops->vhost_restart_queue(&net->dev, idx); +} else if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) { file.index = idx; file.fd = net->backend; r = vhost_net_set_backend(&net->dev, &file); I would introduce a vhost-net specific reset routine and move the above logic there. Thanks -if (r < 0) { -r = -errno; -goto err_start; -} +} +if (r < 0) { +r = -errno; +goto err_start; } return 0;
Re: [RFC PATCH 00/12] Prefer to use SVQ to stall dataplane at NIC state restore through CVQ
在 2023/7/21 14:48, Eugenio Perez Martin 写道: On Thu, Jul 20, 2023 at 8:15 PM Eugenio Pérez wrote: At this moment the migration of net features that depends on CVQ is not possible, as there is no reliable way to restore the device state like mac address, number of enabled queues, etc to the destination. This is mainly caused because the device must only read CVQ, and process all the commands before resuming the dataplane. This series uses the VirtIO feature _F_RING_RESET to achieve it, adding an alternative method to late vq enabling proposed in [1][2]. It expose SVQ to the device until it process all the CVQ messages, and then replaces the vring for the guest's one. A couple of things I forgot to add: * Assuming the implementation of _F_RING_RESET in vdpa is calling kernel vdpa ops .set_vq_ready(vq, false). I'm not sure if this is the best implementation, but it is currently unused in the kernel. At the same time, .set_vq_ready(vq, true) also enables the vq again. I think we need another ops, as set_vq_ready() tends to be functional equivalent to queue_enable. If we reuse set_vq_ready(vq, false), we would get conflict in the future when driver is allowed to stop/resume a specific virtqueue via setting 0 to queue_enable. And that's also the reason why queue_enable is not resued to reset a virtqueue. As an advantage, it uses a feature well deviced in the VirtIO standard. As a disadvantage, current HW already support the late enabling and it does not support RING_RESET. This patch must be applied on top of the series ("Enable vdpa net migration with features depending on CVQ") [1][2]. The patch has been tested with vp_vdpa, but using high IOVA instead of a sepparated ID for shadow CVQ and shadow temporal vrings. And with _F_STATE implementation I sent long ago. Based on this, my suggestion is: * Leave the late enable for vDPA devices. * Make them fail if the vDPA parent device does not support it. This can be considered as a fix. * Leave _F_RING_RESET to be added on top, as the semantics are not implemented in vDPA at the moment. Would that work? I think it can work, let's start from late enabling which seems lightweight than reset and see. We can leave the vp_vdpa to be done on top with another series. Thanks [1] Message-id: <20230706191227.835526-1-epere...@redhat.com> [2] https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg01325.html Eugenio Pérez (12): vhost: add vhost_reset_queue_op vhost: add vhost_restart_queue_op vhost_net: Use ops->vhost_restart_queue in vhost_net_virtqueue_restart vhost_net: Use ops->vhost_reset_queue in vhost_net_virtqueue_reset vdpa: add vhost_vdpa_set_vring_ready_internal vdpa: add vhost_vdpa_svq_stop vdpa: add vhost_vdpa_reset_queue vdpa: add vhost_vdpa_svq_start vdpa: add vhost_vdpa_restart_queue vdpa: enable all vqs if the device support RING_RESET feature vdpa: use SVQ to stall dataplane while NIC state is being restored vhost: Allow _F_RING_RESET with shadow virtqueue include/hw/virtio/vhost-backend.h | 6 ++ hw/net/vhost_net.c | 16 ++-- hw/virtio/vhost-shadow-virtqueue.c | 1 + hw/virtio/vhost-vdpa.c | 139 + net/vhost-vdpa.c | 55 ++-- hw/virtio/trace-events | 2 +- 6 files changed, 171 insertions(+), 48 deletions(-) -- 2.39.3
Re: [PATCH v2 3/4] vdpa: Restore vlan filtering state
On 2023/7/25 14:47, Jason Wang wrote: > On Sun, Jul 23, 2023 at 5:28 PM Hawkins Jiawei wrote: >> >> This patch introduces vhost_vdpa_net_load_single_vlan() >> and vhost_vdpa_net_load_vlan() to restore the vlan >> filtering state at device's startup. >> >> Co-developed-by: Eugenio Pérez >> Signed-off-by: Eugenio Pérez >> Signed-off-by: Hawkins Jiawei > > Acked-by: Jason Wang > > But this seems to be a source of latency killer as it may at most send > 1024 commands. > > As discussed in the past, we need a better cvq command to do this: for > example, a single command to carray a bitmap. Hi Jason, Thanks for your review. You are right, we need some improvement here. Therefore, I have submitted another patch series titled "vdpa: Send all CVQ state load commands in parallel" at [1], which allows QEMU to delay polling and checking the device used buffer until either the SVQ is full or control commands shadow buffers are full, so that QEMU can send all the SVQ control commands in parallel, which has better performance improvement. To test that patch series, I created 4094 VLANS in guest to build an environment for sending multiple CVQ state load commands. According to the result on the real vdpa device at [2], this patch series can improve latency from 23296 us to 6539 us. Thanks! [1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03726.html [2]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03947.html > > Thanks > >> --- >> v2: >> - remove the extra line pointed out by Eugenio >> >> v1: >> https://lore.kernel.org/all/0a568cc8a8d2b750c2e09b2237e9f05cece07c3f.1689690854.git.yin31...@gmail.com/ >> >> net/vhost-vdpa.c | 48 >> 1 file changed, 48 insertions(+) >> >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 9795306742..347241796d 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -965,6 +965,50 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s, >> return 0; >> } >> >> +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s, >> + const VirtIONet *n, >> + uint16_t vid) >> +{ >> +const struct iovec data = { >> +.iov_base = &vid, >> +.iov_len = sizeof(vid), >> +}; >> +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN, >> + VIRTIO_NET_CTRL_VLAN_ADD, >> + &data, 1); >> +if (unlikely(dev_written < 0)) { >> +return dev_written; >> +} >> +if (unlikely(*s->status != VIRTIO_NET_OK)) { >> +return -EIO; >> +} >> + >> +return 0; >> +} >> + >> +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s, >> +const VirtIONet *n) >> +{ >> +int r; >> + >> +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) { >> +return 0; >> +} >> + >> +for (int i = 0; i < MAX_VLAN >> 5; i++) { >> +for (int j = 0; n->vlans[i] && j <= 0x1f; j++) { >> +if (n->vlans[i] & (1U << j)) { >> +r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j); >> +if (unlikely(r != 0)) { >> +return r; >> +} >> +} >> +} >> +} >> + >> +return 0; >> +} >> + >> static int vhost_vdpa_net_load(NetClientState *nc) >> { >> VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc); >> @@ -995,6 +1039,10 @@ static int vhost_vdpa_net_load(NetClientState *nc) >> if (unlikely(r)) { >> return r; >> } >> +r = vhost_vdpa_net_load_vlan(s, n); >> +if (unlikely(r)) { >> +return r; >> +} >> >> return 0; >> } >> -- >> 2.25.1 >> >
Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
On 24.07.23 17:48, Eugenio Perez Martin wrote: On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek wrote: On 21.07.23 17:25, Eugenio Perez Martin wrote: On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek wrote: Move the `suspended` field from vhost_vdpa into the global vhost_dev struct, so vhost_dev_stop() can check whether the back-end has been suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, there is no need to reset it; the reset is just a fall-back to stop device operations for back-ends that do not support suspend. Unfortunately, for vDPA specifically, RESUME is not yet implemented, so when the device is re-started, we still have to do the reset to have it un-suspend. Signed-off-by: Hanna Czenczek --- include/hw/virtio/vhost-vdpa.h | 2 -- include/hw/virtio/vhost.h | 8 hw/virtio/vhost-vdpa.c | 11 +++ hw/virtio/vhost.c | 8 +++- 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index e64bfc7f98..72c3686b7f 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { bool shadow_vqs_enabled; /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ bool shadow_data; -/* Device suspended successfully */ -bool suspended; /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; GPtrArray *shadow_vqs; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6a173cb9fa..69bf59d630 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -120,6 +120,14 @@ struct vhost_dev { uint64_t backend_cap; /* @started: is the vhost device started? */ bool started; +/** + * @suspended: Whether the vhost device is currently suspended. Set + * and reset by implementations (vhost-user, vhost-vdpa, ...), which + * are supposed to automatically suspend/resume in their + * vhost_dev_start handlers as required. Must also be cleared when + * the device is reset. + */ +bool suspended; bool log_enabled; uint64_t log_size; Error *migration_blocker; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 7b7dee468e..f7fd19a203 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, static int vhost_vdpa_reset_device(struct vhost_dev *dev) { -struct vhost_vdpa *v = dev->opaque; int ret; uint8_t status = 0; ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); trace_vhost_vdpa_reset_device(dev); -v->suspended = false; +dev->suspended = false; return ret; } @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) if (unlikely(r)) { error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); } else { -v->suspended = true; +dev->suspended = true; return; } } @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) return -1; } vhost_vdpa_set_vring_ready(dev); +if (dev->suspended) { +/* TODO: When RESUME is available, use it instead of resetting */ +vhost_vdpa_reset_status(dev); How is that we reset the status at each vhost_vdpa_dev_start? That will clean all the vqs configured, features negotiated, etc. in the vDPA device. Or am I missing something? What alternative do you propose? We don’t have RESUME for vDPA in qemu, but we somehow need to lift the previous SUSPEND so the device will again respond to guest requests, do we not? Reset also clears the suspend state in vDPA, and it should be called at vhost_dev_stop. So the device should never be in suspended state here. Does that solve your concerns? My intention with this patch was precisely not to reset in vhost_dev_stop when suspending is supported. So now I’m more confused than before. But more generally, is this any different from what is done before this patch? Before this patch, vhost_dev_stop() unconditionally invokes vhost_reset_status(), so the device is reset in every stop/start cycle, that doesn’t change. And we still won’t reset it on the first vhost_dev_start(), because dev->suspended will be false then, only on subsequent stop/start cycles, as before. So the only difference is that now the device is reset on start, not on stop. The difference is that vhost_vdpa_dev_start is called after features ack (via vhost_dev_start, through vhost_dev_set_features call) and vq configuration (using vhost_virtqueue_start). A device reset forces the device to forget about all of that, and qemu cannot configure them again until qemu acks the features again. Now I’m complete
Re: [PATCH v6 5/6] qapi: Add HV_BALLOON_STATUS_REPORT event
"Maciej S. Szmigiero" writes: > From: "Maciej S. Szmigiero" > > Used by the hv-balloon driver for (optional) guest memory status reports. Inhowfar optional? What enables / triggers it? Use case for the event? Could a status event make sense for other balloon drivers as well? > Signed-off-by: Maciej S. Szmigiero > --- > qapi/machine.json | 25 + > 1 file changed, 25 insertions(+) > > diff --git a/qapi/machine.json b/qapi/machine.json > index 5ede977cf2bc..9649616b9ed2 100644 > --- a/qapi/machine.json > +++ b/qapi/machine.json > @@ -1113,6 +1113,31 @@ > { 'event': 'BALLOON_CHANGE', >'data': { 'actual': 'int' } } > > +## > +# @HV_BALLOON_STATUS_REPORT: > +# > +# Emitted when the hv-balloon driver receives a "STATUS" message from > +# the guest. Aha, the event is triggered by the guest. It must therefore be rate-limited, just like BALLOON_CHANGE. To do that, add it to monitor_qapi_event_conf[] in monitor/monitor.c, and document it as noted below. > +# > +# @commited: the amount of memory in use inside the guest plus the amount > +#of the memory unusable inside the guest (ballooned out, > +#offline, etc.) > +# > +# @available: the amount of the memory inside the guest available for new > +# allocations ("free") Spelling: committed. Remember to update the example, too. Please format like # @committed: the amount of memory in use inside the guest plus the # amount of the memory unusable inside the guest (ballooned out, # offline, etc.) # # @available: the amount of the memory inside the guest available for # new allocations ("free") to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions). > +# To document rate-limiting, add: # Note: this event is rate-limited. # > +# Since: TBD > +# > +# Example: > +# > +# <- { "event": "HV_BALLOON_STATUS_REPORT", > +# "data": { "commited": 81664, "available": 054464 }, > +# "timestamp": { "seconds": 1600295492, "microseconds": 661044 } } > +# > +## > +{ 'event': 'HV_BALLOON_STATUS_REPORT', > + 'data': { 'commited': 'size', 'available': 'size' } } > + > ## > # @MemoryInfo: > # An event is commonly paired with a query command, so that QMP clients can resynchronize state after missing events, e.g. when reconnecting after a client restart. query-balloon isn't such a query: it returns less than the event. If a paired query doesn't make sense, explain why.
[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
Replaced a call to malloc() and its respective call to free() with g_malloc() and g_free(). Signed-off-by: dinglimin --- semihosting/uaccess.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c index 8018828069..8f2e6f63ee 100644 --- a/semihosting/uaccess.c +++ b/semihosting/uaccess.c @@ -14,11 +14,10 @@ void *softmmu_lock_user(CPUArchState *env, target_ulong addr, target_ulong len, bool copy) { -void *p = malloc(len); +void *p = g_malloc(len); if (p && copy) { if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { -free(p); -p = NULL; +g_free(p); } } return p; -- 2.30.0.windows.2
Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
25.07.2023 11:06, dinglimin wrote: Replaced a call to malloc() and its respective call to free() with g_malloc() and g_free(). ... void *softmmu_lock_user(CPUArchState *env, target_ulong addr, target_ulong len, bool copy) { -void *p = malloc(len); +void *p = g_malloc(len); if (p && copy) { if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { -free(p); -p = NULL; +g_free(p); } } return p; This is definitely wrong. Hint: what this function will return if cpu_memory_rw_debug() fails? /mjt
Re: [PATCH v6 4/6] qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo
"Maciej S. Szmigiero" writes: > From: "Maciej S. Szmigiero" > > Used by the hv-balloon driver to report its provided memory state > information. > > Co-developed-by: David Hildenbrand > Signed-off-by: Maciej S. Szmigiero > --- > hw/core/machine-hmp-cmds.c | 15 +++ > qapi/machine.json | 39 -- > 2 files changed, 52 insertions(+), 2 deletions(-) > > diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c > index c3e55ef9e9cd..7b06ed35decb 100644 > --- a/hw/core/machine-hmp-cmds.c > +++ b/hw/core/machine-hmp-cmds.c > @@ -247,6 +247,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict > *qdict) > MemoryDeviceInfo *value; > PCDIMMDeviceInfo *di; > SgxEPCDeviceInfo *se; > +HvBalloonDeviceInfo *hi; > > for (info = info_list; info; info = info->next) { > value = info->value; > @@ -304,6 +305,20 @@ void hmp_info_memory_devices(Monitor *mon, const QDict > *qdict) > monitor_printf(mon, " node: %" PRId64 "\n", se->node); > monitor_printf(mon, " memdev: %s\n", se->memdev); > break; > +case MEMORY_DEVICE_INFO_KIND_HV_BALLOON: > +hi = value->u.hv_balloon.data; > +monitor_printf(mon, "Memory device [%s]: \"%s\"\n", > + MemoryDeviceInfoKind_str(value->type), > + hi->id ? hi->id : ""); > +if (hi->has_memaddr) { > +monitor_printf(mon, " memaddr: 0x%" PRIx64 "\n", > + hi->memaddr); > +} > +monitor_printf(mon, " max-size: %" PRIu64 "\n", > hi->max_size); > +if (hi->memdev) { > +monitor_printf(mon, " memdev: %s\n", hi->memdev); > +} > +break; > default: > g_assert_not_reached(); > } > diff --git a/qapi/machine.json b/qapi/machine.json > index a08b6576cac6..5ede977cf2bc 100644 > --- a/qapi/machine.json > +++ b/qapi/machine.json > @@ -1265,6 +1265,29 @@ >} > } > > +## > +# @HvBalloonDeviceInfo: > +# > +# hv-balloon provided memory state information > +# > +# @id: device's ID > +# > +# @memaddr: physical address in memory, where device is mapped > +# > +# @max-size: the maximum size of memory that the device can provide > +# > +# @memdev: memory backend linked with device > +# > +# Since: TBD I understand why you put in TBD here (aiming for a moving target is a hassle), but patches not marked RFC should have no known issues that should be fixed before merging them. > +## > +{ 'struct': 'HvBalloonDeviceInfo', > + 'data': { '*id': 'str', > +'*memaddr': 'size', > +'max-size': 'size', > +'*memdev': 'str' > + } > +} > + > ## > # @MemoryDeviceInfoKind: > # > @@ -1276,10 +1299,13 @@ > # > # @sgx-epc: since 6.2. > # > +# @hv-balloon: since TBD. > +# > # Since: 2.1 > ## > { 'enum': 'MemoryDeviceInfoKind', > - 'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc' ] } > + 'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc', > +'hv-balloon' ] } > > ## > # @PCDIMMDeviceInfoWrapper: > @@ -1313,6 +1339,14 @@ > { 'struct': 'SgxEPCDeviceInfoWrapper', >'data': { 'data': 'SgxEPCDeviceInfo' } } > > +## > +# @HvBalloonDeviceInfoWrapper: > +# > +# Since: TBD > +## > +{ 'struct': 'HvBalloonDeviceInfoWrapper', > + 'data': { 'data': 'HvBalloonDeviceInfo' } } > + > ## > # @MemoryDeviceInfo: > # > @@ -1327,7 +1361,8 @@ > 'nvdimm': 'PCDIMMDeviceInfoWrapper', > 'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper', > 'virtio-mem': 'VirtioMEMDeviceInfoWrapper', > -'sgx-epc': 'SgxEPCDeviceInfoWrapper' > +'sgx-epc': 'SgxEPCDeviceInfoWrapper', > +'hv-balloon': 'HvBalloonDeviceInfoWrapper' >} > } > The organization of the series feels a bit awkward. In this patch, you define QAPI types and add a bit of code reading them, but the code creating them is left for later. In the next patch, you define a QMP event, but the code sending it is left for later. In the final, huge patch, you fill in the blanks. Adding definitions before their uses can be the least awkward solution. But then the commit messages should point out that uses come later. Describing these future uses briefly may be necessary to help the reader understand the patch on its own. Perhaps you can restructure the series instead.
Re: [PATCH v6 0/6] Hyper-V Dynamic Memory Protocol driver (hv-balloon 🎈️)
David Hildenbrand writes: > On 20.07.23 12:12, Maciej S. Szmigiero wrote: >> From: "Maciej S. Szmigiero" >> This is a continuation of the v5 of the patch series located here: >> https://lore.kernel.org/qemu-devel/cover.1686577753.git.maciej.szmigi...@oracle.com/ >> > > We're now in QEMU soft-freeze, which means the memslot series might take a > bit to land. I'm going to follow-up on that soonish. > >> Changes from v5: >> * Incorporate David's rework of the driver on top of his virtio-mem-memslots >> patches (specifically, commit 6769107d1a4f), making use of a memory region >> container created upfront to avoid calling memory_device{,_pre}_plug() >> functions from the driver and introducing a driver-specific MemoryDeviceInfo >> sub-type. >> * Include two additional David's memory-device patches necessary for the >> aforementioned conversion in this patch set. >> * Use multiple memslots to cover the hot-add memory backend in order to >> reduce metadata size for the not-yet-hot-added part of the memory backend. >> * Add David's "Co-developed-by:" to patches where he contributed some >> changes. >> * Use OBJECT_DEFINE_TYPE_WITH_INTERFACES() and OBJECT_DECLARE_SIMPLE_TYPE() >> macros instead of open-coding the equivalent functionality. >> * Drop no longer necessary patch adding g_autoptr() cleanup function for the >> Error type. >> David Hildenbrand (2): >>memory-device: Support empty memory devices >>memory-device: Drop size alignment check >> Maciej S. Szmigiero (4): >>Add Hyper-V Dynamic Memory Protocol definitions >>qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo >>qapi: Add HV_BALLOON_STATUS_REPORT event >>Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon) > > That is still a gigantic patch. Is there any way to split that into > reasonable chunks? For example, move the whole hotplug/memslot part into > a dedicated patch? > > See below on splitting off the PC changes. > >> Kconfig.host |3 + >> hw/core/machine-hmp-cmds.c | 15 + >> hw/hyperv/Kconfig|5 + >> hw/hyperv/hv-balloon.c | 2246 ++ >> hw/hyperv/meson.build|1 + >> hw/hyperv/trace-events | 18 + >> hw/i386/pc.c | 22 + >> hw/mem/memory-device.c | 45 +- >> include/hw/hyperv/dynmem-proto.h | 423 ++ >> include/hw/hyperv/hv-balloon.h | 18 + >> include/hw/mem/memory-device.h |7 +- >> meson.build | 28 +- >> meson_options.txt|2 + >> qapi/machine.json| 64 +- >> scripts/meson-buildoptions.sh|3 + > > It's probably best to separate the actual device implementation from wiring > up the machine. That is, have a HV_BALLOON_SUPPORTED kconfig > (like VIRTIO_MEM_SUPPORTED), and activate that in a single commit for > PC, where you also modify hw/i386/pc.c. > > That commit would be called something like "pc: Support hv-balloon". Furthermore, try to factor out stuff related to QMP: 0. The driver with QMP stuff omitted / stubbed out 1. Enable query-memory-devices 2. Add HV_BALLOON_STATUS_REPORT event
Re: [PATCH v6 4/6] qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo
"Maciej S. Szmigiero" writes: > On 24.07.2023 13:37, Markus Armbruster wrote: >> "Maciej S. Szmigiero" writes: >> >>> From: "Maciej S. Szmigiero" >>> >>> Used by the hv-balloon driver to report its provided memory state >>> information. >>> >>> Co-developed-by: David Hildenbrand >>> Signed-off-by: Maciej S. Szmigiero >>> --- >>> hw/core/machine-hmp-cmds.c | 15 +++ >>> qapi/machine.json | 39 -- >>> 2 files changed, 52 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c >>> index c3e55ef9e9cd..7b06ed35decb 100644 >>> --- a/hw/core/machine-hmp-cmds.c >>> +++ b/hw/core/machine-hmp-cmds.c >>> @@ -247,6 +247,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict >>> *qdict) >>> MemoryDeviceInfo *value; >>> PCDIMMDeviceInfo *di; >>> SgxEPCDeviceInfo *se; >>> +HvBalloonDeviceInfo *hi; >>> for (info = info_list; info; info = info->next) { >>> value = info->value; >>> @@ -304,6 +305,20 @@ void hmp_info_memory_devices(Monitor *mon, const QDict >>> *qdict) >>> monitor_printf(mon, " node: %" PRId64 "\n", se->node); >>> monitor_printf(mon, " memdev: %s\n", se->memdev); >>> break; >>> +case MEMORY_DEVICE_INFO_KIND_HV_BALLOON: >> >> This is the only occurence of MEMORY_DEVICE_INFO_KIND_HV_BALLOON at the >> end of the series. Where are MemoryDeviceInfo with this union tag >> created? > > From patch 6: [...] I messed around with git, and managed to confuse myself. Thanks for your help!
Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume
On 24.07.23 19:55, Stefan Hajnoczi wrote: On Wed, Jul 19, 2023 at 03:59:32PM +0200, Hanna Czenczek wrote: On 18.07.23 16:25, Stefan Hajnoczi wrote: On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: When stopping the VM, qemu wants all devices to fully cease any operation, too. Currently, we can only have vhost-user back-ends stop processing vrings, but not background operations. Add the SUSPEND and RESUME commands from vDPA, which allow the front-end (qemu) to tell back-ends to cease all operations, including those running in the background. qemu's current work-around for this is to reset the back-end instead of suspending it, which will not work for back-ends that have internal state that must be preserved across e.g. stop/cont. Note that the given specification requires the back-end to delay processing kicks (i.e. starting vrings) until the device is resumed, instead of requiring the front-end not to send kicks while suspended. qemu starts devices (and would just resume them) only when the VM is in a running state, so it would be difficult to have qemu delay kicks until the device is resumed, which is why this patch specifies handling of kicks as it does. Signed-off-by: Hanna Czenczek --- docs/interop/vhost-user.rst | 35 +-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 5a070adbc1..ac6be34c4c 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must +never process rings, and thus also delay handling kicks until the If you respin this series, I suggest replacing "never" with "not" to emphasize that ring processing is only skipped while the device is suspended (rather than forever). "Never" feels too strong to use when describing a temporary state. Sure. +back-end is resumed again. + Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. ancillary data, it may be used to inform the front-end that the log has been modified. -Once the source has finished migration, rings will be stopped by the -source. No further update must be done before rings are restarted. +Once the source has finished migration, the device will be suspended and +its rings will be stopped by the source. No further update must be done +before the device and its rings are resumed. This paragraph is abstract and doesn't directly identify the mechanisms or who does what: - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when VHOST_USER_SUSPEND is not supported?) or automatically by the device itself or some other mechanism? OK, I’ll add VHOST_USER_SUSPEND. So far I hadn’t considered making a note of resetting as a fallback right in the specification. I don’t think I would want it in the specification, but on the other hand, it is probably important for back-end authors to know that if they do not implement SUSPEND, their device is going to be reset by qemu. Can we make that a ”may”, i.e.: ``` Once the source has finished migration, the device will be suspended via VHOST_USER_SUSPEND and its rings will be stopped by the source. I'm not sure what "its rings will be stopped by the source" means exactly. Is it summarizing the effect of VHOST_USER_SUSPEND or is there an additional action after VHOST_USER_SUSPEND that stops rings? And I'm not sure whether "by the source" means by the front-end or back-end on the source machine? This is pre-existing text and I assumed it (with not doubt) to refer to GET_VRING_BASE, because that is how rings are stopped. I can improve the existing documentation and add the reference to GET_VRING_BASE, and say that it clearly must come from the front-end. No further update must be done before the device and its rings are resumed. "Update" to what? Guest RAM? Device state? Rings? I feel like this text is too vague for a spec. People may interpret it differently. Can you make rephrase this to more concrete? Honestly, no. This is pre-existing, and I have the same questions as you do. I cannot “rephrase” this to make it more concrete. I can try to actually specify that was was left unspecified, but that would be a change in specification that would require its own patch, separate from this series. Personally, I’ve generally taken this sentence to be fluff. If the rings are stopped, clearly, they should not be accessed at all. Probably the back-end should also refrain from writing to guest memory, because that is a logical conclusion from having the rings stopped.
[Bug 1787] Qemu asan test make vm crash when using qxl and spice
Bug links: https://gitlab.com/qemu-project/qemu/-/issues/1787 When we tested QEMU with asan, the vm crash. How to reproduce the bug: 1、 Start the vm with qxl and spice. 2、 Attach the vm with vnc and spice. 3、 Placed for more than three days. 4、 Operation on spice client and possible reproduce this bug. I think the reason for the problem is that the cursor pointer was not set to NULL when qemu call cursor_put. But I don't know what situation will trigger this error. This error is difficult to reproduce by natural.
Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset
On 24.07.23 20:04, Stefan Hajnoczi wrote: On Fri, Jul 21, 2023 at 04:16:07PM +0200, Hanna Czenczek wrote: On 20.07.23 18:03, Stefan Hajnoczi wrote: On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote: On 19.07.23 16:11, Hanna Czenczek wrote: On 18.07.23 17:10, Stefan Hajnoczi wrote: On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: The only user of vhost_user_reset_status() is vhost_dev_stop(), which only uses it as a fall-back to stop the back-end if it does not support SUSPEND. However, vhost-user's implementation is a no-op unless the back-end supports SET_STATUS. vhost-vdpa's implementation instead just calls vhost_vdpa_reset_device(), implying that it's OK to fully reset the device if SET_STATUS is not supported. To be fair, vhost_vdpa_reset_device() does nothing but to set the status to zero. However, that may well be because vhost-vdpa has no method besides this to reset a device. In contrast, vhost-user has RESET_DEVICE and a RESET_OWNER, which can be used instead. While it is not entirely clear from documentation or git logs, from discussions and the order of vhost-user protocol features, it appears to me as if RESET_OWNER originally had no real meaning for vhost-user, and was thus used to signal a device reset to the back-end. Then, RESET_DEVICE was introduced, to have a well-defined dedicated reset command. Finally, vhost-user received full STATUS support, including SET_STATUS, so setting the device status to 0 is now the preferred way of resetting a device. Still, RESET_DEVICE and RESET_OWNER should remain valid as fall-backs. Therefore, have vhost_user_reset_status() fall back to vhost_user_reset_device() if the back-end has no STATUS support. Signed-off-by: Hanna Czenczek --- hw/virtio/vhost-user.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 4507de5a92..53a881ec2a 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct vhost_dev *dev) if (virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_STATUS)) { vhost_user_set_status(dev, 0); + } else { + vhost_user_reset_device(dev); } } Did you check whether DPDK treats setting the status to 0 as equivalent to RESET_DEVICE? If it doesn’t, what’s even the point of using reset_status? Sorry, I’m being unclear, and I think this may be important because it ties into the question from patch 1, what qemu is even trying to do by running SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that SET_STATUS(0) and RESET_DEVICE should be equivalent: vhost-vdpa.c runs SET_STATUS(0) in a function called vhost_vdpa_reset_device(). This is one thing that gave me the impression that this is about an actual full reset. Another is the whole discussion that we’ve had. vhost_dev_stop() does not call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. Still, we were always talking about resetting the device. There is some hacky stuff with struct vhost_dev's vq_index_end and multi-queue devices. I think it's because multi-queue vhost-net device consist of many vhost_devs and NetClientStates, so certain vhost operations are skipped unless this is the "first" or "last" vhost_dev from a large aggregate vhost-net device. That might be responsible for part of the weirdness. It doesn’t make sense to me that vDPA would provide no function to fully reset a device, while vhost-user does. Being able to reset a device sounds vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at least is functionally equivalent to a full device reset. Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on vhost-user. That would be a real shame, so I assumed this would not be the case; that SET_STATUS(0) does the same thing on both protocols. Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user it's currently only used by DPDK as a hint for when device initialization is complete: https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1 FWIW, now the code is a bit different. https://github.com/DPDK/dpdk/commit/671cc679a5fcd26705bb20ddc13b93e665719054 has added a RESET interpretation for the status field, i.e. when it is 0. It doesn’t do anything, but at least DPDK seems to agree that SET_STATUS(0) is a reset. That patch adds diagnostics but does not perform any action for SET_STATUS 0. DPDK's vhost_user_reset_owner() is still the only place where the device is actually reset. That’s what I said, it doesn’t do anything, but the diagnostics agree that it is a RESET. QEMU cannot switch to just SET_STATUS 0, it still needs to send RESET_DEVICE/RESET_OWNER. That is what I questioned below: We currently *do not* call RESET_DEVICE/RESET_OWNER. This patch is not about switching to SET_STATUS(0), it is abo
Re: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup
@Peter Xu @Fabiano Rosas Kindly ping on this. On 2023/6/27 9:11, chenyuhui (A) wrote: > > On 2023/6/26 21:16, chenyuhui (A) wrote: >> >> On 2023/6/21 22:22, Fabiano Rosas wrote: >>> Jianguo Zhang via writes: >>> From: Yuhui Chen There is a coredump while trying to destroy mutex when p->running is false but p->mutex is not unlock. Make sure all mutexes has been released before destroy them. Signed-off-by: Yuhui Chen --- migration/multifd.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index b7ad7002e0..7dcdb2d3a0 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -523,9 +523,7 @@ void multifd_save_cleanup(void) for (i = 0; i < migrate_multifd_channels(); i++) { MultiFDSendParams *p = &multifd_send_state->params[i]; -if (p->running) { >>> >>> The need for this flag is dubious IMO. Commit 10351fbad1 >>> ("migration/multifd: Join all multifd threads in order to avoid leaks") >>> already moved the other join outside of it. If we figure out another way >>> to deal with the sem_sync lockup we could probably remove this >>> altogether. >> >> >> I've seen this commit 10351fbad1, and it's seems to have the same >> problem in function multifd_save_cleanup. >> >> So that may my patch only need to modify multifd_save_cleanup. >> >> __ >> >> >> On 2023/6/21 21:24, Peter Xu wrote: >>> On Wed, Jun 21, 2023 at 04:18:26PM +0800, Jianguo Zhang via wrote: From: Yuhui Chen There is a coredump while trying to destroy mutex when p->running is false but p->mutex is not unlock. Make sure all mutexes has been released before destroy them. >>> >>> It'll be nice to add a backtrace of the coredump here, and also copy >>> maintainer (Juan Quintela, copied now). >>> >> >> The following is coredump, and my code is base on >> https://github.com/qemu/qemu.git tag v6.2.0. >> > (gdb) bt > #0 0xabe3b2b8 in () at /usr/lib64/libc.so.6 > #1 0xabdf6d7c in raise () at /usr/lib64/libc.so.6 > #2 0xabde4d2c in abort () at /usr/lib64/libc.so.6 > #3 0xc67fcc10 in error_exit (err=, > msg=msg@entry=0xc6dc52b8 <__func__.33> "qemu_mutex_destroy") at > ../util/qemu-thread-posix.c:38 > #4 0xc67fce38 in qemu_mutex_destroy > (mutex=mutex@entry=0xfa1a4250) at ../util/qemu-thread-posix.c:71 > #5 0xc6055688 in multifd_save_cleanup () at > ../migration/multifd.c:555 > #6 0xc6050198 in migrate_fd_cleanup (s=s@entry=0xf7518800) at > ../migration/migration.c:1808 > #7 0xc6050384 in migrate_fd_cleanup_bh (opaque=0xf7518800) at > ../migration/migration.c:1850 > #8 0xc680d790 in aio_bh_call (bh=0xa0004c40) at > ../util/async.c:141 > #9 aio_bh_poll (ctx=ctx@entry=0xf73285a0) at ../util/async.c:169 > #10 0xc67f9e18 in aio_dispatch (ctx=0xf73285a0) at > ../util/aio-posix.c:381 > #11 0xc680d414 in aio_ctx_dispatch (source=, > callback=, user_data=) at ../util/async.c:311 > #12 0xac44cf88 in g_main_context_dispatch () at > /usr/lib64/libglib-2.0.so.0 > #13 0xc6819214 in glib_pollfds_poll () at ../util/main-loop.c:232 > #14 os_host_main_loop_wait (timeout=73500) at ../util/main-loop.c:255 > #15 main_loop_wait (nonblocking=nonblocking@entry=0) at > ../util/main-loop.c:531 > #16 0xc65005cc in qemu_main_loop () at ../softmmu/runstate.c:726 > #17 0xc5fe2030 in main (argc=, argv=, > envp=) at ../softmmu/main.c:50 > (gdb) q > >> How reproducible: >> 1、And sleep time to produce p->running is false but p->mutex is >> not unlock.(apply following patch) >> 2、Do migration with --parallel-connections. From: Yuhui Chen >> Date: Mon, 26 Jun 2023 14:24:35 +0800 >> Subject: [DEBUG][PATCH] And sleep time to produce p->running is false but >> p->mutex is >> not unlock. >> >> --- >> migration/multifd.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/migration/multifd.c b/migration/multifd.c >> index 7c9deb1921..09a7b0748a 100644 >> --- a/migration/multifd.c >> +++ b/migration/multifd.c >> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void) >> for (i = 0; i < migrate_multifd_channels(); i++) { >> MultiFDSendParams *p = &multifd_send_state->params[i]; >> >> +sleep(2); >> if (p->running) { >> qemu_thread_join(&p->thread); >> } >> @@ -719,6 +720,7 @@ out: >> >> qemu_mutex_lock(&p->mutex); >> p->running = false; >> +sleep(20); >> qemu_mutex_unlock(&p->mutex); >> >> rcu_unregister_thread();
Re: [PATCH v21 01/20] s390x/cpu topology: add s390 specifics to CPU topology
On 7/24/23 12:15, Nina Schoetterl-Glausch wrote: On Fri, 2023-07-21 at 13:24 +0200, Pierre Morel wrote: On 7/18/23 18:31, Nina Schoetterl-Glausch wrote: Reviewed-by: Nina Schoetterl-Glausch Some notes below. The s390x/ prefix in the title might suggest that this patch is s390 specific, but it touches common files. Right. What do you suggest? I can cut it in two or squash it with patch number 2. The first idea was to separate the patch to ease the review but the functionality introduced in patch 1 do only make sense with patch 2. So I would be for squashing the first two patches. ? Oh, I'd just change the title. CPU topology: extend with s390 specifics or similar, it was just a nit not to create the impression that the patch only touches s390 stuff. OK thanks Pierre
Re: [PATCH v21 02/20] s390x/cpu topology: add topology entries on CPU hotplug
On 7/24/23 22:19, Nina Schoetterl-Glausch wrote: On Fri, 2023-06-30 at 11:17 +0200, Pierre Morel wrote: The topology information are attributes of the CPU and are specified during the CPU device creation. On hot plug we: - calculate the default values for the topology for drawers, books and sockets in the case they are not specified. - verify the CPU attributes - check that we have still room on the desired socket The possibility to insert a CPU in a mask is dependent on the number of cores allowed in a socket, a book or a drawer, the checking is done during the hot plug of the CPU to have an immediate answer. If the complete topology is not specified, the core is added in the physical topology based on its core ID and it gets defaults values for the modifier attributes. This way, starting QEMU without specifying the topology can still get some advantage of the CPU topology. Signed-off-by: Pierre Morel Reviewed-by: Nina Schoetterl-Glausch if you address Thomas' comments. Thanks, regards Pierre
[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
Replaced a call to malloc() and its respective call to free() with g_malloc() and g_free(). Signed-off-by: dinglimin V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL --- semihosting/uaccess.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c index 8018828069..2ac754cdb6 100644 --- a/semihosting/uaccess.c +++ b/semihosting/uaccess.c @@ -14,10 +14,10 @@ void *softmmu_lock_user(CPUArchState *env, target_ulong addr, target_ulong len, bool copy) { -void *p = malloc(len); +void *p = g_malloc(len); if (p && copy) { if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { -free(p); +g_free(p); p = NULL; } } -- 2.30.0.windows.2
Re: [PATCH QEMU v10 3/9] qapi/migration: Introduce vcpu-dirty-limit parameters
~hyman writes: > From: Hyman Huang(黄勇) > > Introduce "vcpu-dirty-limit" migration parameter used > to limit dirty page rate during live migration. > > "vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are > two dirty-limit-related migration parameters, which can > be set before and during live migration by qmp > migrate-set-parameters. > > This two parameters are used to help implement the dirty > page rate limit algo of migration. > > Signed-off-by: Hyman Huang(黄勇) > Acked-by: Peter Xu > Reviewed-by: Juan Quintela Reviewed-by: Markus Armbruster
Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
25.07.2023 12:00, dinglimin wrote: Replaced a call to malloc() and its respective call to free() with g_malloc() and g_free(). Signed-off-by: dinglimin V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL --- semihosting/uaccess.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c index 8018828069..2ac754cdb6 100644 --- a/semihosting/uaccess.c +++ b/semihosting/uaccess.c @@ -14,10 +14,10 @@ void *softmmu_lock_user(CPUArchState *env, target_ulong addr, target_ulong len, bool copy) { -void *p = malloc(len); +void *p = g_malloc(len); if (p && copy) { if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { -free(p); +g_free(p); p = NULL; } } Ok, that was the obvious part. Now a next one, also obvious. You changed lock_user to use g_malloc(), but unlock_user still uses free() instead of g_free(). At the very least the other one needs to be changed too. And I'd say the callers should be analyzed to ensure they don't free() the result (they should not, think it is a bug if they do). lock_user/unlock_user (which #defines to softmmu_lock_user/ softmmu_unlock_user in softmmu mode) is used a *lot*. /mjt
Re: [PATCH] migration: Allow user to specify migration available bandwidth
On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote: > On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote: > > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote: > > > Migration bandwidth is a very important value to live migration. It's > > > because it's one of the major factors that we'll make decision on when to > > > switchover to destination in a precopy process. > > > > To elaborate on this for those reading along... > > > > QEMU takes maxmimum downtime limit and multiplies by its estimate > > of bandwidth. This gives a figure for the amount of data QEMU thinks > > it can transfer within the downtime period. > > > > QEMU compares this figure to the amount of data that is still pending > > at the end of an iteration. > > > > > This value is currently estimated by QEMU during the whole live migration > > > process by monitoring how fast we were sending the data. This can be the > > > most accurate bandwidth if in the ideal world, where we're always feeding > > > unlimited data to the migration channel, and then it'll be limited to the > > > bandwidth that is available. > > > > The QEMU estimate for available bandwidth will definitely be wrong, > > potentially by orders of magnitude, if QEMU has a max bandwidth limit > > set, as in that case it is never trying to push the peak rates available > > from the NICs/network fabric. > > > > > The issue is QEMU itself may not be able to avoid those uncertainties on > > > measuing the real "available migration bandwidth". At least not something > > > I can think of so far. > > > > IIUC, you can query the NIC properties to find the hardware transfer > > rate of the NICs. That doesn't imply apps can actually reach that > > rate in practice - it has a decent chance of being a over-estimate > > of bandwidth, possibly very very much over. > > > > Is such an over estimate better or worse than QEMU's current > > under-estimate ? It depends on the POV. > > > > From the POV of QEMU, over-estimating means means it'll be not > > be throttling as much as it should. That's not a downside of > > migration - it makes it more likely for migration to complete :-) > > Heh. :) > > > > > From the POV of non-QEMU apps though, if QEMU over-estimates, > > it'll mean other apps get starved of network bandwidth. > > > > Overall I agree, there's no obvious way QEMU can ever come up > > with a reliable estimate for bandwidth available. > > > > > One way to fix this is when the user is fully aware of the available > > > bandwidth, then we can allow the user to help providing an accurate value. > > > > > > For example, if the user has a dedicated channel of 10Gbps for migration > > > for this specific VM, the user can specify this bandwidth so QEMU can > > > always do the calculation based on this fact, trusting the user as long as > > > specified. > > > > I can see that in theory, but when considering a non-trivial > > deployments of QEMU, I wonder if the user can really have any > > such certainty of what is truely avaialble. It would need > > global awareness of the whole network of hosts & workloads. > > Indeed it may or may not be easy always. > > The good thing about this parameter is we always use the old estimation if > the user can't specify anything valid, so this is always optional not > required. > > It solves the cases where the user can still specify accurately on the bw - > our QE team has already verified that it worked for us on GPU tests, where > it used to not be able to migrate at all with any sane downtime specified. > I should have attached a Tested-By from Zhiyi but since this is not exactly > the patch he was using I didn't. > > > > > > When the user wants to have migration only use 5Gbps out of that 10Gbps, > > > one can set max-bandwidth to 5Gbps, along with available-bandwidth to > > > 5Gbps > > > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for > > > other things). So it can be useful even if the network is not dedicated, > > > but as long as the user can know a solid value. > > > > > > A new parameter "available-bandwidth" is introduced just for this. So when > > > the user specified this parameter, instead of trusting the estimated value > > > from QEMU itself (based on the QEMUFile send speed), let's trust the user > > > more. > > > > I feel like rather than "available-bandwidth", we should call > > it "max-convergance-bandwidth". > > > > To me that name would better reflect the fact that this isn't > > really required to be a measure of how much NIC bandwidth is > > available. It is merely an expression of a different bandwidth > > limit to apply during switch over. > > > > IOW > > > > * max-bandwidth: limit during pre-copy main transfer > > * max-convergance-bandwidth: limit during pre-copy switch-over > > * max-postcopy-banwidth: limit during post-copy phase > > I worry the new name suggested is not straightforward enough at the 1st > glance, even to me as a developer. >
Re: [PATCH 0/6] Add nRF51 DETECT signal with test
On Tue, 25 Jul 2023 at 04:25, Chris Laplante wrote: > > Hi Peter, > > > Thanks for this patchset and especially for the work > > improving the qtest infrastructure. I've given my > > comments on the different patches, and in some cases > > reviewed-by tags. (Where I've given one of those, you should > > add it to your commit message for the relevant patch under > > your Signed-off-by: line, so that when you send the version > > 2 of the patchset we know that those parts are already > > reviewed and don't need re-examining. If I said "make > > some change; otherwise Reviewed-by" that means "make > > that minor change, and then you can add the tag, etc".) > > Thanks very much for the feedback and help! > > > Do you have the parts of this feature that use the DETECT > > signal in the POWER device, or have you not written those > > yet ? If you have them, you could send those too in v2. > > That part is halfway done, so I will work on finishing it before submitting > v2. Two questions regarding that (to potentially save us a v3): > > 1. The nRF51 POWER device overlaps with the memory maps of the CLOCK and MPU > devices. So I have created a CPM (CLOCK, POWER, MPU) device in hw/misc. Does > that sound reasonable naming-wise? Yes, I think from QEMU's point of view the massive register overlap makes them a single device. The name sounds OK (give it the same kind of nrf51 prefix the rng device has). > 2. I also have some implementations for pieces of CLOCK, namely the > HFCLKSTART/HFCLKSTOP events and HFCLKSTARTED event. Should I include that in > this patch series, or would you prefer it in a separate series? It is > unrelated to DETECT and POWER. If you think they're ready to go in, and it doesn't make the series more than about 12-15 patches long, you can put them on the end of the series. If the patchset is starting to get a bit big then it might be easier to get the POWER/DETECT parts reviewed first. thanks -- PMM
Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
On Tue, 25 Jul 2023 at 10:13, Michael Tokarev wrote: > > 25.07.2023 12:00, dinglimin wrote: > > Replaced a call to malloc() and its respective call to free() with > > g_malloc() and g_free(). > > > > Signed-off-by: dinglimin > > > > V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL > > --- > > semihosting/uaccess.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c > > index 8018828069..2ac754cdb6 100644 > > --- a/semihosting/uaccess.c > > +++ b/semihosting/uaccess.c > > @@ -14,10 +14,10 @@ > > void *softmmu_lock_user(CPUArchState *env, target_ulong addr, > > target_ulong len, bool copy) > > { > > -void *p = malloc(len); > > +void *p = g_malloc(len); > > if (p && copy) { > > if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { > > -free(p); > > +g_free(p); > > p = NULL; > > } > > } > > Ok, that was the obvious part. Now a next one, also obvious. > > You changed lock_user to use g_malloc(), but unlock_user > still uses free() instead of g_free(). At the very least > the other one needs to be changed too. And I'd say the callers > should be analyzed to ensure they don't free() the result > (they should not, think it is a bug if they do). We can be pretty sure the callers don't free() the returned value, because the calling code is also used in user-mode, where the lock/unlock implementation is entirely different and calling free() on the pointer will not work. > lock_user/unlock_user (which #defines to softmmu_lock_user/ > softmmu_unlock_user in softmmu mode) is used a *lot*. The third part here, is that g_malloc() does not ever fail -- it will abort() on out of memory. However the code here is still handling g_malloc() returning NULL. The equivalent for "we expect this might fail" (which we want here, because the guest is passing us the length of memory to try to allocate) is g_try_malloc(). thanks -- PMM
intel-iommu: Report interrupt remapping faults, fix return value
From: David Woodhouse A generic X86IOMMUClass->int_remap function should not return VT-d specific values; fix it to return 0 if the interrupt was successfully translated or -EINVAL if not. The VTD_FR_IR_xxx values are supposed to be used to actually raise faults through the fault reporting mechanism, so do that instead for the case where the IRQ is actually being injected. There is more work to be done here, as pretranslations for the KVM IRQ routing table can't fault; an untranslatable IRQ should be handled in userspace and the fault raised only when the IRQ actually happens (if indeed the IRTE is still not valid at that time). But we can work on that later; we can at least raise faults for the direct case. Signed-off-by: David Woodhouse --- hw/i386/intel_iommu.c | 148 ++--- hw/i386/intel_iommu_internal.h | 1 + 2 files changed, 102 insertions(+), 47 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index dcc334060c..a65a94a4ce 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -469,21 +469,12 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState *s, uint16_t index) /* Must not update F field now, should be done later */ static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index, -uint16_t source_id, hwaddr addr, -VTDFaultReason fault, bool is_write, -bool is_pasid, uint32_t pasid) +uint64_t hi, uint64_t lo) { -uint64_t hi = 0, lo; hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4); assert(index < DMAR_FRCD_REG_NR); -lo = VTD_FRCD_FI(addr); -hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) | - VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid); -if (!is_write) { -hi |= VTD_FRCD_T; -} vtd_set_quad_raw(s, frcd_reg_addr, lo); vtd_set_quad_raw(s, frcd_reg_addr + 8, hi); @@ -509,17 +500,11 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, uint16_t source_id) } /* Log and report an DMAR (address translation) fault to software */ -static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, - hwaddr addr, VTDFaultReason fault, - bool is_write, bool is_pasid, - uint32_t pasid) +static void vtd_report_frcd_fault(IntelIOMMUState *s, uint64_t source_id, + uint64_t hi, uint64_t lo) { uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG); -assert(fault < VTD_FR_MAX); - -trace_vtd_dmar_fault(source_id, fault, addr, is_write); - if (fsts_reg & VTD_FSTS_PFO) { error_report_once("New fault is not recorded due to " "Primary Fault Overflow"); @@ -539,8 +524,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, return; } -vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, -is_write, is_pasid, pasid); +vtd_record_frcd(s, s->next_frcd_reg, hi, lo); if (fsts_reg & VTD_FSTS_PPF) { error_report_once("There are pending faults already, " @@ -565,6 +549,40 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, } } +/* Log and report an DMAR (address translation) fault to software */ +static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, + hwaddr addr, VTDFaultReason fault, + bool is_write, bool is_pasid, + uint32_t pasid) +{ +uint64_t hi, lo; + +assert(fault < VTD_FR_MAX); + +trace_vtd_dmar_fault(source_id, fault, addr, is_write); + +lo = VTD_FRCD_FI(addr); +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) | + VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid); +if (!is_write) { +hi |= VTD_FRCD_T; +} + +vtd_report_frcd_fault(s, source_id, hi, lo); +} + + +static void vtd_report_ir_fault(IntelIOMMUState *s, uint64_t source_id, +VTDFaultReason fault, uint16_t index) +{ +uint64_t hi, lo; + +lo = VTD_FRCD_IR_IDX(index); +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault); + +vtd_report_frcd_fault(s, source_id, hi, lo); +} + /* Handle Invalidation Queue Errors of queued invalidation interface error * conditions. */ @@ -3300,8 +3318,9 @@ static Property vtd_properties[] = { }; /* Read IRTE entry with specific index */ -static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index, -VTD_IR_TableEntry *entry, uint16_t sid) +static bool vtd_irte_get(IntelIOMMUState *iommu, uint16_t index, + VTD_IR_TableEntry *entry, uint16_t sid, + bool do_fault) { static const uint16_t vtd_svt_mask[VTD_SQ_MAX] = \ {0x, 0xfffb, 0xfff9, 0x
Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek wrote: > > On 24.07.23 17:48, Eugenio Perez Martin wrote: > > On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek wrote: > >> On 21.07.23 17:25, Eugenio Perez Martin wrote: > >>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek wrote: > Move the `suspended` field from vhost_vdpa into the global vhost_dev > struct, so vhost_dev_stop() can check whether the back-end has been > suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > there is no need to reset it; the reset is just a fall-back to stop > device operations for back-ends that do not support suspend. > > Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > when the device is re-started, we still have to do the reset to have it > un-suspend. > > Signed-off-by: Hanna Czenczek > --- > include/hw/virtio/vhost-vdpa.h | 2 -- > include/hw/virtio/vhost.h | 8 > hw/virtio/vhost-vdpa.c | 11 +++ > hw/virtio/vhost.c | 8 +++- > 4 files changed, 22 insertions(+), 7 deletions(-) > > diff --git a/include/hw/virtio/vhost-vdpa.h > b/include/hw/virtio/vhost-vdpa.h > index e64bfc7f98..72c3686b7f 100644 > --- a/include/hw/virtio/vhost-vdpa.h > +++ b/include/hw/virtio/vhost-vdpa.h > @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > bool shadow_vqs_enabled; > /* Vdpa must send shadow addresses as IOTLB key for data queues, > not GPA */ > bool shadow_data; > -/* Device suspended successfully */ > -bool suspended; > /* IOVA mapping used by the Shadow Virtqueue */ > VhostIOVATree *iova_tree; > GPtrArray *shadow_vqs; > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > index 6a173cb9fa..69bf59d630 100644 > --- a/include/hw/virtio/vhost.h > +++ b/include/hw/virtio/vhost.h > @@ -120,6 +120,14 @@ struct vhost_dev { > uint64_t backend_cap; > /* @started: is the vhost device started? */ > bool started; > +/** > + * @suspended: Whether the vhost device is currently suspended. Set > + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > + * are supposed to automatically suspend/resume in their > + * vhost_dev_start handlers as required. Must also be cleared when > + * the device is reset. > + */ > +bool suspended; > bool log_enabled; > uint64_t log_size; > Error *migration_blocker; > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index 7b7dee468e..f7fd19a203 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct > vhost_dev *dev, > > static int vhost_vdpa_reset_device(struct vhost_dev *dev) > { > -struct vhost_vdpa *v = dev->opaque; > int ret; > uint8_t status = 0; > > ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > trace_vhost_vdpa_reset_device(dev); > -v->suspended = false; > +dev->suspended = false; > return ret; > } > > @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev > *dev) > if (unlikely(r)) { > error_report("Cannot suspend: %s(%d)", g_strerror(errno), > errno); > } else { > -v->suspended = true; > +dev->suspended = true; > return; > } > } > @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev > *dev, bool started) > return -1; > } > vhost_vdpa_set_vring_ready(dev); > +if (dev->suspended) { > +/* TODO: When RESUME is available, use it instead of > resetting */ > +vhost_vdpa_reset_status(dev); > >>> How is that we reset the status at each vhost_vdpa_dev_start? That > >>> will clean all the vqs configured, features negotiated, etc. in the > >>> vDPA device. Or am I missing something? > >> What alternative do you propose? We don’t have RESUME for vDPA in qemu, > >> but we somehow need to lift the previous SUSPEND so the device will > >> again respond to guest requests, do we not? > >> > > Reset also clears the suspend state in vDPA, and it should be called > > at vhost_dev_stop. So the device should never be in suspended state > > here. Does that solve your concerns? > > My intention with this patch was precisely not to reset in > vhost_dev_stop when suspending is supported. So now I’m more confused > than before. > At this moment, I think that should be foc
i386/xen: prevent guest from binding loopback event channel to itself
From: David Woodhouse Fuzzing showed that a guest could bind an interdomain port to itself, by guessing the next port to be allocated and putting that as the 'remote' port number. By chance, that works because the newly-allocated port has type EVTCHNSTAT_unbound. It shouldn't. Signed-off-by: David Woodhouse --- hw/i386/kvm/xen_evtchn.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c index 0e9c108614..70b4b8a6ef 100644 --- a/hw/i386/kvm/xen_evtchn.c +++ b/hw/i386/kvm/xen_evtchn.c @@ -1408,8 +1408,15 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain) XenEvtchnPort *rp = &s->port_table[interdomain->remote_port]; XenEvtchnPort *lp = &s->port_table[interdomain->local_port]; -if (rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) { -/* It's a match! */ +/* + * The 'remote' port for loopback must be an unbound port allocated for + * communication with the local domain (as indicated by rp->type_val + * being zero, not PORT_INFO_TYPEVAL_REMOTE_QEMU), and must *not* be + * the port that was just allocated for the local end. + */ +if (interdomain->local_port != interdomain->remote_port && +rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) { + rp->type = EVTCHNSTAT_interdomain; rp->type_val = interdomain->local_port; -- 2.34.1 smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] vhost-user-scsi: support reconnect to backend
Thanks for your comments. > 2023年7月25日 上午1:21,Raphael Norwitz 写道: > > Very excited to see this. High level looks good modulo a few small things. > > My major concern is around existing vhost-user-scsi backends which don’t > support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the > reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We > may want to do the same for vhost-user-blk. > > The question is then what happens if the check is false. IIUC without an > inflight FD, if a device processes requests out of order, it’s not safe to > continue execution on reconnect, as there’s no way for the backend to know > how to replay IO. Should we permanently wedge the device or have QEMU fail > out? May be nice to have a toggle for this. Based on what MST said, is there anything else I need to do? > >> On Jul 21, 2023, at 6:51 AM, Li Feng wrote: >> >> If the backend crashes and restarts, the device is broken. >> This patch adds reconnect for vhost-user-scsi. >> >> Tested with spdk backend. >> >> Signed-off-by: Li Feng >> --- >> hw/block/vhost-user-blk.c | 2 - >> hw/scsi/vhost-scsi-common.c | 27 ++--- >> hw/scsi/vhost-user-scsi.c | 163 +--- >> include/hw/virtio/vhost-user-scsi.h | 3 + >> include/hw/virtio/vhost.h | 2 + >> 5 files changed, 165 insertions(+), 32 deletions(-) >> >> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c >> index eecf3f7a81..f250c740b5 100644 >> --- a/hw/block/vhost-user-blk.c >> +++ b/hw/block/vhost-user-blk.c >> @@ -32,8 +32,6 @@ >> #include "sysemu/sysemu.h" >> #include "sysemu/runstate.h" >> >> -#define REALIZE_CONNECTION_RETRIES 3 >> - >> static const int user_feature_bits[] = { >>VIRTIO_BLK_F_SIZE_MAX, >>VIRTIO_BLK_F_SEG_MAX, >> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c > > Why can’t all the vhost-scsi-common stuff be moved to a separate change? I will move this code to separate patch. > > Especially the stuff introduced for vhost-user-blk in > 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out. OK. > >> index a06f01af26..08801886b8 100644 >> --- a/hw/scsi/vhost-scsi-common.c >> +++ b/hw/scsi/vhost-scsi-common.c >> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >> >>vsc->dev.acked_features = vdev->guest_features; >> >> -assert(vsc->inflight == NULL); >> -vsc->inflight = g_new0(struct vhost_inflight, 1); >> -ret = vhost_dev_get_inflight(&vsc->dev, >> - vs->conf.virtqueue_size, >> - vsc->inflight); >> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev); >>if (ret < 0) { >> -error_report("Error get inflight: %d", -ret); >> +error_report("Error setting inflight format: %d", -ret); >>goto err_guest_notifiers; >>} >> >> +if (!vsc->inflight->addr) { >> +ret = vhost_dev_get_inflight(&vsc->dev, >> +vs->conf.virtqueue_size, >> +vsc->inflight); >> +if (ret < 0) { >> +error_report("Error get inflight: %d", -ret); >> +goto err_guest_notifiers; >> +} >> +} >> + >>ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight); >>if (ret < 0) { >>error_report("Error set inflight: %d", -ret); >> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >>return ret; >> >> err_guest_notifiers: >> -g_free(vsc->inflight); >> -vsc->inflight = NULL; >> - >>k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false); >> err_host_notifiers: >>vhost_dev_disable_notifiers(&vsc->dev, vdev); >> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) >>} >>assert(ret >= 0); >> > > In the vhost-scsi (kernel backend) path, what will cleanup vsc->inflight now? OK, we should check the vsc->inflight if it is null, the vhost-scsi doesn’t allocate the inflight object memory. > >> -if (vsc->inflight) { >> -vhost_dev_free_inflight(vsc->inflight); >> -g_free(vsc->inflight); >> -vsc->inflight = NULL; >> -} >> - >>vhost_dev_disable_notifiers(&vsc->dev, vdev); >> } >> >> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c >> index ee99b19e7a..e0e88b0c42 100644 >> --- a/hw/scsi/vhost-user-scsi.c >> +++ b/hw/scsi/vhost-user-scsi.c >> @@ -89,14 +89,126 @@ static void vhost_dummy_handle_output(VirtIODevice >> *vdev, VirtQueue *vq) >> { >> } >> >> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp) >> +{ >> +VirtIODevice *vdev = VIRTIO_DEVICE(dev); >> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >> +int ret = 0; >> + >> +if (s->connected) { >> +return 0; >> +} >> +s->connected = true; >> + >> +vsc->dev.num_q
Re: [PATCH] kvm: Remove KVM_CREATE_IRQCHIP support assumption
On Mon, Jul 24, 2023 at 11:53:39AM +0200, Thomas Huth wrote: > On 22/07/2023 08.21, Andrew Jones wrote: > > Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA > > irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the > > RISC-V platform has AIA. The cap indicates KVM supports at least one > > of the following ioctls: > > > >KVM_CREATE_IRQCHIP > >KVM_IRQ_LINE > >KVM_GET_IRQCHIP > >KVM_SET_IRQCHIP > >KVM_GET_LAPIC > >KVM_SET_LAPIC > > > > but the cap doesn't imply that KVM must support any of those ioctls > > in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP > > ioctl was supported. Stop making that assumption by introducing a > > KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP > > sets. Adding parameters isn't awesome, but given how the > > KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of > > options. > > > > Signed-off-by: Andrew Jones > > --- > > > > While this fixes booting guests on riscv KVM with AIA it's unlikely > > to get merged before the QEMU support for KVM AIA[1] lands, which > > would also fix the issue. I think this patch is still worth considering > > though since QEMU's assumption is wrong. > > > > [1] > > https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/ > > > > > > accel/kvm/kvm-all.c| 5 - > > include/sysemu/kvm.h | 1 + > > target/arm/kvm.c | 3 +++ > > target/i386/kvm/kvm.c | 2 ++ > > target/s390x/kvm/kvm.c | 3 +++ > > 5 files changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index 373d876c0580..0f5ff8630502 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -86,6 +86,7 @@ struct KVMParkedVcpu { > > }; > > KVMState *kvm_state; > > +bool kvm_has_create_irqchip; > > bool kvm_kernel_irqchip; > > bool kvm_split_irqchip; > > bool kvm_async_interrupts_allowed; > > @@ -2377,8 +2378,10 @@ static void kvm_irqchip_create(KVMState *s) > > if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) { > > error_report("Split IRQ chip mode not supported."); > > exit(1); > > -} else { > > +} else if (kvm_has_create_irqchip) { > > ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP); > > +} else { > > +return; > > } > > } > > if (ret < 0) { > > I think I'd do this differntly... at the beginning of the function, there is > a check for kvm_check_extension(s, KVM_CAP_IRQCHIP) etc. ... I think you > could now replace that check with a simple > > if (!kvm_has_create_irqchip) { > return; > } > > The "kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0)" of course has to be > moved to the target/s390x/kvm/kvm.c file, too. > Sounds good. I'll do that for v2. Thanks, drew
[PULL 2/5] scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour
The POSIX definition of the 'read' utility requires that you specify the variable name to set; omitting the name and having it default to 'REPLY' is a bashism. If your system sh is dash, then it will print an error message during build: qemu/pc-bios/s390-ccw/../../scripts/git-submodule.sh: 106: read: arg count Specify the variable name explicitly. Fixes: fdb8fd8cb915647b ("git-submodule: allow partial update of .git-submodule-status") Signed-off-by: Peter Maydell Reviewed-by: Daniel P. Berrangé Reviewed-by: Philippe Mathieu-Daudé Message-id: 20230720153038.1587196-1-peter.mayd...@linaro.org --- scripts/git-submodule.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/git-submodule.sh b/scripts/git-submodule.sh index 335f7f5fdf8..bb1222c7727 100755 --- a/scripts/git-submodule.sh +++ b/scripts/git-submodule.sh @@ -103,7 +103,7 @@ update) check_updated $module || echo Updated "$module" done -(while read -r; do +(while read -r REPLY; do for module in $modules; do case $REPLY in *" $module "*) continue 2 ;; -- 2.34.1
[PULL 1/5] hw/arm/smmu: Handle big-endian hosts correctly
The implementation of the SMMUv3 has multiple places where it reads a data structure from the guest and directly operates on it without doing a guest-to-host endianness conversion. Since all SMMU data structures are little-endian, this means that the SMMU doesn't work on a big-endian host. In particular, this causes the Avocado test machine_aarch64_virt.py:Aarch64VirtMachine.test_alpine_virt_tcg_gic_max to fail on an s390x host. Add appropriate byte-swapping on reads and writes of guest in-memory data structures so that the device works correctly on big-endian hosts. As part of this we constrain queue_read() to operate only on Cmd structs and queue_write() on Evt structs, because in practice these are the only data structures the two functions are used with, and we need to know what the data structure is to be able to byte-swap its parts correctly. Signed-off-by: Peter Maydell Tested-by: Thomas Huth Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Eric Auger Message-id: 20230717132641.764660-1-peter.mayd...@linaro.org Cc: qemu-sta...@nongnu.org --- hw/arm/smmu-common.c | 3 +-- hw/arm/smmuv3.c | 39 +++ 2 files changed, 32 insertions(+), 10 deletions(-) diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c index 5ab9d45d58a..f35ae9aa22c 100644 --- a/hw/arm/smmu-common.c +++ b/hw/arm/smmu-common.c @@ -216,8 +216,7 @@ static int get_pte(dma_addr_t baseaddr, uint32_t index, uint64_t *pte, dma_addr_t addr = baseaddr + index * sizeof(*pte); /* TODO: guarantee 64-bit single-copy atomicity */ -ret = dma_memory_read(&address_space_memory, addr, pte, sizeof(*pte), - MEMTXATTRS_UNSPECIFIED); +ret = ldq_le_dma(&address_space_memory, addr, pte, MEMTXATTRS_UNSPECIFIED); if (ret != MEMTX_OK) { info->type = SMMU_PTW_ERR_WALK_EABT; diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c index 932f0096974..1e9be8e89af 100644 --- a/hw/arm/smmuv3.c +++ b/hw/arm/smmuv3.c @@ -102,20 +102,34 @@ static void smmuv3_write_gerrorn(SMMUv3State *s, uint32_t new_gerrorn) trace_smmuv3_write_gerrorn(toggled & pending, s->gerrorn); } -static inline MemTxResult queue_read(SMMUQueue *q, void *data) +static inline MemTxResult queue_read(SMMUQueue *q, Cmd *cmd) { dma_addr_t addr = Q_CONS_ENTRY(q); +MemTxResult ret; +int i; -return dma_memory_read(&address_space_memory, addr, data, q->entry_size, - MEMTXATTRS_UNSPECIFIED); +ret = dma_memory_read(&address_space_memory, addr, cmd, sizeof(Cmd), + MEMTXATTRS_UNSPECIFIED); +if (ret != MEMTX_OK) { +return ret; +} +for (i = 0; i < ARRAY_SIZE(cmd->word); i++) { +le32_to_cpus(&cmd->word[i]); +} +return ret; } -static MemTxResult queue_write(SMMUQueue *q, void *data) +static MemTxResult queue_write(SMMUQueue *q, Evt *evt_in) { dma_addr_t addr = Q_PROD_ENTRY(q); MemTxResult ret; +Evt evt = *evt_in; +int i; -ret = dma_memory_write(&address_space_memory, addr, data, q->entry_size, +for (i = 0; i < ARRAY_SIZE(evt.word); i++) { +cpu_to_le32s(&evt.word[i]); +} +ret = dma_memory_write(&address_space_memory, addr, &evt, sizeof(Evt), MEMTXATTRS_UNSPECIFIED); if (ret != MEMTX_OK) { return ret; @@ -298,7 +312,7 @@ static void smmuv3_init_regs(SMMUv3State *s) static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, STE *buf, SMMUEventInfo *event) { -int ret; +int ret, i; trace_smmuv3_get_ste(addr); /* TODO: guarantee 64-bit single-copy atomicity */ @@ -311,6 +325,9 @@ static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, STE *buf, event->u.f_ste_fetch.addr = addr; return -EINVAL; } +for (i = 0; i < ARRAY_SIZE(buf->word); i++) { +le32_to_cpus(&buf->word[i]); +} return 0; } @@ -320,7 +337,7 @@ static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t ssid, CD *buf, SMMUEventInfo *event) { dma_addr_t addr = STE_CTXPTR(ste); -int ret; +int ret, i; trace_smmuv3_get_cd(addr); /* TODO: guarantee 64-bit single-copy atomicity */ @@ -333,6 +350,9 @@ static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t ssid, event->u.f_ste_fetch.addr = addr; return -EINVAL; } +for (i = 0; i < ARRAY_SIZE(buf->word); i++) { +le32_to_cpus(&buf->word[i]); +} return 0; } @@ -569,7 +589,7 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, return -EINVAL; } if (s->features & SMMU_FEATURE_2LVL_STE) { -int l1_ste_offset, l2_ste_offset, max_l2_ste, span; +int l1_ste_offset, l2_ste_offset, max_l2_ste, span, i; dma_addr_t l1ptr, l2ptr; STEDesc l1std; @@ -593,6 +613,9 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE *ste, event->u.f_ste_fetch.
[PULL 3/5] target/arm: Special case M-profile in debug_helper.c code
A lot of the code called from helper_exception_bkpt_insn() is written assuming A-profile, but we will also call this helper on M-profile CPUs when they execute a BKPT insn. This used to work by accident, but recent changes mean that we will hit an assert when some of this code calls down into lower level functions that end up calling arm_security_space_below_el3(), arm_el_is_aa64(), and other functions that now explicitly assert that the guest CPU is not M-profile. Handle M-profile directly to avoid the assertions: * in arm_debug_target_el(), M-profile debug exceptions always go to EL1 * in arm_debug_exception_fsr(), M-profile always uses the short format FSR (compare commit d7fe699be54b2, though in this case the code in arm_v7m_cpu_do_interrupt() does not need to look at the FSR value at all) Cc: qemu-sta...@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1775 Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson Message-id: 20230721143239.1753066-1-peter.mayd...@linaro.org --- target/arm/debug_helper.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c index 8362462a07e..abe72e35ae6 100644 --- a/target/arm/debug_helper.c +++ b/target/arm/debug_helper.c @@ -21,6 +21,10 @@ static int arm_debug_target_el(CPUARMState *env) bool secure = arm_is_secure(env); bool route_to_el2 = false; +if (arm_feature(env, ARM_FEATURE_M)) { +return 1; +} + if (arm_is_el2_enabled(env)) { route_to_el2 = env->cp15.hcr_el2 & HCR_TGE || env->cp15.mdcr_el2 & MDCR_TDE; @@ -434,18 +438,20 @@ static uint32_t arm_debug_exception_fsr(CPUARMState *env) { ARMMMUFaultInfo fi = { .type = ARMFault_Debug }; int target_el = arm_debug_target_el(env); -bool using_lpae = false; +bool using_lpae; -if (target_el == 2 || arm_el_is_aa64(env, target_el)) { +if (arm_feature(env, ARM_FEATURE_M)) { +using_lpae = false; +} else if (target_el == 2 || arm_el_is_aa64(env, target_el)) { using_lpae = true; } else if (arm_feature(env, ARM_FEATURE_PMSA) && arm_feature(env, ARM_FEATURE_V8)) { using_lpae = true; +} else if (arm_feature(env, ARM_FEATURE_LPAE) && + (env->cp15.tcr_el[target_el] & TTBCR_EAE)) { +using_lpae = true; } else { -if (arm_feature(env, ARM_FEATURE_LPAE) && -(env->cp15.tcr_el[target_el] & TTBCR_EAE)) { -using_lpae = true; -} +using_lpae = false; } if (using_lpae) { -- 2.34.1
[PULL 0/5] target-arm queue
target-arm queue: just bugfixes, mostly mine. thanks -- PMM The following changes since commit 885fc169f09f5915ce037263d20a59eb226d473d: Merge tag 'pull-riscv-to-apply-20230723-3' of https://github.com/alistair23/qemu into staging (2023-07-24 11:34:35 +0100) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230725 for you to fetch changes up to 78cc90346ec680a7f1bb9f138bf7c9654cf526d5: tests/decode: Suppress "error: " string for expected-failure tests (2023-07-25 10:56:52 +0100) target-arm queue: * tests/decode: Suppress "error: " string for expected-failure tests * ui/curses: For curses display, recognize a few more control keys * target/arm: Special case M-profile in debug_helper.c code * scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour * hw/arm/smmu: Handle big-endian hosts correctly Peter Maydell (4): hw/arm/smmu: Handle big-endian hosts correctly scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour target/arm: Special case M-profile in debug_helper.c code tests/decode: Suppress "error: " string for expected-failure tests Sean Estabrooks (1): For curses display, recognize a few more control keys ui/curses_keys.h | 6 ++ hw/arm/smmu-common.c | 3 +-- hw/arm/smmuv3.c | 39 +++ target/arm/debug_helper.c | 18 -- scripts/decodetree.py | 6 +- scripts/git-submodule.sh | 2 +- 6 files changed, 56 insertions(+), 18 deletions(-)
[PULL 4/5] For curses display, recognize a few more control keys
From: Sean Estabrooks The curses display handles most control-X keys, and translates them into their corresponding keycode. Here we recognize a few that are missing, Ctrl-@ (null), Ctrl-\ (backslash), Ctrl-] (right bracket), Ctrl-^ (caret), Ctrl-_ (underscore). Signed-off-by: Sean Estabrooks Message-id: cahyvn3bh9crgduomf7g7ngwamu8d4cvozacb2i4ymnnggbx...@mail.gmail.com Reviewed-by: Peter Maydell Signed-off-by: Peter Maydell --- ui/curses_keys.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/ui/curses_keys.h b/ui/curses_keys.h index 71e04acdc75..88a2208ed18 100644 --- a/ui/curses_keys.h +++ b/ui/curses_keys.h @@ -210,6 +210,12 @@ static const int _curses2keycode[CURSES_CHARS] = { ['N' - '@'] = 49 | CNTRL, /* Control + n */ /* Control + m collides with the keycode for Enter */ +['@' - '@'] = 3 | CNTRL, /* Control + @ */ +/* Control + [ collides with the keycode for Escape */ +['\\' - '@'] = 43 | CNTRL, /* Control + Backslash */ +[']' - '@'] = 27 | CNTRL, /* Control + ] */ +['^' - '@'] = 7 | CNTRL, /* Control + ^ */ +['_' - '@'] = 12 | CNTRL, /* Control + Underscore */ }; static const int _curseskey2keycode[CURSES_KEYS] = { -- 2.34.1
[PULL 5/5] tests/decode: Suppress "error: " string for expected-failure tests
The "expected failure" tests for decodetree result in the error messages from decodetree ending up in logs and in V=1 output: >>> MALLOC_PERTURB_=226 >>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/pyvenv/bin/python3 >>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/scripts/decodetree.py >>> --output-null --test-for-error >>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode ――― ✀ /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode:5: error: duplicate argument "a" ――― 1/44 qemu:decodetree / err_argset1OK 0.05s This then produces false positives when scanning the logfiles for strings like "error: ". For the expected-failure tests, make decodetree print "detected:" instead of "error:". Signed-off-by: Peter Maydell Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Richard Henderson Message-id: 20230720131521.1325905-1-peter.mayd...@linaro.org --- scripts/decodetree.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/scripts/decodetree.py b/scripts/decodetree.py index a8a6cb69cda..e8b72da3a97 100644 --- a/scripts/decodetree.py +++ b/scripts/decodetree.py @@ -134,6 +134,10 @@ def error_with_file(file, lineno, *args): global output_file global output_fd +# For the test suite expected-errors case, don't print the +# string "error: ", so they don't turn up as false positives +# if you grep the meson logs for strings like that. +end = 'error: ' if not testforerror else 'detected: ' prefix = '' if file: prefix += f'{file}:' @@ -141,7 +145,7 @@ def error_with_file(file, lineno, *args): prefix += f'{lineno}:' if prefix: prefix += ' ' -print(prefix, end='error: ', file=sys.stderr) +print(prefix, end=end, file=sys.stderr) print(*args, file=sys.stderr) if output_file and output_fd: -- 2.34.1
[PATCH] block/blkio: enable the completion eventfd
Until libblkio 1.3.0, virtio-blk drivers had completion eventfd notifications enabled from the start, but from the next releases this is no longer the case, so we have to explicitly enable them. In fact, the libblkio documentation says they could be disabled, so we should always enable them at the start if we want to be sure to get completion eventfd notifications: By default, the driver might not generate completion events for requests so it is necessary to explicitly enable the completion file descriptor before use: void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable); I discovered this while trying a development version of libblkio: the guest kernel hangs during boot, while probing the device. Fixes: fd66dbd424f5 ("blkio: add libblkio block driver") Signed-off-by: Stefano Garzarella --- block/blkio.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blkio.c b/block/blkio.c index 1798648134..bc1fac48b7 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -845,6 +845,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags, QLIST_INIT(&s->bounce_bufs); s->blkioq = blkio_get_queue(s->blkio, 0); s->completion_fd = blkioq_get_completion_fd(s->blkioq); +blkioq_set_completion_fd_enabled(s->blkioq, true); blkio_attach_aio_context(bs, bdrv_get_aio_context(bs)); return 0; -- 2.41.0
[PATCH v2 3/4] vhost: move and rename the conn retry times
Multile devices need this macro, move it to a common header. Signed-off-by: Li Feng --- hw/block/vhost-user-blk.c | 4 +--- hw/virtio/vhost-user-gpio.c | 3 +-- include/hw/virtio/vhost.h | 2 ++ 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c index eecf3f7a81..3c69fa47d5 100644 --- a/hw/block/vhost-user-blk.c +++ b/hw/block/vhost-user-blk.c @@ -32,8 +32,6 @@ #include "sysemu/sysemu.h" #include "sysemu/runstate.h" -#define REALIZE_CONNECTION_RETRIES 3 - static const int user_feature_bits[] = { VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_SEG_MAX, @@ -482,7 +480,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp) s->inflight = g_new0(struct vhost_inflight, 1); s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues); -retries = REALIZE_CONNECTION_RETRIES; +retries = VU_REALIZE_CONN_RETRIES; assert(!*errp); do { if (*errp) { diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c index 3b013f2d0f..d9979aa5db 100644 --- a/hw/virtio/vhost-user-gpio.c +++ b/hw/virtio/vhost-user-gpio.c @@ -15,7 +15,6 @@ #include "standard-headers/linux/virtio_ids.h" #include "trace.h" -#define REALIZE_CONNECTION_RETRIES 3 #define VHOST_NVQS 2 /* Features required from VirtIO */ @@ -359,7 +358,7 @@ static void vu_gpio_device_realize(DeviceState *dev, Error **errp) qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event, NULL, dev, NULL, true); -retries = REALIZE_CONNECTION_RETRIES; +retries = VU_REALIZE_CONN_RETRIES; g_assert(!*errp); do { if (*errp) { diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6a173cb9fa..ca3131b1af 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -8,6 +8,8 @@ #define VHOST_F_DEVICE_IOTLB 63 #define VHOST_USER_F_PROTOCOL_FEATURES 30 +#define VU_REALIZE_CONN_RETRIES 3 + /* Generic structures common for any vhost based device. */ struct vhost_inflight { -- 2.41.0
[PATCH v2 2/4] vhost-user-common: send get_inflight_fd once
Get_inflight_fd is sent only once. When reconnecting to the backend, qemu sent set_inflight_fd to the backend. Signed-off-by: Li Feng --- hw/scsi/vhost-scsi-common.c | 37 ++--- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c index a06f01af26..664adb15b4 100644 --- a/hw/scsi/vhost-scsi-common.c +++ b/hw/scsi/vhost-scsi-common.c @@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) vsc->dev.acked_features = vdev->guest_features; -assert(vsc->inflight == NULL); -vsc->inflight = g_new0(struct vhost_inflight, 1); -ret = vhost_dev_get_inflight(&vsc->dev, - vs->conf.virtqueue_size, - vsc->inflight); +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev); if (ret < 0) { -error_report("Error get inflight: %d", -ret); +error_report("Error setting inflight format: %d", -ret); goto err_guest_notifiers; } -ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight); -if (ret < 0) { -error_report("Error set inflight: %d", -ret); -goto err_guest_notifiers; +if (vsc->inflight) { +if (!vsc->inflight->addr) { +ret = vhost_dev_get_inflight(&vsc->dev, +vs->conf.virtqueue_size, +vsc->inflight); +if (ret < 0) { +error_report("Error get inflight: %d", -ret); +goto err_guest_notifiers; +} +} + +ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight); +if (ret < 0) { +error_report("Error set inflight: %d", -ret); +goto err_guest_notifiers; +} } ret = vhost_dev_start(&vsc->dev, vdev, true); @@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) return ret; err_guest_notifiers: -g_free(vsc->inflight); -vsc->inflight = NULL; - k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false); err_host_notifiers: vhost_dev_disable_notifiers(&vsc->dev, vdev); @@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) } assert(ret >= 0); -if (vsc->inflight) { -vhost_dev_free_inflight(vsc->inflight); -g_free(vsc->inflight); -vsc->inflight = NULL; -} - vhost_dev_disable_notifiers(&vsc->dev, vdev); } -- 2.41.0
[PATCH v2 1/4] vhost: fix the fd leak
When the vhost-user reconnect to the backend, the notifer should be cleanup. Otherwise, the fd resource will be exhausted. Fixes: f9a09ca3ea ("vhost: add support for configure interrupt") Signed-off-by: Li Feng --- hw/virtio/vhost.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index abf0d03c8d..e2f6ffb446 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -2044,6 +2044,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) event_notifier_test_and_clear( &hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier); event_notifier_test_and_clear(&vdev->config_notifier); +event_notifier_cleanup( +&hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier); trace_vhost_dev_stop(hdev, vdev->name, vrings); -- 2.41.0
[PATCH v2 4/4] vhost-user-scsi: support reconnect to backend
If the backend crashes and restarts, the device is broken. This patch adds reconnect for vhost-user-scsi. Tested with spdk backend. Signed-off-by: Li Feng --- hw/scsi/vhost-scsi-common.c | 6 + hw/scsi/vhost-user-scsi.c | 220 +++--- include/hw/virtio/vhost-scsi-common.h | 3 + include/hw/virtio/vhost-user-scsi.h | 3 + 4 files changed, 211 insertions(+), 21 deletions(-) diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c index 664adb15b4..3fde477eee 100644 --- a/hw/scsi/vhost-scsi-common.c +++ b/hw/scsi/vhost-scsi-common.c @@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) error_report("Error start vhost dev"); goto err_guest_notifiers; } +vsc->started_vu = true; /* guest_notifier_mask/pending not used yet, so just unmask * everything here. virtio-pci will do the right thing by @@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); int ret = 0; +if (!vsc->started_vu) { +return; +} +vsc->started_vu = false; + vhost_dev_stop(&vsc->dev, vdev, true); if (k->set_guest_notifiers) { diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c index ee99b19e7a..bd32dcf999 100644 --- a/hw/scsi/vhost-user-scsi.c +++ b/hw/scsi/vhost-user-scsi.c @@ -46,20 +46,25 @@ enum VhostUserProtocolFeature { static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status) { VHostUserSCSI *s = (VHostUserSCSI *)vdev; +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running; +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); +bool should_start = virtio_device_should_start(vdev, status); +int ret; -if (vhost_dev_is_started(&vsc->dev) == start) { +if (!s->connected) { return; } -if (start) { -int ret; +if (vhost_dev_is_started(&vsc->dev) == should_start) { +return; +} +if (should_start) { ret = vhost_scsi_common_start(vsc); if (ret < 0) { error_report("unable to start vhost-user-scsi: %s", strerror(-ret)); -exit(1); +qemu_chr_fe_disconnect(&vs->conf.chardev); } } else { vhost_scsi_common_stop(vsc); @@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev) } } -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq) +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq) { +VHostUserSCSI *s = (VHostUserSCSI *)vdev; +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); + +Error *local_err = NULL; +int i, ret; + +if (!vdev->start_on_kick) { +return; +} + +if (!s->connected) { +return; +} + +if (vhost_dev_is_started(&vsc->dev)) { +return; +} + +/* + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start + * vhost here instead of waiting for .set_status(). + */ +ret = vhost_scsi_common_start(vsc); +if (ret < 0) { +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: "); +qemu_chr_fe_disconnect(&vs->conf.chardev); +return; +} + +/* Kick right away to begin processing requests already in vring */ +for (i = 0; i < vsc->dev.nvqs; i++) { +VirtQueue *kick_vq = virtio_get_queue(vdev, i); + +if (!virtio_queue_get_desc_addr(vdev, i)) { +continue; +} +event_notifier_set(virtio_queue_get_host_notifier(kick_vq)); +} +} + +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp) +{ +VirtIODevice *vdev = VIRTIO_DEVICE(dev); +VHostUserSCSI *s = VHOST_USER_SCSI(vdev); +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); +int ret = 0; + +if (s->connected) { +return 0; +} +s->connected = true; + +vsc->dev.num_queues = vs->conf.num_queues; +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; +vsc->dev.vqs = s->vhost_vqs; +vsc->dev.vq_index = 0; +vsc->dev.backend_features = 0; + +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0, + errp); +if (ret < 0) { +return ret; +} + +/* restore vhost state */ +if (virtio_device_started(vdev, vdev->status)) { +ret = vhost_scsi_common_start(vsc); +if (ret < 0) { +return ret; +} +} + +return 0; +} + +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event); + +static void vhost_user_scsi_disconnect(DeviceState *dev) +{ +VirtIODevice *vdev = VIRTIO_DEV
[PATCH v2 0/4] Implement reconnect for vhost-user-scsi
Hi, This patchset adds reconnect support for vhost-user-scsi. At the same times, fix vhost fd leak and refactor some code. Changes for v2: - Split the v1 patch to small separate patchset; - New patch for fixing fd leak, which has sent to reviewers in another mail; - Implement the `vhost_user_scsi_handle_output`; - Add the started_vu safe check; - Fix error handler; - Check the inflight before set/get inflight fd. Li Feng (4): vhost: fix the fd leak vhost-user-common: send get_inflight_fd once vhost: move and rename the conn retry times vhost-user-scsi: support reconnect to backend hw/block/vhost-user-blk.c | 4 +- hw/scsi/vhost-scsi-common.c | 43 ++--- hw/scsi/vhost-user-scsi.c | 220 +++--- hw/virtio/vhost-user-gpio.c | 3 +- hw/virtio/vhost.c | 2 + include/hw/virtio/vhost-scsi-common.h | 3 + include/hw/virtio/vhost-user-scsi.h | 3 + include/hw/virtio/vhost.h | 2 + 8 files changed, 235 insertions(+), 45 deletions(-) -- 2.41.0
[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.
Replaced a call to malloc() and its respective call to free() with g_malloc() and g_free(). Signed-off-by: dinglimin V2 -> V3:softmmu_unlock_user changes free to g free. V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL --- semihosting/uaccess.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c index 8018828069..81a0f80e9f 100644 --- a/semihosting/uaccess.c +++ b/semihosting/uaccess.c @@ -14,10 +14,10 @@ void *softmmu_lock_user(CPUArchState *env, target_ulong addr, target_ulong len, bool copy) { -void *p = malloc(len); +void *p = g_malloc(len); if (p && copy) { if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) { -free(p); +g_free(p); p = NULL; } } @@ -87,5 +87,5 @@ void softmmu_unlock_user(CPUArchState *env, void *p, if (len) { cpu_memory_rw_debug(env_cpu(env), addr, p, len, 1); } -free(p); +g_free(p); } -- 2.30.0.windows.2
Re: [PATCH v2 4/4] vhost-user-scsi: support reconnect to backend
> 2023年7月25日 下午6:42,Li Feng 写道: > > If the backend crashes and restarts, the device is broken. > This patch adds reconnect for vhost-user-scsi. > > Tested with spdk backend. > > Signed-off-by: Li Feng > --- > hw/scsi/vhost-scsi-common.c | 6 + > hw/scsi/vhost-user-scsi.c | 220 +++--- > include/hw/virtio/vhost-scsi-common.h | 3 + > include/hw/virtio/vhost-user-scsi.h | 3 + > 4 files changed, 211 insertions(+), 21 deletions(-) > > diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c > index 664adb15b4..3fde477eee 100644 > --- a/hw/scsi/vhost-scsi-common.c > +++ b/hw/scsi/vhost-scsi-common.c > @@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) > error_report("Error start vhost dev"); > goto err_guest_notifiers; > } > +vsc->started_vu = true; > > /* guest_notifier_mask/pending not used yet, so just unmask > * everything here. virtio-pci will do the right thing by > @@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) > VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); > int ret = 0; > > +if (!vsc->started_vu) { > +return; > +} > +vsc->started_vu = false; > + > vhost_dev_stop(&vsc->dev, vdev, true); > > if (k->set_guest_notifiers) { > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c > index ee99b19e7a..bd32dcf999 100644 > --- a/hw/scsi/vhost-user-scsi.c > +++ b/hw/scsi/vhost-user-scsi.c > @@ -46,20 +46,25 @@ enum VhostUserProtocolFeature { > static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status) > { > VHostUserSCSI *s = (VHostUserSCSI *)vdev; > +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; > VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); > -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running; > +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); > +bool should_start = virtio_device_should_start(vdev, status); > +int ret; > > -if (vhost_dev_is_started(&vsc->dev) == start) { > +if (!s->connected) { > return; > } > > -if (start) { > -int ret; > +if (vhost_dev_is_started(&vsc->dev) == should_start) { > +return; > +} > > +if (should_start) { > ret = vhost_scsi_common_start(vsc); > if (ret < 0) { > error_report("unable to start vhost-user-scsi: %s", > strerror(-ret)); > -exit(1); > +qemu_chr_fe_disconnect(&vs->conf.chardev); > } > } else { > vhost_scsi_common_stop(vsc); > @@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev) > } > } > > -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq) > +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq) > { > +VHostUserSCSI *s = (VHostUserSCSI *)vdev; > +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; > +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); > +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); > + > +Error *local_err = NULL; > +int i, ret; > + > +if (!vdev->start_on_kick) { > +return; > +} > + > +if (!s->connected) { > +return; > +} > + > +if (vhost_dev_is_started(&vsc->dev)) { > +return; > +} > + > +/* > + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start > + * vhost here instead of waiting for .set_status(). > + */ > +ret = vhost_scsi_common_start(vsc); > +if (ret < 0) { > +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: “); Need fix typo in v3. s/vhost-user-blk/vhost-user-scsi/g > +qemu_chr_fe_disconnect(&vs->conf.chardev); > +return; > +} > + > +/* Kick right away to begin processing requests already in vring */ > +for (i = 0; i < vsc->dev.nvqs; i++) { > +VirtQueue *kick_vq = virtio_get_queue(vdev, i); > + > +if (!virtio_queue_get_desc_addr(vdev, i)) { > +continue; > +} > +event_notifier_set(virtio_queue_get_host_notifier(kick_vq)); > +} > +} > + > +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp) > +{ > +VirtIODevice *vdev = VIRTIO_DEVICE(dev); > +VHostUserSCSI *s = VHOST_USER_SCSI(vdev); > +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); > +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); > +int ret = 0; > + > +if (s->connected) { > +return 0; > +} > +s->connected = true; > + > +vsc->dev.num_queues = vs->conf.num_queues; > +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; > +vsc->dev.vqs = s->vhost_vqs; > +vsc->dev.vq_index = 0; > +vsc->dev.backend_features = 0; > + > +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, > 0, > + errp); > +if (ret < 0) { > +return ret; > +} > + > +/* restore vhost state
Re: [PATCH] migration: Allow user to specify migration available bandwidth
Peter Xu writes: > Migration bandwidth is a very important value to live migration. It's > because it's one of the major factors that we'll make decision on when to > switchover to destination in a precopy process. > > This value is currently estimated by QEMU during the whole live migration > process by monitoring how fast we were sending the data. This can be the > most accurate bandwidth if in the ideal world, where we're always feeding > unlimited data to the migration channel, and then it'll be limited to the > bandwidth that is available. > > However in reality it may be very different, e.g., over a 10Gbps network we > can see query-migrate showing migration bandwidth of only a few tens of > MB/s just because there are plenty of other things the migration thread > might be doing. For example, the migration thread can be busy scanning > zero pages, or it can be fetching dirty bitmap from other external dirty > sources (like vhost or KVM). It means we may not be pushing data as much > as possible to migration channel, so the bandwidth estimated from "how many > data we sent in the channel" can be dramatically inaccurate sometimes, > e.g., that a few tens of MB/s even if 10Gbps available, and then the > decision to switchover will be further affected by this. > > The migration may not even converge at all with the downtime specified, > with that wrong estimation of bandwidth. > > The issue is QEMU itself may not be able to avoid those uncertainties on > measuing the real "available migration bandwidth". At least not something > I can think of so far. > > One way to fix this is when the user is fully aware of the available > bandwidth, then we can allow the user to help providing an accurate value. > > For example, if the user has a dedicated channel of 10Gbps for migration > for this specific VM, the user can specify this bandwidth so QEMU can > always do the calculation based on this fact, trusting the user as long as > specified. > > When the user wants to have migration only use 5Gbps out of that 10Gbps, > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for > other things). So it can be useful even if the network is not dedicated, > but as long as the user can know a solid value. > > A new parameter "available-bandwidth" is introduced just for this. So when > the user specified this parameter, instead of trusting the estimated value > from QEMU itself (based on the QEMUFile send speed), let's trust the user > more. > > This can resolve issues like "unconvergence migration" which is caused by > hilarious low "migration bandwidth" detected for whatever reason. > > Reported-by: Zhiyi Guo > Signed-off-by: Peter Xu > --- > qapi/migration.json| 20 +++- > migration/migration.h | 2 +- > migration/options.h| 1 + > migration/migration-hmp-cmds.c | 14 ++ > migration/migration.c | 19 +++ > migration/options.c| 28 > migration/trace-events | 2 +- > 7 files changed, 79 insertions(+), 7 deletions(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index 47dfef0278..fdc269e0a1 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -730,6 +730,16 @@ > # @max-bandwidth: to set maximum speed for migration. maximum speed > # in bytes per second. (Since 2.8) > # > +# @available-bandwidth: to set available bandwidth for migration. By > +# default, this value is zero, means the user is not aware of the > +# available bandwidth that can be used by QEMU migration, so QEMU will > +# estimate the bandwidth automatically. This can be set when the > +# estimated value is not accurate, while the user is able to guarantee > +# such bandwidth is available for migration purpose during the > +# migration procedure. When specified correctly, this can make the > +# switchover decision much more accurate, which will also be based on > +# the max downtime specified. (Since 8.2) Humor me: break lines slightly earlier, like # @available-bandwidth: to set available bandwidth for migration. By # default, this value is zero, means the user is not aware of the # available bandwidth that can be used by QEMU migration, so QEMU # will estimate the bandwidth automatically. This can be set when # the estimated value is not accurate, while the user is able to # guarantee such bandwidth is available for migration purpose # during the migration procedure. When specified correctly, this # can make the switchover decision much more accurate, which will # also be based on the max downtime specified. (Since 8.2) > +# > # @downtime-limit: set maximum tolerated downtime for migration. > # maximum downtime in milliseconds (Since 2.8) > # > @@ -803,7 +813,7 @@ >
[PATCH] block/blkio: do not use open flags in qemu_open()
qemu_open() in blkio_virtio_blk_common_open() is used to open the character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in the future eventually the unix socket. In all these cases we cannot open the path in read-only mode, when the `read-only` option of blockdev is on, because the exchange of IOCTL commands for example will fail. In order to open the device read-only, we have to use the `read-only` property of the libblkio driver as we already do in blkio_file_open(). Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Reported-by: Qing Wang Signed-off-by: Stefano Garzarella --- block/blkio.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 1798648134..fe9bf8ea5f 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs, * layer through the "/dev/fdset/N" special path. */ if (fd_supported) { -int open_flags; - -if (flags & BDRV_O_RDWR) { -open_flags = O_RDWR; -} else { -open_flags = O_RDONLY; -} - -fd = qemu_open(path, open_flags, errp); +fd = qemu_open(path, O_RDWR, errp); if (fd < 0) { return -EINVAL; } -- 2.41.0
Re: [PATCH] block/blkio: do not use open flags in qemu_open()
On Tue, Jul 25, 2023 at 01:11:55PM +0200, Stefano Garzarella wrote: > qemu_open() in blkio_virtio_blk_common_open() is used to open the > character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in > the future eventually the unix socket. > > In all these cases we cannot open the path in read-only mode, > when the `read-only` option of blockdev is on, because the exchange > of IOCTL commands for example will fail. > > In order to open the device read-only, we have to use the `read-only` > property of the libblkio driver as we already do in blkio_file_open(). > > Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for > virtio-blk") > Reported-by: Qing Wang > Signed-off-by: Stefano Garzarella > --- > block/blkio.c | 10 +- > 1 file changed, 1 insertion(+), 9 deletions(-) > > diff --git a/block/blkio.c b/block/blkio.c > index 1798648134..fe9bf8ea5f 100644 > --- a/block/blkio.c > +++ b/block/blkio.c > @@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState > *bs, > * layer through the "/dev/fdset/N" special path. > */ > if (fd_supported) { > -int open_flags; > - > -if (flags & BDRV_O_RDWR) { > -open_flags = O_RDWR; > -} else { > -open_flags = O_RDONLY; > -} > - > -fd = qemu_open(path, open_flags, errp); I'd suggest taking the paragraph from the commit message explaining why it is correct to hardcode O_RDWR and putting it into a comment here. > +fd = qemu_open(path, O_RDWR, errp); > if (fd < 0) { > return -EINVAL; > } > -- > 2.41.0 > > With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH] block/blkio: do not use open flags in qemu_open()
On Tue, Jul 25, 2023 at 12:15:40PM +0100, Daniel P. Berrangé wrote: On Tue, Jul 25, 2023 at 01:11:55PM +0200, Stefano Garzarella wrote: qemu_open() in blkio_virtio_blk_common_open() is used to open the character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in the future eventually the unix socket. In all these cases we cannot open the path in read-only mode, when the `read-only` option of blockdev is on, because the exchange of IOCTL commands for example will fail. In order to open the device read-only, we have to use the `read-only` property of the libblkio driver as we already do in blkio_file_open(). Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Reported-by: Qing Wang Signed-off-by: Stefano Garzarella --- block/blkio.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 1798648134..fe9bf8ea5f 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs, * layer through the "/dev/fdset/N" special path. */ if (fd_supported) { -int open_flags; - -if (flags & BDRV_O_RDWR) { -open_flags = O_RDWR; -} else { -open_flags = O_RDONLY; -} - -fd = qemu_open(path, open_flags, errp); I'd suggest taking the paragraph from the commit message explaining why it is correct to hardcode O_RDWR and putting it into a comment here. Ack, I'd also add: Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2225439 in the commit message. I'll fix in v2. Thanks, Stefano +fd = qemu_open(path, O_RDWR, errp); if (fd < 0) { return -EINVAL; } -- 2.41.0 With regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Format type of qemu NVMe virtual drive reverted back to its default (512 bytes block size) after performing hot plugout/plugin operation on that drive.
Hi, I have a virtual system created using qemu 7.2. In that system, I attached/hot plugged a virtual NVMe drive. This drive had a default block size of 512 bytes. admin@node-3:~$ sudo nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 ashudev-6f34a1cf_13 QEMU NVMe Ctrl 1 34.36 GB / 34.36 GB512 B + 0 B 7.1.92 After that, I formatted this drive with 4k block size and it formatted successfully. admin@node-3:~$ sudo nvme format /dev/nvme0n1 -f --lbaf 4 Success formatting namespace:1 admin@node-3:~$ admin@node-3:~$ sudo nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 ashudev-6f34a1cf_13 QEMU NVMe Ctrl 1 34.36 GB / 34.36 GB 4 KiB + 0 B 7.1.92 Then, I just performed the hot plugout and then plugin operation on that drive using qmp.execute's device_del and device_add cmd respectively. But, after that, the default block size of that drive reverted to 512 bytes. admin@node-3:~$ sudo nvme list Node SN Model Namespace Usage Format FW Rev - - -- /dev/nvme0n1 ashudev-6f34a1cf_13 QEMU NVMe Ctrl 1 34.36 GB / 34.36 GB512 B + 0 B 7.1.92 So, I just wanted to know why the NVMe format type reverted back to 512 bytes, as I just performed the hot plugout/plugin operations only. Drive's block size (format type) should not be changed upon removal/insertion, right ? or am I missing something ? Regards, Ashutosh
[PATCH] ui/dbus: fix win32 compilation when !opengl
From: Marc-Andre Lureau Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1782 Signed-off-by: Marc-André Lureau --- ui/dbus-listener.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ui/dbus-listener.c b/ui/dbus-listener.c index 68ff343799..02fc6ae239 100644 --- a/ui/dbus-listener.c +++ b/ui/dbus-listener.c @@ -338,6 +338,7 @@ static bool dbus_scanout_map(DBusDisplayListener *ddl) return true; } +#ifdef CONFIG_OPENGL static bool dbus_scanout_share_d3d_texture( DBusDisplayListener *ddl, @@ -399,7 +400,8 @@ dbus_scanout_share_d3d_texture( return true; } -#endif +#endif /* CONFIG_OPENGL */ +#endif /* WIN32 */ #ifdef CONFIG_OPENGL static void dbus_scanout_texture(DisplayChangeListener *dcl, -- 2.41.0
[PATCH for-8.1] hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port()
In query_port() we pass the address of a local pvrdma_port_attr struct to the rdma_query_backend_port() function. Unfortunately, rdma_backend_query_port() wants a pointer to a struct ibv_port_attr, and the two are not the same length. Coverity spotted this (CID 1507146): pvrdma_port_attr is 48 bytes long, and ibv_port_attr is 52 bytes, because it has a few extra fields at the end. Fortunately, all we do with the attrs struct after the call is to read a few specific fields out of it which are all at the same offsets in both structs, so we can simply make the local variable the correct type. This also lets us drop the cast (which should have been a bit of a warning flag that we were doing something wrong here). Cc: qemu-sta...@nongnu.org Signed-off-by: Peter Maydell --- I don't know anything about the rdma code so this fix is based purely on looking at the code, and is untested beyond just make check/make check-avocado. --- hw/rdma/vmw/pvrdma_cmd.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c index c6ed0259821..d31c1875938 100644 --- a/hw/rdma/vmw/pvrdma_cmd.c +++ b/hw/rdma/vmw/pvrdma_cmd.c @@ -129,14 +129,13 @@ static int query_port(PVRDMADev *dev, union pvrdma_cmd_req *req, { struct pvrdma_cmd_query_port *cmd = &req->query_port; struct pvrdma_cmd_query_port_resp *resp = &rsp->query_port_resp; -struct pvrdma_port_attr attrs = {}; +struct ibv_port_attr attrs = {}; if (cmd->port_num > MAX_PORTS) { return -EINVAL; } -if (rdma_backend_query_port(&dev->backend_dev, -(struct ibv_port_attr *)&attrs)) { +if (rdma_backend_query_port(&dev->backend_dev, &attrs)) { return -ENOMEM; } -- 2.34.1
Re: [PATCH] Open file as read only on private mapping in qemu_ram_alloc_from_file
Hi, patch subject should start with "softmmu/physmem: Open ..." On 25.07.23 12:52, Thiner Logoer wrote: An read only file can be mapped with read write as long as the mapping is private, which is very common case. Make At least in the environments I know, using private file mappings is a corner case ;) What is you use case? VM templating? qemu_ram_alloc_from_file open file as read only when the mapping is private, otherwise open will fail when file does not allow write. If this file does not exist or is a directory, the flag is not used, so it should be OK. from https://gitlab.com/qemu-project/qemu/-/issues/1689 Signed-off-by: Thiner Logoer --- softmmu/physmem.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 3df73542e1..e8036ee335 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, int fd; bool created; RAMBlock *block; + ^ .git/rebase-apply/patch:13: trailing whitespace. +/* + * If map is private, the fd does not need to be writable. + * This only get effective when the file is existent. "This will get ignored if the file does not yet exist." + */ +bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED); -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created, +fd = file_ram_open(mem_path, memory_region_name(mr), + open_as_readonly, &created, errp); if (fd < 0) { return NULL; Opening a file R/O will also make operations like fallocate/ftruncate/ ... fail. For example, this will make fallocate(FALLOC_FL_PUNCH_HOLE) stop working and in turn make ram_block_discard_range() bail out. There was a recent discussion/patch on that: commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9 Author: David Hildenbrand Date: Thu Jul 6 09:56:06 2023 +0200 softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping ram_block_discard_range() cannot possibly do the right thing in MAP_PRIVATE file mappings in the general case. To achieve the documented semantics, we also have to punch a hole into the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings of such a file. For example, using VM templating -- see commit b17fbbe55cba ("migration: allow private destination ram with x-ignore-shared") -- in combination with any mechanism that relies on discarding of RAM is problematic. This includes: * Postcopy live migration * virtio-balloon inflation/deflation or free-page-reporting * virtio-mem So at least warn that there is something possibly dangerous is going on when using ram_block_discard_range() in these cases. While it doesn't work "in the general case", it works in the "single file owner" case where someone simply forgot to specify "share=on" -- "share=off" is the default for memory-backend-file :( . For example, with hugetlb+virtio-mem the following works if the file does not exists: (note that virtio-mem will fallocate(FALLOC_FL_PUNCH_HOLE) the whole file upfront) ... -object memory-backend-file,share=off,mem-path=/dev/hugepages/vmem0,id=mem2,size=2G \ -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root With you patch, once the file already exists, we would now get qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: ram_block_discard_range: Failed to fallocate :0 +8000 (-9) qemu-system-x86_64: -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: Unexpected error discarding RAM: Bad file descriptor So this has the potential to break existing setups. The easy fix for these would be to configure "share=on" in these now-failing setups. Hm -- Cheers, David / dhildenb
Re: [PULL 0/2] QAPI patches patches for 2023-07-10
On Mon, 10 Jul 2023 at 12:21, Markus Armbruster wrote: > > > The following changes since commit 2ff49e96accc8fd9a38e9abd16f0cfa0adab1605: > > Merge tag 'pull-tcg-20230709' of https://gitlab.com/rth7680/qemu into > staging (2023-07-09 15:01:43 +0100) > > are available in the Git repository at: > > https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2023-07-10 > > for you to fetch changes up to fd658a7b8cf1091ae2914655add4511865d7edc0: > > migration.json: Don't use space before colon (2023-07-10 07:47:36 +0200) > > > QAPI patches patches for 2023-07-10 > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.1 for any user-visible changes. -- PMM
Re: [PULL 00/16] s390x fixes
On Mon, 24 Jul 2023 at 17:53, Thomas Huth wrote: > > The following changes since commit 885fc169f09f5915ce037263d20a59eb226d473d: > > Merge tag 'pull-riscv-to-apply-20230723-3' of https://github.com/alistair23/qemu into staging (2023-07-24 11:34:35 +0100) > > are available in the Git repository at: > > https://gitlab.com/thuth/qemu.git tags/pull-request-2023-07-24 > > for you to fetch changes up to bd39b7b5f34c2f6b9272bf281ee0324cb07fc3ee: > > tests/avocado/machine_s390_ccw_virtio: Skip the flaky virtio-gpu test by default (2023-07-24 18:44:48 +0200) > > > * Fix emulation of s390x instructions: CKSM, CLM, ICM, MC, CLGEBR(A) > * Remove useless and non-working s390x migration avocado tests > * Fix loongarch CSRRD CPUID instruction when running on s390x hosts > * Disable flaky s390x virtio-gpu test by default > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.1 for any user-visible changes. -- PMM
[PATCH v3 1/3] linux-user: Show heap address in /proc/pid/maps
Show the memory location of the heap in the /proc/pid/maps file inside the guest. The heap address will be stored in ts->heap_base, so make that variable visible for all guest architectures, not just architectures for semihosted binaries (arm, m68k, riscv). Show 32- and 64-bit pointers with 8 digits and leading zeros (%08x/%08lx). For 64-bit we could use %16lx, but we mimic the Linux kernel, which shows it with %08lx too. Example: user@machine:/# uname -a Linux paq 5.15.88+ #47 SMP Sun Jan 15 12:53:11 CET 2023 aarch64 GNU/Linux user@machine:/# cat /proc/self/maps -9000 r-xp 08:01 2380521 /usr/bin/cat 9000-0001f000 ---p 00:00 0 0001f000-0002 r--p f000 08:01 2380521 /usr/bin/cat 0002-00021000 rw-p 0001 08:01 2380521 /usr/bin/cat 00021000-00042000 rw-p 00:00 0 [heap] 55-551000 ---p 00:00 0 551000-5500801000 rw-p 00:00 0 [stack] 5500801000-5500827000 r-xp 08:01 2395258 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1 5500827000-550083f000 ---p 00:00 0 550083f000-5500841000 r--p 0002e000 08:01 2395258 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1 5500841000-5500843000 rw-p 0003 08:01 2395258 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1 5500843000-5500844000 r-xp 00:00 0 5500844000-5500846000 rw-p 00:00 0 550085-55009d7000 r-xp 08:01 2395261 /usr/lib/aarch64-linux-gnu/libc.so.6 55009d7000-55009ed000 ---p 00187000 08:01 2395261 /usr/lib/aarch64-linux-gnu/libc.so.6 55009ed000-55009f r--p 0018d000 08:01 2395261 /usr/lib/aarch64-linux-gnu/libc.so.6 55009f-55009f2000 rw-p 0019 08:01 2395261 /usr/lib/aarch64-linux-gnu/libc.so.6 55009f2000-55009ff000 rw-p 00:00 0 Signed-off-by: Helge Deller --- include/exec/cpu_ldst.h | 4 ++-- linux-user/main.c | 1 + linux-user/qemu.h | 4 ++-- linux-user/syscall.c| 8 ++-- 4 files changed, 11 insertions(+), 6 deletions(-) diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index 645476f0e5..f1e6f31e88 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -72,10 +72,10 @@ */ #if TARGET_VIRT_ADDR_SPACE_BITS <= 32 typedef uint32_t abi_ptr; -#define TARGET_ABI_FMT_ptr "%x" +#define TARGET_ABI_FMT_ptr "%08x" #else typedef uint64_t abi_ptr; -#define TARGET_ABI_FMT_ptr "%"PRIx64 +#define TARGET_ABI_FMT_ptr "%08"PRIx64 #endif #ifndef TARGET_TAGGED_ADDRESSES diff --git a/linux-user/main.c b/linux-user/main.c index dba67ffa36..12f3d8a93e 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -955,6 +955,7 @@ int main(int argc, char **argv, char **envp) the real value of GUEST_BASE into account. */ tcg_prologue_init(tcg_ctx); +ts->heap_base = info->brk; target_cpu_copy_regs(env, regs); if (gdbstub) { diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 802794db63..7a6adac637 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -121,11 +121,11 @@ typedef struct TaskState { #ifdef TARGET_M68K abi_ulong tp_value; #endif -#if defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_RISCV) + /* Extra fields for semihosted binaries. */ abi_ulong heap_base; abi_ulong heap_limit; -#endif + abi_ulong stack_base; int used; /* non zero if used */ struct image_info *info; diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 95727a816a..220c4a04b8 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -8131,14 +8131,18 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps) continue; } +path = e->path; + +if (ts->heap_base && h2g(min) == ts->heap_base) { +path = "[heap]"; +} + #ifdef TARGET_HPPA if (h2g(max) == ts->info->stack_limit) { #else if (h2g(min) == ts->info->stack_limit) { #endif path = "[stack]"; -} else { -path = e->path; } count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr -- 2.41.0
[PATCH v3 3/3] linux-user: Load pie executables at upper memory
Adjust the loader to load dynamic pie executables at around: ~ 0x55 for 64-bit guest binaries on 64-bit host, - 0x4000for 32-bit guest binaries on 64-bit host, and - 0xfor 32-bit guest binaries on 32-bit host. Reason for this change is to unbreak the Thread Sanitizer (TSan) application again, as it was done in aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for aarch64"). Signed-off-by: Helge Deller --- linux-user/elfload.c | 6 -- linux-user/loader.h | 12 linux-user/mmap.c| 16 +--- 3 files changed, 17 insertions(+), 17 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 47a118e430..8f5a79b537 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -3021,6 +3021,7 @@ static void load_elf_image(const char *image_name, int image_fd, struct elfhdr *ehdr = (struct elfhdr *)bprm_buf; struct elf_phdr *phdr; abi_ulong load_addr, load_bias, loaddr, hiaddr, error; +unsigned long load_offset = 0; int i, retval, prot_exec; Error *err = NULL; bool is_main_executable; @@ -3121,6 +3122,7 @@ static void load_elf_image(const char *image_name, int image_fd, * select guest_base. In this case we pass a size. */ probe_guest_base(image_name, 0, hiaddr - loaddr); +load_offset = TASK_UNMAPPED_BASE_PIE; } } @@ -3138,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd, * In both cases, we will overwrite pages in this range with mappings * from the executable. */ -load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE, +load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - loaddr + 1, PROT_NONE, MAP_PRIVATE | MAP_ANON | MAP_NORESERVE | (is_main_executable ? MAP_FIXED : 0), -1, 0); @@ -3176,7 +3178,7 @@ static void load_elf_image(const char *image_name, int image_fd, info->start_data = -1; info->end_data = 0; /* possible start for brk is behind all sections of this ELF file. */ -info->brk = TARGET_PAGE_ALIGN(hiaddr); +info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr); info->elf_flags = ehdr->e_flags; prot_exec = PROT_EXEC; diff --git a/linux-user/loader.h b/linux-user/loader.h index 59cbeacf24..3ba41e9a7b 100644 --- a/linux-user/loader.h +++ b/linux-user/loader.h @@ -18,6 +18,18 @@ #ifndef LINUX_USER_LOADER_H #define LINUX_USER_LOADER_H +/* where to map binaries? */ +#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64 +# define TASK_UNMAPPED_BASE_PIE 0x55 +# define TASK_UNMAPPED_BASE0x70 +#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32 +# define TASK_UNMAPPED_BASE_PIE0x4000 +# define TASK_UNMAPPED_BASE0xf000 +#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */ +# define TASK_UNMAPPED_BASE_PIE0x +# define TASK_UNMAPPED_BASE0x4000 +#endif + /* * Read a good amount of data initially, to hopefully get all the * program headers loaded. diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 848d2fd4bb..9434bc805d 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -23,6 +23,7 @@ #include "user-internals.h" #include "user-mmap.h" #include "target_mman.h" +#include "loader.h" static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER; static __thread int mmap_lock_count; @@ -295,21 +296,6 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last, return true; } -#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64 -#ifdef TARGET_AARCH64 -# define TASK_UNMAPPED_BASE 0x55 -#else -# define TASK_UNMAPPED_BASE 0x40 -#endif -#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32 -#ifdef TARGET_HPPA -# define TASK_UNMAPPED_BASE 0xfa00 -#else -# define TASK_UNMAPPED_BASE 0xe000 -#endif -#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */ -# define TASK_UNMAPPED_BASE 0x4000 -#endif abi_ulong mmap_next_start = TASK_UNMAPPED_BASE; unsigned long last_brk; -- 2.41.0
[PATCH v3 0/3] linux-user: Fix static armhf binaries and optmize memory layout
This patch series: - fixes qemu-arm to run static armhf binaries - shows address of heap in /proc/pid/maps output for all architectures - optimizes address layout of loaded executable It can be pulled from here: https://github.com/hdeller/qemu-hppa/tree/brk-fixes-2 Helge Deller (3): linux-user: Show heap address in /proc/pid/maps linux-user: Optimize memory layout for static and dynamic executables linux-user: Load pie executables at upper memory include/exec/cpu_ldst.h | 4 +-- linux-user/elfload.c| 59 + linux-user/loader.h | 12 + linux-user/main.c | 1 + linux-user/mmap.c | 14 +- linux-user/qemu.h | 4 +-- linux-user/syscall.c| 8 -- 7 files changed, 43 insertions(+), 59 deletions(-) -- 2.41.0
[PATCH v3 2/3] linux-user: Optimize memory layout for static and dynamic executables
Organize the emulated memory layout in a way which leaves as much memory as possible for heap for the application. This patch tries to optize the memory layout by loading pie executables into lower memory and shared libs into higher memory (at TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap space which will be located directly after the executable. Up to now, pie executable and shared libs were loaded directly behind each other in the area at TASK_UNMAPPED_BASE, which leaves very little space for heap. I tested this change on arm64, armhf and hppa (all in chroot on x86-64), and with a static armhf binary (which is broken without this patch). This patch temporarily breaks the Thread Sanitizer (TSan) application which expects specific boundary definitions for memory mappings on different platforms [1], see commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE for aarch64") for aarch64. The follow-up patch fixes it again. [1] https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h Signed-off-by: Helge Deller --- linux-user/elfload.c | 55 +--- linux-user/mmap.c| 8 --- 2 files changed, 21 insertions(+), 42 deletions(-) diff --git a/linux-user/elfload.c b/linux-user/elfload.c index 861ec07abc..47a118e430 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int image_fd, abi_ulong load_addr, load_bias, loaddr, hiaddr, error; int i, retval, prot_exec; Error *err = NULL; +bool is_main_executable; /* First of all, some simple consistency checks */ if (!elf_check_ident(ehdr)) { @@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int image_fd, } } -if (pinterp_name != NULL) { -/* - * This is the main executable. - * - * Reserve extra space for brk. - * We hold on to this space while placing the interpreter - * and the stack, lest they be placed immediately after - * the data segment and block allocation from the brk. - * - * 16MB is chosen as "large enough" without being so large as - * to allow the result to not fit with a 32-bit guest on a - * 32-bit host. However some 64 bit guests (e.g. s390x) - * attempt to place their heap further ahead and currently - * nothing stops them smashing into QEMUs address space. - */ -#if TARGET_LONG_BITS == 64 -info->reserve_brk = 32 * MiB; -#else -info->reserve_brk = 16 * MiB; -#endif -hiaddr += info->reserve_brk; - +is_main_executable = (pinterp_name != NULL); +if (is_main_executable) { if (ehdr->e_type == ET_EXEC) { /* * Make sure that the low address does not conflict with @@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int image_fd, probe_guest_base(image_name, loaddr, hiaddr); } else { /* - * The binary is dynamic, but we still need to + * The binary is dynamic (pie-executabe), but we still need to * select guest_base. In this case we pass a size. */ probe_guest_base(image_name, 0, hiaddr - loaddr); @@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int image_fd, */ load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE, MAP_PRIVATE | MAP_ANON | MAP_NORESERVE | -(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0), +(is_main_executable ? MAP_FIXED : 0), -1, 0); if (load_addr == -1) { goto exit_mmap; @@ -3194,7 +3175,8 @@ static void load_elf_image(const char *image_name, int image_fd, info->end_code = 0; info->start_data = -1; info->end_data = 0; -info->brk = 0; +/* possible start for brk is behind all sections of this ELF file. */ +info->brk = TARGET_PAGE_ALIGN(hiaddr); info->elf_flags = ehdr->e_flags; prot_exec = PROT_EXEC; @@ -3288,9 +3270,6 @@ static void load_elf_image(const char *image_name, int image_fd, info->end_data = vaddr_ef; } } -if (vaddr_em > info->brk) { -info->brk = vaddr_em; -} #ifdef TARGET_MIPS } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) { Mips_elf_abiflags_v0 abiflags; @@ -3618,6 +3597,15 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info) if (elf_interpreter) { load_elf_interp(elf_interpreter, &interp_info, bprm->buf); +/* +* Use brk address of interpreter if it was loaded above the +* executable and leaves less than 16 MB for heap. +* This happens e.g. with static binaries on armhf. + */
Re: intel-iommu: Report interrupt remapping faults, fix return value
On Tue, Jul 25, 2023 at 11:01:16AM +0100, David Woodhouse wrote: > From: David Woodhouse > > A generic X86IOMMUClass->int_remap function should not return VT-d > specific values; fix it to return 0 if the interrupt was successfully > translated or -EINVAL if not. > > The VTD_FR_IR_xxx values are supposed to be used to actually raise > faults through the fault reporting mechanism, so do that instead for > the case where the IRQ is actually being injected. > > There is more work to be done here, as pretranslations for the KVM IRQ > routing table can't fault; an untranslatable IRQ should be handled in > userspace and the fault raised only when the IRQ actually happens (if > indeed the IRTE is still not valid at that time). But we can work on > that later; we can at least raise faults for the direct case. > > Signed-off-by: David Woodhouse looks like post 8.1 material yes? > --- > hw/i386/intel_iommu.c | 148 ++--- > hw/i386/intel_iommu_internal.h | 1 + > 2 files changed, 102 insertions(+), 47 deletions(-) > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > index dcc334060c..a65a94a4ce 100644 > --- a/hw/i386/intel_iommu.c > +++ b/hw/i386/intel_iommu.c > @@ -469,21 +469,12 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState > *s, uint16_t index) > > /* Must not update F field now, should be done later */ > static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index, > -uint16_t source_id, hwaddr addr, > -VTDFaultReason fault, bool is_write, > -bool is_pasid, uint32_t pasid) > +uint64_t hi, uint64_t lo) > { > -uint64_t hi = 0, lo; > hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4); > > assert(index < DMAR_FRCD_REG_NR); > > -lo = VTD_FRCD_FI(addr); > -hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) | > - VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid); > -if (!is_write) { > -hi |= VTD_FRCD_T; > -} > vtd_set_quad_raw(s, frcd_reg_addr, lo); > vtd_set_quad_raw(s, frcd_reg_addr + 8, hi); > > @@ -509,17 +500,11 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, > uint16_t source_id) > } > > /* Log and report an DMAR (address translation) fault to software */ > -static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, > - hwaddr addr, VTDFaultReason fault, > - bool is_write, bool is_pasid, > - uint32_t pasid) > +static void vtd_report_frcd_fault(IntelIOMMUState *s, uint64_t source_id, > + uint64_t hi, uint64_t lo) > { > uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG); > > -assert(fault < VTD_FR_MAX); > - > -trace_vtd_dmar_fault(source_id, fault, addr, is_write); > - > if (fsts_reg & VTD_FSTS_PFO) { > error_report_once("New fault is not recorded due to " >"Primary Fault Overflow"); > @@ -539,8 +524,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, > uint16_t source_id, > return; > } > > -vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, > -is_write, is_pasid, pasid); > +vtd_record_frcd(s, s->next_frcd_reg, hi, lo); > > if (fsts_reg & VTD_FSTS_PPF) { > error_report_once("There are pending faults already, " > @@ -565,6 +549,40 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, > uint16_t source_id, > } > } > > +/* Log and report an DMAR (address translation) fault to software */ > +static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id, > + hwaddr addr, VTDFaultReason fault, > + bool is_write, bool is_pasid, > + uint32_t pasid) > +{ > +uint64_t hi, lo; > + > +assert(fault < VTD_FR_MAX); > + > +trace_vtd_dmar_fault(source_id, fault, addr, is_write); > + > +lo = VTD_FRCD_FI(addr); > +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) | > + VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid); > +if (!is_write) { > +hi |= VTD_FRCD_T; > +} > + > +vtd_report_frcd_fault(s, source_id, hi, lo); > +} > + > + > +static void vtd_report_ir_fault(IntelIOMMUState *s, uint64_t source_id, > +VTDFaultReason fault, uint16_t index) > +{ > +uint64_t hi, lo; > + > +lo = VTD_FRCD_IR_IDX(index); > +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault); > + > +vtd_report_frcd_fault(s, source_id, hi, lo); > +} > + > /* Handle Invalidation Queue Errors of queued invalidation interface error > * conditions. > */ > @@ -3300,8 +3318,9 @@ static Property vtd_properties[] = { > }; > > /* Read IRTE entry with specific index */ > -static int vtd_irte_get(IntelIOMMUSta
Re: [PATCH] kvm: Remove KVM_CREATE_IRQCHIP support assumption
On Mon, Jul 24, 2023 at 11:53:39AM +0200, Thomas Huth wrote: > On 22/07/2023 08.21, Andrew Jones wrote: > > Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA > > irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the > > RISC-V platform has AIA. The cap indicates KVM supports at least one > > of the following ioctls: > > > >KVM_CREATE_IRQCHIP > >KVM_IRQ_LINE > >KVM_GET_IRQCHIP > >KVM_SET_IRQCHIP > >KVM_GET_LAPIC > >KVM_SET_LAPIC > > > > but the cap doesn't imply that KVM must support any of those ioctls > > in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP > > ioctl was supported. Stop making that assumption by introducing a > > KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP > > sets. Adding parameters isn't awesome, but given how the > > KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of > > options. > > > > Signed-off-by: Andrew Jones > > --- > > > > While this fixes booting guests on riscv KVM with AIA it's unlikely > > to get merged before the QEMU support for KVM AIA[1] lands, which > > would also fix the issue. I think this patch is still worth considering > > though since QEMU's assumption is wrong. > > > > [1] > > https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/ > > > > > > accel/kvm/kvm-all.c| 5 - > > include/sysemu/kvm.h | 1 + > > target/arm/kvm.c | 3 +++ > > target/i386/kvm/kvm.c | 2 ++ > > target/s390x/kvm/kvm.c | 3 +++ > > 5 files changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c > > index 373d876c0580..0f5ff8630502 100644 > > --- a/accel/kvm/kvm-all.c > > +++ b/accel/kvm/kvm-all.c > > @@ -86,6 +86,7 @@ struct KVMParkedVcpu { > > }; > > KVMState *kvm_state; > > +bool kvm_has_create_irqchip; > > bool kvm_kernel_irqchip; > > bool kvm_split_irqchip; > > bool kvm_async_interrupts_allowed; > > @@ -2377,8 +2378,10 @@ static void kvm_irqchip_create(KVMState *s) > > if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) { > > error_report("Split IRQ chip mode not supported."); > > exit(1); > > -} else { > > +} else if (kvm_has_create_irqchip) { > > ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP); > > +} else { > > +return; > > } > > } > > if (ret < 0) { > > I think I'd do this differntly... at the beginning of the function, there is > a check for kvm_check_extension(s, KVM_CAP_IRQCHIP) etc. ... I think you > could now replace that check with a simple > > if (!kvm_has_create_irqchip) { > return; > } > > The "kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0)" of course has to be > moved to the target/s390x/kvm/kvm.c file, too. Actually, once we've moved the s390 cap enablement to the s390 file we can just drop the whole if-else chain. We don't want the if (!kvm_has_create_irqchip) at the top because we want to try kvm_arch_irqchip_create() even when kvm_has_create_irqchip is false, and we don't need to check KVM_CREATE_IRQCHIP for kvm_arch_irqchip_create() either. Keeping the check, as it is above in this v1, of kvm_has_create_irqchip for KVM_CREATE_IRQCHIP is still necessary, though. Thanks, drew
[PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption
Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the RISC-V platform has AIA. The cap indicates KVM supports at least one of the following ioctls: KVM_CREATE_IRQCHIP KVM_IRQ_LINE KVM_GET_IRQCHIP KVM_SET_IRQCHIP KVM_GET_LAPIC KVM_SET_LAPIC but the cap doesn't imply that KVM must support any of those ioctls in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP ioctl was supported. Stop making that assumption by introducing a KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP sets. Adding parameters isn't awesome, but given how the KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of options. Signed-off-by: Andrew Jones --- While this fixes booting guests on riscv KVM with AIA it's unlikely to get merged before the QEMU support for KVM AIA[1] lands, which would also fix the issue. I think this patch is still worth considering though since QEMU's assumption is wrong. [1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/ v2: - Move the s390x code to an s390x file. [Thomas] - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(), as it's no longer necessary. accel/kvm/kvm-all.c| 16 include/sysemu/kvm.h | 1 + target/arm/kvm.c | 3 +++ target/i386/kvm/kvm.c | 2 ++ target/s390x/kvm/kvm.c | 11 +++ 5 files changed, 21 insertions(+), 12 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 373d876c0580..cddcb6eca641 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -86,6 +86,7 @@ struct KVMParkedVcpu { }; KVMState *kvm_state; +bool kvm_has_create_irqchip; bool kvm_kernel_irqchip; bool kvm_split_irqchip; bool kvm_async_interrupts_allowed; @@ -2358,17 +2359,6 @@ static void kvm_irqchip_create(KVMState *s) int ret; assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO); -if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) { -; -} else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) { -ret = kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0); -if (ret < 0) { -fprintf(stderr, "Enable kernel irqchip failed: %s\n", strerror(-ret)); -exit(1); -} -} else { -return; -} /* First probe and see if there's a arch-specific hook to create the * in-kernel irqchip for us */ @@ -2377,8 +2367,10 @@ static void kvm_irqchip_create(KVMState *s) if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) { error_report("Split IRQ chip mode not supported."); exit(1); -} else { +} else if (kvm_has_create_irqchip) { ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP); +} else { +return; } } if (ret < 0) { diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 115f0cca79d1..84b1bb3dc91e 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -32,6 +32,7 @@ #ifdef CONFIG_KVM_IS_POSSIBLE extern bool kvm_allowed; +extern bool kvm_has_create_irqchip; extern bool kvm_kernel_irqchip; extern bool kvm_split_irqchip; extern bool kvm_async_interrupts_allowed; diff --git a/target/arm/kvm.c b/target/arm/kvm.c index b4c7654f4980..2fa87b495d68 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -250,6 +250,9 @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool *fixed_ipa) int kvm_arch_init(MachineState *ms, KVMState *s) { int ret = 0; + +kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_IRQCHIP); + /* For ARM interrupt delivery is always asynchronous, * whether we are using an in-kernel VGIC or not. */ diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index ebfaf3d24c79..6363e67f092d 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -2771,6 +2771,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } } +kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_IRQCHIP); + return 0; } diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c index a9e5880349d9..bcc735227f7d 100644 --- a/target/s390x/kvm/kvm.c +++ b/target/s390x/kvm/kvm.c @@ -391,6 +391,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s) } kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES); + +kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_S390_IRQCHIP); +if (kvm_has_create_irqchip) { +int ret = kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0); + +if (ret < 0) { +fprintf(stderr, "Enable kernel irqchip failed: %s\n", strerror(-ret)); +exit(1); +} +} + return 0; } -- 2.41.0
Re: [PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption
On 25/07/2023 14.26, Andrew Jones wrote: Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the RISC-V platform has AIA. The cap indicates KVM supports at least one of the following ioctls: KVM_CREATE_IRQCHIP KVM_IRQ_LINE KVM_GET_IRQCHIP KVM_SET_IRQCHIP KVM_GET_LAPIC KVM_SET_LAPIC but the cap doesn't imply that KVM must support any of those ioctls in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP ioctl was supported. Stop making that assumption by introducing a KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP sets. Adding parameters isn't awesome, but given how the KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of options. Signed-off-by: Andrew Jones --- While this fixes booting guests on riscv KVM with AIA it's unlikely to get merged before the QEMU support for KVM AIA[1] lands, which would also fix the issue. I think this patch is still worth considering though since QEMU's assumption is wrong. [1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/ v2: - Move the s390x code to an s390x file. [Thomas] - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(), as it's no longer necessary. Looks good now! Reviewed-by: Thomas Huth
Re: [PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption
On 25/7/23 14:26, Andrew Jones wrote: Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the RISC-V platform has AIA. The cap indicates KVM supports at least one of the following ioctls: KVM_CREATE_IRQCHIP KVM_IRQ_LINE KVM_GET_IRQCHIP KVM_SET_IRQCHIP KVM_GET_LAPIC KVM_SET_LAPIC but the cap doesn't imply that KVM must support any of those ioctls in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP ioctl was supported. Stop making that assumption by introducing a KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP sets. Adding parameters isn't awesome, but given how the KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of options. Signed-off-by: Andrew Jones --- While this fixes booting guests on riscv KVM with AIA it's unlikely to get merged before the QEMU support for KVM AIA[1] lands, which would also fix the issue. I think this patch is still worth considering though since QEMU's assumption is wrong. [1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/ v2: - Move the s390x code to an s390x file. [Thomas] - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(), as it's no longer necessary. accel/kvm/kvm-all.c| 16 include/sysemu/kvm.h | 1 + target/arm/kvm.c | 3 +++ target/i386/kvm/kvm.c | 2 ++ target/s390x/kvm/kvm.c | 11 +++ 5 files changed, 21 insertions(+), 12 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH] ui/dbus: fix win32 compilation when !opengl
On 25/7/23 13:25, marcandre.lur...@redhat.com wrote: From: Marc-Andre Lureau Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1782 Signed-off-by: Marc-André Lureau --- ui/dbus-listener.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v3 0/8] misc AHCI cleanups
Hi Niklas, John, Paolo, Kevin, On 19/7/23 12:47, Niklas Cassel wrote: Niklas Cassel (8): hw/ide/ahci: remove stray backslash hw/ide/core: set ERR_STAT in unsupported command completion hw/ide/ahci: write D2H FIS when processing NCQ command hw/ide/ahci: simplify and document PxCI handling hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set hw/ide/ahci: fix ahci_write_fis_sdb() hw/ide/ahci: fix broken SError handling hw/ide/ahci.c | 112 +++--- hw/ide/core.c | 2 +- tests/qtest/libqos/ahci.c | 106 +++- tests/qtest/libqos/ahci.h | 8 +-- 4 files changed, 164 insertions(+), 64 deletions(-) -- 2.40.1 Hello Philippe, Considering that you picked up my patch, "hw/ide/ahci: remove stray backslash" (patch 1/8 in this series), and since John seems to have gone silent for 40+ days, could you please consider taking this series through your misc tree? (First patch was a cleanup) Niklas, I don't feel confident enough :/ John, Paolo, Kevin, do you Ack? Regards, Phil.
Re: [PATCH for-8.1] hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port()
On 25/7/23 13:36, Peter Maydell wrote: In query_port() we pass the address of a local pvrdma_port_attr struct to the rdma_query_backend_port() function. Unfortunately, rdma_backend_query_port() wants a pointer to a struct ibv_port_attr, and the two are not the same length. Coverity spotted this (CID 1507146): pvrdma_port_attr is 48 bytes long, and ibv_port_attr is 52 bytes, because it has a few extra fields at the end. Fortunately, all we do with the attrs struct after the call is to read a few specific fields out of it which are all at the same offsets in both structs, so we can simply make the local variable the correct type. This also lets us drop the cast (which should have been a bit of a warning flag that we were doing something wrong here). Fortunate but also kind of amusing :) Reviewed-by: Philippe Mathieu-Daudé Cc: qemu-sta...@nongnu.org Signed-off-by: Peter Maydell --- I don't know anything about the rdma code so this fix is based purely on looking at the code, and is untested beyond just make check/make check-avocado. --- hw/rdma/vmw/pvrdma_cmd.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
On 25.07.23 12:03, Eugenio Perez Martin wrote: On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek wrote: On 24.07.23 17:48, Eugenio Perez Martin wrote: On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek wrote: On 21.07.23 17:25, Eugenio Perez Martin wrote: On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek wrote: Move the `suspended` field from vhost_vdpa into the global vhost_dev struct, so vhost_dev_stop() can check whether the back-end has been suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, there is no need to reset it; the reset is just a fall-back to stop device operations for back-ends that do not support suspend. Unfortunately, for vDPA specifically, RESUME is not yet implemented, so when the device is re-started, we still have to do the reset to have it un-suspend. Signed-off-by: Hanna Czenczek --- include/hw/virtio/vhost-vdpa.h | 2 -- include/hw/virtio/vhost.h | 8 hw/virtio/vhost-vdpa.c | 11 +++ hw/virtio/vhost.c | 8 +++- 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index e64bfc7f98..72c3686b7f 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { bool shadow_vqs_enabled; /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ bool shadow_data; -/* Device suspended successfully */ -bool suspended; /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; GPtrArray *shadow_vqs; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6a173cb9fa..69bf59d630 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -120,6 +120,14 @@ struct vhost_dev { uint64_t backend_cap; /* @started: is the vhost device started? */ bool started; +/** + * @suspended: Whether the vhost device is currently suspended. Set + * and reset by implementations (vhost-user, vhost-vdpa, ...), which + * are supposed to automatically suspend/resume in their + * vhost_dev_start handlers as required. Must also be cleared when + * the device is reset. + */ +bool suspended; bool log_enabled; uint64_t log_size; Error *migration_blocker; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 7b7dee468e..f7fd19a203 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, static int vhost_vdpa_reset_device(struct vhost_dev *dev) { -struct vhost_vdpa *v = dev->opaque; int ret; uint8_t status = 0; ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); trace_vhost_vdpa_reset_device(dev); -v->suspended = false; +dev->suspended = false; return ret; } @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) if (unlikely(r)) { error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); } else { -v->suspended = true; +dev->suspended = true; return; } } @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) return -1; } vhost_vdpa_set_vring_ready(dev); +if (dev->suspended) { +/* TODO: When RESUME is available, use it instead of resetting */ +vhost_vdpa_reset_status(dev); How is that we reset the status at each vhost_vdpa_dev_start? That will clean all the vqs configured, features negotiated, etc. in the vDPA device. Or am I missing something? What alternative do you propose? We don’t have RESUME for vDPA in qemu, but we somehow need to lift the previous SUSPEND so the device will again respond to guest requests, do we not? Reset also clears the suspend state in vDPA, and it should be called at vhost_dev_stop. So the device should never be in suspended state here. Does that solve your concerns? My intention with this patch was precisely not to reset in vhost_dev_stop when suspending is supported. So now I’m more confused than before. At this moment, I think that should be focused as an optimization and not to be included in the main series. It is absolutely not an optimization but vital for my use case. virtiofsd does not currently implement resetting, but if it did (and we want this support in the future), resetting it during stop/cont would be catastrophic. There must be a way to have qemu not issue a reset. This patch is the reason why this series exists. But more generally, is this any different from what is done before this patch? Before this patch, vhost_dev_stop() unconditionally invokes vhost_reset_status(), so the device is reset in every stop/start cycle, that doesn’t
Re: intel-iommu: Report interrupt remapping faults, fix return value
On Tue, 2023-07-25 at 08:03 -0400, Michael S. Tsirkin wrote: > On Tue, Jul 25, 2023 at 11:01:16AM +0100, David Woodhouse wrote: > > From: David Woodhouse > > > > A generic X86IOMMUClass->int_remap function should not return VT-d > > specific values; fix it to return 0 if the interrupt was successfully > > translated or -EINVAL if not. > > > > The VTD_FR_IR_xxx values are supposed to be used to actually raise > > faults through the fault reporting mechanism, so do that instead for > > the case where the IRQ is actually being injected. > > > > There is more work to be done here, as pretranslations for the KVM IRQ > > routing table can't fault; an untranslatable IRQ should be handled in > > userspace and the fault raised only when the IRQ actually happens (if > > indeed the IRTE is still not valid at that time). But we can work on > > that later; we can at least raise faults for the direct case. > > > > Signed-off-by: David Woodhouse > > > looks like post 8.1 material yes? Makes sense. I just found it lying around in my tree and figured it was better posted than sitting there for another few months. I do still want to get round to fixing the reports for passthrough interrupts at some point. That annoys me. :) smime.p7s Description: S/MIME cryptographic signature
[PATCH] memory: avoid updating ioeventfds for some address_space
When updating ioeventfds, we need to iterate all address spaces, but some address spaces do not register eventfd_add|del call when memory_listener_register() and they do nothing when updating ioeventfds. So we can skip these AS in address_space_update_ioeventfds(). The overhead of memory_region_transaction_commit() can be significantly reduced. For example, a VM with 8 vhost net devices and each one has 64 vectors, can reduce the time spent on memory_region_transaction_commit by 20%. Signed-off-by: hongmianquan --- include/exec/memory.h | 1 + softmmu/memory.c | 12 2 files changed, 13 insertions(+) diff --git a/include/exec/memory.h b/include/exec/memory.h index 7f5c11a0cc..556f4f1871 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1089,6 +1089,7 @@ struct AddressSpace { struct FlatView *current_map; int ioeventfd_nb; +int ioeventfd_notifiers; struct MemoryRegionIoeventfd *ioeventfds; QTAILQ_HEAD(, MemoryListener) listeners; QTAILQ_ENTRY(AddressSpace) address_spaces_link; diff --git a/softmmu/memory.c b/softmmu/memory.c index 7d9494ce70..77042d3f93 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -842,6 +842,10 @@ static void address_space_update_ioeventfds(AddressSpace *as) AddrRange tmp; unsigned i; +if (!as->ioeventfd_notifiers) { +return; +} + /* * It is likely that the number of ioeventfds hasn't changed much, so use * the previous size as the starting value, with some headroom to avoid @@ -3075,6 +3079,10 @@ void memory_listener_register(MemoryListener *listener, AddressSpace *as) } listener_add_address_space(listener, as); + +if (listener->eventfd_add || listener->eventfd_del) { +as->ioeventfd_notifiers++; +} } void memory_listener_unregister(MemoryListener *listener) @@ -3083,6 +3091,10 @@ void memory_listener_unregister(MemoryListener *listener) return; } +if (listener->eventfd_add || listener->eventfd_del) { +as->ioeventfd_notifiers--; +} + listener_del_address_space(listener, listener->address_space); QTAILQ_REMOVE(&memory_listeners, listener, link); QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as); -- 2.11.0
[PATCH] memory: avoid updating ioeventfds for some address_space
When updating ioeventfds, we need to iterate all address spaces, but some address spaces do not register eventfd_add|del call when memory_listener_register() and they do nothing when updating ioeventfds. So we can skip these AS in address_space_update_ioeventfds(). The overhead of memory_region_transaction_commit() can be significantly reduced. For example, a VM with 8 vhost net devices and each one has 64 vectors, can reduce the time spent on memory_region_transaction_commit by 20%. Signed-off-by: hongmianquan --- include/exec/memory.h | 1 + softmmu/memory.c | 12 2 files changed, 13 insertions(+) diff --git a/include/exec/memory.h b/include/exec/memory.h index 7f5c11a0cc..556f4f1871 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1089,6 +1089,7 @@ struct AddressSpace { struct FlatView *current_map; int ioeventfd_nb; +int ioeventfd_notifiers; struct MemoryRegionIoeventfd *ioeventfds; QTAILQ_HEAD(, MemoryListener) listeners; QTAILQ_ENTRY(AddressSpace) address_spaces_link; diff --git a/softmmu/memory.c b/softmmu/memory.c index 7d9494ce70..178816c845 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -842,6 +842,10 @@ static void address_space_update_ioeventfds(AddressSpace *as) AddrRange tmp; unsigned i; +if (!as->ioeventfd_notifiers) { +return; +} + /* * It is likely that the number of ioeventfds hasn't changed much, so use * the previous size as the starting value, with some headroom to avoid @@ -3075,6 +3079,10 @@ void memory_listener_register(MemoryListener *listener, AddressSpace *as) } listener_add_address_space(listener, as); + +if (listener->eventfd_add || listener->eventfd_del) { +as->ioeventfd_notifiers++; +} } void memory_listener_unregister(MemoryListener *listener) @@ -3083,6 +3091,10 @@ void memory_listener_unregister(MemoryListener *listener) return; } +if (listener->eventfd_add || listener->eventfd_del) { +listener->address_space->ioeventfd_notifiers--; +} + listener_del_address_space(listener, listener->address_space); QTAILQ_REMOVE(&memory_listeners, listener, link); QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as); -- 2.11.0
Re: [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode
Ok noted, thanks for the feedback Greg and Cedric. Thanks, Gautam
[PATCH] Open file as read only on private mapping in qemu_ram_alloc_from_file
An read only file can be mapped with read write as long as the mapping is private, which is very common case. Make qemu_ram_alloc_from_file open file as read only when the mapping is private, otherwise open will fail when file does not allow write. If this file does not exist or is a directory, the flag is not used, so it should be OK. from https://gitlab.com/qemu-project/qemu/-/issues/1689 Signed-off-by: Thiner Logoer --- softmmu/physmem.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 3df73542e1..e8036ee335 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr, int fd; bool created; RAMBlock *block; + +/* + * If map is private, the fd does not need to be writable. + * This only get effective when the file is existent. + */ +bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED); -fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created, +fd = file_ram_open(mem_path, memory_region_name(mr), + open_as_readonly, &created, errp); if (fd < 0) { return NULL; -- 2.40.1
[PATCH] migration/ram: Refactor precopy ram loading code
From: Nikolay Borisov Extract the ramblock parsing code into a routine that operates on the sequence of headers from the stream and another the parses the individual ramblock. This makes ram_load_precopy() easier to comprehend. Signed-off-by: Nikolay Borisov Signed-off-by: Fabiano Rosas --- I'm extracting the parts from the fixed-ram migration series [1] that could already go in. This patch is one of them. 1- https://lore.kernel.org/r/20230330180336.2791-1-faro...@suse.de --- migration/ram.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 0ada6477e8..685e129b70 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3826,6 +3826,82 @@ void colo_flush_ram_cache(void) trace_colo_flush_ram_cache_end(); } +static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length) +{ +int ret = 0; +/* ADVISE is earlier, it shows the source has the postcopy capability on */ +bool postcopy_advised = migration_incoming_postcopy_advised(); + +assert(block); + +if (!qemu_ram_is_migratable(block)) { +error_report("block %s should not be migrated !", block->idstr); +return -EINVAL; +} + +if (length != block->used_length) { +Error *local_err = NULL; + +ret = qemu_ram_resize(block, length, &local_err); +if (local_err) { +error_report_err(local_err); +} +} +/* For postcopy we need to check hugepage sizes match */ +if (postcopy_advised && migrate_postcopy_ram() && +block->page_size != qemu_host_page_size) { +uint64_t remote_page_size = qemu_get_be64(f); +if (remote_page_size != block->page_size) { +error_report("Mismatched RAM page size %s " + "(local) %zd != %" PRId64, block->idstr, + block->page_size, remote_page_size); +ret = -EINVAL; +} +} +if (migrate_ignore_shared()) { +hwaddr addr = qemu_get_be64(f); +if (migrate_ram_is_ignored(block) && +block->mr->addr != addr) { +error_report("Mismatched GPAs for block %s " + "%" PRId64 "!= %" PRId64, block->idstr, + (uint64_t)addr, (uint64_t)block->mr->addr); +ret = -EINVAL; +} +} +ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr); + +return ret; +} + +static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes) +{ +int ret = 0; + +/* Synchronize RAM block list */ +while (!ret && total_ram_bytes) { +RAMBlock *block; +char id[256]; +ram_addr_t length; +int len = qemu_get_byte(f); + +qemu_get_buffer(f, (uint8_t *)id, len); +id[len] = 0; +length = qemu_get_be64(f); + +block = qemu_ram_block_by_name(id); +if (block) { +ret = parse_ramblock(f, block, length); +} else { +error_report("Unknown ramblock \"%s\", cannot accept " + "migration", id); +ret = -EINVAL; +} +total_ram_bytes -= length; +} + +return ret; +} + /** * ram_load_precopy: load pages in precopy case * @@ -3840,14 +3916,13 @@ static int ram_load_precopy(QEMUFile *f) { MigrationIncomingState *mis = migration_incoming_get_current(); int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0; -/* ADVISE is earlier, it shows the source has the postcopy capability on */ -bool postcopy_advised = migration_incoming_postcopy_advised(); + if (!migrate_compress()) { invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { -ram_addr_t addr, total_ram_bytes; +ram_addr_t addr; void *host = NULL, *host_bak = NULL; uint8_t ch; @@ -3918,65 +3993,7 @@ static int ram_load_precopy(QEMUFile *f) switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { case RAM_SAVE_FLAG_MEM_SIZE: -/* Synchronize RAM block list */ -total_ram_bytes = addr; -while (!ret && total_ram_bytes) { -RAMBlock *block; -char id[256]; -ram_addr_t length; - -len = qemu_get_byte(f); -qemu_get_buffer(f, (uint8_t *)id, len); -id[len] = 0; -length = qemu_get_be64(f); - -block = qemu_ram_block_by_name(id); -if (block && !qemu_ram_is_migratable(block)) { -error_report("block %s should not be migrated !", id); -ret = -EINVAL; -} else if (block) { -if (length != block->used_length) { -Error *local_err = NULL; - -ret = qemu_ram_resize(block, length, - &local_err); -
Re: [PATCH] migration/ram: Refactor precopy ram loading code
On 25/7/23 15:26, Fabiano Rosas wrote: From: Nikolay Borisov Extract the ramblock parsing code into a routine that operates on the sequence of headers from the stream and another the parses the individual ramblock. This makes ram_load_precopy() easier to comprehend. Signed-off-by: Nikolay Borisov Signed-off-by: Fabiano Rosas --- I'm extracting the parts from the fixed-ram migration series [1] that could already go in. This patch is one of them. 1- https://lore.kernel.org/r/20230330180336.2791-1-faro...@suse.de --- migration/ram.c | 141 +++- 1 file changed, 79 insertions(+), 62 deletions(-) I'd rather 1 patch extracting parse_ramblock() then another one for parse_ramblocks(), anyhow: Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 05/12] virtio-sound: prepare PCM streams
Hi On Thu, Jul 20, 2023 at 4:59 PM Emmanouil Pitsidianakis < manos.pitsidiana...@linaro.org> wrote: > After setting PCM parameters, instantiate ("prepare") each stream in > virtio_snd_pcm_prepare_impl(). > > Signed-off-by: Emmanouil Pitsidianakis > --- > hw/virtio/virtio-snd.c | 133 + > 1 file changed, 133 insertions(+) > > diff --git a/hw/virtio/virtio-snd.c b/hw/virtio/virtio-snd.c > index 62d9bf557c..ca09c937c7 100644 > --- a/hw/virtio/virtio-snd.c > +++ b/hw/virtio/virtio-snd.c > @@ -200,6 +200,132 @@ uint32_t virtio_snd_pcm_set_params_impl(VirtIOSound > *s, > return VIRTIO_SND_S_OK; > } > > +/* > + * Get a QEMU Audiosystem compatible format value from a > VIRTIO_SND_PCM_FMT_* > + */ > +static AudioFormat virtio_snd_get_qemu_format(uint32_t format) > +{ > +#define CASE(FMT) \ > +case VIRTIO_SND_PCM_FMT_##FMT: \ > +return AUDIO_FORMAT_##FMT; > + > +switch (format) { > +CASE(U8) > +CASE(S8) > +CASE(U16) > +CASE(S16) > +CASE(U32) > +CASE(S32) > +case VIRTIO_SND_PCM_FMT_FLOAT: > +return AUDIO_FORMAT_F32; > +default: > +g_assert_not_reached(); > +} > + > +#undef CASE > +} > + > +/* > + * Get a QEMU Audiosystem compatible frequency value from a > + * VIRTIO_SND_PCM_RATE_* > + */ > +static uint32_t virtio_snd_get_qemu_freq(uint32_t rate) > +{ > +#define CASE(RATE) \ > +case VIRTIO_SND_PCM_RATE_##RATE: \ > +return RATE; > + > +switch (rate) { > +CASE(5512) > +CASE(8000) > +CASE(11025) > +CASE(16000) > +CASE(22050) > +CASE(32000) > +CASE(44100) > +CASE(48000) > +CASE(64000) > +CASE(88200) > +CASE(96000) > +CASE(176400) > +CASE(192000) > +CASE(384000) > +default: > +g_assert_not_reached(); > +} > + > +#undef CASE > +} > + > +/* > + * Get QEMU Audiosystem compatible audsettings from virtio based pcm > stream > + * params. > + */ > +static void virtio_snd_get_qemu_audsettings(audsettings *as, > +VirtIOSoundPCMParams *params) > +{ > +as->nchannels = MIN(AUDIO_MAX_CHANNELS, params->channels); > +as->fmt = virtio_snd_get_qemu_format(params->format); > +as->freq = virtio_snd_get_qemu_freq(params->rate); > +as->endianness = AUDIO_HOST_ENDIANNESS; > +} > + > +/* > + * Prepares a VirtIOSound card stream. > + * Returns the response status code. (VIRTIO_SND_S_*). > + * > + * @s: VirtIOSound device > + * @stream_id: stream id > + */ > +static uint32_t virtio_snd_pcm_prepare_impl(VirtIOSound *s, uint32_t > stream_id) > +{ > +audsettings as; > +VirtIOSoundPCMParams *params; > +VirtIOSoundPCMStream *stream; > + > +if (!s->pcm->streams || > +!s->pcm->pcm_params || > +!s->pcm->pcm_params[stream_id]) { > +return VIRTIO_SND_S_BAD_MSG; > +} > + > +params = virtio_snd_pcm_get_params(s, stream_id); > +if (!params) { > +return VIRTIO_SND_S_BAD_MSG; > +} > + > +virtio_snd_get_qemu_audsettings(&as, params); > + > +stream = g_new0(VirtIOSoundPCMStream, 1); > + > +stream->id = stream_id; > +stream->pcm = s->pcm; > +stream->direction = stream_id < s->snd_conf.streams / 2 + > +(s->snd_conf.streams & 1) ? VIRTIO_SND_D_OUTPUT : > VIRTIO_SND_D_INPUT; > +stream->info.hdr.hda_fn_nid = VIRTIO_SOUND_HDA_FN_NID; > +stream->features = 0; > +stream->channels_min = 1; > +stream->channels_max = as.nchannels; > +stream->formats = supported_formats; > +stream->rates = supported_rates; > +stream->s = s; > + > +stream->buffer_bytes = params->buffer_bytes; > +stream->period_bytes = params->period_bytes; > + > +stream->positions[0] = VIRTIO_SND_CHMAP_FL; > +stream->positions[1] = VIRTIO_SND_CHMAP_FR; > + > +stream->as = as; > +stream->desired_as = stream->as; > +qemu_mutex_init(&stream->queue_mutex); > +QSIMPLEQ_INIT(&stream->queue); > + > +s->pcm->streams[stream_id] = stream; > Shouldn't it close & free the existing stream? Or return an error? > + > +return VIRTIO_SND_S_OK; > +} > + > /* > * The actual processing done in virtio_snd_process_cmdq(). > * > @@ -432,6 +558,13 @@ static void virtio_snd_common_realize(DeviceState > *dev, > print_code(status)); > return; > } > +status = virtio_snd_pcm_prepare_impl(vsnd, i); > +if (status != VIRTIO_SND_S_OK) { > +error_setg(errp, > + "Can't prepare streams, device responded with %s.", > + print_code(status)); > +return; > +} > } > } > > -- > 2.39.2 > > > -- Marc-André Lureau
Re: [PATCH v21 05/20] s390x/cpu topology: resetting the Topology-Change-Report
On Fri, 2023-06-30 at 11:17 +0200, Pierre Morel wrote: > During a subsystem reset the Topology-Change-Report is cleared > by the machine. > Let's ask KVM to clear the Modified Topology Change Report (MTCR) > bit of the SCA in the case of a subsystem reset. > > Signed-off-by: Pierre Morel > Reviewed-by: Thomas Huth Reviewed-by: Nina Schoetterl-Glausch
[Stable-8.0.4 04/31] linux-user: Fix accept4(SOCK_NONBLOCK) syscall
From: Helge Deller The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set. Change the qemu implementation accordingly, which means we can not use the fcntl_flags_tbl[] translation table which allows too many other values. Beside the correction in behaviour, this actually fixes the accept4() emulation for hppa, mips and alpha targets for which SOCK_NONBLOCK is different than TARGET_SOCK_NONBLOCK (aka O_NONBLOCK). The fix can be verified with the testcase of the debian lwt package, which hangs forever in a read() syscall without this patch. Signed-off-by: Helge Deller Reviewed-by: Richard Henderson (cherry picked from commit dca4c8384d68bbf5d67f50a5446865d92d61f032) Signed-off-by: Michael Tokarev diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 333e6b7026..0901884495 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -3440,7 +3440,17 @@ static abi_long do_accept4(int fd, abi_ulong target_addr, abi_long ret; int host_flags; -host_flags = target_to_host_bitmask(flags, fcntl_flags_tbl); +if (flags & ~(TARGET_SOCK_CLOEXEC | TARGET_SOCK_NONBLOCK)) { +return -TARGET_EINVAL; +} + +host_flags = 0; +if (flags & TARGET_SOCK_NONBLOCK) { +host_flags |= SOCK_NONBLOCK; +} +if (flags & TARGET_SOCK_CLOEXEC) { +host_flags |= SOCK_CLOEXEC; +} if (target_addr == 0) { return get_errno(safe_accept4(fd, NULL, NULL, host_flags)); -- 2.39.2
[Stable-8.0.4 01/31] virtio-net: correctly report maximum tx_queue_size value
From: Laurent Vivier Maximum value for tx_queue_size depends on the backend type. 1024 for vDPA/vhost-user, 256 for all the others. The value is returned by virtio_net_max_tx_queue_size() to set the parameter: n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n), n->net_conf.tx_queue_size); But the parameter checking uses VIRTQUEUE_MAX_SIZE (1024). So the parameter is silently ignored and ethtool reports a different value than the one provided by the user. ... -netdev tap,... -device virtio-net,tx_queue_size=1024 # ethtool -g enp0s2 Ring parameters for enp0s2: Pre-set maximums: RX: 256 RX Mini:n/a RX Jumbo: n/a TX: 256 Current hardware settings: RX: 256 RX Mini:n/a RX Jumbo: n/a TX: 256 ... -netdev vhost-user,... -device virtio-net,tx_queue_size=2048 Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024 With this patch the correct maximum value is checked and displayed. For vDPA/vhost-user: Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024 For all the others: Invalid tx_queue_size (= 512), must be a power of 2 between 256 and 256 Fixes: 2eef278b9e63 ("virtio-net: fix tx queue size for !vhost-user") Cc: m...@redhat.com Cc: qemu-sta...@nongnu.org Signed-off-by: Laurent Vivier Signed-off-by: Jason Wang (cherry picked from commit 4271f4038372f174dbafffacca1a748d058a03ba) Signed-off-by: Michael Tokarev diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index af1e89706c..5c0a771170 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -3629,12 +3629,12 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) } if (n->net_conf.tx_queue_size < VIRTIO_NET_TX_QUEUE_MIN_SIZE || -n->net_conf.tx_queue_size > VIRTQUEUE_MAX_SIZE || +n->net_conf.tx_queue_size > virtio_net_max_tx_queue_size(n) || !is_power_of_2(n->net_conf.tx_queue_size)) { error_setg(errp, "Invalid tx_queue_size (= %" PRIu16 "), " "must be a power of 2 between %d and %d", n->net_conf.tx_queue_size, VIRTIO_NET_TX_QUEUE_MIN_SIZE, - VIRTQUEUE_MAX_SIZE); + virtio_net_max_tx_queue_size(n)); virtio_cleanup(vdev); return; } -- 2.39.2
[Stable-8.0.4 02/31] qemu_cleanup: begin drained section after vm_shutdown()
From: Fiona Ebner in order to avoid requests being stuck in a BlockBackend's request queue during cleanup. Having such requests can lead to a deadlock [0] with a virtio-scsi-pci device using iothread that's busy with IO when initiating a shutdown with QMP 'quit'. There is a race where such a queued request can continue sometime (maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1]. The completion will hold the AioContext lock and wait for the BQL during SCSI completion, but the main thread will hold the BQL and wait for the AioContext as part of bdrv_root_unref_child(), leading to the deadlock [0]. [0]: > Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"): > #0 __lll_lock_wait (futex=futex@entry=0x564183365f00 , > private=0) at lowlevellock.c:52 > #1 0x7f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 > ) at ../nptl/pthread_mutex_lock.c:80 > #2 0x564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 > , file=0x564182b7f774 "../softmmu/physmem.c", line=2593) > at ../util/qemu-thread-posix.c:94 > #3 0x56418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 > "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504 > #4 0x5641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at > ../softmmu/physmem.c:2593 > #5 0x5641826d6fe7 in address_space_stl_internal (as=0x56418679b310, > addr=4276113408, val=16418, attrs=..., result=0x0, > endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318 > #6 0x5641826d7154 in address_space_stl_le (as=0x56418679b310, > addr=4276113408, val=16418, attrs=..., result=0x0) at > /home/febner/repos/qemu/memory_ldst.c.inc:357 > #7 0x564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at > ../hw/pci/pci.c:359 > #8 0x56418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at > ../hw/pci/msi.c:379 > #9 0x564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at > ../hw/pci/msix.c:542 > #10 0x56418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at > ../hw/virtio/virtio-pci.c:77 > #11 0x5641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, > vector=8) at ../hw/virtio/virtio.c:1985 > #12 0x5641826948d6 in virtio_irq (vq=0x5641867ac078) at > ../hw/virtio/virtio.c:2461 > #13 0x564182694978 in virtio_notify (vdev=0x5641867a34a0, > vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473 > #14 0x564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at > ../hw/scsi/virtio-scsi.c:115 > #15 0x5641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) > at ../hw/scsi/virtio-scsi.c:641 > #16 0x56418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, > resid=0) at ../hw/scsi/virtio-scsi.c:712 > #17 0x56418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at > ../hw/scsi/scsi-bus.c:1526 > #18 0x56418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, > acct_failed=false) at ../hw/scsi/scsi-disk.c:242 > #19 0x56418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, > ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265 > #20 0x56418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) > at ../hw/scsi/scsi-disk.c:340 > #21 0x56418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) > at ../hw/scsi/scsi-disk.c:371 > #22 0x5641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at > ../softmmu/dma-helpers.c:107 > #23 0x564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at > ../softmmu/dma-helpers.c:127 > #24 0x5641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at > ../block/block-backend.c:1563 > #25 0x5641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at > ../block/block-backend.c:1630 > #26 0x56418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at > ../util/coroutine-ucontext.c:177 > #27 0x7f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #28 0x7f3bbd8757f0 in ?? () > #29 0x in ?? () > > Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"): > #0 __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at > lowlevellock.c:52 > #1 0x7f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at > ../nptl/pthread_mutex_lock.c:115 > #2 0x564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, > file=0x564182c0e319 "../util/async.c", line=728) at > ../util/qemu-thread-posix.c:94 > #3 0x56418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, > file=0x564182c0e319 "../util/async.c", line=728) at > ../util/qemu-thread-posix.c:149 > #4 0x5641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at > ../util/async.c:728 > #5 0x56418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) > at ../block.c:7493 > #6 0x56418294e288 in tran_commit (tran=0x56418630bfe0) at > ../util/transactions.c:87 > #7 0x56418279d880 in bdrv_try_change_aio_co
[Stable-8.0.4 10/31] target/s390x: Fix LRA when DAT is off
From: Ilya Leoshkevich LRA should perform DAT regardless of whether it's on or off. Disable DAT check for MMU_S390_LRA. Fixes: defb0e3157af ("s390x: Implement opcode helpers") Signed-off-by: Ilya Leoshkevich Reviewed-by: David Hildenbrand Cc: qemu-sta...@nongnu.org Message-Id: <20230704081506.276055-7-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit b0ef81062d2404ccef0289b1cc6e70244901c9be) Signed-off-by: Michael Tokarev diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c index b04b57c235..fbb2f1b4d4 100644 --- a/target/s390x/mmu_helper.c +++ b/target/s390x/mmu_helper.c @@ -417,7 +417,7 @@ int mmu_translate(CPUS390XState *env, target_ulong vaddr, int rw, uint64_t asc, vaddr &= TARGET_PAGE_MASK; -if (!(env->psw.mask & PSW_MASK_DAT)) { +if (rw != MMU_S390_LRA && !(env->psw.mask & PSW_MASK_DAT)) { *raddr = vaddr; goto nodat; } -- 2.39.2
[Stable-8.0.4 06/31] target/s390x: Fix EPSW CC reporting
From: Ilya Leoshkevich EPSW should explicitly calculate and insert CC, like IPM does. Fixes: e30a9d3fea58 ("target-s390: Implement EPSW") Signed-off-by: Ilya Leoshkevich Reviewed-by: David Hildenbrand Cc: qemu-sta...@nongnu.org Message-Id: <20230704081506.276055-3-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit 110b1bac2ecd94a78a1d38003e24e37367bf074e) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c index 82900f53f4..0c22d2f17f 100644 --- a/target/s390x/tcg/translate.c +++ b/target/s390x/tcg/translate.c @@ -2393,10 +2393,14 @@ static DisasJumpType op_epsw(DisasContext *s, DisasOps *o) int r1 = get_field(s, r1); int r2 = get_field(s, r2); TCGv_i64 t = tcg_temp_new_i64(); +TCGv_i64 t_cc = tcg_temp_new_i64(); /* Note the "subsequently" in the PoO, which implies a defined result if r1 == r2. Thus we cannot defer these writes to an output hook. */ +gen_op_calc_cc(s); +tcg_gen_extu_i32_i64(t_cc, cc_op); tcg_gen_shri_i64(t, psw_mask, 32); +tcg_gen_deposit_i64(t, t, t_cc, 12, 2); store_reg32_i64(r1, t); if (r2 != 0) { store_reg32_i64(r2, psw_mask); -- 2.39.2
[Stable-8.0.4 17/31] linux-user: Make sure initial brk(0) is page-aligned
From: Andreas Schwab Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Signed-off-by: Andreas Schwab Message-Id: Reviewed-by: Richard Henderson Signed-off-by: Richard Henderson (cherry picked from commit d28b3c90cfad1a7e211ae2bce36ecb9071086129) Signed-off-by: Michael Tokarev diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 150d70633e..57aaa87e30 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -806,7 +806,7 @@ static abi_ulong brk_page; void target_set_brk(abi_ulong new_brk) { -target_brk = new_brk; +target_brk = TARGET_PAGE_ALIGN(new_brk); brk_page = HOST_PAGE_ALIGN(target_brk); } -- 2.39.2
[Stable-8.0.4 22/31] qemu-nbd: pass structure into nbd_client_thread instead of plain char*
From: "Denis V. Lunev" We are going to pass additional flag inside next patch. Signed-off-by: Denis V. Lunev CC: Eric Blake CC: Vladimir Sementsov-Ogievskiy CC: Message-ID: <20230717145544.194786-2-...@openvz.org> Reviewed-by: Eric Blake Signed-off-by: Eric Blake (cherry picked from commit 03b67621445d601c9cdc7dfe25812e9f19b81488) Signed-off-by: Michael Tokarev diff --git a/qemu-nbd.c b/qemu-nbd.c index 6ff45308a9..87c46bb0e5 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -272,9 +272,13 @@ static void *show_parts(void *arg) return NULL; } +struct NbdClientOpts { +char *device; +}; + static void *nbd_client_thread(void *arg) { -char *device = arg; +struct NbdClientOpts *opts = arg; NBDExportInfo info = { .request_sizes = false, .name = g_strdup("") }; QIOChannelSocket *sioc; int fd = -1; @@ -298,10 +302,10 @@ static void *nbd_client_thread(void *arg) goto out; } -fd = open(device, O_RDWR); +fd = open(opts->device, O_RDWR); if (fd < 0) { /* Linux-only, we can use %m in printf. */ -error_report("Failed to open %s: %m", device); +error_report("Failed to open %s: %m", opts->device); goto out; } @@ -311,11 +315,11 @@ static void *nbd_client_thread(void *arg) } /* update partition table */ -pthread_create(&show_parts_thread, NULL, show_parts, device); +pthread_create(&show_parts_thread, NULL, show_parts, opts->device); if (verbose) { fprintf(stderr, "NBD device %s is now connected to %s\n", -device, srcpath); +opts->device, srcpath); } else { /* Close stderr so that the qemu-nbd process exits. */ dup2(STDOUT_FILENO, STDERR_FILENO); @@ -1121,8 +1125,11 @@ int main(int argc, char **argv) if (device) { #if HAVE_NBD_DEVICE int ret; +struct NbdClientOpts opts = { +.device = device, +}; -ret = pthread_create(&client_thread, NULL, nbd_client_thread, device); +ret = pthread_create(&client_thread, NULL, nbd_client_thread, &opts); if (ret != 0) { error_report("Failed to create client thread: %s", strerror(ret)); exit(EXIT_FAILURE); -- 2.39.2
[Stable-8.0.4 15/31] linux-user/arm: Do not allocate a commpage at all for M-profile CPUs
From: Philippe Mathieu-Daudé Since commit fbd3c4cff6 ("linux-user/arm: Mark the commpage executable") executing bare-metal (linked with rdimon.specs) cortex-M code fails as: $ qemu-arm -cpu cortex-m3 ~/hello.exe.m3 qemu-arm: ../../accel/tcg/user-exec.c:492: page_set_flags: Assertion `last <= GUEST_ADDR_MAX' failed. Aborted (core dumped) Commit 4f5c67f8df ("linux-user/arm: Take more care allocating commpage") already took care of not allocating a commpage for M-profile CPUs, however it had to be reverted as commit 6cda41daa2. Re-introduce the M-profile fix from commit 4f5c67f8df. Fixes: fbd3c4cff6 ("linux-user/arm: Mark the commpage executable") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1755 Reported-by: Christophe Lyon Suggested-by: Richard Henderson Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Anton Johansson Reviewed-by: Richard Henderson Message-Id: <20230711153408.68389-1-phi...@linaro.org> Signed-off-by: Richard Henderson (cherry picked from commit d713cf4d6c71076513a10528303b3e337b4d5998) Signed-off-by: Michael Tokarev diff --git a/linux-user/elfload.c b/linux-user/elfload.c index f1370a7a8b..88ef26dc03 100644 --- a/linux-user/elfload.c +++ b/linux-user/elfload.c @@ -423,10 +423,23 @@ enum { static bool init_guest_commpage(void) { -abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size; -void *want = g2h_untagged(commpage); -void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE, - MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); +ARMCPU *cpu = ARM_CPU(thread_cpu); +abi_ptr commpage; +void *want; +void *addr; + +/* + * M-profile allocates maximum of 2GB address space, so can never + * allocate the commpage. Skip it. + */ +if (arm_feature(&cpu->env, ARM_FEATURE_M)) { +return true; +} + +commpage = HI_COMMPAGE & -qemu_host_page_size; +want = g2h_untagged(commpage); +addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE, +MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); if (addr == MAP_FAILED) { perror("Allocating guest commpage"); -- 2.39.2
[Stable-8.0.4 08/31] target/s390x: Fix MVCRL with a large value in R0
From: Ilya Leoshkevich Using a large R0 causes an assertion error: qemu-s390x: target/s390x/tcg/mem_helper.c:183: access_prepare_nf: Assertion `size > 0 && size <= 4096' failed. Even though PoP explicitly advises against using more than 8 bits for the size, an emulator crash is never a good thing. Fix by truncating the size to 8 bits. Fixes: ea0a1053e276 ("s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x") Signed-off-by: Ilya Leoshkevich Reviewed-by: David Hildenbrand Cc: qemu-sta...@nongnu.org Message-Id: <20230704081506.276055-5-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit 92a57534619a4058544ce8f9c0beae3e054f342b) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c index 8b58b8d88d..7141d0ad88 100644 --- a/target/s390x/tcg/mem_helper.c +++ b/target/s390x/tcg/mem_helper.c @@ -514,6 +514,7 @@ void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, uint64_t dest, uint64_t src) int32_t i; /* MVCRL always copies one more byte than specified - maximum is 256 */ +l &= 0xff; l++; access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); -- 2.39.2
[Stable-8.0.4 23/31] qemu-nbd: fix regression with qemu-nbd --fork run over ssh
From: "Denis V. Lunev" Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d Author: Hanna Reitz Date: Wed May 8 23:18:18 2019 +0200 qemu-nbd: Do not close stderr has introduced an interesting regression. Original behavior of ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork was the following: * qemu-nbd was started as a daemon * the command execution is done and ssh exited with success The patch has changed this behavior and 'ssh' command now hangs forever. According to the normal specification of the daemon() call, we should endup with STDERR pointing to /dev/null. That should be done at the very end of the successful startup sequence when the pipe to the bootstrap process (used for diagnostics) is no longer needed. This could be achived in the same way as done for 'qemu-nbd -c' case. That was commit 0eaf453e, also fixing up e6df58a5. STDOUT copying to STDERR does the trick. This also leads to proper 'ssh' connection closing which fixes my original problem. Signed-off-by: Denis V. Lunev CC: Eric Blake CC: Vladimir Sementsov-Ogievskiy CC: Hanna Reitz CC: Message-ID: <20230717145544.194786-3-...@openvz.org> Reviewed-by: Eric Blake Signed-off-by: Eric Blake (cherry picked from commit 5c56dd27a2c905c9cf2472d2fd057621ce5fd00d) Signed-off-by: Michael Tokarev diff --git a/qemu-nbd.c b/qemu-nbd.c index 87c46bb0e5..e64f45f767 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -274,6 +274,7 @@ static void *show_parts(void *arg) struct NbdClientOpts { char *device; +bool fork_process; }; static void *nbd_client_thread(void *arg) @@ -317,7 +318,7 @@ static void *nbd_client_thread(void *arg) /* update partition table */ pthread_create(&show_parts_thread, NULL, show_parts, opts->device); -if (verbose) { +if (verbose && !opts->fork_process) { fprintf(stderr, "NBD device %s is now connected to %s\n", opts->device, srcpath); } else { @@ -579,7 +580,6 @@ int main(int argc, char **argv) bool writethrough = false; /* Client will flush as needed. */ bool fork_process = false; bool list = false; -int old_stderr = -1; unsigned socket_activation; const char *pid_file_name = NULL; const char *selinux_label = NULL; @@ -934,11 +934,6 @@ int main(int argc, char **argv) } else if (pid == 0) { close(stderr_fd[0]); -/* Remember parent's stderr if we will be restoring it. */ -if (fork_process) { -old_stderr = dup(STDERR_FILENO); -} - ret = qemu_daemon(1, 0); /* Temporarily redirect stderr to the parent's pipe... */ @@ -1127,6 +1122,7 @@ int main(int argc, char **argv) int ret; struct NbdClientOpts opts = { .device = device, +.fork_process = fork_process, }; ret = pthread_create(&client_thread, NULL, nbd_client_thread, &opts); @@ -1155,8 +1151,7 @@ int main(int argc, char **argv) } if (fork_process) { -dup2(old_stderr, STDERR_FILENO); -close(old_stderr); +dup2(STDOUT_FILENO, STDERR_FILENO); } state = RUNNING; -- 2.39.2
[Stable-8.0.4 07/31] target/s390x: Fix MDEB and MDEBR
From: Ilya Leoshkevich These instructions multiply 32 bits by 32 bits, not 32 bits by 64 bits. Fixes: 83b00736f3d8 ("target-s390: Convert FP MULTIPLY") Signed-off-by: Ilya Leoshkevich Reviewed-by: David Hildenbrand Cc: qemu-sta...@nongnu.org Message-Id: <20230704081506.276055-4-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit fed9a4fe0ce0ec917a6b3a2da0a7ecd3cb9eba56) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c index 57e5829283..4b7fa58af3 100644 --- a/target/s390x/tcg/fpu_helper.c +++ b/target/s390x/tcg/fpu_helper.c @@ -306,8 +306,9 @@ uint64_t HELPER(mdb)(CPUS390XState *env, uint64_t f1, uint64_t f2) /* 64/32-bit FP multiplication */ uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2) { +float64 f1_64 = float32_to_float64(f1, &env->fpu_status); float64 ret = float32_to_float64(f2, &env->fpu_status); -ret = float64_mul(f1, ret, &env->fpu_status); +ret = float64_mul(f1_64, ret, &env->fpu_status); handle_exceptions(env, false, GETPC()); return ret; } diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc index a586cc515b..295eb07173 100644 --- a/target/s390x/tcg/insn-data.h.inc +++ b/target/s390x/tcg/insn-data.h.inc @@ -667,11 +667,11 @@ F(0xb317, MEEBR, RRE, Z, e1, e2, new, e1, meeb, 0, IF_BFP) F(0xb31c, MDBR,RRE, Z, f1, f2, new, f1, mdb, 0, IF_BFP) F(0xb34c, MXBR,RRE, Z, x1, x2, new_x, x1, mxb, 0, IF_BFP) -F(0xb30c, MDEBR, RRE, Z, f1, e2, new, f1, mdeb, 0, IF_BFP) +F(0xb30c, MDEBR, RRE, Z, e1, e2, new, f1, mdeb, 0, IF_BFP) F(0xb307, MXDBR, RRE, Z, f1, f2, new_x, x1, mxdb, 0, IF_BFP) F(0xed17, MEEB,RXE, Z, e1, m2_32u, new, e1, meeb, 0, IF_BFP) F(0xed1c, MDB, RXE, Z, f1, m2_64, new, f1, mdb, 0, IF_BFP) -F(0xed0c, MDEB,RXE, Z, f1, m2_32u, new, f1, mdeb, 0, IF_BFP) +F(0xed0c, MDEB,RXE, Z, e1, m2_32u, new, f1, mdeb, 0, IF_BFP) F(0xed07, MXDB,RXE, Z, f1, m2_64, new_x, x1, mxdb, 0, IF_BFP) /* MULTIPLY HALFWORD */ C(0x4c00, MH, RX_a, Z, r1_o, m2_16s, new, r1_32, mul, 0) -- 2.39.2
[Stable-8.0.4 29/31] target/s390x: Fix ICM with M3=0
From: Ilya Leoshkevich When the mask is zero, access exceptions should still be recognized for 1 byte at the second-operand address. CC should be set to 0. Cc: qemu-sta...@nongnu.org Fixes: e023e832d0ac ("s390x: translate engine for s390x CPU") Reviewed-by: David Hildenbrand Reviewed-by: Richard Henderson Signed-off-by: Ilya Leoshkevich Message-Id: <20230724082032.66864-5-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit a2025557ed4d8d5e6a4d0dd681717c390f51f5be) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c index f005539861..9e9fa3cef0 100644 --- a/target/s390x/tcg/translate.c +++ b/target/s390x/tcg/translate.c @@ -2525,6 +2525,12 @@ static DisasJumpType op_icm(DisasContext *s, DisasOps *o) ccm = ((1ull << len) - 1) << pos; break; +case 0: +/* Recognize access exceptions for the first byte. */ +tcg_gen_qemu_ld_i64(tmp, o->in2, get_mem_index(s), MO_UB); +gen_op_movi_cc(s, 0); +return DISAS_NEXT; + default: /* This is going to be a sequence of loads and inserts. */ pos = base + 32 - 8; -- 2.39.2
[Stable-8.0.4 00/31] Patch Round-up for stable 8.0.4, freeze on 2023-08-05
The following patches are queued for QEMU stable v8.0.4: https://gitlab.com/qemu-project/qemu/-/commits/staging-8.0 Patch freeze is 2023-08-05, and the release is planned for 2023-08-07: https://wiki.qemu.org/Planning/8.0 Please respond here or CC qemu-sta...@nongnu.org on any additional patches you think should (or shouldn't) be included in the release. The changes which are staging for inclusion, with the original commit hash from master branch, are given below the bottom line. Thanks! /mjt -- 01 4271f4038372 Laurent Vivier: virtio-net: correctly report maximum tx_queue_size value 02 ca2a5e630dc1 Fiona Ebner: qemu_cleanup: begin drained section after vm_shutdown() 03 2ad2e113deb5 Nicholas Piggin: hw/ppc: Fix clock update drift 04 dca4c8384d68 Helge Deller: linux-user: Fix accept4(SOCK_NONBLOCK) syscall 05 8af87a3ec7e4 Avihai Horon: vfio: Fix null pointer dereference bug in vfio_bars_finalize() 06 110b1bac2ecd Ilya Leoshkevich: target/s390x: Fix EPSW CC reporting 07 fed9a4fe0ce0 Ilya Leoshkevich: target/s390x: Fix MDEB and MDEBR 08 92a57534619a Ilya Leoshkevich: target/s390x: Fix MVCRL with a large value in R0 09 6da311a60d58 Ilya Leoshkevich: target/s390x: Fix LRA overwriting the top 32 bits on DAT error 10 b0ef81062d24 Ilya Leoshkevich: target/s390x: Fix LRA when DAT is off 11 baf21eebc3e1 Marcin Nowakowski: target/mips: enable GINVx support for I6400 and I6500 12 230dfd9257e9 Olaf Hering: hw/ide/piix: properly initialize the BMIBA register 13 7a8d9f3a0e88 Pierrick Bouvier: linux-user/syscall: Implement execve without execveat 14 e18ed26ce785 Richard Henderson: tcg: Fix info_in_idx increment in layout_arg_by_ref 15 d713cf4d6c71 Philippe Mathieu-Daudé: linux-user/arm: Do not allocate a commpage at all for M-profile CPUs 16 d921fea338c1 Mauro Matteo Cascella: ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255) 17 d28b3c90cfad Andreas Schwab: linux-user: Make sure initial brk(0) is page-aligned 18 ea3c76f1494d Klaus Jensen: hw/nvme: fix endianness issue for shadow doorbells 19 15ad98536ad9 Helge Deller: linux-user: Fix qemu brk() to not zero bytes on current page 20 dfe49864afb0 Helge Deller: linux-user: Prohibit brk() to to shrink below initial heap address 21 eac78a4b0b7d Helge Deller: linux-user: Fix signed math overflow in brk() syscall 22 03b67621445d Denis V. Lunev: qemu-nbd: pass structure into nbd_client_thread instead of plain char* 23 5c56dd27a2c9 Denis V. Lunev: qemu-nbd: fix regression with qemu-nbd --fork run over ssh 24 736a1588c104 Jordan Niethe: tcg/ppc: Fix race in goto_tb implementation 25 22d2e5351a18 Ilya Leoshkevich: tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output 26 761b0aa9381e Ilya Leoshkevich: target/s390x: Make CKSM raise an exception if R2 is odd 27 4b6e4c0b8223 Ilya Leoshkevich: target/s390x: Fix CLM with M3=0 28 53684e344a27 Ilya Leoshkevich: target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs 29 a2025557ed4d Ilya Leoshkevich: target/s390x: Fix ICM with M3=0 30 9c028c057adc Ilya Leoshkevich: target/s390x: Make MC raise specification exception when class >= 16 31 ff537b0370ab Ilya Leoshkevich: target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13
[Stable-8.0.4 24/31] tcg/ppc: Fix race in goto_tb implementation
From: Jordan Niethe Commit 20b6643324 ("tcg/ppc: Reorg goto_tb implementation") modified goto_tb to ensure only a single instruction was patched to prevent incorrect behavior if a thread was in the middle of multiple instructions when they were replaced. However this introduced a race between loading the jmp target into TCG_REG_TB and patching and executing the direct branch. The relevant part of the goto_tb implementation: ld TCG_REG_TB, TARGET_ADDR_LOCATION(TCG_REG_TB) patch_location: mtctr TCG_REG_TB bctr tb_target_set_jmp_target() will replace 'patch_location' with a direct branch if the target is in range. The direct branch now relies on TCG_REG_TB being set up correctly by the ld. Prior to this commit multiple instructions were patched in for the direct branch case; these instructions would initialize TCG_REG_TB to the same value as the branch target. Imagine the following sequence: 1) Thread A is executing the goto_tb sequence and loads the jmp target into TCG_REG_TB. 2) Thread B updates the jmp target address and calls tb_target_set_jmp_target(). This patches a new direct branch into the goto_tb sequence. 3) Thread A executes the newly patched direct branch. The value in TCG_REG_TB still contains the old jmp target. TCG_REG_TB MUST contain the translation block's tc.ptr. Execution will eventually crash after performing memory accesses generated from a faulty value in TCG_REG_TB. This presents as segfaults or illegal instruction exceptions. Do not revert commit 20b6643324 as it did fix a different race condition. Instead remove the direct branch optimization and always use indirect branches. The direct branch optimization can be re-added later with a race free sequence. Fixes: 20b6643324 ("tcg/ppc: Reorg goto_tb implementation") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1726 Reported-by: Anushree Mathur Tested-by: Anushree Mathur Tested-by: Michael Tokarev Reviewed-by: Richard Henderson Co-developed-by: Benjamin Gray Signed-off-by: Jordan Niethe Signed-off-by: Benjamin Gray Message-Id: <20230717093001.13167-1-jniet...@gmail.com> (cherry picked from commit 736a1588c104e9995c1831df33554df1f1def8b8) Signed-off-by: Michael Tokarev diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc index 066b49224a..c68bf08e38 100644 --- a/tcg/ppc/tcg-target.c.inc +++ b/tcg/ppc/tcg-target.c.inc @@ -2546,11 +2546,10 @@ static void tcg_out_goto_tb(TCGContext *s, int which) ptrdiff_t offset = tcg_tbrel_diff(s, (void *)ptr); tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_TB, offset); -/* Direct branch will be patched by tb_target_set_jmp_target. */ +/* TODO: Use direct branches when possible. */ set_jmp_insn_offset(s, which); tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR); -/* When branch is out of range, fall through to indirect. */ tcg_out32(s, BCCTR | BO_ALWAYS); /* For the unlinked case, need to reset TCG_REG_TB. */ @@ -2578,10 +2577,12 @@ void tb_target_set_jmp_target(const TranslationBlock *tb, int n, intptr_t diff = addr - jmp_rx; tcg_insn_unit insn; +if (USE_REG_TB) { +return; +} + if (in_range_b(diff)) { insn = B | (diff & 0x3fc); -} else if (USE_REG_TB) { -insn = MTSPR | RS(TCG_REG_TB) | CTR; } else { insn = NOP; } -- 2.39.2
[Stable-8.0.4 05/31] vfio: Fix null pointer dereference bug in vfio_bars_finalize()
From: Avihai Horon vfio_realize() has the following flow: 1. vfio_bars_prepare() -- sets VFIOBAR->size. 2. msix_early_setup(). 3. vfio_bars_register() -- allocates VFIOBAR->mr. After vfio_bars_prepare() is called msix_early_setup() can fail. If it does fail, vfio_bars_register() is never called and VFIOBAR->mr is not allocated. In this case, vfio_bars_finalize() is called as part of the error flow to free the bars' resources. However, vfio_bars_finalize() calls object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and thus we get a null pointer dereference. Fix it by checking VFIOBAR->mr in vfio_bars_finalize(). Fixes: 89d5202edc50 ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Cédric Le Goater Reviewed-by: Alex Williamson Signed-off-by: Cédric Le Goater (cherry picked from commit 8af87a3ec7e42ff1b9cf75ceee0451c31e34d153) Signed-off-by: Michael Tokarev diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 4773cc1f2b..53dcb3efaa 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -1752,9 +1752,11 @@ static void vfio_bars_finalize(VFIOPCIDevice *vdev) vfio_bar_quirk_finalize(vdev, i); vfio_region_finalize(&bar->region); -if (bar->size) { +if (bar->mr) { +assert(bar->size); object_unparent(OBJECT(bar->mr)); g_free(bar->mr); +bar->mr = NULL; } } -- 2.39.2
[Stable-8.0.4 20/31] linux-user: Prohibit brk() to to shrink below initial heap address
From: Helge Deller Since commit 86f04735ac ("linux-user: Fix brk() to release pages") it's possible for userspace applications to reduce their memory footprint by calling brk() with a lower address and free up memory. Before that commit guest heap memory was never unmapped. But the Linux kernel prohibits to reduce brk() below the initial memory address which is set at startup by the set_brk() function in binfmt_elf.c. Such a range check was missed in commit 86f04735ac. This patch adds the missing check by storing the initial brk value in initial_target_brk and verify any new brk addresses against that value. Tested with the i386 upx binary from https://github.com/upx/upx/releases/download/v4.0.2/upx-4.0.2-i386_linux.tar.xz Signed-off-by: Helge Deller Tested-by: "Markus F.X.J. Oberhumer" Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Cc: qemu-sta...@nongnu.org Buglink: https://github.com/upx/upx/issues/683 (cherry picked from commit dfe49864afb06e7e452a4366051697bc4fcfc1a5) Signed-off-by: Michael Tokarev diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 450487af57..e106633837 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -801,12 +801,13 @@ static inline int host_to_target_sock_type(int host_type) return target_type; } -static abi_ulong target_brk; +static abi_ulong target_brk, initial_target_brk; static abi_ulong brk_page; void target_set_brk(abi_ulong new_brk) { target_brk = TARGET_PAGE_ALIGN(new_brk); +initial_target_brk = target_brk; brk_page = HOST_PAGE_ALIGN(target_brk); } @@ -824,6 +825,11 @@ abi_long do_brk(abi_ulong brk_val) return target_brk; } +/* do not allow to shrink below initial brk value */ +if (brk_val < initial_target_brk) { +brk_val = initial_target_brk; +} + new_brk = TARGET_PAGE_ALIGN(brk_val); new_host_brk_page = HOST_PAGE_ALIGN(brk_val); -- 2.39.2
[Stable-8.0.4 28/31] target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs
From: Ilya Leoshkevich CONVERT TO LOGICAL/FIXED deviate from IEEE 754 in that they raise an inexact exception on out-of-range inputs. float_flag_invalid_cvti aligns nicely with that behavior, so convert it to S390_IEEE_MASK_INEXACT. Cc: qemu-sta...@nongnu.org Fixes: defb0e3157af ("s390x: Implement opcode helpers") Reviewed-by: David Hildenbrand Reviewed-by: Richard Henderson Signed-off-by: Ilya Leoshkevich Message-Id: <20230724082032.66864-4-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit 53684e344a27da770acc9012740334154ddea24f) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c index 4b7fa58af3..3d941ed2d2 100644 --- a/target/s390x/tcg/fpu_helper.c +++ b/target/s390x/tcg/fpu_helper.c @@ -52,7 +52,8 @@ uint8_t s390_softfloat_exc_to_ieee(unsigned int exc) s390_exc |= (exc & float_flag_divbyzero) ? S390_IEEE_MASK_DIVBYZERO : 0; s390_exc |= (exc & float_flag_overflow) ? S390_IEEE_MASK_OVERFLOW : 0; s390_exc |= (exc & float_flag_underflow) ? S390_IEEE_MASK_UNDERFLOW : 0; -s390_exc |= (exc & float_flag_inexact) ? S390_IEEE_MASK_INEXACT : 0; +s390_exc |= (exc & (float_flag_inexact | float_flag_invalid_cvti)) ? +S390_IEEE_MASK_INEXACT : 0; return s390_exc; } -- 2.39.2
[Stable-8.0.4 18/31] hw/nvme: fix endianness issue for shadow doorbells
From: Klaus Jensen In commit 2fda0726e514 ("hw/nvme: fix missing endian conversions for doorbell buffers"), we fixed shadow doorbells for big-endian guests running on little endian hosts. But I did not fix little-endian guests on big-endian hosts. Fix this. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765 Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support") Cc: qemu-sta...@nongnu.org Reported-by: Thomas Huth Tested-by: Cédric Le Goater Tested-by: Thomas Huth Reviewed-by: Keith Busch Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Klaus Jensen (cherry picked from commit ea3c76f1494d0c75873c3b470e6e048202661ad8) Signed-off-by: Michael Tokarev diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index ac24eeb5ed..2097fb1310 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -6767,6 +6767,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) PCIDevice *pci = PCI_DEVICE(n); uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1); uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2); +uint32_t v; int i; /* Address should be page aligned */ @@ -6784,6 +6785,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) NvmeCQueue *cq = n->cq[i]; if (sq) { +v = cpu_to_le32(sq->tail); + /* * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3) * nvme_process_db() uses this hard-coded way to calculate @@ -6791,7 +6794,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) */ sq->db_addr = dbs_addr + (i << 3); sq->ei_addr = eis_addr + (i << 3); -pci_dma_write(pci, sq->db_addr, &sq->tail, sizeof(sq->tail)); +pci_dma_write(pci, sq->db_addr, &v, sizeof(sq->tail)); if (n->params.ioeventfd && sq->sqid != 0) { if (!nvme_init_sq_ioeventfd(sq)) { @@ -6801,10 +6804,12 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const NvmeRequest *req) } if (cq) { +v = cpu_to_le32(cq->head); + /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */ cq->db_addr = dbs_addr + (i << 3) + (1 << 2); cq->ei_addr = eis_addr + (i << 3) + (1 << 2); -pci_dma_write(pci, cq->db_addr, &cq->head, sizeof(cq->head)); +pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head)); if (n->params.ioeventfd && cq->cqid != 0) { if (!nvme_init_cq_ioeventfd(cq)) { @@ -7555,7 +7560,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size) static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) { PCIDevice *pci = PCI_DEVICE(n); -uint32_t qid; +uint32_t qid, v; if (unlikely(addr & ((1 << 2) - 1))) { NVME_GUEST_ERR(pci_nvme_ub_db_wr_misaligned, @@ -7622,7 +7627,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) start_sqs = nvme_cq_full(cq) ? 1 : 0; cq->head = new_head; if (!qid && n->dbbuf_enabled) { -pci_dma_write(pci, cq->db_addr, &cq->head, sizeof(cq->head)); +v = cpu_to_le32(cq->head); +pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head)); } if (start_sqs) { NvmeSQueue *sq; @@ -7682,6 +7688,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) sq->tail = new_tail; if (!qid && n->dbbuf_enabled) { +v = cpu_to_le32(sq->tail); + /* * The spec states "the host shall also update the controller's * corresponding doorbell property to match the value of that entry @@ -7695,7 +7703,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val) * including ones that run on Linux, are not updating Admin Queues, * so we can't trust reading it for an appropriate sq tail. */ -pci_dma_write(pci, sq->db_addr, &sq->tail, sizeof(sq->tail)); +pci_dma_write(pci, sq->db_addr, &v, sizeof(sq->tail)); } qemu_bh_schedule(sq->bh); -- 2.39.2
[Stable-8.0.4 16/31] ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)
From: Mauro Matteo Cascella A wrong exit condition may lead to an infinite loop when inflating a valid zlib buffer containing some extra bytes in the `inflate_buffer` function. The bug only occurs post-authentication. Return the buffer immediately if the end of the compressed data has been reached (Z_STREAM_END). Fixes: CVE-2023-3255 Fixes: 0bf41cab ("ui/vnc: clipboard support") Reported-by: Kevin Denis Signed-off-by: Mauro Matteo Cascella Reviewed-by: Marc-André Lureau Tested-by: Marc-André Lureau Message-ID: <20230704084210.101822-1-mcasc...@redhat.com> (cherry picked from commit d921fea338c1059a27ce7b75309d7a2e485f710b) Signed-off-by: Michael Tokarev diff --git a/ui/vnc-clipboard.c b/ui/vnc-clipboard.c index 8aeadfaa21..c759be3438 100644 --- a/ui/vnc-clipboard.c +++ b/ui/vnc-clipboard.c @@ -50,8 +50,11 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, uint32_t *size) ret = inflate(&stream, Z_FINISH); switch (ret) { case Z_OK: -case Z_STREAM_END: break; +case Z_STREAM_END: +*size = stream.total_out; +inflateEnd(&stream); +return out; case Z_BUF_ERROR: out_len <<= 1; if (out_len > (1 << 20)) { @@ -66,11 +69,6 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, uint32_t *size) } } -*size = stream.total_out; -inflateEnd(&stream); - -return out; - err_end: inflateEnd(&stream); err: -- 2.39.2
[Stable-8.0.4 03/31] hw/ppc: Fix clock update drift
From: Nicholas Piggin The clock update logic reads the clock twice to compute the new clock value, with a value derived from the later time subtracted from a value derived from the earlier time. The delta causes time to be lost. This can ultimately result in time becoming unsynchronized between CPUs and that can cause OS lockups, timeouts, watchdogs, etc. This can be seen running a KVM guest (that causes lots of TB updates) on a powernv SMP machine. Fix this by reading the clock once. Cc: qemu-sta...@nongnu.org Fixes: dbdd25065e90 ("Implement time-base start/stop helpers.") Signed-off-by: Nicholas Piggin Reviewed-by: Cédric Le Goater Reviewed-by: Frederic Barrat Message-ID: <20230629020713.327745-1-npig...@gmail.com> Signed-off-by: Daniel Henrique Barboza (cherry picked from commit 2ad2e113deb5663e69a05dd6922cbfc6d7ea34d3) Signed-off-by: Michael Tokarev diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c index d80b0adc6c..85d442fbce 100644 --- a/hw/ppc/ppc.c +++ b/hw/ppc/ppc.c @@ -535,23 +535,24 @@ static inline void cpu_ppc_store_tb(ppc_tb_t *tb_env, uint64_t vmclk, void cpu_ppc_store_tbl (CPUPPCState *env, uint32_t value) { ppc_tb_t *tb_env = env->tb_env; +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); uint64_t tb; -tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->tb_offset); +tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset); tb &= 0xULL; -cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), - &tb_env->tb_offset, tb | (uint64_t)value); +cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset, tb | (uint64_t)value); } static inline void _cpu_ppc_store_tbu(CPUPPCState *env, uint32_t value) { ppc_tb_t *tb_env = env->tb_env; +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); uint64_t tb; -tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->tb_offset); +tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset); tb &= 0xULL; -cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), - &tb_env->tb_offset, ((uint64_t)value << 32) | tb); +cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset, + ((uint64_t)value << 32) | tb); } void cpu_ppc_store_tbu (CPUPPCState *env, uint32_t value) @@ -584,23 +585,24 @@ uint32_t cpu_ppc_load_atbu (CPUPPCState *env) void cpu_ppc_store_atbl (CPUPPCState *env, uint32_t value) { ppc_tb_t *tb_env = env->tb_env; +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); uint64_t tb; -tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->atb_offset); +tb = cpu_ppc_get_tb(tb_env, clock, tb_env->atb_offset); tb &= 0xULL; -cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), - &tb_env->atb_offset, tb | (uint64_t)value); +cpu_ppc_store_tb(tb_env, clock, &tb_env->atb_offset, tb | (uint64_t)value); } void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t value) { ppc_tb_t *tb_env = env->tb_env; +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); uint64_t tb; -tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->atb_offset); +tb = cpu_ppc_get_tb(tb_env, clock, tb_env->atb_offset); tb &= 0xULL; -cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), - &tb_env->atb_offset, ((uint64_t)value << 32) | tb); +cpu_ppc_store_tb(tb_env, clock, &tb_env->atb_offset, + ((uint64_t)value << 32) | tb); } uint64_t cpu_ppc_load_vtb(CPUPPCState *env) @@ -622,14 +624,13 @@ void cpu_ppc_store_vtb(CPUPPCState *env, uint64_t value) void cpu_ppc_store_tbu40(CPUPPCState *env, uint64_t value) { ppc_tb_t *tb_env = env->tb_env; +int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL); uint64_t tb; -tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), -tb_env->tb_offset); +tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset); tb &= 0xFFUL; tb |= (value & ~0xFFUL); -cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), - &tb_env->tb_offset, tb); +cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset, tb); } static void cpu_ppc_tb_stop (CPUPPCState *env) -- 2.39.2
[Stable-8.0.4 30/31] target/s390x: Make MC raise specification exception when class >= 16
From: Ilya Leoshkevich MC requires bit positions 8-11 (upper 4 bits of class) to be zeros, otherwise it must raise a specification exception. Cc: qemu-sta...@nongnu.org Fixes: 20d143e2cab8 ("s390x/tcg: Implement MONITOR CALL") Reviewed-by: David Hildenbrand Reviewed-by: Richard Henderson Signed-off-by: Ilya Leoshkevich Message-Id: <20230724082032.66864-6-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit 9c028c057adce49304c6e4a51f6b426bd4f8f6b8) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/excp_helper.c b/target/s390x/tcg/excp_helper.c index 228aa9f237..3da337f7c7 100644 --- a/target/s390x/tcg/excp_helper.c +++ b/target/s390x/tcg/excp_helper.c @@ -639,7 +639,7 @@ void monitor_event(CPUS390XState *env, void HELPER(monitor_call)(CPUS390XState *env, uint64_t monitor_code, uint32_t monitor_class) { -g_assert(monitor_class <= 0xff); +g_assert(monitor_class <= 0xf); if (env->cregs[8] & (0x8000 >> monitor_class)) { monitor_event(env, monitor_code, monitor_class, GETPC()); diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c index 9e9fa3cef0..964ddd12d1 100644 --- a/target/s390x/tcg/translate.c +++ b/target/s390x/tcg/translate.c @@ -3192,9 +3192,9 @@ static DisasJumpType op_lcbb(DisasContext *s, DisasOps *o) static DisasJumpType op_mc(DisasContext *s, DisasOps *o) { -const uint16_t monitor_class = get_field(s, i2); +const uint8_t monitor_class = get_field(s, i2); -if (monitor_class & 0xff00) { +if (monitor_class & 0xf0) { gen_program_exception(s, PGM_SPECIFICATION); return DISAS_NORETURN; } -- 2.39.2
[Stable-8.0.4 27/31] target/s390x: Fix CLM with M3=0
From: Ilya Leoshkevich When the mask is zero, access exceptions should still be recognized for 1 byte at the second-operand address. CC should be set to 0. Cc: qemu-sta...@nongnu.org Fixes: defb0e3157af ("s390x: Implement opcode helpers") Reviewed-by: David Hildenbrand Reviewed-by: Richard Henderson Signed-off-by: Ilya Leoshkevich Message-Id: <20230724082032.66864-3-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit 4b6e4c0b8223681ae85462794848db4386de1a8d) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c index 51894f17f5..541859afe7 100644 --- a/target/s390x/tcg/mem_helper.c +++ b/target/s390x/tcg/mem_helper.c @@ -667,6 +667,11 @@ uint32_t HELPER(clm)(CPUS390XState *env, uint32_t r1, uint32_t mask, HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%" PRIx64 "\n", __func__, r1, mask, addr); +if (!mask) { +/* Recognize access exceptions for the first byte */ +probe_read(env, addr, 1, cpu_mmu_index(env, false), ra); +} + while (mask) { if (mask & 8) { uint8_t d = cpu_ldub_data_ra(env, addr, ra); -- 2.39.2
[Stable-8.0.4 12/31] hw/ide/piix: properly initialize the BMIBA register
From: Olaf Hering According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA-BUS MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is 32bit wide. To properly reset it to default values, all 32bit need to be cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled. The initial change wrote just the lower 8 bit, leaving parts of the "Bus Master Interface Base Address" address at bit 15:4 unchanged. Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 82371SB (Function 1)") Signed-off-by: Olaf Hering Reviewed-by: Bernhard Beschow Reviewed-by: Philippe Mathieu-Daudé Message-ID: <20230712074721.14728-1-o...@aepfle.de> Signed-off-by: Paolo Bonzini (cherry picked from commit 230dfd9257e92259876c113e58b5f0d22b056d2e) Signed-off-by: Michael Tokarev diff --git a/hw/ide/piix.c b/hw/ide/piix.c index 41d60921e3..17ec304064 100644 --- a/hw/ide/piix.c +++ b/hw/ide/piix.c @@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev) pci_set_word(pci_conf + PCI_COMMAND, 0x); pci_set_word(pci_conf + PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK); -pci_set_byte(pci_conf + 0x20, 0x01); /* BMIBA: 20-23h */ +pci_set_long(pci_conf + 0x20, 0x1); /* BMIBA: 20-23h */ } static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp) -- 2.39.2
[Stable-8.0.4 21/31] linux-user: Fix signed math overflow in brk() syscall
From: Helge Deller Fix the math overflow when calculating the new_malloc_size. new_host_brk_page and brk_page are unsigned integers. If userspace reduces the heap, new_host_brk_page is lower than brk_page which results in a huge positive number (but should actually be negative). Fix it by adding a proper check and as such make the code more readable. Signed-off-by: Helge Deller Tested-by: "Markus F.X.J. Oberhumer" Reviewed-by: Philippe Mathieu-Daudé Fixes: 86f04735ac ("linux-user: Fix brk() to release pages") Cc: qemu-sta...@nongnu.org Buglink: https://github.com/upx/upx/issues/683 (cherry picked from commit eac78a4b0b7da4de2c0a297f4d528ca9cc6256a3) Signed-off-by: Michael Tokarev diff --git a/linux-user/syscall.c b/linux-user/syscall.c index e106633837..30a2ac0099 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -860,12 +860,13 @@ abi_long do_brk(abi_ulong brk_val) * itself); instead we treat "mapped but at wrong address" as * a failure and unmap again. */ -new_alloc_size = new_host_brk_page - brk_page; -if (new_alloc_size) { +if (new_host_brk_page > brk_page) { +new_alloc_size = new_host_brk_page - brk_page; mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, 0, 0)); } else { +new_alloc_size = 0; mapped_addr = brk_page; } -- 2.39.2
[Stable-8.0.4 31/31] target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13
From: Ilya Leoshkevich Type 13 is reserved, so using it should result in specification exception. Due to an off-by-1 error the code triggers an assertion at a later point in time instead. Cc: qemu-sta...@nongnu.org Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)") Reviewed-by: David Hildenbrand Reviewed-by: Richard Henderson Signed-off-by: Ilya Leoshkevich Message-Id: <20230724082032.66864-8-...@linux.ibm.com> Signed-off-by: Thomas Huth (cherry picked from commit ff537b0370ab5918052b8d8a798e803c47272406) Signed-off-by: Michael Tokarev diff --git a/target/s390x/tcg/translate_vx.c.inc b/target/s390x/tcg/translate_vx.c.inc index 43dfbfd03f..f8df121d3d 100644 --- a/target/s390x/tcg/translate_vx.c.inc +++ b/target/s390x/tcg/translate_vx.c.inc @@ -3047,7 +3047,7 @@ static DisasJumpType op_vfmax(DisasContext *s, DisasOps *o) const uint8_t m5 = get_field(s, m5); gen_helper_gvec_3_ptr *fn; -if (m6 == 5 || m6 == 6 || m6 == 7 || m6 > 13) { +if (m6 == 5 || m6 == 6 || m6 == 7 || m6 >= 13) { gen_program_exception(s, PGM_SPECIFICATION); return DISAS_NORETURN; } -- 2.39.2
[Stable-8.0.4 25/31] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output
From: Ilya Leoshkevich i386 and s390x implementations of op_add2 require an earlyclobber, which is currently missing. This breaks VCKSM in s390x guests. E.g., on x86_64 the following op: add2_i32 tmp2,tmp3,tmp2,tmp3,tmp3,tmp2 dead: 0 2 3 4 5 pref=none,0x is translated to: addl %ebx, %r12d adcl %r12d, %ebx Introduce a new C_N1_O1_I4 constraint, and make sure that earlyclobber of aliased outputs is honored. Cc: qemu-sta...@nongnu.org Fixes: 82790a870992 ("tcg: Add markup for output requires new register") Signed-off-by: Ilya Leoshkevich Reviewed-by: Richard Henderson Message-Id: <20230719221310.1968845-7-...@linux.ibm.com> Signed-off-by: Richard Henderson (cherry picked from commit 22d2e5351a18aff5a9c7e3984b50ecce61ff8975) Signed-off-by: Michael Tokarev diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h index 91ceb0e1da..5ea3a292f0 100644 --- a/tcg/i386/tcg-target-con-set.h +++ b/tcg/i386/tcg-target-con-set.h @@ -11,6 +11,9 @@ * * C_N1_Im(...) defines a constraint set with 1 output and inputs, * except that the output must use a new register. + * + * C_Nn_Om_Ik(...) defines a constraint set with outputs and + * inputs, except that the first outputs must use new registers. */ C_O0_I1(r) C_O0_I2(L, L) @@ -53,4 +56,4 @@ C_O2_I1(r, r, L) C_O2_I2(a, d, a, r) C_O2_I2(r, r, L, L) C_O2_I3(a, d, 0, 1, r) -C_O2_I4(r, r, 0, 1, re, re) +C_N1_O1_I4(r, r, 0, 1, re, re) diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index 5c7c180799..d00800d18a 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -3356,7 +3356,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_add2_i64: case INDEX_op_sub2_i32: case INDEX_op_sub2_i64: -return C_O2_I4(r, r, 0, 1, re, re); +return C_N1_O1_I4(r, r, 0, 1, re, re); case INDEX_op_ctz_i32: case INDEX_op_ctz_i64: diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h index 15f1c55103..31daa5daca 100644 --- a/tcg/s390x/tcg-target-con-set.h +++ b/tcg/s390x/tcg-target-con-set.h @@ -8,6 +8,9 @@ * C_On_Im(...) defines a constraint set with outputs and inputs. * Each operand should be a sequence of constraint letters as defined by * tcg-target-con-str.h; the constraint combination is inclusive or. + * + * C_Nn_Om_Ik(...) defines a constraint set with outputs and + * inputs, except that the first outputs must use new registers. */ C_O0_I1(r) C_O0_I2(L, L) @@ -41,6 +44,5 @@ C_O1_I4(r, r, rA, rI, r) C_O2_I2(o, m, 0, r) C_O2_I2(o, m, r, r) C_O2_I3(o, m, 0, 1, r) -C_O2_I4(r, r, 0, 1, rA, r) -C_O2_I4(r, r, 0, 1, ri, r) -C_O2_I4(r, r, 0, 1, r, r) +C_N1_O1_I4(r, r, 0, 1, ri, r) +C_N1_O1_I4(r, r, 0, 1, rA, r) diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc index 844532156b..2e5fd4968c 100644 --- a/tcg/s390x/tcg-target.c.inc +++ b/tcg/s390x/tcg-target.c.inc @@ -3229,11 +3229,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_add2_i32: case INDEX_op_sub2_i32: -return C_O2_I4(r, r, 0, 1, ri, r); +return C_N1_O1_I4(r, r, 0, 1, ri, r); case INDEX_op_add2_i64: case INDEX_op_sub2_i64: -return C_O2_I4(r, r, 0, 1, rA, r); +return C_N1_O1_I4(r, r, 0, 1, rA, r); case INDEX_op_st_vec: return C_O0_I2(v, r); diff --git a/tcg/tcg.c b/tcg/tcg.c index f3bf471274..09f345fa1b 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -368,6 +368,7 @@ void tcg_raise_tb_overflow(TCGContext *s) #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2), #define C_O2_I3(O1, O2, I1, I2, I3) C_PFX5(c_o2_i3_, O1, O2, I1, I2, I3), #define C_O2_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_o2_i4_, O1, O2, I1, I2, I3, I4), +#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_n1_o1_i4_, O1, O2, I1, I2, I3, I4), typedef enum { #include "tcg-target-con-set.h" @@ -388,6 +389,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #undef C_O2_I2 #undef C_O2_I3 #undef C_O2_I4 +#undef C_N1_O1_I4 /* Put all of the constraint sets into an array, indexed by the enum. */ @@ -407,6 +409,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 } }, #define C_O2_I3(O1, O2, I1, I2, I3) { .args_ct_str = { #O1, #O2, #I1, #I2, #I3 } }, #define C_O2_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { #O1, #O2, #I1, #I2, #I3, #I4 } }, +#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { "&" #O1, #O2, #I1, #I2, #I3, #I4 } }, static const TCGTargetOpDef constraint_sets[] = { #include "tcg-target-con-set.h" @@ -426,6 +429,7 @@ static const TCGTargetOpDef constraint_sets[] = { #undef C_O2_I2 #undef C_O2_I3 #undef C_O2_I4 +#undef C_N1_O1_I4 /* Expand the enumerator to be returned from tcg_target_op_def(). */ @@ -445,6 +449,7 @@ static const TCGTargetOpDef constraint_sets[] = { #define C_O