Re: [RFC PATCH 04/12] vhost_net: Use ops->vhost_reset_queue in vhost_net_virtqueue_reset

2023-07-25 Thread Jason Wang



On 2023/7/21 02:14, Eugenio Pérez wrote:

Actually use vhost_reset_queue operation at queue reset.

Signed-off-by: Eugenio Pérez 
---
  hw/net/vhost_net.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 416b7d8132..5516b7a5aa 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -571,7 +571,9 @@ void vhost_net_virtqueue_reset(VirtIODevice *vdev, 
NetClientState *nc,
  
  idx = vhost_ops->vhost_get_vq_index(&net->dev, vq_index);
  
-if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {

+if (vhost_ops->vhost_reset_queue) {
+vhost_ops->vhost_reset_queue(&net->dev, idx);
+} else if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {
  file.index = idx;
  int r = vhost_net_set_backend(&net->dev, &file);
  assert(r >= 0);



Let's move the TAP-specific logic to a vhost-net-specific reset_queue
function.
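
For illustration, a minimal sketch of what such a backend-specific op could
look like (the function name and its placement are assumptions, not code from
this series; it only re-packages the existing TAP branch, which stops the ring
by detaching the backend fd):

    /* Hypothetical TAP/vhost-kernel implementation of the new
     * vhost_reset_queue op, so the NET_CLIENT_DRIVER_TAP special case can
     * move out of vhost_net_virtqueue_reset().
     */
    static void vhost_net_tap_reset_queue(struct vhost_dev *dev, int idx)
    {
        struct vhost_vring_file file = {
            .index = idx,
            .fd = -1,    /* detaching the backend stops this queue */
        };
        int r = vhost_net_set_backend(dev, &file);

        assert(r >= 0);
    }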


Thanks




Re: [RFC PATCH 03/12] vhost_net: Use ops->vhost_restart_queue in vhost_net_virtqueue_restart

2023-07-25 Thread Jason Wang



On 2023/7/21 02:14, Eugenio Pérez wrote:

Actually use vhost_restart_queue operation at restart.

Signed-off-by: Eugenio Pérez 
---
  hw/net/vhost_net.c | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6b958d6363..416b7d8132 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -608,14 +608,16 @@ int vhost_net_virtqueue_restart(VirtIODevice *vdev, 
NetClientState *nc,
  goto err_start;
  }
  
-if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {

+if (vhost_ops->vhost_restart_queue) {
+r = vhost_ops->vhost_restart_queue(&net->dev, idx);
+} else if (net->nc->info->type == NET_CLIENT_DRIVER_TAP) {
  file.index = idx;
  file.fd = net->backend;
  r = vhost_net_set_backend(&net->dev, &file);



I would introduce a vhost-net specific reset routine and move the above 
logic there.
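
Again purely for illustration (the helper name is hypothetical), the restart
counterpart could be a small vhost-net helper that re-attaches the backend fd
kept in net->backend:

    static int vhost_net_tap_restart_queue(struct vhost_net *net, int idx)
    {
        struct vhost_vring_file file = {
            .index = idx,
            .fd = net->backend,    /* re-attach the TAP backend fd */
        };

        return vhost_net_set_backend(&net->dev, &file);
    }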


Thanks



-if (r < 0) {
-r = -errno;
-goto err_start;
-}
+}
+if (r < 0) {
+r = -errno;
+goto err_start;
  }
  
  return 0;





Re: [RFC PATCH 00/12] Prefer to use SVQ to stall dataplane at NIC state restore through CVQ

2023-07-25 Thread Jason Wang



On 2023/7/21 14:48, Eugenio Perez Martin wrote:

On Thu, Jul 20, 2023 at 8:15 PM Eugenio Pérez  wrote:

At this moment the migration of net features that depend on CVQ is not
possible, as there is no reliable way to restore the device state like mac
address, number of enabled queues, etc. to the destination.  This is mainly
because the device must first read CVQ and process all the commands
before resuming the dataplane.

This series uses the VirtIO feature _F_RING_RESET to achieve it, adding an
alternative method to the late vq enabling proposed in [1][2].  It exposes SVQ
to the device until it processes all the CVQ messages, and then replaces the
vring with the guest's one.


A couple of things I forgot to add:
* I'm assuming the implementation of _F_RING_RESET in vdpa calls the
kernel vdpa op .set_vq_ready(vq, false). I'm not sure if this is the
best implementation, but it is currently unused in the kernel. At the
same time, .set_vq_ready(vq, true) also enables the vq again.



I think we need another op, as set_vq_ready() tends to be functionally
equivalent to queue_enable.


If we reuse set_vq_ready(vq, false), we would get a conflict in the future
when the driver is allowed to stop/resume a specific virtqueue by writing 0
to queue_enable. And that's also the reason why queue_enable is not
reused to reset a virtqueue.






As an advantage, it uses a feature well defined in the VirtIO standard.  As a
disadvantage, current HW already supports the late enabling but does not
support RING_RESET.

This patch must be applied on top of the series ("Enable vdpa net migration
with features depending on CVQ") [1][2].

The patch has been tested with vp_vdpa, but using a high IOVA instead of a
separate ID for the shadow CVQ and shadow temporary vrings.


And with _F_STATE implementation I sent long ago.

Based on this, my suggestion is:
* Leave the late enable for vDPA devices.
* Make them fail if the vDPA parent device does not support it. This
can be considered as a fix.
* Leave _F_RING_RESET to be added on top, as the semantics are not
implemented in vDPA at the moment.

Would that work?



I think it can work; let's start from late enabling, which seems more
lightweight than reset, and see. We can leave the vp_vdpa part to be done
on top in another series.


Thanks





[1] Message-id: <20230706191227.835526-1-epere...@redhat.com>
[2] https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg01325.html

Eugenio Pérez (12):
   vhost: add vhost_reset_queue_op
   vhost: add vhost_restart_queue_op
   vhost_net: Use ops->vhost_restart_queue in vhost_net_virtqueue_restart
   vhost_net: Use ops->vhost_reset_queue in vhost_net_virtqueue_reset
   vdpa: add vhost_vdpa_set_vring_ready_internal
   vdpa: add vhost_vdpa_svq_stop
   vdpa: add vhost_vdpa_reset_queue
   vdpa: add vhost_vdpa_svq_start
   vdpa: add vhost_vdpa_restart_queue
   vdpa: enable all vqs if the device support RING_RESET feature
   vdpa: use SVQ to stall dataplane while NIC state is being restored
   vhost: Allow _F_RING_RESET with shadow virtqueue

  include/hw/virtio/vhost-backend.h  |   6 ++
  hw/net/vhost_net.c |  16 ++--
  hw/virtio/vhost-shadow-virtqueue.c |   1 +
  hw/virtio/vhost-vdpa.c | 139 +
  net/vhost-vdpa.c   |  55 ++--
  hw/virtio/trace-events |   2 +-
  6 files changed, 171 insertions(+), 48 deletions(-)

--
2.39.3







Re: [PATCH v2 3/4] vdpa: Restore vlan filtering state

2023-07-25 Thread Hawkins Jiawei
On 2023/7/25 14:47, Jason Wang wrote:
> On Sun, Jul 23, 2023 at 5:28 PM Hawkins Jiawei  wrote:
>>
>> This patch introduces vhost_vdpa_net_load_single_vlan()
>> and vhost_vdpa_net_load_vlan() to restore the vlan
>> filtering state at device's startup.
>>
>> Co-developed-by: Eugenio Pérez 
>> Signed-off-by: Eugenio Pérez 
>> Signed-off-by: Hawkins Jiawei 
>
> Acked-by: Jason Wang 
>
> But this seems to be a latency killer, as it may send at most
> 1024 commands.
>
> As discussed in the past, we need a better cvq command to do this: for
> example, a single command to carry a bitmap.

Hi Jason,

Thanks for your review.

You are right, we need some improvement here.

Therefore, I have submitted another patch series titled "vdpa: Send all
CVQ state load commands in parallel" at [1], which allows QEMU to delay
polling and checking the device's used buffer until either the SVQ or the
control commands' shadow buffers are full, so that QEMU can send all the
SVQ control commands in parallel, which improves performance.

To test that patch series, I created 4094 VLANs in the guest to build an
environment for sending multiple CVQ state load commands. According to
the results on a real vdpa device at [2], that patch series can reduce the
latency from 23296 us to 6539 us.

Thanks!

[1]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03726.html
[2]. https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg03947.html


>
> Thanks
>
>> ---
>> v2:
>>   - remove the extra line pointed out by Eugenio
>>
>> v1: 
>> https://lore.kernel.org/all/0a568cc8a8d2b750c2e09b2237e9f05cece07c3f.1689690854.git.yin31...@gmail.com/
>>
>>   net/vhost-vdpa.c | 48 
>>   1 file changed, 48 insertions(+)
>>
>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
>> index 9795306742..347241796d 100644
>> --- a/net/vhost-vdpa.c
>> +++ b/net/vhost-vdpa.c
>> @@ -965,6 +965,50 @@ static int vhost_vdpa_net_load_rx(VhostVDPAState *s,
>>   return 0;
>>   }
>>
>> +static int vhost_vdpa_net_load_single_vlan(VhostVDPAState *s,
>> +   const VirtIONet *n,
>> +   uint16_t vid)
>> +{
>> +const struct iovec data = {
>> +.iov_base = &vid,
>> +.iov_len = sizeof(vid),
>> +};
>> +ssize_t dev_written = vhost_vdpa_net_load_cmd(s, VIRTIO_NET_CTRL_VLAN,
>> +  VIRTIO_NET_CTRL_VLAN_ADD,
>> +  &data, 1);
>> +if (unlikely(dev_written < 0)) {
>> +return dev_written;
>> +}
>> +if (unlikely(*s->status != VIRTIO_NET_OK)) {
>> +return -EIO;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static int vhost_vdpa_net_load_vlan(VhostVDPAState *s,
>> +const VirtIONet *n)
>> +{
>> +int r;
>> +
>> +if (!virtio_vdev_has_feature(&n->parent_obj, VIRTIO_NET_F_CTRL_VLAN)) {
>> +return 0;
>> +}
>> +
>> +for (int i = 0; i < MAX_VLAN >> 5; i++) {
>> +for (int j = 0; n->vlans[i] && j <= 0x1f; j++) {
>> +if (n->vlans[i] & (1U << j)) {
>> +r = vhost_vdpa_net_load_single_vlan(s, n, (i << 5) + j);
>> +if (unlikely(r != 0)) {
>> +return r;
>> +}
>> +}
>> +}
>> +}
>> +
>> +return 0;
>> +}
>> +
>>   static int vhost_vdpa_net_load(NetClientState *nc)
>>   {
>>   VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
>> @@ -995,6 +1039,10 @@ static int vhost_vdpa_net_load(NetClientState *nc)
>>   if (unlikely(r)) {
>>   return r;
>>   }
>> +r = vhost_vdpa_net_load_vlan(s, n);
>> +if (unlikely(r)) {
>> +return r;
>> +}
>>
>>   return 0;
>>   }
>> --
>> 2.25.1
>>
>



Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop

2023-07-25 Thread Hanna Czenczek

On 24.07.23 17:48, Eugenio Perez Martin wrote:

On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek  wrote:

On 21.07.23 17:25, Eugenio Perez Martin wrote:

On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek  wrote:

Move the `suspended` field from vhost_vdpa into the global vhost_dev
struct, so vhost_dev_stop() can check whether the back-end has been
suspended by `vhost_ops->vhost_dev_start(hdev, false)`.  If it has,
there is no need to reset it; the reset is just a fall-back to stop
device operations for back-ends that do not support suspend.

Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
when the device is re-started, we still have to do the reset to have it
un-suspend.

Signed-off-by: Hanna Czenczek 
---
   include/hw/virtio/vhost-vdpa.h |  2 --
   include/hw/virtio/vhost.h  |  8 
   hw/virtio/vhost-vdpa.c | 11 +++
   hw/virtio/vhost.c  |  8 +++-
   4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index e64bfc7f98..72c3686b7f 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
   bool shadow_vqs_enabled;
   /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA 
*/
   bool shadow_data;
-/* Device suspended successfully */
-bool suspended;
   /* IOVA mapping used by the Shadow Virtqueue */
   VhostIOVATree *iova_tree;
   GPtrArray *shadow_vqs;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a173cb9fa..69bf59d630 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -120,6 +120,14 @@ struct vhost_dev {
   uint64_t backend_cap;
   /* @started: is the vhost device started? */
   bool started;
+/**
+ * @suspended: Whether the vhost device is currently suspended.  Set
+ * and reset by implementations (vhost-user, vhost-vdpa, ...), which
+ * are supposed to automatically suspend/resume in their
+ * vhost_dev_start handlers as required.  Must also be cleared when
+ * the device is reset.
+ */
+bool suspended;
   bool log_enabled;
   uint64_t log_size;
   Error *migration_blocker;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7b7dee468e..f7fd19a203 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,

   static int vhost_vdpa_reset_device(struct vhost_dev *dev)
   {
-struct vhost_vdpa *v = dev->opaque;
   int ret;
   uint8_t status = 0;

   ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
   trace_vhost_vdpa_reset_device(dev);
-v->suspended = false;
+dev->suspended = false;
   return ret;
   }

@@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev)
   if (unlikely(r)) {
   error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
   } else {
-v->suspended = true;
+dev->suspended = true;
   return;
   }
   }
@@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
   return -1;
   }
   vhost_vdpa_set_vring_ready(dev);
+if (dev->suspended) {
+/* TODO: When RESUME is available, use it instead of resetting */
+vhost_vdpa_reset_status(dev);

How is it that we reset the status at each vhost_vdpa_dev_start? That
will clear all the configured vqs, negotiated features, etc. in the
vDPA device. Or am I missing something?

What alternative do you propose?  We don’t have RESUME for vDPA in qemu,
but we somehow need to lift the previous SUSPEND so the device will
again respond to guest requests, do we not?


Reset also clears the suspend state in vDPA, and it should be called
at vhost_dev_stop. So the device should never be in suspended state
here. Does that solve your concerns?


My intention with this patch was precisely not to reset in 
vhost_dev_stop when suspending is supported.  So now I’m more confused 
than before.



But more generally, is this any different from what is done before this
patch?  Before this patch, vhost_dev_stop() unconditionally invokes
vhost_reset_status(), so the device is reset in every stop/start cycle,
that doesn’t change.  And we still won’t reset it on the first
vhost_dev_start(), because dev->suspended will be false then, only on
subsequent stop/start cycles, as before.  So the only difference is that
now the device is reset on start, not on stop.


The difference is that vhost_vdpa_dev_start is called after features
ack (via vhost_dev_start, through vhost_dev_set_features call) and vq
configuration (using vhost_virtqueue_start). A device reset forces the
device to forget about all of that, and qemu cannot configure them
again until qemu acks the features again.


Now I’m complete

Re: [PATCH v6 5/6] qapi: Add HV_BALLOON_STATUS_REPORT event

2023-07-25 Thread Markus Armbruster
"Maciej S. Szmigiero"  writes:

> From: "Maciej S. Szmigiero" 
>
> Used by the hv-balloon driver for (optional) guest memory status reports.

In what sense is it optional?  What enables / triggers it?

Use case for the event?

Could a status event make sense for other balloon drivers as well?

> Signed-off-by: Maciej S. Szmigiero 
> ---
>  qapi/machine.json | 25 +
>  1 file changed, 25 insertions(+)
>
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 5ede977cf2bc..9649616b9ed2 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1113,6 +1113,31 @@
>  { 'event': 'BALLOON_CHANGE',
>'data': { 'actual': 'int' } }
>  
> +##
> +# @HV_BALLOON_STATUS_REPORT:
> +#
> +# Emitted when the hv-balloon driver receives a "STATUS" message from
> +# the guest.

Aha, the event is triggered by the guest.  It must therefore be
rate-limited, just like BALLOON_CHANGE.  To do that, add it to
monitor_qapi_event_conf[] in monitor/monitor.c, and document it as noted
below.
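
For reference, the rate-limit entry would be a one-liner along these lines
(sketch only; the generated QAPI_EVENT_HV_BALLOON_STATUS_REPORT constant and
the one-second interval are assumed, mirroring the existing BALLOON_CHANGE
entry in monitor_qapi_event_conf[]):

    /* monitor/monitor.c, monitor_qapi_event_conf[]: limit the event to
     * one per second, like BALLOON_CHANGE. */
    [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },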

> +#
> +# @commited: the amount of memory in use inside the guest plus the amount
> +#of the memory unusable inside the guest (ballooned out,
> +#offline, etc.)
> +#
> +# @available: the amount of the memory inside the guest available for new
> +# allocations ("free")

Spelling: committed.  Remember to update the example, too.

Please format like

# @committed: the amount of memory in use inside the guest plus the
# amount of the memory unusable inside the guest (ballooned out,
# offline, etc.)
#
# @available: the amount of the memory inside the guest available for
# new allocations ("free")

to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
to conform to current conventions).

> +#

To document rate-limiting, add:

   # Note: this event is rate-limited.
   #

> +# Since: TBD
> +#
> +# Example:
> +#
> +# <- { "event": "HV_BALLOON_STATUS_REPORT",
> +#  "data": { "commited": 81664, "available": 054464 },
> +#  "timestamp": { "seconds": 1600295492, "microseconds": 661044 } }
> +#
> +##
> +{ 'event': 'HV_BALLOON_STATUS_REPORT',
> +  'data': { 'commited': 'size', 'available': 'size' } }
> +
>  ##
>  # @MemoryInfo:
>  #

An event is commonly paired with a query command, so that QMP clients
can resynchronize state after missing events, e.g. when reconnecting
after a client restart.

query-balloon isn't such a query: it returns less than the event.

If a paired query doesn't make sense, explain why.




[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread dinglimin
Replaced a call to malloc() and its respective call to free() with g_malloc() 
and g_free().

Signed-off-by: dinglimin 
---
 semihosting/uaccess.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c
index 8018828069..8f2e6f63ee 100644
--- a/semihosting/uaccess.c
+++ b/semihosting/uaccess.c
@@ -14,11 +14,10 @@
 void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
 target_ulong len, bool copy)
 {
-void *p = malloc(len);
+void *p = g_malloc(len);
 if (p && copy) {
 if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
-free(p);
-p = NULL;
+g_free(p);
 }
 }
 return p;
-- 
2.30.0.windows.2






Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread Michael Tokarev

25.07.2023 11:06, dinglimin wrote:

Replaced a call to malloc() and its respective call to free() with g_malloc() 
and g_free().

...

  void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
  target_ulong len, bool copy)
  {
-void *p = malloc(len);
+void *p = g_malloc(len);
  if (p && copy) {
  if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
-free(p);
-p = NULL;
+g_free(p);
  }
  }
  return p;


This is definitely wrong.

Hint: what this function will return if cpu_memory_rw_debug() fails?

/mjt



Re: [PATCH v6 4/6] qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo

2023-07-25 Thread Markus Armbruster
"Maciej S. Szmigiero"  writes:

> From: "Maciej S. Szmigiero" 
>
> Used by the hv-balloon driver to report its provided memory state
> information.
>
> Co-developed-by: David Hildenbrand 
> Signed-off-by: Maciej S. Szmigiero 
> ---
>  hw/core/machine-hmp-cmds.c | 15 +++
>  qapi/machine.json  | 39 --
>  2 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index c3e55ef9e9cd..7b06ed35decb 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -247,6 +247,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  MemoryDeviceInfo *value;
>  PCDIMMDeviceInfo *di;
>  SgxEPCDeviceInfo *se;
> +HvBalloonDeviceInfo *hi;
>  
>  for (info = info_list; info; info = info->next) {
>  value = info->value;
> @@ -304,6 +305,20 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  monitor_printf(mon, "  node: %" PRId64 "\n", se->node);
>  monitor_printf(mon, "  memdev: %s\n", se->memdev);
>  break;
> +case MEMORY_DEVICE_INFO_KIND_HV_BALLOON:
> +hi = value->u.hv_balloon.data;
> +monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
> +   MemoryDeviceInfoKind_str(value->type),
> +   hi->id ? hi->id : "");
> +if (hi->has_memaddr) {
> +monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n",
> +   hi->memaddr);
> +}
> +monitor_printf(mon, "  max-size: %" PRIu64 "\n", 
> hi->max_size);
> +if (hi->memdev) {
> +monitor_printf(mon, "  memdev: %s\n", hi->memdev);
> +}
> +break;
>  default:
>  g_assert_not_reached();
>  }
> diff --git a/qapi/machine.json b/qapi/machine.json
> index a08b6576cac6..5ede977cf2bc 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1265,6 +1265,29 @@
>}
>  }
>  
> +##
> +# @HvBalloonDeviceInfo:
> +#
> +# hv-balloon provided memory state information
> +#
> +# @id: device's ID
> +#
> +# @memaddr: physical address in memory, where device is mapped
> +#
> +# @max-size: the maximum size of memory that the device can provide
> +#
> +# @memdev: memory backend linked with device
> +#
> +# Since: TBD

I understand why you put in TBD here (aiming for a moving target is a
hassle), but patches not marked RFC should have no known issues that
should be fixed before merging them.

> +##
> +{ 'struct': 'HvBalloonDeviceInfo',
> +  'data': { '*id': 'str',
> +'*memaddr': 'size',
> +'max-size': 'size',
> +'*memdev': 'str'
> +  }
> +}
> +
>  ##
>  # @MemoryDeviceInfoKind:
>  #
> @@ -1276,10 +1299,13 @@
>  #
>  # @sgx-epc: since 6.2.
>  #
> +# @hv-balloon: since TBD.
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'MemoryDeviceInfoKind',
> -  'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc' ] }
> +  'data': [ 'dimm', 'nvdimm', 'virtio-pmem', 'virtio-mem', 'sgx-epc',
> +'hv-balloon' ] }
>  
>  ##
>  # @PCDIMMDeviceInfoWrapper:
> @@ -1313,6 +1339,14 @@
>  { 'struct': 'SgxEPCDeviceInfoWrapper',
>'data': { 'data': 'SgxEPCDeviceInfo' } }
>  
> +##
> +# @HvBalloonDeviceInfoWrapper:
> +#
> +# Since: TBD
> +##
> +{ 'struct': 'HvBalloonDeviceInfoWrapper',
> +  'data': { 'data': 'HvBalloonDeviceInfo' } }
> +
>  ##
>  # @MemoryDeviceInfo:
>  #
> @@ -1327,7 +1361,8 @@
>  'nvdimm': 'PCDIMMDeviceInfoWrapper',
>  'virtio-pmem': 'VirtioPMEMDeviceInfoWrapper',
>  'virtio-mem': 'VirtioMEMDeviceInfoWrapper',
> -'sgx-epc': 'SgxEPCDeviceInfoWrapper'
> +'sgx-epc': 'SgxEPCDeviceInfoWrapper',
> +'hv-balloon': 'HvBalloonDeviceInfoWrapper'
>}
>  }
>  

The organization of the series feels a bit awkward.

In this patch, you define QAPI types and add a bit of code reading them,
but the code creating them is left for later.

In the next patch, you define a QMP event, but the code sending it is
left for later.

In the final, huge patch, you fill in the blanks.

Adding definitions before their uses can be the least awkward solution.
But then the commit messages should point out that uses come later.
Describing these future uses briefly may be necessary to help the reader
understand the patch on its own.

Perhaps you can restructure the series instead.




Re: [PATCH v6 0/6] Hyper-V Dynamic Memory Protocol driver (hv-balloon 🎈️)

2023-07-25 Thread Markus Armbruster
David Hildenbrand  writes:

> On 20.07.23 12:12, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" 
>> This is a continuation of the v5 of the patch series located here:
>> https://lore.kernel.org/qemu-devel/cover.1686577753.git.maciej.szmigi...@oracle.com/
>> 
>
> We're now in QEMU soft-freeze, which means the memslot series might take a 
> bit to land. I'm going to follow-up on that soonish.
>
>> Changes from v5:
>> * Incorporate David's rework of the driver on top of his virtio-mem-memslots
>> patches (specifically, commit 6769107d1a4f), making use of a memory region
>> container created upfront to avoid calling memory_device{,_pre}_plug()
>> functions from the driver and introducing a driver-specific MemoryDeviceInfo
>> sub-type.
>> * Include two additional David's memory-device patches necessary for the
>> aforementioned conversion in this patch set.
>> * Use multiple memslots to cover the hot-add memory backend in order to
>> reduce metadata size for the not-yet-hot-added part of the memory backend.
>> * Add David's "Co-developed-by:" to patches where he contributed some 
>> changes.
>> * Use OBJECT_DEFINE_TYPE_WITH_INTERFACES() and OBJECT_DECLARE_SIMPLE_TYPE()
>> macros instead of open-coding the equivalent functionality.
>> * Drop no longer necessary patch adding g_autoptr() cleanup function for the
>> Error type.
>> David Hildenbrand (2):
>>memory-device: Support empty memory devices
>>memory-device: Drop size alignment check
>> Maciej S. Szmigiero (4):
>>Add Hyper-V Dynamic Memory Protocol definitions
>>qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo
>>qapi: Add HV_BALLOON_STATUS_REPORT event
>>Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)
>
> That is still a gigantic patch. Is there any way to split that into 
> reasonable chunks? For example, move the whole hotplug/memslot part into
> a dedicated patch?
>
> See below on splitting off the PC changes.
>
>>   Kconfig.host |3 +
>>   hw/core/machine-hmp-cmds.c   |   15 +
>>   hw/hyperv/Kconfig|5 +
>>   hw/hyperv/hv-balloon.c   | 2246 ++
>>   hw/hyperv/meson.build|1 +
>>   hw/hyperv/trace-events   |   18 +
>>   hw/i386/pc.c |   22 +
>>   hw/mem/memory-device.c   |   45 +-
>>   include/hw/hyperv/dynmem-proto.h |  423 ++
>>   include/hw/hyperv/hv-balloon.h   |   18 +
>>   include/hw/mem/memory-device.h   |7 +-
>>   meson.build  |   28 +-
>>   meson_options.txt|2 +
>>   qapi/machine.json|   64 +-
>>   scripts/meson-buildoptions.sh|3 +
>
> It's probably best to separate the actual device implementation from wiring 
> up the machine. That is, have a HV_BALLOON_SUPPORTED kconfig
> (like VIRTIO_MEM_SUPPORTED), and activate that in a single commit for
> PC, where you also modify hw/i386/pc.c.
>
> That commit would be called something like "pc: Support hv-balloon".

Furthermore, try to factor out stuff related to QMP:

0. The driver with QMP stuff omitted / stubbed out

1. Enable query-memory-devices

2. Add HV_BALLOON_STATUS_REPORT event




Re: [PATCH v6 4/6] qapi: Add HvBalloonDeviceInfo sub-type to MemoryDeviceInfo

2023-07-25 Thread Markus Armbruster
"Maciej S. Szmigiero"  writes:

> On 24.07.2023 13:37, Markus Armbruster wrote:
>> "Maciej S. Szmigiero"  writes:
>> 
>>> From: "Maciej S. Szmigiero" 
>>>
>>> Used by the hv-balloon driver to report its provided memory state
>>> information.
>>>
>>> Co-developed-by: David Hildenbrand 
>>> Signed-off-by: Maciej S. Szmigiero 
>>> ---
>>>   hw/core/machine-hmp-cmds.c | 15 +++
>>>   qapi/machine.json  | 39 --
>>>   2 files changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
>>> index c3e55ef9e9cd..7b06ed35decb 100644
>>> --- a/hw/core/machine-hmp-cmds.c
>>> +++ b/hw/core/machine-hmp-cmds.c
>>> @@ -247,6 +247,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
>>> *qdict)
>>>   MemoryDeviceInfo *value;
>>>   PCDIMMDeviceInfo *di;
>>>   SgxEPCDeviceInfo *se;
>>> +HvBalloonDeviceInfo *hi;
>>> for (info = info_list; info; info = info->next) {
>>>   value = info->value;
>>> @@ -304,6 +305,20 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
>>> *qdict)
>>>   monitor_printf(mon, "  node: %" PRId64 "\n", se->node);
>>>   monitor_printf(mon, "  memdev: %s\n", se->memdev);
>>>   break;
>>> +case MEMORY_DEVICE_INFO_KIND_HV_BALLOON:
>>
>> This is the only occurence of MEMORY_DEVICE_INFO_KIND_HV_BALLOON at the
>> end of the series.  Where are MemoryDeviceInfo with this union tag
>> created?
>
> From patch 6:
[...]

I messed around with git, and managed to confuse myself.  Thanks for
your help!




Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume

2023-07-25 Thread Hanna Czenczek

On 24.07.23 19:55, Stefan Hajnoczi wrote:

On Wed, Jul 19, 2023 at 03:59:32PM +0200, Hanna Czenczek wrote:

On 18.07.23 16:25, Stefan Hajnoczi wrote:

On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote:

When stopping the VM, qemu wants all devices to fully cease any
operation, too.  Currently, we can only have vhost-user back-ends stop
processing vrings, but not background operations.  Add the SUSPEND and
RESUME commands from vDPA, which allow the front-end (qemu) to tell
back-ends to cease all operations, including those running in the
background.

qemu's current work-around for this is to reset the back-end instead of
suspending it, which will not work for back-ends that have internal
state that must be preserved across e.g. stop/cont.

Note that the given specification requires the back-end to delay
processing kicks (i.e. starting vrings) until the device is resumed,
instead of requiring the front-end not to send kicks while suspended.
qemu starts devices (and would just resume them) only when the VM is in
a running state, so it would be difficult to have qemu delay kicks until
the device is resumed, which is why this patch specifies handling of
kicks as it does.

Signed-off-by: Hanna Czenczek 
---
   docs/interop/vhost-user.rst | 35 +--
   1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5a070adbc1..ac6be34c4c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -381,6 +381,10 @@ readable) on the descriptor specified by 
``VHOST_USER_SET_VRING_KICK``
   or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated,
   and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
+While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must
+never process rings, and thus also delay handling kicks until the

If you respin this series, I suggest replacing "never" with "not" to
emphasize that ring processing is only skipped while the device is
suspended (rather than forever). "Never" feels too strong to use when
describing a temporary state.

Sure.


+back-end is resumed again.
+
   Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
   If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
@@ -479,8 +483,9 @@ supplied by ``VhostUserLog``.
   ancillary data, it may be used to inform the front-end that the log has
   been modified.
-Once the source has finished migration, rings will be stopped by the
-source. No further update must be done before rings are restarted.
+Once the source has finished migration, the device will be suspended and
+its rings will be stopped by the source. No further update must be done
+before the device and its rings are resumed.

This paragraph is abstract and doesn't directly identify the mechanisms
or who does what:
- "the device will be suspended" via VHOST_USER_SUSPEND (or reset when
VHOST_USER_SUSPEND is not supported?) or automatically by the device
itself or some other mechanism?

OK, I’ll add VHOST_USER_SUSPEND.

So far I hadn’t considered making a note of resetting as a fallback right in
the specification.  I don’t think I would want it in the specification, but
on the other hand, it is probably important for back-end authors to know
that if they do not implement SUSPEND, their device is going to be reset by
qemu.

Can we make that a ”may”, i.e.:

```
Once the source has finished migration, the device will be suspended via
VHOST_USER_SUSPEND and its rings will be stopped by the source.

I'm not sure what "its rings will be stopped by the source" means
exactly. Is it summarizing the effect of VHOST_USER_SUSPEND or is there
an additional action after VHOST_USER_SUSPEND that stops rings? And I'm
not sure whether "by the source" means by the front-end or back-end on
the source machine?


This is pre-existing text and I assumed it (without doubt) to refer to
GET_VRING_BASE, because that is how rings are stopped.


I can improve the existing documentation and add the reference to 
GET_VRING_BASE, and say that it clearly must come from the front-end.



No further
update must be done before the device and its rings are resumed.

"Update" to what? Guest RAM? Device state? Rings?

I feel like this text is too vague for a spec. People may interpret it
differently. Can you make rephrase this to more concrete?


Honestly, no.  This is pre-existing, and I have the same questions as 
you do.


I cannot “rephrase” this to make it more concrete.  I can try to
actually specify what was left unspecified, but that would be a
change in the specification that would require its own patch, separate
from this series.


Personally, I’ve generally taken this sentence to be fluff.  If the 
rings are stopped, clearly, they should not be accessed at all. Probably 
the back-end should also refrain from writing to guest memory, because 
that is a logical conclusion from having the rings stopped. 

[Bug 1787] Qemu asan test make vm crash when using qxl and spice

2023-07-25 Thread zhangjianguo (A)
Bug links: https://gitlab.com/qemu-project/qemu/-/issues/1787

When we tested QEMU with asan, the VM crashed.
How to reproduce the bug:

1. Start the VM with qxl and spice.

2. Attach to the VM with vnc and spice.

3. Leave it running for more than three days.

4. Operate the spice client; this can reproduce the bug.

I think the reason for the problem is that the cursor pointer was not set to
NULL when qemu calls cursor_put.
But I don't know what situation triggers this error. This error is
difficult to reproduce naturally.
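
In other words, the suspected missing step is roughly the following
(illustrative only; "qc->cursor" is a stand-in for whichever field still holds
the stale pointer after the release):

    cursor_put(qc->cursor);   /* drops the reference ... */
    qc->cursor = NULL;        /* ... but without clearing the pointer, a later
                               * access becomes a use-after-free */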


Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset

2023-07-25 Thread Hanna Czenczek

On 24.07.23 20:04, Stefan Hajnoczi wrote:

On Fri, Jul 21, 2023 at 04:16:07PM +0200, Hanna Czenczek wrote:

On 20.07.23 18:03, Stefan Hajnoczi wrote:

On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote:

On 19.07.23 16:11, Hanna Czenczek wrote:

On 18.07.23 17:10, Stefan Hajnoczi wrote:

On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote:

The only user of vhost_user_reset_status() is vhost_dev_stop(), which
only uses it as a fall-back to stop the back-end if it does not support
SUSPEND.  However, vhost-user's implementation is a no-op unless the
back-end supports SET_STATUS.

vhost-vdpa's implementation instead just calls
vhost_vdpa_reset_device(), implying that it's OK to fully reset the
device if SET_STATUS is not supported.

To be fair, vhost_vdpa_reset_device() does nothing but to set
the status
to zero.  However, that may well be because vhost-vdpa has no method
besides this to reset a device.  In contrast, vhost-user has
RESET_DEVICE and a RESET_OWNER, which can be used instead.

While it is not entirely clear from documentation or git logs, from
discussions and the order of vhost-user protocol features, it
appears to
me as if RESET_OWNER originally had no real meaning for vhost-user, and
was thus used to signal a device reset to the back-end.  Then,
RESET_DEVICE was introduced, to have a well-defined dedicated reset
command.  Finally, vhost-user received full STATUS support, including
SET_STATUS, so setting the device status to 0 is now the preferred way
of resetting a device.  Still, RESET_DEVICE and RESET_OWNER should
remain valid as fall-backs.

Therefore, have vhost_user_reset_status() fall back to
vhost_user_reset_device() if the back-end has no STATUS support.

Signed-off-by: Hanna Czenczek 
---
    hw/virtio/vhost-user.c | 2 ++
    1 file changed, 2 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 4507de5a92..53a881ec2a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct
vhost_dev *dev)
    if (virtio_has_feature(dev->protocol_features,
   VHOST_USER_PROTOCOL_F_STATUS)) {
    vhost_user_set_status(dev, 0);
+    } else {
+    vhost_user_reset_device(dev);
    }
    }

Did you check whether DPDK treats setting the status to 0 as equivalent
to RESET_DEVICE?

If it doesn’t, what’s even the point of using reset_status?

Sorry, I’m being unclear, and I think this may be important because it ties
into the question from patch 1 of what qemu is even trying to do by running
SET_STATUS(0) in vhost_dev_stop(), so here’s what gave me the impression that
SET_STATUS(0) and RESET_DEVICE should be equivalent:

vhost-vdpa.c runs SET_STATUS(0) in a function called
vhost_vdpa_reset_device().  This is one thing that gave me the impression
that this is about an actual full reset.

Another is the whole discussion that we’ve had.  vhost_dev_stop() does not
call a `vhost_reset_device()` function, it calls `vhost_reset_status()`.
Still, we were always talking about resetting the device.

There is some hacky stuff with struct vhost_dev's vq_index_end and
multi-queue devices. I think it's because multi-queue vhost-net device
consist of many vhost_devs and NetClientStates, so certain vhost
operations are skipped unless this is the "first" or "last" vhost_dev
from a large aggregate vhost-net device. That might be responsible for
part of the weirdness.


It doesn’t make sense to me that vDPA would provide no function to fully
reset a device, while vhost-user does.  Being able to reset a device sounds
vital to me.  This also gave me the impression that SET_STATUS(0) on vDPA at
least is functionally equivalent to a full device reset.


Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on
vhost-user.  That would be a real shame, so I assumed this would not be the
case; that SET_STATUS(0) does the same thing on both protocols.

Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user
it's currently only used by DPDK as a hint for when device
initialization is complete:
https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1

FWIW, now the code is a bit different.
https://github.com/DPDK/dpdk/commit/671cc679a5fcd26705bb20ddc13b93e665719054
has added a RESET interpretation for the status field, i.e. when it is 0.
It doesn’t do anything, but at least DPDK seems to agree that SET_STATUS(0)
is a reset.

That patch adds diagnostics but does not perform any action for
SET_STATUS 0. DPDK's vhost_user_reset_owner() is still the only place
where the device is actually reset.


That’s what I said, it doesn’t do anything, but the diagnostics agree 
that it is a RESET.



QEMU cannot switch to just
SET_STATUS 0, it still needs to send RESET_DEVICE/RESET_OWNER.


That is what I questioned below: We currently *do not* call 
RESET_DEVICE/RESET_OWNER.  This patch is not about switching to 
SET_STATUS(0), it is abo

Re: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup

2023-07-25 Thread chenyuhui (A)
@Peter Xu @Fabiano Rosas
Kindly ping on this.

On 2023/6/27 9:11, chenyuhui (A) wrote:
> 
> On 2023/6/26 21:16, chenyuhui (A) wrote:
>>
>> On 2023/6/21 22:22, Fabiano Rosas wrote:
>>> Jianguo Zhang via  writes:
>>>
 From: Yuhui Chen 

 There is a coredump while trying to destroy a mutex when
 p->running is false but p->mutex is not unlocked.
 Make sure all mutexes have been released before destroying them.

 Signed-off-by: Yuhui Chen 
 ---
  migration/multifd.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

 diff --git a/migration/multifd.c b/migration/multifd.c
 index b7ad7002e0..7dcdb2d3a0 100644
 --- a/migration/multifd.c
 +++ b/migration/multifd.c
 @@ -523,9 +523,7 @@ void multifd_save_cleanup(void)
  for (i = 0; i < migrate_multifd_channels(); i++) {
  MultiFDSendParams *p = &multifd_send_state->params[i];
  
 -if (p->running) {
>>>
>>> The need for this flag is dubious IMO. Commit 10351fbad1
>>> ("migration/multifd: Join all multifd threads in order to avoid leaks")
>>> already moved the other join outside of it. If we figure out another way
>>> to deal with the sem_sync lockup we could probably remove this
>>> altogether.
>>
>>
>> I've seen commit 10351fbad1, and it seems to have the same
>> problem in multifd_save_cleanup.
>>
>> So my patch may only need to modify multifd_save_cleanup.
>>
>> __
>>
>>
>> On 2023/6/21 21:24, Peter Xu wrote:
>>> On Wed, Jun 21, 2023 at 04:18:26PM +0800, Jianguo Zhang via wrote:
 From: Yuhui Chen

 There is a coredump while trying to destroy a mutex when
 p->running is false but p->mutex is not unlocked.
 Make sure all mutexes have been released before destroying them.
>>>
>>> It'll be nice to add a backtrace of the coredump here, and also copy
>>> maintainer (Juan Quintela, copied now).
>>>
>>
>> The following is coredump, and my code is base on
>> https://github.com/qemu/qemu.git tag v6.2.0.
>>
> (gdb) bt
> #0  0xabe3b2b8 in  () at /usr/lib64/libc.so.6
> #1  0xabdf6d7c in raise () at /usr/lib64/libc.so.6
> #2  0xabde4d2c in abort () at /usr/lib64/libc.so.6
> #3  0xc67fcc10 in error_exit (err=, 
> msg=msg@entry=0xc6dc52b8 <__func__.33> "qemu_mutex_destroy") at 
> ../util/qemu-thread-posix.c:38
> #4  0xc67fce38 in qemu_mutex_destroy 
> (mutex=mutex@entry=0xfa1a4250) at ../util/qemu-thread-posix.c:71
> #5  0xc6055688 in multifd_save_cleanup () at 
> ../migration/multifd.c:555
> #6  0xc6050198 in migrate_fd_cleanup (s=s@entry=0xf7518800) at 
> ../migration/migration.c:1808
> #7  0xc6050384 in migrate_fd_cleanup_bh (opaque=0xf7518800) at 
> ../migration/migration.c:1850
> #8  0xc680d790 in aio_bh_call (bh=0xa0004c40) at 
> ../util/async.c:141
> #9  aio_bh_poll (ctx=ctx@entry=0xf73285a0) at ../util/async.c:169
> #10 0xc67f9e18 in aio_dispatch (ctx=0xf73285a0) at 
> ../util/aio-posix.c:381
> #11 0xc680d414 in aio_ctx_dispatch (source=, 
> callback=, user_data=) at ../util/async.c:311
> #12 0xac44cf88 in g_main_context_dispatch () at 
> /usr/lib64/libglib-2.0.so.0
> #13 0xc6819214 in glib_pollfds_poll () at ../util/main-loop.c:232
> #14 os_host_main_loop_wait (timeout=73500) at ../util/main-loop.c:255
> #15 main_loop_wait (nonblocking=nonblocking@entry=0) at 
> ../util/main-loop.c:531
> #16 0xc65005cc in qemu_main_loop () at ../softmmu/runstate.c:726
> #17 0xc5fe2030 in main (argc=, argv=, 
> envp=) at ../softmmu/main.c:50
> (gdb) q
> 
>> How reproducible:
>> 1. Add sleep time to produce the case where p->running is false but
>>  p->mutex is not unlocked (apply the following patch).
>> 2. Do migration with --parallel-connections.
 From: Yuhui Chen 
>> Date: Mon, 26 Jun 2023 14:24:35 +0800
>> Subject: [DEBUG][PATCH] And sleep time to produce p->running is false but 
>> p->mutex is
>>  not unlock.
>>
>> ---
>>  migration/multifd.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 7c9deb1921..09a7b0748a 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
>>  for (i = 0; i < migrate_multifd_channels(); i++) {
>>  MultiFDSendParams *p = &multifd_send_state->params[i];
>>
>> +sleep(2);
>>  if (p->running) {
>>  qemu_thread_join(&p->thread);
>>  }
>> @@ -719,6 +720,7 @@ out:
>>
>>  qemu_mutex_lock(&p->mutex);
>>  p->running = false;
>> +sleep(20);
>>  qemu_mutex_unlock(&p->mutex);
>>
>>  rcu_unregister_thread();



Re: [PATCH v21 01/20] s390x/cpu topology: add s390 specifics to CPU topology

2023-07-25 Thread Pierre Morel



On 7/24/23 12:15, Nina Schoetterl-Glausch wrote:

On Fri, 2023-07-21 at 13:24 +0200, Pierre Morel wrote:

On 7/18/23 18:31, Nina Schoetterl-Glausch wrote:

Reviewed-by: Nina Schoetterl-Glausch 

Some notes below.

The s390x/ prefix in the title might suggest that this patch
is s390 specific, but it touches common files.


Right.

What do you suggest?

I can cut it in two or squash it with patch number 2.

The first idea was to separate the patches to ease the review, but the
functionality introduced in patch 1 only makes sense with patch 2.

So I would be for squashing the first two patches.

?

Oh, I'd just change the title.

CPU topology: extend with s390 specifics

or similar, it was just a nit not to create the impression that the
patch only touches s390 stuff.



OK thanks

Pierre






Re: [PATCH v21 02/20] s390x/cpu topology: add topology entries on CPU hotplug

2023-07-25 Thread Pierre Morel



On 7/24/23 22:19, Nina Schoetterl-Glausch wrote:

On Fri, 2023-06-30 at 11:17 +0200, Pierre Morel wrote:

The topology information is a set of attributes of the CPU and is
specified during the CPU device creation.

On hot plug we:
- calculate the default values for the topology for drawers,
   books and sockets in the case they are not specified.
- verify the CPU attributes
- check that we still have room on the desired socket

The possibility to insert a CPU in a mask depends on the
number of cores allowed in a socket, a book or a drawer; the
checking is done during the hot plug of the CPU to give an
immediate answer.

If the complete topology is not specified, the core is added
to the physical topology based on its core ID and it gets
default values for the modifier attributes.

This way, starting QEMU without specifying the topology can
still take some advantage of the CPU topology.

Signed-off-by: Pierre Morel 

Reviewed-by: Nina Schoetterl-Glausch  if you address
Thomas' comments.


Thanks,

regards

Pierre




[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread dinglimin
Replaced a call to malloc() and its respective call to free() with g_malloc() 
and g_free().

Signed-off-by: dinglimin 

V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL
---
 semihosting/uaccess.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c
index 8018828069..2ac754cdb6 100644
--- a/semihosting/uaccess.c
+++ b/semihosting/uaccess.c
@@ -14,10 +14,10 @@
 void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
 target_ulong len, bool copy)
 {
-void *p = malloc(len);
+void *p = g_malloc(len);
 if (p && copy) {
 if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
-free(p);
+g_free(p);
 p = NULL;
 }
 }
-- 
2.30.0.windows.2






Re: [PATCH QEMU v10 3/9] qapi/migration: Introduce vcpu-dirty-limit parameters

2023-07-25 Thread Markus Armbruster
~hyman  writes:

> From: Hyman Huang(黄勇) 
>
> Introduce "vcpu-dirty-limit" migration parameter used
> to limit dirty page rate during live migration.
>
> "vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are
> two dirty-limit-related migration parameters, which can
> be set before and during live migration by qmp
> migrate-set-parameters.
>
> These two parameters are used to help implement the dirty
> page rate limit algorithm of migration.
>
> Signed-off-by: Hyman Huang(黄勇) 
> Acked-by: Peter Xu 
> Reviewed-by: Juan Quintela 

Reviewed-by: Markus Armbruster 




Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread Michael Tokarev

25.07.2023 12:00, dinglimin wrote:

Replaced a call to malloc() and its respective call to free() with g_malloc() 
and g_free().

Signed-off-by: dinglimin 

V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL
---
  semihosting/uaccess.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c
index 8018828069..2ac754cdb6 100644
--- a/semihosting/uaccess.c
+++ b/semihosting/uaccess.c
@@ -14,10 +14,10 @@
  void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
  target_ulong len, bool copy)
  {
-void *p = malloc(len);
+void *p = g_malloc(len);
  if (p && copy) {
  if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
-free(p);
+g_free(p);
  p = NULL;
  }
  }


Ok, that was the obvious part.  Now a next one, also obvious.

You changed lock_user to use g_malloc(), but unlock_user
still uses free() instead of g_free().  At the very least
the other one needs to be changed too.  And I'd say the callers
should be analyzed to ensure they don't free() the result
(they should not; I think it is a bug if they do).

lock_user/unlock_user (which #defines to softmmu_lock_user/
softmmu_unlock_user in softmmu mode) is used a *lot*.

/mjt



Re: [PATCH] migration: Allow user to specify migration available bandwidth

2023-07-25 Thread Daniel P . Berrangé
On Mon, Jul 24, 2023 at 03:47:50PM -0400, Peter Xu wrote:
> On Mon, Jul 24, 2023 at 07:04:29PM +0100, Daniel P. Berrangé wrote:
> > On Mon, Jul 24, 2023 at 01:07:55PM -0400, Peter Xu wrote:
> > > Migration bandwidth is a very important value for live migration.  It's
> > > because it's one of the major factors that we'll use to decide when to
> > > switch over to the destination in a precopy process.
> > 
> > To elaborate on this for those reading along...
> > 
> > QEMU takes the maximum downtime limit and multiplies it by its estimate
> > of bandwidth. This gives a figure for the amount of data QEMU thinks
> > it can transfer within the downtime period.
> > 
> > QEMU compares this figure to the amount of data that is still pending
> > at the end of an iteration.
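
As a simplified sketch of that check (the function and variable names are
illustrative, not the literal code in migration.c):

    /* Decide whether the remaining dirty data fits in the downtime budget. */
    static bool should_switch_over(uint64_t pending_bytes,
                                   uint64_t bandwidth_bytes_per_ms,
                                   uint64_t max_downtime_ms)
    {
        /* Data QEMU believes it can transfer within the allowed downtime. */
        uint64_t threshold = bandwidth_bytes_per_ms * max_downtime_ms;

        return pending_bytes <= threshold;
    }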
> > 
> > > This value is currently estimated by QEMU during the whole live migration
> > > process by monitoring how fast we were sending the data.  This can be the
> > > most accurate bandwidth in the ideal world, where we're always feeding
> > > unlimited data to the migration channel, so that it'll be limited only by
> > > the bandwidth that is available.
> > 
> > The QEMU estimate for available bandwidth will definitely be wrong,
> > potentially by orders of magnitude, if QEMU has a max bandwidth limit
> > set, as in that case it is never trying to push the peak rates available
> > from the NICs/network fabric.
> > 
> > > The issue is QEMU itself may not be able to avoid those uncertainties when
> > > measuring the real "available migration bandwidth".  At least not with
> > > anything I can think of so far.
> > 
> > IIUC, you can query the NIC properties to find the hardware transfer
> > rate of the NICs. That doesn't imply apps can actually reach that
> > rate in practice - it has a decent chance of being an over-estimate
> > of bandwidth, possibly very very much over.
> > 
> > Is such an over estimate better or worse than QEMU's current
> > under-estimate ? It depends on the POV.
> > 
> > From the POV of QEMU, over-estimating means it'll not
> > be throttling as much as it should. That's not a downside for
> > migration - it makes it more likely for migration to complete :-)
> 
> Heh. :)
> 
> > 
> > From the POV of non-QEMU apps though, if QEMU over-estimates,
> > it'll mean other apps get starved of network bandwidth.
> > 
> > Overall I agree, there's no obvious way QEMU can ever come up
> > with a reliable estimate for bandwidth available.
> > 
> > > One way to fix this is, when the user is fully aware of the available
> > > bandwidth, to allow the user to help by providing an accurate value.
> > >
> > > For example, if the user has a dedicated channel of 10Gbps for migration
> > > for this specific VM, the user can specify this bandwidth so QEMU can
> > > always do the calculation based on this fact, trusting the user as long as
> > > specified.
> > 
> > I can see that in theory, but when considering a non-trivial
> > deployments of QEMU, I wonder if the user can really have any
> > such certainty of what is truely avaialble. It would need
> > global awareness of the whole network of hosts & workloads.
> 
> Indeed it may or may not be easy always.
> 
> The good thing about this parameter is we always use the old estimation if
> the user can't specify anything valid, so this is always optional not
> required.
> 
> It solves the cases where the user can still specify the bandwidth accurately -
> our QE team has already verified that it worked for us on GPU tests, where
> it used to not be able to migrate at all with any sane downtime specified.
> I should have attached a Tested-By from Zhiyi but since this is not exactly
> the patch he was using I didn't.
> 
> > 
> > > When the user wants to have migration only use 5Gbps out of that 10Gbps,
> > > one can set max-bandwidth to 5Gbps, along with available-bandwidth to 
> > > 5Gbps
> > > so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> > > other things).  So it can be useful even if the network is not dedicated,
> > > but as long as the user can know a solid value.
> > > 
> > > A new parameter "available-bandwidth" is introduced just for this. So when
> > > the user specified this parameter, instead of trusting the estimated value
> > > from QEMU itself (based on the QEMUFile send speed), let's trust the user
> > > more.
> > 
> > I feel like rather than "available-bandwidth", we should call
> > it "max-convergance-bandwidth".
> > 
> > To me that name would better reflect the fact that this isn't
> > really required to be a measure of how much NIC bandwidth is
> > available. It is merely an expression of a different bandwidth
> > limit to apply during switch over.
> > 
> > IOW
> > 
> > * max-bandwidth: limit during pre-copy main transfer
> > * max-convergence-bandwidth: limit during pre-copy switch-over
> > * max-postcopy-bandwidth: limit during post-copy phase
> 
> I worry the suggested new name is not straightforward enough at first
> glance, even to me as a developer.
> 

Re: [PATCH 0/6] Add nRF51 DETECT signal with test

2023-07-25 Thread Peter Maydell
On Tue, 25 Jul 2023 at 04:25, Chris Laplante  wrote:
>
> Hi Peter,
>
> > Thanks for this patchset and especially for the work
> > improving the qtest infrastructure. I've given my
> > comments on the different patches, and in some cases
> > reviewed-by tags. (Where I've given one of those, you should
> > add it to your commit message for the relevant patch under
> > your Signed-off-by: line, so that when you send the version
> > 2 of the patchset we know that those parts are already
> > reviewed and don't need re-examining. If I said "make
> > some change; otherwise Reviewed-by" that means "make
> > that minor change, and then you can add the tag, etc".)
>
> Thanks very much for the feedback and help!
>
> > Do you have the parts of this feature that use the DETECT
> > signal in the POWER device, or have you not written those
> > yet ? If you have them, you could send those too in v2.
>
> That part is halfway done, so I will work on finishing it before submitting 
> v2. Two questions regarding that (to potentially save us a v3):
>
> 1. The nRF51 POWER device overlaps with the memory maps of the CLOCK and MPU 
> devices. So I have created a CPM (CLOCK, POWER, MPU) device in hw/misc. Does 
> that sound reasonable naming-wise?

Yes, I think from QEMU's point of view the massive register
overlap makes them a single device. The name sounds OK
(give it the same kind of nrf51 prefix the rng device has).

> 2. I also have some implementations for pieces of CLOCK, namely the 
> HFCLKSTART/HFCLKSTOP events and HFCLKSTARTED event. Should I include that in 
> this patch series, or would you prefer it in a separate series? It is 
> unrelated to DETECT and POWER.

If you think they're ready to go in, and it doesn't
make the series more than about 12-15 patches long,
you can put them on the end of the series. If the
patchset is starting to get a bit big then it might
be easier to get the POWER/DETECT parts reviewed
first.

thanks
-- PMM



Re: [PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread Peter Maydell
On Tue, 25 Jul 2023 at 10:13, Michael Tokarev  wrote:
>
> 25.07.2023 12:00, dinglimin wrote:
> > Replaced a call to malloc() and its respective call to free() with 
> > g_malloc() and g_free().
> >
> > Signed-off-by: dinglimin 
> >
> > V1 -> V2:if cpu_memory_rw_debug failed, still need to set p=NULL
> > ---
> >   semihosting/uaccess.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c
> > index 8018828069..2ac754cdb6 100644
> > --- a/semihosting/uaccess.c
> > +++ b/semihosting/uaccess.c
> > @@ -14,10 +14,10 @@
> >   void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
> >   target_ulong len, bool copy)
> >   {
> > -void *p = malloc(len);
> > +void *p = g_malloc(len);
> >   if (p && copy) {
> >   if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
> > -free(p);
> > +g_free(p);
> >   p = NULL;
> >   }
> >   }
>
> Ok, that was the obvious part.  Now a next one, also obvious.
>
> You changed lock_user to use g_malloc(), but unlock_user
> still uses free() instead of g_free().  At the very least
> the other one needs to be changed too.  And I'd say the callers
> should be analyzed to ensure they don't free() the result
> (they should not, think it is a bug if they do).

We can be pretty sure the callers don't free() the returned
value, because the calling code is also used in user-mode,
where the lock/unlock implementation is entirely different
and calling free() on the pointer will not work.

> lock_user/unlock_user (which #defines to softmmu_lock_user/
> softmmu_unlock_user in softmmu mode) is used a *lot*.

The third part here, is that g_malloc() does not ever
fail -- it will abort() on out of memory. However
the code here is still handling g_malloc() returning NULL.
The equivalent for "we expect this might fail" (which we want
here, because the guest is passing us the length of memory
to try to allocate) is g_try_malloc().
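
Putting Michael's and Peter's points together, a corrected version would look
roughly like this (a sketch, not the patch actually submitted in this thread):

    void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
                            target_ulong len, bool copy)
    {
        /* g_try_malloc() may return NULL, which is wanted here because the
         * guest controls len; g_malloc() would abort() on failure instead. */
        void *p = g_try_malloc(len);
        if (p && copy) {
            if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
                g_free(p);
                p = NULL;
            }
        }
        return p;
    }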

thanks
-- PMM



intel-iommu: Report interrupt remapping faults, fix return value

2023-07-25 Thread David Woodhouse
From: David Woodhouse 

A generic X86IOMMUClass->int_remap function should not return VT-d
specific values; fix it to return 0 if the interrupt was successfully
translated or -EINVAL if not.

The VTD_FR_IR_xxx values are supposed to be used to actually raise
faults through the fault reporting mechanism, so do that instead for
the case where the IRQ is actually being injected.

There is more work to be done here, as pretranslations for the KVM IRQ
routing table can't fault; an untranslatable IRQ should be handled in
userspace and the fault raised only when the IRQ actually happens (if
indeed the IRTE is still not valid at that time). But we can work on
that later; we can at least raise faults for the direct case.

Signed-off-by: David Woodhouse 
---
 hw/i386/intel_iommu.c  | 148 ++---
 hw/i386/intel_iommu_internal.h |   1 +
 2 files changed, 102 insertions(+), 47 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060c..a65a94a4ce 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -469,21 +469,12 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState 
*s, uint16_t index)
 
 /* Must not update F field now, should be done later */
 static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
-uint16_t source_id, hwaddr addr,
-VTDFaultReason fault, bool is_write,
-bool is_pasid, uint32_t pasid)
+uint64_t hi, uint64_t lo)
 {
-uint64_t hi = 0, lo;
 hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
 
 assert(index < DMAR_FRCD_REG_NR);
 
-lo = VTD_FRCD_FI(addr);
-hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) |
- VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid);
-if (!is_write) {
-hi |= VTD_FRCD_T;
-}
 vtd_set_quad_raw(s, frcd_reg_addr, lo);
 vtd_set_quad_raw(s, frcd_reg_addr + 8, hi);
 
@@ -509,17 +500,11 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, 
uint16_t source_id)
 }
 
 /* Log and report an DMAR (address translation) fault to software */
-static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
-  hwaddr addr, VTDFaultReason fault,
-  bool is_write, bool is_pasid,
-  uint32_t pasid)
+static void vtd_report_frcd_fault(IntelIOMMUState *s, uint64_t source_id,
+  uint64_t hi, uint64_t lo)
 {
 uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
 
-assert(fault < VTD_FR_MAX);
-
-trace_vtd_dmar_fault(source_id, fault, addr, is_write);
-
 if (fsts_reg & VTD_FSTS_PFO) {
 error_report_once("New fault is not recorded due to "
   "Primary Fault Overflow");
@@ -539,8 +524,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
uint16_t source_id,
 return;
 }
 
-vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault,
-is_write, is_pasid, pasid);
+vtd_record_frcd(s, s->next_frcd_reg, hi, lo);
 
 if (fsts_reg & VTD_FSTS_PPF) {
 error_report_once("There are pending faults already, "
@@ -565,6 +549,40 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
uint16_t source_id,
 }
 }
 
+/* Log and report an DMAR (address translation) fault to software */
+static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
+  hwaddr addr, VTDFaultReason fault,
+  bool is_write, bool is_pasid,
+  uint32_t pasid)
+{
+uint64_t hi, lo;
+
+assert(fault < VTD_FR_MAX);
+
+trace_vtd_dmar_fault(source_id, fault, addr, is_write);
+
+lo = VTD_FRCD_FI(addr);
+hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) |
+ VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid);
+if (!is_write) {
+hi |= VTD_FRCD_T;
+}
+
+vtd_report_frcd_fault(s, source_id, hi, lo);
+}
+
+
+static void vtd_report_ir_fault(IntelIOMMUState *s, uint64_t source_id,
+VTDFaultReason fault, uint16_t index)
+{
+uint64_t hi, lo;
+
+lo = VTD_FRCD_IR_IDX(index);
+hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
+
+vtd_report_frcd_fault(s, source_id, hi, lo);
+}
+
 /* Handle Invalidation Queue Errors of queued invalidation interface error
  * conditions.
  */
@@ -3300,8 +3318,9 @@ static Property vtd_properties[] = {
 };
 
 /* Read IRTE entry with specific index */
-static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
-VTD_IR_TableEntry *entry, uint16_t sid)
+static bool vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
+ VTD_IR_TableEntry *entry, uint16_t sid,
+ bool do_fault)
 {
 static const uint16_t vtd_svt_mask[VTD_SQ_MAX] = \
 {0x, 0xfffb, 0xfff9, 0x

Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop

2023-07-25 Thread Eugenio Perez Martin
On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek  wrote:
>
> On 24.07.23 17:48, Eugenio Perez Martin wrote:
> > On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek  wrote:
> >> On 21.07.23 17:25, Eugenio Perez Martin wrote:
> >>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek  wrote:
>  Move the `suspended` field from vhost_vdpa into the global vhost_dev
>  struct, so vhost_dev_stop() can check whether the back-end has been
>  suspended by `vhost_ops->vhost_dev_start(hdev, false)`.  If it has,
>  there is no need to reset it; the reset is just a fall-back to stop
>  device operations for back-ends that do not support suspend.
> 
>  Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
>  when the device is re-started, we still have to do the reset to have it
>  un-suspend.
> 
>  Signed-off-by: Hanna Czenczek 
>  ---
> include/hw/virtio/vhost-vdpa.h |  2 --
> include/hw/virtio/vhost.h  |  8 
> hw/virtio/vhost-vdpa.c | 11 +++
> hw/virtio/vhost.c  |  8 +++-
> 4 files changed, 22 insertions(+), 7 deletions(-)
> 
>  diff --git a/include/hw/virtio/vhost-vdpa.h 
>  b/include/hw/virtio/vhost-vdpa.h
>  index e64bfc7f98..72c3686b7f 100644
>  --- a/include/hw/virtio/vhost-vdpa.h
>  +++ b/include/hw/virtio/vhost-vdpa.h
>  @@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
> bool shadow_vqs_enabled;
> /* Vdpa must send shadow addresses as IOTLB key for data queues, 
>  not GPA */
> bool shadow_data;
>  -/* Device suspended successfully */
>  -bool suspended;
> /* IOVA mapping used by the Shadow Virtqueue */
> VhostIOVATree *iova_tree;
> GPtrArray *shadow_vqs;
>  diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>  index 6a173cb9fa..69bf59d630 100644
>  --- a/include/hw/virtio/vhost.h
>  +++ b/include/hw/virtio/vhost.h
>  @@ -120,6 +120,14 @@ struct vhost_dev {
> uint64_t backend_cap;
> /* @started: is the vhost device started? */
> bool started;
>  +/**
>  + * @suspended: Whether the vhost device is currently suspended.  Set
>  + * and reset by implementations (vhost-user, vhost-vdpa, ...), which
>  + * are supposed to automatically suspend/resume in their
>  + * vhost_dev_start handlers as required.  Must also be cleared when
>  + * the device is reset.
>  + */
>  +bool suspended;
> bool log_enabled;
> uint64_t log_size;
> Error *migration_blocker;
>  diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>  index 7b7dee468e..f7fd19a203 100644
>  --- a/hw/virtio/vhost-vdpa.c
>  +++ b/hw/virtio/vhost-vdpa.c
>  @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct 
>  vhost_dev *dev,
> 
> static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> {
>  -struct vhost_vdpa *v = dev->opaque;
> int ret;
> uint8_t status = 0;
> 
> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> trace_vhost_vdpa_reset_device(dev);
>  -v->suspended = false;
>  +dev->suspended = false;
> return ret;
> }
> 
>  @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev 
>  *dev)
> if (unlikely(r)) {
> error_report("Cannot suspend: %s(%d)", g_strerror(errno), 
>  errno);
> } else {
>  -v->suspended = true;
>  +dev->suspended = true;
> return;
> }
> }
>  @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
>  *dev, bool started)
> return -1;
> }
> vhost_vdpa_set_vring_ready(dev);
>  +if (dev->suspended) {
>  +/* TODO: When RESUME is available, use it instead of 
>  resetting */
>  +vhost_vdpa_reset_status(dev);
> >>> How is that we reset the status at each vhost_vdpa_dev_start? That
> >>> will clean all the vqs configured, features negotiated, etc. in the
> >>> vDPA device. Or am I missing something?
> >> What alternative do you propose?  We don’t have RESUME for vDPA in qemu,
> >> but we somehow need to lift the previous SUSPEND so the device will
> >> again respond to guest requests, do we not?
> >>
> > Reset also clears the suspend state in vDPA, and it should be called
> > at vhost_dev_stop. So the device should never be in suspended state
> > here. Does that solve your concerns?
>
> My intention with this patch was precisely not to reset in
> vhost_dev_stop when suspending is supported.  So now I’m more confused
> than before.
>

At this moment, I think that should be foc

i386/xen: prevent guest from binding loopback event channel to itself

2023-07-25 Thread David Woodhouse
From: David Woodhouse 

Fuzzing showed that a guest could bind an interdomain port to itself, by
guessing the next port to be allocated and putting that as the 'remote'
port number. By chance, that works because the newly-allocated port has
type EVTCHNSTAT_unbound. It shouldn't.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 0e9c108614..70b4b8a6ef 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1408,8 +1408,15 @@ int xen_evtchn_bind_interdomain_op(struct 
evtchn_bind_interdomain *interdomain)
 XenEvtchnPort *rp = &s->port_table[interdomain->remote_port];
 XenEvtchnPort *lp = &s->port_table[interdomain->local_port];
 
-if (rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) {
-/* It's a match! */
+/*
+ * The 'remote' port for loopback must be an unbound port allocated for
+ * communication with the local domain (as indicated by rp->type_val
+ * being zero, not PORT_INFO_TYPEVAL_REMOTE_QEMU), and must *not* be
+ * the port that was just allocated for the local end.
+ */
+if (interdomain->local_port != interdomain->remote_port &&
+rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) {
+
 rp->type = EVTCHNSTAT_interdomain;
 rp->type_val = interdomain->local_port;
 
-- 
2.34.1






Re: [PATCH] vhost-user-scsi: support reconnect to backend

2023-07-25 Thread Li Feng
Thanks for your comments.

> On Jul 25, 2023, at 1:21 AM, Raphael Norwitz  wrote:
> 
> Very excited to see this. High level looks good modulo a few small things.
> 
> My major concern is around existing vhost-user-scsi backends which don’t 
> support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the 
> reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We 
> may want to do the same for vhost-user-blk.
> 
> The question is then what happens if the check is false. IIUC without an 
> inflight FD, if a device processes requests out of order, it’s not safe to 
> continue execution on reconnect, as there’s no way for the backend to know 
> how to replay IO. Should we permanently wedge the device or have QEMU fail 
> out? May be nice to have a toggle for this.

Based on what MST said, is there anything else I need to do?
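
For reference, the kind of gating Raphael describes would look roughly like the
sketch below; the helper name and where it would be called from are assumptions
made up for illustration, not part of the proposed patch:

/* Sketch only: reconnect is only safe when the backend can replay the
 * requests that were in flight when it crashed, i.e. when it supports
 * the inflight shared-memory protocol feature. */
static bool vhost_user_scsi_can_reconnect(struct vhost_dev *dev)
{
    return virtio_has_feature(dev->protocol_features,
                              VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
}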
> 
>> On Jul 21, 2023, at 6:51 AM, Li Feng  wrote:
>> 
>> If the backend crashes and restarts, the device is broken.
>> This patch adds reconnect for vhost-user-scsi.
>> 
>> Tested with spdk backend.
>> 
>> Signed-off-by: Li Feng 
>> ---
>> hw/block/vhost-user-blk.c   |   2 -
>> hw/scsi/vhost-scsi-common.c |  27 ++---
>> hw/scsi/vhost-user-scsi.c   | 163 +---
>> include/hw/virtio/vhost-user-scsi.h |   3 +
>> include/hw/virtio/vhost.h   |   2 +
>> 5 files changed, 165 insertions(+), 32 deletions(-)
>> 
>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>> index eecf3f7a81..f250c740b5 100644
>> --- a/hw/block/vhost-user-blk.c
>> +++ b/hw/block/vhost-user-blk.c
>> @@ -32,8 +32,6 @@
>> #include "sysemu/sysemu.h"
>> #include "sysemu/runstate.h"
>> 
>> -#define REALIZE_CONNECTION_RETRIES 3
>> -
>> static const int user_feature_bits[] = {
>>VIRTIO_BLK_F_SIZE_MAX,
>>VIRTIO_BLK_F_SEG_MAX,
>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> 
> Why can’t all the vhost-scsi-common stuff be moved to a separate change?

I will move this code to a separate patch.
> 
> Especially the stuff introduced for vhost-user-blk in 
> 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out.
OK.

> 
>> index a06f01af26..08801886b8 100644
>> --- a/hw/scsi/vhost-scsi-common.c
>> +++ b/hw/scsi/vhost-scsi-common.c
>> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>> 
>>vsc->dev.acked_features = vdev->guest_features;
>> 
>> -assert(vsc->inflight == NULL);
>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>> -ret = vhost_dev_get_inflight(&vsc->dev,
>> - vs->conf.virtqueue_size,
>> - vsc->inflight);
>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>if (ret < 0) {
>> -error_report("Error get inflight: %d", -ret);
>> +error_report("Error setting inflight format: %d", -ret);
>>goto err_guest_notifiers;
>>}
>> 
>> +if (!vsc->inflight->addr) {
>> +ret = vhost_dev_get_inflight(&vsc->dev,
>> +vs->conf.virtqueue_size,
>> +vsc->inflight);
>> +if (ret < 0) {
>> +error_report("Error get inflight: %d", -ret);
>> +goto err_guest_notifiers;
>> +}
>> +}
>> +
>>ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>if (ret < 0) {
>>error_report("Error set inflight: %d", -ret);
>> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>return ret;
>> 
>> err_guest_notifiers:
>> -g_free(vsc->inflight);
>> -vsc->inflight = NULL;
>> -
>>k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
>> err_host_notifiers:
>>vhost_dev_disable_notifiers(&vsc->dev, vdev);
>> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>>}
>>assert(ret >= 0);
>> 
> 
> In the vhost-scsi (kernel backend) path, what will cleanup vsc->inflight now?
OK, we should check whether vsc->inflight is NULL; vhost-scsi doesn't allocate
the inflight object memory.

> 
>> -if (vsc->inflight) {
>> -vhost_dev_free_inflight(vsc->inflight);
>> -g_free(vsc->inflight);
>> -vsc->inflight = NULL;
>> -}
>> -
>>vhost_dev_disable_notifiers(&vsc->dev, vdev);
>> }
>> 
>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>> index ee99b19e7a..e0e88b0c42 100644
>> --- a/hw/scsi/vhost-user-scsi.c
>> +++ b/hw/scsi/vhost-user-scsi.c
>> @@ -89,14 +89,126 @@ static void vhost_dummy_handle_output(VirtIODevice 
>> *vdev, VirtQueue *vq)
>> {
>> }
>> 
>> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
>> +{
>> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
>> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
>> +int ret = 0;
>> +
>> +if (s->connected) {
>> +return 0;
>> +}
>> +s->connected = true;
>> +
>> +vsc->dev.num_q

Re: [PATCH] kvm: Remove KVM_CREATE_IRQCHIP support assumption

2023-07-25 Thread Andrew Jones
On Mon, Jul 24, 2023 at 11:53:39AM +0200, Thomas Huth wrote:
> On 22/07/2023 08.21, Andrew Jones wrote:
> > Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA
> > irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the
> > RISC-V platform has AIA. The cap indicates KVM supports at least one
> > of the following ioctls:
> > 
> >KVM_CREATE_IRQCHIP
> >KVM_IRQ_LINE
> >KVM_GET_IRQCHIP
> >KVM_SET_IRQCHIP
> >KVM_GET_LAPIC
> >KVM_SET_LAPIC
> > 
> > but the cap doesn't imply that KVM must support any of those ioctls
> > in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP
> > ioctl was supported. Stop making that assumption by introducing a
> > KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP
> > sets. Adding parameters isn't awesome, but given how the
> > KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of
> > options.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> > 
> > While this fixes booting guests on riscv KVM with AIA it's unlikely
> > to get merged before the QEMU support for KVM AIA[1] lands, which
> > would also fix the issue. I think this patch is still worth considering
> > though since QEMU's assumption is wrong.
> > 
> > [1] 
> > https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/
> > 
> > 
> >   accel/kvm/kvm-all.c| 5 -
> >   include/sysemu/kvm.h   | 1 +
> >   target/arm/kvm.c   | 3 +++
> >   target/i386/kvm/kvm.c  | 2 ++
> >   target/s390x/kvm/kvm.c | 3 +++
> >   5 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 373d876c0580..0f5ff8630502 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -86,6 +86,7 @@ struct KVMParkedVcpu {
> >   };
> >   KVMState *kvm_state;
> > +bool kvm_has_create_irqchip;
> >   bool kvm_kernel_irqchip;
> >   bool kvm_split_irqchip;
> >   bool kvm_async_interrupts_allowed;
> > @@ -2377,8 +2378,10 @@ static void kvm_irqchip_create(KVMState *s)
> >   if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) {
> >   error_report("Split IRQ chip mode not supported.");
> >   exit(1);
> > -} else {
> > +} else if (kvm_has_create_irqchip) {
> >   ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
> > +} else {
> > +return;
> >   }
> >   }
> >   if (ret < 0) {
> 
> I think I'd do this differently... at the beginning of the function, there is
> a check for kvm_check_extension(s, KVM_CAP_IRQCHIP) etc. ... I think you
> could now replace that check with a simple
> 
>   if (!kvm_has_create_irqchip) {
>   return;
>   }
> 
> The "kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0)" of course has to be
> moved to the target/s390x/kvm/kvm.c file, too.
>

Sounds good. I'll do that for v2.

Thanks,
drew



[PULL 2/5] scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour

2023-07-25 Thread Peter Maydell
The POSIX definition of the 'read' utility requires that you
specify the variable name to set; omitting the name and
having it default to 'REPLY' is a bashism. If your system
sh is dash, then it will print an error message during build:

qemu/pc-bios/s390-ccw/../../scripts/git-submodule.sh: 106: read: arg count

Specify the variable name explicitly.

Fixes: fdb8fd8cb915647b ("git-submodule: allow partial update of 
.git-submodule-status")
Signed-off-by: Peter Maydell 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20230720153038.1587196-1-peter.mayd...@linaro.org
---
 scripts/git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/git-submodule.sh b/scripts/git-submodule.sh
index 335f7f5fdf8..bb1222c7727 100755
--- a/scripts/git-submodule.sh
+++ b/scripts/git-submodule.sh
@@ -103,7 +103,7 @@ update)
 check_updated $module || echo Updated "$module"
 done
 
-(while read -r; do
+(while read -r REPLY; do
 for module in $modules; do
 case $REPLY in
 *" $module "*) continue 2 ;;
-- 
2.34.1




[PULL 1/5] hw/arm/smmu: Handle big-endian hosts correctly

2023-07-25 Thread Peter Maydell
The implementation of the SMMUv3 has multiple places where it reads a
data structure from the guest and directly operates on it without
doing a guest-to-host endianness conversion.  Since all SMMU data
structures are little-endian, this means that the SMMU doesn't work
on a big-endian host.  In particular, this causes the Avocado test
  machine_aarch64_virt.py:Aarch64VirtMachine.test_alpine_virt_tcg_gic_max
to fail on an s390x host.

Add appropriate byte-swapping on reads and writes of guest in-memory
data structures so that the device works correctly on big-endian
hosts.

As part of this we constrain queue_read() to operate only on Cmd
structs and queue_write() on Evt structs, because in practice these
are the only data structures the two functions are used with, and we
need to know what the data structure is to be able to byte-swap its
parts correctly.

Signed-off-by: Peter Maydell 
Tested-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Eric Auger 
Message-id: 20230717132641.764660-1-peter.mayd...@linaro.org
Cc: qemu-sta...@nongnu.org
---
 hw/arm/smmu-common.c |  3 +--
 hw/arm/smmuv3.c  | 39 +++
 2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 5ab9d45d58a..f35ae9aa22c 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -216,8 +216,7 @@ static int get_pte(dma_addr_t baseaddr, uint32_t index, 
uint64_t *pte,
 dma_addr_t addr = baseaddr + index * sizeof(*pte);
 
 /* TODO: guarantee 64-bit single-copy atomicity */
-ret = dma_memory_read(&address_space_memory, addr, pte, sizeof(*pte),
-  MEMTXATTRS_UNSPECIFIED);
+ret = ldq_le_dma(&address_space_memory, addr, pte, MEMTXATTRS_UNSPECIFIED);
 
 if (ret != MEMTX_OK) {
 info->type = SMMU_PTW_ERR_WALK_EABT;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 932f0096974..1e9be8e89af 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -102,20 +102,34 @@ static void smmuv3_write_gerrorn(SMMUv3State *s, uint32_t 
new_gerrorn)
 trace_smmuv3_write_gerrorn(toggled & pending, s->gerrorn);
 }
 
-static inline MemTxResult queue_read(SMMUQueue *q, void *data)
+static inline MemTxResult queue_read(SMMUQueue *q, Cmd *cmd)
 {
 dma_addr_t addr = Q_CONS_ENTRY(q);
+MemTxResult ret;
+int i;
 
-return dma_memory_read(&address_space_memory, addr, data, q->entry_size,
-   MEMTXATTRS_UNSPECIFIED);
+ret = dma_memory_read(&address_space_memory, addr, cmd, sizeof(Cmd),
+  MEMTXATTRS_UNSPECIFIED);
+if (ret != MEMTX_OK) {
+return ret;
+}
+for (i = 0; i < ARRAY_SIZE(cmd->word); i++) {
+le32_to_cpus(&cmd->word[i]);
+}
+return ret;
 }
 
-static MemTxResult queue_write(SMMUQueue *q, void *data)
+static MemTxResult queue_write(SMMUQueue *q, Evt *evt_in)
 {
 dma_addr_t addr = Q_PROD_ENTRY(q);
 MemTxResult ret;
+Evt evt = *evt_in;
+int i;
 
-ret = dma_memory_write(&address_space_memory, addr, data, q->entry_size,
+for (i = 0; i < ARRAY_SIZE(evt.word); i++) {
+cpu_to_le32s(&evt.word[i]);
+}
+ret = dma_memory_write(&address_space_memory, addr, &evt, sizeof(Evt),
MEMTXATTRS_UNSPECIFIED);
 if (ret != MEMTX_OK) {
 return ret;
@@ -298,7 +312,7 @@ static void smmuv3_init_regs(SMMUv3State *s)
 static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, STE *buf,
 SMMUEventInfo *event)
 {
-int ret;
+int ret, i;
 
 trace_smmuv3_get_ste(addr);
 /* TODO: guarantee 64-bit single-copy atomicity */
@@ -311,6 +325,9 @@ static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, 
STE *buf,
 event->u.f_ste_fetch.addr = addr;
 return -EINVAL;
 }
+for (i = 0; i < ARRAY_SIZE(buf->word); i++) {
+le32_to_cpus(&buf->word[i]);
+}
 return 0;
 
 }
@@ -320,7 +337,7 @@ static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t 
ssid,
CD *buf, SMMUEventInfo *event)
 {
 dma_addr_t addr = STE_CTXPTR(ste);
-int ret;
+int ret, i;
 
 trace_smmuv3_get_cd(addr);
 /* TODO: guarantee 64-bit single-copy atomicity */
@@ -333,6 +350,9 @@ static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t 
ssid,
 event->u.f_ste_fetch.addr = addr;
 return -EINVAL;
 }
+for (i = 0; i < ARRAY_SIZE(buf->word); i++) {
+le32_to_cpus(&buf->word[i]);
+}
 return 0;
 }
 
@@ -569,7 +589,7 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE 
*ste,
 return -EINVAL;
 }
 if (s->features & SMMU_FEATURE_2LVL_STE) {
-int l1_ste_offset, l2_ste_offset, max_l2_ste, span;
+int l1_ste_offset, l2_ste_offset, max_l2_ste, span, i;
 dma_addr_t l1ptr, l2ptr;
 STEDesc l1std;
 
@@ -593,6 +613,9 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, STE 
*ste,
 event->u.f_ste_fetch.

[PULL 3/5] target/arm: Special case M-profile in debug_helper.c code

2023-07-25 Thread Peter Maydell
A lot of the code called from helper_exception_bkpt_insn() is written
assuming A-profile, but we will also call this helper on M-profile
CPUs when they execute a BKPT insn.  This used to work by accident,
but recent changes mean that we will hit an assert when some of this
code calls down into lower level functions that end up calling
arm_security_space_below_el3(), arm_el_is_aa64(), and other functions
that now explicitly assert that the guest CPU is not M-profile.

Handle M-profile directly to avoid the assertions:
 * in arm_debug_target_el(), M-profile debug exceptions always
   go to EL1
 * in arm_debug_exception_fsr(), M-profile always uses the short
   format FSR (compare commit d7fe699be54b2, though in this case
   the code in arm_v7m_cpu_do_interrupt() does not need to
   look at the FSR value at all)

Cc: qemu-sta...@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1775
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20230721143239.1753066-1-peter.mayd...@linaro.org
---
 target/arm/debug_helper.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index 8362462a07e..abe72e35ae6 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -21,6 +21,10 @@ static int arm_debug_target_el(CPUARMState *env)
 bool secure = arm_is_secure(env);
 bool route_to_el2 = false;
 
+if (arm_feature(env, ARM_FEATURE_M)) {
+return 1;
+}
+
 if (arm_is_el2_enabled(env)) {
 route_to_el2 = env->cp15.hcr_el2 & HCR_TGE ||
env->cp15.mdcr_el2 & MDCR_TDE;
@@ -434,18 +438,20 @@ static uint32_t arm_debug_exception_fsr(CPUARMState *env)
 {
 ARMMMUFaultInfo fi = { .type = ARMFault_Debug };
 int target_el = arm_debug_target_el(env);
-bool using_lpae = false;
+bool using_lpae;
 
-if (target_el == 2 || arm_el_is_aa64(env, target_el)) {
+if (arm_feature(env, ARM_FEATURE_M)) {
+using_lpae = false;
+} else if (target_el == 2 || arm_el_is_aa64(env, target_el)) {
 using_lpae = true;
 } else if (arm_feature(env, ARM_FEATURE_PMSA) &&
arm_feature(env, ARM_FEATURE_V8)) {
 using_lpae = true;
+} else if (arm_feature(env, ARM_FEATURE_LPAE) &&
+   (env->cp15.tcr_el[target_el] & TTBCR_EAE)) {
+using_lpae = true;
 } else {
-if (arm_feature(env, ARM_FEATURE_LPAE) &&
-(env->cp15.tcr_el[target_el] & TTBCR_EAE)) {
-using_lpae = true;
-}
+using_lpae = false;
 }
 
 if (using_lpae) {
-- 
2.34.1




[PULL 0/5] target-arm queue

2023-07-25 Thread Peter Maydell
target-arm queue: just bugfixes, mostly mine.

thanks
-- PMM

The following changes since commit 885fc169f09f5915ce037263d20a59eb226d473d:

  Merge tag 'pull-riscv-to-apply-20230723-3' of 
https://github.com/alistair23/qemu into staging (2023-07-24 11:34:35 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20230725

for you to fetch changes up to 78cc90346ec680a7f1bb9f138bf7c9654cf526d5:

  tests/decode: Suppress "error: " string for expected-failure tests 
(2023-07-25 10:56:52 +0100)


target-arm queue:
 * tests/decode: Suppress "error: " string for expected-failure tests
 * ui/curses: For curses display, recognize a few more control keys
 * target/arm: Special case M-profile in debug_helper.c code
 * scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour
 * hw/arm/smmu: Handle big-endian hosts correctly


Peter Maydell (4):
  hw/arm/smmu: Handle big-endian hosts correctly
  scripts/git-submodule.sh: Don't rely on non-POSIX 'read' behaviour
  target/arm: Special case M-profile in debug_helper.c code
  tests/decode: Suppress "error: " string for expected-failure tests

Sean Estabrooks (1):
  For curses display, recognize a few more control keys

 ui/curses_keys.h  |  6 ++
 hw/arm/smmu-common.c  |  3 +--
 hw/arm/smmuv3.c   | 39 +++
 target/arm/debug_helper.c | 18 --
 scripts/decodetree.py |  6 +-
 scripts/git-submodule.sh  |  2 +-
 6 files changed, 56 insertions(+), 18 deletions(-)



[PULL 4/5] For curses display, recognize a few more control keys

2023-07-25 Thread Peter Maydell
From: Sean Estabrooks 

The curses display handles most control-X keys, and translates
them into their corresponding keycode.  Here we recognize
a few that are missing, Ctrl-@ (null), Ctrl-\ (backslash),
Ctrl-] (right bracket), Ctrl-^ (caret), Ctrl-_ (underscore).

Signed-off-by: Sean Estabrooks 
Message-id: cahyvn3bh9crgduomf7g7ngwamu8d4cvozacb2i4ymnnggbx...@mail.gmail.com
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 ui/curses_keys.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/ui/curses_keys.h b/ui/curses_keys.h
index 71e04acdc75..88a2208ed18 100644
--- a/ui/curses_keys.h
+++ b/ui/curses_keys.h
@@ -210,6 +210,12 @@ static const int _curses2keycode[CURSES_CHARS] = {
 ['N' - '@'] = 49 | CNTRL, /* Control + n */
 /* Control + m collides with the keycode for Enter */
 
+['@' - '@']  =  3 | CNTRL, /* Control + @ */
+/* Control + [ collides with the keycode for Escape */
+['\\' - '@'] = 43 | CNTRL, /* Control + Backslash */
+[']' - '@']  = 27 | CNTRL, /* Control + ] */
+['^' - '@']  =  7 | CNTRL, /* Control + ^ */
+['_' - '@']  = 12 | CNTRL, /* Control + Underscore */
 };
 
 static const int _curseskey2keycode[CURSES_KEYS] = {
-- 
2.34.1




[PULL 5/5] tests/decode: Suppress "error: " string for expected-failure tests

2023-07-25 Thread Peter Maydell
The "expected failure" tests for decodetree result in the
error messages from decodetree ending up in logs and in
V=1 output:

>>> MALLOC_PERTURB_=226 
>>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/pyvenv/bin/python3 
>>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/scripts/decodetree.py 
>>> --output-null --test-for-error 
>>> /mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode
――― 
✀  

/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/x86/../../tests/decode/err_argset1.decode:5:
 error: duplicate argument "a"
―――
 1/44 qemu:decodetree / err_argset1OK  0.05s

This then produces false positives when scanning the
logfiles for strings like "error: ".

For the expected-failure tests, make decodetree print
"detected:" instead of "error:".

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-id: 20230720131521.1325905-1-peter.mayd...@linaro.org
---
 scripts/decodetree.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index a8a6cb69cda..e8b72da3a97 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -134,6 +134,10 @@ def error_with_file(file, lineno, *args):
 global output_file
 global output_fd
 
+# For the test suite expected-errors case, don't print the
+# string "error: ", so they don't turn up as false positives
+# if you grep the meson logs for strings like that.
+end = 'error: ' if not testforerror else 'detected: '
 prefix = ''
 if file:
 prefix += f'{file}:'
@@ -141,7 +145,7 @@ def error_with_file(file, lineno, *args):
 prefix += f'{lineno}:'
 if prefix:
 prefix += ' '
-print(prefix, end='error: ', file=sys.stderr)
+print(prefix, end=end, file=sys.stderr)
 print(*args, file=sys.stderr)
 
 if output_file and output_fd:
-- 
2.34.1




[PATCH] block/blkio: enable the completion eventfd

2023-07-25 Thread Stefano Garzarella
Until libblkio 1.3.0, virtio-blk drivers had completion eventfd
notifications enabled from the start, but from the next releases
this is no longer the case, so we have to explicitly enable them.

In fact, the libblkio documentation says they could be disabled,
so we should always enable them at the start if we want to be
sure to get completion eventfd notifications:

By default, the driver might not generate completion events for
requests so it is necessary to explicitly enable the completion
file descriptor before use:

void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);

I discovered this while trying a development version of libblkio:
the guest kernel hangs during boot, while probing the device.

Fixes: fd66dbd424f5 ("blkio: add libblkio block driver")
Signed-off-by: Stefano Garzarella 
---
 block/blkio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blkio.c b/block/blkio.c
index 1798648134..bc1fac48b7 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -845,6 +845,7 @@ static int blkio_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 QLIST_INIT(&s->bounce_bufs);
 s->blkioq = blkio_get_queue(s->blkio, 0);
 s->completion_fd = blkioq_get_completion_fd(s->blkioq);
+blkioq_set_completion_fd_enabled(s->blkioq, true);
 
 blkio_attach_aio_context(bs, bdrv_get_aio_context(bs));
 return 0;
-- 
2.41.0




[PATCH v2 3/4] vhost: move and rename the conn retry times

2023-07-25 Thread Li Feng
Multiple devices need this macro; move it to a common header.

Signed-off-by: Li Feng 
---
 hw/block/vhost-user-blk.c   | 4 +---
 hw/virtio/vhost-user-gpio.c | 3 +--
 include/hw/virtio/vhost.h   | 2 ++
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index eecf3f7a81..3c69fa47d5 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -32,8 +32,6 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/runstate.h"
 
-#define REALIZE_CONNECTION_RETRIES 3
-
 static const int user_feature_bits[] = {
 VIRTIO_BLK_F_SIZE_MAX,
 VIRTIO_BLK_F_SEG_MAX,
@@ -482,7 +480,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->inflight = g_new0(struct vhost_inflight, 1);
 s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
 
-retries = REALIZE_CONNECTION_RETRIES;
+retries = VU_REALIZE_CONN_RETRIES;
 assert(!*errp);
 do {
 if (*errp) {
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 3b013f2d0f..d9979aa5db 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -15,7 +15,6 @@
 #include "standard-headers/linux/virtio_ids.h"
 #include "trace.h"
 
-#define REALIZE_CONNECTION_RETRIES 3
 #define VHOST_NVQS 2
 
 /* Features required from VirtIO */
@@ -359,7 +358,7 @@ static void vu_gpio_device_realize(DeviceState *dev, Error 
**errp)
 qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event, NULL,
  dev, NULL, true);
 
-retries = REALIZE_CONNECTION_RETRIES;
+retries = VU_REALIZE_CONN_RETRIES;
 g_assert(!*errp);
 do {
 if (*errp) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a173cb9fa..ca3131b1af 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -8,6 +8,8 @@
 #define VHOST_F_DEVICE_IOTLB 63
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 
+#define VU_REALIZE_CONN_RETRIES 3
+
 /* Generic structures common for any vhost based device. */
 
 struct vhost_inflight {
-- 
2.41.0




[PATCH v2 2/4] vhost-user-common: send get_inflight_fd once

2023-07-25 Thread Li Feng
The get_inflight_fd message is sent only once. When reconnecting to the backend,
QEMU sends set_inflight_fd to the backend.

Signed-off-by: Li Feng 
---
 hw/scsi/vhost-scsi-common.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index a06f01af26..664adb15b4 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
 
 vsc->dev.acked_features = vdev->guest_features;
 
-assert(vsc->inflight == NULL);
-vsc->inflight = g_new0(struct vhost_inflight, 1);
-ret = vhost_dev_get_inflight(&vsc->dev,
- vs->conf.virtqueue_size,
- vsc->inflight);
+ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
 if (ret < 0) {
-error_report("Error get inflight: %d", -ret);
+error_report("Error setting inflight format: %d", -ret);
 goto err_guest_notifiers;
 }
 
-ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
-if (ret < 0) {
-error_report("Error set inflight: %d", -ret);
-goto err_guest_notifiers;
+if (vsc->inflight) {
+if (!vsc->inflight->addr) {
+ret = vhost_dev_get_inflight(&vsc->dev,
+vs->conf.virtqueue_size,
+vsc->inflight);
+if (ret < 0) {
+error_report("Error get inflight: %d", -ret);
+goto err_guest_notifiers;
+}
+}
+
+ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
+if (ret < 0) {
+error_report("Error set inflight: %d", -ret);
+goto err_guest_notifiers;
+}
 }
 
 ret = vhost_dev_start(&vsc->dev, vdev, true);
@@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
 return ret;
 
 err_guest_notifiers:
-g_free(vsc->inflight);
-vsc->inflight = NULL;
-
 k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
 err_host_notifiers:
 vhost_dev_disable_notifiers(&vsc->dev, vdev);
@@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
 }
 assert(ret >= 0);
 
-if (vsc->inflight) {
-vhost_dev_free_inflight(vsc->inflight);
-g_free(vsc->inflight);
-vsc->inflight = NULL;
-}
-
 vhost_dev_disable_notifiers(&vsc->dev, vdev);
 }
 
-- 
2.41.0




[PATCH v2 1/4] vhost: fix the fd leak

2023-07-25 Thread Li Feng
When vhost-user reconnects to the backend, the notifier should be
cleaned up. Otherwise, the fd resources will be exhausted.

Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")

Signed-off-by: Li Feng 
---
 hw/virtio/vhost.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index abf0d03c8d..e2f6ffb446 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2044,6 +2044,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice 
*vdev, bool vrings)
 event_notifier_test_and_clear(
 &hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
 event_notifier_test_and_clear(&vdev->config_notifier);
+event_notifier_cleanup(
+&hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
 
 trace_vhost_dev_stop(hdev, vdev->name, vrings);
 
-- 
2.41.0




[PATCH v2 4/4] vhost-user-scsi: support reconnect to backend

2023-07-25 Thread Li Feng
If the backend crashes and restarts, the device is broken.
This patch adds reconnect for vhost-user-scsi.

Tested with spdk backend.

Signed-off-by: Li Feng 
---
 hw/scsi/vhost-scsi-common.c   |   6 +
 hw/scsi/vhost-user-scsi.c | 220 +++---
 include/hw/virtio/vhost-scsi-common.h |   3 +
 include/hw/virtio/vhost-user-scsi.h   |   3 +
 4 files changed, 211 insertions(+), 21 deletions(-)

diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index 664adb15b4..3fde477eee 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
 error_report("Error start vhost dev");
 goto err_guest_notifiers;
 }
+vsc->started_vu = true;
 
 /* guest_notifier_mask/pending not used yet, so just unmask
  * everything here.  virtio-pci will do the right thing by
@@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
 int ret = 0;
 
+if (!vsc->started_vu) {
+return;
+}
+vsc->started_vu = false;
+
 vhost_dev_stop(&vsc->dev, vdev, true);
 
 if (k->set_guest_notifiers) {
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index ee99b19e7a..bd32dcf999 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -46,20 +46,25 @@ enum VhostUserProtocolFeature {
 static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserSCSI *s = (VHostUserSCSI *)vdev;
+DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
-bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
+bool should_start = virtio_device_should_start(vdev, status);
+int ret;
 
-if (vhost_dev_is_started(&vsc->dev) == start) {
+if (!s->connected) {
 return;
 }
 
-if (start) {
-int ret;
+if (vhost_dev_is_started(&vsc->dev) == should_start) {
+return;
+}
 
+if (should_start) {
 ret = vhost_scsi_common_start(vsc);
 if (ret < 0) {
 error_report("unable to start vhost-user-scsi: %s", 
strerror(-ret));
-exit(1);
+qemu_chr_fe_disconnect(&vs->conf.chardev);
 }
 } else {
 vhost_scsi_common_stop(vsc);
@@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
 }
 }
 
-static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
+static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
+VHostUserSCSI *s = (VHostUserSCSI *)vdev;
+DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
+VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
+
+Error *local_err = NULL;
+int i, ret;
+
+if (!vdev->start_on_kick) {
+return;
+}
+
+if (!s->connected) {
+return;
+}
+
+if (vhost_dev_is_started(&vsc->dev)) {
+return;
+}
+
+/*
+ * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
+ * vhost here instead of waiting for .set_status().
+ */
+ret = vhost_scsi_common_start(vsc);
+if (ret < 0) {
+error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");
+qemu_chr_fe_disconnect(&vs->conf.chardev);
+return;
+}
+
+/* Kick right away to begin processing requests already in vring */
+for (i = 0; i < vsc->dev.nvqs; i++) {
+VirtQueue *kick_vq = virtio_get_queue(vdev, i);
+
+if (!virtio_queue_get_desc_addr(vdev, i)) {
+continue;
+}
+event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
+}
+}
+
+static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
+VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
+int ret = 0;
+
+if (s->connected) {
+return 0;
+}
+s->connected = true;
+
+vsc->dev.num_queues = vs->conf.num_queues;
+vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
+vsc->dev.vqs = s->vhost_vqs;
+vsc->dev.vq_index = 0;
+vsc->dev.backend_features = 0;
+
+ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 0,
+ errp);
+if (ret < 0) {
+return ret;
+}
+
+/* restore vhost state */
+if (virtio_device_started(vdev, vdev->status)) {
+ret = vhost_scsi_common_start(vsc);
+if (ret < 0) {
+return ret;
+}
+}
+
+return 0;
+}
+
+static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event);
+
+static void vhost_user_scsi_disconnect(DeviceState *dev)
+{
+VirtIODevice *vdev = VIRTIO_DEV

[PATCH v2 0/4] Implement reconnect for vhost-user-scsi

2023-07-25 Thread Li Feng
Hi,

This patchset adds reconnect support for vhost-user-scsi. At the same
time, it fixes a vhost fd leak and refactors some code.

Changes for v2:
- Split the v1 patch into smaller separate patches;
- New patch for fixing the fd leak, which has been sent to reviewers in another
  mail;
- Implement the `vhost_user_scsi_handle_output`;
- Add the started_vu safe check;
- Fix error handler;
- Check the inflight before set/get inflight fd.

Li Feng (4):
  vhost: fix the fd leak
  vhost-user-common: send get_inflight_fd once
  vhost: move and rename the conn retry times
  vhost-user-scsi: support reconnect to backend

 hw/block/vhost-user-blk.c |   4 +-
 hw/scsi/vhost-scsi-common.c   |  43 ++---
 hw/scsi/vhost-user-scsi.c | 220 +++---
 hw/virtio/vhost-user-gpio.c   |   3 +-
 hw/virtio/vhost.c |   2 +
 include/hw/virtio/vhost-scsi-common.h |   3 +
 include/hw/virtio/vhost-user-scsi.h   |   3 +
 include/hw/virtio/vhost.h |   2 +
 8 files changed, 235 insertions(+), 45 deletions(-)

-- 
2.41.0




[PATCH] semihosting/uaccess.c: Replaced a malloc call with g_malloc.

2023-07-25 Thread dinglimin
Replaced a call to malloc() and its respective call to free() with g_malloc() 
and g_free().

Signed-off-by: dinglimin 

V2 -> V3: softmmu_unlock_user changes free() to g_free().
V1 -> V2: if cpu_memory_rw_debug() fails, we still need to set p = NULL.
---
 semihosting/uaccess.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/semihosting/uaccess.c b/semihosting/uaccess.c
index 8018828069..81a0f80e9f 100644
--- a/semihosting/uaccess.c
+++ b/semihosting/uaccess.c
@@ -14,10 +14,10 @@
 void *softmmu_lock_user(CPUArchState *env, target_ulong addr,
 target_ulong len, bool copy)
 {
-void *p = malloc(len);
+void *p = g_malloc(len);
 if (p && copy) {
 if (cpu_memory_rw_debug(env_cpu(env), addr, p, len, 0)) {
-free(p);
+g_free(p);
 p = NULL;
 }
 }
@@ -87,5 +87,5 @@ void softmmu_unlock_user(CPUArchState *env, void *p,
 if (len) {
 cpu_memory_rw_debug(env_cpu(env), addr, p, len, 1);
 }
-free(p);
+g_free(p);
 }
-- 
2.30.0.windows.2






Re: [PATCH v2 4/4] vhost-user-scsi: support reconnect to backend

2023-07-25 Thread Li Feng



> On Jul 25, 2023, at 6:42 PM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 
> ---
> hw/scsi/vhost-scsi-common.c   |   6 +
> hw/scsi/vhost-user-scsi.c | 220 +++---
> include/hw/virtio/vhost-scsi-common.h |   3 +
> include/hw/virtio/vhost-user-scsi.h   |   3 +
> 4 files changed, 211 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index 664adb15b4..3fde477eee 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> error_report("Error start vhost dev");
> goto err_guest_notifiers;
> }
> +vsc->started_vu = true;
> 
> /* guest_notifier_mask/pending not used yet, so just unmask
>  * everything here.  virtio-pci will do the right thing by
> @@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> int ret = 0;
> 
> +if (!vsc->started_vu) {
> +return;
> +}
> +vsc->started_vu = false;
> +
> vhost_dev_stop(&vsc->dev, vdev, true);
> 
> if (k->set_guest_notifiers) {
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..bd32dcf999 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -46,20 +46,25 @@ enum VhostUserProtocolFeature {
> static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
> {
> VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +bool should_start = virtio_device_should_start(vdev, status);
> +int ret;
> 
> -if (vhost_dev_is_started(&vsc->dev) == start) {
> +if (!s->connected) {
> return;
> }
> 
> -if (start) {
> -int ret;
> +if (vhost_dev_is_started(&vsc->dev) == should_start) {
> +return;
> +}
> 
> +if (should_start) {
> ret = vhost_scsi_common_start(vsc);
> if (ret < 0) {
> error_report("unable to start vhost-user-scsi: %s", 
> strerror(-ret));
> -exit(1);
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> }
> } else {
> vhost_scsi_common_stop(vsc);
> @@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> }
> }
> 
> -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> {
> +VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +Error *local_err = NULL;
> +int i, ret;
> +
> +if (!vdev->start_on_kick) {
> +return;
> +}
> +
> +if (!s->connected) {
> +return;
> +}
> +
> +if (vhost_dev_is_started(&vsc->dev)) {
> +return;
> +}
> +
> +/*
> + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> +ret = vhost_scsi_common_start(vsc);
> +if (ret < 0) {
> +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");

Need to fix the typo in v3: s/vhost-user-blk/vhost-user-scsi/g

> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> +return;
> +}
> +
> +/* Kick right away to begin processing requests already in vring */
> +for (i = 0; i < vsc->dev.nvqs; i++) {
> +VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> +if (!virtio_queue_get_desc_addr(vdev, i)) {
> +continue;
> +}
> +event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> +}
> +}
> +
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
> +vsc->dev.vqs = s->vhost_vqs;
> +vsc->dev.vq_index = 0;
> +vsc->dev.backend_features = 0;
> +
> +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 
> 0,
> + errp);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* restore vhost state 

Re: [PATCH] migration: Allow user to specify migration available bandwidth

2023-07-25 Thread Markus Armbruster
Peter Xu  writes:

> Migration bandwidth is a very important value to live migration.  It's
> because it's one of the major factors that we'll make decision on when to
> switchover to destination in a precopy process.
>
> This value is currently estimated by QEMU during the whole live migration
> process by monitoring how fast we were sending the data.  This can be the
> most accurate bandwidth if in the ideal world, where we're always feeding
> unlimited data to the migration channel, and then it'll be limited to the
> bandwidth that is available.
>
> However in reality it may be very different, e.g., over a 10Gbps network we
> can see query-migrate showing migration bandwidth of only a few tens of
> MB/s just because there are plenty of other things the migration thread
> might be doing.  For example, the migration thread can be busy scanning
> zero pages, or it can be fetching dirty bitmap from other external dirty
> sources (like vhost or KVM).  It means we may not be pushing data as much
> as possible to migration channel, so the bandwidth estimated from "how many
> data we sent in the channel" can be dramatically inaccurate sometimes,
> e.g., that a few tens of MB/s even if 10Gbps available, and then the
> decision to switchover will be further affected by this.
>
> The migration may not even converge at all with the downtime specified,
> with that wrong estimation of bandwidth.
>
> The issue is QEMU itself may not be able to avoid those uncertainties on
> measuing the real "available migration bandwidth".  At least not something
> I can think of so far.
>
> One way to fix this is when the user is fully aware of the available
> bandwidth, then we can allow the user to help providing an accurate value.
>
> For example, if the user has a dedicated channel of 10Gbps for migration
> for this specific VM, the user can specify this bandwidth so QEMU can
> always do the calculation based on this fact, trusting the user as long as
> specified.
>
> When the user wants to have migration only use 5Gbps out of that 10Gbps,
> one can set max-bandwidth to 5Gbps, along with available-bandwidth to 5Gbps
> so it'll never use over 5Gbps too (so the user can have the rest 5Gbps for
> other things).  So it can be useful even if the network is not dedicated,
> but as long as the user can know a solid value.
>
> A new parameter "available-bandwidth" is introduced just for this. So when
> the user specified this parameter, instead of trusting the estimated value
> from QEMU itself (based on the QEMUFile send speed), let's trust the user
> more.
>
> This can resolve issues like "unconvergence migration" which is caused by
> hilarious low "migration bandwidth" detected for whatever reason.
>
> Reported-by: Zhiyi Guo 
> Signed-off-by: Peter Xu 
> ---
>  qapi/migration.json| 20 +++-
>  migration/migration.h  |  2 +-
>  migration/options.h|  1 +
>  migration/migration-hmp-cmds.c | 14 ++
>  migration/migration.c  | 19 +++
>  migration/options.c| 28 
>  migration/trace-events |  2 +-
>  7 files changed, 79 insertions(+), 7 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 47dfef0278..fdc269e0a1 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -730,6 +730,16 @@
>  # @max-bandwidth: to set maximum speed for migration.  maximum speed
>  # in bytes per second.  (Since 2.8)
>  #
> +# @available-bandwidth: to set available bandwidth for migration.  By
> +# default, this value is zero, means the user is not aware of the
> +# available bandwidth that can be used by QEMU migration, so QEMU will
> +# estimate the bandwidth automatically.  This can be set when the
> +# estimated value is not accurate, while the user is able to guarantee
> +# such bandwidth is available for migration purpose during the
> +# migration procedure.  When specified correctly, this can make the
> +# switchover decision much more accurate, which will also be based on
> +# the max downtime specified.  (Since 8.2)

Humor me: break lines slightly earlier, like

   # @available-bandwidth: to set available bandwidth for migration.  By
   # default, this value is zero, means the user is not aware of the
   # available bandwidth that can be used by QEMU migration, so QEMU
   # will estimate the bandwidth automatically.  This can be set when
   # the estimated value is not accurate, while the user is able to
   # guarantee such bandwidth is available for migration purpose
   # during the migration procedure.  When specified correctly, this
   # can make the switchover decision much more accurate, which will
   # also be based on the max downtime specified.  (Since 8.2)

> +#
>  # @downtime-limit: set maximum tolerated downtime for migration.
>  # maximum downtime in milliseconds (Since 2.8)
>  #
> @@ -803,7 +813,7 @@
>  
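
To make the switchover logic that the commit message describes concrete, here is
a small illustrative sketch of the arithmetic involved; the function and variable
names are invented for the example and are not QEMU's actual internals:

#include <stdbool.h>
#include <stdint.h>

/* Sketch only: can we switch over to the destination within the allowed
 * downtime, given the remaining dirty data and the migration bandwidth? */
static bool can_switchover(uint64_t pending_bytes,
                           double bandwidth_bytes_per_sec, /* estimated or user-set */
                           double downtime_limit_sec)
{
    /* Bytes we could still transfer inside the downtime window. */
    double threshold_bytes = bandwidth_bytes_per_sec * downtime_limit_sec;

    /*
     * If the bandwidth is badly underestimated (tens of MB/s on a 10Gbps
     * link), the threshold shrinks and precopy may never converge.
     */
    return pending_bytes <= threshold_bytes;
}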

[PATCH] block/blkio: do not use open flags in qemu_open()

2023-07-25 Thread Stefano Garzarella
qemu_open() in blkio_virtio_blk_common_open() is used to open the
character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in
the future eventually the unix socket.

In all these cases we cannot open the path in read-only mode,
when the `read-only` option of blockdev is on, because the exchange
of IOCTL commands for example will fail.

In order to open the device read-only, we have to use the `read-only`
property of the libblkio driver as we already do in blkio_file_open().

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for 
virtio-blk")
Reported-by: Qing Wang 
Signed-off-by: Stefano Garzarella 
---
 block/blkio.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 1798648134..fe9bf8ea5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
*bs,
  * layer through the "/dev/fdset/N" special path.
  */
 if (fd_supported) {
-int open_flags;
-
-if (flags & BDRV_O_RDWR) {
-open_flags = O_RDWR;
-} else {
-open_flags = O_RDONLY;
-}
-
-fd = qemu_open(path, open_flags, errp);
+fd = qemu_open(path, O_RDWR, errp);
 if (fd < 0) {
 return -EINVAL;
 }
-- 
2.41.0
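
For reference, the `read-only` driver property mentioned in the commit message is
set through libblkio's typed property API; a rough sketch of that pattern follows.
The blkio_set_bool() call shape and the error handling are assumptions for
illustration, not the final code:

/* Sketch: open the node read-write at the fd level, and let libblkio
 * enforce read-only through its driver property instead. */
fd = qemu_open(path, O_RDWR, errp);
if (fd < 0) {
    return -EINVAL;
}

/* 'read-only' is the libblkio driver property named in the commit message;
 * the call below is an assumed usage of the libblkio property API. */
ret = blkio_set_bool(s->blkio, "read-only", !(flags & BDRV_O_RDWR));
if (ret < 0) {
    error_setg_errno(errp, -ret, "failed to set read-only");
    return ret;
}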




Re: [PATCH] block/blkio: do not use open flags in qemu_open()

2023-07-25 Thread Daniel P . Berrangé
On Tue, Jul 25, 2023 at 01:11:55PM +0200, Stefano Garzarella wrote:
> qemu_open() in blkio_virtio_blk_common_open() is used to open the
> character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in
> the future eventually the unix socket.
> 
> In all these cases we cannot open the path in read-only mode,
> when the `read-only` option of blockdev is on, because the exchange
> of IOCTL commands for example will fail.
> 
> In order to open the device read-only, we have to use the `read-only`
> property of the libblkio driver as we already do in blkio_file_open().
> 
> Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for 
> virtio-blk")
> Reported-by: Qing Wang 
> Signed-off-by: Stefano Garzarella 
> ---
>  block/blkio.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/block/blkio.c b/block/blkio.c
> index 1798648134..fe9bf8ea5f 100644
> --- a/block/blkio.c
> +++ b/block/blkio.c
> @@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
> *bs,
>   * layer through the "/dev/fdset/N" special path.
>   */
>  if (fd_supported) {
> -int open_flags;
> -
> -if (flags & BDRV_O_RDWR) {
> -open_flags = O_RDWR;
> -} else {
> -open_flags = O_RDONLY;
> -}
> -
> -fd = qemu_open(path, open_flags, errp);

I'd suggest taking the paragraph from the commit message explaining why
it is correct to hardcode O_RDWR and putting it into a comment here.

> +fd = qemu_open(path, O_RDWR, errp);
>  if (fd < 0) {
>  return -EINVAL;
>  }
> -- 
> 2.41.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] block/blkio: do not use open flags in qemu_open()

2023-07-25 Thread Stefano Garzarella

On Tue, Jul 25, 2023 at 12:15:40PM +0100, Daniel P. Berrangé wrote:

On Tue, Jul 25, 2023 at 01:11:55PM +0200, Stefano Garzarella wrote:

qemu_open() in blkio_virtio_blk_common_open() is used to open the
character device (e.g. /dev/vhost-vdpa-0 or /dev/vfio/vfio) or in
the future eventually the unix socket.

In all these cases we cannot open the path in read-only mode,
when the `read-only` option of blockdev is on, because the exchange
of IOCTL commands for example will fail.

In order to open the device read-only, we have to use the `read-only`
property of the libblkio driver as we already do in blkio_file_open().

Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for 
virtio-blk")
Reported-by: Qing Wang 
Signed-off-by: Stefano Garzarella 
---
 block/blkio.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 1798648134..fe9bf8ea5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -686,15 +686,7 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
*bs,
  * layer through the "/dev/fdset/N" special path.
  */
 if (fd_supported) {
-int open_flags;
-
-if (flags & BDRV_O_RDWR) {
-open_flags = O_RDWR;
-} else {
-open_flags = O_RDONLY;
-}
-
-fd = qemu_open(path, open_flags, errp);


I'd suggest taking the paragraph from the commit message explaining why
it is correct to hardcode O_RDWR and putting it into a comment here.


Ack, I'd also add:

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2225439

in the commit message.

I'll fix in v2.
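
Something like this, as an untested sketch of what I have in mind for v2
(same open/error path as in this patch, just with the rationale from the
commit message turned into a comment):

    if (fd_supported) {
        /*
         * We cannot open the path in read-only mode here, even when the
         * blockdev `read-only` option is on, because the exchange of
         * IOCTL commands on the character device would fail; read-only
         * is handled through the libblkio `read-only` property instead.
         */
        fd = qemu_open(path, O_RDWR, errp);
        if (fd < 0) {
            return -EINVAL;
        }
    }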

Thanks,
Stefano




+fd = qemu_open(path, O_RDWR, errp);
 if (fd < 0) {
 return -EINVAL;
 }
--
2.41.0




With regards,
Daniel
--
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|






Format type of qemu NVMe virtual drive reverted back to its default (512 bytes block size) after performing hot plugout/plugin operation on that drive.

2023-07-25 Thread Ashutosh Sharma
Hi,

I have a virtual system created using qemu 7.2. In that system, I
attached/hot plugged a virtual NVMe drive. This drive had a default
block size of 512 bytes.

admin@node-3:~$ sudo nvme list
Node          SN                   Model            Namespace  Usage                 Format        FW Rev
------------  -------------------  ---------------  ---------  --------------------  ------------  ------
/dev/nvme0n1  ashudev-6f34a1cf_13  QEMU NVMe Ctrl   1          34.36 GB / 34.36 GB   512 B + 0 B   7.1.92

After that, I formatted this drive with 4k block size and it formatted
successfully.

admin@node-3:~$ sudo nvme format /dev/nvme0n1 -f --lbaf 4
Success formatting namespace:1
admin@node-3:~$
admin@node-3:~$ sudo nvme list
Node          SN                   Model            Namespace  Usage                 Format        FW Rev
------------  -------------------  ---------------  ---------  --------------------  ------------  ------
/dev/nvme0n1  ashudev-6f34a1cf_13  QEMU NVMe Ctrl   1          34.36 GB / 34.36 GB   4 KiB + 0 B   7.1.92

Then I performed the hot plug-out and plug-in operations on that drive
using QMP's device_del and device_add commands respectively.

But, after that, the default block size of that drive reverted to 512 bytes.

admin@node-3:~$ sudo nvme list
Node          SN                   Model            Namespace  Usage                 Format        FW Rev
------------  -------------------  ---------------  ---------  --------------------  ------------  ------
/dev/nvme0n1  ashudev-6f34a1cf_13  QEMU NVMe Ctrl   1          34.36 GB / 34.36 GB   512 B + 0 B   7.1.92

So I just wanted to know why the NVMe format type reverted to 512
bytes when all I did was hot plug the drive out and back in. The
drive's block size (format type) should not change upon
removal/insertion, right? Or am I missing something?

Regards,
Ashutosh



[PATCH] ui/dbus: fix win32 compilation when !opengl

2023-07-25 Thread marcandre . lureau
From: Marc-Andre Lureau 

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1782

Signed-off-by: Marc-André Lureau 
---
 ui/dbus-listener.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ui/dbus-listener.c b/ui/dbus-listener.c
index 68ff343799..02fc6ae239 100644
--- a/ui/dbus-listener.c
+++ b/ui/dbus-listener.c
@@ -338,6 +338,7 @@ static bool dbus_scanout_map(DBusDisplayListener *ddl)
 return true;
 }
 
+#ifdef CONFIG_OPENGL
 static bool
 dbus_scanout_share_d3d_texture(
 DBusDisplayListener *ddl,
@@ -399,7 +400,8 @@ dbus_scanout_share_d3d_texture(
 
 return true;
 }
-#endif
+#endif /* CONFIG_OPENGL */
+#endif /* WIN32 */
 
 #ifdef CONFIG_OPENGL
 static void dbus_scanout_texture(DisplayChangeListener *dcl,
-- 
2.41.0




[PATCH for-8.1] hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port()

2023-07-25 Thread Peter Maydell
In query_port() we pass the address of a local pvrdma_port_attr
struct to the rdma_query_backend_port() function.  Unfortunately,
rdma_backend_query_port() wants a pointer to a struct ibv_port_attr,
and the two are not the same length.

Coverity spotted this (CID 1507146): pvrdma_port_attr is 48 bytes
long, and ibv_port_attr is 52 bytes, because it has a few extra
fields at the end.

Fortunately, all we do with the attrs struct after the call is to
read a few specific fields out of it which are all at the same
offsets in both structs, so we can simply make the local variable the
correct type.  This also lets us drop the cast (which should have
been a bit of a warning flag that we were doing something wrong
here).

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
---
I don't know anything about the rdma code so this fix is based
purely on looking at the code, and is untested beyond just
make check/make check-avocado.
---
 hw/rdma/vmw/pvrdma_cmd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index c6ed0259821..d31c1875938 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -129,14 +129,13 @@ static int query_port(PVRDMADev *dev, union 
pvrdma_cmd_req *req,
 {
 struct pvrdma_cmd_query_port *cmd = &req->query_port;
 struct pvrdma_cmd_query_port_resp *resp = &rsp->query_port_resp;
-struct pvrdma_port_attr attrs = {};
+struct ibv_port_attr attrs = {};
 
 if (cmd->port_num > MAX_PORTS) {
 return -EINVAL;
 }
 
-if (rdma_backend_query_port(&dev->backend_dev,
-(struct ibv_port_attr *)&attrs)) {
+if (rdma_backend_query_port(&dev->backend_dev, &attrs)) {
 return -ENOMEM;
 }
 
-- 
2.34.1




Re: [PATCH] Open file as read only on private mapping in qemu_ram_alloc_from_file

2023-07-25 Thread David Hildenbrand

Hi,

patch subject should start with "softmmu/physmem: Open ..."

On 25.07.23 12:52, Thiner Logoer wrote:

A read-only file can be mapped read/write as long as the
mapping is private, which is a very common case. Make


At least in the environments I know, using private file mappings is a corner 
case ;)

What is your use case? VM templating?


qemu_ram_alloc_from_file() open the file read-only when the mapping is
private; otherwise the open will fail when the file does not allow
writing.

If this file does not exist or is a directory, the flag is not used,
so it should be OK.

from https://gitlab.com/qemu-project/qemu/-/issues/1689

Signed-off-by: Thiner Logoer 
---
  softmmu/physmem.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..e8036ee335 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
  int fd;
  bool created;
  RAMBlock *block;
+


^

.git/rebase-apply/patch:13: trailing whitespace.


+/*
+ * If map is private, the fd does not need to be writable.
+ * This only get effective when the file is existent.


"This will get ignored if the file does not yet exist."


+ */
+bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED);
  
-fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,

+fd = file_ram_open(mem_path, memory_region_name(mr),
+   open_as_readonly, &created,
 errp);
  if (fd < 0) {
  return NULL;



Opening a file R/O will also make operations like fallocate/ftruncate/ ... fail.

For example, this will make fallocate(FALLOC_FL_PUNCH_HOLE) stop working and in
turn make ram_block_discard_range() bail out.
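
For reference, a minimal standalone sketch shows this (the path and the
length are just placeholders): fallocate() wants a descriptor opened for
writing and fails with EBADF otherwise, which is the -9 you see in the
virtio-mem error below.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <errno.h>
    #include <string.h>

    int main(void)
    {
        /* open the backing file the way the patch would: read-only */
        int fd = open("/dev/hugepages/vmem0", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* punch a hole like ram_block_discard_range() does */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, 2 * 1024 * 1024) < 0) {
            /* fails with EBADF: the fd is not open for writing */
            printf("fallocate: %s\n", strerror(errno));
            return 1;
        }
        return 0;
    }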


There was a recent discussion/patch on that:

commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9
Author: David Hildenbrand 
Date:   Thu Jul 6 09:56:06 2023 +0200

softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping

ram_block_discard_range() cannot possibly do the right thing in
MAP_PRIVATE file mappings in the general case.

To achieve the documented semantics, we also have to punch a hole into
the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
of such a file.

For example, using VM templating -- see commit b17fbbe55cba ("migration:
allow private destination ram with x-ignore-shared") -- in combination with
any mechanism that relies on discarding of RAM is problematic. This
includes:
* Postcopy live migration
* virtio-balloon inflation/deflation or free-page-reporting
* virtio-mem

So at least warn that there is something possibly dangerous is going on
when using ram_block_discard_range() in these cases.


While it doesn't work "in the general case", it works in the "single file 
owner" case
where someone simply forgot to specify "share=on" -- "share=off" is the default 
for
memory-backend-file :( .


For example, with hugetlb+virtio-mem the following works if the file does not exist:

(note that virtio-mem will fallocate(FALLOC_FL_PUNCH_HOLE) the whole file 
upfront)

...
-object 
memory-backend-file,share=off,mem-path=/dev/hugepages/vmem0,id=mem2,size=2G \
-device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root


With your patch, once the file already exists, we would now get

qemu-system-x86_64: -device 
virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: 
ram_block_discard_range: Failed to fallocate :0 +8000 (-9)
qemu-system-x86_64: -device 
virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=1g,bus=root: Unexpected 
error discarding RAM: Bad file descriptor


So this has the potential to break existing setups.

The easy fix for these would be to configure "share=on" in these now-failing 
setups. Hm 

--
Cheers,

David / dhildenb




Re: [PULL 0/2] QAPI patches patches for 2023-07-10

2023-07-25 Thread Peter Maydell
On Mon, 10 Jul 2023 at 12:21, Markus Armbruster  wrote:
>
>
> The following changes since commit 2ff49e96accc8fd9a38e9abd16f0cfa0adab1605:
>
>   Merge tag 'pull-tcg-20230709' of https://gitlab.com/rth7680/qemu into 
> staging (2023-07-09 15:01:43 +0100)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2023-07-10
>
> for you to fetch changes up to fd658a7b8cf1091ae2914655add4511865d7edc0:
>
>   migration.json: Don't use space before colon (2023-07-10 07:47:36 +0200)
>
> 
> QAPI patches patches for 2023-07-10
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.1
for any user-visible changes.

-- PMM



Re: [PULL 00/16] s390x fixes

2023-07-25 Thread Peter Maydell
On Mon, 24 Jul 2023 at 17:53, Thomas Huth  wrote:
>
> The following changes since commit
885fc169f09f5915ce037263d20a59eb226d473d:
>
>   Merge tag 'pull-riscv-to-apply-20230723-3' of
https://github.com/alistair23/qemu into staging (2023-07-24 11:34:35 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2023-07-24
>
> for you to fetch changes up to bd39b7b5f34c2f6b9272bf281ee0324cb07fc3ee:
>
>   tests/avocado/machine_s390_ccw_virtio: Skip the flaky virtio-gpu test
by default (2023-07-24 18:44:48 +0200)
>
> 
> * Fix emulation of s390x instructions: CKSM, CLM, ICM, MC, CLGEBR(A)
> * Remove useless and non-working s390x migration avocado tests
> * Fix loongarch CSRRD CPUID instruction when running on s390x hosts
> * Disable flaky s390x virtio-gpu test by default
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.1
for any user-visible changes.

-- PMM


[PATCH v3 1/3] linux-user: Show heap address in /proc/pid/maps

2023-07-25 Thread Helge Deller
Show the memory location of the heap in the /proc/pid/maps file inside the
guest. The heap address will be stored in ts->heap_base, so make that
variable visible for all guest architectures, not just architectures for
semihosted binaries (arm, m68k, riscv).

Show 32- and 64-bit pointers with 8 digits and leading zeros (%08x/%08lx).
For 64-bit we could use %16lx, but we mimic the Linux kernel, which shows
it with %08lx too.

Example:

user@machine:/# uname -a
Linux paq 5.15.88+ #47 SMP Sun Jan 15 12:53:11 CET 2023 aarch64 GNU/Linux

user@machine:/# cat /proc/self/maps
00000000-00009000 r-xp 00000000 08:01 2380521           /usr/bin/cat
00009000-0001f000 ---p 00000000 00:00 0
0001f000-00020000 r--p 0000f000 08:01 2380521           /usr/bin/cat
00020000-00021000 rw-p 00010000 08:01 2380521           /usr/bin/cat
00021000-00042000 rw-p 00000000 00:00 0                 [heap]
5500000000-5500001000 ---p 00000000 00:00 0
5500001000-5500801000 rw-p 00000000 00:00 0             [stack]
5500801000-5500827000 r-xp 00000000 08:01 2395258       /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
5500827000-550083f000 ---p 00000000 00:00 0
550083f000-5500841000 r--p 0002e000 08:01 2395258       /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
5500841000-5500843000 rw-p 00030000 08:01 2395258       /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
5500843000-5500844000 r-xp 00000000 00:00 0
5500844000-5500846000 rw-p 00000000 00:00 0
5500850000-55009d7000 r-xp 00000000 08:01 2395261       /usr/lib/aarch64-linux-gnu/libc.so.6
55009d7000-55009ed000 ---p 00187000 08:01 2395261       /usr/lib/aarch64-linux-gnu/libc.so.6
55009ed000-55009f0000 r--p 0018d000 08:01 2395261       /usr/lib/aarch64-linux-gnu/libc.so.6
55009f0000-55009f2000 rw-p 00190000 08:01 2395261       /usr/lib/aarch64-linux-gnu/libc.so.6
55009f2000-55009ff000 rw-p 00000000 00:00 0

Signed-off-by: Helge Deller 
---
 include/exec/cpu_ldst.h | 4 ++--
 linux-user/main.c   | 1 +
 linux-user/qemu.h   | 4 ++--
 linux-user/syscall.c| 8 ++--
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 645476f0e5..f1e6f31e88 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -72,10 +72,10 @@
  */
 #if TARGET_VIRT_ADDR_SPACE_BITS <= 32
 typedef uint32_t abi_ptr;
-#define TARGET_ABI_FMT_ptr "%x"
+#define TARGET_ABI_FMT_ptr "%08x"
 #else
 typedef uint64_t abi_ptr;
-#define TARGET_ABI_FMT_ptr "%"PRIx64
+#define TARGET_ABI_FMT_ptr "%08"PRIx64
 #endif

 #ifndef TARGET_TAGGED_ADDRESSES
diff --git a/linux-user/main.c b/linux-user/main.c
index dba67ffa36..12f3d8a93e 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -955,6 +955,7 @@ int main(int argc, char **argv, char **envp)
the real value of GUEST_BASE into account.  */
 tcg_prologue_init(tcg_ctx);

+ts->heap_base = info->brk;
 target_cpu_copy_regs(env, regs);

 if (gdbstub) {
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 802794db63..7a6adac637 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -121,11 +121,11 @@ typedef struct TaskState {
 #ifdef TARGET_M68K
 abi_ulong tp_value;
 #endif
-#if defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_RISCV)
+
 /* Extra fields for semihosted binaries.  */
 abi_ulong heap_base;
 abi_ulong heap_limit;
-#endif
+
 abi_ulong stack_base;
 int used; /* non zero if used */
 struct image_info *info;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 95727a816a..220c4a04b8 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8131,14 +8131,18 @@ static int open_self_maps_1(CPUArchState *cpu_env, int 
fd, bool smaps)
 continue;
 }

+path = e->path;
+
+if (ts->heap_base && h2g(min) == ts->heap_base) {
+path = "[heap]";
+}
+
 #ifdef TARGET_HPPA
 if (h2g(max) == ts->info->stack_limit) {
 #else
 if (h2g(min) == ts->info->stack_limit) {
 #endif
 path = "[stack]";
-} else {
-path = e->path;
 }

 count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
--
2.41.0




[PATCH v3 3/3] linux-user: Load pie executables at upper memory

2023-07-25 Thread Helge Deller
Adjust the loader to load dynamic pie executables at around:
~ 0x55  for 64-bit guest binaries on 64-bit host,
- 0x4000for 32-bit guest binaries on 64-bit host, and
- 0xfor 32-bit guest binaries on 32-bit host.

Reason for this change is to unbreak the Thread Sanitizer (TSan)
application again, as it was done in aab613fb9597 ("linux-user: Update
TASK_UNMAPPED_BASE for aarch64").

Signed-off-by: Helge Deller 
---
 linux-user/elfload.c |  6 --
 linux-user/loader.h  | 12 
 linux-user/mmap.c| 16 +---
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 47a118e430..8f5a79b537 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3021,6 +3021,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
 struct elf_phdr *phdr;
 abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
+unsigned long load_offset = 0;
 int i, retval, prot_exec;
 Error *err = NULL;
 bool is_main_executable;
@@ -3121,6 +3122,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  * select guest_base.  In this case we pass a size.
  */
 probe_guest_base(image_name, 0, hiaddr - loaddr);
+load_offset = TASK_UNMAPPED_BASE_PIE;
 }
 }

@@ -3138,7 +3140,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  * In both cases, we will overwrite pages in this range with mappings
  * from the executable.
  */
-load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
+load_addr = target_mmap(loaddr + load_offset, (size_t)hiaddr - loaddr + 1, 
PROT_NONE,
 MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
 (is_main_executable ? MAP_FIXED : 0),
 -1, 0);
@@ -3176,7 +3178,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 info->start_data = -1;
 info->end_data = 0;
 /* possible start for brk is behind all sections of this ELF file. */
-info->brk = TARGET_PAGE_ALIGN(hiaddr);
+info->brk = TARGET_PAGE_ALIGN(load_offset + hiaddr);
 info->elf_flags = ehdr->e_flags;

 prot_exec = PROT_EXEC;
diff --git a/linux-user/loader.h b/linux-user/loader.h
index 59cbeacf24..3ba41e9a7b 100644
--- a/linux-user/loader.h
+++ b/linux-user/loader.h
@@ -18,6 +18,18 @@
 #ifndef LINUX_USER_LOADER_H
 #define LINUX_USER_LOADER_H

+/* where to map binaries? */
+#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
+# define TASK_UNMAPPED_BASE_PIE 0x55
+# define TASK_UNMAPPED_BASE0x70
+#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
+# define TASK_UNMAPPED_BASE_PIE0x4000
+# define TASK_UNMAPPED_BASE0xf000
+#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
+# define TASK_UNMAPPED_BASE_PIE0x
+# define TASK_UNMAPPED_BASE0x4000
+#endif
+
 /*
  * Read a good amount of data initially, to hopefully get all the
  * program headers loaded.
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 848d2fd4bb..9434bc805d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -23,6 +23,7 @@
 #include "user-internals.h"
 #include "user-mmap.h"
 #include "target_mman.h"
+#include "loader.h"

 static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
 static __thread int mmap_lock_count;
@@ -295,21 +296,6 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong 
start, abi_ulong last,
 return true;
 }

-#if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 64
-#ifdef TARGET_AARCH64
-# define TASK_UNMAPPED_BASE  0x55
-#else
-# define TASK_UNMAPPED_BASE  0x40
-#endif
-#elif HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 32
-#ifdef TARGET_HPPA
-# define TASK_UNMAPPED_BASE  0xfa00
-#else
-# define TASK_UNMAPPED_BASE  0xe000
-#endif
-#else /* HOST_LONG_BITS == 32 && TARGET_ABI_BITS == 32 */
-# define TASK_UNMAPPED_BASE  0x4000
-#endif
 abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;

 unsigned long last_brk;
--
2.41.0




[PATCH v3 0/3] linux-user: Fix static armhf binaries and optimize memory layout

2023-07-25 Thread Helge Deller
This patch series:
- fixes qemu-arm to run static armhf binaries
- shows address of heap in /proc/pid/maps output for all architectures
- optimizes address layout of loaded executable

It can be pulled from here:
https://github.com/hdeller/qemu-hppa/tree/brk-fixes-2

Helge Deller (3):
  linux-user: Show heap address in /proc/pid/maps
  linux-user: Optimize memory layout for static and dynamic executables
  linux-user: Load pie executables at upper memory

 include/exec/cpu_ldst.h |  4 +--
 linux-user/elfload.c| 59 +
 linux-user/loader.h | 12 +
 linux-user/main.c   |  1 +
 linux-user/mmap.c   | 14 +-
 linux-user/qemu.h   |  4 +--
 linux-user/syscall.c|  8 --
 7 files changed, 43 insertions(+), 59 deletions(-)

--
2.41.0




[PATCH v3 2/3] linux-user: Optimize memory layout for static and dynamic executables

2023-07-25 Thread Helge Deller
Organize the emulated memory layout in a way which leaves as much memory as
possible for heap for the application.

This patch tries to optimize the memory layout by loading pie executables
into lower memory and shared libs into higher memory (at
TASK_UNMAPPED_BASE). This leaves a bigger memory area usable for heap
space, which will be located directly after the executable.
Up to now, pie executables and shared libs were loaded directly behind
each other in the area at TASK_UNMAPPED_BASE, which leaves very little
space for the heap.

I tested this change on arm64, armhf and hppa (all in chroot on x86-64),
and with a static armhf binary (which is broken without this patch).

This patch temporarily breaks the Thread Sanitizer (TSan) application which
expects specific boundary definitions for memory mappings on different
platforms [1], see commit aab613fb9597 ("linux-user: Update TASK_UNMAPPED_BASE 
for aarch64")
for aarch64. The follow-up patch fixes it again.

[1] 
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/tsan/rtl/tsan_platform.h

Signed-off-by: Helge Deller 
---
 linux-user/elfload.c | 55 +---
 linux-user/mmap.c|  8 ---
 2 files changed, 21 insertions(+), 42 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 861ec07abc..47a118e430 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3023,6 +3023,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
 int i, retval, prot_exec;
 Error *err = NULL;
+bool is_main_executable;

 /* First of all, some simple consistency checks */
 if (!elf_check_ident(ehdr)) {
@@ -3106,28 +3107,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 }
 }

-if (pinterp_name != NULL) {
-/*
- * This is the main executable.
- *
- * Reserve extra space for brk.
- * We hold on to this space while placing the interpreter
- * and the stack, lest they be placed immediately after
- * the data segment and block allocation from the brk.
- *
- * 16MB is chosen as "large enough" without being so large as
- * to allow the result to not fit with a 32-bit guest on a
- * 32-bit host. However some 64 bit guests (e.g. s390x)
- * attempt to place their heap further ahead and currently
- * nothing stops them smashing into QEMUs address space.
- */
-#if TARGET_LONG_BITS == 64
-info->reserve_brk = 32 * MiB;
-#else
-info->reserve_brk = 16 * MiB;
-#endif
-hiaddr += info->reserve_brk;
-
+is_main_executable = (pinterp_name != NULL);
+if (is_main_executable) {
 if (ehdr->e_type == ET_EXEC) {
 /*
  * Make sure that the low address does not conflict with
@@ -3136,7 +3117,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 probe_guest_base(image_name, loaddr, hiaddr);
 } else {
 /*
- * The binary is dynamic, but we still need to
+ * The binary is dynamic (pie-executabe), but we still need to
  * select guest_base.  In this case we pass a size.
  */
 probe_guest_base(image_name, 0, hiaddr - loaddr);
@@ -3159,7 +3140,7 @@ static void load_elf_image(const char *image_name, int 
image_fd,
  */
 load_addr = target_mmap(loaddr, (size_t)hiaddr - loaddr + 1, PROT_NONE,
 MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
-(ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
+(is_main_executable ? MAP_FIXED : 0),
 -1, 0);
 if (load_addr == -1) {
 goto exit_mmap;
@@ -3194,7 +3175,8 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 info->end_code = 0;
 info->start_data = -1;
 info->end_data = 0;
-info->brk = 0;
+/* possible start for brk is behind all sections of this ELF file. */
+info->brk = TARGET_PAGE_ALIGN(hiaddr);
 info->elf_flags = ehdr->e_flags;

 prot_exec = PROT_EXEC;
@@ -3288,9 +3270,6 @@ static void load_elf_image(const char *image_name, int 
image_fd,
 info->end_data = vaddr_ef;
 }
 }
-if (vaddr_em > info->brk) {
-info->brk = vaddr_em;
-}
 #ifdef TARGET_MIPS
 } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
 Mips_elf_abiflags_v0 abiflags;
@@ -3618,6 +3597,15 @@ int load_elf_binary(struct linux_binprm *bprm, struct 
image_info *info)

 if (elf_interpreter) {
 load_elf_interp(elf_interpreter, &interp_info, bprm->buf);
+/*
+* Use brk address of interpreter if it was loaded above the
+* executable and leaves less than 16 MB for heap.
+* This happens e.g. with static binaries on armhf.
+ */

Re: intel-iommu: Report interrupt remapping faults, fix return value

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 11:01:16AM +0100, David Woodhouse wrote:
> From: David Woodhouse 
> 
> A generic X86IOMMUClass->int_remap function should not return VT-d
> specific values; fix it to return 0 if the interrupt was successfully
> translated or -EINVAL if not.
> 
> The VTD_FR_IR_xxx values are supposed to be used to actually raise
> faults through the fault reporting mechanism, so do that instead for
> the case where the IRQ is actually being injected.
> 
> There is more work to be done here, as pretranslations for the KVM IRQ
> routing table can't fault; an untranslatable IRQ should be handled in
> userspace and the fault raised only when the IRQ actually happens (if
> indeed the IRTE is still not valid at that time). But we can work on
> that later; we can at least raise faults for the direct case.
> 
> Signed-off-by: David Woodhouse 


looks like post 8.1 material yes?

> ---
>  hw/i386/intel_iommu.c  | 148 ++---
>  hw/i386/intel_iommu_internal.h |   1 +
>  2 files changed, 102 insertions(+), 47 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index dcc334060c..a65a94a4ce 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -469,21 +469,12 @@ static void vtd_set_frcd_and_update_ppf(IntelIOMMUState 
> *s, uint16_t index)
>  
>  /* Must not update F field now, should be done later */
>  static void vtd_record_frcd(IntelIOMMUState *s, uint16_t index,
> -uint16_t source_id, hwaddr addr,
> -VTDFaultReason fault, bool is_write,
> -bool is_pasid, uint32_t pasid)
> +uint64_t hi, uint64_t lo)
>  {
> -uint64_t hi = 0, lo;
>  hwaddr frcd_reg_addr = DMAR_FRCD_REG_OFFSET + (((uint64_t)index) << 4);
>  
>  assert(index < DMAR_FRCD_REG_NR);
>  
> -lo = VTD_FRCD_FI(addr);
> -hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) |
> - VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid);
> -if (!is_write) {
> -hi |= VTD_FRCD_T;
> -}
>  vtd_set_quad_raw(s, frcd_reg_addr, lo);
>  vtd_set_quad_raw(s, frcd_reg_addr + 8, hi);
>  
> @@ -509,17 +500,11 @@ static bool vtd_try_collapse_fault(IntelIOMMUState *s, 
> uint16_t source_id)
>  }
>  
>  /* Log and report an DMAR (address translation) fault to software */
> -static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
> -  hwaddr addr, VTDFaultReason fault,
> -  bool is_write, bool is_pasid,
> -  uint32_t pasid)
> +static void vtd_report_frcd_fault(IntelIOMMUState *s, uint64_t source_id,
> +  uint64_t hi, uint64_t lo)
>  {
>  uint32_t fsts_reg = vtd_get_long_raw(s, DMAR_FSTS_REG);
>  
> -assert(fault < VTD_FR_MAX);
> -
> -trace_vtd_dmar_fault(source_id, fault, addr, is_write);
> -
>  if (fsts_reg & VTD_FSTS_PFO) {
>  error_report_once("New fault is not recorded due to "
>"Primary Fault Overflow");
> @@ -539,8 +524,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
> uint16_t source_id,
>  return;
>  }
>  
> -vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault,
> -is_write, is_pasid, pasid);
> +vtd_record_frcd(s, s->next_frcd_reg, hi, lo);
>  
>  if (fsts_reg & VTD_FSTS_PPF) {
>  error_report_once("There are pending faults already, "
> @@ -565,6 +549,40 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
> uint16_t source_id,
>  }
>  }
>  
> +/* Log and report an DMAR (address translation) fault to software */
> +static void vtd_report_dmar_fault(IntelIOMMUState *s, uint16_t source_id,
> +  hwaddr addr, VTDFaultReason fault,
> +  bool is_write, bool is_pasid,
> +  uint32_t pasid)
> +{
> +uint64_t hi, lo;
> +
> +assert(fault < VTD_FR_MAX);
> +
> +trace_vtd_dmar_fault(source_id, fault, addr, is_write);
> +
> +lo = VTD_FRCD_FI(addr);
> +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault) |
> + VTD_FRCD_PV(pasid) | VTD_FRCD_PP(is_pasid);
> +if (!is_write) {
> +hi |= VTD_FRCD_T;
> +}
> +
> +vtd_report_frcd_fault(s, source_id, hi, lo);
> +}
> +
> +
> +static void vtd_report_ir_fault(IntelIOMMUState *s, uint64_t source_id,
> +VTDFaultReason fault, uint16_t index)
> +{
> +uint64_t hi, lo;
> +
> +lo = VTD_FRCD_IR_IDX(index);
> +hi = VTD_FRCD_SID(source_id) | VTD_FRCD_FR(fault);
> +
> +vtd_report_frcd_fault(s, source_id, hi, lo);
> +}
> +
>  /* Handle Invalidation Queue Errors of queued invalidation interface error
>   * conditions.
>   */
> @@ -3300,8 +3318,9 @@ static Property vtd_properties[] = {
>  };
>  
>  /* Read IRTE entry with specific index */
> -static int vtd_irte_get(IntelIOMMUSta

Re: [PATCH] kvm: Remove KVM_CREATE_IRQCHIP support assumption

2023-07-25 Thread Andrew Jones
On Mon, Jul 24, 2023 at 11:53:39AM +0200, Thomas Huth wrote:
> On 22/07/2023 08.21, Andrew Jones wrote:
> > Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA
> > irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the
> > RISC-V platform has AIA. The cap indicates KVM supports at least one
> > of the following ioctls:
> > 
> >KVM_CREATE_IRQCHIP
> >KVM_IRQ_LINE
> >KVM_GET_IRQCHIP
> >KVM_SET_IRQCHIP
> >KVM_GET_LAPIC
> >KVM_SET_LAPIC
> > 
> > but the cap doesn't imply that KVM must support any of those ioctls
> > in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP
> > ioctl was supported. Stop making that assumption by introducing a
> > KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP
> > sets. Adding parameters isn't awesome, but given how the
> > KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of
> > options.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> > 
> > While this fixes booting guests on riscv KVM with AIA it's unlikely
> > to get merged before the QEMU support for KVM AIA[1] lands, which
> > would also fix the issue. I think this patch is still worth considering
> > though since QEMU's assumption is wrong.
> > 
> > [1] 
> > https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/
> > 
> > 
> >   accel/kvm/kvm-all.c| 5 -
> >   include/sysemu/kvm.h   | 1 +
> >   target/arm/kvm.c   | 3 +++
> >   target/i386/kvm/kvm.c  | 2 ++
> >   target/s390x/kvm/kvm.c | 3 +++
> >   5 files changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> > index 373d876c0580..0f5ff8630502 100644
> > --- a/accel/kvm/kvm-all.c
> > +++ b/accel/kvm/kvm-all.c
> > @@ -86,6 +86,7 @@ struct KVMParkedVcpu {
> >   };
> >   KVMState *kvm_state;
> > +bool kvm_has_create_irqchip;
> >   bool kvm_kernel_irqchip;
> >   bool kvm_split_irqchip;
> >   bool kvm_async_interrupts_allowed;
> > @@ -2377,8 +2378,10 @@ static void kvm_irqchip_create(KVMState *s)
> >   if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) {
> >   error_report("Split IRQ chip mode not supported.");
> >   exit(1);
> > -} else {
> > +} else if (kvm_has_create_irqchip) {
> >   ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
> > +} else {
> > +return;
> >   }
> >   }
> >   if (ret < 0) {
> 
> I think I'd do this differently... at the beginning of the function, there is
> a check for kvm_check_extension(s, KVM_CAP_IRQCHIP) etc. ... I think you
> could now replace that check with a simple
> 
>   if (!kvm_has_create_irqchip) {
>   return;
>   }
> 
> The "kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0)" of course has to be
> moved to the target/s390x/kvm/kvm.c file, too.

Actually, once we've moved the s390 cap enablement to the s390 file we can
just drop the whole if-else chain. We don't want the
if (!kvm_has_create_irqchip) at the top because we want to try
kvm_arch_irqchip_create() even when kvm_has_create_irqchip is false, and
we don't need to check KVM_CREATE_IRQCHIP for kvm_arch_irqchip_create()
either. Keeping the check, as it is above in this v1, of
kvm_has_create_irqchip for KVM_CREATE_IRQCHIP is still necessary, though.

Thanks,
drew



[PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption

2023-07-25 Thread Andrew Jones
Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA
irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the
RISC-V platform has AIA. The cap indicates KVM supports at least one
of the following ioctls:

  KVM_CREATE_IRQCHIP
  KVM_IRQ_LINE
  KVM_GET_IRQCHIP
  KVM_SET_IRQCHIP
  KVM_GET_LAPIC
  KVM_SET_LAPIC

but the cap doesn't imply that KVM must support any of those ioctls
in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP
ioctl was supported. Stop making that assumption by introducing a
KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP
sets. Adding parameters isn't awesome, but given how the
KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of
options.

Signed-off-by: Andrew Jones 
---

While this fixes booting guests on riscv KVM with AIA it's unlikely
to get merged before the QEMU support for KVM AIA[1] lands, which
would also fix the issue. I think this patch is still worth considering
though since QEMU's assumption is wrong.

[1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/

v2:
  - Move the s390x code to an s390x file. [Thomas]
  - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(),
as it's no longer necessary.

 accel/kvm/kvm-all.c| 16 
 include/sysemu/kvm.h   |  1 +
 target/arm/kvm.c   |  3 +++
 target/i386/kvm/kvm.c  |  2 ++
 target/s390x/kvm/kvm.c | 11 +++
 5 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 373d876c0580..cddcb6eca641 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -86,6 +86,7 @@ struct KVMParkedVcpu {
 };
 
 KVMState *kvm_state;
+bool kvm_has_create_irqchip;
 bool kvm_kernel_irqchip;
 bool kvm_split_irqchip;
 bool kvm_async_interrupts_allowed;
@@ -2358,17 +2359,6 @@ static void kvm_irqchip_create(KVMState *s)
 int ret;
 
 assert(s->kernel_irqchip_split != ON_OFF_AUTO_AUTO);
-if (kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
-;
-} else if (kvm_check_extension(s, KVM_CAP_S390_IRQCHIP)) {
-ret = kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0);
-if (ret < 0) {
-fprintf(stderr, "Enable kernel irqchip failed: %s\n", 
strerror(-ret));
-exit(1);
-}
-} else {
-return;
-}
 
 /* First probe and see if there's a arch-specific hook to create the
  * in-kernel irqchip for us */
@@ -2377,8 +2367,10 @@ static void kvm_irqchip_create(KVMState *s)
 if (s->kernel_irqchip_split == ON_OFF_AUTO_ON) {
 error_report("Split IRQ chip mode not supported.");
 exit(1);
-} else {
+} else if (kvm_has_create_irqchip) {
 ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
+} else {
+return;
 }
 }
 if (ret < 0) {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 115f0cca79d1..84b1bb3dc91e 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -32,6 +32,7 @@
 #ifdef CONFIG_KVM_IS_POSSIBLE
 
 extern bool kvm_allowed;
+extern bool kvm_has_create_irqchip;
 extern bool kvm_kernel_irqchip;
 extern bool kvm_split_irqchip;
 extern bool kvm_async_interrupts_allowed;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b4c7654f4980..2fa87b495d68 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -250,6 +250,9 @@ int kvm_arm_get_max_vm_ipa_size(MachineState *ms, bool 
*fixed_ipa)
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
 int ret = 0;
+
+kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_IRQCHIP);
+
 /* For ARM interrupt delivery is always asynchronous,
  * whether we are using an in-kernel VGIC or not.
  */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ebfaf3d24c79..6363e67f092d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2771,6 +2771,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_IRQCHIP);
+
 return 0;
 }
 
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index a9e5880349d9..bcc735227f7d 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -391,6 +391,17 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 
 kvm_set_max_memslot_size(KVM_SLOT_MAX_BYTES);
+
+kvm_has_create_irqchip = kvm_check_extension(s, KVM_CAP_S390_IRQCHIP);
+if (kvm_has_create_irqchip) {
+int ret = kvm_vm_enable_cap(s, KVM_CAP_S390_IRQCHIP, 0);
+
+if (ret < 0) {
+fprintf(stderr, "Enable kernel irqchip failed: %s\n", 
strerror(-ret));
+exit(1);
+}
+}
+
 return 0;
 }
 
-- 
2.41.0




Re: [PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption

2023-07-25 Thread Thomas Huth

On 25/07/2023 14.26, Andrew Jones wrote:

Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA
irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the
RISC-V platform has AIA. The cap indicates KVM supports at least one
of the following ioctls:

   KVM_CREATE_IRQCHIP
   KVM_IRQ_LINE
   KVM_GET_IRQCHIP
   KVM_SET_IRQCHIP
   KVM_GET_LAPIC
   KVM_SET_LAPIC

but the cap doesn't imply that KVM must support any of those ioctls
in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP
ioctl was supported. Stop making that assumption by introducing a
KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP
sets. Adding parameters isn't awesome, but given how the
KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of
options.

Signed-off-by: Andrew Jones 
---

While this fixes booting guests on riscv KVM with AIA it's unlikely
to get merged before the QEMU support for KVM AIA[1] lands, which
would also fix the issue. I think this patch is still worth considering
though since QEMU's assumption is wrong.

[1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/

v2:
   - Move the s390x code to an s390x file. [Thomas]
   - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(),
 as it's no longer necessary.


Looks good now!

Reviewed-by: Thomas Huth 





Re: [PATCH v2] kvm: Remove KVM_CREATE_IRQCHIP support assumption

2023-07-25 Thread Philippe Mathieu-Daudé

On 25/7/23 14:26, Andrew Jones wrote:

Since Linux commit 00f918f61c56 ("RISC-V: KVM: Skeletal in-kernel AIA
irqchip support") checking KVM_CAP_IRQCHIP returns non-zero when the
RISC-V platform has AIA. The cap indicates KVM supports at least one
of the following ioctls:

   KVM_CREATE_IRQCHIP
   KVM_IRQ_LINE
   KVM_GET_IRQCHIP
   KVM_SET_IRQCHIP
   KVM_GET_LAPIC
   KVM_SET_LAPIC

but the cap doesn't imply that KVM must support any of those ioctls
in particular. However, QEMU was assuming the KVM_CREATE_IRQCHIP
ioctl was supported. Stop making that assumption by introducing a
KVM parameter that each architecture which supports KVM_CREATE_IRQCHIP
sets. Adding parameters isn't awesome, but given how the
KVM_CAP_IRQCHIP isn't very helpful on its own, we don't have a lot of
options.

Signed-off-by: Andrew Jones 
---

While this fixes booting guests on riscv KVM with AIA it's unlikely
to get merged before the QEMU support for KVM AIA[1] lands, which
would also fix the issue. I think this patch is still worth considering
though since QEMU's assumption is wrong.

[1] https://lore.kernel.org/all/20230714084429.22349-1-yongxuan.w...@sifive.com/

v2:
   - Move the s390x code to an s390x file. [Thomas]
   - Drop the KVM_CAP_IRQCHIP check from the top of kvm_irqchip_create(),
 as it's no longer necessary.

  accel/kvm/kvm-all.c| 16 
  include/sysemu/kvm.h   |  1 +
  target/arm/kvm.c   |  3 +++
  target/i386/kvm/kvm.c  |  2 ++
  target/s390x/kvm/kvm.c | 11 +++
  5 files changed, 21 insertions(+), 12 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] ui/dbus: fix win32 compilation when !opengl

2023-07-25 Thread Philippe Mathieu-Daudé

On 25/7/23 13:25, marcandre.lur...@redhat.com wrote:

From: Marc-Andre Lureau 

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1782

Signed-off-by: Marc-André Lureau 
---
  ui/dbus-listener.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH v3 0/8] misc AHCI cleanups

2023-07-25 Thread Philippe Mathieu-Daudé

Hi Niklas, John, Paolo, Kevin,

On 19/7/23 12:47, Niklas Cassel wrote:


Niklas Cassel (8):
   hw/ide/ahci: remove stray backslash
   hw/ide/core: set ERR_STAT in unsupported command completion
   hw/ide/ahci: write D2H FIS when processing NCQ command
   hw/ide/ahci: simplify and document PxCI handling
   hw/ide/ahci: PxSACT and PxCI is cleared when PxCMD.ST is cleared
   hw/ide/ahci: PxCI should not get cleared when ERR_STAT is set
   hw/ide/ahci: fix ahci_write_fis_sdb()
   hw/ide/ahci: fix broken SError handling

  hw/ide/ahci.c | 112 +++---
  hw/ide/core.c |   2 +-
  tests/qtest/libqos/ahci.c | 106 +++-
  tests/qtest/libqos/ahci.h |   8 +--
  4 files changed, 164 insertions(+), 64 deletions(-)

--
2.40.1




Hello Philippe,

Considering that you picked up my patch,
"hw/ide/ahci: remove stray backslash" (patch 1/8 in this series),
and since John seems to have gone silent for 40+ days,
could you please consider taking this series through your misc tree?


(First patch was a cleanup)

Niklas, I don't feel confident enough :/

John, Paolo, Kevin, do you Ack?

Regards,

Phil.



Re: [PATCH for-8.1] hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port()

2023-07-25 Thread Philippe Mathieu-Daudé

On 25/7/23 13:36, Peter Maydell wrote:

In query_port() we pass the address of a local pvrdma_port_attr
struct to the rdma_query_backend_port() function.  Unfortunately,
rdma_backend_query_port() wants a pointer to a struct ibv_port_attr,
and the two are not the same length.

Coverity spotted this (CID 1507146): pvrdma_port_attr is 48 bytes
long, and ibv_port_attr is 52 bytes, because it has a few extra
fields at the end.

Fortunately, all we do with the attrs struct after the call is to
read a few specific fields out of it which are all at the same
offsets in both structs, so we can simply make the local variable the
correct type.  This also lets us drop the cast (which should have
been a bit of a warning flag that we were doing something wrong
here).


Fortunate but also kind of amusing :)

Reviewed-by: Philippe Mathieu-Daudé 


Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Maydell 
---
I don't know anything about the rdma code so this fix is based
purely on looking at the code, and is untested beyond just
make check/make check-avocado.
---
  hw/rdma/vmw/pvrdma_cmd.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)






Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop

2023-07-25 Thread Hanna Czenczek

On 25.07.23 12:03, Eugenio Perez Martin wrote:

On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek  wrote:

On 24.07.23 17:48, Eugenio Perez Martin wrote:

On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek  wrote:

On 21.07.23 17:25, Eugenio Perez Martin wrote:

On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek  wrote:

Move the `suspended` field from vhost_vdpa into the global vhost_dev
struct, so vhost_dev_stop() can check whether the back-end has been
suspended by `vhost_ops->vhost_dev_start(hdev, false)`.  If it has,
there is no need to reset it; the reset is just a fall-back to stop
device operations for back-ends that do not support suspend.

Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
when the device is re-started, we still have to do the reset to have it
un-suspend.

Signed-off-by: Hanna Czenczek 
---
include/hw/virtio/vhost-vdpa.h |  2 --
include/hw/virtio/vhost.h  |  8 
hw/virtio/vhost-vdpa.c | 11 +++
hw/virtio/vhost.c  |  8 +++-
4 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index e64bfc7f98..72c3686b7f 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
bool shadow_vqs_enabled;
/* Vdpa must send shadow addresses as IOTLB key for data queues, not 
GPA */
bool shadow_data;
-/* Device suspended successfully */
-bool suspended;
/* IOVA mapping used by the Shadow Virtqueue */
VhostIOVATree *iova_tree;
GPtrArray *shadow_vqs;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 6a173cb9fa..69bf59d630 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -120,6 +120,14 @@ struct vhost_dev {
uint64_t backend_cap;
/* @started: is the vhost device started? */
bool started;
+/**
+ * @suspended: Whether the vhost device is currently suspended.  Set
+ * and reset by implementations (vhost-user, vhost-vdpa, ...), which
+ * are supposed to automatically suspend/resume in their
+ * vhost_dev_start handlers as required.  Must also be cleared when
+ * the device is reset.
+ */
+bool suspended;
bool log_enabled;
uint64_t log_size;
Error *migration_blocker;
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7b7dee468e..f7fd19a203 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,

static int vhost_vdpa_reset_device(struct vhost_dev *dev)
{
-struct vhost_vdpa *v = dev->opaque;
int ret;
uint8_t status = 0;

ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
trace_vhost_vdpa_reset_device(dev);
-v->suspended = false;
+dev->suspended = false;
return ret;
}

@@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev)
if (unlikely(r)) {
error_report("Cannot suspend: %s(%d)", g_strerror(errno), 
errno);
} else {
-v->suspended = true;
+dev->suspended = true;
return;
}
}
@@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
return -1;
}
vhost_vdpa_set_vring_ready(dev);
+if (dev->suspended) {
+/* TODO: When RESUME is available, use it instead of resetting */
+vhost_vdpa_reset_status(dev);

How is that we reset the status at each vhost_vdpa_dev_start? That
will clean all the vqs configured, features negotiated, etc. in the
vDPA device. Or am I missing something?

What alternative do you propose?  We don’t have RESUME for vDPA in qemu,
but we somehow need to lift the previous SUSPEND so the device will
again respond to guest requests, do we not?


Reset also clears the suspend state in vDPA, and it should be called
at vhost_dev_stop. So the device should never be in suspended state
here. Does that solve your concerns?

My intention with this patch was precisely not to reset in
vhost_dev_stop when suspending is supported.  So now I’m more confused
than before.


At this moment, I think that should be focused as an optimization and
not to be included in the main series.


It is absolutely not an optimization but vital for my use case. 
virtiofsd does not currently implement resetting, but if it did (and we 
want this support in the future), resetting it during stop/cont would be 
catastrophic.  There must be a way to have qemu not issue a reset.  This 
patch is the reason why this series exists.



But more generally, is this any different from what is done before this
patch?  Before this patch, vhost_dev_stop() unconditionally invokes
vhost_reset_status(), so the device is reset in every stop/start cycle,
that doesn’t 

Re: intel-iommu: Report interrupt remapping faults, fix return value

2023-07-25 Thread David Woodhouse
On Tue, 2023-07-25 at 08:03 -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 25, 2023 at 11:01:16AM +0100, David Woodhouse wrote:
> > From: David Woodhouse 
> > 
> > A generic X86IOMMUClass->int_remap function should not return VT-d
> > specific values; fix it to return 0 if the interrupt was successfully
> > translated or -EINVAL if not.
> > 
> > The VTD_FR_IR_xxx values are supposed to be used to actually raise
> > faults through the fault reporting mechanism, so do that instead for
> > the case where the IRQ is actually being injected.
> > 
> > There is more work to be done here, as pretranslations for the KVM IRQ
> > routing table can't fault; an untranslatable IRQ should be handled in
> > userspace and the fault raised only when the IRQ actually happens (if
> > indeed the IRTE is still not valid at that time). But we can work on
> > that later; we can at least raise faults for the direct case.
> > 
> > Signed-off-by: David Woodhouse 
> 
> 
> looks like post 8.1 material yes?

Makes sense. I just found it lying around in my tree and figured it was
better posted than sitting there for another few months.

I do still want to get round to fixing the reports for passthrough
interrupts at some point. That annoys me. :)


smime.p7s
Description: S/MIME cryptographic signature


[PATCH] memory: avoid updating ioeventfds for some address_space

2023-07-25 Thread hongmianquan
When updating ioeventfds, we need to iterate over all address spaces,
but some address spaces have no listener that registers eventfd_add|del
callbacks in memory_listener_register(), so they do nothing when
ioeventfds are updated. We can skip these address spaces in
address_space_update_ioeventfds().

The overhead of memory_region_transaction_commit() can be significantly
reduced. For example, in a VM with 8 vhost-net devices, each with 64
vectors, this cuts the time spent in memory_region_transaction_commit()
by 20%.

Signed-off-by: hongmianquan 
---
 include/exec/memory.h |  1 +
 softmmu/memory.c  | 12 
 2 files changed, 13 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7f5c11a0cc..556f4f1871 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1089,6 +1089,7 @@ struct AddressSpace {
 struct FlatView *current_map;
 
 int ioeventfd_nb;
+int ioeventfd_notifiers;
 struct MemoryRegionIoeventfd *ioeventfds;
 QTAILQ_HEAD(, MemoryListener) listeners;
 QTAILQ_ENTRY(AddressSpace) address_spaces_link;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..77042d3f93 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -842,6 +842,10 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 AddrRange tmp;
 unsigned i;
 
+if (!as->ioeventfd_notifiers) {
+return;
+}
+
 /*
  * It is likely that the number of ioeventfds hasn't changed much, so use
  * the previous size as the starting value, with some headroom to avoid
@@ -3075,6 +3079,10 @@ void memory_listener_register(MemoryListener *listener, 
AddressSpace *as)
 }
 
 listener_add_address_space(listener, as);
+
+if (listener->eventfd_add || listener->eventfd_del) {
+as->ioeventfd_notifiers++;
+}
 }
 
 void memory_listener_unregister(MemoryListener *listener)
@@ -3083,6 +3091,10 @@ void memory_listener_unregister(MemoryListener *listener)
 return;
 }
 
+if (listener->eventfd_add || listener->eventfd_del) {
+as->ioeventfd_notifiers--;
+}
+
 listener_del_address_space(listener, listener->address_space);
 QTAILQ_REMOVE(&memory_listeners, listener, link);
 QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as);
-- 
2.11.0




[PATCH] memory: avoid updating ioeventfds for some address_space

2023-07-25 Thread hongmianquan
When updating ioeventfds, we need to iterate over all address spaces,
but some address spaces have no listener that registers eventfd_add|del
callbacks in memory_listener_register(), so they do nothing when
ioeventfds are updated. We can skip these address spaces in
address_space_update_ioeventfds().

The overhead of memory_region_transaction_commit() can be significantly
reduced. For example, in a VM with 8 vhost-net devices, each with 64
vectors, this cuts the time spent in memory_region_transaction_commit()
by 20%.

Signed-off-by: hongmianquan 
---
 include/exec/memory.h |  1 +
 softmmu/memory.c  | 12 
 2 files changed, 13 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7f5c11a0cc..556f4f1871 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1089,6 +1089,7 @@ struct AddressSpace {
 struct FlatView *current_map;
 
 int ioeventfd_nb;
+int ioeventfd_notifiers;
 struct MemoryRegionIoeventfd *ioeventfds;
 QTAILQ_HEAD(, MemoryListener) listeners;
 QTAILQ_ENTRY(AddressSpace) address_spaces_link;
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7d9494ce70..178816c845 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -842,6 +842,10 @@ static void address_space_update_ioeventfds(AddressSpace 
*as)
 AddrRange tmp;
 unsigned i;
 
+if (!as->ioeventfd_notifiers) {
+return;
+}
+
 /*
  * It is likely that the number of ioeventfds hasn't changed much, so use
  * the previous size as the starting value, with some headroom to avoid
@@ -3075,6 +3079,10 @@ void memory_listener_register(MemoryListener *listener, 
AddressSpace *as)
 }
 
 listener_add_address_space(listener, as);
+
+if (listener->eventfd_add || listener->eventfd_del) {
+as->ioeventfd_notifiers++;
+}
 }
 
 void memory_listener_unregister(MemoryListener *listener)
@@ -3083,6 +3091,10 @@ void memory_listener_unregister(MemoryListener *listener)
 return;
 }
 
+if (listener->eventfd_add || listener->eventfd_del) {
+listener->address_space->ioeventfd_notifiers--;
+}
+
 listener_del_address_space(listener, listener->address_space);
 QTAILQ_REMOVE(&memory_listeners, listener, link);
 QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as);
-- 
2.11.0




Re: [PATCH] ppc: spapr: Fix device tree entries in absence of XIVE native mode

2023-07-25 Thread Gautam Menghani
Ok noted, thanks for the feedback Greg and Cedric.

Thanks,
Gautam



[PATCH] Open file as read only on private mapping in qemu_ram_alloc_from_file

2023-07-25 Thread Thiner Logoer
A read-only file can be mapped read/write as long as the mapping is
private, which is a very common case. Make qemu_ram_alloc_from_file()
open the file read-only when the mapping is private; otherwise the open
will fail when the file does not allow writing.

If this file does not exist or is a directory, the flag is not used,
so it should be OK.

from https://gitlab.com/qemu-project/qemu/-/issues/1689

Signed-off-by: Thiner Logoer 
---
 softmmu/physmem.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..e8036ee335 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1945,8 +1945,15 @@ RAMBlock *qemu_ram_alloc_from_file(ram_addr_t size, 
MemoryRegion *mr,
 int fd;
 bool created;
 RAMBlock *block;
+
+/*
+ * If map is private, the fd does not need to be writable.
+ * This only get effective when the file is existent.
+ */
+bool open_as_readonly = readonly || !(ram_flags & RAM_SHARED);
 
-fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
+fd = file_ram_open(mem_path, memory_region_name(mr),
+   open_as_readonly, &created,
errp);
 if (fd < 0) {
 return NULL;
-- 
2.40.1
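
For reference, the MAP_PRIVATE behaviour this relies on can be checked
with a small standalone sketch (the path is just an example; the file
must already exist, be readable and be at least one page long):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        int fd = open("/tmp/backing.img", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /*
         * PROT_WRITE is allowed although the fd is read-only, because
         * the mapping is private (copy-on-write).  The same call with
         * MAP_SHARED would fail with EACCES.
         */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memset(p, 0xff, 4096);  /* hits anonymous COW pages, not the file */
        printf("private R/W mapping of a read-only fd works\n");
        return 0;
    }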




[PATCH] migration/ram: Refactor precopy ram loading code

2023-07-25 Thread Fabiano Rosas
From: Nikolay Borisov 

Extract the ramblock parsing code into a routine that operates on the
sequence of headers from the stream and another that parses the
individual ramblock. This makes ram_load_precopy() easier to
comprehend.

Signed-off-by: Nikolay Borisov 
Signed-off-by: Fabiano Rosas 
---
I'm extracting the parts from the fixed-ram migration series [1] that
could already go in. This patch is one of them.

1- https://lore.kernel.org/r/20230330180336.2791-1-faro...@suse.de
---
 migration/ram.c | 141 +++-
 1 file changed, 79 insertions(+), 62 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 0ada6477e8..685e129b70 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3826,6 +3826,82 @@ void colo_flush_ram_cache(void)
 trace_colo_flush_ram_cache_end();
 }
 
+static int parse_ramblock(QEMUFile *f, RAMBlock *block, ram_addr_t length)
+{
+int ret = 0;
+/* ADVISE is earlier, it shows the source has the postcopy capability on */
+bool postcopy_advised = migration_incoming_postcopy_advised();
+
+assert(block);
+
+if (!qemu_ram_is_migratable(block)) {
+error_report("block %s should not be migrated !", block->idstr);
+return -EINVAL;
+}
+
+if (length != block->used_length) {
+Error *local_err = NULL;
+
+ret = qemu_ram_resize(block, length, &local_err);
+if (local_err) {
+error_report_err(local_err);
+}
+}
+/* For postcopy we need to check hugepage sizes match */
+if (postcopy_advised && migrate_postcopy_ram() &&
+block->page_size != qemu_host_page_size) {
+uint64_t remote_page_size = qemu_get_be64(f);
+if (remote_page_size != block->page_size) {
+error_report("Mismatched RAM page size %s "
+ "(local) %zd != %" PRId64, block->idstr,
+ block->page_size, remote_page_size);
+ret = -EINVAL;
+}
+}
+if (migrate_ignore_shared()) {
+hwaddr addr = qemu_get_be64(f);
+if (migrate_ram_is_ignored(block) &&
+block->mr->addr != addr) {
+error_report("Mismatched GPAs for block %s "
+ "%" PRId64 "!= %" PRId64, block->idstr,
+ (uint64_t)addr, (uint64_t)block->mr->addr);
+ret = -EINVAL;
+}
+}
+ram_control_load_hook(f, RAM_CONTROL_BLOCK_REG, block->idstr);
+
+return ret;
+}
+
+static int parse_ramblocks(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+int ret = 0;
+
+/* Synchronize RAM block list */
+while (!ret && total_ram_bytes) {
+RAMBlock *block;
+char id[256];
+ram_addr_t length;
+int len = qemu_get_byte(f);
+
+qemu_get_buffer(f, (uint8_t *)id, len);
+id[len] = 0;
+length = qemu_get_be64(f);
+
+block = qemu_ram_block_by_name(id);
+if (block) {
+ret = parse_ramblock(f, block, length);
+} else {
+error_report("Unknown ramblock \"%s\", cannot accept "
+ "migration", id);
+ret = -EINVAL;
+}
+total_ram_bytes -= length;
+}
+
+return ret;
+}
+
 /**
  * ram_load_precopy: load pages in precopy case
  *
@@ -3840,14 +3916,13 @@ static int ram_load_precopy(QEMUFile *f)
 {
 MigrationIncomingState *mis = migration_incoming_get_current();
 int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0;
-/* ADVISE is earlier, it shows the source has the postcopy capability on */
-bool postcopy_advised = migration_incoming_postcopy_advised();
+
 if (!migrate_compress()) {
 invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE;
 }
 
 while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
-ram_addr_t addr, total_ram_bytes;
+ram_addr_t addr;
 void *host = NULL, *host_bak = NULL;
 uint8_t ch;
 
@@ -3918,65 +3993,7 @@ static int ram_load_precopy(QEMUFile *f)
 
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
-/* Synchronize RAM block list */
-total_ram_bytes = addr;
-while (!ret && total_ram_bytes) {
-RAMBlock *block;
-char id[256];
-ram_addr_t length;
-
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)id, len);
-id[len] = 0;
-length = qemu_get_be64(f);
-
-block = qemu_ram_block_by_name(id);
-if (block && !qemu_ram_is_migratable(block)) {
-error_report("block %s should not be migrated !", id);
-ret = -EINVAL;
-} else if (block) {
-if (length != block->used_length) {
-Error *local_err = NULL;
-
-ret = qemu_ram_resize(block, length,
-  &local_err);
-   

Re: [PATCH] migration/ram: Refactor precopy ram loading code

2023-07-25 Thread Philippe Mathieu-Daudé

On 25/7/23 15:26, Fabiano Rosas wrote:

From: Nikolay Borisov 

Extract the ramblock parsing code into a routine that operates on the
sequence of headers from the stream and another that parses the
individual ramblock. This makes ram_load_precopy() easier to
comprehend.

Signed-off-by: Nikolay Borisov 
Signed-off-by: Fabiano Rosas 
---
I'm extracting the parts from the fixed-ram migration series [1] that
could already go in. This patch is one of them.

1- https://lore.kernel.org/r/20230330180336.2791-1-faro...@suse.de
---
  migration/ram.c | 141 +++-
  1 file changed, 79 insertions(+), 62 deletions(-)


I'd rather 1 patch extracting parse_ramblock() then another
one for parse_ramblocks(), anyhow:

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 05/12] virtio-sound: prepare PCM streams

2023-07-25 Thread Marc-André Lureau
Hi

On Thu, Jul 20, 2023 at 4:59 PM Emmanouil Pitsidianakis <
manos.pitsidiana...@linaro.org> wrote:

> After setting PCM parameters, instantiate ("prepare") each stream in
> virtio_snd_pcm_prepare_impl().
>
> Signed-off-by: Emmanouil Pitsidianakis 
> ---
>  hw/virtio/virtio-snd.c | 133 +
>  1 file changed, 133 insertions(+)
>
> diff --git a/hw/virtio/virtio-snd.c b/hw/virtio/virtio-snd.c
> index 62d9bf557c..ca09c937c7 100644
> --- a/hw/virtio/virtio-snd.c
> +++ b/hw/virtio/virtio-snd.c
> @@ -200,6 +200,132 @@ uint32_t virtio_snd_pcm_set_params_impl(VirtIOSound
> *s,
>  return VIRTIO_SND_S_OK;
>  }
>
> +/*
> + * Get a QEMU Audiosystem compatible format value from a
> VIRTIO_SND_PCM_FMT_*
> + */
> +static AudioFormat virtio_snd_get_qemu_format(uint32_t format)
> +{
> +#define CASE(FMT)   \
> +case VIRTIO_SND_PCM_FMT_##FMT:  \
> +return AUDIO_FORMAT_##FMT;
> +
> +switch (format) {
> +CASE(U8)
> +CASE(S8)
> +CASE(U16)
> +CASE(S16)
> +CASE(U32)
> +CASE(S32)
> +case VIRTIO_SND_PCM_FMT_FLOAT:
> +return AUDIO_FORMAT_F32;
> +default:
> +g_assert_not_reached();
> +}
> +
> +#undef CASE
> +}
> +
> +/*
> + * Get a QEMU Audiosystem compatible frequency value from a
> + * VIRTIO_SND_PCM_RATE_*
> + */
> +static uint32_t virtio_snd_get_qemu_freq(uint32_t rate)
> +{
> +#define CASE(RATE)   \
> +case VIRTIO_SND_PCM_RATE_##RATE: \
> +return RATE;
> +
> +switch (rate) {
> +CASE(5512)
> +CASE(8000)
> +CASE(11025)
> +CASE(16000)
> +CASE(22050)
> +CASE(32000)
> +CASE(44100)
> +CASE(48000)
> +CASE(64000)
> +CASE(88200)
> +CASE(96000)
> +CASE(176400)
> +CASE(192000)
> +CASE(384000)
> +default:
> +g_assert_not_reached();
> +}
> +
> +#undef CASE
> +}
> +
> +/*
> + * Get QEMU Audiosystem compatible audsettings from virtio based pcm
> stream
> + * params.
> + */
> +static void virtio_snd_get_qemu_audsettings(audsettings *as,
> +VirtIOSoundPCMParams *params)
> +{
> +as->nchannels = MIN(AUDIO_MAX_CHANNELS, params->channels);
> +as->fmt = virtio_snd_get_qemu_format(params->format);
> +as->freq = virtio_snd_get_qemu_freq(params->rate);
> +as->endianness = AUDIO_HOST_ENDIANNESS;
> +}
> +
> +/*
> + * Prepares a VirtIOSound card stream.
> + * Returns the response status code. (VIRTIO_SND_S_*).
> + *
> + * @s: VirtIOSound device
> + * @stream_id: stream id
> + */
> +static uint32_t virtio_snd_pcm_prepare_impl(VirtIOSound *s, uint32_t
> stream_id)
> +{
> +audsettings as;
> +VirtIOSoundPCMParams *params;
> +VirtIOSoundPCMStream *stream;
> +
> +if (!s->pcm->streams ||
> +!s->pcm->pcm_params ||
> +!s->pcm->pcm_params[stream_id]) {
> +return VIRTIO_SND_S_BAD_MSG;
> +}
> +
> +params = virtio_snd_pcm_get_params(s, stream_id);
> +if (!params) {
> +return VIRTIO_SND_S_BAD_MSG;
> +}
> +
> +virtio_snd_get_qemu_audsettings(&as, params);
> +
> +stream = g_new0(VirtIOSoundPCMStream, 1);
> +
> +stream->id = stream_id;
> +stream->pcm = s->pcm;
> +stream->direction = stream_id < s->snd_conf.streams / 2 +
> +(s->snd_conf.streams & 1) ? VIRTIO_SND_D_OUTPUT :
> VIRTIO_SND_D_INPUT;
> +stream->info.hdr.hda_fn_nid = VIRTIO_SOUND_HDA_FN_NID;
> +stream->features = 0;
> +stream->channels_min = 1;
> +stream->channels_max = as.nchannels;
> +stream->formats = supported_formats;
> +stream->rates = supported_rates;
> +stream->s = s;
> +
> +stream->buffer_bytes = params->buffer_bytes;
> +stream->period_bytes = params->period_bytes;
> +
> +stream->positions[0] = VIRTIO_SND_CHMAP_FL;
> +stream->positions[1] = VIRTIO_SND_CHMAP_FR;
> +
> +stream->as = as;
> +stream->desired_as = stream->as;
> +qemu_mutex_init(&stream->queue_mutex);
> +QSIMPLEQ_INIT(&stream->queue);
> +
> +s->pcm->streams[stream_id] = stream;
>

Shouldn't it close & free the existing stream? Or return an error?


> +
> +return VIRTIO_SND_S_OK;
> +}
> +
>  /*
>   * The actual processing done in virtio_snd_process_cmdq().
>   *
> @@ -432,6 +558,13 @@ static void virtio_snd_common_realize(DeviceState
> *dev,
> print_code(status));
>  return;
>  }
> +status = virtio_snd_pcm_prepare_impl(vsnd, i);
> +if (status != VIRTIO_SND_S_OK) {
> +error_setg(errp,
> +   "Can't prepare streams, device responded with %s.",
> +   print_code(status));
> +return;
> +}
>  }
>  }
>
> --
> 2.39.2
>
>
>

-- 
Marc-André Lureau


Re: [PATCH v21 05/20] s390x/cpu topology: resetting the Topology-Change-Report

2023-07-25 Thread Nina Schoetterl-Glausch
On Fri, 2023-06-30 at 11:17 +0200, Pierre Morel wrote:
> During a subsystem reset the Topology-Change-Report is cleared
> by the machine.
> Let's ask KVM to clear the Modified Topology Change Report (MTCR)
> bit of the SCA in the case of a subsystem reset.
> 
> Signed-off-by: Pierre Morel 
> Reviewed-by: Thomas Huth 
Reviewed-by: Nina Schoetterl-Glausch 



[Stable-8.0.4 04/31] linux-user: Fix accept4(SOCK_NONBLOCK) syscall

2023-07-25 Thread Michael Tokarev
From: Helge Deller 

The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.

Change the qemu implementation accordingly, which means we cannot use
the fcntl_flags_tbl[] translation table, as it allows too many other
values.

Besides the correction in behaviour, this actually fixes the accept4()
emulation for hppa, mips and alpha targets for which SOCK_NONBLOCK is
different than TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).

The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
(cherry picked from commit dca4c8384d68bbf5d67f50a5446865d92d61f032)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 333e6b7026..0901884495 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3440,7 +3440,17 @@ static abi_long do_accept4(int fd, abi_ulong target_addr,
 abi_long ret;
 int host_flags;
 
-host_flags = target_to_host_bitmask(flags, fcntl_flags_tbl);
+if (flags & ~(TARGET_SOCK_CLOEXEC | TARGET_SOCK_NONBLOCK)) {
+return -TARGET_EINVAL;
+}
+
+host_flags = 0;
+if (flags & TARGET_SOCK_NONBLOCK) {
+host_flags |= SOCK_NONBLOCK;
+}
+if (flags & TARGET_SOCK_CLOEXEC) {
+host_flags |= SOCK_CLOEXEC;
+}
 
 if (target_addr == 0) {
 return get_errno(safe_accept4(fd, NULL, NULL, host_flags));
-- 
2.39.2




[Stable-8.0.4 01/31] virtio-net: correctly report maximum tx_queue_size value

2023-07-25 Thread Michael Tokarev
From: Laurent Vivier 

The maximum value for tx_queue_size depends on the backend type:
1024 for vDPA/vhost-user, 256 for all the others.

The value is returned by virtio_net_max_tx_queue_size() to set the
parameter:

n->net_conf.tx_queue_size = MIN(virtio_net_max_tx_queue_size(n),
n->net_conf.tx_queue_size);
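
For illustration, a standalone sketch of such a backend-dependent cap (the
enum, the function name and the literal values are stand-ins chosen for this
example; 256 and 1024 mirror VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE and
VIRTQUEUE_MAX_SIZE, but this is not the QEMU helper itself):

    #include <stdio.h>

    enum backend_type { BACKEND_TAP, BACKEND_VHOST_USER, BACKEND_VHOST_VDPA };

    /* Only vhost-user and vDPA backends can drive a 1024-entry TX queue. */
    static int max_tx_queue_size(enum backend_type t)
    {
        switch (t) {
        case BACKEND_VHOST_USER:
        case BACKEND_VHOST_VDPA:
            return 1024;
        default:
            return 256;
        }
    }

    int main(void)
    {
        printf("tap: %d, vhost-user: %d\n",
               max_tx_queue_size(BACKEND_TAP),
               max_tx_queue_size(BACKEND_VHOST_USER));
        return 0;
    }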

But the parameter checking uses VIRTQUEUE_MAX_SIZE (1024).

So the parameter is silently ignored and ethtool reports a different
value than the one provided by the user.

   ... -netdev tap,... -device virtio-net,tx_queue_size=1024

# ethtool -g enp0s2
Ring parameters for enp0s2:
Pre-set maximums:
RX: 256
RX Mini:n/a
RX Jumbo:   n/a
TX: 256
Current hardware settings:
RX: 256
RX Mini:n/a
RX Jumbo:   n/a
TX: 256

   ... -netdev vhost-user,... -device virtio-net,tx_queue_size=2048

Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024

With this patch the correct maximum value is checked and displayed.

For vDPA/vhost-user:

Invalid tx_queue_size (= 2048), must be a power of 2 between 256 and 1024

For all the others:

Invalid tx_queue_size (= 512), must be a power of 2 between 256 and 256

Fixes: 2eef278b9e63 ("virtio-net: fix tx queue size for !vhost-user")
Cc: m...@redhat.com
Cc: qemu-sta...@nongnu.org
Signed-off-by: Laurent Vivier 
Signed-off-by: Jason Wang 
(cherry picked from commit 4271f4038372f174dbafffacca1a748d058a03ba)
Signed-off-by: Michael Tokarev 

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index af1e89706c..5c0a771170 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3629,12 +3629,12 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 if (n->net_conf.tx_queue_size < VIRTIO_NET_TX_QUEUE_MIN_SIZE ||
-n->net_conf.tx_queue_size > VIRTQUEUE_MAX_SIZE ||
+n->net_conf.tx_queue_size > virtio_net_max_tx_queue_size(n) ||
 !is_power_of_2(n->net_conf.tx_queue_size)) {
 error_setg(errp, "Invalid tx_queue_size (= %" PRIu16 "), "
"must be a power of 2 between %d and %d",
n->net_conf.tx_queue_size, VIRTIO_NET_TX_QUEUE_MIN_SIZE,
-   VIRTQUEUE_MAX_SIZE);
+   virtio_net_max_tx_queue_size(n));
 virtio_cleanup(vdev);
 return;
 }
-- 
2.39.2




[Stable-8.0.4 02/31] qemu_cleanup: begin drained section after vm_shutdown()

2023-07-25 Thread Michael Tokarev
From: Fiona Ebner 

in order to avoid requests being stuck in a BlockBackend's request
queue during cleanup. Having such requests can lead to a deadlock [0]
with a virtio-scsi-pci device using iothread that's busy with IO when
initiating a shutdown with QMP 'quit'.

There is a race where such a queued request can continue sometime
(maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1].
The completion will hold the AioContext lock and wait for the BQL
during SCSI completion, but the main thread will hold the BQL and
wait for the AioContext as part of bdrv_root_unref_child(), leading to
the deadlock [0].

[0]:

> Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"):
> #0  __lll_lock_wait (futex=futex@entry=0x564183365f00 , 
> private=0) at lowlevellock.c:52
> #1  0x7f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 
> ) at ../nptl/pthread_mutex_lock.c:80
> #2  0x564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 
> , file=0x564182b7f774 "../softmmu/physmem.c", line=2593) 
> at ../util/qemu-thread-posix.c:94
> #3  0x56418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 
> "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504
> #4  0x5641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at 
> ../softmmu/physmem.c:2593
> #5  0x5641826d6fe7 in address_space_stl_internal (as=0x56418679b310, 
> addr=4276113408, val=16418, attrs=..., result=0x0, 
> endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318
> #6  0x5641826d7154 in address_space_stl_le (as=0x56418679b310, 
> addr=4276113408, val=16418, attrs=..., result=0x0) at 
> /home/febner/repos/qemu/memory_ldst.c.inc:357
> #7  0x564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at 
> ../hw/pci/pci.c:359
> #8  0x56418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at 
> ../hw/pci/msi.c:379
> #9  0x564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at 
> ../hw/pci/msix.c:542
> #10 0x56418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at 
> ../hw/virtio/virtio-pci.c:77
> #11 0x5641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, 
> vector=8) at ../hw/virtio/virtio.c:1985
> #12 0x5641826948d6 in virtio_irq (vq=0x5641867ac078) at 
> ../hw/virtio/virtio.c:2461
> #13 0x564182694978 in virtio_notify (vdev=0x5641867a34a0, 
> vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473
> #14 0x564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at 
> ../hw/scsi/virtio-scsi.c:115
> #15 0x5641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) 
> at ../hw/scsi/virtio-scsi.c:641
> #16 0x56418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, 
> resid=0) at ../hw/scsi/virtio-scsi.c:712
> #17 0x56418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at 
> ../hw/scsi/scsi-bus.c:1526
> #18 0x56418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, 
> acct_failed=false) at ../hw/scsi/scsi-disk.c:242
> #19 0x56418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, 
> ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265
> #20 0x56418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) 
> at ../hw/scsi/scsi-disk.c:340
> #21 0x56418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) 
> at ../hw/scsi/scsi-disk.c:371
> #22 0x5641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at 
> ../softmmu/dma-helpers.c:107
> #23 0x564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at 
> ../softmmu/dma-helpers.c:127
> #24 0x5641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at 
> ../block/block-backend.c:1563
> #25 0x5641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at 
> ../block/block-backend.c:1630
> #26 0x56418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at 
> ../util/coroutine-ucontext.c:177
> #27 0x7f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #28 0x7f3bbd8757f0 in ?? ()
> #29 0x in ?? ()
>
> Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"):
> #0  __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at 
> lowlevellock.c:52
> #1  0x7f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at 
> ../nptl/pthread_mutex_lock.c:115
> #2  0x564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, 
> file=0x564182c0e319 "../util/async.c", line=728) at 
> ../util/qemu-thread-posix.c:94
> #3  0x56418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, 
> file=0x564182c0e319 "../util/async.c", line=728) at 
> ../util/qemu-thread-posix.c:149
> #4  0x5641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at 
> ../util/async.c:728
> #5  0x56418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) 
> at ../block.c:7493
> #6  0x56418294e288 in tran_commit (tran=0x56418630bfe0) at 
> ../util/transactions.c:87
> #7  0x56418279d880 in bdrv_try_change_aio_co

[Stable-8.0.4 10/31] target/s390x: Fix LRA when DAT is off

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

LRA should perform DAT regardless of whether it's on or off.
Disable DAT check for MMU_S390_LRA.

Fixes: defb0e3157af ("s390x: Implement opcode helpers")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: David Hildenbrand 
Cc: qemu-sta...@nongnu.org
Message-Id: <20230704081506.276055-7-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit b0ef81062d2404ccef0289b1cc6e70244901c9be)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
index b04b57c235..fbb2f1b4d4 100644
--- a/target/s390x/mmu_helper.c
+++ b/target/s390x/mmu_helper.c
@@ -417,7 +417,7 @@ int mmu_translate(CPUS390XState *env, target_ulong vaddr, 
int rw, uint64_t asc,
 
 vaddr &= TARGET_PAGE_MASK;
 
-if (!(env->psw.mask & PSW_MASK_DAT)) {
+if (rw != MMU_S390_LRA && !(env->psw.mask & PSW_MASK_DAT)) {
 *raddr = vaddr;
 goto nodat;
 }
-- 
2.39.2




[Stable-8.0.4 06/31] target/s390x: Fix EPSW CC reporting

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

EPSW should explicitly calculate and insert CC, like IPM does.

Fixes: e30a9d3fea58 ("target-s390: Implement EPSW")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: David Hildenbrand 
Cc: qemu-sta...@nongnu.org
Message-Id: <20230704081506.276055-3-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 110b1bac2ecd94a78a1d38003e24e37367bf074e)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 82900f53f4..0c22d2f17f 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2393,10 +2393,14 @@ static DisasJumpType op_epsw(DisasContext *s, DisasOps 
*o)
 int r1 = get_field(s, r1);
 int r2 = get_field(s, r2);
 TCGv_i64 t = tcg_temp_new_i64();
+TCGv_i64 t_cc = tcg_temp_new_i64();
 
 /* Note the "subsequently" in the PoO, which implies a defined result
if r1 == r2.  Thus we cannot defer these writes to an output hook.  */
+gen_op_calc_cc(s);
+tcg_gen_extu_i32_i64(t_cc, cc_op);
 tcg_gen_shri_i64(t, psw_mask, 32);
+tcg_gen_deposit_i64(t, t, t_cc, 12, 2);
 store_reg32_i64(r1, t);
 if (r2 != 0) {
 store_reg32_i64(r2, psw_mask);
-- 
2.39.2




[Stable-8.0.4 17/31] linux-user: Make sure initial brk(0) is page-aligned

2023-07-25 Thread Michael Tokarev
From: Andreas Schwab 

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab 
Message-Id: 
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
(cherry picked from commit d28b3c90cfad1a7e211ae2bce36ecb9071086129)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 150d70633e..57aaa87e30 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -806,7 +806,7 @@ static abi_ulong brk_page;
 
 void target_set_brk(abi_ulong new_brk)
 {
-target_brk = new_brk;
+target_brk = TARGET_PAGE_ALIGN(new_brk);
 brk_page = HOST_PAGE_ALIGN(target_brk);
 }
 
-- 
2.39.2




[Stable-8.0.4 22/31] qemu-nbd: pass structure into nbd_client_thread instead of plain char*

2023-07-25 Thread Michael Tokarev
From: "Denis V. Lunev" 

We are going to pass an additional flag in the next patch.

Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Vladimir Sementsov-Ogievskiy 
CC: 
Message-ID: <20230717145544.194786-2-...@openvz.org>
Reviewed-by: Eric Blake 
Signed-off-by: Eric Blake 
(cherry picked from commit 03b67621445d601c9cdc7dfe25812e9f19b81488)
Signed-off-by: Michael Tokarev 

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 6ff45308a9..87c46bb0e5 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -272,9 +272,13 @@ static void *show_parts(void *arg)
 return NULL;
 }
 
+struct NbdClientOpts {
+char *device;
+};
+
 static void *nbd_client_thread(void *arg)
 {
-char *device = arg;
+struct NbdClientOpts *opts = arg;
 NBDExportInfo info = { .request_sizes = false, .name = g_strdup("") };
 QIOChannelSocket *sioc;
 int fd = -1;
@@ -298,10 +302,10 @@ static void *nbd_client_thread(void *arg)
 goto out;
 }
 
-fd = open(device, O_RDWR);
+fd = open(opts->device, O_RDWR);
 if (fd < 0) {
 /* Linux-only, we can use %m in printf.  */
-error_report("Failed to open %s: %m", device);
+error_report("Failed to open %s: %m", opts->device);
 goto out;
 }
 
@@ -311,11 +315,11 @@ static void *nbd_client_thread(void *arg)
 }
 
 /* update partition table */
-pthread_create(&show_parts_thread, NULL, show_parts, device);
+pthread_create(&show_parts_thread, NULL, show_parts, opts->device);
 
 if (verbose) {
 fprintf(stderr, "NBD device %s is now connected to %s\n",
-device, srcpath);
+opts->device, srcpath);
 } else {
 /* Close stderr so that the qemu-nbd process exits.  */
 dup2(STDOUT_FILENO, STDERR_FILENO);
@@ -1121,8 +1125,11 @@ int main(int argc, char **argv)
 if (device) {
 #if HAVE_NBD_DEVICE
 int ret;
+struct NbdClientOpts opts = {
+.device = device,
+};
 
-ret = pthread_create(&client_thread, NULL, nbd_client_thread, device);
+ret = pthread_create(&client_thread, NULL, nbd_client_thread, &opts);
 if (ret != 0) {
 error_report("Failed to create client thread: %s", strerror(ret));
 exit(EXIT_FAILURE);
-- 
2.39.2




[Stable-8.0.4 15/31] linux-user/arm: Do not allocate a commpage at all for M-profile CPUs

2023-07-25 Thread Michael Tokarev
From: Philippe Mathieu-Daudé 

Since commit fbd3c4cff6 ("linux-user/arm: Mark the commpage
executable") executing bare-metal (linked with rdimon.specs)
cortex-M code fails as:

  $ qemu-arm -cpu cortex-m3 ~/hello.exe.m3
  qemu-arm: ../../accel/tcg/user-exec.c:492: page_set_flags: Assertion `last <= 
GUEST_ADDR_MAX' failed.
  Aborted (core dumped)

Commit 4f5c67f8df ("linux-user/arm: Take more care allocating
commpage") already took care of not allocating a commpage for
M-profile CPUs, however it had to be reverted as commit 6cda41daa2.

Re-introduce the M-profile fix from commit 4f5c67f8df.

Fixes: fbd3c4cff6 ("linux-user/arm: Mark the commpage executable")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1755
Reported-by: Christophe Lyon 
Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230711153408.68389-1-phi...@linaro.org>
Signed-off-by: Richard Henderson 
(cherry picked from commit d713cf4d6c71076513a10528303b3e337b4d5998)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f1370a7a8b..88ef26dc03 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -423,10 +423,23 @@ enum {
 
 static bool init_guest_commpage(void)
 {
-abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
-void *want = g2h_untagged(commpage);
-void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
-  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ARMCPU *cpu = ARM_CPU(thread_cpu);
+abi_ptr commpage;
+void *want;
+void *addr;
+
+/*
+ * M-profile allocates maximum of 2GB address space, so can never
+ * allocate the commpage.  Skip it.
+ */
+if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+return true;
+}
+
+commpage = HI_COMMPAGE & -qemu_host_page_size;
+want = g2h_untagged(commpage);
+addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
+MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
 
 if (addr == MAP_FAILED) {
 perror("Allocating guest commpage");
-- 
2.39.2




[Stable-8.0.4 08/31] target/s390x: Fix MVCRL with a large value in R0

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

Using a large R0 causes an assertion error:

qemu-s390x: target/s390x/tcg/mem_helper.c:183: access_prepare_nf: Assertion 
`size > 0 && size <= 4096' failed.

Even though PoP explicitly advises against using more than 8 bits for the
size, an emulator crash is never a good thing.

Fix by truncating the size to 8 bits.

Fixes: ea0a1053e276 ("s390x/tcg: Implement Miscellaneous-Instruction-Extensions 
Facility 3 for the s390x")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: David Hildenbrand 
Cc: qemu-sta...@nongnu.org
Message-Id: <20230704081506.276055-5-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 92a57534619a4058544ce8f9c0beae3e054f342b)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 8b58b8d88d..7141d0ad88 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -514,6 +514,7 @@ void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, uint64_t 
dest, uint64_t src)
 int32_t i;
 
 /* MVCRL always copies one more byte than specified - maximum is 256 */
+l &= 0xff;
 l++;
 
 access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
-- 
2.39.2




[Stable-8.0.4 23/31] qemu-nbd: fix regression with qemu-nbd --fork run over ssh

2023-07-25 Thread Michael Tokarev
From: "Denis V. Lunev" 

Commit e6df58a5578fee7a50bbf36f4a50a2781cff855d
Author: Hanna Reitz 
Date:   Wed May 8 23:18:18 2019 +0200
qemu-nbd: Do not close stderr

has introduced an interesting regression. Original behavior of
ssh somehost qemu-nbd /home/den/tmp/file -f raw --fork
was the following:
 * qemu-nbd was started as a daemon
 * the command execution is done and ssh exited with success

The patch has changed this behavior and the 'ssh' command now hangs forever.

According to the normal specification of the daemon() call, we should
end up with STDERR pointing to /dev/null. That should be done at the
very end of the successful startup sequence when the pipe to the
bootstrap process (used for diagnostics) is no longer needed.

This could be achieved in the same way as done for the 'qemu-nbd -c' case.
That was commit 0eaf453e, also fixing up e6df58a5. STDOUT copying to
STDERR does the trick.

This also leads to proper 'ssh' connection closing which fixes my
original problem.
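
For illustration, a minimal sketch of the pattern the fix relies on, under the
assumption that qemu_daemon(1, 0) has already pointed stdout at /dev/null
while stderr still holds the bootstrap pipe that ssh waits on (this is not the
qemu-nbd code itself):

    #include <unistd.h>

    static void detach_stderr_from_parent_pipe(void)
    {
        /* dup2() closes the old stderr (the pipe end ssh is waiting on) and
         * makes stderr an alias of stdout, which already points at /dev/null
         * after daemonizing -- the last writer of the pipe goes away and the
         * remote ssh command can terminate. */
        dup2(STDOUT_FILENO, STDERR_FILENO);
    }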

Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Vladimir Sementsov-Ogievskiy 
CC: Hanna Reitz 
CC: 
Message-ID: <20230717145544.194786-3-...@openvz.org>
Reviewed-by: Eric Blake 
Signed-off-by: Eric Blake 
(cherry picked from commit 5c56dd27a2c905c9cf2472d2fd057621ce5fd00d)
Signed-off-by: Michael Tokarev 

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 87c46bb0e5..e64f45f767 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -274,6 +274,7 @@ static void *show_parts(void *arg)
 
 struct NbdClientOpts {
 char *device;
+bool fork_process;
 };
 
 static void *nbd_client_thread(void *arg)
@@ -317,7 +318,7 @@ static void *nbd_client_thread(void *arg)
 /* update partition table */
 pthread_create(&show_parts_thread, NULL, show_parts, opts->device);
 
-if (verbose) {
+if (verbose && !opts->fork_process) {
 fprintf(stderr, "NBD device %s is now connected to %s\n",
 opts->device, srcpath);
 } else {
@@ -579,7 +580,6 @@ int main(int argc, char **argv)
 bool writethrough = false; /* Client will flush as needed. */
 bool fork_process = false;
 bool list = false;
-int old_stderr = -1;
 unsigned socket_activation;
 const char *pid_file_name = NULL;
 const char *selinux_label = NULL;
@@ -934,11 +934,6 @@ int main(int argc, char **argv)
 } else if (pid == 0) {
 close(stderr_fd[0]);
 
-/* Remember parent's stderr if we will be restoring it. */
-if (fork_process) {
-old_stderr = dup(STDERR_FILENO);
-}
-
 ret = qemu_daemon(1, 0);
 
 /* Temporarily redirect stderr to the parent's pipe...  */
@@ -1127,6 +1122,7 @@ int main(int argc, char **argv)
 int ret;
 struct NbdClientOpts opts = {
 .device = device,
+.fork_process = fork_process,
 };
 
 ret = pthread_create(&client_thread, NULL, nbd_client_thread, &opts);
@@ -1155,8 +1151,7 @@ int main(int argc, char **argv)
 }
 
 if (fork_process) {
-dup2(old_stderr, STDERR_FILENO);
-close(old_stderr);
+dup2(STDOUT_FILENO, STDERR_FILENO);
 }
 
 state = RUNNING;
-- 
2.39.2




[Stable-8.0.4 07/31] target/s390x: Fix MDEB and MDEBR

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

These instructions multiply 32 bits by 32 bits, not 32 bits by 64 bits.

Fixes: 83b00736f3d8 ("target-s390: Convert FP MULTIPLY")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: David Hildenbrand 
Cc: qemu-sta...@nongnu.org
Message-Id: <20230704081506.276055-4-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit fed9a4fe0ce0ec917a6b3a2da0a7ecd3cb9eba56)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index 57e5829283..4b7fa58af3 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -306,8 +306,9 @@ uint64_t HELPER(mdb)(CPUS390XState *env, uint64_t f1, 
uint64_t f2)
 /* 64/32-bit FP multiplication */
 uint64_t HELPER(mdeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 {
+float64 f1_64 = float32_to_float64(f1, &env->fpu_status);
 float64 ret = float32_to_float64(f2, &env->fpu_status);
-ret = float64_mul(f1, ret, &env->fpu_status);
+ret = float64_mul(f1_64, ret, &env->fpu_status);
 handle_exceptions(env, false, GETPC());
 return ret;
 }
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index a586cc515b..295eb07173 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -667,11 +667,11 @@
 F(0xb317, MEEBR,   RRE,   Z,   e1, e2, new, e1, meeb, 0, IF_BFP)
 F(0xb31c, MDBR,RRE,   Z,   f1, f2, new, f1, mdb, 0, IF_BFP)
 F(0xb34c, MXBR,RRE,   Z,   x1, x2, new_x, x1, mxb, 0, IF_BFP)
-F(0xb30c, MDEBR,   RRE,   Z,   f1, e2, new, f1, mdeb, 0, IF_BFP)
+F(0xb30c, MDEBR,   RRE,   Z,   e1, e2, new, f1, mdeb, 0, IF_BFP)
 F(0xb307, MXDBR,   RRE,   Z,   f1, f2, new_x, x1, mxdb, 0, IF_BFP)
 F(0xed17, MEEB,RXE,   Z,   e1, m2_32u, new, e1, meeb, 0, IF_BFP)
 F(0xed1c, MDB, RXE,   Z,   f1, m2_64, new, f1, mdb, 0, IF_BFP)
-F(0xed0c, MDEB,RXE,   Z,   f1, m2_32u, new, f1, mdeb, 0, IF_BFP)
+F(0xed0c, MDEB,RXE,   Z,   e1, m2_32u, new, f1, mdeb, 0, IF_BFP)
 F(0xed07, MXDB,RXE,   Z,   f1, m2_64, new_x, x1, mxdb, 0, IF_BFP)
 /* MULTIPLY HALFWORD */
 C(0x4c00, MH,  RX_a,  Z,   r1_o, m2_16s, new, r1_32, mul, 0)
-- 
2.39.2




[Stable-8.0.4 29/31] target/s390x: Fix ICM with M3=0

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

When the mask is zero, access exceptions should still be recognized for
1 byte at the second-operand address. CC should be set to 0.

Cc: qemu-sta...@nongnu.org
Fixes: e023e832d0ac ("s390x: translate engine for s390x CPU")
Reviewed-by: David Hildenbrand 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230724082032.66864-5-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit a2025557ed4d8d5e6a4d0dd681717c390f51f5be)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index f005539861..9e9fa3cef0 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2525,6 +2525,12 @@ static DisasJumpType op_icm(DisasContext *s, DisasOps *o)
 ccm = ((1ull << len) - 1) << pos;
 break;
 
+case 0:
+/* Recognize access exceptions for the first byte.  */
+tcg_gen_qemu_ld_i64(tmp, o->in2, get_mem_index(s), MO_UB);
+gen_op_movi_cc(s, 0);
+return DISAS_NEXT;
+
 default:
 /* This is going to be a sequence of loads and inserts.  */
 pos = base + 32 - 8;
-- 
2.39.2




[Stable-8.0.4 00/31] Patch Round-up for stable 8.0.4, freeze on 2023-08-05

2023-07-25 Thread Michael Tokarev
The following patches are queued for QEMU stable v8.0.4:

  https://gitlab.com/qemu-project/qemu/-/commits/staging-8.0

Patch freeze is 2023-08-05, and the release is planned for 2023-08-07:

  https://wiki.qemu.org/Planning/8.0

Please respond here or CC qemu-sta...@nongnu.org on any additional patches
you think should (or shouldn't) be included in the release.

The changes which are staging for inclusion, with the original commit hash
from master branch, are given below the bottom line.

Thanks!

/mjt

--
01 4271f4038372 Laurent Vivier:
   virtio-net: correctly report maximum tx_queue_size value
02 ca2a5e630dc1 Fiona Ebner:
   qemu_cleanup: begin drained section after vm_shutdown()
03 2ad2e113deb5 Nicholas Piggin:
   hw/ppc: Fix clock update drift
04 dca4c8384d68 Helge Deller:
   linux-user: Fix accept4(SOCK_NONBLOCK) syscall
05 8af87a3ec7e4 Avihai Horon:
   vfio: Fix null pointer dereference bug in vfio_bars_finalize()
06 110b1bac2ecd Ilya Leoshkevich:
   target/s390x: Fix EPSW CC reporting
07 fed9a4fe0ce0 Ilya Leoshkevich:
   target/s390x: Fix MDEB and MDEBR
08 92a57534619a Ilya Leoshkevich:
   target/s390x: Fix MVCRL with a large value in R0
09 6da311a60d58 Ilya Leoshkevich:
   target/s390x: Fix LRA overwriting the top 32 bits on DAT error
10 b0ef81062d24 Ilya Leoshkevich:
   target/s390x: Fix LRA when DAT is off
11 baf21eebc3e1 Marcin Nowakowski:
   target/mips: enable GINVx support for I6400 and I6500
12 230dfd9257e9 Olaf Hering:
   hw/ide/piix: properly initialize the BMIBA register
13 7a8d9f3a0e88 Pierrick Bouvier:
   linux-user/syscall: Implement execve without execveat
14 e18ed26ce785 Richard Henderson:
   tcg: Fix info_in_idx increment in layout_arg_by_ref
15 d713cf4d6c71 Philippe Mathieu-Daudé:
   linux-user/arm: Do not allocate a commpage at all for M-profile CPUs
16 d921fea338c1 Mauro Matteo Cascella:
   ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)
17 d28b3c90cfad Andreas Schwab:
   linux-user: Make sure initial brk(0) is page-aligned
18 ea3c76f1494d Klaus Jensen:
   hw/nvme: fix endianness issue for shadow doorbells
19 15ad98536ad9 Helge Deller:
   linux-user: Fix qemu brk() to not zero bytes on current page
20 dfe49864afb0 Helge Deller:
   linux-user: Prohibit brk() to shrink below initial heap address
21 eac78a4b0b7d Helge Deller:
   linux-user: Fix signed math overflow in brk() syscall
22 03b67621445d Denis V. Lunev:
   qemu-nbd: pass structure into nbd_client_thread instead of plain char*
23 5c56dd27a2c9 Denis V. Lunev:
   qemu-nbd: fix regression with qemu-nbd --fork run over ssh
24 736a1588c104 Jordan Niethe:
   tcg/ppc: Fix race in goto_tb implementation
25 22d2e5351a18 Ilya Leoshkevich:
   tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output
26 761b0aa9381e Ilya Leoshkevich:
   target/s390x: Make CKSM raise an exception if R2 is odd
27 4b6e4c0b8223 Ilya Leoshkevich:
   target/s390x: Fix CLM with M3=0
28 53684e344a27 Ilya Leoshkevich:
   target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs
29 a2025557ed4d Ilya Leoshkevich:
   target/s390x: Fix ICM with M3=0
30 9c028c057adc Ilya Leoshkevich:
   target/s390x: Make MC raise specification exception when class >= 16
31 ff537b0370ab Ilya Leoshkevich:
   target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13



[Stable-8.0.4 24/31] tcg/ppc: Fix race in goto_tb implementation

2023-07-25 Thread Michael Tokarev
From: Jordan Niethe 

Commit 20b6643324 ("tcg/ppc: Reorg goto_tb implementation") modified
goto_tb to ensure only a single instruction was patched to prevent
incorrect behavior if a thread was in the middle of multiple
instructions when they were replaced. However this introduced a race
between loading the jmp target into TCG_REG_TB and patching and
executing the direct branch.

The relevant part of the goto_tb implementation:

ld TCG_REG_TB, TARGET_ADDR_LOCATION(TCG_REG_TB)
  patch_location:
mtctr TCG_REG_TB
bctr

tb_target_set_jmp_target() will replace 'patch_location' with a direct
branch if the target is in range. The direct branch now relies on
TCG_REG_TB being set up correctly by the ld. Prior to this commit
multiple instructions were patched in for the direct branch case; these
instructions would initialize TCG_REG_TB to the same value as the branch
target.

Imagine the following sequence:

1) Thread A is executing the goto_tb sequence and loads the jmp
   target into TCG_REG_TB.

2) Thread B updates the jmp target address and calls
   tb_target_set_jmp_target(). This patches a new direct branch into the
   goto_tb sequence.

3) Thread A executes the newly patched direct branch. The value in
   TCG_REG_TB still contains the old jmp target.

TCG_REG_TB MUST contain the translation block's tc.ptr. Execution will
eventually crash after performing memory accesses generated from a
faulty value in TCG_REG_TB.

This presents as segfaults or illegal instruction exceptions.

Do not revert commit 20b6643324 as it did fix a different race
condition. Instead remove the direct branch optimization and always use
indirect branches.

The direct branch optimization can be re-added later with a race free
sequence.

Fixes: 20b6643324 ("tcg/ppc: Reorg goto_tb implementation")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1726
Reported-by: Anushree Mathur 
Tested-by: Anushree Mathur 
Tested-by: Michael Tokarev 
Reviewed-by: Richard Henderson 
Co-developed-by: Benjamin Gray 
Signed-off-by: Jordan Niethe 
Signed-off-by: Benjamin Gray 
Message-Id: <20230717093001.13167-1-jniet...@gmail.com>
(cherry picked from commit 736a1588c104e9995c1831df33554df1f1def8b8)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 066b49224a..c68bf08e38 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2546,11 +2546,10 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
 ptrdiff_t offset = tcg_tbrel_diff(s, (void *)ptr);
 tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_TB, offset);
 
-/* Direct branch will be patched by tb_target_set_jmp_target. */
+/* TODO: Use direct branches when possible. */
 set_jmp_insn_offset(s, which);
 tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
 
-/* When branch is out of range, fall through to indirect. */
 tcg_out32(s, BCCTR | BO_ALWAYS);
 
 /* For the unlinked case, need to reset TCG_REG_TB.  */
@@ -2578,10 +2577,12 @@ void tb_target_set_jmp_target(const TranslationBlock 
*tb, int n,
 intptr_t diff = addr - jmp_rx;
 tcg_insn_unit insn;
 
+if (USE_REG_TB) {
+return;
+}
+
 if (in_range_b(diff)) {
 insn = B | (diff & 0x3fc);
-} else if (USE_REG_TB) {
-insn = MTSPR | RS(TCG_REG_TB) | CTR;
 } else {
 insn = NOP;
 }
-- 
2.39.2




[Stable-8.0.4 05/31] vfio: Fix null pointer dereference bug in vfio_bars_finalize()

2023-07-25 Thread Michael Tokarev
From: Avihai Horon 

vfio_realize() has the following flow:
1. vfio_bars_prepare() -- sets VFIOBAR->size.
2. msix_early_setup().
3. vfio_bars_register() -- allocates VFIOBAR->mr.

After vfio_bars_prepare() is called, msix_early_setup() can fail. If it
does fail, vfio_bars_register() is never called and VFIOBAR->mr is not
allocated.

In this case, vfio_bars_finalize() is called as part of the error flow
to free the bars' resources. However, vfio_bars_finalize() calls
object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and
thus we get a null pointer dereference.

Fix it by checking VFIOBAR->mr in vfio_bars_finalize().

Fixes: 89d5202edc50 ("vfio/pci: Allow relocating MSI-X MMIO")
Signed-off-by: Avihai Horon 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Alex Williamson 
Signed-off-by: Cédric Le Goater 
(cherry picked from commit 8af87a3ec7e42ff1b9cf75ceee0451c31e34d153)
Signed-off-by: Michael Tokarev 

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4773cc1f2b..53dcb3efaa 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1752,9 +1752,11 @@ static void vfio_bars_finalize(VFIOPCIDevice *vdev)
 
 vfio_bar_quirk_finalize(vdev, i);
 vfio_region_finalize(&bar->region);
-if (bar->size) {
+if (bar->mr) {
+assert(bar->size);
 object_unparent(OBJECT(bar->mr));
 g_free(bar->mr);
+bar->mr = NULL;
 }
 }
 
-- 
2.39.2




[Stable-8.0.4 20/31] linux-user: Prohibit brk() to shrink below initial heap address

2023-07-25 Thread Michael Tokarev
From: Helge Deller 

Since commit 86f04735ac ("linux-user: Fix brk() to release pages") it's
possible for userspace applications to reduce their memory footprint by
calling brk() with a lower address and free up memory. Before that commit
guest heap memory was never unmapped.

But the Linux kernel prohibits reducing brk() below the initial memory
address, which is set at startup by the set_brk() function in binfmt_elf.c.
Such a range check was missed in commit 86f04735ac.

This patch adds the missing check by storing the initial brk value in
initial_target_brk and verify any new brk addresses against that value.

Tested with the i386 upx binary from
https://github.com/upx/upx/releases/download/v4.0.2/upx-4.0.2-i386_linux.tar.xz

Signed-off-by: Helge Deller 
Tested-by: "Markus F.X.J. Oberhumer" 
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Cc: qemu-sta...@nongnu.org
Buglink: https://github.com/upx/upx/issues/683
(cherry picked from commit dfe49864afb06e7e452a4366051697bc4fcfc1a5)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 450487af57..e106633837 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -801,12 +801,13 @@ static inline int host_to_target_sock_type(int host_type)
 return target_type;
 }
 
-static abi_ulong target_brk;
+static abi_ulong target_brk, initial_target_brk;
 static abi_ulong brk_page;
 
 void target_set_brk(abi_ulong new_brk)
 {
 target_brk = TARGET_PAGE_ALIGN(new_brk);
+initial_target_brk = target_brk;
 brk_page = HOST_PAGE_ALIGN(target_brk);
 }
 
@@ -824,6 +825,11 @@ abi_long do_brk(abi_ulong brk_val)
 return target_brk;
 }
 
+/* do not allow to shrink below initial brk value */
+if (brk_val < initial_target_brk) {
+brk_val = initial_target_brk;
+}
+
 new_brk = TARGET_PAGE_ALIGN(brk_val);
 new_host_brk_page = HOST_PAGE_ALIGN(brk_val);
 
-- 
2.39.2




[Stable-8.0.4 28/31] target/s390x: Fix CONVERT TO LOGICAL/FIXED with out-of-range inputs

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

CONVERT TO LOGICAL/FIXED deviate from IEEE 754 in that they raise an
inexact exception on out-of-range inputs. float_flag_invalid_cvti
aligns nicely with that behavior, so convert it to
S390_IEEE_MASK_INEXACT.

Cc: qemu-sta...@nongnu.org
Fixes: defb0e3157af ("s390x: Implement opcode helpers")
Reviewed-by: David Hildenbrand 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230724082032.66864-4-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 53684e344a27da770acc9012740334154ddea24f)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index 4b7fa58af3..3d941ed2d2 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -52,7 +52,8 @@ uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
 s390_exc |= (exc & float_flag_divbyzero) ? S390_IEEE_MASK_DIVBYZERO : 0;
 s390_exc |= (exc & float_flag_overflow) ? S390_IEEE_MASK_OVERFLOW : 0;
 s390_exc |= (exc & float_flag_underflow) ? S390_IEEE_MASK_UNDERFLOW : 0;
-s390_exc |= (exc & float_flag_inexact) ? S390_IEEE_MASK_INEXACT : 0;
+s390_exc |= (exc & (float_flag_inexact | float_flag_invalid_cvti)) ?
+S390_IEEE_MASK_INEXACT : 0;
 
 return s390_exc;
 }
-- 
2.39.2




[Stable-8.0.4 18/31] hw/nvme: fix endianness issue for shadow doorbells

2023-07-25 Thread Michael Tokarev
From: Klaus Jensen 

In commit 2fda0726e514 ("hw/nvme: fix missing endian conversions for
doorbell buffers"), we fixed shadow doorbells for big-endian guests
running on little endian hosts. But I did not fix little-endian guests
on big-endian hosts. Fix this.
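
For illustration, a standalone sketch of the convert-before-DMA pattern the
fix applies (to_le32() and the dma_write callback are stand-ins for
cpu_to_le32() and pci_dma_write(), i.e. assumptions of this example, not the
QEMU API):

    #include <stddef.h>
    #include <stdint.h>

    static uint32_t to_le32(uint32_t v)
    {
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        return __builtin_bswap32(v);   /* big-endian host: swap to LE */
    #else
        return v;                      /* little-endian host: already LE */
    #endif
    }

    /* The shadow doorbell in guest memory is little-endian by definition,
     * so convert the host-endian value on every host before copying it. */
    static void write_shadow_doorbell(void (*dma_write)(uint64_t, const void *,
                                                        size_t),
                                      uint64_t db_addr, uint32_t tail)
    {
        uint32_t v = to_le32(tail);
        dma_write(db_addr, &v, sizeof(v));
    }

    static void fake_dma_write(uint64_t addr, const void *buf, size_t len)
    {
        (void)addr; (void)buf; (void)len;  /* a real device would DMA here */
    }

    int main(void)
    {
        write_shadow_doorbell(fake_dma_write, 0x1000, 42);
        return 0;
    }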

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1765
Fixes: 3f7fe8de3d49 ("hw/nvme: Implement shadow doorbell buffer support")
Cc: qemu-sta...@nongnu.org
Reported-by: Thomas Huth 
Tested-by: Cédric Le Goater 
Tested-by: Thomas Huth 
Reviewed-by: Keith Busch 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Klaus Jensen 
(cherry picked from commit ea3c76f1494d0c75873c3b470e6e048202661ad8)
Signed-off-by: Michael Tokarev 

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ac24eeb5ed..2097fb1310 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6767,6 +6767,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 PCIDevice *pci = PCI_DEVICE(n);
 uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1);
 uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2);
+uint32_t v;
 int i;
 
 /* Address should be page aligned */
@@ -6784,6 +6785,8 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 NvmeCQueue *cq = n->cq[i];
 
 if (sq) {
+v = cpu_to_le32(sq->tail);
+
 /*
  * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3)
  * nvme_process_db() uses this hard-coded way to calculate
@@ -6791,7 +6794,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
  */
 sq->db_addr = dbs_addr + (i << 3);
 sq->ei_addr = eis_addr + (i << 3);
-pci_dma_write(pci, sq->db_addr, &sq->tail, sizeof(sq->tail));
+pci_dma_write(pci, sq->db_addr, &v, sizeof(sq->tail));
 
 if (n->params.ioeventfd && sq->sqid != 0) {
 if (!nvme_init_sq_ioeventfd(sq)) {
@@ -6801,10 +6804,12 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 }
 
 if (cq) {
+v = cpu_to_le32(cq->head);
+
 /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */
 cq->db_addr = dbs_addr + (i << 3) + (1 << 2);
 cq->ei_addr = eis_addr + (i << 3) + (1 << 2);
-pci_dma_write(pci, cq->db_addr, &cq->head, sizeof(cq->head));
+pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head));
 
 if (n->params.ioeventfd && cq->cqid != 0) {
 if (!nvme_init_cq_ioeventfd(cq)) {
@@ -7555,7 +7560,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, 
unsigned size)
 static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 {
 PCIDevice *pci = PCI_DEVICE(n);
-uint32_t qid;
+uint32_t qid, v;
 
 if (unlikely(addr & ((1 << 2) - 1))) {
 NVME_GUEST_ERR(pci_nvme_ub_db_wr_misaligned,
@@ -7622,7 +7627,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
 start_sqs = nvme_cq_full(cq) ? 1 : 0;
 cq->head = new_head;
 if (!qid && n->dbbuf_enabled) {
-pci_dma_write(pci, cq->db_addr, &cq->head, sizeof(cq->head));
+v = cpu_to_le32(cq->head);
+pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head));
 }
 if (start_sqs) {
 NvmeSQueue *sq;
@@ -7682,6 +7688,8 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
 
 sq->tail = new_tail;
 if (!qid && n->dbbuf_enabled) {
+v = cpu_to_le32(sq->tail);
+
 /*
  * The spec states "the host shall also update the controller's
  * corresponding doorbell property to match the value of that entry
@@ -7695,7 +7703,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
  * including ones that run on Linux, are not updating Admin Queues,
  * so we can't trust reading it for an appropriate sq tail.
  */
-pci_dma_write(pci, sq->db_addr, &sq->tail, sizeof(sq->tail));
+pci_dma_write(pci, sq->db_addr, &v, sizeof(sq->tail));
 }
 
 qemu_bh_schedule(sq->bh);
-- 
2.39.2




[Stable-8.0.4 16/31] ui/vnc-clipboard: fix infinite loop in inflate_buffer (CVE-2023-3255)

2023-07-25 Thread Michael Tokarev
From: Mauro Matteo Cascella 

A wrong exit condition may lead to an infinite loop when inflating a
valid zlib buffer containing some extra bytes in the `inflate_buffer`
function. The bug only occurs post-authentication. Return the buffer
immediately if the end of the compressed data has been reached
(Z_STREAM_END).
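
For illustration, a standalone sketch of the corrected loop-exit pattern
against plain zlib (not the QEMU function; the initial buffer size and the
1 MiB growth cap are arbitrary choices of this example):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    static uint8_t *inflate_all(const uint8_t *in, uint32_t in_len,
                                uint32_t *size)
    {
        size_t out_len = in_len ? in_len : 64;
        uint8_t *out = malloc(out_len);
        uint8_t *tmp;
        z_stream stream;

        memset(&stream, 0, sizeof(stream));
        if (!out || inflateInit(&stream) != Z_OK) {
            free(out);
            return NULL;
        }
        stream.next_in = (uint8_t *)in;
        stream.avail_in = in_len;

        for (;;) {
            stream.next_out = out + stream.total_out;
            stream.avail_out = out_len - stream.total_out;

            switch (inflate(&stream, Z_FINISH)) {
            case Z_STREAM_END:              /* end of compressed data: done,  */
                *size = stream.total_out;   /* even if trailing bytes remain  */
                inflateEnd(&stream);
                return out;
            case Z_OK:                      /* made progress, needs more room */
            case Z_BUF_ERROR:               /* output buffer full: grow it    */
                if (out_len > (1 << 20)) {  /* arbitrary safety cap           */
                    goto err;
                }
                out_len <<= 1;
                tmp = realloc(out, out_len);
                if (!tmp) {
                    goto err;
                }
                out = tmp;
                break;
            default:                        /* corrupt stream                 */
                goto err;
            }
        }

    err:
        inflateEnd(&stream);
        free(out);
        return NULL;
    }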

Fixes: CVE-2023-3255
Fixes: 0bf41cab ("ui/vnc: clipboard support")
Reported-by: Kevin Denis 
Signed-off-by: Mauro Matteo Cascella 
Reviewed-by: Marc-André Lureau 
Tested-by: Marc-André Lureau 
Message-ID: <20230704084210.101822-1-mcasc...@redhat.com>
(cherry picked from commit d921fea338c1059a27ce7b75309d7a2e485f710b)
Signed-off-by: Michael Tokarev 

diff --git a/ui/vnc-clipboard.c b/ui/vnc-clipboard.c
index 8aeadfaa21..c759be3438 100644
--- a/ui/vnc-clipboard.c
+++ b/ui/vnc-clipboard.c
@@ -50,8 +50,11 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, 
uint32_t *size)
 ret = inflate(&stream, Z_FINISH);
 switch (ret) {
 case Z_OK:
-case Z_STREAM_END:
 break;
+case Z_STREAM_END:
+*size = stream.total_out;
+inflateEnd(&stream);
+return out;
 case Z_BUF_ERROR:
 out_len <<= 1;
 if (out_len > (1 << 20)) {
@@ -66,11 +69,6 @@ static uint8_t *inflate_buffer(uint8_t *in, uint32_t in_len, 
uint32_t *size)
 }
 }
 
-*size = stream.total_out;
-inflateEnd(&stream);
-
-return out;
-
 err_end:
 inflateEnd(&stream);
 err:
-- 
2.39.2




[Stable-8.0.4 03/31] hw/ppc: Fix clock update drift

2023-07-25 Thread Michael Tokarev
From: Nicholas Piggin 

The clock update logic reads the clock twice to compute the new clock
value, with a value derived from the later time subtracted from a value
derived from the earlier time. The delta causes time to be lost.

This can ultimately result in time becoming unsynchronized between CPUs
and that can cause OS lockups, timeouts, watchdogs, etc. This can be
seen running a KVM guest (that causes lots of TB updates) on a powernv
SMP machine.

Fix this by reading the clock once.
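
For illustration, a self-contained toy model of the drift; the 5 ns cost per
clock read and all helper names are inventions of this example, not QEMU code:

    #include <stdint.h>
    #include <stdio.h>

    static int64_t now_ns;                  /* stand-in for QEMU_CLOCK_VIRTUAL */
    static int64_t tb_offset;               /* guest TB = clock + tb_offset    */

    static int64_t clock_read(void)         /* every read costs 5 ns here      */
    {
        now_ns += 5;
        return now_ns;
    }

    static int64_t tb_at(int64_t clock)
    {
        return clock + tb_offset;
    }

    static void store_tb_two_reads(void)    /* the buggy pattern               */
    {
        int64_t tb = tb_at(clock_read());   /* sample #1                       */
        tb_offset = tb - clock_read();      /* sample #2, 5 ns later: lost     */
    }

    static void store_tb_one_read(void)     /* the fixed pattern               */
    {
        int64_t clock = clock_read();       /* single sample                   */
        int64_t tb = tb_at(clock);
        tb_offset = tb - clock;             /* nothing elapses in between      */
    }

    int main(void)
    {
        for (int i = 0; i < 10; i++) {
            store_tb_two_reads();
        }
        printf("offset after 10 two-read stores: %lld ns\n",
               (long long)tb_offset);       /* -50: the timebase fell behind   */

        tb_offset = 0;
        for (int i = 0; i < 10; i++) {
            store_tb_one_read();
        }
        printf("offset after 10 one-read stores: %lld ns\n",
               (long long)tb_offset);       /* 0: no drift                     */
        return 0;
    }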

Cc: qemu-sta...@nongnu.org
Fixes: dbdd25065e90 ("Implement time-base start/stop helpers.")
Signed-off-by: Nicholas Piggin 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Frederic Barrat 
Message-ID: <20230629020713.327745-1-npig...@gmail.com>
Signed-off-by: Daniel Henrique Barboza 
(cherry picked from commit 2ad2e113deb5663e69a05dd6922cbfc6d7ea34d3)
Signed-off-by: Michael Tokarev 

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index d80b0adc6c..85d442fbce 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -535,23 +535,24 @@ static inline void cpu_ppc_store_tb(ppc_tb_t *tb_env, 
uint64_t vmclk,
 void cpu_ppc_store_tbl (CPUPPCState *env, uint32_t value)
 {
 ppc_tb_t *tb_env = env->tb_env;
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), 
tb_env->tb_offset);
+tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset);
 tb &= 0xFFFFFFFF00000000ULL;
-cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
- &tb_env->tb_offset, tb | (uint64_t)value);
+cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset, tb | (uint64_t)value);
 }
 
 static inline void _cpu_ppc_store_tbu(CPUPPCState *env, uint32_t value)
 {
 ppc_tb_t *tb_env = env->tb_env;
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), 
tb_env->tb_offset);
+tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset);
 tb &= 0x00000000FFFFFFFFULL;
-cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
- &tb_env->tb_offset, ((uint64_t)value << 32) | tb);
+cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset,
+ ((uint64_t)value << 32) | tb);
 }
 
 void cpu_ppc_store_tbu (CPUPPCState *env, uint32_t value)
@@ -584,23 +585,24 @@ uint32_t cpu_ppc_load_atbu (CPUPPCState *env)
 void cpu_ppc_store_atbl (CPUPPCState *env, uint32_t value)
 {
 ppc_tb_t *tb_env = env->tb_env;
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), 
tb_env->atb_offset);
+tb = cpu_ppc_get_tb(tb_env, clock, tb_env->atb_offset);
 tb &= 0xFFFFFFFF00000000ULL;
-cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
- &tb_env->atb_offset, tb | (uint64_t)value);
+cpu_ppc_store_tb(tb_env, clock, &tb_env->atb_offset, tb | (uint64_t)value);
 }
 
 void cpu_ppc_store_atbu (CPUPPCState *env, uint32_t value)
 {
 ppc_tb_t *tb_env = env->tb_env;
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), 
tb_env->atb_offset);
+tb = cpu_ppc_get_tb(tb_env, clock, tb_env->atb_offset);
 tb &= 0xULL;
-cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
- &tb_env->atb_offset, ((uint64_t)value << 32) | tb);
+cpu_ppc_store_tb(tb_env, clock, &tb_env->atb_offset,
+ ((uint64_t)value << 32) | tb);
 }
 
 uint64_t cpu_ppc_load_vtb(CPUPPCState *env)
@@ -622,14 +624,13 @@ void cpu_ppc_store_vtb(CPUPPCState *env, uint64_t value)
 void cpu_ppc_store_tbu40(CPUPPCState *env, uint64_t value)
 {
 ppc_tb_t *tb_env = env->tb_env;
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
-tb_env->tb_offset);
+tb = cpu_ppc_get_tb(tb_env, clock, tb_env->tb_offset);
 tb &= 0xFFFFFFUL;
 tb |= (value & ~0xFFFFFFUL);
-cpu_ppc_store_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
- &tb_env->tb_offset, tb);
+cpu_ppc_store_tb(tb_env, clock, &tb_env->tb_offset, tb);
 }
 
 static void cpu_ppc_tb_stop (CPUPPCState *env)
-- 
2.39.2




[Stable-8.0.4 30/31] target/s390x: Make MC raise specification exception when class >= 16

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

MC requires bit positions 8-11 (upper 4 bits of class) to be zeros,
otherwise it must raise a specification exception.

Cc: qemu-sta...@nongnu.org
Fixes: 20d143e2cab8 ("s390x/tcg: Implement MONITOR CALL")
Reviewed-by: David Hildenbrand 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230724082032.66864-6-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 9c028c057adce49304c6e4a51f6b426bd4f8f6b8)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/excp_helper.c b/target/s390x/tcg/excp_helper.c
index 228aa9f237..3da337f7c7 100644
--- a/target/s390x/tcg/excp_helper.c
+++ b/target/s390x/tcg/excp_helper.c
@@ -639,7 +639,7 @@ void monitor_event(CPUS390XState *env,
 void HELPER(monitor_call)(CPUS390XState *env, uint64_t monitor_code,
   uint32_t monitor_class)
 {
-g_assert(monitor_class <= 0xff);
+g_assert(monitor_class <= 0xf);
 
 if (env->cregs[8] & (0x8000 >> monitor_class)) {
 monitor_event(env, monitor_code, monitor_class, GETPC());
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 9e9fa3cef0..964ddd12d1 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -3192,9 +3192,9 @@ static DisasJumpType op_lcbb(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_mc(DisasContext *s, DisasOps *o)
 {
-const uint16_t monitor_class = get_field(s, i2);
+const uint8_t monitor_class = get_field(s, i2);
 
-if (monitor_class & 0xff00) {
+if (monitor_class & 0xf0) {
 gen_program_exception(s, PGM_SPECIFICATION);
 return DISAS_NORETURN;
 }
-- 
2.39.2




[Stable-8.0.4 27/31] target/s390x: Fix CLM with M3=0

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

When the mask is zero, access exceptions should still be recognized for
1 byte at the second-operand address. CC should be set to 0.

Cc: qemu-sta...@nongnu.org
Fixes: defb0e3157af ("s390x: Implement opcode helpers")
Reviewed-by: David Hildenbrand 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230724082032.66864-3-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit 4b6e4c0b8223681ae85462794848db4386de1a8d)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 51894f17f5..541859afe7 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -667,6 +667,11 @@ uint32_t HELPER(clm)(CPUS390XState *env, uint32_t r1, 
uint32_t mask,
 HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%" PRIx64 "\n", __func__, r1,
mask, addr);
 
+if (!mask) {
+/* Recognize access exceptions for the first byte */
+probe_read(env, addr, 1, cpu_mmu_index(env, false), ra);
+}
+
 while (mask) {
 if (mask & 8) {
 uint8_t d = cpu_ldub_data_ra(env, addr, ra);
-- 
2.39.2




[Stable-8.0.4 12/31] hw/ide/piix: properly initialize the BMIBA register

2023-07-25 Thread Michael Tokarev
From: Olaf Hering 

According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA-BUS
MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
32bit wide. To properly reset it to default values, all 32bit need to be
cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.

The initial change wrote just the lower 8 bits, leaving parts of the "Bus
Master Interface Base Address" at bits 15:4 unchanged.

Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 
82371SB (Function 1)")

Signed-off-by: Olaf Hering 
Reviewed-by: Bernhard Beschow 
Reviewed-by: Philippe Mathieu-Daudé 
Message-ID: <20230712074721.14728-1-o...@aepfle.de>
Signed-off-by: Paolo Bonzini 
(cherry picked from commit 230dfd9257e92259876c113e58b5f0d22b056d2e)
Signed-off-by: Michael Tokarev 

diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 41d60921e3..17ec304064 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev)
 pci_set_word(pci_conf + PCI_COMMAND, 0x0000);
 pci_set_word(pci_conf + PCI_STATUS,
  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
+pci_set_long(pci_conf + 0x20, 0x1);  /* BMIBA: 20-23h */
 }
 
 static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
-- 
2.39.2




[Stable-8.0.4 21/31] linux-user: Fix signed math overflow in brk() syscall

2023-07-25 Thread Michael Tokarev
From: Helge Deller 

Fix the math overflow when calculating new_alloc_size.

new_host_brk_page and brk_page are unsigned integers. If userspace
reduces the heap, new_host_brk_page is lower than brk_page, which results
in a huge positive number (but should actually be negative).

Fix it by adding a proper check and as such make the code more readable.
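
For illustration, a self-contained example of the wrap described above
(abi_ulong is modeled as uint64_t here and the addresses are made up):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t brk_page          = 0x10000; /* current host brk page      */
        uint64_t new_host_brk_page = 0x8000;  /* guest shrank the heap      */

        /* Unsigned subtraction wraps around instead of going negative. */
        uint64_t new_alloc_size = new_host_brk_page - brk_page;

        printf("new_alloc_size = 0x%" PRIx64 "\n", new_alloc_size);
        /* prints 0xffffffffffff8000 -- a huge "size" instead of -0x8000 */
        return 0;
    }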

Signed-off-by: Helge Deller 
Tested-by: "Markus F.X.J. Oberhumer" 
Reviewed-by: Philippe Mathieu-Daudé 
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Cc: qemu-sta...@nongnu.org
Buglink: https://github.com/upx/upx/issues/683
(cherry picked from commit eac78a4b0b7da4de2c0a297f4d528ca9cc6256a3)
Signed-off-by: Michael Tokarev 

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e106633837..30a2ac0099 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -860,12 +860,13 @@ abi_long do_brk(abi_ulong brk_val)
  * itself); instead we treat "mapped but at wrong address" as
  * a failure and unmap again.
  */
-new_alloc_size = new_host_brk_page - brk_page;
-if (new_alloc_size) {
+if (new_host_brk_page > brk_page) {
+new_alloc_size = new_host_brk_page - brk_page;
 mapped_addr = get_errno(target_mmap(brk_page, new_alloc_size,
 PROT_READ|PROT_WRITE,
 MAP_ANON|MAP_PRIVATE, 0, 0));
 } else {
+new_alloc_size = 0;
 mapped_addr = brk_page;
 }
 
-- 
2.39.2




[Stable-8.0.4 31/31] target/s390x: Fix assertion failure in VFMIN/VFMAX with type 13

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

Type 13 is reserved, so using it should result in a specification
exception. Due to an off-by-one error, the code instead triggers an
assertion at a later point.
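
The effect of the changed comparison can be seen with a tiny standalone
check over all sixteen m6 encodings (the helper names here are made up):

#include <stdio.h>

static int reserved_old(int m6)
{
    return m6 == 5 || m6 == 6 || m6 == 7 || m6 > 13;
}

static int reserved_new(int m6)
{
    return m6 == 5 || m6 == 6 || m6 == 7 || m6 >= 13;
}

int main(void)
{
    for (int m6 = 0; m6 <= 15; m6++) {
        if (reserved_old(m6) != reserved_new(m6)) {
            printf("m6=%d: only the fixed check rejects it up front\n", m6);
        }
    }
    return 0;
}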

Cc: qemu-sta...@nongnu.org
Fixes: da4807527f3b ("s390x/tcg: Implement VECTOR FP (MAXIMUM|MINIMUM)")
Reviewed-by: David Hildenbrand 
Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230724082032.66864-8-...@linux.ibm.com>
Signed-off-by: Thomas Huth 
(cherry picked from commit ff537b0370ab5918052b8d8a798e803c47272406)
Signed-off-by: Michael Tokarev 

diff --git a/target/s390x/tcg/translate_vx.c.inc 
b/target/s390x/tcg/translate_vx.c.inc
index 43dfbfd03f..f8df121d3d 100644
--- a/target/s390x/tcg/translate_vx.c.inc
+++ b/target/s390x/tcg/translate_vx.c.inc
@@ -3047,7 +3047,7 @@ static DisasJumpType op_vfmax(DisasContext *s, DisasOps 
*o)
 const uint8_t m5 = get_field(s, m5);
 gen_helper_gvec_3_ptr *fn;
 
-if (m6 == 5 || m6 == 6 || m6 == 7 || m6 > 13) {
+if (m6 == 5 || m6 == 6 || m6 == 7 || m6 >= 13) {
 gen_program_exception(s, PGM_SPECIFICATION);
 return DISAS_NORETURN;
 }
-- 
2.39.2




[Stable-8.0.4 25/31] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output

2023-07-25 Thread Michael Tokarev
From: Ilya Leoshkevich 

i386 and s390x implementations of op_add2 require an earlyclobber,
which is currently missing. This breaks VCKSM in s390x guests. E.g., on
x86_64 the following op:

add2_i32 tmp2,tmp3,tmp2,tmp3,tmp3,tmp2   dead: 0 2 3 4 5  pref=none,0x

is translated to:

addl %ebx, %r12d
adcl %r12d, %ebx

Introduce a new C_N1_O1_I4 constraint, and make sure that earlyclobber
of aliased outputs is honored.
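
A self-contained sketch of why the aliasing is fatal; plain C stands in
for the generated add/adc pair and the variable names are made up:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* add2 computes {lo, hi} = {al + bl, ah + bh + carry of the low add}.
     * Inputs chosen so the low addition carries. */
    uint32_t al = 0xffffffff, ah = 1, bl = 1, bh = 2;

    /* Correct: read every input before writing any output. */
    uint32_t lo = al + bl;
    uint32_t carry = lo < al;
    uint32_t hi = ah + bh + carry;                       /* 1 + 2 + 1 = 4 */

    /* Broken ordering, as in the addl/adcl pair above: the low result is
     * written into the register that still holds bh, so the high half
     * adds a clobbered value. */
    uint32_t reg_shared = bh;          /* one register doubles as bh and lo */
    reg_shared = al + bl;              /* low add overwrites bh */
    uint32_t bad_hi = ah + reg_shared + (reg_shared < al);  /* 1 + 0 + 1 = 2 */

    printf("lo=0x%x hi=%u bad_hi=%u\n", lo, hi, bad_hi);
    return 0;
}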

Cc: qemu-sta...@nongnu.org
Fixes: 82790a870992 ("tcg: Add markup for output requires new register")
Signed-off-by: Ilya Leoshkevich 
Reviewed-by: Richard Henderson 
Message-Id: <20230719221310.1968845-7-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
(cherry picked from commit 22d2e5351a18aff5a9c7e3984b50ecce61ff8975)
Signed-off-by: Michael Tokarev 

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 91ceb0e1da..5ea3a292f0 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -11,6 +11,9 @@
  *
  * C_N1_Im(...) defines a constraint set with 1 output and <m> inputs,
  * except that the output must use a new register.
+ *
+ * C_Nn_Om_Ik(...) defines a constraint set with <n> + <m> outputs and <k>
+ * inputs, except that the first <n> outputs must use new registers.
  */
 C_O0_I1(r)
 C_O0_I2(L, L)
@@ -53,4 +56,4 @@ C_O2_I1(r, r, L)
 C_O2_I2(a, d, a, r)
 C_O2_I2(r, r, L, L)
 C_O2_I3(a, d, 0, 1, r)
-C_O2_I4(r, r, 0, 1, re, re)
+C_N1_O1_I4(r, r, 0, 1, re, re)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 5c7c180799..d00800d18a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3356,7 +3356,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_add2_i64:
 case INDEX_op_sub2_i32:
 case INDEX_op_sub2_i64:
-return C_O2_I4(r, r, 0, 1, re, re);
+return C_N1_O1_I4(r, r, 0, 1, re, re);
 
 case INDEX_op_ctz_i32:
 case INDEX_op_ctz_i64:
diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index 15f1c55103..31daa5daca 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -8,6 +8,9 @@
  * C_On_Im(...) defines a constraint set with <n> outputs and <m> inputs.
  * Each operand should be a sequence of constraint letters as defined by
  * tcg-target-con-str.h; the constraint combination is inclusive or.
+ *
+ * C_Nn_Om_Ik(...) defines a constraint set with <n> + <m> outputs and <k>
+ * inputs, except that the first <n> outputs must use new registers.
  */
 C_O0_I1(r)
 C_O0_I2(L, L)
@@ -41,6 +44,5 @@ C_O1_I4(r, r, rA, rI, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
-C_O2_I4(r, r, 0, 1, rA, r)
-C_O2_I4(r, r, 0, 1, ri, r)
-C_O2_I4(r, r, 0, 1, r, r)
+C_N1_O1_I4(r, r, 0, 1, ri, r)
+C_N1_O1_I4(r, r, 0, 1, rA, r)
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 844532156b..2e5fd4968c 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3229,11 +3229,11 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_add2_i32:
 case INDEX_op_sub2_i32:
-return C_O2_I4(r, r, 0, 1, ri, r);
+return C_N1_O1_I4(r, r, 0, 1, ri, r);
 
 case INDEX_op_add2_i64:
 case INDEX_op_sub2_i64:
-return C_O2_I4(r, r, 0, 1, rA, r);
+return C_N1_O1_I4(r, r, 0, 1, rA, r);
 
 case INDEX_op_st_vec:
 return C_O0_I2(v, r);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f3bf471274..09f345fa1b 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -368,6 +368,7 @@ void tcg_raise_tb_overflow(TCGContext *s)
 #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2),
 #define C_O2_I3(O1, O2, I1, I2, I3) C_PFX5(c_o2_i3_, O1, O2, I1, I2, I3),
 #define C_O2_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_o2_i4_, O1, O2, I1, I2, I3, 
I4),
+#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_n1_o1_i4_, O1, O2, I1, I2, 
I3, I4),
 
 typedef enum {
 #include "tcg-target-con-set.h"
@@ -388,6 +389,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #undef C_O2_I2
 #undef C_O2_I3
 #undef C_O2_I4
+#undef C_N1_O1_I4
 
 /* Put all of the constraint sets into an array, indexed by the enum. */
 
@@ -407,6 +409,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 
} },
 #define C_O2_I3(O1, O2, I1, I2, I3) { .args_ct_str = { #O1, #O2, #I1, #I2, 
#I3 } },
 #define C_O2_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { #O1, #O2, #I1, #I2, 
#I3, #I4 } },
+#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { "&" #O1, #O2, 
#I1, #I2, #I3, #I4 } },
 
 static const TCGTargetOpDef constraint_sets[] = {
 #include "tcg-target-con-set.h"
@@ -426,6 +429,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #undef C_O2_I2
 #undef C_O2_I3
 #undef C_O2_I4
+#undef C_N1_O1_I4
 
 /* Expand the enumerator to be returned from tcg_target_op_def(). */
 
@@ -445,6 +449,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #define C_O
