Re: [PATCH v2 27/54] tcg/riscv: Require TCG_TARGET_REG_BITS == 64

2023-04-13 Thread Richard Henderson

On 4/12/23 22:18, Daniel Henrique Barboza wrote:



On 4/10/23 22:04, Richard Henderson wrote:

The port currently does not support "oversize" guests, which
means riscv32 can only target 32-bit guests.  We will soon be
building TCG once for all guests.  This implies that we can
only support riscv64.

Since all Linux distributions target riscv64 not riscv32,
this is not much of a restriction and simplifies the code.


Code looks good but I got confused about the riscv32 implications you cited.

Does this mean that if someone happens to have a RISC-V 32-bit host, with a
special Linux sauce that runs on that 32-bit RISC-V host, this person won't be
able to build the riscv32 TCG target on that machine?


Correct.

At present, one is able to configure with such a host, and if one uses --target-list=x,y,z 
such that all of x, y or z are 32-bit guests the build should even succeed, and the result 
should probably work.


However, if one does not use --target-list in configure, the build will #error 
out here:


@@ -942,9 +913,6 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
  #endif
  };
-/* We don't support oversize guests */
-QEMU_BUILD_BUG_ON(TCG_TARGET_REG_BITS < TARGET_LONG_BITS);
-
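For reference, the removed QEMU_BUILD_BUG_ON behaves like a C11 static assertion. A minimal stand-in looks like this (the macro name and constants below are illustrative, not QEMU's real definitions):

```c
/* Minimal stand-in for a build-time guard such as QEMU_BUILD_BUG_ON:
 * a C11 static assertion that rejects the build when the condition holds.
 * With a 32-bit TCG backend and a 64-bit guest, the check below would
 * fail at compile time, which is the #error-style failure described above. */
#define BUILD_BUG_ON(x) _Static_assert(!(x), "build-time check failed: " #x)

/* Illustrative stand-ins for TCG_TARGET_REG_BITS / TARGET_LONG_BITS. */
enum { HOST_TCG_REG_BITS = 64, GUEST_TARGET_LONG_BITS = 64 };

BUILD_BUG_ON(HOST_TCG_REG_BITS < GUEST_TARGET_LONG_BITS);
```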


I am working on a patch set, not yet posted, which builds tcg/*.o twice, once for system 
mode and once for user-only.  At which point riscv32 cannot build at all.


I brought this patch forward from there in order to reduce churn.


r~



Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Maxime Coquelin




On 4/12/23 22:51, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:

If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
setting the vhost features will set this feature, too.  Doing so
disables all vrings, which may not be intended.

For example, enabling or disabling logging during migration requires
setting those features (to set or unset VHOST_F_LOG_ALL), which will
automatically disable all vrings.  In either case, the VM is running
(disabling logging is done after a failed or cancelled migration, and
only once the VM is running again, see comment in
memory_global_dirty_log_stop()), so the vrings should really be enabled.
As a result, the back-end seems to hang.

To fix this, we must remember whether the vrings are supposed to be
enabled, and, if so, re-enable them after a SET_FEATURES call that set
VHOST_USER_F_PROTOCOL_FEATURES.

It seems less than ideal that there is a short period in which the VM is
running but the vrings will be stopped (between SET_FEATURES and
SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
e.g. by introducing a new flag or vhost-user protocol feature to disable
disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
new functions for setting/clearing singular feature bits (so that
F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).

Even with such a potential addition to the protocol, we still need this
fix here, because we cannot expect that back-ends will implement this
addition.

Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost.h | 10 ++
  hw/virtio/vhost.c | 13 +
  2 files changed, 23 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..2fe02ed5d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,16 @@ struct vhost_dev {
  int vq_index_end;
  /* if non-zero, minimum required value for max_queues */
  int num_queues;
+
+/*
+ * Whether the virtqueues are supposed to be enabled (via
+ * SET_VRING_ENABLE).  Setting the features (e.g. for
+ * enabling/disabling logging) will disable all virtqueues if
+ * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
+ * re-enable them if this field is set.
+ */
+bool enable_vqs;
+
  /**
   * vhost feature handling requires matching the feature set
   * offered by a backend which may be a subset of the total
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..cbff589efa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -50,6 +50,8 @@ static unsigned int used_memslots;
  static QLIST_HEAD(, vhost_dev) vhost_devices =
  QLIST_HEAD_INITIALIZER(vhost_devices);
  
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
+
  bool vhost_has_free_slot(void)
  {
  unsigned int slots_limit = ~0U;
@@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  }
  }
  
+if (dev->enable_vqs) {
+/*
+ * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
+ * virtqueues, even if that was not intended; re-enable them if
+ * necessary.
+ */
+vhost_dev_set_vring_enable(dev, true);
+}
+
  out:
  return r;
  }
@@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, 
uint16_t queue_size,
  
  static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
  {
+hdev->enable_vqs = enable;
+
  if (!hdev->vhost_ops->vhost_set_vring_enable) {
  return 0;
  }


The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled
at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is
intended to be used like that. This issue shows why doing so is a bad
idea.

VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging
is controlled at runtime by the presence of the dirty log
(VHOST_USER_SET_LOG_BASE) and the per-vring logging flag
(VHOST_VRING_F_LOG).

I suggest permanently enabling VHOST_F_LOG_ALL upon connection when the
backend supports it. No spec changes are required.

libvhost-user looks like it will work. I didn't look at DPDK/SPDK, but
checking that it works there is important too.

I have CCed people who may be interested in this issue. This is the
first time I've looked at vhost-user logging, so this idea may not work.


In the case of DPDK, we rely on VHOST_F_LOG_ALL to be set to know
whether we should do dirty pages logging or not [0], so setting this
feature at init time will cause performance degradation. The check on
whether the log base address has been set is done afterwards.
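The decision DPDK makes can be sketched roughly as follows; the struct and function names here are illustrative, not DPDK's actual code, and the point is only the ordering: the negotiated VHOST_F_LOG_ALL bit gates the logging path, and the log base is checked afterwards:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define VHOST_F_LOG_ALL 26    /* feature bit number from the vhost spec */
#define VHOST_VRING_F_LOG 0   /* per-vring logging flag bit */

/* Hypothetical back-end state; real back-ends keep this per device/vring. */
struct vring_state {
    uint64_t features;        /* features set via SET_FEATURES */
    void *log_base;           /* set by VHOST_USER_SET_LOG_BASE, or NULL */
    uint32_t vring_flags;     /* per-vring flags, e.g. VHOST_VRING_F_LOG */
};

static int should_log_writes(const struct vring_state *s)
{
    if (!(s->features & (1ULL << VHOST_F_LOG_ALL))) {
        return 0;             /* fast path: logging feature not negotiated */
    }
    /* Only then does the log base / per-vring flag check happen. */
    return s->log_base != NULL && (s->vring_flags & (1u << VHOST_VRING_F_LOG));
}
```

This is why setting VHOST_F_LOG_ALL permanently at init time would keep such a back-end off its fast path even when no migration is in progress.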

Regards,
Maxime


Stefan


[0]: https://git.dpdk.org/dpdk/tree/lib/vhost/vhost.h#n594




Re: Reducing vdpa migration downtime because of memory pin / maps

2023-04-13 Thread Eugenio Perez Martin
On Wed, Apr 12, 2023 at 8:19 AM Jason Wang  wrote:
>
> On Wed, Apr 12, 2023 at 1:56 PM Jason Wang  wrote:
> >
> > On Tue, Apr 11, 2023 at 8:34 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Wed, Apr 5, 2023 at 1:37 PM Eugenio Perez Martin  
> > > wrote:
> > > >
> > > > Hi!
> > > >
> > > > As mentioned in the last upstream virtio-networking meeting, one of
> > > > the factors that adds more downtime to migration is the handling of
> > > > the guest memory (pin, map, etc). At this moment this handling is
> > > > bound to the virtio life cycle (DRIVER_OK, RESET). In that sense, the
> > > > destination device waits until all the guest memory / state is
> > > > migrated to start pinning all the memory.
> > > >
> > > > The proposal is to bind it to the char device life cycle (open vs
> > > > close), so all the guest memory can be pinned for all the guest / qemu
> > > > lifecycle.
> > > >
> > > > This has two main problems:
> > > > * At this moment the reset semantics forces the vdpa device to unmap
> > > > all the memory. So this change needs a vhost vdpa feature flag.
> > > > * This may increase the initialization time. Maybe we can delay it if
> > > > qemu is not the destination of a LM. Anyway I think this should be
> > > > done as an optimization on top.
> > > >
> > >
> > > Expanding on this we could reduce the pinning even more now that vring
> > > supports VA [1] with the emulated CVQ.
> >
> > Note that VA for hardware means the device needs to support page fault
> > through either PRI or vendor specific interface.
> >
> > >
> > > Something like:
> > > - Add a VHOST_VRING_GROUP_CAN_USE_VA ioctl to check whether a given VQ
> > > group has that capability. Passthrough devices with emulated CVQ would
> > > return false for the dataplane and true for the control vq group.
>
> We don't even need this actually, since the pinning is not visible to
> the userspace. Userspace can only see the IOTLB abstraction actually.
>
> We can invent a group->use_va, then when we attach AS to a group that
> can use va, we can avoid the pinning.
>

That would solve one part for sure, but SVQ will keep translating HVA
to SVQ IOVA, and then the kernel needs to translate it back. With
VHOST_VRING_GROUP_CAN_USE_VA, the SVQ and the kernel skip all
translation.
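The difference between the two paths can be sketched like this (names are hypothetical, and the IOVA tree is modeled as a flat table purely for illustration):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One HVA -> SVQ IOVA mapping; a stand-in for an entry in SVQ's IOVA tree. */
struct iova_map { uintptr_t hva; uint64_t iova; };

struct svq {
    int group_uses_va;          /* result of a CAN_USE_VA-style capability query */
    const struct iova_map *map; /* translation table (unused when VA is allowed) */
    size_t n;
};

/* Address to place in a descriptor: with VA support, the host VA goes in
 * as-is and neither SVQ nor the kernel translates; without it, SVQ must
 * translate HVA -> SVQ IOVA (and the kernel translates it back). */
static uint64_t svq_desc_addr(const struct svq *svq, void *hva)
{
    if (svq->group_uses_va) {
        return (uintptr_t)hva;
    }
    for (size_t i = 0; i < svq->n; i++) {
        if (svq->map[i].hva == (uintptr_t)hva) {
            return svq->map[i].iova;
        }
    }
    return UINT64_MAX;          /* unmapped buffer */
}
```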

Thanks!

> Thanks
>
> > > - If that is true, qemu does not need to map and translate addresses
> > > for CVQ but to directly provide VA for buffers. This avoids pinning,
> > > translations, etc in this case.
> >
> > For CVQ yes, but we only avoid the pinning for CVQ not others.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > [1] 
> > > https://lore.kernel.org/virtualization/20230404131326.44403-2-sgarz...@redhat.com/
> > >
>




Re: [PATCH] replication: compile out some staff when replication is not configured

2023-04-13 Thread Daniil Tatianin

Just a few minor nits

On 4/11/23 5:51 PM, Vladimir Sementsov-Ogievskiy wrote:

Don't compile-in replication-related files when replication is disabled
in config.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

I'm unsure whether there should actually be separate
--disable-colo / --enable-colo options, or whether it's really only used together
with the replication staff.. So, I decided to start with the simpler variant.


You probably meant 'stuff' and not 'staff' in the commit message and 
here as well?




  block/meson.build |  2 +-
  migration/meson.build |  6 --
  net/meson.build   |  8 
  qapi/migration.json   |  6 --
  stubs/colo.c  | 46 +++
  stubs/meson.build |  1 +
  6 files changed, 60 insertions(+), 9 deletions(-)
  create mode 100644 stubs/colo.c

diff --git a/block/meson.build b/block/meson.build
index 382bec0e7d..b9a72e219b 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: 
files('file-win32.c', 'win32-aio.c')
  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
iokit])
  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
-if not get_option('replication').disabled()
+if get_option('replication').allowed()
block_ss.add(files('replication.c'))
  endif
  block_ss.add(when: libaio, if_true: files('linux-aio.c'))
diff --git a/migration/meson.build b/migration/meson.build
index 0d1bb9f96e..8180eaea7b 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,8 +13,6 @@ softmmu_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
-  'colo-failover.c',
-  'colo.c',
'exec.c',
'fd.c',
'global_state.c',
@@ -29,6 +27,10 @@ softmmu_ss.add(files(
'threadinfo.c',
  ), gnutls)
  
+if get_option('replication').allowed()
+  softmmu_ss.add(files('colo.c', 'colo-failover.c'))
+endif
+
  softmmu_ss.add(when: rdma, if_true: files('rdma.c'))
  if get_option('live_block_migration').allowed()
softmmu_ss.add(files('block.c'))
diff --git a/net/meson.build b/net/meson.build
index 87afca3e93..634ab71cc6 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -1,13 +1,9 @@
  softmmu_ss.add(files(
'announce.c',
'checksum.c',
-  'colo-compare.c',
-  'colo.c',
'dump.c',
'eth.c',
'filter-buffer.c',
-  'filter-mirror.c',
-  'filter-rewriter.c',
'filter.c',
'hub.c',
'net-hmp-cmds.c',
@@ -19,6 +15,10 @@ softmmu_ss.add(files(
'util.c',
  ))
  
+if get_option('replication').allowed()
+  softmmu_ss.add(files('colo-compare.c', 'colo.c', 'filter-rewriter.c', 'filter-mirror.c'))
+endif
+
  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
  
  if have_l2tpv3

diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..5b81e09369 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1685,7 +1685,8 @@
  ##
  { 'struct': 'COLOStatus',
'data': { 'mode': 'COLOMode', 'last-mode': 'COLOMode',
-'reason': 'COLOExitReason' } }
+'reason': 'COLOExitReason' },
+  'if': 'CONFIG_REPLICATION' }
  
  ##
  # @query-colo-status:
@@ -1702,7 +1703,8 @@
  # Since: 3.1
  ##
  { 'command': 'query-colo-status',
-  'returns': 'COLOStatus' }
+  'returns': 'COLOStatus',
+  'if': 'CONFIG_REPLICATION' }
  
  ##
  # @migrate-recover:
diff --git a/stubs/colo.c b/stubs/colo.c
new file mode 100644
index 00..5a02540baa
--- /dev/null
+++ b/stubs/colo.c
@@ -0,0 +1,46 @@
+#include "qemu/osdep.h"
+#include "qemu/notify.h"
+#include "net/colo-compare.h"
+#include "migration/colo.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-migration.h"
+
+void colo_compare_cleanup(void)
+{
+abort();
+}
+
+void colo_shutdown(void)
+{
+abort();
+}
+
+void *colo_process_incoming_thread(void *opaque)
+{
+abort();
+}
+
+void colo_checkpoint_notify(void *opaque)
+{
+abort();
+}
+
+void migrate_start_colo_process(MigrationState *s)
+{
+abort();
+}
+
+bool migration_in_colo_state(void)
+{
+return false;
+}
+
+bool migration_incoming_in_colo_state(void)
+{
+return false;
+}
+
+void qmp_x_colo_lost_heartbeat(Error **errp)
+{
+error_setg(errp, "COLO support is not built in");


Maybe 'built-in' with a dash for consistency with usb-dev-stub?


+}
diff --git a/stubs/meson.build b/stubs/meson.build
index b2b5956d97..8412cad15f 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -45,6 +45,7 @@ stub_ss.add(files('target-get-monitor-def.c'))
  stub_ss.add(files('target-monitor-defs.c'))
  stub_ss.add(files('trace-control.c'))
  stub_ss.add(files('uuid.c'))
+stub_ss.add(files('colo.c'))
  stub_ss.add(files('vmstate.c'))
  stub_ss.add(files('vm-stop.c'))
  stub_ss.add(files('win32-kbd-hook.c'))




Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Hanna Czenczek

On 12.04.23 22:51, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:

If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
setting the vhost features will set this feature, too.  Doing so
disables all vrings, which may not be intended.

For example, enabling or disabling logging during migration requires
setting those features (to set or unset VHOST_F_LOG_ALL), which will
automatically disable all vrings.  In either case, the VM is running
(disabling logging is done after a failed or cancelled migration, and
only once the VM is running again, see comment in
memory_global_dirty_log_stop()), so the vrings should really be enabled.
As a result, the back-end seems to hang.

To fix this, we must remember whether the vrings are supposed to be
enabled, and, if so, re-enable them after a SET_FEATURES call that set
VHOST_USER_F_PROTOCOL_FEATURES.

It seems less than ideal that there is a short period in which the VM is
running but the vrings will be stopped (between SET_FEATURES and
SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
e.g. by introducing a new flag or vhost-user protocol feature to disable
disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
new functions for setting/clearing singular feature bits (so that
F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).

Even with such a potential addition to the protocol, we still need this
fix here, because we cannot expect that back-ends will implement this
addition.

Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost.h | 10 ++
  hw/virtio/vhost.c | 13 +
  2 files changed, 23 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..2fe02ed5d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,16 @@ struct vhost_dev {
  int vq_index_end;
  /* if non-zero, minimum required value for max_queues */
  int num_queues;
+
+/*
+ * Whether the virtqueues are supposed to be enabled (via
+ * SET_VRING_ENABLE).  Setting the features (e.g. for
+ * enabling/disabling logging) will disable all virtqueues if
+ * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
+ * re-enable them if this field is set.
+ */
+bool enable_vqs;
+
  /**
   * vhost feature handling requires matching the feature set
   * offered by a backend which may be a subset of the total
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..cbff589efa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -50,6 +50,8 @@ static unsigned int used_memslots;
  static QLIST_HEAD(, vhost_dev) vhost_devices =
  QLIST_HEAD_INITIALIZER(vhost_devices);
  
+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
+
  bool vhost_has_free_slot(void)
  {
  unsigned int slots_limit = ~0U;
@@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
  }
  }
  
+if (dev->enable_vqs) {
+/*
+ * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
+ * virtqueues, even if that was not intended; re-enable them if
+ * necessary.
+ */
+vhost_dev_set_vring_enable(dev, true);
+}
+
  out:
  return r;
  }
@@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, 
uint16_t queue_size,
  
  static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
  {
+hdev->enable_vqs = enable;
+
  if (!hdev->vhost_ops->vhost_set_vring_enable) {
  return 0;
  }

The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled
at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is
intended to be used like that. This issue shows why doing so is a bad
idea.

VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging
is controlled at runtime by the presence of the dirty log
(VHOST_USER_SET_LOG_BASE) and the per-vring logging flag
(VHOST_VRING_F_LOG).


Technically, the spec doesn’t say that SET_LOG_BASE is required.  It says:

“To start/stop logging of data/used ring writes, the front-end may send 
messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and 
VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to 
1/0, respectively.”


(So the spec also very much does imply that toggling F_LOG_ALL at 
runtime is a valid way to enable/disable logging.  If we were to no 
longer do that, we should clarify it there.)


I mean, naturally, logging without a shared memory area to log in to 
isn’t much fun, so we could clarify that SET_LOG_BASE is also a 
requirement, but it looks to me as if we can’t use SET_LOG_BASE to 
disable logging, because it’s supposed to always pass a valid FD (at 
least libvhost-user expects this: 
https://gitlab.com/qemu-project/qemu/-/blob/master/subprojects/libvhost-user/libvhost-user.c#L1044). 
So after a cancelled migr

Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-04-13 Thread Hanna Czenczek

On 12.04.23 23:00, Stefan Hajnoczi wrote:

Hi,
Is there a vhost-user.rst spec patch?


Ah, right, I forgot.

Will add!

Hanna




Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Eugenio Perez Martin
On Tue, Apr 11, 2023 at 5:33 PM Hanna Czenczek  wrote:
>
> So-called "internal" virtio-fs migration refers to transporting the
> back-end's (virtiofsd's) state through qemu's migration stream.  To do
> this, we need to be able to transfer virtiofsd's internal state to and
> from virtiofsd.
>
> Because virtiofsd's internal state will not be too large, we believe it
> is best to transfer it as a single binary blob after the streaming
> phase.  Because this method should be useful to other vhost-user
> implementations, too, it is introduced as a general-purpose addition to
> the protocol, not limited to vhost-user-fs.
>
> These are the additions to the protocol:
> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
>   This feature signals support for transferring state, and is added so
>   that migration can fail early when the back-end has no support.
>
> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
>   over which to transfer the state.  The front-end sends an FD to the
>   back-end into/from which it can write/read its state, and the back-end
>   can decide to either use it, or reply with a different FD for the
>   front-end to override the front-end's choice.
>   The front-end creates a simple pipe to transfer the state, but maybe
>   the back-end already has an FD into/from which it has to write/read
>   its state, in which case it will want to override the simple pipe.
>   Conversely, maybe in the future we find a way to have the front-end
>   get an immediate FD for the migration stream (in some cases), in which
>   case we will want to send this to the back-end instead of creating a
>   pipe.
>   Hence the negotiation: If one side has a better idea than a plain
>   pipe, we will want to use that.
>
> - CHECK_DEVICE_STATE: After the state has been transferred through the
>   pipe (the end indicated by EOF), the front-end invokes this function
>   to verify success.  There is no in-band way (through the pipe) to
>   indicate failure, so we need to check explicitly.
>
> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> (which includes establishing the direction of transfer and migration
> phase), the sending side writes its data into the pipe, and the reading
> side reads it until it sees an EOF.  Then, the front-end will check for
> success via CHECK_DEVICE_STATE, which on the destination side includes
> checking for integrity (i.e. errors during deserialization).
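As a side note, the EOF-terminated blob transfer described above can be sketched with a plain pipe; the helper below is only an illustration of the read side, not the proposed implementation (in the real SET_DEVICE_STATE_FD exchange, one end of the pipe would be passed to the other process):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Read the state blob until EOF (read() returning 0) or the buffer is
 * full. The sender signals the end of its state simply by closing its
 * end of the pipe; success is then verified out of band, as with
 * CHECK_DEVICE_STATE in the proposal above. */
static ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t got = 0;
    ssize_t n = 0;
    while (got < cap && (n = read(fd, buf + got, cap - got)) > 0) {
        got += (size_t)n;
    }
    return n < 0 ? -1 : (ssize_t)got;
}
```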
>
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Hanna Czenczek 
> ---
>  include/hw/virtio/vhost-backend.h |  24 +
>  include/hw/virtio/vhost.h |  79 
>  hw/virtio/vhost-user.c| 147 ++
>  hw/virtio/vhost.c |  37 
>  4 files changed, 287 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-backend.h 
> b/include/hw/virtio/vhost-backend.h
> index ec3fbae58d..5935b32fe3 100644
> --- a/include/hw/virtio/vhost-backend.h
> +++ b/include/hw/virtio/vhost-backend.h
> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
>  VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
>  } VhostSetConfigType;
>
> +typedef enum VhostDeviceStateDirection {
> +/* Transfer state from back-end (device) to front-end */
> +VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> +/* Transfer state from front-end to back-end (device) */
> +VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
> +} VhostDeviceStateDirection;
> +
> +typedef enum VhostDeviceStatePhase {
> +/* The device (and all its vrings) is stopped */
> +VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
> +} VhostDeviceStatePhase;
> +
>  struct vhost_inflight;
>  struct vhost_dev;
>  struct vhost_log;
> @@ -133,6 +145,15 @@ typedef int (*vhost_set_config_call_op)(struct vhost_dev 
> *dev,
>
>  typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>
> +typedef bool (*vhost_supports_migratory_state_op)(struct vhost_dev *dev);
> +typedef int (*vhost_set_device_state_fd_op)(struct vhost_dev *dev,
> +VhostDeviceStateDirection 
> direction,
> +VhostDeviceStatePhase phase,
> +int fd,
> +int *reply_fd,
> +Error **errp);
> +typedef int (*vhost_check_device_state_op)(struct vhost_dev *dev, Error 
> **errp);
> +
>  typedef struct VhostOps {
>  VhostBackendType backend_type;
>  vhost_backend_init vhost_backend_init;
> @@ -181,6 +202,9 @@ typedef struct VhostOps {
>  vhost_force_iommu_op vhost_force_iommu;
>  vhost_set_config_call_op vhost_set_config_call;
>  vhost_reset_status_op vhost_reset_status;
> +vhost_supports_migratory_state_op vhost_supports_migratory_state;
> +vhost_set_device_state_fd_op vhost_set_device_state_fd;
> +vhost_check_device_state_op vhost_check_device_state;
>  } VhostOps;
>
>  int vhost_backend_upda

[PATCH 1/6] target/riscv: Update pmp_get_tlb_size()

2023-04-13 Thread Weiwei Li
Not only the matched PMP entry: any PMP entry that overlaps with part of
the TLB page may give the regions in that page different permission
rights, so all of them should be taken into consideration.
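The overlap rule the patch below implements can be condensed as follows (PAGE_SIZE and the table layout are illustrative stand-ins for TARGET_PAGE_SIZE and env->pmp_state):

```c
#include <assert.h>

#define PAGE_SIZE 4096u  /* stand-in for TARGET_PAGE_SIZE */

/* If any PMP entry's start or end address lands inside the TLB page,
 * without the entry covering the page exactly, parts of the page have
 * different permissions, so the TLB size is dropped to 1 and the
 * translation is not cached. Entries are given as {start, end} pairs. */
static unsigned tlb_size_for(unsigned long addr,
                             const unsigned long (*pmp)[2], int n)
{
    unsigned long sa = addr & ~(unsigned long)(PAGE_SIZE - 1);
    unsigned long ea = sa + PAGE_SIZE - 1;

    for (int i = 0; i < n; i++) {
        unsigned long psa = pmp[i][0], pea = pmp[i][1];
        if (((psa >= sa && psa <= ea) || (pea >= sa && pea <= ea)) &&
            !(psa == sa && pea == ea)) {
            return 1;            /* partial overlap: single-use translation */
        }
    }
    return PAGE_SIZE;            /* no partial overlap: full page cacheable */
}
```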

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c |  7 ++-
 target/riscv/pmp.c| 34 +-
 target/riscv/pmp.h|  3 +--
 3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 433ea529b0..075fc0538a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -703,11 +703,8 @@ static int get_physical_address_pmp(CPURISCVState *env, 
int *prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if ((tlb_size != NULL) && pmp_index != MAX_RISCV_PMPS) {
-target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
-target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
-
-*tlb_size = pmp_get_tlb_size(env, pmp_index, tlb_sa, tlb_ea);
+if (tlb_size != NULL) {
+*tlb_size = pmp_get_tlb_size(env, addr);
 }
 
 return TRANSLATE_SUCCESS;
diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1f5aca42e8..4f9389e73c 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -601,28 +601,36 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
 }
 
 /*
- * Calculate the TLB size if the start address or the end address of
+ * Calculate the TLB size if any start address or the end address of
  * PMP entry is presented in the TLB page.
  */
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea)
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr)
 {
-target_ulong pmp_sa = env->pmp_state.addr[pmp_index].sa;
-target_ulong pmp_ea = env->pmp_state.addr[pmp_index].ea;
+target_ulong pmp_sa;
+target_ulong pmp_ea;
+target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
+target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
+int i;
+
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+pmp_sa = env->pmp_state.addr[i].sa;
+pmp_ea = env->pmp_state.addr[i].ea;
 
-if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
-return TARGET_PAGE_SIZE;
-} else {
 /*
- * At this point we have a tlb_size that is the smallest possible size
- * That fits within a TARGET_PAGE_SIZE and the PMP region.
- *
- * If the size is less then TARGET_PAGE_SIZE we drop the size to 1.
+ * If any start address or the end address of PMP entry is presented
+ * in the TLB page and cannot override the whole TLB page we drop the
+ * size to 1.
  * This means the result isn't cached in the TLB and is only used for
  * a single translation.
  */
-return 1;
+if (((pmp_sa >= tlb_sa && pmp_sa <= tlb_ea) ||
+ (pmp_ea >= tlb_sa && pmp_ea <= tlb_ea)) &&
+!(pmp_sa == tlb_sa && pmp_ea == tlb_ea)) {
+return 1;
+}
 }
+
+return TARGET_PAGE_SIZE;
 }
 
 /*
diff --git a/target/riscv/pmp.h b/target/riscv/pmp.h
index b296ea1fc6..0a7e24750b 100644
--- a/target/riscv/pmp.h
+++ b/target/riscv/pmp.h
@@ -76,8 +76,7 @@ int pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
target_ulong size, pmp_priv_t privs,
pmp_priv_t *allowed_privs,
target_ulong mode);
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea);
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr);
 void pmp_update_rule_addr(CPURISCVState *env, uint32_t pmp_index);
 void pmp_update_rule_nums(CPURISCVState *env);
 uint32_t pmp_get_num_rules(CPURISCVState *env);
-- 
2.25.1




[PATCH 6/6] accel/tcg: Remain TLB_INVALID_MASK in the address when TLB is re-filled

2023-04-13 Thread Weiwei Li
When a PMP entry overlaps part of the page, we'll set the tlb_size to 1, and
this will make the address be set with TLB_INVALID_MASK so that the page is
not cached. However, if we clear TLB_INVALID_MASK when the TLB is re-filled,
the TLB host address will be cached, and the following instructions can use
this host address directly, which may lead to bypassing the PMP-related checks.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 accel/tcg/cputlb.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e984a98dc4..d0bf996405 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1563,13 +1563,6 @@ static int probe_access_internal(CPUArchState *env, 
target_ulong addr,
 /* TLB resize via tlb_fill may have moved the entry.  */
 index = tlb_index(env, mmu_idx, addr);
 entry = tlb_entry(env, mmu_idx, addr);
-
-/*
- * With PAGE_WRITE_INV, we set TLB_INVALID_MASK immediately,
- * to force the next access through tlb_fill.  We've just
- * called tlb_fill, so we know that this entry *is* valid.
- */
-flags &= ~TLB_INVALID_MASK;
 }
 tlb_addr = tlb_read_ofs(entry, elt_ofs);
 }
-- 
2.25.1




[PATCH 0/6] target/riscv: Fix PMP related problem

2023-04-13 Thread Weiwei Li
This patchset tries to fix the PMP bypass problem reported in issue 
https://gitlab.com/qemu-project/qemu/-/issues/1542

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-pmp-fix

Weiwei Li (6):
  target/riscv: Update pmp_get_tlb_size()
  target/riscv: Move pmp_get_tlb_size apart from
get_physical_address_pmp
  target/riscv: flush tlb when pmpaddr is updated
  target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes
  target/riscv: flush tb when PMP entry changes
  accel/tcg: Remain TLB_INVALID_MASK in the address when TLB is
re-filled

 accel/tcg/cputlb.c|  7 -
 target/riscv/cpu_helper.c | 19 -
 target/riscv/pmp.c| 60 ++-
 target/riscv/pmp.h|  3 +-
 4 files changed, 47 insertions(+), 42 deletions(-)

-- 
2.25.1




[PATCH 2/6] target/riscv: Move pmp_get_tlb_size apart from get_physical_address_pmp

2023-04-13 Thread Weiwei Li
pmp_get_tlb_size currently has no relationship with the PMP-related
permission check.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/cpu_helper.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 075fc0538a..83c9699a6d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -676,14 +676,11 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
  *
  * @env: CPURISCVState
  * @prot: The returned protection attributes
- * @tlb_size: TLB page size containing addr. It could be modified after PMP
- *permission checking. NULL if not set TLB page for addr.
  * @addr: The physical address to be checked permission
  * @access_type: The type of MMU access
  * @mode: Indicates current privilege level.
  */
-static int get_physical_address_pmp(CPURISCVState *env, int *prot,
-target_ulong *tlb_size, hwaddr addr,
+static int get_physical_address_pmp(CPURISCVState *env, int *prot, hwaddr addr,
 int size, MMUAccessType access_type,
 int mode)
 {
@@ -703,9 +700,6 @@ static int get_physical_address_pmp(CPURISCVState *env, int 
*prot,
 }
 
 *prot = pmp_priv_to_page_prot(pmp_priv);
-if (tlb_size != NULL) {
-*tlb_size = pmp_get_tlb_size(env, addr);
-}
 
 return TRANSLATE_SUCCESS;
 }
@@ -905,7 +899,7 @@ restart:
 }
 
 int pmp_prot;
-int pmp_ret = get_physical_address_pmp(env, &pmp_prot, NULL, pte_addr,
+int pmp_ret = get_physical_address_pmp(env, &pmp_prot, pte_addr,
sizeof(target_ulong),
MMU_DATA_LOAD, PRV_S);
 if (pmp_ret != TRANSLATE_SUCCESS) {
@@ -1300,8 +1294,9 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 prot &= prot2;
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
+tlb_size = pmp_get_tlb_size(env, pa);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
@@ -1333,8 +1328,9 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
   __func__, address, ret, pa, prot);
 
 if (ret == TRANSLATE_SUCCESS) {
-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
size, access_type, mode);
+tlb_size = pmp_get_tlb_size(env, pa);
 
 qemu_log_mask(CPU_LOG_MMU,
   "%s PMP address=" HWADDR_FMT_plx " ret %d prot"
-- 
2.25.1




[PATCH 4/6] target/riscv: Flush TLB only when pmpcfg/pmpaddr really changes

2023-04-13 Thread Weiwei Li
The TLB needn't be flushed when pmpcfg/pmpaddr doesn't change.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 6d4813806b..aced23c4d5 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -26,7 +26,7 @@
 #include "trace.h"
 #include "exec/exec-all.h"
 
-static void pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
   uint8_t val);
 static uint8_t pmp_read_cfg(CPURISCVState *env, uint32_t addr_index);
 static void pmp_update_rule(CPURISCVState *env, uint32_t pmp_index);
@@ -83,7 +83,7 @@ static inline uint8_t pmp_read_cfg(CPURISCVState *env, uint32_t pmp_index)
  * Accessor to set the cfg reg for a specific PMP/HART
  * Bounds checks and relevant lock bit.
  */
-static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
+static bool pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
 {
 if (pmp_index < MAX_RISCV_PMPS) {
 bool locked = true;
@@ -119,14 +119,17 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
 
 if (locked) {
 qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
-} else {
+} else if (env->pmp_state.pmp[pmp_index].cfg_reg != val) {
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
 pmp_update_rule(env, pmp_index);
+return true;
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpcfg write - out of bounds\n");
 }
+
+return false;
 }
 
 static void pmp_decode_napot(target_ulong a, target_ulong *sa,
@@ -477,16 +480,19 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
 int i;
 uint8_t cfg_val;
 int pmpcfg_nums = 2 << riscv_cpu_mxl(env);
+bool modified = false;
 
 trace_pmpcfg_csr_write(env->mhartid, reg_index, val);
 
 for (i = 0; i < pmpcfg_nums; i++) {
 cfg_val = (val >> 8 * i)  & 0xff;
-pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
+modified |= pmp_write_cfg(env, (reg_index * 4) + i, cfg_val);
 }
 
 /* If PMP permission of any addr has been changed, flush TLB pages. */
-tlb_flush(env_cpu(env));
+if (modified) {
+tlb_flush(env_cpu(env));
+}
 }
 
 
@@ -535,9 +541,11 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
 }
 
 if (!pmp_is_locked(env, addr_index)) {
-env->pmp_state.pmp[addr_index].addr_reg = val;
-pmp_update_rule(env, addr_index);
-tlb_flush(env_cpu(env));
+if (env->pmp_state.pmp[addr_index].addr_reg != val) {
+env->pmp_state.pmp[addr_index].addr_reg = val;
+pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
+}
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1




[PATCH 3/6] target/riscv: flush tlb when pmpaddr is updated

2023-04-13 Thread Weiwei Li
TLB should be flushed not only for pmpcfg csr changes, but also for
pmpaddr csr changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 4f9389e73c..6d4813806b 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -537,6 +537,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
 if (!pmp_is_locked(env, addr_index)) {
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
+tlb_flush(env_cpu(env));
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
   "ignoring pmpaddr write - locked\n");
-- 
2.25.1




[RFC PATCH] softmmu/vl: fix typo for PHASE_MACHINE_INITIALIZED

2023-04-13 Thread Alex Bennée
Otherwise people might get confused grepping for
MACHINE_PHASE_INITIALIZED and find nothing refers to it.

Signed-off-by: Alex Bennée 
---
 softmmu/vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index ea20b23e4c..1b76fbb656 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2509,7 +2509,7 @@ static void qemu_init_board(void)
 /* process plugin before CPUs are created, but once -smp has been parsed */
 qemu_plugin_load_list(&plugin_list, &error_fatal);
 
-/* From here on we enter MACHINE_PHASE_INITIALIZED.  */
+/* From here on we enter PHASE_MACHINE_INITIALIZED.  */
 machine_run_board_init(current_machine, mem_path, &error_fatal);
 
 drive_check_orphaned();
-- 
2.39.2




[PATCH 5/6] target/riscv: flush tb when PMP entry changes

2023-04-13 Thread Weiwei Li
Translation blocks may also be affected when a PMP entry changes.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 target/riscv/pmp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index aced23c4d5..c2db52361f 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -25,6 +25,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "exec/exec-all.h"
+#include "exec/tb-flush.h"
 
 static bool pmp_write_cfg(CPURISCVState *env, uint32_t addr_index,
   uint8_t val);
@@ -492,6 +493,7 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
 /* If PMP permission of any addr has been changed, flush TLB pages. */
 if (modified) {
 tlb_flush(env_cpu(env));
+tb_flush(env_cpu(env));
 }
 }
 
@@ -545,6 +547,7 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
 tlb_flush(env_cpu(env));
+tb_flush(env_cpu(env));
 }
 } else {
 qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.25.1




Re: [PATCH 3/4] vhost: Add high-level state save/load functions

2023-04-13 Thread Hanna Czenczek

On 12.04.23 23:14, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:14PM +0200, Hanna Czenczek wrote:

vhost_save_backend_state() and vhost_load_backend_state() can be used by
vhost front-ends to easily save and load the back-end's state to/from
the migration stream.

Because we do not know the full state size ahead of time,
vhost_save_backend_state() simply reads the data in 1 MB chunks, and
writes each chunk consecutively into the migration stream, prefixed by
its length.  EOF is indicated by a 0-length chunk.

Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost.h |  35 +++
  hw/virtio/vhost.c | 196 ++
  2 files changed, 231 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 29449e0fe2..d1f1e9e1f3 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
   */
  int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
  
+/**

+ * vhost_save_backend_state(): High-level function to receive a vhost
+ * back-end's state, and save it in `f`.  Uses
+ * `vhost_set_device_state_fd()` to get the data from the back-end, and
+ * stores it in consecutive chunks that are each prefixed by their
+ * respective length (be32).  The end is marked by a 0-length chunk.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device from which to save the state
+ * @f: Migration stream in which to save the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
+/**
+ * vhost_load_backend_state(): High-level function to load a vhost
+ * back-end's state from `f`, and send it over to the back-end.  Reads
+ * the data from `f` in the format used by `vhost_save_state()`, and
+ * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
+ *
+ * Must only be called while the device and all its vrings are stopped
+ * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
+ *
+ * @dev: The vhost device to which to send the state
+ * @f: Migration stream from which to load the state
+ * @errp: Potential error message
+ *
+ * Returns 0 on success, and -errno otherwise.
+ */
+int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp);
+
  #endif
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 90099d8f6a..d08849c691 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2125,3 +2125,199 @@ int vhost_check_device_state(struct vhost_dev *dev, Error **errp)
 "vhost transport does not support migration state transfer");
  return -ENOSYS;
  }
+
+int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error **errp)
+{
+/* Maximum chunk size in which to transfer the state */
+const size_t chunk_size = 1 * 1024 * 1024;
+void *transfer_buf = NULL;
+g_autoptr(GError) g_err = NULL;
+int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
+int ret;
+
+/* [0] for reading (our end), [1] for writing (back-end's end) */
+if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
+error_setg(errp, "Failed to set up state transfer pipe: %s",
+   g_err->message);
+ret = -EINVAL;
+goto fail;
+}
+
+read_fd = pipe_fds[0];
+write_fd = pipe_fds[1];
+
+/* VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be stopped */
+assert(!dev->started && !dev->enable_vqs);
+
+/* Transfer ownership of write_fd to the back-end */
+ret = vhost_set_device_state_fd(dev,
+VHOST_TRANSFER_STATE_DIRECTION_SAVE,
+VHOST_TRANSFER_STATE_PHASE_STOPPED,
+write_fd,
+&reply_fd,
+errp);
+if (ret < 0) {
+error_prepend(errp, "Failed to initiate state transfer: ");
+goto fail;
+}
+
+/* If the back-end wishes to use a different pipe, switch over */
+if (reply_fd >= 0) {
+close(read_fd);
+read_fd = reply_fd;
+}
+
+transfer_buf = g_malloc(chunk_size);
+
+while (true) {
+ssize_t read_ret;
+
+read_ret = read(read_fd, transfer_buf, chunk_size);
+if (read_ret < 0) {
+ret = -errno;
+error_setg_errno(errp, -ret, "Failed to receive state");
+goto fail;
+}
+
+assert(read_ret <= chunk_size);
+qemu_put_be32(f, read_ret);
+
+if (read_ret == 0) {
+/* EOF */
+break;
+}
+
+qemu_put_buffer(f, transfer_buf, read_ret);
+}

I think this synchronous approach with a single contiguous stream of
chunks is okay for now.

Does this make the QEMU monitor unresponsive if the backend is slow?

Re: [PATCH for-8.0 0/5] Xen emulation build/Coverity fixes

2023-04-13 Thread Peter Maydell
On Wed, 12 Apr 2023 at 20:01, David Woodhouse  wrote:
>
> On Wed, 2023-04-12 at 19:55 +0100, Peter Maydell wrote:
> > On Wed, 12 Apr 2023 at 19:52, David Woodhouse  wrote:
> > >
> > > Some Coverity fixes and minor cleanups. And most notably, dropping
> > > support for Xen libraries older than 4.7.1.
> > >
> > > I believe there are two issues that remain to be fixed. The x32 build
> > > fails, and I've seen patches which attempt to detect x32 and disable
> > > the Xen emulation. Along with assertions that we just shouldn't care.
> > > I don't have a strong opinion either way but it seems to be in hand.
> > >
> > > The other is the question of what Xen *actually* does if you try to
> > > unmap an IRQ_MSI_EMU PIRQ. I don't think Linux guests try that, and
> > > I'm fairly sure Windows doesn't even use MSI→PIRQ mappings in the
> > > first place, and I doubt any other guests care either. I'd like to
> > > establish the 'correct' behaviour and implement it, ideally before
> > > the 8.0 release, but it's going to take me a few days more.
> > >
> > > David Woodhouse (5):
> > >   hw/xen: Simplify emulated Xen platform init
> > >   hw/xen: Fix memory leak in libxenstore_open() for Xen
> > >   xen: Drop support for Xen versions below 4.7.1
> > >   hw/xen: Fix double-free in xen_console store_con_info()
> > >   hw/xen: Fix broken check for invalid state in xs_be_open()
> > >
> >
> > This is highly unlikely to make 8.0 at this point, FYI.
> > If there's anything in this you think is super-critical we
> > might be able to sneak it in.
>
> Nothing is super-critical except maybe the double-free in
> store_con_info(). That could lead to a crash on startup if the QEMU Xen
> console is being used.

I've cherry-picked that double-free patch to apply for 8.0; thanks.

-- PMM



[PATCH v3] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-13 Thread Hao Zeng

Opened file pointer not closed; may cause file descriptor leaks.
Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange implementation")
ChangeLog:
v2->v3:
Submission of v3 on the basis of v2, based on Philippe Mathieu-Daudé's 
suggestion
"Pointless bzero in g_malloc0, however this code would be
 simplified using g_file_get_contents()."
v1->v2:
- Patch 1: No change in patch v1
- Patch 2: Fix the check on the return value of fread() in ct3_load_cdat

Signed-off-by: Zeng Hao 
Suggested-by: Philippe Mathieu-Daudé 
Suggested-by: Peter Maydell 
---
 hw/cxl/cxl-cdat.c | 30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
index 137abd0992..42c7c2031c 100644
--- a/hw/cxl/cxl-cdat.c
+++ b/hw/cxl/cxl-cdat.c
@@ -110,29 +110,17 @@ static void ct3_load_cdat(CDATObject *cdat, Error **errp)
 g_autofree CDATEntry *cdat_st = NULL;
 uint8_t sum = 0;
 int num_ent;
-int i = 0, ent = 1, file_size = 0;
+int i = 0, ent = 1;
+gsize file_size = 0;
 CDATSubHeader *hdr;
-FILE *fp = NULL;
-
+GError *error = NULL;
 /* Read CDAT file and create its cache */
-fp = fopen(cdat->filename, "r");
-if (!fp) {
-error_setg(errp, "CDAT: Unable to open file");
-return;
-}
-
-fseek(fp, 0, SEEK_END);
-file_size = ftell(fp);
-fseek(fp, 0, SEEK_SET);
-cdat->buf = g_malloc0(file_size);
-
-if (fread(cdat->buf, file_size, 1, fp) == 0) {
-error_setg(errp, "CDAT: File read failed");
+if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
+&file_size, &error)) {
+error_setg(errp, "CDAT: File read failed: %s", error->message);
+g_error_free(error);
 return;
 }
-
-fclose(fp);
-
 if (file_size < sizeof(CDATTableHeader)) {
 error_setg(errp, "CDAT: File too short");
 return;
@@ -218,7 +206,5 @@ void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
 cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
   cdat->private);
 }
-if (cdat->buf) {
-free(cdat->buf);
-}
+g_free(cdat->buf);
 }
-- 
2.37.2



Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Hanna Czenczek

On 12.04.23 23:06, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:

So-called "internal" virtio-fs migration refers to transporting the
back-end's (virtiofsd's) state through qemu's migration stream.  To do
this, we need to be able to transfer virtiofsd's internal state to and
from virtiofsd.

Because virtiofsd's internal state will not be too large, we believe it
is best to transfer it as a single binary blob after the streaming
phase.  Because this method should be useful to other vhost-user
implementations, too, it is introduced as a general-purpose addition to
the protocol, not limited to vhost-user-fs.

These are the additions to the protocol:
- New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
   This feature signals support for transferring state, and is added so
   that migration can fail early when the back-end has no support.

- SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
   over which to transfer the state.  The front-end sends an FD to the
   back-end into/from which it can write/read its state, and the back-end
   can decide to either use it, or reply with a different FD for the
   front-end to override the front-end's choice.
   The front-end creates a simple pipe to transfer the state, but maybe
   the back-end already has an FD into/from which it has to write/read
   its state, in which case it will want to override the simple pipe.
   Conversely, maybe in the future we find a way to have the front-end
   get an immediate FD for the migration stream (in some cases), in which
   case we will want to send this to the back-end instead of creating a
   pipe.
   Hence the negotiation: If one side has a better idea than a plain
   pipe, we will want to use that.

- CHECK_DEVICE_STATE: After the state has been transferred through the
   pipe (the end indicated by EOF), the front-end invokes this function
   to verify success.  There is no in-band way (through the pipe) to
   indicate failure, so we need to check explicitly.

Once the transfer pipe has been established via SET_DEVICE_STATE_FD
(which includes establishing the direction of transfer and migration
phase), the sending side writes its data into the pipe, and the reading
side reads it until it sees an EOF.  Then, the front-end will check for
success via CHECK_DEVICE_STATE, which on the destination side includes
checking for integrity (i.e. errors during deserialization).

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost-backend.h |  24 +
  include/hw/virtio/vhost.h |  79 
  hw/virtio/vhost-user.c| 147 ++
  hw/virtio/vhost.c |  37 
  4 files changed, 287 insertions(+)

diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index ec3fbae58d..5935b32fe3 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
  VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
  } VhostSetConfigType;
  
+typedef enum VhostDeviceStateDirection {

+/* Transfer state from back-end (device) to front-end */
+VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
+/* Transfer state from front-end to back-end (device) */
+VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
+} VhostDeviceStateDirection;
+
+typedef enum VhostDeviceStatePhase {
+/* The device (and all its vrings) is stopped */
+VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
+} VhostDeviceStatePhase;

vDPA has:

   /* Suspend a device so it does not process virtqueue requests anymore
*
* After the return of ioctl the device must preserve all the necessary state
* (the virtqueue vring base plus the possible device specific states) that 
is
* required for restoring in the future. The device must not change its
* configuration after that point.
*/
   #define VHOST_VDPA_SUSPEND  _IO(VHOST_VIRTIO, 0x7D)

   /* Resume a device so it can resume processing virtqueue requests
*
* After the return of this ioctl the device will have restored all the
* necessary states and it is fully operational to continue processing the
* virtqueue descriptors.
*/
   #define VHOST_VDPA_RESUME   _IO(VHOST_VIRTIO, 0x7E)

I wonder if it makes sense to import these into vhost-user so that the
difference between kernel vhost and vhost-user is minimized. It's okay
if one of them is ahead of the other, but it would be nice to avoid
overlapping/duplicated functionality.

(And I hope vDPA will import the device state vhost-user messages
introduced in this series.)


I don’t understand your suggestion.  (Like, I very simply don’t 
understand :))


These are vhost messages, right?  What purpose do you have in mind for 
them in vhost-user for internal migration?  They’re different from the 
state transfer messages, because they don’t transfer state to/from the 
front-end.  Als

Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Hanna Czenczek

On 13.04.23 10:50, Eugenio Perez Martin wrote:

On Tue, Apr 11, 2023 at 5:33 PM Hanna Czenczek  wrote:

So-called "internal" virtio-fs migration refers to transporting the
back-end's (virtiofsd's) state through qemu's migration stream.  To do
this, we need to be able to transfer virtiofsd's internal state to and
from virtiofsd.

Because virtiofsd's internal state will not be too large, we believe it
is best to transfer it as a single binary blob after the streaming
phase.  Because this method should be useful to other vhost-user
implementations, too, it is introduced as a general-purpose addition to
the protocol, not limited to vhost-user-fs.

These are the additions to the protocol:
- New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
   This feature signals support for transferring state, and is added so
   that migration can fail early when the back-end has no support.

- SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
   over which to transfer the state.  The front-end sends an FD to the
   back-end into/from which it can write/read its state, and the back-end
   can decide to either use it, or reply with a different FD for the
   front-end to override the front-end's choice.
   The front-end creates a simple pipe to transfer the state, but maybe
   the back-end already has an FD into/from which it has to write/read
   its state, in which case it will want to override the simple pipe.
   Conversely, maybe in the future we find a way to have the front-end
   get an immediate FD for the migration stream (in some cases), in which
   case we will want to send this to the back-end instead of creating a
   pipe.
   Hence the negotiation: If one side has a better idea than a plain
   pipe, we will want to use that.

- CHECK_DEVICE_STATE: After the state has been transferred through the
   pipe (the end indicated by EOF), the front-end invokes this function
   to verify success.  There is no in-band way (through the pipe) to
   indicate failure, so we need to check explicitly.

Once the transfer pipe has been established via SET_DEVICE_STATE_FD
(which includes establishing the direction of transfer and migration
phase), the sending side writes its data into the pipe, and the reading
side reads it until it sees an EOF.  Then, the front-end will check for
success via CHECK_DEVICE_STATE, which on the destination side includes
checking for integrity (i.e. errors during deserialization).

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Hanna Czenczek 
---
  include/hw/virtio/vhost-backend.h |  24 +
  include/hw/virtio/vhost.h |  79 
  hw/virtio/vhost-user.c| 147 ++
  hw/virtio/vhost.c |  37 
  4 files changed, 287 insertions(+)


[...]


diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 2fe02ed5d4..29449e0fe2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -346,4 +346,83 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,


[...]


+/**
+ * vhost_set_device_state_fd(): After transferring state from/to the

Nitpick: This function doc is for vhost_check_device_state not
vhost_set_device_state_fd.

Thanks!


Oops, right, thanks!

Hanna


+ * back-end via vhost_set_device_state_fd(), i.e. once the sending end
+ * has closed the pipe, inquire the back-end to report any potential
+ * errors that have occurred on its side.  This allows to sense errors
+ * like:
+ * - During outgoing migration, when the source side had already started
+ *   to produce its state, something went wrong and it failed to finish
+ * - During incoming migration, when the received state is somehow
+ *   invalid and cannot be processed by the back-end
+ *
+ * @dev: The vhost device
+ * @errp: Potential error description
+ *
+ * Returns 0 when the back-end reports successful state transfer and
+ * processing, and -errno when an error occurred somewhere.
+ */
+int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
+





Re: [PATCH] target/riscv: Update check for Zca/zcf/zcd

2023-04-13 Thread Weiwei Li



On 2023/4/13 01:03, Daniel Henrique Barboza wrote:



On 4/12/23 00:06, Weiwei Li wrote:

Even though Zca/Zcf/Zcd can be included by C/F/D, their priv
version is higher than the priv version of C/F/D. So if we check
for them instead of checking for C/F/D entirely, it will trigger new
problems when we try to disable the extensions based on the configured
priv version.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---


Two things:

- the patch fails checkpatch.pl. I fixed it in my tree, but in case 
the patch

needs a new version entirely here's the error:

v7-0005-target-riscv-Mask-the-implicitly-enabled-extensio.patch has no 
obvious style problems and is ready for submission.


Checking v7-0006-target-riscv-Update-check-for-Zca-zcf-zcd.patch...
ERROR: space required before the open parenthesis '('
#36: FILE: target/riscv/insn_trans/trans_rvd.c.inc:36:
+    if(!has_ext(ctx, RVD) || !has_ext(ctx, RVC)) { \

ERROR: space required before the open parenthesis '('
#72: FILE: target/riscv/insn_trans/trans_rvf.c.inc:35:
+    if(!has_ext(ctx, RVF) || !has_ext(ctx, RVC)) { \



Sorry. I forgot to run the checkpatch.pl. I'll fix it later.


- yesterday Richard sent the following review in the patch "[RFC PATCH 
3/4]

target/riscv: check smstateen fcsr flag":





+#define REQUIRE_ZFINX_OR_F(ctx) do { \
+    if (!has_ext(ctx, RVF)) { \
+    if (!ctx->cfg_ptr->ext_zfinx) { \
+    return false; \
+    } \
+    smstateen_fcsr_check(ctx); \
  } \
  } while (0)


As a matter of style, I strongly object to a *nested* macro returning 
from the calling function.  These should all be changed to normal 
functions of the form


    if (!require_xyz(ctx) || !require_abc(ctx)) {
    return something;
    }

etc.  insn_trans/trans_rvv.c.inc is much much cleaner in this respect.



I believe his comment is also valid for this patch as well due to how
REQUIRE_ZCD_OR_DC(ctx) and REQUIRE_ZCF_OR_FC(ctx) is implemented. Before
re-sending this patch as is it's better to check with him now.


I think there is no nested macro in REQUIRE_ZCD_OR_DC (has_ext() is an
inline function).


Regards,

Weiwei Li



Richard, does this patch use the nested macro style you strongly object?


Thanks,


Daniel



  target/riscv/insn_trans/trans_rvd.c.inc | 12 +++-
  target/riscv/insn_trans/trans_rvf.c.inc | 14 --
  target/riscv/insn_trans/trans_rvi.c.inc |  5 +++--
  target/riscv/translate.c    |  5 +++--
  4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.c.inc b/target/riscv/insn_trans/trans_rvd.c.inc
index 2c51e01c40..f8d0ae48c7 100644
--- a/target/riscv/insn_trans/trans_rvd.c.inc
+++ b/target/riscv/insn_trans/trans_rvd.c.inc
@@ -31,9 +31,11 @@
  } \
  } while (0)
  -#define REQUIRE_ZCD(ctx) do { \
-    if (!ctx->cfg_ptr->ext_zcd) {  \
-    return false; \
+#define REQUIRE_ZCD_OR_DC(ctx) do { \
+    if (!ctx->cfg_ptr->ext_zcd) { \
+    if(!has_ext(ctx, RVD) || !has_ext(ctx, RVC)) { \
+    return false; \
+    } \
  } \
  } while (0)
  @@ -67,13 +69,13 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a)

    static bool trans_c_fld(DisasContext *ctx, arg_fld *a)
  {
-    REQUIRE_ZCD(ctx);
+    REQUIRE_ZCD_OR_DC(ctx);
  return trans_fld(ctx, a);
  }
    static bool trans_c_fsd(DisasContext *ctx, arg_fsd *a)
  {
-    REQUIRE_ZCD(ctx);
+    REQUIRE_ZCD_OR_DC(ctx);
  return trans_fsd(ctx, a);
  }
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc
index 9e9fa2087a..58467eb409 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -30,10 +30,12 @@
  } \
  } while (0)
  -#define REQUIRE_ZCF(ctx) do {  \
-    if (!ctx->cfg_ptr->ext_zcf) {  \
-    return false;  \
-    }  \
+#define REQUIRE_ZCF_OR_FC(ctx) do {    \
+    if (!ctx->cfg_ptr->ext_zcf) {  \
+    if(!has_ext(ctx, RVF) || !has_ext(ctx, RVC)) { \
+    return false;  \
+    }  \
+    }  \
  } while (0)
    static bool trans_flw(DisasContext *ctx, arg_flw *a)
@@ -69,13 +71,13 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
    static bool trans_c_flw(DisasContext *ctx, arg_flw *a)
  {
-    REQUIRE_ZCF(ctx);
+    REQUIRE_ZCF_OR_FC(ctx);
  return trans_flw(ctx, a);
  }
    static bool trans_c_fsw(DisasContext *ctx, arg_fsw *a)
  {
-    REQUIRE_ZCF(ctx);
+    REQUIRE_ZCF_OR_FC(ctx);
  return trans_fsw(ctx, a);
  }
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index c70c495fc5..e33f63bea1 100644
--- a/target/riscv/insn_trans/tr

Re: [PATCH v1 0/2] Update CXL documentation

2023-04-13 Thread Jonathan Cameron via
On Thu,  6 Apr 2023 18:58:37 +0530
Raghu H  wrote:

> Thanks Jonathan for quick review/comments on earlier patch, as suggested
> splitting into two separate patches
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg952999.html
> 
> Removed the unsupported size option for cxl-type3 device, Qemu reads
> the device size directly from the backend memory device config.
> 
> Currently the Qemu CXL emulation for AARCH64 is not available and its
> only supported on x86_64 platform emulations. Removing the incorrect
> information and populating with supported x86_64 sample command to
> emulate cxl devices.
> 
> The document will be updated when the AARCH64 support is mainlined.

Both look good to me.  No need to rush these in.

I'll queue these up in my local tree (and update gitlab/jic23/qemu
sometime later this week) but if anyone wants to pick
them directly that's fine too.

Reviewed-by: Jonathan Cameron 

> 
> 
> Raghu H (2):
>   docs/cxl: Remove incorrect CXL type 3 size parameter
>   docs/cxl: Replace unsupported AARCH64 with x86_64
> 
>  docs/system/devices/cxl.rst | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 7d0334e49111787ae19fbc8d29ff6e7347f0605e




[PATCH v4] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-13 Thread Hao Zeng

Opened file pointer not closed; may cause file descriptor leaks.
Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange implementation")

Signed-off-by: Zeng Hao 
Suggested-by: Philippe Mathieu-Daudé 
Suggested-by: Peter Maydell 

---
ChangeLog:
v3-v4:
Modify commit information, no code change.
v2->v3:
Submission of v3 on the basis of v2, based on Philippe Mathieu-Daudé's 
suggestion
"Pointless bzero in g_malloc0, however this code would be
 simplified using g_file_get_contents()."
v1->v2:
- Patch 1: No change in patch v1
- Patch 2: Fix the check on the return value of fread() in ct3_load_cdat
---
 hw/cxl/cxl-cdat.c | 30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
index 137abd0992..42c7c2031c 100644
--- a/hw/cxl/cxl-cdat.c
+++ b/hw/cxl/cxl-cdat.c
@@ -110,29 +110,17 @@ static void ct3_load_cdat(CDATObject *cdat, Error **errp)
 g_autofree CDATEntry *cdat_st = NULL;
 uint8_t sum = 0;
 int num_ent;
-int i = 0, ent = 1, file_size = 0;
+int i = 0, ent = 1;
+gsize file_size = 0;
 CDATSubHeader *hdr;
-FILE *fp = NULL;
-
+GError *error = NULL;
 /* Read CDAT file and create its cache */
-fp = fopen(cdat->filename, "r");
-if (!fp) {
-error_setg(errp, "CDAT: Unable to open file");
-return;
-}
-
-fseek(fp, 0, SEEK_END);
-file_size = ftell(fp);
-fseek(fp, 0, SEEK_SET);
-cdat->buf = g_malloc0(file_size);
-
-if (fread(cdat->buf, file_size, 1, fp) == 0) {
-error_setg(errp, "CDAT: File read failed");
+if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
+&file_size, &error)) {
+error_setg(errp, "CDAT: File read failed: %s", error->message);
+g_error_free(error);
 return;
 }
-
-fclose(fp);
-
 if (file_size < sizeof(CDATTableHeader)) {
 error_setg(errp, "CDAT: File too short");
 return;
@@ -218,7 +206,5 @@ void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
 cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
   cdat->private);
 }
-if (cdat->buf) {
-free(cdat->buf);
-}
+g_free(cdat->buf);
 }
-- 
2.37.2



RE: [PATCH] replication: compile out some staff when replication is not configured

2023-04-13 Thread Zhang, Chen



> -Original Message-
> From: Vladimir Sementsov-Ogievskiy 
> Sent: Tuesday, April 11, 2023 10:51 PM
> To: qemu-devel@nongnu.org
> Cc: qemu-bl...@nongnu.org; pbonz...@redhat.com; arm...@redhat.com;
> ebl...@redhat.com; jasow...@redhat.com; dgilb...@redhat.com;
> quint...@redhat.com; hre...@redhat.com; kw...@redhat.com; Zhang,
> Hailiang ; Zhang, Chen
> ; lizhij...@fujitsu.com;
> wencongya...@huawei.com; xiechanglon...@gmail.com; den-
> plotni...@yandex-team.ru; Vladimir Sementsov-Ogievskiy
> 
> Subject: [PATCH] replication: compile out some staff when replication is not
> configured
> 
> Don't compile-in replication-related files when replication is disabled in
> config.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all!
> 
> I'm unsure whether there should actually be separate --disable-colo /
> --enable-colo options, or whether COLO is really used only together with the
> replication stuff. So, I decided to start with the simpler variant.
> 

For replication, I think there's nothing wrong with the idea.
Not so for COLO, though. The COLO project consists of three independent parts:
replication, migration, and net-proxy.
Each one can run on its own for other purposes. For example, we can run just
filter-mirror/redirector for networking analysis/debugging. Although the best
practice for COLO is to make the three modules work together, we can also use
only some modules of COLO for other usage scenarios, like COLO migration +
net-proxy for a shared disk, etc.
So I think there is no need to disable all COLO-related modules when
replication is not configured.
For details:
https://wiki.qemu.org/Features/COLO

Thanks
Chen

> 
>  block/meson.build |  2 +-
>  migration/meson.build |  6 --
>  net/meson.build   |  8 
>  qapi/migration.json   |  6 --
>  stubs/colo.c  | 46 +++
>  stubs/meson.build |  1 +
>  6 files changed, 60 insertions(+), 9 deletions(-)  create mode 100644
> stubs/colo.c
> 
> diff --git a/block/meson.build b/block/meson.build index
> 382bec0e7d..b9a72e219b 100644
> --- a/block/meson.build
> +++ b/block/meson.build
> @@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-
> win32.c', 'win32-aio.c')
>  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
> iokit])
>  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
>  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) -if not
> get_option('replication').disabled()
> +if get_option('replication').allowed()
>block_ss.add(files('replication.c'))
>  endif
>  block_ss.add(when: libaio, if_true: files('linux-aio.c')) diff --git
> a/migration/meson.build b/migration/meson.build index
> 0d1bb9f96e..8180eaea7b 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -13,8 +13,6 @@ softmmu_ss.add(files(
>'block-dirty-bitmap.c',
>'channel.c',
>'channel-block.c',
> -  'colo-failover.c',
> -  'colo.c',
>'exec.c',
>'fd.c',
>'global_state.c',
> @@ -29,6 +27,10 @@ softmmu_ss.add(files(
>'threadinfo.c',
>  ), gnutls)
> 
> +if get_option('replication').allowed()
> +  softmmu_ss.add(files('colo.c', 'colo-failover.c')) endif
> +
>  softmmu_ss.add(when: rdma, if_true: files('rdma.c'))  if
> get_option('live_block_migration').allowed()
>softmmu_ss.add(files('block.c'))
> diff --git a/net/meson.build b/net/meson.build index
> 87afca3e93..634ab71cc6 100644
> --- a/net/meson.build
> +++ b/net/meson.build
> @@ -1,13 +1,9 @@
>  softmmu_ss.add(files(
>'announce.c',
>'checksum.c',
> -  'colo-compare.c',
> -  'colo.c',
>'dump.c',
>'eth.c',
>'filter-buffer.c',
> -  'filter-mirror.c',
> -  'filter-rewriter.c',
>'filter.c',
>'hub.c',
>'net-hmp-cmds.c',
> @@ -19,6 +15,10 @@ softmmu_ss.add(files(
>'util.c',
>  ))
> 
> +if get_option('replication').allowed()
> +  softmmu_ss.add(files('colo-compare.c', 'colo.c', 'filter-rewriter.c',
> +'filter-mirror.c')) endif
> +
>  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
> 
>  if have_l2tpv3
> diff --git a/qapi/migration.json b/qapi/migration.json index
> c84fa10e86..5b81e09369 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1685,7 +1685,8 @@
>  ##
>  { 'struct': 'COLOStatus',
>'data': { 'mode': 'COLOMode', 'last-mode': 'COLOMode',
> -'reason': 'COLOExitReason' } }
> +'reason': 'COLOExitReason' },
> +  'if': 'CONFIG_REPLICATION' }
> 
>  ##
>  # @query-colo-status:
> @@ -1702,7 +1703,8 @@
>  # Since: 3.1
>  ##
>  { 'command': 'query-colo-status',
> -  'returns': 'COLOStatus' }
> +  'returns': 'COLOStatus',
> +  'if': 'CONFIG_REPLICATION' }
> 
>  ##
>  # @migrate-recover:
> diff --git a/stubs/colo.c b/stubs/colo.c new file mode 100644 index
> 00..5a02540baa
> --- /dev/null
> +++ b/stubs/colo.c
> @@ -0,0 +1,46 @@
> +#include "qemu/osdep.h"
> +#include "qemu/notify.h"
> +#include "net/colo-compare.h"
> +#include "migration/colo.h"
> +#include "qapi

Re: [PATCH v3 18/20] bsd-user: Automatically generate syscall_nr.h

2023-04-13 Thread Richard Henderson

On 4/12/23 16:21, Warner Losh wrote:



On Wed, Apr 12, 2023 at 4:10 AM Richard Henderson wrote:


On 4/11/23 19:09, Warner Losh wrote:
 > +++ b/bsd-user/syscallhdr.sh
 > @@ -0,0 +1,7 @@
 > +#!/bin/sh
 > +
 > +in="$1"
 > +out="$2"
 > +bsd="$3"
 > +
awk -v bsd="$3" '{sub("SYS_", "TARGET_" bsd "_NR_", $0); print;}' < $in > $out

If the host/guest syscall numbers always match, there's no point in using
TARGET_freebsd_NR_foo at all -- just use the original SYS_foo symbol from 
.


Long term, this is likely correct. Short term, though, changing to SYS_foo
would cause quite a bit of churn that I'm looking to avoid.


Fair.

Reviewed-by: Richard Henderson 


r~
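
The substitution that syscallhdr.sh performs can be tried standalone. The
input line below is made up for illustration; the real input is the SYS_
defines from FreeBSD's syscall header:

```shell
# Feed a sample SYS_ define through the same awk substitution the
# script applies, prefixing it for the "freebsd" flavour.
printf '#define SYS_exit 1\n' \
  | awk -v bsd="freebsd" '{sub("SYS_", "TARGET_" bsd "_NR_", $0); print;}'
# Emits: #define TARGET_freebsd_NR_exit 1
```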



Re: [PATCH v2 27/54] tcg/riscv: Require TCG_TARGET_REG_BITS == 64

2023-04-13 Thread Daniel Henrique Barboza




On 4/10/23 22:04, Richard Henderson wrote:

The port currently does not support "oversize" guests, which
means riscv32 can only target 32-bit guests.  We will soon be
building TCG once for all guests.  This implies that we can
only support riscv64.

Since all Linux distributions target riscv64 not riscv32,
this is not much of a restriction and simplifies the code.

Signed-off-by: Richard Henderson 
---


Reviewed-by: Daniel Henrique Barboza 


  tcg/riscv/tcg-target-con-set.h |   6 -
  tcg/riscv/tcg-target.h |  22 ++--
  tcg/riscv/tcg-target.c.inc | 206 ++---
  3 files changed, 72 insertions(+), 162 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index cf0ac4d751..c11710d117 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -13,18 +13,12 @@ C_O0_I1(r)
  C_O0_I2(LZ, L)
  C_O0_I2(rZ, r)
  C_O0_I2(rZ, rZ)
-C_O0_I3(LZ, L, L)
-C_O0_I3(LZ, LZ, L)
-C_O0_I4(LZ, LZ, L, L)
  C_O0_I4(rZ, rZ, rZ, rZ)
  C_O1_I1(r, L)
  C_O1_I1(r, r)
-C_O1_I2(r, L, L)
  C_O1_I2(r, r, ri)
  C_O1_I2(r, r, rI)
  C_O1_I2(r, rZ, rN)
  C_O1_I2(r, rZ, rZ)
  C_O1_I4(r, rZ, rZ, rZ, rZ)
-C_O2_I1(r, r, L)
-C_O2_I2(r, r, L, L)
  C_O2_I4(r, r, rZ, rZ, rM, rM)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0deb33701f..dddf2486c1 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -25,11 +25,14 @@
  #ifndef RISCV_TCG_TARGET_H
  #define RISCV_TCG_TARGET_H
  
-#if __riscv_xlen == 32

-# define TCG_TARGET_REG_BITS 32
-#elif __riscv_xlen == 64
-# define TCG_TARGET_REG_BITS 64
+/*
+ * We don't support oversize guests.
+ * Since we will only build tcg once, this in turn requires a 64-bit host.
+ */
+#if __riscv_xlen != 64
+#error "unsupported code generation mode"
  #endif
+#define TCG_TARGET_REG_BITS 64
  
  #define TCG_TARGET_INSN_UNIT_SIZE 4

  #define TCG_TARGET_TLB_DISPLACEMENT_BITS 20
@@ -83,13 +86,8 @@ typedef enum {
  #define TCG_TARGET_STACK_ALIGN  16
  #define TCG_TARGET_CALL_STACK_OFFSET0
  #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
-#if TCG_TARGET_REG_BITS == 32
-#define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
-#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
-#else
  #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
  #define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
-#endif
  #define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
  
  /* optional instructions */

@@ -106,8 +104,8 @@ typedef enum {
  #define TCG_TARGET_HAS_sub2_i32 1
  #define TCG_TARGET_HAS_mulu2_i320
  #define TCG_TARGET_HAS_muls2_i320
-#define TCG_TARGET_HAS_muluh_i32(TCG_TARGET_REG_BITS == 32)
-#define TCG_TARGET_HAS_mulsh_i32(TCG_TARGET_REG_BITS == 32)
+#define TCG_TARGET_HAS_muluh_i320
+#define TCG_TARGET_HAS_mulsh_i320
  #define TCG_TARGET_HAS_ext8s_i321
  #define TCG_TARGET_HAS_ext16s_i32   1
  #define TCG_TARGET_HAS_ext8u_i321
@@ -128,7 +126,6 @@ typedef enum {
  #define TCG_TARGET_HAS_setcond2 1
  #define TCG_TARGET_HAS_qemu_st8_i32 0
  
-#if TCG_TARGET_REG_BITS == 64

  #define TCG_TARGET_HAS_movcond_i64  0
  #define TCG_TARGET_HAS_div_i64  1
  #define TCG_TARGET_HAS_rem_i64  1
@@ -165,7 +162,6 @@ typedef enum {
  #define TCG_TARGET_HAS_muls2_i640
  #define TCG_TARGET_HAS_muluh_i641
  #define TCG_TARGET_HAS_mulsh_i641
-#endif
  
  #define TCG_TARGET_DEFAULT_MO (0)
  
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc

index 266fe1433d..1edc3b1c4d 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -137,15 +137,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind 
kind, int slot)
  #define SOFTMMU_RESERVE_REGS  0
  #endif
  
-

-static inline tcg_target_long sextreg(tcg_target_long val, int pos, int len)
-{
-if (TCG_TARGET_REG_BITS == 32) {
-return sextract32(val, pos, len);
-} else {
-return sextract64(val, pos, len);
-}
-}
+#define sextreg  sextract64
  
  /* test if a constant matches the constraint */

  static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
@@ -235,7 +227,6 @@ typedef enum {
  OPC_XOR = 0x4033,
  OPC_XORI = 0x4013,
  
-#if TCG_TARGET_REG_BITS == 64

  OPC_ADDIW = 0x1b,
  OPC_ADDW = 0x3b,
  OPC_DIVUW = 0x200503b,
@@ -250,23 +241,6 @@ typedef enum {
  OPC_SRLIW = 0x501b,
  OPC_SRLW = 0x503b,
  OPC_SUBW = 0x403b,
-#else
-/* Simplify code throughout by defining aliases for RV32.  */
-OPC_ADDIW = OPC_ADDI,
-OPC_ADDW = OPC_ADD,
-OPC_DIVUW = OPC_DIVU,
-OPC_DIVW = OPC_DIV,
-OPC_MULW = OPC_MUL,
-OPC_REMUW = OPC_REMU,
-OPC_REMW = OPC_REM,
-OPC_SLLIW = OPC_SLLI,
-OPC_SLLW = OPC_SLL,
-OPC_SRAIW = OPC_SRAI,
-OPC_SRAW = OPC_SRA,
-OPC_SRLIW = OPC_SRLI,
-OPC_SRLW = OPC_SRL,
-OPC_SUBW = OPC_SUB,
-#endif
  
  OPC_FENCE

Re: [PATCH v2 27/54] tcg/riscv: Require TCG_TARGET_REG_BITS == 64

2023-04-13 Thread Daniel Henrique Barboza




On 4/13/23 04:12, Richard Henderson wrote:

On 4/12/23 22:18, Daniel Henrique Barboza wrote:



On 4/10/23 22:04, Richard Henderson wrote:

The port currently does not support "oversize" guests, which
means riscv32 can only target 32-bit guests.  We will soon be
building TCG once for all guests.  This implies that we can
only support riscv64.

Since all Linux distributions target riscv64 not riscv32,
this is not much of a restriction and simplifies the code.


Code looks good but I got confused about the riscv32 implications you cited.

Does this means that if someone happens to have a risc-v 32 bit host, with a
special Linux sauce that runs on that 32 bit risc-v host, this person won't be
able to build the riscv32 TCG target in that machine?


Correct.

At present, one is able to configure with such a host, and if one uses 
--target-list=x,y,z such that all of x, y or z are 32-bit guests the build 
should even succeed, and the result should probably work.

However, if one does not use --target-list in configure, the build will #error 
out here:


@@ -942,9 +913,6 @@ static void * const qemu_st_helpers[MO_SIZE + 1] = {
  #endif
  };
-/* We don't support oversize guests */
-QEMU_BUILD_BUG_ON(TCG_TARGET_REG_BITS < TARGET_LONG_BITS);
-


I am working on a patch set, not yet posted, which builds tcg/*.o twice, once 
for system mode and once for user-only.  At which point riscv32 cannot build at 
all.

I brought this patch forward from there in order to reduce churn.


Thanks for clarifying.

As you mentioned in the commit msg, there's no Linux distro (that we care
about) that runs on riscv32, so this is not even a restriction today. And we
can always change our minds later if the need arises.


Daniel




r~




Re: [PATCH for 8.1] intel_iommu: refine iotlb hash calculation

2023-04-13 Thread Alex Bennée


Peter Maydell  writes:

> On Wed, 12 Apr 2023 at 09:40, Alex Bennée  wrote:
>> Peter Maydell  writes:
>> > Whoops, hadn't noticed that guint type... (glib's
>> > g_int64_hash()'s approach to this is to XOR the top
>> > 32 bits with the bottom 32 bits to produce the 32-bit
>> > hash value.)
>>
>> This is less of a hash and more just concatenating a bunch of fields. BTW,
>> if the glib built-in hash isn't suitable, we also have the qemu_xxhash()
>> functions, which claim a good distribution of values and which we use in a
>> number of places throughout the code.
>
> Is that really necessary? If glib doesn't do anything complex
> for "my keys are just integers" I don't see that we need to
> do anything complex for "my keys are a handful of integers".
> glib does do a bit on its end to counteract suboptimal hash functions:
>
> https://github.com/GNOME/glib/blob/main/glib/ghash.c#L429
>
> static inline guint
> g_hash_table_hash_to_index (GHashTable *hash_table, guint hash)
> {
>   /* Multiply the hash by a small prime before applying the modulo. This
>* prevents the table from becoming densely packed, even with a poor hash
>* function. A densely packed table would have poor performance on
>* workloads with many failed lookups or a high degree of churn. */
>   return (hash * 11) % hash_table->mod;
> }
>
> I figure if glib thought that users of hash tables should be
> doing more complex stuff then they would (a) provide helpers
> for that and (b) call it out in the docs. They don't do either.

Ahh, I didn't realise glib was adding extra steps (although I guess it
makes sense if it is resizing its tables) or that their default hash
functions were so basic.

The original primary user of the qemu_xxhash functions is QHT which has
to manage its own tables so relies more on having a well distributed
hash function.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH V2] intel_iommu: refine iotlb hash calculation

2023-04-13 Thread Alex Bennée


Jason Wang  writes:

> On Wed, Apr 12, 2023 at 4:43 PM Alex Bennée  wrote:
>>
>>
>> Jason Wang  writes:
>>
>> > Commit 1b2b12376c8 ("intel-iommu: PASID support") takes PASID into
>> > account when calculating iotlb hash like:
>> >
>> > static guint vtd_iotlb_hash(gconstpointer v)
>> > {
>> > const struct vtd_iotlb_key *key = v;
>> >
>> > return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
>> >(key->level) << VTD_IOTLB_LVL_SHIFT |
>> >(key->pasid) << VTD_IOTLB_PASID_SHIFT;
>> > }
>> >
>> > This turns out to be problematic since:
>> >
>> > - the shift will lose bits if not converting to uint64_t
>> > - level should be off by one in order to fit into 2 bits
>> > - VTD_IOTLB_PASID_SHIFT is 30 but PASID is 20 bits which will waste
>> >   some bits
>> > - the hash result is uint64_t so we will lose bits when converting to
>> >   guint
>> >
>> > So this patch fixes them by
>> >
>> > - converting the keys into uint64_t before doing the shift
>> > - off level by one to make it fit into two bits
>> > - change the sid, lvl and pasid shift to 26, 42 and 44 in order to
>> >   take the full width of uint64_t
>> > - perform an XOR to the top 32bit with the bottom 32bit for the final
>> >   result to fit guint
>> >
>> > Fixes: Coverity CID 1508100
>> > Fixes: 1b2b12376c8 ("intel-iommu: PASID support")
>> > Signed-off-by: Jason Wang 
>> > ---
>> > Changes since V1:
>> > - perform XOR to avoid losing bits when converting to gint
>> > ---
>> >  hw/i386/intel_iommu.c  | 9 +
>> >  hw/i386/intel_iommu_internal.h | 6 +++---
>> >  2 files changed, 8 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> > index a62896759c..94d52f4205 100644
>> > --- a/hw/i386/intel_iommu.c
>> > +++ b/hw/i386/intel_iommu.c
>> > @@ -64,8 +64,8 @@ struct vtd_as_key {
>> >  struct vtd_iotlb_key {
>> >  uint64_t gfn;
>> >  uint32_t pasid;
>> > -uint32_t level;
>> >  uint16_t sid;
>> > +uint8_t level;
>> >  };
>> >
>> >  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>> > @@ -221,10 +221,11 @@ static gboolean vtd_iotlb_equal(gconstpointer v1, 
>> > gconstpointer v2)
>> >  static guint vtd_iotlb_hash(gconstpointer v)
>> >  {
>> >  const struct vtd_iotlb_key *key = v;
>> > +uint64_t hash64 = key->gfn | ((uint64_t)(key->sid) << 
>> > VTD_IOTLB_SID_SHIFT) |
>> > +(uint64_t)(key->level - 1) << VTD_IOTLB_LVL_SHIFT |
>> > +(uint64_t)(key->pasid) << VTD_IOTLB_PASID_SHIFT;
>> >
>> > -return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
>> > -   (key->level) << VTD_IOTLB_LVL_SHIFT |
>> > -   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
>> > +return (guint)((hash64 >> 32) ^ (hash64 & 0xU));
>>
>> Have you measured the distribution this hash gives you? Otherwise
>> consider using the qemu_xxhash() functions to return a well distributed
>> 32 bit hash value.
>
> It depends on a lot of factors and so it won't be even because the
> individual keys are not evenly distributed:
>
> - gfn depends on guest DMA subsystems
> - level depends on when huge pages are used
> - pasid depends on whether PASID is being used
>
> I'm OK to switch to qemu_xxhash() if everyone agrees. Or, as Peter said,
> if it has already been dealt with in glib, maybe it's not worth the bother.

Yeah, I missed that glib's default hash functions were pretty basic. I
think you can ignore my suggestion.

>
> Thanks
>
>>
>> --
>> Alex Bennée
>> Virtualisation Tech Lead @ Linaro
>>


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
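
The two failure modes discussed in this thread — narrow-typed shifts losing
bits before promotion, and plain truncation discarding the top half of the
64-bit hash — can be checked in isolation. This is an illustrative sketch,
not the actual VT-d key layout or the QEMU code:

```c
#include <stdint.h>

typedef uint32_t guint;

/* Fold a 64-bit hash into a guint by XORing the halves, as the V2
 * patch does, instead of silently truncating to the low 32 bits. */
guint fold64(uint64_t h)
{
    return (guint)((h >> 32) ^ (h & 0xffffffffULL));
}

/* Shift a narrow key field into a 64-bit hash. The cast matters:
 * without it the shift would happen in 32-bit arithmetic and the
 * high bits would be lost (or the shift would be undefined). */
uint64_t shift_field(uint32_t field, unsigned shift)
{
    return (uint64_t)field << shift;
}
```

Two keys that differ only above bit 31 collide under plain truncation but
stay distinct under the XOR fold.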



Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Eugenio Perez Martin
On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi  wrote:
>
> On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > So-called "internal" virtio-fs migration refers to transporting the
> > back-end's (virtiofsd's) state through qemu's migration stream.  To do
> > this, we need to be able to transfer virtiofsd's internal state to and
> > from virtiofsd.
> >
> > Because virtiofsd's internal state will not be too large, we believe it
> > is best to transfer it as a single binary blob after the streaming
> > phase.  Because this method should be useful to other vhost-user
> > implementations, too, it is introduced as a general-purpose addition to
> > the protocol, not limited to vhost-user-fs.
> >
> > These are the additions to the protocol:
> > - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> >   This feature signals support for transferring state, and is added so
> >   that migration can fail early when the back-end has no support.
> >
> > - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> >   over which to transfer the state.  The front-end sends an FD to the
> >   back-end into/from which it can write/read its state, and the back-end
> >   can decide to either use it, or reply with a different FD for the
> >   front-end to override the front-end's choice.
> >   The front-end creates a simple pipe to transfer the state, but maybe
> >   the back-end already has an FD into/from which it has to write/read
> >   its state, in which case it will want to override the simple pipe.
> >   Conversely, maybe in the future we find a way to have the front-end
> >   get an immediate FD for the migration stream (in some cases), in which
> >   case we will want to send this to the back-end instead of creating a
> >   pipe.
> >   Hence the negotiation: If one side has a better idea than a plain
> >   pipe, we will want to use that.
> >
> > - CHECK_DEVICE_STATE: After the state has been transferred through the
> >   pipe (the end indicated by EOF), the front-end invokes this function
> >   to verify success.  There is no in-band way (through the pipe) to
> >   indicate failure, so we need to check explicitly.
> >
> > Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> > (which includes establishing the direction of transfer and migration
> > phase), the sending side writes its data into the pipe, and the reading
> > side reads it until it sees an EOF.  Then, the front-end will check for
> > success via CHECK_DEVICE_STATE, which on the destination side includes
> > checking for integrity (i.e. errors during deserialization).
> >
> > Suggested-by: Stefan Hajnoczi 
> > Signed-off-by: Hanna Czenczek 
> > ---
> >  include/hw/virtio/vhost-backend.h |  24 +
> >  include/hw/virtio/vhost.h |  79 
> >  hw/virtio/vhost-user.c| 147 ++
> >  hw/virtio/vhost.c |  37 
> >  4 files changed, 287 insertions(+)
> >
> > diff --git a/include/hw/virtio/vhost-backend.h 
> > b/include/hw/virtio/vhost-backend.h
> > index ec3fbae58d..5935b32fe3 100644
> > --- a/include/hw/virtio/vhost-backend.h
> > +++ b/include/hw/virtio/vhost-backend.h
> > @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
> >  VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> >  } VhostSetConfigType;
> >
> > +typedef enum VhostDeviceStateDirection {
> > +/* Transfer state from back-end (device) to front-end */
> > +VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> > +/* Transfer state from front-end to back-end (device) */
> > +VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
> > +} VhostDeviceStateDirection;
> > +
> > +typedef enum VhostDeviceStatePhase {
> > +/* The device (and all its vrings) is stopped */
> > +VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
> > +} VhostDeviceStatePhase;
>
> vDPA has:
>
>   /* Suspend a device so it does not process virtqueue requests anymore
>*
>* After the return of ioctl the device must preserve all the necessary 
> state
>* (the virtqueue vring base plus the possible device specific states) that 
> is
>* required for restoring in the future. The device must not change its
>* configuration after that point.
>*/
>   #define VHOST_VDPA_SUSPEND  _IO(VHOST_VIRTIO, 0x7D)
>
>   /* Resume a device so it can resume processing virtqueue requests
>*
>* After the return of this ioctl the device will have restored all the
>* necessary states and it is fully operational to continue processing the
>* virtqueue descriptors.
>*/
>   #define VHOST_VDPA_RESUME   _IO(VHOST_VIRTIO, 0x7E)
>
> I wonder if it makes sense to import these into vhost-user so that the
> difference between kernel vhost and vhost-user is minimized. It's okay
> if one of them is ahead of the other, but it would be nice to avoid
> overlapping/duplicated functionality.
>

That's what I had in mind in the first versions. I proposed VHOST_STOP
instead of VHOST_VDPA_STOP for thi
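
The pipe-based transfer negotiated by SET_DEVICE_STATE_FD, as described in
the patch above, can be sketched outside QEMU with plain POSIX primitives.
Everything below, names included, is illustrative and not the proposed
vhost-user API; it only shows the data path: the sender writes its state
blob and closes its end, and the reader consumes until EOF:

```c
#include <string.h>
#include <unistd.h>

/* Create the transfer pipe: the front-end would keep one end and pass
 * the other to the back-end (hypothetical helper, not QEMU code). */
int make_transfer_pipe(int *read_fd, int *write_fd)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    *read_fd = fds[0];
    *write_fd = fds[1];
    return 0;
}

/* Reader side: consume until EOF, as the receiving end of the state
 * transfer does; returns total bytes read, or -1 on read error.
 * (A sketch: a blob larger than cap would be treated as EOF here.) */
ssize_t drain_state(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    ssize_t n;

    while ((n = read(fd, buf + total, cap - total)) > 0) {
        total += n;
    }
    return n < 0 ? n : (ssize_t)total;
}
```

For a blob smaller than the pipe capacity, one process can write, close the
write end, and then drain without deadlocking, which is what the test below
does.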

Re: [PATCH] qemu-options.hx: Update descriptions of memory options for NUMA node

2023-04-13 Thread Yohei Kojima
Thank you for the review. I will address your comments in the next version.

On 2023/04/11 21:57, Alex Bennée wrote:
> 
> Yohei Kojima  writes:
> 
>> This commit adds the following description:
>> 1. `memdev` option is recommended over `mem` option (see [1,2])
>> 2. users must specify memory for all NUMA nodes (see [2])
>>
>> This commit also separates descriptions for `mem` and `memdev` into two
>> paragraphs. The old doc describes legacy `mem` option first, and it was
>> a bit confusing.
>>
>> Related documentation:
>> [1] https://wiki.qemu.org/ChangeLog/5.1#Incompatible_changes
>> [2] https://www.qemu.org/docs/master/about/removed-features.html
>>
>> Signed-off-by: Yohei Kojima 
>> ---
>>  qemu-options.hx | 25 -
>>  1 file changed, 16 insertions(+), 9 deletions(-)
>>
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 59bdf67a2c..174f0d0c2d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -405,15 +405,22 @@ SRST
>>  -numa node,nodeid=0 -numa node,nodeid=1 \
>>  -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
>>  
>> -Legacy '\ ``mem``\ ' assigns a given RAM amount to a node (not supported
>> -for 5.1 and newer machine types). '\ ``memdev``\ ' assigns RAM from
>> -a given memory backend device to a node. If '\ ``mem``\ ' and
>> -'\ ``memdev``\ ' are omitted in all nodes, RAM is split equally between 
>> them.
>> -
>> -
>> -'\ ``mem``\ ' and '\ ``memdev``\ ' are mutually exclusive.
>> -Furthermore, if one node uses '\ ``memdev``\ ', all of them have to
>> -use it.
>> +'\ ``memdev``\ ' option assigns RAM from a given memory backend
>> +device to a node. It is recommended to use '\ ``memdev``\ ' option
>> +over legacy '\ ``mem``\ ' option. This is because '\ ``memdev``\ '
>> +option provides better performance and more control over the
>> +backend's RAM (e.g. '\ ``prealloc``\ ' parameter of
>> +'\ ``-memory-backend-ram``\ ' allows memory preallocation).
>> +
>> +For compatibility reasons, legacy '\ ``mem``\ ' option is
>> +supported in 5.0 and older machine types. Note that '\ ``mem``\ '
>> +and '\ ``memdev``\ ' are mutually exclusive. If one node uses
>> +'\ ``memdev``\ ', the rest nodes have to use '\ ``memdev``\ '
>> +option, and vice versa.
>> +
>> +Users must specify memory for all NUMA nodes by '\ ``memdev``\ '
>> +(or legacy '\ ``mem``\ ' if available). In QEMU 5.2, the support
>> +for '\ ``-numa node``\ ' without memory specified was removed.
> 
> I think this mixes up memdev and mem too much. It would be better to
> make the lead up to the example just talk about memdev (as it is the
> preferred option) and move the discussion about backwards compatibility
> to after the example. You can use the .. note:: annotation to put it in
> a nice little box, something like:
> 
> .. note::
> 
> For compatibility reasons, legacy '\ ``mem``\ ' option is
> supported in 5.0 and older machine types. Note that '\ ``mem``\ '
> and '\ ``memdev``\ ' are mutually exclusive. If one node uses '\
> ``memdev``\ ', the rest of the nodes must also use the '\
> ``memdev``\ ' option, and vice versa.
> 
> 
>>  
>>  '\ ``initiator``\ ' is an additional option that points to an
>>  initiator NUMA node that has best performance (the lowest latency or
> 
> 
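
The recommended memdev wiring discussed above looks like this on the command
line (an illustrative example, not taken from the patch; the machine type and
sizes are arbitrary):

```sh
qemu-system-x86_64 -machine pc -m 4G \
  -object memory-backend-ram,id=m0,size=2G,prealloc=on \
  -object memory-backend-ram,id=m1,size=2G \
  -numa node,nodeid=0,memdev=m0 \
  -numa node,nodeid=1,memdev=m1
```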



Re: [PULL 19/54] acpi: pc: isa bridge: use AcpiDevAmlIf interface to build ISA device descriptors

2023-04-13 Thread Fiona Ebner
On 12.04.23 at 14:18, Igor Mammedov wrote:
> On Thu, 30 Mar 2023 13:58:22 +0200
> Fiona Ebner  wrote:
> 
>> On 30.03.23 at 10:22, Igor Mammedov wrote:
>>> On Tue, 28 Mar 2023 14:58:21 +0200
>>> Fiona Ebner  wrote:
>>>   

 Hi,
 while trying to reproduce another issue, I ended up with a Windows 10
 guest that would boot with QEMU 7.0, but get stuck after the Windows
 logo/spinning circles with QEMU 7.1 (also with 8.0.0-rc1). Machine type
 is pc-i440fx-6.2[0]. Bisecting led to this commit.

 It only happens the first time the VM is booted, killing the process and
 re-trying always worked afterwards. So it's not a big deal and might
 just be some ACPI-related Windows quirk. But I thought I should ask here
 to be sure.

 For bisecting, I restored the disk state after each attempt. While
 getting stuck sometimes took 3-4 attempts, I tested about 10 times until
 I declared a commit good, and re-tested the commit before this one 15
 times, so I'm pretty sure this is the one where the issue started 
 appearing.

 So, anything that could potentially be wrong with the commit or is this
 most likely just some Windows quirk/bug we can't do much about?

 If you need more information, please let me know!  
>>>
>>> Please describe in more detail your setup/steps where it reproduces
>>> (incl. Windows version/build, used QEMU CLI) so I could try to reproduce it 
>>> locally.
>>>
>>> (in the past there were issues with the German version that some were
>>> experiencing but which were not reproducible on my side; those resolved
>>> after upgrading to a newer QEMU (if I recall correctly an issue was
>>> opened on QEMU's gitlab tracker))
>>>   
>>
>> Windows 10 Education
>> Version 1809
>> Build 17763.1
>>
>> It's not the German ISO, I used default settings (except location
>> Austria and German keymap) and I don't think I did anything other than
>> shutdown after the install was over.
>>
>> The command line is below. I did use our patched QEMU builds when I got
>> into the situation, but I don't think they touch anything ACPI-related
>> and bisecting was done without our patches on top.
>>
>> I tried to reproduce the situation again from scratch today, but wasn't
>> able to. I do still have the problematic disk (snapshot) where the issue
>> occurs as an LVM-Thin volume. If you'd like to have access to that,
>> please send me a direct mail and we can discuss the details there.
> 
> I couldn't reproduce the issue on my host either.
> If you still have access to 'broken' disk image, you can try to enable
> kernel debug mode in guest and try to attach with debugger to it to see
> where it is stuck.
> 
> quick instructions how to do it:
>  https://gitlab.com/qemu-project/qemu/-/issues/774#note_1270248862
> or read more extensive MS docs on topic.
> 

Hmm, I guess I won't be able to enable kernel debug mode without losing
the problematic state of the image. The VM only gets stuck during the
first boot attempt.

Still, I wanted to give it a shot in the hope I can trigger it again
when shutting down with QEMU 6.2.0 and booting with QEMU 7.1.0. I made a
copy of the VM intending to use it as the debug host, but didn't get the
COM port to show up in the guest with
-serial unix:/tmp/com1,server,nowait
I checked in the Device Manager with "Show hidden devices" enabled.

Anyway, when starting the original problematic VM again, it now also got
stuck (visually, in the same place) with QEMU 6.2.0! But only until I
rebooted my host, which made it work with QEMU 6.2.0 again. So I'd
say this commit has nothing to do with the issue after all, just made it
more likely to trigger for me. And also seems less likely to be a QEMU
issue now :)

Best Regards,
Fiona




Re: virtio-iommu hotplug issue

2023-04-13 Thread Jean-Philippe Brucker
Hello,

On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
> Hi,
> 
> Recently I encountered a problem with the combination of Linux's
> virtio-iommu driver and QEMU when a SR-IOV virtual function gets disabled.
> I'd like to ask you what kind of solution is appropriate here and implement
> the solution if possible.
> 
> A PCIe device implementing the SR-IOV specification exports a virtual
> function, and the guest can enable or disable it at runtime by writing to a
> configuration register. This effectively looks like a PCI device is
> hotplugged for the guest.

Just so I understand this better: the guest gets a whole PCIe device PF
that implements SR-IOV, and so the guest can dynamically create VFs?  Out
of curiosity, is that a hardware device assigned to the guest with VFIO,
or a device emulated by QEMU?

> In such a case, the kernel assumes the endpoint is
> detached from the virtio-iommu domain, but QEMU actually does not detach it.
> 
> This inconsistent view of the removed device sometimes prevents the VM from
> correctly performing the following procedure, for example:
> 1. Enable a VF.
> 2. Disable the VF.
> 3. Open a vfio container.
> 4. Open the group which the PF belongs to.
> 5. Add the group to the vfio container.
> 6. Map some memory region.
> 7. Close the group.
> 8. Close the vfio container.
> 9. Repeat 3-8
> 
> When the VF gets disabled, the kernel assumes the endpoint is detached from
> the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
> will be reused in step 3-8.
> 
> In step 7, the PF will be detached, and the kernel thinks there is no
> endpoint attached and the mapping the domain holds is cleared, but the VF
> endpoint is still attached and the mapping is kept intact.
> 
> In step 9, the same domain will be reused again, and the kernel requests to
> create a new mapping, but it will conflict with the existing mapping and
> result in -EINVAL.
> 
> This problem can be fixed by either of:
> - requesting the detachment of the endpoint from the guest when the PCI
> device is unplugged (the VF is disabled)

Yes, I think this is an issue in the virtio-iommu driver, which should be
sending a DETACH request when the VF is disabled, likely from
viommu_release_device(). I'll work on a fix unless you would like to do it

> - detecting that the PCI device is gone and automatically detach it on
> QEMU-side.
> 
> It is not completely clear for me which solution is more appropriate as the
> virtio-iommu specification is written in a way independent of the endpoint
> mechanism and does not say what should be done when a PCI device is
> unplugged.

Yes, I'm not sure it's in scope for the specification, it's more about
software guidance

Thanks,
Jean



netdev-socket test hang (s390 host, mips64el guest, backtrace)

2023-04-13 Thread Peter Maydell
I just found a hung netdev-socket test on our s390 CI runner.
Looks like a deadlock, no processes using CPU.
Here's the backtrace; looks like both QEMU processes are sat
idle but the test process is sat waiting forever for something
in test_stream_inet_reconnect(). Any ideas?

Process tree:
netdev-socket(3496843)-+-qemu-system-mip(3496956)
   `-qemu-system-mip(3496976)
===
PROCESS: 3496843
gitlab-+ 3496843 3472329  0 Apr10 ?00:00:00
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/netdev-socket
--tap -k
[New LWP 3496844]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
__libc_recv (fd=fd@entry=3, buf=buf@entry=0x3ffe1cf9f27,
len=len@entry=1, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/recv.c:30
30  ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.

Thread 2 (Thread 0x3ffa457f900 (LWP 3496844)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x02aa22cccbbc in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x2aa22d3c860 )
at ../util/qemu-thread-posix.c:464
#3  0x02aa22cf89a2 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:261
#4  0x02aa22ccbc22 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5  0x03ffa4807e66 in start_thread (arg=0x3ffa457f900) at
pthread_create.c:477
#6  0x03ffa46fcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 1 (Thread 0x3ffa4c72770 (LWP 3496843)):
#0  __libc_recv (fd=fd@entry=3, buf=buf@entry=0x3ffe1cf9f27,
len=len@entry=1, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/recv.c:30
#1  0x02aa22c9d982 in recv (__flags=0, __n=1, __buf=0x3ffe1cf9f27,
__fd=3) at /usr/include/s390x-linux-gnu/bits/socket2.h:44
#2  qmp_fd_receive (fd=<optimized out>) at ../tests/qtest/libqmp.c:73
#3  0x02aa22c9baee in qtest_qmp_receive_dict (s=0x2aa232a50d0) at
../tests/qtest/libqtest.c:837
#4  qtest_qmp_eventwait_ref (event=<optimized out>, s=<optimized out>)
at ../tests/qtest/libqtest.c:837
#5  qtest_qmp_eventwait_ref (s=0x2aa232a50d0, event=<optimized out>)
at ../tests/qtest/libqtest.c:828
#6  0x02aa22c9262c in wait_stream_connected (qts=<optimized out>,
addr=0x3ffe1cfa1b8, id=0x2aa22cfeed6 "st0") at
../tests/qtest/netdev-socket.c:157
#7  0x02aa22c929b6 in test_stream_inet_reconnect () at
../tests/qtest/netdev-socket.c:229
#8  0x03ffa49fe608 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#9  0x03ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#10 0x03ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#11 0x03ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#12 0x03ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x03ffa49feada in g_test_run_suite () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x03ffa49feb10 in g_test_run () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x02aa22c90678 in main (argc=<optimized out>, argv=<optimized out>) at ../tests/qtest/netdev-socket.c:543
[Inferior 1 (process 3496843) detached]

===
PROCESS: 3496956
gitlab-+ 3496956 3496843  0 Apr10 ?00:00:00
./qemu-system-mips64el -qtest unix:/tmp/qtest-3496843.sock -qtest-log
/dev/null -chardev socket,path=/tmp/qtest-3496843.qmp,id=char0 -mon
chardev=char0,mode=control -display none -nodefaults -M none -netdev
stream,server=false,id=st0,addr.type=inet,addr.ipv4=on,addr.ipv6=off,reconnect=1,addr.host=127.0.0.1,addr.port=50989
-accel qtest
[New LWP 3496965]
[New LWP 3496967]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x03ff81af1c8c in __ppoll (fds=0x2aa40b08230, nfds=6,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:44
44  ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 3 (Thread 0x3ff71c20900 (LWP 3496967)):
#0  0x03ff81af1b32 in __GI___poll (fds=0x3ff64003680, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x03ff842d4386 in  () at /lib/s390x-linux-gnu/libglib-2.0.so.0
#2  0x03ff842d4790 in g_main_loop_run () at
/lib/s390x-linux-gnu/libglib-2.0.so.0
#3  0x02aa3ea03bbe in iothread_run
(opaque=opaque@entry=0x2aa4096bf00) at ../iothread.c:70
#4  0x02aa3eb534ca in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5  0x03ff81c07e66 in start_thread (arg=0x3ff71c20900) at
pthread_create.c:477
#6  0x03ff81afcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Thread 2 (Thread 0x3ff72d23900 (LWP 3496965)):
#0  syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1  0x02aa3eb54464 in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=

Re: virtio-iommu hotplug issue

2023-04-13 Thread Akihiko Odaki

On 2023/04/13 19:40, Jean-Philippe Brucker wrote:

Hello,

On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:

Hi,

Recently I encountered a problem with the combination of Linux's
virtio-iommu driver and QEMU when a SR-IOV virtual function gets disabled.
I'd like to ask you what kind of solution is appropriate here and implement
the solution if possible.

A PCIe device implementing the SR-IOV specification exports a virtual
function, and the guest can enable or disable it at runtime by writing to a
configuration register. This effectively looks like a PCI device is
hotplugged for the guest.


Just so I understand this better: the guest gets a whole PCIe device PF
that implements SR-IOV, and so the guest can dynamically create VFs?  Out
of curiosity, is that a hardware device assigned to the guest with VFIO,
or a device emulated by QEMU?


Yes, that's right. The guest can dynamically create and delete VFs. The 
device is emulated by QEMU: igb, an Intel NIC recently added to QEMU and 
projected to be released as part of QEMU 8.0.





In such a case, the kernel assumes the endpoint is
detached from the virtio-iommu domain, but QEMU actually does not detach it.

This inconsistent view of the removed device sometimes prevents the VM from
correctly performing the following procedure, for example:
1. Enable a VF.
2. Disable the VF.
3. Open a vfio container.
4. Open the group which the PF belongs to.
5. Add the group to the vfio container.
6. Map some memory region.
7. Close the group.
8. Close the vfio container.
9. Repeat 3-8

When the VF gets disabled, the kernel assumes the endpoint is detached from
the IOMMU domain, but QEMU actually doesn't detach it. Later, the domain
will be reused in step 3-8.

In step 7, the PF will be detached, and the kernel thinks there is no
endpoint attached and the mapping the domain holds is cleared, but the VF
endpoint is still attached and the mapping is kept intact.

In step 9, the same domain will be reused again, and the kernel requests to
create a new mapping, but it will conflict with the existing mapping and
result in -EINVAL.

This problem can be fixed by either of:
- requesting the detachment of the endpoint from the guest when the PCI
device is unplugged (the VF is disabled)


Yes, I think this is an issue in the virtio-iommu driver, which should be
sending a DETACH request when the VF is disabled, likely from
viommu_release_device(). I'll work on a fix unless you would like to do it


It will be nice if you prepare a fix. I will test your patch with my 
workload if you share it with me.


Regards,
Akihiko Odaki




- detecting that the PCI device is gone and automatically detach it on
QEMU-side.

It is not completely clear for me which solution is more appropriate as the
virtio-iommu specification is written in a way independent of the endpoint
mechanism and does not say what should be done when a PCI device is
unplugged.


Yes, I'm not sure it's in scope for the specification, it's more about
software guidance

Thanks,
Jean




Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Stefan Hajnoczi
On Thu, 13 Apr 2023 at 04:20, Hanna Czenczek  wrote:
>
> On 12.04.23 22:51, Stefan Hajnoczi wrote:
> > On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:
> >> If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
> >> setting the vhost features will set this feature, too.  Doing so
> >> disables all vrings, which may not be intended.
> >>
> >> For example, enabling or disabling logging during migration requires
> >> setting those features (to set or unset VHOST_F_LOG_ALL), which will
> >> automatically disable all vrings.  In either case, the VM is running
> >> (disabling logging is done after a failed or cancelled migration, and
> >> only once the VM is running again, see comment in
> >> memory_global_dirty_log_stop()), so the vrings should really be enabled.
> >> As a result, the back-end seems to hang.
> >>
> >> To fix this, we must remember whether the vrings are supposed to be
> >> enabled, and, if so, re-enable them after a SET_FEATURES call that set
> >> VHOST_USER_F_PROTOCOL_FEATURES.
> >>
> >> It seems less than ideal that there is a short period in which the VM is
> >> running but the vrings will be stopped (between SET_FEATURES and
> >> SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
> >> e.g. by introducing a new flag or vhost-user protocol feature to disable
> >> disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
> >> new functions for setting/clearing singular feature bits (so that
> >> F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).
> >>
> >> Even with such a potential addition to the protocol, we still need this
> >> fix here, because we cannot expect that back-ends will implement this
> >> addition.
> >>
> >> Signed-off-by: Hanna Czenczek 
> >> ---
> >>   include/hw/virtio/vhost.h | 10 ++
> >>   hw/virtio/vhost.c | 13 +
> >>   2 files changed, 23 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> >> index a52f273347..2fe02ed5d4 100644
> >> --- a/include/hw/virtio/vhost.h
> >> +++ b/include/hw/virtio/vhost.h
> >> @@ -90,6 +90,16 @@ struct vhost_dev {
> >>   int vq_index_end;
> >>   /* if non-zero, minimum required value for max_queues */
> >>   int num_queues;
> >> +
> >> +/*
> >> + * Whether the virtqueues are supposed to be enabled (via
> >> + * SET_VRING_ENABLE).  Setting the features (e.g. for
> >> + * enabling/disabling logging) will disable all virtqueues if
> >> + * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
> >> + * re-enable them if this field is set.
> >> + */
> >> +bool enable_vqs;
> >> +
> >>   /**
> >>* vhost feature handling requires matching the feature set
> >>* offered by a backend which may be a subset of the total
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index a266396576..cbff589efa 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -50,6 +50,8 @@ static unsigned int used_memslots;
> >>   static QLIST_HEAD(, vhost_dev) vhost_devices =
> >>   QLIST_HEAD_INITIALIZER(vhost_devices);
> >>
> >> +static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
> >> +
> >>   bool vhost_has_free_slot(void)
> >>   {
> >>   unsigned int slots_limit = ~0U;
> >> @@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev 
> >> *dev,
> >>   }
> >>   }
> >>
> >> +if (dev->enable_vqs) {
> >> +/*
> >> + * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
> >> + * virtqueues, even if that was not intended; re-enable them if
> >> + * necessary.
> >> + */
> >> +vhost_dev_set_vring_enable(dev, true);
> >> +}
> >> +
> >>   out:
> >>   return r;
> >>   }
> >> @@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, 
> >> uint16_t queue_size,
> >>
> >>   static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
> >>   {
> >> +hdev->enable_vqs = enable;
> >> +
> >>   if (!hdev->vhost_ops->vhost_set_vring_enable) {
> >>   return 0;
> >>   }
> > The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled
> > at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is
> > intended to be used like that. This issue shows why doing so is a bad
> > idea.
> >
> > VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging
> > is controlled at runtime by the presence of the dirty log
> > (VHOST_USER_SET_LOG_BASE) and the per-vring logging flag
> > (VHOST_VRING_F_LOG).
>
> Technically, the spec doesn’t say that SET_LOG_BASE is required.  It says:
>
> “To start/stop logging of data/used ring writes, the front-end may send
> messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and
> VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to
> 1/0, respectively.”
>
> (So the spec also very much does imply that toggling F_LOG

Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Stefan Hajnoczi
On Tue, 11 Apr 2023 at 11:05, Hanna Czenczek  wrote:
>
> If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
> setting the vhost features will set this feature, too.  Doing so
> disables all vrings, which may not be intended.
>
> For example, enabling or disabling logging during migration requires
> setting those features (to set or unset VHOST_F_LOG_ALL), which will
> automatically disable all vrings.  In either case, the VM is running
> (disabling logging is done after a failed or cancelled migration, and
> only once the VM is running again, see comment in
> memory_global_dirty_log_stop()), so the vrings should really be enabled.
> As a result, the back-end seems to hang.
>
> To fix this, we must remember whether the vrings are supposed to be
> enabled, and, if so, re-enable them after a SET_FEATURES call that set
> VHOST_USER_F_PROTOCOL_FEATURES.
>
> It seems less than ideal that there is a short period in which the VM is
> running but the vrings will be stopped (between SET_FEATURES and
> SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
> e.g. by introducing a new flag or vhost-user protocol feature to disable
> disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
> new functions for setting/clearing singular feature bits (so that
> F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).
>
> Even with such a potential addition to the protocol, we still need this
> fix here, because we cannot expect that back-ends will implement this
> addition.
>
> Signed-off-by: Hanna Czenczek 
> ---
>  include/hw/virtio/vhost.h | 10 ++
>  hw/virtio/vhost.c | 13 +
>  2 files changed, 23 insertions(+)
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a52f273347..2fe02ed5d4 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -90,6 +90,16 @@ struct vhost_dev {
>  int vq_index_end;
>  /* if non-zero, minimum required value for max_queues */
>  int num_queues;
> +
> +/*
> + * Whether the virtqueues are supposed to be enabled (via
> + * SET_VRING_ENABLE).  Setting the features (e.g. for
> + * enabling/disabling logging) will disable all virtqueues if
> + * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
> + * re-enable them if this field is set.
> + */
> +bool enable_vqs;
> +
>  /**
>   * vhost feature handling requires matching the feature set
>   * offered by a backend which may be a subset of the total
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index a266396576..cbff589efa 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -50,6 +50,8 @@ static unsigned int used_memslots;
>  static QLIST_HEAD(, vhost_dev) vhost_devices =
>  QLIST_HEAD_INITIALIZER(vhost_devices);
>
> +static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
> +
>  bool vhost_has_free_slot(void)
>  {
>  unsigned int slots_limit = ~0U;
> @@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  }
>  }
>
> +if (dev->enable_vqs) {
> +/*
> + * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all

Is there a reason to put this vhost-user-specific workaround in
vhost.c instead of vhost-user.c?

Stefan



Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Stefan Hajnoczi
On Thu, 13 Apr 2023 at 06:15, Eugenio Perez Martin  wrote:
> On Wed, Apr 12, 2023 at 11:06 PM Stefan Hajnoczi  wrote:
> > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> > (And I hope vDPA will import the device state vhost-user messages
> > introduced in this series.)
> >
>
> I guess they will be needed for vdpa-fs devices? Is there any emulated
> virtio-fs in qemu?

Maybe also virtio-gpu or virtio-crypto, if someone decides to create
hardware or in-kernel implementations.

virtiofs is not built into QEMU, there are only vhost-user implementations.

Stefan



[PATCH] vnc: increase max display size

2023-04-13 Thread Gerd Hoffmann
It's 2023.  4k display resolutions are a thing these days.
Raise width and height limits of the qemu vnc server.

Resolves: #1596
Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 757fa83044e7..5d9cd85a3c24 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -81,8 +81,8 @@ typedef void VncSendHextileTile(VncState *vs,
 
 /* VNC_MAX_WIDTH must be a multiple of VNC_DIRTY_PIXELS_PER_BIT. */
 
-#define VNC_MAX_WIDTH ROUND_UP(2560, VNC_DIRTY_PIXELS_PER_BIT)
-#define VNC_MAX_HEIGHT 2048
+#define VNC_MAX_WIDTH ROUND_UP(5120, VNC_DIRTY_PIXELS_PER_BIT)
+#define VNC_MAX_HEIGHT 2160
 
 /* VNC_DIRTY_BITS is the number of bits in the dirty bitmap. */
 #define VNC_DIRTY_BITS (VNC_MAX_WIDTH / VNC_DIRTY_PIXELS_PER_BIT)
-- 
2.39.2




[PATCH v3] memory: Optimize replay of guest mapping

2023-04-13 Thread Zhenzhong Duan
On x86, two notifiers are registered because the vtd-ir memory
region splits the entire address space. During replay of the
address space for each notifier, the whole address space is
scanned, which is unnecessary: we only need to scan the space
that the notifier monitors.

While on x86 the IOMMU memory region spans the entire address
space, on some other platforms (e.g. arm mps3-an547) the IOMMU
memory region is only a window within the whole address space,
and a user could register a notifier with an arbitrary scope
beyond the IOMMU memory region. Though in the current
implementation replay is only triggered by VFIO and dirty page
sync, with notifiers derived from a memory region section, this
isn't guaranteed in the future.

So, we replay only the intersection of the IOMMU memory region
and the IOMMU notifier in memory_region_iommu_replay().

Signed-off-by: Zhenzhong Duan 
---
v3: Fix assert failure on mps3-an547
v2: Add an assert per Peter
Tested on x86 with a net card passed to a guest (kvm/tcg), ping/ssh pass.
Also did simple bootup test with mps3-an547

 hw/i386/intel_iommu.c | 2 +-
 softmmu/memory.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a62896759c78..faade7def867 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3850,7 +3850,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, 
IOMMUNotifier *n)
 .domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
 };
 
-vtd_page_walk(s, &ce, 0, ~0ULL, &info, vtd_as->pasid);
+vtd_page_walk(s, &ce, n->start, n->end, &info, vtd_as->pasid);
 }
 } else {
 trace_vtd_replay_ce_invalid(bus_n, PCI_SLOT(vtd_as->devfn),
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b1a6cae6f583..f7af691991de 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1925,7 +1925,7 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 {
 MemoryRegion *mr = MEMORY_REGION(iommu_mr);
 IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
-hwaddr addr, granularity;
+hwaddr addr, end, granularity;
 IOMMUTLBEntry iotlb;
 
 /* If the IOMMU has its own replay callback, override */
@@ -1935,8 +1935,9 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 }
 
 granularity = memory_region_iommu_get_min_page_size(iommu_mr);
+end = MIN(n->end, memory_region_size(mr));
 
-for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
+for (addr = n->start; addr < end; addr += granularity) {
 iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
 if (iotlb.perm != IOMMU_NONE) {
 n->notify(n, &iotlb);
-- 
2.25.1




Re: [PATCH v4] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-13 Thread Jonathan Cameron via
On Thu, 13 Apr 2023 17:33:28 +0800
Hao Zeng  wrote:

> Opened file descriptor not closed; may cause file descriptor leaks

Patch description needs to say more on how this is fixed.
Perhaps something like:
"Open file descriptor not closed in error paths. Fix by replacing
 the open-coded read of the whole file into a buffer with
 g_file_get_contents()"

Fixes tag is part of the tag block so blank line here

> Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange implementation")
> 
And no blank line here.

> Signed-off-by: Zeng Hao 
> Suggested-by: Philippe Mathieu-Daudé 
> Suggested-by: Peter Maydell 
> 
> ---
> ChangeLog:
> v3-v4:
> Modify commit information, no code change.
> v2->v3:
> Submission of v3 on the basis of v2, based on Philippe 
> Mathieu-Daudé's suggestion
> "Pointless bzero in g_malloc0, however this code would be
>  simplified using g_file_get_contents()."
> v1->v2:
> - Patch 1: No change in patch v1
> - Patch 2: Fix the check on the return value of fread() in 
> ct3_load_cdat
> ---
>  hw/cxl/cxl-cdat.c | 30 --
>  1 file changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> index 137abd0992..42c7c2031c 100644
> --- a/hw/cxl/cxl-cdat.c
> +++ b/hw/cxl/cxl-cdat.c
> @@ -110,29 +110,17 @@ static void ct3_load_cdat(CDATObject *cdat, Error 
> **errp)
>  g_autofree CDATEntry *cdat_st = NULL;
>  uint8_t sum = 0;
>  int num_ent;
> -int i = 0, ent = 1, file_size = 0;
> +int i = 0, ent = 1;
> +gsize file_size = 0;
>  CDATSubHeader *hdr;
> -FILE *fp = NULL;
> -
> +GError *error = NULL;

Blank line here.


>  /* Read CDAT file and create its cache */
> -fp = fopen(cdat->filename, "r");
> -if (!fp) {
> -error_setg(errp, "CDAT: Unable to open file");
> -return;
> -}
> -
> -fseek(fp, 0, SEEK_END);
> -file_size = ftell(fp);
> -fseek(fp, 0, SEEK_SET);
> -cdat->buf = g_malloc0(file_size);
> -
> -if (fread(cdat->buf, file_size, 1, fp) == 0) {
> -error_setg(errp, "CDAT: File read failed");
> +if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
> +&file_size, &error)) {

Align parameters with start of 'cdat' (just after the opening bracket)

> +error_setg(errp, "CDAT: File read failed: %s", error->message);
> +g_error_free(error);
>  return;
>  }
> -
> -fclose(fp);
> -
>  if (file_size < sizeof(CDATTableHeader)) {
>  error_setg(errp, "CDAT: File too short");
>  return;
> @@ -218,7 +206,5 @@ void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
>  cdat->free_cdat_table(cdat->built_buf, cdat->built_buf_len,
>cdat->private);
>  }
> -if (cdat->buf) {
> -free(cdat->buf);
> -}
> +g_free(cdat->buf);

Keep the protection if moving to g_free().  Not all paths to this function
allocate cdat->buf.  Protection was not needed when the call was free() though.

I have a followup patch that will deal with the other issues Peter pointed out.
I'll send that once yours has been finalized.

Thanks,

Jonathan



>  }




Re: [PATCH v4 0/3] NUMA: Apply cluster-NUMA-node boundary for aarch64 and riscv machines

2023-04-13 Thread Igor Mammedov
On Thu, 13 Apr 2023 13:50:57 +0800
Gavin Shan  wrote:

> On 4/12/23 7:42 PM, Peter Maydell wrote:
> > On Wed, 12 Apr 2023 at 02:08, Gavin Shan  wrote:  
> >> On 3/27/23 9:26 PM, Igor Mammedov wrote:  
> >>> On Fri, 17 Mar 2023 14:25:39 +0800
> >>> Gavin Shan  wrote:
> >>>  
>  For arm64 and riscv architecture, the driver (/base/arch_topology.c) is
>  used to populate the CPU topology in the Linux guest. It's required that
>  the CPUs in one cluster can't span multiple NUMA nodes. Otherwise, the 
>  Linux
>  scheduling domain can't be sorted out, as the following warning message
>  indicates. To avoid the unexpected confusion, this series attempts to
>  warn about such kind of irregular configurations.
> 
>   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
>   -numa node,nodeid=0,cpus=0-1,memdev=ram0\
>   -numa node,nodeid=1,cpus=2-3,memdev=ram1\
>   -numa node,nodeid=2,cpus=4-5,memdev=ram2\
> 
>   [ cut here ]
>   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
>  build_sched_domains+0x284/0x910
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
>   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : build_sched_domains+0x284/0x910
>   lr : build_sched_domains+0x184/0x910
>   sp : 8804bd50
>   x29: 8804bd50 x28: 0002 x27: 
>   x26: 89cf9a80 x25:  x24: 89cbf840
>   x23: 80325000 x22: 005df800 x21: 8a4ce508
>   x20:  x19: 80324440 x18: 0014
>   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
>   x14: 01c0 x13: 0001 x12: 7fffb1a0
>   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
>   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
>   x5 : 0001 x4 : 0007 x3 : 0002
>   x2 : 1000 x1 : 8a4cf928 x0 : 0001
>   Call trace:
>    build_sched_domains+0x284/0x910
>    sched_init_domains+0xac/0xe0
>    sched_init_smp+0x48/0xc8
>    kernel_init_freeable+0x140/0x1ac
>    kernel_init+0x28/0x140
>    ret_from_fork+0x10/0x20
> 
>  PATCH[1] Warn about the irregular configuration if required
>  PATCH[2] Enable the validation for aarch64 machines
>  PATCH[3] Enable the validation for riscv machines
> 
>  v3: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01226.html
>  v2: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg01080.html
>  v1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00886.html
> 
>  Changelog
>  =
>  v4:
>  * Pick r-b and ack-b from Daniel/Philippe   (Gavin)
>  * Replace local variable @len with possible_cpus->len in
>    validate_cpu_cluster_to_numa_boundary()   
>  (Philippe)
>  v3:
>  * Validate cluster-to-NUMA instead of socket-to-NUMA
>    boundary  (Gavin)
>  * Move the switch from MachineState to MachineClass 
>  (Philippe)
>  * Warning instead of rejecting the irregular configuration  (Daniel)
>  * Comments to mention cluster-to-NUMA is platform instead
>    of architectural choice   (Drew)
>  * Drop PATCH[v2 1/4] related to qtests/numa-test(Gavin)
>  v2:
>  * Fix socket-NUMA-node boundary issues in qtests/numa-test  (Gavin)
>  * Add helper set_numa_socket_boundary() and validate the
>    boundary in the generic path  
>  (Philippe)
> 
>  Gavin Shan (3):
>  numa: Validate cluster and NUMA node boundary if required
>  hw/arm: Validate cluster and NUMA node boundary
>  hw/riscv: Validate cluster and NUMA node boundary
> 
> hw/arm/sbsa-ref.c   |  2 ++
> hw/arm/virt.c   |  2 ++
> hw/core/machine.c   | 42 ++
> hw/riscv/spike.c|  2 ++
> hw/riscv/virt.c |  2 ++
> include/hw/boards.h |  1 +
> 6 files changed, 51 insertions(+)
>   
> >>>
> >>> Acked-by: Igor Mammedov 
> >>>  
> >>
> >> Not sure if QEMU v8.0 is still available to integrate this series.
> >> Otherwise, it should be something for QEMU v8.1. By the way, I'm
> >> also uncertain who needs to merge this series.  
> > 
> > It barely touches arm specific boards, so I'm assuming it will
> > be reviewed and taken by whoever handles hw/core/

Re: [PATCH 3/4] vhost: Add high-level state save/load functions

2023-04-13 Thread Stefan Hajnoczi
On Thu, 13 Apr 2023 at 05:04, Hanna Czenczek  wrote:
>
> On 12.04.23 23:14, Stefan Hajnoczi wrote:
> > On Tue, Apr 11, 2023 at 05:05:14PM +0200, Hanna Czenczek wrote:
> >> vhost_save_backend_state() and vhost_load_backend_state() can be used by
> >> vhost front-ends to easily save and load the back-end's state to/from
> >> the migration stream.
> >>
> >> Because we do not know the full state size ahead of time,
> >> vhost_save_backend_state() simply reads the data in 1 MB chunks, and
> >> writes each chunk consecutively into the migration stream, prefixed by
> >> its length.  EOF is indicated by a 0-length chunk.
> >>
> >> Signed-off-by: Hanna Czenczek 
> >> ---
> >>   include/hw/virtio/vhost.h |  35 +++
> >>   hw/virtio/vhost.c | 196 ++
> >>   2 files changed, 231 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> >> index 29449e0fe2..d1f1e9e1f3 100644
> >> --- a/include/hw/virtio/vhost.h
> >> +++ b/include/hw/virtio/vhost.h
> >> @@ -425,4 +425,39 @@ int vhost_set_device_state_fd(struct vhost_dev *dev,
> >>*/
> >>   int vhost_check_device_state(struct vhost_dev *dev, Error **errp);
> >>
> >> +/**
> >> + * vhost_save_backend_state(): High-level function to receive a vhost
> >> + * back-end's state, and save it in `f`.  Uses
> >> + * `vhost_set_device_state_fd()` to get the data from the back-end, and
> >> + * stores it in consecutive chunks that are each prefixed by their
> >> + * respective length (be32).  The end is marked by a 0-length chunk.
> >> + *
> >> + * Must only be called while the device and all its vrings are stopped
> >> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> >> + *
> >> + * @dev: The vhost device from which to save the state
> >> + * @f: Migration stream in which to save the state
> >> + * @errp: Potential error message
> >> + *
> >> + * Returns 0 on success, and -errno otherwise.
> >> + */
> >> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error 
> >> **errp);
> >> +
> >> +/**
> >> + * vhost_load_backend_state(): High-level function to load a vhost
> >> + * back-end's state from `f`, and send it over to the back-end.  Reads
> >> + * the data from `f` in the format used by `vhost_save_state()`, and
> >> + * uses `vhost_set_device_state_fd()` to transfer it to the back-end.
> >> + *
> >> + * Must only be called while the device and all its vrings are stopped
> >> + * (`VHOST_TRANSFER_STATE_PHASE_STOPPED`).
> >> + *
> >> + * @dev: The vhost device to which to send the state
> >> + * @f: Migration stream from which to load the state
> >> + * @errp: Potential error message
> >> + *
> >> + * Returns 0 on success, and -errno otherwise.
> >> + */
> >> +int vhost_load_backend_state(struct vhost_dev *dev, QEMUFile *f, Error 
> >> **errp);
> >> +
> >>   #endif
> >> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >> index 90099d8f6a..d08849c691 100644
> >> --- a/hw/virtio/vhost.c
> >> +++ b/hw/virtio/vhost.c
> >> @@ -2125,3 +2125,199 @@ int vhost_check_device_state(struct vhost_dev 
> >> *dev, Error **errp)
> >>  "vhost transport does not support migration state 
> >> transfer");
> >>   return -ENOSYS;
> >>   }
> >> +
> >> +int vhost_save_backend_state(struct vhost_dev *dev, QEMUFile *f, Error 
> >> **errp)
> >> +{
> >> +/* Maximum chunk size in which to transfer the state */
> >> +const size_t chunk_size = 1 * 1024 * 1024;
> >> +void *transfer_buf = NULL;
> >> +g_autoptr(GError) g_err = NULL;
> >> +int pipe_fds[2], read_fd = -1, write_fd = -1, reply_fd = -1;
> >> +int ret;
> >> +
> >> +/* [0] for reading (our end), [1] for writing (back-end's end) */
> >> +if (!g_unix_open_pipe(pipe_fds, FD_CLOEXEC, &g_err)) {
> >> +error_setg(errp, "Failed to set up state transfer pipe: %s",
> >> +   g_err->message);
> >> +ret = -EINVAL;
> >> +goto fail;
> >> +}
> >> +
> >> +read_fd = pipe_fds[0];
> >> +write_fd = pipe_fds[1];
> >> +
> >> +/* VHOST_TRANSFER_STATE_PHASE_STOPPED means the device must be 
> >> stopped */
> >> +assert(!dev->started && !dev->enable_vqs);
> >> +
> >> +/* Transfer ownership of write_fd to the back-end */
> >> +ret = vhost_set_device_state_fd(dev,
> >> +VHOST_TRANSFER_STATE_DIRECTION_SAVE,
> >> +VHOST_TRANSFER_STATE_PHASE_STOPPED,
> >> +write_fd,
> >> +&reply_fd,
> >> +errp);
> >> +if (ret < 0) {
> >> +error_prepend(errp, "Failed to initiate state transfer: ");
> >> +goto fail;
> >> +}
> >> +
> >> +/* If the back-end wishes to use a different pipe, switch over */
> >> +if (reply_fd >= 0) {
> >> +close(read_fd);
> >> +read_fd = reply_fd;
> >> +}
> >> +
> >> +transfer_buf = g_malloc(chunk_size);
> >> +
> >> +while (tr

Re: [PATCH 2/4] vhost-user: Interface for migration state transfer

2023-04-13 Thread Stefan Hajnoczi
On Thu, 13 Apr 2023 at 05:24, Hanna Czenczek  wrote:
>
> On 12.04.23 23:06, Stefan Hajnoczi wrote:
> > On Tue, Apr 11, 2023 at 05:05:13PM +0200, Hanna Czenczek wrote:
> >> So-called "internal" virtio-fs migration refers to transporting the
> >> back-end's (virtiofsd's) state through qemu's migration stream.  To do
> >> this, we need to be able to transfer virtiofsd's internal state to and
> >> from virtiofsd.
> >>
> >> Because virtiofsd's internal state will not be too large, we believe it
> >> is best to transfer it as a single binary blob after the streaming
> >> phase.  Because this method should be useful to other vhost-user
> >> implementations, too, it is introduced as a general-purpose addition to
> >> the protocol, not limited to vhost-user-fs.
> >>
> >> These are the additions to the protocol:
> >> - New vhost-user protocol feature VHOST_USER_PROTOCOL_F_MIGRATORY_STATE:
> >>This feature signals support for transferring state, and is added so
> >>that migration can fail early when the back-end has no support.
> >>
> >> - SET_DEVICE_STATE_FD function: Front-end and back-end negotiate a pipe
> >>over which to transfer the state.  The front-end sends an FD to the
> >>back-end into/from which it can write/read its state, and the back-end
> >>can decide to either use it, or reply with a different FD for the
> >>front-end to override the front-end's choice.
> >>The front-end creates a simple pipe to transfer the state, but maybe
> >>the back-end already has an FD into/from which it has to write/read
> >>its state, in which case it will want to override the simple pipe.
> >>Conversely, maybe in the future we find a way to have the front-end
> >>get an immediate FD for the migration stream (in some cases), in which
> >>case we will want to send this to the back-end instead of creating a
> >>pipe.
> >>Hence the negotiation: If one side has a better idea than a plain
> >>pipe, we will want to use that.
> >>
> >> - CHECK_DEVICE_STATE: After the state has been transferred through the
> >>pipe (the end indicated by EOF), the front-end invokes this function
> >>to verify success.  There is no in-band way (through the pipe) to
> >>indicate failure, so we need to check explicitly.
> >>
> >> Once the transfer pipe has been established via SET_DEVICE_STATE_FD
> >> (which includes establishing the direction of transfer and migration
> >> phase), the sending side writes its data into the pipe, and the reading
> >> side reads it until it sees an EOF.  Then, the front-end will check for
> >> success via CHECK_DEVICE_STATE, which on the destination side includes
> >> checking for integrity (i.e. errors during deserialization).
> >>
> >> Suggested-by: Stefan Hajnoczi 
> >> Signed-off-by: Hanna Czenczek 
> >> ---
> >>   include/hw/virtio/vhost-backend.h |  24 +
> >>   include/hw/virtio/vhost.h |  79 
> >>   hw/virtio/vhost-user.c| 147 ++
> >>   hw/virtio/vhost.c |  37 
> >>   4 files changed, 287 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
> >> index ec3fbae58d..5935b32fe3 100644
> >> --- a/include/hw/virtio/vhost-backend.h
> >> +++ b/include/hw/virtio/vhost-backend.h
> >> @@ -26,6 +26,18 @@ typedef enum VhostSetConfigType {
> >>   VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
> >>   } VhostSetConfigType;
> >>
> >> +typedef enum VhostDeviceStateDirection {
> >> +/* Transfer state from back-end (device) to front-end */
> >> +VHOST_TRANSFER_STATE_DIRECTION_SAVE = 0,
> >> +/* Transfer state from front-end to back-end (device) */
> >> +VHOST_TRANSFER_STATE_DIRECTION_LOAD = 1,
> >> +} VhostDeviceStateDirection;
> >> +
> >> +typedef enum VhostDeviceStatePhase {
> >> +/* The device (and all its vrings) is stopped */
> >> +VHOST_TRANSFER_STATE_PHASE_STOPPED = 0,
> >> +} VhostDeviceStatePhase;
> > vDPA has:
> >
> >/* Suspend a device so it does not process virtqueue requests anymore
> > *
> > * After the return of ioctl the device must preserve all the necessary state
> > * (the virtqueue vring base plus the possible device specific states) that is
> > * required for restoring in the future. The device must not change its
> > * configuration after that point.
> > */
> >#define VHOST_VDPA_SUSPEND  _IO(VHOST_VIRTIO, 0x7D)
> >
> >/* Resume a device so it can resume processing virtqueue requests
> > *
> > * After the return of this ioctl the device will have restored all the
> > * necessary states and it is fully operational to continue processing the
> > * virtqueue descriptors.
> > */
> >#define VHOST_VDPA_RESUME   _IO(VHOST_VIRTIO, 0x7E)
> >
> > I wonder if it makes sense to import these into vhost-user so that the
> > difference between kernel vhost and vhost-user is minimized. It's okay
> > if one of t

Re: clean after distclean gobbles source files

2023-04-13 Thread Thomas Huth

On 07/04/2023 17.44, Steven Sistare wrote:

Run 'make distclean', and GNUmakefile is removed.
But, GNUmakefile is where we cd to build/.
Run 'make distclean' or 'make clean' again, and Makefile applies
the clean actions, such as this one, at the top level of the tree:

 find . \( -name '*.so' -o -name '*.dll' -o \
   -name '*.[oda]' -o -name '*.gcno' \) -type f \
 ! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-aarch64.a \
 ! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-arm.a \
 -exec rm {} +

For example, it removes the .d source files in 'meson/test cases/d/*/*.d'.
The damage could be worse in the future if more suffixes are cleaned.

I don't have a suggested fix.  Recursion and the GNUmakefile bootstrap
make it non-trivial.


That's somewhat ugly, indeed.

We could maybe disallow make [dist]clean if running in-tree? Something like that:

diff a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ quiet-command-run = $(if $(V),,$(if $2,printf "  %-7s %s\n" $2 $3 && ))$1
 quiet-@ = $(if $(V),,@)
 quiet-command = $(quiet-@)$(call quiet-command-run,$1,$2,$3)
 
-UNCHECKED_GOALS := %clean TAGS cscope ctags dist \
+UNCHECKED_GOALS := TAGS cscope ctags dist \
 help check-help print-% \
 docker docker-% vm-help vm-test vm-build-%
 
@@ -201,7 +201,7 @@ recurse-distclean: $(addsuffix /distclean, $(ROMS))
 
 ##
 
-clean: recurse-clean
+clean: config-host.mak recurse-clean
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) -t clean || :
-$(quiet-@)test -f build.ninja && $(NINJA) $(NINJAFLAGS) clean-ctlist || :
find . \( -name '*.so' -o -name '*.dll' -o \


... or if we still want to allow that, maybe just make an exception for the *.d files:

diff --git a/Makefile b/Makefile
index e421f8a1f4..0cb2a7aa98 100644
--- a/Makefile
+++ b/Makefile
@@ -208,6 +208,7 @@ clean: recurse-clean
  -name '*.[oda]' -o -name '*.gcno' \) -type f \
! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-aarch64.a \
! -path ./roms/edk2/ArmPkg/Library/GccLto/liblto-arm.a \
+   ! -path './meson/test cases/d/*/*.d' \
-exec rm {} +
rm -f TAGS cscope.* *~ */*~
 


What do you think?

 Thomas




Re: s390x TCG migration failure

2023-04-13 Thread Nina Schoetterl-Glausch
On Wed, 2023-04-12 at 23:01 +0200, Juan Quintela wrote:
> Nina Schoetterl-Glausch  wrote:
> > Hi,
> > 
> > We're seeing failures running s390x migration kvm-unit-tests tests with TCG.
> 
> As this is tcg, could you tell the exact command that you are running?
> Does it need to be on an s390x host, right?

I've just tried with a cross compile of kvm-unit-tests and that fails, too.

git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
cd kvm-unit-tests/
./configure --cross-prefix=s390x-linux-gnu- --arch=s390x
make
for i in {0..30}; do echo $i; QEMU=../qemu/build/qemu-system-s390x ACCEL=tcg ./run_tests.sh migration-skey-sequential | grep FAIL && break; done

> 
> $ time ./tests/qtest/migration-test

I haven't looked if that test fails at all, we just noticed it with the kvm-unit-tests.

> # random seed: R02S940c4f22abc48b14868566639d3d6c77
> # Skipping test: s390x host with KVM is required
> 1..0
> 
> real  0m0.003s
> user  0m0.002s
> sys   0m0.001s
> 
> 
> > Some initial findings:
> > What seems to be happening is that after migration a control block
> > header accessed by the test code is all zeros which causes an
> > unexpected exception.
> 
> What exception?
> 
> What do you mean here by control block header?

It's all s390x test guest specific stuff, I don't expect it to be too helpful.
The guest gets a specification exception program interrupt while executing a SERVC because
the SCCB control block is invalid.

See https://gitlab.com/qemu-project/qemu/-/issues/1565 for a code snippet.
The guest sets a bunch of fields in the SCCB header, but when TCG emulates the SERVC,
they are zero which doesn't make sense.

> 
> > I did a bisection which points to c8df4a7aef ("migration: Split 
> > save_live_pending() into state_pending_*") as the culprit.
> > The migration issue persists after applying the fix e264705012 ("migration: 
> > I messed state_pending_exact/estimate") on top of c8df4a7aef.
> > 
> > Applying
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 56ff9cd29d..2dc546cf28 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3437,7 +3437,7 @@ static void ram_state_pending_exact(void *opaque, uint64_t max_size,
> >  
> >  uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> >  
> > -if (!migration_in_postcopy()) {
> > +if (!migration_in_postcopy() && remaining_size < max_size) {
> 
> If block is all zeros, then remaining_size should be zero, so always
> smaller than max_size.
> 
> I don't really fully understand what is going here.
> 
> >  qemu_mutex_lock_iothread();
> >  WITH_RCU_READ_LOCK_GUARD() {
> >  migration_bitmap_sync_precopy(rs);
> > 
> > on top fixes or hides the issue. (The comparison was removed by c8df4a7aef.)
> > I arrived at this by experimentation, I haven't looked into why this makes 
> > a difference.
> > 
> > Any thoughts on the matter appreciated.
> 
> Later, Juan.
> 




Re: [PULL 19/54] acpi: pc: isa bridge: use AcpiDevAmlIf interface to build ISA device descriptors

2023-04-13 Thread Mike Maslenkin
Sorry for the noise, but just curious, how did you shut down Windows?
Did you use 'shutdown /s' or just press power button?
Could it be that Windows was actually hibernated.
So, when you try to boot it on the new (old) QEMU version with changed
PCI topology, this could make it upset.
I observed similar behaviour in case of Windows for ARM, but there was
true GSOD afterwards.
When Windows starts again, its hibernated state is dropped and all goes fine.

Best Regards,
Mike


On Thu, Apr 13, 2023 at 1:34 PM Fiona Ebner  wrote:
>
> Am 12.04.23 um 14:18 schrieb Igor Mammedov:
> > On Thu, 30 Mar 2023 13:58:22 +0200
> > Fiona Ebner  wrote:
> >
> >> Am 30.03.23 um 10:22 schrieb Igor Mammedov:
> >>> On Tue, 28 Mar 2023 14:58:21 +0200
> >>> Fiona Ebner  wrote:
> >>>
> 
>  Hi,
>  while trying to reproduce another issue, I ended up with a Windows 10
>  guest that would boot with QEMU 7.0, but get stuck after the Windows
>  logo/spinning circles with QEMU 7.1 (also with 8.0.0-rc1). Machine type
>  is pc-i440fx-6.2[0]. Bisecting led to this commit.
> 
>  It only happens the first time the VM is booted, killing the process and
>  re-trying always worked afterwards. So it's not a big deal and might
>  just be some ACPI-related Windows quirk. But I thought I should ask here
>  to be sure.
> 
>  For bisecting, I restored the disk state after each attempt. While
>  getting stuck sometimes took 3-4 attempts, I tested about 10 times until
>  I declared a commit good, and re-tested the commit before this one 15
>  times, so I'm pretty sure this is the one where the issue started 
>  appearing.
> 
>  So, anything that could potentially be wrong with the commit or is this
>  most likely just some Windows quirk/bug we can't do much about?
> 
>  If you need more information, please let me know!
> >>>
> >>> Please describe in more detail your setup/steps where it reproduces
> >>> (incl. Windows version/build, used QEMU CLI) so I could try to reproduce 
> >>> it locally.
> >>>
> >>> (in past there were issues with German version that some where
> >>> experience but not reproducible on my side, that resolved with
> >>> upgrading to newer QEMU (if I recall correctly issue was opened
> >>> on QEMU's gitlab tracker))
> >>>
> >>
> >> Windows 10 Education
> >> Version 1809
> >> Build 17763.1
> >>
> >> It's not the German ISO, I used default settings (except location
> >> Austria and German keymap) and I don't think I did anything other than
> >> shutdown after the install was over.
> >>
> >> The command line is below. I did use our patched QEMU builds when I got
> >> into the situation, but I don't think they touch anything ACPI-related
> >> and bisecting was done without our patches on top.
> >>
> >> I tried to reproduce the situation again from scratch today, but wasn't
> >> able to. I do still have the problematic disk (snapshot) where the issue
> >> occurs as an LVM-Thin volume. If you'd like to have access to that,
> >> please send me a direct mail and we can discuss the details there.
> >
> > I couldn't reproduce the issue on my host either.
> > If you still have access to 'broken' disk image, you can try to enable
> > kernel debug mode in guest and try to attach with debugger to it to see
> > where it is stuck.
> >
> > quick instructions how to do it:
> >  https://gitlab.com/qemu-project/qemu/-/issues/774#note_1270248862
> > or read more extensive MS docs on topic.
> >
>
> Hmm, I guess I won't be able to enable kernel debug mode without losing
> the problematic state of the image. The VM only gets stuck during the
> first boot attempt.
>
> Still, I wanted to give it a shot in the hope I can trigger it again
> when shutting down with QEMU 6.2.0 and booting with QEMU 7.1.0. I made a
> copy of the VM intending to use it as the debug host, but didn't get the
> COM port to show up in the guest with
> -serial unix:/tmp/com1,server,nowait
> I checked in the Device Manager with "Show hidden devices" enabled.
>
> Anyway, when starting the original problematic VM again, it now also got
> stuck (visually, in the same place) with QEMU 6.2.0! But only until I
> rebooted my host, which made it working with QEMU 6.2.0 again. So I'd
> say this commit has nothing to do with the issue after all, just made it
> more likely to trigger for me. And also seems less likely to be a QEMU
> issue now :)
>
> Best Regards,
> Fiona
>
>



Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)

2023-04-13 Thread Peter Maydell
On Thu, 13 Apr 2023 at 11:50, Peter Maydell  wrote:
>
> I just found a hung netdev-socket test on our s390 CI runner.
> Looks like a deadlock, no processes using CPU.
> Here's the backtrace; looks like both QEMU processes are sat
> idle but the test process is sat waiting forever for something
> in test_stream_inet_reconnect(). Any ideas?

May well not be related, but I think there's a race condition
in this test's inet_get_free_port() code. The code tries
to find a free port number by creating a socket, looking
at what port it is bound to, and then closing the socket.
If there are several copies of this test running at once
(as is plausible in a 'make -j8' setup), then you can
get an interleaving:

 test 1   test 2
   find a port number
   close the socket
  find a port number
  (get the same number as test 1)
  close the socket
   use port number for test
  use port number for test
  (fail because of test 1)


thanks
-- PMM



[PATCH 0/1] qemu-options.hx: Update descriptions of memory options

2023-04-13 Thread Yohei Kojima
This patch updates an outdated description in qemu-options.hx.
The patch reflects the changes in qemu behavior already described in
another documentation, and it also changes paragraph structure for
further readability.

ChangeLog:
v2:
* Moved the description for the legacy `mem` option below example

Yohei Kojima (1):
  qemu-options.hx: Update descriptions of memory options for NUMA node

 qemu-options.hx | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

-- 
2.39.2




[PATCH 1/1] qemu-options.hx: Update descriptions of memory options for NUMA node

2023-04-13 Thread Yohei Kojima
This commit adds the following description:
1. `memdev` option is recommended over `mem` option (see [1,2])
2. users must specify memory for all NUMA nodes (see [2])

This commit also separates descriptions for `mem` and `memdev` into two
paragraphs. The old doc describes legacy `mem` option first, and it was
a bit confusing.

Related documentation:
[1] https://wiki.qemu.org/ChangeLog/5.1#Incompatible_changes
[2] https://www.qemu.org/docs/master/about/removed-features.html

Signed-off-by: Yohei Kojima 
---
 qemu-options.hx | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 59bdf67a2c..b65f88eaf8 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -405,15 +405,9 @@ SRST
 -numa node,nodeid=0 -numa node,nodeid=1 \
 -numa cpu,node-id=0,socket-id=0 -numa cpu,node-id=1,socket-id=1
 
-Legacy '\ ``mem``\ ' assigns a given RAM amount to a node (not supported
-for 5.1 and newer machine types). '\ ``memdev``\ ' assigns RAM from
-a given memory backend device to a node. If '\ ``mem``\ ' and
-'\ ``memdev``\ ' are omitted in all nodes, RAM is split equally between them.
-
-
-'\ ``mem``\ ' and '\ ``memdev``\ ' are mutually exclusive.
-Furthermore, if one node uses '\ ``memdev``\ ', all of them have to
-use it.
+'\ ``memdev``\ ' option assigns RAM from a given memory backend
+device to a node. Users must specify memory for all NUMA nodes by
+'\ ``memdev``\ ' (or legacy '\ ``mem``\ ' if available).
 
 '\ ``initiator``\ ' is an additional option that points to an
 initiator NUMA node that has best performance (the lowest latency or
@@ -448,6 +442,20 @@ SRST
 even when they are symmetrical. When a node is unreachable from
 another node, set the pair's distance to 255.
 
+.. note::
+
+For compatibility reasons, legacy '\ ``mem``\ ' option is
+supported in 5.0 and older machine types. It is recommended
+to use '\ ``memdev``\ ' option over legacy '\ ``mem``\ '
+option. This is because '\ ``memdev``\ ' option provides
+better performance and more control over the backend's RAM
+(e.g. '\ ``prealloc``\ ' parameter of
+'\ ``-memory-backend-ram``\ ' allows memory preallocation).
+
+Note that '\ ``mem``\ ' and '\ ``memdev``\ ' are mutually
+exclusive. If one node uses '\ ``memdev``\ ', the rest nodes
+have to use '\ ``memdev``\ ' option, and vice versa.
+
 Note that the -``numa`` option doesn't allocate any of the specified
 resources, it just assigns existing resources to NUMA nodes. This
 means that one still has to use the ``-m``, ``-smp`` options to
-- 
2.39.2




Re: [PULL 19/54] acpi: pc: isa bridge: use AcpiDevAmlIf interface to build ISA device descriptors

2023-04-13 Thread Fiona Ebner
Am 13.04.23 um 13:46 schrieb Mike Maslenkin:
> Sorry for the noise, but just curious, how did you shut down Windows?
> Did you use 'shutdown /s' or just press power button?
> Could it be that Windows was actually hibernated.
> So, when you try to boot it on the new (old) QEMU version with changed
> PCI topology, this could make it upset.
> I observed similar behaviour in case of Windows for ARM, but there was
> true GSOD afterwards.
> When Windows starts again, its hibernated state is dropped and all goes fine.
> 
> Best Regards,
> Mike

I think I either pressed the shutdown button in our UI, which sends
system_powerdown via QMP or via "Shut down" in the Windows start menu.
Hibernation is surely something I need to consider (next time), so thank
you for the hint, but if it were that, I'd be surprised at why it got
stuck even with QEMU 6.2.0 today.

If I try "shutdown /h" explicitly, I get "The request is not
supported.(50)".

Best Regards,
Fiona




[PATCH v5] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-13 Thread Hao Zeng

Open file descriptor not closed in error paths. Fix by replacing the
open-coded reading of the whole file into a buffer with
g_file_get_contents().

Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange implementation")
Signed-off-by: Zeng Hao 
Suggested-by: Philippe Mathieu-Daudé 
Suggested-by: Peter Maydell 
Suggested-by: Jonathan Cameron via 

---
ChangeLog:
v4->v5:
fix some style issues and keep the protection when moving to g_free()
v3-v4:
Modify commit information,No code change.
v2->v3:
Submission of v3 on the basis of v2, based on Philippe Mathieu-Daudé's suggestion
"Pointless bzero in g_malloc0, however this code would be
 simplified using g_file_get_contents()."
v1->v2:
- Patch 1: No change in patch v1
- Patch 2: Fix the check on the return value of fread() in ct3_load_cdat
---
 hw/cxl/cxl-cdat.c | 27 ---
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
index 137abd0992..dd69366797 100644
--- a/hw/cxl/cxl-cdat.c
+++ b/hw/cxl/cxl-cdat.c
@@ -110,29 +110,18 @@ static void ct3_load_cdat(CDATObject *cdat, Error **errp)
 g_autofree CDATEntry *cdat_st = NULL;
 uint8_t sum = 0;
 int num_ent;
-int i = 0, ent = 1, file_size = 0;
+int i = 0, ent = 1;
+gsize file_size = 0;
 CDATSubHeader *hdr;
-FILE *fp = NULL;
+GError *error = NULL;
 
 /* Read CDAT file and create its cache */
-fp = fopen(cdat->filename, "r");
-if (!fp) {
-error_setg(errp, "CDAT: Unable to open file");
+if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
+ &file_size, &error)) {
+error_setg(errp, "CDAT: File read failed: %s", error->message);
+g_error_free(error);
 return;
 }
-
-fseek(fp, 0, SEEK_END);
-file_size = ftell(fp);
-fseek(fp, 0, SEEK_SET);
-cdat->buf = g_malloc0(file_size);
-
-if (fread(cdat->buf, file_size, 1, fp) == 0) {
-error_setg(errp, "CDAT: File read failed");
-return;
-}
-
-fclose(fp);
-
 if (file_size < sizeof(CDATTableHeader)) {
 error_setg(errp, "CDAT: File too short");
 return;
@@ -219,6 +208,6 @@ void cxl_doe_cdat_release(CXLComponentState *cxl_cstate)
   cdat->private);
 }
 if (cdat->buf) {
-free(cdat->buf);
+g_free(cdat->buf);
 }
 }
-- 
2.37.2



Re: [PATCH 0/1] qemu-options.hx: Update descriptions of memory options

2023-04-13 Thread Yohei Kojima
I forgot to add v2 prefix to the subject. This is a revised patch from:
https://patchew.org/QEMU/tyzpr06mb5418d6b0175a49e8e76988439d...@tyzpr06mb5418.apcprd06.prod.outlook.com/

On 2023/04/13 21:15, Yohei Kojima wrote:
> This patch updates an outdated description in qemu-options.hx.
> The patch reflects the changes in qemu behavior already described in
> another documentation, and it also changes paragraph structure for
> further readability.
> 
> ChangeLog:
> v2:
> * Moved the description for the legacy `mem` option below example
> 
> Yohei Kojima (1):
>   qemu-options.hx: Update descriptions of memory options for NUMA node
> 
>  qemu-options.hx | 26 +-
>  1 file changed, 17 insertions(+), 9 deletions(-)
> 



Re: [PATCH for-8.1] hw/display: Compile vga.c as target-independent code

2023-04-13 Thread Fabiano Rosas
Thomas Huth  writes:

> The target checks here are only during the initialization, so they
> are not performance critical. We can switch these to runtime checks
> to avoid that we have to compile this file multiple times during
> the build, and make the code ready for a universal build one day.
>
> Signed-off-by: Thomas Huth 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v4] cxl-cdat:Fix open file not closed in ct3_load_cdat

2023-04-13 Thread Hao Zeng
On Thu, 2023-04-13 at 12:17 +0100, Jonathan Cameron wrote:
> On Thu, 13 Apr 2023 17:33:28 +0800
> Hao Zeng  wrote:
> 
> > opened file processor not closed,May cause file processor leaks
> 
> Patch description needs to say more on how this is fixed.
> Perhaps something like:
> "Open file descriptor not closed in error paths. Fix by replace
>  open coded handling of read of whole file into a buffer with
>  g_file_get_contents()"
> 
> Fixes tag is part of the tag block so blank line here
> 
> > Fixes: aba578bdac ("hw/cxl: CDAT Data Object Exchange
> > implementation")
> > 
> And no blank line here.
> 
> > Signed-off-by: Zeng Hao 
> > Suggested-by: Philippe Mathieu-Daudé 
> > Suggested-by: Peter Maydell 
> > 
> > ---
> > ChangeLog:
> >     v3-v4:
> >     Modify commit information,No code change.
> >     v2->v3:
> >     Submission of v3 on the basis of v2, based on Philippe
> > Mathieu-Daudé's suggestion
> >     "Pointless bzero in g_malloc0, however this code would be
> >  simplified using g_file_get_contents()."
> >     v1->v2:
> >     - Patch 1: No change in patch v1
> >     - Patch 2: Fix the check on the return value of fread() in
> > ct3_load_cdat
> > ---
> >  hw/cxl/cxl-cdat.c | 30 --
> >  1 file changed, 8 insertions(+), 22 deletions(-)
> > 
> > diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> > index 137abd0992..42c7c2031c 100644
> > --- a/hw/cxl/cxl-cdat.c
> > +++ b/hw/cxl/cxl-cdat.c
> > @@ -110,29 +110,17 @@ static void ct3_load_cdat(CDATObject *cdat,
> > Error **errp)
> >  g_autofree CDATEntry *cdat_st = NULL;
> >  uint8_t sum = 0;
> >  int num_ent;
> > -    int i = 0, ent = 1, file_size = 0;
> > +    int i = 0, ent = 1;
> > +    gsize file_size = 0;
> >  CDATSubHeader *hdr;
> > -    FILE *fp = NULL;
> > -
> > +    GError *error = NULL;
> 
> Blank line here.
> 
> 
> >  /* Read CDAT file and create its cache */
> > -    fp = fopen(cdat->filename, "r");
> > -    if (!fp) {
> > -    error_setg(errp, "CDAT: Unable to open file");
> > -    return;
> > -    }
> > -
> > -    fseek(fp, 0, SEEK_END);
> > -    file_size = ftell(fp);
> > -    fseek(fp, 0, SEEK_SET);
> > -    cdat->buf = g_malloc0(file_size);
> > -
> > -    if (fread(cdat->buf, file_size, 1, fp) == 0) {
> > -    error_setg(errp, "CDAT: File read failed");
> > +    if (!g_file_get_contents(cdat->filename, (gchar **)&cdat->buf,
> > +    &file_size, &error)) {
> 
> Align parameters with start of 'cdat' (just after the opening
> bracket)
> 
> > +    error_setg(errp, "CDAT: File read failed: %s", error-
> > >message);
> > +    g_error_free(error);
> >  return;
> >  }
> > -
> > -    fclose(fp);
> > -
> >  if (file_size < sizeof(CDATTableHeader)) {
> >  error_setg(errp, "CDAT: File too short");
> >  return;
> > @@ -218,7 +206,5 @@ void cxl_doe_cdat_release(CXLComponentState
> > *cxl_cstate)
> >  cdat->free_cdat_table(cdat->built_buf, cdat-
> > >built_buf_len,
> >    cdat->private);
> >  }
> > -    if (cdat->buf) {
> > -    free(cdat->buf);
> > -    }
> > +    g_free(cdat->buf);
> 
> Keep the protection if moving to g_free().  Not all paths to this
> function allocate cdat->buf
> Protection was not needed when the call was free() though. 
> 
> I have a followup patch that will deal with the other issues Peter
> pointed out. I'll
> send that once yours has been finalized.
> 
> Thanks,
> 
> Jonathan
> 
> 
> 
> >  }
> 
Dear Jonathan

   Thank you for taking the time to reply to my email. I appreciate
the valuable information you have provided.
   Already submitted in v5 according to the modifications.

Best regards
Hao



Re: [PATCH 0/2] hw/arm/npcm7xx_gpio: Add some pin state QOM

2023-04-13 Thread Peter Maydell
On Thu, 6 Apr 2023 at 01:25, Joe Komlodi  wrote:
>
> Hi all,
>
> This series adds a couple QOM properties for retrieving and setting pin
> state via qom-get and qom-set.
>
> We ran into a situation in multi-SoC simulation where the BMC would need
> to update its input pin state based on behavior from the other SoC. It
> made the most sense to expose this over QMP, so this adds properties to
> allow people to do so.

This does leave the simulation in an odd situation if
the input GPIO was connected to some other device -- the
other device thinks it's put the GPIO line low, but then something
external has reached in and set it to 1, so the two ends of
what is conceptually a single signal line now disagree about
what voltage it's at...

It looks like the hw/gpio/aspeed_gpio device has been here before
you, only that device chose to use one bool property per GPIO
line. It would be nice to be consistent -- if we want to allow
QOM to set/get the GPIO line values, it should be the same
interface regardless of GPIO controller.

-- PMM



Re: [PATCH] softmmu: Move dirtylimit.c into the target independent source set

2023-04-13 Thread Fabiano Rosas
Thomas Huth  writes:

> dirtylimit.c just uses one TARGET_PAGE_SIZE macro - change it to
> qemu_target_page_size() so we can move the file into the target
> independent source set. Then we only have to compile this file
> once during the build instead of multiple times (one time for
> each target).
>
> Signed-off-by: Thomas Huth 

Reviewed-by: Fabiano Rosas 



Re: [PULL 0/5] Migration 20230412 patches

2023-04-13 Thread Peter Maydell
On Wed, 12 Apr 2023 at 22:46, Juan Quintela  wrote:
>
> The following changes since commit abb02ce0e76a8e00026699a863ab2d11d88f56d4:
>
>   Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging 
> (2023-04-11 16:19:06 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git 
> tags/migration-20230412-pull-request
>
> for you to fetch changes up to 28ef5339c37f1f78c2fa4df2295bc0cd73a0abfd:
>
>   migration: fix ram_state_pending_exact() (2023-04-12 22:47:50 +0200)
>
> 
> Migration Pull request for 8.0
>
> Last patches found:
> - peter xu preempt channel fixes.
>   needed for backward compatibility with old machine types.
> - lukas fix to get compress working again.
>
> - fix ram on s390x.  Get back to the old code, even when it shouldn't
>   be needed, but as it fails on s390x, just revert.
>
> Later, Juan.


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Michael S. Tsirkin
On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:
> If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
> setting the vhost features will set this feature, too.  Doing so
> disables all vrings, which may not be intended.

Hmm not sure I understand: why does it disable vrings?

> For example, enabling or disabling logging during migration requires
> setting those features (to set or unset VHOST_F_LOG_ALL), which will
> automatically disable all vrings.  In either case, the VM is running
> (disabling logging is done after a failed or cancelled migration, and
> only once the VM is running again, see comment in
> memory_global_dirty_log_stop()), so the vrings should really be enabled.
> As a result, the back-end seems to hang.
> 
> To fix this, we must remember whether the vrings are supposed to be
> enabled, and, if so, re-enable them after a SET_FEATURES call that set
> VHOST_USER_F_PROTOCOL_FEATURES.
> 
> It seems less than ideal that there is a short period in which the VM is
> running but the vrings will be stopped (between SET_FEATURES and
> SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
> e.g. by introducing a new flag or vhost-user protocol feature to disable
> disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
> new functions for setting/clearing singular feature bits (so that
> F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).
> 
> Even with such a potential addition to the protocol, we still need this
> fix here, because we cannot expect that back-ends will implement this
> addition.
> 
> Signed-off-by: Hanna Czenczek 
> ---
>  include/hw/virtio/vhost.h | 10 ++
>  hw/virtio/vhost.c | 13 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a52f273347..2fe02ed5d4 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -90,6 +90,16 @@ struct vhost_dev {
>  int vq_index_end;
>  /* if non-zero, minimum required value for max_queues */
>  int num_queues;
> +
> +/*
> + * Whether the virtqueues are supposed to be enabled (via
> + * SET_VRING_ENABLE).  Setting the features (e.g. for
> + * enabling/disabling logging) will disable all virtqueues if
> + * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
> + * re-enable them if this field is set.
> + */
> +bool enable_vqs;
> +
>  /**
>   * vhost feature handling requires matching the feature set
>   * offered by a backend which may be a subset of the total
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index a266396576..cbff589efa 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -50,6 +50,8 @@ static unsigned int used_memslots;
>  static QLIST_HEAD(, vhost_dev) vhost_devices =
>  QLIST_HEAD_INITIALIZER(vhost_devices);
>  
> +static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
> +
>  bool vhost_has_free_slot(void)
>  {
>  unsigned int slots_limit = ~0U;
> @@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
>  }
>  }
>  
> +if (dev->enable_vqs) {
> +/*
> + * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
> + * virtqueues, even if that was not intended; re-enable them if
> + * necessary.
> + */
> +vhost_dev_set_vring_enable(dev, true);
> +}
> +
>  out:
>  return r;
>  }
> @@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, 
> uint16_t queue_size,
>  
>  static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
>  {
> +hdev->enable_vqs = enable;
> +
>  if (!hdev->vhost_ops->vhost_set_vring_enable) {
>  return 0;
>  }
> -- 
> 2.39.1




Re: [PATCH] target/arm: Add overflow check for gt_recalc_timer

2023-04-13 Thread Peter Maydell
On Thu, 6 Apr 2023 at 16:16, Leonid Komarianskyi
 wrote:
>
> If gt_timer is enabled before cval initialization on a virtualized
> setup on QEMU, cval equals (UINT64_MAX - 1). Adding an offset value
to this causes an overflow that sets the timer into the past, which leads
to an infinite loop: the timer fires immediately and calls
gt_recalc_timer() once more, which in turn sets the timer into the
past again, and as a result QEMU hangs. This patch adds a check for
overflow of the nexttick variable.

This is https://gitlab.com/qemu-project/qemu/-/issues/60 --
thanks for sending a patch.

> Suggested-by: Volodymyr Babchuk 
> Co-Authored-By: Dmytro Firsov 
> Signed-off-by: Leonid Komarianskyi 
> ---
>  target/arm/helper.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 2297626bfb..2fbba15040 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -2618,6 +2618,7 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
>  int istatus = count - offset >= gt->cval;
>  uint64_t nexttick;
>  int irqstate;
> +bool nexttick_overflow = false;
>
>  gt->ctl = deposit32(gt->ctl, 2, 1, istatus);
>
> @@ -2630,6 +2631,16 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
>  } else {
>  /* Next transition is when we hit cval */
>  nexttick = gt->cval + offset;
> +if (nexttick < offset) {
> +/*
> + * If gt->cval value is close to UINT64_MAX then adding
> + * to it offset can lead to overflow of nexttick variable.
> + * So, this check tests that arguments sum is less than any
> + * addend, and in case it is overflowed we have to mod timer
> + * to INT64_MAX.
> + */
> +nexttick_overflow = true;
> +}

Rather than adding in a bool, I think I prefer the version
of the patch in one of the comments to the bug report:

 /* Next transition is when we hit cval */
 nexttick = gt->cval + offset;
+if (nexttick < gt->cval) {
+nexttick = UINT64_MAX;
+}

i.e. we just saturate nexttick, and then let the existing handling
of "turns out nexttick is too big" handle things.

There is also a comment or two from me in the bug report pointing
out that the handling of wraparound is also wrong in the other
half of this if(); we should look at that too.

>  }
>  /*
>   * Note that the desired next expiry time might be beyond the
> @@ -2637,7 +2648,8 @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
>   * set the timer for as far in the future as possible. When the
>   * timer expires we will reset the timer for any remaining period.
>   */
> -if (nexttick > INT64_MAX / gt_cntfrq_period_ns(cpu)) {
> +if ((nexttick > INT64_MAX / gt_cntfrq_period_ns(cpu))
> + || nexttick_overflow) {
>  timer_mod_ns(cpu->gt_timer[timeridx], INT64_MAX);
>  } else {
>  timer_mod(cpu->gt_timer[timeridx], nexttick);
> --
> 2.25.1

thanks
-- PMM
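The saturating check suggested above can be sketched stand-alone (the helper name and test values below are illustrative, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-alone sketch of the saturation suggested above: detect unsigned
 * wraparound of cval + offset and clamp to UINT64_MAX, so the existing
 * "nexttick is too big" path treats it like any far-future expiry.
 * next_tick() is a hypothetical helper, not a QEMU function.
 */
static uint64_t next_tick(uint64_t cval, uint64_t offset)
{
    uint64_t nexttick = cval + offset;

    if (nexttick < cval) {
        /* Wrapped around: saturate instead of firing in the past. */
        nexttick = UINT64_MAX;
    }
    return nexttick;
}
```

The point of saturating rather than flagging is that the existing "desired expiry is too far in the future" branch already handles the clamped value.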



[PATCH] hw/intc/riscv_aplic: Zero init APLIC internal state

2023-04-13 Thread Ivan Klokov
Since g_new is used to initialize the RISCVAPLICState->state structure,
in some cases we get behavior that is not as expected. This patch
changes this to g_new0, which allows the APLIC to be initialized in the
correct state.

Signed-off-by: Ivan Klokov 
---
 hw/intc/riscv_aplic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
index cfd007e629..71591d44bf 100644
--- a/hw/intc/riscv_aplic.c
+++ b/hw/intc/riscv_aplic.c
@@ -803,7 +803,7 @@ static void riscv_aplic_realize(DeviceState *dev, Error 
**errp)
 
 aplic->bitfield_words = (aplic->num_irqs + 31) >> 5;
 aplic->sourcecfg = g_new0(uint32_t, aplic->num_irqs);
-aplic->state = g_new(uint32_t, aplic->num_irqs);
+aplic->state = g_new0(uint32_t, aplic->num_irqs);
 aplic->target = g_new0(uint32_t, aplic->num_irqs);
 if (!aplic->msimode) {
 for (i = 0; i < aplic->num_irqs; i++) {
-- 
2.34.1
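The g_new vs. g_new0 difference the patch relies on can be illustrated with the plain libc allocators (g_new/g_new0 are GLib wrappers with the same initialization semantics; this sketch is not APLIC code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Illustration only: g_new(uint32_t, n) returns uninitialized memory,
 * while g_new0(uint32_t, n) returns zero-filled memory, which is what
 * the APLIC internal state needs at realize time.
 */
static uint32_t *alloc_state_uninit(size_t n)
{
    return malloc(n * sizeof(uint32_t));   /* like g_new: contents undefined */
}

static uint32_t *alloc_state_zeroed(size_t n)
{
    return calloc(n, sizeof(uint32_t));    /* like g_new0: all bits zero */
}
```

Reading the malloc-style array before writing it is exactly the "behavior that is not as expected" the patch avoids.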




Re: [RFC PATCH 0/1] Implement entropy leak reporting for virtio-rng

2023-04-13 Thread Babis Chalios




On 11/4/23 18:20, Jason A. Donenfeld wrote:




On Tue, Apr 11, 2023 at 6:19 PM Amit Shah  wrote:

Hey Babis,

On Mon, 2023-04-03 at 12:52 +0200, Babis Chalios wrote:

This patchset implements the entropy leak reporting feature proposal [1]
for virtio-rng devices.

Entropy leaking (as defined in the specification proposal) typically
happens when we take a snapshot of a VM or while we resume a VM from a
snapshot. In these cases, we want to let the guest know so that it can
reset state that needs to be unique, for example.

This feature is offering functionality similar to what VMGENID does.
However, it allows building mechanisms on the guest side to notify
user-space applications, like VMGENID for userspace and additionally for
kernel.

The new specification describes two request types that the guest might
place in the queues for the device to perform, a fill-on-leak request
where the device needs to fill with random bytes a buffer and a
copy-on-leak request where the device needs to perform a copy between
two guest-provided buffers. We currently trigger the handling of guest
requests when saving the VM state and when loading a VM from a snapshot
file.

This is an RFC, since the corresponding specification changes have not
yet been merged. It also aims to allow testing a respective patch-set
implementing the feature in the Linux front-end driver[2].

However, I would like to ask the community's opinion regarding the
handling of the fill-on-leak requests. Essentially, these requests are
very similar to the normal virtio-rng entropy requests, with the catch
that we should complete these requests before resuming the VM, so that
we avoid race-conditions in notifying the guest about entropy leak
events. This means that we cannot rely on the RngBackend's API, which is
asynchronous. At the moment, I have handled that using getrandom(), but
I would like a solution which doesn't work only with (relatively new)
Linux hosts. I am inclined to solve that by extending the RngBackend API
with a synchronous call to request random bytes, and I'd like to hear
opinions on this approach.

The patch looks OK - I suggest you add a new sync call that also probes
for the availability of getrandom().

qemu_guest_getrandom_nofail?


That should work, I think. Any objections to this Amit?

Cheers,
Babis



Re: [PATCH] softmmu: Move dirtylimit.c into the target independent source set

2023-04-13 Thread Richard Henderson

On 4/13/23 07:45, Thomas Huth wrote:

  uint32_t dirty_ring_size = kvm_dirty_ring_size();
  uint64_t dirty_ring_size_meory_MB =
-dirty_ring_size * TARGET_PAGE_SIZE >> 20;
+dirty_ring_size * qemu_target_page_size() >> 20;


Existing problem, the types here are suspicious: dirty_ring_size is uint32_t, 
dirty_ring_size_meory (typo) is uint64_t.


I wonder if this is better computed as

uint32_t dirty_ring_size_MB = dirty_ring_size >> (20 - qemu_target_page_bits());


r~
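The equivalence of the two formulations can be checked in isolation; the helper names below are stand-ins for the QEMU expressions, with the page size/bits passed in explicitly (an illustrative sketch, not the actual code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch comparing the two formulations discussed above.  The target
 * page size/bits are passed in explicitly instead of using the QEMU
 * helpers; the function names are illustrative.
 */
static uint64_t ring_mb_mul(uint32_t ring_size, uint64_t page_size)
{
    /* Original form, widened to 64 bits before the multiply. */
    return (uint64_t)ring_size * page_size >> 20;
}

static uint32_t ring_mb_shift(uint32_t ring_size, unsigned page_bits)
{
    /* Single-shift form: stays within 32 bits for page_bits <= 20. */
    return ring_size >> (20 - page_bits);
}
```

The single-shift form sidesteps the suspicious uint32_t multiply entirely, which is the point Richard raises above.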




Re: [RFC PATCH v2 38/44] target/loongarch: Implement vbitsel vset

2023-04-13 Thread Richard Henderson

On 4/13/23 04:53, gaosong wrote:


On 2023/4/12 2:53 PM, Richard Henderson wrote:



+#define SETANYEQZ(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
+{   \
+    int i; \
+    bool ret = false;   \
+    VReg *Vj = &(env->fpr[vj].vreg); \
+    \
+    for (i = 0; i < LSX_LEN/BIT; i++) { \
+    ret |= (Vj->E(i) == 0); \
+ } \
+    env->cf[cd & 0x7] = ret;    \
+}
+SETANYEQZ(vsetanyeqz_b, 8, B)
+SETANYEQZ(vsetanyeqz_h, 16, H)
+SETANYEQZ(vsetanyeqz_w, 32, W)
+SETANYEQZ(vsetanyeqz_d, 64, D)


These could be inlined, though slightly harder.
C.f. target/arm/sve_helper.c, do_match2 (your n == 0).


Do you mean an inline like trans_vseteqz_v or just an inline helper function?


I meant inline tcg code generation, instead of a call to a helper.
But even if we keep this in a helper, see do_match2 for avoiding the loop over bytes. 

Ok,
e.g
#define SETANYEQZ(NAME, MO)                                             \
void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj)     \
{                                                                       \
    int i;                                                              \
    bool ret = false;                                                   \
    VReg *Vj = &(env->fpr[vj].vreg);                                    \
                                                                        \
    ret = do_match2(0, (uint64_t)Vj->D(0), (uint64_t)Vj->D(1), MO);     \
    env->cf[cd & 0x7] = ret;                                            \
}
SETANYEQZ(vsetanyeqz_b, MO_8)
SETANYEQZ(vsetanyeqz_h, MO_16)
SETANYEQZ(vsetanyeqz_w, MO_32)
SETANYEQZ(vsetanyeqz_d, MO_64)

and
vsetanyeqz.b    $fcc5  $vr11
   v11    : {edc0004d576eef5b, ec03ec0fec03ea47}
--
do_match2
bits is 8
m1 is ec03ec0fec03ea47
m0 is edc0004d576eef5b
ones is 1010101
sings is 80808080
cmp1 is 0
cmp0 is edc0004d576eef5b
cmp1 is ec03ec0fec03ea47
cmp0 is 1
cmp1 is 3000100
ret is 0

but, the result is not correct for vsetanyeqz.b. :-)


Well, 'ones' as printed above is only 4 bytes instead of 8, similarly 'sings'.  That would 
certainly explain why it did not detect a zero in byte 5 of 'm0'.


Some problem with your conversion of that function?


r~
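For reference, the zero-element detection underlying do_match2 can be sketched stand-alone for the byte case; note the full 64-bit ones/signs constants, whose truncation to 32 bits is the suspected bug in the debug output above (function name and test values are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hedged sketch of an "any byte equals zero" check (the MO_8 case of
 * the discussion above).  The ones/signs masks must be full 64-bit
 * constants; with 32-bit masks, a zero in the upper four bytes of a
 * word is silently missed.
 */
static bool any_byte_zero(uint64_t m0, uint64_t m1)
{
    const uint64_t ones  = 0x0101010101010101ull;
    const uint64_t signs = 0x8080808080808080ull;

    /* (x - ones) & ~x & signs is nonzero iff some byte of x is zero. */
    uint64_t z0 = (m0 - ones) & ~m0 & signs;
    uint64_t z1 = (m1 - ones) & ~m1 & signs;

    return (z0 | z1) != 0;
}
```

With the 128-bit value from the thread (a zero byte sits in the upper word), the full-width masks report true, which matches the expected vsetanyeqz.b result.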



Re: virtio-iommu hotplug issue

2023-04-13 Thread Eric Auger
Hi,

On 4/13/23 13:01, Akihiko Odaki wrote:
> On 2023/04/13 19:40, Jean-Philippe Brucker wrote:
>> Hello,
>>
>> On Thu, Apr 13, 2023 at 01:49:43PM +0900, Akihiko Odaki wrote:
>>> Hi,
>>>
>>> Recently I encountered a problem with the combination of Linux's
>>> virtio-iommu driver and QEMU when a SR-IOV virtual function gets
>>> disabled.
>>> I'd like to ask you what kind of solution is appropriate here and
>>> implement
>>> the solution if possible.
>>>
>>> A PCIe device implementing the SR-IOV specification exports a virtual
>>> function, and the guest can enable or disable it at runtime by
>>> writing to a
>>> configuration register. This effectively looks like a PCI device is
>>> hotplugged for the guest.
>>
>> Just so I understand this better: the guest gets a whole PCIe device PF
>> that implements SR-IOV, and so the guest can dynamically create VFs? 
>> Out
>> of curiosity, is that a hardware device assigned to the guest with VFIO,
>> or a device emulated by QEMU?
>
> Yes, that's right. The guest can dynamically create and delete VFs.
> The device is emulated by QEMU: igb, an Intel NIC recently added to
> QEMU and projected to be released as part of QEMU 8.0.
From the description below I understand you then bind this emulated device
to VFIO in the guest, correct?
>
>>
>>> In such a case, the kernel assumes the endpoint is
>>> detached from the virtio-iommu domain, but QEMU actually does not
>>> detach it.
The QEMU virtio-iommu device executes commands from the virtio-iommu
driver and my understanding is the VFIO infra is not in trouble here. As
suggested by Jean, a detach command is probably missing.
>>>
>>> This inconsistent view of the removed device sometimes prevents the
>>> VM from
>>> correctly performing the following procedure, for example:
>>> 1. Enable a VF.
>>> 2. Disable the VF.
>>> 3. Open a vfio container.
>>> 4. Open the group which the PF belongs to.
>>> 5. Add the group to the vfio container.
>>> 6. Map some memory region.
>>> 7. Close the group.
>>> 8. Close the vfio container.
>>> 9. Repeat 3-8
>>>
>>> When the VF gets disabled, the kernel assumes the endpoint is
>>> detached from
>>> the IOMMU domain, but QEMU actually doesn't detach it. Later, the
>>> domain
>>> will be reused in step 3-8.
>>>
>>> In step 7, the PF will be detached, and the kernel thinks there is no
>>> endpoint attached and the mapping the domain holds is cleared, but
>>> the VF
>>> endpoint is still attached and the mapping is kept intact.
>>>
>>> In step 9, the same domain will be reused again, and the kernel
>>> requests to
>>> create a new mapping, but it will conflict with the existing mapping
>>> and
>>> result in -EINVAL.
>>>
>>> This problem can be fixed by either of:
>>> - requesting the detachment of the endpoint from the guest when the PCI
>>> device is unplugged (the VF is disabled)
>>
>> Yes, I think this is an issue in the virtio-iommu driver, which
>> should be
>> sending a DETACH request when the VF is disabled, likely from
>> viommu_release_device(). I'll work on a fix unless you would like to
>> do it
>
> It will be nice if you prepare a fix. I will test your patch with my
> workload if you share it with me.

I can help testing too

Thanks

Eric
>
> Regards,
> Akihiko Odaki
>
>>
>>> - detecting that the PCI device is gone and automatically detach it on
>>> QEMU-side.
>>>
>>> It is not completely clear for me which solution is more appropriate
>>> as the
>>> virtio-iommu specification is written in a way independent of the
>>> endpoint
>>> mechanism and does not say what should be done when a PCI device is
>>> unplugged.
>>
>> Yes, I'm not sure it's in scope for the specification, it's more about
>> software guidance
>>
>> Thanks,
>> Jean
>




Re: [PATCH] replication: compile out some staff when replication is not configured

2023-04-13 Thread Vladimir Sementsov-Ogievskiy

On 13.04.23 12:52, Zhang, Chen wrote:




-Original Message-
From: Vladimir Sementsov-Ogievskiy 
Sent: Tuesday, April 11, 2023 10:51 PM
To: qemu-devel@nongnu.org
Cc: qemu-bl...@nongnu.org; pbonz...@redhat.com; arm...@redhat.com;
ebl...@redhat.com; jasow...@redhat.com; dgilb...@redhat.com;
quint...@redhat.com; hre...@redhat.com; kw...@redhat.com; Zhang,
Hailiang ; Zhang, Chen
; lizhij...@fujitsu.com;
wencongya...@huawei.com; xiechanglon...@gmail.com; den-
plotni...@yandex-team.ru; Vladimir Sementsov-Ogievskiy

Subject: [PATCH] replication: compile out some staff when replication is not
configured

Don't compile-in replication-related files when replication is disabled in
config.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all!

I'm unsure whether there should actually be separate --disable-colo /
--enable-colo options, or whether this is really only used together with the
replication stuff. So, I decided to start with the simpler variant.



For replication, I think there's nothing wrong with the idea.
But not so for COLO.  The COLO project consists of three independent parts:
replication, migration, and net-proxy.
Each one has the ability to run alone for other purposes. For example, we can
just run filter-mirror/redirector for networking
analysis/debugging. Although the best practice for COLO is to make the three
modules work together, in fact we can also
use only some modules of COLO for other usage scenarios, like COLO migration +
net-proxy for a shared disk, etc.
So I think there is no need to disable all COLO-related modules when
replication is not configured.
For details:
https://wiki.qemu.org/Features/COLO



So, if I want to have an option to disable all COLO modules, do you mean it
should be an additional --disable-colo option? Or is it better to keep one
option, --disable-replication (and maybe just rename it to --disable-colo)?


Thanks
Chen



  block/meson.build |  2 +-
  migration/meson.build |  6 --
  net/meson.build   |  8 
  qapi/migration.json   |  6 --
  stubs/colo.c  | 46 +++
  stubs/meson.build |  1 +
  6 files changed, 60 insertions(+), 9 deletions(-)  create mode 100644
stubs/colo.c

diff --git a/block/meson.build b/block/meson.build index
382bec0e7d..b9a72e219b 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-
win32.c', 'win32-aio.c')
  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
iokit])
  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) -if not
get_option('replication').disabled()
+if get_option('replication').allowed()
block_ss.add(files('replication.c'))
  endif
  block_ss.add(when: libaio, if_true: files('linux-aio.c')) diff --git
a/migration/meson.build b/migration/meson.build index
0d1bb9f96e..8180eaea7b 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,8 +13,6 @@ softmmu_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
-  'colo-failover.c',
-  'colo.c',
'exec.c',
'fd.c',
'global_state.c',
@@ -29,6 +27,10 @@ softmmu_ss.add(files(
'threadinfo.c',
  ), gnutls)

+if get_option('replication').allowed()
+  softmmu_ss.add(files('colo.c', 'colo-failover.c')) endif
+
  softmmu_ss.add(when: rdma, if_true: files('rdma.c'))  if
get_option('live_block_migration').allowed()
softmmu_ss.add(files('block.c'))
diff --git a/net/meson.build b/net/meson.build index
87afca3e93..634ab71cc6 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -1,13 +1,9 @@
  softmmu_ss.add(files(
'announce.c',
'checksum.c',
-  'colo-compare.c',
-  'colo.c',
'dump.c',
'eth.c',
'filter-buffer.c',
-  'filter-mirror.c',
-  'filter-rewriter.c',
'filter.c',
'hub.c',
'net-hmp-cmds.c',
@@ -19,6 +15,10 @@ softmmu_ss.add(files(
'util.c',
  ))

+if get_option('replication').allowed()
+  softmmu_ss.add(files('colo-compare.c', 'colo.c', 'filter-rewriter.c',
+'filter-mirror.c')) endif
+
  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))

  if have_l2tpv3
diff --git a/qapi/migration.json b/qapi/migration.json index
c84fa10e86..5b81e09369 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1685,7 +1685,8 @@
  ##
  { 'struct': 'COLOStatus',
'data': { 'mode': 'COLOMode', 'last-mode': 'COLOMode',
-'reason': 'COLOExitReason' } }
+'reason': 'COLOExitReason' },
+  'if': 'CONFIG_REPLICATION' }

  ##
  # @query-colo-status:
@@ -1702,7 +1703,8 @@
  # Since: 3.1
  ##
  { 'command': 'query-colo-status',
-  'returns': 'COLOStatus' }
+  'returns': 'COLOStatus',
+  'if': 'CONFIG_REPLICATION' }

  ##
  # @migrate-recover:
diff --git a/stubs/colo.c b/stubs/colo.c new file mode 100644 index
00..5a02540baa
--- /dev/null
+++ b/stubs/colo.c
@@ -0,0 +1,46 @@
+#include "qemu/osdep.h"
+#include "qemu/notify.h"
+#include "net/colo-compare.h"
+#include "mig

[PATCH v2] target/riscv: Update check for Zca/Zcf/Zcd

2023-04-13 Thread Weiwei Li
Even though Zca/Zcf/Zcd can be included by C/F/D, their priv
versions are higher than the priv version of C/F/D. So if we check
only for them instead of checking for C/F/D entirely, it will trigger
a new problem when we try to disable these extensions based on the
configured priv version.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
v2:
* Fix code style errors

 target/riscv/insn_trans/trans_rvd.c.inc | 12 +++-
 target/riscv/insn_trans/trans_rvf.c.inc | 14 --
 target/riscv/insn_trans/trans_rvi.c.inc |  5 +++--
 target/riscv/translate.c|  5 +++--
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.c.inc 
b/target/riscv/insn_trans/trans_rvd.c.inc
index 2c51e01c40..6bdb55ef43 100644
--- a/target/riscv/insn_trans/trans_rvd.c.inc
+++ b/target/riscv/insn_trans/trans_rvd.c.inc
@@ -31,9 +31,11 @@
 } \
 } while (0)
 
-#define REQUIRE_ZCD(ctx) do { \
-if (!ctx->cfg_ptr->ext_zcd) {  \
-return false; \
+#define REQUIRE_ZCD_OR_DC(ctx) do { \
+if (!ctx->cfg_ptr->ext_zcd) { \
+if (!has_ext(ctx, RVD) || !has_ext(ctx, RVC)) { \
+return false; \
+} \
 } \
 } while (0)
 
@@ -67,13 +69,13 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 
 static bool trans_c_fld(DisasContext *ctx, arg_fld *a)
 {
-REQUIRE_ZCD(ctx);
+REQUIRE_ZCD_OR_DC(ctx);
 return trans_fld(ctx, a);
 }
 
 static bool trans_c_fsd(DisasContext *ctx, arg_fsd *a)
 {
-REQUIRE_ZCD(ctx);
+REQUIRE_ZCD_OR_DC(ctx);
 return trans_fsd(ctx, a);
 }
 
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
b/target/riscv/insn_trans/trans_rvf.c.inc
index 9e9fa2087a..593855e73a 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -30,10 +30,12 @@
 } \
 } while (0)
 
-#define REQUIRE_ZCF(ctx) do {  \
-if (!ctx->cfg_ptr->ext_zcf) {  \
-return false;  \
-}  \
+#define REQUIRE_ZCF_OR_FC(ctx) do { \
+if (!ctx->cfg_ptr->ext_zcf) {   \
+if (!has_ext(ctx, RVF) || !has_ext(ctx, RVC)) { \
+return false;   \
+}   \
+}   \
 } while (0)
 
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
@@ -69,13 +71,13 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 
 static bool trans_c_flw(DisasContext *ctx, arg_flw *a)
 {
-REQUIRE_ZCF(ctx);
+REQUIRE_ZCF_OR_FC(ctx);
 return trans_flw(ctx, a);
 }
 
 static bool trans_c_fsw(DisasContext *ctx, arg_fsw *a)
 {
-REQUIRE_ZCF(ctx);
+REQUIRE_ZCF_OR_FC(ctx);
 return trans_fsw(ctx, a);
 }
 
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index c70c495fc5..e33f63bea1 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -56,7 +56,7 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
 
 gen_set_pc(ctx, cpu_pc);
-if (!ctx->cfg_ptr->ext_zca) {
+if (!has_ext(ctx, RVC) && !ctx->cfg_ptr->ext_zca) {
 TCGv t0 = tcg_temp_new();
 
 misaligned = gen_new_label();
@@ -169,7 +169,8 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond 
cond)
 
 gen_set_label(l); /* branch taken */
 
-if (!ctx->cfg_ptr->ext_zca && ((ctx->base.pc_next + a->imm) & 0x3)) {
+if (!has_ext(ctx, RVC) && !ctx->cfg_ptr->ext_zca &&
+((ctx->base.pc_next + a->imm) & 0x3)) {
 /* misaligned */
 gen_exception_inst_addr_mis(ctx);
 } else {
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d0094922b6..661e29ab39 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -551,7 +551,7 @@ static void gen_jal(DisasContext *ctx, int rd, target_ulong 
imm)
 
 /* check misaligned: */
 next_pc = ctx->base.pc_next + imm;
-if (!ctx->cfg_ptr->ext_zca) {
+if (!has_ext(ctx, RVC) && !ctx->cfg_ptr->ext_zca) {
 if ((next_pc & 0x3) != 0) {
 gen_exception_inst_addr_mis(ctx);
 return;
@@ -1137,7 +1137,8 @@ static void decode_opc(CPURISCVState *env, DisasContext 
*ctx, uint16_t opcode)
  * The Zca extension is added as way to refer to instructions in the C
  * extension that do not include the floating-point loads and stores
  */
-if (ctx->cfg_ptr->ext_zca && decode_insn16(ctx, opcode)) {
+if ((has_ext(ctx, RVC) || ctx->cfg_ptr->ext_zca) &&
+decode_insn16(ctx, opcode)) {
 return;
 }
 } else {
-- 
2.25.1




Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Anton Kuchin

On 13/04/2023 14:03, Stefan Hajnoczi wrote:

On Thu, 13 Apr 2023 at 04:20, Hanna Czenczek  wrote:

On 12.04.23 22:51, Stefan Hajnoczi wrote:

On Tue, Apr 11, 2023 at 05:05:12PM +0200, Hanna Czenczek wrote:

If the back-end supports the VHOST_USER_F_PROTOCOL_FEATURES feature,
setting the vhost features will set this feature, too.  Doing so
disables all vrings, which may not be intended.

For example, enabling or disabling logging during migration requires
setting those features (to set or unset VHOST_F_LOG_ALL), which will
automatically disable all vrings.  In either case, the VM is running
(disabling logging is done after a failed or cancelled migration, and
only once the VM is running again, see comment in
memory_global_dirty_log_stop()), so the vrings should really be enabled.
As a result, the back-end seems to hang.

To fix this, we must remember whether the vrings are supposed to be
enabled, and, if so, re-enable them after a SET_FEATURES call that set
VHOST_USER_F_PROTOCOL_FEATURES.

It seems less than ideal that there is a short period in which the VM is
running but the vrings will be stopped (between SET_FEATURES and
SET_VRING_ENABLE).  To fix this, we would need to change the protocol,
e.g. by introducing a new flag or vhost-user protocol feature to disable
disabling vrings whenever VHOST_USER_F_PROTOCOL_FEATURES is set, or add
new functions for setting/clearing singular feature bits (so that
F_LOG_ALL can be set/cleared without touching F_PROTOCOL_FEATURES).

Even with such a potential addition to the protocol, we still need this
fix here, because we cannot expect that back-ends will implement this
addition.

Signed-off-by: Hanna Czenczek 
---
   include/hw/virtio/vhost.h | 10 ++
   hw/virtio/vhost.c | 13 +
   2 files changed, 23 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a52f273347..2fe02ed5d4 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -90,6 +90,16 @@ struct vhost_dev {
   int vq_index_end;
   /* if non-zero, minimum required value for max_queues */
   int num_queues;
+
+/*
+ * Whether the virtqueues are supposed to be enabled (via
+ * SET_VRING_ENABLE).  Setting the features (e.g. for
+ * enabling/disabling logging) will disable all virtqueues if
+ * VHOST_USER_F_PROTOCOL_FEATURES is set, so then we need to
+ * re-enable them if this field is set.
+ */
+bool enable_vqs;
+
   /**
* vhost feature handling requires matching the feature set
* offered by a backend which may be a subset of the total
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index a266396576..cbff589efa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -50,6 +50,8 @@ static unsigned int used_memslots;
   static QLIST_HEAD(, vhost_dev) vhost_devices =
   QLIST_HEAD_INITIALIZER(vhost_devices);

+static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable);
+
   bool vhost_has_free_slot(void)
   {
   unsigned int slots_limit = ~0U;
@@ -899,6 +901,15 @@ static int vhost_dev_set_features(struct vhost_dev *dev,
   }
   }

+if (dev->enable_vqs) {
+/*
+ * Setting VHOST_USER_F_PROTOCOL_FEATURES would have disabled all
+ * virtqueues, even if that was not intended; re-enable them if
+ * necessary.
+ */
+vhost_dev_set_vring_enable(dev, true);
+}
+
   out:
   return r;
   }
@@ -1896,6 +1907,8 @@ int vhost_dev_get_inflight(struct vhost_dev *dev, 
uint16_t queue_size,

   static int vhost_dev_set_vring_enable(struct vhost_dev *hdev, int enable)
   {
+hdev->enable_vqs = enable;
+
   if (!hdev->vhost_ops->vhost_set_vring_enable) {
   return 0;
   }

The vhost-user spec doesn't say that VHOST_F_LOG_ALL needs to be toggled
at runtime and I don't think VHOST_USER_SET_PROTOCOL_FEATURES is
intended to be used like that. This issue shows why doing so is a bad
idea.

VHOST_F_LOG_ALL does not need to be toggled to control logging. Logging
is controlled at runtime by the presence of the dirty log
(VHOST_USER_SET_LOG_BASE) and the per-vring logging flag
(VHOST_VRING_F_LOG).

Technically, the spec doesn’t say that SET_LOG_BASE is required.  It says:

“To start/stop logging of data/used ring writes, the front-end may send
messages VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and
VHOST_USER_SET_VRING_ADDR with VHOST_VRING_F_LOG in ring’s flags set to
1/0, respectively.”

(So the spec also very much does imply that toggling F_LOG_ALL at
runtime is a valid way to enable/disable logging.  If we were to no
longer do that, we should clarify it there.)

I missed that VHOST_VRING_F_LOG only controls logging used ring writes
while writes to descriptors are always logged when VHOST_F_LOG_ALL is
set. I agree that the spec does require VHOST_F_LOG_ALL to be toggled
at runtime.

What I suggested won't work.


But is there a valid use-case for logging some dirty memory but not all?
I

Re: [PATCH V2] intel_iommu: refine iotlb hash calculation

2023-04-13 Thread Peter Xu
On Wed, Apr 12, 2023 at 03:35:10PM +0800, Jason Wang wrote:
> Commit 1b2b12376c8 ("intel-iommu: PASID support") takes PASID into
> account when calculating iotlb hash like:
> 
> static guint vtd_iotlb_hash(gconstpointer v)
> {
> const struct vtd_iotlb_key *key = v;
> 
> return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
>(key->level) << VTD_IOTLB_LVL_SHIFT |
>(key->pasid) << VTD_IOTLB_PASID_SHIFT;
> }
> 
> This turns out to be problematic since:
> 
> - the shift will lose bits if not converting to uint64_t
> - level should be off by one in order to fit into 2 bits
> - VTD_IOTLB_PASID_SHIFT is 30 but PASID is 20 bits which will waste
>   some bits
> - the hash result is uint64_t so we will lose bits when converting to
>   guint
> 
> So this patch fixes them by
> 
> - converting the keys into uint64_t before doing the shift
> - off level by one to make it fit into two bits
> - change the sid, lvl and pasid shift to 26, 42 and 44 in order to
>   take the full width of uint64_t
> - perform an XOR to the top 32bit with the bottom 32bit for the final
>   result to fit guint
> 
> Fixes: Coverity CID 1508100
> Fixes: 1b2b12376c8 ("intel-iommu: PASID support")
> Signed-off-by: Jason Wang 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [RFC PATCH v2] riscv: Add support for the Zfa extension

2023-04-13 Thread Christoph Müllner

On Mon, Apr 10, 2023 at 3:23 PM LIU Zhiwei  wrote:
>
>
> On 2023/4/1 2:28, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch introduces the RISC-V Zfa extension, which introduces
> > additional floating-point extensions:
> > * fli (load-immediate) with pre-defined immediates
> > * fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
> > * fround/froundmx (round to integer)
> > * fcvtmod.w.d (Modular Convert-to-Integer)
> > * fmv* to access high bits of float register bigger than XLEN
> > * Quiet comparison instructions (fleq/fltq)
> >
> > Zfa defines its instructions in combination with the following extensions:
> > * single-precision floating-point (F)
> > * double-precision floating-point (D)
> > * quad-precision floating-point (Q)
> > * half-precision floating-point (Zfh)
> >
> > Since QEMU does not support the RISC-V quad-precision floating-point
> > ISA extension (Q), this patch does not include the instructions that
> > depend on this extension. All other instructions are included in this
> > patch.
> >
> > The Zfa specification is not frozen at the moment (which is why this
> > patch is RFC) and can be found here:
> >https://github.com/riscv/riscv-isa-manual/blob/master/src/zfa.tex
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> > Changes in v2:
> > * Remove calls to mark_fs_dirty() in comparison trans functions
> > * Rewrite fround(nx) using float*_round_to_int()
> > * Move fli* to translation unit and fix NaN-boxing of NaN values
> > * Reimplement FCVTMOD.W.D
> > * Add use of second register in trans_fmvp_d_x()
> >
> >   target/riscv/cpu.c|   8 +
> >   target/riscv/cpu.h|   1 +
> >   target/riscv/fpu_helper.c | 258 +++
> >   target/riscv/helper.h |  19 +
> >   target/riscv/insn32.decode|  67 +++
> >   target/riscv/insn_trans/trans_rvzfa.c.inc | 529 ++
> >   target/riscv/translate.c  |   1 +
> >   7 files changed, 883 insertions(+)
> >   create mode 100644 target/riscv/insn_trans/trans_rvzfa.c.inc
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index 1e97473af2..bac9ced4a2 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -83,6 +83,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
> >   ISA_EXT_DATA_ENTRY(zifencei, true, PRIV_VERSION_1_10_0, ext_ifencei),
> >   ISA_EXT_DATA_ENTRY(zihintpause, true, PRIV_VERSION_1_10_0, 
> > ext_zihintpause),
> >   ISA_EXT_DATA_ENTRY(zawrs, true, PRIV_VERSION_1_12_0, ext_zawrs),
> > +ISA_EXT_DATA_ENTRY(zfa, true, PRIV_VERSION_1_12_0, ext_zfa),
> >   ISA_EXT_DATA_ENTRY(zfh, true, PRIV_VERSION_1_11_0, ext_zfh),
> >   ISA_EXT_DATA_ENTRY(zfhmin, true, PRIV_VERSION_1_12_0, ext_zfhmin),
> >   ISA_EXT_DATA_ENTRY(zfinx, true, PRIV_VERSION_1_12_0, ext_zfinx),
> > @@ -404,6 +405,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
> >   cpu->cfg.ext_u = true;
> >   cpu->cfg.ext_s = true;
> >   cpu->cfg.ext_icsr = true;
> > +cpu->cfg.ext_zfa = true;
> >   cpu->cfg.ext_zfh = true;
> >   cpu->cfg.mmu = true;
> >   cpu->cfg.ext_xtheadba = true;
> > @@ -865,6 +867,11 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU 
> > *cpu, Error **errp)
> >   return;
> >   }
> >
> > +if (cpu->cfg.ext_zfa && !cpu->cfg.ext_f) {
> > +error_setg(errp, "Zfa extension requires F extension");
> > +return;
> > +}
> > +
> >   if (cpu->cfg.ext_zfh) {
> >   cpu->cfg.ext_zfhmin = true;
> >   }
> > @@ -1381,6 +1388,7 @@ static Property riscv_cpu_extensions[] = {
> >   DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
> >   DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
> >   DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true),
> > +DEFINE_PROP_BOOL("Zfa", RISCVCPU, cfg.ext_zfa, false),
> >   DEFINE_PROP_BOOL("Zfh", RISCVCPU, cfg.ext_zfh, false),
> >   DEFINE_PROP_BOOL("Zfhmin", RISCVCPU, cfg.ext_zfhmin, false),
> >   DEFINE_PROP_BOOL("Zve32f", RISCVCPU, cfg.ext_zve32f, false),
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index 638e47c75a..deae410fc2 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -462,6 +462,7 @@ struct RISCVCPUConfig {
> >   bool ext_svpbmt;
> >   bool ext_zdinx;
> >   bool ext_zawrs;
> > +bool ext_zfa;
> >   bool ext_zfh;
> >   bool ext_zfhmin;
> >   bool ext_zfinx;
> > diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
> > index 449d236df6..c0ebaa040f 100644
> > --- a/target/riscv/fpu_helper.c
> > +++ b/target/riscv/fpu_helper.c
> > @@ -252,6 +252,21 @@ uint64_t helper_fmin_s(CPURISCVState *env, uint64_t 
> > rs1, uint64_t rs2)
> >   float32_minimum_number(frs1, frs2, &env->fp_status));
> >   }
> >
> > +uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
>

Re: [PATCH] riscv: Add support for the Zfa extension

2023-04-13 Thread Christoph Müllner
On Fri, Mar 31, 2023 at 11:39 PM Richard Henderson
 wrote:
>
> On 3/31/23 11:22, Christoph Müllner wrote:
> > On Mon, Mar 27, 2023 at 7:18 PM Richard Henderson
> >  wrote:
> >>
> >> On 3/27/23 01:00, Christoph Muellner wrote:
> >>> +uint64_t helper_fminm_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
> >>> +{
> >>> +float32 frs1 = check_nanbox_s(env, rs1);
> >>> +float32 frs2 = check_nanbox_s(env, rs2);
> >>> +
> >>> +if (float32_is_any_nan(frs1) || float32_is_any_nan(frs2)) {
> >>> +return float32_default_nan(&env->fp_status);
> >>> +}
> >>> +
> >>> +return nanbox_s(env, float32_minimum_number(frs1, frs2, 
> >>> &env->fp_status));
> >>> +}
> >>
> >> Better to set and clear fp_status->default_nan_mode around the operation.
> >
> > I don't see how this can help:
> > * default_nan_mode defines if the default_nan is generated or if the
> > operand's NaN should be used
> > * RISC-V has default_nan_mode always set to true (operations should
> > return a canonical NaN and not propagate NaN values)
> > * That also does not help to eliminate the is_any_nan() tests, because
> > float*_minimum_number() and float*_minnum() return the non-NaN number
> > if (only) one operand is NaN
> >
> > Am I missing something?
>
> Oh goodness, I did mis-read this.
>
> But if you need a nan when an input is a nan, then float32_min instead of
> float32_minimum_number (which goes out of its way to select the non-nan 
> result) is the
> correct function to use.

Understood and fixed.
Thanks!

>
>
> r~



[PATCH 0/2] chardev/char-file: Allow setting input file on command line

2023-04-13 Thread Peter Maydell
Our 'file' chardev backend supports specifying both an
input and an output file, but only if you create it via
the QMP interface -- there is no command-line syntax
support for setting the input file. This patchset adds
an extra 'input-path' option to the chardev.

The specific use case I have is that I'd like to be able to
feed fuzzer reproducer input into qtest without having to use
'-qtest stdio' and put the input onto stdin. Being able to
use a file chardev like this:
 -chardev file,id=repro,path=/dev/null,input-path=repro.txt -qtest chardev:repro
means that stdio is free for use by gdb.

The first patch in the series fixes an assertion failure
in the qtest code if you try to pass it a named chardev;
the second patch adds the new option to the file backend.

thanks
-- PMM

Peter Maydell (2):
  qtest: Don't assert on "-qtest chardev:myid"
  chardev: Allow setting file chardev input file on the command line

 chardev/char-file.c |  8 
 chardev/char.c  |  3 +++
 softmmu/qtest.c |  2 +-
 qemu-options.hx | 10 --
 4 files changed, 20 insertions(+), 3 deletions(-)

-- 
2.34.1




[PATCH 1/2] qtest: Don't assert on "-qtest chardev:myid"

2023-04-13 Thread Peter Maydell
If the -qtest command line argument is passed a string that says
"use this chardev for I/O", then it will assert:

$ ./build/clang/qemu-system-i386 -chardev file,path=/dev/null,id=myid -qtest 
chardev:myid
Unexpected error in qtest_set_chardev() at ../../softmmu/qtest.c:1011:
qemu-system-i386: Cannot find character device 'qtest'
Aborted (core dumped)

This is because in qtest_server_init() we assume that when we create
the chardev with qemu_chr_new() it will always have the name "qtest".
This is true if qemu_chr_new() had to create a new chardev, but not
true if one already existed and is being referred to with
"chardev:myid".

Use the name of the chardev we get back from qemu_chr_new() as the
string to set the qtest 'chardev' property to, instead of hardcoding
it to "qtest".

Signed-off-by: Peter Maydell 
---
 softmmu/qtest.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index 34bd2a33a76..26852996b5b 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -867,7 +867,7 @@ void qtest_server_init(const char *qtest_chrdev, const char 
*qtest_log, Error **
 }
 
 qtest = object_new(TYPE_QTEST);
-object_property_set_str(qtest, "chardev", "qtest", &error_abort);
+object_property_set_str(qtest, "chardev", chr->label, &error_abort);
 if (qtest_log) {
 object_property_set_str(qtest, "log", qtest_log, &error_abort);
 }
-- 
2.34.1




[PATCH 2/2] chardev: Allow setting file chardev input file on the command line

2023-04-13 Thread Peter Maydell
Our 'file' chardev backend supports both "output from this chardev
is written to a file" and "input from this chardev should be read
from a file" (except on Windows). However, you can only set up
the input file if you're using the QMP interface -- there is no
command line syntax to do it.

Add command line syntax to allow specifying an input file
as well as an output file, using a new 'input-path' suboption.

The specific use case I have is that I'd like to be able to
feed fuzzer reproducer input into qtest without having to use
'-qtest stdio' and put the input onto stdin. Being able to
use a file chardev like this:
 -chardev file,id=repro,path=/dev/null,input-path=repro.txt -qtest chardev:repro
means that stdio is free for use by gdb.

Signed-off-by: Peter Maydell 
---
The "not on Windows" ifdeffery is because qmp_chardev_open_file()
does something similar; it seems likely to produce a nicer
error message to catch it at parse time rather than open time.
---
 chardev/char-file.c |  8 
 chardev/char.c  |  3 +++
 qemu-options.hx | 10 --
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/chardev/char-file.c b/chardev/char-file.c
index 3a7b9caf6f0..263e6da5636 100644
--- a/chardev/char-file.c
+++ b/chardev/char-file.c
@@ -100,6 +100,7 @@ static void qemu_chr_parse_file_out(QemuOpts *opts, 
ChardevBackend *backend,
 Error **errp)
 {
 const char *path = qemu_opt_get(opts, "path");
+const char *inpath = qemu_opt_get(opts, "input-path");
 ChardevFile *file;
 
 backend->type = CHARDEV_BACKEND_KIND_FILE;
@@ -107,9 +108,16 @@ static void qemu_chr_parse_file_out(QemuOpts *opts, 
ChardevBackend *backend,
 error_setg(errp, "chardev: file: no filename given");
 return;
 }
+#ifdef _WIN32
+if (inpath) {
+error_setg(errp, "chardev: file: input-path not supported on Windows");
+return;
+}
+#endif
 file = backend->u.file.data = g_new0(ChardevFile, 1);
 qemu_chr_parse_common(opts, qapi_ChardevFile_base(file));
 file->out = g_strdup(path);
+file->in = g_strdup(inpath);
 
 file->has_append = true;
 file->append = qemu_opt_get_bool(opts, "append", false);
diff --git a/chardev/char.c b/chardev/char.c
index e69390601fc..661ad8176a9 100644
--- a/chardev/char.c
+++ b/chardev/char.c
@@ -805,6 +805,9 @@ QemuOptsList qemu_chardev_opts = {
 },{
 .name = "path",
 .type = QEMU_OPT_STRING,
+},{
+.name = "input-path",
+.type = QEMU_OPT_STRING,
 },{
 .name = "host",
 .type = QEMU_OPT_STRING,
diff --git a/qemu-options.hx b/qemu-options.hx
index 59bdf67a2c5..31d08c60264 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3360,7 +3360,7 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
 "-chardev 
vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
 " [,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 "-chardev ringbuf,id=id[,size=size][,logfile=PATH][,logappend=on|off]\n"
-"-chardev 
file,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
+"-chardev 
file,id=id,path=path[,input-path=input-path][,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 "-chardev 
pipe,id=id,path=path[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 #ifdef _WIN32
 "-chardev console,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
@@ -3563,13 +3563,19 @@ The available backends are:
 Create a ring buffer with fixed size ``size``. size must be a power
 of two and defaults to ``64K``.
 
-``-chardev file,id=id,path=path``
+``-chardev file,id=id,path=path[,input-path=input-path]``
 Log all traffic received from the guest to a file.
 
 ``path`` specifies the path of the file to be opened. This file will
 be created if it does not already exist, and overwritten if it does.
 ``path`` is required.
 
+If ``input-path`` is specified, this is the path of a second file
+which will be used for input. If ``input-path`` is not specified,
+no input will be available from the chardev.
+
+Note that ``input-path`` is not supported on Windows hosts.
+
 ``-chardev pipe,id=id,path=path``
 Create a two-way connection to the guest. The behaviour differs
 slightly between Windows hosts and other hosts:
-- 
2.34.1




[PATCH 1/2] tcg: ppc64: Fix mask generation for vextractdm

2023-04-13 Thread Shivaprasad G Bhat
In function do_extractm() the mask is calculated as
dup_const(1 << (element_width - 1)). A plain signed-int '1'
works fine for MO_8, MO_16 and MO_32, but for MO_64 the shift
by 63 overflows the 32-bit int, and on a PPC64 host the mask
ends up evaluating to 0. vextractdm uses MO_64, so it ends up
with a mask of 0.

Explicitly use 1ULL instead of a signed int 1, as is done
everywhere else.

Signed-off-by: Shivaprasad G Bhat 
---
 target/ppc/translate/vmx-impl.c.inc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index 112233b541..c8712dd7d8 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2058,7 +2058,7 @@ static bool trans_VEXPANDQM(DisasContext *ctx, arg_VX_tb 
*a)
 static bool do_vextractm(DisasContext *ctx, arg_VX_tb *a, unsigned vece)
 {
 const uint64_t elem_width = 8 << vece, elem_count_half = 8 >> vece,
-   mask = dup_const(vece, 1 << (elem_width - 1));
+   mask = dup_const(vece, 1ULL << (elem_width - 1));
 uint64_t i, j;
 TCGv_i64 lo, hi, t0, t1;
 





[PATCH 0/2] tcg: ppc64: Fix mask generation for vextractdm

2023-04-13 Thread Shivaprasad G Bhat
While debugging gitlab issue 1536 [1], I happened to try the
vextract[X]m instructions on real hardware. The test used in
[1] fails for vextractdm.

On debugging it is seen that in function do_extractm() the
mask is calculated as dup_const(1 << (element_width - 1)).
A plain signed-int '1' works fine for MO_8, MO_16 and MO_32,
but for MO_64 the shift by 63 overflows the 32-bit int, and on
a PPC64 host the mask ends up evaluating to 0. vextractdm uses
MO_64, so it ends up with a mask of 0.

The first patch here fixes that by explicitly using 1ULL
instead of a signed int 1, as is done everywhere else.
The second patch introduces the test case from [1] into the
qemu tcg/ppc64 tests, along with fixes/tweaks to make it work
for both big- and little-endian targets.

Let me know if both patches should be squashed into a single
patch. Checkpatch flagged the use of __BYTE_ORDER__ in the
test file (second patch); however, I see it also being used in
multiarch/sha1.c, and since this is an arch-specific test I
think it is appropriate to use it here. Let me know if you
think otherwise.

References:
[1] : https://gitlab.com/qemu-project/qemu/-/issues/1536

---

Shivaprasad G Bhat (2):
  tcg: ppc64: Fix mask generation for vextractdm
  tests: tcg: ppc64: Add tests for Vector Extract Mask Instructions


 target/ppc/translate/vmx-impl.c.inc |  2 +-
 tests/tcg/ppc64/Makefile.target |  6 +++-
 tests/tcg/ppc64/vector.c| 50 +
 3 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/ppc64/vector.c

--
Signature




[PATCH 2/2] tests: tcg: ppc64: Add tests for Vector Extract Mask Instructions

2023-04-13 Thread Shivaprasad G Bhat
Add test for vextractbm, vextractwm, vextractdm and vextractqm
instructions. Test works for both qemu-ppc64 and qemu-ppc64le.

Based on the test case written by John Platts posted at [1]

References:
[1]: https://gitlab.com/qemu-project/qemu/-/issues/1536

Signed-off-by: John Platts 
Signed-off-by: Shivaprasad G Bhat 
---
 tests/tcg/ppc64/Makefile.target |6 -
 tests/tcg/ppc64/vector.c|   50 +++
 2 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/ppc64/vector.c

diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index f081f1c683..4fd543ce28 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -20,7 +20,7 @@ PPC64_TESTS += mtfsf
 PPC64_TESTS += mffsce
 
 ifneq ($(CROSS_CC_HAS_POWER10),)
-PPC64_TESTS += byte_reverse sha512-vector
+PPC64_TESTS += byte_reverse sha512-vector vector
 endif
 byte_reverse: CFLAGS += -mcpu=power10
 run-byte_reverse: QEMU_OPTS+=-cpu POWER10
@@ -33,6 +33,10 @@ sha512-vector: sha512.c
 run-sha512-vector: QEMU_OPTS+=-cpu POWER10
 run-plugin-sha512-vector-with-%: QEMU_OPTS+=-cpu POWER10
 
+vector: CFLAGS += -mcpu=power10
+run-vector: QEMU_OPTS += -cpu POWER10
+run-plugin-vector-with-%: QEMU_OPTS += -cpu POWER10
+
 PPC64_TESTS += signal_save_restore_xer
 PPC64_TESTS += xxspltw
 
diff --git a/tests/tcg/ppc64/vector.c b/tests/tcg/ppc64/vector.c
new file mode 100644
index 00..3cb2b88c87
--- /dev/null
+++ b/tests/tcg/ppc64/vector.c
@@ -0,0 +1,50 @@
+#include <altivec.h>
+#include <assert.h>
+
+int main(void)
+{
+unsigned int result_wi;
+vector unsigned char vbc_bi_src = { 0xFF, 0xFF, 0, 0xFF, 0xFF, 0xFF,
+0xFF, 0xFF, 0xFF, 0xFF, 0, 0, 0,
+0, 0xFF, 0xFF};
+vector unsigned short vbc_hi_src = { 0xFFFF, 0, 0, 0xFFFF,
+ 0, 0, 0xFFFF, 0xFFFF};
+vector unsigned int vbc_wi_src = {0, 0, 0xFFFFFFFF, 0xFFFFFFFF};
+vector unsigned long long vbc_di_src = {0xFFFFFFFFFFFFFFFF, 0};
+vector __uint128_t vbc_qi_src;
+
+asm("vextractbm %0, %1" : "=r" (result_wi) : "v" (vbc_bi_src));
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+assert(result_wi == 0b1101111111000011);
+#else
+assert(result_wi == 0b1100001111111011);
+#endif
+
+asm("vextracthm %0, %1" : "=r" (result_wi) : "v" (vbc_hi_src));
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+assert(result_wi == 0b10010011);
+#else
+assert(result_wi == 0b11001001);
+#endif
+
+asm("vextractwm %0, %1" : "=r" (result_wi) : "v" (vbc_wi_src));
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+assert(result_wi == 0b0011);
+#else
+assert(result_wi == 0b1100);
+#endif
+
+asm("vextractdm %0, %1" : "=r" (result_wi) : "v" (vbc_di_src));
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+assert(result_wi == 0b10);
+#else
+assert(result_wi == 0b01);
+#endif
+
+vbc_qi_src[0] = 0x1;
+vbc_qi_src[0] = vbc_qi_src[0] << 127;
+asm("vextractqm %0, %1" : "=r" (result_wi) : "v" (vbc_qi_src));
+assert(result_wi == 0b1);
+
+return 0;
+}





[PATCH v3 1/6] virtio-input: generalize virtio_input_key_config()

2023-04-13 Thread Sergio Lopez
As there are other bitmap-based config properties that need to be dealt with
in a similar fashion to VIRTIO_INPUT_CFG_EV_BITS, generalize the function to
receive select and subsel as arguments, and rename it to
virtio_input_extend_config().

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 hw/input/virtio-input-hid.c | 38 -
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/hw/input/virtio-input-hid.c b/hw/input/virtio-input-hid.c
index a7a244a95d..d28dab69ba 100644
--- a/hw/input/virtio-input-hid.c
+++ b/hw/input/virtio-input-hid.c
@@ -44,30 +44,31 @@ static const unsigned short axismap_abs[INPUT_AXIS__MAX] = {
 
/* ----------------------------------------------------------------- */
 
-static void virtio_input_key_config(VirtIOInput *vinput,
-const unsigned short *keymap,
-size_t mapsize)
+static void virtio_input_extend_config(VirtIOInput *vinput,
+   const unsigned short *map,
+   size_t mapsize,
+   uint8_t select, uint8_t subsel)
 {
-virtio_input_config keys;
+virtio_input_config ext;
 int i, bit, byte, bmax = 0;
 
-memset(&keys, 0, sizeof(keys));
+memset(&ext, 0, sizeof(ext));
 for (i = 0; i < mapsize; i++) {
-bit = keymap[i];
+bit = map[i];
 if (!bit) {
 continue;
 }
 byte = bit / 8;
 bit  = bit % 8;
-keys.u.bitmap[byte] |= (1 << bit);
+ext.u.bitmap[byte] |= (1 << bit);
 if (bmax < byte+1) {
 bmax = byte+1;
 }
 }
-keys.select = VIRTIO_INPUT_CFG_EV_BITS;
-keys.subsel = EV_KEY;
-keys.size   = bmax;
-virtio_input_add_config(vinput, &keys);
+ext.select = select;
+ext.subsel = subsel;
+ext.size   = bmax;
+virtio_input_add_config(vinput, &ext);
 }
 
 static void virtio_input_handle_event(DeviceState *dev, QemuConsole *src,
@@ -281,8 +282,9 @@ static void virtio_keyboard_init(Object *obj)
 
 vhid->handler = &virtio_keyboard_handler;
 virtio_input_init_config(vinput, virtio_keyboard_config);
-virtio_input_key_config(vinput, qemu_input_map_qcode_to_linux,
-qemu_input_map_qcode_to_linux_len);
+virtio_input_extend_config(vinput, qemu_input_map_qcode_to_linux,
+   qemu_input_map_qcode_to_linux_len,
+   VIRTIO_INPUT_CFG_EV_BITS, EV_KEY);
 }
 
 static const TypeInfo virtio_keyboard_info = {
@@ -373,8 +375,9 @@ static void virtio_mouse_init(Object *obj)
 virtio_input_init_config(vinput, vhid->wheel_axis
  ? virtio_mouse_config_v2
  : virtio_mouse_config_v1);
-virtio_input_key_config(vinput, keymap_button,
-ARRAY_SIZE(keymap_button));
+virtio_input_extend_config(vinput, keymap_button,
+   ARRAY_SIZE(keymap_button),
+   VIRTIO_INPUT_CFG_EV_BITS, EV_KEY);
 }
 
 static const TypeInfo virtio_mouse_info = {
@@ -497,8 +500,9 @@ static void virtio_tablet_init(Object *obj)
 virtio_input_init_config(vinput, vhid->wheel_axis
  ? virtio_tablet_config_v2
  : virtio_tablet_config_v1);
-virtio_input_key_config(vinput, keymap_button,
-ARRAY_SIZE(keymap_button));
+virtio_input_extend_config(vinput, keymap_button,
+   ARRAY_SIZE(keymap_button),
+   VIRTIO_INPUT_CFG_EV_BITS, EV_KEY);
 }
 
 static const TypeInfo virtio_tablet_info = {
-- 
2.38.1




[PATCH v3 4/6] virtio-input-pci: add virtio-multitouch-pci

2023-04-13 Thread Sergio Lopez
Add virtio-multitouch-pci, a Multitouch-capable input device, to the
list of devices that can be provided by virtio-input-pci.

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 hw/virtio/virtio-input-pci.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-input-pci.c b/hw/virtio/virtio-input-pci.c
index a9d0992389..a53edf46c4 100644
--- a/hw/virtio/virtio-input-pci.c
+++ b/hw/virtio/virtio-input-pci.c
@@ -25,10 +25,11 @@ struct VirtIOInputPCI {
 VirtIOInput vdev;
 };
 
-#define TYPE_VIRTIO_INPUT_HID_PCI "virtio-input-hid-pci"
-#define TYPE_VIRTIO_KEYBOARD_PCI  "virtio-keyboard-pci"
-#define TYPE_VIRTIO_MOUSE_PCI "virtio-mouse-pci"
-#define TYPE_VIRTIO_TABLET_PCI"virtio-tablet-pci"
+#define TYPE_VIRTIO_INPUT_HID_PCI  "virtio-input-hid-pci"
+#define TYPE_VIRTIO_KEYBOARD_PCI   "virtio-keyboard-pci"
+#define TYPE_VIRTIO_MOUSE_PCI  "virtio-mouse-pci"
+#define TYPE_VIRTIO_TABLET_PCI "virtio-tablet-pci"
+#define TYPE_VIRTIO_MULTITOUCH_PCI "virtio-multitouch-pci"
 OBJECT_DECLARE_SIMPLE_TYPE(VirtIOInputHIDPCI, VIRTIO_INPUT_HID_PCI)
 
 struct VirtIOInputHIDPCI {
@@ -102,6 +103,14 @@ static void virtio_tablet_initfn(Object *obj)
 TYPE_VIRTIO_TABLET);
 }
 
+static void virtio_multitouch_initfn(Object *obj)
+{
+VirtIOInputHIDPCI *dev = VIRTIO_INPUT_HID_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_MULTITOUCH);
+}
+
 static const TypeInfo virtio_input_pci_info = {
 .name  = TYPE_VIRTIO_INPUT_PCI,
 .parent= TYPE_VIRTIO_PCI,
@@ -140,6 +149,13 @@ static const VirtioPCIDeviceTypeInfo 
virtio_tablet_pci_info = {
 .instance_init = virtio_tablet_initfn,
 };
 
+static const VirtioPCIDeviceTypeInfo virtio_multitouch_pci_info = {
+.generic_name  = TYPE_VIRTIO_MULTITOUCH_PCI,
+.parent= TYPE_VIRTIO_INPUT_HID_PCI,
+.instance_size = sizeof(VirtIOInputHIDPCI),
+.instance_init = virtio_multitouch_initfn,
+};
+
 static void virtio_pci_input_register(void)
 {
 /* Base types: */
@@ -150,6 +166,7 @@ static void virtio_pci_input_register(void)
 virtio_pci_types_register(&virtio_keyboard_pci_info);
 virtio_pci_types_register(&virtio_mouse_pci_info);
 virtio_pci_types_register(&virtio_tablet_pci_info);
+virtio_pci_types_register(&virtio_multitouch_pci_info);
 }
 
 type_init(virtio_pci_input_register)
-- 
2.38.1




[PATCH v3 6/6] ui/gtk: enable backend to send multi-touch events

2023-04-13 Thread Sergio Lopez
GTK3 provides the infrastructure to receive and process multi-touch
events through the "touch-event" signal and the GdkEventTouch type.
Make use of it to transpose events from the host to the guest.

This allows users of machines with hardware capable of receiving
multi-touch events to run guests that can also receive those events
and interpret them as gestures, when appropriate.

An example of this in action can be seen here:

 https://fosstodon.org/@slp/109545849296546767

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 ui/gtk.c | 92 
 1 file changed, 92 insertions(+)

diff --git a/ui/gtk.c b/ui/gtk.c
index f16e0f8dee..b3e6443943 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -130,6 +130,13 @@ typedef struct VCChardev VCChardev;
 DECLARE_INSTANCE_CHECKER(VCChardev, VC_CHARDEV,
  TYPE_CHARDEV_VC)
 
+struct touch_slot {
+int x;
+int y;
+int tracking_id;
+};
+static struct touch_slot touch_slots[INPUT_EVENT_SLOTS_MAX];
+
 bool gtk_use_gl_area;
 
 static void gd_grab_pointer(VirtualConsole *vc, const char *reason);
@@ -1059,6 +1066,82 @@ static gboolean gd_scroll_event(GtkWidget *widget, 
GdkEventScroll *scroll,
 }
 
 
+static gboolean gd_touch_event(GtkWidget *widget, GdkEventTouch *touch,
+   void *opaque)
+{
+VirtualConsole *vc = opaque;
+struct touch_slot *slot;
+uint64_t num_slot = GPOINTER_TO_UINT(touch->sequence);
+bool needs_sync = false;
+int update;
+int type = -1;
+int i;
+
+if (num_slot >= INPUT_EVENT_SLOTS_MAX) {
+warn_report("gtk: unexpected touch slot number: %" PRIu64 " >= %d",
+num_slot, INPUT_EVENT_SLOTS_MAX);
+return FALSE;
+}
+
+slot = &touch_slots[num_slot];
+slot->x = touch->x;
+slot->y = touch->y;
+
+switch (touch->type) {
+case GDK_TOUCH_BEGIN:
+type = INPUT_MULTITOUCH_TYPE_BEGIN;
+slot->tracking_id = num_slot;
+break;
+case GDK_TOUCH_UPDATE:
+type = INPUT_MULTITOUCH_TYPE_UPDATE;
+break;
+case GDK_TOUCH_END:
+case GDK_TOUCH_CANCEL:
+type = INPUT_MULTITOUCH_TYPE_END;
+break;
+default:
+warn_report("gtk: unexpected touch event type\n");
+}
+
+for (i = 0; i < INPUT_EVENT_SLOTS_MAX; ++i) {
+if (i == num_slot) {
+update = type;
+} else {
+update = INPUT_MULTITOUCH_TYPE_UPDATE;
+}
+
+slot = &touch_slots[i];
+
+if (slot->tracking_id == -1) {
+continue;
+}
+
+if (update == INPUT_MULTITOUCH_TYPE_END) {
+slot->tracking_id = -1;
+qemu_input_queue_mtt(vc->gfx.dcl.con, update, i, 
slot->tracking_id);
+needs_sync = true;
+} else {
+qemu_input_queue_mtt(vc->gfx.dcl.con, update, i, 
slot->tracking_id);
+qemu_input_queue_btn(vc->gfx.dcl.con, INPUT_BUTTON_TOUCH, true);
+qemu_input_queue_mtt_abs(vc->gfx.dcl.con,
+ INPUT_AXIS_X, (int) slot->x,
+ 0, surface_width(vc->gfx.ds),
+ i, slot->tracking_id);
+qemu_input_queue_mtt_abs(vc->gfx.dcl.con,
+ INPUT_AXIS_Y, (int) slot->y,
+ 0, surface_height(vc->gfx.ds),
+ i, slot->tracking_id);
+needs_sync = true;
+}
+}
+
+if (needs_sync) {
+qemu_input_event_sync();
+}
+
+return TRUE;
+}
+
 static const guint16 *gd_get_keymap(size_t *maplen)
 {
 GdkDisplay *dpy = gdk_display_get_default();
@@ -1980,6 +2063,8 @@ static void gd_connect_vc_gfx_signals(VirtualConsole *vc)
  G_CALLBACK(gd_key_event), vc);
 g_signal_connect(vc->gfx.drawing_area, "key-release-event",
  G_CALLBACK(gd_key_event), vc);
+g_signal_connect(vc->gfx.drawing_area, "touch-event",
+ G_CALLBACK(gd_touch_event), vc);
 
 g_signal_connect(vc->gfx.drawing_area, "enter-notify-event",
  G_CALLBACK(gd_enter_event), vc);
@@ -2089,6 +2174,7 @@ static GSList *gd_vc_gfx_init(GtkDisplayState *s, 
VirtualConsole *vc,
   GSList *group, GtkWidget *view_menu)
 {
 bool zoom_to_fit = false;
+int i;
 
 vc->label = qemu_console_get_label(con);
 vc->s = s;
@@ -2136,6 +2222,7 @@ static GSList *gd_vc_gfx_init(GtkDisplayState *s, 
VirtualConsole *vc,
   GDK_BUTTON_PRESS_MASK |
   GDK_BUTTON_RELEASE_MASK |
   GDK_BUTTON_MOTION_MASK |
+  GDK_TOUCH_MASK |
   GDK_ENTER_NOTIFY_MASK |
   GDK_LEAVE_NOTIFY_MASK |
   GDK_SCROLL_MASK |
@@ -2171,6 +2258,11 @@ static GSList *g

[PATCH v3 2/6] ui: add the infrastructure to support MT events

2023-04-13 Thread Sergio Lopez
Add the required infrastructure to support generating multitouch events.

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 include/ui/input.h|  3 +++
 qapi/ui.json  | 46 ---
 replay/replay-input.c | 18 +
 ui/input.c|  6 ++
 ui/trace-events   |  1 +
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/include/ui/input.h b/include/ui/input.h
index c86219a1c1..2a3dffd417 100644
--- a/include/ui/input.h
+++ b/include/ui/input.h
@@ -8,9 +8,12 @@
 #define INPUT_EVENT_MASK_BTN   (1<<INPUT_EVENT_KIND_BTN)
[...]
@@ -58,6 +59,14 @@ void replay_save_input_event(InputEvent *evt)
 replay_put_dword(move->axis);
 replay_put_qword(move->value);
 break;
+case INPUT_EVENT_KIND_MTT:
+mtt = evt->u.mtt.data;
+replay_put_dword(mtt->type);
+replay_put_qword(mtt->slot);
+replay_put_qword(mtt->tracking_id);
+replay_put_dword(mtt->axis);
+replay_put_qword(mtt->value);
+break;
 case INPUT_EVENT_KIND__MAX:
 /* keep gcc happy */
 break;
@@ -73,6 +82,7 @@ InputEvent *replay_read_input_event(void)
 InputBtnEvent btn;
 InputMoveEvent rel;
 InputMoveEvent abs;
+InputMultitouchEvent mtt;
 
 evt.type = replay_get_dword();
 switch (evt.type) {
@@ -109,6 +119,14 @@ InputEvent *replay_read_input_event(void)
 evt.u.abs.data->axis = (InputAxis)replay_get_dword();
 evt.u.abs.data->value = replay_get_qword();
 break;
+case INPUT_EVENT_KIND_MTT:
+evt.u.mtt.data = &mtt;
+evt.u.mtt.data->type = (InputMultitouchType)replay_get_dword();
+evt.u.mtt.data->slot = replay_get_qword();
+evt.u.mtt.data->tracking_id = replay_get_qword();
+evt.u.mtt.data->axis = (InputAxis)replay_get_dword();
+evt.u.mtt.data->value = replay_get_qword();
+break;
 case INPUT_EVENT_KIND__MAX:
 /* keep gcc happy */
 break;
diff --git a/ui/input.c b/ui/input.c
index f2d1e7a3a7..f788db20f7 100644
--- a/ui/input.c
+++ b/ui/input.c
@@ -212,6 +212,7 @@ static void qemu_input_event_trace(QemuConsole *src, 
InputEvent *evt)
 InputKeyEvent *key;
 InputBtnEvent *btn;
 InputMoveEvent *move;
+InputMultitouchEvent *mtt;
 
 if (src) {
 idx = qemu_console_get_index(src);
@@ -250,6 +251,11 @@ static void qemu_input_event_trace(QemuConsole *src, 
InputEvent *evt)
 name = InputAxis_str(move->axis);
 trace_input_event_abs(idx, name, move->value);
 break;
+case INPUT_EVENT_KIND_MTT:
+mtt = evt->u.mtt.data;
+name = InputAxis_str(mtt->axis);
+trace_input_event_mtt(idx, name, mtt->value);
+break;
 case INPUT_EVENT_KIND__MAX:
 /* keep gcc happy */
 break;
diff --git a/ui/trace-events b/ui/trace-events
index 977577fbba..6747361745 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -90,6 +90,7 @@ input_event_key_qcode(int conidx, const char *qcode, bool down) "con %d, key qco
 input_event_btn(int conidx, const char *btn, bool down) "con %d, button %s, down %d"
 input_event_rel(int conidx, const char *axis, int value) "con %d, axis %s, value %d"
 input_event_abs(int conidx, const char *axis, int value) "con %d, axis %s, value 0x%x"
+input_event_mtt(int conidx, const char *axis, int value) "con %d, axis %s, value 0x%x"
 input_event_sync(void) ""
 input_mouse_mode(int absolute) "absolute %d"
 
-- 
2.38.1




[PATCH v3 0/6] Implement virtio-multitouch and enable GTK3 to use it

2023-04-13 Thread Sergio Lopez
This series adds a virtio-multitouch device to the family of devices emulated
by virtio-input implementing the Multi-touch protocol as described here:

https://www.kernel.org/doc/html/latest/input/multi-touch-protocol.html?highlight=multi+touch

It also extends the GTK UI backend to be able to receive multi-touch events
and transpose them to a guest, so the latter can recognize them as gestures
when appropriate.

An example of this in action can be seen here:

 https://fosstodon.org/@slp/109545849296546767

Since v2:
- Fix InputMultitouchEvent doc in qapi/ui.json (Marc-André).
- Use warn_report() instead of fprintf() in gtk.c (Marc-André).
- Rebase and collect R-b.

Since v1:
- Split 0002 patch to implement ui, virtio-input-hid and virtio-input-pci
  changes in different patches (Marc-André).
- Fix versioning in qapi/ui.json (Marc-André).
- Print a warning if touch->sequence >= INPUT_EVENT_SLOTS_MAX (Marc-André).
- Only send SYN_REPORT once, if needed (Marc-André).
- Rebase and collect R-b.

Sergio Lopez (6):
  virtio-input: generalize virtio_input_key_config()
  ui: add the infrastructure to support MT events
  virtio-input: add a virtio-multitouch device
  virtio-input-pci: add virtio-multitouch-pci
  ui: add helpers for virtio-multitouch events
  ui/gtk: enable backend to send multi-touch events

 hw/input/virtio-input-hid.c  | 156 +++
 hw/virtio/virtio-input-pci.c |  25 -
 include/hw/virtio/virtio-input.h |   9 +-
 include/ui/input.h   |   8 ++
 qapi/ui.json |  46 -
 replay/replay-input.c|  18 
 ui/gtk.c |  92 ++
 ui/input.c   |  42 +
 ui/trace-events  |   1 +
 9 files changed, 366 insertions(+), 31 deletions(-)

-- 
2.38.1




[PATCH v3 5/6] ui: add helpers for virtio-multitouch events

2023-04-13 Thread Sergio Lopez
Add helpers for generating Multi-touch events from the UI backends that
can be sent to the guest through a virtio-multitouch device.

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 include/ui/input.h |  5 +
 ui/input.c | 36 
 2 files changed, 41 insertions(+)

diff --git a/include/ui/input.h b/include/ui/input.h
index 2a3dffd417..c37251e1e9 100644
--- a/include/ui/input.h
+++ b/include/ui/input.h
@@ -64,6 +64,11 @@ int qemu_input_scale_axis(int value,
 void qemu_input_queue_rel(QemuConsole *src, InputAxis axis, int value);
 void qemu_input_queue_abs(QemuConsole *src, InputAxis axis, int value,
   int min_in, int max_in);
+void qemu_input_queue_mtt(QemuConsole *src, InputMultitouchType type, int slot,
+  int tracking_id);
+void qemu_input_queue_mtt_abs(QemuConsole *src, InputAxis axis, int value,
+  int min_in, int max_in,
+  int slot, int tracking_id);
 
 void qemu_input_check_mode_change(void);
 void qemu_add_mouse_mode_change_notifier(Notifier *notify);
diff --git a/ui/input.c b/ui/input.c
index f788db20f7..34331b7b0b 100644
--- a/ui/input.c
+++ b/ui/input.c
@@ -547,6 +547,42 @@ void qemu_input_queue_abs(QemuConsole *src, InputAxis axis, int value,
 qemu_input_event_send(src, &evt);
 }
 
+void qemu_input_queue_mtt(QemuConsole *src, InputMultitouchType type,
+  int slot, int tracking_id)
+{
+InputMultitouchEvent mtt = {
+.type = type,
+.slot = slot,
+.tracking_id = tracking_id,
+};
+InputEvent evt = {
+.type = INPUT_EVENT_KIND_MTT,
+.u.mtt.data = &mtt,
+};
+
+qemu_input_event_send(src, &evt);
+}
+
+void qemu_input_queue_mtt_abs(QemuConsole *src, InputAxis axis, int value,
+  int min_in, int max_in, int slot, int tracking_id)
+{
+InputMultitouchEvent mtt = {
+.type = INPUT_MULTITOUCH_TYPE_DATA,
+.slot = slot,
+.tracking_id = tracking_id,
+.axis = axis,
+.value = qemu_input_scale_axis(value, min_in, max_in,
+   INPUT_EVENT_ABS_MIN,
+   INPUT_EVENT_ABS_MAX),
+};
+InputEvent evt = {
+.type = INPUT_EVENT_KIND_MTT,
+.u.mtt.data = &mtt,
+};
+
+qemu_input_event_send(src, &evt);
+}
+
 void qemu_input_check_mode_change(void)
 {
 static int current_is_absolute;
-- 
2.38.1




[PATCH] hw/i386/vmmouse:add relative packet flag for button status

2023-04-13 Thread Zongmin Zhou
Use macros for the button values instead of direct numbers.

If relative mode is requested, this flag has to be added so the
guest vmmouse driver can recognize the packet as a relative packet.
Otherwise, the vmmouse driver will not match the condition
'status & VMMOUSE_RELATIVE_PACKET' and cannot report events on the
correct (relative) input device, leaving relative mode unusable.

Signed-off-by: Zongmin Zhou
---
 hw/i386/vmmouse.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/i386/vmmouse.c b/hw/i386/vmmouse.c
index a56c185f15..6cd624bd09 100644
--- a/hw/i386/vmmouse.c
+++ b/hw/i386/vmmouse.c
@@ -44,6 +44,12 @@
 
 #define VMMOUSE_VERSION 0x3442554a
 
+#define VMMOUSE_RELATIVE_PACKET 0x0001
+
+#define VMMOUSE_LEFT_BUTTON 0x20
+#define VMMOUSE_RIGHT_BUTTON 0x10
+#define VMMOUSE_MIDDLE_BUTTON 0x08
+
 #ifdef DEBUG_VMMOUSE
 #define DPRINTF(fmt, ...) printf(fmt, ## __VA_ARGS__)
 #else
@@ -103,15 +109,18 @@ static void vmmouse_mouse_event(void *opaque, int x, int y, int dz, int buttons_
 x, y, dz, buttons_state);
 
 if ((buttons_state & MOUSE_EVENT_LBUTTON))
-buttons |= 0x20;
+buttons |= VMMOUSE_LEFT_BUTTON;
 if ((buttons_state & MOUSE_EVENT_RBUTTON))
-buttons |= 0x10;
+buttons |= VMMOUSE_RIGHT_BUTTON;
 if ((buttons_state & MOUSE_EVENT_MBUTTON))
-buttons |= 0x08;
+buttons |= VMMOUSE_MIDDLE_BUTTON;
 
 if (s->absolute) {
 x <<= 1;
 y <<= 1;
+} else {
+/* add for guest vmmouse driver to judge this is a relative packet. */
+buttons |= VMMOUSE_RELATIVE_PACKET;
 }
 
 s->queue[s->nb_queue++] = buttons;
-- 
2.34.1





[PATCH v3 3/6] virtio-input: add a virtio-multitouch device

2023-04-13 Thread Sergio Lopez
Add a virtio-multitouch device to the family of devices emulated by
virtio-input implementing the Multi-touch protocol as described here:

https://www.kernel.org/doc/html/latest/input/multi-touch-protocol.html?highlight=multi+touch

This patch just adds the device itself, without connecting it to any
backends. The following patches will add a PCI-based multitouch device,
some helpers in "ui" and will enable the GTK3 backend to transpose
multi-touch events from the host to the guest.

Signed-off-by: Sergio Lopez 
Reviewed-by: Marc-André Lureau 
---
 hw/input/virtio-input-hid.c  | 118 ++-
 include/hw/virtio/virtio-input.h |   9 +--
 2 files changed, 120 insertions(+), 7 deletions(-)

diff --git a/hw/input/virtio-input-hid.c b/hw/input/virtio-input-hid.c
index d28dab69ba..742235d3fa 100644
--- a/hw/input/virtio-input-hid.c
+++ b/hw/input/virtio-input-hid.c
@@ -16,9 +16,10 @@
 
 #include "standard-headers/linux/input.h"
 
-#define VIRTIO_ID_NAME_KEYBOARD "QEMU Virtio Keyboard"
-#define VIRTIO_ID_NAME_MOUSE"QEMU Virtio Mouse"
-#define VIRTIO_ID_NAME_TABLET   "QEMU Virtio Tablet"
+#define VIRTIO_ID_NAME_KEYBOARD "QEMU Virtio Keyboard"
+#define VIRTIO_ID_NAME_MOUSE"QEMU Virtio Mouse"
+#define VIRTIO_ID_NAME_TABLET   "QEMU Virtio Tablet"
+#define VIRTIO_ID_NAME_MULTITOUCH   "QEMU Virtio Multitouch"
 
 /* - */
 
@@ -30,6 +31,7 @@ static const unsigned short keymap_button[INPUT_BUTTON__MAX] = {
 [INPUT_BUTTON_WHEEL_DOWN]= BTN_GEAR_DOWN,
 [INPUT_BUTTON_SIDE]  = BTN_SIDE,
 [INPUT_BUTTON_EXTRA] = BTN_EXTRA,
+[INPUT_BUTTON_TOUCH] = BTN_TOUCH,
 };
 
 static const unsigned short axismap_rel[INPUT_AXIS__MAX] = {
@@ -42,6 +44,11 @@ static const unsigned short axismap_abs[INPUT_AXIS__MAX] = {
 [INPUT_AXIS_Y]   = ABS_Y,
 };
 
+static const unsigned short axismap_tch[INPUT_AXIS__MAX] = {
+[INPUT_AXIS_X]   = ABS_MT_POSITION_X,
+[INPUT_AXIS_Y]   = ABS_MT_POSITION_Y,
+};
+
 /* - */
 
 static void virtio_input_extend_config(VirtIOInput *vinput,
@@ -81,6 +88,7 @@ static void virtio_input_handle_event(DeviceState *dev, QemuConsole *src,
 InputKeyEvent *key;
 InputMoveEvent *move;
 InputBtnEvent *btn;
+InputMultitouchEvent *mtt;
 
 switch (evt->type) {
 case INPUT_EVENT_KIND_KEY:
@@ -137,6 +145,24 @@ static void virtio_input_handle_event(DeviceState *dev, QemuConsole *src,
 event.value = cpu_to_le32(move->value);
 virtio_input_send(vinput, &event);
 break;
+case INPUT_EVENT_KIND_MTT:
+mtt = evt->u.mtt.data;
+if (mtt->type == INPUT_MULTITOUCH_TYPE_DATA) {
+event.type  = cpu_to_le16(EV_ABS);
+event.code  = cpu_to_le16(axismap_tch[mtt->axis]);
+event.value = cpu_to_le32(mtt->value);
+virtio_input_send(vinput, &event);
+} else {
+event.type  = cpu_to_le16(EV_ABS);
+event.code  = cpu_to_le16(ABS_MT_SLOT);
+event.value = cpu_to_le32(mtt->slot);
+virtio_input_send(vinput, &event);
+event.type  = cpu_to_le16(EV_ABS);
+event.code  = cpu_to_le16(ABS_MT_TRACKING_ID);
+event.value = cpu_to_le32(mtt->tracking_id);
+virtio_input_send(vinput, &event);
+}
+break;
 default:
 /* keep gcc happy */
 break;
@@ -515,12 +541,98 @@ static const TypeInfo virtio_tablet_info = {
 
 /* - */
 
+static QemuInputHandler virtio_multitouch_handler = {
+.name  = VIRTIO_ID_NAME_MULTITOUCH,
+.mask  = INPUT_EVENT_MASK_BTN | INPUT_EVENT_MASK_MTT,
+.event = virtio_input_handle_event,
+.sync  = virtio_input_handle_sync,
+};
+
+static struct virtio_input_config virtio_multitouch_config[] = {
+{
+.select= VIRTIO_INPUT_CFG_ID_NAME,
+.size  = sizeof(VIRTIO_ID_NAME_MULTITOUCH),
+.u.string  = VIRTIO_ID_NAME_MULTITOUCH,
+},{
+.select= VIRTIO_INPUT_CFG_ID_DEVIDS,
+.size  = sizeof(struct virtio_input_devids),
+.u.ids = {
+.bustype = const_le16(BUS_VIRTUAL),
+.vendor  = const_le16(0x0627), /* same we use for usb hid devices */
+.product = const_le16(0x0003),
+.version = const_le16(0x0001),
+},
+},{
+.select= VIRTIO_INPUT_CFG_ABS_INFO,
+.subsel= ABS_MT_SLOT,
+.size  = sizeof(virtio_input_absinfo),
+.u.abs.min = const_le32(INPUT_EVENT_SLOTS_MIN),
+.u.abs.max = const_le32(INPUT_EVENT_SLOTS_MAX),
+},{
+.select= VIRTIO_INPUT_CFG_ABS_INFO,
+.subsel= ABS_MT_TRACKING_ID,
+.size  = sizeof(virtio_input_absinfo),
+.u.abs.min = const_l

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2023-04-13 Thread Christian Brauner
On Thu, Aug 18, 2022 at 04:24:21PM +0300, Kirill A . Shutemov wrote:
> On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
> > On Wed, 6 Jul 2022, Chao Peng wrote:
> > > This is the v7 of this series which tries to implement the fd-based KVM
> > > guest private memory.
> > 
> > Here at last are my reluctant thoughts on this patchset.
> > 
> > fd-based approach for supporting KVM guest private memory: fine.
> > 
> > Use or abuse of memfd and shmem.c: mistaken.
> > 
> > memfd_create() was an excellent way to put together the initial prototype.
> > 
> > But since then, TDX in particular has forced an effort into preventing
> > (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.
> > 
> > Are any of the shmem.c mods useful to existing users of shmem.c? No.
> > Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.
> > 
> > What use do you have for a filesystem here?  Almost none.
> > IIUC, what you want is an fd through which QEMU can allocate kernel
> > memory, selectively free that memory, and communicate fd+offset+length
> > to KVM.  And perhaps an interface to initialize a little of that memory
> > from a template (presumably copied from a real file on disk somewhere).
> > 
> > You don't need shmem.c or a filesystem for that!
> > 
> > If your memory could be swapped, that would be enough of a good reason
> > to make use of shmem.c: but it cannot be swapped; and although there
> > are some references in the mailthreads to it perhaps being swappable
> > in future, I get the impression that will not happen soon if ever.
> > 
> > If your memory could be migrated, that would be some reason to use
> > filesystem page cache (because page migration happens to understand
> > that type of memory): but it cannot be migrated.
> 
> Migration support is in pipeline. It is part of TDX 1.5 [1]. And swapping
> theoretically possible, but I'm not aware of any plans as of now.
> 
> [1] 
> https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> 
> > Some of these impressions may come from earlier iterations of the
> > patchset (v7 looks better in several ways than v5).  I am probably
> > underestimating the extent to which you have taken on board other
> > usages beyond TDX and SEV private memory, and rightly want to serve
> > them all with similar interfaces: perhaps there is enough justification
> > for shmem there, but I don't see it.  There was mention of userfaultfd
> > in one link: does that provide the justification for using shmem?
> > 
> > I'm afraid of the special demands you may make of memory allocation
> > later on - surprised that huge pages are not mentioned already;
> > gigantic contiguous extents? secretmem removed from direct map?
> 
> The design allows for extension to hugetlbfs if needed. Combination of
> MFD_INACCESSIBLE | MFD_HUGETLB should route this way. There should be zero
> implications for shmem. It is going to be separate struct 
> memfile_backing_store.
> 
> I'm not sure secretmem is a fit here as we want to extend MFD_INACCESSIBLE
> to be movable if platform supports it and secretmem is not migratable by
> design (without direct mapping fragmentations).
> 
> > Here's what I would prefer, and imagine much easier for you to maintain;
> > but I'm no system designer, and may be misunderstanding throughout.
> > 
> > QEMU gets fd from opening /dev/kvm_something, uses ioctls (or perhaps
> > the fallocate syscall interface itself) to allocate and free the memory,
> > ioctl for initializing some of it too.  KVM in control of whether that
> > fd can be read or written or mmap'ed or whatever, no need to prevent it
> > in shmem.c, no need for flags, seals, notifications to and fro because
> > KVM is already in control and knows the history.  If shmem actually has
> > value, call into it underneath - somewhat like SysV SHM, and /dev/zero
> > mmap, and i915/gem make use of it underneath.  If shmem has nothing to
> > add, just allocate and free kernel memory directly, recorded in your
> > own xarray.
> 
> I guess shim layer on top of shmem *can* work. I don't see immediately why
> it would not. But I'm not sure it is right direction. We risk creating yet
> another parallel VM with own rules/locking/accounting that opaque to
> core-mm.

Sorry for necrobumping this thread but I've been reviewing the
memfd_restricted() extension that Ackerley is currently working on. I
was pointed to this thread as this is what the extension is building
on but I'll reply to both threads here.

From a glance at v10, memfd_restricted() is currently implemented as an
in-kernel stacking filesystem. A call to memfd_restricted() creates a
new restricted memfd file and a new unlinked tmpfs file and stashes the
tmpfs file into the memfd file's private data member. It then uses the
tmpfs file's f_ops and i_ops to perform the relevant file and inode
operations. So it has the same callstack as a general stacking
filesystem like overlayfs in 

Re: [PATCH for-7.2 v3 3/3] rtl8139: honor large send MSS value

2023-04-13 Thread Peter Maydell
On Thu, 17 Nov 2022 at 16:58, Stefan Hajnoczi  wrote:
>
> The Large-Send Task Offload Tx Descriptor (9.2.1 Transmit) has a
> Large-Send MSS value where the driver specifies the MSS. See the
> datasheet here:
> http://realtek.info/pdf/rtl8139cp.pdf
>
> The code ignores this value and uses a hardcoded MSS of 1500 bytes
> instead. When the MTU is less than 1500 bytes the hardcoded value
> results in IP fragmentation and poor performance.
>
> Use the Large-Send MSS value to correctly size Large-Send packets.
>
> Jason Wang  noticed that the Large-Send MSS value
> mask was incorrect so it is adjusted to match the datasheet and Linux
> 8139cp driver.

Hi Stefan -- in v2 of this patch

https://lore.kernel.org/qemu-devel/20221116154122.1705399-1-stefa...@redhat.com/

there was a check for "is the specified large_send_mss value
too small?":

+/* MSS too small? */
+if (tcp_hlen + hlen >= large_send_mss) {
+goto skip_offload;
+}

but it isn't present in this final version of the patch which
went into git. Was that deliberately dropped?

I ask because the fuzzers have discovered that if you feed this
device a descriptor where the large_send_mss value is 0, then
we will now do a division by zero and crash:
https://gitlab.com/qemu-project/qemu/-/issues/1582

(The datasheet, naturally, says nothing at all about what
happens if the descriptor contains a bogus MSS value.)

thanks
-- PMM



Re: [PATCH 1/4] vhost: Re-enable vrings after setting features

2023-04-13 Thread Michael S. Tsirkin
On Thu, Apr 13, 2023 at 05:24:36PM +0300, Anton Kuchin wrote:
> But is there a valid use-case for logging some dirty memory but not all?
> I can't understand if this is a feature or a just flaw in specification.

IIRC the use-case originally conceived was for shadow VQs.  If you use
shadow VQs the VQ access by backend does not change memory since shadow
VQ is not in memory. Not a practical concern right now but there you
have it.

-- 
MST




[RFC PATCH v3] riscv: Add support for the Zfa extension

2023-04-13 Thread Christoph Muellner
From: Christoph Müllner 

This patch introduces the RISC-V Zfa extension, which provides
additional floating-point instructions:
* fli (load-immediate) with pre-defined immediates
* fminm/fmaxm (like fmin/fmax but with different NaN behaviour)
* fround/froundmx (round to integer)
* fcvtmod.w.d (Modular Convert-to-Integer)
* fmv* to access high bits of float register bigger than XLEN
* Quiet comparison instructions (fleq/fltq)

Zfa defines its instructions in combination with the following extensions:
* single-precision floating-point (F)
* double-precision floating-point (D)
* quad-precision floating-point (Q)
* half-precision floating-point (Zfh)

Since QEMU does not support the RISC-V quad-precision floating-point
ISA extension (Q), this patch does not include the instructions that
depend on this extension. All other instructions are included in this
patch.

The Zfa specification is not frozen at the moment (which is why this
patch is RFC) and can be found here:
  https://github.com/riscv/riscv-isa-manual/blob/master/src/zfa.tex

Signed-off-by: Christoph Müllner 
---
Changes in v3:
* Add disassembler support
* Enable Zfa by default
* Remove forgotten comments in the decoder
* Fix fli translation code (use movi instead of ld)
* Tested against SPEC CPU2017 fprate
* Use floatN_[min|max] for f[min|max]m.* instructions

Changes in v2:
* Remove calls to mark_fs_dirty() in comparison trans functions
* Rewrite fround(nx) using float*_round_to_int()
* Move fli* to translation unit and fix NaN-boxing of NaN values
* Reimplement FCVTMOD.W.D
* Add use of second register in trans_fmvp_d_x()

 disas/riscv.c | 155 ++-
 target/riscv/cpu.c|   8 +
 target/riscv/cpu.h|   1 +
 target/riscv/fpu_helper.c | 222 +
 target/riscv/helper.h |  19 +
 target/riscv/insn32.decode|  26 ++
 target/riscv/insn_trans/trans_rvzfa.c.inc | 529 ++
 target/riscv/translate.c  |   1 +
 8 files changed, 959 insertions(+), 2 deletions(-)
 create mode 100644 target/riscv/insn_trans/trans_rvzfa.c.inc

diff --git a/disas/riscv.c b/disas/riscv.c
index d6b0fbe5e8..defbcfa9c2 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -163,6 +163,7 @@ typedef enum {
 rv_codec_v_i,
 rv_codec_vsetvli,
 rv_codec_vsetivli,
+rv_codec_fli,
 } rv_codec;
 
 typedef enum {
@@ -935,6 +936,39 @@ typedef enum {
 rv_op_vsetvli = 766,
 rv_op_vsetivli = 767,
 rv_op_vsetvl = 768,
+rv_op_fli_s = 769,
+rv_op_fli_d = 770,
+rv_op_fli_q = 771,
+rv_op_fli_h = 772,
+rv_op_fminm_s = 773,
+rv_op_fmaxm_s = 774,
+rv_op_fminm_d = 775,
+rv_op_fmaxm_d = 776,
+rv_op_fminm_q = 777,
+rv_op_fmaxm_q = 778,
+rv_op_fminm_h = 779,
+rv_op_fmaxm_h = 780,
+rv_op_fround_s = 781,
+rv_op_froundnx_s = 782,
+rv_op_fround_d = 783,
+rv_op_froundnx_d = 784,
+rv_op_fround_q = 785,
+rv_op_froundnx_q = 786,
+rv_op_fround_h = 787,
+rv_op_froundnx_h = 788,
+rv_op_fcvtmod_w_d = 789,
+rv_op_fmvh_x_d = 790,
+rv_op_fmvp_d_x = 791,
+rv_op_fmvh_x_q = 792,
+rv_op_fmvp_q_x = 793,
+rv_op_fleq_s = 794,
+rv_op_fltq_s = 795,
+rv_op_fleq_d = 796,
+rv_op_fltq_d = 797,
+rv_op_fleq_q = 798,
+rv_op_fltq_q = 799,
+rv_op_fleq_h = 800,
+rv_op_fltq_h = 801,
 } rv_op;
 
 /* structures */
@@ -1003,6 +1037,24 @@ static const char rv_vreg_name_sym[32][4] = {
 "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31"
 };
 
+/* The FLI.[HSDQ] numeric constants (0.0 for symbolic constants).
+ * The constants use the hex floating-point literal representation
+ * that is printed when using the printf %a format specifier,
+ * which matches the output that is generated by the disassembler.
+ */
+static const char rv_fli_name_const[32][9] =
+{
+"0x1p+0", "min", "0x1p-16", "0x1p-15",
+"0x1p-8", "0x1p-7", "0x1p-4", "0x1p-3",
+"0x1p-2", "0x1.4p-2", "0x1.8p-2", "0x1.cp-2",
+"0x1p-1", "0x1.4p-1", "0x1.8p-1", "0x1.cp-1",
+"0x1p+0", "0x1.4p+0", "0x1.8p+0", "0x1.cp+0",
+"0x1p+1", "0x1.4p+1", "0x1.8p+1", "0x1p+2",
+"0x1p+3", "0x1p+4", "0x1p+7", "0x1p+8",
+"0x1p+15", "0x1p+16", "inf", "nan"
+};
+
+
 /* instruction formats */
 
 #define rv_fmt_none   "O\t"
@@ -1014,6 +1066,7 @@ static const char rv_vreg_name_sym[32][4] = {
 #define rv_fmt_rd_offset  "O\t0,o"
 #define rv_fmt_rd_rs1_rs2 "O\t0,1,2"
 #define rv_fmt_frd_rs1"O\t3,1"
+#define rv_fmt_frd_rs1_rs2"O\t3,1,2"
 #define rv_fmt_frd_frs1   "O\t3,4"
 #define rv_fmt_rd_frs1"O\t0,4"
 #define rv_fmt_rd_frs1_frs2   "O\t0,4,5"
@@ -1071,6 +1124,7 @@ static const char rv_vreg_name_sym[32][4] = {
 #define rv_fmt_vd_vm  "O\tDm"
 #define rv_fmt_vsetvli"O\t0,1,v"
 #define rv_fmt_vsetivli   "O\t0,u,v"
+#define rv_fm

Re: [PATCH] hw/intc/riscv_aplic: Zero init APLIC internal state

2023-04-13 Thread Anup Patel
On Thu, Apr 13, 2023 at 7:04 PM Ivan Klokov  wrote:
>
> Since g_new is used to initialize the RISCVAPLICState->state structure,
> in some cases we get behavior that is not as expected. This patch
> changes this to g_new0, which initializes the APLIC in the correct
> state.
>
> Signed-off-by: Ivan Klokov 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  hw/intc/riscv_aplic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/intc/riscv_aplic.c b/hw/intc/riscv_aplic.c
> index cfd007e629..71591d44bf 100644
> --- a/hw/intc/riscv_aplic.c
> +++ b/hw/intc/riscv_aplic.c
> @@ -803,7 +803,7 @@ static void riscv_aplic_realize(DeviceState *dev, Error **errp)
>
>  aplic->bitfield_words = (aplic->num_irqs + 31) >> 5;
>  aplic->sourcecfg = g_new0(uint32_t, aplic->num_irqs);
> -aplic->state = g_new(uint32_t, aplic->num_irqs);
> +aplic->state = g_new0(uint32_t, aplic->num_irqs);
>  aplic->target = g_new0(uint32_t, aplic->num_irqs);
>  if (!aplic->msimode) {
>  for (i = 0; i < aplic->num_irqs; i++) {
> --
> 2.34.1
>



Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM

2023-04-13 Thread Kirill A. Shutemov
On Wed, Apr 12, 2023 at 06:07:28PM -0700, Sean Christopherson wrote:
> On Wed, Jan 25, 2023, Kirill A. Shutemov wrote:
> > On Wed, Jan 25, 2023 at 12:20:26AM +, Sean Christopherson wrote:
> > > On Tue, Jan 24, 2023, Liam Merwick wrote:
> > > > On 14/01/2023 00:37, Sean Christopherson wrote:
> > > > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > > > > This patch series implements KVM guest private memory for 
> > > > > > confidential
> > > > > > computing scenarios like Intel TDX[1]. If a TDX host accesses
> > > > > > TDX-protected guest memory, machine check can happen which can 
> > > > > > further
> > > > > > crash the running host system, this is terrible for multi-tenant
> > > > > > configurations. The host accesses include those from KVM userspace 
> > > > > > like
> > > > > > QEMU. This series addresses KVM userspace induced crash by 
> > > > > > introducing
> > > > > > new mm and KVM interfaces so KVM userspace can still manage guest 
> > > > > > memory
> > > > > > via a fd-based approach, but it can never access the guest memory
> > > > > > content.
> > > > > > 
> > > > > > The patch series touches both core mm and KVM code. I appreciate
> > > > > > Andrew/Hugh and Paolo/Sean can review and pick these patches. Any 
> > > > > > other
> > > > > > reviews are always welcome.
> > > > > >- 01: mm change, target for mm tree
> > > > > >- 02-09: KVM change, target for KVM tree
> > > > > 
> > > > > A version with all of my feedback, plus reworked versions of Vishal's 
> > > > > selftest,
> > > > > is available here:
> > > > > 
   git@github.com:sean-jc/linux.git x86/upm_base_support
> > > > > 
> > > > > It compiles and passes the selftest, but it's otherwise barely 
> > > > > tested.  There are
> > > > > a few todos (2 I think?) and many of the commits need changelogs, 
> > > > > i.e. it's still
> > > > > a WIP.
> > > > > 
> > > > 
> > > > When running LTP (https://github.com/linux-test-project/ltp) on the v10
> > > > bits (and also with Sean's branch above) I encounter the following NULL
> > > > pointer dereference with testcases/kernel/syscalls/madvise/madvise01
> > > > (100% reproducible).
> > > > 
> > > > It appears that in restrictedmem_error_page()
> > > > inode->i_mapping->private_data is NULL in the
> > > > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) but I
> > > > don't know why.
> > > 
> > > Kirill, can you take a look?  Or pass the buck to someone who can? :-)
> > 
> > The patch below should help.
> > 
> > diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
> > index 15c52301eeb9..39ada985c7c0 100644
> > --- a/mm/restrictedmem.c
> > +++ b/mm/restrictedmem.c
> > @@ -307,14 +307,29 @@ void restrictedmem_error_page(struct page *page, struct address_space *mapping)
> >  
> > spin_lock(&sb->s_inode_list_lock);
> > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > -   struct restrictedmem *rm = inode->i_mapping->private_data;
> > struct restrictedmem_notifier *notifier;
> > -   struct file *memfd = rm->memfd;
> > +   struct restrictedmem *rm;
> > unsigned long index;
> > +   struct file *memfd;
> >  
> > -   if (memfd->f_mapping != mapping)
> > +   if (atomic_read(&inode->i_count))
> 
> Kirill, should this be
> 
>   if (!atomic_read(&inode->i_count))
>   continue;
> 
> i.e. skip unreferenced inodes, not skip referenced inodes?

Ouch. Yes.

But looking at other instances of s_inodes usage, I think we can drop the
check altogether. inode cannot be completely free until it is removed from
s_inodes list.

While there, replace list_for_each_entry_safe() with
list_for_each_entry() as we don't remove anything from the list.

diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 55e99e6c09a1..8e8a4420d3d1 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -194,22 +194,19 @@ static int restricted_error_remove_page(struct address_space *mapping,
struct page *page)
 {
struct super_block *sb = restrictedmem_mnt->mnt_sb;
-   struct inode *inode, *next;
+   struct inode *inode;
pgoff_t start, end;
 
start = page->index;
end = start + thp_nr_pages(page);
 
spin_lock(&sb->s_inode_list_lock);
-   list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+   list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
struct restrictedmem_notifier *notifier;
struct restrictedmem *rm;
unsigned long index;
struct file *memfd;
 
-   if (atomic_read(&inode->i_count))
-   continue;
-
spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
-- 
  Kiryl Shutsemau / Kirill A. Shutemov



Re: [PATCH] hw/mips/malta: Fix the malta machine on big endian hosts

2023-04-13 Thread Michael Tokarev

30.03.2023 18:26, Thomas Huth wrote:

Booting a Linux kernel with the malta machine is currently broken
on big endian hosts. The cpu_to_gt32 macro wants to byteswap a value
for little endian targets only, but uses the wrong way to do this:
cpu_to_[lb]e32 works the other way round on big endian hosts! Fix
it by using the same ways on both, big and little endian hosts.

Fixes: 0c8427baf0 ("hw/mips/malta: Use bootloader helper to set BAR registers")
Signed-off-by: Thomas Huth 


Has this been forgotten?

Thanks,

/mjt



Re: [RFC PATCH 1/3] python: add mkvenv.py

2023-04-13 Thread John Snow
On Wed, Mar 29, 2023 at 8:56 AM Paolo Bonzini  wrote:
>
> On 3/28/23 23:11, John Snow wrote:

> > +for entry_point in entry_points:
> > +# Python 3.8 doesn't have 'module' or 'attr' attributes
> > +if not (hasattr(entry_point, 'module') and
> > +hasattr(entry_point, 'attr')):
> > +match = pattern.match(entry_point.value)
> > +assert match is not None
> > +module = match.group('module')
> > +attr = match.group('attr')
> > +else:
> > +module = entry_point.module
> > +attr = entry_point.attr
> > +yield {
> > +'name': entry_point.name,
> > +'module': module,
> > +'import_name': attr,
> > +'func': attr,
>
> What about using a dataclass or namedtuple instead of a dictionary?
>

I suppose what I meant was: Once 3.8 is our minimum, we can delete
most of this compat code anyway, so there may not be a point in
creating a new type-safe structure to house it. I can definitely add
that in if you'd like, but I suppose I felt like a dict was "good
enough" for now, since 3.7 will also get dropped off the face of the
earth soon, too.

Before I send a non-RFC patch I'll get everything scrubbed down with
the usual pylint/mypy/isort/flake8 combo, and if I wind up needing to
for type safety I will add something.

Or if you are requesting it specifically. :~)

> >
> > +
> > +try:
> > +entry_points = _get_entry_points()
> > +except ImportError as exc:
> > +logger.debug("%s", str(exc))
> > +raise Ouch(
> > +"Neither importlib.metadata nor pkg_resources found, "
> > +"can't generate console script shims.\n"
> > +"Use Python 3.8+, or install importlib-metadata, or 
> > setuptools."
> > +) from exc
>
> Why not put this extra try/except inside _get_entry_points()?

Hm, no good reason, apparently. O:-) I've fixed this one up.


Unrelated question I'm going to tuck in here:

For the script generation, I am making another call to mkvenv.py using
the venv'ified python to do final initializations. As part of that, I
pass the binpath to the script again because I wasn't sure it was safe
to compute it again myself. CPython seems to assume it's always going
to be env_path/Scripts/ or env_path/bin/, but I wasn't 1000% sure that
this wasn't patched by e.g. Debian or had some complications with the
adjustments to site configuration in recent times. I'll circle back
around to investigating this, but for now I've left it with the dumber
approach of always passing the bindir.




Re: [PATCH 0/4] vhost-user-fs: Internal migration

2023-04-13 Thread Michael S. Tsirkin
On Tue, Apr 11, 2023 at 05:05:11PM +0200, Hanna Czenczek wrote:
> RFC:
> https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04263.html
> 
> Hi,
> 
> Patch 2 of this series adds new vhost methods (only for vhost-user at
> this point) for transferring the back-end’s internal state to/from qemu
> during migration, so that this state can be stored in the migration
> stream.  (This is what we call “internal migration”, because the state
> is internally available to qemu; this is in contrast to “external
> migration”, which Anton is working on, where the back-end’s state is
> handled by the back-end itself without involving qemu.)
> 
> For this, the state is handled as a binary blob by qemu, and it is
> transferred over a pipe that is established via a new vhost method.
> 
> Patch 3 adds two high-level helper functions to (A) fetch any vhost
> back-end’s internal state and store it in a migration stream (a
> `QEMUFile`), and (B) load such state from a migrations stream and send
> it to a vhost back-end.  These build on the low-level interface
> introduced in patch 2.
> 
> Patch 4 then uses these functions to implement internal migration for
> vhost-user-fs.  Note that this of course depends on support in the
> back-end (virtiofsd), which is not yet ready.
> 
> Finally, patch 1 fixes a bug around migrating vhost-user devices: To
> enable/disable logging[1], the VHOST_F_LOG_ALL feature must be
> set/cleared, via the SET_FEATURES call.  Another, technically unrelated,
> feature exists, VHOST_USER_F_PROTOCOL_FEATURES, which indicates support
> for vhost-user protocol features.  Naturally, qemu wants to keep that
> other feature enabled, so it will set it (when possible) in every
> SET_FEATURES call.  However, a side effect of setting
> VHOST_USER_F_PROTOCOL_FEATURES is that all vrings are disabled.


I didn't get this part.
Two questions:
Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
ring starts directly in the enabled state.

If ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, the ring is
initialized in a disabled state and is enabled by
``VHOST_USER_SET_VRING_ENABLE`` with parameter 1.

so VHOST_USER_F_PROTOCOL_FEATURES only controls initial state of rings,
it does not disable rings.



>  This
> causes any enabling (done at the start of migration) or disabling (done
> on the source after a cancelled/failed migration) of logging to make the
> back-end hang.  Without patch 1, therefore, starting a migration will
> have any vhost-user back-end that supports both VHOST_F_LOG_ALL and
> VHOST_USER_F_PROTOCOL_FEATURES immediately hang completely, and unless
> execution is transferred to the destination, it will continue to hang.
> 
> 
> [1] Logging here means logging writes to guest memory pages in a dirty
> bitmap so that these dirty pages are flushed to the destination.  qemu
> cannot monitor the back-end’s writes to guest memory, so the back-end
> has to do so itself, and log its writes in a dirty bitmap shared with
> qemu.
> 
> 
> Changes in v1 compared to the RFC:
> - Patch 1 added
> 
> - Patch 2: Interface is different, now uses a pipe instead of shared
>   memory (as suggested by Stefan); also, this is now a generic
>   vhost-user interface, and not just for vhost-user-fs
> 
> - Patches 3 and 4: Because this is now supposed to be a generic
>   migration method for vhost-user back-ends, most of the migration code
>   has been moved from vhost-user-fs.c to vhost.c so it can be shared
>   between different back-ends.  The vhost-user-fs code is now a rather
>   thin wrapper around the common code.
>   - Note also (as suggested by Anton) that the back-end’s migration
> state is now in a subsection, and that it is technically optional.
> “Technically” means that with this series, it is always used (unless
> the back-end doesn’t support migration, in which case migration is
> just blocked), but Anton’s series for external migration would make
> it optional.  (I.e., the subsection would be skipped for external
> migration, and mandatorily included for internal migration.)
> 
> 
> Hanna Czenczek (4):
>   vhost: Re-enable vrings after setting features
>   vhost-user: Interface for migration state transfer
>   vhost: Add high-level state save/load functions
>   vhost-user-fs: Implement internal migration
> 
>  include/hw/virtio/vhost-backend.h |  24 +++
>  include/hw/virtio/vhost.h | 124 +++
>  hw/virtio/vhost-user-fs.c | 101 +++-
>  hw/virtio/vhost-user.c| 147 ++
>  hw/virtio/vhost.c | 246 ++
>  5 files changed, 641 insertions(+), 1 deletion(-)
> 
> -- 
> 2.39.1




Re: [PATCH v2 03/10] tcg: Use one-insn-per-tb accelerator property in curr_cflags()

2023-04-13 Thread Peter Maydell
On Mon, 3 Apr 2023 at 19:33, Richard Henderson wrote:
>
> On 4/3/23 07:46, Peter Maydell wrote:
> >   uint32_t curr_cflags(CPUState *cpu)
> >   {
> >   uint32_t cflags = cpu->tcg_cflags;
> > +TCGState *tcgstate = TCG_STATE(current_accel());
>
> As mentioned against the cover, this is a very hot path.
>
> We should try for something less expensive.  Perhaps as simple as
>
>  return cpu->tcg_cflags | tcg_cflags_global;
>
> where cpu->tcg_cflags is updated with cpu->singlestep_enabled.

I feel like that introduces atomicity issues. If I'm reading
the code right, curr_cflags() is called without any kind
of lock held. At the moment we get away with this because
'singlestep' is an int and is always going to be atomically
updated. If we make tcg_cflags_global a value which might have
multiple bits set or not set I'm not entirely sure what the
right way is to handle the reads and writes of it.

I think we can assume we have the iothread lock at any
point where we want to change either 'singlestep' or
the 'nochain' option, at least.

Any suggestions? I'm not very familiar with the
qemu atomic primitives...

thanks
-- PMM



Re: [PATCH] hw/mips/malta: Fix the malta machine on big endian hosts

2023-04-13 Thread Peter Maydell
On Thu, 13 Apr 2023 at 17:08, Michael Tokarev  wrote:
>
> 30.03.2023 18:26, Thomas Huth wrote:
> > Booting a Linux kernel with the malta machine is currently broken
> > on big endian hosts. The cpu_to_gt32 macro wants to byteswap a value
> > for little endian targets only, but uses the wrong way to do this:
> > cpu_to_[lb]e32 works the other way round on big endian hosts! Fix
> > it by using the same ways on both, big and little endian hosts.
> >
> > Fixes: 0c8427baf0 ("hw/mips/malta: Use bootloader helper to set BAR registers")
> > Signed-off-by: Thomas Huth 
>
> Has this been forgotten?

Looks like it. Too late for 8.0 now (and it wasn't a regression
since it looks like it was broken in 7.2 as well); will have to
be fixed in 8.1.

thanks
-- PMM



Re: [RFC PATCH 1/3] python: add mkvenv.py

2023-04-13 Thread John Snow
On Wed, Mar 29, 2023 at 8:56 AM Paolo Bonzini  wrote:
>
> BTW, another way to repair Debian 10's pip is to create a symbolic link
> to sys.base_prefix + '/share/python-wheels' in sys.prefix +
> '/share/python-wheels'.  Since this is much faster, perhaps it can be
> done unconditionally and checkpip mode can go away together with
> self._context?
>
> Paolo
>

I'm coming around on this one a bit; it's definitely going to be a lot
faster. As you say, my version is more robust, but more complex and
with more lines. We may decide to drop any workarounds for Debian 10
entirely and we can live without either fix. I'll mention this in the
commit message for the Debian 10 workaround.

I do not know right now if other distros suffer from the same problem;
we could attempt to omit the fix and just see if anyone barks. Not
very nice, but impossible to enumerate all of the bugs that exist in
various downstream distros...

--js




[PATCH] rtl8139: fix large_send_mss divide-by-zero

2023-04-13 Thread Stefan Hajnoczi
If the driver sets large_send_mss to 0 then a divide-by-zero occurs.
Even if the division wasn't a problem, the for loop that emits MSS-sized
packets would never terminate.

Solve these issues by skipping offloading when large_send_mss=0.

This issue was found by OSS-Fuzz as part of Alexander Bulekov's device
fuzzing work. The reproducer is:

  $ cat << EOF | ./qemu-system-i386 -display none -machine accel=qtest, -m \
  512M,slots=1,maxmem=0x -machine q35 -nodefaults -device \
  rtl8139,netdev=net0 -netdev user,id=net0 -device \
  pc-dimm,id=nv1,memdev=mem1,addr=0xb800a6460280 -object \
  memory-backend-ram,id=mem1,size=2M  -qtest stdio
  outl 0xcf8 0x8814
  outl 0xcfc 0xe000
  outl 0xcf8 0x8804
  outw 0xcfc 0x06
  write 0xe037 0x1 0x04
  write 0xe0e0 0x2 0x01
  write 0x1 0x1 0x04
  write 0x3 0x1 0x98
  write 0xa 0x1 0x8c
  write 0xb 0x1 0x02
  write 0xc 0x1 0x46
  write 0xd 0x1 0xa6
  write 0xf 0x1 0xb8
  write 0xb800a646028c000c 0x1 0x08
  write 0xb800a646028c000e 0x1 0x47
  write 0xb800a646028c0010 0x1 0x02
  write 0xb800a646028c0017 0x1 0x06
  write 0xb800a646028c0036 0x1 0x80
  write 0xe0d9 0x1 0x40
  EOF

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1582
Fixes: 6d71357a3b65 ("rtl8139: honor large send MSS value")
Reported-by: Alexander Bulekov 
Cc: Peter Maydell 
Signed-off-by: Stefan Hajnoczi 
---
 hw/net/rtl8139.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5a5aaf868d..5f1a4d359b 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2154,6 +2154,9 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
 
 int large_send_mss = (txdw0 >> CP_TC_LGSEN_MSS_SHIFT) &
  CP_TC_LGSEN_MSS_MASK;
+if (large_send_mss == 0) {
+goto skip_offload;
+}
 
 DPRINTF("+++ C+ mode offloaded task TSO IP data %d "
 "frame data %d specified MSS=%d\n",
-- 
2.39.2




Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

2023-04-13 Thread Ackerley Tng

Chao Peng  writes:


From: "Kirill A. Shutemov" 



Introduce 'memfd_restricted' system call with the ability to create
memory areas that are restricted from userspace access through ordinary
MMU operations (e.g. read/write/mmap). The memory content is expected to
be used through the new in-kernel interface by a third kernel module.



...



diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
new file mode 100644
index ..56953c204e5c
--- /dev/null
+++ b/mm/restrictedmem.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "linux/sbitmap.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct restrictedmem_data {
+   struct mutex lock;
+   struct file *memfd;


Can this be renamed to file, or lower_file (as in stacking filesystems)?

It's a little confusing because this pointer doesn't actually refer to
an fd.

'memfd' is already used by udmabuf to refer to an actual fd [1], which
makes this a little misleading.

[1]  
https://elixir.bootlin.com/linux/v6.2.10/source/tools/testing/selftests/drivers/dma-buf/udmabuf.c#L63



+   struct list_head notifiers;
+};
+
...




