date:20220823

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+ret = file_size;
+goto fail;
+}
+
+file_size >>= BDRV_SECTOR_BITS;
+if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+error_setg(errp, "parallels: Offset in BAT is out of image");
+ret = -EINVAL;
+goto fail;
+}


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.



+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;



--
Best regards,
Vladimir

Re: [PATCH] vhost: reduce the set_mem_table call frenquency

2022-08-23 Thread Li Feng

Sorry, looks like I use the old qemu code, master has fix this issue.
Just ignore this patch.

> 2022年8月23日 下午1:38，Li Feng  写道：
> 
> If the vhost memory layout doesn't change, don't need to call the vhost
> backend.
> The set_mem_table is time consuming when sending to vhost-user backend.
> 
> On aarch64, the edk2 uefi firmware will write the pflash which will
> trigger the vhost_commit hundreds of times.
> 
> Signed-off-by: Li Feng 
> ---
> hw/virtio/vhost.c | 14 ++
> include/hw/virtio/vhost.h |  2 ++
> 2 files changed, 16 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index f758f177bb..848d2f20d6 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -523,6 +523,11 @@ static void vhost_commit(MemoryListener *listener)
> /* Rebuild the regions list from the new sections list */
> regions_size = offsetof(struct vhost_memory, regions) +
>dev->n_mem_sections * sizeof dev->mem->regions[0];
> +if (dev->mem && dev->started) {
> +g_free(dev->old_mem);
> +dev->old_mem = dev->mem;
> +dev->mem = NULL;
> +}
> dev->mem = g_realloc(dev->mem, regions_size);
> dev->mem->nregions = dev->n_mem_sections;
> used_memslots = dev->mem->nregions;
> @@ -542,6 +547,12 @@ static void vhost_commit(MemoryListener *listener)
> goto out;
> }
> 
> +if (dev->old_mem && dev->regions_size == regions_size &&
> +memcmp(dev->mem, dev->old_mem, dev->regions_size) == 0) {
> +goto out;
> +}
> +
> +dev->regions_size = regions_size;
> for (i = 0; i < dev->mem->nregions; i++) {
> if (vhost_verify_ring_mappings(dev,
>(void *)(uintptr_t)dev->mem->regions[i].userspace_addr,
> @@ -1445,6 +1456,8 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> hdev->mem = g_malloc0(offsetof(struct vhost_memory, regions));
> hdev->n_mem_sections = 0;
> hdev->mem_sections = NULL;
> +hdev->old_mem = NULL;
> +hdev->regions_size = 0;
> hdev->log = NULL;
> hdev->log_size = 0;
> hdev->log_enabled = false;
> @@ -1491,6 +1504,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
> }
> g_free(hdev->mem);
> g_free(hdev->mem_sections);
> +g_free(hdev->old_mem);
> if (hdev->vhost_ops) {
> hdev->vhost_ops->vhost_backend_cleanup(hdev);
> }
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a346f23d13..b1d7287099 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -73,6 +73,8 @@ struct vhost_dev {
> MemoryListener memory_listener;
> MemoryListener iommu_listener;
> struct vhost_memory *mem;
> +struct vhost_memory *old_mem;
> +int regions_size;
> int n_mem_sections;
> MemoryRegionSection *mem_sections;
> int n_tmp_sections;
> -- 
> 2.37.2
>

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Alexander Ivanov




On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the error 
message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?


It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another software 
or by some failure path.


I think you are right, it would be better to align image size up to 
sector size.



+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;

Re: [PATCH v2 02/24] virtio: introduce __virtio_queue_reset()

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

From: Xuan Zhuo 

Separate the logic of vq reset. This logic will be called directly
later.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 



---
  hw/virtio/virtio.c | 37 +
  1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..67d54832a9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2019,6 +2019,26 @@ static enum virtio_device_endian 
virtio_current_cpu_endian(void)
  }
  }
  
+static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)

+{
+vdev->vq[i].vring.desc = 0;
+vdev->vq[i].vring.avail = 0;
+vdev->vq[i].vring.used = 0;
+vdev->vq[i].last_avail_idx = 0;
+vdev->vq[i].shadow_avail_idx = 0;
+vdev->vq[i].used_idx = 0;
+vdev->vq[i].last_avail_wrap_counter = true;
+vdev->vq[i].shadow_avail_wrap_counter = true;
+vdev->vq[i].used_wrap_counter = true;
+virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
+vdev->vq[i].signalled_used = 0;
+vdev->vq[i].signalled_used_valid = false;
+vdev->vq[i].notification = true;
+vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
+vdev->vq[i].inuse = 0;
+virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
+}
+
  void virtio_reset(void *opaque)
  {
  VirtIODevice *vdev = opaque;
@@ -2050,22 +2070,7 @@ void virtio_reset(void *opaque)
  virtio_notify_vector(vdev, vdev->config_vector);
  
  for(i = 0; i < VIRTIO_QUEUE_MAX; i++) {

-vdev->vq[i].vring.desc = 0;
-vdev->vq[i].vring.avail = 0;
-vdev->vq[i].vring.used = 0;
-vdev->vq[i].last_avail_idx = 0;
-vdev->vq[i].shadow_avail_idx = 0;
-vdev->vq[i].used_idx = 0;
-vdev->vq[i].last_avail_wrap_counter = true;
-vdev->vq[i].shadow_avail_wrap_counter = true;
-vdev->vq[i].used_wrap_counter = true;
-virtio_queue_set_vector(vdev, i, VIRTIO_NO_VECTOR);
-vdev->vq[i].signalled_used = 0;
-vdev->vq[i].signalled_used_valid = false;
-vdev->vq[i].notification = true;
-vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
-vdev->vq[i].inuse = 0;
-virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
+__virtio_queue_reset(vdev, i);
  }
  }

Re: [PATCH v2 03/24] virtio: introduce virtio_queue_reset()

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

From: Xuan Zhuo 

Introduce a new interface function virtio_queue_reset() to implement
reset for vq.

Add a new callback to VirtioDeviceClass for queue reset operation for
each child device.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 



---
  hw/virtio/virtio.c | 11 +++
  include/hw/virtio/virtio.h |  2 ++
  2 files changed, 13 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 67d54832a9..0e9d41366f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2039,6 +2039,17 @@ static void __virtio_queue_reset(VirtIODevice *vdev, 
uint32_t i)
  virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
  }
  
+void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index)

+{
+VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+if (k->queue_reset) {
+k->queue_reset(vdev, queue_index);
+}
+
+__virtio_queue_reset(vdev, queue_index);
+}
+
  void virtio_reset(void *opaque)
  {
  VirtIODevice *vdev = opaque;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..879394299b 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -130,6 +130,7 @@ struct VirtioDeviceClass {
  void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
  void (*reset)(VirtIODevice *vdev);
  void (*set_status)(VirtIODevice *vdev, uint8_t val);
+void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
  /* For transitional devices, this is a bitmap of features
   * that are only exposed on the legacy interface but not
   * the modern one.
@@ -268,6 +269,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, 
int n,
MemoryRegion *mr, bool assign);
  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
  void virtio_reset(void *opaque);
+void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_update_irq(VirtIODevice *vdev);
  int virtio_set_features(VirtIODevice *vdev, uint64_t val);

Re: [PATCH v7 01/14] mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd

2022-08-23 Thread David Hildenbrand

On 18.08.22 01:41, Kirill A. Shutemov wrote:
> On Fri, Aug 05, 2022 at 07:55:38PM +0200, Paolo Bonzini wrote:
>> On 7/21/22 11:44, David Hildenbrand wrote:
>>>
>>> Also, I*think*  you can place pages via userfaultfd into shmem. Not
>>> sure if that would count "auto alloc", but it would certainly bypass
>>> fallocate().
>>
>> Yeah, userfaultfd_register would probably have to forbid this for
>> F_SEAL_AUTO_ALLOCATE vmas.  Maybe the memfile_node can be reused for this,
>> adding a new MEMFILE_F_NO_AUTO_ALLOCATE flags?  Then userfault_register
>> would do something like memfile_node_get_flags(vma->vm_file) and check the
>> result.
> 
> I donno, memory allocation with userfaultfd looks pretty intentional to
> me. Why would F_SEAL_AUTO_ALLOCATE prevent it?
> 

Can't we say the same about a write()?

> Maybe we would need it in the future for post-copy migration or something?
> 
> Or existing practises around userfaultfd touch memory randomly and
> therefore incompatible with F_SEAL_AUTO_ALLOCATE intent?
> 
> Note, that userfaultfd is only relevant for shared memory as it requires
> VMA which we don't have for MFD_INACCESSIBLE.

This feature (F_SEAL_AUTO_ALLOCATE) is independent of all the lovely
encrypted VM stuff, so it doesn't matter how it relates to MFD_INACCESSIBLE.

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 04/24] virtio: introduce virtio_queue_enable()

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

Introduce the interface queue_enable() in VirtioDeviceClass and the
fucntion virtio_queue_enable() in virtio, it can be called when
VIRTIO_PCI_COMMON_Q_ENABLE is written and related virtqueue can be
started. It only supports the devices of virtio 1 or later. The
not-supported devices can only start the virtqueue when DRIVER_OK.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---



Acked-by: Jason Wang 



  hw/virtio/virtio.c | 14 ++
  include/hw/virtio/virtio.h |  2 ++
  2 files changed, 16 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 0e9d41366f..141f18c633 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2050,6 +2050,20 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
queue_index)
  __virtio_queue_reset(vdev, queue_index);
  }
  
+void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index)

+{
+VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+if (!virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+error_report("queue_enable is only suppported in devices of virtio "
+ "1.0 or later.");
+}
+
+if (k->queue_enable) {
+k->queue_enable(vdev, queue_index);
+}
+}
+
  void virtio_reset(void *opaque)
  {
  VirtIODevice *vdev = opaque;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 879394299b..085997d8f3 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -131,6 +131,7 @@ struct VirtioDeviceClass {
  void (*reset)(VirtIODevice *vdev);
  void (*set_status)(VirtIODevice *vdev, uint8_t val);
  void (*queue_reset)(VirtIODevice *vdev, uint32_t queue_index);
+void (*queue_enable)(VirtIODevice *vdev, uint32_t queue_index);
  /* For transitional devices, this is a bitmap of features
   * that are only exposed on the legacy interface but not
   * the modern one.
@@ -270,6 +271,7 @@ int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, 
int n,
  int virtio_set_status(VirtIODevice *vdev, uint8_t val);
  void virtio_reset(void *opaque);
  void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
+void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_update_irq(VirtIODevice *vdev);
  int virtio_set_features(VirtIODevice *vdev, uint64_t val);

Re: [PATCH v2 05/24] virtio: core: vq reset feature negotation support

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

A a new command line parameter "queue_reset" is added.

Meanwhile, the vq reset feature is disabled for pre-7.1 machines.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/core/machine.c  | 1 +
  include/hw/virtio/virtio.h | 4 +++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index a673302cce..8b22b4647f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -43,6 +43,7 @@
  GlobalProperty hw_compat_7_0[] = {
  { "arm-gicv3-common", "force-8-bit-prio", "on" },
  { "nvme-ns", "eui64-default", "on"},
+{ "virtio-device", "queue_reset", "false" },



7.1 is about to release so we need to do it for pre-7.2.

Thanks



  };
  const size_t hw_compat_7_0_len = G_N_ELEMENTS(hw_compat_7_0);
  
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h

index 085997d8f3..ed3ecbef80 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -295,7 +295,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
  DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
VIRTIO_F_IOMMU_PLATFORM, false), \
  DEFINE_PROP_BIT64("packed", _state, _field, \
-  VIRTIO_F_RING_PACKED, false)
+  VIRTIO_F_RING_PACKED, false), \
+DEFINE_PROP_BIT64("queue_reset", _state, _field, \
+  VIRTIO_F_RING_RESET, true)
  
  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);

  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);

Re: [PATCH v2 05/24] virtio: core: vq reset feature negotation support

2022-08-23 Thread Kangjie Xu




在 2022/8/23 15:34, Jason Wang 写道:


在 2022/8/16 09:06, Kangjie Xu 写道:

A a new command line parameter "queue_reset" is added.

Meanwhile, the vq reset feature is disabled for pre-7.1 machines.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/core/machine.c  | 1 +
  include/hw/virtio/virtio.h | 4 +++-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index a673302cce..8b22b4647f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -43,6 +43,7 @@
  GlobalProperty hw_compat_7_0[] = {
  { "arm-gicv3-common", "force-8-bit-prio", "on" },
  { "nvme-ns", "eui64-default", "on"},
+    { "virtio-device", "queue_reset", "false" },



7.1 is about to release so we need to do it for pre-7.2.

Thanks

The current version is 7.0.92, should I wait until 7.1 to be released 
and submit my patch after that? or just add hw_compat_7_1[] and related 
support.


Thanks




  };
  const size_t hw_compat_7_0_len = G_N_ELEMENTS(hw_compat_7_0);
  diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 085997d8f3..ed3ecbef80 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -295,7 +295,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
  DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
    VIRTIO_F_IOMMU_PLATFORM, false), \
  DEFINE_PROP_BIT64("packed", _state, _field, \
-  VIRTIO_F_RING_PACKED, false)
+  VIRTIO_F_RING_PACKED, false), \
+    DEFINE_PROP_BIT64("queue_reset", _state, _field, \
+  VIRTIO_F_RING_RESET, true)
    hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);

Re: [PATCH v2 06/24] virtio-pci: support queue reset

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

From: Xuan Zhuo 

PCI devices support vq reset.

Based on this function, the driver can adjust the size of the ring, and
quickly recycle the buffer in the ring.

The migration of the virtio devices will not happen during a reset
operation. This is becuase the global iothread lock is held. Migration
thread also needs the lock. As a result, we do not need to migrate the
reset state of VirtIOPCIQueue.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Kangjie Xu 
---
  hw/virtio/virtio-pci.c | 19 +++
  include/hw/virtio/virtio-pci.h |  1 +
  2 files changed, 20 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..ec8e92052f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1246,6 +1246,9 @@ static uint64_t virtio_pci_common_read(void *opaque, 
hwaddr addr,
  case VIRTIO_PCI_COMMON_Q_USEDHI:
  val = proxy->vqs[vdev->queue_sel].used[1];
  break;
+case VIRTIO_PCI_COMMON_Q_RESET:
+val = proxy->vqs[vdev->queue_sel].reset;
+break;
  default:
  val = 0;
  }
@@ -1333,6 +1336,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr 
addr,
 ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
 proxy->vqs[vdev->queue_sel].used[0]);
  proxy->vqs[vdev->queue_sel].enabled = 1;
+proxy->vqs[vdev->queue_sel].reset = 0;
  } else {
  virtio_error(vdev, "wrong value for queue_enable %"PRIx64, val);
  }
@@ -1355,6 +1359,20 @@ static void virtio_pci_common_write(void *opaque, hwaddr 
addr,
  case VIRTIO_PCI_COMMON_Q_USEDHI:
  proxy->vqs[vdev->queue_sel].used[1] = val;
  break;
+case VIRTIO_PCI_COMMON_Q_RESET:
+if (val == 1) {
+/*
+ * With the global iothread lock taken, the migration will not
+ * happen until the virtqueue reset is done.
+ */



This comment applies to all other common cfg operation as well, So it 
looks not necessary?




+proxy->vqs[vdev->queue_sel].reset = 1;
+
+virtio_queue_reset(vdev, vdev->queue_sel);
+
+proxy->vqs[vdev->queue_sel].reset = 0;
+proxy->vqs[vdev->queue_sel].enabled = 0;
+}
+break;
  default:
  break;
  }
@@ -1950,6 +1968,7 @@ static void virtio_pci_reset(DeviceState *qdev)
  
  for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {

  proxy->vqs[i].enabled = 0;
+proxy->vqs[i].reset = 0;
  proxy->vqs[i].num = 0;
  proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
  proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index 2446dcd9ae..e9290e2b94 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -117,6 +117,7 @@ typedef struct VirtIOPCIRegion {
  typedef struct VirtIOPCIQueue {
uint16_t num;
bool enabled;
+  bool reset;



Do we need to migrate this?

Thanks



uint32_t desc[2];
uint32_t avail[2];
uint32_t used[2];

Re: [PATCH v5 2/9] parallels: Fix data_end field value in parallels_co_check()

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/22/22 12:05, Alexander Ivanov wrote:

Make data_end pointing to the end of the last cluster if a leak was fixed.
Otherwise set the file size to data_end.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index c245ca35cd..2be56871bc 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -513,7 +513,15 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  res->leaks_fixed += count;
  }
  }
-
+/*
+ * If res->image_end_offset less than the file size,
+ * a leak was fixed and we should set the new offset to s->data_end.
+ * Otherwise set the file size to s->data_end.


I'm not sure about English :) For me "set A to B" means A := B, but you use it 
visa versa..


+ */
+if (res->image_end_offset < size && fix & BDRV_FIX_LEAKS) {
+size = res->image_end_offset;
+}
+s->data_end = size >> BDRV_SECTOR_BITS;


Hmm, actually, when size < data_end, you can shrink data_end only if (fix & 
BDRV_FIX_ERRORS), otherwise BAT is still not fixed.


  out:
  qemu_co_mutex_unlock(&s->lock);
  return ret;



Actually I think we only need to modify s->data_end after successful BAT fixing 
above. Then, bdrv_truncate is formally unrelated to s->data_end and shouldn't 
touch it.

So, I think, more correct is something like

diff --git a/block/parallels.c b/block/parallels.c
index 2be56871bc..b268788974 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -479,20 +479,24 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
 prev_off = off;
 }
 
 ret = 0;

 if (flush_bat) {
 ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
 if (ret < 0) {
 res->check_errors++;
 goto out;
 }
+
+/* We have dropped some wrong clusters, update data_end */
+assert(s->data_end * BDRV_SECTOR_SIZE >= high_off + s->cluster_size);
+s->data_end = (high_off + s->cluster_size) / BDRV_SECTOR_SIZE;
 }
 
 res->image_end_offset = high_off + s->cluster_size;

 if (size > res->image_end_offset) {




--
Best regards,
Vladimir

Re: [PATCH v2 07/24] virtio-pci: support queue enable

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

PCI devices support vq enable.



Nit: it might be "support device specific vq enable"




Based on this function, the driver can re-enable the virtqueue after the
virtqueue is reset.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/virtio/virtio-pci.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ec8e92052f..3d560e45ad 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1335,6 +1335,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr 
addr,
 proxy->vqs[vdev->queue_sel].avail[0],
 ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
 proxy->vqs[vdev->queue_sel].used[0]);
+virtio_queue_enable(vdev, vdev->queue_sel);
  proxy->vqs[vdev->queue_sel].enabled = 1;
  proxy->vqs[vdev->queue_sel].reset = 0;



Any reason we do it before the assignment of 1? It probably means the 
device specific method can't depend on virtio_queue_enabled()?


Thanks



  } else {

[PATCH v1 1/2] oslib-posix: Introduce qemu_socketpair()

2022-08-23 Thread tugy

From: Guoyi Tu 

qemu_socketpair() will create a pair of connected sockets
with FD_CLOEXEC set

Signed-off-by: Guoyi Tu 
---
 include/qemu/sockets.h | 18 ++
 util/oslib-posix.c | 19 +++
 2 files changed, 37 insertions(+)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 038faa157f..036745e586 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -14,6 +14,24 @@ int inet_aton(const char *cp, struct in_addr *ia);
 /* misc helpers */
 bool fd_is_socket(int fd);
 int qemu_socket(int domain, int type, int protocol);
+
+#ifndef WIN32
+/**
+ * qemu_socketpair:
+ * @domain: specifies a communication domain, such as PF_UNIX
+ * @type: specifies the socket type.
+ * @protocol: specifies a particular protocol to be used with the  socket
+ * @sv: an array to store the pair of socket created
+ *
+ * Creates an unnamed pair of connected sockets in the specified domain,
+ * of the specified type, and using the optionally specified protocol.
+ * And automatically set the close-on-exec flags on the returned sockets
+ *
+ * Return 0 on success.
+ */
+int qemu_socketpair(int domain, int type, int protocol, int sv[2]);
+#endif
+
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 int socket_set_cork(int fd, int v);
 int socket_set_nodelay(int fd);
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 7a34c1657c..e5f5ebe469 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -261,6 +261,25 @@ void qemu_set_cloexec(int fd)
 assert(f != -1);
 }
 
+int qemu_socketpair(int domain, int type, int protocol, int sv[2])
+{
+int ret;
+
+#ifdef SOCK_CLOEXEC
+ret = socketpair(domain, type | SOCK_CLOEXEC, protocol, sv);
+if (ret != -1 || errno != EINVAL) {
+return ret;
+}
+#endif
+ret = socketpair(domain, type, protocol, sv);;
+if (ret == 0) {
+qemu_set_cloexec(sv[0]);
+qemu_set_cloexec(sv[1]);
+}
+
+return ret;
+}
+
 char *
 qemu_get_local_state_dir(void)
 {
-- 
2.25.1

[PATCH v1 2/2] vhost-user: Call qemu_socketpair() instead of socketpair()

2022-08-23 Thread tugy

From: Guoyi Tu 

As the close-on-exec flags is not set on the file descriptors returned
by socketpair() at default, the fds will survive across exec' function.

In the case that exec' function get invoked, such as the live-update feature
which is been developing, it will cause fd leaks.

To address this problem, we should call qemu_socketpair() to create an pair of
connected sockets with the close-on-exec flag set.

Signed-off-by: Guoyi Tu 
---
 hw/display/vhost-user-gpu.c | 3 ++-
 hw/virtio/vhost-user.c  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 3340ef9e5f..19c0e20103 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/sockets.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-gpu.h"
 #include "chardev/char-fe.h"
@@ -375,7 +376,7 @@ vhost_user_gpu_do_set_socket(VhostUserGPU *g, Error **errp)
 Chardev *chr;
 int sv[2];
 
-if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
 error_setg_errno(errp, errno, "socketpair() failed");
 return false;
 }
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..4d2b227028 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1726,7 +1726,7 @@ static int vhost_setup_slave_channel(struct vhost_dev 
*dev)
 return 0;
 }
 
-if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+if (qemu_socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
 int saved_errno = errno;
 error_report("socketpair() failed");
 return -saved_errno;
-- 
2.25.1

Re: [PATCH v2 08/24] vhost: extract the logic of unmapping the vrings and desc

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

Introduce vhost_virtqueue_unmap() to ummap the vrings and desc
of a virtqueue.

The function will be used later.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 



---
  hw/virtio/vhost.c | 20 ++--
  1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 0827d631c0..e467dfc7bc 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1197,6 +1197,19 @@ fail_alloc_desc:
  return r;
  }
  
+static void vhost_virtqueue_unmap(struct vhost_dev *dev,

+  struct VirtIODevice *vdev,
+  struct vhost_virtqueue *vq,
+  unsigned idx)
+{
+vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),
+   1, virtio_queue_get_used_size(vdev, idx));
+vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
+   0, virtio_queue_get_avail_size(vdev, idx));
+vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
+   0, virtio_queue_get_desc_size(vdev, idx));
+}
+
  static void vhost_virtqueue_stop(struct vhost_dev *dev,
  struct VirtIODevice *vdev,
  struct vhost_virtqueue *vq,
@@ -1235,12 +1248,7 @@ static void vhost_virtqueue_stop(struct vhost_dev *dev,
  vhost_vq_index);
  }
  
-vhost_memory_unmap(dev, vq->used, virtio_queue_get_used_size(vdev, idx),

-   1, virtio_queue_get_used_size(vdev, idx));
-vhost_memory_unmap(dev, vq->avail, virtio_queue_get_avail_size(vdev, idx),
-   0, virtio_queue_get_avail_size(vdev, idx));
-vhost_memory_unmap(dev, vq->desc, virtio_queue_get_desc_size(vdev, idx),
-   0, virtio_queue_get_desc_size(vdev, idx));
+vhost_virtqueue_unmap(dev, vdev, vq, idx);
  }
  
  static void vhost_eventfd_add(MemoryListener *listener,

Re: [PATCH v2 06/24] virtio-pci: support queue reset

2022-08-23 Thread Kangjie Xu




在 2022/8/23 15:40, Jason Wang 写道:


在 2022/8/16 09:06, Kangjie Xu 写道:

From: Xuan Zhuo 

PCI devices support vq reset.

Based on this function, the driver can adjust the size of the ring, and
quickly recycle the buffer in the ring.

The migration of the virtio devices will not happen during a reset
operation. This is becuase the global iothread lock is held. Migration
thread also needs the lock. As a result, we do not need to migrate the
reset state of VirtIOPCIQueue.

Signed-off-by: Xuan Zhuo 
Signed-off-by: Kangjie Xu 
---
  hw/virtio/virtio-pci.c | 19 +++
  include/hw/virtio/virtio-pci.h |  1 +
  2 files changed, 20 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..ec8e92052f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1246,6 +1246,9 @@ static uint64_t virtio_pci_common_read(void 
*opaque, hwaddr addr,

  case VIRTIO_PCI_COMMON_Q_USEDHI:
  val = proxy->vqs[vdev->queue_sel].used[1];
  break;
+    case VIRTIO_PCI_COMMON_Q_RESET:
+    val = proxy->vqs[vdev->queue_sel].reset;
+    break;
  default:
  val = 0;
  }
@@ -1333,6 +1336,7 @@ static void virtio_pci_common_write(void 
*opaque, hwaddr addr,

((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
proxy->vqs[vdev->queue_sel].used[0]);
  proxy->vqs[vdev->queue_sel].enabled = 1;
+    proxy->vqs[vdev->queue_sel].reset = 0;
  } else {
  virtio_error(vdev, "wrong value for queue_enable 
%"PRIx64, val);

  }
@@ -1355,6 +1359,20 @@ static void virtio_pci_common_write(void 
*opaque, hwaddr addr,

  case VIRTIO_PCI_COMMON_Q_USEDHI:
  proxy->vqs[vdev->queue_sel].used[1] = val;
  break;
+    case VIRTIO_PCI_COMMON_Q_RESET:
+    if (val == 1) {
+    /*
+ * With the global iothread lock taken, the migration 
will not

+ * happen until the virtqueue reset is done.
+ */



This comment applies to all other common cfg operation as well, So it 
looks not necessary?



Get it.




+ proxy->vqs[vdev->queue_sel].reset = 1;
+
+    virtio_queue_reset(vdev, vdev->queue_sel);
+
+    proxy->vqs[vdev->queue_sel].reset = 0;
+    proxy->vqs[vdev->queue_sel].enabled = 0;
+    }
+    break;
  default:
  break;
  }
@@ -1950,6 +1968,7 @@ static void virtio_pci_reset(DeviceState *qdev)
    for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
  proxy->vqs[i].enabled = 0;
+    proxy->vqs[i].reset = 0;
  proxy->vqs[i].num = 0;
  proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
  proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
diff --git a/include/hw/virtio/virtio-pci.h 
b/include/hw/virtio/virtio-pci.h

index 2446dcd9ae..e9290e2b94 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -117,6 +117,7 @@ typedef struct VirtIOPCIRegion {
  typedef struct VirtIOPCIQueue {
    uint16_t num;
    bool enabled;
+  bool reset;



Do we need to migrate this?

Thanks

I think we do not need to migrate this because we hold the global 
iothread lock when virtqueue reset is triggered. The migration of these 
device states also needs this lock.


On the other hand, the 'reset' state of virtqueue is same(is 0) before 
and after the process of resetting a virtqueue.


Thus, the migration will not happen when we are resetting a virtqueue 
and we do not to migrate it.


Thanks




    uint32_t desc[2];
    uint32_t avail[2];
    uint32_t used[2];

Re: [PATCH v2 09/24] vhost: introduce vhost_dev_virtqueue_stop()

2022-08-23 Thread Jason Wang




在 2022/8/16 09:06, Kangjie Xu 写道:

Introduce vhost_dev_virtqueue_stop(), which can ummap the
vrings and the desc of it.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/virtio/vhost.c | 9 +
  include/hw/virtio/vhost.h | 3 +++
  2 files changed, 12 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e467dfc7bc..1bca9ff48d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1904,3 +1904,12 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
  
  return -ENOSYS;

  }
+
+void vhost_dev_virtqueue_stop(struct vhost_dev *hdev, VirtIODevice *vdev,
+  int idx)
+{
+vhost_virtqueue_unmap(hdev,
+  vdev,
+  hdev->vqs + idx,
+  idx);



So I think the unmap is not sufficient, we need backend specific 
support. E.g for vhost kernel, need a SET_BACKEND here?


Thanks



+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a346f23d13..574888440c 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -288,4 +288,7 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
 struct vhost_inflight *inflight);
  int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
 struct vhost_inflight *inflight);
+
+void vhost_dev_virtqueue_stop(struct vhost_dev *hdev, VirtIODevice *vdev,
+  int idx);
  #endif

[PATCH v1 0/2] introduce qemu_socketpiar()

2022-08-23 Thread tugy

From: Guoyi Tu 

Introduce qemu_socketpair() to create socket pair fd, and
set the close-on-exec flag at default as with the other type
of socket does.

besides, the live update feature is developing, so it's necessary
to do that.

Guoyi Tu (2):
  oslib-posix: Introduce qemu_socketpair()
  vhost-user: Call qemu_socketpair() instead of socketpair()

 hw/display/vhost-user-gpu.c |  3 ++-
 hw/virtio/vhost-user.c  |  2 +-
 include/qemu/sockets.h  | 18 ++
 util/oslib-posix.c  | 19 +++
 4 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.25.1

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 10:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.


Actually, it's correct, as don't break the spec at docs/interop/parallels.txt.

So, it's bad but still correct.. So, we probably can restrict it in Qemu if we 
want. Do we?

Now I doubt. We are going to refuse images with leaks even for read. That 
means, qemu-img convert will stop working for images with leaks, you'll have to 
fix the image first, which may fail.


Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.


I think you are right, it would be better to align image size up to sector size.



Hmm, or even to cluster_size? When last cluster is unallocated. But cluster 
boundary is not required to be aligned to cluster size..

Anyway, now I think it's wrong to restrict opening file with leaks of any kind. 
That breaks qemu-img convert, and that's not how other formats behave.




+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;






--
Best regards,
Vladimir

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-08-23 Thread David Hildenbrand

On 19.08.22 05:38, Hugh Dickins wrote:
> On Fri, 19 Aug 2022, Sean Christopherson wrote:
>> On Thu, Aug 18, 2022, Kirill A . Shutemov wrote:
>>> On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
 On Wed, 6 Jul 2022, Chao Peng wrote:
 But since then, TDX in particular has forced an effort into preventing
 (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.

 Are any of the shmem.c mods useful to existing users of shmem.c? No.
 Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.
>>
>> But QEMU and other VMMs are users of shmem and memfd.  The new features 
>> certainly
>> aren't useful for _all_ existing users, but I don't think it's fair to say 
>> that
>> they're not useful for _any_ existing users.
> 
> Okay, I stand corrected: there exist some users of memfd_create()
> who will also have use for "INACCESSIBLE" memory.

As raised in reply to the relevant patch, I'm not sure if we really have
to/want to expose MFD_INACCESSIBLE to user space. I feel like this is a
requirement of specific memfd_notifer (memfile_notifier) implementations
-- such as TDX that will convert the memory and MCE-kill the machine on
ordinary write access. We might be able to set/enforce this when
registering a notifier internally instead, and fail notifier
registration if a condition isn't met (e.g., existing mmap).

So I'd be curious, which other users of shmem/memfd would benefit from
(MMU)-"INACCESSIBLE" memory obtained via memfd_create()?

-- 
Thanks,

David / dhildenb

Re: [RFC v4 11/11] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint

2022-08-23 Thread David Hildenbrand

On 23.08.22 00:24, Stefan Hajnoczi wrote:
> Register guest RAM using BlockRAMRegistrar and set the
> BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory
> accesses in I/O requests.
> 
> This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely
> on DMA mapping/unmapping.

Can you explain why we're monitoring RAMRegistrar to hook into "guest
RAM" and not go the usual path of the MemoryListener?

What will BDRV_REQ_REGISTERED_BUF actually do? Pin all guest memory in
the worst case such as io_uring fixed buffers would do ( I hope not ).

-- 
Thanks,

David / dhildenb

Re: [PATCH v2 09/24] vhost: introduce vhost_dev_virtqueue_stop()

2022-08-23 Thread Kangjie Xu




在 2022/8/23 15:52, Jason Wang 写道:


在 2022/8/16 09:06, Kangjie Xu 写道:

Introduce vhost_dev_virtqueue_stop(), which can ummap the
vrings and the desc of it.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/virtio/vhost.c | 9 +
  include/hw/virtio/vhost.h | 3 +++
  2 files changed, 12 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e467dfc7bc..1bca9ff48d 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1904,3 +1904,12 @@ int vhost_net_set_backend(struct vhost_dev *hdev,
    return -ENOSYS;
  }
+
+void vhost_dev_virtqueue_stop(struct vhost_dev *hdev, VirtIODevice 
*vdev,

+  int idx)
+{
+    vhost_virtqueue_unmap(hdev,
+  vdev,
+  hdev->vqs + idx,
+  idx);



So I think the unmap is not sufficient, we need backend specific 
support. E.g for vhost kernel, need a SET_BACKEND here?


Thanks

But SET_BACKEND of vhost-net needs a parameter fd in vhost_vring_file: 
that is net->backend of VHostNetState.


If we add the fd parameter or struct vhost_vring_file to 
vhost_dev_virtqueue_stop/restart, it exposes some implementation details 
in the parameter list.


And that seems not good? So I put SET_BACKEND in the vhost-net module. 
The workflow is similar to vhost_net_start_one().


Thanks




+}
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a346f23d13..574888440c 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -288,4 +288,7 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
 struct vhost_inflight *inflight);
  int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
 struct vhost_inflight *inflight);
+
+void vhost_dev_virtqueue_stop(struct vhost_dev *hdev, VirtIODevice 
*vdev,

+  int idx);
  #endif

Re: [PATCH v2 07/24] virtio-pci: support queue enable

2022-08-23 Thread Kangjie Xu




在 2022/8/23 15:44, Jason Wang 写道:


在 2022/8/16 09:06, Kangjie Xu 写道:

PCI devices support vq enable.



Nit: it might be "support device specific vq enable"



Get it.


Based on this function, the driver can re-enable the virtqueue after the
virtqueue is reset.

Signed-off-by: Kangjie Xu 
Signed-off-by: Xuan Zhuo 
---
  hw/virtio/virtio-pci.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ec8e92052f..3d560e45ad 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1335,6 +1335,7 @@ static void virtio_pci_common_write(void 
*opaque, hwaddr addr,

proxy->vqs[vdev->queue_sel].avail[0],
((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
proxy->vqs[vdev->queue_sel].used[0]);
+    virtio_queue_enable(vdev, vdev->queue_sel);
  proxy->vqs[vdev->queue_sel].enabled = 1;
  proxy->vqs[vdev->queue_sel].reset = 0;



Any reason we do it before the assignment of 1? It probably means the 
device specific method can't depend on virtio_queue_enabled()?


Thanks

Sorry, I don't get why device specific method can't depend on 
virtio_queue_enabled().


Before virtio_queue_enable() is done, virtqueue should always be not 
ready and disabled.


Otherwise, If we put it after the assignment of enabled to 1, the 
virtqueue may be accessed illegally and may cause panic, because the 
virtqueue is still being intialized and being configured.


Thanks




  } else {

[PATCH 1/1] MAINTAINERS: add tree to keep parallels format driver changes

2022-08-23 Thread Denis V. Lunev

Parallels driver changes are driving by me for now. At least we need to
get functionally complete check and repair procedure for now. This
would be a good educational task for new people from Virtuozzo.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..b8dcf6f350 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3512,6 +3512,7 @@ F: block/parallels.c
 F: block/parallels-ext.c
 F: docs/interop/parallels.txt
 T: git https://gitlab.com/vsementsov/qemu.git block
+T: git https://src.openvz.org/~den/qemu.git parallels
 
 qed
 M: Stefan Hajnoczi 
-- 
2.32.0

Re: [PATCH for-7.2 v2 10/20] hw/ppc: set machine->fdt in spapr machine

2022-08-23 Thread Alexey Kardashevskiy





On 22/08/2022 20:30, Daniel Henrique Barboza wrote:



On 8/22/22 00:29, Alexey Kardashevskiy wrote:



On 22/08/2022 13:05, David Gibson wrote:

On Fri, Aug 19, 2022 at 06:42:34AM -0300, Daniel Henrique Barboza wrote:



On 8/18/22 23:11, Alexey Kardashevskiy wrote:



On 05/08/2022 19:39, Daniel Henrique Barboza wrote:

The pSeries machine never bothered with the common machine->fdt
attribute. We do all the FDT related work using spapr->fdt_blob.

We're going to introduce HMP commands to read and save the FDT, which
will rely on setting machine->fdt properly to work across all machine
archs/types.



Out of curiosity - why new HMP command, is not QOM'ing this ms::fdt 
property enough?


I tried to do the minimal changes needed for the commands to work. 
ms::fdt is

one of the few MachineState fields that hasn't been QOMified by
machine_class_init() yet. All pre-existing code that uses ms::fdt 
are using the
pointer directly. To make a QOMified use of it would require extra 
patches

in machine.c to QOMify the property first.

There's also the issue with how each machine is creating the FDT. 
Most are using
helpers from device_tree.c, some are creating it from scratch, 
others required
a .dtb file, most of them are not doing a fdt_pack() and so on. To 
really QOMify
the use of ms::fdt we would need some machine hooks that standardize 
all that.

I believe it's worth the trouble, but it would be too much to do
right now.


Hmm.. I think this depends on what you mean by "QOM"ify exactly.  If
you're meaning make the full DT representation QOM objects, that you
can look into in detail, then, yes, that's pretty complicated.

I suspect what Alexey was suggesting though, was merely to make
ms::fdt accessible as a single bytestring property on the machine QOM
object.  Effectively it's just "dumpdtb" but as a property get.



Yes, I meant the bytestream, as DTC can easily decompile it onto a DTS.



I'm not 100% certain if QOM can safely represent arbitrary bytestrings
as QOM properties, which would need checking.


I am not sure either but rather than adding another command to HMP, 
I'd explore this option first.



I'm not sure what you mean by that. The HMP version of 'dumpdtb' is more 
flexible
that the current "-machine dumpdtb", an extra machine option that would 
cause

the guest to exit after writing the dtb


True. Especially with CAS :)

And 'info fdt' is a new command 
that

makes it easier to inspect specific nodes/props.


btw what is this new command going to do? decompile the tree or save dtb?

I don't see how making ms::fdt being retrievable by 
object_property_get() internally
(remember that ms::fdt it's not fully QOMified, so there's no 
introspection of its
value from the QEMU monitor) would make any of these new HMP commands 
obsolete.


Well, there are QMP and HMP and my feeling was that HMP is slowly 
getting deprecated or something and QMP is the superior one. So I 
thought since this FDT is a property and there is no associated action 
with it, making it a property would do.


For ages I've been using a python3 script to talk to QMP as HMP is 
really quite limited, the only thing in HMP which is not in QMP is 
dumping memory ("x", "xp"), in this case I wrap HMP into QMP and keep 
using QMP :)






Thanks,


Daniel







--
Alexey

Re: [PATCH v5 2/9] parallels: Fix data_end field value in parallels_co_check()

2022-08-23 Thread Alexander Ivanov




On 23.08.2022 09:38, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
Make data_end pointing to the end of the last cluster if a leak was 
fixed.

Otherwise set the file size to data_end.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index c245ca35cd..2be56871bc 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -513,7 +513,15 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,

  res->leaks_fixed += count;
  }
  }
-
+    /*
+ * If res->image_end_offset less than the file size,
+ * a leak was fixed and we should set the new offset to 
s->data_end.

+ * Otherwise set the file size to s->data_end.


I'm not sure about English :) For me "set A to B" means A := B, but 
you use it visa versa..
I hesitated about this. I saw both variants and used the incorrect one 
probably. =)



+ */
+    if (res->image_end_offset < size && fix & BDRV_FIX_LEAKS) {
+    size = res->image_end_offset;
+    }
+    s->data_end = size >> BDRV_SECTOR_BITS;


Hmm, actually, when size < data_end, you can shrink data_end only if 
(fix & BDRV_FIX_ERRORS), otherwise BAT is still not fixed.
We shrink an image if there is a leak and (fix & BDRV_FIX_LEAKS). 
Otherwise we set data_end to the file size. In such a way we don't 
change the file size in parallels_close() even if we have out-of-image 
offsets in BAT.



  out:
  qemu_co_mutex_unlock(&s->lock);
  return ret;



Actually I think we only need to modify s->data_end after successful 
BAT fixing above. Then, bdrv_truncate is formally unrelated to 
s->data_end and shouldn't touch it.


So, I think, more correct is something like

diff --git a/block/parallels.c b/block/parallels.c
index 2be56871bc..b268788974 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -479,20 +479,24 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,

 prev_off = off;
 }

 ret = 0;
 if (flush_bat) {
 ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, 
s->header, 0);

 if (ret < 0) {
 res->check_errors++;
 goto out;
 }
+
+    /* We have dropped some wrong clusters, update data_end */
+    assert(s->data_end * BDRV_SECTOR_SIZE >= high_off + 
s->cluster_size);

+    s->data_end = (high_off + s->cluster_size) / BDRV_SECTOR_SIZE;
 }

 res->image_end_offset = high_off + s->cluster_size;
 if (size > res->image_end_offset) {

If an image was opened in RW mode for checking without fixing (I don't 
know if it's possible), data_end can be set outside the image and the 
image will be expanded in parallels_close().


OK, I would propose to fix data_end in two places:

1) after out-of-image check set data_end to the highest offset with (off 
+ cluster_size <= file_size) condition, independent on fix flags;


2) after leak check, only if (size > res->image_end_offset).

In such a way we can have only one inconsistence after any check: 
out-of-image offsets isn't fixed and we have offsets after data_end. But 
it is much better than if we set data_end to petabytes, for example.

Re: [qemu-web PATCH] Add signing pubkey for python-qemu-qmp package

2022-08-23 Thread Andrea Bolognani

On Thu, Aug 18, 2022 at 12:53:58PM -0400, John Snow wrote:
> Add the pubkey currently used for signing PyPI releases of qemu.qmp to a
> stable location where it can be referenced by e.g. Fedora RPM specfiles.
>
> At present, the key happens to just simply be my own -- but future
> releases may be signed by a different key. In that case, we can
> increment '1.txt' to '2.txt' and so on. The old keys should be left in
> place.
>
> The format for the keyfile was chosen by copying what OpenStack was
> doing:
> https://releases.openstack.org/_static/0x2426b928085a020d8a90d0d879ab7008d0896c8a.txt
>
> Generated with:
> > gpg --with-fingerprint --list-keys js...@redhat.com > pubkey
> > gpg --armor --export js...@redhat.com >> pubkey

You might want to pass

  --export-options export-minimal

to the second command in order to obtain a significantly smaller file
that can still serve the intended purpose.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH 03/14] accel/tcg: Suppress auto-invalidate in probe_access_internal

2022-08-23 Thread David Hildenbrand

On 23.08.22 01:57, Richard Henderson wrote:
> When PAGE_WRITE_INV is set when calling tlb_set_page,
> we immediately set TLB_INVALID_MASK in order to force
> tlb_fill to be called on the next lookup.  Here in
> probe_access_internal, we have just called tlb_fill
> and eliminated true misses, thus the lookup must be valid.
> 
> This allows us to remove a warning comment from s390x.
> There doesn't seem to be a reason to change the code though.
> 
> Cc: David Hildenbrand 
> Signed-off-by: Richard Henderson 
> ---
>  accel/tcg/cputlb.c| 10 +-
>  target/s390x/tcg/mem_helper.c |  4 
>  2 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index 1509df96b4..5359113e8d 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -1602,6 +1602,7 @@ static int probe_access_internal(CPUArchState *env, 
> target_ulong addr,
>  }
>  tlb_addr = tlb_read_ofs(entry, elt_ofs);
>  
> +flags = TLB_FLAGS_MASK;
>  page_addr = addr & TARGET_PAGE_MASK;
>  if (!tlb_hit_page(tlb_addr, page_addr)) {
>  if (!victim_tlb_hit(env, mmu_idx, index, elt_ofs, page_addr)) {
> @@ -1617,10 +1618,17 @@ static int probe_access_internal(CPUArchState *env, 
> target_ulong addr,
>  
>  /* TLB resize via tlb_fill may have moved the entry.  */
>  entry = tlb_entry(env, mmu_idx, addr);
> +
> +/*
> + * With PAGE_WRITE_INV, we set TLB_INVALID_MASK immediately,
> + * to force the next access through tlb_fill.  We've just
> + * called tlb_fill, so we know that this entry *is* valid.
> + */
> +flags &= ~TLB_INVALID_MASK;
>  }
>  tlb_addr = tlb_read_ofs(entry, elt_ofs);
>  }
> -flags = tlb_addr & TLB_FLAGS_MASK;
> +flags &= tlb_addr;
>  
>  /* Fold all "mmio-like" bits into TLB_MMIO.  This is not RAM.  */
>  if (unlikely(flags & ~(TLB_WATCHPOINT | TLB_NOTDIRTY))) {
> diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
> index fc52aa128b..3758b9e688 100644
> --- a/target/s390x/tcg/mem_helper.c
> +++ b/target/s390x/tcg/mem_helper.c
> @@ -148,10 +148,6 @@ static int s390_probe_access(CPUArchState *env, 
> target_ulong addr, int size,
>  #else
>  int flags;
>  
> -/*
> - * For !CONFIG_USER_ONLY, we cannot rely on TLB_INVALID_MASK or 
> haddr==NULL
> - * to detect if there was an exception during tlb_fill().
> - */

Yeah, that was primarily only a comment that we rely on tlb_fill_exc to
obtain the exact PGM_* value -- and at the same time use it to detect if
there was an exception at all.

>  env->tlb_fill_exc = 0;
>  flags = probe_access_flags(env, addr, access_type, mmu_idx, nonfault, 
> phost,
> ra);


Change itself looks good to me.


However, it's been a while since I stared at this code, but I wonder how
the CONFIG_USER_ONLY path is correct.

1) s390_probe_access() documents to "With nonfault=1, return the PGM_
exception that would have been injected into the guest; return 0 if no
exception was detected."

But in case of CONFIG_USER_ONLY, we return the flags returned by
s390_probe_access(), not a PGM__* value. Maybe it doesn't matter,
because we'll simply inject a SIGSEGV in any case ...

I'd have assume that we have to check here if there was an exception in
a similar way, and detect the actual type. As the old comment indicates,
we can either
* check for *phost == NULL
* flags having TLB_INVALID_MASK set

... and I wonder if we really care about the exception type for
CONFIG_USER_ONLY at all.


2) s390_probe_access() documents that for "CONFIG_USER_ONLY, the
faulting address is stored to env->__excp_addr.".

However, that's only set in s390_cpu_record_sigsegv(). With nonfault=1
that will never actually trigger, right?



I assume db9aab5783a2 ("target/s390x: Use probe_access_flags in
s390_probe_access") might have introduced both. We had a flag conversion
to PGM_ in there and stored env->__excp_addr:

flags = page_get_flags(addr);
if (!(flags & (access_type == MMU_DATA_LOAD ?  PAGE_READ :
PAGE_WRITE_ORG))) {
env->__excp_addr = addr;
flags = (flags & PAGE_VALID) ? PGM_PROTECTION : PGM_ADDRESSING;
if (nonfault) {
return flags;
}



-- 
Thanks,

David / dhildenb

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Denis V. Lunev


On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the error 
message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?


It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another 
software or by some failure path.


I think you are right, it would be better to align image size up to 
sector size.


I would say that we need to align not on sector size but on cluster size.
That would worth additional check.

Re: [PATCH v5 2/9] parallels: Fix data_end field value in parallels_co_check()

2022-08-23 Thread Denis V. Lunev


On 22.08.2022 11:05, Alexander Ivanov wrote:

Make data_end pointing to the end of the last cluster if a leak was fixed.
Otherwise set the file size to data_end.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index c245ca35cd..2be56871bc 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -513,7 +513,15 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  res->leaks_fixed += count;
  }
  }
-
+/*
+ * If res->image_end_offset less than the file size,
+ * a leak was fixed and we should set the new offset to s->data_end.
+ * Otherwise set the file size to s->data_end.
+ */
+if (res->image_end_offset < size && fix & BDRV_FIX_LEAKS) {

technically this is correct, but please add braces around
(fix & BDRV_FIX_LEAKS) - the code would look better


+size = res->image_end_offset;
+}
+s->data_end = size >> BDRV_SECTOR_BITS;
  out:
  qemu_co_mutex_unlock(&s->lock);
  return ret;

Re: [PATCH 1/1] hw/i2c/aspeed: Fix old reg slave receive

2022-08-23 Thread Klaus Jensen

On Aug 20 15:57, Peter Delevoryas wrote:
> I think when Klaus ported his slave mode changes from the original patch
> series to the rewritten I2C module, he changed the behavior of the first
> byte that is received by the slave device.
> 
> What's supposed to happen is that the AspeedI2CBus's slave device's
> i2c_event callback should run, and if the event is "send_async", then it
> should populate the byte buffer with the 8-bit I2C address that is being
> sent to. Since we only support "send_async", the lowest bit should
> always be 0 (indicating that the master is requesting to send data).
> 
> This is the code Klaus had previously, for reference. [1]
> 
> switch (event) {
> case I2C_START_SEND:
> bus->buf = bus->dev_addr << 1;
> 
> bus->buf &= I2CD_BYTE_BUF_RX_MASK;
> bus->buf <<= I2CD_BYTE_BUF_RX_SHIFT;
> 
> bus->intr_status |= (I2CD_INTR_SLAVE_ADDR_RX_MATCH | 
> I2CD_INTR_RX_DONE);
> aspeed_i2c_set_state(bus, I2CD_STXD);
> 
> break;
> 
> [1]: 
> https://lore.kernel.org/qemu-devel/20220331165737.1073520-4-...@irrelevant.dk/
> 
> Signed-off-by: Peter Delevoryas 
> Fixes: a8d48f59cd021b25 ("hw/i2c/aspeed: add slave device in old register 
> mode")
> ---
>  hw/i2c/aspeed_i2c.c | 8 +---
>  include/hw/i2c/aspeed_i2c.h | 1 +
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
> index 42c6d69b82..c166fd20fa 100644
> --- a/hw/i2c/aspeed_i2c.c
> +++ b/hw/i2c/aspeed_i2c.c
> @@ -1131,7 +1131,9 @@ static int aspeed_i2c_bus_slave_event(I2CSlave *slave, 
> enum i2c_event event)
>  AspeedI2CBus *bus = ASPEED_I2C_BUS(qbus->parent);
>  uint32_t reg_intr_sts = aspeed_i2c_bus_intr_sts_offset(bus);
>  uint32_t reg_byte_buf = aspeed_i2c_bus_byte_buf_offset(bus);
> -uint32_t value;
> +uint32_t reg_dev_addr = aspeed_i2c_bus_dev_addr_offset(bus);
> +uint32_t dev_addr = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_dev_addr,
> +SLAVE_DEV_ADDR1);
>  
>  if (aspeed_i2c_is_new_mode(bus->controller)) {
>  return aspeed_i2c_bus_new_slave_event(bus, event);
> @@ -1139,8 +1141,8 @@ static int aspeed_i2c_bus_slave_event(I2CSlave *slave, 
> enum i2c_event event)
>  
>  switch (event) {
>  case I2C_START_SEND_ASYNC:
> -value = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_byte_buf, TX_BUF);
> -SHARED_ARRAY_FIELD_DP32(bus->regs, reg_byte_buf, RX_BUF, value << 1);
> +/* Bit[0] == 0 indicates "send". */
> +SHARED_ARRAY_FIELD_DP32(bus->regs, reg_byte_buf, RX_BUF, dev_addr << 
> 1);
>  
>  ARRAY_FIELD_DP32(bus->regs, I2CD_INTR_STS, SLAVE_ADDR_RX_MATCH, 1);
>  SHARED_ARRAY_FIELD_DP32(bus->regs, reg_intr_sts, RX_DONE, 1);
> diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h
> index 300a89b343..adc904d6c1 100644
> --- a/include/hw/i2c/aspeed_i2c.h
> +++ b/include/hw/i2c/aspeed_i2c.h
> @@ -130,6 +130,7 @@ REG32(I2CD_CMD, 0x14) /* I2CD Command/Status */
>  SHARED_FIELD(M_TX_CMD, 1, 1)
>  SHARED_FIELD(M_START_CMD, 0, 1)
>  REG32(I2CD_DEV_ADDR, 0x18) /* Slave Device Address */
> +SHARED_FIELD(SLAVE_DEV_ADDR1, 0, 7)
>  REG32(I2CD_POOL_CTRL, 0x1C) /* Pool Buffer Control */
>  SHARED_FIELD(RX_COUNT, 24, 5)
>  SHARED_FIELD(RX_SIZE, 16, 5)
> -- 
> 2.37.1
> 

Nice catch Peter! I'm not sure how I messed that up like that.

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Denis V. Lunev


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the error 
message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?

That is a different check. We need to be sure that file_size is authentic,
which worth additional check.

At my opinion, this worth additional patch later on. Let us agree here,
queue and proceed with further pending fixes.

We should use something like the following

    data_start = le32_to_cpu(s->header->data_off);
    if (data_start == 0) {
    data_start = ROUND_UP(bat_entry_off(s->bat_size), 
BDRV_SECTOR_SIZE);

    }
    if ((file_size - data_start) % cluster_size != 0) {
    error();
    }



It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another software 
or by some failure path.




+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;

Re: [PATCH v5 4/9] parallels: Use generic infrastructure for BAT writing in parallels_co_check()

2022-08-23 Thread Denis V. Lunev


On 22.08.2022 11:05, Alexander Ivanov wrote:

BAT is written in the context of conventional operations over
the image inside bdrv_co_flush() when it calls
parallels_co_flush_to_os() callback. Thus we should not
modify BAT array directly, but call parallels_set_bat_entry()
helper and bdrv_co_flush() further on. After that there is no
need to manually write BAT and track its modification.

This makes code more generic and allows to split
parallels_set_bat_entry() for independent pieces.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 23 ++-
  1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8733ef8a70..357c345517 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -425,9 +425,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  {
  BDRVParallelsState *s = bs->opaque;
  int64_t size, prev_off, high_off;
-int ret;
+int ret = 0;
  uint32_t i;
-bool flush_bat = false;
  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -466,9 +465,8 @@ static int coroutine_fn parallels_co_check(BlockDriverState 
*bs,
  res->corruptions++;
  if (fix & BDRV_FIX_ERRORS) {
  prev_off = 0;
-s->bat_bitmap[i] = 0;
+parallels_set_bat_entry(s, i, 0);
  res->corruptions_fixed++;
-flush_bat = true;
  continue;
  }
  }
@@ -484,15 +482,6 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  prev_off = off;
  }
  
-ret = 0;

-if (flush_bat) {
-ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
-if (ret < 0) {
-res->check_errors++;
-goto out;
-}
-}
-
  res->image_end_offset = high_off + s->cluster_size;
  if (size > res->image_end_offset) {
  int64_t count;
@@ -529,6 +518,14 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  s->data_end = size >> BDRV_SECTOR_BITS;
  out:
  qemu_co_mutex_unlock(&s->lock);
+
+if (ret == 0) {
+ret = bdrv_co_flush(bs);
+if (ret < 0) {
+res->check_errors++;
+}
+}
+
  return ret;
  }
  

Reviewed-by: Denis V. Lunev

Re: [PATCH v5 7/9] parallels: Move check of leaks to a separate function

2022-08-23 Thread Denis V. Lunev


On 22.08.2022 11:05, Alexander Ivanov wrote:

We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 87 +--
  1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index bf074f247c..12e8d382f3 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -468,14 +468,13 @@ static int parallels_check_outside_image(BlockDriverState 
*bs,
  return 0;
  }
  
-static int coroutine_fn parallels_co_check(BlockDriverState *bs,

-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static int parallels_check_leak(BlockDriverState *bs,
+BdrvCheckResult *res,
+BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-int64_t size, prev_off, high_off;
-int ret;
-uint32_t i;
+int64_t size, off, high_off, count;
+int i, ret;

This is not we have agreed on. 'i' should be uint32_t
You have fixed parallels_co_check, but this needs
the fix too.

  
  size = bdrv_getlength(bs->file->bs);

  if (size < 0) {
@@ -483,41 +482,16 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  return size;
  }
  
-qemu_co_mutex_lock(&s->lock);

-
-parallels_check_unclean(bs, res, fix);
-
-ret = parallels_check_outside_image(bs, res, fix);
-if (ret < 0) {
-goto out;
-}
-
-res->bfi.total_clusters = s->bat_size;
-res->bfi.compressed_clusters = 0; /* compression is not supported */
-
  high_off = 0;
-prev_off = 0;
  for (i = 0; i < s->bat_size; i++) {
-int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
-if (off == 0) {
-prev_off = 0;
-continue;
-}
-
-res->bfi.allocated_clusters++;
+off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off > high_off) {
  high_off = off;
  }
-
-if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
-res->bfi.fragmented_clusters++;
-}
-prev_off = off;
  }
  
  res->image_end_offset = high_off + s->cluster_size;

  if (size > res->image_end_offset) {
-int64_t count;
  count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
  fprintf(stderr, "%s space leaked at the end of the image %" PRId64 
"\n",
  fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
@@ -535,11 +509,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  if (ret < 0) {
  error_report_err(local_err);
  res->check_errors++;
-goto out;
+return ret;
  }
  res->leaks_fixed += count;
  }
  }
+
  /*
   * If res->image_end_offset less than the file size,
   * a leak was fixed and we should set the new offset to s->data_end.
@@ -549,6 +524,52 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,
  size = res->image_end_offset;
  }
  s->data_end = size >> BDRV_SECTOR_BITS;
+
+return 0;
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t prev_off;
+int ret;
+uint32_t i;
+
+qemu_co_mutex_lock(&s->lock);
+
+parallels_check_unclean(bs, res, fix);
+
+ret = parallels_check_outside_image(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+ret = parallels_check_leak(bs, res, fix);
+if (ret < 0) {
+goto out;
+}
+
+res->bfi.total_clusters = s->bat_size;
+res->bfi.compressed_clusters = 0; /* compression is not supported */
+
+prev_off = 0;
+for (i = 0; i < s->bat_size; i++) {
+int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (off == 0) {
+prev_off = 0;
+continue;
+}
+
+res->bfi.allocated_clusters++;
+
+if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
+res->bfi.fragmented_clusters++;
+}
+prev_off = off;
+}
+
  out:
  qemu_co_mutex_unlock(&s->lock);

Re: [PATCH 1/1] MAINTAINERS: add tree to keep parallels format driver changes

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 11:40, Denis V. Lunev wrote:

Parallels driver changes are driving by me for now. At least we need to
get functionally complete check and repair procedure for now. This
would be a good educational task for new people from Virtuozzo.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..b8dcf6f350 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3512,6 +3512,7 @@ F: block/parallels.c
  F: block/parallels-ext.c
  F: docs/interop/parallels.txt
  T: git https://gitlab.com/vsementsov/qemu.git block
+T: git https://src.openvz.org/~den/qemu.git parallels
  
  qed

  M: Stefan Hajnoczi 


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 10:59, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 10:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.


Actually, it's correct, as don't break the spec at docs/interop/parallels.txt.

So, it's bad but still correct.. So, we probably can restrict it in Qemu if we 
want. Do we?

Now I doubt. We are going to refuse images with leaks even for read. That 
means, qemu-img convert will stop working for images with leaks, you'll have to 
fix the image first, which may fail.


Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.


I think you are right, it would be better to align image size up to sector size.



Hmm, or even to cluster_size? When last cluster is unallocated. But cluster 
boundary is not required to be aligned to cluster size..

Anyway, now I think it's wrong to restrict opening file with leaks of any kind. 
That breaks qemu-img convert, and that's not how other formats behave.




+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;








Oops, ignore this. We are not about leaks (when file size > data_end), we are 
about wrong clusters when data_end > file_size.


--
Best regards,
Vladimir

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 12:34, Denis V. Lunev wrote:

On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

That is a different check. We need to be sure that file_size is authentic,
which worth additional check.

At my opinion, this worth additional patch later on. Let us agree here,
queue and proceed with further pending fixes.

We should use something like the following

     data_start = le32_to_cpu(s->header->data_off);
     if (data_start == 0) {
     data_start = ROUND_UP(bat_entry_off(s->bat_size), BDRV_SECTOR_SIZE);
     }
     if ((file_size - data_start) % cluster_size != 0) {
     error();
     }


Do you think that we should error-out in such conditions? It doesn't break the 
spec, is it? Can the last cluster be half allocated?





It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.



+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;








--
Best regards,
Vladimir

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Alexander Ivanov




On 23.08.2022 11:20, Denis V. Lunev wrote:

On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the 
error message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?


It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another 
software or by some failure path.


I think you are right, it would be better to align image size up to 
sector size.


I would say that we need to align not on sector size but on cluster size.
That would worth additional check.


I agree on it for "WithouFreSpacExt" image format.

For "WithoutFreeSpace" format there are two restrictions:

1) the result of (cluster offset - data offset) must be aligned to 
cluster size:


2) data_off should be aligned to sector size (0 in data_off leads to the 
same restriction).


So file size can be aligned on sector size, but not aligned on cluster size.

Should we check dependent on image format?

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Denis V. Lunev


On 22.08.2022 11:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+ret = file_size;
+goto fail;
+}
+
+file_size >>= BDRV_SECTOR_BITS;
+if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+error_setg(errp, "parallels: Offset in BAT is out of image");
+ret = -EINVAL;
+goto fail;
+}
+
  if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
  /* Image was not closed correctly. The check is mandatory */
  s->header_unclean = true;

Reviewed-by: Denis V. Lunev

Re: [PATCH 1/1] MAINTAINERS: add tree to keep parallels format driver changes

2022-08-23 Thread Denis V. Lunev


On 23.08.2022 11:42, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 11:40, Denis V. Lunev wrote:

Parallels driver changes are driving by me for now. At least we need to
get functionally complete check and repair procedure for now. This
would be a good educational task for new people from Virtuozzo.

Signed-off-by: Denis V. Lunev 
CC: Vladimir Sementsov-Ogievskiy 
CC: Stefan Hajnoczi 
---
  MAINTAINERS | 1 +
  1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5ce4227ff6..b8dcf6f350 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3512,6 +3512,7 @@ F: block/parallels.c
  F: block/parallels-ext.c
  F: docs/interop/parallels.txt
  T: git https://gitlab.com/vsementsov/qemu.git block
+T: git https://src.openvz.org/~den/qemu.git parallels
    qed
  M: Stefan Hajnoczi 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


Wow! Queued the first patch to the tree :)

Thanks!

Re: [PATCH 3/7] util: make a copy of iova_tree_remove_parameter

2022-08-23 Thread Eugenio Perez Martin

On Tue, Aug 23, 2022 at 8:18 AM Jason Wang  wrote:
>
>
> 在 2022/8/20 00:53, Eugenio Pérez 写道:
> > It's convenient to call iova_tree_remove from a map returned from
> > iova_tree_find or iova_tree_find_iova.
>
>
> The looks like a hint of the defect of current API.
>
>
> >   With the current code this is not
> > possible, since we will free it, and then we will try to search for it
> > again.
> >
> > Fix it making a copy of the argument. Not applying a fixes tag, since
> > there is no use like that at the moment.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   util/iova-tree.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > index fee530a579..713e3edd7b 100644
> > --- a/util/iova-tree.c
> > +++ b/util/iova-tree.c
> > @@ -166,9 +166,11 @@ void iova_tree_foreach(IOVATree *tree, 
> > iova_tree_iterator iterator)
> >
> >   void iova_tree_remove(IOVATree *tree, const DMAMap *map)
> >   {
> > +/* Just in case caller is calling iova_tree_remove from a result of 
> > find */
> > +const DMAMap needle = *map;
>
>
> Then let's simply make iova_tree_remove() accept const DMAMap instead of
> the pointer to it.
>

Do you mean to accept DMAMap by value, isn't it? (no need for const it
then, isn't it?).

>
> >   const DMAMap *overlap;
> >
> > -while ((overlap = iova_tree_find(tree, map))) {
> > +while ((overlap = iova_tree_find(tree, &needle))) {
> >   g_tree_remove(tree->tree, overlap);
> >   }
>
>
> So we had following in iova_tree_insert():
>
>  /* We don't allow to insert range that overlaps with existings */
>  if (iova_tree_find(tree, map)) {
>  return IOVA_ERR_OVERLAP;
>  }
>
> I wonder how overlap can happen?
>

I don't get this part. The problem is that iova_tree_find returns a
pointer to an internal structure, there is no need to insert multiple
times overlapping maps.

Thanks!

Re: [PATCH v5 7/9] parallels: Move check of leaks to a separate function

2022-08-23 Thread Alexander Ivanov




On 23.08.2022 11:40, Denis V. Lunev wrote:

On 22.08.2022 11:05, Alexander Ivanov wrote:

We will add more and more checks so we need a better code structure
in parallels_co_check. Let each check performs in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 87 +--
  1 file changed, 54 insertions(+), 33 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index bf074f247c..12e8d382f3 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -468,14 +468,13 @@ static int 
parallels_check_outside_image(BlockDriverState *bs,

  return 0;
  }
  -static int coroutine_fn parallels_co_check(BlockDriverState *bs,
-   BdrvCheckResult *res,
-   BdrvCheckMode fix)
+static int parallels_check_leak(BlockDriverState *bs,
+    BdrvCheckResult *res,
+    BdrvCheckMode fix)
  {
  BDRVParallelsState *s = bs->opaque;
-    int64_t size, prev_off, high_off;
-    int ret;
-    uint32_t i;
+    int64_t size, off, high_off, count;
+    int i, ret;

This is not we have agreed on. 'i' should be uint32_t
You have fixed parallels_co_check, but this needs
the fix too.

Damn! Missed it when rebased on changed patches.



    size = bdrv_getlength(bs->file->bs);
  if (size < 0) {
@@ -483,41 +482,16 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,

  return size;
  }
  -    qemu_co_mutex_lock(&s->lock);
-
-    parallels_check_unclean(bs, res, fix);
-
-    ret = parallels_check_outside_image(bs, res, fix);
-    if (ret < 0) {
-    goto out;
-    }
-
-    res->bfi.total_clusters = s->bat_size;
-    res->bfi.compressed_clusters = 0; /* compression is not 
supported */

-
  high_off = 0;
-    prev_off = 0;
  for (i = 0; i < s->bat_size; i++) {
-    int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
-    if (off == 0) {
-    prev_off = 0;
-    continue;
-    }
-
-    res->bfi.allocated_clusters++;
+    off = bat2sect(s, i) << BDRV_SECTOR_BITS;
  if (off > high_off) {
  high_off = off;
  }
-
-    if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
-    res->bfi.fragmented_clusters++;
-    }
-    prev_off = off;
  }
    res->image_end_offset = high_off + s->cluster_size;
  if (size > res->image_end_offset) {
-    int64_t count;
  count = DIV_ROUND_UP(size - res->image_end_offset, 
s->cluster_size);
  fprintf(stderr, "%s space leaked at the end of the image %" 
PRId64 "\n",

  fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
@@ -535,11 +509,12 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,

  if (ret < 0) {
  error_report_err(local_err);
  res->check_errors++;
-    goto out;
+    return ret;
  }
  res->leaks_fixed += count;
  }
  }
+
  /*
   * If res->image_end_offset less than the file size,
   * a leak was fixed and we should set the new offset to 
s->data_end.
@@ -549,6 +524,52 @@ static int coroutine_fn 
parallels_co_check(BlockDriverState *bs,

  size = res->image_end_offset;
  }
  s->data_end = size >> BDRV_SECTOR_BITS;
+
+    return 0;
+}
+
+static int coroutine_fn parallels_co_check(BlockDriverState *bs,
+   BdrvCheckResult *res,
+   BdrvCheckMode fix)
+{
+    BDRVParallelsState *s = bs->opaque;
+    int64_t prev_off;
+    int ret;
+    uint32_t i;
+
+    qemu_co_mutex_lock(&s->lock);
+
+    parallels_check_unclean(bs, res, fix);
+
+    ret = parallels_check_outside_image(bs, res, fix);
+    if (ret < 0) {
+    goto out;
+    }
+
+    ret = parallels_check_leak(bs, res, fix);
+    if (ret < 0) {
+    goto out;
+    }
+
+    res->bfi.total_clusters = s->bat_size;
+    res->bfi.compressed_clusters = 0; /* compression is not 
supported */

+
+    prev_off = 0;
+    for (i = 0; i < s->bat_size; i++) {
+    int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+    if (off == 0) {
+    prev_off = 0;
+    continue;
+    }
+
+    res->bfi.allocated_clusters++;
+
+    if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
+    res->bfi.fragmented_clusters++;
+    }
+    prev_off = off;
+    }
+
  out:
  qemu_co_mutex_unlock(&s->lock);

[PATCH v4 0/2] block: add missed block_acct_setup with new block device init procedure

2022-08-23 Thread Denis V. Lunev

Commit 5f76a7aac156ca75680dad5df4a385fd0b58f6b1 is looking harmless from
the first glance, but it has changed things a lot. 'libvirt' uses it to
detect that it should follow new initialization way and this changes
things considerably. With this procedure followed, blockdev_init() is
not called anymore and thus block_acct_setup() helper is not called.

This means in particular that defaults for block accounting statistics
are changed and account_invalid/account_failed are actually initialized
as false instead of true originally.

This commit changes things to match original world. There are the following
constraints:
* new default value in block_acct_init() is set to true
* block_acct_setup() inside blockdev_init() is called before
  blkconf_apply_backend_options()
* thus newly created option in block device properties has precedence if
  specified

Changes from v3:
* fixed accidentally wrong submission. Contains changes which should be
  sent as v3

Changes from v2:
* called bool_from_onoffauto(account_..., true) in the first patch to
  preserve original semantics before patch 2

Changes from v1:
* set account_invalid/account_failed to true by default
* pass OnOffAuto to block_acct_init() to handle double initialization (patch 1)
* changed properties on BLK device to OnOffAuto

Signed-off-by: Denis V. Lunev 
CC: Peter Krempa 
CC: Markus Armbruster 
CC: John Snow 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Vladimir Sementsov-Ogievskiy

[PATCH 2/2] block: add missed block_acct_setup with new block device init procedure

2022-08-23 Thread Denis V. Lunev

Commit 5f76a7aac156ca75680dad5df4a385fd0b58f6b1 is looking harmless from
the first glance, but it has changed things a lot. 'libvirt' uses it to
detect that it should follow new initialization way and this changes
things considerably. With this procedure followed, blockdev_init() is
not called anymore and thus block_acct_setup() helper is not called.

This means in particular that defaults for block accounting statistics
are changed and account_invalid/account_failed are actually initialized
as false instead of true originally.

This commit changes things to match original world. There are the following
constraints:
* new default value in block_acct_init() is set to true
* block_acct_setup() inside blockdev_init() is called before
  blkconf_apply_backend_options()
* thus newly created option in block device properties has precedence if
  specified

Signed-off-by: Denis V. Lunev 
CC: Peter Krempa 
CC: Markus Armbruster 
CC: John Snow 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Vladimir Sementsov-Ogievskiy 
---
 block/accounting.c |  8 +++-
 hw/block/block.c   |  2 +
 include/hw/block/block.h   |  7 +++-
 qapi/misc.json |  9 +
 tests/qemu-iotests/172.out | 76 ++
 5 files changed, 91 insertions(+), 11 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 6b300c5129..2829745377 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -38,6 +38,8 @@ void block_acct_init(BlockAcctStats *stats)
 if (qtest_enabled()) {
 clock_type = QEMU_CLOCK_VIRTUAL;
 }
+stats->account_invalid = true;
+stats->account_failed = true;
 }
 
 static bool bool_from_onoffauto(OnOffAuto val, bool def)
@@ -57,8 +59,10 @@ static bool bool_from_onoffauto(OnOffAuto val, bool def)
 void block_acct_setup(BlockAcctStats *stats, enum OnOffAuto account_invalid,
   enum OnOffAuto account_failed)
 {
-stats->account_invalid = bool_from_onoffauto(account_invalid, true);
-stats->account_failed = bool_from_onoffauto(account_failed, true);
+stats->account_invalid = bool_from_onoffauto(account_invalid,
+ stats->account_invalid);
+stats->account_failed = bool_from_onoffauto(account_failed,
+stats->account_failed);
 }
 
 void block_acct_cleanup(BlockAcctStats *stats)
diff --git a/hw/block/block.c b/hw/block/block.c
index 04279166ee..f9c4fe6767 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -205,6 +205,8 @@ bool blkconf_apply_backend_options(BlockConf *conf, bool 
readonly,
 blk_set_enable_write_cache(blk, wce);
 blk_set_on_error(blk, rerror, werror);
 
+block_acct_setup(blk_get_stats(blk), conf->account_invalid,
+ conf->account_failed);
 return true;
 }
 
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 5902c0440a..15fff66435 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -31,6 +31,7 @@ typedef struct BlockConf {
 uint32_t lcyls, lheads, lsecs;
 OnOffAuto wce;
 bool share_rw;
+OnOffAuto account_invalid, account_failed;
 BlockdevOnError rerror;
 BlockdevOnError werror;
 } BlockConf;
@@ -61,7 +62,11 @@ static inline unsigned int get_physical_block_exp(BlockConf 
*conf)
_conf.discard_granularity, -1),  \
 DEFINE_PROP_ON_OFF_AUTO("write-cache", _state, _conf.wce,   \
 ON_OFF_AUTO_AUTO),  \
-DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false)
+DEFINE_PROP_BOOL("share-rw", _state, _conf.share_rw, false),\
+DEFINE_PROP_ON_OFF_AUTO("account-invalid", _state,  \
+_conf.account_invalid, ON_OFF_AUTO_AUTO),   \
+DEFINE_PROP_ON_OFF_AUTO("account-failed", _state,   \
+_conf.account_failed, ON_OFF_AUTO_AUTO)
 
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)  \
 DEFINE_PROP_DRIVE("drive", _state, _conf.blk),  \
diff --git a/qapi/misc.json b/qapi/misc.json
index 27ef5a2b20..f469cd0ded 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -210,12 +210,6 @@
 #
 # @cpu-index: The CPU to use for commands that require an implicit CPU
 #
-# Features:
-# @savevm-monitor-nodes: If present, HMP command savevm only snapshots
-#monitor-owned nodes if they have no parents.
-#This allows the use of 'savevm' with
-#-blockdev. (since 4.2)
-#
 # Returns: the output of the command as a string
 #
 # Since: 0.14
@@ -243,8 +237,7 @@
 ##
 { 'command': 'human-monitor-command',
   'data': {'command-line': 'str', '*cpu-index': 'int'},
-  'returns': 'str',
-  'features': [ 'savevm-monitor-nodes' ] }
+  'returns': 'str' }
 
 ##
 # @getfd:
diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
index

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 12:20, Denis V. Lunev wrote:

On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.


I think you are right, it would be better to align image size up to sector size.


I would say that we need to align not on sector size but on cluster size.
That would worth additional check.


And not simply align, as data_offset is not necessarily aligned to cluster size.

Finally, what should we check?

I suggest


diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f16..b882ea1200 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -725,6 +725,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
+int64_t file_size;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -735,6 +736,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 return -EINVAL;
 }
 
+file_size = bdrv_getlength(bs->file->bs);

+if (file_size < 0) {
+return file_size;
+}
+
 ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
 if (ret < 0) {
 goto fail;
@@ -798,6 +804,13 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 for (i = 0; i < s->bat_size; i++) {

 int64_t off = bat2sect(s, i);
+if (off >= file_size) {
+error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+   "is larger than file size (%" PRIi64 ")",
+   off, i, file_size);
+ret = -EINVAL;
+goto fail;
+}
 if (off >= s->data_end) {
 s->data_end = off + s->tracks;
 }



- better error message, and we check exactly what's written in the spec 
(docs/interop/parallels.c):


  Cluster offsets specified by BAT entries must meet the following requirements:
  [...]
  - the value must be lower than the desired file size,



--
Best regards,
Vladimir

[PATCH 1/2] block: pass OnOffAuto instead of bool to block_acct_setup()

2022-08-23 Thread Denis V. Lunev

We would have one more place for block_acct_setup() calling, which should
not corrupt original value.

Signed-off-by: Denis V. Lunev 
CC: Peter Krempa 
CC: Markus Armbruster 
CC: John Snow 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Vladimir Sementsov-Ogievskiy 
---
 block/accounting.c | 22 ++
 blockdev.c | 17 ++---
 include/block/accounting.h |  6 +++---
 3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 2030851d79..6b300c5129 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -40,11 +40,25 @@ void block_acct_init(BlockAcctStats *stats)
 }
 }
 
-void block_acct_setup(BlockAcctStats *stats, bool account_invalid,
-  bool account_failed)
+static bool bool_from_onoffauto(OnOffAuto val, bool def)
 {
-stats->account_invalid = account_invalid;
-stats->account_failed = account_failed;
+switch (val) {
+case ON_OFF_AUTO_AUTO:
+return def;
+case ON_OFF_AUTO_ON:
+return true;
+case ON_OFF_AUTO_OFF:
+return false;
+default:
+abort();
+}
+}
+
+void block_acct_setup(BlockAcctStats *stats, enum OnOffAuto account_invalid,
+  enum OnOffAuto account_failed)
+{
+stats->account_invalid = bool_from_onoffauto(account_invalid, true);
+stats->account_failed = bool_from_onoffauto(account_failed, true);
 }
 
 void block_acct_cleanup(BlockAcctStats *stats)
diff --git a/blockdev.c b/blockdev.c
index 9230888e34..392d9476e6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -455,6 +455,17 @@ static void extract_common_blockdev_options(QemuOpts 
*opts, int *bdrv_flags,
 }
 }
 
+static OnOffAuto account_get_opt(QemuOpts *opts, const char *name)
+{
+if (!qemu_opt_find(opts, name)) {
+return ON_OFF_AUTO_AUTO;
+}
+if (qemu_opt_get_bool(opts, name, true)) {
+return ON_OFF_AUTO_ON;
+}
+return ON_OFF_AUTO_OFF;
+}
+
 /* Takes the ownership of bs_opts */
 static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
Error **errp)
@@ -462,7 +473,7 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 const char *buf;
 int bdrv_flags = 0;
 int on_read_error, on_write_error;
-bool account_invalid, account_failed;
+OnOffAuto account_invalid, account_failed;
 bool writethrough, read_only;
 BlockBackend *blk;
 BlockDriverState *bs;
@@ -496,8 +507,8 @@ static BlockBackend *blockdev_init(const char *file, QDict 
*bs_opts,
 /* extract parameters */
 snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
 
-account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true);
-account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true);
+account_invalid = account_get_opt(opts, "stats-account-invalid");
+account_failed = account_get_opt(opts, "stats-account-failed");
 
 writethrough = !qemu_opt_get_bool(opts, BDRV_OPT_CACHE_WB, true);
 
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 878b4c3581..b9caad60d5 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -27,7 +27,7 @@
 
 #include "qemu/timed-average.h"
 #include "qemu/thread.h"
-#include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-common.h"
 
 typedef struct BlockAcctTimedStats BlockAcctTimedStats;
 typedef struct BlockAcctStats BlockAcctStats;
@@ -100,8 +100,8 @@ typedef struct BlockAcctCookie {
 } BlockAcctCookie;
 
 void block_acct_init(BlockAcctStats *stats);
-void block_acct_setup(BlockAcctStats *stats, bool account_invalid,
- bool account_failed);
+void block_acct_setup(BlockAcctStats *stats, enum OnOffAuto account_invalid,
+  enum OnOffAuto account_failed);
 void block_acct_cleanup(BlockAcctStats *stats);
 void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length);
 BlockAcctTimedStats *block_acct_interval_next(BlockAcctStats *stats,
-- 
2.32.0

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Denis V. Lunev


On 23.08.2022 11:47, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 12:34, Denis V. Lunev wrote:

On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the 
error message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?
That is a different check. We need to be sure that file_size is 
authentic,

which worth additional check.

At my opinion, this worth additional patch later on. Let us agree here,
queue and proceed with further pending fixes.

We should use something like the following

 data_start = le32_to_cpu(s->header->data_off);
 if (data_start == 0) {
 data_start = ROUND_UP(bat_entry_off(s->bat_size), 
BDRV_SECTOR_SIZE);

 }
 if ((file_size - data_start) % cluster_size != 0) {
 error();
 }


Do you think that we should error-out in such conditions? It doesn't 
break the spec, is it? Can the last cluster be half allocated?

absolutely no!

We should adjust the spec on this.

Den

Re: [PATCH v5 1/3] Update linux headers to 6.0-rc1

2022-08-23 Thread Chenyi Qiang




On 8/22/2022 11:00 PM, Michal Prívozník wrote:

On 8/17/22 04:08, Chenyi Qiang wrote:

commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868

Signed-off-by: Chenyi Qiang 
---
  include/standard-headers/asm-x86/bootparam.h  |   7 +-
  include/standard-headers/drm/drm_fourcc.h |  73 +++-
  include/standard-headers/linux/ethtool.h  |  29 +--
  include/standard-headers/linux/input.h|  12 +-
  include/standard-headers/linux/pci_regs.h |  30 ++-
  include/standard-headers/linux/vhost_types.h  |  17 +-
  include/standard-headers/linux/virtio_9p.h|   2 +-
  .../standard-headers/linux/virtio_config.h|   7 +-
  include/standard-headers/linux/virtio_ids.h   |  14 +-
  include/standard-headers/linux/virtio_net.h   |  34 +++-
  include/standard-headers/linux/virtio_pci.h   |   2 +
  linux-headers/asm-arm64/kvm.h |  27 +++
  linux-headers/asm-generic/unistd.h|   4 +-
  linux-headers/asm-riscv/kvm.h |  22 +++
  linux-headers/asm-riscv/unistd.h  |   3 +-
  linux-headers/asm-s390/kvm.h  |   1 +
  linux-headers/asm-x86/kvm.h   |  33 ++--
  linux-headers/asm-x86/mman.h  |  14 --
  linux-headers/linux/kvm.h | 172 +-
  linux-headers/linux/userfaultfd.h |  10 +-
  linux-headers/linux/vduse.h   |  47 +
  linux-headers/linux/vfio.h|   4 +-
  linux-headers/linux/vfio_zdev.h   |   7 +
  linux-headers/linux/vhost.h   |  35 +++-
  24 files changed, 523 insertions(+), 83 deletions(-)





diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index bf6e96011d..46de10a809 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -198,13 +198,13 @@ struct kvm_msrs {
__u32 nmsrs; /* number of msrs in entries */
__u32 pad;
  
-	struct kvm_msr_entry entries[0];

+   struct kvm_msr_entry entries[];
  };
  


I don't think it's this simple. I think this needs to go hand in hand with 
kvm_arch_get_supported_msr_feature().

Also, this breaks clang build:

clang -m64 -mcx16 -Ilibqemu-x86_64-softmmu.fa.p -I. -I.. -Itarget/i386 -I../target/i386 -Iqapi 
-Itrace -Iui -Iui/shader -I/usr/include/pixman-1 -I/usr/include/spice-server -I/usr/include/spice-1 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fcolor-diagnostics -Wall -Winvalid-pch 
-Werror -std=gnu11 -O0 -g -isystem /home/zippy/work/qemu/qemu.git/linux-headers -isystem 
linux-headers -iquote . -iquote /home/zippy/work/qemu/qemu.git -iquote 
/home/zippy/work/qemu/qemu.git/include -iquote /home/zippy/work/qemu/qemu.git/tcg/i386 -pthread 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls 
-Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined 
-Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int 
-Wno-typedef-redefinition -Wno-tautological-type-limit-compare -Wno-psabi -fstack-protector-strong 
-O0 -ggdb -fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H 
'-DCONFIG_TARGET="x86_64-softmmu-config-target.h"' 
'-DCONFIG_DEVICES="x86_64-softmmu-config-devices.h"' -MD -MQ 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -MF 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o.d -o 
libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -c ../target/i386/kvm/kvm.c
../target/i386/kvm/kvm.c:470:25: error: field 'info' with variable sized type 
'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
 struct kvm_msrs info;
 ^
../target/i386/kvm/kvm.c:1701:27: error: field 'cpuid' with variable sized type 
'struct kvm_cpuid2' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
 struct kvm_cpuid2 cpuid;
   ^
../target/i386/kvm/kvm.c:2868:25: error: field 'info' with variable sized type 
'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
[-Werror,-Wgnu-variable-sized-type-not-at-end]
 struct kvm_msrs info;
 ^
3 errors generated.



OK, I only generated this patch with update-linux-headers.sh and built 
it with gcc. It seems clang requires some other adjustment. I'll have a 
check. Thanks for pointing it out.




Michal

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Denis V. Lunev


On 23.08.2022 11:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 12:20, Denis V. Lunev wrote:

On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of 
image");

+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the 
error message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?


It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another 
software or by some failure path.


I think you are right, it would be better to align image size up to 
sector size.


I would say that we need to align not on sector size but on cluster 
size.

That would worth additional check.


And not simply align, as data_offset is not necessarily aligned to 
cluster size.


Finally, what should we check?

I suggest


diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f16..b882ea1200 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -725,6 +725,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
+    int64_t file_size;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -735,6 +736,11 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

 return -EINVAL;
 }

+    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    return file_size;
+    }
+
 ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
 if (ret < 0) {
 goto fail;
@@ -798,6 +804,13 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,


 for (i = 0; i < s->bat_size; i++) {
 int64_t off = bat2sect(s, i);
+    if (off >= file_size) {

Like this, especially >= check which we have had missed.
Though this would break the repair. We need additional

if (flags & BDRV_O_CHECK) {
    continue;
}

No incorrect data_end assignment, which would be
very welcome.

Den


+ error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+   "is larger than file size (%" PRIi64 ")",
+   off, i, file_size);
+    ret = -EINVAL;
+    goto fail;
+    }
 if (off >= s->data_end) {
 s->data_end = off + s->tracks;
 }



- better error message, and we check exactly what's written in the 
spec (docs/interop/parallels.c):



  Cluster offsets specified by BAT entries must meet the following 
requirements:

  [...]
  - the value must be lower than the desired file size,

Re: QEMU Developer Survey for KVM Forum 2022

2022-08-23 Thread Alex Bennée



Alex Bennée  writes:

> Hi,
>
> For the QEMU keynote at KVM Forum I'd like to repeat the developer
> survey this year. It's only short (7 questions) and reprises some of the
> questions from last year as well as a couple of new ones. It should only
> take a few minutes to complete.
>
> You can find the survey at:
>
>   https://forms.gle/Y1niFJLbBHmA5Pgk9
>
> I'll be presenting the data along with all the other interesting stats
> in our QEMU keynote session at this years KVM Forum (Sep 12th-14th).

Bumping for anyone who missed it (currently we have about half the
response rate of last year).

-- 
Alex Bennée

Re: [PATCH for-7.2 v2 1/2] ppc/pnv: consolidate pnv_parent_*_fixup() helpers

2022-08-23 Thread Frederic Barrat





On 19/08/2022 11:47, Daniel Henrique Barboza wrote:

We have 2 helpers that amends the QOM and parent bus of a given object,
repectively. These 2 helpers are called together, and not by accident.
Due to QOM internals, doing an object_unparent() will result in the
device being removed from its parent bus. This means that changing the
QOM parent requires reassigning the parent bus again.

Create a single helper called pnv_parent_fixup(), documenting some of
the QOM specifics that we're dealing with the unparenting/parenting
mechanics, and handle both the QOM and the parent bus assignment.

Next patch will make use of this function to handle a case where we need
to change the QOM parent while keeping the same parent bus assigned
beforehand.

Signed-off-by: Daniel Henrique Barboza 
---


Thanks for the explanation.

Reviewed-by: Frederic Barrat 




  hw/pci-host/pnv_phb.c | 43 ---
  1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 17d9960aa1..4ea33fb6ba 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -21,34 +21,45 @@
  
  
  /*

- * Set the QOM parent of an object child. If the device state
- * associated with the child has an id, use it as QOM id. Otherwise
- * use object_typename[index] as QOM id.
+ * Set the QOM parent and parent bus of an object child. If the device
+ * state associated with the child has an id, use it as QOM id.
+ * Otherwise use object_typename[index] as QOM id.
+ *
+ * This helper does both operations at the same time because seting
+ * a new QOM child will erase the bus parent of the device. This happens
+ * because object_unparent() will call object_property_del_child(),
+ * which in turn calls the property release callback prop->release if
+ * it's defined. In our case this callback is set to
+ * object_finalize_child_property(), which was assigned during the
+ * first object_property_add_child() call. This callback will end up
+ * calling device_unparent(), and this function removes the device
+ * from its parent bus.
+ *
+ * The QOM and parent bus to be set aren´t necessarily related, so
+ * let's receive both as arguments.
   */
-static void pnv_parent_qom_fixup(Object *parent, Object *child, int index)
+static bool pnv_parent_fixup(Object *parent, BusState *parent_bus,
+ Object *child, int index,
+ Error **errp)
  {
  g_autofree char *default_id =
  g_strdup_printf("%s[%d]", object_get_typename(child), index);
  const char *dev_id = DEVICE(child)->id;
  
  if (child->parent == parent) {

-return;
+return true;
  }
  
  object_ref(child);

  object_unparent(child);
  object_property_add_child(parent, dev_id ? dev_id : default_id, child);
  object_unref(child);
-}
-
-static void pnv_parent_bus_fixup(DeviceState *parent, DeviceState *child,
- Error **errp)
-{
-BusState *parent_bus = qdev_get_parent_bus(parent);
  
-if (!qdev_set_parent_bus(child, parent_bus, errp)) {

-return;
+if (!qdev_set_parent_bus(DEVICE(child), parent_bus, errp)) {
+return false;
  }
+
+return true;
  }
  
  /*

@@ -101,8 +112,10 @@ static bool pnv_phb_user_device_init(PnvPHB *phb, Error 
**errp)
   * correctly the device tree. pnv_xscom_dt() needs every
   * PHB to be a child of the chip to build the DT correctly.
   */
-pnv_parent_qom_fixup(parent, OBJECT(phb), phb->phb_id);
-pnv_parent_bus_fixup(DEVICE(chip), DEVICE(phb), errp);
+if (!pnv_parent_fixup(parent, qdev_get_parent_bus(DEVICE(chip)),
+  OBJECT(phb), phb->phb_id, errp)) {
+return false;
+}
  
  return true;

  }

Re: [PATCH for-7.2 v2 2/2] ppc/pnv: fix QOM parenting of user creatable root ports

2022-08-23 Thread Frederic Barrat





On 19/08/2022 11:47, Daniel Henrique Barboza wrote:

User creatable root ports are being parented by the 'peripheral' or the
'peripheral-anon' container. This happens because this is the regular
QOM schema for sysbus devices that are added via the command line.

Let's make this QOM hierarchy similar to what we have with default root
ports, i.e. the root port must be parented by the pnv-root-bus. To do
that we change the qom and bus parent of the root port during
root_port_realize(). The realize() is shared by the default root port
code path, so we can remove the code inside pnv_phb_attach_root_port()
that was adding the root port as a child of the bus as well.

After all that, remove pnv_phb_attach_root_port() and create the root
port explictly in the 'default_enabled()' case of pnv_phb_realize().

Signed-off-by: Daniel Henrique Barboza 
---



Reviewed-by: Frederic Barrat 



  hw/pci-host/pnv_phb.c | 47 ++-
  1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index 4ea33fb6ba..7b11f1e8dd 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -62,29 +62,6 @@ static bool pnv_parent_fixup(Object *parent, BusState 
*parent_bus,
  return true;
  }
  
-/*

- * Attach a root port device.
- *
- * 'index' will be used both as a PCIE slot value and to calculate
- * QOM id. 'chip_id' is going to be used as PCIE chassis for the
- * root port.
- */
-static void pnv_phb_attach_root_port(PCIHostState *pci)
-{
-PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
-const char *dev_id = DEVICE(root)->id;
-g_autofree char *default_id = NULL;
-int index;
-
-index = object_property_get_int(OBJECT(pci->bus), "phb-id", &error_fatal);
-default_id = g_strdup_printf("%s[%d]", TYPE_PNV_PHB_ROOT_PORT, index);
-
-object_property_add_child(OBJECT(pci->bus), dev_id ? dev_id : default_id,
-  OBJECT(root));
-
-pci_realize_and_unref(root, pci->bus, &error_fatal);
-}
-
  /*
   * User created devices won't have the initial setup that default
   * devices have. This setup consists of assigning a parent device
@@ -180,11 +157,11 @@ static void pnv_phb_realize(DeviceState *dev, Error 
**errp)
  pnv_phb4_bus_init(dev, PNV_PHB4(phb->backend));
  }
  
-if (!defaults_enabled()) {

-return;
-}
+if (defaults_enabled()) {
+PCIDevice *root = pci_new(PCI_DEVFN(0, 0), TYPE_PNV_PHB_ROOT_PORT);
  
-pnv_phb_attach_root_port(pci);

+pci_realize_and_unref(root, pci->bus, errp);
+}
  }
  
  static const char *pnv_phb_root_bus_path(PCIHostState *host_bridge,

@@ -259,6 +236,11 @@ static void pnv_phb_root_port_realize(DeviceState *dev, 
Error **errp)
  Error *local_err = NULL;
  int chip_id, index;
  
+/*

+ * 'index' will be used both as a PCIE slot value and to calculate
+ * QOM id. 'chip_id' is going to be used as PCIE chassis for the
+ * root port.
+ */
  chip_id = object_property_get_int(OBJECT(bus), "chip-id", &error_fatal);
  index = object_property_get_int(OBJECT(bus), "phb-id", &error_fatal);
  
@@ -266,6 +248,17 @@ static void pnv_phb_root_port_realize(DeviceState *dev, Error **errp)

  qdev_prop_set_uint8(dev, "chassis", chip_id);
  qdev_prop_set_uint16(dev, "slot", index);
  
+/*

+ * User created root ports are QOM parented to one of
+ * the peripheral containers but it's already at the right
+ * parent bus. Change the QOM parent to be the same as the
+ * parent bus it's already assigned to.
+ */
+if (!pnv_parent_fixup(OBJECT(bus), BUS(bus), OBJECT(dev),
+  index, errp)) {
+return;
+}
+
  rpc->parent_realize(dev, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);

Re: [PATCH 2/4] hw/nvme: add option to (de)assert irq with eventfd

2022-08-23 Thread Klaus Jensen

On Aug 11 23:37, Jinhao Fan wrote:
> When the new option 'irq-eventfd' is turned on, the IO emulation code
> signals an eventfd when it want to (de)assert an irq. The main loop
> eventfd handler does the actual irq (de)assertion.  This paves the way
> for iothread support since QEMU's interrupt emulation is not thread
> safe.
> 
> Asserting and deasseting irq with eventfd has some performance
> implications. For small queue depth it increases request latency but
> for large queue depth it effectively coalesces irqs.
> 
> Comparision (KIOPS):
> 
> QD1   4  16  64
> QEMU 38 123 210 329
> irq-eventfd  32 106 240 364
> 
> Signed-off-by: Jinhao Fan 
> ---
>  hw/nvme/ctrl.c | 89 --
>  hw/nvme/nvme.h |  4 +++
>  2 files changed, 90 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index bd3350d7e0..8a1c5ce3e1 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -1338,6 +1338,54 @@ static void nvme_update_cq_head(NvmeCQueue *cq)
>  trace_pci_nvme_shadow_doorbell_cq(cq->cqid, cq->head);
>  }
>  
> +static void nvme_assert_notifier_read(EventNotifier *e)
> +{
> +NvmeCQueue *cq = container_of(e, NvmeCQueue, assert_notifier);
> +if (event_notifier_test_and_clear(e)) {
> +nvme_irq_assert(cq->ctrl, cq);
> +}
> +}
> +
> +static void nvme_deassert_notifier_read(EventNotifier *e)
> +{
> +NvmeCQueue *cq = container_of(e, NvmeCQueue, deassert_notifier);
> +if (event_notifier_test_and_clear(e)) {
> +nvme_irq_deassert(cq->ctrl, cq);
> +}
> +}
> +
> +static void nvme_init_irq_notifier(NvmeCtrl *n, NvmeCQueue *cq)
> +{
> +int ret;
> +
> +ret = event_notifier_init(&cq->assert_notifier, 0);
> +if (ret < 0) {
> +goto fail_assert_handler;
> +}
> +
> +event_notifier_set_handler(&cq->assert_notifier,
> +nvme_assert_notifier_read);
> +
> +if (!msix_enabled(&n->parent_obj)) {
> +ret = event_notifier_init(&cq->deassert_notifier, 0);
> +if (ret < 0) {
> +goto fail_deassert_handler;
> +}
> +
> +event_notifier_set_handler(&cq->deassert_notifier,
> +   nvme_deassert_notifier_read);
> +}
> +
> +return;
> +
> +fail_deassert_handler:
> +event_notifier_set_handler(&cq->deassert_notifier, NULL);
> +event_notifier_cleanup(&cq->deassert_notifier);
> +fail_assert_handler:
> +event_notifier_set_handler(&cq->assert_notifier, NULL);
> +event_notifier_cleanup(&cq->assert_notifier);
> +}
> +
>  static void nvme_post_cqes(void *opaque)
>  {
>  NvmeCQueue *cq = opaque;
> @@ -1382,7 +1430,23 @@ static void nvme_post_cqes(void *opaque)
>  n->cq_pending++;
>  }
>  
> -nvme_irq_assert(n, cq);
> +if (unlikely(cq->first_io_cqe)) {
> +/*
> + * Initilize event notifier when first cqe is posted. For 
> irqfd 
> + * support we need to register the MSI message in KVM. We
> + * can not do this registration at CQ creation time because
> + * Linux's NVMe driver changes the MSI message after CQ 
> creation.
> + */
> +cq->first_io_cqe = false;
> +
> +nvme_init_irq_notifier(n, cq);
> +}

It is really unfortunate that we have to do this. From what I can tell
in the kernel driver, even if it were to change to set up the irq prior
to creating the completion queue, we'd still have issues making this
work on earlier versions and there is no way to quirk our way out of
this.

We can't even move this upon creation of the submission queue since the
kernel also creates *that* prior to allocating the interrupt line.

In conclusion I don't see any way around this other than asking the NVMe
TWG to add some kind of bit indicating that the host sets up the
interrupt line prior to creating the cq.

Meh.

> +
> +if (cq->assert_notifier.initialized) {
> +event_notifier_set(&cq->assert_notifier);
> +} else {
> +nvme_irq_assert(n, cq);
> +}

There is a lot of duplication below, checking if the notifier is
initialized and then choosing what to do. Can we move this to
nvme_irq_assert/deassert()?

>  }
>  }
>  }
> @@ -4249,7 +4313,11 @@ static void nvme_cq_notifier(EventNotifier *e)
>  if (cq->irq_enabled && cq->tail == cq->head) {
>  n->cq_pending--;
>  if (!msix_enabled(&n->parent_obj)) {
> -nvme_irq_deassert(n, cq);
> +if (cq->deassert_notifier.initialized) {
> +event_notifier_set(&cq->deassert_notifier);
> +} else {
> +nvme_irq_deassert(n, cq);
> +}
>  }
>  }
>  
> @@ -4706,6 +4774,14 @@ static void nvme_free_cq(NvmeCQueue *cq, NvmeCtrl *n)
>  event_notifier_set_handler(&cq->notifier, NULL);
>

Re: [PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd

2022-08-23 Thread Klaus Jensen

On Aug 11 23:37, Jinhao Fan wrote:
> When irqfd is enabled, we bypass QEMU's irq emulation and let KVM to
> directly assert the irq. However, KVM is not aware of the device's MSI-x
> masking status. Add MSI-x mask bookkeeping in NVMe emulation and
> detach the corresponding irqfd when the certain vector is masked.
> 
> Signed-off-by: Jinhao Fan 
> ---
>  hw/nvme/ctrl.c   | 82 
>  hw/nvme/nvme.h   |  2 ++
>  hw/nvme/trace-events |  3 ++
>  3 files changed, 87 insertions(+)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 63f988f2f9..ac5460c7c8 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -7478,10 +7478,84 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
> uint8_t offset)
>  
>  return 0;
>  }
> +static int nvme_vector_unmask(PCIDevice *pci_dev, unsigned vector,
> +   MSIMessage msg)
> +{
> +NvmeCtrl *n = NVME(pci_dev);
> +int ret;
> +
> +trace_pci_nvme_irq_unmask(vector, msg.address, msg.data);
> +
> +for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {

This loop is OK for now, but could be done better with a reverse
mapping.

This is just an observation. Don't spend time on doing it since we are
not aware of any host actually doing masking/unmasking.

> +NvmeCQueue *cq = n->cq[i];
> +/* 
> + * If this function is called, then irqfd must be available. 
> Therefore,
> + * irqfd must be in use if cq->assert_notifier.initialized is true.
> + */

The wording indicates that you want to assert() that assumption instead.

> +if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
> +if (cq->msg.data != msg.data || cq->msg.address != msg.address) {
> +ret = kvm_irqchip_update_msi_route(kvm_state, cq->virq, msg,
> +   pci_dev);
> +if (ret < 0) {
> +return ret;
> +}
> +kvm_irqchip_commit_routes(kvm_state);
> +cq->msg = msg;
> +}
> +
> +ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
> + &cq->assert_notifier,
> + NULL, cq->virq);
> +if (ret < 0) {
> +return ret;
> +}
> +}
> +}
> +
> +return 0;
> +}
> +
> +static void nvme_vector_mask(PCIDevice *pci_dev, unsigned vector)
> +{
> +NvmeCtrl *n = NVME(pci_dev);
> +
> +trace_pci_nvme_irq_mask(vector);
> +
> +for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
> +NvmeCQueue *cq = n->cq[i];
> +if (cq && cq->vector == vector && cq->assert_notifier.initialized) {
> +kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
> +  &cq->assert_notifier,
> +  cq->virq);
> +}
> +}
> +}
> +
> +static void nvme_vector_poll(PCIDevice *pci_dev,
> + unsigned int vector_start,
> + unsigned int vector_end)
> +{
> +NvmeCtrl *n = NVME(pci_dev);
> +
> +trace_pci_nvme_irq_poll(vector_start, vector_end);
> +
> +for (uint32_t i = 0; i < n->params.max_ioqpairs + 1; i++) {
> +NvmeCQueue *cq = n->cq[i];
> +if (cq && cq->vector >= vector_start && cq->vector <= vector_end 
> +&& msix_is_masked(pci_dev, cq->vector) 
> +&& cq->assert_notifier.initialized) {
> +if (event_notifier_test_and_clear(&cq->assert_notifier)) {
> +msix_set_pending(pci_dev, i);
> +}
> +}
> +}
> +}

I'm a little fuzzy on this - what causes this function to be invoked?


signature.asc
Description: PGP signature

Re: [PATCH v2] ci: Upgrade msys2 release to 20220603

2022-08-23 Thread Thomas Huth


On 22/08/2022 16.39, 罗勇刚(Yonggang Luo) wrote:
After digging, it seems to be a memory issue, cirrus also uses 8gb, that's 
rather weird...


If it's a memory issue, it might help to remove the "-j2" from the 
invocation of "make", so that only one file is compiled at once...?


 Thomas

Re: [PATCH 4/7] vdpa: Remove SVQ vring from iova_tree at shutdown

2022-08-23 Thread Eugenio Perez Martin

On Tue, Aug 23, 2022 at 8:25 AM Jason Wang  wrote:
>
>
> 在 2022/8/20 00:53, Eugenio Pérez 写道:
> > Although the device will be reset before usage, the right thing to do is
> > to clean it.
> >
> > Reported-by: Lei Yang 
> > Fixes: 34e3c94eda ("vdpa: Add custom IOTLB translations to SVQ")
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   hw/virtio/vhost-vdpa.c | 7 ++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 7e28d2f674..943799c17c 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -898,7 +898,12 @@ static bool vhost_vdpa_svq_unmap_ring(struct 
> > vhost_vdpa *v,
> >
> >   size = ROUND_UP(result->size, qemu_real_host_page_size());
> >   r = vhost_vdpa_dma_unmap(v, result->iova, size);
> > -return r == 0;
> > +if (unlikely(r < 0)) {
> > +return false;
>
>
> vhost-vdpa_svq_map_ring() will call error_report() here, should we do
> the same?
>

We can use it unconditionally on error paths if we don't report
errors. I can try to check if we use that way at this moment.

>  if (unlikely(r != 0)) {
>  error_setg_errno(errp, -r, "Cannot map region to device");
>
> (Btw the error is not very informative, we should dump the map it self
> at least?)
>
> Thanks
>
>
> > +}
> > +
> > +vhost_iova_tree_remove(v->iova_tree, result);
> > +return 0;
> >   }
> >
> >   static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>

Re: [PATCH 7/7] vdpa: Use ring hwaddr at vhost_vdpa_svq_unmap_ring

2022-08-23 Thread Eugenio Perez Martin

On Tue, Aug 23, 2022 at 8:40 AM Jason Wang  wrote:
>
>
> 在 2022/8/20 00:53, Eugenio Pérez 写道:
> > Reduce code duplication.
> >
> > Signed-off-by: Eugenio Pérez 
>
>
> Acked-by: Jason Wang 
>
> (In the future, we need to look for other cases where a function may use
> only a partial of DMAMap.)
>

Do you have a use case in mind? The IOVA tree also trusts the memory
listener will unmap and map chunks symmetrically.

Thanks!

> Thanks
>
>
> > ---
> >   hw/virtio/vhost-vdpa.c | 17 -
> >   1 file changed, 8 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 07d00f5284..45d6e86b45 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -884,10 +884,12 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev 
> > *dev,
> >   /**
> >* Unmap a SVQ area in the device
> >*/
> > -static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> > -  const DMAMap *needle)
> > +static void vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr addr)
> >   {
> > -const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> > +const DMAMap needle = {
> > +.translated_addr = addr,
> > +};
> > +const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, 
> > &needle);
> >   hwaddr size;
> >   int r;
> >
> > @@ -908,17 +910,14 @@ static void vhost_vdpa_svq_unmap_ring(struct 
> > vhost_vdpa *v,
> >   static void vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> >  const VhostShadowVirtqueue *svq)
> >   {
> > -DMAMap needle = {};
> >   struct vhost_vdpa *v = dev->opaque;
> >   struct vhost_vring_addr svq_addr;
> >
> >   vhost_svq_get_vring_addr(svq, &svq_addr);
> >
> > -needle.translated_addr = svq_addr.desc_user_addr;
> > -vhost_vdpa_svq_unmap_ring(v, &needle);
> > +vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr);
> >
> > -needle.translated_addr = svq_addr.used_user_addr;
> > -vhost_vdpa_svq_unmap_ring(v, &needle);
> > +vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr);
> >   }
> >
> >   /**
> > @@ -996,7 +995,7 @@ static bool vhost_vdpa_svq_map_rings(struct vhost_dev 
> > *dev,
> >   ok = vhost_vdpa_svq_map_ring(v, &device_region, errp);
> >   if (unlikely(!ok)) {
> >   error_prepend(errp, "Cannot create vq device region: ");
> > -vhost_vdpa_svq_unmap_ring(v, &driver_region);
> > +vhost_vdpa_svq_unmap_ring(v, driver_region.translated_addr);
> >   }
> >   addr->used_user_addr = device_region.iova;
> >
>

Re: [PATCH v2] s390x/cpumodel: add stfl197 processor-activity-instrumentation extension 1

2022-08-23 Thread Thomas Huth


On 27/07/2022 15.51, Christian Borntraeger wrote:

Add stfle 197 (processor-activity-instrumentation extension 1) to the
gen16 default model and fence it off for 7.1 and older.

Signed-off-by: Christian Borntraeger 
Reviewed-by: David Hildenbrand 
---
v1->v2:
- this is on top of "hw: Add compat machines for 7.2" from Cornelia Huck
(please queue afterwards)
- move fencing to 7.1

  hw/s390x/s390-virtio-ccw.c  | 1 +
  target/s390x/cpu_features_def.h.inc | 1 +
  target/s390x/gen-features.c | 2 ++
  3 files changed, 4 insertions(+)


Thanks, queue (together with Cornelia's patch) to my s390x-next branch:

 https://gitlab.com/thuth/qemu/-/commits/s390x-next/

 Thomas

Re: [PATCH v7 6/8] config: add check to block layer

2022-08-23 Thread Stefan Hajnoczi

On Tue, Aug 23, 2022 at 12:25:23PM +0800, Sam Li wrote:
> Stefan Hajnoczi  于2022年8月23日周二 08:54写道：
> >
> > On Tue, Aug 16, 2022 at 02:25:20PM +0800, Sam Li wrote:
> > > Putting zoned/non-zoned BlockDrivers on top of each other is not
> > > allowed.
> > >
> > > Signed-off-by: Sam Li 
> > > Reviewed-by: Stefan Hajnoczi 
> > > ---
> > >  block.c  | 14 ++
> > >  block/raw-format.c   |  1 +
> > >  include/block/block_int-common.h |  5 +
> > >  3 files changed, 20 insertions(+)
> > >
> > > diff --git a/block.c b/block.c
> > > index bc85f46eed..affe6c597e 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -7947,6 +7947,20 @@ void bdrv_add_child(BlockDriverState *parent_bs, 
> > > BlockDriverState *child_bs,
> > >  return;
> > >  }
> > >
> > > +/*
> > > + * Non-zoned block drivers do not follow zoned storage constraints
> > > + * (i.e. sequential writes to zones). Refuse mixing zoned and 
> > > non-zoned
> > > + * drivers in a graph.
> > > + */
> > > +if (!parent_bs->drv->supports_zoned_children &&
> > > +child_bs->bl.zoned != BLK_Z_HM) {
> >
> 
> Should be:
> +if (!parent_bs->drv->supports_zoned_children &&
> +child_bs->bl.zoned == BLK_Z_HM)
> 
> > Is this logical expression correct:
> >
> >   If the parent does not support zoned children and the child is not
> >   zoned, fail with an error.
> >
> > ?
> 
> No. It should be:
> 
> If the parent does not support zoned children and the child is zoned,
> fail with an error.  It should handle the case where a filter node is
> inserted above a raw block driver with a zoned_host_device child.
> 
> There are some QEMU command-line constraints for the zoned devices. I
> was wondering where to add such support so that it can print an error
> message for users:
> 1. cache.direct= setting

The O_DIRECT requirement is specific to file-posix and Linux's zoned
block device implementation, so it belongs in file-posix.c's
zoned_host_device .bdrv_file_open() function.

> 2. mix zoned/non-zoned drivers

This is generic and I think bdrv_add_child() is the right place for
parent-child compatibility checks.


signature.asc
Description: PGP signature

Re: [PATCH v7 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-23 Thread Stefan Hajnoczi

On Tue, Aug 23, 2022 at 12:12:44PM +0800, Sam Li wrote:
> Stefan Hajnoczi  于2022年8月23日周二 08:49写道：
> > On Tue, Aug 16, 2022 at 02:25:18PM +0800, Sam Li wrote:
> > > +static int handle_aiocb_zone_report(void *opaque) {
> > > +#if defined(CONFIG_BLKZONED)
> > > +RawPosixAIOData *aiocb = opaque;
> > > +int fd = aiocb->aio_fildes;
> > > +unsigned int *nr_zones = aiocb->zone_report.nr_zones;
> > > +BlockZoneDescriptor *zones = aiocb->zone_report.zones;
> > > +int64_t sector = aiocb->aio_offset;
> > > +
> > > +struct blk_zone *blkz;
> > > +int64_t rep_size;
> > > +unsigned int nrz;
> > > +int ret, n = 0, i = 0;
> > > +
> > > +nrz = *nr_zones;
> > > +rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> > > blk_zone);
> > > +g_autofree struct blk_zone_report *rep = NULL;
> > > +rep = g_malloc(rep_size);
> > > +
> > > +blkz = (struct blk_zone *)(rep + 1);
> > > +while (n < nrz) {
> > > +memset(rep, 0, rep_size);
> > > +rep->sector = sector;
> > > +rep->nr_zones = nrz - n;
> > > +
> > > +ret = ioctl(fd, BLKREPORTZONE, rep);
> >
> > Does this ioctl() need "do { ... } while (ret == -1 && errno == EINTR)"?
> 
> No? We discussed this before. I guess even EINTR should be propagated
> back to the guest. Maybe Damien can talk more about why.

No, EINTR is an internal error that must be handled by QEMU. It means
the QEMU process' syscall was interrupted by a signal and the syscall
must be retried. The guest shouldn't see EINTR (and there is no
virtio-blk error code defined for it).


signature.asc
Description: PGP signature

Re: [PATCH] target/s390x: Fix CLFIT and CLGIT immediate size

2022-08-23 Thread David Hildenbrand

On 17.08.22 18:15, Ilya Leoshkevich wrote:
> I2 is 16 bits, not 32.
> 
> Found by running valgrind's none/tests/s390x/traps.
> 
> Fixes: 1c2687518235 ("target-s390: Implement COMPARE AND TRAP")
> Signed-off-by: Ilya Leoshkevich 
> ---
>  target/s390x/tcg/insn-data.def | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
> index 5e448bb2c4..6d2cfe5fa2 100644
> --- a/target/s390x/tcg/insn-data.def
> +++ b/target/s390x/tcg/insn-data.def
> @@ -290,8 +290,8 @@
>  D(0xb961, CLGRT,   RRF_c, GIE, r1_o, r2_o, 0, 0, ct, 0, 1)
>  D(0xeb23, CLT, RSY_b, MIE, r1_32u, m2_32u, 0, 0, ct, 0, 1)
>  D(0xeb2b, CLGT,RSY_b, MIE, r1_o, m2_64, 0, 0, ct, 0, 1)
> -D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_32u, 0, 0, ct, 0, 1)
> -D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_32u, 0, 0, ct, 0, 1)
> +D(0xec73, CLFIT,   RIE_a, GIE, r1_32u, i2_16u, 0, 0, ct, 0, 1)
> +D(0xec71, CLGIT,   RIE_a, GIE, r1_o, i2_16u, 0, 0, ct, 0, 1)
>  
>  /* CONVERT TO DECIMAL */
>  C(0x4e00, CVD, RX_a,  Z,   r1_o, a2, 0, 0, cvd, 0)

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb

Re: [PATCH v7 06/12] multifd: Make flags field thread local

2022-08-23 Thread Juan Quintela

Leonardo Bras Soares Passos  wrote:
> On Fri, Aug 19, 2022 at 7:03 AM Juan Quintela  wrote:
>>
>> Leonardo Brás  wrote:
>> > On Tue, 2022-08-02 at 08:39 +0200, Juan Quintela wrote:
>> >> Use of flags with respect to locking was incensistant.  For the
>> >> sending side:
>> >> - it was set to 0 with mutex held on the multifd channel.
>> >> - MULTIFD_FLAG_SYNC was set with mutex held on the migration thread.
>> >> - Everything else was done without the mutex held on the multifd channel.
>> >>
>> >> On the reception side, it is not used on the migration thread, only on
>> >> the multifd channels threads.
>> >>
>> >> So we move it to the multifd channels thread only variables, and we
>> >> introduce a new bool sync_needed on the send side to pass that 
>> >> information.
>> >>
>> >> Signed-off-by: Juan Quintela 
>> >> ---
>> >>  migration/multifd.h | 10 ++
>> >>  migration/multifd.c | 23 +--
>> >>  2 files changed, 19 insertions(+), 14 deletions(-)
>> >>
>> >> diff --git a/migration/multifd.h b/migration/multifd.h
>> >> index 36f899c56f..a67cefc0a2 100644
>> >> --- a/migration/multifd.h
>> >> +++ b/migration/multifd.h
>> >> @@ -98,12 +98,12 @@ typedef struct {
>> >
>> > Just noticed having no name in 'typedef struct' line makes it harder to
>> > understand what is going on.
>>
>> It is common idiom in QEMU.  The principal reason is that if you don't
>> want anyone to use "struct MultiFDSendParams" but MultiFDSendParams, the
>> best way to achieve that is to do it this way.
>
> I agree, but a comment after the typedef could help reviewing. Something like
>
> typedef struct { /* MultiFDSendParams */
> ...
> } MultiFDSendParams

You have a point here.  Not putting a comment, putting the real thing.

Thanks, Juan.

Re: Teensy 4.1 Implementation

2022-08-23 Thread Peter Maydell

On Sun, 21 Aug 2022 at 01:05, Shiny Saana  wrote:
> I've been able to write an initial Teensy 4.1 machine, for now with
> only the few important memory regions initialized, and successfully
> ran some hand-written ARM code on it.

Great, that's good progress!

> The documentation ( https://www.pjrc.com/teensy/IMXRT1060RM_rev3.pdf ),
> in section 9.7.1, gives some informations on how, in the actual
> Teensy board, the ROM, executed at boot, initialize the board
> peripherals, and also reads from a data structure included in the
> Flash memory (the user-provided program) where the CPU should jump
> to after the ROM has done its work (somewhere in that same Flash memory,
> usually).
>
> I was able to successfully dump the ROM of the real board and
> confirm this behavior. Given that the current plan is not to
> emulate every peripherals, I am of the opinion that writing a very
> simple ROM that merely reads this Flash provided data structure and
> jumps to the provided address sounds like a good starting point, so
> that I can keep iterating on writing more and more complex code
> through the provided Teensy toolchain, and implementing needed
> peripherals.
>
> As such, I have several questions:
>
> 1/ To replicate this behaviour, is this considered the correct
> approach by the current QEMU maintainers?

Yes, I think this is probably a reasonable way to go.

> 2/ If so, I have not been able to find any function that would be
> able to load data into a memory region "statically". Does one
> exist? Is there an alternative to this process?

Depends exactly what you want to do. If you want "let the user
load data to an arbitrary address", then the "generic loader"
is usually helpful:
https://www.qemu.org/docs/master/system/generic-loader.html

If you mean "C code within QEMU loads data to a specific place",
rom_add_blob_fixed_as() is probably what you want. This is how
hw/arm/boot.c loads both user-provided data files and the
hand-coded mini-bootloader into the guest.

> 3/ Regarding loading the "kernel" of the board, as part of the
> init process, I am calling the usual "armv7m_load_kernel" function
> with its usual parameters. However, it seems to load it as the
> very start of the address space, which is not where the flash
> memory is, and so is not where the kernel should be loaded. I
> wasn't able to find a workaround. Is there something I'm missing?

The behaviour of armv7m_load_kernel() depends on what kind
of file you pass to -kernel. If you pass -kernel an ELF file,
then it will get loaded to the addresses as specified by
the ELF file, so you can use that to put data anywhere you like.
If you pass it a raw binary file then, yeah, at the moment
that gets loaded to address 0. There's no real reason for this
limitation -- it's mainly because when that code was written
we supported very few M-profile boards, and all of them booted
from address 0. (That is, the board doesn't set either the
init-svtor or init-nsvtor properties on the armv7m object to
anything other than 0.) We could change how this works, but
the difficulty is that the desire for "Just Do What I Mean"
behaviour for a specific board tends to conflict with the
desire for all boards to behave in a consistent manner.
In particular, at the moment passing a binary file to -kernel
means "I want this to be loaded so that it has the
CPU vector table in it and execution starts from reset as
the architecture says it should"; it can't both mean that
consistently across M-profile boards and also mean "on the
teeny board, be an image file intended to boot via the ROM
loader".

Loading the teeny images via the generic loader rather than
via -kernel would be one way to sidestep this issue...

Do you know what the hardware sets the initial vector
base address to? (that is, where the ROM itself puts its
reset/interrupt vector table). I couldn't find anything in
the datasheet that said that.

(By the way, calling armv7m_load_kernel() is mandatory even if
you don't care about loading an image, because it's the function
that arranges that the CPU gets reset correctly.)

thanks
-- PMM

Re: [PATCH v8 02/12] s390x/cpu_topology: CPU topology objects and structures

2022-08-23 Thread Thomas Huth


On 20/06/2022 16.03, Pierre Morel wrote:

We use new objects to have a dynamic administration of the CPU topology.
The highest level object in this implementation is the s390 book and
in this first implementation of CPU topology for S390 we have a single
book.
The book is built as a SYSBUS bridge during the CPU initialization.
Other objects, sockets and core will be built after the parsing
of the QEMU -smp argument.

Every object under this single book will be build dynamically
immediately after a CPU has be realized if it is needed.
The CPU will fill the sockets once after the other, according to the
number of core per socket defined during the smp parsing.

Each CPU inside a socket will be represented by a bit in a 64bit
unsigned long. Set on plug and clear on unplug of a CPU.

For the S390 CPU topology, thread and cores are merged into
topology cores and the number of topology cores is the multiplication
of cores by the numbers of threads.

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c | 391 
  hw/s390x/meson.build|   1 +
  hw/s390x/s390-virtio-ccw.c  |   6 +
  include/hw/s390x/cpu-topology.h |  74 ++
  target/s390x/cpu.h  |  47 
  5 files changed, 519 insertions(+)
  create mode 100644 hw/s390x/cpu-topology.c
  create mode 100644 include/hw/s390x/cpu-topology.h

...

+bool s390_topology_new_cpu(MachineState *ms, int core_id, Error **errp)
+{
+S390TopologyBook *book;
+S390TopologySocket *socket;
+S390TopologyCores *cores;
+int nb_cores_per_socket;
+int origin, bit;
+
+book = s390_get_topology();
+
+nb_cores_per_socket = ms->smp.cores * ms->smp.threads;
+
+socket = s390_get_socket(ms, book, core_id / nb_cores_per_socket, errp);
+if (!socket) {
+return false;
+}
+
+/*
+ * At the core level, each CPU is represented by a bit in a 64bit
+ * unsigned long. Set on plug and clear on unplug of a CPU.
+ * The firmware assume that all CPU in the core description have the same
+ * type, polarization and are all dedicated or shared.
+ * In the case a socket contains CPU with different type, polarization
+ * or dedication then they will be defined in different CPU containers.
+ * Currently we assume all CPU are identical and the only reason to have
+ * several S390TopologyCores inside a socket is to have more than 64 CPUs
+ * in that case the origin field, representing the offset of the first CPU
+ * in the CPU container allows to represent up to the maximal number of
+ * CPU inside several CPU containers inside the socket container.
+ */
+origin = 64 * (core_id / 64);


Maybe faster:

origin = core_id & ~63;

By the way, where is this limitation to 64 coming from? Just because we're 
using a "unsigned long" for now? Or is this a limitation from the architecture?



+cores = s390_get_cores(ms, socket, origin, errp);
+if (!cores) {
+return false;
+}
+
+bit = 63 - (core_id - origin);
+set_bit(bit, &cores->mask);
+cores->origin = origin;
+
+return true;
+}

...

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index cc3097bfee..a586875b24 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -43,6 +43,7 @@
  #include "sysemu/sysemu.h"
  #include "hw/s390x/pv.h"
  #include "migration/blocker.h"
+#include "hw/s390x/cpu-topology.h"
  
  static Error *pv_mig_blocker;
  
@@ -89,6 +90,7 @@ static void s390_init_cpus(MachineState *machine)

  /* initialize possible_cpus */
  mc->possible_cpu_arch_ids(machine);
  
+s390_topology_setup(machine);


Is this safe with regards to migration? Did you tried a ping-pong migration 
from an older QEMU to a QEMU with your modifications and back to the older 
one? If it does not work, we might need to wire this setup to the machine 
types...



  for (i = 0; i < machine->smp.cpus; i++) {
  s390x_new_cpu(machine->cpu_type, i, &error_fatal);
  }
@@ -306,6 +308,10 @@ static void s390_cpu_plug(HotplugHandler *hotplug_dev,
  g_assert(!ms->possible_cpus->cpus[cpu->env.core_id].cpu);
  ms->possible_cpus->cpus[cpu->env.core_id].cpu = OBJECT(dev);
  
+if (!s390_topology_new_cpu(ms, cpu->env.core_id, errp)) {

+return;
+}
+
  if (dev->hotplugged) {
  raise_irq_cpu_hotplug();
  }


 Thomas

Re: [PATCH for-7.1?] scsi-generic: Fix emulated block limits VPD page

2022-08-23 Thread Stefan Hajnoczi

On Mon, 22 Aug 2022 at 08:54, Kevin Wolf  wrote:
>
> Commits 01ef8185b80 amd 24b36e9813e updated the way that the maximum
> transfer length is calculated for patching block limits VPD page in an
> INQUIRY response.
>
> The same updates also need to be made for the case where the host device
> does not support the block limits VPD page at all and we emulate the
> whole page.
>
> Without this fix, on host block devices a maximum transfer length of
> (INT_MAX - sector_size) bytes is advertised to the guest, resulting in
> I/O errors when a request that exceeds the host limits is made by the
> guest. (Prior to commit 24b36e9813e, this code path would use the
> max_transfer value from the host instead of INT_MAX, but still miss the
> fix from 01ef8185b80 where max_transfer is also capped to max_iov
> host pages, so it would be less wrong, but still wrong.)
>
> Cc: qemu-sta...@nongnu.org
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2096251
> Fixes: 01ef8185b809af9d287e1a03a3f9d8ea8231118a
> Fixes: 24b36e9813ec15da7db62e3b3621730710c5f020
> Signed-off-by: Kevin Wolf 
> ---
>  hw/scsi/scsi-generic.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)

Reviewed-by: Stefan Hajnoczi

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Alexander Ivanov




On 23.08.2022 11:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 12:20, Denis V. Lunev wrote:

On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:
data_end field in BDRVParallelsState is set to the biggest offset 
present
in BAT. If this offset is outside of the image, any further write 
will create

the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the 
image and

it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState 
*bs, QDict *options, int flags,

  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of 
image");

+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than 
s->data_end, but the difference itself is less than sector, the 
error message would be misleading.


Should we consider "file_size = DIV_ROUND_UP(file_size, 
BDRV_SECTOR_SIZE)" instead of "file_size >>= BDRV_SECTOR_BITS"?


It's hardly possible to get such image on valid scenarios with Qemu 
(keeping in mind bdrv_truncate() call in parallels_close()). But it 
still may be possible to have such images produced by another 
software or by some failure path.


I think you are right, it would be better to align image size up to 
sector size.


I would say that we need to align not on sector size but on cluster 
size.

That would worth additional check.


And not simply align, as data_offset is not necessarily aligned to 
cluster size.


Finally, what should we check?

I suggest


diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f16..b882ea1200 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -725,6 +725,7 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
+    int64_t file_size;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -735,6 +736,11 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,

 return -EINVAL;
 }

+    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    return file_size;
+    }
+
 ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
 if (ret < 0) {
 goto fail;
@@ -798,6 +804,13 @@ static int parallels_open(BlockDriverState *bs, 
QDict *options, int flags,


 for (i = 0; i < s->bat_size; i++) {
 int64_t off = bat2sect(s, i);
+    if (off >= file_size) {


Do we really want to compare file size with cluster offset, not with 
cluster end?


Also I would leave the comparison outside the loop, because I'm going to 
move highest offset calculation to a helper (there are two places with 
the same code).



+ error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+   "is larger than file size (%" PRIi64 ")",
+   off, i, file_size);
+    ret = -EINVAL;
+    goto fail;
+    }
 if (off >= s->data_end) {
 s->data_end = off + s->tracks;
 }



- better error message, and we check exactly what's written in the 
spec (docs/interop/parallels.c):



  Cluster offsets specified by BAT entries must meet the following 
requirements:

  [...]
  - the value must be lower than the desired file size,

Re: Teensy 4.1 Implementation

2022-08-23 Thread Shiny Saana

Thank you again for your time!

I didn't know at all about the generic loader!! It feels to me that it will
definitely be very useful in loading the Teensy image.

(To give more background: the Teensy-Arduinon toolchain first compiles an
.elf and then convert that to an Intex hex file. We can retrieve that .elf
in /tmp after compilation, but if all you have is an ihex (which is the
case for my use-case), then you're out of luck.)

To answer your question:

> Do you know what the hardware sets the initial vector
> base address to? (that is, where the ROM itself puts its
> reset/interrupt vector table). I couldn't find anything in
> the datasheet that said that.

>From experimentation and dissasembly on the ROM, (located in 0x0020_),
the very first int (converted to BE) is "0x2020_1000" , which is located to
"OCRAM2", also referred as "ROM RAM" by the documentation, and the next int
is "0x0020_2091", which both points inside the ROM itself , and which when
forcibly disassembled in Ghidra does look like a function.
So I'm pretty confident the initial vector base address is 0x0020_.

Regarding the "kernel loading" issue, I believe that I was initially
mistaken. From other examples online, I believed that it was the way to
load the Teensy image. But thinking and discussing it with another ARM dev,
wouldn't the ROM itself actually be considered the kernel?

The teensy ihex image (converted to raw binary) could then be loaded via
the generic loader (and then, document that behavior in QEMU, and for user
convenience, in my own program using QEMU, I could merely provide a script
that handles the arguments for them).

Knowing that, if the call to  armv7m_load_kernel() is mandatory, maybe it
would make sense to load the ROM in C code via this function, with the
compiled ROM addresses from 0x_ to 0x001F_ being padded with 0.
I'm *absolutely not sure* if this is a good idea, but that's the one I got
from the understanding that I have.

As always, thank you again for the help (and for using some of your time to
go through the documentation yourself, I genuinely appreciate the help a
great lot.)

Saana

Le mar. 23 août 2022 à 15:09, Peter Maydell  a
écrit :

> On Sun, 21 Aug 2022 at 01:05, Shiny Saana  wrote:
> > I've been able to write an initial Teensy 4.1 machine, for now with
> > only the few important memory regions initialized, and successfully
> > ran some hand-written ARM code on it.
>
> Great, that's good progress!
>
> > The documentation ( https://www.pjrc.com/teensy/IMXRT1060RM_rev3.pdf ),
> > in section 9.7.1, gives some informations on how, in the actual
> > Teensy board, the ROM, executed at boot, initialize the board
> > peripherals, and also reads from a data structure included in the
> > Flash memory (the user-provided program) where the CPU should jump
> > to after the ROM has done its work (somewhere in that same Flash memory,
> > usually).
> >
> > I was able to successfully dump the ROM of the real board and
> > confirm this behavior. Given that the current plan is not to
> > emulate every peripherals, I am of the opinion that writing a very
> > simple ROM that merely reads this Flash provided data structure and
> > jumps to the provided address sounds like a good starting point, so
> > that I can keep iterating on writing more and more complex code
> > through the provided Teensy toolchain, and implementing needed
> > peripherals.
> >
> > As such, I have several questions:
> >
> > 1/ To replicate this behaviour, is this considered the correct
> > approach by the current QEMU maintainers?
>
> Yes, I think this is probably a reasonable way to go.
>
> > 2/ If so, I have not been able to find any function that would be
> > able to load data into a memory region "statically". Does one
> > exist? Is there an alternative to this process?
>
> Depends exactly what you want to do. If you want "let the user
> load data to an arbitrary address", then the "generic loader"
> is usually helpful:
> https://www.qemu.org/docs/master/system/generic-loader.html
>
> If you mean "C code within QEMU loads data to a specific place",
> rom_add_blob_fixed_as() is probably what you want. This is how
> hw/arm/boot.c loads both user-provided data files and the
> hand-coded mini-bootloader into the guest.
>
> > 3/ Regarding loading the "kernel" of the board, as part of the
> > init process, I am calling the usual "armv7m_load_kernel" function
> > with its usual parameters. However, it seems to load it as the
> > very start of the address space, which is not where the flash
> > memory is, and so is not where the kernel should be loaded. I
> > wasn't able to find a workaround. Is there something I'm missing?
>
> The behaviour of armv7m_load_kernel() depends on what kind
> of file you pass to -kernel. If you pass -kernel an ELF file,
> then it will get loaded to the addresses as specified by
> the ELF file, so you can use that to put data anywhere you like.
> If you pass it a raw binary file then, yeah, at the mo

[PULL 1/1] scsi-generic: Fix emulated block limits VPD page

2022-08-23 Thread Kevin Wolf

Commits 01ef8185b80 amd 24b36e9813e updated the way that the maximum
transfer length is calculated for patching block limits VPD page in an
INQUIRY response.

The same updates also need to be made for the case where the host device
does not support the block limits VPD page at all and we emulate the
whole page.

Without this fix, on host block devices a maximum transfer length of
(INT_MAX - sector_size) bytes is advertised to the guest, resulting in
I/O errors when a request that exceeds the host limits is made by the
guest. (Prior to commit 24b36e9813e, this code path would use the
max_transfer value from the host instead of INT_MAX, but still miss the
fix from 01ef8185b80 where max_transfer is also capped to max_iov
host pages, so it would be less wrong, but still wrong.)

Cc: qemu-sta...@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2096251
Fixes: 01ef8185b809af9d287e1a03a3f9d8ea8231118a
Fixes: 24b36e9813ec15da7db62e3b3621730710c5f020
Signed-off-by: Kevin Wolf 
Message-Id: <20220822125320.48257-1-kw...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-generic.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index ada24d7486..3d35d307e1 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -147,6 +147,18 @@ static int execute_command(BlockBackend *blk,
 return 0;
 }
 
+static uint64_t calculate_max_transfer(SCSIDevice *s)
+{
+uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
+uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
+
+assert(max_transfer);
+max_transfer = MIN_NON_ZERO(max_transfer,
+max_iov * qemu_real_host_page_size());
+
+return max_transfer / s->blocksize;
+}
+
 static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
 {
 uint8_t page, page_idx;
@@ -179,12 +191,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, 
SCSIDevice *s, int len)
 (r->req.cmd.buf[1] & 0x01)) {
 page = r->req.cmd.buf[2];
 if (page == 0xb0) {
-uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
-uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
-
-assert(max_transfer);
-max_transfer = MIN_NON_ZERO(max_transfer, max_iov * 
qemu_real_host_page_size())
-/ s->blocksize;
+uint64_t max_transfer = calculate_max_transfer(s);
 stl_be_p(&r->buf[8], max_transfer);
 /* Also take care of the opt xfer len. */
 stl_be_p(&r->buf[12],
@@ -230,7 +237,7 @@ static int scsi_generic_emulate_block_limits(SCSIGenericReq 
*r, SCSIDevice *s)
 uint8_t buf[64];
 
 SCSIBlockLimits bl = {
-.max_io_sectors = blk_get_max_transfer(s->conf.blk) / s->blocksize
+.max_io_sectors = calculate_max_transfer(s),
 };
 
 memset(r->buf, 0, r->buflen);
-- 
2.37.1

[PULL 0/1] Block layer patches

2022-08-23 Thread Kevin Wolf

The following changes since commit ba58ccbef60338d0b7334c714589a6423a3e7f91:

  Merge tag 'for-7.1-hppa' of https://github.com/hdeller/qemu-hppa into staging 
(2022-08-19 09:35:29 -0700)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 51e15194b0a091e5c40aab2eb234a1d36c5c58ee:

  scsi-generic: Fix emulated block limits VPD page (2022-08-23 16:01:13 +0200)


Block layer patches

- scsi-generic: Fix I/O errors due to wrong block limits


Kevin Wolf (1):
  scsi-generic: Fix emulated block limits VPD page

 hw/scsi/scsi-generic.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

Issues related to the polling system on Windows

2022-08-23 Thread Clément Chigot

Hi all,

I've been struggling with a bug which seems to be linked to several issues
in the polling system on Windows hosts.

When connecting gdb to a qemu-system (it happens with all the emulations
I've tried), I've discovered that sometimes a latency appears. It happens
with all the commands but it is really noticeable with "call" commands. It can
take more than 20s to complete.

While investigating it seems that the polling system misses some events and
thus waits for the timeout of g_poll (1s) before handling them. It can be seen
with any program launched with gdbstub_io_command traces.
$ gdb-system-arm -s -S
...
gdbstub_io_command Received: m422650,8
gdbstub_io_command Received: m422650,8
 Freeze for less than one second
gdbstub_io_command Received: P1f=d09fca00
gdbstub_io_command Received: m422650,8


This is random but pretty obvious when the freeze happens.

An important note is that it's triggered by newer versions of glib. We have
a qemu-6 built with glib-2.54 where everything is fine, but when rebuilding
it with glib-2.60 this problem appears. I didn't check yet with glib
2.56 or 2.58
because it's still using the autoconf approach instead of meson.
Anyway, I didn't find any obvious glib commits which could have introduced this
issue. If anyone more experienced with glib has an idea, I'm interested.

Afterwards, I've dug into qemu core and how it sets up the connection between
gdb and qemu. And I have several questions / ideas about what is happening.

IIUC, the gdb connection is handled using an io/channel-watch. This adds a
GSource for our given socket (-S being a tcp connection) to be polled
by the main
loop.
For Windows, qio_channel_socket_source_check is the function used for the
check operation. In this function, we are both calling WSAEnumNetworkEvents
and select. The first one seems here only to reset the events while the second
retrieves them. However, it's not an atomic operation. So my guess is that some
events are lost during these two operations. I've tried several
solutions around that
move WSAEnumNetworkEvents after select, replace it with WSAResetEvent, use
auto/manual reset in CreateEvent. None of them worked.

Afterwards, I've tried to replace select by just WSAEnumNetworkEvents which
is supposed to be enough.  But I've faced another issue.
We have two sources connected to the same socket. These two sources have
different conditions G_IO_HUP vs G_IO_IN + G_IO_OUT + ... It's fine on Linux
but on Windows, it seems to be problematic as I'm getting the Read event on the
GSource having just G_IO_HUP. It's kind of logical as Windows API only knows
about HANDLE which is the same in both cases. I've made a quick attempt to
create another HANDLE for the second GSource. But it didn't work.

The GSource with G_IO_HUP is created by:
#0  qio_channel_create_socket_watch (... condition=G_IO_HUP) at
io/channel-watch.c
#1  qio_channel_create_watch at io/channel.c
#2  update_ioc_handlers at chardev/char-socket.c
#3  tcp_chr_connect at chardev/char-socket.c
#4  tcp_chr_new_client at chardev/char-socket.c
#5  qio_net_listener_channel_func at io/net-listener.c
#6  g_main_dispatch at glib/gmain.c
#7  g_main_context_dispatch at glib/gmain.c
#8  os_host_main_loop_wait at util/main-loop.c:480
...

The other is made during the poll_prepare and added as a child_source of
the first one.
#0  qio_channel_create_socket_watch (..., condition=(G_IO_IN |
G_IO_OUT | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c
#1  qio_channel_create_watch at io/channel.c
#2  io_watch_poll_prepare at chardev/char-io.c
#3  io_watch_poll_prepare at chardev/char-io.c
#4  g_main_context_prepare at glib/gmain.c
#5  os_host_main_loop_wait at util/main-loop.c
...

I'm not familiar enough with glib to know if these child_source are working
fine on Windows.

I'm currently trying to change the approach and instead of creating a
new source,
I want to update the previous one. But it needs some important modifications.
As I'm a bit taken by the time, I'm looking for a workaround and any
advice on that.
For now, the only workaround I've found is to reduce the timeout in g_poll to
catch the missed events earlier...

@Paolo, you were the one implementing the part in io/channel-watch in
a5897205677, do you have any ideas or suggestions ?

I'll try to send an update with a reproducer. But I didn't have time
to create it
yet.

Thanks in advance
Clément

Re: Teensy 4.1 Implementation

2022-08-23 Thread Peter Maydell

On Tue, 23 Aug 2022 at 15:00, Shiny Saana  wrote:
> From experimentation and dissasembly on the ROM, (located in 0x0020_), 
> the very first int (converted to BE) is "0x2020_1000" , which is located to 
> "OCRAM2", also referred as "ROM RAM" by the documentation, and the next int 
> is "0x0020_2091", which both points inside the ROM itself , and which when 
> forcibly disassembled in Ghidra does look like a function.
> So I'm pretty confident the initial vector base address is 0x0020_.

Right. In fact, rereading the datasheet, I see that I overlooked
that the IOMUXC_GPR_GPR16 reset value is actually specified. Bits
[31:7] of that are the CM7_INIT_VTOR, which is to say that they're
bits [31:7] of the initial vector table address. And they're set
so that is 0x0020_.

Your board code should be setting the init-nsvtor property on
the armv7m object to 0x0020, if it isn't already.

> Regarding the "kernel loading" issue, I believe that I was initially
> mistaken. From other examples online, I believed that it was the way
> to load the Teensy image. But thinking and discussing it with another
> ARM dev, wouldn't the ROM itself actually be considered the kernel?

Yes, this would be in line with the way we use -kernel on other
M-profile board models.

> Knowing that, if the call to  armv7m_load_kernel() is mandatory,
> maybe it would make sense to load the ROM in C code via this
> function, with the compiled ROM addresses from 0x_ to
> 0x001F_ being padded with 0.

That's one way to do it. I think it would be better to adjust
armv7m_load_kernel() so that it loaded the file to a board-specific
ROM base, which would avoid the need for the weird zero-padding.
Two options:
 * we could make armv7m_load_kernel() take a base address as well as
   a size for the region it loads the "kernel" to
 * we could have armv7m_load_kernel() be clever enough to query
   the CPU to find out what the VTOR value is and load the
   "kernel" there

I'll have a think about which one of those I prefer, and maybe
write a patch...

thanks
-- PMM

Re: [qemu-web PATCH] Add signing pubkey for python-qemu-qmp package

2022-08-23 Thread John Snow

On Tue, Aug 23, 2022, 5:16 AM Andrea Bolognani  wrote:

> On Thu, Aug 18, 2022 at 12:53:58PM -0400, John Snow wrote:
> > Add the pubkey currently used for signing PyPI releases of qemu.qmp to a
> > stable location where it can be referenced by e.g. Fedora RPM specfiles.
> >
> > At present, the key happens to just simply be my own -- but future
> > releases may be signed by a different key. In that case, we can
> > increment '1.txt' to '2.txt' and so on. The old keys should be left in
> > place.
> >
> > The format for the keyfile was chosen by copying what OpenStack was
> > doing:
> >
> https://releases.openstack.org/_static/0x2426b928085a020d8a90d0d879ab7008d0896c8a.txt
> >
> > Generated with:
> > > gpg --with-fingerprint --list-keys js...@redhat.com > pubkey
> > > gpg --armor --export js...@redhat.com >> pubkey
>
> You might want to pass
>
>   --export-options export-minimal
>
> to the second command in order to obtain a significantly smaller file
> that can still serve the intended purpose.
>

OK, I'll test with this option. Thanks!

((Doesn't know anything about gpg))


> --
> Andrea Bolognani / Red Hat / Virtualization
>
>

Re: [PATCH 4/4] hw/nvme: add MSI-x mask handlers for irqfd

2022-08-23 Thread Jinhao Fan


On 8/16/2022 6:46 PM, Klaus Jensen wrote:

Did qtest work out for you for testing? If so, it would be nice to add a
simple test case as well.


Since MSI-x masking handlers are only implemented for IO queues, if we 
want to use qtest we need to implement utilities for controller 
initialization and IO queue creation. After that we can actually test 
the MSI-x masking feature. Although we may reuse some code from virtio's 
tests, that is still a large amount of work.


Is it possible to get this patch merged without testing? If not, I guess 
I'll have to take the hard work to implement something like 
qtest/libqos/nvme.c

Re: [PATCH v5 1/3] Update linux headers to 6.0-rc1

2022-08-23 Thread Daniel P . Berrangé

On Mon, Aug 22, 2022 at 05:00:03PM +0200, Michal Prívozník wrote:
> On 8/17/22 04:08, Chenyi Qiang wrote:
> > commit 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
> > 
> > Signed-off-by: Chenyi Qiang 
> > ---
> >  include/standard-headers/asm-x86/bootparam.h  |   7 +-
> >  include/standard-headers/drm/drm_fourcc.h |  73 +++-
> >  include/standard-headers/linux/ethtool.h  |  29 +--
> >  include/standard-headers/linux/input.h|  12 +-
> >  include/standard-headers/linux/pci_regs.h |  30 ++-
> >  include/standard-headers/linux/vhost_types.h  |  17 +-
> >  include/standard-headers/linux/virtio_9p.h|   2 +-
> >  .../standard-headers/linux/virtio_config.h|   7 +-
> >  include/standard-headers/linux/virtio_ids.h   |  14 +-
> >  include/standard-headers/linux/virtio_net.h   |  34 +++-
> >  include/standard-headers/linux/virtio_pci.h   |   2 +
> >  linux-headers/asm-arm64/kvm.h |  27 +++
> >  linux-headers/asm-generic/unistd.h|   4 +-
> >  linux-headers/asm-riscv/kvm.h |  22 +++
> >  linux-headers/asm-riscv/unistd.h  |   3 +-
> >  linux-headers/asm-s390/kvm.h  |   1 +
> >  linux-headers/asm-x86/kvm.h   |  33 ++--
> >  linux-headers/asm-x86/mman.h  |  14 --
> >  linux-headers/linux/kvm.h | 172 +-
> >  linux-headers/linux/userfaultfd.h |  10 +-
> >  linux-headers/linux/vduse.h   |  47 +
> >  linux-headers/linux/vfio.h|   4 +-
> >  linux-headers/linux/vfio_zdev.h   |   7 +
> >  linux-headers/linux/vhost.h   |  35 +++-
> >  24 files changed, 523 insertions(+), 83 deletions(-)
> > 
> 
> 
> > diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> > index bf6e96011d..46de10a809 100644
> > --- a/linux-headers/asm-x86/kvm.h
> > +++ b/linux-headers/asm-x86/kvm.h
> > @@ -198,13 +198,13 @@ struct kvm_msrs {
> > __u32 nmsrs; /* number of msrs in entries */
> > __u32 pad;
> >  
> > -   struct kvm_msr_entry entries[0];
> > +   struct kvm_msr_entry entries[];
> >  };
> >  
> 
> I don't think it's this simple. I think this needs to go hand in hand with 
> kvm_arch_get_supported_msr_feature().
> 
> Also, this breaks clang build:
> 
> clang -m64 -mcx16 -Ilibqemu-x86_64-softmmu.fa.p -I. -I.. -Itarget/i386 
> -I../target/i386 -Iqapi -Itrace -Iui -Iui/shader -I/usr/include/pixman-1 
> -I/usr/include/spice-server -I/usr/include/spice-1 -I/usr/include/glib-2.0 
> -I/usr/lib64/glib-2.0/include -fcolor-diagnostics -Wall -Winvalid-pch -Werror 
> -std=gnu11 -O0 -g -isystem /home/zippy/work/qemu/qemu.git/linux-headers 
> -isystem linux-headers -iquote . -iquote /home/zippy/work/qemu/qemu.git 
> -iquote /home/zippy/work/qemu/qemu.git/include -iquote 
> /home/zippy/work/qemu/qemu.git/tcg/i386 -pthread -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
> -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
> -fno-strict-aliasing -fno-common -fwrapv -Wold-style-definition -Wtype-limits 
> -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body 
> -Wnested-externs -Wendif-labels -Wexpansion-to-defined 
> -Wno-initializer-overrides -Wno-missing-include-dirs 
> -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition 
> -Wno-tautological-type-limit-compare -Wno-psabi -fstack-protector-strong -O0 
> -ggdb -fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H 
> '-DCONFIG_TARGET="x86_64-softmmu-config-target.h"' 
> '-DCONFIG_DEVICES="x86_64-softmmu-config-devices.h"' -MD -MQ 
> libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -MF 
> libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o.d -o 
> libqemu-x86_64-softmmu.fa.p/target_i386_kvm_kvm.c.o -c 
> ../target/i386/kvm/kvm.c
> ../target/i386/kvm/kvm.c:470:25: error: field 'info' with variable sized type 
> 'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
> [-Werror,-Wgnu-variable-sized-type-not-at-end]
> struct kvm_msrs info;
> ^
> ../target/i386/kvm/kvm.c:1701:27: error: field 'cpuid' with variable sized 
> type 'struct kvm_cpuid2' not at the end of a struct or class is a GNU 
> extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> struct kvm_cpuid2 cpuid;
>   ^
> ../target/i386/kvm/kvm.c:2868:25: error: field 'info' with variable sized 
> type 'struct kvm_msrs' not at the end of a struct or class is a GNU extension 
> [-Werror,-Wgnu-variable-sized-type-not-at-end]
> struct kvm_msrs info;
> ^
> 3 errors generated.

We're perfectly OK with using GNU extensions  in QEMU (eg the g_auto stuff),
so IMHO just set  -Wno-gnu-variable-sized-type-not-at-end to turn off this
warning that's only relevant to people striving for fully portable C
code.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com

[PATCH] gdbstub: only send stop-reply upon request

2022-08-23 Thread Matheus Tavares Bernardino

The GDB remote serial protocol[1] specifies that the communication
between GDB and the stub is driven by GDB:

The host (GDB) sends commands, and the target (the debugging
stub incorporated in your program) sends a response.

This is further emphasized by Embecosm's "Howto: GDB Remote Serial
Protocol" document[2], which says:

This is the only circumstance under which the server sends a
packet: in reply to a packet from the client requiring a
response.

However, QEMU may send stop-reply packets even when not solicited by
GDB, as these are currently issued whenever the vm stops. Although this
behavior doesn't seem to cause problems with GDB itself, it does with
other debuggers that implement the GDB remote serial protocol, like
hexagon-lldb. In this case, the debugger fails when it receives an
unexpected stop-reply from QEMU as attaching to it.

Instead, let's make QEMU send stop-reply messages only as a response to
a previous GDB command, and only to commands that accept a stop-reply
packet. According to [3], these are: 'C', 'c', 'S', 's', '?', 'vCont',
'vAttach', 'vRun', and 'vStopped'. The ones starting with 'v' are not
implemented by gdbstup.c, so we only have to handle the single-letter
ones.

[1]: https://sourceware.org/gdb/download/onlinedocs/gdb/Overview.html#Overview
[2]: 
https://www.embecosm.com/appnotes/ean4/embecosm-howto-rsp-server-ean4-issue-2.html
[3]: 
https://sourceware.org/gdb/download/onlinedocs/gdb/Stop-Reply-Packets.html#Stop-Reply-Packets

Signed-off-by: Matheus Tavares Bernardino 
---
 gdbstub.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/gdbstub.c b/gdbstub.c
index cf869b10e3..b8fc0ce07b 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -339,6 +339,7 @@ enum RSState {
 RS_GETLINE_RLE,
 RS_CHKSUM1,
 RS_CHKSUM2,
+RS_HANDLING_CMD,
 };
 typedef struct GDBState {
 bool init;   /* have we been initialised? */
@@ -2562,6 +2563,7 @@ static int gdb_handle_packet(const char *line_buf)
 {
 const GdbCmdParseEntry *cmd_parser = NULL;
 
+gdbserver_state.state = RS_HANDLING_CMD;
 trace_gdbstub_io_command(line_buf);
 
 switch (line_buf[0]) {
@@ -2821,6 +2823,23 @@ void gdb_set_stop_cpu(CPUState *cpu)
 }
 
 #ifndef CONFIG_USER_ONLY
+static bool cmd_allows_stop_reply(const char *cmd)
+{
+if (strlen(cmd) != 1) {
+return false;
+}
+switch (cmd[0]) {
+case 'C':
+case 'c':
+case 'S':
+case 's':
+case '?':
+return true;
+default:
+return false;
+}
+}
+
 static void gdb_vm_state_change(void *opaque, bool running, RunState state)
 {
 CPUState *cpu = gdbserver_state.c_cpu;
@@ -2829,7 +2848,8 @@ static void gdb_vm_state_change(void *opaque, bool 
running, RunState state)
 const char *type;
 int ret;
 
-if (running || gdbserver_state.state == RS_INACTIVE) {
+if (running || gdbserver_state.state != RS_HANDLING_CMD ||
+!cmd_allows_stop_reply(gdbserver_state.line_buf)) {
 return;
 }
 /* Is there a GDB syscall waiting to be sent?  */
-- 
2.25.1

[PATCH] docs/devel/testing: fix minor typo

2022-08-23 Thread Matheus Tavares Bernardino

Signed-off-by: Matheus Tavares Bernardino 
---
 docs/devel/testing.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 3f6ebd5073..f35f117d95 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -297,7 +297,7 @@ build and test QEMU in predefined and widely accessible 
Linux
 environments. This makes it possible to expand the test coverage
 across distros, toolchain flavors and library versions. The support
 was originally written for Docker although we also support Podman as
-an alternative container runtime. Although the many of the target
+an alternative container runtime. Although many of the target
 names and scripts are prefixed with "docker" the system will
 automatically run on whichever is configured.
 
-- 
2.25.1

Re: [PATCH 03/14] accel/tcg: Suppress auto-invalidate in probe_access_internal

2022-08-23 Thread Richard Henderson


On 8/23/22 02:19, David Hildenbrand wrote:

1) s390_probe_access() documents to "With nonfault=1, return the PGM_
exception that would have been injected into the guest; return 0 if no
exception was detected."

But in case of CONFIG_USER_ONLY, we return the flags returned by
s390_probe_access(), not a PGM__* value. Maybe it doesn't matter,
because we'll simply inject a SIGSEGV in any case ...


I would have said it would matter for MVPG, except that is incorrectly *not* marked as a 
privileged instruction.  There should be no CONFIG_USER_ONLY case to answer there.



2) s390_probe_access() documents that for "CONFIG_USER_ONLY, the
faulting address is stored to env->__excp_addr.".

However, that's only set in s390_cpu_record_sigsegv(). With nonfault=1
that will never actually trigger, right?


Correct.


I assume db9aab5783a2 ("target/s390x: Use probe_access_flags in
s390_probe_access") might have introduced both. We had a flag conversion
to PGM_ in there and stored env->__excp_addr:


Indeed, that commit is faulty in that it breaks the contract of 
s390_probe_access.
It's a shame, though, that we need to carry the extra code for the purpose, and that the 
generic interfaces are not sufficient.



r~

[PULL for 7.1 0/6] testing and doc updates

2022-08-23 Thread Alex Bennée

The following changes since commit ba58ccbef60338d0b7334c714589a6423a3e7f91:

  Merge tag 'for-7.1-hppa' of https://github.com/hdeller/qemu-hppa into staging 
(2022-08-19 09:35:29 -0700)

are available in the Git repository at:

  https://github.com/stsquad/qemu.git tags/pull-for-7.1-fixes-230822-1

for you to fetch changes up to 876ed1c1d373a9eb41352165aae3806215e30b1a:

  qemu-options: try and clarify preferred block semantics (2022-08-23 16:21:43 
+0100)


Testing and doc updates:

  - default timeout for all QemuBaseTests is 120s
  - optimise migration tests to run faster
  - removed duplicate migration test
  - add some clarifying language to block options in manual


Alex Bennée (2):
  tests/avocado: push default timeout to QemuBaseTest
  qemu-options: try and clarify preferred block semantics

Thomas Huth (4):
  tests/qtest/migration-test: Only wait for serial output where migration 
succeeds
  tests/migration/aarch64: Speed up the aarch64 migration test
  tests/migration/i386: Speed up the i386 migration test (when using TCG)
  tests/qtest/migration-test: Remove duplicated test_postcopy from the test 
plan

 tests/migration/aarch64/a-b-kernel.h   | 10 +-
 tests/migration/i386/a-b-bootblock.h   | 12 ++--
 tests/qtest/migration-test.c   |  5 +++--
 qemu-options.hx| 13 +
 tests/avocado/avocado_qemu/__init__.py |  5 -
 tests/migration/aarch64/a-b-kernel.S   |  3 +--
 tests/migration/i386/a-b-bootblock.S   |  1 +
 7 files changed, 33 insertions(+), 16 deletions(-)

-- 
2.30.2

Re: [PATCH v5 17/18] s390x: Add KVM PV dump interface

2022-08-23 Thread Steffen Eiden





On 8/11/22 14:11, Janosch Frank wrote:

Let's add a few bits of code which hide the new KVM PV dump API from
us via new functions.

Signed-off-by: Janosch Frank 
---
  hw/s390x/pv.c | 51 +++
  include/hw/s390x/pv.h |  8 +++
  2 files changed, 59 insertions(+)

diff --git a/hw/s390x/pv.c b/hw/s390x/pv.c
index 2b892b45e8..ce3b6ad3e9 100644
--- a/hw/s390x/pv.c
+++ b/hw/s390x/pv.c
@@ -175,6 +175,57 @@ bool kvm_s390_pv_info_basic_valid(void)
  return info_valid;
  }
  
+static int s390_pv_dump_cmd(uint64_t subcmd, uint64_t uaddr, uint64_t gaddr,

+uint64_t len)
+{
+struct kvm_s390_pv_dmp dmp = {
+.subcmd = subcmd,
+.buff_addr = uaddr,
+.buff_len = len,
+.gaddr = gaddr,
+};
+int ret;
+
+ret = s390_pv_cmd(KVM_PV_DUMP, (void *)&dmp);
+if (ret) {
+error_report("KVM DUMP command %ld failed", subcmd);
+}
+return ret;
+}
+
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff)
+{
+struct kvm_s390_pv_dmp dmp = {
+.subcmd = KVM_PV_DUMP_CPU,
+.buff_addr = (uint64_t)buff,
+.gaddr = 0,
+.buff_len = info_dump.dump_cpu_buffer_len,
+};
+struct kvm_pv_cmd pv = {
+.cmd = KVM_PV_DUMP,
+.data = (uint64_t)&dmp,
+};
+
+return kvm_vcpu_ioctl(CPU(cpu), KVM_S390_PV_CPU_COMMAND, &pv);
+}
+
+int kvm_s390_dump_init(void)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_INIT, 0, 0, 0);
+}
+


I suggest changing that into something like
 * kvm_s390_dump_mem_state
 * kvm_s390_dump_config_state

This would make the effect of that function more clear.
It is not dumping the memory, but getting (part of) the
necessary metadata to process the dump.

Additionally, I suggest to reflect the name changes in the next patch.
I mark all functions I find.

+int kvm_s390_dump_mem(uint64_t gaddr, size_t len, void *dest)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_CONFIG_STATE, (uint64_t)dest,
+gaddr, len);
+}
+
+int kvm_s390_dump_complete(void *buff)
+{
+return s390_pv_dump_cmd(KVM_PV_DUMP_COMPLETE, (uint64_t)buff, 0,
+info_dump.dump_config_finalize_len);
+}
+
  #define TYPE_S390_PV_GUEST "s390-pv-guest"
  OBJECT_DECLARE_SIMPLE_TYPE(S390PVGuest, S390_PV_GUEST)
  
diff --git a/include/hw/s390x/pv.h b/include/hw/s390x/pv.h

index 573259cf2b..02a6c06b9f 100644
--- a/include/hw/s390x/pv.h
+++ b/include/hw/s390x/pv.h
@@ -51,6 +51,10 @@ uint64_t kvm_s390_pv_dmp_get_size_cpu(void);
  uint64_t kvm_s390_pv_dmp_get_size_stor_state(void);
  uint64_t kvm_s390_pv_dmp_get_size_complete(void);
  bool kvm_s390_pv_info_basic_valid(void);
+int kvm_s390_dump_init(void);
+int kvm_s390_dump_cpu(S390CPU *cpu, void *buff);
+int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest);
+int kvm_s390_dump_complete(void *buff);
  #else /* CONFIG_KVM */
  static inline bool s390_is_pv(void) { return false; }
  static inline int s390_pv_query_info(void) { return 0; }
@@ -66,6 +70,10 @@ static inline uint64_t kvm_s390_pv_dmp_get_size_cpu(void) { 
return 0; }
  static inline uint64_t kvm_s390_pv_dmp_get_size_stor_state(void) { return 0; }
  static inline uint64_t kvm_s390_pv_dmp_get_size_complete(void) { return 0; }
  static inline bool kvm_s390_pv_info_basic_valid(void) { return false; }
+static inline int kvm_s390_dump_init(void) { return 0; }
+static inline int kvm_s390_dump_cpu(S390CPU *cpu, void *buff, size_t len) { 
return 0; }
+static inline int kvm_s390_dump_mem(uint64_t addr, size_t len, void *dest) { 
return 0; }
+static inline int kvm_s390_dump_complete(void *buff) { return 0; }
  #endif /* CONFIG_KVM */
  
  int s390_pv_kvm_init(ConfidentialGuestSupport *cgs, Error **errp);


Beside that nit, LGTM.

Reviewed-by: Steffen Eiden

[PULL 2/6] tests/qtest/migration-test: Only wait for serial output where migration succeeds

2022-08-23 Thread Alex Bennée

From: Thomas Huth 

Waiting for the serial output can take a couple of seconds - and since
we're doing a lot of migration tests, this time easily sums up to
multiple minutes. But if a test is supposed to fail, it does not make
much sense to wait for the source to be in the right state first, so
we can skip the waiting here. This way we can speed up all tests where
the migration is supposed to fail. In the gitlab-CI gprov-gcov test,
each of the migration-tests now run two minutes faster!

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Thomas Huth 
Message-Id: <20220819053802.296584-2-th...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Juan Quintela 
Message-Id: <20220822165608.2980552-3-alex.ben...@linaro.org>

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 520a5f917c..7be321b62d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1307,7 +1307,9 @@ static void test_precopy_common(MigrateCommon *args)
 }
 
 /* Wait for the first serial output from the source */
-wait_for_serial("src_serial");
+if (args->result == MIG_TEST_SUCCEED) {
+wait_for_serial("src_serial");
+}
 
 if (!args->connect_uri) {
 g_autofree char *local_connect_uri =
-- 
2.30.2

[PULL 5/6] tests/qtest/migration-test: Remove duplicated test_postcopy from the test plan

2022-08-23 Thread Alex Bennée

From: Thomas Huth 

test_postcopy() is currently run twice - which is just a waste of resources
and time. The commit d1a27b169b2d that introduced the duplicate talked about
renaming the "postcopy/unix" test, but apparently it forgot to remove the
old entry. Let's do that now.

Fixes: d1a27b169b ("tests: Add postcopy tls migration test")
Signed-off-by: Thomas Huth 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20220819053802.296584-5-th...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Juan Quintela 
Message-Id: <20220822165608.2980552-6-alex.ben...@linaro.org>

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7be321b62d..f63edd0bc8 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2461,7 +2461,6 @@ int main(int argc, char **argv)
 module_call_init(MODULE_INIT_QOM);
 
 if (has_uffd) {
-qtest_add_func("/migration/postcopy/unix", test_postcopy);
 qtest_add_func("/migration/postcopy/plain", test_postcopy);
 qtest_add_func("/migration/postcopy/recovery/plain",
test_postcopy_recovery);
-- 
2.30.2

[PULL 1/6] tests/avocado: push default timeout to QemuBaseTest

2022-08-23 Thread Alex Bennée

All of the QEMU tests eventually end up derrived from this class. Move
the default timeout from LinuxTest to ensure we catch them all. As 15
minutes is fairly excessive we drop the default down to 2 minutes
which is a more reasonable target for tests to aim for.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-Id: <20220822165608.2980552-2-alex.ben...@linaro.org>

diff --git a/tests/avocado/avocado_qemu/__init__.py 
b/tests/avocado/avocado_qemu/__init__.py
index ed4853c805..0efd2bd212 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -227,6 +227,10 @@ def exec_command_and_wait_for_pattern(test, command,
 _console_interaction(test, success_message, failure_message, command + 
'\r')
 
 class QemuBaseTest(avocado.Test):
+
+# default timeout for all tests, can be overridden
+timeout = 120
+
 def _get_unique_tag_val(self, tag_name):
 """
 Gets a tag value, if unique for a key
@@ -512,7 +516,6 @@ class LinuxTest(LinuxSSHMixIn, QemuSystemTest):
 to start with than the more vanilla `QemuSystemTest` class.
 """
 
-timeout = 900
 distro = None
 username = 'root'
 password = 'password'
-- 
2.30.2

Re: [PATCH v5 18/18] s390x: pv: Add dump support

2022-08-23 Thread Steffen Eiden





On 8/11/22 14:11, Janosch Frank wrote:

Sometimes dumping a guest from the outside is the only way to get the
data that is needed. This can be the case if a dumping mechanism like
KDUMP hasn't been configured or data needs to be fetched at a specific
point. Dumping a protected guest from the outside without help from
fw/hw doesn't yield sufficient data to be useful. Hence we now
introduce PV dump support.

The PV dump support works by integrating the firmware into the dump
process. New Ultravisor calls are used to initiate the dump process,
dump cpu data, dump memory state and lastly complete the dump process.
The UV calls are exposed by KVM via the new KVM_PV_DUMP command and
its subcommands. The guest's data is fully encrypted and can only be
decrypted by the entity that owns the customer communication key for
the dumped guest. Also dumping needs to be allowed via a flag in the
SE header.

On the QEMU side of things we store the PV dump data in the newly
introduced architecture ELF sections (storage state and completion
data) and the cpu notes (for cpu dump data).

Users can use the zgetdump tool to convert the encrypted QEMU dump to an
unencrypted one.

Signed-off-by: Janosch Frank 
---
  dump/dump.c  |  12 +-
  include/sysemu/dump.h|   5 +
  target/s390x/arch_dump.c | 242 ++-
  3 files changed, 227 insertions(+), 32 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 65b18fc602..7cf5eb7c8b 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -720,9 +720,9 @@ static void dump_begin(DumpState *s, Error **errp)
  write_elf_notes(s, errp);
  }
  
-static int64_t dump_filtered_memblock_size(GuestPhysBlock *block,

-   int64_t filter_area_start,
-   int64_t filter_area_length)
+int64_t dump_filtered_memblock_size(GuestPhysBlock *block,
+int64_t filter_area_start,
+int64_t filter_area_length)
  {
  int64_t size, left, right;
  
@@ -740,9 +740,9 @@ static int64_t dump_filtered_memblock_size(GuestPhysBlock *block,

  return size;
  }
  
-static int64_t dump_filtered_memblock_start(GuestPhysBlock *block,

-int64_t filter_area_start,
-int64_t filter_area_length)
+int64_t dump_filtered_memblock_start(GuestPhysBlock *block,
+ int64_t filter_area_start,
+ int64_t filter_area_length)
  {
  if (filter_area_length) {
  /* return -1 if the block is not within filter area */
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index 358b038d47..245c26fbca 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -216,4 +216,9 @@ typedef struct DumpState {
  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
  uint32_t cpu_to_dump32(DumpState *s, uint32_t val);
  uint64_t cpu_to_dump64(DumpState *s, uint64_t val);
+
+int64_t dump_filtered_memblock_size(GuestPhysBlock *block, int64_t 
filter_area_start,
+int64_t filter_area_length);
+int64_t dump_filtered_memblock_start(GuestPhysBlock *block, int64_t 
filter_area_start,
+ int64_t filter_area_length);
  #endif
diff --git a/target/s390x/arch_dump.c b/target/s390x/arch_dump.c
index f60a14920d..5e8e03d536 100644
--- a/target/s390x/arch_dump.c
+++ b/target/s390x/arch_dump.c
@@ -16,7 +16,8 @@
  #include "s390x-internal.h"
  #include "elf.h"
  #include "sysemu/dump.h"
-
+#include "hw/s390x/pv.h"
+#include "kvm/kvm_s390x.h"
  
  struct S390xUserRegsStruct {

  uint64_t psw[2];
@@ -76,9 +77,16 @@ typedef struct noteStruct {
  uint64_t todcmp;
  uint32_t todpreg;
  uint64_t ctrs[16];
+uint8_t dynamic[1];  /*
+  * Would be a flexible array member, if
+  * that was legal inside a union. Real
+  * size comes from PV info interface.
+  */
  } contents;
  } QEMU_PACKED Note;
  
+static bool pv_dump_initialized;

+
  static void s390x_write_elf64_prstatus(Note *note, S390CPU *cpu, int id)
  {
  int i;
@@ -177,28 +185,39 @@ static void s390x_write_elf64_prefix(Note *note, S390CPU 
*cpu, int id)
  note->contents.prefix = cpu_to_be32((uint32_t)(cpu->env.psa));
  }
  
+static void s390x_write_elf64_pv(Note *note, S390CPU *cpu, int id)

+{
+note->hdr.n_type = cpu_to_be32(NT_S390_PV_CPU_DATA);
+if (!pv_dump_initialized) {
+return;
+}
+kvm_s390_dump_cpu(cpu, ¬e->contents.dynamic);
+}
  
  typedef struct NoteFuncDescStruct {

  int contents_size;
+uint64_t (*note_size_func)(void); /* NULL for non-dynamic sized contents */
  void (*note_contents_func)(Note *note, S390CPU *cpu, int id);
+bool pvonly;
  } NoteFuncDesc;
  
  static co

[PULL 4/6] tests/migration/i386: Speed up the i386 migration test (when using TCG)

2022-08-23 Thread Alex Bennée

From: Thomas Huth 

When KVM is not available, the i386 migration test also runs in a rather
slow fashion, since the guest code takes a couple of seconds to print
the "B"s on the serial console, and the migration test has to wait for
this each time. Let's increase the frequency here, too, so that the
delays in the migration tests get smaller.

Signed-off-by: Thomas Huth 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20220819053802.296584-4-th...@redhat.com>
Signed-off-by: Alex Bennée 
Message-Id: <20220822165608.2980552-5-alex.ben...@linaro.org>

diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index 7d459d4fde..b7b0fce2ee 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,17 +4,17 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x74, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0x78, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
   0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xb3, 0x00, 0xb8, 0x00, 0x00, 0x10,
   0x00, 0xfe, 0x00, 0x05, 0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40,
-  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x75, 0xe9, 0x66, 0xb8, 0x42, 0x00, 0x66,
-  0xba, 0xf8, 0x03, 0xee, 0xeb, 0xde, 0x66, 0x90, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00,
-  0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00, 0x27, 0x00, 0x5c, 0x7c,
-  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+  0x06, 0x7c, 0xf2, 0xfe, 0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8,
+  0x42, 0x00, 0x66, 0xba, 0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00,
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
+  0x00, 0x9a, 0xcf, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00,
+  0x27, 0x00, 0x60, 0x7c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 3f97f28023..3d464c7568 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -50,6 +50,7 @@ innerloop:
 jl innerloop
 
 inc %bl
+andb $0x3f,%bl
 jnz mainloop
 
 mov $66,%ax
-- 
2.30.2

[PULL 3/6] tests/migration/aarch64: Speed up the aarch64 migration test

2022-08-23 Thread Alex Bennée

From: Thomas Huth 

The migration tests spend a lot of time waiting for a sign of live
of the guest on the serial console. The aarch64 migration code only
outputs "B"s every couple of seconds (at least it takes more than 4
seconds between each characeter on my x86 laptop). There are a lot
of migration tests, and if each test that checks for a successful
migration waits for these characters before and after migration, the
wait time sums up to multiple minutes! Let's use a shorter delay to
speed things up.

While we're at it, also remove a superfluous masking with 0xff - we're
reading and storing bytes, so the upper bits of the register do not
matter anyway.

With these changes, the test runs twice as fast on my laptop, decreasing
the total run time from approx. 8 minutes to only 4 minutes!

Signed-off-by: Thomas Huth 
Reviewed-by: Daniel P. Berrangé 
Message-Id: <20220819053802.296584-3-th...@redhat.com>
Signed-off-by: Alex Bennée 
Message-Id: <20220822165608.2980552-4-alex.ben...@linaro.org>

diff --git a/tests/migration/aarch64/a-b-kernel.h 
b/tests/migration/aarch64/a-b-kernel.h
index 0a9b01137e..34e518d061 100644
--- a/tests/migration/aarch64/a-b-kernel.h
+++ b/tests/migration/aarch64/a-b-kernel.h
@@ -10,9 +10,9 @@ unsigned char aarch64_kernel[] = {
   0x03, 0x00, 0x80, 0x52, 0xe4, 0x03, 0x00, 0xaa, 0x83, 0x00, 0x00, 0x39,
   0x84, 0x04, 0x40, 0x91, 0x9f, 0x00, 0x01, 0xeb, 0xad, 0xff, 0xff, 0x54,
   0x05, 0x00, 0x80, 0x52, 0xe4, 0x03, 0x00, 0xaa, 0x83, 0x00, 0x40, 0x39,
-  0x63, 0x04, 0x00, 0x11, 0x63, 0x1c, 0x00, 0x12, 0x83, 0x00, 0x00, 0x39,
-  0x24, 0x7e, 0x0b, 0xd5, 0x84, 0x04, 0x40, 0x91, 0x9f, 0x00, 0x01, 0xeb,
-  0x2b, 0xff, 0xff, 0x54, 0xa5, 0x04, 0x00, 0x11, 0xa5, 0x1c, 0x00, 0x12,
-  0xbf, 0x00, 0x00, 0x71, 0x81, 0xfe, 0xff, 0x54, 0x43, 0x08, 0x80, 0x52,
-  0x43, 0x00, 0x00, 0x39, 0xf1, 0xff, 0xff, 0x17
+  0x63, 0x04, 0x00, 0x11, 0x83, 0x00, 0x00, 0x39, 0x24, 0x7e, 0x0b, 0xd5,
+  0x84, 0x04, 0x40, 0x91, 0x9f, 0x00, 0x01, 0xeb, 0x4b, 0xff, 0xff, 0x54,
+  0xa5, 0x04, 0x00, 0x11, 0xa5, 0x10, 0x00, 0x12, 0xbf, 0x00, 0x00, 0x71,
+  0xa1, 0xfe, 0xff, 0x54, 0x43, 0x08, 0x80, 0x52, 0x43, 0x00, 0x00, 0x39,
+  0xf2, 0xff, 0xff, 0x17
 };
diff --git a/tests/migration/aarch64/a-b-kernel.S 
b/tests/migration/aarch64/a-b-kernel.S
index 0225945348..a4103ecb71 100644
--- a/tests/migration/aarch64/a-b-kernel.S
+++ b/tests/migration/aarch64/a-b-kernel.S
@@ -53,7 +53,6 @@ innerloop:
 /* increment the first byte of each page by 1 */
 ldrbw3, [x4]
 add w3, w3, #1
-and w3, w3, #0xff
 strbw3, [x4]
 
 /* make sure QEMU user space can see consistent data as MMU is off */
@@ -64,7 +63,7 @@ innerloop:
 blt innerloop
 
 add w5, w5, #1
-and w5, w5, #0xff
+and w5, w5, #0x1f
 cmp w5, #0
 bne mainloop
 
-- 
2.30.2

Re: [PATCH] gdbstub: only send stop-reply upon request

2022-08-23 Thread Matheus Tavares Bernardino

On Thu, Aug 32, 2022 at 1:39 PM Matheus Tavares Bernardino wrote:
>
> ---
>  gdbstub.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)

Please ignore this patch. This version is broken, I'm working on a new
one. Sorry for the noise.

[PULL 6/6] qemu-options: try and clarify preferred block semantics

2022-08-23 Thread Alex Bennée

Try to correct any confusion about QEMU's Byzantine disk options by
laying out the preferred "modern" options as-per:

 " (best:  -device + -blockdev,  2nd obsolete syntax: -device +
 -drive,  3rd obsolete syntax: -drive, 4th obsolete syntax: -hdNN)"

Signed-off-by: Alex Bennée 
Acked-by: Kevin Wolf 
Reviewed-by: Daniel P. Berrangé 
Cc: qemu-bl...@nongnu.org
Cc: Kevin Wolf 
Cc: Hanna Reitz 
Cc: Thomas Huth 
Message-Id: <20220822165608.2980552-7-alex.ben...@linaro.org>

diff --git a/qemu-options.hx b/qemu-options.hx
index 3f23a42fa8..31c04f7eea 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1105,6 +1105,19 @@ DEFHEADING()
 
 DEFHEADING(Block device options:)
 
+SRST
+The QEMU block device handling options have a long history and
+have gone through several iterations as the feature set and complexity
+of the block layer have grown. Many online guides to QEMU often
+reference older and deprecated options, which can lead to confusion.
+
+The recommended modern way to describe disks is to use a combination of
+``-device`` to specify the hardware device and ``-blockdev`` to
+describe the backend. The device defines what the guest sees and the
+backend describes how QEMU handles the data.
+
+ERST
+
 DEF("fda", HAS_ARG, QEMU_OPTION_fda,
 "-fda/-fdb file  use 'file' as floppy disk 0/1 image\n", QEMU_ARCH_ALL)
 DEF("fdb", HAS_ARG, QEMU_OPTION_fdb, "", QEMU_ARCH_ALL)
-- 
2.30.2

KVM call for agenda for 2022-09-06

2022-08-23 Thread Juan Quintela




Hi

First of all, I am adding

Please, send any topic that you are interested in covering.

At the end of Monday I will send an email with the agenda or the
cancellation of the call, so hurry up.

After discussions on the QEMU Summit, we are going to have always open a
KVM call where you can add topics.

 Call details:

By popular demand, a google calendar public entry with it

  
https://calendar.google.com/calendar/u/0/r/eventedit/copy/NWR0NWppODdqNXFyYzAwbzYza3RxN2dob3VfMjAyMjA4MjNUMTMwMDAwWiBlZ2VkN2NraTA1bG11MXRuZ3ZrbDN0aGlkc0Bn/cXVpbnRlbGFAcmVkaGF0LmNvbQ?scp=ALL&sf=true

  (I updated the entry, I *think* that now everybody can see it, if you
  can't please let me know.)

If you need phone number details,  contact me privately

Thanks, Juan.

Re: [PATCH v5 1/9] parallels: Out of image offset in BAT leads to image inflation

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 16:50, Alexander Ivanov wrote:


On 23.08.2022 11:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/23/22 12:20, Denis V. Lunev wrote:

On 23.08.2022 09:23, Alexander Ivanov wrote:


On 23.08.2022 08:58, Vladimir Sementsov-Ogievskiy wrote:

On 8/22/22 12:05, Alexander Ivanov wrote:

data_end field in BDRVParallelsState is set to the biggest offset present
in BAT. If this offset is outside of the image, any further write will create
the cluster at this offset and/or the image will be truncated to this
offset on close. This is definitely not correct.
Raise an error in parallels_open() if data_end points outside the image and
it is not a check (let the check to repaire the image).

Signed-off-by: Alexander Ivanov 
---
  block/parallels.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index a229c06f25..c245ca35cd 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -732,6 +732,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  BDRVParallelsState *s = bs->opaque;
  ParallelsHeader ph;
  int ret, size, i;
+    int64_t file_size;
  QemuOpts *opts = NULL;
  Error *local_err = NULL;
  char *buf;
@@ -811,6 +812,19 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
  }
  }
  +    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    ret = file_size;
+    goto fail;
+    }
+
+    file_size >>= BDRV_SECTOR_BITS;
+    if (s->data_end > file_size && !(flags & BDRV_O_CHECK)) {
+    error_setg(errp, "parallels: Offset in BAT is out of image");
+    ret = -EINVAL;
+    goto fail;
+    }


If image is unaligned to sector size, and image size is less than s->data_end, 
but the difference itself is less than sector, the error message would be 
misleading.

Should we consider "file_size = DIV_ROUND_UP(file_size, BDRV_SECTOR_SIZE)" instead of 
"file_size >>= BDRV_SECTOR_BITS"?

It's hardly possible to get such image on valid scenarios with Qemu (keeping in 
mind bdrv_truncate() call in parallels_close()). But it still may be possible 
to have such images produced by another software or by some failure path.


I think you are right, it would be better to align image size up to sector size.


I would say that we need to align not on sector size but on cluster size.
That would worth additional check.


And not simply align, as data_offset is not necessarily aligned to cluster size.

Finally, what should we check?

I suggest


diff --git a/block/parallels.c b/block/parallels.c
index 6d4ed77f16..b882ea1200 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -725,6 +725,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 BDRVParallelsState *s = bs->opaque;
 ParallelsHeader ph;
 int ret, size, i;
+    int64_t file_size;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 char *buf;
@@ -735,6 +736,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 return -EINVAL;
 }

+    file_size = bdrv_getlength(bs->file->bs);
+    if (file_size < 0) {
+    return file_size;
+    }
+
 ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
 if (ret < 0) {
 goto fail;
@@ -798,6 +804,13 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,

 for (i = 0; i < s->bat_size; i++) {
 int64_t off = bat2sect(s, i);
+    if (off >= file_size) {


Do we really want to compare file size with cluster offset, not with cluster 
end?


the spec say:
   Cluster offsets specified by BAT entries must meet the following 
requirements:
   [...]
   - the value must be lower than the desired file size,




Also I would leave the comparison outside the loop, because I'm going to move 
highest offset calculation to a helper (there are two places with the same 
code).


benefits of check inside the loop:

1. we get good error message with BAT index and problematic offset
2. we are sure that data_end is produced by real cluster. After the loop you'll 
have to consider image with no clusters separately




+ error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+   "is larger than file size (%" PRIi64 ")",
+   off, i, file_size);
+    ret = -EINVAL;
+    goto fail;
+    }
 if (off >= s->data_end) {
 s->data_end = off + s->tracks;
 }



- better error message, and we check exactly what's written in the spec 
(docs/interop/parallels.c):


  Cluster offsets specified by BAT entries must meet the following requirements:
  [...]
  - the value must be lower than the desired file size,






--
Best regards,
Vladimir

Re: [PATCH] gdbstub: only send stop-reply upon request

2022-08-23 Thread Peter Maydell

On Tue, 23 Aug 2022 at 16:07, Matheus Tavares Bernardino
 wrote:
>
> The GDB remote serial protocol[1] specifies that the communication
> between GDB and the stub is driven by GDB:
>
> The host (GDB) sends commands, and the target (the debugging
> stub incorporated in your program) sends a response.
>
> This is further emphasized by Embecosm's "Howto: GDB Remote Serial
> Protocol" document[2], which says:
>
> This is the only circumstance under which the server sends a
> packet: in reply to a packet from the client requiring a
> response.

That document is 14 years old and is not a reliable reference
for the protocol today...

In particular,
https://sourceware.org/gdb/download/onlinedocs/gdb/Notification-Packets.html
says:
"The GDB remote serial protocol includes notifications, packets that
require no acknowledgment. Both the GDB and the stub may send
notifications (although the only notifications defined at present
are sent by the stub)."

That said, the thing we're sending here isn't a notification, so
it's not like we're in compliance with the protocol. So I guess
I'm just saying the commit message could be improved.

(The notification mechanism does allow notification of "we just
stopped" if the stub implements non-stop mode, but we don't and
it would be a real pain to try, so that's a bit of a red herring.)

thanks
-- PMM

[PATCH 1/2] target/arm: Remove useless TARGET_BIG_ENDIAN check in armv7m_load_kernel()

2022-08-23 Thread Peter Maydell

Arm system emulation targets always have TARGET_BIG_ENDIAN clear, so
there is no need to have handling in armv7m_load_kernel() for the
case when it is defined.  Remove the unnecessary code.

Side notes:
 * our M-profile implementation is always little-endian (that is, it
   makes the IMPDEF choice that the read-only AIRCR.ENDIANNESS is 0)
 * if we did want to handle big-endian ELF files here we should do it
   the way that hw/arm/boot.c:arm_load_elf() does, by looking at the
   ELF header to see what endianness the file itself is

Signed-off-by: Peter Maydell 
---
 hw/arm/armv7m.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index 990861ee5ef..fa4c2c735da 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -572,17 +572,10 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 {
 ssize_t image_size;
 uint64_t entry;
-int big_endian;
 AddressSpace *as;
 int asidx;
 CPUState *cs = CPU(cpu);
 
-#if TARGET_BIG_ENDIAN
-big_endian = 1;
-#else
-big_endian = 0;
-#endif
-
 if (arm_feature(&cpu->env, ARM_FEATURE_EL3)) {
 asidx = ARMASIdx_S;
 } else {
@@ -593,7 +586,7 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
 if (kernel_filename) {
 image_size = load_elf_as(kernel_filename, NULL, NULL, NULL,
  &entry, NULL, NULL,
- NULL, big_endian, EM_ARM, 1, 0, as);
+ NULL, 0, EM_ARM, 1, 0, as);
 if (image_size < 0) {
 image_size = load_image_targphys_as(kernel_filename, 0,
 mem_size, as);
-- 
2.25.1

[PATCH 0/2] target/arm: armv7m_load_kernel() improvements

2022-08-23 Thread Peter Maydell

Two small patches to armv7m_load_kernel().  The first is just getting
rid of some dead code, that I noticed while working on the function. 
The second is to make boards pass armv7m_load_kernel() the base
address for loading guest (non-ELF) binaries.  At the moment we
assume all M-profile boards start at address 0; this happens to be
true for all the ones we implement right now, but it's not true in
general.  In particular the Teeny board has its ROM at 0x0020_.

I thought about having armv7m_load_kernel() be "clever" and ask the
CPU what init-svtor/init-nsvtor were set to, but that seems like it
might have unanticipated consequences[*].  "Just pass the base address"
is simpler and is how A-profile does it (though for A-profile it's
the loader_start field in struct arm_boot_info rather than an extra
argument).

[*] eg where the board has the rom/flash aliased at both address
0 and some other address, and init-svtor points at an alias;
also Secure vs NonSecure address spaces and loading...

thanks
-- PMM

Peter Maydell (2):
  target/arm: Remove useless TARGET_BIG_ENDIAN check in
armv7m_load_kernel()
  target/arm: Make boards pass base address to armv7m_load_kernel()

 include/hw/arm/boot.h |  5 -
 hw/arm/armv7m.c   | 14 --
 hw/arm/aspeed.c   |  1 +
 hw/arm/microbit.c |  2 +-
 hw/arm/mps2-tz.c  |  2 +-
 hw/arm/mps2.c |  2 +-
 hw/arm/msf2-som.c |  2 +-
 hw/arm/musca.c|  3 ++-
 hw/arm/netduino2.c|  2 +-
 hw/arm/netduinoplus2.c|  2 +-
 hw/arm/stellaris.c|  2 +-
 hw/arm/stm32vldiscovery.c |  2 +-
 12 files changed, 19 insertions(+), 20 deletions(-)

-- 
2.25.1

[PATCH 2/2] target/arm: Make boards pass base address to armv7m_load_kernel()

2022-08-23 Thread Peter Maydell

Currently armv7m_load_kernel() takes the size of the block of memory
where it should load the initial guest image, but assumes that it
should always load it at address 0.  This happens to be true of all
our M-profile boards at the moment, but it isn't guaranteed to always
be so: M-profile CPUs can be configured (via init-svtor and
init-nsvtor, which match equivalent hardware configuration signals)
to have the initial vector table at any address, not just zero.  (For
instance the Teeny board has the boot ROM at address 0x0200_.)

Add a base address argument to armv7m_load_kernel(), so that
callers now pass in both base address and size. All the current
callers pass 0, so this is not a behaviour change.

Signed-off-by: Peter Maydell 
---
I thought about having armv7m_load_kernel() be "clever" and ask the
CPU what init-svtor/init-nsvtor were set to, but that seems like it
might have unanticipated consequences.  "Just pass the base address"
is simpler and is how A-profile does it (though for A-profile it's
the loader_start field in struct arm_boot_info rather than an
extra argument).
---
 include/hw/arm/boot.h | 5 -
 hw/arm/armv7m.c   | 5 +++--
 hw/arm/aspeed.c   | 1 +
 hw/arm/microbit.c | 2 +-
 hw/arm/mps2-tz.c  | 2 +-
 hw/arm/mps2.c | 2 +-
 hw/arm/msf2-som.c | 2 +-
 hw/arm/musca.c| 3 ++-
 hw/arm/netduino2.c| 2 +-
 hw/arm/netduinoplus2.c| 2 +-
 hw/arm/stellaris.c| 2 +-
 hw/arm/stm32vldiscovery.c | 2 +-
 12 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/hw/arm/boot.h b/include/hw/arm/boot.h
index c7ebae156ec..f18cc3064ff 100644
--- a/include/hw/arm/boot.h
+++ b/include/hw/arm/boot.h
@@ -25,13 +25,16 @@ typedef enum {
  * armv7m_load_kernel:
  * @cpu: CPU
  * @kernel_filename: file to load
+ * @mem_base: base address to load image at (should be where the
+ *CPU expects to find its vector table on reset)
  * @mem_size: mem_size: maximum image size to load
  *
  * Load the guest image for an ARMv7M system. This must be called by
  * any ARMv7M board. (This is necessary to ensure that the CPU resets
  * correctly on system reset, as well as for kernel loading.)
  */
-void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int 
mem_size);
+void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename,
+hwaddr mem_base, int mem_size);
 
 /* arm_boot.c */
 struct arm_boot_info {
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index fa4c2c735da..50a9507c0bd 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -568,7 +568,8 @@ static void armv7m_reset(void *opaque)
 cpu_reset(CPU(cpu));
 }
 
-void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size)
+void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename,
+hwaddr mem_base, int mem_size)
 {
 ssize_t image_size;
 uint64_t entry;
@@ -588,7 +589,7 @@ void armv7m_load_kernel(ARMCPU *cpu, const char 
*kernel_filename, int mem_size)
  &entry, NULL, NULL,
  NULL, 0, EM_ARM, 1, 0, as);
 if (image_size < 0) {
-image_size = load_image_targphys_as(kernel_filename, 0,
+image_size = load_image_targphys_as(kernel_filename, mem_base,
 mem_size, as);
 }
 if (image_size < 0) {
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index b3bbe06f8fa..bc3ecdb6199 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -1430,6 +1430,7 @@ static void aspeed_minibmc_machine_init(MachineState 
*machine)
 
 armv7m_load_kernel(ARM_CPU(first_cpu),
machine->kernel_filename,
+   0,
AST1030_INTERNAL_FLASH_SIZE);
 }
 
diff --git a/hw/arm/microbit.c b/hw/arm/microbit.c
index e9494334ce7..50df3620882 100644
--- a/hw/arm/microbit.c
+++ b/hw/arm/microbit.c
@@ -57,7 +57,7 @@ static void microbit_init(MachineState *machine)
 mr, -1);
 
 armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
-   s->nrf51.flash_size);
+   0, s->nrf51.flash_size);
 }
 
 static void microbit_machine_class_init(ObjectClass *oc, void *data)
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index 4017392bf5a..394192b9b20 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -1197,7 +1197,7 @@ static void mps2tz_common_init(MachineState *machine)
 }
 
 armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
-   boot_ram_size(mms));
+   0, boot_ram_size(mms));
 }
 
 static void mps2_tz_idau_check(IDAUInterface *ii, uint32_t address,
diff --git a/hw/arm/mps2.c b/hw/arm/mps2.c
index bb76fa68890..a86a994dbac 100644
--- a/hw/arm/mps2.c
+++ b/hw/arm/mps2.c
@@ -450,7 +450,7 @@ static void mps2_common_init(MachineState *machine)

Re: Teensy 4.1 Implementation

2022-08-23 Thread Shiny Saana

> Your board code should be setting the init-nsvtor property on
> the armv7m object to 0x0020, if it isn't already.

I'm going to add that property right away!

> Yes, this would be in line with the way we use -kernel on other
> M-profile board models.

Got it! Thank you for correcting my understanding!

> I'll have a think about which one of those I prefer, and maybe
> write a patch...

I do think it would be better with such a solution, yes.
I personally think querying init-nsvtor would reduce complexity and "make
sense".
Or perhaps both?
Something as simple as adding a function taking a functing the base load
address as a parameter, and another that queries init-nsvtor and call it
with that value? Everyone would happy :p

I will go with the "padded 0 bytes" plan for now, so that I can move on
with the rest of the implementation, but please do ping me if you do end up
upstreaming such a patch!
As always, thanks again!

Saana

Le mar. 23 août 2022 à 16:22, Peter Maydell  a
écrit :

> On Tue, 23 Aug 2022 at 15:00, Shiny Saana  wrote:
> > From experimentation and dissasembly on the ROM, (located in
> 0x0020_), the very first int (converted to BE) is "0x2020_1000" , which
> is located to "OCRAM2", also referred as "ROM RAM" by the documentation,
> and the next int is "0x0020_2091", which both points inside the ROM itself
> , and which when forcibly disassembled in Ghidra does look like a function.
> > So I'm pretty confident the initial vector base address is 0x0020_.
>
> Right. In fact, rereading the datasheet, I see that I overlooked
> that the IOMUXC_GPR_GPR16 reset value is actually specified. Bits
> [31:7] of that are the CM7_INIT_VTOR, which is to say that they're
> bits [31:7] of the initial vector table address. And they're set
> so that is 0x0020_.
>
> Your board code should be setting the init-nsvtor property on
> the armv7m object to 0x0020, if it isn't already.
>
> > Regarding the "kernel loading" issue, I believe that I was initially
> > mistaken. From other examples online, I believed that it was the way
> > to load the Teensy image. But thinking and discussing it with another
> > ARM dev, wouldn't the ROM itself actually be considered the kernel?
>
> Yes, this would be in line with the way we use -kernel on other
> M-profile board models.
>
> > Knowing that, if the call to  armv7m_load_kernel() is mandatory,
> > maybe it would make sense to load the ROM in C code via this
> > function, with the compiled ROM addresses from 0x_ to
> > 0x001F_ being padded with 0.
>
> That's one way to do it. I think it would be better to adjust
> armv7m_load_kernel() so that it loaded the file to a board-specific
> ROM base, which would avoid the need for the weird zero-padding.
> Two options:
>  * we could make armv7m_load_kernel() take a base address as well as
>a size for the region it loads the "kernel" to
>  * we could have armv7m_load_kernel() be clever enough to query
>the CPU to find out what the VTOR value is and load the
>"kernel" there
>
> I'll have a think about which one of those I prefer, and maybe
> write a patch...
>
> thanks
> -- PMM
>

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-08-23 Thread Sean Christopherson

On Tue, Aug 23, 2022, David Hildenbrand wrote:
> On 19.08.22 05:38, Hugh Dickins wrote:
> > On Fri, 19 Aug 2022, Sean Christopherson wrote:
> >> On Thu, Aug 18, 2022, Kirill A . Shutemov wrote:
> >>> On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
>  On Wed, 6 Jul 2022, Chao Peng wrote:
>  But since then, TDX in particular has forced an effort into preventing
>  (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.
> 
>  Are any of the shmem.c mods useful to existing users of shmem.c? No.
>  Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.
> >>
> >> But QEMU and other VMMs are users of shmem and memfd.  The new features 
> >> certainly
> >> aren't useful for _all_ existing users, but I don't think it's fair to say 
> >> that
> >> they're not useful for _any_ existing users.
> > 
> > Okay, I stand corrected: there exist some users of memfd_create()
> > who will also have use for "INACCESSIBLE" memory.
> 
> As raised in reply to the relevant patch, I'm not sure if we really have
> to/want to expose MFD_INACCESSIBLE to user space. I feel like this is a
> requirement of specific memfd_notifer (memfile_notifier) implementations
> -- such as TDX that will convert the memory and MCE-kill the machine on
> ordinary write access. We might be able to set/enforce this when
> registering a notifier internally instead, and fail notifier
> registration if a condition isn't met (e.g., existing mmap).
>
> So I'd be curious, which other users of shmem/memfd would benefit from
> (MMU)-"INACCESSIBLE" memory obtained via memfd_create()?

I agree that there's no need to expose the inaccessible behavior via uAPI.  
Making
it a kernel-internal thing that's negotiated/resolved when KVM binds to the fd
would align INACCESSIBLE with the UNMOVABLE and UNRECLAIMABLE flags (and any 
other
flags that get added in the future).

AFAICT, the user-visible flag is a holdover from the early RFCs and doesn't 
provide
any unique functionality.

If we go that route, we might want to have shmem/memfd require INACCESSIBLE to 
be
set for the initial implementation.  I.e. disallow binding without INACCESSIBLE
until there's a use case.

Re: [PATCH 03/14] accel/tcg: Suppress auto-invalidate in probe_access_internal

2022-08-23 Thread Richard Henderson


On 8/23/22 08:19, Richard Henderson wrote:

On 8/23/22 02:19, David Hildenbrand wrote:

1) s390_probe_access() documents to "With nonfault=1, return the PGM_
exception that would have been injected into the guest; return 0 if no
exception was detected."

But in case of CONFIG_USER_ONLY, we return the flags returned by
s390_probe_access(), not a PGM__* value. Maybe it doesn't matter,
because we'll simply inject a SIGSEGV in any case ...


I would have said it would matter for MVPG, except that is incorrectly *not* marked as a 
privileged instruction.  There should be no CONFIG_USER_ONLY case to answer there.


Ho hum, misread that -- only privileged if access key specified (and we do not check or 
support those).



r~

Re: [PATCH v8 08/12] s390x/cpu_topology: implementing numa for the s390x topology

2022-08-23 Thread Pierre Morel





On 7/22/22 14:08, Janis Schoetterl-Glausch wrote:

On 7/21/22 13:41, Pierre Morel wrote:



On 7/21/22 10:16, Janis Schoetterl-Glausch wrote:

On 7/21/22 09:58, Pierre Morel wrote:





...snip...



You are right, numa is redundant for us as we specify the topology using the 
core-id.
The roadmap I would like to discuss is using a new:

(qemu) cpu_move src dst

where src is the current core-id and dst is the destination core-id.

I am aware that there are deep implication on current cpu code but I do not 
think it is not possible.
If it is unpossible then we would need a new argument to the device_add for cpu to define 
the "effective_core_id"
But we will still need the new hmp command to update the topology.


Why the requirement for a hmp command specifically? Would qom-set on a cpu 
property work?



We will work on modifying the topology in another series.
Let's discuss this at that moment.




I don't think core-id is the right one, that's the guest visible CPU address, 
isn't it?


Yes, the topology is the one seen by the guest.


Although it seems badly named then, since multiple threads are part of the same 
core (ok, we don't support threads).


I guess that threads will always move with the core or... we do not support 
threads.


Instead socket-id, book-id could be changed dynamically instead of being 
computed from the core-id.



What becomes of the core-id ?


It would stay the same. It has to, right? Can't change the address as reported 
by STAP.
I would just be completely independent of the other ids.



We will work on modifying the topology in another series.

--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH v8 02/12] s390x/cpu_topology: CPU topology objects and structures

2022-08-23 Thread Pierre Morel





On 8/23/22 15:30, Thomas Huth wrote:

On 20/06/2022 16.03, Pierre Morel wrote:

We use new objects to have a dynamic administration of the CPU topology.
The highest level object in this implementation is the s390 book and
in this first implementation of CPU topology for S390 we have a single
book.
The book is built as a SYSBUS bridge during the CPU initialization.
Other objects, sockets and core will be built after the parsing
of the QEMU -smp argument.

Every object under this single book will be build dynamically
immediately after a CPU has be realized if it is needed.
The CPU will fill the sockets once after the other, according to the
number of core per socket defined during the smp parsing.

Each CPU inside a socket will be represented by a bit in a 64bit
unsigned long. Set on plug and clear on unplug of a CPU.

For the S390 CPU topology, thread and cores are merged into
topology cores and the number of topology cores is the multiplication
of cores by the numbers of threads.

Signed-off-by: Pierre Morel 
---
  hw/s390x/cpu-topology.c | 391 
  hw/s390x/meson.build    |   1 +
  hw/s390x/s390-virtio-ccw.c  |   6 +
  include/hw/s390x/cpu-topology.h |  74 ++
  target/s390x/cpu.h  |  47 
  5 files changed, 519 insertions(+)
  create mode 100644 hw/s390x/cpu-topology.c
  create mode 100644 include/hw/s390x/cpu-topology.h

...

+bool s390_topology_new_cpu(MachineState *ms, int core_id, Error **errp)
+{
+    S390TopologyBook *book;
+    S390TopologySocket *socket;
+    S390TopologyCores *cores;
+    int nb_cores_per_socket;
+    int origin, bit;
+
+    book = s390_get_topology();
+
+    nb_cores_per_socket = ms->smp.cores * ms->smp.threads;
+
+    socket = s390_get_socket(ms, book, core_id / nb_cores_per_socket, 
errp);

+    if (!socket) {
+    return false;
+    }
+
+    /*
+ * At the core level, each CPU is represented by a bit in a 64bit
+ * unsigned long. Set on plug and clear on unplug of a CPU.
+ * The firmware assume that all CPU in the core description have 
the same

+ * type, polarization and are all dedicated or shared.
+ * In the case a socket contains CPU with different type, 
polarization
+ * or dedication then they will be defined in different CPU 
containers.
+ * Currently we assume all CPU are identical and the only reason 
to have
+ * several S390TopologyCores inside a socket is to have more than 
64 CPUs
+ * in that case the origin field, representing the offset of the 
first CPU
+ * in the CPU container allows to represent up to the maximal 
number of

+ * CPU inside several CPU containers inside the socket container.
+ */
+    origin = 64 * (core_id / 64);


Maybe faster:

 origin = core_id & ~63;


yes thanks



By the way, where is this limitation to 64 coming from? Just because 
we're using a "unsigned long" for now? Or is this a limitation from the 
architecture?




It is a limitation from the architecture who use a 64bit field to 
represent the CPUs in a CPU TLE mask.


but this patch serie is quite old now and I changed some things 
according to the comments I received

I plan to send the new version before the end of the week.



+    cores = s390_get_cores(ms, socket, origin, errp);
+    if (!cores) {
+    return false;
+    }
+
+    bit = 63 - (core_id - origin);
+    set_bit(bit, &cores->mask);
+    cores->origin = origin;
+
+    return true;
+}

...

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index cc3097bfee..a586875b24 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -43,6 +43,7 @@
  #include "sysemu/sysemu.h"
  #include "hw/s390x/pv.h"
  #include "migration/blocker.h"
+#include "hw/s390x/cpu-topology.h"
  static Error *pv_mig_blocker;
@@ -89,6 +90,7 @@ static void s390_init_cpus(MachineState *machine)
  /* initialize possible_cpus */
  mc->possible_cpu_arch_ids(machine);
+    s390_topology_setup(machine);


Is this safe with regards to migration? Did you tried a ping-pong 
migration from an older QEMU to a QEMU with your modifications and back 
to the older one? If it does not work, we might need to wire this setup 
to the machine types...


I did not, I will add this test.





  for (i = 0; i < machine->smp.cpus; i++) {
  s390x_new_cpu(machine->cpu_type, i, &error_fatal);
  }
@@ -306,6 +308,10 @@ static void s390_cpu_plug(HotplugHandler 
*hotplug_dev,

  g_assert(!ms->possible_cpus->cpus[cpu->env.core_id].cpu);
  ms->possible_cpus->cpus[cpu->env.core_id].cpu = OBJECT(dev);
+    if (!s390_topology_new_cpu(ms, cpu->env.core_id, errp)) {
+    return;
+    }
+
  if (dev->hotplugged) {
  raise_irq_cpu_hotplug();
  }


  Thomas



Thanks Thomas,

Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen

Re: [PATCH 1/2] block: pass OnOffAuto instead of bool to block_acct_setup()

2022-08-23 Thread Vladimir Sementsov-Ogievskiy


On 8/23/22 12:57, Denis V. Lunev wrote:

We would have one more place for block_acct_setup() calling, which should
not corrupt original value.

Signed-off-by: Denis V. Lunev
CC: Peter Krempa
CC: Markus Armbruster
CC: John Snow
CC: Kevin Wolf
CC: Hanna Reitz
CC: Vladimir Sementsov-Ogievskiy


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH] hw/nvme: Abort copy command when format is one while pif is zero

2022-08-23 Thread Klaus Jensen

On Aug 25 22:53, Francis Pravin Antony Michael Raj wrote:
> As per the NVMe command set specification section-3.2.2,
> If:
> i) The namespace is formatted to use 16b Guard Protection Information 
> (i.e., pif = 0) and
> ii) The Descriptor Format is not cleared to 0h
> Then the copy command should be aborted with the status code of Invalid 
> Namespace or Format
> 
> Signed-off-by: Francis Pravin Antony Michael Raj 
> 
> Signed-off-by: Jonathan Derrick 
> ---
>  hw/nvme/ctrl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 87aeba0564..cb4c0f80bc 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -3040,7 +3040,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
>  goto invalid;
>  }
>  
> -if (ns->pif && format != 0x1) {
> +if ((ns->pif == 0x0 && format != 0x0) || (ns->pif && format != 0x1)) {
>  status = NVME_INVALID_FORMAT | NVME_DNR;
>  goto invalid;
>  }
> -- 
> 2.25.1
> 

Thanks, looks good.

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature

1 2 3 >

1 - 100 of 207 matches

Mail list logo