Re: [PATCH v4 11/22] hw/virtio: move vhd->started check into helper and add FIXME

2022-08-07 Thread Raphael Norwitz
On Tue, Aug 02, 2022 at 10:49:59AM +0100, Alex Bennée wrote:
> The `started` field is manipulated internally within the vhost code
> except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
> dev state in the vhost_migration_log routine). Mark that as a FIXME
> because it introduces a potential race. I think the referenced fix
> should be tracking its state locally.

I don't think we can track the state locally. As described in the commit
message for f5b22d06fb, the state is used by vhost code in the
vhost_migration_log() function so we probably need something at the
vhost level. I do agree we shouldn't re-use vdev->started.

Maybe we should add another 'active' variable in vhost_dev? I'm happy
to send a patch for that.

Until we agree on a better solution I'm happy with the FIXME.

Reviewed-by: Raphael Norwitz 

> 
> Signed-off-by: Alex Bennée 
> ---
>  include/hw/virtio/vhost.h  | 12 
>  hw/block/vhost-user-blk.c  | 10 --
>  hw/scsi/vhost-scsi.c   |  4 ++--
>  hw/scsi/vhost-user-scsi.c  |  2 +-
>  hw/virtio/vhost-user-fs.c  |  3 ++-
>  hw/virtio/vhost-user-i2c.c |  4 ++--
>  hw/virtio/vhost-user-rng.c |  4 ++--
>  hw/virtio/vhost-user-vsock.c   |  2 +-
>  hw/virtio/vhost-vsock-common.c |  3 ++-
>  hw/virtio/vhost-vsock.c|  2 +-
>  10 files changed, 33 insertions(+), 13 deletions(-)
> 
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 586c5457e2..61b957e927 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -94,6 +94,7 @@ struct vhost_dev {
>  uint64_t protocol_features;
>  uint64_t max_queues;
>  uint64_t backend_cap;
> +/* @started: is the vhost device started? */
>  bool started;
>  bool log_enabled;
>  uint64_t log_size;
> @@ -165,6 +166,17 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev);
>   */
>  void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
>  
> +/**
> + * vhost_dev_is_started() - report status of vhost device
> + * @hdev: common vhost_dev structure
> + *
> + * Return the started status of the vhost device
> + */
> +static inline bool vhost_dev_is_started(struct vhost_dev *hdev)
> +{
> +return hdev->started;
> +}
> +
>  /**
>   * vhost_dev_start() - start the vhost device
>   * @hdev: common vhost_dev structure
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 9117222456..2bba42478d 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -229,7 +229,7 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, 
> uint8_t status)
>  return;
>  }
>  
> -if (s->dev.started == should_start) {
> +if (vhost_dev_is_started(&s->dev) == should_start) {
>  return;
>  }
>  
> @@ -286,7 +286,7 @@ static void vhost_user_blk_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>  return;
>  }
>  
> -if (s->dev.started) {
> +if (vhost_dev_is_started(&s->dev)) {
>  return;
>  }
>  
> @@ -415,6 +415,12 @@ static void vhost_user_blk_event(void *opaque, 
> QEMUChrEvent event)
>   * the vhost migration code. If disconnect was caught there is an
>   * option for the general vhost code to get the dev state without
>   * knowing its type (in this case vhost-user).
> + *
> + * FIXME: this is sketchy to be reaching into vhost_dev
> + * now because we are forcing something that implies we
> + * have executed vhost_dev_stop() but that won't happen
> + * until vhost_user_blk_stop() gets called from the bh.
> + * Really this state check should be tracked locally.
>   */
>  s->dev.started = false;
>  }
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 3059068175..bdf337a7a2 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -120,7 +120,7 @@ static void vhost_scsi_set_status(VirtIODevice *vdev, 
> uint8_t val)
>  start = false;
>  }
>  
> -if (vsc->dev.started == start) {
> +if (vhost_dev_is_started(&vsc->dev) == start) {
>  return;
>  }
>  
> @@ -147,7 +147,7 @@ static int vhost_scsi_pre_save(void *opaque)
>  
>  /* At this point, backend must be stopped, otherwise
>   * it might keep writing to memory. */
> -assert(!vsc->dev.started);
> +assert(!vhost_dev_is_started(&vsc->dev));
>  
>  return 0;
>  }
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 1b2f7eed98..bc37317d55 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -49,7 +49,7 @@ static void vhost_user_scsi_set_status(VirtIODevice *vdev, 
> uint8_t status)
>  VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>  bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
>  
> -if (vsc->dev.started == start) {
> +if (vhost_dev_is_started

Re: [PATCH v6 4/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-07 Thread Sam Li
Sam Li  于2022年8月5日周五 15:58写道:
>
> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
>
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c|  50 +
>  block/coroutines.h   |   6 +
>  block/file-posix.c   | 315 ++-
>  block/io.c   |  41 
>  include/block/block-common.h |   1 -
>  include/block/block-io.h |  13 ++
>  include/block/block_int-common.h |  22 ++-
>  include/block/raw-aio.h  |   6 +-
>  meson.build  |   1 +
>  qapi/block-core.json |   8 +-
>  qemu-io-cmds.c   | 144 ++
>  11 files changed, 601 insertions(+), 6 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index d4a5df2ac2..fc639b0cd7 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1775,6 +1775,56 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor *zones)
> +{
> +int ret;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk); /* increase before waiting */
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * offset is the starting zone specified as a sector offset.
> + * len is the maximum number of sectors the command should operate on.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +int64_t offset, int64_t len)
> +{
> +int ret;
> +IO_CODE();
> +
> +ret = blk_check_byte_request(blk, offset, len);
> +if (ret < 0) {
> +return ret;
> +}
> +blk_inc_in_flight(blk);
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/coroutines.h b/block/coroutines.h
> index 3a2bad564f..e3f62d94e5 100644
> --- a/block/coroutines.h
> +++ b/block/coroutines.h
> @@ -63,6 +63,12 @@ nbd_co_do_establish_connection(BlockDriverState *bs, bool 
> blocking,
> Error **errp);
>
>
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor *zones);
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +  int64_t offset, int64_t len);
> +
>  /*
>   * "I/O or GS" API functions. These functions can run without
>   * the BQL, but only in one specific iothread/main loop.
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4785203eea..2627431581 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,9 @@
>  #include 
>  #include 
>  #include 
> +#if defined(CONFIG_BLKZONED)
> +#include 
> +#endif
>  #include 
>  #include 
>  #include 
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>  PreallocMode prealloc;
>  Error **errp;
>  } truncate;
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +BlockZoneOp op;
> +} zone_mgmt;
>  };
>  } RawPosixAIOData;
>
> @@ -1369,7 +1379,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  #endif
>
>  if (bs->sg || S_ISBLK(st.st_mode)) {
> -int ret = hdev_get_max_hw_transfer(s->fd, &st);
> +ret = hdev_get_max_hw_transfer(s->fd, &st);
>
>  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>  bs->bl.max_hw_transfer = ret;
> @@ -1386,6 +1396,27 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **err

Re: [PATCH v5 5/8] blockdev: Add a new IF type IF_OTHER

2022-08-07 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Thu, Aug 04, 2022 at 05:30:40PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé  writes:
>> 
>> > On Thu, Aug 04, 2022 at 04:56:15PM +0200, Markus Armbruster wrote:
>> >> Daniel P. Berrangé  writes:
>> >> 
>> >> > On Thu, Jul 28, 2022 at 10:46:35AM +0100, Peter Maydell wrote:
>> >> >> On Wed, 27 Jul 2022 at 20:03, Kevin Wolf  wrote:
>> >> >> >
>> >> >> > Am 18.07.2022 um 11:49 hat Markus Armbruster geschrieben:
>> >> >> > > An OTP device isn't really a parallel flash, and neither are 
>> >> >> > > eFuses.
>> >> >> > > More fast-and-lose use of IF_PFLASH may exist in the tree, and 
>> >> >> > > maybe of
>> >> >> > > other interface types, too.
>> >> >> > >
>> >> >> > > This patch introduces IF_OTHER.  The patch after next uses it for 
>> >> >> > > an
>> >> >> > > EEPROM device.
>> >> >> > >
>> >> >> > > Do we want IF_OTHER?
>> >> >> >
>> >> >> > What would the semantics even be? Any block device that doesn't pick 
>> >> >> > up
>> >> >> > a different category may pick up IF_OTHER backends?
>> >> >> >
>> >> >> > It certainly feels like a strange interface to ask for "other" disk 
>> >> >> > and
>> >> >> > then getting as surprise what this other thing might be. It's
>> >> >> > essentially the same as having an explicit '-device other', and I
>> >> >> > suppose most people would find that strange.
>> >> >> >
>> >> >> > > If no, I guess we get to abuse IF_PFLASH some more.
>> >> >> > >
>> >> >> > > If yes, I guess we should use IF_PFLASH only for actual parallel 
>> >> >> > > flash
>> >> >> > > memory going forward.  Cleaning up existing abuse of IF_PFLASH may 
>> >> >> > > not
>> >> >> > > be worth the trouble, though.
>> >> >> > >
>> >> >> > > Thoughts?
>> >> >> >
>> >> >> > If the existing types aren't good enough (I don't have an opinion on
>> >> >> > whether IF_PFLASH is a good match), let's add a new one. But a 
>> >> >> > specific
>> >> >> > new one, not just "other".
>> >> >> 
>> >> >> I think the common thread is "this isn't what anybody actually thinks
>> >> >> of as being a 'disk', but we would like to back it with a block device
>> >> >> anyway". That can cover a fair range of possibilities...
>> >> >
>> >> > Given that, do we even want/have to use -drive for this ?We can use
>> >> > -blockdev for the backend and reference that from any -device we want
>> >> > to create, and leave -drive out of the picture entirely
>> >> 
>> >> -drive is our only means to configure onboard devices.
>> >> 
>> >> We've talked about better means a few times, but no conclusions.  I can
>> >> dig up pointers, if you're interested.
>> >
>> > For onboard pflash with x86, we've just got properties against the
>> > machine that we can point to a blockdev.
>> 
>> True, but the vast majority of onboard block devices doesn't come with
>> such properties.  Please see
>> 
>> Subject: On configuring onboard block devices with -blockdev (was: [PATCH v4 
>> 6/7] hw/nvram: Update at24c EEPROM init function in NPCM7xx boards)
>> Date: Mon, 15 Nov 2021 16:28:33 +0100
>> Message-ID: <875ystigke.fsf...@dusky.pond.sub.org>
>> https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg03173.html
>
> My take away from your mail there is that in the absence of better ideas
> we should at least use machine properties for anything new we do so we
> don't make the problem worse than it already is. It feels more useful
> than inventing new IF_xxx possibilities for something we think is the
> wrong approach already.

The difficulty of providing machine properties for existing devices and
the lack of adoption even for new devices make me doubt they are a
viable path forward.  In the message I referenced above, I wrote:

If "replace them all by machine properties" is the way forward, we
need to get going.  At the current rate of one file a year (measured
charitably), we'll be done around 2090, provided we don't add more
(we've added quite a few since I did the first replacement).

I figure this has slipped to the 22nd century by now.

Yes, more uses of -drive are steps backward.  But they are trivially
easy, and also drops in the bucket.  Machine properties are more
difficult, and whether they actually take us forward seems doubtful.