On Mon, Jan 30, 2023 at 06:30:16PM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 10:17:48AM -0500, Stefan Hajnoczi wrote:
> > On Mon, 30 Jan 2023 at 07:33, Daniel P. Berrangé <berra...@redhat.com> wrote:
> > >
> > > On Sun, Jan 29, 2023 at 06:39:49PM +0800, Sam Li wrote:
> > > > This patch extends virtio-blk emulation to handle zoned device
> > > > commands by calling the new block layer APIs to perform zoned
> > > > device I/O on behalf of the guest. It supports Report Zone, four
> > > > zone operations (open, close, finish, reset), and Append Zone.
> > > >
> > > > The VIRTIO_BLK_F_ZONED feature bit will only be set if the host
> > > > does support zoned block devices. It will not be set for regular
> > > > block devices (conventional zones).
> > > >
> > > > The guest OS can use blktests and fio to test those commands on
> > > > zoned devices. Furthermore, using zonefs to test zone append
> > > > writes is also supported.
> > > >
> > > > Signed-off-by: Sam Li <faithilike...@gmail.com>
> > > > ---
> > > >  hw/block/virtio-blk-common.c |   2 +
> > > >  hw/block/virtio-blk.c        | 394 +++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 396 insertions(+)
> > > >
> > > > @@ -949,6 +1311,30 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> > > >          blkcfg.write_zeroes_may_unmap = 1;
> > > >          virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> > > >      }
> > > > +    if (bs->bl.zoned != BLK_Z_NONE) {
> > > > +        switch (bs->bl.zoned) {
> > > > +        case BLK_Z_HM:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HM;
> > > > +            break;
> > > > +        case BLK_Z_HA:
> > > > +            blkcfg.zoned.model = VIRTIO_BLK_Z_HA;
> > > > +            break;
> > > > +        default:
> > > > +            g_assert_not_reached();
> > > > +        }
> > > > +
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.zone_sectors,
> > > > +                     bs->bl.zone_size / 512);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_active_zones,
> > > > +                     bs->bl.max_active_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_open_zones,
> > > > +                     bs->bl.max_open_zones);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.write_granularity, blk_size);
> > > > +        virtio_stl_p(vdev, &blkcfg.zoned.max_append_sectors,
> > > > +                     bs->bl.max_append_sectors);
> > >
> > > So these are all ABI-sensitive frontend device settings, but they
> > > are not exposed as tunables on the virtio-blk device; instead they
> > > are implicitly set from the backend.
> > >
> > > We have done this kind of thing before in QEMU, but several times it
> > > has bitten QEMU maintainers/users, as having a backend affect the
> > > frontend ABI is not typical. It wouldn't be immediately obvious
> > > when starting QEMU on a target host that the live migration would
> > > be breaking ABI if the target host wasn't using a zoned device with
> > > the exact same settings.
> > >
> > > This also limits mgmt flexibility across live migration, if the
> > > mgmt app wants/needs to change the storage backend. E.g. maybe they
> > > need to evacuate the host for an emergency, but don't have spare
> > > hosts with the same kind of storage. It might be desirable to
> > > migrate and switch to a plain block device or raw/qcow2 file,
> > > rather than let the VM die.
> > >
> > > Can we make these virtio settings be explicitly controlled on the
> > > virtio-blk device? If not specified explicitly they could be
> > > auto-populated from the backend for ease of use, but if specified
> > > then simply validate that the backend is a match. libvirt would
> > > then make sure these are always explicitly set on the frontend.
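
[Thinking out loud: the auto-populate-or-validate rule Dan describes
could be a small per-field helper along these lines. Note that
check_or_inherit_u32() is invented here for illustration; nothing like
it exists yet.]

  /* Inherit an unset (0) frontend value from the backend,
   * otherwise insist that frontend and backend match. */
  static bool check_or_inherit_u32(uint32_t *frontend_val,
                                   uint32_t backend_val,
                                   const char *name, Error **errp)
  {
      if (*frontend_val == 0) {
          *frontend_val = backend_val;   /* auto-populate */
          return true;
      }
      if (*frontend_val != backend_val) {
          error_setg(errp, "%s %u does not match backend value %u",
                     name, *frontend_val, backend_val);
          return false;
      }
      return true;
  }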
> > I think this is a good idea, especially if we streamline the
> > file-posix.c driver by merging --blockdev zoned_host_device into
> > --blockdev host_device. It won't be obvious from the command-line
> > whether this is a zoned or non-zoned device. There should be a
> > --device virtio-blk-pci,drive=drive0,zoned=on option that fails when
> > drive0 isn't zoned. It should probably be on/off/auto where auto is
> > the default and doesn't check anything, on requires a zoned device,
> > and off requires a non-zoned device. That will prevent accidental
> > migration between zoned/non-zoned devices.
> >
> > I want to point out that virtio-blk doesn't have checks for the disk
> > size or other details, so what you're suggesting for zone_sectors,
> > etc. is stricter than what QEMU does today. Since the virtio-blk
> > parameters you're proposing are optional, I think it doesn't hurt
> > though.
>
> Yeah, it is slightly different from how some of the other parameters
> are handled. I guess you could say that with disk capacity, matching
> size is a fairly obvious constraint/expectation to manage, and also
> long standing.
>
> With disk capacity, you can add the 'raw' driver on top of any block
> driver stack to apply an arbitrary offset+size, making the storage
> smaller than it otherwise is on disk. Conceptually that could have
> been done in the frontend device(s) too, but I guess it made more
> sense to do it in the block layer to give consistent enforcement
> of the limits across frontends. It is fuzzy whether such a use of
> the 'raw' driver is really considered backend config, as opposed to
> frontend config, but to me it feels like frontend config.
>
> You could possibly come up with the concept of a 'zoned' format driver
> that can be layered on top of a block driver stack to add zoned I/O
> constraints for the sake of compatibility, where none otherwise exist
> in the physical storage. Possibly useful if multiple frontends all
> support zoned storage, to avoid duplicating the constraints across all?
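
For the zoned=on/off/auto option I suggested above, the realize-time
check could be roughly this (a sketch only; the OnOffAuto "zoned"
property does not exist yet):

  /* Hypothetical check in virtio_blk_device_realize() */
  BlockDriverState *bs = blk_bs(conf->conf.blk);
  bool backend_zoned = bs->bl.zoned != BLK_Z_NONE;

  if (conf->zoned == ON_OFF_AUTO_ON && !backend_zoned) {
      error_setg(errp, "zoned=on requires a zoned block backend");
      return;
  }
  if (conf->zoned == ON_OFF_AUTO_OFF && backend_zoned) {
      error_setg(errp, "zoned=off requires a non-zoned block backend");
      return;
  }
  /* ON_OFF_AUTO_AUTO: no check, same as today */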
Maybe:

  DEFINE_BLOCK_ZONED_PROPERTIES(VirtIOBlock, conf.conf),

and then:

  bool blkconf_check_zoned_properties(BlockBackend *blk,
                                      BlockZonedConf *conf,
                                      Error **errp);

That macro and helper function can be shared by all emulated storage
controllers that implement zoned storage.

However, there's one problem: some storage interfaces extend the zoned
storage model (e.g. NVMe ZNS seems to have functionality that's not
available elsewhere). It would be necessary to check whether there is
a common subset of parameters with matching property names (because
terminology could be different) across emulated storage controllers.
But I think it's likely that this will work.

I think the macro and helper function approach is nice because it's
internal to QEMU and users don't need to set up a --blockdev
enforce-zoned.
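
To make that concrete, the macro might expand to something like this
(just a sketch; BlockZonedConf and its field names are invented here,
assuming BlockConf grows a "BlockZonedConf zoned" member):

  typedef struct BlockZonedConf {
      OnOffAuto zoned;
      uint32_t zone_sectors;
      uint32_t max_open_zones;
      uint32_t max_active_zones;
      uint32_t max_append_sectors;
  } BlockZonedConf;

  #define DEFINE_BLOCK_ZONED_PROPERTIES(_state, _conf)                \
      DEFINE_PROP_ON_OFF_AUTO("zoned", _state, _conf.zoned.zoned,     \
                              ON_OFF_AUTO_AUTO),                      \
      DEFINE_PROP_UINT32("zone-sectors", _state,                      \
                         _conf.zoned.zone_sectors, 0),                \
      DEFINE_PROP_UINT32("max-open-zones", _state,                    \
                         _conf.zoned.max_open_zones, 0),              \
      DEFINE_PROP_UINT32("max-active-zones", _state,                  \
                         _conf.zoned.max_active_zones, 0),            \
      DEFINE_PROP_UINT32("max-append-sectors", _state,                \
                         _conf.zoned.max_append_sectors, 0)

blkconf_check_zoned_properties() would then auto-populate any field
still at 0 from the backend's BlockLimits, and fail realize if an
explicitly set value doesn't match the backend.

Stefan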